The linear quadratic regulator (LQR) is used to stabilize a time-invariant system, provided that the system is stabilizable.
A control system in the state-space form
\begin{equation*} x_{t+1}=Ax_{t}+Bu_{t}+w_{t} \end{equation*}
where $x_t$ is the state, $u_t$ is the control action, and $w_t$ is Gaussian noise $N(0, \sigma^2)$,
is called stabilizable if it is possible to construct an unconstrained control action $u$ which results in
\begin{equation*} \lim_{t \to \infty} x_t = 0 \end{equation*}
Because the value/cost function of the optimization is quadratic, the optimal action $u_t$ can be derived in the form
\begin{equation*} u_{t}=-L_t x_{t} \end{equation*}
The goal is to minimize the cost function in order to find this optimal action $u_t$.
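As a quick numerical check, controllability, which is a sufficient condition for stabilizability, can be tested by forming the controllability matrix. The matrices $A$ and $B$ below are hypothetical examples, not taken from the system above:

```python
import numpy as np

# Hypothetical system matrices for illustration only.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])

n = A.shape[0]
# Controllability matrix [B, AB, A^2 B, ...]; full rank implies
# controllability, which in turn implies stabilizability.
C = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
print(np.linalg.matrix_rank(C) == n)  # True -> system is stabilizable
```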
The Q function estimates the expected long-term reward of taking a specific action in a given state, assuming optimal actions are taken in all future states. The Q stands for ``quality''.
Bellman's equation is used to derive the Q value at a specific state and action recursively.
\begin{equation*} Q_{t}(x_t,u_t)=E[g(x_t, u_t, w_t)+\gamma \min_{u_{t+1}} Q_{t+1}(x_{t+1}, u_{t+1})] \end{equation*}
The value function represents the expected long-term reward at a given state.
It is essentially the Q function evaluated at the optimal action for that same state.
Therefore, the value function can be written as \begin{equation*} V_{t}(x_t)=\min_{{u_t}}Q_{t}(x_t,u_t)=\min_{{u_t}}E[g(x_t, u_t, w_t)+ V_{t+1}(x_{t+1})] \end{equation*}
where the discount factor $\gamma$ is set to 1 and $g(x_t, u_t, w_t)$ is the stage cost of the system.
\begin{equation*} g(x_t, u_t, w_t)=x_t^T M_t x_t+u_t^T R_t u_t \end{equation*}
To derive the optimal action in the form
\begin{equation*} u_{t}=-L_t x_{t} \end{equation*}
the value function $V_{t}(x_{t})$ must be quadratic. Assume $V_{t}(x_{t})$ has the form
\begin{equation*} V_{t}(x_t)=x_{t}^{T}K_{t}x_{t}+\alpha_t \end{equation*}\begin{equation*} V_{t+1}(x_{t+1})=x_{t+1}^{T}K_{t+1}x_{t+1}+\alpha_{t+1} \end{equation*}
To relate $V_t$ and $u_t$, substitute $x_{t+1}=Ax_{t}+Bu_{t}+w_{t}$ into the value function and write $g(x_t,u_t,w_t)$ in terms of $x_t$ and $u_t$.
\begin{align} V_{t}(x_t) &=\min_{{u_t}}E[g(x_t, u_t, w_t)+ V_{t+1}(x_{t+1})] \\ &=\min_{{u_t}}E[x_t^T M_t x_t+u_t^T R_t u_t+(Ax_{t}+Bu_{t}+w_{t})^T K_{t+1} (Ax_{t}+Bu_{t}+w_{t})+\alpha_{t+1}] \end{align}
Reorganizing the terms and cancelling the cross terms that vanish because $E[w_t]=0$ gives
\begin{equation*} V_t(x_t)=\min_{{u_t}}E[x_t^T (M_t+A^TK_{t+1}A) x_t+u_t^T (R_t+B^T K_{t+1} B) u_t+2(x_t^T A^T K_{t+1} B u_t)+ w_t^T K_{t+1} w_t+\alpha_{t+1}] \end{equation*}Let \begin{align} F_t&= M_t+A^T K_{t+1} A \\ G_t&=R_t+B^T K_{t+1} B \\ H_t&=A^T K_{t+1} B \\ \alpha_{t}&=E[w_t^T K_{t+1} w_t]+\alpha_{t+1} \end{align}
\begin{equation*} V_t(x_t)=\min_{{u_t}}E[x_t^T F_t x_t+u_t^T G_t u_t+2(x_t^T H_t u_t)+ \alpha_{t}] \end{equation*}The optimal action $u_t$ is obtained by setting \begin{equation*} \frac{\partial V_t}{\partial u_t} = 2 G_t u_t+2 H_t^T x_t=0 \end{equation*}
\begin{equation*} u_{t(optimal)}=-(G_t^{-1} H_t^T) x_t \end{equation*}\begin{equation*} L_t=G_t^{-1} H_t^T \end{equation*}Insert $u_{t(optimal)}$ into $V_t$,
\begin{align} V_t(x_t)&=\min_{{u_t}}E[x_t^T F_t x_t+u_t^T G_t u_t+2(x_t^T H_t u_t)+ \alpha_{t}] \\ &=x_t^T F_t x_t+(G_t^{-1} H_t^T x_t)^T G_t (G_t^{-1} H_t^T x_t)-2 x_t^T H_t G_t^{-1} H_t^T x_t+\alpha_{t} \\ &=x_t^T(F_t-H_tG_t^{-1}H_t^T) x_t +\alpha_t \end{align}
Matching this with the form $V_{t}(x_t)=x_{t}^{T}K_{t}x_{t}+\alpha_t$ gives \begin{equation*} K_t=F_t-H_tG_t^{-1}H_t^T \end{equation*}
Substituting $F_t$, $G_t$, and $H_t$ into $K_t$, \begin{align} K_t&=F_t-H_tG_t^{-1}H_t^T \\ &=M_t+A^T K_{t+1} A-(A^T K_{t+1} B)(R_t+B^T K_{t+1} B)^{-1}(A^T K_{t+1} B)^T \end{align}
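This backward recursion can be sketched in a few lines of Python. The matrices below are hypothetical placeholders, and the terminal cost is taken as zero:

```python
import numpy as np

def riccati_recursion(A, B, M, R, K_terminal, steps):
    """Iterate K_t = M + A^T K A - (A^T K B)(R + B^T K B)^{-1}(A^T K B)^T
    backwards from the terminal cost K_terminal."""
    K = K_terminal
    for _ in range(steps):
        H = A.T @ K @ B
        G = R + B.T @ K @ B
        K = M + A.T @ K @ A - H @ np.linalg.solve(G, H.T)
    return K

# Hypothetical matrices for illustration only.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
M = np.eye(2)          # state cost M_t
R = np.array([[1.0]])  # control cost R_t

K = riccati_recursion(A, B, M, R, np.zeros((2, 2)), 200)
L = np.linalg.solve(R + B.T @ K @ B, (A.T @ K @ B).T)  # L = G^{-1} H^T
print(K)  # after enough steps, K approaches a fixed point
```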
In the infinite-horizon steady state, the backward recursion converges to a fixed point, i.e., $K_t=K_{t+1}=K$.
The equation above then becomes the discrete algebraic Riccati equation:
\begin{equation*} M+A^T K A-(A^T K B)(R+B^T K B)^{-1}(A^T K B)^T-K=0 \end{equation*}The $K$ value can be solved by using
K=scipy.linalg.solve_discrete_are(A, B, M, R)
Reference:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.solve_discrete_are.html
The gain matrix $L$ then follows from
L=scipy.linalg.solve(G, H.T)
where
$G=R+B^T K B$
$H=A^T K B$
Reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.solve.html
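A minimal end-to-end sketch of this recipe, using hypothetical matrices, solves the DARE, computes $L$, and checks the Riccati residual:

```python
import numpy as np
from scipy.linalg import solve_discrete_are, solve

# Hypothetical matrices for illustration only.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
M = np.eye(2)          # state cost
R = np.array([[1.0]])  # control cost

K = solve_discrete_are(A, B, M, R)  # steady-state Riccati solution
G = R + B.T @ K @ B
H = A.T @ K @ B
L = solve(G, H.T)                   # L = G^{-1} H^T

# Verify K satisfies M + A^T K A - H G^{-1} H^T - K = 0.
residual = M + A.T @ K @ A - H @ solve(G, H.T) - K
print(np.allclose(residual, 0))  # True
```

The closed-loop matrix $A-BL$ should then have all eigenvalues strictly inside the unit circle, which is what makes the regulator stabilizing.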
The state vector $x$ tracks the inventory levels of three parts.
The sales $w$ withdraw parts from the inventory according to the inventory level, without overselling. In this project, $w$ is assumed to be normally distributed.
The refill purchase $u$ replenishes all three parts according to the assembly relationship, without overpurchasing.
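As a rough sketch of the closed loop, assuming hypothetical identity dynamics for the three parts (and ignoring the no-oversell and no-overpurchase constraints), the regulator can be simulated as:

```python
import numpy as np
from scipy.linalg import solve_discrete_are, solve

rng = np.random.default_rng(0)

# Hypothetical 3-part inventory model: each part's deviation from its
# target level carries over, refills u add stock, sales noise w removes it.
A = np.eye(3)
B = np.eye(3)
M = np.eye(3)          # penalize inventory deviation
R = 0.1 * np.eye(3)    # penalize refill effort

K = solve_discrete_are(A, B, M, R)
L = solve(R + B.T @ K @ B, (A.T @ K @ B).T)

x = np.array([10.0, -5.0, 3.0])  # initial deviation from target inventory
for t in range(50):
    w = rng.normal(0.0, 1.0, size=3)  # normally distributed sales noise
    u = -L @ x                        # LQR feedback u_t = -L x_t
    x = A @ x + B @ u + w
print(np.round(x, 2))  # deviation stays small around zero
```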
The inventory outcome using the linear quadratic regulator is shown in the figure below: