The linear quadratic regulator (LQR) is used to stabilize a time-invariant system, provided that the system is stabilizable.
A control system in the state-space form
\begin{equation*} x_{t+1}=Ax_{t}+Bu_{t}+w_{t} \end{equation*}
where $x_t$ is the state, $u_t$ is the control action, and $w_t$ is Gaussian noise $N(0, \sigma^2)$,
is called stabilizable if it is possible to construct an unconstrained control action $u$ which results in
\begin{equation*} \lim_{t \to \infty} x_t = 0 \end{equation*}
Because the value/cost function of the optimization is quadratic, the optimal action $u_t$ can be derived in the form
\begin{equation*} u_{t}=-L_t x_{t} \end{equation*}
The goal is to minimize the cost function in order to find this optimal action $u_t$.
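As a quick numerical check, controllability, which is a sufficient condition for stabilizability, can be tested by forming the controllability matrix. The matrices $A$ and $B$ below are hypothetical examples, not taken from the system above:

```python
import numpy as np

# Hypothetical system matrices for illustration only.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])

n = A.shape[0]
# Controllability matrix [B, AB, A^2 B, ...]; full rank implies
# controllability, which in turn implies stabilizability.
C = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
print(np.linalg.matrix_rank(C) == n)  # True -> system is stabilizable
```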
The Q function estimates the expected long-term reward of taking a specific action in a given state, assuming optimal actions are taken in all future states. The Q stands for ``quality''.
Bellman's equation is used to derive the Q value at a specific state and action recursively.
\begin{equation*} Q_{t}(x_t,u_t)=E[g(x_t, u_t, w_t)+\gamma \min_{u_{t+1}} Q_{t+1}(x_{t+1}, u_{t+1})] \end{equation*}
The value function represents the expected long-term reward at a given state.
It is essentially the Q function evaluated at the optimal action for that same state.
Therefore, the value function can be written as \begin{equation*} V_{t}(x_t)=\min_{{u_t}}Q_{t}(x_t,u_t)=\min_{{u_t}}E[g(x_t, u_t, w_t)+ V_{t+1}(x_{t+1})] \end{equation*}
where the discount factor $\gamma$ is set to 1 and $g(x_t, u_t, w_t)$ is the stage cost of the system.
\begin{equation*} g(x_t, u_t, w_t)=x_t^T M_t x_t+u_t^T R_t u_t \end{equation*}
To derive the optimal action in the form
\begin{equation*} u_{t}=-L_t x_{t} \end{equation*}
the value function $V_{t}(x_{t})$ must be quadratic. Assume $V_{t}(x_{t})$ has the form
\begin{equation*} V_{t}(x_t)=x_{t}^{T}K_{t}x_{t}+\alpha_t \end{equation*}\begin{equation*} V_{t+1}(x_{t+1})=x_{t+1}^{T}K_{t+1}x_{t+1}+\alpha_{t+1} \end{equation*}
To relate $V_t$ and $u_t$, substitute $x_{t+1}=Ax_{t}+Bu_{t}+w_{t}$ into the value function and write $g(x_t,u_t,w_t)$ in terms of $x_t$ and $u_t$.
\begin{align} V_{t}(x_t) &=\min_{{u_t}}E[g(x_t, u_t, w_t)+ V_{t+1}(x_{t+1})] \\ &=\min_{{u_t}}E[x_t^T M_t x_t+u_t^T R_t u_t+(Ax_{t}+Bu_{t}+w_{t})^T K_{t+1} (Ax_{t}+Bu_{t}+w_{t})+\alpha_{t+1}] \end{align}
Reorganizing the terms and cancelling the cross terms that vanish because $E[w_t]=0$ gives
\begin{equation*} V_t(x_t)=\min_{{u_t}}E[x_t^T (M_t+A^TK_{t+1}A) x_t+u_t^T (R_t+B^T K_{t+1} B) u_t+2(x_t^T A^T K_{t+1} B u_t)+ w_t^T K_{t+1} w_t+\alpha_{t+1}] \end{equation*}Let \begin{align} F_t&= M_t+A^T K_{t+1} A \\ G_t&=R_t+B^T K_{t+1} B \\ H_t&=A^T K_{t+1} B \\ \alpha_{t}&=E[w_t^T K_{t+1} w_t]+\alpha_{t+1} \end{align}
\begin{equation*} V_t(x_t)=\min_{{u_t}}E[x_t^T F_t x_t+u_t^T G_t u_t+2(x_t^T H_t u_t)+ \alpha_{t}] \end{equation*}The optimal action $u_t$ is obtained by setting \begin{equation*} \frac{\partial V_t}{\partial u_t} = 2 G_t u_t+2 H_t^T x_t=0 \end{equation*}
\begin{equation*} u_{t(optimal)}=-(G_t^{-1} H_t^T) x_t \end{equation*}\begin{equation*} L_t=G_t^{-1} H_t^T \end{equation*}Insert $u_{t(optimal)}$ into $V_t$,
\begin{align} V_t(x_t)&=\min_{{u_t}}E[x_t^T F_t x_t+u_t^T G_t u_t+2(x_t^T H_t u_t)+ \alpha_{t}] \\ &=x_t^T F_t x_t+(G_t^{-1} H_t^T x_t)^T G_t (G_t^{-1} H_t^T x_t)-2 x_t^T H_t G_t^{-1} H_t^T x_t+\alpha_{t} \\ &=x_t^T(F_t-H_tG_t^{-1}H_t^T) x_t +\alpha_t \end{align}
Matching this with the form $V_{t}(x_t)=x_{t}^{T}K_{t}x_{t}+\alpha_t$ gives \begin{equation*} K_t=F_t-H_tG_t^{-1}H_t^T \end{equation*}
Substituting $F_t$, $G_t$, and $H_t$ into $K_t$, \begin{align} K_t&=F_t-H_tG_t^{-1}H_t^T \\ &=M_t+A^T K_{t+1} A-(A^T K_{t+1} B)(R_t+B^T K_{t+1} B)^{-1}(A^T K_{t+1} B)^T \end{align}
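This backward recursion can be sketched in a few lines of Python. The matrices below are hypothetical placeholders, and the terminal cost is taken as zero:

```python
import numpy as np

def riccati_recursion(A, B, M, R, K_terminal, steps):
    """Iterate K_t = M + A^T K A - (A^T K B)(R + B^T K B)^{-1}(A^T K B)^T
    backwards from the terminal cost K_terminal."""
    K = K_terminal
    for _ in range(steps):
        H = A.T @ K @ B
        G = R + B.T @ K @ B
        K = M + A.T @ K @ A - H @ np.linalg.solve(G, H.T)
    return K

# Hypothetical matrices for illustration only.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
M = np.eye(2)          # state cost M_t
R = np.array([[1.0]])  # control cost R_t

K = riccati_recursion(A, B, M, R, np.zeros((2, 2)), 200)
L = np.linalg.solve(R + B.T @ K @ B, (A.T @ K @ B).T)  # L = G^{-1} H^T
print(K)  # after enough steps, K approaches a fixed point
```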
In the infinite-horizon steady state, the backward recursion converges to a fixed point, i.e., $K_t=K_{t+1}=K$.
The equation above then becomes the discrete algebraic Riccati equation:
\begin{equation*} M+A^T K A-(A^T K B)(R+B^T K B)^{-1}(A^T K B)^T-K=0 \end{equation*}The $K$ value can be solved by using
K=scipy.linalg.solve_discrete_are(A, B, M, R)
Reference:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.solve_discrete_are.html
The gain matrix $L$ then follows from
L=scipy.linalg.solve(G, H.T)
where
$G=R+B^T K B$
$H=A^T K B$
Reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.solve.html
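A minimal end-to-end sketch of this recipe, using hypothetical matrices, solves the DARE, computes $L$, and checks the Riccati residual:

```python
import numpy as np
from scipy.linalg import solve_discrete_are, solve

# Hypothetical matrices for illustration only.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
M = np.eye(2)          # state cost
R = np.array([[1.0]])  # control cost

K = solve_discrete_are(A, B, M, R)  # steady-state Riccati solution
G = R + B.T @ K @ B
H = A.T @ K @ B
L = solve(G, H.T)                   # L = G^{-1} H^T

# Verify K satisfies M + A^T K A - H G^{-1} H^T - K = 0.
residual = M + A.T @ K @ A - H @ solve(G, H.T) - K
print(np.allclose(residual, 0))  # True
```

The closed-loop matrix $A-BL$ should then have all eigenvalues strictly inside the unit circle, which is what makes the regulator stabilizing.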
The state vector $x$ tracks the inventory levels of three parts.
The sales $w$ withdraw parts from the inventory according to the inventory level, without overselling. In this project, $w$ is assumed to be normally distributed.
The refill purchase $u$ replenishes all three parts according to the assembly relationship, without overpurchasing.
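As a rough sketch of the closed loop, assuming hypothetical identity dynamics for the three parts (and ignoring the no-oversell and no-overpurchase constraints), the regulator can be simulated as:

```python
import numpy as np
from scipy.linalg import solve_discrete_are, solve

rng = np.random.default_rng(0)

# Hypothetical 3-part inventory model: each part's deviation from its
# target level carries over, refills u add stock, sales noise w removes it.
A = np.eye(3)
B = np.eye(3)
M = np.eye(3)          # penalize inventory deviation
R = 0.1 * np.eye(3)    # penalize refill effort

K = solve_discrete_are(A, B, M, R)
L = solve(R + B.T @ K @ B, (A.T @ K @ B).T)

x = np.array([10.0, -5.0, 3.0])  # initial deviation from target inventory
for t in range(50):
    w = rng.normal(0.0, 1.0, size=3)  # normally distributed sales noise
    u = -L @ x                        # LQR feedback u_t = -L x_t
    x = A @ x + B @ u + w
print(np.round(x, 2))  # deviation stays small around zero
```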
The inventory outcome using the linear quadratic regulator is shown in the figure below: