ebook img

Dynamic Programming & Optimal Control Adi Ben-Israel PDF

61 Pages·2005·0.37 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Dynamic Programming & Optimal Control Adi Ben-Israel

Dynamic Programming & Optimal Control Adi Ben-Israel Adi Ben-Israel, RUTCOR–Rutgers Center for Operations Research, Rut- gers University, 640 Bartholomew Rd., Piscataway, NJ 08854-8003, USA E-mail address: [email protected] LECTURE 1 Dynamic Programming 1.1. Recursion Arecursion isaruleforcomputingavalueusingpreviouslycomputedvalues,forexample, the rule f = f +f , (1.1) k+1 k k−1 computes the Fibonnaci sequence, given the initial values f = f = 1. 0 1 Example 1.1. A projectile with mass 1 shoots up against earth gravity g. The initial velocity of the projectile is v . What maximal altitude will it reach? 0 Solution. Let y(v) be the maximal altitude reachable with initial velocity v. After time ∆t, the projectile advanced approximately v∆t, and its velocity has decreased to approximately v −g∆t. Therefore the recursion y(v) ≈ vδt+y(v −g∆t) , (1.2) gives y(v)−y(v −g∆t) ≈ v , ∆t v ∴ y(cid:48)(v) = g v2 ∴ y(v) = , (1.3) 2g and the maximal altitude reached by the projectile is v2/2g. 0 Example 1.2. Consider the partitioned matrix (cid:181) (cid:182) B c A = (1.4) r α where B is nonsingular, c is a column, r is a row, and α is a scalar. Then A is nonsingular iff α−rB−1c (cid:54)= 0 , (1.5) (verify!) in which case (cid:181) (cid:182) B−1 +βB−1crB−1 −βB−1c A−1 = , (1.6a) −βrB−1 β 1 where β = . (1.6b) α−rB−1c 3 4 1. DYNAMIC PROGRAMMING Can this result be used in a recursive computation of A−1? Try for example     1 2 3 4 −1 −2 A = 1 2 4 , with inverse A−1 =  0 −1 1  . 1 3 4 −1 1 0 Example 1.3. Consider the LP max cTx s.t. Ax = b x ≥ 0 and denote its optimal value by V(A,b). Let the matrix A be partitioned as A = (A ,a ), 1 n where a is the last column, and similarly partition the vector c as cT = (cT,c ). Is the n 1 n recursion V(A,b) = max{c x +V(A ,b−a x )} (1.7) n n 1 n n xn≥0 valid? Is it useful for solving the problem? 1.2. Multi stage optimization problems and the optimality principle Consider a process consisting of N sequential stages, and requiring some decision at each stage. The information needed at the beginning of stage i is called its state, and denoted by x . The initial state x is assumed given. A decision u is selected in stage i from the i 1 i feasible decision set U (x ), that in general depends on the state x . This results in a reward i i i r (x ,u ) and the next state becomes i i i x = T (x ,u ) , (1.8) i+1 i i i where T (·,·) is called the state transformation (or dynamics) in stage i. The terminal i state x results in a reward S(x ), called the salvage value of x . If the rewards N+1 N+1 N+1 accumulate, the problem becomes   (cid:88)N xi+1 = Ti(xi,ui) , i = 1,N ,  max r (x ,u )+S(x ) : u ∈ U (x ) , i = 1,N , . (1.9)  i i i N+1 i i i  1=1 x1given . This problem can be solved, in principle, as an optimization problem in the variables u ,...,u . However, this ignores the special, sequential, structure of the problem. 1 N The Dynamic Programming (DP) solution is based on the following concept. Definition 1.1. The optimal value function (OV function) at stage k, denoted V (·), is k   (cid:88)N xi+1 = Ti(xi,ui) , i ∈ k,N ,  V (x) := max r (x ,u )+S(x ) : u ∈ U (x ) , i ∈ k,N , . (1.10) k  i i i N+1 i i i  1=k xk = x . The optimal value of the original problem is V (x ). An optimal policy is a sequence of 1 1 decisions {u ,...,u } resulting in the value V (x ). 1 N 1 1 Bellman’s Principle of Optimality (abbreviated PO) is often stated as follows: 1.2. MULTI STAGE OPTIMIZATION PROBLEMS AND THE OPTIMALITY PRINCIPLE 5 An optimal policy has the property that whatever the initial state and the initial decisions are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision, [5, p. 15]. The PO can be used to recursively compute the OV functions V (x) = max {r (x,u)+V (T (x,u),u)} , k ∈ 1,N , (1.11) k k k+1 k u∈U (x) k V (x) = S(x) . (1.12) N+1 The last equation is called the boundary condition (BC). (cid:80)N Example 1.4. It is required to partition a positive number x in N parts, x = u , such i i=1 (cid:80)N that the sum of squares u2 is minimized. i i=1 For each k ∈ 1,N define the OV function (cid:40) (cid:41) (cid:88)k (cid:88)k V (x) := min u2 : u = x . k i i i=1 i=1 The recursion (1.11) here becomes (cid:169) (cid:170) V (x) := min u2 +V (x−u) , k k−1 u∈[0,x] with V (x) = x2 as BC. 1 For k = 2 we get (cid:169) (cid:170) x2 x V (x) := min u2 +(x−u)2 = , with optimal u = . 2 u∈[0,x] 2 2 x2 x Claim: For genral N, the optimal value is V (x) = , with optimal policy of equal u = . N i N N Proof. (by induction) (cid:169) (cid:170) V (x) = min u2 +V (x−u) N N−1 u∈[0,x] (cid:189) (cid:190) (x−u)2 = min u2 + , by the induction hypothesis , u∈[0,x] N −1 x2 x = , for u = . N N (cid:164) The following example shows that the PO, as stated above, is incorrect. For details see the book [29] by M. Sniedovich where the PO is given a rigorous treatment. 6 1. DYNAMIC PROGRAMMING Example 1.5. ([29, pp. 294–295]) Consider the graph in Figure 1.1. It is required to find the shortest path from node 1 to node 7, the length of a path is defined as the length of its longest arc. For example, length{1,2,5,7} = 4 . Ifthenodesareviewedasstates, thenthepath{1,2,4,6,7}isoptimalw.r.t. s = 1. However, the path {2,4,6,7} is not optimal w.r.t. the node s = 2, as its length is greater than the length of {2,5,7}. 2 2 5 3 1 1 4 4 7 1 1 1 2 5 2 3 6 Figure 1.1. An illustration why the PO should be used carefully 1.3. Inverse Dynamic Programming Consider a multi–stage decision process of § 1.2, where the state x is the project budget. A reasonable question is to determine the minimal budget that will enable achieving a given target v, i.e. min{x : V (x) ≥ v} , 1 denotedI (v)andcalledtheoptimal input forv. Similarly, atanystagewedefinetheoptimal 1 input as (cid:189) min{x : V (x) ≥ v} , I (v) := k k = 1,2,··· ,N . (1.13) k ∞ if the target v is unattainable , The terminal optimal input is defined as I (v) := S−1(v) (1.14) N+1 assuming the salvage value function S(·) is monotonic. A natural recursion for the optimal inputs is: I (v) := min{x : ∃u ∈ U (x) (cid:127) T (x,u) ≥ I (v−r (x,u))} , k = N,N−1,··· ,1 , (1.15) k k k k+1 k with (1.14) as BC. 1.3. INVERSE DYNAMIC PROGRAMMING 7 Exercises. Exercise 1.1. Use DP to maximize the entropy (cid:40) (cid:41) (cid:88)N (cid:88)N max − p logp : p = 1 . i i i i=1 i=1 Exercise 1.2. Let Z denote the nonnegative integers. Use DP to write a recursion for + the knapsack problem (cid:40) (cid:41) (cid:88)N (cid:88)N max f (x ) : w (x ) ≤ W, ,x ∈ Z , i i i i i + i=1 i=1 where c (·),w (·) are given functions: Z → R and W > 0 is given. State any additional i i + properties of f (·) and w (·) that are needed in your analysis. i i Exercise 1.3. ([7]) A cash amount of x cents can be represented by x coins of 50 cents , 1 x coins of 25 cents , 2 x coins of 10 cents , 3 x coins of 5 cents , and 4 x coins of 1 cent . 5 The representation is: x = 50x +25x +10x +5x +x . 1 2 3 4 5 (a) Use DP to find the representation with the minimal number of coins. (b) Show that your solution agrees with the ”greedy” solution: (cid:185) (cid:186) (cid:106) (cid:107) x x−50x 1 x = , x = , etc. 1 2 50 25 where (cid:98)α(cid:99) is the greatest integer ≤ α. (c) Suppose a new coin of 20 cents is introduced. Will the DP solution still agree with the greedy solution? Exercise 1.4. The energy required to compress a gas from pressure p to pressure p 1 N+1 in N stages is proportional to (cid:181) (cid:182) (cid:181) (cid:182) (cid:181) (cid:182) p α p α p α 2 3 N+1 + +···+ p p p 1 2 N with α a positive constant. Show how to choose the intermediate pressures p ,··· ,p so as 2 N to minimize the energy requirement. Exercise 1.5. Consider the following variation of the game NIM, defined in terms of N piles of matches containing x ,x ,··· ,x matches. The rules are: 1 2 N (i) two players make moves in alternating turns, (ii) if matches remain, a move consists of removing any number of matches all from the same pile, 8 1. DYNAMIC PROGRAMMING (iii) the last player to move loses. Forexample, consideragamewithinitialpiles{x ,x ,x } = {1,4,7}wheremovesbyplayers 1 2 3 I II I, II are denoted by → and → resp., I II I II I {1,4,7} → {1,4,5} → {0,4,5} → {0,4,4} → {0,3,4} → {0,3,3} II I II I II → {0,2,3} → {0,2,2} → {0,2,1} → {0,0,1} → {0,0,0} , and II loses . (a) Use DP to find an optimal move for an initial state {x ,x ,··· ,x }. 1 2 N (b) Find a simple rule to determine if an initial state is a winning position. (c) Is {x ,x ,x ,x } = {1,3,5,7} a winning position? 1 2 3 4 Hint: Let V(x ,x ,··· ,x ) be the optimal value of having the piles {x ,x ,··· ,x } when 1 2 N 1 2 N it is your turn to play, with (cid:189) 0, if {x ,x ,··· ,x } is a losing position V(x ,x ,··· ,x ) = 1 2 N 1 2 N 1, otherwise. Let y be the number of matches removed from pile i. Then i (cid:189) (cid:190) V(x ,x ,··· ,x ) = 1− max max V(x ,x ,··· ,x −u ,··· ,x ) 1 2 N 1 2 i i N i=1,...,N 1≤yi≤xi where V(x ,x ,··· ,x ) = 0 if all x = 0 except, say x = 1 1 2 N i j Exercise 1.6. We have a number of coins, all of the same weight except for one which is of different weight, and a balance. (a)Determine theweighingprocedures whichminimize themaximumtimerequired tolocate the distinctive coin in the following cases: • the coin is known to be heavier, • it is not known whether the coin is heavier or lighter. (b) Determine the weighing procedures which minimize the expected time required to locate the coin. (c) Consider the more general problem where there are two or more distinctive coins, under various assumptions concerning the distinctive coins. Exercise 1.7. A rocket consists of k stages carrying fuel and a nose cone carrying the pay load. After the fuel carried in stage k is consumed, this stage drops off, leaving a k −1 stage rocket. Let W = weight of nose cone , 0 w = initial gross weight of stage k , k W = W +w , initial gross weight of sub-rocket k , k k−1 k p = initial propellant weight of stage k , k v = change in rocket velocity during burning of stage k . k Assume that the change in velocity v is a known function of W and p , so that . k k k v = v(W ,p ) k k k 1.3. INVERSE DYNAMIC PROGRAMMING 9 from which p = p(W ,v ) k k k Since W = W +w , and the weight of the kth stage is a known function, g(p ), of the k k−1 k k propellant carried in the stage, we have w = w(p(W +w ,v )) k k−1 k k whence, solving for w , we have k w = w(W ,v ) k k−1 k (a) Use DP to design a k-stage rocket of minimum weight which will attain a final velocity v. (b) Describe an algorithm for finding the optimal number of stages k∗. (c) Discuss the factors resulting in an increase of k∗ i.e. in more stages of smaller size. Exercise 1.8. Suppose that we are given the information that a ball is in one of N boxes, and the a priori probability, p , that it is in the kth box. k (a) Show that the procedure which minimizes the expected time required to find the ball consists of looking in the most likely box first. (b) Consider the more general problem where the time consumed in examining the kth box is t , and where there is a probability q that the examination of the kth box will yield k k no information about its contents. When this happens, we continue the search with the information already available. Let F(p ,p ,··· ,p ) be the expected time required to find 1 2 N the ball under an optimal policy. Find the functional equation that F satisfies. (c) Prove that if we wish to “obtain” the ball, the optimal policy consists of examining first the box for which p (1−q ) k k t k is a maximum. On the other hand, if we merely wish to “locate” the box containing the ball in the minimum expected time, the box for which this quantity is maximum is examined first, or not at all. Exercise 1.9. A company has m jobs that are numbered 1 through m. Job i requires k i employees. The “natural” monthly wage of job i is w dollars, with w ≤ w for all i. The i i i+1 jobs are to be grouped into n labor grades, each grade consisting of several consecutive jobs. All employees in a given labor grade receive the highest of the natural wages of the jobs in that grade. A fraction r of the employees in each jobs quit in each month. Vacancies must j be filled by promoting from the next lower grade. For instance, a vacancy in the highest of n labor grades causes n−1 promotions and a hire into the lowest labor grade. It costs t dollars to train an employee to do any job. Write a functional equation whose solution determines the number n of labor grades and the set of jobs in each labor grade that minimizes the sum of the payroll and training costs. Exercise 1.10. (The Jeep problem) Quoting from D. Gale, [18, p. 493], ··· the problem concerns a jeep which is able to carry enough fuel to travel a distance d, but is required to cross a desert whose distance is greater then d (for example 2d). It is to do this by carrying fuel from its home base

Description:
Dynamic Programming & Optimal Control. Adi Ben-Israel. Adi Ben-Israel, RUTCOR–Rutgers Center for Operations Research, Rut- gers University
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.