Dynamic Programming
I. Perspective
II. Principle of Optimality
III. Steps of Dynamic Programming
IV. First Application: The Matrix Chain Problem
V. Second Application: The All-Pairs Shortest Path Problem
VI. Third Application: Optimal Binary Search Trees
I. Perspective
- Dynamic programming is an optimization technique.
- Greedy vs. Dynamic Programming :
- Both techniques are optimization techniques, and both build solutions from a collection of choices of individual elements.
- The greedy method computes its solution by making its choices in a serial forward fashion, never looking back or revising previous choices.
- Dynamic programming computes its solution bottom up by synthesizing them from smaller subsolutions, and by trying many possibilities and choices before it arrives at the optimal set of choices.
- There is no a priori litmus test by which one can tell if the Greedy method will lead to an optimal solution.
- By contrast, there is a litmus test for Dynamic Programming, called the Principle of Optimality.
- Divide and Conquer vs. Dynamic Programming:
- Both techniques split their input into parts, find subsolutions to the parts, and synthesize larger solutions from smaller ones.
- Divide and Conquer splits its input at prespecified deterministic points (e.g., always in the middle)
- Dynamic Programming splits its input at every possible split point rather than at a pre-specified point. After trying all split points, it determines which split point is optimal.
II. Principle of Optimality
- Definition: A problem is said to satisfy the Principle of Optimality if the subsolutions of an optimal solution of the problem are themselves optimal solutions for their subproblems.
- Examples:
- The shortest path problem satisfies the Principle of Optimality.
- This is because if a,x1,x2,...,xn,b is a shortest path from node a to node b in a graph, then the portion of xi to xj on that path is a shortest path from xi to xj.
- The longest path problem, on the other hand, does not satisfy the Principle of Optimality. Take for example the undirected graph G with nodes a, b, c, d, and e, and edges (a,b), (b,c), (c,d), (d,e), and (e,a). That is, G is a ring. The longest (noncyclic) path from a to d is a,b,c,d. The sub-path from b to c on that path is simply the edge (b,c). But that is not the longest path from b to c: rather, b,a,e,d,c is the longest path. Thus, a subpath of a longest path is not necessarily a longest path.
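The ring counterexample is small enough to check by brute force; here is a sketch in Python (the node names match the example above, but the helper function and its name are only for illustration):

```python
from itertools import permutations

# Undirected 5-cycle a-b-c-d-e-a from the counterexample above.
edges = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")}

def is_edge(u, v):
    return (u, v) in edges or (v, u) in edges

def longest_simple_path(src, dst):
    """Brute-force longest noncyclic path: try every ordering of
    intermediate nodes and keep the longest one that is a real path."""
    nodes = {"a", "b", "c", "d", "e"}
    best = []
    for r in range(len(nodes) - 1):
        for mid in permutations(nodes - {src, dst}, r):
            path = [src, *mid, dst]
            if all(is_edge(path[i], path[i + 1]) for i in range(len(path) - 1)):
                if len(path) > len(best):
                    best = path
    return best

print(longest_simple_path("a", "d"))  # ['a', 'b', 'c', 'd']
print(longest_simple_path("b", "c"))  # ['b', 'a', 'e', 'd', 'c']
```

The longest a-to-d path uses the edge (b,c), yet the longest b-to-c path goes the other way around the ring, confirming that longest paths violate the Principle of Optimality.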
III. Steps of Dynamic Programming
- Dynamic programming design involves 4 major steps:
- Develop a mathematical notation that can express any solution and subsolution for the problem at hand.
- Prove that the Principle of Optimality holds.
- Develop a recurrence relation that relates a solution to its subsolutions, using the math notation of step 1. Indicate what the initial values are for that recurrence relation, and which term signifies the final solution.
- Write an algorithm to compute the recurrence relation.
- Steps 1 and 2 need not be in that order. Do what makes sense in each problem.
- Step 3 is the heart of the design process. In high level algorithmic design situations, one can stop at step 3. In this course, however, we will carry out step 4 as well.
- Without the Principle of Optimality, it won't be possible to derive a sensible recurrence relation in step 3.
- When the Principle of Optimality holds, the 4 steps of DP are guaranteed to yield an optimal solution. No proof of optimality is needed.
IV. First Application: The Matrix Chain Problem
- Input: n matrices A1, A2, ..., An of dimensions P1 x P2, P2 x P3, ..., Pn x Pn+1, respectively.
- Goal: to compute the matrix product A1A2...An
- Problem: In what order should A1A2...An be multiplied so that the product takes the minimum number of computations?
- Matrix multiplication cost:
- Let A and B be two matrices of dimensions p x q and q x r.
- Let C = AB. C is of dimensions p x r.
- Element Cij = Ai1B1j + Ai2B2j + ... + AiqBqj
- Thus Cij takes q scalar multiplications and (q-1) scalar additions.
- Since scalar multiplication is more expensive than scalar addition, we count only the scalar multiplications.
- Thus Cij takes q multiplications.
- Therefore, AB takes p x r x q = pqr multiplications.
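The pqr count can be checked against a naive triple-loop product; here is a minimal sketch (multiply_count is an illustrative helper, not part of the notes):

```python
def multiply_count(A, B):
    """Naive matrix product that also counts scalar multiplications."""
    p, q, r = len(A), len(B), len(B[0])
    assert len(A[0]) == q, "inner dimensions must agree"
    C = [[0] * r for _ in range(p)]
    count = 0
    for i in range(p):
        for j in range(r):
            for k in range(q):
                C[i][j] += A[i][k] * B[k][j]
                count += 1  # one scalar multiplication per term
    return C, count

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]   # 3 x 2
C, count = multiply_count(A, B)
print(count)  # 12 = 2 * 3 * 2 = pqr
```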
- Example of the best way of multiplying 3 matrices:
- Take A1 of dimensions 3 x 5, A2 of dimensions 5 x 7, and A3 of dimensions 7 x 2.
- (A1A2)A3 takes 3*5*7 + 3*7*2=147
- A1(A2A3) takes 5*7*2 + 3*5*2=100
- Thus, A1(A2A3) is much cheaper to compute than (A1A2)A3, although both lead to the same final answer.
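The two costs above follow directly from the pqr rule; a small sketch (pair_cost is an illustrative name):

```python
def pair_cost(p, q, r):
    # Cost in scalar multiplications of multiplying a p x q by a q x r matrix.
    return p * q * r

# A1: 3 x 5, A2: 5 x 7, A3: 7 x 2
left_first = pair_cost(3, 5, 7) + pair_cost(3, 7, 2)   # (A1A2)A3
right_first = pair_cost(5, 7, 2) + pair_cost(3, 5, 2)  # A1(A2A3)
print(left_first, right_first)  # 147 100
```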
- Exercise: Formulate a greedy method for the Matrix Chain Problem, and prove by a counter example that it does not necessarily lead to an optimal solution.
- A Dynamic programming design:
- Notation: let Mij denote the cost of multiplying Ai...Aj, where the cost is measured in the number of scalar multiplications.
  Clearly, Mii = 0 for all i, and M1n is what we are looking for.
- Proof of the principle of optimality
- Every way of multiplying a sequence of matrices can be represented by a binary (infix) tree, where the leaves are the matrices and the internal nodes are intermediary products.
- Let T be the tree corresponding to the optimal way of multiplying Ai...Aj.
- T has a left subtree L and a right subtree R. L corresponds to multiplying B=Ai...Ak, and R to multiplying C=Ak+1 ... Aj, for some integer k (i <= k <= j-1).
- The cost corresponding to T is
  cost(T) = cost(L) + cost(R) + cost(BC)
- For the Principle of Optimality to hold, we need to show that L is the best tree for Ai...Ak, and R is the best tree for Ak+1...Aj. It suffices to show it for L.
- The proof is by contradiction. If L were not optimal, then there would be a better tree L' for Ai...Ak.
- Cost(L') < Cost(L).
- Then, take the tree T' whose left subtree is L' and whose right subtree is R.
  cost(T') = cost(L') + cost(R) + cost(BC) < cost(L) + cost(R) + cost(BC) = cost(T)
- Thus, cost(T') < cost(T), which contradicts the fact that T was the best tree for its matrices.
- Therefore, L must be optimal. Q.E.D.
- Derivation of the recurrence relation:
- Use the same notation T, L, and R for the optimal way of multiplying Ai...Aj. L is the left subtree corresponding to B = Ai...Ak (for some k), and R corresponds to C = Ak+1...Aj.
  Mij = cost(T) = cost(L) + cost(R) + cost(BC) = Mik + Mk+1,j + Pi*Pk+1*Pj+1
- Since we do not know the right value of k, and since Mij is supposed to be the minimum possible, the relation should be
  Mij = min{Mik + Mk+1,j + Pi*Pk+1*Pj+1 | i <= k <= j-1}
- Illustration:
- n=4. A1 is 3 x 5, A2 is 5 x 7, A3 is 7 x 3, and A4 is 3 x 4
- P1=3, P2=5, P3=7, P4=3, P5=4
- We will build a table to compute the Mijs bottom up. For each Mij, we record the k that gives Mij its minimum value:
  M11=0    M22=0    M33=0    M44=0
  M12=105  M23=105  M34=84
  M13=150 (k=1)    M24=165 (k=3)
  M14=186 (k=3)
- Optimal multiplication way: (A1(A2A3))A4.
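The table above can be reproduced by a bottom-up implementation of the recurrence; here is a sketch (the function name and the 0-indexed dimension array are illustrative conventions, with Ai of dimensions P[i-1] x P[i]):

```python
def matrix_chain(P):
    """Bottom-up matrix chain DP.
    P holds the dimensions P1..Pn+1, so matrix Ai is P[i-1] x P[i].
    Returns (M, K): M[i][j] is the minimum cost of multiplying Ai...Aj,
    and K[i][j] is the split point k that achieves it."""
    n = len(P) - 1
    M = [[0] * (n + 1) for _ in range(n + 1)]
    K = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):          # chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            M[i][j] = float("inf")
            for k in range(i, j):           # try every split point
                cost = M[i][k] + M[k + 1][j] + P[i - 1] * P[k] * P[j]
                if cost < M[i][j]:
                    M[i][j] = cost
                    K[i][j] = k
    return M, K

# Dimensions from the illustration: P1..P5 = 3, 5, 7, 3, 4
M, K = matrix_chain([3, 5, 7, 3, 4])
print(M[1][4], K[1][4])  # 186 3
```

The split table K then yields the optimal parenthesization (A1(A2A3))A4, exactly as read off the table above.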
V. Second Application: The All-Pairs Shortest Path Problem
- Input: A weighted graph, represented by its weight matrix W.
- Problem: find the distance between every pair of nodes.
- Dynamic programming Design:
- Notation: A(k)(i,j) = length of the shortest path from node i to node j where the label of every intermediary node is <= k.
A(0)(i,j) = W[i,j].
- Principle of Optimality: We already saw that any sub-path of a shortest path is a shortest path between its end nodes.
- recurrence relation:
- Divide the paths from i to j where every intermediary node is of label <=k into two groups:
- Those paths that do not go through node k
- Those paths that do go through node k
- The shortest path in the first group is the shortest path from i to j where the label of every intermediary node is <= k-1.
- Therefore, the length of the shortest path of group 1 is A(k-1)(i,j).
- Each path in group two consists of two portions: The first is from node i to node k, and the second is from node k to node j.
- The shortest path in group 2 does not go through node k more than once, for otherwise the cycle around k could be eliminated, leading to a shorter path in group 2.
- Therefore, the two portions of the shortest path in group 2 have their intermediary labels <= k-1.
- Each portion must be the shortest of its kind. That is, the portion from i to k, where every intermediary node has label <= k-1, must be the shortest such path from i to k; if not, we would get a shorter path in group 2. The same holds for the second portion (from k to j).
- Therefore, the length of the first portion of the shortest path in group 2 is A(k-1)(i,k)
- Therefore, the length of the 2nd portion of the shortest path in group 2 is A(k-1)(k,j)
- Hence, the length of the shortest path in group 2 is A(k-1)(i,k) + A(k-1)(k,j)
- Since the shortest path from i to j (with every intermediary label <= k) is the shorter of the shortest paths of the two groups, we get
- A(k)(i,j)=min(A(k-1)(i,j), A(k-1)(i,k) + A(k-1)(k,j)).
- The algorithm follows:
Procedure APSP(input: W[1:n,1:n]; A[1:n,1:n])
begin
  for i=1 to n do
    for j=1 to n do
      A(0)(i,j) := W[i,j];
    endfor
  endfor
  for k=1 to n do
    for i=1 to n do
      for j=1 to n do
        A(k)(i,j) := min(A(k-1)(i,j), A(k-1)(i,k) + A(k-1)(k,j));
      endfor
    endfor
  endfor
end
- Note that once A(k) has been computed, there is no need for A(k-1)
- Therefore, we don't need to keep the superscript
- By dropping it, the algorithm remains correct, and we save on space
- the new implementation follows:
Procedure APSP(input: W[1:n,1:n]; A[1:n,1:n])
begin
  for i=1 to n do
    for j=1 to n do
      A(i,j) := W[i,j];
    endfor
  endfor
  for k=1 to n do
    for i=1 to n do
      for j=1 to n do
        A(i,j) := min(A(i,j), A(i,k) + A(k,j));
      endfor
    endfor
  endfor
end
- Time Complexity Analysis:
- The first double for-loop takes O(n^2) time.
- The triple for-loop that follows has a constant-time body, and thus takes O(n^3) time.
- Thus, the whole algorithm takes O(n^3) time.
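The space-saving version translates almost line for line into a working program; here is a sketch (the 4-node example graph is made up for illustration):

```python
INF = float("inf")

def apsp(W):
    """Floyd-Warshall all-pairs shortest paths.
    W[i][j] is the edge weight (INF if no edge, 0 on the diagonal).
    Works on a copy of W in place, with no superscript dimension."""
    n = len(W)
    A = [row[:] for row in W]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if A[i][k] + A[k][j] < A[i][j]:
                    A[i][j] = A[i][k] + A[k][j]
    return A

# Small illustrative weighted digraph on 4 nodes.
W = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
D = apsp(W)
print(D[0][3])  # 6: the path 0 -> 1 -> 2 -> 3 costs 3 + 2 + 1
```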
VI. Third Application: Optimal Binary Search Trees
- Input: a1 < a2 < ... < an
  p1, p2, ..., pn
  q0, q1, q2, ..., qn
  where
  pi = Prob[ai is accessed], i = 1, 2, ..., n
  qi = Prob[accessing an element X with ai < X < ai+1]
  q0 = Prob[accessing an element X with X < a1]
  qn = Prob[accessing an element X with an < X]
- Problem: Find for the array a[1..n] a binary search tree of minimum cost.
- Cost Measure: average search time. When computing the average, take into account both successful and unsuccessful searches.
- Quantitatively, the cost of a BST T is C(T), where
  C(T) = SUM[i=1..n] pi*cost(ai) + SUM[i=0..n] qi*cost(X, ai < X < ai+1)
  (to simplify the notation, assume a0 = -infinity and an+1 = +infinity)
  cost(ai) = level(ai) + 1
  cost(X, ai < X < ai+1) = level(Ei)
  where Ei = the external node where X would be if it were in the tree
- Notation:
  Tij = OBST(ai+1, ..., aj)
  Cij = cost(Tij)
  Cij = SUM[s=i+1..j] ps*(levelTij(as) + 1) + SUM[s=i..j] qs*levelTij(Es)
  T0n is the final tree being sought
  T00 is empty
  Ti,i+1 is a single-node tree that has element ai+1
- Principle of Optimality:
- Let T0n be an OBST for the elements a1 < a2 < ... < an, and let L and R be its left subtree and right subtree. Suppose that the root of T0n is ak, for some k.
- Clearly, the elements in the tree L are a1, a2, ..., ak-1, and the elements in the tree R are ak+1, ak+2, ..., an.
- We need to show that L is an OBST for its elements (and also that R is an OBST for its elements).
- It will be shown below that
  C(T0n) = C(L) + C(R) + W, where W = p1+p2+...+pn + q0+q1+q2+...+qn
- W is independent of the structure of L and R.
- If L is not an optimal BST for its elements, then we can find a better tree L' for the same elements, where C(L') < C(L). Let T' be the tree with root ak, left subtree L', and right subtree R.
- We have
  C(T') = C(L') + C(R) + W < C(L) + C(R) + W = C(T0n)
- That is, T' is better than T0n, contradicting the fact that T0n is an optimal BST. Therefore, L must be an optimal BST for its elements. The same proof applies to R.
- Thus, a subtree of an OBST must be an OBST. This proves the principle of optimality.
- Recurrence Relation for Cij:
- Tij is rooted at ak, for some k (i+1 <= k <= j); its left subtree is Ti,k-1 and its right subtree is Tkj.
- Cij = SUM[s=i+1..j] ps*(levelTij(as) + 1) + SUM[s=i..j] qs*levelTij(Es)
- Ci,k-1 = SUM[s=i+1..k-1] ps*(levelTi,k-1(as) + 1) + SUM[s=i..k-1] qs*levelTi,k-1(Es)
- Ckj = SUM[s=k+1..j] ps*(levelTkj(as) + 1) + SUM[s=k..j] qs*levelTkj(Es)
- Split the first sum of Cij around the root ak:
  SUM[s=i+1..j] ps*(levelTij(as) + 1) =
    SUM[s=i+1..k-1] ps*(levelTij(as) + 1) + pk*(levelTij(ak) + 1) + SUM[s=k+1..j] ps*(levelTij(as) + 1)
  Since ak is the root of Tij, levelTij(ak) = 0, so the middle term is simply pk.
- Noting that levelTij(as) = levelTi,k-1(as) + 1 for s <= k-1, and levelTij(as) = levelTkj(as) + 1 for s >= k+1, we conclude:
  SUM[s=i+1..j] ps*(levelTij(as) + 1)
    = SUM[s=i+1..k-1] ps*(levelTi,k-1(as) + 1 + 1) + pk + SUM[s=k+1..j] ps*(levelTkj(as) + 1 + 1)
    = SUM[s=i+1..k-1] ps*(levelTi,k-1(as) + 1) + SUM[s=k+1..j] ps*(levelTkj(as) + 1) + SUM[s=i+1..j] ps
  (the extra 1's, together with pk, add up to SUM[s=i+1..j] ps)
- Similar arithmetic shows that
  SUM[s=i..j] qs*levelTij(Es) = SUM[s=i..k-1] qs*levelTi,k-1(Es) + SUM[s=k..j] qs*levelTkj(Es) + SUM[s=i..j] qs
- Therefore,
  Cij = SUM[s=i+1..k-1] ps*(levelTi,k-1(as) + 1) + SUM[s=i..k-1] qs*levelTi,k-1(Es)
      + SUM[s=k+1..j] ps*(levelTkj(as) + 1) + SUM[s=k..j] qs*levelTkj(Es)
      + SUM[s=i+1..j] ps + SUM[s=i..j] qs
      = Ci,k-1 + Ckj + Wij, where Wij = SUM[s=i+1..j] ps + SUM[s=i..j] qs
- Since we don't know which k should be at the root, and since we want to minimize Cij, we should take the k that gives the minimum. Therefore,
- Cij = min{Ci,k-1 + Ckj + Wij | i+1 <= k <= j}
  Wij = SUM[s=i+1..j] ps + SUM[s=i..j] qs
  Cii = 0, Wii = qi
  Note that Wij = Wi,j-1 + pj + qj, which is how the Weight procedure below computes the weights.
Denote by rij the index k that gives the minimum to Cij.
This procedure computes the weights Wij:
Procedure Weight(Input: p[1:n], q[0:n]; Output: W[0:n,0:n])
begin
  for i=0 to n do
    W[i,i] := q[i];
  endfor
  for l=1 to n do
    for i=0 to n-l do
      j := i+l;
      W[i,j] := W[i,j-1] + p[j] + q[j];
    endfor
  endfor
end
This procedure computes the Cijs and the rijs:
Procedure OBST(Input:p[1:n], q[0:n], W[0:n,0:n];
Output: C[0:n,0:n], r[0:n,0:n])
begin
for i=0 to n do
C[i,i] := 0;
endfor
for l=1 to n do
for i=0 to n-l do
j=i+l;
C[i,j] := infinity;
m := i+1; --m keeps the index of the min
for k=i+1 to j do
if C[i,j] >= C[i,k-1] + C[k,j] then
C[i,j] := C[i,k-1] + C[k,j];
m := k;
endif
endfor
C[i,j] := C[i,j] + W[i,j];
r[i,j] := m;
endfor
endfor
end
This procedure creates the tree Tij
Procedure create-tree(Input: r[0:n,0:n], a[1:n], i, j;
Output: T)
begin
if (i==j) then T := null; return; endif
T := new(node); -- the root of Tij
k := r[i,j];
T --> data := a[k];
T --> left := null; T --> right := null;
if (j==i+1) then return; endif -- single-node tree
create-tree(r[0:n,0:n], a[1:n], i, k-1; T --> left);
create-tree(r[0:n,0:n], a[1:n], k, j; T --> right);
end
This procedure is the master program that creates the whole tree T0n:
Procedure Final-tree(Input: a[1:n], p[1:n], q[0:n];
Output: T)
begin
Weight(p[1:n], q[0:n], W[0:n,0:n]);
OBST(p[1:n], q[0:n], W[0:n,0:n],
C[0:n,0:n], r[0:n,0:n]);
create-tree(r[0:n,0:n], a[1:n], 0, n, T);
end
- Example:
  a1 < a2 < a3 < a4
  p1=1/10, p2=2/10, p3=3/10, p4=1/10
  q0=0, q1=1/10, q2=1/20, q3=1/20, q4=1/10
  W00=0       W11=1/10    W22=1/20   W33=1/20   W44=1/10
  W01=2/10    W12=3.5/10  W23=4/10   W34=2.5/10
  W02=4.5/10  W13=7/10    W24=6/10
  W03=8/10    W14=9/10
  W04=10/10
  C00=0  C11=0  C22=0  C33=0  C44=0
  C01=2/10 (r01=1)    C12=3.5/10 (r12=2)   C23=4/10 (r23=3)   C34=2.5/10 (r34=4)
  C02=6.5/10 (r02=2)  C13=10.5/10 (r13=3)  C24=8.5/10 (r24=3)
  C03=14/10 (r03=2)   C14=15/10 (r14=3)
  C04=19/10 (r04=3)
- The tree T04 has as root a3 because r04=3.
  The left subtree is then T02 and the right subtree is T34.
  T34 is a single-node tree having a4 because r34=4.
  T02 has as root a2 because r02=2.
  The left subtree of T02 is T01 and its right subtree is T22 (which is empty).
  T01 is a single-node tree having a1 because r01=1.
  This completes the tree (to be drawn in class).
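The tables and the final tree above can be reproduced with a short program that follows the Weight and OBST procedures; here is a sketch using exact fractions to avoid floating-point rounding (the function name and table layout are illustrative conventions):

```python
from fractions import Fraction as F

def obst(p, q):
    """Optimal BST tables, following the Weight/OBST procedures.
    p[1..n] and q[0..n] are the probabilities (p[0] is a placeholder).
    Returns (W, C, R): weights, costs, and root indices r[i][j]."""
    n = len(p) - 1
    W = [[F(0)] * (n + 1) for _ in range(n + 1)]
    C = [[F(0)] * (n + 1) for _ in range(n + 1)]
    R = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        W[i][i] = q[i]                       # Wii = qi, Cii = 0
    for l in range(1, n + 1):                # subtree size
        for i in range(n - l + 1):
            j = i + l
            W[i][j] = W[i][j - 1] + p[j] + q[j]
            # Try every root k and keep the cheapest combination.
            C[i][j] = min(C[i][k - 1] + C[k][j]
                          for k in range(i + 1, j + 1)) + W[i][j]
            R[i][j] = min(range(i + 1, j + 1),
                          key=lambda k: C[i][k - 1] + C[k][j])
    return W, C, R

# Probabilities from the example above.
p = [F(0), F(1, 10), F(2, 10), F(3, 10), F(1, 10)]
q = [F(0), F(1, 10), F(1, 20), F(1, 20), F(1, 10)]
W, C, R = obst(p, q)
print(C[0][4], R[0][4])  # 19/10 3
```

The result matches the table: C04 = 19/10 and r04 = 3, so a3 is the root of T04, and the remaining roots in R reconstruct the same tree as the hand computation.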