Dynamic Programming
I. Perspective
II. Principle of Optimality
III. Steps of Dynamic Programming
IV. First Application: The Matrix Chain Problem
V. Second Application: The All-Pairs Shortest Path Problem
VI. Third Application: Optimal Binary Search Trees
I. Perspective
- Dynamic programming is an optimization technique.
- Greedy vs. Dynamic Programming :
- Both techniques are optimization techniques, and both build solutions from a collection of choices of individual elements.
- The greedy method computes its solution by making its choices in a serial forward fashion, never looking back or revising previous choices.
- Dynamic programming computes its solution bottom up by synthesizing them from smaller subsolutions, and by trying many possibilities and choices before it arrives at the optimal set of choices.
- There is no a priori litmus test by which one can tell if the Greedy method will lead to an optimal solution.
- By contrast, there is a litmus test for Dynamic Programming, called the Principle of Optimality.
- Divide and Conquer vs. Dynamic Programming:
- Both techniques split their input into parts, find subsolutions to the parts, and synthesize larger solutions from smaller ones.
- Divide and Conquer splits its input at prespecified deterministic points (e.g., always in the middle)
- Dynamic Programming splits its input at every possible split point rather than at a pre-specified point. After trying all split points, it determines which split point is optimal.
II. Principle of Optimality
- Definition: A problem is said to satisfy the Principle of Optimality if the subsolutions of an optimal solution of the problem are themselves optimal solutions for their subproblems.
- Examples:
- The shortest path problem satisfies the Principle of Optimality.
- This is because if a,x1,x2,...,xn,b is a shortest path from node a to node b in a graph, then the portion of xi to xj on that path is a shortest path from xi to xj.
- The longest path problem, on the other hand, does not satisfy the Principle of Optimality. Take for example the undirected graph G with nodes a, b, c, d, and e, and edges (a,b), (b,c), (c,d), (d,e), and (e,a). That is, G is a ring. The longest (noncyclic) path from a to d is a,b,c,d. The sub-path from b to c on that path is simply the edge (b,c). But that is not the longest path from b to c: rather, b,a,e,d,c is the longest path. Thus, a subpath of a longest path is not necessarily a longest path.
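The ring counterexample is small enough to check by brute force; here is a sketch in Python (the node names match the example above, but the helper function and its name are only for illustration):

```python
from itertools import permutations

# Undirected 5-cycle a-b-c-d-e-a from the counterexample above.
edges = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")}

def is_edge(u, v):
    return (u, v) in edges or (v, u) in edges

def longest_simple_path(src, dst):
    """Brute-force longest noncyclic path: try every ordering of
    intermediate nodes and keep the longest one that is a real path."""
    nodes = {"a", "b", "c", "d", "e"}
    best = []
    for r in range(len(nodes) - 1):
        for mid in permutations(nodes - {src, dst}, r):
            path = [src, *mid, dst]
            if all(is_edge(path[i], path[i + 1]) for i in range(len(path) - 1)):
                if len(path) > len(best):
                    best = path
    return best

print(longest_simple_path("a", "d"))  # ['a', 'b', 'c', 'd']
print(longest_simple_path("b", "c"))  # ['b', 'a', 'e', 'd', 'c']
```

The longest a-to-d path uses the edge (b,c), yet the longest b-to-c path goes the other way around the ring, confirming that longest paths violate the Principle of Optimality.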
III. Steps of Dynamic Programming
- Dynamic programming design involves 4 major steps:
- Develop a mathematical notation that can express any solution and subsolution for the problem at hand.
- Prove that the Principle of Optimality holds.
- Develop a recurrence relation that relates a solution to its subsolutions, using the math notation of step 1. Indicate what the initial values are for that recurrence relation, and which term signifies the final solution.
- Write an algorithm to compute the recurrence relation.
- Steps 1 and 2 need not be in that order. Do what makes sense in each problem.
- Step 3 is the heart of the design process. In high level algorithmic design situations, one can stop at step 3. In this course, however, we will carry out step 4 as well.
- Without the Principle of Optimality, it won't be possible to derive a sensible recurrence relation in step 3.
- When the Principle of Optimality holds, the 4 steps of DP are guaranteed to yield an optimal solution. No proof of optimality is needed.
IV. First Application: The Matrix Chain Problem
- Input: n matrices A1, A2, ..., An of dimensions P1 x P2, P2 x P3, ..., Pn x Pn+1, respectively.
- Goal: to compute the matrix product A1A2...An
- Problem: In what order should A1A2...An be multiplied so that the product takes the minimum number of computations?
- Matrix multiplication cost:
- Let A and B be two matrices of dimensions p x q and q x r.
- Let C = AB. C is of dimensions p x r.
- Element Cij = Ai1B1j + Ai2B2j + ... + AiqBqj
- Thus Cij takes q scalar multiplications and (q-1) scalar additions.
- Since scalar multiplication is more expensive than scalar addition, we count only the scalar multiplications.
- Thus Cij takes q multiplications.
- Therefore, AB takes p x r x q = pqr multiplications.
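The pqr count can be checked against a naive triple-loop product; here is a minimal sketch (multiply_count is an illustrative helper, not part of the notes):

```python
def multiply_count(A, B):
    """Naive matrix product that also counts scalar multiplications."""
    p, q, r = len(A), len(B), len(B[0])
    assert len(A[0]) == q, "inner dimensions must agree"
    C = [[0] * r for _ in range(p)]
    count = 0
    for i in range(p):
        for j in range(r):
            for k in range(q):
                C[i][j] += A[i][k] * B[k][j]
                count += 1  # one scalar multiplication per term
    return C, count

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]   # 3 x 2
C, count = multiply_count(A, B)
print(count)  # 12 = 2 * 3 * 2 = pqr
```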
- Example of the best way of multiplying 3 matrices:
- Take A1 of dimensions 3 x 5, A2 of dimensions 5 x 7, and A3 of dimensions 7 x 2.
- (A1A2)A3 takes 3*5*7 + 3*7*2=147
- A1(A2A3) takes 5*7*2 + 3*5*2=100
- Thus, A1(A2A3) is much cheaper to compute than (A1A2)A3, although both lead to the same final answer.
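The two costs above follow directly from the pqr rule; a small sketch (pair_cost is an illustrative name):

```python
def pair_cost(p, q, r):
    # Cost in scalar multiplications of multiplying a p x q by a q x r matrix.
    return p * q * r

# A1: 3 x 5, A2: 5 x 7, A3: 7 x 2
left_first = pair_cost(3, 5, 7) + pair_cost(3, 7, 2)   # (A1A2)A3
right_first = pair_cost(5, 7, 2) + pair_cost(3, 5, 2)  # A1(A2A3)
print(left_first, right_first)  # 147 100
```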
- Exercise: Formulate a greedy method for the Matrix Chain Problem, and prove by a counter example that it does not necessarily lead to an optimal solution.
- A Dynamic programming design:
- Notation: let Mij denote the cost of multiplying Ai...Aj, where the cost is measured in the number of scalar multiplications.
  Clearly, Mii = 0 for all i, and M1n is what we are looking for.
- Proof of the principle of optimality
- Every way of multiplying a sequence of matrices can be represented by a binary (infix) tree, where the leaves are the matrices and the internal nodes are intermediary products.
- Let T be the tree corresponding to the optimal way of multiplying Ai...Aj.
- T has a left subtree L and a right subtree R. L corresponds to multiplying B=Ai...Ak, and R to multiplying C=Ak+1 ... Aj, for some integer k (i <= k <= j-1).
- The cost corresponding to T is
  cost(T) = cost(L) + cost(R) + cost(BC)
- For the Principle of Optimality to hold, we need to show that L is the best tree for Ai...Ak, and R is the best tree for Ak+1...Aj. It suffices to show it for L.
- The proof is by contradiction. If L were not optimal, then there would be a better tree L' for Ai...Ak.
- Cost(L') < Cost(L).
- Then, take the tree T' whose left subtree is L' and whose right subtree is R.
  cost(T') = cost(L') + cost(R) + cost(BC) < cost(L) + cost(R) + cost(BC) = cost(T)
- Thus, cost(T') < cost(T), which contradicts the fact that T was the best tree for its matrices.
- Therefore, L must be optimal. Q.E.D.
- Derivation of the recurrence relation:
- Use the same notation T, L, and R for the optimal way of multiplying Ai...Aj. L is the left subtree corresponding to B = Ai...Ak (for some k), and R corresponds to C = Ak+1...Aj.
  Mij = cost(T) = cost(L) + cost(R) + cost(BC) = Mik + Mk+1,j + Pi*Pk+1*Pj+1
- Since we do not know the right value of k, and since Mij is supposed to be the minimum possible, the relation should be
  Mij = min{Mik + Mk+1,j + Pi*Pk+1*Pj+1 | i <= k <= j-1}
- Illustration:
- n=4. A1 is 3 x 5, A2 is 5 x 7, A3 is 7 x 3, and A4 is 3 x 4
- P1=3, P2=5, P3=7, P4=3, P5=4
- We will build a table to compute the Mijs bottom up. For each Mij, we record the k that gives Mij its minimum value:
  M11=0    M22=0    M33=0    M44=0
  M12=105  M23=105  M34=84
  M13=150 (k=1)    M24=165 (k=3)
  M14=186 (k=3)
- Optimal multiplication way: (A1(A2A3))A4.
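The table above can be reproduced by a bottom-up implementation of the recurrence; here is a sketch (the function name and the 0-indexed dimension array are illustrative conventions, with Ai of dimensions P[i-1] x P[i]):

```python
def matrix_chain(P):
    """Bottom-up matrix chain DP.
    P holds the dimensions P1..Pn+1, so matrix Ai is P[i-1] x P[i].
    Returns (M, K): M[i][j] is the minimum cost of multiplying Ai...Aj,
    and K[i][j] is the split point k that achieves it."""
    n = len(P) - 1
    M = [[0] * (n + 1) for _ in range(n + 1)]
    K = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):          # chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            M[i][j] = float("inf")
            for k in range(i, j):           # try every split point
                cost = M[i][k] + M[k + 1][j] + P[i - 1] * P[k] * P[j]
                if cost < M[i][j]:
                    M[i][j] = cost
                    K[i][j] = k
    return M, K

# Dimensions from the illustration: P1..P5 = 3, 5, 7, 3, 4
M, K = matrix_chain([3, 5, 7, 3, 4])
print(M[1][4], K[1][4])  # 186 3
```

The split table K then yields the optimal parenthesization (A1(A2A3))A4, exactly as read off the table above.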
V. Second Application: The All-Pairs Shortest Path Problem
- Input: A weighted graph, represented by its weight matrix W.
- Problem: find the distance between every pair of nodes.
- Dynamic programming Design:
- Notation: A(k)(i,j) = length of the shortest path from node i to node j where the label of every intermediary node is <= k.
A(0)(i,j) = W[i,j].
- Principle of Optimality: We already saw that any sub-path of a shortest path is a shortest path between its end nodes.
- recurrence relation:
- Divide the paths from i to j where every intermediary node is of label <=k into two groups:
- Those paths that do not go through node k
- Those paths that do go through node k
- The shortest path in the first group is the shortest path from i to j where the label of every intermediary node is <= k-1.
- Therefore, the length of the shortest path of group 1 is A(k-1)(i,j).
- Each path in group two consists of two portions: The first is from node i to node k, and the second is from node k to node j.
- The shortest path in group 2 does not go through node k more than once, for otherwise the cycle around k could be eliminated, leading to a shorter path in group 2.
- Therefore, the two portions of the shortest path in group 2 have their intermediary labels <= k-1.
- Each portion must be the shortest of its kind. That is, the portion from i to k, where every intermediary node has label <= k-1, must be the shortest such path from i to k; if not, we would get a shorter path in group 2. The same holds for the second portion (from k to j).
- Therefore, the length of the first portion of the shortest path in group 2 is A(k-1)(i,k)
- Therefore, the length of the 2nd portion of the shortest path in group 2 is A(k-1)(k,j)
- Hence, the length of the shortest path in group 2 is A(k-1)(i,k) + A(k-1)(k,j)
- Since the shortest path from i to j (with every intermediary label <= k) is the shorter of the shortest paths of the two groups, we get
- A(k)(i,j)=min(A(k-1)(i,j), A(k-1)(i,k) + A(k-1)(k,j)).
- The algorithm follows:
Procedure APSP(input: W[1:n,1:n]; A[1:n,1:n])
begin
  for i=1 to n do
    for j=1 to n do
      A(0)(i,j) := W[i,j];
    endfor
  endfor
  for k=1 to n do
    for i=1 to n do
      for j=1 to n do
        A(k)(i,j) := min(A(k-1)(i,j), A(k-1)(i,k) + A(k-1)(k,j));
      endfor
    endfor
  endfor
end
- Note that once A(k) has been computed, there is no need for A(k-1)
- Therefore, we don't need to keep the superscript
- By dropping it, the algorithm remains correct, and we save on space
- the new implementation follows:
Procedure APSP(input: W[1:n,1:n]; A[1:n,1:n])
begin
  for i=1 to n do
    for j=1 to n do
      A(i,j) := W[i,j];
    endfor
  endfor
  for k=1 to n do
    for i=1 to n do
      for j=1 to n do
        A(i,j) := min(A(i,j), A(i,k) + A(k,j));
      endfor
    endfor
  endfor
end
- Time Complexity Analysis:
- The first double for-loop takes O(n^2) time.
- The triple for-loop that follows has a constant-time body, and thus takes O(n^3) time.
- Thus, the whole algorithm takes O(n^3) time.
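The space-saving version translates almost line for line into a working program; here is a sketch (the 4-node example graph is made up for illustration):

```python
INF = float("inf")

def apsp(W):
    """Floyd-Warshall all-pairs shortest paths.
    W[i][j] is the edge weight (INF if no edge, 0 on the diagonal).
    Works on a copy of W in place, with no superscript dimension."""
    n = len(W)
    A = [row[:] for row in W]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if A[i][k] + A[k][j] < A[i][j]:
                    A[i][j] = A[i][k] + A[k][j]
    return A

# Small illustrative weighted digraph on 4 nodes.
W = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
D = apsp(W)
print(D[0][3])  # 6: the path 0 -> 1 -> 2 -> 3 costs 3 + 2 + 1
```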
VI. Third Application: Optimal Binary Search Trees
- Input: a1 < a2 < ... < an
  p1, p2, ..., pn
  q0, q1, q2, ..., qn
  where
  pi = Prob[ai is accessed], i = 1, 2, ..., n
  qi = Prob[accessing an element X with ai < X < ai+1]
  q0 = Prob[accessing an element X with X < a1]
  qn = Prob[accessing an element X with an < X]
- Problem: Find for the array a[1..n] a binary search tree of minimum cost.
- Cost Measure: average search time. When computing the average, take into account both successful and unsuccessful searches.
- Quantitatively, the cost of a BST T is C(T), where
  C(T) = SUM[i=1..n] pi*cost(ai) + SUM[i=0..n] qi*cost(X, ai < X < ai+1)
  (to simplify the notation, assume a0 = -infinity and an+1 = +infinity)
  cost(ai) = level(ai) + 1
  cost(X, ai < X < ai+1) = level(Ei)
  where Ei = the external node where X would be if it were in the tree
- Notation:
  Tij = OBST(ai+1, ..., aj)
  Cij = cost(Tij)
  Cij = SUM[s=i+1..j] ps*(levelTij(as) + 1) + SUM[s=i..j] qs*levelTij(Es)
  T0n is the final tree being sought
  T00 is empty
  Ti,i+1 is a single-node tree that has element ai+1
- Principle of Optimality:
- Let T0n be an OBST for the elements a1 < a2 < ... < an, and let L and R be its left subtree and right subtree. Suppose that the root of T0n is ak, for some k.
- Clearly, the elements in the tree L are a1, a2, ..., ak-1, and the elements in the tree R are ak+1, ak+2, ..., an.
- We need to show that L is an OBST for its elements (and also that R is an OBST for its elements).
- It will be shown below that
  C(T0n) = C(L) + C(R) + W, where W = p1+p2+...+pn + q0+q1+q2+...+qn
- W is independent of the structure of L and R.
- If L is not an optimal BST for its elements, then we can find a better tree L' for the same elements, where C(L') < C(L). Let T' be the tree with root ak, left subtree L', and right subtree R.
- We have
  C(T') = C(L') + C(R) + W < C(L) + C(R) + W = C(T0n)
- That is, T' is better than T0n, contradicting the fact that T0n is an optimal BST. Therefore, L must be an optimal BST for its elements. The same proof applies to R.
- Thus, a subtree of an OBST must be an OBST. This proves the principle of optimality.
- Recurrence Relation for Cij:
- Tij is rooted at ak, for some k (i+1 <= k <= j); its left subtree is Ti,k-1 and its right subtree is Tkj.
- Cij = SUM[s=i+1..j] ps*(levelTij(as) + 1) + SUM[s=i..j] qs*levelTij(Es)
- Ci,k-1 = SUM[s=i+1..k-1] ps*(levelTi,k-1(as) + 1) + SUM[s=i..k-1] qs*levelTi,k-1(Es)
- Ckj = SUM[s=k+1..j] ps*(levelTkj(as) + 1) + SUM[s=k..j] qs*levelTkj(Es)
- Split the first sum of Cij around the root ak:
  SUM[s=i+1..j] ps*(levelTij(as) + 1) =
    SUM[s=i+1..k-1] ps*(levelTij(as) + 1) + pk*(levelTij(ak) + 1) + SUM[s=k+1..j] ps*(levelTij(as) + 1)
  Since ak is the root of Tij, levelTij(ak) = 0, so the middle term is simply pk.
- Noting that levelTij(as) = levelTi,k-1(as) + 1 for s <= k-1, and levelTij(as) = levelTkj(as) + 1 for s >= k+1, we conclude:
  SUM[s=i+1..j] ps*(levelTij(as) + 1)
    = SUM[s=i+1..k-1] ps*(levelTi,k-1(as) + 1 + 1) + pk + SUM[s=k+1..j] ps*(levelTkj(as) + 1 + 1)
    = SUM[s=i+1..k-1] ps*(levelTi,k-1(as) + 1) + SUM[s=k+1..j] ps*(levelTkj(as) + 1) + SUM[s=i+1..j] ps
  (the extra 1's, together with pk, add up to SUM[s=i+1..j] ps)
- Similar arithmetic shows that
  SUM[s=i..j] qs*levelTij(Es) = SUM[s=i..k-1] qs*levelTi,k-1(Es) + SUM[s=k..j] qs*levelTkj(Es) + SUM[s=i..j] qs
- Therefore,
  Cij = SUM[s=i+1..k-1] ps*(levelTi,k-1(as) + 1) + SUM[s=i..k-1] qs*levelTi,k-1(Es)
      + SUM[s=k+1..j] ps*(levelTkj(as) + 1) + SUM[s=k..j] qs*levelTkj(Es)
      + SUM[s=i+1..j] ps + SUM[s=i..j] qs
      = Ci,k-1 + Ckj + Wij, where Wij = SUM[s=i+1..j] ps + SUM[s=i..j] qs
- Since we don't know which k should be at the root, and since we want to minimize Cij, we should take the k that gives the minimum. Therefore,
- Cij = min{Ci,k-1 + Ckj + Wij | i+1 <= k <= j}
  Wij = SUM[s=i+1..j] ps + SUM[s=i..j] qs
  Cii = 0, Wii = qi
  Note that Wij = Wi,j-1 + pj + qj, which is how the Weight procedure below computes the weights.
Denote by rij the index k that gives the minimum to Cij.
This procedure computes the weights Wij:
Procedure Weight(Input: p[1:n], q[0:n]; Output: W[0:n,0:n])
begin
  for i=0 to n do
    W[i,i] := q[i];
  endfor
  for l=1 to n do
    for i=0 to n-l do
      j := i+l;
      W[i,j] := W[i,j-1] + p[j] + q[j];
    endfor
  endfor
end
This procedure computes the Cijs and the rijs:
Procedure OBST(Input:p[1:n], q[0:n], W[0:n,0:n];
Output: C[0:n,0:n], r[0:n,0:n])
begin
for i=0 to n do
C[i,i] := 0;
endfor
for l=1 to n do
for i=0 to n-l do
j=i+l;
C[i,j] := infinity;
m := i+1; --m keeps the index of the min
for k=i+1 to j do
if C[i,j] >= C[i,k-1] + C[k,j] then
C[i,j] := C[i,k-1] + C[k,j];
m := k;
endif
endfor
C[i,j] := C[i,j] + W[i,j];
r[i,j] := m;
endfor
endfor
end
This procedure creates the tree Tij
Procedure create-tree(Input: r[0:n,0:n], a[1:n], i, j;
Output: T)
begin
if (i==j) then T := null; return; endif
T := new(node); -- the root of Tij
k := r[i,j];
T --> data := a[k];
T --> left := null; T --> right := null;
if (j==i+1) then return; endif -- single-node tree
create-tree(r[0:n,0:n], a[1:n], i, k-1; T --> left);
create-tree(r[0:n,0:n], a[1:n], k, j; T --> right);
end
This procedure is the master program that creates the whole tree T0n:
Procedure Final-tree(Input: a[1:n], p[1:n], q[0:n];
Output: T)
begin
Weight(p[1:n], q[0:n], W[0:n,0:n]);
OBST(p[1:n], q[0:n], W[0:n,0:n],
C[0:n,0:n], r[0:n,0:n]);
create-tree(r[0:n,0:n], a[1:n], 0, n, T);
end
- Example:
  a1 < a2 < a3 < a4
  p1=1/10, p2=2/10, p3=3/10, p4=1/10
  q0=0, q1=1/10, q2=1/20, q3=1/20, q4=1/10
  W00=0       W11=1/10    W22=1/20   W33=1/20   W44=1/10
  W01=2/10    W12=3.5/10  W23=4/10   W34=2.5/10
  W02=4.5/10  W13=7/10    W24=6/10
  W03=8/10    W14=9/10
  W04=10/10
  C00=0  C11=0  C22=0  C33=0  C44=0
  C01=2/10 (r01=1)    C12=3.5/10 (r12=2)   C23=4/10 (r23=3)   C34=2.5/10 (r34=4)
  C02=6.5/10 (r02=2)  C13=10.5/10 (r13=3)  C24=8.5/10 (r24=3)
  C03=14/10 (r03=2)   C14=15/10 (r14=3)
  C04=19/10 (r04=3)
- The tree T04 has as root a3 because r04=3.
  The left subtree is then T02 and the right subtree is T34.
  T34 is a single-node tree having a4 because r34=4.
  T02 has as root a2 because r02=2.
  The left subtree of T02 is T01 and its right subtree is T22 (which is empty).
  T01 is a single-node tree having a1 because r01=1.
  This completes the tree (to be drawn in class).
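The tables and the final tree above can be reproduced with a short program that follows the Weight and OBST procedures; here is a sketch using exact fractions to avoid floating-point rounding (the function name and table layout are illustrative conventions):

```python
from fractions import Fraction as F

def obst(p, q):
    """Optimal BST tables, following the Weight/OBST procedures.
    p[1..n] and q[0..n] are the probabilities (p[0] is a placeholder).
    Returns (W, C, R): weights, costs, and root indices r[i][j]."""
    n = len(p) - 1
    W = [[F(0)] * (n + 1) for _ in range(n + 1)]
    C = [[F(0)] * (n + 1) for _ in range(n + 1)]
    R = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        W[i][i] = q[i]                       # Wii = qi, Cii = 0
    for l in range(1, n + 1):                # subtree size
        for i in range(n - l + 1):
            j = i + l
            W[i][j] = W[i][j - 1] + p[j] + q[j]
            # Try every root k and keep the cheapest combination.
            C[i][j] = min(C[i][k - 1] + C[k][j]
                          for k in range(i + 1, j + 1)) + W[i][j]
            R[i][j] = min(range(i + 1, j + 1),
                          key=lambda k: C[i][k - 1] + C[k][j])
    return W, C, R

# Probabilities from the example above.
p = [F(0), F(1, 10), F(2, 10), F(3, 10), F(1, 10)]
q = [F(0), F(1, 10), F(1, 20), F(1, 20), F(1, 10)]
W, C, R = obst(p, q)
print(C[0][4], R[0][4])  # 19/10 3
```

The result matches the table: C04 = 19/10 and r04 = 3, so a3 is the root of T04, and the remaining roots in R reconstruct the same tree as the hand computation.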