ebook img

Sequence Alignment and Dynamic Programming PDF

124 Pages·2005·1.35 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Sequence Alignment and Dynamic Programming

6.096 – Algorithms for Computational Biology Sequence Alignment and Dynamic Programming Lecture 1 - Introduction Lecture 2 - Hashing and BLAST Lecture 3 - Combinatorial Motif Finding Lecture 4 - Statistical Motif Finding Challenges in Computational Biology 4 Genome Assembly 5 Regulatory motif discovery 1 Gene Finding DNA 2 Sequence alignment 6 Comparative Genomics TCATGCTAT TCGTGATAA 3 Database lookup 7 Evolutionary Theory TGAGGATAT TTATCATAT TTATGATTT 8 Gene expression analysis RNA transcript 9 Cluster discovery 10 Gibbs sampling 11 Protein network analysis 12 Regulatory network inference 13 Emerging network properties Comparing two DNA sequences • Given two possibly related strings S1 and S2 – What is the longest common subsequence? S1 A C G T C A T C A S2 T A G T G T C A SS11 AA CC GG TT CC AA TT CC AA Edit distance: •Number of changes SS22 TT AA GG TT GG TT CC AA needed for S1ÆS2 A G T T C A LCSS How can we compute best alignment S1 A C G T C A T C A S2 T A G T G T C A • Need scoring function: – Score(alignment) = Total cost of editing S1 into S2 – Cost of mutation – Cost of insertion / deletion – Reward of match • Need algorithm for inferring best alignment – Enumeration? – How would you do it? – How many alignments are there? Why we need a smart algorithm • Ways to align two sequences of length m, n n m m n m n §  · (  )! 2  ¨ ¸ | ¨ ¸ m m 2 m ( !) S ˜ © ¹ • For two sequences of length n n Enumeration Today's lecture 10 184,756 100 20 1.40E+11 400 100 9.00E+58 10,000 Key insight: score is additive! i S1 A C G T C A T C A S2 T A G T G T C A j • Compute best alignment recursively i, j – For a given split ( ), the best alignment is: • Best alignment of S1[1..i] and S2[1..j] • + Best alignment of S1[ i..n] and S2[ j..m] i i S1 A C G T C A T C A S2 T A G T G T C A j j Key insight: re-use computation S1 S1 A C G T C A T C A A C G T C A T C A S2 S2 T A G T G T C A T A G T G T C A S1 A C G T C A T C A S1 A C G T C A T C A S2 T A G T G T C A S2 T A G T G T C A S1 A C G T S1 C G T C A T C A S2 T A G T G S2 T G T C A Identical sub-problems! We can reuse our work! Solution #1 – Memoization • Create a big dictionary, indexed by aligned seqs – When you encounter a new pair of sequences – If it is in the dictionary: • Look up the solution – If it is not in the dictionary • Compute the solution • Insert the solution in the dictionary • Ensures that there is no duplicated work – Only need to compute each sub-alignment once! Top down approach Solution #2 – Dynamic programming • Create a big table, indexed by (i,j) – Fill it in from the beginning all the way till the end – You know that you’ll need every subpart – Guaranteed to explore entire search space • Ensures that there is no duplicated work – Only need to compute each sub-alignment once! • Very simple computationally! Bottom up approach Key insight: Matrix representation of alignments A C G T C A T C A T A G T G T C A S1 A C G T C A T C A S2 T A A Goal: G G Find best path T T G through the matrix C/G T T C C A A

Description:
6.096 – Algorithms for Computational Biology Sequence Alignment and Dynamic Programming Lecture 1 - Introduction Lecture 2 - Hashing and BLAST
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.