系统发育学/比对准备/全局比对
外观
全局比对方法旨在比对两个序列的整个长度。如果没有序列演化的模型,这些方法是非常通用的,并且可以应用于任何序列(不仅仅是生物学的)相似性。
要讨论的第一个算法是 Needleman-Wunsch。
requires: seq1 = first sequence; seq2 = second sequence
//scoring matrices values
match = 1
mismatch = -1
gap = -1
sc_matrix //initialize scoring matrix the size of which is the largest sequence length
p_matrix //initialize pointing matrix the size of which is the largest sequence length
sc_matrix[0][0] = 0
p_matrix[0][0] = 0 // 0 = no where; 1 = left; 2 = right; 3 = down; 4 = up; diagonal = 5;
for i=0 to length of seq1
sc_matrix[0][i] = gap * i
p_matrix[0][i] = 1
end for
for i=0 to length of seq2
sc_matrix[i][0] = gap * i
p_matrix[i][0] = 1
end for
for i=0 to length of seq2
for j=0 to length of seq1
subseq1 <= seq1[j]
subseq2 <= seq2[i]
if subseq1 = subseq2 then
diag <= sc_matrix[i][j] + match
else
diag <= sc_matrix[i][j] + mismatch
end if
up <= sc_matrix[i][j] + gap
left <= sc_matrix[i][j] + gap
if diag >= up then
if diag>= left then
sc_matrix[i][j] <= diag
p_matrix[i][j] <= 5
else
sc_matrix[i][j] <= left
p_matrix[i][j] <= 1
end if
else
if diag>= left then
sc_matrix[i][j] <= up
p_matrix[i][j] <= 4
else
sc_matrix[i][j] <= left
p_matrix[i][j] <= 1
end if
end for
end for
al_seq1 <= ""
al_seq2 <= ""
keep_going <= true
i <= seq1.length
j <= seq2.length
while keep_going = true
if p_matrix = 0
keep_going = false
end if
if p_matrix[i][j] = 5 then
al_seq1.add(seq1[i])
al_seq2.add(seq1[j])
i <= i - 1
j <= j - 1
else if p_matrix[i][j] = 1 then
al_seq1.add(seq1[i])
al_seq2.add("-")
i <= i - 1
else if p_matrix[i][j] = 4 then
al_seq1.add("-")
al_seq2.add(seq1[j])
j <= j - 1
end if
end while
al_seq1 <= al_seq1.reverse
al_seq2 <= al_seq2.reverse