跳至内容

系统发育学/比对准备/全局比对

来自维基教科书,开放世界的开放书籍

全局比对方法旨在比对两个序列的整个长度。如果没有序列演化的模型,这些方法是非常通用的,并且可以应用于任何序列(不仅仅是生物学的)相似性。

要讨论的第一个算法是 Needleman-Wunsch。

requires: seq1 = first sequence; seq2 = second sequence
//scoring matrices values
match = 1
mismatch = -1
gap = -1
sc_matrix //initialize scoring matrix the size of which is the largest sequence length
p_matrix //initialize pointing matrix the size of which is the largest sequence length
sc_matrix[0][0] = 0
p_matrix[0][0] = 0 // 0 = no where; 1 = left; 2 = right; 3 = down; 4 = up; diagonal = 5;
for i=0 to length of seq1
  sc_matrix[0][i] = gap * i
  p_matrix[0][i] = 1
end for
for i=0 to length of seq2
  sc_matrix[i][0] = gap * i
  p_matrix[i][0] = 1
end for
for i=0 to length of seq2
  for j=0 to length of seq1
    subseq1 <= seq1[j]
    subseq2 <= seq2[i]
    if subseq1 = subseq2 then
      diag <= sc_matrix[i][j] + match
    else
      diag <= sc_matrix[i][j] + mismatch
    end if
    up <= sc_matrix[i][j] + gap
    left <= sc_matrix[i][j] + gap
    if diag >= up then
      if diag>= left then
        sc_matrix[i][j] <= diag
        p_matrix[i][j] <= 5
      else
        sc_matrix[i][j] <= left
        p_matrix[i][j] <= 1
      end if
    else
      if diag>= left then
        sc_matrix[i][j] <= up
        p_matrix[i][j] <= 4
      else
        sc_matrix[i][j] <= left
        p_matrix[i][j] <= 1
    end if
  end for
end for
al_seq1 <= ""
al_seq2 <= ""
keep_going <= true
i <= seq1.length
j <= seq2.length
while keep_going = true
  if p_matrix = 0
    keep_going = false
  end if
  if p_matrix[i][j] = 5 then
    al_seq1.add(seq1[i])
    al_seq2.add(seq1[j])
    i <= i - 1
    j <= j - 1
  else if p_matrix[i][j] = 1 then
    al_seq1.add(seq1[i])
    al_seq2.add("-")
    i <= i - 1
  else if p_matrix[i][j] = 4 then
    al_seq1.add("-")
    al_seq2.add(seq1[j])
    j <= j - 1   
  end if
end while
al_seq1 <= al_seq1.reverse
al_seq2 <= al_seq2.reverse
华夏公益教科书