跳转到内容

算法实现/字符串/最长重复子串

来自维基教科书,自由的教科书

在一个字符串中搜索其最长的重复子串。此函数查找的是非重叠重复,而不是其他实现中的重叠重复。也就是说,对于banana,此形式只会找到an,而不会找到重叠的ana。该函数在密文模式搜索中应用。当某些其他因素满足时,可以利用这种非重叠模式来找到凯撒密码的密钥长度。

返回单个字符串的最长重复非重叠子串及其重复次数。在竞赛中,如果长度相同,则返回重复次数最多的子串。如果仍然匹配,则返回先找到的子串。最坏情况是未找到重复,此时循环次数达到最大值(0.5*n)^2。

Function LongestRepeatSubstring(sIn As String, Optional nSS As Long) As String
    'finds longest repeated non-overlapping substring and number of repeats
    'LongestRepeatSubstring is output containing longest substring 
    'sIn is input parameter containing string to test
    'nSS is output parameter,if provided, contains number of found repeats
    
    Dim s1 As String, s2 As String, X As Long
    Dim sPrev As String, nPrev As Long, nLPrev As Long
    Dim nL As Long, nTrial As Long, nPos As Long, vAr as Variant
        
    nL = Len(sIn)
    For nTrial = Int(nL / 2) To 1 Step -1
        For nPos = 1 To (nL - (2 * nTrial) + 1)
            X = 0
            s1 = Mid(sIn, nPos, nTrial)
            s2 = Right(sIn, (nL - nPos - nTrial + 1))
            vAr = Split(s2, s1)            
            X = UBound(vAr) - LBound(vAr)
            If X > 0 Then
                If nPrev < X Then
                    sPrev = s1
                    nPrev = X
                End If
            End If
        Next nPos
        If nPrev <> 0 Then
            LongestRepeatSubstring = sPrev
            nSS = nPrev
            Exit Function
        End If
    Next nTrial
End Function


华夏公益教科书