ICode9

精准搜索请尝试: 精确搜索
首页 > 编程语言> 文章详细

用VB实现字符串相似度算法(编辑距离算法 Levenshtein Distance)

2022-05-22 22:35:06  阅读:328  来源: 互联网

标签:Distance VB End String Dim sentence Length 算法 Integer


原文来自Angel_Kitty《用C#实现字符串相似度算法(编辑距离算法 Levenshtein Distance)》

把代码翻译成了VB,具体描述请阅读作者的原文。

Public Class SearchHelper
    ''' <summary>
    ''' 对结果进行排序,不能够直接使用相似度进行排序。因为相似度并没有考虑到句子的长度。
    ''' <br/>按照使用习惯,通常会把匹配度高,并且句子长度短的放在前面。
    ''' <br/>这就得到了排序因子:(不匹配度+0.5)/句子长度。
    ''' </summary>
    ''' <param name="param">要比较的源字符串</param>
    ''' <param name="items">用来比较的关键字字符串数组</param>
    ''' <returns></returns>
    Public Function Search(param As String, items As String()) As String()
        If String.IsNullOrWhiteSpace(param) AndAlso IsNothing(items) AndAlso items.Length = 0 Then
            Dim result(0) As String
            Return result
        End If


        Dim words As String() = param.Split(New Char() {" ", "\u3000"}, StringSplitOptions.RemoveEmptyEntries).OrderBy(Of Integer)(Function(item) item.Length).ToArray()


        Dim q = From sentence In items.AsParallel()
                Let MLL = Mul_LnCS_Length(sentence, words)
                Where MLL >= 0
                Order By (MLL + 0.5) / sentence.Length, sentence
                Select sentence
        Return q.ToArray()
    End Function



    ''' <summary>
    ''' 编辑距离(多字符串)
    ''' </summary>
    ''' <param name="sentence"></param>
    ''' <param name="words">多个关键字。长度必须大于0,必须按照字符串长度升序排列。</param>
    ''' <returns></returns>
    Public Function Mul_LnCS_Length(sentence As String, words As String()) As Integer

        Dim sLength As Integer = sentence.Length
        Dim result As Integer = sLength
        Dim flags(sLength) As Boolean
        Dim C(sLength + 1, words(words.Length - 1).Length + 1) As Integer
        'int[,] C = New int[sLength + 1, words.Select(s => s.Length).Max() + 1];
        For Each word As String In words

            Dim wLength As Integer = word.Length
            Dim first As Integer = 0, last = 0
            Dim i, j, LCS_L As Integer

            'foreach 速度会有所提升,还可以加剪枝
            For i = 0 To sLength - 1
                For j = 0 To wLength - 1
                    If sentence(i) = word(j) Then
                        C(i + 1, j + 1) = C(i, j) + 1
                        If (first < C(i, j)) Then
                            last = i
                            first = C(i, j)
                        End If
                    Else
                        C(i + 1, j + 1) = Math.Max(C(i, j + 1), C(i + 1, j))
                    End If
                Next
                LCS_L = C(i, j)
            Next

            While i > 0 AndAlso j > 0
                If C(i - 1, j - 1) + 1 = C(i, j) Then
                    i -= 1
                    j -= 1
                    If flags(i) = False Then
                        flags(i) = True
                        result -= 1
                    End If
                    first = i
                ElseIf (C(i - 1, j) = C(i, j)) Then
                    i -= 1
                Else    ' If (C(i, j - 1) = C(i, j))
                    j -= 1
                End If
            End While

            If LCS_L <= (last - first + 1) >> 1 Then
                Return -1
            End If
        Next

        Return result
    End Function

    Public Function Distance(str1 As String, str2 As String) As Integer
        Dim n As Integer = str1.Length
        Dim m As Integer = str2.Length

        Dim C(n + 1, m + 1) As Integer
        Dim i, j, x, y, z As Integer
        For i = 0 To n
            C(i, 0) = i
        Next
        For i = 1 To m
            C(0, i) = i
        Next
        For i = 0 To n - 1
            For j = 0 To m - 1
                x = C(i, j + 1) + 1
                y = C(i + 1, j) + 1
                If (str1(i) = str2(j)) Then
                    z = C(i, j)
                Else
                    z = C(i, j) + 1
                End If
                C(i + 1, j + 1) = Math.Min(Math.Min(x, y), z)
            Next
        Next
        Return C(n, m)
    End Function
End Class

 

标签:Distance,VB,End,String,Dim,sentence,Length,算法,Integer
来源: https://www.cnblogs.com/ovbm/p/16299307.html

本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享;
2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关;
3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关;
4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除;
5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

专注分享技术,共同学习,共同进步。侵权联系[81616952@qq.com]

Copyright (C)ICode9.com, All Rights Reserved.

ICode9版权所有