Text Diff Tool

An online tool for comparing two texts, with character-level and line-level diffs.

Code Examples

The examples below show how to implement text comparison in JavaScript and Python:

JavaScript Text Diff

// Simple line-level diff based on the longest common subsequence (LCS).
// Lines that differ are always reported as 'delete' or 'add', never as 'equal'.
function lineDiff(text1, text2) {
    const lines1 = text1.split('\n');
    const lines2 = text2.split('\n');
    const m = lines1.length;
    const n = lines2.length;

    // lcs[i][j] = length of the LCS of lines1[i..] and lines2[j..]
    const lcs = Array.from({ length: m + 1 }, () => Array(n + 1).fill(0));
    for (let i = m - 1; i >= 0; i--) {
        for (let j = n - 1; j >= 0; j--) {
            lcs[i][j] = lines1[i] === lines2[j]
                ? lcs[i + 1][j + 1] + 1
                : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
        }
    }

    // Walk both texts, following the LCS table to decide each step
    const result = [];
    let i = 0, j = 0;
    while (i < m || j < n) {
        if (i < m && j < n && lines1[i] === lines2[j]) {
            result.push({ type: 'equal', content: lines1[i], line1: i, line2: j });
            i++;
            j++;
        } else if (i < m && (j >= n || lcs[i + 1][j] >= lcs[i][j + 1])) {
            result.push({ type: 'delete', content: lines1[i], line: i });
            i++;
        } else {
            result.push({ type: 'add', content: lines2[j], line: j });
            j++;
        }
    }

    return result;
}

// Compute the Levenshtein (edit) distance, the basis for the similarity score
function levenshteinDistance(str1, str2) {
    const m = str1.length;
    const n = str2.length;
    const dp = Array(m + 1).fill(null).map(() => Array(n + 1).fill(0));
    
    for (let i = 0; i <= m; i++) dp[i][0] = i;
    for (let j = 0; j <= n; j++) dp[0][j] = j;
    
    for (let i = 1; i <= m; i++) {
        for (let j = 1; j <= n; j++) {
            if (str1[i - 1] === str2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                dp[i][j] = Math.min(
                    dp[i - 1][j] + 1,
                    dp[i][j - 1] + 1,
                    dp[i - 1][j - 1] + 1
                );
            }
        }
    }
    
    return dp[m][n];
}

// Compute similarity as a percentage (0-100)
function calculateSimilarity(str1, str2) {
    const distance = levenshteinDistance(str1, str2);
    const maxLength = Math.max(str1.length, str2.length);
    if (maxLength === 0) return 100;
    return Math.round((1 - distance / maxLength) * 100);
}
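
As a worked check of the similarity formula above (1 - distance / maxLength, scaled to a percentage), here is a minimal Python sketch of the same calculation. The names `levenshtein` and `similarity_percent` are illustrative and not part of the tool, and the sketch keeps only two rows of the table instead of the full matrix.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity_percent(a: str, b: str) -> int:
    """Similarity as 1 - distance / max_length, scaled to 0-100."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # two empty strings are identical
    return round((1 - levenshtein(a, b) / longest) * 100)


print(levenshtein("kitten", "sitting"))         # 3
print(similarity_percent("kitten", "sitting"))  # 57
```

For "kitten" and "sitting" the distance is 3 and the longer string has 7 characters, so the score is (1 - 3/7) * 100, which rounds to 57. Python's `round` rounds exact halves to the nearest even integer, so results can differ from JavaScript's `Math.round` when the value ends in exactly .5.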

Python Text Diff

import difflib

# Line-level diff
def line_diff(text1, text2):
    lines1 = text1.split('\n')
    lines2 = text2.split('\n')
    differ = difflib.Differ()
    
    result = list(differ.compare(lines1, lines2))
    return result

# Unified diff format
def unified_diff(text1, text2, fromfile='Original', tofile='Modified'):
    lines1 = text1.split('\n')
    lines2 = text2.split('\n')
    
    result = list(difflib.unified_diff(
        lines1, lines2,
        fromfile=fromfile,
        tofile=tofile,
        lineterm=''
    ))
    return '\n'.join(result)

# HTML diff format
def html_diff(text1, text2):
    lines1 = text1.split('\n')
    lines2 = text2.split('\n')
    differ = difflib.HtmlDiff()
    
    html_result = differ.make_table(
        lines1, lines2,
        fromdesc='Original',
        todesc='Modified',
        context=True,
        numlines=5
    )
    return html_result

# Compute similarity as a percentage (0-100)
def calculate_similarity(text1, text2):
    return difflib.SequenceMatcher(None, text1, text2).ratio() * 100

# Collect diff statistics
def get_diff_stats(text1, text2):
    lines1 = text1.split('\n')
    lines2 = text2.split('\n')
    differ = difflib.Differ()
    
    diff = list(differ.compare(lines1, lines2))
    
    additions = sum(1 for line in diff if line.startswith('+ '))
    deletions = sum(1 for line in diff if line.startswith('- '))
    # Differ reports a changed line as a '- ' / '+ ' pair, so this is the total of changed lines
    modifications = additions + deletions
    
    return {
        'additions': additions,
        'deletions': deletions,
        'modifications': modifications,
        'similarity': calculate_similarity(text1, text2)
    }
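
The statistics above treat every change as an addition or a deletion. If a separate count of modified lines is wanted, one possible approach is to read the opcodes from `difflib.SequenceMatcher` and pair up lines inside each `replace` block. The sketch below assumes that pairing rule; the function name `diff_stats_by_opcode` is hypothetical.

```python
import difflib


def diff_stats_by_opcode(text1: str, text2: str) -> dict:
    """Count added, deleted, and modified lines from SequenceMatcher opcodes."""
    lines1 = text1.splitlines()
    lines2 = text2.splitlines()
    matcher = difflib.SequenceMatcher(None, lines1, lines2, autojunk=False)

    stats = {"additions": 0, "deletions": 0, "modifications": 0}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_len, new_len = i2 - i1, j2 - j1
        if tag == "insert":
            stats["additions"] += new_len
        elif tag == "delete":
            stats["deletions"] += old_len
        elif tag == "replace":
            # Assumption: pair lines one-to-one inside a replaced block;
            # any surplus on either side counts as a plain add or delete.
            paired = min(old_len, new_len)
            stats["modifications"] += paired
            stats["deletions"] += old_len - paired
            stats["additions"] += new_len - paired
    return stats


old = "alpha\nbeta\ngamma"
new = "alpha\nBETA\ngamma\ndelta"
print(diff_stats_by_opcode(old, new))
# {'additions': 1, 'deletions': 0, 'modifications': 1}
```

Here "beta" to "BETA" counts as one modification and "delta" as one addition. Pairing by position is a simplification: it does not check that the paired lines are actually similar.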

Advanced Python difflib Usage

import difflib

# Character-level diff
def char_diff(text1, text2):
    matcher = difflib.SequenceMatcher(None, text1, text2)
    
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            result.append(('equal', text1[i1:i2]))
        elif tag == 'delete':
            result.append(('delete', text1[i1:i2]))
        elif tag == 'insert':
            result.append(('insert', text2[j1:j2]))
        elif tag == 'replace':
            result.append(('delete', text1[i1:i2]))
            result.append(('insert', text2[j1:j2]))
    
    return result

# Word-level diff
def word_diff(text1, text2):
    words1 = text1.split()
    words2 = text2.split()
    matcher = difflib.SequenceMatcher(None, words1, words2)
    
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            result.append(('equal', ' '.join(words1[i1:i2])))
        elif tag == 'delete':
            result.append(('delete', ' '.join(words1[i1:i2])))
        elif tag == 'insert':
            result.append(('insert', ' '.join(words2[j1:j2])))
        elif tag == 'replace':
            result.append(('delete', ' '.join(words1[i1:i2])))
            result.append(('insert', ' '.join(words2[j1:j2])))
    
    return result
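
To show how a word-level result like the one above might be displayed, the following sketch renders the opcodes as a single line of text. The `[-...-]` and `{+...+}` markers are borrowed from the style of `git diff --word-diff`; the function name `inline_word_diff` is an assumption for illustration.

```python
import difflib


def inline_word_diff(text1: str, text2: str) -> str:
    """Render a word-level diff on one line using [-deleted-] and {+inserted+}."""
    words1, words2 = text1.split(), text2.split()
    matcher = difflib.SequenceMatcher(None, words1, words2, autojunk=False)

    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            parts.append("[-" + " ".join(words1[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            parts.append("{+" + " ".join(words2[j1:j2]) + "+}")
        if tag == "equal":
            parts.append(" ".join(words1[i1:i2]))
    return " ".join(parts)


print(inline_word_diff("the quick brown fox", "the slow brown fox jumps"))
# the [-quick-] {+slow+} brown fox {+jumps+}
```

Because the texts are split on whitespace, the original spacing and line breaks are not preserved in the output.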

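About Text Diffing

What is text diffing?

Text diffing compares two texts and identifies what was added, deleted, or modified between them. Diff tools are central to code review, document comparison, and version control.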

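Common Diff Formats

  • Unified Diff: the most widely used format; added lines are prefixed with + and deleted lines with -
  • Context Diff: shows each change together with surrounding unchanged lines, which makes it easier to read
  • HTML Diff: rendered as HTML, with colors highlighting the differences
  • Side-by-side Diff: shows the original and the modified text in parallel columns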
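
To make the difference between the first two formats concrete, this short sketch produces both from the same input using the standard library's `difflib.unified_diff` and `difflib.context_diff`. The sample fruit lists are made up for illustration.

```python
import difflib

original = ["apple", "banana", "cherry"]
modified = ["apple", "blueberry", "cherry"]

# Unified format: one hunk, removed lines start with '-', added lines with '+'
unified = list(difflib.unified_diff(
    original, modified, fromfile="Original", tofile="Modified", lineterm=""))

# Context format: separate "before" and "after" sections, changed lines start with '!'
context = list(difflib.context_diff(
    original, modified, fromfile="Original", tofile="Modified", lineterm=""))

print("\n".join(unified))
print()
print("\n".join(context))
```

In the unified output the change appears as `-banana` followed by `+blueberry` inside a single hunk, while the context output lists `! banana` in the original section and `! blueberry` in the modified section.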

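Use Cases for Text Diffing

  • Code review: inspecting code changes
  • Document comparison: comparing different versions of a document
  • Version control: the diff features of tools such as Git and SVN
  • Data comparison: finding the differences between datasets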
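
For the data comparison case, a line-based diff is not always the best fit, since row order may change without the data changing. A rough sketch of an alternative, assuming each record has a unique key, is to compare the two datasets by key. The function name `compare_records` and the sample rows are hypothetical.

```python
def compare_records(old: dict, new: dict) -> dict:
    """Compare two keyed datasets and report added, removed, and changed keys."""
    old_keys, new_keys = set(old), set(new)
    return {
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
        "changed": sorted(k for k in old_keys & new_keys if old[k] != new[k]),
    }


old_rows = {"u1": "Alice,30", "u2": "Bob,25", "u3": "Carol,41"}
new_rows = {"u1": "Alice,31", "u3": "Carol,41", "u4": "Dave,19"}
print(compare_records(old_rows, new_rows))
# {'added': ['u4'], 'removed': ['u2'], 'changed': ['u1']}
```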