Code Examples
The following examples show how to implement text comparison in different programming languages:
JavaScript Text Comparison
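// Simple line-level diff. This is a greedy heuristic, not a minimal (LCS-based) diff.
function lineDiff(text1, text2) {
  const lines1 = text1.split('\n');
  const lines2 = text2.split('\n');
  const result = [];
  let i = 0, j = 0;
  while (i < lines1.length || j < lines2.length) {
    if (i < lines1.length && j < lines2.length && lines1[i] === lines2[j]) {
      // Same line at both positions: unchanged.
      result.push({ type: 'equal', content: lines1[i], line1: i, line2: j });
      i++;
      j++;
    } else if (i < lines1.length && (j >= lines2.length || !lines2.includes(lines1[i]))) {
      // The original line does not appear in the modified text: deleted.
      result.push({ type: 'delete', content: lines1[i], line: i });
      i++;
    } else if (j < lines2.length && (i >= lines1.length || !lines1.includes(lines2[j]))) {
      // The modified line does not appear in the original text: added.
      result.push({ type: 'add', content: lines2[j], line: j });
      j++;
    } else {
      // Both lines occur somewhere in the other text but differ at this position.
      // Report a deletion followed by an addition instead of marking them 'equal'.
      result.push({ type: 'delete', content: lines1[i], line: i });
      result.push({ type: 'add', content: lines2[j], line: j });
      i++;
      j++;
    }
  }
  return result;
}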
// Levenshtein edit distance, used below to compute similarity
function levenshteinDistance(str1, str2) {
  const m = str1.length;
  const n = str2.length;
  // dp[i][j] = edit distance between the first i chars of str1 and the first j chars of str2
  const dp = Array(m + 1).fill(null).map(() => Array(n + 1).fill(0));
  for (let i = 0; i <= m; i++) dp[i][0] = i;
  for (let j = 0; j <= n; j++) dp[0][j] = j;
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      if (str1[i - 1] === str2[j - 1]) {
        dp[i][j] = dp[i - 1][j - 1];
      } else {
        dp[i][j] = Math.min(
          dp[i - 1][j] + 1,     // deletion
          dp[i][j - 1] + 1,     // insertion
          dp[i - 1][j - 1] + 1  // substitution
        );
      }
    }
  }
  return dp[m][n];
}
// Similarity as a percentage: 1 - distance / length of the longer string
function calculateSimilarity(str1, str2) {
  const distance = levenshteinDistance(str1, str2);
  const maxLength = Math.max(str1.length, str2.length);
  if (maxLength === 0) return 100; // two empty strings are identical
  return Math.round((1 - distance / maxLength) * 100);
}
Python Text Comparison
import difflib

# Line-level diff
def line_diff(text1, text2):
    # splitlines() handles both \n and \r\n line endings
    lines1 = text1.splitlines()
    lines2 = text2.splitlines()
    differ = difflib.Differ()
    result = list(differ.compare(lines1, lines2))
    return result

# Unified diff format
def unified_diff(text1, text2, fromfile='Original', tofile='Modified'):
    lines1 = text1.splitlines()
    lines2 = text2.splitlines()
    result = list(difflib.unified_diff(
        lines1, lines2,
        fromfile=fromfile,
        tofile=tofile,
        lineterm=''  # input lines carry no trailing newline
    ))
    return '\n'.join(result)

# HTML diff format (side-by-side table)
def html_diff(text1, text2):
    lines1 = text1.splitlines()
    lines2 = text2.splitlines()
    differ = difflib.HtmlDiff()
    html_result = differ.make_table(
        lines1, lines2,
        fromdesc='Original',
        todesc='Modified',
        context=True,  # show only changed regions plus context
        numlines=5     # context lines around each change
    )
    return html_result

# Similarity as a percentage (SequenceMatcher ratio)
def calculate_similarity(text1, text2):
    return difflib.SequenceMatcher(None, text1, text2).ratio() * 100

# Diff statistics
def get_diff_stats(text1, text2):
    lines1 = text1.splitlines()
    lines2 = text2.splitlines()
    differ = difflib.Differ()
    diff = list(differ.compare(lines1, lines2))
    additions = sum(1 for line in diff if line.startswith('+ '))
    deletions = sum(1 for line in diff if line.startswith('- '))
    # Differ reports a modified line as one deletion plus one addition,
    # so this is the total number of changed lines.
    modifications = additions + deletions
    return {
        'additions': additions,
        'deletions': deletions,
        'modifications': modifications,
        'similarity': calculate_similarity(text1, text2)
    }
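The JavaScript and Python similarity functions above are not interchangeable: the first derives its percentage from Levenshtein distance, the second from the SequenceMatcher ratio (twice the number of matched characters divided by the combined length). The sketch below puts both measures side by side in Python so the difference is visible; the function names `levenshtein`, `edit_similarity`, and `ratio_similarity` are illustrative and not part of any library.

```python
import difflib


def levenshtein(a: str, b: str) -> int:
    """Edit distance using a rolling row instead of a full matrix."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Percentage based on 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return (1 - levenshtein(a, b) / longest) * 100


def ratio_similarity(a: str, b: str) -> float:
    """Percentage based on difflib's 2 * matches / total length."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio() * 100


if __name__ == "__main__":
    print(edit_similarity("kitten", "sitting"))
    print(ratio_similarity("kitten", "sitting"))
```

The two numbers generally differ for the same pair of strings, so a tool should pick one measure and use it consistently when reporting similarity.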
Advanced Python difflib Usage
import difflib

# Character-level diff
def char_diff(text1, text2):
    matcher = difflib.SequenceMatcher(None, text1, text2)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            result.append(('equal', text1[i1:i2]))
        elif tag == 'delete':
            result.append(('delete', text1[i1:i2]))
        elif tag == 'insert':
            result.append(('insert', text2[j1:j2]))
        elif tag == 'replace':
            # A replacement is reported as a deletion followed by an insertion
            result.append(('delete', text1[i1:i2]))
            result.append(('insert', text2[j1:j2]))
    return result

# Word-level diff (splits on whitespace, so original spacing is not preserved)
def word_diff(text1, text2):
    words1 = text1.split()
    words2 = text2.split()
    matcher = difflib.SequenceMatcher(None, words1, words2)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            result.append(('equal', ' '.join(words1[i1:i2])))
        elif tag == 'delete':
            result.append(('delete', ' '.join(words1[i1:i2])))
        elif tag == 'insert':
            result.append(('insert', ' '.join(words2[j1:j2])))
        elif tag == 'replace':
            result.append(('delete', ' '.join(words1[i1:i2])))
            result.append(('insert', ' '.join(words2[j1:j2])))
    return result
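A word-level diff is usually shown inline rather than as a list of tuples. The sketch below is one possible rendering, assuming a plain-text marker style of `[-removed-]` and `{+added+}` (the same markers `git diff --word-diff` uses in its plain mode); `render_word_diff` is a hypothetical helper name.

```python
import difflib


def render_word_diff(original: str, modified: str) -> str:
    """Render a word-level diff as one line with inline change markers."""
    old_words = original.split()
    new_words = modified.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(" ".join(old_words[i1:i2]))
        if tag in ("delete", "replace"):
            # Words present only in the original text
            parts.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            # Words present only in the modified text
            parts.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(parts)


if __name__ == "__main__":
    print(render_word_diff("the quick brown fox", "the slow brown fox jumps"))
```

Because the text is split on whitespace, line breaks and repeated spaces are collapsed in the output; a tool that must preserve layout would need a tokenizer that keeps the separators.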
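About Text Comparison
What is text comparison?
Text comparison (diff) is the process of comparing two texts to identify what was added, deleted, and modified. Diff tools are essential in code review, document comparison, and version control.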
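As a rough illustration of how the three kinds of change can be told apart, the sketch below counts added, deleted, and modified lines from `difflib.SequenceMatcher` opcodes. Treating a `replace` block as "modified" lines is an assumption made for this example; `classify_changes` is a hypothetical name.

```python
import difflib


def classify_changes(original: str, modified: str) -> dict:
    """Count added, deleted, and modified lines between two texts."""
    old_lines = original.splitlines()
    new_lines = modified.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    stats = {"added": 0, "deleted": 0, "modified": 0}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            stats["added"] += j2 - j1
        elif tag == "delete":
            stats["deleted"] += i2 - i1
        elif tag == "replace":
            # The two sides of a replaced block may differ in length:
            # pair up what overlaps as "modified", count the rest as add/delete.
            paired = min(i2 - i1, j2 - j1)
            stats["modified"] += paired
            stats["added"] += (j2 - j1) - paired
            stats["deleted"] += (i2 - i1) - paired
    return stats


if __name__ == "__main__":
    print(classify_changes("a\nb\nc", "a\nx\nc\nd"))
```

Other tools define "modified" differently (for example, as one deletion plus one addition), so the counts are only comparable between tools that share the same convention.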
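Common diff formats
- Unified Diff: the most widely used format; added lines are prefixed with + and deleted lines with -, with unchanged context lines around each change
- Context Diff: an older format that lists the old and new versions of each changed region separately, each with surrounding context lines
- HTML Diff: rendered as HTML, with color highlighting the differences
- Side-by-side Diff: shows the original and modified text in parallel columns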
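To make the difference between the first two formats concrete, the following sketch generates a unified diff and a context diff from the same input using Python's `difflib`. The sample lines and the file labels `Original` and `Modified` are made up for the illustration.

```python
import difflib

original = ["alpha", "beta", "gamma"]
modified = ["alpha", "BETA", "gamma", "delta"]

# Unified format: one hunk, changes marked with leading - and +
unified = list(difflib.unified_diff(
    original, modified,
    fromfile="Original", tofile="Modified",
    lineterm="",  # the input lines have no trailing newline
))

# Context format: old and new versions of the hunk listed separately,
# changed lines marked with !, added lines with +
context = list(difflib.context_diff(
    original, modified,
    fromfile="Original", tofile="Modified",
    lineterm="",
))

if __name__ == "__main__":
    print("\n".join(unified))
    print()
    print("\n".join(context))
```

The unified output is shorter because unchanged context appears once; the context output repeats it for the old and new side of each hunk.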
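Use cases for text comparison
- Code review: inspecting code changes
- Document comparison: comparing different versions of a document
- Version control: the diff features of tools such as Git and SVN
- Data comparison: finding differences between datasets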
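For the data comparison case, a line-based diff is often less useful than a comparison keyed by record identifier. The sketch below assumes each dataset is a dictionary mapping an ID to a record; `compare_records` and the sample data are hypothetical.

```python
def compare_records(old: dict, new: dict) -> dict:
    """Report which keys were added, removed, or changed between two datasets."""
    old_keys, new_keys = set(old), set(new)
    return {
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
        # Keys present on both sides whose values differ
        "changed": sorted(k for k in old_keys & new_keys if old[k] != new[k]),
    }


if __name__ == "__main__":
    before = {1: "Alice", 2: "Bob", 3: "Carol"}
    after = {2: "Bob", 3: "Caroline", 4: "Dave"}
    print(compare_records(before, after))
```

Unlike a text diff, this approach ignores record order, which is usually what is wanted when comparing exports of the same table.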