Finding the intersection of two files on Linux

This is a very common need. Suppose we have two files:

File 1:

123
456
789

File 2:

123
789
abc

Being the diligent workers we are, we rolled up our sleeves and wrote a script right away:

#!/usr/bin/env python3

import sys


def read_lines(path):
    # Collect the distinct lines of a file without the trailing newline,
    # so a last line lacking "\n" still compares equal.
    with open(path) as f:
        return {line.rstrip("\n") for line in f}


def calc_uniq(path1, path2):
    set1 = read_lines(path1)
    set2 = read_lines(path2)
    set_intersection = set1 & set2
    print(path1, "uniq item:", len(set1))
    print(path2, "uniq item:", len(set2))
    print("intersection len:", len(set_intersection))
    print("intersection detail:")
    # Sort so the output order is stable across runs.
    print(sorted(set_intersection))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        calc_uniq(sys.argv[1], sys.argv[2])
    else:
        print("this script calculates the intersection of two files")
        print("usage:", sys.argv[0], "path_to_file1 path_to_file2")

It does run:

1 uniq item: 3
2 uniq item: 3
intersection len: 2
intersection detail:
['123', '789']

But a quick search turned up this page: http://blogread.cn/it/article/…

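1. Take the union of the two files (duplicate lines are kept only once)

cat file1 file2 | sort | uniq

2. Take the intersection of the two files (keep only the lines that exist in both files)

cat file1 file2 | sort | uniq -d

3. Remove the intersection and keep the remaining lines

cat file1 file2 | sort | uniq -u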
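
To make clear what these three pipelines compute, here is a rough Python sketch in terms of set operations. It assumes no line is repeated within either file, which is the only situation where the uniq-based versions match true set operations. The function name combine and the sample data are made up for illustration.

```python
def combine(lines1, lines2):
    """Return (union, intersection, leftover) of two line lists.

    Assumption: neither list contains internal duplicates.
    """
    set1, set2 = set(lines1), set(lines2)
    union = sorted(set1 | set2)         # cat file1 file2 | sort | uniq
    intersection = sorted(set1 & set2)  # cat file1 file2 | sort | uniq -d
    leftover = sorted(set1 ^ set2)      # cat file1 file2 | sort | uniq -u
    return union, intersection, leftover


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two files.
    file1 = ["123", "456", "789"]
    file2 = ["123", "789", "abc"]
    union, intersection, leftover = combine(file1, file2)
    print(union)
    print(intersection)
    print(leftover)
```

The third result is the symmetric difference: lines that appear in exactly one of the two files.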

I instantly felt like an amateur: all those painstakingly written lines, and someone else gets it done in one.

But wait, take a look at this test case:

123
123
456
456

And here is the output:

$ cat 1 2 | sort | uniq -d
123
456

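Not so clever after all: uniq -d prints every line that occurs more than once in the combined input, so a line duplicated inside a single file is reported even if the other file does not contain it.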
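
The failure is easy to reproduce outside the shell. The sketch below mimics sort | uniq -d with a Counter and compares it with a real set intersection. The function names are illustrative, and the split of the test case (file 1 holding the two 123 lines, file 2 the two 456 lines) is an assumption.

```python
from collections import Counter


def uniq_d(lines1, lines2):
    """Mimic `cat f1 f2 | sort | uniq -d`: lines seen more than once overall."""
    counts = Counter(lines1 + lines2)
    return sorted(line for line, n in counts.items() if n > 1)


def true_intersection(lines1, lines2):
    """Lines present in both inputs, no matter how often they repeat."""
    return sorted(set(lines1) & set(lines2))


if __name__ == "__main__":
    file1 = ["123", "123"]  # assumed contents of file 1
    file2 = ["456", "456"]  # assumed contents of file 2
    print(uniq_d(file1, file2))             # ['123', '456']
    print(true_intersection(file1, file2))  # []
```

Deduplicating each file on its own first (for example with sort -u) before concatenating would probably avoid these false positives, at the cost of a longer pipeline.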

Then I came across this one as well: http://www.commandlinefu.com/c…

grep -Fx -f file1 file2

This one is even more impressive, and it does not make that mistake. That is some seriously creative shell work.
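
For comparison, here is roughly what grep -Fx -f file1 file2 does, sketched in Python: -f reads the patterns from file1, -F treats them as fixed strings rather than regular expressions, and -x requires a whole-line match, so the command prints every line of file2 that also appears in file1. The function name is hypothetical, and the sketch works on in-memory lists and ignores corner cases.

```python
def grep_fixed_whole_lines(pattern_lines, input_lines):
    """Approximate `grep -Fx -f patterns input` on in-memory line lists.

    Keeps the order (and any repeats) of input_lines, as grep would.
    """
    patterns = set(pattern_lines)
    return [line for line in input_lines if line in patterns]


if __name__ == "__main__":
    # Hypothetical sample data: the first example, then the tricky test case.
    print(grep_fixed_whole_lines(["123", "456", "789"], ["123", "789", "abc"]))
    print(grep_fixed_whole_lines(["123", "123"], ["456", "456"]))
```

Note that a line repeated in file2 is printed once per occurrence, so when a strict set is wanted the result may still need to go through sort -u.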
