Compute each file's MD5 hash and compare it against the hashes already seen: if it matches, the file is a duplicate and is deleted; otherwise it is kept.
With a little tweaking this could dedupe an entire disk and replace each duplicate with a symlink to the first copy (a rough sketch of that variant follows the code below).
import os
import hashlib

def file_md5(filepath):
    """Return the MD5 hex digest of a file, reading in chunks to bound memory use."""
    md5obj = hashlib.md5()
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            md5obj.update(chunk)
    return md5obj.hexdigest()


def file_dedup(dirpath):
    """Delete files in dirpath whose content duplicates an earlier file."""
    hashpool = set()                      # hashes seen so far; set gives O(1) lookup
    print("dedup files in", dirpath)
    for name in os.listdir(dirpath):      # all entries (files and dirs) in the directory
        filepath = os.path.join(dirpath, name)
        print(name, "check")
        if os.path.isdir(filepath):       # skip subdirectories
            print(filepath, "is a directory")
            continue
        filehash = file_md5(filepath)
        if filehash in hashpool:          # content seen before: remove the duplicate
            print("exists! delete")
            os.remove(filepath)
        else:
            print("new file")
            hashpool.add(filehash)


if __name__ == "__main__":
    dirpath = "E:/files"
    file_dedup(dirpath)
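A minimal sketch of the whole-disk variant mentioned above, assuming os.walk for the recursive traversal and that replacing each duplicate with a symbolic link to the first-seen copy is acceptable; dedup_to_symlinks and the "E:/files" root are illustrative names, not part of the original:

import os
import hashlib

def file_md5(filepath):
    md5obj = hashlib.md5()
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            md5obj.update(chunk)
    return md5obj.hexdigest()

def dedup_to_symlinks(root):
    """Walk root recursively; replace duplicate files with symlinks to the first copy."""
    first_seen = {}                           # hash -> path of first file with that content
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            filepath = os.path.join(dirpath, name)
            if os.path.islink(filepath):      # skip links, including ones we just created
                continue
            filehash = file_md5(filepath)
            if filehash in first_seen:
                os.remove(filepath)           # drop the duplicate...
                os.symlink(first_seen[filehash], filepath)  # ...and point it at the original
            else:
                first_seen[filehash] = filepath

if __name__ == "__main__":
    dedup_to_symlinks("E:/files")

Note that on Windows, os.symlink generally requires administrator privileges or developer mode to be enabled.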
This is file-level dedup, though, and the drawback is that once you modify one file, everything linked to it changes along with it.
I wasn't after real dedup; I just wanted to delete the many duplicate files left over from earlier crawls.