Deleting duplicate files in a single directory

Compute each file's hash (MD5) and compare it against the hashes already seen: if the hash already exists, the file is a duplicate and gets deleted; otherwise it is kept.

With a few tweaks the same idea can delete duplicate files across an entire disk and replace each duplicate with a soft link to the first copy; a sketch of that variant follows the listing below.

import os
import hashlib


def file_md5(filepath):
    """Return the MD5 hex digest of a file's contents."""
    with open(filepath, 'rb') as f:
        md5obj = hashlib.md5()
        md5obj.update(f.read())
        return md5obj.hexdigest()


def file_dedup(dirpath):
    """Delete duplicate files (by MD5) directly under dirpath, keeping the first copy seen."""
    hashpool = []
    print("dedup files in", dirpath)
    filelist = os.listdir(dirpath)                     # list all files and dirs in the dir
    for efile in filelist:
        print(efile, "check!!!")
        filepath = os.path.join(dirpath, efile)        # file's absolute path
        if os.path.isdir(filepath):                    # skip subdirectories
            print(filepath, "is a directory")
            continue
        filehash = file_md5(filepath)
        if filehash in hashpool:                       # hash already seen: duplicate
            print("exists! delete")
            os.remove(filepath)
        else:                                          # new hash: keep the file
            print("new file")
            hashpool.append(filehash)


if __name__ == "__main__":
    dirpath = "E://files"
    file_dedup(dirpath)
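
As mentioned above, the same idea can be extended to a whole directory tree, replacing each duplicate with a soft link to the first copy. Below is a minimal sketch of that variant, not the original author's code: it walks the tree with os.walk, hashes files in chunks so large files need not fit in memory, and the function name dedup_tree, the chunk size, and the root path are my own assumptions. Note that os.symlink may require administrator privileges or developer mode on Windows.

import os
import hashlib


def file_md5(filepath):
    # Read in fixed-size chunks so very large files don't have to fit in memory.
    md5obj = hashlib.md5()
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b''):
            md5obj.update(chunk)
    return md5obj.hexdigest()


def dedup_tree(rootpath):
    # Map each MD5 hash to the first file that produced it.
    firstseen = {}
    for dirpath, dirnames, filenames in os.walk(rootpath):
        for name in filenames:
            filepath = os.path.join(dirpath, name)
            if os.path.islink(filepath):        # skip links already created
                continue
            filehash = file_md5(filepath)
            if filehash in firstseen:
                os.remove(filepath)                         # delete the duplicate ...
                os.symlink(firstseen[filehash], filepath)   # ... and link it to the first copy
            else:
                firstseen[filehash] = filepath


if __name__ == "__main__":
    dedup_tree("E://files")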
