
1 kingxsp 2013-06-10 21:35:37 +08:00 I recommend the pybloomfiltermmap library.
2 binux 2013-06-10 21:53:18 +08:00

import hashlib

bloom = 0  # bit set stored in a single integer

def check(url):
    # Return True if url was probably seen before; otherwise record it.
    global bloom
    bits = int(hashlib.md5(url.encode("utf-8")).hexdigest(), 16)
    if (bloom & bits) == bits:  # every bit of this hash is already set
        return True
    bloom |= bits
    return False
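For what it's worth, the snippet above uses one whole MD5 value as the bit mask, so about half of its 128 bits get set per URL and the filter fills up within a handful of URLs. A rough sketch of a more conventional layout follows: a larger bit array with only k positions set per item. The class name `SimpleBloom`, the sizes `m_bits` and `k`, and the counter-salting trick are assumptions made for illustration, not anything from this thread or from pybloomfiltermmap.

```python
import hashlib


class SimpleBloom:
    """Toy Bloom filter: m bits kept in one Python int, k positions per item."""

    def __init__(self, m_bits=1 << 20, k=4):
        self.m = m_bits   # assumed size; tune for the expected URL count
        self.k = k        # assumed number of bit positions per item
        self.bits = 0

    def _indexes(self, item):
        # Derive k bit positions by salting the input with a counter.
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.m

    def check(self, item):
        """Return True if item was probably seen; otherwise record it."""
        mask = 0
        for idx in self._indexes(item):
            mask |= 1 << idx
        if (self.bits & mask) == mask:
            return True
        self.bits |= mask
        return False


seen = SimpleBloom()
first = seen.check("http://example.com/a")   # not seen yet, gets recorded
second = seen.check("http://example.com/a")  # same URL again
```

Keeping the bits in a Python int is only convenient for a demo, since every update copies the whole integer. For a real crawl a bytearray or a dedicated library is probably the better choice.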
4 C0VN 2013-06-11 00:25:12 +08:00 Would this work for filtering duplicate URLs? list(set(urls))
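list(set(urls)) does remove duplicates exactly, as long as every URL fits in memory, but it does not keep the original order. A minimal sketch of an order-preserving variant, assuming Python 3.7+ where dicts keep insertion order (the function name `dedupe` and the sample URLs are made up for illustration):

```python
def dedupe(urls):
    """Drop repeated URLs while keeping the first occurrence of each."""
    # dict keys are unique and, in Python 3.7+, keep insertion order.
    return list(dict.fromkeys(urls))


urls = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/a",
]
unique_urls = dedupe(urls)
# unique_urls == ["http://example.com/a", "http://example.com/b"]
```

Unlike a Bloom filter, this never reports a false positive, at the cost of holding every URL in memory.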