{"id":99,"date":"2006-04-16T17:56:00","date_gmt":"2006-04-16T17:56:00","guid":{"rendered":"http:\/\/blog.trungson.com\/?p=99"},"modified":"2006-04-16T17:56:00","modified_gmt":"2006-04-16T17:56:00","slug":"crc32-hash-collision","status":"publish","type":"post","link":"http:\/\/blog.trungson.com\/?p=99","title":{"rendered":"CRC32 hash collision"},"content":{"rendered":"<p>I was trying to use CRC32() to uniquely identify distinct domain names because it&#8217;s probably the most economical MySQL datatype for the job: an INT UNSIGNED takes only 4 bytes, compared to a 32-character string for MD5(). However, hash collisions with CRC32 occur too frequently for that. In a set of only about 800k strings, I got a bunch of duplicate ids. For example:<\/p>\n<pre>\nfc0591 => 1009521187\n123rainerbommert => 1009521187\n<\/pre>\n<p>Using the length of the string as an additional distinguishing key does not seem to help either. For example, these two different strings both have a length of 9 and the same CRC32:<\/p>\n<pre>\na1sellers => 3605292603\nadvertees => 3605292603\n<\/pre>\n<p>Some stats: out of over 7M domains, there were about 6200 collisions (duplicate CRC32 hashes). Most involve two strings, but in 3 cases 3 strings yield the same hash. That is roughly what the birthday paradox predicts for any 32-bit hash: about 5,700 colliding pairs would be expected among 7M uniformly random 32-bit values.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was trying to use CRC32() to uniquely identify distinct domain names because it&#8217;s probably the most economical MySQL datatype for the job: an INT UNSIGNED takes only 4 bytes, compared to a 32-character string for MD5(). However, hash collisions with CRC32 occur too frequently for that. 
In a set of only about 800k strings, I got a bunch of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/blog.trungson.com\/index.php?rest_route=\/wp\/v2\/posts\/99"}],"collection":[{"href":"http:\/\/blog.trungson.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/blog.trungson.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/blog.trungson.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/blog.trungson.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=99"}],"version-history":[{"count":0,"href":"http:\/\/blog.trungson.com\/index.php?rest_route=\/wp\/v2\/posts\/99\/revisions"}],"wp:attachment":[{"href":"http:\/\/blog.trungson.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=99"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/blog.trungson.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=99"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/blog.trungson.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=99"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}