Has anyone come across a simhash function implemented in Java?
I've already searched for it, but couldn't find anything.
By the way, it looks like Google has patented the algorithm. If you are in the US, compete successfully with Google, and do not have your own patent portfolio, then do not tell them you are using it.
An implementation in C
http://dsrg.mff.cuni.cz/~holub/sw/shash/
[Removed no longer relevant BibSonomy text]
Here you can find the full Java source code. It's very simple, and a demo is also provided. http://aneurone.blogspot.com/2012/09/simhash.html
In the meantime, the hash4j library has added a SimHash implementation in Java. It also has a FastSimHash implementation, which is up to 10x faster thanks to a bit hack described in this blog post.
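If you just want to see the basic idea without pulling in a library, here is a minimal plain-Java sketch. It is not the hash4j API and not the code from the links above. The class name `SimHashSketch`, the whitespace tokenization, the equal weight of 1 per token, and the choice of 64-bit FNV-1a as the per-token hash are all my own assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical illustration class; not part of any library.
final class SimHashSketch {

    // 64-bit FNV-1a hash of a token (assumed per-token hash; any good 64-bit hash works).
    static long fnv1a64(String token) {
        long hash = 0xcbf29ce484222325L;          // FNV offset basis
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;               // FNV prime
        }
        return hash;
    }

    // Computes a 64-bit SimHash fingerprint over whitespace-separated tokens.
    static long simHash(String text) {
        int[] counts = new int[64];
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;                         // skip artifacts of leading whitespace or empty input
            }
            long h = fnv1a64(token);
            for (int i = 0; i < 64; i++) {
                // Each token votes +1 for a set bit and -1 for a clear bit.
                counts[i] += ((h >>> i) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int i = 0; i < 64; i++) {
            if (counts[i] > 0) {
                fingerprint |= (1L << i);         // bit is set when the votes are positive
            }
        }
        return fingerprint;
    }

    // Number of differing bits between two fingerprints; small means similar inputs.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        long a = simHash("the quick brown fox jumps over the lazy dog");
        long b = simHash("the quick brown fox jumped over the lazy dog");
        long c = simHash("completely unrelated text about database indexing");
        System.out.println("distance(a, b) = " + hammingDistance(a, b));
        System.out.println("distance(a, c) = " + hammingDistance(a, c));
    }
}
```

Near-duplicate texts should usually end up with a smaller Hamming distance than unrelated ones, but with so few tokens the numbers are noisy. In practice you would hash shingles (word n-grams) and weight them, which this sketch leaves out.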
According to this page, you should ask the developers of BibSonomy.