I have tried to implement a concurrent version of Dixon's algorithm, with poor results. For small numbers (under about 40 bits), it takes about twice as long as the other implementations in my class, and beyond about 40 bits it takes far longer.
I've done everything I can, but I fear it has some fatal issue that I can't find.
My code (fairly lengthy) is located here. Ideally the algorithm would work faster than non-concurrent implementations.
Why would you think it would be faster? Spinning up threads and adding synchronized calls are HUGE time sinks. If you can't avoid the synchronized keyword, I highly recommend a single-threaded solution.
You may be able to avoid them in various ways: for instance, by ensuring that a given variable is only written by one thread even if it is read by others (readers still need the write to be safely published, e.g. via volatile or Thread.join()), or by acting like a functional language, making all your variables final and using recursion for variable storage (iffy; it's hard to imagine this would speed anything up).
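As a rough sketch of the "only one thread writes a given variable" idea: each worker below writes only to its own slot of an array, and the main thread reads the slots only after join(), so nothing needs to be synchronized. Everything here is made up for illustration (the class name PartitionedSmoothCounter, the tiny factor base, the method names). It is not the asker's code and not a full Dixon's implementation; it only counts values x where x*x mod n is smooth, which is the kind of independent search that partitions cleanly.

```java
/** Hypothetical example: every worker thread writes only its own array slot. */
class PartitionedSmoothCounter {
    // Illustrative factor base only; a real run of Dixon's algorithm would pick this from n.
    private static final int[] FACTOR_BASE = {2, 3, 5, 7};

    private final long n;

    PartitionedSmoothCounter(long n) {
        this.n = n;
    }

    /** True if v factors completely over FACTOR_BASE. */
    boolean isSmooth(long v) {
        if (v == 0) return false;
        for (int p : FACTOR_BASE) {
            while (v % p == 0) v /= p;
        }
        return v == 1;
    }

    /** Single-threaded count of x in [from, to) with x*x mod n smooth. */
    long countRange(long from, long to) {
        long count = 0;
        for (long x = from; x < to; x++) {
            // x*x fits in a long only while x stays below about 3 * 10^9.
            if (isSmooth((x * x) % n)) count++;
        }
        return count;
    }

    /** Same count, split across threads; no synchronized blocks or locks. */
    long countParallel(long from, long to, int threads) {
        long[] partial = new long[threads]; // slot i is written by worker i only
        Thread[] workers = new Thread[threads];
        long chunk = (to - from + threads - 1) / threads;
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            final long lo = Math.min(to, from + slot * chunk);
            final long hi = Math.min(to, lo + chunk);
            workers[i] = new Thread(() -> partial[slot] = countRange(lo, hi));
            workers[i].start();
        }
        long total = 0;
        try {
            for (int i = 0; i < threads; i++) {
                workers[i].join(); // join() guarantees the worker's write is visible here
                total += partial[i];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return total;
    }
}
```

Even with no locking at all, creating the threads has a fixed cost, so for small ranges countRange will likely still beat countParallel. That matches the slowdown described in the question for small inputs.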
If you really need to be fast, however, I did find some very counter-intuitive things out recently from my own attempt at finding a speedy solution...
What I've been able to intuit is that the compiler is extremely smart at optimizing and is tuned to optimize "ideal" Java code. Static methods are nowhere near ideal; they are kind of an anti-pattern, and that was one of the most counter-intuitive things I found.
I suggest you write the clearest, best OO code you can that actually runs correctly as a reference--then time it and start attempting tweaks to speed it up.
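A minimal sketch of that "reference first, then time the tweaks" loop, assuming each implementation can be wrapped as a function from long to long (say, a number in and one factor out). The class name TweakTimer and its methods are hypothetical. System.nanoTime() with a few warm-up runs only gives a crude measurement; a proper harness such as JMH is the better tool once the numbers start to matter.

```java
import java.util.function.LongUnaryOperator;

/** Hypothetical helper: check a tweak against the reference, then time both. */
class TweakTimer {
    private final int warmupRuns;
    private final int timedRuns;
    private long sink; // accumulates results so the JIT cannot discard the calls

    TweakTimer(int warmupRuns, int timedRuns) {
        this.warmupRuns = warmupRuns;
        this.timedRuns = timedRuns;
    }

    /** True if candidate returns the same value as reference for every input. */
    boolean agree(LongUnaryOperator reference, LongUnaryOperator candidate, long[] inputs) {
        for (long input : inputs) {
            if (reference.applyAsLong(input) != candidate.applyAsLong(input)) {
                return false;
            }
        }
        return true;
    }

    /** Best wall-clock time, in nanoseconds, of one call to f after warm-up. */
    long bestNanos(LongUnaryOperator f, long input) {
        for (int i = 0; i < warmupRuns; i++) {
            sink += f.applyAsLong(input); // give the JIT a chance to compile f
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            sink += f.applyAsLong(input);
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }
}
```

The idea is to call agree(...) first, so a tweak that breaks correctness is thrown out before its speed is even looked at, and then compare bestNanos(...) for the reference and the tweak on the same input.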