I'm looking for a fast method to efficiently compute (a
⋅b
) modulo n
(in the mathematical sense of that) for a
, b
, n
of type uint64_t
. I could live with preconditions such as n!=0
, or even a<n && b<n
.
Notice that the C expression (a*b)%n
won't cut it, because the product is truncated to 64 bits. I'm looking for (uint64_t)(((uint128_t)a*b)%n)
except that I do not have a uint128_t
(that I know, in Visual C++).
I'm in for a Visual C++ (preferably) or GCC/clang intrinsic making best use of the underlying hardware available on x86-64 platforms; or if that can't be done for a portable inline
function.
7 years later, I got a solution working in Visual Studio 2019