How to customize the output of the Postgres Pseudo Encrypt function?

2.1k views Asked by At

I would like to use the pseudo_encrypt function mentioned a few times on StackOverflow to make my IDs look more random: https://wiki.postgresql.org/wiki/Pseudo_encrypt

How can I customize this to output unique "random" numbers for just me. I read somewhere that you can just change the 1366.0 constant, but I don't want to take any risks with my IDs as any potential ID duplicates would cause major issues.

I really have no idea what each constant actually does, so I don't want to mess around with it unless I get some direction. Does anyone know which constants I can safely change?

Here it is:

CREATE OR REPLACE FUNCTION "pseudo_encrypt"("VALUE" int) RETURNS int     IMMUTABLE STRICT AS $function_pseudo_encrypt$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
    l1:= ("VALUE" >> 16) & 65535;
    r1:= "VALUE" & 65535;
    WHILE i < 3 LOOP
        l2 := r1;
        r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
        r1 := l2;
        l1 := r2;
        i := i + 1;
END LOOP;
RETURN ((l1::int << 16) + r1);
END;
$function_pseudo_encrypt$ LANGUAGE plpgsql;

for bigint's

CREATE OR REPLACE FUNCTION "pseudo_encrypt"("VALUE" bigint) RETURNS bigint IMMUTABLE STRICT AS $function_pseudo_encrypt$
DECLARE
l1 bigint;
l2 bigint;
r1 bigint;
r2 bigint;
i int:=0;
BEGIN
    l1:= ("VALUE" >> 32) & 4294967295::bigint;
    r1:= "VALUE" & 4294967295;
    WHILE i < 3 LOOP
        l2 := r1;
        r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767*32767)::bigint;
        r1 := l2;
        l1 := r2;
        i := i + 1;
    END LOOP;
RETURN ((l1::bigint << 32) + r1);
END;
$function_pseudo_encrypt$ LANGUAGE plpgsql;
2

There are 2 answers

8
Daniel Vérité On BEST ANSWER

Alternative solution: use different ciphers

Other cipher functions are now available on postgres wiki. They're going to be significantly slower, but aside from that, they're better candidates for generating customized random-looking series of unique numbers.

For 32 bit outputs, Skip32 in plpgsql will encrypt its input with a 10 bytes wide key, so you just have to choose your own secret key to have your own specific permutation (the particular order in which the 2^32 unique values will come out).

For 64 bit outputs, XTEA in plpgsql will do similarly, but using a 16 bytes wide key.

Otherwise, to just customize pseudo_encrypt, see below:

Explanations about pseudo_encrypt's implementation:

This function has 3 properties

  • global unicity of the output values
  • reversability
  • pseudo-random effect

The first and second property come from the Feistel Network, and as already explained in @CodesInChaos's answer, they don't depend on the choice of these constants: 1366 and also 150889 and 714025.

Make sure when changing f(r1) that it stays a function in the mathematical sense, that is x=y implies f(x)=f(y), or in other words the same input must always produce the same output. Breaking this would break the unicity.

The purpose of these constants and this formula for f(r1) is to produce a reasonably good pseudo-random effect. Using postgres built-in random() or similar method is not possible because it's not a mathematical function as described above.

Why these arbitrary constants? In this part of the function:

r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;

The formula and the values 1366, 150889 and 714025 come from Numerical recipes in C (1992, by William H.Press, 2nd ed.), chapter 7: random numbers, specifically p.284 and 285. The book is not directly indexable on the web but readable through an interface here: http://apps.nrbook.com/c/index.html .It's also cited as a reference in various source code implementing PRNGs.

Among the algorithms discussed in this chapter, the one used above is very simple and relatively effective. The formula to get a new random number from a previous one (jran) is:

jran = (jran * ia + ic) % im;
ran = (float) jran / (float) im;  /* normalize into the 0..1 range */

where jran is the current random integer.

This generator will necessarily loop over itself after a certain number of values (the "period"), so the constants ia, ic and im have to be chosen carefully for that period to be as large as possible. The book provides a table p.285 where constants are suggested for various lengths of the period.

ia=1366, ic=150889 and im=714025 is one of the entries for a period of 229 bits, which is way more than needed.

Finally the multiplication by 32767 or 215-1 is not part of the PRNG but meant to produce a positive half-integer from the 0..1 pseudo-random float value. Don't change that part, unless to widen the blocksize of the algorithm.

4
CodesInChaos On

This function looks like a blockcipher based on a Feistel network - but it's lacking a key.

The Feistel construction is bijective, i.e. it guarantees that there are no collisions. The interesting part is: r2 := l1 # f(r1). As long as f(r1) only depends on r1 the pseudo_encrypt will be bijective, no matter what the function does.

The lack of key means that anybody who knows the source code can recover the sequential ID. So you're relying on security-though-obscurity.

The alternative is using a block cipher which takes a key. For 32 bit blocks there are relatively few choices, I know of Skip32 and ipcrypt. For 64 bit blocks there are many ciphers to choose from, including 3DES, Blowfish and XTEA.