Playfair Hillclimbing crack

4.1k views Asked by At

I'm writing a python script to crack a playfair cipher, with only the ciphertext. First i generate about 30-100 decryption keys and run them on the ciphertext, ranking each one on it's digraph frequencies. To the next 'generation'/iteration I copy the ones with the best score. Then they are mutated (letters swapped places in the 5x5 grid) and re-add to the next iteration, which is ranked again, and so on.

I've noticed that the script often finds a local maximum - a key giving a similar distribution, yet not the real deal. I think a solution to the problem would be to introduce more variation to the population of keys (by the end of the script, all of them are almost the same).

I tried to implement it by adding to each generation a couple totally random keys but they get eliminated almost immediately. What would be a better way of doing it? I've also thought of tactics like simulated annealing, but have no idea how much help they would be.

EDIT: Sample ciphertext as requested (key: playfair example)

['p', 'l', 'a', 'y', 'f']
['i', 'r', 'e', 'x', 'm']
['b', 'c', 'd', 'g', 'h']
['k', 'n', 'o', 'q', 's']
['t', 'u', 'v', 'w', 'z']

as el ir ul vi ne uz qk dm kz qe ca qe tb qc pv zb md nv om lo gv qo od er qc zg pv vk ov or iw zg ro nz ge ro af yp qe zi lo rk pr ad xl dl ix cl qr rk dq vu sa zb xv qe ho dm dn ok eb xe do bm iz kd de as kv ef kc rd lv om dm vy km ur et xe aq zb xe tx om rt gh rk hc fg mk py dr qo af zs xv nv ac df ad dl yr do bm ef pm zs lo ce yl ai ca nv ca fy wi dm ov ne tx zb bm kn ul bn ar km uz fo ka ro do gp lo kv dm ml qe zi lo rk pr ad xl tx zb le nv oc py dr lo ca le dx xa mo pr oi yp en dy oc dk zb as kv ix ol pr dr oq pb dr gb eo ak vg xe do df re zb pv nl cr do ya an ad iu dm re dm eo qm dm am pu ad xl nl er nv kz qn oq yg df pb uz fo ya ay dk vu lo gd ex ip ya bp up xv yf nv vk pz dm vq vo vk pr kz ro

2

There are 2 answers

3
Hyperboreus On

You mutate your generation, but you do not recombine it, if I read your question correctly. Maybe you can go about it like this:

  1. Generation 0 is random keys.
  2. Pick the n highest ranked keys, the rest dies.
  3. Crossbreed the survivors among themselves. A low percentage mutates on recombination.
  4. Rinse and repeat.

Recombination could look like this:

  1. Parents are A and B and they produce n offspring.
  2. For each offspring choose 0 <= x <= y < 25 (maybe with a constraint y - x > c). The offspring's key is A[:x] + B[x:y] + A[y:].
0
JeanClaudeDaudin On

The Hillclimbing algo principles applied to crypto works as follows:

  1. Make a key at random (a 25 chars alphabet make at ramdom) and determine fitness with your digraphs scoring. Call it the ‘parent’
  2. Make some change to the parent to produce a ‘child’ key (random swaps in the 5x5 grid keytable is not so bad) and measure its fitness with your digraphs scoring function.
  3. If the child is fitter than the parent, make the child the new parent and dispense with the old
  4. Go back to (2), unless no improvements in the last 1000 changes in which case go back to (1)

That's the way to avoid to get blocked in a local maximum of the digraphs scoring function.

If you want better results than Hillclimbing, you must develop a Simulated Annealing algorithm. Also a scoring function based on trigrams or 4-grams has less local maximums. If you want to crack it faster, you'd move from python to C/C++.

HillClimbing and Simulated Annealing algorithms can be used to crack Playfair ciphers as well as all other 5*5 grid based ciphers, and also simple substitution ciphers and Vigenere ciphers.

An important thing with Playfair cipher is that it's weak: all circular horizontal or vertical permutations of the 5x5 grid is an equivalent key. So the algos convergence for that cipher is faster.

To make the parent alphabet use the following:

alpha='ABCDEFGHIKLMNOPQRSTUVWXYZ'
used=[0]*25;parent=['']*25
for i in range(25):
    j=randrange(25)
    while used[j]:j=randrange(25)
    parent[i]=alpha[j];used[j]=1

The playfair deciphering function:

def DEplayfair(a,key):
l=[];order={}
for k in range(25):order[key[k]]=k
for i in range(0,len(a),2):
    ord1=order[a[i]]
    raw1=ord1//5
    col1=ord1%5
    ord2=order[a[i+1]]
    raw2=ord2//5
    col2=ord2%5
    if raw1==raw2:
        l.append(key[5*raw1 + (col1 + 4)%5])
        l.append(key[5*raw2 + (col2 + 4)%5])
    elif col1==col2:
        l.append(key[col1 + 5*((raw1 + 4)%5)])
        l.append(key[col2 + 5*((raw2 + 4)%5)])
    else:
        l.append(key[5*raw1 + col2])
        l.append(key[5*raw2 + col1])
return ''.join(l)

For the SA algos, you can find the basic principles of SA in: http://en.wikipedia.org/wiki/Simulated_annealing

The Playfair cipher was used by the US Coast Guards during WWII and there is a famous historical Playfair cipher: http://practicalcryptography.com/ciphers/playfair-cipher/

PS: The name of the main character in your playfair cipher example is Alice.