And now for something completely different. In a previous post, I said than no encryption algorithm is secure, but only resistive to attack. Well I lied. There is one, and only one, provably secure encryption technique. This technique, while wildly impractical, is a fun exercise, and was actually used to change world history during the Cold War.

So why are all encryption algorithms insecure? Simple, because they key is shorter than the data. Take this plaintext:

I believe that fear of life brings a greater fear of death.

-David Blaine

That’s 59 bytes of text, yet the typical size for an encryption key is usually 128bits, or 16 bytes, or 268.75% larger. That means at *some* point, a pattern must be used to expand the key over the plaintext. Encryption algorithms are designed to obfuscate patterns in the output, or make their intervals so large that recognition takes massive computation and memory. However, this work at MIT shows that entire ‘queries’ can be ran against fully encrypted data due to patterns existing in the output. My explanation is a gross oversimplification, but the takeaway here is that encrypted data differs significantly from random data.

One way to solve this problem is to make the encryption key larger than our plaintext. In 1917 two Americans, Gilbert Vernam (of Bell Labs) and Joseph Mauborgne (U.S. Army), had the same thought while following up on an earlier idea to secure telegraph systems. They hypothesized that if two machines shared a random sequence of numbers, and were separated by a great distance, they could communicate securely by combining one number from the plaintext, with one number with the random sequence, then at the receiving end, compensate for the random number. The result would be gibberish to an attacker, unless they knew the correct random number.

Later in the 1940s the US and the Soviets proved in separate efforts that this system was impossible to break. However, the system had one catch: if the random sequence was any way statistically predictable, it simply wasn’t secure. Hence the name “One Time Pad”. If the ‘Pad’ (originally a bunch of numbers typed out on a notepad) was used more than ‘One Time’, it could be compared against other messages, giving the output statistical value.

So how does this work in practice? First sample some entropy. My sample is: 5, 2, 8, 3, 6. The numbers in the sequence have no statistical meaning compared to another sample of entropy. If I shift my set down (4, 1, 7, 2, 5) or up (6, 3, 9, 4, 7), it’s still statistically meaningless, unless you knew something about the original data.

Now, take John Adams’ last plaintext (a toast actually):

Independence forever!

-John Adams (on 50 years of freedom)

Using the standard ASCII table, shifting every letter down would transform the quote to:

Hmcdodmcdmbdenqdudq

-John Adams (on the number 1)

Unlike shifting a random sequence, the letters are still statistically significant. In English, ‘e’ is the most letter. This reflects in the plaintext, where ‘e’ occurs 5 times. In the obfuscated quote, ‘d’ appears 5 times. An attacker could easily surmise that ‘d == e’

Now, lets combine the techniques: shift every letter of the plaintext down with a random number. I’ll use the following random sequence: 2, 4, 5, 0, 7, 1, 7, 6, 7, 6, 0, 7, 3, 2, 0, 7, 5, 0, 8, 2, 4

Gj_eidg^^hc^dok`v]p

-John Adams (on randomness)

Now, all the letters are statistically random. Since ‘^’ is now the most common character in the output, an attacker might guess ‘^ == e’ but would be incorrect. Without knowledge of the pad or plaintext, an attacker, even with unlimited computing power and time, cannot guess the actual message. Furthermore, one can supply a false pad and get a completely different message. For instance, applying this pad: 6, -4, -21, -4, 3, -1, -11, -20, -17, -5, 2, -9, -81, -1, -5, 2, -3, 21, -15, 4 to the ciphertext yields an entirely new quote:

Antiferromagnetically

-John Adams (on magnetism)

In the real world, XOR is used rather than adding/subtracting for a myriad of reasons, the two biggest being: it happens to work with binary data, and it covers the statistical significance of the alphabet up nicely. The real world also requires that pads be kept secret and only used once.

In conclusion, the One Time Pad is the only provably secure encryption algorithm, provided than the Pad is completely unpredictable. But with that security comes many impracticalities: distribution of pads, pad length, and pad destruction.

Look forward to a few more cryptography posts, then we will resume course on development topics!

public static void main(String[] args) { String quote = "Independence forever!"; int[] entropy = new int[] { 2, 4, 5, 0, 7, 1, 7, 6, 7, 6, 0, 7, 3, 2, 0, 7, 5, 0, 8, 2, 4 }; char[] chars = quote.toCharArray(); for (int i = 0; i < chars.length; i++) { System.out.print((char) ((int) chars[i] - entropy[i])); } }

Encode program

public static void main(String[] args) { String quote = "Gj_eidg^^hc^dok`v]p"; int[] entropy = new int[] { 2, 4, 5, 0, 7, 1, 7, 6, 7, 6, 0, 7, 3, 2, 0, 7, 5, 0, 8, 2, 4 }; char[] chars = quote.toCharArray(); for (int i = 0; i < chars.length; i++) { System.out.print((char) ((int) chars[i] + entropy[i])); } }

Decode program

A helpful source and further reading: http://en.wikipedia.org/wiki/One-time_pad