Introduction to Strong Cryptography – p1.0 – Hash functions, US patriots

Posted: June 7, 2011 in cryptography
Tags: , , ,

This article is one of many in my Strong Cryptography series. Today we’ll be covering one of the most useful Strong Cryptography primitives: The Hash function. These little beasts are amazing and have a multitudes of uses!

So, what is a hash function? Glad you asked, lets quote the unabated Wikipedia:

A hash function is any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array (cf. associative array). The values returned by a hash function are called hash values, hash codes, hash sums, checksums or simply hashes.

Sigh, Wikipedia used to have nice user-friendly definitions for common folk, since that’s not the case, let me  give my own definition:

A hash function takes some input bytes, then calculates a fixed value that identifies that sequence of  input bytes. Sorta like a checksum… Wait, a checksum IS a hash function! Cool!

-Jonathan on cryptography.

A (very bad) hash function could be as simple as “divide everything by two until you have one digit left, rounding up as you go along.” We could this function ƒ (ƒ stands for ƒailure of a hash function), and so ƒ(48) = 6 or ƒ(451) = 8.

A simple way to use this function ƒ is to protect a website. When a user creates an account, they choose a number as password (p). During the account creation process we calculate a new number (c = ƒ(p)) and store it for the user. We end up with (c) number, 0-9,  in a database for each user.

To check a supplied password (s) against the database, we simply evaluate if (c ==ƒ(s)).

This hash function ƒ isn’t really very useful in the real world, as our keyspace is only 9 possibilities, it’s scales horribly, and it’s trivial to reverse engineer, but if we substitute fail function ƒ for something better, we end up with a secure system!

Technical Properties of an Ideal Hash Algorithm

  • It should represent a true many-to-one function. It should take an input, and produce an identity output that can be reproduced exactly every time. For every plaintext, there is exactly one hash value. For every hash value, there are infinite plaintexts.
  • The input plaintext can be any length. Yes, but see the next point:
  • The output hash value is always the same size, meaning it’s fixed length. So if you gave the algorithm a plaintext of 50 bytes or 765kbytes, the length of the hash value (the output) would as be say, 16 bytes. The exact length of the output depends on the algorithm design.
  • The always produce the same hash value (output) for the same plaintext (input). So if you hash the quote, “I go on the principle that a public debt is a public curse” you should get the value, vjPWnnLiB0BLrqUuJjtpEM5KClg= every time. This is called identifying, fingerprinting, or hashing the plaintext.
  • A small change in the plaintext will result in a dramatic change to the hash value. So if you change to Madison’s quote to “I go on the principle that a public debt is not a public curse” the hash value is now: g+8o7vlQAGjvlst+XsOEwIzF0Qc=.
  • Permanent and “one way”, meaning that the hashValue cannot be reversed to plaintext. 
  • Given a particular hashValue, it is extremely difficult to find a plaintext that will produce that desired hashValue. These last two points run together… Basically, our function ƒ is far from ideal. Lets say you have (p = 5). It’s very easy to calculate an input (s) that would produce (5 = ƒ(s)). You could use any of (s = { 10, 20, 40} ). An ideal function should make this extremely difficult!!!

SHA1, the standard hash algorithm

SHA-1 (Secure Hash Algorithm – 1) is a hash algorithm that is quite complex, and is known to be secure at the time of this writing, for most applications (the paranoid should use SHA-256 or SHA-512). We can supply any length plaintext to SHA-1 and receive a consistent 20 byte hash value in return. Here’s how to do this is Ruby:

exabrial@arclight:~$ irb
>> require 'base64'
=> true
>> require 'openssl'
=> true
>> shaBytes = Digest::SHA1.digest('Give me liberty, or give me death!')
=> "\367\241\266`30 \2656\214G343\266\276\204\225cB\370\r"
>>  puts Base64.encode64(shaBytes)
=> nil
>> exit

You can use an online hash calculator to check to see if Patrick Henry’s quote was hashed correctly. Go ahead, I’ll wait…

SHA-1(Base64)    : 96G2YBggtTaMRxwztr6ElWNC+A0=
If we make a slight change to the plaintext, lets say: “Give me liberty, or let me eat cake!“, we will get a dramatically different output:
SHA-1(Base64)    : M2vPzwTPc7ALM+OkiGAwCkS1DY4=

Actually, just changing the case of the first character “give me liberty, or give me death!” will produce a completely different hashValue:

SHA-1(Base64)    : g1UFdWJfXWfkIVz42uLLxrJv58g=

Hash Algorithm Input is indeed infinte

Remember how we said the input to the hash function can be infinite? How about the entire text of The Declaration of Independence (12kb)? If you upload the file to the online calculator we’ve been using, the result is still 20 bytes:

SHA-1(Base64): mGnTR5dnrXrMqEMVoLKCMzWEHjU=

Here’s a quick Java program to crunch the same calculation:

import java.nio.CharBuffer;
import org.apache.commons.codec.binary.Base64;

public class SHA1Hasher {
	public static void main(String[] args) throws Exception {
		MessageDigest md = MessageDigest.getInstance("SHA1");
		InputStreamReader insr = new InputStreamReader(SHA1Hasher.class.getResourceAsStream("/usdeclar.txt"));
		CharBuffer charBuff = CharBuffer.allocate(16000);
		String plainText;
		byte[] hashBytes;
		String hashValue;;

		plainText = charBuff.flip().toString();
		hashBytes = md.digest(plainText.getBytes());
		hashValue = Base64.encodeBase64String(hashBytes);
		System.out.println("SHA-1(Base64): " + hashValue);

Real World Usage of Hash Algorithms, Scenario 1: Storing account passwords in a database in a secure manner

You’ve been tasked with creating a website for the American colonial resistance. The British are abound and are randomly inspecting everyones’ databases for suspicious activity. The Founding Fathers have an unfortunate habit of choosing patriotic passwords like “I<3America”. How can mask passwords from British patrols so the secrets of who is loyal to The Crown, and who is a revolutionary is kept secret?

The best thing to do is created a ‘salted password table’ like this:

tjefferson43 HiE0AZhRWAs6Mmd7dVqppM1WtJc= WTg68cVxPI

What we will do is store the password hash, plus some random text called a salt (to add entropy to the user’s password). So Thomas Jefferson’s original password: I<3America now becomes I<3AmericaWTg68cVxPI and is hashed permanently to:  HiE0AZhRWAs6Mmd7dVqppM1WtJc=.

So how do you process a login, now that the plaintext is gone? You perform the same calculation: Take the username from the login form, and lookup the salt. Then take the plaintext password from the login form and concatenated the salt from the database to reform the original string. Hash that string, and if matches the database hash, you know the correct password was supplied! Jefferson’s original patriotic password is completely masked, so those pesky Tories will never know his true allegiance! Yarr!

But what if another patriot (Paul Revere) chooses the exact same password, but is caught by the British? Couldn’t the British Regulars then just infer that anyone with the same password hash is also a patriot (and be hung)?

Because we added salt (entropy) to the plaintext password, we’re guaranteed to never have the same hash twice in our database. Lets say Paul Revere creates an account with the same password: I<3America, but we followed the required practice of using SecureRandom to create a new salt:

tjefferson43 HiE0AZhRWAs6Mmd7dVqppM1WtJc= WTg68cVxPI
paulrevere1776 aktJ/0cn69Y41vssNfZDHY1CsdE= sWVUaTGa6e

So Paul Revere’s password I<3America becomes I<3AmericasWVUaTGa6e, which is hashed to: aktJ/0cn69Y41vssNfZDHY1CsdE= and stored. Later, when Paul Revere was caught and was revealed to be a patriot, Jefferson’s secret is kept safe by salting and hashing.

To summarize the benefits:

  • A database breach means your user accounts are still secure.
  • It’s impossible to tell if two people have the same password.
  • There are no restrictions to the length of a user’s password.
  • Your database column can be a fixed length instead of a varchar.
  • SHA1 was created by people who are smarter than you (and I). You can deflate your ego and leverage their brains.
Sidenote: The salt column is extremely important! Never ever, store, or send, a hash without a salt! Furthermore, the salt value must be securely generated with a random generator. You must use your language’s SecureRandom generator! You’ve been warned on these two points: they are CRITICALLY important (I can’t stress this enough)!

Real World Usage of Hash Algorithms, Scenario 2: Noah Webster

Yes, Noah Webster was an American revolutionary: He was a member of Connecticut militia, an editor of the Federalist newspaper, the author of American copyright,  among many other things. He went on later in life to write some dictionary thing, but the important part here is he was a real hard ass when it came to teaching.

So say Noah Webster has hired you to write a program to test his students spelling. Because he so adamantly believed in spelling correctly, it’s important each student get 100%. Knowing this is a bit unrealistic though, he will allow the students to retake the test as many times as they want until they get 100%. The test consists of Mr. Webster dictating a small passage of text to the class and having them enter it correctly into your program. The program should tell them whether they spelled everything correctly. This will allow them to figure out the correct answer before turning in their final answer.

Another important piece of American history was that Noah Webster was a closet Ruby fan; you have to write this in Ruby. This presents a problem, because the students will be running your program, you can’t just embed the correct answer in your program, or they could look at the source code.

By using a hash, you can embed the hash of the correct text in the program and compare the students input to the hash.

Mr. Webster begins the test by saying to his students:

Before a standing army can rule, the people must be disarmed; as they are in almost every kingdom of Europe. The supreme power in America cannot enforce unjust laws by the sword; because the whole body of the people are armed, and constitute a force superior to any bands of regular troops that can be, on any pretense, raised in the United States.

The correct hash is:  6+IHAAJ1i0KmoPaowU1i0xSjcO4=

Knowing what you know now, can you finish the program for Mr. Webster? Post your answer in the comments!

  1. Allan says:

    Nice intro article. Quick read. Thanks!

  2. Dear Jonathan,

    Very interesting article. It was nice to go back to the basics and understand how all these algorithms work.

    However, you mentioned that the “salt” should be stored along with the hashed password in the database. This is not needed, if you use libraries like Jasypt. For more details kindly have a look at the article

  3. autodidact says:

    I found your link via popurls. I clicked it because I love American history and was intrigued by the mention of patriots and hash functions! I learned something as I am not a programmer. As an autodidact, I thank you for a well written, concise and humorous article suited for those of us who are not well versed in cryptology.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s