
What is the secret to good architecture? While there are plenty of posts written about this topic on the internet, I found this article on DZone (originally posted here) intriguing.

The article is written by Niklas Schlimm, whom I do not know personally. The article can be summarized with three points:

Three [proposed] Laws of Good Software Architecture

  1. Good architecture is one that enables architects to have a minimum of irrevocable decisions.
  2. To make decisions revocable you need to design for flexibility.
  3. To make use of flexibility one needs to refactor mercilessly.
CYA Engineering?
  1. What I think Niklas is trying to say is: you will get it wrong, and sadly, you’ll get it wrong most of the time, so design an architecture where you can fix your mistakes with a couple of swaps. Is designing an architecture that allows you to revoke your decisions at whim really fair? Does the CEO get the same privilege?
  2. Flexible is not durable. They don't build buildings out of rubber. We've probably all worked on over-engineered legacy systems that were "flexible." Most of the time, good intentions turn into dancing circles around libraries and frameworks in an unholy ritual of abstraction.
  3. You should be refactoring mercilessly, all the time. But do you need 10 layers of abstraction to do this? Do revocable decisions allow you to revoke custom code?
Alternative laws

Let's address the real problem here. Code doesn't age or stop working; it's humans who forget how things work, forget why decisions were made, and become disenchanted with the past. The input to the system changes, usage levels increase, and people leave the organization. The only real constant is the code itself!

Given that, I present:

Laws of software developed by Humans

  1. Humans cannot predict where software will be extended in the future. Instead trust in YAGNI and dependency injection.
  2. Humans cannot predict where software is slow. Instead we have profilers and the three rules of optimization.
  3. Humans cannot predict when or how business direction will change, how the economy will shift, or what future technologies will arrive. That's why we have iterative development: Agile, Scrum (http://en.wikipedia.org/wiki/Scrum_(development)), and XP.
Refuting the laws given with the alternative laws
  1. Good architecture is one that enables architects to have a minimum of irrevocable decisions.
    Alternative law #3 says you won’t be able to predict which decisions are irrevocable. Instead of designing an architecture with revocable everything, design the minimum possible architecture that barely scrapes by to get the job done.

    Instead of developing your own bash build script, use Maven, and don't do anything silly with it. "mvn clean install" and "mvn clean deploy" should be all the commands you need to send your artifact onward.

    Instead of getting locked into a particular open source project or vendor, use JEE6 standards like JSF2.0, JPA2.0, JMS, and EJB3.1, or commit to using Guava and Guice. Simplicity is key; less is better. Using tools the way they were intended saves time.
  2. To make decisions revocable you need to design for flexibility.
    Alternative law #1 says you stink at designing for flexibility, so don’t do it. Rather than designing for flexibility, design an exit strategy document for each technology you use. This includes frameworks built in house: how will you exit from those once they become legacy code? (The best way is to abstain from writing them.)

    Alternative law #2 also applies here: don't use obscure technology like a NoSQL database until you have actually determined, by profiling, that you need one. If you've truly designed minimum architecture and opted for simplicity, it will be simple to swap out your backing store for a different database. It's good to remember that you are not Google. If you design simple architecture, you'll have plenty of time later on to switch to NoSQL when it's needed, rather than rewriting a thousand lines of Redis-specific JSON handling code upfront.
  3. To make use of flexibility one needs to refactor mercilessly.
    Given alternative laws 1, 2, and 3, this brings old law #3 into new light. You should refactor mercilessly and eliminate technologies. Gauge yourself: how many new technologies did you bring in versus how many exit plans did you execute?

    Develop a master engineering architecture plan. Every release should bring you closer to the master design, at which point you should update your exit plans and then update your master architecture plan. What technologies could minimize the architecture? Evaluate them… do they require glue code to be written? What frameworks can we exit from, and what technologies aren't working out? Get rid of them, quickly. What in-house utility projects (aka future "legacy code") have been written? Open source them, replace them, or better yet, observe that YAGNI has manifested itself.
Conclusion, flame wars

When in doubt, don't write it. Change your processes so you can use a tool without modification. Don't dance around frameworks with unnecessary abstraction. Above all else, simplicity is key!

Finally, remember this is an attack on complicated software, not the author of the original post. I’ve enjoyed reading his blog. I look forward to your comments!


This article is one of many in my Strong Cryptography series. Today we'll dive deeper into some hashing topics. I'm also taking suggestions for subjects in my posts… Last time it was US Founding Fathers; this week it's tragedies.

We learned in our last article what a hash function is and how, for every input, there is exactly one corresponding output. We're going to apply that concept to two common situations where confidentiality is needed.

Common Situation #1: A situation in Verona

Let's say Romeo needs to email the following message to Mercutio: "I dreamt a dream tonight." Unfortunately for Romeo, all the ISPs in Verona are owned by the Montague family (not much better than Comcast). Romeo doesn't mind if the Montagues read his message, but he certainly has an issue if they change it. How can Romeo be sure his message was delivered without being modified by the Montague family?

Well, you could run the message through a hash function and attach the hash to the message. The Montagues, however, could simply modify the message and attach a new hash value. To get around that, I suppose Romeo could calculate the hash and then tell Mercutio in private what the hash value was. However, this would defeat the purpose of emailing him, since in private he could just tell Mercutio the contents of the message.

Any ideas?

Introducing the new MAC (not what you think)

Fortunately, there is an easy solution: Message Authentication Codes or MAC. A MAC function is a ‘keyed hash function’, so you could call it a cousin of a normal hash function. In fact, MAC functions and hash functions are nearly interchangeable! In the last article, we covered the properties of a hash function. So what makes a MAC function different from a hash function?

  • You must possess a secret key to produce a hash value.
  • A MAC function shouldn’t reveal ‘hints’ about the secret key, even if an attacker chooses the input.
Because you must possess a secret key to produce the hash value, they are often called "keyed hash functions." So let's run the quote "The lady doth protest too much, methinks" through a MAC function:
import java.security.Key;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Base64;

public class MacExample {
	private static final byte[] keybytes = new byte[] { (byte) 0xfc, (byte) 0x4f, (byte) 0xbe, (byte) 0x23,
			(byte) 0x59, (byte) 0xf2, (byte) 0x42, (byte) 0x37, (byte) 0x4c, (byte) 0x80, (byte) 0x44, (byte) 0x31,
			(byte) 0x20, (byte) 0xda, (byte) 0x20, (byte) 0x0c };

	public static void main(String[] args) throws Exception {
		// HMACSHA1 is a keyed hash: the same input with a different key gives a different value
		Mac mac = Mac.getInstance("HMACSHA1");
		Key key = new SecretKeySpec(keybytes, "HMACSHA1");
		mac.init(key);
		byte[] hashValue = mac.doFinal("The lady doth protest too much, methinks".getBytes());
		String encodedHashValue = Base64.encodeBase64String(hashValue);
		System.out.println(encodedHashValue);
	}
}

The above program produces this output:

oOUKVR5hDRh4n0H+beVO4JvMw64=

If you use a different key (let's change the last byte to 0x0d) you'll get a completely different hash:

cDdJwEBm9qIni9A7QfIQ1e9G8qo=

So how does this apply to Romeo and Mercutio? First, they meet in private and create a secret key; both Romeo and Mercutio will know it. Next, when one of them wants to send an email, he runs the message through the MAC function and produces a signature. Finally, the recipient runs the message through the same MAC function, using the same key. If the calculated signature matches the signature on the message, he can be assured the message was not tampered with in transit.
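
Here is roughly what the check on the receiving end might look like. This is a minimal sketch assuming the same HMACSHA1 key as the program above; the isAuthentic helper is my own name for it, and MessageDigest.isEqual is used so the comparison doesn't leak timing information.

import java.security.Key;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class MacVerifier {
	// Hypothetical helper: returns true only if the signature matches the message
	public static boolean isAuthentic(byte[] keyBytes, String message, byte[] signature) throws Exception {
		Mac mac = Mac.getInstance("HMACSHA1");
		Key key = new SecretKeySpec(keyBytes, "HMACSHA1");
		mac.init(key);
		byte[] expected = mac.doFinal(message.getBytes());
		// Constant-time comparison; a naive byte-by-byte equals can leak timing information
		return MessageDigest.isEqual(expected, signature);
	}
}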

A MAC is actually just a wrapper

Take note of the following line in the above implementation:

Mac mac = Mac.getInstance("HMACSHA1");

You may remember from the last article that SHA1 is a fairly standard hash algorithm. Well, here's the whole story: a MAC function is just a wrapper around a standard hash function. The HMAC part of HMACSHA1 is the method of wrapping. There are other MAC constructions, such as VMAC, but they are less commonly used (VMAC performs a little better than HMAC). Since HMAC and VMAC are just wrappers, you can take almost any hash function and turn it into a keyed hash function. Using SHA1's bigger siblings, we can run the same quote through the program:

  • HMACSHA256: POP9cefgoEC9pUaWXqQ8lBbW9CdMi1k3t7LXAGYl87s=
  • HMACSHA512: ALFIpPMbphQJ7KZQHacGIFH3T2qI5AKUqaD8lDilNnjajGL29cZdp68sLeQTjDKD+cIAfZN86udfRdecGeTm0A==
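
Since HMAC simply wraps whichever hash you name, switching algorithms is a one-line change to the earlier program. Here's a minimal sketch, assuming the outputs above were produced with the same 16-byte key (swap in "HMACSHA512" for the longer value):

import java.security.Key;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Base64;

public class HmacSha256Example {
	public static void main(String[] args) throws Exception {
		byte[] keybytes = new byte[] { (byte) 0xfc, (byte) 0x4f, (byte) 0xbe, (byte) 0x23, (byte) 0x59, (byte) 0xf2,
				(byte) 0x42, (byte) 0x37, (byte) 0x4c, (byte) 0x80, (byte) 0x44, (byte) 0x31, (byte) 0x20, (byte) 0xda,
				(byte) 0x20, (byte) 0x0c };
		// Only the algorithm name changes; HMAC wraps whichever hash you ask for
		Mac mac = Mac.getInstance("HMACSHA256");
		mac.init(new SecretKeySpec(keybytes, "HMACSHA256"));
		byte[] hashValue = mac.doFinal("The lady doth protest too much, methinks".getBytes());
		System.out.println(Base64.encodeBase64String(hashValue));
	}
}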

Common Situation #2: Revisiting storing passwords in a database

If you recall in the last article, we decided on the following table structure:

USERNAME        PASSWORD_HASH                  SALT
tjefferson43    HiE0AZhRWAs6Mmd7dVqppM1WtJc=   WTg68cVxPI
paulrevere1776  aktJ/0cn69Y41vssNfZDHY1CsdE=   sWVUaTGa6e

Let's add one additional column:

USERNAME        PASSWORD_HASH                  SALT        MAC_KEY
tjefferson43    RFu51fVI/0y8gmPXkjo0Op9FRHU=   WTg68cVxPI  KvzpFe690vsVcW1jLqQ1vg==
paulrevere1776  SmZedewg1JD7kRhndEW2cRmcyGw=   sWVUaTGa6e  FgIpIDFi9kJgXvkp44ua4Q==

So what did we do here? Instead of using a straight SHA-1 hash to store a representation of the user's password, we now use HMACSHA1, keyed with a per-user MAC_KEY. What does this do for you? Not much. The better question is what it does for hackers… it makes their lives extremely difficult!
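
Here's a rough sketch of how one of those rows might be produced. The helper names are mine, and I'm assuming the password and salt are concatenated exactly as in the plain SHA-1 scheme from the last article; the important part is that each user gets a freshly generated MAC key stored alongside the salt.

import java.security.Key;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import org.apache.commons.codec.binary.Base64;

public class PasswordRecords {
	// Hypothetical helper: returns the Base64 PASSWORD_HASH for one user
	public static String hashPassword(String password, String salt, Key macKey) throws Exception {
		Mac mac = Mac.getInstance("HMACSHA1");
		mac.init(macKey);                                          // keyed with this user's MAC_KEY
		byte[] hash = mac.doFinal((password + salt).getBytes());  // same concatenation as the plain SHA-1 scheme
		return Base64.encodeBase64String(hash);
	}

	public static void main(String[] args) throws Exception {
		// Generate a fresh secret key for a new user; store it Base64-encoded in the MAC_KEY column
		Key macKey = KeyGenerator.getInstance("HMACSHA1").generateKey();
		System.out.println("MAC_KEY: " + Base64.encodeBase64String(macKey.getEncoded()));
		System.out.println("PASSWORD_HASH: " + hashPassword("I<3America", "WTg68cVxPI", macKey));
	}
}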

A MAC defends you from hackers (again, not what you’re thinking)

In our original table setup, a hacker could use two forms of attack to crack our passwords: a rainbow table attack (yes, it's really called that… no, I don't know why) or a brute force attack (not to be confused with a Bruce force attack).

A rainbow attack uses a multi-terabyte database to do a reverse lookup on our hash algorithm. The database is generated ahead of time, possibly by cooperating parties and thousands of separate computers dividing up the workload. Even with the combined effort, it still takes months to generate useful tables. However, with a finished rainbow table, an attacker can look up a hash output to get an arbitrary input that produces the same hash as the user's real password. Yikes! The nice thing is, using HMACSHA1 makes this infeasible for a hacker. Because we have keyed our hash algorithm differently for every user's password, attackers would have to regenerate the rainbow table for every individual entry, making the rainbow attack very, very difficult in practice!

A brute force attack is related to the rainbow attack. A brute force attack tries random combinations of letters and numbers, or even common English words, until it finds an input that produces the correct output. Unfortunately, HMACSHA1 provides little additional security against brute force attacks. However, because they require powerful computing systems to run, they are less common. There are only two ways to defend against brute force attacks:

  • Enforce good policies around passwords:
    • 12 characters
    • No English words
    • At least 2 numbers
    • At least 2 symbols
    • At least 2 uppercase letters
  • Use SHA-512 or SHA-256 instead of SHA-1.

If you've enjoyed this article, all that I ask is that you vote me up on DZone and Hacker News. Until next time, thank you for reading!

This article is one of many in my Strong Cryptography series. Today we'll be covering one of the most useful Strong Cryptography primitives: the hash function. These little beasts are amazing and have a multitude of uses!

So, what is a hash function? Glad you asked; let's quote the unabated Wikipedia:

A hash function is any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array (cf. associative array). The values returned by a hash function are called hash values, hash codes, hash sums, checksums or simply hashes.

http://en.wikipedia.org/wiki/Hash_function

Sigh. Wikipedia used to have nice user-friendly definitions for common folk. Since that's no longer the case, let me give my own definition:

A hash function takes some input bytes, then calculates a fixed value that identifies that sequence of  input bytes. Sorta like a checksum… Wait, a checksum IS a hash function! Cool!

-Jonathan on cryptography.

A (very bad) hash function could be as simple as "divide everything by two until you have one digit left, rounding up as you go along." We could call this function ƒ (ƒ stands for ƒailure of a hash function), and so ƒ(48) = 6 and ƒ(451) = 8.

A simple way to use this function ƒ is to protect a website. When a user creates an account, they choose a number as a password (p). During the account creation process we calculate a new number (c = ƒ(p)) and store it for the user. We end up with a number (c), 0-9, in the database for each user.

To check a supplied password (s) against the database, we simply evaluate if (c == ƒ(s)).

This hash function ƒ isn't really very useful in the real world: our keyspace is only 9 possibilities, it scales horribly, and it's trivial to reverse engineer. But if we substitute the fail function ƒ for something better, we end up with a secure system!
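
Just to make the toy scheme concrete, here is ƒ and the login check written out in Java; the class and method names are mine, purely for illustration:

public class FailureHash {
	// ƒ: divide by two until one digit is left, rounding up as we go
	static long f(long input) {
		long value = input;
		while (value > 9) {
			value = (value + 1) / 2; // divide by two, rounding up
		}
		return value;
	}

	public static void main(String[] args) {
		long c = f(48);                // stored at account creation: ƒ(48) = 6
		long s = 48;                   // password supplied at login
		System.out.println(c == f(s)); // the login check: c == ƒ(s)
	}
}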

Technical Properties of an Ideal Hash Algorithm

  • It should represent a true many-to-one function. It should take an input and produce an identical output that can be reproduced exactly every time. For every plaintext, there is exactly one hash value. For every hash value, there are infinite plaintexts.
  • The input plaintext can be any length. Yes, but see the next point:
  • The output hash value is always the same size, meaning it's fixed length. So whether you gave the algorithm a plaintext of 50 bytes or 765 kilobytes, the length of the hash value (the output) would be, say, 16 bytes. The exact length of the output depends on the algorithm design.
  • It always produces the same hash value (output) for the same plaintext (input). So if you hash the quote "I go on the principle that a public debt is a public curse" you should get the value vjPWnnLiB0BLrqUuJjtpEM5KClg= every time. This is called identifying, fingerprinting, or hashing the plaintext.
  • A small change in the plaintext will result in a dramatic change to the hash value. So if you change Madison's quote to "I go on the principle that a public debt is not a public curse" the hash value is now g+8o7vlQAGjvlst+XsOEwIzF0Qc=.
  • Permanent and “one way”, meaning that the hashValue cannot be reversed to plaintext. 
  • Given a particular hashValue, it is extremely difficult to find a plaintext that will produce that desired hashValue. These last two points run together… Basically, our function ƒ is far from ideal. Let's say you have (p = 5). It's very easy to calculate an input (s) that would produce (5 = ƒ(s)): you could use any of (s = { 10, 20, 40 }). An ideal function makes this extremely difficult!

SHA1, the standard hash algorithm

SHA-1 (Secure Hash Algorithm 1) is a hash algorithm that is quite complex and is, at the time of this writing, known to be secure for most applications (the paranoid should use SHA-256 or SHA-512). We can supply any length plaintext to SHA-1 and receive a consistent 20-byte hash value in return. Here's how to do this in Ruby:

exabrial@arclight:~$ irb
>> require 'base64'
=> true
>> require 'openssl'
=> true
>> shaBytes = Digest::SHA1.digest('Give me liberty, or give me death!')
=> "\367\241\266`\030 \2656\214G\0343\266\276\204\225cB\370\r"
>>  puts Base64.encode64(shaBytes)
96G2YBggtTaMRxwztr6ElWNC+A0=
=> nil
>> exit

You can use an online hash calculator to check to see if Patrick Henry’s quote was hashed correctly. Go ahead, I’ll wait…

SHA-1(Base64): 96G2YBggtTaMRxwztr6ElWNC+A0=

If we make a slight change to the plaintext, let's say "Give me liberty, or let me eat cake!", we will get a dramatically different output:

SHA-1(Base64): M2vPzwTPc7ALM+OkiGAwCkS1DY4=

Actually, just changing the case of the first character “give me liberty, or give me death!” will produce a completely different hashValue:

SHA-1(Base64): g1UFdWJfXWfkIVz42uLLxrJv58g=

Hash algorithm input is indeed infinite

Remember how we said the input to the hash function can be infinite? How about the entire text of The Declaration of Independence (12kb)? If you upload the file to the online calculator we’ve been using, the result is still 20 bytes:

SHA-1(Base64): mGnTR5dnrXrMqEMVoLKCMzWEHjU=

Here’s a quick Java program to crunch the same calculation:

import java.io.InputStreamReader;
import java.nio.CharBuffer;
import java.security.MessageDigest;
import org.apache.commons.codec.binary.Base64;

public class SHA1Hasher {
	public static void main(String[] args) throws Exception {
		MessageDigest md = MessageDigest.getInstance("SHA1");
		// Read the Declaration of Independence from the classpath
		InputStreamReader insr = new InputStreamReader(SHA1Hasher.class.getResourceAsStream("/usdeclar.txt"));
		CharBuffer charBuff = CharBuffer.allocate(16000);
		String plainText;
		byte[] hashBytes;
		String hashValue;
		insr.read(charBuff);

		// Hash the ~12kb of plaintext; the output is still just 20 bytes
		plainText = charBuff.flip().toString();
		hashBytes = md.digest(plainText.getBytes());
		hashValue = Base64.encodeBase64String(hashBytes);
		System.out.println("SHA-1(Base64): " + hashValue);
	}
}

Real World Usage of Hash Algorithms, Scenario 1: Storing account passwords in a database in a secure manner

You've been tasked with creating a website for the American colonial resistance. The British are about, randomly inspecting everyone's databases for suspicious activity. The Founding Fathers have an unfortunate habit of choosing patriotic passwords like "I<3America". How can you mask passwords from British patrols so that who is loyal to the Crown and who is a revolutionary stays secret?

The best thing to do is create a 'salted password table' like this:

USERNAME        PASSWORD_HASH                  SALT
tjefferson43    HiE0AZhRWAs6Mmd7dVqppM1WtJc=   WTg68cVxPI

What we will do is store the password hash, plus some random text called a salt (to add entropy to the user's password). So Thomas Jefferson's original password, I<3America, now becomes I<3AmericaWTg68cVxPI and is hashed permanently to HiE0AZhRWAs6Mmd7dVqppM1WtJc=.

So how do you process a login now that the plaintext is gone? You perform the same calculation: take the username from the login form and look up the salt. Then take the plaintext password from the login form and concatenate the salt from the database to reform the original string. Hash that string, and if it matches the database hash, you know the correct password was supplied! Jefferson's original patriotic password is completely masked, so those pesky Tories will never know his true allegiance! Yarr!
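
Here's a minimal sketch of that login check in Java; the method name and the stubbed-in values are mine, and the hash and salt come straight from the tjefferson43 row above:

import java.security.MessageDigest;
import org.apache.commons.codec.binary.Base64;

public class LoginChecker {
	// Hypothetical check: recompute the salted hash and compare it to the stored value
	public static boolean passwordMatches(String suppliedPassword, String saltFromDb,
			String passwordHashFromDb) throws Exception {
		MessageDigest md = MessageDigest.getInstance("SHA1");
		// Same concatenation used when the account was created
		byte[] hash = md.digest((suppliedPassword + saltFromDb).getBytes());
		return Base64.encodeBase64String(hash).equals(passwordHashFromDb);
	}

	public static void main(String[] args) throws Exception {
		// Values from the tjefferson43 row above; should print true if those example values line up
		System.out.println(passwordMatches("I<3America", "WTg68cVxPI", "HiE0AZhRWAs6Mmd7dVqppM1WtJc="));
	}
}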

But what if another patriot (Paul Revere) chooses the exact same password, but is caught by the British? Couldn't the British Regulars then just infer that anyone with the same password hash is also a patriot (and have them hanged)?

Because we added salt (entropy) to the plaintext password, we're guaranteed never to have the same hash twice in our database. Let's say Paul Revere creates an account with the same password, I<3America, but we followed the required practice of using SecureRandom to create a new salt:

USERNAME        PASSWORD_HASH                  SALT
tjefferson43    HiE0AZhRWAs6Mmd7dVqppM1WtJc=   WTg68cVxPI
paulrevere1776  aktJ/0cn69Y41vssNfZDHY1CsdE=   sWVUaTGa6e

So Paul Revere's password I<3America becomes I<3AmericasWVUaTGa6e, which is hashed to aktJ/0cn69Y41vssNfZDHY1CsdE= and stored. Later, when Paul Revere is caught and revealed to be a patriot, Jefferson's secret is kept safe by salting and hashing.

To summarize the benefits:

  • A database breach means your user accounts are still secure.
  • It’s impossible to tell if two people have the same password.
  • There are no restrictions to the length of a user’s password.
  • Your database column can be a fixed length instead of a varchar.
  • SHA1 was created by people who are smarter than you (and me). You can deflate your ego and leverage their brains.
Sidenote: The salt column is extremely important! Never, ever store or send a hash without a salt! Furthermore, the salt value must be securely generated with your language's SecureRandom (or equivalent) generator! You've been warned on these two points: they are CRITICALLY important (I can't stress this enough)!
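
For generating the salt itself, something along these lines is about the minimum in Java; the ten-character alphanumeric format simply mirrors the example rows above, and the helper name is mine:

import java.security.SecureRandom;

public class SaltGenerator {
	private static final char[] ALPHABET =
			"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".toCharArray();
	private static final SecureRandom RANDOM = new SecureRandom(); // never java.util.Random

	// Hypothetical helper: a fresh random salt for each new account
	public static String newSalt(int length) {
		StringBuilder salt = new StringBuilder(length);
		for (int i = 0; i < length; i++) {
			salt.append(ALPHABET[RANDOM.nextInt(ALPHABET.length)]);
		}
		return salt.toString();
	}
}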

Real World Usage of Hash Algorithms, Scenario 2: Noah Webster

Yes, Noah Webster was an American revolutionary: he was a member of the Connecticut militia, an editor of the Federalist newspaper, and the author of American copyright, among many other things. He went on later in life to write some dictionary thing, but the important part here is that he was a real hard ass when it came to teaching.

So say Noah Webster has hired you to write a program to test his students' spelling. Because he so adamantly believed in spelling correctly, it's important each student get 100%. Knowing this is a bit unrealistic, though, he will allow the students to retake the test as many times as they want until they get 100%. The test consists of Mr. Webster dictating a small passage of text to the class and having them enter it correctly into your program. The program should tell them whether they spelled everything correctly, which lets them figure out the correct answer before turning in their final answer.

Another important piece of American history is that Noah Webster was a closet Ruby fan, so you have to write this in Ruby. This presents a problem: because the students will be running your program, you can't just embed the correct answer in it, or they could look at the source code.

By using a hash, you can embed the hash of the correct text in the program and compare the students' input to the hash.

Mr. Webster begins the test by saying to his students:

Before a standing army can rule, the people must be disarmed; as they are in almost every kingdom of Europe. The supreme power in America cannot enforce unjust laws by the sword; because the whole body of the people are armed, and constitute a force superior to any bands of regular troops that can be, on any pretense, raised in the United States.

The correct hash is:  6+IHAAJ1i0KmoPaowU1i0xSjcO4=

Knowing what you know now, can you finish the program for Mr. Webster? Post your answer in the comments!

One thing that amazes me is that most developers are not familiar with strong cryptography. In my career, I've seen all sorts of mistakes that lead to leaked data, guessable passwords, unfortunate disclosures, and worse. The nice thing is, you don't have to understand the ridiculously complex math behind the algorithms; you only have to know the rules for using them correctly. By the end of this series, my goal is to de-mystify the magic so you can start using the primitives in your code right away!

But first, when I say Strong Cryptography, what the hell am I referring to anyway?

Strong cryptography or cryptographically strong are general terms applied to cryptographic systems or components that are considered highly resistant to cryptanalysis.

http://en.wikipedia.org/wiki/Strong_cryptography

So Strong Cryptography is not some esoteric concept you are not privy to: Strong Cryptography is simply a set of definitions and algorithms that have been reviewed by experts, secret government agencies, and third-party organizations and found to be hard to break.

One thing I've seen done repeatedly is a developer 'inventing' a cryptography scheme for a particular purpose. Here's the thing: cryptography is thousands of years old. If you've ever 'invented' your own way to 'encrypt' data, chances are you've just re-invented something that was discovered thousands of years ago. If you want to avoid the mistakes WEP made with wireless, Microsoft made with the Xbox, or Sony made with the PS3, this blog series should help you avoid embarrassment AND give you something impressive to say at the next cocktail party.

Finally, I just wanted to mention this is actually a very personal subject that I have a long history with. I found my first need for cryptography passing notes to my friends as we played "Spies" in the neighborhood and needed to keep the locations of our secret forts safe. Unfortunately, my single-letter substitution cipher must have been broken by some whiz kid, as our treehouse was destroyed that summer… After reading Alvin's Secret Code, we created 2-3 sets of Caesar wheels and never lost a secret fort again!

I hope you enjoy the series!

One of the most interesting trends I've seen lately is the unpopularity of Java around blogs, DZone, and elsewhere. It seems some people are even offended, some on a personal level, by the suggestion that Java is superior in any way to their favorite Web 2.0 language.

Java has been widely successful for a number of reasons:

  • It's widely accepted in established companies.
  • It’s one of the fastest languages.
  • It’s one of the most secure languages.
  • Synchronization primitives are built into the language.
  • It’s platform independent.
  • Hotspot is open source.
  • Thousands of vendors exist for a multitude of Java products.
  • Thousands of open source libraries exist for Java.
  • Community governance via the JCP (pre-Oracle).

This is quite a resume for any language, and it shows, as Java has enjoyed a long streak as being one of the most popular languages around.

So why, in late 2010 and 2011, is Java suddenly the hated demon it is?

  1. It’s popular to hate Java.
  2. C-like syntax is no longer popular.
  3. Hate for Oracle is being leveraged to promote individual interests.
  4. People have been exposed to really bad code that happens to be written in Java.
  5. … insert next hundred reasons here.

Java, the actual language and API, does have quite a few real problems… too many to list here (a mix of primitive and object types, an abundance of abandoned APIs, inconsistent use of checked exceptions). But I'm offering an olive branch… Let's discuss the real problem and not throw the baby out with the bath water.

So what is the real problem in this industry? Java, with its faults, has completely conquered web application programming. On the sidelines, charging hard, new languages are being invented at a mind-blowing rate, also aiming to conquer web application programming. The two are pitted against each other, and we're left with what looks like a bunch of preppy mall-kids battling for street territory by break dancing. And while everyone is bickering about whether PHP or Rails 3.1 runs faster and can serve more simultaneous requests, there lurks a silent elephant in the room, laughing quietly as we duke it out in childish arguments over syntax and runtimes.

Tell me, what do the following have in common?

  • Paying with a credit card.
  • Going to the emergency room.
  • Adjusting your 401k.
  • Using your insurance card at the dentist.
  • Shopping around for the best car insurance.
  • A BNSF train pulling a Union Pacific coal car.
  • Transferring money between banks.
  • Filling a prescription.

All the above industries are billion-dollar players in our economy. All of the above industries write new COBOL and mainframe assembler programs. I'm not making this up; I work in the last industry, and I've interviewed and interned in the others.

For God's sake, people: COBOL, invented in 1959, is still being written today, for real! We're not talking about maintaining a few lines here and there; we're talking thousands of new lines, every day, to implement new functionality and new requirements. These industries haven't even caught word that the breeze has shifted to the cloud. These industries are essential; they form the building blocks of our economy. Despite this, they do not innovate, and they carry massive expenses with their legacy technology. The costs of running these businesses are enormous, and a good percentage of those are IT costs.

How expensive? Let's talk about mainframe licensing, for instance. Let's say you buy the Enterprise version of MongoDB and put it on a box. You then proceed to peg out the CPU doing transaction after transaction against the database… The next week, you go on vacation and leave MongoDB running without doing a thing. How much did MongoDB cost in each week? The same.

Mainframe software is licensed much differently. Let's say you buy your mainframe for a couple million dollars and buy a database product for it. You then spend all week pegging the CPU(s) with database requests. You check your mail, and you now have a million-dollar bill from the database vendor. Wait, I bought the hardware, so why am I paying another bill? Software on a mainframe is often billed by usage, that is, by how many CPU cycles you spend using it. If you spend 2,000,000 CPU cycles running the database, you will end up owing the vendor $2 million. Bizarre? Absolutely!

These invisible industries you rely on every day are full of bloat, legacy systems, and high costs. Java set out to conquer many fronts, and while it thoroughly took over the web application arena, it fizzled out in centralized computing. These industries are ripe for reducing costs and becoming more efficient, but honestly, we're embarrassing ourselves. These industries stick with their legacy systems because they don't think Ruby, Python, Scala, Lua, PHP, or Java could possibly handle the 'load', scalability, or uptime requirements that their legacy systems provide. This is far from the truth, but then again, there has been zero innovation in these arenas in the last 15 years, despite web technology making galaxy-sized leaps.

So next week someone will invent another DSL that makes Twitter easier to use, but your bank will be writing new COBOL to more efficiently transfer funds to another bank. We're embarrassing ourselves with our petty arguments. There is an entire economy that needs to see the benefits of distributed computing, but if the friendly fire continues, we'll all lose. Let's stop these ridiculous arguments, pass the torch peacefully, and conquer some of these behemoths!

Perhaps I've had a bad week, but I am not a fan of DDD… today. Maybe it's because I've spent close to 50% of my career refactoring code that was over-engineered, unclear, or redundant… and DDD just happened to be present. Maybe the previous generation of programmers sat around the water cooler discussing design patterns and agreed DDD was second best only to pre-sliced pancakes. Maybe I haven't really taken the time to understand when DDD really works, and my sorrow and anger are misplaced. Perhaps many more agree with me, and I could incite lively discussion on DZone or in the comments below…

I think it was my uncle from whom I first heard the phrase, "Always walk a mile in the other guy's shoes… that way you're a mile away and you've got his shoes." Let's start with what DDD is, a simple example, and some of the pitfalls I've had to deal with this week.

DDD is Domain Driven Design. In a nutshell, the idea is to take the Object Oriented Programming (OOP) principles, understand them, then make them pervasive in your application architecture. The original concept of OOP was to bundle data, and the logic that guards that data, together in a structure called an object. The theory is that you model objects after real-world behavior, or in a popular buzzword these days, "business logic." Let's put the business logic right with the business data and pass it around. This is very OO! Eventually you're so OO that your 'view layer' applications are essentially just reflections of your domain layer… DDD is actually an entire process, but I'm going to focus on just the coding part for this post…

A good start would be the old, oft-abused bank account example. Let's have an object that represents someone's bank account:

public class Account {
 private long balance;  // bitcoins
}

This has data, but no business logic, so we can't get to the account balance. Let's add some:

public class Account {
 private long balance;

 public long getBalance(){
  return balance;
 }

 public void decrementAmount(long amount){
  balance = balance - amount;
 }

 public void addAmount(long amount){
  balance = balance + amount;
 }
}

Well, this isn't bad. I can clearly see what's going on here; the methods are self-describing, and the code is clear about its intentions. I'm really feeling good about this, actually. Let's keep going and add transfers and interest:

public class Account {
	private long balance;

	public long getBalance() {
		return balance;
	}

	public void decrementAmount(long amount) {
		balance = balance - amount;
	}

	public void addAmount(long amount) {
		balance = balance + amount;
	}

	public void transferAmountFromAccount(long amount, Account foreignAccount) {
		foreignAccount.decrementAmount(amount);
		addAmount(amount);
	}

	public void fireInterestCalculation(double interestRate) {
		// office space arithmetic
		long interest = (long) (interestRate * balance);
		addAmount(interest);
	}
}

Well, this is interesting… In accordance with the DRY principle, the transfer and interest methods reuse other public methods internally. We could continue: maybe throw an exception for non-sufficient funds by calling getBalance first, or maybe even make the interest calculation recursive. The reuse gets even better once we start to extend, generalize, and use our objects polymorphically. As we use the OO principles, we continually invoke the information hiding principle, and complex business logic becomes nothing more than a glorious stack of method calls… And by now you too are thinking such a method stack could only be rivaled by a stack of pre-sliced pancakes.
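
For example, a non-sufficient-funds guard placed in decrementAmount automatically protects every method built on top of it, transfers included. Here's a sketch of how the Account above might grow such a rule; the exception class and its name are my own invention, not something from the original post:

// Hypothetical unchecked domain exception, named purely for illustration
class NonSufficientFundsException extends RuntimeException {
	NonSufficientFundsException(long requested, long available) {
		super("Requested " + requested + " but only " + available + " is available");
	}
}

public class Account {
	private long balance;

	public long getBalance() {
		return balance;
	}

	public void decrementAmount(long amount) {
		if (amount > getBalance()) {            // guard every withdrawal in one place...
			throw new NonSufficientFundsException(amount, getBalance());
		}
		balance = balance - amount;
	}

	public void addAmount(long amount) {
		balance = balance + amount;
	}

	public void transferAmountFromAccount(long amount, Account foreignAccount) {
		foreignAccount.decrementAmount(amount); // ...and every caller, like transfer, inherits the rule for free
		addAmount(amount);
	}
}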

So what is the opposite of DDD? An 'Anemic Domain Model', one in which domain objects carry data but no logic. Here is the equivalent object in an anemic model:

public class Account {
	private long balance;

	public long getBalance() {
		return balance;
	}

	public void setBalance(long balance) {
		this.balance = balance;
	}
}

*Business logic is in a service somewhere…

Let me be perfectly honest… I started this article very harshly. I actually like DDD. DDD is the manifestation of all the OO principles. You're using the language to its fullest extent. Design patterns flow more naturally. Maybe you have fully achieved the purity of quantified conception… I hear it repeated that "anemic domain models are leftovers from procedural programming," which to some extent is true. But are they all that bad? No, but with an anemic domain model you aren't taking advantage of the language. So DDD is the way to go? I think so, but there seem to be a few fundamental limitations with DDD that must be managed, and in my career, I've seen them managed ineffectively. The articles I've read on DDD focus very much on the benefits of DDD and how to implement it, but fail miserably at managing the lifecycle of DDD.

Jon’s DDD Antipattern #1: Your programmers must be domain experts.
This is an oft-omitted detail of doing DDD. Writing DDD effectively means you need programmers who understand the language of your domain fully, or your analysts (who understand the domain fully) must be able to communicate in terms of OO. This should be fairly obvious, because effective DDD means translating your domain from terms of accounts and interest rates into objects and functions. Furthermore, if your organization has a lot of turnover, you will be bitten rather than benefit from DDD as knowledge of how your domain was implemented evaporates.

Solution: Retain your experts, both technical and business. Recruit fresh talent into the organization to widen the pool. Group your domain experts and technical staff by domain, not by job position.

Jon’s DDD Antipattern #2: The impatient incompetent programmer
Impatient and lazy programmers are general antipatterns, but I feel that in DDD they can really harm you. Remember how I showed above that we could recycle existing methods in our code? Let's say Jane is late on a project, and she needs to add compounding interest to our Account object above. Because she's in a hurry, she doesn't have time to study the domain or the previous implementations. She extends the Account object as CompoundingAccount:

public class CompoundingAccount extends Account {

	public void compoundInterest(int periods, double rate) {
		for (int i = 0; i < periods; i++) {
			// more office space arithmetic
			long interest = (long) (rate * getBalance());
			addAmount(interest);
		}
	}
}

This code enters your domain, and hundreds of view layers begin using CompoundingAccount. Next, your company gets sued for not computing interest correctly (as shown here). You, as the domain expert, know exactly where to make this change: in the root Account object. You do so, but because Jane did not obey the DRY principle, she has caused compounding damage to the domain! You could have hundreds of methods and views built on either fireInterestCalculation or compoundInterest. Think this won't happen to you? Be realistic: it's not a matter of if these events will happen; it's simply a matter of when. And if you don't have a contingency plan or governance, you've already failed.

Solution: Recognize that DDD is a true double-edged sword. As business logic propagates down your domain tree, mistakes will propagate too. Those mistakes will compound as methods are reused, causing problems. You must set up governance in your domain. You must have centralized domain experts who develop, and who can tell you the correct place to implement new functionality as your domain expands.

Finally,
Jon’s DDD Antipattern #3: DDD is made ineffective in a distributed environment by most SOA implementations.
This is also frequently omitted. Remember how we said that classes were designed to hold state and business logic together? What happens if we translate the Account object to XML (a very common SOA format)?

<account>
 <balance>375</balance> 
</account>

Where did my business logic go? It isn't transferred… Protocols like REST, SOAP, RMI, and Hessian transmit the state of business objects, but they do not transmit the implementation. How then can you transmit an object between two independent systems and guarantee that fireInterestCalculation will produce the exact same value? You actually can't. The implementations on both ends of the wire must be the same, which you can't guarantee in a distributed environment.

Solution: The only thing I can think of right now that might come close to resolving this issue is the little-used feature of Java RMI called remote class loading. Remote class loading will sense that a client does not have an implementation and load the implementation across the network. Maybe this is a solution? I haven't experimented with it.

So as I sit here slicing my pancakes, I'll admit I'm not opposed to DDD. In my career, DDD has simply been an innocent bystander in the war of egos versus time-schedules. The problems with DDD have been in its execution, not its integrity. We've all heard the benefits of DDD many times over, but I see very little discussion about how to manage its pitfalls. How do you manage DDD effectively? Have you any stories of how DDD has become ineffectual? How do you deal with change management in your domain?

Once again, thank you for reading!

Two years back, I was tasked with redesigning the member portal for my company. We had a great set of tools we could crank smaller applications out with, and a brilliant team of people who understood them. This team frequently debated best practices in Maven, Spring, Hibernate, etc. and the design patterns that surrounded them. Bean naming conventions were tossed about, file naming conventions were invented, and overall, it was a pretty good life. However, due to the economy and slow business, development teams were cut back. As the lead engineer on the member portal redesign, I watched the team get cut to two engineers… myself being one of them.

At the time, we built our applications around Spring's best practices. You could say our applications completely revolved around Spring. We had interfaces for everything: domain jars, DAO jars, service jars, facade jars, war poms, and ear poms. The jars were built into every application; we had a service jar that knew how to get member objects from our database; we had another just to look up pharmacies… Each one of these got baked into the applications… Since each application went to production separately, our platform fragmented into a pile of baby powder. When SHTF (Support Hit The Fan), the most common question I was asked was, "Hey Jon, what version of member services is application X on?"

With every app on an ever-so-slightly-different version of a particular service, the memory usage for an individual application was outrageous. Startup times for individual apps were measured in minutes in some cases. With every application owning its own services, we had to provision multiple datasources for each app. Caching was unheard of… Things really started getting out of hand when our customers wouldn't migrate between versions of our applications or SOAP services (who can blame them?). Quite slowly, we ended up with a sprawling portfolio of slightly similar production applications that we had to keep running. Budgets and teams were shrinking, and this massive project had to take flight; deadlines were just over the horizon, but disaster was brewing in the skies and the sound of delays thundered in the distance.

Philosophically, we claimed we didn't build monolithic applications. After all, our applications had a "separation of concerns," if only in the sense that each concern was a separate jar file. We had "services", but these services were neither distributed nor redundant. We knew we had to go the distributed route… so we started to "fix" our problems by making internal SOAP services available (after all, in Spring everything is a bean, so why can't a proxy be a bean?). Our latency skyrocketed. Function calls that took 1ms now took 40+ms. We had to limit these to very coarse-grained calls, but that just caused our SOAP payload size to swell. We brought in individual caches for SOAP calls, but then we faced memory and coherency issues.

Back to the project: realizing that we had in fact been building "monolithic" applications, we had to go distributed. But how? My coworker said, "Let's just replace our SOAP calls with REST calls and we'll be in great shape." So we set off, my teammate writing the frontend in JSP/YUI while I concentrated on developing the backend services and our usual massively complex Spring assembly. Benchmarking my calls, I saw we had gone from 40ms SOAP calls to 30ms REST calls! What??? REST was supposed to be the answer to all that was life! Panicking, I profiled the CXF/Spring code and realized I was spending all day parsing strings! The only real benefit REST bought me was that there were fewer strings to parse!

I stopped. I know I'm supposed to "optimize last", but clearly our design or technology choices were flawed. The services had complex bindings and a million strange annotations littered about them. And oh man, did we have a ton of Spring XML files. Something had to change…

About that time, I stumbled on a post by Adam Bien… Take a look at some EJB3.0 code he published:

@Remote
public interface BookService {
	Book createOrUpdate(Book book);
	void remove(Book book);
	Book find(Object id);
}

To invoke that service remotely (to 'inject' or 'look up' the service), you do this:

@EJB
private BookService  service;

Wait, isn't EJB code supposed to be complex? Isn't it full of XML, vendor-specific deployment descriptors, RemoteExceptions, portable objects, narrowing, IIOP, CORBA, RMI, lookups, LocalHomes, RemoteHomes, object pools, RMIC compilers, IDLs, and InitialContexts? Doesn't every method have to throw annoying checked exceptions? And most of all, when the hell did EJBs get Java annotations???

Here’s the crux of this post: The above code is expressive and clear in its intent. Even a Ruby programmer (joke) could see that we’re defining a service contract here that will be offered remotely. This style of definition delivers a certain elegance in its simplicity.

…and I realized that I had a possible solution to my problems. I quickly ripped out my Spring/CXF/REST job and reduced the clutter to a few lines of code. I deleted my Spring files and flung the XML into the recycle bin. The moment of truth came with the benchmarks… EJB under the covers uses Java serialization… and it's fast, a lot faster than SOAP and still faster than REST. My response times dropped below 10ms. Payloads that were single lines of text dropped to 3ms or less… wow! Better yet, getting rid of the Spring jars, the CXF jars, and their laundry list of transitive dependencies took our deployment artifacts from 20 megabytes to 700 kilobytes. Now that is lightweight! I got approval to move forward with the technology and I surged forward, ripping to shreds my previous XML excursion in a wake of JEE.

Now EJB3.0 isn’t without its own set of problems, which I will cover in later posts, but overall, it offers several key advantages:

  1. It's lightweight: You can write full services using 4 annotations and deploy under a megabyte (see the sketch after this list). I haven't seen applications that small since Windows 3.1.
  2. It's fast: Well, faster than REST/SOAP… I'm sure there are faster binary protocols out there, but hey, Java serialization is built into the JVM and it spanks any text protocol.
  3. It's familiar: You still have to be conscious that you're in a distributed system, but most developers have used Java serialization. For most POJOs, you don't have to do anything except implement Serializable.
  4. It's transparent: It's amazing. The EJB3.0 annotations are minimal and powerful at the same time. Developer productivity is high, and there is no generated boilerplate code. It's easy to teach these concepts to people who didn't bear the tragedy of EJB2.1.
  5. It's reliable: EJB has always been a cluster-able technology, if your application container supports it.
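
As a taste of point 1, here is roughly what a complete implementation of Adam Bien's BookService might look like under EJB3.0. This is a sketch under my own assumptions (a JPA-mapped Book entity and a configured persistence unit), not the actual portal code:

import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

// Hypothetical implementation of the BookService interface shown earlier
@Stateless
public class BookServiceBean implements BookService {

	@PersistenceContext
	private EntityManager em;

	public Book createOrUpdate(Book book) {
		return em.merge(book);
	}

	public void remove(Book book) {
		// Re-attach the detached instance before removing it
		em.remove(em.merge(book));
	}

	public Book find(Object id) {
		return em.find(Book.class, id);
	}
}
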
The story doesn't end here, as this was just the first iteration of the application. The new member portal application is happily humming along in production. We've had a few hiccups with WebSphere 7, but for the most part, it's been relatively problem free. We're also finally starting to reap the benefits of a distributed SOA architecture. Our applications share connection pools, we're cutting down to a few core services, deployment artifact size is 20x smaller, and memory usage is dropping.

In future posts, I'll write examples of just how easy it is to get these services running, and why Java is a great distributed language when coupled with JEE6. You can have developer productivity (and fun) without sacrificing reliability on bleeding-edge dynamic languages.

Footnote:

A great resource for EJB3.0 is the free book Mastering EJB, 4th Edition. I read the entire thing in a weekend on an Evo 4G. It's brief, but it will give you a working knowledge to get up and running.
Also, Adam Bien's weblog is a great resource for everything EJB3+ and JEE6.