The Code Mechanic

How to win or fail when writing software: A car analogy

Posted: August 20, 2012 in architecture, methodology, Programming
Tags: architecture, software development

I realized a common analogy would be a useful tool to help non-engineers and non-developers… why not a car? A modern vehicle is a feat of engineering. Rolling down the highway at 70mph means all sorts of vibrations and harmful oscillations must be damped out to prevent the vehicle from destroying itself. Modern vehicles like my truck actually have a LAN-like network called a CAN(Controller Area Network) that connects everything from the transmission to the radio on a single digital bus. A lot of research and development goes into making a vehicle, so how does that compare to software development? Engineering a vehicle has many phases, the early stages being definition, planning, and experimentation; lets focus in on a few of these.

Definition

The first step of any development process should be definition, or creating a vocabulary for the problem space. The goal is to make sure everyone is talking about the same thing. For instance, in web application development, a user and a customer can be different people!

While designing a car, there is driver, passenger, steering wheel, e-brake, and many more. These terms must be used consistently in the development lifecycle. Ambiguities must be sought and destroyed. For instance, is a driver also a passenger? Or perhaps driver and passengers are both occupants? This critical step is glossed over by Agile methodologists that want to jump right into coding.

A great (and free) resource on how to create a business vocabulary is Domain Driven Design Quickly, by Eric Evans. Eric steps through the process of creating an air traffic control system. Nearly an entire chapter is spent defining the actors, nouns, actions, and relationships in the air traffic control ecosystem before introducing any DDD concepts.

If you’re a non-engineer, you should demand your developers create this vocabulary with you. As you’re painfully aware, communicating what you want with developers is difficult. Creating a vocabulary is a communication channel. Do not skip straight into development or design without it!

Planning

When GM had automotive engineers design the Chevy Volt, it’s unlikely they that put a bunch of sticky notes on a board that said, “A user should be able to steer the car in a particular direction,” then ran off willy-nilly and built out the steering rack from modern hydraulic actuators. Later on, in sprint 25, when they got around to building the engine, they realized that a hydraulic steering rack isn’t ideal for an electric car. Turns out, hydraulics require a constant pressure from a turning pump, and an electric vehicle’s motor isn’t turning at red lights.

They didn’t spend 8 years drawing on paper either, then started building the car. Each time they designed a component of the vehicle, they had to change the plans for the rest of the vehicle. Their design was iterative and if you look at the early prototypes, they look nothing like the final product.

Neither extreme is healthy and unfortunately Agile developers tend to want to get started quickly without thinking of the big picture. Old school waterfall developers like to think that once they develop a plan, they should never ever change it, and also think that they can foresee all the problems that would arise with a particular design.

Software engineers should fall somewhere in the middle. After user stories are somewhat complete, a domain model should be developed that shows object attributes and how they relate to the rest of the domain. After the domain design is created, an architecture plan should then be developed that shows how objects stored, processed, and shuffled around the system. Cross cutting concerns, such as logging, monitoring, and auditing should be introduced at this point.

Unfortunately, there is not easy road for non-engineers here. Your developers should have a technical plan, and they should be able to explain it to you in a way you understand. Each iteration the technical plan and model should be updated and changed. If your developers actually keep the same technical plan between iterations, this should be a red flag: No one gets it right the first time. During development, your engineers will encounter business scenarios and technical problems the team simply didn’t think of. Each iteration should be a refinement, or even a roadmap change if you really just “got it wrong.”

Experimentation

As I said earlier, GM didn’t spend 8 years drawing the design out on paper until they thought they solved every possible problem that popped into their heads. After they defined their vocabulary and came up with a basic design, GM experimented with 25 different battery chemistries in real life using various test platforms to validate their design. Turns out, some manufacturers had no idea that their batteries tended to explode.

Research and development is something you can actually write off on your taxes. R&D should be done at the beginning of every iteration, after design, before development. Creating, measuring, and analyzing the performance, productivity, and maintainability of a particular design will help you weed out expensive mistakes. Developers should concentrate on creating specific goals, and communicate effectively to their peers what they did, why they did it, and what happened. A wiki is a great place to store R&D results and later, when creating the tax writeoffs, accounting can show that actual R&D was done.

This is an area that developers love, and business managers don’t understand. Business managers tend to think playing with shiny development toys is a waste of time when ‘real work’ implementing requirements should be done. (If you work for a company like this, I encourage you to quit your job. These types of factories deserve to be talent starved and are not worth working for)

The trick to making sure that real work is done is timeboxing R&D and setting goals. Specific goals should first be defined, then an appropriately sized timebox is allotted proportionately to the experiment. Specific goals might be evaluating one framework over another, or one design over another. Large R&D boxes should be allocated at the beginning of the project, with small ones at the end. Again, the goal here is to avoid waste. Being proactive rather than reactive, while counterintuitive, will only make your project succeed.

…there are things that we now know we don’t know. But there are also unknown unknowns. There are things we do not know we don’t know

-Donald Rumsfeld (An odd person to say such a quote)

The ultimate goal of R&D is to discover the things you don’t know that you don’t know. Discovering failures proactively saves you time and makes developers happy.

Conclusion

The difference between “coding” and “engineering” is that engineering wants to be deterministic and predictable. Coders tend to jump straight into things without giving long-term business goals thought, then jump ship when the waters get rough. The solution is to follow all steps of an engineering lifecycle while producing engineering artifacts such as models and plans. Finally, just like a car, models and plans require maintenance, and should be changed and upgraded frequently during the engineering lifecycle.

In the comments, how about some stories where upfront design was skipped? Also, lets hear some stories where management locked engineers into a failing design!

In reply to: Three Good Laws of Software Architecture

Posted: May 28, 2012 in architecture, Programming
Tags: architecture, Java, Programming

What is the secret to good architecture? While there are plenty of posts written about this topic on the internet, I found this article on DZone (originally posted here) intriguing.

The article is written by Niklas Schlimm, whom I do not know personally. The article can be summarized with three points:

Three [proposed] Laws of Good Software Architecture

Good architecture is one that enables architects to have a minimum of irrevocable decisions.

To make decisions revocable you need to design for flexibility.

To make use of flexibility one needs to refactor mercilessly.

CYA Engineering?

What I think Niklas is trying to say is: you will get it wrong, and sadly, you’ll get it wrong most of the time, so design an architecture where you can fix your mistakes with a couple of swaps. Is designing an architecture that allows you to revoke your decisions at whim really fair? Does the CEO get the same privilege?
Flexible is not durable. They don’t build make buildings out of rubber. We’ve probably all worked on over-engineered legacy systems that were “flexible.” Most times, good intentions turning into dancing circles around libraries and frameworks in an unholy ritual of abstraction.
You should be refactoring mercilessly, all the time. But do you need 10 layers of abstraction to do this? Do revocable decisions allow you to revoke custom code?

Alternative laws

Lets address the problem here. Code doesn’t age, or stop working, it’s Human’s that forget how things work, why decisions were made, and becomes disenchanted with the past. The input to the system changes, usage levels increase, and people leave the organization. The only real constant is the code itself!

Given that, I present:

Laws of software developed by Humans

Humans cannot predict where software will be extended in the future. Instead trust in YAGNI and dependency injection.

Humans cannot predict where software is slow. Instead we have profilers and the three rules of optimization.

Humans cannot predict when or how business direction will change, the economy, or future technology enhancements. That’s why we have iterative development: Agile, <a href=”http://en.wikipedia.org/wiki/Scrum_(development)”>Scrum and XP.

Refuting the laws given with the alternative laws

Good architecture is one that enables architects to have a minimum of irrevocable decisions.
Alternative law #3 says you won’t be able to predict which decisions are irrevocable. Instead of designing an architecture with revocable everything, design the minimum possible architecture that barely scrapes by to get the job done.

Instead of developing your own bash build script, use Maven, and don’t do anything silly with it. “</amvn clean install” and “mvn clean deploy” should be all the commands you need to send your artifact onward.

Instead of getting locked into a particular open source project or vendor, use JEE6 standards like JSF2.0, JPA2.0, JMS, and EJB3.1, or commit to using Guava and Guice. Simplicity is key, less is better. Using tools they way they were intended saves time.
To make decisions revocable you need to design for flexibility.
Alternative law #1 says you stink at designing for flexibility, so don’t do it. Rather than designing for flexibility, design an exit strategy document for each technology you use. This includes frameworks built in house: how will you exit from those once they become legacy code? (The best way is to abstain from writing them.)

Alternative law #2 also applies here, don’t use obscure technology like a NoSQL database until you actually determine you’ll need a a NoSQL database by profiling. It you’ve truly designed minimum architecture and opted for simplicity, it will be simple to change out your backing store with a different database. It’s good to remember that you are not Google. If you design simple architecture, you’ll have plenty of time later on to switch to NoSQL when it’s needed, rather than rewriting a thousand lines of Redis specific JSON handling code upfront.
To make use of flexibility one needs to refactor mercilessly.
Given Alternative laws 1, 2, and 3, this brings old law #3 into new light. You should refactor mercilessly and eliminate technologies. Gage yourself: How many new technologies did you bring in versus how many exit plans did you execute?

Develop a master engineering architecture plan. Every release should bring your closer to the master design, at which point, you should update your exit plans then update your master architecture plan. What technologies could minimize the architecture? Evaluate them… do they require glue code to be written? What frameworks can we exit from and what technologies aren’t working out? Get rid of them, quickly. What in-house utility projects (aka future “legacy code”) has been written? Open source it, replace them, or better yet, observe that YAGNI has manifested itself.

Conclusion, flame wars

When it doubt, don’t write it. Change your processes so you can use a tool without modification. Don’t dance around frameworks with unnecessary abstraction. Above all else, simplicity is key!

Finally, remember this is an attack on complicated software, not the author of the original post. I’ve enjoyed reading his blog. I look forward to your comments!

Introduction to Strong Cryptography – p2.1 – One Time Pad

Posted: January 14, 2012 in cryptography
Tags: cryptography

And now for something completely different. In a previous post, I said than no encryption algorithm is secure, but only resistive to attack. Well I lied. There is one, and only one, provably secure encryption technique. This technique, while wildly impractical, is a fun exercise, and was actually used to change world history during the Cold War.

So why are all encryption algorithms insecure? Simple, because they key is shorter than the data. Take this plaintext:

I believe that fear of life brings a greater fear of death.

-David Blaine

That’s 59 bytes of text, yet the typical size for an encryption key is usually 128bits, or 16 bytes, or 268.75% larger. That means at some point, a pattern must be used to expand the key over the plaintext. Encryption algorithms are designed to obfuscate patterns in the output, or make their intervals so large that recognition takes massive computation and memory. However, this work at MIT shows that entire ‘queries’ can be ran against fully encrypted data due to patterns existing in the output. My explanation is a gross oversimplification, but the takeaway here is that encrypted data differs significantly from random data.

One way to solve this problem is to make the encryption key larger than our plaintext. In 1917 two Americans, Gilbert Vernam (of Bell Labs) and Joseph Mauborgne (U.S. Army), had the same thought while following up on an earlier idea to secure telegraph systems. They hypothesized that if two machines shared a random sequence of numbers, and were separated by a great distance, they could communicate securely by combining one number from the plaintext, with one number with the random sequence, then at the receiving end, compensate for the random number. The result would be gibberish to an attacker, unless they knew the correct random number.

Later in the 1940s the US and the Soviets proved in separate efforts that this system was impossible to break. However, the system had one catch: if the random sequence was any way statistically predictable, it simply wasn’t secure. Hence the name “One Time Pad”. If the ‘Pad’ (originally a bunch of numbers typed out on a notepad) was used more than ‘One Time’, it could be compared against other messages, giving the output statistical value.

So how does this work in practice? First sample some entropy. My sample is: 5, 2, 8, 3, 6. The numbers in the sequence have no statistical meaning compared to another sample of entropy. If I shift my set down (4, 1, 7, 2, 5) or up (6, 3, 9, 4, 7), it’s still statistically meaningless, unless you knew something about the original data.

Now, take John Adams’ last plaintext (a toast actually):

Independence forever!

-John Adams (on 50 years of freedom)

Using the standard ASCII table, shifting every letter down would transform the quote to:

Hmcdodmcdmbdenqdudq

-John Adams (on the number 1)

Unlike shifting a random sequence, the letters are still statistically significant. In English, ‘e’ is the most letter. This reflects in the plaintext, where ‘e’ occurs 5 times. In the obfuscated quote, ‘d’ appears 5 times. An attacker could easily surmise that ‘d == e’

Now, lets combine the techniques: shift every letter of the plaintext down with a random number. I’ll use the following random sequence: 2, 4, 5, 0, 7, 1, 7, 6, 7, 6, 0, 7, 3, 2, 0, 7, 5, 0, 8, 2, 4

Gj_eidg^^hc^dok`v]p

-John Adams (on randomness)

Now, all the letters are statistically random. Since ‘^’ is now the most common character in the output, an attacker might guess ‘^ == e’ but would be incorrect. Without knowledge of the pad or plaintext, an attacker, even with unlimited computing power and time, cannot guess the actual message. Furthermore, one can supply a false pad and get a completely different message. For instance, applying this pad: 6, -4, -21, -4, 3, -1, -11, -20, -17, -5, 2, -9, -81, -1, -5, 2, -3, 21, -15, 4 to the ciphertext yields an entirely new quote:

Antiferromagnetically

-John Adams (on magnetism)

In the real world, XOR is used rather than adding/subtracting for a myriad of reasons, the two biggest being: it happens to work with binary data, and it covers the statistical significance of the alphabet up nicely. The real world also requires that pads be kept secret and only used once.

In conclusion, the One Time Pad is the only provably secure encryption algorithm, provided than the Pad is completely unpredictable. But with that security comes many impracticalities: distribution of pads, pad length, and pad destruction.

Look forward to a few more cryptography posts, then we will resume course on development topics!

    public static void main(String[] args) {
        String quote = "Independence forever!";
        int[] entropy = new int[] { 2, 4, 5, 0, 7, 1, 7, 6, 7, 6, 0, 7, 3, 2, 0, 7, 5, 0, 8, 2, 4 };

        char[] chars = quote.toCharArray();

        for (int i = 0; i < chars.length; i++) {
            System.out.print((char) ((int) chars[i] - entropy[i]));
        }
    }

Encode program

	
    public static void main(String[] args) {
        String quote = "Gj_eidg^^hc^dok`v]p";
        int[] entropy = new int[] { 2, 4, 5, 0, 7, 1, 7, 6, 7, 6, 0, 7, 3, 2, 0, 7, 5, 0, 8, 2, 4 };

        char[] chars = quote.toCharArray();

        for (int i = 0; i < chars.length; i++) {
            System.out.print((char) ((int) chars[i] + entropy[i]));
        }
    }

Decode program

A helpful source and further reading: http://en.wikipedia.org/wiki/One-time_pad

How to solve production problems

Posted: July 24, 2011 in Uncategorized
Tags: problem solving, production problems

It’s been a few weeks since my last post. Things have shifted around for me career-wise, and while I’m enjoying my new position, I’m transitioning out of my old role. Sometimes it feels like I’ve worked two jobs when I get home! Anyway, I’m going to take a break from the cryptography series and talk about something much more practical.

At my job, I’m finding that a large percentage of the questions that come to my desk are “Hey this thing you did is broke,” or “Hey, this thing you worked with is on the fritz.” 99% of the time, I’d have no idea why the behavior they were asking about was occurring! However, I think I’ve got a great problem solving methodology, which I attribute to being key in my success. Maybe some agile writer can take the concepts here and we can develop a formal methodology for solving problems 🙂

Preventative Measures

As professionals, we should accept that production problems are unavoidable. You will create defects in your code, and no amount of testing will ever weed them all out. Code is rushed, servers crash, disks fail, and datacenters overheat. Accepting the fact that production problems occur is very important, but most importantly you need to plan for them. As one of my favorite financial advisor says, “It’s not a matter of if these events will happen; it’s simply a matter of when they will happen.” Plan your week/month with hours available to investigate strange behavior, crashes or user complaints.

Your developers must be on call. I’ve discovered there is a strong correlation between the cleanliness of a developer’s code, and whether or not he had to be on call at some point in his career. People that don’t have to shovel their own shiztpoo do not have stake in preventing problems! Here’s the thing though, the on-call phone is for emergencies. There should be an understanding that if the issue isn’t happening right now, then it doesn’t need to be worked right now. Too often a pager is used to extend an employee’s work day! If you get to the point where you are looking at code, you should probably agree to throw up a 500 page and sick the whole team on the problem in the morning. Furthermore, if a user calls in the middle of the night to report an issue that they failed to report hours ago, chances are it can wait till daylight. It’s important to protect the on call phone, save it for when the meadow is truly on fire!
Use a bug tracking system! This is so obvious it amazes me how many companies do not have this. When events occur, update the bug report with an occurrence. It’s very important that you keep record of the event… When you suggest to management they give you time to fix an issue, having a record of the hours spent doing crisis diagnosis is an invaluable resource!
Keep an application inventory. Have your developers create a wiki entry for every application, that has the following information:
- What configuration files are need to run
- Which databases are connected to
- What tables are read
- What EIS systems are interacted with
- What names of queues/topics are used
- What usernames are used to interact with above systems
- Where the logs are located
- Known problems!
Do not underestimate how vital this information is! In your SDLC, you should have a step where the above information is updated!

Ok, I’m on call. The phone is ringing. What do I do?

Your first step should always be to get an accurate description of the symptoms. You then must narrow the problem scope to either being able: to reproduce it on demand, or predict when it will occur next. For instance, if you have a B2B webservice and the consumer is complaining your sending SOAP faults, your first move should be to have them dump the XML request that is causing the issue. If an end user is complaining about an error message when they click a link, you should have them copy the link and take screenshots. I call this “narrowing the problem.” This is _the_ most important step. It is not optional, you should not proceed to work on a system unless you know what the problem is! You could easily worsen the situation by making arbitrary changes, and the likelihood of doing something random to solve an unknown issue is abysmal.

What if the end user causing the problem is not available, or the person on the other end is unwilling to provide the information you’re requesting? I think the best tactic at this point to ask them how they would like you to proceed. Let them finish completely without interrupting. Perhaps they were onto something, or had an alternative. Or perhaps they’re in left field, and if it leans towards left field, explain to them that you cannot solve a problem you don’t understand and can’t recreate. Don’t be afraid to stick up for yourself, it takes information to solve problems!

Ok, lets get crackin. It’s late and I want to go back to sleep

After you’ve narrowed the problem to a very very specific story or use case, it’s time to get working. First, copy the logs of the production server onto your local computer. If the logs roll while you’re troubleshooting, you may never find root cause. Next, copy any data related to the problem out of the database and store in text format with the log files. This may mean taking a snapshot of configuration tables of an application, the user’s profile rows, and maybe the data they’re trying to access. If you think it’s related, copy it! If you think it’s unrelated, copy it! Finally, note the version of the application, the bugfix level of the server, and the version number of the libraries available on the server. Whew!

The next thing is to get a pencil and paper. You should write down ever step you took and the time, and more importantly why you did it. This must be done as you perform the steps, but it doesn’t have to be verbose. For example:

“3:12am Noticed the logfile contained many a transaction timeout, meaning the connection pool may be locked. Restarting the application server”

There are three parts to this entry, the first is the observation, the second is the analysis, and the third is the action to correct.

So how do you find the problem?

It’s time to use that application inventory that you did way back in the preventative measures section. Lets say the log is reporting that the user cannot be found in the database. Since you’ve already narrowed the problem, simply refer to your inventory and connect as the application, then proceed to locate the user. If you can’t find them, then problem solved (for now)! Your inventory will tell you exactly what table and database to retrieve this information from.

But what if you’re not finding anything? Well… you need to poke around some more. Filter through your logs files and database entries. Do you see anything that looks out of place? Since you’ve already narrowed the issue down and reproduce it, can you try reproducing it using a different user? How about if you tail the log in realtime while the end user create the problem again? Can you enable debug logging in the frameworks (Spring, Hibernate, etc)? Can you reproduce the problem in a non-production environment? How about remote enabling remote debugging and stepping through as the system executes? How about cleaning up temporary files or deleting JSP compilation directories? Has the system ran out of disk space? Finally, does a good old fashioned restart fix the problem?

The hangover

So you were on the phone for an hour or two last night, what do you do the next day besides show up late? Well before you leave your house, take that pencil/paper with you. Open your bug tracking system, or send an email to the owners and include the original complaint, the logs, the database entries, and what finally fixed the problem. Transcribe your quick notes into the ticket. It’s best if this is done the next day, before your forget how you fixed the problem and why you made the decisions you did.

If you followed every step in this entry, you should have no problem giving management a thorough report of what was wrong and how it was fixed. This information is critical to mend over any rough spots with your customers. It can also be used to CYA if any flak comes your way! Finally, I hope this will make you a problem solving rockstar, ready to write better, simpler code (with tons of debug statements using SLF4J).

Introduction to Strong Cryptography – p1.1 – Keyed Hashing, Hash Attacks, MACs, Shakespeare

Posted: June 20, 2011 in cryptography
Tags: cryptography, hacking, Java, security

This article is one of many in my Strong Cryptography series. Today we’ll dive into some deeper to some Hashing topics. Also I’m taking suggestions for subjects in my posts… Last time it was US Founding Fathers, this week it’s tragedies.

We learned in our last article what a hash function is and how for every input, there is exactly one corresponding output. We’re going to apply that concept to two commons situations where confidentiality is needed.

Common Situation #1: A situation in Verona

Lets say you have Romeo who needs to email the following message to Mercutio: “I dreamt a dream tonight.” Unfortunately for Romeo, all the ISPs in Verona are owned by the Montague family (Not much better than Comcast). Romeo doesn’t mind if the Montagues can read his message, but he certainly has an issue of they change his message. How could Romeo be sure his message was delivered without being modified by the Montague family?

Well you could run the message through a hash function and attached the hash to the message. The Montagues however, could simply modify the message, and attached the new hash value. To get around that, I suppose Romeo could calculate the hash, then tell Mercutio in private what the hash value was. However, this would defeat the purpose of emailing him, since in private he could just tell Mercutio the contents of the message.

Any ideas?

Introducing the new MAC (not what you think)

Fortunately, there is an easy solution: Message Authentication Codes or MAC. A MAC function is a ‘keyed hash function’, so you could call it a cousin of a normal hash function. In fact, MAC functions and hash functions are nearly interchangeable! In the last article, we covered the properties of a hash function. So what makes a MAC function different from a hash function?

You must posses a secret key can produce a hash value.
A MAC function shouldn’t reveal ‘hints’ about the secret key, even if an attacker chooses the input.

Because you must posses a secret key to produce the hash value, they are often called “Keyed Hash Functions.” So lets run the quote “The lady doth protest too much, methinks” through a MAC function:

	private static final byte[] keybytes = new byte[] { (byte) 0xfc, (byte) 0x4f, (byte) 0xbe, (byte) 0x23,
		(byte) 0x59, (byte) 0xf2, (byte) 0x42, (byte) 0x37, (byte) 0x4c, (byte) 0x80, (byte) 0x44, (byte) 0x31,
		(byte) 0x20, (byte) 0xda, (byte) 0x20, (byte) 0x0c };

	public static void main(String[] args) throws Exception {
		Mac mac = Mac.getInstance("HMACSHA1");
		Key key = new SecretKeySpec(keybytes, "HMACSHA1");
		mac.init(key);
		byte[] hashValue = mac.doFinal("The lady doth protest too much, methinks".getBytes());
		String encodedHashValue = Base64.encodeBase64String(hashValue);
		System.out.println(encodedHashValue);
	}

The above program produces this output:

oOUKVR5hDRh4n0H+beVO4JvMw64=

If you use a different key (lets change the last byte to 0x0d) you’ll get a completely different hash:

cDdJwEBm9qIni9A7QfIQ1e9G8qo=

So how does apply to Romeo and Mercutio? First, they meet in private and create a secret key. Both Romeo and Mercutio will know the secret key. Next, when they want to send an email, they run the message through the MAC function and produce a signature. Finally, the recipient runs the message through the same MAC function, using the same key. It the calculated signature matches the signature on the message, they can be assured the message was not tampered with during transmission.

A MAC is actually just a wrapper

Take note of the following line in the above implementation:

Mac mac = Mac.getInstance("HMACSHA1");

You may remember from the last article that SHA1 is a fairly standard hash algorithm. Well here’s the whole story: a MAC function is just a wrapper around a standard hash function. The HMAC part of HMACSHA1 is the method of wrapping. There are other MAC implementations, such as VMAC, but they are less commonly used. VMAC performs a little better than HMAC. Since HMAC and VMAC are just wrappers, you can take almost any hash function and turn them into a keyed hash function. Using SHA1’s older brothers, we can run the same quote through the program:

HMACSHA256: POP9cefgoEC9pUaWXqQ8lBbW9CdMi1k3t7LXAGYl87s=
HMACSHA512: ALFIpPMbphQJ7KZQHacGIFH3T2qI5AKUqaD8lDilNnjajGL29cZdp68sLeQTjDKD+cIAfZN86udfRdecGeTm0A==

Common Situation #2: Revisiting storing passwords in a database

If you recall in the last article, we decided on the following table structure:

USERNAME	PASSWORD_HASH	SALT
tjefferson43	HiE0AZhRWAs6Mmd7dVqppM1WtJc=	WTg68cVxPI
paulrevere1776	aktJ/0cn69Y41vssNfZDHY1CsdE=	sWVUaTGa6e

Lets add one additional column:

USERNAME	PASSWORD_HASH	SALT	MAC_KEY
tjefferson43	RFu51fVI/0y8gmPXkjo0Op9FRHU=	WTg68cVxPI	KvzpFe690vsVcW1jLqQ1vg==
paulrevere1776	SmZedewg1JD7kRhndEW2cRmcyGw=	sWVUaTGa6e	FgIpIDFi9kJgXvkp44ua4Q==

So what did we do here? Instead of using straight sha-1 function to store a representation of the user’s password, we now use HMACSHA1. What does this do for you? Not much. The better question is what does it to for hackers… It makes their life extremely difficult!

A MAC defends you from hackers (again, not what you’re thinking)

In our original table setup, a hacker could use two forms of attacks to crack our passwords. He could use a rainbow attack, (yes it’s really called that… No I don’t know why) or a brute force attack (not to be confused with a Bruce force attack).

A rainbow attack uses a multi-terrabyte database to do a reverse lookup on our hash algorithm. The database is generated ahead of time, possibly by cooperating parties and thousands of separate computers to divide up the workload. Even with the combined effort, this still takes months to generate useful tables. However, with the finished rainbow table, an attacker could lookup a hash function output to get an arbitrary input that produces the same hash as the user’s real password. Yikes! The nice thing is, using HMACSHA1 makes this infeasible for a hacker. Because we have keyed our hash algorithm differently for every user’s password, they would have to collectively regenerate the rainbow table for every individual entry, making the rainbow very very very difficult in practice!

A brute force attack is related to the rainbow attack. A brute force attack tries random combinations of letters and numbers, or even common english words until it finds an input that produces the correct output. Unfortunately, HMACSHA1 provides little additional security against brute force attacks. However, because they require powerful computing systems to run, they are less common. There are only two ways to defend against brute force attacks:

Enforce good polices around passwords:

12 characters
No english words
at least 2 numbers
at least 2 symbols
at least 2 uppercase letters

Use SHA-512 or SHA-256 instead of SHA-1.

If you’ve enjoyed this article, all that I ask if you vote me up on DZone and HackerNews. Until next time, thank you for reading!

Introduction to Strong Cryptography – p1.0 – Hash functions, US patriots

Posted: June 7, 2011 in cryptography
Tags: cryptography, hacking, Java, security

This article is one of many in my Strong Cryptography series. Today we’ll be covering one of the most useful Strong Cryptography primitives: The Hash function. These little beasts are amazing and have a multitudes of uses!

So, what is a hash function? Glad you asked, lets quote the unabated Wikipedia:

A hash function is any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array (cf. associative array). The values returned by a hash function are called hash values, hash codes, hash sums, checksums or simply hashes.

http://en.wikipedia.org/wiki/Hash_function

Sigh, Wikipedia used to have nice user-friendly definitions for common folk, since that’s not the case, let me give my own definition:

A hash function takes some input bytes, then calculates a fixed value that identifies that sequence of input bytes. Sorta like a checksum… Wait, a checksum IS a hash function! Cool!

-Jonathan on cryptography.

A (very bad) hash function could be as simple as “divide everything by two until you have one digit left, rounding up as you go along.” We could this function ƒ (ƒ stands for ƒailure of a hash function), and so ƒ(48) = 6 or ƒ(451) = 8.

A simple way to use this function ƒ is to protect a website. When a user creates an account, they choose a number as password (p). During the account creation process we calculate a new number (c = ƒ(p)) and store it for the user. We end up with (c) number, 0-9, in a database for each user.

To check a supplied password (s) against the database, we simply evaluate if (c ==ƒ(s)).

This hash function ƒ isn’t really very useful in the real world, as our keyspace is only 9 possibilities, it’s scales horribly, and it’s trivial to reverse engineer, but if we substitute fail function ƒ for something better, we end up with a secure system!

Technical Properties of an Ideal Hash Algorithm

It should represent a true many-to-one function. It should take an input, and produce an identity output that can be reproduced exactly every time. For every plaintext, there is exactly one hash value. For every hash value, there are infinite plaintexts.
The input plaintext can be any length. Yes, but see the next point:
The output hash value is always the same size, meaning it’s fixed length. So if you gave the algorithm a plaintext of 50 bytes or 765kbytes, the length of the hash value (the output) would as be say, 16 bytes. The exact length of the output depends on the algorithm design.
The always produce the same hash value (output) for the same plaintext (input). So if you hash the quote, “I go on the principle that a public debt is a public curse” you should get the value, vjPWnnLiB0BLrqUuJjtpEM5KClg= every time. This is called identifying, fingerprinting, or hashing the plaintext.
A small change in the plaintext will result in a dramatic change to the hash value. So if you change to Madison’s quote to “I go on the principle that a public debt is not a public curse” the hash value is now: g+8o7vlQAGjvlst+XsOEwIzF0Qc=.
Permanent and “one way”, meaning that the hashValue cannot be reversed to plaintext.
Given a particular hashValue, it is extremely difficult to find a plaintext that will produce that desired hashValue. These last two points run together… Basically, our function ƒ is far from ideal. Lets say you have (p = 5). It’s very easy to calculate an input (s) that would produce (5 = ƒ(s)). You could use any of (s = { 10, 20, 40} ). An ideal function should make this extremely difficult!!!

SHA1, the standard hash algorithm

SHA-1 (Secure Hash Algorithm – 1) is a hash algorithm that is quite complex, and is known to be secure at the time of this writing, for most applications (the paranoid should use SHA-256 or SHA-512). We can supply any length plaintext to SHA-1 and receive a consistent 20 byte hash value in return. Here’s how to do this is Ruby:

exabrial@arclight:~$ irb
>> require 'base64'
=> true
>> require 'openssl'
=> true
>> shaBytes = Digest::SHA1.digest('Give me liberty, or give me death!')
=> "\367\241\266`30 \2656\214G343\266\276\204\225cB\370\r"
>>  puts Base64.encode64(shaBytes)
96G2YBggtTaMRxwztr6ElWNC+A0=
=> nil
>> exit

You can use an online hash calculator to check to see if Patrick Henry’s quote was hashed correctly. Go ahead, I’ll wait…

SHA-1(Base64)    : 96G2YBggtTaMRxwztr6ElWNC+A0=

If we make a slight change to the plaintext, lets say: “Give me liberty, or let me eat cake!“, we will get a dramatically different output:

SHA-1(Base64)    : M2vPzwTPc7ALM+OkiGAwCkS1DY4=

Actually, just changing the case of the first character “give me liberty, or give me death!” will produce a completely different hashValue:

SHA-1(Base64)    : g1UFdWJfXWfkIVz42uLLxrJv58g=

Hash Algorithm Input is indeed infinte

Remember how we said the input to the hash function can be infinite? How about the entire text of The Declaration of Independence (12kb)? If you upload the file to the online calculator we’ve been using, the result is still 20 bytes:

SHA-1(Base64): mGnTR5dnrXrMqEMVoLKCMzWEHjU=

Here’s a quick Java program to crunch the same calculation:

import java.io.InputStreamReader;
import java.nio.CharBuffer;
import java.security.MessageDigest;
import org.apache.commons.codec.binary.Base64;

public class SHA1Hasher {
	public static void main(String[] args) throws Exception {
		MessageDigest md = MessageDigest.getInstance("SHA1");
		InputStreamReader insr = new InputStreamReader(SHA1Hasher.class.getResourceAsStream("/usdeclar.txt"));
		CharBuffer charBuff = CharBuffer.allocate(16000);
		String plainText;
		byte[] hashBytes;
		String hashValue;
		insr.read(charBuff);

		plainText = charBuff.flip().toString();
		hashBytes = md.digest(plainText.getBytes());
		hashValue = Base64.encodeBase64String(hashBytes);
		System.out.println("SHA-1(Base64): " + hashValue);
	}
}

Real World Usage of Hash Algorithms, Scenario 1: Storing account passwords in a database in a secure manner

You’ve been tasked with creating a website for the American colonial resistance. The British are abound and are randomly inspecting everyones’ databases for suspicious activity. The Founding Fathers have an unfortunate habit of choosing patriotic passwords like “I<3America”. How can mask passwords from British patrols so the secrets of who is loyal to The Crown, and who is a revolutionary is kept secret?

The best thing to do is created a ‘salted password table’ like this:

USERNAME	PASSWORD_HASH	SALT
tjefferson43	HiE0AZhRWAs6Mmd7dVqppM1WtJc=	WTg68cVxPI

What we will do is store the password hash, plus some random text called a salt (to add entropy to the user’s password). So Thomas Jefferson’s original password: I<3America now becomes I<3AmericaWTg68cVxPI and is hashed permanently to: HiE0AZhRWAs6Mmd7dVqppM1WtJc=.

So how do you process a login, now that the plaintext is gone? You perform the same calculation: Take the username from the login form, and lookup the salt. Then take the plaintext password from the login form and concatenated the salt from the database to reform the original string. Hash that string, and if matches the database hash, you know the correct password was supplied! Jefferson’s original patriotic password is completely masked, so those pesky Tories will never know his true allegiance! Yarr!

But what if another patriot (Paul Revere) chooses the exact same password, but is caught by the British? Couldn’t the British Regulars then just infer that anyone with the same password hash is also a patriot (and be hung)?

Because we added salt (entropy) to the plaintext password, we’re guaranteed to never have the same hash twice in our database. Lets say Paul Revere creates an account with the same password: I<3America, but we followed the required practice of using SecureRandom to create a new salt:

USERNAME	PASSWORD_HASH	SALT
tjefferson43	HiE0AZhRWAs6Mmd7dVqppM1WtJc=	WTg68cVxPI
paulrevere1776	aktJ/0cn69Y41vssNfZDHY1CsdE=	sWVUaTGa6e

So Paul Revere’s password I<3America becomes I<3AmericasWVUaTGa6e, which is hashed to: aktJ/0cn69Y41vssNfZDHY1CsdE= and stored. Later, when Paul Revere was caught and was revealed to be a patriot, Jefferson’s secret is kept safe by salting and hashing.

To summarize the benefits:

A database breach means your user accounts are still secure.
It’s impossible to tell if two people have the same password.
There are no restrictions to the length of a user’s password.
Your database column can be a fixed length instead of a varchar.
SHA1 was created by people who are smarter than you (and I). You can deflate your ego and leverage their brains.

Sidenote: The salt column is extremely important! Never ever, store, or send, a hash without a salt! Furthermore, the salt value must be securely generated with a random generator. You must use your language’s SecureRandom generator! You’ve been warned on these two points: they are CRITICALLY important (I can’t stress this enough)!

Real World Usage of Hash Algorithms, Scenario 2: Noah Webster

Yes, Noah Webster was an American revolutionary: He was a member of Connecticut militia, an editor of the Federalist newspaper, the author of American copyright, among many other things. He went on later in life to write some dictionary thing, but the important part here is he was a real hard ass when it came to teaching.

So say Noah Webster has hired you to write a program to test his students spelling. Because he so adamantly believed in spelling correctly, it’s important each student get 100%. Knowing this is a bit unrealistic though, he will allow the students to retake the test as many times as they want until they get 100%. The test consists of Mr. Webster dictating a small passage of text to the class and having them enter it correctly into your program. The program should tell them whether they spelled everything correctly. This will allow them to figure out the correct answer before turning in their final answer.

Another important piece of American history was that Noah Webster was a closet Ruby fan; you have to write this in Ruby. This presents a problem, because the students will be running your program, you can’t just embed the correct answer in your program, or they could look at the source code.

By using a hash, you can embed the hash of the correct text in the program and compare the students input to the hash.

Mr. Webster begins the test by saying to his students:

Before a standing army can rule, the people must be disarmed; as they are in almost every kingdom of Europe. The supreme power in America cannot enforce unjust laws by the sword; because the whole body of the people are armed, and constitute a force superior to any bands of regular troops that can be, on any pretense, raised in the United States.

The correct hash is: 6+IHAAJ1i0KmoPaowU1i0xSjcO4=

Knowing what you know now, can you finish the program for Mr. Webster? Post your answer in the comments!

Introduction to Strong Cryptography – p0

Posted: June 6, 2011 in cryptography
Tags: cryptography, hacking, Java, security

One thing that amazes me is that the most developers are not familiar with strong cryptography. In my career, I’ve seen all sort of mistakes that lead to leaked data, guessable passwords, unfortunate disclosures, and worse. The nice thing is, you don’t have to understand the ridiculously complex math behind the algorithms, you only have to know the rules for using them correctly. By the end of this series, my goal is to de-mystify the magic, so you can start using the primitives in your code right away!

But first, when I say Strong Cryptography, what the hell am I referring to anyway?

Strong cryptography or cryptographically strong are general terms applied cryptographic systems or components that are considered highly resistant to cryptanalysis.

http://en.wikipedia.org/wiki/Strong_cryptography

So Strong Cryptography is not some esoteric concept you are not privy to: Strong Cryptography is simply a set of definitions and algorithms that have been reviewed by experts, secret government agencies, and third-party organizations and found to be hard to break.

One thing I’ve seen repeatedly done is that developer ‘invents’ a cryptography scheme for a particular purpose. Here’s the thing, cryptography is thousands of years old. If you’ve ever ‘invented’ your own way to ‘encrypt’ data, chances are you’ve just re-invented something that has been discovered thousandsof years ago. If you want to avoid the mistakes that WEP made with wireless, Microsoft did with the XBox, or Sony made with the PS3, this blog series should help you avoid embarrassment, AND give you something impressive to say at the next cocktail party.

Finally, I just wanted to mention this is actually a very personal subject that I have a long history with. I found my first need for cryptography was passing notes to my friends as we played “Spies” in the neighborhood and needed to keep the locations of our secret forts safe. Unfortunately, my single letter substitution cipher must have been broken by some whiz kid as our treehouse was destroyed that summer… After reading Alvin’s Secret Code, we then created 2-3 sets of Caesar wheels and never lost a secret fort again!

I hope you enjoy the series!

Hate Java? You’re fighting the wrong battle.

Posted: May 30, 2011 in Programming, Uncategorized
Tags: COBOL, Java, legacy systems

One of the most interesting trends I’ve seen lately is the unpopularity of Java around blogs, DZone and others. It seems some people are even offended, some even on a personal level, by suggesting the Java is superior in any way to their favorite web2.0 language.

Java has been widely successful for a number of reasons:

It’s widely accepted in the established companies.
It’s one of the fastest languages.
It’s one of the most secure languages.
Synchronization primitives are built into the language.
It’s platform independent.
Hotspot is open source.
Thousands of vendors exist for a multitude of Java products.
Thousands of open source libraries exist for Java.
Community governance via that JCP (pre-Oracle).

This is quite a resume for any language, and it shows, as Java has enjoyed a long streak as being one of the most popular languages around.

So, why suddenly, in late 2010 and 2011, is Java suddenly the hated demon it is?

It’s popular to hate Java.
C-like syntax is no longer popular.
Hate for Oracle is being leveraged to promote individual interests.
People have been exposed to really bad code, that’s been written in Java.
… insert next hundred reasons here.

Java, the actual language and API, does have quite a few real problems… too many to list here (a mix of native and object types, an abundance of abandoned APIs, inconsistent use of checked exceptions). But I’m offering an olive branch… Lets discuss the real problem and not throw the baby out with the bath water.

So what is the real problem in the this industry? Java, with its faults, has completely conquered web application programming. On the sidelines, charging hard, new languages are being invented at a rate that is mind-blowing, to also conquer web application programming. The two are pitted together, and we’re left with what looks a bunch of preppy mall-kids battling for street territory by break dancing. And while everyone is bickering around whether PHP or Rails 3.1 runs faster and can serve more simultaneous requests, there lurks a silent elephant in the room, which is laughing quietly as we duke it out in childish arguments over syntax and runtimes.

Tell me, what do the following have in common?

Paying with a credit card.
Going to the emergency room.
Adjusting your 401k.
Using your insurance card at the dentist.
Shopping around for the best car insurance.
A BNSF train pulling a Union Pacific coal car.
Transferring money between banks.
Filling a prescription.

All the above industries are billion dollar players in our economy. All of the above industries write new COBOL and mainframe assembler programs. I’m not making this up, I work in the last industry, and I’ve interviewed and interned in the others.

For god sakes people, COBOL, invented in 1959, is still being written today, for real! We’re not talking maintaining a few lines here and there, we’re talking thousands of new lines, every day, to implement new functionality and new requirements. These industries haven’t even caught word the breeze has shifted to the cloud. These industries are essential; they form the building blocks of our economy. Despite this, they do not innovate and they carry massive expenses with their legacy technology. The costs of running business are enormous, and a good percentage of those are IT costs.

How expensive? Lets talk about mainframe licensing, for instance. Lets say you buy the Enterprise version of MongoDB and put in on a box. You then proceed to peg out the CPU doing transaction after transaction to the database… The next week, you go on vacation, and leave MongoDB running without doing a thing. How much did MongoDB cost in both weeks? The same.

Mainframes software is licensed much different. Lets say you buy your mainframe for a couple million and buy a database product for it. You then spend all week pegging the CPU(s) with database requests. You check your mail, and you now have a million dollar bill from the database vendor. Wait, I bought the hardware, why am I paying another bill? The software on a mainframe is often billed by usage, or how many CPU cycles you spend using it. If you spend 2,000,000 cpu cycles running the database, you will end up owing the vendor $2mil. Bizzare? Absolutely!

These invisible industries you utilize every day are full of bloat, legacy systems, and high costs. Java set out to conquer many fronts, and while it thoroughly took over the web application arena, it fizzled out in centralized computing. These industries are ripe for reducing costs and becoming more efficient, but honestly, we’re embarrassing ourselves. These industries stick with their legacy systems because they don’t think Ruby, Python, Scala, Lua, PHP, Java could possibly handle the ‘load’, scalability, or uptime requirements that their legacy systems provide. This is so far from the truth, but again, there has been 0 innovation in the arenas in the last 15 years, despite the progress of web technology making galaxy-sized leaps.

So next week someone will invent another DSL that makes Twitter easier to use, but your bank will be writing new COBOL to more efficiently transfer funds to another Bank. We’re embarrassing ourselves with our petty arguments. There is an entire economy that needs to see the benefits of distributed computing, but if the friendly fire continues, we’ll all lose. Lest stop these ridiculous arguments, pass the torch peacefully, and conquer some of these behemoths!

IMHO: DDD is NA to SOA, but OK for OOP…

Posted: May 22, 2011 in architecture
Tags: architecture, DDD, Java, Programming

Perhaps I’ve had a bad week, but I am not a fan of DDD… today. Maybe it’s because I’ve spent close to 50% of my career refactoring code that was over-engineered, unclear, or redundant… and DDD just happened to be present. Maybe the previous generation of programmers sat around the water cooler discussing design patterns, and they agreed DDD was second best only to pre-sliced pancakes. Maybe I haven’t really taken the time to understand when DDD really works, and that my sorrows and anger are misplaced. Perhaps many more agree with me, and I could incite lively discussion on DZone or the comments below…

I think it was my uncle who I first heard the phrase uttered, “Always walk a mile in the other guy’s shoes… so you’re a mile away and you’ve got his shoes.” Lets start with what DDD is, a simple example, and some of the pitfalls I’ve had to deal with this week.

DDD is Domain Driven Design. In a nutshell, the idea is to take the Object Oriented Programming (OOP) principles, understand them, then make them pervasive in your application architecture. The original concept of OOP was to bundle data, and logic that guards said data, together in a structure called an Object. The theory is that you model Object after real-world behavior, or a popular buzzword these days is “business logic.” Lets put the business logic right with the business data and pass it around. This is very OO! Eventually you’re so OO, that your ‘view layer’ applications are essentially just reflections of your domain layer… and actually DDD an entire process, but I’m going to just focus on the coding part for this post…

A good start would be the old, oft-abused Bank Account example. Lets have an object that represents someone’s bank account:

public class Account {
 private long balance;  // bitcoins
}

This has data, but no business logic, so we can’t get to their account balance. Lets add some:

public class Account {
 private long balance;

 public long getBalance(){
  return balance;
 }

 public void decrementAmount(long amount){
  balance = balance - amount;
 }

 public void addAmount(long amount){
  balance = balance + amount;
 }
}

Well this isn’t bad. I can clearly see what’s going on here; the methods are self-describing, and the code is clear about its intentions. I’m really feeling good about this actually.

public class Account {
	private long balance;

	public long getBalance() {
		return balance;
	}

	public void decrementAmount(long amount) {
		balance = balance - amount;
	}

	public void addAmount(long amount) {
		balance = balance + amount;
	}

	public void transferAmountFromAccount(long amount, Account foreignAccount) {
		foreignAccount.decrementAmount(amount);
		addAmount(amount);
	}

	public void fireInterestCaculation(double interestRate) {
		// office space arithmetic
		long interest = (long) (interestRate * balance);
		addAmount(interest);
	}
}

Well this is interesting… In accordance with the DRY principle, the transfer and fireInterest methods begin using other public methods internally. We could continue, maybe throw exceptions for NonSufficientFunds by calling getBalance first, or maybe even make fireInterest a recursive calculation. The reuse gets even better once we get to polymorph, extend, and generalize our objects. As we use the OO principles, we continually invoke the information hiding principle and complex business logic becomes nothing more than a glorious stack of method calls… And by now you too are thinking such a method stack could only be rivaled by a stack of pre-sliced pancakes.

So what is the opposite of DDD? An ‘Anemic Domain Model’, or one which domain object carry data but no logic. Here is the equivalent object in an anemic model:

public class Account {
	private long balance;

	public long getBalance() {
		return balance;
	}

	public void setBalance(long balance) {
		this.balance = balance;
	}

*Business logic is in a service somewhere…

Let me be perfectly honest… I started this article very harshly. I actually like DDD. DDD is the manifesting of all OO principles. You’re using the language to its fullest extent. Design patterns flow more naturally. Maybe you have fully achieved the purity of quantified conception… I hear this repeated, “Anemic domain models are leftover from procedural programming,” which to some extent is true. But are they all the bad? No, but with an Anemic Domain Model you aren’t taking advantage of the language. So DDD is the way to go? I think so, but there seems to be a few fundamental limitations with DDD that I think must be managed, and in my career, I’ve seen them managed ineffectively. The articles I’ve read on DDD focus very much on the benefits of DDD and how to implement it, but fair miserably at managing the lifecycle of DDD.

Jon’s DDD Antipattern #1: Your programmers must be domain experts.
This is an oft-omitted detail of doing DDD. Writing DDD effectively means you need to have programmers that understand the language of your domain fully, or, your analysts (that understand the domain fully) must be able to communicate in terms of OO. This should be fairly obvious because effective DDD means translating your domain from terms of accounts and interest rates to objects and functions. Furthermore, if your organization has a lot of turnover, you will be bitten rather than benefit from DDD as knowledge of how your domain was implemented evaporates.

Solution: Retain your experts, both technical and business. Recruit fresh talent into the organization to widen the pool. Group your domain experts and technical staff by domain, not by job position.

Jon’s DDD Antipattern #2: The impatient incompetent programmer
Impatient and lazy programmers are general antipatterns, but I feel in DDD they can really harm you. Remember how above I showed above how we could recycle existing methods in our code? Lets say Jane is late on a project, and she needs to add compounding interest to our Account object above. Because she’s in a hurry, she doesn’t have time to study the domain or the previous implementations. She extends the Account object as CompoundingAccount:

public class CompoundingAccount extends Account {

 public void compoundInterest(int periods, double rate){
  int interest = 0;
  for ( i = 0;  i < periods;  i++ ) {
    // more office space arithmetic
    long interest = (long) (interestRate * balance);
    addAmount(interest);
  }
 }
}

This code enters your domain and hundreds of view layers begin using the CompoundingAccount. Next, your company gets sued for not computing interest correctly (as shown here). You, as the domain expert know that in the root object know exactly where to make this change in the Account object. You do so, but because Jane did not obey the DRY principle, she has caused compounding damage to the domain! You could have hundreds of methods and views built on either fireInterest or compoundInterest. Think this won’t happen to you? Be realistic: It’s not a matter of if these events will happen; it’s simply a matter of when they will happen., and if you don’t have a contingency plan or governance, you’ve already failed.

Solution: Recognize DDD is a true double edge sword. As business logic propagates down your domain tree, mistakes will propagate too. Those mistakes will compound as methods are reused, causing a problem. You must setup governance in your domain. You must have centralized domain experts that develop and can tell you the correct place to implement new functionality as your domain expands.

Finally,
Jon’s DDD Antipattern #3: DDD is made ineffective in a distributed environment by most SOA implementations.
This is also frequently omitted. Remember how we said that classes were designed to hold state and business logic together? What happens if we translate the Account object to XML (a very common SOA protocol)

<account>
 <balance>375</balance> 
</account>

Where did my business logic go? It isn’t transferred… Protocols like REST, SOAP, RMI, Hessian transmit the state of business objects, but they do not transmit the implementation. How then, can you transmit an object between two independent systems and guarantee that fireInterestCaculation will create the exact same value? You actually can’t. The implementations on both ends of the wire must be the same, which you can’t guarantee in a distributed environment.

Solution: The only thing I can think of right now that might come close to resolving this issue the little used feature of Java RMI called Remote Class Loading. Remote Class Loading will sense that a client does not have an implementation, and load the implementation across the network. Maybe this is a solution? I haven’t experimented with this.

So as I sit here slicing my pancakes, I’ll admit I’m not opposed to DDD. In my career, DDD has simply been an innocent bystander in the war of egos vs time-schedules. The problems with DDD have been in its execution, not it’s integrity. We’ve all heard the benefits of DDD many times over, but I see very little discussion about how to manage its pitfalls. How do you manage DDD effectively? Have you any stories how DDD has because ineffectual? How do you deal with change management in your domain?

Once again, thank you for reading!

The case for EJB3.0 as a distributed SOA

Posted: May 15, 2011 in architecture
Tags: architecture, EBJ3, Java, Programming

Two years back, I was tasked with redesigning the member portal for my company. We had a great set of tools we could crank smaller applications out, and a brilliant team of people who understood them. This team frequently debated best practices in Maven, Spring, Hibernate, etc and the design patterns that surrounded them. Bean naming conventions were tossed about, file naming conventions were invented, and over all, it was pretty good life. However, due to the economy and slow business, development teams were cut back. As the lead engineer on the member portal redesign, the team was cut to two engineers… myself being one of them.

At the time, we built our applications surrounding Spring’s best practices. You could say our applications completely revolved around Spring. We had interfaces for everything: We had domain jars, DAO jars, service jars, facade jars, war poms and ear poms. The jars were built into every application; we had a service jar that knew how to get member objects from our database; we had another just to lookup pharmacies… Each one of these got baked into the applications…. Since each application went to production separately, our platform fragmented into a pile of baby powder. When SHTF (Support Hit The Fan), I think the most common question I was asked was, “Hey Jon, what version of member services is X application on?”

With every app on an ever-so-slightly-different version of a particular service, the memory usage for an individual application was outrageous. Startup times for individual apps was measured in minutes for some. With every application owning its own services, we had to provision multiple datasources for each app. Caching was unheard of… Things really started getting out of hand when our customers wouldn’t migrate between versions of our applications or SOAP services (who can blame them?). Quite slowly, we had a sprawling portfolio of slightly similar production applications that we had to keep running. Budgets and teams were shrinking, and this massive project had to take flight; deadlines were just over the horizon, but disaster was brewing in the skies and the sound of delays thundered in the distance.

Philosophically, we claimed we didn’t build monolithic applications. After all, our applications had a “separation of concerns,” only in the sense that each concern was a separate jar file. We had “services”, but these services were not distributed nor redundant. We knew we had to go the distributed route… so we started to “fix” our problems by making internal SOAP services available (After all, in Spring, everything is a bean, so why can a proxy be a bean?) Our latency skyrocketed. Function calls that took 1ms now took 40+ms. We had to limit these to very coarse-grained calls, but that just caused our SOAP payload size to swell. We brought in individual caches for SOAP calls, but now we faced memory and coherency issues.

Back to the project, realizing that we had in fact been building “monolithic” applications, we had to go distributed. But how? My coworker said, lets just replace our SOAP calls with REST calls and we’ll be in great shape. So we set off, having my teammate write the frontend in JSP/YUI, while I concentrated on developing the backend services and our usual massively complex Spring assembly. Benchmarking my calls, I saw we had gone from 40ms SOAP calls to 30ms of REST calls! What??? REST was the answer to all that was life! Panicking, I profiled the CXF/Spring code and realized I’m spending all day parsing strings! The only real benefit rest bought me was there was just less strings to parse!

I stopped. I know I’m supposed to “optimize last”, but clearly, our design or technology choices were flawed. The services had complex bindings and a million strange annotations littered about them. And oh man, did we have a ton of Spring XML files. Something had to change…

About that time, I stumbled on post by Adam Bien … Take a look at some EJB3.0 code he published:

@Remote
public interface BookService {
Book createOrUpdate(Book book);
void remove(Book book);
Book find(Object id);
}

To use invoke that service remotely (to ‘inject’ or ‘lookup’ that service) you do this:

@EJB
private BookService service;

Wait, isn’t EJB code supposed to be complex? Isn’t it full of XML, vendor specific deployment descriptors, RemoteExceptions, Portable Objects, Narrowing, IIOP, CORBA, RMI, Lookups, LocalHomes, RemoteHomes, Object pools, RMIC compilers, IDLs and IntitialContexts? Doesn’t every method have to throw annoying checked exceptions? And most of all, when the hell did EJBs get Java annotations???

Here’s the crux of this post: The above code is expressive and clear in its intent. Even a Ruby programmer (joke) could see that we’re defining a service contract here that will be offered remotely. This style of definition delivers a certain elegance in its simplicity.

…and I realized that I had a possible solution to my problems. I quickly ripped out my Spring/CXF/REST job and reduce the clutter to a few lines of code. I deleted my Spring files and flung XML into the recycle bin. The moment of truth came with the benchmarks… So EJB under the scenes uses Java serialization… and it’s fast… a lot faster than SOAP and still faster than REST. My response times dropped below 10ms. Payloads that were single lines of text dropped to 3ms or less… wow! Better yet, getting rid of the Spring Jars, the CXF jars and their laundry list of transient dependencies took our deployment artifacts from 20 megabytes to 700 kilobytes. Now that is lightweight! I got approval to move forward with the technology and I surged forward, ripping to shreds my previous XML excursion in a wake of JEE.

Now EJB3.0 isn’t without its own set of problems, which I will cover in later posts, but overall, it offers several key advantages:

It’s lightweight: You can write full services using 4 annotations and deploy under a megabyte. I haven’t seen applications that small since Windows 3.1.
It’s fast: Well faster than REST/SOAP… I’m sure there are probably faster binary protocols, but hey, Java Serialization is built into the JVM and it spanks any text protocol.
It’s familiar: You still have to be conscious that you’re in a distributed system, but most developers have used Java Serialization. For most POJOs, you don’t have to do anything except implement Serializable.
It’s transparent: It’s amazing. The EJB3.0 annotations are minimal and powerful at the same time. Developer productivity is high, and there is no generated boilerplate code. It’s easy to teach these concept to people who didn’t bear the tragedy of EJB2.1.
It’s reliable: EJB has always been a cluster-able technology if you application container supports it.

The story doesn’t end, as this was just the first iteration of the application. The new member portal application is happily humming along in production. We’ve had a few hiccups with WebSphere 7, but for the most part, it’s been relatively problem free. We’re also finally starting to reap the benefits of a distributed SOA architecture. Our applications share connection pools, we’re cutting down to a few core services, deployment artifact size is 20x smaller, and memory usage is dropping.

In future posts, I’ll write examples of just how easy it is to get these services running, and why Java is a great distributed language when coupled with JEE6. You can have developer productivity (and fun) without sacrificing reliability on bleeding edge dynamic languages.

Footnote:

A great resource for EJB3.0 is the free book Master EJB 4th Edition. I read the entire thing in a weekend on an Evo4g. It’s brief, but it will give you a working knowledge to get off and running.

Also, Adam Bien‘s weblog is a great resource for everything EJB3+ and JEE6.