Sunday, July 5, 2009

Codes and Ciphers

This is from the Wall Street Journal. I thought those of you interested in ciphers might find it interesting.


FROM THE WALL STREET JOURNAL
 
For more than 200 years, buried deep within Thomas Jefferson's correspondence and papers, there lay a mysterious cipher -- a coded message that appears to have remained unsolved. Until now.

The cryptic message was sent to President Jefferson in December 1801 by his friend and frequent correspondent, Robert Patterson, a mathematics professor at the University of Pennsylvania. President Jefferson and Mr. Patterson were both officials at the American Philosophical Society -- a group that promoted scholarly research in the sciences and humanities -- and were enthusiasts of ciphers and other codes, regularly exchanging letters about them.
In this message, Mr. Patterson set out to show the president and primary author of the Declaration of Independence what he deemed to be a nearly flawless cipher. "The art of secret writing," or writing in cipher, has "engaged the attention both of the states-man & philosopher for many ages," Mr. Patterson wrote. But, he added, most ciphers fall "far short of perfection."
In Mr. Patterson's view, a perfect code had four properties: It should be adaptable to all languages; it should be simple to learn and memorize; it should be easy to write and to read; and most important of all, "it should be absolutely inscrutable to all unacquainted with the particular key or secret for decyphering."

Mr. Patterson then included in the letter an example of a message in his cipher, one that would be so difficult to decode that it would "defy the united ingenuity of the whole human race," he wrote.

There is no evidence that Jefferson, or anyone else for that matter, ever solved the code. But Jefferson did believe the cipher was so inscrutable that he considered having the State Department use it, and passed it on to the ambassador to France, Robert Livingston.
The cipher finally met its match in Lawren Smithline, a 36-year-old mathematician. Dr. Smithline has a Ph.D. in mathematics and now works professionally with cryptology, or code-breaking, at the Center for Communications Research in Princeton, N.J., a division of the Institute for Defense Analyses.

A couple of years ago, Dr. Smithline's neighbor, who was working on a Jefferson project at Princeton University, told Dr. Smithline of Mr. Patterson's mysterious cipher.
Dr. Smithline, intrigued, decided to take a look. "A problem like this cipher can keep me up at night," he says. After unlocking its hidden message in 2007, Dr. Smithline articulated his puzzle-solving techniques in a recent paper in the magazine American Scientist and also in a profile in Harvard Magazine, his alma mater's alumni journal.

The code, Mr. Patterson made clear in his letter, was not a simple substitution cipher, in which each letter of the alphabet is replaced with another. The problem with substitution ciphers is that they can be cracked by using what's termed frequency analysis, or studying the number of times that a particular letter occurs in a message. For instance, the letter "e" is the most common letter in English, so if a coded message is sufficiently long, whatever letter appears most often is likely a substitute for "e."
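To make the idea concrete, here is a minimal Python sketch of frequency analysis. It is only an illustration: the helper names (letter_frequencies, caesar_shift) are made up for this example, and the toy cipher is a simple alphabet shift, which is just one kind of substitution cipher.

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Return (letter, count) pairs, most common first."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    return Counter(letters).most_common()

def caesar_shift(text, shift):
    """Toy substitution cipher: move every letter `shift` places along the alphabet."""
    out = []
    for c in text.lower():
        if c in string.ascii_lowercase:
            out.append(chr((ord(c) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(c)  # leave spaces and punctuation alone
    return "".join(out)

plaintext = ("when in the course of human events it becomes necessary "
             "for one people to dissolve the political bands")
ciphertext = caesar_shift(plaintext, 3)

# Assume the most frequent ciphertext letter stands for "e".
top_letter = letter_frequencies(ciphertext)[0][0]
guessed_shift = (ord(top_letter) - ord("e")) % 26
recovered = caesar_shift(ciphertext, -guessed_shift)
```

On a message this short the guess happens to work because "e" clearly dominates; with very short or unusual texts the most frequent letter may not be "e" at all.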

Because frequency analysis was already well known in the 19th century, cryptographers of the time turned to other techniques. One was called the nomenclator: a catalog of numbers, each standing for a word, syllable, phrase or letter. Mr. Jefferson's correspondence shows that he used several code books of nomenclators. An issue with these tools, according to Mr. Patterson's criteria, is that a nomenclator is too tough to memorize.

Jefferson even wrote about his own ingenious code, a model of which is at his home, Monticello, in Charlottesville, Va. Called the wheel cipher, the device consisted of cylindrical pieces, threaded onto an iron spindle, with letters inscribed on the edge of each wheel in a random order. Users could scramble and unscramble words simply by turning the wheels.

But Mr. Patterson had a few more tricks up his sleeve. He wrote the message text vertically, in columns from left to right, using no capital letters or spaces. The writing formed a grid, in this case of about 40 lines of some 60 letters each.

Then, Mr. Patterson broke the grid into sections of up to nine lines, numbering each line in the section from one to nine. In the next step, Mr. Patterson transcribed each numbered line to form a new grid, scrambling the order of the numbered lines within each section. Every section, however, repeated the same jumbled order of lines.

Solving the puzzle, as Mr. Patterson explained in his letter, required knowing three things: the number of lines in each section, the order in which those lines were transcribed and the number of random letters added to each line.

The key to the code consisted of a series of two-digit numbers. The first digit indicated the line number within a section, while the second was the number of random letters added to the beginning of that row. For instance, if the key was 58, 71, 33, that meant that Mr. Patterson moved row five to the first line of a section and added eight random letters; then moved row seven to the second line and added one letter; and then moved row three to the third line and added three random letters. Mr. Patterson estimated that the number of potential combinations to solve the puzzle was "upwards of ninety millions of millions."
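The steps described above can be sketched in a few lines of Python. This is a rough illustration based only on the article's description, not a reconstruction of Mr. Patterson's actual worksheet: the function names are invented here, the three-entry key is hypothetical, and the sketch assumes the number of rows is an exact multiple of the section size.

```python
import random
import string

def write_in_columns(message, n_rows):
    """Write the letters down successive columns and return the grid's rows."""
    letters = [c for c in message.lower() if "a" <= c <= "z"]
    rows = ["" for _ in range(n_rows)]
    for i, c in enumerate(letters):
        rows[i % n_rows] += c  # letter i lands in row i mod n_rows
    return rows

def encipher(message, key, n_sections=1, rng=None):
    """key is a list of (line_number, pad) pairs, one per line of a section."""
    rng = rng or random.Random()
    size = len(key)
    rows = write_in_columns(message, size * n_sections)
    out = []
    for s in range(n_sections):
        section = rows[s * size:(s + 1) * size]
        for line_no, pad in key:  # same jumbled order in every section
            filler = "".join(rng.choice(string.ascii_lowercase) for _ in range(pad))
            out.append(filler + section[line_no - 1])
    return out

def decipher(cipher_rows, key):
    """Undo encipher: strip the filler, restore row order, read down the columns."""
    size = len(key)
    rows = [""] * len(cipher_rows)
    for i, line in enumerate(cipher_rows):
        s, k = divmod(i, size)
        line_no, pad = key[k]
        rows[s * size + line_no - 1] = line[pad:]
    width = max(len(r) for r in rows)
    return "".join(r[c] for c in range(width) for r in rows if c < len(r))

# Hypothetical key: row 3 first with 2 filler letters, then row 1 with 4, then row 2 with 1.
demo_key = [(3, 2), (1, 4), (2, 1)]
cipher_rows = encipher("When in the course of human events", demo_key,
                       n_sections=2, rng=random.Random(0))
plain = decipher(cipher_rows, demo_key)
```

Anyone holding the key can strip the filler and restore the row order in one pass; without it, the filler letters shift every row by an unknown amount, which is what makes the rows hard to line up.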

After explaining this in his letter, Mr. Patterson wrote, "I presume the utter impossibility of decyphering will be readily acknowledged."

Undaunted, Dr. Smithline decided to tackle the cipher by analyzing the probability of digraphs, or pairs of letters. Certain pairs of letters, such as "dx," don't exist in English, while some letters almost always appear next to a certain other letter, such as "u" after "q."
To get a sense of language patterns of the era, Dr. Smithline studied the 80,000 letter-characters contained in Jefferson's State of the Union addresses, and counted the frequency of occurrences of "aa," "ab," "ac," through "zz."
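Counting digraphs is simple to do by machine. The following Python sketch shows one way to tally them; the function name is invented for this example, and it assumes spaces and punctuation are dropped first, as they were in the cipher itself.

```python
from collections import Counter

def digraph_counts(text):
    """Count every pair of adjacent letters, ignoring spaces and punctuation."""
    letters = "".join(c for c in text.lower() if "a" <= c <= "z")
    return Counter(letters[i:i + 2] for i in range(len(letters) - 1))

counts = digraph_counts("the quick question")
common = counts["qu"]   # "u" follows "q" twice in this sample
missing = counts["dx"]  # a pair that never occurs counts as zero
```

Run over a large sample of period text, a table like this shows which letter pairs are plausible neighbors and which are not.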

Dr. Smithline then made a series of educated guesses, such as the number of rows per section, which two rows belong next to each other, and the number of random letters inserted into a line.
To help vet his guesses, he turned to a tool not available during the 19th century: a computer algorithm. He used what's called "dynamic programming," which solves large problems by breaking them down into smaller pieces and linking together the solutions.
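For readers unfamiliar with dynamic programming, here is a small generic Python sketch of the idea applied to a related toy problem: given a score for how plausibly one row sits directly below another, find the row order with the best total. This is not Dr. Smithline's algorithm, whose details the article does not give; the function name and the score table are made up for illustration.

```python
from functools import lru_cache

def best_order(score):
    """score[a][b] says how plausible it is that row b sits directly below row a.
    Returns (total, order) with the highest sum over adjacent pairs."""
    n = len(score)

    @lru_cache(maxsize=None)
    def best(used, last):
        # Best result for the set of rows in bitmask `used`, ending with row `last`.
        if used == 1 << last:
            return 0.0, (last,)
        prev_used = used & ~(1 << last)
        candidates = []
        for p in range(n):
            if prev_used & (1 << p):
                total, order = best(prev_used, p)  # smaller subproblem, cached
                candidates.append((total + score[p][last], order + (last,)))
        return max(candidates)

    full = (1 << n) - 1
    return max(best(full, last) for last in range(n))

# Made-up scores in which the chain 2 -> 0 -> 3 -> 1 is clearly the best fit.
toy_scores = [
    [0, 0, 0, 5],
    [0, 0, 0, 0],
    [5, 0, 0, 0],
    [0, 5, 0, 0],
]
total, order = best_order(toy_scores)
```

Each partial answer is computed once and reused, so the work grows far more slowly than trying every possible ordering one at a time. In a real attack the scores would come from digraph statistics rather than a hand-written table.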

The overall calculations necessary to solve the puzzle were fewer than 100,000, which Dr. Smithline says would be "tedious in the 19th century, but doable."

After about a week of working on the puzzle, the numerical key to Mr. Patterson's cipher emerged -- 13, 34, 57, 65, 22, 78, 49. Using that key, he was able to recover the cipher's text:

"In Congress, July Fourth, one thousand seven hundred and seventy six. A declaration by the Representatives of the United States of America in Congress assembled. When in the course of human events..."

That, of course, is the beginning -- with a few liberties taken -- of the Declaration of Independence, written at least in part by Jefferson himself. "Patterson played this little joke on Thomas Jefferson," says Dr. Smithline. "And nobody knew until now."
