One of the most common (and very easy to crack) ciphers is substitution. One sometimes sees these in a newspaper somewhere near the crossword puzzle. This scheme is also the one used in Poe's story ``The Gold Bug'', and the Sherlock Holmes ``Mystery of the Dancing Men''. The idea is straightforward: choose a rearrangement of the letters of the alphabet, and replace each letter in the plaintext by its corresponding one.
Such substitutions are very simple to implement in Maple. We will use the CharacterMap function from the StringTools package4.2which does exactly what we want. First, we define two ``alphabets''- one for our plain text, and one for our encrypted text. The second is a permutation of the first.
>
with(StringTools):
Alphabet :="abcdefghijklmnopqrstuvwxyz":
Cryptabet:="THEQUICKBROWNFXJMPSVLAZYDG":
Now we define a pair of functions4.3to encrypt and decrypt our text.
>
Scramble := msg -> CharacterMap(Alphabet, Cryptabet, msg):
Unscramble:= code -> CharacterMap(Cryptabet,Alphabet, code):
>
Scramble("please do not read this");
>
Unscramble(
Note that our message contains a spaces which are preserved in the encryption process, because the CharacterMap function only modifies those characters which are found in the first string. If a character isn't found, it is left alone. Thus, if we try to scramble a message containing punctuation or upper-case letters, they aren't modified by Scramble. Since our Cryptabet is all upper-case, a message which began with mixed case will be distorted.
>
Scramble("He said I need direction; I head for the door.");
Unscramble(
It is traditional in pen-and-paper cryptography to use a 26-letter alphabet
(that is, to remove spaces and punctuation and ignore any distinction
between upper and lower case letters). If you wish to do this, it could be
done4.4using LowerCase (which translates all letters to lower case),
and a combination of Select and IsLower to remove anything
that isn't a letter.
>
Select(IsLower,LowerCase("I don't mind suggestions. Don't give me any more."));
>
Scramble( Unscramble(
In general we won't adhere to this convention in this chapter.
Rather, we will treat spaces and punctuation like any other
character. This is not a common approach, but one we happen to like.
There are two big drawbacks to such a substitution cipher. One
primary problem is that the key for enciphering and deciphering is the
permutation: you need to remember an ordering of the entire alphabet.
A second problem, shared with all such monoalphabetic substitutions (i.e., those that substitute one letter for another), is that they aren't that hard to break. If you have a long enough message, you can use the fact that certain letters occur much more frequently than others to help tease out the original text. In English text the letter ``e'' is the occurs most frequently, followed by ``t'', ``a'', ``n'', ``i'', and so on. Spaces occur about twice as often as the letter ``e'', assuming they are encrypted, rather than just omitted. Once you guess a few letters, common words like ``the'' and ``and'' fall into place, giving more clues.
We will examine some polyalphabetic substitution ciphers in §5.
A variation on the arbitrary permutation discussed above is the ``Caesar cipher'', named after Julius Caesar, who supposedly invented it himself. This is also what was done by a ``Captain Midnight Secret Decoder Ring'', popular in the 1930s and 40s. Here we convert our alphabet to numeric equivalents (with, say A=0, B=1, and so on), add an offset to each numeric equivalent (legend has it that Caesar used an offset of 3), then re-encode the numbers as letters. We do the addition modulo the length of the alphabet; that is, when we shift a Y by 3, we get B, since 24 + 3 = 27, and 27 1 (mod 26). However, we won't always restrict ourselves to a 26 letter alphabet-- we may want punctuation, numbers, and to preserve upper- and lower-case distinctions.4.5
While we could easily implement the Caesar cipher without converting our text to numbers, doing so makes it easier to understand and sets us up for more complicated ciphers we will need later.
Computers already have a built-in numeric representation of the alphabet-- the ASCII encoding.4.6At their lowest level, digital computers operate only on numeric quantities-- they just manipulate ones and zeros. The smallest piece of information is the state of a switch: it is off (0) or it is on (1). This is called a bit, which is short for binary digit. A group of 8 bits is called a byte, and is a single ``character'' of information.4.7A byte can have 28 = 256 different states:
So how does a computer represent characters? There is an agreed upon standard encoding of characters, punctuation, and control codes into the first 127 integers. This is called ASCII,4.8an acronym for American Standard Code for Information Interchange. Extended ASCII encodes more characters into the range 128-255 (this is where you would find characters for å and ±, for example), although the characters found in this range vary a bit. Not all programs can deal successfully with characters in the extended range4.9
Maple has a built-in routine to allow us to convert characters to and from
their ASCII equivalents. If it is called with a list of numbers, it returns
a string of characters, and if called with a character string, out comes a
list of numbers. For example,
>
convert("How do I work this?",bytes);
>
convert([72, 117, 104, 63],bytes);
We can use this to make a list of the ASCII codes. Maple prints a
when the ASCII code is not a printable character. We make the table
for characters up through 127 because the meanings in extended ASCII differ
depending on what character set you use. Feel free to make your own table
of all 255 characters. (In the interests of making a nicely formatted
table, we have used a somewhat complicated command. The much simpler
seq([i,convert([i],bytes)],i=0..127); would have produced the
same information, just not as neatly.)
>
matrix([ [_,seq(i, i=0..9)],
seq([10*j, seq(convert([i+10*j],bytes), i=0..9)], j=0..12)]);
Note that some of the characters (such as ASCII code 10) print as a backslash and a letter, like \n. These are codes for special control characters: \n indicates a newline, \t is a tab, and so on. Since Maple indicates a string by printing it inside double quotes, it indicates the double quote4.10character (ASCII 34) using \'', and the backslash character (ASCII 92) with a pair of backslashes. While it doesn't print out in Maple, ASCII 127 is the control sequence DEL (sometimes printed as ^?). This sometimes is interpreted as ``delete the previous character''; if your output device interprets it that way, characters can seem to mysteriously vanish.
Another special character is ASCII code 0: note that it prints as nothing, not even a blank space (ASCII 32 is the space character). This is the null character, and is typically used internally to indicate the end of a string. If a null occurs in the middle of a string, usually what comes after it won't print out unless you take special steps. To see this for yourself, try a command like convert([73,116,39,115,0,103,111,110,101,33],bytes). You'll see that the characters following the 0 don't print out.
The StringTools package also contains Char which gives the character corresponding to a decimal number, and Ord which gives the number corresponding to a character. We'll generally use convert, although it makes little difference.
Now we are nearly ready to write a version of the Caesar cipher. The basic idea is that for each character of the plaintext, we convert it a numerical representation, add a pre-specified offset modulo 127, then convert back to characters. Unlike the Scramble function we wrote in §2.1, we needn't declare an alphabet explicitly because we will use the ASCII codes. However, in addition to the Maple command convert( ,bytes), we will use another new command, map.
The map command allows us to apply a function to every element of a vector or list in one call. For example, to compute the square root of each of the first 10 integers, we could do
>
map(sqrt,[1,2,3,4,5,6,7,8,9,10]);
If the function we wish to apply takes two variables, we can give the second one as a third argument to map. For example, the command
>
map(modp,[1,2,3,4,5,6,7,8,9,10],3);
computes n mod 3 as n ranges over the list [1, 2,..., 10]. (Recall that n mod p is the remainder after dividing n by p.)
We finally are ready.
See if you can understand it before reading the
explanation below. Remember to read from the middle outward.
Note that we are defining the function we are mapping directly as
the first argument.
>
Caesar:= (plain, shift) ->
convert(map(x -> (x+shift-1) mod 127 + 1,
convert(plain,bytes)),
bytes):
Let's give it a try:
>
Caesar("Et tu, Brute?",3);
If you don't see what is going on in Caesar, try to think of it as a composition of several functions, each of which you DO understand. That is, Caesar works as follows
Note that we didn't really compute
; instead we subtracted one
from the sum, computed mod 127, then added 1 again. Why?
Consider what would happen if, for example, our character was ``k''
(ASCII code 107) and we shifted by 20. Then
(107 + 20) mod 127
gives 0, which is the code for the null byte, indicating the end of
a string. This would be a problem. So instead, we add 1 afterward
to ensure that our answer is in the range
1,..., 127. And just
to avoid confusing ourselves, we subtract the 1 first, so that
shifting by 0 doesn't change anything.
Aside from the fact that this function is a bit confusing to read and offers essentially no cryptographic security, what else is wrong with it? Consider the following example:
>
Caesar("Who put the benzedrine in Mrs. Murphy's Ovaltine?", 86);
In addition to the fact that the result looks funny, if we were to try to decipher this from printed output, we would have a little (but not much) trouble, because the characters ``.'' and ``?'', after encoding, print out as .4.11 This happens because the numeric values of these characters, after shifting by 86, are 5 and 22, respectively. Neither of those corresponds to a printing character, so we can't tell them apart. Of course, if we had no intention of dealing with our encoded message in printed form, this would be no problem. The actual character values are different, they just print out the same.
How can we get around this? We can restrict ourselves to printing characters using a variation of the idea in Scramble of §2.1, that is, giving an explicit Alphabet that we will deal in.
Before doing this, however, it will be helpful to discuss the proc command, which allows us to construct functions that consist of more than one Maple command.