by Lee C.F. Sallows
Lee Sallows is an English electronics engineer employed at the Psychology Laboratory of the University of Nijmegen. Besides the design and construction of electronic instruments associated with psychological experiments, he does a good deal of translation work, mostly of scientific and technical papers. A self-confessed dilettante, his interests have included ham radio, psychoanalysis, classical guitar, recreational mathematics and linguistics, impossible figures, logical paradoxes, Sherlockian studies, runology, mountain walking, and writing.
The pangram problem
Some years ago, a Dutch newspaper, the Nieuwe Rotterdamse Courant, carried an astonishing translation of a rather tongue-in-cheek sentence of mine that had previously appeared in one of Douglas Hofstadter's Scientific American columns ("Metamagical Themas", January 1982). Both the translation and an article describing its genesis were by Rudy Kousbroek, a well-known writer and journalist in Holland. Here is the original sentence:
Only the fool would take trouble to verify that his sentence was composed of ten a's, three b's, four c's, four d's, forty-six e's, sixteen f's, four g's, thirteen h's, fifteen i's, two k's, nine l's, four m's, twenty-five n's, twenty-four o's, five p's, sixteen r's, forty-one s's, thirty-seven t's, ten u's, eight v's, eight w's, four x's, eleven y's, twenty-seven commas, twenty-three apostrophes, seven hyphens and, last but not least, a single !
Complete verification is a tedious task: unskeptical readers may like to take my word for it that the numbers of letters and signs used in the sentence do indeed correspond with the listed totals. A text that inventories its own typography in this fashion is what I call an autogram (auto = self, gramma = letter). Strict definition is unnecessary, different conventions giving rise to variant forms; it is the use of cardinal number-words written out in full that is the essential feature. Below we shall be looking at some in which the self-enumeration restricts itself to the letters employed and ignores the punctuation.
Composing autograms can be an exacting task, to say the least. The process has points in common with playing a diabolically conceived game of patience. How does one begin? My approach is to decide first what the sentence is going to say and then make a flying guess at the number of occurrences of each sign. Writing out this provisional version, the real totals can be counted up and the initial guess updated into an improved estimate. The process is repeated, trial and error leading to successively closer approximations. This opening soon shades into the middle game. By now all of the putative totals ought to have been corrected to within two or three of the true sums. There are, say, nine f's in fact but only seven being claimed, and twenty-seven real t's where twenty-nine are declared.
An English explorer's self-referent account of his hybrid machine for solving a challenging word puzzle.
Switching seven with the nine in twenty-nine to produce nine f's and twenty-seven t's corrects both totals at a single stroke.
Introducing further cautious changes among the number-words with a view to bringing off this sort of mutual cancellation of errors should eventually carry one through to the final phase.
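The exchange of seven and nine described above works because the two number-words merely change places, so the letter content of the sentence stays exactly as it was while both claims become true. A minimal Python sketch of that check follows; the helper name `letters` and the two sentence fragments are illustrative only.

```python
from collections import Counter

def letters(text):
    """Count alphabetic characters only, ignoring case and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

before = "seven f's, twenty-nine t's"   # both claims wrong
after = "nine f's, twenty-seven t's"    # the same words, exchanged

# The words have only swapped positions, so the real totals cannot change.
same_letters = letters(before) == letters(after)
print(same_letters)
```

Because the real totals are untouched, no new discrepancies can appear elsewhere in the sentence, which is what makes this kind of move so attractive.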
The end game is reached when the number of discrepancies has been brought down to about four or less. The goal is in sight but, as in a maze, proximity is an unreliable guide. Suppose, for instance, a few days' painstaking labor has at last yielded a near-perfect specimen: only the x's are wrong. Instead of the five claimed, in reality there are six. Writing six in place of five will not merely invalidate the totals for e, f, s, and v; the x in six means that the number of x's has now become seven. Yet, replacing six by seven will only return the total to six. What now? Paradoxical situations of this kind are a commonplace of autogram construction.
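The x paradox can be reduced to a few lines of arithmetic. The sketch below is a deliberate simplification: it assumes six x's elsewhere in the sentence (the figure implied by the example) and ignores the knock-on changes to e, f, s, and v. The function name `real_x_total` is hypothetical.

```python
def real_x_total(number_word, x_elsewhere=6):
    """Real number of x's when `number_word` is written in front of "x's".

    x_elsewhere is the number of x's in the rest of the sentence; 6 matches
    the example, where "five" is claimed but six are actually present.
    """
    return x_elsewhere + number_word.count("x")

for word, claimed in [("five", 5), ("six", 6), ("seven", 7)]:
    print(f"claim {word}: real total {real_x_total(word)}, claimed {claimed}")
```

None of the three claims agrees with its own real total: "six" brings its own x along, and "seven" takes it away again.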
Interlocking feedback loops magnify tiny displacements into far-reaching upheavals; harmless truths cannot be stated without disconfirming themselves. Clearly, the only hope of dehydrating this Hydra and getting every snake-head to eat its own tail lies in doctoring the text accompanying the listed items. In looking at the above case, for example, only a fool will fail to spot instances where style has been compromised in deference to arithmetic. Short of a miracle, it is only the flexibility granted through choice of alternative forms of expression that would seem to offer any chance of escape from such a labyrinth of mirrors.
This is what made Kousbroek's translation of my sentence so stunning. Number words excepted, his rendering not only adhered closely to the original in meaning, it was simultaneously an autogram in Dutch! Or at least, so it appeared at first sight.
Counting up, I was amused to find that three of the sums quoted in his sentence did not in fact tally with the real totals. So I wrote to the author pointing out these discrepancies. This resulted a month later in a second article in the same newspaper.
Kousbroek wrote of his surprise and dismay in being caught out by the author of the original sentence, "specially come over from America, it seems, to put me right." The disparities I had pointed to, however, were nothing new to him. A single flaw had been spotted in the supposedly finished translation on the very morning of submitting his manuscript. But a happy flash revealed a way to rectify the error in the nick of time. Later, a more careful check revealed that this 'brain wave' had in fact introduced even more errors elsewhere. He'd been awaiting 'the dreaded letter with its merciless arithmetic' ever since. The account went on to tell of his titanic struggle in getting the translation straight. The new version was included; it is a spectacular achievement.
The tail concealed a subtle sting, however. At the end of his story, Kousbroek threw out a new (letter-only) autogram of his own:
Dit pangram bevat vijf a's, twee b's, twee c's, drie d's, zesenveertig e's, vijf f's, vier g's, twee h's, vijftien i's, vier j's, een k, twee l's, twee m's, zeventien n's, een o, twee p's, een q, zeven r's, vierentwintig s's, zestien t's, een u, elf v's, acht w's, een x, een y en zes z's.
A finer specimen of logological elegance is scarcely conceivable. The sentence is written in flawless Dutch and couldn't possibly be expressed in a crisper or more natural form. In ordinary translation, it says, "This pangram contains five a's, two b's, two c's ... one y, and six z's."
[A pangram, it is necessary to explain, is simply a phrase or sentence containing every letter of the alphabet at least once (pan = all, gramma = letter). This article is about self-enumerating pangrams, that is, pangrams that are simultaneously autograms. In such pangrams, some letters will occur only at the point where they themselves are listed (look at k, o, q, u, x, y).]
Following this pangram came a devilish quip in my direction: "Lee Sallows will doubtless find little difficulty in producing a magic English translation of this sentence," wrote Kousbroek.
Needless to say, I didn't manage to find any errors in this sentence of his!
Autograms by computer
Rudy's playful taunt came along at a time when I had already been looking into the possibility of computer-aided autogram construction. Anyone who has tried his hand at composition will know the drudgery of keeping careful track of letter totals. One small undetected slip in counting can later result in days of wasted work.
At first I had envisaged no more than an aid to hand-composition: a program that would count letters and provide continuous feedback on the results of keyboard-mediated surgery performed on a sentence displayed on screen. Later I began to wonder what would happen with a program that cycled through the list of number-words, checking each against its corresponding real total and making automatic replacements where necessary. Could autograms be evolved through a repetitive process of selection and mutation? Several such LISP programs were in fact written and tested: the results were not unpredictable. In every case, processing would soon become trapped in an endless loop of repeated exchanges. Increasing refinements in the criteria to be satisfied before a number-word was replaced would win only temporary respite from these vicious circles.
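The flavor of such a replace-and-recount program can be suggested in Python. This is only a simplified sketch, not a reconstruction of the LISP programs: it assumes the sentence frame "This pangram contains ...", replaces all twenty-six claims at once rather than one at a time, and stops as soon as a set of claims repeats. All names (`number_word`, `sentence`, `real_totals`, `iterate`) are illustrative.

```python
from collections import Counter
import string

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_word(n):
    """English number-word for 1 <= n <= 999 (ample for letter totals)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + number_word(rest) if rest else "")

def sentence(claims):
    """Write out the candidate sentence for 26 claimed totals, a to z."""
    items = [f"{number_word(n)} {c}" + ("'s" if n > 1 else "")
             for c, n in zip(string.ascii_lowercase, claims)]
    return "This pangram contains " + ", ".join(items[:-1]) + ", and " + items[-1] + "."

def real_totals(claims):
    """Count the letters actually present in the written-out sentence."""
    counts = Counter(ch for ch in sentence(claims).lower() if ch.isalpha())
    return tuple(counts[c] for c in string.ascii_lowercase)

def iterate(claims, max_steps=10_000):
    """Replace every claim by the real total until a fixed point or a repeat."""
    seen = set()
    for step in range(max_steps):
        actual = real_totals(claims)
        if actual == claims:
            return "autogram", step, claims
        if claims in seen:
            return "loop", step, claims
        seen.add(claims)
        claims = actual
    return "gave up", max_steps, claims

outcome, steps, final = iterate(tuple([1] * 26))
print(outcome, steps)
```

Whether a given starting guess settles or falls into a cycle has to be found by running it; the `seen` set is what turns an endless loop into a reported one.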
What seemed to be needed was a program that could look ahead to examine the ramifications of replacing nineteen by twenty, say, before actually doing so. But how is such a program to evaluate or rank prospective substitutions? Goal-directed problem solving converges on a solution by using differences between intermediate results and the final objective so as to steer processing in the direction of minimizing them. The reflexive character of autograms frustrates this approach. As we have seen, proximity is a false index. 'Near-perfect' solutions may be anything but near in terms of the number of changes needed to correct them, while a sentence with as many as eight discrepant totals might be perfected through replacing a single number-word. If hand-composition is obliged to rely on a mixture of guesswork, word chopping, prayer, and luck, how can a more intelligent strategy be incorporated into a program?
I was pondering this impasse when Rudy Kousbroek's challenge presented itself, distracted my attention, and sent me off on a different tack. The sheer hopelessness of the undertaking caught my imagination. But was it actually impossible? What a comeback if it could really be pulled off! The task was to complete a letter-only autogram beginning, "This pangram contains ...". A solution, were it discoverable, must in a sense already exist 'out there' in the abstract realm of logological space. It was like seeking a number that has to satisfy certain predetermined mathematical conditions. And nobody, least of all Kousbroek, knew whether it existed or not. The thought of finding it was a tantalizing possibility. Reckless of long odds, I put aside programs and launched into a resolute attempt to discover it by hand-trial.
It was a foolhardy quest, a search for a needle in a haystack without even the reassurance of knowing that a needle had been concealed there in the first place. Two weeks' intermittent effort won only the consolation prize of a near-perfect solution: all totals correct save one; there were 21 t's instead of the 29 claimed. With a small fudge, it could even be brought to a shaky sort of resolution:
tttttttt This pangram contains five a's, one b, two c's, two d's, twenty-seven e's, six f's, three g's, five h's, eleven i's, one j, one k, two l's, two m's, twenty n's, fourteen o's, two p's, one q, six r's, twenty-eight s's, twenty-nine t's, three u's, six v's, ten w's, four x's, five y's, and one z.
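The near-solution is easy to audit by machine. The Python sketch below counts the letters of the sentence proper (without any fudge prefix) and lists every letter whose real total differs from the claim; the variable names are illustrative.

```python
from collections import Counter

NEAR_SOLUTION = (
    "This pangram contains five a's, one b, two c's, two d's, "
    "twenty-seven e's, six f's, three g's, five h's, eleven i's, one j, "
    "one k, two l's, two m's, twenty n's, fourteen o's, two p's, one q, "
    "six r's, twenty-eight s's, twenty-nine t's, three u's, six v's, "
    "ten w's, four x's, five y's, and one z."
)

# Totals as claimed by the sentence itself.
CLAIMED = {
    "a": 5, "b": 1, "c": 2, "d": 2, "e": 27, "f": 6, "g": 3, "h": 5,
    "i": 11, "j": 1, "k": 1, "l": 2, "m": 2, "n": 20, "o": 14, "p": 2,
    "q": 1, "r": 6, "s": 28, "t": 29, "u": 3, "v": 6, "w": 10, "x": 4,
    "y": 5, "z": 1,
}

actual = Counter(ch for ch in NEAR_SOLUTION.lower() if ch.isalpha())

# Map each wrong letter to (claimed, real).
mismatches = {c: (n, actual[c]) for c, n in CLAIMED.items() if n != actual[c]}
print(mismatches)  # {'t': (29, 21)}
```

The single entry confirms the position: twenty-five totals are right, and the t's fall eight short.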
To the purist in me, that single imperfection was a hideous fracture in an otherwise flawless crystal. Luckily, however, a promising new idea now suggested itself.
The totals in the near-solution must represent a pretty realistic approach to what they would be in the perfect solution, assuming it existed. Why not use it as the basis for a systematic computer search through neighboring combinations of number-words? Each of the near-solution totals could be seen as centered in a short range of consecutive possibilities within which the perfect total was likely to fall.
The number of f's, say, would probably turn out to lie somewhere between two and ten, a band of nine candidates clustered about 'six'. With these ranges defined, a program could be written to generate and test every combination of twenty-six number-words constructible by taking one from each. The test would consist in comparing these sets of potential totals with the computed letter frequencies they gave rise to, until an exact match was found, or until all cases had been examined. Blind searching might succeed where cunning was defeated.
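Generating "every combination, taking one from each range" is a Cartesian product. The Python sketch below shows the idea on a toy case of three letters. The band of four either side of the near-solution total follows the f example; clamping the band at one, and the names `band`, `near`, and `ranges`, are assumptions made for illustration.

```python
from itertools import product
from math import prod

def band(centre, half_width=4):
    """Consecutive candidate totals around a near-solution total, never below 1."""
    return range(max(1, centre - half_width), centre + half_width + 1)

# Toy example with three letters only; totals taken from the near-solution.
near = {"f": 6, "g": 3, "h": 5}
ranges = {letter: band(total) for letter, total in near.items()}

search_size = prod(len(r) for r in ranges.values())
combinations = product(*ranges.values())   # lazy: nothing is stored in memory
first = next(combinations)

print(list(ranges["f"]))    # the nine candidates for f
print(search_size, first)
```

The size of the search is the product of the range lengths, so every extra letter multiplies the work; this is why trimming the list of letters that need searching matters so much.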
It isn't actually necessary to deal with all twenty-six totals. In English there are just ten letters of the alphabet that never occur in any number-word between 0 and 100, the one too low and the other too high to appear in the pangram. These are a, b, c, d, j, k, m, p, q, and z. The totals for these letters can thus be determined from the initial text and filled in directly:
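The list of ten letters can be confirmed by spelling out the number-words from one to ninety-nine and seeing which letters never turn up. A short Python sketch, with `number_word` as an illustrative helper:

```python
import string

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_word(n):
    """English number-word for 1 <= n <= 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

# Every character used by any number-word strictly between 0 and 100.
used = set("".join(number_word(n) for n in range(1, 100)))
unused = sorted(set(string.ascii_lowercase) - used)
critical = sorted(set(string.ascii_lowercase) & used)

print("".join(unused))    # abcdjkmpqz
print("".join(critical))  # efghilnorstuvwxy
```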
This pangram contains five a's, one b, two c's, two d's, ? e's, ? f's, ? g's, ? h's, ? i's, one j, one k, ? l's, two m's, ? n's, ? o's, two p's, one q, ? r's, ? s's, ? t's, ? u's, ? v's, ? w's, ? x's, ? y's, and one z.
This leaves exactly sixteen critical totals. Counting up shows that there are already 7 e's, 2 f's, 2 g's, 2 h's, 4 i's, 1 l, 10 n's, 11 o's, 2 r's, 24 s's, 7 t's, 1 u, 2 v's, 5 w's, 1 x, and 1 y: sixteen constants that must be added to those letters occurring in the trial list of sixteen number-words.
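The sixteen constants can be recomputed from the fixed part of the sentence, that is, the text with the sixteen unknown number-words simply left out. In the Python sketch below the names `CRITICAL`, `FIXED_TEXT`, and `constants` are illustrative.

```python
from collections import Counter

# The sixteen letters that can occur in number-words, in alphabetical order.
CRITICAL = "efghilnorstuvwxy"

# The sentence with the sixteen unknown number-words omitted.
FIXED_TEXT = (
    "This pangram contains five a's, one b, two c's, two d's, e's, f's, "
    "g's, h's, i's, one j, one k, l's, two m's, n's, o's, two p's, one q, "
    "r's, s's, t's, u's, v's, w's, x's, y's, and one z."
)

counts = Counter(ch for ch in FIXED_TEXT.lower() if ch.isalpha())
constants = [counts[c] for c in CRITICAL]
print(dict(zip(CRITICAL, constants)))
```

Note that the count for s takes every critical letter to be followed by a plural 's; a total of one would have to give that s back.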
Though straightforward in principle, the program I then set out to write carried its practical complications. Number-words lack the regularity of numerals (in whatever base notation), still less the harmony of the numbers both stand for. An obvious step was to replace number-words by PROFILES: alphabetically ordered sixteen-element lists representing their letter content.
The PROFILE for twenty-seven, for instance, would be:
 e f g h i l n o r s t u v w x y
(3 0 0 0 0 0 2 0 0 1 2 0 1 1 0 1)
The letters above the list are for guidance only, and form no part of the PROFILE itself. A special case was the PROFILE for one, which provided for the disappearance of plural s ('one x, two x's') by including -1 in the s position. PROFILES for all number-words up to fifty (anything higher than forty was unlikely ever to be needed) were stored in memory, and a label associated with each. These labels were chosen to coincide with the number represented. The label for the PROFILE of twenty-seven, for example, would be the decimal number 27.
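A table of this kind is simple to build. The Python sketch below is an illustration of the data structure rather than the original implementation: a dictionary keyed by the label, holding one sixteen-element tuple per number-word, with the -1 adjustment for one. The names `number_word`, `profile`, and `PROFILES` are assumptions.

```python
CRITICAL = "efghilnorstuvwxy"   # column order of every PROFILE

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty"]

def number_word(n):
    """English number-word for 1 <= n <= 50."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

def profile(n):
    """Letter content of the number-word for n, one entry per critical letter."""
    word = number_word(n)
    counts = [word.count(c) for c in CRITICAL]
    if n == 1:
        counts[CRITICAL.index("s")] -= 1   # "one x" carries no plural s
    return tuple(counts)

# Label (the number itself) -> PROFILE, for one up to fifty.
PROFILES = {n: profile(n) for n in range(1, 51)}

print(PROFILES[27])  # (3, 0, 0, 0, 0, 0, 2, 0, 0, 1, 2, 0, 1, 1, 0, 1)
print(PROFILES[1])   # (1, 0, 0, 0, 0, 0, 1, 1, 0, -1, 0, 0, 0, 0, 0, 0)
```

Using the number as its own label means a combination of labels can be compared directly with a list of letter totals, with no translation step in between.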
Starting with the lowest, a simple algorithm could now generate successive combinations of labels, that is, numbers, drawn from the 16 pre-defined ranges. We shall return to these shortly. Each set of labels would be used to call up the associated set of PROFILES. These 16 PROFILES would be added together element for element, and the resulting sums in turn added to the above-mentioned constants so as to form a SUMPROFILE (see Fig. 1). The SUMPROFILE would thus contain the true letter frequencies for the presently activated sentence (the 16 number-words represented by the current combination of labels plus residual text). All that remained was for the program to check whether the numbers in the SUMPROFILE coincided with the present set of PROFILE labels. If so, the candidate combination of number-words agreed with the real totals and the pangram had been found. If not, generate the next combination and try again....
The simplicity of this design conveys no hint of the uncounted alternatives reconnoitered before reaching it. The 'obvious' PROFILES were not quite so conspicuous as suggested, being in fact a later improvement over a previous look-up table.
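The SUMPROFILE test can be sketched in Python as follows. This is an illustration under stated assumptions, not the original program: the constants are those counted from the fixed text (7 e's, 2 f's, and so on), PROFILES are computed on demand instead of being stored, and the function names are hypothetical. The labels of the hand-made near-solution serve as a test case.

```python
CRITICAL = "efghilnorstuvwxy"
# Letters already present in the fixed text, in CRITICAL order.
CONSTANTS = (7, 2, 2, 2, 4, 1, 10, 11, 2, 24, 7, 1, 2, 5, 1, 1)

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty"]

def number_word(n):
    """English number-word for 1 <= n <= 50."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")

def profile(n):
    """Sixteen-element letter content of the number-word for n."""
    word = number_word(n)
    counts = [word.count(c) for c in CRITICAL]
    if n == 1:
        counts[CRITICAL.index("s")] -= 1   # no plural s after "one"
    return counts

def sum_profile(labels):
    """Constants plus the element-wise sum of the sixteen PROFILES."""
    totals = list(CONSTANTS)
    for label in labels:
        for i, k in enumerate(profile(label)):
            totals[i] += k
    return tuple(totals)

def is_solution(labels):
    """True when the claimed totals equal the real letter frequencies."""
    return sum_profile(labels) == tuple(labels)

# Labels of the near-solution, in CRITICAL order (e, f, g, ... y).
near = (27, 6, 3, 5, 11, 2, 20, 14, 6, 28, 29, 3, 6, 10, 4, 5)
real = sum_profile(near)
wrong = [c for c, claimed, found in zip(CRITICAL, near, real) if claimed != found]

print(is_solution(near), wrong)  # False ['t']
```

The whole test is integer addition and one comparison per combination; no sentence is ever written out or scanned, which is what makes an exhaustive search thinkable at all.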
Weeks were spent in exploring a quite different approach that sought to exploit the mutual-cancelling technique formerly used in hand-composition. By the time the final version of the program had come into focus, half a dozen prototypes lay behind and several months had slipped by. In the meantime, cheerful enthusiasm had given way to single-minded intensity as the problem wormed its way under my skin.
Neither was I working entirely alone.
Word of the pangram puzzle had spread among colleagues, discussion sprang up and contending design philosophies were urged. At one stage, complaint of "excessive cpu-time devoted to word games" came in from the University of Nijmegen Computing Centre, whose facilities had been shamelessly pressed into service.
This was when rival programs were running simultaneously. It was bad enough to be in search of a Holy Grail that might not even exist; the thought of someone else finding it first added a sticky sense of urgency to the hunt.
The question of determining the exact ranges of number-words to be examined seemed to me an essentially trivial one, and I put it off until last. The important thing was to get the program running. For the time being it was enough to decide what the lowest combination was going to be, and to let the algorithm generate all possibilities up to, say, ten higher for each number-word. In terms of software it was convenient for ranges to be of equal length; ten might be unnecessarily high, but better the net be too large than that the fish should escape. Since the totals in the near-solution were to define the midpoint of these ranges, their lower limits would commence at about five less. 'Fourteen o's,' for instance, implied a range running from nine up to eighteen (or perhaps ten up to nineteen). The values actually settled upon, on the basis of pencil-and-paper trials with near-autograms, may be seen in Fig. 2. Ranges for each of the sixteen critical letters are represented as vertical scales with numbers (standing for number-words) indicating their starting and finishing totals. Within these ranges fall the hand-produced near-solution sums tracing out a histogram silhouette. In most cases these are, by definition, situated roughly in the middle of the range. For the low totals l, g, and u, however, this is impossible: in a pangram all letters must occur at least once; the range cannot extend below one (see Fig. 2).
The second part of this article, reproduced by kind permission of Springer Verlag, Heidelberg and New York, will appear in the September issue of Elektor Electronics.