by Geoffrey Hart
Previously published as: Hart, G. 2024. Deciphering words by recognizing common letter substitutions. American Editor Dec. 4/24.
If you work with authors who use non-Roman alphabets, trying to figure out what word an author was striving for is sometimes easier if you know a few tricks. Understanding the characteristics of an author’s language is a good start, since it often provides insights such as letters from our alphabet that they don’t use in their language, or use differently.
For example, Japanese authors often reverse L and R (e.g., glass becomes grass, or vice versa) because the closest Japanese sound falls somewhere between the English R and L. In contrast, Chinese authors don’t have this problem because Chinese uses both letters the same way we do in English. Instead, because Chinese uses complex visual patterns to define the language’s characters, Chinese authors are often fooled by look-alike English letters.
This article describes three of the common types of visual error that result in incorrect words that visually resemble the correct words: similar letter shapes, missing or added letter parts, and rotations or flips of a letter. Although you can often figure out what the word should be (e.g., because only one word fits in the context of the surrounding words), that doesn’t always help. Many of these changes create non-words that will turn up in spellcheck, but some result in legitimate words that the spellchecker will ignore even though they’re inappropriate for their current context.
Look closely at the English alphabet and you’ll see that many letters have similar shapes. For example, many English letters have a round (almost circular) stroke at the heart of the character, with one or more protruding parts. Consider, for example, the arms, ascenders, and descenders of the letters b, d, p, and q, which are identical apart from being rotated or flipped. Other characters may have the same number and position of strokes, but with different curvature of the left, right, or bottom sides of the letter. Consider, for example, k and x: four arms, but different angles between them and different arm lengths. These similarities are particularly problematic for authors who don’t use an English keyboard, and who must instead pick each character from a crowded palette of letters.
The shape similarity errors that I commonly see are:
c / e: The tops and bottoms of the two letters have similarly curved horizontal strokes, and particularly with small type or type with a small x-height (the height of the lower-case letter x), it’s easy to miss the additional horizontal stroke at the center of the e.
r n / m: Kerning is the process of moving pairs of letters closer together or farther apart to achieve an esthetic visual effect. In tightly kerned fonts, the upper horizontal stroke of the r can appear to merge with the tail at the top left of the n, so that the pair looks like an m.
K / X and k / x: The primary difference lies between the vertical left stroke of the “K” and the sometimes-shallow indentation between the two strokes that form the left side of the x.
f i / h: Again, the problem relates to kerning, with the horizontal stroke of the f merging with the top of the i to produce what looks like an h (because the dot above the i is small and easy to miss). Several ligatures deliberately join two letters to produce a third character, such as when the two letters f and i are replaced with a single character (ﬁ) that preserves characteristics of both letters. These special characters are supported or even automatically applied by some software, such as InDesign.
lower-case L (l) / the number 1: In many fonts (e.g., Times New Roman), the two characters are so similar that it’s practically impossible to distinguish between them. I once received a manuscript in which the author had typed all the lower-case L’s using the number 1.
Capital O / the number 0: Similarly, the letter and the number can be difficult to distinguish in some fonts, particularly at small sizes. I’ve occasionally received a manuscript in which the author typed the number using the letter or vice versa.
u / ii: When the two letters are tightly kerned, whether automatically because of the font’s spacing definitions (“font metrics”) or manually by a designer, the serifs on the bottom edges of the two i’s can merge to resemble the bottom horizontal stroke of the u.
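The digit-for-letter substitutions in this list (1 for lower-case L, 0 for capital O) are the easiest ones to search for automatically. The following is only a minimal sketch in Python, not part of the Word workflow described in this article: the function name `flag_mixed_tokens` and the decision to suggest a lower-case o unless the whole word is capitalized are my own assumptions for illustration.

```python
import re

# A "word" that contains at least one letter and at least one 0 or 1.
# Pure numbers such as "10" are deliberately not matched.
MIXED_TOKEN = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*[01])\w+\b")


def flag_mixed_tokens(text):
    """Return (token, suggestion) pairs for words that mix letters with 0 or 1."""
    results = []
    for match in MIXED_TOKEN.finditer(text):
        token = match.group()
        # Assumption: use capital O only when the rest of the word is upper case.
        letter_o = "O" if token.isupper() else "o"
        suggestion = token.replace("1", "l").replace("0", letter_o)
        results.append((token, suggestion))
    return results


if __name__ == "__main__":
    sample = "The contro1 group had 10 samples of s0il."
    for token, suggestion in flag_mixed_tokens(sample):
        print(f"{token} -> {suggestion}?")
```

The suggestions are only prompts for a human check: a legitimate term such as B12 would also be flagged, so nothing should be replaced automatically.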
Letters generally have a “center of gravity” that defines where most of their strokes lie, and those strokes provide the basis for recognizing the characters. If a small stroke is removed from or added to the core of a letter, the crucial strokes in the core that we use to recognize the character as we read remain visible, which makes the two characters appear similar. For example, “b” and “d” are essentially the same character, but with a vertical stroke added on the left and right sides of the central bowl (the o), respectively. Two such modifications that I commonly see involve the lower-case “n”:
r / n: Here, the problem is that adding a stroke to the open right side of the r converts that letter into an n. This is particularly common if the r is followed by a character with a vertical stroke at its left edge, such as lower-case L, which changes the r into the letter n.
h / n: Here, the problem is that the loss of part of the vertical stroke on the left side of the “h” converts the character into an “n.”
The visual part of our brain is very skilled at recognizing spatial rearrangements of a letter that don’t change any of its fundamental parts. All that changes is their orientation, and this may not be sufficient to cue the reader that they’re seeing a new letter. Two involve simple rotations:
Z / N: Created by a 90° clockwise rotation from Z to N.
Two more involve flips:
p / b: Created by flipping the “p” vertically so its left stem faces upward.
These kinds of errors are not common, but they can make it difficult to figure out what word an author intended. Authors who have English as their second language aren’t the only ones who can have problems with letter shapes. Even if your native language is English, fatigue or the need to proofread a PDF file that uses a small, ornate, or tightly kerned font can make it easy to miss such letter substitutions.
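When a word looks wrong and context does not settle the matter, one way to generate guesses is to try each look-alike substitution in turn and keep only the results that are real words. The Python sketch below illustrates the idea under stated assumptions: the pair list is drawn from the examples in this article (plus the L / R swap mentioned for Japanese authors), the function name `candidates` is hypothetical, and the tiny `vocabulary` set stands in for whatever word list you actually have.

```python
# Look-alike pairs taken from the examples in this article.
# Each pair is tried in both directions.
LOOKALIKE_PAIRS = [
    ("c", "e"), ("rn", "m"), ("k", "x"), ("fi", "h"), ("ii", "u"),
    ("r", "n"), ("h", "n"), ("p", "b"), ("l", "r"), ("Z", "N"),
]


def candidates(word, vocabulary):
    """Return vocabulary words reachable from `word` by one substitution."""
    found = set()
    for first, second in LOOKALIKE_PAIRS:
        for old, new in ((first, second), (second, first)):
            start = word.find(old)
            while start != -1:
                # Replace only this one occurrence, then test the result.
                variant = word[:start] + new + word[start + len(old):]
                if variant in vocabulary:
                    found.add(variant)
                start = word.find(old, start + 1)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical mini word list; a real one would be far larger.
    vocabulary = {"modern", "glass", "grass", "corn", "then", "than"}
    print(candidates("modem", vocabulary))
    print(candidates("grass", vocabulary))
```

Because the sketch changes one occurrence at a time, it will not recover a word that contains two separate substitutions; that is a deliberate simplification to keep the list of guesses short.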
Fortunately, our computers can help us spot such problems. If you’re having difficulty spotting specific letters (e.g., ligatures) or letter combinations, you can create an exclusion dictionary (Windows only) or a search and replace macro that highlights the problem text so you can easily find it and review it (Windows and Mac). See the Appendices for instructions on how.
The exclusion dictionary described here is not available in Word for Macintosh. If you’re using a Mac, see Appendix 2 for an alternative.
If you’re using Word for Windows, you can add the problem words to your spellchecker’s “exclusion” list. These are words that are spelled correctly, but that you want the spellchecker to flag anyway. To accomplish this:
Launch File Explorer from the Windows taskbar (or press Windows+E), then open the View tab and select the checkbox for “Hidden Items” at the right side of the toolbar.
Use File Explorer to navigate to C:\Users\[your name]\AppData\Roaming\Microsoft\Uproof\. Replace [your name] with your username. For example, my path name includes “Geoff.” Once you have substituted your username in the example path name, you can copy the path and paste it into the address field at the top of File Explorer.
Select the exclude dictionary for the spellcheck language that you will use it with. For example, for editing in U.S. English, select ExcludeDictionaryEN0409.lex. You can find a complete (thus, very long) list of the language identifiers (here, EN0409) at http://msdn.microsoft.com/en-us/goglobal/bb964664.aspx.
Open this dictionary by right-clicking or Control-clicking on the file name, selecting Open With, and selecting Notepad. You can also edit it in Word, as long as you don’t let Word change the file into a Word file; if it does, simply save the file again with the file name extension .lex and, in the Save As dialog box, change the file format to “text.”
Type the words you want to have Word flag into the document window and press Return or Enter after each word. Save the file when you’re done.
Because it’s so convoluted to get to this file, I recommend creating a shortcut to the file (or to the Uproof directory that contains it) and storing it somewhere easy to access. For example, select the file or Uproof directory icon in File Explorer, right-click or Control-click the icon, and select Pin to Quick Access or Pin to Start Menu.
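If you maintain a long list of problem words, the manual steps above could in principle be scripted. The following Python sketch is an illustration only, built on several assumptions that you should verify on your own system before pointing it at a real dictionary: that the `APPDATA` environment variable leads to the Roaming folder in the path above, that the file is stored as UTF-16 text (check what Notepad reports for your copy), and that one word per line is all the file contains. The function names are hypothetical. Make a backup copy of the .lex file first.

```python
import os
from pathlib import Path


def exclusion_dictionary_path(language_id="EN0409", appdata=None):
    """Build the path to an exclusion dictionary (assumed folder layout)."""
    # Assumption: APPDATA points at C:\Users\[your name]\AppData\Roaming.
    appdata = appdata or os.environ.get("APPDATA", "")
    return Path(appdata) / "Microsoft" / "Uproof" / f"ExcludeDictionary{language_id}.lex"


def add_excluded_words(lex_path, new_words, encoding="utf-16"):
    """Merge new_words into the file, one word per line, skipping duplicates."""
    lex_path = Path(lex_path)
    existing = []
    if lex_path.exists() and lex_path.stat().st_size > 0:
        existing = lex_path.read_text(encoding=encoding).splitlines()
    merged = [word.strip() for word in existing if word.strip()]
    for word in new_words:
        word = word.strip()
        if word and word not in merged:
            merged.append(word)
    # Rewrite the whole file so the encoding marker appears only once.
    with open(lex_path, "w", encoding=encoding, newline="\r\n") as handle:
        handle.write("".join(word + "\n" for word in merged))
    return merged


if __name__ == "__main__":
    # Demonstration on a scratch file in the current folder, not the real dictionary.
    demo = Path("ExcludeDictionaryDEMO.lex")
    print(add_excluded_words(demo, ["grass", "modem"]))
```

The file is rewritten in full rather than appended to, because appending with this encoding would insert a second encoding marker in the middle of the file.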
This second option works for both Mac and Windows versions of Word. All you need to do is create a macro that searches for and highlights potentially problematic words or letter patterns. If Track Changes is on, turn it off; the highlighting steps that I’ve included in the macro are only for your use, not for the author. When you’ve finished reviewing the highlighted text, remove the highlighting: set the highlight color to “no color,” select the whole document (Control+A in Windows, Command+A on the Mac), then click the highlighter marker icon to remove the format. Turn on Track Changes again if you need to do additional editing.
To create the search and replace macro, use Word’s macro recorder to record the following series of actions or open the macro editor and copy/paste the macro text that I’ve provided later in this Appendix:
Start the macro recorder: View tab > Macros > Record Macro.
Set the Highlighter Marker color to the desired highlight color.
Open the Find and Replace dialog.
Expand the dialog box to show all options.
Click to place the cursor in the Find What field, then type the first word or letter or combination of letters that you want to find.
Click to place the cursor in the Replace With field, then open the Format menu at the bottom of the dialog box and select Highlight.
Click Replace All, then close the Find and Replace dialog.
Repeat the find and replace operation for all words you want to highlight.
When you’re finished, stop the macro recorder: View tab > Macros > Stop Recording.
Rather than re-recording this macro every time you want to add a word, simply edit the macro by copying and pasting the part of the macro that performs the search and replace operation for a single word. To edit a macro in Word:
View tab > Macros > View Macros.
Select the macro you want to update and click Edit.
Don’t be intimidated by the complicated screen display. The part you’re interested in contains macro instructions similar to the following at the right of the screen:
Sub Highlighter()
    ' Highlight a list of words throughout a document.
    ' Clear any formatting left over from a previous search.
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    ' Apply highlighting to every match.
    Selection.Find.Replacement.Highlight = True
    With Selection.Find
        ' The word or letters to find.
        .Text = "the"
        ' Empty replacement text: the found text is kept and only its format changes.
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub
To edit any of this text, simply click to place the cursor where you want it, just as if you were editing text in Word. If you want to apply any additional formats to the search pattern (e.g., “MatchCase” to match the capitalization of the search term), change False to True. To add new words in the macro code:
Select the lines of text that begin with With Selection.Find and end with Selection.Find.Execute.
Copy those words.
Move the cursor to the start of the line “End Sub.”
Paste the copied text, then edit the line .Text = "the" in the pasted copy, replacing the word inside the quotation marks (here, the) with the word or letter combination that you want to highlight.
Your changes will be saved automatically. When you’re finished making the necessary additions, return to the Word window by quitting the macro editor (Alt+F4 in Windows, Command+Q on the Mac).
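If your list of problem words grows long, the copy-and-paste steps above become tedious. As an alternative, the macro text itself can be generated from a word list and then pasted into the macro editor. The Python sketch below shows one way this might be done; it reuses the macro text shown earlier in this Appendix, and the function name `build_macro` and the example words are my own, for illustration only.

```python
# Opening lines of the macro, copied from the recorded example.
HEADER = """Sub Highlighter()
' Highlight a list of words throughout a document.
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
Selection.Find.Replacement.Highlight = True
"""

# The part of the macro that is repeated once per search term.
BLOCK = """With Selection.Find
    .Text = "{word}"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindContinue
    .Format = True
    .MatchCase = False
    .MatchWholeWord = False
    .MatchWildcards = False
    .MatchSoundsLike = False
    .MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
"""


def build_macro(words):
    """Return macro text containing one find-and-highlight block per word."""
    parts = [HEADER]
    for word in words:
        # In VBA string literals, a double quote is written as two double quotes.
        parts.append(BLOCK.format(word=word.replace('"', '""')))
    parts.append("End Sub\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_macro(["rn", "fi", "ii"]))
```

Copy the printed text, open the macro editor as described above, and paste it over the existing Highlighter macro. The generated text still needs the same review as a hand-edited macro before you run it on an author's file.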
©2004–2025 Geoffrey Hart. All rights reserved.