Combining words and pictures: degrees of abstraction

by Geoff Hart

Previously published as: Hart, G. 2007. Combining words and pictures: degrees of abstraction. Intercom January 2007:38–39, 42.

In my November 2006 column, I demonstrated how complex concepts can be simplified by judicious use of abstraction. Of course, simple concepts can also be communicated in this way. In both cases, the solution is to choose between precise words, appropriately complex images, or a combination of the two. Words sometimes communicate more precisely than any image because they convey a narrower range of specific meanings (i.e., dictionary denotations). If we choose our words carefully and with an understanding of the possible connotations, we greatly reduce the risk of misinterpretation, particularly when we're describing anything that we can't actually see. As a rule of thumb, such things are good candidates for purely verbal explanations.

The word zero offers a good example, because it has no visual counterpart in nature. This is undoubtedly why the concept of zero arrived late in the development of language and why it revolutionized mathematics. We do communicate the concept of zero with a graphical symbol (the numeral 0), but that symbol is not intuitive: as with any other word, we must learn its meaning (in this case, that it doesn't represent the letter O or a wedding ring). Because words must be learned, they most effectively communicate learned concepts whose meanings are determined by consensus (we all agree on the meaning) and standardized in a dictionary. Of course, poorly chosen words with a range of related meanings can be inefficient and highly subjective. Consider, for example, how Americans react to the word Republican: people with different political philosophies will have very different reactions, and both reactions will differ from those of Canadians.

A picture is worth a thousand words

Images may be less precise, but they have a strength words lack: we humans are highly visual creatures and surprisingly proficient at understanding images. Indeed, written communication would be impossible if we could not reliably distinguish between the dozens of symbols in the English alphabet or the tens of thousands in the Chinese "alphabet". Most of us invest the first half-dozen years of our life learning to skillfully interpret our visual world—long before we acquire any proficiency with words—and those who are blind or otherwise visually impaired encounter considerable difficulty interacting with the modern world, which assumes a high level of visual skill. (In discussing visual communication, we must never forget that some of our audience will be visually impaired and will require non-visual alternatives.)

“Literal” images such as photographs capture reality in ways impossible even using the thousand words in the cliché about a picture's worth. Consider, for example, a photograph of my desk (Figure 1). How could we possibly describe the visual textures of the books, the chaos of the objects, or the curvature of the desk?

A literal image (photo) that shows too much information

Figure 1. A "literal" image of my office, combining bewildering detail with abstract concepts such as color, shape, and texture.

At the other extreme, purely “abstract” images may have no visual equivalent in nature, and can convey concepts for which there are no words. Graphs of mathematical equations, for example, often convey meaning more effectively to nonmathematicians and even mathematicians than the equation alone; the modern field of scientific visualization derives its strength from such abstract images. Abstractions such as graphs convey enormous quantities of information in a small space, and in a manner impossible to convey simply in words—even when those "words" are mathematical equations.

Unfortunately, images are more subjective than words. The painting of a nude may be great art to one viewer, but pornography to another. More importantly, photographs have a relatively limited ability to communicate the photographer’s emphasis. For example, is my desk (Figure 1) an efficient, information- and tool-dense workspace, or a cluttered and disorganized shambles? Skilled photographers learn to frame or crop their photographs to eliminate irrelevant detail. For example, Figure 2 reduces the clutter by focusing on the monitor, eliminating some subjectivity by removing details to let viewers focus only on those that remain. We could continue this process by replacing the photograph with an illustration, such as a screenshot of the open word processor document.

A cropped version that focuses on fewer details

Figure 2. Cropping a photograph to focus attention on fewer details.

This spectrum from realism to abstraction provides illustrators with something that begins to resemble a visual grammar and vocabulary, though despite attempts by authors such as Jacques Bertin and Robert Harris, illustrators still have no references as precise and universally accepted as dictionaries and style guides to codify that vocabulary and grammar. To attain that precision, we often need more than images alone.

Combining words and pictures

Synergy happens when combining two things enhances the effects of both. Once you understand the relative strengths and weaknesses of words and images, you can combine them so each medium’s strengths compensate for the other medium’s weaknesses. The cropped photograph in Figure 2 communicates more precisely if I annotate it to draw attention to one specific feature (Figure 3)—in this case, documentary evidence that people really do attach Post-it notes to their computers. Accompanying text can then explain the significance of this visual evidence.

Using text to point out the one key detail

Figure 3. Adding text focuses attention on details that aren't apparent from the graphic alone. (We could further enlarge the image to show the password written on the Post-it note.)


Striking the right balance

Choosing between words, images, or a combination of the two is rarely a matter of absolutes. There's usually more than one way to balance the two and still communicate effectively. For most of us, trained as writers, the problem becomes how to determine when text is insufficient and how much text (if any) is necessary to explain an image.

Finding a successful balance depends more on how you think about each component of an image than on blind adherence to any rule. Start by defining exactly what you want to communicate. Given the limited visual and textual vocabularies that are available, and the constraints imposed by the grammar and rhetoric used to link those vocabularies, examine each aspect of your message by asking which words and images could communicate it effectively. (This questioning is where some neophyte information designers fail: through excessive comfort with words, they fail to consider the potential of images.)

Once you have two or more alternatives (e.g., a description and an image), ask which one communicates most successfully on its own, and what information it fails to communicate. Sometimes the choice is clear: the words may be enormously more specific than the image, or the image may be too complex to describe in words. More often, different options have different strengths and deficiencies, and combining them will compensate for those deficiencies. Figure 3 provides documentary evidence that a simple statement of fact or a hand-drawn illustration would lack; labeling the image focuses attention on the important aspect of that evidence.

In information design, we must expand our concepts of vocabulary, grammar, and rhetoric to include their visual equivalents. (I'll discuss this in my next article.) Just as writers ponder the most effective word choice (vocabulary), the most effective word order (grammar), and the best rhetorical approach (e.g., emotional versus logical), comparable choices are available for images. We can consider the types of images available (our visual vocabulary), how the various components of the image work together (their grammar), and an appropriate rhetoric (e.g., realistic and literal versus abstract and conceptual). For example, we could improve Figure 3 by combining different types of image: a photograph to show the Post-it note, and a screenshot illustrating the password-entry screen. Claire Harrison discusses this and other issues related to the use of images in her February 2003 article in Technical Communication.

To the extent that there is a “correct” choice, you can identify that choice by practicing this thought process until you gradually attain proficiency with the vocabulary, grammar, and rhetoric of images. Of course, we should always remember that we're doing so on behalf of our audience. Experimentation involves working with our audience to learn the kinds of images they are familiar with and skilled at interpreting, and testing any images we develop to confirm that they communicate as effectively as we hope.


Bertin, J. 1983. Semiology of graphics: diagrams, networks, maps. (Translated by William Berg.) University of Wisconsin Press, Madison, Wisc. 415 p.

Harris, R.L. 1996. Information graphics: a comprehensive illustrated reference. Management Graphics, Atlanta, Ga. 448 p.

Harrison, C. 2003. Visual social semiotics: understanding how still images make meaning. Technical Communication 50(1):46–60.

©2004–2018 Geoffrey Hart. All rights reserved