Genes Are Like Sentences, Genomes Are Like Books

I lose track sometimes of exactly what the common genetic terms mean and how the genetic pieces work together. What’s the difference between a chromosome and a strand of DNA? A gene and a genome? What are those three-letter sets in a DNA diagram called and what do they do? I’m not a scientist, but since I was an English teacher, connecting the names of genetic units to the units of written language—words, sentences, and so on—makes the picture a little clearer.  Maybe it will do the same for the reader.

Let’s start small.  The spiraling rungs on diagrams of a DNA (deoxyribonucleic acid) molecule are each marked with two of four specific letters: A, C, G, and T.  The four DNA letters stand for the four nucleotides—Adenine, Cytosine, Guanine, and Thymine—that make up DNA. Like the letters of the full alphabet, these letters–or rather the four molecules they indicate–are the smallest building blocks of their language.


In DNA, combinations of the letters for the four nucleotides make up the three-letter codons that are DNA’s version of words. Each three-letter codon/word specifies one amino acid. And most codons are “synonyms” in that several different codons refer to the same amino acid, because there are many more possible codons (64) than there are amino acids (20). The codons are “read” by a ribosome, a cellular reader/assembly-machine that works from a copy of the gene, takes each required amino acid as it is delivered, and attaches it to the chain of amino acids that will form a protein.
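For readers who like to tinker, the arithmetic behind the “synonyms” can be checked in a few lines of Python. This is only a sketch: the names `LETTERS`, `all_codons`, and `SYNONYMS` are made up for the illustration, and the small table is a hand-picked excerpt of the standard genetic code, not the whole thing.

```python
from itertools import product

LETTERS = "ACGT"  # the four DNA letters

# Every possible three-letter "word" that can be spelled with four letters.
all_codons = ["".join(triplet) for triplet in product(LETTERS, repeat=3)]
print(len(all_codons))  # 4 * 4 * 4 = 64

# A small excerpt of the standard genetic code (illustrative, not complete):
# several different codons can name the same amino acid.
SYNONYMS = {
    "Leucine": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
    "Glycine": ["GGT", "GGC", "GGA", "GGG"],
    "Methionine": ["ATG"],
}

for amino_acid, codons in SYNONYMS.items():
    print(amino_acid, "has", len(codons), "codon(s)")
```

With 64 possible words and only about 20 amino acids to name, some doubling up is unavoidable.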

Groups of these codons make up a gene, much as words make up a sentence. The genes/sentences are long because most proteins are complex; human proteins consist of anywhere from several hundred to several thousand amino acid molecules.  The gene/sentence for red hair says something like “Put this together with that and that and that….”

Genes also include a codon at the start that says “Start the gene here” and another at the end that says “Stop here; gene complete.” Within the gene, no actual spaces separate the codons, but since all codons are triplets read in order from the start codon, it’s always clear where each codon begins and ends.  (Somewhat similarly, writing in the ancient world often lacked spaces between words.  As long as one could read slowly and figurethewordsoutspacesweren’tessential.)
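The start-read-stop routine can be imitated in a short Python sketch. It is a simplification under stated assumptions: `CODON_TABLE` is a tiny excerpt of the standard genetic code, `read_gene` is a hypothetical helper written for this illustration, and the sample string is invented. A real cell also reads from a copy of the gene, not from the DNA itself.

```python
# A tiny excerpt of the standard genetic code, written in DNA letters.
# The full table has 64 entries; this one is only for illustration.
CODON_TABLE = {
    "ATG": "Met",  # also the usual "start here" signal
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "CTT": "Leu", "CTG": "Leu",
    "AAA": "Lys", "AAG": "Lys",
    "TAA": "STOP", "TAG": "STOP", "TGA": "STOP",
}

def read_gene(dna):
    """Find the start codon, then read triplets until a stop codon."""
    start = dna.find("ATG")
    if start == -1:
        return []  # no "start here" signal found
    amino_acids = []
    # Step through the string three letters at a time; no spaces needed.
    for i in range(start, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        meaning = CODON_TABLE.get(codon, "?")  # "?" = not in this small excerpt
        if meaning == "STOP":
            break
        amino_acids.append(meaning)
    return amino_acids

print(read_gene("CCATGGGTCTGAAATAAGG"))  # ['Met', 'Gly', 'Leu', 'Lys']
```

The sample string also happens to contain the letters TGA, a stop codon, straddling two triplets. Because the reading moves in strict steps of three from the start codon, that out-of-step TGA is never read as a word.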

So, to recap.  The four nucleotides are basic components much like the letters of our alphabet. Groups of three nucleotides spell out codons that can be thought of as words, each of which stands for an actual amino acid molecule.  And a sequence of codons forms a gene that resembles a sentence in a protein recipe for some aspect of the organism.

Finally there are chromosomes and genomes.

A molecule of DNA is very long, a continuous strand of anywhere from a couple of hundred to more than a thousand genes. Each molecule is a chromosome, which can be compared to a chapter in a book, although the genes gathered in one chapter do not necessarily share a topic.  And it is a strange book in that each chapter appears twice, one copy inherited from the mother and one from the father. Most human cells contain 23 such pairs of chromosomes, two nearly matching sets of the assembly instructions for an entire human being. Only the pair that determines sex can contain chromosomes that differ noticeably from each other, which is the case in about half of us: females have two X chromosomes while males carry one X and one Y chromosome.

Finally, our genome is like the book itself, the totality of all our roughly 20,000 genes on all our chromosomes. The book might be called Me And Us. Your genome book is almost exactly like mine; only about one tenth of one percent of the DNA letters differ between us. That’s similar to two copies of the same long book that differ by about one letter in every thousand.

Simplified though the comparison is, it’s startling what genetics and written language have in common considering that the second is a recent human invention and the first dates back to the formation of life almost four billion years ago. Both begin with the smallest building blocks, then form groupings from those building blocks, then code meaningful statements/instructions/recipes in the groupings, and finally convert the code into organic construction/action/speech.