Genetic Nomenclature

The phenotype of an organism consists of the observable traits of that organism. The genotype is the actual genetic composition of the organism: its genes and alleles. There are commonly accepted rules for naming phenotypes and genotypes in bacteria, although notations vary. A phenotype is designated by a three-letter code with the first letter capitalized. Superscripts distinguish mutant (⁻) from wild-type (⁺) phenotypes. A bacterium’s ability to synthesize tryptophan, for example, would be written Trp⁺. When describing an organism’s phenotype, however, only the mutant traits are listed, such as the inability to synthesize tryptophan (Trp⁻), and not the wild-type traits. Antibiotic resistance is written with a similar three-letter code, but instead of a ‘+’ or ‘−’ superscript, an ‘r’ is used for resistance and an ‘s’ for sensitivity. E. coli’s resistance to the antibiotic erythromycin, for example, would be written Ermʳ.
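To make the phenotype convention concrete, here is a minimal Python sketch that builds a phenotype symbol from a three-letter code and a state. The function name `phenotype` and the use of Unicode superscript characters (⁺, ⁻, ʳ, ˢ) as a plain-text stand-in for typeset superscripts are assumptions for illustration, not part of any standard tool.

```python
# Hypothetical helper illustrating the bacterial phenotype notation.
# Unicode superscript characters stand in for typeset superscripts.
SUPERSCRIPT = {
    "+": "\u207a",  # superscript plus: wild type
    "-": "\u207b",  # superscript minus: mutant
    "r": "\u02b3",  # modifier letter small r: resistant
    "s": "\u02e2",  # modifier letter small s: sensitive
}


def phenotype(code: str, state: str) -> str:
    """Return a phenotype symbol such as 'Trp⁺' or 'Ermʳ'.

    code  -- three-letter abbreviation in any case, e.g. 'trp'
    state -- '+' wild type, '-' mutant, 'r' resistant, 's' sensitive
    """
    if len(code) != 3 or not code.isalpha():
        raise ValueError(f"expected a three-letter code, got {code!r}")
    if state not in SUPERSCRIPT:
        raise ValueError(f"unknown state {state!r}")
    # First letter capitalized, remaining two lower case, then the superscript.
    return code[0].upper() + code[1:].lower() + SUPERSCRIPT[state]
```

For example, `phenotype("trp", "-")` returns Trp⁻ and `phenotype("erm", "r")` returns Ermʳ.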

Genotypic descriptions use a three-letter code related to the phenotypic designation. In a genotypic designation, however, all three letters are lower case and either italicized or underlined. Genotypic designations also differ in that simply listing a gene implies that it is mutant. Take the following genotype and its corresponding phenotype as an example:

  • genotype: trp, met, lys
  • phenotype: Trp⁻, Met⁻, Lys⁻
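Because every gene named in a genotype is taken to be mutant, turning a simple genotype list like the one above into its phenotype is mechanical. The sketch below is a hypothetical helper that handles only plain three-letter symbols such as trp; it again uses the Unicode superscript minus in place of a typeset superscript.

```python
MINUS = "\u207b"  # superscript minus, a plain-text stand-in for a typeset superscript


def phenotype_from_genotype(genotype: list[str]) -> list[str]:
    """Turn simple genotype symbols (e.g. 'trp') into mutant phenotype symbols.

    Listing a gene in the genotype implies it is mutant, so every phenotype
    symbol gets a capitalized first letter and a superscript minus.
    Only plain three-letter symbols are handled; str.capitalize() would
    lower-case a trailing locus letter such as the A in trpA.
    """
    return [gene.capitalize() + MINUS for gene in genotype]
```

Calling `phenotype_from_genotype(["trp", "met", "lys"])` gives Trp⁻, Met⁻ and Lys⁻.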

Simply listing a gene in the genotype implies the ‘−’ (mutant) superscript in the phenotype. It is also possible for one protein to be encoded by more than one gene. In such a case an upper-case letter (also italicized) is added to the end of the three-letter gene designation. Tryptophan synthetase, the enzyme that catalyzes the final step in the biosynthesis of tryptophan, is encoded by both the trpA gene and the trpB gene. Different mutations can also occur in the same gene; in that case an allele number is added to the end of the gene designation, so two different mutant alleles of trpA could be written trpA1 and trpA32. Finally, a deletion is represented by placing a ‘Δ’ before the gene name and the designation of that particular deletion allele after it.

  • genotype: recA1, thi, trpB2, Δ(lacZ)M15
  • phenotype: RecA⁻, Thi⁻, TrpB⁻, LacZ⁻
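The pieces of a genotype symbol described above (three-letter code, upper-case letter for the specific gene, allele number, Δ for a deletion) can be pulled apart with a regular expression. The sketch below is a hypothetical parser written only for the kinds of symbols used in this article; real genotype strings use more forms than this pattern accepts, for example multi-gene deletions such as Δ(lac-proAB). The names `GENE`, `GeneSymbol` and `parse_gene` are illustrative.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern covering only the forms used in this article:
# trp, trpA, trpA32, recA1 and deletions written like Δ(lacZ)M15.
GENE = re.compile(
    r"^(?P<deletion>Δ\()?"         # optional deletion marker and opening parenthesis
    r"(?P<code>[a-z]{3})"          # three-letter lower-case gene code
    r"(?P<locus>[A-Z]?)"           # optional upper-case letter for the specific gene
    r"(?(deletion)\))"             # closing parenthesis, required only after Δ(
    r"(?P<allele>[A-Za-z]?\d*)$"   # optional allele designation: 1, 32, M15
)


@dataclass
class GeneSymbol:
    code: str       # e.g. "trp"
    locus: str      # e.g. "A" in trpA32; "" if absent
    allele: str     # e.g. "32" in trpA32, "M15" in Δ(lacZ)M15; "" if absent
    deletion: bool  # True if the symbol was written with Δ

    def phenotype(self) -> str:
        # Mutant phenotype: capitalized code, locus letter kept, allele dropped,
        # superscript minus appended (Unicode stand-in for typesetting).
        return self.code.capitalize() + self.locus + "\u207b"


def parse_gene(symbol: str) -> GeneSymbol:
    """Split a genotype symbol into its parts; raise ValueError if unrecognized."""
    m = GENE.match(symbol)
    if m is None:
        raise ValueError(f"not a recognized gene symbol: {symbol!r}")
    return GeneSymbol(
        code=m["code"],
        locus=m["locus"],
        allele=m["allele"],
        deletion=m["deletion"] is not None,
    )
```

Parsing recA1, thi, trpB2 and Δ(lacZ)M15 and calling `.phenotype()` on each gives RecA⁻, Thi⁻, TrpB⁻ and LacZ⁻, the mutant phenotypes for that genotype. Keeping the allele in the parsed result but dropping it from the phenotype mirrors the convention that allele numbers belong to the genotype only.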

And here are some links to pages about genetic nomenclature used for specific organisms:

E. coli genome project:

‘The E. coli Genome Project at the University of Wisconsin-Madison had its genesis in an editorial by Frederick R. Blattner in the November 18, 1983 issue of Science, in which he raised the idea of completely sequencing the E. coli (and human) genomes:’

“At present the worldwide accomplishment in DNA sequence amounts to 2.3 × 10⁶ base pairs, representing 2500 individual sequences. It is now becoming more or less routine to sequence completely the DNA of whole genetic entities ranging from single genes through multigene families to simple life forms such as viruses and phage. Currently the largest single DNA molecule to have been sequenced is the phage lambda genome (48502 base pairs). We are beginning to recognize that determination of the total genetic specification of more advanced life forms may be a possibility in the relatively near future. Extension of this principle to bacteria (genome size 5 × 10⁶ base pairs) — the simplest free living forms — would require an increase of the worldwide technical effort by only a factor of 2. Some three orders of magnitude more would be needed to progress to the total human genome.”

— F. R. Blattner (1983) “Biological Frontiers” Science 222(4625), 719-720. [not indexed in PubMed]

‘In that same year we began isolation of an overlapping lambda clonebank of E. coli K-12 strain MG1655. Those clones served as the starting material in our initial efforts to sequence the whole genome. Improvements in sequencing technology have since reached the point where whole-genome sequencing of microbial genomes is routine, and the human genome has in fact been completed. But in those early years of radioactive sequencing reactions and manual reading of autoradiographs, it was a daunting undertaking…’

Mouse Nomenclature Home Page:

‘The Mouse Genome Informatics Database is the authoritative source of official names for mouse genes, alleles, and strains. Nomenclature follows the rules and guidelines established by the International Committee on Standardized Genetic Nomenclature for Mice.’

Genetic nomenclature for Drosophila melanogaster:

‘The nomenclature guidelines below explain how FlyBase assigns canonical symbols and names to its genetic objects (genes, alleles, transposons, insertions, aberrations and balancers). We encourage the community and journals to adhere to FlyBase-approved symbols/names for consistency in published datasets. While these guidelines cover most circumstances, there may be exceptional cases not clearly covered here. Please contact FlyBase to discuss such cases or any other aspect of the nomenclature.’

Guidelines for Human Gene Nomenclature:

‘Guidelines for human gene nomenclature were first published in 1979 [1], when the Human Gene Nomenclature Committee was first given the authority to approve and implement human gene names and symbols. Updates of these guidelines were published in 1987 [2], 1995 [3], and 1997 [4]. With the recent publications of the complete human genome sequence there is an estimated total of 26,000-40,000 genes, as suggested by the International Human Genome Sequencing Consortium [5] and Venter et al. [6]. Thus, the guidelines have been updated to accommodate their application to this wealth of information, although symbols are still only assigned when required for communication. These updates were derived with input from the HUGO Gene Nomenclature Committee (HGNC) International Advisory Committee and attendees of the ASHG01NW Gene Nomenclature Workshop. All approved human gene symbols can be found in the HGNC database [7].’

‘The philosophy of the HGNC remains “that gene nomenclature should evolve with new technology rather than be restrictive as sometimes occurs when historical and single gene nomenclature systems are applied” [2].’

‘A summary of the guidelines is presented here:
1. Each approved gene symbol must be unique.
2. Symbols are short-form representations (or abbreviations) of the descriptive gene name.
3. Symbols should only contain Latin letters and Arabic numerals.
4. Symbols should not contain punctuation.
5. Symbols should not contain “G” for gene.
6. Symbols do not contain any reference to species, for example “H/h” for human.’
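Some of these rules are mechanical enough to check in code. The sketch below checks only rules 1, 3 and 4 as they are worded in this summary; rules 2, 5 and 6 depend on the meaning of the gene name and are left to human judgement. The function name `check_symbol` and the idea of passing in a set of already-approved symbols are assumptions for illustration, and the HGNC's current guidelines and database, not this snippet, are the authority on what is acceptable.

```python
import re

# Rules 3 and 4: only Latin letters and Arabic numerals (ASCII ranges),
# which also rules out punctuation.
ALLOWED = re.compile(r"[A-Za-z0-9]+")


def check_symbol(symbol: str, approved: set[str]) -> list[str]:
    """Return rule violations for a proposed symbol (empty list if none found).

    approved -- hypothetical set of symbols already in use, for rule 1.
    Uniqueness is checked by exact string comparison only.
    """
    problems = []
    if symbol in approved:
        problems.append("rule 1: symbol is not unique")
    if not ALLOWED.fullmatch(symbol):
        problems.append("rules 3/4: only Latin letters and Arabic numerals allowed")
    return problems
```

For instance, `check_symbol("BRCA1", {"BRCA1"})` reports a rule 1 violation, while a hyphenated candidate such as "XYZ-9" is flagged under rules 3/4.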

(Header Image: Petri Dish Art)