It is widely agreed that the standard genetic code must have been preceded by a simpler code that encoded fewer amino acids. How this simpler code could have expanded into the standard genetic code is not well understood because most changes to the code are costly. Taking inspiration from the recently synthesized six-letter code, we propose a novel hypothesis: the initial genetic code consisted of only two letters, G and C, and then expanded the number of available codons via the introduction of an additional pair of letters, A and U. Various lines of evidence, including the relative prebiotic abundance of the earliest assigned amino acids, the balance of their hydrophobicity, and the higher GC content in genome coding regions, indicate that the original two nucleotides were indeed G and C. This process of code expansion probably started with the third base, continued with the second base, and ended up as the standard genetic code when the second pair of letters was introduced into the first base. The proposed process is consistent with the available empirical evidence, and it uniquely avoids the problem of costly code changes by positing instead that the code expanded its capacity via the creation of new codons with extra letters.
The Standard Genetic Code can Evolve from a Two-Letter GC Code Without Information Loss or Costly Reassignments
Alejandro Frank, Tom Froese
Origins of Life and Evolution of Biospheres
June 2018, Volume 48, Issue 2, pp 259–272