We consider a DNA strand as a mathematical statement. Inspired by the work of Kurt Gödel, we attach to each DNA strand a Gödel number, a product of prime numbers raised to appropriate powers. Each DNA chain corresponds to a single Gödel number G and, conversely, given a Gödel number G we can recover the DNA chain it stands for. Next, considering a single DNA strand composed of N bases, we study the statistical distribution of g, the logarithm of G. We assume that the mth base is chosen at random, with equal probability for each of the four possible outcomes. The 'experiment' is, to some extent, similar to throwing a four-sided die N times. Using the moment generating function, we obtain first the discrete and then the continuum distribution of g. Our formalism is in excellent agreement with simulated data. Finally, we compare our formalism with actual data in order to identify the presence of non-random fluctuations. (C) 2022 Elsevier B.V. All rights reserved.
Gödel numbering; Nucleotide sequences; Information theory; Language theory; Language representation of biological sequences
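The encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state which exponent is assigned to which base, so the mapping A=1, C=2, G=3, T=4 used here is an assumption, as are all function names. With this mapping, the mth base contributes the mth prime raised to its exponent, so g = ln G is a sum of exponents weighted by ln p_m. Under the abstract's uniform-random hypothesis each exponent has mean 2.5.

```python
import math

# ASSUMPTION: the exponent assigned to each base is hypothetical;
# the source does not specify the mapping.
BASE_TO_EXP = {"A": 1, "C": 2, "G": 3, "T": 4}
EXP_TO_BASE = {exp: base for base, exp in BASE_TO_EXP.items()}


def first_primes(n):
    """Return the first n primes by trial division (fine for short strands)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def encode(strand):
    """Goedel number G: product over m of (mth prime) ** exponent(mth base)."""
    G = 1
    for p, base in zip(first_primes(len(strand)), strand):
        G *= p ** BASE_TO_EXP[base]
    return G


def decode(G):
    """Recover the strand from G by reading off successive prime exponents."""
    bases = []
    candidate = 2
    while G > 1:
        exp = 0
        while G % candidate == 0:
            G //= candidate
            exp += 1
        # Every exponent is at least 1, so only primes ever divide G here.
        if exp:
            bases.append(EXP_TO_BASE[exp])
        candidate += 1
    return "".join(bases)


def log_godel(strand):
    """g = ln G, computed as a sum to avoid building a huge integer."""
    primes = first_primes(len(strand))
    return sum(BASE_TO_EXP[b] * math.log(p) for p, b in zip(primes, strand))


def expected_log_godel(n):
    """Mean of g for n uniformly random bases (mean exponent is 2.5)."""
    return 2.5 * sum(math.log(p) for p in first_primes(n))
```

For example, `encode("ACGT")` gives 2^1 · 3^2 · 5^3 · 7^4, and `decode` inverts it. Because G grows very quickly with N, `log_godel` works with the sum of logarithms directly, which is also the quantity whose distribution the abstract studies.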