Proteins of Escherichia coli come in sizes that are multiples of 14 kDa: domain concepts and evolutionary implications.
Initial attempts to correlate the distribution of gene density (number of gene loci per unit length on the linkage map) with the distribution of lengths of coding sequences have led to the observation that 46% of approximately 1000 sampled proteins in Escherichia coli have molecular masses of n × 14,000 ± 2,500 daltons (n = 1, 2, ...). This clustering around multiples of 14,000 contrasts with the 36% one would expect in these ranges if the sizes were uniformly distributed. The entire distribution is well fit by a sum of normal or lognormal distributions located at multiples of 14,000, which suggests that the percentage of E. coli proteins governed by the underlying sizing mechanism is much greater than 50%. Clustering of protein molecular sizes around multiples of a unit size also is suggested by the distribution of well-characterized HeLa cell proteins. The distribution of gene lengths for E. coli suggests regular clustering, which implies that the clustering of protein molecular masses is not an artifact of the molecular mass measurement by gel electrophoresis. These observations suggest the existence of a fundamental structural unit. The rather uniform size of this structural unit (without any apparent sequence homology) suggests that a general principle such as geometrical or physical optimization at the DNA or protein level is responsible. This suggestion is discussed in relation to experimental evidence for the domain structure of proteins and to existing hypotheses that attempt to account for these domains. Microevolution would appear to be accommodated by incremental changes within this fundamental unit, whereas macroevolution would appear to involve "quantum" changes to the next stable size of protein.
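
The 36% baseline quoted for uniformly distributed sizes can be checked with simple arithmetic: each window is 2 × 2,500 = 5,000 daltons wide and recurs every 14,000 daltons, so 5,000 / 14,000 ≈ 0.357. The Python sketch below illustrates that calculation and the window test itself. It is not the authors' procedure; the function names and the assumption that windows are centered exactly on n × 14,000 with n ≥ 1 are illustrative choices made here.

```python
UNIT_DA = 14_000       # unit size from the abstract, in daltons
TOLERANCE_DA = 2_500   # half-width of each window, in daltons


def uniform_expected_fraction(unit=UNIT_DA, tol=TOLERANCE_DA):
    """Fraction of each unit-long period covered by the +/- tol window.

    Under a uniform size distribution, this is the share of proteins
    expected to fall within n * unit +/- tol.
    """
    return (2 * tol) / unit


def in_cluster(mass_da, unit=UNIT_DA, tol=TOLERANCE_DA):
    """True if mass_da lies within tol of the nearest multiple n * unit, n >= 1."""
    n = max(1, round(mass_da / unit))  # nearest multiple, never below n = 1
    return abs(mass_da - n * unit) <= tol


def fraction_in_clusters(masses_da, unit=UNIT_DA, tol=TOLERANCE_DA):
    """Observed fraction of the given masses that fall inside any window."""
    masses = list(masses_da)
    if not masses:
        raise ValueError("need at least one mass")
    return sum(in_cluster(m, unit, tol) for m in masses) / len(masses)


if __name__ == "__main__":
    print(f"expected under uniformity: {uniform_expected_fraction():.1%}")
```

Applying `fraction_in_clusters` to a list of measured masses and comparing the result with `uniform_expected_fraction()` mirrors the 46% versus 36% comparison reported above.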