Point mutations are single amino acid changes in a protein sequence.
They are produced by the modification of the nucleotide sequence
in a gene, which may be the result of a somatic or a germ line change.
Somatic point mutations probably occur quite frequently in eukaryotes,
but they are not passed on to subsequent generations, with the
notable exception of plants that are propagated by grafting.
Point mutations in germ cells result in modifications in subsequent
generations, and it is this type of mutation that leads to ongoing
genetic variability in a population.
For example, if a peptide has the sequence YGGFLR, then one possible
point mutation is AGGFLR, where the mutation is residue 1 changing
from Y to A. There are 19 possible point mutations for each residue
in a peptide, considering only the 20 commonly occurring amino acids.
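
As a rough illustration of the counting argument above, the following sketch enumerates every single-residue substitution of a peptide. The function name `point_mutations` and the constant `AMINO_ACIDS` are hypothetical names chosen for this example. It assumes only the 20 commonly occurring amino acids and one-letter codes.

```python
# The 20 commonly occurring amino acids, one-letter codes.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def point_mutations(peptide):
    """Yield (position, original, replacement, mutant) for each
    single-residue substitution. Positions are 1-based."""
    for i, original in enumerate(peptide):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                mutant = peptide[:i] + replacement + peptide[i + 1:]
                yield i + 1, original, replacement, mutant


mutants = list(point_mutations("YGGFLR"))
print(len(mutants))  # 6 residues x 19 substitutions = 114
print(mutants[0])    # (1, 'Y', 'A', 'AGGFLR')
```

For the six-residue peptide YGGFLR this gives 6 × 19 = 114 possible point mutations, the first of which is the Y to A change at residue 1 described above.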
It was realized early on in protein sequence comparison studies that there is a bias towards
certain point mutations, when viewed in an evolutionary sense. The matrix below illustrates
this type of evolutionary bias.

|   | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 2 |
| R | -2 | 6 |
| N | 0 | 0 | 2 |
| D | 0 | -1 | 2 | 4 |
| C | -2 | -4 | -4 | -5 | 4 |
| Q | 0 | 1 | 1 | 2 | -5 | 4 |
| E | 0 | -1 | 1 | 3 | -5 | 2 | 4 |
| G | 1 | -3 | 0 | 1 | -3 | -1 | 0 | 5 |
| H | -1 | 2 | 2 | 1 | -3 | 3 | 1 | -2 | 6 |
| I | -1 | -2 | -2 | -2 | -2 | -2 | -2 | -3 | -2 | 5 |
| L | -2 | -3 | -3 | -4 | -6 | -2 | -3 | -4 | -2 | 2 | 6 |
| K | -1 | 3 | 1 | 0 | -5 | 1 | 0 | -2 | 0 | -2 | -3 | 5 |
| M | -1 | 0 | -2 | -3 | -5 | -1 | -2 | -3 | -2 | 2 | 4 | 0 | 6 |
| F | -4 | -4 | -4 | -6 | -4 | -5 | -5 | -5 | -2 | 1 | 2 | -5 | 0 | 9 |
| P | 1 | 0 | -1 | -1 | -3 | 0 | -1 | -1 | 0 | -2 | -3 | -1 | -2 | -5 | 6 |
| S | 1 | 0 | 1 | 0 | 0 | -1 | 0 | 1 | -1 | -1 | -3 | 0 | -2 | -3 | 1 | 3 |
| T | 1 | -1 | 0 | 0 | -2 | -1 | 0 | 0 | -1 | 0 | -2 | 0 | -1 | -2 | 0 | 1 | 3 |
| W | -6 | 2 | -4 | -7 | -8 | -5 | -7 | -7 | -3 | -5 | -2 | -3 | -4 | 0 | -6 | -2 | -5 | 17 |
| Y | -3 | -4 | -2 | -4 | 0 | -4 | -4 | -5 | 0 | -1 | -1 | -4 | -2 | 7 | -5 | -3 | -3 | 0 | 10 |
| V | 0 | -2 | -2 | -2 | -2 | -2 | -2 | -1 | -2 | 4 | 2 | -2 | 2 | -1 | -1 | -1 | 0 | -6 | -2 | 4 |

W. R. Pearson, Rapid and Sensitive Sequence Comparison with FASTP and FASTA,
in Methods in Enzymology, ed. R. Doolittle (ISBN 0-12-182084-X, Academic Press, San Diego)
183 (1990) 63-98.
The matrix is frequently used to score aligned peptide sequences to determine the similarity
of those sequences. The numbers given above were derived by comparing aligned sequences of proteins
with known homology and counting the "accepted point mutations" (PAM) observed. The frequencies of these
mutations are expressed in the table as a log-odds matrix:

$$M_{ij} = 10 \log_{10} R_{ij},$$

where $M_{ij}$ is the matrix element and $R_{ij}$ is the probability
of that substitution as observed in the database, divided by the
normalized frequency of occurrence for amino acid $i$. All of the
numbers are rounded to the nearest integer. The logarithm (here base 10) is used
so that the numbers can be added, rather than multiplied, to determine
the score of a compared set of sequences.
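
A minimal sketch of how these two ideas might be applied, assuming an ungapped alignment of two equal-length peptides. The names `SCORES`, `pair_score`, `alignment_score`, and `log_odds` are hypothetical, and `SCORES` holds only the handful of matrix entries (copied from the table above) needed for this example rather than the full matrix.

```python
import math

# Subset of the lower-triangular matrix above, keyed by the
# alphabetically sorted residue pair. Only the pairs used below.
SCORES = {
    ("A", "Y"): -3,
    ("Y", "Y"): 10,
    ("G", "G"): 5,
    ("F", "F"): 9,
    ("L", "L"): 6,
    ("R", "R"): 6,
}


def pair_score(a, b):
    """Look up a matrix element; the matrix is symmetric."""
    return SCORES[tuple(sorted((a, b)))]


def alignment_score(seq1, seq2):
    """Sum the matrix elements over aligned positions (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


def log_odds(r):
    """Matrix element for an odds ratio r: 10 * log10(r), rounded."""
    return round(10 * math.log10(r))


print(alignment_score("YGGFLR", "YGGFLR"))  # 10 + 5 + 5 + 9 + 6 + 6 = 41
print(alignment_score("YGGFLR", "AGGFLR"))  # -3 + 5 + 5 + 9 + 6 + 6 = 28
print(log_odds(10.0))                       # 10
```

With these table values, the Y to A mutation at residue 1 lowers the score from 41 to 28. A ratio of 1 (a substitution seen as often as chance would predict) maps to a matrix element of 0, so positive elements mark favored substitutions and negative elements mark disfavored ones.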