2 + 4 = 6, 4 + 2 = 7?
What if the order in which you performed addition mattered, and the sum "2 + 3 + 4" gave you a different answer to the sum "4 + 3 + 2"? Accountants everywhere would cry as their books ceased to balance. You might find yourself pausing at the supermarket, wondering if you could save money by buying your milk before your toothpaste. Eurovision fans might value a ‘douze points’ early in the voting more than one later on. Fortunately, there is no new evidence to suggest that order matters in addition. However, researchers at UCD have recently found that order matters a surprising amount for sequence alignment, an important part of modern genetic analyses.
Sequence alignment is used to understand similarities and differences between proteins found in different species. Proteins are the building blocks of life and carry out most of the functions in our cells. Consequently, understanding proteins and their function is a key part of biology. We can use pairwise sequence alignment to identify which parts of a specific protein are identical between a pair of species (humans and chimps, for example). Using multiple sequence alignment, we can identify those parts of a protein that are conserved in all mammals or in even larger groups of species. Very large sequence alignments help us understand which parts of proteins are important (if part of a protein is identical in all mammals then it's probably important) and also give us some insight into the three-dimensional structure of proteins (as parts of a protein that are close together in 3D space tend to change together across species).
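To make the idea of a conserved position concrete, here is a minimal sketch in Python. It assumes the sequences have already been aligned (gaps written as "-"), and both the function name `conserved_columns` and the three short sequences are invented for illustration; they are not real proteins and not taken from the paper.

```python
def conserved_columns(aligned):
    """Return the column indices where every sequence has the same residue.

    Assumes all sequences are already aligned to the same length,
    with "-" marking a gap. Columns containing only gaps are ignored.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    return [
        i
        for i in range(length)
        if aligned[0][i] != "-" and all(seq[i] == aligned[0][i] for seq in aligned)
    ]


# Hypothetical toy alignment of three made-up sequences.
toy_alignment = ["MKT-LLV", "MKTALLI", "MRT-LLV"]
print(conserved_columns(toy_alignment))  # [0, 2, 4, 5]
```

Columns 0, 2, 4 and 5 are identical in all three toy sequences, so in a real alignment they would be the candidates for "probably important" positions.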
What Kieran Boyce, Fabian Sievers and Des Higgins from the Higgins Lab in UCD found is that, for large protein sequence alignments, the order in which sequences are compared matters. In other words, the alignment that you get out of a sequence alignment program depends on the order in which you feed your sequences into the program.
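How might input order creep into a result at all? One plausible route is tie-breaking: if a program builds up an alignment step by step and has to choose between equally good options, it may simply take whichever it saw first. The Python sketch below is a toy illustration of that idea only. It is not the code of any real alignment program and is not claimed to be the mechanism analysed in the paper; the function `first_merge` and the three sequences are invented for the example.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def first_merge(seqs):
    """Choose the closest pair of sequences to align first.

    Ties are broken by taking the first pair encountered, so the
    answer can depend on the order of `seqs`.
    """
    best_pair, best_dist = None, None
    for a, b in combinations(seqs, 2):
        d = hamming(a, b)
        if best_dist is None or d < best_dist:  # strict "<": first pair wins ties
            best_pair, best_dist = (a, b), d
    return best_pair


# Three made-up sequences that are all equally distant from one another.
toy = ["AAB", "ABA", "BAA"]
print(first_merge(toy))        # ('AAB', 'ABA')
print(first_merge(toy[::-1]))  # ('BAA', 'ABA')
```

The same three sequences, given in reverse order, lead to a different first pairing. In a step-by-step alignment, an early choice like this could carry through to a different final result.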
This finding is surprising, as people have been performing sequence alignments for decades without knowing how dependent the results are on the input order. It is important because it suggests that most scientific publications that make use of large multiple sequence alignments probably have not provided sufficient information to reproduce their results. Reproducibility is an important part of science and, consequently, when performing sequence alignment most scientists will provide details of the sequence alignment program and the settings they used. Boyce et al.’s findings suggest that the input order is also an important setting that may need to be provided from now on. A question left open by the paper is how we can make sequence alignment programs ignore order, or how we can choose the best possible ordering.
Instability in progressive multiple sequence alignment algorithms
Boyce K, Sievers F, Higgins DG
Algorithms for Molecular Biology. 2015;10:26. doi:10.1186/s13015-015-0057-1.
About the Author
Colm Ryan is a Sir Henry Wellcome Postdoctoral Fellow based in Systems Biology Ireland and the Institute of Cancer Research in London. His research aims to improve our understanding of synthetic lethality treatments in cancer therapeutics.