In mathematics, a subsequence of a sequence is a sequence obtained by deleting some or no elements without changing the order of the remaining elements. For example, ⟨A, B, D⟩ is a subsequence of ⟨A, B, C, D, E, F⟩, obtained by removing C, E, and F. The relation "is a subsequence of" is a partial order on sequences. A subsequence may use elements that are not consecutive in the original sequence, whereas a substring is a consecutive run of elements, such as ⟨B, C, D⟩ in the same sequence. For example, the subsequences of the word "apple" include "a", "ap", "app", "apple", "ppl", "ale" (which is not a substring), "e", and the empty string.
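The definition suggests a direct test: scan the longer sequence once, matching the elements of the candidate subsequence in order. The Python sketch below does this greedily and, for contrast, also checks the stricter substring condition. The function names `is_subsequence` and `is_substring` are illustrative, not taken from any library.

```python
def is_subsequence(sub, seq):
    """True if `sub` can be obtained from `seq` by deleting zero or more
    elements without reordering the rest."""
    it = iter(seq)
    # `x in it` consumes the iterator up to and including the first match,
    # so each element of `sub` must be found after the previous one.
    return all(x in it for x in sub)


def is_substring(sub, seq):
    """True if `sub` occurs in `seq` as a run of consecutive elements."""
    sub, seq = list(sub), list(seq)
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


print(is_subsequence("ABD", "ABCDEF"))  # True
print(is_subsequence("ale", "apple"))   # True
print(is_substring("ale", "apple"))     # False
print(is_subsequence("pa", "apple"))    # False: no "a" after the "p"
```

The greedy scan is enough for a yes/no answer: matching each element at its earliest possible position never rules out a later match.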
Common subsequence
Given two sequences $X$ and $Y$, a sequence $Z$ is said to be a common subsequence of $X$ and $Y$ if $Z$ is a subsequence of both $X$ and $Y$. For example, if

$$X = \langle A, C, B, D, E, G, C, E, D, B, G \rangle, \qquad Y = \langle B, E, G, J, C, F, E, K, B \rangle, \qquad Z = \langle B, E, E \rangle,$$

then $Z$ is a common subsequence of $X$ and $Y$.
$Z$ is not a longest common subsequence, however: it has length 3, while the common subsequence $\langle B, E, E, B \rangle$ has length 4. The longest common subsequence of $X$ and $Y$ is $\langle B, E, G, C, E, B \rangle$, of length 6. No longer one exists, because $J$, $F$, and $K$ do not occur in $X$, and removing them from $Y$ leaves exactly $\langle B, E, G, C, E, B \rangle$.
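A longest common subsequence is usually computed by dynamic programming over prefixes of the two sequences. The following Python sketch is one textbook version of that approach, applied to the $X$ and $Y$ above. `longest_common_subsequence` is a hypothetical helper name, and when several longest common subsequences exist, the traceback returns just one of them.

```python
def longest_common_subsequence(x, y):
    """Return one longest common subsequence of x and y as a list, using an
    O(len(x) * len(y)) dynamic-programming table."""
    m, n = len(x), len(y)
    # table[i][j] = LCS length of the prefixes x[:i] and y[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover one optimal answer.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    result.reverse()
    return result


X = list("ACBDEGCEDBG")
Y = list("BEGJCFEKB")
print("".join(longest_common_subsequence(X, Y)))  # BEGCEB
```

For this pair the answer is unique, so the tie-breaking rule in the traceback does not affect the output.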
Applications
Subsequences have applications in computer science,[1] especially in bioinformatics, where computers are used to compare, analyze, and store DNA, RNA, and protein sequences.
Consider two DNA sequences of 37 elements each:

SEQ1 = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
SEQ2 = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA

The longest common subsequence of sequences 1 and 2 is:
LCS(SEQ1, SEQ2) = CGTTCGGCTATGCTTCTACTTATTCTA

This common subsequence has 27 elements. One way to show it is to align the two sequences, that is, to place each element of the longest common subsequence in the same column (marked by a vertical bar) and to insert a special padding character (here, a dash) wherever the other sequence has no corresponding element:

SEQ1 = ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-
        | || ||| ||||| |  | |  | || |  || | || |  |||
SEQ2 = -C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA

Subsequences are used in this way to determine how similar two strands of DNA are, based on their sequences of bases: adenine (A), guanine (G), cytosine (C), and thymine (T).
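An aligned display of this kind can be generated from the same dynamic-programming table used to find the longest common subsequence: trace back through the table and emit a gap whenever an element is skipped. The Python sketch below prints an alignment in the format shown above; the helper `lcs_alignment` is hypothetical. Because two sequences can have several optimal alignments, its gap placement may differ from the one shown, but the number of bars equals the LCS length computed by the table (which the text gives as 27).

```python
SEQ1 = "ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA"
SEQ2 = "CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA"


def lcs_alignment(a, b, gap="-"):
    """Align strings a and b so that the characters of one longest common
    subsequence share a column. Returns (top, bars, bottom), all equal length."""
    m, n = len(a), len(b)
    # table[i][j] = length of a longest common subsequence of a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    top, bars, bottom = [], [], []
    i, j = m, n
    # Trace back from the end. Matched characters share a column and get a
    # bar; every other character is padded with a gap in the other row.
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            top.append(a[i - 1])
            bars.append("|")
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif j == 0 or (i > 0 and table[i - 1][j] >= table[i][j - 1]):
            top.append(a[i - 1])  # skipped: not in the LCS being traced
            bars.append(" ")
            bottom.append(gap)
            i -= 1
        else:
            top.append(gap)  # skipped: not in the LCS being traced
            bars.append(" ")
            bottom.append(b[j - 1])
            j -= 1
    # The traceback collected columns right to left, so reverse them.
    return ("".join(reversed(top)), "".join(reversed(bars)),
            "".join(reversed(bottom)))


top, bars, bottom = lcs_alignment(SEQ1, SEQ2)
print("SEQ1 =", top)
print(" " * 7 + bars)
print("SEQ2 =", bottom)
print("matched columns:", bars.count("|"))
```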
Theorems
- Every infinite sequence of real numbers has an infinite monotone subsequence. (This is a lemma used in the proof of the Bolzano–Weierstrass theorem.)
- Every infinite bounded sequence in $\mathbb{R}^n$ has a convergent subsequence. (This is the Bolzano–Weierstrass theorem.)
- For all positive integers $r$ and $s$, every finite sequence of real numbers of length at least $(r-1)(s-1)+1$ contains a monotonically increasing subsequence of length $r$ or a monotonically decreasing subsequence of length $s$. (This is the Erdős–Szekeres theorem.)
- A metric space $(X, d)$ is compact if and only if every sequence in $X$ has a convergent subsequence whose limit is in $X$.
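The Erdős–Szekeres bound can be checked on small examples by computing, for each position, the longest monotone subsequence ending there. The Python sketch below uses the simple quadratic dynamic program with non-strict monotonicity; the function names are hypothetical, and running it on examples illustrates the theorem rather than proving it. With $r = s = 3$ the bound is $(3-1)(3-1)+1 = 5$, and the length-4 sequence $\langle 2, 1, 4, 3 \rangle$ shows that 5 cannot be lowered.

```python
from itertools import permutations


def longest_monotone_lengths(seq):
    """Return (longest non-decreasing, longest non-increasing) subsequence
    lengths of seq, via the simple O(n^2) dynamic program."""
    n = len(seq)
    inc = [1] * n  # inc[k]: longest non-decreasing subsequence ending at k
    dec = [1] * n  # dec[k]: longest non-increasing subsequence ending at k
    for k in range(n):
        for j in range(k):
            if seq[j] <= seq[k]:
                inc[k] = max(inc[k], inc[j] + 1)
            if seq[j] >= seq[k]:
                dec[k] = max(dec[k], dec[j] + 1)
    return max(inc, default=0), max(dec, default=0)


def erdos_szekeres_holds(seq, r, s):
    """Check the theorem's conclusion for one sequence long enough to apply."""
    if len(seq) < (r - 1) * (s - 1) + 1:
        raise ValueError("sequence is shorter than (r - 1)(s - 1) + 1")
    up, down = longest_monotone_lengths(seq)
    return up >= r or down >= s


print(longest_monotone_lengths([2, 1, 4, 3]))     # (2, 2)
print(longest_monotone_lengths([2, 1, 4, 3, 5]))  # (3, 2)
# Every ordering of five distinct values satisfies the r = s = 3 case.
print(all(erdos_szekeres_holds(list(p), 3, 3) for p in permutations(range(5))))  # True
```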
See also
- Subsequential limit – The limit of some subsequence
- Limit superior and limit inferior – Bounds of a sequence
- Longest increasing subsequence problem – Computer science problem
Notes
This article incorporates material from subsequence on PlanetMath, which is licensed under the Creative Commons Attribution/Share-Alike License.
References
[1] In computer science, string is often used as a synonym for sequence, but substring and subsequence are not synonyms. Substrings are consecutive parts of a string, while subsequences need not be. Hence every substring of a string is also a subsequence of it, but a subsequence of a string is not always a substring. See: Gusfield, Dan (1999) [1997]. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. USA: Cambridge University Press. p. 4. ISBN 0-521-58519-8.