The closely related problem of finding a minimum-length string which is a superstring of a finite set of strings S = { s1,s2,...,sn } is also NP-hard.2 Several constant factor approximations have been proposed throughout the years, and the current best known algorithm has an approximation factor of 2.475.3 However, perhaps the simplest solution is to reformulate the problem as an instance of weighted set cover in such a way that the weight of the optimal solution to the set cover instance is less than twice the length of the shortest superstring S. One can then use the O(log(n))-approximation for weighted set-cover to obtain an O(log(n))-approximation for the shortest superstring (note that this is not a constant factor approximation).
For any string x in this alphabet, define P(x) to be the set of all strings which are substrings of x. The instance I of set cover is formulated as follows:
The instance I can then be solved using an algorithm for weighted set cover, and the algorithm can output an arbitrary concatenation of the strings x for which the weighted set cover algorithm outputs P(x).4
Consider the set S = { abc, cde, fab }, which becomes the universe of the weighted set cover instance. In this case, M = { abcde, fabc }. Then the set of subsets of the universe is
which have costs 3, 3, 3, 5, and 4, respectively.
David Maier (1978). "The Complexity of Some Problems on Subsequences and Supersequences". J. ACM. 25 (2). ACM Press: 322–336. doi:10.1145/322063.322075. S2CID 16120634. https://doi.org/10.1145%2F322063.322075 ↩
Kari-Jouko Räihä, Esko Ukkonen (1981). "The shortest common supersequence problem over binary alphabet is NP-complete". Theoretical Computer Science. 16 (2): 187–198. doi:10.1016/0304-3975(81)90075-x. /wiki/Doi_(identifier) ↩
Matthias Englert and Nicolaos Matsakis and Pavel Vesel (2022). "Improved approximation guarantees for shortest superstrings using cycle classification by overlap to length ratios". Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing (PDF). pp. 317–330. doi:10.1145/3519935.3520001. ISBN 9781450392648. S2CID 243847650. 9781450392648 ↩
Vazirani 2001, p. 20. - Vazirani, Vijay V. (2001), Approximation Algorithms, Springer-Verlag, ISBN 3-540-65367-8 http://www.cc.gatech.edu/fac/Vijay.Vazirani/book.pd ↩