Timsort is a hybrid, stable sorting algorithm based on merge sort and insertion sort, designed to handle real-world data efficiently by exploiting ordered subsequences called runs. Created by Tim Peters in 2002 for the Python programming language, it merges runs to optimize sorting performance. Timsort was Python's standard sorting algorithm from version 2.3 until it was replaced by Powersort in version 3.11. It is also used in Java SE 7, Android, GNU Octave, V8, Swift, and Rust. Its approach is inspired by Peter McIlroy's 1993 paper on optimistic sorting.
Operation
Timsort was designed to take advantage of runs of consecutive ordered elements that already exist in most real-world data (natural runs). It iterates over the data, collecting elements into runs and simultaneously pushing those runs onto a stack. Whenever the runs on the top of the stack satisfy a merge criterion, they are merged. This continues until all data is traversed; then all remaining runs are merged two at a time until only one sorted run remains. The advantage of merging ordered runs instead of fixed-size sub-lists (as done by traditional mergesort) is that it decreases the total number of comparisons needed to sort the entire list.
Each run has a minimum size, which is based on the size of the input and is defined at the start of the algorithm. If a run is smaller than this minimum run size, insertion sort is used to add more elements to the run until the minimum run size is reached.
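The step above can be sketched in Python. Timsort uses a binary insertion sort for this: each new element is located with a binary search and shifted into place, which keeps the run sorted and the sort stable. The function name below is illustrative, not taken from any real implementation.

```python
import bisect

def extend_run_with_insertion_sort(a, lo, hi, run_end):
    # a[lo:run_end] is an already-sorted natural run; grow it to cover
    # a[lo:hi] by binary-insertion-sorting the remaining elements into it.
    for i in range(run_end, hi):
        x = a[i]
        # find the insertion point within the sorted prefix a[lo:i];
        # bisect_right keeps equal elements in original order (stability)
        pos = bisect.bisect_right(a, x, lo, i)
        # shift elements one slot to the right and place x
        a[pos + 1:i + 1] = a[pos:i]
        a[pos] = x
```

For example, with the natural run `[1, 2, 5]` at the front of `[1, 2, 5, 3, 4, 0]` and a (hypothetical) minimum run size of 6, the call `extend_run_with_insertion_sort(a, 0, 6, 3)` grows the run to cover the whole slice.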
Merge criteria
Timsort is a stable sorting algorithm (the relative order of elements with equal keys is preserved) and strives to perform balanced merges (merging runs of similar sizes).
In order to achieve sorting stability, only consecutive runs are merged: between two non-consecutive runs there can be an element whose key also occurs inside those runs, and merging the two runs across it would change the relative order of equal keys. Example of this situation ([] are ordered runs): [1 2 2] 1 4 2 [0 1 2]
In pursuit of balanced merges, Timsort considers three runs on the top of the stack, X, Y, Z, and maintains the invariants:
- |Z| > |Y| + |X|
- |Y| > |X|[9]
If either invariant is violated, Y is merged with the smaller of X or Z and the invariants are checked again. Once the invariants hold, the search for a new run in the data can start.[10] These invariants keep merges approximately balanced while trading off three concerns: delaying merges for the sake of balance, exploiting runs while they are still fresh in cache memory, and keeping merge decisions simple.
Merge space overhead
The original merge sort implementation is not in-place and has a space overhead of N (the data size). In-place merge sort implementations exist, but have a high time overhead. To strike a middle ground, Timsort performs a merge sort with a small time overhead and a space overhead smaller than N.
First, Timsort performs a binary search to find the location where the first element of the second run would be inserted in the first ordered run, keeping it ordered. Then, it performs the same algorithm to find the location where the last element of the first run would be inserted in the second ordered run, keeping it ordered. Elements before and after these locations are already in their correct place and do not need to be merged. Then, the smaller of these shrunk runs is copied into temporary memory, and the copied elements are merged with the larger shrunk run into the now free space. If the leftmost shrunk run is smaller, the merge proceeds from left to right. If the rightmost shrunk run is smaller, merging proceeds from right to left (i.e. beginning with elements at the ends of the temporary space and leftmost run, and filling the free space from its end). This optimization reduces the number of required element movements, the running time and the temporary space overhead in the general case.
Example: two runs [1, 2, 3, 6, 10] and [4, 5, 7, 9, 12, 14, 17] must be merged. Note that both runs are already sorted individually. The smallest element of the second run is 4, and it would have to be inserted at the fourth position of the first run to preserve its order (assuming that the first position of a run is 1). The largest element of the first run is 10, and it would have to be inserted at the fifth position of the second run to preserve its order. Therefore, [1, 2, 3] and [12, 14, 17] are already in their final positions, and the only runs in which element movements are required are [6, 10] and [4, 5, 7, 9]. With this knowledge, we only need to allocate a temporary buffer of size 2 instead of 5.
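The trimmed merge can be sketched on this example. The sketch below uses Python's `bisect` in place of Timsort's galloping search, always merges left-to-right, and copies both shrunk runs for simplicity; the real implementation copies only the smaller shrunk run and merges right-to-left when the rightmost one is smaller. `merge_runs` is an illustrative name.

```python
import bisect

def merge_runs(a, lo, mid, hi):
    # Merge the sorted runs a[lo:mid] and a[mid:hi] in place.
    # Trim: elements of the first run not greater than the second run's
    # first element are already placed (bisect_right preserves stability),
    lo = bisect.bisect_right(a, a[mid], lo, mid)
    # as are elements of the second run >= the first run's last element.
    hi = bisect.bisect_left(a, a[mid - 1], mid, hi)
    left, right = a[lo:mid], a[mid:hi]   # simplified: copy both shrunk runs
    i = j = 0
    for k in range(lo, hi):
        if j == len(right) or (i < len(left) and left[i] <= right[j]):
            a[k] = left[i]; i += 1
        else:
            a[k] = right[j]; j += 1
```

On the example above, only the slice between positions 4 and 9 is rewritten; `[1, 2, 3]` and `[12, 14, 17]` are never touched.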
Merge direction
Merging can be done in both directions: left-to-right, as in the traditional mergesort, or right-to-left.
Galloping mode during merge
An individual merge of runs R1 and R2 keeps a count of consecutive elements selected from the same run. When this count reaches the minimum galloping threshold (min_gallop), Timsort considers it likely that many more consecutive elements will be selected from that run and switches into galloping mode. Let us assume that R1 is responsible for triggering it. In this mode, the algorithm performs a two-stage search for the place in the run R1 where the next element x of the run R2 would be inserted. In the first stage it performs an exponential search, also known as a galloping search, until finding a k such that R1[2^(k−1) − 1] < x ≤ R1[2^k − 1], i.e. a region of uncertainty comprising 2^(k−1) − 1 consecutive elements of R1. The second stage performs a straight binary search of this region to find the exact location in R1 for x. Galloping mode is an attempt to adapt the merge algorithm to the pattern of intervals between elements in runs.
Galloping is not always efficient; in some cases it requires more comparisons than a simple linear search. According to benchmarks done by the developer, galloping is beneficial only when the initial element of one run is not one of the first seven elements of the other run. This implies an initial threshold of 7. To avoid the drawbacks of galloping mode, two actions are taken: (1) when galloping is found to be less efficient than binary search, galloping mode is exited; (2) the success or failure of galloping is used to adjust min_gallop. If the selected element comes from the same run that supplied an element previously, min_gallop is reduced by one, encouraging a return to galloping mode; otherwise, it is incremented by one, discouraging a return to galloping mode. In the case of random data, the value of min_gallop becomes so large that galloping mode never recurs.[11]
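The two-stage search can be sketched as follows. This is a simplified gallop that always starts from the run's beginning; the real routine gallops outward from a hint index and handles both directions. The name `gallop` is illustrative.

```python
import bisect

def gallop(x, run):
    # Stage 1: exponential (galloping) search for the first offset of the
    # form 2**k - 1 whose element is >= x. After the loop, every element
    # before index ofs // 2 is known to be < x.
    ofs = 1
    while ofs <= len(run) and run[ofs - 1] < x:
        ofs *= 2
    lo = ofs // 2                  # run[lo - 1] < x whenever lo > 0
    hi = min(ofs - 1, len(run))    # region of uncertainty: run[lo:hi]
    # Stage 2: plain binary search within the region of uncertainty.
    return bisect.bisect_left(run, x, lo, hi)
```

For a run of length n, the element is located in O(log n) comparisons, but an element near the front is found after only a few probes, which is the case galloping is designed to exploit.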
Descending runs
In order to also take advantage of data sorted in descending order, Timsort reverses strictly descending runs when it finds them and adds them to the stack of runs. Since descending runs are later blindly reversed, restricting this to runs without equal elements maintains the algorithm's stability; i.e., equal elements are never reversed relative to each other.
Minimum run size
Because merging is most efficient when the number of runs is equal to, or slightly less than, a power of two, and notably less efficient when the number of runs is slightly more than a power of two, Timsort chooses minrun to try to ensure the former condition.[12]
Minrun is chosen from the range 32 to 64 inclusive, such that the size of the data, divided by minrun, is equal to, or slightly less than, a power of two. The final algorithm takes the six most significant bits of the size of the array, adds one if any of the remaining bits are set, and uses that result as the minrun. This algorithm works for all arrays, including those smaller than 64; for arrays of size 63 or less, this sets minrun equal to the array size and Timsort reduces to an insertion sort.[13]
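The bit manipulation just described can be written directly; this mirrors the routine documented in CPython's listsort.txt.

```python
def merge_compute_minrun(n):
    # Take the six most significant bits of n, adding 1 if any of the
    # bits shifted off were set.
    r = 0                # becomes 1 if any shifted-off bit is set
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r
```

For example, an array of size 63 gets minrun 63 (insertion sort takes over), size 64 gets 32 (exactly two runs), and size 65 gets 33 (two runs of about 33, slightly fewer than a power of two).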
Algorithm
As described above, Timsort consists of several pieces, too long to describe here in pseudocode. Interested readers are strongly advised to read one of the following versions (all including the stack-size fix of 2015):
- Implementation in Java, from OpenJDK version 11: 940 lines of code, 403 of which are neither blank nor purely comments.[14]
- Implementation in C, from CPython version 3.4.10: code for Timsort starts at line 965 and ends at line 2084, for a total of 1120 lines, 732 of which are neither blank nor purely comments.[15]
- Implementation in Python, from PyPy commit "7fce1e5", the last update before the "Powersort" policy was incorporated: 636 lines of code, 486 of which are neither blank nor purely comments.[16]

The algorithm presented by the website "GeeksforGeeks" is not Timsort.[17] It is in fact a regular merge sort with an inner insertion sort and a poor merge procedure that performs unnecessary copying.
Analysis
In the worst case, Timsort takes O(n log n) comparisons to sort an array of n elements. In the best case, which occurs when the input is already sorted, it runs in linear time, meaning that it is an adaptive sorting algorithm.[18]
For an input that has r runs of sorted elements, the running time is O(n log r). More strongly, the time is O(n + nH), where the run-length entropy H of an input whose i-th run has size n_i is defined to be[19]

H = −∑_{i=1}^{r} (n_i/n) log₂(n_i/n).

When all run sizes are equal, the run-length entropy is log₂ r, its maximum value for any given number r of runs, but it can be smaller when the runs have unevenly distributed sizes. The formula for the running time is given as n + nH rather than more simply nH to account for the possibility that the entropy can be less than one.[20]
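To make the definition concrete, H can be computed from the run sizes directly; `run_length_entropy` is an illustrative helper, not part of any Timsort implementation.

```python
import math

def run_length_entropy(run_sizes):
    # H = -sum over runs of (n_i/n) * log2(n_i/n), where n = sum(n_i)
    n = sum(run_sizes)
    return -sum((ni / n) * math.log2(ni / n) for ni in run_sizes)
```

Four equal runs of size 16 give H = log₂ 4 = 2, the maximum for r = 4; the skewed sizes [61, 1, 1, 1] (same n, same r) give a much smaller H, so the O(n + nH) bound is correspondingly tighter.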
The above behavior with respect to the run-length entropy H derives purely from Timsort's merge criteria, as that is the part responsible for detecting already-sorted parts. The "galloping" routine exploits an additional property that can be described as the dual-run entropy H∗:[21]

H∗ = −∑_{i=1}^{s} (n_i/n) log₂(n_i/n),

where s is the number of times galloping is interrupted and s ≤ r. It can be proven that Timsort takes up to O(n + nH) element moves and O(n + nH∗) value comparisons.[22]
Adoption and influence
Influence
Timsort has inspired many similar algorithms, both in when they decide to merge (the nature of merge trees) and how they perform a merge (especially the galloping heuristic). Among them:
- Peeksort, Powersort, adaptive ShiversSort, and α-Mergesort have the same property with regard to H∗.[23]
- NaturalMergeSort, ShiversSort, and α-StackSort do not have the property with regard to H∗, because they can only merge the top two elements of their stack.[24]
- NaturalMergeSort, ShiversSort, and PowerSort take up to n log₂ n + O(n) comparisons.[25]
Further influences include:
- PersiSort, an algorithm that extends the merge criterion with persistent homology.[26]
Formal verification
In 2015, Dutch and German researchers in the EU FP7 ENVISAGE project found a bug in the standard implementation of Timsort.[27] It was fixed that same year in Python, Java, and Android.
Specifically, the invariants on stacked run sizes ensure a tight upper bound on the maximum size of the required stack. The implementation preallocated a stack sufficient to sort 2^64 bytes of input, and avoided further overflow checks.
However, the guarantee requires the invariants to hold for every group of three consecutive runs, while the implementation only checked them for the top three.[28] Using the KeY tool for formal verification of Java software, the researchers found that this check is not sufficient: they were able to construct run lengths (and inputs generating those run lengths) for which the invariants are violated deeper in the stack once the top of the stack is merged.[29]
As a consequence, for certain inputs the allocated stack is not large enough to hold all unmerged runs. In Java, such inputs trigger an array-out-of-bounds exception. The smallest input that triggers this exception in Java and Android v7 has size 67108864 (2^26). (Older Android versions already triggered this exception for certain inputs of size 65536 (2^16).)
The Java implementation was corrected by increasing the size of the preallocated stack based on an updated worst-case analysis. The article also showed by formal methods how to establish the intended invariant by checking that the four topmost runs in the stack satisfy the two rules above. This approach was initially adopted by Python[30] until it switched to Powersort in 2022 with the release of Python 3.11.[31]
Further reading
- Buss, Sam; Knop, Alexander (2019). "Strategies for stable merge sorting". In Chan, Timothy M. (ed.). Proceedings of the Thirtieth Annual ACM–SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6–9, 2019. Society for Industrial and Applied Mathematics. pp. 1272–1290. arXiv:1801.04641. doi:10.1137/1.9781611975482.78.
External links
- timsort.txt – original explanation by Tim Peters
References
1. James, Mike. "Python Now Uses Powersort". I Programmer. Retrieved 21 June 2024. https://www.i-programmer.info/news/216-python/15954-python-now-uses-powersort.html
2. "[#JDK-6804124] (coll) Replace "modified mergesort" in java.util.Arrays.sort with timsort". JDK Bug System. Retrieved 11 June 2014. https://bugs.openjdk.java.net/browse/JDK-6804124
3. "Class: java.util.TimSort". Android Gingerbread Documentation. Archived from the original on 16 July 2015. Retrieved 24 February 2011. https://web.archive.org/web/20150716000631/https://android.googlesource.com/platform/libcore/+/gingerbread/luni/src/main/java/java/util/TimSort.java
4. "liboctave/util/oct-sort.cc". Mercurial repository of Octave source code. Lines 23–25 of the initial comment block: "Code stolen in large part from Python's, listobject.c, which itself had no license header. However, thanks to Tim Peters for the parts of the code I ripped-off." Retrieved 18 February 2013. http://hg.savannah.gnu.org/hgweb/octave/file/0486a29d780f/liboctave/util/oct-sort.cc
5. "Getting things sorted in V8". v8.dev. Retrieved 21 December 2018. https://v8.dev/blog/array-sort
6. "Is sort() stable in Swift 5?". Swift Forums. 4 July 2019. Retrieved 4 July 2019. https://forums.swift.org/t/is-sort-stable-in-swift-5/21297/9
7. Comment in file "rust/src/liballoc/slice.rs". Source code on GitHub. Retrieved 15 March 2025. https://github.com/rust-lang/rust/blob/5f60208ba11171c249284f8fe0ea6b3e9b63383c/src/liballoc/slice.rs#L841-L980
8. McIlroy, Peter (January 1993). "Optimistic Sorting and Information Theoretic Complexity". Proceedings of the Fourth Annual ACM–SIAM Symposium on Discrete Algorithms. pp. 467–474. ISBN 0-89871-313-7.
9. "listsort.txt". Python source code. 18 May 2022. Archived from the original on 28 January 2016. https://web.archive.org/web/20160128232837/https://hg.python.org/cpython/file/tip/Objects/listsort.txt
10. MacIver, David R. (11 January 2010). "Understanding timsort, Part 1: Adaptive Mergesort". Retrieved 5 December 2015. http://www.drmaciver.com/2010/01/understanding-timsort-1adaptive-mergesort/
11. Peters, Tim. "listsort.txt". CPython git repository. Retrieved 5 December 2019. https://github.com/python/cpython/blob/master/Objects/listsort.txt
12. "listsort.txt". Python source code. 18 May 2022. Archived from the original on 28 January 2016. https://web.archive.org/web/20160128232837/https://hg.python.org/cpython/file/tip/Objects/listsort.txt
13. "listsort.txt". Python source code. 18 May 2022. Archived from the original on 28 January 2016. https://web.archive.org/web/20160128232837/https://hg.python.org/cpython/file/tip/Objects/listsort.txt
14. "openjdk-jdk11u/src/java.base/share/classes/java/util/TimSort.java at master · AdoptOpenJDK/openjdk-jdk11u". GitHub. https://github.com/AdoptOpenJDK/openjdk-jdk11u/blob/master/src/java.base/share/classes/java/util/TimSort.java
15. "cpython/Objects/listobject.c at v3.4.10 · python/cpython". GitHub. https://github.com/python/cpython/blob/v3.4.10/Objects/listobject.c
16. "pypy/rpython/rlib/listsort.py at 7fce1e526e750b4880c7fe61ce9362227ce60a70 · pypy/pypy". GitHub. https://github.com/pypy/pypy/blob/7fce1e526e750b4880c7fe61ce9362227ce60a70/rpython/rlib/listsort.py
17. "TimSort - Data Structures and Algorithms Tutorials". GeeksforGeeks. 19 May 2017. Retrieved 5 April 2025. https://www.geeksforgeeks.org/timsort/
18. Chandramouli, Badrish; Goldstein, Jonathan (2014). "Patience is a virtue: revisiting merge and sort on modern processors". In Dyreson, Curtis E.; Li, Feifei; Özsu, M. Tamer (eds.). International Conference on Management of Data, SIGMOD 2014. Association for Computing Machinery. pp. 731–742. doi:10.1145/2588555.2593662. https://scholar.archive.org/work/xkktlgktsjglhcqmv7rw2d5aky
19. Auger, Nicolas; Jugé, Vincent; Nicaud, Cyril; Pivoteau, Carine (2018). "On the worst-case complexity of TimSort". In Azar, Yossi; Bast, Hannah; Herman, Grzegorz (eds.). 26th Annual European Symposium on Algorithms, ESA 2018. LIPIcs. Vol. 112. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. pp. 4:1–4:13. arXiv:1805.08612. doi:10.4230/LIPIcs.ESA.2018.4.
20. Auger, Nicolas; Jugé, Vincent; Nicaud, Cyril; Pivoteau, Carine (2018). "On the worst-case complexity of TimSort". In Azar, Yossi; Bast, Hannah; Herman, Grzegorz (eds.). 26th Annual European Symposium on Algorithms, ESA 2018. LIPIcs. Vol. 112. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. pp. 4:1–4:13. arXiv:1805.08612. doi:10.4230/LIPIcs.ESA.2018.4.
21. Ghasemi, Elahe; Jugé, Vincent; Khalighinejad, Ghazal (28 June 2022). "Galloping in Fast-Growth Natural Merge Sorts". ICALP 2022. p. 3. doi:10.4230/LIPIcs.ICALP.2022.68.
22. Ghasemi, Elahe; Jugé, Vincent; Khalighinejad, Ghazal (28 June 2022). "Galloping in Fast-Growth Natural Merge Sorts". ICALP 2022. p. 3. doi:10.4230/LIPIcs.ICALP.2022.68.
23. Ghasemi, Elahe; Jugé, Vincent; Khalighinejad, Ghazal (28 June 2022). "Galloping in Fast-Growth Natural Merge Sorts". ICALP 2022. p. 3. doi:10.4230/LIPIcs.ICALP.2022.68.
24. Ghasemi, Elahe; Jugé, Vincent; Khalighinejad, Ghazal (28 June 2022). "Galloping in Fast-Growth Natural Merge Sorts". ICALP 2022. p. 3. doi:10.4230/LIPIcs.ICALP.2022.68.
25. Ghasemi, Elahe; Jugé, Vincent; Khalighinejad, Ghazal (28 June 2022). "Galloping in Fast-Growth Natural Merge Sorts". ICALP 2022. p. 3. doi:10.4230/LIPIcs.ICALP.2022.68.
26. Refsgaard Schou, Jens Kristian; Wang, Bei (2024). "PersiSort: A New Perspective on Adaptive Sorting Based on Persistence" (PDF). CCCG 2024. p. 2. https://pure.au.dk/portal/files/411679309/PersiSort_final_version.pdf
27. de Gouw, Stijn; Rot, Jurriaan; de Boer, Frank S.; Bubel, Richard; Hähnle, Reiner (2015). "OpenJDK's Java.utils.Collection.sort() is broken: The good, the bad and the worst case". In Kroening, Daniel; Păsăreanu, Corina S. (eds.). Computer Aided Verification – 27th International Conference, CAV 2015, Proceedings, Part I. Lecture Notes in Computer Science. Vol. 9206. Springer. pp. 273–289. doi:10.1007/978-3-319-21690-4_16. https://scholar.archive.org/work/egc6ljg5sjcnbh3z4yecnktmd4
28. de Gouw, Stijn; Rot, Jurriaan; de Boer, Frank S.; Bubel, Richard; Hähnle, Reiner (2015). "OpenJDK's Java.utils.Collection.sort() is broken: The good, the bad and the worst case". In Kroening, Daniel; Păsăreanu, Corina S. (eds.). Computer Aided Verification – 27th International Conference, CAV 2015, Proceedings, Part I. Lecture Notes in Computer Science. Vol. 9206. Springer. pp. 273–289. doi:10.1007/978-3-319-21690-4_16. https://scholar.archive.org/work/egc6ljg5sjcnbh3z4yecnktmd4
29. de Gouw, Stijn (24 February 2015). "Proving that Android's, Java's and Python's sorting algorithm is broken (and showing how to fix it)". Retrieved 6 May 2017. http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/
30. Python Issue Tracker – Issue 23515: "Bad logic in timsort's merge_collapse". http://bugs.python.org/issue23515
31. James, Mike. "Python Now Uses Powersort". I Programmer. Retrieved 21 June 2024. https://www.i-programmer.info/news/216-python/15954-python-now-uses-powersort.html