Duplicate code

Repeated fragment of computer source code

In computer programming, duplicate code is a sequence of source code that occurs more than once, either within a program or across different programs. Such duplicates, often called code clones, are generally considered undesirable because they complicate maintenance and make errors more likely. Clone detection is the automated process of finding them. Duplicate code need not be textually identical: two sequences may match only once whitespace and comments are ignored, or match token for token, sometimes with minor variations allowed. Even sequences that perform the same function but are implemented differently may be treated as duplicates.
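
As a rough illustration of the weakest of these notions, the following sketch compares two fragments while ignoring whitespace. The function name `same_ignoring_whitespace` is an assumption made for this example, and the approach is deliberately simplified: it does not strip comments, and because it drops whitespace entirely it would also treat `int x` and `intx` as equal, which a real token-based detector would not.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Advance p past any whitespace characters. */
static const char *skip_space(const char *p) {
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;
    return p;
}

/* Returns true when a and b are identical once all whitespace is ignored.
 * Simplification: comments are not handled, and removing whitespace
 * completely can merge adjacent tokens. */
bool same_ignoring_whitespace(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    for (;;) {
        a = skip_space(a);
        b = skip_space(b);
        if (*a != *b)
            return false;   /* first non-space characters differ */
        if (*a == '\0')
            return true;    /* both strings ended together */
        a++;
        b++;
    }
}
```

Under these assumptions, `same_ignoring_whitespace("x = a+b;", "x=a + b ;")` reports a match, while changing the operator in one fragment does not.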

Emergence

Some of the ways in which duplicate code may be created are:

copy and paste programming, which in academic settings may be done as part of plagiarism

scrounging, in which a section of code is copied "because it works". In most cases the copy is then slightly modified, for example by renaming variables or inserting or deleting code. The language nearly always allows a single copy of the code to be called from different places, so that it can serve multiple purposes, but the programmer creates another copy instead, perhaps because they
- do not understand the language properly,
- do not have the time to do it properly, or
- do not care about the resulting software rot.

It may also happen that functionality is required that is very similar to that in another part of a program, and a developer independently writes code that is very similar to what exists elsewhere. Studies suggest that such independently rewritten code is typically not syntactically similar.²

Automatically generated code is another source of duplication; here duplicate code may be desirable because it increases speed or eases development. Note that only the output of the generator contains the duplicates; the generator's own source code need not.
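
One way to picture this in C is to treat the preprocessor as a very small code generator. In the sketch below, the macro `DEFINE_MAX` and the generated names `max_int` and `max_double` are hypothetical, chosen only for illustration: the logic is written once in the source, yet the expanded output contains two nearly identical functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generator: one definition in the source, expanded by the
 * preprocessor into one near-identical function per element type. */
#define DEFINE_MAX(T, NAME)                          \
    T NAME(const T *values, size_t count) {          \
        assert(values != NULL && count > 0);         \
        T best = values[0];                          \
        for (size_t i = 1; i < count; i++)           \
            if (values[i] > best)                    \
                best = values[i];                    \
        return best;                                 \
    }

DEFINE_MAX(int, max_int)       /* expands to int max_int(const int *, size_t) */
DEFINE_MAX(double, max_double) /* expands to double max_double(const double *, size_t) */
```

A clone detector run on the preprocessed output would flag the two functions, but a fix only ever has to be made in the macro.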

Fixing

Duplicate code is most commonly fixed by moving the code into its own unit (function or module) and calling that unit from all of the places where it was originally used. Using a more open-source style of development, in which components are kept in centralized locations, may also help reduce duplication.

Costs and benefits

Code which includes duplicate functionality is more difficult to support because:

it is simply longer, and
if it needs updating, there is a danger that one copy of the code will be updated without checking for other instances of the same code.

On the other hand, if one copy of the code is being used for different purposes, and it is not properly documented, there is a danger that it will be updated for one purpose, but this update will not be required or appropriate to its other purposes.

These considerations are not relevant for automatically generated code, if there is just one copy of the functionality in the source code.

In the past, when memory space was more limited, duplicate code had the additional disadvantage of taking up more space, but nowadays this is unlikely to be an issue.

When code with a software vulnerability is copied, the vulnerability may continue to exist in the copied code if the developer is not aware of such copies.³

Refactoring duplicate code can improve many software metrics, such as lines of code, cyclomatic complexity, and coupling. This may lead to shorter compilation times, lower cognitive load, less human error, and fewer forgotten or overlooked pieces of code.

However, not all code duplication can be refactored.⁴ Clones may be the most effective solution if the programming language provides inadequate or overly complex abstractions, particularly if supported with user interface techniques such as simultaneous editing. Furthermore, the risks of breaking code when refactoring may outweigh any maintenance benefits.⁵ A study by Wagner et al. concluded that although additional work is needed to keep duplicates in sync, duplicate code did not cause significantly more faults than unduplicated code when the programmers involved were aware of it.⁶

Detecting duplicate code

A number of different algorithms have been proposed to detect duplicate code. For example:

Baker's algorithm⁷
Rabin–Karp string search algorithm
Abstract syntax trees⁸
Visual clone detection⁹
Count matrix clone detection¹⁰ ¹¹
Locality-sensitive hashing
Anti-unification¹²
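
To give a flavour of the hashing-based approaches, the sketch below uses a Rabin–Karp style rolling hash to look for a repeated window of `k` characters in a piece of text. It is a minimal illustration, not the method of any cited tool: the function name `find_repeated_window`, the constant `HASH_BASE`, and the choice to work on raw characters rather than tokens are all assumptions made for this example, and overlapping matches are allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HASH_BASE 257u  /* arbitrary multiplier chosen for this sketch */

/* Finds the first pair of positions i < j whose k-character windows in text
 * are identical. Window hashes come from a Rabin-Karp rolling hash
 * (arithmetic wraps modulo 2^64); equal hashes are confirmed with memcmp.
 * Returns true and stores the positions on success. */
bool find_repeated_window(const char *text, size_t k,
                          size_t *first, size_t *second) {
    assert(text != NULL && first != NULL && second != NULL && k > 0);
    size_t n = strlen(text);
    if (n < k + 1)
        return false;               /* fewer than two windows */

    size_t windows = n - k + 1;
    uint64_t *hashes = malloc(windows * sizeof *hashes);
    if (hashes == NULL)
        return false;

    /* high = HASH_BASE^(k-1), the weight of the character leaving the window. */
    uint64_t high = 1;
    for (size_t i = 1; i < k; i++)
        high *= HASH_BASE;

    /* Hash of the first window. */
    uint64_t h = 0;
    for (size_t i = 0; i < k; i++)
        h = h * HASH_BASE + (unsigned char)text[i];
    hashes[0] = h;

    /* Roll the window: drop text[i - 1], append text[i + k - 1]. */
    for (size_t i = 1; i < windows; i++) {
        h = (h - (unsigned char)text[i - 1] * high) * HASH_BASE
            + (unsigned char)text[i + k - 1];
        hashes[i] = h;
    }

    /* Quadratic pair scan kept for brevity; a hash table would avoid it. */
    bool found = false;
    for (size_t i = 0; i < windows && !found; i++) {
        for (size_t j = i + 1; j < windows; j++) {
            if (hashes[i] == hashes[j] && memcmp(text + i, text + j, k) == 0) {
                *first = i;
                *second = j;
                found = true;
                break;
            }
        }
    }
    free(hashes);
    return found;
}
```

The rolling update lets each window hash be derived from the previous one in constant time, so the text is read only once before candidate pairs are compared. A practical detector would typically normalise the input first (for example, by tokenising and renaming identifiers) and index the hashes instead of scanning all pairs.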

Example of functionally duplicate code

Consider the following code snippet, which calculates the average of each of two arrays of integers:

    extern int array_a[];
    extern int array_b[];

    int sum_a = 0;
    for (int i = 0; i < 4; i++)
        sum_a += array_a[i];
    int average_a = sum_a / 4;

    int sum_b = 0;
    for (int i = 0; i < 4; i++)
        sum_b += array_b[i];
    int average_b = sum_b / 4;

The two loops can be rewritten as a single function:

    int calc_average_of_four(int* array) {
        int sum = 0;
        for (int i = 0; i < 4; i++)
            sum += array[i];
        return sum / 4;
    }

or, usually preferably, by parameterising the number of elements in the array.

Using the above function will give source code that has no loop duplication:

    extern int array1[];
    extern int array2[];

    int average1 = calc_average_of_four(array1);
    int average2 = calc_average_of_four(array2);

Note that in this trivial case, the compiler may choose to inline both calls to the function, so that the resulting machine code is identical for the duplicated and non-duplicated examples above. If the function is not inlined, each call adds some overhead (on the order of 10 processor instructions for most high-performance languages). In principle, this additional run time could matter.
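
The parameterised variant mentioned earlier might look like the following sketch. The name `calc_average` and the parameter `count` are assumptions introduced here; the integer division of the original example is kept.

```c
#include <assert.h>
#include <stddef.h>

/* Averages the first `count` elements of `array` using integer division,
 * as in the example above. The caller must pass count > 0. */
int calc_average(const int *array, size_t count) {
    assert(array != NULL && count > 0);
    int sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += array[i];
    /* Cast so that a negative sum is not converted to an unsigned type. */
    return sum / (int)count;
}
```

With this version, `calc_average(array1, 4)` replaces `calc_average_of_four(array1)`, and the same function serves arrays of any length.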

References

1. Spinellis, Diomidis. "The Bad Code Spotter's Guide". InformIT.com. Retrieved 2008-06-06. http://www.informit.com/articles/article.aspx?p=457502&seqNum=5
2. Juergens, Elmar; Deissenboeck, Florian; Hummel, Benjamin. "Code similarities beyond copy & paste". https://www.cqse.eu/publications/2010-code-similarities-beyond-copy-paste.pdf
3. Li, Hongzhe; Kwon, Hyuckmin; Kwon, Jonghoon; Lee, Heejo (25 April 2016). "CLORIFI: software vulnerability discovery using code clone verification". Concurrency and Computation: Practice and Experience. 28 (6): 1900–1917. doi:10.1002/cpe.3532. S2CID 17363758.
4. Arcelli Fontana, Francesca; Zanoni, Marco; Ranchetti, Andrea; Ranchetti, Davide (2013). "Software Clone Detection and Refactoring" (PDF). ISRN Software Engineering. 2013: 1–8. doi:10.1155/2013/129437. https://boa.unimib.it/bitstream/10281/42633/1/clone.pdf
5. Kapser, C.; Godfrey, M.W. (Oct. 2006). "'Cloning Considered Harmful' Considered Harmful". 13th Working Conference on Reverse Engineering (WCRE). pp. 19–28. http://plg2.cs.uwaterloo.ca/~migod/papers/2006/wcre06-clonePatterns.pdf
6. Wagner, Stefan; Abdulkhaleq, Asim; Kaya, Kamer; Paar, Alexander (2016). "On the Relationship of Inconsistent Software Clones and Faults: An Empirical Study". 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER). pp. 79–89. arXiv:1611.08005. doi:10.1109/SANER.2016.94. ISBN 978-1-5090-1855-0. S2CID 3154845.
7. Baker, Brenda S. (1992). "A Program for Identifying Duplicated Code". Computing Science and Statistics. 24: 49–57.
8. Baxter, Ira D.; et al. "Clone Detection Using Abstract Syntax Trees". http://www.semanticdesigns.com/Company/Publications/ICSM98.pdf
9. Rieger, Matthias; Ducasse, Stephane. "Visual Detection of Duplicated Code". Archived 2006-06-29 at the Wayback Machine. http://www.iam.unibe.ch/~scg/Archive/Papers/Rieg98aEcoopWorkshop.pdf
10. Yuan, Y.; Guo, Y. (Dec. 2011). "CMCD: Count Matrix Based Code Clone Detection". 2011 18th Asia-Pacific Software Engineering Conference. IEEE. pp. 250–257.
11. Chen, X.; Wang, A. Y.; Tempero, E. D. (2014). "A Replication and Reproduction of Code Clone Detection Studies". ACSC. pp. 105–114. http://www.qualitascorpus.com/pubs/ChenWangTemperoClones.pdf
12. Bulychev, Peter; Minea, Marius (2008). "Duplicate code detection using anti-unification". Proceedings of the Spring/Summer Young Researchers' Colloquium on Software Engineering. No. 2. Institute for System Programming of the Russian Academy of Sciences. https://cyberleninka.ru/article/n/duplicate-code-detection-using-anti-unification
