Code Clone Related Tools

Last update 18 Mar. 2005, Since 2002
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

-- Table of Contents --

• Tools
FAQ
Press
Papers

Introduction

A code clone is a pair (or set) of code fragments in the source files of a software product that are identical or similar to each other. Code clones are known to make software maintenance difficult. The problem can become serious, especially in large-scale software, where developers cannot realistically find code clones by hand without the help of clone detection tools.
Our research group has been studying how to handle the code clone problem in large-scale software development processes. This page introduces the tools that we have developed.

Related methods/research/technology

Methods and tools for detecting code clones

A tool CLAN
A tool CloneDR I. Baxter, Semantic Designs, Inc.
[Baxter1998]
A tool Covet J. Bailey, J. Mayrand
A tool Dotplot Accelrys Inc.
A tool Dup B.S. Baker, Bell Labs
A paper on the tool [Baker1992]
A paper on the details of the algorithm [Baker1995]
A tool Duplix
A tool Duploc M. Rieger
[Ducasse1999]
A tool JPlag G. Malpohl
[Prechelt1999]
A tool Moss A. Aiken
Comparison of the tools
Research comparing the tools CCFinder, CloneDR, Covet, JPlag, and Moss [Burd2002]

Code clones and software development process

Reengineering, Refactoring (Refactoring Home Page)
Kent Beck writes "Programs that have duplicate logic are hard to modify" in [Fowler1999].
Code clones and the quality of products
The statistical research on the relation between code clones and bugs [Monden2002]
Comparison of products
Comparative study on the similarity of software products using code clones [Yamamoto2001]
Tracking the changes of software products
Research on the use of code clones to track the changes of software products [Johnson1994]
Research on how code clones change across versions of software products [Laguë1997]

Explanation of the technology

This section introduces the technologies and definitions used in CCFinder/Gemini in particular. In general, various methods for detecting code clones exist, such as comparing the strings in the source code or comparing characteristic metrics (tutorial [Inoue2001], in Japanese).
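As a rough illustration of the string-comparison family of methods mentioned above, the following sketch reports runs of consecutive lines that are identical in two files after whitespace normalization. It is not the method used by CCFinder/Gemini; the names find_line_clones, normalize, and min_len are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def normalize(line):
    """Collapse whitespace so that layout differences do not hide a match."""
    return " ".join(line.split())


def find_line_clones(lines_a, lines_b, min_len=3):
    """Return (start_a, start_b, length) for maximal runs of matching lines.

    Indices are 0-based. min_len is an illustrative threshold (assumption).
    """
    a = [normalize(s) for s in lines_a]
    b = [normalize(s) for s in lines_b]

    # Index every line of b by its normalized text.
    positions = defaultdict(list)
    for j, text in enumerate(b):
        positions[text].append(j)

    clones = []
    for i, text in enumerate(a):
        for j in positions.get(text, []):
            # Skip matches that only continue a run that started earlier.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length >= min_len:
                clones.append((i, j, length))
    return clones


file_a = ["int x = 0;", "for (i = 0; i < n; i++)", "  x += a[i];", "return x;"]
file_b = ["int y = 1;", "for (i = 0; i < n; i++)", "x += a[i];", "return x;", "}"]
print(find_line_clones(file_a, file_b))
```

A comparison of this kind only finds fragments that are textually the same line by line; fragments that differ in identifier names, for example, would not be matched.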

Code clones

To appear soon.

Scatter plot

To appear soon.

Clone metrics

Clone metrics measure the characteristics of clone classes (i.e., sets of code fragments that are identical or similar to each other).

  • RAD the range of the source code fragments of a clone class in the directory hierarchy, assuming the source code is stored in hierarchical directories. When all the code fragments of a clone class are located in one source file, the RAD value of the clone class is 0. When the code fragments are located in multiple source files stored in one directory, RAD is 1. If those source files are stored in different directories, RAD is the maximum depth of those source files measured from their common parent directory.
  • POP the number of source code fragments in a clone class.
  • DFL the estimated reduction (in number of tokens) of source code obtained by rewriting the code clones as a common subroutine:
    DFL = {(length of the source code fragment) - (length of the source code that calls the common subroutine)} x POP - (length of the source code of the subroutine)
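The three metrics above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Gemini implementation: the function names rad, pop, and dfl are hypothetical, a clone class is represented simply by the file path of each fragment, every fragment in a class is assumed to have the same token length, and a file is assumed to count as one level below its directory (which keeps the "same directory" case at 1).

```python
from pathlib import PurePosixPath


def pop(fragment_files):
    """POP: the number of code fragments in the clone class."""
    return len(fragment_files)


def rad(fragment_files):
    """RAD: spread of a clone class over the directory hierarchy.

    fragment_files holds one file path per fragment, all relative to the
    same root (assumption).
    """
    paths = {PurePosixPath(p) for p in fragment_files}
    if len(paths) == 1:
        return 0  # all fragments are located in one source file
    dirs = [p.parent.parts for p in paths]
    # Number of leading directory components shared by all files.
    common = 0
    for parts in zip(*dirs):
        if len(set(parts)) != 1:
            break
        common += 1
    # Depth of each file below the common parent directory; the file
    # itself counts as one level (assumption).
    return max(len(d) - common + 1 for d in dirs)


def dfl(fragment_len, call_len, subroutine_len, pop_value):
    """DFL: estimated token reduction from extracting a common subroutine."""
    return (fragment_len - call_len) * pop_value - subroutine_len


files = ["src/a.c", "src/util/io/b.c", "src/a.c"]
print(pop(files))          # 3
print(rad(files))          # 3
print(dfl(50, 5, 55, 3))   # (50 - 5) * 3 - 55 = 80
```

With these assumptions, a clone class of three 50-token fragments that can each be replaced by a 5-token call to a 55-token subroutine has a DFL of 80 tokens.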

Metrics that measure the features of modules (source files)

  • RST
  • RSA
  • MAXLEN (Maximum length of code clone) the number of lines of the longest code clone sequence in the module. After a long code sequence is copied and pasted, MAXLEN becomes large.
  • COVERAGE (Coverage of code clone) the ratio (%) of the number of lines in the code clone sequences to the number of lines in the whole module. If a whole module is copied and pasted, COVERAGE of the original module is 100%.
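MAXLEN and COVERAGE can be computed from the line ranges of the clone sequences found in a module. The sketch below is only an illustration: the function names and the (start, end) representation of a clone sequence (1-based, inclusive line numbers) are assumptions, as is the choice to count a line covered by several overlapping clone sequences only once.

```python
def maxlen(clone_ranges):
    """MAXLEN: number of lines of the longest clone sequence in the module."""
    return max((end - start + 1 for start, end in clone_ranges), default=0)


def coverage(clone_ranges, total_lines):
    """COVERAGE: percentage of the module's lines that lie in clone sequences."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    covered = set()
    for start, end in clone_ranges:
        # Overlapping ranges contribute each line only once (assumption).
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / total_lines


ranges = [(1, 10), (5, 20), (31, 40)]
print(maxlen(ranges))           # 16 (lines 5 to 20)
print(coverage(ranges, 40))     # 75.0
print(coverage([(1, 40)], 40))  # 100.0, a fully copied module
```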

FAQ

Q: Where can I get this tool?

A: For several reasons, the distribution of CCFinder/Gemini is currently strictly controlled. However, CCFinder/Gemini can be distributed to you on the condition that you cooperate with our research. For details, please contact us by e-mail:
y-higo AT ist DOT osaka-u DOT ac DOT jp

Publications

Press

[Award] CCFinder received one of the Clone Awards at the Workshop on Detection of Software Clones, co-located with the International Conference on Software Maintenance (ICSM) 2002 (2002/10/02)

[Newspaper Article] "Osaka University: 'a 100 times faster search of the positions to be revised'", Nikkei Industrial newspaper, 2002 May 17 (Fri), morning edition, p.8 (in Japanese)

[Newspaper Article] "Computer software 'a fast detection method of copied parts', developed by a professor at Osaka University and his coworkers", Yomiuri newspaper, 2002 May 8 (Wed), morning edition, p.2 General (in Japanese)

Papers

2004
2002
[Ueda2002-2] Yasushi Ueda, Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue, "On Detection of Gapped Code Clones using Gap Locations," APSEC 2002. (To appear)
2001
[Ueda2001] Yasushi Ueda, Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue, "Source Code Analysis System using Code Clone Detection Tool", IEICE SS2001-14, Vol.101, No.240, pp.17-24, (2001-7) (in Japanese).
[Kamiya2001-3] Toshihiro Kamiya, Fumiaki Ohata, Kazuhiro Kondou, Shinji Kusumoto, and Katsuro Inoue, "Maintenance support tools for Java programs: CCFinder and JAAT", Proc. of The 23rd Int'l Conf. on Software Eng. (ICSE'2001), pp. 837-838, Toronto, Canada, (May, 2001).
[Kamiya2001-2] Toshihiro Kamiya, "Code Clone Detection Method", Proceedings of Winter Workshop in Kanazawa, IPSJ SIGSE, pp.21-22, (2001-1).
[Kamiya2001] Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue, "A Token-based Code Clone Detection Technique and Its Evaluation", IEICE SS2000-42, Vol.100, No.570, pp. 41-48, (2001-1).
[Nakae2001] Daikai Nakae, Toshihiro Kamiya, Akito Monden, Hiroshi Kato, Shin-ichi Sato, and Katsuro Inoue, "Quantitative Analysis of Cloned code on Legacy Software", IEICE SS2000-49, Vol. 100, No. 570, pp. 57-64, (2001-1).

References

Acknowledgement

This project is supported by the Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (A) (17200001), from April 2004 to March 2009.