Tutorial

for v2013.08 | Last modified: Aug. 22, 2013

Summary

In this version, the tool compares files between projects and enumerates the file pairs whose similarity is greater than or equal to the threshold. It then constructs a minimum spanning tree, using the number of similar file pairs as the cost function. The tool defines the direction of evolution as the direction in which the amount of source code increases.
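The procedure above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the function name build_evolution_tree, the two input dictionaries, and the use of the negative pair count as the edge cost are assumptions made for this example (the text only says that the number of similar file pairs serves as the cost function). The sketch uses Kruskal's algorithm with a union-find structure and orients each edge toward the project with more source code.

```python
def build_evolution_tree(pair_counts, code_sizes):
    """Return a list of (ancestor, successor, directed) tree edges.

    pair_counts: {(project_a, project_b): number of similar file pairs}
    code_sizes:  {project: amount of source code, e.g. lines}
    Both inputs are hypothetical structures chosen for this sketch.
    """
    # ASSUMPTION: more similar file pairs means a cheaper edge, so the cost
    # is the negative count. The tool's real cost function may differ.
    edges = sorted((-count, a, b) for (a, b), count in pair_counts.items())

    # Union-find forest used by Kruskal's algorithm to detect cycles.
    parent = {project: project for project in code_sizes}

    def find(project):
        while parent[project] != project:
            parent[project] = parent[parent[project]]  # path halving
            project = parent[project]
        return project

    tree = []
    for _cost, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # this edge would close a cycle
        parent[root_a] = root_b
        if code_sizes[a] == code_sizes[b]:
            tree.append((a, b, False))  # direction cannot be decided
        elif code_sizes[a] < code_sizes[b]:
            tree.append((a, b, True))   # a is the likely ancestor
        else:
            tree.append((b, a, True))   # b is the likely ancestor
    return tree
```

An edge whose third element is False corresponds to an edge drawn without an arrow in the rendered tree.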

For the operating principles and the results of case studies, see our paper [Extraction of Product Evolution Tree from Source Code of Product Variants (SPLC2013)] (to appear).

Preparations

PRET-Extractor v2013.08 Binary

Java Runtime Environment 7

Graphviz

Run example

Getting ready

First, edit example.bat.
Change "C:\Program Files (x86)\Graphviz 2.28\bin\dot.exe" to the path of Graphviz's dot.exe in your environment.

java -jar pret.jar -m skip -type java.typec -tmp tmp -result result -diff 0
-graph "C:\Program Files (x86)\Graphviz 2.28\bin\dot.exe" -thread 2 < example_command.txt
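If you prefer to start the tool from a script instead of editing example.bat, the same command line can be assembled programmatically. The following is a minimal sketch, assuming Python is available; the function names build_pret_command and run_pret are hypothetical, and only the options shown in the command above are used. It covers the built-in diff modes only (a number with no program path).

```python
import subprocess


def build_pret_command(dot_path, type_file="java.typec", threads=2, diff_mode=0):
    """Assemble the argument list used in example.bat (hypothetical helper)."""
    return [
        "java", "-jar", "pret.jar",
        "-m", "skip",
        "-type", type_file,
        "-tmp", "tmp",
        "-result", "result",
        "-diff", str(diff_mode),
        "-graph", dot_path,          # path to the Graphviz dot executable
        "-thread", str(threads),
    ]


def run_pret(command_file, dot_path):
    """Run the tool with the command file on standard input, as '<' does."""
    with open(command_file, "rb") as stdin:
        return subprocess.run(build_pret_command(dot_path), stdin=stdin, check=True)
```

Calling run_pret("example_command.txt", r"C:\Program Files (x86)\Graphviz 2.28\bin\dot.exe") from the tool directory would then mirror example.bat.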

Run

Run example.bat.

The results are written to the "result" directory, which is created in the tool directory.
Because the similarity threshold is set to 0.6, two files, 0.6.dot and 0.6.png, appear in the result directory.

The result would be similar to the figure below.

Each node of a tree represents a software product and each edge indicates that a product is likely derived from another product. A node label is the path to the project.
An edge label explains the cost of software changes between products and the direction of derivation, indicating which product is an ancestor and which product is a successor.
Edges without an arrow mean that the tool could not detect the direction of evolution.

Change the settings (a bit)

Edit example_command.txt.

To change the threshold of similarity between source files, change "threshold 0.6" at line 1 to:

threshold 0.65

Insert the following line after line 4, "read proj3". This line makes the tool add another project.

read proj4

Run example.bat again; 0.65.png will be generated.

Notes on output

Because the tool uses Graphviz to render the tree, the layout of the tree depends on the input order.
(The tree itself is the best one found by the algorithm.)
Edit the .dot file to adjust the layout.
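As one example of such an adjustment, the layout direction can be changed with Graphviz's standard rankdir attribute. The sketch below assumes the generated file is an ordinary Graphviz graph whose first "{" opens the graph body; the helper name set_rankdir is hypothetical.

```python
def set_rankdir(dot_text, rankdir="LR"):
    """Insert a rankdir attribute right after the graph's opening brace."""
    # Only the first "{" is touched, so node and edge statements stay as they are.
    return dot_text.replace("{", "{\n  rankdir=%s;" % rankdir, 1)
```

After saving the modified text back to the .dot file, the image can be rendered again with Graphviz, for example with dot -Tpng 0.6.dot -o 0.6.png.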

Commands

This section explains how to edit example_command.txt.

Reading projects

Specify the path to the target project directory after "read".
Each read command takes one project.

read C:\src\project1
read C:\src\project2

Construct the tree

Run the count command to make the tool construct the tree.

count

Set similarity threshold

The similarity threshold is used to decide which files correspond to each other.
Two files are considered similar when their similarity is greater than or equal to the threshold.
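The threshold test can be illustrated with a small sketch. The similarity measure used here, 2 * LCS / (len(a) + len(b)) over token sequences, is an assumption chosen for the example; this page does not define the formula the tool uses. The function names are hypothetical as well.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """ASSUMED measure: share of tokens that belong to the LCS (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def is_similar(a, b, threshold):
    """Two files count as similar when similarity >= threshold."""
    return similarity(a, b) >= threshold
```

Note that the comparison is inclusive: a pair whose similarity equals the threshold exactly is still counted as similar.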

The following command sets the similarity threshold to 0.8.

threshold 0.8

If the command appears after projects have been read and it lowers the threshold, the tool recalculates the similarity.

Repeat while raising the threshold:

threshold 0.7
read proj1
read proj2
read proj3
count
threshold 0.8
count
threshold 0.9
count
exit
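A command file like the one above can also be generated by a script, which is convenient when sweeping many thresholds. The following is a minimal sketch; the function name build_command_script is hypothetical, and it only emits the commands described in this section (threshold, read, count, exit). Thresholds are sorted in ascending order so that the threshold is only ever raised after the projects have been read.

```python
def build_command_script(projects, thresholds):
    """Return command-file text that runs count once per threshold."""
    if not thresholds:
        raise ValueError("at least one threshold is required")
    ordered = sorted(thresholds)  # raise the threshold step by step
    lines = ["threshold %s" % ordered[0]]
    lines += ["read %s" % project for project in projects]
    lines.append("count")
    for threshold in ordered[1:]:
        lines += ["threshold %s" % threshold, "count"]
    lines.append("exit")
    return "\n".join(lines) + "\n"
```

Writing the returned text to example_command.txt reproduces the script shown above for the projects proj1 to proj3 and the thresholds 0.7, 0.8, and 0.9.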

Continue from the last time

Analyzed data are saved in the tmp directory automatically.
You can load them and continue the analysis if you run the tool with the same command line options as before.
If the options are changed, you should not load the saved data.

To reuse the results of the first analysis in the "Change the settings (a bit)" example, edit example_command.txt as follows:

load
threshold 0.65
read proj4
count
exit

Command List

Command line options

The tool supports some command line options.

Target file type

The -type option specifies the type specification file.

-type c.typec

The default type specification file java.typec is as follows:

java,J

which means that the tool uses its built-in Java parser to preprocess files with the extension ".java".

Each line has the form "extension,letter", where the letter specifies the preprocessing procedure. The preprocessing procedures are:

If the same letter is assigned to multiple extensions, the tool treats files with those extensions as the same type.
With the example configuration below, the tool compares *.x files with *.y files, in addition to comparing files that share the same extension.

x,A
y,A
z,B

(*.x and *.y are file type A. *.z is file type B.)
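The grouping rule can be made concrete with a short sketch that parses a type specification and decides whether two files would be compared. The function names are hypothetical, and the parsing details (blank lines skipped, extensions matched case-sensitively) are assumptions for the example, not a description of the tool's parser.

```python
import os


def parse_type_spec(text):
    """Map each extension to its type letter, one "extension,letter" per line."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ASSUMPTION: blank lines are ignored
        extension, letter = line.split(",", 1)
        mapping[extension.strip()] = letter.strip()
    return mapping


def file_type(path, mapping):
    """Return the type letter for a path, or None if its extension is unlisted."""
    extension = os.path.splitext(path)[1].lstrip(".")
    return mapping.get(extension)


def comparable(path_a, path_b, mapping):
    """Files are compared only when both are listed and share a type letter."""
    type_a, type_b = file_type(path_a, mapping), file_type(path_b, mapping)
    return type_a is not None and type_a == type_b
```

With the configuration above, comparable("a.x", "b.y", mapping) is true, while comparable("a.x", "c.z", mapping) is false.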

Number of threads

Specify the number of threads for calculating similarity.

-thread 8

Using an external diff program

Select the diff program with the number that follows the -diff option.

-diff 0 : Use internal LCS counter (default).
-diff 1 : Use internal diff. This is the same setting as in the SPLC paper.
-diff 5 diff.exe : Use external GNU diff. Specify the path to the program.
-diff 6 diff.exe : Use external Cygwin GNU diff. Specify the path to the program.

Option List

Customize

Get the source code from pret_201308_src.zip.

Use external parser

Run external parser before running PRET-Extractor

Run the external parser to generate files that contain one token per line. Choose an extension for the generated files and write it to the target type configuration file. For example, run an external Pascal parser as
A.pas (source) -> [Pascal parser] -> A.parsedpas
and set the target type configuration file to:

parsedpas,P

P (like any letter other than C and J) means "no preprocessing by the tool", so the tool does not use its internal parser and compares the files as one token per line.
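To show what "one token per line" output looks like, here is a deliberately naive stand-in for an external parser. It is not a real Pascal parser: the regular expression, the function names, and the UTF-8 encoding are assumptions for this sketch, and it ignores comments and string literals.

```python
import re
from pathlib import Path

# Identifiers, integers, a few two-character operators, then any other symbol.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|:=|<=|>=|<>|\S")


def tokens_per_line(source):
    """Return the source rewritten with exactly one token on each line."""
    return "\n".join(TOKEN.findall(source)) + "\n"


def convert_file(source_path, suffix=".parsedpas"):
    """Write the tokenized form of A.pas to A.parsedpas and return its path."""
    source_path = Path(source_path)
    target = source_path.with_suffix(suffix)
    target.write_text(tokens_per_line(source_path.read_text(encoding="utf-8")),
                      encoding="utf-8")
    return target
```

Any tool that produces this line-oriented format can play the role of [Pascal parser] in the pipeline above.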

Edit source code

Implement a Preprocessor class and add it to the PreprocessorFactory class.

Target files are not under the same directory, or only specified files are analyzed

When running the tool from source code, you can pass a list of the project's target files instead of the path to the project directory.
The list contains one target file path per line.

engine.readProjectByList(Path list);

Note that the type specification still applies, so listed files whose extensions do not match it are ignored.
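Such a list file can be produced with a short script when the target files are spread over several directories. The sketch below is an illustration under assumptions: the function names are hypothetical, and whether the tool expects absolute paths or a particular text encoding is not stated here, so absolute paths and UTF-8 are used as guesses.

```python
import os


def collect_targets(directories, extensions):
    """Return absolute paths of files under the directories with given extensions."""
    wanted = {"." + extension.lstrip(".") for extension in extensions}
    paths = []
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in sorted(files):
                if os.path.splitext(name)[1] in wanted:
                    paths.append(os.path.abspath(os.path.join(root, name)))
    return paths


def write_list(paths, list_file):
    """Write one target file path per line, the format described above."""
    with open(list_file, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(path + "\n")
```

Filtering by extension in the script is optional, since the tool ignores listed files that do not match the type specification anyway.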

Where to modify the source code

Defining the cost function by extending the Counter class

The tool collects the data for calculating the cost function in the CalcDiff class. The analyzed data is stored in a class that implements the ISimlarity interface.
You can define a new cost function by altering these related files.

