for v2013.08 | Last modified: Aug. 22, 2013
In this version, the tool compares files between projects and enumerates the file pairs whose similarity is greater than or equal to the threshold. It then constructs a minimum spanning tree, using the number of similar file pairs as the cost function. The tool defines the direction of evolution as the direction in which the amount of source code increases.
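The steps described above can be sketched in Java. This is a minimal illustration under stated assumptions, not the tool's implementation: the class and method names are hypothetical, Prim's algorithm is used only because it is a standard way to build a minimum spanning tree (the text does not say which algorithm the tool uses), the cost is assumed to be the negated number of similar file pairs so that more shared files means a cheaper edge, and the "amount of source code" is assumed to be a line count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; these are not the tool's actual classes. */
final class EvolutionTreeSketch {

    /** One tree edge. "directed" is false when no direction could be decided. */
    static final class Edge {
        final int from;
        final int to;
        final boolean directed;

        Edge(int from, int to, boolean directed) {
            this.from = from;
            this.to = to;
            this.directed = directed;
        }

        @Override
        public String toString() {
            return from + (directed ? " -> " : " -- ") + to;
        }
    }

    /** Builds a spanning tree with Prim's algorithm over a symmetric cost matrix. */
    static List<Edge> build(double[][] cost, long[] sizeInLines) {
        int n = cost.length;
        List<Edge> edges = new ArrayList<>();
        if (n == 0) {
            return edges;
        }
        boolean[] inTree = new boolean[n];
        double[] best = new double[n]; // cheapest known cost to attach each node
        int[] parent = new int[n];     // tree node that offers that cost
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        Arrays.fill(parent, -1);
        best[0] = 0.0;

        for (int step = 0; step < n; step++) {
            // Pick the cheapest node that is not in the tree yet.
            int u = -1;
            for (int v = 0; v < n; v++) {
                if (!inTree[v] && (u == -1 || best[v] < best[u])) {
                    u = v;
                }
            }
            inTree[u] = true;
            if (parent[u] >= 0) {
                edges.add(orient(parent[u], u, sizeInLines));
            }
            // Update the attachment cost of the remaining nodes.
            for (int v = 0; v < n; v++) {
                if (!inTree[v] && cost[u][v] < best[v]) {
                    best[v] = cost[u][v];
                    parent[v] = u;
                }
            }
        }
        return edges;
    }

    /** Orients an edge from the smaller project to the larger one (assumed rule). */
    static Edge orient(int a, int b, long[] size) {
        if (size[a] < size[b]) {
            return new Edge(a, b, true);
        }
        if (size[b] < size[a]) {
            return new Edge(b, a, true);
        }
        return new Edge(a, b, false); // equal size: direction unknown
    }

    public static void main(String[] args) {
        int[][] similarPairs = { {0, 9, 2}, {9, 0, 7}, {2, 7, 0} }; // made-up counts
        double[][] cost = new double[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                cost[i][j] = -similarPairs[i][j]; // assumption: more similar pairs = lower cost
            }
        }
        long[] size = {1000, 1500, 1500}; // made-up line counts
        System.out.println(build(cost, size));
    }
}
```

In this sketch, an edge between two projects of equal size is left undirected, which mirrors the case where the direction of evolution cannot be detected.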
For the operating principles and the results of the case studies, see our paper [Extraction of Product Evolution Tree from Source Code of Product Variants (SPLC2013)] (to appear).
PRET-Extractor v2013.08 Binary
Java Runtime Environment 7
Graphviz
First, edit example.bat.
Change "C:\Program Files (x86)\Graphviz 2.28\bin\dot.exe" to the path of Graphviz's dot.exe in your environment.
java -jar pret.jar -m skip -type java.typec -tmp tmp -result result -diff 0
-graph "C:\Program Files (x86)\Graphviz 2.28\bin\dot.exe" -thread 2 < example_command.txt
Run example.bat.
The result is written to the "result" directory, which is created in the tool directory.
Because the similarity threshold is set to 0.6, the result directory contains two files, 0.6.dot and 0.6.png.
The result would be similar to the figure below.
Each node of a tree represents a software product and each edge indicates that a product is likely derived from another product.
A node label is the path to the project.
An edge label explains the cost of software changes between products and the direction of derivation,
indicating which product is an ancestor and which product is a successor.
Edges that are not arrows mean that the tool could not detect the direction of evolution.
Edit example_command.txt.
To change the threshold of similarity between source files, change "threshold 0.6" at line 1 to:
threshold 0.65
Insert the following line after line 4, "read proj3". This line makes the tool add another project.
read proj4
Run example.bat again, then 0.65.png will be generated.
Because the tool uses Graphviz to render the tree,
the layout of the tree depends on the input order.
(The tree itself is the best one found by the algorithm.)
Edit the .dot file to adjust the layout.
This section explains how to edit example_command.txt.
Specify the path to the target project directory after "read".
Use one command per project.
read C:\src\project1
read C:\src\project2
Run the count command to make the tool construct the tree.
count
The similarity threshold is used to find corresponding files between projects.
Two files are considered similar when their similarity is greater than or equal to the threshold.
The following command sets the similarity threshold to 0.8.
threshold 0.8
If the command is issued after projects have been read and the threshold is lowered, the tool recalculates the similarity.
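The threshold test can be illustrated with a small Java sketch. The similarity formula used here, 2 x LCS / (length of A + length of B) over token sequences, is an assumption chosen for illustration; the text only says that the default setting uses an internal LCS counter and does not give the exact formula. All class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of an LCS-based similarity check. */
final class SimilaritySketch {

    /** Length of the longest common subsequence of two token lists. */
    static int lcsLength(List<String> a, List<String> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.size()][b.size()];
    }

    /** Assumed similarity measure in the range [0, 1]. */
    static double similarity(List<String> a, List<String> b) {
        int total = a.size() + b.size();
        if (total == 0) {
            return 1.0; // two empty files are treated as identical
        }
        return 2.0 * lcsLength(a, b) / total;
    }

    /** Files are similar when the similarity is greater than or equal to the threshold. */
    static boolean isSimilar(List<String> a, List<String> b, double threshold) {
        return similarity(a, b) >= threshold;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("int", "x", "=", "1", ";");
        List<String> b = Arrays.asList("int", "y", "=", "1", ";");
        System.out.println(similarity(a, b));     // 4 common tokens out of 5 + 5
        System.out.println(isSimilar(a, b, 0.8)); // meets the threshold exactly
        System.out.println(isSimilar(a, b, 0.9)); // below the threshold
    }
}
```

Because the comparison is "greater than or equal to", a pair whose similarity equals the threshold still counts as similar.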
Repeat while raising the threshold:
threshold 0.7
read proj1
read proj2
read proj3
count
threshold 0.8
count
threshold 0.9
count
exit
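If you sweep many thresholds, the command file can also be generated. The following Java sketch produces the same command sequence as the example above; it uses only the commands shown in this document (threshold, read, count, exit), while the class and method names are hypothetical helpers that are not part of the tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds the lines of a command file. */
final class CommandScriptSketch {

    /**
     * Reads all projects once after the first threshold,
     * then runs "count" for every threshold in the given order.
     */
    static List<String> sweep(List<String> projectDirs, double[] thresholds) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < thresholds.length; i++) {
            lines.add("threshold " + thresholds[i]);
            if (i == 0) {
                for (String dir : projectDirs) {
                    lines.add("read " + dir);
                }
            }
            lines.add("count");
        }
        lines.add("exit");
        return lines;
    }

    public static void main(String[] args) {
        List<String> projects = Arrays.asList("proj1", "proj2", "proj3");
        for (String line : sweep(projects, new double[] {0.7, 0.8, 0.9})) {
            System.out.println(line);
        }
    }
}
```

Redirect the printed lines to a file and pass that file to the tool in place of example_command.txt.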
Analyzed data are saved in the tmp directory automatically.
You can load the data and continue the analysis if you run the tool with the same command line options as before.
If the options have changed, you should not load the saved data.
To reuse the results of the first analysis in the "Change the settings (a bit)" section of the example, edit example_command.txt as below:
load
threshold 0.65
read proj4
count
exit
The tool supports some command line options.
The -type option specifies the type specification file.
-type c.typec
The default type specification file java.typec is as follows:
java,J
which means that the tool uses the built-in Java parser to preprocess files with the extension ".java".
Each line has the form "extension,letter", where the letter specifies the preprocessing procedure. Preprocess procedures are:
The tool treats files as the same type if the same letter is assigned to multiple extensions.
In this example configuration, the tool compares *.x files with *.y files, in addition to comparing files that have the same extension.
x,A
y,A
z,B
(*.x and *.y are file type A. *.z is file type B.)
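The grouping rule can be sketched in Java as follows. This is only an illustration of the "extension,letter" format described above: the class and method names are hypothetical, and details such as whitespace handling and case sensitivity of extensions are assumptions, since the document does not specify them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of reading a type specification file. */
final class TypeSpecSketch {

    /** Parses lines of the form "extension,letter" into a map. */
    static Map<String, Character> parse(List<String> lines) {
        Map<String, Character> types = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            String[] parts = trimmed.split(",", 2);
            if (parts.length != 2 || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException("Expected \"extension,letter\": " + line);
            }
            types.put(parts[0].trim(), parts[1].trim().charAt(0));
        }
        return types;
    }

    /** Returns the text after the last dot, or an empty string if there is none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    /** Two files are compared only when both extensions map to the same letter. */
    static boolean comparable(Map<String, Character> types, String fileA, String fileB) {
        Character a = types.get(extensionOf(fileA));
        Character b = types.get(extensionOf(fileB));
        return a != null && a.equals(b);
    }

    public static void main(String[] args) {
        Map<String, Character> types = parse(Arrays.asList("x,A", "y,A", "z,B"));
        System.out.println(comparable(types, "foo.x", "bar.y")); // both are type A
        System.out.println(comparable(types, "foo.x", "baz.z")); // type A and type B
    }
}
```

Files whose extension is not listed map to no type in this sketch, so they are never compared.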
Specify the number of threads for calculating similarity.
-thread 8
Specify the diff program with the number after the -diff option.
-diff 0
: Use internal LCS counter (default).
-diff 1
: Use internal diff. This is the same setting as SPLC paper.
-diff 5 diff.exe
: Use external GNU diff. Specify the path to the program.
-diff 6 diff.exe
: Use external Cygwin GNU diff. Specify the path to the program.
Get the source code from: pret_201308_src.zip.
Run an external parser and generate files that contain one token per line.
Choose an extension for the generated files and write it to the target type configuration file.
For example, use an external Pascal parser
A.pas (source) -> [Pascal parser] -> A.parsedpas
and set the target type configuration file as:
parsedpas,P
P (and any other character except C and J) means "no preprocessing by the tool", so the tool does not use an internal parser and compares the files as one token per line.
Implement a Preprocessor class and add it to the PreprocessorFactory class.
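To show what a "one token per line" file looks like, here is a deliberately naive Java sketch. It is not a real Pascal parser: it splits text into identifiers, integer literals, the ":=" operator, and single punctuation characters, and it ignores comments and string literals. The class and method names are hypothetical; in practice you would use a proper parser for the target language.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, naive tokenizer that emits one token per line. */
final class OneTokenPerLineSketch {

    // Identifier or keyword, integer literal, ":=", or any single non-space character.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[0-9]+|:=|\\S");

    /** Splits the source text into tokens; whitespace is skipped. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Formats the tokens as text with exactly one token per line. */
    static String toOneTokenPerLine(String source) {
        StringBuilder out = new StringBuilder();
        for (String token : tokenize(source)) {
            out.append(token).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The result would be saved with the extension registered in the
        // type configuration file (for example, A.parsedpas).
        System.out.print(toOneTokenPerLine("x := x + 1;"));
    }
}
```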
When running the tool from the source code, you can use a list of the project's target files instead of the path to the project directory.
The list contains one target file path per line.
engine.readProjectByList(Path list);
Note that the type specification still applies, so listed files whose extensions do not match it are ignored.
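Such a list file can be produced with standard Java file APIs. The sketch below walks a directory and writes one path per line; the class and method names are hypothetical, and writing absolute paths is an assumption, since the document does not say whether readProjectByList expects absolute or relative paths.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper that writes a target file list, one path per line. */
final class FileListSketch {

    /** Collects regular files under root whose name ends with ".extension". */
    static List<String> collect(Path root, String extension) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith("." + extension))
                    .map(p -> p.toAbsolutePath().toString()) // assumption: absolute paths
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Writes the list as UTF-8 text, one path per line. */
    static void writeList(Path listFile, List<String> files) throws IOException {
        Files.write(listFile, files);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FileListSketch <project directory> <extension> <list file>
        Path root = Paths.get(args[0]);
        writeList(Paths.get(args[2]), collect(root, args[1]));
    }
}
```

The resulting list file is what you would pass to engine.readProjectByList(Path list).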
Define the cost function by extending the Counter class.
The tool gets the data for calculating the cost function in the CalcDiff class.
The analyzed data is stored in a class that implements the ISimlarity interface.
You can define a new cost function by altering these related files.