Assessing the impact of API evolution is the title of my master’s thesis on which I just graduated. It’s for the Master of Science in Software Engineering at the University of Amsterdam. The project is for the VU University Medical Center, for a brain image analysis package they develop. The program visualizes data from e.g. MRI scans and provides an interactive 3D view of the head. Doctors for instance use it to compare the situation before and after an operation.
My thesis proposes a technique that helps understanding how hard it is to upgrade a program to a newer version of a library. And what makes it hard exactly.
The brain image analysis program uses an old version of the Qt Framework to draw widgets and do its visualizations. The newest version has changed considerably. It’s hard but necessary to upgrade the program to use the newest version. One of the things that is not clear is which changes in Qt have an effect on the program and how big those changes are.
My thesis proposes a tool called LICIA that can find all references in the source code of the program to the library (Qt in this case). Those references are compared with a list of changes (this should be made by hand for now). The result is a spreadsheet that lists the exact code locations on which changes in the library have an effect.
The result can be used to make graphs and derive statistics: “of the 200 functions that have changed, only 20 are used in the program”. This is useful to decide how to migrate the program.
LICIA uses the Elsa parser to parse the C++ code. It supports inheritance: if a class has changed, its subclasses are assumed to be changed as well because of the Liskov substitution principle. In the experiment, precision is 81% and recall is 96%.
Another contribution is a list of refactorings that were found during the experiment. Part of the changes from Qt 2 to 3 can be seen as refactorings but they have no generally accepted names. I propose names for them.
The correct BiBTeX citation format (if anybody is going to read it haha) is as follows:
@mastersthesis{citeulike:7691136,
author = {Witte, Taco C.},
citeulike-article-id = {7691136},
day = {6},
month = {August},
school = {University of Amsterdam},
title = {Assessing the impact of API evolution},
year = {2010}
}
See below for the source code of LICIA.
Perl source code of LICIA (LIbrary Changes Impact Analyzer):
#!/usr/bin/perl
#
# Analyzes the impact of library changes to C++ source code. Returns a spreadsheet with
# code locations that should be adapted.
#
# Copyright (C) 2010 Taco Witte
#
# This program is free software: you can redistribute it and/or modify it under the
# terms of the GNU General Public License as published by the Free Software Foundation,
# either version 3 of the License, or (at your option) any later version. See
# for more information.
#
# Required:
# - Elsa C++ parser, part of the Oink Stack. See https://www.danielwilkerson.com/oink
# - C Preprocessor, part of GCC. See http://gcc.gnu.org
#
# Note: classes mentioned in the change descriptions file should be declared in files
# with the same name in lowercase and .h as extension. For example QCursor should be
# declared in qcursor.h. They are referred to by name.
#
# Analyzes the given C++ source code file. Extracts the declarations in header files it
# refers to. Analyzes which of those declarations is affected by changes stored in a
# CSV file. Returns a spreadsheet with impacted code locations.
use warnings;
# Configuration
my $tmpdir = '/tmp';
my $cpp = '/usr/bin/cpp';
my $cpparguments = '-DQT_NO_STL';
my $ccparse = '/usr/local/oink-stack/elsa/ccparse';
# Handle command-line input
if (!defined $ARGV[0] || !defined $ARGV[1] || !defined $ARGV[2])
{
print "Usage: ./impact-analysis.pl CHANGES.csv INCLUDES FILE.cppnn";
print "CHANGES.csv is the command-separated file with changes.n";
print "FILE.cpp is the name of a C++ source code file to analyze.n";
print "INCLUDES is the *absolute* path of the required header files.nn";
print "For example: n";
print " ./impact-analysis.pl changes.csv /usr/include/qt232 main.cppn";
exit(1);
}
open CHANGESCSV, $ARGV[0] or die $!;
my $INCLUDES;
if ($ARGV[1] =~ /^(.*)(w+)/?$/)
{
# Drop trailing /
$INCLUDES = $1.$2;
}
my $DIRECTORY; my $FILENAME;
if ($ARGV[2] =~ /^(.*/)?([w-./]+)$/i)
{
$DIRECTORY = $1;
$FILENAME = $2;
}
else
{
die("Bug!");
}
# Parse the command-separated file (CSV) containing change descriptions
sub getChanges
{
my @changes = ();
my @lines = ;
my $header = shift @lines;
chomp $header;
if (!'id,namespace,class,identifier,type,level' eq $header)
{
die("Incorrect format for changes file.");
}
foreach $line (@lines)
{
chomp $line;
if ($line =~ /^([w]+),([w*]*),([w*]*),([w*_=~!]*),([ws();-]+),([ws]*)$/i)
{
my %change = ();
$change{'id'} = $1;
$change{'namespace'} = $2;
$change{'class'} = $3;
$change{'identifier'} = $4;
$change{'type'} = $5;
$change{'level'} = $6;
push(@changes, %change);
}
else
{
die("Parse error while parsing changes file: $line");
}
}
return @changes;
}
# Process the given file with the C preprocessor.
sub preProcess()
{
chdir $DIRECTORY || die();
system($cpp . " $cpparguments -w -I$INCLUDES $FILENAME $tmpdir/$FILENAME");
stat($tmpdir . '/' . $FILENAME) || die();
}
# Extract the declarations from the abstract syntax tree.
sub extractDeclarations
{
# Generate the abstract syntax tree with Elsa.
my $ccparsecmd = $ccparse . ' -tr printTypedAST ' . $tmpdir . '/' . $FILENAME;
my @ast = `$ccparsecmd`;
my %declarations = ();
my $previousLine = '';
foreach $line (@ast)
{
chomp($line);
# Match code locations. The line before a code location may refer
# to a declaration used in the code.
# e.g. loc = main.cpp:20:35
if ($line =~ /^[s]*loc = $INCLUDES.*:d+:d+$/i)
{
# Skip code locations in the library. The library itself
# will of course be adapted to its own changes.
}
elsif ($line =~ /^[s]*loc = (.+):(d+):d+$/i)
{
# Consider code locations in all other files
my %codeLocation = ();
$codeLocation{'file'} = $1;
$codeLocation{'line'} = $2;
# Match references to declarations in header files. Matches on:
# field: invocations
# var: direct variable access
# ctorVar: constructor invocations
# e.g. field: void setMainWidget(QWidget *), at /include/qt232/qapplication.h:117:18 (0x094DA470)
if ($previousLine =~ /^[s]*(field|var|ctorVar): (.*) at (.+):(d+):d+ (w+)$/i)
{
my $type = $1;
my $contents = $2;
my $headerFile = $3;
my $headerLine = $4;
if (!exists ($declarations{$headerFile}{$headerLine}))
{
$declarations{$headerFile}{$headerLine}{'count'} = 0;
$declarations{$headerFile}{$headerLine}{'type'} = $type;
$declarations{$headerFile}{$headerLine}{'contents'} = $contents;
@{$declarations{$headerFile}{$headerLine}{'locations'}} = ();
}
$declarations{$headerFile}{$headerLine}{'count'}++;
push(@{$declarations{$headerFile}{$headerLine}{'locations'}}, %codeLocation);
}
}
$previousLine = $line;
}
return %declarations;
}
# Calculate the (shortest) path from one node to another
# Actually we're only interested in *whether* there's a path
# Return 1 if either (i, j) or both (i, k) and (k, j) are connected.
sub calculatePath
{
my ($ij, $ik, $kj) = @_;
if (defined $ij && 1 == $ij)
{
# Path already exists
return 1;
}
if (defined $ik && 1 == $ik && defined $kj && 1 == $kj)
{
# Path is now discovered through k
return 1;
}
# No path was found (yet -- maybe later)
return 0;
}
# Calculates the transitive closure for the given hash of hashes
# e.g. if A=>B and B=>C, return A=>B, A=>C, B=>C
# See also http://en.wikipedia.org/wiki/Floyd-Warshall_algorithm
sub transitiveClosure
{
my %path = @_;
# Store all the names of the path in a big hash as keys
my %names = ();
foreach $p1 (keys %path)
{
$names{$p1} = 1;
foreach $p2 (keys %{$path{$p1}})
{
$names{$p2} = 1;
}
}
# Calculate the transitive closure
foreach $k (keys %names)
{
foreach $i (keys %names)
{
foreach $j (keys %names)
{
$path{$i}{$j} = calculatePath($path{$i}{$j}, $path{$i}{$k}, $path{$k}{$j});
}
}
}
return %path;
}
# Construct a 2 dimensional hash with all inheritance information
# e.g. if $inherits{'Animal'}{'Monkey'} = 1 then Monkey inherits from Animal
# A class can always be reached by itself.
sub parseInheritanceHierarchy
{
# Inheritance information
my %inherits = ();
# Generate the Dot file with Elsa
$ccparsecmd = $ccparse . ' -tr printHierarchies ' . $tmpdir . '/' . $FILENAME;
@hierarchy = `$ccparsecmd`;
# Parse and store the results in a hashmap
# subclasses['Superclass'] = Array['Subclasses']
foreach $line (@hierarchy)
{
chomp($line);
# Match on e.g. "QFrame (0x9f52d00)" -> "QScrollView (0x9f53010)" [
if ($line =~ /"(w+) ([dw]+)" -> "(w+) ([dw]+)" [/i)
{
$inherits{$1}{$1} = 1; # Recursive
$inherits{$1}{$2} = 1; # Path
$inherits{$2}{$2} = 1; # Recursive
}
}
# Add transitive relations to hashmap, such that all subclasses will be given for a
# certain class
%inherits = transitiveClosure(%inherits);
return %inherits;
}
# Check whether $class:$identifier is declared on $header:$line
# $header is known to be related to $class.
sub findMatchInHeader
{
my ($declaration, $class, $identifier) = @_;
my %match = ();
if ('*' eq $identifier)
{
# Always match on *
$match{'message'} = "MATCH * identifiern";
}
elsif ('' eq $identifier)
{
# Match on constructor
if ('ctorVar' eq $declaration->{'type'})
{
$match{'message'} = "MATCH empty identifier on constructorn";
}
}
else
{
# Match on identifier; case-sensitive because C++ is
if ($declaration->{'contents'} =~ /$identifier/)
{
$match{'message'} = "MATCH $declaration->{'contents'} =~ $identifiern";
}
}
return %match;
}
# Find the impact of the given change in the given classes
sub findImpactsInClasses
{
my ($change, $candidateClasses, $declarations) = @_;
my @impacts = ();
foreach $class (@{$candidateClasses})
{
my $header = $INCLUDES . '/' . lc($class) . '.h';
my $identifier = $change->{'identifier'};
# Find a $declaration from the list for which $header:$line matches $identifier
foreach $line (keys %{$declarations->{$header}})
{
my %match = findMatchInHeader($declarations->{$header}{$line}, $class, $identifier);
# If match, save some information about it
if (defined $match{'message'})
{
my %impact = ();
$impact{'change'} = $change;
$impact{'actualclass'} = $class;
$impact{'match'} = %match;
$impact{'declaration'} = $declarations->{$header}{$line};
push(@impacts, %impact);
last;
}
}
}
return @impacts;
}
# Find declarations that match with the given changes
sub findImpacts
{
my ($changes, $inherits, $declarations) = @_;
my @impacts = ();
foreach $change (@{$changes})
{
if (!defined $change->{'class'} || '' eq $change->{'class'})
{
die("Support for namespaces is not implemented");
}
# List of classes in which to look for impact
my @candidateClasses;
# Match on all classes
if ('*' eq $change->{'class'})
{
# Store all (unique) occurring class names
my %allClasses = ();
foreach $p1 (keys %{$inherits})
{
$allClasses{$p1} = 1;
foreach $p2 (keys %{$inherits->{$p1}})
{
$allClasses{$p2} = 1;
}
}
push(@candidateClasses, keys %allClasses);
}
else
{
# Store only the class mentioned in the change and its superclasses
foreach $class (keys %{$inherits->{$change->{'class'}}})
{
if (1 == $inherits->{$change->{'class'}}{$class})
{
push(@candidateClasses, $class);
}
}
}
my @newImpacts = findImpactsInClasses($change, @candidateClasses, $declarations);
push(@impacts, @newImpacts);
}
return @impacts;
}
# Print the impacts in comma-separated form, for further processing. Print them
# such that the output of several files can be concatenated to one big file.
sub printImpactsAsCSV
{
my @impacts = @_;
foreach $impact (@impacts)
{
foreach $codeLocation (@{$impact->{'declaration'}{'locations'}})
{
# Format:
# change id, change type, changed namespace, changed class,
# changed identifier, source file, line in source file,
# class in which functionality is found
my $output = sprintf("%s,%s,%s,%s,%s,%s,%s,%s,%sn",
$impact->{'change'}->{'id'},
$impact->{'change'}->{'type'},
$impact->{'change'}->{'level'},
$impact->{'change'}->{'namespace'},
$impact->{'change'}->{'class'},
$impact->{'change'}->{'identifier'},
$codeLocation->{'file'}, # could be a different file!
$codeLocation->{'line'},
$impact->{'actualclass'});
print $output;
}
}
}
# Get all refactorings and changes for the current migration
my @changes = getChanges();
# Preprocess the source file
preProcess();
# Extract declarations from abstract syntax tree of source file
my %declarations = extractDeclarations();
# Find inheritance paths for this source file
my %inherits = parseInheritanceHierarchy();
# Find whether refactorings and changes have impact on the program
my @impacts = findImpacts(@changes, %inherits, %declarations);
printImpactsAsCSV(@impacts);