Using n-grams to rapidly characterise the evolution of software code

Rainer, A., Lane, P.C.R., Malcolm, J. and Scholz, S. (2008) Using n-grams to rapidly characterise the evolution of software code. In: Procs 23rd IEEE/ACM Int Conf on Automated Software Engineering : ASE Workshops 2008. Institute of Electrical and Electronics Engineers (IEEE), pp. 43-52. ISBN 978-1-4244-2776-5

Text-based approaches to the analysis of software evolution are attractive because of the fine-grained, token-level comparisons they can generate. The use of such approaches has, however, been constrained by the lack of an efficient implementation. In this paper we demonstrate the ability of Ferret, which uses n-grams of 3 tokens, to characterise the evolution of software code. Ferret’s implementation operates in almost linear time and is at least an order of magnitude faster than the diff tool. Ferret’s output can be analysed to reveal several characteristics of software evolution, such as the lifecycle of a single file, the degree of change between two files, and possible regression. In addition, the similarity scores produced by Ferret can be aggregated to measure larger parts of the system being analysed.
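The idea of comparing two versions of a file through n-grams of 3 tokens can be illustrated with a minimal sketch. This is not Ferret's implementation: the tokenizer, the function names (`tokenize`, `trigrams`, `similarity`) and the use of a Jaccard-style ratio as the similarity score are all assumptions made for illustration only.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer (an assumption, not Ferret's): identifiers,
    # integer literals, or any single non-whitespace symbol.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", text)


def trigrams(tokens, n=3):
    # Set of all runs of n consecutive tokens.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a, b):
    # Assumed score: shared trigrams divided by all distinct trigrams.
    ta, tb = trigrams(tokenize(a)), trigrams(tokenize(b))
    if not ta and not tb:
        return 1.0  # two inputs with no trigrams are treated as identical
    return len(ta & tb) / len(ta | tb)


# Two versions of one line of code, differing in a single token.
v1 = "int total = a + b;"
v2 = "int total = a + c;"
score = similarity(v1, v2)
```

Because each file is reduced to a set of trigrams in a single pass, a comparison of this kind costs roughly time proportional to the number of tokens. Scores for individual file pairs could then be averaged, under the same assumptions, to summarise a directory or a release.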

A_Rainer_Alt_1.pdf (Published Version). Restricted to Repository staff only.
