Selecting Features in Origin Analysis

Green, P. D., Lane, P. C. R., Rainer, A. and Scholz, S. (2010) Selecting Features in Origin Analysis. In: Research and Development in Intelligent Systems XXVII, Incorporating Applications and Innovations in Intelligent Systems XVIII: Proceedings of AI-2010, The Thirtieth SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence. Springer Nature, pp. 379-392. ISBN 978-0-85729-129-5


When applying a machine-learning approach to develop classifiers in a new domain, an important question is which measurements to take and how to use them to construct informative features. This paper develops a novel set of machine-learning classifiers for classifying files taken from software projects, where the target classifications are based on origin analysis. Our approach adapts the output of four copy-analysis tools, generating a number of different measurements. By combining these measures with the files on which they operate, a large set of features is generated semi-automatically. Standard attribute-selection and classifier-training techniques then yield a pool of high-quality classifiers (accuracies of around 90%) together with information on the most relevant features.
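The pipeline described in the abstract (many measures crossed with the files they operate on, followed by attribute selection) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the tool labels (`tool_a` to `tool_d`), the measures (`max_similarity`, `mean_similarity`), the file targets (`same_project`, `other_projects`) and the class labels (`copied`, `original`) are hypothetical placeholders, not the tools, measures or classes used in the paper. Information gain on a simple threshold split stands in for "standard attribute selection"; the paper's actual selection method and the classifier-training step are not reproduced here.

```python
from collections import Counter
from itertools import product
from math import log2

# Hypothetical placeholders: the paper uses four copy-analysis tools,
# but these names, measures and targets are invented for illustration.
TOOLS = ["tool_a", "tool_b", "tool_c", "tool_d"]
MEASURES = ["max_similarity", "mean_similarity"]
TARGETS = ["same_project", "other_projects"]


def generate_feature_names(tools, measures, targets):
    """Semi-automatic feature construction: one feature per (tool, measure, target)."""
    return [f"{t}:{m}:{g}" for t, m, g in product(tools, measures, targets)]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold=0.5):
    """Gain from splitting the examples on value >= threshold (a simple binarisation)."""
    left = [y for v, y in zip(values, labels) if v < threshold]
    right = [y for v, y in zip(values, labels) if v >= threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


def select_features(rows, labels, feature_names, k):
    """Rank features by information gain (ties broken by name) and keep the top k."""
    scored = [(information_gain([r[f] for r in rows], labels), f) for f in feature_names]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [f for _, f in scored[:k]]


# Toy data, invented for illustration only (not results from the paper).
names = generate_feature_names(["tool_a", "tool_b"], ["max_similarity"], ["same_project"])
rows = [
    {names[0]: 0.9, names[1]: 0.2},
    {names[0]: 0.8, names[1]: 0.7},
    {names[0]: 0.1, names[1]: 0.6},
    {names[0]: 0.2, names[1]: 0.3},
]
labels = ["copied", "copied", "original", "original"]

print(len(generate_feature_names(TOOLS, MEASURES, TARGETS)))  # 4 * 2 * 2 = 16 features
print(select_features(rows, labels, names, k=1))  # ['tool_a:max_similarity:same_project']
```

The Cartesian product is what makes the feature set grow quickly: each additional tool, measure or file target multiplies the number of candidate features, which is why an attribute-selection step is needed before training classifiers.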

Item Type	Book Section
Additional information	Original paper can be found at: http://www.springer.com/computer/ai/book/978-0-85729-129-5 Copyright Springer
Keywords	data mining, feature construction, origin analysis, machine learning
Date Deposited	15 May 2025 16:15
Last Modified	30 May 2025 23:07

Full text	sgai-selecting-features.pdf (Submitted Version; restricted to repository staff only)