Selecting Features in Origin Analysis
When applying a machine-learning approach to develop classifiers in a new domain, an important question is what measurements to take and how they will be used to construct informative features. This paper develops a novel set of machine-learning classifiers for files taken from software projects; the target classifications are based on origin analysis. Our approach adapts the output of four copy-analysis tools, generating a number of different measurements. Combining the measures with the files on which they operate generates a large set of features in a semi-automatic manner. Standard attribute selection and classifier training techniques then yield a pool of high-quality classifiers (accuracy in the range of 90%), along with information on the most relevant features.
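The feature-construction and attribute-selection steps described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the measure names, the file roles, the toy data, and the choice of information gain as the ranking criterion. The abstract does not name the four tools, their measures, or the selection method actually used, and classifier training is left out.

```python
from itertools import product
from math import log2

# Hypothetical measure names and file roles (not taken from the paper).
MEASURES = ["token_overlap", "line_overlap"]
ROLES = ["file_vs_parent", "file_vs_sibling"]


def build_feature_names(measures, roles):
    """Cross every measure with every file role to enumerate features."""
    return [f"{m}@{r}" for m, r in product(measures, roles)]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())


def info_gain(values, labels, threshold=0.5):
    """Information gain of the binary split value >= threshold."""
    high = [y for v, y in zip(values, labels) if v >= threshold]
    low = [y for v, y in zip(values, labels) if v < threshold]
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in (high, low) if p)
    return entropy(labels) - remainder


def rank_features(rows, labels, names):
    """Return feature names sorted by information gain, best first."""
    scores = {
        name: info_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)


# Toy data: one row per file, one column per generated feature.
names = build_feature_names(MEASURES, ROLES)
rows = [
    [0.9, 0.2, 0.6, 0.1],
    [0.8, 0.7, 0.4, 0.3],
    [0.1, 0.6, 0.7, 0.2],
    [0.2, 0.3, 0.3, 0.4],
]
labels = ["copied", "copied", "original", "original"]
ranked = rank_features(rows, labels, names)
print(ranked[0])
```

In this toy data only the first generated feature separates the two classes, so it is ranked first; a real pipeline would feed the top-ranked features into a classifier trainer.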
Item Type | Book Section |
---|---|
Additional information | Original paper can be found at: http://www.springer.com/computer/ai/book/978-0-85729-129-5 Copyright Springer |
Keywords | data mining, feature construction, origin analysis, machine learning |
Date Deposited | 15 May 2025 16:15 |
Last Modified | 30 May 2025 23:07 |
Explore Further
- sgai-selecting-features.pdf
- Submitted Version
- Restricted to Repository staff only