DACURA: a new solution to data harvesting and knowledge extraction for the historical sciences

Peregrine, Peter N., Brennan, Rob, Currie, Thomas, Feeney, Kevin, Francois, Pieter, Turchin, Peter and Whitehouse, Harvey (2018) DACURA: a new solution to data harvesting and knowledge extraction for the historical sciences. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 51 (3). pp. 165-174. ISSN 0161-5440


New advances in computer science address problems historical scientists face in gathering and evaluating the now vast data sources available through the Internet. As an example, we introduce Dacura, a dataset curation platform designed to assist historical researchers in harvesting, evaluating, and curating high-quality information sets from the Internet and other sources. Dacura uses semantic knowledge graph technology to represent data as complex, interrelated knowledge, allowing rapid search and retrieval of highly specific data without the need for a lookup table. Dacura automates the generation of tools to help non-experts curate high-quality knowledge bases over time and to integrate data from multiple sources into its curated knowledge model. Together, these features allow rapid harvesting and automated evaluation of Internet resources. We provide an example of Dacura in practice as the software employed to populate and manage the Seshat databank.
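To make the knowledge-graph idea in the abstract concrete, the following is a minimal sketch of how data stored as subject-predicate-object triples can be queried by following relationships directly, rather than through a separate lookup table. It is not Dacura's implementation or API. The `TinyTripleStore` class, the `polities_with_capital_in` helper, and every identifier in the sample data (`polity:A`, `city:X`, `hasCapital`, and so on) are invented for illustration only.

```python
from typing import List, Optional, Set, Tuple

# A triple is (subject, predicate, object), the basic unit of an RDF-style graph.
Triple = Tuple[str, str, str]


class TinyTripleStore:
    """Hypothetical in-memory triple store; a stand-in for a real RDF triplestore."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Record one statement in the graph."""
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


def polities_with_capital_in(store: TinyTripleStore, region: str) -> List[str]:
    """Follow two edges: polity -hasCapital-> city -locatedIn-> region."""
    result = []
    for polity, _, city in store.match(predicate="hasCapital"):
        if store.match(subject=city, predicate="locatedIn", obj=region):
            result.append(polity)
    return sorted(result)


# Invented sample data, not drawn from Seshat or any real source.
store = TinyTripleStore()
store.add("polity:A", "hasCapital", "city:X")
store.add("polity:B", "hasCapital", "city:Y")
store.add("city:X", "locatedIn", "region:North")
store.add("city:Y", "locatedIn", "region:South")

print(polities_with_capital_in(store, "region:North"))
```

The point of the sketch is that the query is expressed as a pattern over relationships. In a production setting a triplestore queried with a language such as SPARQL would play the role of `match`, with indexing that this linear scan omits.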

Item Type	Article
Additional information	© 2018 Taylor & Francis Group, LLC. This is an accepted manuscript of an article published by Taylor & Francis in Historical Methods: A Journal of Quantitative and Interdisciplinary History on 20/03/2018, available online: https://doi.org/10.1080/01615440.2018.1443863
Keywords	data harvesting, rdf triplestore, data curation, database metamodels, database ontology, history
Date Deposited	15 May 2025 13:21
Last Modified	04 Jun 2025 17:08
