Dataset Handle: 20.500.14123/12586

Supplementary Research Materials for the PhD Thesis "Vague, Incomplete, Subjective, and Uncertain Information in Digital History"

Datasets, Software Application, Source Code, and NLP Models

Archiving without access
No downloads available

Chronological data

Date of availability in catalog: 2026-02-27
Available from / since 2026-02-27

Language of the resource

English

Related external resources

Identical to DOI: 10.5281/zenodo.15655527
Mariani, F. (2023). Structured Provenance Events Dataset from the Art Institute of Chicago [Data set]. Zenodo.
Identical to DOI: 10.5281/zenodo.13987655
Mariani, F. (2023). NLP Models for Extracting Knowledge from Museum Provenance Texts. Zenodo.
Identical to URL: https://github.com/prov-a/prov-a.github.io
Codebase for PROV-A (Provenance App), accessible at https://prov-a.github.io.

Abstract

This collection contains the research materials accompanying the PhD thesis "Vague, Incomplete, Subjective, and Uncertain Information in Digital History" (Mariani, 2026). The materials support the investigation of VISU (Vague, Incomplete, Subjective, and Uncertain) information in art provenance data and document the computational and infrastructural components developed during the project. The archived materials include:

(1) Art Institute of Chicago (AIC) Provenance Dataset: structured provenance events automatically extracted from museum records;
(2) NLP Models: spaCy-based models trained on manually annotated AIC provenance texts for sentence boundary detection and span categorisation;
(3) PROV-A (Provenance App): a web-based application developed during the PhD for structuring provenance information as Linked Open Data, integrating automated extraction with expert validation;
(4) AIC Case Study Data in PROV-A: a curated subset of AIC provenance records processed and supervised using PROV-A.

Together, these materials document the end-to-end workflow proposed in the dissertation, from automated extraction and epistemically aware modelling to human-in-the-loop validation and Linked Open Data publication.

Resource type

Dataset
Software

Kinds of Data

Databases
Programs and Applications
Models / Modellings
Annotations

Methods

Modeling
Programming / Script-based data collection

Thematic classification

Provenance research

Keywords

Provenance; Arts; Natural Language Processing; Artificial Intelligence; Linked Open Data; Web Application; Art History

Notes

The "AIC Provenance Dataset" and the "NLP Models" are published on Zenodo. "PROV-A" (Provenance App) is available at prov-a.github.io; its source code, like the "AIC Case Study Data in PROV-A", is published on GitHub.

More information

Time Period of the Creation of the Dataset

2020 - 2025