Dataset Handle: 20.500.14123/1735
Supplementary Material for the Paper "Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions"
Archiving without access
No downloads available
Chronological data
Date of availability in catalog 2025-01-22
Available from / since 2025-01-22
Language of the resource
English
Related PubData resources
Publisher
Other contributors
Abstract
To provide insights on minimal necessary data set sizes, the researchers explore domain-specific learning curves for digital intervention dropout predictions based on 3654 users from a single study. Prediction performance is analyzed based on dataset size (N = 100–3654), feature groups (F = 2–129), and algorithm choice (from Naive Bayes to Neural Networks). The results substantiate the concern that small datasets (N ≤ 300) overestimate predictive power. For uninformative feature groups, in-sample prediction performance was negatively correlated with dataset size. Sophisticated models overfitted in small datasets but maximized holdout test results in larger datasets. While N = 500 mitigated overfitting, performance did not converge until N = 750–1500. Consequently, the researchers propose minimum dataset sizes of N = 500–1000.
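The learning-curve idea in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use the study data. It assumes synthetic data with purely uninformative features and coin-flip labels, and it uses a nearest-centroid classifier as a simple stand-in model. All function names, the feature count (20), the holdout size, and the number of repeats are hypothetical choices for illustration only. The sketch trains on growing samples and compares in-sample accuracy with accuracy on fresh holdout data.

```python
import random


def make_noise_data(n, d, rng):
    """Features are pure noise and labels are coin flips: nothing can be learned."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def fit_centroids(X, y):
    """Stand-in model: store the mean feature vector of each class."""
    d = len(X[0])
    sums = {0: [0.0] * d, 1: [0.0] * d}
    counts = {0: 0, 1: 0}
    for row, label in zip(X, y):
        counts[label] += 1
        for j, value in enumerate(row):
            sums[label][j] += value
    # max(..., 1) guards against an empty class in very small samples.
    return {c: [v / max(counts[c], 1) for v in sums[c]] for c in (0, 1)}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return 0 if dist2(0) <= dist2(1) else 1


def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)


def learning_curve(sizes, d=20, n_holdout=500, repeats=5, seed=0):
    """Return (N, mean in-sample accuracy, mean holdout accuracy) per training size."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        in_sample, holdout = 0.0, 0.0
        for _ in range(repeats):
            X_train, y_train = make_noise_data(n, d, rng)
            X_test, y_test = make_noise_data(n_holdout, d, rng)
            model = fit_centroids(X_train, y_train)
            in_sample += accuracy(model, X_train, y_train)
            holdout += accuracy(model, X_test, y_test)
        curve.append((n, in_sample / repeats, holdout / repeats))
    return curve


# Training sizes loosely span the range named in the abstract (N = 100 to 3654).
curve = learning_curve([100, 300, 500, 1000, 3654])
for n, acc_in, acc_out in curve:
    print(f"N={n:5d}  in-sample={acc_in:.3f}  holdout={acc_out:.3f}")
```

Under these assumptions, in-sample accuracy on small samples sits above chance although the features carry no signal, and it shrinks toward chance as N grows, while holdout accuracy stays near 0.5 throughout. This mirrors the abstract's point that small datasets can overstate predictive power.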
Resource type
Dataset
Kinds of Data
Statistical Evaluations / Tables
Context Materials / Supporting information
Survey Instruments / Measuring Instruments
Programs and Applications
Methods
Summary
Description
Aggregation
Thematic classification
Data Science
Keywords
Machine Learning; Data Science; Prediction; Algorithm; Health Data; Digital Health; Mental Health; Psychiatric Disorder; Intervention; Therapeutics
Notes
The supplementary material is available for download. Please visit the article linked below to gain access. You will find the file in the chapter "Supplementary information".