Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.14123/1735
Original Title | Supplementary Material for the Paper "Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions" |
Handle | 20.500.14123/1735 |
Kinds of Data | Statistical Evaluations / Tables; Programs and Applications; Survey Instruments / Measuring Instruments; Context Materials / Supporting Information |
Resource Type | Dataset |
Creator | Zantvoort, Kirsten; Nacke, Barbara; Görlich, Dennis; Hornstein, Silvan; Jacobi, Corinna; Funk, Burkhardt |
Description of the Dataset | To provide insights into minimum necessary dataset sizes, the researchers explore domain-specific learning curves for dropout prediction in digital interventions, based on 3654 users from a single study. Prediction performance is analyzed as a function of dataset size (N = 100–3654), feature groups (F = 2–129), and algorithm choice (from Naive Bayes to neural networks). The results substantiate the concern that small datasets (N ≤ 300) overestimate predictive power. For uninformative feature groups, in-sample prediction performance was negatively correlated with dataset size. Sophisticated models overfitted on small datasets but maximized holdout test results on larger datasets. While N = 500 mitigated overfitting, performance did not converge until N = 750–1500. Consequently, the researchers propose minimum dataset sizes of N = 500–1000. |
Methods | Summary; Aggregation; Description |
Keywords | Machine Learning; Data Science; Prediction; Algorithm; Health Data; Digital Health; Mental Health; Psychiatric Disorder; Intervention; Therapeutics |
Thematic Classification | Data Science |
Notes | The supplementary material is available for download. Please visit the article linked below to gain access. You will find the file in the section "Supplementary information". |
Published by | Medien- und Informationszentrum, Leuphana Universität Lüneburg |
Superordinate Data Collection | |
Related Resources |
Items in PubData are protected by copyright, with all rights reserved, unless otherwise indicated.