Please use this identifier to cite or link to this item:
https://hdl.handle.net/20.500.14123/1735
Original Title | Supplementary Material for the Paper "Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions" |
Handle | 20.500.14123/1735 |
Kinds of Data | Statistical Evaluations / Tables; Programs and Applications; Survey Instruments / Measuring Instruments; Context Materials / Supporting information |
Resource Type | Dataset |
Creator | Zantvoort, Kirsten 0000-0001-9876-054X (Institut für Wirtschaftsinformatik (IIS), Leuphana Universität Lüneburg 02w2y2t16); Nacke, Barbara 0000-0002-8976-8440 (TU Dresden 042aqky30); Görlich, Dennis 0000-0002-2574-9419 (Universität Münster 00pd74e08); Hornstein, Silvan 0000-0002-0398-7096 (Humboldt-Universität zu Berlin 01hcx6992); Jacobi, Corinna 0000-0002-0982-0596 (TU Dresden 042aqky30); Funk, Burkhardt 0000-0001-5855-2666 (Institut für Wirtschaftsinformatik (IIS), Leuphana Universität Lüneburg 02w2y2t16) |
Description of the Dataset | To provide insights on minimal necessary data set sizes, the researchers explore domain-specific learning curves for digital intervention dropout predictions based on 3654 users from a single study. Prediction performance is analyzed based on dataset size (N = 100–3654), feature groups (F = 2–129), and algorithm choice (from Naive Bayes to Neural Networks). The results substantiate the concern that small datasets (N ≤ 300) overestimate predictive power. For uninformative feature groups, in-sample prediction performance was negatively correlated with dataset size. Sophisticated models overfitted in small datasets but maximized holdout test results in larger datasets. While N = 500 mitigated overfitting, performance did not converge until N = 750–1500. Consequently, the researchers propose minimum dataset sizes of N = 500–1000. |
Methods | Summary Aggregation Description |
Keywords | Machine Learning; Data Science; Prediction; Algorithm; Health Data; Digital Health; Mental Health; Psychiatric Disorder; Intervention; Therapeutics |
Thematic Classification | Data Science |
Notes | The supplementary material is available for download. Please visit the article linked below to gain access. You will find the file in the chapter "Supplementary information". |
Published by | Medien- und Informationszentrum, Leuphana Universität Lüneburg |
Related Resources |
Items in PubData are protected by copyright, with all rights reserved, unless otherwise indicated.