Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14123/1735
Original Title: Supplementary Material for the Paper "Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions"
Handle: 20.500.14123/1735
Kinds of Data: Statistical Evaluations / Tables
Programs and Applications
Survey Instruments / Measuring Instruments
Context Materials / Supporting information
Resource Type: Dataset
Creator: Zantvoort, Kirsten  0000-0001-9876-054X (Institut für Wirtschaftsinformatik (IIS), Leuphana Universität Lüneburg  02w2y2t16)
Nacke, Barbara  0000-0002-8976-8440 (TU Dresden  042aqky30)
Görlich, Dennis  0000-0002-2574-9419 (Universität Münster  00pd74e08)
Hornstein, Silvan  0000-0002-0398-7096 (Humboldt-Universität zu Berlin  01hcx6992)
Jacobi, Corinna  0000-0002-0982-0596 (TU Dresden  042aqky30)
Funk, Burkhardt  0000-0001-5855-2666 (Institut für Wirtschaftsinformatik (IIS), Leuphana Universität Lüneburg  02w2y2t16)
Description of the Dataset: To provide insights into minimum necessary dataset sizes, the researchers explore domain-specific learning curves for digital intervention dropout predictions based on 3654 users from a single study. Prediction performance is analyzed as a function of dataset size (N = 100–3654), feature group (F = 2–129), and algorithm choice (from Naive Bayes to Neural Networks). The results substantiate the concern that small datasets (N ≤ 300) overestimate predictive power. For uninformative feature groups, in-sample prediction performance was negatively correlated with dataset size. Sophisticated models overfitted in small datasets but achieved the best holdout test results in larger datasets. While N = 500 mitigated overfitting, performance did not converge until N = 750–1500. Consequently, the researchers propose minimum dataset sizes of N = 500–1000.
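The learning-curve analysis summarized above can be illustrated with a minimal sketch. It assumes a binary outcome (e.g., dropout vs. no dropout), a fixed holdout test set, random subsamples of size N drawn from the remaining pool, 5-fold cross-validation within each subsample as the in-sample estimate, and a toy nearest-centroid classifier standing in for the real algorithms. The functions `make_toy_data`, `fit_nearest_centroid`, `cv_accuracy` and `learning_curve` are hypothetical and the data are synthetic; this is not the authors' pipeline.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# (feature vector, binary label such as dropout yes/no); hypothetical format
Example = Tuple[List[float], int]
Predictor = Callable[[List[float]], int]


def make_toy_data(n: int, n_features: int, rng: random.Random) -> List[Example]:
    """Hypothetical synthetic data: only feature 0 carries signal."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(2.0 * y if j == 0 else 0.0, 1.0) for j in range(n_features)]
        data.append((x, y))
    return data


def fit_nearest_centroid(train: Sequence[Example]) -> Predictor:
    """Toy classifier: predict the label whose class mean is closest."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        if rows:
            centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x: List[float]) -> int:
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def accuracy(predict: Predictor, data: Sequence[Example]) -> float:
    return sum(predict(x) == y for x, y in data) / len(data)


def cv_accuracy(sample: List[Example], fit, k: int = 5) -> float:
    """In-sample estimate: k-fold cross-validation inside one subsample."""
    folds = [sample[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        scores.append(accuracy(fit(train), folds[i]))
    return mean(scores)


def learning_curve(pool, holdout, sizes, fit, repeats=5, seed=0):
    """For each size N: mean in-sample (CV) and holdout accuracy over repeats."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        in_sample, out_sample = [], []
        for _ in range(repeats):
            sample = rng.sample(pool, n)  # random subsample of size N
            in_sample.append(cv_accuracy(sample, fit))
            out_sample.append(accuracy(fit(sample), holdout))
        curve.append((n, mean(in_sample), mean(out_sample)))
    return curve


if __name__ == "__main__":
    rng = random.Random(42)
    pool = make_toy_data(1000, 3, rng)
    holdout = make_toy_data(500, 3, rng)
    for n, cv, test in learning_curve(pool, holdout, [50, 100, 400],
                                      fit_nearest_centroid, repeats=3):
        print(f"N={n}: in-sample={cv:.3f} holdout={test:.3f}")
```

Comparing the in-sample and holdout columns across N is what reveals overestimation at small sizes; with real data one would swap in the actual models and feature groups.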
Keywords: Machine Learning; Data Science; Prediction; Algorithm; Health Data; Digital Health; Mental Health; Psychiatric Disorder; Intervention; Therapeutics
Thematic Classification: Data Science
Notes: The supplementary material is available for download via the article linked below. The file is located in the article's "Supplementary information" section.
Published by: Medien- und Informationszentrum, Leuphana Universität Lüneburg
Superordinate Data Collection: Supplementary Material PhD Kirsten Zantvoort
Related Resources: Relations of the dataset

Items in PubData are protected by copyright, with all rights reserved, unless otherwise indicated.
