Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.14123/1735
Full metadata record
Original Title: Supplementary Material for the Paper "Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions"
Handle: 20.500.14123/1735
Kinds of Data: Statistical Evaluations / Tables; Programs and Applications; Survey Instruments / Measuring Instruments; Context Materials / Supporting Information
Resource Type: Dataset
Creators:
Zantvoort, Kirsten (ORCID 0000-0001-9876-054X), Institut für Wirtschaftsinformatik (IIS), Leuphana Universität Lüneburg (ROR 02w2y2t16)
Nacke, Barbara (ORCID 0000-0002-8976-8440), TU Dresden (ROR 042aqky30)
Görlich, Dennis (ORCID 0000-0002-2574-9419), Universität Münster (ROR 00pd74e08)
Hornstein, Silvan (ORCID 0000-0002-0398-7096), Humboldt-Universität zu Berlin (ROR 01hcx6992)
Jacobi, Corinna (ORCID 0000-0002-0982-0596), TU Dresden (ROR 042aqky30)
Funk, Burkhardt (ORCID 0000-0001-5855-2666), Institut für Wirtschaftsinformatik (IIS), Leuphana Universität Lüneburg (ROR 02w2y2t16)
Description of the Dataset: To provide insights into minimal necessary dataset sizes, the researchers explore domain-specific learning curves for dropout prediction in digital interventions, based on 3654 users from a single study. Prediction performance is analyzed as a function of dataset size (N = 100–3654), feature groups (F = 2–129), and algorithm choice (from Naive Bayes to Neural Networks). The results substantiate the concern that small datasets (N ≤ 300) overestimate predictive power. For uninformative feature groups, in-sample prediction performance was negatively correlated with dataset size. Sophisticated models overfitted in small datasets but achieved the best holdout test performance in larger datasets. While N = 500 mitigated overfitting, performance did not converge until N = 750–1500. Consequently, the researchers propose minimum dataset sizes of N = 500–1000.
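The observation that in-sample performance on uninformative features falls as N grows can be illustrated with a toy simulation. This is a hypothetical sketch, not the researchers' pipeline: it assumes purely random binary features and labels (so no feature carries any signal) and a deliberately simple model that selects the single feature with the best training accuracy. The function names and parameters (`learning_curve`, `n_features`, `repeats`, `holdout_n`) are illustrative only. Because the selected rule is fitted to noise, its in-sample accuracy is inflated at small N and drifts toward 0.5 as N grows, while its holdout accuracy stays near 0.5 throughout.

```python
import random
from statistics import mean


def make_uninformative_data(n, n_features, rng):
    """Draw random binary features and labels independently, so no feature carries signal."""
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def fit_best_single_feature(X, y):
    """Pick the feature j and polarity maximising training accuracy of y_hat = x_j (or 1 - x_j).

    Returns (feature index, flip polarity?, training accuracy).
    """
    n = len(y)
    best = (0, False, -1.0)
    for j in range(len(X[0])):
        agree = sum(1 for row, label in zip(X, y) if row[j] == label) / n
        for flip, acc in ((False, agree), (True, 1.0 - agree)):
            if acc > best[2]:
                best = (j, flip, acc)
    return best


def accuracy(rule, X, y):
    """Accuracy of a fitted single-feature rule on data (X, y)."""
    j, flip, _ = rule
    preds = [(1 - row[j]) if flip else row[j] for row in X]
    return mean(1.0 if p == t else 0.0 for p, t in zip(preds, y))


def learning_curve(sizes, n_features=20, repeats=20, holdout_n=2000, seed=0):
    """Map each training size N to (mean in-sample accuracy, mean holdout accuracy)."""
    rng = random.Random(seed)
    # One fixed holdout set, drawn from the same (uninformative) distribution.
    X_test, y_test = make_uninformative_data(holdout_n, n_features, rng)
    curve = {}
    for n in sizes:
        train_scores, test_scores = [], []
        for _ in range(repeats):
            X, y = make_uninformative_data(n, n_features, rng)
            rule = fit_best_single_feature(X, y)
            train_scores.append(rule[2])
            test_scores.append(accuracy(rule, X_test, y_test))
        curve[n] = (mean(train_scores), mean(test_scores))
    return curve


if __name__ == "__main__":
    for n, (train_acc, test_acc) in learning_curve([20, 100, 500, 1000]).items():
        print(f"N={n:5d}  in-sample={train_acc:.3f}  holdout={test_acc:.3f}")
```

The exact numbers depend on the seed, but the gap between in-sample and holdout accuracy shrinks as N increases, which is the qualitative pattern the description reports for uninformative feature groups.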
Methods: Summary; Aggregation; Description
Keywords: Machine Learning; Data Science; Prediction; Algorithm; Health Data; Digital Health; Mental Health; Psychiatric Disorder; Intervention; Therapeutics
Thematic Classification: Data Science
Language of the Resource: English
Notes: The supplementary material is available for download via the article linked below. The file is in the article's "Supplementary information" section.
Date of Availability: 2025-01-22T08:46:40Z
Date of Issue: 2025-01-22
Archiving Facility: Medien- und Informationszentrum (Leuphana Universität Lüneburg, ROR 02w2y2t16)
Published by: Medien- und Informationszentrum, Leuphana Universität Lüneburg
Related Resources
Superordinate Data Collection: Supplementary Material PhD Kirsten Zantvoort
Participating Researchers: Zantvoort, Kirsten (ORCID 0000-0001-9876-054X), Institut für Wirtschaftsinformatik (IIS), Leuphana Universität Lüneburg (ROR 02w2y2t16)

Items in PubData are protected by copyright, with all rights reserved, unless otherwise indicated.