Dataset size versus homogeneity: A machine learning study on pooling intervention data in e-mental health dropout predictions

Zantvoort, Kirsten; Hentati Isacsson, Nils; Funk, Burkhardt; Kaldo, Viktor

doi:10.48548/pubdata-1422

Journal Article, Parallel publication, Published version

Downloads

Zantvoort_dataset_size_versus_homogeneity.pdf (818 KB)

Open Access

Chronological data

Date of first publication: 2024-05-15

Date of publication in PubData: 2024-11-06

Language of the resource

English

Related external resources

Variant form of

DOI: 10.1177/20552076241248920
Zantvoort, K., Hentati Isacsson, N., Funk, B., Kaldo, V. (2024). Dataset size versus homogeneity: A machine learning study on pooling intervention data in e-mental health dropout predictions. Digital Health, 10.

Published in

ISSN: 2055-2076
Digital Health

Related PubData resources

Supplemented by

Resource

Dataset

Supplementary Material for the Paper "Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions"

Zantvoort, Kirsten; Nacke, Barbara; Görlich, Dennis; Hornstein, Silvan; Jacobi, Corinna; Funk, Burkhardt

2025

Author

Zantvoort, Kirsten

Hentati Isacsson, Nils

Funk, Burkhardt

Kaldo, Viktor

Abstract

Objective: This study proposes a way of increasing dataset sizes for machine learning tasks in Internet-based Cognitive Behavioral Therapy by pooling interventions. To this end, it (1) examines similarities in user behavior and symptom data among online interventions for patients with depression, social anxiety, and panic disorder and (2) explores whether these similarities suffice to allow pooling the data, yielding more training data when predicting intervention dropout.

Methods: A total of 6418 routine care patients from the Internet Psychiatry in Stockholm are analyzed using (1) clustering and (2) dropout prediction models. For the latter, prediction models trained on each individual intervention's data are compared to those trained on all three interventions pooled into one dataset. To investigate whether results vary with dataset size, the prediction is repeated using small and medium dataset sizes.

Results: The clustering analysis identified three distinct groups that are almost equally spread across interventions and are instead characterized by different activity levels. In eight out of nine settings investigated, pooling the data improves prediction results compared to models trained on a single intervention dataset. It is further confirmed that models trained on small datasets are more likely to overestimate prediction results.

Conclusion: The study reveals similar patterns of online activity and intervention dropout among patients with depression, social anxiety, and panic disorder. As such, this work offers pooling different interventions’ data as a possible approach to counter the problem of small dataset sizes in psychological research.
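The single-versus-pooled comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, the two features (weekly logins, symptom score) are hypothetical, the dataset sizes are arbitrary, and a simple nearest-centroid classifier stands in for whatever prediction models the study actually used.

```python
import random


def make_synthetic(n, shift, seed):
    """Synthetic stand-in for one intervention: ((logins, symptoms), dropout) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        dropout = rng.random() < 0.3
        # Hypothetical features: weekly logins and a symptom score.
        logins = rng.gauss(2.0 if dropout else 5.0, 1.5)
        symptoms = rng.gauss(14.0 + shift, 4.0)
        rows.append(((logins, symptoms), int(dropout)))
    return rows


def fit_centroids(rows):
    """Nearest-centroid 'model': mean feature vector per class present in rows."""
    centroids = {}
    for label in (0, 1):
        feats = [x for x, y in rows if y == label]
        if feats:  # skip a class that does not occur in a very small sample
            centroids[label] = tuple(sum(col) / len(col) for col in zip(*feats))
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def accuracy(centroids, rows):
    return sum(predict(centroids, x) == y for x, y in rows) / len(rows)


# Three synthetic interventions (names mirror the abstract, data are made up).
interventions = {
    "depression": make_synthetic(300, 2.0, seed=1),
    "social_anxiety": make_synthetic(300, 0.0, seed=2),
    "panic": make_synthetic(300, -2.0, seed=3),
}


def compare(target, n_train):
    """Train on the target intervention alone vs. pooled with the other two."""
    rows = interventions[target]
    train, test = rows[:n_train], rows[n_train:]
    others = [r for name, data in interventions.items() if name != target for r in data]
    pooled = train + others
    return {
        "single": accuracy(fit_centroids(train), test),
        "pooled": accuracy(fit_centroids(pooled), test),
        "n_single": len(train),
        "n_pooled": len(pooled),
    }


for name in interventions:
    print(name, compare(name, n_train=30))
```

Both models are scored on the same held-out patients from the target intervention, so any difference reflects only the training data. Whether pooling helps on real data depends on how similar the interventions are, which is the question the clustering analysis addresses.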

Keywords

Mental Health; Digital Health; Machine Learning

Faculty / department

Fakultät Management und Technologie

Notes

This publication was funded by the German Research Foundation (DFG).

More information

Creation Context

Research

Collections

Literaturpublikationen