Supplementary Material for the Paper "Dataset size versus homogeneity: A machine learning study on pooling intervention data in e-mental health dropout predictions"

Zantvoort, Kirsten; Hentati Isacsson, Nils; Funk, Burkhardt; Kaldo, Viktor

Dataset Handle: 20.500.14123/1739

Archiving without access

No downloads available

Chronological data

Date of availability in catalog: 2025-01-22

Available from / since: 2025-01-22

Language of the resource

English

Related external resources

Supplement to

DOI: 10.1177/20552076241248920
Zantvoort, K., Hentati Isacsson, N., Funk, B., Kaldo, V. (2024). Dataset size versus homogeneity: A machine learning study on pooling intervention data in e-mental health dropout predictions. Digital Health, 10.

Related PubData resources

Supplement to

Resource

Dissertation

Machine Learning Dropout Predictions for Personalizing Digital Mental Health Interventions

Zantvoort, Kirsten

2025 | DOI: 10.48548/pubdata-1596

Author

Zantvoort, Kirsten

Hentati Isacsson, Nils

Funk, Burkhardt

Kaldo, Viktor

Abstract

This study proposes a way of increasing dataset sizes for machine learning tasks in Internet-based Cognitive Behavioral Therapy by pooling interventions. To this end, it (1) examines similarities in user behavior and symptom data among online interventions for patients with depression, social anxiety, and panic disorder and (2) explores whether these similarities suffice to allow pooling the data, which yields more training data for predicting intervention dropout. A total of 6418 routine care patients from the Internet Psychiatry in Stockholm are analyzed using (1) clustering and (2) dropout prediction models. For the latter, prediction models trained on each individual intervention's data are compared to those trained on all three interventions pooled into one dataset. To investigate whether results vary with dataset size, the prediction is repeated using small and medium dataset sizes. The clustering analysis identified three distinct groups that are almost equally spread across interventions and are instead characterized by different activity levels. In eight out of nine settings investigated, pooling the data improves prediction results compared to models trained on a single intervention dataset. It is further confirmed that models trained on small datasets are more likely to overestimate prediction results.
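
The comparison described in the abstract (one model per intervention versus one model trained on the pooled data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the data are synthetic, the single `activity` feature and the dropout rule are hypothetical, and a simple threshold classifier stands in for the study's prediction models, which this record does not specify. The simulation also assumes that all three interventions share the same relationship between activity and dropout, which is the condition under which pooling would be expected to help.

```python
import random

def simulate(n, rng):
    """Return n synthetic (activity, dropped_out) pairs; purely illustrative."""
    rows = []
    for _ in range(n):
        activity = rng.gauss(0.0, 1.0)
        # Hypothetical rule: low activity goes with a higher dropout probability.
        p_dropout = 0.8 if activity < 0 else 0.2
        rows.append((activity, 1 if rng.random() < p_dropout else 0))
    return rows

def fit_stump(rows):
    """Pick the activity threshold with the lowest training error.

    The classifier predicts dropout (1) whenever activity < threshold.
    Candidate thresholds are the observed activity values.
    """
    best_t, best_err = None, None
    for t, _ in rows:
        err = sum((1 if a < t else 0) != y for a, y in rows)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def accuracy(threshold, rows):
    """Share of rows for which the threshold rule predicts the label."""
    return sum((1 if a < threshold else 0) == y for a, y in rows) / len(rows)

rng = random.Random(0)
names = ["depression", "social_anxiety", "panic"]  # hypothetical labels

# Small training sets per intervention, larger held-out test sets.
train = {name: simulate(40, rng) for name in names}
test = {name: simulate(500, rng) for name in names}

# Pooling: concatenate the training data of all three interventions.
pooled_train = [row for name in names for row in train[name]]
pooled_model = fit_stump(pooled_train)

results = {}
for name in names:
    single_model = fit_stump(train[name])
    # Both models are evaluated on the same intervention-specific test set.
    results[name] = {
        "single": accuracy(single_model, test[name]),
        "pooled": accuracy(pooled_model, test[name]),
    }
    print(name, results[name])
```

Evaluating both models on the same held-out data of the target intervention keeps the comparison fair: the pooled model only gains from the additional training rows, not from an easier test set.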

Resource type

Dataset

Kinds of Data

Statistical Evaluations / Tables
Context Materials / Supporting information

Methods

Aggregation
Analysis of digital content
Description

Thematic classification

Data Science

Keywords

Machine Learning; Data Science; Prediction; Algorithm; Health Data; Digital Health; Mental Health; Psychiatric Disorder; Intervention; Therapeutics

Notes

The supplementary material is available for download from the publisher. To access it, visit the article linked under "Related external resources" (DOI: 10.1177/20552076241248920). The file is in the article's "Supplementary Material" section.
