Please use this identifier to cite or link to this item:
https://doi.org/10.48548/pubdata-1422
Resource type | Journal Article |
Title(s) | Dataset size versus homogeneity: A machine learning study on pooling intervention data in e-mental health dropout predictions |
DOI | 10.48548/pubdata-1422 |
Handle | 20.500.14123/1491 |
Creator | Zantvoort, Kirsten (ORCID: 0000-0001-9876-054X); Hentati Isacsson, Nils (ORCID: 0000-0002-5749-5310); Funk, Burkhardt (ORCID: 0000-0001-5855-2666); Kaldo, Viktor (ORCID: 0000-0002-6443-5279) |
Abstract | Objective: This study proposes a way of increasing dataset sizes for machine learning tasks in Internet-based Cognitive Behavioral Therapy through pooling interventions. To this end, it (1) examines similarities in user behavior and symptom data among online interventions for patients with depression, social anxiety, and panic disorder and (2) explores whether these similarities suffice to allow for pooling the data, resulting in more training data when predicting intervention dropout. Methods: A total of 6418 routine care patients from the Internet Psychiatry in Stockholm are analyzed using (1) clustering and (2) dropout prediction models. For the latter, prediction models trained on each individual intervention's data are compared to those trained on all three interventions pooled into one dataset. To investigate whether results vary with dataset size, the prediction is repeated using small and medium dataset sizes. Results: The clustering analysis identified three distinct groups that are almost equally spread across interventions and are instead characterized by different activity levels. In eight out of nine settings investigated, pooling the data improves prediction results compared to models trained on a single intervention dataset. It is further confirmed that models trained on small datasets are more likely to overestimate prediction results. Conclusion: The study reveals similar patterns of patients with depression, social anxiety, and panic disorder regarding online activity and intervention dropout. As such, this work offers pooling different interventions’ data as a possible approach to counter the problem of small dataset sizes in psychological research. |
Language | English |
Keywords | Mental Health; Digital Health; Machine Learning |
Year of publication in PubData | 2024 |
Publishing type | Parallel publication |
Publication version | Published version |
Date issued | 2024-05-15 |
Creation context | Research |
Notes | This publication was funded by the German Research Foundation (DFG). |
Published by | Medien- und Informationszentrum, Leuphana Universität Lüneburg |
Related resources |
Information regarding first publication
Field | Value |
---|---|
Resource type | Journal |
Title of the resource type | Digital Health |
Identifier | DOI: 10.1177/20552076241248920 |
Publication year | 2024 |
Volume | 10 |
Publisher | SAGE |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Zantvoort_dataset_size_versus_homogeneity.pdf (License: open-access) | | 817.65 kB | Adobe PDF |
Items in PubData are protected by copyright, with all rights reserved, unless otherwise indicated.