Master Thesis
First publication
DOI: 10.48548/pubdata-1887

Text2SHACL

LLM-Driven Generation of Validation Graphs for Automatic Assessment of Social Benefit Eligibility

Chronological data

Date of first publication: 2025-07-15
Date of publication in PubData: 2025-07-15
Date of thesis submission: 2025-06-10
Date of defense: 2025-07-03

Language of the resource

English

Abstract

To increase social benefit take-up, applications like FörderFunke automatically assess eligibility based on user data and provide personalized information about relevant benefits. FörderFunke encodes eligibility requirements in the Shapes Constraint Language (SHACL), which supports the German government's efforts to promote open standards in public IT and makes the solution more maintainable, modular, and interoperable. However, like other SHACL-driven approaches to automatic compliance checking, FörderFunke faces the challenge that converting natural language rules to SHACL requires significant manual effort and slows down development. Against this background, this work formally defines the Text2SHACL task and extends existing approaches in two key ways. First, we establish a principled foundation for using SHACL in the previously unexplored domain of social benefit eligibility assessment. Specifically, we introduce a domain-specific annotated dataset and a schema that guides the Text2SHACL task, alongside a qualitative analysis of critical SHACL- and domain-specific formalization challenges. Second, we explore an end-to-end approach to automating Text2SHACL driven by Large Language Models (LLMs), which overcomes limitations of prior work, which struggles with complex constraints and diverse linguistic input. Adopting a prompt engineering methodology, we establish a zero-shot baseline, analyze the reasons for its initially poor overall performance, and demonstrate the positive effects of few-shot and Chain-of-Thought prompting on the syntactic and semantic quality of the generated shapes graphs for a selection of LLMs.

Keywords

SHACL; Natural Language Processing; Social Benefit

Grantor

Leuphana University Lüneburg

Study programme

Management & Data Science

Notes

The thesis was prepared with support from FörderFunke UG for the annotation process. The code base and dataset are available at https://github.com/semantic-systems/text-to-SHACL.git.

More information

DDC

006 :: Special computer methods

Creation Context

Study