Master Thesis
First publication
DOI: 10.48548/pubdata-2789

Text-to-SPARQL Generation with Reinforcement Learning: A GRPO-based Approach on DBLP

Chronological data

Date of first publication: 2026-01-13
Date of publication in PubData: 2026-01-13
Date of thesis submission: 2025-12-10
Date of defense: 2025-12-16

Language of the resource

English

Abstract

Knowledge graph question answering seeks to translate natural language questions into executable queries over structured knowledge graphs, but existing approaches often rely on large models or full supervision in the form of gold query annotations. This study examines whether reinforcement learning with outcome-based rewards can train a small instruction-tuned language model to perform zero-shot Text-to-SPARQL generation in the scholarly domain. Group-Relative Policy Optimization (GRPO) is applied to the Qwen3-1.7B model on DBLP-QuAD, using prompts that combine natural language questions with symbolic hints about entities and relations. Training relies on execution feedback, structural constraints, and answer-level rewards, with an additional variant that incorporates gold-query-based shaping. The resulting models are compared to the unmodified zero-shot baseline and to a supervised DoRA-finetuned baseline using exact-match accuracy, execution accuracy, answer-set F1, category-wise scores, temporal accuracy, and generalization to held-out templates. The results show that GRPO substantially improves performance over the zero-shot baseline across most metrics and exhibits competitive generalization behavior, while supervised DoRA finetuning achieves higher overall accuracy at the same model scale. Ablation analyses indicate that execution-based rewards account for most gains, with additional shaping terms yielding limited effects. Overall, the findings indicate that GRPO can meaningfully improve Text-to-SPARQL performance without access to gold queries, while supervised finetuning remains advantageous when such supervision is available.
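
A minimal sketch of the kind of composite outcome reward described above is given below, in Python. It is illustrative only: the component weights, the penalty values, and the execute_sparql helper are hypothetical placeholders and are not taken from the thesis.

def answer_set_f1(predicted, gold):
    """F1 overlap between predicted and gold answer sets from query execution."""
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 1.0 if predicted == gold else 0.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def outcome_reward(query, gold_answers, execute_sparql):
    """Composite outcome-based reward: structural validity, executability,
    and answer-level F1. Weights are illustrative, not those of the thesis."""
    # Structural constraint: the generation must at least resemble a SPARQL query.
    if not any(kw in query.upper() for kw in ("SELECT", "ASK")):
        return -1.0
    reward = 0.1  # small bonus for satisfying the structural constraint
    try:
        answers = execute_sparql(query)  # hypothetical call to a SPARQL endpoint
    except Exception:
        return reward - 0.5              # penalty for queries that fail to execute
    reward += 0.2                        # execution feedback: the query ran successfully
    reward += answer_set_f1(answers, gold_answers)  # answer-level reward
    return reward

In GRPO, a scalar reward of this kind would be computed for each sampled query completion in a group, and advantages would be obtained by normalizing each reward against the group's mean, so no learned value function is required.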

Keywords

Text-to-SPARQL; Reinforcement Learning (RL); Natural Language Processing (NLP); Knowledge Graph Question Answering (KGQA)

Grantor

Leuphana University Lüneburg

Study programme

Management & Data Science

More information

DDC

006.35

Creation Context

Study