Text-to-SPARQL Generation with Reinforcement Learning: A GRPO-based Approach on DBLP
Chronological data
Date of first publication 2026-01-13
Date of publication in PubData 2026-01-13
Date of thesis submission 2025-12-10
Date of defense 2025-12-16
Language of the resource
English
Editor
Author
Advisor
Case provider
Other contributors
Abstract
Knowledge graph question answering seeks to translate natural language questions into executable queries over structured knowledge graphs, but existing approaches often rely on large models or full supervision in the form of gold query annotations. This study examines whether reinforcement learning with outcome-based rewards can train a small instruction-tuned language model to perform zero-shot Text-to-SPARQL generation in the scholarly domain. Group-Relative Policy Optimization (GRPO) is applied to the Qwen3-1.7B model on DBLP-QuAD, using prompts that combine natural language questions with symbolic hints about entities and relations. Training relies on execution feedback, structural constraints, and answer-level rewards, with an additional variant that incorporates gold-query-based shaping. The resulting models are compared to the unmodified zero-shot baseline and to a supervised DoRA-finetuned baseline using exact-match accuracy, execution accuracy, answer-set F1, category-wise scores, temporal accuracy, and generalization to held-out templates. The results show that GRPO substantially improves performance over the zero-shot baseline across most metrics and exhibits competitive generalization behavior, while supervised DoRA finetuning achieves higher overall accuracy under the same model scale. Ablation analyses indicate that execution-based rewards account for most gains, with additional shaping terms yielding limited effects. Overall, the findings indicate that GRPO can meaningfully improve Text-to-SPARQL performance without access to gold queries, while supervised finetuning remains advantageous when such supervision is available.
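As a rough illustration of two quantities named in the abstract, the sketch below computes an answer-set F1 score and the group-relative advantages that GRPO derives from a group of sampled completions. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the convention that two empty answer sets score 1.0, the use of the population standard deviation, and the `eps` constant are all illustrative choices that the abstract does not specify.

```python
from statistics import mean, pstdev


def answer_set_f1(predicted, gold):
    """F1 overlap between a predicted and a gold answer set (hypothetical helper)."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        # Assumption: two empty result sets count as a perfect match.
        return 1.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group.

    Assumption: population standard deviation plus a small eps for stability.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical queries sampled for one question, scored by answer-set F1.
    gold_answers = {"paper1", "paper2"}
    sampled_answers = [{"paper1", "paper2"}, {"paper1"}, set(), {"paper3"}]
    rewards = [answer_set_f1(answers, gold_answers) for answers in sampled_answers]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because each advantage is measured relative to the other completions sampled for the same question, a query only receives a positive learning signal when it scores above its group's average, which is why no gold query is needed for this part of the signal.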
Keywords
Text-to-SPARQL; Reinforcement Learning (RL); Natural Language Processing (NLP); Knowledge Graph Question Answering (KGQA)
Grantor
Leuphana University Lüneburg
Study programme
Management & Data Science
