The "swiss-building-law-rag-bench" dataset, created by MarcoFurrer, is now available on Hugging Face. It is an evaluation benchmark for Retrieval-Augmented Generation (RAG) systems that focuses on Swiss cantonal building law, and it was developed as part of a bachelor's thesis on optimizing RAG pipelines for German legal texts. The dataset consists of question-answer pairs, each grounded in a specific article-level passage of a legal document. It comes in two subsets: a German subset with 318 entries and a multilingual subset (DE/FR/IT) with 270 entries. The dataset is licensed under CC-BY-4.0, and its coverage of German, French, and Italian makes it useful for multilingual legal AI research.
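Because each question is tied to one article-level passage, a retriever can be scored by checking whether that article appears among its top results. The following is a minimal sketch of such a hit-rate metric, not the benchmark's official evaluation code. The function name `hit_rate_at_k`, the question IDs, and the article IDs are all hypothetical placeholders; the dataset's actual field names and identifiers may differ.

```python
from typing import Dict, List


def hit_rate_at_k(
    gold: Dict[str, str],
    retrieved: Dict[str, List[str]],
    k: int = 5,
) -> float:
    """Fraction of questions whose gold article is among the top-k retrieved articles."""
    if not gold:
        return 0.0
    hits = sum(
        1
        for question_id, article_id in gold.items()
        # A question with no retrieval results counts as a miss.
        if article_id in retrieved.get(question_id, [])[:k]
    )
    return hits / len(gold)


# Hypothetical toy data: one gold article ID per question (assumed format).
gold = {"q1": "law_a:art_12", "q2": "law_b:art_3"}

# Hypothetical retriever output: ranked article IDs per question.
retrieved = {
    "q1": ["law_a:art_12", "law_a:art_13"],
    "q2": ["law_b:art_7", "law_b:art_3"],
}

score_at_1 = hit_rate_at_k(gold, retrieved, k=1)
score_at_2 = hit_rate_at_k(gold, retrieved, k=2)
```

In this toy case only `q1` has its gold article ranked first, so the score rises once the second-ranked result is also considered. Reporting the metric per subset (German versus DE/FR/IT) would show whether a pipeline tuned on German text holds up on French and Italian questions.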
Featured on AI Radar: Swiss Building Law RAG Benchmark Dataset Released on Hugging Face