Context-Aware Distillation Enhances Text2DSL Code Generation

Why it matters

Text2DSL generates domain-specific language (DSL) code from natural-language descriptions, and this update makes that generation more robust. By supplying structured context during distillation, the authors report higher accuracy and reliability, which could reduce manual coding effort and improve the quality of generated DSL code.

What changed

Researchers have extended their previous work on Text2DSL, a system designed for automatically generating domain-specific language (DSL) code from natural language descriptions. The primary innovation lies in replacing prompt-only synthetic generation with a technique called context-aware distillation. In this new method, a teacher large language model, specifically DeepSeek-V4-Flash, operates within a structured context. This context includes a Backus-Naur Form (BNF) grammar, an API specification, and a closed identifier vocabulary.

The output of this distillation process is a corpus that undergoes verification through a two-tier pipeline. This pipeline first validates the Abstract Syntax Tree (AST) using esprima and then checks for runtime acceptance using the production polkitd daemon and the pkcheck client. This enhanced process has scaled the verified PolkitBench corpus from 4,204 to 10,073 natural-language-to-Polkit-rule pairs. The system now reports 100.0% AST validity and a 99.7% runtime pass rate.
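The two-tier idea can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `verify_corpus`, `toy_ast_check`, and `toy_runtime_check` are hypothetical names, and the two toy checkers only stand in for an esprima parse and a polkitd/pkcheck run. The sketch also assumes that a rule reaches the runtime tier only after it passes the syntax tier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class VerificationReport:
    total: int = 0
    ast_valid: int = 0
    runtime_passed: int = 0


def verify_corpus(
    rules: Iterable[str],
    ast_check: Callable[[str], bool],
    runtime_check: Callable[[str], bool],
) -> VerificationReport:
    """Run tier 1 (syntax) on every rule and tier 2 (runtime) only on survivors."""
    report = VerificationReport()
    for rule in rules:
        report.total += 1
        if not ast_check(rule):
            continue  # a rule that does not parse never reaches the runtime tier
        report.ast_valid += 1
        if runtime_check(rule):
            report.runtime_passed += 1
    return report


def toy_ast_check(rule: str) -> bool:
    """Hypothetical stand-in for an esprima parse: checks bracket balance only."""
    closers = {")": "(", "}": "{", "]": "["}
    stack = []
    for ch in rule:
        if ch in "({[":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


def toy_runtime_check(rule: str) -> bool:
    """Hypothetical stand-in for loading the rule into polkitd and querying pkcheck."""
    return "polkit.addRule(" in rule


# Illustrative inputs only; these are not taken from PolkitBench.
sample_rules = [
    "polkit.addRule(function(action, subject) { return polkit.Result.YES; });",
    "polkit.addRule(function(action, subject) { return polkit.Result.NO;",
    "function f() { return 1; }",
]

report = verify_corpus(sample_rules, toy_ast_check, toy_runtime_check)
print(report)
```

In a real setup the two callables would wrap a JavaScript parser and a sandboxed polkitd instance. Passing them in as parameters keeps the counting logic testable without either dependency.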

The study also ran a per-component factorial ablation of the three structured context elements. With each element either included or omitted, this gives 2^3 = 8 conditions (C0-C7), all evaluated on the GigaChat-10B-A1.8B model using the newly generated corpus.
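One way to enumerate the eight conditions is a 3-bit encoding, sketched below. The article only names C5 (BNF + Vocabulary), C6 (API + Vocabulary), and C7 (full context). Treating C0 as the no-context baseline and assigning C1 through C4 by bit position are assumptions chosen to be consistent with those three labels. The helper names and the section placeholders are hypothetical.

```python
# Bit 0 = BNF grammar, bit 1 = API specification, bit 2 = identifier vocabulary.
# This ordering is an assumption; it reproduces the C5, C6, and C7 labels.
COMPONENTS = ("bnf", "api", "vocab")


def condition_components(index: int) -> tuple:
    """Return the context components switched on for condition C<index>."""
    return tuple(
        name for bit, name in enumerate(COMPONENTS) if (index >> bit) & 1
    )


def build_context(index: int, sections: dict) -> str:
    """Concatenate only the sections that condition C<index> enables."""
    parts = [
        f"## {name.upper()}\n{sections[name]}"
        for name in condition_components(index)
    ]
    return "\n\n".join(parts)


CONDITIONS = {f"C{i}": condition_components(i) for i in range(8)}

# Placeholder section bodies; the real grammar, spec, and vocabulary are not shown here.
sections = {
    "bnf": "<BNF grammar text>",
    "api": "<API specification text>",
    "vocab": "<closed identifier list>",
}

for label, parts in CONDITIONS.items():
    print(label, "+".join(parts) or "(no context)")
```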

Three key findings emerged from this ablation study:

1. **Contextual Robustness:** The new, more challenging corpus significantly degraded the performance of the baseline mode: Syntax Valid dropped from 97.6% to 58.5% (39.1 points) and Combined Score from 0.482 to 0.252 (0.230). The context-enhanced mode degraded only marginally: Syntax Valid fell from 98.6% to 97.4% (1.2 points) and Combined Score from 0.801 to 0.750 (0.051). This indicates that structured context is a critical, load-bearing mechanism rather than a superficial improvement.
2. **Optimal Context Configuration:** The best absolute performance was achieved with the full context (C7). Among partial configurations, C5 (BNF + Vocabulary) and C6 (API + Vocabulary) performed strongest; both include the vocabulary.
3. **Component Importance:** A Shapley-style decomposition showed that the vocabulary component had the largest effect on semantic quality (Combined Score increase of +0.198). The API specification and BNF grammar had the largest effects on structural validity, contributing +24.7 and +22.3 percentage points, respectively.
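For readers unfamiliar with the idea behind a Shapley-style decomposition, the sketch below computes exact Shapley values for three components from a score per condition. The article does not describe the authors' exact procedure, so this is the textbook formula rather than their method, and the scores in `toy_scores` are invented round numbers, not results from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps each frozenset of players to a score."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not yet contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value[s | {p}] - value[s])
        phi[p] = total
    return phi


players = ("bnf", "api", "vocab")

# Made-up scores for the eight component subsets (illustration only).
toy_scores = {
    frozenset(): 0.0,
    frozenset({"bnf"}): 10.0,
    frozenset({"api"}): 10.0,
    frozenset({"vocab"}): 30.0,
    frozenset({"bnf", "api"}): 15.0,
    frozenset({"bnf", "vocab"}): 45.0,
    frozenset({"api", "vocab"}): 45.0,
    frozenset({"bnf", "api", "vocab"}): 60.0,
}

phi = shapley_values(players, toy_scores)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.2f}")
```

A useful sanity check is that the per-component contributions sum to the gap between the full-context score and the no-context score, so the decomposition accounts for the whole improvement.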

Why it matters for builders

For developers, the main takeaway is that context-aware distillation, which grounds the model in structured information such as a grammar and an API specification, produced more accurate and reliable DSL code in this study. If the result carries over to other settings, it could reduce the time spent on manual coding and debugging, especially with complex DSLs.

For builders involved in code generation, DSL creation, or natural language processing tasks, these findings highlight the importance of providing rich contextual information to language models. The detailed ablation study also provides insights into which contextual components are most critical for different aspects of code generation, allowing for more targeted optimization of such systems.

Practical impact

The practical impact for builders is a more dependable Text2DSL system. The substantial increase in the size and quality of the PolkitBench corpus, coupled with high AST validity and runtime pass rates, suggests that generated Polkit rules will be more functional and correct. This means developers can potentially integrate Text2DSL more confidently into their workflows for tasks involving policy management or other areas where Polkit is used.

The ablation results give concrete guidance for anyone building or improving similar code generation systems: the vocabulary mattered most for semantic quality, while the API specification and BNF grammar mattered most for structural validity. Builders can prioritize these components according to the failure mode they most need to fix. More broadly, the research points toward more robust automated code generation with less manual implementation and verification.

Caveats and source limits

The findings presented in this paper are based on research conducted using specific large language models (DeepSeek-V4-Flash and GigaChat-10B-A1.8B) and a particular dataset (PolkitBench). The performance metrics and component importance may vary when applied to different models, DSLs, or datasets. The study focuses on the generation of Polkit rules, and its direct applicability to other DSLs would require further investigation.

While the paper reports high AST validity and runtime pass rates, these figures apply only to the verified corpus and the evaluation pipeline described. The authors do not report the computational cost or latency that context-aware distillation adds compared with simpler methods. The research is also an arXiv preprint that has not yet undergone formal peer review, so the findings may be revised or face further scrutiny.
