How I Structured Prompts for a Consistent D&D 5e AI GM
Building an AI Dungeon Master sounds straightforward until you actually try it. Left to its own devices, an LLM will happily let a level-1 fighter dual-wield longswords, cast spells they never learned, and narrate a killing blow before the player has rolled a single die. The core challenge is consistency.
Layer 1: The System Prompt as a Rules Contract
The foundation is a dense, versioned system instruction that functions like a contract the model must sign before every session. It covers:
- Inventory and resource tracking: The AI is told explicitly what it can offer based on the current character state. When an item is used, the model emits a machine-readable tag (ITEM_USED: Item Name).
- Spell management by caster type: The prompt distinguishes between prepared casters (Clerics), known casters (Bards), and spellbook casters (Wizards). Warlock slots are specifically noted to recharge on short rests.
- Rolls are mandatory: The model is shown "wrong" vs "correct" patterns to ensure it requests a roll and waits for the result before narrating an outcome.
- Encounter balance: Encounters at levels 1-3 have hard HP caps (a maximum of 7 HP at level 1) to prevent "run-ending" single turns.
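The machine-readable tags from Layer 1 only help if the engine actually parses them. Here is a minimal sketch of how an `ITEM_USED: Item Name` tag could be extracted from a model response and applied to inventory state; the function and variable names are illustrative, not the article's actual implementation:

```python
import re

# Hypothetical tag parser: the model is instructed to emit inline tags such
# as "ITEM_USED: Healing Potion"; the engine extracts them and decrements
# the matching inventory count. The tag format follows the article's example.
TAG_PATTERN = re.compile(r"ITEM_USED:\s*(?P<item>[^\n]+)")

def apply_item_tags(response: str, inventory: dict) -> list:
    """Return the items consumed by ITEM_USED tags found in `response`,
    updating `inventory` in place."""
    consumed = []
    for match in TAG_PATTERN.finditer(response):
        item = match.group("item").strip()
        if inventory.get(item, 0) > 0:
            inventory[item] -= 1
            consumed.append(item)
    return consumed

inventory = {"Healing Potion": 2}
apply_item_tags("You drink it. ITEM_USED: Healing Potion", inventory)
# inventory["Healing Potion"] is now 1
```

Keeping the tag on a single line makes the regex trivial, at the cost of being brittle if the model wraps or rephrases the tag, which is exactly the failure mode the testing section later describes.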
Layer 2: Injecting Rules via RAG (Retrieval-Augmented Generation)
Specific mechanics are injected exactly when they matter. Every turn, a keyword extractor identifies signals like "sneak" or "cast" and pulls the top 3 relevant rules from a database:
RELEVANT D&D 5E RULES:
- Sneak Attack: Requires a finesse or ranged weapon plus advantage (or an ally within 5 feet of the target).
- Two-Weapon Fighting: Off-hand attack uses a bonus action and adds no damage modifier.
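A keyword-based retriever like this can be very simple. The sketch below assumes a small in-memory rule table keyed by trigger words; the article's actual extractor and database are not specified, so treat the names and matching logic here as placeholders:

```python
# Assumed keyword-to-rule lookup; a production version might use stemming
# or embeddings, but exact word matching illustrates the mechanism.
RULES_DB = {
    "sneak": "Sneak Attack: Requires a finesse/ranged weapon plus advantage.",
    "cast": "Concentration: Casting a second concentration spell ends the first.",
    "offhand": "Two-Weapon Fighting: Off-hand attack uses a bonus action and adds no damage modifier.",
}

def retrieve_rules(player_input: str, top_k: int = 3) -> list:
    """Return up to `top_k` rule texts whose trigger word appears in the input."""
    words = player_input.lower().split()
    hits = [rule for keyword, rule in RULES_DB.items() if keyword in words]
    return hits[:top_k]

def build_rules_section(player_input: str) -> str:
    """Format retrieved rules as the block injected into the prompt."""
    rules = retrieve_rules(player_input)
    if not rules:
        return ""
    return "RELEVANT D&D 5E RULES:\n" + "\n".join(f"- {r}" for r in rules)
```

Because the injected block is empty when nothing matches, the prompt stays short on ordinary narrative turns and only grows when a mechanic is actually in play.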
Layer 3: Game State as the Source of Truth
To prevent "hallucinated" continuity, the engine's entire state is serialized into the prompt on every turn. This includes:
- Combat state: Initiative order, enemy distance, line of sight, and cover values.
- Active effects: Concentration status, conditions with round durations, and death save trackers.
- Combat trace: A record of what mechanics were applied in the previous turn to prevent narrative drift.
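One way to keep that serialization honest is to generate the prompt block directly from the engine's data structures, so the model can never see stale state. This is a minimal sketch under assumed field names (the article does not give its actual schema):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CombatState:
    """Illustrative snapshot of engine state serialized each turn."""
    round_num: int
    initiative: list                      # turn order by name
    concentration: Optional[str] = None   # spell being concentrated on, if any
    conditions: dict = field(default_factory=dict)  # condition -> rounds left

    def to_prompt(self) -> str:
        """Render the state as the plain-text block injected into the prompt."""
        lines = [
            f"ROUND: {self.round_num}",
            f"INITIATIVE: {', '.join(self.initiative)}",
            f"CONCENTRATION: {self.concentration or 'none'}",
        ]
        for condition, rounds_left in self.conditions.items():
            lines.append(f"CONDITION: {condition} ({rounds_left} rounds remaining)")
        return "\n".join(lines)

state = CombatState(
    round_num=3,
    initiative=["Goblin A", "Fighter", "Goblin B"],
    concentration="Bless",
    conditions={"Poisoned (Fighter)": 2},
)
```

Regenerating the block from the dataclass every turn means a dropped concentration or an expired condition disappears from the prompt automatically, instead of relying on the model to remember it ended.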
Layer 4: Enforced Response Structure
The model is required to return four distinct sections to ensure the parser can read the output:
- [NARRATIVE]: The creative story beat.
- [MECHANICS]: Machine-readable tags like HP_CHANGE or ROLL:1d20.
- [SUGGESTIONS]: Player options tagged with a "roll:true/false" flag.
- [CHRONICLE]: A one-line log entry for significant campaign beats.
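On the engine side, a sectioned format like this can be parsed with a single regex split, failing loudly when the model drops a section. A sketch, assuming the bracketed headers appear exactly as listed above:

```python
import re

SECTIONS = ("NARRATIVE", "MECHANICS", "SUGGESTIONS", "CHRONICLE")

def parse_response(text: str) -> dict:
    """Split model output on [SECTION] headers into a section->body dict.

    Raises ValueError if any required section is missing, so a malformed
    response can be rejected (or retried) instead of silently mis-parsed.
    """
    # Splitting on a capture group interleaves section names with bodies:
    # ['', 'NARRATIVE', body, 'MECHANICS', body, ...]
    parts = re.split(r"\[(NARRATIVE|MECHANICS|SUGGESTIONS|CHRONICLE)\]", text)
    result = dict(zip(parts[1::2], (body.strip() for body in parts[2::2])))
    missing = [s for s in SECTIONS if s not in result]
    if missing:
        raise ValueError(f"Model omitted sections: {missing}")
    return result
```

Raising on a missing section turns "the model ignored the format" from a subtle state-corruption bug into an immediate, retryable error.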
Testing and Lessons Learned
Instead of traditional unit tests, I used Instrumented Playtesting. This involved logging every injected rule and mechanic event to catch edge cases, such as the model adding unexpected suffixes to item names.
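The instrumentation itself can be as simple as structured JSON lines per event, which makes anomalies like mangled item names greppable after a session. This is an assumed sketch; the function names, log shape, and `KNOWN_ITEMS` set are illustrative:

```python
import io
import json
import time

# Hypothetical canonical inventory keys; a tag that doesn't match one of
# these exactly (e.g. "Healing Potion (empty)") gets flagged for review.
KNOWN_ITEMS = {"Healing Potion", "Torch", "Rope"}

def log_event(stream, kind: str, payload: dict) -> None:
    """Append one structured JSON line per rule injection or mechanic event."""
    record = {"ts": time.time(), "kind": kind, **payload}
    stream.write(json.dumps(record) + "\n")

def audit_item_tag(stream, item: str) -> bool:
    """Log an ITEM_USED tag and return False if the name looks mangled."""
    recognized = item in KNOWN_ITEMS
    log_event(stream, "item_used", {"item": item, "recognized": recognized})
    return recognized

log = io.StringIO()  # stand-in for a real log file
audit_item_tag(log, "Healing Potion")          # canonical name: recognized
audit_item_tag(log, "Healing Potion (empty)")  # unexpected suffix: flagged
```

Logging the `recognized` flag alongside the raw string is what surfaces the "unexpected suffix" class of bugs: a quick filter over the log shows every tag the engine failed to match.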
The biggest takeaway: The gap between "the rules" and "what the model does" is where the engineering lives. Precise descriptions and real-time state injection are the only ways to survive a 50-turn campaign without the AI losing the plot.