Prompt Architecture for a Reliable AI Dungeon Master

26 February 2026

How I Structured Prompts for a Consistent D&D 5e AI GM

Building an AI Dungeon Master sounds straightforward until you actually try it. Left to its own devices, an LLM will happily let a level-1 fighter dual-wield longswords, cast spells they never learned, and narrate a killing blow before the player has rolled a single die. The core challenge is consistency.

Layer 1: The System Prompt as a Rules Contract

The foundation is a dense, versioned system instruction that functions like a contract the model must sign before every session. It covers:

  • Inventory and resource tracking: The AI is told explicitly what it can offer based on the current character state. When an item is used, the model emits a machine-readable tag (ITEM_USED: Item Name).
  • Spell management by caster type: The prompt distinguishes between prepared casters (Clerics), known casters (Bards), and spellbook casters (Wizards). Warlock slots are specifically noted to recharge on short rests.
  • Rolls are mandatory: The model is shown "wrong" vs "correct" patterns to ensure it requests a roll and waits for the result before narrating an outcome.
  • Encounter balance: Level 1-3 encounters have hard HP caps (max 7 HP for level 1) to prevent "run-ending" single turns.
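The machine-readable tags from the contract have to be reconciled against real state on the engine side. A minimal sketch of that reconciliation, assuming a regex-parsed `ITEM_USED:` tag and a simple count-based inventory (the function name and inventory shape are illustrative, not the article's actual code):

```python
import re

# Matches the machine-readable tag the system prompt requires,
# e.g. "ITEM_USED: Healing Potion" on its own line or mid-response.
ITEM_TAG = re.compile(r"ITEM_USED:\s*(.+)")

def apply_item_usage(response: str, inventory: dict) -> list:
    """Decrement inventory for each ITEM_USED tag; return unrecognized names."""
    unknown = []
    for match in ITEM_TAG.finditer(response):
        name = match.group(1).strip()
        if inventory.get(name, 0) > 0:
            inventory[name] -= 1
        else:
            # The model referenced an item the character doesn't have --
            # exactly the kind of drift the contract exists to catch.
            unknown.append(name)
    return unknown
```

Anything the model "spends" that the state doesn't contain comes back in the `unknown` list, which can trigger a correction turn instead of silently granting the item.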

Layer 2: Injecting Rules via RAG (Retrieval-Augmented Generation)

Specific mechanics are injected exactly when they matter. Every turn, a keyword extractor identifies signals like "sneak" or "cast" and pulls the top 3 relevant rules from a database:

RELEVANT D&D 5E RULES:
- Sneak Attack: Requires a finesse/ranged weapon, plus advantage (or an ally within 5 feet of the target).
- Two-Weapon Fighting: Off-hand attack uses a bonus action and adds no damage modifier.
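The retrieval step can be sketched with a trigger-word map standing in for a real index; the rule names match 5e, but the trigger lists and `top_k` cutoff are assumptions for illustration:

```python
# Hypothetical keyword-to-rule map; a production version would use a
# proper retrieval index, but the per-turn flow is the same.
RULES_DB = {
    "Sneak Attack": ("sneak", "hide", "stealth"),
    "Two-Weapon Fighting": ("off-hand", "dual", "second weapon"),
    "Grappling": ("grapple", "grab", "hold"),
    "Opportunity Attacks": ("retreat", "move away", "flee"),
}

def relevant_rules(player_input: str, top_k: int = 3) -> list:
    """Return up to top_k rule names whose trigger words appear in the input."""
    text = player_input.lower()
    hits = [rule for rule, triggers in RULES_DB.items()
            if any(t in text for t in triggers)]
    return hits[:top_k]
```

The returned names are then expanded into the `RELEVANT D&D 5E RULES:` block shown above and prepended to the turn's prompt.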

Layer 3: Game State as the Source of Truth

To prevent "hallucinated" continuity, the entire engine state is serialized into the prompt on every turn. This includes:

  • Combat state: Initiative order, enemy distance, line of sight, and cover values.
  • Active effects: Concentration status, conditions with round durations, and death save trackers.
  • Combat trace: A record of what mechanics were applied in the previous turn to prevent narrative drift.
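A minimal sketch of the serialization step, assuming a JSON dump under a fixed header; the field names (`initiative`, `effects`, `combat_trace`, etc.) are illustrative, not the article's actual schema:

```python
import json

# Example engine state for one turn; every field the model needs to
# stay consistent is present, so nothing has to be "remembered".
state = {
    "initiative": ["Goblin A", "Fighter", "Goblin B"],
    "enemies": [{"name": "Goblin A", "distance_ft": 30, "cover": "half"}],
    "effects": [{"target": "Fighter", "condition": "poisoned", "rounds_left": 2}],
    "combat_trace": ["Goblin A attacked Fighter: hit, HP_CHANGE: Fighter -4"],
}

def state_block(state: dict) -> str:
    """Render engine state as a prompt section the model must treat as ground truth."""
    return "CURRENT GAME STATE:\n" + json.dumps(state, indent=2)
```

Because the state is rebuilt from the engine each turn rather than carried in conversation history, a dropped condition or forgotten enemy position cannot accumulate across turns.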

Layer 4: Enforced Response Structure

The model is required to return four distinct sections to ensure the parser can read the output:

  • [NARRATIVE]: The creative story beat.
  • [MECHANICS]: Machine-readable tags like HP_CHANGE or ROLL:1d20.
  • [SUGGESTIONS]: Player options tagged with a "roll:true/false" flag.
  • [CHRONICLE]: A one-line log entry for significant campaign beats.
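A sketch of the parser side of this contract, assuming the four bracketed headers always appear in the response; the regex and error handling are illustrative:

```python
import re

SECTIONS = ("NARRATIVE", "MECHANICS", "SUGGESTIONS", "CHRONICLE")

# Captures each [SECTION] header and everything up to the next header
# (or end of text); re.S lets bodies span multiple lines.
_PATTERN = re.compile(
    r"\[(" + "|".join(SECTIONS) + r")\]\s*(.*?)"
    r"(?=\[(?:" + "|".join(SECTIONS) + r")\]|\Z)",
    re.S,
)

def parse_response(text: str) -> dict:
    """Split a model response into its four required sections, or fail loudly."""
    parsed = {name: body.strip() for name, body in _PATTERN.findall(text)}
    missing = [s for s in SECTIONS if s not in parsed]
    if missing:
        raise ValueError(f"malformed response, missing sections: {missing}")
    return parsed
```

Raising on a missing section (rather than guessing) makes malformed responses visible to the engine, which can then retry the turn instead of rendering a broken one.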

Testing and Lessons Learned

Instead of traditional unit tests, I used Instrumented Playtesting. This involved logging every injected rule and mechanic event to catch edge cases, such as the model adding unexpected suffixes to item names.
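The suffix problem can be caught with a small audit pass over logged mechanic events. A sketch, assuming exact-match validation against known inventory names (the helper and its heuristic are illustrative):

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("playtest")

def audit_item_tag(tag_value: str, inventory_names: set) -> bool:
    """Log a warning when an emitted item tag doesn't exactly match inventory."""
    if tag_value in inventory_names:
        return True
    # Catches drift like "Healing Potion (used)" vs "Healing Potion":
    # the tag starts with a known name but carries an unexpected suffix.
    near_misses = [n for n in inventory_names if tag_value.startswith(n)]
    log.warning("unmatched item tag %r (near misses: %s)", tag_value, near_misses)
    return False
```

Running every playtest session through audits like this surfaces formatting drift as warnings in the log rather than as silent state corruption mid-campaign.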

The biggest takeaway: The gap between "the rules" and "what the model does" is where the engineering lives. Precise descriptions and real-time state injection are the only ways to survive a 50-turn campaign without the AI losing the plot.