
AI Doesn't Understand Your Codebase (And That's Your Fault)

Jan 15, 2026 · 4 min read
AI · dbt · Documentation · Data Engineering
TL;DR: We spent more time explaining our dbt project to AI than the AI saved us. The fix? Document everything. Now we just hand it a ticket and it works.

The Setup

Picture this: A 3-year-old dbt codebase. 500+ models. 5 developers pushing fixes and features daily. Custom macros refined from working with dozens of clients. A beautiful mess of tribal knowledge.

Then came the AI coding assistants.


The Pain

Every. Single. Time.

"What does this macro do?"

"Where are your staging models?"

"What's the naming convention here?"

"Why is this variable set this way?"

We weren't pair programming with AI. We were onboarding a new hire with amnesia. Every day. Multiple times a day.

The time spent explaining our project setup was greater than the time saved by AI.


The Dirty Secret

Here's what 99% of the AI coding tutorials don't tell you: they're all building new projects.

Starting fresh with AI is easy. You don't need to provide context — you're building the context together. Every file, every pattern, every convention is established collaboratively. The AI was there from line one.

But existing codebases? That's a different beast entirely.

Your 3-year-old project has history. It has quirks. It has "we tried that once and here's why it didn't work." None of that is written down. It's all in your team's heads.

And the AI? It's walking into a movie halfway through, asking who all these characters are.


The Realization

Here's what nobody tells you about AI coding assistants:

1. Your codebase isn't "standard" — It's built on years of decisions, client learnings, and team conventions
2. AI is powerful but context-blind — It knows dbt syntax, but not your dbt
3. The bottleneck isn't AI capability — It's AI understanding

Out of the box, AI will never understand your use case. It doesn't know that you use a specific testing dataset, or that source_system means something particular in your context, or that you have a custom generate_schema_name macro that changes behavior based on target.
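That last example, a `generate_schema_name` override that branches on target, is a well-known dbt pattern. A minimal sketch (illustrative, not our exact macro) looks like this:

```sql
-- macros/generate_schema_name.sql
-- Sketch of a target-aware schema override (not our exact macro).
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if target.name == 'prod' and custom_schema_name is not none -%}
        {# In prod, build into the configured custom schema #}
        {{ custom_schema_name | trim }}
    {%- else -%}
        {# Everywhere else (dev, CI), build into the user's own schema #}
        {{ target.schema }}
    {%- endif -%}
{%- endmacro %}
```

This is exactly the kind of behavior an assistant cannot guess from syntax alone: unless it's written down, the AI will assume dbt's default schema logic.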


The Fix: Document Everything

So we started documenting. Not for humans — for AI.

What we documented:

  • Project structure: Where different model types live and why
  • Naming conventions: How we name models, columns, macros
  • Custom macros: What they do, when to use them, common patterns
  • dbt variables: What each one controls, default values, when to override
  • Testing patterns: How we test, which datasets we use, edge cases we've learned from clients
  • Common workflows: "To add a new source, do X, Y, Z"

Where we put it:

  • .cursor/rules for Cursor-specific context
  • Detailed README.md files in key directories
  • Inline comments explaining the why, not just the what

How much documentation:

More than you think. We probably have more lines of documentation than lines of dbt code now.
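For the variables, a commented `dbt_project.yml` block doubles as AI-readable documentation. As an illustration (the variable names here are hypothetical, not ours):

```yaml
# dbt_project.yml (excerpt) -- variable names are illustrative
vars:
  # Which upstream system feeds the staging layer;
  # source selection and stg_ model names depend on it.
  source_system: "erp"
  # Small dataset used for CI runs; override per run with:
  #   dbt build --vars '{test_dataset: full}'
  test_dataset: "qa_sample"
```

Because the comments sit next to the defaults, the assistant sees what each variable controls and when to override it without anyone explaining.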


The Result

The transformation was remarkable.

Before: "Here's a ticket, let me explain our entire project structure, our conventions, our macros, our testing approach..."

After: "Here's the ticket." Done.

The AI now knows:

  • Where to look for different model types
  • How to follow our naming conventions
  • When to use which custom macros
  • How we typically structure tests
  • What our variables mean


The Takeaway

In the AI era, documentation isn't optional. It isn't a "nice to have" for future developers. It's essential infrastructure for your AI tools to function.

Think of it this way:

  • Old mental model: Documentation is for humans to read occasionally
  • New mental model: Documentation is for AI to read constantly

Your AI assistant will read every piece of documentation you write. It will actually use your README files. It will follow your conventions if you write them down.

So write them down.


Practical Steps

If you're struggling with AI on a complex codebase:

1. Start with structure — Document where things live
2. Explain the "why" — Not just what a macro does, but when to use it
3. Capture tribal knowledge — Those things "everyone knows" that aren't written anywhere
4. Include examples — AI learns patterns from examples
5. Update as you go — When AI gets something wrong, document the correction
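Step 5 in practice: when the assistant gets something wrong, the fix goes straight into the rules file. A `.cursor/rules` entry capturing that kind of correction might read (contents hypothetical):

```markdown
<!-- .cursor/rules excerpt (hypothetical contents) -->
- Staging models live in models/staging/<source>/ and are named stg_<source>__<entity>.
- Use the custom generate_schema_name macro; never hardcode schema names.
- Correction (Jan 2026): build surrogate keys with dbt_utils.generate_surrogate_key,
  not raw md5(); md5() does not port across our client warehouses.
```

Each correction costs a minute to write and prevents the same wrong suggestion on every future ticket.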

Final Thought

In the age of AI, the best thing you can do for your team isn't just writing good code — it's writing good context.

Documentation has always been important. Now it's essential.

Your AI is only as good as your documentation.