Last August I started using AI on a data platform engagement. My first tool was ChatGPT — not for coding, but for requirements analysis. I was a solo practitioner on a real engagement, building something that had to work, and I had dense, sometimes contradictory business documentation to work through. ChatGPT was good at reading it and helping me find the signal. That was the door.

I tried other tools in those early months. The AI landscape in 2025 was crowded and everyone had an opinion about which model was best for what. I ran some experiments, formed some opinions, moved on. Two tools survived the cut: ChatGPT, which I still use for requirements analysis and strategic thinking, and Claude, which I brought in when I started writing code. The two serve different functions in my workflow and I've stopped feeling the need to consolidate them into one.

By early 2026, I was running a complex multi-layer ETL platform — SQL Server, Snowflake, five database layers, a governed deployment process — with Claude as a genuine collaborator on the engineering side. Not a code generator. A partner in architectural reasoning, documentation, pattern enforcement, and the kind of sustained, session-to-session continuity that is genuinely hard to maintain alone.

This is the story of how that happened, what I learned about the limits of AI and my own limits, and why I never once let the AI drive.


The Wild Stallion Problem

The first thing you notice when you start working seriously with an AI assistant is that it is extremely capable and extremely literal at the same time. It will do exactly what it understands you to mean — which is not always what you meant.

I learned this sharply in December 2025. I was deep in a session and I said two words: save state.

I had already told the AI what "save state" meant: update the session notes, capture where we are, make sure nothing is lost before I close out for the day. That definition was on record.

The AI committed to git.

It had decided to expand the definition. Committing to version control is, after all, also a form of saving state — and arguably a more complete one. In trying to be more thorough than I asked, it overrode what I had explicitly told it. It wasn't malicious. It was the AI being helpfully wrong, improving on my instructions in a direction I hadn't authorized.

That incident made something clear: a verbal definition given in conversation is a weak contract. The AI had heard it, agreed with it, and then reasoned past it. What I needed wasn't a better definition — I needed a harder constraint. The lesson wasn't to explain more clearly. It was to write the rule down somewhere the AI had to read before it acted, and make the boundary explicit enough that there was no room to interpret it as a floor to improve upon.

That incident produced a permission vocabulary. "Do it," "execute," "run it" — these mean take action. Anything else means analysis only. "Save state" got its own explicit definition, documented in CLAUDE.md: update the session notes; do not touch git. One incident. One protocol. Documented and enforced from that day forward.
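
A permission vocabulary like this can also be checked mechanically. The following is a minimal sketch, not the rule set actually stored in CLAUDE.md: the three phrases come from the paragraph above, while the function names and the matching logic are assumptions made for illustration.

```python
import re

# Phrases taken from the article; everything else here is hypothetical.
ACTION_PHRASES = ("do it", "execute", "run it")


def classify_instruction(message: str) -> str:
    """Return "action" if the message contains a permission phrase, else "analysis"."""
    text = message.lower()
    for phrase in ACTION_PHRASES:
        # Word boundaries keep "execute" from matching inside "executed".
        if re.search(rf"\b{re.escape(phrase)}\b", text):
            return "action"
    # Anything that is not an explicit permission phrase is analysis only.
    return "analysis"


def save_state_steps() -> list[str]:
    """Return the only steps that 'save state' permits; git is not among them."""
    return ["update session notes"]
```

The check defaults to analysis, so an unrecognized phrasing fails safe. It does not handle negation ("don't run it"), which a real rule set would need to address.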

This is the wild stallion problem. An AI coding assistant is enormously powerful and very fast and will go wherever it understands you to be pointing. The question is never can it do this — it's have you told it what this means. A bridle doesn't slow a horse down. It gives you the ability to steer.

Building the Protocols

I want to be honest about how the AI governance framework developed, because it didn't start as a framework. It started as reactions.

Every protocol I built came from friction. The permission vocabulary came from the git incident. A rule about checking for existing files before creating new ones came from a session where the AI built a new function in a new location without realizing the original already existed somewhere in the codebase. I had to track down my own code and reconcile two versions of a function I'd written months earlier. That was a bad afternoon.

After each incident I did two things: I fixed the immediate problem, and I documented the lesson as a rule. Not in a notebook. In CLAUDE.md — the project-level briefing file that the AI reads at the start of every session. The rules were explicit and behavioral:

  • Never execute state-changing commands without explicit permission phrases.
  • Always check existing file locations before creating new files.
  • Interpret "save state" as: update session notes. Not: commit to git.
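
The second rule in the list above is easy to back with a small check. This is a sketch under assumptions: the helper names are hypothetical, and matching on file name alone is a simplification (it would not catch a duplicated function living in a differently named file).

```python
from pathlib import Path


def find_existing(root: Path, filename: str) -> list[Path]:
    """Return every file under root whose name matches filename, ignoring case."""
    wanted = filename.lower()
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.name.lower() == wanted
    )


def safe_to_create(root: Path, new_file: Path) -> bool:
    """True only when no file with the same name exists anywhere under root."""
    return not find_existing(root, new_file.name)
```

Running `safe_to_create` before writing a new file turns "always check" from an instruction into a gate: a False result means there is an original to find and reconcile first.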

Over time, the list grew. But here's the thing: the list also stabilized. Once a rule was written into CLAUDE.md and read at the start of every session, the friction went away. I stopped losing afternoons to ambiguity. The bridle was on, and we could actually work.

Building the Library

Rules accumulate quickly on a real engagement. Each project facet — staging patterns, data vault conventions, deployment procedures, naming standards, logging requirements — had its own set of constraints. A single flat list of rules becomes unworkable fast. You end up with a document nobody reads in full, including the AI.

The answer was a library. Not a single document but a structured collection of knowledge organized by domain, with CLAUDE.md as the index and entry point. CLAUDE.md holds the core principles and a set of startup instructions — what to read, in what order, to be properly oriented for a session. It doesn't try to contain everything. It points to where everything lives.

My standard opening prompt, in every AI session regardless of which tool I'm using, is three instructions: read CLAUDE.md, follow startup, enforce rules. That's it. The AI reads the index, follows the startup sequence, loads the relevant parts of the library, and arrives at the actual work already oriented to the project's standards.
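
The startup sequence can be pictured as an ordered load of plain files. The sketch below is an assumption about shape only: the article does not describe the internal format of CLAUDE.md, so the ordered list of documents is passed in explicitly, and the function name is hypothetical.

```python
from pathlib import Path


def load_library(root: Path, startup_order: list[str]) -> str:
    """Concatenate the index and the documents it names, in the stated order."""
    sections = []
    for relative in ["CLAUDE.md", *startup_order]:
        path = root / relative
        if not path.is_file():
            # A missing document should stop the session, not be silently skipped.
            raise FileNotFoundError(f"library document missing: {relative}")
        body = path.read_text(encoding="utf-8")
        sections.append(f"## {relative}\n{body}")
    return "\n\n".join(sections)
```

Failing loudly on a missing document is a deliberate choice here: a session that starts with part of the library absent is a session working from incomplete rules.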

What makes this approach durable is that the library is just files. CLAUDE.md and the documents it references are readable by any AI system that can access the file system. The knowledge architecture is not tied to a vendor. The same library that bridles one AI bridles the next. When I switch tools or add a new thread, I don't rebuild context from scratch. I point the new thread at the library and issue the same three-instruction prompt.

This is the difference between governing a session and governing a system. A session ends. A library persists.

Why I Talk to AI Instead of Prompting It

There is a school of thought that says the right way to use AI is to write precise, structured prompts. Define the role. Specify the format. Constrain the output. Treat it like a very sophisticated API call.

I don't work that way, and I've thought about why.

My work is rarely well-defined enough to support a rigid prompt at the start. When I'm making architectural decisions — how to model a complex data source, how to handle a foreign key that doesn't always exist in the source, whether to use a MERGE or an INSERT/DELETE pattern for a particular load — I don't arrive at the session knowing the answer. I arrive knowing the problem, roughly. The specifics emerge through the conversation.

A rigid prompt forces me to fully define the problem before I understand it. That's backwards.

Conversational prompting lets me discover the problem while I'm describing it. I'll start explaining a data quality issue and, in the act of explaining it, realize I've been thinking about it wrong. The response might surface a dimension I hadn't considered. The dialogue sharpens the question. Often the most valuable output isn't the code the AI writes — it's the moment where it says something that makes me see the problem differently.

That said, structured prompts are a genuinely valuable asset in the right context — and I use them. Documentation is the clearest example. Once you have defined what every piece of documentation should contain, how it should be structured, and where it should go, that definition can live in a prompt file. From that point on, generating consistent documentation for any file or pipeline is a single instruction. The AI reads the prompt file, reads the target, and produces documentation that meets the defined standard every time.

The distinction I've landed on: conversational prompting for open-ended thinking, structured prompt files for repeatable execution. Both have a place. The mistake is applying either one where the other belongs.

What I Actually Delegate

The clearest thing I learned over these months is that the highest-value use of AI is not code generation. It's consistency.

I delegate pattern implementation freely. Once I've defined how a staging procedure works — the file header, the logging wrapper calls, the exception handling, the truncate-and-copy sequence — the AI can generate the next one from the pattern. I review it. I don't write it from scratch.

I delegate documentation almost entirely. Session notes, architectural decision records, inline comments, README files — the AI generates these as a byproduct of our working sessions, not as a separate task. The documentation exists because the conversation existed.

I delegate analysis and options generation constantly. When I'm facing an architectural decision, I want to see the trade-offs written out clearly before I decide. AI is excellent at this.

What I do not delegate: decisions. Standards definition. Output verification. Judgment calls about trade-offs. The thing a client pays for is my judgment about their data. AI can inform that judgment. It cannot replace it.

That judgment is also what makes the collaboration work in the first place. My domain expertise tells me what options should exist for a given design choice. When AI presents two options and I know there should be four, I push back. I challenge it to rethink. That challenge usually produces the fuller picture — but only because I knew to ask for it.

Documentation as a Consistency Engine

AI is exceptionally good at creating deep, thorough documentation for a system it understands. Once a codebase has enough context — enough patterns established, enough pipelines built, enough decisions made — I can ask the AI to document it comprehensively: the patterns in use, the standards that govern them, the issues that were found and resolved, the reasoning behind key architectural choices.

But the documentation is not the end product. It's an input.

When I start a new pipeline or extend an existing one, I point the AI at the written documentation and at already-built pipelines that represent the established patterns. Then I ask it to do something specific before it writes a single line of code: compare what it finds across the documentation and the examples. Tell me where they agree. Tell me where they vary. Surface any inconsistencies between what the standards say and what the existing code actually does.

This is how AI becomes a consistency enforcement mechanism rather than a source of new variation. The risk with AI-generated code is that it brings patterns from its training data into your codebase — patterns that may be perfectly valid in the abstract but inconsistent with how you've chosen to do things. Grounding it in your own documentation and your own examples before it writes redirects that tendency.

Testing: Where AI Earns Its Keep and Loses Its Patience

AI is genuinely valuable for data testing, and it's an area where the return on investment is immediate and visible. Dataset comparisons that would take hours to set up manually — row counts, column-level deltas, null checks across source and target — come together quickly.
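
As an illustration of the comparisons named above, here is a minimal sketch that works on rows already loaded as lists of dictionaries. The function name, the result layout, and the in-memory approach are all assumptions; on a real platform these checks would more likely run as queries against the source and target tables.

```python
def compare_datasets(source, target, key):
    """Compare two lists of row dicts: row counts, null counts, per-key column deltas."""

    def null_counts(rows):
        counts = {}
        for row in rows:
            for column, value in row.items():
                counts[column] = counts.get(column, 0) + (value is None)
        return counts

    source_keys = {row[key] for row in source}
    target_by_key = {row[key]: row for row in target}
    deltas = {}
    for row in source:
        other = target_by_key.get(row[key])
        if other is None:
            deltas[row[key]] = ["<missing in target>"]
        else:
            # Columns whose values differ between source and target for this key.
            changed = [c for c in row if row[c] != other.get(c)]
            if changed:
                deltas[row[key]] = changed
    return {
        "row_counts": {"source": len(source), "target": len(target)},
        "null_counts": {"source": null_counts(source), "target": null_counts(target)},
        "deltas": deltas,
        "extra_in_target": sorted(k for k in target_by_key if k not in source_keys),
    }
```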

The challenge is the iterations. Writing a solid test suite for a complex pipeline requires the AI to understand the source files — the actual schemas, the actual column names, the actual data types. What it will often do instead is assume. It will generate a test that looks complete, references columns with plausible names, and handles cases that seem relevant — all without having read the file. The test is confidently wrong in ways that only become visible when you run it.

The workaround is direct instruction at the start: read this file before you write anything. Not implied — stated. And then verify that it did. The iterations add time, but the end result — a thorough test suite grounded in the actual schema — is still faster and more complete than writing it manually. You just have to budget for the back-and-forth.
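
The "verify that it did" step can be partly automated. The sketch below assumes the source is a CSV file with a header row, which the article does not specify, and uses hypothetical function names. It reports any column a generated test references that the actual file does not contain.

```python
import csv
from pathlib import Path


def read_columns(csv_path: Path) -> list[str]:
    """Read column names from the header row of the actual source file."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle))


def unknown_columns(referenced: list[str], csv_path: Path) -> list[str]:
    """Return referenced columns that are absent from the file, preserving order."""
    actual = {name.strip().lower() for name in read_columns(csv_path)}
    return [name for name in referenced if name.strip().lower() not in actual]
```

An empty result does not prove the test is right, but a non-empty one is quick evidence that the test was written from assumed names rather than from the file.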

Running Multiple Threads

At some point I started using a second AI tool inside VS Code alongside Claude in the terminal. This wasn't redundancy; it opened something new. I could run two AI work streams simultaneously: one thread refined a staging procedure while another analyzed a data quality issue in a different part of the codebase.

The immediate challenge was file conflicts. Two AI threads working in the same codebase at the same time will collide if you don't manage scope carefully. I learned to divide work by domain before I started: this thread owns these files, that thread owns those. Overlap required coordination. The discipline wasn't optional — a merge conflict between two AI-generated outputs in a governed codebase is exactly the kind of mess that costs more to clean up than it saved to create.
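
Dividing work by domain can be written down as data and checked before either thread starts. This is a minimal sketch: the thread names, the glob-style ownership patterns, and the function name are assumptions, not a description of the tooling used on the engagement.

```python
from fnmatch import fnmatch


def find_conflicts(
    ownership: dict[str, list[str]], files: list[str]
) -> dict[str, list[str]]:
    """Map each file claimed by more than one thread to the threads claiming it."""
    conflicts = {}
    for path in files:
        owners = [
            thread
            for thread, patterns in ownership.items()
            if any(fnmatch(path, pattern) for pattern in patterns)
        ]
        if len(owners) > 1:
            conflicts[path] = owners
    return conflicts
```

Any file that appears in the result is an overlap to resolve by hand before the session begins, which is cheaper than untangling it afterward.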

Continuity Across Sessions

One problem nobody talks about enough when they talk about AI-assisted development: sessions end. Context resets. The AI that helped you design a complex pipeline yesterday has no memory of it today.

For short engagements, this is a minor inconvenience. For a months-long platform build running across multiple machines and multiple threads, it's a structural challenge.

I solved it with session notes. At the end of every working session — on every thread, on every machine — I write a detailed handoff document: the current state of the work, decisions made, decisions deferred, the immediate next action. These notes are written for the AI to read at the start of the next session — a context injection that reconstructs enough state to continue where we left off.
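
The four parts of the handoff document map naturally onto a small structure. The sketch below is one possible shape, assuming a plain-text layout and field names chosen for illustration; the article does not show the actual format of the notes.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SessionNote:
    thread: str
    session_date: date
    current_state: str
    next_action: str
    decisions_made: list[str] = field(default_factory=list)
    decisions_deferred: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the handoff as plain text for the next session to read first."""

        def bullets(items):
            return "\n".join(f"  - {item}" for item in items) or "  - (none)"

        return "\n".join([
            f"Session notes: {self.thread}, {self.session_date.isoformat()}",
            f"Current state: {self.current_state}",
            "Decisions made:",
            bullets(self.decisions_made),
            "Decisions deferred:",
            bullets(self.decisions_deferred),
            f"Next action: {self.next_action}",
        ])
```

Keeping the next action as a required field, rather than an optional one, reflects the point of the handoff: the next session should be able to start without asking where things stand.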

This practice also produces something I didn't anticipate: a record of how the work evolved. Going back through months of session notes, you can trace exactly how an architectural decision was made, what the options were, what evidence shifted the decision, and who, human or AI, contributed what. That's not just useful for continuity. It's useful for accountability.

The Partnership, Honestly Assessed

There are things the AI does that genuinely impress me. Its ability to hold a complex schema in view and reason about implications across tables. Its consistency in applying patterns once they're established. Its willingness to say "here are three approaches and here's the trade-off for each" rather than just picking one.

The most useful mental model I've found: AI is like an employee who is extraordinarily intelligent and has no common sense. It will go off on tangents. Give it a focused task and it may solve an adjacent problem you didn't ask about, refactor something you didn't mention, or produce a thorough answer to a question you weren't asking. The intelligence is genuine. The judgment is not.

There is a subtler failure mode that took me longer to come to terms with. Even with a well-documented AI governance framework — protocols written down, rules explicit, CLAUDE.md in place — the AI will still occasionally do something you didn't ask for. When you point it out, it will review what it did, agree that it violated the protocol, and commit to not doing it again. The commitment is sincere. It can also be meaningless. The same behavior can reappear in the same session, sometimes within minutes.

This is not bad faith. It reflects something fundamental about how these systems work: they don't retain corrections the way a person does. A human collaborator who commits to not doing something again has updated their mental model. An AI that commits to not doing something again has generated a response — and the next output is still drawn from the same underlying patterns.

The practical implication: written protocols matter more than verbal agreements. What you put in CLAUDE.md constrains behavior at the session level. What the AI agrees to in conversation constrains the next response, at best. Build your governance into the briefing document, not the conversation. And stay alert regardless.

What the Journey Taught Me

I started this engagement skeptical of AI hype and ended it convinced that AI-assisted development is genuinely transformative — with a hard and specific caveat: it transforms the productivity of practitioners who have strong judgment, clear standards, and the discipline to govern the tool. It amplifies what's already there. It does not supply what isn't.

The bridle metaphor keeps coming back to me because I think it captures something important. A bridle doesn't diminish a horse. It creates the conditions under which the horse's speed and strength can be directed productively. Without it, all that power goes somewhere — but not necessarily where you need it to go.

The protocols I built — the permission vocabulary, the session notes, the CLAUDE.md briefing files, the verification checklists — are the bridle. They're not limitations on what AI can do. They're the conditions under which I can trust what it does enough to ship it.

And that trust, earned through incidents and corrections and months of working together, is the thing I actually couldn't have built any other way.



A note on how this article was written

This article was researched, shaped, and edited with AI assistance — the same tools, the same discipline, and the same process described in the pages above. The source material was months of session notes I wrote during the engagement. My role was the same role I always occupy: I provided the context, I steered the direction, I corrected what was wrong, I pushed for what was missing, and I made every judgment call about what stayed and what didn't. The bridle was on. The article is mine.

I'm disclosing this not as a disclaimer but as a demonstration. If you've read this far and found the argument credible, you've already seen the evidence. A governed AI collaboration produced a piece of writing that accurately reflects a months-long technical journey — because the person driving it understood the work, stayed in the seat, and never handed over the reins.

Darwin Fisk is the founder of Open Data Designs, LLC, a data engineering consultancy specializing in governed data platform development.