Why Adding Schemas Supercharges AI Answer Quality

Context Engineering

By Caber Team

29 Jul 2025

We all know the story: you’re building out a project, tweaking code on your organically grown website. One day you prompt an AI:

“I need a function to do X.”

It generates something good. Then later:

“I need my function to do X + Y.”

Instead of improving the original function and updating all its references, the AI spins up a new function. Now you’ve got duplicated logic—some calls point to the old function, some to the new. Chaos creeps in.

In my own codebase I saw this pattern over and over. For example:

// Original helper
function processOrder(order) {
   validate(order);
   submit(order);
}

// Later request to add tracking
function processOrderWithTracking(order) {
   validate(order);
   submit(order);
   track(order);
}

Half the code called processOrder; half called processOrderWithTracking.
Downstream references in invoice.js still used the old function, while shipment.js used the new one.
The AI had no visibility into the function graph, so it never knew it should merge the logic and update the references.
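
With the graph in view, the fix is mechanical. Here's a minimal sketch of the merge the AI should have proposed; the withTracking option is my own naming for illustration, not from the original code:

// Merged helper: one function, tracking is opt-in
function processOrder(order, { withTracking = false } = {}) {
   validate(order);
   submit(order);
   if (withTracking) {
      track(order);
   }
}

// Every caller now points at a single entry point:
// invoice.js:  processOrder(order);
// shipment.js: processOrder(order, { withTracking: true });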


Models Are Great, But Data Rules the Outcome

AI models are getting better fast. Switching between them is becoming less painful, and their raw capabilities feel similar. So how do you know which one to use?
You don’t—because the model isn’t the whole story.

The real question is:
👉 What data are you feeding it?

When you provide structure, even a little, answer quality skyrockets. For example, supplying a database schema instead of just dumping tables into a prompt improved one system's answer quality from 72% to 87%: a 15-percentage-point jump for minutes of extra work.
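
What does that look like in practice? Here's a minimal sketch; the table and column names are invented for illustration:

// Instead of pasting raw rows, prepend a compact schema so the model
// knows what each column is and how the tables relate.
const schema = `
CREATE TABLE customers (
  id    INTEGER PRIMARY KEY,
  email TEXT NOT NULL
);
CREATE TABLE orders (
  id          INTEGER PRIMARY KEY,
  customer_id INTEGER REFERENCES customers(id), -- the relationship that matters
  total_cents INTEGER NOT NULL
);
`;

const prompt = `Given this schema:\n${schema}\nList each customer's total spend.`;

A few lines of DDL cost almost nothing to include, but they tell the model exactly how orders join to customers.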



Example: Try #1 — Use AI to find unused and disconnected files in my codebase

My Prompt:

Identify all files that are disconnected from the marketing pages, authorization, deployment and dashboard.  I'm looking for unused files that may be referenced in other files that themselves are disconnected from the core graph.

AI Response:

[...]
This cleanup could remove approximately **20-25% of unused code** from your components directory and simplify the codebase significantly.

Time: 3 minutes 12 seconds



Unstructured Dump vs. Structured Context

Dumping unstructured data into a context window often fails to deliver the right results. You need to give your AI not just data, but data with context:

  • What each element is.
  • How elements are related.
  • Which relationships matter for the task at hand.

Think of it like giving your AI a map instead of a random list of street names.

Adding Structural Hints Improves Responses

Graph-based RAG (retrieval augmented generation) is a great start, but the trick is building relationships:

  • What functions call what.
  • What database fields depend on what tables.
  • How pieces of documentation relate to each other.

Here’s a snippet from my codebase that demonstrates the value of mapping out dependencies between files:

index.js -> utils/logger.js
index.js -> services/api.js
services/api.js -> utils/logger.js
services/api.js -> models/user.js

When you represent your code this way, the AI can see that both index.js and services/api.js rely on the same logger—so a change in logger.js affects them both. Instead of duplicating logic, the AI could reason:

Modify logger.js once and update its usage everywhere.
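
To make that concrete, here's a minimal sketch of the shared module (its contents are invented for illustration). Because both files import it, one edit propagates everywhere:

// utils/logger.js: the single shared implementation.
// Change the format here once and every importer picks it up.
function log(message) {
  console.log(`[${new Date().toISOString()}] ${message}`);
}
module.exports = { log };

// index.js:        const { log } = require("./utils/logger");
// services/api.js: const { log } = require("../utils/logger");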



Example: Try #2 — Was I about to delete needed code?

Deterministically build a dependency graph

caber% time npm run analyze:json

> caber-systems-portal@0.1.0 analyze:json
> npx madge --json src/ > dependency-graph.json

npm run analyze:json  1.41s user 0.23s system 79% cpu 2.070 total
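
For reference, madge's --json output is a plain object mapping each module to the modules it imports, shaped roughly like this (using the file names from the earlier snippet):

{
  "index.js": ["services/api.js", "utils/logger.js"],
  "services/api.js": ["models/user.js", "utils/logger.js"],
  "utils/logger.js": [],
  "models/user.js": []
}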

My Prompt:

Given the attached dependency graph, identify all files that are disconnected from the marketing pages, authorization, deployment and dashboard.  I'm looking for unused files that may be referenced in other files that themselves are disconnected from the core graph.

AI Response:

[...]
This analysis shows your codebase is relatively clean with only ~8.5% disconnected files, mostly old pages and test files.

Time: 2 minutes 3 seconds + 2.070 seconds = 2 minutes 5 seconds ⬅️
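
Because the graph is deterministic, you can also sanity-check the AI's answer yourself. Here's a hedged sketch that walks the madge JSON and lists files unreachable from the core graph; the entry-point list is my assumption and would need to match your real marketing, auth, deployment, and dashboard roots:

const graph = require("./dependency-graph.json");

// Files treated as the roots of the "core graph": adjust to your app.
const roots = ["index.js"];

// Breadth-first walk over the dependency edges.
const reachable = new Set(roots);
const queue = [...roots];
while (queue.length > 0) {
  const file = queue.shift();
  for (const dep of graph[file] ?? []) {
    if (!reachable.has(dep)) {
      reachable.add(dep);
      queue.push(dep);
    }
  }
}

// Anything madge knows about that was never reached is disconnected.
const disconnected = Object.keys(graph).filter((f) => !reachable.has(f));
console.log(disconnected);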



Where Does Context Come From?

Context comes from data elements and the patterns in their creation and use. Some patterns are standardized (a function calling another function). Some are complex (like how weather patterns can affect call center volume). When you surface these patterns, you're giving the AI clues about your data that it often won't uncover on its own.

In the examples above, providing structured context (the dependency graph) cut the time by about 35% (2 minutes 5 seconds vs. 3 minutes 12 seconds) while producing a more accurate result (~8.5% disconnected files vs. the original estimate of 20-25% unused code).

Why It Matters

Interconnectedness in data is something we take for granted but rarely exploit. By unraveling those relationships and feeding them to the AI, you elevate it from a code generator or summarizer into a trustworthy partner that understands your data.

It’s not the model—it’s the data you give it.

Popular Tags:
AI Data Quality
Context Engineering