You already know the pattern: you ask for “structured output”, the model returns something that looks like JSON, and your downstream code still explodes.
Usually it’s not even a “hard failure”. It’s boring stuff:
- A field is missing (so your parser gets undefined and later code assumes a string).
- A key gets renamed (“location” vs “locations”).
- The model wraps JSON in extra prose (“Here’s the result:” + JSON + “Hope this helps”).
- A value changes shape (string vs array, number vs string, etc.).
This is the real problem behind LLM structured output validation: the model isn’t your type system. Your application logic is.
The naive approach: “just parse the JSON” (until it doesn’t)
Here’s the most common implementation I’ve seen in tool-calling and extraction pipelines: grab something JSON-y, then JSON.parse.
// Naive parsing: try to find the first {...} block and parse it
function parseModelOutput(raw) {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) throw new Error("No JSON object found");
  return JSON.parse(match[0]);
}

// Later
const data = parseModelOutput(modelText);
if (data.status === "success") {
  doSomethingWith(data.result.link); // boom when fields drift
}
This fails in ways that look trivial in isolation and catastrophic in production.
Failure mode 1: trailing commentary
// Model output you might see:
const modelText = `{
  "status": "success",
  "result": { "link": "https://example.com/x" }
}
Note: this was generated automatically.`;
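The regex is greedy: it grabs everything from the first { to the last }. For this particular output that happens to be the right span, so you’re fine. But if the trailing prose contains a brace, or the model outputs multiple JSON blocks, the match covers text you never intended, and you either fail to parse or parse the wrong thing.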
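To make the greedy-match problem concrete, here is a minimal sketch. The model output is an assumed example (valid JSON followed by a note that happens to contain braces); the pattern is the same one parseModelOutput uses.

```javascript
// Same greedy pattern as in parseModelOutput.
const greedy = /\{[\s\S]*\}/;

// Assumed model output: a valid object, then prose that contains "{}".
const noisyText = '{"status":"success"}\nNote: empty results are returned as {}.';

// The match runs from the first "{" to the LAST "}", so it swallows the note.
const span = noisyText.match(greedy)[0];

let parseError = null;
try {
  JSON.parse(span);
} catch (e) {
  parseError = e;
}

console.log(span.includes("Note:"));            // true
console.log(parseError instanceof SyntaxError); // true
```

The object itself was fine; the extraction step is what broke it.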
Failure mode 2: “almost JSON”
const modelText = `{
  "status": success,
  "result": { "link": "https://example.com/x" }
}`;
JSON.parse throws. You catch it, maybe you retry, maybe you fall back to regex extraction of individual fields. That’s how you end up with partially correct objects and subtle downstream bugs.
Failure mode 3: hallucinated keys and missing fields
const modelText = `{
  "state": "success",
  "result": { "url": "https://example.com/x" }
}`;
This parses. It’s also wrong. If you only parse, you never learn that the status and link fields drifted. Your app runs, but it’s not doing what you think it’s doing.
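A short sketch of what “runs, but wrong” looks like with that drifted output. The variable names here are illustrative only.

```javascript
// The drifted output from failure mode 3, as a single string.
const driftedText = '{"state":"success","result":{"url":"https://example.com/x"}}';

// Parsing succeeds: this is perfectly valid JSON.
const drifted = JSON.parse(driftedText);

// The success branch is skipped without any error, because "status" is absent.
let handled = false;
if (drifted.status === "success") {
  handled = true;
}

console.log(drifted.status); // undefined
console.log(handled);        // false
```

Nothing throws and nothing is logged. The request simply falls through, which is harder to notice than a crash.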
What JSON Schema buys you: a contract, not a guess
When you validate with a schema, you stop treating model output like “text that might be right”. You treat it like a typed object that must satisfy rules:
- Required fields must exist.
- Optional fields can be absent, but if present they must match types.
- Type constraints prevent “string where a number belongs”.
- Enums prevent free-form text where you need a closed set.
- Nesting and arrays validate deep structure.
This is exactly the kind of discipline you want for structured outputs, tool calling, and LLM output parsing where the downstream consumer is strict.
OpenAI’s structured outputs guidance and Anthropic’s emphasis on reliable structured formats both boil down to the same engineering lesson: define constraints, then enforce them. JSON Schema is a good way to encode those constraints in a form you can test.
References:
- https://platform.openai.com/docs/guides/structured-outputs
- https://json-schema.org/learn/getting-started-step-by-step
A working example: validate model output before using it
Assume you want the model to produce an object like:
- status: one of "success" or "error"
- result: present only for success, with a link (string URL)
- error: present only for error, with a message
- issues: optional array of strings
Here’s a JSON Schema that expresses that contract.
// schema.js
export const schema = {
  type: "object",
  additionalProperties: false,
  required: ["status"],
  properties: {
    status: { type: "string", enum: ["success", "error"] },
    result: {
      type: "object",
      additionalProperties: false,
      required: ["link"],
      properties: {
        link: { type: "string", format: "uri" }
      }
    },
    error: {
      type: "object",
      additionalProperties: false,
      required: ["message"],
      properties: {
        message: { type: "string", minLength: 1 }
      }
    },
    issues: {
      type: "array",
      items: { type: "string" }
    }
  },
  // Enforce conditional presence
  allOf: [
    {
      if: { properties: { status: { const: "success" } } },
      then: { required: ["result"], not: { required: ["error"] } }
    },
    {
      if: { properties: { status: { const: "error" } } },
      then: { required: ["error"], not: { required: ["result"] } }
    }
  ]
};

Now validate a model response. This example uses Ajv (a common JSON Schema validator for Node.js), but the pattern works with any validator.
// validate.js
import Ajv from "ajv";
// Ajv v8 does not bundle format validators; the ajv-formats plugin supplies "uri".
import addFormats from "ajv-formats";
import { schema } from "./schema.js";

const ajv = new Ajv({ allErrors: true, strict: false });
addFormats(ajv);
const validate = ajv.compile(schema);

export function parseAndValidateLLM(rawText) {
  let obj;
  // Keep parsing separate from validation.
  // If the model output isn't JSON, we fail fast.
  try {
    obj = JSON.parse(rawText);
  } catch (e) {
    throw new Error(`Invalid JSON from model: ${e.message}`);
  }

  const ok = validate(obj);
  if (!ok) {
    // Important: don't silently continue with invalid objects.
    const details = validate.errors?.map(err => ({
      path: err.instancePath,
      message: err.message
    }));
    throw new Error(`Schema validation failed: ${JSON.stringify(details)}`);
  }
  return obj;
}

// Usage (doSomethingWith and logError stand in for your own application code)
function handleResponse(modelText) {
  const data = parseAndValidateLLM(modelText);
  if (data.status === "success") {
    // Safe: result.link exists and is a URI string
    doSomethingWith(data.result.link);
  } else {
    logError(data.error.message);
  }
}

Notice the ordering: parse first, then validate. Validation can’t help you if you can’t parse.
Also notice additionalProperties: false. That single setting kills a whole class of hallucinated-key bugs. If the model returns {"statuz":"success"}, you don’t quietly ignore it—you reject it.
Schema design mechanics that matter in practice
These are the knobs you’ll actually use when building JSON Schema for LLM workflows.
Required vs optional fields
If a field is essential for downstream logic, put it in required. If it’s optional, don’t. But remember: optional doesn’t mean “any type”. If it exists, it still needs a type.
Type constraints and shape validation
Constrain type aggressively:
- string for IDs and URLs
- integer/number for numeric values
- array with items for lists
- object with properties for nested data
If the model is likely to return the wrong type (it happens), schema validation will catch it. Parsing won’t.
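A minimal illustration of why parsing alone misses type drift. The count field is a made-up example, not part of the schema above.

```javascript
// Assumed model output: the number arrived as a string.
const payload = JSON.parse('{"count":"3"}'); // parses without complaint

// Downstream arithmetic silently turns into string concatenation.
const next = payload.count + 1;

console.log(typeof payload.count); // string
console.log(next);                 // 31 (the string "31", not the number 4)
```

A schema entry such as count: { type: "integer" } would reject this object at the boundary instead of letting "31" travel further into the application.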
Enums for “don’t be creative” fields
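Use enum for fields like status, action, intent, source. LLMs are extremely good at producing plausible-but-wrong labels. An enum can’t make a label right, but it guarantees the value is one of the options your code actually handles; anything else fails validation.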
Arrays and nested objects
For arrays, always define items. For nested objects, define their own schema (including additionalProperties if you can).
// Example: tags is an array of allowed strings
tags: {
  type: "array",
  items: { type: "string", enum: ["bug", "feature", "question"] }
}
Edge cases you should assume will happen
Schema validation doesn’t magically fix everything. It makes failures explicit.
Trailing commentary and extra prose
Don’t validate raw text. Extract or isolate the JSON first (or use the model’s structured-output mode if available). If you keep regex-parsing the response, you’re back to the naive failure modes.
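If structured-output mode isn’t available, one way to isolate the JSON without a regex is to scan for the first balanced object while respecting string literals. This is a sketch, not a full parser: extractFirstJsonObject is a hypothetical helper name, and it assumes the payload is a JSON object and that any prose before it contains no { character.

```javascript
// Hypothetical helper: returns the first balanced {...} span in `text`, or null.
function extractFirstJsonObject(text) {
  const start = text.indexOf("{");
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text[i];

    if (inString) {
      // Inside a string literal, braces do not count; only track escapes and the closing quote.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }

    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // braces never balanced: treat as "no JSON found"
}

// Usage: surrounding prose and braces inside strings are both handled.
const wrapped = 'Here is the result: {"a":{"b":"}"}} Hope {this} helps';
console.log(extractFirstJsonObject(wrapped)); // {"a":{"b":"}"}}
```

The extracted span still goes through JSON.parse and schema validation afterwards; the helper only decides where the candidate object starts and ends.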
Invalid JSON
Schema validation can’t run if JSON.parse fails. In that case you should reject and retry with a prompt that asks for “JSON only” (or whatever constraint your model supports).
Hallucinated keys
This is where additionalProperties: false helps a lot. Without it, a model can return extra fields and still pass a “soft” schema.
Partially correct responses
You’ll see objects where half the fields match and the rest drift. Schema validation catches mismatches precisely, instead of letting your app stumble later.
Schema drift after prompt changes
If you change prompts, you change distributions. Eventually, the model stops producing the old shape. Treat schema changes like API changes: version them, test them, and don’t “just patch” your parser to accept the new drift.
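One lightweight way to treat schemas like API versions is to key them by the prompt version they were tested against. The sketch below is an assumption about how you might organize this; the version labels, schemasByPromptVersion, and schemaForPrompt are all hypothetical names.

```javascript
// Hypothetical registry: each prompt version maps to the schema it was tested with.
const schemasByPromptVersion = new Map([
  ["v1", {
    type: "object",
    required: ["status"],
    properties: { status: { type: "string", enum: ["success", "error"] } }
  }],
  ["v2", {
    type: "object",
    required: ["status", "issues"],
    properties: {
      status: { type: "string", enum: ["success", "error"] },
      issues: { type: "array", items: { type: "string" } }
    }
  }]
]);

// Fail loudly if a prompt ships without a matching schema.
function schemaForPrompt(promptVersion) {
  const schema = schemasByPromptVersion.get(promptVersion);
  if (!schema) {
    throw new Error(`No schema registered for prompt version "${promptVersion}"`);
  }
  return schema;
}

console.log(schemaForPrompt("v2").required); // [ 'status', 'issues' ]
```

A prompt change then forces an explicit decision: reuse an existing schema version or register a new one, rather than loosening the parser.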
One practical closing note: validate, then retry with a correction prompt
The simplest reliable pattern is:
- Ask the model for structured output.
- Parse JSON.
- Validate against JSON Schema.
- If validation fails, don’t keep going. Retry with the validation error details (or a short “fix the JSON to match the schema exactly” correction prompt).
It’s not glamorous. It works. Most importantly, it keeps your downstream parsing logic honest: either you get a contract-valid object, or you fail fast and recover.
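The loop described above can be sketched as follows. Everything about the interfaces is an assumption: callModel(prompt) stands for whatever client you use and is expected to resolve to the raw model text, and validate(obj) is an adapter returning { ok, errors } with errors as an array of strings. The correction prompt wording is illustrative.

```javascript
// Hypothetical signatures (assumptions, not a real SDK):
//   callModel(prompt) -> Promise<string>              raw model text
//   validate(obj)     -> { ok: boolean, errors: string[] }
async function getValidatedObject(callModel, validate, prompt, maxAttempts = 3) {
  let currentPrompt = prompt;
  let lastProblem = "no attempts made";

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const rawText = await callModel(currentPrompt);

    // Step 1: parse. A parse failure gets its own correction prompt.
    let obj;
    try {
      obj = JSON.parse(rawText);
    } catch (e) {
      lastProblem = `invalid JSON: ${e.message}`;
      currentPrompt = `${prompt}\n\nYour previous reply was not valid JSON. Reply with JSON only.`;
      continue;
    }

    // Step 2: validate. Only a contract-valid object leaves this function.
    const { ok, errors } = validate(obj);
    if (ok) return obj;

    // Step 3: retry, feeding the validation errors back to the model.
    lastProblem = `schema errors: ${errors.join("; ")}`;
    currentPrompt =
      `${prompt}\n\nYour previous reply failed validation: ${errors.join("; ")}. ` +
      "Fix the JSON to match the schema exactly.";
  }

  // Bounded retries: fail fast instead of looping forever.
  throw new Error(`No valid output after ${maxAttempts} attempts (${lastProblem})`);
}
```

With Ajv, the validate adapter could wrap the compiled validator and map its error objects to strings. Capping the attempts matters: a model that keeps producing the same invalid shape should surface as an error, not as an endless retry.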