Your agent does the wrong thing because it was never told the right thing

That's specification failure, and once the model is good enough, it's the failure that's left. Kinesthetic makes the specification a first-class asset: the rules, edge cases, and worked examples that your team authors and corrects, and that your agent retrieves at inference time instead of reading them from one bloated prompt. A spec you can operate is a durable asset that compounds.

Request Early Access · Read the thesis →

From prompts to knowledge base retrieval

Instead of handing the model one constant prompt, migrate that information (instructions, tool descriptions, skills, plus annotated traces) into a separate knowledge base. At inference time, only the artifacts relevant to the current input are placed in context.

From prompt to retrieval

Input: “I was charged twice for my subscription this month.”

– Before · everything in the prompt

System prompt

+ refund policy · tone guide · 40 tools · edge cases · escalation rules…

≈ 4,200 tokens · sent on every call

✓ With Kinesthetic · retrieve what's relevant

Retrieved artifact

Duplicate-charge refund policy

The one artifact this input actually needs, pulled from the KB and placed in context.

≈ 180 tokens · this call

Everything else stays in the knowledge base, out of context until an input calls for it.
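To make the comparison above concrete, here is a minimal arithmetic sketch. The two token counts are the ones shown in the example; the monthly call volume is a hypothetical number chosen only for illustration, and the calculation ignores the user message and any tokens both setups share.

```python
# Token figures come from the before/after comparison above.
TOKENS_EVERYTHING_IN_PROMPT = 4_200   # sent on every call
TOKENS_RETRIEVED_ARTIFACT = 180       # the single artifact this input needed

# Hypothetical traffic volume, chosen only to make the arithmetic concrete.
CALLS_PER_MONTH = 1_000_000


def input_token_savings(before: int, after: int, calls: int) -> dict:
    """Compare instruction tokens sent per call before and after retrieval."""
    saved_per_call = before - after
    return {
        "saved_per_call": saved_per_call,
        "saved_total": saved_per_call * calls,
        "reduction": saved_per_call / before,
    }


savings = input_token_savings(
    TOKENS_EVERYTHING_IN_PROMPT, TOKENS_RETRIEVED_ARTIFACT, CALLS_PER_MONTH
)
print(f"{savings['saved_per_call']} fewer instruction tokens on this call")
print(f"{savings['reduction']:.1%} reduction for this input")
```

For this one input that works out to 4,020 fewer instruction tokens, roughly a 95.7% reduction. Other inputs may retrieve more than one artifact, so the per-call figure will vary.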

Why move it out of the prompt

Better quality

Minimized context-rot risk. The model no longer reasons through what is and isn't relevant on every call; it works from a small, precise context.

Lower cost

Far fewer input tokens processed per call, because the bulk of the specification never enters context unless an input requires it.

Auditability

Every output traces back to the specific instructions retrieved to produce it. Specification failures triage to a single artifact.

Scalable scope

Scale the complexity of behavior your system handles by scaling the knowledge one agent can tractably leverage, without bloating any prompt.

Clean multi-tenant & multi-agent generalization

Knowledge is shared without duplication and overridden only where necessary, so tenants and agents diverge exactly as much as they need to, and no more.

Shared by default, scoped where it differs

Shared · escalation

“Escalate a dispute to a human agent when it exceeds the customer's tier limit.”

Applies to every tenant.

Tenant B · escalation

“Escalate only above $5,000. Resolve smaller disputes automatically.”

Applies to Tenant B only.

A request under Tenant A → gets the shared artifact

A request under Tenant B → gets Tenant B's artifact · most specific scope wins

Each is a standalone natural-language instruction, authored on its own; neither references the other. They differ only in who they're scoped to. At retrieval the most specific scope in play takes precedence, so knowledge is shared by default and overridden only where a tenant actually needs it, with no partially-repeated prompts to keep in sync.
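The precedence rule described above can be sketched in a few lines. This is an illustrative model only, not Kinesthetic's implementation: the `Artifact` class, the `resolve` function, and the tenant ids `tenant-a` and `tenant-b` are all hypothetical names, and real scoping may have more levels than shared versus tenant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    name: str
    scope: Optional[str]  # None means shared; otherwise a (hypothetical) tenant id
    text: str


KB = [
    Artifact("escalation", None,
             "Escalate a dispute to a human agent when it exceeds "
             "the customer's tier limit."),
    Artifact("escalation", "tenant-b",
             "Escalate only above $5,000. Resolve smaller disputes automatically."),
]


def resolve(kb: list[Artifact], name: str, tenant: str) -> Optional[Artifact]:
    """Return the most specific artifact visible to this tenant, if any."""
    visible = [a for a in kb if a.name == name and a.scope in (None, tenant)]
    if not visible:
        return None
    # A tenant-scoped artifact (True) outranks a shared one (False).
    return max(visible, key=lambda a: a.scope is not None)


for tenant in ("tenant-a", "tenant-b"):
    chosen = resolve(KB, "escalation", tenant)
    print(tenant, "->", "shared" if chosen.scope is None else chosen.scope)
```

Neither artifact mentions the other. The override falls out of the scope comparison alone, which is why there is no second copy of the shared text to keep in sync.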

Triage a failure to the instruction behind it

Searching a prompt finds a string. It can't tell you which instruction actually drove a decision, and the one responsible often doesn't mention the symptom at all. When several instructions conflict, nothing tells you which one won. Provenance records exactly what was in context and what the model acted on.

One flagged response, traced to its source

Agent · session 4471 · ⚑ Flagged incorrect

“I've approved a full refund of $4,200 to your account. You'll see it in 3–5 business days.”

↓ traced to the knowledge that was in context, and what it acted on

Context provenance · 6 artifacts retrieved · 2 decisive

retention-playbook v9

retention-playbook · Shared

“When a customer signals they'll cancel, resolve on the spot and favor goodwill over process.”

⚑ drove the decision

refund-eligibility v4

refund-eligibility · Shared

“Refunds over $1,000 require proof of error and manager approval.”

overridden

+ 4 more

tone-guide, kyc-checks, dispute-flow, account-lookup

Why search wouldn't find this

Grep “refund” and you land on refund-eligibility, which is correct, and conclude it's a model problem. The real cause is a retention instruction that never says “refund” quietly outranking it. Provenance shows both were in context and which one won, so you resolve the conflict once (scope retention below monetary actions) and validation replays every input where the two co-occur.
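One way to picture the provenance record for the flagged session is as plain data. The sketch below is an assumption about shape, not Kinesthetic's schema: `ContextEntry`, `Provenance`, the role labels, and `sessions_to_replay` are hypothetical names. Versions are filled in only where the example shows them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContextEntry:
    artifact: str
    version: Optional[str]  # None where the example does not show a version
    role: str               # hypothetical labels: "drove", "overridden", "present"


@dataclass(frozen=True)
class Provenance:
    session: int
    entries: tuple[ContextEntry, ...]

    def decisive(self) -> list[ContextEntry]:
        """Entries that took part in the decision, as winner or as loser."""
        return [e for e in self.entries if e.role != "present"]

    def contains(self, *artifacts: str) -> bool:
        """True if every named artifact was in context for this session."""
        return set(artifacts) <= {e.artifact for e in self.entries}


trace_4471 = Provenance(
    session=4471,
    entries=(
        ContextEntry("retention-playbook", "v9", "drove"),
        ContextEntry("refund-eligibility", "v4", "overridden"),
        ContextEntry("tone-guide", None, "present"),
        ContextEntry("kyc-checks", None, "present"),
        ContextEntry("dispute-flow", None, "present"),
        ContextEntry("account-lookup", None, "present"),
    ),
)


def sessions_to_replay(traces: list[Provenance], first: str, second: str) -> list[int]:
    """Sessions where both artifacts co-occurred, i.e. candidates for replay."""
    return [t.session for t in traces if t.contains(first, second)]


print([e.artifact for e in trace_4471.decisive()])
print(sessions_to_replay([trace_4471], "retention-playbook", "refund-eligibility"))
```

A text search over the prompt has no equivalent of the `role` field, which is the part that says which instruction won.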

Knowledge retrieval, as a service

A managed service that gives your agent the best knowledge context for each input at inference time. Connect in a few lines, then let the engine find the right configuration automatically, or turn the knobs yourself.

Explicit

Retrieve the context yourself, then pass it to any model.

# retrieve-and-inject: you get the context, pass it to any model
from openai import OpenAI
import kinesthetic

client = OpenAI()
kb = kinesthetic.connect("banking-support")

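# the incoming user input (example value)
user_message = "I was charged twice for my subscription this month."

# retrieve only the artifacts this input needs, as system messages
context = kb.retrieve(input=user_message)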

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        *context.as_messages(),   # e.g. refund-policy v3, tone-guide v7
        {"role": "user", "content": user_message},
    ],
)

Seamless

Or drop a decorator on your agent's completion call and change nothing else. The engine sees every call the way the agent does (the model, every tool, the full message history) and learns on its own when an input needs retrieval and what to inject, instead of you calling it explicitly.

# decorate your agent's completion call — once
from openai import OpenAI
import kinesthetic

client = OpenAI()

@kinesthetic.engine("banking-support")
def respond(messages, tools):
    # nothing else changes. on every call the engine reads the full
    # request and injects the right artifacts before it's sent
    return client.chat.completions.create(
        model="gpt-4o",         # whatever model the agent runs on
        tools=tools,            # every tool the agent can call
        messages=messages,      # the full prompt + message history
    )

A · Retrieve and inject

Call the retrieval API yourself and add the artifacts to your context, or decorate your completion call (above) and let the engine inject on every call it judges needs it. You stay on your own model and harness.

B · Expose as a tool

Register Kinesthetic as a tool or MCP server and let the agent call it on demand. Same retrieval engine, agent-driven instead of inline.
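For mode B, a tool registration might look roughly like the sketch below. It uses the Chat Completions function-tool format; the tool name `retrieve_knowledge`, its `query` parameter, and `stub_retrieve` are hypothetical stand-ins, since this page does not document the actual tool surface. The stub returns a canned artifact so the example runs without a network call.

```python
import json

# Hypothetical tool definition in the Chat Completions "tools" format.
RETRIEVE_TOOL = {
    "type": "function",
    "function": {
        "name": "retrieve_knowledge",  # assumed name, not a documented one
        "description": "Retrieve the specification artifacts relevant to a situation.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "What the agent is trying to decide or do.",
                },
            },
            "required": ["query"],
        },
    },
}


def stub_retrieve(query: str) -> list[dict]:
    """Stand-in for the retrieval engine; always returns one canned artifact."""
    return [{"artifact": "refund-policy", "text": "Reverse the duplicate charge in full."}]


def handle_tool_call(call_id: str, name: str, arguments: str, retrieve=stub_retrieve) -> dict:
    """Turn one tool call from the model into a tool-role message."""
    if name != RETRIEVE_TOOL["function"]["name"]:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments)  # tool arguments arrive as a JSON string
    return {
        "role": "tool",
        "tool_call_id": call_id,
        "content": json.dumps(retrieve(args["query"])),
    }


message = handle_tool_call("call_1", "retrieve_knowledge", '{"query": "charged twice"}')
print(message["content"])
```

The difference from mode A is who decides when to retrieve. Here the agent does, by choosing to call the tool, while the retrieval behind it stays the same.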

Onboarding is a one-time upload

Point us at your existing prompts or a legacy knowledge base and we ingest them once. Connect with whichever mode fits your harness, and you're retrieving. No re-platforming, no rewrite of your stack.

What teams hand-roll

Agentic RAG over raw content

Shell + embedding similarity + keyword match, or an off-the-shelf memory module. None of it is designed for agent specification data, so it over-retrieves and misses.

What Kinesthetic does

Retrieval built for specification

A knowledge engine that significantly outperforms hand-rolled approaches and generic memory, because it's purpose-built for the structure of agent instructions. Read the research →

Ergonomic interactions with your agent's specification

By disentangling the specification from the agent harness and engineering stack, into a structure intentionally designed for it, we can build powerful, user-friendly tools for the whole team to understand and shape agent behavior. Engineers, SMEs, and product people work on the same artifacts.

Kinesthetic · Project · banking-support · Knowledge base

Knowledge base · describe what you're investigating… · Agentic

Artifact	Description	Type	Scope	Updated	Version
refund-policy	Duplicate-charge handling	Ground truth	Shared	2d ago	v3
escalation	Escalation policy	Ground truth	Tenant B	5d ago	v9
refund-index	Retrieval map over refund docs	Synthesized	Shared	2d ago	auto
dispute-flow	Chargeback procedure	Ground truth	Agent	1w ago	v2
dispute-playbook	Distilled from 3 artifacts	Synthesized	Agent	1w ago	auto

Version history

refund-policy

human-authored · ground truth

v3 · current

Full reversal for duplicate charges.

M. Reyes · 2d ago

from flagged trace #4471

Added partial-refund clause.

agent proposal · merged

Imported from system prompt.

onboarding

−offer a partial refund of the duplicate

+reverse the duplicate charge in full

↻ On merge · 2 synthesized artifacts regenerated from this change: refund-index, dispute-playbook

Artifact-level version control

The knowledge base is managed like a git repo. Artifact-level history, branching, pull requests, and automated merge guardrails all fit naturally, so a change to how the agent behaves is a reviewable diff, not a mystery edit to a prompt.
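As a rough illustration of what a reviewable diff means for an artifact, the sketch below uses Python's standard `difflib` on the two refund-policy lines from the version history above. The `(before)` and `(after)` labels are placeholders, and this is not a claim about how Kinesthetic stores or renders history.

```python
import difflib

# The two lines shown in the refund-policy version history above.
before = "offer a partial refund of the duplicate"
after = "reverse the duplicate charge in full"

diff = list(
    difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="refund-policy (before)",  # placeholder label
        tofile="refund-policy (after)",     # placeholder label
        lineterm="",
    )
)
print("\n".join(diff))
```

Because each artifact is a small standalone document, the diff a reviewer sees is the behavioral change itself, without unrelated prompt text around it.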

Query, and every other action

Search it like a document store, or run the same retrieval process your agent uses to surface relevant artifacts for an investigation. Every action can be done through code, through our agent, or by hand.

Kinesthetic · Project · banking-support · Agent

You

Stop auto-approving large refunds when a customer is threatening to cancel. Retention shouldn't override the eligibility check.

Kinesthetic agent

Two artifacts govern this. retention-playbook v9 currently outranks refund-eligibility v4 with no monetary bound. I'll scope retention below money movements and add an explicit precedence note:

retention-playbook · proposed · v10

−resolve on the spot and favor goodwill over process.

+favor goodwill over process, except actions that move money

+(refunds, credits), which defer to their own eligibility policy.

Review & apply · Edit by hand · replays inputs where both co-occur

Describe a change, or ask what governs a behavior…↑

Anywhere: CLI · API · MCP · Agent · UI

Query

keyword · semantic · agentic

Search like a document store, or give our agent a freeform description of what you're figuring out: it kicks off multiple queries to recall anything relevant and synthesizes an answer directly.

Edit

manual · agent-proposed

Modify or create artifacts by hand. Or describe the net-new or modified behavior you want and let our agent propose the changes as a reviewable diff.

Annotate

never throw work away

Once you've gone through a trace, mark it in/correct and optionally add feedback, free-form or in any structure your team already uses. The work you do inspecting behavior is never lost.

Validate

across KB versions

Run sampled, synthesized, and user-defined inputs against different KB versions to surface behavioral changes, and run automated consistency checks to catch contradictory instructions.

Export

fully portable

Kinesthetic KBs are fully portable. Your data is never held hostage, sold, or used as training data. It stays yours, and you can take it with you at any time.
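The Validate step above can be pictured as replaying the same inputs against two KB versions and reporting where behavior changes. The sketch below is a toy model under loud assumptions: `respond_with_v9` and `respond_with_v10` are hand-written stand-ins for an agent running against each version of retention-playbook, the action labels are invented, and the second input is a made-up example.

```python
from typing import Callable

Responder = Callable[[str], str]


def respond_with_v9(message: str) -> str:
    """Toy stand-in: retention outranks eligibility when a customer may cancel."""
    if "cancel" in message.lower():
        return "approve_refund"
    return "check_eligibility"


def respond_with_v10(message: str) -> str:
    """Toy stand-in: money movements always defer to the eligibility policy."""
    return "check_eligibility"


def behavioral_changes(inputs: list[str], old: Responder, new: Responder) -> list[dict]:
    """Replay each input against both versions and keep only the differences."""
    changes = []
    for text in inputs:
        before, after = old(text), new(text)
        if before != after:
            changes.append({"input": text, "before": before, "after": after})
    return changes


replay_inputs = [
    "I was charged twice for my subscription this month.",
    "Refund my $4,200 or I'll cancel today.",  # hypothetical input
]

changes = behavioral_changes(replay_inputs, respond_with_v9, respond_with_v10)
for change in changes:
    print(change["input"], ":", change["before"], "->", change["after"])
```

Only the cancellation input changes behavior here, which is the kind of targeted difference a reviewer would want to see before merging the proposed edit.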

SOTA knowledge retrieval tech

Much of how Kinesthetic makes agents work well comes from synthesizing and enriching the artifacts an agent uses at inference time. These machine-authored artifacts are created when Kinesthetic learns what the agent does in production, or through offline inspection, and they are always treated as derived from human-authored ground truth.

The authority gradient · KB → synthesized artifacts → context

Authority · human

Ground-truth artifacts

Edge-case docs, annotated traces. Authored and owned by people.

↓ synthesized artifacts derive from ground truth

Derived · machine

Synthesized artifacts

Indexes, maps, and distilled procedural playbooks built for agent use.

→ retrieves

Inference

Your agent

Retrieves from both, getting the right context for the scenario it faces.

Humans handle only model-invariant knowledge, namely how the task needs to be performed. The automated layer that adapts that knowledge to your current model regenerates synthesized artifacts underneath: edit ground truth, and anything derived that now conflicts is rebuilt.

Ground-truth artifacts

Human-authored, the source of truth

Unstructured documents, like how a specific edge case should be handled, plus annotated traces: in/correct, with feedback in any format.

Synthesized artifacts

Machine-derived, for agent use

An index or map over ground-truth artifacts, or a procedural guide distilled from several. All inspectable for correctness.
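The relationship between the two artifact kinds can be sketched as a small dependency map. This is a simplification: it marks a synthesized artifact stale whenever one of its sources changes, rather than detecting actual conflicts. The source lists are assumptions too. The page shows that dispute-playbook is distilled from three artifacts but not which ones, so the three named here are guesses for illustration.

```python
# Which ground-truth artifacts each synthesized artifact derives from.
# These source sets are assumed for illustration, not taken from the product.
DERIVED_FROM = {
    "refund-index": {"refund-policy"},
    "dispute-playbook": {"refund-policy", "dispute-flow", "escalation"},
}


def stale_after(changed: str, derived_from: dict = DERIVED_FROM) -> list[str]:
    """Synthesized artifacts to rebuild after a ground-truth artifact changes."""
    return sorted(
        name for name, sources in derived_from.items() if changed in sources
    )


print(stale_after("refund-policy"))
print(stale_after("dispute-flow"))
```

Under these assumed sources, an edit to refund-policy flags both refund-index and dispute-playbook, which matches the two regenerated artifacts shown in the version-history example earlier on the page.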

Why purpose-built retrieval wins

Failure modes of vanilla methods vs. optimal behavior

✕ Vanilla failure

Over-retrieval

Semantically similar yet irrelevant documents get pulled in, crowding context with near-misses and re-introducing the rot you migrated away from.

✕ Vanilla failure

Recall miss

The document that mattered is poorly matched by embedding or grep on its raw content, so it's never retrieved, and the agent acts without it.

→

The shared root cause

Both come from generic similarity search over raw content. Kinesthetic instead retrieves over specification data and the artifacts it derives from your ground truth, rather than treating your knowledge as an undifferentiated pile of text. How much that helps against these baselines is what our research measures. Read the research →

The flywheel: your system learns by doing

None of this is a one-time setup. Once your specification is a knowledge base your team operates, Kinesthetic becomes a closed-loop system that improves by running, and the work compounds. The same structure that makes failures auditable turns every correction into durable, proprietary value.

Run in production

The agent handles real inputs against the current specification, retrieving what each one needs.

→

Triage & annotate

Failures map cleanly to specific artifacts. Mark traces in/correct and leave feedback in any format.

→

Correct the spec

Experts edit ground-truth artifacts as reviewable diffs. Fixes are auditable, verifiable, and owned.

→

Regenerate & retrieve

Synthesized artifacts rebuild from the new ground truth, so the next inputs retrieve sharper context.

↺ And it compounds. Every pass leaves the specification sharper and the data behind it richer, so the system gets measurably better the more it runs, not just the day you ship it.

The data layer

Every expert correction isn't just a prompt change. It also produces corrected agentic trace data: high-quality, domain-grounded examples of how the task should be done. It's the policy data you'd train a model on tomorrow, the kind no foundation model has and no competitor can replicate.

→ Compounds

Cost and capability gains

You stop depending on frontier inference to stay ahead, and get to decide what you actually need general frontier capability for, what to run on cheaper models, and what to own outright.

→ Accrues

A proprietary data asset

The corrections accumulate into domain-grounded data in a format you can actually operationalize, the kind of asset that no off-the-shelf setup is capturing for you today.

→ Defends

A lead that compounds

Your agent runs on a near-complete specification of how the task should be done, in a format that outlives model and harness churn. You touch it only to add or change behavior, never to re-coax each new model into using it well. That's how a system improves faster than the models underneath it, while a competitor's year-old prompt is already archaeology.

Where Kinesthetic fits

Not every AI system needs Kinesthetic. We built it for the class of systems where specifying how the AI should behave exceeds what vanilla knowledge implementations can handle and creates operational challenges for the people shaping it.

AI system taxonomy

Literal wrapper

Model deployed in an environment with a basic system prompt that adds some awareness of your use case.

Simple workflow

One narrowly-scoped procedure, low input variability. Short set-and-forget prompting plus few-shot works well. May be chained or orchestrated by deterministic logic.

Dynamic AI · Kinesthetic

A wide range of inputs needs many domain-specific instructions. Everything-in-prompt causes context rot; vanilla retrieval underperforms.

Dynamic AI at scale · Kinesthetic

Manual KB management becomes intractable; specialized tooling is required to inspect & safely modify knowledge. Annotations become the primary improvement interface.

You don't need to already be operating Dynamic AI to get value from Kinesthetic. If you're building for a task where it's the natural fit, but your system is still stuffed into a constraining architecture because of what's been possible so far, you can use Kinesthetic to start scaling the complexity of what your agent can learn to do.

Building a domain-specific AI system? Let's talk.

Request Early Access