Kinesthetic

Your agent does the wrong thing because it was never told the right thing

That's specification failure, and once the model is good enough, it's the failure that's left. Kinesthetic makes the specification a first-class asset: the rules, edge cases, and worked examples your team authors and corrects, and your agent retrieves from at inference time instead of from one bloated prompt. A spec you can operate is a durable asset that compounds.

01

From prompts to knowledge base retrieval

Instead of handing the model one constant prompt, migrate that information (instructions, tool descriptions, skills, plus annotated traces) into a separate knowledge base. At inference time, only the artifacts relevant to the current input are placed in context.

From prompt to retrieval
Input: “I was charged twice for my subscription this month.”
Before · everything in the prompt
System prompt
+ refund policy · tone guide · 40 tools · edge cases · escalation rules…
≈ 4,200 tokens · sent on every call
With Kinesthetic · retrieve what's relevant
Retrieved artifact · Duplicate-charge refund policy
The one artifact this input actually needs, pulled from the KB and placed in context.
≈ 180 tokens · this call
Everything else stays in the knowledge base, out of context until an input calls for it.
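As a rough worked example of the figure above, the sketch below uses only its two token counts. The call volume is a made-up assumption, and any short base prompt that stays in place is ignored, so treat the result as illustrative rather than a measured saving.

```python
# Token counts come from the figure above; CALLS_PER_DAY is hypothetical.
PROMPT_TOKENS_BEFORE = 4200   # everything in the prompt, sent on every call
CONTEXT_TOKENS_AFTER = 180    # the one retrieved artifact for this input
CALLS_PER_DAY = 10_000        # assumed volume, for illustration only


def input_token_reduction(before: int, after: int) -> float:
    """Fraction of per-call input tokens no longer sent."""
    return 1 - after / before


def daily_tokens_saved(before: int, after: int, calls_per_day: int) -> int:
    """Input tokens no longer processed per day at the assumed volume."""
    return (before - after) * calls_per_day


reduction = input_token_reduction(PROMPT_TOKENS_BEFORE, CONTEXT_TOKENS_AFTER)
saved = daily_tokens_saved(PROMPT_TOKENS_BEFORE, CONTEXT_TOKENS_AFTER, CALLS_PER_DAY)

print(f"{reduction:.1%} fewer input tokens on this call")   # 95.7%
print(f"{saved:,} input tokens saved per day")              # 40,200,000
```

Other inputs will retrieve different artifacts, so the per-call figure varies with what each input needs.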

Why move it out of the prompt

01
Better quality
Minimized context-rot risk. The model no longer reasons through what is and isn't relevant on every call; it works from a small, precise context.
02
Lower cost
Far fewer input tokens processed per call, because the bulk of the specification never enters context unless an input requires it.
03
Auditability
Every output traces back to the specific instructions retrieved to produce it. Specification failures triage to a single artifact.
04
Scalable scope
Scale the complexity of behavior your system handles by scaling the knowledge one agent can tractably leverage, without bloating any prompt.
05
Clean multi-tenant & multi-agent generalization
Knowledge is shared without duplication and overridden only where necessary, so tenants and agents diverge exactly as much as they need to, and no more.
Shared by default, scoped where it differs
Shared · escalation
“Escalate a dispute to a human agent when it exceeds the customer's tier limit.”
Applies to every tenant.
Tenant B · escalation
“Escalate only above $5,000. Resolve smaller disputes automatically.”
Applies to Tenant B only.
A request under Tenant A gets the shared artifact
A request under Tenant B gets Tenant B's artifact · most specific scope wins
Each is a standalone natural-language instruction, authored on its own; neither references the other. They differ only in who they're scoped to. At retrieval the most specific scope in play takes precedence, so knowledge is shared by default and overridden only where a tenant actually needs it, with no partially-repeated prompts to keep in sync.
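The precedence rule described above can be sketched in a few lines. This is a minimal illustration, not Kinesthetic's implementation: the `Artifact` class, the `resolve` function, and the two-level scope model (shared vs. one tenant) are all assumptions made for the example, and a real knowledge base may have more scope levels.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Artifact:
    topic: str                    # what the instruction governs, e.g. "escalation"
    text: str                     # the standalone natural-language instruction
    tenant: Optional[str] = None  # None means shared across all tenants


KB: List[Artifact] = [
    Artifact("escalation",
             "Escalate a dispute to a human agent when it exceeds the customer's tier limit."),
    Artifact("escalation",
             "Escalate only above $5,000. Resolve smaller disputes automatically.",
             tenant="Tenant B"),
]


def resolve(topic: str, tenant: str, kb: List[Artifact] = KB) -> Artifact:
    """Return the most specifically scoped artifact for this topic and tenant."""
    candidates = [a for a in kb if a.topic == topic and a.tenant in (None, tenant)]
    if not candidates:
        raise LookupError(f"no artifact for topic {topic!r}")
    # A tenant-scoped artifact (True) outranks a shared one (False).
    return max(candidates, key=lambda a: a.tenant is not None)


print(resolve("escalation", "Tenant A").text)  # the shared instruction
print(resolve("escalation", "Tenant B").text)  # Tenant B's override
```

Neither artifact refers to the other; the override falls out of the scope comparison alone.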

Triage a failure to the instruction behind it

Searching a prompt finds a string. It can't tell you which instruction actually drove a decision, and the one responsible often doesn't mention the symptom at all. When several instructions conflict, nothing tells you which one won. Provenance records exactly what was in context and what the model acted on.

One flagged response, traced to its source
Agent · session 4471 · ⚑ Flagged incorrect
“I've approved a full refund of $4,200 to your account. You'll see it in 3–5 business days.”
Context provenance · 6 artifacts retrieved · 2 decisive
retention-playbook v9
retention-playbook · Shared
“When a customer signals they'll cancel, resolve on the spot and favor goodwill over process.”
⚑ drove the decision
refund-eligibility v4
refund-eligibility · Shared
“Refunds over $1,000 require proof of error and manager approval.”
overridden
+ 4 more
tone-guide, kyc-checks, dispute-flow, account-lookup
Why search wouldn't find this
Grep “refund” and you land on refund-eligibility, which is correct, and conclude it's a model problem. The real cause is a retention instruction that never says “refund” quietly outranking it. Provenance shows both were in context and which one won, so you resolve the conflict once (scope retention below monetary actions) and validation replays every input where the two co-occur.
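To make the contrast concrete, here is a small sketch of what a provenance record could look like and why a string search over it misses the cause. The `RetrievedArtifact` structure and its `role` values are hypothetical, chosen to mirror the labels in the figure; the artifact names and texts are the ones shown above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RetrievedArtifact:
    name: str
    version: int
    text: str
    role: str  # hypothetical labels: "drove_decision", "overridden", "present"


# The two decisive artifacts from session 4471, as shown in the figure.
provenance: List[RetrievedArtifact] = [
    RetrievedArtifact(
        "retention-playbook", 9,
        "When a customer signals they'll cancel, resolve on the spot "
        "and favor goodwill over process.",
        "drove_decision"),
    RetrievedArtifact(
        "refund-eligibility", 4,
        "Refunds over $1,000 require proof of error and manager approval.",
        "overridden"),
]


def grep(records: List[RetrievedArtifact], needle: str) -> List[str]:
    """Plain substring search over instruction text, as a prompt search would do."""
    return [r.name for r in records if needle.lower() in r.text.lower()]


def drove_decision(records: List[RetrievedArtifact]) -> List[str]:
    """Artifacts recorded as having driven the flagged response."""
    return [r.name for r in records if r.role == "drove_decision"]


print(grep(provenance, "refund"))   # ['refund-eligibility']
print(drove_decision(provenance))   # ['retention-playbook']
```

The search surfaces only the instruction that was overridden; the recorded role points at the one that actually won.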

02

Knowledge retrieval, as a service

A managed service that gives your agent the best knowledge context for each input at inference time. Connect in a few lines, then let the engine find the right configuration automatically, or turn the knobs yourself.

Explicit
Retrieve the context yourself, then pass it to any model.
# retrieve-and-inject: you get the context, pass it to any model
from openai import OpenAI
import kinesthetic

client = OpenAI()
kb = kinesthetic.connect("banking-support")

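# the incoming user input (example value from the figure above)
user_message = "I was charged twice for my subscription this month."

# retrieve only the artifacts this input needs, as system messages
context = kb.retrieve(input=user_message)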

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        *context.as_messages(),   # e.g. refund-policy v3, tone-guide v7
        {"role": "user", "content": user_message},
    ],
)
Seamless
Or drop a decorator on your agent's completion call and change nothing else. The engine sees every call the way the agent does (the model, every tool, the full message history) and learns on its own when an input needs retrieval and what to inject, instead of you calling it explicitly.
# decorate your agent's completion call, once
from openai import OpenAI
import kinesthetic

client = OpenAI()

@kinesthetic.engine("banking-support")
def respond(messages, tools):
    # nothing else changes. on every call the engine reads the full
    # request and injects the right artifacts before it's sent
    return client.chat.completions.create(
        model="gpt-4o",         # whatever model the agent runs on
        tools=tools,            # every tool the agent can call
        messages=messages,      # the full prompt + message history
    )
A · Retrieve and inject
Call the retrieval API yourself and add the artifacts to your context, or decorate your completion call (above) and let the engine inject on every call it judges needs it. You stay on your own model and harness.
B · Expose as a tool
Register Kinesthetic as a tool or MCP server and let the agent call it on demand. Same retrieval engine, agent-driven instead of inline.
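For mode B, a tool registration might look roughly like the sketch below. The schema follows the OpenAI function-calling format, but the tool name `retrieve_knowledge`, its `query` parameter, and the `handle_tool_call` dispatcher are hypothetical and not a documented Kinesthetic interface. The retrieval backend is passed in as a plain callable so the example runs on its own.

```python
import json
from typing import Callable

# Tool schema in the OpenAI function-calling format.
# Name and parameters are assumptions made for this example.
RETRIEVE_TOOL = {
    "type": "function",
    "function": {
        "name": "retrieve_knowledge",
        "description": "Fetch the instructions relevant to the situation described.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "What the agent needs guidance on.",
                },
            },
            "required": ["query"],
        },
    },
}


def handle_tool_call(name: str, arguments_json: str,
                     retrieve: Callable[[str], str]) -> str:
    """Dispatch a model-issued tool call to the retrieval backend."""
    if name != RETRIEVE_TOOL["function"]["name"]:
        raise ValueError(f"unknown tool: {name}")
    query = json.loads(arguments_json)["query"]
    return retrieve(query)


# Stand-in backend; in practice this would call the retrieval service.
def fake_retrieve(query: str) -> str:
    return f"[artifacts relevant to: {query}]"


print(handle_tool_call("retrieve_knowledge",
                       '{"query": "duplicate subscription charge"}',
                       fake_retrieve))
```

The schema would be passed in the `tools` list of the completion call, and the string returned by the handler goes back to the model as the tool result.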

Onboarding is a one-time upload

Point us at your existing prompts or a legacy knowledge base and we ingest them once. Connect with whichever mode fits your harness, and you're retrieving. No re-platforming, no rewrite of your stack.

What teams hand-roll
Agentic RAG over raw content
Shell + embedding similarity + keyword match, or an off-the-shelf memory module. None of it is designed for agent specification data, so it over-retrieves and misses.
What Kinesthetic does
Retrieval built for specification
A knowledge engine that significantly outperforms hand-rolled approaches and generic memory, because it's purpose-built for the structure of agent instructions. Read the research →

03

Ergonomic interactions with your agent's specification

By separating the specification from the agent harness and engineering stack, and giving it a structure intentionally designed for it, we can build powerful, user-friendly tools for the whole team to understand and shape agent behavior. Engineers, SMEs, and product people work on the same artifacts.

Kinesthetic · Project · banking-support · Knowledge base
Knowledge base · describe what you're investigating… · Agentic
Artifact | Description | Type | Scope | Updated | Version
refund-policy | Duplicate-charge handling | Ground truth | Shared | 2d ago | v3
escalation | Escalation policy | Ground truth | Tenant B | 5d ago | v9
refund-index | Retrieval map over refund docs | Synthesized | Shared | 2d ago | auto
dispute-flow | Chargeback procedure | Ground truth | Agent | 1w ago | v2
dispute-playbook | Distilled from 3 artifacts | Synthesized | Agent | 1w ago | auto
Version history · refund-policy · human-authored · ground truth
v3 · current · Full reversal for duplicate charges. · M. Reyes · 2d ago · from flagged trace #4471
v2 · Added partial-refund clause. · agent proposal · merged
v1 · Imported from system prompt. · onboarding
-offer a partial refund of the duplicate
+reverse the duplicate charge in full
↻ On merge: 2 synthesized artifacts regenerated from this change: refund-index, dispute-playbook

Artifact-level version control

The knowledge base is managed like a git repo. Artifact-level history, branching, pull requests, and automated merge guardrails all fit naturally, so a change to how the agent behaves is a reviewable diff, not a mystery edit to a prompt.
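As an illustration of what a reviewable diff between two artifact versions can look like, the sketch below uses Python's standard `difflib`. The `artifact_diff` helper and the first line of each version's text are hypothetical; only the changed phrases come from the refund-policy history shown above.

```python
import difflib
from typing import List


def artifact_diff(name: str, old_version: int, old_text: str,
                  new_version: int, new_text: str) -> List[str]:
    """Unified diff between two versions of one artifact."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"{name}@v{old_version}",
        tofile=f"{name}@v{new_version}",
        lineterm="",
    ))


# Hypothetical artifact bodies; the changed line mirrors the figure.
v2_text = "When a subscription is charged twice:\noffer a partial refund of the duplicate"
v3_text = "When a subscription is charged twice:\nreverse the duplicate charge in full"

for line in artifact_diff("refund-policy", 2, v2_text, 3, v3_text):
    print(line)
```

A reviewer sees exactly which instruction changed and between which versions, which is what makes the change something that can be approved or rejected.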

Query, and every other action

Search it like a document store, or run the same retrieval process your agent uses to surface relevant artifacts for an investigation. Every action can be done through code, through our agent, or by hand.

Kinesthetic · Project · banking-support · Agent
You
Stop auto-approving large refunds when a customer is threatening to cancel. Retention shouldn't override the eligibility check.
Kinesthetic agent
Two artifacts govern this. retention-playbook v9 currently outranks refund-eligibility v4 with no monetary bound. I'll scope retention below money movements and add an explicit precedence note:
retention-playbook · proposed · v10
-resolve on the spot and favor goodwill over process.
+favor goodwill over process, except actions that move money
+(refunds, credits), which defer to their own eligibility policy.
Review & apply · Edit by hand · replays inputs where both co-occur
Describe a change, or ask what governs a behavior…
Anywhere: CLI · API · MCP · Agent · UI
Query
keyword · semantic · agentic
Search like a document store, or give our agent a freeform description of what you're figuring out: it kicks off multiple queries to recall anything relevant and synthesizes an answer directly.
Edit
manual · agent-proposed
Modify or create artifacts by hand. Or describe the net-new or modified behavior you want and let our agent propose the changes as a reviewable diff.
Annotate
never throw work away
Once you've gone through a trace, mark it correct or incorrect and optionally add feedback, free-form or in any structure your team already uses. The work you do inspecting behavior is never lost.
Validate
across KB versions
Run sampled, synthesized, and user-defined inputs against different KB versions to surface behavioral changes, and run automated consistency checks to catch contradictory instructions.
Export
fully portable
Kinesthetic KBs are fully portable. Your data is never held hostage, sold, or used as training data. It stays yours, and you can take it with you at any time.
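The Validate step above can be pictured as replaying the same inputs against two KB versions and reporting where behavior changes. The sketch below is a toy under stated assumptions: `toy_agent` is a hypothetical stand-in for running the real agent, and each KB version is reduced to a single made-up setting that mirrors the retention vs. refund-eligibility example.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

KBVersion = Dict[str, Optional[float]]


def toy_agent(kb: KBVersion, refund_amount: float) -> str:
    """Hypothetical stand-in for running the real agent against a KB version."""
    limit = kb["approval_required_above"]
    if limit is not None and refund_amount > limit:
        return "escalate for manager approval"
    return "approve refund"


def behavioral_changes(
    inputs: Iterable[float],
    run: Callable[[KBVersion, float], str],
    kb_old: KBVersion,
    kb_new: KBVersion,
) -> List[Tuple[float, str, str]]:
    """Replay every input on both versions and keep those whose outcome differs."""
    changes = []
    for item in inputs:
        before, after = run(kb_old, item), run(kb_new, item)
        if before != after:
            changes.append((item, before, after))
    return changes


kb_v9 = {"approval_required_above": None}     # retention outranks eligibility
kb_v10 = {"approval_required_above": 1000.0}  # refunds defer to eligibility

for amount, before, after in behavioral_changes([200.0, 4200.0], toy_agent, kb_v9, kb_v10):
    print(f"${amount:,.0f}: {before} -> {after}")
# $4,200: approve refund -> escalate for manager approval
```

Only the input whose outcome changed is reported, so a reviewer can check that the edit moved the behavior it was meant to move and nothing else.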

04

SOTA knowledge retrieval tech

A big part of how Kinesthetic makes agents work well is by synthesizing and enriching the artifacts an agent uses at inference time. These machine-authored data are created when Kinesthetic learns what the agent does in production, or through offline inspection, and they are always treated as derived from human-authored ground truth.

The authority gradient · KB → synthesized artifacts → context
Authority · human · Ground-truth artifacts
Edge-case docs, annotated traces. Authored and owned by people.
synthesized artifacts derive from ground truth
Derived · machine · Synthesized artifacts
Indexes, maps, and distilled procedural playbooks built for agent use.
retrieves
Inference · Your agent
Retrieves from both, getting the right context for the scenario it faces.
Humans only handle model-invariant knowledge, namely how the task needs to be performed. The automated layer that adapts it to your current model regenerates synthesized artifacts underneath: edit ground truth, and anything derived that now conflicts is rebuilt.
Ground-truth artifacts
Human-authored, the source of truth
Unstructured documents, like how a specific edge case should be handled, plus annotated traces: marked correct or incorrect, with feedback in any format.
Synthesized artifacts
Machine-derived, for agent use
An index or map over ground-truth artifacts, or a procedural guide distilled from several. All inspectable for correctness.
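One way to picture the derivation relationship is a dependency map from each synthesized artifact to the ground-truth artifacts it was built from. The sketch below is an assumption-laden illustration: the artifact names appear on this page, but the dependency sets and the `stale_after_edit` helper are hypothetical, and it applies a coarser rule than the one described above (it flags everything that depends on the edited artifact, not only what now conflicts).

```python
from typing import Dict, List, Set

# Which ground-truth artifacts each synthesized artifact derives from.
# These dependency sets are made up for the example.
DERIVED_FROM: Dict[str, Set[str]] = {
    "refund-index": {"refund-policy", "refund-eligibility"},
    "dispute-playbook": {"refund-policy", "dispute-flow", "escalation"},
}


def stale_after_edit(edited: str,
                     derived_from: Dict[str, Set[str]] = DERIVED_FROM) -> List[str]:
    """Synthesized artifacts that should be regenerated after a ground-truth edit."""
    return sorted(name for name, sources in derived_from.items() if edited in sources)


print(stale_after_edit("refund-policy"))  # ['dispute-playbook', 'refund-index']
print(stale_after_edit("tone-guide"))     # []
```

Because derived artifacts only ever point back at ground truth, a human edit never has to be reconciled against machine-authored text; the machine-authored text is simply rebuilt.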

Why purpose-built retrieval wins

Failure modes of vanilla methods vs. optimal behavior
Vanilla failure
Over-retrieval
Semantically similar yet irrelevant documents get pulled in, crowding context with near-misses and re-introducing the rot you migrated away from.
Vanilla failure
Recall miss
The document that mattered is poorly matched by embedding or grep on its raw content, so it's never retrieved, and the agent acts without it.
The shared root cause
Both come from generic similarity search over raw content. Kinesthetic instead retrieves over specification data and the artifacts it derives from your ground truth, rather than treating your knowledge as an undifferentiated pile of text. How much that helps against these baselines is what our research measures. Read the research →
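The two failure modes map onto the standard precision and recall measures: over-retrieval lowers precision, a recall miss lowers recall. The sketch below shows the arithmetic on made-up retrieval results using artifact names from this page. It is not Kinesthetic's evaluation methodology, only a way to put numbers on the two failures.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision: share of retrieved items that were needed.
    Recall: share of needed items that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Over-retrieval: the needed artifact arrives with three near-misses.
over = precision_recall(
    retrieved={"refund-policy", "refund-eligibility", "dispute-flow", "tone-guide"},
    relevant={"refund-policy"},
)
print(over)   # (0.25, 1.0)

# Recall miss: the retention instruction never mentions "refund",
# so a search on the raw text does not surface it.
miss = precision_recall(
    retrieved={"refund-eligibility"},
    relevant={"refund-eligibility", "retention-playbook"},
)
print(miss)   # (1.0, 0.5)
```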

05

The flywheel: your system learns by doing

None of this is a one-time setup. Once your specification is a knowledge base your team operates, Kinesthetic becomes a closed-loop system that improves by running, and the work compounds. The same structure that makes failures auditable turns every correction into durable, proprietary value.

01
Run in production
The agent handles real inputs against the current specification, retrieving what each one needs.
02
Triage & annotate
Failures map cleanly to specific artifacts. Mark traces correct or incorrect and leave feedback in any format.
03
Correct the spec
Experts edit ground-truth artifacts as reviewable diffs. Fixes are auditable, verifiable, and owned.
04
Regenerate & retrieve
Synthesized artifacts rebuild from the new ground truth, so the next inputs retrieve sharper context.
And it compounds. Every pass leaves the specification sharper and the data behind it richer, so the system gets measurably better the more it runs, not just the day you ship it.
The data layer
An expert correction is more than a prompt change. It also produces corrected agentic trace data: high-quality, domain-grounded examples of how the task should be done. It's the policy data you'd train a model on tomorrow, the kind no foundation model has and no competitor can replicate.
Compounds
Cost and capability gains
You stop depending on frontier inference to stay ahead, and get to decide what you actually need general frontier capability for, what to run on cheaper models, and what to own outright.
Accrues
A proprietary data asset
The corrections accumulate into domain-grounded data in a format you can actually operationalize, the kind of asset that no off-the-shelf setup is capturing for you today.
Defends
A lead that compounds
Your agent runs on a near-complete specification of how the task should be done, in a format that outlives model and harness churn. You touch it only to add or change behavior, never to re-coax each new model into using it well. That's how a system improves faster than the models underneath it, while a competitor's year-old prompt is already archaeology.

Where Kinesthetic fits

Not every AI system needs Kinesthetic. We built it for the class of systems where specifying how the AI should behave outgrows vanilla knowledge implementations and creates operational challenges for the humans shaping it.

AI system taxonomy

Literal wrapper
Model deployed in an environment with a basic system prompt that adds some awareness of your use case.
Simple workflow
One narrowly-scoped procedure, low input variability. Short set-and-forget prompting plus few-shot works well. May be chained or orchestrated by deterministic logic.
Dynamic AI · Kinesthetic
A wide range of inputs needs many domain-specific instructions. Everything-in-prompt causes context rot; vanilla retrieval underperforms.
Dynamic AI at scale · Kinesthetic
Manual KB management becomes intractable; specialized tooling is required to inspect & safely modify knowledge. Annotations become the primary improvement interface.
You don't need to already be operating Dynamic AI to get value from Kinesthetic. If you're building for a task where it's the natural fit, but your system is still stuffed into a constraining architecture because of what's been possible so far, you can use Kinesthetic to start scaling the complexity of what your agent can learn to do.
Building a domain-specific AI system? Let's talk.
Request Early Access