Three tracks
| Track | Question it answers | Surface |
|---|---|---|
| A — Activation | Which skills should be active for this task, with what confidence and why? | xc skills detect |
| B — Application | How does an active skill actually reach the host agent? | xc flow attach |
| C — Adherence | Did the agent follow the skills it was given? | xc skills report |
Track A — Task-aware activation
Skill activation is no longer just Skill.detect(cwd). Each skill ships an optional match(ctx) that reads a per-task TaskContext (branch slug, files touched, goal text, recent commits, issue labels) and returns a confidence-scored SkillMatch.
```ts
interface TaskContext {
  branch: string                   // 'feat/auth-flow'
  integrationBranch: string        // 'main'
  goal?: string                    // 'wire up Google OAuth callback'
  filesTouched: readonly string[]  // diff vs. integration branch
  recentCommits: readonly string[] // last 5 subjects
  issue?: { number: number; title: string; labels: readonly string[] }
}

interface SkillMatch {
  confidence: number // 0..1
  reason: string     // human-readable, ≤ 80 chars
  signal: 'branch' | 'files' | 'goal' | 'commits' | 'issue' | 'project' | 'multi'
}
```

Rule scorer
The default scorer is pure code — no model call. It calls each skill's match(ctx) and falls back to project-static detect(cwd) at confidence 0.4 when match is absent. Activations are sorted by confidence; explicit --skill pins land at 1.0.
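The fallback-and-pin logic can be sketched in a few lines. This is an illustrative model, not the actual xcoder implementation: the `Skill` shape and the `scoreSkills` helper are assumptions, and the interfaces are trimmed copies of the ones above.

```typescript
interface TaskContext {
  branch: string;
  goal?: string;
  filesTouched: readonly string[];
}

interface SkillMatch {
  confidence: number;
  reason: string;
  signal: string;
}

// Hypothetical skill shape: match(ctx) is the optional task-aware scorer,
// detect(cwd) is the project-static fallback described above.
interface Skill {
  id: string;
  match?(ctx: TaskContext): SkillMatch | undefined;
  detect(cwd: string): boolean;
}

interface Activation { id: string; confidence: number; reason: string }

function scoreSkills(
  skills: readonly Skill[],
  ctx: TaskContext,
  cwd: string,
  pinned: readonly string[] = [], // explicit --skill flags
): Activation[] {
  const out: Activation[] = [];
  for (const skill of skills) {
    if (pinned.includes(skill.id)) {
      // Explicit pins always land at 1.0.
      out.push({ id: skill.id, confidence: 1.0, reason: 'pinned via --skill' });
      continue;
    }
    const m = skill.match?.(ctx);
    if (m) {
      out.push({ id: skill.id, confidence: m.confidence, reason: m.reason });
    } else if (skill.detect(cwd)) {
      // No task-aware matcher: fall back to project-static detection at 0.4.
      out.push({ id: skill.id, confidence: 0.4, reason: 'project-static detect' });
    }
  }
  // Highest confidence first.
  return [...out].sort((a, b) => b.confidence - a.confidence);
}
```

Because the scorer is pure code, it is deterministic and cheap enough to re-run at every phase boundary.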
```sh
xc skills detect --goal "redesign the dashboard hero with motion"
# TaskContext
#   branch             feat/m0.7d-skills-docs-ci-bench
#   integrationBranch  main
#   goal               redesign the dashboard hero with motion
#   filesTouched       6 files
#   recentCommits      5 commits
#
# Scored skills
#   ui                  0.92  [rule|files]  6 UI files touched (e.g. apps/web/src/Hero.tsx)
#   project-management  0.55  [rule|goal]   goal text mentions planning work
```

Fixture-based bench
The repo ships a JSONL fixture under packages/xcoder/src/skills/__fixtures__/task-skill-truth.jsonl. Each row labels the expected activations for a synthesised task. xc skills bench runs the rule scorer against every row and exits non-zero on any miss.
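A row in that fixture might look like the following. The field names here are illustrative, since the actual schema isn't reproduced in this doc:

```jsonl
{"id":"oauth-callback","ctx":{"branch":"feat/auth-flow","goal":"wire up Google OAuth callback","filesTouched":["apps/web/src/auth/callback.ts"]},"mustActivate":["qa"],"mustNotActivate":["project-management"]}
```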
| Metric | Target | Status |
|---|---|---|
| Macro F1 | ≥ 0.75 | 1.000 |
| Recall on must-activate | ≥ 0.95 | 1.000 |
| Precision on must-not-activate | = 1.0 | 1.000 |
| p95 row latency | ≤ 50 ms | ~2 ms |
bench is a CI gate
xc skills bench runs on every PR. Adding a row that the rule scorer can't classify (without label noise) blocks merge — the floor is intentional.
Track B — Hook-based application
Each active skill reaches the host agent through hooks. xc flow attach <driver> compiles the active skill set into a driver-native artifact:
| Driver | Skill-injection channel |
|---|---|
| claude-code | SessionStart hook concatenates per-skill files |
| cursor | .cursor/rules/xcoder-skills.md |
| codex | .codex/instructions.md |
| opencode | .opencode/AGENTS.md |
Per-skill prompt files always land at .xcoder/cache/skills/<id>.md so every host has a stable path; the driver-specific bundle is a deterministic concatenation.
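A minimal sketch of such a deterministic concatenation, under assumptions: the `buildBundle` name, the sort order, and the `---` separator are illustrative, not xcoder's actual bundle format.

```typescript
import { readFileSync } from 'node:fs';
import { join } from 'node:path';

// Concatenate per-skill prompt files from the stable cache path into one
// driver bundle. Sorting by id makes the output independent of activation
// order, so repeated attaches produce byte-identical artifacts.
function buildBundle(cacheDir: string, activeSkillIds: readonly string[]): string {
  return [...activeSkillIds]
    .sort()
    .map((id) => readFileSync(join(cacheDir, `${id}.md`), 'utf8').trimEnd())
    .join('\n\n---\n\n');
}
```

Determinism here is what makes the artifact diffable: attaching twice with the same active set should never produce a spurious change in the bundle.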
```sh
xc flow attach claude-code
# ✓ wrote .claude/settings.json
# ✓ wrote .xcoder/cache/skills/ui.md
# ✓ wrote .xcoder/cache/skills/qa.md
# ✓ wrote .xcoder/cache/active-skills.json
# adherence observer wired for: ui, qa

xc flow attach claude-code --skill ui --skill qa  # force-include
xc flow attach claude-code --no-skills            # bypass entirely
```

Skill policies (PreToolUse gates)
A skill can ship its own policies?: SkillPolicy[] — same shape as flow-engine's Policy. Skill policies merge with flow policies in execution order: flow policies first (kernel-level gates), skill policies second (domain-specific guidance). A skill policy cannot override a flow policy that returned { ok: false } — flow gates win.
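The merge order can be sketched as a two-pass evaluation. The `Policy` and `PolicyResult` shapes below are assumptions for illustration, not flow-engine's actual types:

```typescript
type PolicyResult = { ok: true } | { ok: false; reason: string };

interface Policy {
  id: string;
  check(toolCall: { name: string; args: unknown }): PolicyResult;
}

// Flow policies run first. If any kernel-level gate returns { ok: false },
// evaluation stops there — a skill policy never sees the call, so it has
// no opportunity to soften a flow denial.
function evaluatePolicies(
  flowPolicies: readonly Policy[],
  skillPolicies: readonly Policy[],
  toolCall: { name: string; args: unknown },
): PolicyResult {
  for (const p of flowPolicies) {
    const r = p.check(toolCall);
    if (!r.ok) return r; // flow gates win
  }
  for (const p of skillPolicies) {
    const r = p.check(toolCall);
    if (!r.ok) return r; // domain-specific gate
  }
  return { ok: true };
}
```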
Track C — Adherence tracking
A skill that fires but isn't followed is noise. Skills declare optional AdherenceChecks that observe agent actions and emit verdicts. Observation, not enforcement — verdicts log events but never block tool calls.
```ts
interface AdherenceCheck {
  id: string          // 'ui-edits-include-a11y'
  description: string
  observe: 'tool-call' | 'commit' | 'phase-exit'
  evaluate(observation: ObservedAction, ctx: TaskContext): AdherenceVerdict
}

interface AdherenceVerdict {
  status: 'adhered' | 'violated' | 'not-applicable'
  reason: string
}
```

Bundled checks
| Skill | Check | Trigger | Rule checked |
|---|---|---|---|
| ui | ui-interactive-includes-accessibility-attrs | Edit/Write/MultiEdit on .tsx/.jsx/.vue/.svelte/.astro | Interactive elements (button/a/input/select/textarea) must carry an aria-/role/aria-label/htmlFor/for attribute |
| qa | qa-tests-pair-with-new-source | git commit | New source files must be accompanied by test/spec files in the same diff |
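As a sketch, the qa pairing check's evaluate step might look like this. The `ObservedAction` shape, the file-extension heuristics, and the function name are assumptions, not the bundled implementation:

```typescript
interface ObservedAction {
  kind: 'tool-call' | 'commit' | 'phase-exit';
  // For commits: paths added in the diff (shape assumed for illustration).
  filesAdded?: readonly string[];
}

interface AdherenceVerdict {
  status: 'adhered' | 'violated' | 'not-applicable';
  reason: string;
}

const isSource = (f: string) => /\.(ts|tsx|js|jsx)$/.test(f) && !/\.(test|spec)\./.test(f);
const isTest = (f: string) => /\.(test|spec)\./.test(f);

// qa-tests-pair-with-new-source: new source files must arrive with tests.
function evaluateTestPairing(obs: ObservedAction): AdherenceVerdict {
  if (obs.kind !== 'commit' || !obs.filesAdded?.length) {
    // Observation doesn't apply — filtered out, never emitted as an event.
    return { status: 'not-applicable', reason: 'not a commit with new files' };
  }
  const newSource = obs.filesAdded.filter(isSource);
  if (newSource.length === 0) {
    return { status: 'not-applicable', reason: 'no new source files' };
  }
  return obs.filesAdded.some(isTest)
    ? { status: 'adhered', reason: 'new source paired with tests' }
    : { status: 'violated', reason: `${newSource.length} new source file(s) without tests` };
}
```

Note the three-way return: the not-applicable branch is what keeps irrelevant commits out of the adherence rate entirely.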
Reading the rate
```sh
xc skills report
# xc skills report
# ────────────────────────────────────────────────────────────
# skill  passed  failed  rate
# ui     12      1       0.92
# qa     5       3       0.62
#   2× qa-tests-pair-with-new-source
#
# 16 adherence events across 2 skills

xc flow show
# ...
# Active skills (5)
#   ✓ ui  adherence 0.92 (12/13)
#   ✓ qa  adherence 0.62 (5/8)
```

CLI surface
| Command | Purpose |
|---|---|
| xc skills | Default: list available skills + project-static activation |
| xc skills detect [--goal ...] | Dry-run task-aware scorer for the current cwd |
| xc skills bench [--fixture path] [--json] | Run the fixture suite — exits non-zero on any target miss |
| xc skills report [--dir path] [--json] | Per-session adherence rollup from events.jsonl |
| xc i --skill <id> | Force a skill on for an interactive session |
| xc i --no-skill <id> --reason <...> | Force a skill off (logged as a bypass; M0.7+) |
| xc flow attach <driver> [--skill ...] [--no-skills] | Compile active skills into a driver-native hook artifact |
Events on the wire
- skills.activated — task-entry computes the activation set
- skills.changed — phase boundary recompute, delta vs. previous
- skills.drift — heartbeat re-eval, tool pattern doesn't match active skills (informational)
- skill.adherence.passed — a check returned adhered
- skill.adherence.failed — a check returned violated
not-applicable verdicts emit no events
They're a noise filter — the check ran but the action didn't apply. Recording a tally of those would obscure real signals.
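The report rollup is then a simple fold over the event log. A sketch, assuming a minimal event shape in events.jsonl (the real records likely carry more fields):

```typescript
// Assumed minimal shape of adherence events in events.jsonl.
interface AdherenceEvent {
  type: string;  // e.g. 'skill.adherence.passed'
  skill: string; // e.g. 'ui'
}

interface SkillRate { passed: number; failed: number; rate: number }

// Fold JSONL lines into per-skill pass/fail counts and a rate.
// Other event types (activation, drift) are ignored, and not-applicable
// verdicts never reach the log in the first place.
function adherenceRollup(lines: readonly string[]): Map<string, SkillRate> {
  const out = new Map<string, SkillRate>();
  for (const line of lines) {
    if (!line.trim()) continue;
    const ev = JSON.parse(line) as AdherenceEvent;
    if (ev.type !== 'skill.adherence.passed' && ev.type !== 'skill.adherence.failed') continue;
    const row = out.get(ev.skill) ?? { passed: 0, failed: 0, rate: 0 };
    if (ev.type === 'skill.adherence.passed') row.passed++;
    else row.failed++;
    row.rate = row.passed / (row.passed + row.failed);
    out.set(ev.skill, row);
  }
  return out;
}
```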
Why measure
- A consistently violated skill is a signal for the rule scorer to lower its confidence over time (M0.8 mining), or for the user to drop the skill manually.
- Adherence is the input to skill sourcing (M0.8) — the system can only propose new skills when it can show existing ones aren't covering observed patterns.
- It closes the "did this work?" loop. Without it, skill activation is theatre.