Three tracks
| Track | Question it answers | Surface |
|---|---|---|
| A — Activation | Which skills should be active for this task, with what confidence and why? | xc skills detect |
| B — Application | How does an active skill actually reach the host agent? | xc flow attach |
| C — Adherence | Did the agent follow the skills it was given? | xc skills report |
Track A — Task-aware activation
Skill activation is no longer just Skill.detect(cwd). Each skill ships an optional match(ctx) that reads a per-task TaskContext (branch slug, files touched, goal text, recent commits, issue labels) and returns a confidence-scored SkillMatch.
```ts
interface TaskContext {
  branch: string                   // 'feat/auth-flow'
  integrationBranch: string        // 'main'
  goal?: string                    // 'wire up Google OAuth callback'
  filesTouched: readonly string[]  // diff vs. integration branch
  recentCommits: readonly string[] // last 5 subjects
  issue?: { number: number; title: string; labels: readonly string[] }
}

interface SkillMatch {
  confidence: number // 0..1
  reason: string     // human-readable, ≤ 80 chars
  signal: 'branch' | 'files' | 'goal' | 'commits' | 'issue' | 'project' | 'multi'
}
```

Rule scorer
The default scorer is pure code — no model call. It calls each skill's match(ctx) and falls back to project-static detect(cwd) at confidence 0.4 when match is absent. Activations are sorted by confidence; explicit --skill pins land at 1.0.
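The fallback-and-pin logic can be sketched in a few lines. This is an illustrative model, not the actual xcoder implementation: the `Skill` shape and the `scoreSkills` helper are assumptions, and the interfaces are trimmed copies of the ones above.

```typescript
interface TaskContext {
  branch: string;
  goal?: string;
  filesTouched: readonly string[];
}

interface SkillMatch {
  confidence: number;
  reason: string;
  signal: string;
}

// Hypothetical skill shape: match(ctx) is the optional task-aware scorer,
// detect(cwd) is the project-static fallback described above.
interface Skill {
  id: string;
  match?(ctx: TaskContext): SkillMatch | undefined;
  detect(cwd: string): boolean;
}

interface Activation { id: string; confidence: number; reason: string }

function scoreSkills(
  skills: readonly Skill[],
  ctx: TaskContext,
  cwd: string,
  pinned: readonly string[] = [], // explicit --skill flags
): Activation[] {
  const out: Activation[] = [];
  for (const skill of skills) {
    if (pinned.includes(skill.id)) {
      // Explicit pins always land at 1.0.
      out.push({ id: skill.id, confidence: 1.0, reason: 'pinned via --skill' });
      continue;
    }
    const m = skill.match?.(ctx);
    if (m) {
      out.push({ id: skill.id, confidence: m.confidence, reason: m.reason });
    } else if (skill.detect(cwd)) {
      // No task-aware matcher: fall back to project-static detection at 0.4.
      out.push({ id: skill.id, confidence: 0.4, reason: 'project-static detect' });
    }
  }
  // Highest confidence first.
  return [...out].sort((a, b) => b.confidence - a.confidence);
}
```

Because the scorer is pure code, it is deterministic and cheap enough to re-run at every phase boundary.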
```sh
xc skills detect --goal "redesign the dashboard hero with motion"
# TaskContext
#   branch             feat/m0.7d-skills-docs-ci-bench
#   integrationBranch  main
#   goal               redesign the dashboard hero with motion
#   filesTouched       6 files
#   recentCommits      5 commits
#
# Scored skills
#   ui                  0.92  [rule|files]  6 UI files touched (e.g. apps/web/src/Hero.tsx)
#   project-management  0.55  [rule|goal]   goal text mentions planning work
```

Fixture-based bench
The repo ships a JSONL fixture under packages/xcoder/src/skills/__fixtures__/task-skill-truth.jsonl. Each row labels the expected activations for a synthesised task. xc skills bench runs the rule scorer against every row and exits non-zero on any miss.
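A row in that fixture might look like the following. The field names here are illustrative, since the actual schema isn't reproduced in this doc:

```jsonl
{"id":"oauth-callback","ctx":{"branch":"feat/auth-flow","goal":"wire up Google OAuth callback","filesTouched":["apps/web/src/auth/callback.ts"]},"mustActivate":["qa"],"mustNotActivate":["project-management"]}
```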
| Metric | Target | Status |
|---|---|---|
| Macro F1 | ≥ 0.75 | 1.000 |
| Recall on must-activate | ≥ 0.95 | 1.000 |
| Precision on must-not-activate | = 1.0 | 1.000 |
| p95 row latency | ≤ 50 ms | ~2 ms |
bench is a CI gate
xc skills bench runs on every PR. Adding a row that the rule scorer can't classify (without label noise) blocks merge — the floor is intentional.
Track B — Hook-based application
Each active skill reaches the host agent through hooks. xc flow attach <driver> compiles the active skill set into a driver-native artifact:
| Driver | Skill-injection channel |
|---|---|
| claude-code | SessionStart hook concatenates per-skill files |
| cursor | .cursor/rules/xcoder-skills.md |
| codex | .codex/instructions.md |
| opencode | .opencode/AGENTS.md |
Per-skill prompt files always land at .xcoder/cache/skills/<id>.md so every host has a stable path; the driver-specific bundle is a deterministic concatenation.
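A minimal sketch of such a deterministic concatenation, under assumptions: the `buildBundle` name, the sort order, and the `---` separator are illustrative, not xcoder's actual bundle format.

```typescript
import { readFileSync } from 'node:fs';
import { join } from 'node:path';

// Concatenate per-skill prompt files from the stable cache path into one
// driver bundle. Sorting by id makes the output independent of activation
// order, so repeated attaches produce byte-identical artifacts.
function buildBundle(cacheDir: string, activeSkillIds: readonly string[]): string {
  return [...activeSkillIds]
    .sort()
    .map((id) => readFileSync(join(cacheDir, `${id}.md`), 'utf8').trimEnd())
    .join('\n\n---\n\n');
}
```

Determinism here is what makes the artifact diffable: attaching twice with the same active set should never produce a spurious change in the bundle.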
```sh
xc flow attach claude-code
# ✓ wrote .claude/settings.json
# ✓ wrote .xcoder/cache/skills/ui.md
# ✓ wrote .xcoder/cache/skills/qa.md
# ✓ wrote .xcoder/cache/active-skills.json
# adherence observer wired for: ui, qa

xc flow attach claude-code --skill ui --skill qa  # force-include
xc flow attach claude-code --no-skills            # bypass entirely
```

Skill policies (PreToolUse gates)
A skill can ship its own policies?: SkillPolicy[] — same shape as flow-engine's Policy. Skill policies merge with flow policies in execution order: flow policies first (kernel-level gates), skill policies second (domain-specific guidance). A skill policy cannot override a flow policy that returned { ok: false } — flow gates win.
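The merge order can be sketched as a two-pass evaluation. The `Policy` and `PolicyResult` shapes below are assumptions for illustration, not flow-engine's actual types:

```typescript
type PolicyResult = { ok: true } | { ok: false; reason: string };

interface Policy {
  id: string;
  check(toolCall: { name: string; args: unknown }): PolicyResult;
}

// Flow policies run first. If any kernel-level gate returns { ok: false },
// evaluation stops there — a skill policy never sees the call, so it has
// no opportunity to soften a flow denial.
function evaluatePolicies(
  flowPolicies: readonly Policy[],
  skillPolicies: readonly Policy[],
  toolCall: { name: string; args: unknown },
): PolicyResult {
  for (const p of flowPolicies) {
    const r = p.check(toolCall);
    if (!r.ok) return r; // flow gates win
  }
  for (const p of skillPolicies) {
    const r = p.check(toolCall);
    if (!r.ok) return r; // domain-specific gate
  }
  return { ok: true };
}
```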
Track C — Adherence tracking
A skill that fires but isn't followed is noise. Skills declare optional AdherenceChecks that observe agent actions and emit verdicts. Observation, not enforcement — verdicts log events but never block tool calls.
```ts
interface AdherenceCheck {
  id: string          // 'ui-edits-include-a11y'
  description: string
  observe: 'tool-call' | 'commit' | 'phase-exit'
  evaluate(observation: ObservedAction, ctx: TaskContext): AdherenceVerdict
}

interface AdherenceVerdict {
  status: 'adhered' | 'violated' | 'not-applicable'
  reason: string
}
```

Bundled checks
| Skill | Check | Trigger | Rule checked |
|---|---|---|---|
| ui | ui-interactive-includes-accessibility-attrs | Edit/Write/MultiEdit on .tsx/.jsx/.vue/.svelte/.astro | Interactive elements (button/a/input/select/textarea) must carry an aria-/role/aria-label/htmlFor/for attribute |
| qa | qa-tests-pair-with-new-source | git commit | New source files must be accompanied by test/spec files in the same diff |
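As a sketch, the qa pairing check's evaluate step might look like this. The `ObservedAction` shape, the file-extension heuristics, and the function name are assumptions, not the bundled implementation:

```typescript
interface ObservedAction {
  kind: 'tool-call' | 'commit' | 'phase-exit';
  // For commits: paths added in the diff (shape assumed for illustration).
  filesAdded?: readonly string[];
}

interface AdherenceVerdict {
  status: 'adhered' | 'violated' | 'not-applicable';
  reason: string;
}

const isSource = (f: string) => /\.(ts|tsx|js|jsx)$/.test(f) && !/\.(test|spec)\./.test(f);
const isTest = (f: string) => /\.(test|spec)\./.test(f);

// qa-tests-pair-with-new-source: new source files must arrive with tests.
function evaluateTestPairing(obs: ObservedAction): AdherenceVerdict {
  if (obs.kind !== 'commit' || !obs.filesAdded?.length) {
    // Observation doesn't apply — filtered out, never emitted as an event.
    return { status: 'not-applicable', reason: 'not a commit with new files' };
  }
  const newSource = obs.filesAdded.filter(isSource);
  if (newSource.length === 0) {
    return { status: 'not-applicable', reason: 'no new source files' };
  }
  return obs.filesAdded.some(isTest)
    ? { status: 'adhered', reason: 'new source paired with tests' }
    : { status: 'violated', reason: `${newSource.length} new source file(s) without tests` };
}
```

Note the three-way return: the not-applicable branch is what keeps irrelevant commits out of the adherence rate entirely.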
Reading the rate
```sh
xc skills report
# xc skills report
# ────────────────────────────────────────────────────────────
# skill  passed  failed  rate
# ui     12      1       0.92
# qa     5       3       0.62
#   2× qa-tests-pair-with-new-source
#
# 16 adherence events across 2 skills

xc flow show
# ...
# Active skills (5)
#   ✓ ui  adherence 0.92 (12/13)
#   ✓ qa  adherence 0.62 (5/8)
```

CLI surface
| Command | Purpose |
|---|---|
| xc skills | Default: list available skills + project-static activation |
| xc skills detect [--goal ...] | Dry-run task-aware scorer for the current cwd |
| xc skills bench [--fixture path] [--json] | Run the fixture suite — exits non-zero on any target miss |
| xc skills report [--dir path] [--json] | Per-session adherence rollup from events.jsonl |
| xc i --skill <id> | Force a skill on for an interactive session |
| xc i --no-skill <id> --reason <...> | Force a skill off (logged as a bypass; M0.7+) |
| xc flow attach <driver> [--skill ...] [--no-skills] | Compile active skills into a driver-native hook artifact |
Events on the wire
- skills.activated — task-entry computes the activation set
- skills.changed — phase boundary recompute, delta vs. previous
- skills.drift — heartbeat re-eval, tool pattern doesn't match active skills (informational)
- skill.adherence.passed — a check returned adhered
- skill.adherence.failed — a check returned violated
not-applicable verdicts emit no events
They're a noise filter — the check ran but the action didn't apply. Recording a tally of those would obscure real signals.
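The report rollup is then a simple fold over the event log. A sketch, assuming a minimal event shape in events.jsonl (the real records likely carry more fields):

```typescript
// Assumed minimal shape of adherence events in events.jsonl.
interface AdherenceEvent {
  type: string;  // e.g. 'skill.adherence.passed'
  skill: string; // e.g. 'ui'
}

interface SkillRate { passed: number; failed: number; rate: number }

// Fold JSONL lines into per-skill pass/fail counts and a rate.
// Other event types (activation, drift) are ignored, and not-applicable
// verdicts never reach the log in the first place.
function adherenceRollup(lines: readonly string[]): Map<string, SkillRate> {
  const out = new Map<string, SkillRate>();
  for (const line of lines) {
    if (!line.trim()) continue;
    const ev = JSON.parse(line) as AdherenceEvent;
    if (ev.type !== 'skill.adherence.passed' && ev.type !== 'skill.adherence.failed') continue;
    const row = out.get(ev.skill) ?? { passed: 0, failed: 0, rate: 0 };
    if (ev.type === 'skill.adherence.passed') row.passed++;
    else row.failed++;
    row.rate = row.passed / (row.passed + row.failed);
    out.set(ev.skill, row);
  }
  return out;
}
```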
Why measure
- A consistently violated skill is a signal for the rule scorer to lower its confidence over time (M0.8 mining), or for the user to drop the skill manually.
- Adherence is the input to skill sourcing (M0.8) — the system can only propose new skills when it can show existing ones aren't covering observed patterns.
- It closes the "did this work?" loop. Without it, skill activation is theatre.