Skills

Skills are scoped guidance bundles. xCoder picks the right ones for the task at hand, threads them through the host agent's hooks, and scores whether the agent actually followed them.

Three tracks

| Track | Question it answers | Surface |
| --- | --- | --- |
| A — Activation | Which skills should be active for this task, with what confidence and why? | `xc skills detect` |
| B — Application | How does an active skill actually reach the host agent? | `xc flow attach` |
| C — Adherence | Did the agent follow the skills it was given? | `xc skills report` |

Track A — Task-aware activation

Skill activation is no longer just `Skill.detect(cwd)`. Each skill ships an optional `match(ctx)` that reads a per-task `TaskContext` (branch slug, files touched, goal text, recent commits, issue labels) and returns a confidence-scored `SkillMatch`.

```ts
interface TaskContext {
  branch: string                   // 'feat/auth-flow'
  integrationBranch: string        // 'main'
  goal?: string                    // 'wire up Google OAuth callback'
  filesTouched: readonly string[]  // diff vs. integration branch
  recentCommits: readonly string[] // last 5 subjects
  issue?: { number: number; title: string; labels: readonly string[] }
}

interface SkillMatch {
  confidence: number              // 0..1
  reason: string                  // human-readable, ≤ 80 chars
  signal: 'branch' | 'files' | 'goal' | 'commits' | 'issue' | 'project' | 'multi'
}
```
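As an illustration of the shape, a hypothetical `ui` skill's `match(ctx)` might score file-based signals like this (the weighting and cap here are invented for the sketch, not the shipped heuristic; the parameter type is narrowed to the one `TaskContext` field the sketch reads):

```typescript
// Hypothetical match() for a "ui" skill. Returns undefined when the skill
// has nothing to say, per the optional-match contract described above.
const UI_EXTENSIONS = ['.tsx', '.jsx', '.vue', '.svelte', '.astro']

function matchUi(
  ctx: { filesTouched: readonly string[] },
): { confidence: number; reason: string; signal: 'files' } | undefined {
  const uiFiles = ctx.filesTouched.filter(f =>
    UI_EXTENSIONS.some(ext => f.endsWith(ext)))
  if (uiFiles.length === 0) return undefined
  // Invented weighting: more touched UI files raise confidence, capped
  // below 1.0 so an explicit --skill pin always outranks a rule match.
  const confidence = Math.min(0.5 + uiFiles.length * 0.07, 0.95)
  return {
    confidence,
    reason: `${uiFiles.length} UI files touched (e.g. ${uiFiles[0]})`,
    signal: 'files',
  }
}
```

Returning `undefined` rather than a zero-confidence match keeps "skill is silent" distinct from "skill actively scored low", which matters once the fallback path below enters the picture.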

Rule scorer

The default scorer is pure code — no model call. It calls each skill's `match(ctx)` and falls back to project-static `detect(cwd)` at confidence 0.4 when `match` is absent. Activations are sorted by confidence; explicit `--skill` pins land at 1.0.

```bash
xc skills detect --goal "redesign the dashboard hero with motion"
# TaskContext
#   branch            feat/m0.7d-skills-docs-ci-bench
#   integrationBranch main
#   goal              redesign the dashboard hero with motion
#   filesTouched      6 files
#   recentCommits     5 commits
#
# Scored skills
#   ui                 0.92  [rule|files]  6 UI files touched (e.g. apps/web/src/Hero.tsx)
#   project-management 0.55  [rule|goal]   goal text mentions planning work
```
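The scoring pass reduces to a short loop; a sketch with a simplified skill shape (the `Activation` type and function names are illustrative, but the 0.4 fallback, the 1.0 pin, and the confidence sort follow the rules above):

```typescript
// Sketch of the rule scorer: match(ctx) when present, project-static
// detect(cwd) fallback at 0.4, explicit --skill pins at 1.0, sorted by
// confidence. No model call anywhere on this path.
interface Activation { id: string; confidence: number; reason: string }

interface SkillLike {
  id: string
  match?: (ctx: unknown) => { confidence: number; reason: string } | undefined
  detect?: (cwd: string) => boolean
}

function scoreSkills(
  skills: SkillLike[],
  ctx: unknown,
  cwd: string,
  pinned: readonly string[] = [],
): Activation[] {
  const out: Activation[] = []
  for (const s of skills) {
    if (pinned.includes(s.id)) {
      out.push({ id: s.id, confidence: 1.0, reason: 'pinned via --skill' })
      continue
    }
    const m = s.match?.(ctx)
    if (m) out.push({ id: s.id, confidence: m.confidence, reason: m.reason })
    else if (s.detect?.(cwd)) {
      out.push({ id: s.id, confidence: 0.4, reason: 'project-static detect' })
    }
  }
  return out.sort((a, b) => b.confidence - a.confidence)
}
```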

Fixture-based bench

The repo ships a JSONL fixture under `packages/xcoder/src/skills/__fixtures__/task-skill-truth.jsonl`. Each row labels the expected activations for a synthesised task. `xc skills bench` runs the rule scorer against every row and exits non-zero on any miss.

| Metric | Target | Status |
| --- | --- | --- |
| Macro F1 | ≥ 0.75 | 1.000 |
| Recall on must-activate | ≥ 0.95 | 1.000 |
| Precision on must-not-activate | = 1.0 | 1.000 |
| p95 row latency | ≤ 50 ms | ~2 ms |
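The bench metrics reduce to standard per-skill precision/recall over the fixture rows; a minimal sketch of macro F1, assuming each row carries an expected and a predicted activation set (the `Row` shape is invented for the sketch):

```typescript
// Macro F1: compute F1 per skill across all rows, then average the skills.
// Purely illustrative of how the table's headline metric is defined.
type Row = { expected: Set<string>; predicted: Set<string> }

function macroF1(rows: Row[], skills: string[]): number {
  let sum = 0
  for (const skill of skills) {
    let tp = 0, fp = 0, fn = 0
    for (const r of rows) {
      const e = r.expected.has(skill), p = r.predicted.has(skill)
      if (e && p) tp++
      else if (!e && p) fp++
      else if (e && !p) fn++
    }
    // Degenerate cases (skill never expected/predicted) score as perfect.
    const precision = tp + fp === 0 ? 1 : tp / (tp + fp)
    const recall = tp + fn === 0 ? 1 : tp / (tp + fn)
    sum += precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall)
  }
  return sum / skills.length
}
```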

bench is a CI gate

`xc skills bench` runs on every PR. Adding a row that the rule scorer can't classify (without label noise) blocks merge — the floor is intentional.

Track B — Hook-based application

Each active skill reaches the host agent through hooks. `xc flow attach <driver>` compiles the active skill set into a driver-native artifact:

| Driver | Skill-injection channel |
| --- | --- |
| claude-code | SessionStart hook concatenates per-skill files |
| cursor | `.cursor/rules/xcoder-skills.md` |
| codex | `.codex/instructions.md` |
| opencode | `.opencode/AGENTS.md` |

Per-skill prompt files always land at `.xcoder/cache/skills/<id>.md` so every host has a stable path; the driver-specific bundle is a deterministic concatenation.

```bash
xc flow attach claude-code
# ✓ wrote .claude/settings.json
# ✓ wrote .xcoder/cache/skills/ui.md
# ✓ wrote .xcoder/cache/skills/qa.md
# ✓ wrote .xcoder/cache/active-skills.json
# adherence observer wired for: ui, qa

xc flow attach claude-code --skill ui --skill qa     # force-include
xc flow attach claude-code --no-skills               # bypass entirely
```
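"Deterministic concatenation" here just means the bundle bytes depend only on the active skill set, never on iteration order; a pure sketch (the header comment format is invented, and file I/O is elided):

```typescript
// Deterministic driver bundle: sort skill ids, then join each skill's
// prompt file body under a stable per-skill header. Same skill set in,
// byte-identical bundle out.
function compileBundle(skillPrompts: Record<string, string>): string {
  return Object.keys(skillPrompts)
    .sort() // stable order regardless of insertion order
    .map(id => `<!-- skill: ${id} -->\n${skillPrompts[id].trim()}`)
    .join('\n\n')
}
```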

Skill policies (PreToolUse gates)

A skill can ship its own `policies?: SkillPolicy[]` — same shape as flow-engine's `Policy`. Skill policies merge with flow policies in execution order: flow policies first (kernel-level gates), skill policies second (domain-specific guidance). A skill policy cannot override a flow policy that returned `{ ok: false }` — flow gates win.

Track C — Adherence tracking

A skill that fires but isn't followed is noise. Skills declare optional AdherenceChecks that observe agent actions and emit verdicts. Observation, not enforcement — verdicts log events but never block tool calls.

```ts
interface AdherenceCheck {
  id: string                                    // 'ui-edits-include-a11y'
  description: string
  observe: 'tool-call' | 'commit' | 'phase-exit'
  evaluate(observation: ObservedAction, ctx: TaskContext): AdherenceVerdict
}

interface AdherenceVerdict {
  status: 'adhered' | 'violated' | 'not-applicable'
  reason: string
}
```
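To make the three-way verdict concrete, here is an illustrative check in the same spirit as the bundled qa check below — not the shipped implementation, and with `ObservedAction` narrowed to the one field this sketch needs:

```typescript
// Illustrative adherence check: new source files should be paired with
// test/spec files in the same diff. The real bundled check is more thorough.
interface CommitObservation { newFiles: readonly string[] }

type VerdictStatus = 'adhered' | 'violated' | 'not-applicable'

const qaTestsPair = {
  id: 'qa-tests-pair-with-new-source',
  description: 'New source files must ship with tests in the same diff',
  observe: 'commit' as const,
  evaluate(obs: CommitObservation): { status: VerdictStatus; reason: string } {
    const isTest = (f: string) => /\.(test|spec)\.[jt]sx?$/.test(f)
    const source = obs.newFiles.filter(f => /\.[jt]sx?$/.test(f) && !isTest(f))
    // No new source at all: the check ran but doesn't apply.
    if (source.length === 0) return { status: 'not-applicable', reason: 'no new source files' }
    return obs.newFiles.some(isTest)
      ? { status: 'adhered', reason: 'new source paired with tests' }
      : { status: 'violated', reason: `${source.length} new source file(s) without tests` }
  },
}
```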

Bundled checks

| Skill | Check | Trigger | Verdict |
| --- | --- | --- | --- |
| ui | `ui-interactive-includes-accessibility-attrs` | Edit/Write/MultiEdit on `.tsx`/`.jsx`/`.vue`/`.svelte`/`.astro` | Interactive elements (button/a/input/select/textarea) must carry an `aria-`/`role`/`aria-label`/`htmlFor`/`for` attribute |
| qa | `qa-tests-pair-with-new-source` | git commit | New source files must be accompanied by test/spec files in the same diff |

Reading the rate

```bash
xc skills report
# xc skills report
# ────────────────────────────────────────────────────────────
#   skill                passed  failed  rate
#   ui                       12       1  0.92
#   qa                        5       3  0.62
#     2× qa-tests-pair-with-new-source
#
#   16 adherence events across 2 skills

xc flow show
# ...
# Active skills (5)
#   ✓ ui                 adherence 0.92 (12/13)
#   ✓ qa                 adherence 0.62 (5/8)
```

CLI surface

| Command | Purpose |
| --- | --- |
| `xc skills` | Default: list available skills + project-static activation |
| `xc skills detect [--goal ...]` | Dry-run task-aware scorer for the current cwd |
| `xc skills bench [--fixture path] [--json]` | Run the fixture suite — exits non-zero on any target miss |
| `xc skills report [--dir path] [--json]` | Per-session adherence rollup from events.jsonl |
| `xc i --skill <id>` | Force a skill on for an interactive session |
| `xc i --no-skill <id> --reason <...>` | Force a skill off (logged as a bypass; M0.7+) |
| `xc flow attach <driver> [--skill ...] [--no-skills]` | Compile active skills into a driver-native hook artifact |

Events on the wire

- `skills.activated` — task-entry computes the activation set
- `skills.changed` — phase boundary recompute, delta vs. previous
- `skills.drift` — heartbeat re-eval, tool pattern doesn't match active skills (informational)
- `skill.adherence.passed` — a check returned `adhered`
- `skill.adherence.failed` — a check returned `violated`

not-applicable verdicts emit no events

They're a noise filter — the check ran but the action didn't apply. Recording a tally of those would obscure real signals.
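The filter amounts to a two-way mapping with a silent third branch; a sketch (the `eventFor` helper and its return shape are invented for illustration):

```typescript
// Map an adherence verdict to a wire event, or to nothing at all for
// not-applicable — the noise filter described above.
type Status = 'adhered' | 'violated' | 'not-applicable'

function eventFor(skill: string, status: Status): { type: string; skill: string } | null {
  if (status === 'not-applicable') return null // check ran, action didn't apply: emit nothing
  return {
    type: status === 'adhered' ? 'skill.adherence.passed' : 'skill.adherence.failed',
    skill,
  }
}
```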

Why measure

  1. A consistently violated skill is a signal for the rule scorer to lower its confidence over time (M0.8 mining), or for the user to drop the skill manually.
  2. Adherence is the input to skill sourcing (M0.8) — the system can only propose new skills when it can show existing ones aren't covering observed patterns.
  3. It closes the "did this work?" loop. Without it, skill activation is theatre.

Next