Harness Regression Evaluator

Evaluate proposed harness changes against benchmark prompts and red-team scenarios before adoption.

evaluation, benchmarks, harness, regression

Install this skill

Install (Codex)

./bin/skills-hub install agentops/harness-regression-evaluator@<version> --runtime codex

Install (Claude)

./bin/skills-hub install agentops/harness-regression-evaluator@<version> --runtime claude

Install (Generic)

./bin/skills-hub install agentops/harness-regression-evaluator@<version> --runtime generic --target ./my-agent/skills

Operational Summary

Use when: Proposed harness changes need benchmark and red-team evaluation before adoption.

Execution mode: may-run-local-verification

Approval boundary: Safe for harness evaluation; require human approval before applying accepted harness changes.
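
The approval boundary above could be enforced with a small gate like the one sketched here. This is only an illustration: the function name `can_apply`, the `"adopt"` label, and the `approved_by` parameter are hypothetical and are not part of the skill's documented interface.

```python
from typing import Optional


def can_apply(recommendation: str, approved_by: Optional[str]) -> bool:
    """Return True only if the change was recommended AND a human signed off.

    Evaluation itself needs no approval; this gate covers only the step of
    applying an accepted harness change (hypothetical policy sketch).
    """
    has_approver = bool(approved_by and approved_by.strip())
    return recommendation == "adopt" and has_approver
```

Under this sketch, an "adopt" recommendation without a named approver is still blocked, which mirrors the stated requirement for human approval before applying changes.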

Status

Readiness: Experimental

Security reviewed: no

Lifecycle: Active

Runtime & Dependencies

ID: agentops/harness-regression-evaluator

Runtimes: codex, claude, generic

Tool dependencies: 0

API dependencies: 0

Dependencies

Tools: Not documented

APIs: Not documented

Outputs

  • Evaluation summary
  • Regression findings
  • Adopt or reject recommendation
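
As a rough illustration of how the three outputs might relate, the sketch below compares per-case results for the current (baseline) harness and the proposed (candidate) harness. Everything here is an assumption made for illustration: the names `CaseResult`, `Report`, and `build_report`, the `"benchmark"` / `"red-team"` labels, and the rule that any regression leads to a reject recommendation are not taken from the skill itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CaseResult:
    case_id: str
    kind: str  # "benchmark" or "red-team" (hypothetical labels)
    baseline_pass: bool   # result with the current harness
    candidate_pass: bool  # result with the proposed harness change


@dataclass
class Report:
    summary: Dict[str, int]   # evaluation summary
    regressions: List[str]    # regression findings (case ids)
    recommendation: str       # "adopt" or "reject"


def build_report(results: List[CaseResult]) -> Report:
    # A regression is a case that passed before the change and fails after it.
    regressions = [
        r.case_id for r in results if r.baseline_pass and not r.candidate_pass
    ]
    summary = {
        "total": len(results),
        "baseline_passed": sum(r.baseline_pass for r in results),
        "candidate_passed": sum(r.candidate_pass for r in results),
        "regressed": len(regressions),
    }
    # Assumed policy: a single regression is enough to recommend rejection.
    recommendation = "reject" if regressions else "adopt"
    return Report(summary, regressions, recommendation)
```

A stricter or looser policy (for example, tolerating benchmark regressions but never red-team ones) would change only the last step; the summary and findings would be computed the same way.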