BeClaude

skill-eval-harness

New
42 | GitHub Trending | General | by adewale

First seen 6/11/2026

Summary

The skill-eval-harness skill provides a framework for evaluating language model outputs against custom criteria, so developers can automate quality assessment of generated text. It runs structured evaluations on model responses to check consistency, accuracy, and adherence to guidelines.

Install & Usage

1. Create the skills directory
mkdir -p .claude/skills
2. Download the skill file
mkdir -p .claude/skills && curl -o .claude/skills/skill-eval-harness.md https://raw.githubusercontent.com/adewale/skill-eval-harness/main/SKILL.md
3. Invoke in Claude Code
/skill-eval-harness

Use Cases

Assess the factual accuracy of Claude's answers against a provided knowledge base.
Evaluate the tone and style of generated content to match brand guidelines.
Run automated regression tests on model outputs after prompt changes.
Compare multiple model responses to select the best one based on predefined metrics.
Validate that code snippets generated by Claude compile and pass unit tests.
Measure the conciseness and relevance of summaries produced by the model.
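
To make the regression-testing use case concrete, here is a minimal Python sketch of the general idea: score one output against named criteria, then report which criteria passed before a prompt change and fail after it. It does not show how skill-eval-harness itself works. The `Criterion` type, the `evaluate` and `regressions` helpers, and the two sample criteria are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical types and helpers for illustration only;
# none of these names come from skill-eval-harness.
@dataclass(frozen=True)
class Criterion:
    name: str
    check: Callable[[str], bool]  # returns True when the output passes


def evaluate(output: str, criteria: list[Criterion]) -> dict[str, bool]:
    """Apply every criterion to a single model output."""
    return {c.name: c.check(output) for c in criteria}


def regressions(baseline: dict[str, bool], candidate: dict[str, bool]) -> list[str]:
    """Names of criteria that passed in the baseline but fail in the candidate."""
    return [
        name
        for name, passed in baseline.items()
        if passed and not candidate.get(name, False)
    ]


# Toy criteria: a keyword check and a length limit.
criteria = [
    Criterion("mentions_paris", lambda text: "paris" in text.lower()),
    Criterion("at_most_20_words", lambda text: len(text.split()) <= 20),
]

# Outputs recorded before and after an assumed prompt change.
old_output = "The capital of France is Paris."
new_output = (
    "France is a country in Western Europe with a long history, famous food, "
    "many museums, and a capital city on the Seine."
)

baseline = evaluate(old_output, criteria)
candidate = evaluate(new_output, criteria)
print(regressions(baseline, candidate))
```

Keeping each criterion as a named predicate means a failing run points at a specific rule instead of a single aggregate score, which tends to make regressions easier to trace back to a prompt change.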

Usage Examples

1. /skill-eval-harness evaluate 'What is the capital of France?' against fact-check dataset

2. Run an evaluation on the last 10 responses for tone consistency using the guidelines in tone_rules.json

3. /skill-eval-harness compare 'Explain quantum computing' with two different prompts and score clarity

Security Audits

License: Unknown
Source: Warn
Repository: Pass

Frequently Asked Questions

How do I install skill-eval-harness?

To install skill-eval-harness, create the skills directory and download the skill file: mkdir -p .claude/skills && curl -o .claude/skills/skill-eval-harness.md https://raw.githubusercontent.com/adewale/skill-eval-harness/main/SKILL.md. Then run /skill-eval-harness in Claude Code.

What is skill-eval-harness best for?

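skill-eval-harness is best suited to automating quality assessment of generated text against custom criteria. It is listed under the General category and was created by adewale.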

What can I use skill-eval-harness for?

skill-eval-harness is useful for assessing the factual accuracy of Claude's answers against a provided knowledge base, evaluating the tone and style of generated content against brand guidelines, running automated regression tests on model outputs after prompt changes, comparing multiple model responses to select the best one based on predefined metrics, validating that code snippets generated by Claude compile and pass unit tests, and measuring the conciseness and relevance of summaries produced by the model.