harness-forge
Turn Claude Code into its own Meta-Harness: a skill that evolves the scaffolding around a fixed model (memory, retrieval, context, prompts) via a native propose→score→Pareto loop. Native reimplementation of Meta-Harness (Lee et al. 2026).
Summary
This skill transforms Claude Code into a meta-harness that automatically evolves its own scaffolding (memory, retrieval, context, and prompts) through a propose-score-Pareto loop. It reimplements the Meta-Harness approach (Lee et al. 2026), enabling developers to continuously optimize their AI workflow without manual tuning.
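The Pareto step of the loop can be illustrated with a minimal sketch. It is not taken from harness-forge itself: the `Candidate` fields, the two objectives (accuracy and token usage), and all of the numbers are assumptions made up for illustration. A candidate stays on the front if no other candidate is at least as good on both objectives and strictly better on one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical scored harness variant (names and fields are illustrative)."""
    name: str
    accuracy: float  # higher is better
    tokens: int      # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.tokens <= b.tokens
    strictly_better = a.accuracy > b.accuracy or a.tokens < b.tokens
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up scores for four harness variants.
candidates = [
    Candidate("baseline", 0.70, 3000),
    Candidate("bigger-context", 0.78, 5200),
    Candidate("tight-prompt", 0.72, 2400),
    Candidate("bloated", 0.69, 4800),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

Here "baseline" drops out because "tight-prompt" is both more accurate and cheaper, while "bigger-context" survives as the most accurate option despite its cost. Keeping the whole front rather than a single winner leaves the accuracy-versus-cost choice to the user.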
Install & Usage
mkdir -p .claude/skills
Add the configuration to .claude/skills/harness-forge.md
/harness-forge
Use Cases
Usage Examples
/harness-forge propose --objective accuracy --constraints token_budget:4000
Run a Pareto optimization loop to find the best retrieval top-k and prompt template for my current repo.
Score the current harness configuration against a set of test queries and propose an improved variant.
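As a rough sketch of what "score the current configuration and propose a variant" could look like, the toy below scores a retrieval `top_k` setting against a few test queries and compares it with a proposed variant. Everything in it is an assumption for illustration: the corpus, the queries, the word-overlap retriever, the `top_k` config key, and the word-count token estimate. The real skill presumably scores with Claude Code rather than substring matching.

```python
# Toy corpus and test queries (hypothetical; stand-ins for a real repo and eval set).
CORPUS = [
    "the build runs with make all",
    "tests run with pytest in the tests directory",
    "deploys go through the release script",
    "the linter is ruff and runs in CI",
]
TEST_QUERIES = [
    ("how do I run tests", "pytest"),
    ("which linter runs in CI", "ruff"),
    ("how do I run the release", "release script"),
]


def retrieve(query: str, top_k: int) -> list[str]:
    """Rank documents by word overlap with the query and return the top_k."""
    q = set(query.lower().split())
    ranked = sorted(CORPUS, key=lambda doc: -len(q & set(doc.lower().split())))
    return ranked[:top_k]


def score(config: dict) -> dict:
    """Score a config: fraction of queries whose expected string is retrieved,
    plus the average retrieved context size in words (a crude token proxy)."""
    hits, tokens = 0, 0
    for query, expected in TEST_QUERIES:
        context = " ".join(retrieve(query, config["top_k"]))
        hits += expected in context.lower()
        tokens += len(context.split())
    n = len(TEST_QUERIES)
    return {"accuracy": hits / n, "avg_tokens": tokens / n}


def propose(config: dict) -> dict:
    """Propose a variant by widening retrieval by one document."""
    return {**config, "top_k": config["top_k"] + 1}


current = {"top_k": 1}
variant = propose(current)
current_score, variant_score = score(current), score(variant)
print(current, current_score)
print(variant, variant_score)
```

In this toy setup the wider retrieval answers the third query that `top_k = 1` misses, at the price of a larger context. That is exactly the kind of trade-off a Pareto comparison is meant to surface.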
Frequently Asked Questions
What is harness-forge?
This skill transforms Claude Code into a meta-harness that automatically evolves its own scaffolding—memory, retrieval, context, and prompts—through a propose-score-Pareto loop. It reimplements the Meta-Harness approach (Lee et al. 2026), enabling developers to continuously optimize their AI workflow without manual tuning.
How to install harness-forge?
To install harness-forge, create the skills directory (mkdir -p .claude/skills), then add the configuration to .claude/skills/harness-forge.md. Finally, run /harness-forge in Claude Code.
What is harness-forge best for?
harness-forge is a community skill categorized under General. Created by 001TMF.
What can I use harness-forge for?
harness-forge is useful for:
- Automatically refining the system prompt and context window strategy to improve response relevance for a specific project.
- Evolving retrieval-augmented generation (RAG) parameters, such as chunk size and embedding model, based on task performance feedback.
- Optimizing memory management policies (e.g., summarization triggers, forgetting curves) to balance recall and token usage.
- Discovering Pareto-optimal configurations for multi-objective trade-offs between response accuracy, latency, and cost.
- Continuously adapting the skill's own scaffolding as the codebase grows, preventing degradation in assistant performance.
- Benchmarking and comparing different prompt engineering strategies (e.g., chain-of-thought vs. direct answer) under real usage.