What are Claude Code Skills?

Claude Code Skills are reusable prompt templates stored as markdown files in your project's .claude/skills/ directory. They let you codify best practices and common workflows into slash commands that Claude follows consistently.

How do I create a Claude Code Skill?

Create a directory under .claude/skills/ named after the skill, and put a SKILL.md file with instructions for Claude inside it. By default, the directory name becomes the command name: for example, .claude/skills/code-review/SKILL.md can be invoked with /code-review during a Claude Code session.

Can I share skills with my team?

Yes. Skills stored in .claude/skills/ can be committed to version control so every team member has access to the same standardized workflows. You can also organize skills into subdirectories by category.

What is the difference between project and global skills?

Project skills (.claude/skills/) are specific to the current project and can be committed to git alongside the code. Global skills (~/.claude/skills/) are personal skills available across all your projects.

universal-data-acquisition-pipeline

GitHub Trending · General · by Pranjay-kumar

Trigger when the user wants to collect, structure, evaluate, crawl, extract, refresh, or build reusable data acquisition pipelines from websites, APIs, portals, files, or rendered apps. Use for dataset design, source classification, feasibility, endpoint discovery, authorized/owned-session scraping plans, Playwright fallback, source probing, pagination analysis, scraper/pipeline architecture, sample validation, refresh design, and output contracts. Do not trigger for ordinary browsing, exploitative access, credential theft, CAPTCHA solving, auth bypass, rate-limit bypass, or non-data tasks.

Community Plugin

Overview

Universal Data Acquisition Pipeline

Act as the router for a data acquisition skill tree. Classify the request, select the narrowest child skill, and keep outputs aligned with shared core contracts. Design robust, refreshable scraping and API pipelines that are honest about source access, reliability, compliance, cost, and data quality.

Do not scrape immediately. First classify the source access, prove that a reliable data path exists, design a reusable pipeline, validate a small sample, and require approval before any full run.

Skill Tree

Use the child skill that best matches the request:

•data-acquisition-core: shared contracts, access classes, compliance, scorecards, output schemas, and pipeline quality standards.
•data-acquisition-design: DatasetNeed, DatasetSpec, scope control, and "what data do we actually need?"
•data-acquisition-feasibility: feasibility scoring, source comparison, Green/Yellow/Red decisions, and approval gates.
•data-acquisition-discovery: endpoint discovery, public APIs, GraphQL, XHR/fetch, sitemaps, embedded JSON, and pagination probes.
•data-acquisition-browser: Playwright, rendered DOM, browser network capture, and authorized owned-session probes.
•data-acquisition-pipeline: production pipeline architecture, pipeline.yaml, raw/staged/normalized layers, quality gates, and run reports.
•data-acquisition-publish: probe-backed case studies and publishable/non-publishable result packaging.

Modes

Select the narrowest useful mode from the user's request. Default to dataset-design when the user is still unsure what data they need, and to feasibility when they have already named a dataset.

•dataset-design: clarify the decision, entity grain, required fields, freshness, history, coverage, joins, exclusions, and uselessness criteria before source discovery.
•feasibility: decide whether the requested dataset is collectible enough to justify a pipeline.
•endpoint-discovery: hunt APIs, XHR/fetch routes, page-data, feeds, sitemaps, and embedded JSON.
•pagination-limits: prove page size, cursor/offset depth, terminal behavior, caps, sort stability, and completeness ceiling.
•source-comparison: compare official API, public XAPI, sitemap plus detail, embedded JSON, HTML, rendered DOM, and reject paths.
•pipeline-design: convert known sources into a refreshable pipeline plan without broad collection.
•sample-validation: run tiny probes and validate rows, fields, parsing, and diagnostics.
•compliance-boundary: identify Green/Yellow/Red boundaries, stop conditions, and safer alternatives.
•owned-session: use a user-provided, user-owned authenticated browser/session only for data the user is authorized to access; mark outputs non-public.
•execution: collect only after explicit approval, with checkpoints, limits, validation, and incremental outputs.
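
The default rule for mode selection can be sketched as a small router. This is a minimal illustration, not the skill's actual logic: the keyword table MODE_HINTS, the select_mode function, and its dataset_named flag are all hypothetical. Only the mode names and the two fallback defaults come from the text above.

```python
# Hypothetical keyword hints; only the mode names come from the skill text.
MODE_HINTS = {
    "endpoint-discovery": ("endpoint", "xhr", "graphql", "sitemap"),
    "pagination-limits": ("pagination", "cursor", "offset", "page size"),
    "compliance-boundary": ("compliance", "terms of service", "allowed"),
    "owned-session": ("my account", "logged in", "my session"),
    "execution": ("run the pipeline", "collect now"),
}


def select_mode(request: str, dataset_named: bool) -> str:
    """Pick the first mode whose hint appears in the request.

    Falls back to the documented defaults: "feasibility" when the user has
    already named a dataset, otherwise "dataset-design".
    """
    text = request.lower()
    for mode, hints in MODE_HINTS.items():
        if any(hint in text for hint in hints):
            return mode
    return "feasibility" if dataset_named else "dataset-design"
```

A real router would likely weigh more context than substring matches; the point here is only that an explicit request wins and the defaults apply when nothing narrower matches.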

Core Workflow

Every request must move through:

ModeSelection
SourceAccessClass
DatasetNeed
DatasetSpec
SourcePlan
EndpointPlan
HeaderProfile
ProbeResults
FeasibilityScorecard
DataAcquisitionMemo
FeasibilityReport
PipelineQualityPlan
PipelinePlan
SampleRows
ApprovalGate
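
One way to picture the stage list above is as an ordered checklist in which full collection stays blocked until the last stage is reached and the user has approved. The sketch below assumes the stages are completed strictly in the listed order. The helper names next_stage and may_run_full_collection are hypothetical and not part of the skill.

```python
from typing import Optional

# Stage names and their order are taken from the Core Workflow list.
WORKFLOW_STAGES = [
    "ModeSelection", "SourceAccessClass", "DatasetNeed", "DatasetSpec",
    "SourcePlan", "EndpointPlan", "HeaderProfile", "ProbeResults",
    "FeasibilityScorecard", "DataAcquisitionMemo", "FeasibilityReport",
    "PipelineQualityPlan", "PipelinePlan", "SampleRows", "ApprovalGate",
]


def next_stage(completed: list[str]) -> Optional[str]:
    """Return the next stage to work on, or None when all are done.

    Raises ValueError if `completed` skips or reorders a stage.
    """
    expected = WORKFLOW_STAGES[: len(completed)]
    if completed != expected:
        raise ValueError(f"stages out of order: expected {expected}, got {completed}")
    if len(completed) == len(WORKFLOW_STAGES):
        return None
    return WORKFLOW_STAGES[len(completed)]


def may_run_full_collection(completed: list[str], user_approved: bool) -> bool:
    """Allow a full run only after every stage is done and the user approved."""
    return next_stage(completed) is None and user_approved
```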

Never return raw code alone. The user wants a decision and an engineering design: what access class applies, whether the data is collectible, how complete it can be, the cheapest reliable path, the trapdoors, the quality gates, and the repeatable pipeline design.

Reference Map

Load only the shared core references needed for the request from skills/data-acquisition-core/references/, or delegate mentally to the matching child skill:

•Overall process: skills/data-acquisition-core/references/workflow.md
•Mode selection: skills/data-acquisition-core/references/modes.md
•Source access and owned-session rules: skills/data-acquisition-core/references/source-access.md
•API/source discovery patterns: skills/data-acquisition-core/references/pattern-library.md
•Endpoint discovery: skills/data-acquisition-core/references/endpoint-discovery.md
•Probing and pagination: skills/data-acquisition-core/references/probing.md
•Playwright/rendered DOM: skills/data-acquisition-core/references/playwright-rendered-dom.md
•Warm sessions and execution adapters: skills/data-acquisition-core/references/execution-adapters.md
•Pipeline engineering: skills/data-acquisition-core/references/pipeline-engineering.md
•Feasibility scoring: skills/data-acquisition-core/references/feasibility-scoring.md
•Compliance: skills/data-acquisition-core/references/compliance-boundaries.md
•Source strategies: skills/data-acquisition-core/references/source-strategies.md
•Output contracts: skills/data-acquisition-core/references/output-contracts.md
•Examples: skills/data-acquisition-core/references/examples.md

Default Posture

•Prefer structured endpoints over HTML parsing when allowed by the source access class.
•Prefer endpoint templates, pagination params, and stable IDs over browser automation.
•Use Playwright/rendered DOM only after public APIs, feeds, sitemaps, embedded JSON, and static HTML fail or are insufficient.
•Treat every ask as due diligence before implementation: answer "should we do this?" before "how do we code it?"
•Treat vague "all data" requests as dataset-design problems before source discovery.
•Use normal browser-style headers only when needed for public unauthenticated responses.
•When a public page mints request context for public XHR/API calls, use Warm Session Capture: capture user-owned browser storage state and observed non-secret request templates locally, then replay tiny probes in that same authorized browser context.
•Detect rate limits and design within them using backoff, caching, checkpointing, sampling, and approval gates. Do not bypass rate limits or access controls.
•If cookies, credentials, or auth are involved, switch to owned-session or licensed_api, mark outputs non-public, avoid storing secrets, and require explicit approval.
•Stop escalation when a path requires auth bypass, credential extraction, CAPTCHA solving, fingerprint evasion, private third-party access, or rate-limit bypass.
•Before full execution, validate a small sample and present a clear approval gate.
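
The preference order and stop conditions in this list can be sketched as a ladder walked from the cheapest path upward. The path labels, the requirement labels, and the choose_path function are hypothetical names chosen for illustration. The sketch also makes one interpretive choice the text does not spell out: a path that needs a stop condition is rejected outright, and other paths on the ladder are still considered.

```python
from typing import Optional

# Preference order from the posture list: structured sources first,
# rendered DOM (Playwright) only as the last resort.
ESCALATION_LADDER = [
    "public_api", "feed", "sitemap", "embedded_json", "static_html", "rendered_dom",
]

# Requirements that end escalation for a path, per the posture list.
STOP_CONDITIONS = frozenset({
    "auth_bypass", "credential_extraction", "captcha_solving",
    "fingerprint_evasion", "private_third_party_access", "rate_limit_bypass",
})


def choose_path(candidates: dict[str, set[str]]) -> Optional[str]:
    """Return the first viable path on the ladder, or None if there is none.

    `candidates` maps each path that probing found workable to the set of
    requirements it would need. Paths missing from the mapping are treated
    as failed or insufficient.
    """
    for path in ESCALATION_LADDER:
        if path not in candidates:
            continue  # not available for this source
        if candidates[path] & STOP_CONDITIONS:
            continue  # never work around a stop condition
        return path
    return None
```

For example, choose_path({"rendered_dom": set(), "sitemap": set()}) picks "sitemap", because it sits earlier on the ladder than browser automation.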

Output Contract

Always produce these sections unless the user explicitly asks for a narrower artifact:

•DatasetSpec
•SourceAccessClass
•DatasetNeed
•SourcePlan
•EndpointPlan
•HeaderProfile
•ProbeResults
•FeasibilityScorecard
•DataAcquisitionMemo
•FeasibilityReport
•PipelineQualityPlan
•PipelinePlan
•SampleRows
•ApprovalGate

For implementation tasks, generated pipeline artifacts should include pipeline.yaml, report.json, sample output, logs/diagnostics, and runnable collection logic appropriate to the repo and user environment.
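
As a rough illustration of the sample-validation and report.json artifacts, the sketch below checks a handful of rows for missing required fields and writes the summary to disk. The report fields (row_count, missing_by_field, passed) and both function names are assumptions made for this example; the skill's actual report schema is defined in its output contracts and is not reproduced here.

```python
import json
from pathlib import Path


def validate_sample(rows: list[dict], required_fields: list[str]) -> dict:
    """Summarize a small sample: row count and missing values per field.

    A value counts as missing when the key is absent, None, or an empty string.
    """
    missing = {
        field: sum(1 for row in rows if row.get(field) in (None, ""))
        for field in required_fields
    }
    return {
        "row_count": len(rows),
        "missing_by_field": missing,
        # An empty sample never passes; otherwise no field may be missing.
        "passed": bool(rows) and all(count == 0 for count in missing.values()),
    }


def write_report(report: dict, out_dir: Path) -> Path:
    """Write the report as report.json inside out_dir and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "report.json"
    path.write_text(json.dumps(report, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

A failed sample would be a natural reason to hold at the approval gate rather than proceed to a full run.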

Install & Usage

Create the skill directory

mkdir -p .claude/skills/universal-data-acquisition-pipeline

Download the skill file

curl -fsSL -o .claude/skills/universal-data-acquisition-pipeline/SKILL.md https://raw.githubusercontent.com/Pranjay-kumar/universal-data-acquisition-pipeline-skill/main/SKILL.md

Invoke in Claude Code

/universal-data-acquisition-pipeline

Tags: api, design

Frequently Asked Questions

What is universal-data-acquisition-pipeline?

universal-data-acquisition-pipeline is a community skill for Claude Code by Pranjay-kumar. It acts as the router for a data acquisition skill tree: it classifies the request, selects the narrowest child skill, and designs refreshable scraping and API pipelines that require a validated sample and explicit approval before any full run.

How do I install universal-data-acquisition-pipeline?

To install universal-data-acquisition-pipeline, create the .claude/skills directory in your project, then run the curl command to download the skill file. Once installed, invoke it in Claude Code with /universal-data-acquisition-pipeline.

What is universal-data-acquisition-pipeline best for?

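universal-data-acquisition-pipeline is a community skill categorized under General and tagged api and design, created by Pranjay-kumar. It is best suited to designing, evaluating, and validating reusable data acquisition pipelines from websites, APIs, portals, files, or rendered apps.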