What are Claude Code Skills?

Claude Code Skills are reusable prompt templates stored as markdown files in your project's .claude/skills/ directory. They let you codify best practices and common workflows into slash commands that Claude follows consistently.

How do I create a Claude Code Skill?

Create a markdown file in .claude/skills/ with instructions for Claude. The filename becomes the command name — for example, .claude/skills/code-review.md can be invoked with /code-review during a Claude Code session.

Can I share skills with my team?

Yes. Skills stored in .claude/skills/ can be committed to version control so every team member has access to the same standardized workflows. You can also organize skills into subdirectories by category.

What is the difference between project and global skills?

Project skills (.claude/skills/) are specific to the current project and are committed to git. Global skills (~/.claude/skills/) are personal skills available across all your projects.

scanned-pdf-to-markdown


GitHub Trending · Documentation · by chaoweiku52519

Convert scanned image PDFs (no text layer) to structured Markdown via local OCR; spec-book profile for coding guidelines.

Community plugin

Overview

Scanned PDF → Markdown

Convert scanned image PDFs (printer/scanner books, no text layer) into structured Markdown for specification documents (spec-book profile).

When to use

•User asks to convert scanned PDF / OCR book / coding guideline to Markdown
•PDF has no extractable text layer (image-only pages)
•Documents with rule tags such as 【1.1.1】, 【级别】 (level), 【反例】 (counter example), and 【正例】 (positive example)

Do not use on PDFs with a text layer — use pdfminer or markitdown instead.

Setup (once per environment)

Install Python dependencies:

```bash
python -m pip install -r {baseDir}/scripts/requirements.txt
```

Stack: pypdfium2 (render), rapidocr-onnxruntime (OCR), pdfminer.six (text-layer detection).

Output naming

Place outputs next to the source PDF:

| File | Rule |
| --- | --- |
| Final Markdown | `{pdf_stem}.md` (e.g. `开发规范1.pdf` → `开发规范1.md`) |
| OCR raw (optional) | `{pdf_stem}.ocr-raw.txt` |

Do not append _OCR, page ranges, or other suffixes unless the user asks.
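The naming rule is simple enough to express directly. This is an illustrative sketch only: `output_paths` is a hypothetical helper, not a function from the skill's scripts, and it assumes the source file ends in `.pdf`.

```python
from pathlib import Path


def output_paths(pdf_path: str) -> tuple[Path, Path]:
    """Return (final Markdown path, raw OCR path), both beside the source PDF.

    Hypothetical helper illustrating the naming rule; no extra suffixes are added.
    """
    pdf = Path(pdf_path)
    final_md = pdf.with_suffix(".md")                   # {pdf_stem}.md
    ocr_raw = pdf.with_name(pdf.stem + ".ocr-raw.txt")  # {pdf_stem}.ocr-raw.txt
    return final_md, ocr_raw
```

For example, `output_paths("开发规范1.pdf")` returns paths named `开发规范1.md` and `开发规范1.ocr-raw.txt` in the same folder as the PDF.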

Workflow

```text
- [ ] Step 1: Detect PDF type
- [ ] Step 2: OCR pages (script)
- [ ] Step 3: Structure into {pdf_stem}.md (agent)
- [ ] Step 4: Quality note (optional)
```

Step 1: Detect PDF type

```bash
python {baseDir}/scripts/detect_pdf_type.py "path/to/file.pdf"
```

•image-only (0 chars/page) → continue with this skill
•text-layer → extract text directly; do not OCR
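The script's internals are not shown here, so the following is only a minimal sketch of how a text-layer check could work with pdfminer.six (already part of the stack). `chars_per_page` and `classify_pdf` are hypothetical names, and treating the whole PDF as text-layer when any page yields text is an assumption; `detect_pdf_type.py` may decide differently for mixed documents.

```python
import sys


def classify_pdf(counts: list[int]) -> str:
    """Return "image-only" if no page has extractable text, else "text-layer".

    Assumed rule: one page with text is enough to call it a text-layer PDF.
    """
    if not counts:
        raise ValueError("PDF has no pages")
    return "image-only" if all(n == 0 for n in counts) else "text-layer"


def chars_per_page(pdf_path: str) -> list[int]:
    """Count non-whitespace characters that pdfminer.six extracts from each page."""
    # Imported lazily so classify_pdf() works without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    counts = []
    for page in extract_pages(pdf_path):
        text = "".join(el.get_text() for el in page if isinstance(el, LTTextContainer))
        counts.append(sum(not ch.isspace() for ch in text))
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    print(classify_pdf(chars_per_page(sys.argv[1])))
```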

Step 2: OCR pages

```bash
python {baseDir}/scripts/ocr_pages.py "path/to/file.pdf" --pages all --raw-out "path/to/file.ocr-raw.txt"
```

Options:

•--pages: a range (6-8), a comma-separated list (1,3,5), or all (default: all)
•--scale: default 3.5
•--min-confidence: default 0.5
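For illustration, here is one way to interpret the `--pages` syntax and the `--min-confidence` threshold. Both functions are hypothetical; the sketch assumes 1-based page numbers and OCR results delivered as `(text, score)` pairs, which may not match what `ocr_pages.py` does internally.

```python
def parse_pages(spec: str, page_count: int) -> list[int]:
    """Expand a --pages value such as "6-8", "1,3,5" or "all" into page numbers.

    Assumes 1-based page numbers; returns them sorted and de-duplicated.
    """
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} is outside 1..{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)


def keep_confident(results: list[tuple[str, float]], min_confidence: float = 0.5) -> list[str]:
    """Keep the text of OCR results whose score meets the --min-confidence threshold."""
    return [text for text, score in results if score >= min_confidence]
```

With a 20-page PDF, `parse_pages("6-8", 20)` gives `[6, 7, 8]`, the same range used in the Quick example.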

Convert the user-requested page range directly; a trial subset is optional, not required.

Step 3: Structure final Markdown

Read the raw OCR output and apply the rules in {baseDir}/profiles/spec-book.md, using {baseDir}/examples/dev-spec-p6-8.md as the format reference.

Agent responsibilities (scripts cannot do this reliably):

•Remove running headers and footers (book title, 3-digit page numbers)
•Merge paragraphs split across pages and lines broken mid-sentence
•Map structure: # 第X章 (Chapter X) / ## 1.1 / ### 【1.1.1】 / **【级别】** (level tags), etc.
•Format code blocks (text for trees, xml/java for snippets)
•Fix high-confidence OCR typos in code only (groupld→groupId, artifactld→artifactId)
•Mark scan illustrations as blockquotes
•Do not infer content beyond the selected page range

Write result to {pdf_stem}.md beside the PDF.
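Part of this cleanup is mechanical. The sketch below covers only header/footer removal and heading mapping, under stated assumptions: `structure_lines` is a hypothetical helper, and its regular expressions are guesses at typical spec-book patterns, not the rules in `profiles/spec-book.md`. Merging broken paragraphs, formatting code, and handling illustrations are left to the agent.

```python
import re

# Assumed patterns for a Chinese spec book; adjust them to the actual profile.
PAGE_NUMBER = re.compile(r"^\d{3}$")                    # bare 3-digit page-number footer
CHAPTER = re.compile(r"^第[一二三四五六七八九十\d]+章")  # 第X章 (Chapter X) -> "#"
RULE = re.compile(r"^【\d+(?:\.\d+)+】")                # 【1.1.1】 rule id -> "###"
SECTION = re.compile(r"^\d+\.\d+\s*[^\d\s.%]")          # "1.1 Title" -> "##"
TAG = re.compile(r"^(【(?:级别|反例|正例)】)")            # level / counter / positive tags -> bold


def structure_lines(lines: list[str], book_title: str) -> list[str]:
    """Drop running headers/footers and add Markdown heading markers."""
    out: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            out.append("")  # keep blank lines as paragraph separators
            continue
        if line == book_title or PAGE_NUMBER.match(line):
            continue  # running header (book title) or footer (page number)
        if CHAPTER.match(line):
            out.append(f"# {line}")
        elif RULE.match(line):
            out.append(f"### {line}")
        elif SECTION.match(line):
            out.append(f"## {line}")
        else:
            out.append(TAG.sub(r"**\1**", line))  # bold a leading rule tag
    return out
```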

Step 4: Quality note (optional)

If code is present, append:

```markdown
<!--
ocr-quality:
  prose: high|medium
  code: review-required
  truncated: yes|no
-->
```
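If the note is generated programmatically, a small helper keeps the format consistent. `quality_note` is hypothetical; it hard-codes `code: review-required`, matching the rule that OCR'd code always needs review.

```python
def quality_note(prose: str, truncated: bool) -> str:
    """Build the optional OCR quality comment for output that contains code."""
    if prose not in ("high", "medium"):
        raise ValueError("prose must be 'high' or 'medium'")
    return "\n".join([
        "<!--",
        "ocr-quality:",
        f"  prose: {prose}",
        "  code: review-required",  # OCR'd code is never copy-paste-ready
        f"  truncated: {'yes' if truncated else 'no'}",
        "-->",
    ])
```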

Code handling

| Pattern | Action |
| --- | --- |
| Prose with confidence ≥ 0.9 | Keep wording |
| Spacing/punctuation | Normalize |
| Known OCR code typo | Fix (`groupld`→`groupId`) |
| Ambiguous wording | Keep OCR literal or flag |
| Broken XML/Java tags | Fix obvious typos; flag the rest |

Never present code as copy-paste-ready without review.
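The known-typo and broken-tag rows can be partly automated. This is a sketch, not the skill's implementation: `KNOWN_CODE_TYPOS` holds only the two fixes named above, and the tag check is a deliberately crude heuristic that flags single-line XML elements whose opening and closing names differ, leaving the decision to a human reviewer.

```python
import re

# Only the documented typos: OCR reads the capital "I" in "Id" as a lowercase "l".
KNOWN_CODE_TYPOS = {"groupld": "groupId", "artifactld": "artifactId"}

# Single-line element such as <version>1.0</version>; captures both tag names.
SIMPLE_ELEMENT = re.compile(r"<(\w+)>.*</(\w+)>")


def fix_code_block(code: str) -> tuple[str, list[str]]:
    """Apply known typo fixes and return (fixed code, lines flagged for review)."""
    for wrong, right in KNOWN_CODE_TYPOS.items():
        code = re.sub(rf"\b{wrong}\b", right, code)
    flagged = []
    for line in code.splitlines():
        m = SIMPLE_ELEMENT.search(line)
        if m and m.group(1) != m.group(2):
            flagged.append(line.strip())  # mismatched tags: leave for manual review
    return code, flagged
```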

Do not

•Use markitdown on image-only PDFs (returns empty)
•Auto-merge pages outside the requested range
•Rename output away from {pdf_stem}.md unless asked

Quick example

User: Convert pages 6-8 of 开发规范1.pdf to md

```bash
python {baseDir}/scripts/detect_pdf_type.py "开发规范1.pdf"
python {baseDir}/scripts/ocr_pages.py "开发规范1.pdf" --pages 6-8 --raw-out "开发规范1.ocr-raw.txt"
```

Then produce 开发规范1.md following {baseDir}/profiles/spec-book.md.

Cursor IDE note

When installed at .cursor/skills/scanned-pdf-to-markdown/, treat {baseDir} as that folder path, or run scripts relative to the skill root.
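When running the scripts by hand rather than through a skill loader, the `{baseDir}` placeholder has to be filled in manually. A minimal sketch, with `expand_base_dir` as a hypothetical helper; it uses plain string replacement rather than `str.format`, so any other braces in a command are left untouched.

```python
from pathlib import Path


def expand_base_dir(command: str, skill_root: str) -> str:
    """Substitute the {baseDir} placeholder with the skill's install folder."""
    return command.replace("{baseDir}", Path(skill_root).as_posix())
```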

Install & Usage

Create the skills directory

```bash
mkdir -p .claude/skills
```

Download the skill file

```bash
mkdir -p .claude/skills && curl -o .claude/skills/scanned-pdf-to-markdown.md https://raw.githubusercontent.com/chaoweiku52519/scanned-pdf-to-markdown/main/SKILL.md
```

Invoke in Claude Code

```text
/scanned-pdf-to-markdown
```


Frequently Asked Questions

What is scanned-pdf-to-markdown?

Convert scanned image PDFs (no text layer) to structured Markdown via local OCR; spec-book profile for coding guidelines.

How to install scanned-pdf-to-markdown?

To install scanned-pdf-to-markdown, create the .claude/skills directory in your project, then run the curl command to download the skill file. Once installed, invoke it in Claude Code with /scanned-pdf-to-markdown.

What is scanned-pdf-to-markdown best for?

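scanned-pdf-to-markdown is a community plugin in the Documentation category, created by chaoweiku52519. It is best for converting image-only scanned PDFs, such as printed coding-guideline books, into structured Markdown.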