md-doc-convert
THE default and only sanctioned way to convert between Markdown and branded PDF, HTML, or DOCX -- one engine, one shared theme file, identical brand across all formats, BOTH directions. Markdown is the source of truth. Forward: md -> PDF/Word/HTML. Reverse: doc -> md (`--to md`; docx/html/odt/rtf via pandoc unwrapped, pdf best-effort via PyMuPDF) so the SAME tool ingests a document into editable markdown instead of a one-off pandoc call. ALWAYS use this skill for any md <-> PDF / Word / HTML conversion in either direction; do NOT write or run per-project build/convert scripts (build_manual.py, build_docx.py, ad-hoc weasyprint/pandoc/pymupdf one-offs) -- if a project needs something the converter lacks, ADD it to this skill instead. Per-document settings (cover meta, effective date, badge, toc, page_break_h2) belong in a YAML frontmatter block in the .md (deep-merged over the theme, document wins), NOT in the theme or a project script. Brand (palette, fonts, emblem, footer) lives in themes/<name>.json; render with `--theme <name>`. Use when asked to produce a PDF / Word / HTML document from markdown, build a branded or themed document, generate a policy manual / report / handbook, add a cover page or Table of Contents, or apply a company brand to a document. Triggers: convert markdown, md to pdf, md to docx, md to html, markdown to word, document converter, branded document, theme.json, frontmatter, cover page, table of contents, TOC field, WeasyPrint, pandoc docx, python-docx, signature lines, policy manual, rebuild deliverable, reprint pdf.
Summary
- This skill provides a single, sanctioned engine for converting Markdown files to branded PDF, HTML, or DOCX, and reverse-converting documents back to Markdown.
- It enforces consistent branding across all output formats via shared theme files, eliminating the need for per-project build scripts.
Overview
One engine (convert.py), three renderers, one role-based theme. Markdown is the source; a single themes/<name>.json drives PDF, HTML, and DOCX so a brand renders the same everywhere. This supersedes per-project build_*.py scripts.
Invoke
VENV=~/.claude/skills/md-doc-convert/venv/bin/python
CONV=~/.claude/skills/md-doc-convert/convert.py
$VENV $CONV --list-themes # discover what's available
$VENV $CONV --to pdf --theme example input.md output.pdf
$VENV $CONV --to docx --theme example input.md output.docx
$VENV $CONV --to html --theme base input.md output.html
$VENV $CONV --to pdf --theme /path/to/acme.json input.md output.pdf # external theme
# reverse direction: rich document -> markdown source (no theme needed)
$VENV $CONV --to md report.docx report.md # docx/html/odt/rtf via pandoc (unwrapped)
$VENV $CONV --to md scan.pdf scan.md # pdf -> best-effort text (PyMuPDF)
# --to is optional if the output extension is .pdf/.docx/.html/.md
# --theme accepts a bundled NAME (themes/<name>.json) OR a PATH to any .json;
# default: base. A wrong name prints the list of available themes.

Run from the project directory (CWD) so project-local assets (e.g. a logo named in the theme) resolve. The skill uses a venv (markdown, weasyprint, python-docx) and needs pandoc on PATH for DOCX. See the Install section in README.md for per-OS setup.
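The two resolution rules in the comments above (format inferred from the output extension, theme given as a bundled name or a path) can be sketched as follows. This is an illustration only, not the code in convert.py: `infer_target`, `resolve_theme`, and `EXT_TO_FORMAT` are hypothetical names, and the exact error behaviour is an assumption.

```python
from pathlib import Path

# Hypothetical mapping; mirrors the extensions named in the usage comments.
EXT_TO_FORMAT = {".pdf": "pdf", ".docx": "docx", ".html": "html", ".md": "md"}


def infer_target(output_path, explicit=None):
    """Return the target format: an explicit --to wins, else the output extension."""
    if explicit:
        return explicit
    ext = Path(output_path).suffix.lower()
    if ext not in EXT_TO_FORMAT:
        raise ValueError(f"cannot infer format from {ext!r}; pass --to")
    return EXT_TO_FORMAT[ext]


def resolve_theme(arg, themes_dir):
    """Resolve --theme: a path to an existing .json file, or a bundled name."""
    candidate = Path(arg)
    if candidate.suffix == ".json" and candidate.is_file():
        return candidate
    bundled = Path(themes_dir) / f"{arg}.json"
    if bundled.is_file():
        return bundled
    # Assumed behaviour: report the bundled themes when the name is wrong.
    available = sorted(p.stem for p in Path(themes_dir).glob("*.json"))
    raise FileNotFoundError(
        f"unknown theme {arg!r}; available: {', '.join(available)}"
    )
```

Trying the path first and the bundled name second keeps `--theme base` and `--theme /path/to/acme.json` on one code path.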
Themes
Role-based names so one schema fits any brand (NOT teal/lime). Hex has no leading `#`. Bundled themes:
- base -- neutral blue, no cover/TOC (plain professional document).
- example -- fictional brand showing a full cover + TOC.
To add a business: copy example.json to <brand>.json, override the palette + cover, drop its logo in themes/. When the brand has design tokens (a tokens.ts/Tailwind config), source the palette from there. Honor contrast rules: if a decorative color fails as body text, use a darker text-safe variant for accent_deep. Brand themes in themes/ are git-ignored, so private branding is never committed.
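The source does not say which contrast rule applies. Assuming it means the WCAG 2.x contrast ratio with the usual 4.5:1 minimum for body text on a white page, a candidate accent_deep could be checked with a sketch like this. The function names and the sample colours are hypothetical, not part of the skill.

```python
def _linear(channel_8bit):
    """Linearize one 8-bit sRGB channel (sRGB transfer function)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex6):
    """Relative luminance of a 6-digit hex colour written without '#'."""
    r, g, b = (int(hex6[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg, bg="FFFFFF"):
    """WCAG contrast ratio between two colours, always >= 1."""
    lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def text_safe(hex6, bg="FFFFFF", minimum=4.5):
    """True if the colour meets the assumed 4.5:1 body-text minimum."""
    return contrast_ratio(hex6, bg) >= minimum


# Hypothetical brand colours: a bright lime fails as text, so a darker
# variant would be chosen for accent_deep.
print(text_safe("32CD32"), text_safe("1E5E1E"))
```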
{
"palette": { "accent","accent_bright","accent_deep","accent_tint",
"edge","alert","slate","slate_tint","ink" }, // hex, no '#'
"font": "Helvetica, Arial, sans-serif", // PDF/HTML font stack
"docx_font": "Arial", // Word font
"body_pt": 10.5,
"toc": true, // add a Table of Contents
"footer": "Org | Document", // page footer (optional)
"page_break_h2": ["Form A","Form B"], // force these H2s onto a fresh page
"cover": { // omit for a plain doc (no cover)
"org": "...", "title": "...", "meta": ["line","line"],
"badge": "...", "emblem": "logo.png" // emblem resolves rel. to themes/ dir
}
}

Palette roles → use:
- accent: rules, table headers, callout borders
- accent_deep: heading text, part-divider banners, cover title
- accent_tint: table zebra, callout background
- edge: part-divider left edge, H3 square bullet
- alert: emphasis (sparingly)
- slate: captions, meta, footer
- ink: body text

For CMYK/SWOP brand art, sample the palette via a color-managed conversion (macOS sips --matchTo), never a naive RGB convert (oversaturates).
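A theme file can be sanity-checked before rendering. The sketch below assumes that all nine palette roles from the schema are required and that each value is exactly six hex digits with no leading `#`. The source only states the no-`#` rule, so the rest is an assumption, and `palette_problems` is a hypothetical helper rather than part of convert.py.

```python
import re

# Role names copied from the schema above.
PALETTE_ROLES = (
    "accent", "accent_bright", "accent_deep", "accent_tint",
    "edge", "alert", "slate", "slate_tint", "ink",
)
HEX6 = re.compile(r"[0-9A-Fa-f]{6}")


def palette_problems(theme):
    """Return human-readable problems found in theme['palette']."""
    problems = []
    palette = theme.get("palette", {})
    for role in PALETTE_ROLES:
        value = palette.get(role)
        if value is None:
            problems.append(f"missing role: {role}")
        elif not HEX6.fullmatch(str(value)):
            problems.append(
                f"{role}: expected 6 hex digits without '#', got {value!r}"
            )
    return problems
```

Returning a list rather than raising on the first error lets a brand author fix every palette issue in one pass.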
Per-document frontmatter (document-specific settings live in the .md)
A leading YAML frontmatter block in the input .md is deep-merged over the theme, document wins. This is the sanctioned place for anything specific to one document, so the theme stays reusable brand and nothing custom lives in the project. Keep brand (palette, fonts, emblem, footer) in themes/<name>.json; keep cover meta / effective date / badge / toc / page_break_h2 in the .md.
---
toc: true
page_break_h2: [Employee Acknowledgement Form, ATTACHMENT A]
cover:
  org: Minnesota Community Care
  title: Policy Manual
  meta: ["Effective | 01.01.2026", "Last Revised | 06.17.2026"]
  badge: "Includes Minnesota Paid Family & Medical Leave<br>effective January 1, 2026"
---

Merge is recursive: cover.emblem can stay in the theme while the .md supplies cover.meta/badge. Frontmatter is stripped before markdown/pandoc, so it never renders. Needs pyyaml (in requirements.txt). No frontmatter → theme used as-is.
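The "deep-merged over the theme, document wins" rule can be illustrated with a small recursive merge. This is a sketch of the general technique, not the function in convert.py. The name `deep_merge` is hypothetical, and replacing lists wholesale (rather than concatenating them) is an assumption.

```python
import copy


def deep_merge(theme, overrides):
    """Return a new dict: overrides win, nested dicts merge key by key."""
    merged = copy.deepcopy(theme)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists from the document replace the theme value.
            merged[key] = copy.deepcopy(value)
    return merged


# Theme keeps the emblem; the document's frontmatter supplies meta and badge.
theme = {"toc": False, "cover": {"org": "Example Org", "emblem": "logo.png"}}
frontmatter = {
    "toc": True,
    "cover": {"meta": ["Effective | 01.01.2026"], "badge": "Draft"},
}
settings = deep_merge(theme, frontmatter)
```

Here `settings["cover"]` ends up with all four keys (org, emblem, meta, badge), and the theme dict itself is left unmodified so it can be reused for the next document.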
Cover badge wrapping. The badge "bubble" uses balanced line wrapping so a short message never strands a single word on its own line. For exact control, put a literal <br> in cover.badge to force the break (renders as a line break in PDF, HTML, and DOCX). Prefer <br> over an em-dash (em-dashes are disallowed in these docs and also wrap badly).
How each format is built
| Format | Pipeline |
|---|---|
| PDF | md → python-markdown → themed HTML (cover, TOC, CSS) → WeasyPrint |
| HTML | same themed HTML, emblem embedded as a data URI, written to disk |
| DOCX | md → pandoc → python-docx applies palette, cover, TOC field, tables, sig lines |
PDF and HTML share build_html(); DOCX is render_docx(). Markdown preprocessing (stripping <!--PAGE--> markers, handling signature blocks) is shared in intent but implemented per format (CSS classes for PDF/HTML; [[SIG]] markers for DOCX).
Gotchas baked into the engine (cross-platform, not Mac-specific)
- DOCX TOC is a Word field we build ourselves (not pandoc --toc, whose SDT block would land above a prepended cover). updateFields=true makes Word build it on open. WeasyPrint/LibreOffice/web-Word do NOT populate Word TOC fields; only desktop Word does, so a converted preview shows the placeholder text.
- **Signature lines: each rule is its OWN blank bordered paragraph, split by the label**, because Word MERGES identical adjacent paragraph borders, which would collapse a stack of signature rules into one. (See docx-notes for the rule.)
- Signature markers need a blank line before them or the first one glues onto the preceding paragraph (markdown soft-break) and loses styling; sig_repl() wraps them in \n\n.
- pandoc's base.docx has no `Table Grid` style → set w:tblBorders in XML.
- Signature source convention (for cover/forms): markdown with <div class="sigblock"><div class="sigline">LABEL</div>…</div>. PDF/HTML style it via .sigblock/.sigline CSS; DOCX converts it to [[SIG]] markers.
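The blank-line gotcha for signature markers can be sketched with a regex pass over the markdown. This is not the engine's sig_repl(): the `[[SIG]] LABEL` marker layout, the function names, and the assumption that a sigblock contains only sigline divs are all hypothetical. The point shown is only that each marker is wrapped in `\n\n` so it becomes its own paragraph.

```python
import re

# Assumes a sigblock holds nothing but sigline divs (hypothetical simplification).
SIGBLOCK = re.compile(
    r'<div class="sigblock">\s*((?:<div class="sigline">.*?</div>\s*)+)</div>',
    re.S,
)
SIGLINE = re.compile(r'<div class="sigline">(.*?)</div>', re.S)


def sig_repl_sketch(match):
    """Turn one sigblock into markers, each isolated by blank lines."""
    labels = SIGLINE.findall(match.group(1))
    return "".join(f"\n\n[[SIG]] {label.strip()}\n\n" for label in labels)


def to_docx_markers(markdown_text):
    """Replace every sigblock div with blank-line-separated markers."""
    return SIGBLOCK.sub(sig_repl_sketch, markdown_text)


source = (
    "I acknowledge receipt.\n"
    '<div class="sigblock"><div class="sigline">Employee</div>'
    '<div class="sigline">Date</div></div>\n'
    "Next section"
)
converted = to_docx_markers(source)
```

Without the surrounding `\n\n`, the first marker would follow "I acknowledge receipt." after a single newline and be folded into that paragraph as a soft break.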
Verified
Exercised on a ~60-page branded policy manual across all three formats: cover page, multi-page Table of Contents, part-divider banners, styled tables with zebra striping, subsection bullets, and multi-line signature forms (each rule in its own paragraph). The base theme renders a clean cover-less document in PDF, HTML, and DOCX.
Notes
Self-contained: the engine resolves its own location (__file__), so the folder can live anywhere. The only non-portable piece is venv/ (rebuild per machine from requirements.txt; see the Install section in README.md).
Install & Usage
mkdir -p .claude/skills
mkdir -p .claude/skills && curl -o .claude/skills/md-doc-convert.md https://raw.githubusercontent.com/Corvalon/skill-md-doc-convert/main/SKILL.md
/md-doc-convert
Usage Examples
/md-doc-convert --to pdf --theme acme manual.md manual.pdf
Convert report.docx to markdown using the reverse direction.
Build a branded HTML document from policy.md with a table of contents and cover page.
Frequently Asked Questions
How to install md-doc-convert?
To install md-doc-convert, create the skills directory and download the skill file in one step: mkdir -p .claude/skills && curl -o .claude/skills/md-doc-convert.md https://raw.githubusercontent.com/Corvalon/skill-md-doc-convert/main/SKILL.md. Then run /md-doc-convert in Claude Code.
What is md-doc-convert best for?
md-doc-convert is a skill categorized under Documentation and tagged documentation and python. Created by Corvalon.
What can I use md-doc-convert for?
md-doc-convert is useful for:
- Generating a branded policy manual in PDF from a Markdown source with a company theme.
- Converting a client's Word document (.docx) into editable Markdown for version control.
- Producing an HTML version of a technical report with a table of contents and cover page.
- Rebuilding all deliverables (PDF, DOCX, HTML) after updating the Markdown source or theme.
- Extracting text from a scanned PDF into Markdown for further editing.
- Applying a custom theme.json to generate a consistent brand across PDF, DOCX, and HTML outputs.