BeClaude

md-skill

GitHub Trending · General · by MRGHOZ

First seen 6/18/2026

Summary

This skill provides a Python cheat sheet for manipulating PDFs, covering reading, editing, and regenerating PDF files using pdfplumber, reportlab, and pypdf.

  • It includes techniques for overlaying text on existing layouts without altering the original design, such as updating prices in a price list.

Overview

Cheat Sheet: PDF Manipulation with Python

A practical summary of the three main libraries that are often used together to read, edit, and rebuild PDFs, including the technique used to update prices in a price list (overlaying text on top of the original layout without breaking the design).


1. When to Use Which Library

| Need | Library |
| --- | --- |
| Read text, tables, coordinates, and colors from an existing PDF | pdfplumber |
| Create new content (text, boxes, lines) from scratch | reportlab |
| Merge, split, rotate, encrypt, or overlay PDF pages | pypdf |

Install everything in one go:

bash
pip install pdfplumber reportlab pypdf --break-system-packages

2. pdfplumber: Reading PDFs

Extract text word by word, with coordinates

python
import pdfplumber

with pdfplumber.open("file.pdf") as pdf:
    page = pdf.pages[0]
    words = page.extract_words()
    for w in words:
        print(w["text"], w["x0"], w["x1"], w["top"], w["bottom"])

Note: top/bottom are measured from the top of the page (unlike reportlab; see section 6).
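
The top-down versus bottom-up difference mentioned in the note can be captured in a small helper. This is only a minimal sketch: `to_reportlab_y` is a hypothetical name (it belongs to neither library), and it assumes an unrotated page whose origin sits at the bottom-left corner.

```python
def to_reportlab_y(page_height, top, bottom):
    """Convert pdfplumber top/bottom (measured from the top edge) into
    reportlab y values (measured from the bottom edge).

    Returns (y_bottom, y_top): the lower and upper edge of the same box.
    """
    y_bottom = page_height - bottom
    y_top = page_height - top
    return y_bottom, y_top


# Example: a word located 100-112 pt below the top of an 842 pt tall page.
y_bottom, y_top = to_reportlab_y(842, top=100, bottom=112)
print(y_bottom, y_top)  # 730 742
```

The lower edge (`y_bottom`) is the value to pass to `rect()` or `drawString()`, since both take the bottom-left corner or baseline as their y argument.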

Extract tables

python
tables = page.extract_tables()   # `page` comes from the previous block
if tables:                       # the list is empty when no table is detected
    for row in tables[0]:
        print(row)
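
Once a table has been extracted, it is usually saved somewhere. The sketch below writes one table (a list of rows, the shape `extract_tables()` returns per table) to CSV text using only the standard library. `table_to_csv` and the sample rows are hypothetical stand-ins, not part of pdfplumber.

```python
import csv
import io


def table_to_csv(table):
    """Serialize one extracted table (a list of rows) to CSV text.

    Cells may be None (for example, empty cells), so they are written
    out explicitly as empty strings.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in table:
        writer.writerow(["" if cell is None else cell for cell in row])
    return buf.getvalue()


# Stand-in data shaped like tables[0]; real input would come from pdfplumber.
sample = [["Item", "Price"], ["Widget", "15,000"], ["Gadget", None]]
csv_text = table_to_csv(sample)
print(csv_text)
```

Prices such as "15,000" contain a comma, so the `csv` module quotes them automatically. Plain string joining would break those rows.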

Check the background color of an area (rects)

Useful when you want to cover old text with exactly the same color as the yellow/highlight box behind it:

python
for r in page.rects:
    if r["fill"]:
        print(r["x0"], r["x1"], r["top"], r["bottom"], r["non_stroking_color"])
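
The color value printed above is not always a three-component RGB tuple. Depending on the PDF, it may be missing, grayscale, or CMYK. Below is a hedged sketch of a normalizer that produces something `setFillColorRGB()` can accept. `to_rgb` is a hypothetical helper, and the CMYK conversion is the naive formula, not a color-managed one.

```python
def to_rgb(color, default=(1, 1, 1)):
    """Normalize a color value to an (r, g, b) tuple with components in 0..1.

    Falls back to `default` (white) when the value is missing or has an
    unexpected number of components.
    """
    if color is None:
        return default
    if isinstance(color, (int, float)):      # bare grayscale value
        return (color, color, color)
    color = tuple(color)
    if len(color) == 1:                      # grayscale as a 1-tuple
        return (color[0],) * 3
    if len(color) == 3:                      # already RGB
        return color
    if len(color) == 4:                      # naive CMYK -> RGB
        c, m, y, k = color
        return ((1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k))
    return default


print(to_rgb((1, 1, 0.4)))   # (1, 1, 0.4)
print(to_rgb((0.5,)))        # (0.5, 0.5, 0.5)
```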

Check the original font and character size

python
for c in page.chars[:5]:
    print(c["text"], c["fontname"], c["size"], c["non_stroking_color"])

3. reportlab: Creating New Content / Overlays

Basic canvas; its size must match the target PDF page

python
from reportlab.pdfgen import canvas
import io

packet = io.BytesIO()
c = canvas.Canvas(packet, pagesize=(page.width, page.height))  # dimensions taken from pdfplumber
c.save()
packet.seek(0)

Register a custom font (TTF) before using it

python
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont

pdfmetrics.registerFont(TTFont("MyFont", "/usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf"))
c.setFont("MyFont", 12)
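
The font path above only exists on systems that ship the Liberation fonts. One way to keep a script from crashing elsewhere is sketched below: register the TTF if the file is present, otherwise fall back to one of reportlab's built-in fonts. `pick_font` and its parameters are hypothetical names introduced for illustration.

```python
import os


def pick_font(ttf_path, name="MyFont", fallback="Helvetica-Bold"):
    """Register ttf_path under `name` if the file exists and return `name`;
    otherwise return a built-in font name so setFont() still works."""
    if not os.path.isfile(ttf_path):
        return fallback
    # Imported here so reportlab is only needed when a TTF is registered.
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont

    pdfmetrics.registerFont(TTFont(name, ttf_path))
    return name


font_name = pick_font("/path/that/does/not/exist.ttf")
print(font_name)  # Helvetica-Bold
```

The returned name can then be passed straight to `c.setFont(font_name, 12)`.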

Draw a solid box (to cover the old text) plus the new text

python
c.setFillColorRGB(1, 1, 0.4)   # box color (match the original background)
c.rect(x, y, w, h, fill=1, stroke=0)

c.setFillColorRGB(0, 0, 0)     # text color
c.drawString(x, y, "New Text")

⚠️ Don't use Unicode subscript/superscript characters (₀¹²) with reportlab's built-in fonts: they lack these glyphs, so the output shows black boxes. Use <sub>/<super> tags in a Paragraph instead of a plain canvas string.
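
To illustrate the warning above, here is a minimal sketch that rewrites Unicode subscript/superscript digits into the `<sub>`/`<super>` markup that Paragraph understands, and then draws the result on a canvas. Both `to_paragraph_markup` and `draw_markup` are hypothetical helpers. The sketch covers digits only and does not escape characters such as `&` or `<` in the input.

```python
SUBSCRIPTS = dict(zip("₀₁₂₃₄₅₆₇₈₉", "0123456789"))
SUPERSCRIPTS = dict(zip("⁰¹²³⁴⁵⁶⁷⁸⁹", "0123456789"))


def to_paragraph_markup(text):
    """Replace Unicode sub/superscript digits with <sub>/<super> tags."""
    out = []
    for ch in text:
        if ch in SUBSCRIPTS:
            out.append(f"<sub>{SUBSCRIPTS[ch]}</sub>")
        elif ch in SUPERSCRIPTS:
            out.append(f"<super>{SUPERSCRIPTS[ch]}</super>")
        else:
            out.append(ch)
    return "".join(out)


def draw_markup(c, markup, x, y, width=200, height=50):
    """Draw marked-up text on canvas `c` with its lower-left corner at (x, y)."""
    # Imported here so the text conversion above works without reportlab.
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import Paragraph

    p = Paragraph(markup, getSampleStyleSheet()["Normal"])
    p.wrapOn(c, width, height)   # lay the paragraph out within the box
    p.drawOn(c, x, y)


print(to_paragraph_markup("H₂O, 10 m²"))
# H<sub>2</sub>O, 10 m<super>2</super>
```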


4. pypdf: Merge, Overlay, Rotate, etc.

Basic read and write

python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("file.pdf")
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)
with open("output.pdf", "wb") as f:   # the file handle is closed on exit
    writer.write(f)

⚠️ merge_page(): direction matters!

python
base_page.merge_page(overlay_page)

This means `overlay_page` is drawn ON TOP OF `base_page`. The pattern is A.merge_page(B) → B appears on top of A.

So if you want the new overlay text to be visible (not covered by the original design):

python
orig_page.merge_page(overlay_page)   # correct: overlay on top, visible
writer.add_page(orig_page)

If you reverse it (overlay_page.merge_page(orig_page)), the overlay ends up hidden beneath the original design. Always check the rendered result (see gotcha 2 in section 6): in the author's experience this behavior has sometimes appeared reversed between pypdf versions, so don't trust it without visual verification.

Rotate / Split / Encrypt (in brief)

python
page.rotate(90)                          # rotate
writer.encrypt("user_pw", "owner_pw")    # password protect
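
The heading mentions splitting, but the snippet above only shows rotate and encrypt. A minimal sketch of splitting follows, assuming the goal is to cut a file into fixed-size chunks. `chunk_ranges` and `split_pdf` are hypothetical helpers, and the output naming scheme is made up for illustration.

```python
def chunk_ranges(n_pages, size):
    """Return (start, stop) index pairs that cut n_pages into chunks of `size`."""
    return [(i, min(i + size, n_pages)) for i in range(0, n_pages, size)]


def split_pdf(src, size, prefix="part"):
    """Write src out as prefix_1.pdf, prefix_2.pdf, ... with `size` pages each."""
    # Imported here so chunk_ranges() can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    ranges = chunk_ranges(len(reader.pages), size)
    for n, (start, stop) in enumerate(ranges, 1):
        writer = PdfWriter()
        for i in range(start, stop):
            writer.add_page(reader.pages[i])
        with open(f"{prefix}_{n}.pdf", "wb") as f:
            writer.write(f)


print(chunk_ranges(7, 3))  # [(0, 3), (3, 6), (6, 7)]
```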

5. Practical Example: Updating Prices in a PDF (Without Changing the Layout)

The complete pattern used to raise every price in a price list by 20% while keeping the design, product images, and colors unchanged:

python
import pdfplumber, re, io
from reportlab.pdfgen import canvas
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from pypdf import PdfReader, PdfWriter

pdfmetrics.registerFont(TTFont("Bold", "/usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf"))

def overlay_untuk_halaman(page, factor=1.2):
    packet = io.BytesIO()
    c = canvas.Canvas(packet, pagesize=(page.width, page.height))

    for w in page.extract_words():
        if not re.match(r"^\d{1,3}(,\d{3})+$", w["text"]):
            continue  # skip anything that is not a price like "15,000"

        harga_lama = int(w["text"].replace(",", ""))
        harga_baru = f"{round(harga_lama * factor):,}"

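        # find the background color behind this text (default: white)
        bg = (1, 1, 1)
        for r in page.rects:
            color = r["non_stroking_color"]
            # setFillColorRGB needs exactly 3 components; skip gray/CMYK fills
            is_rgb = isinstance(color, (tuple, list)) and len(color) == 3
            if r["fill"] and is_rgb \
               and r["x0"] <= w["x0"]+5 and r["x1"] >= w["x1"]-5 \
               and r["top"] <= w["top"]+2 and r["bottom"] >= w["bottom"]-2:
                bg = color
                break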

        # convert pdfplumber coordinates (top-down) -> reportlab (bottom-up)
        y_bawah = page.height - w["bottom"]
        h = w["bottom"] - w["top"]

        c.setFillColorRGB(*bg)
        c.rect(w["x0"]-2, y_bawah-1, (w["x1"]-w["x0"])+20, h+2, fill=1, stroke=0)

        c.setFillColorRGB(0, 0, 0)
        c.setFont("Bold", h*0.82)
        c.drawString(w["x0"]-4, y_bawah+1, harga_baru)

    c.save()
    packet.seek(0)
    return packet

with pdfplumber.open("price_list.pdf") as pdf:
    original = PdfReader("price_list.pdf")
    writer = PdfWriter()
    for i, page in enumerate(pdf.pages):
        overlay = PdfReader(overlay_untuk_halaman(page)).pages[0]
        orig_page = original.pages[i]
        orig_page.merge_page(overlay)   # overlay on top, old price covered
        writer.add_page(orig_page)
    with open("price_list_updated.pdf", "wb") as f:
        writer.write(f)
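
The price-matching and re-formatting step in the function above can be checked in isolation, without any PDF. The sketch below pulls that logic into a standalone helper. `bump_price` and `PRICE_RE` are hypothetical names, but the regex and rounding are the same as in the example.

```python
import re

# Same pattern as above: 1-3 digits followed by one or more ",ddd" groups.
PRICE_RE = re.compile(r"^\d{1,3}(,\d{3})+$")


def bump_price(text, factor=1.2):
    """Return the re-formatted new price, or None if `text` is not a price."""
    if not PRICE_RE.match(text):
        return None
    old = int(text.replace(",", ""))
    return f"{round(old * factor):,}"


print(bump_price("15,000"))     # 18,000
print(bump_price("1,250,000"))  # 1,500,000
print(bump_price("500"))        # None
```

The pattern requires at least one thousands separator, so prices below 1,000 (such as "500") are skipped. If a price list contains such values, the regex would need to be loosened.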

6. Common Gotchas

  1. Coordinate systems run in opposite directions. pdfplumber uses top (distance from the top of the page); reportlab measures y from the bottom. Conversion: y_reportlab = page_height - top_pdfplumber. For the lower edge of a box or a text baseline, use page_height - bottom instead, as in section 5.
  2. The direction of `merge_page()` determines which page ends up on top. Always render to an image (pdf2image) and check visually before sending out the final result.
  3. The cover box color must match the original background (plain white vs. yellow highlight); otherwise the result looks "patched".
  4. The cover box must be slightly wider than the old text, so that no remnants of the old number peek out when the new number is shorter.
  5. Fonts must be registered (registerFont) before being used in setFont(), unlike system fonts, which are automatically available in Word/LibreOffice.
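
Gotchas 1 and 4 both come down to the geometry of the cover box. The following is a minimal sketch of that computation, assuming a word dict with the `x0`/`x1`/`top`/`bottom` keys that `extract_words()` returns. `cover_box`, `pad`, and `extra_width` are hypothetical names, and the default values are illustrative, not tuned.

```python
def cover_box(word, page_height, pad=2, extra_width=20):
    """Compute (x, y, width, height) for a reportlab rect that hides `word`.

    The box is flipped into bottom-up coordinates, padded on every side,
    and widened on the right by `extra_width`.
    """
    x = word["x0"] - pad
    y = page_height - word["bottom"] - pad   # flip to bottom-up coordinates
    width = (word["x1"] - word["x0"]) + 2 * pad + extra_width
    height = (word["bottom"] - word["top"]) + 2 * pad
    return x, y, width, height


word = {"x0": 400, "x1": 440, "top": 100, "bottom": 112}
print(cover_box(word, page_height=842))  # (398, 728, 64, 16)
```

The resulting tuple can be unpacked directly into `c.rect(x, y, width, height, fill=1, stroke=0)`.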

Note: this document is a summary of general techniques based on open-source libraries (pdfplumber, reportlab, pypdf), rewritten from project experience; it is not a copy of any internal material.

Install & Usage

1. Create the skills directory
   mkdir -p .claude/skills
2. Download the skill file
   mkdir -p .claude/skills && curl -o .claude/skills/md-skill.md https://raw.githubusercontent.com/MRGHOZ/md-skill/main/SKILL.md
3. Invoke in Claude Code
   /md-skill

Use Cases

Extract text, tables, and coordinates from existing PDFs for data analysis.
Overlay new text onto a PDF page while preserving the original layout and design.
Create new PDF content from scratch, including text, boxes, and lines.
Merge, split, rotate, or encrypt PDF pages using pypdf.
Identify background colors and font properties of text in a PDF for precise editing.
Automate price list updates by overlaying new prices on existing PDF templates.

Usage Examples

1. /md-skill Extract all tables from invoice.pdf and save as CSV
2. /md-skill Overlay 'New Price: $99' at coordinates (100, 200) on page 1 of price_list.pdf
3. /md-skill Merge three PDF files into one and encrypt with password 'secret123'


Security Audits

License: Unknown · Source: Warn · Repository: Pass

Frequently Asked Questions

What is md-skill?

This skill provides a Python cheat sheet for manipulating PDFs, covering reading, editing, and regenerating PDF files using pdfplumber, reportlab, and pypdf. It includes techniques for overlaying text on existing layouts without altering the original design, such as updating prices in a price list.

How to install md-skill?

To install md-skill, create the skills directory (mkdir -p .claude/skills), then run: mkdir -p .claude/skills && curl -o .claude/skills/md-skill.md https://raw.githubusercontent.com/MRGHOZ/md-skill/main/SKILL.md. Finally, invoke /md-skill in Claude Code.

What is md-skill best for?

md-skill is a skill categorized under General. Created by MRGHOZ.

What can I use md-skill for?

md-skill is useful for: extracting text, tables, and coordinates from existing PDFs for data analysis; overlaying new text onto a PDF page while preserving the original layout and design; creating new PDF content from scratch, including text, boxes, and lines; merging, splitting, rotating, or encrypting PDF pages using pypdf; identifying background colors and font properties of text in a PDF for precise editing; and automating price list updates by overlaying new prices on existing PDF templates.