ollama-deepseek-ocr-tool

Community Registry | Documentation | by Dennis Vriend

A CLI tool that loads images, sends each one to Ollama with a prompt for the DeepSeek-OCR model, and instructs the model to return the image's text as markdown.

<div align="center">

<img src=".github/assets/logo.png" alt="ollama-deepseek-ocr-tool logo" width="200"/>

ollama-deepseek-ocr-tool

[Python Version](https://www.python.org/downloads/) | [License: MIT](https://opensource.org/licenses/MIT) | [Code style: ruff](https://github.com/astral-sh/ruff) | [Type checked: mypy](https://github.com/python/mypy) | [AI Generated](https://www.anthropic.com/claude) | [Built with Claude Code](https://www.anthropic.com/claude/code)

A CLI tool for batch OCR processing of document images using DeepSeek-OCR via Ollama.

</div>

Overview

Convert sequences of textbook pages, lecture slides, or scanned documents into a single, coherent markdown file suitable for note-taking applications like Obsidian.

Key Features:

  • ⚡ Fast - about 3 seconds per image on an M4 MacBook (reported as faster than cloud OCR services)
  • 🔒 Private - Runs entirely on your machine via Ollama
  • 💰 Free - No API keys, rate limits, or costs
  • 📝 Clean Output - Markdown tables, headings, and lists
  • 🔄 Sequential Processing - Natural sorting maintains document order
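
Natural sorting matters because plain string ordering places `IMG_10.png` before `IMG_2.png`. The snippet below is a minimal sketch of one common way to build a natural sort key. The `natural_key` helper is hypothetical and is not taken from the tool's source.

```python
import re


def natural_key(name: str) -> list:
    """Split a filename into text and number chunks so numbers compare numerically."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capturing group alternates text (even index) and digits (odd index).
    return [int(part) if index % 2 else part.lower() for index, part in enumerate(parts)]


pages = ["IMG_10.png", "IMG_2.png", "IMG_1.png"]
print(sorted(pages))                   # ['IMG_1.png', 'IMG_10.png', 'IMG_2.png']
print(sorted(pages, key=natural_key))  # ['IMG_1.png', 'IMG_2.png', 'IMG_10.png']
```

With a key like this, pages photographed in sequence stay in reading order even when the filenames are not zero-padded.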

Installation

Prerequisites

bash
# 1. Install Ollama (macOS with Homebrew)
brew install ollama

# 2. Start the Ollama service (runs in the foreground, so keep it open in a separate terminal)
ollama serve

# 3. Pull DeepSeek-OCR model (~6GB download)
ollama pull deepseek-ocr
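
Before running a batch, it can help to confirm that the service is reachable and the model has been pulled. The sketch below queries Ollama's `/api/tags` endpoint on the default address `http://localhost:11434`. The helper names `has_model` and `fetch_tags` are hypothetical, and this check is not part of the tool itself.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address


def has_model(tags_payload: dict, model: str = "deepseek-ocr") -> bool:
    """Return True if an /api/tags payload lists the model under any tag."""
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    return any(name == model or name.startswith(model + ":") for name in names)


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Fetch the list of locally available models from a running Ollama service."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        print("deepseek-ocr available:", has_model(fetch_tags()))
    except OSError as exc:  # covers connection refused and timeouts
        print("Ollama is not reachable:", exc)
```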

Install Tool

bash
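# Run from a local checkout of the repository
cd ollama-deepseek-ocr-tool
# Install dependencies into the project environment
uv sync
# Install the CLI as a uv tool
uv tool install .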

Usage

Quick Start

bash
# Basic: Process all PNG files in current directory
ollama-deepseek-ocr-tool "*.png" output.md

Common Use Cases

bash
# Process textbook chapter from iPhone photos
ollama-deepseek-ocr-tool "IMG_*.png" chapter-3-notes.md

# Process lecture slides from subdirectory
ollama-deepseek-ocr-tool "lectures/week-5/*.jpg" week-5-summary.md

# Process numbered scans in order
ollama-deepseek-ocr-tool "scan-00*.png" document.md
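
A pattern only matches the names it literally describes, so it is worth previewing what a glob will select. The sketch below uses Python's `fnmatch` rules on a made-up list of filenames; it assumes the tool follows the same glob semantics as Python's standard library, which the source does not confirm.

```python
from fnmatch import fnmatchcase

# Hypothetical filenames, used only to illustrate the pattern.
names = ["scan-001.png", "scan-009.png", "scan-010.png", "scan-001.jpg"]

matched = [name for name in names if fnmatchcase(name, "scan-00*.png")]
print(matched)  # ['scan-001.png', 'scan-009.png']

everything = [name for name in names if fnmatchcase(name, "scan-*.png")]
print(everything)  # ['scan-001.png', 'scan-009.png', 'scan-010.png']
```

Under these rules `scan-00*.png` skips `scan-010.png`, so a broader pattern such as `scan-*.png` is the safer choice once a document passes nine pages.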

Verbose Logging

bash
# INFO level - High-level operations
ollama-deepseek-ocr-tool "*.png" output.md -v

# DEBUG level - Detailed processing info (file sizes, word counts)
ollama-deepseek-ocr-tool "*.png" output.md -vv

# TRACE level - Full HTTP request/response logs
ollama-deepseek-ocr-tool "*.png" output.md -vvv
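
Repeated `-v` flags are commonly implemented by counting occurrences and mapping the count to a log level. The sketch below shows that pattern with `argparse` and `logging`. The custom `TRACE` level value and the `WARNING` default are assumptions, since the source does not describe how the tool implements this.

```python
import argparse
import logging

TRACE = 5  # assumed custom level below DEBUG (10); logging has no built-in TRACE
logging.addLevelName(TRACE, "TRACE")


def level_for(verbosity: int) -> int:
    """Map the number of -v flags to a logging level, capping at TRACE."""
    levels = [logging.WARNING, logging.INFO, logging.DEBUG, TRACE]
    return levels[min(verbosity, len(levels) - 1)]


parser = argparse.ArgumentParser()
parser.add_argument("-v", dest="verbose", action="count", default=0)

args = parser.parse_args(["-vv"])
logging.basicConfig(level=level_for(args.verbose))
print(logging.getLevelName(level_for(args.verbose)))  # DEBUG
```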

Get Help

bash
# Show full help with examples and troubleshooting
ollama-deepseek-ocr-tool --help

What It Can Do

Text & Formatting

  • ✅ Body text with markdown formatting
  • ✅ Headings (H1, H2, H3)
  • ✅ Lists (bulleted, numbered)
  • ✅ Multi-column layouts

Tables

  • ✅ Converts to clean markdown tables
  • ✅ Preserves headers and structure
  • ✅ Handles merged cells

Diagrams & Figures

  • ✅ Extracts text labels from diagrams
  • ✅ Captures figure captions
  • ❌ Does not describe visual content
  • ❌ Does not capture flow/arrows

Output Format

markdown
<!-- Source: IMG_4170.png -->

[extracted text from page 1]

---

<!-- Source: IMG_4171.png -->

[extracted text from page 2]
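
The layout above can be reproduced with a few lines of string handling. The sketch below is one way to assemble it from per-page results; the `assemble` function is hypothetical, and the exact whitespace the tool emits is an assumption.

```python
def assemble(pages: list[tuple[str, str]]) -> str:
    """Join (filename, extracted_text) pairs into one markdown document."""
    sections = [
        f"<!-- Source: {name} -->\n\n{text.strip()}"
        for name, text in pages
    ]
    # A horizontal rule separates consecutive pages.
    return "\n\n---\n\n".join(sections) + "\n"


document = assemble([
    ("IMG_4170.png", "Text from page 1"),
    ("IMG_4171.png", "Text from page 2"),
])
print(document)
```

The HTML comments stay invisible in rendered markdown but make it possible to trace any passage back to its source image.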

Performance

  • Speed: ~3 seconds per image (M4 MacBook)
  • Memory: ~6GB (DeepSeek-OCR model)
  • Throughput: ~20 images per minute
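
The speed and throughput figures are two views of the same number, and they give a quick way to estimate batch time. The sketch below works through the arithmetic, assuming the ~3 seconds per image quoted above holds for your hardware and images; the function names are illustrative.

```python
def images_per_minute(seconds_per_image: float = 3.0) -> float:
    """Convert per-image latency into throughput."""
    return 60.0 / seconds_per_image


def estimate_seconds(image_count: int, seconds_per_image: float = 3.0) -> float:
    """Estimate total processing time for a sequential batch."""
    return image_count * seconds_per_image


print(images_per_minute())          # 20.0
print(estimate_seconds(120) / 60)   # 6.0 minutes for a 120-page chapter
```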

Development

bash
# Install dependencies
make install

# Run linting
make lint

# Format code
make format

# Type check
make typecheck

# Security checks
make security

# Full pipeline
make pipeline

Architecture

See ARCHITECTURE.md for detailed documentation on:

  • System components and module structure
  • Ollama integration details
  • DeepSeek-OCR capabilities and limitations
  • Performance benchmarks and design decisions
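
ARCHITECTURE.md is the authoritative description of the integration. As a rough orientation, the sketch below shows what a single OCR request to Ollama's `/api/generate` endpoint can look like using only the standard library. The prompt text, function names, and timeout are assumptions and are not taken from the tool's source.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes: bytes,
                  prompt: str = "Convert the document to markdown.",  # assumed prompt
                  model: str = "deepseek-ocr") -> dict:
    """Build a non-streaming /api/generate request body with one base64 image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ocr_image(path: str, base_url: str = "http://localhost:11434") -> str:
    """Send one image to a running Ollama service and return the generated text."""
    with open(path, "rb") as handle:
        payload = build_payload(handle.read())
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)["response"]
```

Calling `ocr_image("IMG_4170.png")` requires the Ollama service to be running with the model pulled, as described under Prerequisites.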

License

MIT

Credits

Built with assistance from AI coding tools and reviewed by humans.

Install & Usage

1. Create the skills directory
   mkdir -p .claude/skills
2. Download the skill file
   mkdir -p .claude/skills && curl -o .claude/skills/ollama-deepseek-ocr-tool.md https://raw.githubusercontent.com/dnvriend/ollama-deepseek-ocr-tool/main/SKILL.md
3. Invoke in Claude Code
   /ollama-deepseek-ocr-tool

Frequently Asked Questions

What is ollama-deepseek-ocr-tool?

A CLI tool that loads images, sends each one to Ollama with a prompt for the DeepSeek-OCR model, and instructs the model to return the image's text as markdown.

How to install ollama-deepseek-ocr-tool?

To install the ollama-deepseek-ocr-tool skill, create the .claude/skills directory in your project, then run the curl command to download the skill file. Once installed, invoke it in Claude Code with /ollama-deepseek-ocr-tool. The CLI itself is installed separately with uv, as described under Installation.

What is ollama-deepseek-ocr-tool best for?

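ollama-deepseek-ocr-tool is a community plugin categorized under Documentation and created by Dennis Vriend. It is best suited to converting sequences of textbook pages, lecture slides, or scanned documents into a single markdown file.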