ollama-deepseek-ocr-tool
A CLI tool that loads images, sends a prompt to Ollama to invoke DeepSeek-OCR, and instructs it to return each image's content as Markdown.
<div align="center">
<img src=".github/assets/logo.png" alt="ollama-deepseek-ocr-tool logo" width="200"/>
ollama-deepseek-ocr-tool
     
A CLI tool for batch OCR processing of document images using DeepSeek-OCR via Ollama.
</div>
Overview
Convert sequences of textbook pages, lecture slides, or scanned documents into a single, coherent markdown file suitable for note-taking applications like Obsidian.
Key Features:
- ⚡ Fast - ~3 s per image on an M4 (faster than cloud OCR services)
- 🔒 Private - Runs entirely on your machine via Ollama
- 💰 Free - No API keys, rate limits, or costs
- 📝 Clean Output - Markdown tables, headings, and lists
- 🔄 Sequential Processing - Natural sorting maintains document order
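Natural sorting matters because a plain alphabetical sort places `IMG_10.png` before `IMG_2.png`. The sketch below shows one common way to get a natural order in Python. It is an illustration, not the tool's actual code, and the names `natural_key` and `sorted_images` are hypothetical.

```python
import glob
import re


def natural_key(name: str) -> list:
    """Split a name into text and integer chunks so 'IMG_10' sorts after 'IMG_2'."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so the keys of two names always compare str with str and int with int.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def sorted_images(pattern: str) -> list[str]:
    """Expand a glob pattern and return the matching paths in natural order."""
    return sorted(glob.glob(pattern), key=natural_key)


names = ["IMG_10.png", "IMG_2.png", "IMG_1.png"]
print(sorted(names, key=natural_key))
# ['IMG_1.png', 'IMG_2.png', 'IMG_10.png']
```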
Installation
Prerequisites
# 1. Install Ollama
brew install ollama
# 2. Start Ollama service
ollama serve
# 3. Pull DeepSeek-OCR model (~6GB download)
ollama pull deepseek-ocr
Install Tool
cd ollama-deepseek-ocr-tool
uv sync
uv tool install .
Usage
Quick Start
# Basic: Process all PNG files in current directory
ollama-deepseek-ocr-tool "*.png" output.md
Common Use Cases
# Process textbook chapter from iPhone photos
ollama-deepseek-ocr-tool "IMG_*.png" chapter-3-notes.md
# Process lecture slides from subdirectory
ollama-deepseek-ocr-tool "lectures/week-5/*.jpg" week-5-summary.md
# Process numbered scans in order
ollama-deepseek-ocr-tool "scan-00*.png" document.md
Verbose Logging
# INFO level - High-level operations
ollama-deepseek-ocr-tool "*.png" output.md -v
# DEBUG level - Detailed processing info (file sizes, word counts)
ollama-deepseek-ocr-tool "*.png" output.md -vv
# TRACE level - Full HTTP request/response logs
ollama-deepseek-ocr-tool "*.png" output.md -vvv
Get Help
# Show full help with examples and troubleshooting
ollama-deepseek-ocr-tool --help
What It Can Do
Text & Formatting
- ✅ Body text with markdown formatting
- ✅ Headings (H1, H2, H3)
- ✅ Lists (bulleted, numbered)
- ✅ Multi-column layouts
Tables
- ✅ Converts to clean markdown tables
- ✅ Preserves headers and structure
- ✅ Handles merged cells
Diagrams & Figures
- ✅ Extracts text labels from diagrams
- ✅ Captures figure captions
- ❌ Does not describe visual content
- ❌ Does not capture flow/arrows
Output Format
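Each page's text is written under an HTML comment naming its source image, and pages are separated by a `---` rule, as in the sample at the end of this section. A minimal sketch of assembling that layout follows. The function name `assemble_markdown` is hypothetical, and the blank lines around the separator are an assumption rather than confirmed tool behavior.

```python
def assemble_markdown(pages: list[tuple[str, str]]) -> str:
    """Join (source file name, extracted text) pairs into one Markdown document."""
    sections = [
        f"<!-- Source: {name} -->\n{text.strip()}"
        for name, text in pages
    ]
    # Assumption: blank lines surround the rule. Without a blank line before
    # it, Markdown would read "text" followed by "---" as a setext heading.
    return "\n\n---\n\n".join(sections) + "\n"


print(assemble_markdown([
    ("IMG_4170.png", "[extracted text from page 1]"),
    ("IMG_4171.png", "[extracted text from page 2]"),
]))
```

The source comments do not render in a Markdown viewer, but they make it easy to trace a passage back to the image it came from.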
<!-- Source: IMG_4170.png -->
[extracted text from page 1]
---
<!-- Source: IMG_4171.png -->
[extracted text from page 2]
Performance
- Speed: ~3 seconds per image (M4 MacBook)
- Memory: ~6GB (DeepSeek-OCR model)
- Throughput: ~20 images per minute
Development
# Install dependencies
make install
# Run linting
make lint
# Format code
make format
# Type check
make typecheck
# Security checks
make security
# Full pipeline
make pipeline
Architecture
See ARCHITECTURE.md for detailed documentation on:
- System components and module structure
- Ollama integration details
- DeepSeek-OCR capabilities and limitations
- Performance benchmarks and design decisions
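ARCHITECTURE.md remains the reference for the integration. As a rough illustration only, a single-image request to Ollama's local HTTP API could look like the sketch below. It assumes Ollama is listening on its default port (11434) and uses the `/api/generate` endpoint with a base64-encoded image. The prompt text and the names `build_payload` and `ocr_image` are placeholders, not the tool's actual prompt or functions.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(image_bytes: bytes, prompt: str, model: str = "deepseek-ocr") -> dict:
    """Build the JSON body for a non-streaming generate request with one image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def ocr_image(path: str, prompt: str = "Convert the document to markdown.") -> str:
    """Send one image to Ollama and return the generated text.

    The default prompt is an assumption made for this example.
    """
    with open(path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),  # a request body makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With `ollama serve` running and the model pulled, `ocr_image("IMG_4170.png")` would return the model's output for that page as a string.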
License
MIT
Credits
Built with assistance from AI coding tools and reviewed by humans.
Install & Usage
mkdir -p .claude/skills && curl -o .claude/skills/ollama-deepseek-ocr-tool.md https://raw.githubusercontent.com/dnvriend/ollama-deepseek-ocr-tool/main/SKILL.md
/ollama-deepseek-ocr-tool
Frequently Asked Questions
What is ollama-deepseek-ocr-tool?
A CLI tool that loads images, sends a prompt to Ollama to invoke DeepSeek-OCR, and instructs it to return each image's content as Markdown.
How to install ollama-deepseek-ocr-tool?
To install ollama-deepseek-ocr-tool, create the .claude/skills directory in your project, then run the curl command to download the skill file. Once installed, invoke it in Claude Code with /ollama-deepseek-ocr-tool.
What is ollama-deepseek-ocr-tool best for?
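ollama-deepseek-ocr-tool is best suited to batch OCR of textbook pages, lecture slides, and scanned documents into a single Markdown file. It is a community skill categorized under Documentation, created by Dennis Vriend.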