Cat_paper_search
New🔍 Find real academic papers across 5 scholarly databases & deep-read any open-access PDF — grounded, never fabricated. Runs on Claude, ChatGPT & DeepSeek. A lite, open-source edition of AcademiCats. 跨 5 大学术库检索真实论文、精读 PDF —— 可在 Claude / ChatGPT / DeepSeek 上使用。
Overview
Paper Search
Two chained capabilities: find papers, then deep-read any one of them. Both are grounded in real data — search hits come from public scholarly APIs, and reading reports are built only from the extracted PDF text.
search a topic ──▶ ranked real papers ──▶ pick several by number ──▶ deep-read them all at onceThe default flow is search → deep-read. As soon as a search finishes, prompt the user for which papers to read next, and let them pick several at once (e.g. "3, 7, 12") — you then deep-read every pick in one batch and return all the reports together.
Running the scripts: run them **by their full path from your current
working directory — do NOT
cdinto the skill folder.** That way--savewrites
search-results.mdinto the user's workspace, where Claude can open itas a clickable preview. (Replace
scripts/…in the examples below with theskill's real path, e.g.
~/.claude/skills/paper-search/scripts/search_papers.py.)If
pythonisn't found, usepython3(needs Python 3.8+).
Capability 1 — Search
When: the user wants to find papers / prior work / sources on a topic, or asks what the research says about something.
- Default = run immediately. Only ask if the topic is missing. Don't gate
the search behind a menu. If the message already contains a topic (e.g. "查论文 玩家共鸣机制 200" or "find papers on X"), search right away — take any params they included (count, years, sort, open-access) and sensible defaults for the rest (20 results, all years, best match). Do not show the setup menu and do not wait. After the results, add ONE optional line so they can still refine: "Showing 200, all years, best match — say e.g. 'since 2021, open access only' to refine."
Show the setup menu only when you genuinely don't have a topic to search (or it's too vague to query). Then, and only then:
> 🔍 Search setup — tell me the topic (tweak the rest if you like): > 1. Topic — (what should I search?) > 2. How many papers — 20 (e.g. 10 / 20 / 40) > 3. Years — any · 4. Open-access only — no · 5. Sort — best match (or `newest` / `most cited`) > > Just give me a topic, or adjust any line.
Map choices to flags: count → --limit, years → --from-year / --to-year,
(best match → relevance, newest → recency, most cited → citations). --sort sets the script's primary order; you still re-rank by research fit afterwards (step 5) unless the user explicitly asked for strict newest/most- cited order.
- For a focused query, search directly. For a broad/exploratory one, first
expand it into directions and search the best 2–4 query strings (see references/search.md, Stages 1–3).
- Run the retriever (5 keyless APIs in parallel — OpenAlex, Crossref, arXiv,
Semantic Scholar, Europe PMC — deduped and rule-scored):
```bash python scripts/search_papers.py "your query" --limit 40 --compact --save # --save : ALWAYS include this — writes the FULL list (every card, with # abstracts + links) to search-results.md. This is the complete # record; it can't be truncated by what you show in chat. # --compact : numbered one-line-per-paper list — USE THIS when the count is # large (~25+) so the whole list is scannable and fully shown # --markdown : richer cards (title + abstract + links) — good for smaller # sets or when the user wants detail # --brief : plain table, for your own quick skim # (omit all) : full JSON records with abstracts, for re-ranking / writing # filters: --from-year 2021 --to-year 2026 --open-access-only
# newest first, citations = most cited first — rule_score breaks ties) # sources: --sources openalex,crossref,arxiv,s2,europepmc ```
- Present the full numbered list, and hand over the saved file. Always run
with `--save` — the script writes every result to search-results.md (printed path on stderr). That file is the guaranteed-complete record, so even if the chat view ends up partial, nothing is lost. In chat, show the script's stdout as-is (use `--compact` for large counts → a clean 1…N list they can scan, or `--markdown` cards for smaller/detailed sets) — the script, not the model, produces the layout, so it's identical on any model. Then point the user to the file as a clickable Markdown link with the relative path so it opens in Claude's preview pane — write it exactly as 📄 Full list (all N, with abstracts): [search-results.md](search-results.md). (Use the relative path, not an absolute one — absolute paths render gray/ non-clickable. This works because you ran the script from the workspace, so the file is right there.) A --markdown card looks like:
> ### 1. Paper title ← title links to the original > Authors · Year · Venue · cited by N · via Source · 🟢 Open Access > > abstract snippet… > 📄 Open paper · ⬇ PDF · 🔗 DOI
Every card carries clickable links that jump straight to the original paper, its open-access PDF, and its DOI — keep them intact.
Hard output rules (do not override these for "helpfulness"): - Show ALL N cards. Present the full list, from item 1 through the — end of N results — line. If the user asked for 200 and the databases returned 169, list all 169. Never stop early, never show only "highlights" or "the most representative", never collapse the rest into "… and more". The saved search-results.md is your backstop — it always holds the complete set. - One flat numbered list, 1…N. Do NOT reorganise into themes, categories, sections, or sub-numbered buckets. One continuous 1, 2, 3 … N sequence — that number is how the user refers back ("deep-read #3", "summarise #7, #12"). - Don't shorten cards. Keep each item's metadata + links. You may add at most a one-line "why" under one. - A long reply is expected when the count is large. You may add ONE line after the full list offering to narrow/filter/deep-read — never instead.
- Re-rank by fit FIRST, then number 1…N. This is required, not optional —
the script's rule_score is only a keyword prior, so its raw order floats keyword-soup to the top (wrong domain, wrong sense of an ambiguous word — e.g. "players" matching football players or game-theory players when the user meant video-game players). Before presenting: judge each result's fit to the user's ACTUAL question (see references/search.md), lead with the genuinely on-target papers, and push clearly off-target ones to the bottom — or tag them `⚠ likely off-target`. Then renumber the full set 1…N. Keep every paper (the file has them all); only DROP papers if the user explicitly asked to filter. For a very large list where full reordering is impractical, at least lead with the strongest hits and flag the obvious off-targets. Papers with a pdf_url are open-access and can be deep-read by number next.
- **Then prompt for deep read — this is the default next step, not an
afterthought.** Right after the numbered list, close with ONE clear line inviting a multi-pick, e.g.:
> 📖 Want me to deep-read any of these? Give me the numbers — you can pick > several at once (e.g. 3, 7, 12 or 1-5) and I'll read them all and > return a report for each. (Papers marked 🟢 open-access can be read in full; > the rest I can only summarise from the abstract.)
When the user replies with numbers (single, list, or range), go straight to Capability 2 — Deep Read in batch mode. Don't re-ask or re-confirm each one; resolve the picked papers and read them all.
The script auto-retries Semantic Scholar on rate-limit (HTTP 429), which clears most skips. If s2 still gets skipped a lot, set a free key — export S2_API_KEY=... (from semanticscholar.org/product/api) — and it stops 429-ing. Any skipped source is reported on stderr and the rest carry the search. Full pipeline in references/search.md.
Capability 2 — Deep Read (one paper or many at once)
When: the user wants to actually read, summarise, or extract findings from papers — one specific paper, or several picked by number from a search. The common path is: they ran a search, then replied with picks like 3, 7, 12 or 1-5.
Batch mode (the default after a search)
When the user picks more than one paper:
- Resolve each pick to its best identifier from the saved search results —
prefer DOI or arXiv id over title (more reliable). Skip any pick with no pdf_url only if the user wanted full-text; otherwise note it can only be summarised from its abstract.
- Fetch all picked PDFs in parallel — issue one
fetch_pdf.pycall per
paper in a single batch of tool calls (they're independent), so the user doesn't wait for them serially:
``bash python scripts/fetch_pdf.py --doi 10.1145/3313831.3376234 --text-only python scripts/fetch_pdf.py --arxiv 1706.03762 --text-only # …one call per picked paper, run together ``
- Produce one reading report per paper using the prompt in
references/deep_read.md, each grounded only in that paper's extracted text. Keep them in the same order the user picked, each headed by its number + title so they map back to the list. For any paper whose PDF wouldn't resolve, say so plainly under its heading and offer the abstract — don't skip silently and don't fabricate.
- Don't re-ask between papers. Read every pick, then deliver all reports in
one response. After them, optionally offer a short cross-paper comparison or to hand the claims to a writing task.
Single-paper mode
When the user names just one paper (by title, DOI, arXiv id, or URL):
- Resolve and extract the PDF (tries direct URL → arXiv-by-DOI → Unpaywall →
OpenAlex → Semantic Scholar):
``bash python scripts/fetch_pdf.py --doi 10.1145/3313831.3376234 --text-only python scripts/fetch_pdf.py --arxiv 1706.03762 --text-only python scripts/fetch_pdf.py --pdf-url https://.../paper.pdf python scripts/fetch_pdf.py --title "Attention is all you need" # --text-only prints just the extracted text; drop it for full JSON # (per-page array + resolved URL + which sources were tried) # reads the first ~12 pages by default; for a long paper's results section # raise it, e.g. --max-pages 20 --max-chars 60000 (--email for Unpaywall) ``
- If
resolved_pdf_urlis null, the paper is paywalled with no free copy — say
so plainly and offer to try another identifier or work from the abstract. Never fabricate the contents.
- Produce the evidence-aware reading report using the prompt in
references/deep_read.md, from the extracted text only. If extraction was truncated, disclose that the report covers just the analysed excerpt. You can also extract typed claims to hand off to a writing task.
Setup
python -m pip install -r scripts/requirements.txt # only pypdf, for deep readSearch needs nothing beyond the Python 3.8+ standard library; all sources are keyless. If python isn't found, use python3. Set UNPAYWALL_EMAIL=you@domain (or pass --email) to be polite to Unpaywall and slightly improve OA hit-rate.
Honesty contract
Search results are real papers, not invented. Deep-read reports use only the extracted PDF text; gaps are disclosed, not filled with guesses. When a PDF is paywalled or an abstract is too thin to support a claim, say so rather than fabricating — that candor is the point.
Install & Usage
mkdir -p .claude/skillsmkdir -p .claude/skills && curl -o .claude/skills/cat-paper-search.md https://raw.githubusercontent.com/jy1529098645-gif/Cat_paper_search/main/SKILL.md/cat-paper-searchFrequently Asked Questions
What is Cat_paper_search?
🔍 Find real academic papers across 5 scholarly databases & deep-read any open-access PDF — grounded, never fabricated. Runs on Claude, ChatGPT & DeepSeek. A lite, open-source edition of AcademiCats. 跨 5 大学术库检索真实论文、精读 PDF —— 可在 Claude / ChatGPT / DeepSeek 上使用。
How to install Cat_paper_search?
To install Cat_paper_search, create the .claude/skills directory in your project, then run the curl command to download the skill file. Once installed, invoke it in Claude Code with /cat-paper-search.
What is Cat_paper_search best for?
Cat_paper_search is a community categorized under Data & Analytics. Created by jy1529098645-gif.