paper-search
NewFind academic papers and read them — searches real scholarly databases for prior work on a topic, then resolves and deep-reads the open-access PDF of any paper you pick. Use this skill WHENEVER the user wants to discover literature or read a study: "find recent papers on X", "what does the research say about Y", "find sources for my thesis on Z", "search for prior work on ...", or "summarise / read / extract the findings from this paper / this arXiv id / this DOI / this PDF". Trigger even when the tool isn't named. Results are real papers from public APIs (OpenAlex, Crossref, arXiv, Semantic Scholar, Europe PMC) and reading reports use only the actual extracted PDF text — nothing is fabricated.
Overview
Paper Search
Two chained capabilities: find papers, then deep-read any one of them. Both are grounded in real data — search hits come from public scholarly APIs, and reading reports are built only from the extracted PDF text.
search a topic ──▶ ranked real papers ──▶ deep-read the open-access onesRunning the scripts: run them **by their full path from your current
working directory — do NOT
cdinto the skill folder.** That way--savewrites
search-results.mdinto the user's workspace, where Claude can open itas a clickable preview. (Replace
scripts/…in the examples below with theskill's real path, e.g.
~/.claude/skills/paper-search/scripts/search_papers.py.)If
pythonisn't found, usepython3(needs Python 3.8+).
Capability 1 — Search
When: the user wants to find papers / prior work / sources on a topic, or asks what the research says about something.
- Default = run immediately. Only ask if the topic is missing. Don't gate
the search behind a menu. If the message already contains a topic (e.g. "查论文 玩家共鸣机制 200" or "find papers on X"), search right away — take any params they included (count, years, sort, open-access) and sensible defaults for the rest (20 results, all years, best match). Do not show the setup menu and do not wait. After the results, add ONE optional line so they can still refine: "Showing 200, all years, best match — say e.g. 'since 2021, open access only' to refine."
Show the setup menu only when you genuinely don't have a topic to search (or it's too vague to query). Then, and only then:
> 🔍 Search setup — tell me the topic (tweak the rest if you like): > 1. Topic — (what should I search?) > 2. How many papers — 20 (e.g. 10 / 20 / 40) > 3. Years — any · 4. Open-access only — no · 5. Sort — best match > > Just give me a topic, or adjust any line.
Map choices to flags: count → --limit, years → --from-year / --to-year, open-access → --open-access-only, sort → your re-rank preference (step 5).
- For a focused query, search directly. For a broad/exploratory one, first
expand it into directions and search the best 2–4 query strings (see references/search.md, Stages 1–3).
- Run the retriever (5 keyless APIs in parallel — OpenAlex, Crossref, arXiv,
Semantic Scholar, Europe PMC — deduped and rule-scored):
``bash python scripts/search_papers.py "your query" --limit 40 --compact --save # --save : ALWAYS include this — writes the FULL list (every card, with # abstracts + links) to search-results.md. This is the complete # record; it can't be truncated by what you show in chat. # --compact : numbered one-line-per-paper list — USE THIS when the count is # large (~25+) so the whole list is scannable and fully shown # --markdown : richer cards (title + abstract + links) — good for smaller # sets or when the user wants detail # --brief : plain table, for your own quick skim # (omit all) : full JSON records with abstracts, for re-ranking / writing # filters: --from-year 2021 --to-year 2026 --open-access-only # sources: --sources openalex,crossref,arxiv,s2,europepmc ``
- Present the full numbered list, and hand over the saved file. Always run
with `--save` — the script writes every result to search-results.md (printed path on stderr). That file is the guaranteed-complete record, so even if the chat view ends up partial, nothing is lost. In chat, show the script's stdout as-is (use `--compact` for large counts → a clean 1…N list they can scan, or `--markdown` cards for smaller/detailed sets) — the script, not the model, produces the layout, so it's identical on any model. Then point the user to the file as a clickable Markdown link with the relative path so it opens in Claude's preview pane — write it exactly as 📄 Full list (all N, with abstracts): [search-results.md](search-results.md). (Use the relative path, not an absolute one — absolute paths render gray/ non-clickable. This works because you ran the script from the workspace, so the file is right there.) A --markdown card looks like:
> ### 1. Paper title ← title links to the original > Authors · Year · Venue · cited by N · via Source · 🟢 Open Access > > abstract snippet… > 📄 Open paper · ⬇ PDF · 🔗 DOI
Every card carries clickable links that jump straight to the original paper, its open-access PDF, and its DOI — keep them intact.
Hard output rules (do not override these for "helpfulness"): - Show ALL N cards. Present the full list, from item 1 through the — end of N results — line. If the user asked for 200 and the databases returned 169, list all 169. Never stop early, never show only "highlights" or "the most representative", never collapse the rest into "… and more". The saved search-results.md is your backstop — it always holds the complete set. - One flat numbered list, 1…N. Do NOT reorganise into themes, categories, sections, or sub-numbered buckets. One continuous 1, 2, 3 … N sequence — that number is how the user refers back ("deep-read #3", "summarise #7, #12"). - Don't shorten cards. Keep each item's metadata + links. You may add at most a one-line "why" under one. - A long reply is expected when the count is large. You may add ONE line after the full list offering to narrow/filter/deep-read — never instead.
- Re-rank by fit FIRST, then number 1…N. This is required, not optional —
the script's rule_score is only a keyword prior, so its raw order floats keyword-soup to the top (wrong domain, wrong sense of an ambiguous word — e.g. "players" matching football players or game-theory players when the user meant video-game players). Before presenting: judge each result's fit to the user's ACTUAL question (see references/search.md), lead with the genuinely on-target papers, and push clearly off-target ones to the bottom — or tag them `⚠ likely off-target`. Then renumber the full set 1…N. Keep every paper (the file has them all); only DROP papers if the user explicitly asked to filter. For a very large list where full reordering is impractical, at least lead with the strongest hits and flag the obvious off-targets. Papers with a pdf_url are open-access and can be deep-read by number next.
The script auto-retries Semantic Scholar on rate-limit (HTTP 429), which clears most skips. If s2 still gets skipped a lot, set a free key — export S2_API_KEY=... (from semanticscholar.org/product/api) — and it stops 429-ing. Any skipped source is reported on stderr and the rest carry the search. Full pipeline in references/search.md.
Capability 2 — Deep Read
When: the user wants to actually read, summarise, or extract findings from a specific paper (by title, DOI, arXiv id, or URL) — including one they just found in a search.
- Resolve and extract the PDF (tries direct URL → arXiv-by-DOI → Unpaywall →
OpenAlex → Semantic Scholar):
``bash python scripts/fetch_pdf.py --doi 10.1145/3313831.3376234 --text-only python scripts/fetch_pdf.py --arxiv 1706.03762 --text-only python scripts/fetch_pdf.py --pdf-url https://.../paper.pdf python scripts/fetch_pdf.py --title "Attention is all you need" # --text-only prints just the extracted text; drop it for full JSON # (per-page array + resolved URL + which sources were tried) # reads the first ~12 pages by default; for a long paper's results section # raise it, e.g. --max-pages 20 --max-chars 60000 (--email for Unpaywall) ``
- If
resolved_pdf_urlis null, the paper is paywalled with no free copy — say
so plainly and offer to try another identifier or work from the abstract. Never fabricate the contents.
- Produce the evidence-aware reading report using the prompt in
references/deep_read.md, from the extracted text only. If extraction was truncated, disclose that the report covers just the analysed excerpt. You can also extract typed claims to hand off to a writing task.
Setup
python -m pip install -r scripts/requirements.txt # only pypdf, for deep readSearch needs nothing beyond the Python 3.8+ standard library; all sources are keyless. If python isn't found, use python3. Set UNPAYWALL_EMAIL=you@domain (or pass --email) to be polite to Unpaywall and slightly improve OA hit-rate.
Honesty contract
Search results are real papers, not invented. Deep-read reports use only the extracted PDF text; gaps are disclosed, not filled with guesses. When a PDF is paywalled or an abstract is too thin to support a claim, say so rather than fabricating — that candor is the point.
Install & Usage
mkdir -p .claude/skillsmkdir -p .claude/skills && curl -o .claude/skills/paper-search.md https://raw.githubusercontent.com/jy1529098645-gif/Cat_paper_search/main/SKILL.md/paper-searchFrequently Asked Questions
What is paper-search?
Find academic papers and read them — searches real scholarly databases for prior work on a topic, then resolves and deep-reads the open-access PDF of any paper you pick. Use this skill WHENEVER the user wants to discover literature or read a study: "find recent papers on X", "what does the research say about Y", "find sources for my thesis on Z", "search for prior work on ...", or "summarise / read / extract the findings from this paper / this arXiv id / this DOI / this PDF". Trigger even when the tool isn't named. Results are real papers from public APIs (OpenAlex, Crossref, arXiv, Semantic Scholar, Europe PMC) and reading reports use only the actual extracted PDF text — nothing is fabricated.
How to install paper-search?
To install paper-search, create the .claude/skills directory in your project, then run the curl command to download the skill file. Once installed, invoke it in Claude Code with /paper-search.
What is paper-search best for?
paper-search is a community categorized under General. It is designed for: api. Created by jy1529098645-gif.