BeClaude
Research 2026-05-06

Hey, That's My Data! Token-Only Dataset Inference in Large Language Models

Source: arXiv cs.AI

arXiv:2506.06057v2

Abstract: Large Language Models (LLMs) rely on massive training datasets, often including proprietary data, which raises concerns about unauthorized usage and copyright infringement. Existing dataset inference methods typically require access to log...

arxiv, papers