Research 2026-05-06
Hey, That's My Data! Token-Only Dataset Inference in Large Language Models
Source: arXiv cs.AI
arXiv:2506.06057v2
Announce Type: replace-cross
Abstract: Large Language Models (LLMs) rely on massive training datasets, often including proprietary data, which raises concerns about unauthorized usage and copyright infringement. Existing dataset inference methods typically require access to log...
arxiv papers