Digger: Detecting Copyright Content Mis-usage in Large Language Model Training
Pre-training, which utilizes extensive and varied datasets, is a critical factor in the success of Large Language Models (LLMs) across numerous applications. However, the detailed makeup of these datasets is often not disclosed, leading to concerns about data security and…
- Year: 2024
- Hosting: External source (license unknown)