0

Digger: Detecting Copyright Content Mis-usage in Large Language Model Training

Pre-training, which utilizes extensive and varied datasets, is a critical factor in the success of Large Language Models (LLMs) across numerous applications. However, the detailed makeup of these datasets is often not disclosed, leading to concerns about data security and…

Year
2024
Hosting
External sourcelicense unknown

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2401.00676v1
TL;DR
Semantic Scholar
Attribution policy →