A quick search of the PDF shows that they have been scraping masto.es and a good many other instances.
LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI
The tech giant is sidestepping guardrails that websites use to prevent being scraped, data show, in a move whistleblowers say is unethical and potentially illegal.
(www.dropsitenews.com)