Researchers suggest OpenAI trained AI models on paywalled O’Reilly books

Recent findings from an AI watchdog organization suggest OpenAI might have used paywalled O’Reilly Media books to train its advanced AI models, including GPT-4o. The allegations stem from a newly published paper analyzing the models’ familiarity with non-public content.

The study used a technique called DE-COP, which tests whether a model can tell verbatim copyrighted text apart from AI-generated paraphrases of the same passage, a signal that the text may have appeared in its training data. Testing OpenAI’s models against 13,962 excerpts from 34 O’Reilly books, the researchers found that GPT-4o recognized paywalled content at a significantly higher rate than older models such as GPT-3.5 Turbo. The pattern held even after accounting for newer models’ improved ability to distinguish human-authored text from AI-generated paraphrases.
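To make the idea concrete, here is a minimal sketch of how a DE-COP-style check could be scored. It is an illustration under stated assumptions, not the researchers' code: it assumes a multiple-choice setup in which each verbatim excerpt is shuffled among paraphrases, and the names `QuizItem`, `guess_rate`, and the `choose` callback (a stand-in for querying a model) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class QuizItem:
    original: str           # verbatim, human-authored excerpt
    paraphrases: List[str]  # machine-generated rewrites of the same passage


def guess_rate(
    items: Sequence[QuizItem],
    choose: Callable[[List[str]], int],
    seed: int = 0,
) -> float:
    """Return the fraction of items where `choose` picks the verbatim excerpt.

    `choose` is a hypothetical stand-in for a model call: it receives the
    shuffled options and returns the index of the one it believes is verbatim.
    """
    if not items:
        raise ValueError("need at least one quiz item")
    rng = random.Random(seed)
    correct = 0
    for item in items:
        options = [item.original, *item.paraphrases]
        rng.shuffle(options)  # randomize position so answer order carries no signal
        picked = choose(options)
        if options[picked] == item.original:
            correct += 1
    return correct / len(items)
```

With three paraphrases per excerpt, random guessing would land near 0.25. A rate well above that on paywalled excerpts, relative to a baseline, is the kind of signal the study treats as evidence of prior exposure.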

While the paper stops short of claiming proof of wrongdoing, its authors argue that GPT-4o’s performance implies prior exposure to the paywalled material. They speculate that OpenAI may have obtained the content through user interactions or unlicensed access, though the evidence remains circumstantial. The most recent OpenAI models, such as GPT-4.5, were not included in the analysis.

OpenAI has long faced scrutiny over its data practices, with multiple lawsuits alleging unauthorized use of copyrighted works. The company says it licenses content from publishers and provides opt-out mechanisms, though critics argue these measures are insufficient. Meanwhile, companies across the AI industry are hiring domain experts to refine their models with specialized knowledge.

As debates over fair use and intellectual property intensify, the study adds fuel to ongoing legal and ethical disputes. OpenAI has yet to publicly address the allegations, leaving open the question of where the boundaries of AI training data lie.

