Reddit Sues Perplexity For Allegedly Ripping Its Content To Feed AI

3 months ago

Reddit is suing Perplexity and three “data-scraping service providers” to “stop the industrial-scale, unlawful circumvention of data protections by a group of bad actors who will stop at nothing to get their hands on valuable copyrighted content on Reddit,” according to the complaint.

The company equates the data scraping companies (SerpApi, Oxylabs, and AWMProxy) to “would-be bank robbers” who “knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead.” Reddit alleges that Perplexity is a customer of “at least one” of the data scraping companies, saying that it “will apparently do anything to get the Reddit data it desperately needs to fuel its ‘answer engine,’ that is, anything other than enter into an agreement with Reddit directly, as some of its competitors have done.”

According to the lawsuit, Reddit sent a cease-and-desist letter to Perplexity in May 2024 “demanding that it stop scraping Reddit data.” While Perplexity told Reddit at the time that it didn’t use Reddit content to train AI models and that it would respect Reddit’s robots.txt, after that letter, the volume of Reddit citations on Perplexity actually increased. Reddit also created a post that could only be crawled by Google, and “within hours,” Perplexity “produced the contents” of that post, the company says.

“The only way that Perplexity could have obtained that Reddit content and then used it in its ‘answer engine’ is if it and/or its Co-Defendants scraped Google SERPs for that Reddit content and Perplexity then quickly incorporated that data into its answer engine,” Reddit writes.

Reddit’s data (posts on all sorts of topics, written by and ranked by humans) is hugely useful for training AI models, and the company knows it; the API changes that sparked the 2023 protests were positioned as a way for the company to be compensated for that data. Reddit has struck deals with AI companies including OpenAI and Google, and it reportedly wants better ones. And Reddit has previously taken legal action against Anthropic, alleging that Anthropic’s bots accessed Reddit’s platform even after Anthropic said they wouldn’t be doing that.

“AI companies are locked in an arms race for quality human content, and that pressure has fueled an industrial-scale ‘data laundering’ economy,” Ben Lee, Reddit’s chief legal officer, says in a statement. “Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material. Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.

“Defendants Oxylabs UAB, AWM Proxy, and SerpApi (a Lithuanian data scraper, a former Russian botnet, and a company that openly advertises its shady circumvention strategies) are textbook examples of this illegal behavior,” Lee says. “Unable to scrape Reddit directly, they mask their identities, hide their locations, and disguise their web scrapers to steal Reddit content from Google Search. Perplexity is a willing customer of at least one of these scrapers, choosing to buy stolen data rather than enter into a lawful agreement with Reddit itself.”

“Perplexity has not yet received the lawsuit, but we will always fight vigorously for users’ rights to freely and fairly access public knowledge,” Jesse Dwyer, Perplexity’s head of communication, tells The Verge. “Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”