Meta has revealed that it used public posts from Facebook and Instagram to train its new Meta AI virtual assistant. In a move to maintain user trust, the company emphasized that it excluded private posts shared only with family and friends to safeguard consumers’ privacy.
Meta also made it clear that private chats on its messaging services were not used as training data for the AI model. The company took additional steps to filter out private details from public datasets used during the training process.
Meta’s President of Global Affairs, Nick Clegg, stated, “We’ve tried to exclude datasets that have a heavy preponderance of personal information.” He highlighted that the “vast majority” of the data used for training was publicly available. He also named LinkedIn as an example of a website Meta deliberately chose not to use due to privacy concerns.
Internet scraping of data
These developments come amid increasing criticism of tech companies for using internet-scraped information without permission to train their AI models. These models process vast amounts of data to summarize information and generate content like images.
Concerns have arisen over the use of private or copyrighted materials during this process, which could lead to copyright infringement lawsuits. Meta’s new AI assistant, Meta AI, was unveiled at the company’s annual Connect conference. This year’s event centred largely on artificial intelligence.
Meta AI is capable of generating text, audio, and imagery. It has real-time information access through a partnership with Microsoft’s Bing search engine.
The training data for Meta AI included public Facebook and Instagram posts encompassing both text and photos. These posts were used to train the assistant’s image generation features, while its chat functions were trained on publicly available and annotated datasets.
Regarding copyrighted materials, Clegg acknowledged that litigation may be needed to determine whether training on creative content qualifies as fair use. Fair use permits limited use of protected works for purposes such as commentary, research, and parody. He stated, “We think it is, but I strongly suspect that’s going to play out in litigation.”
Perhaps it would have been better to offer users of the platforms an opt-out option. After all, Meta still does not own the copyright to those images. OpenAI announced just this week that it will let artists opt out of training data. However, the process is so convoluted that the company may as well have not bothered.
It’s an interesting problem, and one we will continue to see play out as AI models are trained on copyrighted material.