AI News Roundup – California’s AI regulations, YouTube videos to train AI models, EU’s efforts to regulate AI, and more

To help you stay on top of the latest news, our AI practice group has compiled a roundup of the developments we are following.

    • California’s proposed wide-ranging regulations on AI technologies are moving forward in the state’s legislature, according to The Washington Post. We covered the proposed legislation in a previous AI Roundup. The most prominent bill, introduced by Scott Wiener, a Democratic state senator from San Francisco, would require AI companies to test their models for “catastrophic” risks before public release. The bill has faced opposition from tech industry leaders, who argue it could stifle innovation and put smaller startups at a disadvantage. Other AI-related bills in California aim to address issues such as bias testing, protection of children’s data and transparency in AI model development. The state’s efforts highlight the growing tension between the tech industry’s calls for regulation and its resistance to specific legislative measures, as well as the increasing role of state governments in shaping AI policy in the absence of comprehensive federal action.
    • WIRED reports that numerous tech companies, including Nvidia, Apple, Anthropic, and Salesforce, used materials from thousands of YouTube videos to train AI models, despite YouTube’s terms of service prohibiting data harvesting without permission. An investigation by Proof News found that subtitles from 173,536 YouTube videos, spanning over 48,000 channels, were incorporated into a dataset called YouTube Subtitles and were subsequently used by these tech giants to train various AI models. The content ranged from educational channels like Khan Academy to popular YouTubers such as MrBeast and PewDiePie. Many content creators were unaware their work had been used, raising concerns about consent, compensation and the potential impact on their livelihoods. The YouTube Subtitles dataset is part of a larger compilation known as the Pile, which includes other publicly available information from online sources. A spokesman for Anthropic confirmed the use of the Pile in the training of its AI chatbot Claude but noted that “YouTube’s terms cover direct use of its platform, which is distinct from use of the Pile dataset,” and directed questions about violating YouTube’s terms of service to the Pile’s creators.
    • The Financial Times reports on the European Union’s (EU) efforts to regulate AI technologies and ensure their ethical use. The EU’s Artificial Intelligence Act, set to come into effect in August 2024, aims to categorize AI systems based on risk levels and impose corresponding regulations. However, the rushed nature of the law’s development, particularly in response to the emergence of generative AI like OpenAI’s ChatGPT, has led to concerns about its implementation. Critics argue that the act lacks clarity on crucial issues such as copyright and enforcement, potentially hindering innovation in the EU’s tech sector. EU officials have faced challenges in filling regulatory gaps, hiring technical experts and balancing the desire to lead in AI regulation with the need to foster a competitive AI industry. While some view the act as a necessary step toward ensuring trustworthy AI, others fear it may put European companies at a disadvantage in the global AI race, especially against competitors in the U.S. and China.
    • OpenAI has released a new version of its GPT large language model, named GPT-4o Mini, intended to be lighter and cheaper than its full-sized counterparts, according to The Verge. The model is designed to make AI more accessible to developers and users by offering a more affordable option for building AI applications. GPT-4o Mini is reported to be more capable than OpenAI’s previous GPT-3.5 and will replace GPT-3.5 Turbo for ChatGPT users on the Free, Plus and Team plans. The new model supports text and vision inputs, with plans to handle other multimodal inputs such as video and audio in the future. OpenAI’s move is seen as a response to competing lightweight models such as Google’s Gemini 1.5 Flash and Anthropic’s Claude 3 Haiku, and it aims to encourage more widespread AI development and usage across industries and applications without heavy monetary or computational costs. (A brief sketch of calling the model through OpenAI’s developer API appears after this list.)
    • The New York Times reports on new research findings that several prominent online sources have restricted the use of their data for AI training, leading to a large drop in content available for that purpose. A study by the Data Provenance Initiative found that 5% of all data, and 25% of high-quality data, from 14,000 web domains in common AI training datasets have been restricted over the past year. This dramatic shift has been attributed to publishers and online platforms taking steps to prevent their data from being harvested, often by using “robots.txt” files that block automated web crawlers (illustrated in the sketch after this list) or by changing their terms of service. The trend poses challenges for AI companies, particularly smaller ones, and for researchers who rely on public datasets. Some major tech companies have responded by striking deals with publishers, while others are exploring synthetic data generation. The situation highlights growing tensions between AI developers and content creators over data usage and compensation, as well as the need for more nuanced tools to control data access for AI training purposes.
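
For readers who build on these models, the GPT-4o Mini item above notes that the model is aimed at developers seeking a cheaper option. The following is a minimal sketch of what a call to the model might look like using OpenAI’s Python SDK; it assumes the openai package is installed and an OPENAI_API_KEY environment variable is set, and the prompt is purely illustrative rather than drawn from the article.

```python
# Minimal sketch: calling a lightweight OpenAI model via the Python SDK.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set in
# the environment; the prompt below is illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # the lightweight model discussed above
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the EU AI Act in one sentence."},
    ],
)

print(response.choices[0].message.content)
```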
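
The “robots.txt” mechanism mentioned in the last item is a plain-text file a site publishes to tell automated crawlers which paths they may fetch. The sketch below uses Python’s standard-library urllib.robotparser to show how such a rule can block one crawler while leaving others unaffected; the file contents, crawler names and URLs are hypothetical and not taken from any specific publisher.

```python
# Minimal sketch of how a robots.txt rule restricts an AI crawler.
# The robots.txt contents, user agents and URLs below are hypothetical.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The hypothetical AI crawler is blocked from the entire site...
print(parser.can_fetch("ExampleAIBot", "https://example.com/articles/story"))  # False
# ...while other crawlers may still fetch the same page.
print(parser.can_fetch("GenericCrawler", "https://example.com/articles/story"))  # True
```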