AI News Roundup – Updates in The New York Times v. OpenAI lawsuit, punishment stands for high school student accused of cheating using AI, GPT-4 is outperforming human doctors, and more

To help you stay on top of the latest news, our AI practice group has compiled a roundup of the developments we are following.

    • The New York Times has alleged that OpenAI erased information key to the newspaper’s copyright lawsuit against the company, according to WIRED. In a court filing this past week, the NYT claims that engineers at OpenAI inadvertently deleted data that the newspaper’s legal team had spent over 150 hours extracting from OpenAI’s training datasets as potential evidence. While OpenAI was able to recover much of the deleted information, the NYT’s lawyers say the restored data lacks crucial original file names and folder structures that would help determine how and where the newspaper’s articles were incorporated into OpenAI’s AI models. OpenAI, which characterized the incident as a “glitch,” said in a statement to WIRED that it disagrees with the NYT’s characterization of events and plans to file a response. This dispute is part of a broader copyright lawsuit filed by the NYT against OpenAI and Microsoft last year, alleging illegal use of its articles to train AI tools like ChatGPT.
    • A federal judge in Massachusetts ruled against parents who sued a school district for punishing their son, a high school student who used AI to complete a project and was given a failing grade for cheating, according to Ars Technica. U.S. Magistrate Judge Paul Levenson rejected the parents’ request for a preliminary injunction, finding that Hingham High School officials acted appropriately when they disciplined the junior student in December 2023 for copying and pasting AI-generated text from Grammarly.com into a script for an AP U.S. History documentary film project. The student and a classmate had submitted work containing citations to nonexistent books (an example of AI “hallucinations”) without acknowledging their use of AI, violating the school’s academic integrity policies. While the parents argued there was no specific rule against AI use in the student handbook, the judge noted that existing policies banned unauthorized use of technology and that students had received explicit guidance about AI usage in fall 2023. The students received failing grades on parts of the project but were allowed to redo it, and the disciplined student also received Saturday detention. The judge found that the discipline “did not deprive [the student] of his right to a public education,” as the parents had argued, and thus “any substantive due process claim premised on [the student’s] entitlement to a public education must fail.”
    • A new study has found that AI chatbots outperform human physicians in diagnosing illnesses based on medical histories, according to The New York Times. In the study, published in JAMA Network Open, OpenAI’s GPT-4 acting alone achieved an average score of 90 percent correct when diagnosing conditions from case reports, while doctors using the chatbot scored 76 percent, and those without it scored 74 percent. The research, which involved 50 doctors examining six complex case histories, revealed several surprising findings: doctors often remained committed to their initial diagnoses even when the chatbot suggested better alternatives, and many physicians didn’t fully utilize the AI’s capabilities, treating it like a search engine rather than leveraging its ability to analyze entire case histories comprehensively. Dr. Adam Rodman, an internal medicine expert who helped design the study, expressed shock at the results, suggesting that AI systems should serve as “doctor extenders” providing valuable second opinions rather than replacing physicians outright.
    • A massive dataset containing dialogue from tens of thousands of movies and TV episodes has been used to train AI systems, according to an investigation from The Atlantic. The dataset, sourced from OpenSubtitles.org, includes dialogue subtitles extracted from DVDs, Blu-ray discs and streams of media content from more than 53,000 movies and 85,000 TV episodes, including every Academy Award Best Picture nominee from 1950 to 2016 and complete series like The Wire, The Sopranos and Breaking Bad. Major tech companies including Apple, Anthropic, Meta, Nvidia and Salesforce have used this 14-gigabyte collection of subtitles to train their AI models, without obtaining permission from the original writers. The data, which was originally intended for translation purposes, has become part of “the Pile,” a larger collection of training data used for developing AI systems. While tech companies argue that using copyrighted work for AI training falls under the doctrine of “fair use,” the practice remains legally contentious, with numerous lawsuits filed by writers, actors and artists who view it as a form of plagiarism that threatens their livelihoods.
    • Amazon is investing an additional $4 billion in Anthropic, the AI company behind the Claude chatbot, according to Bloomberg. The new investment, announced this past week, follows a previous $4 billion investment made earlier in the year and deepens the partnership between the two companies, with Anthropic designating Amazon Web Services (AWS) as its primary cloud and AI training partner. The deal includes provisions for Anthropic to use AWS data centers and Amazon’s AI chips (as reported in this roundup last week) for developing its most advanced models, while maintaining Amazon’s minority stake in the company. The investment comes amid regulatory scrutiny of big tech companies’ investments in AI firms, though the UK’s competition watchdog has cleared Amazon’s prior Anthropic investments.