
AI Daily — 2026-03-25




Covering 27 AI news items

🔥 Top Stories

1. Google DeepMind Unveils Lyria 3 Pro Music Model

Google DeepMind announced Lyria 3 Pro, the newest and most advanced iteration of its Lyria music model. The update enables tracks up to 3 minutes long with greater creative control, expands Lyria availability to additional Google products, and adds HLS playback support. Source-twitter

2. The AI Scientist: Fully Automated AI Research Published in Nature

Sakana AI Labs’ AI Scientist, powered by foundation models, automates the entire ML research lifecycle from ideation to manuscript drafting. The AI Scientist-v2 reportedly produced the first fully AI-generated paper to pass rigorous human peer review. The Nature paper, The AI Scientist: Towards Fully Automated AI Research, describes milestones and the foundation-model orchestration enabling automated AI research. Source-twitter

3. ARC-AGI-3: Humans 100%, AI <1% in Agentic Benchmark

ARC-AGI-3 is described as the world’s only unsaturated agentic intelligence benchmark. In the released results, humans scored 100% while AI scored under 1%, highlighting a gap to true AGI. Unlike traditional benchmarks, ARC-AGI-3 focuses on how models learn rather than what they already know. Source-twitter

LLM

  • Apple Gains Deep Access to Gemini, Distills On-Device Models — Apple reportedly has full access to Google’s Gemini model, enabling in-house distillation of Gemini knowledge into smaller task-specific models. These compact models could run on iPhones and even learn Gemini’s internal reasoning to improve performance. This deep access signals a major shift in how edge devices leverage large language models. Source-twitter
  • Google DeepMind Partners with Agile Robots to Deploy Gemini Foundation Models — Google DeepMind announces a research partnership with Agile Robots to integrate its Gemini foundation models with Agile Robots’ hardware, aiming to power the next generation of capable industrial robots. The collaboration will enable deploying more helpful and useful robots to tackle complex industrial challenges. Source-twitter
  • Claude Is Becoming the App ChatGPT Wanted to Be — Claude’s mobile app now brings work tools directly to phones, letting users access Figma designs, Canva slides, and Amplitude dashboards from Claude. The update positions Claude as a more integrated productivity assistant, echoing the goal of making ChatGPT a mobile workflow companion. A download link at claude.com/download invites users to try the mobile experience. Source-twitter
  • Supermemory AI Debuts Fast Memory and Context Engine — Supermemory provides a memory and context layer for AI, combining RAG and file processing into a single system. It automatically learns from conversations, extracts facts, builds user profiles, handles updates and contradictions, and forgets expired information to deliver the right context quickly (about 50 ms per call). It claims top performance on AI memory benchmarks such as LongMemEval, LoCoMo, and ConvoMem. Source-github
  • Google TurboQuant Claims 6x KV Cache, 8x Attention Speedup — Google introduced TurboQuant, claiming 6x KV cache compression with zero accuracy loss and up to 8x attention speedups on H100 GPUs, presented at ICLR 2026. The post asks whether anyone has implemented it and what real-world gains they observed beyond the paper benchmarks. Source-reddit
  • Liquid AI LFM2-24B-A2B Runs ~50 Tokens/s in WebGPU Browser — Liquid AI’s MoE-based LFM2-24B-A2B delivers about 50 tokens per second in a web browser via WebGPU on an M4 Max. The 8B A1B variant reportedly exceeds 100 tokens per second on the same hardware. The demo and source code are provided on HuggingFace Spaces, along with optimized ONNX models for 8B-A1B and 24B-A2B. Source-reddit
  • Qwen 3.5 Hybrid Attention Doubles Long-Context Speed — A Reddit post compares Qwen 3.5 against earlier Qwen architectures using qwen3.5-9b-mlx and qwen3VL-8b-mlx in 4-bit quantization with LM Studio. It reports that hybrid attention significantly improves long-context handling, claiming roughly 2x faster performance at 128K+ context lengths. This marks a notable open-source LLM optimization. Source-reddit
  • Supply-chain attack hits litellm; new alternatives emerge — Litellm versions 1.82.7 and 1.82.8 on PyPI were compromised with credential-stealing malware. The piece highlights open-source alternatives: Bifrost (Go, ~50x faster P99 latency, Apache 2.0, 20+ providers), Kosong (LLM abstraction layer, agent-oriented, supports multiple APIs), and Helicone (AI gateway with analytics, 100+ providers). Source-reddit

AI Research

  • MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding — Researchers propose reframing document OCR as inverse rendering and applying diffusion-based decoding instead of traditional left-to-right autoregressive methods. This aims to reduce latency and error propagation in long documents while better capturing structure like layout, tables, and formulas. The work introduces MinerU-Diffusion as a diffusion-based approach to improve structured document understanding. Source-huggingface

Dataset

  • WildWorld Dataset Enables Action-Conditioned Dynamic World Modeling — WildWorld is a large-scale dataset designed for action-conditioned dynamic world modeling with explicit state representation to support generative ARPG research. It models world evolution as latent-state dynamics driven by actions, with visual observations providing partial information, in line with dynamical systems theory and reinforcement learning. The release highlights gaps in existing datasets, notably the lack of diverse, semantically meaningful action spaces and of actions mediated by underlying states. Source-huggingface
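The latent-state formulation described above corresponds to a standard partially observed dynamical system; the notation here is ours, not the dataset’s:

```latex
s_{t+1} = f(s_t, a_t), \qquad o_t = g(s_t)
```

Here $s_t$ is the hidden world state, $a_t$ the agent’s action, and $o_t$ the visual observation, which reveals the state only partially.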

⚡ Quick Bites

  • Cursor Self-Hosts Cloud Agents on Your Infrastructure — Cursor announced that its cloud agents can run entirely on customers’ own infrastructure. The self-hosted option provides the same cloud agent harness and experience while keeping code and tool execution within the client’s network. This move emphasizes security and on-prem control for AI tooling. Source-twitter
  • Claim: Mary Shelley Frankenstein Opening 100% AI-Generated — A viral tweet claims Mary Shelley used generative AI to produce the opening of chapter 5 of Frankenstein, asserting the passage is 100% AI-generated. The claim is unverified and raises questions about authenticity and attribution of AI-generated text in literature. The post, attributed to WrnrWrites, circulated on X (formerly Twitter) and was shared with a link by DrakeGatsby. Source-twitter
  • SpecEyes Accelerates Agentic Multimodal LLMs via Speculative Perception — SpecEyes proposes an agentic-level speculative acceleration framework to reduce latency from cascaded perception, reasoning, and tool-calling loops in agentic multimodal LLMs. It aims to break the latency cost of agentic depth, improving system-level concurrency and response speed. The work is presented as a research contribution on HuggingFace. Source-huggingface
  • Last30days-skill Enables 30-Day Topic Research Across Platforms — The last30days-skill AI agent surveys topics across Reddit, X, Bluesky, YouTube, TikTok, Instagram, Hacker News, Polymarket, and the web for the last 30 days, surfacing what communities upvote, share, bet on, and say with real citations. In version 2.9.5 it adds Bluesky as a source with opt-in via Bluesky credentials and introduces comparative mode plus configuration improvements; Claude Code is recommended for generation. Source-github
  • Built an algorithm to surface useful AI Reddit posts — A Reddit user describes building a small algorithm with Claude Code to filter and surface high-quality posts about vibecoding and AI-assisted development. The system scrapes nine subreddits and uses keyword searches, then applies engagement-based filters to surface the 15 most useful posts daily, cutting through low-effort content. Source-reddit
  • Intel to sell 32GB VRAM GPU for $949 next week — Intel plans to release a budget GPU with 32 GB of VRAM on March 31, priced at $949, with 608 GB/s bandwidth and 290 W. The card targets local AI workloads and quantized models like Qwen 3.5 27B at 4-bit precision. The post expresses optimism for Intel’s AI hardware efforts. Source-reddit
  • Intel unveils Arc Pro B70 and B65 with 32GB GDDR6 — Intel has announced the Arc Pro B70 and B65 professional GPUs, each with 32GB of GDDR6 memory. The B70 is reportedly priced around $949, targeting AI and professional workloads, with the B65 positioned as a lower-tier option in the same lineup. Source-reddit
  • DeepSeek Teases Massive Model Surpassing V3.2 — An employee at DeepSeek teased a forthcoming ‘massive’ model that allegedly surpasses DeepSeek V3.2. The teaser appeared on Reddit and was later deleted, with the employee’s reply reportedly containing information they were not supposed to share. Source-reddit
  • Local LLM Setup to Summarize 500 Pages of OCR Medical PDFs — A Reddit user seeks a simple, privacy-focused, local AI workflow to summarize ~500 pages of OCR’d medical records. They want a no-frills setup that runs on a Ryzen 5 5600X with RX 590 and 16GB RAM on Windows 11, using OCR’d PDFs from ocrmypdf, to produce structured summaries for specialists. They prefer easy deployment and cleanup, with minimal deep-diving into local LLMs. Source-reddit
  • Fully Local Voice AI Runs On iPhone 15 On-Device — A Reddit user demonstrates a fully free, self-hosted voice AI that runs entirely on-device on an iPhone 15. The setup offloads STT and TTS to FluidAudio and Apple’s Neural Engine, enabling llama.cpp to leverage the GPU with minimal contention. GitHub repo: https://github.com/fikrikarim/volocal Source-reddit
  • Level1techs Arc Pro B70 Review: Running Qwen on Four Cards — An initial Level1techs review examines using the Intel Arc Pro B70 GPU to run the Qwen LLM and related workloads. The piece highlights practical observations and notes a setup that includes four B70 cards. Source-reddit
  • LLMs Personalization: Distracting Memory Repeats Past Questions, Karpathy Says — Karpathy highlights a persistent issue in LLM personalization: memory from prior interactions can dominate responses, dragging focus back to topics from months ago. This ‘memory creep’ leads to undue repetition and over-mention of a single past question, impacting user experience. The observation was shared on X (Twitter). Source-twitter
  • AI Doesn’t Make Engineering Easier; Exposes Engineer Shortcomings — An online post argues that AI does not simplify engineering; rather, it exposes and amplifies engineers’ weaknesses. The claim suggests that bad engineers become delusional, average engineers noisier, and great engineers harder to beat. It originates from a tweet by Yuchenj_UW. Source-twitter
  • Warning: Kryven AI Scam Claims Debunked — A new AI tool called Kryven AI is being promoted as private and uncensored with token-based rewards for promoters, but a tester found it to resemble a Gemini frontend and questioned the claimed model origin. Claims about training by Google are unverified and cited only in a Reddit post, casting doubt on the platform’s legitimacy. The report treats Kryven AI as a scam warning rather than a legitimate AI product. Source-reddit
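The engagement-based post filtering described in the “surface useful AI Reddit posts” item above can be sketched as follows. This is a minimal illustration only; the field names, keyword matching, scoring weights, and thresholds are our assumptions, not the author’s actual implementation.

```python
# Hypothetical sketch of an engagement-based Reddit post filter:
# keep keyword-matching posts, rank by a simple engagement score,
# and surface the top `limit` results.

def top_posts(posts, keywords, limit=15):
    """Filter posts by keyword, rank by engagement, return the top `limit`."""

    def matches(post):
        # Match any keyword against the title and optional body text.
        text = (post["title"] + " " + post.get("body", "")).lower()
        return any(kw in text for kw in keywords)

    def engagement(post):
        # Simple proxy: upvotes plus comment count, with comments
        # weighted higher as a signal of discussion quality.
        return post["upvotes"] + 2 * post["comments"]

    candidates = [p for p in posts if matches(p)]
    return sorted(candidates, key=engagement, reverse=True)[:limit]


# Example: three scraped posts, two of which match the keywords.
posts = [
    {"title": "Vibecoding with Claude Code", "upvotes": 120, "comments": 40},
    {"title": "Unrelated meme", "upvotes": 900, "comments": 300},
    {"title": "AI-assisted development workflow", "upvotes": 80, "comments": 10},
]
best = top_posts(posts, keywords=["vibecoding", "ai-assisted"], limit=15)
```

In a real pipeline the `posts` list would come from scraping the nine subreddits mentioned in the item, and the scoring function could incorporate post age or upvote ratio.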

Generated by AI News Agent | 2026-03-25