AI Daily — 2026-05-25

Covering 32 AI news items

🔥 Top Stories

1. AlphaProof Nexus Solves Open Math Problems with Gemini

AI agents built around Google DeepMind’s AlphaProof Nexus autonomously solved multiple open formal-math problems, including nine Erdős problems (two unsolved for 56 years), 44 OEIS challenges, a 15-year-old algebraic-geometry problem, and a seven-year-old min-max optimization question. The work demonstrates the potential of agentic loops powered by Gemini in tackling research-level mathematics, and involves collaboration with mathematicians across combinatorics, graph theory, and quantum optics. The related paper is available at arXiv:2605.22763v1. Source-twitter

2. Grok V9-Medium (1.5T) Training Complete; Release in Weeks

Grok announced that its V9-Medium foundation model, with 1.5T parameters, has finished training and that preliminary evaluations are positive. Additional Cursor data was added during supplemental training, with more to come; fine-tuning is underway and reinforcement learning will start shortly. The team expects a public release in 2 to 3 weeks and says the model should significantly outperform the current 0.5T v8-small on challenging coding tasks. Source-twitter

3. DeepSeek V4 Flash Free on Nous Portal for Hermes Agent

DeepSeek V4 Flash is back on the Nous Portal and free to use with the Hermes Agent. The portal describes Nous Research as a developer of human-centric language models and simulators. Source-twitter

📰 Featured

AI Safety

Anthropic Co-Founder Warns AI Could Eliminate Half of Entry-Level White-Collar Jobs — Anthropic co-founder Dario Amodei has warned for over a year that AI could trigger mass job displacement, including a May 2025 claim that 50% of entry-level white-collar roles could be eliminated within five years and that unemployment could reach 10-20%. In January 2026, he published a 20,000-word essay arguing that AI will act as a general labor substitute and cause unusually painful disruption. The piece also notes Davos-era warnings of a ‘zeroth world’ economy in Silicon Valley, supported by data showing 2025 declines in tech entry-level hiring, about 200,000 junior roles cut at Wall Street banks, net job shedding by S&P 500 firms, and Anthropic’s own labor market research. Source-twitter
Pope Says AI Must Serve Dignity, Warns Against State Overreach — The Pope asserts that AI must serve human dignity rather than enable domination or exclusion. He warns that giving governments broad control over AI could enable censorship, surveillance, and citizen control, drawing on Orwell’s 1984 and the adage ‘Quis custodiet ipsos custodes?’ The piece frames this as the real alignment problem facing AI governance. Source-twitter
Anthropic hires Karpathy, signals ethics-driven PR coup amid weapons debate — Anthropic has made a high-profile hire of Andrej Karpathy, framed as a major PR coup that highlights the company’s appeal to popular researchers and its ethical stance. The piece notes past tensions with the Department of War over Claude’s use in autonomous weapons, which led to OpenAI and Google securing contracts and Anthropic being labeled a supply chain risk. It also references Dario Amodei’s warnings about unemployment, framing the debate within broader AI industry dynamics. Source-twitter

Multimodal

Lens 3.8B Text-to-Image Model Surpasses Larger Models in Efficiency — Lens, a 3.8B-parameter text-to-image model, matches or exceeds state-of-the-art models with more than 6B parameters on benchmarks while using about 19.3% of their training compute. Its efficiency stems from a compact model plus strategies to maximize data information density per training batch. Source-huggingface

Open Source

NuExtract3 Open-Weight 4B VLM for Markdown and OCR — NuExtract3 is a 4-billion-parameter open-weight Vision-Language Model based on Qwen3.5-4B, released under Apache-2.0. It targets practical information extraction from complex documents: converting images and text to Markdown, and extracting structured data from PDFs, forms, tables, receipts, and multi-page layouts according to a target JSON template. The model is self-hostable. It succeeds NuMarkdown, and a free Hugging Face Space is provided for experimentation. Source-reddit
MiMo V2.5-Coder Released for Local Use on 128 GB RAM — MiMo V2.5-Coder has been released and is pitched as one of the best models to run locally on systems with 128 GB of RAM. The author claims it runs fast and that, in their experiments, it outperformed Qwen 3.6 and DeepSeek V4 Flash. The project promotes open source and open science, with the model hosted on Hugging Face. Source-twitter
Frigate NVR Enables Local Real-Time Object Detection for IP Cameras — Frigate NVR is a complete, local NVR designed for Home Assistant that performs real-time object detection on IP cameras using OpenCV and TensorFlow. It emphasizes local processing, multiprocessing for high FPS, and low-overhead motion detection, with MQTT communication and Home Assistant integration. Source-github

LLM

OSCAR RotationZoo: Offline 2-bit KV Cache Rotations — OSCAR RotationZoo provides precomputed K/V rotation matrices for OSCAR INT2 KV-cache quantization. It packages artifacts enabling offline estimation of attention-aware K/V covariance and per-layer orthogonal rotations that align 2-bit quantization with attention directions. The approach reportedly yields ~7x KV-cache memory reduction with a small accuracy drop on GPQA for dense reasoning models, delivered as ready-to-use .pt files. Source-reddit
Herm improves Python performance, beats Codex on multiturn benchmarks — Herm’s update announces performance improvements and argues Python performance is competitive with large Rust codebases. It claims Herm beats Codex on most multiturn benchmarks, and includes a GitHub PR link to NousResearch/herm. Source-twitter
Earendil-Works pi AI Agent Toolkit Released — The pi project releases an open-source AI agent toolkit featuring a coding agent CLI, unified multi-provider LLM API, and UI libraries (TUI & web). It is built in the pi agent harness monorepo with modules for coding agent, agent runtime, and LLM API. The pi.dev domain was donated by exe.dev; new issues and PRs auto-close by default and are reviewed daily by maintainers; see CONTRIBUTING.md for details. Source-github
Open-source Proxy Enables Free Claude Code Access — An open-source drop-in proxy project, free-claude-code, routes Anthropic Claude Code API calls through 17 provider backends, enabling free access via CLI, VS Code, and Discord-like interfaces. It supports per-model routing and exposes Claude Code’s model picker through the proxy’s /v1/models endpoint, while letting users choose free, paid, or local models. The repo, by Alishahryar1, lists providers including NVIDIA NIM, OpenRouter, Gemini, Mistral, llama.cpp, and Ollama. Source-github
CUDA Fast FWHT Added to llama.cpp, Boosts Speed — A Reddit post reports that am17an added a CUDA implementation of the fast Walsh-Hadamard transform (FWHT) to llama.cpp to accelerate quantized KV-cache paths. Benchmark results show modest gains: about 1-2% in prompt-processing (pp) tests and 7-9% in token-generation (tg) tests on a 5090 across various configurations. Source-reddit
Qwen 0.8B Fine-Tuned on Pangram for AI Detection — An AI content detector built on Qwen 0.8B, fine-tuned for Pangram’s dataset with EditLens, is showcased as a Chrome extension called Slop Hammer. The tool runs locally after downloading a ~400MB model from Hugging Face and returns the probability distribution of AI-generated text within about one second on an M1 MacBook Pro. The author compares Qwen 0.8B favorably against other models (Llama 3.2 3B, Qwen 2B, Gemma variants) after ~20 hours of fine-tuning on an RTX 3090. Source-reddit
Qwen3.6 27B Hits 1000 TPS on V100 GPUs — Reddit user Simple_Library_2700 reports 1000 tokens per second (TPS) of aggregate generation throughput with Qwen3.6 27B on NVIDIA V100 GPUs. The test used 128 concurrent requests (more than needed); for a single user, generation runs at around 80 t/s with about 3000 t/s prompt processing. No MTP (multi-token prediction) was used. Source-reddit
Best Qwen 27B Q8 Quant? Debates and Options — A Reddit post asks which quantization setting yields the best performance for Qwen 27B. The author notes discussion around Q4–Q6 and mentions running Q8 from Unsloth, which is slow even with MTP ON, and asks whether to switch to Q8 35B A3B as an alternative. Source-reddit
MiniCPM5-1B Released: 1B-Parameter Model for Local Use — A submission to r/LocalLLaMA highlights MiniCPM5-1B, a 1B-parameter model. The post links to additional details about the model, though the item itself provides no in-depth specifications. Source-reddit
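
For readers unfamiliar with the transform behind the llama.cpp item above, here is a minimal pure-Python sketch of an orthonormal fast Walsh-Hadamard transform. It is illustrative only: it is not the CUDA kernel added to llama.cpp, and it is not the OSCAR method, which the summary describes as using precomputed per-layer rotations. The function name fwht is our own. The sketch only shows why a fixed orthogonal rotation is often paired with low-bit quantization: it preserves vector length while spreading a single large value evenly across all coordinates.

```python
import math


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(v) for v in values]
    h = 1
    # Butterfly passes: log2(n) stages, each combining pairs that are h apart.
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    # Scaling by 1/sqrt(n) makes the transform orthogonal and its own inverse.
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


if __name__ == "__main__":
    # A single outlier is spread evenly across every coordinate.
    print(fwht([8.0, 0.0, 0.0, 0.0]))  # [4.0, 4.0, 4.0, 4.0]
```

Because the scaled transform is its own inverse, applying fwht twice returns the original vector up to floating-point error, so a rotation of this kind can be undone after dequantization.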

AI

Open-Source Claude Knowledge-Work Plugins — Anthropic open-sources 11 plugins that turn Claude Cowork into role-specific specialists for knowledge work. Each plugin bundles skills, connectors, slash commands, and sub-agents, and can be customized to a company’s tools, data, and workflows. The plugins are available via the Plugin Marketplace and GitHub, with broad Claude Code compatibility. Source-github

⚡ Quick Bites

Pope Leo XIV Issues Encyclical on Safeguarding Humanity in AI Era — The Vatican releases Magnifica Humanitas, an encyclical by Pope Leo XIV addressing the challenges AI poses to human dignity. It argues that the grandeur of humanity revealed in Christ cannot be replaced by machines and urges believers to remain profoundly human in the age of artificial intelligence. The document frames AI as a test of moral responsibility and calls for safeguarding the human person. Source-twitter
Implicit caching goes live on Qwen3.7-Max, faster and cheaper — Alibaba’s Qwen3.7-Max now supports implicit caching that activates automatically with no setup. The update promises faster, cheaper out-of-the-box performance, and notes an option to use explicit caching for higher, more deterministic hit rates. Source-twitter
Claude Code Enables API Reverse-Engineering via Network Requests — The post describes using Claude Code with browser_harness or Playwright to sniff network requests and infer API structures and authentication schemes from websites that are not easily navigable via the DOM. It shows how to test and map rate limits, enabling automated data retrieval and a range of projects, such as a travel CLI and website monitoring. Source-twitter
GPT-5.5 Pro Excels at Fact-Checking, Says Ethan Mollick — Ethan Mollick praises GPT-5.5 Pro as a robust fact-checker, capable of processing full chapters and locating key references accurately. He notes the model often surfaces nuanced caveats, indicating it can flag tiny details while preserving the general idea. This highlights GPT-5.5 Pro’s potential as a reliable AI-assisted fact-checking tool. Source-twitter
SkillOpt: Text-Space Optimization for Self-Evolving Agent Skills — The paper argues that agent skills should be trained as the external state of a frozen agent, using the same reproducible discipline as weight-space optimization. It introduces SkillOpt as, to their knowledge, the first systematic controllable text-space optimization approach for evolving agent skills. Source-huggingface
Rethinking Cross-Layer Information Routing in Diffusion Transformers — The paper presents a systematic empirical analysis of cross-layer information flow in Diffusion Transformers (DiTs), highlighting the residual stream’s inherited role from the original Transformer. It discusses implications for how information is routed across layers and suggests directions for redesigning cross-layer communication in DiTs. Source-huggingface
12x32GB SXM V100 Cluster for Local Legal AI — A lawyer provides an update on a local AI cluster, now featuring twelve V100-SXM2 32GB GPUs on a Threadripper Pro with careful intra-board NVLink layout. A second node (EPYC 7302P, 512GB RAM, 4x RTX 3090, 2x V100-PCIe) was added. The author has dropped vLLM for local models and continues to drive the setup with Claude Code, despite ongoing uncertainty about the process. Source-reddit
Can Smaller, Less-Quantised Models Outperform Larger Counterparts? — A Reddit user asks whether smaller models with lighter quantisation can outperform larger models that are more heavily quantised, citing examples like Gemma 4 31B Q4_K_S vs Gemma 4 26B A4B Q8 and Qwen 3.6 27B Q4_K_M vs Qwen 3.6 35B A3B Q6_K. They wonder at what point it is worth switching quantisation levels, noting a creative-writing use case. Source-reddit
Local LLMs Generate Custom Interactive Textbooks on the Fly — A Reddit post explores using local, open-source LLMs to dynamically generate personalized interactive textbooks. The approach aims to create recursive, on-demand learning materials tailored to individual needs. It was submitted by user Ryoiki-Tokuiten on r/LocalLLaMA. Source-reddit
Concerns Over AI-Written Emails Signed as Human — A tweet expresses discomfort with emails advertised as human-authored but generated by AI, describing it as deceptive. The author questions who would tolerate such deception and highlights trust issues in AI-mediated communication. Source-twitter
OpenAI Dota Bot Architecture Request; Olah on AI Encyclical — A social media post calls on OpenAI to release its Dota bot architecture, accusing the group of hypocrisy. The item also notes that Anthropic co-founder Chris Olah was invited to speak on Pope Leo XIV’s AI encyclical Magnifica Humanitas, and includes links to his remarks. Source-twitter
Building with Codex — A tweet references using Codex to build projects. The post offers no additional context or details. Source-twitter

Generated by AI News Agent | 2026-05-25