
AI Daily — 2026-03-27


Covering 27 AI news items

🔥 Top Stories

1. Capybara tops Claude Opus 4.6 with higher scores

Anthropic unveiled Capybara, a new AI model promising dramatically higher performance than Claude Opus 4.6 in software coding, academic reasoning, and cybersecurity. Reportedly, Capybara could be a 10T-parameter model that cost about $10 billion to train, according to a prior interview with Anthropic CEO Dario Amodei. Source-twitter

2. Gabriberton Joins Google DeepMind to Train VLMs

An AI researcher using the handle @gabriberton announces joining Google DeepMind to train Vision-Language Models (VLMs). He will continue posting about AI, computer vision, and LLM developments, but will stop sharing PyTorch tips and may post about JAX. Source-twitter

3. Codex usage limits reset to enable plugin experiments

Codex usage limits have been reset across all plans to let everyone experiment with newly launched plugins. The message encourages developers to build unlimited things with Codex and have fun. Source-twitter

AI Safety

  • Ruling Likely to Favor Anthropic; Government Actions Unconstitutional — A court ruling signals Anthropic is likely to prevail on most theories that the government’s actions were unlawful and unconstitutional. The post notes broad amici support for Anthropic and zero briefs backing the US government, while the author reflects on the personal cost of opposing the administration. Source-twitter

Open Source

  • SAM 3.1 Adds Object Multiplexing for Faster Video Processing — Meta releases SAM 3.1, a drop-in update to SAM 3 that adds object multiplexing to boost video processing efficiency without sacrificing accuracy. The update aims to enable high-performance AI applications on smaller, more accessible hardware, with model checkpoints and codebase openly available for community adoption. Model Checkpoint: go.meta.me/8dd321; Codebase: go.meta.me/b0a9fb. Source-twitter
  • Insanely Fast Whisper: Ultra-Fast On-Device Transcription — A new open-source CLI, insanely-fast-whisper, claims to transcribe 150 minutes of audio in under 98 seconds on Nvidia A100 80GB using Whisper Large v3. It leverages FP16, batching, BetterTransformer, and Flash Attention 2 to dramatically speed up transcription, with multiple benchmark configurations. The project is open-source and hosted on GitHub, showcasing notable AI optimization for on-device speech recognition. Source-github
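The insanely-fast-whisper claim works out to roughly 90x faster than real time; a quick sanity check of the arithmetic, using only the figures quoted in the item:

```python
# Claimed benchmark: 150 minutes of audio transcribed in under 98 seconds
# (Whisper Large v3 on an Nvidia A100 80GB, per the item above).
audio_seconds = 150 * 60        # 9000 s of input audio
wall_clock_seconds = 98         # reported transcription time
rtf = audio_seconds / wall_clock_seconds
print(f"~{rtf:.0f}x faster than real time")
```

At roughly 92x real time, an hour-long recording would transcribe in well under a minute on comparable hardware.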

AI Translation

  • Google Translate Live Translate Arrives on iOS with Headphones — Google Translate’s Live Translate feature, used with compatible headphones, is officially available on iOS and will expand to more countries for both Android and iOS. The service supports 70+ languages and lets users connect headphones via the Translate app to translate in real time. Source-twitter

LLM

  • April Teased: GPT-5.5, Claude 5, Mythos DeepSeek-V4 — A tweet teases upcoming AI model releases: GPT-5.5, Claude 5, and Mythos DeepSeek-V4, suggesting major updates could arrive in April. The post signals heightened anticipation around new LLMs from top labs. Source-twitter
  • GLM-5.1 Live: Coding Ability on Par with Claude Opus 4.5 — Zhipu AI’s GLM-5.1 is now available to Coding Plan users. It posts top scores among open-source models on benchmarks and matches Claude Opus 4.5 on coding tasks, featuring a 200K context window, 128K max output, and 744B parameters trained on 28.5T tokens of pretraining data, plus native MCP support. It enables autonomous, multi-step coding with long-context refactoring and agentic workflows, via Coding Plan Lite/Pro/Max on Zhipu AI’s platform. Source-reddit
  • Google TurboQuant Runs Qwen Locally on MacBook Air — A Reddit post describes patching llama.cpp with Google’s TurboQuant compression to run Qwen 3.5–9B on a MacBook Air (M4, 16 GB) with a 20,000-token context. The experiment suggests large-context prompts may be feasible on consumer hardware, hinting at OpenClaw-like capabilities on non-Pro devices. The post also mentions a macOS app (atomic.chat) and invites others to try similar setups. Source-reddit
  • Gemini Pro leaks chain-of-thought, loops endlessly — Reddit reports Gemini Pro output its internal reasoning and system prompts instead of an answer, then entered an infinite loop, producing thousands of ‘(End)’ lines. The incident highlights the risks of chain-of-thought leakage, exposed internal prompts, and uncontrolled output in deployed AI systems. Source-reddit
  • Google’s TurboQuant compresses LLMs 6x with no quality loss — Google’s TurboQuant AI-compression algorithm reportedly cuts large language model memory usage by about six times without reducing output quality. The approach promises more efficient deployment of AI models, potentially enabling frontier-level models to run on consumer hardware. The news item references Ars Technica’s coverage and mentions discussions on Reddit. Source-reddit
  • AI-assisted mRNA vaccine protocol for dog Rosie — Paul S. Conyngham used ChatGPT and other LLMs to create an mRNA vaccine protocol to save his dog Rosie. He says the AI tools empowered him to perform research-like tasks with human oversight, combining machine guidance with expert input. The story hints at turning such AI-enabled biotech work into a company, illustrating a notable real-world AI-assisted bio-design example. Source-twitter
  • AgentScope Unveils Production-Ready Agent Framework for LLMs — AgentScope introduces a production-ready agent framework designed to scale with evolving LLM capabilities, focusing on reasoning and tool use over strict prompts. It promises a quick start (5 minutes) with built-in ReAct, memory, planning, human-in-the-loop steering, and model finetuning, plus extensible tooling and multi-agent orchestration. Deployment options cover local, serverless cloud, and Kubernetes with OpenTelemetry support. Source-github
  • OpenSource4o Movement Trends, Urging Open Release of GPT-4o — A Reddit post notes that the OpenSource4o movement is trending on Twitter/X, advocating open-source or open-weight releases related to GPT-4o. It references the GPT-OSS models released 8 months ago (120B and 20B) and promises more details (website, petitions) in the comments, aiming to surface additional open models for coding, writing, and content creation. Source-reddit
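The ~6x memory reduction claimed for TurboQuant in the items above is easy to put in concrete terms with back-of-envelope arithmetic; the 70B parameter count and 16-bit baseline below are illustrative assumptions, not details from the coverage:

```python
def model_weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight-memory footprint in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

# A hypothetical 70B-parameter model stored as 16-bit floats:
baseline = model_weight_gb(70e9, 16)   # 140 GB of weights
# The same model after a claimed 6x compression:
compressed = baseline / 6              # ~23 GB, within a 24 GB consumer GPU
```

This counts weights only; activations and KV cache add further memory at inference time, so real-world fits are tighter than the raw division suggests.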

Industry

  • Google Nears Funding Deal for Anthropic Data Center — Google is nearing a deal to fund Anthropic’s data center, per the Financial Times. The agreement would expand Google’s AI infrastructure investments and increase Anthropic’s computational capacity. The reported move highlights ongoing AI infrastructure collaboration between major tech firms. Source-twitter

Multimodal

  • Intern-S1-Pro: First trillion-parameter scientific multimodal foundation model — Intern-S1-Pro is introduced as the first one-trillion-parameter scientific multimodal foundation model. It reportedly improves general and scientific reasoning, strengthens image-text understanding, and adds advanced agent capabilities, while covering over 100 specialized tasks across critical science fields. Source-huggingface
  • PixelSmile Enables Fine-Grained Facial Expression Editing — Researchers introduce the Flex Facial Expression (FFE) dataset with continuous affective annotations and FFE-Bench to measure editing accuracy, controllability, and identity trade-offs. They then propose PixelSmile, a diffusion-based framework that disentangles expression semantics via fully symmetric joint training. Source-huggingface

AI

  • Chandra OCR 2 Advances Multimodal Document Layout OCR — Chandra OCR 2, the latest release from datalab-to, claims state-of-the-art performance in converting images and PDFs into structured HTML, Markdown, or JSON while preserving layout. It improves handling of math, tables, forms, and multilingual OCR, supports 90+ languages, and offers strong handwriting support, form reconstruction, and image extraction with captions. The model runs in local (HuggingFace) or remote (vLLM server) modes and includes a hosted API. Source-github
  • RealRestorer Advances Generalizable Real-World Image Restoration with Editing Models — Real-world image restoration remains challenging due to diverse degradations and limited training data. The article highlights that large-scale image editing models generalize well to restoration tasks, with closed-source models like Nano Banana Pro achieving effective restoration while preserving image content. Source-huggingface

ASR

  • VibeVoice 9B tops open-source medical STT at 8.34% WER — In v3 of a medical STT benchmark, 31 models are evaluated, with Microsoft VibeVoice-ASR 9B taking the open-source crown at 8.34% WER (nearly matching Gemini 2.5 Pro at 8.15%). However, its 9B parameters require about 18 GB VRAM and it’s slow (97s/file) compared with faster models like Parakeet. The study also notes a Whisper text normalizer bug inflating WER by 2-3% across models, and adds ElevenLabs Scribe v2, NVIDIA Nemotron Speech Streaming 0.6B, and Voxtral Mini 2602 to the roster; all code and results are open-source. Source-reddit
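The WER figures above are word-level edit distances; a minimal reference implementation is plain dynamic programming with no text normalization, which is exactly the preprocessing step the benchmark found buggy in Whisper's normalizer:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete every reference word
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

Because scores depend heavily on normalization choices (casing, punctuation, number spelling), a 2-3% swing from a normalizer bug is plausible even when raw transcripts are identical.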

⚡ Quick Bites

  • Claude Code: Like Codex drunk—fun, creative, but error-prone for prod — Claude Code is described as a playful, more creative coding assistant compared to Codex. While it is friendly and entertaining, it can make dumb mistakes and should not be trusted in production settings. Source-twitter
  • EVA: Efficient RL for End-to-End Video Agent — The article highlights challenges in video understanding with multimodal LLMs due to long token sequences and redundant frames. It notes that current approaches often treat MLLMs as passive recognizers or rely on manually designed, perception-first workflows. EVA is presented as an approach to achieve efficient reinforcement learning for end-to-end video agents to address these inefficiencies. Source-huggingface
  • Dexter: Autonomous AI for Deep Financial Research — Dexter is an open-source autonomous AI agent designed for financial research. It decomposes complex questions into step-by-step plans, uses live market data to execute tasks, and self-validates results to produce data-backed analyses. Source-github
  • Are 2B Models Practical or Just Toys on Devices? — A Reddit user tested locally hosted 2B models (qwen2.5/3.5, gemma) on a smartphone and found that about 80% of responses were hallucinations. They ask whether this reflects user error or an inherent limitation, underscoring the current limits and practicality of 2B models for real on-device tasks. Source-reddit
  • Tweet says UI reveals era of model used — An X post points out that background gradients and button colors can indicate which generation of AI model was used to build an app. The author describes this as somewhat silly, highlighting how UI design can betray underlying technology. Source-twitter
  • Worth Upgrading from 48GB to 60GB VRAM? — A Reddit user with two RTX 3090 GPUs (48GB total VRAM) and an extra RTX 3080 (12GB) asks whether moving to 60GB VRAM offers real benefits for AI workloads. They’re seeking use-case guidance and want to avoid the hassle of adding a third GPU unless memory gains justify it. Source-reddit

Generated by AI News Agent | 2026-03-27