AI Daily — 2026-03-12
Google Maps Now Powered by Gemini; Ask Anything While Driving · Meta Unveils MTIA 300–500 Chips F...
Covering 30 AI news items
🔥 Top Stories
1. Google Maps Now Powered by Gemini; Ask Anything While Driving
Google Maps is getting its biggest upgrade in over a decade by integrating Google’s Gemini models. The update enables natural-language queries and AI-driven understanding of the world while driving, reshaping navigation and exploration. It promises new ways to interact with maps and get real-time guidance. Source-twitter
2. Meta Unveils MTIA 300–500 Chips Focused on Inference
Meta introduced four generations of MTIA chips (300–500) built over roughly two years, emphasizing an inference-first design and modular chiplets to enable rapid iteration. MTIA 450/500 are optimized for GenAI inference rather than training, with memory bandwidth scaling from 6.1 TB/s to 27.6 TB/s across the line, and the MX4 on the MTIA 500 delivering ~30 PFLOPS. The software stack is PyTorch-native, with torch.compile, Triton, and vLLM plugins, and Meta claims models can run on both GPUs and MTIA without rewrites. Source-reddit
3. AI Bottleneck: Human-Guided Pattern Memorization
Current AI techniques depend on pattern memorization and retrieval, requiring humans to specify which patterns to memorize through training data and reinforcement learning environments. The author argues that AI cannot yet operate in a fully autonomous, open-ended way and remains a reflection of human cognition rather than an independent agent. Source-twitter
📰 Featured
LLM
- Grok 4.20 Beta Cuts Hallucinations, Boosts Adherence and Speed — Grok 4.20 Beta reports three major improvements over Grok 4: the lowest hallucination rate on the AA-Omniscience evaluation, top instruction-following scores on IFBench (82.9%), and leading output speed at 265 tokens per second on xAI’s API. The post also notes rapid performance gains over Grok 4.1 and congratulates xAI and Elon Musk on the launch. Source-twitter
- Claude Adds Interactive In-Chat Charts in Beta — Anthropic’s Claude now supports generating interactive charts and diagrams directly in chat. The feature is available in beta on all plans, including free, and can be accessed via claude.ai. Source-twitter
- Cursor Reveals New Scoring Method for Agentic Coding — Cursor has introduced a new benchmarking method to score models on agentic coding tasks. The post compares Cursor models on intelligence and efficiency, highlighting their performance relative to existing approaches. Source-twitter
- OmniCoder-9B: 9B Coding Agent Trained on 425K Trajectories — OmniCoder-9B is a 9-billion-parameter coding agent from Tesslate, fine-tuned on top of Qwen3.5-9B’s hybrid architecture. It was trained on 425,000+ curated agentic trajectories from real-world software tasks and traces from Claude Opus 4.6 and other models (GPT-5.4, GPT-5.3-Codex, Gemini 3.1 Pro). The model shows agentic behaviors such as read-before-write recovery, LSP-diagnostic responsiveness, and the use of targeted edit diffs, indicating strong practical coding and reasoning capabilities. Source-reddit
- GATED_DELTA_NET Vulkan merged in llama.cpp boosts performance — Support for GATED_DELTA_NET on the Vulkan backend has been merged into llama.cpp and is included in the latest release. The change yields a noticeable performance boost on an AMD RX 7800 XT system running Fedora Linux. In a test with Qwen 3.5 27B, token generation increased from roughly 28 t/s to 36 t/s. Source-reddit
- Qwen3.5 Grabs Edge Over gpt-oss-120b in 96GB VRAM Coding — Qwen3.5 is emerging as a credible challenger to gpt-oss-120b for 96GB VRAM agentic coding tasks, adding vision support, parallel tool calls, and twice the context length. It comes with higher quality variance and slower speed due to its larger parameter count and new architecture. The discussion centers on whether any Qwen3.5 variants have replaced gpt-oss-120b, with users sharing experiences and configurations such as Qwen3.5-122B UD_Q4_K_XL GGUF in non-thinking mode with tuned sampling settings. Source-reddit
- Nemotron-3-Super-120B-A12B NVFP4 Benchmark on RTX Pro 6000 — The Nemotron-3-Super-120B-A12B NVFP4 model was benchmarked on a single RTX Pro 6000 using vLLM, with fp8 KV cache settings per Nvidia’s setup (unclear if metrics used fp8). The test ran across context sizes from 1K to 512K and 1–5 concurrent requests, generating 1024 output tokens per request and with no prompt caching, reporting steady-state averages under sustained load. Results show per-user generation speeds and time-to-first-token that decline as context and user count rise, reflecting team-oriented benchmarking rather than peak single-user performance. Source-reddit
Open Source
- World’s Largest Open-Source Dataset of Computer-Use Recordings Launches — A new open-source dataset comprising 10,000+ hours of computer-use recordings across Salesforce, Blender, Photoshop, and more has launched to advance white-collar automation. The project aims to enable researchers and developers to model real user workflows and automate higher-level office tasks. Source-twitter
- Fish Speech: SOTA Open-Source TTS by Fish Audio — Fish Audio releases Fish Speech, a state-of-the-art open-source text-to-speech system with multilingual support, including English. The project is licensed under the Fish Audio Research License, with warnings about license compliance and DMCA, and includes official installation and deployment docs for S2 and related tools. The GitHub repository fish-audio/fish-speech hosts the model weights and code. Source-github
- Summarize 0.12.0 adds NVIDIA and AssemblyAI support — Summarize v0.12.0 introduces an NVIDIA provider alias for NVIDIA OpenAI-compatible endpoints and adds AssemblyAI transcription. It also includes UI/media workflow improvements (Chrome sidepanel enhancements, better YouTube/video switching) and xurl support for X, enabling faster summaries from URLs, files, and media. Source-twitter
- AstrBot: Open-Source Agent Chatbot Platform for IM — AstrBot is an open-source, all-in-one agent chatbot platform that integrates with mainstream IM apps, LLMs, plugins, and AI features. It enables building production-ready AI applications within IM workflows, with multi-platform support and integrations with other agent platforms. Source-github
Multimodal
- OpenAI Unveils Sora 2-Powered Video API Capabilities — OpenAI introduced new Video API capabilities powered by Sora 2. The update adds features such as custom characters and objects, 16:9 and 9:16 exports, clips up to 20 seconds, video continuation to extend scenes, batch generation jobs, and HLS playback. Source-twitter
AI
- Flash-KMeans Enables Fast Online Exact K-Means — Researchers reframe k-means as an online primitive suitable for modern AI system design, introducing Flash-KMeans as a fast, memory-efficient exact implementation. They argue that current GPU-based k-means is bottlenecked by system-level constraints rather than algorithmic limits, and outline approaches to make online k-means practical in production. The work signals a path toward integrating k-means into real-time AI workflows. Source-huggingface
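The online-primitive framing can be illustrated with a classic MacQueen-style streaming update, where each arriving point nudges its nearest centroid by a 1/count step. This is a generic sketch of online k-means, not the paper's Flash-KMeans implementation:

```python
import random

def online_kmeans(points, centroids):
    """MacQueen-style online k-means: each point updates its nearest
    centroid with a 1/count step (an incremental per-cluster mean)."""
    centroids = [list(c) for c in centroids]
    counts = [0] * len(centroids)
    for x in points:
        # Assign the point to the nearest centroid (squared Euclidean distance).
        j = min(range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(centroids[i], x)))
        counts[j] += 1
        # Move that centroid toward x with step 1/count: a running mean.
        centroids[j] = [c + (a - c) / counts[j] for c, a in zip(centroids[j], x)]
    return centroids

# Two tight blobs around (0, 0) and (10, 10); centroids seeded near each blob.
rng = random.Random(0)
blob_a = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(300)]
blob_b = [(rng.gauss(10, 0.1), rng.gauss(10, 0.1)) for _ in range(300)]
stream = blob_a + blob_b
rng.shuffle(stream)
centroids = online_kmeans(stream, [(0.5, 0.5), (9.5, 9.5)])
```

Because each update touches a single centroid, this style of k-means can consume a stream in one pass, which is what makes it attractive as a real-time systems primitive.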
- Qwen3.5-397B MoE Benchmarks Hit 50.5 tok/s on SM120 — A researcher benchmarked every MoE backend for Qwen3.5-397B-NVFP4 on four RTX PRO 6000 Blackwell workstation GPUs and achieved 50.5 tok/s sustained decode—the best reported on SM120 to date, contradicting circulating claims of 130+ tok/s. The study attributes higher reported speeds to broken NVIDIA CUTLASS kernels on workstation GPUs and tested 16 configurations across Docker images, two inference frameworks, MoE backends, MTP, CUDA versions, EP/PP/TP combinations, and kernel patches. Source-reddit
LLM Benchmarking
- MLX Not Faster Than llama.cpp on M1 Max, Benchmark Finds — A Reddit user benchmarks MLX against llama.cpp on an M1 Max 64GB Mac using Qwen3.5-35B-A3B GGUF in LM Studio. They find that prefill latency, driven by context size, dominates total response time, making generation-only tokens-per-second metrics misleading. The post calls for broader benchmarking, including M2–M5 comparisons, and highlights practical performance over simple tok/s speeds. Source-reddit
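The prefill-dominates argument is simple arithmetic: total latency is prompt tokens over prefill speed plus output tokens over generation speed. The numbers below are illustrative, not taken from the post:

```python
def response_time(prompt_tokens, output_tokens, prefill_tps, gen_tps):
    """Total latency = prompt processing (prefill) + token generation."""
    return prompt_tokens / prefill_tps + output_tokens / gen_tps

# With a short prompt, generation dominates; with a long context, prefill does,
# even though the generation tok/s figure is identical in both runs.
short_ctx = response_time(500, 300, prefill_tps=250, gen_tps=30)     # 2 + 10 = 12 s
long_ctx = response_time(30_000, 300, prefill_tps=250, gen_tps=30)   # 120 + 10 = 130 s
```

This is why a benchmark reporting only generation tok/s can rank two runtimes identically while one feels several times slower in practice.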
AI Hardware
- Qwen 3.5 35B A3B Inference on Raspberry Pi 5 — An update reports progress running Qwen 3.5 35B A3B on Raspberry Pi 5 using a customized llama.cpp build and various quantizations. With 16k context and vision encoding, 2-bit A3B achieves about 3.5 t/s on a 16GB Pi and 2.5 t/s on an SSD-enabled 8GB Pi, while smaller 2-bit quants reach up to 4.5 t/s; 2B 4-bit hits ~8 t/s on both. The author shared a demo link and invites Pi 5 owners to test, noting ongoing tweaks and ARM prompt-caching work. Source-reddit
⚡ Quick Bites
- Claude Code gains gstack for exact skill setup via paste — Garry Tan introduces gstack, a method to install his exact Claude Code skill setup by pasting a short text into Claude Code. The approach aims to let users quickly mirror a preferred Claude Code configuration for coding tasks. Source-twitter
- DeepMind expands London presence with Platform 37, nod to Move 37 — Demis Hassabis announces DeepMind’s new London building, Platform 37, to deepen ties in the city. The project pays homage to AlphaGo’s Move 37 and frames Platform 37 as a tribute to science and AI and a space for future breakthroughs. Source-twitter
- Perplexity Computer Now Available to Pro Subscribers with 20+ Models — Perplexity announced that Perplexity Computer is now available to Pro subscribers, unlocking access to its full suite of 20+ advanced models, prebuilt and custom skills, and hundreds of connectors. Max subscribers gain monthly credits and higher spend limits than Pro. Source-twitter
- Gemini API gains spend caps for developers — Gemini announces spend caps in its API to give developers more control over usage costs. The update aims to improve cost predictability and peace of mind for building with Gemini, inviting users to set caps and share feedback. The tweet also briefly mentions enabling HLS playback. Source-twitter
- Codex App Adds Theme Personalization and Sharing — The Codex app now supports theme customization, allowing users to tailor the UI to their taste. Users can import themes they like or share their own with the community. Source-twitter
- Why Isn’t He at a Frontier AI Lab Amid AI Boom? — An AI-focused tweet questions why a key figure is not at a frontier AI lab during a pivotal moment in AI development. The post highlights concerns about leadership visibility amid rapid frontier AI progress. Source-twitter
- OpenClaw-RL: Train Any Agent Simply by Talking — OpenClaw-RL introduces a reinforcement learning framework that treats next-state signals (e.g., user replies, tool outputs, GUI/terminal state changes) as universal online learning sources. The approach claims policy can learn from all such signals simultaneously, unifying diverse interactions like conversations, terminal runs, GUI actions, SWE tasks, and tool-call traces as training data. Source-huggingface
- Claude After Compaction — The post references Claude after compaction, likely alluding to Claude’s context-compaction behavior, in which long conversations are summarized to fit the context window. Details are sparse, with no additional information provided in the tweet. Source-twitter
- Qwen3.5-9B Shines for Agentic Coding on Limited Hardware — A Reddit user with an RTX 3060 tested Qwen models with multiple quantizations and tool-calling setups. They report Qwen3.5-9B, especially when optimized for tools and using UD-TQ1_0 for code completion, delivering strong agentic coding performance on constrained hardware, outperforming smaller Qwen variants. Some quantizations remain slower or unstable, but the overall experience is positive. Source-reddit
- From Function Calls to Unix Pipelines for AI Agents — A former Manus backend lead turned AI agent developer explains that after two years building agents, he abandoned a catalog of typed function calls in favor of a single run(command='…') approach using Unix-style commands. He argues that text streams and small, composable tools outperform structured function catalogs, citing his Pinix runtime and agent-clip project as demonstrations and sharing production lessons that shaped his principles. Source-reddit
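The single-tool pattern described above can be sketched in a few lines: one `run` entry point that shells out and returns text, so the model composes pipelines instead of selecting from a typed catalog. This is a minimal illustration of the idea, not the author's Pinix runtime:

```python
import subprocess

def run(command: str, timeout: int = 30) -> str:
    """Single agent tool: execute a shell pipeline and return its output
    as text, letting the model compose small Unix tools freely."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    # Return stderr too, so the model sees failures as plain text.
    return result.stdout + result.stderr

out = run("printf 'beta\\nalpha\\n' | sort | head -n 1")
```

The design trade-off: a text stream loses the schema guarantees of typed function calls, but gains arbitrary composition (pipes, redirects, loops) without growing the tool catalog.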
- Llama.cpp with Brave Search MCP: Addictive Local AI — A Reddit post promotes enabling llama.cpp with Brave’s MCP search to run a local AI search experience. The poster says it’s funny and addictive to watch GPU fans spin up while using ‘Your own Google’. Source-reddit
Generated by AI News Agent | 2026-03-12