
AI Daily — 2026-03-29

Covering 23 AI news items

🔥 Top Stories

1. MLB Uses Sony Hawk-Eye AI to Call Balls and Strikes

Major League Baseball began using a computer-vision AI system to call balls and strikes, making Hawk-Eye's ruling authoritative alongside umpires for the first time. The Sony system reads ball seam patterns and spin metrics, adjudicating each pitch through a pipeline of AI models running from capture to output. In a recent game, several calls were overturned and fans cheered the machine's precision. Source-twitter

2. MistralAI Voxtral TTS Delivers Expressive Multilingual Speech from ~3s Reference

MistralAI introduces Voxtral TTS, which separates semantic content from acoustic voice to achieve expressive speech. The system uses Voxtral Codec to compress speech into ultra-low bitrate tokens and supports 9 languages, delivering high-quality voice cloning from about 3 seconds of reference audio. It reportedly achieves a 68.4% win rate against ElevenLabs Flash v2.5 in voice cloning. Source-twitter
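
The "ultra-low bitrate" claim comes down to simple arithmetic: a discrete codec's bitrate is frames per second times codebooks times bits per codebook entry. A minimal sketch, using illustrative numbers that are assumptions rather than published Voxtral specs:

```python
import math

def codec_bitrate(frames_per_sec: float, codebooks: int, codebook_size: int) -> float:
    """Bits per second of a discrete speech codec:
    frames/s x codebooks x log2(codebook entries)."""
    return frames_per_sec * codebooks * math.log2(codebook_size)

# Illustrative assumption: 12.5 frames/s, one codebook of 2048 entries.
# -> ~137.5 bit/s, versus ~256,000 bit/s for raw 16 kHz 16-bit PCM audio.
print(codec_bitrate(12.5, 1, 2048))
```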

3. M5-Max MacBook Pro 128GB RAM Benchmarks Qwen3-Coder-Next 8-Bit

Two local inference backends, MLX (Apple’s native framework) and Ollama (llama.cpp-based), are tested with Qwen3-Coder-Next 8-Bit on Apple Silicon to measure throughput, time-to-first-token, and coding capability. The methodology runs three iterations per prompt and averages the results, discarding the first run's TTFT as warm-up. The reported highlight is that the M5-Max with 128GB RAM achieves about 72 tokens per second using MLX. Source-reddit
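
A minimal sketch of the Ollama side of this methodology: three iterations per prompt, TTFT measured to the first streamed token, throughput taken from the final response's eval_count/eval_duration fields, and the first run's TTFT discarded as warm-up. The endpoint and fields follow Ollama's public streaming API; the model tag is an assumption.

```python
import json
import time
import requests

URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
MODEL = "qwen3-coder"  # hypothetical tag; substitute whatever `ollama list` shows

def run_once(prompt: str) -> tuple[float, float]:
    """Return (TTFT seconds, decode tokens/s) for one streamed generation."""
    start = time.perf_counter()
    ttft, eval_count, eval_duration = None, 0, 1
    with requests.post(URL, json={"model": MODEL, "prompt": prompt, "stream": True},
                       stream=True, timeout=600) as r:
        for line in r.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if ttft is None and chunk.get("response"):
                ttft = time.perf_counter() - start
            if chunk.get("done"):
                eval_count = chunk.get("eval_count", 0)
                eval_duration = chunk.get("eval_duration", 1)  # nanoseconds
    return ttft, eval_count / (eval_duration / 1e9)

runs = [run_once("Write a binary search in Python.") for _ in range(3)]
ttfts = [t for t, _ in runs[1:]]  # discard the first TTFT (warm-up)
print(f"mean TTFT {sum(ttfts) / len(ttfts):.2f}s, "
      f"mean {sum(t for _, t in runs) / len(runs):.1f} tok/s")
```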

LLM

  • Google’s TurboQuant Cuts KV Cache, Boosts Inference — Google’s TurboQuant aims to compress the KV cache (rather than model weights) to 3–4 bits with supposedly zero accuracy loss, targeting faster local LLM inference. The discussion questions whether the benefit is mainly enabling very large context windows or delivering broader speedups, and how well the claimed 8x H100 speedup scales to consumer GPUs and Apple Silicon; a generic 4-bit KV round-trip sketch appears after this list. Source-reddit
  • Local models passed GPT-3.5, claims circulate online — An online thread asserts that local AI models surpassed GPT-3.5 about 18 months ago, treating what was once the benchmark to beat as a long-passed milestone. The discussion mentions Openclaw and a search tool, with casual banter about how far local models have come relative to GPT-3.5. Source-twitter
  • Anthropic Mythos Rumors Boost Open-Weight AI Push — Andrew Curran tweets that rumors of Anthropic achieving its largest training run and producing a model that outperforms scaling expectations appear credible, likely pointing to Mythos. The rumors also mention an architectural breakthrough at a frontier lab. He argues that open-weight models are advantaged when frontier compute is expensive to meter, driving demand for more models and infrastructure. Source-twitter
  • KV Rotation PR Recovers Q8 Quant Performance on AIME25 — A recent kv-rotation pull request for llama.cpp shows that Q8 KV quantization degrades performance on the AIME25 benchmark, but that the loss can be largely recovered with kv rotation; a toy rotate-then-quantize demonstration appears after this list. The discussion notes potential benefits for existing Q8 users, though the author plans to stick with FP16 for the foreseeable future. The insight comes from a comment on the PR by Betadoggo_. Source-reddit
  • Zinc: Zig-based LLM Inference for 35B on AMD GPUs — A new LLM inference engine called Zinc is being built in Zig to run 35B-parameter models on AMD GPUs using Vulkan. It highlights direct Vulkan C ABI access, per-quantization dispatch via comptime, automated GLSL shader compilation, and a single-binary build process aimed at making local LLMs feasible on consumer hardware. Source-reddit
  • Tinylora replication confirms 13-parameter LoRA training claims — Tinylora demonstrates that LoRA-style fine-tuning can alter model behavior with very few parameters. A Reddit replication on Qwen-3.5 finds that assigning 13 shared parameters to all MLP layers and 13 to all attention layers (26 total) improves convergence compared to larger global counts; a toy shared-parameter sketch appears after this list. The author plans to explore per-layer parameterization as the next step. Source-reddit
  • Kimi K2.6 to Launch Soon; K3 Aims to Match US Models — Moonshot insiders say Kimi K2.6 will drop within 10-15 days as a small upgrade. Development of K3 continues, with the goal of matching American models in parameter count to achieve comparable performance. Source-reddit
  • Meta Teases Avocado Open-Source Model Family — Meta’s internal selector shows several Avocado configurations under evaluation, including Avocado 9B, Avocado Mango (multimodal with agent/sub-agent labels), Avocado TOMM (Tool of Many Models), Avocado Thinking 5.6, and Paricado (text-only). The information is sourced from an internal model selector and a TestingCatalog article, with a Reddit post referencing the source. Source-reddit
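
The memory arithmetic behind the TurboQuant item above is easy to see with a generic 4-bit round-trip over a KV-shaped tensor. This is a sketch of the kind of compression being targeted, not Google's actual algorithm, which the post does not detail:

```python
import numpy as np

def quantize_4bit(x: np.ndarray):
    """Asymmetric 4-bit quantization along the last axis."""
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = (hi - lo) / 15.0 + 1e-8
    q = np.clip(np.round((x - lo) / scale), 0, 15).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q * scale + lo

kv = np.random.randn(32, 1024, 128).astype(np.float32)  # layers x tokens x head_dim
q, scale, lo = quantize_4bit(kv)
err = np.abs(dequantize(q, scale, lo) - kv).mean()
# fp16 stores 2 bytes/value; 4-bit packs two values per byte
print(f"mean abs error {err:.4f}; ~{kv.size * 0.5 / 2**20:.0f} MiB vs "
      f"{kv.size * 2 / 2**20:.0f} MiB fp16")
```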
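
The kv-rotation result above has a simple intuition: an orthogonal rotation spreads outlier channels across all dimensions before quantization and is exactly invertible afterwards, so the quantizer sees a friendlier distribution. The toy below demonstrates the idea with a QuaRot-style random rotation; it is not the PR's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix

k = rng.normal(size=(4096, d)).astype(np.float32)
k[:, 7] *= 20.0  # inject an outlier channel, common in real key activations

def q8_roundtrip(x):
    """Symmetric per-token int8 quantize/dequantize."""
    s = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    return np.round(x / s).clip(-127, 127) * s

plain = np.abs(q8_roundtrip(k) - k).mean()
rotated = np.abs(q8_roundtrip(k @ Q) @ Q.T - k).mean()  # rotate, quantize, undo
print(f"mean abs error: plain {plain:.5f}, rotated {rotated:.5f}")
```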
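
To make the 13-shared-parameter idea above concrete, here is a toy PyTorch module in which every adapted layer uses frozen random projections while a single 13-element trainable vector, shared across all layers, scales the rank components. The wiring and shapes are illustrative assumptions, not Tinylora's actual code:

```python
import torch
import torch.nn as nn

class SharedTinyLoRA(nn.Module):
    def __init__(self, base: nn.Linear, shared_scale: nn.Parameter, rank: int = 13):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)           # base weights stay frozen
        self.A = nn.Parameter(torch.randn(base.in_features, rank) / rank**0.5,
                              requires_grad=False)  # frozen random down-projection
        self.B = nn.Parameter(torch.randn(rank, base.out_features) / rank**0.5,
                              requires_grad=False)  # frozen random up-projection
        self.scale = shared_scale             # the 13 trainable scalars, shared

    def forward(self, x):
        return self.base(x) + (x @ self.A) * self.scale @ self.B

shared = nn.Parameter(torch.zeros(13))        # the only trainable parameters
layers = [SharedTinyLoRA(nn.Linear(64, 64), shared) for _ in range(8)]
trainable = {p for l in layers for p in l.parameters() if p.requires_grad}
print(sum(p.numel() for p in trainable))      # -> 13, shared by all 8 layers
```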

Multimodal AI

  • JEPA family broadens: latent-space prediction across modalities — An X thread outlines a range of JEPA variants aimed at efficient self-supervised learning across modalities. It explains how each variant adapts latent-space prediction for vision, video, audio, and 3D data—covering hierarchical structuring (H-JEPA), efficient semantics (I-JEPA), and field-specific versions for AVs and robotics (MC-JEPA, V-JEPA, Point-JEPA, 3D-JEPA, ACT-JEPA, V-JEPA 2). The thread emphasizes reducing compute by masking patches and leveraging latent representations, enabling scalable action tracking and imitation learning; a toy version of the shared recipe appears below. Source-twitter
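
The toy version referenced above: predict the latent representation of masked patches from visible ones against a slowly updated (EMA) target encoder, so no pixels are ever reconstructed. Dimensions and the EMA rate here are illustrative assumptions, not any specific JEPA variant's hyperparameters.

```python
import torch
import torch.nn.functional as F

dim, n_patches = 64, 16
context_enc = torch.nn.Linear(dim, dim)   # stands in for a ViT encoder
target_enc = torch.nn.Linear(dim, dim)    # EMA copy, never backpropagated
predictor = torch.nn.Linear(dim, dim)
target_enc.load_state_dict(context_enc.state_dict())
for p in target_enc.parameters():
    p.requires_grad_(False)

patches = torch.randn(8, n_patches, dim)         # batch of patch embeddings
mask = torch.rand(n_patches) < 0.5               # hide roughly half the patches

ctx = context_enc(patches[:, ~mask])             # encode visible patches only
pred = predictor(ctx.mean(dim=1, keepdim=True))  # predict masked-region latent
with torch.no_grad():
    tgt = target_enc(patches[:, mask]).mean(dim=1, keepdim=True)
loss = F.smooth_l1_loss(pred, tgt)               # loss lives in latent space,
loss.backward()                                  # no pixels are reconstructed

with torch.no_grad():                            # EMA update of target encoder
    for pt, pc in zip(target_enc.parameters(), context_enc.parameters()):
        pt.mul_(0.996).add_(pc, alpha=0.004)
```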

AI

  • Linux Inference Far Faster Than Windows for Ollama/LLaMA Tests — A Reddit user compared AI model inference on Linux (Ubuntu 22.04 LTS) and Windows 10 using Ollama with LLaMA variants. In two benchmark setups, Linux achieved 72% to 118% higher throughput (e.g., 31 t/s vs 18 t/s and 105 t/s vs 48 t/s). The post invites others to share similar observations, highlighting the OS's impact on AI inference performance. Source-reddit

⚡ Quick Bites

  • All Chapters of Build A Reasoning Model From Scratch in Early Access — All chapters of Build A Reasoning Model From Scratch are now available in early access. The book is in production and expected to be released in the coming months with full-color print and syntax highlighting; preorders are available on Amazon. Source-twitter
  • Recursive self-improvement advances in fits and S-curves — The piece argues that recursive self-improvement will unfold in fits and starts along S-curves, with progress punctuated by phases of drought. It also highlights AI winters and the lag before the next chip generation becomes available, framing advancement as non-linear. Source-twitter
  • Carlini: current AI models are better at finding vulnerabilities — Alex Palcuie notes on X that he’s long followed Nicholas Carlini’s work on vulnerability research and was glad to see him present publicly. Palcuie quotes Carlini saying that current AI models are better vulnerability researchers than he is, adding that Carlini himself used to do this work somewhat professionally. The post links to a public talk and highlights Carlini’s perspective on models as vulnerability researchers. Source-twitter
  • AI Doomsday Toolbox v0.932 Adds Benchmarking, Datasets, Termux Workflows — An Android app for running local AI gains a suite of major features: benchmarking for local LLMs with adjustable thread counts to optimize setups and compare configurations; a dataset creator that imports text or PDFs, generates QA pairs, and exports datasets in Alpaca JSON format (a minimal format example appears after this list); enhanced Termux/proot workflows with SSH and in-app tool management; and a new AI agent workspace built on local backends. Source-reddit
  • Europe should be viewed as a global AI powerhouse — An Oliver Molander post argues that Europe should be recognized as a global AI powerhouse, challenging the notion that AI leadership lies elsewhere. He contends that many of the defining figures behind Transformers and LLMs are European, urging a reframing of the narrative. Source-twitter
  • AI field values NeurIPS peer review, contrasts with unreviewed work — A tweet highlights NeurIPS as a model of rigorous peer review and contrasts it with critiques of non-peer-reviewed work. The item frames this as emblematic of the AI field’s emphasis on formal validation, while acknowledging heated online disagreement. Source-twitter
  • Debate Over Alec’s Role in GPT-1 Pretraining — Claims circulating on social media assert that Alec pushed for pretraining transformer language models and built GPT-1, tracing all LLMs back to him. The post argues that labeling him the inventor of pretraining is a stretch that disrespects others’ contributions, igniting debate about AI history and credit. Source-twitter
  • AI Dot Engineer Singapore Event May 15-17 — Announcement for the AI Dot Engineer conference in Singapore, scheduled for May 15-17. The teaser promises a fun, AI-focused gathering by the aiDotEngineer community. No further details are provided. Source-twitter
  • Help mounting two RTX 3090 GPUs in one case — A Reddit user asks for the best way to fit two NVIDIA GeForce RTX 3090 GPUs in the same home server case when the first card partially blocks the second PCIe slot. They outline three options involving low-position risers, moving the power supply, or a vertical mount, with both cards limited to 220W and airflow concerns. They also consider relocating the PSU for improved airflow and seek practical mounting and securing advice. Source-reddit
  • LocalLLaMA 2026 Post: ‘We Are Doomed’ — A Reddit post in r/LocalLLaMA titled ‘LocalLLaMA 2026’ contains the terse message ‘we are doomed’ by user /u/jacek2023, with links to discussion comments. The post provides minimal information about LocalLLaMA 2026 beyond the sentiment expressed. Source-reddit
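
For reference on the Alpaca JSON format mentioned in the AI Doomsday Toolbox item: it is just a list of instruction/input/output records, so converting QA pairs takes only a few lines. A minimal sketch:

```python
import json

qa_pairs = [("What is a KV cache?",
             "It stores attention keys/values so earlier tokens aren't "
             "recomputed during generation.")]

# Alpaca format: one {"instruction", "input", "output"} record per example.
alpaca = [{"instruction": q, "input": "", "output": a} for q, a in qa_pairs]
with open("dataset.json", "w", encoding="utf-8") as f:
    json.dump(alpaca, f, ensure_ascii=False, indent=2)
```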

Generated by AI News Agent | 2026-03-29