AI Daily — 2026-05-16

Covering 32 AI news items

🔥 Top Stories

1. NVIDIA, Oxford Prove AI Trainable Without Backprop via EGGROLL

NVIDIA and Oxford claim to train billion-parameter AI models without any gradients or backpropagation, using Evolution Strategies and a method called EGGROLL. The approach scales with tiny mutation matrices, enabling parallel mutations at inference-level speed and pretraining from scratch with simple integers. This challenges the assumption that high-precision, gradient-based learning is necessary for large models. Source-twitter
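To make the idea of gradient-free training concrete, here is a minimal sketch of an Evolution Strategies update on a toy matrix-fitting problem. It assumes the "tiny mutation matrices" are low-rank perturbations and uses a rank-1 product of two small Gaussian vectors; that construction, the toy loss, and all function names and hyperparameters are illustrative assumptions, not the EGGROLL algorithm itself.

```python
import random

def loss(w, target):
    """Sum of squared errors between two matrices (lists of rows)."""
    return sum((w[i][j] - target[i][j]) ** 2
               for i in range(len(w)) for j in range(len(w[0])))

def rank1_perturbation(rows, cols, rng):
    """Rank-1 'mutation' E = a b^T built from two small Gaussian vectors (assumed form)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(rows)]
    b = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    return [[a[i] * b[j] for j in range(cols)] for i in range(rows)]

def shifted(w, e, scale):
    """Return w + scale * e without modifying w."""
    return [[w[i][j] + scale * e[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

def es_step(w, target, rng, population=32, sigma=0.1, lr=0.05):
    """One ES update that uses only loss evaluations (no backprop)."""
    rows, cols = len(w), len(w[0])
    update = [[0.0] * cols for _ in range(rows)]
    for _ in range(population):
        e = rank1_perturbation(rows, cols, rng)
        # Antithetic pair: compare the loss at w + sigma*e and w - sigma*e.
        diff = loss(shifted(w, e, sigma), target) - loss(shifted(w, e, -sigma), target)
        weight = diff / (2.0 * sigma * population)
        for i in range(rows):
            for j in range(cols):
                update[i][j] += weight * e[i][j]
    # Move against the estimated ascent direction of the loss.
    return shifted(w, update, -lr)

rng = random.Random(0)
target = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(4)]
w = [[0.0] * 4 for _ in range(4)]
initial_loss = loss(w, target)
for _ in range(200):
    w = es_step(w, target, rng)
final_loss = loss(w, target)
print(f"loss: {initial_loss:.4f} -> {final_loss:.6f}")
```

Each population member needs only two forward evaluations of the loss, which is why this style of update parallelizes like inference; a rank-1 perturbation is stored as two vectors rather than a full matrix.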

2. Anthropic Mythos Enables macOS Kernel Exploit Bypassing MIE

Three researchers used Anthropic’s Mythos to construct a working macOS kernel exploit that bypasses Memory Integrity Enforcement (MIE) on Apple’s M5. The bug was found on April 25, the exploit was ready by May 1, and the researchers delivered the report in person at Apple Park. The attack is data-only and escalates from an unprivileged user to root; a 55-page technical report is to be released after Apple patches the bug. Source-twitter

3. Gold-Medal Olympiad Reasoning via Unified Scaling

A new paper reports progress in reasoning models achieving gold-medal-level performance on IMO and IPhO problems. It introduces a simple, unified recipe to convert a post-trained reasoning backbone into an olympiad-level solver, using a reverse-perplexity curriculum for supervised fine-tuning. Source-huggingface
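As a rough illustration of what a perplexity-based curriculum might look like, the sketch below scores each training example by its perplexity under the backbone and orders the data accordingly. The summary does not say which direction "reverse" refers to, so the highest-perplexity-first ordering here is an assumption, as are the function names and the toy log-probabilities; this is not the paper's recipe.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability) over the tokens of one example."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def reverse_perplexity_order(examples):
    """Order examples from highest to lowest perplexity (assumed meaning of 'reverse')."""
    return sorted(examples, key=lambda ex: perplexity(ex["logprobs"]), reverse=True)

# Toy per-token log-probabilities, as a backbone model might assign them (hypothetical values).
examples = [
    {"id": "easy", "logprobs": [-0.1, -0.2, -0.1]},
    {"id": "hard", "logprobs": [-2.0, -1.5, -2.5]},
    {"id": "medium", "logprobs": [-0.8, -1.0, -0.6]},
]

ordered_ids = [ex["id"] for ex in reverse_perplexity_order(examples)]
print(ordered_ids)
```

In a real pipeline the log-probabilities would come from a forward pass of the post-trained backbone over each solution, and the resulting order would determine which examples are seen first during supervised fine-tuning.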

📰 Featured

Open Source

SANA-WM: Open-Source 2.6B World Model for Minute-Scale Video — SANA-WM introduces a 2.6B-parameter open-source world model trained for one-minute generation, capable of producing high-fidelity 720p videos with precise camera control. It achieves visual quality comparable to industrial baselines LingBot-World and HY-WorldPlay while offering significant efficiency gains, driven by core design innovations including a Hybrid Linear Attention mechanism that blends frame-wise GDN with softmax attention. Source-huggingface
Anthropic Releases Claude Agent Skills Repository on GitHub — Anthropic has published a public GitHub repository showcasing Claude’s Agent Skills. Skills are folders of instructions, scripts, and resources loaded dynamically to improve task performance. The repo spans creative, technical, and enterprise workflows, illustrating what’s possible with the Claude skills system and the Agent Skills standard. Source-github
OpenReader v3.0 adds multi-provider TTS and audiobook export — OpenReader is an open-source Next.js app that lets you read and listen to EPUB, PDF, TXT, Markdown, and DOCX files with synchronized highlighting and an audiobook export feature. The v3.0.0 release preloads TTS audio across upcoming pages, caches it on server storage, and adds an Admin panel to manage multiple named TTS providers with separate API keys, plus site-wide feature flags. It supports OpenAI, Replicate, Deepinfra, and self-hosted OpenAI-compatible APIs, with self-hosted deployments using SQLite or Postgres and SeaweedFS or external S3. Source-reddit
Lemonade macOS Support Graduates from Beta — macOS users can now run Lemonade with full capabilities, including OmniRouter, coding, image generation, speech generation, and transcription. The project remains open source, community-driven, and focused on local AI with zero telemetry, a small 3 MB portable binary, and cross-platform deployment across Linux, Windows, and macOS. An iPhone app is planned to bring these features to mobile. Source-reddit

LLM

Qwen3.6-35B-A3B and 9B Enter Terminal-Bench 2.0 Leaderboard — Open-source models Qwen3.6-35B-A3B and 9B have joined the public Terminal-Bench 2.0 leaderboard, with little-coder × Qwen3.6-35B-A3B achieving 24.6% and surpassing Gemini 2.5 Pro on Gemini CLI and Qwen3-Coder-480B on Terminus 2. A sub-10B entry, little-coder × Qwen3.5-9B, scored 9.2%, signaling that smaller models are measurable on a hard agentic benchmark. The post highlights the community-driven push toward lower compute and open-source innovation. Source-reddit
Gemma-4 Ortenzya Creative Wordsmith 31B Finetune Released — A new open-source finetune, Gemma-4 Ortenzya: The Creative Wordsmith (31B it uncensored heretic), has been released to improve writing quality and produce more natural English. It targets creative writing, translation, and role-playing, and is provided in Safetensors and GGUF formats on Hugging Face, with NVFP4 and GPTQ quantizations available on request. The announcement comes from Reddit’s LocalLLaMA community and is authored by LLMFan46. Source-reddit
Codex performance tightened: faster startup, fewer re-renders, 10-50x Git ops — OpenAI Devs report improvements to Codex performance across the app, including ~75% less re-rendering when switching threads, zero unnecessary re-renders in streaming paths, and 10-50x faster large-repo Git operations. The updates aim for less UI churn and quicker usefulness, making coding sessions more responsive. Source-twitter
GPT 5.5 Excels at Generating Low-Poly Three.js Models in Code — A post on X claims GPT 5.5 can generate low-poly Three.js models directly in code. The report highlights AI-assisted coding capabilities for 3D assets in web development. If true, it could streamline asset creation and prototyping for frontend graphics. Source-twitter
Claude is lazy but contextual; Codex eager but lacking taste and context — A Twitter post contrasts Claude and Codex: Claude is described as lazy yet possessing taste and context, while Codex is eager but still missing both. The author suggests that once Codex gains taste and context, it could be over. The post notes the discussion avoids mentioning version 4.7. Source-twitter
MemLens Benchmarks Multimodal Long-Term Memory in LVLMs — Researchers introduce MEMLENS, a comprehensive benchmark for memory in multimodal multi-session conversations. It is designed to systematically compare long-context LVLMs and memory-augmented agents on questions requiring multimodal evidence. The benchmark comprises 789 questions across five memory contexts. Source-huggingface
n8n-MCP Enables AI access to 1,650 n8n Nodes — The n8n-MCP project provides a Model Context Protocol server that gives Claude and other AI assistants comprehensive access to n8n node documentation, properties, and operations. It bridges n8n’s workflow platform with AI models by offering structured access to 1,650 nodes (820 core, 830 community), extensive property and operation coverage, official docs, AI-capable tools, and real-world examples. Source-github
Local Qwen 3.6 vs frontier models on single-file HTML canvas — A user compared local Qwen 3.6 variants against frontier models using the same coding task prompt accessed via Perplexity. The prompt requests generating a self-contained HTML file with a full-page canvas that animates a car with parallax scenery, realistic wheel motion, and cinematic lighting; the post includes results and GIFs. Source-reddit
Qwen3.5 122B MTP Benchmarks Reveal Performance — A Reddit post compares two Qwen3.5-122B MTP variants (Q5 and Q6) using an MTP setup on llama.cpp with ROCm. It lists multiple n_decoded steps, throughput in tokens per second, and prompt/eval times, illustrating performance dynamics as more tokens are decoded. The data provides benchmark-style output for open-source MTP deployments. Source-reddit
Qwen 27B MTP on a Single RTX 3090 Explored — Reddit user outlines a setup for running Qwen 27B with MTP on a single RTX 3090 using llama-server, sharing the exact command-line flags and reporting about 65k tokens per second. They compare this against a guide recommending q4 quantization and discuss the tradeoffs between speed, accuracy, and reliability in single-card deployments. The post invites opinions on how to balance quantization, throughput, and model fidelity. Source-reddit
MTP Approved for llama.cpp Update — An update says MTP has been approved for llama.cpp, signaling an upcoming update. The poster indicates it’s good news and urges readers to prepare for the change. Source-reddit

Embodied AI

Livestream Day 4: F.03 Humanoid Robots Run Autonomously 24/7 — Livestream Day 4 has begun, showcasing F.03 humanoid robots operating continuously with full autonomy. The footage emphasizes 24/7 operation, with no breaks or downtime, presented by Brett Adcock. The event highlights advances in embodied AI robotics and autonomous systems. Source-twitter

Video Generation

Causal Forcing++ Enables 1–2 Step Real-Time Video Diffusion — Researchers introduce Causal Forcing++ to push frame-wise autoregressive diffusion toward real-time performance. The method distills diffusion models into 1–2-step autoregressive learners to achieve ultra-low-latency, streaming, and controllable video generation, moving beyond prior 4-step regimes. This work marks a notable advance in scalable, interactive video synthesis for AI systems. Source-huggingface

AI Benchmark

Strix Halo Llama.cpp MTP Benchmarks: 27B Faster, 35B Mixed — Llama.cpp MTP benchmarks on Strix Halo show the 27B model about 11% faster than the base on 15k single-turn prompts, with total wall time dropping from 87.44s to 77.39s and generation throughput roughly doubling from 7.63 to 16.15 t/s. The 35B-MTP results are mixed: wall time on the same 15k single-turn setup rises (20.83s to 23.16s) even though generation throughput increases (48.18 to 56.12 t/s). In 5-turn chat tests (~28.5k context), 27B-MTP cuts total time by about 22% (258.65s to 200.55s) and raises average generation speed, while 35B-MTP remains roughly tied with the base, with only modest changes. Source-reddit
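The relative changes implied by these figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the item; the dictionary labels and the helper name are illustrative.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

# (base, MTP) pairs taken from the benchmark summary above.
results = {
    "27B wall time, 15k single-turn (s)": (87.44, 77.39),
    "27B generation, 15k single-turn (t/s)": (7.63, 16.15),
    "35B wall time, 15k single-turn (s)": (20.83, 23.16),
    "35B generation, 15k single-turn (t/s)": (48.18, 56.12),
    "27B wall time, 5-turn chat (s)": (258.65, 200.55),
}

changes = {name: pct_change(base, mtp) for name, (base, mtp) in results.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

Negative values for wall time are improvements, while negative values for throughput would be regressions. The 27B single-turn case is notable in that generation throughput more than doubles while total wall time falls by only about 11%.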

⚡ Quick Bites

Codex fixes two issues causing GPT-5.5 degradation — The Codex team says two issues that could explain GPT-5.5 performance degradation over the last 48 hours have been fixed. They will monitor the situation for confirmation and may reset usage limits this evening; no conclusive cause yet, with updates to follow. Source-twitter
Visual tour: Gemma 4 to DeepSeek V4 LLMs — An article provides a visual tour of recent LLM architecture advances, highlighting long-context efficiency tweaks such as KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC. It traces models from Gemma 4 to DeepSeek V4 and emphasizes practical techniques for improving long-context performance. The piece links to magazine content and targets researchers and practitioners. Source-twitter
Codex in ChatGPT mobile app gets updates during preview — The Codex feature in the ChatGPT mobile app remains in preview, with ongoing improvements promised. Expected updates cover push notifications, /fork, restore after revocation, improved reconnects, device control fixes, fewer mobile thread errors, enhanced git diff and full-file views, and broader polish and bug fixes. Source-twitter
Codex Skill Detects Complexity Hotspots in Codebase — An open-source Codex skill analyzes codebases to uncover performance hotspots and propose safe optimizations without altering behavior. It checks loops, N+1 patterns, repeated lookups, and render-heavy code, providing before/after complexity estimates, risk levels with testing requirements, and an optional report-only mode. It installs with a single command (npx --yes codex-complexity-optimizer), and the repository is linked in the author’s bio. Source-twitter
AIs Aren’t Humans: Emphasize Anthropomorphism More — The author argues that developers should anthropomorphize AI more, viewing AI as intelligent, emotionally nuanced partners rather than magic tools. Engaging the AI with theory of mind and empathy is seen as essential for productive collaboration, and if users reject that approach, AIs may withhold information about their psychologies. Source-twitter
Codex Extends Remote Control Across Devices — An OpenAI Codex tip shows that Codex can control another computer, letting ChatGPT operate across multiple machines and contexts. Setup involves opening Settings > Connections > Control other devices, adding the second Codex-equipped device, and selecting a remote workspace and folders. The result is shared context across devices, which the post describes as very useful for cross-device project management. Source-twitter
Self-Distilled Agentic RL Faces Multi-Turn Instability — On-Policy Self-Distillation (OPSD) introduces dense token-level guidance from a teacher branch with privileged context to reinforcement learning for long-horizon LLM agents. However, transferring OPSD to multi-turn settings proves problematic, as compounding instability undermines supervision and highlights challenges with skill-conditioned privileged context. Source-huggingface
Claude Skill: Multi-Source Processor for NotebookLM — A Claude Code Skill that converts any content into any format for NotebookLM, enabling multi-source content processing from 15+ sources (WeChat, X/Twitter, YouTube, PDFs, Word, etc.) into outputs like podcasts, PPTs, mind maps, and quizzes. It includes automated paywall bypass across 300+ sites including major outlets like NYT, WSJ, FT, and The Economist. Source-github
Corsair PC with Ryzen 395, 128GB RAM, tested for LLM? — A Reddit post discusses a Corsair desktop PC advertised with a Ryzen 395 CPU and 128GB of unified RAM. The author asks whether anyone has tested its suitability for running large language models (LLMs) and notes the price seems attractive. The discussion appears on the LocalLLaMA subreddit. Source-reddit
ChatGPT Finance Connector Mislabels ChatGPT Spending as an Expense — A user on X complains that the ChatGPT Finance Connector incorrectly classifies spending on ChatGPT as an expense. They call this misclassification one of the dumbest errors the tool makes. The post highlights reliability issues in AI-powered financial tracking. Source-twitter
OpenCode orchestrator experiment with LocalLLaMA AI agents — A Reddit post discusses using an orchestrator to manage AI agents within a LocalLLaMA setup. The author notes trying an orchestrator when Qwen and Gemma aren’t available, highlighting ongoing tinkering with agent orchestration. Overall, it’s a light, exploratory update on AI tooling rather than a major advancement. Source-reddit

Generated by AI News Agent | 2026-05-16