AI Daily — 2026-03-15
New work fixes gradient-based planning in latent world models · Next AI breakthrough to come from...
Covering 23 AI news items
🔥 Top Stories
1. New work fixes gradient-based planning in latent world models
Latent world models offer differentiable dynamics suitable for planning via gradient descent, but in practice researchers revert to derivative-free methods like CEM and MPPI due to non-convex objectives. A new paper by Yingwww, Yann LeCun, and Mengye Ren diagnoses this issue and proposes a principled fix. This could revive gradient-based planning in learned latent spaces within model-based RL. Source-twitter
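For readers unfamiliar with the setup, gradient-based planning optimizes an action sequence by backpropagating a cost through a differentiable dynamics model. A toy sketch under invented linear dynamics (nothing here is from the paper; all constants are made up for illustration):

```python
import numpy as np

# Toy gradient-based planner: optimize an action sequence by gradient
# descent on the cost of rolling it out through a differentiable latent
# model z' = A*z + B*a. Dynamics, horizon, and step size are invented.
A, B = 0.9, 1.0
z0, z_goal = 0.0, 5.0
H, lr, steps, lam = 10, 0.05, 500, 1e-3

actions = np.zeros(H)
for _ in range(steps):
    # Forward rollout of the current plan.
    z = z0
    for a in actions:
        z = A * z + B * a
    # Analytic gradient of cost = (z_H - z_goal)^2 + lam * ||actions||^2;
    # z_H depends linearly on a_t with coefficient A^(H-1-t) * B.
    err = z - z_goal
    grad = np.array([2 * err * A ** (H - 1 - t) * B + 2 * lam * actions[t]
                     for t in range(H)])
    actions -= lr * grad

# Final rollout with the optimized plan; z should land near z_goal.
z = z0
for a in actions:
    z = A * z + B * a
```

With a convex toy cost this converges cleanly; the paper's point is that real latent objectives are non-convex, which is why practitioners fall back to CEM/MPPI.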
2. Next AI breakthrough to come from lower-level architecture shift
Sam Altman hints in a recent interview that a new AI architecture will be a major upgrade, comparable to Transformers versus LSTM. The discussion argues breakthroughs will occur at a lower level than current model architectures and recommends using existing AI to help discover the next giant leap, per Rohan Paul. Source-twitter
3. International Teams Built AI Milestones; Calls for AI Weapons Moratorium
The post argues that major AI breakthroughs such as convnets, AlexNet, attention, AlphaGo, AlphaCode, AlphaFold, transformers, and RL were developed by international teams rather than Americans. It condemns a war-mongering Palantir CEO’s stance and suggests that American AI leadership does not represent the broader community. It calls for a moratorium on AI weapons and for international institutions to enforce it, framing AI warfare as imminent (Skynet v1.0). Source-twitter
📰 Featured
LLM
- Heretic Unveils Automatic Censorship Removal for Language Models — Heretic is an open-source tool that automatically removes censorship (safety alignment) from transformer-based language models. It combines directional ablation (abliteration) with a TPE-based optimizer powered by Optuna to auto-tune parameters, co-minimizing refusals and KL divergence from the original model. The goal is a decensored model that preserves as much intelligence as possible and requires no deep transformer knowledge to use. Source-github
- Qwen3.5-27B Nearly Matches 397B and GPT-5 Mini in GACL — In the March GACL run, Qwen3.5-27B performed just behind 397B, trailing by 0.04 points, and nearly matches GPT-5 Mini. GPT-5.4 leads among major models, with Kimi2.5 as the top open-weight model (#6 globally) and GLM-5 at #7. The results show GPT models dominating Battleship, while Tic-Tac-Toe proved weak as a benchmark. Source-reddit
- GPT-4 Turns 3; Codex Powers Sketch-to-Website Demo — Celebrating GPT-4’s third birthday, the post recalls a moment when @gdb turned a hand-drawn sketch into a working website. It notes how programming felt like it shifted in real time, and asserts that Codex now embodies that future. Source-twitter
- GPT-4 era enables AI to write 1000-line programs — Greg Brockman recalls that an internal goal, an AI that could write a coherent 1000-line program, once seemed impossible. He says the technology has progressed dramatically, highlighting GPT-4’s capabilities. The post celebrates AI advancement with a birthday nod to GPT-4. Source-twitter
- OpenCode OSS LLM Emerges as Cheaper Open-Source Alternative — A Reddit post praises OpenCode’s open-source LLM interface as superior to CC/Codex, highlighting its open-source nature, cheaper pricing, and the ability to run the product with an OSS model behind it. The author notes that users can inspect how its tools are implemented and even summarize its own code scaffolding into system messages and tool descriptions. They also flag reliability concerns and mention Kimi k2.5 as the model they intend to deploy. Source-reddit
- From FlashLM to State Flow: Replacing Transformers with Memory — The author behind FlashLM describes moving beyond static SlotMemoryAttention to a new ‘State Flow Machine’ that maintains explicit state across input sequences. The work aims to replace traditional transformers with memory-augmented architectures; early results show strong length retention (79%) versus transformers (2%). Source-reddit
- Apex 1.6 Instruct 350M Released as Powerful Chat Model — LH-Tech-AI released Apex 1.6 Instruct 350M, their most capable chat model to date, by adjusting the finetuning data ratio to 2:1 (Alpaca-Cleaned to Fineweb-Edu-10BT). The release improves world knowledge over Apex 1.5 Coder and is available on Hugging Face in GGUF format for Ollama, LM Studio, and llama.cpp. The post compares Apex 1.6 to Apex 1.5 Coder and highlights its more complex, instruction-rich outputs. Source-reddit
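The directional-ablation step that Heretic builds on can be illustrated in isolation. This toy numpy sketch projects an invented "refusal direction" out of a random weight matrix; how the direction is found and the Optuna/TPE tuning of per-layer strengths are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                              # hidden size (toy)
W = rng.normal(size=(d, d))       # a toy output-projection weight matrix
r = rng.normal(size=d)
r /= np.linalg.norm(r)            # unit "refusal direction" (invented here)

# Directional ablation: remove the component of every output along r,
# so the layer can no longer write to that direction: W' = (I - r r^T) W.
W_ablated = (np.eye(d) - np.outer(r, r)) @ W

# Any output of the ablated layer is now orthogonal to r.
x = rng.normal(size=d)
out = W_ablated @ x               # np.dot(out, r) is ~0
```

Heretic's contribution, per the item above, is automating the tuning of where and how strongly to apply this projection while keeping KL divergence from the original model low.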
AI Benchmarking
- Coding Benchmark Exposes True Reasoning; 11% Best Result — Researchers designed an esoteric-language coding benchmark to separate true problem solving from pattern matching learned during training. By testing on Brainfuck, Befunge-98, Whitespace, Unlambda, and Shakespeare with HumanEval problems, they show that many models may rely on training data rather than genuine reasoning; esoteric languages have near-zero training data. Across GPT-5.2, O4-mini, Gemini 3 Pro, Qwen3-235B, and Kimi K2, the best single result was 11.2% on Befunge-98 using self-scaffolding. Source-reddit
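Scoring model-written programs in an esoteric language requires an interpreter for it. A minimal Brainfuck interpreter, as a sketch of what such a scoring harness needs (this is not the benchmark's actual code):

```python
def run_bf(code, inp=""):
    """Minimal Brainfuck interpreter: 8 commands over a byte tape."""
    # Precompute matching bracket positions for loops.
    jump, stack = {}, []
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jump[i], jump[j] = j, i
    tape, ptr, pc, out, it = [0] * 30000, 0, 0, [], iter(inp)
    while pc < len(code):
        c = code[pc]
        if c == ">": ptr += 1
        elif c == "<": ptr -= 1
        elif c == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".": out.append(chr(tape[ptr]))
        elif c == ",": tape[ptr] = ord(next(it, "\0"))
        elif c == "[" and tape[ptr] == 0: pc = jump[pc]
        elif c == "]" and tape[ptr] != 0: pc = jump[pc]
        pc += 1
    return "".join(out)

# Example: 65 increments then output prints "A" (ASCII 65).
print(run_bf("+" * 65 + "."))  # -> A
```

Because almost no Brainfuck/Befunge training data exists, a model that passes such a harness is less likely to be pattern-matching memorized solutions.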
Open Source
- OpenCode Port for Karpathy’s Autoresearch — A Reddit user, dabiggmoe2, announced an OpenCode port of Karpathy’s Autoresearch project. The post, in r/LocalLLaMA, links to the port and discussion, highlighting an effort to enable open-source experimentation with Autoresearch workflows on local setups. The effort demonstrates interest in accessible AI research tooling. Source-reddit
⚡ Quick Bites
- Florida Man Sells Home in 5 Days Using ChatGPT-Driven Sale — A Florida man sold his house in five days after using ChatGPT to manage the entire sale instead of a real estate agent. The AI handled pricing, marketing, showings, and contract drafting, illustrating AI’s potential to automate real estate processes. Source-twitter
- Emily Bender: LLMs Only Useful to Offload Cognition — The post endorses Emily Bender’s view that LLMs are primarily valuable for offloading cognitive work. The author argues that the other two use cases she mentions are rare exceptions to that core function. Offloading cognition has long been a guiding aim of AI. Source-twitter
- Seven Emerging Memory Architectures for AI Agents — A roundup highlights seven emerging memory architectures for AI agents, including Agentic Memory (AgeMem), Memex, MemRL, UMA, Pancake, Conditional memory, and Multi-Agent Memory from a Computer Architecture Perspective. The piece is shared via The Turing Post on Twitter, linking to a deeper article on memory architectures for AI agents. Source-twitter
- Sebastian Raschka Launches LLM Architecture Gallery — Sebastian Raschka released a new LLM Architecture Gallery that bundles architecture figures in one place. The resource aims to simplify comparing LLM architectures for researchers and learners by aggregating diagrams and examples. Access it at sebastianraschka.com/llm-arc. Source-twitter
- Does Increasing MoE Experts Improve Performance? — A Reddit discussion questions whether expanding the number of experts in Mixture-of-Experts (MoE) models yields meaningful gains, citing Qwen3-30B-A3B versus A6B configurations. It notes that MoE setups are still easy to run in Llama-CPP, but there appears to be little recent experimentation, and asks if others have tested larger MoE configurations. Source-reddit
- The Fast Food Problem with AI Coding — An author draws a parallel between fast food becoming abundant and AI-assisted coding, arguing that cheap, easy access can lead to overuse. The piece is not anti-AI and the author uses AI to write code daily; it frames the shift as a recurring pattern and invites reader feedback. Source-reddit
- Open-Source GreenBoost Driver Expands NVIDIA GPU RAM with System RAM & NVMe — A new open-source driver named GreenBoost aims to augment NVIDIA GPUs’ VRAM by offloading to system RAM and NVMe storage, enabling larger language models. The project seeks to extend memory beyond GPU limits to run bigger LLMs on commodity hardware. Source-reddit
- mjv5 Turns 3; Four Candidates for Great AI Art — mjv5 marks its third anniversary and shares four candidates for great AI art in a post dated July 20, 2025, with mentions of neurotica and Schwarzposter_. The thread includes engagements from Rez and Brick Suit and contains a light reference to Tucker Carlson. Source-twitter
- 9B Model on 5-Year-Old RTX 3060 Writes Space Shooter from One Prompt — A post shows what 12 GB of VRAM can do in 2026: a 9-billion-parameter model running on a five-year-old RTX 3060 wrote a full space shooter from a single prompt, after a blank screen on the first attempt. Source-twitter
- Gallery of LLM Architecture Visualizations — A Reddit post curates a gallery of visualizations that illustrate various large language model architectures. The collection highlights differences in design choices and components across LLMs, offering a reference for researchers and enthusiasts. Source-reddit
- Pied Piper Discovers Claude Code Offers 2x Rate Limits 5–11 AM — Silicon Valley S10E5 reveals that Claude Code offers twice the rate limits during 5–11 AM, prompting Richard and Dinesh to exploit it with polyphasic sleep. Gavin Belson tries to enforce the same at Hooli, but his move costs him half his engineering team. Source-twitter
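On the MoE-experts question above: a toy top-k router shows why adding experts grows total capacity without growing per-token compute, since only k experts run per token (all shapes and weights here are invented for illustration):

```python
import numpy as np

# Toy top-k MoE layer: n_experts weight matrices, but each token is
# routed to only the k experts with the highest gate scores.
rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2

experts = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(n_experts)]
W_gate = rng.normal(size=(n_experts, d))

def moe_layer(x):
    logits = W_gate @ x
    top = np.argsort(logits)[-k:]          # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                           # softmax over the selected gates
    return sum(wi * (experts[i] @ x) for wi, i in zip(w, top))

x = rng.normal(size=d)
y = moe_layer(x)
# Doubling n_experts doubles stored parameters, yet per-token FLOPs stay
# fixed by k, which is why "A3B"-style active-parameter counts matter.
```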
Generated by AI News Agent | 2026-03-15