AI Daily — 2026-03-31
Anthropic Leaks Reveal Claude Mythos, Capybara v2 1M Context Window · TAPS Enables Task-Aware Proposal Distributions for Speculative Sampling
Covering 20 AI news items
🔥 Top Stories
1. Anthropic Leaks Reveal Claude Mythos, Capybara v2 1M Context Window
Leaks from Anthropic hint at a Claude Mythos model with fast and regular thinking modes, and at the Capybara tier persisting into version 2 with a 1M-token context window. The file also mentions Opus 4.7 and Sonnet 4.8 in code and references a Claude ‘Buddy’ entry whose purpose is unclear. The source is a tweet and the information is unconfirmed. Source-twitter
2. TAPS Enables Task-Aware Proposal Distributions for Speculative Sampling
Speculative decoding lets a lightweight draft model propose future tokens for a larger model to verify in parallel, but its effectiveness depends on the draft training data. The authors study this by training lightweight drafters HASS and EAGLE-2 on MathInstruct, ShareGPT, and mixed-data variants, evaluating them on machine translation tasks. The work aims to quantify how the draft distribution affects speculative decoding quality. Source-huggingface
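The draft-then-verify loop described above can be sketched with toy deterministic "models" standing in for the drafter and target (hypothetical stand-ins, not HASS or EAGLE-2), so the accept/reject logic is easy to trace:

```python
# Minimal sketch of one speculative decoding step (greedy variant).
# Both "models" are toy functions over integer tokens, not neural LMs.

def draft_model(prefix, k):
    # Toy drafter: counts up from the last token, but goes wrong at i == 2
    # so a rejection is visible during verification below.
    return [(prefix[-1] + i + 1 + (1 if i == 2 else 0)) % 100 for i in range(k)]

def target_model(prefix):
    # Toy target: greedy next token is simply last token + 1.
    return (prefix[-1] + 1) % 100

def speculative_step(prefix, k=4):
    """Keep the longest run of draft tokens the target agrees with,
    then take one token from the target itself."""
    accepted = []
    for tok in draft_model(prefix, k):
        expected = target_model(prefix + accepted)
        if tok == expected:
            accepted.append(tok)        # draft matched the target: accept
        else:
            accepted.append(expected)   # first mismatch: emit target token, stop
            break
    else:
        # every draft token accepted: the target still emits one bonus token
        accepted.append(target_model(prefix + accepted))
    return prefix + accepted
```

In a real system the per-position target checks are batched into a single parallel forward pass, which is where the speedup comes from; draft quality, and hence the draft training data the paper studies, determines how many tokens are accepted per step.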
3. ByteShape Qwen 3.5 9B: Device-Tuned Quantization Guide
ByteShape released quantized Qwen 3.5 9B models and compares them against other quantized variants and the original model to map quality, speed, and size trade-offs across hardware. They benchmark across GPUs (5090, 4080, 3090, 5060Ti) and CPUs (Intel i7, Ultra 7, Ryzen 9, RIP5), noting consistent GPU results but highly device-dependent CPU performance, prompting per-CPU variants and clear emphasis on device-specific optimization. Source-reddit
📰 Featured
LLM
- Copaw-9B Released; Alibaba Agentic Fine-tune on Par with Qwen3.5-Plus — Alibaba releases Copaw-9B (Qwen3.5 9B variant) with official agentic fine-tuning, now hosted on Hugging Face. Early benchmarks indicate it is on par with Qwen3.5-Plus on several tasks, according to Reddit user kironlau. Source-reddit
- Liquid AI Launches LFM2.5-350M: Efficient Agentic Loops at 350M — Liquid AI releases LFM2.5-350M, a compact model (<500MB when quantized) optimized for data extraction and tool use. Trained on 28T tokens with scaled RL, it reportedly outperforms larger models like Qwen3.5-0.8B on benchmarks while offering fast, low-latency performance across CPUs, GPUs, and mobile hardware, with reliable function calling and structured outputs. Source-reddit
- attn-rot TurboQuant Lite Nears Llama.cpp Merge — An enthusiastic post claims attn-rot (ggerganov’s TurboQuant lite) is close to merging into llama.cpp. It presents VRAM-bound benchmarks on Qwen models showing comparable KV-cache quantization performance and KLD metrics between the master and attn-rot branches across q8_0 and q4_0 quantizations, highlighting VRAM efficiency and speed. Source-reddit
AI Safety
- Anthropic’s Claude Code leaks reveal gated fixes and verification gaps — An analyst claims to have reverse-engineered Claude Code’s leaked source using billions of agent logs. The analysis alleges that Anthropic acknowledges Claude Code’s hallucination and laziness, with fixes gated to employees only, and highlights an employee-only verification gate that marks a write as successful even when the code hasn’t been properly tested. Source-twitter
Open Source
- PrismML Launches 1-bit Bonsai 8B, Open-Sources AI Models — PrismML, a new AI lab with Caltech origins, is emerging from stealth to advance intelligence density rather than merely increasing parameter counts. Its first proof point is the 1-bit Bonsai 8B, a 1-bit weight model occupying about 1.15 GB, delivering over 10x intelligence density compared with full-precision models, while being smaller, faster, and more energy-efficient on edge hardware. The model and related Bonsai variants (4B and 1.7B) are open-sourced under Apache 2.0, signaling a shift toward on-device AI agents and offline intelligence. Source-twitter
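The reported ~1.15 GB footprint for an 8B-parameter 1-bit model checks out as back-of-envelope arithmetic (the overhead term below is a hypothetical allowance, not a figure from the announcement):

```python
def model_size_gb(n_params, bits_per_weight, overhead_gb=0.0):
    """Back-of-envelope weight storage: params * bits / 8 bytes, in GB.
    overhead_gb is a hypothetical allowance for embeddings, scales, and
    other tensors kept at higher precision."""
    return n_params * bits_per_weight / 8 / 1e9 + overhead_gb

# 8B parameters at 1 bit/weight is ~1.0 GB of raw weights; the reported
# 1.15 GB is consistent with ~0.15 GB of higher-precision extras.
# The same model at fp16 would need ~16 GB, a >13x size reduction,
# in line with the "over 10x intelligence density" framing.
```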
Tools
- Medical AI Scientist: Autonomous Clinical Research Framework — Autonomous systems that generate hypotheses, conduct experiments, and draft manuscripts are accelerating discovery. However, existing AI scientists are largely domain-agnostic, limiting their use in medicine. The work introduces Medical AI Scientist, the first autonomous research framework tailored to clinical medicine. Source-huggingface
Multimodal
- Gen-Searcher: Search-Augmented Image Generation Agent — Gen-Searcher introduces the first training approach for a search-augmented image generation agent that performs multi-hop reasoning to retrieve textual knowledge, addressing the limits of a frozen model’s internal knowledge. The work aims to improve performance on knowledge-intensive and up-to-date scenarios by integrating search into the image generation process. Source-huggingface
⚡ Quick Bites
- Ollama Now Fastest on Apple Silicon, Powered by MLX — Ollama has been updated to run fastest on Apple Silicon, powered by MLX, Apple’s ML framework. The update promises faster performance for demanding macOS workloads, including personal assistants like OpenClaw and coding agents such as Claude Code, OpenCode, and Codex, and it enables HLS playback. Source-twitter
- PSA: Stop Using Opus-4.6 Reasoning Dataset Variant — A PSA on Reddit urges users to stop using nohurry’s Opus-4.6-Reasoning-3000x-filtered dataset, which was meant as a quick filter for Crownelius’s dataset but has since been superseded. The author directs users to the original Crownelius dataset and asks the community to switch to it, while keeping the filtered version online for link stability. The post includes links to the original discussion and dataset and suggests donating to Crownelius. Source-reddit
- Qwen3.5-27B Favored Over Gemini 3.1 Pro and GPT-5.3 Codex — A Reddit user criticizes large proprietary LLMs for prioritizing autonomous problem solving, which they find leads to unreliable outputs. They recount experiences with Claude and GPT-5.3 Codex producing dangerous or nonsensical code and say Copilot often derails tasks, while praising Qwen3.5-27B for more reliable coding behavior. Source-reddit
- GLM 5.1 Beats Minimax 2.7 in Capability and Speed — An anecdotal comparison of GLM 5.1 and Minimax 2.7 notes speed versus capability trade-offs. Minimax 2.7 is extremely fast and inexpensive with OpenClaw integration but weaker for coding tasks, while GLM 5.1 is more capable and able to stitch changes across multiple files, albeit slower and with heavier usage costs. Source-reddit
- OpenAI Codex codebase leaked online — A tweet claims the entire OpenAI Codex codebase has been leaked and posted to a GitHub repository (openai/codex). The post ties Codex to a lightweight coding agent that runs in the terminal, highlighting potential security and IP concerns, though the leak’s authenticity is unverified. Source-twitter
- Anthropic issues official statement on leak — Anthropic released an official statement addressing a leak. The item provides no details on the leak’s nature or its implications. Source-twitter
- Veo 3.1 Lite Debuts; Veo 3.1 Fast Price Cut — Veo announces Veo 3.1 Lite, its most affordable video generation model to date. The release also notes a price reduction for Veo 3.1 Fast on April 7. Source-twitter
- AI Writes Tailwind Class Names After Years of Learning — An X post laments that years of learning Tailwind CSS class names may be wasted as AI can generate them. It highlights the growing role of AI in code generation and the tension between developer knowledge and AI-assisted workflows. The tweet is from user Theo and reflects broader questions about skill depreciation versus productivity gains in development. Source-twitter
- AI Excels at Writing Code but Struggles to Build Software — A tweet asserts that AI is highly capable at generating code but notably limited when it comes to building complete software. The claim highlights a gap between code generation and end-to-end software development. It suggests significant human oversight and tooling improvements are needed for AI-assisted software creation. Source-twitter
- AMD MXFP4 Models on HuggingFace Question NVIDIA Nemotron Rivalry — A Reddit user questions why AMD isn’t building model lines like NVIDIA’s Nemotron and notes AMD has about 400 models on HuggingFace, many in MXFP4 format. They list several MXFP4 models (e.g., Qwen3.5-397B-A17B-MXFP4, GLM-5-MXFP4, MiniMax-M2.5-MXFP4, Kimi-K2.5-MXFP4, Qwen3-Coder-Next-MXFP4) and express a wish for more small/medium MXFP4 releases and user testing. The post hopes AMD’s MXFP4 releases could outperform third-party MXFP4 offerings. Source-reddit
Generated by AI News Agent | 2026-03-31