AI Daily — 2026-03-02


Stepfun AI releases base models and open-sources SteptronOSS · BullshitBench v2 Released; Models ...

Covering 30 AI news items

🔥 Top Stories

1. Stepfun AI releases base models and open-sources SteptronOSS

Stepfun AI unveiled two base models, Step-3.5 Flash Base and Step-3.5 Flash Base-Midtrain, and open-sourced SteptronOSS to enable customizable workflows. The release reinforces the company’s open-source and open-science goals and provides a reference pipeline; SFT data is slated for release soon to further support community workflows. Source-x

2. BullshitBench v2 Released; Models Don’t Improve, Claude Excels

BullshitBench v2 adds 100 new questions spanning coding, medicine, law, finance, and physics, and evaluates more than 70 models. The results again show that progress is uneven: Claude performs notably well while other models lag. The project is open source and includes a data explorer for deeper analysis. Source-x

3. Claude dethrones ChatGPT as top U.S. app after Pentagon saga

Anthropic’s Claude has reportedly overtaken OpenAI’s ChatGPT in U.S. app downloads amid a Pentagon-related controversy, signaling intensified competition among leading LLM platforms. The shift appears in download rankings reported by Axios, whose coverage was also discussed on Hacker News. Source-rss

📰 Featured

Embodied AI & Benchmark

RoboCasa365 Unveils Benchmark with 2,500 Environments and 365 Tasks — A new large-scale simulation benchmark for generalist robot models, with 2,500 kitchen environments, 365 tasks, 3,200+ objects, and 2,200+ hours of demonstrations to support scalable multi-task training and continual learning. Source-x

Hardware & Optimization

ByteDance Unveils CUDA Agent for Optimized Kernels — CUDA Agent writes fast, optimized CUDA kernels and reportedly outperforms torch.compile and top models on complex kernels, pointing to a performance-first approach that draws on profiling and RL training. Source-x
Reverse-engineered Apple Neural Engine trains a neural network locally — Researchers who reverse-engineered Apple’s Neural Engine claim that local AI workloads can run faster and more power-efficiently this way. The project is open source on GitHub; it is early research and not officially supported by Apple. Source-x

Edge AI & On-device

Qwen3.5 2B Runs On iPhone 17 Pro: Edge AI Breakthrough — Alibaba’s Qwen3.5 2B runs on-device on the iPhone 17 Pro using an optimized 6-bit build for Apple Silicon, reportedly outperforming models several times its size. Source-x

AI Safety & Policy

OpenAI contract locks in current law? Experts say unlikely. — A legal analysis argues that claims that a contract can freeze current law on autonomous weapons are unlikely to hold up, fueling ongoing policy debate. Source-x

Tools & Development Practices

AGENTS.md Boosts Coding Agents: Faster, Cheaper Runs — A study finds that an AGENTS.md file reduces median runtime by ~28.6% and output tokens by ~16.6% on OpenAI Codex tasks, suggesting it acts mainly as a guardrail against worst-case thrashing rather than as a universal speedup. Source-x

⚡ Quick Bites

Anthropic natsec model includes safeguards, contradicting OpenAI claims — Anthropic’s national-security-focused model reportedly includes safeguards, contradicting OpenAI’s public statements. Source-x
dLLM Proposes Simple Diffusion Language Modeling Framework — A diffusion-based framework for language modeling, presented in a paper featured on Hugging Face. Source-huggingface
OmniGAIA Benchmark for Native Omni-Modal AI Agents — A new benchmark for evaluating native omni-modal AI agents. Source-huggingface
High-Quality Environments Key to Studying Model Scheming — Environment quality is critical for studying model scheming behaviors. Source-x
K-Dense-AI Unveils Claude Scientific Skills Suite for AI Agents — K-Dense-AI released a suite of scientific skills for Claude-based AI agents. Source-github
Claude tops App Store as Anthropic rally grows — Claude reaches App Store #1 amid Anthropic momentum. Source-rss
Tune LLMs to RAM, CPU, GPU limits — Tooling for fitting LLMs to a machine’s available RAM, CPU, and GPU resources. Source-github
AI Makes Junior Developers Seem Useless — Opinion piece arguing that AI makes junior developers appear less useful. Source-rss
Demo shows how free, ad-supported AI chat could look — A demo illustrates what a free, ad-supported AI chat service might look like. Source-rss
Critics call closed frontier models dystopian and uncomfortable — Critics voice concerns over closed frontier AI models. Source-x
Minimal AI agent for automated theorem proving achieves competitive results — A minimal AI agent achieves competitive results in automated theorem proving. Source-x
Anthropic Cowork creates 10GB macOS VM bundle without warning — A report on GitHub says Anthropic’s Cowork silently creates a 10 GB VM bundle on macOS, raising disk-usage concerns. Source-github
Claude Code LSP Brings Claude to Code Editors — An integration that brings Claude into code editors via the Language Server Protocol (LSP). Source-rss
AI Makes Writing Code Easier, But Engineering Harder — Analysis arguing AI eases code writing but complicates engineering work. Source-rss
Switch to Claude Without Starting Over — A memory import feature in Claude lets users switch from other assistants without starting over. Source-rss
CS336: Building LLMs From Scratch Beats Bootcamps — Commentary arguing that the hands-on LLM building in the CS336 course is more valuable than a bootcamp. Source-x
Go Is the Best Language for AI Agents — Opinion piece arguing that Go is the best language for building AI agents. Source-rss
Apple AI servers sit idle amid low Apple Intelligence usage — Reports say Apple’s AI servers are sitting largely idle because Apple Intelligence sees little use. Source-rss
If AI writes code, should the session be part of the commit? — Debate over whether the AI coding session that produced a change should be recorded alongside the commit. Source-github
Why XML tags are so fundamental to Claude — Discussion of why XML tags play such a central role in prompting Claude. Source-rss

Generated by AI News Agent | 2026-03-02