AI Daily — 2026-03-02
Stepfun AI releases base models and open-sources SteptronOSS · BullshitBench v2 Released; Models ...
Covering 30 AI news items
🔥 Top Stories
1. Stepfun AI releases base models and open-sources SteptronOSS
Stepfun AI released two base models, Step-3.5 Flash Base and Step-3.5 Flash Base-Midtrain, and open-sourced SteptronOSS, a reference pipeline for building customized training workflows. The company frames the release as part of its open-source/open-science goals and says SFT data is coming soon to extend what the community can build. Source-x
2. BullshitBench v2 Released; Models Don’t Improve, Claude Excels
BullshitBench v2 adds 100 new questions spanning coding, medical, legal, finance, and physics, and tests them against 70+ models. The results again show that progress is uneven: Claude performs notably well while most other models lag. The project is open source and ships with a data explorer for deeper analysis. Source-x
3. Claude dethrones ChatGPT as top U.S. app after Pentagon saga
Anthropic’s Claude has reportedly overtaken OpenAI’s ChatGPT in U.S. app downloads amid a Pentagon-related controversy, a sign of intensifying competition among the leading LLM platforms. The download figures come from Axios reporting that was discussed on Hacker News. Source-rss
📰 Featured
Embodied AI & Benchmark
- RoboCasa365 Unveils 2,500 Environments, 365 Tasks Benchmark — New large-scale simulation benchmark for generalist robot models: 2,500 kitchen environments, 365 tasks, 3,200+ objects, and 2,200+ hours of demonstrations to support scalable multi-task training and continual learning. Source-x
Hardware & Optimization
- ByteDance Unveils CUDA Agent for Optimized Kernels — ByteDance’s CUDA Agent generates fast, optimized CUDA kernels and reportedly outperforms torch.compile and top models on complex kernels. The work points toward performance-driven training that combines profiling feedback with RL. Source-x
- Reverse-engineered Apple Neural Engine trains neural network locally — Researchers reverse-engineered Apple’s Neural Engine and claim that local AI inference on it can be faster and more power-efficient. The project is open-sourced on GitHub; it is early research and is not officially supported by Apple. Source-x
Edge AI & On-device
- Qwen3.5 2B Runs On iPhone 17 Pro: Edge AI Breakthrough — Alibaba’s Qwen3.5 2B runs on-device on the iPhone 17 Pro through an optimized 6-bit path for Apple Silicon, and reportedly outperforms models several times its size. Source-x
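Whether a 2B model fits on a phone comes down largely to weight storage. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes exactly 2 × 10^9 parameters (the real Qwen3.5 2B count may differ) and counts weights only, ignoring the KV cache, activations, and per-group quantization scales, so real memory use will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for model weights only, in decimal gigabytes (1e9 bytes).

    Ignores KV cache, activations, and quantization metadata (scales/zero points).
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Assumption: exactly 2e9 parameters; the published model size may differ.
    params = 2e9
    for bits in (16, 8, 6, 4):
        print(f"{bits:>2}-bit: {weight_memory_gb(params, bits):.2f} GB")
```

Under these assumptions the weights take about 1.5 GB at 6 bits versus 4 GB at 16 bits, which illustrates why low-bit paths matter on devices with limited RAM.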
AI Safety & Policy
- OpenAI contract locks in current law? Experts say unlikely. — Legal analysts argue that claims that OpenAI’s contract freezes current law on autonomous weapons are unlikely to hold up, fueling ongoing policy debate. Source-x
Tools & Development Practices
- AGENTS.md Makes Coding Agent Runs Faster and Cheaper — A study finds that adding an AGENTS.md file reduced median runtime by ~28.6% and output tokens by ~16.6% on OpenAI Codex tasks. The results suggest AGENTS.md mainly acts as a guardrail against worst-case thrashing rather than as a universal speedup. Source-x
⚡ Quick Bites
- Anthropic natsec model includes safeguards, contradicting OpenAI claims — Anthropic’s national-security-focused model reportedly includes safeguards, which contradicts OpenAI’s public statements. Source-x
- dLLM Proposes Simple Diffusion Language Modeling Framework — A paper featured on Hugging Face presents a simple diffusion-based framework for language modeling. Source-huggingface
- OmniGAIA Benchmark for Native Omni-Modal AI Agents — Benchmark released for native omni-modal AI agents. Source-huggingface
- High-Quality Environments Key to Studying Model Scheming — Environment quality is critical for studying model scheming behaviors. Source-x
- K-Dense-AI Unveils Claude Scientific Skills Suite for AI Agents — K-Dense-AI released a suite of scientific skills for Claude-based AI agents. Source-github
- Claude tops App Store as Anthropic rally grows — Claude reaches App Store #1 amid Anthropic momentum. Source-rss
- Tune LLMs to RAM, CPU, GPU limits — Tooling for fitting LLMs to a machine’s available RAM, CPU, and GPU resources. Source-github
- AI Makes Junior Developers Seem Useless — Opinion piece arguing that AI tools make junior developers appear less useful. Source-rss
- Demo shows how free, ad-supported AI chat could look — A demo illustrates what a free, ad-supported AI chat product might look like. Source-rss
- Critics call closed frontier models dystopian and uncomfortable — Critics voice concerns over closed frontier AI models. Source-x
- Minimal AI agent for automated theorem proving achieves competitive results — A minimal AI agent reportedly produces proofs competitive with existing automated theorem provers. Source-x
- Anthropic Cowork creates 10GB macOS VM bundle without warning — A GitHub report says Anthropic’s Cowork silently creates a 10GB macOS VM bundle, raising disk-space concerns. Source-github
- Claude Code LSP Brings Claude to Code Editors — Integrates Claude Code into code editors via the Language Server Protocol (LSP). Source-rss
- AI Makes Writing Code Easier, But Engineering Harder — Analysis arguing AI eases code writing but complicates engineering work. Source-rss
- Switch to Claude Without Starting Over — Claude’s memory import feature lets users bring existing context with them when switching, instead of starting from scratch. Source-rss
- CS336: Building LLMs From Scratch Beats Bootcamps — Post argues that the hands-on CS336 course on building LLMs from scratch offers more than typical bootcamps. Source-x
- Go Is Best Language for AI Agents — Opinion arguing Go is best for AI agents. Source-rss
- Apple AI servers sit idle amid low Apple Intelligence usage — Reports say Apple’s AI servers are sitting idle because Apple Intelligence usage remains low. Source-rss
- If AI writes code, should the session be part of the commit? — Debate over whether the AI session that produced a change should be recorded as part of the commit. Source-github
- Why XML tags are so fundamental to Claude — Discussion of why XML tags play a central role when prompting Claude. Source-rss
Generated by AI News Agent | 2026-03-02