
AI Daily — 2026-02-28


Covering 23 AI news items

🔥 Top Stories

1. AI Firm Reaches DoW Deal to Deploy Models in Classified Network

An AI firm announced an agreement with the Department of War to deploy its models in the DoW’s classified network. The deal enshrines safety principles—prohibiting domestic mass surveillance and ensuring human oversight for the use of force, including autonomous weapons—plus technical safeguards and cloud-only deployment. It also calls on DoW to offer the same terms to all AI companies. Source-twitter

2. Google finds longer reasoning harms accuracy; introduces Deep Thinking Ratio

Google researchers tested eight model variants (GPT-OSS, DeepSeek-R1, Qwen3, etc.) on AIME 2024/2025, HMMT 2025, and GPQA-Diamond, finding that reasoning-token length correlates negatively with accuracy (-0.54). They introduce the Deep Thinking Ratio (DTR), which measures deep processing at the token level and correlates with accuracy at 0.82. The team also outlines a Think@n strategy: sample multiple reasoning paths, estimate each path's DTR from its first 50 tokens, retain the high-DTR top half, and decide the answer by majority vote. Source-reddit
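The Think@n selection loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `sample_path` and `estimate_dtr` are hypothetical stand-ins for the model's sampler and the DTR estimator.

```python
from collections import Counter

def think_at_n(sample_path, estimate_dtr, n=8, prefix_tokens=50):
    """Sketch of Think@n (function names hypothetical).

    sample_path(): draws one reasoning path, returning (tokens, answer).
    estimate_dtr(tokens): scores the deep-thinking ratio of a token prefix.
    """
    # 1. Sample n independent reasoning paths.
    paths = [sample_path() for _ in range(n)]
    # 2. Estimate DTR from only the first `prefix_tokens` tokens of each path.
    scored = [(estimate_dtr(tokens[:prefix_tokens]), answer)
              for tokens, answer in paths]
    # 3. Retain the top half of paths by DTR.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    kept = scored[: max(1, n // 2)]
    # 4. Majority-vote over the surviving answers.
    votes = Counter(answer for _, answer in kept)
    return votes.most_common(1)[0][0]
```

The early-prefix DTR estimate is what makes this cheap: low-DTR paths can be discarded before they are fully decoded.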

3. Qwen3.5 35B-A3B replaces my 2-model agentic setup on M1

A Reddit post claims Qwen3.5-35B-A3B can match or beat larger models on reasoning, agentic, and coding tasks, rivaling models with up to hundreds of billions of parameters. On an Apple Silicon M1 Max with 64 GB of RAM, the user ran Qwen3.5-35B-A3B (19 GB footprint) via a llama.cpp server to analyze a six-sheet Excel workbook of January 2025 Amazon sales data and propose a plan for a 10% sales uplift the following month. This single-model setup reportedly replaces their prior two-model agentic workflow, illustrating strong end-to-end capability on consumer hardware. Source-reddit

Industry

  • Anthropic’s Amodei speaks after Pentagon blacklist; vows patriotism — Dario Amodei gives his first interview since the Pentagon blacklisted Anthropic, saying the lab is patriotic and built its models to defend America. The piece notes the government’s demand for unrestricted access for autonomous weapons and mass surveillance, and its use of emergency powers, including a supply-chain designation and a six-month phaseout announced via Truth Social. It highlights tensions at the crossroads of AI development, policy, and national security. Source-twitter

AI Tools

  • Claude Code Adds /simplify and /batch Skills — Anthropic’s Claude Code is introducing two new Skills, /simplify and /batch. These automate tasks like shepherding a pull request to production and performing parallelizable code migrations, reducing manual effort. The author notes they’ve been using the Skills daily and are excited to share them publicly. Source-twitter

AI Safety

  • Pentagon rejects veto power over military AI use — The piece argues the Pentagon should not grant a vendor veto power over how the military uses an AI tool it has purchased, citing lawful use under civilian control. It contrasts this stance with critiques urging strict limits (no mass surveillance, human-in-the-loop autonomy) and compares U.S. governance with PLA AI deployment, referencing Claude and Dario Amodei. Source-twitter
  • DoW Standards Differ for OAI vs Anthropic; Altman Misleading — A commenter argues that DoW applies different standards to OpenAI and Anthropic, or that Altman is misleading in this tweet. Given Altman’s history, the poster favors the latter explanation. Source-twitter

LLM

  • CodexBar Tracks AI Usage Across Codex, Claude and More — CodexBar is a macOS 14+ menu bar app that displays per-provider usage limits for OpenAI Codex, Claude Code, and other AI services. It shows session and weekly limits, per-provider status, reset times, and an optional overview tab, all configurable via Settings. The project also offers Linux CLI support and Omarchy integration, with releases on GitHub and Homebrew installation options. Source-github
  • Wei-Shaw Claude Relay Service Enables Unified Open-Source LLM Access — Wei-Shaw’s claude-relay-service provides a self-hosted Claude Code mirror and a one-stop open-source relay for accessing Claude, OpenAI, Gemini, and Droid with shared-cost carpooling. It warns that v1.1.248 and earlier have a severe admin authentication bypass vulnerability and urges upgrading to v1.1.249+ or migrating to CRS 2.0 (sub2api), while promoting a self-hosted Claude API relay with multi-account support. The project also markets pincc.ai’s Claude/Codex carpool service via Codex CLI, but includes cautions about Anthropic’s terms, privacy concerns, and reliability issues of third-party mirrors. Source-github
  • Bare-Metal AI: Boot LLM Inference Without OS or Kernel — A Reddit post describes a UEFI-based application that boots directly into large-language-model inference with no operating system or kernel. The entire AI stack—tokenizer, weight loader, tensor math, and inference engine—runs in freestanding C under UEFI boot services, with plans to add network drivers and serve smaller models on a network. The developer notes it’s slow for now and aims to improve performance with future optimizations, primarily for experimentation. Source-reddit
  • Qwen3 Coder Next Benchmark in Rust & Next.js — Continuing local benchmarks on personal production repos, the author compares Qwen3 Coder Next, Qwen3.5 27B, Devstral Small 2, and related models on a Rust + Next.js codebase. Prior results showed Qwen3.5 27B leading a 78-task Next.js/Solidity bench, with Devstral Small 2 edging it out on the Next.js portion; a Noctrex benchmark also highlighted Qwen3-Coder-Next-UD-IQ3_XXS against Mistral and Qwen models. This update adds LM Studio’s Devstral Small 2 Q8_0 and fixes the KV cache at Q8_0 to reduce VRAM usage. Source-reddit
  • Self-Organizing Maps Enable Multi-Directional Refusal Suppression — A pull request proposes using self-organizing maps to suppress refusals in LLMs along multiple directions, arguing that refusals form low-dimensional manifolds rather than a single latent direction, so previous single-direction ablations are insufficient. Reported results on gpt-oss-20b and oss-120b show improved refusal suppression at various KL-divergence levels, and a HuggingFace model visualizes the refusal clustering. Researchers from the University of Cagliari contribute to the effort. Source-reddit
  • LLM Agents Pass KV-Cache to Cut Token Reprocessing — An AI enthusiast argues that multi-agent LLM setups re-tokenize and reprocess the entire conversation at every handoff, wasting roughly 47-53% of tokens in their tests. They propose the Agent Vector Protocol (AVP), which transfers the KV-cache between agents instead of text, eliminating re-tokenization and redundant forward passes. Early tests across Qwen2.5, Llama 3.2, and DeepSeek-R1-Distill report 73-78% token savings with no reported overhead. Source-reddit
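The claimed handoff can be sketched in miniature. The classes and methods below are illustrative stand-ins, not the actual AVP; the token list stands in for per-layer K/V tensors, and a real cache transfer assumes both agents run the same model weights and tokenizer.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class KVCache:
    # Token ids already processed; a stand-in for the per-layer
    # key/value tensors a real implementation would carry.
    tokens: list

@dataclass
class Agent:
    name: str
    cache: Optional[KVCache] = None

    def receive_text(self, text_tokens: list) -> int:
        # Baseline: a text handoff forces re-tokenizing and a forward
        # pass over the whole conversation. Returns tokens processed.
        self.cache = KVCache(tokens=list(text_tokens))
        return len(text_tokens)

    def receive_cache(self, cache: KVCache) -> int:
        # AVP-style handoff: adopt the upstream KV-cache directly,
        # so zero tokens are reprocessed.
        self.cache = cache
        return 0

    def extend(self, new_tokens: list) -> int:
        # Only newly generated tokens need a forward pass.
        self.cache.tokens.extend(new_tokens)
        return len(new_tokens)
```

With a 1000-token planner context and a 20-token reply, the text handoff processes 1020 tokens while the cache handoff processes 20, which is where savings of the reported magnitude would come from.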

Open Source

  • Alibaba OpenSandbox Launches Multi-Language AI Sandbox Platform — OpenSandbox is Alibaba’s general-purpose sandbox platform for AI applications, offering multi-language SDKs, unified sandbox APIs, and Docker/Kubernetes runtimes. It supports use cases such as Coding Agents, GUI Agents, Agent Evaluation, AI Code Execution, and RL Training, with built-in environments and runtime lifecycle management. The project is hosted on GitHub at the Alibaba/OpenSandbox repository. Source-github
  • InvisPose WiFi DensePose Delivers Real-Time Privacy-Preserving Pose Estimation — ruvnet has released a production-ready implementation of InvisPose, a WiFi-based dense human pose estimation system that detects full-body pose using Channel State Information data without cameras, enabling tracking through walls. The system offers sub-50ms latency at 30 FPS, supports multi-person tracking up to 10 individuals, and provides an enterprise-ready API with analytics like fall detection and occupancy monitoring across healthcare, fitness, smart home, and security use cases. Source-github

Multimodal

  • DeepSeek V4 to launch next week with image and video generation — The Financial Times reports that DeepSeek plans to release its long-awaited AI model, DeepSeek V4, next week. The new version reportedly includes image and video generation capabilities, signaling a push in multimodal AI amid competition with US rivals. Source-reddit

⚡ Quick Bites

  • Anthropic’s Amodei Sets red lines for government AI use — Anthropic CEO Dario Amodei told CBS News that the company intends to draw red lines on the government’s use of its AI technology, saying crossing those lines would violate American values. He added that disagreeing with the government is ‘the most American thing in the world.’ Source-twitter
  • Meta’s Llama 4 sidelined in AI conversation — An X post claims Meta damaged Llama 4 to the point that the model no longer features in AI discussions. The author frames Meta’s handling of Llama 4 as a cause of its diminished prominence amid competing models. The post reflects sentiment around Meta’s AI strategy, with no independent verification provided in the item. Source-twitter
  • moeru-ai/airi Opens Self-Hosted AI Waifu Container — moeru-ai/airi is an open-source project that recreates Neuro-sama as a self-hosted AI waifu container called Grok Companion. It offers real-time voice chat and gaming support (Minecraft, Factorio) across web, macOS, and Windows, with memory/RAG features and Live2D utilities. The project is part of Project AIRI, welcomes translations via Crowdin, and notes there is no cryptocurrency token associated. Source-github
  • OpenAI Pivot Gains Investors’ Favor — An anonymous Reddit post notes investor enthusiasm for OpenAI’s recent pivot. It offers no details on what the pivot entails, but indicates a positive market reaction to OpenAI’s strategic direction. Source-reddit
  • Unsloth Dynamic 2.0 Improves GGUF Layer Quantization — Unsloth Dynamic 2.0 updates its GGUFs to quantize model layers more selectively and intelligently. The change enables finer-grained, per-layer compression, potentially reducing footprint while maintaining performance. The update reflects ongoing open-source AI optimization efforts. Source-reddit
  • Algorithm trained on cat tweets; interrupted by killbot claim — An individual describes training a social media algorithm to prioritize cat-related content. They report occasional non-cat posts, including a message claiming the government plans to use AI for killbots and mass surveillance. The post highlights concerns about AI-driven recommendation systems and the spread of alarming claims. Source-twitter
  • Small Qwens Update Adds 4 Hidden Items — A Reddit post reports that the unsloth collection has been updated with four hidden items, tagged with the hint ‘13-9=4’. The submission comes from user /u/jacek2023 and links to further discussion. There is no indication of an official AI product or announcement in the post. Source-reddit

Generated by AI News Agent | 2026-02-28