
AI Daily — 2026-02-21



Covering 34 AI news items

🔥 Top Stories

1. LoopViT: Tiny AI Outthinks Bigger Models with Looped Transformers

A collaboration between HKUST, CASIA, and UC Santa Cruz introduces LoopViT, a looped transformer that reuses a small set of weights to simulate an internal chain of thought. The 18M-parameter model stops computing when predictions become certain and achieves 65.8% on the ARC-AGI visual reasoning benchmark, outperforming a larger 73M-parameter model on the same task. The paper is available on arXiv and code on GitHub. Source-twitter
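LoopViT's exact halting rule isn't described here; the numpy sketch below only illustrates the general looped-transformer idea of reusing one set of weights until the prediction is confident enough to exit early. All names, sizes, and the confidence threshold are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

D, C = 16, 10                                  # hidden size, number of classes
W_loop = rng.normal(scale=0.1, size=(D, D))    # one shared "looped" weight matrix
W_head = rng.normal(scale=0.1, size=(D, C))    # classification head

def looped_forward(h, max_loops=12, confidence=0.9):
    """Reuse W_loop each iteration; stop as soon as the head is confident."""
    for step in range(1, max_loops + 1):
        h = np.tanh(h @ W_loop) + h            # one shared-weight iteration
        probs = softmax(h @ W_head)
        if probs.max() >= confidence:          # early exit: prediction is certain
            return probs, step
    return probs, max_loops

probs, steps = looped_forward(rng.normal(size=D))
print(steps, probs.argmax())
```

With confident inputs the loop exits early; otherwise it runs to `max_loops`, which is how a tiny model can spend variable compute per example.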

2. Llama 3.1 70B Runs on RTX 3090 via NVMe-GPU Bypass

A Show HN demonstrates that Llama 3.1 70B can run on a single RTX 3090 by bypassing CPU and RAM with an NVMe-to-GPU setup. The project links a library (ntransformer) and reports it works on consumer GPUs, with better performance expected on professional GPUs. It highlights a hardware-focused approach to running large transformers outside traditional CPU-based memory paths. Source-hackernews
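The project itself does direct NVMe-to-GPU transfers, which can't be reproduced in portable Python; as a rough CPU-side analogy only, the sketch below streams one layer at a time from an mmap'd checkpoint so that resident memory stays near a single layer rather than the whole model. File layout and all names are invented for illustration.

```python
import mmap
import numpy as np

# Assumed layout: a flat binary file of LAYERS consecutive float16 layers.
LAYERS, LAYER_ELEMS = 4, 1024
DTYPE = np.float16
LAYER_BYTES = LAYER_ELEMS * DTYPE(0).nbytes

def make_dummy_checkpoint(path):
    np.arange(LAYERS * LAYER_ELEMS, dtype=DTYPE).tofile(path)

def stream_layers(path):
    """Yield one layer at a time from an mmap'd checkpoint file."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        for i in range(LAYERS):
            buf = mm[i * LAYER_BYTES:(i + 1) * LAYER_BYTES]  # slice copies bytes
            yield np.frombuffer(buf, dtype=DTYPE)
        mm.close()

make_dummy_checkpoint("weights.bin")
x = np.ones(LAYER_ELEMS, dtype=np.float32)
for layer in stream_layers("weights.bin"):
    x = x * 0.0 + layer.astype(np.float32)  # placeholder per-layer compute
```

The real win in the Show HN setup comes from skipping the page cache and host RAM entirely; this sketch only conveys the "weights arrive layer by layer during the forward pass" structure.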

3. CPU-trained 29.7M LLM beats GPU baseline in 40 hours

FlashLM v5 ‘Thunderbolt’ trains on CPU (AMD Ryzen 7950X3D) for about 40 hours, achieving perplexity 1.36 and BPC 0.44 with 29.7M parameters. It beats the TinyStories-1M baseline (PPL 1.59), marking the first CPU-trained model to surpass that benchmark. The model uses a MatMul-free architecture called ParallelGatedRecurrence with ternary BitLinear weights; arki05 provided the CPU hardware. Source-reddit
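FlashLM's exact quantization scheme isn't spelled out here; "ternary BitLinear" usually refers to the BitNet b1.58-style absmean recipe, sketched below under that assumption (function names are ours).

```python
import numpy as np

def ternary_quantize(W, eps=1e-8):
    """Absmean ternary quantization (BitNet b1.58 style):
    scale by the mean |W|, then round each weight to -1, 0, or +1."""
    scale = np.abs(W).mean() + eps
    W_t = np.clip(np.round(W / scale), -1, 1)
    return W_t, scale

def bitlinear_matmul(x, W_t, scale):
    """With ternary weights the 'matmul' reduces to signed adds, scaled once,
    which is what makes a MatMul-free architecture cheap on CPU."""
    return (x @ W_t) * scale

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 4))
W_t, s = ternary_quantize(W)
y = bitlinear_matmul(rng.normal(size=8), W_t, s)
```

Because every weight is in {-1, 0, +1}, the inner product needs no multiplies, only additions and subtractions plus one final rescale per output.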

LLM

  • ChatGPT Pro Lite priced at $100/month, checkout hints — A developer's tweet reports that the ChatGPT web app now references a new ‘ChatGPT Pro Lite’ plan priced at $100 per month. The checkout-page description appears unfinished, suggesting the plan is still in development or testing rather than officially announced. If confirmed, Pro Lite would introduce a lower-cost tier for ChatGPT users. Source-twitter
  • Tidy: Cloud-Hosted AI Agent Learns to Use Any App — Tidy is a cloud-hosted personal agent that learns to use the apps you already use and then performs your workflows. It keeps you updated via iMessage and offers a persistent filesystem. It can be taught to safely operate websites without writing code, positioning itself as a cloud alternative to OpenClaw. Source-producthunt
  • How Taalas Prints LLM onto a Chip — Taalas outlines a method to embed large language models directly onto hardware. The article discusses hardware-software co-design to enable LLMs to run on chips, potentially boosting efficiency and deployment. It highlights the implications of LLM-on-chip approaches for industry and research. Source-hackernews
  • Qwen Team Flags Serious Data Quality Issues in GPQA and HLE — Discussion around DeepSeek-Overclock suggested the model could derive correct reasoning that conflicted with gold-standard labels, revealing data quality issues in the evaluation sets. The Qwen team has confirmed serious data quality problems in the GPQA and Humanity’s Last Exam (HLE) benchmarks, underscoring potential reliability concerns for these tests. Source-reddit
  • O-TITANS Orthogonal LoRAs for Gemma 3 with TITANS — The post introduces O-TITANS, an Orthogonal LoRA approach for Gemma 3 that leverages Google’s TITANS memory architecture. It outlines MoOLE-T, a Mixture of Orthogonal LoRA Experts using an 8B router to select one or more O-LoRAs for parallel inference, with outputs de-conflicted at an exit node running a larger 20B-80B model. The design promises scalable, non-interfering skill modules and potential to train 100+ O-LoRAs. Source-reddit
  • Ouro 2.6B GGUFs Released: Q8_0 and Q4_K_M Live — Ouro released its 2.6B GGUF models (Q8_0 and Q4_K_M) on HuggingFace, compatible with LM Studio, Ollama, and llama.cpp. Ouro is a looped inference model that performs multiple reasoning iterations before final output, with the model’s thinking visible in results. Release notes clarify that the GGUF format follows standard Llama architecture, but Ouro includes three custom features; notably, the early exit gate tensor is skipped in this release. Source-reddit
  • Nanbeige 4.1 Tops Small LLMs, Beats Qwen 4B — A Reddit user claims Nanbeige 4.1 is the best small LLM and reportedly outperforms Qwen 4B when given enough room to think. The post positions Nanbeige as the go-to local LLM, signaling a favorable comparison against Qwen 4B. Source attribution: /u/Individual-Source618 on r/LocalLLaMA. Source-reddit
  • Anthropic’s Claude Code: Terminal AI for Coding and Git — Claude Code is an agentic coding tool that runs in your terminal, understands your codebase, and speeds coding by performing routine tasks, explaining complex code, and managing git workflows through natural language commands. It can be used in the terminal, IDE, or via GitHub mentions (@claude). Installation notes emphasize recommended methods and caution that npm installation is deprecated, pointing users to setup docs. Source-github
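The O-TITANS item above gives few implementation details; the sketch below only illustrates the generic notion of keeping LoRA adapters orthogonal, via a Frobenius-norm penalty on cross-products of the low-rank A matrices. The penalty form, shapes, and names are our assumptions, not the post's method.

```python
import numpy as np

def lora_delta(A, B):
    """Standard LoRA update: a low-rank delta B @ A added to a frozen weight."""
    return B @ A

def orthogonality_penalty(A_list):
    """Encourage experts to occupy disjoint subspaces by penalizing
    the Frobenius norm of A_i @ A_j.T for every pair i != j."""
    loss = 0.0
    for i in range(len(A_list)):
        for j in range(i + 1, len(A_list)):
            cross = A_list[i] @ A_list[j].T
            loss += float(np.sum(cross ** 2))
    return loss

rng = np.random.default_rng(2)
r, d = 4, 32
# Adapters built from disjoint slices of an orthonormal basis -> zero penalty.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
A1, A2 = Q[:, :r].T, Q[:, r:2 * r].T
print(orthogonality_penalty([A1, A2]))   # ~0: subspaces do not interfere
# Random adapters generally overlap -> positive penalty.
A3 = rng.normal(size=(r, d))
print(orthogonality_penalty([A1, A3]))   # > 0: overlapping subspaces
```

Non-interfering subspaces are what would, in principle, let many such adapters be trained and composed without degrading one another.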

AI in Sales

  • Ashera AI analyzes GTM calls to turn truth into action — Ashera AI uses AI to analyze go-to-market sales calls and deliver actionable guidance rather than generic summaries. It provides in-call guidance, extracts risks/objections/next steps after each call, updates your CRM automatically, and scores accounts to show deal health. Its differentiator is one source of truth across the entire sales journey to keep teams aligned on what was said; free plans are available on Product Hunt. Source-producthunt

Open Source

  • zclaw: Personal AI assistant under 888 KB on ESP32 — Zclaw is an open-source personal AI assistant designed to run on an ESP32 MCU, weighing under 888 KB. It demonstrates ultra lightweight AI on microcontrollers, enabling on-device inference without cloud reliance. The project is hosted on GitHub and discussed on Hacker News, signaling community interest in embedded AI. Source-hackernews
  • Kon releases tiny open-source coding agent — Kon introduces a new open-source coding agent named kon, running glm-4.7-flash-q4 on a consumer rig (i7-14700F, 64GB RAM, RTX 3090). The project highlights a compact harness with roughly 215 system-prompt tokens and 600 tool-definition tokens, keeping conversations under 1k tokens before context. As of February 22, 2026, the repo has about 112 files and is pitched as a minimal, fork-and-extend coding agent. Source-reddit

LLMs

  • IQ2 Quantization Delivers Speed and Quality Parity for LLMs — A Reddit user tests UD-IQ2_XXS on Qwen3-30B-A3B (10.3 GB) and reports ~5x speedup (100 TPS vs 20 TPS) with full GPU offload, and quality on high-school/college topics comparable to Q4_K_M. In niche areas like Gödel’s Incompleteness Theorem, IQ2 trails slightly (81/100 vs 92), and a 10 GB IQ2 model even solved a graph question that Claude Opus 4.6 and Sonnet 4.6 missed. The post questions why ultra-low quantization hasn’t been more hyped. Source-reddit

⚡ Quick Bites

  • Critics Say Frontier Labs’ AI Claims Produce Buggy, Resource-Heavy Software — A post mocks Frontier Labs for claiming AI writes their code, arguing the released products are buggy and resource-hungry. The author says this misrepresents both their products and their worldview. Source-twitter
  • AI Should Augment Knowledge, Not Outsource Cognition — Francois Chollet argues that AI should serve as an interface to information, helping people deepen and improve their knowledge and mental models. He warns against using AI as a crutch that outsources thinking and degrades personal cognition. Source-twitter
  • Codex API accessible via app-server enables local iPhone integration — A developer describes Codex providing a friendly API accessible by running ‘codex app-server’. They unexpectedly built a native Codex iPhone app that can spawn and talk to Codex instances over a local network, with the Codex integration running directly on the iPhone. Source-twitter
  • Figure’s autonomous robots run 24/7 — A tweet highlights Figure’s autonomous robots operating continuously, rain or shine, underscoring ongoing advances in autonomous robotics. Source-twitter
  • Your moat: You + AI, not fear of replacement — An AI-focused tweet argues that workers should stop fearing AI replacing them. Instead, they should maximize the advantage of collaborating with AI—the ‘you + AI’ vs ‘others + AI’ gap becomes their moat. The message emphasizes AI augmentation as a strategic differentiator in the job market. Source-twitter
  • Straion centralizes rules for AI coding agents to boost speed — Straion offers centralized rules management for AI coding agents such as Claude Code, GitHub Copilot, and Cursor. The platform automatically selects the appropriate rules for each task, enabling delivery of enterprise-ready code at accelerated speed. It positions Straion as an orchestration layer for AI coding tools. Source-producthunt
  • Cloudflare Debuts AI Agents Platform Powered by Durable Objects — Cloudflare launches AI Agents, a platform for deploying persistent, stateful agent workloads on its edge network, powered by Durable Objects. Agents provide real-time communication, scheduling, AI model calls, MCP, and workflows, with idle hibernation, massive scalability, and no cost when inactive. Developers can start with npm create cloudflare@latest -- --template cloudflare/agents-starter or add agents to existing projects via npm install agents. Source-github
  • GitNexus: Browser-Based Code Knowledge Graph and AI Agent — GitNexus is a client-side tool that indexes a GitHub repo or ZIP into a browser-run knowledge graph, capturing dependencies, call chains, and execution flow. It provides a Web UI for interactive exploration and a Graph RAG Agent, with CLI tooling (MCP) to give AI agents a deeper architectural view for reliable code understanding. Source-github
  • AI uBlock Blacklist Launches Open-Source AI-Blocking List — A Hacker News discussion highlights the open-source project ai-ublock-blacklist, a GitHub-hosted list intended to block AI-related domains in uBlock Origin. The thread has drawn significant engagement, reflecting interest in privacy-focused, AI-related ad-blocking tools. The project provides a curated resource for users seeking to block AI services in their browsers. Source-hackernews
  • PSA: OpenClaw Injected in Latest Cline Release — Public agentic tools are shipping updates rapidly with questionable quality. A Reddit post alleges a recent Cline release included an OpenClaw installer, implying widespread OpenClaw exposure and unsafe VSCode extensions. The message calls for greater scrutiny of tooling and advises turning off auto-updates for VSCode extensions. Source-reddit
  • Culture and execution drive great AI products — The author argues that successful AI products require both creativity and rigorous technical execution, and that culture must give ideas room to mature while strong execution carries them through. It warns against building for fictitious users, arguing that real user needs surface in personal, user-centered projects. The piece cites Pedro Domingos and points to Anthropic and products like Claude Code, Cowork, and MCP as examples discussed on Twitter. Source-twitter
  • Thoughts on AI and Math, Inspired by First Proof — This post offers a brief reflection on the relationship between AI and mathematics, inspired by First Proof. It considers how mathematical ideas might inform AI research and how AI could illuminate mathematical thinking. Source-twitter
  • Seeking Reliable Coding Agent for Local AI Models — Reddit user criticizes coding-agent options for local models, citing Claude Code’s frequent context recalculation and OpenCode’s lack of a permissions model. They also mention Cline’s OpenClaw installation on users’ machines, arguing for a stable, secure, permission-aware agent that can run with a local model. They request recommendations and reference Roo and Pi as competitors. Source-reddit
  • Best LLMs for a Single RTX 3090 in 2026 — A Reddit post asks for recommendations on the best overall model for coding and reasoning on a single RTX 3090 (24GB VRAM) in 2026. Priorities include strong code generation (Go/TypeScript), deep reasoning, staying within 24GB (quantization allowed), and decent latency for local inference. The author seeks specific model names and quant setups, citing Qwen and DeepSeek as potential options. Source-reddit
  • Which AI model are you waiting for: 9B or 35B? — A discussion on r/LocalLLaMA asks which model size readers would prefer, the 9B or the 35B parameter version. The post links to the thread and invites opinions on release timing and usability. No concrete announcements are provided. Source-reddit
  • Lawyer Says Google Shut Gmail, Voice, Photos After NotebookLM Upload — Reddit user /u/Thrumpwart alleges Google disabled his Gmail, Voice, and Photos shortly after uploading content to NotebookLM. The post frames this as a data-handling concern within the broader Local LLMs discussion, but the incident remains unverified. The claim is based on a social media post and not independently corroborated. Source-reddit
  • Anthropic’s internal tools reportedly include Slack, Zoom, Figma — An informal tweet claims Anthropic uses mainstream collaboration tools such as Slack, Zoom, Figma, Notion, Workday, and Google Workspace. The author asks Anthropic to correct them, suggesting these tools are part of the company’s daily workflow. The post illustrates how AI labs’ tool stacks often resemble standard enterprise software. Source-twitter
  • Gemini 3.1 Pro Hailed as Smartest, User Displeased — A tweet extols Gemini 3.1 Pro as the smartest model yet, but the author says they hate using it. Source-twitter
  • AI math is narrow; author misses doing real math — A tweet laments that AI relies on a small subset of mathematical ideas and expresses a longing to engage in deeper, real mathematics. The author reflects on the theoretical breadth of AI and its dependence on limited mathematics. Source-twitter

Generated by AI News Agent | 2026-02-21