
AI Daily — 2026-04-04




Covering 18 AI news items

🔥 Top Stories

1. OpenAI’s GPT-Image-2 leaks with strong world knowledge

OpenAI’s GPT-Image-2 has reportedly leaked, with claims of extremely strong world knowledge and excellent text rendering. The post lists the code names maskingtape-alpha, gaffertape-alpha, and packingtape-alpha and suggests the model may outperform Nano Banana Pro; it is associated with Arena. Source-twitter

2. Qwen 3.6 Plus Tops OpenRouter With 1 Trillion Tokens

Alibaba’s Qwen 3.6 Plus became the first model on OpenRouter to process over 1 trillion tokens in a single day, reaching about 1.4 trillion. This marks the strongest single-day performance for any model released this year, with congratulations extended to the Qwen team. Source-twitter

3. T3 Code Safe for Claude Subs; Local Use Allowed

A post on X/Twitter claims that T3 Code is confirmed safe to use with Claude subscriptions, and that there is explicit confirmation that tools wrapping Claude Code for local use are permitted. The message frames this as a positive update amid discussions about Anthropic’s messaging strategy. Source-twitter

LLM

  • GLM-5 Nearly Matches Claude Opus at 11× Lower Cost — YC-Bench ran a year-long trial in which 12 LLMs each managed a simulated startup over hundreds of turns. GLM-5 nearly matched Claude Opus 4.6 in performance at roughly one-eleventh the cost per run; top models reached roughly $1.2M in average final funds, while many others failed. The study highlights long-horizon coherence under delayed feedback and spotlights Kimi-K2.5 as best for revenue per API dollar. Source-reddit
  • Extended NYT Connections Benchmark: MiniMax-M2.7 Leads at 34.4 — An extended NYT Connections benchmark reports open-source model scores: MiniMax-M2.7 achieves 34.4, Gemma 4 (31B) 30.1, and Arcee Trinity Large Thinking 29.5. The results, shared via a GitHub repository (lechmazur/nyt-connections) and a Reddit post, highlight ongoing progress in AI benchmarking among open-source models. Source-reddit
  • Ollama Cloud Lets OpenClaw Run on $20 Plan — Ollama promotes its cloud as a top place to run OpenClaw, noting a $20 plan is enough for everyday usage of open models like kimi-k2.5:cloud, glm-5:cloud, and minimax-m2.7:cloud. The post includes a call-to-action to switch via the terminal command ollama launch openclaw, and mentions a Verge piece about Anthropic restricting OpenClaw access to Claude through extra charges. Source-twitter
  • Llama-server Onboarding Gemma-4-26B GGUF With OpenAI Compatibility — This post describes onboarding a llama-server instance to run the Gemma-4-26B GGUF model (ggml-org/gemma-4-26b-a4b-it-GGUF) in non-interactive mode. The setup exposes a local OpenAI-compatible API at http://127.0.0.1:8080/v1 with a custom model ID and API key; the secret is supplied in plaintext, with a risk-acceptance setting enabled. Source-twitter
  • Onyx Open Source AI Platform Highlights RAG and Deep Research — Onyx is an open-source AI platform that provides an application layer for LLMs with a feature-rich, self-hosted interface. It delivers Agentic RAG, deep research workflows, custom AI agents, and web search, along with 50+ out-of-the-box connectors. The project emphasizes easy deployment via a one-line curl command and claims top leaderboard standing for deep research as of February 2026. Source-github
  • Hermes Agent Emerges as Best Open-Source Local-Model Agent — The Hermes agent from Nous Research is praised for superior local-model support, including per-model tool-call parsing that works on 30B-class models. It natively supports Ollama, vLLM, and SGLang, offers six terminal backends (including Modal and Daytona) for serverless use, and provides a one-process gateway for multiple messaging platforms. The self-improving Honcho feature is off by default but becomes noticeable when enabled via config.yaml; it also includes built-in OpenClaw migration. Source-reddit
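The llama-server item above gives only the endpoint and credentials; as a minimal sketch, this is what a request body against such a local OpenAI-compatible API typically looks like, assuming the standard chat-completions payload shape (the model ID gemma-4-26b-a4b-it and the prompt are illustrative placeholders, not confirmed by the post):

```python
import json

# Base URL taken from the post; a local llama-server generally accepts any
# API key unless one was configured at launch.
BASE_URL = "http://127.0.0.1:8080/v1"

def build_chat_request(model_id: str, prompt: str) -> dict:
    # Standard OpenAI-style chat-completions body, sent as
    # POST {BASE_URL}/chat/completions with an Authorization header.
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

payload = build_chat_request("gemma-4-26b-a4b-it", "Say hello.")
print(json.dumps(payload, indent=2))
```

Actually sending the payload requires a running llama-server; the sketch only constructs the request body so the shape is easy to verify.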

Industry

  • MiniMax Plan Works Across Third-Party AI Harnesses — The MiniMax Token Plan is designed to work across third-party harnesses, on the argument that ideas from outside the labs will outpace those developed in-house. The post warns that restricting AI subscriptions to first-party products stifles innovation before it can emerge, and calls for openness to outside AI usage and ideas. Source-twitter

⚡ Quick Bites

  • Stanford CS153: AI Scaling and the Compute Bottleneck — The lecture outlines a four-bottlenecks framework for scalable AI systems and emphasizes empirical validation. It discusses shifting cloud costs, the distinction between verifiable and fuzzy progress, and why chip scarcity and CapEx growth prevent compute from becoming a commodity. Source-twitter
  • AI empowers people to boost government visibility and accountability — A tweet argues that people, empowered by AI, can enhance the visibility, legibility, and accountability of their governments, reversing the historical trend of governance shaping society. It notes that while governments publish vast data, the bottleneck is processing and deriving insights, not access. The post cites large bills and FOIA responses as examples of information that AI could help make legible for non-experts. Source-twitter
  • China’s Massive AI Training Labs Signal the Next Revolution — Posts highlight China’s massive AI training facilities and argue that no job is safe as automation advances. The author says the next revolution is already being trained, signaling rapid shifts in AI capabilities and labor markets. Source-twitter
  • Steerable Visual Representations: Guiding ViT Features with Prompts — Pretrained Vision Transformers like DINOv2 and MAE deliver generic image features that capture the most salient cues. The article discusses steering representations toward less prominent concepts, contrasting this with Multimodal LLMs which can be guided via prompts but may become language-focused and lose pure visual information. Source-huggingface
  • NVFP4 Still Missing in DGX Spark After Six Months — A Reddit user who owns two DGX Sparks says NVFP4 has not been properly delivered after six months. They note that Blackwell + NVFP4 on a local AI setup with NVIDIA’s software stack made the system compelling, but Spark now requires workarounds and backend tinkering to function. The product was pitched as a finished premium system, not an experimental dev kit, and the delay undermines that claim. Source-reddit
  • Qwen 3.5 vs Gemma 4: Is There a Winner? — A Reddit post asks which AI model is superior between Qwen 3.5 and Gemma 4, seeking a clear winner. The discussion compares open-source LLMs, inviting community input on performance and benchmarks. Source-reddit
  • Apple: Embarrassingly Simple Self-Distillation Improves Code Generation — A Reddit post in the LocalLLaMA subreddit discusses a straightforward self-distillation method from Apple that is claimed to improve code generation. Source-reddit
  • OpenAI researcher reportedly runs toward Anthropic office daily; Saturday attendance teased — A Twitter post describes an OpenAI researcher allegedly running toward Anthropic’s office every day, mentioning a person named Gabriel. The author speculates about Anthropic’s Saturday office attendance. The content appears to be informal gossip rather than a substantive AI development update. Source-twitter

Generated by AI News Agent | 2026-04-04