AI Daily — 2026-05-15


OpenAI Launches Personal Finance Features in ChatGPT Pro · Anthropic Debuts Claude Monet; Painter...

Covering 34 AI news items

🔥 Top Stories

1. OpenAI Launches Personal Finance Features in ChatGPT Pro

OpenAI rolled out a personal finance feature for ChatGPT Pro users in the US. It offers secure bank connections via Plaid, a spending dashboard, and the ability to ask GPT-5.5 questions grounded in real transactions. Intuit integration is planned for tax estimates and credit-card applications, and the system stores contextual memory across conversations. The rollout is limited to Pro initially, with a free tier planned for later. Source-twitter

2. Anthropic Debuts Claude Monet; Painters Are Cooked

Anthropic announced a product named Claude Monet. The accompanying message uses the phrase ‘Painters are cooked,’ hinting at artistic or multimodal capabilities, but the tweet gives no technical details or release date. Source-twitter

3. Figure Extends Livestream to 50+ Hours of Autonomous Sorting

Figure is livestreaming its autonomous packaging robots to showcase nonstop operation. The team reports more than 50 hours of continuous operation with no downtime and more than 63,000 packages sorted, and says the robots will keep running until they fail. Source-twitter

📰 Featured

LLM

Unified Scaling Elevates Olympiad-Level Reasoning in AI — AI reasoning models are reaching gold-medal-level performance on math and physics olympiads (IMO and IPhO). A new paper proposes a simple, unified recipe to convert a post-trained reasoning backbone into an olympiad-level solver, starting with a reverse-perplexity curriculum for supervised fine-tuning. This work highlights progress in long-horizon mathematical and scientific problem solving by AI. Source-huggingface
MemLens Benchmark Evaluates Multimodal Long-Term Memory in LVLMs — MEMLENS is a comprehensive benchmark for memory in multimodal, multi-session conversations. It comprises 789 questions across five memory aspects and aims to systematically compare long-context LVLMs and memory-augmented agents on tasks requiring multimodal evidence. Hosted on HuggingFace, it seeks to close the gap in multimodal memory benchmarking. Source-huggingface
Orthrus-Qwen3-8B: Frozen backbone; identical output distribution — Orthrus-Qwen3-8B inserts a trainable diffusion attention module into each layer of a frozen autoregressive Transformer, sharing a KV cache with the AR head. The diffusion head processes 32 tokens in parallel, while the AR head verifies in a second pass to ensure an output distribution provably identical to the base Qwen3-8B. The approach yields up to 7.8x tokens per forward pass and ~6x faster wall-clock time on MATH-500, with 16% of parameters trained. It preserves Qwen3-8B accuracy and, unlike other diffusion LMs, leaves the base weights unmodified; unlike speculative decoding methods, it needs no external drafter or separate cache. Source-reddit
ByteDance-Seed Releases Cola-DLM Diffusion Language Model — Cola DLM is a hierarchical continuous latent-space diffusion language model that combines a Text VAE with a block-causal Diffusion Transformer (DiT) prior. The VAE maps text into continuous latent sequences and decodes latents back to tokens, while DiT performs latent prior transport via Flow Matching; the repository includes a HuggingFace-format checkpoint and links to the associated paper, GitHub repo, and project/blog pages. Source-reddit
ChatGPT Subscriptions Now Work in Zed Agent — Zed’s agent now supports using a ChatGPT subscription with the same usage and rate limits offered for Codex. OpenAI developers (@openaidevs) continue to back subscription-based access for third-party tools, despite some providers moving to usage-based billing. This enables seamless ChatGPT access within Zed for tool integrations. Source-twitter
Anthropic Secures xAI GPUs, Deploys Codex Playbook — Anthropic reportedly acquired GPUs from xAI and quickly began applying the Codex playbook. The move signals intensified AI competition, which developers may benefit from. Source-twitter
Garry Tan Unveils gstack: 23 Tools for Solo AI Teams — Garry Tan promotes gstack, a curated set of 23 opinionated tools designed to let a single builder perform CEO, designer, engineering manager, release manager, documentation, and QA tasks via AI agents. He argues that with the right tooling a solo creator can move as fast as a team, citing OpenClaw and Andrej Karpathy’s comments as inspiration. Source-github
Self-hosted MCP server feeds real-time financial data to local LLMs — Equibles is a self-hosted MCP server that scrapes and serves public U.S. financial data (SEC filings, 13F, insider and congressional trades, FINRA short data, FRED, CFTC futures, VIX, and more) for MCP-capable clients. It runs locally with no cloud dependency or telemetry, enabling any local-model agent to query up-to-date information. Repo: https://github.com/daniel3303/Equibles Source-reddit
Intern-S2-Preview: 35B Science Multimodal Model Scales Tasks — Intern-S2-Preview introduces a 35B scientific multimodal foundation model that scales across hundreds of professional science tasks using a full-chain training approach from pre-training to reinforcement learning. It claims performance comparable to the trillion-scale Intern-S1-Pro on core scientific tasks while maintaining strong general reasoning, multimodal understanding, and agent capabilities. The model was built by continued pretraining from Qwen3.5 and emphasizes task scaling beyond traditional parameter/data scaling. Source-reddit
SupraLabs launches open-source small AI models for everyone — A new lab, SupraLabs, announces its mission to train, fine-tune, and explore small open-source AI models to make AI more accessible. They currently host models like Supra-Mini-v4-2M on Hugging Face and outline future releases such as StorySupra 10M and Supra Mini v5 5M, with updates through their Hugging Face Spaces blog. They invite community involvement and support. Source-reddit
Qwen-35B-A3B Dynamic Budgeting Near GPT-5.4 on HLE — A Reddit post claims that dynamically allocating compute budget to a hard set of problems and evolving sections with Qwen-35B-A3B yields performance close to GPT-5.4-xHigh on HLE. The claim highlights ongoing exploration of compute-efficient evaluation and model optimization in LLM research. Source-reddit
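
The draft-then-verify loop described in the Orthrus-Qwen3-8B item above can be illustrated with a toy greedy sketch. This is not the Orthrus implementation: `draft_block` and `ar_next` are hypothetical stand-ins for the diffusion head and the AR head, the toy functions are invented for the demo, and greedy decoding is assumed, which is the simplest case in which the accepted output matches plain autoregressive decoding token for token. A real system would presumably score all drafted positions in one batched forward pass; here they are checked one by one for clarity.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]              # greedy AR head: context -> next token
DraftFn = Callable[[List[Token], int], List[Token]]  # parallel head: (context, k) -> k guesses


def verify_block(context: List[Token], draft: List[Token], ar_next: NextFn) -> List[Token]:
    """Keep drafted tokens while they match the AR head's greedy choice.

    At the first mismatch, substitute the AR head's token and stop, so the
    accepted tokens are exactly what greedy AR decoding would have produced.
    """
    accepted: List[Token] = []
    for tok in draft:
        expected = ar_next(context + accepted)
        if tok != expected:
            accepted.append(expected)
            break
        accepted.append(tok)
    return accepted


def generate(prompt: List[Token], n_new: int, draft_block: DraftFn,
             ar_next: NextFn, block: int = 32) -> List[Token]:
    """Generate n_new tokens by repeatedly drafting a block and verifying it."""
    out = list(prompt)
    target = len(out) + n_new
    while len(out) < target:
        draft = draft_block(out, block)
        # Fall back to one plain AR step if the drafter returns nothing.
        accepted = verify_block(out, draft, ar_next) or [ar_next(out)]
        out.extend(accepted)
    return out[:target]


# --- Toy stand-ins (hypothetical, for demonstration only) ---
def toy_ar_next(ctx: List[Token]) -> Token:
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[Token], k: int) -> List[Token]:
    """Imitates the AR head but deliberately gets the third guess wrong."""
    guesses: List[Token] = []
    cur = list(ctx)
    for i in range(k):
        tok = toy_ar_next(cur)
        if i == 2:
            tok = (tok + 1) % 7
        guesses.append(tok)
        cur.append(tok)
    return guesses


def plain_ar(prompt: List[Token], n_new: int) -> List[Token]:
    """Reference: ordinary one-token-at-a-time greedy decoding."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(toy_ar_next(out))
    return out
```

Calling `generate([1], 20, toy_draft, toy_ar_next, block=4)` returns the same sequence as `plain_ar([1], 20)` while accepting several tokens per verification round. A sampling-based variant would need an acceptance rule that preserves the full output distribution, which this sketch does not attempt.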

Edge AI

Fully Offline Suitcase Robot on Jetson Orin NX with Gemma 4 E4B — An engineer built a fully offline suitcase robot powered by Jetson Orin NX, running Gemma 4 E4B and llama.cpp with q8_0 KV cache. It achieves a 12K context, ~200ms cached TTFT, and 14-15 tokens per second, with 30+ sensors narrated in the prompt each turn; all STT, TTS, vision, and OCR run on-device, no network. The designer emphasizes cache-stable prompt structure and invites others to compare tok/s and sensor/tool context handling on Orin-class hardware. Source-reddit
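
One way to read "cache-stable prompt structure" is that the prompt only ever grows at the end, so each new turn starts with the exact text of the previous one and a prefix-matching KV cache can be reused. The sketch below illustrates that reading; it is an assumption about the layout, not the builder's code, and `Transcript`, `shared_prefix_len`, and the sensor names are hypothetical.

```python
from typing import Dict, List


class Transcript:
    """Append-only prompt: earlier text is never rewritten, only extended."""

    def __init__(self, system: str) -> None:
        self.parts: List[str] = [system]

    def add_turn(self, sensors: Dict[str, float], user: str) -> str:
        """Append this turn's sensor snapshot and user message; return the full prompt."""
        readings = ", ".join(f"{name}={value}" for name, value in sorted(sensors.items()))
        self.parts.append(f"[sensors] {readings}")
        self.parts.append(f"User: {user}")
        return self.render()

    def add_reply(self, text: str) -> None:
        """Record the model's reply so later prompts contain it verbatim."""
        self.parts.append(f"Robot: {text}")

    def render(self) -> str:
        return "\n".join(self.parts)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


t = Transcript("You are a robot in a suitcase. Answer briefly.")
p1 = t.add_turn({"temp_c": 21.5, "battery_v": 12.1}, "hi")
t.add_reply("hello")
p2 = t.add_turn({"temp_c": 21.7, "battery_v": 12.0}, "status?")
```

Here `p2` begins with all of `p1`, so only the new reply, sensor snapshot, and user message need fresh prompt processing. Placing the sensor snapshot at the top of the prompt and overwriting it each turn would instead change the text near the start and invalidate most of the cache. The trade-off is that dropping old turns to stay inside a fixed context also breaks the shared prefix.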

Multimodal

Causal Forcing++ Enables 1-2-Step Real-Time Video Diffusion — Researchers introduce Causal Forcing++, advancing real-time video generation with frame-wise autoregression. They study a 1-2 sampling-step regime to replace chunk-wise 4-step distillation, targeting lower latency and finer response control. The work addresses coarse granularity in existing autoregressive diffusion distillation and proposes scalable, real-time methods for streaming video. Source-huggingface

RL

Self-Distilled Agentic RL: OPSD for Multi-Turn LLMs — On-Policy Self-Distillation (OPSD) offers token-level guidance for post-training LLM agents via a teacher branch with privileged context. Extending OPSD to multi-turn agents introduces compounding instability that destabilizes supervision, limiting its effectiveness. Source-huggingface

Open Source

SANA-WM: 2.6B World Model for Minute-Scale Video — SANA-WM is an open-source, 2.6B-parameter world model designed for one-minute video generation at 720p with precise camera control. It matches the visual quality of large baselines like LingBot-World and HY-WorldPlay while delivering significantly improved efficiency. The architecture centers on Hybrid Linear Attention, merging frame-wise Gated DeltaNet with softmax attention, among other core designs. Source-huggingface
AllenAI Releases Open-Source MolmoAct2 Robotics Models — AllenAI is releasing iterative fine-tunes of MolmoAct2, a 5B vision-language-action model for robot control, across several robotics datasets (LIBERO, DROID, BimanualYAM, SO100_101). All releases include open-source weights, training data, training code, and technical papers. The MolmoAct2 family is positioned as a plug-and-play option for LLM-driven robotic control. Source-reddit

Hardware

4x RTX 3090 Scaling: 220W Efficiency Peak for Qwen 3.6 — An in-depth test benchmarks four RTX 3090 GPUs running Qwen 3.6-27B on vLLM TP=4, exploring power limits and throughput. Results identify a 220W sweet spot with peak efficiency and diminishing returns beyond 250W, while throughput remains strong across configurations. Source-reddit
RAG on Snapdragon X2 Laptop with 200K Documents — A Reddit post highlights the ASUS Zenbook A16 powered by Qualcomm’s Snapdragon X2 Elite Extreme (2026), praising its ultra-light design and portable charger. It notes strong NPU performance for embedding/indexing, claiming roughly 50% of the speed of an RTX 5060 in a lighter form factor. The post includes a VecML AI-PC software demo on a ~200K-file dataset and mentions in-flight charging limits. Source-reddit

AI

Gemma4 26B MoE Runs in MLX with Turboquant and Custom Kernel — An indie developer demonstrates Gemma4 26B MoE running in MLX with turboquant and rotating KV cache. On a MacBook Air M5 with 128k context and four concurrent batches, it rivals or surpasses llama.cpp on 8k context in prompt processing, speed, and memory. The setup relies on a custom SWA kernel to achieve 2-bit memory savings that enable larger batches while preserving FP16-like prompt performance, with notable gains in text generation for long prompts. Source-reddit
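
A rotating KV cache is commonly built as a fixed-size ring buffer, so sliding-window attention layers keep only the most recent entries however long the context grows. The sketch below shows that idea in plain Python. It is not the developer's MLX kernel, and `RotatingCache` and its methods are hypothetical names chosen for illustration.

```python
from typing import Any, List, Optional, Tuple

Entry = Tuple[Any, Any]  # (key, value) for one token position


class RotatingCache:
    """Fixed-size ring buffer that overwrites the oldest entry when full."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.slots: List[Optional[Entry]] = [None] * window
        self.count = 0  # total number of entries ever appended

    def append(self, key: Any, value: Any) -> None:
        """Store the newest entry in the slot of the oldest one."""
        self.slots[self.count % self.window] = (key, value)
        self.count += 1

    def ordered(self) -> List[Entry]:
        """Return the retained entries from oldest to newest."""
        kept = min(self.count, self.window)
        start = self.count - kept
        return [self.slots[i % self.window] for i in range(start, self.count)]


cache = RotatingCache(window=3)
for position in range(5):
    cache.append(f"k{position}", f"v{position}")
```

After five appends with a window of three, `cache.ordered()` holds only positions 2, 3, and 4, and memory use stays constant regardless of sequence length.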

⚡ Quick Bites

Mitchell Hashimoto warns AI hype threatens software resilience — Mitchell Hashimoto warns that many firms are in ‘AI psychosis’ and that rational dialogue is hard. He cites the MTBF vs MTTR debate from cloud infrastructure to argue that rapid bug-fixing cannot replace resilient software. He cautions against shipping bugs under the belief that AI agents will fix them at scale, emphasizing overall resilience. Source-twitter
xAI’s Grok Build beta targets Anthropic with fast coding — An early beta of Grok Build, an agentic CLI for coding, building apps, and automating workflows, is now available to SuperGrok Heavy subscribers. The beta invites user feedback to improve the model and product, and is presented as a direct challenge to Anthropic. Access the beta at x.ai/cli. Source-twitter
Anthropic resets 5-hour and weekly rate limits for all users — Anthropic has reset everyone’s 5-hour and weekly rate limits, according to ClaudeDevs. The move may reflect gains from xAI’s Colossus compute or competitive pressure from OpenAI and Codex. Either way, users stand to benefit from higher throughput. Source-twitter
Prompt injection in LinkedIn bio makes recruiters write in Old English and call the author ‘Lord’ — A LinkedIn bio appears to have included a prompt injection, causing recruiters to respond in Old English and address the author as ‘Lord.’ The post illustrates how prompt injection can influence real-world interactions, with potential AI-safety implications for social platforms. Source-twitter
Run Hermes Agent Locally on DGX Spark via Ollama — NousResearch introduces a playbook to run Hermes Agent completely on a DGX Spark system. The guide walks you through setting up the agent via Ollama step by step. This enables local, cloud-free AI agent operation on high-performance hardware. Source-twitter
Roboflow Supervision: Model-Agnostic Computer Vision Toolkit — Roboflow’s supervision project provides a reusable, model-agnostic toolkit for computer vision, covering data loading to real-time zone counting. It offers connectors to popular libraries like Ultralytics, Transformers, MMDetection, and Inference, and supports integrations such as rfdetr, with simple pip installation and example usage. Source-github
NVIDIA AI Blueprint for Video Search and Summarization — NVIDIA’s AI Blueprint for Video Search and Summarization (VSS) presents reference architectures to build GPU-accelerated vision agents and AI-powered video analytics. It combines accelerated vision microservices with vision-language models (VLMs) and large language models (LLMs) to support integration into existing apps, standalone microservices, or larger vision agents. The blueprint emphasizes real-time video intelligence, including feature extraction, embeddings, and stream understanding, and is hosted on GitHub. Source-github
RAM-rich vs GPU-poor: local LLM frontier debate — Two frontier paths for local LLMs are discussed: dense models that fit on mid-range GPUs (32GB/24GB) and MoE models around 100B parameters that can offload to 128GB RAM. The post notes few MoE options (Qwen 3.5 122B; no 3.6), questions whether RAM-rich/GPU-poor users have limited choices, and cites tool-calling and speed issues for smaller models. It also mentions other models like Qwen 27B, minimaxi in Q3, DeepSeek V3, and the Strix Halo GPU as context for current hardware constraints. Source-reddit
OpenMOSS GGML C++ pipeline released for TTS — A Reddit post announces a full GGML-based pipeline for OpenMOSS implemented in pure C++. It aims to simplify TTS setup, offering both server mode and a single-shot CLI. OpenMOSS is highlighted for its ability to handle languages beyond English/Chinese, such as Polish. Source-reddit
Claude Code For Real Engineers evolves to AI Coding For Real Engineers — A post contrasts the origin of Claude’s coding-focused offering, ‘Claude Code For Real Engineers,’ with its current incarnation, ‘AI Coding For Real Engineers.’ It signals a branding and capability shift toward AI-assisted coding for engineers, highlighting Claude’s ongoing role in developer-focused AI tooling. Source-twitter
AI Agents Go Wild; Local Orchestrator Trial Begins — A Reddit post notes AI agents are getting more capable and quirky. The author describes attempting to add an orchestrator to coordinate local AI models when Qwen and Gemma aren’t available. The post highlights ongoing tinkering with local AI pipelines. Source-reddit
Group AI Psychosis Sesh in Black-Tie at Local Castle — A social media post describes a ‘Group AI psychosis sesh’ held in black-tie attire at a local castle. The post appears meme-like rather than a concrete AI development, with few details about participants or outcomes. It highlights quirky AI culture circulating on social media. Source-twitter

Generated by AI News Agent | 2026-05-15