AI Daily — 2026-05-10



Covering 25 AI news items

🔥 Top Stories

1. NVIDIA Star Elastic: One Checkpoint for 30B/23B/12B LLMs

NVIDIA released Star Elastic, a single checkpoint that can be sliced zero-shot into 30B, 23B, and 12B reasoning models, enabling dynamic scaling and cross-model guidance. The three sizes are nested and share a KV cache, so inference can switch quickly between sizes and run locally offline. The design borrows ideas from both dense and MoE architectures, yielding a scalable ‘Russian doll’ ensemble that can generate reasoning quickly and then refine its outputs by cycling through the model sizes. Source-reddit
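The ‘Russian doll’ nesting can be pictured as smaller models whose weights are slices of the largest model's weights, so a single checkpoint holds every size. The toy sketch below assumes a simple rule: keep the top-left block of each weight matrix. The widths, the `slice_layer` helper, and the slicing rule are all hypothetical and are not NVIDIA's published method. The KV-cache sharing is not modeled.

```python
# Toy illustration of nested ("Russian doll") weight slicing.
# Assumption: a smaller model keeps the first `width` output rows and the
# first `width` input columns of each weight matrix. This is a hypothetical
# rule, not NVIDIA's Star Elastic implementation.

from typing import List

Matrix = List[List[float]]


def slice_layer(weights: Matrix, width: int) -> Matrix:
    """Return the top-left width x width block of a square weight matrix."""
    return [row[:width] for row in weights[:width]]


def matvec(weights: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# One "checkpoint": a single full-width weight matrix (width 4 here).
full = [[float(4 * r + c) for c in range(4)] for r in range(4)]

# Hypothetical nested widths standing in for 30B / 23B / 12B.
widths = {"large": 4, "medium": 3, "small": 2}
models = {name: slice_layer(full, w) for name, w in widths.items()}

# Every smaller model is contained in the larger one, so only one
# set of weights has to be stored.
assert models["small"] == slice_layer(models["medium"], 2)

x = [1.0, 1.0, 1.0, 1.0]
for name, w in widths.items():
    print(name, matvec(models[name], x[:w]))
```

Prefix slicing keeps the example short. A real elastic model also needs a rule for which neurons, heads, or experts are kept, and that choice is what makes the sliced models usable without retraining.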

2. Cull: Open-Source Image Dataset Scraping & Classification Tool

An open-source tool named Cull is introduced as a machine curation engine for AI image datasets. It scrapes images and prompts from multiple sources (Civitai, X/Twitter, Reddit, Discord, and various gallery sites), deduplicates them per source, and classifies each item with vision-language models against a strict 17-field JSON schema. Results are stored in a local workflow alongside the prompts and audit records. Two quality gates, one for overall quality and one for topic relevance, filter items for LoRA training and dataset curation. Source-reddit
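The pipeline described above involves per-source deduplication followed by two pass/fail gates. The sketch below is a minimal version of that flow under stated assumptions. The record fields (`source`, `image`, `quality`, `relevance`) and both thresholds are hypothetical, and Cull's actual 17-field schema and gate logic are not reproduced.

```python
# Minimal sketch: per-source deduplication, then two quality gates.
# Assumptions: field names and the 0.7 / 0.5 thresholds are hypothetical,
# not Cull's real schema or settings.

import hashlib
from typing import Dict, Iterable, List, Set

QUALITY_MIN = 0.7    # hypothetical overall-quality threshold
RELEVANCE_MIN = 0.5  # hypothetical topic-relevance threshold


def content_hash(image_bytes: bytes) -> str:
    """Exact-duplicate key; real tools may use perceptual hashes instead."""
    return hashlib.sha256(image_bytes).hexdigest()


def dedupe_per_source(items: Iterable[Dict]) -> List[Dict]:
    """Drop items whose hash was already seen within the same source."""
    seen: Dict[str, Set[str]] = {}
    kept = []
    for item in items:
        bucket = seen.setdefault(item["source"], set())
        key = content_hash(item["image"])
        if key not in bucket:
            bucket.add(key)
            kept.append(item)
    return kept


def passes_gates(item: Dict) -> bool:
    """Both gates must pass: overall quality and topic relevance."""
    return item["quality"] >= QUALITY_MIN and item["relevance"] >= RELEVANCE_MIN


items = [
    {"source": "reddit", "image": b"a", "quality": 0.9, "relevance": 0.8},
    {"source": "reddit", "image": b"a", "quality": 0.9, "relevance": 0.8},   # duplicate
    {"source": "civitai", "image": b"a", "quality": 0.6, "relevance": 0.9},  # same image, other source
    {"source": "civitai", "image": b"b", "quality": 0.8, "relevance": 0.3},
]

unique = dedupe_per_source(items)
accepted = [it for it in unique if passes_gates(it)]
print(len(unique), len(accepted))  # 3 1
```

Deduplicating per source rather than globally keeps the same image when it appears on two sites. That matches the per-source wording in the post, though the real tool may handle cross-source duplicates differently.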

📰 Featured

LLM

Codex Bounty Payout Extrapolates to ~$506/Month via Security Audits — A user employed OpenAI’s Codex to autonomously pursue and complete an open-source security audit bounty. It produced a legitimate PR, engaged with maintainers, and secured a first payment of $16.88. That would equal about $506 per month only if a similar payout were earned every day. The case illustrates an early AI-agent monetization use case. Source-twitter
GPT-5.5 Enables Capabilities Previously Impossible, Says Bubeck — Sebastien Bubeck replies to @roydanroy, claiming that the topics discussed could not have happened prior to GPT-5.5. The tweet implies a notable leap in capabilities tied to the new model version. It reflects ongoing debates about the boundaries of AI progress. Source-twitter
MTP Benchmarks: Task Type Drives Inference Speed (Coding vs Creative) — An AI researcher reports extensive multi-token prediction (MTP) benchmarking on Qwen 3.6 27B, covering 300+ tests across task types, temperatures, and MTP quantizations. F16 + MTP nearly triples speed on coding tasks, while Q4_K_M + MTP slows down creative writing. The same model and features therefore produce opposite results depending on the task. This suggests that speculative decoding behavior, which depends on how often drafted tokens are accepted, is a primary performance driver. The study acknowledges its limits (not all quantization sizes were tested) but shows a clear link between task type and speed under speculative decoding. Source-reddit
Qwen3.6 35B A3B on 8GB VRAM, 190k Context — A Reddit post demonstrates running Qwen3.6-35B-A3B with ~190k context on an RTX 4060 (8GB VRAM) with 32GB RAM, using a Linux laptop as a Tailscale-accessible server. It reports throughput of about 37-43 tok/sec for the tested variants. Tuning ctx-size, n-gpu-layers, and n-cpu-moe, together with Q5 quantized models, pushes that to ~51 tok/sec. The setup shows that practical open-source LLM inference is possible on limited GPU resources. Source-reddit
DeepSeek-V4-Flash MTP patch achieves 85 tok/s on RTX Pro 6000 — A retrofitted MTP block and GPTQ tuning on DeepSeek-V4-Flash-W4A16-FP8 raise decoding speed to 85.52 tok/s at 524k context (2-stream) and ~111 tok/s at 128k context (single-stream). The model (671B total / 32B active parameters) runs on two RTX PRO 6000 Max-Q GPUs (96 GB each, no NVLink) under a patched vLLM. The work is documented in a Hugging Face release and a Reddit post. Source-reddit
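The $506/month figure in the Codex item is a straight-line extrapolation from a single payout. It assumes one payment of the same size every day of a 30-day month:

```latex
\$16.88 \times 30 = \$506.40 \approx \$506 \text{ per month}
```

Only the single $16.88 payment is actually reported; the monthly number holds only under that daily-repeat assumption.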
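The MTP benchmark result (a speedup on code, a slowdown on creative writing) is consistent with the standard idealized model of speculative decoding from Leviathan et al., 2023. In that model, the gain depends on the draft acceptance rate and on the cost of producing drafts. The sketch below applies that textbook formula. The acceptance rates and draft-cost ratios are illustrative assumptions, not numbers from the Reddit benchmarks.

```python
# Idealized speculative-decoding speedup model.
# alpha: probability that a drafted token is accepted (assumed i.i.d.)
# k:     number of tokens drafted per verification step
# c:     cost of one draft step relative to one full forward pass
# The values used below are illustrative assumptions, not measured data.


def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model pass: (1 - alpha^(k+1)) / (1 - alpha)."""
    if alpha >= 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


def expected_speedup(alpha: float, k: int, c: float) -> float:
    """Speedup over plain decoding, charging k draft steps per verification pass."""
    return expected_tokens_per_step(alpha, k) / (k * c + 1)


# Predictable text (e.g. code) tends to have high acceptance...
print(round(expected_speedup(alpha=0.8, k=3, c=0.1), 2))  # 2.27
# ...while open-ended creative text can fall below 1.0x, i.e. a slowdown.
print(round(expected_speedup(alpha=0.2, k=3, c=0.3), 2))  # 0.66
```

Real MTP drafting does not exactly satisfy the i.i.d. acceptance assumption. Temperature and quantization also shift acceptance rates, which may explain why the post's results vary along those axes too.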
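The knobs named in the Qwen3.6-35B-A3B post (ctx-size, n-gpu-layers, n-cpu-moe) correspond to llama.cpp server options. The helper below only assembles an argument list and launches nothing. The model filename and every numeric value are hypothetical placeholders, not the poster's exact settings. Binding to `0.0.0.0` is one assumed way to make the server reachable over a Tailscale network.

```python
# Assemble a llama-server command line for a MoE model on a small GPU.
# All values are hypothetical placeholders, not the Reddit poster's config.

import shlex
from typing import List


def build_llama_server_args(
    model_path: str,
    ctx_size: int,
    n_gpu_layers: int,
    n_cpu_moe: int,
    host: str = "0.0.0.0",
    port: int = 8080,
) -> List[str]:
    return [
        "llama-server",
        "--model", model_path,
        "--ctx-size", str(ctx_size),          # context window in tokens
        "--n-gpu-layers", str(n_gpu_layers),  # layers offloaded to the GPU
        "--n-cpu-moe", str(n_cpu_moe),        # keep MoE expert weights of the first N layers in CPU RAM
        "--host", host,                       # listen on all interfaces (e.g. reachable via Tailscale)
        "--port", str(port),
    ]


args = build_llama_server_args(
    model_path="Qwen3.6-35B-A3B-Q5_K_M.gguf",  # hypothetical filename
    ctx_size=190_000,
    n_gpu_layers=99,
    n_cpu_moe=30,                               # hypothetical; tune to fit 8GB VRAM
)
print(shlex.join(args))
```

The usual trade-off is that raising n-cpu-moe frees VRAM for a longer context at the cost of some speed. That is why the post tunes these values together.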

Open Source

ByteDance Open-Source Multimodal AI Agent Stack — ByteDance announces its Open-Source Multimodal AI Agent Stack, made up of two projects: Agent TARS and UI-TARS-desktop. Agent TARS is a general multimodal AI agent with GUI-agent and vision capabilities, accessible via CLI and Web UI. It aims for human-like task completion and integrates with real-world MCP tools. UI-TARS Desktop is a native desktop application that provides GUI agent operators for local computers, remote computers, and browsers. Source-github
Rowboat Launches Open-Source Personal AI Knowledge Graph — Rowboat Labs released Rowboat, an open-source AI coworker that builds a long-lived knowledge graph from your email and meeting notes and runs locally on your machine. It uses this graph to generate decks, prep meeting briefs, track topics, and visualize/edit the graph via a Markdown interface. It’s downloadable for Mac, Windows, and Linux via GitHub. Source-github
Open-Source Hyperparameter Search Tool for Diffusion Fine-Tunes — A developer released Bracket, an open-source tool to automate hyperparameter search for diffusion model fine-tuning. It runs multiple short training trials in parallel using Optuna’s TPE, then scores results with both training-loss trajectories and a local VLM-based image-quality assessment. The tool outputs a Markdown report with Welch’s t-test results to declare a statistically superior configuration, and it orchestrates existing training scripts (musubi-tuner and sd-scripts) rather than reimplementing training. Source-reddit
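Agent TARS is described as integrating with MCP tools. In the Model Context Protocol, a client invokes a tool by sending a JSON-RPC 2.0 request whose method is `tools/call`, and the sketch below shows only that message shape. The tool name `browser_navigate` and its arguments are hypothetical, not Agent TARS's actual tool set.

```python
# Build an MCP "tools/call" request as a JSON-RPC 2.0 message.
# The tool name and arguments are hypothetical; a real client would first
# call "tools/list" to discover what the server offers.

import json
from typing import Any, Dict


def make_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


payload = make_tool_call(1, "browser_navigate", {"url": "https://example.com"})
print(payload)
```

Transport (stdio or HTTP) and the initialization handshake are omitted; this only illustrates the request body an agent sends when it uses a tool.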
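Bracket's report relies on Welch's t-test, which compares two configurations without assuming equal variance. The sketch below computes the textbook statistic and the Welch-Satterthwaite degrees of freedom. It is not Bracket's own code, and the per-trial scores are made up for illustration.

```python
# Welch's t-test statistic and Welch-Satterthwaite degrees of freedom,
# e.g. for comparing per-trial scores of two hyperparameter configurations.
# Textbook formula; not Bracket's implementation.

import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, degrees of freedom) for unequal-variance samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-trial scores for two configurations.
config_a = [1.0, 2.0, 3.0, 4.0, 5.0]
config_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(config_a, config_b)
print(f"t = {t:.3f}, df = {df:.2f}")  # t = -1.897, df = 5.88
```

To get a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic and looks it up in the t distribution.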

AI

AgentMemory Provides Persistent Memory for AI Coding Agents — AgentMemory provides a persistent memory layer for AI coding agents, letting them remember context across sessions so users do not have to repeat explanations. Built on the iii engine, it supports Claude Code, Cursor, Gemini CLI, Codex CLI, pi, OpenCode, and MCP clients. It adds confidence scoring, memory lifecycle management, knowledge graphs, and hybrid search. It works with any agent via hooks, MCP, or a REST API, and all of these clients share a single memory server. Source-github
Oracle AI Developer Hub Enables AI Apps on OCI — The Oracle AI Developer Hub provides technical resources for AI developers to build applications, agents, and systems using Oracle AI Database and OCI services. The repository is organized into apps and reference implementations, with source code, deployment configurations, and documentation to showcase end-to-end, production-grade AI solutions on Oracle tech. It includes example apps such as FitTracker, illustrating practical integration patterns and best practices. Source-github
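AgentMemory lists hybrid search among its features. Hybrid search generally means blending a lexical match score with a vector-similarity score. The toy sketch below shows that blend. The embeddings, the 0.5 weight, and both scoring functions are hypothetical stand-ins, and AgentMemory's actual ranking method is not described in the source.

```python
# Toy hybrid retrieval: blend a keyword-overlap score with vector cosine similarity.
# Embeddings, weight, and scoring functions are hypothetical stand-ins.

import math
from typing import Dict, List, Tuple


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_search(
    query: str,
    query_vec: List[float],
    memories: Dict[str, Tuple[str, List[float]]],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank memories by weight * keyword score + (1 - weight) * cosine similarity."""
    scored = [
        (memory_id, weight * keyword_score(query, text) + (1 - weight) * cosine(query_vec, vec))
        for memory_id, (text, vec) in memories.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


memories = {
    "m1": ("project uses pytest for tests", [0.9, 0.1]),
    "m2": ("deploy via docker compose", [0.1, 0.9]),
}
ranked = hybrid_search("how are tests run", [0.8, 0.2], memories)
print(ranked)
```

Production systems typically replace keyword overlap with BM25 and use real embedding models. The weighted blend is one common fusion choice; reciprocal rank fusion is another.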

Hardware

llama.cpp b9095 Released with NCCL-Free Tensor Parallelism on Dual Blackwell PCIe — The llama.cpp b9095 release enables NCCL-free tensor parallelism (-sm) on dual Blackwell PCIe GPUs. This could significantly improve performance for users running two Blackwell cards. The discussion mentions upcoming benchmark results for 2x RTX 5060 Ti setups. Source-reddit
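Tensor parallelism splits individual weight matrices across GPUs, instead of assigning whole layers to each GPU. The toy sketch below shards a matrix by output rows across two simulated "devices" and gathers the partial results. It illustrates the general idea only and says nothing about how llama.cpp b9095 exchanges data without NCCL.

```python
# Toy tensor parallelism: shard a weight matrix by output rows across two
# simulated devices, compute each shard independently, then gather.
# Pure-Python illustration; real implementations run shards on separate GPUs
# and exchange results over PCIe or NVLink.

from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def shard_rows(w: Matrix, n_devices: int) -> List[Matrix]:
    """Split the output rows into at most n_devices contiguous shards."""
    size = -(-len(w) // n_devices)  # ceiling division
    return [w[i:i + size] for i in range(0, len(w), size)]


w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
x = [3.0, 4.0]

shards = shard_rows(w, n_devices=2)                # one shard per "GPU"
partials = [matvec(s, x) for s in shards]          # computed independently
gathered = [y for part in partials for y in part]  # all-gather step

assert gathered == matvec(w, x)
print(gathered)  # [3.0, 4.0, 7.0, 2.0]
```

The gather step is the communication that libraries like NCCL normally handle. Doing it efficiently over plain PCIe is the part the release addresses.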

⚡ Quick Bites

AI models’ origins mapped worldwide; Silicon Valley ahead by ~3 months — Yann LeCun replies with a playful inventory of where AI models and related technologies come from. The thread places models and projects such as AlphaGo, AlphaFold, ESMFold, Llama, DeepSeek, DINO, and JEPA in cities around the world, and notes that Silicon Valley is ahead on topics by about three months. It highlights the global distribution of AI R&D activity and competition across labs. Source-twitter
Why the US lacks a competitive open-source AI lab — A Twitter post argues that the US still does not have a truly competitive open-source AI model lab. It contends that funding and compute are not the bottlenecks, since neolabs have raised billions and US labs have hardware access, and asks what underlying issue is at play. Source-twitter
Shopify’s River agent lives in Slack for public learning — Shopify’s River AI agent operates inside Slack but can only be used publicly, so colleagues can learn from each other’s workflows. The setup is likened to Midjourney’s Discord-first launch, which let users master prompting by watching their peers. The note references Shopify CEO Tobias Lütke and ongoing discussions about transparent internal tools. Source-twitter
Karpathy’s wikiLLM Turns Obsidian into a Second Brain — Karpathy previously introduced wikiLLM, which integrates Obsidian with Claude Code or Codex. A recovering author who set it up expresses enthusiasm, saying it gives them a ‘second brain.’ The post highlights wikiLLM as a notable AI knowledge-management tool. Source-twitter
SFT, RL, OPD and Generalization vs Catastrophic Forgetting — A blog post examines how supervised fine-tuning (SFT), reinforcement learning (RL), and on-policy distillation (OPD) relate to generalization and catastrophic forgetting in AI models. It discusses the implications for model stability and memory retention under each training paradigm. Source-twitter
Gauging Real Tokens per Second for Local LLMs — A Reddit post discusses how raw tokens-per-second figures often fail to convey how fast a model actually feels. It promotes a script that measures tokens per second separately for text, code, and reasoning tasks, to give a practical sense of local LLM performance, citing Qwen 3.6-27B at 21 tokens/second as an example. Source-reddit
Gemma-4-26b-a4b Excels at One-Shot Three.js Code — A Reddit post praises Gemma-4-26b-a4b for its strong one-shot prompting in generating Three.js code. The author describes a Python app that cycles prompts, writes HTML from a CSV of prompts, detects crashes, and archives finished demos, linking to a static demo and a GitHub page. Source-reddit
Hermes Agent adds LINE gateway channel — Hermes Agent now officially supports LINE as a gateway channel for interacting with your agent. You can set up Hermes Agent as a LINE Messaging API bot and start using it with the hermes update command; docs are at hermes-agent.nousresearch.co. Source-twitter
Aurora Optimizer blog: heed Rohan’s AI-lab insights — A note about the Aurora Optimizer blog urges readers to heed Rohan’s insights. It includes a technical aside about dead neurons, which produce zero gradients, and the effort with ‘muon’ to revive them, especially in preconditioner settings. The thread also asks why the US lacks a competitive open-source model lab despite its funding and access to hardware. Source-twitter
Sam Altman hints at naming the next AI model ‘Goblin’ — A tweet from OpenAI CEO Sam Altman hints at naming the next AI model ‘Goblin’. The remark appears playful rather than an official product announcement and does not indicate concrete plans, but it adds to ongoing chatter about branding for future models. Source-twitter
Removing Unnecessary Agent-Written Comments and Test Code — In a May 9 post, isaniss describes removing extra comments and test code generated by an AI agent, adding that they could do it all day. Source-twitter
Overdoing Local LLMs: Coil Whine Haunts Sleep — Reddit user /u/MrChilliBalls posts about spending too much time tinkering with local large language models (Local LLMs). They humorously report hearing coil whine in their sleep and ask the community for advice. The post appears in the r/LocalLLaMA subreddit, reflecting ongoing interest in open-source local LLM exploration. Source-reddit
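The tokens-per-second post argues that a single raw number hides how fast a model feels. One common refinement is to report time-to-first-token separately from decode speed, and to run the measurement once per task type (text, code, reasoning). The sketch below does that for any token iterator. It is not the script from the post, and `fake_stream` is a hypothetical stand-in for a real streaming client.

```python
# Measure time-to-first-token and decode throughput from a token stream.
# `fake_stream` is a hypothetical stand-in; swap in your local server's
# streaming API and run once per task type (text, code, reasoning).

import time
from typing import Iterable, Iterator, Tuple


def measure_tps(stream: Iterable[str]) -> Tuple[int, float, float]:
    """Return (token count, seconds to first token, decode tokens/sec after the first token)."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in stream:
        count += 1
        if first is None:
            first = time.perf_counter()
    end = time.perf_counter()
    if first is None:
        return 0, 0.0, 0.0
    ttft = first - start
    decode_time = end - first
    tps = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return count, ttft, tps


def fake_stream(n: int, delay: float) -> Iterator[str]:
    for i in range(n):
        time.sleep(delay)  # simulated per-token latency
        yield f"tok{i}"


count, ttft, tps = measure_tps(fake_stream(20, 0.01))
print(count, round(ttft, 3), round(tps, 1))
```

Excluding the first token from the decode rate keeps prompt-processing time from dragging down the tokens/second figure, which matters most for long prompts.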
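The Gemma-4-26b-a4b post describes a Python app that reads prompts from a CSV, writes an HTML demo per prompt, and handles crashes. The sketch below covers only the batch-loop part under stated assumptions. `generate_html` is a hypothetical placeholder for a call to a local model, and the `id`/`prompt` column names are assumed. It records generation failures rather than the in-browser crash detection the author describes.

```python
# Sketch of a prompt-cycling batch runner: read prompts from a CSV,
# generate one HTML demo per prompt, and record failures instead of stopping.
# `generate_html` and the "id"/"prompt" columns are hypothetical assumptions.

import csv
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def run_batch(csv_path: Path, out_dir: Path, generate_html: Callable[[str], str]) -> Dict[str, List[str]]:
    out_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, List[str]] = {"ok": [], "failed": []}
    with csv_path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            try:
                html = generate_html(row["prompt"])
            except Exception:  # a failed generation is recorded and skipped
                results["failed"].append(row["id"])
                continue
            (out_dir / f"{row['id']}.html").write_text(html, encoding="utf-8")
            results["ok"].append(row["id"])
    return results


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a model call that returns a Three.js page."""
    if "crash" in prompt:
        raise RuntimeError("generation failed")
    return f"<html><body><!-- {prompt} --></body></html>"


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    prompts = tmp_path / "prompts.csv"
    prompts.write_text("id,prompt\nspinning_cube,a spinning cube\nbad,crash please\n", encoding="utf-8")
    summary = run_batch(prompts, tmp_path / "demos", fake_generate)
    print(summary)  # {'ok': ['spinning_cube'], 'failed': ['bad']}
```

Catching exceptions per row keeps one bad prompt from stopping an overnight batch. Detecting runtime crashes in the generated Three.js code would additionally require loading each page in a browser.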

Generated by AI News Agent | 2026-05-10