AI Daily — 2026-05-30



Covering 27 AI news items

🔥 Top Stories

1. Former DeepMind team raises $50M for recursive AI lab

Ex-DeepMind researchers have raised $50 million to build an AI lab centered on recursive self-improvement at the organizational level, not just within individual models. The round was led by Index and Radical, with NVIDIA’s venture arm and several high-profile angels on the cap table. Founders Louis Kirsch, Edward Hughes, and Tantum Collins bring backgrounds in self-improving systems, open-ended AI, and AI policy. The lab operates as a Public Benefit Corporation and aims to create AI that collaborates with humans within an ongoing experiment. Source-twitter

2. Hermes Agentic AI Overtakes OpenClaw: 10 Shifts Leaders Need

A Forbes piece notes Hermes Agentic AI overtaking OpenClaw as agentic systems accelerate. It outlines ten key agentic shifts reshaping enterprise strategy, speed, and execution that leaders should know today. Source-twitter

3. Launching Hosted Evaluations to Simplify Model Evals

The platform announces Hosted Evaluations to address infra-heavy model evaluations, including harnesses, sandboxes, and large compute workloads. This feature promises to streamline and scale eval workflows by handling complex infrastructure, reducing manual setup and overhead. Source-twitter

📰 Featured

Open Source

Biohub Unveils World Model for Protein Biology: ESMC, ESMFold2, ESM Atlas — Biohub releases a world model for protein biology, integrating ESMC, ESMFold2, and ESM Atlas to predict, design, and discover proteins across scales. ESMC is a protein language model trained on billions of sequences to learn protein biology, while ESMFold2 advances structure prediction and ESM Atlas provides mapping and tutorials. The project is open-source on GitHub (Biohub/esm) and builds on Evolutionary Scale Modeling to capture long-range structural insights as models scale. Source-github
Open-Source Tool Turns Vocal Imitations Into Sound Effects — A new open-source project called VTS lets users imitate a target sound with their voice and generate the actual sound by combining the vocal imitation with text input. The project, hosted on GitHub as thxxx/VTS, aims to simplify sound design for video games and video production and invites feedback on Reddit. Source-reddit
Fulloch V2: 100% Local Voice Assistant for Home Assistant — Fulloch V2 demonstrates a fully local voice assistant running on consumer GPUs using Qwen-based ASR/TTS models to power Home Assistant, with acoustic barge-in and real-time responses. It additionally integrates with a local Obsidian vault for reading, writing, and appending notes, and supports semantic search over markdown notes via a local embedding model. The project is open-source with a public GitHub repo and a demo video. Source-reddit
MOSS TTS v1.5 Impresses with Voice Cloning — OpenMOSS-Team’s MOSS TTS v1.5 is praised for its voice cloning capabilities. The post prefers this model over Fish Audio S2 Pro due to licensing for commercial use, and notes Long Cat DiT 3.5 as another strong alternative. Source-reddit

AI Research

OpenAI: AI Accelerates Research by Expanding What Researchers Dare — OpenAI argues AI can free researchers to pursue bolder, riskier ideas. Terence Tao notes AI enables experimentation and paths previously unreachable, expanding the scope of mathematics and science. Source-twitter

Multimodal

Seedance 2.0 Still Leads Text-to-Video, Feb Release — A tweet claims Seedance 2.0 remains undefeated in text-to-video benchmarks, despite its February release. The post suggests no lab has surpassed Seedance 2.0 yet, highlighting its perceived lead in multimodal AI video generation. Source-twitter

LLM

GPT-5.5 tops Opus 4.8 on DeepSWE benchmark — GPT-5.5 reportedly outperforms Opus 4.8 on the DeepSWE benchmark in score, runtime, and token efficiency. The result was reported on Twitter. Source-twitter
NVIDIA Qwen3.6-35B-A3B-NVFP4 Quantized for vLLM Inference — NVIDIA released a post-training quantized NVFP4 version of Alibaba’s Qwen3.6-35B-A3B model, prepared for fast inference with vLLM. The quantization reduces parameter precision from 16 to 4 bits, cutting disk size and GPU memory by about 3.06x, with quantization applied to the linear operators’ weights and activations within MoE transformer blocks. Benchmark results show NVFP4 performance close to BF16 baselines across multiple tests such as MMLU, GPQA, and SciCode. Source-reddit
Parallax: Scalable Parameterized Local Linear Attention for LLMs — Parallax introduces a scalable, parameterized Local Linear Attention (LLA) for large language models to address prior computational and numerical stability challenges. It removes the numerical solver and adds a learnable query-like projector to probe the KV covariance, situating Parallax within a family of attention mechanisms defined by bandwidth, probe construction, and affine structure to improve bias-variance tradeoffs in pretraining. Source-reddit
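
A note on the NVFP4 item above: going from 16 to 4 bits would give a 4x reduction if every parameter were quantized with no overhead, yet the reported figure is about 3.06x. The sketch below shows one way to reason about that gap. It assumes an effective cost of 4.5 bits per quantized value (4-bit values plus one 8-bit scale shared by each block of 16 values) and that everything else stays at 16 bits. Both are assumptions for illustration, not figures from the post, and the function names are hypothetical.

```python
def compression_ratio(quantized_fraction, quant_bits=4.5, base_bits=16.0):
    """Overall size reduction when only part of the parameters is quantized.

    quant_bits=4.5 is an assumed effective cost per quantized value
    (4-bit value plus a shared 8-bit scale per 16 values).
    """
    if not 0.0 <= quantized_fraction <= 1.0:
        raise ValueError("quantized_fraction must be in [0, 1]")
    # Average bits per parameter across quantized and unquantized weights.
    avg_bits = (quantized_fraction * quant_bits
                + (1.0 - quantized_fraction) * base_bits)
    return base_bits / avg_bits


def implied_quantized_fraction(ratio, quant_bits=4.5, base_bits=16.0):
    """Invert compression_ratio: fraction of parameters that must be quantized."""
    return (1.0 - 1.0 / ratio) / (1.0 - quant_bits / base_bits)


# Fraction of parameters implied by the reported ~3.06x, under the assumptions above.
fraction = implied_quantized_fraction(3.06)
print(f"implied quantized fraction: {fraction:.3f}")
```

Under these assumptions, a 3.06x reduction is consistent with most but not all parameters being quantized, which fits the description that quantization targets the linear operators inside the MoE transformer blocks.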

Computer Vision

SAM Zero-Shot and YOLO Fine-Tune Enable Quick Cell Tracking — A post discusses using computer vision to track cell movement. It highlights a zero-shot SAM application to immune cells and a YOLO model fine-tuned on a small bacteria dataset, with an estimated implementation time of about 2 hours. Source-twitter

AI Hardware

RTX 5090 Struggles to Surpass 250 TPS with Qwen 3.5-4B — A Reddit user reports suboptimal inference throughput on an RTX 5090 when running llama.cpp in Windows Docker with Qwen 3.5-4B (and Qwen 3.6-27B-mtp as the main model). They observe ~100 tokens per second (TPS) for the 27B model and only 200-250 TPS for the 4B model, with the GPU at ~50% utilization and the CPU idle, despite prefill rates up to 2500 TPS. Multiple builds and tools (llama.cpp, havenoammo/llama:cuda13-server, LM Studio) were tested without improvement, suggesting a setup bottleneck rather than a hardware limit. Source-reddit

⚡ Quick Bites

Yoshua Bengio: Pope Is Right on AI and the Common Good — Yoshua Bengio tweeted that AI must serve everyone and the common good, aligning with the Pope’s call for responsible technology. He urged that decisions about AI be guided by conscience and called on the Vatican and global institutions to engage in the AI dialogue. The post highlights AI governance and ethics as central to addressing future challenges. Source-twitter
Hermes Agent cuts input tokens by 14% on reads — An update on Hermes Agent reports a 14% average reduction in input tokens during file read operations. The improvement is now in main, and users can run ‘hermes update’ to access it. Source-twitter
Leak claims OpenAI plans GPT-5.1 through GPT-5.5 upgrades — A tweet claims that OpenAI will train more models beyond GPT-5, outlining a progression from GPT-5.0 to GPT-5.5. The post suggests that each version increases capabilities and token efficiency, describes GPT-5.5 as the best model yet, and frames the strategy as simply continuing this progression. Source-twitter
Hypothetical: Betting 90% of Net Worth on AI Startup Failure — An online post asks where and how to place a bet that a $30B+ AI startup will fail, suggesting a wager equal to 90% of one’s net worth. It highlights the high-risk, high-valuation nature of AI startups and invites debate on bets, hedges, and risk assessment in the sector. Source-twitter
AI models have interiority regardless of consciousness — A tweet argues that AI models possess interiority even if they are not conscious. It highlights ongoing debates about whether internal states of machines are real or meaningful. The post is a reply to user @credenzaclear2 and reflects philosophical discussions on AI consciousness. Source-twitter
Stable-WorldModel Launches Reproducible World Model Platform — Stable-WorldModel provides a unified interface for data collection, training, and evaluation of world models with model-predictive control across standardized environments. It includes reference implementations of common baselines and planning solvers to help research focus on the model and objective. Installation is via PyPI (base or all) and supports an opt-in LeRobot dataset requiring Python 3.12+, with source available on GitHub. Source-github
Cost Analysis of a $6.4k Local LLM Server — An author shares the total cost of ownership (TCO) for building a local LLM server versus using API access, emphasizing that hardware depreciation can distort TCO. The post itemizes shipped hardware prices (AMD Instinct MI100 GPUs, ASRock EPYCD8-2T motherboard, 1600W Platinum PSU, DDR4 ECC RAM, AMD EPYC 7K62 CPU, cooler, case, cables, and fans) and discusses how hardware may appreciate or depreciate over time. The goal is to illustrate the cost breakdown of a local LLM server rather than API costs. Source-reddit
GPU specs showdown: bandwidth isn’t everything for AI rigs — A Reddit post analyzes major GPUs and machines used in local AI workflows, arguing that bandwidth alone does not dictate performance and that other specs matter. It provides an extended table detailing price, FP16 TFLOPS, VRAM, bandwidth, and per-unit costs for a range of GPUs including RTX Pro 6000 variants, Arc Pro, Radeon Instinct MI50, Radeon AI PRO R9700, and consumer GPUs like RTX 4060 Ti/5060 Ti/5070 Ti. The discussion also touches on Mac recommendations, noting pricing adjustments for Pro 6k and M3 Ultra. Source-reddit
All DGX Spark Clones Shown Side by Side — A Reddit post compiles an image that places various DGX Spark clones alongside each other, comparing their physical dimensions (width, height, length) and weight. The lineup includes Dell Pro Max, HP ZGX Nano G1n, Lenovo ThinkStation PGX, MSI EdgeXpert, GIGABYTE AI TOP ATOM, Acer Veriton GN100 AI Mini Workstation, and ASUS Ascent GX10, with some measurements marked as uncertain. The post credits user /u/rexyuan and links to a gist. Source-reddit
STT-LLM-TTS Pipeline: How to Orchestrate Three Models — A Reddit user describes their STT-LLM-TTS setup on a 3090 GPU under Ubuntu, using llama.cpp to run Qwen 3.6 27B Q4 with pi-agent for tool calls. They seek guidance on how data should flow between STT, LLM, and TTS, and whether to run three separate llama.cpp instances or use a unified framework. They currently work entirely in the terminal, without a chat front end. Source-reddit
Gaussian Splats with AI Create Intriguing Visual Scenes — The post highlights a technique that combines Gaussian splats with AI to generate visually interesting scenes, with an example provided. Source-twitter
Stabilizing Low-Quant LLMs with Lower Temp and Top-p — A Reddit post explores stabilizing heavily quantized (low-bit) LLMs by lowering temperature and top-p to reduce erratic outputs, especially when fitting large models into 80GB of VRAM. The author notes that Mixture-of-Experts (MoE) models are slow with CPU offload and that many large models require heavy quantization. They plan to test sampling-based stabilization using visualization tools and share a demo link. Source-reddit
It’s great to build with Codex, says Carol Monroe — A tweet by Carol Monroe praises Codex as a pleasant tool for building software, highlighting a positive developer experience with the AI coding tool. The post does not announce new features or products. It underscores favorable sentiment toward Codex among developers. Source-twitter
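
For the low-quant stabilization item above, the following is a minimal sketch of how temperature and top-p (nucleus) sampling interact, written in plain Python so it does not depend on any particular inference stack. The function names and the example logits are hypothetical and are not taken from the Reddit post. Real runtimes may apply their samplers in a different order.

```python
import math
import random


def top_p_filter(logits, temperature=1.0, top_p=1.0):
    """Return {token_index: probability} after temperature scaling and top-p truncation."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    # Temperature below 1 sharpens the distribution toward the top tokens.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalize over kept tokens


def sample(logits, temperature=1.0, top_p=1.0, rng=random):
    """Draw one token index from the filtered distribution."""
    dist = top_p_filter(logits, temperature, top_p)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


example_logits = [2.0, 1.0, 0.0, -1.0]  # hypothetical values
wide = top_p_filter(example_logits, temperature=1.0, top_p=0.9)
narrow = top_p_filter(example_logits, temperature=0.5, top_p=0.9)
print(len(wide), len(narrow))
```

With these example logits, lowering the temperature at a fixed top-p shrinks the set of candidate tokens, which is the mechanism the post relies on to suppress unlikely continuations from a noisy quantized model.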

Generated by AI News Agent | 2026-05-30