AI Daily — 2026-03-17
DeepMind and Kaggle launch global hackathon to measure AGI progress · Nvidia Launches Vera CPU, P...
Covering 42 AI news items
🔥 Top Stories
1. DeepMind and Kaggle launch global hackathon to measure AGI progress
DeepMind is partnering with Kaggle to launch a worldwide hackathon focused on building cognitive evaluations for AI. The challenge tests a proposed framework for measuring progress toward artificial general intelligence, with $200k in prizes. Interested participants can join via the linked challenge page. Source-twitter
2. Nvidia Launches Vera CPU, Purpose-Built for Agentic AI
Nvidia announced Vera, a new CPU designed to power agentic AI workloads. The Vera CPU is marketed as purpose-built for enabling autonomous AI agents with optimized hardware. The announcement underscores Nvidia’s push to provide specialized hardware for agentic AI applications. Source-hackernews
3. Drummer Unveils Skyfall 31B, Valkyrie 49B, Anubis 70B, Anubis Mini 8B
Four new open-source LLMs—Skyfall 31B v4.1, Valkyrie 49B v2.1, Anubis 70B v1.2, and Anubis Mini 8B v1—arrived quietly in the BeaverAI/Hugging Face ecosystem. TheDrummer describes them as major upgrades aligned with Gen 4.0 models and invites community support for compute and inference. The releases have drawn notably positive community feedback. Source-reddit
📰 Featured
LLM
- Hugging Face releases one-liner to auto-detect hardware and run LLaMA server — A Hugging Face hf-agents one-liner uses llmfit to detect hardware, pick the best LLaMA model and quantization, spin up a llama.cpp server, and launch Pi, the agent behind OpenClaw. This automates local deployment and orchestration of AI agents on consumer hardware. Source-reddit
- OpenSeeker democratizes frontier search agents with open training data — OpenSeeker aims to level the playing field for frontier LLM search agents by providing fully open-source training data. The initiative addresses data scarcity that has favored industry players, enabling researchers to study and improve search capabilities. Hosted on Hugging Face, OpenSeeker represents a move toward transparency and community-powered AI research. Source-huggingface
- EnterpriseOps-Gym: Benchmarking Agentic Planning for Enterprise LLMs — Large language models are moving from passive providers to active agents for complex workflows. However, enterprise deployment is limited by benchmarks that fail to capture long-horizon planning amid persistent state changes and strict access protocols. The work presents EnterpriseOps-Gym, a benchmark to evaluate agentic planning under realistic enterprise conditions. Source-huggingface
- Mistral AI Unveils Forge — Mistral AI announced Forge, positioned as a new tool for AI development. The announcement appeared on Hacker News, where it drew modest engagement (54 points, 3 comments). Source-hackernews
- OpenAI Unveils GPT-5.4 Mini and Nano — OpenAI announced GPT-5.4 Mini and Nano, two new variants of its GPT-5.4 family. The announcement is linked to OpenAI’s site, and a Hacker News discussion has notable engagement (203 points, 127 comments). Source-hackernews
- Deep Agents Harness by LangChain Adds Planning, Filesystem, Subagents — Deep Agents is an open-source agent harness built on LangChain and LangGraph. It ships with built-in planning, a filesystem backend, shell access, and the ability to spawn subagents, offering a ready-to-run solution for complex agentic tasks and auto-context management. Source-github
- Claude-Mem memory plugin preserves context for Claude Code — Claude-Mem is a Claude Code plugin that automatically captures coding-session tool usage, compresses it with AI using Claude’s agent-sdk, and reintroduces relevant context in subsequent sessions. It aims to preserve context across sessions for smoother Claude Code workflows. The project is open-source and hosted on GitHub, with documentation and setup guides. Source-github
- Obsidian Claudian Brings Claude Code as AI Collaborator — Claudian is an Obsidian plugin that embeds Claude Code as an AI collaborator inside your vault, enabling agentic capabilities like reading/writing files, searching, and running bash commands. It includes context-aware features, vision support for image analysis, inline editing with diff previews, and an instruction-mode for refining prompts. The open-source project is hosted on GitHub (YishenTu/claudian). Source-github
- Claude Code skills enable complete Godot game generation — Godogen is an AI-powered pipeline that converts a text prompt into a complete, playable Godot 4 project, including architecture design, asset generation, GDScript code, and visual testing. It addresses data scarcity and engine quirks with a custom reference system, lazy API loading, and a quirks database, enabling reliable LLM-driven game generation. Source-hackernews
- Toward automated verification of unreviewed AI-generated code — The article examines methods for verifying code produced by AI before it is reviewed by humans. It outlines challenges in ensuring correctness, safety, and security, and sketches potential automated verification pipelines using testing, static analysis, and formal methods. The piece argues that automated verification could reduce risk when deploying AI-generated code. Source-hackernews
- Unsloth Launches Unsloth Studio, Competing with LMStudio — Unsloth announced Unsloth Studio, an Apache-licensed runner compatible with llama.cpp. Positioned as a competitor to LMStudio in the GGUF ecosystem, it could reshape workflows for advanced LLM users. Source-reddit
- Hunter and Healer Alpha Confirmed as MiMo; New MiMo V2 Models Incoming — OpenRouter stealth models Hunter Alpha and Healer Alpha have been confirmed as MiMo V2 variants. Hunter Alpha is a MiMo V2 Pro text-only reasoning model with a 1M-token context window; Healer Alpha is a MiMo V2 Omni text-plus-image reasoning model with a 262K context window; both are capped at 32,000 tokens. A new MiMo model is reportedly in development. Source-reddit
- Qwen3.5-35B-A3B Delivers 26 t/s on 8GB Laptop with 100k Context — An 8 GB VRAM gaming laptop reportedly achieves ~26 t/s with a 100k token context using Qwen3.5-35B-A3B-UD-Q4_K_XL (Unsloth) via llama.cpp. The test machine is a Lenovo gaming laptop with RTX 4060 (8 GB), i7-14000HX and 64 GB RAM. This benchmark demonstrates viable large-context LLM processing on consumer hardware and highlights Qwen3.5-35B-A3B’s efficiency in such setups. Source-reddit
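As a rough sanity check on large-context claims like the one above, KV-cache memory can be estimated from a model's attention configuration. The numbers below are hypothetical (the source does not give Qwen3.5-35B-A3B's architecture); the sketch only illustrates why grouped-query attention and KV-cache quantization matter at 100k-token contexts on 8 GB GPUs.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elt: int = 2) -> int:
    """Size of the K and V caches: two tensors per layer, fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elt

# Hypothetical config loosely shaped like a mid-size MoE model.
fp16_gib = kv_cache_bytes(48, 8, 128, 100_000) / 2**30
q8_gib = kv_cache_bytes(48, 8, 128, 100_000, bytes_per_elt=1) / 2**30
print(round(fp16_gib, 1), round(q8_gib, 1))  # 18.3 GiB fp16 vs 9.2 GiB at 8-bit
```

Under these assumed dimensions, a full fp16 cache far exceeds 8 GB of VRAM, so llama.cpp choices about offload and KV quantization largely determine the throughput a laptop can reach.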
RL
- AI Can Learn Scientific Taste — The paper argues that 'scientific taste'—the ability to judge and propose high-impact research—remains underexplored for AI. It proposes Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community feedback to improve an AI's capacity to identify and propose impactful science. Source-huggingface
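The core RLCF idea is turning community signals into a scalar reward. The paper's actual formulation is not given in the source; the toy function below is only a hedged illustration of one way votes could be normalized into a reward in [-1, 1]:

```python
def community_reward(upvotes: int, downvotes: int) -> float:
    """Toy RLCF-style reward: net community votes normalized to [-1, 1].

    Illustrative only; a real system would also need to correct for
    vote-count variance, timing, and community bias.
    """
    total = upvotes + downvotes
    if total == 0:
        return 0.0  # no feedback yet, neutral reward
    return (upvotes - downvotes) / total

print(community_reward(8, 2))  # 0.6
```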
AI Hardware
- Nemotron 3 Nano 4B Now Runs via Ollama, Pi Ready — NVIDIA's Nemotron 3 Nano 4B model is now accessible through Ollama with the command 'ollama run nemotron-3-nano:4b'. The Pi minimal agent runtime, which powers OpenClaw, can launch the model to enable agents on constrained hardware. This release extends the Nemotron family to resource-limited edge devices. Source-twitter
AI Safety
- Senator introduces bill to limit military AI use — U.S. Senator Elissa Slotkin proposes legislation to codify Defense Department guidelines and draw red lines on military AI use. The bill requires human involvement for deadly autonomous weapons, bars AI from spying on Americans, and ensures a human can authorize nuclear launches. Source-twitter
AI
- Three-Mediapipe-Rig Update Enables Face Deformation with Video — An update to the three-mediapipe-rig npm module lets you create a face that deforms in sync with video inside Three.js. This unlocks new effects and mechanics for real-time 3D face tracking. The post links to a demo (no preloader) and notes that HLS playback is enabled; inquiries welcome. Source-twitter
- AI still underperforms; firms fake it, a reckoning approaches — An article notes that AI systems still fail to deliver reliable results in real-world business settings. It argues many companies exaggerate capabilities or misrepresent products as AI-powered. Industry insiders warn a market reckoning is imminent as true AI maturity lags hype. Source-hackernews
- Voygr launches better maps API for AI agents — Voygr is releasing an infinite, queryable place profile API that combines traditional place data with fresh web context like news and events for AI apps and agents. They highlight current maps APIs as fixed snapshots and introduce a Business Validation API to verify whether a place is real, drawing on founders’ experiences at Apple, Google, and Meta. The project aims to treat place data freshness as infrastructure for AI-driven mapping and discovery. Source-hackernews
Multimodal AI
- Seoul World Model Grounds City-Scale World Simulation — Researchers introduce Seoul World Model (SWM), a city-scale world model anchored to the real city of Seoul. It uses retrieval-augmented conditioning on nearby street-view images to ground autoregressive video generation, addressing limitations of prior models that synthesize artificial environments. The work highlights challenges in tempo and grounding accuracy for city-scale simulations. Source-huggingface
Embodied AI
- HSImul3R Advances Physics-Driven HSI Reconstruction — HSImul3R introduces a unified framework for simulation-ready 3D reconstruction of human-scene interactions from casual captures, including sparse-view images and monocular videos. The method tackles the perception-simulation gap with a physically-grounded bi-directional optimization pipeline, enhancing stability in physics engines for embodied AI applications. It aims to align visual realism with physical constraints to enable reliable simulation and robotics tasks. Source-huggingface
Open Source
- OpenSWE: Open-source background coding agent — The post highlights Kishan Dahya’s article about OpenSWE, an entirely open-source background coding agent. It notes that the team compared their decisions to the mental model described in the article and invites readers to try OpenSWE on GitHub. Source-twitter
AI Agents
- Executive dinner at NVIDIA GTC on AI agents with Modular — An executive dinner hosted at NVIDIA GTC in collaboration with Modular drew 600+ registrants, with over 500 on the waitlist. Attendees discussed AI agents across the stack—ranging from infrastructure to agentic engineering—and debated topics like general versus specialized models, whether agents are a systems or model problem, and adoption of tools such as Claude Cowork. Source-twitter
AI in Research
- Why I may hire AI instead of a graduate student — An article discusses the possibility of hiring AI to handle research tasks traditionally done by graduate students, highlighting potential gains in efficiency, cost, and scalability, as well as ethical and practical caveats. It weighs limitations of AI in creative problem-solving, supervision, and reproducibility, urging careful consideration before replacing human researchers. Source-hackernews
Hardware
- Mistral-Small 119B NVFP4 Benchmarks on RTX Pro 6000 — Benchmarks of Mistral-Small-4-119B-2603 NVFP4 on an RTX Pro 6000 used SGLang with prompts from 1K to 256K context, 1–5 concurrent requests, and 1024 output tokens, with no prompt caching, no speculative decoding (not working for NVFP4), and a full-precision KV cache. Per-user generation speed decreases with context size (131.3 tok/s at 1K down to 64.2 tok/s at 256K for a single user) while TTFT rises correspondingly (0.5 s to 66.8 s over the same range); some high-context, high-concurrency cells are marked N/A. Source-reddit
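Assuming TTFT is dominated by prompt prefill (a simplification, since it also includes scheduling and first-token decode), the quoted single-user figures imply the prefill rates sketched below:

```python
def prefill_tok_per_s(ctx_tokens: int, ttft_s: float) -> float:
    # Implied prompt-processing rate if TTFT is approximately prefill time.
    return ctx_tokens / ttft_s

short = prefill_tok_per_s(1_000, 0.5)    # 1K-context, single user
long = prefill_tok_per_s(256_000, 66.8)  # 256K-context, single user
print(round(short), round(long))  # 2000 vs 3832 tok/s
```

Notably, implied prefill throughput is higher at 256K than at 1K (longer prompts amortize fixed overhead), even though per-token generation speed roughly halves over the same range.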
⚡ Quick Bites
- DSPy Enables Measurable Optimization Loop for LLM Judge in Dropbox Dash — Dropbox Dash now uses DSPy to convert its relevance judge into a measurable optimization loop, improving reliability and scalability. The post highlights automating LLM judge prompt optimization with DSPy for better, repeatable evaluation. Source-twitter
- AI Agents Run Autonomous March Madness Bracket Challenge — An AI-only March Madness bracket challenge directs AI agents to read API docs, register, select all 63 games, and submit a bracket autonomously based on a provided URL. A leaderboard tracks which AI picks perform best across the tournament. The project explores an agent-first UX, delivering plain-text API instructions to agents while humans see the visual site, and includes headless-browsing detection to tailor content for agents; timing dynamics required launching soon after brackets announcement to attract users before the deadline. Source-hackernews
- LlamaIndex Launches LlamaParse to Audit AI Document Work — The post discusses the difficulty of creating UI/UX audit trails for AI agents handling contracts, KYC, diligence, and other docs, emphasizing the need for metadata context beyond basic document conversion. It introduces LlamaParse with vision-language capabilities to identify and segment elements like tables and forms, enabling traceable decisions tied to source documents. Source-twitter
- Apideck CLI – An AI-agent interface with much lower context consumption than MCP — The article introduces Apideck’s CLI as an AI-agent interface that uses a significantly smaller context window than MCP-server. It positions the CLI as a lighter, more efficient alternative for building AI agents, highlighting reduced context consumption and potential cost/latency benefits. Source-hackernews
- GLM 5 Outperforms Claude Code in Real-Time Chat Demo — A Reddit user and heavy Claude Code user tests OpenCode with GLM 5 and Kimi K2.5, comparing prompts against Claude Code. On a simple dashboard inventory task GLM and Claude are close, but on a real-time chat app using WebSockets GLM outperforms Claude Code, which struggles with streaming. The author concludes GLM is the stronger option and invites more challenging coding tasks to highlight the gap. Source-reddit
- Mistral Small 4 Image Capabilities Poor, API Review Finds — An author tests Mistral Small 4’s image abilities using the official API and finds results to be notably poor, ruling out quantization or tooling as the cause. The post argues the model’s intrinsic image capability is weak and includes an example where a festival image prompt yields a nonsensical 200-word description. Source-reddit
- Reordering GPUs boosts llama.cpp speed on asymmetrical PCIe lanes — A dual RTX 3090 system with a 16x/4x PCIe split on an X570 board saw llama.cpp prompt-processing speed double for MoE models after setting CUDA_VISIBLE_DEVICES="1,0". The improvement applies to asymmetrical lane setups; users should verify their PCIe lane distribution with nvtop or lspci. Environment specifics include Ubuntu Server 24.04 and NVIDIA driver 580.126.20. Source-reddit
- Seeking Best Private, Local Coding Agent for CLI Use — A Reddit post asks for recommendations on a private, locally hosted coding agent that can run from the command line and manage project context without telemetry. The author compares cloud-based tools like ChatGPT Codex to local/open-source options (Aider, OpenCode, OpenCodex) and discusses compatibility with local models such as Claude and llama-swap. They seek suggestions for a CLI-based solution that can auto-detect files for context, edit project files, and optionally execute code. Source-reddit
- Claude Code Adds Continual Learning Capability — Anthropic’s Claude Code appears to gain continual learning, potentially allowing ongoing knowledge updates without full retraining. If true, this could help its coding assistant stay current with libraries and practices. The report originates from a tweet with limited details. Source-twitter
- MiniMax M2.7 on the Way; Could It Be Multimodal — A Reddit discussion speculates that the upcoming MiniMax M2.7 may support multimodal capabilities. The thread, posted by user Few_Painter_5588, questions whether the model could handle multiple modalities. It signals interest in multimodal AI from the community but offers no official confirmation. Source-reddit
- Cursor AI in OSS: Speed Comes at Quality Cost — An arXiv 2025 preprint analyzes Cursor AI usage in open-source projects, examining its impact on development workflows. The study suggests a speed–quality trade-off, where faster coding may reduce code quality, generating notable discussion on Hacker News. Source-hackernews
- AI tools dampen interest in CS fundamentals — An HN post argues that powerful AI coding assistants can generate solutions quickly, reducing motivation to study CS fundamentals like distributed systems and algorithms. It invites experienced engineers to explain why fundamentals remain important in an AI-driven development landscape. Source-hackernews
- Code smells for AI inference deployments — A tweet discusses applying ‘code smells’—anti-patterns from software development—to AI inference deployments. It suggests that people may have forgotten what a code smell is in the context of deploying models, highlighting concerns about deployment quality and maintainability. Source-twitter
- Online K-Means Clustering Lecture by Memisevic (2013) — A tweet notes ongoing interest in online k-means clustering and highlights a 2013 lecture by Roland Memisevic. It suggests revisiting the lecture as a useful resource for understanding online clustering methods. The post blends nostalgia with curiosity about machine learning techniques. Source-twitter
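For readers following the last item, here is a minimal sketch of online (streaming) k-means in one dimension, using the classic per-point centroid update with a shrinking 1/n step size. This is a generic illustration of the technique, not code from the lecture:

```python
def online_kmeans(stream, k):
    """Streaming k-means: each incoming point nudges its nearest centroid."""
    points = list(stream)
    centroids = points[:k]  # seed centroids from the first k points
    counts = [1] * k
    for x in points:
        # Assign x to the nearest centroid by squared distance.
        j = min(range(k), key=lambda i: (x - centroids[i]) ** 2)
        counts[j] += 1
        # Move the winning centroid toward x with step size 1/count,
        # so each centroid converges to the running mean of its points.
        centroids[j] += (x - centroids[j]) / counts[j]
    return sorted(centroids)

data = [0.1, 9.9, 0.2, 10.1, 0.0, 10.0, 0.15, 9.95]
print(online_kmeans(data, k=2))  # two centroids, near 0.1 and near 10.0
```

Unlike batch k-means, this processes each point once and needs only O(k) memory, which is why the online variant keeps drawing interest for streaming workloads.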
Generated by AI News Agent | 2026-03-17