AI Daily — 2026-05-23



Covering 23 AI news items

🔥 Top Stories

1. Demis Hassabis: Singularity may arrive in a few years via AGI

Demis Hassabis says the technological Singularity could be only a few years away, potentially triggered by the arrival of true AGI. He argues the transformative impact will make it the most important technology ever. Source-twitter

2. Anthropic valued at $61.5B, xAI at $80B

A thread highlights Anthropic’s latest valuation of around $61.5 billion and Elon Musk’s xAI at about $80 billion, noting that both companies are private and have limited revenue. The discussion also touches on data-center usage, local impacts, and related commentary from political figures and media posts. Source-twitter

3. ZeroEntropy Builds 4-8x Faster Task-Specific AI Models

ZeroEntropy, a six-person team, is building task-specific AI models that the team claims are 4-8x faster than models from OpenAI or Anthropic. The project has drawn about 500K downloads on HuggingFace and focuses on production AI systems, offering rerankers, embeddings, and custom-trained models that it describes as state-of-the-art, with higher accuracy, lower latency, and lower cost. Source-twitter

📰 Featured

LLM

Command A+ 218B MoE Runs on Apple Silicon via MLX Port — Cohere released Command A+ (218B total, 25B active, 128 experts top-8) and added a cohere2_moe port for mlx-lm to run on Apple Silicon. The post includes architecture notes, quantization caveats (W4A4 artifacts vs BF16), and performance data, with a PR open on ml-explore. On a larger box, BF16→Q8 work yielded ~22.9 tok/s generation and ~57.6 tok/s prompt with a 241GB peak. Source-reddit
GPT-5.5 praised as strong, edges Opus in competition — An AI commentator on May 22 hailed GPT-5.5 as a very good model, noting major improvements for complex agent tasks. The commentator says GPT-5.2 was far behind Opus, and that switching back to Opus 4.7 after using GPT-5.5 feels like a regression. The post lauds competition and frames OpenAI as staging a strong comeback. Source-twitter
DSA Sparse Attention Added to LLMs-from-Scratch Repo — A from-scratch implementation of DeepSeek Sparse Attention (DSA) was added to the LLMs-from-scratch repo, thanks to a reader contribution. The post includes motivation and an overview, plus a GPT-style reference model as standalone example code. Source-twitter
DeepSeek Makes V4 Pro API Price Cut Permanent — DeepSeek has made its temporary 75% price cut for the first-party V4 Pro API permanent, bringing input to $0.435 and output to $0.87 per 1M tokens, with a reported blended price of ~$0.18/1M. Running the Artificial Analysis Intelligence Index (Reasoning, Max Effort) on V4 Pro costs about $268, far cheaper than Gemini 3.1 Pro Preview, GPT-5.5, and Claude Opus 4.7. The company positions V4 Pro, alongside V4 Flash, on the Pareto frontier of Intelligence Index vs Cost to Run Intelligence Index. Source-twitter
Hermes Agent Enables Cross-Session Memory and Learning — Hermes Agent is described as one of the first AI projects to remember everything across sessions and improve with use. It markets multi-layer memory, self-evolving skills, and autonomous 24/7 agents with cross-session recall, aiming to feel like an operator rather than a tool. Source-twitter
llama.cpp server gains built-in native tools for AI tasks — An experimental flag --tools in the llama.cpp server enables native tools such as read_file, file_glob_search, grep_search, exec_shell_command, write_file, edit_file, apply_diff, and get_datetime. This turns llama-server into a mini agent harness for local AI work, though there is currently no security sandboxing or whitelist of allowed commands. Source-reddit
Chrome Gemma4 Runs Gemini Nano Locally on PC Without GPU — A Chrome extension enables Gemini Nano (Gemma4) to run fully locally on a PC without a GPU via Google Chrome. The setup reportedly uses about 16 GB RAM and ~9216 tokens per session, with no llama.cpp or tinkering required; it’s distributed as a one-click extension called Dobby on the Chrome Web Store and shared on Reddit. Source-reddit
Best Small LLMs You Can Run Without a GPU — A Reddit post asks for recommendations on the current best small language models that can run on CPU (no GPU), balancing accuracy and speed. The author also seeks details on deployment stacks and practical CPU-only deployment experiences. Source-reddit
MoE outperforms dense for RAG tasks, user reports — A Reddit user building an all-in-one RAG system over large datasets compares Mixture-of-Experts (MoE) against dense models. They report that qwen3.6 35b APEX provides better, more information-rich answers than a dense model and offers higher throughput on a single RTX 3090. The discussion touches on concerns like misinformation and auditability, as well as practical tokens-per-second performance differences. Source-reddit
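
A rough sizing sketch for the Command A+ item above. The parameter counts (218B total, 25B active) come from the post; the helper names, the decimal-gigabyte convention, and the assumption of exactly 16 or 8 bits per weight with no quantization metadata are illustrative only. Under those assumptions the weights alone come to roughly 218 GB at 8 bits, which is at least consistent with the reported 241GB peak once KV cache and runtime buffers are added.

```python
# Back-of-the-envelope sizing for a Mixture-of-Experts model.
# Assumption: every weight is stored at a fixed bit width; real quantized
# formats add scales/zero-points, so actual files are somewhat larger.

def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of parameters used for a single token's forward pass."""
    return active_params / total_params


TOTAL_PARAMS = 218e9   # from the post: 218B total
ACTIVE_PARAMS = 25e9   # from the post: 25B active per token

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 16)          # 436.0
q8_gb = weight_memory_gb(TOTAL_PARAMS, 8)             # 218.0
frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)   # about 0.115

print(f"BF16 weights: ~{bf16_gb:.0f} GB")
print(f"Q8 weights:   ~{q8_gb:.0f} GB")
print(f"Active share: ~{frac:.1%} of parameters per token")
```

The point of the second helper is the MoE trade-off the items above describe: all 218B parameters must fit in memory, but only about 11.5% of them take part in computing each token, which is why throughput can exceed that of a dense model of similar total size.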

Industry

Computers Solve Erdős Problems, AI Progress Accelerates — A post on Twitter claims that computers are speaking and solving Erdős problems, signaling a shift toward automated problem-solving in AI. It also asserts that gradient descent on deep neural networks shows no sign of plateau, implying rapid AI progress. The post is presented as a wake-up call about the pace of AI advancement. Source-twitter
US Green Card rule interrupts researchers on temporary visas — Many top AI researchers at OpenAI, Anthropic, Google, and Meta are in the U.S. on temporary visas, but U.S. policy requires them to return home to apply for Green Cards. This adds uncertainty, delays, and risk to AI development and to the U.S.’s ability to attract global talent. The author notes these implications for frontier labs and national competitiveness in AI. Source-twitter

Open Source

LongCat-Video-Avatar 1.5 Introduces Whisper-Large Encoder — LongCat-Video-Avatar 1.5 is an open-source upgrade for audio-driven human video generation, building on the LongCat-Video foundation with native AT2V, ATI2V, and Video Continuation capabilities. It replaces Wav2Vec2 with Whisper-Large for smoother lip dynamics and emphasizes production-ready stability, identity consistency, and broad domain generalization including anime and real-world scenes. Source-reddit

⚡ Quick Bites

What Is xAI Missing to Leap Ahead With Compute? — Data Noir replies to @yacineMTB with a serious question about what xAI would need to leap ahead given heavy compute. The tweet asks whether such a leap is currently impossible or could be achieved within a year with the right people. The post frames a broader debate on AI progress and resource requirements around xAI. Source-twitter
Codex Enables End-to-End iPhone Simulator Build and Debug — A Codex-driven workflow demonstrates an end-to-end build-and-debug loop in the iPhone simulator. The setup shows Codex driving the simulator to bug bash a newly built feature and enable HLS playback. Source-twitter
Embeddings for NVIDIA Nemotron Personas with Qwen 0.6B — A Reddit post discusses generating embedding vectors for NVIDIA’s Nemotron-Personas dataset using Qwen 0.6B to enable semantic search and K-Nearest Neighbors clustering. It notes precomputed embeddings for Korea, Japan, France, and the USA, and provides links to the HuggingFace dataset and a web demo. Source-reddit
Local LLMs Handle Accounting Tasks with Claude Integration — A Reddit post describes using Qwen 3.6 27B for monthly closes, bank reconciliations, and payables/receivables, with a self-built SQLite database. The author also integrated Claude skills via the anthropics/financial-services GitHub repo and notes that local models are finally becoming practical, even with budget-limited hardware. They mention running the MTP version overnight on a modest GPU and express optimism about the usefulness of local AI models. Source-reddit
llama.cpp vs LiteRT on Xiaomi 12 Pro 24/7 Server (V2) — An update on a 24/7 headless AI server running llama.cpp and LiteRT on a Xiaomi 12 Pro, featuring a V2 redesign of cooling and power. The post describes a copper heatsink with a back panel fan, a front aluminum plate with two fans and a thermal pad, and cooling that engages at 40°C and stops at 35°C. It also covers a custom PSU wired to the phone’s BMS, with fuses, a crowbar at 4.3V, a backup PSU fan, and a 3D-printed aluminum stand. Source-reddit
30-Run Benchmarks Find Optimal Llama Settings on MI60 GPU — A Reddit user ran 30 llama-bench tests to optimize two models, Gemma4 and Qwen3.6, on an MI60 GPU using a prebuilt Docker container. The goal was to maximize speed and efficiency for Frigate and HomeAssistant, and the post notes quantization performance differences on MI60/MI50. Source-reddit
Karpathy’s nn-zero-to-hero: Neural nets from basics to transformers — A GitHub-hosted educational project outlining a neural networks course by Karpathy. It uses YouTube lectures and Jupyter notebooks to teach backpropagation (micrograd) and language modeling (makemore), with plans to extend to Transformer-style models like GPT. Source-github
Gary Marcus blocks AI debate, user says — An X user claims Gary Marcus is insisting that no one debate him about AI. The author says they shared their thoughts and were blocked. The post notes that the image in the tweet is unrelated. Source-twitter
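
The fan behavior in the Xiaomi 12 Pro server item above (cooling engages at 40°C and stops at 35°C) is a two-threshold hysteresis scheme. The sketch below is a minimal illustration of that idea, not the author's implementation: the function names, the sample readings, and the choice of inclusive comparisons at both thresholds are assumptions.

```python
# Two-threshold (hysteresis) fan control.
# Thresholds are taken from the post; everything else is hypothetical.

ON_THRESHOLD_C = 40.0    # fan turns on at or above this temperature
OFF_THRESHOLD_C = 35.0   # fan turns off at or below this temperature


def next_fan_state(temp_c: float, fan_on: bool) -> bool:
    """Return the new fan state given the current reading and state."""
    if temp_c >= ON_THRESHOLD_C:
        return True
    if temp_c <= OFF_THRESHOLD_C:
        return False
    # Between the thresholds, keep the previous state to avoid
    # rapid on/off switching around a single set point.
    return fan_on


def simulate(temps, fan_on=False):
    """Apply the controller to a sequence of readings; return each state."""
    states = []
    for t in temps:
        fan_on = next_fan_state(t, fan_on)
        states.append(fan_on)
    return states


readings = [33.0, 38.0, 40.5, 38.0, 36.0, 35.0, 37.0]
states = simulate(readings)
print(states)  # [False, False, True, True, True, False, False]
```

Note that 38°C and 37°C yield different fan states depending on whether the temperature is rising from below 35°C or falling from above 40°C. That 5°C gap is what keeps the fans from toggling repeatedly near a single threshold.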

Generated by AI News Agent | 2026-05-23