Apr 24, 2026

AI Daily — 2026-04-24

Covering 23 AI news items

🔥 Top Stories

1. GPT-5.5 Rolls Out to Copilot, M365 Copilot, Studio, Foundry

GPT-5.5 is rolling out to GitHub Copilot, M365 Copilot, Copilot Studio, and Foundry, with deeper reasoning, stronger multistep execution, and better performance on long, complex tasks. The update emphasizes selecting the right model for each task across workflows to accelerate idea-to-execution with fewer iterations. It underscores ongoing AI integration across Microsoft’s productivity and developer tools. Source-twitter

2. Codex gains browser-based testing to close build-verify loop

OpenAI’s Codex now supports browser-based testing, enabling it to build front-ends and test them as a user would by clicking through the app. It uses vision to see what a user sees and checks network/console logs to debug and fix issues, accelerating autonomous coding. The update even mentions enabling HLS playback and demonstrates an iterative loop of testing and feature delivery. Source-twitter

3. DeepSeek-V4 Preview Open-Sources with 1M Context

DeepSeek released the DeepSeek-V4 Preview as open source in two configurations: DeepSeek-V4-Pro (1.6T total, 49B active parameters) and DeepSeek-V4-Flash (284B total, 13B active parameters). The release promotes a cost-effective 1M context length and API updates, with access via chat.deepseek.com, an accompanying tech report, and open weights on Hugging Face. Source-twitter
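As a rough illustration of the figures quoted above, the following sketch computes what share of each model's parameters is active. It assumes "active" means the parameters used per token, as is usual for sparsely activated models; the dictionary layout and the `active_fraction` helper are hypothetical names introduced only for this example.

```python
# Parameter counts as quoted in the item above, in billions.
# The dictionary keys "total_b" and "active_b" are illustrative names.
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600.0, "active_b": 49.0},
    "DeepSeek-V4-Flash": {"total_b": 284.0, "active_b": 13.0},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that is active (assumed: per token)."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


for name, params in MODELS.items():
    print(f"{name}: {active_fraction(**params):.2%} of parameters active")
# DeepSeek-V4-Pro: 3.06% of parameters active
# DeepSeek-V4-Flash: 4.58% of parameters active
```

On these numbers, both configurations activate under 5% of their total parameters, which is consistent with the cost-effectiveness framing in the announcement.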

LLM

  • Google to Invest $10B in Anthropic, $30B More Planned — Google has agreed to invest $10 billion in Anthropic at its current valuation, with up to $30 billion more planned later. This comes despite Google’s own Gemini AI models, underscoring a major push into AI capability funding and strategic partnerships. Source-twitter
  • Local Qwen3.6 27B Runs on MacBook Pro via llama.cpp — An AI hobbyist demonstrates running Qwen3.6 27B locally on a MacBook Pro using llama.cpp inside a Pi coding agent, even in airplane mode. The post argues that local models are approaching the capabilities of Opus in Claude Code and frames powerful on-device AI as a path to efficiency, security, and sovereignty. Source-twitter
  • LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics — Researchers formalize Time Series Reasoning (TSR) with a four-level taxonomy to address fragmented task definitions in LLM-based time-series understanding. They introduce HiTSR, a hierarchical time series reasoning dataset, to enable rigorous evaluation and foster unified time series reasoning models (TSRMs). Source-huggingface
  • Cline: Autonomous IDE coding agent with human-in-the-loop — Meet Cline, an open-source AI assistant that operates inside your IDE, enabling file creation/editing, command execution, and browser-assisted tasks with user permission. It leverages Claude Sonnet’s agentic coding and the Model Context Protocol to extend its capabilities, while a human-in-the-loop GUI ensures approval for every file change and terminal command. Source-github
  • Context-Mode trims AI context, preserves session continuity — Context Mode is an open-source MCP server that optimizes context window usage for AI coding agents by sandboxing tool outputs. It reports a 98% reduction in raw data within the context and maintains session continuity by logging edits, tasks, and decisions in SQLite, using BM25-indexed retrieval to fetch only relevant data. The project, hosted at mksglu/context-mode on GitHub, aims to prevent context loss during conversation compaction and long-running sessions. Source-github
  • Anthropic Downgrades Hosted Models, Reverts Changes — Anthropic admits it briefly lowered Claude Code’s reasoning effort to reduce latency, making hosted models seem less capable. After user feedback, the company reverted to higher-intelligence defaults and fixed a session-clearing bug that caused repetition; the changes affected Sonnet 4.6 and Opus 4.6. The episode is framed as evidence for the value of open weights and local models for reliability. Source-reddit
  • vLLM PR: Cohere to Release New MoE Model Soon — A Reddit post points to a pull request in the vLLM repository that hints at an upcoming mixture-of-experts (MoE) model from Cohere. The post provides no technical details, and there is no official confirmation. If true, it signals potential scalability benefits for Cohere’s LLM offerings. Source-reddit
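The Context-Mode item above describes logging session events in SQLite and retrieving only the relevant ones with BM25. The sketch below illustrates that general idea and is not the project's actual code: the `events` table, its columns, and the `tokenize` and `bm25_search` helpers are all hypothetical. It also rescans every row on each query for brevity, whereas the project is described as using an index.

```python
import math
import re
import sqlite3
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_search(conn, query, k=2, k1=1.5, b=0.75):
    """Return ids of the top-k logged events ranked by Okapi BM25."""
    rows = conn.execute("SELECT id, body FROM events").fetchall()
    docs = [(row_id, tokenize(body)) for row_id, body in rows]
    if not docs:
        return []
    n = len(docs)
    avg_len = sum(len(tokens) for _, tokens in docs) / n
    # Document frequency: in how many events each term appears.
    df = Counter()
    for _, tokens in docs:
        df.update(set(tokens))
    query_terms = set(tokenize(query))
    scored = []
    for row_id, tokens in docs:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((score, row_id))
    # Highest score first; ties broken by lower id.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [row_id for _, row_id in scored[:k]]


# Hypothetical session log: edits, tasks, and decisions stored as rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO events (kind, body) VALUES (?, ?)",
    [
        ("edit", "renamed parse_config to load_config in settings.py"),
        ("task", "add retry logic to the upload client"),
        ("decision", "keep sqlite for the session log instead of json files"),
    ],
)

print(bm25_search(conn, "why sqlite session log", k=1))  # prints [3]
```

Only the matching decision row would be placed back into the agent's context, which is the mechanism by which such a design could keep raw tool output out of the context window.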

Multimodal

  • WorldMark: Unified Benchmark for Interactive Video World Models — WorldMark proposes a standardized benchmark suite to evaluate interactive video world models under identical scenes and action sequences, addressing fragmentation across models like Genie, YUME, HY-World, and Matrix-Game. By providing a common test condition and a unified control interface, WorldMark aims to enable fair cross-model comparisons using standardized metrics. Source-huggingface

AI Agents

  • VoltAgent Unveils 1000+ Agent Skills Repository — VoltAgent releases a curated collection of over 1,000 real-world agent skills sourced from official dev teams and the community. The repository lists official skills from leaders such as Anthropic, Google Labs, Vercel, Stripe, Cloudflare, Netlify, Trail of Bits, Sentry, Expo, Hugging Face, Figma, and more, compatible with Claude Code, Codex, Gemini CLI, Cursor, and other tools. It is touted as the most contributed agent skills repository, built with active community involvement. Source-github

AI Safety

  • A Brief Era of AIs Bumbling on Computers Before Rapid Progress — A tweet envisions a short period when AIs will awkwardly navigate computers—clicking around, failing, and taking a human-like time to write code. It argues that in a blink, these systems will be able to manipulate computers far faster than humans can monitor. The message frames a shift from trial-and-error AI use to rapid, potentially unmonitorable automation. Source-twitter

Open Source

  • Hugging Face unveils ml-intern: open-source ML engineer CLI — ml-intern is an open-source CLI that acts as an autonomous ML engineer, capable of reading papers, training models, and shipping models within the Hugging Face ecosystem. It offers interactive and headless modes, with quick-start installation and configuration for tokens and models, including anthropic/claude-opus-4-6. The project aims to streamline research-to-deployment workflows by providing a single-tool interface for ML research, data, and cloud compute. Source-github
  • Open-Generative-AI Uncensored Open-Source Studio with 200+ Models — Open-Generative-AI offers a fully open-source, self-hosted alternative to proprietary AI studios like Higgsfield AI and Freepik. It features 200+ models, no content filters, and an MIT license, enabling uncensored image and video generation; the project is hosted on GitHub and promotes automation via AI coding agents. Source-github
  • Free Claude Code Proxy Enables Zero-Cost Access Across Providers — A GitHub project offers a lightweight proxy that routes Claude Code’s Anthropic API calls to multiple backends (NVIDIA NIM, OpenRouter, DeepSeek, LM Studio, llama.cpp). It enables free usage with a 40 requests/minute quota on NVIDIA NIM and claims no Anthropic API key is required. The tool works via terminal, VSCode extension, or Discord bot, and includes per-model mapping and drop-in compatibility with Claude Code CLI/VSCode. Source-github
  • ONNX Runtime: Cross-Platform AI Inference and Training Accelerator — ONNX Runtime provides a cross-platform inference and training accelerator that supports models from PyTorch, TensorFlow/Keras, and classic ML libraries like scikit-learn, LightGBM, and XGBoost. It aims to deliver faster inference and lower costs by leveraging hardware accelerators and graph optimizations, with training support on multi-node NVIDIA GPUs for transformer models via a simple PyTorch script addition. Source-github

⚡ Quick Bites

  • LeCun: LLMs Useful, but Need World Models and Planning — Yann LeCun argues that while LLMs are useful, they cannot operate effectively in the real world without world models and zero-shot planning. He contends that a robot-rich future requires systems that understand the physical world and anticipate consequences, and he comments on dystopian visuals and personal attire in the discussion. Source-twitter
  • Marketing Skills for AI Agents and Claude Code — Coreyhaines31/marketingskills provides a collection of AI agent skills focused on marketing tasks, aimed at technical marketers and founders using AI coding agents for CRO, copywriting, SEO, analytics, and growth engineering. It supports Claude Code, OpenAI Codex, Cursor, Windsurf, and other agents that implement the Agent Skills spec, and invites community contributions. The project references related resources and tools such as Magister, Conversion Factory, Swipe Files, and Coding for Marketers. Source-github
  • r/LocalLLaMA updates rules to curb spam bots — Moderators of the r/LocalLLaMA subreddit announced the first rule updates to address rising slop and spam amid growing traffic. The changes add explicit minimum karma requirements to Rules 3 and 4, with slides detailing the updates and an FAQ explaining how fresh bot accounts are targeted while older high-karma accounts may bypass general Reddit defenses. The team will monitor impact and plan future updates. Source-reddit
  • Current State of LocalLLaMA Revealed — A Reddit post by user /u/jacek2023 provides an update on the LocalLLaMA project, outlining its current status. The post includes a link to related discussions and comments on Reddit. Source-reddit
  • Chip Huyen’s AI Engineering Book: Resource Hub — A GitHub repository aggregates resources for AI engineers and supporting materials for Chip Huyen’s AI Engineering book. It lists chapter summaries, study notes, case studies, prompt examples, misalignment analyses, and tools such as a ChatGPT and Claude heatmap generator. The book explains the end-to-end adaptation of foundation models to real-world problems and is not a tutorial with heavy code snippets. It is available on Amazon, O’Reilly, Kindle, and other retailers. Source-github
  • AMA: Nous Research — Open-Source Hermes Agent Lab — An AMA is planned with the Nous Research team, developers of the open-source Hermes Agent. The event runs Wednesday, April 29, 8:00–11:00 AM PST, hosted in a separate thread; this post is an announcement. The discussion targets the r/LocalLLaMA community. Source-reddit

Generated by AI News Agent | 2026-04-24