AI researcher Sam Paech has created a new test, Spiral-Bench, that shows how some AI models can trap users in "escalatory delusion loops." The results reveal major differences in how safely these ...
DeepSeek-V3.1 lets users switch between two modes. Think mode (deepseek-reasoner) is tuned for multi-step reasoning and tool use, while non-think mode (deepseek-chat) is designed for simpler tasks.
Google is adding a new AI-powered coach to Fitbit, built on its Gemini model. The feature is designed to act as a personal fitness, sleep, and wellness advisor and will launch in October as a preview ...
Yann LeCun, Meta's AI icon and longtime head of the FAIR research group, will now report to 28-year-old Alexandr Wang. Wang, who founded Scale AI, was recently tapped to lead the new Meta ...
According to Character.ai CEO Karandeep Anand, users spend an average of 80 minutes a day chatting with AI-generated fictional characters. That puts Character.ai nearly on par with apps like TikTok ...
At a recent dinner with journalists in San Francisco, including The Verge's Alex Heath, Altman acknowledged that the GPT-5 rollout was far from perfect. "I think we totally screwed up some things on ...
Weighing less than 49 grams, the device features a 12-megapixel wide-angle camera, integrated speakers, and an AI voice assistant powered by OpenAI's GPT models and Google Gemini. Users can ask for ...
OpenAI is reportedly planning an investment in Merge Labs, a startup developing brain-computer interfaces that would put it in direct competition with Elon Musk's company Neuralink, according to the ...
An AI system from OpenAI has earned a gold medal-level score at the International Olympiad in Informatics (IOI) 2025, one of the world's most prestigious programming competitions for high school ...
Anthropic has raised the context window for Claude Sonnet 4 to one million tokens on the Anthropic API, Amazon Bedrock, and soon Google Cloud Vertex AI. This is five times larger than before, letting ...
OpenAI has unveiled GPT-5, a new AI system that builds on the reasoning advances of the o1 and o3 models and unifies every previous model line into a single adaptive architecture. According to the ...
In the ARC-AGI-2 benchmark, which tests a model's general reasoning skills, GPT-5 (High) scored 9.9 percent at a cost of $0.73 per task, according to ARC Prize. Grok 4 (Thinking) did better on ...