DeepSeek-R1 reasoning models rival OpenAI in performance
DeepSeek has unveiled its first-generation models, DeepSeek-R1 and DeepSeek-R1-Zero, both designed to tackle complex reasoning tasks.
DeepSeek-R1-Zero is trained solely through large-scale reinforcement learning (RL) without relying on supervised fine-tuning (SFT) as a preliminary step. According to DeepSeek, this …