DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Scientific paper by DeepSeek Research on using reinforcement learning to improve the reasoning capabilities of large language models
Type: academic work · Wikidata ID: Q131920821

Summary

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning is an academic work[1].

Key Facts

  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author (P50) is recorded as Daya Guo[2].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author (P50) is recorded as Ruoyu Zhang[3].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author (P50) is recorded as Runxin Xu[4].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author (P50) is recorded as Qihao Zhu[5].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author (P50) is recorded as Shirong Ma[6].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author (P50) is recorded as Xiaokang Zhang[7].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's instance of (P31) is recorded as academic work[8].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning was released on January 22, 2025[9].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's main subject (P921) is recorded as reinforcement learning[10].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's main subject (P921) is recorded as DeepSeek-R1[11].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's title is recorded as DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning[12].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Dejian Yang[13].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Haowei Zhang[14].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Junxiao Song[15].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Peiyi Wang[16].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Xiao Bi[17].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Xingkai Yu[18].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Yu Wu[19].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Z.F. Wu[20].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Zhibin Gou[21].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Ziyi Gao[22].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Aixin Liu[23].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Bing Xue[24].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Bingxuan Wang[25].
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's author name string is recorded as Bochao Wu[26].

Body

Designation and Status

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning's instance of (P31) is recorded as academic work[8].

References

Programmatic citations — every numbered marker resolves to a verifiable graph row below.

Direct Wikidata claims

  1. [8] wikidata.org.
  2. [2] wikidata.org.
  3. [3] wikidata.org.
  4. [4] wikidata.org.
  5. [5] wikidata.org.
  6. [6] wikidata.org.
  7. [7] wikidata.org.
  8. [9] wikidata.org.
  9. [10] wikidata.org.
  10. [11] wikidata.org.
  11. [12] wikidata.org.
  12. [13] wikidata.org.
  13. [14] wikidata.org.
  14. [15] wikidata.org.
  15. [16] wikidata.org.
  16. [17] wikidata.org.
  17. [18] wikidata.org.
  18. [19] wikidata.org.
  19. [20] wikidata.org.
  20. [21] wikidata.org.
  21. [22] wikidata.org.
  22. [23] wikidata.org.
  23. [24] wikidata.org.
  24. [25] wikidata.org.
  25. [26] wikidata.org.

Class ancestry

  1. [1] Wikidata. wikidata.org.
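
The claim that every numbered marker resolves to a reference row can be checked mechanically. The following is a minimal sketch, assuming markers appear in the text as bracketed integers such as [2] and that each reference row starts with a list ordinal followed by the bracketed marker. The function name find_unresolved_markers and the sample strings are hypothetical and are not part of this page or of any Wikidata tooling.

```python
import re


def find_unresolved_markers(body_text: str, reference_text: str) -> set[int]:
    """Return citation numbers used in body_text that have no row in reference_text."""
    # Markers in the facts look like "Daya Guo[2]".
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body_text)}
    # Reference rows look like "  2. [2] wikidata.org."; the bracketed
    # number is the marker, the leading ordinal is only the list position.
    listed = {
        int(n)
        for n in re.findall(r"^\s*\d+\.\s*\[(\d+)\]", reference_text, flags=re.MULTILINE)
    }
    return cited - listed


# Hypothetical sample: marker [9] is cited but has no reference row.
body = "author (P50): Daya Guo[2]. instance of (P31): academic work[8]. released[9]."
refs = "  1. [8] wikidata.org.\n  2. [2] wikidata.org.\n"
print(find_unresolved_markers(body, refs))
```

An empty result would mean that every marker used in the facts has a matching row. The check says nothing about whether the row itself points at the right Wikidata claim.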

Edit History

Rolling log of changes to this entity's Wikidata record. Values shown reflect the current state of each edited property — follow the history link to see the precise diff for any edit.

  1. 9h ago · Trilotat · 2026-06-30
    Volume 645
    Main subject reinforcement learning, DeepSeek-R1
    Google scholar paper id 2469397274690356930
    Author Daya Guo, Ruoyu Zhang, Runxin Xu +14
    + 12 other properties edited (see Wikidata diff for full list)
    "/* wbmergeitems-from:0||Q136326099 */"