• Reinforcement learning from human feedback
    In machine learning, reinforcement learning from human feedback (RLHF) is a technique to align an intelligent agent with human preferences. It involves...
    62 KB (8,617 words) - 19:50, 11 May 2025
  • GPT-4
    licensed from third-party providers"). Then, it was fine-tuned for human alignment and policy compliance, notably with reinforcement learning from human feedback...
    63 KB (6,043 words) - 00:21, 1 August 2025
  • Reinforcement learning
    Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions...
    69 KB (8,200 words) - 18:16, 17 July 2025
  • learning, but there are also techniques to fine-tune a model using weak supervision. Fine-tuning can be combined with a reinforcement learning from human...
    12 KB (1,274 words) - 04:17, 29 July 2025
  • Generative pre-trained transformer
    GPT-3 family was the use of reinforcement learning from human feedback (RLHF) to better align the models' behavior with human preferences. This led to the...
    54 KB (4,320 words) - 20:33, 2 August 2025
  • Claude (language model) (category Machine learning)
    been fine-tuned, notably using constitutional AI and reinforcement learning from human feedback (RLHF). Constitutional AI is an approach developed by...
    26 KB (2,274 words) - 20:30, 2 August 2025
  • Paul Christiano (category Machine learning researchers)
    paper "Deep Reinforcement Learning from Human Preferences" (2017) and other works developing reinforcement learning from human feedback (RLHF). He is...
    14 KB (1,221 words) - 00:26, 6 June 2025
  • GPT-4.5
    This method was combined with supervised fine-tuning and reinforcement learning from human feedback. The computational resources needed for training were...
    7 KB (630 words) - 15:27, 23 July 2025
  • Waluigi effect
    Waluigi". AI alignment Hallucination Existential risk from AGI Reinforcement learning from human feedback (RLHF) Suffering risks Bereska, Leonard; Gavves,...
    6 KB (625 words) - 17:36, 19 July 2025
  • having the human in the feedback loop of the computational process Reinforcement learning from human feedback MIM-104 Patriot - Examples of a human-on-the-loop...
    8 KB (978 words) - 16:01, 10 April 2025
  • Deep reinforcement learning
    Deep reinforcement learning (DRL) is a subfield of machine learning that combines principles of reinforcement learning (RL) and deep learning. It involves...
    12 KB (1,658 words) - 13:16, 21 July 2025
  • 360-degree feedback Biofeedback Climate change feedback, for positive and negative feedbacks associated with climate change Reinforcement learning from human feedback...
    3 KB (416 words) - 20:57, 3 May 2025
  • to attempt to replicate GPT-3. Leahy is sceptical of reinforcement learning from human feedback as a solution to the alignment problem. “These systems...
    5 KB (529 words) - 07:10, 19 May 2025
  • with reinforcement learning from human feedback). Some researchers take an anthropomorphic perspective and posit that hallucinations arise from a tension...
    70 KB (7,151 words) - 21:37, 29 July 2025
  • Large language model (category Deep learning)
    assistant. Techniques like reinforcement learning from human feedback (RLHF) or constitutional AI can be used to instill human preferences and make LLMs...
    136 KB (14,372 words) - 10:24, 3 August 2025
  • Reasoning language model (category Machine learning)
    model on human ranked preference data, as in reinforcement learning from human feedback. A base model can also be fine-tuned to predict, from a partial...
    26 KB (3,061 words) - 21:30, 31 July 2025
  • ChatGPT
    process involved supervised learning and reinforcement learning from human feedback (RLHF). Both approaches employed human trainers to improve model performance...
    171 KB (15,047 words) - 07:11, 3 August 2025
  • Self-Correction via Reinforcement Learning (SCoRe) which rewards the model for improving its responses. Early research explored PRMs to provide feedback on each reasoning...
    8 KB (763 words) - 11:13, 20 July 2025
  • Policy gradient method (category Reinforcement learning)
    Policy gradient methods are a class of reinforcement learning algorithms. Policy gradient methods are a sub-class of policy optimization methods. Unlike...
    31 KB (6,297 words) - 20:12, 9 July 2025
  • useful, and harmless, usually with a technique called reinforcement learning from human feedback (RLHF). Current GPT models are prone to generating falsehoods...
    285 KB (29,145 words) - 07:39, 1 August 2025
  • Scale AI
    languages. Outlier tasks involve content evaluation and reinforcement learning from human feedback (RLHF). Capoot, Ashley (June 18, 2025). "Tech Scale AI...
    25 KB (2,312 words) - 05:00, 2 August 2025
  • Imagination
    2022). "Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback". p. 26. arXiv:2211.11602 [cs.LG]. Allen, K.R.; Lopez-Guevara...
    71 KB (7,546 words) - 19:12, 23 June 2025
  • Sparrow (chatbot)
    which has 70 billion parameters. Sparrow is trained using reinforcement learning from human feedback (RLHF), although some supervised fine-tuning techniques...
    10 KB (571 words) - 20:51, 5 March 2024
  • AI alignment
    Existential risk from artificial general intelligence AI takeover AI capability control Reinforcement learning from human feedback Regulation of artificial...
    133 KB (13,069 words) - 15:35, 21 July 2025
  • Feedback
    positive and negative reinforcement or punishment rather than feedback. Yet even within a single discipline an example of feedback can be called either...
    48 KB (5,792 words) - 09:15, 20 July 2025
  • Toloka (category Human-based computation)
    Toloka provides services such as model fine tuning, reinforcement learning from human feedback, evaluation, adhoc datasets, which require large volumes...
    8 KB (716 words) - 18:43, 19 June 2025
  • Agentic AI
    in deep learning, reinforcement learning, and neural networks allowed AI systems to learn on their own and make decisions with minimal human guidance...
    10 KB (1,027 words) - 10:57, 30 July 2025
  • Language and Communication Technologies
    assistant. Methods such as reinforcement learning from human feedback (RLHF) or constitutional AI can be used to embed human preferences and make LLMs...
    10 KB (1,150 words) - 09:42, 30 July 2025
  • Prompt injection
    filtering, prompt evaluation, reinforcement learning from human feedback, and prompt engineering to distinguish user input from system instructions. Additional...
    28 KB (2,958 words) - 23:30, 1 August 2025
  • Multi-agent reinforcement learning
    Multi-agent reinforcement learning (MARL) is a sub-field of reinforcement learning. It focuses on studying the behavior of multiple learning agents that...
    29 KB (3,030 words) - 12:25, 24 May 2025
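The technique these results keep pointing at, RLHF, rests on a reward-modelling step: a reward model is fitted to pairwise human preferences, and a policy is then optimized against it. As a minimal sketch of that first step only (a hypothetical toy setup with a linear reward model and hand-made preference pairs, not any particular system's implementation), the Bradley-Terry loss can be minimized by gradient descent:

```python
import math

def reward(w, x):
    # Toy linear reward model: r(x) = w . x
    return sum(wi * xi for wi, xi in zip(w, x))

def preference_prob(w, preferred, rejected):
    # Bradley-Terry: P(preferred beats rejected) = sigmoid(r(preferred) - r(rejected))
    return 1.0 / (1.0 + math.exp(-(reward(w, preferred) - reward(w, rejected))))

def train_reward_model(pairs, dim, lr=0.5, steps=200):
    # Gradient descent on the negative log-likelihood of the observed preferences:
    # d(-log p)/dw = -(1 - p) * (preferred - rejected)
    w = [0.0] * dim
    for _ in range(steps):
        for preferred, rejected in pairs:
            p = preference_prob(w, preferred, rejected)
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (preferred[i] - rejected[i])
    return w

# Hypothetical data: human raters prefer responses with a larger first feature.
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.8, 0.1], [0.2, 0.9])]
w = train_reward_model(pairs, dim=2)
```

After fitting, the learned model scores the preferred-style response higher, and in full RLHF that score would drive a policy-gradient update of the language model (the step the Policy gradient method entry above covers).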