• Reinforcement learning from human feedback
    In machine learning, reinforcement learning from human feedback (RLHF) is a technique to align an intelligent agent with human preferences. It involves...
    62 KB (8,617 words) - 19:50, 11 May 2025
  • GPT-4
    licensed from third-party providers"). Then, it was fine-tuned for human alignment and policy compliance, notably with reinforcement learning from human feedback...
    63 KB (6,043 words) - 00:21, 1 August 2025
  • Reinforcement learning
    Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions...
    69 KB (8,200 words) - 18:16, 17 July 2025
  • learning, but there are also techniques to fine-tune a model using weak supervision. Fine-tuning can be combined with a reinforcement learning from human...
    12 KB (1,274 words) - 04:17, 29 July 2025
  • Generative pre-trained transformer
    GPT-3 family was the use of reinforcement learning from human feedback (RLHF) to better align the models' behavior with human preferences. This led to the...
    54 KB (4,320 words) - 20:33, 2 August 2025
  • Claude (language model) (category Machine learning)
    been fine-tuned, notably using constitutional AI and reinforcement learning from human feedback (RLHF). Constitutional AI is an approach developed by...
    26 KB (2,274 words) - 20:30, 2 August 2025
  • Paul Christiano (category Machine learning researchers)
    paper "Deep Reinforcement Learning from Human Preferences" (2017) and other works developing reinforcement learning from human feedback (RLHF). He is...
    14 KB (1,221 words) - 00:26, 6 June 2025
  • GPT-4.5
    This method was combined with supervised fine-tuning and reinforcement learning from human feedback. The computational resources needed for training were...
    7 KB (630 words) - 15:27, 23 July 2025
  • Waluigi effect
    Waluigi". AI alignment Hallucination Existential risk from AGI Reinforcement learning from human feedback (RLHF) Suffering risks Bereska, Leonard; Gavves,...
    6 KB (625 words) - 17:36, 19 July 2025
  • having the human in the feedback loop of the computational process Reinforcement learning from human feedback MIM-104 Patriot - Examples of a human-on-the-loop...
    8 KB (978 words) - 16:01, 10 April 2025
  • Deep reinforcement learning
    Deep reinforcement learning (DRL) is a subfield of machine learning that combines principles of reinforcement learning (RL) and deep learning. It involves...
    12 KB (1,658 words) - 13:16, 21 July 2025
  • 360-degree feedback Biofeedback Climate change feedback, for positive and negative feedbacks associated with climate change Reinforcement learning from human feedback...
    3 KB (416 words) - 20:57, 3 May 2025
  • to attempt to replicate GPT-3. Leahy is sceptical of reinforcement learning from human feedback as a solution to the alignment problem. “These systems...
    5 KB (529 words) - 07:10, 19 May 2025
  • with reinforcement learning from human feedback). Some researchers take an anthropomorphic perspective and posit that hallucinations arise from a tension...
    70 KB (7,151 words) - 21:37, 29 July 2025
  • Large language model (category Deep learning)
    assistant. Techniques like reinforcement learning from human feedback (RLHF) or constitutional AI can be used to instill human preferences and make LLMs...
    136 KB (14,372 words) - 10:24, 3 August 2025
  • Reasoning language model (category Machine learning)
    model on human ranked preference data, as in reinforcement learning from human feedback. A base model can also be fine-tuned to predict, from a partial...
    26 KB (3,061 words) - 21:30, 31 July 2025
  • ChatGPT
    process involved supervised learning and reinforcement learning from human feedback (RLHF). Both approaches employed human trainers to improve model performance...
    171 KB (15,047 words) - 07:11, 3 August 2025
  • Self-Correction via Reinforcement Learning (SCoRe) which rewards the model for improving its responses. Early research explored PRMs to provide feedback on each reasoning...
    8 KB (763 words) - 11:13, 20 July 2025
  • Policy gradient method (category Reinforcement learning)
    Policy gradient methods are a class of reinforcement learning algorithms. Policy gradient methods are a sub-class of policy optimization methods. Unlike...
    31 KB (6,297 words) - 20:12, 9 July 2025
  • useful, and harmless, usually with a technique called reinforcement learning from human feedback (RLHF). Current GPT models are prone to generating falsehoods...
    285 KB (29,145 words) - 07:39, 1 August 2025
  • Scale AI
    languages. Outlier tasks involve content evaluation and reinforcement learning from human feedback (RLHF). Capoot, Ashley (June 18, 2025). "Tech Scale AI...
    25 KB (2,312 words) - 05:00, 2 August 2025
  • Imagination
    2022). "Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback". p. 26. arXiv:2211.11602 [cs.LG]. Allen, K.R.; Lopez-Guevara...
    71 KB (7,546 words) - 19:12, 23 June 2025
  • Sparrow (chatbot)
    which has 70 billion parameters. Sparrow is trained using reinforcement learning from human feedback (RLHF), although some supervised fine-tuning techniques...
    10 KB (571 words) - 20:51, 5 March 2024
  • AI alignment
    Existential risk from artificial general intelligence AI takeover AI capability control Reinforcement learning from human feedback Regulation of artificial...
    133 KB (13,069 words) - 15:35, 21 July 2025
  • Feedback
    positive and negative reinforcement or punishment rather than feedback. Yet even within a single discipline an example of feedback can be called either...
    48 KB (5,792 words) - 09:15, 20 July 2025
  • Toloka (category Human-based computation)
    Toloka provides services such as model fine tuning, reinforcement learning from human feedback, evaluation, adhoc datasets, which require large volumes...
    8 KB (716 words) - 18:43, 19 June 2025
  • Agentic AI
    in deep learning, reinforcement learning, and neural networks allowed AI systems to learn on their own and make decisions with minimal human guidance...
    10 KB (1,027 words) - 10:57, 30 July 2025
  • Language and Communication Technologies
    assistant. Methods such as reinforcement learning from human feedback (RLHF) or constitutional AI can be used to embed human preferences and make LLMs...
    10 KB (1,150 words) - 09:42, 30 July 2025
  • Prompt injection
    filtering, prompt evaluation, reinforcement learning from human feedback, and prompt engineering to distinguish user input from system instructions. Additional...
    28 KB (2,958 words) - 23:30, 1 August 2025
  • Multi-agent reinforcement learning
    Multi-agent reinforcement learning (MARL) is a sub-field of reinforcement learning. It focuses on studying the behavior of multiple learning agents that...
    29 KB (3,030 words) - 12:25, 24 May 2025
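The technique these results keep pointing at, RLHF, rests on a reward-modelling step: a reward model is fitted to pairwise human preferences, and a policy is then optimized against it. As a minimal sketch of that first step only (a hypothetical toy setup with a linear reward model and hand-made preference pairs, not any particular system's implementation), the Bradley-Terry loss can be minimized by gradient descent:

```python
import math

def reward(w, x):
    # Toy linear reward model: r(x) = w . x
    return sum(wi * xi for wi, xi in zip(w, x))

def preference_prob(w, preferred, rejected):
    # Bradley-Terry: P(preferred beats rejected) = sigmoid(r(preferred) - r(rejected))
    return 1.0 / (1.0 + math.exp(-(reward(w, preferred) - reward(w, rejected))))

def train_reward_model(pairs, dim, lr=0.5, steps=200):
    # Gradient descent on the negative log-likelihood of the observed preferences:
    # d(-log p)/dw = -(1 - p) * (preferred - rejected)
    w = [0.0] * dim
    for _ in range(steps):
        for preferred, rejected in pairs:
            p = preference_prob(w, preferred, rejected)
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (preferred[i] - rejected[i])
    return w

# Hypothetical data: human raters prefer responses with a larger first feature.
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.8, 0.1], [0.2, 0.9])]
w = train_reward_model(pairs, dim=2)
```

After fitting, the learned model scores the preferred-style response higher, and in full RLHF that score would drive a policy-gradient update of the language model (the step the Policy gradient method entry above covers).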