Deep Reinforcement Learning from Human Preferences

paper-0096 · paper · 2017

Paul F. Christiano et al.

Learns a reward function from human comparisons of short trajectory segments; the seed of reinforcement learning from human feedback (RLHF).

Academic, score −0.1953

Metric	Status	Value	Norm.	Weight	Contribution	Source	Confidence	Provenance
citation_count	present	508.0	0.002282	0.5	0.001141	OpenAlex	high	link
library_holdings	missing	recorded as missing, penalized by rule, never imputed			−0.1	recorded as missing; penalized by rule, never imputed
readership_persistence	present	9.0	0.571429	0.05	0.028571	OpenAlex	medium	link
syllabus_adoptions	missing	recorded as missing, penalized by rule, never imputed			−0.125	recorded as missing; penalized by rule, never imputed

Broad Influence, score 0.0290

Metric	Status	Value	Norm.	Weight	Contribution	Source	Confidence	Provenance
citation_count	present	508.0	0.002282	0.2	0.000456	OpenAlex	high	link
library_holdings	missing	recorded as missing, penalized by rule, never imputed			−0.125	recorded as missing; penalized by rule, never imputed
readership_persistence	present	9.0	0.571429	0.4	0.228571	OpenAlex	medium	link
syllabus_adoptions	missing	recorded as missing, penalized by rule, never imputed			−0.075	recorded as missing; penalized by rule, never imputed

Governance Practitioner, score −0.2673

Metric	Status	Value	Norm.	Weight	Contribution	Source	Confidence	Provenance
citation_count	present	508.0	0.002282	0.25	0.000571	OpenAlex	high	link
library_holdings	missing	recorded as missing, penalized by rule, never imputed			−0.15	recorded as missing; penalized by rule, never imputed
readership_persistence	present	9.0	0.571429	0.1	0.057143	OpenAlex	medium	link
syllabus_adoptions	missing	recorded as missing, penalized by rule, never imputed			−0.175	recorded as missing; penalized by rule, never imputed
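
The three scores above appear to be the sum of each table's Contribution column: a present metric contributes Norm. × Weight, and a missing metric contributes its fixed penalty. The page does not state this formula outright, so the sketch below is an inferred reading, not the ranking system's actual code. The names `NORM`, `LENSES`, and `lens_score` are hypothetical; the numbers are copied from the tables.

```python
# Assumed reading of the tables: score = sum(norm * weight for present metrics)
#                                        + sum(penalties for missing metrics).
# All identifiers are illustrative; only the numeric values come from the page.

# Normalized values of the two metrics recorded as present.
NORM = {
    "citation_count": 0.002282,
    "readership_persistence": 0.571429,
}

# Per-lens weights for present metrics and fixed penalties for missing ones.
LENSES = {
    "Academic": {
        "weights": {"citation_count": 0.5, "readership_persistence": 0.05},
        "penalties": {"library_holdings": -0.1, "syllabus_adoptions": -0.125},
    },
    "Broad Influence": {
        "weights": {"citation_count": 0.2, "readership_persistence": 0.4},
        "penalties": {"library_holdings": -0.125, "syllabus_adoptions": -0.075},
    },
    "Governance Practitioner": {
        "weights": {"citation_count": 0.25, "readership_persistence": 0.1},
        "penalties": {"library_holdings": -0.15, "syllabus_adoptions": -0.175},
    },
}


def lens_score(norm, weights, penalties):
    """Sum weighted present metrics, then add the (negative) missing-data penalties."""
    present = sum(norm[metric] * weight for metric, weight in weights.items())
    missing = sum(penalties.values())
    return present + missing


scores = {
    name: round(lens_score(NORM, cfg["weights"], cfg["penalties"]), 4)
    for name, cfg in LENSES.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

Under this reading the sketch reproduces the published values (−0.1953, 0.0290, −0.2673). It also shows that the two missing-data penalties account for most of each score, since the normalized citation count is close to zero.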

A rank is not a verdict on intrinsic worth. It is a transparent output of declared evidence, weights, and missing-data rules at a specific release date.

Disagree with this rank or a number? Challenge it with your evidence. Every challenge gets a public identifier and a published resolution.