Risks from Learned Optimization in Advanced Machine Learning Systems

paper-0112 · paper · 2019

Evan Hubinger et al.

Mesa-optimization and deceptive alignment; core inner-alignment concepts.

Academic, score -0.2035

Metric	Status	Value	Norm.	Weight	Contribution	Source	Confidence	Provenance
citation_count	present	25.0	0.000108	0.5	5.4e-05	OpenAlex	high	link
library_holdings	missing	recorded as missing, penalized by rule, never imputed			−0.1	recorded as missing; penalized by rule, never imputed
readership_persistence	present	7.0	0.428571	0.05	0.021429	OpenAlex	medium	link
syllabus_adoptions	missing	recorded as missing, penalized by rule, never imputed			−0.125	recorded as missing; penalized by rule, never imputed

Broad Influence, score -0.0285

Metric	Status	Value	Norm.	Weight	Contribution	Source	Confidence	Provenance
citation_count	present	25.0	0.000108	0.2	2.2e-05	OpenAlex	high	link
library_holdings	missing	recorded as missing, penalized by rule, never imputed			−0.125	recorded as missing; penalized by rule, never imputed
readership_persistence	present	7.0	0.428571	0.4	0.171429	OpenAlex	medium	link
syllabus_adoptions	missing	recorded as missing, penalized by rule, never imputed			−0.075	recorded as missing; penalized by rule, never imputed

Governance Practitioner, score -0.2821

Metric	Status	Value	Norm.	Weight	Contribution	Source	Confidence	Provenance
citation_count	present	25.0	0.000108	0.25	2.7e-05	OpenAlex	high	link
library_holdings	missing	recorded as missing, penalized by rule, never imputed			−0.15	recorded as missing; penalized by rule, never imputed
readership_persistence	present	7.0	0.428571	0.1	0.042857	OpenAlex	medium	link
syllabus_adoptions	missing	recorded as missing, penalized by rule, never imputed			−0.175	recorded as missing; penalized by rule, never imputed

A rank is not a verdict on intrinsic worth. It is a transparent output of declared evidence, weights, and missing-data rules at a specific release date.
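The view scores above can be checked by hand. The sketch below assumes, based only on the published numbers, that each view score is the sum of its Contribution column, that a present metric contributes Norm. × Weight, and that a missing metric contributes its fixed penalty. The page does not state this formula, so treat it as an inference. The names `VIEWS`, `view_score`, and `check_present_rows` are hypothetical and not part of any published tooling.

```python
# Rows copied from the tables above: (metric, norm, weight, contribution).
# For missing metrics, norm and weight are None and the contribution is the
# published penalty. The summation rule itself is an assumption.
VIEWS = {
    "Academic": [
        ("citation_count", 0.000108, 0.5, 5.4e-05),
        ("library_holdings", None, None, -0.1),
        ("readership_persistence", 0.428571, 0.05, 0.021429),
        ("syllabus_adoptions", None, None, -0.125),
    ],
    "Broad Influence": [
        ("citation_count", 0.000108, 0.2, 2.2e-05),
        ("library_holdings", None, None, -0.125),
        ("readership_persistence", 0.428571, 0.4, 0.171429),
        ("syllabus_adoptions", None, None, -0.075),
    ],
    "Governance Practitioner": [
        ("citation_count", 0.000108, 0.25, 2.7e-05),
        ("library_holdings", None, None, -0.15),
        ("readership_persistence", 0.428571, 0.1, 0.042857),
        ("syllabus_adoptions", None, None, -0.175),
    ],
}


def view_score(rows):
    """Sum the published contributions and round to four decimals."""
    return round(sum(contribution for _, _, _, contribution in rows), 4)


def check_present_rows(rows, tolerance=1e-6):
    """Return True if every present metric satisfies norm * weight ~= contribution.

    The tolerance absorbs the rounding already applied to the published values.
    """
    return all(
        abs(norm * weight - contribution) <= tolerance
        for _, norm, weight, contribution in rows
        if norm is not None and weight is not None
    )


if __name__ == "__main__":
    for name, rows in VIEWS.items():
        print(name, view_score(rows), check_present_rows(rows))
```

Under these assumptions the sums reproduce the three published scores. In every view the two missing-data penalties outweigh the contributions of the present metrics, which is why all three scores are negative.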

Disagree with this rank or a number? Challenge it with your evidence. Every challenge gets a public identifier and a published resolution.