Rules the ranking cannot break
- Deterministic scoring. Identical inputs and weights produce identical ranks; reproducible from the audit package with one command.
- Provenance on every number: source, retrieved_at, confidence, licence note. A number without provenance does not exist.
- No silent imputation. Missing evidence is recorded as missing and penalized by a published rule, never estimated.
- Domains never cross-rank. Books, papers, reports, and standards are scored within their own domain.
- Each language ecosystem scores within itself first. Coverage gaps are declared, not hidden.
- People are context, not contestants. Persons, organizations, and platforms carry no score, ever.
- Manual decisions are records. Every override carries a written rationale and is published; Apparens-authored works are flagged.
- Humility on rank. A rank is a transparent output of declared evidence, weights, and missing-data rules at a release date, not a verdict on intrinsic worth.
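The provenance and no-imputation rules above can be sketched as a data structure that refuses a number with no provenance and keeps missing evidence explicitly missing. This is a minimal illustration, not the project's actual schema: the class names `Provenance` and `Evidence`, the use of `None` for missing values, and the assumption that confidence lies in [0, 1] are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """The four provenance fields named in the rules (hypothetical layout)."""
    source: str
    retrieved_at: datetime
    confidence: float      # assumed here to lie in [0, 1]
    licence_note: str

    def __post_init__(self) -> None:
        if not self.source or not self.licence_note:
            raise ValueError("provenance needs a source and a licence note")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


@dataclass(frozen=True)
class Evidence:
    """One signal value for one work; value=None records missing evidence."""
    signal: str
    value: Optional[float]
    provenance: Optional[Provenance]

    def __post_init__(self) -> None:
        # A number without provenance does not exist.
        if self.value is not None and self.provenance is None:
            raise ValueError(f"{self.signal}: value supplied without provenance")
```

Under this sketch, `Evidence("library_holdings", None, None)` is a valid record of missing evidence, while `Evidence("citation_count", 120.0, None)` raises a `ValueError`.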
Ontology v0.2 (frozen)
Canonical entities (book, paper, report, standard) are scored within their domain. Context entities (person, organization, platform) are described, never ranked: structurally, they carry no score field. Governance records (releases, challenges, overrides) are append-only.
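One way to picture "structurally, they carry no score field" is a pair of record types in which only the canonical type has a place to hold a score, plus a log that exposes only an append operation. The sketch below is an assumption for illustration; the names `CanonicalEntity`, `ContextEntity`, `GovernanceLog`, and `has_score_field` are hypothetical and do not come from the ontology itself.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple

CANONICAL_DOMAINS = ("book", "paper", "report", "standard")
CONTEXT_KINDS = ("person", "organization", "platform")


@dataclass(frozen=True)
class CanonicalEntity:
    entity_id: str
    domain: str                     # one of CANONICAL_DOMAINS
    title: str
    score: Optional[float] = None   # None until scored within its domain

    def __post_init__(self) -> None:
        if self.domain not in CANONICAL_DOMAINS:
            raise ValueError(f"unknown domain: {self.domain}")


@dataclass(frozen=True)
class ContextEntity:
    entity_id: str
    kind: str                       # one of CONTEXT_KINDS
    description: str                # described, never ranked: no score field

    def __post_init__(self) -> None:
        if self.kind not in CONTEXT_KINDS:
            raise ValueError(f"unknown context kind: {self.kind}")


class GovernanceLog:
    """Append-only record store: no update or delete method is offered."""

    def __init__(self) -> None:
        self._records: List[Tuple[str, str]] = []

    def append(self, record_type: str, rationale: str) -> None:
        self._records.append((record_type, rationale))

    def records(self) -> Tuple[Tuple[str, str], ...]:
        return tuple(self._records)   # read-only snapshot


def has_score_field(cls: type) -> bool:
    """True if the dataclass declares a field named 'score'."""
    return any(f.name == "score" for f in fields(cls))
```

Here `has_score_field(ContextEntity)` is false by construction, so a person, organization, or platform cannot be given a score without changing the type itself.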
Weighting scenarios
| Scenario | citation_count | library_holdings | readership_persistence | syllabus_adoptions |
|---|---|---|---|---|
| academic | 0.5 | 0.2 | 0.05 | 0.25 |
| broad_influence | 0.2 | 0.25 | 0.4 | 0.15 |
| governance_practitioner | 0.25 | 0.3 | 0.1 | 0.35 |
Missing-data penalty factor: 0.5. Normalization: per_domain_min_max. Method version: 0.1-pilot. Within each scenario the four weights sum to 1. These are pilot placeholder weights; every change ships with a changelog entry.
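The table and parameters above can be read as a weighted sum over min-max normalized signals. The sketch below encodes the published weights and a per-domain min-max step. How the 0.5 penalty factor is applied is not spelled out on this page, so the rule used here (a missing signal contributes nothing and the total is multiplied by 0.5 once per missing signal) is only one possible reading, labeled as an assumption. The function names and the handling of a domain where all values are equal are also hypothetical.

```python
from typing import Dict, List, Optional

SIGNALS = ("citation_count", "library_holdings",
           "readership_persistence", "syllabus_adoptions")

# Weights copied from the scenario table (pilot placeholders).
SCENARIOS: Dict[str, Dict[str, float]] = {
    "academic": dict(zip(SIGNALS, (0.5, 0.2, 0.05, 0.25))),
    "broad_influence": dict(zip(SIGNALS, (0.2, 0.25, 0.4, 0.15))),
    "governance_practitioner": dict(zip(SIGNALS, (0.25, 0.3, 0.1, 0.35))),
}
MISSING_PENALTY = 0.5


def min_max(values: List[float]) -> List[float]:
    """Min-max normalize the values of one signal within one domain."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a signal with no spread carries no ranking information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score(normalized: Dict[str, Optional[float]], scenario: str) -> float:
    """Weighted sum of normalized signals under an ASSUMED penalty rule."""
    weights = SCENARIOS[scenario]
    total = 0.0
    missing = 0
    for signal in SIGNALS:
        value = normalized.get(signal)
        if value is None:
            missing += 1          # recorded as missing, never estimated
            continue
        total += weights[signal] * value
    # Assumed reading of the penalty: halve the total once per missing signal.
    return total * (MISSING_PENALTY ** missing)
```

Because the function depends only on its inputs and the fixed weight table, identical inputs and weights always yield identical scores, which is the property the deterministic-scoring rule asks for.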
What each signal means
- citation_count: all-time citations from OpenAlex (CC0). The scale of scholarly impact.
- readership_persistence: the number of distinct years in which a work is cited (from OpenAlex counts_by_year). A longevity proxy: a work cited across many years scores higher than one with a single-year spike. It rewards enduring use, not recent volume.
- library_holdings, syllabus_adoptions: declared but not yet harvested for the pilot (WorldCat and Open Syllabus data drops pending). Their absence is penalized by the published missing-data rule; the values are never imputed.
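As an illustration of the readership_persistence signal, the sketch below counts distinct years with at least one citation in a list shaped like OpenAlex's `counts_by_year` (entries with `year` and `cited_by_count`). Treating "cited in a year" as `cited_by_count > 0`, and returning `None` when the list is absent, are assumptions of this sketch rather than documented pipeline behavior.

```python
from typing import Dict, List, Optional


def readership_persistence(
    counts_by_year: Optional[List[Dict[str, int]]],
) -> Optional[int]:
    """Number of distinct years with at least one citation.

    Returns None when no counts are available, so that missing evidence
    stays missing instead of being turned into a zero.
    """
    if counts_by_year is None:
        return None
    cited_years = {
        entry["year"]
        for entry in counts_by_year
        if entry.get("cited_by_count", 0) > 0
    }
    return len(cited_years)


# A one-year spike versus steady use over five years.
spike = [{"year": 2021, "cited_by_count": 500}]
steady = [{"year": y, "cited_by_count": 3} for y in range(2019, 2024)]
```

With these inputs the spike yields 1 and the steady work yields 5, even though the spike has far more citations in total: the signal measures how many years a work is cited, not how often.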
Declared deferred capabilities
The method names these now and does not pretend they are done. Each is deferred openly, not silently stubbed:
- Per-ecosystem normalization (the fifth rule above, on language ecosystems): scoring runs per domain today. Per-language normalization activates only once works from more than one ecosystem enter a scored domain. Until then the site does not claim worldwide or present-tense multilingual coverage; the Chinese spine (28 works) is a declared gap.
- A fuller longevity proxy: library holdings over time, edition count, and continued availability, to complement readership_persistence.
- Book scoring: books are curated and browsable now but not yet scored; the pilot ranks papers only.
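The activation condition for per-language normalization can be expressed as a small check: it switches on for a domain only when the scored works in that domain span more than one ecosystem. This is a sketch under assumptions; the `Work` record, its `ecosystem` label, and the `scored` flag are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Work:
    work_id: str
    domain: str      # e.g. "paper" or "book"
    ecosystem: str   # hypothetical language-ecosystem label, e.g. "en", "zh"
    scored: bool     # False for works that are curated but not yet scored


def per_language_normalization_active(works: Iterable[Work], domain: str) -> bool:
    """True once scored works in the domain come from more than one ecosystem."""
    ecosystems = {
        w.ecosystem for w in works if w.domain == domain and w.scored
    }
    return len(ecosystems) > 1
```

Curated but unscored works, such as books in the pilot, do not trigger the switch in this sketch, since they never enter a scored domain.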
Cite this method
The method is documented in a citable note (Corpus Cognitivum), archived with a DOI: doi.org/10.5281/zenodo.21042034 (concept DOI, always the latest version). It is licensed CC BY 4.0.
Janssen, J. (2026). The AI Canon: a method for auditable knowledge curation (Corpus Cognitivum). Apparens. https://doi.org/10.5281/zenodo.21042034