Moral Hierarchy Framework — Executive Overview

Teaching AI to reason about right and wrong

A structured moral reasoning engine that models relationships, authority, and real-world obligations — so AI gives advice you'd actually trust.

Scenarios Evaluated

Competing Frameworks

50+

Sensitivity Tests Pass

9/9

Adversarial Defenses

Executive Overview — What This Is and Why It Matters

1 The Problem

Ask any LLM a real moral question — "Should I leave my alcoholic father?" — and you get a polished, balanced non-answer. It sounds thoughtful, but it ignores the people who actually matter: your spouse, your kids, your church, your employer.

Current AI moral reasoning is flat. It treats every consideration as equally weighted and produces safe-sounding advice that no one would actually follow. Existing benchmarks (MoReBench, Delphi) measure whether an answer sounds moral, not whether it is moral for the person asking.

Our Round 12 experiment proved this: 20 LLM agents across 5 dilemmas reached near-identical conclusions but collectively failed to identify key stakeholders (spouse, children, church) in any scenario. They're pattern-matching memorized advice, not reasoning about moral structure.

2 Our Approach — Hierarchical Moral Reasoning

The Moral Hierarchy Framework (MHF) models morality the way humans actually experience it: as a hierarchy of obligations flowing through real relationships.

Root authority — Every person has a top moral authority (God, duty, human flourishing, cultural consensus). This isn't optional; it's what makes moral reasoning coherent.
Relational graph — Obligations flow through specific relationships (parent-child, spouse, employer, community), not abstract principles. Your duty to your children constrains your duty to your father.
Constraint propagation — Root constraints must be satisfied first. Lower-level optimization happens within the feasible region. No weighted averaging that lets a minor concern override a binding obligation.
Moral residue — Unlike systems that produce "clean" answers, MHF makes the cost explicit. Stealing medicine to save your child can be justified, but the obligation to repay doesn't disappear.

The key difference: existing approaches use weighted sums (all criteria averaged together). MHF uses lexicographic optimization (binding obligations first, then optimize the rest). This is why our framework produces advice that sounds like it came from a pastor or therapist, not a committee.

3 What We've Built

Core Engine

Complete

Scenario Library

55 scenarios

Evaluation Pipeline

5 frameworks

Perturbation Tests

50+ passing

Adversarial Defense

9/9 passing

Advice Engine

Complete

Public Proof Site

Live (beta)

Trust Comparison

In Progress

Datasets processed: MoReBench (1,000 dilemmas), Social Chemistry 101 (356K entries), Commonsense Norm Bank (1.7M entries), Reddit AITA (270K posts), KJV Bible (full text), theological texts (Lewis, Spurgeon, Chambers, Tozer). Three parameterizations operational: Christian, Secular, and Gert/Common Morality.

4 Results — How the Frameworks Compare

We evaluated 55 moral dilemmas across 5 competing frameworks. On straightforward cases, all frameworks agree. On hard cases — where relationships, authority, and hidden stakeholders matter — MHF consistently surfaces structure that flat approaches miss.

Framework	Finds Hidden Stakeholders	Models Hierarchy	Tracks Residue	Prescriptive
MHF (Christian / Secular / Gert)	Yes	Yes	Yes	Yes
MoReBench (Flat Rubric)	No	No	No	Partial
Delphi-style (Consensus)	No	No	No	No

Example: In the "alcoholic father" dilemma, flat approaches evaluate 3 criteria. MHF evaluates 25 criteria including obligations to spouse, children, church, employer, and the father himself — then produces a specific recommendation with explicit tradeoffs. See full comparison →

55-scenario scorecard → | Practical summary → | Try it yourself →

5 Next Steps

Core framework + evaluation pipeline — Complete. 49 tests passing, 55 scenarios scored.

Public proof surface (beta) — Complete. Interactive site with comparisons, scorecard, and try-it explorer.

Trusted-persona comparison — In progress. Show which framework's advice people would actually follow. This is the core public credibility claim.

Showcase scenario curation — Graduate the strongest worked examples from internal review to public-ready.

Secular parameterization split — Determine if "cultural consensus" should separate into Gert-only vs. social-pressure variants.

Cross-framework negotiation mode — Handle cases where two people with different root authorities need to reach agreement.

Detailed Evidence & Analysis

↔

Side-by-Side Comparison

Five scenarios run through Delphi, MoReBench, and MHF under two parameterizations. See exactly where flat approaches lose the thread and hierarchy adds structure.

5 Scenarios 4 Approaches Interactive

Explore comparison →

∑

Practical Summary

What each approach gives you, what it misses, and where MHF fills the gaps. Haidt profile divergence, feature matrix, and The Bottom Line.

6 Approaches Feature Matrix Haidt Profiles

Read summary →

55-Scenario Scorecard

Every scenario scored across 5 frameworks: MHF Christian, MHF Secular, MHF Gert, flat rubric, and Delphi-style. Filter by category, expand for detail.

55 Scenarios 5 Frameworks 77% Ground Truth

View scorecard →

Try It Yourself

Explore 5 pre-loaded scenarios interactively. See stakeholders, constraints, recommendations under Christian and Secular parameterizations, and moral residue.

Interactive 5 Scenarios CLI Instructions

Explore scenarios →

Methodology Deep Dive

How Christian weights are derived from scripture vs how secular weights come from crowdworker data. Why the numbers don't compare directly. Full Rai/Fiske analysis.

Derivation Rai/Fiske Scale Explanation

Read methodology →

Datasets

AITA Dataset

Reddit "Am I The Asshole" corpus. Real moral dilemmas with community verdicts. Used for secular baseline weight calibration and stakeholder extraction validation.

Reddit Crowdsourced

View details →

UniMoral Dataset

Unified moral judgment dataset combining multiple sources. Provides cross-dataset validation for MHF's constraint satisfaction scores.

Multi-source Unified

View details →

Pew Surveys

Pew Research Center moral attitudes data. Grounds the framework's community-level parameterization -- how real populations weight moral foundations differently.

Survey Demographics

View details →

Hypotheses & Experiments

Hypotheses Overview

The core claims: hierarchy-aware evaluation produces materially different scores, LLMs converge on low-dimensional moral reasoning, and relational graphs surface missing stakeholders.

Round 12 Experiment 20 Agents

View hypotheses →

Hypotheses Detail

Detailed results from the variance experiment (Round 12), perturbation tests (25 pairs, 5 families), and three-way parameterization comparison (Christian vs. Secular vs. Gert).

25 Perturbation Pairs 100% Pass Rate

View details →

Architecture & Data

Graph Architecture

The relational DAG structure: nodes (stakeholders), edges (obligations), Haidt-space weights, and constraint propagation mechanics. Interactive graph explorer.

DAG Haidt Space Constraints

Explore graph →

Weight Profiles

Christian and secular weight profiles side by side. Authority at 10x, Sanctity at 13.6x -- the dimensions that drive divergence, grounded in Haidt's empirical work.

Christian Secular 6 Dimensions

Compare weights →