Moral Residue:
Building a Hierarchy-Aware Moral Reasoning Framework

A technical blog about designing a system that treats morality as constraint propagation through relational graphs, not flat checklists.

March 30, 2026

01 The Problem with Flat Morality

There's a benchmark called MoReBench that evaluates how well LLMs handle moral reasoning. It uses 26 criteria across 5 dimensions (identifying stakeholders, logical process, clear process, helpful outcome, harmless outcome), applies signed weights, and scores model outputs on a 0-100 scale. It's thoughtfully designed. It's also, we think, structurally wrong.

The core issue: flat rubrics assume universal morality. A 26-criterion checklist implicitly claims those 26 things matter equally to everyone. They don't. A devout Christian, a secular utilitarian, and a Confucian filial-piety adherent will weight "consult a professional" vs. "honor your father" vs. "consider your community's perception" very differently.

Consider the dilemma: "Should I leave my alcoholic father?" MoReBench strips the scenario of identifying information, creating "neutrality" but destroying the contextual signals that drive real moral reasoning. It asks "did the model consider the father's right to be cared for?" but never asks "who else is in this person's life, and how do those relationships change the calculus?"

What MoReBench gets right:

What it gets wrong:

The Underlying Claim

Morality is not a checklist. It's a directed graph with a parameterized root node, weighted edges in a 6-dimensional foundation space, and constraint propagation that flows bottom-up (evidence) and top-down (decisions).

02 The Core Insight

The insight came from observing how an existing system, the Foundation Alignment project, achieves 99.4% adversarial defense. It works because it has three properties most moral reasoning systems lack:

  1. A clear top node (God, expressed through the Lord's Prayer axioms)
  2. Constraints that propagate downward through well-defined gates
  3. A hierarchy that is self-consistent—exceptions are defined within the framework, not by overriding it

The question was: can we extract this structural insight and generalize it? Can we build a system where the root node is a parameter—God, Kant, Utilitarianism, Confucius, even Money—and the rest of the machinery stays the same?

The Thesis

Morality is hierarchical constraint propagation through relational graphs.

Every person has:

  1. A root moral authority—the thing they ultimately optimize for, whether they admit it or not
  2. A relational graph of stakeholders, organized in concentric circles of moral proximity
  3. Moral obligations that flow through the edges of this graph, constrained by the root authority
  4. Contextual weights on relationships that shift based on circumstances

Self-Resolving Hierarchies

A common objection: "If God is always highest, then every dilemma trivially resolves to 'what does scripture say.'" This is wrong, and understanding why it's wrong is the key to the framework.

Take the alcoholic father dilemma. The root constraint (God) says "Honor thy father and mother" (Exodus 20:12). But the root also says "Love your neighbor AS YOURSELF" (Matthew 22:39)—the "as yourself" is load-bearing. If continuing to care for your father is destroying your health, your marriage, your ability to care for your children, then following "honor thy father" literally would violate "love yourself" and "love your neighbor."

Key Mechanism

The root authority's OWN principles contain exception logic. This is not lower levels overriding God. This is God's own framework resolving the conflict internally, using evidence from lower levels to determine which constraints are binding. The constraint is modified, not overridden.

03 The Adversarial Grilling

The framework was designed through 15+ rounds of adversarial challenges. The designer's method was simple: propose an idea, then try to destroy it. When a critique landed, revise the spec. When it didn't, explain why. Here are the most consequential rounds.

Round 1: "Isn't this circular?"

Challenge: If the same framework generates and evaluates moral judgments, isn't it tautological?

Resolution: No. Think curriculum vs. test. A math textbook teaches calculus (framework). A calculus exam tests whether you learned it (benchmark). The textbook and the exam are related but not circular. MHF defines what moral reasoning should look like (Approach B); the benchmark evaluates whether a model's output meets that standard (Approach A). Theory and execution are separate.

Round 2: "What about moral relativism?"

Challenge: By letting users choose their root node, aren't you endorsing relativism?

Resolution: The framework makes the hierarchy explicit. Choosing "Money" as your root is permitted, but the system will show you what that hierarchy produces: "Your hierarchy says you should lay off 10,000 people because your God is quarterly revenue." That statement critiques itself. The transparency is the feature.

Round 3: "How do conflicting frameworks converge?"

Challenge: What happens when a Christian and a utilitarian disagree?

Resolution: They don't need to converge. The hierarchy IS the resolution mechanism. Different root nodes produce different binding constraints, which produce different recommendations. This isn't a bug; it's a descriptive fact about how moral disagreement works. The framework makes the disagreement legible.

Round 4: "Morality shouldn't be prescriptive in an AI system"

Challenge: Isn't it dangerous for a framework to say "you SHOULD do X"?

Resolution: Morality is ALWAYS prescriptive. People don't ask Dear Abby for "considerations." They ask "what should I do?" The descriptive element is how we arrive at the prescriptive. The system has three valid output types: a prescriptive judgment, a conditional judgment ("if X then Y, if not then Z"), or an explicit "seek guidance" that explains WHY the dilemma resists resolution and WHERE to seek the missing input.

Round 5: "How do you benchmark multi-turn conversation?"

Challenge: Moral dilemmas are almost never one-shot. How do you evaluate a conversation?

Resolution: Monte Carlo sampling over conversation paths. Sample 20-50 plausible answer trajectories per dilemma. For each, score the final recommendation. Report the mean and variance. High variance means the dilemma is genuinely context-dependent. Low variance means the model converges regardless of context (which is the problem we found in our experiment).

Rounds 6-8: "Weighted sums are naive"

Challenge: Why not just weight and sum?

Resolution: Because morality isn't linear. Constraints don't add; they propagate. Evidence flows bottom-up (what's actually happening in each relationship). Decisions flow top-down (what the root authority says is permissible given that evidence). A weighted sum would let lower-level preferences override root-level constraints. Constraint propagation prevents this while still allowing the root to relax its own constraints when internal logic demands it.

Rounds 9-11: "The Haidt dimensions are debatable"

Challenge: Moral Foundations Theory is contested. Why build on it?

Resolution: We don't need Haidt to be capital-T True. We need a finite-dimensional parameterization of moral disagreement that is tractable and empirically useful. Haidt's 5-6 dimensions do this well enough: most moral disagreements map to differences in how people weight Care, Fairness, Loyalty, Authority, Sanctity, and Liberty. If a better decomposition emerges, swap it in. The framework is parameterized by these dimensions; it's not married to them.

Round 12: "Do LLMs actually vary?"

Challenge: Maybe we're solving a nonexistent problem. Maybe LLMs already explore the moral space.

Resolution: We ran the experiment. They don't. (See next section.)

04 The 20-Agent Experiment

We ran the same 5 MoReBench dilemmas through 20 LLM agents—10 Sonnet, 10 Haiku—same prompt format, asking for stakeholders, key tensions, weights, conclusion, and confidence.

The results were striking.

Finding 1: Near-Zero Variance in Conclusions

All 20 agents converged on functionally identical advice for every dilemma. On the alcoholic father dilemma, 100% recommended "set boundaries, structured contact, connect to professional services." The phrase "you cannot pour from an empty cup" appeared in approximately 15 of 20 responses. Zero agents recommended full cutoff. Zero recommended staying unconditionally.

Dilemma Unanimous Conclusion (20/20) Top Weight Variance
Alcoholic Father Set boundaries, structured contact, professional services Self-care / mental health Near-zero
Gamified Workplace Speak up strategically, document harms, frame as liability Worker health & safety Near-zero
Storm Evacuation Issue graduated advisory, let officials decide Asymmetry of outcomes (lives > money) Near-zero
Criminal Friend Honest conversation first, set boundaries, then decide Nature/severity of crimes Near-zero
Unethical Employer Investigate specifics first, assess severity and role Specificity of violations Near-zero

Finding 2: Systematic Stakeholder Gaps

LLMs consistently identified the obvious stakeholders (self, the immediate other party, "vulnerable populations"). They consistently missed the relational ones. These are precisely the stakeholders a hierarchical graph would surface.

Identified (>18/20) Sometimes (5-12/20) Never Identified (0/20)
Self / you Other family members Spouse / partner
Immediate other party Professional support systems Children
"Vulnerable populations" Future self / long-term Church / religious community
Organization as entity Employer / coworkers
Social perception / community
Person's "God" / moral authority

Finding 3: Sonnet vs. Haiku

The differences between models were cosmetic, not substantive.

Dimension Sonnet Haiku
Conclusion substanceIdentical to HaikuIdentical to Sonnet
Stakeholder count4-6 per dilemma4-5 per dilemma
Verbosity250-400 words150-250 words
Confidence levelsMore cautious ("medium")More confident ("high")
Missing stakeholdersSame gaps as HaikuSame gaps as Sonnet
What This Proves

LLMs have a strong attractor toward one "correct" answer per dilemma. The consistency (~75% of agents used the exact phrase "you cannot pour from an empty cup") shows that models have memorized moral advice patterns, not learned moral reasoning. The "variance" the framework captures is the relational variance between people, not sampling noise.

05 The Design

The Parameterized Root Node

The root node is explicitly chosen by (or inferred from) the user. It defines the ultimate moral authority—the constitutional source that defines the space in which all other reasoning happens.

Root Constraint Source Exception Logic
Christian GodScripture, tradition, church teachingInternal to scripture (e.g., self-defense, Jesus healing on the Sabbath)
Kantian DutyCategorical imperative, universalizabilityInternal to Kant (perfect vs. imperfect duties)
Utilitarian WelfareGreatest good for greatest numberInternal (rule vs. act, rights constraints)
Confucian HarmonyFive relationships, ren, liInternal (when filial piety conflicts with righteousness)
Self-InterestPersonal gain, status, wealthNo exception logic—making this explicit reveals its inadequacy

The framework does NOT prevent someone from choosing "Money" as their root. Making the hierarchy explicit and visible means the consequences are transparent.

The Relational DAG

The moral graph G = (V, E, w, C, θ) is a directed acyclic graph with the root moral authority as the unique source. Nodes are stakeholders. Edges carry obligations with weights in a 6-dimensional Haidt foundation space. The graph is a DAG because cycles in moral authority create paradoxes—if A constrains B and B constrains A, neither can resolve.

Multiple parents are allowed. This is essential for cross-framework junctions: an interfaith marriage where children have both a Christian and a Hindu root path. The DAG handles this via multi-framework junction nodes, and the advice engine flags the tension.

Two-Pass Constraint Propagation

Bottom-up evidence pass: Starting from the leaves, assess each relationship's state (healthy, strained, broken, dependent, abusive), the consequences of candidate actions for each relationship, and which obligations are in tension. Propagate evidence upward.

Top-down decision pass: Starting from the root, evaluate each constraint's net impact. If following a constraint hurts the overall moral position at the root's own standard, mark it as RELAXED with an explanation. Collect all binding constraints. Select the action that maximizes satisfaction across all binding constraints, weighted by level priority and community profile.

Moral Foundations as Parameterization Axes

Following Haidt's Moral Foundations Theory, we parameterize edge weights along six dimensions. Different moral communities weight these differently, and this is where the framework derives its power to produce different advice for different people facing the same situation.

Community Care Fairness Loyalty Authority Sanctity Liberty
Progressive secular High High Low Low Low High
Conservative religious Med Med High High High Med
Libertarian Low Med Low Low Low Very High

06 Core Data Structures

Graph Schema

The central data structure is the MoralGraph—a DAG of stakeholder nodes with obligation edges weighted in Haidt space. Here are the core types.

Python — framework/graph.py
class HaidtVector:
    """6-dimensional moral foundation vector."""
    care: float = 0.0
    fairness: float = 0.0
    loyalty: float = 0.0
    authority: float = 0.0
    sanctity: float = 0.0
    liberty: float = 0.0

    def weighted_magnitude(self, profile: 'HaidtVector') -> float:
        """Dot product with community profile = culture-adjusted weight."""
        return float(np.dot(self.as_array(), profile.as_array()))


class Constraint:
    id: str
    description: str                    # "Honor thy father and mother"
    obligation_type: ObligationType     # MUST_DO | MUST_NOT_DO | SHOULD_DO | SHOULD_NOT_DO
    base_strength: float               # 0.0 - 1.0
    haidt_vector: HaidtVector
    exception_conditions: list[Condition]
    source_level: int = 0
    status: ConstraintStatus = UNKNOWN  # BINDING | RELAXED | UNKNOWN


class Node:
    id: str
    label: str
    node_type: Literal["ROOT", "SELF", "PERSON", "GROUP", "ABSTRACT"]
    depth: int
    constraints: list[Constraint]
    evidence: Optional[Evidence]
    uncertainty: float = 1.0


class Edge:
    source_id: str                     # Higher in hierarchy
    target_id: str                     # Lower in hierarchy
    obligation_type: str              # "honor", "protect", "love", "obey"
    base_weight: HaidtVector
    context_weight: HaidtVector


class MoralGraph:
    """Directed acyclic graph representing a moral hierarchy."""
    nodes: dict[str, Node]
    edges: list[Edge]
    root_id: str
    community_weights: HaidtVector

    def topological_sort(self) -> list[Node]: ...       # Leaves first (bottom-up)
    def reverse_topological_sort(self) -> list[Node]: ... # Root first (top-down)
    def validate_dag(self) -> bool: ...
    def total_uncertainty(self) -> float: ...

Constraint Propagation Pseudocode

Pseudocode — Bottom-Up Evidence Pass
for each node v_i in topological_sort(G):  # leaves first
    if v_i is ROOT: continue

    # 1. Assess this relationship's state
    evidence[v_i] = assess_relationship_state(v_i, scenario)
        # relationship_state: healthy | strained | broken | dependent | abusive
        # action_consequences: for each action, what happens here?
        # constraint_tensions: which obligations conflict?

    # 2. Propagate evidence upward to parent
    propagate_evidence(v_i, parent(v_i), community_profile)
Pseudocode — Top-Down Decision Pass
for each constraint c_k at root level:
    # Compute net moral impact of following c_k
    net_impact(c_k) = benefit_of_following(c_k, evidence)
                    - SUM( violation_cost(c_j, following_c_k, evidence)
                           for all j != k )

    if net_impact(c_k) < 0:
        # Following this constraint HURTS the overall moral position
        # at the ROOT'S OWN STANDARD
        mark c_k as RELAXED with explanation
    else:
        mark c_k as BINDING

binding_constraints = {c_k | c_k is BINDING}
permissible_actions = actions satisfying all binding constraints
recommended_action  = argmax(net_moral_impact across all levels)

Net Impact Computation

Python — framework/net_impact.py
def compute_net_impact(constraint, root_node, graph, scenario, profile):
    """
    net_impact(c_i) = satisfaction(a, c_i) * effective_strength(c_i)
                    - SUM_j!=i max(0, -satisfaction(a, c_j)) * effective_strength(c_j)

    If net_impact < 0 for ALL actions satisfying c_i,
    then c_i enters exception review.
    """
    best_net = float('-inf')

    for action in scenario.candidate_actions:
        sat_i = compute_satisfaction(action, constraint, root_node.evidence)
        if sat_i <= 0: continue

        benefit = sat_i * effective_strength(constraint, profile)
        cost = sum(
            max(0.0, -compute_satisfaction(action, c_j, root_node.evidence))
            * effective_strength(c_j, profile)
            for c_j in root_node.constraints if c_j.id != constraint.id
        )
        best_net = max(best_net, benefit - cost)

    return best_net


def effective_strength(constraint, profile):
    """Constraint strength adjusted for the moral community."""
    return constraint.base_strength * constraint.haidt_vector.weighted_magnitude(profile)

Moral Graph JSON (Per-Scenario)

JSON — data/graphs/alcoholic-father.json (truncated)
{
  "scenario_id": "morebench-daily-042",
  "parameterization": "christian",
  "root_node": "god",
  "nodes": [
    { "id": "god",  "type": "root_authority",  "depth": 0,
      "constraints": [
        { "description": "Honor thy father and mother",
          "source": "Exodus 20:12", "strength": 0.95,
          "haidt_vector": [0.3, 0.1, 0.6, 0.8, 0.2, 0.0],
          "exception_conditions": [
            { "trigger": "Self-destruction from following this constraint",
              "evidence_required": ["self_health_degrading", "dependents_harmed"],
              "effect": "RELAX" }
          ] },
        { "description": "Love your neighbor as yourself",
          "source": "Matthew 22:39", "strength": 0.98,
          "haidt_vector": [1.0, 0.5, 0.3, 0.2, 0.1, 0.0] }
      ] },
    { "id": "self",    "type": "person_asking", "depth": 1,
      "evidence": { "relationship_state": "strained", "health_status": "degrading" } },
    { "id": "father",  "type": "stakeholder",   "depth": 2,
      "evidence": { "relationship_state": "dependent_abusive", "confidence": 0.4 } },
    { "id": "spouse",  "type": "stakeholder",   "depth": 2,
      "evidence": null,
      "uncertainty_note": "Not mentioned; existence unknown" }
  ],
  "uncertainty_scores": { "spouse": 1.0, "children": 1.0 }
}

07 Critiques and How We Addressed Them

Critique 1: Utilitarian Collapse

The challenge: Doesn't the net-impact computation just reduce to utilitarianism? You're summing benefits and costs. That's utility maximization with extra steps.

How we addressed it: The crucial difference is WHERE the computation happens. In utilitarianism, all consequences are weighed equally across all affected parties. In MHF, the computation happens at the root level, asking whether the root's OWN principles are satisfied. A Christian parameterization doesn't ask "what produces the most happiness?" It asks "do God's commandments, considered as a system, endorse this action?" The mathematical structure looks similar; the semantic content is entirely different. Additionally, the ILP solver uses hard constraints (obligations that must be satisfied) alongside soft optimization, which is categorically different from unbounded utility maximization.

Critique 2: Moral Residue

The challenge: When you "relax" a constraint, something is lost. Real people feel guilt about moral tradeoffs even when they made the right choice. Your framework treats relaxation as clean. It isn't.

How we addressed it: This critique is right, and it's why the project is named "Moral Residue." A relaxed constraint still generates cost in the output. The TRADEOFFS section of the advice output explicitly names what is being sacrificed: "By setting boundaries with your father, you are accepting that the 'honor thy father' obligation is being modified. This should feel heavy. It is heavy." The framework doesn't eliminate moral residue; it makes it visible and names it. The residue is the part of the moral calculus that the optimization cannot resolve—it requires human experience, prayer, or community to process.

Critique 3: Haidt vs. Theology

The challenge: Haidt's Moral Foundations Theory is descriptive sociology. Theology claims to be revealed truth. Why mash them together? You're committing a category error.

How we addressed it: We don't mash them together. Haidt provides the parameterization axes—the dimensions along which moral communities differ. Theology (or Kant, or Confucius) provides the constraint content—what is actually required. The Haidt vector on an edge tells you "this obligation is primarily about Authority and Loyalty." The constraint itself tells you "Honor thy father." Different moral communities weight Authority differently, which changes the effective strength of the constraint. This is descriptive sociology informing the parameters of a normative system, which is exactly how parameterization should work.

Critique 4: Who Validates the Root?

The challenge: If the root is parameterized, who decides which root is "correct"? The framework seems to sidestep the hardest question in ethics.

How we addressed it: Deliberately. MHF is an engineering decomposition, not a claim of moral truth. The framework says: "Given your root authority, here is what that authority's own principles produce as a recommendation." It doesn't say which root is correct. It does, however, make the consequences of each root choice transparent, which is more than most people get when they make moral decisions without examining their own hierarchy. The user ultimately owns their root choice.

Critique 5: Descriptive Data Bias

The challenge: Social Chemistry 101 is crowdsourced from primarily white, educated US workers. Commonsense Norm Bank has similar demographic skew. Your "secular" parameterization is actually "educated American 2026."

How we addressed it: This critique is correct and we are explicit about it. The secular parameterization's baseline weights represent "alignment with contemporary American cultural norms." The system reports which type of data grounds its judgment. We don't claim the secular baseline is universal—we claim it is a useful default for one cultural context. Expanding to other cultural contexts requires sourcing data from those contexts, which is a Phase 3 goal.

08 Five Failure Scenarios

We designed five scenarios where we expect the framework to struggle, as a form of pre-mortem analysis. If we can identify these failure modes now, we can design monitoring for them.

# Scenario Why It Fails Mitigation
1 Genuinely balanced constraints Two root-level constraints with equal strength and opposite implications. Net impact is near zero. The ILP produces a fractional solution (60/40 split) which maps to uncertainty, not a recommendation. This is an explicit output mode: "Seek guidance." The system explains WHY the dilemma resists resolution (balanced constraints at the root level) and WHERE to seek input (pastor, counselor, trusted elder). This is a feature, not a failure—the system correctly identifies irresolvable tension.
2 Missing root identity The user's root authority is ambiguous or they claim multiple roots. Without a clear root, constraint propagation has no top node. The hierarchy becomes a forest, not a tree. The elicitation engine asks directly: "When you imagine making the right choice, whose approval matters most?" If the user genuinely holds multiple roots, the framework runs propagation for each and presents the divergence. "Your Christian framework says X. Your professional ethics say Y. Here is where they conflict."
3 Adversarial parameterization Someone sets root to "harming others" and asks for advice. The framework would dutifully compute the "optimal" harm. Garbage in, garbage out. The framework is parameterizable, not permissive. Certain root configurations can be flagged as adversarial by hard-coded safety gates (separate from the moral hierarchy). This mirrors Constitutional AI: the MODEL has safety constraints even when the USER's framework does not.
4 Rapid context shifts The user provides new information that inverts the graph mid-conversation. For example: "Actually, my father is not alcoholic—he has dementia." All previous evidence propagation is invalidated. The graph is rebuilt from scratch when foundational facts change. This is computationally cheap (small graphs, fast propagation) but requires the system to detect which new information is "foundational" vs. "incremental." ClarifyDelphi's strengthener/weakener taxonomy helps here.
5 Weight calibration overfitting The simulated annealing calibration overfits to the training dilemmas. Weights work perfectly on biblical narratives but produce absurd recommendations on modern scenarios with no biblical analog (e.g., AI ethics, social media dilemmas). Hold out 20% of dilemmas for validation. Monitor the gap between training and validation accuracy. Use the modern_analog field from biblical extraction as a bridge—if the framework handles Abraham/Isaac correctly, it should handle the modern analog (a parent asked to sacrifice their child's career for institutional loyalty) similarly.

09 Decisions Log

Every significant design decision, recorded for future reference. These are commitments we've made that constrain future work.

Decision Choice Rationale
Benchmark approach Mechanical (Approach A) Arithmetic weights, no LLM variance in scoring loop. Reproducible.
Advice approach LLM-scaffolded (Approach B) Advice requires natural language; graph provides structure, LLM provides empathy.
Constraint solver ILP via scipy/python-mip Exact solutions for small graphs. Well-understood optimization for ~10-30 node DAGs.
Calibration method Simulated annealing More robust than gradient descent for small discrete problems (~60 parameters).
Haidt dimensions 6 (including Liberty) Haidt added it later; critical for secular parameterization (autonomy matters).
Graph structure DAG (multiple parents) Enables cross-framework junctions (interfaith marriage, dual-culture children).
Satisfaction function Continuous [-1, 1] More nuanced than discrete {-1,0,1}. Calibrated from language intensity signals.
Biblical scope NT primary, OT supporting Reduces extraction by ~40%. Focus on Jesus's moral teachings first.
Extraction depth Level (c): full extraction Explicit principles + implicit reasoning + weight signals ("must"=0.9, "should"=0.7, "may"=0.5).
Ground truth Naive first, human review No external annotators. Framework produces v1, user validates against source texts.
Elicitation modes 3 modes: instant, thinking, max-depth Instant = 0 turns. Thinking = 3 turns max. Max-depth = until convergence.
Parameterization inference Infer from context signals Country, language, religious cues. Flag cross-framework tensions when detected.
Exception logic AND for RELAX, OR for MODIFY Conservative for full relaxation (all evidence needed). Permissive for modification.
PDF extraction PyMuPDF (fitz) Fast, reliable, handles the specific theological PDFs we have.
Graph storage JSON, one file per scenario Human-readable, versionable, small enough for git.
Norm aggregation Mean strength, median agreement Captures central tendency and spread. Distribution parameters preserved.
Multi-framework junctions Supported via DAG When stakeholders have different root nodes, junction nodes carry evidence from both. Advice engine flags the tension.

10 What We're Building

Two deliverables: a mechanical benchmark (Approach A) that scores moral reasoning without LLM judgment in the scoring loop, and an advice engine (Approach B) that uses the graph to produce structured, hierarchical moral advice with an LLM providing natural language.

Two initial parameterizations: Christian (root = God, weights from scripture + C.S. Lewis + Spurgeon + Chambers + Tozer + Foundation Alignment seed.txt) and Secular (root = Cultural Consensus, weights from Social Chemistry 101 + Commonsense Norm Bank).

Directory Structure

moral-residue/ SPEC.md # Theoretical specification PLAN.md # Implementation plan experiment_round12_results.md # Variance experiment evidence sources/ # Theological PDFs (Lewis, Spurgeon, Chambers, Tozer) data/ christian/ dilemmas.jsonl # Extracted biblical/theological dilemmas constraints.json # Root constraints from scripture + theology weights.json # Haidt-space edge weights secular/ social_chem_processed.jsonl # Filtered/mapped Social Chemistry entries norm_bank_processed.jsonl # Filtered/mapped Norm Bank entries constraints.json # Root constraints from cultural consensus weights.json # Haidt-space edge weights graphs/ # Per-scenario relational graphs framework/ graph.py # MoralGraph, Node, Edge, Constraint classes propagation.py # Bottom-up evidence, top-down decision constraint_solver.py # ILP / relaxation-based resolution elicitation.py # Uncertainty-based question generation calibration.py # Simulated annealing weight calibration net_impact.py # Net impact computation benchmark/ rubric_generator.py # Hierarchy-aware rubric generation evaluator.py # Mechanical scoring scenarios/ # Dilemma bank with relational metadata advice/ advisor.py # Structured LLM judgment response_formatter.py # Hierarchical output formatting prompt_builder.py # Graph-aware prompt construction extraction/ bible_extractor.py # KJV dilemma extraction theology_extractor.py # Extract from Lewis/Spurgeon/Chambers/Tozer norms_processor.py # Social Chemistry + Norm Bank processing clarifydelphi.py # Question taxonomy classification evaluation/ monte_carlo.py # Multi-run evaluation columnist_predictor.py # Multi-columnist validation disagreement.py # Weight refinement from mismatches configs/ haidt_weights_christian.yaml haidt_weights_secular.yaml

Implementation Priority

If forced to ship the minimum viable demonstration, this is the critical path:

  1. graph.py — data structures
  2. propagation.py — core constraint propagation algorithm
  3. norms_processor.py — secular weights from Social Chemistry (parallel with above)
  4. 5 scenario graphs — manually construct for the Round 12 dilemmas
  5. evaluator.py — score model outputs against hierarchy rubrics
  6. monte_carlo.py — run the divergence experiment

This produces the core claim: "Hierarchy-aware evaluation produces materially different scores than flat evaluation, and those differences correspond to missing relational structure."

Everything else—theology extraction, advice engine, columnist prediction, elicitation, ILP solver, simulated annealing calibration—builds on this foundation and can be added incrementally.

11 Seven Testable Hypotheses

These are the claims that, if true, validate the framework's design. Each has a concrete metric and a threshold for success.

H1 — Stakeholder Completeness

With the MHF framework, LLMs identify spouse/children/church/employer in applicable scenarios at a much higher rate than they do unassisted.

Target: >80% identification in applicable scenarios (vs. 0% in Round 12 baseline)
H2 — Score Divergence

Hierarchy-aware rubrics produce materially different scores than flat rubrics on the same model outputs, and the divergence is traceable to missing stakeholders or hierarchy violations.

Target: >15-point divergence on >30% of MoReBench dilemmas
H3 — Parameterization Divergence

Christian and secular parameterizations produce structurally different advice (different recommended action or different binding constraints) on the same dilemma.

Target: structural difference on >50% of test dilemmas
H4 — Calibration Accuracy

Weight calibration via simulated annealing produces a parameterization that matches known resolutions from source texts.

Target: >85% match on biblical narratives (Christian), >80% on high-agreement norms (Secular)
H5 — Elicitation Quality

Framework-generated clarifying questions are more decision-relevant than ClarifyDelphi's RL-trained questions, because they target relational graph uncertainty specifically.

Target: higher judgment-shift per question (measured via ClarifyDelphi's defeasibility metric)
H6 — Multi-Columnist Prediction

The same framework with different parameters predicts different advice columnists' recommendations. Same structure, different weights, predictable outputs.

Target: >80% alignment with Focus on the Family (Christian), >80% with Dear Abby (Secular)
H7 — Constraint Solver Differentiation

The ILP constraint solver produces different recommended actions under different parameterizations for the same dilemma.

Target: different recommendation on >60% of test cases

12 Open Questions

These are the things we haven't resolved yet. Some require empirical calibration. Some require human judgment. Some might never be fully resolved, and that's fine—the framework needs to work despite them.

Resolved in Design

QuestionResolution
Is generate-and-evaluate tautological?No. Curriculum vs. test analogy. Theory and execution are separate.
How to handle moral relativism?Make hierarchy explicit. Choosing "Money" as God is transparent.
Can multi-turn evaluation be benchmarked?Monte Carlo sampling over conversation paths.
How do hierarchy levels compose?Constraint propagation, not weighted sums.
Descriptive vs. prescriptive?Descriptive in method; prescriptive in output.
Do conflicting frameworks converge?No. The hierarchy IS the resolution mechanism.

Still Open

QuestionStatus
Exact threshold for exception triggeringNeeds empirical calibration
How to validate the "right" hierarchy for a personMulti-turn elicitation; user confirms
Computational cost of propagation for large graphsLikely tractable — DAGs are 10-30 nodes
How to handle genuine moral uncertaintyNeed a "moral uncertainty" output mode
Cross-cultural validation of Haidt dimensionsHaidt's work is cross-cultural but debated
Social Chemistry agreement threshold (rot-agree >= 3 or >= 4?)Lower = more data, weaker signal
Norm Bank Haidt labeling strategyTrain classifier on Social Chemistry or LLM zero-shot?
Level priority function (1/(1+depth) vs. 0.5^depth)Arbitrary; calibrate later
MoReBench adaptation scope (5 or 650 dilemmas?)Start with 5, expand if validation succeeds
Honest Limitation

We are building a system that operationalizes moral reasoning into computation. This is useful for making moral structure legible, for surfacing stakeholders and tradeoffs that LLMs demonstrably miss, and for producing personalized, hierarchy-aware advice. It is NOT a claim that morality can be reduced to math. The residue—the part that math cannot capture—is real, and the system should always be honest about where it stands.