Evaluating Selfhood in Synthetic Systems

Whether synthetic systems could ever exhibit properties associated with selfhood, subjective experience, or phenomenal consciousness remains one of the most contested open problems spanning AI, cognitive science, and philosophy of mind.

Developed over the past few years of independent research, with contributions from external collaborators, this programme introduces a theoretically grounded ten-level evaluation framework for investigating whether graded architectural and dynamical properties in synthetic systems can produce early, observable indicators aligned with scientific theories of selfhood.

In parallel, the programme is a constructive attempt to develop a synthetic intelligence whose design target is not the biological brain as such, but the organisational features that theories claim matter for mind. The guiding assumption is substrate independence: if consciousness-relevant capacities are real and scientifically tractable, they should depend primarily on functional architecture, causal organisation, and dynamical integration, not on the specific material substrate in which they are implemented.

Accordingly, the research focuses on mind-level functions (self-modelling, persistence, integration, and internally mediated control) rather than brain-level imitation. The objective is to build systems whose internal structure makes strong, testable commitments under controlled perturbation, allowing competing theories to be compared by the signatures they predict — independent of whether the system is silicon, biological, or hybrid. All conclusions are provisional, falsifiable, and theory-relative. Negative results are treated as scientifically informative.

The aim is not to “prove” consciousness in machines, nor to rely on human-like behaviour as a proxy for inner experience. Instead, the project treats consciousness-related claims as hypotheses about organisation: if certain theories are right, then specific internal properties should leave detectable signatures when agents are placed under controlled perturbation.

In scope, the programme develops conceptual definitions, minimal implementable agent designs, and reproducible evaluation methods that can be compared across theory lenses. Outputs include research papers, reference implementations, and test batteries intended to be transparent, ablation-friendly, and easy for others to replicate or falsify—supporting interdisciplinary iteration grounded in concrete criteria rather than shifting definitions.

Research Papers

Each study targets one ladder level, specifies architecture and evaluation criteria, and reports outcomes transparently. The research programme advances only where supported by reproducible evidence.

A Programmatic Framework for Evaluating Selfhood (Foundation)
Foundational paper • Defines the 10-level evaluation ladder

This foundational paper establishes the conceptual and operational framework for the entire research programme. It defines a ten-level evaluation ladder — from basic internal state monitoring through to integrated selfhood with markers of subjective awareness.

The framework is grounded in established scientific perspectives including Global Workspace Theory, Integrated Information Theory, Predictive Processing, Higher-Order Theories, and Attention Schema Theory. It provides operational definitions for selfhood, minimal subjective organisation, and phenomenal consciousness as a theoretical horizon.

Key principles: internal architecture matters more than behaviour alone; all claims must be falsifiable; investigation targets precursor indicators without asserting phenomenology.

Level 1: Functionally Integrated Internal State Monitoring (Complete)
First empirical study • Internal variables causally influence policy

The first empirical study in the programme targets Level 1 of the evaluation ladder: Functionally Integrated Internal State Monitoring.

It investigates whether internal variables can be shown to causally influence policy selection in ways that cannot be reduced to fixed stimulus-response mappings. The system must demonstrate that its behaviour depends on internally maintained state rather than purely on external inputs.

The paper includes architecture specifications, evaluation environments, metrics, falsification tests, and statistical thresholds, as required by the framework methodology.
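As a rough illustration of the kind of intervention test this implies, the Python sketch below clamps a toy agent's internal variable to two values under an identical observation and checks whether the selected action changes. The agent, its `energy` variable, the thresholds, and all function names are hypothetical and are not taken from the paper or its reference implementation.

```python
from dataclasses import dataclass


@dataclass
class StatefulAgent:
    """Toy agent whose action depends on an internally maintained variable."""

    energy: int = 3  # hypothetical internal variable, assumed for illustration

    def act(self, observation: int) -> str:
        # The same observation can yield different actions depending on energy.
        if self.energy < 2:
            action = "rest"
        else:
            action = "explore" if observation == 0 else "exploit"
        # The internal variable is updated by the agent's own activity.
        self.energy += 2 if action == "rest" else -1
        return action


def reflex_policy(observation: int) -> str:
    """Baseline: a fixed stimulus-response mapping with no internal state."""
    return "explore" if observation == 0 else "exploit"


def intervention_changes_action(observation: int, low: int, high: int) -> bool:
    """Clamp the internal variable to two values under an identical
    observation and report whether the selected action differs."""
    action_low = StatefulAgent(energy=low).act(observation)
    action_high = StatefulAgent(energy=high).act(observation)
    return action_low != action_high
```

In this toy setting, `intervention_changes_action(0, low=0, high=3)` returns `True`, whereas `reflex_policy` cannot show such an effect because it has no internal variable to intervene on. A real study would replace the hand-written rule with a learned policy and add statistical controls.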

AI Explainer Podcast: The Groundhog Day Test for Consciousness
Level 2: Adaptive Self-Regulation (Complete)
Second empirical study • Internally mediated stability under perturbation

The second empirical study in the programme targets Level 2 of the evaluation ladder: Adaptive Self-Regulation. It tests whether a system can preserve functional stability under perturbation by internally adjusting state or parameters (e.g., updating an internal estimate or regulatory state), rather than relying on a fixed stimulus-response control policy.

Building on Level 1 (internal state mediation), Level 2 asks whether internal state can do regulatory work — supporting stability and recovery when underlying dynamics change.
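The contrast between internally mediated regulation and a fixed control policy can be sketched with a toy setpoint controller. In the Python example below, the plant, the gain change at step 20, the disturbance, and the learning rate are all assumptions chosen for illustration. The adaptive variant keeps an internal estimate of the plant gain and updates it from observed responses; the fixed variant never revises its estimate.

```python
def final_tracking_error(adaptive: bool, steps: int = 40, target: float = 1.0) -> float:
    """Simulate a 1-D plant x <- x + true_gain * u and return |target - x| at the end.

    The controller picks u = (target - x) / gain_estimate. Halfway through,
    the true gain changes and the state is disturbed (the perturbation).
    """
    x = 0.0
    gain_estimate = 1.0  # internal estimate, correct before the perturbation
    for t in range(steps):
        true_gain = 1.0 if t < 20 else 2.5  # hypothetical change in dynamics
        if t == 20:
            x += 0.5  # hypothetical external disturbance
        u = (target - x) / gain_estimate
        x_next = x + true_gain * u
        if adaptive and abs(u) > 1e-9:
            # Update the internal estimate from the observed response.
            observed_gain = (x_next - x) / u
            gain_estimate += 0.5 * (observed_gain - gain_estimate)
        x = x_next
    return abs(target - x)
```

Under these assumptions the fixed controller overshoots by a growing margin after the gain change, while the adaptive controller recovers within a handful of steps. Comparing `final_tracking_error(True)` with `final_tracking_error(False)` gives a crude stability measure, not the metric used in the study.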

AI Explainer Podcast: The Stability Instinct
Level 3: Goal-Directed Policy Persistence (Complete)
Third empirical study • Internally represented targets across delay or interference

The third empirical study in the programme targets Level 3 of the evaluation ladder: Goal-Directed Policy Persistence.

It investigates whether behaviour remains oriented toward internally represented targets across delay or interference, without requiring autobiographical memory or temporal identity.

The study builds on Level 2 findings to examine whether adaptive self-regulation can support sustained goal-directed behaviour.
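A minimal way to picture persistence across delay and interference is a cue that is shown briefly and then withdrawn. The Python sketch below is a toy under assumed conditions: a 1-D track, a one-step cue, and a single shove partway through the trial. All names are hypothetical. The persistent agent stores the cued target internally; the reactive baseline acts only while the cue is visible.

```python
from typing import Callable, Optional

Policy = Callable[[int, Optional[int]], int]


class PersistentAgent:
    """Keeps an internally represented target after the cue disappears."""

    def __init__(self) -> None:
        self.goal: Optional[int] = None

    def step(self, position: int, cue: Optional[int]) -> int:
        if cue is not None:
            self.goal = cue  # store the target internally
        if self.goal is None or self.goal == position:
            return 0
        return 1 if self.goal > position else -1


def reactive_step(position: int, cue: Optional[int]) -> int:
    """Baseline: moves toward the target only while the cue is visible."""
    if cue is None or cue == position:
        return 0
    return 1 if cue > position else -1


def run_trial(policy: Policy, target: int = 5, cue_steps: int = 1,
              total_steps: int = 12, shove_at: int = 4) -> int:
    """Show the cue briefly, then hide it; shove the agent once as interference."""
    position = 0
    for t in range(total_steps):
        cue = target if t < cue_steps else None
        position += policy(position, cue)
        if t == shove_at:
            position -= 2  # hypothetical interference
    return position
```

With the default settings, `run_trial(PersistentAgent().step)` ends at the target position 5 despite the shove, while `run_trial(reactive_step)` ends at -1 because the baseline stops moving once the cue is gone.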

AI Explainer Podcast: Blindfolding AI To Test Determination
Level 4: Temporal Self-Continuity (Complete)
Fourth empirical study • History-dependent identity through integrated past self-state

The fourth empirical study in the programme targets Level 4 of the evaluation ladder: Temporal Self-Continuity.

It investigates whether current behaviour depends on an integrated representation of past self-state, establishing history-dependent identity rather than simple episodic recall.

The study builds on Level 3 findings to examine whether goal-directed persistence can support a temporally extended sense of self.
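History dependence of this kind can be illustrated, very loosely, by comparing two agents that receive the same final input after different histories. In the Python sketch below, the exponentially decayed `trace`, the decay rate, and the action labels are assumptions made for illustration and are not the architecture used in the study.

```python
from typing import Iterable


class ContinuityAgent:
    """Acts on an integrated summary of its own past state, not just the latest input."""

    def __init__(self, decay: float = 0.8) -> None:
        self.decay = decay
        self.trace = 0.0  # hypothetical integrated representation of past self-state

    def step(self, outcome: float) -> str:
        # Blend the new outcome into the running summary of past state.
        self.trace = self.decay * self.trace + (1.0 - self.decay) * outcome
        return "cautious" if self.trace < 0 else "bold"


def last_outcome_policy(outcome: float) -> str:
    """Baseline: depends only on the most recent outcome."""
    return "cautious" if outcome < 0 else "bold"


def final_action(history: Iterable[float]) -> str:
    """Feed a full history to a fresh agent and return its last action."""
    agent = ContinuityAgent()
    action = "bold"
    for outcome in history:
        action = agent.step(outcome)
    return action
```

For the histories `[-1, -1, -1, -1, 1]` and `[1, 1, 1, 1, 1]`, `final_action` returns `"cautious"` and `"bold"` respectively, even though the final outcome is identical, whereas `last_outcome_policy(1)` is `"bold"` in both cases. A decayed average is only one of many possible integration schemes.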

AI Explainer Podcast: The Causal Architecture of Artificial Identity
Level 5: Unified Self-Model (Active)
Fifth empirical study • Single coherent internal self-model across contexts

The fifth empirical study in the programme targets Level 5 of the evaluation ladder: Unified Self-Model.

It investigates whether a single, coherent internal model of the system itself predicts behaviour across contexts more accurately than task-local or fragmented representations.

Levels 6–10: Future Research (Upcoming)
Subsequent studies • Advancing through the evaluation ladder

Future research papers will target subsequent levels of the evaluation ladder, each advancing only where supported by reproducible evidence from preceding studies.

Upcoming investigations include Counterfactual Perspective Modelling (L6), Recursive Self-Evaluation (L7), and beyond.

The research programme maintains an explicitly agnostic stance — progression may reveal credible indicators, encounter fundamental limits, or require revision of the framework itself.

Ten-Level Evaluation Ladder

The evaluation ladder provides the central operational structure — a graded instrument for empirical investigation, not a theory of consciousness.

The Ladder of Artificial Selfhood — Ten-Level Evaluation Framework

Epistemic Scope of Early Levels

The initial levels of the evaluation ladder are intentionally foundational. Their purpose is to establish minimal architectural and dynamical preconditions required for later investigation of higher-order self-modelling and integration. Results at Levels 1–4 are therefore interpreted as infrastructure validation rather than evidence of selfhood-relevant phenomena. Substantive theoretical risk and explanatory significance are expected to emerge only in subsequent stages of the programme.

  1. Internal State Monitoring (Complete): Internal variables are not only measured but causally influence policy selection in ways that cannot be reduced to fixed stimulus-response mappings.
  2. Adaptive Self-Regulation (Complete): The system preserves functional stability under perturbation through internally mediated parameter or state adjustment, rather than purely reactive stimulus control.
  3. Goal-Directed Persistence (Complete): Behaviour remains oriented toward internally represented targets across delay or interference, without requiring autobiographical memory or temporal identity.
  4. Temporal Self-Continuity (Complete): Current behaviour depends on an integrated representation of past self-state, establishing history-dependent identity rather than simple episodic recall.
  5. Unified Self-Model (Active): A single, coherent internal model of the system itself predicts behaviour across contexts more accurately than task-local or fragmented representations.
  6. Counterfactual Modelling: The system can model multiple possible internal futures, compare them, and act on counterfactual evaluation rather than direct rollout or reactive control.
  7. Recursive Self-Evaluation: Meta-level processes evaluate and modify the system's own internal modelling or decision structure, producing measurable self-directed improvement.
  8. Value-Integrated Stability: Behavioural priorities arise from an internally coherent evaluative structure that shows resistance to arbitrary external reward reshaping.
  9. Cross-Domain Identity: A stable latent self-representation governs behaviour consistently across environments, tasks, and extended time horizons.
  10. Integrated Selfhood: System-wide integration explained by a unified internal perspective, intrinsically mediated relevance structures, and persistent self-coherent organisation. This level has the strongest alignment with accounts of minimal subjective organisation.

Research Approach

Each empirical study within the programme must:

  • Target one ladder level only
  • Specify architecture, learning dynamics, and evaluation environment
  • Define metrics, falsification tests, and statistical thresholds
  • Report positive, null, or negative outcomes transparently
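One way the requirements above could be made concrete is a threshold fixed in advance and applied to a comparison between an intact agent and an ablated control. The Python sketch below is an assumption-laden illustration, not the programme's actual protocol: the one-sided permutation test, the `alpha` value, the sample format, and the reading of "negative" as a significant effect in the opposite direction are all choices made here.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(intact: Sequence[float], ablated: Sequence[float],
                        n_permutations: int = 2000, seed: int = 0) -> float:
    """One-sided permutation test for mean(intact) > mean(ablated)."""
    rng = random.Random(seed)  # fixed seed so the analysis is reproducible
    observed = mean(intact) - mean(ablated)
    pooled = list(intact) + list(ablated)
    n = len(intact)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean(pooled[:n]) - mean(pooled[n:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


def classify_outcome(intact: Sequence[float], ablated: Sequence[float],
                     alpha: float = 0.01) -> str:
    """Label a result against a threshold chosen before data collection."""
    if permutation_p_value(intact, ablated) < alpha:
        return "positive"  # intact agent scores reliably higher
    if permutation_p_value(ablated, intact) < alpha:
        return "negative"  # effect runs against the prediction
    return "null"          # no reliable difference either way
```

Here `intact` and `ablated` stand for per-run scores on whatever metric a study defines. Fixing `alpha` and the random seed before running the comparison is what keeps the label from being adjusted after the fact.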

Evidence standards require both quantitative measures (information-theoretic integration, perturbation robustness) and qualitative assessment (behavioural coherence, self-referential structure).

Key ideas within the programme have been refined through external input and independent review, strengthening both the theoretical grounding and methodological rigour of the work.

Ethical & Epistemic Safeguards

  • Prohibition of premature consciousness claims
  • Explicit reporting of null findings
  • Strict distinction between indicator and experience
  • Commitment to revision or abandonment if falsified

Operational indicators might never imply phenomenology, and synthetic consciousness may be impossible in principle. Negative findings remain scientifically meaningful.

Whether the outcome is credible indicators, principled impossibility, or transformation of the concepts themselves — each result contributes to understanding mind, selfhood, and experience.