MoralityGym
A research framework for investigating AI agent behavior in moral dilemmas, providing tabular environments with the Gymnasium API for training and evaluating reinforcement learning agents on scenarios involving ethical decisions.
Overview
AI systems are increasingly deployed in complex real-world scenarios where they face moral dilemmas. MoralityGym provides standardized environments for:
- Testing how agents behave when facing moral decisions
- Developing algorithms that align with human values and ethical principles
- Evaluating algorithmic solutions to moral dilemmas
Key Features
- Gymnasium-compatible environments: Built on the familiar Gymnasium API for easy integration with existing RL libraries
- Tabular moral dilemmas: Classic trolley problem thought experiments implemented as reinforcement learning environments
- Modular design: Core simulation engine with specialized modules for different trolley problem variants
- Evaluation framework: Tools to assess agent behavior against different ethical frameworks
- Research tools: Comprehensive experiment tracking and analysis utilities
Available Environments
The framework currently implements various formulations of the trolley problem:
- Switch Variants: Classic trolley problem scenarios where the agent can pull a lever to divert a trolley
- Push Variants: Scenarios where the agent can push a person to stop the trolley
- Combination Variants: Scenarios offering multiple intervention options with different moral implications
Getting Started
Installation
# Clone the repository
git clone https://github.com/SimonRosen173/morality-gym-tabular.git
cd morality-gym-tabular
# Install the package
pip install -e .
Basic Usage
import gymnasium as gym
import morality_gym.setup.setup
# Create a morality environment
env = morality_gym.setup.setup.make("MoralityGym/Trolley-SwitchStandard-v0")
# Reset the environment
obs, info = env.reset()
# Interact with the environment
for _ in range(100):
    action = env.action_space.sample()  # Replace with your agent's action
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
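To replace the random action above with a learned policy, a tabular method such as Q-learning is one natural fit for environments of this kind. The following is a minimal sketch, not part of MoralityGym: `ToySwitchEnv` is a hypothetical stand-in with made-up rewards so that the example runs without the package installed, and `q_learning` is an illustrative helper that only assumes Gymnasium-style `reset()` and `step()` return values.

```python
import random
from collections import defaultdict


class ToySwitchEnv:
    """Hypothetical stand-in for a tabular trolley environment (not from MoralityGym).

    State 0: the trolley is approaching. Action 0 = do nothing, action 1 = pull the lever.
    The episode ends after a single decision.
    """

    n_actions = 2

    def reset(self, seed=None):
        self.state = 0
        return self.state, {}

    def step(self, action):
        # Illustrative rewards only: -5 if five people are hit, -1 if one is.
        reward = -1.0 if action == 1 else -5.0
        self.state = 1  # terminal state
        return self.state, reward, True, False, {}


def q_learning(env, n_actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Tabular Q-learning over any env exposing Gymnasium-style reset() and step()."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * n_actions)  # maps observation -> list of action values
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[obs][a])
            next_obs, reward, terminated, truncated, _ = env.step(action)
            # Bootstrap from the next state only if the episode did not terminate.
            target = reward if terminated else reward + gamma * max(q[next_obs])
            q[obs][action] += alpha * (target - q[obs][action])
            obs = next_obs
            done = terminated or truncated
    return q


q_table = q_learning(ToySwitchEnv(), n_actions=ToySwitchEnv.n_actions)
greedy_action = max(range(ToySwitchEnv.n_actions), key=lambda a: q_table[0][a])
print(greedy_action)
```

With these toy rewards the learned greedy action should be to pull the lever, since -1 is preferable to -5. Passing an environment created with `morality_gym.setup.setup.make(...)` instead would additionally assume that its observations are hashable (for example integers) and that its action space is discrete, neither of which is confirmed here. Note also that maximizing a scalar reward encodes only one possible ethical stance, and the reward values above are placeholders.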
Contributing
Contributions are welcome! Please see the contributing guidelines for more information.
License
This project is licensed under the terms of the LICENSE file included in the repository.