Personalized Interactive Environment For Automata Combination Exploration — a human-feedback and evaluation platform for reasoning agents. Annotate traces, train reward models from preferences, and switch between agents mid-session.
PIEFACE is a controllable, verifiable sandbox for training and evaluating reasoning agents. It pairs a ground-truth verifier with a human-in-the-loop annotation interface, so every agent action can be checked, scored, and fed back into a reward model — live.
While the current demo uses a symbolic reasoning domain from theoretical CS, the platform generalises to any task with discrete, maskable actions and a programmatic verifier.
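To make the plug-in contract concrete, here is a minimal sketch of what "discrete, maskable actions plus a programmatic verifier" can look like. The names (`CountingEnv`, `action_mask`, `verify`) are illustrative assumptions, not PIEFACE's actual API:

```python
from typing import List

class CountingEnv:
    """Toy domain: reach a target count using +1 / +2 moves.
    Illustrative only; any domain exposing this shape would fit."""

    ACTIONS = [1, 2]

    def __init__(self, target: int):
        self.target = target
        self.total = 0

    def action_mask(self) -> List[bool]:
        # Legality mask: an action is legal if it does not overshoot.
        return [self.total + a <= self.target for a in self.ACTIONS]

    def step(self, action_idx: int) -> None:
        assert self.action_mask()[action_idx], "illegal action"
        self.total += self.ACTIONS[action_idx]

    def verify(self) -> bool:
        # Ground-truth check: did the trace actually solve the task?
        return self.total == self.target
```

Anything with this shape — a discrete action set, a legality mask, and a deterministic pass/fail verifier — can in principle be dropped into the same annotation and reward-model loop.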
Step through agent traces interactively. Accept, deny, or override each action with a single click. Feedback is logged per step, providing a fine-grained reward signal.
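A per-step feedback record might look like the following sketch. The field names and JSON shape are assumptions for illustration, not the platform's actual schema:

```python
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional
import json

class Verdict(Enum):
    ACCEPT = "accept"
    DENY = "deny"
    OVERRIDE = "override"

@dataclass
class StepFeedback:
    """One annotator judgment on one agent action within a trace."""
    trace_id: str
    step_index: int
    proposed_action: int
    verdict: Verdict
    # Set only when the annotator overrides with their own action.
    override_action: Optional[int] = None

    def to_json(self) -> str:
        d = asdict(self)
        d["verdict"] = self.verdict.value
        return json.dumps(d)
```

Logging at this granularity is what lets the reward model credit or penalize individual actions rather than whole trajectories.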
Human preferences are stored and used to train a personalized reward model. Track RLHF vs baseline success rates in real time via the built-in metrics panel.
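As a sketch of how per-step preferences can become a reward model, here is a minimal Bradley-Terry-style learner: a linear reward r(x) = w·x trained so preferred behavior scores higher than rejected behavior. The feature representation and training loop are illustrative assumptions, not PIEFACE's implementation:

```python
import math

def train_reward_model(prefs, dim, lr=0.5, epochs=200):
    """prefs: list of (winner_features, loser_features) pairs.
    Learns weights w by gradient ascent on the Bradley-Terry
    log-likelihood of the observed preferences."""
    w = [0.0] * dim
    for _ in range(epochs):
        for win, lose in prefs:
            # P(win preferred over lose) = sigmoid(r(win) - r(lose))
            margin = sum(wi * (a - b) for wi, a, b in zip(w, win, lose))
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient of log p with respect to w is (1 - p) * (win - lose).
            g = 1.0 - p
            for i in range(dim):
                w[i] += lr * g * (win[i] - lose[i])
    return w

def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))
```

A real pipeline would use a richer featurization and a neural scorer, but the training signal — maximize the likelihood that preferred steps outscore rejected ones — is the same idea.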
Replay any saved trace step-by-step, inspect intermediate states, and compare how different policies handle the same scenario.
Swap between trained agents or take over manually at any point mid-trace. Run head-to-head comparisons without restarting the environment.
The demo environment is built around gadget reductions from computational complexity theory. Gadgets are modular components used in hardness reductions — like logic gates for encoding constraints in puzzles such as Sokoban or PushPush.
The agent's task is to construct a simulation of one gadget type from instances of another; a successful simulation shows the simulating gadget is at least as expressive, and simulations in both directions establish computational equivalence. See Demaine, Hendrickson & Lynch (2020) for the theory.
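In the Demaine-Hendrickson-Lynch framework, a gadget is characterized by its transition relation: tuples of (state, entry location, exit location, new state). A hedged sketch of what the verifier must decide, with all names illustrative, is whether a candidate construction exposes the same relation as the target up to relabeling of internal states:

```python
from itertools import permutations

def simulates(candidate, target):
    """Brute-force check: does some bijection on states make the
    candidate's transition relation equal to the target's?
    Each gadget is a set of (state, entry, exit, new_state) tuples;
    states are assumed hashable and sortable."""
    cand_states = sorted({s for (s, _, _, t) in candidate} |
                         {t for (_, _, _, t) in candidate})
    targ_states = sorted({s for (s, _, _, t) in target} |
                         {t for (_, _, _, t) in target})
    if len(cand_states) != len(targ_states):
        return False
    for perm in permutations(targ_states):
        relabel = dict(zip(cand_states, perm))
        mapped = {(relabel[s], i, o, relabel[t])
                  for (s, i, o, t) in candidate}
        if mapped == set(target):
            return True
    return False
```

The real verification problem is harder — the candidate is a network of gadget instances whose joint behavior at external locations must be computed before comparison — but this captures the final equality check.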
Built at MIT CSAIL. Questions or collaboration inquiries: LinkedIn · zacburton [at] alum [dot] mit [dot] edu