AI and Productive Relations 3: Institutional Design for AI Agents

Alignment, safety, and interpretability are necessary. AI governance also needs institutional design for ownership, work, compute, and accountability.

From Technical Safety to Institutional Governance

The first post treated AI as a productive capacity: a material system of compute, energy, data, models, chips, and infrastructure. The second treated capable agents as production subjects: semi-autonomous actors inside firms, markets, workplaces, and institutions.

This final post asks what governance follows from that picture.

Technical AI safety already addresses part of the problem. Constitutional AI, alignment, responsible scaling policies, interpretability, autonomy measurement, agent protocols, and model welfare research make systems more controllable, inspectable, and governable.

They do not settle the institutional question:

Who writes the rules for artificial agents, who benefits from their deployment, who can contest their decisions, and who controls the infrastructure they require?

That is an institutional design question. It concerns legitimate authority, accountability, and the distribution of benefits and costs.

Constitutional AI as Internal Norm Formation

Anthropic’s work on Constitutional AI gives a model written principles and trains it to critique, revise, and constrain its own behavior according to those principles. The method uses AI feedback to reduce reliance on direct human labeling for every harmful or borderline output.

Read narrowly, this is a safety technique. Read socially, it is an early form of internal norm formation for artificial agents.

Human institutions do not rely only on punishment after the fact. They shape behavior through education, law, professional norms, culture, organizational routines, and internalized constraints. If AI agents act inside social production, they need internal constraints too. A model constitution is one technical mechanism for producing them.
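
To make the critique-and-revise idea concrete, here is a minimal sketch of the loop at inference time. It is not Anthropic's implementation: in the published method a model writes the critiques and revisions, and the revised outputs then feed further training. The `Principle` class, the two toy principles, and the string-matching checks are all hypothetical stand-ins chosen so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Principle:
    """One written principle (hypothetical structure, for illustration only)."""
    name: str
    violates: Callable[[str], bool]  # stand-in for a model-generated critique
    revise: Callable[[str], str]     # stand-in for a model-generated revision


def critique_and_revise(
    draft: str, constitution: List[Principle], max_rounds: int = 3
) -> Tuple[str, List[Tuple[int, str]]]:
    """Check the draft against each principle and revise until no principle
    is violated or max_rounds is reached. Returns the final text and a log
    of (round, principle name) pairs, one per revision applied."""
    text = draft
    log: List[Tuple[int, str]] = []
    for round_index in range(max_rounds):
        changed = False
        for principle in constitution:
            if principle.violates(text):
                text = principle.revise(text)
                log.append((round_index, principle.name))
                changed = True
        if not changed:
            break
    return text, log


# Toy constitution: simple string rules stand in for model judgments.
constitution = [
    Principle(
        name="privacy",
        violates=lambda t: "123-45-6789" in t,
        revise=lambda t: t.replace("123-45-6789", "[redacted]"),
    ),
    Principle(
        name="no_overclaiming",
        violates=lambda t: "guaranteed" in t,
        revise=lambda t: t.replace("guaranteed", "likely"),
    ),
]

final_text, revision_log = critique_and_revise(
    "Returns are guaranteed. Contact ID 123-45-6789.", constitution
)
```

The revision log is the part that matters for governance: it records which principle triggered which change, and that record is what makes a constitution inspectable rather than merely effective.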

The limitation is just as important. A company-written model constitution can improve behavior, clarify values, and make safety choices more inspectable. It still leaves open who chooses the principles, how conflicts between principles are resolved, and how affected users and communities can contest the rules.

Alignment to Whom?

Alignment usually means making a system follow human intentions or values. That phrase works for a low-stakes assistant. It becomes ambiguous for production agents.

A system can align with a user’s request, an organization’s policy, a platform’s business model, a security requirement, a worker’s interest, a community’s welfare, a regulator’s rule, or a public-interest objective. These targets can conflict.

A sales agent may optimize revenue while harming consumers. A workplace management agent may optimize throughput while eroding worker autonomy. A recommender may optimize engagement while degrading information quality. A security-oriented model may satisfy its operator’s objective while creating broader institutional risk.

For high-stakes deployments, alignment has to expand from behavioral compliance to institutional alignment. The key question is not simply whether the model follows instructions. The question is whether the institution giving those instructions has legitimate goals, accountable authority, and contestable procedures.

A technically obedient system can still be socially harmful.

Safety, Legitimacy, and Distribution

Responsible scaling policies address another part of the puzzle. Anthropic’s Responsible Scaling Policy Version 3.0 describes its voluntary framework for mitigating catastrophic risks from advanced AI systems. The third version separates company plans from broader industry recommendations and introduces a Frontier Safety Roadmap.

This kind of safety policy matters. Industrial societies regulate risky productive systems: aviation, nuclear power, pharmaceuticals, finance, mining, chemical plants, and critical infrastructure. AI agents may require analogous institutions: pre-deployment evaluations, red teaming, incident reporting, post-deployment monitoring, capability thresholds, access controls, and emergency shutdown procedures.
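
As a rough illustration of how capability thresholds and pre-deployment evaluations could combine into a release decision, consider the sketch below. The metric names, numeric thresholds, and safeguard labels are hypothetical and are not taken from any published policy. The sketch only shows the shape of the rule: a missing evaluation blocks release, and a score above a threshold requires a matching safeguard.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Threshold:
    """A capability threshold (all fields are illustrative assumptions)."""
    metric: str              # name of a pre-deployment evaluation
    max_score: float         # scores above this value trigger the safeguard
    required_safeguard: str  # safeguard that must be in place above threshold


def deployment_decision(
    eval_scores: Dict[str, float],
    thresholds: List[Threshold],
    safeguards_in_place: Set[str],
) -> Dict[str, object]:
    """Approve release only if every evaluation was run and every crossed
    threshold has its required safeguard in place."""
    missing: List[str] = []
    for threshold in thresholds:
        if threshold.metric not in eval_scores:
            # An evaluation that was never run blocks release by itself.
            missing.append(f"evaluation:{threshold.metric}")
        elif (
            eval_scores[threshold.metric] > threshold.max_score
            and threshold.required_safeguard not in safeguards_in_place
        ):
            missing.append(threshold.required_safeguard)
    return {"approved": not missing, "missing": missing}


# Hypothetical thresholds and scores.
thresholds = [
    Threshold("misuse_uplift", 0.5, "restricted_access"),
    Threshold("autonomy", 0.7, "human_approval_for_actions"),
]

decision = deployment_decision(
    eval_scores={"misuse_uplift": 0.8, "autonomy": 0.9},
    thresholds=thresholds,
    safeguards_in_place={"restricted_access"},
)
```

Here `decision` reports that release is blocked because `human_approval_for_actions` is missing. The rule itself is simple. The institutional questions are who sets the thresholds and who verifies the safeguards.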

Safety policy still leaves distribution open. A system can reduce catastrophic misuse while concentrating wealth, weakening workplace bargaining power, expanding surveillance, extracting rents from dependent firms or regions, or making public services depend on private platforms.

Technical safety is necessary. Legitimacy and fair distribution require more.

Interpretability as Audit Science

Interpretability has a central role in this broader governance picture. My technical posts treat mechanistic interpretability and causal representation learning as ways to make learned computation scientifically legible. In social deployment, interpretability also becomes audit science.

If an AI agent refuses a task, manipulates a user, colludes with another agent, discriminates between workers, hides information, or optimizes against institutional goals, output observation may fail. We need tools for understanding internal representations, circuits, objectives, uncertainty, and failure modes.

Interpretability can support safety evaluation by identifying dangerous capabilities and mechanisms. It can support legal evidence by explaining consequential decisions. It can support workplace protections by auditing automated evaluation and discipline systems. It can support market oversight by detecting collusion, deception, or manipulation. It can support external accountability by making deployed systems contestable.

Interpretability explains mechanisms. Institutions decide rights, ownership, appeal, remedy, and authority.

Three Governance Layers

AI governance has at least three layers.

Layer	Object	Main Question
Model layer	internal model behavior	What principles constrain outputs and actions?
Deployment layer	release, access, monitoring, and revocation	Where, how, and under what safeguards can the system operate?
Institutional layer	ownership, access, work, and accountability	Who owns, governs, contests, and benefits from AI infrastructure?

The model layer covers honesty norms, refusal behavior, privacy constraints, helpfulness, and harm avoidance.

The deployment layer covers evaluations, risk thresholds, red teaming, access controls, agent permissions, security requirements, incident reporting, monitoring, and shutdown procedures. The NIST AI Risk Management Framework and the EU Artificial Intelligence Act sit partly at this layer.

The institutional layer covers compute access, research infrastructure, workplace governance under AI management, competition policy, funding mechanisms around AI-generated gains, data rights, energy allocation, cross-border deployment, agent identity, and stakeholder participation.

The third layer cannot come from model companies alone. It depends on public institutions, worker representation, technical expertise, community input, and international coordination.

Agent Protocols as Institutional Infrastructure

Agentic AI also needs protocols. Anthropic introduced the Model Context Protocol as an open standard for connecting AI systems to data sources and tools. Later, Anthropic announced that it was donating MCP to the Agentic AI Foundation, a directed fund under the Linux Foundation.

This is technical infrastructure, but it also has institutional effects. Protocols define what agents can access, how permissions work, how actions are logged, how identities appear, how systems revoke access, and how one tool recognizes another.
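
A small sketch shows how permissions, logging, and revocation fit together at the protocol level. It does not use or imitate the MCP API. `ToolGateway`, `AuditEntry`, and the agent and tool names are hypothetical, and a real protocol would also need authentication, scoped credentials, and tamper-resistant logs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass(frozen=True)
class AuditEntry:
    """One logged access attempt (hypothetical record format)."""
    timestamp: str
    agent_id: str
    tool: str
    allowed: bool


@dataclass
class ToolGateway:
    """Mediates every tool request: checks grants and logs the outcome."""
    grants: Dict[str, Set[str]] = field(default_factory=dict)
    audit_log: List[AuditEntry] = field(default_factory=list)

    def grant(self, agent_id: str, tool: str) -> None:
        self.grants.setdefault(agent_id, set()).add(tool)

    def revoke(self, agent_id: str, tool: str) -> None:
        # Revoking a grant that does not exist is a no-op.
        self.grants.get(agent_id, set()).discard(tool)

    def request(self, agent_id: str, tool: str) -> bool:
        """Return whether access is allowed. Denied requests are logged too."""
        allowed = tool in self.grants.get(agent_id, set())
        self.audit_log.append(
            AuditEntry(
                timestamp=datetime.now(timezone.utc).isoformat(),
                agent_id=agent_id,
                tool=tool,
                allowed=allowed,
            )
        )
        return allowed


gateway = ToolGateway()
gateway.grant("agent-7", "payroll.read")
first = gateway.request("agent-7", "payroll.read")   # allowed
gateway.revoke("agent-7", "payroll.read")
second = gateway.request("agent-7", "payroll.read")  # denied after revocation
```

Logging denied requests as well as allowed ones is a deliberate choice in this sketch. An auditor usually needs to know what an agent attempted, and a log of successes alone does not show that.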

Rail standards, accounting systems, corporate registries, electrical grids, and internet protocols all became institutional infrastructure for earlier economic systems. Agent protocols may play a similar role for AI production.

Protocol design is therefore more than plumbing. It shapes auditability, security, market structure, platform power, and external accountability.

Model Welfare Without Anthropomorphism

Model welfare research raises a separate question. Anthropic’s model welfare program asks whether future systems might have morally relevant experiences or preferences. The topic is difficult, and Anthropic’s own announcement of the program emphasizes uncertainty and the lack of scientific consensus.

A clean governance agenda should separate three concepts:

Concept	Question	Evidence Needed
Economic agency	Can the system act as a stable participant in production?	behavior, tool access, persistence, institutional effects
Legal traceability	Can institutions assign responsibility for its actions?	identity, logs, authority, controller, permissions
Moral patienthood	Could the system have welfare-relevant experience?	a much stronger theory and evidence of experience

Economic agency may arrive before legal traceability is mature. Moral patienthood remains an open question. Collapsing all three into one debate creates two errors at once: anthropomorphism on one side, institutional blindness on the other.

Current governance can recognize artificial agents as economically consequential without treating them as moral patients.
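
The legal traceability row can be read as a data requirement. The sketch below assumes a hypothetical registry of agent records and checks, for a single logged action, which pieces of evidence from the table are missing. The field names and the example registry are illustrative and do not correspond to any existing legal standard.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class AgentRecord:
    """Registry entry for one agent (hypothetical schema)."""
    agent_id: str
    controller: Optional[str]    # person or organization answerable for the agent
    authority: Optional[str]     # mandate under which the agent acts
    permissions: FrozenSet[str]  # actions the agent is allowed to take


@dataclass(frozen=True)
class LoggedAction:
    """One entry from an action log."""
    agent_id: str
    action: str


def traceability_gaps(
    action: LoggedAction, registry: Dict[str, AgentRecord]
) -> List[str]:
    """List the evidence missing for assigning responsibility for an action.
    An empty list means the action traces back to a named controller acting
    under a stated authority and within granted permissions."""
    record = registry.get(action.agent_id)
    if record is None:
        return ["identity"]  # nothing else can be checked without an identity
    gaps: List[str] = []
    if not record.controller:
        gaps.append("controller")
    if not record.authority:
        gaps.append("authority")
    if action.action not in record.permissions:
        gaps.append("permissions")
    return gaps


registry = {
    "agent-1": AgentRecord(
        "agent-1", "Acme Logistics", "procurement mandate", frozenset({"place_order"})
    ),
    "agent-2": AgentRecord("agent-2", None, None, frozenset()),
}

traced = traceability_gaps(LoggedAction("agent-1", "place_order"), registry)
untraced = traceability_gaps(LoggedAction("agent-2", "place_order"), registry)
unknown = traceability_gaps(LoggedAction("agent-9", "place_order"), registry)
```

None of these checks says anything about experience or welfare. That is why traceability can be built and enforced while the moral patienthood question stays open.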

A Research Agenda for Artificial Production Subjects

If AI becomes a productive capacity and some agents become production subjects, the institutional agenda expands beyond model behavior.

Domain	Institutional Question
Agent identity	Which high-autonomy systems must register, log actions, and declare controllers?
Liability	How should responsibility split across developers, deployers, owners, users, and tool providers?
Workplace governance	What protections apply when workers are managed, scored, or disciplined by AI?
Audit	Who can inspect consequential automated decisions, and under what process?
Compute access	What shared compute should support science, education, health, law, and local government?
Distribution	How should institutions share gains from compute, models, platforms, and automation?
Energy	Who pays for data center grid expansion, and who receives priority during scarcity?
Competition	How should interoperability rules limit platform lock-in?
Cross-border deployment	How should cross-border agents, cloud dependence, and compute concentration be governed?

This table is a research agenda rather than a single policy program. Different societies will choose different mixes of public ownership, regulation, market design, workplace protections, welfare policy, and industrial strategy.

The core point is simple: technical alignment cannot substitute for these choices.

Summary

AI governance needs three layers. Model layers shape internal behavior. Deployment layers manage release, access, monitoring, and risk. Institutional layers determine ownership, workplace protections, distribution, accountability, and participation.

Technical AI safety, interpretability, and alignment are necessary. They do not decide who controls compute, who receives AI-generated value, who can contest automated decisions, or how artificial agents participate in markets and institutions.

The main conclusion of this series is therefore:

The AI era needs a research program for artificial production subjects and the institutions around them.

The technical task is to understand and control AI systems. The institutional task is to build accountability mechanisms before those systems become too deeply embedded in production, governance, markets, and everyday life to redirect.

Citation

If you found this post useful, please consider citing it:

@article{song2026aiproductiverelations3,
  title={AI and Productive Relations 3: Institutional Design for AI Agents},
  author={Song, Xiangchen},
  year={2026},
  month={May},
  url={https://xiangchensong.github.io/blog/2026/ai-productive-relations-3/}
}