CatOpt-Flow: Agent Architecture Guide

This document outlines the architectural approach used in CatOpt-Flow and establishes guidelines for contributors and automated tooling. It complements the codebase and the CI/test contracts already in place.

Overview

  • CatOpt-Flow is a production-oriented platform for multi-tenant ML training pipelines across heterogeneous accelerators.
  • It models optimization as category-theory-inspired primitives: Objects (local tasks), Morphisms (data-exchange channels with versioned schemas), and Functors (adapters mapping device-specific problems to a vendor-agnostic representation).
  • Global constraints are enforced via Limits/Colimits, providing an aggregator that stitches local problems into a globally consistent plan.
  • An ADMM-like distributed solver runs on each node and communicates summarized statistics through a delta-sync protocol that tolerates dynamic scaling and partial failures.
  • A lightweight schema registry and contract marketplace enable plug-and-play adapters for popular ML frameworks and hardware backends.
  • Code-generation tooling emits orchestration stubs (Rust/C++) and Python bindings, so deployments can move quickly with minimal vendor lock-in.
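
To make the Object/Morphism/Functor vocabulary above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not CatOpt-Flow's actual API: the names LocalTask, Channel, Adapter, and key_map are hypothetical, and the adapter only renames parameter keys. Limits/Colimits and the delta-sync protocol are not covered.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LocalTask:
    """Object: a local optimization task owned by one node (hypothetical name)."""
    name: str
    params: Dict[str, float]


@dataclass(frozen=True)
class Channel:
    """Morphism: a data-exchange channel between two tasks with a versioned schema."""
    source: str
    target: str
    schema_version: str

    def compose(self, other: "Channel") -> "Channel":
        """Return the channel 'self, then other'.

        Composition is only defined when the endpoints line up and both
        channels speak the same schema version (an assumption of this sketch).
        """
        if self.target != other.source:
            raise ValueError("channels are not composable: endpoints differ")
        if self.schema_version != other.schema_version:
            raise ValueError("channels are not composable: schema versions differ")
        return Channel(self.source, other.target, self.schema_version)


@dataclass(frozen=True)
class Adapter:
    """Functor-like adapter: maps device-specific tasks to vendor-agnostic ones."""
    key_map: Dict[str, str]  # device-specific parameter name -> canonical name

    def map_task(self, task: LocalTask) -> LocalTask:
        # Rename known device-specific keys; leave unknown keys untouched.
        params = {self.key_map.get(k, k): v for k, v in task.params.items()}
        return LocalTask(task.name, params)

    def map_channel(self, channel: Channel) -> Channel:
        # In this toy, task names are unchanged, so channels map to themselves.
        return channel


if __name__ == "__main__":
    f = Channel("preprocess", "train", "v1")
    g = Channel("train", "evaluate", "v1")
    print(f.compose(g))

    adapter = Adapter(key_map={"cuda_lr": "learning_rate"})
    print(adapter.map_task(LocalTask("gpu0", {"cuda_lr": 0.1, "batch": 32.0})))
```

Because map_channel is the identity here, the adapter trivially preserves composition. A real adapter that also renamed tasks would need to rename channel endpoints consistently to keep that property.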

What to Build (MVP Path)

  • Protocol skeleton with two starter adapters per platform.
  • Delta-sync, a simple governance ledger, and DID-based (decentralized identifier) identity primitives.
  • Cross-domain demo with a simulated domain (Phase 2), followed by hardware-in-the-loop (HIL) validation (Phase 3).
  • A minimal DSL sketch: LocalProblem/SharedVariables/PlanDelta and toy adapters to bootstrap interoperability.
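
One possible shape for the LocalProblem/SharedVariables/PlanDelta sketch named above is shown here as plain Python dataclasses. Every field name, the version counter, and the propose_delta and apply_delta helpers are assumptions made for illustration; the guide does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LocalProblem:
    """A node's local view: which values it wants for the shared variables."""
    node_id: str
    objective: str  # human-readable label only; not evaluated in this sketch
    variables: Dict[str, float] = field(default_factory=dict)


@dataclass
class SharedVariables:
    """Globally agreed values, tagged with a monotonically increasing version."""
    version: int = 0
    values: Dict[str, float] = field(default_factory=dict)


@dataclass(frozen=True)
class PlanDelta:
    """A set of changes proposed against a specific SharedVariables version."""
    base_version: int
    changes: Dict[str, float]


def propose_delta(problem: LocalProblem, shared: SharedVariables) -> PlanDelta:
    """Collect only the variables whose local value differs from the shared one."""
    changes = {
        key: value
        for key, value in problem.variables.items()
        if shared.values.get(key) != value
    }
    return PlanDelta(base_version=shared.version, changes=changes)


def apply_delta(shared: SharedVariables, delta: PlanDelta) -> SharedVariables:
    """Return a new SharedVariables with the delta merged in.

    Deltas built against an older version are rejected, so a late or
    duplicated message cannot silently overwrite newer state.
    """
    if delta.base_version != shared.version:
        raise ValueError("stale delta: base_version does not match shared version")
    merged = {**shared.values, **delta.changes}
    return SharedVariables(version=shared.version + 1, values=merged)


if __name__ == "__main__":
    shared = SharedVariables(version=0, values={"lr": 0.1, "batch": 64.0})
    problem = LocalProblem("node-a", "minimize loss", {"lr": 0.05, "batch": 64.0})
    delta = propose_delta(problem, shared)
    print(delta)
    print(apply_delta(shared, delta))
```

Rejecting stale deltas is only one possible policy. A system that must tolerate partial failures might instead queue or rebase them.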

Development Rules

  • All changes should be driven by tests. If a feature requires a new test, add it alongside the implementation.
  • Run the existing test.sh to validate both the tests and the packaging build. The script runs pytest and then builds the package via python -m build.
  • Do not break the public API unless explicitly requested. If you add new classes, export them from the package's __init__.py to ease discoverability.
  • When in doubt, add a small integration test demonstrating a 2-node ADMM interaction before expanding scope.
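
As a starting point for the 2-node integration test suggested above, the following pytest-style sketch runs a textbook scaled-form consensus ADMM on a toy quadratic problem. It does not use CatOpt-Flow's real solver or delta-sync protocol: the Node class, run_consensus, and the objective are hypothetical stand-ins. Each node i minimizes 0.5 * (x - target_i)^2, so the consensus value should converge to the mean of the targets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """One participant holding a private quadratic objective 0.5 * (x - target)^2."""
    target: float
    x: float = 0.0  # local primal variable
    u: float = 0.0  # scaled dual variable

    def local_step(self, z: float, rho: float) -> float:
        # Closed-form minimizer of 0.5*(x - target)^2 + (rho/2)*(x - z + u)^2.
        self.x = (self.target + rho * (z - self.u)) / (1.0 + rho)
        # Only this summary (x + u) is sent to the aggregator.
        return self.x + self.u

    def dual_step(self, z: float) -> None:
        # Accumulate the disagreement between the local and consensus values.
        self.u += self.x - z


def run_consensus(nodes: List[Node], rho: float = 1.0, iterations: int = 50) -> float:
    """Run a fixed number of ADMM rounds and return the consensus value z."""
    z = 0.0
    for _ in range(iterations):
        messages = [node.local_step(z, rho) for node in nodes]
        z = sum(messages) / len(messages)  # aggregator averages the summaries
        for node in nodes:
            node.dual_step(z)
    return z


def test_two_node_admm_reaches_consensus() -> None:
    nodes = [Node(target=1.0), Node(target=3.0)]
    z = run_consensus(nodes)
    # The minimizer of the summed objectives is the mean of the targets.
    assert abs(z - 2.0) < 1e-6
    # Both local variables agree with the consensus value.
    assert all(abs(node.x - z) < 1e-6 for node in nodes)
```

Saved in a file that pytest collects (for example, one whose name starts with test_), this would run as part of the pytest step in test.sh. Swapping the toy Node for real adapters is the natural next step once the two-node case passes.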

Publishing and Governance

  • Publishable artifacts should include a clear README, a small DSL sketch, and a contract registry skeleton.
  • A ready-to-publish signal is provided via a READY_TO_PUBLISH file in the repository root once all required checks pass.

Contributing

  • Open issues and PRs should reference sections of this guide and align with the MVP roadmap.
  • Documentation updates should accompany code changes.

This file is intentionally lightweight but should be kept current with repository changes.