CatOpt-Flow: Agent Architecture Guide
=====================================

This document outlines the architectural approach used in CatOpt-Flow and establishes guidelines for contributors and automated tooling. It complements the codebase and the CI/test contracts already in place.

Overview
--------

- CatOpt-Flow is a production-oriented platform for multi-tenant ML training pipelines across heterogeneous accelerators.
- It models optimization as category-theory-inspired primitives: Objects (local tasks), Morphisms (data-exchange channels with versioned schemas), and Functors (adapters mapping device-specific problems to a vendor-agnostic representation).
- Global constraints are enforced via Limits/Colimits, providing an aggregator that stitches local problems into a globally consistent plan.
- An ADMM-like distributed solver runs on each node and communicates summarized statistics through a delta-sync protocol that tolerates dynamic scaling and partial failures.
- A lightweight schema registry and contract marketplace enable plug-and-play adapters for popular ML frameworks and hardware backends.
- Code generation tooling is provided to output orchestration stubs (Rust/C++) and Python bindings for rapid deployment with minimal vendor lock-in.

What to Build (MVP Path)
------------------------

- Protocol skeleton with two starter adapters per platform.
- Delta-sync, simple governance ledger, and identity primitives (DID-based).
- Cross-domain demo with a simulated domain (Phase 2) and HIL validation (Phase 3).
- A minimal DSL sketch: LocalProblem/SharedVariables/PlanDelta and toy adapters to bootstrap interoperability.

Development Rules
-----------------

- All changes should be driven by tests. If a feature requires a new test, add it alongside the implementation.
- Use the existing test.sh to validate tests and packaging build. The script runs pytest and builds the package via python -m build.
- Do not break the public API unless explicitly requested.
- If you add new classes, export them from the package’s __init__ to ease discoverability.
- When in doubt, add a small integration test demonstrating a 2-node ADMM interaction before expanding scope.

Publishing and Governance
-------------------------

- Publishable artifacts should include a clear README, a small DSL sketch, and a contract registry skeleton.
- A ready-to-publish signal is provided via a READY_TO_PUBLISH file in the repository root once all required checks pass.

Contributing
------------

- Open issues and PRs should reference sections of this guide and align with the MVP roadmap.
- Documentation updates should accompany code changes.

This file is intentionally lightweight but should be kept current with repository changes.
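
As an illustration of the 2-node ADMM integration test suggested under Development Rules, the following is a minimal sketch, not the project's actual solver. The class names LocalProblem, SharedVariables, and PlanDelta come from the DSL sketch named in the MVP path, but every field, the quadratic objective, and the helper functions (local_step, aggregate, dual_step, run_admm) are assumptions made here for illustration. It uses standard scaled-form consensus ADMM on scalar problems.

```python
from dataclasses import dataclass


@dataclass
class LocalProblem:
    """Hypothetical local task: minimize 0.5 * weight * (x - target)**2."""
    node_id: str
    weight: float
    target: float
    x: float = 0.0  # local primal estimate
    u: float = 0.0  # scaled dual variable


@dataclass
class SharedVariables:
    """Hypothetical consensus state visible to every node."""
    z: float = 0.0
    version: int = 0  # bumped on every aggregation round


@dataclass
class PlanDelta:
    """Hypothetical summary a node sends after its local step."""
    node_id: str
    contribution: float  # x + u, the only statistic the aggregator needs


def local_step(p: LocalProblem, shared: SharedVariables, rho: float) -> PlanDelta:
    # Closed-form x-update for the quadratic objective plus the ADMM penalty.
    p.x = (p.weight * p.target + rho * (shared.z - p.u)) / (p.weight + rho)
    return PlanDelta(p.node_id, p.x + p.u)


def aggregate(deltas: list, shared: SharedVariables) -> None:
    # z-update: average the contributions reported by all nodes.
    shared.z = sum(d.contribution for d in deltas) / len(deltas)
    shared.version += 1


def dual_step(p: LocalProblem, shared: SharedVariables) -> None:
    # u-update: accumulate the disagreement between local and shared values.
    p.u += p.x - shared.z


def run_admm(problems: list, rho: float = 1.0, iterations: int = 100) -> SharedVariables:
    shared = SharedVariables()
    for _ in range(iterations):
        deltas = [local_step(p, shared, rho) for p in problems]
        aggregate(deltas, shared)
        for p in problems:
            dual_step(p, shared)
    return shared


def test_two_node_admm_reaches_consensus():
    nodes = [
        LocalProblem("node-a", weight=1.0, target=2.0),
        LocalProblem("node-b", weight=3.0, target=6.0),
    ]
    shared = run_admm(nodes)
    # The minimizer of the summed quadratics is the weighted mean of the targets.
    expected = (1.0 * 2.0 + 3.0 * 6.0) / (1.0 + 3.0)
    assert abs(shared.z - expected) < 1e-6
    assert all(abs(p.x - shared.z) < 1e-6 for p in nodes)
```

A test of this shape can be collected by pytest as written. In a real deployment the PlanDelta objects would travel over the delta-sync protocol rather than being passed in-process, and the local objective would come from an adapter instead of a fixed quadratic.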