CatOpt-Flow: Agent Architecture Guide
This document outlines the architectural approach used in CatOpt-Flow and establishes guidelines for contributors and automated tooling. It complements the codebase and the CI/test contracts already in place.
Overview
- CatOpt-Flow is a production-oriented platform for multi-tenant ML training pipelines across heterogeneous accelerators.
- It models optimization as category-theory-inspired primitives: Objects (local tasks), Morphisms (data-exchange channels with versioned schemas), and Functors (adapters mapping device-specific problems to a vendor-agnostic representation).
- Limits/Colimits enforce global constraints by acting as an aggregator that stitches local problems into a globally consistent plan.
- An ADMM-like distributed solver runs on each node and communicates summarized statistics through a delta-sync protocol that tolerates dynamic scaling and partial failures.
- A lightweight schema registry and contract marketplace enable plug-and-play adapters for popular ML frameworks and hardware backends.
- Code generation tooling emits orchestration stubs (Rust/C++) and Python bindings for rapid deployment with minimal vendor lock-in.
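As a rough illustration of how the three primitives above might map onto code, here is a minimal Python sketch. Every name in it (LocalTask, Channel, gpu_adapter, and the device keys sm_count and mem_gib) is hypothetical and chosen only for this example; none of them is taken from the CatOpt-Flow codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# All names below are hypothetical and exist only for illustration.
@dataclass(frozen=True)
class LocalTask:
    """Object: a local optimization task owned by one node."""
    name: str
    params: Dict[str, float]


@dataclass(frozen=True)
class Channel:
    """Morphism: a data-exchange channel between two tasks with a versioned schema."""
    source: str
    target: str
    schema_version: str


# Functor: an adapter mapping a device-specific problem to the vendor-agnostic form.
Adapter = Callable[[Dict[str, float]], LocalTask]


def gpu_adapter(device_problem: Dict[str, float]) -> LocalTask:
    # Assumed device-specific keys: "sm_count" and "mem_gib".
    return LocalTask(
        name="gpu-task",
        params={
            "compute_units": device_problem["sm_count"],
            "memory_gib": device_problem["mem_gib"],
        },
    )


task = gpu_adapter({"sm_count": 108.0, "mem_gib": 80.0})
link = Channel(source=task.name, target="aggregator", schema_version="1.0.0")
```

The adapter is the only place that knows device-specific field names; everything downstream sees the vendor-agnostic LocalTask.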
What to Build (MVP Path)
- Protocol skeleton with two starter adapters per platform.
- Delta-sync, simple governance ledger, and identity primitives (DID-based).
- Cross-domain demo with a simulated domain (Phase 2) and hardware-in-the-loop (HIL) validation (Phase 3).
- A minimal DSL sketch: LocalProblem/SharedVariables/PlanDelta and toy adapters to bootstrap interoperability.
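One possible shape for the LocalProblem/SharedVariables/PlanDelta sketch is shown below. The type names come from the list above, but every field (node_id, objective, version, base_version, changes) and the apply_delta helper are assumptions made for this example, not the project's actual DSL.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LocalProblem:
    node_id: str
    objective: str  # free-form label (assumed field), e.g. "min_step_time"
    variables: Dict[str, float] = field(default_factory=dict)


@dataclass
class SharedVariables:
    version: int = 0
    values: Dict[str, float] = field(default_factory=dict)


@dataclass(frozen=True)
class PlanDelta:
    base_version: int          # version of SharedVariables this delta was computed against
    changes: Dict[str, float]  # only the entries that changed


def apply_delta(shared: SharedVariables, delta: PlanDelta) -> SharedVariables:
    """Hypothetical helper: merge a delta and bump the version, rejecting stale deltas."""
    if delta.base_version != shared.version:
        raise ValueError("stale delta: base_version does not match current version")
    merged = {**shared.values, **delta.changes}
    return SharedVariables(version=shared.version + 1, values=merged)


shared = SharedVariables(values={"batch_size": 256.0})
delta = PlanDelta(base_version=0, changes={"batch_size": 512.0, "lr": 0.01})
updated = apply_delta(shared, delta)
```

Carrying a base version on each delta is one simple way to detect out-of-order updates; a real delta-sync protocol would likely need a richer conflict policy.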
Development Rules
- All changes should be driven by tests. If a feature requires a new test, add it alongside the implementation.
- Run the existing test.sh to validate both the test suite and the packaging build. The script runs pytest and builds the package via python -m build.
- Do not break the public API unless explicitly requested. If you add new classes, export them from the package’s __init__.py to ease discoverability.
- When in doubt, add a small integration test demonstrating a 2-node ADMM interaction before expanding scope.
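A 2-node ADMM interaction test could look roughly like the following pytest-style sketch. It uses textbook scaled-form consensus ADMM on toy quadratic objectives f_i(x) = 0.5 * (x - a_i)^2, whose consensus solution is the mean of the a_i. The function names and the in-process "nodes" are assumptions for illustration; the project's actual solver and transport are not shown here.

```python
from typing import List


def local_update(a: float, z: float, u: float, rho: float) -> float:
    # Closed-form minimizer of 0.5*(x - a)^2 + (rho/2)*(x - z + u)^2.
    return (a + rho * (z - u)) / (1.0 + rho)


def run_consensus_admm(targets: List[float], rho: float = 1.0, iterations: int = 50) -> float:
    """Toy scaled-form consensus ADMM; each entry of targets stands in for one node."""
    n = len(targets)
    z = 0.0          # global consensus variable
    u = [0.0] * n    # scaled dual variable per node
    for _ in range(iterations):
        # Each node solves its local problem given the current consensus value.
        x = [local_update(a, z, u_i, rho) for a, u_i in zip(targets, u)]
        # The aggregator averages the summarized local results.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Each node updates its dual variable from its own residual.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]
    return z


def test_two_node_admm_reaches_consensus():
    # Two nodes pulling toward 1.0 and 3.0 should agree on the mean.
    z = run_consensus_admm([1.0, 3.0])
    assert abs(z - 2.0) < 1e-6
```

Keeping the first integration test this small makes convergence easy to verify by hand before real adapters and network transport are introduced.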
Publishing and Governance
- Publishable artifacts should include a clear README, a small DSL sketch, and a contract registry skeleton.
- Once all required checks pass, a READY_TO_PUBLISH file in the repository root signals that the artifacts are ready to publish.
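Release tooling could test for the marker file with a check along these lines. This is a sketch only: the is_ready_to_publish helper is hypothetical, and the guide does not specify what creates the file or what it contains, so the example checks for its presence and nothing else.

```python
import tempfile
from pathlib import Path


def is_ready_to_publish(repo_root: Path) -> bool:
    """Hypothetical helper: True if the READY_TO_PUBLISH marker exists in repo_root."""
    return (repo_root / "READY_TO_PUBLISH").is_file()


# Demonstration against a throwaway directory standing in for the repository root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = is_ready_to_publish(root)   # marker absent
    (root / "READY_TO_PUBLISH").touch()
    after = is_ready_to_publish(root)    # marker present
```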
Contributing
- Issues and PRs should reference the relevant sections of this guide and align with the MVP roadmap.
- Documentation updates should accompany code changes.
This file is intentionally lightweight but should be kept current with repository changes.