Production compiler rails for ML systems

Distributed training with deterministic controls, measurable throughput, and visible bottleneck telemetry.

PyC packages compiler-next contracts, runtime fallbacks, and benchmark publication into a single operational loop. CPU orchestration and GPU execution are intentionally split, then rejoined through instrumentation so performance changes are explainable.


Operational Flow

Runtime stages are coordinated as a conveyor: host-side preprocessing, pinned-memory transfer, GPU compute, communication sync, then telemetry publication. This keeps throughput high while preserving deterministic rollback behavior.

- Pinned host memory staging
- Async H2D dispatch
- NCCL-synchronized gradient flow
- Artifact publication pipeline
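The conveyor above can be sketched as a timed loop: every stage is measured on every step so the bottleneck is always visible in telemetry. A minimal Python sketch; the stage names mirror the list above, but the `time.sleep` bodies and durations are hypothetical stand-ins for the native runtime stages.

```python
import time

# Stage order mirrors the conveyor described above. The durations are
# hypothetical stand-ins for the real host/GPU/communication work.
STAGES = [
    ("pinned_host_staging", 0.002),
    ("async_h2d_dispatch", 0.001),
    ("gpu_compute", 0.010),
    ("nccl_sync", 0.003),
    ("artifact_publication", 0.001),
]

def run_step():
    """Run one conveyor step and return per-stage wall time in seconds."""
    timings = {}
    for name, work_s in STAGES:
        start = time.perf_counter()
        time.sleep(work_s)  # placeholder for the real stage body
        timings[name] = time.perf_counter() - start
    return timings

timings = run_step()
# Publishing the slowest stage per step makes regressions point at a
# concrete stage instead of an opaque end-to-end number.
print("bottleneck:", max(timings, key=timings.get))
```

Because each stage is timed independently, a throughput change shows up as a shift in exactly one stage's share of the step, which is what makes performance changes explainable.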

Install and Validate

cmake -S . -B build -D PYC_BUILD_COMPILER_NEXT=ON -D PYC_BUILD_COMPILER_NEXT_TESTS=ON
cmake --build build --parallel
ctest --test-dir build -C Release --output-on-failure
./build/pyc

Binary downloads are published with each release; see Release Assets.

Latest Distributed Evidence


Charts published with each run: latest campaign summary, distributed throughput comparison, latest pipeline breakdown.

Published artifacts: manifest.json | latest-summary.json | distributed-latest.json
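A downstream consumer can gate on those published artifacts before trusting a run. A minimal sketch, assuming only that each artifact is a JSON file; the actual schemas of `manifest.json`, `latest-summary.json`, and `distributed-latest.json` are not assumed here.

```python
import json
from pathlib import Path

# Artifact names come from the publication pipeline above; only
# "exists and parses as JSON" is checked, not any particular schema.
ARTIFACTS = ["manifest.json", "latest-summary.json", "distributed-latest.json"]

def check_artifacts(root):
    """Return {name: None on success, or an error string} per artifact."""
    results = {}
    for name in ARTIFACTS:
        path = Path(root) / name
        try:
            json.loads(path.read_text())
            results[name] = None
        except (OSError, json.JSONDecodeError) as exc:
            results[name] = str(exc)
    return results

print(check_artifacts("."))
```

A CI gate could fail the publication step whenever any value in the result is non-None, so a half-published run never reaches consumers.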

Compiler Adapter Baseline


CPU Adapter Summary

Adapter | Mode | Mean (ms) | P50 (ms) | P95 (ms) | Throughput

GPU Adapter Summary

Adapter | Mode | Mean (ms) | P50 (ms) | P95 (ms) | Throughput
Charts published with each baseline: CPU baseline snapshot, GPU baseline snapshot.
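The summary columns can be reproduced from raw per-iteration latencies. A minimal sketch using nearest-rank percentiles; the real harness's percentile method and throughput definition may differ, and `batch` (items per iteration) is an assumed parameter.

```python
import statistics

def summarize(latencies_ms, batch):
    """Compute mean/P50/P95 latency and throughput from per-iteration times.

    latencies_ms: per-iteration wall times in milliseconds.
    batch: items processed per iteration (assumed parameter).
    """
    ordered = sorted(latencies_ms)

    def pct(p):
        # Nearest-rank percentile; a harness may use interpolation instead.
        idx = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
        return ordered[idx]

    mean = statistics.fmean(latencies_ms)
    return {
        "mean_ms": mean,
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "throughput_per_s": batch / (mean / 1000.0),
    }

# Example: 100 iterations with latencies 1..100 ms at batch size 32.
print(summarize([float(i) for i in range(1, 101)], batch=32))
```

Reporting P95 alongside the mean matters here: a tail-latency regression (e.g. an occasional slow NCCL sync) moves P95 sharply while barely touching the mean.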

Release Assets