
Trainy
Active
Infrastructure for managing GPU clusters for training/serving.
This company previously operated under the name "AB Labs". A rename frequently marks a pivot in positioning or product, which makes it useful raw material for variant ideas.
About
Goodbye Slurm, Hello Konduktor. Trainy Konduktor is a software platform for AI teams to schedule workloads with priority, control resource allocation, and improve GPU reliability. With Konduktor, teams submit jobs to a healthy pool of GPUs, assign job priority with a simple user interface, and never worry about hardware faults again.
Founders · 2
Studied CS & Mathematics at UC Santa Cruz. Led Audio team at Hive AI, where we trained and deployed 500M parameter-scale models to production.
Co-founder and CTO at Trainy, building a training platform to make deep learning go faster. Previously a lead Machine Learning Engineer for Hive AI's object detection products. I completed my Physics Ph.D. at UC Berkeley in '22, where my thesis focused on applying computer vision and deep learning to nanoscience. Physics & Computer Science B.A., UC Berkeley '17.
Launch
Dashboards to help ML engineers training large models isolate performance bottlenecks and boost training speed.
Trainy builds a dashboard that aggregates profiling data across many GPUs to help ML engineers training large models identify performance bottlenecks, isolate slow GPUs, and optimize training speed by balancing computation, communication, and memory operations.
From the original launch (Jun 2023); may be outdated.
Related startups

Vercel For GPUs

Fast, reliable, reproducible AI with GPU live migration



