AI JOB
Software Engineer, Inference - Performance Optimization
OpenAI
San Francisco
Remote
Posted:
Description
Our team analyzes inference stack performance across the application, model, and fleet layers to identify bottlenecks and drive faster, cheaper inference. We combine systems profiling, benchmarking, and analysis to understand where time and cost are spent, then turn that understanding into performance optimizations and models that project performance and capacity needs for future launches. In this role, you will model inference performance across application, model, and fleet layers with higher fidelity...