Yunseong Lee, Alberto Scolari, Matteo Interlandi, Markus Weimer, Byung-Gon Chun


Machine Learning models are often composed of sequences of transformations. While this design makes it easy to decompose and efficiently execute single model components at training time, predictions require low latency and high-performance predictability whereby end-to-end and multi-model runtime optimizations are needed to meet such goals. This paper sheds some light on the problem by introducing a new system design for high-performance prediction serving. We report some preliminary results showing how our system design is able to improve performance over several dimensions with respect to current state-of-the-art approaches.



  title={Towards High-Performance Prediction Serving Systems},
  author={Yunseong Lee and Alberto Scolari and Matteo Interlandi and Markus Weimer and Byung-Gon Chun},
  booktitle={NIPS Machine Learning Systems Workshop},