Yunseong Lee, Alberto Scolari, Matteo Interlandi, Markus Weimer, Byung-Gon Chun

Abstract

Machine Learning models are often composed of sequences of transformations. While this design makes it easy to decompose and efficiently execute single model components at training time, predictions require low latency and high-performance predictability whereby end-to-end and multi-model runtime optimizations are needed to meet such goals. This paper sheds some light on the problem by introducing a new system design for high-performance prediction serving. We report some preliminary results showing how our system design is able to improve performance over several dimensions with respect to current state-of-the-art approaches.

BibTeX

@inproceedings{lee2017scoring,
  title={Towards High-Performance Prediction Serving Systems},
  author={Yunseong Lee and Alberto Scolari and Matteo Interlandi and Markus Weimer and Byung-Gon Chun},
  booktitle={NIPS Machine Learning Systems Workshop},
  year={2017}
}