Ignacio Cano, Markus Weimer, Dhruv Mahajan, Carlo Curino and Giovanni Matteo Fumarola

Abstract

In large organizations, data is “born” in data centers all around the world. Learning requires a global view of such data. This new class of geo-distributed machine learning (GDML) applications needs to cope with: 1) scarce and expensive cross-data center bandwidth, and 2) growing privacy concerns that are pushing for stricter data sovereignty regulations.

In this paper, we formalize this problem, show that the current state of the art lacks proper support for GDML applications, and propose an initial system and algorithm that perform training in a geo-distributed fashion. Our empirical evaluation confirms the general validity of our approach, but many research challenges remain open.

An extended version of the paper is available as arXiv:1603.09035

BibTeX

@inproceedings{cano2015geoml,
  title={Towards Geo-Distributed Machine Learning},
  author={Cano, Ignacio and Weimer, Markus and Mahajan, Dhruv and Curino, Carlo and Matteo Fumarola, Giovanni},
  booktitle={Learning Systems Workshop at NIPS 2015},
  year={2015}
}