Towards Geo-Distributed Machine Learning
Ignacio Cano, Markus Weimer, Dhruv Mahajan, Carlo Curino and Giovanni Matteo Fumarola
Abstract
In large organizations, data is “born” in data centers all around the world. Learning requires a global view of such data. This new class of geo-distributed machine learning (GDML) applications needs to cope with: 1) scarce and expensive cross-data center bandwidth, and 2) growing privacy concerns that are pushing for stricter data sovereignty regulations.
In this paper, we formalize this problem, show that the current state of the art lacks proper support for GDML applications, and propose an initial system and algorithm that perform training in a geo-distributed fashion. Our empirical evaluation confirms the general validity of our approach, but many research challenges remain open.
An extended version of the paper is available as arXiv:1603.09035.
BibTeX
@inproceedings{cano2015geoml,
title={Towards Geo-Distributed Machine Learning},
author={Cano, Ignacio and Weimer, Markus and Mahajan, Dhruv and Curino, Carlo and Matteo Fumarola, Giovanni},
booktitle={Learning Systems Workshop at NIPS 2015},
year={2015}
}