Association for Computing Machinery (ACM), Operating Systems Review, 1(40), p. 65-74, 2006
CoMon is an evolving, mostly-scalable monitoring system for PlanetLab that has the goal of presenting environment-tailored information for both the administrators and users of the PlanetLab global testbed. In addition to passively reporting metrics provided by the operating system, CoMon also actively gathers a number of metrics useful for developers of networked systems. Using CoMon, PlanetLab administrators and users can easily spot problematic machines, where the problem may arise from the machine itself, local configuration/environment problems, or the workload running on the machine. Furthermore, users can easily observe many properties of all of the experiments running across multiple PlanetLab nodes, facilitating not only their own experiment monitoring and debugging, but also helping scale the task of finding PlanetLab problems. In this paper we describe CoMon's design and operation, including what kinds of data are gathered, the scale of the processing involved, and the approaches we have taken to keep CoMon running. Our goal is not only to illustrate the kinds of problems faced in this environment, but also to invite others to participate, either by experimenting with the data generated by CoMon, or by building on the CoMon system itself.