American Astronomical Society, Astronomical Journal, 2(137), p. 3109-3138, 2009
DOI: 10.1088/0004-6256/137/2/3109
(abridged) We develop an algorithm for estimating parameters of a distribution sampled with contamination, employing a statistical technique known as "expectation maximization" (EM). Given models for both member and contaminant populations, the EM algorithm iteratively evaluates the membership probability of each discrete data point, then uses those probabilities to update parameter estimates for member and contaminant distributions. The EM approach has wide applicability to the analysis of astronomical data. Here we tailor an EM algorithm to operate on spectroscopic samples obtained with the Michigan-MIKE Fiber System (MMFS) as part of our Magellan survey of stellar radial velocities in nearby dwarf spheroidal (dSph) galaxies. These samples are presented in a companion paper and contain discrete measurements of line-of-sight velocity, projected position, and Mg index for ~1000-2500 stars per dSph, including some fraction of contamination by foreground Milky Way stars. The EM algorithm quantifies both dSph and contaminant distributions, returning maximum-likelihood estimates of the means and variances, as well as the probability that each star is a dSph member. Applied to our MMFS data, the EM algorithm identifies more than 5000 probable dSph members. We test the performance of the EM algorithm on simulated data sets that represent a range of sample size, level of contamination, and amount of overlap between dSph and contaminant velocity distributions. The simulations establish that for samples ranging from large (N ~ 3000) to small (N ~ 30), the EM algorithm distinguishes members from contaminants and returns accurate parameter estimates much more reliably than conventional methods of contaminant removal (e.g., sigma clipping).

Comment: Accepted for publication in The Astronomical Journal. Download pdf with full-resolution figures from http://www.ast.cam.ac.uk/~walker/dsph_em.pdf
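
The iteration described in the abstract (evaluate membership probabilities, then update the parameter estimates) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single observable (line-of-sight velocity) and models both members and contaminants as Gaussians, whereas the paper also uses projected position and Mg index. The function name em_members, the starting guesses, and the simulated velocities are hypothetical choices made only for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_members(velocities, n_iter=500, tol=1e-9):
    """Two-component (member + contaminant) Gaussian EM on 1D velocities.

    Returns (params, probs); probs[i] is the membership probability of
    velocities[i]. Simplified illustration, not the paper's full model.
    """
    n = len(velocities)
    mean_all = sum(velocities) / n
    var_all = sum((v - mean_all) ** 2 for v in velocities) / n
    # Starting guess (assumption of this sketch): members form a narrow
    # clump near the median; contaminants span the whole sample.
    mu_mem, var_mem = sorted(velocities)[n // 2], var_all / 4.0
    mu_con, var_con = mean_all, var_all
    frac_mem = 0.5
    probs = [0.5] * n
    prev_loglike = -math.inf

    for _ in range(n_iter):
        # E step: membership probability of each star given current parameters.
        loglike = 0.0
        for i, v in enumerate(velocities):
            p_mem = frac_mem * gaussian_pdf(v, mu_mem, var_mem)
            p_con = (1.0 - frac_mem) * gaussian_pdf(v, mu_con, var_con)
            total = max(p_mem + p_con, 1e-300)  # guard against underflow
            probs[i] = p_mem / total
            loglike += math.log(total)

        # M step: probability-weighted means, variances, and member fraction.
        w_mem = sum(probs)
        w_con = n - w_mem
        mu_mem = sum(p * v for p, v in zip(probs, velocities)) / w_mem
        mu_con = sum((1.0 - p) * v for p, v in zip(probs, velocities)) / w_con
        var_mem = max(
            sum(p * (v - mu_mem) ** 2 for p, v in zip(probs, velocities)) / w_mem,
            1e-6,
        )
        var_con = max(
            sum((1.0 - p) * (v - mu_con) ** 2 for p, v in zip(probs, velocities)) / w_con,
            1e-6,
        )
        frac_mem = w_mem / n

        # Stop once the log-likelihood no longer improves.
        if abs(loglike - prev_loglike) < tol:
            break
        prev_loglike = loglike

    params = {
        "mu_mem": mu_mem, "var_mem": var_mem,
        "mu_con": mu_con, "var_con": var_con,
        "frac_mem": frac_mem,
    }
    return params, probs


# Simulated sample (hypothetical numbers): 300 members in a narrow velocity
# peak plus 200 contaminants drawn from a broad foreground distribution.
random.seed(1)
members = [random.gauss(100.0, 8.0) for _ in range(300)]
contaminants = [random.gauss(20.0, 50.0) for _ in range(200)]
sample = members + contaminants
params, probs = em_members(sample)
print(params)
```

Unlike sigma clipping, which makes a hard accept/reject decision for each star, this scheme keeps every star in the fit and weights it by its membership probability, so stars in the overlap region contribute partially to both distributions.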