Published in

Taylor and Francis Group, Journal of Computational and Graphical Statistics, 17(1), pp. 243-262

DOI: 10.1198/106186008x287517


Delineation of Irregularly Shaped Disease Clusters Through Multiobjective Optimization

Journal article published in 2008 by Luiz Duczmal, André L. F. Cançado, Ricardo H. C. Takahashi
This paper is available in a repository.

Preprint: archiving forbidden
Postprint: archiving restricted
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Irregularly shaped spatial disease clusters occur commonly in epidemiological studies, but their geographic delineation is poorly defined. Most current spatial scan software displays only one of the many possible cluster solutions, which range in shape from the most compact round cluster to the most irregularly shaped one, according to the degree of penalization imposed on the freedom of shape. Even when a fairly complete set of solutions is available, the choice of the most appropriate parameter setting is left to the practitioner, whose decision is often subjective. We propose quantitative criteria for choosing the best cluster solution through multiobjective optimization, by finding the Pareto set in the solution space. Two competing objectives are involved in the search: regularity of shape and scan statistic value. Instead of running a cluster-finding algorithm sequentially with varying degrees of penalization, the complete set of solutions is found in parallel by a genetic algorithm. The concept of cluster significance is extended to this set in a natural and unbiased way and is employed as a decision criterion for choosing the optimal solution. The Gumbel distribution is used to approximate the empirical distribution of the scan statistic, speeding up the significance estimation. The multiobjective methodology is compared with the genetic mono-objective algorithm. The method is fast and has good detection power. We discuss an application to breast cancer cluster detection. Introducing the concept of the Pareto set into this problem, followed by the choice of the most significant solution, is shown to allow a rigorous statement about what a "best solution" is, without the need for any arbitrary parameter.
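The abstract combines three computational steps: comparing candidate clusters on two objectives (shape regularity and scan statistic), keeping only the non-dominated ones (the Pareto set), and using a Gumbel approximation of the null distribution to pick the most significant Pareto solution. The Python sketch below illustrates these steps under stated assumptions; it is not the authors' implementation. It assumes Kulldorff's Poisson log-likelihood ratio as the scan statistic, treats the regularity score as a given number (higher means more regular, e.g. a compactness measure), fits the Gumbel distribution by the method of moments, and replaces both the genetic-algorithm search and the Monte Carlo replications with hypothetical inputs. The names `Candidate`, `null_maxima`, and all toy numbers are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

EULER_GAMMA = 0.5772156649015329


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate cluster: a shape-regularity score (higher means
    more regular, e.g. a compactness measure) and its scan statistic value."""
    name: str
    regularity: float
    llr: float


def poisson_llr(c: float, e: float, total: float) -> float:
    """Kulldorff's Poisson log-likelihood ratio for a zone with c observed and
    e expected cases out of `total` cases (only elevated-risk zones score)."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    rest_obs, rest_exp = total - c, total - e
    outside = rest_obs * math.log(rest_obs / rest_exp) if rest_obs > 0 else 0.0
    return inside + outside


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b: no worse in both objectives and strictly better in one."""
    no_worse = a.regularity >= b.regularity and a.llr >= b.llr
    better = a.regularity > b.regularity or a.llr > b.llr
    return no_worse and better


def pareto_front(cands: Sequence[Candidate]) -> List[Candidate]:
    """Non-dominated candidates, ordered from most to least regular (O(n^2))."""
    front = [a for a in cands if not any(dominates(b, a) for b in cands)]
    return sorted(front, key=lambda c: -c.regularity)


def gumbel_fit(samples: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments Gumbel fit (needs >= 2 samples):
    beta = s * sqrt(6) / pi, mu = mean - gamma * beta."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    return mean - EULER_GAMMA * beta, beta


def gumbel_pvalue(x: float, mu: float, beta: float) -> float:
    """Upper-tail probability P(X >= x) = 1 - exp(-exp(-(x - mu) / beta))."""
    z = (x - mu) / beta
    if z < -50.0:  # exp(-z) would be enormous, so the tail probability is 1
        return 1.0
    return -math.expm1(-math.exp(-z))


def most_significant(
    front: Sequence[Candidate],
    null_maxima: Callable[[float], Sequence[float]],
) -> Optional[Tuple[Candidate, float]]:
    """Pick the Pareto solution with the smallest Gumbel-approximated p-value.
    `null_maxima(r)` is a hypothetical callback returning Monte Carlo maxima of
    the scan statistic under the null hypothesis at regularity level r."""
    best = None
    for cand in front:
        mu, beta = gumbel_fit(null_maxima(cand.regularity))
        p = gumbel_pvalue(cand.llr, mu, beta)
        if best is None or p < best[1]:
            best = (cand, p)
    return best


if __name__ == "__main__":
    # Toy inputs (not data from the paper): observed, expected, total cases.
    cands = [
        Candidate("round", 0.9, poisson_llr(30, 20, 500)),
        Candidate("elongated", 0.5, poisson_llr(45, 25, 500)),
        Candidate("dominated", 0.4, poisson_llr(30, 25, 500)),
        Candidate("irregular", 0.2, poisson_llr(60, 35, 500)),
    ]
    front = pareto_front(cands)
    print([c.name for c in front])  # ['round', 'elongated', 'irregular']
    # Toy null maxima that grow as shape freedom increases (regularity falls).
    best, p = most_significant(front, lambda r: [5 * (1 - r) + d for d in range(5)])
    print(best.name, round(p, 3))  # elongated 0.078
```

In this toy run the `dominated` candidate drops out because `elongated` is better on both objectives. The intermediate `elongated` solution then has the smallest approximate p-value, because the toy null maxima grow as regularity falls. In other words, a larger scan statistic alone does not win if the extra shape freedom also inflates the null distribution. Five null values per level only keep the example short; a real analysis would use many Monte Carlo replications.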