Springer Verlag, Lecture Notes in Computer Science, pp. 139-155
DOI: 10.1007/978-3-540-73255-6_13
Two-dimensional nuclear magnetic resonance (2D-NMR) spectroscopy is a powerful analytical method for elucidating the chemical structure of molecules. In contrast to 1D-NMR spectra, 2D-NMR spectra correlate the chemical shifts of ¹H and ¹³C simultaneously. Curating or merging large spectral libraries requires robust (and fast) duplicate detection. We propose a definition of duplicates with the robustness properties that 2D-NMR experiments demand. Mapping the spectra to simple discrete objects yields a major gain in runtime performance with respect to previously proposed heuristics, and we propose several data transformations suited to this task. To compensate for slight variations in the mapped spectra, we use hashing functions that follow the locality-sensitive hashing scheme and identify duplicates by hash collisions.
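The abstract only outlines the pipeline: discretize each spectrum, hash it with locality-sensitive functions, and report collisions. The Python sketch below illustrates that idea under assumptions that are not taken from the paper. Each spectrum is assumed to be a list of (¹H, ¹³C) peak positions in ppm. The duplicate test `within_tolerance`, the grid cell sizes, the tolerances and the number of hash tables are hypothetical choices. Randomly shifted grids stand in for whichever hashing functions the paper actually uses.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def within_tolerance(a, b, h_tol, c_tol):
    """Hypothetical duplicate test (not the paper's definition): every
    (1H, 13C) peak of each spectrum has a partner in the other spectrum
    within h_tol / c_tol ppm."""
    def covered(xs, ys):
        return all(
            any(abs(x[0] - y[0]) <= h_tol and abs(x[1] - y[1]) <= c_tol for y in ys)
            for x in xs
        )
    return covered(a, b) and covered(b, a)


def grid_key(spectrum, cell_h, cell_c, off_h, off_c):
    """Map a spectrum to a simple discrete object: the set of (shifted)
    grid cells its peaks fall into. Peak order does not matter."""
    return frozenset(
        (math.floor((h + off_h) / cell_h), math.floor((c + off_c) / cell_c))
        for h, c in spectrum
    )


def candidate_duplicates(spectra, cell_h=0.1, cell_c=1.0, n_tables=20, seed=0):
    """Randomly shifted grid hashing (an assumed LSH family): spectra whose
    keys collide in at least one table become candidate pairs (i, j), i < j."""
    rng = random.Random(seed)
    candidates = set()
    for _ in range(n_tables):
        # Each table uses its own random grid offset, so two close peaks that
        # straddle a cell border in one table usually share a cell in another.
        off_h = rng.uniform(0.0, cell_h)
        off_c = rng.uniform(0.0, cell_c)
        buckets = defaultdict(list)
        for idx, spectrum in enumerate(spectra):
            buckets[grid_key(spectrum, cell_h, cell_c, off_h, off_c)].append(idx)
        for ids in buckets.values():
            # ids are in increasing order, so every pair comes out as (i, j), i < j.
            candidates.update(combinations(ids, 2))
    return candidates


def find_duplicates(spectra, h_tol=0.02, c_tol=0.2, **lsh_params):
    """Keep only the hash collisions that pass the exact tolerance test."""
    return sorted(
        (i, j)
        for i, j in candidate_duplicates(spectra, **lsh_params)
        if within_tolerance(spectra[i], spectra[j], h_tol, c_tol)
    )


if __name__ == "__main__":
    spectra = [
        [(7.26, 128.4), (2.10, 30.5)],  # spectrum 0
        [(7.27, 128.5), (2.11, 30.6)],  # slightly shifted copy of spectrum 0
        [(3.70, 60.1)],                  # unrelated spectrum
    ]
    print(find_duplicates(spectra))
```

In this example the near-duplicate pair (0, 1) is reported whenever the two spectra land in identical cells in at least one of the shifted grids. With these cell sizes and 20 tables, that is very likely but not guaranteed. Spectrum 2 is never reported, because any collision involving it would fail the exact tolerance check. The design trades recall for speed: only colliding spectra are compared, which avoids testing every pair as long as buckets stay small. Adding tables makes a miss less likely at the cost of more hashing work.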