Nature Research, Nature, 7354(475), p. 101-105, 2011
DOI: 10.1038/nature10113
Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution 1,2. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes 3,4. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer.
To gain insights into the molecular alterations that cause CLL, we performed whole-genome sequencing of four cases representative of different forms of the disease: two cases, CLL1 and CLL2, with no mutations in the immunoglobulin genes (IGHV-unmutated) and two cases, CLL3 and CLL4, with mutations in these genes (IGHV-mutated) (Supplementary Table 1 and Supplementary Information).
We used a combination of whole-genome sequencing and exome sequencing, as well as long-insert paired-end libraries, to detect variants in chromosomal structure (Supplementary Fig. 1 and Supplementary Tables 2–5). We obtained more than 99.7% concordance between whole-genome sequencing calls and genotyping data, indicating that the coverage and parameters used were sufficient to detect most of the sequence variants in these samples (Supplementary Information).
We detected about 1,000 somatic mutations per tumour in non-repetitive regions (Fig. 1a, Supplementary Fig. 2 and Supplementary Table 6). These numbers of somatic mutations were lower than the numbers in melanoma and lung carcinoma 5,6, but in agreement with previous estimates of less than one mutation per megabase (Mb) for leukaemias 7. The most common substitution was the transition G>A/C>T, usually occurring in a CpG context (Fig. 1b and Supplementary Fig. 2). We also detected marked differences in the mutation pattern between CLL samples, and these differences were associated with tumour subtype (Fig. 1b). Thus, IGHV-mutated cases showed a higher proportion of A>C/T>G mutations than cases with unmutated IGHV (16 ± 0.2% versus 6.2 ± 0.1%). The base preceding the adenine in A to C transversions showed an over-representation of thymine, when compared to the prevalence expected from its representation in non-repetitive sequences in the wild-type genome (P < 0.001, Fig. 1c), and there were fewer A to C substitutions at GpA dinucleotides than would be expected by chance (P < 0.001). These differences between CLL subtypes might reflect the molecular mechanisms implicated in their respective development. The pattern and context of mutations are consistent with their being introduced by the error-prone polymerase η during somatic hypermutation in immunoglobulin genes 8. This indicates that polymerase η could contribute to the high frequency of A•T to C•G transversions in IGHV-mutated cases. It also extends the differences observed between these two CLL subtypes to the genomic level.
We classified the somatic mutations into three different classes according to their potential functional effect (Supplementary Information). We also searched for small insertions and deletions (indels) in coding regions: we found and validated five somatic indels, which caused frameshifts in protein-coding regions (Supplementary Table 7).
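The strand-collapsed substitution labels used in the mutation-pattern analysis (for example G>A/C>T and A>C/T>G) and the tally of the base preceding the mutated adenine can be illustrated with a short Python sketch. This is not the pipeline used in the study. The function names, the tuple layout `(ref, alt, five_prime, three_prime)` and the four toy calls are hypothetical, and the purine-first label order is an assumption chosen only to match the notation in the text.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a substitution and its reverse-strand equivalent into one label.

    The purine-reference form is written first (assumed convention),
    giving labels such as "G>A/C>T" and "A>C/T>G".
    """
    fwd = f"{ref}>{alt}"
    rev = f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    return f"{fwd}/{rev}" if ref in "AG" else f"{rev}/{fwd}"


def base_preceding_adenine(ref, alt, five_prime, three_prime):
    """Return the base 5' of the mutated adenine for A>C/T>G calls, else None.

    A T>G call on the reported strand is an A>C call on the opposite strand,
    where the base 5' of the adenine is the complement of the reported
    3' neighbour.
    """
    if (ref, alt) == ("A", "C"):
        return five_prime
    if (ref, alt) == ("T", "G"):
        return COMPLEMENT[three_prime]
    return None


def spectrum(calls):
    """Fraction of calls falling in each strand-collapsed substitution class."""
    counts = Counter(substitution_class(r, a) for r, a, _, _ in calls)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy calls invented for illustration: (ref, alt, 5' neighbour, 3' neighbour).
toy_calls = [
    ("G", "A", "C", "G"),
    ("C", "T", "A", "G"),
    ("A", "C", "T", "G"),
    ("T", "G", "C", "A"),
]

fractions = spectrum(toy_calls)
contexts = Counter(
    base
    for call in toy_calls
    if (base := base_preceding_adenine(*call)) is not None
)
print(fractions)
print(contexts)
```

Collapsing complementary changes into one class is needed because a somatic call does not reveal which strand carried the original lesion. Testing for over-representation of a given preceding base would additionally require the expected dinucleotide frequencies in the non-repetitive genome, which this sketch does not model.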