Abstract

Background
Copy number variations (CNVs) are an important type of structural variation in the genome and usually affect gene expression levels through a gene dosage effect. Understanding CNVs as part of genome evolution may provide insights into the genetic basis of important agricultural traits and contribute to future crop breeding. Existing methods that detect CNVs from next-generation sequencing data have shed light on the prevalence and effects of CNVs. However, the complexity of crop genomes poses a major challenge and calls for additional tools.

Results
Here, we generated genomic and transcriptomic data for 93 rice (Oryza sativa L.) accessions and developed a comprehensive pipeline to call CNVs in this large-scale dataset. We analyzed the correlation between CNVs and gene expression levels and found that the expression levels of approximately 13% of the identified genes were significantly correlated with their copy numbers. Further analysis showed that about 36% of duplicate gene pairs were involved in pseudogenization events, whereas only 5% showed functional differentiation. Moreover, the offspring copy was the main contributor to expression levels and appeared more likely to become a pseudogene, whereas the parent copy tended to retain the function of the ancestral gene.

Conclusion
We provide a high-accuracy CNV dataset that will contribute to functional genomics studies and molecular breeding in rice. We also show that the gene dosage effect of CNVs in rice is neither exponential nor linear. Our work demonstrates that the evolution of duplicated genes is asymmetric in both expression levels and gene fates, offering new insight into the evolution of duplicated genes.