Chromosome-scale assembly and annotation of 13 diverse tomato accessions
Project overview
Michael Alonge, Srividya Ramakrishnan, Sebastian Soyk, Xingang Wang, Matthias Benoit, Zachary B. Lippman, Michael C. Schatz
The research groups of Michael C. Schatz and Zachary B. Lippman at Johns Hopkins University and Cold Spring Harbor Laboratory, respectively, have generated genome assemblies and associated gene annotations for 13 diverse tomato accessions. These assemblies and annotations, each with their own independent versioning, are beta pre-releases to the community.
This large dataset is being made available for research under the "Toronto statement", which outlines rules for pre-publication data sharing. Under these terms, we, the authors, reserve the right to publish the first analyses of the data, including descriptions of whole-chromosome or genome-level analyses of genes, variants, gene families, repetitive elements, and comparisons with other organisms.
The following accessions have been assembled and annotated and are included in this release.
- Brandywine
- M82
- Floradade
- EA00371
- EA00990
- PAS014479
- BGV006775
- BGV006865
- BGV007989
- BGV007931
- PI303721
- PI169588
- LYC1410
Genes were annotated by "lifting over" a combination of the Heinz ITAG 4.0 annotation and pan-genome genes (Gao et al. 2019) onto the new assemblies. The in silico cDNA for each gene was aligned to the assemblies with GMAP (version 2018-07-04) and Minimap2 (v2.16-r922). High-confidence alignments were then used to annotate genes in our new assemblies.
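As a rough illustration of what selecting high-confidence alignments can look like, the sketch below filters Minimap2 PAF records by query coverage and approximate identity. It is not the pipeline used for this release: the thresholds (0.95), the helper names `parse_paf_line` and `is_high_confidence`, and the example record are all assumptions made for illustration. Only the first twelve standard PAF columns are read.

```python
from typing import NamedTuple


class PafHit(NamedTuple):
    query: str         # cDNA (query) name, PAF column 1
    target: str        # assembly sequence name, PAF column 6
    target_start: int  # 0-based start on the target, PAF column 8
    target_end: int    # end on the target, PAF column 9
    strand: str        # '+' or '-', PAF column 5
    coverage: float    # fraction of the query covered by the alignment
    identity: float    # matching bases / alignment block length


def parse_paf_line(line: str) -> PafHit:
    """Parse the 12 mandatory PAF columns of one alignment record."""
    f = line.rstrip("\n").split("\t")
    qlen, qstart, qend = int(f[1]), int(f[2]), int(f[3])
    matches, block_len = int(f[9]), int(f[10])
    return PafHit(
        query=f[0],
        target=f[5],
        target_start=int(f[7]),
        target_end=int(f[8]),
        strand=f[4],
        coverage=(qend - qstart) / qlen,
        identity=matches / block_len,
    )


def is_high_confidence(hit: PafHit,
                       min_coverage: float = 0.95,
                       min_identity: float = 0.95) -> bool:
    """Hypothetical filter; both thresholds are illustrative assumptions."""
    return hit.coverage >= min_coverage and hit.identity >= min_identity


# Made-up example record, not taken from the released data.
example = "Solyc01g005000.3\t1000\t0\t990\t+\tchr1\t90000000\t5000\t6200\t980\t1000\t60"
hit = parse_paf_line(example)
keep = is_high_confidence(hit)
```

In practice the cutoffs would need to be tuned per dataset, and GMAP output would need its own parser.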
Annotated genes were assigned the same gene IDs as their homologs in ITAG 4.0. Each annotated duplicate copy of a gene carries the ITAG 4.0 gene ID with a '-c num' suffix appended to keep the IDs unique.
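When comparing annotations across accessions, it can help to map a duplicate-copy ID back to its ITAG gene ID. The sketch below assumes the suffix is written as `-c` followed by an integer, optionally separated by whitespace (for example `-c2`). The exact formatting in the released files should be checked before relying on this. The function name `split_copy_id` and the example IDs are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed suffix shape: "-c" followed by digits at the end of the ID.
_COPY_SUFFIX = re.compile(r"^(?P<base>.+?)-c\s*(?P<num>\d+)$")


def split_copy_id(gene_id: str) -> Tuple[str, Optional[int]]:
    """Return (base ITAG gene ID, copy number), or (gene_id, None) if no suffix."""
    m = _COPY_SUFFIX.match(gene_id)
    if m is None:
        return gene_id, None
    return m.group("base"), int(m.group("num"))


# Illustrative IDs only; not taken from the released annotation files.
original = split_copy_id("Solyc01g005000.3")
duplicate = split_copy_id("Solyc01g005000.3-c2")
```

Grouping records by the returned base ID would then collect a gene together with its annotated copies.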
These genome assemblies are beta versions, and we have released these data pre-publication in anticipation of their immediate value for the Solanaceae and larger plant biology communities. With this in mind, we note that none of the assemblies have been screened for bacterial contamination, and no formal assessment of assembly consensus or structural accuracy has been performed. Finally, though we have done some initial tests and QC on the gene models, we can make no formal claims about their accuracy. These data were initially produced to aid our study of structural variants in tomato; however, forthcoming genome versions will seek to improve all aspects of assembly, accuracy, and annotation in order to serve as general reference-quality resources.
For further information, please contact Zach B. Lippman or Michael C. Schatz.