Upload your data - DASConfView
Introduction to Uploading
There are multiple ways to upload data in Ensembl. Please view the "External Data" chapter in the "Ensembl Information" section.
You can upload to ContigView or CytoView using the URL-based upload (see the article 'Upload your Data - URL'), mark your data on a karyotype or chromosome using the KaryoView web interface, or upload using DAS (explained here).
Introduction to DAS
The Distributed Annotation System (DAS) is a client-server system that was initially designed for the exchange of genome annotation information based on a standard protocol. DAS thus provides a standardised method to serve custom annotation information and to integrate data sets for display in other resources. Ensembl 'DASConfView' allows attachment and configuration of external DAS sources to Ensembl genome browser displays.
While GenomeDAS exchanges annotation on the basis of reference sequences in chromosome or clone coordinate systems, GeneDAS and ProteinDAS are extensions to the DAS protocol and are used to exchange gene and protein annotation independent of genomic location information. Currently, UniProt/Swiss-Prot records serve as the sequence reference.
Several DAS resources provided by the EBI and the Wellcome Trust Sanger Institute are already available and pre-configured by default. Further annotation information resources can be integrated into Ensembl by either attaching a valid data source provided by a DAS annotation server or by uploading your data into the DAS server maintained by the Ensembl project.
The Ensembl data upload facility lets you display your own data sets within the Ensembl system. You may then share this data source with others or keep the data source private to yourself.
Please read and understand the Ensembl Data Upload Disclaimer before uploading any data set to Ensembl.
How to Upload Data
To upload data, begin by selecting 'Manage Sources' from the 'DAS Sources' menu of the Ensembl 'ContigView', 'GeneView', 'ProtView' or 'CytoView' genome sequence displays.
Next, click 'Upload your data'. If you already have a DAS server, then use 'Add Data Source'.
Enter an email address and a password. Both are sent over a non-secure connection, so do not reuse a password that protects anything sensitive. The email address is used if Ensembl must delete a DAS source from its data servers: in that case you will receive a request by email. You will also need these details later to change or edit your data source.
Paste your data into the box or upload a file. Please pay attention to the format:
Data format
There are two versions of the upload format: one simple and one slightly more complex.
The simple upload format is based on the format used by the Lightweight DAS Server (LDAS). This format may be used for annotations on genomic displays, such as ContigView.
To use this format, create a text file containing your feature data with the following tab-separated column format:
| Column | Data type | Example |
|---|---|---|
| 1 | Group class | e.g. gene |
| 2 | Group name | e.g. ABCD1 |
| 3 | Type | e.g. exon |
| 4 | Subtype | e.g. curated |
| 5 | Chromosome | e.g. X |
| 6 | Start position | e.g. 152511170 |
| 7 | End position | e.g. 152512468 |
| 8 | Strand | e.g. + |
| 9 | Phase | e.g. . |
| 10 | Score | e.g. 100 |
| 11 | Similarity alignment range (optional) | e.g. 1 |
| 12 | Similarity alignment range (optional) | e.g. 1299 |
The example above as a single tab-delimited line:
gene	ABCD1	exon	curated	X	152511170	152512468	+	.	100	1	1299
See additional information in the annotations section of the LDAS documentation.
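The column layout above can be sketched in Python as follows. This is a minimal illustration, not part of Ensembl or LDAS: the names `COLUMNS` and `format_feature` are hypothetical, and the only check performed (no tabs or newlines inside a field) is an assumption about what keeps a line parseable.

```python
# Column order of the simple upload format (columns 1-10 are always present).
COLUMNS = (
    "group_class", "group_name", "type", "subtype", "chromosome",
    "start", "end", "strand", "phase", "score",
)


def format_feature(feature, align_range=None):
    """Return one tab-separated line for a feature.

    `feature` is a dict keyed by the names in COLUMNS (hypothetical names).
    `align_range` is an optional (start, end) pair for columns 11 and 12.
    """
    values = [str(feature[name]) for name in COLUMNS]
    if align_range is not None:
        align_start, align_end = align_range
        values += [str(align_start), str(align_end)]
    for value in values:
        # Assumption: a tab or newline inside a field would break the columns.
        if "\t" in value or "\n" in value:
            raise ValueError(f"field contains a tab or newline: {value!r}")
    return "\t".join(values)


example = {
    "group_class": "gene", "group_name": "ABCD1", "type": "exon",
    "subtype": "curated", "chromosome": "X", "start": 152511170,
    "end": 152512468, "strand": "+", "phase": ".", "score": 100,
}
print(format_feature(example, align_range=(1, 1299)))
```

Calling `format_feature(example)` without `align_range` produces the 10-column form, which omits the two optional similarity columns.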
The slightly more complex format is an extension of this format and must be used for any annotation that is to be displayed on ProtView or GeneView. Detailed information about this format may be found in the PDF located here.
Data Appearance
After uploading data using the first page of the DAS Wizard, select the page(s) on which you would like the data to appear. Click 'next' to name the data track, select the text colour and set further options. To link back to an external URL from the data source (once it is displayed), enter a URL in the 'link to' section using this format: http://my.link.com/script?feature=###id###
Select 'Finish', which will refresh the page you started from. You should be able to see your DAS track on the page, and to deselect it in the 'DAS Sources' drop-down menu.
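To preview what a 'link to' template will point at for a given feature, the placeholder substitution can be mimicked locally. The sketch below assumes that ###id### is replaced by the feature identifier; whether Ensembl URL-encodes the identifier is not stated here, so the encoding step is an assumption, and `expand_link` is a hypothetical helper name.

```python
from urllib.parse import quote

PLACEHOLDER = "###id###"


def expand_link(template, feature_id):
    """Replace the ###id### placeholder in a 'link to' template.

    Assumption: the identifier is percent-encoded before substitution.
    """
    if PLACEHOLDER not in template:
        raise ValueError("template does not contain the ###id### placeholder")
    return template.replace(PLACEHOLDER, quote(feature_id, safe=""))


print(expand_link("http://my.link.com/script?feature=###id###", "ABCD1"))
```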
Coordinate Systems
Ensembl understands either chromosome or (finished) BAC clone coordinates in uploaded data sets. You should use either one of these when uploading data into Ensembl.
Chromosome Coordinate-based Data
Chromosome coordinates are the easiest to work with since features may be annotated across clone boundaries. However, this coordinate system is unstable and will change with each new genome sequence assembly build. Therefore, if you use chromosome coordinates for uploaded data you will have to delete the data source and re-upload it in the new coordinate system every time the genome sequence assembly is updated.
Download this test data file for a more extensive chromosome-based example.
BAC Clone Coordinate-based Data
BAC clone coordinates are generally more stable: they remain valid across genome sequence assembly builds provided the clone version number has not changed. However, you cannot annotate features that span the boundary of two clones, although you can create "split-features" that annotate portions on two or more clones.
You can only "annotate" finished clones, for which a single contig sequence spanning the full clone length is available. If you provide data on unfinished clones, which contain more than one contig sequence, they will be ignored.
First enter any chromosomal and/or clone coordinate data you wish to display in the same tab-separated column format, as explained in the section above.
Next, you must provide a list of the clones you have annotated (without version suffixes), their type (usually just "Clone") and their total lengths in base pairs. This information must appear below the feature data, separated from it by the special heading "[references]" on a line of its own.
For example:
Similarity	AL137655.1.1	homology	wutblastn	AP000869.2	4260	4340	+	.	373.0000
Similarity	Hs.326048.2	homology	wutblastn	AP000869.2	5406	5534	+	.	373.0000
[references]
AP000869	Clone	159840
AP001267	Clone	239314
Note: The [references] section must appear last in the file, after all features.
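The layout of a clone-based upload (features first, then the "[references]" heading, then one line per clone) can be assembled with a short script. This is a sketch under stated assumptions: `strip_version` and `build_upload` are hypothetical helper names, the reference lines are assumed to be tab-separated like the feature lines, and the length check is only applied to clones that appear in the reference list.

```python
def strip_version(accession):
    """Drop the version suffix from a clone accession, e.g. AP000869.2 -> AP000869."""
    return accession.split(".", 1)[0]


def build_upload(feature_lines, clone_lengths):
    """Return upload text: feature lines followed by a [references] section.

    `feature_lines` are tab-separated lines in the simple upload format.
    `clone_lengths` maps unversioned clone accessions to lengths in base pairs.
    """
    for line in feature_lines:
        columns = line.split("\t")
        clone = strip_version(columns[4])  # column 5: reference sequence
        end = int(columns[6])              # column 7: end position
        # Only check features on clones listed in the references.
        if clone in clone_lengths and end > clone_lengths[clone]:
            raise ValueError(f"feature ends beyond clone {clone}: {line!r}")
    lines = list(feature_lines)
    lines.append("[references]")           # must come after all features
    for clone, length in clone_lengths.items():
        # Assumption: reference lines are tab-separated.
        lines.append("\t".join([clone, "Clone", str(length)]))
    return "\n".join(lines) + "\n"


features = [
    "Similarity\tAL137655.1.1\thomology\twutblastn\tAP000869.2\t4260\t4340\t+\t.\t373.0000",
    "Similarity\tHs.326048.2\thomology\twutblastn\tAP000869.2\t5406\t5534\t+\t.\t373.0000",
]
print(build_upload(features, {"AP000869": 159840, "AP001267": 239314}), end="")
```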
You can find a list of the clones used in the current genome sequence assembly by dumping genes and clones from Ensembl 'CytoView'.
Download this test data file for a more extensive clone-based example.
To allow others to see the data set you have uploaded, simply send them the data source ID. They may then use this ID to attach the data to their displays. Do not send them the password for the data source unless you want them to have permission to administer the source (i.e. delete it).
Optionally, if a data set needs re-attachment, amendment or replacement, you may provide the DAS source name (DSN) that Ensembl issued after a previous upload. Select the path to a file on your local machine that contains the annotation information. To reduce network transmission time for large data volumes, you may compress the data file with GNU gzip, bzip2 or UNIX compress. Alternatively, data sets can be pasted directly into the box provided.
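If the gzip command-line tool is not to hand, a data file can be compressed with Python's standard library before upload. This is a minimal sketch; `gzip_file` and the file name in the usage line are hypothetical.

```python
import gzip
import shutil


def gzip_file(src_path, dest_path=None):
    """Write a gzip-compressed copy of src_path and return the new path."""
    if dest_path is None:
        dest_path = src_path + ".gz"
    # Stream the file in chunks so large data sets are not read into memory.
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dest_path
```

For example, `gzip_file("my_features.txt")` would write `my_features.txt.gz` alongside the original file.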
A delete button allows for removing DAS sources no longer in use.
Data Upload Disclaimer
Although your data source is allocated a unique ID and any management operations on the data (e.g. deleting) require a password, this is only trivial security. Data uploads are made over an open, non-secure network connection. If your data is sensitive, you should not upload it but contact Ensembl for more information.
While we make every effort not to make uploaded data available to anyone other than the originator, we DO NOT provide any assurance whatsoever concerning data security and/or privacy.
If you are concerned about data privacy you should consider setting up your own secure DAS server.
