Microarray Probe Set Mappings
Ensembl aims to associate microarray probe set identifiers with Ensembl transcript models in a two-step procedure.
Genome Sequence Mapping
In the first step, individual probes (oligonucleotides) are mapped to the genome sequence. The Ensembl analysis and annotation pipeline uses the exonerate sequence comparison and alignment tool (Slater et al., 2005) and tolerates at most a 1 bp mismatch between the probe and the genome sequence assembly. Probes that match 100 or more locations (e.g. suspected Alu repeats) are discarded and not stored in the database. Ensembl 'ContigView' displays the individual probes that match the current assembly.
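The filtering rules of this step can be sketched in a few lines of Python. This is only an illustration, not the Ensembl pipeline code: the Alignment record, the function filter_probe_hits and the constant names are hypothetical, the alignments are assumed to have been produced already by an aligner such as exonerate, and the sketch assumes that the 100-location limit is counted over the hits that pass the mismatch filter.

```python
from collections import defaultdict
from typing import NamedTuple

MAX_MISMATCHES = 1   # at most a 1 bp mismatch is tolerated
MAX_LOCATIONS = 100  # probes with 100 or more hits are discarded


class Alignment(NamedTuple):
    """Hypothetical record for one probe-to-genome alignment."""
    probe_id: str
    seq_region: str
    start: int
    mismatches: int


def filter_probe_hits(alignments):
    """Return {probe_id: [alignments]} for probes that pass both filters."""
    hits = defaultdict(list)
    for aln in alignments:
        # Drop individual alignments with too many mismatches.
        if aln.mismatches <= MAX_MISMATCHES:
            hits[aln.probe_id].append(aln)
    # Drop whole probes that match too many genomic locations
    # (assumption: counted after the mismatch filter).
    return {pid: alns for pid, alns in hits.items() if len(alns) < MAX_LOCATIONS}
```

A probe with 99 acceptable hits would be kept under this sketch, while one with exactly 100 would be dropped, matching the "100 or more" wording above.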
Ensembl Transcript Mapping
In the second step, we aim to associate microarray probe sets with Ensembl transcript predictions (ENST...). Individual probes are grouped into probe sets, and it is generally required that more than 50% of the probes in a probe set hit a given transcript sequence. Probe set sizes are determined dynamically for each probe set, rather than taken from the documented array-wide value. Transcript sequences are defined by the cDNA sequence including UTRs. Where annotated UTRs are absent, a default UTR length is used. Defaults are calculated separately for five-prime and three-prime UTRs as the greater of the mean and the median length of all UTRs for a given species. Probes mapping across exon boundaries are not currently captured, because the transcript annotations are based on the genomic mappings from step one. Microarray probe sets matching a transcript can be seen in the 'Similarity Matches' section of the 'GeneView' or 'TransView' pages.
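The two numeric rules in this step, the default UTR length and the 50% probe set threshold, can be illustrated with a short Python sketch. The function names, the min_fraction parameter and the input layout (a mapping from probe identifier to the set of transcripts it hits) are assumptions made for the example and are not taken from the Ensembl code base. The sketch also assumes that the probe set size is simply the number of probes supplied for that set, including probes with no transcript hit.

```python
from statistics import mean, median


def default_utr_length(utr_lengths):
    """Default UTR length: the greater of the mean and the median."""
    return max(mean(utr_lengths), median(utr_lengths))


def transcripts_for_probe_set(probe_hits, min_fraction=0.5):
    """Return transcripts hit by more than min_fraction of the probes.

    probe_hits maps probe_id -> set of transcript IDs hit by that probe.
    The probe set size is derived from the probes actually supplied,
    not from a documented array-wide value (assumption: probes without
    any hit are included with an empty set).
    """
    set_size = len(probe_hits)
    counts = {}
    for transcripts in probe_hits.values():
        for transcript_id in transcripts:
            counts[transcript_id] = counts.get(transcript_id, 0) + 1
    # Strictly more than the threshold: exactly 50% is not enough.
    return sorted(t for t, n in counts.items() if n > min_fraction * set_size)
```

With a four-probe set, a transcript hit by three probes would be associated with the probe set, while one hit by only two probes (exactly 50%) would not.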