ConTra - Conserved transcription factor binding sites

This help page provides detailed descriptions of the different steps in the ConTra analysis process.

Click on a topic below for more information:

Step 1 - Step 2 - Step 3 - Step 4 - Results - Exploration

Step 1: type of analysis and your input data

ConTra can perform two types of analysis, depending on the researcher's question (1).
Visualisation is the default option and allows the user to identify specific (conserved) transcription factor binding sites that may regulate the gene of interest.
The exploration option helps a researcher who has no idea which transcription factors (TFs) may regulate the gene of interest. In that case ConTra first shows a list of all candidate TFs, ranked by their binding probability to the region of interest. This binding probability is determined by a score that takes into account three factors: the number of predicted binding sites for the TF; the phylogenetic depth of each predicted site, roughly defined as the number of other species in the alignment that have a predicted binding site for the same PWM within a window extending up to 200% of the ungapped reference site length on each side of the reference site; and the information content (IC) of the predicting PWM.
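Two of the site-level quantities named above can be illustrated with a short Python sketch. This is not ConTra's implementation: it assumes a PWM is stored as a list of per-position base frequencies, computes its IC in bits against a uniform background, and counts phylogenetic depth as the number of other species with a hit lying entirely inside the window of 200% of the site length on each side (reading "in a window" as full containment is itself an assumption). The function names and data layout are hypothetical, and how ConTra combines these values into its ranking score is not reproduced here.

```python
import math

def information_content(pwm):
    """Total information content (bits) of a PWM given as a list of
    {base: frequency} dicts, against a uniform 0.25 background."""
    total = 0.0
    for column in pwm:
        for p in column.values():
            if p > 0:
                total += p * math.log2(p / 0.25)
    return total

def phylogenetic_depth(ref_start, ref_end, hits_by_species, window_factor=2.0):
    """Number of other species with a hit for the same PWM inside a window
    that extends window_factor * (ungapped reference site length) on each
    side of the reference site. All coordinates are assumed to be half-open
    [start, end) intervals in one shared (reference) coordinate system."""
    flank = int(window_factor * (ref_end - ref_start))
    win_start, win_end = ref_start - flank, ref_end + flank
    depth = 0
    for hits in hits_by_species.values():
        # A species counts once if any of its hits lies fully in the window.
        if any(win_start <= start and end <= win_end for start, end in hits):
            depth += 1
    return depth

# Made-up coordinates: a 10 bp reference site gives a 20 bp flank (80..130).
print(phylogenetic_depth(100, 110, {"mouse": [(85, 95)], "dog": [(140, 150)]}))  # 1
```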
More information can be found in the ConTra paper in the Nucleic Acids Research Web Server issue of 2008.
Providing an email address is optional; if one is given, a link to the results will be sent to it.
The results page reloads automatically until the results are ready, and it can be bookmarked.

Since version 2 of ConTra (ConTra v2) you can choose the reference organism (2), which determines the UCSC multiz alignment that is used.
The following table lists the alignment and species used for each reference organism.

Reference species	Common name	Genome version	Multiz alignment
Homo sapiens	human	hg19	multiz46way of 46 vertebrate genomes
Mus musculus	mouse	mm9	multiz30way of 30 vertebrate genomes
Bos taurus	cow	bosTau4	multiz5way: cow, dog, human, mouse, platypus
Gallus gallus	chicken	galGal3	multiz7way: chicken, human, mouse, rat, opossum, frog, zebrafish
X. tropicalis	clawed frog	xenTro2	multiz7way: frog, chicken, opossum, human, mouse, rat, zebrafish
Danio rerio	zebrafish	danRer6	multiz6way: zebrafish, tetraodon, stickleback, frog, mouse, human
D. melanogaster	fruit fly	dm3	multiz15way of 15 insects
C. elegans	roundworm	ce6	multiz6way of 6 worms
S. cerevisiae	baker's yeast	sacCer2	multiz7way of 7 yeast species

To specify the gene of interest (3), ConTra accepts the official gene name or symbol (HGNC, Entrez Gene), aliases or gene identifiers (Entrez Gene), RefSeq accession numbers (NM_, NR_) or Ensembl accession numbers (ENSG, ENST).
The table below gives some examples of accepted terms:

Species	Gene symbol	Gene name	Aliases	NCBI Entrez Gene & RefSeq IDs	Ensembl Gene (ENSG) & Transcript (ENST) IDs
Human	ATOH7	atonal homolog 7 (Drosophila)	bHLHa13 Math5	220202 NM_145178	ENSG00000179774 ENST00000373673
Human	CDH1	cadherin 1, type 1, E-cadherin (epithelial)	UVO CDHE ECAD LCAM Arc-1 CD324 CDH1	999 NM_004360	ENSG00000039068 ENST00000261769 ENST00000268794 ENST00000379120 ENST00000422392
X. tropicalis	wnt3	wingless-type MMTV integration site family, member 3	int4 wnt-3 wnt3l Xwnt-3 Xwnt3	100125198 NM_001103082	ENSXETG00000024231 ENSXETT00000052291 ENSXETT00000052292 ENSXETT00000052293
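ConTra resolves these terms on its server; the sketch below only shows how the accepted identifier types can be told apart by their shape. The helper name and category labels are hypothetical, and the Ensembl pattern assumes the usual "ENS" + optional three-letter species code + G/T + 11-digit layout seen in the table, which does not cover every organism (yeast Ensembl IDs, for example, look different).

```python
import re

# Assumed shapes: RefSeq NM_/NR_ plus digits; Ensembl "ENS" + optional
# three-letter species code + G (gene) or T (transcript) + 11 digits.
REFSEQ = re.compile(r"N[MR]_\d+(?:\.\d+)?")
ENSEMBL = re.compile(r"ENS(?:[A-Z]{3})?([GT])\d{11}(?:\.\d+)?")
ENTREZ_ID = re.compile(r"\d+")

def classify_query(query):
    """Guess which kind of identifier a gene query is (hypothetical helper)."""
    q = query.strip()
    if REFSEQ.fullmatch(q):
        return "refseq"
    m = ENSEMBL.fullmatch(q)
    if m:
        return "ensembl_gene" if m.group(1) == "G" else "ensembl_transcript"
    if ENTREZ_ID.fullmatch(q):
        return "entrez_gene"
    # Anything else is treated as an official symbol, name or alias.
    return "symbol_or_alias"

for term in ["ATOH7", "220202", "NM_145178", "ENSG00000179774", "ENSXETT00000052291"]:
    print(term, classify_query(term))
```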

If the gene or transcript is relatively new, or no multiz alignment is available, you can upload your own multiple alignment file (MAF). ConTra accepts alignments in UCSC MAF format (example.maf), Clustal format (example2.aln) or FASTA format (example3.fasta). When a MAF is uploaded, steps 2 and 3 are skipped.
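How ConTra checks an uploaded file is not described here. As a rough sketch, the three accepted formats can usually be distinguished by their conventional first line: a UCSC MAF header starting with "##maf", a Clustal header starting with "CLUSTAL", or a FASTA record starting with ">". The function name is hypothetical, and real files may deviate from these conventions.

```python
def detect_alignment_format(text):
    """Guess the format of an uploaded alignment from its first non-blank
    line, using each format's conventional opening:
      UCSC MAF -> "##maf", Clustal -> "CLUSTAL", FASTA -> ">"."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith("##maf"):
            return "maf"
        if stripped.upper().startswith("CLUSTAL"):
            return "clustal"
        if stripped.startswith(">"):
            return "fasta"
        return "unknown"
    return "unknown"  # empty or whitespace-only input

print(detect_alignment_format("##maf version=1\na score=0.0\n"))  # maf
```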

Optionally, you can also upload your own collection of position weight matrices (PWMs). This collection will be used in addition to the built-in PWM libraries from JASPAR, TRANSFAC, phyloFACTS and PBM homeodomains.

Step 2: a transcript of your gene of interest

ConTra shows a list of genes whose name, symbol or identifier matches the term provided in step 1; the matching keyword is highlighted. A gene can encode several transcripts or isoforms, which can be regulated differently depending on the presence or absence of specific transcription factor binding sites in the transcript-specific promoter, UTR or intron regions. For every transcript that can be analyzed in ConTra, the position of the transcription start site (TSS) is shown together with the number of introns and the identifier (RefSeq or Ensembl). The identifier links to the genomic view of the transcript in the UCSC or Ensembl genome browser.
Select the transcript to analyze (1) and click next (2).

Step 3: promoter, 5'UTR, 3'UTR or any intron

The first version of ConTra could only analyze the promoter upstream of the TSS. In ConTra v2, the 5' UTR, the 3' UTR or any intron can be analyzed in addition to the promoter region (1). The size of the promoter region upstream of the TSS can be specified; for UTRs and introns the entire region is used in the analysis. After selecting the sequence parts, click Next to proceed to the final step, step 4 (2).
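Because "upstream" depends on the strand of the transcript, the promoter coordinates are mirrored for minus-strand genes. The following sketch assumes 0-based, half-open coordinates with `tss` as the first transcribed base; the function is hypothetical and does not claim to reproduce ConTra's internal coordinate handling (for instance, it does not clip at the chromosome end, which would need the chromosome length).

```python
def promoter_interval(tss, strand, size):
    """Half-open interval [start, end) covering `size` bp directly upstream
    of the TSS. Assumes 0-based coordinates where `tss` is the first
    transcribed base; on the minus strand, upstream means higher coordinates."""
    if size <= 0:
        raise ValueError("promoter size must be positive")
    if strand == "+":
        return max(0, tss - size), tss  # clip at the chromosome start
    if strand == "-":
        return tss + 1, tss + 1 + size
    raise ValueError("strand must be '+' or '-'")

print(promoter_interval(1000, "+", 500))  # (500, 1000)
print(promoter_interval(1000, "-", 500))  # (1001, 1501)
```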

Step 4: transcription factor binding sites (TFBS) and stringency

In addition to the "minimize false positives" option, one of four stringency levels can be selected (1). The "minimize false positives" option only works for PWMs selected in the TRANSFAC list. The stringency levels trade sensitivity against accuracy. With core match = 0.85 and matrix match = 0.70, detection is highly sensitive but less accurate, and some false positives may be included. Conversely, core match = 1.00 and matrix match = 0.95 is very accurate but less sensitive, and some true binding sites may be missed.
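The core match and matrix match cutoffs resemble the similarity scores used by MATCH-style scanners with TRANSFAC matrices. Assuming ConTra applies thresholds of that kind (this page does not give the formula), a sketch looks like this: each PWM column is weighted by how conserved it is, a site's weighted score is rescaled between the worst and best achievable scores, and the "core" is taken to be the five consecutive columns with the highest total weight. The weighting, the core length of 5 and all names here are assumptions for illustration.

```python
import math

def ic_weights(pwm):
    """Per-column weight sum_b f_b * ln(4 * f_b): 0 for a uniform column,
    larger for conserved columns."""
    return [sum(f * math.log(4 * f) for f in col.values() if f > 0) for col in pwm]

def similarity(pwm, site, columns=None):
    """Weighted score of `site` over `columns`, rescaled so the worst
    possible sequence scores 0 and the best scores 1."""
    if columns is None:
        columns = range(len(pwm))
    w = ic_weights(pwm)
    site = site.upper()
    current = sum(w[i] * pwm[i][site[i]] for i in columns)
    best = sum(w[i] * max(pwm[i].values()) for i in columns)
    worst = sum(w[i] * min(pwm[i].values()) for i in columns)
    return 1.0 if best == worst else (current - worst) / (best - worst)

def core_columns(pwm, core_len=5):
    """The core_len consecutive columns with the highest total weight."""
    w = ic_weights(pwm)
    start = max(range(len(pwm) - core_len + 1), key=lambda s: sum(w[s:s + core_len]))
    return range(start, start + core_len)

def is_hit(pwm, site, core_cutoff, matrix_cutoff):
    """Apply both thresholds, e.g. (0.85, 0.70) sensitive, (1.00, 0.95) strict."""
    return (similarity(pwm, site, core_columns(pwm)) >= core_cutoff
            and similarity(pwm, site) >= matrix_cutoff)

conserved = {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pwm = [conserved] * 5 + [uniform]  # toy PWM, not taken from any database
print(round(similarity(pwm, "AAAACA"), 3))  # 0.8
print(is_hit(pwm, "AAAACA", 0.85, 0.70))     # False: core score 0.8 < 0.85
```

Raising either cutoff rejects more imperfect sites, which is the sensitivity/accuracy trade-off described above.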

Position weight matrices or PWMs represent sequence motifs for transcription factor binding sites (TFBS).
To detect TFBS, ConTra uses PWMs from several databases (2): JASPAR and phyloFACTS, TRANSFAC, and a collection of homeodomain TF PWMs derived from protein binding microarrays (PBMs).
If you uploaded your own PWMs in step 1, these are also available for selection (2).

Up to 25 PWMs can be selected from the different databases (3). The lists of available PWMs are sorted alphabetically.
Use the Ctrl+F function of your web browser to find a specific PWM more easily.
Ticking a checkbox selects the PWM for analysis. For some transcription factors more than one PWM is available (e.g. E2F in TRANSFAC: V$E2F_03 and E2F_Q6_01); the "all" checkbox above them selects all of them.
Clicking the "To top" button jumps to the top of the page, allowing a quick switch to another list of PWMs from a different database.
To start the analysis, click the "Run ConTra" button (4).
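The selection rules above (individual checkboxes, an "all" box per transcription factor, and a maximum of 25 PWMs) can be mimicked in a few lines. This hypothetical helper is only a sketch of the described behaviour; in particular, it assumes that PWMs added through an "all" box count towards the 25-PWM limit, which this page does not state explicitly.

```python
from collections import defaultdict

MAX_PWMS = 25  # selection limit stated on this page

def expand_selection(ticked, all_boxes, tf_of):
    """Combine individually ticked PWMs with every PWM of each TF whose
    "all" box is ticked, then enforce the 25-PWM limit.
    tf_of maps PWM id -> transcription factor name (hypothetical input)."""
    by_tf = defaultdict(list)
    for pwm, tf in tf_of.items():
        by_tf[tf].append(pwm)
    chosen = set(ticked)
    for tf in all_boxes:
        chosen.update(by_tf.get(tf, []))
    if len(chosen) > MAX_PWMS:
        raise ValueError(f"{len(chosen)} PWMs selected; at most {MAX_PWMS} are allowed")
    return sorted(chosen)

tf_of = {"V$E2F_03": "E2F", "E2F_Q6_01": "E2F"}
print(expand_selection([], ["E2F"], tf_of))  # ['E2F_Q6_01', 'V$E2F_03']
```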

Results

The execution time of an analysis can vary from a few minutes to a couple of hours, depending on the size of the sequence parts (the length of the sequences to scan), the number of sequences in the multiz alignment (which depends on the reference organism), the number of PWMs selected, and the server load.
The initial results page reloads until the results are ready; alternatively, this page can be bookmarked. Note that results remain on the server for three weeks. If an email address was provided, a link to the results page is sent to this address.

The results page shows results for every sequence part (promoter, UTRs, introns or your own uploaded MAF). The multiz alignments are divided into alignment blocks. For every block a Jalview preview figure is shown, with any detected TFBS highlighted (1).
For every block there is a link to a separate result page (2).
Additionally, a FASTA file with the alignment and a feature color file with the TFBS and their highlight colors are available (3). When these are downloaded and opened in Jalview, publication-quality figures can be created.
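If you want to add or recolor sites yourself before opening the files in Jalview, the feature file can be generated by script. The sketch below follows our reading of Jalview's plain features-file layout (colour definitions as `type<TAB>colour`, then one tab-separated line per feature with `-1` as sequence index when the sequence is named by ID); it is an assumption, not a copy of the file ConTra produces, and the sequence ID and coordinates in the example are made up.

```python
def jalview_features(colours, sites):
    """Build Jalview features-file text.
    colours: {feature_type: "RRGGBB"}, written first as colour definitions.
    sites: (description, sequence_id, start, end, feature_type) tuples with
    1-based, inclusive residue positions in the (ungapped) sequence."""
    lines = [f"{ftype}\t{colour}" for ftype, colour in colours.items()]
    for description, seq_id, start, end, ftype in sites:
        # -1 as sequence index: the sequence is identified by its ID instead.
        lines.append(f"{description}\t{seq_id}\t-1\t{start}\t{end}\t{ftype}")
    return "\n".join(lines) + "\n"

text = jalview_features({"E-box": "ff0000"}, [("E-box site", "hg19", 12, 17, "E-box")])
print(text, end="")
```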

On the HTML page of a specific block, the sites of each PWM can be shown or hidden using the checkboxes. Overlapping binding sites may be only partially visible: the sites of the first PWM in the list are drawn at the back (bottom layer), and those of the last PWM are drawn in front (top layer). Deselecting the checkboxes of the sites in front reveals the underlying sites, as illustrated in the example below.
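The layering rule can be expressed as: at any alignment column, the visible site belongs to the last enabled PWM in the list that covers that column. This hypothetical function is a sketch of that rule only, assuming sites are stored as half-open column ranges; it does not describe how the result page is actually rendered.

```python
def front_site(pwm_order, enabled, sites, column):
    """Return the PWM whose site is drawn in front at alignment `column`.
    pwm_order: PWMs in list order (first = back/bottom layer).
    enabled: PWMs whose checkbox is ticked.
    sites: {pwm: [(start, end), ...]} half-open column ranges."""
    for pwm in reversed(pwm_order):  # later in the list = closer to the front
        if pwm in enabled and any(s <= column < e for s, e in sites.get(pwm, [])):
            return pwm
    return None  # no visible site at this column

order = ["PWM_1", "PWM_2"]
sites = {"PWM_1": [(0, 10)], "PWM_2": [(5, 15)]}
print(front_site(order, {"PWM_1", "PWM_2"}, sites, 7))  # PWM_2 hides PWM_1
print(front_site(order, {"PWM_1"}, sites, 7))           # PWM_1 revealed
```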

Exploration

The exploration option helps a researcher who has no idea which transcription factors (TFs) may regulate the gene of interest. This analysis consists of the same four steps: selecting the gene of interest, selecting the transcript, selecting the sequence parts (promoter region, UTRs, introns) and selecting the PWMs. Step 4 differs somewhat from the visualization analysis: instead of selecting individual PWMs, the user selects the databases to be used. The alignment is then scanned with all PWMs from the selected databases, so execution takes much longer. Running one set of PWMs on the human multiz46way alignment typically takes the server about half an hour.

After the initial analysis, ConTra shows a list of putative TFBS ranked by their binding probability to the region of interest. By default the top 20 are listed; click expand to show the top 100. At the bottom of the first results page there is also a link to the rankedlist file, which contains the results for all PWMs from the selected databases.
To visualize specific TFBS, first tick the checkboxes of interest (1), then click the "Show TFBS positions" button (2). The following results page is the same as in a visualization analysis.

Examples

Gene (species)	TFBS	Reference	ConTra results
ATOH7 (Homo sapiens)	E-box	Del Bene et al., 2007

Demos

(a) Getting a multiple alignment file (MAF) of a genomic region in the UCSC genome browser for analysis in ConTra
11 slides - 3.8 MB

(b) Uploading and analyzing a multiple alignment file (MAF) in ConTra
5 slides - 1.0 MB

FAQs

Q: No transcripts were found by your query! ConTra cannot find your gene of interest.
A1: Please use the official gene name or symbol, or a valid RefSeq or Ensembl accession number.
A2: Although ConTra uses recent information from the UCSC Genome Browser and the Entrez Gene database, some genes are not yet annotated and are therefore absent from ConTra's gene list. In this case, download a multiple alignment file (MAF) of the region you want to analyze from the UCSC Genome Browser and upload it in step 1.

Q: We are sorry but UCSC genome has no correct maf available for the selected position.
A: There is no sequence conservation, and hence no MAF available, for the selected reference organism and sequence part(s) of your gene of interest. Either try another reference organism for which data may be available, or use the UCSC Genome Browser to obtain a MAF file that can be uploaded in ConTra (or the sequence of the reference organism in FASTA format).

Q: The uploaded file is not a text file!
A: An uploaded alignment file must be in UCSC MAF format, Clustal ALN format or valid (multi-)FASTA format. See the examples at the end of Step 1 on this page for details.

For other questions or help contact us by sending an email to ConTra@irc.UGent.be
