
                                jaspextract 



Function

   Extract data from JASPAR

Description

   JASPAR is a collection of transcription factor DNA-binding
   preferences, modelled as matrices. These can be converted into
   Position Weight Matrices (PWMs or PSSMs), used for scanning genomic
   sequences.

   JASPAR is the only database with this scope where the data can be used
   with no restrictions (open-source).

   This program copies the 3 JASPAR matrix sets into the EMBOSS data
   directories, performing any necessary conversions.

   The home page of JASPAR is: http://jaspar.genereg.net/

   The EMBOSS program jaspscan will not work unless this program is run.

   Running this program may be the job of your system manager.

Usage

   Here is a sample session with jaspextract


% jaspextract 
Extract data from JASPAR
JASPAR database directory [.]: jaspar

   Go to the output files for this example

Command line arguments

   Standard (Mandatory) qualifiers:
  [-directory]         directory  The directory containing the unzipped and
                                  untarred JASPAR_CORE, JASPAR_FAM and
                                  JASPAR_PHYLOFACTS subdirectories

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers: (none)
   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

   The input files are the uncompressed and extracted JASPAR_CORE.tgz,
   JASPAR_FAM.tgz and JASPAR_PHYLOFACTS.tgz files provided in the JASPAR
   MatrixDir download directory of the JASPAR homepage
   (http://jaspar.genereg.net).

   The files will usually be extracted in the same directory thereby
   creating a directory containing subdirectories JASPAR_CORE, JASPAR_FAM
   and JASPAR_PHYLOFACTS. Given a directory specification, jasparextract
   will search for these subdirectories and process the files in any that
   exist. jasparextract will warn the user if any of the three
   subdirectories do not exist.

Output file format

   The output file format is currently the same as the JASPAR
   distribution format.

  Output files for usage example

  Directory: JASPAR_CORE

   This directory contains output files, for example MA0070.pfm
   MA0071.pfm MA0072.pfm MA0073.pfm MA0074.pfm MA0075.pfm MA0076.pfm
   MA0077.pfm MA0078.pfm MA0079.pfm and matrix_list.txt.

  File: JASPAR_CORE/MA0070.pfm

 5  3 16  1  0 17 17  0  0 16 12  8
 6  9  1  1 18  1  0  0 18  1  0  2
 2  3  1  0  0  0  0  1  0  0  1  2
 5  3  0 16  0  0  1 17  0  1  5  6

  File: JASPAR_CORE/MA0071.pfm

15  9  6 11 21  0  0  0  0 25
 1  1 12  2  0  0  0  0 25  0
 2  0  4  5  4 25 25  0  0  0
 7 15  3  7  0  0  0 25  0  0

  File: JASPAR_CORE/MA0072.pfm

 9 17 15 35 23  2  0 28  0  0  0  0 36 15
 8  2  0  1  0 12  0  0  0  0  0 36  0  6
 8  7  3  0  0 13  0  8 36 36  0  0  0 10
11 10 18  0 13  9 36  0  0  0 36  0  0  5

  File: JASPAR_CORE/MA0073.pfm

 3  1  3  0  7  9  8  4  0 11  4  1  3  4  2  4  4  4  1  4
 8 10  8 11  4  2  3  6 11  0  7 10  8  6  9  5  5  6  7  4
 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  3  2
 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  2  1  1  0  1

  File: JASPAR_CORE/MA0074.pfm

 3  0  0  0  0  9  4  2  2  5  0  0  1  0  7
 0  0  0  0  9  0  2  4  0  0  0  0  0  9  1
 7 10  9  0  0  1  0  2  8  5 10  0  0  0  2
 0  0  1 10  1  0  4  2  0  0  0 10  9  1  0

  File: JASPAR_CORE/MA0075.pfm

52 59  0  0 58
 2  0  0  0  0
 4  0  1  0  1
 1  0 58 59  0

  File: JASPAR_CORE/MA0076.pfm

16  0  0  0  0 20 16  4  1
 1 20 20  0  0  0  0  1  6
 2  0  0 20 20  0  0 15  0
 1  0  0  0  0  0  4  0 13

  File: JASPAR_CORE/MA0077.pfm

24 54 59  0 65 71  4 24  9
 7  6  4 72  4  2  0  6  9
31  7  0  2  0  1  1 38 55
14  9 13  2  7  2 71  8  3

  File: JASPAR_CORE/MA0078.pfm

 7  8  3 30  0  0  0  0  0
 9  8 18  0  1  0  0  0 17
 6  4  1  0  0  0 31  2 10
 9 11  9  1 30 31  0 29  4

  File: JASPAR_CORE/MA0079.pfm

1 2 0 0 0 2 0 0 1 2
1 1 0 0 5 0 1 0 1 0
4 4 8 8 2 4 5 6 6 0
2 1 0 0 1 2 2 2 0 6

  File: JASPAR_CORE/matrix_list.txt

MA0073  22.2782723704014        RREB1   ZN-FINGER, C2H2 ; acc "Q92766" ; medlin
e "8816445" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "vertebrate
" ; total_ic "22.2800" ; type "SELEX"
MA0075  9.06306510239135        Prrx2   HOMEO   ; acc "Q06348" ; medline "79018
37" ; seqdb "SWISSPROT" ; species "Mus musculus" ; sysgroup "vertebrate" ; tota
l_ic "9.0620" ; type "SELEX"
MA0070  14.6408952002356        Pbx     HOMEO   ; acc "P40424" ; medline "79109
44" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "vertebrate" ; tota
l_ic "14.6440" ; type "SELEX"
MA0079  9.7185757452318 SP1     ZN-FINGER, C2H2 ; acc "P08047" ; medline "21923
57" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "vertebrate" ; tota
l_ic "9.7160" ; type "SELEX"
MA0077  9.07881462267179        SOX9    HMG     ; acc "P48436" ; medline "99736
26" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "vertebrate" ; tota
l_ic "9.0810" ; type "SELEX"
MA0078  10.5018372361999        Sox17   HMG     ; acc "Q61473" ; medline "86362
40" ; seqdb "SWISSPROT" ; species "Mus musculus" ; sysgroup "vertebrate" ; tota
l_ic "10.5030" ; type "SELEX"
MA0071  13.1897301896459        RORA    NUCLEAR RECEPTOR        ; acc "P35397"
; medline "7926749" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "ve
rtebrate" ; total_ic "13.1920" ; type "SELEX"
MA0072  17.4248426117905        RORA1   NUCLEAR RECEPTOR        ; acc "P35398"
; medline "7926749" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "ve
rtebrate" ; total_ic "17.4230" ; type "SELEX"
MA0074  20.4511671987138        RXR-VDR NUCLEAR RECEPTOR        ; acc "P11473"
; medline "8674817" ; seqdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "ve
rtebrate" ; total_ic "20.4520" ; type "SELEX"
MA0076  14.123230134165 ELK4    ETS     ; acc "P28324" ; medline "8524663" ; se
qdb "SWISSPROT" ; species "Homo sapiens" ; sysgroup "vertebrate" ; total_ic "14
.1230" ; type "SELEX"

  Directory: JASPAR_FAM

   This directory contains output files.

  Directory: JASPAR_PHYLOFACTS

   This directory contains output files.

Data files

   None

Notes

   The home page of REBASE is: http://jaspar.genereg.net/

   Running this program may be the job of your system manager.

References

    1. DNA binding sites: representation and discovery Bioinformatics.
       2000 Jan;16(1):16-23
    2. Applied bioinformatics for the identification of regulatory
       elements Nat Rev Genet. 2004 Apr;5(4):276-87

Warnings

   The program will warn you if any of the JASPAR_CORE, JASPAR_FAM or
   JASPAR_PHYLOFACTS subdirectories do not exist.

Diagnostic Error Messages

   None.

Exit status

   It always exits with status 0 unless an error is reported

Known bugs

   None.

See also

   Program name Description
   aaindexextract Extract amino acid property data from AAINDEX
   cutgextract Extract codon usage tables from from CUTG database
   printsextract Extract data from PRINTS database for use by pscan
   prosextract Processes the PROSITE motif database for use by
   patmatmotifs
   rebaseextract Process the REBASE database for use by restriction
   enzyme applications
   tfextract Process TRANSFAC transcription factor database for use by
   tfscan

Author(s)

   Alan Bleasby (ajb  ebi.ac.uk)
   European Bioinformatics Institute, Wellcome Trust Genome Campus,
   Hinxton, Cambridge CB10 1SD, UK

History

   Completed 23rd July 2007

Target users

   This program is intended to be used by administrators responsible for
   software and database installation and maintenance.

Comments

   None
