annotateFromGFF

Commits

List of commits on branch master.
49221d70f1f73fcf4a7887a750534e66ee555275

added docstrings

aafrendeiro committed 10 years ago
f1a2af61b783d2030883f34e539d4d8e43f3ebe1

added unix line terminator to output

aafrendeiro committed 10 years ago
a74fc32d6a2ae50ff102565f9b6cbf75b73ddcfc

fixed bug when not using operon flag and csv writing in unix format

aafrendeiro committed 10 years ago
c783146aab43fd3e6fb1d1ec56bdef5e1bc7937d

fixed output to have same number of fields in every line (gene in intergenic=".")

aafrendeiro committed 10 years ago
8818dc025ccb27bdebf332d3424a13496c3640a0

added support for operons

aafrendeiro committed 10 years ago
f5b9a666a90d15c037085d57ca7fcfe59eb56934

fixed syntax

aafrendeiro committed 10 years ago

README

annotateFromGFF.py

Creates a functional annotation of a complete genome based on the features in a GFF file.

Reports coding sequences (CDS) and untranslated regions (UTRs), distinguishing between 5' and 3' UTRs, and extracts features such as introns, intergenic space, TSSs and promoters.

Promoters and intergenic space are annotated dynamically, based on a specified promoter size that is resizable to fit genome boundaries and small intergenic spaces. This can account for the occurrence of operons, annotating intergenic space within operons accordingly.

Usage

python annotateFromGFF.py [OPTIONS] file.gff chrmSizes.tsv > annotation.bed

Positional arguments (required)

gff - GFF file with annotation.

chrmFile - Tab-delimited file with the size of each chromosome (chr:size).
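The two inputs above could be read along these lines. This is a minimal sketch, not the script's own code: the function names are hypothetical, the chromosome file is assumed to hold one `name<TAB>size` pair per line, and the GFF line is split into the nine standard tab-separated GFF columns.

```python
import csv
import io


def read_chrom_sizes(handle):
    """Return {chromosome_name: size} from a two-column, tab-delimited file.

    Assumption: each non-empty, non-comment line is name<TAB>size.
    """
    sizes = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        sizes[row[0]] = int(row[1])
    return sizes


def parse_gff_line(line):
    """Split one GFF feature line into a dict of its nine standard columns."""
    seqid, source, ftype, start, end, score, strand, phase, attrs = (
        line.rstrip("\n").split("\t")
    )
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),  # GFF coordinates are 1-based, inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attrs,
    }


# Small in-memory examples (made-up values, for illustration only).
chrom_sizes = read_chrom_sizes(io.StringIO("chrI\t230218\nchrII\t813184\n"))
feature = parse_gff_line("chrI\texample\tgene\t335\t649\t.\t+\t.\tID=gene1\n")
```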

Optional arguments

-o, --outfile - Specifies the name of the output file. If not specified, will output to stdout.

-p, --promoterSize - Average size of promoter elements. Dynamically resizable.

-op, --operons - Consider operons. In this mode, promoters and TSSs are not annotated for consecutive genes in the same orientation that are separated by less than the distance given by --operonDistance; the space between those genes is annotated as intergenic space instead.

--operonDistance - Distance between genes below which they are classified as belonging to the same operon.

-l, --logfile - Specify the name of the log file.

-s - Silent behaviour. Do not write a log file.
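For reference, the command-line interface described above could be declared with `argparse` roughly as follows. This is a sketch that only mirrors the documented option names; the default values (200 and 100) and the `silent` attribute name are placeholders chosen for illustration, not values taken from the script.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options documented in this README."""
    parser = argparse.ArgumentParser(prog="annotateFromGFF.py")
    parser.add_argument("gff", help="GFF file with annotation")
    parser.add_argument("chrmFile", help="tab-delimited chromosome sizes")
    parser.add_argument("-o", "--outfile", default=None,
                        help="output file (stdout if omitted)")
    # Default of 200 is a placeholder, not the script's documented default.
    parser.add_argument("-p", "--promoterSize", type=int, default=200)
    parser.add_argument("-op", "--operons", action="store_true")
    # Default of 100 is a placeholder as well.
    parser.add_argument("--operonDistance", type=int, default=100)
    parser.add_argument("-l", "--logfile", default=None)
    parser.add_argument("-s", dest="silent", action="store_true")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(
    ["-p", "300", "--operons", "file.gff", "chrmSizes.tsv"]
)
```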

Promoter annotation

Promoters are defined as the region of fixed length (-p argument) upstream of a transcription start site (TSS), but are shortened if that length would cross a chromosome boundary or overlap another gene.

Intergenic regions are resized accordingly: they cover the space between two genes that remains after the promoter space is subtracted.
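The resizing rule can be illustrated with a small sketch. It is not taken from the script: the function names are hypothetical, genes are assumed to be `(start, end, strand)` tuples in 0-based, half-open coordinates, and the limits passed in are the end of the neighbouring gene or the chromosome boundary.

```python
def upstream_promoter(gene, promoter_size, left_limit, right_limit):
    """Return the promoter interval upstream of the gene's TSS.

    left_limit / right_limit are the nearest positions the promoter may
    reach on each side (neighbouring gene edge or chromosome boundary).
    """
    start, end, strand = gene
    if strand == "+":
        # TSS is at the gene start; the promoter extends to the left.
        return (max(start - promoter_size, left_limit), start)
    # Minus strand: TSS is at the gene end; the promoter extends right.
    return (end, min(end + promoter_size, right_limit))


def intergenic_remainder(gap_start, gap_end, promoters):
    """Return the part of a gap not covered by promoters at its edges."""
    start, end = gap_start, gap_end
    for p_start, p_end in promoters:
        if p_start == gap_start:  # promoter of the left-hand gene
            start = max(start, p_end)
        if p_end == gap_end:      # promoter of the right-hand gene
            end = min(end, p_start)
    return (start, end) if start < end else None


chrom_size = 10000
a, b, c = (1000, 2000, "+"), (2150, 3000, "+"), (3500, 4000, "+")
d = (9900, 9950, "-")

promoter_a = upstream_promoter(a, 200, 0, b[0])            # full length
promoter_b = upstream_promoter(b, 200, a[1], c[0])         # clipped by gene a
promoter_c = upstream_promoter(c, 200, b[1], d[0])         # full length
promoter_d = upstream_promoter(d, 200, c[1], chrom_size)   # clipped by end

gap_ab = intergenic_remainder(a[1], b[0], [promoter_b])
gap_bc = intergenic_remainder(b[1], c[0], [promoter_c])
```

Here the promoter of gene `b` is shortened to the 150 bases available before gene `a`, so no intergenic space is left in that gap, while the gap before gene `c` keeps its first 300 bases as intergenic space.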

If operons are considered (option --operons), the space between genes within an operon is annotated as intergenic, and no promoters or TSSs are annotated within the operon.
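One way to picture the operon rule is to group consecutive same-strand genes whose separation is below the threshold. The sketch below makes the same assumptions as a typical interval representation (hypothetical function names, genes as `(start, end, strand)` tuples sorted by start, 0-based half-open coordinates) and is not the script's own implementation.

```python
def group_operons(genes, operon_distance):
    """Group position-sorted genes into putative operons.

    A gene joins the previous group when it has the same strand as the
    group's last gene and lies less than operon_distance away from it.
    """
    operons = []
    for gene in genes:
        if operons:
            prev = operons[-1][-1]
            same_strand = gene[2] == prev[2]
            if same_strand and gene[0] - prev[1] < operon_distance:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return operons


def internal_gaps(operon):
    """Return the gaps between consecutive genes of one operon.

    Under the rule described above, these gaps would be reported as
    intergenic space rather than as promoters.
    """
    return [(left[1], right[0]) for left, right in zip(operon, operon[1:])]


genes = [
    (100, 500, "+"),
    (520, 900, "+"),    # 20 bases after the previous gene, same strand
    (1500, 2000, "+"),  # 600 bases away: starts a new group
    (2010, 2500, "-"),  # close, but on the opposite strand
]
operons = group_operons(genes, operon_distance=50)
gaps = internal_gaps(operons[0])
```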