NAME

gff3_to_ucsc_table.pl

A script to convert a GFF3 file to a UCSC style gene table

SYNOPSIS

gff3_to_ucsc_table.pl [--options...] <filename>

Options:
--in <filename>
--out <filename>
--bin
--gz
--version
--help

OPTIONS

The command line flags and descriptions:

--in <filename>

Specify the input GFF3 file. The file may be compressed with gzip.

--out <filename>

Specify the output filename. By default, the input file base name is used, appended with '_ucsc_genetable.txt'.

--bin

Specify whether the UCSC table-specific column bin should be included as the first column in the table. This column is reserved for internal UCSC database use, and, if included here, will simply be populated with 0s. The default behavior is to not include it.

--gz

Specify whether the output file should be compressed with gzip. The default is to mimic the compression status of the input file.
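The interaction between the --out and --gz defaults can be illustrated with a short Python sketch. This is not the script's own code: the function name default_output_name is hypothetical, and the details of how the base name is derived (stripping a trailing .gz and a .gff3 or .gff extension) and of adding a .gz suffix to compressed output are assumptions made for illustration only.

```python
import os

def default_output_name(in_path, gz=None):
    """Guess an output name from the input name (hypothetical helper).

    gz=None means "mimic the input": compress only if the input was gzipped.
    """
    base = os.path.basename(in_path)
    compressed_input = base.endswith(".gz")
    if compressed_input:
        base = base[:-3]  # drop the ".gz" suffix
    # Assumption: a GFF extension is removed to obtain the base name
    for ext in (".gff3", ".gff"):
        if base.lower().endswith(ext):
            base = base[: -len(ext)]
            break
    if gz is None:
        gz = compressed_input
    name = base + "_ucsc_genetable.txt"
    # Assumption: compressed output carries a ".gz" suffix
    return name + ".gz" if gz else name

print(default_output_name("annotation.gff3"))
print(default_output_name("annotation.gff3.gz"))
```

Passing --out explicitly avoids any dependence on how the default name is built.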

--version

Print the version number.

--help

Display this POD documentation.

DESCRIPTION

This program will convert a GFF3 annotation file to a UCSC-style gene table, similar to that obtained through the UCSC Table Browser. Specifically, it matches the format of the refGene (RefSeq Genes) and ensGene (Ensembl Genes) tables.
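For orientation, the column layout of the refGene and ensGene tables can be sketched in Python as follows. The column names follow the UCSC genePredExt layout; the helper format_row and the example record (transcript and gene names, coordinates) are hypothetical and only illustrate the shape of one output line, not the script's actual implementation.

```python
# Column order of the UCSC refGene/ensGene (genePredExt) tables.
# The leading "bin" column is only present when --bin is requested.
COLUMNS = [
    "name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd",
    "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]

def format_row(record, include_bin=False):
    """Join one transcript record (a dict keyed by COLUMNS) into a tab-delimited line."""
    fields = []
    for column in COLUMNS:
        value = record[column]
        if isinstance(value, (list, tuple)):
            # List columns are written comma-separated with a trailing comma
            value = "".join(f"{item}," for item in value)
        fields.append(str(value))
    if include_bin:
        fields.insert(0, "0")  # bin is simply filled with 0
    return "\t".join(fields)

# Hypothetical two-exon transcript, 0-based half-open coordinates
example = {
    "name": "transcript1.1", "chrom": "chr1", "strand": "+",
    "txStart": 999, "txEnd": 3000, "cdsStart": 1098, "cdsEnd": 2899,
    "exonCount": 2, "exonStarts": [999, 1999], "exonEnds": [1500, 3000],
    "score": 0, "name2": "gene1",
    "cdsStartStat": "cmpl", "cdsEndStat": "cmpl", "exonFrames": [0, 0],
}
print(format_row(example))
print(format_row(example, include_bin=True))
```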

It will assemble the exon starts, stops, and frames from the defined transcript features in the GFF3 file. This assumes the standard parent->child relationship using the primary tags of gene -> mRNA -> [CDS, five_prime_utr, three_prime_utr]. Additional features (exon, start_codon, stop_codon, transcript) will be safely ignored.
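The assembly of exons from CDS and UTR subfeatures can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not the script's code: the function assemble_exons and the example coordinates are hypothetical, pieces that abut or overlap are assumed to belong to the same exon, and frame calculation is left out. It also shows the coordinate shift from GFF3 (1-based, inclusive) to UCSC tables (0-based start, half-open).

```python
def assemble_exons(subfeatures):
    """Merge CDS and UTR pieces (1-based, inclusive GFF3 coordinates) into exons.

    Returns (exonStarts, exonEnds) in 0-based, half-open UCSC coordinates.
    """
    exons = []
    for start, end in sorted(subfeatures):
        if exons and start <= exons[-1][1] + 1:
            # The piece abuts or overlaps the previous one: extend that exon
            exons[-1][1] = max(exons[-1][1], end)
        else:
            exons.append([start, end])
    starts = [start - 1 for start, _ in exons]  # shift start to 0-based
    ends = [end for _, end in exons]            # inclusive end equals half-open end
    return starts, ends

# Hypothetical mRNA with a UTR at each end and one intron
pieces = [
    (1000, 1099),  # five_prime_utr
    (1100, 1500),  # CDS
    (2000, 2900),  # CDS
    (2901, 3000),  # three_prime_utr
]
exon_starts, exon_ends = assemble_exons(pieces)
print(exon_starts, exon_ends)  # [999, 1999] [1500, 3000]
```

In this example the UTR and the adjoining CDS collapse into a single exon on each side of the intron, which is why four subfeatures yield two exons.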

It will also process non-coding transcripts, including all non-coding RNAs; all subfeatures of non-coding RNAs will be considered as exons.

The cdsStartStat and cdsEndStat fields are populated depending on whether five- or three-prime UTRs exist; this may or may not reflect the actual status according to UCSC.

For very large GFF3 files, it is helpful to include close feature directive pragmas (lines with ###) after the annotation for each reference sequence (see the GFF3 specification at http://www.sequenceontology.org/resources/gff3.html). FASTA sequence in the GFF3 file is ignored.
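The benefit of the ### directive is that a parser can finish and release everything read so far, rather than holding the whole file in memory. A minimal Python sketch of that idea follows; the generator read_gff3_blocks and the sample lines are hypothetical, and the script's own parser may handle these cases differently.

```python
import io

def read_gff3_blocks(handle):
    """Yield lists of feature lines, one list per block closed by '###'."""
    block = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##FASTA") or line.startswith(">"):
            break  # embedded sequence follows; it is ignored
        if line.strip() == "###":
            # All features so far are complete and can be processed
            if block:
                yield block
                block = []
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and other directives
        block.append(line)
    if block:
        yield block  # features after the last '###', if any

# Hypothetical annotation for two reference sequences
sample = io.StringIO(
    "##gff-version 3\n"
    "chr1\tsrc\tgene\t1000\t3000\t.\t+\t.\tID=gene1\n"
    "chr1\tsrc\tmRNA\t1000\t3000\t.\t+\t.\tID=mrna1;Parent=gene1\n"
    "###\n"
    "chr2\tsrc\tgene\t500\t900\t.\t-\t.\tID=gene2\n"
    "###\n"
    "##FASTA\n"
    ">chr1\n"
    "ACGT\n"
)
for block in read_gff3_blocks(sample):
    print(len(block), "feature lines for", block[0].split("\t")[0])
```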

AUTHOR

Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the GPL (either version 1, or at your option, any later version) or the Artistic License 2.0.