NAME
change_chr_prefix.pl
A script to add or remove chromosome name prefixes.
SYNOPSIS
change_chr_prefix.pl [--add | --strip] [--options...] <filename>
Options:
--in <filename>
--out <filename>
--add
--strip
--roman
--arabic
--prefix <text>
--contig
--gz
--version
--help
OPTIONS
The command line flags and descriptions:
- --in <filename>
-
Specify the input file. Supported file types include Bam, Sam, Bed, GFF, Fasta, and other tab-delimited text files. Text-based files may be compressed with gzip.
- --out <filename>
-
Specify the output filename. By default it uses the input base name, appended with either _chr or nochr.
- --add
- --strip
-
Specify the renaming action. Exactly one of the two must be specified. The add action prepends the prefix to simple chromosome names (one to four characters), while the strip action removes the prefix.
- --roman
-
Convert Arabic numerals (1, 2, ... 30) to Roman numerals (I, II, ... XXX). Only numbers up to 30 are converted; all others are ignored.
- --arabic
-
Convert Roman numerals (I, II, ... XXX) to Arabic numerals (1, 2, ... 30). Only upper-case numerals are recognized. Higher numbers are ignored.
- --prefix <text>
-
Specify the chromosome prefix. The default value is 'chr'.
- --contig
-
Indicate whether contig and scaffold names should be included in the renaming process. These are recognized by the text 'contig', 'scaffold', or 'NA' in the name. The default value is false.
- --gz
-
Specify whether the output text file should be compressed with gzip.
- --version
-
Print the version number.
- --help
-
Display this POD documentation.
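The add and strip rules described for --add, --strip, --prefix, and --contig can be sketched roughly as follows. This is an illustrative Python sketch, not the script's actual Perl code. The function names add_prefix, strip_prefix, and is_contig are hypothetical, and several details are assumptions: that the contig markers are matched as case-sensitive substrings, that an already-prefixed name is left alone, and that --contig also governs stripping.

```python
# Markers named in the --contig description; case-sensitive matching is assumed.
CONTIG_MARKERS = ("contig", "scaffold", "NA")


def is_contig(name: str) -> bool:
    """Return True if the name contains one of the contig/scaffold markers."""
    return any(marker in name for marker in CONTIG_MARKERS)


def add_prefix(name: str, prefix: str = "chr", contig: bool = False) -> str:
    """Prefix simple names (one to four characters); contigs only on request."""
    if name.startswith(prefix):
        return name  # assumed: already-prefixed names are left unchanged
    if is_contig(name):
        return prefix + name if contig else name
    if 1 <= len(name) <= 4:
        return prefix + name
    return name  # longer, non-contig names are not touched


def strip_prefix(name: str, prefix: str = "chr", contig: bool = False) -> str:
    """Remove a leading prefix; contigs are skipped unless requested."""
    if not name.startswith(prefix):
        return name
    bare = name[len(prefix):]
    if not bare:
        return name  # the name is the prefix itself; nothing sensible to strip
    if is_contig(bare) and not contig:
        return name
    return bare
```

Under these assumptions, add_prefix("1") gives "chr1", while add_prefix("scaffold_12") leaves the name unchanged unless contig=True is passed.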
DESCRIPTION
This program renames chromosomes in a data file. Supported data formats include Bam and Sam alignment files, GFF and BED feature files, Fasta sequence files, wig and bedgraph files, and any other tab-delimited text files.
Renaming consists of either adding or stripping a prefix from the chromosome name. Some genome repositories prefix their chromosome names with text, most commonly 'chr', while other repositories prefer bare numbers or Roman numerals. UCSC (which uses the 'chr' prefix) and Ensembl (which uses bare names) are two good examples. Mixing and matching annotation from different authorities requires matching chromosome names.
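The numeral conversion offered by --roman and --arabic, limited to 1 through 30, might look something like the following Python sketch. It is an illustration only, not the script's implementation. The names convert_numeral, ARABIC_TO_ROMAN, and ROMAN_TO_ARABIC are hypothetical, and the handling of a leading prefix (kept in place while the numeral after it is converted) is an assumption.

```python
def _to_roman(n: int) -> str:
    """Build the Roman numeral for 1..30 from its tens and units digits."""
    units = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]
    return "X" * (n // 10) + units[n % 10]


# Lookup tables covering only 1..30, matching the documented limit.
ARABIC_TO_ROMAN = {str(n): _to_roman(n) for n in range(1, 31)}
ROMAN_TO_ARABIC = {roman: arabic for arabic, roman in ARABIC_TO_ROMAN.items()}


def convert_numeral(name: str, to_roman: bool, prefix: str = "chr") -> str:
    """Convert the numeral part of a name; unrecognized names pass through."""
    has_prefix = name.startswith(prefix)
    bare = name[len(prefix):] if has_prefix else name
    table = ARABIC_TO_ROMAN if to_roman else ROMAN_TO_ARABIC
    if bare not in table:
        return name  # above 30, lower case, or not a numeral: ignored
    return (prefix if has_prefix else "") + table[bare]
```

In this sketch a chromosome named X is indistinguishable from the Roman numeral for ten, so convert_numeral("X", to_roman=False) yields "10". Whether the script itself behaves this way is not stated here, which is one more reason to check converted output.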
Be careful with the conversions, and check the results. Mitochondrial chromosomes or other unusually named chromosomes may need to be changed manually.
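One way to check a converted tab-delimited file is to tally the distinct chromosome names it contains and look for leftovers. The following Python sketch does that. It is not part of this program; the function name count_chromosome_names is hypothetical, and it assumes the chromosome name sits in the first column (as in BED, GFF, and bedgraph files) and that comment lines start with '#'. Track or browser definition lines are not handled.

```python
from collections import Counter
from typing import Iterable


def count_chromosome_names(lines: Iterable[str], column: int = 0) -> Counter:
    """Tally the values found in one column of tab-delimited lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if column < len(fields):
            counts[fields[column]] += 1
    return counts


# Example with in-memory lines; a real file handle can be passed the same way.
example = ["# header\n", "chr1\t10\t20\n", "chr1\t30\t40\n", "MT\t1\t5\n"]
for name, n in sorted(count_chromosome_names(example).items()):
    print(name, n)
```

Any name in the tally that kept or lacked the prefix unexpectedly, such as MT in the example, is a candidate for manual renaming.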
AUTHOR
Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112
This package is free software; you can redistribute it and/or modify it under the terms of the GPL (either version 1, or at your option, any later version) or the Artistic License 2.0.