NAME

Proch::N50 - a small module to calculate the N50, total size and total number of sequences of a FASTA or FASTQ file. It is easy to install and has minimal dependencies.

VERSION

version 1.5.0

SYNOPSIS

use Data::Dumper;
use Proch::N50 qw(getStats getN50 jsonStats);
my $filepath = '/path/to/assembly.fasta';

# Get N50 only: getN50(file) will return an integer
print "N50 only:\t", getN50($filepath), "\n";

# Full stats
my $seq_stats = getStats($filepath);
print Data::Dumper->Dump( [ $seq_stats ], [ qw(*FASTA_stats) ] );
# Will print:
# %FASTA_stats = (
#               'N50' => 65,
#               'N75' => 50,
#               'N90' => 4,
#               'min' => 4,
#               'max' => 65,
#               'dirname' => 'data',
#               'auN' => 45.02112,
#               'size' => 130,
#               'seqs' => 6,
#               'filename' => 'test.fa',
#               'status' => 1
#             );

# Also get the stats as a JSON string (stored under the 'json' key)
my $seq_stats_with_JSON = getStats($filepath, 'JSON');
print $seq_stats_with_JSON->{json}, "\n";
# Will print:
# {
#    "status" : 1,
#    "seqs" : 6,
#    <...>
#    "filename" : "small_test.fa",
#    "N50" : 65,
# }
# Ask directly for the JSON string only:
my $json = jsonStats($filepath);
print $json;

METHODS

getN50(filepath)

Returns the N50 of the given FASTA/FASTQ file, or 0 in case of error.

getStats(filepath, alsoJSON)

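Calculates N50 and basic stats for <filepath>. Returns a reference to a hash with the keys shown in the SYNOPSIS: N50, N75, N90, auN, min, max, size, seqs, filename, dirname and status. If invoked with a second parameter, the hash also contains a json key holding the same stats as a JSON string.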
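
To make the N50, N75, N90 and auN keys more concrete, here is a minimal sketch of how such values are commonly defined. It is not the module's own code: the helper names nx and aun are hypothetical, and the definitions used (Nx as the length at which the running sum of lengths, longest first, reaches x% of the total; auN as the sum of squared lengths divided by the total length) are the usual ones and are assumed here.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of Proch::N50): Nx for a percentage
# threshold (50, 75, 90, ...) computed from a plain list of lengths.
sub nx {
    my ($threshold, @lengths) = @_;
    my $total = sum(@lengths) or return 0;    # empty input: report 0
    my $running = 0;
    for my $len (sort { $b <=> $a } @lengths) {    # longest first
        $running += $len;
        return $len if $running >= $total * $threshold / 100;
    }
    return 0;
}

# Hypothetical helper: auN as sum of squared lengths over total length.
sub aun {
    my @lengths = @_;
    my $total = sum(@lengths) or return 0;
    return sum(map { $_ * $_ } @lengths) / $total;
}

# Toy data: 7 sequences, 240 bp in total
my @lengths = (100, 50, 50, 10, 10, 10, 10);
printf "N50=%d N75=%d N90=%d auN=%.2f\n",
    nx(50, @lengths), nx(75, @lengths), nx(90, @lengths), aun(@lengths);
# prints: N50=50 N75=50 N90=10 auN=64.17
```

With these definitions N90 <= N75 <= N50, which matches the ordering of the values in the SYNOPSIS output.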

jsonStats(filepath)

Returns a JSON string with the basic stats (the same string as $result->{json} from getStats(filepath, 'JSON')). Requires JSON::PP to be installed.
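
If the JSON string needs to be consumed from another Perl script, it can be turned back into a hash reference with JSON::PP. The sketch below uses a hard-coded string as a stand-in for the return value of jsonStats, with illustrative values taken from the SYNOPSIS; the exact set and order of keys in the real output is not assumed.

```perl
use strict;
use warnings;
use JSON::PP;

# Stand-in for the string returned by jsonStats($filepath).
# The values are illustrative, copied from the SYNOPSIS output.
my $json = '{"status":1,"seqs":6,"size":130,"N50":65,"min":4,"max":65}';

# decode_json converts the JSON text into a Perl hash reference
my $stats = decode_json($json);

printf "%d sequences, %d bp, N50 %d\n",
    $stats->{seqs}, $stats->{size}, $stats->{N50};
# prints: 6 sequences, 130 bp, N50 65
```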

_n50fromHash(hash, totalsize)

This is an internal helper subroutine. It is documented here because it performs the actual N50 calculation. It expects a reference to a hash of sizes ($size{SIZE} = COUNT) and the total sum of sizes obtained by parsing the sequence file. Returns the N50 and the minimum and maximum lengths.
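
The idea of computing the N50 from a hash of SIZE => COUNT pairs can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the name n50_from_hash is hypothetical, and the N50 is taken to be the first length (scanning from the longest) at which the running sum reaches half of the total size.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): N50, min and max from a
# hash mapping sequence length => number of sequences of that length.
sub n50_from_hash {
    my ($sizes, $total) = @_;
    my @lengths = sort { $b <=> $a } keys %{$sizes};    # longest first
    return (0, 0, 0) unless @lengths && $total > 0;

    my ($max, $min) = ($lengths[0], $lengths[-1]);
    my $cumulative = 0;
    for my $len (@lengths) {
        # Each distinct length contributes length * count bases
        $cumulative += $len * $sizes->{$len};
        return ($len, $min, $max) if $cumulative >= $total / 2;
    }
    return (0, $min, $max);    # only if $total exceeds the real sum
}

# Toy data: one 100 bp, two 50 bp and four 10 bp sequences (240 bp)
my %sizes = (100 => 1, 50 => 2, 10 => 4);
my ($n50, $min, $max) = n50_from_hash(\%sizes, 240);
print "N50=$n50 min=$min max=$max\n";
# prints: N50=50 min=10 max=100
```

Storing counts per length rather than one entry per sequence keeps memory use proportional to the number of distinct lengths, which helps with files containing many reads of identical size.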

Dependencies

Module (N50.pm)

Standalone program (n50.pl)

SUPPORT

This module and the n50 program have limited support. SeqFu (https://telatin.github.io/seqfu2) is a compiled suite of utilities that includes a seqfu stats module, a faster replacement for the n50 program.

If you are interested in contributing to the development of this module, or in reporting bugs, please use the issue tracker of the repository: https://github.com/telatin/proch-n50/issues.

AUTHOR

Andrea Telatin andrea@telatin.com

COPYRIGHT AND LICENSE

This software is Copyright (c) 2018-2022 by Andrea Telatin.

This is free software, licensed under:

The MIT (X11) License