NAME
Unicode::Util - Unicode-aware versions of built-in Perl functions
VERSION
This document describes Unicode::Util version 0.02.
SYNOPSIS
use v5.10; # enables say
use Unicode::Util qw( graph_length code_length byte_length );
# grapheme cluster ю́: Cyrillic small letter yu + combining acute accent
my $grapheme = "\x{44E}\x{301}";
say graph_length($grapheme); # 1
say code_length($grapheme); # 2
say byte_length($grapheme); # 4
DESCRIPTION
This module provides additional versions of Perl’s built-in functions, tailored to work on three different units:
graph: Unicode extended grapheme clusters (graphemes)
code: Unicode codepoints
byte: 8-bit bytes (octets)
This is an early release, and the module is likely to undergo major revisions. Only the length and chop functions documented below are currently implemented. See the "TODO" section for planned future additions.
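The three units above can be illustrated with core Perl alone. The following is a minimal sketch, not the module's implementation: the helper names `count_graphemes`, `count_codepoints`, and `count_bytes` are hypothetical, and the sketch assumes the input is a decoded character string.

```perl
use strict;
use warnings;
use Encode qw( encode_utf8 );

# Hypothetical helpers, not part of Unicode::Util. They only illustrate
# the three units using core Perl.

# graph: count matches of \X (one extended grapheme cluster each)
sub count_graphemes  { my ($str) = @_; my $n = () = $str =~ /\X/g; return $n }

# code: length() counts codepoints on a decoded character string
sub count_codepoints { my ($str) = @_; return length $str }

# byte: encode to UTF-8 first, then count the resulting octets
sub count_bytes      { my ($str) = @_; return length encode_utf8($str) }

# "e" + combining acute accent, followed by a plain "x"
my $str = "e\x{301}x";

printf "graphemes:  %d\n", count_graphemes($str);   # 2
printf "codepoints: %d\n", count_codepoints($str);  # 3
printf "bytes:      %d\n", count_bytes($str);       # 4
```

The combining accent adds a codepoint (and two UTF-8 bytes) without adding a grapheme, which is why the three counts diverge.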
FUNCTIONS
- graph_length($string)
Returns the length in graphemes of the given string. This is likely the number of “characters” that many people would count on a printed string, plus non-printing characters.
- code_length($string)
Returns the length in codepoints of the given string. This is likely the number of “characters” that many programmers and programming languages would count in a string.
- byte_length($string)
Returns the length in bytes of the given string encoded as UTF-8. This is the number of bytes that many computers would count when storing a string.
- graph_chop($string)
Chops off the last grapheme of the given string and returns the chopped grapheme.
- code_chop($string)
Chops off the last codepoint of the given string and returns the chopped codepoint.
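The difference between the two chop units can be sketched in core Perl. This is an illustration under stated assumptions, not the module's code: `sketch_graph_chop` and `sketch_code_chop` are hypothetical names, and they take a reference to the string so the caller's variable is modified, mirroring the built-in chop. The module's own calling convention may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: removes the last extended grapheme cluster from
# the referenced string and returns it.
sub sketch_graph_chop {
    my ($ref) = @_;
    my @graphemes = ${$ref} =~ /\X/g;   # split into clusters from the start
    return '' unless @graphemes;        # nothing to chop on an empty string
    my $last = pop @graphemes;
    ${$ref} = join '', @graphemes;
    return $last;
}

# Hypothetical helper: removes the last codepoint from the referenced
# string and returns it.
sub sketch_code_chop {
    my ($ref) = @_;
    return '' unless length ${$ref};
    return substr ${$ref}, -1, 1, '';   # 4-arg substr deletes and returns
}

# "a" followed by Cyrillic yu + combining acute accent
my $by_graph = "a\x{44E}\x{301}";
my $by_code  = $by_graph;

my $g = sketch_graph_chop(\$by_graph);  # removes yu and its accent together
my $c = sketch_code_chop(\$by_code);    # removes only the accent

printf "%d %d\n", length($g), length($by_graph);  # 2 1
printf "%d %d\n", length($c), length($by_code);   # 1 2
```

Chopping by codepoint leaves the base letter behind without its accent, while chopping by grapheme removes the whole user-perceived character.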
TODO
Evaluate the following core Perl functions and operators for potential addition to this module.
reverse, split, substr, index, rindex, eq, ne, lt, gt, le, ge, cmp
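To show why such variants might be worth evaluating, here is one possible sketch of a grapheme-aware reverse in core Perl. The name `sketch_graph_reverse` is hypothetical and nothing here reflects a planned interface; it only contrasts the built-in codepoint-level reverse with a cluster-level one.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Unicode::Util: reverses a string by
# extended grapheme cluster instead of by codepoint.
sub sketch_graph_reverse {
    my ($str) = @_;
    return join '', reverse $str =~ /\X/g;
}

# "ab" followed by Cyrillic yu + combining acute accent
my $str = "ab\x{44E}\x{301}";

my $by_code  = reverse $str;               # built-in, scalar context
my $by_graph = sketch_graph_reverse($str);

# The built-in moves the combining accent to the front, detached from
# its base letter; the sketch keeps the cluster intact.
printf "U+%04X\n", ord $by_code;   # U+0301
printf "U+%04X\n", ord $by_graph;  # U+044E
```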
SEE ALSO
The length-functions are based on methods provided by Perl6::Str.
AUTHOR
Nick Patch <patch@cpan.org>
COPYRIGHT AND LICENSE
© 2011–2012 Nick Patch
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.