NAME
Unicode::Util - Unicode grapheme-level versions of built-in Perl functions
VERSION
This document describes Unicode::Util version 0.07.
SYNOPSIS
use v5.10; # enables say
use Unicode::Util qw( grapheme_length grapheme_reverse );
# grapheme cluster ю́ (Cyrillic small letter yu, combining acute accent)
my $grapheme = "\x{044E}\x{0301}";
say length($grapheme); # 2 (length in code points)
say grapheme_length($grapheme); # 1 (length in grapheme clusters)
# Spın̈al Tap; n̈ = Latin small letter n, combining diaeresis
my $band = "Sp\x{0131}n\x{0308}al Tap";
say scalar reverse $band; # paT länıpS
say grapheme_reverse($band); # paT lan̈ıpS
DESCRIPTION
This module provides Unicode grapheme cluster–level versions of Perl’s built-in string functions, tailored to work on grapheme clusters as opposed to code points or bytes.
This is an early release and major revisions are planned for the near future.
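The difference between bytes, code points, and grapheme clusters can be illustrated with core Perl alone. The following is a minimal sketch, not taken from this module: it assumes UTF-8 as the byte encoding and relies on the regex escape \X, which matches one extended grapheme cluster. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Cyrillic small letter yu followed by a combining acute accent
my $string = "\x{044E}\x{0301}";

# Length in bytes once encoded as UTF-8 (an assumed encoding)
my $bytes = length encode('UTF-8', $string);

# Length in code points, as reported by the built-in length
my $code_points = length $string;

# Length in grapheme clusters: count the matches of \X
my $graphemes = () = $string =~ /\X/g;

print "$bytes $code_points $graphemes\n"; # 4 2 1
```

The same string therefore has three different lengths depending on the unit being counted; the functions in this module work on the last of these.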
FUNCTIONS
Each function may be exported by name, or all of them at once with the :all tag.
- grapheme_length($string)
  Returns the length of the given string in grapheme clusters. This is the closest to the number of “characters” that many people would count on a printed string.
- grapheme_chop($string)
  Returns the given string with the last grapheme cluster chopped off. Does not modify the original value, unlike the built-in chop.
- grapheme_reverse($string)
  Returns the given string value with all grapheme clusters in the opposite order.
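The behavior described above can be approximated in core Perl by splitting a string into grapheme clusters with the regex escape \X. The sketch below is an assumption about one possible approach, not this module's actual source; the sketch_* names are hypothetical stand-ins and are not part of Unicode::Util.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for illustration; not part of Unicode::Util.

# Count grapheme clusters by counting matches of \X.
sub sketch_length {
    my ($string) = @_;
    my $count = () = $string =~ /\X/g;
    return $count;
}

# Return a copy without its last grapheme cluster; the argument is untouched.
sub sketch_chop {
    my ($string) = @_;
    my @clusters = $string =~ /\X/g;
    pop @clusters;
    return join '', @clusters;
}

# Return a copy with the grapheme clusters in the opposite order.
sub sketch_reverse {
    my ($string) = @_;
    return join '', reverse $string =~ /\X/g;
}

# Latin small letter n followed by a combining diaeresis stays together.
my $band = "Sp\x{0131}n\x{0308}al Tap";

print sketch_length($band), "\n"; # 10
print sketch_reverse($band) eq "paT lan\x{0308}\x{0131}pS" ? "ok\n" : "not ok\n"; # ok
print sketch_chop($band) eq "Sp\x{0131}n\x{0308}al Ta" ? "ok\n" : "not ok\n"; # ok
```

Because each cluster is kept whole, the combining diaeresis stays attached to the n after reversal instead of moving onto the neighboring letter.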
TODO
grapheme_substr, grapheme_index, grapheme_rindex, canonical_eq, compatibility_eq
SEE ALSO
Unicode::GCString, String::Multibyte, Perl6::Str, http://perlcabal.org/syn/S32/Str.html
AUTHOR
Nick Patch <patch@cpan.org>
COPYRIGHT AND LICENSE
© 2011–2013 Nick Patch
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.