NAME
WAIT::Filter - Perl extension providing the basic freeWAIS-sf reduction functions
SYNOPSIS
use WAIT::Filter qw(Stem Soundex Phonix isolc isouc disolc disouc);
$stem = Stem($word);
$scode = Soundex($word);
$pcode = Phonix($word);
$lword = isolc($word);
$uword = isouc($word);
disolc($word);
disouc($word);
DESCRIPTION
This tiny module gives access to the basic reduction functions built into freeWAIS-sf.
- Stem(word)
reduces word using the well-known Porter algorithm.
AU: Porter, M.F. TI: An Algorithm for Suffix Stripping JT: Program VO: 14 PP: 130-137 PY: 1980 PM: JUL
- Soundex(word)
computes the 4-byte Soundex code for word.
AU: Gadd, T.N. TI: 'Fisching for Werds'. Phonetic Retrieval of written text in Information Retrieval Systems JT: Program VO: 22 NO: 3 PP: 222-237 PY: 1988
- Phonix(word)
computes the 8-byte Phonix code for word.
AU: Gadd, T.N. TI: PHONIX: The Algorithm JT: Program VO: 24 NO: 4 PP: 363-366 PY: 1990 PM: OCT
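To make the idea of a fixed-length phonetic code concrete, here is a minimal pure-Perl sketch of the classic four-character Soundex scheme. It is an illustration only: the name soundex_sketch and the %CODE table are hypothetical, and the C implementation inside freeWAIS-sf may differ in details such as its handling of H, W, or non-ASCII letters.

```perl
use strict;
use warnings;

# Hypothetical code table for classic Soundex (not taken from freeWAIS-sf).
# Letters missing from the table (vowels, Y, H, W) have no digit.
my %CODE;
@CODE{qw(B F P V)}         = (1) x 4;
@CODE{qw(C G J K Q S X Z)} = (2) x 8;
@CODE{qw(D T)}             = (3) x 2;
$CODE{L}                   = 4;
@CODE{qw(M N)}             = (5) x 2;
$CODE{R}                   = 6;

sub soundex_sketch {
    my ($word) = @_;

    # Upper-case a copy and drop everything that is not an ASCII letter.
    (my $letters = uc $word) =~ tr/A-Z//cd;
    return '' unless length $letters;

    my @chars = split //, $letters;
    my $first = shift @chars;
    my $out   = $first;              # the code starts with the first letter
    my $prev  = $CODE{$first} // 0;  # digit of the previous coded letter

    for my $ch (@chars) {
        next if $ch eq 'H' || $ch eq 'W';    # H and W do not separate equal digits
        my $code = $CODE{$ch} // 0;          # vowels and Y count as 0
        $out .= $code if $code && $code != $prev;
        $prev = $code;
        last if length($out) == 4;
    }

    # Pad short codes with zeros to exactly four characters.
    return substr($out . '000', 0, 4);
}

print soundex_sketch($_), "\n" for qw(Robert Rupert Tymczak);
```

Under this scheme similar-sounding names such as Robert and Rupert receive the same code, which is what makes the reduction useful for phonetic retrieval.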
ISO character case functions
There are some additional functions which convert some/most ISO Latin-1 characters to upper and lower case. For maximum speed there are also destructive versions, which change the argument in place instead of allocating and returning a copy. For convenience, the destructive versions also return the argument. So both of the following are valid, and in either case $word will contain the lowercased string.
$word = disolc($word);
disolc($word);
Here are the hardcoded characters which are recognized:
abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïñòóôõöøùúûüýß
ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝß
- $new = isolc($word)
- disolc($word)
transposes to lower case.
- $new = isouc($word)
- disouc($word)
transposes to upper case.
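The difference between the copying and the destructive variants can be sketched in pure Perl with tr///. The functions my_isolc and my_disolc below are hypothetical stand-ins written for illustration; the real isolc and disolc are implemented in C, and this sketch assumes character strings (not raw Latin-1 bytes) as input.

```perl
use strict;
use warnings;
use utf8;    # this source file contains literal non-ASCII characters

# Copying variant: the caller's variable is left untouched.
sub my_isolc {
    my ($word) = @_;    # $word is a private copy of the argument
    $word =~ tr/A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝ/a-zàáâãäåæçèéêëìíîïñòóôõöøùúûüý/;
    return $word;
}

# Destructive variant: $_[0] is an alias for the caller's variable,
# so tr/// modifies it in place and no copy is allocated for the work.
sub my_disolc {
    $_[0] =~ tr/A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝ/a-zàáâãäåæçèéêëìíîïñòóôõöøùúûüý/;
    return $_[0];       # returned for convenience, as described above
}

my $word = "ÄRGER";
my $copy = my_isolc($word);   # $word still holds the upper-case string
my_disolc($word);             # now $word itself is lower case

binmode(STDOUT, ':encoding(UTF-8)');
print "$copy $word\n";
```

The ß character appears in both hardcoded lists and maps to itself, so it is simply left out of the translation tables here.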
AUTHOR
Ulrich Pfeifer <pfeifer@ls6.informatik.uni-dortmund.de>
SEE ALSO
perl(1).