NAME

WAIT::Filter - Perl extension providing the basic freeWAIS-sf reduction functions

SYNOPSIS

use WAIT::Filter qw(Stem Soundex Phonix isolc isouc disolc disouc);

$stem  = Stem($word);
$scode = Soundex($word);
$pcode = Phonix($word);
$lword = isolc($word);
$uword = isouc($word);
disolc($word);
disouc($word);

DESCRIPTION

This tiny module gives access to the basic reduction functions built into freeWAIS-sf.

Stem(word)

reduces word to its stem using the well-known Porter algorithm.

AU: Porter, M.F.
TI: An Algorithm for Suffix Stripping
JT: Program
VO: 14
PP: 130-137
PY: 1980
PM: JUL
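
The Porter algorithm strips suffixes in several ordered steps. As an illustration only, the following Perl sketch implements just Step 1a (plural endings) as described in Porter's paper. The function name porter_step1a is hypothetical and not part of WAIT::Filter, whose Stem() performs all steps of the algorithm.

```perl
use strict;
use warnings;

# Hypothetical helper: applies ONLY Step 1a of Porter's suffix stripping
# algorithm. WAIT::Filter's Stem() runs the complete algorithm.
# Input is assumed to be a single lower-case word.
sub porter_step1a {
    my ($word) = @_;
    # Rules are tried in order; the first matching suffix wins.
    return $word if $word =~ s/sses$/ss/;   # caresses -> caress
    return $word if $word =~ s/ies$/i/;     # ponies   -> poni
    return $word if $word =~ /ss$/;         # caress   -> caress (unchanged)
    $word =~ s/s$//;                        # cats     -> cat
    return $word;
}

for my $w (qw(caresses ponies ties caress cats)) {
    printf "%-10s -> %s\n", $w, porter_step1a($w);
}
```

The later steps handle endings such as -ed, -ing, -ational and -ness and only remove a suffix when the remaining stem is long enough; they are omitted here.
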

Soundex(word)

computes the 4-byte Soundex code for word.

AU: Gadd, T.N.
TI: 'Fisching for Werds'. Phonetic Retrieval of written text in
    Information Retrieval Systems
JT: Program
VO: 22
NO: 3
PP: 222-237
PY: 1988
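
A Soundex code consists of the first letter of the word followed by three digits that encode the following consonants. The Perl sketch below implements the classic (Russell/Knuth) American Soundex to show how such a 4-byte code is formed. It is an assumption that freeWAIS-sf's Soundex() follows these exact rules (it may differ in details such as the treatment of H and W), and the function name soundex_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: classic American Soundex, NOT WAIT::Filter's code.
sub soundex_sketch {
    my ($word) = @_;
    $word = uc $word;
    $word =~ tr/A-Z//cd;                 # keep ASCII letters only
    return '' if $word eq '';

    # Consonant groups and their digits; vowels, Y, H and W have no digit.
    my %code;
    my %groups = (1 => 'BFPV', 2 => 'CGJKQSXZ', 3 => 'DT',
                  4 => 'L',    5 => 'MN',       6 => 'R');
    while (my ($digit, $letters) = each %groups) {
        $code{$_} = $digit for split //, $letters;
    }

    my $first  = substr $word, 0, 1;
    my $result = $first;                 # the first letter is kept
    my $prev   = $code{$first} // 0;
    for my $ch (split //, substr($word, 1)) {
        next if $ch eq 'H' || $ch eq 'W';     # H, W do not separate equal codes
        my $c = $code{$ch} // 0;              # vowels and Y count as 0
        $result .= $c if $c && $c != $prev;   # drop 0s and adjacent repeats
        $prev = $c;
    }
    return substr($result . '000', 0, 4);    # pad or truncate to 4 bytes
}

print soundex_sketch($_), "\n" for qw(Robert Rupert Ashcraft Tymczak Pfister);
```

Because similar-sounding consonants share a digit, variant spellings such as "Robert" and "Rupert" receive the same code and can be matched against each other.
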

Phonix(word)

computes the 8-byte Phonix code for word.

AU: Gadd, T.N.
TI: PHONIX: The Algorithm
JT: Program
VO: 24
NO: 4
PP: 363-366
PY: 1990
PM: OCT

ISO character case functions

There are some additional functions which convert some/most ISO Latin-1 characters to upper and lower case. For maximum speed there are also destructive versions, which modify their argument in place instead of returning a newly allocated copy. For convenience, the destructive versions also return the argument, so both of the following are valid, and afterwards $word contains the lowercased string:

$word = disolc($word);
disolc($word);

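Here are the hardcoded characters which are recognized (each lower-case character in the first line corresponds to the upper-case character in the same position of the second line):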

abcdefghijklmnopqrstuvwxyzàáâãäåæçèéêëìíîïñòóôõöøùúûüýß
ABCDEFGHIJKLMNOPQRSTUVWXYZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝß

$new = isolc($word)
disolc($word)

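converts to lower case: isolc returns a lowercased copy of $word, while disolc lowercases $word in place (and also returns it).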

$new = isouc($word)
disouc($word)

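converts to upper case: isouc returns an uppercased copy of $word, while disouc uppercases $word in place (and also returns it).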
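
The difference between the copying and the destructive variants can be sketched in plain Perl. The following is a hypothetical re-implementation of the lower-case mapping listed above (the names iso_lc_copy and iso_lc_inplace are made up for illustration; WAIT::Filter implements these functions in C). The destructive variant relies on the fact that the elements of @_ are aliases for the caller's variables.

```perl
use strict;
use warnings;

# Hypothetical sketch of the hardcoded mapping, by Latin-1 code point:
# A-Z, À-Ï (C0-CF), Ñ-Ö (D1-D6) and Ø-Ý (D8-DD) map to their lower-case
# forms. ß (DF) and all other characters are left unchanged.

# Copying version: modifies a private copy and returns it.
sub iso_lc_copy {
    my ($s) = @_;
    $s =~ tr/A-Z\xC0-\xCF\xD1-\xD6\xD8-\xDD/a-z\xE0-\xEF\xF1-\xF6\xF8-\xFD/;
    return $s;
}

# Destructive version: $_[0] aliases the caller's variable, so tr///
# changes it in place; the argument is also returned for convenience.
sub iso_lc_inplace {
    $_[0] =~ tr/A-Z\xC0-\xCF\xD1-\xD6\xD8-\xDD/a-z\xE0-\xEF\xF1-\xF6\xF8-\xFD/;
    return $_[0];
}

my $word = "STRA\x{DF}E \x{C4}RGER";   # "STRAßE ÄRGER" as Latin-1 code points
my $copy = iso_lc_copy($word);          # $word itself is not modified
iso_lc_inplace($word);                  # $word is now lower case
print $copy eq $word ? "same\n" : "different\n";
```

The upper-case variants work the same way with the two character lists swapped. Note that the destructive variant cannot be applied to a constant such as a string literal, since Perl refuses to modify a read-only value.
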

AUTHOR

Ulrich Pfeifer <pfeifer@ls6.informatik.uni-dortmund.de>

SEE ALSO

perl(1).
