NAME
Text::Query::Simple - Match text against simple query expression and return relevance value for ranking
SYNOPSIS
use Text::Query::Simple;
# Constructor
$query = Text::Query::Simple->new([QSTRING] [OPTIONS]);
# Methods
$query->prepare(QSTRING [OPTIONS]);
$query->match([TARGET]);
$query->matchscalar([TARGET]);
DESCRIPTION
This module provides an object that tests a string or list of strings against a query expression similar to an AltaVista "simple query" and returns a "relevance value." Elements of the query expression may be regular expressions or literal text, and may be assigned weights.
Query expressions are compiled into an internal form when a new object is created or the prepare method is called; they are not recompiled on each match.
Query expressions consist of words (sequences of non-whitespace), regexps or phrases (quoted strings) separated by whitespace. Words or phrases prefixed with a + must be present for the expression to match; words or phrases prefixed with a - must be absent for the expression to match.
A successful match returns a count of the number of times any of the words (except ones prefixed with -) appeared in the text. This type of result is useful for ranking documents according to relevance.
Words or phrases may optionally be followed by a number in parentheses (no whitespace is allowed between the word or phrase and the parenthesized number). This number specifies the weight given to the word or phrase; it will be added to the count each time the word or phrase appears in the text. If a weight is not given, a weight of 1 is assumed.
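The scoring rules above can be modelled in a few lines of plain Perl. This is only a sketch of the documented behaviour, not the module's implementation: the helper name score_text and the term hashes (text, weight, sign) are hypothetical, and matching is assumed to be literal and case-insensitive.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Query::Simple: models the documented
# scoring rules using literal, case-insensitive matching (an assumption).
# Each term is a hash with: text, weight (default 1), sign ('+', '-' or '').
sub score_text {
    my ($text, @terms) = @_;
    my $score = 0;
    for my $term (@terms) {
        my $weight = defined $term->{weight} ? $term->{weight} : 1;
        my $sign   = defined $term->{sign}   ? $term->{sign}   : '';
        # Count non-overlapping occurrences of the literal term.
        my $hits = () = $text =~ /\Q$term->{text}\E/gi;
        return 0 if $sign eq '+' && $hits == 0;   # required term is missing
        return 0 if $sign eq '-' && $hits > 0;    # excluded term is present
        # Excluded terms never contribute to the score.
        $score += $hits * $weight unless $sign eq '-';
    }
    return $score;
}

# Models the query 'information(2) retrieval'.
my @query = (
    { text => 'information', weight => 2 },
    { text => 'retrieval' },
);
my $score = score_text('Information retrieval finds information.', @query);
print "$score\n";
```

Here "information" appears twice with weight 2 and "retrieval" once with the default weight of 1, so the two terms contribute 4 and 1 to the total.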
EXAMPLES
use Text::Query::Simple;
my $q=Text::Query::Simple->new('+hello world');
die "bad query expression" if not defined $q;
$count=$q->match;
...
$q->prepare('goodbye adios -"ta ta"',-litspace=>1);
#requires single space between the two ta's
if ($q->match($line,-case=>1)) {
#doesn't match "Goodbye"
...
}
$q->prepare('\\bintegrate\\b',-regexp=>1);
#won't match "disintegrated"
...
$q->prepare('information(2) retrieval');
#information has twice the weight of retrieval
CONSTRUCTOR
- new ([QSTRING] [OPTIONS])
This is the constructor for a new Text::Query::Simple object. If a QSTRING is given, it will be compiled to internal form.
OPTIONS are passed in a hash-like fashion, using key and value pairs. Possible options are:
-case - If true, do case-sensitive match.
-litspace - If true, match spaces (except between operators) in QSTRING literally. If false, match spaces as \s+.
-regexp - If true, treat patterns in QSTRING as regular expressions rather than literal text.
-whole - If true, match whole words only, not substrings of words.
The constructor will return undef if a QSTRING was supplied and had illegal syntax.
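As a rough illustration of how the four options might shape the pattern built for a single term, here is a minimal sketch. The helper term_to_regex is hypothetical and is not the module's actual code; it only mirrors the option descriptions above, using the same option names as keys.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: turns one query term into a
# compiled regex according to flags named after the documented options.
sub term_to_regex {
    my ($term, %opt) = @_;
    # Treat the term as a regex only when -regexp is true; otherwise escape it.
    my $pattern = $opt{-regexp} ? $term : quotemeta($term);
    # Without -litspace, a space in the term matches any run of whitespace.
    $pattern =~ s/(?:\\?\s)+/\\s+/g unless $opt{-litspace};
    # With -whole, the term must begin and end on word boundaries.
    $pattern = "\\b(?:$pattern)\\b" if $opt{-whole};
    # Matching is case-insensitive unless -case is true.
    return $opt{-case} ? qr/$pattern/ : qr/$pattern/i;
}

my $loose  = term_to_regex('ta ta');
my $strict = term_to_regex('ta ta', -litspace => 1);
my $whole  = term_to_regex('integrate', -whole => 1);

print "loose matches wide gap\n"      if "ta   ta" =~ $loose;
print "strict rejects wide gap\n"     if "ta   ta" !~ $strict;
print "whole rejects disintegrated\n" if "disintegrated" !~ $whole;
```

The sketch handles one term at a time; how the module combines terms and the + and - prefixes is not modelled here.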
METHODS
- prepare (QSTRING [OPTIONS])
Compiles the query expression in QSTRING to internal form and sets any options (same as in the constructor). prepare may be used to change the query expression and options for an existing query object. If OPTIONS are omitted, any options set by a previous call to the constructor or prepare remain in effect.
This method returns a reference to the query object if the syntax of the expression was legal, or undef if not.
- match ([TARGET])
If TARGET is a scalar, match returns the number of words in the string specified by TARGET that match the query object's query expression. If TARGET is not given, the match is made against $_.
If TARGET is an array, match returns a list of references to anonymous arrays consisting of each element followed by its match count. The list is sorted in descending order by match count. If the elements of TARGET were anonymous arrays, the match count is appended to each element. This allows arbitrary information (such as a filename) to be associated with each element.
If TARGET is a reference to an array, match returns a reference to a sorted list of matching items, with counts, for all elements.
- matchscalar ([TARGET])
Behaves just like match when TARGET is a scalar or is not given. Slightly faster than match under these circumstances.
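The shape of the list returned by match for an array TARGET can be pictured with a small stand-alone sketch. Everything here is hypothetical: rank_targets and the $count_hits scorer stand in for a compiled query. The sketch also assumes that the first item of an anonymous-array element is the text to match and that zero-count elements are kept; the description above does not settle either point.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a compiled query: counts occurrences of "perl".
my $count_hits = sub {
    my ($text) = @_;
    my $n = () = $text =~ /perl/gi;
    return $n;
};

# Hypothetical helper, not the module's code: pairs each target with its
# count and sorts by count, highest first. For anonymous-array targets the
# first item is assumed to be the text and the rest is attached data.
sub rank_targets {
    my ($scorer, @targets) = @_;
    my @pairs = map {
        my @item = ref($_) eq 'ARRAY' ? @$_ : ($_);
        [ @item, $scorer->($item[0]) ];   # append the count as the last item
    } @targets;
    return sort { $b->[-1] <=> $a->[-1] } @pairs;
}

my @ranked = rank_targets(
    $count_hits,
    [ 'Perl is fun',         'a.txt' ],
    [ 'perl perl perl',      'b.txt' ],
    [ 'nothing to see here', 'c.txt' ],
);
printf "%s: %d\n", $_->[1], $_->[-1] for @ranked;
```

Because the count is always the last item of each returned array, attached data such as a filename stays at a fixed position regardless of how many extra items an element carries.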
RESTRICTIONS
This module requires Perl 5.005 or higher due to the use of evaluated expressions in regexes.
AUTHOR
Eric Bohlman (ebohlman@netcom.com)
CREDITS
The parse_tokens routine was adapted from the parse_line routine in Text::ParseWords.
COPYRIGHT
Copyright (c) 1998 Eric Bohlman. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.