NAME
Schema::Validator - Tools for validating and loading Schema.org vocabulary definitions
VERSION
Version 0.03
SYNOPSIS
use Schema::Validator qw(is_valid_datetime load_dynamic_vocabulary);
# Validate a date or datetime string
if (is_valid_datetime('2024-11-14')) {
print "Valid date\n";
}
# Load and query the Schema.org vocabulary
my $classes = load_dynamic_vocabulary();
if (exists $classes->{'Person'}) {
print "Person class is defined\n";
}
# Override a config value for a single call
$classes = load_dynamic_vocabulary(ua_timeout => 60);
DESCRIPTION
Schema::Validator provides two utilities for working with Schema.org structured data:
"is_valid_datetime" -- validates a string against the ISO 8601 date/datetime subset used by Schema.org.
"load_dynamic_vocabulary" -- downloads (and caches for 24 hours) the full Schema.org JSON-LD vocabulary and exposes all class and property definitions as a hashref and via package globals.
Configuration
Runtime behaviour is controlled by the package-level %Schema::Validator::config hash. Supported keys and their defaults:
cache_file => "$CACHEDIR/schemaorg_dynamic_vocabulary.jsonld" # or tmpdir
cache_duration => 86400 # seconds
vocab_url => 'https://schema.org/.../schemaorg-current-https.jsonld'
ua_timeout => 30 # seconds
Override any key before calling an exported function:
$Schema::Validator::config{ua_timeout} = 60;
Or supply a complete replacement via Object::Configure:
Object::Configure->configure('Schema::Validator', \%my_config);
PACKAGE VARIABLES
%dynamic_schema
Package hash keyed by Schema.org class label (e.g. Person, Event). Values are the raw item hashrefs from the JSON-LD @graph array. Populated as a side-effect of "load_dynamic_vocabulary".
%dynamic_properties
Package hash keyed by Schema.org property label (e.g. name, startDate). Values are the raw item hashrefs from the JSON-LD @graph array. Populated as a side-effect of "load_dynamic_vocabulary".
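As a rough illustration of how the two package hashes relate to the @graph array, the sketch below splits a decoded JSON-LD document into class and property tables. It is not the module's own code: index_graph is a hypothetical helper, and the key names 'rdfs:Class', 'rdf:Property', 'rdfs:label' and '@value' are assumptions about the layout of the vocabulary file.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);    # core module since Perl 5.14

# Hypothetical helper: split a JSON-LD document into class and property tables.
sub index_graph {
    my ($json_text) = @_;
    my $graph = decode_json($json_text)->{'@graph'} || [];
    my (%classes, %properties);

    for my $item (@{$graph}) {
        # Assumption: the label is a plain string or a { '@value' => ... } hash
        my $label = $item->{'rdfs:label'};
        $label = $label->{'@value'} if ref $label eq 'HASH';
        next unless defined $label && length $label;

        # '@type' may be a single string or an array of strings
        my $type  = $item->{'@type'};
        my @types = ref $type eq 'ARRAY' ? @{$type}
                  : defined $type        ? ($type)
                  :                        ();

        $classes{$label}    = $item if grep { $_ eq 'rdfs:Class' }   @types;
        $properties{$label} = $item if grep { $_ eq 'rdf:Property' } @types;
    }
    return (\%classes, \%properties);
}
```

Items without a label are skipped, so neither table can contain an empty key.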
FUNCTIONS
is_valid_datetime
PURPOSE
Tests whether a scalar string conforms to one of the ISO 8601 date or datetime formats accepted by Schema.org:
YYYY-MM-DD (date only)
YYYY-MM-DDTHH:MM (T separator, no seconds)
YYYY-MM-DD HH:MM (space separator, no seconds)
YYYY-MM-DDTHH:MM:SS (T separator, with seconds)
YYYY-MM-DD HH:MM:SS (space separator, with seconds)
Optional timezone designators (Z, +HH:MM, -HH:MM) are accepted. Calendar sanity is enforced: out-of-range values (e.g. month 99) are rejected.
ARGUMENTS
string (required, scalar) -- the candidate string to test. Both positional (is_valid_datetime('2024-11-14')) and named (is_valid_datetime(string => '2024-11-14')) calling conventions are accepted.
RETURNS
1 if the string is in a supported format; 0 otherwise. Returns 0 for undef or an empty string without throwing.
SIDE EFFECTS
None.
NOTES
Delegates to DateTime::Format::ISO8601->parse_datetime() for semantic validation, so out-of-range values (e.g. month 99) are rejected. The space-separator variant (YYYY-MM-DD HH:MM) is normalised to a T separator before parsing, because DateTime::Format::ISO8601 requires strict ISO 8601. Timezone designators (Z, +HH:MM, -HH:MM) are accepted.
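The normalise-then-parse approach described in these notes could look roughly like the sketch below. It is an illustration only: sketch_is_valid_datetime is a hypothetical name, and the shape-checking regex (including the choice to allow a timezone designator only after a time) is an assumption, not taken from the module's source.

```perl
use strict;
use warnings;
use DateTime::Format::ISO8601;

# Hypothetical stand-in for is_valid_datetime, positional form only.
sub sketch_is_valid_datetime {
    my ($str) = @_;
    return 0 unless defined $str && length $str;

    # Assumed shape check: date, optional time (T or space), optional zone
    return 0 unless $str =~ /\A\d{4}-\d{2}-\d{2}
        (?: [T ] \d{2}:\d{2} (?: :\d{2} )? (?: Z | [+-]\d{2}:\d{2} )? )?\z/x;

    # Normalise the space separator to the T that strict ISO 8601 expects
    (my $iso = $str) =~ s/ /T/;

    # parse_datetime() dies on out-of-range values such as month 99
    return eval { DateTime::Format::ISO8601->parse_datetime($iso); 1 } ? 1 : 0;
}
```

Wrapping the parse in eval keeps the function from throwing, which matches the documented behaviour of returning 0 for bad input.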
EXAMPLE
use Schema::Validator qw(is_valid_datetime);
is_valid_datetime('2024-11-14'); # 1
is_valid_datetime('2024-11-14T15:30:00'); # 1
is_valid_datetime('2024-11-14 15:30'); # 1 (space sep normalised)
is_valid_datetime('2024-11-14T15:30:00Z'); # 1 (UTC timezone)
is_valid_datetime('2024-11-14T15:30:00+01:00'); # 1 (offset timezone)
is_valid_datetime('2024-99-01'); # 0 (invalid month)
is_valid_datetime('28/06/2025'); # 0
is_valid_datetime(undef); # 0 (no exception)
is_valid_datetime(''); # 0 (no exception)
# Named calling convention
is_valid_datetime(string => '2024-11-14'); # 1
API SPECIFICATION
Input (Params::Validate::Strict)
{
string => {
type => 'string',
optional => 0,
},
}
Output (Return::Set)
{
type => 'boolean',
description => '1 (valid) or 0 (invalid, undef, or empty input)'
}
load_dynamic_vocabulary
PURPOSE
Downloads the complete Schema.org vocabulary from the official JSON-LD endpoint, parses it into class and property lookup tables, caches the raw JSON-LD locally, and returns the class table as a hashref.
The cache is considered fresh for cache_duration seconds (default 24 hours). On network failure the function falls back to a stale cache rather than returning an empty result, and emits a carp warning.
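The freshness test and stale-cache fallback can be pictured with the sketch below. It is a simplified model, not the module's implementation: choose_source is a hypothetical helper, the fetch argument is a code ref standing in for the HTTP download, and the returned labels are invented for illustration.

```perl
use strict;
use warnings;
use Carp qw(carp);

# Hypothetical helper: report where the vocabulary content would come from.
# fetch is a code ref that returns the downloaded text, or undef on failure.
sub choose_source {
    my (%args) = @_;
    my ($file, $duration, $fetch) = @args{qw(cache_file cache_duration fetch)};

    # Fresh means: the file exists and is younger than cache_duration seconds
    my $exists = -e $file;
    my $fresh  = $exists && (time() - (stat $file)[9]) < $duration;
    return 'fresh-cache' if $fresh && -r $file;

    # Cache is missing or too old, so try the network
    my $content = $fetch->();
    return 'download' if defined $content;

    # Network failed: prefer a stale cache over an empty result
    if ($exists && -r $file) {
        carp "download failed; falling back to stale cache $file";
        return 'stale-cache';
    }
    return 'none';    # the caller would return {} here
}
```

Only the 'none' case corresponds to the empty-hashref failure path; a stale cache still yields a usable vocabulary.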
ARGUMENTS
All arguments are optional; defaults come from %Schema::Validator::config.
cache_file (optional, scalar) -- path to the local cache file. Defaults to $config{cache_file}: $CACHEDIR/schemaorg_dynamic_vocabulary.jsonld if $ENV{CACHEDIR} is set; otherwise File::Spec->tmpdir() is used.
cache_duration (optional, scalar) -- cache validity window in seconds. Defaults to $config{cache_duration}.
vocab_url (optional, scalar) -- URL of the JSON-LD vocabulary endpoint. Defaults to $config{vocab_url}.
ua_timeout (optional, scalar) -- LWP::UserAgent timeout in seconds. Defaults to $config{ua_timeout}.
Both zero-argument and named calling conventions are supported:
load_dynamic_vocabulary();
load_dynamic_vocabulary(ua_timeout => 60);
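One common way to let per-call arguments take precedence over package defaults is a hash merge, sketched below. Whether the module resolves its options exactly this way is an assumption; resolve_options and the local %config copy are hypothetical and use only two of the documented default values.

```perl
use strict;
use warnings;

# Illustrative copy of two documented defaults (not the module's real hash)
my %config = (
    cache_duration => 86400,    # seconds
    ua_timeout     => 30,       # seconds
);

# Hypothetical helper: later keys win, so named arguments override defaults
sub resolve_options {
    my (%args) = @_;
    return (%config, %args);
}

my %opts = resolve_options(ua_timeout => 60);
printf "timeout=%d duration=%d\n", $opts{ua_timeout}, $opts{cache_duration};
```

With this pattern the override lasts for one call only and %config itself is left unchanged.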
RETURNS
A hashref mapping class labels (e.g. 'Person') to their raw JSON-LD definition hashrefs from the @graph array.
Returns an empty hashref {} when no vocabulary can be obtained (network unreachable with no cache to fall back on, or a JSON parse error). Never throws.
SIDE EFFECTS
Populates %Schema::Validator::dynamic_schema with class definitions.
Populates %Schema::Validator::dynamic_properties with property definitions.
Creates or updates the local cache file on a successful download.
Emits carp warnings on network failures, I/O errors, or JSON parse errors.
NOTES
The default cache directory is determined once at module load time: the $CACHEDIR environment variable is used if set; otherwise File::Spec->tmpdir() is used (typically /tmp on Unix). Override for the session with:
$Schema::Validator::config{cache_file} = '/my/path/vocab.jsonld';
The bin/validate-schema CLI tool imports this function from the module and uses cache_file => $path to store its cache under ~/.cache/schema_validator/.
EXAMPLE
use Schema::Validator qw(load_dynamic_vocabulary);
my $classes = load_dynamic_vocabulary();
printf "%d classes loaded\n", scalar keys %{$classes};
# Check for a specific class in the returned hashref
print "Has Person\n" if exists $classes->{'Person'};
# Or query the package globals directly after the call
Schema::Validator::load_dynamic_vocabulary();
my @names = sort keys %Schema::Validator::dynamic_schema;
API SPECIFICATION
Input (Params::Validate::Strict)
{
cache_file => { type => 'string', optional => 1 },
cache_duration => { type => 'string', optional => 1 },
vocab_url => { type => 'string', optional => 1 },
ua_timeout => { type => 'string', optional => 1 },
}
Output (Return::Set)
{
type => 'hashref',
description => 'class-label => JSON-LD item hashref'
# ON_FAILURE => 'empty hashref {}; never throws'
# SIDE_EFFECTS => 'populates %dynamic_schema and %dynamic_properties'
}
FILES
schemaorg_dynamic_vocabulary.jsonld
Cache file written to $CACHEDIR (if set) or the system temporary directory (File::Spec->tmpdir()), unless overridden via $config{cache_file}. Contains the downloaded Schema.org vocabulary in JSON-LD format. Refreshed when older than $config{cache_duration} seconds.
ERROR HANDLING
The module uses carp rather than die for recoverable failures:
Failed HTTP requests emit carp and trigger the stale-cache fallback.
JSON parse errors emit carp and return {}.
File I/O errors emit carp; the download path is attempted next.
croak is reserved for programmer errors (bad argument types).
BUGS
Cache invalidation is time-based only; no checksum or version check.
SEE ALSO
REPOSITORY
https://github.com/nigelhorne/schema-validator
FORMAL SPECIFICATION
is_valid_datetime
Let CHAR denote the set of all Unicode code points and
DIGIT = { c : CHAR | c in {'0'..'9'} }.
Let seqN(N, S) = { s : seq S | #s = N }.
YEAR ≜ seqN(4, DIGIT)
MONTH ≜ seqN(2, DIGIT)
DAY ≜ seqN(2, DIGIT)
HOUR ≜ seqN(2, DIGIT)
MINUTE ≜ seqN(2, DIGIT)
SECOND ≜ seqN(2, DIGIT)
SEP ≜ { 'T', ' ' }
DATE ≜ { d : seq CHAR | ∃ y ∈ YEAR; mo ∈ MONTH; dy ∈ DAY
• d = y ⌢ ⟨'-'⟩ ⌢ mo ⌢ ⟨'-'⟩ ⌢ dy }
HHMM ≜ { t : seq CHAR | ∃ h ∈ HOUR; m ∈ MINUTE
• t = h ⌢ ⟨':'⟩ ⌢ m }
HHMMSS ≜ { t : seq CHAR | ∃ h ∈ HOUR; m ∈ MINUTE; s ∈ SECOND
• t = h ⌢ ⟨':'⟩ ⌢ m ⌢ ⟨':'⟩ ⌢ s }
TIMEFRAG ≜ { tf : seq CHAR | ∃ sep ∈ SEP; hm ∈ (HHMM ∪ HHMMSS)
• tf = ⟨sep⟩ ⌢ hm }
DATETIME ≜ DATE ∪ { dt : seq CHAR | ∃ d ∈ DATE; tf ∈ TIMEFRAG
• dt = d ⌢ tf }
──────────────────────────────────────────────────────────────
IsValidDatetime
──────────────────────────────────────────────────────────────
str? : seq CHAR
result! : B
──────────────────────────────────────────────────────────────
result! ⟺ str? ∈ DATETIME
──────────────────────────────────────────────────────────────
load_dynamic_vocabulary
Let FILE, DUR, URL be the resolved config values.
Let now : N be the current UNIX epoch time.
Let mtime : PATH -> N map a path to its last-modification time.
Let readable, writeable : PATH -> B be filesystem predicates.
Let reachable : URL -> B test HTTP reachability.
Let slurp : PATH -> seq CHAR and spit : PATH x seq CHAR -> 1.
Let fetch : URL x N -> seq CHAR (second arg is timeout).
Let decode_json : seq CHAR -> ITEM.
Let label : ITEM -> (LABEL | {}) extract rdfs:label.
Let types : ITEM -> P TYPE extract @type values.
FRESH ≜ ( -e(FILE) ) ∧ ( (now - mtime(FILE)) < DUR )
──────────────────────────────────────────────────────────────────────
LoadDynamicVocabulary
──────────────────────────────────────────────────────────────────────
ΔVocabularyStore
cache_file? : PATH
cache_duration? : N
vocab_url? : URL
ua_timeout? : N
result! : CLASS_LABEL ⇸ ITEM
──────────────────────────────────────────────────────────────────────
content : seq CHAR
FRESH ∧ readable(cache_file?)
⇒ content = slurp(cache_file?)
¬FRESH ∧ reachable(vocab_url?)
⇒ content = fetch(vocab_url?, ua_timeout?)
∧ ( writeable(cache_file?) ⇒ spit(cache_file?, content) )
¬FRESH ∧ ¬reachable(vocab_url?) ∧ -e(cache_file?)
⇒ content = slurp(cache_file?)
graph ≜ (decode_json content)[AT_GRAPH]
dynamic_schema' =
{ item ∈ graph | RDF_CLASS ∈ types(item) ∧ label(item) ≠ ∅
• label(item) ↦ item }
dynamic_properties' =
{ item ∈ graph | RDF_PROPERTY ∈ types(item) ∧ label(item) ≠ ∅
• label(item) ↦ item }
result! = dynamic_schema'
──────────────────────────────────────────────────────────────────────
AUTHOR
Nigel Horne, <njh at nigelhorne.com>
LICENCE AND COPYRIGHT
Copyright 2025-2026 Nigel Horne.
Usage is subject to the GPL2 licence terms. If you use it, please let me know.