Alternate Parsers
UTL is a general-purpose language for coding Universaltext, but many other formats and languages exist that are better suited to a particular purpose. The Universaltext Interpreter provides a way to use them inside a UTL expression or input file. With an alternate parser you can express the text in an arbitrary language or format; it is parsed into Universaltext, so it integrates completely with the rest of the text and can be navigated and queried like any other part of it.
Calling a Parser by Name
A parser is invoked with a line containing an opening square bracket, an asterisk and the parser name. All lines up to the next line containing only a closing square bracket are parsed by this parser. Example calling the parser named "my-format":
[* my-format
... some contents with a custom format here ...
]
Calling a Type Parser
You can parse a unit with an alternate parser defined for its type. To parse a particular unit with its implicit parser, put square brackets instead of curly brackets around its children.
For example if you have this text:
~webpage =index Overview {
~content
~h1 My Site
Welcome to my site!
This site is under construction.
}
Supposing there is a parser defined for the type "webpage" you could put this:
~webpage =index Overview [
... some contents with an alternate language or format here ...
]
The lines between [ and ] are not UTL but an alternate coding that the parser recognises.
Embedded Parser Calls
An explicit parser block can be embedded inside another parser block. Example:
[*script
... script instructions here...
[*settings
... settings here...
]
... script instructions here...
]
The embedded parser is recognized by the interpreter and called by it, not by the outer parser.
If this notation conflicts with the syntax of a particular parser, this parser can disable embedded blocks by setting the property EMBEDDED_PARSERS to 0:
$self->{EMBEDDED_PARSERS} = 0
Implementing a Parser
To implement a parser, one writes a Perl function that parses the format, say parseWebpage, and then one binds it to the type as parser:
^webpage {
~parser main::parseWebpage
^title : ustring
^content {
^p : ustring
^h1 : ustring
}
}
An explicit parser call with [* can invoke any parser, whatever type it is bound to.
When the interpreter reaches the text in square brackets, it calls the function parseWebpage to parse the lines. A parser has this form:
sub my_parser
{
    my ($ut,$uid) = @_;
    [...]
    my $lin = $ut->readline;
    [...]
}
The parser receives a UText object for feeding text. It also receives the unit Id of the parser being called, which is useful for implementing a family of parsers with a single Perl function.
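As an illustration of the "family of parsers" idea, the following is a minimal sketch, not code from the distribution. It assumes that the unit Id can be used as a hash key; the ids 'csv-parser' and 'tsv-parser', the table %separator_for and the functions split_fields and parse_separated are all hypothetical names invented for this example.

```perl
use strict;
use warnings;

# Hypothetical per-parser settings, keyed by unit id. The ids shown here
# are invented; real ids are supplied by the interpreter.
my %separator_for = (
    'csv-parser' => ',',
    'tsv-parser' => "\t",
);

# Split one input line into fields according to the calling parser's id.
sub split_fields {
    my ($uid, $line) = @_;
    my $sep = $separator_for{$uid};
    die "no settings for parser '$uid'\n" unless defined $sep;
    # Limit -1 keeps trailing empty fields.
    return split /\Q$sep\E/, $line, -1;
}

# Shared parser body: the same loop serves every member of the family,
# only the separator looked up through $uid differs.
sub parse_separated {
    my ($ut, $uid) = @_;
    while (defined(my $lin = $ut->readline)) {
        my @fields = split_fields($uid, $lin);
        # ... create units from @fields here ...
    }
}

my @csv = split_fields('csv-parser', 'a,b,,d');
my @tsv = split_fields('tsv-parser', "a\tb");
print scalar(@csv), " and ", scalar(@tsv), " fields\n";
```

Both parsers would be bound to the same Perl function; what the function does with each line then depends only on the id it was called with.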
The class UText exposes the function readline for parsers to get the next line to be parsed from the source file.
$ut->readline
It returns a string with the next line, or undef if the end of the region to be parsed has been reached. The parser does not see the line marked ] that closes the parse region.
Note that the returned string can be empty if there is an empty line to be parsed. Thus, when checking for the end of the input, one cannot test with if(!$ut->readline); use if(!defined($ut->readline)) instead. The plain truth test would also stop at a line consisting only of "0", which Perl treats as false.
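The difference between a truth test and a defined test can be tried out without the interpreter. The sketch below uses a hypothetical stand-in class, FakeUText, which is not part of the Universaltext distribution and only mimics the readline behaviour described above (one line per call, undef at the end).

```perl
use strict;
use warnings;

# Hypothetical stand-in for UText, for illustration only: it hands out
# the given lines one by one and returns undef when none are left.
package FakeUText;
sub new      { my ($class, @lines) = @_; return bless { lines => [@lines] }, $class }
sub readline { my ($self) = @_; return shift @{ $self->{lines} } }

package main;

my @input = ('Header', '', '0', 'Last paragraph');

# Truth test: the loop stops at the empty line, because '' is false.
my $ut = FakeUText->new(@input);
my $seen_by_truth_test = 0;
while (my $lin = $ut->readline) { $seen_by_truth_test++ }

# Defined test: only undef ends the loop, so '' and '0' are kept.
$ut = FakeUText->new(@input);
my $seen_by_defined_test = 0;
while (defined(my $lin = $ut->readline)) { $seen_by_defined_test++ }

print "truth test: $seen_by_truth_test, defined test: $seen_by_defined_test\n";
# prints "truth test: 1, defined test: 4"
```

Perl adds an implicit defined test only for its built-in readline and the <FH> operator in a while condition, not for a method call such as $ut->readline, so the explicit defined is needed here.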
Back to our example. Suppose we want to enter web pages in this format:
first line containing the header
second line containing the first paragraph
new paragraph
etc.
Our parser could look like this:
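sub parseWebpage
{
    my $ut=shift;
    my $n=0;                            # number of lines read so far
    $ut->enter();                       # enter the webpage level
    $ut->set({role=>'content'});        # create the ~content unit
    $ut->enter();                       # enter the content level
    while(defined(my $lin=$ut->readline)) {
        # first line becomes the header, all others paragraphs
        my $role = $n++==0 ? 'h1' : 'p';
        $ut->set({role=>$role,bin=>$lin});
    }
    $ut->leave();                       # leave the content level
    $ut->leave();                       # leave the webpage level
}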
The function does this:
- It enters the current level (that is, the "webpage").
- It creates a new unit ~content with set.
- It enters the content with enter.
- It loops through all lines to be parsed with readline.
- It creates a new unit with set; the binary contents are the last line read, and the role is "h1" for the first line and "p" for the following ones.
- After the loop it leaves the content level and the webpage level by calling leave twice.
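The sequence of calls listed above can be made visible with a recording stub. This is a minimal sketch: TraceUText is a hypothetical class written for this illustration, not the real UText, and it only logs the calls instead of building any text.

```perl
use strict;
use warnings;

# Hypothetical recording stub, for illustration only. It is NOT the real
# UText class; it just logs the calls a parser makes.
package TraceUText;
sub new      { my ($class, @lines) = @_; return bless { lines => [@lines], log => [] }, $class }
sub readline { my ($self) = @_; return shift @{ $self->{lines} } }
sub enter    { my ($self) = @_; push @{ $self->{log} }, 'enter'; return }
sub leave    { my ($self) = @_; push @{ $self->{log} }, 'leave'; return }
sub set {
    my ($self, $attr) = @_;
    my $entry = "set role=$attr->{role}";
    $entry .= " bin=$attr->{bin}" if defined $attr->{bin};
    push @{ $self->{log} }, $entry;
    return;
}

package main;

# The parser from the text, with unchanged behaviour.
sub parseWebpage {
    my $ut = shift;
    my $n  = 0;
    $ut->enter();
    $ut->set({ role => 'content' });
    $ut->enter();
    while (defined(my $lin = $ut->readline)) {
        my $role = $n++ == 0 ? 'h1' : 'p';
        $ut->set({ role => $role, bin => $lin });
    }
    $ut->leave();
    $ut->leave();
}

my $ut = TraceUText->new('My Site', 'Welcome to my site!');
parseWebpage($ut);
my @trace = @{ $ut->{log} };
print "$_\n" for @trace;
```

For the two input lines the log holds seven entries: enter, set role=content, enter, one set for the h1 line, one set for the p line, and two leave calls. A stub like this can also serve to exercise a parser function on its own before binding it to a type.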
Now we can enter our websites with this code:
~webpage =index Overview [
My Site
Welcome to my site!
This site is under construction.
]
~webpage =contact Contact [
Contact Me
You can write to me at me@myweb.org
You can also contact me with the form below.
]
The generated text looks like this:
=index ~webpage {
~title Overview
~content {
~h1 My Site
~p Welcome to my site!
~p This site is under construction.
}
}
=contact ~webpage {
~title Contact
~content {
~h1 Contact Me
~p You can write to me at me@myweb.org
~p You can also contact me with the form below.
}
}
The file "parser.pl" in the samples directory of the distribution contains this example.

