Computing Pages

by Francesc Hervada-Sala


Alternate Parsers

UTL is a general-purpose language for coding Universaltext, but many other formats and languages are often better suited to a particular purpose. The Universaltext Interpreter lets you use them inside a UTL expression or input file. With an alternate parser you can express text in an arbitrary language or format; it is parsed as Universaltext, so it integrates completely with the rest of the text and can be navigated and queried.

Calling a Parser by Name

A parser is invoked with a line containing an opening square bracket, an asterisk and the parser name. All lines up to the next line containing just a closing square bracket are parsed by this parser. The following example calls the parser named "my-format":

[* my-format
    ... some contents with a custom format here ...
]

Calling a Type Parser

You can parse a unit with the alternate parser defined for its type. To parse a particular unit with its implicit parser, put square brackets instead of curly brackets around its children.

For example, if you have this text:

~webpage =index Overview {
    ~content
    ~h1 My Site
    Welcome to my site!
    This site is under construction.
}

Supposing there is a parser defined for the type "webpage", you could write this instead:

~webpage =index Overview [
    ... some contents with an alternate language or format here ...
]

The lines between [ and ] are not UTL but an alternate coding that the parser recognises.

Embedded Parser Calls

An explicit parser block can be embedded inside another parser block. Example:

[*script
  ... script instructions here...
  [*settings 
	... settings here...
  ]
  ... script instructions here...
]

The embedded parser is recognized by the interpreter and called by it, not by the outer parser.

If this notation conflicts with the syntax of a particular parser, this parser can disable embedded blocks by setting the property EMBEDDED_PARSERS to 0:

$self->{EMBEDDED_PARSERS} = 0
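
The effect of nesting on where a block ends can be illustrated with a small stand-alone sketch. It is not the interpreter's code: the function find_block_end and its $embedded flag are hypothetical, only the [* form of block opening is handled, and the behaviour shown for disabled embedding (the first line containing just ] closes the block) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the index of the line that closes the block
# opened at index $start, or nothing if no closing line is found.
# With $embedded true, nested "[*" lines open further blocks (assumption).
sub find_block_end {
    my ($lines, $start, $embedded) = @_;
    my $depth = 0;
    for my $i ($start .. $#$lines) {
        my $lin = $lines->[$i];
        if ($i == $start or ($embedded and $lin =~ /^\s*\[\*/)) {
            $depth++;                      # a block opens on this line
        }
        elsif ($lin =~ /^\s*\]\s*$/) {
            return $i if --$depth == 0;    # the outermost block closes here
        }
    }
    return;
}

my @lines = (
    '[*script',
    '  ... script instructions here...',
    '  [*settings',
    '    ... settings here...',
    '  ]',
    '  ... script instructions here...',
    ']',
);

my $end_nested = find_block_end(\@lines, 0, 1);
my $end_flat   = find_block_end(\@lines, 0, 0);
print "embedded blocks on:  block ends at index $end_nested\n";
print "embedded blocks off: block ends at index $end_flat\n";
```

With nesting enabled, the inner ] only closes the settings block, so the script block runs to the last line.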

Implementing a Parser

To implement a parser, write a Perl function that parses the format, say parseWebpage, and bind it to the type as its parser:

^webpage {
    ~parser main::parseWebpage
    ^title : ustring
    ^content {
        ^p : ustring
        ^h1 : ustring
    }
}

An explicit parser call with [* can invoke any parser, whatever type it is bound to.

When the interpreter reaches the text in square brackets, it calls the function parseWebpage to parse the lines. A parser has this form:

sub my_parser
{
    my ($ut, $uid) = @_;
    [...]
    my $lin = $ut->readline;
    [...]
}

The parser receives a UText object for feeding text. It also receives the unit Id of the parser being called, which is useful for implementing a family of parsers with a single Perl function.
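
As a rough sketch of such a family, one function could look up what to do from the unit Id it receives. Everything specific here is hypothetical: the StubUText class stands in for the real UText object, the ids 101 and 102 are invented, and the actual form of a unit Id is not specified in this section. Only readline and the set({role=>..., bin=>...}) call shape are taken from the examples on this page.

```perl
use strict;
use warnings;

# Hypothetical stand-in for UText: serves queued lines and records set() calls.
package StubUText;
sub new      { my ($class, @lines) = @_; return bless { lines => [@lines], units => [] }, $class }
sub readline { my $self = shift; return shift @{ $self->{lines} } }
sub set      { my ($self, $attr) = @_; push @{ $self->{units} }, $attr; return }

package main;

# Invented unit ids of two parsers bound to the same Perl function,
# mapped to the role each one gives to the lines it reads.
my %role_for = (101 => 'p', 102 => 'h1');

sub parse_family {
    my ($ut, $uid) = @_;
    # Fall back to 'p' for an id that is not in the table.
    my $role = exists $role_for{$uid} ? $role_for{$uid} : 'p';
    while (defined(my $lin = $ut->readline)) {
        $ut->set({ role => $role, bin => $lin });
    }
    return;
}

my $ut = StubUText->new('one', 'two');
parse_family($ut, 102);
print "$_->{role}: $_->{bin}\n" for @{ $ut->{units} };
```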

The class UText exposes the function readline for parsers to get the next line to be parsed from the source file.

$ut->readline

It returns a string with the next line, or the undefined value (undef) if the end of the region to be parsed has been reached. The parser does not see the line marked ] that closes the parse region.

Note that the returned string can be an empty string if there is an empty line to be parsed. Thus, when checking for the end of the input lines, one cannot test with if(!$ut->readline) but must test with if(!defined($ut->readline)) instead.
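
The difference between the two tests can be seen in a small stand-alone sketch. The MockReader class is a hypothetical stand-in for the UText object; it only imitates the documented behaviour of readline, returning queued lines and then undef.

```perl
use strict;
use warnings;

# Hypothetical stand-in for UText: readline returns the queued lines
# one by one and undef once they are exhausted.
package MockReader;
sub new      { my ($class, @lines) = @_; return bless { lines => [@lines] }, $class }
sub readline { my $self = shift; return shift @{ $self->{lines} } }

package main;

# Input with an empty line and a line containing just "0";
# both strings are false in Perl.
my @input = ('first', '', '0', 'last');

# Truthiness test: the loop stops at the first empty line.
my $ut = MockReader->new(@input);
my @truthy;
while (my $lin = $ut->readline) { push @truthy, $lin }

# defined test: only undef ends the loop.
$ut = MockReader->new(@input);
my @all;
while (defined(my $lin = $ut->readline)) { push @all, $lin }

printf "truthiness test read %d line(s), defined test read %d line(s)\n",
    scalar @truthy, scalar @all;
```

Perl adds an implicit defined test only for its built-in line-reading operators inside a while condition, not for a method call such as $ut->readline, so the explicit defined is needed here.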

Back to our example. If we want to enter some websites in this format:

first line containing the header
second line containing the first paragraph
new paragraph
etc.

Our parser could look like this:

sub parseWebpage
{
    my $ut = shift;
    my $n  = 0;
    $ut->enter();
    $ut->set({role=>'content'});
    $ut->enter();
    while (defined(my $lin = $ut->readline)) {
        my $role = $n++ == 0 ? 'h1' : 'p';
        $ut->set({role=>$role, bin=>$lin});
    }
    $ut->leave();
    $ut->leave();
}

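The function does this: it enters the unit being parsed, adds a child with the role content and enters it, then reads the input line by line, adding the first line as h1 and each following line as p, and finally leaves both levels again.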
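
The sequence of calls can be traced without the interpreter by handing the parser a stub object. The TraceUText class below is hypothetical and is not part of UText; it merely logs each set call, indented by the current enter/leave depth. That enter descends into the most recently added unit is an assumption inferred from the example.

```perl
use strict;
use warnings;

# Hypothetical stub standing in for UText; it only records calls.
package TraceUText;
sub new {
    my ($class, @lines) = @_;
    return bless { lines => [@lines], log => [], depth => 0 }, $class;
}
sub readline { my $self = shift; return shift @{ $self->{lines} } }
sub enter    { my $self = shift; $self->{depth}++; return }
sub leave    { my $self = shift; $self->{depth}--; return }
sub set {
    my ($self, $attr) = @_;
    my $text = defined $attr->{bin} ? " $attr->{bin}" : '';
    # Log the unit, indented two spaces per nesting level.
    push @{ $self->{log} }, ('  ' x $self->{depth}) . "~$attr->{role}$text";
    return;
}

package main;

# The parser from this page, unchanged apart from layout.
sub parseWebpage {
    my $ut = shift;
    my $n  = 0;
    $ut->enter();
    $ut->set({ role => 'content' });
    $ut->enter();
    while (defined(my $lin = $ut->readline)) {
        my $role = $n++ == 0 ? 'h1' : 'p';
        $ut->set({ role => $role, bin => $lin });
    }
    $ut->leave();
    $ut->leave();
}

my $ut = TraceUText->new(
    'My Site',
    'Welcome to my site!',
    'This site is under construction.',
);
parseWebpage($ut);
print "$_\n" for @{ $ut->{log} };
```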

Now we can enter our websites with this code:

~webpage =index Overview [
My Site
Welcome to my site!
This site is under construction.
]
~webpage =contact Contact [
Contact Me
You can write to me at me@myweb.org
You can also contact me with the form below.
]

The generated text looks like this:

=index ~webpage {
        ~title Overview
        ~content {
                ~h1 My Site
                ~p Welcome to my site!
                ~p This site is under construction.
        }
}
=contact ~webpage {
        ~title Contact
        ~content {
                ~h1 Contact Me
                ~p You can write to me at me@myweb.org
                ~p You can also contact me with the form below.
        }
}

The file "parser.pl", in the samples directory of the distribution files, contains this example.
