Computing Pages

by Francesc Hervada-Sala


Text Structure

I present here a candidate for the fundamental text structure. Let us first introduce a language to represent text and then deal with the general definition of text.

Plain Text Language

The simplest text is a single text unit having a name. We note it down this way:

^ person

This defines a text unit named ”person“. A text unit can consist of some other text units. For example, a person has a name and a birth date.

^ person {
	^ first-name
	^ last-name
	^ birth-date
}

That introduces three named text units that are part of the text unit ”person“.

A text can also be of some type. We use for this a colon : as prefix.

= Ann : person

The prefix = identifies the unit name. If Ann is a person, then she has all the characteristics of persons. According to the previous definition, she has a name and a birth date:

= Ann : person {
	~ first-name Ann
	~ last-name Jones
	~ birth-date 6/24/1954
}

The above introduces a text unit named ”Ann“ and records some data about it. The three subordinate text units are not defined here but merely mentioned. That is why we use ~ (for role) instead of ^ (for definition).

A text ends with some data that is not analyzed further: above, the two strings ”Ann“ and ”Jones“ and the date ”6/24/1954“. For these we can define unit types, too. The complete text would look like this:

^ string
^ date
^ person {
	^ first-name : string
	^ last-name : string
	^ birth-date : date
}
= Ann : person {
	~ first-name Ann
	~ last-name Jones
	~ birth-date 6/24/1954
}

If the text we are analyzing talks about some more people, we might want to distinguish between men and women. We can add the names ”man“ and ”woman“:

^ man : person
^ woman : person

After that, each unit with the type ”woman“ or ”man“ is necessarily a person, too, and has therefore a name and a birth date:

= Ann : woman {
	~ first-name Ann
	~ last-name Jones
	~ birth-date 6/24/1954
}
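
How a unit acquires the components of its type can be sketched in Python. This is only an illustration under stated assumptions: the tables `TYPE_OF` and `COMPONENTS` and the function `components_of` are hypothetical names chosen here, not part of the notation.

```python
# Hypothetical tables mirroring the definitions above (illustration only).
TYPE_OF = {"man": "person", "woman": "person", "Ann": "woman"}
COMPONENTS = {"person": ["first-name", "last-name", "birth-date"]}


def components_of(unit):
    """Collect the components a unit has through its chain of types."""
    found = []
    seen = set()
    current = unit
    # Walk from the unit to its type, to the type of that type, and so on.
    while current is not None and current not in seen:
        seen.add(current)
        found.extend(COMPONENTS.get(current, []))
        current = TYPE_OF.get(current)
    return found


print(components_of("Ann"))  # ['first-name', 'last-name', 'birth-date']
```

Ann is a woman, a woman is a person, and so Ann ends up with the three components defined for persons.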

Text units can be grouped to produce higher text units. For example, a family is a compound unit consisting of persons.

^ family {
	^ parent : person
	^ child : person
}

The persons of a family are grouped above in two classes: the parents and the children. Each person that belongs to a family plays a particular role in it, either as parent or as child. When talking about some family and recording its members, one must determine the role of each of them.

= Jones ~ family {
	= Ann ~ parent : woman
	= John ~ parent : man
	= Lena ~ child : woman
}

Note the difference between a type and a role. Ann has a type of her own: she is a woman. Apart from that, she belongs to the family Jones and plays the role of a parent inside this family. Text units with different types (man or woman) can play the same role (parent or child).

Instead of defining a unit, one can refer to a unit defined elsewhere with the sign ==. For example, the following text produces exactly the same family as above:

= Ann : woman
= John : man
= Lena : woman
= Jones ~ family {
	~ parent ==Ann
	~ parent ==John
	~ child ==Lena
}

With the plain text language we have introduced a particular model of text. We record any text as a web of text units that relate to each other as component, as type or as role. Let us reduce this now to a single algebraic expression as a general definition of text.
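
To make the notation more tangible, here is a minimal Python sketch of a reader for a small subset of it. It handles only the line forms introduced in this section (`^ name : type`, `= name ~ role : type`, `~ role value`, and braces). The names `Node`, `parse_line` and `parse` are assumptions made for this illustration and do not belong to the language.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str                       # "define" (^), "name" (=), "mention" (~) or "root"
    name: Optional[str] = None
    role: Optional[str] = None
    type: Optional[str] = None
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def parse_line(line):
    """Turn one line (without braces) into a Node."""
    sigil, rest = line[0], line[1:].strip()
    if sigil == "^":                # ^ name [: type]
        name, _, typ = (p.strip() for p in rest.partition(":"))
        return Node("define", name=name, type=typ or None)
    if sigil == "=":                # = name [~ role] [: type]
        head, _, typ = (p.strip() for p in rest.partition(":"))
        name, _, role = (p.strip() for p in head.partition("~"))
        return Node("name", name=name, role=role or None, type=typ or None)
    if sigil == "~":                # ~ role [value]
        role, _, value = (p.strip() for p in rest.partition(" "))
        return Node("mention", role=role, value=value or None)
    raise ValueError(f"unsupported line: {line!r}")


def parse(source):
    """Build a tree of Nodes from the brace-nested notation."""
    root = Node("root")
    stack = [root]
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced closing brace")
            stack.pop()
            continue
        opens = line.endswith("{")
        if opens:
            line = line[:-1].strip()
        node = parse_line(line)
        stack[-1].children.append(node)
        if opens:
            stack.append(node)
    if len(stack) != 1:
        raise ValueError("missing closing brace")
    return root


SAMPLE = """
=Ann : woman
= Jones ~ family {
    = John ~ parent : man
    ~child ==Lena
}
"""

tree = parse(SAMPLE)
jones = tree.children[1]
print(jones.name, jones.role, [c.role for c in jones.children])
```

The sketch keeps names, roles and types as plain strings. Resolving them to the units they denote, including references written with ==, would be a separate step.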

Text Formula

The structure of every text can be reduced to text units that are related to each other in this way: each text unit has a parent, a role and a type, and each of these is again a text unit. One can express the general text formula this way:

<parent> {
	<child> ~<role> :<type>
}

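The meanings of these relationships are these: the parent is the unit that the child is a component of, the role is the function that the child plays within its parent, and the type is the unit whose characteristics the child shares.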

This one formula exhausts the structure of every text. Any text can be reduced to an expression based on it that describes its structure completely.

Note that each finite text must have at least one completely self-related unit that is its own parent, type and role. Otherwise the above formula would not be true for all text units.

Note that there is a single root category, the unit. A type is not a separate entity: types are themselves units, too. Every unit can act as the type of another unit.

Note that the formal text definition does not include names. Unit names can be part of a language one uses to express texts, but they do not belong to the text structure itself.
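
The formula can be sketched as a data structure in Python. This is an illustration under assumptions, not a definitive implementation: the class `Unit` is a hypothetical name, and giving definitions the root unit as their role and type is a choice made only for this sketch, since the text does not fix it. In line with the remark on names, the units themselves carry no names. The Python variable names stand outside the structure.

```python
class Unit:
    """A text unit: nothing but a parent, a role and a type, each again a unit."""

    def __init__(self, parent=None, role=None, type=None):
        # A relation that is not given points back at the unit itself.
        # This is how the self-related root unit comes about.
        self.parent = parent if parent is not None else self
        self.role = role if role is not None else self
        self.type = type if type is not None else self


# The completely self-related unit: its own parent, role and type.
root = Unit()

# Definitions. Assumption of this sketch: they get the root as role and type
# unless the text states a type.
person = Unit(parent=root, role=root, type=root)
woman = Unit(parent=root, role=root, type=person)
family = Unit(parent=root, role=root, type=root)
parent_role = Unit(parent=family, role=root, type=person)

# "= Jones ~ family { = Ann ~ parent : woman }"
jones = Unit(parent=root, role=family, type=root)
ann = Unit(parent=jones, role=parent_role, type=woman)

print(root.parent is root and root.role is root and root.type is root)  # True
print(ann.type.type is person)                                          # True
```

Every relation of every unit is again a `Unit`, so there is a single category of things in the structure, and a type such as `woman` is an ordinary unit that other units point to.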

Scope of Text

Let us now see what range of matters this concept of text applies to.

Books

Books are, not surprisingly, text. A particular book might be arranged in several fixed levels such as sections and chapters. This can be expressed explicitly (”Section 1“, ”Chapter Two“) or implicitly through graphic presentation. For example, a section title appears alone on an otherwise blank right-hand page, whereas a chapter title begins a new page, on either the left or the right side, that also contains some prose paragraphs.

^ myBookOutline {
	^ section {
		^ title : string
		^ chapter {
			^ title : string
			^ contents {
				[...]
			}
		}
	}
} 
= myBook ~ myBookOutline {
	~ section Introduction {
		~ chapter Motivation
		[...]
	}
	~ section Theory {
		[...]
	}
	~ section Applications {
		[...]
	}
}

A book can also be arranged in generic levels instead, as is usual in science, for example in ”1.1 Motivation“.

^ myThesisOutline {
	^ level {
		^ title : string
		^ no : cardinal
		^ level : level
		^ contents {
			[...]
		}
	}
}
= myThesis ~ myThesisOutline {
	~ level Introduction {
		~ no 1
		~ level Motivation {
			~ no 1
			[...]
		}
	}
	~ level Theory [...]
}
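
With generic levels, a heading number such as ”1.1“ follows from the nesting. The following Python sketch shows this under assumptions: the dictionary layout and the function `numbered_titles` are hypothetical and only mirror the two levels spelled out above.

```python
def numbered_titles(levels, prefix=""):
    """Yield (number, title) pairs, descending into nested levels."""
    for level in levels:
        number = f"{prefix}{level['no']}"
        yield number, level["title"]
        # Nested levels inherit the number of their parent as a prefix.
        yield from numbered_titles(level.get("levels", []), prefix=number + ".")


# Only the levels that the example above spells out.
thesis = [
    {"no": 1, "title": "Introduction", "levels": [
        {"no": 1, "title": "Motivation"},
    ]},
]

for number, title in numbered_titles(thesis):
    print(number, title)
# 1 Introduction
# 1.1 Motivation
```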

Prose

Apart from an arrangement in sections and chapters, as we have already seen, prose consists of prose blocks. A prose block is a continuous flow of natural language. It can be a list of paragraphs:

^ prose {
	^ paragraph : string
}
= chapter1 ~ prose {
	~ paragraph At first one must say...
	~ paragraph Nonetheless this is tricky...
}

But it can also contain other structure elements, such as a list:

^ prose {
	^ paragraph : string
	^ list {
		^ item : string
	}
}
=chapter3 ~ prose {
	~ paragraph One must perform some steps:
	~ list {
		~ item First...
		~ item After that, ...
		[...]
	}
}

The prose flow does not always have a single level. There can be more than one, for example something included in parentheses or between dashes. Footnotes can be an embedded level, too.

^ paragraph {
	^ sentence {
		^ content : string
		^ note : string
	}
}
~ paragraph {
	~ sentence Something must be done. {
		~ note Not everything is good, though.
	}	
}

Natural Language

The prose structure ends with natural language sentences. But a sentence is not an unanalyzable character string; it has structure, too. One can perform a syntax analysis of a sentence to expose its syntactic structure.

~ sentence : copulative-sentence "I am tired." 
{
	~ subject : personal-pronoun {
		~ person 1
		~ number singular
	}
	~ copula "to be" {
		~ tense present
		~ mood indicative
	}
	~ attribute tired
}

One will surely not want to analyze manually the syntax of every sentence one writes. But in the humanities one will want to analyze completely, say, Plato's works, and a grammar checker will try to make an approximate analysis in order to check the grammar.

Apart from that, natural language sentences have a meaning, and this can be analyzed, too. For example, if a sentence quotes a work, the reference can be recorded.

^ sentence {
	~ reference {
		~ work : author.work
	}
}

One can set references on single words, too, for example to disambiguate them against a dictionary. One can also set references to contemporary people or events that clarify text passages.

Knowledge

Some academic studies are mainly done in prose. Some of them have particular writings as their object (as in the humanities) or as their means (as in history). These writings can be recorded as prose, and everything that the scholars find out can be tied to them.

All knowledge can be expressed as text.

^ kingdom {
	^ name : string
	^ house {
		^ name : string
		^ head-of-state {
			^ name : string
		}
		^ king : head-of-state
		^ queen : head-of-state
	}
}
~ kingdom "United Kingdom" {
	~ house "House of Stuart" {
		~ queen "Anne"
	}
	~ house "House of Hanover" {
		~ king "George I"
		~ king "George II"
		[...]
	}
	~ house "House of Windsor" {
		[...]
	}
}

Mathematics

Some sciences use the language of mathematics. Mathematical statements can be analyzed syntactically and reduced to a text, too.

~ statement : equation "3+4=7" {
	~ part {
		~ addition {
			~ operand : integer 3
			~ operand : integer 4
		}
	}
	~ part {
		~ value : integer 7
	}
}
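
Once a statement is available as structure, a program can work with it. The Python sketch below checks the equation above. The nested dictionaries and the functions `evaluate` and `holds` are hypothetical stand-ins for the text units, chosen only for this illustration.

```python
# The equation "3+4=7" as two parts, mirroring the structure above.
equation = {
    "parts": [
        {"addition": [3, 4]},
        {"value": 7},
    ]
}


def evaluate(part):
    """Return the integer a part stands for."""
    if "addition" in part:
        return sum(part["addition"])
    return part["value"]


def holds(eq):
    """An equation holds if both of its parts evaluate to the same value."""
    left, right = (evaluate(part) for part in eq["parts"])
    return left == right


print(holds(equation))  # True
```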

Programming

Like natural language and mathematical language, every formal language can be recorded as text, too. One gets the text of a formal expression through syntax analysis, which also exposes its semantics. This includes programming languages.

~ statement : for-loop {
	~ source-string ""
for(int i=0; i<10; i++)
	dothis(i);
""
	~ preoperation : assignment {
		~ variable {
			~ name i
			~ type integer
			~ scope block
		}
		~ value {
			~ constant : integer 0
		}
	}
	~ condition : less-than {
		~ left-operand : variable i
		~ right-operand : constant 10
	}
	~ postoperation : increment {
		~ variable i
	}
	~ statement : statement-block {
		~ statement : function-call {
			~ function dothis
			~ parameter : variable i
		}
	}
}
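
That syntax analysis turns a program into structured units can be tried out with Python's standard `ast` module. Note the substitution: `ast` parses only Python, so the sketch uses an analogous Python loop rather than the C-style loop shown above. The source is parsed, not executed, so the function `dothis` does not need to exist.

```python
import ast

# An analogous loop written in Python, parsed but never run.
source = "for i in range(10):\n    dothis(i)\n"
tree = ast.parse(source)

loop = tree.body[0]          # the for-loop statement
call = loop.body[0].value    # the function call inside the loop body

print(type(loop).__name__)             # For
print(loop.target.id)                  # i
print(call.func.id)                    # dothis
print([arg.id for arg in call.args])   # ['i']
```

The loop variable, the called function and its parameter are each available as separate nodes, much like the units in the text above.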

One can also express as text a ”makefile“, as commonly used to specify procedures for building executable files from source files.

^ make {
	^ target {
		^ filename : string
		^ dependency : filename
		^ command : string
	}
}
~ make {
	~ target helloworld {
		~ dependency helloworld.o
		~ command cc -o $@ $<
	}
	~ target helloworld.o {
		~ dependency helloworld.c
		~ command cc -c -o $@ $<
	}
}
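
With the dependencies recorded as structure, the order in which the targets must be built can be derived. Here is a minimal Python sketch using the standard `graphlib` module (Python 3.9 or later). The dictionary `dependencies` is a hypothetical rendering of the two targets above.

```python
from graphlib import TopologicalSorter

# Each target mapped to the files it depends on, as in the text above.
dependencies = {
    "helloworld": {"helloworld.o"},
    "helloworld.o": {"helloworld.c"},
}

# A topological order lists every file after the files it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['helloworld.c', 'helloworld.o', 'helloworld']

# Only the targets need building; helloworld.c is a plain source file.
build_order = [name for name in order if name in dependencies]
print(build_order)  # ['helloworld.o', 'helloworld']
```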

Digital Media

Digital media files can be expressed as text. For example mp3:

^ mp3-file {
	^ header {
		^ segment {
			^ bit-count : cardinal
			^ meaning : string
		}
		~ segment {
			~ bit-count 12
			~ meaning Sync Word		
		}
		~ segment {
			~ bit-count 1
			~ meaning Version
		}
		~ segment {
			~ bit-count 2
			~ meaning Layer
		}
		[...]
	}
	^ data {
		[...]
	}
}

Every media file can be expressed this way; one only needs to reproduce its so-called ”file structure“.
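
Such a description of a file structure can drive a reader directly. The Python sketch below splits the leading bits of some data according to the three header segments listed above. The function `read_segments` and the two sample bytes are assumptions made for this illustration.

```python
# (bit-count, meaning) pairs, taken from the header segments above.
SEGMENTS = [(12, "Sync Word"), (1, "Version"), (2, "Layer")]


def read_segments(data, segments):
    """Split the leading bits of `data` according to (bit-count, meaning) pairs."""
    bits = int.from_bytes(data, "big")
    remaining = len(data) * 8          # bits not yet consumed
    result = {}
    for bit_count, meaning in segments:
        remaining -= bit_count
        if remaining < 0:
            raise ValueError("not enough data for segment " + meaning)
        # Shift the segment down to the lowest bits and mask it out.
        result[meaning] = (bits >> remaining) & ((1 << bit_count) - 1)
    return result


# Two hypothetical sample bytes, binary 11111111 11111011.
fields = read_segments(bytes([0xFF, 0xFB]), SEGMENTS)
print(fields)  # {'Sync Word': 4095, 'Version': 1, 'Layer': 1}
```

The reader knows nothing about mp3 itself. Everything it needs comes from the list of segments, so a different list would read a different format.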

We have seen that many different things can be reduced to this concept of text. Such different things as prose and digital media, mathematics and programming languages can not only be seen as text, but can all be expressed by a single structure. This opens the door to computer-assisted text management in all these fields.
