A Universal Text Database
The Text Repository is a Text Engine extended with the capability to store the parsed text persistently.
Last update on Sun Jun 5, 2011.
Persistent storage places additional requirements on the text engine. Whereas a text engine can simply read all source files on each execution, a text repository must read and operate on already parsed text, and it must support changes to an existing text structure, such as deleting and renaming units. In a text repository, the source files can serve as "one-shot" material that does not need to be kept. Changes to the existing text can be made by editing the result of a text query and feeding it back into the text engine.
A client can open a text cursor to navigate through the text stored in the repository. A cursor is opened by giving a text selector that defines which parts of the text are to be extracted.
The cursor has an internal state pointing to one text unit at a time. The client can step forward or backward, go to the first or last unit, enter a text level or leave it, and can open nested cursors.
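The navigation operations described above can be sketched as follows. This is only an illustration under stated assumptions: the names Unit and Cursor and all method names are hypothetical, the text is modelled as a plain in-memory tree, and nested cursors are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    """Hypothetical text unit: a name plus nested child units."""
    name: str
    children: List["Unit"] = field(default_factory=list)


class Cursor:
    """Hypothetical cursor that points to one unit at a time."""

    def __init__(self, root: Unit):
        # Stack of (sibling list, index) pairs; the top entry is the current level.
        self._stack = [([root], 0)]

    @property
    def current(self) -> Unit:
        siblings, index = self._stack[-1]
        return siblings[index]

    def _move(self, index: int) -> bool:
        siblings, _ = self._stack[-1]
        if 0 <= index < len(siblings):
            self._stack[-1] = (siblings, index)
            return True
        return False  # out of range: the cursor stays where it was

    def first(self) -> bool:
        return self._move(0)

    def last(self) -> bool:
        return self._move(len(self._stack[-1][0]) - 1)

    def forward(self) -> bool:
        return self._move(self._stack[-1][1] + 1)

    def backward(self) -> bool:
        return self._move(self._stack[-1][1] - 1)

    def enter(self) -> bool:
        """Descend into the current unit's level, if it has one."""
        children = self.current.children
        if not children:
            return False
        self._stack.append((children, 0))
        return True

    def leave(self) -> bool:
        """Return to the unit whose level was entered."""
        if len(self._stack) == 1:
            return False
        self._stack.pop()
        return True
```

Each operation returns False instead of raising when the move is not possible, so a client can loop with "while cursor.forward()" without special cases. A real repository would read units from storage rather than from an in-memory tree.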
The client can also update the text through a cursor: it can update a unit or its binary contents, and it can add and remove units.
Not all kinds of cursors are fully updatable: some allow only certain modifications, and others are read-only. A restriction may stem from implementation limits, but it may also have a logical, unavoidable cause. What a cursor allows depends on the selector it is bound to.
A general cursor is bound to an empty selector; it traverses all text units and can update them without restrictions.
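One way to picture the update restrictions is as a set of permitted modifications attached to each cursor. The sketch below is an assumption, not part of the design: the names Capability, CursorPolicy and CursorNotUpdatable are hypothetical, selectors are modelled as plain strings, and the rule that maps a non-empty selector to its permissions is left to the caller because the text does not define it.

```python
from enum import Flag, auto


class Capability(Flag):
    """Hypothetical set of modifications a cursor may permit."""
    UPDATE_UNIT = auto()
    UPDATE_BINARY = auto()
    ADD_UNIT = auto()
    REMOVE_UNIT = auto()
    ALL = UPDATE_UNIT | UPDATE_BINARY | ADD_UNIT | REMOVE_UNIT


READ_ONLY = Capability(0)  # no modification allowed


class CursorNotUpdatable(Exception):
    """Raised when a cursor does not permit the requested modification."""


class CursorPolicy:
    """Hypothetical record of what a cursor bound to a selector may change."""

    def __init__(self, selector: str, allowed: Capability = READ_ONLY):
        self.selector = selector
        # An empty selector denotes the general cursor: no restrictions.
        self.allowed = Capability.ALL if selector == "" else allowed

    def check(self, needed: Capability) -> None:
        """Raise unless every requested modification is permitted."""
        if needed not in self.allowed:
            raise CursorNotUpdatable(
                f"selector {self.selector!r} does not allow {needed}"
            )
```

An update operation would call check() before touching the text, so a read-only cursor fails early instead of leaving a half-applied change.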
Cursors should perform well on large texts: they should not consume much RAM, and they should be paginated, so that large numbers of nodes are not computed before they are actually retrieved.
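The pagination requirement can be illustrated with a generator that fetches one page of units only when the client actually reaches it. This is a minimal sketch: paged_units, FakeStore and the fetch_page(offset, limit) signature are hypothetical names introduced here, and the store is an in-memory stand-in for the repository.

```python
from typing import Callable, Iterable, Iterator, List


def paged_units(
    fetch_page: Callable[[int, int], List[str]], page_size: int = 100
) -> Iterator[str]:
    """Yield units one at a time, fetching each page only when it is needed."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # a short page means the end of the text was reached
        offset += len(page)


class FakeStore:
    """In-memory stand-in for the repository that counts page fetches."""

    def __init__(self, units: Iterable[str]):
        self.units = list(units)
        self.fetches = 0

    def fetch_page(self, offset: int, limit: int) -> List[str]:
        self.fetches += 1
        return self.units[offset:offset + limit]
```

Because the generator holds only one page at a time, memory use is bounded by the page size rather than by the size of the text, and a client that stops after the first few units never triggers the later fetches.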
Storage is allocated within storage places that can be changed by the user at run time.
Storage places can be locally available hard disks, mountable disks, or remote sites (FTP, HTTP).
Each text unit belongs to one or more storage places. By default, it uses its parent unit's storage.
Binary data for a text unit can be stored in a different place than the unit itself.
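The defaulting rule for storage places can be sketched as a lookup that walks up the parent chain. The class StoredUnit, its fields, and the URL-style place identifiers are hypothetical. Letting binary data fall back to the unit's own places when nothing else is configured is also an assumption, since the text only says that binary data can live elsewhere.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StoredUnit:
    """Hypothetical text unit with optional storage overrides."""
    name: str
    parent: Optional["StoredUnit"] = None
    places: List[str] = field(default_factory=list)         # empty: inherit from parent
    binary_places: List[str] = field(default_factory=list)  # empty: same as the unit

    def storage_places(self) -> List[str]:
        """Return this unit's places, or those of the nearest ancestor that has any."""
        unit: Optional[StoredUnit] = self
        while unit is not None:
            if unit.places:
                return unit.places
            unit = unit.parent
        raise LookupError(f"no storage place configured for {self.name!r}")

    def binary_storage_places(self) -> List[str]:
        """Return where binary data goes (assumed to default to the unit's places)."""
        return self.binary_places or self.storage_places()


root = StoredUnit("root", places=["file:///srv/texts"])
chapter = StoredUnit("chapter", parent=root)
image = StoredUnit("image", parent=chapter,
                   binary_places=["ftp://example.org/blobs"])
```

Here chapter inherits the root's place, while image keeps its unit data with the root and sends its binary contents to the separate FTP place.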
The management of storage places seems to require the Universal Text layer as a prerequisite, but the text layer is supposed to be stand-alone. Programming an ad hoc naming system and the like here does not seem to me to be a good approach.