NooJ: A Linguistic Development Environment



Technology

 

NooJ runs on MS-Windows, Mac OS X, Linux and BSD Unix.

 

NooJ processes texts and corpora (i.e. sets of text files) at the Orthographical, Lexical, Morphological, Syntactic and Semantic levels.

All linguistic information (at any level) is represented by annotations that are stored in the Text Annotation Structure (TAS).

Annotations are typically added to the TAS in cascade, without destroying the original text.

Annotations can describe units inside word forms (for contracted words, e.g. "cannot", and for agglutinative languages), simple forms (e.g. "table"), multiword units (e.g. "round table") as well as discontinuous expressions (e.g. "turn ... off").
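
The annotation model described above can be illustrated with a minimal Python sketch. It is not NooJ's actual TAS implementation: the class names, fields and labels ("V", "ADV", "N") are hypothetical and chosen only for illustration. The sketch stores annotations as character offsets next to the text, so the text itself is never modified, and it lets one annotation cover several spans to represent a discontinuous expression.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    start: int         # offset of the first character covered
    end: int           # offset one past the last character covered
    label: str         # hypothetical label, not an actual NooJ tag
    parts: tuple = ()  # extra (start, end) spans for discontinuous units


@dataclass
class TextAnnotationStructure:
    text: str
    annotations: list = field(default_factory=list)

    def add(self, start, end, label, parts=()):
        """Record an annotation; the original text is left untouched."""
        ann = Annotation(start, end, label, tuple(parts))
        self.annotations.append(ann)
        return ann

    def covered(self, ann):
        """Return the stretches of text that an annotation covers."""
        spans = [(ann.start, ann.end), *ann.parts]
        return [self.text[s:e] for s, e in spans]


def span(text, piece):
    """Offsets of the first occurrence of piece in text."""
    i = text.index(piece)
    return i, i + len(piece)


text = "We cannot turn the light off at the round table"
tas = TextAnnotationStructure(text)

c0, c1 = span(text, "cannot")
can = tas.add(c0, c0 + 3, "V")    # unit inside a word form
neg = tas.add(c0 + 3, c1, "ADV")  # second unit inside the same word form
table = tas.add(*span(text, "table"), "N")              # simple form
round_table = tas.add(*span(text, "round table"), "N")  # multiword unit
turn_off = tas.add(*span(text, "turn"), "V",
                   parts=[span(text, "off")])           # discontinuous
```

Because annotations only point into the text, overlapping analyses (here "table" and "round table") can coexist, and later processing steps can add further annotations in cascade.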

NooJ offers the four types of grammars/machines of the Chomsky hierarchy:

-- NooJ contains several tools to process Finite-State machines and Regular grammars.

-- NooJ processes Context-Free Grammars and Push Down Automata. Note that in most cases, NooJ can "flatten out" sets of recursively embedded graphs, i.e. remove the recursion and convert Context-Free Grammars into Regular grammars.

-- NooJ processes Context-Sensitive Grammars in two steps: the first step is performed by a Push Down Automaton (or even a Finite-State Machine when the grammar is flattened out); the second step computes the values of the grammar's variables and tests its constraints (in O(n)).

-- NooJ can perform Z. Harris's transformations in cascade, giving NooJ the power of a Turing Machine. The morphological and the syntactic engines are integrated: this makes it possible to perform morphological operations on words while performing a syntactic transformation.
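
The two-step treatment of Context-Sensitive Grammars described in the list above can be sketched in Python. This is an illustration of the general idea only, not NooJ's engine or grammar syntax. The language a^n b^n c^n is a textbook example chosen here as an assumption, and the names SHAPE and recognize are hypothetical. Step one uses a finite-state pattern to check the overall shape and bind variables; step two tests a constraint on the variables' values in time linear in the input length.

```python
import re

# Step 1: a finite-state pattern checks the shape a+ b+ c+ and stores
# each block in a named variable.
SHAPE = re.compile(r"(?P<A>a+)(?P<B>b+)(?P<C>c+)")


def recognize(s):
    """Return True if s belongs to the language a^n b^n c^n (n >= 1)."""
    m = SHAPE.fullmatch(s)
    if m is None:
        return False  # the shape itself is wrong
    # Step 2: compute the variables' values and test the constraint.
    a, b, c = m.group("A"), m.group("B"), m.group("C")
    return len(a) == len(b) == len(c)


print(recognize("aabbcc"))  # shape and constraint both satisfied
print(recognize("aabbc"))   # shape matches, constraint fails
print(recognize("abcabc"))  # shape does not match
```

Separating the shape from the constraint keeps the first step within the power of a simpler machine, while the constraint test supplies the extra expressive power.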

NooJ can process texts written in over 20 languages, including some Romance, Germanic, Slavic, Semitic and Asian languages, as well as Hungarian.

All NooJ grammars/machines are compatible, i.e. one can insert parts of a Regular Grammar in a Context-Free Grammar or in a Context-Sensitive Grammar, and use them in a loop to simulate a Turing Machine.

NooJ dictionaries are extremely simple objects and can describe orthographical and synonymous variants, as well as inflectional and derivational forms.
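
As a rough illustration of what such a dictionary has to relate, the following Python sketch links variants, inflected forms and derived forms to a single entry. It does not reproduce NooJ's dictionary format: the Entry class, its fields, the category tags and the build_index helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    lemma: str
    category: str                                  # hypothetical tag
    variants: list = field(default_factory=list)   # orthographical variants
    inflected: dict = field(default_factory=dict)  # form -> features
    derived: dict = field(default_factory=dict)    # form -> category


def build_index(entries):
    """Map every surface form to (lemma, kind of relation) pairs."""
    index = {}
    for e in entries:
        index.setdefault(e.lemma, []).append((e.lemma, "lemma"))
        for form in e.variants:
            index.setdefault(form, []).append((e.lemma, "variant"))
        for form in e.inflected:
            index.setdefault(form, []).append((e.lemma, "inflected"))
        for form in e.derived:
            index.setdefault(form, []).append((e.lemma, "derived"))
    return index


tsar = Entry(
    lemma="tsar",
    category="N",
    variants=["czar", "tzar"],
    inflected={"tsars": "plural"},
    derived={"tsarist": "A"},
)
index = build_index([tsar])
print(index["czar"])     # [('tsar', 'variant')]
print(index["tsarist"])  # [('tsar', 'derived')]
```

Keeping all forms attached to one entry means a lookup on any of them leads back to the same lemma.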

NooJ includes tools to check, debug, adapt, maintain, and share dictionaries and grammars.