For some time, I’ve wanted to write about the origins of Structure Synth – in particular Context Free and the design grammar approach.
The structure of both ordinary languages and computer languages have been studied for quite some time.
Formal grammars provide a way to exactly describe the structure of all valid sentences for a language. A formal grammar consists of production rules which describe valid transformations of symbol sequences. For instance a production rule might state that an English sentence consists of either a ‘Simple Sentence’ of a ‘Compound Sentence’.
Of course we cannot stop here – ‘Simple sentence’ is not a terminal symbol – it needs to be replaced and broken down into one or more terminal symbols. The terminal symbols in English would be the actual English words, while the non-terminal symbols would be structural placeholders such as ‘Simple sentence’ or ‘Verb phrase’.
Here is an example for an incomplete grammar for the English language:
[English Sentence] = [Simple Sentence] | [Compound Sentence]
[Simple Sentence] = [Declarative Sentence] | [Interrogative Sentence] | [Imperative Sentence] | [Conditional Sentence]
[Declarative Sentence] = [subject] [predicate]
[subject] = [simple subject] | [compound subject]
[simple subject] = [noun phrase] | [nominative personal pronoun]
[nominative personal pronoun] = "I" | "you" | "he" | "she" | "it" | "we" | "they"
[predicate] = ([verb] | [verb phrase]) [complement]
[verb] = [linking verb] | ...
[linking verb] = "am" |"are" |"is" | "was"| "were" | ....
The rules above are only a tiny fragment (see more here).
While formal grammars are perhaps mostly used in Computer Science for parsing and checking the syntax of computer languages, they have interesting applications in other domains. For instance, it is possible to create a formal grammar for a specific domain of a natural language, and then use the grammar to generate random sentences:
SCIgen is an example of a generator, which builds random computer science papers. As can be seen from the example above it is not difficult to generate a valid sentence if the formal grammar is at hand – just pick your start symbol (e.g. ‘English Sentence’) and choose at random one of the possible substitutions. Continue doing this until only terminal symbol are left.
SCIgen actually managed to get one of its generated papers accepted at the WMSCI 2005 conference. Here is a fragment of the grammar used:
EVAL_ANALYZE_ONE = note the heavy tail on the CDF in EXP_FIG, exhibiting DIFFERENT EVAL_MEASUREMENT
EVAL_ANALYZE_ONE = the many discontinuities in the graphs point to DIFFERENT EVAL_MEASUREMENT introduced with our hardware upgrades
EVAL_ANALYZE_ONE = bugs in our system caused the unstable behavior throughout the experiments
EVAL_ANALYZE_ONE = Gaussian electromagnetic disturbances in our EXP_WHERE caused unstable experimental results
EVAL_ANALYZE_ONE = operator error alone cannot account for these results
Here the upper cased words are the non-terminals, which are to be substituted. The complete grammar (with ~3000 production rules) can be found here: scirules.in. Also more respectable institutions have accepted SCIgen papers – in 2007 the Elsevier journal ‘Applied Mathematics and Computation’ accepted this article (now removed – but read more about this at the SCIgen blog).
Another spectacular (and funny) example of grammar generated text is the Postmodernism generator (based on the Dada Engine). It creates postmodernistic ramblings that would have made Alan Sokal proud!
Context Free Grammars
In the fifties Noam Chomsky categorized the formal grammars into four classes.
This classification can be understood in terms of how the productions rules look: For instance, for the third class – the socalled Context Free Grammars – the left hand side of the production rule is always a single non-terminal symbol (this is also the case for the natural language examples above). This means, that it is possible to apply the production rule no matter where the given non-terminal symbol is placed – the production rule is not dependent on the context of the symbol.
While formal grammars have interesting applications in computer science and natural language research, they are merely tools for analyzing or generating symbol sequences. In Part II I’ll describe the Context Free Design Grammar used in Context Free, and its relation to the EisenScript used in Structure Synth.