Note that the system can currently understand and generate only a single utterance. Progress towards broader coverage of English awaits the completion of the Grammar Acquisition Skill and the Vocabulary Acquisition Skill.

I have completed coding the English Generation Skill and tested it on my use case utterance “the book is on the table”. As expected, writing the supporting Generation Rule Application Library took less effort than writing its counterpart on the parsing side, the Parsing Rule Application Library. Both libraries use the same rule set, and the method of rule application is similar. In my opinion, this is strong validation of the Fluid Construction Grammar engine that I adopted from Luc Steels’s work here.

As noted in my page on the English Comprehension Skill, my use case utterance “the book is on the table” has been elaborated with ambiguities to challenge the parser to find the right interpretation within the discourse context. Without much guidance from the literature on Natural Language Generation (NLG), I hypothesized that a cognitively plausible generation system should prune generation alternatives at the earliest possible point, and that the beam width of the generation interpretation tree should be 1. I selected the following features to score the generation alternatives, and I hope in the future to employ machine learning to optimise their weights.

Generated utterances are favored:

  • that have fewer words
  • that reuse previously uttered words for a given meaning term
  • that use words that the recipient is otherwise likely to know
  • that require the least effort in a trial parse of the (partial) utterance
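
As a rough illustration, the four preferences above could be combined into a single cost in which lower is better. This is only a sketch under stated assumptions: the `Candidate` class, the feature names, and the equal weights are hypothetical and are not taken from the Texai code, and the post itself leaves the weights to future machine learning.

```python
from dataclasses import dataclass

# Hypothetical equal weights; the post gives no values for them.
WEIGHTS = {"length": 1.0, "reuse": 1.0, "unfamiliar": 1.0, "parse_effort": 1.0}


@dataclass
class Candidate:
    """One (partial) generated utterance with its hypothetical feature values."""
    words: list
    reused_terms: int = 0      # meaning terms expressed with previously uttered words
    unfamiliar_words: int = 0  # words the recipient is unlikely to know
    parse_effort: float = 0.0  # effort measured by a trial parse (arbitrary units)


def cost(candidate, weights=WEIGHTS):
    """Return a cost for the candidate; lower cost means more favored."""
    return (
        weights["length"] * len(candidate.words)
        - weights["reuse"] * candidate.reused_terms
        + weights["unfamiliar"] * candidate.unfamiliar_words
        + weights["parse_effort"] * candidate.parse_effort
    )


short = Candidate("the volume is on the table".split(), parse_effort=1.0)
long_ = Candidate("the volume is on top of the table".split(), parse_effort=1.0)
print(cost(short) < cost(long_))  # the shorter utterance is favored
```

With a beam width of 1, only the lowest-cost partial utterance would be kept at each step, which is why the trial parse has to be cheap enough to run repeatedly.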

Presently I am satisfied with these features, while conceding that performing numerous trial parses during generation degrades the response time. For example, in the use case utterance, I included the following ambiguities for a total of six possible generation alternatives:

  • book, tome and volume were treated as synonyms for the meaning cyc:BookCopy (i.e. a book)
  • on and on top of were treated as equivalent constructions for the meaning texai:OnTopOf-SituationLocalized (i.e. the situation having something on top of something else)
  • book means either cyc:BookCopy or texai:SheetsOfPaperBoundTogetherOnOneEdge, in which the latter term can be used for a pad of paper or a matchbook

These are the six alternative generated utterances:

  1. the book is on the table
  2. the book is on top of the table
  3. the tome is on the table
  4. the tome is on top of the table
  5. the volume is on the table
  6. the volume is on top of the table
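
The count of six follows from crossing the three noun synonyms with the two locative constructions. A minimal sketch of that enumeration, assuming a fixed sentence template (the names `SUBJECT_NOUNS`, `LOCATIVES`, and `alternatives` are illustrative only; the actual system applies construction grammar rules rather than filling a template):

```python
from itertools import product

SUBJECT_NOUNS = ["book", "tome", "volume"]  # synonyms for cyc:BookCopy
LOCATIVES = ["on", "on top of"]             # texai:OnTopOf-SituationLocalized


def alternatives():
    """Cross every noun choice with every locative choice."""
    return [
        f"the {noun} is {locative} the table"
        for noun, locative in product(SUBJECT_NOUNS, LOCATIVES)
    ]


for number, utterance in enumerate(alternatives(), start=1):
    print(f"{number}. {utterance}")
```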

Alternatives (2), (4), and (6) are disfavored because they are longer. Alternative (1) is disfavored because the trial parse needs more effort to determine which word sense of book is meant, given that the discourse context contains a reference to a cyc:BookCopy. The preferred utterance was therefore a tie between (3) and (5), and the system generated choice (5). Because a human would not make the same choice, I believe that book and volume are not truly synonymous, even though WordNet groups them as such. I’ll revisit this use case in the Vocabulary Acquisition Skill, where the user will have the opportunity to elaborate the semantics of these words so that they can be generated more appropriately.
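
The tie can be reproduced with a toy cost, assuming one unit per word plus one unit of trial-parse effort for each word that has more than one sense in the use case. These unit costs and the names below are hypothetical, chosen only to mirror the reasoning above, and the sketch does not model how the system settled on (5).

```python
ALTERNATIVES = {
    1: "the book is on the table",
    2: "the book is on top of the table",
    3: "the tome is on the table",
    4: "the tome is on top of the table",
    5: "the volume is on the table",
    6: "the volume is on top of the table",
}

# "book" maps to two meaning terms in the use case, so it costs extra to parse.
AMBIGUOUS_WORDS = {"book"}


def toy_cost(utterance):
    """One unit per word, plus one unit per ambiguous word (assumed values)."""
    words = utterance.split()
    return len(words) + sum(1 for word in words if word in AMBIGUOUS_WORDS)


def preferred(alternatives):
    """Return the numbers of all alternatives that share the lowest cost."""
    costs = {number: toy_cost(text) for number, text in alternatives.items()}
    lowest = min(costs.values())
    return sorted(number for number, value in costs.items() if value == lowest)


print(preferred(ALTERNATIVES))  # [3, 5]
```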

The next task ahead for the Texai project is to write the Vocabulary Acquisition Skill. Unlike the most recently completed tasks, which built upon earlier work, this one will require more analysis and design, beginning with a set of use cases. To obtain data, I performed a word frequency analysis of the 324,000 glosses (definitions) from the Texai lexicon. The results are in the project repository at SourceForge here. If the system is taught lexical stem and morphological constructions that cover only the most frequently used 15% of words, it should be possible to automatically parse 50% of the word sense definitions. That is, about 13,000 words out of 85,000 can be used to parse approximately 150,000 of the 300,000 word senses contained in the lexicon. About 12,000 WordNet word senses are already mapped to the OpenCyc portion of the Texai ontology, and that’s a good start.