The roadmap for the Texai project is guided by these principles, which are viewed from a different perspective in this post about the challenge ahead.

  • We must incrementally build up the capabilities of the system, having an operational system each step of the way to ensure that components and their interfaces are valid
  • At each step of development, the intelligent system is operated with real sensing and action
  • Components are developed in an order such that later components use the facilities provided by earlier components
  • Bootstrap development is entirely in Java
  • Existing Java libraries are preferred; as a less favored alternative, an existing application can be rewritten from scratch
  • Each component is based upon a technology which is known to work, so that the project risk for AI-hard tasks is reduced to the component combination and elaboration
  • All facts and rules are stored in the knowledge base (KB). The Semantic Annotation for Persistence component, RDF Entity Manager, facilitates storing any Java object as RDF statements in the Sesame RDF store.
  • During the Bootstrap Stage, each component should be written or integrated by me
  • Each component should be released in a manner that suits its incorporation into other projects
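
To make the idea of storing a Java object as RDF statements concrete, here is a minimal sketch that uses only the Java standard library. It is not the RDF Entity Manager API and it does not talk to Sesame; the class names (Triple, WordSense, TripleSketch), the example.org namespace, and the field-to-predicate mapping are all assumptions made for illustration. Each non-static field of an object becomes one subject-predicate-object statement, printed as an N-Triples-like string with every value treated as a plain literal and no escaping applied.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** One subject-predicate-object statement, kept as plain strings for illustration. */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override
    public String toString() {
        return subject + " " + predicate + " " + object + " .";
    }
}

/** Hypothetical domain object; its field names become predicate names. */
final class WordSense {
    String lemma = "bank";
    String gloss = "sloping land beside a body of water";
    int senseNumber = 1;
}

final class TripleSketch {
    // Hypothetical namespace, not one used by Texai.
    static final String NS = "http://example.org/sketch#";
    static final String RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";

    /** Emits one type statement plus one statement per non-null instance field. */
    static List<Triple> toTriples(String id, Object entity) {
        List<Triple> triples = new ArrayList<>();
        String subject = "<" + NS + id + ">";
        String type = "<" + NS + entity.getClass().getSimpleName() + ">";
        triples.add(new Triple(subject, RDF_TYPE, type));
        for (Field field : entity.getClass().getDeclaredFields()) {
            // Skip static and compiler-generated fields; only instance state is persisted.
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            try {
                field.setAccessible(true);
                Object value = field.get(entity);
                if (value != null) {
                    // Assumption: every value is written as a plain string literal.
                    triples.add(new Triple(subject, "<" + NS + field.getName() + ">",
                            "\"" + value + "\""));
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        for (Triple triple : toTriples("bank-1", new WordSense())) {
            System.out.println(triple);
        }
    }
}
```

A real persistence layer would also need typed literals, references between objects, and a way to load the statements back into objects, none of which this sketch attempts.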

This project’s roadmap has as its goal the creation of artificial intelligence. Thus it is expected that many of the component combinations or component elaborations will be AI-hard problems (i.e. problems which are beyond the ability of conventional software, but which a human can solve). Each such component combination/elaboration will be based upon my assessment of the best approach as evidenced by decades of research.

In brief, the approach will be to first construct an English dialog system whose initial goals are to acquire linguistic and common sense skills to improve its own performance. Next the system will acquire expertise in algorithms, and in Java programming for the purpose of explicitly representing its own behavior in the KB. Thus it will understand, revise, test and automatically compose its own source code. In parallel at this point, the system will acquire lexical and common sense knowledge from the glosses (word sense definitions) in the Texai lexicon, and begin to convert Wikipedia English text into KB statements, fleshing out the OpenCyc terms. In addition to scaling to many disparate users via Jabber chat from a single Texai instance, the system will be deployed as a virtual appliance to compute clusters and to a multitude of Internet users, where each instance hosts one or more nodes organized within an Albus Hierarchical Control System. These Albus Nodes (i.e. agents) will be organized into agencies, many mirroring current human organizations in which a node is a user’s proxy into the Albus HCS for some role. The artificial intelligence will then consist of a vast community of organizations whose members are Albus nodes, each quite intelligent with regard to its agency’s mission.

Bootstrap Stage

Bootstrap in the context of the Texai dialog system means that I, as sole contributor, Java programmer, and linguistics technician, create a tool whereby a great number of people can subsequently voluntarily contribute without being Java programmers or linguists. The Texai Bootstrap Stage is the process and set of milestones required to create such a tool. During this stage I will manually perform the development life-cycle for the essential dialog Java programs. Likewise I will manually compose the essential grammar constructions. By essential, I mean just those minimal features that permit the dialog tool to acquire new knowledge, new behaviors (i.e. scripts that compile to Java code), and new grammar constructions from a multitude of contributors. This post describes the dialog system architecture.

  1. Create an initial RDF knowledge base derived from OpenCyc (done September 2006)
  2. Incorporate a scalable RDF store (done September 2006, revised to use Sesame 2 in August 2007)
  3. Create a facility to persist Java objects into an RDF store (released RDF Entity Manager, done December 2006)
  4. Create a Jabber chat interface (done November 2006 using the Smack API)
  5. Import the WordNet ontology into the Texai KB Lexicon and release as an RDF file (done September 2006, released September 2007)
  6. Import the CMU Pronouncing Dictionary into the Texai KB Lexicon and release as an RDF file (done November 2006, released September 2007)
  7. Import the Wiktionary into the Texai KB Lexicon and release as an RDF file (done November 2006, released September 2007)
  8. Integrate the Sphinx-4 automatic speech recognition and the FreeTTS speech synthesis engine (done January 2007, not yet released)
  9. Create a prototype language model plug-in for Sphinx-4 that intercepts calls to the N-gram model when evaluating the n-best list (done January 2007, not yet released)
  10. Create a Java version of the Fluid Construction Grammar engine (released November 2006, the original Lisp implementation provided via collaboration with Joachim De Beule at the Artificial Intelligence Lab, Free University of Brussels)
  11. Create an incremental Java-based implementation of the Fluid Construction Grammar engine (done and released January 2008)
  12. Adapt grammar construction rules from Double R Grammar (done and released January 2008)
  13. Incorporate the Kintsch Construction/Integration technique for pruning interpretations during parsing (done April 2008)
  14. Implement Capability Description Language to support skill acquisition, and implement a task-to-capability matcher to support skill performance (in progress as of May 2008)
  15. Create a controlled English vocabulary and grammar rules for the purpose of acquiring more vocabulary and grammar rules
  16. Create a knowledge-based intelligent dialog system that maintains a model of each user’s belief state, and which can guide users to communicate effectively using controlled English
  17. Create the ability to answer simple questions that do not involve deep deductive inference, nor induction, nor abduction
  18. Deploy a chat-based system to acquire the lexical knowledge for a broad coverage of English with special emphasis on the ability to read (and to fully understand) the glosses in its own lexicon [AI-hard]
  19. Begin to acquire algorithmic knowledge, not yet including detailed Java programming knowledge, but sufficient that new skills can be acquired via teaching [AI-hard]
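
The task-to-capability matcher in milestone 14 can be pictured with a small sketch. The roadmap does not spell out the Capability Description Language, so everything here is an assumption: a capability is reduced to a name plus the inputs it requires and the outputs it produces, a task is reduced to the inputs on hand and the outputs wanted, and the class names (Capability, Task, CapabilityMatcher) are hypothetical. A capability matches when the task can supply all of its inputs and it produces everything the task asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical capability description: what a skill needs and what it yields. */
final class Capability {
    final String name;
    final Set<String> requiredInputs;
    final Set<String> producedOutputs;

    Capability(String name, Set<String> requiredInputs, Set<String> producedOutputs) {
        this.name = name;
        this.requiredInputs = requiredInputs;
        this.producedOutputs = producedOutputs;
    }
}

/** Hypothetical task description: what is available and what is wanted. */
final class Task {
    final Set<String> availableInputs;
    final Set<String> desiredOutputs;

    Task(Set<String> availableInputs, Set<String> desiredOutputs) {
        this.availableInputs = availableInputs;
        this.desiredOutputs = desiredOutputs;
    }
}

final class CapabilityMatcher {
    /** Returns the capabilities whose inputs the task can supply and whose outputs cover its goal. */
    static List<Capability> match(Task task, List<Capability> known) {
        List<Capability> matches = new ArrayList<>();
        for (Capability capability : known) {
            boolean inputsSatisfied = task.availableInputs.containsAll(capability.requiredInputs);
            boolean outputsCovered = capability.producedOutputs.containsAll(task.desiredOutputs);
            if (inputsSatisfied && outputsCovered) {
                matches.add(capability);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Capability> known = List.of(
                new Capability("parse-sentence", Set.of("text"), Set.of("parse-tree")),
                new Capability("speak", Set.of("text"), Set.of("audio")));
        Task task = new Task(Set.of("text", "user-model"), Set.of("parse-tree"));
        for (Capability capability : match(task, known)) {
            System.out.println("Matched: " + capability.name);
        }
    }
}
```

Exact set containment is the simplest possible matching rule. A knowledge-based matcher would presumably reason over the ontology instead, for example accepting a capability that produces a more specific kind of output than the task requested.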

Mass Contribution Stage

At this point the system will perform mixed-initiative dialog and entirely self-directed activities. The system is to be organized as an Albus Hierarchical Control System (AHCS). One can think of an AHCS as a model of a large human organization, in which the humans are the intelligent agents, and the corporate or military structure is the network.
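
The organizational analogy can be sketched in a few lines of Java. This is only a generic picture of hierarchical control, assuming a hypothetical ControlNode class: a command enters at the top, each node splits it into numbered sub-goals for its subordinates, and status reports flow back up the chain. It omits the sensing, world modeling, and decision making that an actual control node would need.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node in a command hierarchy; leaves act, inner nodes delegate. */
final class ControlNode {
    final String role;
    final List<ControlNode> subordinates = new ArrayList<>();

    ControlNode(String role) {
        this.role = role;
    }

    /** Adds a subordinate and returns this node so that hierarchies can be built fluently. */
    ControlNode add(ControlNode subordinate) {
        subordinates.add(subordinate);
        return this;
    }

    /** Sends a goal down the hierarchy and returns the status reports that flow back up. */
    List<String> command(String goal) {
        List<String> reports = new ArrayList<>();
        if (subordinates.isEmpty()) {
            // A leaf node carries out the goal itself.
            reports.add(role + " did: " + goal);
            return reports;
        }
        int part = 1;
        for (ControlNode subordinate : subordinates) {
            // Assumption: decomposition is just numbering the sub-goals.
            reports.addAll(subordinate.command(goal + "/" + part));
            part++;
        }
        reports.add(role + " supervised: " + goal);
        return reports;
    }
}

final class HierarchySketch {
    /** Builds a small three-level organization for the demonstration. */
    static ControlNode sampleOrganization() {
        ControlNode managerA = new ControlNode("managerA")
                .add(new ControlNode("worker1"))
                .add(new ControlNode("worker2"));
        ControlNode managerB = new ControlNode("managerB")
                .add(new ControlNode("worker3"));
        return new ControlNode("director").add(managerA).add(managerB);
    }

    public static void main(String[] args) {
        for (String report : sampleOrganization().command("build")) {
            System.out.println(report);
        }
    }
}
```

In this picture an agency corresponds to a subtree, and a user acting in some role would stand behind one of its nodes.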

  • Engage a multitude of users to flesh out the commonsense OpenCyc-derived ontology. Collaboration with the Cyc Foundation
  • Create a Knowledge Dashboard, which communicates the topic of discourse, meta knowledge, knowledge trails, and related info. Collaboration with Stu Rogers at AGS TechNet
  • Enable Texai to acquire the domain knowledge and skills to understand, revise, test and to compose its own Java programs, or more efficient programs in machine language [AI-hard]
  • Deploy the system on a virtual appliance that automatically connects itself into the Albus Hierarchical Control System
  • Develop a robust AHCS architecture that can scale to millions of nodes
  • Create an agency (i.e. set of Albus nodes) to oversee the creation of other useful agencies
  • Enable the system to seek out and to motivate human mentors for various agencies
  • Have Texai fully understand a substantial variety of the sentences contained in the English Wikipedia, and begin to read it [AI-hard]
  • Give Texai the ability to compose article-length English compositions from its own knowledge [AI-hard]
  • Have Texai provide an International Knowledge Infrastructure with human-usable and machine APIs
  • Provide compelling benefits that motivate users to download a Texai instance for every capable computer and to use all spare cycles
  • Among commonsense knowledge acquired, have an emphasis on ethics and friendship theory, be able to model adversaries, and prevent its knowledge corruption by malicious users [AI-hard]
  • Configure Albus Nodes to control robots, using their physical sensors and actuators - single robots and coordinated teams
  • Using mentors, incorporate AI algorithms for the following [AI-hard]

Takeoff Stage

  • Develop great expertise in recursive self-improvement, towards the goal of maximal Friendliness
  • Achieve the Singularity, delivering benefits beyond what we can now project