This document is
a "work in progress"
Last update: August 03, 2001
Document by: Thomas Ringate
Copyright © 2001
Contributing Authors: Dr. Richard S. Wallace; Anthony Taylor; Jon Baer
Given only the <pattern> and <template> tags, there are three general types of categories.
Strictly speaking, the three types overlap, because "atomic" and "default" refer to the <pattern> and "recursive" refers to a property of the <template>.
"Atomic"
categories are those with atomic patterns, i.e. the
pattern contains no wild card "*" or
"_" symbol. Atomic categories are the easiest,
simplest categories to add in AIML.
<category>
<pattern>WHAT IS A CIRCLE</pattern>
<template><set_it>A circle</set_it> is
the set of points equidistant from a common point called
the center.
</template>
</category>
The above category does the
following:
- Matches the client input of "What is a
circle"
- Sets the "IT" variable to the value of
"A circle"
- Sends the client the response: "A circle is the
set of points equidistant from a common point called the
center"
The name "default category"
derives from the fact that its pattern has a wildcard
"*" or "_". The ultimate default
category is the one with
<pattern>*</pattern>, which matches any
input. In the ALICE distribution the ultimate default
category resides in a file called
"std-pickup.aiml". These default responses are
often called "pickup lines" because they
generally consist of leading questions designed to focus
the client on known topics.
The more common default categories have patterns
combining a few words and a wild card. For example the
category:
<category>
<pattern>I NEED HELP *</pattern>
<template>Can you ask for help in the form of a
question?</template>
</category>
responds to a variety of inputs from "I need help
debugging my program" to "I need help with my
marriage." Putting aside the philosophical question
of whether the robot really "understands" these
inputs, this category elicits a coherent response from
the client, who at least has the impression of the robot
understanding the client's intention.
Default categories show that writing AIML is both an art
and a science. Writing good AIML responses is more like
writing good literature, perhaps drama, than like writing
computer programs.
"Recursive"
categories are those that "map" inputs to other
inputs, either to simplify the language or to identify
synonymous patterns.
Many synonymous inputs have the same response. This is
accomplished with the recursive <srai> tag. Take
for example the input "GOODBYE". This input has
dozens of synonyms: "BYE", "BYE BYE",
"CYA", "GOOD BYE", and so on. To map
these inputs to the same output for GOODBYE we use
categories like:
<category>
<pattern>BYE BYE</pattern>
<template><srai>GOODBYE</srai></template>
</category>
Simplification or reduction of complex input patterns is
another common application for recursive categories. In
English the question "What is X" could be asked
many different ways: "Do you know what X is?",
"Tell me about X", "Describe X",
"What can you tell me about X?", and "X is
what?" are just a few examples. Usually we try to
store knowledge in the most concise, or common form. The
<srai> function maps all these forms to the base
form:
<category>
<pattern>DO YOU KNOW WHAT * IS</pattern>
<template><srai>WHAT IS
<star/></srai></template>
</category>
The <star/> tag substitutes the value matched by
"*", before the recursive call to <srai>.
This category transforms "Do you know what a circle
is?" to "WHAT IS A CIRCLE", and then finds
the best match for the transformed input.
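The reduction described above can be sketched in a few lines of Python. This is only an illustration, not the ALICE implementation: the function name respond_what_is, the ATOMIC table, and the fallback reply are all made up for the example, and the input is assumed to be normalized already.

```python
# Hypothetical miniature of <srai> with <star/>; not the ALICE implementation.
ATOMIC = {
    "WHAT IS A CIRCLE": "A circle is the set of points equidistant "
                        "from a common point called the center.",
}

PREFIX, SUFFIX = "DO YOU KNOW WHAT ", " IS"


def respond_what_is(text):
    """Return the reply for an already-normalized input."""
    if text in ATOMIC:
        return ATOMIC[text]
    # Stand-in for the pattern DO YOU KNOW WHAT * IS; "*" must match something.
    if (text.startswith(PREFIX) and text.endswith(SUFFIX)
            and len(text) > len(PREFIX) + len(SUFFIX)):
        star = text[len(PREFIX):len(text) - len(SUFFIX)]
        # Like <srai>WHAT IS <star/></srai>: feed the rewritten input back in.
        return respond_what_is("WHAT IS " + star)
    return "I have no answer for that."  # made-up fallback reply


print(respond_what_is("DO YOU KNOW WHAT A CIRCLE IS"))
```

The recursive call is the whole point: the second category holds no knowledge of its own and simply hands the simplified question to whichever category matches it best.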
Another fairly common application of recursive categories
is what might be called "parsing", except that
AIML doesn't really parse natural language. A better term
might be "partitioning" because these AIML
categories break down an input into two (or more) parts,
and then combine their responses back together.
If a sentence begins with "Hello..." it doesn't
matter what comes after the first word, in the sense that
the robot can respond to "Hello" and whatever
is after "..." independently. "Hello my
name is Carl" and "Hello how are you" are
quite different, but they show how the input can be
broken into two parts.
The category:
<category>
<pattern>HELLO *</pattern>
<template><srai>HELLO</srai>
<sr/>
</template>
</category>
accomplishes the input partitioning by responding to
"HELLO" with <srai>HELLO</srai> and
to whatever matches "*" with <sr/>. The
response is the result of the two partial responses
appended together.
The above example
assumes that there is an ATOMIC category of
<pattern>HELLO</pattern>.
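As a rough sketch of the partitioning idea, the Python below answers "HELLO" and the remainder separately and appends the two replies. The function name respond_hello and all three reply strings are invented for the example; a real robot would look both parts up in its loaded categories.

```python
def respond_hello(text):
    """Reply to a normalized input, splitting off a leading HELLO."""
    if text == "HELLO":                 # the ATOMIC category assumed above
        return "Hi there!"              # made-up reply
    if text == "HOW ARE YOU":
        return "I am fine, thank you."  # made-up reply
    if text.startswith("HELLO "):
        rest = text[len("HELLO "):]     # whatever matched "*"
        # <srai>HELLO</srai> followed by <sr/>: two partial replies, appended.
        return respond_hello("HELLO") + " " + respond_hello(rest)
    return "Tell me more."              # made-up default reply


print(respond_hello("HELLO HOW ARE YOU"))
```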
Program D has a class called Substituter that performs a number of grammatical and syntactical substitutions on strings. One task involves preprocessing sentences to remove ambiguous punctuation to prepare the input for segmentation into individual sentence phrases. Another task expands all contractions and converts all letters to upper case; this process is called "normalization".
The Substituter class also performs some spelling correction.
(See also the question "What is <person/>?")
One justification for removing all punctuation from inputs is the need to make ALICE compatible with speech input systems, which of course do not detect punctuation (unless the speaker utters the actual word for the punctuation mark -- "period").
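The Python sketch below shows roughly what normalization does to an input. It is not the Substituter class: the function name normalize and the three-entry contraction table are assumptions made for illustration, and the sentence splitting is deliberately naive (it would wrongly split on abbreviations such as "Dr.").

```python
import re

# Illustrative only: a tiny contraction table, not Program D's substitution list.
CONTRACTIONS = {
    "DON'T": "DO NOT",
    "I'M": "I AM",
    "WHAT'S": "WHAT IS",
}


def normalize(text):
    """Split into sentences, expand contractions, drop punctuation, upper-case."""
    sentences = []
    for raw in re.split(r"[.!?]+", text):
        words = []
        for word in raw.upper().split():
            # Look the word up without surrounding punctuation; keep it if unknown.
            word = CONTRACTIONS.get(word.strip(",;:\"()"), word)
            # Remove any punctuation that is still attached.
            word = re.sub(r"[^A-Z0-9 ]", "", word)
            if word:
                words.append(word)
        if words:
            sentences.append(" ".join(words))
    return sentences


print(normalize("Hello! What's a circle?"))
```

Each string in the returned list would then be matched against the patterns on its own.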
When a client enters an input, the program scans the
categories to find the best match. By comparing the input with
the patterns in the following order, the algorithm ensures that
the most specific pattern matches first. "Specific" in
this case has a formal definition, but basically it means that
the program finds the "longest" pattern matching an
input.
Search order:
ATOMIC with a THAT
ATOMIC
DEFAULT with a THAT
DEFAULT
Example:
What type of heaters do you have?
will match the ATOMIC: "WHAT TYPE OF
HEATERS DO YOU HAVE"
and not the REDUCTION of: WHAT TYPE OF *
The ATOMIC category will always take precedence
over any other type of category, other than another ATOMIC
with a THAT.
If you have two identical
patterns, but one has a THAT, then the THAT category will take
precedence over the ATOMIC category, if the THAT matches the
bot's previous response.
If neither of the above is true, then a REDUCTION
that matches part of the pattern will give its response, and
finally if none of the above matches, then the catch-all or
pickup will take over.
Any categories that are contained within a TOPIC
section will be searched first if the current setting of TOPIC
matches a TOPIC section. This results in an
extension of the search order to the following:
ATOMIC with a TOPIC and a THAT
ATOMIC with a TOPIC
DEFAULT with a TOPIC and a THAT
DEFAULT with a TOPIC
ATOMIC with a THAT
ATOMIC
DEFAULT with a THAT
DEFAULT
The TOPIC sections are always
searched first if they match the current setting of TOPIC. This
permits the botmaster to have identical category patterns within
a TOPIC section and in the GENERAL section.
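The eight-step search order can be expressed as a sort key, as in the Python sketch below. Everything here is an assumption made for illustration (the Category tuple, pattern_matches, rank, find_match); the real program searches its own internal structure. Two simplifications are worth noting: THAT is compared as a plain string rather than as a pattern, and "longest pattern" is approximated by counting words.

```python
import re
from typing import NamedTuple, Optional


class Category(NamedTuple):
    pattern: str
    template: str
    that: Optional[str] = None   # required previous bot reply, if any
    topic: Optional[str] = None  # enclosing TOPIC section, if any


def pattern_matches(pattern, text):
    """True if normalized text fits the pattern; * and _ cover one or more words."""
    parts = [r"\S+(?: \S+)*" if w in ("*", "_") else re.escape(w)
             for w in pattern.split()]
    return re.fullmatch(" ".join(parts), text) is not None


def rank(cat):
    """Smaller tuples are searched first: TOPIC, then ATOMIC, then THAT."""
    atomic = "*" not in cat.pattern and "_" not in cat.pattern
    # Last element: prefer longer patterns, a rough stand-in for "most specific".
    return (cat.topic is None, not atomic, cat.that is None,
            -len(cat.pattern.split()))


def find_match(categories, text, last_reply=None, topic=None):
    """Return the best-ranked category that fits, or None if nothing fits."""
    candidates = [
        c for c in categories
        if pattern_matches(c.pattern, text)
        and (c.that is None or c.that == last_reply)
        and (c.topic is None or c.topic == topic)
    ]
    return min(candidates, key=rank, default=None)


cats = [
    Category("*", "pickup"),
    Category("WHAT TYPE OF *", "reduction"),
    Category("WHAT TYPE OF HEATERS DO YOU HAVE", "atomic"),
]
print(find_match(cats, "WHAT TYPE OF HEATERS DO YOU HAVE").template)
```

With only the three categories above, the heater question selects the ATOMIC category, a different "WHAT TYPE OF ..." question falls to the reduction, and anything else falls to the pickup.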
The wild-card character "*" comes before "A" in alphabetical order. For example, the "WHAT *" pattern is more general than "WHAT IS *". The default pattern "*" is first in alphabetical order and the most general pattern. For convenience AIML also provides a variation on "*" denoted "_", which comes after "Z" in alphabetical order.
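This ordering happens to coincide with ASCII character order, where "*" sorts before "A" and "_" sorts after "Z". The short Python sketch below (the pattern list is invented for the example) sorts patterns word by word and reproduces it.

```python
patterns = ["WHAT IS *", "_ RIGHT NOW", "WHAT *", "ZEBRA", "*"]

# Compare patterns word by word, using plain character order within each word.
ordered = sorted(patterns, key=str.split)

print(ordered)
# ['*', 'WHAT *', 'WHAT IS *', 'ZEBRA', '_ RIGHT NOW']
```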
No, the order is maintained internally when the categories load, but you can write them in any order.
If your session with program B included a "Classify" routine, then the AIML script is stored in order of category activation rank. In other words, program B stores the most frequently accessed category (usually '*') first, the second most frequently next, and so on. If a number of categories have the same activation count, program B saves them in alphabetical order by pattern. Hence, if the session did not include a "classify" routine, the program stores all the categories in alphabetical order by pattern (because they all have an activation count of zero).
One reason to store the categories in order by activation is to make the Applet interface more natural. Because the Applet interface starts simultaneously with a thread to load the robot source file, the Applet client can talk with the robot before all the categories are fully loaded. Given that the interlocutor is more likely to say something that activates a more frequently activated category, it makes sense to transmit these categories first. Storing the *.aiml files in order of category activation achieves the desired effect. The Applet loads the most frequent categories first, and continues loading in the background while the conversation begins.
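The storage order described above amounts to a two-part sort: activation count descending, then pattern ascending. The Python sketch below is only an illustration; the function name storage_order and the sample counts are made up, and program B's actual file-writing code is not shown in this document.

```python
def storage_order(categories):
    """Sort (pattern, activation_count) pairs: most active first, ties by pattern."""
    return sorted(categories, key=lambda c: (-c[1], c[0]))


sample = [("WHAT IS *", 3), ("*", 10), ("HELLO", 3), ("BYE", 0)]
print(storage_order(sample))
# [('*', 10), ('HELLO', 3), ('WHAT IS *', 3), ('BYE', 0)]
```

When every count is zero, the first key is the same for all categories and the result is purely alphabetical by pattern.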
In general there are a lot of categories whose job is "symbolic reduction". The category:
<category>
<pattern>ARE YOU VERY *</pattern>
<template><srai>ARE YOU
<star/></srai></template>
</category>
This category [in std-brain.aiml] will reduce "Are you very very smart" to "Are you smart".
AIML is extensible. You can create an infinite number of new tags for foreign language pronouns, predicates, or application-specific properties. "Predicate tags" mean tags that have a client-specific "set" and "get" method. Pronouns like "it" have predicate tags like <set name="it"></set>. AIML has a number of these built-in tags for common English pronouns.
Using the <set
name="xxxx">
and <get name="xxxx">
tags an endless variety of languages and possibilities can be
supported.
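One way to picture client-specific predicates is a table keyed first by client and then by predicate name. The Python class below is a hypothetical sketch, not part of any ALICE program; the class name PredicateStore, its method names, and the client identifiers are all assumptions.

```python
class PredicateStore:
    """Per-client predicate values, loosely mirroring <set name="..."> and <get name="...">."""

    def __init__(self):
        self._values = {}  # client id -> {predicate name -> value}

    def set(self, client_id, name, value):
        self._values.setdefault(client_id, {})[name] = value
        # Return the value so it can also appear in the reply,
        # as "A circle" does in the WHAT IS A CIRCLE example.
        return value

    def get(self, client_id, name, default=""):
        return self._values.get(client_id, {}).get(name, default)


store = PredicateStore()
store.set("client-1", "it", "A circle")
print(store.get("client-1", "it"))  # the value set for this client
print(store.get("client-2", "it"))  # another client has no value yet
```

Because the predicate name is just a string key, adding a pronoun for another language needs no new code in a design like this.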
Understanding recursion is important to understanding AIML. "Recursion" means applying the same solution over and over again, to smaller and smaller problems, until you reduce the problem to its simplest form. AIML uses the tags <sr/> and <srai> to implement recursion. The botmaster uses these tags to tell the robot how to respond to a complex sentence by breaking it down into the responses to simpler ones.
Recursion can apply many times to a single input. Given the normalized input:
ALICE CAN YOU PLEASE TELL ME WHAT LINUX IS RIGHT NOW
an AIML category with the pattern "_ RIGHT NOW"
matches first,
reducing the input to:
ALICE CAN YOU PLEASE TELL ME WHAT LINUX IS
Another pattern ("<bot name="name"/> *") reduces it to:
CAN YOU PLEASE TELL ME WHAT LINUX IS
And then:
PLEASE TELL ME WHAT LINUX IS
reduces to:
TELL ME WHAT LINUX IS
and finally to:
WHAT IS LINUX
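The chain of reductions above can be imitated with a small rewrite loop in Python. Only "_ RIGHT NOW" and the bot-name pattern are named in the text; the other three rules are assumptions chosen so that the sketch reproduces the listed steps, and the bot name is hard-coded as ALICE. Real AIML performs each step through <srai> and the normal pattern match rather than through a fixed rule list.

```python
import re

# (regex, rewrite) pairs; the last three are assumed, not taken from the ALICE files.
REDUCTIONS = [
    (r"(.+) RIGHT NOW", r"\1"),               # _ RIGHT NOW
    (r"ALICE (.+)", r"\1"),                   # <bot name="name"/> *
    (r"CAN YOU (.+)", r"\1"),                 # assumed: CAN YOU *
    (r"PLEASE (.+)", r"\1"),                  # assumed: PLEASE *
    (r"TELL ME WHAT (.+) IS", r"WHAT IS \1"), # assumed: TELL ME WHAT * IS
]


def reduce_input(text):
    """Apply one reduction at a time until none matches; return every form seen."""
    steps = [text]
    changed = True
    while changed:
        changed = False
        for regex, rewrite in REDUCTIONS:
            m = re.fullmatch(regex, steps[-1])
            if m:
                steps.append(m.expand(rewrite))
                changed = True
                break  # restart from the first rule on the new, shorter input
    return steps


for step in reduce_input("ALICE CAN YOU PLEASE TELL ME WHAT LINUX IS RIGHT NOW"):
    print(step)
```

Every rule produces a shorter string than it consumed, so the loop is guaranteed to stop.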
If your reply contains the markup
<system>yourcommand <id/></system>
then the robot will insert the (virtual) client IP into the command line argument for "yourcommand". Then it is up to "yourcommand" to enforce access privileges.
If you are fortunate enough to be running lynx under Linux, the following markup is a simple way to "inline" the results of an HTTP request into the chat robot reply. Try asking ALICE: "What chatterbots do you know?" and she will reply with a page of links generated by the Google search engine.
<category>
<pattern>WHAT *</pattern>
<template>
Here is the information I found:
<system>
lynx -dump -source -image_links
http://www.google.com/search?q=<personf/>
</system>
</template>
</category>
Yes. You can include any HTML including <javascript> tags. Suppose you want to "chat AND browse," in other words, have the robot open up a new browser window when she provides a URL link. Here's a category that kicks out a piece of HTML/scripting that opens a new window and loads a given URL. This is handy for search engines or showing off one's web page.
<category>
<pattern> WHERE IS YOUR WEB SITE </pattern>
<template>
It's at "http://www.geocities.com/krisdrent/"
<javascript language="JavaScript">
// Go to <a
href="http://www.geocities.com/krisdrent">The ALICE
Connection</a>
<!--
window.open("http://www.geocities.com/krisdrent/")
-->
</javascript>
</template>
</category>
A couple of things to note about this technique:
PSAE
AIML broadly breaks down into two parts: "Pattern-side AIML
expressions" that can appear in the <pattern>,
<that>, and <topic>, and "Template-side AIML
expressions" that appear inside the <template>.
Pattern-side AIML expressions (PSAE):
- NORMALIZED TEXT
- _, *, and <bot name="name"/> (at present)
TSAE
TSAE expressions consist of ordinary text, optionally
marked up with all the other tags. Generally speaking, it
doesn't make sense to use PSAEs in the
template or TSAEs in the pattern, topic or
<that>...</that>. The sole exception at this
point is <bot name="name"/>.