Artificial Intelligence Markup Language
- Editing Bot Content with a Spreadsheet Editor (Richard Wallace)
Be Your Own Botmaster: The Step by Step Guide to Creating, Hosting and Selling Your A. I. Chat Bot on Pandorabots (2005), by Dr. Richard S. Wallace (@drwallace)
Using a Spreadsheet or Database Program to Write AIML (pp 132-147)
There are many authoring tools and editors you can use to write AIML. You can use your favorite text editor, be it MS Word, Notepad, or a powerful editor like Emacs. In addition, the AIML free software community has developed many tools designed to make writing AIML easier. Pandorabots, for example, has a web-based interface that helps you write one AIML category at a time. It also has a tool called Pandorawriter that converts dialog transcripts into AIML categories. Other software to help write AIML categories is listed on the A. I. Foundation web site under www.alicebot.org/downloads. Because AIML is an XML language, you can also use editors specifically designed for XML to author your AIML files. This document concerns a different approach, however: one based on using a spreadsheet or database program to help write massive numbers of AIML categories.

We will take you through a step-by-step example of creating an AIML file using a spreadsheet program, specifically MS Excel. The principles and procedures are about the same for any spreadsheet or database program that lets you enter data in table format. There are a few pitfalls to using these programs, and we will point them out. Their advantage is that you can create a large number of AIML categories and manage them fairly easily. In particular, the ability to sort categories by <pattern> or <template> makes it easy, in some cases, to eliminate duplicate categories or find opportunities to simplify your AIML with <srai>.

The following example is a simple case of creating categories that have only a <pattern> and a <template>. More complex categories using <that> and <topic> do not appear in this example. After following the example, however, it should be easy to see how to generalize this AIML authoring technique to categories with <that> and <topic>.

One word of caution: a botmaster may end up wasting a lot of time creating AIML categories that will never be activated.
This is because it is difficult to predict in advance what kinds of conversations clients will have with your bot and what inputs they will give it. A common mistake is to create categories with patterns that are too specific ever to be activated in a realistic conversation. This is why we generally prefer the approach called “Targeting” to create AIML categories.

In its most general form, Targeting simply means reading the log files of conversations with your bot to get an idea of what inputs the bot cannot answer, and then writing new categories to handle those inputs. It is based on the principle that if one client makes a specific input to your bot, another client will come along later and make the same, or almost the same, input again. So it is more productive to focus your efforts on the inputs people have already tried on your bot than to try to predict in advance what those inputs will be. Believe us when we say that after your bot is running online and well publicized, you will collect plenty of conversation data to keep you busy writing AIML through the Targeting approach.

Some AIML programs, such as Pandorabots, have special software tools to make Targeting even more efficient. You won’t even have to read the conversation log files one by one. The software automatically detects client inputs for which the bot does not have a specific reply, and alerts the botmaster to these as potential new input patterns. If you are starting a bot from scratch, you can build up your bot’s brain by using Targeting to find the most common inputs and writing replies for those inputs. You can prioritize your work by writing AIML for common inputs first, and then working on less frequent input forms later.
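As a rough illustration of that prioritization, the sketch below counts the inputs a bot could not answer and lists them from most to least frequent. It is only a sketch under stated assumptions: the log format (a list of input text paired with a flag saying whether the bot had a specific reply) and the names `normalize`, `rank_targets`, and `LOG` are hypothetical, and it does not represent how the Pandorabots tools work internally.

```python
from collections import Counter


def normalize(text):
    """Uppercase the input and replace punctuation with spaces (a simplification)."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(kept.upper().split())


def rank_targets(log_entries):
    """Count inputs the bot had no specific reply for, most frequent first."""
    counts = Counter(
        normalize(text)
        for text, had_specific_reply in log_entries
        if not had_specific_reply
    )
    return counts.most_common()


# Hypothetical log: (client input, whether the bot had a specific reply).
LOG = [
    ("What is your favorite color?", False),
    ("Hello", True),
    ("what is your favorite color", False),
    ("Do you dream?", False),
]

for pattern, count in rank_targets(LOG):
    print(count, pattern)
```

The inputs at the top of the printed list are the candidates for new patterns to write first.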
This approach guarantees that your bot will have the greatest “coverage” of inputs for the amount of work you put in.

With that disclaimer about the Targeting approach made, there are some circumstances in which you just want to write a large number of AIML categories without referring to dialogues or Targets. In these cases, using a database or spreadsheet program may be a useful and timesaving approach.

We begin by observing that much AIML code is redundant XML, and that we would prefer to avoid typing the same <category><pattern></pattern><template></template></category> tags over and over for every new AIML category. The part that really interests us is what goes between the <pattern> and <template> tags. So we can use a form-entry program like MS Excel to create the data for our AIML file.

The first screenshot illustrates an example of using MS Excel to input a large number of AIML patterns and templates, using the A and B columns of the spreadsheet respectively.

Notice that we have adjusted the width of the A and B columns to take into account the expected size of our patterns and templates. Although this is not necessary, it makes the categories easier to read and provides better formatting if you want to print them out.

One convenience often provided by such programs is auto-completion: if you start to type the same thing again in the same column, the program matches what you have typed with a previous entry and completes the entry for you. This may not always give you what you want, but it often improves efficiency if you are entering many similar patterns or identical templates.

It is a good idea to save your work from time to time as you enter your AIML data, especially if you intend to create a large file. This example file is called Psychology.xls, so we use the File/Save menu option to repeatedly save that file as we add new data.
Eventually, the file filled up with 500 lines of data representing 500 new AIML categories.

Another great convenience of these programs is that you can sort the categories by different columns. For example, we can take the data we have entered and sort it by the A column by clicking on the A/Z button in MS Excel, or by choosing the Data/Sort menu option. As the next screenshot shows, we can click on the A column and sort the categories by AIML pattern. One note of caution here: if you are using Excel, be sure to select both the A and B columns before running the sort. Otherwise you run the risk of sorting the patterns independently of the templates and mixing up all your categories. Database programs, unlike spreadsheets, usually assume that the data is connected across every row, so sorting by any column keeps the row data together. In Excel, you can sort all the data by A or B, depending on which you select first, but it is important to select both.

The next screenshot shows how we have sorted the categories by A, the AIML pattern. This is extremely useful for finding specific categories or for eliminating categories with duplicate patterns. For instance, suppose we know that the input pattern BUT * appears in another AIML file, and is duplicated in this new data. We can easily find it by sorting and then delete the BUT * category.

Now we consider how to format our data into proper AIML categories. First, we use the Insert menu to choose the Insert Columns option. Select the A column first and insert a new column to its left. Next, select the column that holds the templates (now column C) and insert another new column to its left, so that an empty column sits between the patterns and the templates. The patterns are now in column B and the templates in column D.

Now, scroll down to the last row of data in your spreadsheet. It is important to start at the bottom because we are going to use the Fill command to fill the new A column with identical data. If we start at the top, Excel won’t know where to stop filling and will create too many empty AIML categories.
Go down to the last row of data and type <category><pattern> into the last row of the A column, as the next screenshot shows.

Now, select that last data entry box and use your cursor to move up to the first data row, thereby selecting all the data boxes from 499 (in this case) back up to 1. Then use the Edit menu to select the Fill/Up option, and you should see the A column fill up with identical entries of <category><pattern>. You may then want to adjust the width of the A column for appearance.

Now we basically repeat the same procedure in the C column by entering the data </pattern><template>, and again in the E column with the data </template></category>. Again, scroll down to the last row of data and use the Fill/Up option so you don’t overflow the columns with empty categories.

Now we are ready to convert the spreadsheet file to a text file and complete the conversion to proper AIML. Using the File menu, select the Save As… option. A dialog box will appear giving you the option to export the spreadsheet to many different file formats. For our purposes, the best choice is called “Text (tab delimited) *.txt”. Choosing this option will automatically suggest the file name Psychology.txt, because our original file was called Psychology.xls.

When you click the Save button, you may encounter a series of dialog boxes warning you about problems such as “The selected file type does not support multiple sheets” and “Psychology.txt may contain features that are not compatible with Text (tab delimited)”. Generally you can ignore these warnings and simply click OK or Yes.

After you have saved the file, you will need to use a text editor to make some final formatting touch-ups to create a well-formed AIML file.
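The sort, fill, and export steps can be approximated outside a spreadsheet, which may help in seeing what the exported text file should contain. The following is a minimal sketch, not a reproduction of Excel's behavior: it uses the `excel-tab` dialect of Python's `csv` module as a stand-in for the “Text (tab delimited)” export, and the sample rows and the names `ROWS`, `sort_and_dedupe`, and `export_tab_delimited` are hypothetical. Like the export described here, this dialect wraps a field in double quotes and doubles any quote marks inside it when the field contains a quote mark, but the real Excel output may differ in other details.

```python
import csv
import io

# Hypothetical (pattern, template) rows, standing in for the two data columns.
ROWS = [
    ("HELLO", "Hi there!"),
    ("BUT *", "But what?"),
    ("* REMINDS ME OF *", '<star index="2"/> is interesting.'),
    ("HELLO", "Hello again."),  # duplicate pattern
]


def sort_and_dedupe(rows):
    """Sort whole rows by pattern and keep the first row seen for each pattern."""
    seen = set()
    unique = []
    # Sorting the pairs together keeps every template with its own pattern.
    for pattern, template in sorted(rows, key=lambda row: row[0]):
        if pattern not in seen:
            seen.add(pattern)
            unique.append((pattern, template))
    return unique


def export_tab_delimited(rows):
    """Return tab-delimited text with the three filled columns around the data."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, dialect="excel-tab", lineterminator="\n")
    for pattern, template in rows:
        writer.writerow([
            "<category><pattern>",     # column A (filled)
            pattern,                   # column B (data)
            "</pattern><template>",    # column C (filled)
            template,                  # column D (data)
            "</template></category>",  # column E (filled)
        ])
    return buffer.getvalue()


print(export_tab_delimited(sort_and_dedupe(ROWS)))
```

In the printed text, the template containing index="2" comes out wrapped in extra double quotes with its inner quotes doubled, which is the kind of artifact the text editor touch-ups have to remove.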
At this point we often transfer the text file over to a Linux machine and use Emacs to make the final changes, but a text editor as simple as Notepad works equally well.

Let’s open the text file in Notepad and see what we have.

The first item of business is to eliminate all the tabs used as delimiters. This step is not strictly necessary for many AIML interpreters, because they will ignore the tabs or treat them as spaces. But eliminating them makes the file look nicer. With Notepad, you can use the Edit/Replace option to replace a Tab with “nothing”.

Sometimes it is not possible to type a Tab character directly into the Find What: text box, but you can get around this by copying and pasting a Tab character from your source. You don’t have to type anything in the Replace With: text box; just leave it empty and click Replace All.

Now we can save our work as an AIML file. Use the File/Save As… menu item and select Save As Type: All Files. Name your file Psychology.aiml (or whatever name you choose, as long as it has a .aiml file extension). There is only a little more work to do to finalize your AIML file.

If you look closely, you can see that the exported spreadsheet file contains some extra, unwanted double-quote marks. These were inserted in two cases: whenever your XML tag contained a quoted attribute value like index="2", and whenever quote marks appeared in the AIML template. Follow these steps to rewrite those categories:

1. Use Edit/Replace to replace all occurrences of "" (two double quotes) with " (one double quote).
2. Use Edit/Replace to replace all occurrences of >" with > (these occur at the beginning of a quoted <template>).
3. Use Edit/Replace to replace all occurrences of "< with < (these occur at the end of a quoted template).

Of course, these rules are not foolproof. You may have wanted to have quote marks around your template. You may have templates that contain, for whatever reason, a pair of double quotes together ("").
But apart from these unusual circumstances, the substitutions will clean up your AIML file quite well.

Finally, we need to add some text to the beginning and end of the AIML file to make it conform to the AIML schema. The end of the file is simple: just add a line that says </aiml>.

At the beginning of the file, you may want to include a copyright statement in XML comment form, as well as the XML declaration and the opening <aiml> tag.

With that, we have finished creating a well-formed AIML file, ready to upload to your favorite AIML interpreter. As we mentioned earlier, it should be easy to see how to create a similar file that includes <that> or <topic> patterns. In the case of <that>, you will start by entering three columns of data and fill the two columns between them with </pattern><that> and </that><template> respectively, in addition to the opening and closing columns used above. You can add any <topic> tags using the text editor.

In conclusion, you can use a spreadsheet or database program to write large numbers of AIML categories efficiently. The file export features of these programs let you convert the data from two- or three-column format to delimited text. Depending on which data entry program you use and its available file export functions, you may have to use a text editor to touch up the file and finalize its AIML format. These procedures may be helpful in some AIML authoring scenarios, but you should also consider other options such as Targeting and AIML-specific authoring tools.
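For readers who prefer a script to a text editor, the touch-up steps (tab removal, the three quote substitutions, and the added opening and closing lines) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than part of the original procedure: the sample text and the names `HEADER`, `FOOTER`, `clean_exported_text`, `to_aiml`, and `EXPORTED` are hypothetical, the copyright comment is a placeholder, and the bare <aiml> tag is an assumption (real files often add a version attribute). The final parse with Python's standard `xml.etree.ElementTree` module only checks that the result is well-formed XML, not that it conforms to the AIML schema.

```python
import xml.etree.ElementTree as ET

# Assumed header: XML declaration, a placeholder copyright comment, and a
# bare <aiml> tag (real files often add a version attribute).
HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<!-- Copyright notice goes here -->\n"
    "<aiml>\n"
)
FOOTER = "</aiml>\n"


def clean_exported_text(text):
    """Remove tab delimiters, then apply the three quote substitutions in order."""
    text = text.replace("\t", "")   # tabs were only column delimiters
    text = text.replace('""', '"')  # step 1: doubled quotes become one
    text = text.replace('>"', ">")  # step 2: quote opening a quoted template
    text = text.replace('"<', "<")  # step 3: quote closing a quoted template
    return text


def to_aiml(exported):
    """Wrap the cleaned category lines in the header and the closing </aiml>."""
    body = clean_exported_text(exported)
    if body and not body.endswith("\n"):
        body += "\n"
    return HEADER + body + FOOTER


# Hypothetical two-line export, as it might look before any cleanup.
EXPORTED = (
    "<category><pattern>\tHELLO\t</pattern><template>\tHi there!\t</template></category>\n"
    '<category><pattern>\t* REMINDS ME OF *\t</pattern><template>\t'
    '"<star index=""2""/> is interesting."\t</template></category>\n'
)

aiml_text = to_aiml(EXPORTED)
root = ET.fromstring(aiml_text.encode("utf-8"))  # raises ParseError if malformed
print(aiml_text)
print(len(root.findall("category")), "categories parsed")
```

The same caveats apply as for the manual substitutions: a template that is meant to begin or end with a quote mark, or to contain two quote marks in a row, will be altered by these replacements and needs to be checked by hand.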