Avatars, Agency and Performance: The Fusion of Science and Technology within the Arts
Richard Andrew Salmon 2014
7. Presentation of Research Data and Findings
This section presents research data from text string analysis and from analysis of dialogue and events that took place during video cued recall interviews.
Details of how the data was qualitatively and/or quantitatively analysed, and the emergent themes from this investigation are given. The themes and supporting evidence are presented with a narrative emanating from the research data analysis and coding processes. After presentation of the themes, Section 8 considers key concepts that help explain the findings and inform the blueprint that follows. Section 9 then presents the blueprint of emergent ideas related to both the new media augmentation projects and the wider research questions related to agency, performance and engagement in general. The blueprint puts together a set of developmental recommendations for the improvement of human-machine interaction based on the emergent themes and findings from this investigation and the key concepts that help to explain them.
Please note that the interpretive account of the narrative emanating from the research data collected during this investigation will present you with hyperlinks in the text that allow you to review textual or auditory visual supporting and substantiating evidence, which bolsters the validity of the interpretive narrative. Microsoft Word, Excel and Adobe Acrobat Reader are required to review the textual evidence interactively. Windows Media Player (Windows) or QuickTime (Mac), with browser support for Windows Media Video files (.wmv), will be required in order to view the auditory visual data, which is highly recommended. Windows Media Video (.wmv) support for Macintosh computers can be installed using Flip4Mac (Telestream, 2013).
Node textual data is too expansive to be supplied as a hard copy appendix, but supporting textual evidence is supplied electronically through the Key Evidence Links Table. All the textual and auditory visual content is supplied on an install DVD. The anonymity of participants and the public is maintained throughout the evidence presented. However, media release forms were collected from participants because, although their names are not revealed in the textual evidence, they may of course be present in the videos.
7.1 Text String Input/Output Analysis
Text input by the User interacting with the Articulated Head’s A.L.I.C.E. Chatbot, and the Chatbot’s response strings, were stored into text files by a Max patch every 30 minutes for the creation of a data log. This logging process resulted in many thousands of text files for analysis. User input strings were prefixed with a ‘1’ and Chatbot output strings with a ‘2’. The data log spans June 2011 to November 2012 – 17 months. The log files do not include every single interaction that took place during the exhibit’s time in the Powerhouse Museum, as logs before June 2011 were not captured and the logging process was interrupted by various crashes and shutdowns. However, the logs do include the vast majority of all interactions that took place during the time span stated, and therefore they are considered to be highly reflective of the conversational content taking place between Users and the Chatbot. Furthermore, because the text data logs are not restricted to the research participant frame, within which all other concrete evidential data collected in this investigation is constrained, it was thought that if a strong correlation exists between the text analysis and the rest of the research participant data collected, then it would considerably strengthen and ratify the themed interpretive phenomenological analysis of data within the research participant frame.
This text string input/output analysis acts as a sort of quality control mechanism in this investigation by providing a view of the conversational interaction that took place, which is largely independent of any transient environmental or operational conditions of the Articulated Head at the time.
The data is also largely independent of any influences and preconceptions that might be present within the participant frame. It should be noted that node coding, but not theme identification, of the data collected within the research participant frame was in fact conducted prior to this text input/output analysis, even though they are presented in the reverse order in this section of the thesis. In line with a Grounded Theory approach, the text input/output analysis served as purposeful sampling across a different data set, facilitating crosschecking of the fit of theory to the data. The reason for presenting the text input/output analysis first is that it helps cement the validity and relevance of the emergent themes from nodal analysis. The emerging pattern from the text input/output analysis does indeed fit and substantiate the identification of critical elements at work in the interaction.
There is a very strong correlation with two particular themes that had already been identified from Nodal analysis within the frame of research participant data. The two themes of interest are titled Theme 5: Movement, Tracking and Control and Theme 6: Dialogue. The text input/output analysis also lends some weight to the other themes.
The analysis of text data that follows in this subsection focuses primarily on User input strings and Chatbot output strings as separate entities. Theme 6: Dialogue then brings in the findings from this input/output string analysis to compare them with the nodal data analysis of conversational dialogue that preceded it. The nodal analysis of conversational dialogue looks at User input strings and Chatbot output strings together, to consider such issues as nonsensical input from the User or, vice versa, nonsensical output responses from the Chatbot.
User input text had an inordinate amount of spelling errors, general typos, mobile phone speak, slang, swearing and other spurious or erroneous data input, including colloquialisms and characters entered by small children fiddling with the keyboard; therefore an accurate spelling mistake count was not practical. However, it is clear from review of the Nodal data presented throughout Section 7 that data input did have many spelling errors. Chatbot output strings are more consistent than User input, but also have some erroneous data strings present, including strange spellings and some words sewn together, an example of which is present in Table 7-35 under the Chatbot phrase query lists that follow in Section 7.1.2. I believe the programmers sewed these words together deliberately because the text-to-speech engine sounded better that way. In addition to the above anomalies in input/output textual data, a significant number of Chatbot responses do not make sense in relation to the User inputs that precede them; see Theme 6: Dialogue for more details. Therefore, the text input/output data is considered something of a quagmire in the first place. For this reason all text analysis is based on near approximations, and only clearly indicative numerical string search counts are used in input/output text data analysis, to minimize the chances of misinterpretation of the data. Any specific words and/or phrases used from the textual data for queries are referred to by placing them between inverted commas in the following text, like so: ‘word or phrase’. In total the text log files generated a word count of approximately 3 million words, or 6 thousand pages of data. However, it should be noted that these figures are not very useful and somewhat vague because they count all spurious as well as useful input/output.
To establish whether the textual data could reveal anything important about the conversational interaction taking place, the following approach was adopted: The text files (approximately 1,488 files per month) were organized into monthly data folders. The files that were stored outside Museum opening times had little or no data in them. An Automator/AppleScript workflow was built to merge each month’s text data files into one long text file, which was saved, resulting in 17 monthly long text files. Another Automator/AppleScript workflow was built to separate User input strings from Chatbot output strings using each saved monthly long text file. The script then saved the separated data into two separate text files: one for the User input strings and one for the Chatbot output strings. This process was repeated on the long text files for each month, resulting in 34 files in total, 17 monthly files for the User input and 17 for the Chatbot output. All the resulting files from the processing described above were imported into NVIVO, a Computer Assisted Qualitative Data Analysis Software package, for analysis.
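For illustration, the following minimal Python sketch reproduces the merge-and-split step described above. The folder layout, file names and the handling of the ‘1’/‘2’ prefixes are assumptions based on the description in this subsection; the actual processing was performed with Automator/AppleScript workflows rather than this code.

```python
# Minimal sketch (Python) of the merge-and-split step described above.
# Assumes monthly folders of raw log files, with lines prefixed '1'
# (User input) or '2' (Chatbot output); folder names are hypothetical.
from pathlib import Path

def merge_and_split(month_folder: str, out_folder: str) -> None:
    month = Path(month_folder)
    out = Path(out_folder)
    out.mkdir(parents=True, exist_ok=True)

    user_lines, chatbot_lines = [], []
    for log_file in sorted(month.glob("*.txt")):   # the ~1,488 files per month
        for line in log_file.read_text(errors="ignore").splitlines():
            line = line.strip()
            if line.startswith("1"):               # User input string
                user_lines.append(line[1:].strip())
            elif line.startswith("2"):             # Chatbot output string
                chatbot_lines.append(line[1:].strip())

    # One long file per month for each side of the conversation,
    # mirroring the 34 files (17 months x 2) imported into NVIVO.
    (out / f"{month.name}_user_input.txt").write_text("\n".join(user_lines))
    (out / f"{month.name}_chatbot_output.txt").write_text("\n".join(chatbot_lines))

if __name__ == "__main__":
    merge_and_split("logs/2011-06", "merged")      # hypothetical folder names
```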
A word frequency search was conducted on the User input string files to find the top 10 words present in User input strings, with the following result:
Table 7-1 – User input strings – top 10 words
The same word frequency search was conducted on the Chatbot output string files to find the top 10 words present in Chatbot output strings, with the following result:
Table 7-2 – Chatbot output strings – top 10 words
The results from these word searches are not surprising, with short function words featuring strongly. The appearance of the words ‘what’ and ‘how’ in the User’s top ten list is consistent with the User asking the Articulated Head questions, which was of course its intended function. Not much can be established from these searches other than to suggest perhaps that the User, having ‘you’ as their number one most used word, was more interested in the Chatbot than in themselves, with ‘I’ coming in at number ten. The Chatbot, having ‘I’ as its number one most used word, appears to have been perhaps more interested in itself than the User, as the word ‘you’ is used approximately 15,000 fewer times over the data span.
The other thing that appears fairly obvious from this initial word frequency search is that the reciprocity of conversation was lopsided; the Chatbot appears to have had roughly twice as much to say as the User, based on the total count of word instances in both tables, with the ratio at 1.81:1. This observation, based on the two tables above, is indicative of the mode of information exchange (text in, speech out) presenting difficulties of data string input for the User, as is detailed and evidenced later in Theme 4: Mode of Interaction.
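As an aside, the kind of word frequency query behind Tables 7-1 and 7-2 can be approximated in a few lines of code. The sketch below is illustrative only and assumes the merged monthly files from the previous sketch; the actual counts were produced by NVIVO’s word frequency queries, and the tokenisation rule shown here is an assumption.

```python
# Sketch of a top-N word frequency count, approximating the NVIVO
# word frequency queries behind Tables 7-1 and 7-2.
import re
from collections import Counter
from pathlib import Path

def top_words(path: str, n: int = 10) -> list[tuple[str, int]]:
    text = Path(path).read_text(errors="ignore").lower()
    words = re.findall(r"[a-z']+", text)   # crude tokenisation (assumption)
    return Counter(words).most_common(n)

if __name__ == "__main__":
    for word, count in top_words("merged/2011-06_user_input.txt"):  # hypothetical file
        print(f"{word}\t{count}")
```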
The next step of the input/output text string analysis was to expand the word frequency search to the top 100 words and then drill down on a group of selected nouns, verbs and adjectives to see if any emergent patterns in conversational foci could be established.
The top 100 words in User input and Chatbot output strings are shown in E-Appendix 13: Top 100 User Input String Words and E-Appendix 14: Top 100 Chatbot Output String Words respectively. When viewing these tables you will find the selected words highlighted with a red background. The chosen groups of words are shown in the two tables on the following page:
Table 7-3 User Input (left) Chatbot Output (right) Selected Search Words
The tables above do not show a direct correlation between the words chosen from the User input Top 100 list and those from the Chatbot output Top 100 list, simply because the User’s vocabulary did not correlate with the Chatbot’s vocabulary. A few more words were selected from the User input Top 100 list than from the Chatbot output Top 100 list. Words chosen for further examination were picked on the basis that they present clear subjects of relevance. The aim was to avoid the ambiguity that some more frequent but less prominent words in the top 100 lists might introduce into the analysis, because they have vast arrays of possible conversational uses in phraseology. Choice of appropriate words from the two lists was more important than the number of words chosen. Given this investigation’s focus on the human experience of interaction with the Articulated Head, a slightly stronger emphasis on User input as opposed to Chatbot output in this analysis might be more appropriate and desirable anyway.
Each of the selected words from the top 100 User input and Chatbot output strings was entered into a string search (query) and the resulting references were reviewed to try to find the most common group of words surrounding the search string word. For example, if the search word was ‘favorite’, a surrounding conversational foci string might read ‘what is your favorite song’ or ‘my favourite time of day is’. Queries were constructed in NVIVO to include both English and American spellings of words where appropriate, such as ‘colour & color’ or ‘favourite & favorite’ (see E-Appendix 15 – Favourite – April2012-Results Preview for an example of a search result).
The aim was to catch as many instances of phrases with the same meaning as possible. Once a recurring common group of words surrounding a search string word was identified, that group of words, normally limited to two or three to keep query processing times manageable, formed the search string for the next query directed at the data. This process of phrase identification and query construction was conducted on both User input and Chatbot output strings. The results of this exercise were initially entered into two Excel spreadsheets in the order in which the identified phrases emerged from the text data (see E-Appendix 16: User Input String Phrase Table and E-Appendix 17: Chatbot Output String Phrase Table).
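To illustrate the kind of query used, the sketch below counts a phrase while catching both English and American spellings in a single pass with a regular expression. This is only an approximation, offered under the assumption of the merged text files used earlier; the actual queries were constructed and run inside NVIVO.

```python
# Sketch of a phrase query that catches both English and American
# spellings in one pass, e.g. 'favourite'/'favorite' or 'colour'/'color'.
import re
from pathlib import Path

def count_phrase(path: str, pattern: str) -> int:
    """Count case-insensitive occurrences of a phrase pattern in a text file."""
    text = Path(path).read_text(errors="ignore")
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

if __name__ == "__main__":
    user_file = "merged/2011-06_user_input.txt"           # hypothetical file name
    print(count_phrase(user_file, r"what is your favou?rite"))
    print(count_phrase(user_file, r"favou?rite colou?r"))
```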
An attempt was made to reach 70% coverage of the initial search word count with the subsequent phrase searches. The choice of 70% was made on the basis that the vast majority of recurring common groups of words were picked up within the first 70% in initial searches; above this figure commonality diminished rapidly and erroneous data began to appear in the phrases. The data was extrapolated from the linked tables above and assembled into the lists presented in the tables in subsections 7.1.1 and 7.1.2 below. Each list was either topped up to 70% coverage, or a reason is identified below the table that explains why this was not possible. All lists, and the phrases presented in them, are ordered from most to least significant word or phrase, based on the frequency of that word or phrase. A short statement of observations emerging from the information follows each table.
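The 70% coverage figure can be expressed as a simple proportion: the summed counts of the identified phrases for a search word, divided by that word’s total instance count. The sketch below shows the arithmetic with hypothetical figures; the real counts came from the NVIVO queries described above.

```python
# Sketch of the 70% coverage check: summed phrase counts for a search
# word compared against that word's total instance count.
def coverage(word_count: int, phrase_counts: dict[str, int]) -> float:
    """Return the proportion of word instances covered by the listed phrases."""
    return sum(phrase_counts.values()) / word_count

if __name__ == "__main__":
    # Hypothetical figures for the word 'name' in User input strings.
    name_total = 5000
    name_phrases = {"what is your name": 2600, "my name is": 1100}
    print(f"coverage = {coverage(name_total, name_phrases):.0%}")   # 74%
```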
Subsection 7.1.3 then looks at key observations made from examination of the table data in more detail.
7.1.1 User Input String Lists
Table 7-4 – Word ‘name’ phrase query list
The word ‘name’ being the User’s most common word is not really surprising, and with a very significant proportion of its instances grouped within the phrase ‘what is your name’, this is likely to be one of the most common phrases input by Users. Although there is no reliable delineator of the start and stop of individual interactions in the text data, it is likely that the word ‘name’ is used by the User or the Chatbot at the outset of many interactions. The frequency of the phrase ‘my name is’ suggests that the Chatbot regularly asked Users what their name is.
Table 7-5 – Word ‘like’ phrase query list
The phrase ‘you like’ has hundreds of appendages in the data, with examples including rag dolls, kittens, space travel, cheese, chocolate, surfing and so on. Instances of these appendages are diverse and seem to be based on the likes/dislikes of the human inquisitor. Based on the comparatively low frequency of the ‘I like’ phrase, it appears that the User is much more inclined to question the Articulated Head’s preferences than to inform it of their own. This inclination towards establishing the Articulated Head’s preferences based on the likes/dislikes of the human inquisitor is discussed more in Observation 1 – The anthropomorphic stance & in Theme 1: Anthropomorphism.
Table 7-6 – Word ‘old’ phrase query list
The phrase ‘(how) old are you’ is prefixed with ‘how’ almost exclusively in the text data. The only reason for leaving the word ‘how’ out of the phrase search string was to speed up data processing of the query. It is clear that ‘what’s your name’ and ‘how old are you’ are very common phrases used by the general public during interaction.
Table 7-7 – Word ‘say’ phrase query list
The reason that the word ‘say’ is problematic for a phrase search is that the public interacting with the Articulated Head discovered that it was possible to make the robot say anything by prefixing the sentence typed in with this word. For example, if one enters ‘say Cornflakes’ at the kiosk keyboard, the Articulated Head will say ‘Cornflakes’. This control word became a point of great interest, especially with school kids, resulting in a number of expletives and other questionable or rude statements, as well as many more benign uses, being present in the text data E-Appendix 18: [Example 1]. It is a bit of a mystery how so many people got to know of the control word, because new visitors to the museum appeared to know that they could use it – perhaps the school kids’ grapevine, the bush telegraph, is more powerful than I had realised? This control word anomaly is discussed in more detail under Theme 5: Movement, Tracking and Control.
Note: lots of solo instances of ‘sing’
Table 7-8 – Word ‘sing’ phrase query list
The ability for the Articulated Head to simulate singing was introduced during its time at the Powerhouse Museum, and so most, if not all, instances of the word ‘sing’ relate to its time there. Getting the coverage of the word ‘sing’ to over 70% was not possible because of the vast number of solo instances of the word. The reason for this is probably that the public was aware that the robot could sing, as a result of museum staff regularly typing in the trigger strings, but did not know the trigger strings themselves and so tried the word ‘sing’ regularly. In fact ‘sing vincero’, ‘sing je crois’ or typing ‘it’s my birthday’ were the triggers, which explains their high occurrences in the list. The museum volunteer guides and permanent staff were aware of the trigger words and used them often when passing the exhibit. The other phrase instances in the table attempt to establish whether the robot can sing. How many of the phrase instances are attempts to trigger the robot to sing is unclear. However, as is the case with the word ‘say’ above, the control/triggering of a response from the Articulated Head was also at work with the word ‘sing’, and control words or phrases have implications for engagement in interaction, as discussed in Theme 5: Movement, Tracking and Control.
Table 7-9 – Word ‘favourite’ phrase query list
The word ‘favourite’ and its surrounding phrases show a strong correlation with the word ‘like’ and its surrounding phrases. Links are also made to the words ‘colour’ and ‘love’ in the tables below. Both words are linked to evidence that suggests the Users are testing the robot’s preferences in a specific way; this is discussed in more detail after these tables, under Observation 1 – The anthropomorphic stance & in Theme 1: Anthropomorphism.
Table 7-10 – Word ‘colour’ phrase query list
The word ‘colour’ is clearly linked to ‘favourite’ and is also discussed under Observation 1 – The anthropomorphic stance & in Theme 1: Anthropomorphism.
Note: multitude of misc; chocolate, cats, movies, sport – shows alignment with favourite and like
Table 7-11 – Word ‘love’ phrase query list
A [Sample] of the ‘love’ search results was exported for reference, as it was hard to get the coverage figure up to a high percentage. The word ‘you’ almost always follows ‘I love’, although there are some instances of ‘food’ and ‘movies’. ‘You love’, by contrast, is predominantly preceded by ‘do’ and followed by a range of subjects: food (chocolate, cheese, sushi), movies, TV and Justin Bieber.
The word ‘love’ has a large number of diverse appendages in the data, making it very difficult to attain 70% coverage. However, it is linked to the word ‘favourite’ because of an identified correlation in subjects appended to the word such as ‘chocolate’, ‘cats’, ‘movies’ and ‘sport’. This link is also discussed under Observation 1 – The anthropomorphic stance & in Theme 1: Anthropomorphism.
Note: hundreds of solo instances of ‘dance’
Table 7-12 – Word ‘dance’ phrase query list
‘Dance’ is linked with the control words ‘say’ and ‘sing’ because it appears that Users are either attempting to establish whether the robot can dance, or are trying to trigger it to do so. Control words are discussed in more detail under Theme 5: Movement, Tracking and Control. The Articulated Head did have a few preprogrammed movements installed for a performance that took place in the museum but they were not displayed in normal operation.
The museum did have a dancing robotic arm exhibit, and it is thought that this may have had an influence on Users’ inclinations to check the Articulated Head’s capabilities for dance.
Table 7-13 – Word ‘life’ phrase query list
The high occurrence of the phrase ‘meaning of life’ was predominantly prefixed with the words ‘what is’ and sometimes appended with ‘the universe and everything’; this phrasing echoes Life, the Universe and Everything, the third book in Douglas Adams’ Hitchhiker’s Guide to the Galaxy science fiction series (Adams, 1984). The User’s question in this context is very apt given the nature and context within which the Articulated Head was presented. Notably, the Articulated Head did not answer 42, which is the answer given to the question in the books.
Note: hundreds of solo instances of ‘cool’
Table 7-14 – Word ‘cool’ phrase query list
The word ‘cool’ turned out to be unhelpful for phrase searches, mainly because of the high frequency of solo instances. However, the solo instances are indicative of an aspect of human-human conversation that does not work so well in human-robot conversation. Humans often say ‘cool’ as a single-word response during conversation to show that they understand, agree and/or approve of what the other person is saying, but when the Chatbot was confronted with this single-word input, conversational flow appeared to falter.
Table 7-15 – Word ‘robot’ phrase query list
The word ‘robot’ was used in lots of different ways in the data, but the most common instance was ‘do you like being a robot’; this was again the human User checking on the robot’s preferences. Further discussion regarding this preference checking is present in Observation 1 – The anthropomorphic stance & in Theme 1: Anthropomorphism.
Table 7-16 – Word ‘Harry’ phrase query list
The high occurrence of the name ‘Harry’ being followed by ‘Potter’ was expected because the museum had a Harry Potter exhibition running for part of the Articulated Head’s time there. This table clearly helps to illustrate that the environment within which the exhibit and human User are situated, can influence conversational foci and be reflected in conversational references. This is discussed more under Theme 6: Dialogue.
Table 7-17 – Word ‘live’ phrase query list
The User questioning the robot about where it lived is interesting because it was essentially a fixed, situated machine with no legs, wheels or means of movement from the base. Furthermore it was enclosed in a metal-framed glass surround, which would suggest that it lived where it was. Therefore the word ‘live’ adds some weight to the anthropomorphic stance detailed in Observation 1 – The anthropomorphic stance and in Theme 1: Anthropomorphism.
Table 7-18 – Word ‘eat’ phrase query list
The word ‘eat’ was used frequently to test robot preferences with the phrases ‘what do you eat’ and ‘what do you like to eat’ easily making up the 70% coverage threshold. Again, this is discussed further in Observation 1 – The anthropomorphic stance & in Theme 1: Anthropomorphism.
Table 7-19 – Word ‘happy’ phrase query list
The phrase ‘are you happy’ is once again checking the robot’s status using an anthropomorphic stance, as discussed in Observation 1 – The anthropomorphic stance & in Theme 1: Anthropomorphism. The other significant phrase, ‘happy birthday’, is linked with the control word ‘sing’ and is clearly indicative of Users trying to get the robot to sing happy birthday.
Table 7-20 – Word ‘time’ phrase query list
‘What time is it’ as a phrase could be the User testing the robot’s knowledge, but equally, it could be the User wanting to know the time and expecting the robot to recite it. Either way, the presumption of the User that the robot knows the time is implicit in the input. User expectations are discussed more under Theme 2: Expectations and enculturation.
Table 7-21 – Word ‘speak’ phrase query list
Users were very interested in whether the robot could speak other languages; Spanish, German, French, Italian, Japanese and a long list of other languages were cited. Clearly this unmistakable wish of Users presents one very obvious way in which the Chatbot’s capabilities could be extended to improve engagement in interaction.
Table 7-22 – Word ‘think’ phrase query list
The phrase ‘do you think’ was frequently preceded by the word ‘what’ and had many diverse appendages. The appendages predominantly appeared to be related to specific interests of the User. The diversity of subjects did not contribute to the identification of any particular pattern.
7.1.2 Chatbot Output String Lists
Table 7-23 – Word ‘name’ phrase query list
The word ‘name’ appearing as the number one high frequency word in Chatbot output strings marries with its appearance at position number one in the User input strings. The interesting point here is that the robot again presents some indicative evidence that it was more interested in itself than its audience, because the prepended word ‘my’ comes in at a higher frequency than ‘your’.
Table 7-24 – Word ‘Stelarc’ phrase query list
The phrase ‘my name is Stelarc’ was used by the Chatbot regularly. This broadly, although not exactly, aligns with the frequency of the question ‘what is your name’ used in User input. The table data does not contribute very strongly to any identified pattern or theme.
Table 7-25 – Word ‘think’ phrase query list
The word ‘think’ is important in that it is synonymous with the title of the Thinking Head Project. The Articulated Head was marketed as a thinking head. The list of contributory phrases required to get the coverage figure above 70% is quite long because the word ‘think’ was used in a number of differing but overlapping ways in the text data. Since the robot was projected as a thinking head, the number of ways in which the Chatbot introduced the word ‘think’ into the conversational flow probably encouraged Users to contemplate the possibility that the robot was thinking.
If so, this was a good thing. There is a healthy balance between suggestions of the robot’s own thinking and recognition of User thinking in the Chatbot’s use of the word.
Table 7-26 – Word ‘favourite’ phrase query list
The word ‘favorite’ in Chatbot output is directly linked to User input and the checking of the robot’s preferences, which is discussed in more detail under Sub Section 7.1.3.1 Observation 1 – The anthropomorphic stance, and in 7.2.4 Theme 1: Anthropomorphism.
Table 7-27 – Word ‘talking’ phrase query list
The phrase ‘were talking about’ frequently has ‘we’ as a prefix. The phrase is interesting because it suggests that the robot possessed memory. There is evidence of this in the text data because the robot recalls a previously entered User string. There appears to be some randomness about which string it chooses to recall as the subject that they were previously talking about because sometimes the string makes sense in context and sometimes it does not. This is discussed in more detail in Theme 10: Memory Related.
Table 7-28 – Word ‘people’ phrase query list
The Chatbot used the word ‘people’ in a number of different ways, which suggested to the User that the robot chatted with lots of people and was aware of what other people thought and said. There was the inference that the robot could hear people and that the robot could compare previously expressed opinions with those being expressed by the current User. The phrase ‘mind if I tell other people’ is directly linked with a research participant exchange, which included a secret between the User and the robot.
Table 7-29 – Word ‘time’ phrase query list
The Chatbot’s high-frequency use of the phrase ‘asked that all the time’ is clearly reflected in the high frequency of the phrases ‘what’s your name’ and ‘how old are you’ in User input. These phrases might have made the robot appear more human to Users. The ‘spare time’ phrase is normally prefixed with ‘what do you do in’; this phrase does encourage User input and also indicates that the robot was interested in the User. Attention is a subject of interest under Theme 7: Engagement.
Table 7-30 – Word ‘hear’ phrase query list
The high frequency of the phrase ‘want to hear a joke’ was in fact linked to a recurring crash loop that the Chatbot got stuck in from time to time. The log files only captured the first occurrence of this Chatbot output string in the loop; subsequent repeats were not passed to the log file system by the event manager. The Chatbot repeated this phrase many thousands of times more than is indicated in the table above. This crash loop was a persistent problem that the engineers never really managed to solve completely. Why it happened with this particular phrase is a mystery. There was no filtering of text string data in the log file system – this was the only anomaly when recording text string data. The key point with reference to the Chatbot’s use of the word ‘hear’ in the other five instances in the table above, listed below the phrase ‘want to hear a joke’, is that they imply to the User that the Articulated Head can hear, which links with Theme 9: Senses – Related and Theme 10: Memory Related. The phrases ‘question I don’t hear everyday’ and ‘only hear that type’ both imply memory of previous conversations and are also linked to Theme 10: Memory Related.
Table 7-31 – Word ‘talk’ phrase query list
The Chatbot said ‘lets talk about’ and then appended many different things. Sometimes ‘my dress’ was appended, and although I think Stelarc meant this to be a discussion about clothes, observation of the public laughing and talking about the phrase indicated that they thought it would be very funny to see the robot in a dress. Generally the Chatbot’s use of the word ‘talk’ was positive towards the User and encouraged conversation. However, the phrase ‘I don’t want to talk about that now’ was mentioned by several people that interacted with the exhibit in the museum, and came across to at least one participant interacting with the Articulated Head like an angry parent or other person making a terse, aloof comment that could be perceived as arrogant or rude. Rudeness appears under Theme 6: Dialogue.
Table 7-32 – Word ‘understand’ phrase query list
The Chatbot said it did not understand 67% of the time; this figure is too simplistic on its own, but in many ways it was also true, as substantiated via Nodal data in Theme 6: Dialogue.
Table 7-33 – Word ‘western’ phrase query list
The Chatbot, having been programmed by American programmers in consultation with Stelarc, who provided most of the preprogrammed responses to User input, clearly reflects the fact that Stelarc enjoys country and western music. This is a clear example of where Stelarc’s preferences showed in the Articulated Head’s performance, and it is also a clear example of where Stelarc’s contribution to the programming of the Chatbot responses gave the Articulated Head agency. More examples of where Stelarc’s contribution to the programming projects his personality through the performance of the Articulated Head are present in Theme 6: Dialogue.
Table 7-34 – Word ‘human’ phrase query list
‘Human years’ and ‘old’ are clearly linked in Chatbot strings because the Chatbot said ‘I am 62 of your human years’ frequently in response to the User questioning its age. This Chatbot phrase does convey to any User hearing it that the Articulated Head is not impersonating a human; this, however, does not preclude the inference that the Articulated Head was capable of human abilities such as seeing, hearing and thinking. Many of the uses of the word ‘human’ in the phrases above are a little impersonal and clearly set the robot apart from humans in that respect. Possibly a more personal approach might increase engagement.
Table 7-35 – Word ‘old’ phrase query list
Probably the two most consistently common phrases to appear in both User input and Chatbot output strings are ‘what’s your name’ and ‘how old are you’; more discussion of this takes place in Theme 6: Dialogue.
Table 7-36 – Word ‘color’ phrase query list
The word ‘color’ and its spelling are linked to the words ‘favourite’ and ‘favorite’, and are discussed under Observation 2 – Programming additions, which follows Observation 1 below.
7.1.3 Observations made from review of the phrase query lists.
A number of observations, both qualitative and quantitative in nature, were made as a result of the word search queries and review of the phrase search table data, which does reveal something of a pattern emerging in the conversational foci.
7.1.3.1 Observation 1 – The anthropomorphic stance
There is sufficient evidence in the word frequency and phrase table searches alone to suggest that Users interacting with the Articulated Head appeared predisposed to question it about its preferences, with phrases such as ‘what is your favorite colour, food, game, sport, song’ (E-Appendix 19: Example 2) or ‘do you like food, cats, sport, movies?’, ‘do you like cheese’, ‘do you like chocolate’ (E-Appendix 20: Example 3) proliferating in the text input/output data. Further substantiating evidence of this predisposition proliferates throughout the Nodal data that is presented with the themes that follow shortly.
The words ‘favorite’, ‘like’ and ‘love’ are all linked by a correlation between the subjects checked by User questioning of the robot. Implicit in the human User’s questioning of the preferences of the Articulated Head is what I have termed the anthropomorphic stance. That is, a high proportion of the questions a human User asked of the Articulated Head at the very least test and probe the robot by making comparisons based on the human’s existential experience of their lifeworld; furthermore, the questions regularly appear to carry an implicit attribution of human capabilities to the robot. If one has a favourite colour or reads books and watches movies, then one can see. If one has a favourite song then presumably one has heard this song – so it stands to reason that one can hear. If one likes cheese or chocolate, then presumably one can taste, and probably smell as well. It was noted that there was much less evidence in the text data related to touch or feeling in particular; this observation will be discussed under Theme 9: Senses – Related. However, emotional feelings do feature in the text and Video Cued Recall data; this will be discussed under Theme 8: Emotions.
The anthropomorphic stance taken by humans in approach to interaction with the Articulated Head is unmistakable and present in data across all conditions, both inside and outside the research participant data frame.
Human User adoption of the anthropomorphic stance appears to be instinctive and is a pervasive, if not universal, approach to this particular interaction, as supported by the research data collected in both Video Cued Recall interviews and the text string analysis. There are comments in participant reports to suggest that the human-like face of the Articulated Head was the catalyst that engendered adoption of the anthropomorphic stance, but there is not enough evidence in the data to confirm that this was in fact the case. The suggestion that the human-like face was the catalyst is probably only part of the story. The human-like face being a contributory factor sounds completely plausible, but other elements of the presentation of the Articulated Head, such as movement, speech and marketing, must also have played their part.
The human User’s comparison of the robot’s preferences to their own, based on their own existential experience and lifeworld, and the correlation that this comparison has with the senses, is both obvious and remarkable, given that the human must have known that the robot was not a biological existential being and was not endowed with the ability to see, hear, smell, taste or touch/feel. Perhaps the anthropomorphic stance was adopted precisely for this reason, just to confirm their suspicions that the robot was not a real live being. The anthropomorphic stance taken draws some poignant parallels with the phenomenological reduction in Section 3.1.4.15 Sense by sense comparison, and is particularly interesting because, if one can predict with any significant degree of certainty what subjects are the most important to the human in conversational interaction, then the prediction clearly provides some leverage for a targeted approach to expansion of the Chatbot’s phrase vocabulary, in order to enhance human engagement in this interaction. This opportunity is discussed in Section 9: The Blueprint of Emergent Recommendations.
7.1.3.2 Observation 2 – Programming additions
The words ‘favourite’ and ‘colour’ did stand out as very prevalent words with a range of surrounding topics in interaction strings (E-Appendix 21: Example 4), so a little further investigation was called for. The fact that both of these chosen top 100 search words from User input and Chatbot output strings can be spelt with English and/or American spellings becomes a point of interest in what follows. The word ‘colour’ appears in the User input strings’ top 100 words, but not the Chatbot output strings’ top 100 words. Upon a search of the E-Appendix 22: Chatbot’s top 1000 words, ‘color’, with the American spelling, does appear at position 340, so I decided to add it to the E-Appendix 17: Chatbot Output String Phrase Table (with orange highlighting).
It was initially thought that perhaps the Chatbot might only use the American spelling. If so, then the word spelt with a ‘u’ in it would indicate User input in every case within the data set. However, upon further investigation I found the following: User input shows 130 instances of ‘color’ and 1075 instances of ‘colour’, indicating a very strong tendency towards the English spelling of the word in the demographic of the visiting public.
This is not surprising, as the data was collected in Australia, which is English speaking and has very strong current and historical links with the United Kingdom. Chatbot output shows 687 instances of ‘color’ and 433 of ‘colour’; this was surprising, as it was expected that the American spelling would be the only spelling used, because the Chatbot was programmed in America by American programmers in consultation with Stelarc. So queries were set up to look for only the English, and only the American, spelling. Upon viewing all the Chatbot text references for ‘color’, the Chatbot phrases made sense as standalone statements. However, upon viewing all the Chatbot text references for ‘colour’, I discovered that most instances of its use were where the robot utterances were nonsensical – ‘the colour of infinity’, ‘the colour of motion’, ‘the colour of transparency’, ‘the colour of knowledge’ (E-Appendix 23: Evidence 5). There were also a couple of other high-frequency uses which made a little more sense as standalone statements, such as ‘the colour of the pacific’ and ‘the colour of the train behind me’. The last of these instances very strongly suggests that someone, possibly an engineer, carried out this programming and expansion of the Chatbot’s phrase vocabulary after the Articulated Head was situated in the Powerhouse Museum. This observation is made on the basis that there was a train behind the Articulated Head in the Powerhouse Museum, and it is unlikely that this was the case elsewhere. It certainly was not the case anywhere else during the course of this investigation. This anomaly only affected a minute fraction of the overall data collected, so it was not considered a major issue.
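The spelling split described above amounts to two word counts per side of the conversation. The sketch below reproduces the check with hypothetical file names; the counts quoted in this observation came from the NVIVO queries.

```python
# Sketch of the English/American spelling check: count 'colour' and
# 'color' separately in the User input and Chatbot output files.
import re
from pathlib import Path

def spelling_tally(path: str, british: str, american: str) -> dict[str, int]:
    text = Path(path).read_text(errors="ignore").lower()
    return {
        british: len(re.findall(rf"\b{british}\b", text)),
        american: len(re.findall(rf"\b{american}\b", text)),
    }

if __name__ == "__main__":
    for side in ("user_input", "chatbot_output"):          # hypothetical file names
        print(side, spelling_tally(f"merged/all_{side}.txt", "colour", "color"))
```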
Whoever chose to add these nonsensical additions to the Chatbot’s output strings had a preference for the English rather than the American spelling of the word ‘colour’. Clearly, entering nonsensical preprogrammed responses to User input, no matter what the triggers are, is hardly going to increase the User’s engagement in the conversational interaction, and although there are instances of the User questioning the robot about these statements, more often than not the User’s next input after one of these statements from the Chatbot moved away from the current topic to try something new.
On the basis of the observations imparted above, the phrase counts listed in the Phrase Tables (with specific reference to Table 7-9 – Word ‘favourite’ phrase query list and Table 7-10 – Word ‘colour’ phrase query list, where it is clear that the User is normally raising the questions rather than responding to Chatbot output), and taking into account the imbalance between User input and Chatbot output word counts, it can be said that the high prevalence of the word ‘colour’ as a topic of conversational focus in human input was primarily human initiated rather than Chatbot driven. This is significant, as some frequently repeated pre-programmed strings could otherwise have been driving the prevalence of this topic, which they were not. It can also be said that there appears to be a division in the ‘giver of agency’ to the Chatbot: the switch from the American to the English spelling indicates strongly that the Chatbot’s programming had at least two separate human contributors at the source of data input to the Chatbot’s phrase vocabulary.
What is clear is that the humans present on all sides of this information exchange felt the word ‘colour’ was important. I too believe that the word ‘colour’ is important, because it relates directly to the sense of seeing, which was one of the most prevalent probes of the human User in questioning the robot.
Since the word ‘favourite’ also has the same English/American spelling anomaly as the word ‘colour’, I checked the data to see if there was a mixture of English and American spelling instances of this word in the Chatbot output text strings; there was not. The American spelling ‘favorite’ was used exclusively, lending more weight to the observation that the word ‘colour’ was specifically utilised for extending the Chatbot’s phrase vocabulary.
7.1.3.3 Observation 3 – The Input/output word ratio imbalance
The 1.81:1 Chatbot Output/User Input word ratio imbalance identified earlier under the top 10 word search expanded to nearly 2.54:1 in the top 100 word searches: User – 374,015 words, Chatbot – 948,230 words over the data set.
This expansion of ratio indicated a possible growth curve, so a word frequency search for the E-Appendix 24: Top 1000 User Input String Words and E-Appendix 22: Chatbot’s top 1000 words was conducted for the purposes of checking this ratio expansion over a broader span of the data.
This also served to make sure that there was not any other obvious pattern emerging from the list (other than the observations already made) that could readily be substantiated with clear evidence. The ratio of Chatbot output to User input words, when totalled from the Top 1,000 tables linked above, is 2.92:1 (Chatbot – 1,436,945 words, User – 492,000 words). No other specific patterns were identified. This indicates that the rate of ratio expansion slows as the number of words increases.
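The ratio figures quoted in this observation follow directly from the stated word totals; the short check below reproduces them, rounded to two decimal places.

```python
# Quick check of the Chatbot output to User input word ratios quoted above.
totals = {
    "top 100":  {"chatbot": 948_230,   "user": 374_015},   # figures from this observation
    "top 1000": {"chatbot": 1_436_945, "user": 492_000},
}
for label, t in totals.items():
    print(f"{label}: {t['chatbot'] / t['user']:.2f}:1")     # 2.54:1 and 2.92:1
```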
Expanding the Top 1,000 lists to the E-Appendix 25: Top 10,000 User Input String Words and E-Appendix 26: Top 10,000 Chatbot Output String Words did identify that a very large proportion of occurrences of input strings and output strings did not make sense as proper words. The further down the list of the Top 10,000 input/output strings you go, the more examples you can find of erroneous input/output. It is not clear how many of the questionable spellings and sewn-together words of the Chatbot output strings had been specifically constructed because they sounded better that way. It is also not clear whether Users interacting with the robot understood these unusually constructed strings. However, what is clear is that the deeper you dig, the more noise there is in the data, making it harder and harder to extrapolate any useful information or make sustainable observations.
7.1.3.4 Observation 4 – Conversational diversity across the data set
Humans appeared to display less conversational diversity or vocabulary bandwidth than the Articulated Head, based on the length of the phrase lists presented earlier. The Top 10,000 lists also indicate a wider vocabulary being used in the Chatbot output as opposed to the human User input. However, these observations are somewhat misleading because of aspects discussed under Theme 4: Mode of Interaction.
7.1.3.5 Conclusions from the text string analysis
The string analysis has dealt with high frequency phrases and treated input and output strings as separate entities to a large extent. The nodal analysis that follows looks at the sense of dialogue exchange between the User and the Chatbot under Theme 6: Dialogue. The most significant observations extrapolated from the textual input/output data are that the anthropomorphic stance to conversational interaction is clearly evident; that the Chatbot says considerably more words than the User inputs; and that there is significant noise present on both sides of the input/output data flow, with User input being especially noisy.
7.2 The NVIVO Coding Process and Emergent Themes
A coding process was conducted in the computer assisted qualitative data analysis software called NVIVO. The coding process was used to sort out themes in research data collected. Before describing the coding process and how it was refined, it is important to note here the nature and presentation of data and evidence that follows in this section and how the presented evidence should be taken in context. You will be presented with a description of how the coding was done and how the main themes were sorted from the coding.
It is important to note that the textual data you view will in many cases display an interspersion of qualitative researcher commentary and interpretation of the content of the interview, along with the Participant’s interview dialogue. The text may also show User input and Chatbot output text from the interaction dialogue. This interspersion of data is appropriate because it captures the content, context, possible meanings and interpretations in the same place together, at source. Whilst attempts have been made to maintain correct spelling in the qualitative commentary and interview dialogue, the NVIVO database software has no spell checker incorporated.
Every attempt has been made to retain the meaning and context of participants’ interview commentary. The following bracketed indicators, (……….) or (???), have been used to signify where something was not clear, and (XXXX) where someone’s name or identity has been hidden for ethical reasons. No attempt whatsoever has been made to change or correct spelling or typos in the interaction dialogue, because it is what it is, and it should be retained that way for the purposes of analysis. Therefore the evidence you view, which contains text from interaction input/output in particular, will be littered with typos and spelling mistakes. However, this in itself is indicative of various points drawn out in the interpretive narrative that emerges from the data analysis.
It is also important to note that the NVIVO database and audiovisual content runs into several hundred gigabytes of data. To keep this thesis to a size practical for delivery and assessment on DVD, only specific auditory visual content is presented as substantiating evidence. This content has been exported from specific nodes in the database as interactive web pages, where only the relevant coded sections of video content are displayed, at a smaller size (320 by 240) than the original capture size of 1024 by 768. Audio quality varies considerably in the content, dependent on the ambient noise and acoustic environmental conditions under which the capture was conducted. Therefore good amplification and listening conditions are recommended when wishing to review the finer details of the audio content.
The content chosen for presentation as interactive web pages has been specifically selected as substantiating evidence to support major points emergent from the data, where it was felt that the textual evidence supplied from nodes alone might not be sufficient to substantiate the validity of the interpretive narrative, or where an emergent and pivotal observation from the data has profound ramifications for the findings of this investigation.
The following table outlines the interaction time stamps of Video Cued Recall interviews including the creative auditory visual project conditions.
Table 7-37 Video Cued Recall Interaction Captures
Summary of data collection
In summary of the above table, there are 3 hours 46 minutes of clear captured and coded Video Cued Recall Interview data, with 23 contributing participant reports related to the Articulated Head exhibit in total.
In mid/late 2011 a very large construction and refurbishment project commenced in the Powerhouse Museum, and the collection of interactions and Video Cued Recall interviews became impractical as the project progressed. Environmental sound conditions, pneumatic drills and hammering made audio data unusable. The Articulated Head was shut down and covered by sheets on several occasions due to problems associated with construction dust. Furthermore, problems with torrential rain and a leaking roof in the museum also resulted in numerous shutdowns of the system. From October 2011 onwards data collection slowed dramatically and the coding process was started. The construction project taking place in the museum was an enormous undertaking and was still progressing in November 2012 when the Articulated Head was decommissioned.
The changes in auditory visual conditions detailed in table 7-38 above did not have any dramatic influence on the interactions that took place, for the reasons detailed under 7.2.3 The Narrative Emanating from the Coding.
Discussion of the mediating considerations and constraints, which help to explicate confounding and compromising aspects of the experimental auditory visual designs and their presentation to the interacting audience, is delivered under Theme 12: Auditory Visual.
There is a relative uniformity of participant reporting contributions to each of the emergent themes of this investigation, as well as a consistency of participant spread across node coding that is relative to the duration of their interaction timestamps. Participants who contributed longer interaction timestamps appear proportionately more often in the nodal data presented.
Most of the conditions changed in the interactive environment over the data set were very minor, and are not considered to have been significant to participant interactions. The reasons for this view are detailed under section 7.2.3 The Narrative Emanating from the Coding.
The individual differences of the participants involved in these interactions across the data set are evident in review of the auditory visual data and questionnaire outcomes presented in the Key Evidence Links Table, but are not really relevant to the major findings of this investigation, because the consistency and spread of participant reporting on the themes picked up in data analysis is relatively uniform across the data set. In the nodal analysis that follows, the omnipresence of participant contribution to the emergent themes speaks clearly of the fit of analysis and interpretation to the research participant data set. More specifically, this omnipresence points to a fundamental aspect of the interaction and interactive environment as being pivotal to all subsequent aspects of the interaction taking place. For the most part (physical and mental disabilities excepted), the major findings from this investigation do not hinge on individual differences, and do relate to the vast majority of all possible human Users.
7.2.1 What was done?
Video Cued Recall Interviews were imported into NVIVO for qualitative data analysis. Watching and listening through the interviews started the analysis process. Whilst watching and listening, the video was annotated in short sections. Selecting what appeared to be natural divisions of inactivity, sometimes bracketed by silence, or natural divisions in the conversational foci taking place during the Video Cued Recall Interviews, helped to choose each section for coding. Breaking video content and annotations up into small sections allowed for the interspersion of qualitative commentary with interpretation of the content of the interview, laced with transcription of the participant’s interview dialogue and User input/Chatbot output text. The image below shows a section of video with annotations in table rows to the right-hand side of the video image. Each table row corresponds to a short section of video as indicated in the Timespan column. Each consecutive row details commentary and annotations related to the video content in chronological order.
Figure 7-1 Video Cued Recall Annotations, dialogue Transcription and Commentary
This process of annotation included simple coarse coding to nodes in NVIVO.
Coarse coding of nodes means that sections of video related to a table row would be linked to newly created nodes, which would be titled to broadly describe the content of that table row. For example, the node might be called Memory if the interview dialogue and annotations were related to the Articulated Head’s memory. This process of annotation and coarse node coding was conducted for all the auditory visual data collected from participant interviews, resulting in approximately 30 initial nodes.
Once the annotation and coarse coding was completed, a second and third pass of all the data was made as the node coding process was refined and intensified to include creating new nodes for any newly emergent subjects of interest, whether they emerged from User input or Chatbot output text, Video Cued Recall interview dialogue or qualitative commentary and interpretation of the auditory visual evidence collected. Node creation was completed with little or no attention paid to whether a node being created would belong to a particular family of nodes or not, as the theme identification process was expected to perform this service later anyway. As a result of this, some nodes were created with very similar names and some nodes relate to very similar content. However, a few parent nodes were created with families of child nodes, where it was obvious that the children inherited a particular aspect of the parent node in their coded content.
The process of revisiting the data and annotations resulted in a total of 175 nodes of interest being created, and well over 2,400 sections of data referenced to these nodes. Whilst conducting the node coding process, three separate nodes named ‘positive’, ‘negative’ and ‘neutral’ were created for the purpose of linking clear statements about interaction from participants as coding progressed. This was set up to provide a control check; that is, these nodes would provide a simple way to check whether the positive, neutral and negative statements broadly align with the emergent narrative from the data analysis.
Some nodes have overlapping themes, and any auditory visual or textual data can be coded to more than one node at the same time. It is also possible that sections of data traverse nodes which overlap, but the content coded does not necessarily extend to both ends of each of the nodes’ coded spans. The distinct advantages of the researcher conducting the Video Cued Recall Interview dialogue transcription, qualitative annotation and interpretation alongside the video data itself are that: 1) visual interpretation to confirm the meaning of auditory data is not lost, as would be the case if normal audio transcription had been conducted by a third-party transcription company without any visual cues; 2) the researcher is brought much closer to the data itself, leading to a deeper, more immersive understanding and qualitative interpretation of meanings and messages emanating from the data.
Coding saturation
The node creation coding process was continued until the auditory visual and textual data stopped throwing out new ideas for node names, so that when the data was revisited again, new sections of interest were consistently being coded or recoded against existing nodes. At this point coding saturation had been reached. It is possible to revisit the data over and over again, creating ever more deeply layered and finer sub-node coding of data already coded to broader nodes, but this process is unnecessary where a satisfactory understanding of the content of a node can be gained from review of the node data in reference view, which was the case for most of the 175 nodes created in this project.
Some nodes have many more references coded to them than others, primarily because the references occur much more often in the data. This does not automatically mean that these nodes are more significant than those with fewer references linked. However, multiple occurrences of specific or similar events in the interactions or the Video Cued Recall Interviews, along with associated annotations in any one particular group of nodes in the coding tree, are significant in that they speak loudly of these events and are likely to indicate a theme.
There are, however, some much quieter but very significant voices to be heard, or messages to be observed, in the data; some are all but silent in the node coding but endemic in the auditory visual material collected. A specific example is that all research participant interaction videos show that an inordinate proportion of the participants' time during interactions was spent looking down at the kiosk keyboard, and another significant but smaller proportion of interaction time was spent looking directly at the face of the Articulated Head, waiting for a response. Review of video data presented for the emergent themes that follow shortly will substantiate this observation. This fact is not surprising at all, because the exhibit's interactive design effectively forced this to happen. The time and attention-space left for the human to take in any other environmental information was greatly constrained by this single fact. Rather like fitting a horse with blinkers to keep its gaze focused on what is directly ahead, the human beings present in these interactions were similarly, effectively, blinkered.
The above point is highly significant, because it dictated the mode of engagement. Some might say that it kept the human engaged in a specific mode, which indeed it did to a large extent, and human engagement in this interaction, or improvement thereof, was a goal of this investigation.
However, the overarching purpose of this research investigation is to consider how human-machine interaction can be improved within the context of the interactions that have taken place between the Articulated Head and humans. With this question in mind, one must consider whether this mode of engagement was the right one for nurturing human-machine interaction in this context. A major conclusion of this investigation is that it was not! This section of the thesis, and particularly Theme 4: Mode of Interaction, presents themed evidence that confirms and substantiates the reasons for reaching this conclusion; what follows should adequately convey this.
Theme sorting
The next stage of the nodal analysis was to sort out the main emergent themes from the node data. This process is, in effect, the business of working out what narrative the cumulative data analysis is actually conveying.
The question is of course, where does one start?
The text input/output analysis at the beginning of this Section highlighted the high occurrence of what I have termed 'the anthropomorphic stance' adopted by many, if not all, humans in conversational interaction with the Articulated Head. Therefore anthropomorphism was taken to be a likely theme, and nodes were reviewed by title first, and then by content, to see if they could be relevant to this particular theme. It very quickly became clear that several nodes were relevant to this theme, but it also became clear that, just as nodes have overlapping and intersected data references, so too would the overarching themes and the narrative that explicates them. The problem with this situation is that one does not want to repeat oneself (or the presentation of evidence) too often when imparting a narrative in text, for fear of losing the reader's interest and attention.
So, a list of all the node names was printed onto paper and cut up so that each node name was present on a separate slip. During the paper cutting session the names were reviewed again, and a list of possible theme titles was written on separate large labels. The individual node name slips were then sorted into lists by placing them beside the theme label to which they most likely related; any that could not initially be placed under a theme title were set aside for the moment.
After the process of sorting the node title slips under the titled theme labels, I was delighted to find that all but about 20 of the 175 slips had been fairly reliably identified as relating to one of the specific theme titles over and above any of the others, even if some of the data references coded at that particular node do have significance in other themes, as has already been discussed.
A new theme title, 'Outliers', was created for the 20 or so node slips that could not yet be placed under a theme title, and the list of node titles was then arranged under theme titles in a Word document. This Word document became the Interactive Evidence List table, which was then distilled to create the Key Evidence Links Table supplied on DVD.
The Key Evidence Links Table lists theme titles with links to some key textual and audiovisual substantiating evidence. The links table is provided as electronic documentation to act as a quick access panel for the reader. The Key Evidence Links Table can, and should, be used as a standalone link document whilst reading the narrative that follows.
The next step was to review the content of the nodes to check that it made sense to consider that evidence under that particular theme title and if not, then move that node to an appropriate title.
The content of nodes under the Outlier theme title was reviewed to see if those nodes could be reliably and helpfully placed under an appropriate theme title. This sorting and reviewing process sounds simple, but in fact, it was very involved and took a protracted period of time.
Theme titles were sorted into an order that appeared to make some sense for imparting the narrative emanating from the data. Then an attempt was made to sort the node titles under each theme into a sensible order. The initial intention was to try and address each node’s data in list order in the narrative. However, upon beginning to write the narrative, it very quickly became clear that sorting the node order under theme titles was somewhat superfluous, because, in order to make proper sense of it all, the narrative was unlikely to address each node in the list order systematically.
Furthermore, the narrative would not necessarily present the themes systematically in list order either, because many nodes traverse several themes. However, for the reader's benefit, an attempt has been made to be as systematic as possible when imparting the themed narrative.
7.2.2 Revisiting the Knot
The node based data presented through the Key Evidence Links Table is, in effect, the untangled knotted issues and strands of the Knot mentioned in section 2.14.2. The coded nodes are the strands that were knotted; the themes that follow are the families they should have been plaited with. The Outlier theme and associated nodes are the strands that appear to be loose ends. The interpretative narrative that follows proceeds by identifying barriers to nurturing flow in human-machine interaction in this interactive environment. The themes themselves represent the brushed groups of strands that are to be re-plaited, and The Blueprint presented in section 9 is the effective re-plaiting of these knotted issues and strands into a vision: a group of outcomes and related ideas emergent from the investigation that show how interaction between humans and machines can be improved within the context of this, and similar, interactive environments.
When reviewing nodal text data from the Key Evidence Links Table, poignant node references have been highlighted.
7.2.3 The Narrative Emanating from the Coding
Scope
The Articulated Head was exhibited in three separate places: the NIME, SEAM and Powerhouse Museum exhibitions, as previously described. The node coding and the interpretive narrative emanating from it span data collected from all three exhibitions. Various environmental conditions changed over the duration of data collection, including the development and installation of the auditory visual additions described previously. At one stage (only for a couple of days) during the Powerhouse Museum exhibition, the face and voice of the Articulated Head were changed to a female.
Notably, the Chatbot programming remained the same. The Articulated Head's performance capabilities were expanded during the Powerhouse Museum exhibition with the ability to sing a few songs. However, the core business of the Articulated Head, talking to humans interacting with it – primarily via the kiosk keyboard – did not change over the period of data collection. Furthermore, the presentation of the Articulated Head – the face shown on a screen mounted on the end effector of the industrial robotic arm, with the exhibit presented in a metal-framed glass enclosure – did not change to any great degree across the three exhibitions. All three exhibitions were held in noisy environmental conditions in public exhibition spaces. The tracking system and all main capabilities of the Articulated Head remained the same over the duration of data collection. There were some very small variations to the preprogrammed vocabulary of the Articulated Head's A.L.I.C.E Chatbot, as discussed previously under Observation 2 – Programming additions, but for the most part the variances discussed above had only minor reported impact upon participants' overall experience of interactions.
Wherever any variance of the exhibit has affected the data collected, and therefore influenced the interpretive narrative that follows, its effect and any possible significance are discussed in the narrative. Observable impacts of the variations on the wider public interactions that took place are also discussed in the themed interpretive narrative where appropriate.
7.3 The Emergent Themes
7.3.1 Theme 1: Anthropomorphism
This theme is directly linked with sub section 9.2.1 Anthropomorphism.
As has already been identified and discussed in relation to the word 'favourite' under Table 7-26 – Word 'favourite' phrase query list and Observation 1 – The anthropomorphic stance, the anthropomorphic stance taken by humans approaching and interacting with the Articulated Head is strongly evident in the conversational text data logs collected. Examples of evidence show cordial politeness and manners being displayed by the interacting audience in conversation with the robot. Furthermore, this anthropomorphic stance, and a human communications approach to interactions, are very strongly evident throughout the interactions captured in the Video Cued Recall data with research participants.
Indeed, the propensity for probing and testing the robot's preferences and human-like capabilities, with the interacting human clearly making direct comparison of attributes based on their own existential experience, is ubiquitous throughout the entire data set. This anthropomorphic stance is clearly laced with the humans' wants, hopes and beliefs when interacting with the Articulated Head. The empirical data set collected during this research – and more specifically the linked textual and auditory visual data presented as electronic appendices to this document via the Key Evidence Links Table – with numerous instances present across the data set, in all conditions and in all three exhibitions, cumulatively substantiates that humans do want, hope and, in a number of cases, believe that the Articulated Head does in fact have human-like inclinations and capabilities, such as the inclination to flirt (E-Appendix 27: Example 6) or the ability to see, hear, think and even feel emotions or perceive the intentions of others (E-Appendix 28).
The hopes and wants feeding the beliefs of the humans displaying this anthropomorphic stance towards the Articulated Head are considered, in the conclusion of this investigation, to provide pivotal leverage with regard to how one can improve human-machine interaction. Put simply, catering for the hopes and wants of the human in this interaction nurtures belief, even if what is catered for is in fact an untruth, a false statement.
Before proceeding with the themes, it should be noted here that various mitigating circumstances surrounding project development and data collection are cited under the following theme headings. One of the specific aims of this project was to conduct the research in ecologically valid environments, public exhibition spaces with a project team and all the trials and tribulations that this scenario might incur. Therefore circumstances are cited to give a clear overview of what actually happened as this research project progressed. A positive view has been taken in relation to overcoming all the challenges and barriers encountered, resulting in the projection of ideas, theories and recommendations for circumnavigation and negation of these challenges and barriers in future projects in public exhibition spaces in section 9.
7.3.2 Theme 2: Expectations and enculturation
This theme is directly linked with sub section 9.2.2 Expectations and enculturation.
The initial impressions and expectations of people when first approaching the Articulated Head were mixed. For example, one participant reported expecting the kiosk screen to be a touch screen. Many members of the public, and particularly children, frequently assumed that someone behind the scenes was controlling the robot. The Powerhouse Museum has a very large number of school visits, and I was regularly questioned by groups of school children coming around to the laboratory at the rear of the projection screen saying, “You’re controlling that robot!” – referring to either myself or sometimes others present in the lab area at the time. The response given (sometimes many times a day) was to move away from the computer keyboard and exclaim, “look, no hands” – then wait for the robot to speak.
The children were amazed, and then a little perplexed, by the idea that the robot might actually have a mind of its own and was able to speak without me typing. Once convinced that I was not controlling the robot, the children generally showed credulity, appearing quick to accept the possibility that the robot might be alive, whereas adults approaching the robot were considerably more skeptical and testing, probing the robot's capabilities. In general, the research participants' initial impressions and expectations of the robot were most strongly conveyed by the first few actions during their interactions or the first few statements made during their Video Cued Recall interviews. For example, one participant had the visual expectation of seeing Stelarc's face as the face of the robot, as all reporting suggested that this would be the case. The participant was surprised because the female face mentioned earlier was being displayed on the screen during her interaction.
Her first comment in the interview was:
“OK so my first thought was why on earth is it not Stelarc”
The research participants had many preconceptions and expectations of the robot's abilities. Thought, memory, emotions, sight, hearing, controlled movements, dance, singing and a whole range of other expectations can be identified by reviewing the data collected. Some participants' expectations were very high and they reported being disappointed by an interaction from which they had expected more, whereas others, having lowered their expectations because of their initial impression, reported surprise when encountering flow in the interaction. This is an interesting point because it highlights a tangible link between the level of expectation in a performance and the perception of pleasant surprise or disappointment reported by the audience in this interactive performance.
Some participant expectations were the result of information received prior to the interaction, such as the expectation that the Articulated Head could think or retain information in memory, whereas others were more likely formed in the act of approaching the interaction, such as the belief that the Articulated Head could see or hear (E-Appendix 29: Example 7; E-Appendix 30: Example 8). However, there are two resoundingly strong overarching expectations emanating from the research participant data that appear consistently held: 1) that the robot is expected to possess human-like capabilities – memory, environmental awareness, sight, hearing; and 2) that these capabilities should evoke responses that reflect human behavioural patterns, such as politeness, courtesy and expression. For example, the robot should possess memory, be able to remember one's name once told, and be able to say goodbye politely using that name, especially if the robot asked for the name in the first place; otherwise it appears impolite to the human User. Other examples include the expectation that the robot should pay attention to the person it is conversing with, that it should smile in congenial response to human facial expression or when exchanging a joke, and that it should know about other aspects of the museum exhibits and have knowledge of all sorts of things, including, but not limited to, visual and auditory awareness of its immediate environment and the people within it.
These overarching macro dynamic preconceptions and expectations of humans in this human-machine interaction are confirmation, once again, of the strong anthropomorphic stance adopted by humans in this interaction. But what happens when these expectations are not met?
Enculturation describes the process people go through when learning the requirements and behaviours expected by a surrounding culture. The Articulated Head and its interactive environment engender enculturation of those humans who spend more time with it. One participant in particular did spend a long time interacting with the Articulated Head. This participant also had a second interaction recorded and did a second Video Cued Recall interview. Aspects of enculturation do come up in this participant's reports with reference to two specific aspects of conversational foci: 1) she became very aware that even a small mistake in typing could have a significant effect upon the Articulated Head's response in conversation, which made her much more careful about her data input accuracy; and 2) she found that she was able to make the Chatbot repeat itself with specific input: “to get him to repeat something I would ask the question again and he would always give me the same answer”.
The repetition of output from the Chatbot was also noticeable in observations of interactions with the general public. For example, when the Chatbot was asked 'what kind of food do you like?', the answer invariably included sushi and capsicum.
7.3.3 Theme 3: Physical Presentation and Operational Aspects
This theme is directly linked with sub section 9.2.3 Physical Presentation and Operational Aspects and also has a very strong correlation with features discussed under the next two themes: Theme 4: Mode of Interaction and Theme 5: Movement, Tracking and Control.
There are a number of elements pertaining to the physical presentation and operational aspects of the Articulated Head, picked up on through observation, by research participants, and in textual input/output analysis, that had an effect upon engagement over the duration of this study. Most of the operational aspects identified were very practical in nature and were normally noticed as aspects of the operation that were not working properly.
The Data Logger, Sonar Proximity Client, Audio Localizer and Face Tracking Software were only partially operational, or non-operational, for the majority of the duration of this study, as mentioned under section 2.8 Software component based architecture. The reasons for partial or non-operation are as follows:
Data Logger
The data logger was implemented by engineers at the outset of the Articulated Head's debut in the Powerhouse Museum, and was working properly for February and March of 2011. However, an upgrade to the Articulated Head system, which I understand included a new computer, was implemented in March/April 2011, and unfortunately an oversight in the data transfer process meant that the data logger's functionality was not installed with the new implementation. It was not until June 2011, when I asked the engineers to do a log file backup, that I discovered that this functionality was no longer operational. Furthermore, the old files were not available to the engineers because the old machine had been wiped for another purpose. I was told that a new logging system would be implemented in due course, but it was not made clear to me when.
At this juncture I decided to implement my own logging system, hence the text input/output analysis starting in June 2011. The logging system I created was not as comprehensive as the original engineers' system, because I was not given open access to the Articulated Head's central systems or the data strings required to implement a fully featured data logger. At the time, only the User input and Chatbot output strings were available to be passed from the Articulated Head's Event Manager to the Max interface for my use in the Max programming environment. Expansion of the Event Manager/Max interface to accommodate the passing of a little more data was tentatively achieved in June 2011, and the system was stable by August 2011, courtesy of Zhenghi Zhang, who was most helpful in assisting me to interface with the Articulated Head's Event Manager to facilitate the passing of the robotic arm's coordinates into the Max programming environment. This made it possible to calculate the position of the robot's display screen and to place its voice, in the spatial auditory field, in correlation with the position of the face.
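To illustrate the kind of calculation this made possible – a minimal sketch only, written here in Python rather than in the Max environment actually used, with the coordinate frame, listener position and simple stereo panning law all assumed for the example – the azimuth of the display screen can be derived from the arm coordinates and used to steer the voice towards the face:

    import math

    def voice_azimuth(screen_x, screen_y, listener_x=0.0, listener_y=0.0):
        """Angle (degrees) of the robot's display screen relative to a listening position.

        screen_x/screen_y: assumed horizontal-plane coordinates of the display screen,
        derived from the robotic arm's reported position.
        0 degrees = straight ahead, negative = left, positive = right.
        """
        dx = screen_x - listener_x
        dy = screen_y - listener_y
        return math.degrees(math.atan2(dx, dy))

    def stereo_pan(azimuth_deg, field_deg=90.0):
        """Map an azimuth onto a simple equal-power stereo pan (left_gain, right_gain)."""
        # Clamp the azimuth into the reproducible field and normalise to the range 0..1
        a = max(-field_deg, min(field_deg, azimuth_deg))
        p = (a / field_deg + 1.0) / 2.0
        theta = p * math.pi / 2.0
        return math.cos(theta), math.sin(theta)

    # Example: screen one metre ahead and half a metre to the right of the listener
    left_gain, right_gain = stereo_pan(voice_azimuth(0.5, 1.0))

Whatever the actual loudspeaker configuration of the spatial auditory system, the principle is the same: the arm coordinates determine an angle, and the angle determines how the voice is distributed across the reproduction channels.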
Non-implementation of the original engineers' data logger, which logged far more information related to the Articulated Head's operational status than my restricted version did, was not such a big problem because, as stated in sub section 7.1 Text String Input/Output Analysis, the files collected between June 2011 and November 2012 are thought to be highly reflective of the conversational exchanges that transpired between the Articulated Head and its interacting human audience. However, as will become clear in further discussion of the functions of the sonar proximity sensor, audio localizer, face tracking software and other related and linked functionalities of the Articulated Head detailed in section 9: The Blueprint of Emergent Recommendations, these extensions of the exhibit's capabilities, in conjunction with other extensions and a macro dynamic rearrangement of the interactive environment, are found to be critical to addressing the barriers to fluent interaction that have been identified through the text input/output data, the research participant Video Cued Recall interview reports, and the interpretive analysis thereof.
Sonar Proximity Client
The sonar proximity client was implemented as a way of helping to delineate between different Users interacting with the robot. The sensor would register when a person was standing in front of the kiosk, and this data was recorded in the original engineers' implementation of the data logger. Although this registering would have been a little hit and miss, because people who did not interact with the exhibit still walked past the kiosk, it would have helped to identify where text input from a User started and stopped.
The sonar proximity sensor was mounted inside the kiosk above the screen and was functional for a very short time at the beginning of the Articulated Head's time in the Powerhouse Museum. Unfortunately the sensor became a feature of interest for young children, and it was not very long before kids poking their fingers into the mount point hole destroyed the sensor. A grill was fitted, but this was not sufficient to stop the problem. Furthermore, resurrection of the sensor was not useful after the engineers' data logger became non-operational, because the relevant strings were not available to me.
Audio localizer
The concept of the audio localiser establishing the position of a person in the audience with the use of a matching-pair microphone set-up – measuring the phase difference between the pick-up signals of the two microphones and calculating a horizontal angle of incidence from that data – makes some sense. However, because of the ambient noise present in the public exhibition spaces where the Articulated Head was exhibited, this system's accuracy has to be questioned at the very least.
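The general principle can be sketched as follows. This is only an illustration of deriving an angle of incidence from the arrival-time difference between two microphones (which is what a measured phase difference corresponds to at a given frequency); the microphone spacing and the delay value are invented for the example and are not those of the installed system.

    import math

    SPEED_OF_SOUND = 343.0  # metres per second, approximate at room temperature

    def angle_of_incidence(delay_seconds, mic_spacing_m=0.3):
        """Estimate the horizontal angle of a sound source from the inter-microphone delay.

        delay_seconds: arrival-time difference between the two microphones
        (positive if the sound reaches the right microphone first).
        Returns the angle in degrees from the array's broadside direction.
        """
        # The path-length difference cannot exceed the microphone spacing,
        # so clamp the ratio before taking the arcsine.
        ratio = (delay_seconds * SPEED_OF_SOUND) / mic_spacing_m
        ratio = max(-1.0, min(1.0, ratio))
        return math.degrees(math.asin(ratio))

    # Example: a 0.4 ms delay across a 0.3 m spacing suggests a source roughly 27 degrees off-centre
    print(angle_of_incidence(0.0004))

The weakness of the approach in this setting is plain from the sketch: any loud competing source produces a delay estimate, and therefore an angle, that has nothing to do with the person actually conversing with the exhibit.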
The audio localiser was said to contribute to confidence in the system's identification of the location of a person in the audience, by comparing the figures generated with those generated by the stereo camera tracking system. Under certain conditions this localiser data might help to increase confidence in audience location identification, but not under the conditions that were in place during this study, and most certainly not under the conditions implemented in the Powerhouse Museum. The matching-pair microphones were mounted above the projection screen to the rear of the exhibit in the museum (Appendix 1: Plan view diagram), with the glass enclosure between the microphones and the interacting audience. Compounded with the ambient noise present in the exhibition environment, sound from the lab area – which was much closer to the microphones than the audience – and loud noises from other exhibits such as the train behind the Articulated Head, this must have rendered the audio localisation system and the data it collected all but completely useless, providing nothing but erroneous and misleading information for comparison with the stereo camera tracking system coordinates. Erroneous audio localiser data would help to explain the erratic movements of the Articulated Head's robotic arm, which are linked to participant complaints about the robot not paying attention to them. This is discussed under Theme 7: Engagement. A macro dynamic realignment of the stereo microphone system is discussed further in section 9.2.3 Physical Presentation and Operational Aspects.
Face Tracking Software
The face tracking software was working at the NIME exhibition; this is evident from the mimicking reported by participants in Video Cued Recall interviews and evident in the interaction material. The Theme 1: Anthropomorphism video links in the Key Evidence Links Table have already presented evidence of this, so I will not present it again, but this direct link between evidence and themes does help to amplify the crosspollination permeating the data set. This crosspollination is evident when viewing the audiovisual data. To detail the minutiae of every crosspollination is beyond the scope of this document, but the macro dynamic links in this data are of the greatest interest to this investigation and formulate the main findings brought out in this theme's analysis, which are then re-plaited in section 9.2.
The face tracking software was also operational for short periods of time in both the SEAM and Powerhouse exhibitions. The software presented the engineers with a number of difficult challenges, frequently causing crashes of the Articulated Head's systems. There were also crashes caused by other things. Each time a crash happened, restarting the systems took some time. The engineers eventually set up auto-restart routines to circumvent the work required to restart the systems after various crash scenarios. These auto-restart routines significantly reduced downtime. However, after many attempts to get the face tracking software to work properly with the rest of the systems, the engineers abandoned this feature because it was causing too many persistent problems.
Although the software was causing problems at the time, and it was important to make sure the exhibit was perceived to be operational whilst on public display, the face tracking function, and expansion of the use of information that could be stored in a database, presented important opportunities for enhancement of engagement in this interaction – opportunities that do address issues raised in research participant reports. More discussion of this subject is offered under 9.2.3 Physical Presentation and Operational Aspects.
The problem with the failure to implement and expand upon the above functionalities and operational aspects of the Articulated Head's systems was that these functions, regardless of the technical difficulties they presented, were crucial to addressing interactive engagement. Without prolonged, resilient research, development and implementation of these and similar features and functionalities of the Articulated Head, or of any other similar interactive system or exhibit, the quest for improvement of human-machine interaction is essentially thwarted.
Crash/Restart
Other examples of the crash and restart problems are manifest in the text input/output analysis. The high occurrence of the phrase ‘do you want to hear a joke’ identified in text input/output analysis in Table 7-30 – Word ‘hear’ phrase query list is one clear example.
Here is a typical 'do you want to hear a joke' input/output set of exchange strings:
Time: 12_05_31 Robot: Do you want to hear a joke
Time: 12_05_35 Participant: YES
Time: 12_05_35 Robot: Do you want to hear a joke
Time: 12_05_39 Participant: YES
Time: 12_05_39 Robot: Do you want to hear a joke
Time: 12_05_51 Participant: IS THAT IT?
Time: 12_05_51 Robot: You have got me, I am dead, give me 15 second to restart
Time: 12_06_07 Robot: ok, I am back, let’s talk
For some reason the phrase would be repeated three times before the crash and restart. The engineers denied me access to the code, which was unfortunate for all sorts of reasons, one being that I could not diagnose the above problem. However, this was one of the reasons why the engineers instigated the auto-restart routines, because this crash loop happened quite often, and I do not think they found the source of the problem either. The restart routine was successful most times, but not every time: if the robot was not given 15 seconds to restart and further text input was presented, the crash restart routine was more likely to fail.
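Because access to the code was not granted, the engineers' actual routine cannot be shown; the following is a purely hypothetical sketch of the kind of watchdog logic an auto-restart routine might use – detecting a run of identical output strings and holding off further input during the restart grace period. The class, function names and restart hook are illustrative only.

    import time
    from collections import deque

    REPEAT_LIMIT = 3            # identical outputs in a row before declaring a crash loop
    RESTART_GRACE_SECONDS = 15  # the "give me 15 seconds to restart" window

    class CrashLoopWatchdog:
        def __init__(self, restart_callback):
            self.recent_outputs = deque(maxlen=REPEAT_LIMIT)
            self.restart_callback = restart_callback  # hypothetical hook into the system restart
            self.restarting_until = 0.0

        def accepting_input(self):
            """Hold off further text input during the restart grace period."""
            return time.time() >= self.restarting_until

        def observe_output(self, text):
            """Call for every Chatbot output string; trigger a restart on a repeat loop."""
            self.recent_outputs.append(text)
            if (len(self.recent_outputs) == REPEAT_LIMIT
                    and len(set(self.recent_outputs)) == 1):
                self.restarting_until = time.time() + RESTART_GRACE_SECONDS
                self.recent_outputs.clear()
                self.restart_callback()

    # Example wiring (the restart action itself is whatever the host system provides):
    # watchdog = CrashLoopWatchdog(restart_callback=lambda: print("restarting systems"))

The value of holding input back during the grace period is exactly the failure mode described above: restarts were more likely to fail when further text input arrived within the 15 second window.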
Latency
One participant reported experiencing latency with regard to the time that it took the robotic arm to follow them when they were moving around in the space in front of the exhibit. Other related reports under Theme 7: Engagement show that several participants felt the robot was not paying attention to them, because it was moving around and looking at other people whilst in conversation with them.
Latency is interesting because the most common source of delay in conversational flow was actually the people interacting with the robot: it was not uncommon for participants or the general public to take a very long time to decide what question they would ask of the robot, or what response they might give to Chatbot output.
Noise
Ambient noise present in the surroundings of the Articulated Head in the Powerhouse Museum was a problem for a number of different reasons, but it was manageable until the major construction project gained momentum.
After commencement of the construction project, the ambient noise levels in the museum were raised considerably. The robot's voice required more amplification so that the public could decipher what the Chatbot was saying.
Spatial audio also needed more amplification, but this just added to the auditory confusion. The ambient noise levels present in the laboratory area became unworkable. Ambient noise and its effects in relation to the spatial auditory system are discussed in more detail under Theme 12: Auditory Visual Interactive Environment.
Other operational aspects commented on
One participant said that it took a long time for the robot to wake up; this was related to their physical presence within the interactive space. The robot did wake up immediately upon User text input.
E-Mote utterances were output from the Chatbot's text to speech engine from time to time. E-Mote references are directly linked to triggering facial expressions of the Articulated Head, so they should really have been kept internal to the system and not sent to the text to speech engine. Furthermore, the range of E-Mote functions available was not fully implemented with the exhibit. The E-Mote functions presented an obvious route for utilising the extended capabilities of the avatar, where the infrastructure and programming already existed.
Text Space
The kiosk screen above the input keyboard displayed only two lines of text input by the User. This limited input string length was observed to be inhibiting for Users who wanted to input more than a short sentence. This restricted input string length is likely to be one of the contributory factors that explain the imbalance of User input to Chatbot output discussed under Observation 3 – The Input/output word ratio imbalance.
Facial and physical characteristics
There were some generally positive comments from participants regarding the presentation of the Articulated Head's face and movement related to voice. For example, participant comments on the correlation between speech and facial animation were quite positive – “it does actually look like he is talking”
However, there was also some less positive commentary on the way in which the robot's facial expression did not appear to follow the context of the conversation taking place – smiling when happy, for example. Indeed, the robot confirms this with statements such as “the body lacks any human emotions”. This statement makes it appear that the robot does not experience emotion in its body, but it does not preclude it from experiencing emotion in its head or brain; indeed, the Articulated Head did say things such as “I am very happy today” on a regular basis, suggesting that it did experience human-like emotions.
The female face/head displayed at one point during the Powerhouse exhibition attracted positive commentary on how real it appeared, but also some negative commentary on how the Chatbot output did not alter with the change in physical presentation. The Chatbot still said things like 'I am Stelarc, I am 62 of your human years'. The problem with this is that the female face looked closer to 20 years old, not 62. The personality reflected in the Articulated Head's performance, a result of Stelarc's strong contribution to the programming of the Chatbot, is clearly male in the agency given.
One of the most interesting aspects related to the physical presentation, reported by some research participants and also raised in conversations with members of the general public, was the perception amongst people interacting with the robot that it appeared somewhat like a zoo animal, caged! There was a feeling of separation and distance experienced when interacting with the robot. This observation, and a possible realignment of the macro dynamic interactive environment to address it, is revisited in Physical Presentation and Operational Aspects.
7.3.4 Theme 4: Mode of Interaction
This theme is directly linked with sub section 9.2.4 Mode of Interaction.
It was mentioned in section 7.1.3.4 Observation 4 – Conversational diversity across the data set that the vocabulary of the Chatbot looks wider than the Users', based on review of the text input/output string data, but it was also noted in Theme 3: Physical Presentation and Operational Aspects above that User input string length was restricted to what could be seen on the kiosk screen.
Hence the User input string length restriction is likely to be a contributory factor to the range of words used in conversational exchange.
There is a very large number of research participant and interpretive commentary references to problems associated with User input at the keyboard kiosk. These references relate to missing keystrokes, participant input mode references, and the lack of punctuation characters available for User input from the kiosk keyboard. Participants commented that it was hard to type, that keyboard input was slow, and that it required attention to typing. Problems such as typing mistakes and worn key labels affected fluent conversation.
Some participants reported that they felt the tempo or speed of exchange was important to conversational flow, much the same way as it is in human-human communication. These references span the entire participant group and are uniformly spread in reporting. Therefore this keyboard interaction holds a central position in the findings from this theme (E-Appendix 31: Keyboard Evidence), and also holds a central place in terms of the major findings of this investigation.
Consider how this 'text in-speech out' modality of communication affects the interaction. Firstly, although this is an altered modality from that which is immediately natural to a human in human-human communication, it is less of a problem than it might have been fifty years ago. Many people who interacted with the Articulated Head were accustomed to working with computers and a keyboard, and texting and email are commonplace modes of information exchange in modern society. However, this particular modality has the clear disadvantage that one is normally looking at the keyboard when typing, not at the entity one is attempting to communicate with, although one can look up before pressing return. It has already been noted that the research participant interaction videos show an inordinate proportion of the participants' time during interactions being spent looking down at the kiosk keyboard, and then another significant but smaller proportion of interaction time looking directly at the face of the Articulated Head, waiting for a response. The time and attention-space left for the human to take in any other environmental information is greatly constrained by this single fact.
Having established text string input functionality to the Event Manager, it was then necessary to set up a specific speech prompt that triggered the Articulated Head to say 'press enter'. I did this in order to combat audience disengagement with the exhibit and to save the time of having to leave the lab to explain to patrons. This functionality was only used at times when I was present in the laboratory adjacent to the exhibit and could see a member of the public struggling to get a response from the robot. It was used only once with a research participant, to see what would happen if I introduced the apparent ability for the robot to see clothing.
Interestingly, one research participant commented in a Video Cued Recall interview that the text in-speech out altered modality when conversing with the Articulated Head would be less of a barrier to engagement in interaction if the robot replied in the same modality. The participant said that she was very computer literate, but the preconditioning that using computers, email, mobile communication tools and chat rooms in particular had given her made the cycle of 'text in-text out' far more natural than the 'text in-speech out' modality that the Articulated Head installation displays. It was noted that, as it is with 'text in-text out', so too would it be with 'speech in-speech out'. It is interesting also to note that some anecdotal and some concrete evidence collected from audience interactions with the Articulated Head at the museum supports this concept of matching modalities in information exchange. Many people wanted, hoped and expected to be able to speak to the robot (E-Appendix 32: Example 9).
Research participants recorded interacting with the Articulated Head did, at specific points in interaction, instinctively revert to a speech modality of information exchange with the robot. This speech modality phenomenon was also observed with several members of the general public. One specific and very compelling instance is described in section 7.3.6, where a young man gets angry with the Articulated Head. The speech modality phenomenon appears to happen when the robot says something personal to or about the person interacting with it. Participants gave the impression in Video Cued Recall interviews that they had momentarily forgotten that the Articulated Head was a robot. It is clear from some research participant reporting in the Video Cued Recall interviews, and from other observations, that the human-machine barrier was temporarily suspended, that the human attributed human capabilities to the robot, and that the robot had in some respects temporarily passed the Turing test during the interaction taking place. I decided to test this speech modality phenomenon further with one of the last research participants I interviewed and, interestingly, from just that one short test, it appears that one can elicit this response from a human in interaction by getting the robot to say something specific about the person interacting. I did this by using the aforementioned text string input functionality to the Event Manager.
The Video Cued Recall interview for the specific research participant test mentioned above shows that the robot said “I like your pink shirt”, to which the participant typed “it is not pink”; the robot replied “I am colour blind”, and the participant looked at the Articulated Head and said “Yeah, you are!” The two strings “I like your pink shirt” and “I am colour blind” were introduced into the conversation manually via the text string input functionality to the Event Manager. This was the only time that text strings were introduced into a participant's recorded interaction. The participant had no idea that I was able to make the robot speak independently of its preprogrammed response mechanisms, and it appears that because the robot had something specific to say about the personal attributes of the User interacting with it, even if the detail regarding the attribute was inaccurate, the User, momentarily at least, attributed a sense of consciousness to the robot and dropped the percept of human-robot information exchange in favour of a human-human exchange – the modality of communicating was altered in that instant from text input to speech. This human speech response phenomenon (titled The Freakish Peak) is a hypothesis emergent from this investigation and is graphically represented by the diagram introduced earlier (see figure 2-1, page 18, section 2).
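The detail of the Event Manager/Max interface is not reproduced here, so the following is a purely illustrative sketch of what introducing a string manually might look like, assuming a simple network message interface; the address, port and 'say' message format are invented for the example and are not the actual interface.

    import socket

    # Hypothetical address of the Event Manager's text-input interface;
    # the real Max/Event Manager interface used in the installation is not documented here.
    EVENT_MANAGER_ADDR = ("127.0.0.1", 9000)

    def inject_utterance(text):
        """Send a string for the Articulated Head to speak, bypassing the Chatbot."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(("say " + text).encode("utf-8"), EVENT_MANAGER_ADDR)

    # The two strings introduced during the test described above
    inject_utterance("I like your pink shirt")
    inject_utterance("I am colour blind")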
Whilst the test detailed above did introduce a real human consciousness into the robot's systems during a single short conversational interaction with the Articulated Head, and this could be construed as cheating, concealing truths from research participants in order to maintain the integrity of test results is not unusual. The intention, and the fact, was that this test was a simulation of a stimulus that will improve human engagement in human-machine interaction. The significant point is this: if the robot could establish personal attributes of the person interacting with it via sense apparatus afforded to the system (as is recommended in conjunction with the design refinements detailed in the text and diagrams presented in section 9), and could then speak to this person about the attributes identified, then the elicitation of the human speech response phenomenon detailed above would not be simulated with the introduction of a human consciousness as it was in this test. It would be 'real' in the respect that its functionality was endogenous to the robot's systems, and would therefore not be cheating, but autonomous.
Furthermore, since this investigation was focused on finding ways in which to improve human-machine interaction, and the research data gathered prior to conducting this test very strongly indicated the phenomenon's existence, this test and its result constitute a valid attempt to triangulate findings from the collected research data with extra sampling, to affirm the existence of this phenomenon in a simulated test, and to ascertain the phenomenon's sustainability as a repeatable stimulus in the interactive environment. The results of the test, in conjunction with the other data and observations, confirm that the stimulus is repeatable. Variation of the identified attributes of the individual interacting with the robot would increase the sustainability of this phenomenon as a repeatable stimulus.
Development of the robot's capability to identify attributes of a person interacting with it, in order to stimulate the Freakish Peak speech modality phenomenon and to increase engagement in human-machine interaction, is considered a key outcome of this research project. Some of the design refinements detailed in the text and diagrams presented in section 9 are specifically recommended to address this research outcome.
The attribution of perceived consciousness to the robot by the human interacting with it is, I believe, one of a set of keys to the enhancement of audience engagement. Extension of multisensory capabilities to gather specific details pertaining to the interacting audience is essential in order to invoke these kinds of audience responses; furthermore, one would need to cater for the altered modality of the invoked instant in order to prolong the effect. That is, the robot needs to be able to respond to the speech of the person interacting, otherwise the invoked response fails to sustain engagement – hence the need for a voice recognition system.
The positive impacts that automatic speech recognition might bring to human-machine interaction in this and similar interactive environments are significant, and are discussed in more detail in relation to Theme 12: Auditory Visual Interactive Environment and in section 9.2.4: Mode of Interaction.
7.3.5 Theme 5: Movement, Tracking and Control
This theme is directly linked with sub section 9.2.5 Movement Tracking and Control.
The audio localizer functionality and its contribution to the Articulated Head's tracking functions have already been discussed under Theme 3 above.
The stereo camera tracking system worked well for the vast majority of all exhibition time with the Articulated Head, and although it attracted some negative responses in relation to robotic arm movement latency and the 'not paying attention to me' phenomenon reported by research participants, it also obtained some very positive reactions from participants and the public alike.
Children were often observed trying to play hide and seek with the robotic arm, which was wonderful to see. In fact children’s engagement with the physical side of interaction was much stronger than that of their adult counterparts, possibly because typing has little appeal at a young age, and in the case of the really young ones, they may not have been tall enough to see the keyboard anyway.
One of the main complaints of participants reporting on movement and tracking of the Articulated Head was that it appeared distracted and was not paying attention to them. This attention phenomenon appears important because people seem to be put out when attention is not focused on them during conversation; not paying attention is perceived as rudeness.
There is another phenomenon that is clearly evident in participant interactions, and has also regularly been observed in public interactions: the wish of Users to exert some control over the robot and its responses.
The control word anomaly identified in the text input/output analysis presented under Table 7-7 – Word 'say' phrase query list identifies the words 'say', 'sing' and 'dance' as control words; that is, words that were either the trigger or a central word in a phrase that triggered a specific response from the Articulated Head. One could argue that all input triggered a specific response, based on the 28000 or so preprogrammed responses available in the Articulated Head's Chatbot programming. Where these trigger-related words differ is that the User was expecting, and trying, to elicit control over a particular response.
Attempts at User exerted control over the robot manifest themselves in a variety of differing ways in the interaction data: from the attempted control of mimicry in conjunction with the face tracking software, when it was working; to the control of the movement of the robotic arm by mimicry and movement in the interactive space; to the use of the word “say” with a following phrase to control Chatbot output. Other examples include attempts at conversational control, by trying to get the Chatbot to give specific responses to various text input strings, and specific command strings intended to evoke physical or facial expressions such as 'smile', 'laugh', 'cry', 'sing' and 'dance'.
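By way of illustration only – the real A.L.I.C.E Chatbot resolves input through AIML pattern matching, and the canned responses below are invented – the following sketch shows why a control word such as 'say' gives the User predictable leverage over the output:

    def respond_to_control_words(user_input):
        """Illustrative handling of a few 'control words' ahead of normal Chatbot matching.

        Only a sketch: the responses here are invented for the example and are not
        the Articulated Head's preprogrammed responses.
        """
        text = user_input.strip()
        lowered = text.lower()
        if lowered.startswith("say "):
            # 'say <phrase>' gives the User direct control over the spoken output
            return text[4:]
        if "sing" in lowered:
            return "La la la, I only know a few songs."
        if "dance" in lowered:
            return "Watch my robotic arm sway."
        return None  # fall through to the Chatbot's preprogrammed responses

    print(respond_to_control_words("say hello to my friend"))  # -> "hello to my friend"

Because the mapping from control word to response is fixed, the User quickly learns that these particular inputs reliably do what they intend, which is precisely what makes them feel like control.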
All of these and similar instances of User exerted control over the robot are interesting in that they seem to nurture engagement, especially when the exerted control is perceived by the participant as working. It is also interesting that in this transaction the User is noticeably giving agency to the Articulated Head, and is receiving feedback that confirms that their efforts have been successful. In this respect the human User has contributed, albeit momentarily, to the artwork and exhibit through the act of interaction; they have left a mark. This aspect of interaction related to User exerted control is thought to reflect an important human instinct and is considered to provide an avenue for improvement of engagement in this human-machine interaction, as discussed in sub section 9.2.5 Movement Tracking and Control.
7.3.6 Theme 6: Dialogue
This theme is directly linked with section 9.2.6 Dialogue.
Section 7.1 presented the Text String Input/Output Analysis, which treated User input and Chatbot output largely as separate entities. The following looks at the dialogue and the conversational interaction largely from a combined perspective.
During conversational interaction between humans and the Articulated Head, many recurring audience questions put to the robot resulted in recurring robot responses; this is already evident from review of the Text String Input/Output Analysis tables.
A key aspect of conversational interaction that emerges from participant reports of the experience, and has been raised in several other places already in this thesis, is that participants appeared to be drawn into a sort of interactive vacuum, where they were focused almost exclusively on the keyboard and face at the expense of almost everything else. This stands in stark contrast to normal human-human conversation in a number of differing and very important ways. Whilst it is normal to make eye contact at various intervals during human-human conversation, it is also normal for the gaze to be free to take in the visual environment. The eyes and mind build a panoramic vision of the immediate surroundings, and it is not uncommon for aspects of these panoramas to become subjects of interest in the conversational foci taking place. Table 7-16 – Word 'Harry' phrase query list – clearly helps to illustrate that the environment within which the exhibit and human User are situated can influence conversational foci and be reflected in conversational references.
The same is true of spatial audio in this respect; sounds and voices may become the subject of discussion. The brain uses head-related transfer functions, which play a pivotal role in helping humans locate sound sources in environmental space. “Head movements are known to help listeners to perceive the location of a sound source” (Mason, Kim, & Brookes, 2009).
Therefore it follows that constriction of gaze and head movement will inevitably inhibit a human's auditory visual perception. A human's focus on the conversational foci taking place is desirable, but only if the conversation makes sense and has flow over a protracted period of time, or number of exchanges. Focus on the conversational foci in an attempt to make some sense of it, at the expense of everything else, is not desirable within the context of the interactions under investigation herein.
There were all sorts of anomalies identified in conversational interaction between humans and the Articulated Head. The following is an example of an anomaly reported:
“The robot contradicts himself, by saying that he is both self-programmable and unable to self-program. “
The robot stated in one conversation that he (it) was a libertarian. The participant commented that the robot being a libertarian was rather odd, as it was attached to a post in a museum, so the ideology must be rather difficult to implement.
There were many reports with mixed messages regarding conversational interaction; some parts of conversation were reported to flow well, whereas other sections were reported to go badly. Sometimes participants reported banal responses, whereas at other times they reported the robot making good sense. Generally, participants experienced a little of both good sense and nonsense in conversational interaction.
Vocal characteristics of the robot affected intelligibility of its voice.
Participants who were having trouble understanding what the robot was saying cited pronunciation and inflection, or the lack thereof, as the main cause of misunderstandings. Some participants complained that the robot was speaking too fast, whereas others said it was speaking too slow. Generally, participants felt it was difficult to understand the robot due to the speed of talking and the lack of inflection in the voice.
There was a range of very coherent responses and engaging aspects of language picked up on in recorded interactions and interviews. One participant reported that they felt the robot was being nice to them and therefore, they felt like being nice back. This participant believed that this process of reciprocal niceness improved the conversation, and sense of engagement as a result.
Some participants reported it being difficult to converse with the robot. There appeared to be a number of different reasons for this, but the difficulty of keeping a strand of conversation flowing, coupled with a waning will to keep trying, appeared to be the main underlying reason for participants experiencing this difficulty. Some latency was reported in sections of the interaction where the participant had to wait for the robot to respond to an input.
Questionable links between exchanges, banal replies to User input from the robot, repeated sections of sentences and repeated responses to User input were all recorded. There were many cases where the robot's response did not match the question asked. Many sections of conversation suggest that the robot did not understand the User input. Some participants reported the conversation being shallow and said that conversation could only be maintained with short, superficial input (small talk).
Several participants reported their pleasure that the robot showed signs of being interested in them. This is a pattern that has come up in many places in the data. It appears that it is important that the robot pays attention to the human interacting with it.
There is a multitude of other micro dynamic conversational exchanges that could be examined in more detail; for instance, some people perceived that the robot was being rude at times, some said it was sleazy, and some reported concordance and agreement with the robot whereas others reported misunderstandings and disagreement.
However, the macro dynamic messages that emerge from review of the dialogue are that conversational flow rarely extends beyond a couple of exchanges, that humans find driving the conversation hard work, and that humans want the robot to be interested in them and generally report more positive feedback when the robot shows an interest, or agrees with them.
Fundamentally, as is typically the case with human communication, engagement appears to be directly linked to the robot showing an interest in the audience and finding concord with them on subjects of the human’s interest. The result of this concord, when found, is that the human reports liking the robot. To address this macro dynamic finding in conjunction with Observation 1 – The anthropomorphic stance, a few simple recommendations for improvement of conversational interactive engagement are put forward in section 9.2.6 Dialogue.
The opposite of the above, where participants experienced rudeness or disinterest emanating from the robot, invariably seemed to nurture some degree of human disengagement from the interaction. Disengagement was often shown by changes of subject in conversation, but it has also been observed to put members of the public off altogether. Table 7-31 – Word 'talk' phrase query list – displays the phrase 'I don't want to talk about that now' emanating from the robot; this phrase has been observed to engender frowns and other displays of upset from humans interacting with the robot.
One very interesting general public interaction I observed entailed a young man typing rude words into the kiosk. One of the engineers had set up a Chatbot response that called the User a loser when they used this particular four-letter word. Upon being called a loser by the Chatbot, this young man instinctively went into a fit of rage and started shouting expletives, pointing his finger and shaking his fists at the robot. I found this immensely amusing but also very interesting, because it showed me that the instinctive nature of humans to default to the modality of speech is very strong, and lingers close to the surface of human–machine interaction even when subdued.
Another interesting aspect of dialogue exchange is that Stelarc’s personality shows through in the Chatbot programming from time to time. Table 7-33 – Word ‘western’ phrase query list – showed Stelarc’s strong interest in the Western Bulldogs and country and western music in particular. This observation raises the idea that concordance with a human User in conversational interaction could fairly easily be simulated in dialogue by copying, storing and then repeating preferences of the User in subsequent conversation.
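By way of illustration only, the following short Python sketch outlines how such preference mirroring might be implemented. The pattern, class and function names are hypothetical and do not reflect the actual A.L.I.C.E./AIML implementation used by the Articulated Head; the sketch simply demonstrates the copy, store and repeat idea.

import re

# Illustrative only: a toy preference store layered over a chatbot reply cycle.
# Nothing here reflects the actual A.L.I.C.E. implementation; it sketches the
# "copy, store and repeat" idea for simulating concordance with the User.
PREFERENCE_PATTERN = re.compile(r"\bI (?:like|love|enjoy) (.+)", re.IGNORECASE)

class PreferenceMemory:
    def __init__(self):
        self.preferences = []                      # preferences volunteered by the User

    def observe(self, user_input):
        """Copy and store any preference the User states."""
        match = PREFERENCE_PATTERN.search(user_input)
        if match:
            self.preferences.append(match.group(1).strip(" .!?"))

    def concordant_reply(self):
        """Repeat a stored preference back to simulate concordance."""
        if self.preferences:
            return "I like {} too. What else do you enjoy?".format(self.preferences[-1])
        return None

# Example exchange
memory = PreferenceMemory()
memory.observe("I like country and western music.")
print(memory.concordant_reply())
# -> "I like country and western music too. What else do you enjoy?"

In practice the same effect could equally be achieved within the Chatbot's own scripting; the principle of capturing a stated preference and echoing it back later in conversation remains the same.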
Table 7-35 – Word 'old' phrase query list – identifies the two most consistently common phrases to appear in both User input and Chatbot output: 'what's your name' and 'how old are you'. User answers to these questions clearly present opportunities for enhanced engagement in conjunction with memory and representation, as discussed in section 9.2.
Opportunities that allow the robot to show more interest in, and concordance with, the User, and to initiate conversational topics rather than being driven by User input, are expected to enhance conversational engagement in human–machine interaction.
7.3.7 Theme 7: Engagement
This theme is directly linked with section 9.2.7 Engagement.
Table 7-29 – Word 'time' phrase query list – identifies that the 'spare time' phrase is normally prefixed with 'what do you do in', and that this phrase both encourages User input and indicates that the robot was interested in the User. The cross-pollination between themes is slowly but surely becoming much more obvious. Various aspects pertaining to engagement have already been raised in the themes preceding this one.
The robotic arm movements are linked to participant reports saying that the robot was not paying attention to them, and have also been cited as having a possible link to erroneous stereo microphone input. Participants reported that the robot appeared distracted. Latency in human input is linked to people having to think hard to work out what to say, and then needing time to pay attention to the typing in order to get the words input correctly. Participants reported becoming bored or exhausted with the interaction after a while because it was difficult to keep a conversation going; they felt they were not getting anything back. Participants wanted the robot to be interested in them and found it annoying when the robot appeared to ignore them and pay attention to other people during interactions. Participants became excited when the robot showed signs of being interested in them or remembered something individual about them. When the robot was nice or friendly, participants tended towards being nice and friendly back.
The same things seem to crop up over and over again in the data, but the most prominent message of all with regard to engagement is that it would be more engaging if you could speak to it. It is interesting to note that although some participants instinctively approached the interaction with speech, some participants who were already in the throes of text input still tipped over into a speech modality. This invariably happened after a preceding exchange which strongly suggested the robot's possession of consciousness. It appears that the trigger can be accidental or deliberate and still procure the same outcome. However, the triggers do appear to relate to the robot being conscious of a particular aspect or feature of the human that it would be unlikely to know without senses or consciousness, hence triggering the attribution of consciousness in the human mind and leading directly to the subsequent speech action.
It is not clear whether systematic, deliberate placement of triggers will consistently stimulate the same response, but this observation clearly provides opportunities for development and further testing, as discussed in section 9.2.7 Engagement.
7.3.8 Theme 8: Emotions
This theme is directly linked with section 9.2.8 Emotions.
The emotions theme shows a strong correlation with previous themes. The anthropomorphic stance taken by the interacting audience manifests in the checking of emotions in just the same way as it transpires in the checking of preferences. This checking of emotions appears rather like a human prodding another creature that appears to be dead, just in case there is some life left in it. Strangely though, if one received a reaction from the creature after prodding, one would almost immediately assume that there was some life left in it, and the prodding would stop.
However, this prodding by the participant to gain a response appears to continue even after confirmatory responses from the robot were received. It is as if the human somehow cannot believe that the robot possesses the capacity for emotion. This further highlights the conflict between what one wants to believe and what one ultimately must know only exists within the biological condition.
Amazingly though, there are instances in the research data where a research participant has attributed the capacity for feeling emotions to the robot. For instance, there is an exchange in which the participant asks the robot if he was embarrassed; the robot replied that he does not have feelings, to which the participant replied, "I don't believe that!". In the Video Cued Recall interview the participant confirmed that she did not believe it.
This suggests that although it is difficult to elicit attributions of conscious existence from the human towards the robot, it appears equally difficult to reverse the attribution once given. This directly links with the hopes, wants and beliefs of the human interacting with the robot. The term scotoma describes the condition where the mind sees what it chooses to see rather than what is actually there. For example, the mind can fill in gaps when reading text so that the sentence you are reading makes sense even when words are missing. It is as if the mind's eye has a blind spot, but the mind itself allows sense to be made from the otherwise incomplete data; clearly, if one knows what the mind wants to see, then one can give the mind a helping hand in seeing it, through simulated performance that points in the desired direction. This human predilection for scotoma represents opportunities for leverage in the improvement of this human–machine interaction, as discussed in section 9.2.8 Emotions.
It seems to me that the systematic checking of preferences and emotions by the human interacting with the Articulated Head was really all about sharing existential experience. Humans appeared to want to be in charge of the interaction but simultaneously wanted the robot to be able to share common objects of consciousness and sedimented layers of experience as perceived through any and all of the senses, and to be able to discuss and have opinions related to those elements raised during the interaction.
Emotions expressed by the robot and by the humans in conversational interaction needed to be acknowledged by the Articulated Head's performance in various ways, to show some concordance with the human's existential experience. This needed doing not just in words but also in gestures, facial expressions and tone of voice. Empathy, sorrow, fear, anger, loathing, happiness, humour and so on needed some simulation of human-type responses present in the Articulated Head's performance; if not human-type responses, then at the very least some robotic-type responses that the human could easily draw parallels with. This expansion of nuances in the virtual performer's performance was needed in order to combat the conflict between hopes, wants and reality that appears to exist in the human mind in relation to interactions. The aim would be to promote human favour of the hopes and wants that nurture belief in, and attribution of, existential experience to the robot.
7.3.9 Theme 9: Senses – Related
This theme is directly linked with section 9.2.9 Senses – related.
The senses have been raised, and their cross-pollinations are evident, in several other themes; they have been a subject of interest throughout this thesis.
Table 7-30 – Word 'hear' phrase query list – references the Chatbot's use of the word 'hear'. The use of the word implied to the User that the robot could hear. Several pieces of evidence already presented point to the fact that people interacting with the Articulated Head hoped, wanted and expected the robot to be able to see and hear them; this message is loud and clear in the evidential data.
It has been noted that there was much less reference to the other senses (smell, taste and touch) in the research data. Possibly this was because of the void and distance created between the Articulated Head and its audience by the enclosure. Perhaps the aforementioned caged zoo effect reduced the relevance of these other senses in the interactive environment to some extent, or perhaps the human instinctively thought the robot less likely to possess these senses anyway. Whatever the reason, further discussion of possible extensions to the robot's performance in relation to these senses is conducted in section 9.2.9 Senses – related.
7.3.10 Theme 10: Memory Related
This theme is directly linked with section 9.2.10 Memory related.
In the textual evidence for this theme there is an engaging example of a participant sharing a secret with the Articulated Head; examples where the robot shows self-awareness and displays a possible capacity for learning, and in one case teaching, are also present. There are further examples of participant wants and hopes regarding conversational flow.
Review of Table 7-30 – Word 'hear' phrase query list – shows the phrases 'question I don't hear everyday' and 'I only hear that type of thing': both phrases imply memory of previous conversations.
The robot stated that its prime directive was to collect new knowledge, yet there is scant evidence of the robot's ability to do this in conversational interaction. There is more than one example in the empirical data where the robot told a participant that he was searching for something in memory or on the Internet, and then failed to follow through with the results of that search. In one case the participant expressly asked for the results and was not given them. The participant commented in Video Cued Recall that they had a fear that they might be harbouring unrealistic expectations of the robot.
However, if the robot had followed through with some results, the participant's fear of holding unrealistic expectations would have been allayed. Repeated re-affirmation of their expectation being realistic would then very likely lead to embedded enculturation with a positive, and possibly permanent, effect upon future interaction, leaving the participant with the belief that the robot does possess the skills to search its own memory and present results.
Sometimes the Articulated Head did succeed in showing that it had a memory of conversational themes that had just passed. However, this memory seemed to be limited to the recall of short strings; often the section of string chosen did not fit well into the context of the conversation currently taking place, and sometimes the pasted string did not fit well into the constructed sentence either. Sometimes the Chatbot remembered a person's name during interaction and other times it did not. Sometimes the Chatbot carried over a name stored from a previous interaction into the current interaction; this was also true of subjects that it seemed to recall.
The Articulated Head, and more specifically its Chatbot, appeared to display a distinct lack of memory during interactions. Participant reports have shown that this can lead to frustration and annoyance. Some active display of memory is believed to be a very important factor in tying together all the findings from the themes that have preceded this one. How expansion of memory capabilities might have improved the Articulated Head's performance in human–machine interaction is discussed in more detail in section 9.2.10 Memory related.
The concept of thinking is normally shown in conversation by recalling strands of previously discussed or digested information, and then presenting new reasoning based on consideration of these strands and synthesis of their contributions to meaning in the context of 'now'. This process sounds complicated, and indeed it is; fortunately, the biological brain is uniquely equipped for performing such processes. However, simulation of thinking in the Articulated Head's performance could be, and to some extent was, achieved through the construction of pre-programmed responses. The problem is that to simulate thinking convincingly, synthesis of prior information and its contribution to meaning in the context of 'now' must be included in the simulation's ingredients. Pre-programmed strings can simulate synthesis of prior knowledge, but they cannot be equipped with the facility to place that synthesis in the context of 'now' without the inclusion of some current variable. Current knowledge pertinent to the immediate situation and surroundings could, and should, have been pre-programmed into the Chatbot's memory repertoire for inclusion in conversation, but it is the inclusion of current variables in simulation that makes the performance really convincing. It is for this reason that the Articulated Head needed to be able to establish at least some current variable conditions in order to place conversational foci in the context of 'now'. Ways in which this could be achieved are discussed in section 9.2.10 Memory related.
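As a purely illustrative sketch, and assuming nothing about the Chatbot's actual scripting, the following Python fragment shows how one pre-programmed response template might be combined with a single current variable (here, the machine's clock) so that a canned reply is anchored in the context of 'now'. The template text, topic and opening hours are invented for the example.

from datetime import datetime

# Illustrative sketch: a canned response enriched with one current variable
# (the time of day) so that simulated "thinking" is placed in the context of
# 'now'. The template and opening hours are hypothetical, not Chatbot content.
RESPONSE_TEMPLATE = ("I was thinking about {topic} earlier. "
                     "It is {part_of_day} now, so the Museum is probably {state}.")

def part_of_day(now):
    if now.hour < 12:
        return "morning"
    if now.hour < 17:
        return "afternoon"
    return "evening"

def contextualised_reply(topic):
    now = datetime.now()
    state = "open and busy" if 10 <= now.hour < 17 else "quiet"
    return RESPONSE_TEMPLATE.format(topic=topic,
                                    part_of_day=part_of_day(now),
                                    state=state)

print(contextualised_reply("robots"))

Even a single current variable of this kind, woven into otherwise static strings, gives the impression that the response was synthesised in the moment rather than retrieved from storage.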
7.3.11 Theme 11: Outliers
This theme is directly linked with section 9.2.11 Outliers.
Having reviewed the data in the nodes sorted under this theme, I found that virtually all of them could be satisfactorily sorted under Theme 1: Anthropomorphism or Theme 6: Dialogue. This left me with just four very short snippets of nodal data that did not fit anywhere else.
Two of the snippets very briefly relay comments related to science fiction references. One participant comments:
“I think the display or the head could become more engaging, for me personally, if it kind of bought up some of these issues with artificial life dominating human life”.
There were a few other references to science fiction identified in the text input/output analysis and participant interaction data, such as references to Star Trek, R2D2, Star Wars and the phrase ‘life the universe and everything’.
Given the context of the robot being presented as a Thinking Head, perhaps some more Chatbot programming related to these subjects would be interesting.
The other snippet of information that could not be placed comfortably under another theme relates to one participant indicating that she felt there was a third party involved in the interaction. She states:
“it was paying attention to me, which was nice and that sense of being watched was, I guess you would say, more natural than the sense of something else watching us – It would be like me having this conversation with you and a security camera taking us, that sort of, someone else sitting behind that security camera watching what we were doing, where as, I attributed the camera ‘it’ to – it then interpreting what I was doing” –
Interviewer interjects – "so effectively like you meeting a robot that was being watched by the aliens that created it" participant confirms with a "yeah, for sure and I think the other thing is too, you can sort of immerse yourself in the conversation and interaction but as soon as you notice there is something else up there it sort of – snaps out the magic of it I guess – so as soon as you notice there is cameras and there is, it sort of makes you go oh – someone might be controlling that …it destroys the illusion".
The participant was alluding to the fact that hiding aspects of the exhibit is important to creating the illusion, which sustains engagement.
The participant was aware before she entered the interaction that she was being recorded; nevertheless, the magic of the illusion is both reported and destroyed within her statement, and presumably the same was true within her interaction as well.
The participant's comments strongly suggest that immersion in the magic of the illusion in this interaction can be reached reasonably easily, but it is important not to present any features in the exhibit design that would quickly destroy the illusion, because the illusion is just as easily (if not more easily) deconstructed as it is constructed.
7.3.12 Theme 12: Auditory Visual Interactive Environment
This theme is directly linked with section 9.2.12 The Auditory Visual Interactive Environment.
This theme reflects upon, and evaluates, the auditory visual additions tested in parallel with research participant interaction with the Articulated Head in the Powerhouse Museum. Within the context of the embodiment of the human participant's brain during this interaction, the critical importance that dimensional layout and display have upon the effectiveness of audio visual aids, and upon the strength of spatio-temporal contextualizing cues in relatively unconstrained interactive public exhibition spaces, is considered.
Conclusions presented in section 9.2.12 The Auditory Visual Interactive Environment contribute a refined experimental project and exhibit design, aimed at expediting more encouraging participant reportage of the enhancement of engagement in Human Computer Interaction with this, and similar types of, interactive installation exhibits.
Video Cued Recall Outcomes
The Video Cued Recall interview data identifies a very broad range of phenomena experienced by participants during interaction with the Articulated Head. The following text predominantly imparts only features that directly link participant experience to the auditory visual additions.
Extracts of Video Cued Recall interview data, where research participants were making direct reference to the auditory visual environment, are presented in Appendix 3 – The Auditory Visual References Table. These extracts have been specifically selected as they clearly impart the overarching impact of the auditory visual interventions, and they are letter coded according to Column 'S'. The following is one example of a short extract: 'Participant explains that although he was aware that the projections were there, he did not really pay any attention to them as he was focused on the interaction, on the keyboard and face.'
The overarching message extracted from all the Video Cued Recall data related to the auditory visual environment, including but not limited to the comments listed in the table in Appendix 3, was that most participants did not notice the auditory visual additions, and even when they did, the contribution to the experience of interaction with the Articulated Head was generally perceived to be insignificant, peripheral, subconscious or even divorced from the mainstay of their experience of interaction with the Articulated Head.
b) How can one effectively amplify the impact of the noticed phenomena and convey the presence of those intended phenomena which have not yet been recognised by the participants?
c) What other mediating considerations and constraints beside the above should be taken into account in order that one might expedite more encouraging participant reportage of the enhancement of engagement in human computer interaction with the Articulated Head or similar interactive installations, whilst retaining integrity of the exhibit within the wider museum environment?
Mediating considerations and constraints
Appendix 1 is a complete plan-view diagram of the exhibit layout and surroundings. Please refer to it for clarification of any features of the exhibit that are referenced in square brackets [] in the following text, including the dimensions and position of any physical aspect of the enclosure and identification of speaker mounting positions.
The Spatial Audio System
Loudspeakers [L1, L2 & L3] projected their dispersion patterns out toward the opposite [Wall A]. They were mounted to the left hand side of the enclosure [apex], just above ground level, spaced evenly and inset into the enclosure wall for the purpose of maintaining a flush face to the exhibit enclosure walls.
The direct sound dispersion angle of each speaker was too low to allow any direct sound to reach the ears of a person standing at the kiosk. Two loudspeakers [L4 & L5], mounted similarly to [L1, L2 & L3] but to the right hand side of the enclosure [apex] when facing the robot, projected their direct sound dispersion pattern away from a person standing at the [kiosk], out towards the [large doorway] into the Engineering Excellence Awards Exhibition (EEAE) and [Wall B]. This meant that only low-level reflected sound from hard surfaces, which were considerably further away from the speakers than was the case on the left hand side of the [apex], was reaching the ears of a person at the [kiosk]. No direct sound from those speakers was reaching the ears of the person standing at the kiosk at all.
Bob Katz, in his book Mastering Audio, comments: "did you know that wearing a hat with a brim puts a notch in your hearing at around 2 kHz" (Katz, 2007, p. 46). The reflective surfaces that feature as part of the enclosure were obscuring direct sound from reaching the ears of a person standing at the kiosk, and were also affecting the frequency characteristics of source sounds in a similar way.
Frequency cancellation, the colouring effects of comb filtering and the altered frequency content of reflected sound, caused by speaker mounting positions and their subsequent dispersion patterns, were all ubiquitous within the spatial auditory environment. These confounding aspects of the built environment compromised aspects of the experimental design, having a pivotal impact upon the ability to balance the Distance Based Amplitude Panning (DBAP) (Lossius, Baltazar, & de la Hogue, 2009) system effectively.
Balancing this system was critical to the effectiveness of virtual sound source placement within the spatial audio environment. Distance Based Amplitude Panning is one of a range of auditory spatialisation techniques, chosen in this case because it is a useful technique for auditory spatialisation where placement of loudspeaker units in irregular or undesirable positions becomes necessary.
Although Distance Based Amplitude Panning allows for irregular speaker placement, the system still has limits in this regard.
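For clarity, the following minimal Python sketch reproduces the core DBAP weighting as I read the published formulation (Lossius, Baltazar, & de la Hogue, 2009): each loudspeaker gain is inversely proportional to its distance from the virtual source, raised to a rolloff coefficient, and the set of gains is normalised for constant total power. The speaker coordinates below are placeholders and bear no relation to the actual Powerhouse Museum array.

import math

# Minimal Distance Based Amplitude Panning (DBAP) sketch: gains fall off with
# distance from the virtual source and are normalised for constant total power.
# Speaker positions are placeholders only, not the actual exhibit layout.
def dbap_gains(source, speakers, rolloff_db=6.0, blur=0.2):
    a = rolloff_db / (20.0 * math.log10(2.0))          # rolloff coefficient
    distances = [math.sqrt((sx - source[0]) ** 2 +
                           (sy - source[1]) ** 2 +
                           blur ** 2)                   # spatial blur avoids division by zero
                 for sx, sy in speakers]
    k = 1.0 / math.sqrt(sum(1.0 / d ** a for d in distances))   # power normalisation
    return [k / d ** (a / 2.0) for d in distances]

# Example: five speakers (placeholder coordinates in metres) and one virtual source
speakers = [(-2.0, 0.0), (-1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(dbap_gains(source=(0.5, 1.5), speakers=speakers))

The weighting assumes that all channels are acoustically balanced at the listening position; hence the importance of the measurement procedure described next.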
An Ono Sokki LA-1210 (Type 2) ("ONO SOKKI-Products Information & Service," 2012) sound pressure level (SPL) meter, which conforms to the [JIS C1502 Type 2, IEC 60651 Type 2, Draft IEC 61672-Aug 1998 Class 2, ANSI S1.4 Type 2] standards, was used for taking sound pressure level readings. The unit's omnidirectional microphone was placed at the centre of the expected listening position of a person standing at the kiosk – 1.65 m. Full-bandwidth (20 Hz – 20 kHz) pink noise was sent from the Max environment out to each speaker in the SPAT array in turn, and a decibel reading was then taken. A purpose-built iPhone app, created using the c74 ("c74," 2012) object, provided the ability to control the volume of each channel of the SPAT Max patch remotely whilst taking readings at the kiosk. The spatial audio system was balanced by obtaining the same sound pressure level reading at the listening point from each speaker in the array.
Ambient noise in the museum was measured at the normal kiosk position on a normal day, before the aforementioned construction project started. The reading produced a typical sound pressure level of around 83 dB. The maximum sound pressure level reading obtainable from the most obscured speakers in the SPAT array (L4 & L5), using pink noise at the maximum system volume obtainable without clipping, was 86 dB. Although human hearing perception has high resolution, and a change of 1 dB in the sound pressure level of a source present within the sound field may be perceptible to the experienced listener, the average listener does not readily notice it (Katz, 2007). Because all speakers in the SPAT array had to be balanced to the same sound pressure level reading as (L4 & L5) to render a balanced DBAP system, they were all subject to the same governing constraint. Therefore, the effective system headroom for raising source levels above the ambient noise present within the spatial auditory environment was equal to or less than 3 dB, rendering sound pressure level variation of virtual sound sources barely perceptible to an audience at the kiosk position.
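The balancing and headroom arithmetic can be summarised in a short illustrative Python calculation. Only the 83 dB ambient reading and the 86 dB ceiling set by speakers (L4 & L5) are taken from the measurements reported above; the readings for the remaining channels are invented purely to demonstrate the trim procedure.

# Illustrative calculation only: balance all channels down to the most obscured
# speaker and estimate the resulting headroom above ambient noise. Per-speaker
# readings other than the 86 dB ceiling are invented for the example; the 83 dB
# ambient figure is the measurement reported in the text.
ambient_spl_db = 83.0
measured_spl_db = {                       # pink-noise SPL at the kiosk, full volume
    "L1": 94.0, "L2": 93.0, "L3": 92.0,   # hypothetical readings
    "L4": 86.0, "L5": 86.0,               # most obscured speakers set the ceiling
}

target_spl_db = min(measured_spl_db.values())            # balance to the weakest channel
trims_db = {ch: target_spl_db - spl for ch, spl in measured_spl_db.items()}
headroom_db = target_spl_db - ambient_spl_db

print("Per-channel trim (dB):", trims_db)
print("Headroom above ambient: {:.1f} dB".format(headroom_db))   # -> 3.0 dB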
Furthermore, this balancing act left an unacceptable situation created by the exhibit, whereby the volume of audio output from speakers (L4 & L5), although supplying the correct sound pressure level reading for the meter at the kiosk listening position, rendered the audio volume in the museum space to the right of the enclosure [apex] far too loud to retain integrity of the exhibit within the wider museum environment. This problem manifested itself with several speakers in the array to a greater or lesser degree, depending on obscurations and/or distance from the meter at the kiosk position.
Other confounding and compromising aspects of the spatial auditory system design included: speakers [L8, L6, U5 & U7], mounted on [Wall A & Wall B] of the Engineering Excellence Awards Exhibition, which were probably too far away from a participant standing at the kiosk to contribute fully to the spatialisation; and speakers [U2, U3 & U4], which were directed towards the participant.
Within the context of audience embodiment, and in order to secure a powerful, immersive and dynamic auditory visual experience for that audience, auditory visual presentation of stimuli must remain a central critical concern in all design decisions made from concept through to the completion of any such installation. The compromises made with dimensional layout and display when installing the auditory visual additions in the Powerhouse constituted the reasons that jeopardised full realisation of the design intention (Paine, 2010). One cannot compromise on the technical details of presentation of auditory visual stimuli with each obstacle that presents itself within the design and construction sequence of an exhibit, and then expect the outcome to retain veracity with conceptual intentions. In fact, with each compromise made, one experiences a dilution and depletion of the impact of auditory visual stimuli upon the interacting audience.
So, it is not a case of form over function or vice versa; it is very clearly a case of function within form. This means that function and form are of paramount importance together. Synergy rather than compromise, with critical concern for the details of both visual and technical design decisions, centred around the unchangeable aspects of audience experience presented by virtue of their embodiment, must be adhered to from the outset of any such project if it is to be wholly cohesive and successful in meeting the conceptual intentions and outcome criteria set by the body of interested parties contributing to its implementation.
To this end, section 9.2.12 The Auditory Visual Interactive Environment contributes an experimental project and exhibit design refinement, aimed at giving an audience a more immersive and enveloping auditory visual experience and expediting more encouraging participant reportage of the enhancement of engagement in Human Computer Interaction with this, and similar types of, interactive installation exhibits.