Avatars, Agency and Performance: The Fusion of Science and Technology within the Arts
Richard Andrew Salmon 2014
2. Introduction
The Australian Research Council (ARC) and the National Health and Medical Research Council (NHMRC) worked together to fund the Thinking Systems initiative in 2005/6. This funding initiative was aimed at supporting cross-disciplinary research.
This thesis, and the research conducted to complete it, stem from that Thinking Systems initiative grant. The Thinking Systems initiative was a large-scale, multi-institutional, cross-disciplinary project involving a range of research initiatives related to thinking systems, robotics, human-computer and human-robot interaction.
The Thinking Head Project based at The MARCS Institute (Institute, 2012) at the University of Western Sydney (“University of Western Sydney,” 2009) was one of three successful applications for this research initiative:
1. From Talking Heads to Thinking Heads: A Research Platform for Human Communication Science
2. Optimizing autonomous system control with brain-like hierarchical control systems
3. Thinking Systems: Navigating Through Real and Conceptual Spaces
The Thinking Head project was a large-scale, five-year art-science research project that owed its inception to Stelarc’s Prosthetic Head. The Prosthetic Head incorporated Head0+ as detailed in E-Appendix 1: An Introduction to Head Zero (Luerssen & Lewis, 2008). The project brought together speech scientists, psychologists, software developers, artificial intelligence experts and artists.
The research reported herein is based upon one particular component of the Thinking Head project, The Articulated Head. This was a part of the large-scale, multi-institutional and cross-disciplinary project called the “Thinking Head Project” (Herath, Kroos, Stevens, Cavedon, & Premaratne, 2010), and was conceptually the artistic vision of well-known Australian performance artist, Stelarc (Zebington, 2012).
2.1 Stelarc’s background
“Stelarc is an Australian performance and installation artist who has internationally exhibited, presented and performed in Europe, Asia, North America, South America and Australia. He has visually probed and acoustically amplified his body. He has made three films of the inside of his body, filming three meters of the internal space of his lungs, stomach and colon. Between 1976 and 1988 he completed 25 body suspension performances with hooks into the skin. He has used medical instruments, prosthetics, robotics, Virtual Reality systems, the Internet and biotechnology to explore alternate, intimate and involuntary interfaces with the body. He has performed with a Third Hand, a Virtual Arm, a Stomach Sculpture and Exoskeleton, a 6-legged walking robot. His Prosthetic Head is an embodied conversational agent that speaks to the person who interrogates it.
As part of the Thinking Head Project he collaborated with The MARCS Institute at the University of Western Sydney to develop the Articulated Head, the Floating Head and the Swarming Heads projects. He is surgically constructing and stem cell growing an Ear on Arm that will be electronically augmented and Internet enabled. The first surgical procedures occurred in 2006. He has recently been performing with his four avatar and automaton clones on his Second Life site, exploring actual-virtual interfaces and gesture actuation of his avatar with Kinect.
He has performed and exhibited at the Warhol Museum, Pittsburgh 2011; the Tate Modern, London 2011; the Kinetica Art Fair, London 2011; Kyoto Art Centre, Kyoto 2010; Ars Electronica, Linz 2010; Itau Cultural, São Paulo 2010; the Mori Art Museum, Tokyo 2009; National Centre for Contemporary Arts, Kaliningrad 2008; National Art Museum, Beijing 2008; FACT, Liverpool 2008; Experimental Art Foundation, Adelaide 2007; Exit Art, New York 2006; Hangaram Design Museum, Seoul 2005; Museum of Science and Technology 2005; ACMI, Federation Square, Melbourne 2003; National Gallery of Victoria 2003; Museum of Art, Goteborg 2003 and the Institute of Contemporary Art, London 2003.
Publications about his artwork include “Stelarc: The Monograph”, edited by Marquard Smith, MIT Press, 2005; “Stelarc: Political Prosthesis and Knowledge of the Body” by Marina Grzinic-Mauhler, Maska & MKC (Ljubljana & Maribor), 2002; and “The Cyborg Experiments: The Extensions of the Body in the Media Age”, edited by Joanna Zylinska, Continuum Press (London, New York), 2002.
In 1997 he was appointed Honorary Professor of Art and Robotics at Carnegie Mellon University, Pittsburgh USA. From 2000-2003 he was Senior Research Fellow at the Nottingham Trent University, Nottingham UK where he developed the MUSCLE MACHINE, a 5m diameter walking machine using pneumatic rubber muscles. In 2003 he was awarded an Honorary Degree of Laws by Monash University. He received a New Projects grant from the Australia Council in 2010 to develop a micro-robot. Stelarc was Senior Research Fellow and Visiting Artist at the MARCS Auditory Labs (now known as The MARCS Institute) at the University of Western Sydney, Australia, between 2006 and 2011. In 2010 he was also awarded the Prix Ars Electronica Hybrid Arts Prize. He is currently (at the time of writing this document) Chair in Performance Art, School of Arts, Brunel University London, Uxbridge, UK. Stelarc’s artwork is represented by the SCOTT LIVESEY GALLERIES in Melbourne” [Stelarc – personal communication – biographical notes]. Stelarc is now a Professor at Curtin University (“Professor Stelarc,” 2013).
2.1.1 From Talking Heads to Thinking Heads: Aims & Objectives
The Australian Research Council grant application related to the Thinking Head (E-Appendix 2: From Talking Heads to Thinking Heads) clearly outlines the aims and objectives projected at the time. The application states that “the overarching goal is to establish a Talking Head Research Platform” that will seed development of a new generation of talking head technology through gradual elaboration of a Thinking System that allows (a) implementation and integration of hardware and software components from various technology disciplines within Human Communication Science (HCS), and (b) iterative testing and development via integration of research and researchers across a wide range of HCS disciplines; and that provides a challenging and maieutic environment enabling researchers to address (a) significant questions in HCS and Embodied Conversational Agent (ECA) research, and (b) key questions in individual research areas which could not be addressed without the implementation and development of the Thinking Head.
The Thinking Head Project, and more specifically the Articulated Head, has provided this challenging and maieutic environment, serving as a stage for the testing of various software engineering developments and the creative new media arts projects which form a part of this research work. The Articulated Head, which was installed as an interactive exhibit at the Powerhouse Museum, Ultimo, Sydney, Australia between February 2011 and November 2012, has been a central focus for several published papers, some of which are referenced elsewhere in this document. The Articulated Head exhibit, its immediate environment and its interacting audience are the central concern of all the research detailed in the following sections.
2.2 History
2.2.1 Mirrored development: the Prosthetic and Articulated Head
The Prosthetic Head, also referred to as Head0 (zero), is an implementation of an Embodied Conversational Agent (ECA) based on, and used by, the artist Stelarc (Luerssen & Lewis, 2008).
Stelarc describes the Prosthetic Head’s evolutionary developments into a number of other project iterations from his own perspective as follows:
“The Prosthetic Head project was realized in San Francisco with the assistance of three programmers: Karen Marcello (system configuration & Alice-bot customization), Sam Trychin (text to speech customization of 3D animation) and Barrett Fox (3D modeling and animation). The development of the Prosthetic Head was conducted in consultation with Dr. Richard Wallace, creator of Alice-bot and Artificial Intelligence Markup Language (AIML).
The Prosthetic Head was the first art project of mine that involved an interactive installation using verbal exchange. The aim was to construct an automated, animated and reasonably informed artificial agent. If the head responded appropriately, with convincing answers and facial expressions, it would be an effective, or “intelligent”, agent. It is an embodied conversational agent with real-time lip-syncing. That is, if you ask it a question via a keyboard, it searches its database for an appropriate answer and speaks the response. Rather than an illustration of an artificial intelligence, it is seen more as a conversational system that, coupled to a human head, is capable of carrying on a conversation. The idea was that, as its database extended, the head would become more unpredictable in its responses. The artist would no longer be able to predict what his head says. The head’s database had a repertoire of stories, definitions and philosophical insights preprogrammed.
The 3D model was a 3000 polygon mesh, skinned with the artist’s face texture so that it somewhat resembled him. Its eyeballs, tongue and teeth are separate moving elements. In retrospect it was a kind of digital portrait of the artist. As well as its general database, it was also programmed with personal details of the artist as well as the artist’s opinions, theories and arts practice.
The Prosthetic Head is projected as a 5m high head. In the installation at the Australian Centre for the Moving Image (ACMI) (Thood, 2012) in Melbourne, an ultrasound sensor detected when someone approached the keyboard. The Prosthetic Head then turned to confront the user and initiated the conversation by asking the user who they were and what they were doing there. Being projected 5m in height gave the head a disembodied but strong feeling of presence. As well as carrying on a conversation, it generated its own poetry-like verse, and its own song-like sounds. These were different each time you asked it. So it was creative in the sense that it generated unexpected juxtapositions of words and sounds.
The Prosthetic Head also generated the Walking Head and the Partial Head. These works preceded the Thinking Head research project led by The MARCS Institute (Institute, 2012) at the University of Western Sydney (UWS) (“University of Western Sydney,” 2009). The Walking Head was a 2m diameter, 6-legged, autonomous walking robot with a human head displayed on its LCD screen. When its scanning ultrasound sensor detected that someone had entered the gallery space, it selected from its library of possible movements and performed a choreography that lasted for approximately one and a half minutes. It then sat and went to sleep until the next visitor came into the gallery. With the Partial Head, the artist’s face was scanned, as was a hominid skull. We then did a digital transplant producing a composite human-hominid face. Using that data we 3D printed a scaffold, seeded it with living cells to grow a living skin, a third face. If the Prosthetic Head was a kind of digital portrait of the artist, the Walking Head was a robotic portrait of the artist and the Partial Head was a bioengineered portrait.
As part of the Thinking Systems research funding, the Prosthetic Head’s AIML software was extended with the Head0+ version; this task was conducted by Martin Luerssen of Flinders University (“Flinders University in Adelaide, South Australia – Flinders University,” 2012). We could script facial expressions within the AIML text. The significant realization was that an intelligent agent might have to be more than a virtual embodiment to become a more seductive and interactive agent.
The Prosthetic Head became the Articulated Head, then the Floating Head (a collaboration with NXI Gestatio in Montreal (Gestatio NXI, 2012)) and finally the Swarming Heads (a collaboration with the Robotics Lab at The MARCS Institute (Institute, 2012) at the University of Western Sydney (“University of Western Sydney,” 2009)). With all these iterations of an embodied conversational agent there is an interest in alternate architectures, and what vocabularies of behaviour generate aliveness in these systems. Not only in the virtual behaviour of the heads, but in their physical components and in the interaction of the two. For the Articulated Head an LCD screen imaging the Prosthetic Head was fixed to a 6 degree-of-freedom industrial robot arm, allowing a 3D task envelope of interaction. The challenge with the Floating Head was that its responses, compared with those of the Articulated Head, were much slower. However, this did not seem to affect the interaction with people. For example, the Floating Head would hover low to the floor but when someone approached it, it would rise and move forwards towards you, acknowledging your presence. If there were a group of people around it, it would display curious behaviour moving sideways and turning one way then another, and if there were a crowd of people it would become nervous and rise higher. The Swarming Heads are a cluster of seven small wheeled robots each displaying the Prosthetic Head on their mounted screens. A Kinect motion sensor was also mounted at the front of each robot, allowing the visitor to interact with them. This multiple embodiment has the potential of becoming the most seductive system. Not only can a user interact with each robot through gesture, but also the cluster of robots might be able to generate emergent behaviours in their interactions. So as well as avoidance behaviour, predator-and-prey and flocking behaviours are possible. Also, if the robots are able to perform head recognition when they turn and face each other, then that could trigger conversational exchanges between them. The Swarming Heads could also become a Skype platform, enabling remote users to interact with other remote users through movement as well as image and voice”. [Stelarc – personal communication]
Stelarc’s overview of the various iterations of the Prosthetic Head, as manifested in the subsequent project developments detailed above, alludes to some aspects of how the Head, as an avatar, might be given agency – including ways in which it could be multi-sensory and multi-present. Suggestions for ways in which the characteristics of the ‘virtual’ performer (e.g., personality, nuance of response, context) condition affect and engagement are also implied in Stelarc’s overview. Agency is of prime interest to this investigation. However, it must be noted that this study, although not ignoring the various other iterations of the Prosthetic Head as implemented in projects such as the Floating Head and the Swarming Heads, is firmly focused on the Articulated Head robotic installation, its interacting audience, and the environment in which these interactions took place.
2.2.2 The Projected Evolutionary Development of the Head
The research and development plan as detailed in E-Appendix 2: From Talking Heads to Thinking Heads puts forward a range of aspects for consideration in development of the Thinking Head, under four different sub-section titles: Embodiment, Performance, Interaction and Evaluation respectively. The Embodiment title incorporates aspects such as speech recognition, speech synthesis, and dialog management systems to allow the Head to converse. The Performance title incorporates new media arts and exploration of the Embodied Conversational Agent (ECA) as a virtual performer. The Interaction title incorporates a focus on auditory-visual (AV) speech, and its integration with key aspects of intelligence: prediction, interaction and learning. The Evaluation title has a focus on controlled experiments on behavioural interactions between the Head and research participants; the findings of these experiments would drive Head development.
2.3 Stelarc’s work and the Articulated Head set within the context of other robotic/AI chatbot artworks.
Stelarc’s work is situated within the context of a number of other artistic works involving artificial intelligence and robotics that have taken place in various locations around the world in recent years. One such project is Mari Velonaki’s Fish-Bird project, which features “wheelchairs, that can communicate with each other and with their audience through the modalities of movement and written text.” (Velonaki, 2006). The wheelchairs are conceptually ‘in love’ and they write letters to each other, which are left on the floor for an audience to see. The audience can disturb the wheelchairs, which can result in them modifying the content of the letters dropped to be less personal or emotional. The Articulated Head project has some parallels with the Fish-Bird project in that it can ‘sense’ the presence of an audience and modify its behaviour as a result; it also has preprogrammed messages, which are delivered in speech rather than text.
Another artwork that shares some parallels with the Articulated Head is Ken Feingold’s robotic/animatronic chatbot work. “In all his work, Feingold programs his creations so that the conversations are continually changing: the robots’ dialogue, he has said, is neither scripted nor random, but instead designed to mimic “personality”: “a vocabulary, associative habits, obsessions, and other quirks … which make their conversations poetic, surprising, and often hilarious”” (Maclaughlin, 2004).
Norman White’s Helpless Robot “is an artificial personality that responds to the behavior of humans by using its electronic voice which speaks a total of 512 phrases. The speech that is delivered depends on its present and past experience of “emotions” ranging from boredom, frustration, arrogance, and overstimulation” (White, 2011). The Articulated Head had a significantly wider vocabulary of phrases than the Helpless Robot, and it was capable of a range of facial expressions, such as frowning and smiling, that indicated emotion.
Many robotic projects taking place around the world share parallels with the Articulated Head to differing degrees. Kismet and Cog are two sociable robots built at the Massachusetts Institute of Technology (MIT). Cynthia Breazeal, a roboticist at MIT, has worked extensively with the robot called Kismet, a rather beautiful robot with big red/pink lips, eyes and ears. Kismet can talk and give human-child-like responses to verbal cues from Cynthia, who is something like a parent to Kismet. Breazeal comments in a YouTube video interview (Anders, 2006) that she would eventually like to give Kismet a face. Both Cog and Kismet can perform various motor skills (“Kismet,” 2011) and both have the ability to engage an audience by seemingly paying attention to them. In her book Designing Sociable Robots, Cynthia Breazeal points out that “many skills can be thought of as fixed action patterns (FAPs). Each FAP consists of two components, the action component and the taxis (or orienting) component. For Kismet, FAPs often correspond to communicative gestures where the action component corresponds to the facial gesture, and the taxis component (to whom the gesture is directed) is controlled by gaze. People seem to intuitively understand that when Kismet makes eye contact with them, they are the locus of Kismet’s attention and the robot’s behavior is organized about them. This places the person in a state of action readiness where they are poised to respond to Kismet’s gestures.” (Breazeal, 2004, p. 149)
Using facial expressions to signify emotions is something that has been explored by many sociable robots; both Kismet and the Articulated Head were capable of facial expression. “Whether we consciously recognize it or not, expressing emotion is often done as a means of communicating our feelings within the context of a target audience.” (Harris & Sharlin, 2011, p. 442)
Cog was a moving robot that was built to look and act like a human (Anders, 2006), with a set of sensors and actuators which try to approximate the sensory and motor dynamics of a human body. The Articulated Head shared some similar functions and attributes with robots like Cog and Kismet, and placed an audience in readiness to respond in similar ways to those described above. The visual presentations of Cog, Kismet and the Articulated Head are very different, yet all three share the ability to make gestures of one form or another using goal-driven motor skills. “People differentiate humans and computers along the lines of intentionality but initially equate robots and computers. However, the tendency to equate computers and robots can be at least partially overridden when attention is focused on robots engaging in intentional behavior.” (Levin, Killingsworth, Saylor, Gordon, & Kawamura, 2013, p. 161) All three robots have the ability to sense an audience and respond. “It is known that simple saliency mechanisms in the human visual attention system, trigger multiple reflexive behaviors” (Ruesch et al., 2008, p. 962); Cog, Kismet and the Articulated Head also demonstrate multiple reflexive behaviours in response to sensing an audience.
2.4 General history of ideas in relation to Human Machine Interaction.
The Articulated Head project was related to artificial intelligence (AI) in that AI did form a part of the performance of the avatar, through the semantic analysis of text input and the subsequent choice of speech output. A focus on the nature of the Articulated Head’s performance is considered in more detail later (see section 5.2).
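As background for readers unfamiliar with this style of conversational agent, the ALICE/AIML approach referred to throughout this thesis pairs stored input patterns with stored response templates. The fragment below is a deliberately minimal Python sketch of that idea only; the patterns and responses are invented for illustration and are not drawn from the Articulated Head’s actual knowledge base.

```python
# Minimal sketch of AIML-style pattern matching: the typed input is normalised
# and compared against stored patterns; the first matching template is returned.
import re

CATEGORIES = [
    (r"^WHAT IS YOUR NAME\b.*", "I am the Articulated Head."),
    (r"^WHERE (ARE|DO) YOU\b.*", "I live inside this enclosure at the museum."),
    (r".*", "Tell me more about that."),  # catch-all, in the spirit of AIML's wildcard
]

def respond(user_text: str) -> str:
    """Return the response template for the first pattern that matches the input."""
    normalised = re.sub(r"[^A-Z ]", "", user_text.upper()).strip()
    for pattern, template in CATEGORIES:
        if re.match(pattern, normalised):
            return template
    return ""

print(respond("What is your name?"))  # -> I am the Articulated Head.
```

A real AIML knowledge base works with thousands of such categories, plus wildcard and context mechanisms that this sketch omits.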
One of the early exponents of AI was Alan Turing. The Turing test measures a machine’s ability to demonstrate human-like capabilities in performance, such that a human is unable to distinguish between the machine and a human in that specific aspect of the machine’s performance.
The Turing test was conceived using a text-only system (see https://en.wikipedia.org/wiki/Turing_test) and “seems to provide a scientific, objective, criterion of what is being discussed — but with the rather odd necessity of ‘imitation’ and deceit coming into it, for the machine is obliged to assert a falsity, whilst the human being is not” (Hodges, 2012).
The deceit cited in the paragraph above is not unusual in scientific experiments. It is often necessary to conceal the truth in experiments that involve human participants, in order to maintain the integrity of the results (by trying to reduce the influence of demand characteristics, strategies and expectations). Examples of this practice exist in many published experiments, such as the Reeves and Nass experiment cited later in section 2.4.1. The reason for raising this point is that this investigation identifies the existence of a step-function in audience engagement during interaction with the Articulated Head. The said step-function has been titled the “Freakish Peak” phenomenon, because it has a similar but opposite step-function effect to that of Dr Mori’s so-called “Uncanny Valley” theory (see section 2.4.3). The step-function effect of the Freakish Peak phenomenon, when shown in a graphical representation, creates a peak rather than a valley in the graph (see sections 2.4.2, 2.4.3 & Figure 2-1). The word “Freakish” was chosen because of the astonishment the interacting audience experiences at the Freakish Peak and because the terms “Uncanny” and “Freakish” are interchangeable. A test was employed to see whether the effect of the Freakish Peak phenomenon was repeatable. This test necessarily concealed a truth about the interaction from the interacting audience (see 2.4.2, 2.4.3 & Figure 2-1).
The Articulated Head’s performance was designed to give an interacting audience the perception that it was aware and thinking, that the machine (the Articulated Head) was intelligent. Cognitive sciences study the cognition of the human mind2 and use the term “Strong Artificial Intelligence” (Strong AI) to refer to the view that the cognitions of the human mind can be modeled and recreated as a set of inter-functioning computer programs. The design and functions of the Articulated Head and its embodied conversational agent did in many ways show adoption of these assumptions of Strong AI. However, investigations into human-machine interaction and the “problem of human machine communication” (Suchman, 1987, 2007) have presented “challenges to the assumptions of ‘strong’ AI and Cognitive Sciences” (Duguid, 2012, p. 4) by arguing that planning an end-to-end interaction within the context of human-machine communication, on the assumption that the situated actions of humans in that interaction will naturally follow the interaction plan, is misguided. According to Suchman it is not possible to control all the variables that might enter into the context of an interaction flow, and she showed through her research that the rigid plans of interaction designers are often challenged by the subsequent unpredictable actions of both machines and humans during interaction flows. “In other words, while projecting the future is useful, those projections rely on future activity that the projections themselves cannot fully specify” (Radics, N.D, p. 1). If Suchman is correct, then end-to-end micro-dynamic interaction design of the Articulated Head’s conversational agent and interactive performance with its active audience is not possible, and therefore one must look for other ways in which to show enhancement of engagement in interaction between humans and machines.
2 The term ‘the human mind’ should be taken to mean the embodiment of the human biological brain matter and all the influences that this physical situation may bring to bear upon perceptual experience and the exercise of free will therein.
Suchman’s work can be traced back to Hubert Dreyfus and Martin Heidegger. Heidegger was a significant exponent and figure in the development of the school of phenomenology. Phenomenology forms a significant part of the underpinning methodological framework (see section 3) from within which this investigation is conducted. “Through Suchman, Heidegger has provided central tools for the critique of AI and cognitive Science and the general understanding of human-machine interaction and communication.” (Duguid, 2012, p. 7). An exposition of phenomenology comes later in this document (see section 3.3) but, as a brief introduction, phenomenology involves the study of human experience of existence as perceived within the context of consciousness. A known phenomenon in the study of human-machine interaction is that of personification.
2.4.1 Anthropomorphism
Personification describes human attribution of human-like characteristics and capabilities to a non-human entity such as a machine. The human attribution that takes place in personification is called anthropomorphism. In 1996 Byron Reeves and Clifford Nass published The Media Equation, subtitled “How People Treat Computers, Television, and New Media Like Real People and Places”. In it, Reeves and Nass (Reeves & Nass, 1996b) detail an experiment which showed that humans are predisposed to being polite to computers in much the same way as they exchange pleasantries with other humans. This anthropomorphic stance is an interesting phenomenon. The experiments showed that “when a computer asks a user about itself, the user will give more positive responses than when a different computer asks the same question” and that “because people are less honest when a computer asks about itself, the answers will be more homogeneous than when a different computer asks the same question” (Reeves & Nass, 1996b, p. 21). In this experiment the research participants had no idea what the experiment was testing, and the concealment of the true purpose of the testing from the participants was purely for the purpose of retaining the integrity of the results. The reason for mentioning these experiments is that the human approach in assuming the anthropomorphic stance in interactions with a machine plays a significant role in the findings of this study. The Freakish Peak phenomenon identified earlier (see section 2.4) appears to be triggered by a human’s heightened perception of agency in the Articulated Head’s conversational agent performance (see 2.4.2, 2.4.3, Figure 2-1 & 9.2). A test was employed to see whether the Freakish Peak phenomenon was repeatable (see section 7.3.4, pages 183-185), and this test employed a similar concealment.
2.4.2 Perceptions of agency
Human perception of agency involves the imputation that a person or artifact can act to achieve a goal with autonomy. Humans can (and often do) attribute agency to machines and robots. There is a hypothesis emergent3 from this research investigation that I have called “The Freakish Peak”; this hypothesis is intimately connected with the perception of agency. The reasons for choosing this particular term are explained earlier in section 2.4. The Freakish Peak hypothesis posits that the degree of perceived human agency attributed to the machine by a human has a threshold at which “suspension of disbelief” (Reeves & Nass, 1996a) becomes relatively automatic. A belief that the machine has real human capabilities such as sense and thought processing is triggered once this threshold has been crossed (see Figure 2-1, ‘The Freakish Peak’). The design refinement of the Articulated Head presented in a sequence of figures in Section 9 is specifically aimed at enhancing the Articulated Head’s performance so that people’s perception of it more readily reaches this threshold, thereby triggering suspension of disbelief in its interacting audience. The design refinement directly addresses the research questions at the centre of this study (see page 1). In what follows, I outline some ideas concerning human responses to artificial agents.
3 The meaning of the term “emergence”, as in the phrase “there is a hypothesis emergent from this research”, is aligned with the common use of the word rather than with the technical phenomenological use of the term, such as in Heidegger’s use of the word, which refers to the revealing of the various modes of “Being in beings”; the details of this are beyond the scope of this thesis, as explained in more detail shortly (see Section 3).
2.4.3 Human emotional response to virtual agents
In 1970 Mori put forward the so-called “Uncanny Valley” hypothesis (Mori, 1970). The theory posits that visual representations of characters designed to be human-like, such as those used for films or robots, can have a stage in their character development where they fall into the Uncanny Valley. This Uncanny Valley is a stage in the character development, close to the point at which the character is approaching the threshold of being convincingly human, where the character actually becomes much less humanlike and can repel humans, possibly because of a sense of fear or disbelief at perceived deformities. The Uncanny Valley theory is seen demonstrated in contemporary films such as Beowulf (Warner Bros, 2013a). The Articulated Head, whilst showing some human-like features, definitely did not fall into the Uncanny Valley category in terms of its visual presentation, as it was very obviously a machine. However, humans interacting with the Articulated Head did have emotional responses (see section 7). Dr Mori’s Uncanny Valley hypothesis is graphically represented in a diagram in a paper titled “Subjective ratings of robot video clips for human likeness, familiarity, and eeriness: An exploration of the uncanny valley” (MacDorman, 2006), where MacDorman plots the theoretical relationship between familiarity and human-likeness. The Freakish Peak phenomenon (see Figure 2-1 below) shows a close but inverse correlation to Dr Mori’s Uncanny Valley hypothesis (MacDorman, 2006), in that the graphical representation shows a peak rather than a valley in the diagram. The Freakish Peak is represented in Figure 2-1 below.
Figure 2-1 The Freakish Peak
This investigation’s emergent hypothesis is that there is a threshold in the performance of human-like avatars and robots that incorporate chatbots. This threshold is where a suspension of disbelief in the human-like capabilities of an avatar or robot can be triggered in a human by their own perception of the degree to which the avatar or robot demonstrates agency; just beyond this threshold the human momentarily believes in the human-like capabilities of the robot.
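Stated slightly more formally (a sketch only; the thesis does not quantify these variables), let $a$ denote the degree of agency a person perceives in the avatar’s or robot’s performance and $a^{*}$ the Freakish Peak threshold. The hypothesised step function in belief, $B(a)$, then behaves approximately as

$$B(a) \approx \begin{cases} \text{disbelief (the exhibit is read as a machine)}, & a < a^{*},\\ \text{momentary belief in human-like capability}, & a \geq a^{*}. \end{cases}$$

This is simply a restatement of the step-function shape sketched in Figure 2-1, not a quantitative model.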
2.5 The Original Plan and how this investigation differs
Referring back to section 2.2.2 and the projected evolutionary development of the Head, embodiment, performance, interaction and evaluation all feature in various sections of the scholarship detailed herein, and these sections can be mapped back to aspects of the original plan in some respects – but they do differ from the original plan in important ways, for example: Embodiment in the original plan focuses on the Embodied Conversational Agent (ECA), whereas this study, though not ignoring that aspect, is much more focused upon the implications brought to bear upon the experience of human-machine interaction as the result of the embodiment of the human mind4 participating in this human-machine interaction. In terms of evaluation, rather than using controlled experiments on behavioural interactions between the Articulated Head and research participants to drive Articulated Head development, this study focuses on understanding the participant experience of human-machine interaction in a relatively unconstrained public exhibition space in order to gather an empirical body of evidence, which is then subjected to an Interpretive Phenomenological Analysis5. This process has been adopted in order to address the key research questions designed to help answer the big question: how can interaction between humans and machines be improved? (with a specific focus on this particular interactive environment).
4 The term ‘the human mind’ should be taken to mean the embodiment of the human biological brain matter and all the influences that this physical situation may bring to bear upon perceptual experience and the exercise of free will therein.
5 Interpretive phenomenological analysis is a qualitative research approach “committed to the examination of how people make sense of their life experiences – and what happens when the flow of lived experience takes on a particular significance for people” (Smith, Flowers, & Larkin, 2009).
2.6 Technical Components of the Articulated Head
Figure 2-2 The Head
The Head was capable of spatial movement via the robotic arm, speech and a small variety of facial expressions. The Articulated Head was surrounded by an aluminum-framed glass enclosure in each of the exhibition spaces it occupied during the course of this investigation. The enclosure was a health and safety measure requested and approved by the relevant health and safety authorities at the University of Western Sydney and the Powerhouse Museum respectively, in order to protect audience members from injury that could be caused by the industrial robotic arm’s movement in the public exhibition space.
A commercially developed software conversational agent called the ‘ALICE ChatBot’ (“Artificial Linguistic Internet Computer Entity,” 2012) handled speech in the avatar’s communication with its audience. The Head was also furnished with some sensory capabilities, both auditory and visual in nature, which allowed it to track various elements pertaining to its possible audience. An attention model, invented and programmed by Dr Christian Kroos using the MatLab software programming environment (MATLAB, 2012), used sensory information, probabilities, weightings and thresholds to define and control motor goals relating to various behavioural aspects of the Articulated Head’s movement, by directing the industrial robotic arm within the interactive installation enclosure.
A carefully designed text-to-speech engine drove animation of the facial, jaw and mouth movements of the three-dimensional representation, based on the phonetic structure of the text-to-speech content input. By triggering a prerecorded library of facial point animation maps simultaneously with a correlating phonetic audio sample library, the system provided an interacting audience with what many visitors to the Articulated Head perceived to be fairly convincing synchronization of facial, jaw and lip movement with the auditory speech output. It should be noted here that Head zero did not have a co-articulation model. This meant that every articulation of a phoneme was the same no matter the context it was in.
An independent stereo camera tracking system established audience locational information, which was utilised by the Thinking Head Attention Model & Behavioural System (THAMBS). THAMBS, programmed in MatLab (MATLAB, 2012) by Dr Christian Kroos from The MARCS Institute (Institute, 2012) at the University of Western Sydney (“University of Western Sydney,” 2009), could, for example, manipulate the robotic arm’s position to make the head face the current position of a person in the audience. The intention of including such functionality was to make it appear that the Articulated Head had turned to focus its attention upon a person in the audience.
The computer system as a whole had a component-based architecture and involved an Event Manager (EM), which played a central role in coordinating the Articulated Head’s functions. The conversational agent, the ALICE (“Artificial Linguistic Internet Computer Entity,” 2012) Chatbot, was installed as part of the Articulated Head installation. The ALICE Chatbot allowed an interacting audience to converse with the Articulated Head using text, entered via a keyboard mounted on a kiosk just outside a glass safety barrier that separated the robotic arm from its interacting audience (see Figure 2-4 below).
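The lip-synchronisation mechanism described above, in which the phonetic structure of the text-to-speech output triggers a matching facial animation map and phonetic audio sample at the same moment, can be pictured with a highly simplified sketch. The Python below is illustrative only; the phoneme set, file names and data structures are invented for the example and do not describe the actual Head zero implementation.

```python
# Illustrative only: pair each phoneme with a facial-animation map and an audio
# sample, then "fire" both together in the order the phonemes occur.
ANIMATION_MAPS = {"HH": "face_map_hh", "EH": "face_map_eh", "L": "face_map_l", "OW": "face_map_ow"}
AUDIO_SAMPLES = {"HH": "audio_hh.wav", "EH": "audio_eh.wav", "L": "audio_l.wav", "OW": "audio_ow.wav"}

def render_utterance(phonemes):
    """Build a timeline that triggers the animation map and audio sample per phoneme.

    Because Head zero had no co-articulation model, every articulation of a given
    phoneme is identical regardless of its neighbours, exactly as in this lookup.
    """
    return [(ANIMATION_MAPS[p], AUDIO_SAMPLES[p]) for p in phonemes]

# "Hello" rendered as the phoneme sequence HH-EH-L-OW.
for animation, audio in render_utterance(["HH", "EH", "L", "OW"]):
    print(animation, audio)
```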
THAMBS operations were visualised on the kiosk screen with the purpose of giving a person typing at the keyboard a visual representation of what the THAMBS system was doing.
E-Appendix 3: AH_QuickStartGuide provides diagrams of the hardware setup as implemented in the Powerhouse Museum.
2.8 Software component-based architecture
Figure 2-4 (Herath, 2012)
Figure 2-5 shows a flow diagram of the operational framework of the Articulated Head as implemented in the Powerhouse Museum. The diagram does not show every aspect of the operating system environment; the creative additions (see section 6) are not included in this flow diagram. Some of the modular software framework shown as rectangular boxes in the flow diagram in Figure 2-5, in particular the Data Logger, Sonar Proximity Client, Audio Localizer and Face Tracker boxes, were in fact only partially operational or non-operational for the majority of the duration of this study. The reasons for partial or non-operation of these modules, and their influences, if any, during audience interactions with the Articulated Head, are discussed later under the themed data analysis headings of section 7.
Some minor variations to the software component-based architecture of the Articulated Head exhibit were tested at various points throughout the Articulated Head’s presence in the Powerhouse Museum. The impact of these variations upon the main findings of this investigation is thought to have been negligible, since no dramatic changes to the interactions taking place were observed and research participants’ reporting of the interactive experience was not noticeably modulated. Where a variation is nevertheless thought to have been a contributory factor modulating important aspects of the data analysis, that factor is mentioned and its possible influence upon the findings for the particular section of data being analysed is discussed.
2.8.1 The Thinking Head Attention Model and Behavioural System
The Articulated Head required a behaviour system to control its physical actions when interacting with its audience. The plan was to provide a behaviour system that would act as an attention model, defining how the Head would appear to pay attention to its interacting audience; this system was named THAMBS, an acronym for the Thinking Head Attention Model and Behavioural System. Dr Christian Kroos, a research fellow working on the Articulated Head project, described THAMBS in E-Appendix 4: THAMBS documentation as “biologically inspired control software implemented in MatLab (MATLAB, 2012) for robotic platforms with sensing capabilities”. The document also provides a list of publications which impart theoretical and technical considerations and implementation details related to THAMBS development.
In essence, THAMBS consisted of a small number of sub-system routines that collected information from system sources such as the coordinates generated by the industrial robotic arm and the values generated by the auditory and visual sensors. The auditory and visual sensors consisted of a stereo tracking camera and a stereo microphone set-up, which helped indicate the position of a person in the audience within the Articulated Head’s sensor pick-up fields. The THAMBS system used the gathered endogenous system and sensory information to process a perceptual event. The perceptual event subsequently defined an attended event, which was then passed on with priority flags. In this respect the Event Manager was very similar to the event-processing systems present in many commercial interactive multimedia software programs such as Max (Cycling 74, 2012) or Adobe Director (“Adobe (Software),” 2012).
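The event flow just described, in which sensor data gives rise to perceptual events, one of which becomes the attended event handed on with a priority flag, can be sketched as follows. This Python fragment is illustrative only; THAMBS was implemented in MatLab and the Event Manager was a separate component, so the structures, weights and names here are assumptions made purely for the example.

```python
# Illustrative only: perceptual events compete for attention; the winner is
# forwarded to a toy priority queue standing in for the Event Manager.
import heapq
from itertools import count

event_queue = []     # (negative priority, tiebreak, event), highest priority first
_tiebreak = count()  # keeps heap entries comparable when priorities are equal

def perceive(sensor_readings):
    """Turn raw sensor readings into perceptual events with a salience score."""
    return [{"source": source, "value": value, "salience": abs(value)}
            for source, value in sensor_readings]

def attend(perceptual_events):
    """Select the single most salient perceptual event as the attended event."""
    return max(perceptual_events, key=lambda e: e["salience"], default=None)

def dispatch(event, priority):
    """Hand the attended event on with a priority flag."""
    heapq.heappush(event_queue, (-priority, next(_tiebreak), event))

attended = attend(perceive([("stereo_camera", 0.9), ("microphone_pair", 0.4)]))
if attended is not None:
    dispatch(attended, priority=2)
print(heapq.heappop(event_queue)[2])  # -> the camera event, processed first
```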
2.8.3 The EM/Max Interface
The Event Manager/Max interface was designed to pass information out from the Event Manager to the Max programming environment. Details such as the text strings for user input and the Articulated Head ALICE Chatbot response strings were passed to Max over a TCP/IP local area network connection. Open Sound Control (OSC) protocol messages carried the data to Max. The creative additions were set up to run on a separate computer and operating system from the Event Manager in order to minimize Articulated Head down time whilst testing and development of the creative additions took place. The passed data was then used for the purposes of displaying the text, or related textual strings, in the projections detailed in section 6.8.1. Other data, such as the X, Y, Z translational position coordinates of the robotic arm, were also transmitted from the Event Manager through the Event Manager/Max interface to Max as and when necessary. These parameter values allowed for the positioning of the robot’s voice within the spatial auditory system, as detailed in section 6 (The Creative New Media Propositions), where a more detailed description of the operational networking protocol, programming and implementation of the Event Manager/Max interface is given.
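The kind of message passing described in this section can be sketched briefly. The fragment below assumes the third-party python-osc package and invents the host, port and OSC address strings; the real interface sat between the Event Manager and Max, and its actual message addresses are not documented here.

```python
# Illustrative sketch of forwarding Event Manager data to a Max patch as OSC messages.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("192.168.1.20", 7400)  # hypothetical Max machine on the LAN

def forward_dialogue(user_text: str, chatbot_text: str) -> None:
    """Send the user's typed input and the ALICE response string to Max."""
    client.send_message("/ah/user_text", user_text)
    client.send_message("/ah/chatbot_text", chatbot_text)

def forward_arm_position(x: float, y: float, z: float) -> None:
    """Send the robot arm's XYZ position so Max can spatialise the voice."""
    client.send_message("/ah/arm_xyz", [x, y, z])

forward_dialogue("Where are you from?", "That information is in my database.")
forward_arm_position(0.42, -0.10, 1.35)
```

OSC is most commonly carried over UDP, as here, although the thesis describes the link as running over a TCP/IP local area network; either transport fits the message-passing pattern shown.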
2.9 Where and when has this investigation taken place?
This research investigation spanned dates between April 19th 2010 and April 18th 2013 and has taken place in connection with, and drawing from, the exhibition of the Articulated Head in three different places: the New Interfaces for Musical Expression (NIME) (“NIME International Conference,” 2014) exhibition in June/July 2010; the Somatic Embodiment, Agency & Mediation in Digital Mediated Environments (SEAM) exhibition (“SEAM2010,” 2012) at the Seymour Centre (“The Seymour Centre,” 2010) in central Sydney in October 2010; and the Powerhouse Museum (PHM) (“Powerhouse Museum | Science + Design | Sydney Australia,” 2012), Australian Engineering Excellence Awards Exhibition (AEEAE), from February 2011 to November 2012.
2.10 The Big Question.
The big question identified in the grant application E-Appendix 2: From Talking Heads to Thinking Heads was: How can interaction between humans and machines be improved? This question will be referred to simply as the big question from now on. The critical importance of the human present in the human-machine interaction referenced in the big question has become the key object of focus within this investigation, and the ramifications of this critical focus are presented throughout the following sections of this thesis.
2.11 The Research Questions
The key research questions presented on page 1 and reiterated below were specifically designed to help answer the big question. The key research questions have one main line of enquiry, which draws from three open-ended subsequent questioning strands (a, b & c). The research seeks to examine avatar, conversational agent and audience interaction; it does so through the following questions:
a. In which ways has and can the avatar be given agency?
b. In which ways is and can the avatar be multi-sensory and multi-present?
c. In which ways can and do the characteristics of a ‘virtual’ performer (e.g., personality, nuance of response, context) condition7 affect and engagement?
These questions are reiterated and explored in detail throughout this document, and many aspects of embodiment, performance, interaction and evaluation become features and/or concerns of the themed qualitative data analysis (see Section 7). Section 8 presents key concepts that help to explain the findings from the themed qualitative data analysis, and inform design refinements put forward in a blueprint of recommendations emergent from the investigation (see Section 9).
7 The word ‘condition’ is used as a verb here; examination of the characteristics of the ‘virtual performer’ and how those characteristics affect the performance and engagement of an active audience is referred to herein as an aspect of conditioning of the interaction.
2.12 This study’s contribution.
Many roboticists have developed humanoid robots in order to explore human-machine interaction; one of the best known of these developers is probably Professor Hiroshi Ishiguro of Osaka University (“Osaka University,” 2012). Social robots such as Cog and Kismet mentioned earlier, the Home Exploring Robotic Butler (HERB) (Srinivasa, 2012), and Actroid-DER, also known as the uncanny valley girl named Yume (Ishiguro, 2006), have been developed to explore human-machine interaction. The Articulated Head was just one such robotic project developed, in part, to explore human-machine interaction. The key contribution that this particular research project makes to knowledge about human-machine interaction, i.e. where this study helps to fill a gap, stems from the fact that many investigations of human interaction with robots are focused on the engineering aspects of the robot. The current investigation, by contrast, was motivated from a performance arts perspective, where prioritisation of human experience and human perception of the robotic installation’s performance are used to inform potential improvements to the design of the interactive environment in question. This investigation has explored an audience’s interactive experience with a robot and Embodied Conversational Agent (ECA) installation in a relatively unencumbered public exhibition space, employing the methods of Grounded Theory, Video Cued Recall and Interpretative Phenomenological Analysis (Smith et al., 2009).
2.13 The Approach: Prioritisation of Human Experience.
Many of the problems of robotic development are approached from a predominantly scientific and/or engineering perspective, where technical design and development of the software, hardware, electronic and mechanical aspects of the project become of paramount importance and take precedence over the subjective perception and lived experience of the robot’s performance as perceived by an interacting audience.