Avatars, Agency and Performance: The Fusion of Science and Technology within the Arts
Richard Andrew Salmon 2014
4. The Reduction
4.1 Scope of this Reduction
As is the accepted and the recommended tradition in the practice of a phenomenological reduction and epoché, this reduction should be seen as a simple, possibly naïve description of the things as they appear to my consciousness, absent of concern for previously established and accepted perspectives, academic or otherwise.
“We take for granted our bodies, the culture, gravity, our everyday language, logic and a myriad other facets of our existence. All of this together is present to every individual in every moment and makes up what Fink terms ‘human immanence’; everyone accepts it and this acceptance is what keeps us in captivity. The epoché is a procedure whereby we no longer accept it. Fink notes in Sixth Cartesian Meditation (Fink, 1995): ‘This self consciousness develops in that the onlooker that comes to himself in the epoché reduces “bracketed” human immanence by explicit inquiry back behind the acceptednesses in self-apperception that hold regarding humanness, that is, regarding one’s belonging to the world; and thus he lays bare transcendental experiential life and the transcendental having of the world’” (Cogan, 2012). If any links between the thoughts detailed in the following text regarding the senses and perception of things as they appear to my consciousness seem to correlate with the thoughts of others, these links are purely coincidental and born of a common interpretation of the lifeworld; they are not derived from any direct reference to the writing or thoughts of those others.
To begin this reduction it should be noted that this is a critical evaluation of the forum and the ways in which human communication takes place, focused on the phenomena that facilitate it. This evaluation is conducted specifically in order to gain a deeper insight into the environment and means in which human-robot communication takes place, with a focus on how this differs from human-human communication. The robot being referred to here is specifically the Articulated Head, and although much of what is said probably does apply to other robots, there may well be cases where it does not. The purpose of this critical evaluation is an attempt to lay bare the realities of the situation under investigation, to get at ‘what is’, nothing more.
Human-human communication has many facets that need consideration, and human-robot communication is likely to present another range of facets for consideration in comparative analysis. Human perception as a multisensory experience plays a critical role in this comparison. “After many years of using a modality-specific ‘sense-by-sense’ approach, researchers across different disciplines in neuroscience and psychology now recognize that perception is fundamentally a multisensory experience.” (Calvert, Spence, Stein, & MITCogNet, 2004)
Clearly, the robot end of human-robot communication will lack the sophisticated multimodal senses that humans enjoy. As such, human-robot communication will be different; from the human’s perspective it might involve the realisation that the robot interlocutor is “not all there”.
It is this type of realisation that I wish to capture. This datum is something that is primarily declarative, a thought that can be put into words. It is this end product of perception and thought that forms the reduction I am making (something opposite to the physical reductionism of chemistry or physics).
Notwithstanding the concept that both consciousness and perception have sedimented layers of experience, enculturation and acculturation present and stored in memory in any human brain, and that these sedimented layers will inevitably precondition both consciousness and perception themselves, the following reductive descriptions are made with a focus on the things as they appear to consciousness – rather than any preconceived or pre-researched perspective on the things presented and described henceforth.
The descriptions given are not necessarily exactly the same as might be experienced by all other individuals, but the majority of the detail is expected to find concomitance with most others, assuming unconstrained operation of all their senses – hence the use of the term ‘our’ in the descriptions that follow.
Consciousness is a state of mind in which a human or other biological being is said to be aware of, and responsive to, their surroundings. Consciousness is also the state of mind within which communications between humans take place. Our senses are vehicles by which external communications are intercepted and transported to the brain for processing. Perception is the initial outcome of this processing, prior to any further action initiated as a result of our perceptions. Put another way, perception is fundamentally the interpretation of messages and communications received by the brain via our senses – from sources both internal and external to the human body.
Therefore, consciousness is essentially a multisensory experience, and the sense fields that significantly contribute to, if not constitute, consciousness – the fields within which communication takes place – are of particular interest here. So, it is probably helpful to try to describe the perceptual approach of things as they appear to consciousness via the sense fields, including perceptual parameters, dimensions, boundaries, interdependencies and constituent characteristics, as the first step in this reduction.
I note that many facets of communication are evanescent: variable in nature, with a multitude of overlapping dimensions whose ramifications are diverse and wide ranging within the boundaries and context within which the communications are taking place. Secondly, I note that the variable facets of communication can have both instant and delayed conditioning effects on subsequent elements communicated. For instance, when two humans are conversing, a facial expression could trigger a perception that one of the humans in the conversation was uninterested in the topic being discussed; this could trigger an instant change in the topic of conversation but might also trigger a shortening of the exchange over time.
It may also affect future communication between these two individuals on this subject.
4.1.2 The senses
For the purposes of describing the sense fields and things as they appear to our consciousness, communication elements can be broadly defined as non-verbal and/or verbal in nature, and can also be broadly described as emanating from outside the body and mind of the perceiver (external) or inside the boundaries of the body and mind of the perceiver (internal).
The field within which human-human communication takes place is broad and may encompass elements supplied by any of our five traditionally accepted senses: sight, hearing, smell, taste and touch. Human communication also includes our sense of body position – kinesthesis – and other aspects of the use of our senses: “In addition to sight, smell, taste, touch, and hearing, humans also have awareness of balance (equilibrioception), pressure, temperature (thermoception), pain (nociception), and motion all of which may involve the coordinated use of multiple sensory organs.” (Zamora, 2013)
4.1.3 A spiritual dimension
Many humans would proclaim that they have a spiritual dimension to their consciousness and senses, that this spiritual dimension has a direct impact on their perception of communications received, and that it also influences any subsequent actions taken. Therefore, even if one is completely non-spiritual and does not believe that this so-called ‘sixth sense’ really exists, one does have to take account of this belief as it exists within the minds, consciousness and perception of others. Since such a belief influences the direction of their subsequent actions, its influence exists and so is real and potentially significant, and must therefore also be a consideration within this reduction if, and where, appropriate.
4.1.4 Communication or information exchange, initiations and sense fields
One could consider a communication between two humans to have begun in a non-verbal format even when these individuals are a very considerable distance apart. For instance, a person may glimpse and make brief eye contact with another individual from the corner of his/her eye from several hundred metres away, whilst walking along a busy city street. If the other knows this individual then one would consider that this distant eye contact had already been conditioned by previous communications between these individuals. On the other hand, if the other is not known, then one might say that this initial eye contact appears not to be preconditioned. However, beyond this point one has to consider that some form of conditioning has brought these individuals into communication. The spark that triggers these individuals to register the presence of one another, for whatever reason, even if it is just by chance, defines that communication has already begun. It might be a person’s looks and some aspect that promotes physical attraction; it may be some other element of visual intrigue, such as a shared idea of fashion, or some other aspect of one individual that fosters perceived affiliation within the other, even if in reality the feeling of affiliation is not shared by both individuals. Conversely, the non-verbal communication between these two individuals could be of a hostile nature; body language from one individual might foster aggressive tendencies within the other. It is also possible that the spark that triggered these individuals to register each other’s existence was not the giving of any external sign, but rather the result of association with some other environmental object, something that has become the subject of one of the senses.
There is an enormous range of possibilities present within this type of scenario, making it very difficult if not impossible for any third party to identify causes or even notice the onset of communication without the direct reporting of the individuals involved.
However, what is clear is that the visual domain is very important in non-verbal communication.
4.1.5 The senses and range
At this juncture it becomes apparent that the sense field of sight might have a longer range than the other sense fields, depending on the conditions of the communication taking place. For a blind person to identify the existence of someone several hundred metres away on a city street seems somewhat unlikely. Taste and touch are not possibilities in this scenario, and hearing would be nigh on impossible due to environmental noise effectively dissipating any chance of hearing a specific individual over this distance.
The sense of smell is a possibility, but even if the other individual were known to the blind person, the wind was blowing in the right direction and this individual had a particularly pungent or unique aroma, one would expect there to be so much other aromatic information mixed in the air between these individuals that any scent would effectively be masked from the blind person.
However, if we change the scenario through some imaginative variations, it becomes clear that one can recall events within our own sedimented memories of experience whereby any individual sense can take precedence over another in terms of its identification range, and any sense can effectively have content masked from it by natural barriers. Given a human with no sense impairments, examples might include:
On a clear night one can see the light emanating from celestial bodies, some of which may not even exist anymore; the light has taken so long to reach Earth that the demise of the distant source precedes the arrival of its previous glittering glory. In the broad, vision operates over straight lines, and so long as there is no physical obstruction between a bright source of light and the eye, sight’s range is literally astronomical in comparison to the range of other senses (of course, although sight is exquisitely sensitive, detailed information is curtailed due to limitations in acuity).
Whilst walking home from the pub along lanes lined with high hedgerows and wooded surroundings on a warm summer’s evening in the English countryside, one hears the approach of a car winding around bends towards you long before any sight, smell, taste or touch of the vehicle can be made.
Whilst walking down the street, one can smell a Fish n’ Chip shop long before one can hear, see, taste or touch it!
One cannot necessarily tell that a fruit is bitter from the use of the senses of sight, hearing, smell or touch, only by tasting can we reveal this secret. Taste can reveal the hidden side of a multitude of herbs and spices present in food, which are not always revealed by smell, although sometimes smell can give clues which taste confirms.
Whilst engaged in a contact improvisation dance, the dancers involved can identify the direction and speed of movement of their co-dancer, and can also identify the weight of leaning and needs of support between each other through the sense of touch. The amount and detail of information conveyed is only apparent to the sense of touch and cannot be accurately extracted by the sight, hearing, smell or taste senses of either dancer or an onlooking audience.
4.1.6 Interdependencies and embedded enculturation
At this juncture it becomes clear that there are interdependencies between the senses and the hermeneutic interpretations formulated as a result of the communications received by them. There are also differences between the senses: they can have varying ranges and be more or less efficient and useful for identification purposes, dependent on the specific scenario and information that is the object of communication. Moreover, it becomes clear that our sedimented layers of experience allow associations and correlations between the senses – associations so deeply embedded in our consciousness that they enable us to automatically identify, process and interpret sensory information, and make the description of objects that are the foci of any particular sense inseparable from the multisensory experience of consciousness and perception. Therefore we can only describe objects which become the foci of consciousness and perception in terms of their apparent salience within any sensory conduit, given the relative influence imbued by layers of experience. We cannot completely dissect and separate any given phenomenon from the experience (qualia) that may also have been the seed for the hermeneutic identification of that phenomenon.
That is, from our phenomenology, we may have come to know that an event, object or stimulus exists, but to all intents and purposes that phenomenon did not exist before we identified it, even though its chemical, biological and ultimately physical antecedents did exist, since these ultimately gave rise to the events that underpinned identification.
4.1.7 Sense field pick up patterns
There is a distinction between the term sense, which denotes an instance of receiving information (input), and the subsequent processing of that information. This distinction between sense (input) and processing holds in both human-human and human-robot communication.
It is prudent then to note that the sense fields of human communication in our multisensory experience of the world are interactive and variable in range, and are conditioned by the scenario and the object of information being communicated. Further, the senses can be sensitive to overlapping correlates amongst each other that help to solidify perceptions brought about by the processing of the sensory information received – the confirmation that taste brings to the smell of a particular herb being just one of millions of possible examples.
As described by Don Ihde in his book Listening and Voice: Phenomenologies of Sound (Ihde, 2007, pp. 73-84), our sense fields display pick-up patterns of varying shapes; that is, we can hear all around us, so our hearing displays a somewhat global pick-up pattern, whereas the pick-up pattern of vision has a funnel shape (with maximal sensitivity in the fovea) and our brains stitch together the panoramic illusion of vision that we perceive around us.
Our external environment structures the pick-up pattern of our sense fields shaping them in conjunction with the barriers and noise present into a continuously morphing set of pattern shapes and resonances, which change in response to our movement and according to the physical changes within the external environment.
Although the above sensory vignettes and descriptions of capacities are fairly obvious, focusing on a description of the sense fields within which human communication takes place may help to identify some of the key differences and variations highlighted by comparisons in this phenomenological reduction. Agency, experience and embodiment come to the fore, as it is in the description and comparison of these elements that the variations and differences between human-human and human-robot communication become most apparent.
4.1.8 The whole body sense
Touch (and the companion sense of proprioception) at first glance appears to be the only overtly whole-body specific sense. Sight, sound, smell and taste seem more head specific in terms of human design and structure, especially in relation to the positioning of the sense receptors on the body. However, it is acknowledged here that a whole-body chemical and electrical messaging system does exist, and since the senses are connected to the central nervous system, all the senses do in fact have whole-body influences. The minutiae of chemical and electrical messaging within the body are generally considered beyond the scope of this reduction, but studies in these areas are worthy avenues of further scientific research in relation to the internal communications triggered by the senses and their associations.
The fact that four of the five senses considered so far have sensors or receptors located in the head (a body part that can be oriented in diverse ways) may, or may not, be important in terms of the phenomenological reduction of human-human communication. It is not beyond the realms of possibility to consider that the evolutionary placement of the eyes, ears, nose and mouth of a human, located so close to the brain, could in part be to allow messages to reach the brain for processing as fast as possible.
Another human sense apparatus located close to the brain is the vestibular system. The vestibular system, located in the inner ear, provides humans with their sense of balance, a sense that is essential for horizontal and vertical orientation (standing and walking) in the physical world. The Articulated Head was anchored to the ground and its physical movement hinged around those anchor points. Rotation and physical movement of the industrial robotic arm were defined by coordinates and driven by motors, meaning that no sense of balance was necessary for its orientation.
Now, considering the brain itself: it definitely has the ability to learn and to commit information to memory. The brain has the capability to collect, analyse, evaluate and synthesize information. Many human beings believe they possess the power of free will; if so, then the brain is the receptor of sensory information, a place of processing that information, and a place of genesis of will. A human’s sensory apparatus is the conduit which carries messages to the receptor (the brain), and ‘will’ can be considered to emanate from sources internal to the brain. Will may be the outcome of processing of sensory information and memory, in which case it is not entirely free but rather influenced – and so will becomes the decisions of a conscious mind. If will is simply the outcome (decisions) of processing of sensory information and memory, then theoretically a robot designed with similar processing power, memory and sensory apparatus should be able to demonstrate abilities similar to those of the human brain, such as collecting, analysing, evaluating and synthesizing information; it should also have the capacity to learn.
With regard to the robot having the capability to be the place of genesis of free will, a human might find it easy to conclude that no such dimension of brain or mind could exist within the robot. Why, then, would people hope, want and expect to be able to converse with the Articulated Head in a similar vein to that which they might experience when communicating with another human being? Just as one might conclude that the robot has no capacity for free will, one might also conclude that the robot lacks human capabilities and capacity related to any, and all, of the senses too. The Thinking Head could not see, hear, smell, taste or touch in any meaningful way. Let us deal with each of these senses in order and consider what, if any, aspect of these senses the robot could summon; to what extent the information received and the abilities of the robot to process this information compare to those possessed by humans; and to what extent the differences enable or inhibit the scope of communication or information exchange between the two. To do this we should consider in more detail what a communication is and in what form it may be transmitted or received.
4.1.10 The language of communication
Some time has already been spent discussing the senses; what has been left out is the language of communication. By language, I do not mean French or English; I refer more to the interactive interplay present in human-human communications, the interception and hermeneutic interpretation of information captured by the senses providing conversational directives and foci for extension of the conversation taking place. If one were to record a normal conversation between two individuals taking place on a bus or train heading to work, it is very likely that the dialogue would be laced with verbs – doing or being words – and descriptive adjectives. You can imagine statements such as:
“I feel wonderful today”
“We had a brilliant night out at the theatre and the performance was magical, it has left me inspired to join an amateur dramatics group”
“I love driving over Sydney Harbour Bridge each day, the view reminds me of why I wanted to live here”
“It’s a shame we have to go to work, I could easily spend the whole day fishing on one of those boats”
These short statements can build affiliation with another individual we are communicating with by the associations that are invoked or they can elicit further comment, elaboration or the drawing of parallels extracted from our own experience and perceptions.
4.1.11 Non-verbal communication
Humans have built a whole non-verbal “language” that communicates all sorts of information to those around them. The bus or train on the way to work is a very good place to see the semiotics of this communication in action.
The signs of this communication can be as stereotyped and iconic as the waving of a fist to signify annoyance or the rubbing of thumb and finger to signify money – or as subtle as the candour of a female in recognition and tentative acceptance of an attempt at wooing her. Even in the apparent disinterest of those around you – the incessant stares at mobile communication tools and screens, the glow of laptops and iPads all over the place – people at first glance appear so wrapped up in what they are doing as to be oblivious to the activity around them. However, when one takes a closer look it becomes clearer that, at least in part, people are using these electronic communication tools to hide behind. You will notice them look up from time to time, and if you watch people’s eyes closely it becomes apparent that many eye movements appear driven by interest in others, and that many of us, in part, use electronic distractions to disguise this.
4.1.12 Sense by sense comparison
In what follows, I undertake a sense-by-sense comparison of human-robot communications and relate this to the portrait of human-human communications that I have already sketched.
The Articulated Head did have some simple capacity for visual processing. A stereo camera provided the ability for the Articulated Head’s computer to track the coordinates and relative locations of several people in the immediate vicinity of the robot, and it could make use of this information. The THAMBS attention model helped to decide who or what the Articulated Head was going to prioritize in its processing. This information was then used to direct the industrial robotic arm to place an affordance on this location (i.e., if an action was to be undertaken it would occur in this space). Face recognition software was trialled with the Articulated Head, but it proved to be unreliable and caused multiple and regular crashes, so it was not operational for the vast majority of the time in which this investigation took place. Nevertheless, face recognition software presents the possibility of extending behaviours beyond those that were possible with the Prosthetic Head, including the gestural mimicry of a particular audience member and the possibility of remembering aspects of people who had previously interacted with the Articulated Head. Beyond this, the Articulated Head had no further capabilities for processing visual information.
It could tell you that it was tracking something and where that something was but it could not tell you that it was a human it was tracking, and it could not tell you anything related to shape, colour or any other visual attribute.
Therefore one could sum up the Articulated Head’s visual capabilities and response mechanism as follows: “there was something in the monitored visual field and THAMBS instructed the robotic arm to follow this”. So, the Articulated Head could not see, as it had no capabilities for differentiation and identification of any visual attributes of anything within its visually monitored field.
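The poverty of this "seeing" can be illustrated with a short sketch in Python. This is a hypothetical illustration, not the actual THAMBS code: the function name, the coordinate format and the nearest-target selection rule are assumptions made purely to show how little visual understanding such a track-and-follow loop requires – the system handles only anonymous coordinates, never shape, colour or identity.

```python
import math

def choose_focus(tracks, origin=(0.0, 0.0)):
    """Pick one tracked 'something' for the robotic arm to attend to.

    `tracks` is a list of (x, y) coordinates reported by the stereo
    camera. Nothing else about the targets is known - not their shape,
    their colour, or even whether they are human."""
    if not tracks:
        return None  # empty field: nothing to attend to
    # Illustrative rule only: attend to the nearest target.
    return min(tracks, key=lambda t: math.hypot(t[0] - origin[0],
                                                t[1] - origin[1]))

# Three anonymous blobs in the monitored field; the nearest is chosen.
print(choose_focus([(2.0, 1.0), (0.5, 0.3), (4.0, 4.0)]))  # (0.5, 0.3)
```

Whatever richer prioritisation THAMBS actually applied, the essential point stands: the output of such a loop is a location to orient towards, not a perception of what occupies it.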
The Articulated Head was equipped with a stereo microphone, which was in theory capable of calculating the direction from which a sound source in the sound field in front of it had emanated, assuming a low-noise ambient environment. This in turn endowed the Head with the ability to track and also to follow, or “pay attention” (viz., allocate resources) to a sound source by moving the robotic arm and the screen mounted upon it to face the general direction from which the sound was emanating. In practice, however, the accuracy and resolution of such a system in an unconstrained public exhibition space was questionable; indeed, where ambient noise was present this tracking system was rendered all but useless. This was the total extent of sensory processing available from the sound field of the Articulated Head. It was not able to process language or anything about the type of sounds produced within its microphones’ pick-up field; it only registered that a sound had emanated from a particular direction in the x plane. Accurate identification of sound generation in the y and z planes was not possible because only two microphones were used, mounted on the horizontal plane a short distance apart from each other. This system measured the time difference between a sound reaching one microphone and the other in order to establish the direction from which the sound was emanating. There was no provision for calculating the vertical position (angle) or distance of any sound source. Further, the Articulated Head did not have the ability to separate the different auditory information being collected (auditory streaming), and it certainly could not interpret the addition of multiple sounds emanating from various different places at the same time, so spatial auditory processing and interpretation of multiple data streams simultaneously was a one-sided capacity, restricted to the human in the interaction with the Articulated Head.
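The horizontal-only localisation described above rests on a single measurement: the difference in a sound’s arrival time at the two microphones. A minimal sketch of that general principle follows, assuming an illustrative microphone spacing (the exhibit’s actual spacing and implementation are not specified here, and the function name is invented for the example):

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, at room temperature
MIC_SPACING = 0.2       # metres between the two microphones (assumed value)

def azimuth_from_delay(delta_t):
    """Estimate the horizontal angle (radians, 0 = straight ahead) of a
    sound source from its inter-microphone arrival-time difference."""
    ratio = SPEED_OF_SOUND * delta_t / MIC_SPACING
    ratio = max(-1.0, min(1.0, ratio))  # clamp: noise can push |ratio| > 1
    return math.asin(ratio)

# A source dead ahead arrives at both microphones simultaneously:
print(azimuth_from_delay(0.0))  # 0.0 radians
# The largest physically possible delay puts the source fully to one side:
print(math.degrees(azimuth_from_delay(MIC_SPACING / SPEED_OF_SOUND)))  # ~90 degrees
```

Because both microphones sit on one horizontal line, every combination of elevation and distance that produces the same delay is indistinguishable, which is exactly why direction could only be reported in the x plane; and because the whole estimate hangs on one noisy time difference, ambient noise degraded it so badly.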
However, with reference to the auditory domain, although not a specific sense, it must be noted here that the Articulated Head did have the ability to speak to its human audience – or, perhaps more accurately, it had the ability to articulate predetermined and programmed narratives to a human audience through the mechanism of a text-to-speech engine. Although at a functional level the abilities of the Articulated Head appeared speech-like (they could be interpreted as speech), these head noises were at a deeper level not the same as speaking. Indeed, although some semantic analysis was borrowed in the Head’s responses via the triggering of pre-packaged narratives, the Articulated Head was not capable of relating stored information in a symbolic fashion (it did not have a “semantic engine” that preserved the truth-value of the propositions over which it might operate).
However, even giving the appearance of speaking is a significant ability and it was this capacity that was probably the Articulated Head’s most significant ability during this study.
The Articulated Head had no capacity for the sense of smell; it had no olfactory sense organ and therefore no apparatus to capture aromas or the input patterns that give rise to the human perception of smell. It had no transducer or way to transport any of this type of information to any processing engine. Therefore we have to conclude that this sense did not exist for the Articulated Head and it could therefore not learn from this sense.
The Articulated Head had no capacity for the sense of taste; there was no receptor for this information and no transducer to pass information for processing. Therefore the Articulated Head could not learn from this sense, and had no experience related to this phenomenon other than that programmed into the conversational agent by a human. The Head could say “Oh yes I like chocolate!”, but this was in fact just the reiterated words of a human programmer – and as such the utterance “Oh yes I like chocolate!” did not involve a thinking process on the part of the Articulated Head itself. This point, though, does bring us closer to a more interesting discussion of embodiment and agency (which will unfold below).
Touch is an interesting sense because it is overtly and obviously distributed throughout the whole of the human structure – head and body. It is also interesting in that it can be considered in the mental realm too: one might say, “I was touched by the words of Nelson Mandela’s speech”, a statement indicating a stronger and more physical sensation of the thought processes being touched than the sense of hearing the words of the speech would normally convey alone.
Touch is also interesting in that movement is the action and medium in which this sense takes place, and one of its major functions is to feel. That is, it has the ability to act as a transducer of two-way communication; it can just as easily transmit information as receive it! For example, one can touch the cooker and feel that it is hot, thereby receiving information through touching and feeling. One can also touch and feel the hand of a loved one, transmitting the information that the loved one is cared for, cherished. The vestibular system plays a role in the articulation of a human’s body movement in correlation with the ability to touch and feel.
With a deeper focus, it becomes clear that other senses too – or, more specifically, the receptive apparatus of each – can also provide the facility for two-way communication. For example, the eyes can both see and convey or reveal aspects of a story of the mind. The eyes can reveal happiness, sadness, anger and pain. In conjunction with facial expression, dress and composure, the eyes can reveal deeper details of the story of a long and troubled past, for example. Many would say that the eyes are a window into the soul! Much can be gained from clues given by looking into another’s eyes. The two-way nature of these sense conduits, and the handshake-style information exchanges that take place over them, is of considerable significance in relation to Articulated Head-to-human interaction, because this is the way in which humans perceive agency.
To consider the ramifications of embodiment and how it affects the distribution of agency in humans, the dance form Contact Improvisation provides a good example of a human-to-human activity that demonstrates some of the effects contact can have on the distribution of agency between interacting dancers. Touch, feeling and movement conspire to distribute and extend each dancer’s sense of embodiment to encompass parts of the other dancer’s body; in this respect, each dancer can experience the sense that their partner in the dance has some distributed agency within their own body, thereby extending embodiment. Put simply, through touch you can feel the intention, leaning and weight/balance of another individual in interaction, and this in turn can influence your own movement. Although the Articulated Head could move, it could not meaningfully touch or feel, so it could neither learn nor convey any information through this sense. Furthermore, because it could not feel, any attempt at exchanging information with the Articulated Head through touch went unreceived. This constraint was common to all the senses and the associated transduction of communication, with the exception of sight and sound, through which the human could at least impart presence and approximate location, but nothing else.
4.1.13 The revelation
The above analysis conflicts with how the Articulated Head was presented to the public, i.e., as a robot that could converse with you. It was implied that the Articulated Head was capable of thinking and learning; indeed, it was part of a larger research project labelled “The Thinking Head Project”.
So far, however, we find that the Articulated Head had no senses and was unable to gather any information about the humans in its vicinity other than their presence, their approximate location in the x plane and the information given in text input. Whilst information about someone interacting with the exhibit could be gathered from text input, and could usefully be stored in memory for recall and use in subsequent conversational discourse, this type of information gathering and processing does not really present convincing signs of thinking taking place in the machine, largely because the human put the information into the machine in the first place. Human beings understandably associate the phenomenon of thinking with the consciousness and embodiment of their own brain, with its ability to gather (via the sense apparatus), analyse, evaluate and synthesise information in order to stimulate original thought. Therefore, to present convincing signs of original thought taking place in another entity, a machine, requires that the thought appear to have had its genesis in that entity. Arbitrary presentation of preprogrammed thoughts from the Articulated Head to its interacting audience is unlikely to work convincingly, because randomness is not really a feature of a thought process; rather the opposite, a thought process demonstrates rationale and/or logic. Since a convincing demonstration of original thought taking place in a machine is inextricably linked to a human’s perception that the thought originated in that entity, presentation of the thought is likely to be most convincing via a demonstrated awareness of the multifaceted aspects and features of the immediate environment, including the human that the machine is supposed to be convincing.
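The point about recall can be illustrated in code. The following is a hypothetical sketch, not the Articulated Head’s actual software: a minimal conversational “memory” that stores facts typed in by the user and replays them on request. The class and method names are invented for illustration. Recall of this kind can seem impressive in conversation, yet every fact originated with the human, so no original thought has taken place in the machine.

```python
class ConversationalMemory:
    """A hypothetical, minimal store of user-supplied facts."""

    def __init__(self):
        self.facts = {}  # key -> value pairs gleaned from text input

    def remember(self, key, value):
        """Store a fact the user has typed in (e.g. 'name' -> 'Alice')."""
        self.facts[key] = value

    def recall(self, key):
        """Replay the stored fact, or admit ignorance."""
        return self.facts.get(key, "I don't know.")

memory = ConversationalMemory()
memory.remember("name", "Alice")  # the human supplies the fact...
print(memory.recall("name"))      # ...and the machine merely replays it
```

The machine’s later use of “Alice” demonstrates storage and retrieval, not thinking: the genesis of the information lies entirely with the interacting human.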
The realisation that the Articulated Head had no capacity to sense the aspects and features of its multifaceted environment was not a helpful place to be with reference to identifying new avenues for improving audience engagement with the machine, which was a primary directive of this research. If engagement is defined as passive observation of a reaction to presence, then one can say that the Articulated Head did engage its audience: people did walk past the exhibit and, in many cases, did observe the Articulated Head’s reaction to their presence. Some did engage with the keyboard for text input, but most observed interactions with the exhibit were short, and people very quickly became disengaged when the conversational agent made no sense or showed no awareness of the environment and the features within it. Avenues for improving human engagement appeared very limited at this juncture. However, in terms of understanding the audience’s engagement in interaction with the Articulated Head, realisation of the robot’s limitations brings us to an important place with reference to an appreciation of the experience of avatar, conversational agent and audience interaction.
It is important to recognise here that there was one significant channel of information transfer, if not communication, available to the human for inputting information to the Articulated Head that has not yet been mentioned in this reduction, largely because its modality was not directly aligned with that of human-to-human communication and the senses. A human could exchange information with the robot by typing text on a kiosk keyboard. This mechanism for exchanging information with the robot was not ideal, but it was nonetheless a form of communication, namely writing; speaking would have been far more conducive to the flow of information, as for at least one of the parties present in the interaction speech was the customary modality of exchange. In practice, however, the keyboard was the way that the Articulated Head’s audience conversed with it over the duration of this study.
4.1.14 Conclusion to the reduction
In summary of the points made in the reduction above, the senses and their transduction apparatus feed consciousness with information. Perception of sensory information constitutes the nature and landscape within which communication and learning between a human and other living entities take place. For two-way communication of the senses to really take place, such as has been discussed with the examples of touch and feel in relation to agency and embodiment, or eye contact in relation to its being a window into the soul, both parties present in the interaction must share similar sense apparatus, receptors and transducers, conducting information to the brain, mind and consciousness in order to process and build an interpretation of the information received. The output of this processing is perception and learning, the accumulation of our layers of experience, whether our perception is accurate or not, along with any actions subsequently instigated. Those subsequent actions can take a multitude of forms but will always include some further communication, whether that is to modify our experience and memory as the result of something learned, or to take no action at all.
Finally, if perception is fundamentally a multisensory experience, and the senses feed information to the conscious mind8, within which processing of this information takes place, then it should be clear that the Articulated Head possessed little if any of the contributory phenomenological attributes of consciousness, perception or the multisensory capabilities of a human, because the Articulated Head did not have the ability to interpret multiple and multilayered data streams in parallel or simultaneously. Furthermore, since the Articulated Head could not conceivably be given these uniquely human attributes of consciousness and perception, or the multisensory capabilities to feed them, one must conclude, as was established previously through the universal law of communication (S. F. Scudder, 1900, cited in Pillai, 2011, p. 58), that the avatar or robot was not ‘in itself’ a living entity and as such had no initiative, impulse or desire to initiate communication; survival was of no concern to it, and the robot could not see, hear, smell, taste, touch or, more strictly, feel in any meaningful way. The robot had no sense conduits that allowed it to do anything in terms of collecting and interpreting information about its audience beyond the very basic auditory and visual capabilities previously imparted.
8 The term ‘the conscious mind’ refers to the embodied human brain and the experience of existence as perceived by that brain through the sense apparatus that feed consciousness.
4.1.15 Tentative findings
So, in the threads of the phenomenological reduction above, we find a vast void between human-to-human as opposed to human-to-robot communication; or, perhaps more precisely, we find that communication in terms of the above-stated universal law is only possible in one of the two conditions examined, because both entities, sender and receiver, have to be living entities in any communication. Therefore, the Articulated Head or robot did not communicate anything! However, and most importantly, the Articulated Head was a conduit for the exchange of information, an agent between those who designed and programmed the Articulated Head and those who interacted with it. Just as the sense receptors capture external communications and act as a vehicle to transport those communications to the brain for processing, so too should the Articulated Head be considered an information interceptor and transport mechanism; as such, it had the ability to display agency in communication transactions, and also the ability to encapsulate and display a multitude of ideas put into it by all involved: the designers, the programmers and those interacting with it.
The Articulated Head was not just the embodiment of the original projected Prosthetic Head with the addition of an industrial robotic arm; rather, it was the embodiment of a menagerie of ideas and embellishments emanating from all those who contributed to its development.
Therefore it stands to reason that any future propositions and developments designed to enhance engagement of an audience in interaction with the exhibits of this type should either:
1) seek to address the complete lack of multisensory capabilities identified in this reduction in one way or another, whether through the addition of real sensory capabilities or through the enhancement of the avatar’s performance by conveying the illusion of sensory capabilities, perception and consciousness to the interacting audience; or

2) seek to bring a more integrated, ergonomic-system approach to this interactive experience by providing more seamless linking of the assortment of ideas and embellishments that have emanated from all those who have contributed to its development, as opposed to the rather bolt-on feel that a menagerie of ideas and embellishments can so easily bring to the design and feel of any interactive environment. This does not mean that the miscellany of ideas and embellishments was a mistake; it is just to say that their macro-dynamic arrangement within the interactive environment needs careful consideration and alignment with the needs of the human in this interaction, if one is to communicate the feel of a cohesive, engaging whole to the audience.
Having situated the project within a discussion of its limitations and constraints, in what follows I describe the methods by which data about human-to-robot interactions were collected and the theories that underpinned them.
4.2 Practical application of the Video Cued Recall Method
Stage one of Video Cued Recall Interview data collection was conducted at the NIME conference in 2010. Stage two was conducted at the SEAM Symposium held in Sydney in 2010. Stage three was conducted at the Powerhouse Museum, Ultimo, Sydney, Australia.
The person interacting with the Articulated Head (a research participant) signed a consent form agreeing to the interaction and the Video Cued Recall session being filmed. They also signed a release form at the end of the video recall, giving permission for the audio/visual material and transcription related to their interaction to be published. This gave participants an opportunity to review their interaction before signing the release form. Finally, participants were asked to answer a simple questionnaire, the results of which can be accessed via the Key Evidence Links Table provided on CD ROM.
1. The consenting participant then interacted with the Articulated Head and the interaction was captured.
2. The audio and visual material was subsequently played back to the participant in a Video Cued Recall Interview, in which they were asked to comment on what they were thinking and feeling during the interaction. This interview was also recorded.
A set-up that allowed for the capture of a research participant’s interaction with the Articulated Head was required, along with a separate set-up to capture the subsequent Video Cued Recall Interview. Since this investigation was embarked upon with few or no preconceptions about what were, and were not, important aspects of the interactions or interviews taking place at the time, it was decided to capture as much information as possible pertaining to both, given the tools and technology available to the investigation.
Therefore, the following set-ups were used.
4.2.2 Interaction Capture Set-up
Figure 4-1 Interaction Capture Set-up
The capture set-up in Figure 4-1 above utilises a Security Video Device that accepted four separate video input streams. The device allowed the user to switch output options between showing one of the four input streams at full screen, or showing all four streams on screen as one vision-mixed data stream, as shown in the diagram on the Video Cued Recall and Projects PC Monitor in Figure 4-1 above. The Articulated Head’s text-to-speech audio stream could have been recorded directly; this would have produced excellent-quality capture of what the Articulated Head was saying during interaction, but would not have captured any sound that the interacting audience may have made. After some initial experimentation with the possible audio streams, it was decided that the audio stream coming from the Firewire camera’s microphone captured a reasonable mixture of all the ambient sound sources, providing a rich audio data stream with information about audience and ambient sound conditions as well as the sound of the Articulated Head’s voice. Whilst the Articulated Head’s voice was not so clearly defined in the captured audio stream, text string data of the Articulated Head’s spoken words was also captured for comparative analysis. Interaction capture of the data fed into the computer via a video capture card was achieved using Adobe On Location (“On Location (software),” 2013).
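The vision-mixing step that the Security Video Device performed in hardware can be sketched in software terms. The following is a hypothetical illustration, assuming each stream delivers equally sized RGB frames as NumPy arrays; the function name and frame dimensions are invented for the example.

```python
import numpy as np

def tile_2x2(frames):
    """Combine four equally sized H x W x 3 frames into one 2H x 2W composite,
    analogous to the Security Video Device's four-up display mode."""
    if len(frames) != 4:
        raise ValueError("exactly four input streams are expected")
    top = np.hstack(frames[0:2])      # streams 1 and 2 side by side
    bottom = np.hstack(frames[2:4])   # streams 3 and 4 side by side
    return np.vstack([top, bottom])   # stack the two rows into one frame

# Four dummy 120x160 RGB frames stand in for the live camera streams.
frames = [np.full((120, 160, 3), i * 60, dtype=np.uint8) for i in range(4)]
composite = tile_2x2(frames)
print(composite.shape)  # (240, 320, 3)
```

Applied frame by frame, this produces the single vision-mixed stream that was then recorded as one file, trading per-stream resolution for a synchronised overview of all four viewpoints.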
4.2.3 Video Cued Recall Interview Set-up
Figure 4-2 Video Cued Recall Interview Set-up
To capture the interviewer and interviewee dialogue during the Video Cued Recall Interviews, the microphone in the webcam shown in Figure 4-2 above was utilised. The software program iSpy (“ISpy (Software),” 2013), an open-source camera security application, allowed simultaneous display of the different data streams in one window. CamStudio (CamStudio, 2013) was used for whole-screen capture of the interview and playback data in one video file. The final audio stream for the CamStudio full-screen capture became a composite of the original interaction playback audio, captured by the webcam microphone from the interview computer’s speakers, mixed with the interviewer and interviewee dialogue taking place in the interview. Although this mixed capture of audio data streams, in both the Interaction Capture and the Video Cued Recall Interviews, was not ideal in terms of audio quality, the quality was sufficient to capture most of the richness of all contributory factors in the real auditory environments experienced, without expanding the technical complexity of the capture set-up with extra microphones and multichannel live mixing. Such an expansion would have required more equipment and possibly extra personnel, and the enhanced audio quality afforded would not have added greatly to the richness of the auditory detail apprehended in the data stream for analysis. Therefore the trade-off between audio quality as perceived by the listener and the richness of auditory detail apprehended in the audio data stream was a fair and beneficial one for this investigation. A full list of the Video Cued Recall Interviews and other evidence collection conducted during this investigation is detailed in table 7-38, Video Cued Recall Interaction Captures.
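The composite audio stream described above, playback and dialogue blended into one signal, can be expressed as a simple digital mix. This is a hypothetical sketch only: the webcam microphone mixed the sources acoustically, not digitally, and the sample format (floating-point values in the range -1 to 1) and equal gains are assumptions for illustration.

```python
import numpy as np

def mix(playback, dialogue, gain_a=0.5, gain_b=0.5):
    """Sum two equal-length mono streams and clip to the valid sample range."""
    mixed = gain_a * playback + gain_b * dialogue
    return np.clip(mixed, -1.0, 1.0)  # guard against digital clipping

playback = np.array([0.8, -0.6, 0.9])   # interaction playback samples
dialogue = np.array([0.7, 0.5, -0.2])   # interviewer/interviewee speech
composite = mix(playback, dialogue)     # samples: 0.75, -0.05, 0.35
```

As the gains show, each source arrives attenuated in the composite, which mirrors the observed trade-off: both sources remain audible and analysable, but neither is captured at full, isolated quality.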
The completed National Ethics Application Form (NEAF) Approval Number H8543 for this research details the research participant consent procedures for participation in the Video Cued Recall Interviews and other aspects of this research investigation.
Research Participant Information Pack
The research participant information pack is included as E-Appendix 5.
The research participant consent form is included as E-Appendix 6.
The research participant release form is included as E-Appendix 7.
The research participant questionnaire is included as E-Appendix 8.
Ethics amendment approval was sought to allow all text string data (made anonymous if published) to be included in data analysis. The ethics committee amendment approval is included as E-Appendix 9.