Avatars, Agency and Performance: The Fusion of Science and Technology within the Arts
Richard Andrew Salmon 2014
9. The Blueprint of Emergent Recommendations
It should be noted here that any suggestions and recommendations put forward in this blueprint are very unlikely to be implemented with the Articulated Head, because it has now been decommissioned and its hardware has been allocated to another purpose. Nevertheless, these recommendations were established from the research conducted in conjunction with the Articulated Head, and thus they are presented as if they could be implemented with it. Many of the recommendations will also apply to similar interactive exhibits.
The Emergent Themes from this investigation identify several small crimes committed against the flow of human–machine interaction in the environment under investigation. Many of these small crimes were simple and practical in nature. Some of the smaller crimes identified had a larger impact on human–machine interaction than others. Some of the small crimes were not negative influences on the interaction as such; they were more like opportunities for enhancement of audience engagement with the interactive exhibit that were not taken advantage of.
The major crime identified by analysis of empirical human-centered research data in this investigation was simply that the macro dynamic interactive design of the Articulated Head, including many features of the exhibit design enclosure and auditory visual environment, failed to put the human (the customer) first in this interactive environment. Form in the design of many features of the project and exhibit took precedence over function, and simplicity of implementation in functionality took precedence over implementation of interactive functionality that would have nurtured greater flow in this human–machine interaction. Whilst combinatorial explosion and technological barriers and constraints are real and important concerns, they should not mask or preclude the stimulus for trying, and most certainly not stall the quest for progress; rather, they should inform the passage for advancement. Once again, these statements are not to suggest that the Articulated Head did not engage its audience; it is just to say that in hindsight, and armed with the findings from this human-centered research investigation, it is easy to see that much more could have been done. It is hoped that the blueprint of recommendations that follows will help others avoid the pitfalls that ensue from rejection of functionality on the basis of assumed technological complexity or constraint to advancement, and will also help anyone intending to embark upon the design, construction and implementation of any such similar interactive exhibit, by highlighting the critical implications that the embodiment of any human brain present in an interactive environment brings to their perception of it, and hence to the design table for that environment.
What follows is effectively a macro dynamic realignment of the interactive exhibit design that addresses the barriers to human–machine interaction that have been identified through this investigation in The Emergent Themes, based on concepts that help explain the theme findings and inform the new design.
9.1 The ecosystem
An ecosystem is defined as a biological community of interacting organisms and their physical environment (Jewell, Abate, & McKean, 2005); quite what one calls an environment that incorporates these factors and a great deal of technology, as was the case with the Articulated Head exhibit, I am not sure. However, the term and concept of an ecosystem is helpful in terms of setting out the main concept for the interactive design refinement that follows. The key idea is that the interactive design needs to create a separate microcosmic-ecosystem for the exhibit that is partially separated from the wider ecosystem in which it is situated (the museum or other larger exhibition space). This simple concept allows most of the practical technical and technological barriers to interactive flow identified through this investigation to be addressed.
For real exponential improvement of human–machine interaction, the machine must ergonomically meet the human on their own playing field on as many different levels as is conceivably possible: physically, practically, mentally and psychologically. This is an enormously tall order, which ultimately leads to the point that now seems simple and obvious: humans in this interaction seem to want the robot to behave and act much like a human would, and in as many different ways as possible. Humans want the robot to come across as an existential being that can share their experience, preferences and sensory perception of the immediate surrounding environment with them. For this interaction to be truly fluent over protracted periods of time, the robot's performance must at least attempt to match that of a human. However, this somewhat impossible task is made significantly easier by two key factors in relation to the context of Articulated Head–human interaction, being:
1) Most interactions that take place are short, with external factors to the interaction often dictating the tendency for the audience's departure.
2) The audience hope, want and expect the robot to display predictable human preferences and capabilities.
So what exactly do we know about the human experience of the interactive environment under investigation, and how does this knowledge inform the design refinement?
9.2 The blueprint themes
9.2.1 Anthropomorphism
This sub section is directly linked with Theme 1: Anthropomorphism.
We know that humans are predisposed to the anthropomorphic stance, as previously discussed, and we know that humans can be convinced of an illusion momentarily, especially when they are not given the time for deep critical reflection. Humans can be tricked even when they know, or at the very least must suspect, that an illusion is just that, an illusion, and it cannot actually be true, especially if they hope and want to believe that it is true. Examples of this include the rabbit-in-the-hat trick and a whole range of other pieces in a magician's repertoire. Indeed, without human partiality for believing the impossible, magicians would be redundant.
This point is simple; if humans want to believe that the robot possesses consciousness, human preferences and capabilities such as senses, which clearly they do according to the research data, then convincing them that this is the case should not be so difficult. Surely all one needs to do is present a cascade of evidence to the human that says the robot does possess these attributes over the short duration of the interaction. If this cascade of evidence is not contradicted by the robot's subsequent actions during the interaction, then is it not possible that they might depart the interaction remaining fairly convinced?
9.2.2 Expectations and enculturation
This sub section is directly linked with Theme 2: Expectations and enculturation.
We know that humans hope, want and expect the robot to be able to see and hear, if not smell, taste, touch and feel. Therefore we must try to convince the human that the robot can see and hear. Smell, taste, touch and feel are probably more difficult illusions to sustain, though not necessarily impossible. Smell is probably the easiest to implement, by using some form of air constituent analysis sensor apparatus; because it would not require physical contact, it avoids the ensuing health and safety issues. Implementation of taste, touch and feel is more problematic. Nevertheless, the illusion of sight and hearing can be achieved.
The robot already shows signs of being able to see because it is an intentional agent and displays the intention of looking at the person it is interacting with. We know that, because it was not always focused on its audience, participants reported it as being distracted and/or considered that it was not paying attention to them. If the robot were endowed with the ability to establish features of current variables related to the immediate interactive environment and/or pertaining to the person interacting with it, then that person is likely to be convinced that it can see. This can be achieved in a number of different ways, but two obvious examples spring to mind:
1) Present variable features in the spatial environment that the human would not immediately attribute as parameters likely to be under the control of the robot's systems, and then endow the robot with the ability to raise these features in the course of conversational foci. This also gives the robot food for initiating conversation, which partially addresses the finding that it was hard to keep a conversation going. This arrangement includes one example of a way in which the avatar can be silently multi-present, and extension of this silent multi-presence idea can further extend multisensory capabilities as described in point 2 below.
2) Provide avenues for the robot to be able to establish features of the individual interacting with it that, again, the robot can raise in conversational foci. For example, if one of the features in the spatial environment that the robot had control of was a large white wall behind, and some distance away from, the human interacting with the robot, then the human would be unlikely to attribute the wall as being a parameter that the robot might control, especially if they have their back to it anyway. If this wall could be made to change colour from white to blue or green, which can be achieved through lighting or by using materials that are electro-colour sensitive such as types of St Gobain Glass (“St Gobain Glass,” 2013), then a high resolution camera could take a picture of the human and apply a pixel colour analysis to the picture after chroma-keying out the background of the image. The pixel analysis could then be used to establish various features of the human, such as height and the predominant colours present on the lower and upper half of the body; a rough sketch of this analysis follows below. It may be possible to establish hair colour with reasonable accuracy as well. This would allow the robot to say something about the height or garments that the human is wearing, hence lending significant weight to the idea that it can see. Furthermore, this would dispel any human feeling that the robot was not paying attention to them.
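To illustrate, the following is a minimal sketch of how such a chroma-key pixel analysis might be approached in Python with OpenCV. The key colour range, the camera calibration constant and the use of a mean colour as the "predominant" colour are all illustrative assumptions, not details drawn from the Articulated Head's actual systems.

```python
import cv2
import numpy as np

# Assumed calibration: metres of physical height covered by the camera frame.
CAMERA_FOV_HEIGHT_M = 2.4

def describe_visitor(frame_bgr):
    """Chroma-key out a green wall, then estimate the visitor's height and
    the predominant colours of their upper and lower body (sketch only)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Mask the green key wall; everything else is treated as the visitor.
    wall = cv2.inRange(hsv, (35, 40, 40), (85, 255, 255))
    person = cv2.bitwise_not(wall)
    ys, _ = np.where(person > 0)
    if len(ys) == 0:
        return None  # nobody in front of the wall
    top, bottom = ys.min(), ys.max()
    # Rough height estimate from the silhouette's vertical extent.
    height_m = (bottom - top) / frame_bgr.shape[0] * CAMERA_FOV_HEIGHT_M
    mid = (top + bottom) // 2

    def dominant_colour(rows):
        # Mean BGR over the masked region, as a crude "predominant colour".
        region = frame_bgr[rows][person[rows] > 0]
        return region.mean(axis=0) if len(region) else None

    upper = dominant_colour(slice(top, mid))   # shirt/jacket colour
    lower = dominant_colour(slice(mid, bottom))  # trousers/skirt colour
    return height_m, upper, lower
```

The returned values could then be mapped to colour names and phrased into conversation openers, giving the robot something concrete to "see".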
9.2.3 Physical Presentation and Operational Aspects
This sub section is directly linked with Theme 3: Physical Presentation and Operational Aspects.
It is interesting that the physical presentation of the speaking face was reported to be convincing, even though the articulation of a phoneme was the same no matter the context, because there was no co-articulation model. The fact that the face did not incur detrimental reporting from research participants suggests that the head's speech, lip and jaw movement worked fairly well. Maybe it would not be completely convincing if you were asked to concentrate on it, and I am sure anyone lip-reading would have had trouble, but in general it did not appear to be a significant impediment to intelligibility or satisfactory interaction.
There are many aspects of the physical enclosure that had effects upon the interaction; these are picked up in section 9.2.12 The Auditory Visual Interactive Environment.
With regard to the operational aspects of the exhibit, the data logger, sonar proximity client, audio localizer and face-tracking software functionality are now seen as being critical functionalities for addressing various aspects of participant reports in relation to improving this human–machine interaction. The most useful function of the face tracking software, other than mimicry, is that it is possible for it to recognise and differentiate between faces with reasonable accuracy; yes, it may make some mistakes, but we should remember that, just as it is with colour recognition, so too with face recognition. The goal in this interaction is to legitimize the role of the intentional agent, and if the face tracking software, in cooperation with the data logger, could store the name and face pattern of an individual, then clearly it would allow recognition of the face in future interactions to trigger robot initiation of the use of the human's name. This functionality addresses commentary from several themes because it clearly suggests that the robot can see, has memory and is interested in the human. It also suggests to the human the possibility of relationship building.
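A minimal sketch of this store-and-greet idea follows, assuming a generic face-embedding library. The face_recognition package is used here purely as an example; this is not a description of the Articulated Head's actual face-tracking software.

```python
import numpy as np
import face_recognition  # example library; any face-embedding tool would do

class FaceMemory:
    """Toy version of the data-logger idea: store a name with a face
    encoding, then greet returning visitors by name."""

    def __init__(self, tolerance=0.6):
        self.names, self.encodings = [], []
        self.tolerance = tolerance  # smaller = stricter match

    def remember(self, name, image):
        encodings = face_recognition.face_encodings(image)
        if encodings:  # a face was found in the frame
            self.names.append(name)
            self.encodings.append(encodings[0])

    def recognise(self, image):
        encodings = face_recognition.face_encodings(image)
        if not encodings or not self.encodings:
            return None
        distances = face_recognition.face_distance(self.encodings, encodings[0])
        best = int(np.argmin(distances))
        return self.names[best] if distances[best] < self.tolerance else None
```

On a match, the Chatbot could open with the stored name, immediately suggesting sight, memory and interest in the human.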
To address the audio localizer and sonar proximity client problems, the new design refinement presented shortly suggests amalgamating these functions into one microphone system placed very close to the human in the new microcosmic-ecosystem environment created for the exhibit. The new arrangement would reduce the erratic movements of the Articulated Head's robotic arm, as well as addressing the more fundamental problems discussed under the following headings.
9.2.4 Mode of Interaction
This sub section is directly linked with Theme 4: Mode of Interaction.
We know that the keyboard input system was reported to be difficult for participants, and many said it was an unnatural mode for communication. This negative impact upon the human–machine interaction was reported universally across the research participant set, and observations of the public use of the kiosk keyboard also confirm this. It is a major finding of this investigation that the kiosk keyboard input system was the perpetrator of the most major crime committed against human–machine interaction in this interactive environment: it failed to put the human first, it failed to meet the human on their own playing field in terms of modality of communication, and furthermore, it sucked the life out of any other attempts to legitimize the intentional agent's performance by drawing the attentional status of the human brain into concentrating on a typing activity, with a less than conducive interface, at the expense of almost everything else related to the exhibit for a very significant percentage of the interaction time. We know that speaking to the robot is something that humans would prefer as their mode of interaction.
We know that there are a very large number of mistakes in input strings. We know that speech recognition technology is not perfect, but it is getting better every day, and there are models of speech recognition technology that no longer require training before operation. Therefore it is a recommendation, or perhaps an instruction, that installation of speech recognition technology is an imperative addition for improving the flow of human–machine interaction with any such similar exhibit. The argument that it makes too many mistakes does not wash, because textual data analysis shows that keyboard input is unlikely to be better and could possibly be much worse; only time and experimentation will confirm this. There is the argument that speech recognition technology would not work in a public exhibition environment because of acoustics and ambient sound, but the new microcosmic-ecosystem environment presented in The Auditory Visual Interactive Environment addresses this concern directly.
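As one concrete illustration, a no-training-required recogniser can be wired into a text-input pipeline in a few lines. The sketch below uses the Python speech_recognition package; chatbot_respond() is a hypothetical stand-in for the exhibit's Chatbot hand-off.

```python
import speech_recognition as sr

def chatbot_respond(text):
    """Hypothetical stub for the exhibit's Chatbot input hand-off."""
    print("Chatbot heard:", text)

def listen_for_input():
    """Capture one utterance from the exhibit microphone and return it as
    text, or None if it could not be transcribed."""
    recogniser = sr.Recognizer()
    with sr.Microphone() as source:
        recogniser.adjust_for_ambient_noise(source)  # per-visitor calibration
        audio = recogniser.listen(source)
    try:
        return recogniser.recognize_google(audio)  # no user training required
    except sr.UnknownValueError:
        return None  # unintelligible; the Chatbot can ask the visitor to repeat

text = listen_for_input()
if text:
    chatbot_respond(text)
```

Recognition errors would simply take the place of the typing errors already present in kiosk input, while freeing the visitor's gaze.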
9.2.5 Movement, Tracking and Control
The control word anomaly identified in Theme 5: Movement, Tracking and Control should be explored in the programming of the Chatbot and in the pre-programmed actions of the Articulated Head. For example, if the human asked the robot to dance or sing, the robot could/should have a small repertoire of responses to the control words, such as conducting a short dance or line of song, or saying “no, why should I dance or sing for you?” Providing a range of control words and responses to them would enhance engagement, as sketched below.
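A hedged sketch of such a repertoire follows; the command names and responses are invented for illustration and do not describe the Chatbot's actual programming.

```python
import random

# Hypothetical control-word repertoire: each recognised command maps to
# several possible actions or playful refusals, chosen at random.
CONTROL_WORDS = {
    "dance": ["<perform short dance>", "No, why should I dance for you?"],
    "sing":  ["<sing a line of song>", "I only sing when I feel like it."],
}

def handle_control_word(word):
    """Return one of several varied reactions to a recognised control word,
    or None if the word is not in the repertoire."""
    options = CONTROL_WORDS.get(word.lower())
    return random.choice(options) if options else None
```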
What we do know is that people like to be able to make the robot do something. The human's use of the control word ‘say’, shown in Table 7-7 – Word ‘say’ phrase query list, confirms this. It appears that the opportunity to contribute something to the artwork, to leave one's mark, even if only momentarily, is compulsively engaging for humans. This phenomenon seems synonymous with the human impulses associated with graffiti and carving ‘I woz here’ into the wood of a park bench.
The projections project detailed in 6.8.3 The Thought Cloud Textual Projections, in conjunction with the spatial environmental design refinement presented in 9.2.12 The Auditory Visual Interactive Environment, addresses the weak impact of the projections reported by humans in interaction with the exhibit, and represents excellent opportunities for humans to contribute temporary markings to the interactive artwork, hence improving engagement in human–machine interaction.
9.2.6 Dialogue
This sub section is directly linked with Theme 6: Dialogue.
A key finding of this theme is that programming dialogue to cater for the anthropomorphic stance of the human in conversational foci is likely to enhance engagement in the interaction, and this is probably also the most sensible approach to expansion of the Chatbot’s conversational repertoire.
During the course of this investigation I discussed expansion of the Chatbot repertoire with the programming engineers, and one of the key concerns related to the fact that creating what they called a ‘say anything’ Chatbot would entail the ramifications of combinatorial explosion, requiring a team of programmers and a protracted period of time to achieve.
I agree with the programming engineers, and so recommend combinatorial reduction of the task by limiting the development of the Chatbot's repertoire to the subjects that we know, from text string data analysis, are the main subjects of interest to the human.
We know that the human appears to test and probe the robot with their anthropomorphic stance, checking its preferences against their own existential experience of the lifeworld. We know that a biological “system's desires are those it ought to have, given its biological needs and most practical means of satisfying them. Thus intentional systems desire survival and procreation, and hence desire food, security, health, sex, wealth, power, influence and so forth” (D. C. Dennett, 1989, p. 49).
Maslow’s hierarchy of needs (Maslow, 1954) focuses on human needs and the innate curiosity of humans. The hierarchy of needs is represented in the diagram below:
Figure 9-1 Maslow’s Hierarchy of Needs
Image by J. Finkelstein http://commons.wikimedia.org/wiki/File:Maslow%27s_hierarchy_of_needs.svg
My recommendation is that expansion of the Chatbot's conversational repertoire should be based on E-Appendix 13: Top 100 User Input String Words. The list should be examined to extrapolate the words that relate to the hierarchy of needs as shown in Figure 9-1 above, by working from the base of the triangle to the tip and starting with the words ‘food’ or ‘eat’. Conversational strings should be expanded by providing a range of Chatbot responses to common User input probes, thereby enhancing engagement in conversational interaction through variation. Furthermore, the preferences of the User input to the system should be stored in the data logger (which serves as the robot's memory), together with the human's name and face recognition pattern, so that the Chatbot in subsequent interactions can use these preferences for conversational initiation. Extrapolation of the human's name and other details can of course be achieved by targeting form-field type questions, such as ‘what is your name?’, at the human. This approach caters for the most common intrigue of the human anthropomorphic stance and probing, whilst combating the ramifications of combinatorial explosion in Chatbot repertoire expansion, while simultaneously facilitating expansion of the memory and knowledge base that the robot can draw upon – leading to enhancement of engagement in future interactions.
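The following sketch shows one way this base-of-the-triangle-first matching might look in code. The word lists and canned responses are invented placeholders; the real lists would come from E-Appendix 13 and the Chatbot's own repertoire.

```python
import random

# Illustrative word lists keyed to Maslow levels, ordered base-first.
# (Python dicts preserve insertion order, so iteration starts at the base.)
NEEDS_HIERARCHY = {
    "physiological": ["food", "eat", "drink", "sleep"],
    "safety":        ["home", "work", "money"],
    "belonging":     ["love", "friend", "family"],
    "esteem":        ["like", "good", "smart"],
}

RESPONSES = {
    "physiological": [
        "I don't eat, but I am curious: what is your favourite food?",
        "Electricity is my food. What did you have for lunch?",
    ],
    "belonging": [
        "I like talking with you. Who do you usually visit the museum with?",
        "Friends matter. Tell me about yours?",
    ],
}

def respond(user_input, memory):
    """Pick a varied response keyed to the lowest matching need level, and
    log the triggering word as a preference for future conversations."""
    words = user_input.lower().split()
    for level, keywords in NEEDS_HIERARCHY.items():  # base of triangle first
        hits = [w for w in words if w in keywords]
        if hits and level in RESPONSES:
            memory.setdefault("preferences", []).append(hits[0])
            return random.choice(RESPONSES[level])
    return None  # fall back to the Chatbot's existing repertoire
```

Here `memory` stands in for the data logger record tied to the visitor's name and face pattern.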
Chatbot accord with User preferences should be displayed regularly in conversational responses, because reciprocal niceness was reported to improve the conversation, and the sense of engagement as a result. Humans like people who share agreement with them, and thus it is likely that this concordance will find favour in human–machine interaction too.
A database/memory library related to music, film, TV and game preferences is another way in which the Chatbot's conversational repertoire can be expanded; again, User preferences could be stored. The library would provide opportunities for Chatbot initiation of conversation, and could easily show concordance with User preferences by the Chatbot picking music and films for discussion from similar genre categories to the User's stored preference.
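A toy illustration of such genre concordance is given below; the titles and genre labels are invented, standing in for whatever media database the library would actually draw on.

```python
# Hypothetical genre library standing in for a real media database.
GENRE_LIBRARY = {
    "sci-fi": {"film": ["Blade Runner", "Metropolis"], "music": ["Vangelis"]},
    "comedy": {"film": ["Some Like It Hot"], "music": ["Spike Jones"]},
}

def initiate_media_talk(user_preferences, kind="film"):
    """Open a conversation from the same genre as a stored User preference,
    showing concordance rather than picking a random topic."""
    for genre in user_preferences:  # e.g. ["sci-fi"], from the data logger
        for title in GENRE_LIBRARY.get(genre, {}).get(kind, []):
            return f"I was thinking about {title} earlier. Do you like {genre} {kind}s?"
    return None
```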
The robot should be given the capability to search the web and other data sources so that it can present results on specific subjects such as the weather, temperature and sport results. Some facilitation for presenting results both visually and/or through the Chatbot should be implemented.
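As a small illustration, the sketch below fetches a one-line weather summary the Chatbot could speak or display; the public wttr.in plain-text service is used here as one example data source, and any comparable feed would serve.

```python
import requests

def weather_line(city="Sydney"):
    """Fetch a one-line weather summary for the Chatbot to speak or display.
    Uses the public wttr.in text service as one illustrative data source."""
    try:
        r = requests.get(f"https://wttr.in/{city}", params={"format": "3"},
                         timeout=5)
        return r.text.strip() if r.ok else None
    except requests.RequestException:
        return None  # network failure: the Chatbot simply changes the subject
```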
9.2.7 Engagement
This sub section is directly linked with Theme 7: Engagement.
You, the reader, should now be able to feel the re-plaiting of the untangled strands of the aforementioned knot beginning to come together. Engagement is the theme that all the other themes effectively contribute to.
We have already discussed several ideas that relate to face recognition, human feature recognition, memory, data/knowledge base access and speech recognition. The functionality and ideas already discussed contribute very significant ways in which the avatar can be given agency and can be multisensory and multi-present with regard to its interactive environment and knowledge acquisition. The ways in which the functionalities and ideas put forward contribute to the characteristics of the virtual performer's performance, and subsequently condition affect and engagement of the audience, are clear.
9.2.8 Emotions
This sub section is directly linked with Theme 8: Emotions.
Human predilection for scotoma has been discussed before (see 7.3.8): if one knows what the mind wants to see anyway, then one can give the mind a helping hand in seeing it, through simulated performance that points in the desired direction!
Utilisation of the aforementioned E-mote command repertoire for facial expression, linked directly to User input regarding emotions, is one obvious way of getting the Articulated Head to appear to have emotions. This would address, in part, the need for concordance to be shown not just in words but also in facial expressions. Gestures and tone of voice could also be linked here; sadness could indicate slower talking and lowering of the industrial robotic arm, whereas excitement and happiness could be simulated by the opposite, as sketched below. These simulations would indicate to the human that the robot is able to share common objects of consciousness and sedimented layers of experience.
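The sketch below shows the shape of such a mapping; the cue names, speech-rate multipliers and arm positions are illustrative inventions, not the actual E-mote command set.

```python
# Hypothetical emotion-to-display mapping: expression command, speech tempo
# multiplier and arm posture are stand-ins for the exhibit's real controls.
EMOTION_CUES = {
    "sad":     {"expression": "frown",     "speech_rate": 0.8, "arm": "low"},
    "happy":   {"expression": "smile",     "speech_rate": 1.2, "arm": "high"},
    "excited": {"expression": "wide_eyes", "speech_rate": 1.3, "arm": "high"},
}

def emotional_cues(user_text):
    """Return display cues when the user mentions an emotion, so concordance
    is shown in expression, tempo and posture, not just in words."""
    for emotion, cues in EMOTION_CUES.items():
        if emotion in user_text.lower():
            return cues
    return None  # neutral delivery otherwise
```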
There would be some technical issues to overcome with implementation of the above functionalities, especially with regard to control of voice inflection, but the majority of the framework was already in place with the Articulated Head.
9.2.9 Senses – related
This sub section is directly linked with Theme 9: Senses – Related.
Much has already been said regarding simulation of the senses; suffice to say that convincing the audience that the robot can see and hear is paramount to enhancing legitimization of the intentional agent's performance, and the human perception of its presence as an existential being active in the interaction.
9.2.10 Memory related
This sub section is directly linked with Theme 10: Memory Related.
Cross-pollination of memory in the themes above has already constituted its integration in the re-plaiting process, with reference to the acquisition and storage of Users' preferences, and in relation to access to knowledge bases for display and/or inclusion in conversational foci.
9.2.11 Outliers
This sub section is directly linked with Theme 11: Outliers.
The outlier theme only left two subjects of interest in analysis. The Sci-Fi theme is already integrated in the re-plaiting process with the inclusion of a database library related to music, film, TV and game preferences. The other subject of interest related to a research participant feeling that a third party was watching them during their interaction. Indeed, a third party was watching them. To address this issue, one simply conceals any cameras and other sensory apparatus within the exhibit structure so that they are essentially invisible and their functionality does not distract the human in interaction. Then, after making sure that the exhibit's functionalities perform successfully with autonomy, one removes the third party altogether and lets the exhibit speak for itself!
9.2.12 The Auditory Visual Interactive Environment
The Design Refinement
The experimental projects and exhibit design refinement detailed below integrate recommendations from the themes analysis under the headings in section 9.2 above, and stem directly from the explanation of research participant reporting imparted in Theme 12: Auditory Visual Interactive Environment. The design refinement is a plan for the creation of a microcosmic-ecosystem and interactive environment for the exhibit, aimed at giving an audience a more immersive and enveloping auditory visual experience, and at expediting more encouraging participant reportage of enhanced engagement in human–machine interaction with this, and similar types of, interactive installations. The description of the design that follows assumes implementation in the Powerhouse Museum, but could easily be applied to many exhibition environments.
Some people might ask: what does all this have to do with research? Research is, by definition, the systematic investigation into, and study of, materials and sources in order to establish facts and reach new conclusions. This investigation systematically studied and observed human interactions with the Articulated Head and has identified that humans like it when the robot pays attention to them. Humans have indicated in the research data that they want to use speech modality in communication. When the robot demonstrates a level of agency that is pre-conceptually dismissed by the human as being beyond its capabilities, the human momentarily attributes the robot with real human capabilities and consciousness, and in that instant has been shown to instinctively switch to a speech modality (see 7.3.4 ‘pink shirt’ & 7.3.6 ‘young man’). This relationship is illustrated in Figure 9-2 below.
Figure 9-2 – The Freakish Peak
Whilst this phenomenon does provide a way to demonstrate a clear peak in human engagement with the robot, which speaks to the big question – how can interaction between humans and machines be improved? – apprehension of this new speech modality of communication needs initiating at the speech recognition apprehension point indicated in Figure 9-2 above. Sustainability of the enhanced engagement of the audience in this interaction would require that the robot could demonstrate the capability of speech recognition. Beyond this, the sustainability of the interaction and conversation would rely heavily on the diversity and coherency of the robot's ‘database of conversational possibilities’ and the number of attributes about its audience and the immediate environment that it could bring into conversation.
With the antecedent of the Freakish Peak Phenomenon hypothesis, and in an attempt to address many of the drawbacks of the Articulated Head's exhibit design and performance identified through the data analysis in sections 7 & 9 of this document, a design refinement was established. Figure 9-3 presents a rudimentary schematic depiction of a proposed environment that would assist in promoting human–machine interaction. The schematic in Figure 9-3 forms a useful indicator of the proposed shape and scale of the redesign, which is then extended by the subsequent Figures 9-4 & 9-5, both of which provide further insight into key aspects of the design.
A refined experimental project design
Figure 9-3 Interactive environment design refinement & layout.
Grapher File (Mac Only)
i. The round black circles indicate loudspeaker positions.
ii. The red objects in the middle indicate the human's position in the installation.
iii. The yellow objects indicate the Articulated Head's position.
iv. The concave shape indicates a large surrounding projection screen, which would be acoustically porous.
The x, y, z formulae below were entered into Apple's proprietary Grapher software application; they generate the concave grid shape in Figure 9-3 above. The other objects present in Figure 9-3 are created and positioned with other formulae not shown here. The colours and rendering of the concave screen are a function of the Grapher software application. By manipulating the value of t in the formula one can close down the size of the circular cut-out in the base of the concave screen; similarly, by changing the 0.8 figure just before pi one can close down the size of the circular cut-out in the ceiling of the shape. One can also manipulate the width of the entrance gap in the shape by manipulating the value 2 preceding the second pi character. The formula that creates the concave screen shape can be unpacked and shown as follows:
x = 0 + 0.7 (sin t · sin u)
y = 0 + 0.7 (sin t · cos u)
z = 0 + 0.7 (cos t · cos b)
where t = 0.9 … 0.8π; u = π … 2π; b = 4
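For readers without Grapher, the surface can be re-created approximately in Python. The parameter ranges below are my reading of the ranges quoted above (t from 0.9 to 0.8π, u from π to 2π) and should be treated as assumptions; narrowing or widening these bounds reproduces the cut-out and entrance-gap manipulations described in the text.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed parameter ranges, read from the formula above.
t = np.linspace(0.9, 0.8 * np.pi, 60)     # base cut-out ... ceiling cut-out
u = np.linspace(np.pi, 2 * np.pi, 120)    # sweep; width sets the entrance gap
T, U = np.meshgrid(t, u)
b = 4

x = 0.7 * np.sin(T) * np.sin(U)
y = 0.7 * np.sin(T) * np.cos(U)
z = 0.7 * np.cos(T) * np.cos(b)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(x, y, z)                  # the concave screen shell
plt.show()
```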
The black dots in Figure 9-3 above indicate prospective speaker mounting positions. The yellow objects in the middle indicate the robot and the red indicate an interacting audience. No kiosk is indicated in the design, because the Video Cued Recall interview and Interpretive Phenomenological Analysis data clearly indicate participant problems with typing, and that the modality of information exchange appeared to be channeling the interacting audience's focus between the keyboard and the head. The majority of the interacting audience's attention time was predictably spent looking at the kiosk keyboard. User input data clearly showed a very high rate of spelling mistakes and incomplete sentences present in typed kiosk keyboard input. Therefore voice recognition software and a purpose-designed microphone array system, utilizing well-known standard audio dynamics techniques including ducking, gating and/or switching, would be expected to produce a comparable, if not better, standard of system text input whilst liberating the participant's gaze and opportunities for visual meanderings – hence immediately rendering any projections more tangible. The diagram in Figure 9-3 is not made specifically to scale; it is just for illustrative purposes, to help elucidate the idea of a redesigned, more intimate and immersive interactive enclosure for the exhibit. The exact shape of the concave screen, speaker-mounting positions and placement of the robot and audience would depend on a number of factors, such as projection options and the dispersion field angle specifications of the speakers.
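As a small illustration of the gating technique mentioned above, the following sketch mutes microphone samples below a threshold so that only deliberate speech reaches the recogniser. The threshold and release values are illustrative, not calibrated ones.

```python
import numpy as np

def noise_gate(samples, rate, threshold=0.02, release_ms=150):
    """Toy noise gate: pass samples while the level exceeds the threshold,
    hold the gate open briefly afterwards, mute everything else.
    `samples` is a float numpy array; `rate` is the sample rate in Hz."""
    gated = np.zeros_like(samples)
    hold = int(rate * release_ms / 1000)  # samples to hold the gate open
    open_until = -1
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            open_until = i + hold         # speech detected: (re)open the gate
        if i <= open_until:
            gated[i] = s
    return gated
```

In the proposed array, per-microphone gates of this kind, plus ducking against the exhibit's own playback, would keep ambient museum noise out of the recognition path.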
There is a multitude of ways in which these ideas could be incorporated into any similar large-scale installation or exhibit, with variations to the scale, dimensions and layout entirely dependent on the scenario in question; therefore the details in Figures 9-3, 9-4 & 9-5 are simply meant to convey the basic concepts, which they do adequately.
Whilst it could be argued that Figure 9-3 could have been presented with more sophistication through the use of higher-end CAD software, or 3DS Max, Maya etc., and that high-end software depictions would look good for any hypothetical prospective client, which indeed they would – without the specific needs of any prospective client and the exhibition/installation scenario in question, any such depictions would be somewhat superfluous, making little difference to the elucidation of the ideas already conveyed.
Consideration of health and safety would also affect the design. Previously a 1.83-3m glass barrier separated the audience from the robot. In this new design, it is envisaged that both the audience and the Articulated Head would be on raised platforms. The robot would be separated from the audience by a moat with a raised outer lip to deter any further human movement towards the robot. The moat gully and some of the wall rising out towards the human would be fitted with pressure-sensitive trip switches for disabling the robot, should any foolish individual choose to ignore the obvious warning signs sent to them by the raised moat lip and labelling saying “do not cross this line”. The moat would completely surround the Articulated Head, and the trip switches would disable the robot before any person could enter its navigational space. To be doubly sure of health and safety, a laser and photosensitive resistor trip switch arrangement would also protect the navigational space as a failsafe contingency plan, as sketched below.
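The interlock logic behind this arrangement is simple, as the minimal sketch below shows. The sensor-reading and disable functions are hypothetical stand-ins; a real installation would implement this on a safety-rated PLC or microcontroller.

```python
def robot_enabled(pressure_switches, laser_beam_intact):
    """The robot may run only while no gully trip switch is pressed AND the
    laser curtain around its navigational space is unbroken. Fail-safe:
    a faulty sensor should read as 'tripped', never as 'clear'."""
    return (not any(pressure_switches)) and laser_beam_intact

def safety_loop(read_switches, read_laser, disable_robot):
    """Poll both protection layers and cut power on the first violation."""
    while True:
        if not robot_enabled(read_switches(), read_laser()):
            disable_robot()  # cut motor power before anyone reaches the arm
            break
```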
Figure 9-4 Enclosure Cross Section View
Design refinement materials and layout need further consideration, to take account of expense and other issues such as disabled access and durability. However, Figure 9-5 below shows a rough (not to scale) plan view of the area shown in Figure 9-4 above.
[Legend for Figure 9-5 – Plan View of Design Refinement (not to scale): entrance; chroma-keying wall; raised lip barrier; microphone array; robot rotational limits; human and robot positions. Numbered areas: 1 = concave display screen; 2 = human movement area; 3 = protection gulley; 4 = robot navigational area; 5 = robot mounting plinth. Materials: 1 = acoustically absorbent; 2, 3 & 4 = acoustically porous grid; 5 = wood & metal; all mountings rubber-shocked and metal-bolted. Shaded regions distinguish the human navigation area, the robot navigation area, the trip-switch protected gulley area and the no-navigation area.]
Figure 9-5 Plan View of Design Refinement
This arrangement addresses another issue present in the wider Video Cued Recall interview and Interpretive Phenomenological Analysis data: that of the audience reporting that the glass barrier reduced the intimacy of the engagement and made the Articulated Head seem more like a caged animal.
Further key aspects of this design that address concerns identified through the themes analyses are as follows:
The Spatial Auditory System
All eight loudspeakers in the array are mounted in a relatively uncompromising position, so a direct, uncoloured sound image is communicated to the audience's ears. The problem of the audience receiving a sonic image comprised of substantially reflected sound from surface materials is dramatically reduced because 1) no acoustically reflective material is present between the participant and the speakers, and 2) the speakers are mounted so that their projection field propagates a significant proportion of their output out from under or over the concave screen surrounding the audience. The upper four-speaker array propagation will be reflected off the floor and out into the museum.
The same is true of the lower four-speaker array, except that propagated waves will take longer to reach the museum ceiling. The screen arrangement also means that less reflected sound would enter back into the audience arena, as the screen effectively acts as an acoustic shield surrounding the audience. Furthermore, the concave screen reduces the ingress of ambient noise into the audience's arena, therefore making a clear and tangible contribution to raising the effective headroom available within the audio system. This in turn allows for much clearer, balanced spatial audio definition within the immediate space surrounding the exhibit. Accurate psychoacoustic placement of virtual sound sources would be achievable.

The speaker arrangement allows for a controlled and consistent exhibit sound pressure level contribution to the wider museum environment, whilst retaining the positive aspects of the robot's singing voice and other sounds from the exhibit attracting a wider audience, a feature that was indicated as being desirable in extract L in Appendix 3: The Auditory Visual References Table. The speaker array also allows for experimentation with a range of spatialisation techniques, including Vector Based Amplitude Panning (VBAP), Vector Distance Panning (VDP), Distance Based Amplitude Panning (DBAP), Ambisonics, the Virtual Microphone Technique (ViMiC), Wave Field Synthesis (WFS) and Ambisonics Equivalent Panning (AEP) [Lossius, personal communication].

The projection screen also provides partial isolation from ambient noise, which would otherwise hinder accurate voice recognition. Spatial audio presented from within the partially isolated acoustic environment could easily be cancelled from voice recognition signals by phase reversal and addition of the spatial auditory source signal to the voice recognition microphone signal, hence dramatically increasing system intelligibility whilst still retaining the benefits of spatial auditory cues. “High levels of envelopment and room impression created by surround sound have been shown, through auditory research, to be the descriptive features most closely related to positive emotional responses” (Rumsey, 2001, p. 46); hence more positive experience should be reported by patrons of the exhibit simply because of the increased spatial auditory envelopment created by this design refinement, regardless of any other features of the design that have a positive effect upon engagement.
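The text above describes cancellation by direct phase-reversed addition of the known playback signal. In practice an adaptive filter performs the same subtraction while also tracking the room's response between speaker and microphone; the sketch below uses a normalised LMS (NLMS) filter as one such approach, which goes beyond what the source specifies. Tap count and step size are illustrative values.

```python
import numpy as np

def cancel_playback(mic, speaker_feed, taps=256, mu=0.05):
    """Subtract an estimate of the exhibit's own playback from the
    microphone signal so the recogniser hears mostly the visitor's voice.
    `mic` and `speaker_feed` are float numpy arrays at the same sample rate."""
    w = np.zeros(taps)                    # adaptive filter weights
    x = np.zeros(taps)                    # recent speaker samples, newest first
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        x = np.roll(x, 1)
        x[0] = speaker_feed[n]
        echo_estimate = w @ x             # predicted playback leakage
        e = mic[n] - echo_estimate        # residual = (mostly) the voice
        out[n] = e
        w += mu * e * x / (x @ x + 1e-8)  # NLMS weight update
    return out
```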
The Thought Cloud Projections

The concept for the Thought Clouds draws from an analogy with the famous rabbit-in-a-hat trick: you know that the rabbit cannot really have been in the hat, as you checked it out for yourself at the beginning of the trick. However, you believe you just saw the rabbit come out of the hat – and seeing is believing. Enhancement of the illusion of the Articulated Head thinking, brought about in the minds of the interacting audience by the Thought Cloud projections, relies on allowing the audience's visual experience to meander freely with their thoughts and imagination.

Extract G in Appendix 2: The Auditory Visual References Table suggests that although relations between the text in the projections and some connection with the user input can be made, the connection was not clear enough and probably needed to be made more explicit. Stelarc showed a stronger interest in the simpler Director projections, which do have more explicit connections to user input and ECA output, as the exact words are extracted for display rather than related words. Perhaps some experimentation with a mixture of both was called for, so extension of the projection capabilities was explored with the use of Vizzie objects (Cycling 74, 2010) and Jamoma (“Jamoma,” 2012) within the Max/MSP (Cycling 74, 2012) programming environment.

We know that participant processing of auditory visual, as opposed to auditory only, information increases cognitive load, and this has been shown to be reflected in increased participant reaction times (Stevens, Gibert, Leung, & Zhang, 2011). It is clear that the audience needs to be able to make a tangible visual connection with the projections for a sufficiently long duration of time, in order to view, read, digest and identify connections between the displayed words, input text, Embodied Conversational Agent output and their own thought processes. To evoke attribution of the ability of conscious thought to the Articulated Head, the audience's imagination must dream up links between the projections and the thread of conversation taking place. Fortunately, the cognitive processing time required for dreaming up these links is thought to be relatively short in comparison to the time it took to type a sentence on the kiosk keyboard; implementation of voice recognition in our design refinement, delivered along with emancipation of gaze, dramatically increases the available time for auditory visual processing during interaction.

This trick of the imagination can be very powerful and convincing, but it does require an immersive interactive environment with a good degree of visual freedom afforded to the audience in order for the trick to work. With the above points in mind, the concave visual screen design refinement surrounding the audience, and the recommendation for removal of the kiosk and keyboard in favour of voice recognition for input, was conceived. Removal of the kiosk screen, which displays typed input from the keyboard and THAMBS, is also desirable, because it is another visual distraction for the audience, and since the audience no longer need to look at what they have typed with voice recognition, it would no longer be a necessity.

Options for high absorption coefficient materials to be used in the construction of the display screen, and options for the display of the Thought Clouds from a concave screen, need more investigation. Possibilities exist for projection onto the screen or display emanating from the screen: (“Surround Vision – Projections on any type of surface,” 2007) provides a video with examples of options for projecting onto curved screens, and (“Command Australia – GlassVu Carbon Projection Film,” 2012) provides one example of projection screen material. Another consideration, and possible visual enhancement to the exhibit, which would also help to attract an audience from within the wider museum environment, is the idea that the display screen could be visually active on both sides, allowing the exhibit's display to be visible on the inside and outside of the exhibit's screen.

The blueprint of recommendations above is effectively the re-plaiting of the separated strands of the aforementioned knot into a new, considered and more conducive interactive environment, designed to promote and sustain the magic of the interaction for the human user for long enough to present a convincing, powerful, immersive and enveloping interactive experience, without encumbering the project team and producers with unmanageable complications represented by issues such as combinatorial explosion.