Notes:
MaryTTS is an open-source text-to-speech (TTS) synthesis platform written in Java. It is designed as a flexible, customizable TTS engine that supports a wide range of languages and voices. MaryTTS is used in applications such as generating speech for accessibility aids and creating synthetic voices for language learning or entertainment.
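A MaryTTS installation typically runs as a local HTTP server (the public demo referenced later in these notes uses port 59125), so any language can drive it with a plain HTTP request. The sketch below builds a request URL for the server's `process` endpoint; the parameter names follow the MaryTTS 5.x HTTP interface, and the host, port, and voice name are assumptions to check against your own installation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed default address of a locally running MaryTTS server.
MARY_HOST = "http://localhost:59125"

def build_process_url(text, locale="en_US", voice=None):
    """Build a request URL for the MaryTTS HTTP 'process' endpoint."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    if voice:
        params["VOICE"] = voice
    return MARY_HOST + "/process?" + urlencode(params)

if __name__ == "__main__":
    # Requires a MaryTTS server running on localhost:59125.
    url = build_process_url("Hello world")
    with urlopen(url) as resp, open("hello.wav", "wb") as f:
        f.write(resp.read())
```

The same query parameters work in a browser address bar, which is a quick way to confirm a server is up before scripting against it.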
TTS (text-to-speech) synthesis is the process of converting written text into spoken audio. It enables computers, mobile devices, and other electronic devices to produce speech that sounds like human speech. The process involves several steps: text analysis, prosody prediction, and speech synthesis.
The text analysis step breaks the text into smaller units such as words, phrases, and sentences and analyzes their grammatical structure and meaning. The prosody prediction step estimates the intonation, rhythm, and stress of the speech from that analysis. The speech synthesis step generates speech waveforms from the prosody predictions and the text analysis; these waveforms can then be encoded in an audio format playable on any device.
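The three steps above can be sketched end to end with toy stand-ins for each stage: a regex tokenizer for text analysis, fixed word durations plus a falling pitch contour for prosody prediction, and sine tones in place of real waveform generation. This is an illustration of the pipeline's shape, not how MaryTTS itself is implemented.

```python
import math
import re

SAMPLE_RATE = 16000  # samples per second of synthesized audio

def analyze(text):
    """Text analysis: split text into sentences, then words
    (a toy stand-in for full linguistic analysis)."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    return [re.findall(r"[\w']+", s) for s in sentences]

def predict_prosody(sentences):
    """Prosody prediction: assign each word a duration and a pitch.
    Here: longer words last longer, and pitch declines over a sentence."""
    plan = []
    for words in sentences:
        n = len(words)
        for i, word in enumerate(words):
            duration = 0.15 + 0.02 * len(word)              # seconds
            pitch = 220.0 - 60.0 * i / max(n - 1, 1)        # Hz, declination
            plan.append((word, duration, pitch))
    return plan

def synthesize(plan):
    """Speech synthesis: render each unit as a sine tone (a placeholder
    for real waveform generation) and concatenate the samples."""
    samples = []
    for _, duration, pitch in plan:
        for t in range(int(duration * SAMPLE_RATE)):
            samples.append(math.sin(2 * math.pi * pitch * t / SAMPLE_RATE))
    return samples
```

A real engine replaces each stage with far richer models (part-of-speech tagging and phonetization in analysis, statistical prosody models, unit selection or HMM/neural waveform generation), but the data flow text → linguistic units → prosody plan → samples is the same.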
TTS synthesis is used in a variety of applications: speech-enabled devices such as smartphones and smart speakers, assistive technology for the visually impaired, virtual assistants, IVR systems, GPS navigation, and any other setting where spoken output is useful.
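MaryTTS also lets authors override the predicted prosody explicitly through its MaryXML input format, which several of the papers below rely on for expressive and emotional speech. A minimal sketch follows; the `prosody` element mirrors SSML conventions, and exact attribute support varies by MaryTTS version and voice type, so treat the values as illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<maryxml version="0.5" xmlns="http://mary.dfki.de/2002/MaryXML"
         xml:lang="en-US">
  <p>
    <!-- Slow the rate and raise the pitch for this sentence -->
    <prosody rate="-20%" pitch="+10%">
      This sentence is spoken more slowly and slightly higher.
    </prosody>
  </p>
</maryxml>
```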
Resources:
- github.com/psibre/artimate .. articulatory animation framework
- github.com/marytts .. open-source, multilingual text-to-speech synthesis system written in java
- dfki.de/speecheval .. automatic end-to-end tests of spoken dialog systems
- mary.dfki.de .. open-source, multilingual text-to-speech synthesis platform written in java
Wikipedia:
- Category:Speech synthesis
- Comparison of speech synthesizers
- Emotion Markup Language
- Open Wonderland
- Speech synthesis
See also:
AIML 2014 | Alicebot 2013 | Alicebot 2014 | Festvox Festival & Dialog Systems | Virtual Human Toolkit
References:
- Open source voice creation toolkit for the MARY TTS Platform M Schröder, M Charfuelan, S Pammi… – 12th Annual Conference of …, 2011 – hal.inria.fr
- Prosody control in HMM-based speech synthesis S Pammi – DFKI Speech Technology Lab., Tech. Rep., 2011 – dfki.de
- An event-based conversational system for the Nao robot I Kruijff-Korbayová, G Athanasopoulos, A Beck… – Proceedings of the …, 2011 – Springer
- Visual SceneMaker: a tool for authoring interactive virtual characters P Gebhard, G Mehlmann, M Kipp – Journal on Multimodal User Interfaces, 2012 – Springer
- Continuous interaction with a virtual human D Reidsma, I de Kok, D Neiberg, SC Pammi… – Journal on Multimodal …, 2011 – Springer
- Sign language avatars: animation and comprehensibility M Kipp, A Heloir, Q Nguyen – Intelligent Virtual Agents, 2011 – Springer
- Symbolic vs. acoustics-based style control for expressive unit selection I Steiner, M Schröder, M Charfuelan, A Klepp – SSW, 2010 – dfki.de
- A conversational system for multi-session child-robot interaction with several games I Kruijff-Korbayová, H Cuayáhuitl, B Kiefer… – … German Conference on …, 2012 – Citeseer
- Analysis of significant dialog events in realistic human–computer interaction D Prylipko, D Rösner, I Siegert, S Günther… – Journal on Multimodal …, 2014 – Springer
- Verbally assisted virtual-environment tactile maps: a prototype system K Lohmann, M Kerzel, C Habel – Proceedings of the Workshop on …, 2012 – ceur-ws.org
- Implementing Intelligent Pedagogical Agents in virtual worlds: Tutoring natural science experiments in OpenWonderland M Soliman, C Guetl – Global Engineering Education …, 2013 – ieeexplore.ieee.org
- Facial expression-based affective speech translation É Székely, I Steiner, Z Ahmed… – Journal on Multimodal …, 2014 – Springer
- Improving TTS synthesis for emotional expressivity by a prosodic parameterization of affect based on linguistic analysis MAM Shaikh, ARF Rebordao… – Proceedings of the 5th …, 2010 – isle.illinois.edu
- Facial expression as an input annotation modality for affective speech-to-speech translation É Székely, Z Ahmed, I Steiner… – … Artificial Agents in …, 2012 – coli.uni-saarland.de
- SpeechEval: A domain-independent user simulation platform for spoken dialog system evaluation T Scheffler, R Roller, N Reithinger – … of the Paralinguistic Information and its …, 2011 – Springer
- Music synchronizer with runner's pace for supporting steady pace jogging T Kitahara, S Hokari, T Nagayasu – HCI International 2014 Posters …, 2014 – Springer
- Artimate: an articulatory animation framework for audiovisual speech synthesis I Steiner, S Ouni – arXiv preprint arXiv:1203.3574, 2012 – arxiv.org
- Multi-step natural language understanding P Milhorat, S Schlögl, G Chollet, J Boudy – SIGdial 2013: 14th Annual …, 2013 – aclweb.org
- From Tale to Speech: Ontology-based Emotion and Dialogue Annotation of Fairy Tales with a TTS Output C Eisenreich, J Ott, T Süßdorf, C Willms, T Declerck – 2011 – ceur-ws.org
- Progress in Facial Expression Based Affective Speech Translation Z Ahmed, I Steiner, E Székely, J Carson-Berndsen – www-old.coli.uni-saarland.de
- Improving speech synthesis quality by reducing pitch peaks in the source recordings L Violante, PR Zivic, A Gravano – HLT-NAACL, 2013 – anthology.aclweb.org
- Towards a Practical Silent Speech Interface Based on Vocal Tract Imaging B Denby, J Cai, T Hueber… – 9th International …, 2011 – halshs.archives-ouvertes.fr
- Progress report, compendium of work done during months 1–12 M Charfuelan, E Douglas-Cowie, R Cowie, D Heylen… – dcs.gla.ac.uk
- Designing Natural Language User Interfaces with Elderly Users S Schlögl, M Garschall, M Tscheligi – Workshop on Designing …, 2014 – cs.toronto.edu
- Investigating the Social Facilitation Effect in Human–Robot Interaction I Wechsung, P Ehrenbrink, R Schleicher… – Natural Interaction with …, 2014 – Springer
- Multimodal affect recognition for adaptive intelligent tutoring systems R Janning, C Schatten… – Workshop on Feedback …, 2014 – researchgate.net
- Expressive gibberish speech synthesis for affective human-computer interaction S Yilmazyildiz, L Latacz, W Mattheyses… – Text, Speech and …, 2010 – Springer
- Controlling interaction with digital product memories P Gebhard – SemProM, 2013 – Springer
- Towards conversational agents that attend to and adapt to communicative user feedback H Buschmeier, S Kopp – Intelligent Virtual Agents, 2011 – Springer
- The SEMAINE API: towards a standards-based framework for building emotion-oriented systems M Schröder – Advances in Human-Computer Interaction, 2010 – dl.acm.org
- Reference-Related Speaker Gaze as a Cue in Online Sentence Processing H Kreysa, P Knoeferle – 2013 – duepublico.uni-duisburg-essen.de
- MOTI: A Motivational Prosody Corpus for Speech-Based Tutorial Systems S Wolff, A Brechmann – Speech Communication; 10. ITG …, 2012 – ieeexplore.ieee.org
- Data Collection in a Wizard-of-Oz Experiment V Rieser, O Lemon – Reinforcement Learning for Adaptive Dialogue …, 2011 – Springer
- Evaluation study and results of intelligent pedagogical agent-led learning scenarios in a virtual world M Soliman, C Guetl – Information and Communication …, 2014 – ieeexplore.ieee.org
- Implementation and evaluation of an HMM-based speech generation component for the SVOX TTS system S Würgler – tik.ee.ethz.ch
- Highly Realistic MPEG-4 Compliant Facial Animation with Charisma A El Rhalibi, C Carter, S Cooper… – … and Networks (ICCCN) …, 2011 – ieeexplore.ieee.org
- Comparing Virtual Patients with synthesized and natural speech A Heitz, A Dünser, P Seaton, L Seaton, A Basu – 2012 – ir.canterbury.ac.nz
- Survey on Speech, Machine Translation and Gestures in Ambient Assisted Living D Anastasiou – 2011 – lodel.irevues.inist.fr
- Identification of interactivity sequences in interactions with spoken dialog systems S Schmidt, KP Engelbrecht, M Schulz… – Proceedings of the …, 2010 – researchgate.net
- The Charismatic Computer: Examining How Expressive Interfaces Influence Beliefs D Lam – courses.ece.ubc.ca
- CineCubes: cubes as movie stars with little effort D Gkesoulis, P Vassiliadis – … of the sixteenth international workshop on …, 2013 – dl.acm.org
- Supporting Jogging at an Even Pace by Synchronizing Music Playback Speed with Runner's Pace T Kitahara, S Hokari, T Nagayasu – IEICE Transactions on …, 2015 – search.ieice.org
- Adaptive human–robot interaction in sensorimotor task instruction: From human to robot dance tutors R Ros, I Baroni, Y Demiris – Robotics and Autonomous Systems, 2014 – Elsevier
- Creating Emotional Speech for Conversational Agents AT Do, S King – Digital Media and Digital Content …, 2011 – ieeexplore.ieee.org
- Exploring inter- and intra-speaker variability in multi-modal task descriptions S Schreitter, B Krenn – RO-MAN: The 23rd IEEE …, 2014 – ieeexplore.ieee.org
- Using Wizard of Oz to Collect Interaction Data for Voice Controlled Home Care and Communication Services S Schlögl, G Chollet, P Milhorat, J Deslis… – Proceedings of the …, 2013 – researchgate.net
- Synthesis of listener vocalizations: towards interactive speech synthesis SC Pammi – 2012 – scidok.sulb.uni-saarland.de
- MARY TTS HMM-based voices for the Blizzard Challenge 2012 M Charfuelan – Blizzard Challenge Workshop 2012, 2012 – festvox.org
- A Framework for Human-like Behavior in an immersive virtual world F Kuijk, S Van Broeck, C Dareau… – … (DSP), 2013 18th …, 2013 – ieeexplore.ieee.org
- Expressive speech synthesis in MARY TTS using audiobook data and EmotionML M Charfuelan, I Steiner – INTERSPEECH, 2013 – www-old.coli.uni-saarland.de
- Sensor Data and Speech MM Richter, RO Weber – Case-Based Reasoning, 2013 – Springer
- Multilingual Voice Creation Toolkit for the MARY TTS Platform S Pammi, M Charfuelan, M Schröder – LREC, 2010 – dfki.de
- MAT: a tool for L2 pronunciation errors annotation R Ai, M Charfuelan – Proc. of LREC, Reykjavik, Iceland, 2014 – lrec-conf.org
- Synthesis of listener vocalisations with imposed intonation contours S Pammi, M Schröder, M Charfuelan, O Türk, I Steiner – SSW, 2010 – researchgate.net
- Natural-Language-Based Conversion of Images to Mobile Multimedia Experiences B Reiterer, C Concolato, H Hellwagner – User Centric Media, 2010 – Springer
- Bridging the gap between social animal and unsocial machine: A survey of social signal processing A Vinciarelli, M Pantic, D Heylen… – Affective Computing …, 2012 – ieeexplore.ieee.org
- MARY TTS unit selection and HMM-based voices for the Blizzard Challenge 2013 M Charfuelan, S Pammi, I Steiner – Proc. of the Blizzard …, 2013 – coli.uni-saarland.de
CineCubes: Aiding data workers gain insights from OLAP queries D Gkesoulis, P Vassiliadis, P Manousis – Information Systems, 2015 – Elsevier In this paper we demonstrate that it is possible to enrich query answering with a short data movie that gives insights to the original results of an OLAP query. Cited by 1 Related articles All 2 versions
Phonological and metrical variation across genres A Anttila, R Heuser – stress, 2015 – stanford.edu … Page 3. 2015 Annual Meeting on Phonology, Vancouver, October 9–11, 2015 3 (16) Phonological annotation: English from the CMU Dictionary (Weide 1998) and OpenMary (http://mary.dfki.de/), Finnish from a Prosodic module written by Josh Falk. i P:’aɪ S:P W:H …
Application of EmotionML F Burkhardt, C Becker-Asano, E Begoli, R Cowie… – httpd.coli.uni-saarland.de … It utilizes EmotionML as an exchange format to import and export emotionally annotated speech data. 10http://mary.dfki.de/ and https://github.com/marytts 11https://github.com/dtag-dbu/speechalyzer 4 Page 5. 4. Conclusions … Related articles All 7 versions
Evaluating the Meaning of Synthesized Listener Vocalizations. S Pammi, M Schröder – INTERSPEECH, 2011 – researchgate.net … The target unit is also used to select a suitable intona- tion contour, which is then imposed onto the selected unit. The approach is implemented in our unit selection synthesis frame- work MARY (http://mary.dfki.de). Figure 1: Overview of the approach 3.1. … Cited by 3 Related articles All 6 versions
A system for facial expression-based affective speech translation Z Ahmed, I Steiner, É Székely… – Proceedings of the …, 2013 – dl.acm.org … International Journal of Speech Technology 6, 4 (2003), 365–377. http://mary.dfki.de/. 6. Éva Székely, Ahmed, Z., Steiner, I., and Carson-Berndsen, J. Facial expression as an input annotation modality for affective speech-to-speech translation. … Cited by 2 Related articles All 5 versions
The hybrid Agent MARCO N Riesterer, C Becker Asano, J Hué… – Proceedings of the 16th …, 2014 – dl.acm.org … International Journal of Human-Computer Studies, 69(7–8):483–495, 2011. [10] M. Schröder. OpenMARY sources. https://github.com/marytts/marytts, 2013. [11] H. Vilhjálmsson, et. al. The behavior markup language: Recent developments and challenges. … Cited by 1 Related articles All 3 versions
Designing Language Technology Applications: A Wizard of Oz Driven Prototyping Framework S Schlögl, P Milhorat, G Chollet, J Boudy, T SudParis – EACL 2014, 2014 – aclweb.org Page 99. Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 85–88, Gothenburg, Sweden, April 26-30 2014. cO2014 Association for Computational … Related articles All 7 versions
Emotionally expressive song synthesis using formants and syllables D Gramfors, A Johansson – 2014 – diva-portal.org … The acoustic parameters that are relevant in singing, such as pitch and du- ration, can be altered using MaryXML, and this process can be conducted 29http://mary.dfki.de, 04-03-2014 15 Page 16. … The 30http://github.com/marytts/marytts, 04-03-2014 16 Page 17. … Related articles All 2 versions
FAU IISAH Corpus–A German Speech Database Consisting of Human-Machine and Human-Human Interaction Acquired by Close-Talking and Far-Distance … W Spiegl, K Riedhammer, S Steidl, E Nöth – LREC, 2010 – hnk.ffzg.hr … As underlying TTS the software MaryTTS (http://mary.dfki.de/) was used. 3. Description of the FAU IISAH Corpus The name FAU IISAH Corpus is an acronym for Friedrich-Alexander- University – Interaction in the Intelli- gent, Senior-Adapted House. 3.1. … Cited by 5 Related articles All 8 versions
[BOOK] Reinforcement learning for adaptive dialogue systems: a data-driven methodology for dialogue management and natural language generation V Rieser, O Lemon – 2011 – books.google.com Page 1. Theory and Applications of Natural Language Processing Monographs Werena Rieser Oliver Lemon Reinforcement Learning for Adaptive Dialogue Systems A Data-driven Methodology for Dialogue Management and Natural Language Generation 2) Springer Page 2. … Cited by 47 Related articles All 7 versions
Message oriented middleware for flexible wizard of oz experiments in hci M Otto, R Friesen, D Rösner – Human-Computer Interaction. Design and …, 2011 – Springer … The visual user interface is written in JavaFX2. 5.3 Speech Component The Speech Component receives messages send from the wizard control com- ponent and sends the messages to a MARY TTS3 server to synthesise these 2 http://javafx.com/ 3 http://mary.dfki.de/ Page 7. … Cited by 6 Related articles All 4 versions
[BOOK] Computational model of listener behavior for embodied conversational agents E Bevacqua – 2010 – books.google.com Page 1. Computational Model of Listener Behavior for Embodied Conversational Agents Elisabetta Bevacqua Page 2. Computational Model of Listener Behavior for Embodied Conversational Agents Elisabetta Bevacqua DISSERTATION.COM Boca Raton Page 3. … Cited by 8 Related articles All 4 versions
Spoofing Detection with DNN and One-class SVM for the ASVspoof 2015 Challenge J Villalba, A Miguel, A Ortega… – … Annual Conference of …, 2015 – spoofingchallenge.org … transformation [22]. • S10: SS with MaryTTS2 training models with 40 utter- ances per target speaker. 1http://www.festvox.org 2http://mary.dfki.de Page 3. Table 1: EER(%) for different systems and attack types in development set. The …
ASVspoof 2015: the First Automatic Speaker Verification Spoofing and Countermeasures Challenge Z Wu, T Kinnunen, N Evans, J Yamagishi, C Hanilçi… – Training – spoofingchallenge.org … For compatibility with NIST speaker recognition evaluations, we assume that the positive 5http://mary.dfki.de/ Table 3: PLDA ASV system performance. Results illustrated for the baseline and the same system when subjected to spoofing (S1-S10). EER=Equal Error Rate. … Cited by 10 Related articles
Constructive Feedback, Thinking Process and Cooperation: Assessing the Quality of Classroom Interaction T Sousa, L Flekova, M Mieskes… – … Annual Conference of …, 2015 – ukp.tu-darmstadt.de … speaker. (2)List (German) available on our website. (3)See also www.liwc.net. (4)See also mary.dfki.de Semantic (Se) features are mainly based on the Ger- man version of the Linguistic Inquiry and Word Count utility (LIWC). The …
Methodologies in the digital humanities for analyzing aural patterns in texts T Clement – Proceedings of the 2012 iConference, 2012 – dl.acm.org … 2.3 A fragmentary theory of intelligent reasoning: the OpenMary data model for text-to-speech conversion Theories in aurality and research in phonetic symbolism underpin our choice to use OpenMary (http://mary.dfki.de/) to analyze the aurality of texts. … Cited by 2 Related articles
e-Turist: Electronic Mobile Tourist Guide I Jurinčič, A Gosar, M Luštrek, B Kaluža, S Kerma… – researchgate.net … Najdi svoje mesto – Find your city. Bit Planota: Podeželski elektronski vodič – Rural electronic guide, http://www.fundacija-bitplanota.si/p1-projekti/pev.html DFKI, Modular Architecture for Research on speech sYnthesis (MARY), http://mary.dfki.de/ Dines, S. (2010). … Related articles All 2 versions
An Integrated Architecture for Multiagent Virtual Worlds for Performing Adaptive Testing Games SVFLG McClure, R Heller – 2012 – io.acad.athabascau.ca … artifacts. A MaryTTS server is added to the system to provide speech synthesis using OpenMARY TTS, an open-source, multilingual Text-to-Speech synthesis platform written in Java (http://mary.dfki.de/). One … Related articles All 2 versions
Basic tutorial tactics for virtual agents M Wißner, M Häring, G Mehlmann, R Bühling… – EC FP7, STREP …, 2010 – dare.uva.nl … These screenplays are XML-based and prompt the characters to walk around, play animations and of course speak. The characters’ utterances are displayed with speech bubbles, but can also be synthesized with the Mary Text-To-Speech System (http://mary.dfki.de/). … Cited by 3 Related articles All 5 versions
Back of the steering wheel interaction: The car braille keyer S Osswald, A Meschtscherjakov, N Mirnig… – Ambient …, 2012 – Springer … 5 the whole code used is shown. To enter one of these characters, the driver has to press the buttons according to the dot pattern. For example, in order to enter the letter C, the top-left and the top-right button need to be pressed. (4) http://mary.dfki.de/ … Cited by 1 Related articles All 5 versions
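The Braille keyer entry above maps each letter to a dot pattern entered as a button chord. A minimal sketch of that decoding idea, assuming standard six-dot Braille numbering (dots 1–3 down the left column, dots 4–6 down the right, so dot 1 is top-left and dot 4 is top-right); the table covers only a few letters for illustration and is not the paper's implementation:

```python
# Standard six-dot Braille cells for a few letters, as sets of dot numbers.
BRAILLE_DOTS = {
    "A": {1},
    "B": {1, 2},
    "C": {1, 4},  # top-left and top-right, matching the example in the paper
    "D": {1, 4, 5},
    "E": {1, 5},
}

def decode_chord(pressed):
    """Return the letter whose dot pattern matches the pressed buttons, or None."""
    pressed = frozenset(pressed)
    for letter, dots in BRAILLE_DOTS.items():
        if frozenset(dots) == pressed:
            return letter
    return None  # unknown chord

decode_chord({1, 4})  # the letter C
```

Exact-set matching (rather than subset matching) mirrors chorded entry, where the full pattern is committed at once when the buttons are released.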
Human vs Machine Spoofing Detection on Wideband and Narrowband Data M Wester, Z Wu, J Yamagishi – … Annual Conference of the …, 2015 – homepages.inf.ed.ac.uk … A higher acceptance rate indicates that the artificial system (SS or VC) is recognised more as the target speaker, i.e., it gives an indication of how well the artificial system imitates the target, or in other words, how similar the SS or VC system is to the target. (3) http://mary.dfki.de/ …
Emotional Prosodic Model Evaluation for Greek Expressive Text-to-Speech Synthesis D Tsonos, P Stavropoulou, G Kouroupetroglou… – Universal Access in …, 2014 – Springer … Behavior Research Methods 42(1), 74–81 (2010). [23] OpenMARY, Emotion-to-Mary XSL, http://mary.dfki.de/lib/emotion-to-mary.xsl/view. [24] James, A., Russell, J. A., Mehrabian, A.: Evidence for a three-factor theory of emotions. … Related articles All 2 versions
Distant Listening to Gertrude Stein’s ‘Melanctha’: Using Similarity Analysis in a Discovery Paradigm to Analyze Prosody and Author Influence T Clement, D Tcheng, L Auvil, B Capitanu… – Literary and linguistic …, 2013 – ALLC … (2013), this module uses the OpenMary text-to-speech software. Attributes are explained in detail in the OpenMary documentation, specifically at http://mary.dfki.de/documentation/module-architecture/. The features can be used one at a time or in combination. … Cited by 1 Related articles All 7 versions
Partial Representations Improve the Prosody of Incremental Speech Synthesis T Baumann – Fifteenth Annual Conference of the International …, 2014 – 193.6.4.39 … InproTK is free and open-source software and is available at http://inprotk.sf.net. MaryTTS, which forms the basis of the present work, is available at http://mary.dfki.de. Example audio files for the various lookahead conditions are included with the proceedings. … Cited by 3 Related articles All 6 versions
The AHOLAB RPS SSD Spoofing Challenge 2015 Submission J Sanchez, I Saratxaga, I Hernaez… – … Conference of the …, 2015 – spoofingchallenge.org … 8, no. 2, pp. 184–194, Apr. 2014. [26] D. Erro, I. Sainz, E. Navas, and I. Hernáez, “Improved HNM-Based Vocoder for Statistical Synthesizers,” in Interspeech, 2011, pp. 1809–1812. [27] “MaryTTS – Introduction.” [Online]. Available: http://mary.dfki.de/. [Accessed: 09-Mar-2015].
Greta: Towards an interactive conversational virtual companion E Bevacqua, K Prepin, R Niewiadomski… – … : perspectives on the …, 2010 – researchgate.net … Our ECA system has been used to build an interactive listening agent (Bevacqua et al., 2008). In the SEMAINE project, we are developing a Sensitive Artificial … (5) MARY text-to-speech system developed by Marc Schröder, DFKI: http://mary.dfki.de/ … Cited by 20 Related articles All 2 versions
HMM-based sCost quality control for unit selection speech synthesis S Pammi, M Charfuelan – ISCA Speech Synthesis Workshop, 2013 – ssw8.talp.cat … ICASSP-96 Conference Proceedings, 1996 IEEE International Conference on, vol. 1. IEEE, 1996, pp. 373–376. [9] MARY TTS, “VoiceImportTools Tutorial,” https://github.com/marytts/marytts/wiki/VoiceImportToolsTutorial, 2012. … Cited by 1 Related articles All 4 versions
Combining Evidences from Mel Cepstral, Cochlear Filter Cepstral and Instantaneous Frequency Features for Detection of Natural vs. Spoofed Speech TB Patel, HA Patil – Sixteenth Annual Conference of the …, 2015 – spoofingchallenge.org
Situated Dialogue for Speaking Robots EA Ribeiro – 2012 – fenix.tecnico.ulisboa.pt … Dissertation submitted to obtain the Master Degree in Information Systems and Computer Engineering … Related articles All 2 versions
Improving Domiciliary Robotic Services by Integrating the ASTRO Robot in an AmI Infrastructure F Cavallo, M Aquilano, M Bonaccorsi… – Gearing Up and …, 2014 – Springer … 1572–1577 (1998). [19] Simon Listens official web page, http://simon-listens.org. [20] Modular Architecture for Research on speech sYnthesis (MARY) official web page, http://mary.dfki.de. [21] D-Bus web page, http://www.freedesktop.org/wiki/Software/dbus. [22] … Cited by 3 Related articles All 4 versions
Embodiment, emotion, and chess: A system description C Becker-Asano, N Riesterer, J Hué, B Nebel – cs.kent.ac.uk … International Journal of Human-Computer Studies, 69(7–8):483–495, 2011. [19] M. Schröder. OpenMARY sources. https://github.com/marytts/marytts, April 2013. [20] P. Sweetser and P. Wyeth. GameFlow: a model for evaluating player enjoyment in games. … Related articles
The influence of a robot’s voice on proxemics in human-robot interaction R Hoegen – hmi.ewi.utwente.nl … Interactive Communication, RO-MAN (2008), 707–712. DOI: http://dx.doi.org/10.1109/ROMAN.2008.4600750 [12] Magabot: Computer on Wheels: http://magabot.cc [13] The MARY Text-to-Speech System – DFKI: http://mary.dfki.de/ Related articles All 2 versions
Spoken language processing in a conversational system for child-robot interaction I Kruijff-Korbayová, H Cuayáhuitl, B Kiefer, M Schröder… – WOCCI, 2012 – macs.hw.ac.uk … Cited by 15 Related articles All 6 versions
The attentive robot companion: learning spatial information from observation and verbal interaction L Ziegler – 2015 – pub.uni-bielefeld.de
An Italian event-based ASR-TTS system for the Nao robot P Cosi, G Paci, G Sommavilla, F Tesser… – Proceedings of the …, 2012 – researchgate.net … Atti del VIII Convegno dell’Associazione Italiana Scienze della Voce (Proceedings of the 8th Conference of the Italian Speech Sciences Association). … Cited by 1
Towards the generation of dialogue acts in socio-affective ECAs: a corpus-based prosodic analysis R Bawden, C Clavel, F Landragin – Language Resources and Evaluation, 2015 – Springer
Speech Synthesis Z Shao – 2014 – wiki.csem.flinders.edu.au … Synthesizing speech using the AusTalk corpus; submitted to the School of Computer Science, Engineering, and Mathematics in the Faculty of Science and Engineering, Flinders University, 2014. … Related articles
Children’s Turn-Taking Behavior Adaptation in Multi-Session Interactions with a Humanoid Robot I Kruijff-Korbayová, I Baroni, M Nalin… – Special Issue of …, 2013 – deib.polimi.it … International Journal of Humanoid Robotics, World Scientific Publishing. … Cited by 3 Related articles All 3 versions
Carrot and stick 2.0: The benefits of natural and motivational prosody in computer-assisted learning S Wolff, A Brechmann – Computers in Human Behavior, 2015 – Elsevier For acquiring new skills or knowledge, contemporary learners frequently rely on the help of educational technologies supplementing human teachers as a learning … Cited by 2 Related articles All 2 versions
Speech-based Recommender Systems P Grasch – grasch.net … Master’s Thesis, Graz University of Technology, Institute for Software Technology. Supervisor: Univ.-Prof. Dipl.-Ing. Dr.techn. Alexander Felfernig. Graz, April 2015. …
Conveying emotion in robotic speech: Lessons learned J Crumpton, C Bethel – … , 2014 RO-MAN: The 23rd IEEE …, 2014 – ieeexplore.ieee.org … [Online]. Available: http://mary.dfki.de/documentation/maryxml [25] F. Burkhardt and W. F. Sendlmeier, “Verification of acoustical correlates of emotional speech using formant-synthesis,” in Proceedings: ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion. … Cited by 2 Related articles All 3 versions
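The MaryXML documentation cited in this entry defines the markup that the prosody-manipulation studies in this list work with. A hedged fragment follows: the `maryxml` root, version, and namespace match the published MaryXML schema, but the specific rate and pitch values are illustrative only and are not taken from any of the cited papers.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<maryxml version="0.5" xmlns="http://mary.dfki.de/2002/MaryXML"
         xml:lang="en-US">
  <p>
    <!-- illustrative prosody settings; the cited studies tune such
         attributes to convey emotion in synthesized robot speech -->
    <prosody rate="-10%" pitch="+8%">
      I am happy to see you.
    </prosody>
  </p>
</maryxml>
```

A document like this would be submitted to a MaryTTS server with `INPUT_TYPE=RAWMARYXML` instead of plain text, letting the caller override the prosody the system would otherwise predict.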
Evaluation of expressive speech synthesis with voice conversion and copy resynthesis techniques O Türk, M Schröder – Audio, Speech, and Language Processing …, 2010 – ieeexplore.ieee.org … IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 5, p. 965, July 2010. … Cited by 19 Related articles All 7 versions
Validation of vocal prosody modifications to communicate emotion in robot speech J Crumpton, CL Bethel – Collaboration Technologies and …, 2015 – ieeexplore.ieee.org
Automatic Fingersign to Speech Translator P Campr, E Dikici, M Hruz, A Kindiroglu… – … Summer Workshop on …, 2010 – dare.uva.nl … eNTERFACE’10, July 12–August 6, Amsterdam, The Netherlands. Abstract: The aim of this project is to help the communication of two people, one hearing impaired and one visually impaired, by … Cited by 4 Related articles All 3 versions
Immersive education: virtual reality in clinical audiology: a pilot study of the effectiveness of a new patient simulator program on audiology students’ performance on case history tasks SC Howland – 2012 – ir.canterbury.ac.nz … Cited by 1 Related articles
Towards a Persuasive Dialog System Supporting Personal Health Management V Götzmann – 2015 – isl.anthropomatik.kit.edu … Bachelor Thesis at the Institute for Anthropomatics and Robotics, Interactive Systems Lab, KIT. Advisor: M.A. Maria Schmidt; Reviewer: Prof. Alexander Waibel. …
Exploring Social Feedback in Human-Robot Interaction During Cognitive Stress DII Berger – 2011 – aiweb.techfak.uni-bielefeld.de … Master’s thesis in Intelligent Systems at the Faculty of Technology, Bielefeld University, by Sebastian Schneider (sebschne@techfak …). … Related articles All 3 versions
Bridging the Gap between Social Animal and Unsocial Machine: A Survey of Social Signal Processing A Vinciarelli, M Pantic, D Heylen, C Pelachaud, I Poggi, F D’Errico, M Schröder – dcs.gla.ac.uk … Related articles All 2 versions