Data Analytics & Natural Language Generation


Data analysis, or data analytics, is the process of examining, cleansing, transforming, and modeling data with the goal of discovering useful information. It is a core part of data science and is used across many fields and industries to extract insights from data.

Many techniques are available, and the right choice depends on the nature of the data and the goals of the analysis. Common techniques include:

  1. Data cleaning: This involves identifying and correcting errors and inconsistencies in the data, such as missing values or duplicates.
  2. Data transformation: This involves manipulating the data in order to make it more suitable for analysis, such as aggregating data or converting data types.
  3. Data visualization: This involves creating visualizations, such as charts and graphs, to help understand and communicate the data.
  4. Statistical analysis: This involves applying statistical techniques, such as regression analysis and hypothesis testing, to understand patterns and trends in the data.
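
The four techniques above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a recommended toolchain: the records, the function names (clean, monthly_totals, slope, text_chart) and the bar-chart scale are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw records: (month, region, sales). None marks a missing value.
raw = [
    ("2024-01", "north", 120.0),
    ("2024-01", "north", 120.0),   # exact duplicate
    ("2024-01", "south", None),    # missing value
    ("2024-02", "north", 150.0),
    ("2024-02", "south", 90.0),
    ("2024-03", "north", 180.0),
    ("2024-03", "south", 110.0),
]

def clean(records):
    """Data cleaning: drop exact duplicates and rows with missing values."""
    seen, cleaned = set(), []
    for row in records:
        if row in seen or any(v is None for v in row):
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned

def monthly_totals(records):
    """Data transformation: aggregate sales per month."""
    totals = {}
    for month, _region, sales in records:
        totals[month] = totals.get(month, 0.0) + sales
    return dict(sorted(totals.items()))

def slope(ys):
    """Statistical analysis: least-squares slope of ys against 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

def text_chart(totals, unit=20.0):
    """Data visualization: a crude text bar chart, one '#' per `unit` of sales."""
    return "\n".join(f"{m} {'#' * round(v / unit)}" for m, v in totals.items())

totals = monthly_totals(clean(raw))
trend = slope(list(totals.values()))
print(text_chart(totals))
print(f"trend: {trend:+.1f} per month")
```

Note that dropping the row with the missing value lowers the January total, which in turn steepens the estimated trend. Whether to drop or impute missing values is itself an analysis decision that depends on the data.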

Natural language generation (NLG) is a field of artificial intelligence that focuses on the automated production of natural language text, and it is used in applications such as chatbots, virtual assistants, and content generation. NLG systems that work from data often rely on data analysis to extract the insights that the generated text then communicates.

How data analysis is incorporated into an NLG system likewise depends on the nature of the data and the goals of the system. Common uses include:

  1. Extracting key insights and trends: The analysis identifies the most important insights and trends in the data, and the system generates text that summarizes them for the user.
  2. Identifying patterns and relationships: The analysis identifies patterns and relationships in the data, such as correlations between variables, and the system generates text that describes and explains them.
  3. Generating reports and summaries: The analysis results are assembled into reports that present the key findings from the data in readable text.
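
As a rough illustration of the uses listed above, the sketch below derives a few facts from a small series and turns them into a sentence with fixed templates. Template filling is only one simple way to build a data-to-text step, and everything here is an assumption made for the example: the function names (analyse, describe_trend, generate_summary), the 2% "stable" threshold, and the sample figures.

```python
from statistics import mean

def analyse(series):
    """Data analysis step: derive a few facts from a {label: value} series.

    Assumes the series is ordered in time and its first value is non-zero.
    """
    labels, values = list(series), list(series.values())
    change = (values[-1] - values[0]) / values[0] * 100
    peak = max(series, key=series.get)
    return {
        "first": labels[0], "last": labels[-1],
        "change_pct": change, "peak": peak, "peak_value": series[peak],
        "average": mean(values),
    }

def describe_trend(change_pct, flat_band=2.0):
    """Map a numeric change onto words; the threshold is an arbitrary assumption."""
    if abs(change_pct) <= flat_band:
        return "remained roughly stable"
    verb = "rose" if change_pct > 0 else "fell"
    return f"{verb} by {abs(change_pct):.0f}%"

def generate_summary(metric, series):
    """Generation step: fill fixed sentence templates with the analysed facts."""
    facts = analyse(series)
    return (
        f"{metric} {describe_trend(facts['change_pct'])} between "
        f"{facts['first']} and {facts['last']}. "
        f"The highest value, {facts['peak_value']:.0f}, was recorded in {facts['peak']}; "
        f"the average was {facts['average']:.0f}."
    )

sales = {"January": 120.0, "February": 240.0, "March": 290.0}
print(generate_summary("Sales", sales))
# prints: Sales rose by 142% between January and March. The highest value, 290,
#         was recorded in March; the average was 217.
```

Keeping analyse separate from the wording functions means the same facts can feed different templates, for example a one-line alert and a longer report.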



See also:

Natural Language Generation Pipeline

Computing with words is an implementable paradigm: fuzzy queries, linguistic data summaries, and natural-language generation
J Kacprzyk, S Zadrozny – IEEE Transactions on Fuzzy Systems, 2010 –
We point out some relevant issues that are related to the computing-with-words (CWW) paradigm and argue for an urgent need for a new, nontraditional look at the area, since the traditional approach has resulted in very valuable theoretical research results. However,

From data to text in the neonatal intensive care unit: Using NLG technology for decision support and information management
A Gatt, F Portet, E Reiter, J Hunter… – Ai …, 2009 –
Abstract Contemporary Neonatal Intensive Care Units collect vast amounts of patient data in various formats, making efficient processing of information by medical professionals difficult. Moreover, different stakeholders in the neonatal scenario, which include parents as well as

Reinforcement learning for adaptive dialogue systems: a data-driven methodology for dialogue management and natural language generation
V Rieser, O Lemon – 2011 –
The past decade has seen a revolution in the field of spoken dialogue systems. As in other areas of Computer Science and Artificial Intelligence, data-driven methods are now being used to drive new methodologies for system development and evaluation. This book is a

The importance of narrative and other lessons from an evaluation of an NLG system that summarises clinical data
E Reiter, A Gatt, F Portet… – … Language Generation …, 2008 –
Abstract The BABYTALK BT-45 system generates textual summaries of clinical data about babies in a neonatal intensive care unit. A recent task-based evaluation of the system suggested that these summaries are useful, but not as effective as they could be. In this

On the role of linguistic descriptions of data in the building of natural language generation systems
A Ramos-Soto, A Bugarín, S Barro – Fuzzy Sets and Systems, 2016 – Elsevier
Abstract This paper explores the current state of the task of generating easily understandable information from data for people using natural language, which is currently addressed by two independent research fields: the natural language generation field

Crowd-sourcing nlg data: Pictures elicit better data
J Novikova, O Lemon, V Rieser – arXiv preprint arXiv:1608.00339, 2016 –
Abstract: Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances. In this

What is in a text and what does it do: qualitative evaluations of an NLG system–the BT-Nurse–using content analysis and discourse analysis
R Sambaraju, E Reiter, R Logie, A McKinlay… – … Language Generation, 2011 –
Abstract Evaluations of NLG systems generally are quantitative, that is, based on corpus comparison statistics and/or results of experiments with people. Outcomes of such evaluations are important in demonstrating whether or not an NLG system is successful, but

Data mining via protoform based linguistic summaries: Some possible relations to natural language generation
J Kacprzyk, S Zadrozny – Computational Intelligence and Data …, 2009 –
Linguistic database summaries in the sense of Yager (1982), further extended to an implementable form by Kacprzyk & Yager (2001) and Kacprzyk, Yager & Zadrozny (2000), are extremely simple natural language like statements exemplified by, for a personnel

Computing with words and systemic functional linguistics: linguistic data summaries and natural language generation
J Kacprzyk, S Zadrożny – Integrated Uncertainty Management and …, 2010 – Springer
Abstract We briefly consider systemic functional linguistics, notably in its natural language generation perspective. We analyze our recent works (notably Kacprzyk and Zadrożny [18]) in which a new relation between our recent works on linguistic data summaries, based on

Empirical methods in natural language generation: data-oriented methods and empirical evaluation
E Krahmer, M Theune – 2010 –
Natural language generation (NLG) is a subfield of natural language processing (NLP) that is often characterized as the study of automatically converting non-linguistic representations (eg, from databases or other knowledge sources) into coherent natural language text. In

Data-driven Natural Language Generation: Making Machines Talk Like Humans Using Natural Corpora
B Langner – 2010 –
Abstract With the significant improvements that have been seen in speech applications, the long-held goal of building machines that can have humanlike conversations has begun to seem more reachable; there exist spoken dialog systems which can now be used effectively

Finding middle ground? Multi-objective Natural Language Generation from time-series data
D Gkatzia, H Hastie, O Lemon – Proceedings of the 14th Conference of …, 2014 –
Abstract A Natural Language Generation (NLG) system is able to generate text from nonlinguistic data, ideally personalising the content to a user’s specific needs. In some cases, however, there are multiple stakeholders with their own individual goals, needs and

Automatic Corpus Extension for Data-driven Natural Language Generation.
E Manishina, B Jabaian, S Huet, F Lefèvre – LREC, 2016 –
Abstract As data-driven approaches started to make their way into the Natural Language Generation (NLG) domain, the need for automation of corpus building and extension became apparent. Corpus creation and extension in data-driven NLG domain traditionally

Context-Sensitive Natural Language Generation: From Knowledge-Driven to Data-Driven Techniques
N Dethlefs – Language and Linguistics Compass, 2014 – Wiley Online Library
Abstract Context-sensitive Natural Language Generation is concerned with the automatic generation of system output that is in several ways adaptive to its target audience or the situational circumstances of its production. In this article, I will provide an overview of the

Towards NLG for Physiological Data Monitoring with Body Area Networks
H Banaee, MU Ahmed, A Loutfi – … on Natural Language Generation …, 2013 –
Abstract This position paper presents an on-going work on a natural language generation framework that is particularly tailored for summary text generation from body area networks. We present an overview of the main challenges when considering this type of sensor

Natural language generation in dialogue using lexicalized and delexicalized data
S Sharma, J He, K Suleman, H Schulz… – arXiv preprint arXiv …, 2016 –
Abstract: Natural language generation plays a critical role in any spoken dialogue system. We present a new approach to natural language generation using recurrent neural networks in an encoder-decoder framework. In contrast with previous work, our model uses both

Exploring Flexibility in Natural Language Generation Through Discursive Analysis of New Textual Genres
M Vicente, E Lloret – International Workshop on Future and Emerging …, 2016 – Springer
Abstract Since automatic language generation is a task able to enrich applications rooted in most of the language-related areas, from machine translation to interactive dialogue, it seems worthwhile to undertake a strategy focused on enhancing generation system’s

An ontology-based approach to natural language generation from coded data in electronic health records
M Arguello, J Des, MJ Fernandez-Prieto… – … and Simulation (EMS …, 2011 –
The worldwide adoption of the HL7 Clinical Document Architecture (CDA) is promoting the availability of coded data (CDA entries) within sections of clinical documents. At the moment, an increasing number of studies are investigating ways to transform the narratives of CDA

BT-Nurse: Computer generation of natural language shift summaries from complex heterogeneous medical data
J Hunter, Y Freer, A Gatt, E Reiter… – Journal of the …, 2011 –
… However, as part of a larger project, BabyTalk,3,4 we have developed a Natural Language Generation system, BT-Nurse, which automatically generates English summaries of the … Signal Analysis detects and removes artifacts from the physiological data and extracts a …

Statistical natural language generation from tabular non-textual data
J Mahapatra, SK Naskar… – … Language Generation …, 2016 –
Abstract Most of the existing natural language generation (NLG) techniques employing statistical methods are typically resource and time intensive. On the other hand, handcrafted rule-based and template-based NLG systems typically require significant human/designer

Analysis of communication of uncertainty in genetic counseling patient letters for design of a natural language generation system
NL Green – Social Semiotics, 2010 – Taylor & Francis
The GenIE (Genetics Information Expression) Assistant is a proof-of-concept computer system designed to assist healthcare providers by creating editable first drafts of genetic counseling patient letters. Due to the wide range of genetic conditions to be covered and the

A natural language generation approach to support understanding and traceability of multi-dimensional preferential sensitivity analysis in multi-criteria decision …
D Wulf, V Bertsch – Expert Systems with Applications, 2017 – Elsevier
Abstract Multi-Criteria Decision Analysis (MCDA) enables decision makers (DM) and decision analysts (DA) to analyse and understand decision situations in a structured and formalised way. With the increasing complexity of decision support systems (DSSs), it

Natural language news generation from big data
B Haarmann, L Sikorski – International Journal of Computer, Electrical …, 2015 –
Abstract: In this paper, we introduce an NLG application for the automatic creation of ready-to-publish texts from big data … Next, we will shortly describe the analysis of the facts given in the input data as well as how they are selected and stored …

From Web to Web: A General Approach for Data-to-text Natural Language Generation and One Example
X Han, S Sripada – 1st W. Data-to-text Generation, 2015 –
We proposed a general approach of acquiring NLG knowledge from web, building a data-to-text NLG system accordingly, and evaluating the performance interactively on web. One example about river information communication was given to explain the

Communication Mediated through Natural Language Generation in Big Data Environments: The Case of Nomao
JS Vayre, E Delpech, A Dufresne… – Journal of Computer and …, 2017 –
Abstract Along with the development of big data, various Natural Language Generation systems (NLGs) have recently been developed by different companies. The aim of this paper is to propose a better understanding of how these systems are designed and used. We

Making Structured Data Searchable via Natural Language Generation
JL Leidner, D Kamkova – International Conference on Flexible Query …, 2013 – Springer
Abstract Relational Databases are used to store structured data, which is typically accessed using report builders based on SQL queries. To search, forms need to be understood and filled out, which demands a high cognitive load. Due to the success of Web search engines,

A repository of data and evaluation resources for natural language generation
A Belz, A Gatt – TRIAL, 2012 –
Abstract Starting in 2007, the field of natural language generation (NLG) has organised shared-task evaluation events every year, under the Generation Challenges umbrella. In the course of these shared tasks, a wealth of data has been created, along with associated task

A multilingual multi-domain data-to-text natural language generation approach
C Barros, E Lloret – Procesamiento del Lenguaje Natural, 2017 –
Abstract Research into innovative and flexible multi-domain approaches can be a significant step forward in the area of Natural Language Generation. In this regard, the aim of this article is to present a statistical approach focused on the phase of

Data-driven broad-coverage grammars for opinionated natural language generation (ONLG)
T Cagan, SL Frank, R Tsarfaty – Proceedings of the 55th Annual Meeting …, 2017 –
Abstract Opinionated natural language generation (ONLG) is a new, challenging, NLG task in which we aim to automatically generate human-like, subjective, responses to opinionated articles online. We present a data-driven architecture for ONLG that generates subjective

Data-driven natural language generation using statistical machine translation and discriminative learning
E Manishina – 2016 –
The humanity has long been passionate about creating intellectual machines that can freely communicate with us in our language. Most modern systems communicating directly with the user share one common feature: they have a dialog system (DS) at their base. As of today

D6.1.1: Domain-limited TTS corpus for expressive speech synthesis and Wizard-of-Oz Data for NLG Strategies
C Boidin, V Rieser, S Janarthanam, O Lemon – 2009 –
Executive summary This document is the deliverable 6.1.1, due at month 18 of the CLASSIC project. It describes two data sets collected during the first half of the project: a domain-limited TTS corpus for expressive speech synthesis and two Wizard-of-Oz corpora (one in

Trainable NLG for Data to Portuguese-With application to a Medication Assistant
JC Pereira, A Teixeira – Linguamática, 2015 –
New equipments, such as smartphones and tablets, are changing human computer interaction. These devices present several challenges, especially due to their small screen and keyboard. In order to use text and voice in multimodal interaction, it is essential to

Sentiment analysis using sentence minimization with natural language generation (NLG)
M Likhar, SL Kasar – Intelligent Systems and Information …, 2017 –
The analysis of feeling is used to define the attitude of a writer in relation to a subject or the appropriate global polarity of a document. The proposed work is to provide a platform in order to visualize the relative analysis of feedback for some particular product. In doing so,

Research data supporting “Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems”
TH Wen, M Gasic, N Mrksic, PH Su, D Vandyke… – 2015 –
This dataset is in JSON format and contains log files of interactions between a turn-taking spoken dialogue system and Amazon Mechanical turkers, collected from our previous live trials. It includes two application domains: San Francisco restaurants and hotels, each of

Failure analysis of cracked NLG brackets
CR Kannan, M Madan, M Sujata, K Raghavendra… – 2011 –
Two cracked NLG brackets; one in finished condition and the other in semi-finished condition, were sent to the laboratory for analysis. Cracks were noticed in the components during dye penetrant inspection. It was established that the cracks were resulted during heat

User Interfaces to the Web of Data based on Natural Language Generation
B Ell – 2017 –
Abstract The core idea of the Semantic Web vision is the evolution from a Web of hyperlinked human-readable web pages, the Web of Documents, to a machine-interpretable Web of Data. Since natural language text is a suitable knowledge representation for humans

Data-driven Natural Language Generation: Paving the Road to Success
J Novikova, O Dušek, V Rieser – arXiv preprint arXiv:1706.09433, 2017 –
Abstract: We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b) The scarcity of high quality in-

Natural language generation in the context of multimodal interaction in Portuguese: Data-to-text based in automatic translation
JC Pereira – 2017 –
To enable the interaction by text and/or speech it is essential that we devise systems capable of translating internal data into sentences or texts that can be shown on screen or heard by users. In this context, it is essential that these natural language generation (NLG)

Data-Driven Solutions to Bottlenecks in Natural Language Generation
O Biran – 2016 –
Abstract: Concept-to-text generation suffers from what can be called generation bottlenecks-aspects of the generated text which should change for different subject domains, and which are usually hard to obtain or require manual work. Some examples are domain-specific

Aircraft NLG Shimmy: Methods, Tools and Analysis
GW Davidson – 2012 –
ELGEAR was a UK government/industry partnership created in 2006 to research electrical technology in aircraft landing gear systems. The project designed, built and ground tested electrical technology for a Landing Gear Extension/Retraction system (LGERS) and Nose

Image Generation and Analysis using Natural Language Processing: A Review
BP Nandi –
… The co-occurrence model (Mori et al., 1999), standard latent semantic analysis (LSA) and its probabilistic variant (PLSA) (Hofmann, 1998) used manual … Negative effect of noisy data will also be smoothened … NLG module for this system uses Text Planner and Text Realizer …

Where am I coming from: The reversibility of analysis and generation in natural language processing
Y Wilks – Machine Translation, 2009 – Springer
… First of all, any demonstration of this symmetry helped the case of those who were actually interested in natural language generation. In other words, if the processes of analysis and generation were symmetrical or even identical, then generation would be as “interesting” as …

Natural language descriptions of visual scenes: corpus generation and analysis
MUG Khan, RMA Nawab, Y Gotoh – … of the Joint Workshop on Exploiting …, 2012 –
… for video segments crafted from TREC video data. Analysis of the descriptions created by 13 annotators presents insights into humans’ interests and thoughts on videos. Such resource can also be used to evaluate automatic natural language generation systems for video …
