DKPro (Darmstadt Knowledge Processing)


The Darmstadt Knowledge Processing repository (DKPro) is an open-source collection of software components for natural language processing (NLP) built on the Unstructured Information Management Architecture (UIMA). It provides components for text annotation and analysis, covering tasks such as part-of-speech tagging, information extraction, and text classification, together with tools for building and evaluating NLP systems. The components are designed to be extensible and customizable, and the repository is intended as a shared resource for researchers and developers who build systems for processing and understanding human language.
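UIMA-based components such as DKPro's typically cooperate by reading and writing stand-off annotations on a shared document object (in UIMA, the CAS), so a later component can build on the output of an earlier one. The following is a minimal sketch of that idea in plain Python and assumes nothing about the real API: UIMA and DKPro Core are Java frameworks, and the classes `Document`, `Annotation`, `WhitespaceTokenizer`, and `CapitalizedWordTagger` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Annotation:
    """Stand-off annotation: a typed span [begin, end) over the document text."""
    type_name: str
    begin: int
    end: int
    value: str = ""


@dataclass
class Document:
    """Shared container handed from component to component (loosely like a UIMA CAS)."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def select(self, type_name: str) -> List[Annotation]:
        """All annotations of one type, ordered by start offset."""
        return sorted((a for a in self.annotations if a.type_name == type_name),
                      key=lambda a: a.begin)


class Component(Protocol):
    """Anything with a process(doc) method can be a pipeline step."""
    def process(self, doc: Document) -> None: ...


class WhitespaceTokenizer:
    """Hypothetical tokenizer: one Token annotation per whitespace-separated word."""
    def process(self, doc: Document) -> None:
        pos = 0
        for word in doc.text.split():
            begin = doc.text.index(word, pos)
            end = begin + len(word)
            doc.annotations.append(Annotation("Token", begin, end))
            pos = end


class CapitalizedWordTagger:
    """Hypothetical downstream step: reads the Token annotations added earlier
    and marks capitalized tokens (minus trailing punctuation) as name candidates."""
    def process(self, doc: Document) -> None:
        for tok in doc.select("Token"):
            word = doc.text[tok.begin:tok.end].rstrip(".,;:!?")
            if word[:1].isupper():
                doc.annotations.append(
                    Annotation("NameCandidate", tok.begin, tok.begin + len(word), word))


def run_pipeline(text: str, components: List[Component]) -> Document:
    """Apply each component in order to a single shared document."""
    doc = Document(text)
    for component in components:
        component.process(doc)
    return doc


doc = run_pipeline("Alice met Bob in Paris.", [WhitespaceTokenizer(), CapitalizedWordTagger()])
print([a.value for a in doc.select("NameCandidate")])
```

Because the tagger only consumes Token annotations, the tokenizer could be replaced by a different one without touching the tagger; that interchangeability is what a shared annotation model is meant to provide.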

For story understanding, DKPro offers building blocks rather than a finished system: its annotation and analysis components (for example, for part-of-speech tagging, information extraction, and text classification) can be assembled into pipelines that process narrative text, and its evaluation tools can be used to assess the resulting story understanding systems.

One benefit for story understanding research is standardization. Because DKPro supplies ready-made components and evaluation tools, it reduces the time and effort needed to develop and test a new system, and it gives different approaches a common basis for comparing their performance.
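Comparing approaches on a common basis usually means scoring every system's output against the same gold-standard annotations with the same metric. A minimal sketch, assuming exact-match span evaluation with precision, recall, and F1; the gold spans and the two system outputs are made-up toy data, and the function `prf` is hypothetical, not part of DKPro.

```python
from typing import Set, Tuple

# A labelled span: (begin offset, end offset, label).
Span = Tuple[int, int, str]


def prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1: a span counts only if offsets and label all match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up gold annotations and two made-up system outputs over the same text.
gold = {(0, 5, "PER"), (10, 13, "PER"), (17, 22, "LOC")}
system_a = {(0, 5, "PER"), (10, 13, "PER"), (17, 22, "PER")}  # one wrong label
system_b = {(0, 5, "PER"), (17, 22, "LOC")}                  # one missed span

for name, output in [("A", system_a), ("B", system_b)]:
    p, r, f = prf(gold, output)
    print(f"system {name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Both toy systems get two spans right, so their recall is identical (2/3); only precision separates them (2/3 for A, 1.0 for B), which is why reporting all three numbers under one shared definition matters.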

DKPro also gives access to a wide range of resources for analyzing stories, including annotated text corpora, lexical resources, and components that can support tasks such as entity recognition, event extraction, and sentiment analysis. By making these resources readily available, DKPro can support the development of more sophisticated and accurate story understanding systems.

  • Story understanding refers to the ability to comprehend and make sense of a narrative or story, whether it is told in written, spoken, or visual form. It involves the interpretation of the plot, characters, setting, and themes of a story, as well as the relationships between these elements. Story understanding is a complex process that requires the integration of multiple sources of information and the ability to draw inferences and make connections between different parts of the story.
  • A story understanding system is a machine or computer program designed to analyze and understand stories. This can involve automatically extracting information from stories, such as character names, plot summaries, or themes, and identifying relationships between different elements of the story. Story understanding systems may be used to analyze large volumes of text or video data and can support tasks such as information retrieval, text classification, or language translation.


  • clarino .. language analysis portal
  • .. projects focusing on re-usable natural language processing software
  • rocstories .. story cloze test and rocstories corpora



See also:

Story Understanding Systems

The Russian Language Pipeline In The Lima Multilingual Analyzer
B VV, G de Chalendar –
… DKPro (The Darmstadt Knowledge Processing Software Repository) is a collection of software components for natural language processing based on the Apache UIMA framework. Both GATE and UIMA provide pipeline-based frameworks and analysis modules …

A Data-Centric Framework for Composable NLP Workflows
Z Liu, G Ding, A Bukkittu, M Gupta, P Gao… – arXiv preprint arXiv …, 2021 –
… A wealth of NLP toolkits exist (§4), such as spaCy (Honnibal and Montani, 2017), DKPro (Eckart de Castilho and Gurevych, 2014), CoreNLP (?), for pipelining multiple NLP functions; BRAT (Stenetorp et al., 2012) and YEDDA (Yang et al., 2018) for annotating certain types of …

Building web corpora for minority languages
H Jauhiainen, T Jauhiainen, K Lindén – Proceedings of the 12th Web as …, 2020 –
Proceedings of the 12th Web as Corpus Workshop, pages 23–32, Language Resources and Evaluation Conference (LREC 2020), Marseille, 11–16 May 2020. © European Language Resources Association (ELRA), licensed under CC-BY-NC …

Building the Emirati Arabic FrameNet
A Gargett, T Leung – … of the International FrameNet Workshop 2020 …, 2020 –
… There have been a variety of attempts to … 1) Word form 2) Part-of-speech 3) All possible definitions for this lexeme …

Comparing NLP Systems to Extract Entities of Eligibility Criteria in Dietary Supplements Clinical Trials Using NLP-ADAPT
A Bompelli, G Silverman, R Finzel, J Vasilakes… – … Conference on Artificial …, 2020 – Springer
… Annotations produced by NLP-ADAPT-kube were extracted using dkpro-cassis, a software library developed by the Technische Universität Darmstadt [23] … Technische Universität Darmstadt, ubiquitous knowledge processing lab, dkpro-cassis (2019) …

From zero to hero: Human-in-the-loop entity linking in low resource domains
JC Klie, RE de Castilho, I Gurevych – … of the 58th Annual Meeting of the …, 2020 –
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6982–6993, July 5–10, 2020. © 2020 Association for Computational Linguistics …

TEXTANNOTATOR: A UIMA based tool for the simultaneous and collaborative annotation of texts
G Abrami, M Stoeckel, A Mehler – … of The 12th Language Resources and …, 2020 –
… However, MAE2 supports the computation of inter-annotator agreement (IAA) by means of DKPro (Meyer et al., 2014) … These user views are the base for our IAA implementation. We use the DKPro Agreement module (Meyer et al., 2014) to compute IAA scores …

Online similarity learning with feedback for invoice line item matching
CK Maurya, N Gantayat, S Dechu, T Horvath – arXiv preprint arXiv …, 2020 –
… DKPro [1]: Similarity comprises a wide variety of measures ranging from ones based on simple n-grams and common subsequences to high-dimensional vector comparisons and structural, stylistic, and phonetic measures …

Exploring the Impact of Handwriting Recognition on the Automated Scoring of Handwritten Student Answers
C Gold, T Zesch – … 17th International Conference on Frontiers in …, 2020 –
… For this purpose, we used the DKPro [19] wrapper for the open-source Jazzy spellchecker. An important parameter of the spellchecker is the dictionary used to find correction candidates. We experiment with three different dictionaries …

Using Natural Language Preprocessing Architecture (NLPA) for Big Data Text Sources
M Novo-Loures, R Pavon, R Laza… – Scientific Programming, 2020 –

Prevent Low-Quality Analytics by Automatic Selection of the Best-Fitting Training Data
C Kiefer, P Reimann, B Mitschang – 2020 –
… We chose the DKPro Similarity library [35], since it is open-source, actively developed and easy to use. It is part of DKPro, a collection of … DKPro Similarity is based on the Apache UIMA framework [36] which ensures easy extensibility and scalability …

PISA reading: Mode effects unveiled in short text responses
FZ DIPF, U Kroehne, C Hahnel… – Psychological Test …, 2020 –
… Software ReCo (Zehner et al., 2016) extracted the response features with its many software components: DKPro Core (Gurevych et al., 2007), DKPro Similarity (Bär, Zesch, & Gurevych, 2013), JWPL (Zesch, Müller, & Gurevych, 2008), SSpace (Jurgens & Stevens, 2010 …

Cartographic Intertextuality: Reading The Narrative of Captivity and Restoration of Mrs. Mary Rowlandson with Geographic Information Systems
D Mischke – Polish Journal for American Studies, 2020 –
… A lemmatization entails the transformation of all inflected words of a text into its basic form (its lemma). This step of text preprocessing and tagging was accomplished with the DARIAH-EU DKPro Wrapper, as well as the Stanford Named Entity Recognizer …

text2City: Räumliche Visualisierung textueller Strukturen [Spatial Visualization of Textual Structures]
A Kett – 2020 –
… VR Virtual Reality POS Part-of-speech DDC Dewey decimal classification [29] DKPro Darmstadt Knowledge Processing Repository [6] [25] HUD Head-up-Display UIMA Unstructured Information Management Architecture [5] [10] XMI XML Metadata Interchange …

A survey of semantic relatedness evaluation datasets and procedures
MAH Taieb, T Zesch, MB Aouicha – Artificial Intelligence Review, 2020 – Springer
Semantic relatedness between words is a core concept in natural language processing. While countless approaches have been proposed, measuring which one wor.

The stem-ecr dataset: Grounding scientific entity references in stem scholarly content to authoritative encyclopedic and lexicographic sources
J D’Souza, A Hoppe, A Brack, MY Jaradeh… – arXiv preprint arXiv …, 2020 –
… They queried the local dumps using the DKPro JWPL tool (Zesch et al., 2008) for Wikipedia and the DKPro JWKTL tool (Meyer and Gurevych, 2012) for Wiktionary, where both tools enable optimized search through the large Wiki data volume …

GerDraCor-Coref: A Coreference Corpus for Dramatic Texts in German
J Pagel, N Reiter – Proceedings of The 12th Language Resources and …, 2020 –
… Stanford POS tagger (Toutanova et al., 2003) by the DKPro NLP project. In GerDraCor-Coref, pronouns are by far the most common way of referring to entities (40%), followed by noun and name mentions, which are on par (17% and 16%) …

Early Requirements Traceability with Domain-Specific Taxonomies-A Pilot Experiment
M Unterkalmsteiner – 2020 IEEE 28th International …, 2020 –
… those traces on nouns. We use a basic natural language processing pipeline that consists of a segmenter, tokenizer, stemmer and part-of-speech tagger to identify nouns, using the DKPro framework [10]. The domain-specific …

CEREC: Causality Extraction from Requirements Artifacts
J Frattini – 2020 IEEE Seventh International Workshop on …, 2020 –
… library provides a syntactic approach to causality extraction, which is applicable to requirements engineering and can be integrated into a model-based testing pipeline visualized in Figure 1. The machine learning core of the library is realized with the DKPro framework using …

Multi-objective code reviewer recommendations: balancing expertise, availability and collaborations
S Rebai, A Amich, S Molaei, M Kessentini… – Automated Software …, 2020 – Springer
Modern Code review is one of the most critical tasks in software maintenance and evolution. A rigorous code review leads to fewer bugs and reduced overall.

Arguments as Social Good: Good Arguments in Times of Crisis
J Daxenberger, I Gurevych – 2020 –
… Boilerplate removal to clean unwanted text elements is carried out using the Apache Tika toolkit. The processing backbone of this pipeline uses DKPro Core (Eckart de Castilho and Gurevych 2014) for metadata conversion and sentence segmentation …

Extraction Dependency Based on Evolutionary Requirement Using Natural Language Processing
R Asyrofi, DO Siahaan, Y Priyadi – 2020 3rd International …, 2020 –
… The explanation of how to use libraries and tools is used to detect interdependency requirements [9]. This study explains how to identify dependency on requirement based on 1) Cortical, to measure similarity based on cosine metrics between texts, 2) DKpro, this library uses …

Interchange Formats for Visualization: LIF and MMIF
K Rim, K Lynch, M Verhagen, N Ide… – Proceedings of The 12th …, 2020 –
… Beyond in-platform interoperability, the LAPPS Grid has established multi-platform interoperability between LAPPS Grid and two CLARIN platforms (Hinrichs et al., 2018) as well as several other platforms (e.g., DKPro (Eckart de Castilho and Gurevych, 2014), PubAnnotation …

An AI-assisted approach for checking the completeness of privacy policies against GDPR
D Torre, S Abualhaija, M Sabetzadeh… – 2020 IEEE 28th …, 2020 –
… III. NLP and ML. Our proposed approach heavily relies on Natural Language Processing (NLP) and Machine Learning (ML). For the basic NLP pipeline, we use the DKPro toolkit [9]; this toolkit has already been used in the context of RE, e.g., see [10] …

Beyer, Hartmut/Münkner, Jörn/Schmidt, Katrin
T Steyer, P Sahle – DHd 2019 –
… texts and/or literary prize winners). All texts were processed with the DARIAH-DKPro-Wrapper (Jannidis et al. 2016). Syntactic complexity measures are typically defined at the sentence level. We compute for each …

Tailoring and Evaluating the Wikipedia for in-Domain Comparable Corpora Extraction
C España-Bonet, A Barrón-Cedeño… – arXiv preprint arXiv …, 2020 –
… Following Hecht and Gergle (2010), we look for the globally relevant concepts and assume that a category … of the ten language editions from January and February 2015 … Most of such articles are labelled …

Automated demarcation of requirements in textual specifications: a machine learning-based approach
S Abualhaija, C Arora, M Sabetzadeh… – Empirical Software …, 2020 – Springer
A simple but important task during the analysis of a textual requirements specification is to determine which statements in the specification represent req.

Natural Language Processing Methods to Automatically Parse Eligibility Criteria in Dietary Supplements Clinical Trials
A Bompelli – 2020 –
A thesis submitted to the faculty of the University of Minnesota by Anusha Bompelli …

Chinese Content Scoring: Open-Access Datasets and Features on Different Segmentation Levels
Y Ding, A Horbach, T Zesch – Proceedings of the 1st Conference of the …, 2020 –
… ESCRITO is a publicly available general-purpose scoring framework based on DKPro TC (Daxenberger et al., 2014), which uses an SVM classifier (Cortes and Vapnik, 1995) using the SMO algorithm as provided by WEKA (Witten et al., 1999) …

Code Reviewer Recommendations as a Multi-Objective Problem: Balancing Expertise, Availability and Collaborations
S Rebai, A Amich, S Molaei, M Kessentini, R Kazman –
… We used our tool to collect the data about Atomix, Tablesaw, Vavr, Takes, Dkpro-core, and Pac4j … Dkpro-core: A collection of reusable NLP tools for linguistic pre-processing, machine learning, lexical resources, etc. – Pac4j: A security engine …

Context Classification in Dialog-Based Interaction
A Wachtel, F Eurich, D Fuchß… – 2020 IEEE 14th …, 2020 –
… For classification, we use bagging in combination with seven classifiers from DKPro Similarity [5]. Four of them consider the lexical similarity of sentences: • NGram compares the n-element parts of two propositions, we use n = 2. • CosinusSimilarity forms vectors from the …
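The two lexical measures named in this snippet (NGram with n = 2, and a cosine measure over word vectors) can be sketched as follows. This is an illustrative Python sketch, not DKPro Similarity's Java implementation: using Jaccard overlap to compare the word-bigram sets and raw term-frequency vectors for the cosine are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set, Tuple

PUNCTUATION = ".,;:!?\"'()"


def tokens(text: str) -> List[str]:
    """Lowercased words with surrounding punctuation removed."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def word_ngrams(words: List[str], n: int = 2) -> Set[Tuple[str, ...]]:
    """All contiguous n-word sequences (n = 2 gives bigrams)."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_jaccard(a: str, b: str, n: int = 2) -> float:
    """Share of distinct n-grams that occur in both texts (Jaccard overlap)."""
    grams_a = word_ngrams(tokens(a), n)
    grams_b = word_ngrams(tokens(b), n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two texts' term-frequency vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


print(ngram_jaccard("Open the door.", "Open the window."))
print(cosine_similarity("Open the door.", "Open the window."))
```

For "Open the door." and "Open the window." only the bigram "open the" is shared, so the bigram overlap is 1/3, while two of the three words match, giving a cosine of about 2/3; the bigram measure is the stricter of the two because it also rewards shared word order.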

Sentiment Analysis for Review Rating Prediction in a Travel Journal
JC Cuizon, CG Agravante – … of the 4th International Conference on …, 2020 –
… WSD is done in order to identify the correct sense of the word in context [16] using DKPro WSD API, specifically the Simplified Lesk algorithm. Simplified Lesk measures the overlap between sense definitions of a word and current context …
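The Simplified Lesk idea described in this snippet, choosing the sense whose definition overlaps most with the surrounding context, fits in a few lines. This is not the DKPro WSD API; the two-sense inventory for "bank", the stop-word list, and the function names are hypothetical toy data chosen for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or",
              "is", "was", "for", "with", "at", "by", "that", "as"}


def content_words(text: str) -> Set[str]:
    """Lowercased words, surrounding punctuation stripped, stop words removed."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}


def simplified_lesk(context: str, senses: Dict[str, str]) -> Optional[str]:
    """Pick the sense whose gloss shares the most content words with the context.

    Ties go to the sense listed first; returns None if `senses` is empty.
    """
    context_words = content_words(context)
    best_sense: Optional[str] = None
    best_overlap = -1
    for sense_id, gloss in senses.items():
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


# Toy sense inventory for "bank" (made-up glosses, not taken from WordNet).
BANK_SENSES = {
    "bank.finance": "a financial institution that accepts deposits of money and makes loans",
    "bank.river": "sloping land beside a body of water such as a river",
}

print(simplified_lesk("We sat on the bank of the river and watched the water.", BANK_SENSES))
print(simplified_lesk("She went to the bank to deposit money for a loan.", BANK_SENSES))
```

The second call shows a weakness of exact word overlap: "deposit" and "loan" do not match "deposits" and "loans" in the gloss, so the decision rests on "money" alone. Lemmatizing both sides before comparing would make the overlap more robust.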

Terminologies augmented recurrent neural network model for clinical named entity recognition
I Lerner, N Paris, X Tannier – Journal of biomedical informatics, 2020 – Elsevier
… We discarded common terms based on Wikipedia word count. The matching rules were based on the apache-UIMA framework, CoreNLP and dkPRO and allowed multiple words matching, stop words, accent normalization and case insensitive matching …

SuperMat: Construction of a linked annotated dataset from superconductors-related publications
L Foppiano, S Dieb, A Suzuki, PB de Castro… – arXiv preprint arXiv …, 2021 –
… com/kermitt2/grobid). We computed the IAA using the Java library DkPro statistics ( [30]. References [1] Bo-Christer Björk, Annikki Roos, and Mari Lauri … DKPro agreement: An open-source Java library for measuring inter-rater agreement …

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
AA Iyad, A Ahmed – Journal of Information Technology Management, 2020 –
Al-Agha Iyad (corresponding author), Associate Professor, Department of Computer Science, Faculty of Information Technology, The Islamic University of Gaza, Palestine …

An Intent Taxonomy for Questions Asked in Web Search
BB Cambazoglu, L Tavakoli, F Scholer… – Proceedings of the …, 2021 –
… Platform and Agreement Measure. Labeling was performed using spreadsheet software. Editorial agreement is computed by means of Krippendorff's α, a chance-corrected agreement measure [15], using the dkpro-statistics library [7] …
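Krippendorff's α compares the disagreement actually observed between coders with the disagreement expected by chance, α = 1 − D_o / D_e. A minimal sketch for nominal labels using the standard coincidence-matrix formulation; it is not the dkpro-statistics API (a Java library), and the function name and the one-row-per-item input layout are assumptions for illustration.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Optional, Sequence


def krippendorff_alpha_nominal(units: Sequence[Sequence[Optional[Hashable]]]) -> float:
    """Krippendorff's alpha for nominal labels, alpha = 1 - D_o / D_e.

    `units` has one row per annotated item listing the label each coder gave
    (None = coder did not label the item). Items with fewer than two labels
    cannot be paired and are ignored.
    """
    # Coincidence matrix: each ordered pair of labels from two different coders
    # within one unit adds 1 / (m - 1), where m is the number of labels in that unit.
    coincidences: Counter = Counter()
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)

    # Marginal totals n_c and the number of pairable values n.
    totals: Counter = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least one item labelled by two or more coders")

    # Both sums are n times D_o and D_e respectively, so their ratio equals D_o / D_e.
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label value occurs")
    return 1.0 - observed / expected


# Two coders, four items; they disagree on the second item only.
print(krippendorff_alpha_nominal([("a", "a"), ("a", "b"), ("b", "b"), ("b", "b")]))
```

In this toy example the coders agree on three of four items, yet α is only 8/15 (about 0.53), because with labels skewed towards "b" a fair amount of agreement is expected by chance; this chance correction is what distinguishes α from raw percentage agreement.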

Automated Scoring of Teachers’ Pedagogical Content Knowledge-A Comparison between Human and Machine Scoring
A Wahlen, C Kuhn, O Zlatkin-Troitschanskaia… – Frontiers in …, 2020 –
… The ESCRITO for short responses, developed by Zesch and Horbach (2018), was used for the automated scoring. ESCRITO is a publicly available general-purpose scoring framework based on DKPro TC (Daxenberger et al., 2014) …

Assessing semantic similarity between concepts: A weighted-feature-based approach
SH Wasti, MJ Hussain, G Huang… – Concurrency and …, 2020 – Wiley Online Library
Summary: Traditional feature-based semantic similarity (SS) approaches exploit the Wikipedia features in terms of sets. They evaluate the similarity of concepts based on the commonalities among their…

TextAnnotator: A web-based annotation suite for texts
G Abrami, A Mehler, M Stoeckel – 2020 –
… 2014). “DKPro Agreement: An Open-Source Java Library for Measuring Inter-Rater Agreement.” In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations. Dublin …

Softcite dataset: A dataset of software mentions in biomedical and economic research publications
C Du, J Cohoon, P Lopez… – Journal of the Association …, 2021 – Wiley Online Library
Abstract: Software contributions to academic research are relatively invisible, especially to the formalized scholarly reputation system based on bibliometrics. In this article, we introduce a gold-…

An approach for measuring semantic similarity between Wikipedia concepts using multiple inheritances
MJ Hussain, SH Wasti, G Huang, L Wei, Y Jiang… – Information Processing …, 2020 – Elsevier

Seshat: A tool for managing and verifying annotation campaigns of audio data
H Titeux, R Riad, XN Cao, N Hamilakis… – arXiv preprint arXiv …, 2020 –
… measure and alignment. Computational Linguistics, 41(3):437–479. Meyer, CM, Mieskes, M., Stab, C., and Gurevych, I. (2014). DKPro Agreement: An open-source Java library for measuring inter-rater agreement. In Proceedings …

Using Domain-specific Corpora for Improved Handling of Ambiguity in Requirements
S Ezzini, S Abualhaija, C Arora… – In Proceedings of the …, 2021 –
Saad Ezzini, Sallam Abualhaija, Chetan Arora, Mehrdad Sabetzadeh, Lionel C. Briand; SnT Centre for Security, Reliability …

Predicting software defect type using concept-based classification
S Patil, B Ravindran – Empirical Software Engineering, 2020 – Springer
Automatically predicting the defect type of a software defect from its description can significantly speed up and improve the software defect management pr.

3 Management, Sustainability, and Interoperability of Linguistic Annotations
N Ide – Development of Linguistic Linked Open Data …, 2020 –
… 15. An ad hoc mechanism to connect annotations on different graphs was later introduced into the AG model to accommodate hierarchical relations. 16. http://www.ukp.tu-darmstadt.de/research/current-projects/dkpro/. 17. http://lappsgrid.org. 18 …

How Does Refactoring Impact Security When Improving Quality? A Security-Aware Refactoring Approach
C Abid, M Kessentini, V Alizadeh… – IEEE Transactions …, 2020 –
0098-5589 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See publications_standards/publications/rights/index.html for more information. This …

How Does Refactoring Impact Security When Improving Quality? A Security Aware Refactoring
C Abid, M Kessentini, V Alizadeh, M Dhaouadi… – 2020 –
IEEE Transactions on Software Engineering. Chaima Abid, Marouane Kessentini, Vahid Alizadeh, Mouna Dhaouadi and Rick Kazman …

European Language Grid: An overview
G Rehm, M Berger, E Elsholz, S Hegele… – arXiv preprint arXiv …, 2020 –
Georg Rehm, Maria Berger, Ela Elsholz, Stefanie Hegele, Florian Kintzel, Katrin Marheinecke, Stelios Piperidis, Miltos Deligiannis, Dimitrios Galanis, Katerina Gkirtzou …

A compression-based toolkit for text processing
WJ Teahan – 2020 –
Prifysgol Bangor / Bangor University. Teahan, William. Published: 10/04/2018. Link to publication. Citation for published version …