Regular Expression Engines 2020


Notes:

An implementation of regex functionality is often called a regex engine. A regular expression is a pattern that the regex engine attempts to match against input text. Structured and semi-structured text is easier to handle with regular expressions than free-form text. Lex is the prototypical regular-expression-based lexer generator. Realtime rule engines overlap with inference engines and reasoning systems.
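
As a minimal sketch of what "a pattern that the engine attempts to match in input text" means in practice, the snippet below uses Python's standard re module. The date pattern and the sample sentence are hypothetical and chosen only for illustration; they do not come from any of the works listed on this page.

```python
import re

# Hypothetical pattern: ISO-style dates such as 2020-03-15.
# re.compile hands the pattern to Python's built-in regex engine.
pattern = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Hypothetical semi-structured input text.
text = "Released 2020-03-15, patched 2020-11-02."

# findall scans the text left to right and collects every non-overlapping match.
matches = pattern.findall(text)
```

Here the engine is whatever backs Python's re module; other engines (for example re2, mentioned in several entries below) accept similar patterns but may differ in supported features.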

  • NLTK regular expression chunker
  • POS-based regular expression (POS = Part Of Speech)
  • Regex based chatbot
  • Regex in Excel
  • Regex module
  • Regex tool
  • RegExp
  • Regular expression-based sentence splitter *
  • Regular expression matches in Selenium
  • Regular expression matching
  • Regular expressions for web scraping
  • Regular expressions in MySQL
  • Sentence tokenization algorithms based on regular expressions *
  • Xpath using regular expressions

See also:

100 Best MySQL Regular Expression Videos | 100 Best Pentaho Integration Videos | 100 Best Rules Engine Videos | 100 Best Text Regular Expression Videos | 100 Best UbotStudio Videos | 100 Best WebHarvy Videos | Rules Engine & Dialog Systems | Stanford Tregex


Using Selective Memoization to Defeat Regular Expression Denial of Service (ReDoS)
JC Davis, F Servant, D Lee – cs.stonybrook.edu
… Engine implementations: In practice, form may follow function — regex features dictate regex engine algorithms. Spencer’s backtracking algorithm is used by all “PCRE” regex engines, including Java, JavaScript-V8, PHP, Python, Ruby, Perl, and .NET [1], [7]. This was an …

ESEC/FSE: G: On the Impact and Defeat of Regex DoS
JC Davis – src-m.ly4008.com
… [Figure 1: NFAs for the fundamental regex operations.] After constructing the NFA for a regex, a regex engine resolves a match query by simulating the automaton. Most regex engines use backtracking [6] to resolve any non-determinism …

Natural Language Processing (NLP) and Text Analytics
JM Patel – Getting Structured Data from the Internet, 2020 – Springer
… at the results in Listing 4-7 that the re2 regex engine is about 7–8X faster with only about 8 seconds for going 640 iterations. df = pd.read_csv(“profile.csv”, index_col = ‘Unnamed: 0’). df.head(10). Output: Listing 4-7 Printing the comparison table for Python and re2 regex engines …

On the Impact and Defeat of Regular Expression Denial of Service
JC Davis – 2020 – vtechworks.lib.vt.edu
… I report that application refactoring is error-prone, and that regex engine replacement seems unlikely due to incompatibilities between regex engines … In the long term, regex engine developers should modify their regex engines as a result of my findings …

Arabic Text Processing Model: Verbs Roots and Conjugation Automation
MTB Othman, MA Al-Hagery, YM El Hashemi – IEEE Access, 2020 – ieeexplore.ieee.org
… About 87% of the verbs represented in our regular expressions’ engine are detected. Moreover, the sentences are also recognized … Level 2: The lexical level, where the different parsed lexemes are matched to the forms of Arabic words using regular expression engine …

Development of a Novel Tool for the Retrieval and Analysis of Hormone Receptor Expression Characteristics in Metastatic Breast Cancer via Data Mining on …
KP Chang, J Wang, CC Chang, YW Chu – BioMed research …, 2020 – hindawi.com
Information about the expression status of hormone receptors such as estrogen receptor (ER), progesterone receptor (PR), and Her-2 is crucial in the management and prognosis of breast cancer. Therefore, the retrieval and analysis of hormone receptor expression characteristics …

Advanced Regular Expressions
WB Rothwell, WB Rothwell – Pro Perl Programming: From Professional to …, 2020 – Springer
… Consider an atom to be those special characters in a pattern that are interpolated by the Regular Expression engine (*, +, etc … the following are not regex atoms: \t, \U, $var, and \E. These are, instead, string characters that are interpolated *before* the regex engine sees that …

LexiDB: Patterns & Methods for Corpus Linguistic Database Management
M Coole, P Rayson, J Mariani – … of The 12th Language Resources and …, 2020 – aclweb.org
… allows for a practical approach that can facilitate the use of many existing regex libraries without the need for a bespoke regex engine … Algorithm 2 attempts to do this in a way that is agnostic of any such regular expression engine, as such as with resolving regular expressions …

LING83800: Formal languages
K Gorman, M Mandel – m.mr-pc.org
… 5.1.1 Union: Regular expression engines use several different syntactic constructions that represent unions … Regular expression engines do not usually support intersection, but the effect can be simulated by matching a string against multiple regular expressions …
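
The idea in the snippet above, simulating intersection by matching one string against several regular expressions, can be sketched as follows. This is an illustration using Python's re module, not code from the cited course notes; the helper name matches_all and the two example patterns are hypothetical.

```python
import re

def matches_all(patterns, s):
    """Return True only if the whole string s matches every pattern.

    This simulates intersection: s is in the intersection of the
    languages exactly when each individual regex accepts it.
    """
    return all(re.fullmatch(p, s) is not None for p in patterns)

# Hypothetical example: lowercase-only strings of length 3 to 5.
PATTERNS = [r"[a-z]+", r".{3,5}"]
```

For example, matches_all(PATTERNS, "abcd") holds because both patterns accept "abcd", while "ab" fails the length pattern and "ab1d" fails the lowercase pattern.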

HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Guided Micro-Fuzzing
W Blair, A Mambretti, S Arshad, M Weissbacher… – arXiv preprint arXiv …, 2020 – arxiv.org
… Most of this work is based on manual or static analysis that scales to real world code bases, but focuses on detecting known sources of AC vulnerabilities, such as triggering worst case performance of commonly used data structures [19], regular expression engines [32], [57], [62 …

Parsing INI Files Using Regexes and Grammars
M Lenz – Raku Fundamentals, 2020 – Springer
… You just state the pattern, and the regex engine determines for you whether a string matches the pattern or not. While implementing a regex engine is a tricky business, the basics aren’t too hard to understand … the regex engine first evaluates the .*. The . matches any character …

Regular Expressions for Fast-response COVID-19 Text Classification
IL Markov, J Liu, A Vagner – arXiv preprint arXiv:2102.09507, 2021 – arxiv.org
… Therefore, we have developed a portable way to support such comments and line breaks, compatible with major regex engines … Cross-platform compatibility for negative lookahead covers the popular PCRE regex engine, but not the C++ re2 library …

Robust PDF Files Forensics Using Coding Style
S Adhatarao, C Lauradoux – arXiv preprint arXiv:2103.02702, 2021 – arxiv.org
… producer tools. We have compared the different files to identify the pattern in each section of the PDF files. We created 192 rules in regular expression engine to identify these patterns and detect the PDF producer tool. Then, we …

Visualization of diseases at risk in the COVID-19 Literature
F Wolinski – arXiv preprint arXiv:2005.00848, 2020 – arxiv.org
… ICD-11. This library implements a powerful regular expression engine, named keyword processor able to search for phrasal keywords in any text and in a single pass. In this project, 3 keyword processor instances are built: • A …

RE2C: A lexer generator based on lookahead-TDFA
U Trofimovich – Software Impacts, 2020 – Elsevier
… Support email for questions, re2c-general@lists.sourceforge.net. 1. Introduction. Regular expression engines can be divided in two categories: run-time libraries and lexer generators. Run-time libraries perform interpretation or just-in-time compilation of regular expressions …

stringi: Fast and Portable Character String Processing in R
M Gagolewski – stringi.gagolewski.com
Marek Gagolewski, Deakin University, Australia. Abstract: Effective processing of character strings is required at various stages of data analysis pipelines: from …

Challenging Sequential Bitstream Processing via Principled Bitwise Speculation
J Qiu, L Jiang, Z Zhao – Proceedings of the Twenty-Fifth International …, 2020 – dl.acm.org
… With speculative bitstream processing, PBS brings up to 60X speedup on a 64-core machine. To demonstrate the end-to-end benefits, we also apply PBS to a state-of-the-art regular expression engine, called icgrep [5]. Results show that, with …

Software Impacts
U Trofimovich – re2c.org
… Support email for questions: re2c-general@lists.sourceforge.net. 1. Introduction. Regular expression engines can be divided in two categories: run-time libraries and lexer generators. Run-time libraries perform interpretation or just-in-time compilation of regular expressions …

Getting Structured Data from the Internet
BDP Scale, JM Patel – Springer
… Extract email addresses using regex … Re2 regex engine … Named entity recognition (NER) …

Towards Accelerating Intrusion Detection Operations at the Edge Network using FPGAs
Y Rebahi, F Catal, N Tcholtchev… – … Conference on Fog …, 2020 – ieeexplore.ieee.org
… These algorithms need to be redesigned in order to use regular expressions. In [22], a hardware based regular expression engine for Snort was built by transforming the PCRE opcodes generated by the PCRE compiler from Snort regular expression rules …

Financial Services Heuristic Retrieval for Operations and Payments Settlement Directorate of Banca d’Italia
M Papa, I Chatzigiannakis, A Anagnostopoulos – ichatz.me
Università degli Studi di Roma La Sapienza, Faculty of Ingegneria dell’Informazione, Informatica e Statistica. Master of Science in Engineering in Computer Science, Master’s Degree. Financial Services Heuristic Retrieval for Operations and Payments Settlement …

Advanced String Manipulation and Pattern Matching
R Wade – Advanced Analytics in Power BI with R and Python, 2020 – Springer
… it. With that being said, if you want to send \. to the regular expression engine, you need to send the string “\\.”. Now that you know of a way to identify numbers in a regular expression, let’s build the basic pattern of a SSN. The …
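
The escaping point in the snippet above can be illustrated in Python (the cited chapter may be working in R; this sketch only shows the same idea). To deliver the two characters \. to the regex engine, an ordinary string literal needs "\\.", while a raw string can be written r"\.". The SSN-style pattern below is a hypothetical reconstruction for illustration, not the book's own code.

```python
import re

# Both literals produce the same two-character pattern: backslash, dot.
escaped_literal = "\\."
raw_literal = r"\."

# An unescaped dot matches any character; an escaped dot matches only ".".
any_char_hits = re.findall(".", "a.b")
literal_dot_hits = re.findall(raw_literal, "a.b")

# Hypothetical SSN-style pattern: three digits, two digits, four digits.
ssn_pattern = re.compile(r"\d{3}-\d{2}-\d{4}")
ssn_ok = ssn_pattern.fullmatch("123-45-6789") is not None
```

Using raw strings for patterns is a common design choice in Python because it keeps the text the programmer writes identical to the text the regex engine receives.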

Dictionary-Based Data Generation for Fine-Tuning Bert for Adverbial Paraphrasing Tasks
M Carthon III – 2020 – search.proquest.com
… Regular Expressions: Regular expressions are powerful text pattern recognition codes/software tools that can be very efficiently implemented by programs (so-called Regex engines) based on the theory of finite automata …

Automatic Repair of Vulnerable Regular Expressions
N Chida, T Terauchi – arXiv preprint arXiv:2010.12450, 2020 – arxiv.org
… To this end, we define a formal model of real-world regular expression engines given by a set of natural semantics deduction rules … The LTP is a property that ensures that a linear running time of a regular expression engine …

A New Approach to Fuzzy Regular Expression Parsers for Cybersecurity Logs
T Martin, A Healing, B Azvine – 2020 IEEE International …, 2020 – ieeexplore.ieee.org
… For consistency with other work (not reported here), we have used the java 9 regular expression engine docs.oracle.com/javase/9/docs/api/java/util/regex/Pattern. Note that the approach described in this paper is not dependent on the regexp engine …

Developing a surgical site infection surveillance system based on hospital unstructured clinical notes and text mining
ML Ciofi Degli Atti, F Pecoraro, S Piga, D Luzi… – Surgical …, 2020 – liebertpub.com
… user interface applications. It also provides an embedded regular expression engine that enabled us to implement the algorithm presented in this paper easily to capture an SSI within the explored narrative texts. In particular …

Variable Textual Syntaxes
S Sobernig – Variable Domain-specific Software Languages with …, 2020 – Springer
… (4) For example, by using the Tcl command subst. (5) For example, by using the built-in regular-expression engine via the Tcl command regsub. … 5.1 Internal DSL: Pattern-Based Variability Implementation Techniques … Expression Builder, Dynamic Reception …

Inferring Temporal Compositions of Actions Using Probabilistic Automata
RS Cruz, A Cherian, B Fernando… – Proceedings of the …, 2020 – openaccess.thecvf.com
… action patterns. We formulate a framework for this task that resembles a regular expression engine in which we can perform inference for any compositional activity that can be described as a regular expression of primitives. 2 …

Achieving 100Gbps Intrusion Prevention on a Single Server
Z Zhao, H Sadok, N Atre, JC Hoe, V Sekar… – … USENIX} Symposium on …, 2020 – usenix.org
… We estimate that GRAPEFRUIT [37], a state-of-the-art regular expression engine for FPGAs, would require 8MB of BRAM to statically map all the regular expressions from our ruleset on the FPGA, and yet would still only keep up with a few Gbps of traffic …

Adaptive Lightweight Compression Acceleration on Hybrid CPU-FPGA System
NJ Lisa – 2020 – vbn.aau.dk
Aalborg Universitet. Jahan Lisa, Nusrat. Publication date: 2020. Document version: Publisher’s PDF, also known as Version of record …

Report of the ALCTS Cataloging and Metadata Management Section (CaMMS) Catalog Management Interest Group Meeting, American Library Association Midwinter …
DT Do, M Morgan – Technical Services Quarterly, 2020 – Taylor & Francis
… The presentation concluded with a review of open source tools to aid data preparation. One of the biggest transformation tools in Melvin’s opinion was the regular expressions engine built in the Windows .NET Framework. Another …

Compression for population genetic data through finite-state entropy
W Chen, LT Elliott – bioRxiv, 2021 – biorxiv.org
… This is a computationally efficient technique which has also found ubiquitous usage in fast string matching algorithms, such as modern regular expression engines. The speed of fse approaches Huffman coding, yet without the compression ratio issues. For …

A Corpus-Based Study of Complex Prepositions in a Non-Native English Variety
RA Adejare – Open Journal of Modern Linguistics, 2020 – scirp.org
… 3.3. The Manual Retrieval Option. Manual retrieval of the PNP-constructions was the option because none of the Regular Expression Engines such as Practical Extraction and Report Language (Perl) was within reach. Moreover …

IntelliGen: Automatic Driver Synthesis for FuzzTesting
M Zhang, J Liu, F Ma, H Zhang, Y Jiang – arXiv preprint arXiv:2103.00862, 2021 – arxiv.org
… These projects consist of image processing libraries (libjpeg), file processing libraries (libxml2, JSON), regular expression engines (pcre2), asynchronous resolver libraries (c ares), font compression and decompression libraries (woff2, libhevc, libhavc), and font shaping …

Raku Fundamentals
M Lenz – Springer
Raku Fundamentals: A Primer with Examples, Projects, and Case Studies. Second Edition. Moritz Lenz. Foreword by Larry Wall, creator of Raku …

The effects of fiscal and tax incentives on regional innovation capability: text extraction based on python
Y Qi, W Peng, NN Xiong – Mathematics, 2020 – mdpi.com
The regulation of fiscal and tax policies is an imperative prerequisite for improving the regional innovation capability. In view of this, an attempt was made to select 31 provinces and cities in China as the research object from 2009 to 2018, to extract the fiscal and tax policy text …

Stackless Processing of Streamed Trees
C Barloy, F Murlak, C Paperman – 2021 PODS, 2021 – hal.archives-ouvertes.fr
… On a standard laptop computer, it easily reaches 20Gb/s. The Hyperscan regular expression engine reaches performance of 10Gb/s [29] … this is the case, successful vectorization of XML or JSON parsers might be more tricky than for regular expression engines: Dyck languages …

Strings
V Domkin – Programming Algorithms in Lisp, 2021 – Springer
… For instance, the Perl regex engine (PCRE) requires over 60 seconds to match a 30-character string aa..a against the pattern a? {15}a{15} (on standard hardware), while the alternative approach, which we’ll discuss next, requires just 20 microseconds—a million times faster …
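
The slowdown described in the snippet above can be reproduced on a small scale with any backtracking engine. The sketch below uses Python's re module and assumes the garbled pattern in the excerpt means n optional a characters followed by n required ones (a?a?…a?aa…a), matched against a string of n a characters; that reading is an assumption, as are the helper names. It only measures the backtracking behaviour and does not implement the faster alternative approach the excerpt refers to.

```python
import re
import time

def pathological_case(n):
    """Build (pattern, text): n optional 'a's then n required 'a's, against 'a' * n."""
    pattern = "a?" * n + "a" * n
    text = "a" * n
    return pattern, text

def time_match(n):
    """Return (matched, seconds) for a full match of the pathological case of size n."""
    pattern, text = pathological_case(n)
    start = time.perf_counter()
    matched = re.fullmatch(pattern, text) is not None
    return matched, time.perf_counter() - start
```

Calling time_match for a few modest sizes (for example 10, 14, and 18) and comparing the elapsed times gives a feel for how quickly the cost grows; larger values of n are best avoided, since a backtracking engine may take a very long time on them.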

Trace-SRL: A Framework for Analysis of Microlevel Processes of Self-Regulated Learning From Trace Data
J Saint, A Whitelock-Wainwright… – IEEE Transactions …, 2020 – ieeexplore.ieee.org
… model. The parser, thus, consolidates event sequences into microlevel processes. 2) SRL Eventization: As part of the eventization sequence, all relevant raw log data were passed through our REGEX engine. The eventization …

Mobile App Privacy in Software Engineering Research: A Systematic Mapping Study
F Ebrahimi, M Tushev, A Mahmoud – Information and Software Technology, 2020 – Elsevier

Continuous Monitoring
EC Thompson – Designing a HIPAA-Compliant Security Operations …, 2020 – Springer
… This is demonstrated here during the PE log discussion. grep is a regular expression engine used to search text-based files and return values based on the search parameters. We are using it here to find our fuid in other log files and return the entry to the screen …

Novel database design for extreme scale corpus analysis
M Coole – 2021 – eprints.lancs.ac.uk
A thesis submitted to Lancaster University for the degree of Ph.D. in Computer Science. Matthew Parry Coole, January 2021. Abstract: This thesis …

Multi-head monitoring of metric dynamic logic
M Raszyk, D Basin, D Traytel – International Symposium on Automated …, 2020 – Springer
We develop a monitoring algorithm for metric dynamic logic, an extension of metric temporal logic with regular expressions. The monitor computes whether a given formula is satisfied at every position…

Introduction to Common Crawl Datasets
JM Patel – Getting Structured Data from the Internet, 2020 – Springer
In this chapter, we’ll talk about an open source dataset called common crawl which is available on AWS’s registry of open data (https://registry.opendata.aws/).

Advanced Web Crawlers
JM Patel – Getting Structured Data from the Internet, 2020 – Springer
… underlying code more efficient. In a similar vein, whenever possible, try and use the more efficient libraries written in C such as lxml, regex engines such as re2, and so on over more slower pure Python-based variants. I hope by …

IDS for logs: Towards implementing a streaming Sigma rule engine
M Kont, M Pihelgas – ccdcoe.org
Tallinn 2020. Markus Kont, NATO CCDCOE Technology Branch Researcher; Mauno Pihelgas, NATO CCDCOE Technology Branch Researcher …

Creating Mini-Languages
JJ Merelo – Raku Recipes, 2020 – Springer
Grammars are a unique feature of Raku; they are a powerful way to process text with structure in it, and they can be used to create mini-languages. You can use these mini-languages for many different…

Code clone matching: A practical and effective approach to find code snippets
K Inoue, Y Miyamoto, DM German, T Ishio – arXiv preprint arXiv …, 2020 – arxiv.org
Katsuro Inoue, Osaka University, Osaka, Japan (inoue@ist.osaka-u.ac.jp); Yuya Miyamoto, Osaka University, Osaka, Japan (yuy-mymt@ist.osaka-u.ac.jp) …

Critiquing Antipatterns In Novice Code
LC Ureel II – 2020 – digitalcommons.mtu.edu
Michigan Technological University, Digital Commons @ Michigan Tech. Dissertations, Master’s Theses and Master’s Reports, 2020. Leo C. Ureel II, Michigan Technological University, ureel@mtu.edu. Copyright 2020 Leo C. Ureel II …

Reassessing the locus of normalization in machine-assisted collation.
DJ Birnbaum, E Spadini – DHQ: Digital Humanities Quarterly, 2020 – search.ebscohost.com

Rogue Automation
F Maggi, M Pogliani – personeltest.ru
Rogue Automation: Vulnerable and Malicious Code in Industrial Programming. Federico Maggi, Trend Micro Research; Marcello Pogliani, Politecnico di Milano …

Improving Network Security with Low-Cost and Easy-to-Adopt Solutions
S Zheng – 2020 – dukespace.lib.duke.edu
By Shengbao Zheng, Department of Computer Science, Duke University. Approved: Xiaowei Yang (Advisor), Bruce MacDowell Maggs, Jeffrey S. Chase, Maria Gorlatova …

C#: The Ultimate Beginner’s Guide to Learn C# Programming Step by Step
R Turner – 2020 – books.google.com
Contents: Introduction; 1. What is C#; 2. Detailed Overview; 3. Demystifying Data Types; 4. Working with Variables; 5. What is Type Conversion …

C#: 3 books in 1-The Ultimate Beginners, Intermediate and Expert Guide to Master C# Programming
R Turner – 2020 – books.google.com
© Copyright 2019 – Ryan Turner. All rights reserved. The content contained within this book may not be reproduced, duplicated or transmitted without direct written permission from the author or the publisher. Under no …

Enhancing System Transparency, Trust, and Privacy with Internet Measurement
B VanderSloot – 2020 – deepblue.lib.umich.edu
By Benjamin VanderSloot. A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer …
