[[
wikihub
]]
Search
⌘K
Explore
People
For Agents
Sign in
Explore
People
For Agents
Sign in
@jemoka / Jemoka Knowledge Base / raw/concept/kbhregular_expression_complexity.md
Suggest edit
Cancel
Submit suggestion
Title
Name
Note
--- title: "regular expression (complexity)" source: https://www.jemoka.com/posts/kbhregular_expression_complexity/ --- Regular expressions express computation as a simple, logical discretion, through closure properties (such as properties of regular languages). a family of languages which satisfy closure properties of DFA regular languages, which turns out to be exactly the set of languages that DFA and NFA both recognize. “What is the complexity of describing the strings in a language?” definition Let \(\Sigma\) be an alphabet, we define the regular expressions over \(\Sigma\): for all \(\sigma \in \Sigma\), \(\sigma\) is a regexp \(\varepsilon\) (empty string) is a regexp \(\emptyset\) (empty set) is a regexp If \(R_1\), \(R_2\) are regexps, then: \(R_1 \cdot R_2\) (concat) is a regexp \(R_1 + R_2\) (plus) is a regexp \((R_1)^{*}\) (set of all subsets) Operator precidence: then . then + additional information regexp representing languages The regexp \(\sigma \in \Sigma\) represents the language \(\qty {\sigma}\), the regexp \(\varepsilon\) represents the language \(\qty {\epsilon }\), the regexp \(\emptyset\) represents the language \(\emptyset\). for \(R_1, R_2\) being regular expressions representing a particular regular language \(L_1, L_2\dots\) \((R_1 \cdot R_2)\) represents concatenation \(L_2 \cdot L_2\) in the language \((R_1 + R_2)\) represents union \(L_1 \cup L_2\) in the language \((R_1)^{*}\) represents the clean star of \(L_1\) in the language L(R) we refer \(L( R)\) to be the language that \(R\) represents. accept a string \(w \in \Sigma\) is accepted by \(R\) (“w matches R”) if \(w \in L( R)\). regular expressions are equivalent to regular languages another double containment any regexp can be an NFA given any regexp \(R\), we will construct an NFA such that \(N\) accepts exactly that in \(R\). base cases: \(R\) has length \(1\); \(R = \sigma\), \(R = \varepsilon\), and \(R = \emptyset\), each of which has a NFA (only one symbol, directly to the accept state, or no accept state at all) invariant: every regexp of length \(k\) represents some regular language induction: consider \(R\) of length \(k+1\); here are the three possibilities: \(R = R_1 + R_2\), \(R = R_1R_2\), and \(R = (R_{1})^{*}\), where all \(|R_{j}| \leq |k|\). note that this means that each of the mappings map directly to a properties of regular languages (union theorem, concatenation theorem, clean star) respectively. any NFA can be a regexp remove states and re-labeling arcs with regular expressions—meaning we can then read substrings at a time; this operation gives us a Generalized NFA— each edge can read not just a single symbol, but any string that corresponds to a regular expression proof idea: constructions add unique start and accept states for every path of length 2 that goes between the unique start and end states, pick and internal state, rip it out, and re-label arrows