[[
wikihub
]]
Search
⌘K
Explore
People
For Agents
Sign in
Explore
People
For Agents
Sign in
@jemoka / Jemoka Knowledge Base / raw/course/cs143/kbhsu_cs143_apr242025.md
Suggest edit
Cancel
Submit suggestion
Title
Name
Note
--- title: "SU-CS143 APR242025" source: https://www.jemoka.com/posts/kbhsu_cs143_apr242025/ date: 2025-04-24 --- Bottom-Up Parsing Bottom-Up Parsing is more general than top-down parsing (i.e., it doesn’t require left factorization, etc.) This is generally the preferred method. Key insight: we reduce strings into increasingly higher level symbols Sketch Because of the Properties of Bottom-Up Parsing, we can…. initialization split the string into two pieces right substring has terminals, not yet examined left substring has terminals and non-terminals We begin by splitting to the left since we begin with no non-terminals. x1 x2 x3 …. xn moves Shift / Reduce shift: move | one place to the right — shifts a terminal to the left string reduce: apply an inverse production at the right end of the left string (i.e., take a substring closest to the bar, and reduce it) Think: left string is a stack. top: is the |; push: the “shift” operation to push a terminal onto the stack; reduce: pop something onto the stack. conflict When Shift/Reduce are both legal, that’s a “shift reduce” conflict; if multiple reductions are possible, that’s a “reduce reduce” conflict. deciding moves handle We want to reduce only if the result can still be reduced to the starting symbol. A handle is a single production in a legal sequence coming from \(S\) which can be reduced into a single nonterminal. Consider: \begin{equation} S \to^{*} \alpha X \omega \to \alpha \beta \omega \end{equation} then \(X \to \beta\) in the position after \(\alpha\) is a handle of \(\alpha \beta \omega\). We reduce at handles. The handles are always top of the stack, otherwise it’d not be a rightmost derivation in reverse. SLR Simple Left-to-right Rightmost derivation is a class of CFGs for which we have simple heuristics for detecting handles. heuristics item An item is a production with “.” somewhere on the RHS, denoting a focus point. For instance, \(T \to .(E)\). The only item for \(X \to \epsilon\) is $X → .$ We also call these LR(0) items. viable prefix \(\alpha\) is a “viable prefix” if there exists \(\omega\) such that \(\alpha | \omega\) is a state of a shift-reduce parser. a viable prefix doesn’t extend beyond the right end of handle its a variable prefix because its a possible prefix of a handle as long as a parser has viable prefixes on the stack, no parsing error detecting viable prefixs We us an NFA! add a new start production \(S’ \to S\) to the grammar the NFA states are items of the grammar For item \(E \to \alpha . X \beta\) add the transition \([E \to \alpha . X \beta] \to^{x} [E \to \alpha X . \beta]\) For item \(E \to \alpha . X \beta\) and production \(X \to \gamma\), add the transition \([E \to \alpha . X \beta] \to^{\epsilon} [X \to . \gamma]\) every state is an accepting state, start state is \(S’ \to .S\) LR(0) Parsing Suppose stack contains \(\alpha\), next input is \(t\). DFA on input \(\alpha\) terminated it state \(s\). Reduce by \(X \to \beta\) if \(s\) contains item $X → β .$ Shift if \(s\) contains item \(X \to \beta .t\omega\) (that is, there’s a transition labeled \(t\) We will get reduce/reduce conflicts if one state contains $X → β .$ and $Y → ω .$; we will get shift/reduce conflict if $X → β.$ and \(Y \to \omega .t \gamma\) LR(0) to SLR Exactly the same process, but! Suppose stack contains \(\alpha\), next input is \(t\). DFA on input \(\alpha\) terminated it state \(s\). Reduce by \(X \to \beta\) if \(s\) contains item $X → β. $; and if \(t \in \text{Follow}\qty(X)\) Shift if \(s\) contains item \(X \to \beta .t\omega\) (that is, there’s a transition labeled \(t\) If there are conflicts under these rules, then you don’t have an SLR grammar. Remember to keep (symbol, DFA state) on the stack. Properties of Bottom-Up Parsing bottom-up parser traces a rightmost derivation in reverse in shift-reduce strategy, handles appear only at the top of the stack, never inside the stack for any SLR grammar, the set of viable prefixes is a regular language! Let \(\alpha \beta \omega\) be a step of bottom up parse, assume the next reduction is \(X \to \beta\), then \(\omega\) should already have been a string of terminals (since we go rightmost first) Example int * int + int | T -> int int * T + int | T -> int * T T + int | T -> int T + T | E -> T T + E | E -> T + E E Weird! It somehow starts in the middle. Generally, the parsing strategy expands non-terminals bottom-up in the rightmost first.