Writing a Compiler from Scratch (5): Syntax Analysis: Flaws in the Automaton and How to Fix Them

Published: 2023/12/20


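The complete code for this project is in C2j-Compiler.

Preface

In the previous part we successfully built a finite state automaton, but that automaton still has two problems:

It cannot handle shift/reduce conflicts
It has too many state nodes, which makes the automaton large and inefficient

This part addresses both problems.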

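Shift/reduce conflicts

Consider one node from the example in the previous part:

e -> t .
t -> t . * f

The automaton reaches this node from state 0 on input t. Once there, it cannot tell whether to reduce by production 1 or to shift according to production 2. This situation is called a shift/reduce conflict.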

SLR(1) grammars

The earlier LL(1) parsing used a FOLLOW set: for a given non-terminal, the set of all terminals that, according to the grammar's productions, can appear immediately after that non-terminal.

Index of earlier posts

FOLLOW(s) = {EOI}
FOLLOW(e) = {EOI, ), +}
FOLLOW(t) = {EOI, ), +, *}
FOLLOW(f) = {EOI, ), +, *}

In other words, if the current input symbol belongs to FOLLOW(e), the parser can reduce by the first production.

If every node with a shift/reduce conflict in the automaton can be resolved by this rule, the grammar is called an SLR(1) grammar.
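The SLR(1) rule for the conflicting node above can be sketched as a small decision function. This is only an illustration: the class name `SlrConflictDemo`, the method `decide`, and the use of plain strings for tokens are assumptions made here and do not come from C2j-Compiler.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: resolve the conflict in the node
//   e -> t .
//   t -> t . * f
// by consulting FOLLOW(e), as the SLR(1) rule describes.
class SlrConflictDemo {
    // FOLLOW sets from the text, with tokens written as plain strings.
    static final Map<String, Set<String>> FOLLOW = Map.of(
        "e", Set.of("EOI", ")", "+"),
        "t", Set.of("EOI", ")", "+", "*"));

    // Returns the action the parser would take in this node for one input token.
    static String decide(String input) {
        boolean canShift = input.equals("*");                 // t -> t . * f
        boolean canReduce = FOLLOW.get("e").contains(input);  // e -> t .
        if (canShift && canReduce) {
            return "conflict"; // the grammar would not be SLR(1)
        }
        if (canShift) {
            return "shift";
        }
        if (canReduce) {
            return "reduce";
        }
        return "error";
    }
}
```

Because `*` is not in FOLLOW(e), the two actions never compete in this node, so the conflict disappears for this grammar.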

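LR(1) grammars

However, if the current input symbol both belongs to the FOLLOW set of the first production's non-terminal and is the symbol to the right of the dot in the second production, the FOLLOW set alone cannot resolve the shift/reduce conflict.

When deciding from an input symbol whether a reduce is allowed, we only need to check whether, after the reduce, the current input symbol can legally follow the resulting non-terminal in this particular context. That means collecting, over every path by which the reduce can return to the node that expected the non-terminal, only the terminals that can follow it there.

The set of symbols that can legally follow a non-terminal in this sense is called the look ahead set. It is a subset of the FOLLOW set.

Before giving the algorithm for the LookAhead Set, two concepts need to be made precise: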

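First Set

For a given non-terminal, the set of all terminals that can appear at the leftmost position of a string derived from it is called that non-terminal's FIRST set.

nullable

A non-terminal that can derive the empty string is called a nullable non-terminal.

The nullable flag was already assigned during initialization in SyntaxProductionInit.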

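Building the First Set

As described above, resolving shift/reduce conflicts requires a LookAhead Set, and building the LookAhead Set requires the First Set first.

First Set construction algorithm

If A is a terminal, then FIRST(A) = {A}.
For a production of the form
s -> A a
where s is a non-terminal, A is a terminal, and a is a sequence of zero or more terminals or non-terminals, A belongs to FIRST(s).
For a production
s -> b a
where s and b are non-terminals and b is not nullable, every symbol in FIRST(b) belongs to FIRST(s).
For a production
s -> a1 a2 … an b
where a1, a2 … an are nullable non-terminals and b is either a non-nullable non-terminal or a terminal, FIRST(s) contains the union of FIRST(a1) … FIRST(an) and FIRST(b).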
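The four rules can be tried out on a toy grammar with a small fixed-point computation. The sketch below is self-contained and uses strings for symbols. The grammar, the class name `FirstSetDemo` and its methods are assumptions chosen for illustration, not the project's own data structures.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of nullable and FIRST computation for the toy grammar
//   s -> a b
//   a -> x | (empty)
//   b -> y | z
class FirstSetDemo {
    static final Map<String, List<List<String>>> GRAMMAR = new LinkedHashMap<>();
    static {
        GRAMMAR.put("s", List.of(List.of("a", "b")));
        GRAMMAR.put("a", List.of(List.of("x"), List.of()));
        GRAMMAR.put("b", List.of(List.of("y"), List.of("z")));
    }

    // Any symbol without productions is treated as a terminal.
    static boolean isTerminal(String symbol) {
        return !GRAMMAR.containsKey(symbol);
    }

    // A non-terminal is nullable if some right-hand side is empty or all nullable.
    static Set<String> computeNullable() {
        Set<String> nullable = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> entry : GRAMMAR.entrySet()) {
                if (nullable.contains(entry.getKey())) {
                    continue;
                }
                for (List<String> rhs : entry.getValue()) {
                    if (nullable.containsAll(rhs)) {
                        nullable.add(entry.getKey());
                        changed = true;
                        break;
                    }
                }
            }
        }
        return nullable;
    }

    // Repeat passes over all productions until no FIRST set grows any more.
    static Map<String, Set<String>> computeFirst() {
        Set<String> nullable = computeNullable();
        Map<String, Set<String>> first = new HashMap<>();
        for (String nonTerminal : GRAMMAR.keySet()) {
            first.put(nonTerminal, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> entry : GRAMMAR.entrySet()) {
                Set<String> target = first.get(entry.getKey());
                for (List<String> rhs : entry.getValue()) {
                    for (String symbol : rhs) {
                        if (isTerminal(symbol)) {
                            changed |= target.add(symbol);
                            break;
                        }
                        changed |= target.addAll(first.get(symbol));
                        if (!nullable.contains(symbol)) {
                            break; // a non-nullable symbol ends the scan
                        }
                    }
                }
            }
        }
        return first;
    }
}
```

For this grammar, `a` is nullable, so the scan of `s -> a b` continues past `a` and FIRST(s) picks up the symbols of both FIRST(a) and FIRST(b).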

The FirstSetBuilder class

First Set construction is implemented in the FirstSetBuilder class.

The code simply implements the rules above.

This is where the two data structures set up in SyntaxProductionInit, symbolMap and symbolArray, finally come into use.

    public void buildFirstSets() {
        // Repeat full passes until no FIRST set changes any more
        while (runFirstSetPass) {
            runFirstSetPass = false;

            Iterator<Symbols> it = symbolArray.iterator();
            while (it.hasNext()) {
                Symbols symbol = it.next();
                addSymbolFirstSet(symbol);
            }
        }

        ConsoleDebugColor.outlnPurple("First sets :");
        debugPrintAllFirstSet();
        ConsoleDebugColor.outlnPurple("First sets end");
    }

    private void addSymbolFirstSet(Symbols symbol) {
        // The FIRST set of a terminal is the terminal itself
        if (Token.isTerminal(symbol.value)) {
            if (!symbol.firstSet.contains(symbol.value)) {
                symbol.firstSet.add(symbol.value);
            }
            return;
        }

        ArrayList<int[]> productions = symbol.productions;
        for (int[] rightSize : productions) {
            if (rightSize.length == 0) {
                continue;
            }

            if (Token.isTerminal(rightSize[0]) && !symbol.firstSet.contains(rightSize[0])) {
                // The production starts with a terminal
                runFirstSetPass = true;
                symbol.firstSet.add(rightSize[0]);
            } else if (!Token.isTerminal(rightSize[0])) {
                // Add FIRST of each leading symbol, moving on to the next
                // symbol only while the current one is nullable
                int pos = 0;
                Symbols curSymbol;
                do {
                    curSymbol = symbolMap.get(rightSize[pos]);
                    if (!symbol.firstSet.containsAll(curSymbol.firstSet)) {
                        runFirstSetPass = true;
                        for (int j = 0; j < curSymbol.firstSet.size(); j++) {
                            if (!symbol.firstSet.contains(curSymbol.firstSet.get(j))) {
                                symbol.firstSet.add(curSymbol.firstSet.get(j));
                            }
                        }
                    }
                    pos++;
                } while (pos < rightSize.length && curSymbol.isNullable);
            }
        }
    }

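The LookAhead Set algorithm

1. [S -> a . r B, C]
2. r -> r1

r is a non-terminal; a and B are sequences of zero or more terminals or non-terminals; C is the look ahead set of item 1.

When the automaton is in the node containing r -> r1 and performs a reduce, it returns to the node containing [S -> a . r B, C]. For the reduce to be correct, the current input symbol must therefore be able to follow r in that item, which in the simplest case means it belongs to FIRST(B).

So the look ahead set of production 2 is FIRST(B). If B is empty, the look ahead set of production 2 is C. If B is nullable, the look ahead set of production 2 is FIRST(B) ∪ C.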
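The three cases (B not nullable, B empty, B nullable) can be folded into one loop over the symbols of B. The sketch below assumes the FIRST sets and the nullable symbols are already known. The symbol names, the class `LookAheadDemo` and the method `lookAheadFor` are hypothetical and only serve the illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: compute the look ahead set FIRST(beta) or
// FIRST(beta) ∪ C for an item [S -> a . r beta, C].
class LookAheadDemo {
    // Assumed FIRST sets for a toy grammar; "x" is a terminal, so FIRST(x) = {x}.
    static final Map<String, Set<String>> FIRST = Map.of(
        "B", Set.of("b"),
        "N", Set.of("n"),
        "x", Set.of("x"));

    // Assumed nullable symbols: only N can derive the empty string.
    static final Set<String> NULLABLE = Set.of("N");

    static Set<String> lookAheadFor(List<String> beta, Set<String> c) {
        Set<String> result = new TreeSet<>();
        for (String symbol : beta) {
            result.addAll(FIRST.get(symbol));
            if (!NULLABLE.contains(symbol)) {
                return result; // beta cannot vanish, so C is not visible
            }
        }
        // beta is empty or consists only of nullable symbols
        result.addAll(c);
        return result;
    }
}
```

With C = {EOI}, the call `lookAheadFor(List.of("N"), Set.of("EOI"))` yields both `n` and `EOI`, while `lookAheadFor(List.of("B"), Set.of("EOI"))` yields only `b`.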

computeFirstSetOfBetaAndc

The LookAhead Set is computed by a method on each production:

    public ArrayList<Integer> computeFirstSetOfBetaAndc() {
        // beta: every symbol after the symbol that follows the dot
        ArrayList<Integer> set = new ArrayList<>();
        for (int i = dotPos + 1; i < right.size(); i++) {
            set.add(right.get(i));
        }

        ProductionManager manager = ProductionManager.getInstance();
        ArrayList<Integer> firstSet = new ArrayList<>();

        if (set.size() > 0) {
            for (int i = 0; i < set.size(); i++) {
                ArrayList<Integer> symbolFirstSet = manager.getFirstSetBuilder().getFirstSet(set.get(i));
                for (int s : symbolFirstSet) {
                    if (!firstSet.contains(s)) {
                        firstSet.add(s);
                    }
                }

                // A non-nullable symbol ends the scan of beta
                if (!manager.getFirstSetBuilder().isSymbolNullable(set.get(i))) {
                    break;
                }

                if (i == set.size() - 1) {
                    // beta consists only of nullable symbols, so C is added as well
                    firstSet.addAll(this.lookAhead);
                }
            }
        } else {
            // beta is empty: the look ahead set is C
            firstSet.addAll(this.lookAhead);
        }

        return firstSet;
    }

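Now that the LookAhead Set can be computed, every production added to a node while computing the closure must carry its LookAhead Set, so that the parse table can be derived later.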

    private void makeClosure() {
        ConsoleDebugColor.outlnPurple("==== state begin make closure sets ====");

        // Work list of productions whose closure still has to be expanded
        Stack<Production> productionStack = new Stack<>();
        for (Production production : productions) {
            productionStack.push(production);
        }

        while (!productionStack.isEmpty()) {
            Production production = productionStack.pop();
            ConsoleDebugColor.outlnPurple("production on top of stack is : ");
            production.debugPrint();
            production.debugPrintBeta();

            if (Token.isTerminal(production.getDotSymbol())) {
                ConsoleDebugColor.outlnPurple("Symbol after dot is not non-terminal, ignore and process next item");
                continue;
            }

            int symbol = production.getDotSymbol();
            ArrayList<Production> closures = productionManager.getProduction(symbol);
            // Look ahead set handed to every production of the symbol after the dot
            ArrayList<Integer> lookAhead = production.computeFirstSetOfBetaAndc();

            Iterator<Production> it = closures.iterator();
            while (it.hasNext()) {
                Production oldProduct = it.next();
                Production newProduct = oldProduct.cloneSelf();
                newProduct.addLookAheadSet(lookAhead);

                if (!closureSet.contains(newProduct)) {
                    closureSet.add(newProduct);
                    productionStack.push(newProduct);
                    removeRedundantProduction(newProduct);
                } else {
                    ConsoleDebugColor.outlnPurple("the production is already exist!");
                }
            }
        }

        debugPrintClosure();
        ConsoleDebugColor.outlnPurple("==== make closure sets end ====");
    }

removeRedundantProduction removes redundant productions. For example:

1. [t -> . t * f, {* EOI}]
2. [t -> . t * f, {EOI}]

Here production 1 can be considered to cover production 2.
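The source does not show how isCover is implemented. One plausible reading of "covers" is that both items share the same production and dot position and that the covering item's look ahead set is a strict superset of the other one. The sketch below follows that assumption. The class `Lr1Item` and its methods `sameCore` and `covers` are hypothetical and are not the project's Production class.

```java
import java.util.List;
import java.util.Set;

// Hypothetical LR(1) item: a production, a dot position and a look ahead set.
class Lr1Item {
    final String left;
    final List<String> right;
    final int dotPos;
    final Set<String> lookAhead;

    Lr1Item(String left, List<String> right, int dotPos, Set<String> lookAhead) {
        this.left = left;
        this.right = right;
        this.dotPos = dotPos;
        this.lookAhead = lookAhead;
    }

    // Same production and same dot position; look ahead sets are ignored.
    boolean sameCore(Lr1Item other) {
        return left.equals(other.left)
            && right.equals(other.right)
            && dotPos == other.dotPos;
    }

    // Assumed meaning of "covers": identical core and a strictly larger
    // look ahead set that contains every symbol of the other set.
    boolean covers(Lr1Item other) {
        return sameCore(other)
            && lookAhead.size() > other.lookAhead.size()
            && lookAhead.containsAll(other.lookAhead);
    }
}
```

Requiring a strict superset means an item never covers itself, so a freshly added item would not remove itself from the closure under this reading.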

    private void removeRedundantProduction(Production product) {
        boolean removeHappened = true;

        // Restart the scan after each removal, since the collection must not
        // be modified while its iterator is still in use
        while (removeHappened) {
            removeHappened = false;

            Iterator<Production> it = closureSet.iterator();
            while (it.hasNext()) {
                Production item = it.next();
                if (product.isCover(item)) {
                    removeHappened = true;
                    closureSet.remove(item);
                    break;
                }
            }
        }
    }

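Compressing the finite state automaton

At this point the LookAhead Set is available and the parse table can be computed correctly. One problem remains: the automaton has too many nodes, which hurts efficiency considerably. The next task is therefore to compress the finite state automaton.

The LR(1) automaton built so far produces more than 600 state nodes for the C grammar. The reason is that, when state nodes are constructed, two productions that are identical except for their LookAhead Sets are treated as different productions, so the nodes that contain them are treated as different nodes.

Below is the overridden equals for state nodes: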

    @Override
    public boolean equals(Object obj) {
        // Full comparison through Production.equals
        return checkProductionEqual(obj, false);
    }

    public boolean checkProductionEqual(Object obj, boolean isPartial) {
        ProductionsStateNode node = (ProductionsStateNode) obj;

        if (node.productions.size() != this.productions.size()) {
            return false;
        }

        // Count the productions of the other node that have a match in this node
        int equalCount = 0;
        for (int i = 0; i < node.productions.size(); i++) {
            for (int j = 0; j < this.productions.size(); j++) {
                if (!isPartial) {
                    if (node.productions.get(i).equals(this.productions.get(j))) {
                        equalCount++;
                        break;
                    }
                } else {
                    // Partial comparison through productionEquals, used to
                    // find similar nodes that can be merged
                    if (node.productions.get(i).productionEquals(this.productions.get(j))) {
                        equalCount++;
                        break;
                    }
                }
            }
        }

        return equalCount == node.productions.size();
    }

Nodes whose productions are identical but whose LookAhead Sets differ can therefore be merged, which reduces the number of nodes.
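The idea of merging can be shown in isolation on a simplified representation, where a state maps each production with its dot position, written as text, to its look ahead set. This is a sketch of the general technique of merging states with identical cores and is not the project's implementation. The class `StateMergeDemo` and all of its methods are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: merge states that contain the same productions
// and differ only in their look ahead sets.
class StateMergeDemo {
    // Two states are similar when they contain exactly the same cores.
    static boolean sameCore(Map<String, Set<String>> a, Map<String, Set<String>> b) {
        return a.keySet().equals(b.keySet());
    }

    // The merged state keeps every core and unions the look ahead sets.
    static Map<String, Set<String>> merge(Map<String, Set<String>> a, Map<String, Set<String>> b) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : a.entrySet()) {
            merged.put(entry.getKey(), new TreeSet<>(entry.getValue()));
        }
        for (Map.Entry<String, Set<String>> entry : b.entrySet()) {
            merged.computeIfAbsent(entry.getKey(), k -> new TreeSet<>()).addAll(entry.getValue());
        }
        return merged;
    }

    // Fold each state into the first earlier state with the same core.
    static List<Map<String, Set<String>>> compress(List<Map<String, Set<String>>> states) {
        List<Map<String, Set<String>>> result = new ArrayList<>();
        for (Map<String, Set<String>> state : states) {
            boolean mergedIn = false;
            for (int i = 0; i < result.size(); i++) {
                if (sameCore(result.get(i), state)) {
                    result.set(i, merge(result.get(i), state));
                    mergedIn = true;
                    break;
                }
            }
            if (!mergedIn) {
                result.add(state);
            }
        }
        return result;
    }
}
```

Merging by core is what LALR(1) table construction does. It shrinks the automaton considerably, at the cost that a merged state may in rare cases contain a reduce/reduce conflict that the unmerged LR(1) states did not have.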

The natural place to merge similar nodes is where nodes and the transitions between them are added:

    public void addTransition(ProductionsStateNode from, ProductionsStateNode to, int on) {
        /* Compress the finite state machine nodes */
        if (isTransitionTableCompressed) {
            from = getAndMergeSimilarStates(from);
            to = getAndMergeSimilarStates(to);
        }

        // Record the edge from --on--> to, creating the inner map on first use
        HashMap<Integer, ProductionsStateNode> map = transitionMap.get(from);
        if (map == null) {
            map = new HashMap<>();
        }

        map.put(on, to);
        transitionMap.put(from, map);
    }

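The logic of getAndMergeSimilarStates is simple: it walks through all current nodes, looks for a similar one, and merges the node with the larger number into the one with the smaller number.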

    private ProductionsStateNode getAndMergeSimilarStates(ProductionsStateNode node) {
        Iterator<ProductionsStateNode> it = stateList.iterator();
        ProductionsStateNode currentNode = null, returnNode = node;

        while (it.hasNext()) {
            currentNode = it.next();

            // Similar: not fully equal, but equal under the partial comparison
            if (!currentNode.equals(node) && currentNode.checkProductionEqual(node, true)) {
                // Keep the node with the smaller state number
                if (currentNode.stateNum < node.stateNum) {
                    currentNode.stateMerge(node);
                    returnNode = currentNode;
                } else {
                    node.stateMerge(currentNode);
                    returnNode = node;
                }
                break;
            }
        }

        if (!compressedStateList.contains(returnNode)) {
            compressedStateList.add(returnNode);
        }

        return returnNode;
    }

    public void stateMerge(ProductionsStateNode node) {
        if (!this.productions.contains(node.productions)) {
            for (int i = 0; i < node.productions.size(); i++) {
                // Collect productions of the other node that this node lacks
                if (!this.productions.contains(node.productions.get(i))
                        && !mergedProduction.contains(node.productions.get(i))) {
                    mergedProduction.add(node.productions.get(i));
                }
            }
        }
    }

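Summary

This part contains more code than any of the five parts so far, but the main points are:

Resolving shift/reduce conflicts
The key is to build look ahead sets, which tell whether the current input symbol can legally follow the non-terminal produced by a reduce
Compressing the nodes of the finite state automaton
Compression merges nodes whose productions are the same but whose look ahead sets differ

The next part is shorter: it builds the parse table and the table-driven parser, which completes the syntax analysis stage.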

The author's other blog, hosted on GitHub: https://dejavudwh.cn/

Reposted from: https://www.cnblogs.com/secoding/p/11369177.html
