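Table of contents
- What is ANTLR?
- A first example
- The ANTLR4 workflow
- Lua script syntax validation
- Prepare a Lua grammar file
- Maven configuration
- Create the entity classes
- Lua syntax visitor
- Syntax error listener
- Unit test
- References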
What is ANTLR?
https://www.antlr.org/
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.
A first example
https://github.com/antlr/antlr4/blob/master/doc/getting-started.md#a-first-example
- Create a file named Hello.g4:
// Define a grammar called Hello
grammar Hello;
r : 'hello' ID ; // match keyword hello followed by an identifier
ID : [a-z]+ ; // match lower-case identifiers
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
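- Install the IntelliJ IDEA plugin ANTLR v4: https://plugins.jetbrains.com/plugin/7358-antlr-v4
- Open ANTLR Preview: right-click the line r : 'hello' ID ; and choose Test Rule r.
Enter hello world and the ID is correctly recognized as world.
Enter hello World and the ID is no longer recognized as world, because the ID rule [a-z]+ matches only lower-case letters.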
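To see why the second input is rejected, it can help to mimic rule r outside ANTLR. The following is only a rough sketch in plain Java, not ANTLR-generated code: the class name HelloRuleSketch and the method matchId are hypothetical, and a regular expression stands in for the lexer and parser. It also ignores ANTLR's error recovery and requires the whole input to match, which the real rule r (it has no EOF) does not.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloRuleSketch {
    // 'hello', then skipped whitespace (the WS rule), then ID : [a-z]+
    private static final Pattern RULE_R =
            Pattern.compile("[ \\t\\r\\n]*hello[ \\t\\r\\n]+([a-z]+)[ \\t\\r\\n]*");

    /** Returns the ID text if the input matches rule r, otherwise empty. */
    public static Optional<String> matchId(String input) {
        Matcher m = RULE_R.matcher(input);
        if (!m.matches()) {
            return Optional.empty();
        }
        String id = m.group(1);
        // In the real lexer the literal 'hello' takes priority over ID,
        // so a second "hello" is a keyword token, not an identifier.
        return "hello".equals(id) ? Optional.empty() : Optional.of(id);
    }

    public static void main(String[] args) {
        System.out.println(matchId("hello world")); // Optional[world]
        System.out.println(matchId("hello World")); // Optional.empty
    }
}
```

The upper-case W falls outside [a-z], so no identifier can start at that position and the match fails.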
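The ANTLR4 workflow
- Lexer: converts a sequence of characters into tokens. The lexer is usually called by the parser.
- Parser: usually appears as part of a compiler or interpreter. It checks the syntax and builds a data structure (a syntax tree) from the input tokens. The parser typically uses the lexer to split the input character stream into individual tokens and takes the resulting token stream as its input. In practice a parser can be written by hand or generated by a tool.
- Parse tree: a tree-shaped representation of the syntactic structure of the source code. ANTLR's tree is a parse tree that mirrors the grammar rules, although it is often loosely called an abstract syntax tree. Such a tree can be used for syntax checking, code style checking, formatting, highlighting, error reporting, and autocompletion.
In the workflow diagram, the dotted flow on the left shows ANTLR4 turning the original .g4 rules into a Lexer, Parser, Listener, and Visitor. The dashed flow on the right shows the Lexer turning the raw input stream into Tokens, the Parser turning the Tokens into a parse tree, and finally a Listener or Visitor walking the ParseTree to produce the final result.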
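The runtime half of that workflow (characters to tokens, tokens to tree, tree walk to result) can be sketched by hand for the tiny Hello grammar. This is an illustrative sketch in plain Java under simplifying assumptions, not what ANTLR generates: the names PipelineSketch, Token, Node, lex, parseR, and visit are all hypothetical, and the lexer simply splits on whitespace.

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {
    // A token: its type name plus the matched text.
    static final class Token {
        final String type;
        final String text;
        Token(String type, String text) { this.type = type; this.text = text; }
    }

    // A tree node: a rule name or a token text, plus child nodes.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    // Stage 1, lexer: characters -> tokens (whitespace is skipped).
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        for (String word : input.trim().split("[ \\t\\r\\n]+")) {
            if (word.isEmpty()) continue;
            if (word.equals("hello")) tokens.add(new Token("HELLO", word));
            else if (word.matches("[a-z]+")) tokens.add(new Token("ID", word));
            else throw new IllegalArgumentException("token recognition error at: " + word);
        }
        return tokens;
    }

    // Stage 2, parser: tokens -> tree for rule r : 'hello' ID ;
    static Node parseR(List<Token> tokens) {
        if (tokens.size() < 2
                || !tokens.get(0).type.equals("HELLO")
                || !tokens.get(1).type.equals("ID")) {
            throw new IllegalArgumentException("syntax error: expected 'hello' ID");
        }
        Node r = new Node("r");
        r.children.add(new Node(tokens.get(0).text));
        r.children.add(new Node(tokens.get(1).text));
        return r;
    }

    // Stage 3, tree walk: visit each node depth-first and render it LISP-style.
    static String visit(Node node) {
        if (node.children.isEmpty()) return node.label;
        StringBuilder sb = new StringBuilder("(").append(node.label);
        for (Node child : node.children) sb.append(' ').append(visit(child));
        return sb.append(')').toString();
    }

    public static String run(String input) {
        return visit(parseR(lex(input)));
    }

    public static void main(String[] args) {
        System.out.println(run("hello world")); // (r hello world)
    }
}
```

With ANTLR the first two stages come from the generated Lexer and Parser classes, and the third is written against the generated Listener or Visitor interface.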
Lua script syntax validation
Prepare a Lua grammar file
https://github.com/antlr/grammars-v4/tree/master/lua
/*
BSD License

Copyright (c) 2013, Kazunori Sakamoto
Copyright (c) 2016, Alexander Alexeev
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. Neither the NAME of Rainer Schuster nor the NAMEs of its contributors
   may be used to endorse or promote products derived from this software
   without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

This grammar file derived from:

    Lua 5.3 Reference Manual
    http://www.lua.org/manual/5.3/manual.html

    Lua 5.2 Reference Manual
    http://www.lua.org/manual/5.2/manual.html

    Lua 5.1 grammar written by Nicolai Mainiero
    http://www.antlr3.org/grammar/1178608849736/Lua.g

Tested by Kazunori Sakamoto with Test suite for Lua 5.2 (http://www.lua.org/tests/5.2/)

Tested by Alexander Alexeev with Test suite for Lua 5.3 http://www.lua.org/tests/lua-5.3.2-tests.tar.gz
*/

grammar Lua;

chunk
    : block EOF
    ;

block
    : stat* retstat?
    ;

stat
    : ';'
    | varlist '=' explist
    | functioncall
    | label
    | 'break'
    | 'goto' NAME
    | 'do' block 'end'
    | 'while' exp 'do' block 'end'
    | 'repeat' block 'until' exp
    | 'if' exp 'then' block ('elseif' exp 'then' block)* ('else' block)? 'end'
    | 'for' NAME '=' exp ',' exp (',' exp)? 'do' block 'end'
    | 'for' namelist 'in' explist 'do' block 'end'
    | 'function' funcname funcbody
    | 'local' 'function' NAME funcbody
    | 'local' attnamelist ('=' explist)?
    ;

attnamelist
    : NAME attrib (',' NAME attrib)*
    ;

attrib
    : ('<' NAME '>')?
    ;

retstat
    : 'return' explist? ';'?
    ;

label
    : '::' NAME '::'
    ;

funcname
    : NAME ('.' NAME)* (':' NAME)?
    ;

varlist
    : var_ (',' var_)*
    ;

namelist
    : NAME (',' NAME)*
    ;

explist
    : exp (',' exp)*
    ;

exp
    : 'nil' | 'false' | 'true'
    | number
    | string
    | '...'
    | functiondef
    | prefixexp
    | tableconstructor
    | <assoc=right> exp operatorPower exp
    | operatorUnary exp
    | exp operatorMulDivMod exp
    | exp operatorAddSub exp
    | <assoc=right> exp operatorStrcat exp
    | exp operatorComparison exp
    | exp operatorAnd exp
    | exp operatorOr exp
    | exp operatorBitwise exp
    ;

prefixexp
    : varOrExp nameAndArgs*
    ;

functioncall
    : varOrExp nameAndArgs+
    ;

varOrExp
    : var_ | '(' exp ')'
    ;

var_
    : (NAME | '(' exp ')' varSuffix) varSuffix*
    ;

varSuffix
    : nameAndArgs* ('[' exp ']' | '.' NAME)
    ;

nameAndArgs
    : (':' NAME)? args
    ;

/*
var_
    : NAME | prefixexp '[' exp ']' | prefixexp '.' NAME
    ;

prefixexp
    : var_ | functioncall | '(' exp ')'
    ;

functioncall
    : prefixexp args | prefixexp ':' NAME args
    ;
*/

args
    : '(' explist? ')' | tableconstructor | string
    ;

functiondef
    : 'function' funcbody
    ;

funcbody
    : '(' parlist? ')' block 'end'
    ;

parlist
    : namelist (',' '...')? | '...'
    ;

tableconstructor
    : '{' fieldlist? '}'
    ;

fieldlist
    : field (fieldsep field)* fieldsep?
    ;

field
    : '[' exp ']' '=' exp | NAME '=' exp | exp
    ;

fieldsep
    : ',' | ';'
    ;

operatorOr
    : 'or';

operatorAnd
    : 'and';

operatorComparison
    : '<' | '>' | '<=' | '>=' | '~=' | '==';

operatorStrcat
    : '..';

operatorAddSub
    : '+' | '-';

operatorMulDivMod
    : '*' | '/' | '%' | '//';

operatorBitwise
    : '&' | '|' | '~' | '<<' | '>>';

operatorUnary
    : 'not' | '#' | '-' | '~';

operatorPower
    : '^';

number
    : INT | HEX | FLOAT | HEX_FLOAT
    ;

string
    : NORMALSTRING | CHARSTRING | LONGSTRING
    ;

// LEXER

NAME
    : [a-zA-Z_][a-zA-Z_0-9]*
    ;

NORMALSTRING
    : '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;

CHARSTRING
    : '\'' ( EscapeSequence | ~('\''|'\\') )* '\''
    ;

LONGSTRING
    : '[' NESTED_STR ']'
    ;

fragment
NESTED_STR
    : '=' NESTED_STR '='
    | '[' .*? ']'
    ;

INT
    : Digit+
    ;

HEX
    : '0' [xX] HexDigit+
    ;

FLOAT
    : Digit+ '.' Digit* ExponentPart?
    | '.' Digit+ ExponentPart?
    | Digit+ ExponentPart
    ;

HEX_FLOAT
    : '0' [xX] HexDigit+ '.' HexDigit* HexExponentPart?
    | '0' [xX] '.' HexDigit+ HexExponentPart?
    | '0' [xX] HexDigit+ HexExponentPart
    ;

fragment
ExponentPart
    : [eE] [+-]? Digit+
    ;

fragment
HexExponentPart
    : [pP] [+-]? Digit+
    ;

fragment
EscapeSequence
    : '\\' [abfnrtvz"'\\]
    | '\\' '\r'? '\n'
    | DecimalEscape
    | HexEscape
    | UtfEscape
    ;

fragment
DecimalEscape
    : '\\' Digit
    | '\\' Digit Digit
    | '\\' [0-2] Digit Digit
    ;

fragment
HexEscape
    : '\\' 'x' HexDigit HexDigit
    ;

fragment
UtfEscape
    : '\\' 'u{' HexDigit+ '}'
    ;

fragment
Digit
    : [0-9]
    ;

fragment
HexDigit
    : [0-9a-fA-F]
    ;

COMMENT
    : '--[' NESTED_STR ']' -> channel(HIDDEN)
    ;

LINE_COMMENT
    : '--'
    (                                               // --
    | '[' '='*                                      // --[==
    | '[' '='* ~('='|'['|'\r'|'\n') ~('\r'|'\n')*   // --[==AA
    | ~('['|'\r'|'\n') ~('\r'|'\n')*                // --AAA
    ) ('\r\n'|'\r'|'\n'|EOF)
    -> channel(HIDDEN)
    ;

WS
    : [ \t\u000C\r\n]+ -> skip
    ;

SHEBANG
    : '#' '!' ~('\n'|'\r')* -> channel(HIDDEN)
    ;
Maven configuration
Note for JDK 8 users: the highest ANTLR4 version you can use is 4.9.3, for the following reason.
Source: https://github.com/antlr/antlr4/releases/tag/4.10
Increasing minimum java version
Going forward, we are using Java 11 for the source code and the compiled .class files for the ANTLR tool. The Java runtime target, however, and the associated runtime tests use Java 8 (bumping up from Java 7).
<dependencies>
    <dependency>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4-runtime</artifactId>
        <version>${antlr.version}</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.antlr</groupId>
            <artifactId>antlr4-maven-plugin</artifactId>
            <version>${antlr.version}</version>
            <configuration>
                <visitor>true</visitor>
                <listener>true</listener>
            </configuration>
            <executions>
                <execution>
                    <goals>
                        <goal>antlr4</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

<properties>
    <!-- https://mvnrepository.com/artifact/org.antlr/antlr4-runtime -->
    <antlr.version>4.9.3</antlr.version>
    <mojo.version>3.0.0</mojo.version>
</properties>
Create the entity classes
Syntax error: the error reported for a single line.
package com.baeldung.antlr.lua.model;

/**
 * A syntax error.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class SyntaxErrorEntry {

    private Integer lineNum;
    private String errorInfo;

    public Integer getLineNum() {
        return lineNum;
    }

    public void setLineNum(Integer lineNum) {
        this.lineNum = lineNum;
    }

    public String getErrorInfo() {
        return errorInfo;
    }

    public void setErrorInfo(String errorInfo) {
        this.errorInfo = errorInfo;
    }
}
Syntax error report: the collection of errors reported across all lines.
package com.baeldung.antlr.lua.model;

import java.util.LinkedList;
import java.util.List;

/**
 * Syntax error report.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class SyntaxErrorReportEntry {

    private final List<SyntaxErrorEntry> syntaxErrorList = new LinkedList<>();

    public void addError(int line, int charPositionInLine, Object offendingSymbol, String msg) {
        SyntaxErrorEntry syntaxErrorEntry = new SyntaxErrorEntry();
        syntaxErrorEntry.setLineNum(line);
        syntaxErrorEntry.setErrorInfo("line " + line + ", column " + charPositionInLine
                + ", at symbol " + offendingSymbol + ", syntax error: " + msg);
        syntaxErrorList.add(syntaxErrorEntry);
    }

    public List<SyntaxErrorEntry> getSyntaxErrorReport() {
        return syntaxErrorList;
    }
}
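Because each entry carries its line number, the report can be regrouped to answer "which errors does each line have". The sketch below is one possible way to do that, not part of the original project: ErrorReportSketch, Entry, Report, and byLine are hypothetical names, the nested classes are trimmed-down stand-ins for the two entity classes so that the example compiles on its own, and the messages are invented placeholders rather than real ANTLR output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ErrorReportSketch {
    // Stand-in for SyntaxErrorEntry: one error on one line.
    static final class Entry {
        final int lineNum;
        final String errorInfo;
        Entry(int lineNum, String errorInfo) {
            this.lineNum = lineNum;
            this.errorInfo = errorInfo;
        }
    }

    // Stand-in for SyntaxErrorReportEntry: collects one entry per reported error.
    static final class Report {
        private final List<Entry> entries = new ArrayList<>();

        void addError(int line, int column, Object symbol, String msg) {
            entries.add(new Entry(line,
                    "line " + line + ", column " + column + ", at symbol " + symbol + ": " + msg));
        }

        List<Entry> getEntries() {
            return entries;
        }
    }

    /** Groups the collected messages by line number, in ascending line order. */
    public static Map<Integer, List<String>> byLine(Report report) {
        return report.getEntries().stream().collect(Collectors.groupingBy(
                e -> e.lineNum,
                TreeMap::new,
                Collectors.mapping(e -> e.errorInfo, Collectors.toList())));
    }

    /** Builds a small report from placeholder errors and groups it. */
    public static Map<Integer, List<String>> demo() {
        Report report = new Report();
        report.addError(3, 0, "x", "placeholder message C");
        report.addError(1, 4, "!", "placeholder message A");
        report.addError(1, 9, "y", "placeholder message B");
        return byLine(report);
    }

    public static void main(String[] args) {
        demo().forEach((line, messages) -> System.out.println(line + " -> " + messages));
    }
}
```

A TreeMap keeps the lines sorted, which suits a report that is read from the top of the script to the bottom.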
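Lua syntax visitor
package com.baeldung.antlr.lua;

import com.baeldung.antlr.LuaParser;
import com.baeldung.antlr.LuaVisitor;
import org.antlr.v4.runtime.tree.ErrorNode;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.RuleNode;
import org.antlr.v4.runtime.tree.TerminalNode;

/**
 * Lua syntax visitor.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class LuaSyntaxVisitor implements LuaVisitor<Object> {
    // Press Ctrl+O in IDEA and choose Override to generate every LuaVisitor method.
    // The class does not compile until those methods exist; extending the generated
    // LuaBaseVisitor<Object> instead would supply default implementations.
}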
Syntax error listener
package com.baeldung.antlr.lua;

import com.baeldung.antlr.lua.model.SyntaxErrorReportEntry;
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

/**
 * Syntax error listener.
 *
 * @author duhongming
 * @since 1.0.0
 */
public class SyntaxErrorListener extends BaseErrorListener {

    private final SyntaxErrorReportEntry reporter;

    public SyntaxErrorListener(SyntaxErrorReportEntry reporter) {
        this.reporter = reporter;
    }

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer,
                            Object offendingSymbol, int line, int charPositionInLine,
                            String msg, RecognitionException e) {
        // Forward every reported error to the report object
        this.reporter.addError(line, charPositionInLine, offendingSymbol, msg);
    }
}
Unit test
package com.baeldung.antlr;

import com.baeldung.antlr.lua.LuaSyntaxVisitor;
import com.baeldung.antlr.lua.SyntaxErrorListener;
import com.baeldung.antlr.lua.model.SyntaxErrorEntry;
import com.baeldung.antlr.lua.model.SyntaxErrorReportEntry;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.junit.Test;

import java.util.List;

import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

public class LuaSyntaxErrorUnitTest {

    public static List<SyntaxErrorEntry> judgeLuaSyntax(String luaScript) {
        // Create a CharStream that reads the script text
        CharStream charStreams = CharStreams.fromString(luaScript);
        // The lexer groups the input characters into tokens
        LuaLexer luaLexer = new LuaLexer(charStreams);
        // Token buffer that stores the tokens produced by the lexer
        CommonTokenStream tokenStream = new CommonTokenStream(luaLexer);
        // The parser analyses the tokens held in the token buffer
        LuaParser luaParser = new LuaParser(tokenStream);

        // Collect the parser's syntax errors in a report
        SyntaxErrorReportEntry syntaxErrorReporter = new SyntaxErrorReportEntry();
        SyntaxErrorListener errorListener = new SyntaxErrorListener(syntaxErrorReporter);
        luaParser.addErrorListener(errorListener);

        // Parse from the start rule and walk the resulting tree
        LuaSyntaxVisitor luaSyntaxVisitor = new LuaSyntaxVisitor();
        luaSyntaxVisitor.visit(luaParser.chunk());

        return syntaxErrorReporter.getSyntaxErrorReport();
    }

    @Test
    public void testGood() throws Exception {
        List<SyntaxErrorEntry> errorEntryList = judgeLuaSyntax("if a~=1 then print(1) end");
        assertThat(errorEntryList.size(), is(0));
    }

    @Test
    public void testBad() throws Exception {
        // Lua writes inequality as ~=, so != is a syntax error
        List<SyntaxErrorEntry> errorEntryList = judgeLuaSyntax("if a!=1 then print(1) end");
        assertThat(errorEntryList.size(), is(2));
    }
}
The final directory layout and the unit test results.
References
https://www.baeldung.com/java-antlr
https://juejin.cn/post/7018521754125467661
https://www.nosuchfield.com/2023/08/26/ANTLR4-from-Beginning-to-Practice/
https://blog.csdn.net/qq_37771475/article/details/106387201
https://blog.csdn.net/qq_37771475/article/details/106426327