lexer - Lexical Analyser in Java
I have been trying to write a simple lexical analyzer in Java.
The file Token.java looks as follows:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public enum Token {
    TK_MINUS ("-"),
    TK_PLUS ("\\+"),
    TK_MUL ("\\*"),
    TK_DIV ("/"),
    TK_NOT ("~"),
    TK_AND ("&"),
    TK_OR ("\\|"),
    TK_LESS ("<"),
    TK_LEG ("<="),
    TK_GT (">"),
    TK_GEQ (">="),
    TK_EQ ("=="),
    TK_ASSIGN ("="),
    TK_OPEN ("\\("),
    TK_CLOSE ("\\)"),
    TK_SEMI (";"),
    TK_COMMA (","),
    TK_KEY_DEFINE ("define"),
    TK_KEY_AS ("as"),
    TK_KEY_IS ("is"),
    TK_KEY_IF ("if"),
    TK_KEY_THEN ("then"),
    TK_KEY_ELSE ("else"),
    TK_KEY_ENDIF ("endif"),
    OPEN_BRACKET ("\\{"),
    CLOSE_BRACKET ("\\}"),
    DIFFERENT ("<>"),
    STRING ("\"[^\"]+\""),
    INTEGER ("\\d"),
    IDENTIFIER ("\\w+");

    private final Pattern pattern;

    Token(String regex) {
        // Anchor every pattern at the start of the remaining input.
        pattern = Pattern.compile("^" + regex);
    }

    // Returns the index just past the match, or -1 if there is no match.
    int endOfMatch(String s) {
        Matcher m = pattern.matcher(s);
        if (m.find()) {
            return m.end();
        }
        return -1;
    }
}
The lexer follows, in Lexer.java:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Stream;

public class Lexer {
    private StringBuilder input = new StringBuilder();
    private Token token;
    private String lexema;
    private boolean exausthed = false;
    private String errorMessage = "";
    private Set<Character> blankChars = new HashSet<Character>();

    public Lexer(String filePath) {
        try (Stream<String> st = Files.lines(Paths.get(filePath))) {
            st.forEach(input::append);
        } catch (IOException ex) {
            exausthed = true;
            errorMessage = "could not read file: " + filePath;
            return;
        }

        blankChars.add('\r');
        blankChars.add('\n');
        blankChars.add((char) 8);
        blankChars.add((char) 9);
        blankChars.add((char) 11);
        blankChars.add((char) 12);
        blankChars.add((char) 32);

        moveAhead();
    }

    public void moveAhead() {
        if (exausthed) {
            return;
        }
        if (input.length() == 0) {
            exausthed = true;
            return;
        }
        ignoreWhiteSpaces();
        if (findNextToken()) {
            return;
        }
        exausthed = true;
        if (input.length() > 0) {
            errorMessage = "unexpected symbol: '" + input.charAt(0) + "'";
        }
    }

    private void ignoreWhiteSpaces() {
        int charsToDelete = 0;
        while (blankChars.contains(input.charAt(charsToDelete))) {
            charsToDelete++;
        }
        if (charsToDelete > 0) {
            input.delete(0, charsToDelete);
        }
    }

    // Tries the tokens in declaration order and takes the first one that matches.
    private boolean findNextToken() {
        for (Token t : Token.values()) {
            int end = t.endOfMatch(input.toString());
            if (end != -1) {
                token = t;
                lexema = input.substring(0, end);
                input.delete(0, end);
                return true;
            }
        }
        return false;
    }

    public Token currentToken() {
        return token;
    }

    public String currentLexema() {
        return lexema;
    }

    public boolean isSuccessful() {
        return errorMessage.isEmpty();
    }

    public String errorMessage() {
        return errorMessage;
    }

    public boolean isExausthed() {
        return exausthed;
    }
}
It can be tested with Try.java as follows:
public class Try {
    public static void main(String[] args) {
        Lexer lexer = new Lexer("c:/users/input.txt");

        System.out.println("lexical analysis");
        System.out.println("-----------------");
        while (!lexer.isExausthed()) {
            System.out.printf("%-18s : %s \n", lexer.currentLexema(), lexer.currentToken());
            lexer.moveAhead();
        }

        if (lexer.isSuccessful()) {
            System.out.println("ok! :d");
        } else {
            System.out.println(lexer.errorMessage());
        }
    }
}
Say input.txt contains:
define mine a=1000; b=23.5;
The output I expect is:
define : TK_KEYWORD
mine : IDENTIFIER
a : IDENTIFIER
= : TK_ASSIGN
1000 : INTEGER
; : TK_SEMI
b : IDENTIFIER
= : TK_ASSIGN
23.5 : REAL
But the issue I am facing is that it treats each digit as a separate integer:
1 : INTEGER
0 : INTEGER
0 : INTEGER
0 : INTEGER
Also, it doesn't recognize real numbers. I get:
unexpected symbol: '.'
What changes are needed to get the expected results?
Your pattern to match an integer is:
INTEGER ("\\d"),
That matches exactly one digit.
If you want more than one, go for
INTEGER ("\\d+"),
for example.
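To see the difference in isolation, here is a minimal sketch. The class `DigitPatternDemo` and its static helper are hypothetical names made up for this illustration; the helper only mirrors what `endOfMatch` in the question does, by anchoring the regex at the start of the input and returning where the match ends.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original Token/Lexer code.
class DigitPatternDemo {

    // Same idea as Token.endOfMatch: anchor the regex at the start of the
    // input and return the index just past the match, or -1 if none.
    static int endOfMatch(String regex, String input) {
        Matcher m = Pattern.compile("^" + regex).matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String input = "1000;";
        System.out.println(endOfMatch("\\d", input));   // prints 1: only the first digit is consumed
        System.out.println(endOfMatch("\\d+", input));  // prints 4: the whole run of digits is consumed
    }
}
```

With `\\d` the lexer cuts off one character per call, which is why `1000` comes out as four separate tokens.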
And, for completeness, you are missing another pattern for floating point numbers, like
REAL ("(\\d+)\\.\\d+")
as the comments pointed out. Or
REAL ("(\\d*)\\.\\d+")
to allow
.23
too - if that is what you are looking for!
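One thing to watch out for when adding it: the lexer in the question takes the first enum constant whose pattern matches, so where the new constant is declared matters. If REAL comes after INTEGER, the input 23.5 would still be split into the integer 23 followed by an unexpected '.'. The sketch below uses a cut-down, hypothetical enum (`NumToken`, inside a made-up `NumberOrderDemo` class) with REAL declared first, and assumes the same first-match loop as in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, cut-down version of the Token enum, used only for illustration.
class NumberOrderDemo {

    enum NumToken {
        REAL("(\\d*)\\.\\d+"),  // declared first so "23.5" is tried as REAL before INTEGER
        INTEGER("\\d+");

        private final Pattern pattern;

        NumToken(String regex) {
            pattern = Pattern.compile("^" + regex);
        }

        int endOfMatch(String s) {
            Matcher m = pattern.matcher(s);
            return m.find() ? m.end() : -1;
        }
    }

    // First-match scan in declaration order, mirroring the loop in the lexer.
    // Returns "TOKEN:lexeme", or null if nothing matches.
    static String firstToken(String input) {
        for (NumToken t : NumToken.values()) {
            int end = t.endOfMatch(input);
            if (end != -1) {
                return t + ":" + input.substring(0, end);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstToken("23.5;"));  // prints REAL:23.5
        System.out.println(firstToken("1000;"));  // prints INTEGER:1000
        System.out.println(firstToken(".23"));    // prints REAL:.23
    }
}
```

In the full Token enum this would mean placing REAL above INTEGER (and both above IDENTIFIER, since `\\w+` also matches digits).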