lexer - Lexical Analyzer in Java


I have been trying to write a simple lexical analyzer in Java.

The file Token.java looks as follows:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public enum Token {
    TK_MINUS("-"),
    TK_PLUS("\\+"),
    TK_MUL("\\*"),
    TK_DIV("/"),
    TK_NOT("~"),
    TK_AND("&"),
    TK_OR("\\|"),
    TK_LESS("<"),
    TK_LEQ("<="),
    TK_GT(">"),
    TK_GEQ(">="),
    TK_EQ("=="),
    TK_ASSIGN("="),
    TK_OPEN("\\("),
    TK_CLOSE("\\)"),
    TK_SEMI(";"),
    TK_COMMA(","),
    TK_KEY_DEFINE("define"),
    TK_KEY_AS("as"),
    TK_KEY_IS("is"),
    TK_KEY_IF("if"),
    TK_KEY_THEN("then"),
    TK_KEY_ELSE("else"),
    TK_KEY_ENDIF("endif"),
    OPEN_BRACKET("\\{"),
    CLOSE_BRACKET("\\}"),
    DIFFERENT("<>"),
    STRING("\"[^\"]+\""),
    INTEGER("\\d"),
    IDENTIFIER("\\w+");

    private final Pattern pattern;

    Token(String regex) {
        // Anchor every token pattern at the start of the remaining input.
        pattern = Pattern.compile("^" + regex);
    }

    int endOfMatch(String s) {
        Matcher m = pattern.matcher(s);

        if (m.find()) {
            return m.end();
        }
        return -1;
    }
}

The lexer, Lexer.java, follows:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Stream;

public class Lexer {
    private StringBuilder input = new StringBuilder();
    private Token token;
    private String lexema;
    private boolean exausthed = false;
    private String errorMessage = "";
    private Set<Character> blankChars = new HashSet<Character>();

    public Lexer(String filePath) {
        try (Stream<String> st = Files.lines(Paths.get(filePath))) {
            st.forEach(input::append);
        } catch (IOException ex) {
            exausthed = true;
            errorMessage = "could not read file: " + filePath;
            return;
        }

        blankChars.add('\r');
        blankChars.add('\n');
        blankChars.add((char) 8);
        blankChars.add((char) 9);
        blankChars.add((char) 11);
        blankChars.add((char) 12);
        blankChars.add((char) 32);

        moveAhead();
    }

    public void moveAhead() {
        if (exausthed) {
            return;
        }

        if (input.length() == 0) {
            exausthed = true;
            return;
        }

        ignoreWhiteSpaces();

        if (findNextToken()) {
            return;
        }

        exausthed = true;

        if (input.length() > 0) {
            errorMessage = "unexpected symbol: '" + input.charAt(0) + "'";
        }
    }

    private void ignoreWhiteSpaces() {
        int charsToDelete = 0;

        while (blankChars.contains(input.charAt(charsToDelete))) {
            charsToDelete++;
        }

        if (charsToDelete > 0) {
            input.delete(0, charsToDelete);
        }
    }

    private boolean findNextToken() {
        // The first token constant (in declaration order) whose pattern matches wins.
        for (Token t : Token.values()) {
            int end = t.endOfMatch(input.toString());

            if (end != -1) {
                token = t;
                lexema = input.substring(0, end);
                input.delete(0, end);
                return true;
            }
        }

        return false;
    }

    public Token currentToken() {
        return token;
    }

    public String currentLexema() {
        return lexema;
    }

    public boolean isSuccessful() {
        return errorMessage.isEmpty();
    }

    public String errorMessage() {
        return errorMessage;
    }

    public boolean isExausthed() {
        return exausthed;
    }
}

And it can be tested with Try.java as follows:

public class Try {

    public static void main(String[] args) {
        Lexer lexer = new Lexer("c:/users/input.txt");

        System.out.println("Lexical Analysis");
        System.out.println("-----------------");
        while (!lexer.isExausthed()) {
            System.out.printf("%-18s :  %s \n", lexer.currentLexema(), lexer.currentToken());
            lexer.moveAhead();
        }

        if (lexer.isSuccessful()) {
            System.out.println("Ok! :D");
        } else {
            System.out.println(lexer.errorMessage());
        }
    }
}

Say input.txt contains:

define mine  a=1000; b=23.5; 

The output I expect is:

define : TK_KEYWORD
mine   : IDENTIFIER
a      : IDENTIFIER
=      : TK_ASSIGN
1000   : INTEGER
;      : TK_SEMI
b      : IDENTIFIER
=      : TK_ASSIGN
23.5   : REAL

But the issue I am facing is that it treats each digit as a separate token:

1 : INTEGER
0 : INTEGER
0 : INTEGER
0 : INTEGER

It also doesn't recognize real numbers. I get:

unexpected symbol: '.'
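
To narrow it down, the behaviour can be checked directly against the Token enum; a minimal probe (the class name Probe is only for illustration, and it assumes the class sits in the same package as Token so that endOfMatch is visible):

public class Probe {
    public static void main(String[] args) {
        // With INTEGER("\\d") the compiled pattern is "^\\d", so the match
        // stops after a single character.
        System.out.println(Token.INTEGER.endOfMatch("1000;"));   // prints 1, not 4
        // No constant in the enum matches a leading '.', which is where the
        // "unexpected symbol" error for 23.5 comes from.
        System.out.println(Token.IDENTIFIER.endOfMatch(".5;"));  // prints -1
    }
}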

What changes are needed to get the expected results?

Your pattern to match an integer is:

INTEGER("\\d"),

That matches exactly one digit.

If you want to match one or more digits, go for

INTEGER("\\d+"),

for example.
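
To illustrate the difference, the quantifier makes the anchored pattern that Token compiles consume the whole run of digits. A standalone sketch (the class name IntegerPatternDemo is made up for this example):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IntegerPatternDemo {
    public static void main(String[] args) {
        // "^" + "\\d+" is what the Token constructor compiles for INTEGER("\\d+").
        Matcher m = Pattern.compile("^\\d+").matcher("1000;");
        if (m.find()) {
            System.out.println(m.end());  // prints 4: "1000" becomes a single lexeme
        }
    }
}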

And, for completeness, you are missing another pattern for floating point numbers, such as

REAL("(\\d+)\\.\\d+")

as the comments pointed out. Or

REAL("(\\d*)\\.\\d+")

to allow

.23

too, if that is what you are looking for!
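
One caveat worth adding, since Lexer.findNextToken() returns the first enum constant whose pattern matches: declare REAL before INTEGER in Token, otherwise 23.5 is still cut after 23, because the INTEGER pattern gets to match first. A small standalone sketch of that first-match behaviour (the class name OrderDemo and the helper nextLexeme are made up for this illustration):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrderDemo {
    // Mimics Lexer.findNextToken(): the first anchored pattern that matches
    // wins and its match becomes the next lexeme.
    static String nextLexeme(String input, String... regexes) {
        for (String regex : regexes) {
            Matcher m = Pattern.compile("^" + regex).matcher(input);
            if (m.find()) {
                return input.substring(0, m.end());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // INTEGER before REAL: only "23" is consumed and the '.' is left behind.
        System.out.println(nextLexeme("23.5;", "\\d+", "(\\d+)\\.\\d+"));
        // REAL before INTEGER: the whole literal "23.5" is one lexeme.
        System.out.println(nextLexeme("23.5;", "(\\d+)\\.\\d+", "\\d+"));
    }
}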

