opennlp - OpenNLP: training a named entity model


I am training a model for named entity recognition, but it is not identifying all of the person names. Why?

My training data looks like:

<START:person> pierre vinken <END>  , 61 years old , join board nonexecutive director nov. 29 . nonexecutive  director has many similar responsibilities executive director.however, there no voting rights position. mr . <START:person> vinken <END> chairman of elsevier n.v., dutch publishing group.  former chairman of society  <START:person> rudolph agnew <END> assisting <START:person> vinken <END> in activities.  mr . <START:person> vinken <END> right person in industry. competitior <START:person> steve <END> vice chairman of himbeldon n.v., ericson publishing group. <START:person> vinken <END> assisted <START:person> angelina tucci <END>  has been recognized many times work.  <START:person> juilie <END>  vp of weterwood a.b., zs publishing group supported him. mr . <START:person> stewart <END> recruiter of metric c.d., drishti publishing. recruited <START:person> adam <END>  work on nlp  <START:person> vinken <END> . lead conference  appointing him director held <START:person> daniel smith <END> @ boston. 

The Java class that trains the model is:

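    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.charset.Charset;

    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.NameSample;
    import opennlp.tools.namefind.NameSampleDataStream;
    import opennlp.tools.namefind.TokenNameFinderFactory;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.util.MarkableFileInputStreamFactory;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    public class NamedEntityModel {
        public static void train(String inputFile, String modelFile) throws IOException {
            Charset charset = Charset.forName("UTF-8");
            MarkableFileInputStreamFactory factory =
                    new MarkableFileInputStreamFactory(new File(inputFile));
            // Read the annotated training file line by line.
            ObjectStream<String> lineStream = new PlainTextByLineStream(factory, charset);
            ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
            TokenNameFinderModel model = null;
            try {
                model = NameFinderME.train("en", "person", sampleStream,
                        TrainingParameters.defaultParams(), new TokenNameFinderFactory());
            } finally {
                sampleStream.close();
            }
            // Write the trained model to disk.
            BufferedOutputStream modelOut = null;
            try {
                modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
                model.serialize(modelOut);
            } finally {
                if (modelOut != null)
                    modelOut.close();
            }
        }
    }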

And this is how the main class looks:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Arrays;

    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.tokenize.WhitespaceTokenizer;
    import opennlp.tools.util.Span;

    public class NameFinder {
        public static void main(String[] args) throws IOException {
            String inputFile = "c:/setup/apache-opennlp-1.7.2/bin/ner_training_data.txt";
            String modelFile = "c:/setup/apache-opennlp-1.7.2/bin/en-tr-ner-person.bin";

            NamedEntityModel.train(inputFile, modelFile);

            String sentence = "pierre vinken , 61 years old , join board nonexecutive director nov. 29 . mr . vinken chairman of elsevier n.v. , dutch publishing group. rudolph agnew , 55 years old , former chairman of consolidated gold fields plc , named director of british industrial conglomerate . peter on leave today . "
                    + "steve competitor . daniel smith lead ceremony. kristen svery happpy know it. thomas u please matter ruby busy";

            WhitespaceTokenizer whitespaceTokenizer = WhitespaceTokenizer.INSTANCE;

            // Tokenize the given paragraph
            String tokens[] = whitespaceTokenizer.tokenize(sentence);

            for (String str : tokens)
                System.out.println(str);

            // Load the model that was just trained; the stream is closed after loading.
            TokenNameFinderModel model;
            try (InputStream inputStreamNameFinder = new FileInputStream(modelFile)) {
                model = new TokenNameFinderModel(inputStreamNameFinder);
            }

            NameFinderME nameFinder = new NameFinderME(model);
            Span nameSpans[] = nameFinder.find(tokens);

            System.out.println(Arrays.toString(Span.spansToStrings(nameSpans, tokens)));

            for (Span s : nameSpans)
                System.out.println(s.toString() + "  " + tokens[s.getStart()]);
        }
    }

And the output is:

[pierre vinken, vinken, peter, steve, daniel smith, kristen, thomas] 

This trained model is not able to recognize the names rudolph agnew and ruby. How can I train it so that it recognizes names more accurately?

+1 to the answer of @caffeinator13. Also, there are params (https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/trainingparameters.html) (the link is to an older version, but I guess these params still exist in more recent versions) that control the number of iterations and (perhaps more relevant to you) the cutoff, i.e. the number of times an entity has to appear in the training data to be considered for recognition. This setting more or less controls precision vs. recall, and maybe you should set it to be a bit more lenient (not sure what the default is again). Instead of using defaultParams, try:

    TrainingParameters tp = new TrainingParameters();
    tp.put(TrainingParameters.CUTOFF_PARAM, "1");
    tp.put(TrainingParameters.ITERATIONS_PARAM, "100");
    TokenNameFinderFactory tnff = new TokenNameFinderFactory();
    model = NameFinderME.train(language, modelName, sampleStream, tp, tnff);
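To get a feel for why a name such as rudolph agnew can be missed, it may help to count how often each annotated name occurs in the training data. The sketch below is plain Java with no OpenNLP dependency. The class `NameMentionCounter`, the method `countMentions` and the threshold of 5 are made up for illustration. It is only a rough proxy: as far as I understand, OpenNLP applies the cutoff to features rather than to whole names, so treat the counts as a hint about sparse data, not as a reproduction of what the trainer does.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP.
final class NameMentionCounter {

    // Matches one annotated span written as <START:person> ... <END>.
    private static final Pattern SPAN =
            Pattern.compile("<START:person>\\s+(.+?)\\s+<END>");

    // Returns each annotated name with the number of times it is annotated,
    // in order of first appearance.
    static Map<String, Integer> countMentions(String trainingText) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = SPAN.matcher(trainingText);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A shortened excerpt of the training data from the question.
        String sample = "<START:person> pierre vinken <END> , 61 years old . "
                + "mr . <START:person> vinken <END> chairman of elsevier n.v. "
                + "<START:person> rudolph agnew <END> assisting <START:person> vinken <END> .";

        int cutoff = 5; // assumed threshold, chosen only for this illustration

        countMentions(sample).forEach((name, n) ->
                System.out.println(name + " -> " + n
                        + (n < cutoff ? " (below cutoff " + cutoff + ")" : "")));
    }
}
```

In the training text from the question, almost every name is annotated only once. That suggests the more durable fix is more, and more varied, training sentences, with a lower cutoff as a stopgap.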
