OpenNLP - training a named entity recognition model
I am training a model for named entity recognition, but it is not identifying some person names. Why?
My training data looks like this:
<START:person> pierre vinken <END> , 61 years old , join board nonexecutive director nov. 29 . nonexecutive director has many similar responsibilities executive director.however, there no voting rights position. mr . <START:person> vinken <END> chairman of elsevier n.v., dutch publishing group. former chairman of society <START:person> rudolph agnew <END> assisting <START:person> vinken <END> in activities. mr . <START:person> vinken <END> right person in industry. competitior <START:person> steve <END> vice chairman of himbeldon n.v., ericson publishing group. <START:person> vinken <END> assisted <START:person> angelina tucci <END> has been recognized many times work. <START:person> juilie <END> vp of weterwood a.b., zs publishing group supported him. mr . <START:person> stewart <END> recruiter of metric c.d., drishti publishing. recruited <START:person> adam <END> work on nlp <START:person> vinken <END> . lead conference appointing him director held <START:person> daniel smith <END> @ boston.
The Java class that trains the model is:
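    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.charset.Charset;

    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.NameSample;
    import opennlp.tools.namefind.NameSampleDataStream;
    import opennlp.tools.namefind.TokenNameFinderFactory;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.util.MarkableFileInputStreamFactory;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    public class NamedEntityModel {
        public static void train(String inputFile, String modelFile) throws IOException {
            Charset charset = Charset.forName("UTF-8");
            MarkableFileInputStreamFactory factory =
                    new MarkableFileInputStreamFactory(new File(inputFile));
            ObjectStream<String> lineStream = new PlainTextByLineStream(factory, charset);
            ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

            // Train a "person" name finder with the default training parameters.
            TokenNameFinderModel model = null;
            try {
                model = NameFinderME.train("en", "person", sampleStream,
                        TrainingParameters.defaultParams(), new TokenNameFinderFactory());
            } finally {
                sampleStream.close();
            }

            // Write the trained model to disk.
            BufferedOutputStream modelOut = null;
            try {
                modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
                model.serialize(modelOut);
            } finally {
                if (modelOut != null) modelOut.close();
            }
        }
    }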
And this is how the main class looks:
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Arrays;

    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.tokenize.WhitespaceTokenizer;
    import opennlp.tools.util.Span;

    public class NameFinder {
        public static void main(String[] args) throws IOException {
            String inputFile = "c:/setup/apache-opennlp-1.7.2/bin/ner_training_data.txt";
            String modelFile = "c:/setup/apache-opennlp-1.7.2/bin/en-tr-ner-person.bin";
            NamedEntityModel.train(inputFile, modelFile);

            String sentence = "pierre vinken , 61 years old , join board nonexecutive director nov. 29 . mr . vinken chairman of elsevier n.v. , dutch publishing group. rudolph agnew , 55 years old , former chairman of consolidated gold fields plc , named director of british industrial conglomerate . peter on leave today . "
                    + "steve competitor . daniel smith lead ceremony. kristen svery happpy know it. thomas u please matter ruby busy";

            // Tokenize the given paragraph on whitespace.
            WhitespaceTokenizer whitespaceTokenizer = WhitespaceTokenizer.INSTANCE;
            String[] tokens = whitespaceTokenizer.tokenize(sentence);
            for (String str : tokens)
                System.out.println(str);

            // Load the trained model; the stream is closed once the model is read.
            TokenNameFinderModel model;
            try (InputStream inputStreamNameFinder = new FileInputStream(modelFile)) {
                model = new TokenNameFinderModel(inputStreamNameFinder);
            }

            // Find the name spans and print them together with their first token.
            NameFinderME nameFinder = new NameFinderME(model);
            Span[] nameSpans = nameFinder.find(tokens);
            System.out.println(Arrays.toString(Span.spansToStrings(nameSpans, tokens)));
            for (Span s : nameSpans)
                System.out.println(s.toString() + " " + tokens[s.getStart()]);
        }
    }
And the output is:
[pierre vinken, vinken, peter, steve, daniel smith, kristen, thomas]
The trained model is not able to recognize the names rudolph agnew and ruby. How can I train it so that it recognizes names more accurately?
+1 to the answer of @caffeinator13. Also, there are params (https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/trainingparameters.html) (the link is for an older version, but I guess these params still exist in more recent versions) that control the number of iterations and (perhaps more relevant for you) the cutoff, i.e. the number of times an entity has to appear in the training data to be considered for recognition. This setting more or less controls precision vs. recall, and maybe you should set it to be a bit more lenient (I am not sure what the default is). Instead of using defaultParams, try:
    TrainingParameters tp = new TrainingParameters();
    tp.put(TrainingParameters.CUTOFF_PARAM, "1");
    tp.put(TrainingParameters.ITERATIONS_PARAM, "100");
    TokenNameFinderFactory tnff = new TokenNameFinderFactory();
    model = NameFinderME.train(language, modelName, sampleStream, tp, tnff);
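Before tuning the cutoff, it may help to check how often each name is actually annotated in the training data. The sketch below is plain Java with no OpenNLP dependency; `NameFrequency` and `countNames` are hypothetical names made up for this illustration, and the sample string is a shortened excerpt of the data from the question. As far as I know, OpenNLP applies the cutoff to features rather than to whole entities, so these counts are only a rough diagnostic, not a prediction of what the model will find.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of OpenNLP): counts how often each annotated
// name occurs in text that uses the <START:type> ... <END> training format.
final class NameFrequency {

    // Matches one annotated span. Case-insensitive so lowercased tags also match.
    private static final Pattern SPAN = Pattern.compile(
            "<START:[^>]+>\\s+(.+?)\\s+<END>", Pattern.CASE_INSENSITIVE);

    static Map<String, Integer> countNames(String trainingText) {
        // LinkedHashMap keeps names in order of first appearance.
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = SPAN.matcher(trainingText);
        while (m.find()) {
            // Normalize whitespace and case so "Vinken" and "vinken" share one count.
            String name = m.group(1).trim()
                    .replaceAll("\\s+", " ")
                    .toLowerCase(Locale.ROOT);
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Shortened excerpt of the training data from the question.
        String sample = "<START:person> pierre vinken <END> , 61 years old . "
                + "mr . <START:person> vinken <END> chairman of elsevier n.v. "
                + "<START:person> rudolph agnew <END> assisting "
                + "<START:person> vinken <END> in activities.";
        countNames(sample).forEach(
                (name, n) -> System.out.println(name + " -> " + n));
    }
}
```

Run over the full training text from the question, this kind of count shows vinken annotated five times and every other name, including rudolph agnew, only once. With so few examples per name, adding more annotated sentences is likely to matter at least as much as the parameter values.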