regex - Need a regular expression to replace RTF control words within text in Java


I have a string in which tags can appear between caret symbols ^...^. I found a regular expression that finds the tags in the string: \\^.*?\\^. Once found, a tag may contain RTF control words. Not every tag does, but some can. Here is an example of such a tag: ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^. I want to remove the RTF control words from the tag. I tried to build a regular expression that starts with \, can contain letters, numbers or both after the backslash, and ends with a space, and then replace the matches with an empty string "". That way only lot-city would remain. How can I do this? I tried the following:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String tagRegex = "\\^.*?\\^";
    Pattern tagRegexPattern = Pattern.compile(tagRegex, Pattern.MULTILINE);
    Matcher tagRegexPatternMatcher = tagRegexPattern.matcher(input);
    while (tagRegexPatternMatcher.find()) { // this works
        String tag = tagRegexPatternMatcher.group();
        String controlWordRegex = "\\b\\[a-zA-Z]+(-?[0-9]+)? ? \\b";
        Pattern controlWordRegexPattern = Pattern.compile(controlWordRegex, Pattern.MULTILINE);
        Matcher controlWordRegexPatternMatcher = controlWordRegexPattern.matcher(tag);
        while (controlWordRegexPatternMatcher.find()) { // this finds nothing
            String matchedText = controlWordRegexPatternMatcher.group();
        }
    }

Here is the input I tried:

    String input = "{\\rtlch\\fcs1 \\af39\\afs20 \\ltrch\\fcs0 \\fs20\\insrsid10175635\\charrsid8585274 \\hich\\af39\\dbch\\af31505\\loch\\f39 build job city:\\par \\hich\\af39\\dbch\\af31505\\loch\\f39 ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^}";

I also tried \\b\\[a-zA-Z0-9]+ \\b, both with and without the word boundaries, but it didn't work either. How can I build such a regular expression?
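Both attempts write the backslash as \\[ in the Java string literal. The regex engine then receives \[, which is an escaped, literal opening bracket rather than a backslash followed by a character class, so the pattern can only match text that contains a [ character, and the tag has none. To match one literal backslash, the Java source needs \\\\. A minimal sketch comparing the two (the class name EscapeDemo is hypothetical):

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Java literal "\\[a-zA-Z]+" reaches the regex engine as \[a-zA-Z]+ :
    // \[ is an escaped, literal '[' (not the start of a character class),
    // so this can only match text that contains a '[' character.
    static final Pattern QUESTION_STYLE = Pattern.compile("\\[a-zA-Z]+");

    // Java literal "\\\\[a-zA-Z]+" reaches the engine as \\[a-zA-Z]+ :
    // one literal backslash followed by one or more ASCII letters.
    static final Pattern FIXED = Pattern.compile("\\\\[a-zA-Z]+");

    public static void main(String[] args) {
        String controlWord = "\\hich"; // the control word \hich
        System.out.println(QUESTION_STYLE.matcher(controlWord).find()); // false
        System.out.println(FIXED.matcher(controlWord).find());          // true
    }
}
```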

Thanks.

Here is a way to solve the issue:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String input = "{\\rtlch\\fcs1 \\af39\\afs20 \\ltrch\\fcs0 \\fs20\\insrsid10175635\\charrsid8585274 \\hich\\af39\\dbch\\af31505\\loch\\f39 build job city:\\par \\hich\\af39\\dbch\\af31505\\loch\\f39 ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^}";
    String tagRegex = "\\^(.*?)\\^";
    Pattern tagRegexPattern = Pattern.compile(tagRegex, Pattern.DOTALL);
    Matcher tagRegexPatternMatcher = tagRegexPattern.matcher(input);
    while (tagRegexPatternMatcher.find()) { // this works
        String tag = tagRegexPatternMatcher.group(1); // text between the carets
        String controlWordRegex = "\\b(?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ \\b";
        System.out.println(tag.replaceAll(controlWordRegex, "")); // prints lot-city
    }

See the Java demo.

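First, I added a capturing group to the initial regex to grab the text between the ^ symbols, and used Pattern.DOTALL instead of Pattern.MULTILINE so that . can also match line breaks inside a tag (MULTILINE only changes how the ^ and $ anchors behave, and here both carets are escaped literals).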
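A small sketch of what the capturing group and DOTALL change in practice: group(1) returns only the text inside the carets, and a tag that spans a line break is still found. The class TagGroupDemo and its helper innerTexts are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagGroupDemo {
    // Same tag regex as in the answer: group 1 captures the text between the carets.
    private static final Pattern TAG = Pattern.compile("\\^(.*?)\\^", Pattern.DOTALL);

    /** Hypothetical helper: returns the text inside each ^...^ tag, without the carets. */
    static List<String> innerTexts(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            result.add(m.group(1)); // m.group() would include the carets as well
        }
        return result;
    }

    public static void main(String[] args) {
        // With DOTALL, the second tag is found even though it contains a line break.
        System.out.println(innerTexts("a ^first^ b ^sec\nond^ c"));
    }
}
```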

Then, the second regex matches:

  • \\b - a word boundary (the next character must be a backslash, which is not a word character, so a word character must come right before it)
  • (?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ - a non-capturing group ((?:...) is used to group patterns so they can be matched as a sequence) matching 1 or more sequences of:
    • \\\\ - a literal \
    • [a-zA-Z]+ - 1 or more letters
    • (-?[0-9]+)? - an optional sequence of an optional - followed by 1 or more digits
    •  ? - an optional space (you may replace the space with \\s for safety)
  • a space followed by \\b - a literal space, then the trailing word boundary (a word character must come right after the space)
This regex is used inside the .replaceAll method to remove the RTF codes from the matches obtained with the first regex.
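The answer prints each cleaned tag. If the goal is instead to get the whole input back with control words removed only inside the tags, one possible approach is Matcher.appendReplacement/appendTail combined with the answer's two patterns. This is a sketch, not the answer's code; the class RtfTagCleaner and method cleanTags are hypothetical, and the sketch assumes the carets should be kept in the output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RtfTagCleaner {
    // The two patterns from the answer.
    private static final Pattern TAG = Pattern.compile("\\^(.*?)\\^", Pattern.DOTALL);
    private static final Pattern CONTROL_WORDS =
            Pattern.compile("\\b(?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ \\b");

    /** Returns input with control words removed inside every ^...^ tag; text outside tags is untouched. */
    static String cleanTags(String input) {
        Matcher m = TAG.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String cleaned = CONTROL_WORDS.matcher(m.group(1)).replaceAll("");
            // quoteReplacement stops '\' and '$' in the cleaned text being read as replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement("^" + cleaned + "^"));
        }
        m.appendTail(out); // copy everything after the last tag
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "{\\rtlch\\fcs1 \\af39\\afs20 \\ltrch\\fcs0 \\fs20\\insrsid10175635\\charrsid8585274 \\hich\\af39\\dbch\\af31505\\loch\\f39 build job city:\\par \\hich\\af39\\dbch\\af31505\\loch\\f39 ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^}";
        System.out.println(cleanTags(input)); // the output ends with ^lot-city^}
    }
}
```

Drop the two "^" from the replacement string if the delimiters should be removed as well. Matcher.quoteReplacement matters here because a cleaned tag can still contain a backslash.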

