regex - Need a regular expression to replace RTF control words within text (Java)


I have a string in which tags can appear between caret symbols, ^...^. I found a regular expression that finds the tags in the string: \\^.*?\\^. Once found, a tag can contain RTF control words. This does not happen in every case, but it can. Here is an example of such a tag: ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^. I want to remove the RTF control words inside the tag. I tried to write a regular expression that starts with \, contains letters or digits (or both) after the backslash, and ends with a space, and then to replace each match with the empty string "". That way only lot-city would remain. How can I do this? I tried the following:

    String tagRegex = "\\^.*?\\^";
    Pattern tagRegexPattern = Pattern.compile(tagRegex, Pattern.MULTILINE);
    Matcher tagRegexPatternMatcher = tagRegexPattern.matcher(input);
    while (tagRegexPatternMatcher.find()) {   // this works
        String tag = tagRegexPatternMatcher.group();
        String controlWordRegex = "\\b\\[a-zA-Z]+(-?[0-9]+)? ? \\b";
        Pattern controlWordRegexPattern = Pattern.compile(controlWordRegex, Pattern.MULTILINE);
        Matcher controlWordRegexPatternMatcher = controlWordRegexPattern.matcher(tag);
        while (controlWordRegexPatternMatcher.find()) {  // this didn't work
            String matchedText = controlWordRegexPatternMatcher.group();
        }
    }

Here is the input I tried:

    String input = "{\\rtlch\\fcs1 \\af39\\afs20 \\ltrch\\fcs0 \\fs20\\insrsid10175635\\charrsid8585274 \\hich\\af39\\dbch\\af31505\\loch\\f39 build job city:\\par \\hich\\af39\\dbch\\af31505\\loch\\f39 ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^}";

I also tried the following: \\b\\[a-zA-Z0-9]+ \\b, with both boundary and non-boundary matches. It didn't work. How can I write such a regular expression?

Thanks

Here is a way to solve the issue:

    String input = "{\\rtlch\\fcs1 \\af39\\afs20 \\ltrch\\fcs0 \\fs20\\insrsid10175635\\charrsid8585274 \\hich\\af39\\dbch\\af31505\\loch\\f39 build job city:\\par \\hich\\af39\\dbch\\af31505\\loch\\f39 ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^}";
    String tagRegex = "\\^(.*?)\\^";
    Pattern tagRegexPattern = Pattern.compile(tagRegex, Pattern.DOTALL);
    Matcher tagRegexPatternMatcher = tagRegexPattern.matcher(input);
    while (tagRegexPatternMatcher.find()) {   // finds each ^...^ tag
        String tag = tagRegexPatternMatcher.group(1);   // text between the carets
        String controlWordRegex = "\\b(?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ \\b";
        System.out.println(tag.replaceAll(controlWordRegex, ""));
    }

See the Java demo.

First, I added a capturing group to the initial regex to grab only the text between the ^ symbols.
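As a minimal sketch of what the capturing group changes, the snippet below collects the inner text of each tag. The class name TagGroupDemo, the method innerTexts, and the sample string are made up for illustration; only the pattern comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagGroupDemo {
    // Lazily matches everything between two carets; group 1 is the text inside.
    private static final Pattern TAG = Pattern.compile("\\^(.*?)\\^", Pattern.DOTALL);

    /** Returns the inner text of every ^...^ tag, carets excluded. */
    static List<String> innerTexts(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // m.group() would also include both carets
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample text with two tags.
        System.out.println(innerTexts("city: ^lot-city^, zip: ^zip^")); // prints [lot-city, zip]
    }
}
```

With group() instead of group(1), each result would keep its surrounding carets, which then have to be stripped separately.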

Then, the second regex matches:

  • \\b - a word boundary (a backslash follows, so there must be a word char right before it)
  • (?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ - a non-capturing group ((?:...), used only to group patterns so that they match as a sequence) matching 1 or more sequences of:
    • \\\\ - a literal \
    • [a-zA-Z]+ - 1 or more letters
    • (-?[0-9]+)? - an optional sequence of an optional - and 1+ digits
    • ? (preceded by a space) - an optional space (replace it with \\s for safety)
  • \\b - a trailing word boundary, preceded in the pattern by one mandatory space (so there must be a word char right after that space)
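One detail in the list above is easy to miss: the number of backslashes. The sketch below compares the pattern as it appears in the question with the one from the answer, on the tag text from the question. The class and field names are invented for this illustration, and the reading of why the first pattern fails is my own interpretation: in Java source, "\\[" becomes the regex \[, a literal opening bracket, so no character class is ever formed.

```java
import java.util.regex.Pattern;

class ControlWordRegexDemo {
    // As written in the question: "\\[" is the regex \[ , i.e. a literal '['.
    static final Pattern FROM_QUESTION =
            Pattern.compile("\\b\\[a-zA-Z]+(-?[0-9]+)? ? \\b");

    // As written in the answer: "\\\\" is the regex \\ , i.e. one literal backslash,
    // and [a-zA-Z] is a real character class.
    static final Pattern FROM_ANSWER =
            Pattern.compile("\\b(?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ \\b");

    public static void main(String[] args) {
        // Tag text from the question; every "\\" here is a single backslash at run time.
        String tag = "l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city";

        System.out.println(FROM_QUESTION.matcher(tag).find());       // false
        System.out.println(FROM_ANSWER.matcher(tag).find());         // true
        System.out.println(FROM_ANSWER.matcher(tag).replaceAll("")); // lot-city
    }
}
```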

This regex is used inside the .replaceAll method to remove the RTF codes from the matches obtained with the first regex.
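The code above only prints each cleaned tag. If the cleaned tags should be written back into the original text, one possible approach is Matcher.appendReplacement together with Matcher.appendTail. This is a sketch rather than part of the original answer: the class name RtfTagCleaner, the method cleanTags, and the shortened sample input are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RtfTagCleaner {
    private static final Pattern TAG = Pattern.compile("\\^(.*?)\\^", Pattern.DOTALL);
    private static final Pattern CONTROL_WORDS =
            Pattern.compile("\\b(?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ \\b");

    /** Returns the input with control words removed inside ^...^ tags only. */
    static String cleanTags(String input) {
        Matcher m = TAG.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String cleaned = CONTROL_WORDS.matcher(m.group(1)).replaceAll("");
            // quoteReplacement stops any leftover '\' or '$' in the tag from being
            // interpreted as replacement-string syntax.
            m.appendReplacement(out, Matcher.quoteReplacement("^" + cleaned + "^"));
        }
        m.appendTail(out); // copy the text after the last tag
        return out.toString();
    }

    public static void main(String[] args) {
        // Shortened version of the question's input (an assumption for brevity).
        String input = "city:\\par ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^}";
        System.out.println(cleanTags(input)); // city:\par ^lot-city^}
    }
}
```

Control words outside the carets, such as \par here, are left untouched because the second regex is applied only to group 1 of each tag match.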

