Regex: need a regular expression to replace RTF control words within text (Java)
I have a string in which tags can appear inside caret symbols, ^...^. I found a regular expression that finds the tags in the string: \\^.*?\\^.
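The caret regex above relies on the lazy quantifier .*?. A small sketch (the class and method names are illustrative, not from the question) shows why that matters when a line contains more than one tag: the greedy form .* would swallow everything between the first and the last caret.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagFinder {
    // Collects every span matched by the given regex, in order.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "a ^first^ b ^second^ c";
        // Lazy .*? stops at the nearest closing caret: two separate tags.
        System.out.println(findAll("\\^.*?\\^", text)); // [^first^, ^second^]
        // Greedy .* runs to the last caret: one span covering both tags.
        System.out.println(findAll("\\^.*\\^", text));  // [^first^ b ^second^]
    }
}
```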
A tag can contain RTF control words. This is not the case for every tag, but it can happen. Here is an example of such a tag: ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^
I want to replace the RTF control words in the tag. I tried to make a regular expression that starts with \, contains letters, numbers, or both after the slash, and ends with a space, and then replace each match with an empty string "". That way, lot-city would remain. How can I do it? I tried the following:
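Taken literally, the rule described above (backslash, letters or digits, then a space) would only remove the last control word, because the others are followed by another backslash rather than a space. The sketch below, with hypothetical method names, compares the rule as stated with a variant where the space is optional; it only illustrates the rule and is not the answer given further down.

```java
public class LiteralRule {
    // The rule taken literally: backslash, letters/digits, then a required space.
    static String spaceRequired(String tag) {
        return tag.replaceAll("\\\\[a-zA-Z0-9]+ ", "");
    }

    // Hypothetical variant: the trailing space is optional (" ?").
    static String spaceOptional(String tag) {
        return tag.replaceAll("\\\\[a-zA-Z0-9]+ ?", "");
    }

    public static void main(String[] args) {
        // Tag content between the carets (each \\ in the literal is one backslash).
        String tag = "l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city";
        // Only "\f39 " is directly followed by a space, so only it is removed.
        System.out.println(spaceRequired(tag)); // l\hich\af39\dbch\af31505\lochot-city
        // With the space optional, every control word in the run is removed.
        System.out.println(spaceOptional(tag)); // lot-city
    }
}
```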
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String tagRegex = "\\^.*?\\^";
Pattern tagRegexPattern = Pattern.compile(tagRegex, Pattern.MULTILINE);
Matcher tagRegexPatternMatcher = tagRegexPattern.matcher(input);
while (tagRegexPatternMatcher.find()) { // this part works
    String tag = tagRegexPatternMatcher.group();
    String controlWordRegex = "\\b\\[a-zA-Z]+(-?[0-9]+)? ? \\b";
    Pattern controlWordRegexPattern = Pattern.compile(controlWordRegex, Pattern.MULTILINE);
    Matcher controlWordRegexPatternMatcher = controlWordRegexPattern.matcher(tag);
    while (controlWordRegexPatternMatcher.find()) { // this part didn't work
        String matchedText = controlWordRegexPatternMatcher.group();
    }
}
Here is the input I tried:
String input = "{\\rtlch\\fcs1 \\af39\\afs20 \\ltrch\\fcs0 \\fs20\\insrsid10175635\\charrsid8585274 \\hich\\af39\\dbch\\af31505\\loch\\f39 build job city:\\par \\hich\\af39\\dbch\\af31505\\loch\\f39 ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^}";
I also tried \\b\\[a-zA-Z0-9]+ \\b, with both word-boundary and non-word-boundary anchors, but it didn't work. How can I make such a regular expression?
Thanks.
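One likely reason the attempts above find nothing is escaping: in a Java string literal, "\\[" becomes the regex \[, which matches a literal [ rather than a backslash, while matching one literal backslash takes "\\\\" in the source. This quick check (class and method names are illustrative) shows the difference on the example tag.

```java
import java.util.regex.Pattern;

public class EscapeCheck {
    // True if the regex matches anywhere in the text.
    static boolean finds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String tag = "l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city";

        // "\\[" is the regex \[ : a literal '[', so "a-zA-Z0-9]" is read as
        // plain text instead of a character class.
        System.out.println(finds("\\b\\[a-zA-Z0-9]+ \\b", tag)); // false
        System.out.println(finds("\\[a-zA-Z0-9]+", "[a-zA-Z0-9]")); // true

        // "\\\\" is the regex \\ : one literal backslash, which is what RTF uses.
        System.out.println(finds("\\\\[a-zA-Z0-9]+ \\b", tag)); // true
    }
}
```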
Here is a way to solve the issue:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "{\\rtlch\\fcs1 \\af39\\afs20 \\ltrch\\fcs0 \\fs20\\insrsid10175635\\charrsid8585274 \\hich\\af39\\dbch\\af31505\\loch\\f39 build job city:\\par \\hich\\af39\\dbch\\af31505\\loch\\f39 ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^}";
String tagRegex = "\\^(.*?)\\^";
Pattern tagRegexPattern = Pattern.compile(tagRegex, Pattern.DOTALL);
Matcher tagRegexPatternMatcher = tagRegexPattern.matcher(input);
while (tagRegexPatternMatcher.find()) {
    String tag = tagRegexPatternMatcher.group(1); // text between the carets
    String controlWordRegex = "\\b(?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ \\b";
    System.out.println(tag.replaceAll(controlWordRegex, "")); // prints lot-city
}
See the Java demo.
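The snippet above prints only the cleaned tag text. If the goal is instead to keep the rest of the RTF and put the cleaned tags back in place, one possible approach is Matcher.appendReplacement with appendTail. The sketch below assumes that goal; the class and method names are hypothetical, and the two regexes are the ones from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RtfTagRewriter {
    // Tag regex from the answer: group 1 is the text between the carets.
    private static final Pattern TAG = Pattern.compile("\\^(.*?)\\^", Pattern.DOTALL);
    // Control-word regex from the answer.
    private static final String CONTROL_WORDS = "\\b(?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ \\b";

    // Rewrites every ^...^ tag in place, keeping the carets and the surrounding RTF.
    static String cleanTagsInPlace(String input) {
        Matcher m = TAG.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String cleaned = m.group(1).replaceAll(CONTROL_WORDS, "");
            // The replacement string is parsed for $ and \ escapes, and a cleaned
            // tag may still contain backslashes, so quote it first.
            m.appendReplacement(out, Matcher.quoteReplacement("^" + cleaned + "^"));
        }
        m.appendTail(out); // copies the text after the last tag unchanged
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(cleanTagsInPlace("x ^l\\hich\\f39 ot-city^ y")); // x ^lot-city^ y
        System.out.println(cleanTagsInPlace("^a\\b1 x^ and ^c\\d2 y^"));     // ^ax^ and ^cy^
    }
}
```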
First, I added a capturing group to the initial regex to grab the text between the ^ symbols, and used Pattern.DOTALL so that . also matches line breaks.
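The difference between group() and group(1), and the role of the flag, can be seen in isolation. Pattern.MULTILINE, used in the question, only changes what the ^ and $ anchors match, and the escaped \\^ is a literal caret, so that flag has no effect here; Pattern.DOTALL is what lets . cross a line break. The helper name below is illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {
    // Returns {whole match, group 1} for the first ^...^ tag, or null if none is found.
    static String[] firstTag(String text, int flags) {
        Matcher m = Pattern.compile("\\^(.*?)\\^", flags).matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] parts = firstTag("before ^ot-city^ after", 0);
        System.out.println(parts[0]); // ^ot-city^  (group(): carets included)
        System.out.println(parts[1]); // ot-city    (group(1): text between the carets)

        // A tag spanning a line break: MULTILINE does not help, DOTALL does.
        String twoLines = "^line one\nline two^";
        System.out.println(firstTag(twoLines, Pattern.MULTILINE) == null); // true
        System.out.println(firstTag(twoLines, Pattern.DOTALL) != null);    // true
    }
}
```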
Then, the second regex matches:
\\b - a word boundary (since the next character is a backslash, there must be a word character right before it)
(?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ - a non-capturing group ((?:...), used to group patterns so they can be matched as a sequence) matching 1 or more sequences of:
  \\\\ - a literal \
  [a-zA-Z]+ - 1 or more letters
  (-?[0-9]+)? - an optional sequence of an optional - followed by 1+ digits
  " ?" - an optional space (replace the space with \\s for safety)
" \\b" - a literal space followed by a trailing word boundary (there must be a word character right after the space)
This regex is used inside the .replaceAll method to remove the RTF codes from the matches obtained with the first regex.
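Because the pattern starts with \\b and needs a literal space before the closing \\b, it only removes a run of control words that sits between a word character and a space. The sketch below checks a few edge cases against it and against a hypothetical variant without those anchors (NO_BOUNDARIES is an illustrative name, not part of the answer). Neither pattern handles RTF control symbols such as \'e9, since both require a letter right after the backslash.

```java
public class BoundaryEdgeCases {
    // Control-word regex from the answer.
    static final String ANSWER = "\\b(?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ \\b";
    // Hypothetical variant: no \b anchors and no mandatory space, so each
    // control word is removed on its own together with one optional space.
    static final String NO_BOUNDARIES = "\\\\[a-zA-Z]+(-?[0-9]+)? ?";

    static String strip(String regex, String tag) {
        return tag.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Negative numeric parameter between word characters: both remove it.
        System.out.println(strip(ANSWER, "x\\li-360 y"));        // xy
        System.out.println(strip(NO_BOUNDARIES, "x\\li-360 y")); // xy

        // Tag starting with a control word: no word character precedes the first
        // backslash, so \b cannot match there and the answer's regex leaves it.
        System.out.println(strip(ANSWER, "\\f39 ot-city"));        // \f39 ot-city
        System.out.println(strip(NO_BOUNDARIES, "\\f39 ot-city")); // ot-city

        // Control word at the very end: no space follows it.
        System.out.println(strip(ANSWER, "ot-city\\par"));        // ot-city\par
        System.out.println(strip(NO_BOUNDARIES, "ot-city\\par")); // ot-city
    }
}
```

The variant gives up the boundary check, so it will also strip any backslash-letter sequence that happens to appear in ordinary tag text; whether that is acceptable depends on what the tags may contain.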