regex - Need a regular expression to replace RTF control words within text (Java)
I have a string in which tags can appear between caret symbols, ^...^. I found a regular expression that finds the tags in the string: `\\^.*?\\^`. The tags it finds can contain RTF control words. That is not always the case, but it can happen. Here is an example of such a tag (backslashes doubled as in a Java string literal): ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^. I want to remove the RTF control words from the tag. I tried to write a regular expression that starts with \, contains letters or digits or both after the backslash, and ends with a space, and to replace each match with the empty string "", so that only lot-city remains. How can I do that? I tried the following:
    String tagRegex = "\\^.*?\\^";
    Pattern tagRegexPattern = Pattern.compile(tagRegex, Pattern.MULTILINE);
    Matcher tagRegexPatternMatcher = tagRegexPattern.matcher(input);
    while (tagRegexPatternMatcher.find()) { // this part works
        String tag = tagRegexPatternMatcher.group();
        String controlWordRegex = "\\b\\[a-zA-Z]+(-?[0-9]+)? ? \\b";
        Pattern controlWordRegexPattern = Pattern.compile(controlWordRegex, Pattern.MULTILINE);
        Matcher controlWordRegexPatternMatcher = controlWordRegexPattern.matcher(tag);
        while (controlWordRegexPatternMatcher.find()) { // this did not work
            String matchedText = controlWordRegexPatternMatcher.group();
        }
    }

Here is the input I tried:

    String input = "{\\rtlch\\fcs1 \\af39\\afs20 \\ltrch\\fcs0 \\fs20\\insrsid10175635\\charrsid8585274 \\hich\\af39\\dbch\\af31505\\loch\\f39 build job city:\\par \\hich\\af39\\dbch\\af31505\\loch\\f39 ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^}";

I also tried the following: `\\b\\[a-zA-Z0-9]+ \\b`, with both boundary and non-boundary matches. It didn't work. How can I write such a regular expression?

Thanks
Here is a way to solve the issue:

    // Needs java.util.regex.Matcher and java.util.regex.Pattern.
    String input = "{\\rtlch\\fcs1 \\af39\\afs20 \\ltrch\\fcs0 \\fs20\\insrsid10175635\\charrsid8585274 \\hich\\af39\\dbch\\af31505\\loch\\f39 build job city:\\par \\hich\\af39\\dbch\\af31505\\loch\\f39 ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^}";
    String tagRegex = "\\^(.*?)\\^";
    Pattern tagRegexPattern = Pattern.compile(tagRegex, Pattern.DOTALL);
    Matcher tagRegexPatternMatcher = tagRegexPattern.matcher(input);
    while (tagRegexPatternMatcher.find()) {
        // group(1) is the text between the carets, without the carets themselves
        String tag = tagRegexPatternMatcher.group(1);
        String controlWordRegex = "\\b(?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ \\b";
        System.out.println(tag.replaceAll(controlWordRegex, "")); // prints: lot-city
    }

See the Java demo.
First, I added a capturing group to the initial regex to grab only the text between the ^ symbols.
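To make the effect of the capturing group concrete, here is a minimal sketch contrasting group() with group(1). The class name TagGroups and the helper innerTexts are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagGroups {
    // Lazily matches everything between two carets; group 1 is the inner text.
    private static final Pattern TAG = Pattern.compile("\\^(.*?)\\^", Pattern.DOTALL);

    // Hypothetical helper: returns the inner text of every ^...^ tag, carets excluded.
    static List<String> innerTexts(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = TAG.matcher("city: ^lot-city^");
        if (m.find()) {
            System.out.println(m.group());  // whole match: ^lot-city^
            System.out.println(m.group(1)); // captured text: lot-city
        }
    }
}
```

With group() the carets would stay part of the tag string, so using group(1) saves stripping them afterwards.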
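Then, the second regex matches:

- `\\b` - a word boundary (the next character is a backslash, which is a non-word character, so a word character must come right before it)
- `(?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+` - a non-capturing group (`(?:...)`, used only to group patterns so that they are matched as a sequence) matching 1 or more sequences of:
    - `\\\\` - a literal `\`
    - `[a-zA-Z]+` - 1 or more letters
    - `(-?[0-9]+)?` - an optional sequence of an optional `-` followed by 1+ digits
    - ` ?` - an optional space (replace the space with `\\s` for safety)
- ` \\b` - a space followed by a trailing word boundary (a word character must come right after the space)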
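The backslash escaping is the part that is easiest to get wrong: in Java source, "\\\\" becomes the regex \\, which matches one literal backslash in the text. In the attempt from the question, "\\[" becomes the regex \[, which matches a literal [ rather than a backslash followed by a character class. The sketch below shows what the corrected pattern actually captures for the sample tag; the class name ControlWordMatch and the helper firstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ControlWordMatch {
    // Java source "\\\\" -> regex \\ -> one literal backslash in the text.
    static final String CONTROL_WORDS = "\\b(?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ \\b";

    // Hypothetical helper: first run of control words in the tag, or null if none.
    static String firstMatch(String tag) {
        Matcher m = Pattern.compile(CONTROL_WORDS).matcher(tag);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String tag = "l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city";
        // Brackets make the trailing space of the match visible.
        System.out.println("[" + firstMatch(tag) + "]");
    }
}
```

The match covers the whole run of control words plus the single space that follows it, which is why removing it joins "l" and "ot-city".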
The second regex is used inside the .replaceAll method to remove the RTF codes from the matches obtained with the first regex.
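If the cleaned tags need to be written back into the full string rather than printed, one possible approach is Matcher.appendReplacement together with Matcher.appendTail. This is only a sketch under that assumption: the class name TagCleaner and the method cleanTags are invented for illustration, and control words outside the ^...^ tags are deliberately left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagCleaner {
    private static final Pattern TAG = Pattern.compile("\\^(.*?)\\^", Pattern.DOTALL);
    private static final String CONTROL_WORDS = "\\b(?:\\\\[a-zA-Z]+(-?[0-9]+)? ?)+ \\b";

    // Hypothetical helper: rebuilds the input, stripping control words inside tags only.
    static String cleanTags(String input) {
        Matcher m = TAG.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String cleaned = m.group(1).replaceAll(CONTROL_WORDS, "");
            // quoteReplacement keeps any '\' or '$' left in the tag literal.
            m.appendReplacement(out, Matcher.quoteReplacement("^" + cleaned + "^"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "city:\\par ^l\\hich\\af39\\dbch\\af31505\\loch\\f39 ot-city^}";
        System.out.println(cleanTags(input));
    }
}
```

Matcher.quoteReplacement matters here because a replacement string treats backslashes and dollar signs specially, and a tag may still contain a backslash that the control-word regex did not remove.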