Regex to replace leading spaces and tabs with HTML entities in encoded XML, per line, in Java


I want to replace leading spaces and tabs in encoded XML/HTML with HTML character references, line by line.

Each group of 4 spaces and every tab should be replaced with a tab reference (&#09;), and the remaining spaces with a non-breaking space (&nbsp;). The replacement must apply only at the start of each line, up to the first character that is not a space or tab.

Example:

    beginning of line: (^|(\\r|\\n)+)   (the (\\r|\\n)+ part lets several consecutive line breaks match)
    characters to replace: [ ], [\t]

    21 spaces = 5 x &#09; + 1 x &nbsp;
    10 spaces + 1 tab + 6 spaces = 2 x &#09; + 2 x &nbsp; + 1 x &#09; + 1 x &#09; + 2 x &nbsp;
        10 spaces = 2 x &#09; + 2 x &nbsp;
        1 tab     = 1 x &#09;
        6 spaces  = 1 x &#09; + 2 x &nbsp;
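The arithmetic in the example can be sketched in Java for a single run of leading whitespace. This is only an illustration: `encodeRun` and the class name are hypothetical, the run is assumed to contain nothing but spaces and tabs, and `String.repeat` assumes Java 11 or later. It does not deal with finding the start of each line.

```java
public class LeadingRun {
    // Hypothetical helper: encodes one run that contains only spaces and tabs.
    static String encodeRun(String run) {
        return run
            .replaceAll("\\t| {4}", "&#09;") // each tab or full group of 4 spaces
            .replace(" ", "&nbsp;");          // the spaces that are left over
    }

    public static void main(String[] args) {
        String spaces21 = " ".repeat(21);
        String mixed = " ".repeat(10) + "\t" + " ".repeat(6);
        System.out.println(encodeRun(spaces21)); // 5 x &#09; + 1 x &nbsp;
        System.out.println(encodeRun(mixed));    // 2 x &#09;, 2 x &nbsp;, 2 x &#09;, 2 x &nbsp;
    }
}
```

Because the alternation is tried left to right over the run, a group of fewer than 4 spaces in front of a tab stays as spaces and is turned into &nbsp; by the second call.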

The input is a string that already goes through replacements with other regular expressions:

    text = text.replaceAll(regex1, replacement1);
    text = text.replaceAll(regex2, replacement2);
    text = text.replaceAll(regex3, replacement3);
    text = text.replaceAll(regex4, replacement4);

The new regular expression has to be added at this position.

The XML as displayed:

    <test>
        <node1>
             <value>         test</value>
        </node1>
        <node1>
             <value>         test</value>
        </node1>
    </test>

The encoded XML structure, which looks like the above when displayed, is what the input string contains:

    &lt;test&gt;
        &lt;node1&gt;
             &lt;value&gt;         test&lt;/value&gt;
        &lt;/node1&gt;
        &lt;node1&gt;
             &lt;value&gt;         test&lt;/value&gt;
        &lt;/node1&gt;
    &lt;/test&gt;

Expected output:

    &lt;test&gt;
    &#09;&lt;node1&gt;
    &#09;&#09;&nbsp;&lt;value&gt;         test&lt;/value&gt; <- not replaced in <value>
    &#09;&lt;/node1&gt;
    &#09;&lt;node1&gt;
    &#09;&#09;&nbsp;&lt;value&gt;         test&lt;/value&gt; <- not replaced in <value>
    &#09;&lt;/node1&gt;
    &lt;/test&gt;

I have tried a lot.

First I tried, and failed, to store the beginning of the line in a regex capture group and replace the groups of whitespace after it.

    "(^|(\\r|\\n))[ ]{4}", "\\1&#09"

Result: the beginning of the line is repeated together with each HTML-encoded space/tab. Example: \r&#09;\r&#09;\r&#09;\r&#09;, expected: \r&#09;&#09;&#09;&#09;

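Then I tried to do it in 2 lines: the first replaces groups of 4 spaces with tabs, and tabs with tabs; the second replaces the rest of the spaces with &nbsp;. That replaces every space, not only the leading ones. I tried the same with "&#09;[ ]", "&#09;&nbsp;".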

I also tried a matcher.find() loop with substring, which gives the best results so far but is still not 100% correct.

I keep failing to find the correct regex. Can anyone help?

How about the following program, which uses a bunch of replaceAll calls and lookbehinds:

    public class Main {
        public static void main(String[] args) {
            final String[] input = new String[] {
                "<test>",
                "    <node1>",
                "         <value>         test</value>", // 2 tabs 1 space here
                "    </node1>",
                "    <node1>",
                "        <value>         test</value>",
                "    </node1>",
                "</test>"
            };

            for (String str : input) {
                System.out.println("new: " + htmlspecialchars(str));
            }
        }

        // The lookbehinds are bounded ({0,100}) so that java.util.regex can work out
        // a maximum lookbehind length; deeper indentation would need a larger bound.
        private static String htmlspecialchars(String str) {
            return str
                .replaceAll("&", "&amp;")                               // replace html entities
                .replaceAll("<", "&lt;")
                .replaceAll(">", "&gt;")
                .replaceAll("(?<=^\\s{0,100})\t", "    ")               // replace leading tabs with 4 spaces
                .replaceAll("(?<=^\\s{0,100})    ", "&#09;")            // replace 4 leading spaces with &#09;
                .replaceAll("(?<=^(?:&#09;){0,100} {0,2}) ", "&nbsp;"); // replace rest of leading spaces with &nbsp;
        }
    }

The resulting output is:

    new: &lt;test&gt;
    new: &#09;&lt;node1&gt;
    new: &#09;&#09;&nbsp;&lt;value&gt;         test&lt;/value&gt;
    new: &#09;&lt;/node1&gt;
    new: &#09;&lt;node1&gt;
    new: &#09;&#09;&lt;value&gt;         test&lt;/value&gt;
    new: &#09;&lt;/node1&gt;
    new: &lt;/test&gt;
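If the input arrives as one multi-line string rather than as separate lines, a single pass with the MULTILINE flag may be enough. The following is a minimal sketch, not a drop-in implementation: the class and method names are hypothetical, it assumes Java 9 or later for `Matcher.replaceAll(Function)`, and it assumes the text is already entity-encoded. Each leading run of spaces and tabs is matched once and then encoded on its own, so whitespace later in the line is never touched.

```java
import java.util.regex.Pattern;

public class LeadingWhitespaceEncoder {
    // MULTILINE makes ^ match at the start of every line, not only at the start of the input.
    private static final Pattern LEADING = Pattern.compile("^[ \\t]+", Pattern.MULTILINE);

    // Hypothetical helper: encodes only the leading whitespace of each line.
    static String encodeLeadingWhitespace(String text) {
        return LEADING.matcher(text).replaceAll(m ->
            m.group()
             .replaceAll("\\t| {4}", "&#09;") // each tab or full group of 4 spaces
             .replace(" ", "&nbsp;"));        // leftover spaces in the leading run
    }

    public static void main(String[] args) {
        String text = "&lt;test&gt;\n"
                    + "    &lt;node1&gt;\n"
                    + "         &lt;value&gt;         test&lt;/value&gt;\n"
                    + "    &lt;/node1&gt;\n"
                    + "&lt;/test&gt;";
        System.out.println(encodeLeadingWhitespace(text));
    }
}
```

The strings returned by the lambda contain no `$` or `\`, so they are safe to use as replacement text; if that ever changes, wrapping the result in `Matcher.quoteReplacement` would be the cautious choice.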
