Regex to replace leading spaces and tabs with HTML entities in encoded XML, per line, in Java
I want to replace the leading spaces and tabs in encoded XML/HTML with HTML entities, line by line.
Each group of 4 spaces and every tab should be replaced by a tab entity (&#09;), and the remaining spaces should be replaced by a non-breaking space (&nbsp;). The replacement must happen only at the start of each line, up to the first character that is not a space or tab.
Example:
Beginning of line: (^|(\\r|\\n)+); with (\\r|\\n)+, several consecutive line breaks can be covered by one match.
Characters to replace: [ ], [\t]
21 spaces = 5 x &#09; + 1 x &nbsp;
10 spaces + 1 tab + 6 spaces = 2 x &#09; + 2 x &nbsp; + 1 x &#09; + 1 x &#09; + 2 x &nbsp;, where:
    10 spaces = 2 x &#09; + 2 x &nbsp;
    1 tab = 1 x &#09;
    6 spaces = 1 x &#09; + 2 x &nbsp;
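The rule described above can be sketched without any lookbehind by matching the leading run of spaces and tabs on each line and building the replacement in code. This is only a minimal sketch under stated assumptions: the class name `LeadingWhitespaceEncoder` and the methods `encodeLeadingWhitespace`, `encodeRun` and `flushSpaces` are hypothetical names, and it assumes that spaces are counted in groups of 4 separately on each side of a tab, as in the "10 spaces + 1 tab + 6 spaces" example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LeadingWhitespaceEncoder {

    // MULTILINE makes ^ match at the start of every line, not only at the start of the input.
    private static final Pattern LEADING = Pattern.compile("^[ \\t]+", Pattern.MULTILINE);

    /** Hypothetical helper: encodes only the leading spaces/tabs of each line. */
    public static String encodeLeadingWhitespace(String text) {
        Matcher m = LEADING.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' from being interpreted in the replacement.
            m.appendReplacement(out, Matcher.quoteReplacement(encodeRun(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts one run of leading whitespace: each tab and each group of 4 spaces
    // becomes &#09;, leftover spaces become &nbsp;.
    private static String encodeRun(String run) {
        StringBuilder sb = new StringBuilder();
        int spaces = 0; // spaces seen since the last tab, not yet written out
        for (char c : run.toCharArray()) {
            if (c == '\t') {
                flushSpaces(sb, spaces);
                spaces = 0;
                sb.append("&#09;");
            } else {
                spaces++;
            }
        }
        flushSpaces(sb, spaces);
        return sb.toString();
    }

    private static void flushSpaces(StringBuilder sb, int spaces) {
        for (int i = 0; i < spaces / 4; i++) sb.append("&#09;");
        for (int i = 0; i < spaces % 4; i++) sb.append("&nbsp;");
    }

    public static void main(String[] args) {
        String input = "&lt;test&gt;\n    &lt;node1&gt;\n         &lt;value&gt; test&lt;/value&gt;";
        System.out.println(encodeLeadingWhitespace(input));
    }
}
```

With this input (4 and 9 leading spaces), the second line starts with one &#09; and the third line with two &#09; followed by one &nbsp;; the space inside the value element is left alone because it is not at the start of a line.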
The input string has already been processed by several other regular-expression replacements:
    text = text.replaceAll(regex1, replacement1);
    text = text.replaceAll(regex2, replacement2);
    text = text.replaceAll(regex3, replacement3);
    text = text.replaceAll(regex4, replacement4);
The new regular expression has to be implemented at this point in the chain.
Visual XML:
    <test>
        <node1>
             <value> test</value>
        </node1>
        <node1>
             <value> test</value>
        </node1>
    </test>
Encoded XML structure, as it actually appears in the input string:
    &lt;test&gt;
        &lt;node1&gt;
             &lt;value&gt; test&lt;/value&gt;
        &lt;/node1&gt;
        &lt;node1&gt;
             &lt;value&gt; test&lt;/value&gt;
        &lt;/node1&gt;
    &lt;/test&gt;
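Expected output:
    &lt;test&gt;
    &#09;&lt;node1&gt;
    &#09;&#09;&nbsp;&lt;value&gt; test&lt;/value&gt;    <- not replaced inside <value>
    &#09;&lt;/node1&gt;
    &#09;&lt;node1&gt;
    &#09;&#09;&nbsp;&lt;value&gt; test&lt;/value&gt;    <- not replaced inside <value>
    &#09;&lt;/node1&gt;
    &lt;/test&gt;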
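I have tried a lot.
First I tried, and failed, to store the beginning of the line in a regex capture group and then replace the groups of whitespace.
Result: the beginning of the line is repeated together with the HTML-coded spaces/tabs. Example: \r&#09;\r&#09;\r&#09;\r&#09;, expected: \r&#09;&#09;&#09;&#09;. The pattern and replacement were: "(^|(\\r|\\n))[ ]{4}", "\\1&#09;"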
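Two details of the attempt above are worth noting. In Java, a replacement string refers to a captured group as `$1`; `\\1` in a replacement produces a literal `1`. Also, a pattern anchored at the start of the line can match only the first group of four spaces on that line. One possible way around the second point, shown here only as a sketch, is the `\G` anchor (the position where the previous match ended) combined with multiline mode. The class name `LeadingSpaceGroups` and the method `tabifyLeadingSpaces` are hypothetical, and the sketch handles groups of four spaces only, not tabs or leftover spaces.

```java
import java.util.regex.Pattern;

final class LeadingSpaceGroups {

    // (?m): ^ matches at the start of every line.
    // \G:   matches where the previous match ended, so consecutive groups of
    //       four spaces keep matching until something else interrupts the run.
    private static final Pattern FOUR_LEADING_SPACES = Pattern.compile("(?m)(?:^|\\G) {4}");

    /** Hypothetical helper: turns each leading group of four spaces into &#09;. */
    public static String tabifyLeadingSpaces(String text) {
        return FOUR_LEADING_SPACES.matcher(text).replaceAll("&#09;");
    }

    public static void main(String[] args) {
        // 8 leading spaces, then 4 spaces inside the line, then a second line with 4 leading spaces
        System.out.println(tabifyLeadingSpaces("        a    b\r    c"));
    }
}
```

For this input the result is `&#09;&#09;a    b\r&#09;c`: both leading groups on the first line are replaced, the four spaces between `a` and `b` are untouched, and no line break is duplicated because the line start is never part of the match.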
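Then I tried to do it in two passes: first replace each group of 4 spaces and each tab with a tab entity, then replace the remaining spaces with &nbsp;. That replaces every space, not only the leading ones. I tried the same with "&#09;[ ]", "&#09;&nbsp;".
I also tried a matcher.find() loop with substring. It gives the best results so far, but they are still not 100% correct.
I keep failing to get the regex right. Can anyone help?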
How about the following program, which uses a series of replaceAll calls and lookbehinds:
    public class Main {

        public static void main(String[] args) {
            final String[] input = new String[] {
                "<test>",
                "\t<node1>",
                "\t\t <value> test</value>", // 2 tabs and 1 space here
                "\t</node1>",
                "\t<node1>",
                "\t\t<value> test</value>",
                "\t</node1>",
                "</test>"
            };
            for (String str : input) {
                System.out.println("new: " + htmlspecialchars(str));
            }
        }

        private static String htmlspecialchars(String str) {
            return str
                .replaceAll("&", "&amp;")                  // replace HTML special characters with entities
                .replaceAll("<", "&lt;")
                .replaceAll(">", "&gt;")
                .replaceAll("(?<=^\\s*)\t", "    ")        // replace leading tabs with 4 spaces
                .replaceAll("(?<=^\\s*)    ", "&#09;")     // replace leading groups of 4 spaces with &#09;
                // replace the remaining 1 to 3 leading spaces with &nbsp;; the lookbehind sees the
                // original input, so it must also allow the spaces that precede the current one
                .replaceAll("(?<=^(?:&#09;)*[ ]{0,3}) ", "&nbsp;");
        }
    }
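The resulting output is:
    new: &lt;test&gt;
    new: &#09;&lt;node1&gt;
    new: &#09;&#09;&nbsp;&lt;value&gt; test&lt;/value&gt;
    new: &#09;&lt;/node1&gt;
    new: &#09;&lt;node1&gt;
    new: &#09;&#09;&lt;value&gt; test&lt;/value&gt;
    new: &#09;&lt;/node1&gt;
    new: &lt;/test&gt;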