PHP regex: skip <link> tags when rel="canonical"
I run a PHP script in WordPress that removes the http: and https: protocols from links using the following regex:
$links = preg_replace( '/<input\b[^<]*\bvalue=[\"\']https?:\/\/(*SKIP)(*F)|https?:\/\//', '//', $links );
The first part, <input\b[^<]*\bvalue=[\"\']https?:\/\/(*SKIP)(*F), skips <input> tags that have an http:// or https:// value, such as:
<input type="url" value="http://example.com">
Additionally, I'd like to skip <link> tags that have a rel="canonical" attribute:
<link rel="canonical" href="http://example.com/remove-http/" />
Using a regex tester, I've been trying to update the logic. This is how far I've come:
<(input|link)\b[^<]*\(value|rel)=[\"\'](https?:\/\/|canonical)(*SKIP)(*F)|https?:\/\/
But it hasn't worked for me.
The (*SKIP)(*F) verbs discard the text matched so far and make the regex engine search for the next match starting at the position reached when the verbs were hit, that is, right after the text matched by the pattern before these verbs.
So, to match word1 or word2, drop them, and go on to match word3, you need to use
'~(?:word1|word2)(*SKIP)(*F)|word3~'
The (?:...) is a non-capturing group that groups the alternatives that must be dropped.
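To see the idiom in action, here is a minimal sketch. The sample words are made up for illustration: "foobar" and "barfoo" stand in for word1 and word2, and "foo" stands in for word3.

```php
<?php
// Hypothetical sample text: "foobar" and "barfoo" play the role of word1/word2
// (text to drop), and "foo" plays the role of word3 (text to replace).
$text = 'foo foobar barfoo foo';

// The first alternative consumes "foobar" or "barfoo" and then fails on purpose;
// (*SKIP) makes the engine resume after the consumed text, so the "foo" inside
// those words is never tried. Only standalone "foo" reaches the second alternative.
$result = preg_replace('~(?:foobar|barfoo)(*SKIP)(*F)|foo~', 'X', $text);

echo $result, PHP_EOL; // X foobar barfoo X
```

A plain '~foo~' would also have replaced the "foo" inside "foobar" and "barfoo", which is exactly what the dropped alternatives prevent.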
In your case, the whole <link...> tag should be matched, not just the attribute. Thus, you need link\b[^>]*?\brel=[\'\"]canonical[\'\"][^>]*> instead of word2 in the above regex.
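Put together, the combined pattern might look like the sketch below. It assumes the leading < is kept in front of the group, as in your attempt, and it uses ~ as the delimiter so the slashes need no escaping. The function name strip_protocols and the sample markup are hypothetical; your script calls preg_replace() on $links directly.

```php
<?php
// Hypothetical helper wrapping the combined regex.
function strip_protocols(string $html): string
{
    $pattern = '~<(?:'
        . 'input\b[^<]*\bvalue=[\'"]https?://'               // <input ... value="http(s)://
        . '|'
        . 'link\b[^>]*?\brel=[\'"]canonical[\'"][^>]*>'      // whole <link ... rel="canonical" ...> tag
        . ')(*SKIP)(*F)'                                     // drop what was matched above
        . '|https?://~';                                     // the protocol to replace elsewhere

    // preg_replace() returns null on error; fall back to the untouched input.
    return preg_replace($pattern, '//', $html) ?? $html;
}

// Made-up sample markup for illustration.
$html = '<link rel="canonical" href="http://example.com/remove-http/" />' . "\n"
      . '<a href="https://example.com/page">page</a>';

echo strip_protocols($html), PHP_EOL;
// The canonical <link> is left untouched; the <a> href becomes //example.com/page
```

Because the whole <link> tag is consumed before (*SKIP)(*F), the URL is protected whether href comes before or after rel="canonical".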
However, you should think about using an HTML parser that is compatible with your environment (I saw the note that DOMDocument malfunctions there).