PHP regex: skip <link> tags when rel="canonical"
I run a PHP script in WordPress that removes the http: and https: protocols from links, using the following regex:
$links = preg_replace( '/<input\b[^<]*\bvalue=[\"\']https?:\/\/(*SKIP)(*F)|https?:\/\//', '//', $links );
The first part, <input\b[^<]*\bvalue=[\"\']https?:\/\/(*SKIP)(*F), skips <input> tags that have an http: or https: value, such as:
<input type="url" value="http://example.com">
Additionally, I'd like to skip <link> tags that have the rel="canonical" attribute:
<link rel="canonical" href="http://example.com/remove-http/" />
Using a regex tester, I've been trying to update the logic. This is how far I've come:
<(input|link)\b[^<]*\b(value|rel)=[\"\'](https?:\/\/|canonical)(*SKIP)(*F)|https?:\/\/
But it hasn't worked for me.
The (*SKIP)(*F) verbs are used to discard the text matched so far and to continue searching for the next match from the position the regex index reached after matching the pattern that precedes these verbs.
So, to match word1 or word2, drop them, and go on to match word3, you need to use
'~(?:word1|word2)(*SKIP)(*F)|word3~'
The (?:...) is a non-capturing group that groups the alternatives that must be dropped.
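As a minimal sketch of this behaviour, the snippet below replaces every digit except those that belong to cat or dog followed by a digit. The sample string and the cat\d, dog\d and \d patterns are made up for illustration; they stand in for word1, word2 and word3.

```php
<?php
// Hypothetical sample text: digits attached to "cat"/"dog" must survive.
$text = 'cat1 dog2 3 44';

// Left branch: match what must be kept, then (*SKIP)(*F) discards that match
// and resumes the search right after it. Right branch: the real target.
$pattern = '~(?:cat\d|dog\d)(*SKIP)(*F)|\d~';

$result = preg_replace($pattern, '#', $text);

echo $result, PHP_EOL; // prints "cat1 dog2 # ##"
```

Without the left branch, the same replacement would also turn cat1 and dog2 into cat# and dog#.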
In your case, the whole <link...> tag should be matched, not just the attribute. Thus, you need link\b[^>]*?\brel=[\'\"]canonical[\'\"][^>]*> instead of word2 in the above regex.
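Put together, the pattern could look roughly like the sketch below. It is not a drop-in solution: the sample markup in $html is invented for the demonstration, and the leading < before link is an assumption taken from the attempt in the question rather than from the sub-pattern quoted above.

```php
<?php
// Hypothetical sample markup covering the two cases to keep and two to rewrite.
$html = <<<'HTML'
<link rel="canonical" href="http://example.com/remove-http/" />
<input type="url" value="http://example.com">
<a href="https://example.com/page">page</a>
<img src="http://example.com/a.png">
HTML;

// Left branch: an <input> up to its http(s) value, or a whole canonical <link> tag.
// (*SKIP)(*F) throws these away. Right branch: any remaining http:// or https://.
$pattern = '~(?:<input\b[^<]*\bvalue=["\']https?://'
         . '|<link\b[^>]*?\brel=["\']canonical["\'][^>]*>)(*SKIP)(*F)'
         . '|https?://~';

$result = preg_replace($pattern, '//', $html);

echo $result, PHP_EOL;
```

With this sample input, the <link> and <input> lines come back unchanged, while the href and src values become //example.com/page and //example.com/a.png. Using ~ as the delimiter avoids having to escape every slash in the pattern.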
However, you should think about using an HTML parser that is compatible with your environment (I saw your note that DOMDocument malfunctions there).