[wp-trac] [WordPress Trac] #50514: make_clickable nested links bug
WordPress Trac
noreply at wordpress.org
Tue Jun 30 09:29:29 UTC 2020
--------------------------+-----------------------------
Reporter: elhardoum | Owner: (none)
Type: defect (bug) | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Formatting | Version: trunk
Severity: normal | Keywords: has-patch
Focuses: |
--------------------------+-----------------------------
If you look at the source of
[`make_clickable`](https://core.trac.wordpress.org/browser/tags/5.4/src/wp-includes/formatting.php#L2985),
its final statement is a regex replacement that cleans up potentially
nested links:
{{{#!php
// Cleanup of accidental links within links.
return preg_replace( '#(<a([ \r\n\t]+[^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', '$1$3</a>', $r );
}}}
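As written, the expression removes an accidental nested link only when
that link is the entire content of the parent link: the inner `<a ...>`
must come immediately after the parent's opening tag, and the parent's
`</a>` must come immediately after the inner `</a>`. Any other text inside
the parent link, even a single space, prevents the match. Let me provide a
few examples: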
1. This works as intended:
{{{#!php
<?php
$text = '<a href="https://w.org">https://w.org</a>';
$click = make_clickable($text);
# <a href="https://w.org">https://w.org</a>
}}}
2. Add extra content inside the link, either before or after the URL in
the link text:
{{{#!php
<?php
$text = '<a href="https://w.org"> https://w.org</a>';
$click = make_clickable($text);
# expected: <a href="https://w.org"> https://w.org</a>
# actual:   <a href="https://w.org"> <a href="https://w.org" rel="nofollow">https://w.org</a></a>

$text = '<a href="https://w.org">https://w.org </a>';
$click = make_clickable($text);
# expected: <a href="https://w.org">https://w.org </a>
# actual:   <a href="https://w.org"><a href="https://w.org" rel="nofollow">https://w.org</a> </a>
}}}
Here I used a single space for the sake of the example.
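The cleanup step can be checked in isolation by feeding it the intermediate HTML that the outputs above suggest `make_clickable` builds just before its final `preg_replace` (an inner `rel="nofollow"` link inside the original one). That intermediate markup is an assumption reconstructed from the examples, and `cleanup_nested_links_current()` is a hypothetical wrapper, not a WordPress function. A minimal sketch:

{{{#!php
<?php
// Hypothetical wrapper around the cleanup regex quoted from make_clickable() (WordPress 5.4).
function cleanup_nested_links_current( string $r ): string {
	return preg_replace( '#(<a([ \r\n\t]+[^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', '$1$3</a>', $r );
}

// Assumed intermediate HTML: the inner link added by make_clickable()'s URL pass.
$inner = '<a href="https://w.org" rel="nofollow">https://w.org</a>';

$exact    = '<a href="https://w.org">' . $inner . '</a>';  // example 1
$leading  = '<a href="https://w.org"> ' . $inner . '</a>'; // example 2, leading space
$trailing = '<a href="https://w.org">' . $inner . ' </a>'; // example 2, trailing space

// Cleaned up: <a href="https://w.org">https://w.org</a>
echo cleanup_nested_links_current( $exact ), "\n";

// Left unchanged: the space between the two opening tags stops the match.
echo cleanup_nested_links_current( $leading ), "\n";

// Left unchanged: the space between the two closing tags stops the match.
echo cleanup_nested_links_current( $trailing ), "\n";
}}}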
**Suggested Patch**
I am suggesting a simple fix to the cleanup regular expression, although
someone who tracks down the root cause of the nested links could probably
come up with something much better.
{{{#!php
<?php
// Cleanup of accidental links within links.
return preg_replace( '#(<a([ \r\n\t]+[^>]+?>|>))(.+)?<a [^>]+?>([^>]+?)</a>(.+)?</a>#is', '$1$3$4$5</a>', $r );
}}}
In my patch you'll see the following changes:
1. Added `(.+)?` (group 3) to capture any leading characters before the
nested link
2. Added `(.+)?` (group 5) to capture any trailing characters after the
nested link
3. Added the `s` modifier so that `.` in those two groups also matches
newlines
4. Changed the replacement to `'$1$3$4$5</a>'`, which puts the captured
leading and trailing characters back into the cleaned-up HTML.
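A similar sketch, under the same assumption about the intermediate HTML, runs the suggested expression on the two failing cases; `cleanup_nested_links_patched()` is again a hypothetical wrapper name. The last input is an extra edge case worth checking before adopting the patch: nothing is nested, but because `(.+)?` is greedy and the `s` modifier lets it cross tags, the expression can match from the first link's opening tag to the last link's closing tag and drop the auto-link in between.

{{{#!php
<?php
// Hypothetical wrapper around the cleanup regex proposed in this ticket.
function cleanup_nested_links_patched( string $r ): string {
	return preg_replace( '#(<a([ \r\n\t]+[^>]+?>|>))(.+)?<a [^>]+?>([^>]+?)</a>(.+)?</a>#is', '$1$3$4$5</a>', $r );
}

// Assumed intermediate HTML, as in the failing examples above.
$inner    = '<a href="https://w.org" rel="nofollow">https://w.org</a>';
$leading  = '<a href="https://w.org"> ' . $inner . '</a>';
$trailing = '<a href="https://w.org">' . $inner . ' </a>';

// <a href="https://w.org"> https://w.org</a>
echo cleanup_nested_links_patched( $leading ), "\n";

// <a href="https://w.org">https://w.org </a>
echo cleanup_nested_links_patched( $trailing ), "\n";

// Edge case: two ordinary links with an auto-linked URL between them (no nesting).
$separate = '<a href="https://a.org">a</a> '
	. '<a href="https://b.org" rel="nofollow">https://b.org</a>'
	. ' <a href="https://c.org">c</a>';

// The middle auto-link is stripped:
// <a href="https://a.org">a</a> https://b.org <a href="https://c.org">c</a>
echo cleanup_nested_links_patched( $separate ), "\n";
}}}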
Also worth noting: I am running WordPress 5.4.2, PHP 7.4.4 and
nginx/1.17.10 in an Alpine Linux Docker container (Linux 4.19.76-linuxkit
x86_64).
--
Ticket URL: <https://core.trac.wordpress.org/ticket/50514>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform