[wp-trac] [WordPress Trac] #50514: make_clickable nested links bug

WordPress Trac noreply at wordpress.org
Tue Jun 30 09:29:29 UTC 2020


#50514: make_clickable nested links bug
--------------------------+-----------------------------
 Reporter:  elhardoum     |      Owner:  (none)
     Type:  defect (bug)  |     Status:  new
 Priority:  normal        |  Milestone:  Awaiting Review
Component:  Formatting    |    Version:  trunk
 Severity:  normal        |   Keywords:  has-patch
  Focuses:                |
--------------------------+-----------------------------
 If you look at the source of
 [`make_clickable`](https://core.trac.wordpress.org/browser/tags/5.4/src/wp-includes/formatting.php#L2985)
 there's one last regex call to replace potentially nested links:


 {{{#!php
 // Cleanup of accidental links within links.
 return preg_replace( '#(<a([ \r\n\t]+[^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', '$1$3</a>', $r );
 }}}

 Judging by the expression, it only removes an accidental nested link when
 that nested link is the sole content of its parent link, i.e. when there
 is no other text between the outer and inner tags. Let me provide a few
 examples:

 1. This works as intended:

 {{{#!php
 <?php

 $text = '<a href="https://w.org">https://w.org</a>';
 $click = make_clickable($text);
 # <a href="https://w.org">https://w.org</a>
 }}}

 2. Now add more content inside the link, either before or after the
 hyperlink's inner URL. The nested link is no longer cleaned up:

 {{{#!php
 <?php

 $text = '<a href="https://w.org"> https://w.org</a>';
 $click = make_clickable($text); # <a href="https://w.org">https://w.org</a>
 # <a href="https://w.org"> <a href="https://w.org" rel="nofollow">https://w.org</a></a>

 $text = '<a href="https://w.org">https://w.org </a>';
 $click = make_clickable($text); # <a href="https://w.org">https://w.org</a>
 # <a href="https://w.org"><a href="https://w.org" rel="nofollow">https://w.org</a> </a>
 }}}

 I used a single whitespace character here just for the sake of example.
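
 The three cases above can be reproduced without a WordPress install by
 running the cleanup expression on its own. This is only a sketch: the
 input strings are my assumption of what `make_clickable` has built up in
 `$r` just before the final `preg_replace`, and the `$cleanup`, `$inner`,
 `$cases` and `$results` names are made up for the example.

```php
<?php
// Cleanup expression as quoted above from make_clickable().
$cleanup = '#(<a([ \r\n\t]+[^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i';

// Assumed value of $r right before the cleanup step (not captured from a real run).
$inner = '<a href="https://w.org" rel="nofollow">https://w.org</a>';
$cases = array(
    'no padding'     => '<a href="https://w.org">' . $inner . '</a>',
    'leading space'  => '<a href="https://w.org"> ' . $inner . '</a>',
    'trailing space' => '<a href="https://w.org">' . $inner . ' </a>',
);

$results = array();
foreach ( $cases as $label => $html ) {
    // Same replacement as core: outer opening tag + inner text + one closing tag.
    $results[ $label ] = preg_replace( $cleanup, '$1$3</a>', $html );
    echo $label, ': ', $results[ $label ], PHP_EOL;
}
```

 Only the "no padding" string should come back as a single link. The other
 two are returned unchanged, because the pattern requires the inner `<a `
 to follow the outer opening tag immediately and the two `</a>` tags to be
 adjacent.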

 **Suggested Patch**

 I am suggesting a simple fix to the cleanup regular expression, although
 anyone who digs into the root problem will probably be able to come up
 with something much better.

 {{{#!php
 <?php

 // Cleanup of accidental links within links.
 return preg_replace( '#(<a([ \r\n\t]+[^>]+?>|>))(.+)?<a [^>]+?>([^>]+?)</a>(.+)?</a>#is', '$1$3$4$5</a>', $r );
 }}}

 In my patch you'll see the following changes:

 1. Added `(.+)?` to capture any leading characters before the nested
 link
 2. Added `(.+)?` to capture any trailing characters after the nested
 link
 3. Added the `s` modifier so that the two new capture groups also match
 newlines
 4. Changed the replacement to `'$1$3$4$5</a>'` so that the captured
 leading and trailing characters are restored in the cleaned-up HTML.
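
 To sanity-check the changes listed above, the patched expression can be
 run outside WordPress as well. Again this is a sketch under assumptions:
 the first two strings stand in for what I expect `$r` to contain before
 the cleanup step, and all variable names are made up. The last case,
 three sibling links with no nesting at all, is a hypothetical input added
 because the greedy `(.+)?` groups can reach across neighbouring links.

```php
<?php
// Patched cleanup expression and replacement from the suggested patch.
$patched     = '#(<a([ \r\n\t]+[^>]+?>|>))(.+)?<a [^>]+?>([^>]+?)</a>(.+)?</a>#is';
$replacement = '$1$3$4$5</a>';

// Assumed intermediate strings (not captured from a real make_clickable() run).
$inner = '<a href="https://w.org" rel="nofollow">https://w.org</a>';
$cases = array(
    'leading space'  => '<a href="https://w.org"> ' . $inner . '</a>',
    'trailing space' => '<a href="https://w.org">' . $inner . ' </a>',
    // Hypothetical input: three ordinary links side by side, nothing nested.
    'sibling links'  => '<a href="x">A</a> <a href="y">B</a> <a href="z">C</a>',
);

$results = array();
foreach ( $cases as $label => $html ) {
    $results[ $label ] = preg_replace( $patched, $replacement, $html );
    echo $label, ': ', $results[ $label ], PHP_EOL;
}
```

 With these inputs the two padded cases collapse to a single link with the
 whitespace preserved. The sibling case is worth a look before committing:
 as far as I can tell the expression also matches there and the middle
 link loses its tag, so a stricter pattern may be needed.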

 For reference, I am running WordPress 5.4.2, PHP 7.4.4 and nginx/1.17.10
 in an Alpine Linux Docker container (Linux 4.19.76-linuxkit x86_64).

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/50514>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

