[wp-hackers] non-ASCII characters in URLs and parsing those chars at string level

Jason LeVan jason at codeclarified.com
Tue Sep 9 23:53:54 UTC 2014


urldecode() mixed with remove_accents() perhaps?

https://core.trac.wordpress.org/browser/trunk/src/wp-includes/formatting.php#L794
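
To make the two steps concrete, here is a rough sketch in Python rather than PHP, so it can run outside WordPress. It is not how remove_accents() is implemented: unquote() stands in for urldecode(), and Unicode NFKD decomposition stands in for the accent stripping. The name slug_to_ascii is hypothetical, and turning underscores into spaces is my assumption about what the tag should look like.

```python
import unicodedata
from urllib.parse import unquote


def slug_to_ascii(slug: str) -> str:
    """Percent-decode a URL slug and reduce it to plain ASCII (hypothetical helper)."""
    # Step 1: percent-decode, the analogue of PHP's urldecode().
    # unquote() interprets the escaped bytes as UTF-8 by default.
    decoded = unquote(slug)

    # Step 2: split accented letters into base letter + combining mark,
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", decoded)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    # Anything still outside ASCII (letters with no decomposition) is dropped.
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")

    # Assumption: underscores in the slug stand for spaces.
    return ascii_only.replace("_", " ")


print(slug_to_ascii("slobodan_milo%c5%a1evi%c4%87"))
```

Note that the case of the input is preserved, so the capitalisation would still have to be restored separately. remove_accents() works from an explicit character map instead of decomposition, so its output can differ for characters that have no Unicode decomposition.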

___________________________________

Jason LeVan

Email: jason at codeclarified.com

Twitter: @codeclarified

On Tue, Sep 9, 2014 at 6:03 PM, Haluk Karamete <halukkaramete at gmail.com>
wrote:

> First off, let me show you which non-ASCII characters I'm talking about.
>
> For instance, just type in 'Slobodan Milosevic' in Google Search and go to
> the first suggested wikipedia link.
>
> You will see that the URL contains very unusual characters that are well
> beyond the common ASCII set. I'm simply curious whether WordPress supports that.
>
> Though this is not a feature I particularly like (to say the least), I do
> confess that I find it quite interesting from an HTTP point of view.
>
> But my real question (or pain, to put it better) is this.
> Say you are scraping that data and you come across that title with those
> funny characters... and you want to create a tag out of that.
>
> Is there a conversion function that I can pass that string to and get back
> a version translated to plain ASCII (code points below 128)?
>
> So I pass in 'slobodan_milo%c5%a1evi%c4%87', and I get back the good old
> 'Slobodan Milosevic'
>
> Does such a function exist? Or how do you deal with that situation?

