<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[5417] sites/trunk/api.wordpress.org/public_html/events/1.0: Events: Refactor the ideographic fallback for use with ASCII queries too</title>
</head>
<body>
<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; }
#msg dl a { font-weight: bold}
#msg dl a:link { color:#fc3; }
#msg dl a:active { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="http://meta.trac.wordpress.org/changeset/5417">5417</a><script type="application/ld+json">{"@context":"http://schema.org","@type":"EmailMessage","description":"Review this Commit","action":{"@type":"ViewAction","url":"http://meta.trac.wordpress.org/changeset/5417","name":"Review Commit"}}</script></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>iandunn</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2017-04-27 18:11:16 +0000 (Thu, 27 Apr 2017)</dd>
</dl>
<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>Events: Refactor the ideographic fallback for use with ASCII queries too
This was originally intended for ideographic languages, but there are new edge cases where it is helpful for ASCII queries as well.
See https://github.com/coreymckrill/nearby-wordpress-events/issues/37</pre>
<h3>Modified Paths</h3>
<ul>
<li><a href="#sitestrunkapiwordpressorgpublic_htmlevents10indexphp">sites/trunk/api.wordpress.org/public_html/events/1.0/index.php</a></li>
<li><a href="#sitestrunkapiwordpressorgpublic_htmlevents10teststestindexphp">sites/trunk/api.wordpress.org/public_html/events/1.0/tests/test-index.php</a></li>
</ul>
</div>
<div id="patch">
<h3>Diff</h3>
<a id="sitestrunkapiwordpressorgpublic_htmlevents10indexphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: sites/trunk/api.wordpress.org/public_html/events/1.0/index.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- sites/trunk/api.wordpress.org/public_html/events/1.0/index.php 2017-04-26 22:46:56 UTC (rev 5416)
+++ sites/trunk/api.wordpress.org/public_html/events/1.0/index.php 2017-04-27 18:11:16 UTC (rev 5417)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -211,34 +211,39 @@
</span><span class="cx" style="display: block; padding: 0 10px"> ) );
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> if ( ! is_a( $row, 'stdClass' ) &amp;&amp; 'ASCII' !== mb_detect_encoding( $location_name ) ) {
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $row = guess_ideographic_location_from_geonames( $location_name, $country, $timezone );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $row = guess_location_from_geonames_fallback( $location_name, $country, $timezone, 'exact', 'ideographic' );
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> return $row;
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * Look for the given ideographic location in the Geonames database
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Look for the given location in the Geonames database using a LIKE query
</ins><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * This is a fallback for situations where the full-text search in `guess_location_from_geonames()` resulted
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * in a false-negative. MySQL &lt; 5.7.6 doesn't support full-text searches on ideographic languages, because
- * it cannot determine where the word boundaries are.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * in a false-negative.
</ins><span class="cx" style="display: block; padding: 0 10px"> *
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * One situation where this happens is with queries in ideographic languages, because MySQL &lt; 5.7.6 doesn't
+ * support full-text searches for them, because it can't determine where the word boundaries are.
</ins><span class="cx" style="display: block; padding: 0 10px"> * See https://dev.mysql.com/doc/refman/5.7/en/fulltext-restrictions.html
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * There are also edge cases where the exact query doesn't exist in the database, but a loose LIKE query will find
+ * a similar alternate, like `Osakashi`.
+ *
</ins><span class="cx" style="display: block; padding: 0 10px"> * @param string $location_name
</span><span class="cx" style="display: block; padding: 0 10px"> * @param string $country
</span><span class="cx" style="display: block; padding: 0 10px"> * @param string $timezone
</span><span class="cx" style="display: block; padding: 0 10px"> * @param string $mode 'exact' to only return exact matches from the database;
</span><span class="cx" style="display: block; padding: 0 10px"> * 'loose' to return any match. This has a high chance of false positives.
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * @param string $restrict_counties 'ideographic' to only search in countries where ideographic languages are common;
+ * 'none' to search all countries
</ins><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * @return stdClass|null
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-function guess_ideographic_location_from_geonames( $location_name, $country, $timezone, $mode = 'exact' ) {
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+function guess_location_from_geonames_fallback( $location_name, $country, $timezone, $mode = 'exact', $restrict_counties = 'ideographic' ) {
</ins><span class="cx" style="display: block; padding: 0 10px"> global $wpdb;
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $ideographic_countries = get_ideographic_counties();
- $ideographic_country_placeholders = get_prepare_placeholders( count( $ideographic_countries ), '%s' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $where = $ideographic_countries = $ideographic_country_placeholders = '';
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /*
</span><span class="cx" style="display: block; padding: 0 10px"> * The name is wrapped in commas in order to ensure that we're only matching the exact location, which is
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -255,6 +260,17 @@
</span><span class="cx" style="display: block; padding: 0 10px"> $wpdb->esc_like( $location_name )
</span><span class="cx" style="display: block; padding: 0 10px"> );
</span><span class="cx" style="display: block; padding: 0 10px">
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $prepare_args = array( $escaped_location_name, $country, $timezone );
+
+ if ( 'ideographic' == $restrict_counties ) {
+ $ideographic_countries = get_ideographic_counties();
+ $ideographic_country_placeholders = get_prepare_placeholders( count( $ideographic_countries ), '%s' );
+
+ $where .= "country IN ( $ideographic_country_placeholders ) AND";
+
+ $prepare_args = array_merge( $ideographic_countries, $prepare_args );
+ }
+
</ins><span class="cx" style="display: block; padding: 0 10px"> /*
</span><span class="cx" style="display: block; padding: 0 10px"> * REPLACE() is used because sometimes the `alternatenames` column contains entries where the `asciiname` is
</span><span class="cx" style="display: block; padding: 0 10px"> * prefixed to an ideographic name; for example: `,Karachi - كراچى,`
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -269,7 +285,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> SELECT name, latitude, longitude, country
</span><span class="cx" style="display: block; padding: 0 10px"> FROM `geoname`
</span><span class="cx" style="display: block; padding: 0 10px"> WHERE
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- country IN ( $ideographic_country_placeholders ) AND
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $where
</ins><span class="cx" style="display: block; padding: 0 10px"> REPLACE( alternatenames, CONCAT( asciiname, ' - ' ), '' ) LIKE %s
</span><span class="cx" style="display: block; padding: 0 10px"> ORDER BY
</span><span class="cx" style="display: block; padding: 0 10px"> FIELD( %s, country ) DESC,
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -277,10 +293,7 @@
</span><span class="cx" style="display: block; padding: 0 10px"> population DESC
</span><span class="cx" style="display: block; padding: 0 10px"> LIMIT 1";
</span><span class="cx" style="display: block; padding: 0 10px">
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- $prepared_query = $wpdb->prepare(
- $query,
- array_merge( $ideographic_countries, array( $escaped_location_name, $country, $timezone ) )
- );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ $prepared_query = $wpdb->prepare( $query, $prepare_args );
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> return $wpdb->get_row( $prepared_query );
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -443,14 +456,18 @@
</span><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /*
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- * If all else fails for a non-ASCII request, cast a wide net and try to find something before giving up, even
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * If all else fails, cast a wide net and try to find something before giving up, even
</ins><span class="cx" style="display: block; padding: 0 10px"> * if the chance of success if lower than normal. Returning false is guaranteed failure, so this improves things
</span><span class="cx" style="display: block; padding: 0 10px"> * even if it only works 10% of the time.
</span><span class="cx" style="display: block; padding: 0 10px"> *
</span><span class="cx" style="display: block; padding: 0 10px"> * This must be done as the very last thing before giving up, because the likelihood of false positives is high.
</span><span class="cx" style="display: block; padding: 0 10px"> */
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">- if ( ! $location &amp;&amp; isset( $args['location_name'] ) &amp;&amp; 'ASCII' !== mb_detect_encoding( $args['location_name'] ) ) {
- $guess = guess_ideographic_location_from_geonames( $args['location_name'], $country_code, $args['timezone'] ?? '', 'loose' );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( ! $location &amp;&amp; isset( $args['location_name'] ) ) {
+ if ( 'ASCII' === mb_detect_encoding( $args['location_name'] ) ) {
+ $guess = guess_location_from_geonames_fallback( $args['location_name'], $country_code, $args['timezone'] ?? '', 'loose', 'none' );
+ } else {
+ $guess = guess_location_from_geonames_fallback( $args['location_name'], $country_code, $args['timezone'] ?? '', 'loose', 'ideographic' );
+ }
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> if ( $guess ) {
</span><span class="cx" style="display: block; padding: 0 10px"> $location = array(
</span></span></pre></div>
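<p>The change to <code>index.php</code> above makes the country restriction optional, so the placeholder list and the argument list have to stay in step. What follows is a minimal standalone sketch of that idea, not the committed code. The names <code>sketch_placeholders()</code> and <code>sketch_fallback_query_parts()</code>, the two-country list, and the LIKE pattern are all assumptions made for illustration; the real <code>get_prepare_placeholders()</code> and <code>get_ideographic_counties()</code> are not shown in this diff. The sketch also skips the clause when the country list is empty, which is a choice made here rather than something the commit does.</p>

```php
<?php
/**
 * Hypothetical stand-in for a placeholder helper: returns "%s, %s, ..." with
 * $count entries.
 */
function sketch_placeholders( int $count, string $placeholder = '%s' ): string {
	return implode( ', ', array_fill( 0, $count, $placeholder ) );
}

/**
 * Builds the optional WHERE fragment and the matching argument list.
 *
 * Hypothetical function. It mirrors the ordering rule from the diff: when the
 * IN() clause is present it appears first in the SQL, so its values must be
 * first in the argument list.
 */
function sketch_fallback_query_parts( string $like_pattern, string $country, string $timezone, string $restrict, array $countries ): array {
	$where = '';
	$args  = array( $like_pattern, $country, $timezone );

	// Only restrict when asked to; any other value means "search all countries".
	$restrict_to = ( 'ideographic' === $restrict ) ? $countries : array();

	// Sketch-only guard: an empty list would otherwise produce "IN ( )".
	if ( $restrict_to ) {
		$where = 'country IN ( ' . sketch_placeholders( count( $restrict_to ) ) . ' ) AND';
		$args  = array_merge( $restrict_to, $args );
	}

	return array( 'where' => $where, 'args' => $args );
}

// Illustrative inputs only: the country list and LIKE pattern are made up.
$restricted   = sketch_fallback_query_parts( '%Osakashi%', 'JP', 'Asia/Tokyo', 'ideographic', array( 'CN', 'JP' ) );
$unrestricted = sketch_fallback_query_parts( '%Osakashi%', 'JP', 'Asia/Tokyo', 'none', array( 'CN', 'JP' ) );

var_dump( $restricted, $unrestricted );
```

<p>With the restriction on, the fragment carries one placeholder per country and the country codes lead the argument list. With it off, the fragment is empty and only the three base arguments remain, so the same query template can serve both cases.</p>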
<a id="sitestrunkapiwordpressorgpublic_htmlevents10teststestindexphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: sites/trunk/api.wordpress.org/public_html/events/1.0/tests/test-index.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- sites/trunk/api.wordpress.org/public_html/events/1.0/tests/test-index.php 2017-04-26 22:46:56 UTC (rev 5416)
+++ sites/trunk/api.wordpress.org/public_html/events/1.0/tests/test-index.php 2017-04-27 18:11:16 UTC (rev 5417)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -269,6 +269,20 @@
</span><span class="cx" style="display: block; padding: 0 10px"> ),
</span><span class="cx" style="display: block; padding: 0 10px"> ),
</span><span class="cx" style="display: block; padding: 0 10px">
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ // Users will often type them without the dash, bypassing an exact match
+ 'city-with-dashes-in-formal-name' => array(
+ 'input' => array(
+ 'location_name' => 'Osakashi',
+ 'locale' => 'ja',
+ 'timezone' => 'Asia/Tokyo',
+ ),
+ 'expected' => array(
+ 'description' => 'osaka',
+ 'latitude' => '34.694',
+ 'longitude' => '135.502',
+ 'country' => 'JP',
+ ),
+ ),
</ins><span class="cx" style="display: block; padding: 0 10px">
</span><span class="cx" style="display: block; padding: 0 10px"> /*
</span><span class="cx" style="display: block; padding: 0 10px"> * The city endonym, locale, and timezone are given
</span></span></pre>
</div>
</div>
</body>
</html>