<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[5101] sites/trunk/api.wordpress.org/public_html/events/1.0: Events API: Extract country IDs from multi-word inputs to improve matching</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta" style="font-size: 105%">
<dt style="float: left; width: 6em; font-weight: bold">Revision</dt> <dd><a style="font-weight: bold" href="http://meta.trac.wordpress.org/changeset/5101">5101</a></dd>
<dt style="float: left; width: 6em; font-weight: bold">Author</dt> <dd>iandunn</dd>
<dt style="float: left; width: 6em; font-weight: bold">Date</dt> <dd>2017-03-07 01:28:40 +0000 (Tue, 07 Mar 2017)</dd>
</dl>

<pre style='padding-left: 1em; margin: 2em 0; border-left: 2px solid #ccc; line-height: 1.25; font-size: 105%; font-family: sans-serif'>Events API: Extract country IDs from multi-word inputs to improve matching

This enables matching country codes in inputs like "GB" and "London GB", as well as country names of varying length, regardless of the length of the city name, in inputs like "Vancouver Canada", "Santiago De Los Caballeros, Dominican Republic", and "Kaga-Bandoro, Central African Republic".</pre>
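
<p>The trailing-word matching described in the log message can be sketched roughly as follows. This is a minimal illustration, not the committed implementation: <code>sketch_guess_country()</code> and its two in-memory lookup tables are hypothetical stand-ins for the database-backed <code>get_country_code_from_name()</code> and <code>get_valid_country_codes()</code>, and the lowercase name comparison is an assumption about how names are matched.</p>

<pre>
&lt;?php
/**
 * Hypothetical sketch: guess a country code from free-form location input.
 *
 * @param string $location_name
 *
 * @return false|string false on failure; a country code on success
 */
function sketch_guess_country( $location_name ) {
	// Assumed sample data; the real code reads names and codes from the database.
	$country_names = array(
		'canada'                   => 'CA',
		'dominican republic'       => 'DO',
		'central african republic' => 'CF',
	);
	$country_codes = array( 'CA', 'DO', 'CF', 'GB', 'BI' );

	// Split on whitespace so the word count and the parts always agree.
	$parts = preg_split( '/\s+/', trim( $location_name ), -1, PREG_SPLIT_NO_EMPTY );
	$count = count( $parts );

	// Try the whole input first, then the last 1, 2, and 3 words.
	$candidates = array( implode( ' ', $parts ) );
	foreach ( array( 1, 2, 3 ) as $length ) {
		if ( $count > $length ) {
			$candidates[] = implode( ' ', array_slice( $parts, -$length ) );
		}
	}

	foreach ( $candidates as $candidate ) {
		$key = strtolower( $candidate );

		// Country name, e.g. "Canada" or "Dominican Republic".
		if ( isset( $country_names[ $key ] ) ) {
			return $country_names[ $key ];
		}

		// Country code, e.g. "GB"; compared case-sensitively.
		if ( in_array( $candidate, $country_codes, true ) ) {
			return $candidate;
		}
	}

	return false;
}
</pre>

<p>Under these assumptions, <code>sketch_guess_country( 'London GB' )</code> returns <code>'GB'</code>, <code>sketch_guess_country( 'Vancouver Canada' )</code> returns <code>'CA'</code>, and an input with no recognizable country returns <code>false</code>.</p>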

<h3>Modified Paths</h3>
<ul>
<li><a href="#sitestrunkapiwordpressorgpublic_htmlevents10indexphp">sites/trunk/api.wordpress.org/public_html/events/1.0/index.php</a></li>
<li><a href="#sitestrunkapiwordpressorgpublic_htmlevents10teststestindexphp">sites/trunk/api.wordpress.org/public_html/events/1.0/tests/test-index.php</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="sitestrunkapiwordpressorgpublic_htmlevents10indexphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: sites/trunk/api.wordpress.org/public_html/events/1.0/index.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- sites/trunk/api.wordpress.org/public_html/events/1.0/index.php    2017-03-07 01:28:35 UTC (rev 5100)
+++ sites/trunk/api.wordpress.org/public_html/events/1.0/index.php      2017-03-07 01:28:40 UTC (rev 5101)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -270,26 +270,83 @@
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><span class="cx" style="display: block; padding: 0 10px">  * Guess the location based on a country identifier inside the given input
</span><span class="cx" style="display: block; padding: 0 10px">  *
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * This isn't perfect because some of the country names in the database are in a format that regular
+ * people wouldn't type -- e.g., "Venezuela, Bolivarian Republic Of" -- but this will still match a
+ * majority of them.
+ *
+ * Currently, this only works with English names because that's the only data we have.
+ *
</ins><span class="cx" style="display: block; padding: 0 10px">  * @param string $location_name
</span><span class="cx" style="display: block; padding: 0 10px">  *
</span><span class="cx" style="display: block; padding: 0 10px">  * @return false|string false on failure; a country code on success
</span><span class="cx" style="display: block; padding: 0 10px">  */
</span><span class="cx" style="display: block; padding: 0 10px"> function guess_location_from_country( $location_name ) {
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+        // Check if they entered only the country name, e.g. "Germany" or "New Zealand"
+       $country_code = get_country_code_from_name( $location_name );
+       $location_word_count = str_word_count( $location_name );
+       $location_name_parts = explode( ' ', $location_name );
+
+       // Check if they entered only the country code, e.g., "GB"
+       if ( ! $country_code ) {
+               $valid_country_codes = get_valid_country_codes();
+
+               if ( in_array( $location_name, $valid_country_codes, true ) ) {
+                       $country_code = $location_name;
+               }
+       }
+
</ins><span class="cx" style="display: block; padding: 0 10px">         /*
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-         * Check if they entered only the country name, e.g. "Germany" or "New Zealand"
-        *
-        * This isn't perfect because some of the country names in the database are in a format that regular
-        * people wouldn't type -- e.g., "Venezuela, Bolivarian Republic Of" -- but this will still match a
-        * majority of them.
-        *
-        * Currently, this only works with English names because that's the only data we have.
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+  * Multi-word queries may contain cities, regions, and countries, so try to extract just the country
</ins><span class="cx" style="display: block; padding: 0 10px">          */
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        $location_country_code = get_country_code_from_name( $location_name );
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ if ( ! $country_code && $location_word_count >= 2 ) {
+               // Catch input like "Vancouver Canada"
+               $country_id   = $location_name_parts[ $location_word_count - 1 ];
+               $country_code = get_country_code_from_name( $country_id );
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><del style="background-color: #fdd; text-decoration:none; display:block; padding: 0 10px">-        return $location_country_code;
</del><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+         // Catch input like "London GB"
+               if ( ! $country_code ) {
+                       if ( in_array( $country_id, $valid_country_codes, true ) ) {
+                               $country_code = $country_id;
+                       }
+               }
+       }
+
+       if ( ! $country_code && $location_word_count >= 3 ) {
+               // Catch input like "Santiago De Los Caballeros, Dominican Republic"
+               $country_name = sprintf(
+                       '%s %s',
+                       $location_name_parts[ $location_word_count - 2 ],
+                       $location_name_parts[ $location_word_count - 1 ]
+               );
+               $country_code = get_country_code_from_name( $country_name );
+       }
+
+       if ( ! $country_code && $location_word_count >= 4 ) {
+               // Catch input like "Kaga-Bandoro, Central African Republic"
+               $country_name = sprintf(
+                       '%s %s %s',
+                       $location_name_parts[ $location_word_count - 3 ],
+                       $location_name_parts[ $location_word_count - 2 ],
+                       $location_name_parts[ $location_word_count - 1 ]
+               );
+               $country_code = get_country_code_from_name( $country_name );
+       }
+
+       return $country_code;
</ins><span class="cx" style="display: block; padding: 0 10px"> }
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px"> /**
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+ * Get a list of valid country codes
+ *
+ * @return array
+ */
+function get_valid_country_codes() {
+       global $wpdb;
+
+       return $wpdb->get_col( "SELECT DISTINCT country FROM geoname" );
+}
+
+/**
</ins><span class="cx" style="display: block; padding: 0 10px">  * Get the country code that corresponds to the given country name
</span><span class="cx" style="display: block; padding: 0 10px">  *
</span><span class="cx" style="display: block; padding: 0 10px">  * @param string $country_name
</span></span></pre></div>
<a id="sitestrunkapiwordpressorgpublic_htmlevents10teststestindexphp"></a>
<div class="modfile"><h4 style="background-color: #eee; color: inherit; margin: 1em 0; padding: 1.3em; font-size: 115%">Modified: sites/trunk/api.wordpress.org/public_html/events/1.0/tests/test-index.php</h4>
<pre class="diff"><span>
<span class="info" style="display: block; padding: 0 10px; color: #888">--- sites/trunk/api.wordpress.org/public_html/events/1.0/tests/test-index.php 2017-03-07 01:28:35 UTC (rev 5100)
+++ sites/trunk/api.wordpress.org/public_html/events/1.0/tests/test-index.php   2017-03-07 01:28:40 UTC (rev 5101)
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -426,6 +426,8 @@
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><span class="cx" style="display: block; padding: 0 10px">                /*
</span><span class="cx" style="display: block; padding: 0 10px">                 * A combination of city, region, and country are given, along with the locale and timezone
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                 *
+                * InvalidCity is used in tests that want to bypass the guess_location_from_city() tests and only test the country
</ins><span class="cx" style="display: block; padding: 0 10px">                  */
</span><span class="cx" style="display: block; padding: 0 10px">                '1-word-city-region' => array(
</span><span class="cx" style="display: block; padding: 0 10px">                        'input' => array(
</span><span class="lines" style="display: block; padding: 0 10px; color: #888">@@ -455,7 +457,62 @@
</span><span class="cx" style="display: block; padding: 0 10px">                        ),
</span><span class="cx" style="display: block; padding: 0 10px">                ),
</span><span class="cx" style="display: block; padding: 0 10px"> 
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                'city-1-word-country' => array(
+                       'input' => array(
+                               'location_name' => 'InvalidCity Canada',
+                               'locale'        => 'en_CA',
+                               'timezone'      => 'America/Vancouver',
+                       ),
+                       'expected' => array(
+                               'country' => 'CA',
+                       ),
+               ),
</ins><span class="cx" style="display: block; padding: 0 10px"> 
</span><ins style="background-color: #dfd; text-decoration:none; display:block; padding: 0 10px">+                'city-2-word-country' => array(
+                       'input' => array(
+                               'location_name' => 'InvalidCity Dominican Republic',
+                               'locale'        => 'es_ES',
+                               'timezone'      => 'America/Santo_Domingo',
+                       ),
+                       'expected' => array(
+                               'country' => 'DO',
+                       ),
+               ),
+
+               'city-3-word-country' => array(
+                       'input' => array(
+                               'location_name' => 'InvalidCity Central African Republic',
+                               'locale'        => 'fr_FR',
+                               'timezone'      => 'Africa/Bangui',
+                       ),
+                       'expected' => array(
+                               'country' => 'CF',
+                       ),
+               ),
+
+               'country-code' => array(
+                       'input' => array(
+                               'location_name' => 'GB',
+                               'locale'        => 'en_GB',
+                               'timezone'      => 'Europe/London',
+                       ),
+                       'expected' => array(
+                               'country' => 'GB',
+                       ),
+               ),
+
+               'city-country-code' => array(
+                       'input' => array(
+                               'location_name' => 'InvalidCity BI',
+                               'locale'        => 'fr_FR',
+                               'timezone'      => 'Africa/Bujumbura',
+                       ),
+                       'expected' => array(
+                               'country' => 'BI',
+                       ),
+               ),
+
+
</ins><span class="cx" style="display: block; padding: 0 10px">                 /*
</span><span class="cx" style="display: block; padding: 0 10px">                 * Only the IP is given
</span><span class="cx" style="display: block; padding: 0 10px">                 */
</span></span></pre>
</div>
</div>

</body>
</html>