<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[Tests][915] trunk: Make some sense of the original formatting tests from [121].</title>
</head>
<body>
<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; }
#msg dl a { font-weight: bold}
#msg dl a:link { color:#fc3; }
#msg dl a:active { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg > ul, #logmsg > ol { margin-left: 0; margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://unit-tests.trac.wordpress.org/changeset/915">915</a></dd>
<dt>Author</dt> <dd>nacin</dd>
<dt>Date</dt> <dd>2012-07-19 14:41:52 +0000 (Thu, 19 Jul 2012)</dd>
</dl>
<h3>Log Message</h3>
<pre>Make some sense of the original formatting tests from <a href="http://unit-tests.trac.wordpress.org/changeset/121">[121]</a>.
* Dissolve the old directories, moving data into /data/formatting/ and tests into /tests/formatting/.
* Remove the old formatting testcase, instead using straight file() calls functioning as PHPUnit data providers.
* Bring back the tests for the "funky javascript fix" in the form of testing _convert_urlencoded_to_entities(), removed in <a href="http://unit-tests.trac.wordpress.org/changeset/403">[403]</a>.
see <a href="http://unit-tests.trac.wordpress.org/ticket/42">#42</a>, see <a href="http://unit-tests.trac.wordpress.org/ticket/12">#12</a>.</pre>
<h3>Modified Paths</h3>
<ul>
<li><a href="#trunktestsformattingRemoveAccentsphp">trunk/tests/formatting/RemoveAccents.php</a></li>
</ul>
<h3>Added Paths</h3>
<ul>
<li><a href="#trunkdataformattingbig5txt">trunk/data/formatting/big5.txt</a></li>
<li><a href="#trunkdataformattingentitiestxt">trunk/data/formatting/entities.txt</a></li>
<li>trunk/data/formatting/utf-8/</li>
<li><a href="#trunkdataformattingutf8README">trunk/data/formatting/utf-8/README</a></li>
<li><a href="#trunkdataformattingutf8entitizepy">trunk/data/formatting/utf-8/entitize.py</a></li>
<li><a href="#trunkdataformattingutf8entitizedtxt">trunk/data/formatting/utf-8/entitized.txt</a></li>
<li><a href="#trunkdataformattingutf8uurlencodepy">trunk/data/formatting/utf-8/u-urlencode.py</a></li>
<li><a href="#trunkdataformattingutf8uurlencodedtxt">trunk/data/formatting/utf-8/u-urlencoded.txt</a></li>
<li><a href="#trunkdataformattingutf8urlencodepy">trunk/data/formatting/utf-8/urlencode.py</a></li>
<li><a href="#trunkdataformattingutf8urlencodedtxt">trunk/data/formatting/utf-8/urlencoded.txt</a></li>
<li><a href="#trunkdataformattingutf8utf8txt">trunk/data/formatting/utf-8/utf-8.txt</a></li>
<li><a href="#trunkdataformattingwindows1252py">trunk/data/formatting/windows1252.py</a></li>
<li><a href="#trunktestsformattingSeemsUtf8php">trunk/tests/formatting/SeemsUtf8.php</a></li>
<li><a href="#trunktestsformattingUrlEncodedToEntitiesphp">trunk/tests/formatting/UrlEncodedToEntities.php</a></li>
<li><a href="#trunktestsformattingUtf8UriEncodephp">trunk/tests/formatting/Utf8UriEncode.php</a></li>
<li><a href="#trunktestsformattingent2ncrphp">trunk/tests/formatting/ent2ncr.php</a></li>
<li><a href="#trunktestsformattingisoDescramblerphp">trunk/tests/formatting/isoDescrambler.php</a></li>
<li><a href="#trunktestsqueryverboseRewriteRulesphp">trunk/tests/query/verboseRewriteRules.php</a></li>
</ul>
<h3>Removed Paths</h3>
<ul>
<li>trunk/data/jacob/</li>
<li>trunk/tests/jacob/</li>
</ul>
</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkdataformattingbig5txtfromrev909trunkdatajacobtest_big5txt"></a>
<div class="copfile"><h4>Copied: trunk/data/formatting/big5.txt (from rev 909, trunk/data/jacob/test_big5.txt) (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/data/formatting/big5.txt         (rev 0)
+++ trunk/data/formatting/big5.txt        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,51 @@
</span><ins>+?lmDwgn H@~|Q
+
+?lDg
+
+H@
+
+DiDAD`DCWiWAD`WCLAW?alQAWUC
+G`LAH[?F`AH[uC?APX?WAP?
+?C?S?AC
+
+HG
+
+?U?AcoQ?AoCGL??A
+?Au??AU?gAn?MAeHCOHtHBuL
+voAuv?CU@j?A??A?A\
+?~C??~AOHhC
+
+HT
+
+|A?QQofA?sQiA??
+COHutHvvA?A?Az?Aj?C`?LL
+C???]CuLvAhLvC
+
+h|
+
+uDvRA????CWAUvQUA?gAM
+AP?Q?FsC^?lHH?C
+
+H
+
+?aAHU?QtHAH?m?C?aAS
+GH?}A??UXCh?aApuC
+
+H
+
+AO??gC??AO??a?CYsA??C
+
+HC
+
+?a[C?a?HB[?AH???AG[COHtH
+?A~??sCDHLpHG?pC
+
+HK
+
+WYCQU?CBH?cAGXDC~aA?
+WAPAHAFvA?A?gC??AGL?C
+
+HE
+
+??ApwQ?UAiOC?AuQIQ
+zA?SC\EhA?DC
</ins></span></pre></div>
<a id="trunkdataformattingentitiestxtfromrev909trunkdatajacobentitiestxt"></a>
<div class="copfile"><h4>Copied: trunk/data/formatting/entities.txt (from rev 909, trunk/data/jacob/entities.txt) (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/data/formatting/entities.txt         (rev 0)
+++ trunk/data/formatting/entities.txt        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,255 @@
</span><ins>+### Named HTML character entities, their numeric reference
+### (e.g. for &#[0-9]+; entity form), and their use.
+### From: http://www.w3.org/TR/html401/sgml/entities.html
+###
+nbsp        | 160        ### no-break space
+iexcl        | 161        ### inverted exclamation mark
+cent        | 162        ### cent sign
+pound        | 163        ### pound sterling sign
+curren        | 164        ### general currency sign
+yen         | 165        ### yen sign
+brvbar        | 166        ### broken (vertical) bar
+sect        | 167        ### section sign
+uml         | 168        ### umlaut (dieresis)
+copy        | 169        ### copyright sign
+ordf        | 170        ### ordinal indicator, feminine
+laquo        | 171        ### angle quotation mark, left
+not         | 172        ### not sign
+shy         | 173        ### soft hyphen
+reg         | 174        ### registered sign
+macr        | 175        ### macron
+deg         | 176        ### degree sign
+plusmn        | 177        ### plus-or-minus sign
+sup2        | 178        ### superscript two
+sup3        | 179        ### superscript three
+acute        | 180        ### acute accent
+micro        | 181        ### micro sign
+para        | 182        ### pilcrow (paragraph sign)
+middot        | 183        ### middle dot
+cedil        | 184        ### cedilla
+sup1        | 185        ### superscript one
+ordm        | 186        ### ordinal indicator, masculine
+raquo        | 187        ### angle quotation mark, right
+frac14        | 188        ### fraction one-quarter
+frac12        | 189        ### fraction one-half
+frac34        | 190        ### fraction three-quarters
+iquest        | 191        ### inverted question mark
+Agrave        | 192        ### capital A, grave accent
+Aacute        | 193        ### capital A, acute accent
+Acirc        | 194        ### capital A, circumflex accent
+Atilde        | 195        ### capital A, tilde
+Auml        | 196        ### capital A, dieresis or umlaut mark
+Aring        | 197        ### capital A, ring
+AElig        | 198        ### capital AE diphthong (ligature)
+Ccedil        | 199        ### capital C, cedilla
+Egrave        | 200        ### capital E, grave accent
+Eacute        | 201        ### capital E, acute accent
+Ecirc        | 202        ### capital E, circumflex accent
+Euml        | 203        ### capital E, dieresis or umlaut mark
+Igrave        | 204        ### capital I, grave accent
+Iacute        | 205        ### capital I, acute accent
+Icirc        | 206        ### capital I, circumflex accent
+Iuml        | 207        ### capital I, dieresis or umlaut mark
+ETH         | 208        ### capital Eth, Icelandic
+Ntilde        | 209        ### capital N, tilde
+Ograve        | 210        ### capital O, grave accent
+Oacute        | 211        ### capital O, acute accent
+Ocirc        | 212        ### capital O, circumflex accent
+Otilde        | 213        ### capital O, tilde
+Ouml        | 214        ### capital O, dieresis or umlaut mark
+times        | 215        ### multiply sign
+Oslash        | 216        ### capital O, slash
+Ugrave        | 217        ### capital U, grave accent
+Uacute        | 218        ### capital U, acute accent
+Ucirc        | 219        ### capital U, circumflex accent
+Uuml        | 220        ### capital U, dieresis or umlaut mark
+Yacute        | 221        ### capital Y, acute accent
+THORN        | 222        ### capital THORN, Icelandic
+szlig        | 223        ### small sharp s, German (sz ligature)
+agrave        | 224        ### small a, grave accent
+aacute        | 225        ### small a, acute accent
+acirc        | 226        ### small a, circumflex accent
+atilde        | 227        ### small a, tilde
+auml        | 228        ### small a, dieresis or umlaut mark
+aring        | 229        ### small a, ring
+aelig        | 230        ### small ae diphthong (ligature)
+ccedil        | 231        ### small c, cedilla
+egrave        | 232        ### small e, grave accent
+eacute        | 233        ### small e, acute accent
+ecirc        | 234        ### small e, circumflex accent
+euml        | 235        ### small e, dieresis or umlaut mark
+igrave        | 236        ### small i, grave accent
+iacute        | 237        ### small i, acute accent
+icirc        | 238        ### small i, circumflex accent
+iuml        | 239        ### small i, dieresis or umlaut mark
+eth         | 240        ### small eth, Icelandic
+ntilde        | 241        ### small n, tilde
+ograve        | 242        ### small o, grave accent
+oacute        | 243        ### small o, acute accent
+ocirc        | 244        ### small o, circumflex accent
+otilde        | 245        ### small o, tilde
+ouml        | 246        ### small o, dieresis or umlaut mark
+divide        | 247        ### divide sign
+oslash        | 248        ### small o, slash
+ugrave        | 249        ### small u, grave accent
+uacute        | 250        ### small u, acute accent
+ucirc        | 251        ### small u, circumflex accent
+uuml        | 252        ### small u, dieresis or umlaut mark
+yacute        | 253        ### small y, acute accent
+thorn        | 254        ### small thorn, Icelandic
+yuml        | 255        ### small y, dieresis or umlaut mark
+fnof        | 402        ### latin small f with hook, =function, =florin, u+0192 ISOtech
+Alpha        | 913        ### greek capital letter alpha, u+0391
+Beta        | 914        ### greek capital letter beta, u+0392
+Gamma        | 915        ### greek capital letter gamma, u+0393 ISOgrk3
+Delta        | 916        ### greek capital letter delta, u+0394 ISOgrk3
+Epsilon        | 917        ### greek capital letter epsilon, u+0395
+Zeta        | 918        ### greek capital letter zeta, u+0396
+Eta         | 919        ### greek capital letter eta, u+0397
+Theta        | 920        ### greek capital letter theta, u+0398 ISOgrk3
+Iota        | 921        ### greek capital letter iota, u+0399
+Kappa        | 922        ### greek capital letter kappa, u+039A
+Lambda        | 923        ### greek capital letter lambda, u+039B ISOgrk3
+Mu         | 924        ### greek capital letter mu, u+039C
+Nu         | 925        ### greek capital letter nu, u+039D
+Xi         | 926        ### greek capital letter xi, u+039E ISOgrk3
+Omicron        | 927        ### greek capital letter omicron, u+039F
+Pi         | 928        ### greek capital letter pi, u+03A0 ISOgrk3
+Rho         | 929        ### greek capital letter rho, u+03A1
+Sigma        | 931        ### greek capital letter sigma, u+03A3 ISOgrk3
+Tau         | 932        ### greek capital letter tau, u+03A4
+Upsilon        | 933        ### greek capital letter upsilon, u+03A5 ISOgrk3
+Phi         | 934        ### greek capital letter phi, u+03A6 ISOgrk3
+Chi         | 935        ### greek capital letter chi, u+03A7
+Psi         | 936        ### greek capital letter psi, u+03A8 ISOgrk3
+Omega        | 937        ### greek capital letter omega, u+03A9 ISOgrk3
+alpha        | 945        ### greek small letter alpha, u+03B1 ISOgrk3
+beta        | 946        ### greek small letter beta, u+03B2 ISOgrk3
+gamma        | 947        ### greek small letter gamma, u+03B3 ISOgrk3
+delta        | 948        ### greek small letter delta, u+03B4 ISOgrk3
+epsilon        | 949        ### greek small letter epsilon, u+03B5 ISOgrk3
+zeta        | 950        ### greek small letter zeta, u+03B6 ISOgrk3
+eta         | 951        ### greek small letter eta, u+03B7 ISOgrk3
+theta        | 952        ### greek small letter theta, u+03B8 ISOgrk3
+iota        | 953        ### greek small letter iota, u+03B9 ISOgrk3
+kappa        | 954        ### greek small letter kappa, u+03BA ISOgrk3
+lambda        | 955        ### greek small letter lambda, u+03BB ISOgrk3
+mu         | 956        ### greek small letter mu, u+03BC ISOgrk3
+nu         | 957        ### greek small letter nu, u+03BD ISOgrk3
+xi         | 958        ### greek small letter xi, u+03BE ISOgrk3
+omicron        | 959        ### greek small letter omicron, u+03BF NEW
+pi         | 960        ### greek small letter pi, u+03C0 ISOgrk3
+rho         | 961        ### greek small letter rho, u+03C1 ISOgrk3
+sigmaf        | 962        ### greek small letter final sigma, u+03C2 ISOgrk3
+sigma        | 963        ### greek small letter sigma, u+03C3 ISOgrk3
+tau         | 964        ### greek small letter tau, u+03C4 ISOgrk3
+upsilon        | 965        ### greek small letter upsilon, u+03C5 ISOgrk3
+phi         | 966        ### greek small letter phi, u+03C6 ISOgrk3
+chi         | 967        ### greek small letter chi, u+03C7 ISOgrk3
+psi         | 968        ### greek small letter psi, u+03C8 ISOgrk3
+omega        | 969        ### greek small letter omega, u+03C9 ISOgrk3
+thetasym| 977        ### greek small letter theta symbol, u+03D1 NEW
+upsih        | 978        ### greek upsilon with hook symbol, u+03D2 NEW
+piv         | 982        ### greek pi symbol, u+03D6 ISOgrk3
+bull        | 8226        ### bullet, =black small circle, u+2022 ISOpub
+hellip        | 8230        ### horizontal ellipsis, =three dot leader, u+2026 ISOpub
+prime        | 8242        ### prime, =minutes, =feet, u+2032 ISOtech
+Prime        | 8243        ### double prime, =seconds, =inches, u+2033 ISOtech
+oline        | 8254        ### overline, =spacing overscore, u+203E NEW
+frasl        | 8260        ### fraction slash, u+2044 NEW
+weierp        | 8472        ### script capital P, =power set, =Weierstrass p, u+2118 ISOamso
+image        | 8465        ### blackletter capital I, =imaginary part, u+2111 ISOamso
+real        | 8476        ### blackletter capital R, =real part symbol, u+211C ISOamso
+trade        | 8482        ### trade mark sign, u+2122 ISOnum
+alefsym        | 8501        ### alef symbol, =first transfinite cardinal, u+2135 NEW
+larr        | 8592        ### leftwards arrow, u+2190 ISOnum
+uarr        | 8593        ### upwards arrow, u+2191 ISOnum
+rarr        | 8594        ### rightwards arrow, u+2192 ISOnum
+darr        | 8595        ### downwards arrow, u+2193 ISOnum
+harr        | 8596        ### left right arrow, u+2194 ISOamsa
+crarr        | 8629        ### downwards arrow with corner leftwards, =carriage return, u+21B5 NEW
+lArr        | 8656        ### leftwards double arrow, u+21D0 ISOtech
+uArr        | 8657        ### upwards double arrow, u+21D1 ISOamsa
+rArr        | 8658        ### rightwards double arrow, u+21D2 ISOtech
+dArr        | 8659        ### downwards double arrow, u+21D3 ISOamsa
+hArr        | 8660        ### left right double arrow, u+21D4 ISOamsa
+forall        | 8704        ### for all, u+2200 ISOtech
+part        | 8706        ### partial differential, u+2202 ISOtech
+exist        | 8707        ### there exists, u+2203 ISOtech
+empty        | 8709        ### empty set, =null set, =diameter, u+2205 ISOamso
+nabla        | 8711        ### nabla, =backward difference, u+2207 ISOtech
+isin        | 8712        ### element of, u+2208 ISOtech
+notin        | 8713        ### not an element of, u+2209 ISOtech
+ni         | 8715        ### contains as member, u+220B ISOtech
+prod        | 8719        ### n-ary product, =product sign, u+220F ISOamsb
+sum         | 8721        ### n-ary sumation, u+2211 ISOamsb
+minus        | 8722        ### minus sign, u+2212 ISOtech
+lowast        | 8727        ### asterisk operator, u+2217 ISOtech
+radic        | 8730        ### square root, =radical sign, u+221A ISOtech
+prop        | 8733        ### proportional to, u+221D ISOtech
+infin        | 8734        ### infinity, u+221E ISOtech
+ang         | 8736        ### angle, u+2220 ISOamso
+and         | 8743        ### logical and, =wedge, u+2227 ISOtech
+or         | 8744        ### logical or, =vee, u+2228 ISOtech
+cap         | 8745        ### intersection, =cap, u+2229 ISOtech
+cup         | 8746        ### union, =cup, u+222A ISOtech
+int         | 8747        ### integral, u+222B ISOtech
+there4        | 8756        ### therefore, u+2234 ISOtech
+sim         | 8764        ### tilde operator, =varies with, =similar to, u+223C ISOtech
+cong        | 8773        ### approximately equal to, u+2245 ISOtech
+asymp        | 8776        ### almost equal to, =asymptotic to, u+2248 ISOamsr
+ne         | 8800        ### not equal to, u+2260 ISOtech
+equiv        | 8801        ### identical to, u+2261 ISOtech
+le         | 8804        ### less-than or equal to, u+2264 ISOtech
+ge         | 8805        ### greater-than or equal to, u+2265 ISOtech
+sub         | 8834        ### subset of, u+2282 ISOtech
+sup         | 8835        ### superset of, u+2283 ISOtech
+nsub        | 8836        ### not a subset of, u+2284 ISOamsn
+sube        | 8838        ### subset of or equal to, u+2286 ISOtech
+supe        | 8839        ### superset of or equal to, u+2287 ISOtech
+oplus        | 8853        ### circled plus, =direct sum, u+2295 ISOamsb
+otimes        | 8855        ### circled times, =vector product, u+2297 ISOamsb
+perp        | 8869        ### up tack, =orthogonal to, =perpendicular, u+22A5 ISOtech
+sdot        | 8901        ### dot operator, u+22C5 ISOamsb
+lceil        | 8968        ### left ceiling, =apl upstile, u+2308, ISOamsc
+rceil        | 8969        ### right ceiling, u+2309, ISOamsc
+lfloor        | 8970        ### left floor, =apl downstile, u+230A, ISOamsc
+rfloor        | 8971        ### right floor, u+230B, ISOamsc
+lang        | 9001        ### left-pointing angle bracket, =bra, u+2329 ISOtech
+rang        | 9002        ### right-pointing angle bracket, =ket, u+232A ISOtech
+loz         | 9674        ### lozenge, u+25CA ISOpub
+spades        | 9824        ### black spade suit, u+2660 ISOpub
+clubs        | 9827        ### black club suit, =shamrock, u+2663 ISOpub
+hearts        | 9829        ### black heart suit, =valentine, u+2665 ISOpub
+diams        | 9830        ### black diamond suit, u+2666 ISOpub
+quot        | 34        ### quotation mark, =apl quote, u+0022 ISOnum
+amp         | 38        ### ampersand, u+0026 ISOnum
+lt         | 60        ### less-than sign, u+003C ISOnum
+gt         | 62        ### greater-than sign, u+003E ISOnum
+OElig        | 338        ### latin capital ligature oe, u+0152 ISOlat2
+oelig        | 339        ### latin small ligature oe, u+0153 ISOlat2
+Scaron        | 352        ### latin capital letter s with caron, u+0160 ISOlat2
+scaron        | 353        ### latin small letter s with caron, u+0161 ISOlat2
+Yuml        | 376        ### latin capital letter y with diaeresis, u+0178 ISOlat2
+circ        | 710        ### modifier letter circumflex accent, u+02C6 ISOpub
+tilde        | 732        ### small tilde, u+02DC ISOdia
+ensp        | 8194        ### en space, u+2002 ISOpub
+emsp        | 8195        ### em space, u+2003 ISOpub
+thinsp        | 8201        ### thin space, u+2009 ISOpub
+zwnj        | 8204        ### zero width non-joiner, u+200C NEW RFC 2070
+zwj         | 8205        ### zero width joiner, u+200D NEW RFC 2070
+lrm         | 8206        ### left-to-right mark, u+200E NEW RFC 2070
+rlm         | 8207        ### right-to-left mark, u+200F NEW RFC 2070
+ndash        | 8211        ### en dash, u+2013 ISOpub
+mdash        | 8212        ### em dash, u+2014 ISOpub
+lsquo        | 8216        ### left single quotation mark, u+2018 ISOnum
+rsquo        | 8217        ### right single quotation mark, u+2019 ISOnum
+sbquo        | 8218        ### single low-9 quotation mark, u+201A NEW
+ldquo        | 8220        ### left double quotation mark, u+201C ISOnum
+rdquo        | 8221        ### right double quotation mark, u+201D ISOnum
+bdquo        | 8222        ### double low-9 quotation mark, u+201E NEW
+dagger        | 8224        ### dagger, u+2020 ISOpub
+Dagger        | 8225        ### double dagger, u+2021 ISOpub
+permil        | 8240        ### per mille sign, u+2030 ISOtech
+lsaquo        | 8249        ### single left-pointing angle quotation mark; proposed but not yet standardised
+rsaquo        | 8250        ### single right-pointing angle quotation mark; proposed but not yet standardised
</ins></span></pre></div>
<a id="trunkdataformattingutf8READMEfromrev909trunkdatajacobREADME"></a>
<div class="copfile"><h4>Copied: trunk/data/formatting/utf-8/README (from rev 909, trunk/data/jacob/README) (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/data/formatting/utf-8/README         (rev 0)
+++ trunk/data/formatting/utf-8/README        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,15 @@
</span><ins>+The Python scripts are for generating test data, because Python's Unicode
+support is much, much, much, much better than PHP's.
+
+ * `utf-8/urlencode.py`, `utf-8/u-urlencode.py` and `utf-8/entitize.py` process UTF-8
+ into a few different formats (%-encoding, %u-encoding, &#decimal;)
+ and are used like normal UNIXy pipes.
+
+ Try:
+
+ `python urlencode.py < utf-8.txt > urlencoded.txt`
+ `python u-urlencode.py < utf-8.txt > u-urlencoded.txt`
+ `python entitize.py < utf-8.txt > entitized.txt`
+
+ * `windows-1252.py` converts Windows-only smart-quotes and things
+ into their unicode &#decimal reference; equivalents.
</ins></span></pre></div>
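<p>As a rough illustration of the three output formats the README lists (%-encoding, %u-encoding, &#decimal;), here is a minimal Python 3 sketch. It is not the committed code: the scripts in this changeset target Python 2, and the byte loop in <code>urlencode</code> below is a simplified stand-in for the <code>urllib.quote</code> plus regex approach in urlencode.py, assuming that ASCII bytes pass through unchanged.</p>

```python
def entitize(text):
    """Convert every character to a decimal &#N; reference."""
    return "".join("&#%d;" % ord(ch) for ch in text.strip())


def uurlencode(text):
    """Convert every character to the non-standard %uXXXX form."""
    return "".join("%%u%04X" % ord(ch) for ch in text.strip())


def urlencode(text):
    """Percent-encode the UTF-8 bytes of non-ASCII characters (lowercase hex).

    Assumption: ASCII bytes, including spaces, are left untouched, which is
    what the lines in urlencoded.txt suggest.
    """
    out = []
    for byte in text.strip().encode("utf-8"):
        if byte < 128:
            out.append(chr(byte))
        else:
            out.append("%%%02x" % byte)
    return "".join(out)


if __name__ == "__main__":
    sample = "François Truffaut"
    print(entitize(sample))
    print(uurlencode(sample))
    print(urlencode(sample))
```

<p>Each function takes one line of utf-8.txt and returns the matching line of entitized.txt, u-urlencoded.txt or urlencoded.txt respectively.</p>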
<a id="trunkdataformattingutf8entitizepyfromrev909trunkdatajacobentitizepy"></a>
<div class="copfile"><h4>Copied: trunk/data/formatting/utf-8/entitize.py (from rev 909, trunk/data/jacob/entitize.py) (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/data/formatting/utf-8/entitize.py         (rev 0)
+++ trunk/data/formatting/utf-8/entitize.py        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,23 @@
</span><ins>+# Generates entitized.txt from utf-8.txt.
+# Used by Test_Convert_UrlEncoded_To_Entities.
+
+import codecs
+import sys
+
+def entitize(line):
+    """Convert text to &#[dec]; entities."""
+    line = line.strip();
+    line = ["&#%d;" % ord(s) for s in line]
+    return "".join(line)
+
+if __name__ == "__main__":
+    args = sys.argv[1:]
+    if args and args[0] in ("-h", "--help"):
+        print "Usage: python entitize.py < utf-8.txt > entitized.txt"
+        sys.exit(2)
+
+    sys.stdin = codecs.getreader("utf-8")(sys.stdin)
+    sys.stdout = codecs.getwriter("ascii")(sys.stdout)
+
+    lines = sys.stdin.readlines()
+    sys.stdout.write( "\n".join(map(entitize, lines)) )
</ins></span></pre></div>
<a id="trunkdataformattingutf8entitizedtxtfromrev909trunkdatajacobutf8entitizedtxt"></a>
<div class="copfile"><h4>Copied: trunk/data/formatting/utf-8/entitized.txt (from rev 909, trunk/data/jacob/utf-8-entitized.txt) (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/data/formatting/utf-8/entitized.txt         (rev 0)
+++ trunk/data/formatting/utf-8/entitized.txt        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,5 @@
</span><ins>+&#31456;&#23376;&#24609;
+&#70;&#114;&#97;&#110;&#231;&#111;&#105;&#115;&#32;&#84;&#114;&#117;&#102;&#102;&#97;&#117;&#116;
+&#4321;&#4304;&#4325;&#4304;&#4320;&#4311;&#4309;&#4308;&#4314;&#4317;
+&#66;&#106;&#246;&#114;&#107;&#32;&#71;&#117;&#240;&#109;&#117;&#110;&#100;&#115;&#100;&#243;&#116;&#116;&#105;&#114;
+&#23470;&#23822;&#12288;&#39423;
</ins><span class="cx">\ No newline at end of file
</span></span></pre></div>
<a id="trunkdataformattingutf8uurlencodepyfromrev909trunkdatajacobuurlencodepy"></a>
<div class="copfile"><h4>Copied: trunk/data/formatting/utf-8/u-urlencode.py (from rev 909, trunk/data/jacob/u-urlencode.py) (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/data/formatting/utf-8/u-urlencode.py         (rev 0)
+++ trunk/data/formatting/utf-8/u-urlencode.py        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,23 @@
</span><ins>+# Generates u-urlencoded.txt from utf-8.txt.
+# Used for Test_Convert_UrlEncoded_To_Entities.
+
+import codecs
+import sys
+
+def uurlencode(line):
+    """Use %u[hexvalue] percent encoding."""
+    line = line.strip()
+    line = ["%%u%04X" % ord(s) for s in line]
+    return "".join(line)
+
+if __name__ == "__main__":
+    args = sys.argv[1:]
+    if args and args[0] in ("-h", "--help"):
+        print "Usage: python u-urlencode.py < utf-8.txt > u-urlencoded.txt"
+        sys.exit(2)
+
+    sys.stdin = codecs.getreader("utf-8")(sys.stdin)
+    sys.stdout = codecs.getwriter("ascii")(sys.stdout)
+
+    lines = sys.stdin.readlines()
+    sys.stdout.write( "\n".join(map(uurlencode, lines)) )
</ins></span></pre></div>
<a id="trunkdataformattingutf8uurlencodedtxtfromrev909trunkdatajacobutf8uurlencodedtxt"></a>
<div class="copfile"><h4>Copied: trunk/data/formatting/utf-8/u-urlencoded.txt (from rev 909, trunk/data/jacob/utf-8-u-urlencoded.txt) (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/data/formatting/utf-8/u-urlencoded.txt         (rev 0)
+++ trunk/data/formatting/utf-8/u-urlencoded.txt        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,5 @@
</span><ins>+%u7AE0%u5B50%u6021
+%u0046%u0072%u0061%u006E%u00E7%u006F%u0069%u0073%u0020%u0054%u0072%u0075%u0066%u0066%u0061%u0075%u0074
+%u10E1%u10D0%u10E5%u10D0%u10E0%u10D7%u10D5%u10D4%u10DA%u10DD
+%u0042%u006A%u00F6%u0072%u006B%u0020%u0047%u0075%u00F0%u006D%u0075%u006E%u0064%u0073%u0064%u00F3%u0074%u0074%u0069%u0072
+%u5BAE%u5D0E%u3000%u99FF
</ins></span></pre></div>
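<p>The u-urlencoded.txt and entitized.txt fixtures appear to pair up line by line as input and expected output for the <code>_convert_urlencoded_to_entities()</code> tests mentioned in the log message. The Python sketch below only illustrates that mapping. The function name and the regular expression are assumptions made for this example, not the WordPress (PHP) implementation.</p>

```python
import re

# Hypothetical helper: turns each %uXXXX escape into a decimal &#N; reference.
_U_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})")


def u_urlencoded_to_entities(text):
    """Replace %uXXXX escapes with the equivalent decimal character reference."""
    return _U_ESCAPE.sub(lambda m: "&#%d;" % int(m.group(1), 16), text)


if __name__ == "__main__":
    # First line of u-urlencoded.txt
    print(u_urlencoded_to_entities("%u7AE0%u5B50%u6021"))
```

<p>Text that contains no %u escapes passes through unchanged.</p>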
<a id="trunkdataformattingutf8urlencodepyfromrev909trunkdatajacoburlencodepy"></a>
<div class="copfile"><h4>Copied: trunk/data/formatting/utf-8/urlencode.py (from rev 909, trunk/data/jacob/urlencode.py) (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/data/formatting/utf-8/urlencode.py         (rev 0)
+++ trunk/data/formatting/utf-8/urlencode.py        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,32 @@
</span><ins>+# Generates urlencoded.txt from utf-8.txt.
+# Used for Test_UTF8_URI_Encode.
+
+import urllib, codecs, re
+import sys
+
+# uncapitalize pct-encoded values, leave the rest alone
+capfix = re.compile("%([0-9A-Z]{2})");
+def fix(match):
+    octet = match.group(1)
+    intval = int(octet, 16)
+    if intval < 128:
+        return chr(intval).lower()
+    return '%' + octet.lower()
+
+def urlencode(line):
+    """Percent-encode each byte of non-ASCII unicode characters."""
+    line = urllib.quote(line.strip().encode("utf-8"))
+    line = capfix.sub(fix, line)
+    return line
+
+if __name__ == "__main__":
+    args = sys.argv[1:]
+    if args and args[0] in ("-h", "--help"):
+        print "Usage: python urlencode.py < utf-8.txt > urlencoded.txt"
+        sys.exit(2)
+
+    sys.stdin = codecs.getreader("utf-8")(sys.stdin)
+    sys.stdout = codecs.getwriter("ascii")(sys.stdout)
+
+    lines = sys.stdin.readlines()
+    sys.stdout.write( "\n".join(map(urlencode, lines)) )
</ins></span></pre></div>
<a id="trunkdataformattingutf8urlencodedtxtfromrev909trunkdatajacobutf8urlencodedtxt"></a>
<div class="copfile"><h4>Copied: trunk/data/formatting/utf-8/urlencoded.txt (from rev 909, trunk/data/jacob/utf-8-urlencoded.txt) (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/data/formatting/utf-8/urlencoded.txt         (rev 0)
+++ trunk/data/formatting/utf-8/urlencoded.txt        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,5 @@
</span><ins>+%e7%ab%a0%e5%ad%90%e6%80%a1
+Fran%c3%a7ois Truffaut
+%e1%83%a1%e1%83%90%e1%83%a5%e1%83%90%e1%83%a0%e1%83%97%e1%83%95%e1%83%94%e1%83%9a%e1%83%9d
+Bj%c3%b6rk Gu%c3%b0mundsd%c3%b3ttir
+%e5%ae%ae%e5%b4%8e%e3%80%80%e9%a7%bf
</ins></span></pre></div>
<a id="trunkdataformattingutf8utf8txtfromrev909trunkdatajacobutf8txt"></a>
<div class="copfile"><h4>Copied: trunk/data/formatting/utf-8/utf-8.txt (from rev 909, trunk/data/jacob/utf-8.txt) (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/data/formatting/utf-8/utf-8.txt         (rev 0)
+++ trunk/data/formatting/utf-8/utf-8.txt        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,5 @@
</span><ins>+章子怡
+François Truffaut
+საქართველო
+Björk Guðmundsdóttir
+宮崎 駿
</ins></span></pre></div>
<a id="trunkdataformattingwindows1252pyfromrev909trunkdatajacobwindows1252py"></a>
<div class="copfile"><h4>Copied: trunk/data/formatting/windows1252.py (from rev 909, trunk/data/jacob/windows1252.py) (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/data/formatting/windows1252.py         (rev 0)
+++ trunk/data/formatting/windows1252.py        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,27 @@
</span><ins>+# Generates test data for functions converting between
+# dodgy windows-1252-only values and their unicode counterparts
+
+unichars = ["201A", "0192", "201E", "2026", "2020", "2021",
+ "02C6", "2030", "0160", "2039", "0152", "2018",
+ "2019", "201C", "201D", "2022", "2013", "2014",
+ "02DC", "2122", "0161", "203A", "0153", "0178"];
+
+winpoints = []
+unipoints = []
+
+for char in unichars:
+    char = unichr(int(char, 16))
+    dec = ord(char)
+    win = ord(char.encode("windows-1252"))
+
+    unipoints.append(dec)
+    winpoints.append(win)
+
+def entitize(s):
+    return "&#%s;" % s
+
+winpoints = map(entitize, winpoints)
+unipoints = map(entitize, unipoints)
+
+print "".join(winpoints), "".join(unipoints)
+
</ins></span></pre></div>
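<p>The script above is written for Python 2 (<code>unichr</code>, the <code>print</code> statement). A rough Python 3 sketch of the same mapping follows, assuming only the standard <code>windows-1252</code> codec; the function name <code>win1252_pairs</code> and the shortened input list are illustrative and not part of the commit. It returns the windows-1252 byte value next to the Unicode code point instead of printing entities:</p>
<pre>
```python
def win1252_pairs(hex_points):
    """Return (windows-1252 byte value, Unicode code point) for each hex code point."""
    pairs = []
    for hex_point in hex_points:
        char = chr(int(hex_point, 16))
        # Each of these characters encodes to exactly one byte in windows-1252.
        win_byte = char.encode("windows-1252")[0]
        pairs.append((win_byte, ord(char)))
    return pairs


# A few entries taken from the unichars list in windows1252.py.
for win, uni in win1252_pairs(["201A", "0192", "2026", "2122"]):
    print(win, uni)
```
</pre>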
<a id="trunktestsformattingRemoveAccentsphp"></a>
<div class="modfile"><h4>Modified: trunk/tests/formatting/RemoveAccents.php (914 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/tests/formatting/RemoveAccents.php        2012-07-19 14:39:51 UTC (rev 914)
+++ trunk/tests/formatting/RemoveAccents.php        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -36,7 +36,7 @@
</span><span class="cx">
</span><span class="cx">         public function test_remove_accents_iso8859() {
</span><span class="cx">                 // File is Latin1 encoded
</span><del>-                $file = DIR_TESTDATA . DIRECTORY_SEPARATOR . 'formatting' . DIRECTORY_SEPARATOR . 'remove_accents.01.input.txt';
</del><ins>+                $file = DIR_TESTDATA . '/formatting/remove_accents.01.input.txt';
</ins><span class="cx">                 $input = file_get_contents( $file );
</span><span class="cx">                 $input = trim( $input );
</span><span class="cx">                 $output = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyyOEoeAEDHTHssaedhth";
</span></span></pre></div>
<a id="trunktestsformattingSeemsUtf8php"></a>
<div class="addfile"><h4>Added: trunk/tests/formatting/SeemsUtf8.php (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/tests/formatting/SeemsUtf8.php         (rev 0)
+++ trunk/tests/formatting/SeemsUtf8.php        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,46 @@
</span><ins>+<?php
+
+/**
+ * @group formatting
+ */
+class Tests_Formatting_SeemsUtf8 extends WP_UnitTestCase {
+
+        /**
+         * `seems_utf8` returns true for utf-8 strings, false otherwise.
+         *
+         * @dataProvider utf8_strings
+         */
+        function test_returns_true_for_utf8_strings( $utf8_string ) {
+                // from http://www.i18nguy.com/unicode-example.html
+                $this->assertTrue( seems_utf8( $utf8_string ) );
+        }
+
+        function utf8_strings() {
+                $utf8_strings = file( DIR_TESTDATA . '/formatting/utf-8/utf-8.txt' );
+                foreach ( $utf8_strings as &$string ) {
+                        $string = (array) trim( $string );
+                }
+                unset( $string );
+                return $utf8_strings;
+        }
+
+        /**
+         * @dataProvider big5_strings
+         */
+        function test_returns_false_for_non_utf8_strings( $big5_string ) {
+                $this->markTestIncomplete( 'This test does not have any assertions.' );
+
+                $big5 = $big5_string;
+                $strings = array(
+                        "abc",
+                        "123",
+                        $big5
+                );
+        }
+
+        function big5_strings() {
+                // Get data from formatting/big5.txt
+                return array( array( 'incomplete' ) );
+        }
+}
+
</ins></span></pre></div>
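<p>The second test above is still marked incomplete. One way to picture the property both tests target is a strict decode check, sketched here in Python 3. The name <code>looks_like_utf8</code> is hypothetical, and the sketch makes no claim about how WordPress's <code>seems_utf8</code> is implemented; it only illustrates that bytes in another encoding usually fail UTF-8 validation:</p>
<pre>
```python
def looks_like_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8("François Truffaut".encode("utf-8")))
print(looks_like_utf8("François Truffaut".encode("latin-1")))
```
</pre>
<p>The Latin-1 bytes are rejected because 0xE7 opens a multi-byte UTF-8 sequence but is followed by the ASCII letter "o" rather than a continuation byte.</p>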
<a id="trunktestsformattingUrlEncodedToEntitiesphp"></a>
<div class="addfile"><h4>Added: trunk/tests/formatting/UrlEncodedToEntities.php (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/tests/formatting/UrlEncodedToEntities.php         (rev 0)
+++ trunk/tests/formatting/UrlEncodedToEntities.php        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,24 @@
</span><ins>+<?php
+
+/**
+ * @group formatting
+ */
+class Tests_Formatting_UrlEncodedToEntities extends WP_UnitTestCase {
+        /**
+         * @dataProvider data
+         */
+        function test_convert_urlencoded_to_entities( $u_urlencoded, $entity ) {
+                $this->assertEquals( $entity, preg_replace_callback('/\%u([0-9A-F]{4})/', '_convert_urlencoded_to_entities', $u_urlencoded ), $entity );
+        }
+
+        function data() {
+                $input = file( DIR_TESTDATA . '/formatting/utf-8/u-urlencoded.txt' );
+                $output = file( DIR_TESTDATA . '/formatting/utf-8/entitized.txt' );
+                $data_provided = array();
+                foreach ( $input as $key => $value ) {
+                        $data_provided[] = array( trim( $value ), trim( $output[ $key ] ) );
+                }
+                return $data_provided;
+        }
+}
+
</ins></span></pre></div>
<a id="trunktestsformattingUtf8UriEncodephp"></a>
<div class="addfile"><h4>Added: trunk/tests/formatting/Utf8UriEncode.php (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/tests/formatting/Utf8UriEncode.php         (rev 0)
+++ trunk/tests/formatting/Utf8UriEncode.php        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,36 @@
</span><ins>+<?php
+
+/**
+ * @group formatting
+ */
+class Tests_Formatting_Utf8UriEncode extends WP_UnitTestCase {
+
+        /**
+         * Non-ASCII UTF-8 characters should be percent encoded. Spaces etc.
+         * are dealt with elsewhere.
+         *
+         * @dataProvider data
+         */
+        function test_percent_encodes_non_reserved_characters( $utf8, $urlencoded ) {
+                $this->assertEquals($urlencoded, utf8_uri_encode( $utf8 ) );
+        }
+
+        /**
+         * @dataProvider data
+         */
+        function test_output_is_not_longer_than_optional_length_argument( $utf8, $unused_for_this_test ) {
+                $max_length = 30;
+                $this->assertTrue( strlen( utf8_uri_encode( $utf8, $max_length ) ) <= $max_length );
+        }
+
+        function data() {
+                $utf8_urls = file( DIR_TESTDATA . '/formatting/utf-8/utf-8.txt' );
+                $urlencoded = file( DIR_TESTDATA . '/formatting/utf-8/urlencoded.txt' );
+                $data_provided = array();
+                foreach ( $utf8_urls as $key => $value ) {
+                        $data_provided[] = array( trim( $value ), trim( $urlencoded[ $key ] ) );
+                }
+                return $data_provided;
+        }
+}
+
</ins></span></pre></div>
<a id="trunktestsformattingent2ncrphp"></a>
<div class="addfile"><h4>Added: trunk/tests/formatting/ent2ncr.php (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/tests/formatting/ent2ncr.php         (rev 0)
+++ trunk/tests/formatting/ent2ncr.php        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,36 @@
</span><ins>+<?php
+
+/**
+ * @group formatting
+ */
+class Tests_Formatting_Ent2NCR extends WP_UnitTestCase {
+        /**
+         * @dataProvider entities
+         */
+        function test_converts_named_entities_to_numeric_character_references( $entity, $ncr ) {
+                $entity = '&' . $entity . ';';
+                $ncr = '&#' . $ncr . ';';
+                $this->assertEquals( $ncr, ent2ncr( $entity ), $entity );
+        }
+
+        /**
+         * Get test data from files, one test per line.
+         * Comments start with "###".
+         */
+        function entities() {
+                $entities = file( DIR_TESTDATA . '/formatting/entities.txt' );
+                $data_provided = array();
+                foreach ( $entities as $line ) {
+                        // comment
+                        $commentpos = strpos( $line, "###" );
+                        if ( false !== $commentpos ) {
+                                $line = trim( substr( $line, 0, $commentpos ) );
+                                if ( ! $line )
+                                        continue;
+                        }
+                        $data_provided[] = array_map( 'trim', explode( '|', $line ) );
+                }
+                return $data_provided;
+        }
+}
+
</ins></span></pre></div>
<a id="trunktestsformattingisoDescramblerphp"></a>
<div class="addfile"><h4>Added: trunk/tests/formatting/isoDescrambler.php (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/tests/formatting/isoDescrambler.php         (rev 0)
+++ trunk/tests/formatting/isoDescrambler.php        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,14 @@
</span><ins>+<?php
+
+/**
+ * @group formatting
+ */
+class Test_WP_ISO_Descrambler extends WP_UnitTestCase {
+        /*
+         * Decodes text in RFC2047 "Q"-encoding, e.g.
+         * =?iso-8859-1?q?this=20is=20some=20text?=
+         */
+        function test_decodes_iso_8859_1_rfc2047_q_encoding() {
+                $this->assertEquals( "this is some text", wp_iso_descrambler( "=?iso-8859-1?q?this=20is=20some=20text?=" ) );
+        }
+}
</ins></span></pre></div>
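<p>For comparison, the same RFC 2047 "Q"-encoded word can be decoded with Python 3's standard library. This is only a sketch of the encoding the test exercises, not of <code>wp_iso_descrambler</code> itself, and the wrapper name <code>decode_rfc2047</code> is an assumption made for illustration:</p>
<pre>
```python
from email.header import decode_header


def decode_rfc2047(value):
    """Decode RFC 2047 encoded words in a header value into one string."""
    parts = []
    for payload, charset in decode_header(value):
        if isinstance(payload, bytes):
            # Encoded words come back as bytes plus the declared charset.
            parts.append(payload.decode(charset or "ascii"))
        else:
            # Values without encoded words come back as plain str.
            parts.append(payload)
    return "".join(parts)


print(decode_rfc2047("=?iso-8859-1?q?this=20is=20some=20text?="))
```
</pre>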
<a id="trunktestsqueryverboseRewriteRulesphp"></a>
<div class="addfile"><h4>Added: trunk/tests/query/verboseRewriteRules.php (0 => 915)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/tests/query/verboseRewriteRules.php         (rev 0)
+++ trunk/tests/query/verboseRewriteRules.php        2012-07-19 14:41:52 UTC (rev 915)
</span><span class="lines">@@ -0,0 +1,18 @@
</span><ins>+<?php
+
+require_once dirname( dirname( __FILE__ ) ) . '/query.php';
+
+/**
+ * @group query
+ * @group rewrite
+ */
+class Tests_Query_VerbosePageRules extends Tests_Query {
+        function setUp() {
+                parent::setUp();
+                global $wp_rewrite;
+                update_option( 'permalink_structure', '/%category%/%year%/%postname%/' );
+                create_initial_taxonomies();
+                $GLOBALS['wp_rewrite']->init();
+                flush_rewrite_rules();
+        }
+}
</ins></span></pre>
</div>
</div>
</body>
</html>