[wp-trac] [WordPress Trac] #4457: WP does not properly encode UTF-8 mail per RFC 2047

WordPress Trac wp-trac at lists.automattic.com
Wed Jun 13 19:05:55 GMT 2007


#4457: WP does not properly encode UTF-8 mail per RFC 2047
-----------------------+----------------------------------------------------
 Reporter:  trauschus  |       Owner:  anonymous   
     Type:  defect     |      Status:  new         
 Priority:  high       |   Milestone:  2.2.1       
Component:  General    |     Version:  2.2         
 Severity:  critical   |    Keywords:  rfc2047 mail
-----------------------+----------------------------------------------------
 RFC 2047, which is MIME Part Three, specifies that non-ASCII text in
 headers such as the RFC(2)822 Subject header must be encoded as
 'encoded-word's.  WordPress gets it *mostly* right; however, it violates
 one very important pair of rules (quoted from RFC 2047):

    Each 'encoded-word' MUST encode an integral number of octets.  The
    'encoded-text' in each 'encoded-word' must be well-formed according
    to the encoding specified; the 'encoded-text' may not be continued in
    the next 'encoded-word'.  (For example, "=?charset?Q?=?=
    =?charset?Q?AB?=" would be illegal, because the two hex digits "AB"
    must follow the "=" in the same 'encoded-word'.)

    Each 'encoded-word' MUST represent an integral number of characters.
    A multi-octet character may not be split across adjacent 'encoded-
    word's.

 However, I just received a mail from WordPress with the following subject
 header:
   Subject: =?UTF-8?Q?[Trausch=E2=80=99s_Little_Home]_Please_moderate:_"Well,_it=E2?=
         =?UTF-8?Q?=80=99s_good_I_don=E2=80=99t_use_IE=E2=80=A6"?=

 The ’ (U+2019) is split mid-character, which is incorrect.  Its UTF-8
 octets E2 80 99 must stay within a single 'encoded-word' per the standard,
 and the split causes RFC 2047-compliant mailers such as Evolution to
 display the subject as transmitted (i.e., in quite an ugly manner).
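 To make the problem concrete, here is a minimal Python sketch (Python is
 used purely for illustration; it is not WordPress code).  It decodes the
 'encoded-text' of each of the two words from the Subject header above on
 its own.  The helper names q_decode and is_whole_utf8 are hypothetical.

```python
import re


def q_decode(payload: str) -> bytes:
    """Decode the 'encoded-text' of one Q-encoded word into raw octets."""
    # In the Q encoding "_" stands for a space and "=XX" for the octet 0xXX.
    data = payload.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)


def is_whole_utf8(octets: bytes) -> bool:
    """True if the octets form an integral number of UTF-8 characters."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The two 'encoded-text' payloads, copied from the header in this ticket.
first = q_decode('[Trausch=E2=80=99s_Little_Home]_Please_moderate:_"Well,_it=E2')
second = q_decode('=80=99s_good_I_don=E2=80=99t_use_IE=E2=80=A6"')

print(is_whole_utf8(first))    # False: ends with the lone lead byte 0xE2
print(is_whole_utf8(second))   # False: starts with continuation bytes 0x80 0x99
print((first + second).decode("utf-8"))  # only the concatenation is valid
```

 Neither word is decodable by itself, which is exactly what the second
 quoted rule forbids.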

 I used some code in a C# application to avoid this situation:

         // c is a byte representing an octet of a UTF-8 character.
         if (RetVal.Length > wrapLength) {
             if (((c & 0xC0) == 0xC0) || ((c & 0xC0) == 0x80)) {
                 // Do nothing: we cannot split here.
             } else {
                 RetVal += Ending;
                 Lines.Add(RetVal);
                 RetVal = "\n " + Preamble;
             }
         }

 Basically, if the byte ANDed with 0xC0 equals 0xC0 (the lead byte of a
 multi-byte sequence) or 0x80 (a continuation byte), the string should not
 be split at that location.  It should not be terribly hard to express that
 in PHP as well.  This is most likely not a security issue, though it could
 cause strange behavior in mail user agents (MUAs) which attempt to parse
 the malformed encoded-words anyway.  Evolution is following the standard
 by choosing not to parse them.  From RFC 2047:

 6.3. Mail reader handling of incorrectly formed 'encoded-word's

    It is possible that an 'encoded-word' that is legal according to the
    syntax defined in section 2, is incorrectly formed according to the
    rules for the encoding being used.   For example:

    (1) An 'encoded-word' which contains characters which are not legal
        for a particular encoding (for example, a "-" in the "B"
        encoding, or a SPACE or HTAB in either the "B" or "Q" encoding),
        is incorrectly formed.

    (2) Any 'encoded-word' which encodes a non-integral number of
        characters or octets is incorrectly formed.

    A mail reader need not attempt to display the text associated with an
    'encoded-word' that is incorrectly formed.  However, a mail reader
    MUST NOT prevent the display or handling of a message because an
    'encoded-word' is incorrectly formed.

 I have chosen the pri/sev high/crit because this is a standards-compliance
 issue, and might in remote situations be a security issue if there are
 particularly borked MUAs out there that do strange things with the header.
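
 For reference, the boundary rule discussed in this ticket could be
 sketched roughly as follows.  This is an illustrative Python sketch, not
 a patch for WordPress and not a port of the C# code: it uses the 0xC0 mask
 only to recognise continuation bytes, gathers each character's octets into
 one chunk, and starts a new 'encoded-word' only between chunks.  The
 names fold_q_words, q_encode_byte and is_continuation are hypothetical;
 the 75-character limit per 'encoded-word' comes from RFC 2047.

```python
PREFIX = "=?UTF-8?Q?"
SUFFIX = "?="


def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def q_encode_byte(byte: int) -> str:
    """Q-encode one octet, conservatively escaping all punctuation."""
    if byte < 0x80 and chr(byte).isalnum():
        return chr(byte)          # ASCII letters and digits stay literal
    if byte == 0x20:
        return "_"                # a space is written as an underscore
    return "=%02X" % byte         # everything else becomes =XX


def fold_q_words(text: str, max_word_len: int = 75):
    """Split text into encoded-words without ever splitting a character."""
    budget = max_word_len - len(PREFIX) - len(SUFFIX)
    data = text.encode("utf-8")
    words, current = [], ""
    i = 0
    while i < len(data):
        # Gather one whole character: a first byte plus its continuations.
        j = i + 1
        while j < len(data) and is_continuation(data[j]):
            j += 1
        chunk = "".join(q_encode_byte(b) for b in data[i:j])
        # Start a new encoded-word only on a character boundary.
        if current and len(current) + len(chunk) > budget:
            words.append(PREFIX + current + SUFFIX)
            current = ""
        current += chunk
        i = j
    if current:
        words.append(PREFIX + current + SUFFIX)
    return words
```

 A caller would join the returned words with a line break followed by a
 space to produce the folded header value.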

-- 
Ticket URL: <http://trac.wordpress.org/ticket/4457>
WordPress Trac <http://trac.wordpress.org/>
WordPress blogging software

