[wp-hackers] WP issues
Geoffrey Sneddon
foolistbar at googlemail.com
Sat Jun 2 14:17:14 GMT 2007
On 2 Jun 2007, at 14:46, Sam Angove wrote:
> On 6/1/07, Geoffrey Sneddon <foolistbar at googlemail.com> wrote:
>>
>> 1. People have been asking for an XML serialiser to be used for all
>> the XML WordPress produces for years. This doesn't exist. This allows
>> invalid bytes to get into XML data. Try parsing
>> <http://photomatt.net/comments/feed/atom/>
>> with a compliant XML parser. You'll get a fatal
>> error. This is the exact sort of issue that an XML serialiser would
>> avoid.
>>
>> c) "We have no way currently to ensure XHTML
>> validity."[TICKET1526]
>> — See 1.
>
> The term "serialiser" is vague (what are you serialising from?), but I
> assume you meant that the output should be built as, say, a DOM
> object, then serialised from it to a text|application/xml document. If
> so, then I disagree. It's not a magic bullet.
From any XML structure, whether it be SAX, DOM, or something else.
> Most errors occur when users save posts and comments full of malformed
> markup and bad character data. Building output as an XML DOM won't
> help with that at all, because the broken input comes in as a string
> and will need to be corrected beforehand. If that problem can be
> solved, the class of errors that a serialiser would catch are
> comparatively easy to handle.
The serialiser will ensure that the output is well-formed, and would
therefore strip invalid characters.
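As a rough illustration of that idea (in Python rather than PHP, and not anything WordPress actually ships), here is a minimal sketch. The names `strip_invalid_xml_chars` and `serialise_comment` are hypothetical. Note that Python's ElementTree escapes markup characters on output but does not remove forbidden characters by itself, so the sketch filters them against the XML 1.0 `Char` production before building the tree.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Drop characters that may never appear in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", text)


def serialise_comment(author: str, body: str) -> str:
    """Build a tiny element tree and serialise it (hypothetical helper).

    The tree, not string concatenation, produces the markup, so '&', '<'
    and '>' in user input are escaped by the serialiser.
    """
    entry = ET.Element("entry")
    ET.SubElement(entry, "author").text = strip_invalid_xml_chars(author)
    ET.SubElement(entry, "content").text = strip_invalid_xml_chars(body)
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    xml_text = serialise_comment("Aunt Mildred", "Tea & cake <yum>\x08")
    ET.fromstring(xml_text)  # parses again without a fatal error
    print(xml_text)
```

The stray backspace character is dropped and the ampersand and angle brackets are escaped, so the result parses again with a strict parser.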
> WordPress works the way it does now because:
>
> a) It's an impossibly bad user-experience to show a yellow screen of
> death to an end user.
Any UA that uses an XML parser already shows an XML fatal error.
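To make the "fatal error" concrete, here is a small sketch using Python's standard ElementTree parser (which is backed by expat) as a stand-in for a UA's XML parser. The helper name `is_well_formed` and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if a strict XML parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        # A conforming parser must stop here; there is no "best effort".
        return False
    return True


if __name__ == "__main__":
    samples = [
        "<feed><title>ok</title></feed>",
        "<feed><title>bad \x08 character</title></feed>",  # forbidden control character
        "<feed><title>Tea & cake</title></feed>",          # unescaped ampersand
        "<p><b>unbalanced</p>",                            # mismatched tags
    ]
    for sample in samples:
        print(is_well_formed(sample), repr(sample))
```

Only the first sample is accepted; each of the others makes the parser give up on the whole document.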
> b) It's almost as bad to expect an end-user to manually correct
> well-formedness errors. (Does Aunt Mildred even know what an entity
> is?)
Well, why not use HTML? That gets around both this and the above.
>
> c) It's hard to automatically, reliably correct broken HTML. The
> rough consensus last time this came up was that it was *too* hard.
We already create broken HTML though. What's the difference?
> Until that time, the fact is that the burden of dealing with bad
> markup is pushed onto those users best able to deal with it. The
> browsers already can;
IE7 refuses to display a feed with any XML error (including invalid
characters).
Fx 2.0 and Safari 2 both display feeds with XML errors.
> the major aggregators do (with varying degrees
> of success);
As above, not all of the major aggregators do.
> there are excellent error-correcting feed parsing
> libraries for Python and Java.
Each corrects errors in its own way, meaning implementers have to
reverse-engineer one another.
> Yes, as a feed consumer, it sucks, these might not seem like good
> reasons, and it's perfectly fair to resent that the problem is pushed
> onto you. (In my case, the path of least resistance was switching to
> Python, where others had already dealt with it. ;)
And suffer the way browser makers have for years? "This feed works in x
aggregator, but it doesn't work in your aggregator. Please fix this
bug." So then you go off and do more reverse-engineering of
malformed XML.
> In any event, it seems to me that many of the specific problems you
> cite are symptomatic of the lack of automated testing. One check with
> a validator would have caught the @content bug; unit tests would have
> proven the tag balancing issue and prevented regressions. Tests with
> an XML parser could catch the same errors as a serialiser. I'd much
> rather see a comprehensive test suite than a bunch of hideous DOM
> manipulation code.
Using SAX would allow us to behave much as we already do.
Tag-balancing issues would never arise with a serialiser. You're
never going to have a test suite that tests everything. Something
explicitly designed to avoid these errors would stop them happening.
There are literally thousands of places in WP where I can insert
content that'll cause a fatal error.
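A rough sketch of what a SAX-style writer could look like, again in Python rather than PHP and purely as an assumption about the shape of such an API: `FeedWriter` and its methods are hypothetical, built on the standard library's `xml.sax.saxutils.XMLGenerator`. Elements are opened and closed by the same construct, so templates cannot emit an unbalanced tag, and text is escaped on the way out. It does not filter forbidden characters by itself; that would still need a separate step.

```python
import io
from contextlib import contextmanager
from xml.sax.saxutils import XMLGenerator


class FeedWriter:
    """Hypothetical thin wrapper that emits XML through SAX events."""

    def __init__(self):
        self._buffer = io.StringIO()
        self._gen = XMLGenerator(self._buffer, encoding="utf-8")

    @contextmanager
    def element(self, name, attrs=None):
        # The start and end events are paired here, so callers cannot
        # forget to close an element or close them in the wrong order.
        self._gen.startElement(name, attrs or {})
        try:
            yield
        finally:
            self._gen.endElement(name)

    def text(self, content):
        # characters() escapes '&', '<' and '>' in the content.
        self._gen.characters(content)

    def getvalue(self):
        return self._buffer.getvalue()


if __name__ == "__main__":
    writer = FeedWriter()
    with writer.element("entry"):
        with writer.element("title", {"type": "text"}):
            writer.text("Fish & <chips>")
    print(writer.getvalue())
```

Because output is produced event by event, this can still stream in the same order the templates run today, without building a whole DOM in memory.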
- Geoffrey Sneddon