[wp-trac] Re: [WordPress Trac] #5183: General Meta-Data Table
WordPress Trac
wp-trac at lists.automattic.com
Wed Jun 24 23:25:15 GMT 2009
#5183: General Meta-Data Table
-----------------------------------------+----------------------------------
Reporter: filosofo | Owner: anonymous
Type: enhancement | Status: reopened
Priority: normal | Milestone: Future Release
Component: General | Version: 2.9
Severity: normal | Resolution:
Keywords: needs-patch meta-table meta |
-----------------------------------------+----------------------------------
Comment(by mikeschinkel):
Replying to [comment:32 Denis-de-Bernardy]:
> I think the real underlying issue is we need abstraction in the php.
> It's like, if the data cannot go into wp_posts (which could really be
> called wp_nodes),
How very Drupalish of you. (I came to WordPress by way of Drupal. :)
> and ends up requiring a separate table, I'd personally suggest it should
> get its own meta table as well. So long as the needed php functions are
> around to manage this user-defined meta table, things should be fine.
I think that is grand in concept, and it certainly follows the relational
model, but in practice following those rules results in increased rigidity
and complexity without an equivalent increase in benefit. Certainly for the
most prominent data elements it makes sense to have them each in their own
tables, but I urge you to reconsider the edge cases: IMO they are handled
much better by a single repository that holds them all (see the sketch
below) than by requiring "yet another table" for each of them.
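To make the repository idea concrete, here is a minimal sketch of what such
a shared meta table might look like, assuming MySQL; the table and column
names are hypothetical, not a final design:
{{{
-- Hypothetical shared repository for edge-case meta data (MySQL sketch).
-- object_type records which kind of object a row belongs to, so many
-- plugin-defined objects can share one table instead of each getting
-- "yet another table".
CREATE TABLE wp_meta (
  meta_id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  object_type VARCHAR(20)     NOT NULL,  -- e.g. 'link', 'widget'
  object_id   BIGINT UNSIGNED NOT NULL,  -- id within that object's own table
  meta_key    VARCHAR(255)    NOT NULL,
  meta_value  LONGTEXT,
  PRIMARY KEY (meta_id),
  KEY object_lookup (object_type, object_id, meta_key)
);
}}}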
> I know... I've been playing around with the idea of a nodes table for
> over 10 years.
Not to one-up you (well, okay, maybe I am), but I've been playing with them
for 20 years... :) And for the most part I've decided the complexity is not
worth the pain.
> We could actually do this by switching to UUID types instead of auto-
> incremented integers. It adds an extra benefit along the way, too: you can
> then sync sites that use shared databases.
UUIDs have a certain elegance, but if you are concerned about performance,
as you were regarding "vtypes", then UUIDs bring a whole series of issues:
http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/
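For illustration (the table names here are invented), the storage cost that
article describes shows up directly in the key declarations, assuming
MySQL:
{{{
-- Hypothetical sketch: the same key stored three ways. Index entries grow
-- from 8 bytes per row to 16 or 36, and random UUIDs also destroy insert
-- locality in the clustered primary key.
CREATE TABLE ids_int  (id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY);
CREATE TABLE ids_bin  (id BINARY(16) NOT NULL PRIMARY KEY);
CREATE TABLE ids_char (id CHAR(36)   NOT NULL PRIMARY KEY);

-- UUID() values land all over the keyspace, so each insert may touch a
-- cold index page; an AUTO_INCREMENT key always appends to the same hot
-- page.
INSERT INTO ids_char VALUES (UUID());
}}}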
> Might it be that you only ever needed to write queries that involved one
> or two joins?
LOL!!!!! Sorry, but I've at times written queries that exceeded the number
of tables the SQL Server parser of the day could handle (max 32 tables).
I've built some very complex database systems. I built a full ecommerce
platform, starting in 1995, for internal use that processed almost $100
million in revenue for my company over a 10-year period. I can promise you
that almost everything I did had more than two joins.
And I've had to do a lot of work with tables in Drupal too. For example,
here's a query I pulled from a project of mine (not exactly trivial):
{{{
SELECT
  vehicle.nid,
  vehicle.title,
  vehicle_content.teaser,
  vehicle_path.dst AS path,
  IFNULL(caption.field_photo_caption_value, image_node.title) AS photo_caption,
  files.filepath AS photo_file,
  IFNULL(photo_info.field_image_orientation_value, 'landscape') AS photo_orientation
FROM node showcase
  INNER JOIN content_field_vehicle cfv ON showcase.nid = cfv.nid
  INNER JOIN node vehicle ON cfv.field_vehicle_nid = vehicle.nid
  INNER JOIN content_type_vehicle ctv ON vehicle.nid = ctv.nid
  INNER JOIN node_revisions vehicle_content
    ON vehicle.nid = vehicle_content.nid AND vehicle.vid = vehicle_content.vid
  INNER JOIN url_alias vehicle_path
    ON vehicle_path.src = CONCAT('node/', vehicle.nid)
  LEFT OUTER JOIN content_field_photo_caption caption ON vehicle.nid = caption.nid
  LEFT OUTER JOIN image_attach ia ON vehicle.nid = ia.nid
  LEFT OUTER JOIN node image_node ON ia.iid = image_node.nid
  LEFT OUTER JOIN content_type_image photo_info ON photo_info.nid = image_node.nid
  LEFT OUTER JOIN files ON image_node.nid = files.nid
WHERE
  showcase.type = 'showcase' AND
  vehicle.type = 'vehicle' AND
  ctv.field_vehicle_subtype_value IN ('abstract', 'standalone') AND
  (files.fid IS NULL OR COALESCE(files.filename, 'small') = '%s') AND
  showcase.nid = %d AND
  vehicle.status = 1
}}}
> I ask, because my own 10 years of experience tell me a rather different
> story. I was a varchar index junkie until 2-3 years ago -- until I ended
> up needing to look into the performance of an app that was heavy on joins
> and that had more than a minute amount of data.
Funny how the pendulum swings one way and then the other. It is likely our
cycles are just not in sync.
> PGSQL ended up using merge join plans (= the slowest you can get) that
> involved zillions of rows. No amount of vacuum analyze, adding partial
> indexes, or clustering the data helped in any meaningful way. Re-
> engineering the DB in such a way that PG had means to fetch the needed
> rows using less joins, on integer/uuid fields only, brought the app back
> to life.
No debate there.
> My own bottom line was this: primary key (object_id, object_type) =
> avoid like the Plague if you've a large amount of data.
What's large? 1000? Or 100 million? I'll give you the latter, not the
former.
For performance on large data sets, integer keys are the best. Strings are
much slower: the larger the string, the larger the indexes get and the more
page loads from disk are required to seek into the index. But that doesn't
mean all indexes MUST be integers, as I'll explain below.
Also, I'm not a fan of composite keys in general, but I've also come to
believe that every rule has reasonable exceptions. You can get around the
composite-key issue by concatenating the values (e.g. "273links_meta"), but
that's not elegant unless it can be done in a trigger.
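For illustration, here is a hypothetical sketch of the two keying styles
(table and column names invented, assuming MySQL):
{{{
-- Composite key: the database itself enforces uniqueness across both
-- columns, at the cost of a wider two-part index.
CREATE TABLE meta_composite (
  object_id   BIGINT UNSIGNED NOT NULL,
  object_type VARCHAR(20)     NOT NULL,
  meta_value  LONGTEXT,
  PRIMARY KEY (object_id, object_type)
);

-- Concatenated key: "273links_meta" packs both values into one string,
-- which only stays consistent if a trigger (or the application) builds it.
CREATE TABLE meta_concat (
  object_key  VARCHAR(40) NOT NULL,  -- e.g. CONCAT(object_id, object_type)
  meta_value  LONGTEXT,
  PRIMARY KEY (object_key)
);
}}}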
OTOH, one of the biggest evils I have found in technology is premature
optimization. Enforcing a prohibition on composite indexes of (int, string)
"at all costs" is a premature optimization for tables that may, 99.9% of
the time, have fewer than 100,000 records. That's why I am suggesting them
for use in edge cases. IF we find that one of those objects starts getting
hundreds of thousands of records across many installations, THEN it is time
to optimize (see the sketch below). But don't force all the complexity onto
the edge cases simply because there is a tiny chance one of them might
become a non-edge case.
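That later optimization can stay mechanical. A hypothetical example,
assuming the shared wp_meta table sketched earlier and a 'widget' type that
has outgrown it:
{{{
-- Promote one hot object type out of the shared repository, leaving the
-- true edge cases where they are. All names are hypothetical.
CREATE TABLE wp_widgetmeta LIKE wp_meta;

INSERT INTO wp_widgetmeta
  SELECT * FROM wp_meta WHERE object_type = 'widget';

DELETE FROM wp_meta WHERE object_type = 'widget';
}}}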
> In case you're thinking WP will never get so much data that this will
> make a big difference, consider the zillions of blogs, users, posts,
> terms, links, and comments on wordpress.com.
I'm not at all assuming WP won't get that much data. I am instead assuming
that not all data in WP will get so big that performance optimization
concerns actually matter.
Look at it this way: if you don't give plugin and theme developers an
appropriate place to store data, they will instead store their data in far
less appropriate places (e.g. an overloaded wp_options table), and they
will see far worse performance than the purity of your approach is meant to
guard against. Remember that most people who use and program for WordPress
do not have any experience with SQL and are even afraid of it, so they
won't be creating custom tables on their own; they will be overloading
wp_options, as sketched below.
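Hypothetically (plugin name and values invented), that overloading tends to
look like the first statement below, contrasted with one row per value in a
shared meta table:
{{{
-- Everything in one serialized blob in wp_options; with autoload = 'yes'
-- it gets read and unserialized on every single page view.
INSERT INTO wp_options (option_name, option_value, autoload)
VALUES ('myplugin_all_records', '<one huge serialized PHP array>', 'yes');

-- Versus one row per value in a shared meta table, fetched on demand
-- (assuming the hypothetical wp_meta sketched earlier):
INSERT INTO wp_meta (object_type, object_id, meta_key, meta_value)
VALUES ('myplugin_record', 273, 'color', 'red');
}}}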
As a case in point: Nathan Rice. Nathan is a brilliant theme developer but
has said he doesn't understand databases and won't go near SQL, even though
I know that if he did, he's easily smart enough to learn it in no time.
There are *many* Nathans out there, and you need to give them a tool they
are comfortable using instead of handing them a prohibition for fear that
one day, by some small chance, they might misuse that tool.
--
Ticket URL: <http://core.trac.wordpress.org/ticket/5183#comment:33>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software