[wp-hackers] A "terms" table
Chris
chris.hearn01 at ntlworld.com
Sun Apr 15 20:42:49 GMT 2007
You're the boss in this I guess - fwiw why the rush? because some other
blogging systems put out releases more often? bad reason!
The documentation needs sorting for version 2.1 - stability is good - DB
changes are bad unless very carefully implemented and documented, so
keep tags out of the way, and don't rush to another release - May/June,
later.. whatever!
Chris
Matt Mullenweg wrote:
> WordPress is like a sandwich.
>
> Assuming we've scared off all the vegetarians with all the talk of
> BBQ, the core is the meat. Our meat is the wp_posts table, which
> stores what I would refer to as the primary points of content.
> Currently for us this is posts, pages, and attachments, though in the
> future I could see it expanding to support new post types such as
> externals, galleries, and hopefully things we can't even imagine yet.
>
> On the side you have chips (good comments), vegetables (idiot
> comments), and that funny stuff your cousin brought that you're going
> to move around on the plate but never eat (spam comments). I think
> comments are okay right now, maybe they could use a meta table but we
> can talk about that later.
>
> Meat alone is only a real meal at rboren's house, so most people put
> things on the sandwich to add flavor and spice it up. Some add other
> types of meat, in the WP world this is postmeta, which we call custom
> fields in polite company.
>
> We also have condiments which are currently handled by two tables:
> wp_categories and wp_post2cat. On the taxonomy/condiment side, right
> now we really only allow ketchup aka categories, and users for at
> least a year have been asking for more. In 2.2 we decided to satiate
> their appetites.
>
> Everyone agrees that ketchup and mayonnaise are totally different,
> even though they're both condiments and you put them both on
> sandwiches. No one is trying to create some horrible pink mixture of
> the two tastes.
>
> However there are currently two schools of thought on how we should
> store the data for categories and tags at a very low level in our DB.
>
> Let me do my best to make the case for putting category data and tag
> data in separate tables, and feel free to chime in if you think I've
> missed any points.
>
> * We shouldn't ship anything with a data schema people disagree on,
> because plugins and themes will be written against it.
> * They're different things, so we should have them in different tables.
> * Tags can have things like synonyms, and don't need things like
> hierarchy.
> * There are ugly legacy field names in the category table like
> category_nicename, cat_name, cat_ID (wtf capitals) and we can clean
> those up in new tables
> * With separate tables our queries on the admin side become WAY easier
> and cleaner to do, with no bitwise or _count nonsense
> * Plugins for tagging have implemented it this way.
>
> The code currently in SVN does something different. It uses the
> categories table for names of the tags and then adds fields to hint
> how those names are being used for the admin section. If I wanted to
> make everyone happy and be popular I would just go with the above
> since there seems to be good consensus there, but I think this is an
> important long-term decision for WP so let me spell out some reasons
> why I think the current design has legs not just for 2.2 but beyond.
>
> 1. It performs faster.
>
> On front-end display, we have added ZERO QUERIES to support tags. The
> query that grabs categories is also grabbing tags and we're sorting
> them out in the code.
>
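For anyone following along, here is a minimal sketch of the "one query, sort it out in code" idea described above. It is not the code in SVN: the table and column names (terms, post2term, kind) are hypothetical and chosen only for illustration, and SQLite stands in for MySQL.

```python
import sqlite3

# Hypothetical schema: one shared name table plus one relation table
# whose "kind" column hints how each name is used on a post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE terms (term_id INTEGER PRIMARY KEY, name TEXT, slug TEXT);
CREATE TABLE post2term (post_id INTEGER, term_id INTEGER, kind TEXT);
INSERT INTO terms VALUES (1, 'Dogs', 'dogs'), (2, 'News', 'news'),
                         (3, 'Puppies', 'puppies');
INSERT INTO post2term VALUES (10, 1, 'category'), (10, 2, 'category'),
                             (10, 3, 'tag');
""")


def terms_for_post(conn, post_id):
    """Fetch categories and tags in a single query, then split them in code."""
    rows = conn.execute(
        "SELECT t.name, r.kind FROM post2term r "
        "JOIN terms t ON t.term_id = r.term_id "
        "WHERE r.post_id = ? ORDER BY t.term_id",
        (post_id,),
    ).fetchall()
    grouped = {"category": [], "tag": []}
    for name, kind in rows:
        grouped[kind].append(name)
    return grouped


result = terms_for_post(conn, 10)
```

The point of the sketch is only that the front end issues one SELECT whether or not a post has tags. The partitioning happens in the loop, not in the database.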
> In the dashboard some of the queries are more complicated (though not
> really any different than what we deal with for link categories) and a
> few milliseconds slower than the old ones. However, that really
> doesn't matter because 1) we only need to write them once and more
> importantly 2) they're run several orders of magnitude fewer times
> than the ones that display the blog on the front-end. A mantra has
> always been that user time is more important than developer time.
>
> A separate tag naming table and post2tag table would require at least
> 2 additional queries and/or joins to the front page, which I already
> think does too many queries and is too heavy.
>
> 2. It's a better long-term foundation.
>
> I think there are a lot of benefits to having a single ID that maps to
> a term and a slug. Let's pretend we had perfect foresight 5 years ago
> and instead of wp_categories we had wp_terms.
>
> Regardless of the UI and philosophy behind categories, tags, and ooga
> booga, on a data level they're still mapping a set of terms to an item
> in post_content.
>
> In WP a term has three important things: an ID, a human-entered name,
> and a URL-friendly slug. We use the ID in our relations instead of the
> slug because it's more efficient and slugs are not necessarily unique
> (because of hierarchy).
>
> Having "dogs" in a category table have one ID and "dogs" in a tag
> table have a different ID is a long-term deck of cards that we will
> seriously regret later. It's MUCH harder to reconcile items with
> internally different IDs than it is to split out unique IDs into
> different tables.
>
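A toy illustration of the ID argument above, under loud assumptions: plain dictionaries stand in for tables, and slugs are treated as unique, which the mail itself says is not guaranteed once hierarchy is involved. None of these names come from WordPress.

```python
# Separate tables: each hands out its own IDs, so "dogs" ends up with
# a different ID in each one.
categories = {1: "dogs", 2: "news"}
tags = {1: "news", 2: "puppies", 3: "dogs"}


def merge(categories, tags):
    """Reconcile two ID spaces into one; every tag relation must be remapped."""
    terms = dict(categories)
    slug_to_id = {slug: term_id for term_id, slug in terms.items()}
    remap = {}
    next_id = max(terms) + 1
    for old_id, slug in sorted(tags.items()):
        if slug not in slug_to_id:
            slug_to_id[slug] = next_id
            terms[next_id] = slug
            next_id += 1
        remap[old_id] = slug_to_id[slug]
    return terms, remap


def split(terms, usage):
    """Split one shared ID space into two tables; IDs carry over unchanged."""
    cats = {i: s for i, s in terms.items() if "category" in usage[i]}
    tgs = {i: s for i, s in terms.items() if "tag" in usage[i]}
    return cats, tgs


merged_terms, tag_remap = merge(categories, tags)
usage = {1: {"category", "tag"}, 2: {"category", "tag"}, 3: {"tag"}}
split_cats, split_tags = split(merged_terms, usage)
```

Merging needs a remap table (old tag ID to new term ID) that then has to be applied to every stored relation. Splitting is a filter and leaves every ID as it was.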
> As for some of the bit and count fields currently causing grief, I
> would argue the solution for that isn't a separate tags table, but a
> separate table specifically for that type of data. In Drupal for this
> infrastructure they have a term_data, term_hierarchy, term_node,
> term_relation, term_synonym, vocabulary, and vocabulary_node_types
> tables. I think that might be a little more than we need, but there
> are some concepts there we could pretty cleanly combine into a single
> extra table that isn't called categories or tags, and will provide a
> good and scalable foundation for years to come.
>
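One possible reading of the "single extra table" idea, sketched with SQLite. This is an assumption about what such a table could look like, not a proposal from the mail: term_usage, kind, parent_id and post_count are all made-up names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared names: one ID, one name, one slug per term.
CREATE TABLE terms (term_id INTEGER PRIMARY KEY, name TEXT, slug TEXT);
-- Hypothetical extra table: how a term is used, its parent, its count.
CREATE TABLE term_usage (
    term_id    INTEGER REFERENCES terms(term_id),
    kind       TEXT,
    parent_id  INTEGER,
    post_count INTEGER DEFAULT 0,
    PRIMARY KEY (term_id, kind)
);
INSERT INTO terms VALUES (1, 'Dogs', 'dogs'), (2, 'Puppies', 'puppies');
INSERT INTO term_usage VALUES (1, 'category', NULL, 4), (1, 'tag', NULL, 7),
                              (2, 'tag', NULL, 2);
""")


def terms_of_kind(conn, kind):
    """List (name, post_count) for one kind with a plain equality filter."""
    return conn.execute(
        "SELECT t.name, u.post_count FROM term_usage u "
        "JOIN terms t ON t.term_id = u.term_id "
        "WHERE u.kind = ? ORDER BY t.term_id",
        (kind,),
    ).fetchall()


tag_rows = terms_of_kind(conn, "tag")
category_rows = terms_of_kind(conn, "category")
```

In this layout "Dogs" keeps a single ID while carrying separate counts as a category and as a tag, and the admin-side query needs no bitwise test.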
> 3. There should be no user- or plugin-facing problems with how it's
> currently implemented, or if we decide to change it.
>
> Now this isn't to suggest for a second there aren't bugs, many have
> been fixed already and I'm sure there are many still left, but that is
> going to be true of ANY code we put in WP and anyone who suggests
> otherwise is not very familiar with software development. From a point
> of view of plugin authors, they shouldn't have to think or care if
> we're storing it in a categories table or a turkey, the function they
> use should remain consistent no matter what we change or gymnastics we
> do behind the curtain. No matter what we do in 2.2 or 2.3, that's not
> going to change.
>
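To make the "behind the curtain" point concrete, here is a small sketch in which the plugin-facing function stays the same while the storage changes. Every name here (TermStore, get_post_tags, the two store classes) is hypothetical and is not a WordPress API.

```python
from typing import Protocol


class TermStore(Protocol):
    def tags_for(self, post_id: int) -> list[str]: ...


class SharedTableStore:
    """Tags live next to categories, told apart by a kind flag."""

    def __init__(self, rows):
        self.rows = rows  # list of (post_id, name, kind)

    def tags_for(self, post_id):
        return [n for p, n, k in self.rows if p == post_id and k == "tag"]


class SeparateTableStore:
    """Tags live in their own post-to-tag mapping."""

    def __init__(self, post2tag):
        self.post2tag = post2tag  # dict: post_id -> list of tag names

    def tags_for(self, post_id):
        return list(self.post2tag.get(post_id, []))


def get_post_tags(store: TermStore, post_id: int) -> list[str]:
    """The only function a plugin calls; storage details stay hidden."""
    return store.tags_for(post_id)


shared = SharedTableStore([(10, "dogs", "category"), (10, "puppies", "tag")])
separate = SeparateTableStore({10: ["puppies"]})
```

A plugin calling get_post_tags(shared, 10) or get_post_tags(separate, 10) gets the same answer either way, which is the property the paragraph above asks for.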
> I do think there is something intrinsically better about shipping and
> iterating than noodling without release in search of the "perfect"
> implementation.
>
> More importantly from a user's point of view, all that really matters
> is that they have a box they can type tags in and that their host
> doesn't tell them not to upgrade to 2.2 because it does more queries.
>
> 4. I'm open
>
> I'm not personally tied to any code written thus far and if I think
> the best thing is.
>
> There is a separate but related decision around what to do about the
> release date. Based on the discussion here I'm going to make a
> go/no-go decision on Tuesday.
>
> If we do delay I think we should laser-focus on tags and not allow
> other pet-issues to creep in, and I will fully expect people to put in
> as much time writing code and fixing bugs as they have arguing points
> on mailing lists, IRC, and trac. At the very least I hope we've
> learned a bit more about getting these things out of the way early
> rather than a week or two before a release. Also if something is
> sitting in trac, take it to the hackers list early.
>
> I think if we stick with the current implementation we can hit it with
> a very stable release next Monday, but if we decide to replace it we
> need to push it back at least into mid-May.
>