[wp-trac] [WordPress Trac] #60375: Site Transfer Protocol

WordPress Trac noreply at wordpress.org
Thu Feb 1 00:54:22 UTC 2024


#60375: Site Transfer Protocol
-------------------------+------------------------------
 Reporter:  zieladam     |       Owner:  (none)
     Type:  enhancement  |      Status:  new
 Priority:  normal       |   Milestone:  Awaiting Review
Component:  Import       |     Version:
 Severity:  normal       |  Resolution:
 Keywords:               |     Focuses:
-------------------------+------------------------------

Comment (by dmsnell):

 > Is Site Transfer a direct Host <-> Host operation with optional support
 for .zip uploads? Or is it an export&download -> upload&import operation
 built with future Host<->Host exchange in mind?

 The more I consider it the more I see these as the same thing, whereby the
 ZIP format is the means through which we normalize the transfer. I could
 be totally overlooking obvious things here though, so I would like to know
 where this idea makes no sense.

 Given the VFS-like interface we have with ZIPs, I imagine that if a site
 only wants to import posts and not media then it will skip the part of the
 ZIP containing the `wp-content` directory.
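
 As a rough illustration of that skip, here is a minimal Python sketch
 using the standard `zipfile` module. The entry names (`export.wxr`,
 `database.sqlite`) and the helper names are assumptions made up for the
 example, not a proposed archive layout.

```python
import io
import zipfile

# Assumption: uploaded assets live under this prefix inside the archive.
ASSET_PREFIX = "wp-content/"


def iter_import_entries(archive, include_assets=False):
    """Yield (name, data) for each archive entry the importer wants."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            if not include_assets and info.filename.startswith(ASSET_PREFIX):
                # Skipped entries are never read or decompressed.
                continue
            yield info.filename, zf.read(info)


def build_demo_archive():
    """Build a tiny in-memory archive with hypothetical entry names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("export.wxr", "<rss/>")
        zf.writestr("database.sqlite", b"\x00")
        zf.writestr("wp-content/uploads/2024/01/photo.jpg", b"\xff\xd8")
    buf.seek(0)
    return buf
```

 This works cheaply on a local file because `zipfile` reads the central
 directory first. Over a network stream the skipped bytes would still
 arrive unless the source site leaves them out.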

 Maybe this is asking too much of the remote site, to regenerate a ZIP on
 the fly for specific parts. The challenges I'm capable of seeing at the
 moment are all more related to whether we ship `wp-content` assets the
 same way we ship database and config values. It's all about bulk data and
 less about the destination of the transfer.

 > WXR. However, when would WXR hold both content AND metadata? On site
 export the content would be in the database so the WXR file would only
 carry the metadata – at which point it would have almost nothing in
 common with WXR as we know it. On the upside, WXR can be streamed with the
 upcoming XML API and you can also edit them with a text editor.

 I'm still completely on the fence about this too. Of course there'd be
 duplication of content in the WXR vs. the tables, but I see the database
 as the authoritative source for non-asset content while the WXR could be a
 reasonable signaling protocol to guide the import.

 Some part of me wants to remove the content from the WXR, but if we do
 that we potentially lose a lot for older systems and for our ability to
 easily inspect the export. 🤔

 Even if we have post content in the WXR, it will lack the meta information
 unless we export that there as well, which I guess we could do, and even
 remove those rows from the SQLite database 🤷‍♂️

 Something big still seems to be missing that I haven't seen yet on all
 this, but I think we're starting to get a better handle on the space by
 asking all these questions and figuring out how it could all go wrong.

 > In direct Host <-> Host transfer we need an entire world of error
 handling logic.

 Yes, but also if "the Playground ZIP" is the transfer format then it's
 indistinguishable from importing a ZIP from a local disk, other than that
 the bytes are arriving over the network. Yeah, we'll need another layer
 of error handling, but we should be able to restart the ZIP mid-sequence
 on the source site.

 This makes me think that one preliminary step we'd need for this, to make
 it reentrant, is to create a relatively small manifest on the source site
 to start the process. This could do a number of things:
  - generate content hashes for all relevant media or database tables. this
 might involve some way of snapshotting the data.
  - generate a list of media files and their content hashes
  - sequence the files for the ZIP stream.
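
 A minimal sketch of what building such a manifest might look like, in
 Python and covering files only. The field names (`seq`, `path`, `size`,
 `sha256`) and the choice of SHA-256 are assumptions for illustration.
 Database tables and snapshotting are left out.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 16):
    """Hash a file in chunks so large media never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """List every file under root with its size, hash, and stream position."""
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by relative path so the sequence is the same on every run.
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    entries = []
    for seq, path in enumerate(files):
        entries.append({
            "seq": seq,  # position of this file in the ZIP stream
            "path": path.relative_to(root).as_posix(),
            "size": path.stat().st_size,
            "sha256": sha256_file(path),
        })
    return {"version": 1, "entries": entries}
```

 Fixing the order up front is what would let both sides agree on what
 "mid-sequence" means after an interruption.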

 After this, the source site can reference that manifest to virtually
 deliver the ZIP stream mid-sequence without having to scan all the data on
 its own disk. This manifest would roughly correspond in size to the number
 of files and database objects, but it could itself be a kind of journaling
 snapshot of a site. Maybe there's a tie-in with other
 snapshotting/concurrent work on this.
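
 One way the mid-sequence lookup could work, sketched in Python. It
 assumes the manifest records a `size` per entry (a made-up field name)
 and that the receiver reports how many payload bytes it has stored. ZIP
 header overhead is ignored to keep the idea visible.

```python
from bisect import bisect_right
from itertools import accumulate


def resume_point(entries, payload_offset):
    """Map a received byte count to (entry index, offset inside that entry).

    Returns None when every entry has already been delivered in full.
    """
    # ends[i] is the payload offset at which entry i is complete.
    ends = list(accumulate(entry["size"] for entry in entries))
    total = ends[-1] if ends else 0
    if payload_offset < 0 or payload_offset > total:
        raise ValueError("offset outside the manifest's payload range")
    index = bisect_right(ends, payload_offset)
    if index == len(entries):
        return None
    start = ends[index - 1] if index else 0
    return index, payload_offset - start
```

 The source site only consults the manifest here. It opens a single file
 at the returned offset instead of rescanning its disk.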

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/60375#comment:9>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform

