[wp-trac] [WordPress Trac] #55128: REST API /media leaves holes in the result, making it virtually impossible to paginate through them

Wed Feb 9 18:09:03 UTC 2022

#55128: REST API /media leaves holes in the result, making it virtually impossible
to paginate through them
------------------------------+-----------------------------
 Reporter:  frankieandshadow  |      Owner:  (none)
     Type:  defect (bug)      |     Status:  new
 Priority:  normal            |  Milestone:  Awaiting Review
Component:  REST API          |    Version:  5.9
 Severity:  normal            |   Keywords:
  Focuses:  rest-api          |
------------------------------+-----------------------------
 A request to the REST API for, say, /media?per_page=20 can unexpectedly
 return any number of results from 0 to 20, including an empty array even
 when there are more pages to follow.

 This happens when some of the media library entries are attached to
 unpublished posts, and the media were added in the post, not directly to
 the media library.

 Now it may be reasonable to omit these when not authenticated, but not by
 post-processing the array after it has been fetched from the database, as
 seems to be happening currently.
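
 To make the difference concrete, here is a minimal sketch in Python. It is
 a toy model of the behaviour I suspect, not the actual WordPress code; the
 function names and the "visible" flag are made up for illustration.

```python
from typing import List


def paginate_then_filter(items: List[dict], page: int, per_page: int) -> List[dict]:
    """Hypothetical model of the suspected behaviour: slice first, filter after."""
    start = (page - 1) * per_page
    window = items[start:start + per_page]
    # Hidden entries are dropped from the already-sliced page, leaving holes.
    return [m for m in window if m["visible"]]


def filter_then_paginate(items: List[dict], page: int, per_page: int) -> List[dict]:
    """Hypothetical alternative: exclude hidden entries before slicing."""
    visible = [m for m in items if m["visible"]]
    start = (page - 1) * per_page
    return visible[start:start + per_page]


# Ten media items, newest first; the three newest are attached to
# unpublished posts and so are hidden from unauthenticated requests.
library = [{"id": 10 - i, "visible": i >= 3} for i in range(10)]

print([m["id"] for m in paginate_then_filter(library, 1, 5)])  # [7, 6]
print([m["id"] for m in filter_then_paginate(library, 1, 5)])  # [7, 6, 5, 4, 3]
```

 In the first case a page of 5 comes back with 2 entries; in the second the
 page is full and offsets count only entries the client can actually see.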

 Doing that makes it extremely difficult to paginate through them, or to
 choose an appropriate page size. This is compounded by the fact that it is
 more likely the missing images will be near the beginning, as those are
 the ones less likely to have been published yet.

 Consider just displaying a matrix of the thumbnails, as obtained from the
 API, 20 at a time. So you ask for a page of 20 and get 5. OK, let's get
 another page before returning it to the client: you get 18 this time, so
 you take the first 15 of those and along with the first 5 give those to
 the client. The user clicks "show more". Where do I start? I can't start
 on page 3 because there were some left over on page 2. But I have no idea
 where the gaps are so I can't set an appropriate offset - offset
 apparently _includes_ the missing entries! And there is no information
 about where the missing entries might be. Basically using offset is not
 possible.

 The only solution is to start from the beginning every time, and then omit
 the first however many results already delivered to the client. This
 defeats the point of pagination, and gets slower and slower as they ask
 for more.
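
 A rough sketch of that workaround, in Python. The fetch_page callable is
 hypothetical: I assume it requests one page of /media and returns the
 items together with the total page count (for example as read from the
 X-WP-TotalPages response header).

```python
from typing import Callable, List, Tuple

# (items on this page, total number of pages); a hypothetical shape.
Page = Tuple[List[dict], int]


def next_batch_by_restarting(
    fetch_page: Callable[[int, int], Page],
    delivered: int,
    wanted: int,
    per_page: int = 100,
) -> List[dict]:
    """Re-read from page 1 and skip what the client has already been given."""
    seen: List[dict] = []
    page = 1
    while len(seen) < delivered + wanted:
        items, total_pages = fetch_page(page, per_page)
        seen.extend(items)  # may add fewer than per_page items, or none
        if page >= total_pages:
            break  # no more pages to read
        page += 1
    return seen[delivered:delivered + wanted]
```

 Every call re-fetches everything delivered so far, so the cost of each
 "show more" grows with the number of images already shown.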

 I could instead return a variable number of images, possibly more than
 requested, up to some maximum (set as per_page), and work entirely in
 pages, recording the page number high-water mark on each request and
 fetching as many pages as needed to reach some minimum number of images.
 This doesn't require starting again each time, but it can also result in
 hundreds of API requests that each return an empty array if there are many
 unpublished images, which is also very slow. It also makes it impossible
 to support a UI where the user specifies how many to retrieve at once.
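
 Sketched in Python, again with a hypothetical fetch_page callable that
 returns one page of items plus the total page count, the page-based
 workaround looks something like this:

```python
from typing import Callable, List, Tuple

# (items on this page, total number of pages); a hypothetical shape.
Page = Tuple[List[dict], int]


def next_batch_by_pages(
    fetch_page: Callable[[int, int], Page],
    next_page: int,
    minimum: int,
    per_page: int = 20,
) -> Tuple[List[dict], int, int]:
    """Fetch whole pages from next_page until at least `minimum` items arrive.

    Returns (items, new high-water mark, number of requests made).
    """
    batch: List[dict] = []
    requests_made = 0
    page = next_page
    while len(batch) < minimum:
        items, total_pages = fetch_page(page, per_page)
        requests_made += 1
        batch.extend(items)  # an empty page still costs a full request
        page += 1
        if page > total_pages:
            break  # ran out of pages before reaching the minimum
    return batch, page, requests_made
```

 The caller stores the returned page number and passes it back as next_page
 on the next request. The batch size is whatever the pages happen to
 contain, so it cannot be matched to a user-chosen count.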

 Or I could work with per_page=1, which means I can use the page number
 where I would have liked to use offset, but that is also very slow: at
 least 20 requests, and probably many more to skip unpublished images,
 where I would expect to need only one.

 Either way it is much harder to implement, as the obviously intended
 implementation is to set offset to the number of images retrieved so far,
 set the page size to a number up to 100, and just fetch that page.
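
 For comparison, the request I would expect to be able to make is a single
 one, built roughly as below. This is a sketch only; media_page_url is a
 made-up helper and https://example.com stands in for the site address.

```python
from urllib.parse import urlencode


def media_page_url(site: str, retrieved: int, batch: int) -> str:
    """Build the one request that should return the next `batch` images."""
    if not 1 <= batch <= 100:
        raise ValueError("per_page must be between 1 and 100")
    # offset = number of images the client already has
    query = urlencode({"per_page": batch, "offset": retrieved})
    return f"{site.rstrip('/')}/wp-json/wp/v2/media?{query}"


print(media_page_url("https://example.com", 20, 20))
# https://example.com/wp-json/wp/v2/media?per_page=20&offset=20
```

 With holes in the result this does not work, because offset counts the
 omitted entries as well as the ones actually returned.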

-- 
Ticket URL: <https://core.trac.wordpress.org/ticket/55128>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform