Reopening the discussion on data distribution

Speaker: px

Track: Packaging, policy, and Debian infrastructure

Type: BoF (45 minutes)

Some ten to fifteen years ago, we had a discussion on “data-only” packages, prompted at the time by the presence of large game data files in the Debian archive. While this didn’t result in the proposed additional source package format, it helped shape thinking on this topic for quite a while.

Although commercial CDNs exist in 2024, distributing larger data sets remains an unsolved problem for most of the world, newly exacerbated by the advent of huge deep learning models.

In this BoF we want to explore the possibility space for a stronger data distribution story within the Debian project (possibly using git-annex, possibly other tools), while staying respectful of our mirror operators’ incredible commitment and limited resources.

We will also use the BoF to gauge interest in a longer-form workshop.

URLs