How to handle big data files?
What should we do with large research data files, say those exceeding the reasonable limits of an email attachment? (My current case is a file that is 202 MiB after zstd compression; uncompressed, it is 1.5 GiB.)
Even though I explain how to recreate the file and provide all the source code, I think the file should be stored as-is in the MathRepo, because (1) it saves time and trouble for people who want to build on this data, and (2) publishing the data I produced is necessary to allow independent verification of it.
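To make that independent verification concrete, one cheap step is to publish a checksum next to the stored file, so anyone who recreates the data from the source code can confirm it matches byte for byte. A minimal sketch (the file name `data.zst` is a hypothetical stand-in for the real file):

```shell
# Hedged sketch: publish a checksum alongside the stored file so that an
# independently recreated copy can be verified. "data.zst" stands in for
# the real 202 MiB file.
printf 'example payload' > data.zst     # stand-in for the real data file
sha256sum data.zst > data.zst.sha256    # publish this next to the data
sha256sum -c data.zst.sha256            # anyone can rerun this to verify
```

The `.sha256` file is tiny, so it can live in the git repository even if the data itself ends up elsewhere.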
Thinking about this further: storing the whole of MathRepo (including big data files such as mine) in a single git repository is probably not future-proof. As currently used, the repository will grow large over time (hundreds of projects, each with data files at least the size of an email attachment, sometimes larger -- that's the idea, right?), and eventually the first thing every researcher at MPI has to do is download gigabytes of unrelated MathRepo git objects. This is a problem that may well be fixable later, once it actually occurs.
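For what it's worth, git itself already has a partial mitigation for the "download gigabytes of unrelated objects" problem: partial clone with a blob-size filter. A hedged sketch (the URL is hypothetical, and the hosting server must permit object filters via `uploadpack.allowFilter`):

```shell
# Hedged sketch: partial clone that skips blobs above 1 MiB at clone time.
# Requires a reasonably recent git on both client and server, and the
# server must have uploadpack.allowFilter enabled. URL is hypothetical.
git clone --filter=blob:limit=1m https://example.org/mathrepo.git
# Filtered-out blobs are fetched lazily, only when a checkout needs them.
```

So even without restructuring MathRepo, individual researchers could avoid downloading every project's data files up front.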
But at least for data files of considerable size, a separate, non-versioned, write-protected (new files only) storage area might be useful.
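One existing tool that roughly matches this idea is Git LFS: large files live in a separate content store, while the repository itself only versions small pointer files. A hedged sketch of the `.gitattributes` rule that `git lfs track "*.zst"` would write (assuming git-lfs is available on the hosting side; I'm not claiming MathRepo supports it today):

```
# .gitattributes -- hedged sketch: route compressed data files through
# Git LFS so the git history stays small.
*.zst filter=lfs diff=lfs merge=lfs -text
```

Whether LFS, git-annex, or a plain write-once file server fits MathRepo best is exactly the kind of decision I'd like input on.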