Bonjour,
In the coming months, in the context of the forgefriends project, I’ll work on the Gitea migration file format and on mirroring projects in Gitea (not just git, but also issues, etc.). My main motivation is that I think it provides an essential building block for federation.
Here is a high-level view of my roadmap, in the hope that it will help the people funded to further federation in Gitea figure out where I’m headed and why. I very much look forward to reading their roadmap so that I can adjust mine accordingly. Ideally it will all fit nicely together.
Cheers
Maintaining and documenting a migration format
Gitea migration data structures and format as of January 2022
The Gitea migration data structures are used as a pivot when importing software projects from GitLab, GitHub, and more. They are internal, undocumented and subject to breaking changes whenever a new Gitea version is released.
The dump-repo command dumps these data structures as YAML files to be read by the restore-repo command. The format is not designed for archival because there is no guarantee that future Gitea versions will be able to read it back. It is, however, useful for temporarily storing the data structures on disk when the import of a large project takes a long time, or when creating a new software project from these structures uncovers bugs and requires multiple attempts to get it right.
The migration data structures are different from the database schema and from the data structures used by the Gitea API.
Requirements for a durable Gitea migration format
In order for a software project to be dumped and successfully restored by future versions of Gitea, the migration file format must be:
- Validated
- Documented
- Versioned
- Backward compatible
File format validation
For each file format, a corresponding JSON schema is created to list the required fields, their data types, etc. See for instance the schema describing the file format of an issue.
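To make the idea of validation concrete, here is a minimal sketch of checking required fields and data types against a schema. The schema and its field names (number, title, state, labels) are assumptions made up for illustration, not the actual Gitea format, and the hand-written validate function only understands a tiny subset of JSON schema; a real implementation would rely on a complete JSON schema validator.

```python
# Hypothetical, heavily simplified schema for an issue file. The field names
# below are assumptions for illustration, not the actual Gitea format.
ISSUE_SCHEMA = {
    "type": "object",
    "required": ["number", "title", "state"],
    "properties": {
        "number": {"type": "integer"},
        "title": {"type": "string"},
        "state": {"type": "string"},
        "labels": {"type": "array"},
    },
}

# Maps JSON schema type names to the Python types they correspond to.
JSON_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "boolean": bool,
}


def validate(document, schema):
    """Return a list of error messages; an empty list means the document is valid.

    Only the "type", "required" and "properties" keywords are handled here.
    """
    errors = []
    expected = JSON_TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly for integers.
    if not isinstance(document, expected) or (
        expected is int and isinstance(document, bool)
    ):
        return [f"expected {schema['type']}, got {type(document).__name__}"]
    if expected is dict:
        for name in schema.get("required", []):
            if name not in document:
                errors.append(f"missing required field: {name}")
        for name, subschema in schema.get("properties", {}).items():
            if name in document:
                errors.extend(
                    f"{name}: {e}" for e in validate(document[name], subschema)
                )
    return errors
```

A file would be accepted only when validate returns an empty list; otherwise the messages tell the user which fields are missing or have the wrong type.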
Documented
The JSON schema includes reference documentation of the semantics of each field. It is exhaustive and unambiguous.
Versioned
A version number X.Y is included in each file. When validating the file, the JSON schema matching the version is used. Y is incremented every time the JSON schema changes. X is incremented when the JSON schema changes in a backward-incompatible way.
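As a rough illustration of picking the schema that matches the version found in a file, the sketch below assumes the version is stored under a hypothetical "version" key and that schemas are kept in a hypothetical registry keyed by (X, Y). Neither is taken from Gitea.

```python
def parse_version(text):
    """Split a version string such as "1.2" into the integers (X, Y)."""
    major, minor = text.split(".")
    return int(major), int(minor)


# Hypothetical registry: one schema per published version of the format.
# Strings stand in for the real JSON schema documents.
SCHEMAS = {
    (1, 0): "issue schema 1.0",
    (1, 1): "issue schema 1.1",
    (2, 0): "issue schema 2.0",
}


def schema_for(document):
    """Return the schema matching the version recorded in the document."""
    version = parse_version(document["version"])
    if version not in SCHEMAS:
        raise ValueError(f"no schema for version {document['version']}")
    return SCHEMAS[version]
```

Parsing X and Y as integers rather than comparing strings avoids surprises such as "1.10" sorting before "1.9".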
Backward compatibility
Software reading a file with version X.Y is expected to also work when reading files with version X.Y+N. For instance, older Gitea versions will be able to import a file from a newer Gitea version as long as they are both compatible with version X of the format. However, if an older version of Gitea only supports X-1, it will not be able to read the files.
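The rule above boils down to comparing only X. A minimal sketch, where can_read and the idea of a set of supported major versions are hypothetical names chosen for illustration:

```python
def can_read(supported_majors, file_version):
    """Tell whether a reader supporting the given X values can read a file.

    supported_majors: set of major versions (X) the reader understands.
    file_version: version string "X.Y" found in the file.
    The minor number Y is deliberately ignored: any Y is readable within
    a supported X.
    """
    major, _minor = (int(part) for part in file_version.split("."))
    return major in supported_majors
```

For example, a reader that only supports X = 1 accepts a file with version 1.7 but rejects one with version 2.0.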
Mirroring
Gitea mirroring as of January 2022
Mirroring is implemented in Gitea to push or pull git repositories. Other project information can be migrated but not mirrored.
The code used to migrate a project from one forge to another is neither idempotent nor incremental. If interrupted for any reason, it has to start over from scratch.
Mirroring a project as a whole
The codepath used to migrate a project is modified to be idempotent. It can resume when interrupted. It can also be run on a regular basis to mirror a project instead of migrating it.
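To illustrate what idempotent and resumable mean here, the following sketch copies items (issues, comments, etc.) keyed by an identifier and records progress per item. It is not Gitea code: the mirror function, the "id" and "updated" keys and the in-memory dictionaries standing in for the destination forge and the checkpoint are all assumptions.

```python
def mirror(source_items, destination, checkpoint):
    """Copy items from source to destination; safe to interrupt and re-run.

    source_items: list of dicts with "id" and "updated" keys (hypothetical).
    destination: dict mapping id -> item, standing in for the target forge.
    checkpoint: dict mapping id -> "updated" value of the last copy made.
    Returns the number of items actually written.
    """
    written = 0
    for item in source_items:
        if checkpoint.get(item["id"]) == item["updated"]:
            continue  # already mirrored and unchanged: nothing to do
        destination[item["id"]] = dict(item)  # create or overwrite (upsert)
        checkpoint[item["id"]] = item["updated"]  # record progress per item
        written += 1
    return written
```

Running it twice in a row writes nothing the second time, and running it after an interruption only writes what is missing. Running it on a schedule picks up items whose "updated" value changed, which is what turns a migration into a mirror.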
Using the Gitea migration format for federation
Federating forges requires two kinds of communication:
- Notification (e.g., a pull request was merged)
- Project state synchronization (e.g., the pull request is now closed, the state of the associated issues is modified, milestone completion is affected, etc.)
While notification is in the scope of ActivityPub, project state synchronization is not. ActivityPub does not provide any kind of guarantee to ensure the consistency of a data set. The project state is best shared between federated Gitea forges using git.