I have posted this question on Server Fault as well just to see if I can get some help from any postgres gurus. I’m reaching out here as well to see if there’s anything I can try.
First and foremost, I’ve verified that all of the git repositories are ok and we have many backups of our storage box that contains them. As such, the only data that’s at risk of being lost are the 30 or so user accounts and the organization as well as any permissions I had setup. Even still, I would like to avoid the tedium if possible.
We’re running gitea and postgres through docker using the following docker compose config.
version: "3.9"
services:
gitea:
image: gitea/gitea:latest
restart: always
hostname: ###.###.###.###
environment:
- USER=git
- USER_UID=1000
- USER_GID=998
- GITEA__database__DB_TYPE=postgres
- GITEA__database___HOST=giteadb:5432
- GITEA__database__NAME=gitea
- GITEA__database__USER=gitea
- GITEA__database__PASSWD=###################
networks:
- gitea
ports:
- 3000:3000
- 22:22
volumes:
- /mnt/###################/gitea/data:/data
- /etc/timezone:/etc/timezone:ro
- /etc/localtime:/etc/localtime:ro
shm_size: 256m
depends_on:
- giteadb
giteadb:
image: postgres:14
restart: always
environment:
- POSTGRES_USER=gitea
- POSTGRES_PASSWORD=###################
- POSTGRES_DB=gitea
networks:
- gitea
volumes:
- /srv/###################/gitea/db:/var/lib/postgresql/data
runner:
image: gitea/act_runner:latest
restart: always
depends_on:
- gitea
volumes:
- /mnt/###################/gitea/data/act_runner:/data
- /var/run/docker.sock:/var/run/docker.sock
environment:
- GITEA_INSTANCE_URL=https://git.###################.com
# When using Docker Secrets, it's also possible to use
# GITEA_RUNNER_REGISTRATION_TOKEN_FILE to pass the location.
# The env var takes precedence.
# Needed only for the first start.
- GITEA_RUNNER_REGISTRATION_TOKEN=###################
networks:
gitea:
external: false
As mentioned in the server fault post, the database was corrupted because I mistakenly spun up two postgres instances that were pointing at the same volume, as seen in the docker ps
taken just before shutting down the instances.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
845b1e2b0725 ##################-website "/usr/bin/supervisor…" 4 months ago Up 4 months (healthy) 0.0.0.0:8081->8080/tcp, :::8081->8080/tcp ####################-website-1
46cd2f360104 gitea/act_runner:latest "/sbin/tini -- /opt/…" 6 months ago Up 4 months docker-runner-1
2173badc8b07 postgres:14 "docker-entrypoint.s…" 6 months ago Up 4 months 5432/tcp docker-giteadb-1
0990a6a53465 gitea/act_runner:latest "/sbin/tini -- /opt/…" 7 months ago Up 4 months docker-compose-configs-runner-1
34fd2b4b3cd9 gitea/gitea:latest "/usr/bin/entrypoint…" 7 months ago Up 4 months 0.0.0.0:22->22/tcp, :::22->22/tcp, 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp docker-compose-configs-gitea-1
84a8132cc11d postgres:14 "docker-entrypoint.s…" 7 months ago Up 4 months 5432/tcp docker-compose-configs-giteadb-1
bda89d4a5cfe 176399451347 "docker-entrypoint.s…" 18 months ago Up 4 months 5432/tcp docker-compose-configs-odoodb-1
Once I shut down the instances and brought them back up, the postgres container was bootlooping with the logs:
$ docker logs 8377a9155386
PostgreSQL Database directory appears to contain a database; Skipping initialization
2024-10-15 02:07:25.301 UTC [1] LOG: starting PostgreSQL 14.11 (Debian 14.11-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-10-15 02:07:25.302 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-10-15 02:07:25.302 UTC [1] LOG: listening on IPv6 address "::", port 5432
2024-10-15 02:07:25.309 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-10-15 02:07:25.319 UTC [27] LOG: database system was shut down at 2024-10-14 23:25:46 UTC
2024-10-15 02:07:25.319 UTC [27] LOG: invalid primary checkpoint record
2024-10-15 02:07:25.319 UTC [27] PANIC: could not locate a valid checkpoint record
2024-10-15 02:07:25.706 UTC [1] LOG: startup process (PID 27) was terminated by signal 6: Aborted
2024-10-15 02:07:25.707 UTC [1] LOG: aborting startup due to startup process failure
2024-10-15 02:07:25.736 UTC [1] LOG: database system is shut down
I found many posts about “PANIC: could not locate a valid checkpoint record” and This stack overflow post in particular was very helpful. Following the steps from the top two answers, I was able to get the gitea server and database back up and running. However, there was significant data loss. While there were some accounts still accessible, a seemingly random set of accounts (as seen in the admin panel) were missing.
A couple github issues that I also found relevant were:
What Now?
So at this point, I’m faced with a few possibilities and associated questions:
- It’s possible that I might be able to recover the database, but I don’t know enough about postgres to know all of my options with that. As such, if anyone has suggestions or ideas for things I can try, let me know and I’ll give them a chance.
- I may need to re-setup the server and have everyone recreate their accounts and permissions. In this case I’d like to know how one might go about linking back all of the repositories to their respective accounts and organizations in a clean way, and hopefully a way that reduces the burden for everyone that was using this server having to re-configure their repositories.