Docs/Mirrors

From MTU LUG Wiki

Revision as of 09:49, 11 February 2025

Mirrors runs FreeBSD and uses ZFS as its filesystem. It used to be managed with Salt, but that broke when the system was upgraded from FreeBSD 12 to 14.


At its core, a mirror server performs two functions:

  1. Synchronizing the content from upstream mirrors to itself
  2. Hosting that downloaded content for end-users and other downstream mirrors to pull from


On our mirrors, this is accomplished with archvsync+cron to synchronize content with upstream, and vsftpd+rsyncd+nginx to handle hosting the content.

Pulling from upstream

ftpsync

The ftpsync utility from the archvsync project is what we use to synchronize content from upstream servers to Mirrors.

archvsync is a collection of scripts from the Debian project for provisioning mirror servers, and is meant to be all-inclusive.

We don't use archvsync in full, only the ftpsync utility to handle synchronizing files.

It's nice because it prevents half-synced files from being served to users, so they never pull corrupted binaries that fail to verify against the package signature.
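As far as we understand it, ftpsync gets this guarantee from a two-stage sync: the first pass pulls new package files while skipping the repository index files, and the second pass pulls the indexes and only then deletes files that vanished upstream. The Python sketch below illustrates just that ordering; the index file-name patterns are an assumption modeled on a Debian-style archive, not ftpsync's actual exclude list.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical patterns for repository index/metadata files. ftpsync's real
# exclude list is built into the script and may differ from this.
INDEX_PATTERNS = (
    "Packages*",
    "Sources*",
    "Release*",  # matches Release and Release.gpg
    "InRelease",
    "Contents-*",
    "Translation-*",
)


def is_index_file(path: str) -> bool:
    """Return True if the file's name looks like repository metadata."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in INDEX_PATTERNS)


def split_stages(paths):
    """Split files to transfer into (stage1, stage2), preserving order.

    Stage 1 holds the package files themselves; stage 2 holds the index files
    that reference them. Publishing the indexes last means clients never see
    an index entry for a package that has not arrived yet.
    """
    stage1 = [p for p in paths if not is_index_file(p)]
    stage2 = [p for p in paths if is_index_file(p)]
    return stage1, stage2


if __name__ == "__main__":
    changed = [
        "dists/bookworm/InRelease",
        "pool/main/h/hello/hello_2.10-3_amd64.deb",
        "dists/bookworm/main/binary-amd64/Packages.xz",
    ]
    first, second = split_stages(changed)
    print("stage 1:", first)
    print("stage 2:", second)
```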

This is especially important for downstream mirrors, as they do not(?) typically check package authenticity, relying on end users' package managers to check it instead.
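The check an end user's package manager does boils down to comparing a file's hash with the one published in a signed index. For apt, for instance, the signed InRelease file lists hashes of the Packages indexes, which in turn list a SHA256 for every .deb. A minimal Python sketch of only the hash comparison, with signature checking left out:

```python
import hashlib


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if `data` hashes to `expected_hex` (hex string, any case)."""
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large ISOs never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```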

Under the hood, ftpsync can use the FTP(?) or rsync protocol; which one is used is determined by the ftpsync config for each distribution.
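To check which upstream a given distribution actually pulls from, it's enough to read its ftpsync config, which is a shell-style file of KEY=value assignments. Below is a minimal Python sketch that reads such a file without executing it. The variable names RSYNC_HOST, RSYNC_PATH and TO are the ones we recall from archvsync's sample ftpsync.conf, and the sample values are made up, so compare against our real config files before relying on this.

```python
import shlex


def parse_shell_config(text: str) -> dict:
    """Parse simple KEY=value / KEY="value" lines from a shell-style config.

    Blank lines and comment lines are skipped. The file is never executed,
    so no shell expansion happens: only literal values are understood.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex strips quotes and trailing comments roughly the way a shell would.
        settings[key.strip()] = " ".join(shlex.split(value, comments=True))
    return settings


def describe_upstream(settings: dict) -> str:
    """Summarize where a config pulls from, in rsync's host::module form.

    RSYNC_HOST, RSYNC_PATH and TO are assumed variable names (see above).
    """
    host = settings.get("RSYNC_HOST", "<unset>")
    module = settings.get("RSYNC_PATH", "<unset>")
    dest = settings.get("TO", "<unset>")
    return f"{host}::{module} -> {dest}"


if __name__ == "__main__":
    # Made-up example values, not our real config.
    sample = """
TO="/lug/debian/"
RSYNC_HOST=syncproxy2.wna.debian.org
RSYNC_PATH="debian"
"""
    print(describe_upstream(parse_shell_config(sample)))
```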

The upstream servers Mirrors pulls from, and which tier we are for each distribution, are as follows:

Distribution    | Tier | Upstream                  | Method
Arch Linux      | 1    | rsync.archlinux.org       | rsync
CentOS AltArch  | 1    | msync.centos.org          | rsync
CentOS Stream   | 1    | rsync.stream.centos.org   | rsync
CentOS          | 1    | msync.centos.org          | rsync
Debian CD       | 1    | cdimage.debian.org        | rsync
Debian          | 1    | syncproxy2.wna.debian.org | rsync
EPEL            | 1    | dl.fedoraproject.org      | rsync
Fedora          | 1    | dl.fedoraproject.org      | rsync
Gentoo          | 2?   | ftp.ussg.iu.edu           | rsync
RPM Fusion      | 1    | download1.rpmfusion.org   | rsync
Ubuntu Releases | 2?   | mirror.math.princeton.edu | rsync
Ubuntu          | 2?   | mirror.math.princeton.edu | rsync


archvsync.tar.gz contains all the archvsync configs and scripts.

It's just a Git repository that was cloned by hand; it was not installed on the system through pkg.

When we rebuild Mirrors, we should either install it via pkg (if a package is available) or clone it anew, and place it in /opt/archvsync/.

cron

cron runs ftpsync at scheduled times.

This is what determines, for example, that Arch is synced with upstream roughly every 15 minutes, while Debian is only synced four times a day.

This should be set according to the distribution's official docs on mirrors.

Most distros want tier 1 mirrors to sync four times a day, with each mirror picking its exact hours and minutes somewhat randomly so that the upstream doesn't get every downstream server hammering it with requests all at once.
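One way to pick those "slightly random" times without re-rolling them on every rebuild is to derive the minute from a hash of the distro name. Below is a minimal Python sketch that builds standard five-field crontab lines (the crontab -e format; FreeBSD's system-wide /etc/crontab has an extra user column). The ftpsync path and its sync:archive:NAME argument in the usage example are assumptions about our setup, not copied from the live crontab.

```python
import hashlib

# Four runs a day, six hours apart, as most tier 1 policies ask for.
FOUR_TIMES_A_DAY = (0, 6, 12, 18)


def stable_minute(name: str, low: int = 0, high: int = 59) -> int:
    """Pick a 'random' minute in [low, high] that stays the same across runs.

    Hashing the distro name spreads distros across the hour without having
    to store a random seed anywhere.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return low + int.from_bytes(digest[:4], "big") % (high - low + 1)


def cron_field(values) -> str:
    """Render a cron field from '*' or an iterable of ints."""
    if values == "*":
        return "*"
    return ",".join(str(v) for v in values)


def cron_line(minutes, hours, command: str) -> str:
    """Build a crontab entry: minute hour day-of-month month day-of-week command."""
    return f"{cron_field(minutes)} {cron_field(hours)} * * * {command}"


if __name__ == "__main__":
    # Hypothetical command; check ftpsync's docs for the exact argument form.
    cmd = "/opt/archvsync/bin/ftpsync sync:archive:debian"
    print(cron_line([stable_minute("debian")], FOUR_TIMES_A_DAY, cmd))
```

In the same notation, Arch's current schedule corresponds to cron_line([3, 13, 33, 43], "*", cmd): a minute field of 3,13,33,43 and every hour.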

Currently, this is the schedule Mirrors uses (all times in EST, 24-hour clock):

Distribution      | Sync times
Arch Linux        | every hour at :03, :13, :33, :43
CentOS AltArch*   | 00:24, 06:24, 12:24, 18:24
CentOS Stream**   | 00:24, 06:24, 12:24, 18:24
CentOS*           | 00:24, 06:24, 12:24, 18:24
Debian CD         | 00:12, 06:12, 12:12, 18:12
Debian            | 00:03, 06:03, 12:03, 18:03
EPEL***           | 00:30, 06:30, 12:30, 18:30
Fedora            | 00:15, 06:15, 12:15, 18:15
Gentoo            | 00:14, 06:14, 12:14, 18:14
RPM Fusion***     | 00:15, 06:15, 12:15, 18:15
Ubuntu Releases** | 00:45, 06:45, 12:45, 18:45
Ubuntu**          | 00:30, 06:30, 12:30, 18:30

* = Deprecated, should be removed

** = Suspected broken, needs to be investigated

*** = Maybe deprecated?

Keep in mind that these are the times syncing starts; it may take a while before the mirror is fully up to date with upstream.
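Because the table lists start times, the worst-case staleness for a distro is roughly the gap between two runs plus however long one sync takes. Below is a minimal Python sketch of both calculations using naive local datetimes, so it ignores DST switches. The 20-minute sync duration in the usage example is a made-up figure, not a measurement from Mirrors.

```python
from datetime import datetime, timedelta


def next_sync(now: datetime, hours, minute: int) -> datetime:
    """Return the first scheduled start strictly after `now`.

    `hours` are the cron hours (e.g. 0, 6, 12, 18) and `minute` the cron minute.
    """
    for day_offset in (0, 1):
        day = now.date() + timedelta(days=day_offset)
        for hour in sorted(hours):
            candidate = datetime(day.year, day.month, day.day, hour, minute)
            if candidate > now:
                return candidate
    raise ValueError("hours must not be empty")


def worst_case_lag(hours, sync_duration: timedelta) -> timedelta:
    """Longest a change can wait upstream before it is fully mirrored (roughly).

    A change landing just after a run starts waits for the largest gap between
    consecutive start hours, then for one full sync.
    """
    hs = sorted(hours)
    gaps = [((hs[(i + 1) % len(hs)] - h) % 24) or 24 for i, h in enumerate(hs)]
    return timedelta(hours=max(gaps)) + sync_duration


if __name__ == "__main__":
    # Debian: minute 3 past 00, 06, 12 and 18.
    print(next_sync(datetime.now(), (0, 6, 12, 18), 3))
    print(worst_case_lag((0, 6, 12, 18), timedelta(minutes=20)))  # 6:20:00
```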

Serving to downstream

vsftpd is the FTP daemon, running on port 21; it allows anonymous users to download everything under /lug, recursively.

rsyncd does the same, but as the rsync daemon running on port 873.

This is what many downstream mirrors use to pull from us, as we're a tier 1 for some distros.
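Each tree that rsyncd exports is a small module stanza in rsyncd.conf. Below is a Python sketch that renders a read-only one, roughly what the old Salt templates did whenever a distro was added. The module name, the /lug/debian path and the comment are hypothetical, and our real rsyncd.conf may set more options (uid/gid, max connections, and so on).

```python
def rsyncd_module(name: str, path: str, comment: str = "") -> str:
    """Render one read-only rsyncd.conf module stanza as text."""
    lines = [
        f"[{name}]",
        f"    path = {path}",
        "    read only = yes",  # downstream mirrors may pull but never push
        "    list = yes",       # show the module when clients list modules
    ]
    if comment:
        lines.insert(1, f"    comment = {comment}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical module; our real module names and paths may differ.
    print(rsyncd_module("debian", "/lug/debian", "Debian archive"))
```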

nginx is similar again, but as the HTTP daemon, running on ports 80 and 443 (HTTP and HTTPS, respectively).

This is what's used by most end-users to download packages from us for their installs.

Its config has a hardcoded if block pointing to each distro's dataset path; I'm almost certain these could just be replaced with a single root /lug in the server block.
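The reasoning: nginx builds the file path by appending the request URI to root, so with root /lug a request for /debian/dists/stable/Release is served from /lug/debian/dists/stable/Release. As long as every distro's dataset is mounted at /lug/<distro> (an assumption about our layout, worth checking with zfs list before touching the config), the per-distro if blocks are redundant. The Python sketch below models only that URI-to-path mapping; it is not nginx code and skips things nginx also does, such as percent-decoding.

```python
import posixpath


def resolve(root: str, uri: str) -> str:
    """Map a request URI to a file path the way a single `root` directive would.

    The query string is dropped and the path is normalized first, so "/../"
    segments cannot climb above `root`. Percent-decoding is not modeled.
    """
    path = uri.split("?", 1)[0]
    clean = posixpath.normpath("/" + path.lstrip("/"))
    return root.rstrip("/") + clean


if __name__ == "__main__":
    print(resolve("/lug", "/debian/dists/stable/Release"))
```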


salt.tar.gz contains all of the Salt configuration: the config files Salt uses to overwrite the ones in their standard locations, as well as the template files it uses to 'build' configs for services like rsyncd and archvsync when a new distro is added to the primary Salt config.


Salt

Salt was used to administer these services, but it is half-broken at the moment and, in my opinion, should not be reinstalled.

All Salt really did was copy config files from its 'special snowflake' directory (/usr/local/srv/salt/files) into the standard locations (like /etc/nginx and whatnot), effectively replacing the established norm for administering a *NIX system with a custom setup.

The benefit of easing the process of adding new distributions to the mirror is not worth the consequence of having a brittle system that breaks when config files are manually edited or when upgrading the system.

As such, I think the way Mirrors is set up is essentially perfect, sans Salt.