Docs/Mirrors
Latest revision as of 22:18, 10 October 2025
The LUG Mirror server mirrors a number of different Linux distros.
Currently, we mirror Debian (+ISOs), Arch Linux, CentOS (+AltArch), Gentoo, Fedora (+EPEL & RPM Fusion), and Ubuntu (+ISOs). The full list can be seen at https://mirrors.lug.mtu.edu/
Hardware
Mirrors is a standalone Dell R730xd server (3.5" drive bay variant).
iDRAC is currently non-functional; this should be investigated.
Operating System
Mirrors runs FreeBSD.
It uses ZFS as the filesystem for both the root pool and the primary pool (named lug) that holds the distribution mirrors.
It used to use salt, but salt broke when upgrading from FreeBSD 12 to 14. Currently, all maintenance is done by hand (this is a good thing).
Maintenance
Certificates
Put the certificate (the 'intermediate' download option from our Certificate Authority) in /usr/local/share/certs/mirrors_lug_mtu_edu_bundle.cer, and the key in /usr/local/share/certs/mirrors_lug_mtu_edu.key
Then run: service nginx reload
Note: use reload and NOT restart. A restart kills all existing http(s) connections, while a reload only applies the new settings to future connections. A reload also won't kill the background daemon if the new settings are not valid.
The nginx configuration lives in /usr/local/etc/nginx/nginx.conf, where settings can be viewed and changed. This file is no longer managed by salt, and can be edited by hand.
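The reload step can be guarded with a configuration test first. The following is a minimal Python sketch, not something currently deployed on Mirrors: it assumes the nginx binary is on PATH, runs nginx -t to test the configuration, and only then issues the service nginx reload command from above. The function name and the injectable run parameter are illustrative.

```python
import subprocess


def reload_nginx(run=subprocess.run):
    """Test the nginx configuration, then reload only if the test passed.

    `run` defaults to subprocess.run; it is a parameter so the logic can be
    exercised without nginx installed. Returns True if the reload succeeded.
    """
    # `nginx -t` tests the configuration and exits non-zero on errors,
    # without touching the running daemon.
    check = run(["nginx", "-t"], capture_output=True, text=True)
    if check.returncode != 0:
        print("Configuration test failed, not reloading:")
        print(check.stderr)
        return False

    # reload, NOT restart: existing connections are left alone.
    result = run(["service", "nginx", "reload"], capture_output=True, text=True)
    return result.returncode == 0
```

Calling reload_nginx() as root after replacing the certificate and key would then either reload nginx or print why the configuration was rejected.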
Core tasks
At its core, a mirror server performs two functions:
- Synchronizing the content from upstream mirrors to itself
- Hosting that downloaded content for end-users and other downstream mirrors to pull from
On our mirrors, this is accomplished with archvsync+cron to synchronize content with upstream, and vsftpd+rsyncd+nginx to handle hosting the content.
Pulling from upstream
ftpsync
The ftpsync utility from the archvsync project is what we use to synchronize content on upstream servers to our local Mirror server.
archvsync is a set of scripts from the Debian project for provisioning mirror servers, and is meant to be all-inclusive.
We don't use archvsync in full, only the ftpsync utility to handle synchronizing files.
It's nice because it prevents half-sync'd files from getting served to users, so they don't pull corrupted binaries that don't verify with the package signature.
This is especially important when acting as an upstream mirror for downstream mirror servers, as they do not typically check package authenticity, relying instead on end users' package managers to check it.
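One common way for a mirror sync to avoid exposing half-synced state is a two-stage transfer: package files are copied first, and the index files that reference them are copied last, so an index never points at a package that has not arrived yet. The Python sketch below only illustrates that general idea. It is not taken from ftpsync's source, and the index file patterns are assumptions loosely modelled on a Debian archive layout.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Illustrative patterns for metadata ("index") files. These are assumptions;
# a real sync tool's list would be longer and distribution-specific.
INDEX_PATTERNS = ["Packages*", "Sources*", "Release*", "InRelease"]


def is_index(path):
    """Return True if the file name looks like an index/metadata file."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in INDEX_PATTERNS)


def plan_two_stage(paths):
    """Split changed paths into (stage1, stage2).

    stage1: package payloads, safe to publish before the indexes change.
    stage2: index files, published only after stage1 has completed.
    """
    stage1 = [p for p in paths if not is_index(p)]
    stage2 = [p for p in paths if is_index(p)]
    return stage1, stage2
```

With this ordering, a client (or downstream mirror) that pulls in the middle of a sync sees the old, still-consistent indexes rather than new indexes pointing at missing files.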
Under the hood, ftpsync can use the ftp(?) or rsync protocols; which one is used is determined by the ftpsync config for a distribution.
The upstream servers Mirrors pulls from, as well as what 'Tier' we are for that distribution, is as follows:
| Distribution | Tier | Upstream | Method |
|---|---|---|---|
| Arch Linux | 1 | rsync.archlinux.org | rsync |
| CentOS AltArch | 1 | msync.centos.org | rsync |
| CentOS Stream | 1 | rsync.stream.centos.org | rsync |
| CentOS | 1 | msync.centos.org | rsync |
| Debian CD | 1 | cdimage.debian.org | rsync |
| Debian | 1 | syncproxy2.wna.debian.org | rsync |
| EPEL | 1 | dl.fedoraproject.org | rsync |
| Fedora | 1 | dl.fedoraproject.org | rsync |
| Gentoo | 2? | ftp.ussg.iu.edu | rsync |
| RPM Fusion | 1 | download1.rpmfusion.org | rsync |
| Ubuntu Releases | 2? | mirror.math.princeton.edu | rsync |
| Ubuntu | 2? | mirror.math.princeton.edu | rsync |
archvsync.tar.gz contains all the archvsync configs and scripts.
It's just a git repo that was cloned by hand, and was not natively installed on the system via pkg.
If we rebuild mirrors, we should either install it via pkg (if available) or git clone it anew, and place it in /opt/archvsync/
cron
cron calls ftpsync to run at certain times.
This is what determines that Arch Linux is synced with upstream every ~15 minutes, while Debian is only synced four times a day.
This should be set according to each distribution's rules on mirror servers.
Most distros want Tier 1 mirrors to sync 4 times a day, and want the exact hours/minutes set randomly so the Tier 0 servers don't get every downstream Tier 1 hammering them with requests all at once.
Currently, this is the schedule Mirrors uses (all times in EST):
| Distribution | Sync Times | | | |
|---|---|---|---|---|
| Arch Linux | *:03 | *:13 | *:33 | *:43 |
| CentOS AltArch* | 12:24 AM | 06:24 AM | 12:24 PM | 06:24 PM |
| CentOS Stream** | 12:24 AM | 06:24 AM | 12:24 PM | 06:24 PM |
| CentOS* | 12:24 AM | 06:24 AM | 12:24 PM | 06:24 PM |
| Debian CD | 12:12 AM | 06:12 AM | 12:12 PM | 06:12 PM |
| Debian | 12:03 AM | 06:03 AM | 12:03 PM | 06:03 PM |
| EPEL*** | 12:30 AM | 06:30 AM | 12:30 PM | 06:30 PM |
| Fedora | 12:15 AM | 06:15 AM | 12:15 PM | 06:15 PM |
| Gentoo | 12:14 AM | 06:14 AM | 12:14 PM | 06:14 PM |
| RPM Fusion*** | 12:15 AM | 06:15 AM | 12:15 PM | 06:15 PM |
| Ubuntu Releases | 12:45 AM | 06:45 AM | 12:45 PM | 06:45 PM |
| Ubuntu | 12:30 AM | 06:30 AM | 12:30 PM | 06:30 PM |
* = Deprecated, should be removed
** = Suspected broken, needs to be investigated
*** = Maybe deprecated?
Keep in mind this is when syncing starts; it may take a while before a mirror is fully up-to-date with upstream.
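The four-times-a-day pattern in the schedule above maps onto a single crontab line per distribution. Below is a minimal Python sketch that builds such a line and picks a random minute, as the distros ask for. The command string is a placeholder, not the actual ftpsync invocation used on Mirrors, and the helper names are illustrative.

```python
import random


def random_minute(rng=random):
    """Pick a minute in 0..59 so Tier 1 mirrors don't all sync at once."""
    return rng.randrange(60)


def crontab_entry(minute, command, syncs_per_day=4, first_hour=0):
    """Build a crontab line that runs `command` at evenly spaced hours.

    Crontab fields are: minute, hour, day of month, month, day of week.
    """
    if not 0 <= minute <= 59:
        raise ValueError("minute must be in 0..59")
    if syncs_per_day < 1 or 24 % syncs_per_day != 0:
        raise ValueError("syncs_per_day must divide 24 evenly")
    step = 24 // syncs_per_day
    hours = sorted((first_hour + i * step) % 24 for i in range(syncs_per_day))
    hour_field = ",".join(str(h) for h in hours)
    return f"{minute} {hour_field} * * * {command}"


if __name__ == "__main__":
    # Minute 24 with four syncs a day matches the CentOS rows above.
    print(crontab_entry(24, "/path/to/sync-command centos"))
    print(crontab_entry(random_minute(), "/path/to/sync-command newdistro"))
```

For minute 24 this yields the hour list 0,6,12,18, i.e. a sync every six hours starting just after midnight.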
Serving to downstream
vsftpd is the ftp daemon running on port 21, and allows everything under /lug to be downloaded (recursively) by anonymous users.
rsyncd does the same, but as the rsync daemon running on port 873.
This is what many downstream mirrors use to pull from us, as we're a tier 1 for some distros.
nginx is again similar, but as the http daemon running on ports 80/443 (http/https, respectively).
This is what's used by most end-users to download packages from us for their installs.
It has a hardcoded if block pointing to each distro's dataset path, which I'm almost certain could just be replaced with root /lug in the server block.
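A quick way to confirm that all three daemons are reachable is to try a TCP connection to each port listed above. The following is a minimal Python sketch; the port numbers come from this section, while the function names and the timeout are assumptions. It only checks that something accepts connections on the port, not that the service behind it is healthy.

```python
import socket

# Ports as described in this section.
SERVICES = {
    "ftp (vsftpd)": 21,
    "rsync (rsyncd)": 873,
    "http (nginx)": 80,
    "https (nginx)": 443,
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_services(host, services=SERVICES):
    """Map each service name to whether its port accepted a connection."""
    return {name: port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services("mirrors.lug.mtu.edu").items():
        print(f"{name}: {'up' if ok else 'DOWN'}")
```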
Salt
Salt used to administer these services, but it's half-broken at the moment and should not be reinstalled (in my opinion).
All salt really did was copy config files from its 'special snowflake' directory (/usr/local/srv/salt/files) into the standard locations (like /etc/nginx and whatnot), effectively replacing the established norm for administering a *NIX system with a custom setup.
The benefit of easing the process of adding new distributions to the mirror is not worth the consequence of having a brittle system that breaks when config files are manually edited or when upgrading the system.
As such, I think the way Mirrors is set up is essentially perfect, sans salt.
salt.tar.gz contains all the configuration for salt, along with the config files it uses to overwrite those in the standard locations and the template files it uses to 'build' configs for services like rsyncd and archvsync when a new distro is added to the primary salt config.