
Re: CI: Issues with networking on some platforms



Hi Paul,

On 2023-11-05 13:03:15 +0100, Paul Gevers wrote:
> On 05/11/2023 12.05, Peter Wienemann wrote:
> > 2. other nodes like the salsa CI runners do not show this behaviour.
>
> Inside an unstable lxc on ci-worker13:
> root@elbrus:~# curl https://mirrors.almalinux.org/mirrorlist/8/baseos
> http://dal.mirrors.clouvider.net/almalinux/8.8/BaseOS/$basearch/os/
> https://na.edge.kernel.org/almalinux/8.8/BaseOS/$basearch/os/
> http://mirror.dal.nexril.net/almalinux/8.8/BaseOS/$basearch/os/
> http://tx-mirror.tier.net/almalinux/8.8/BaseOS/$basearch/os/
> http://almalinux-mirror.dal1.hivelocity.net/8.8/BaseOS/$basearch/os/
> http://mirror.almalinux.dal01.readydedis.com/almalinux/8.8/BaseOS/$basearch/os/
> http://almalinux.mirror.beocat.ksu.edu/8.8/BaseOS/$basearch/os/
> http://nocix.mm.fcix.net/almalinux/8.8/BaseOS/$basearch/os/
> https://repos.eggycrew.com/almalinux/8.8/BaseOS/$basearch/os/
> http://mirror.rnet.missouri.edu/almalinux/8.8/BaseOS/$basearch/os/
> root@elbrus:~# echo $?
> 0
>
> And after installing dnf (so I lack the information needed to help you further):
> root@elbrus:~# dnf install -y --setopt=install_weak_deps=false openssh-clients
> Unable to detect release version (use '--releasever' to specify release version)
> Error: There are no enabled repositories in "/etc/yum.repos.d", "/etc/yum/repos.d", "/etc/distro.repos.d".

Thanks for this additional data point. I think it is nicely in line with the findings for ch-image, namely that networking in general works. In the meantime, I have been able to pin down the culprit for the connection timeouts when the builds are done using docker:

If "docker build" is called without any options, the RUN instructions in the Dockerfile are executed inside a dedicated network namespace. If I do this on ci-worker13, I run into connection timeouts. If I call "docker build" with the "--network=host" option, i.e. no dedicated network namespace is created for the RUN instructions, it works on ci-worker13. Thus my current working hypothesis is that ci-worker13 (and possibly the other CI workers as well) somehow restricts connectivity from within network namespaces, at least for those set up by docker build.
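A minimal sketch of the comparison described above, in case someone wants to reproduce it on a worker. It builds the same throwaway image twice, once with docker's default network mode and once with "--network=host". The base image (debian:unstable), the "apt-get update" step and the tag "netns-test" are assumptions chosen only because they need outbound network access from a RUN instruction; they are not what the charliecloud tests actually build.

```shell
#!/bin/sh
# Hypothetical repro: same build, default network mode vs. --network=host.
# Assumption: any RUN step that needs outbound network access would do;
# debian:unstable + "apt-get update" is just an illustrative choice.
set -u

dockerfile='FROM debian:unstable
RUN apt-get update'

try_build() {
    mode=$1
    # "docker build -" reads the Dockerfile from stdin (no build context).
    # --no-cache makes sure the RUN step is really executed every time.
    if printf '%s\n' "$dockerfile" |
        docker build --no-cache --network="$mode" -t netns-test - >/dev/null 2>&1
    then
        echo "$mode: ok"
    else
        echo "$mode: FAILED"
    fi
}

try_build default
try_build host
```

If only the "default" build fails while the "host" build succeeds, that would be consistent with the namespace hypothesis above.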

The remaining failures on armel, armhf and i386 are simply due to almalinux not being available for those platforms. This is straightforward to fix. The cause of the DNS issue observed on s390x [0] is less obvious:

--------------------------------------------------------------------------
159s error: GET failed: HTTPSConnectionPool(host='registry-1.docker.io', port=443): Max retries exceeded with url: /v2/library/almalinux/manifests/8 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x3ff9784ef10>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
--------------------------------------------------------------------------
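Both remaining problems could probably be caught early in the test script. The sketch below is hypothetical: the function names are made up, the architecture list is taken only from the paragraph above, and it assumes the test is declared with "Restrictions: skippable" in debian/tests/control, so that exit status 77 is reported as a skip. "[Errno -3]" in the log is glibc's EAI_AGAIN from getaddrinfo(), so checking name resolution up front with "getent hosts" (which also goes through the system's NSS/resolver configuration) may help separate a DNS problem in the testbed from a ch-image problem.

```shell
#!/bin/sh
# Hypothetical pre-flight checks; names are illustrative, not from the package.
# Exit status 77 means "skipped" for tests with "Restrictions: skippable".

# The mail only states that almalinux is unavailable on these architectures.
almalinux_unavailable() {
    case "$1" in
        armel|armhf|i386) return 0 ;;
        *) return 1 ;;
    esac
}

# getent uses the system resolver configuration (nsswitch.conf, resolv.conf),
# so a failure here points at DNS in the testbed rather than at ch-image.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Usage: preflight ARCH REGISTRY_HOST
preflight() {
    arch=$1
    registry=$2
    if almalinux_unavailable "$arch"; then
        echo "almalinux is not available on $arch; skipping" >&2
        return 77
    fi
    if ! resolves "$registry"; then
        echo "cannot resolve $registry; check /etc/resolv.conf in the testbed" >&2
        return 1
    fi
    return 0
}
```

A test script could then run `preflight "$(dpkg --print-architecture)" registry-1.docker.io || exit $?` before invoking ch-image.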

Paul, provided you are able to reproduce those results and it becomes clear where those restrictions come from, do you think it would be possible to align the configuration of the CI worker nodes behind ci.debian.net with that of the salsa CI runners? Debugging salsa CI issues is significantly faster than debugging issues that only show up on ci.debian.net.

Thanks again for all your efforts and best regards,

Peter

[0] https://ci.debian.net/data/autopkgtest/testing/s390x/c/charliecloud/39635696/log.gz

