ianw | admittedly, trying to land a stack of 7-8 changes really multiplies your exposure to it, but it does look like our timeouts aren't sufficient for ovh gra1 | 00:00 |
---|---|---|
*** diablo_rojo is now known as Guest7758 | 00:04 | |
clarkb | ya, I wonder if we can identify any particularly bad spots and see if it is related to disk io | 00:05 |
ianw | from a quick look they all seem to get into testinfra, but nothing is clearly a slow point. it's just, slow :) | 00:06 |
ianw | https://paste.opendev.org/show/boadCZ8kQZg4UcDaM2jN/ is the last ~100 jobs that timed out bucketed | 00:16 |
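Bucketing timed-out jobs by cloud region is a simple tally. The following is a minimal sketch. It assumes a hypothetical list of `(job_name, provider)` pairs has already been pulled from the build records. The sample entries are illustrative only and are not the contents of the paste linked above.

```python
from collections import Counter


def bucket_timeouts(builds):
    """Count timed-out builds per provider.

    `builds` is a hypothetical iterable of (job_name, provider) pairs;
    in practice these would come from the CI system's build records.
    """
    return Counter(provider for _job, provider in builds)


if __name__ == "__main__":
    # Illustrative sample data only, not real results.
    sample = [
        ("system-config-run-borg-backup", "ovh-gra1"),
        ("system-config-run-gitea", "ovh-gra1"),
        ("system-config-run-zuul", "rax-dfw"),
    ]
    for provider, count in bucket_timeouts(sample).most_common():
        print(f"{provider}: {count}")
```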
*** rlandy|bbl is now known as rlandy | 00:28 | |
*** rlandy is now known as rlandy|out | 00:46 | |
opendevreview | Ian Wienand proposed opendev/system-config master: install-ansible: remove testinfra version install workaround https://review.opendev.org/c/opendev/system-config/+/852475 | 01:13 |
opendevreview | Ian Wienand proposed opendev/system-config master: testinfra: install with ansible extras https://review.opendev.org/c/opendev/system-config/+/852476 | 01:13 |
opendevreview | Ian Wienand proposed opendev/system-config master: install-ansible: remove stevedore workaround https://review.opendev.org/c/opendev/system-config/+/852477 | 01:15 |
opendevreview | Ian Wienand proposed opendev/system-config master: install-ansible: remove stub install for ARA https://review.opendev.org/c/opendev/system-config/+/852478 | 01:25 |
opendevreview | Ian Wienand proposed opendev/system-config master: system-config-run: bump base timeout to 3600 https://review.opendev.org/c/opendev/system-config/+/852479 | 01:46 |
ianw | clarkb/fungi: ^ i noticed we've got a lot of overrides to 3600 anyway; so i think that is probably sane | 01:47 |
ianw | https://review.opendev.org/c/opendev/system-config/+/850435 and https://review.opendev.org/c/opendev/system-config/+/696211 hit timeouts just now | 01:47 |
*** prometheanfire is now known as Guest1 | 01:48 | |
*** osmanlicilegi is now known as Guest0 | 01:48 | |
*** Guest1 is now known as prometheanfire | 01:48 | |
fungi | wfm | 01:56 |
opendevreview | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/852483 | 02:10 |
opendevreview | Merged openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/852483 | 02:41 |
opendevreview | Ian Wienand proposed opendev/system-config master: system-config-run: bump base timeout to 3600 https://review.opendev.org/c/opendev/system-config/+/852479 | 04:36 |
opendevreview | Ian Wienand proposed opendev/system-config master: install-borg: update pip/setuptools in venv https://review.opendev.org/c/opendev/system-config/+/852487 | 04:36 |
opendevreview | Ian Wienand proposed opendev/system-config master: install-borg: update to borg 1.1.18 https://review.opendev.org/c/opendev/system-config/+/852488 | 04:36 |
opendevreview | Ian Wienand proposed opendev/system-config master: system-config-run-borg-backup: add jammy test host https://review.opendev.org/c/opendev/system-config/+/852489 | 04:48 |
*** ysandeep|out is now known as ysandeep | 05:12 | |
*** persia is now known as Guest23 | 05:27 | |
*** marios is now known as marios|ruck | 05:30 | |
opendevreview | Ian Wienand proposed opendev/system-config master: install-borg: update to borg 1.1.18 https://review.opendev.org/c/opendev/system-config/+/852488 | 06:42 |
opendevreview | Ian Wienand proposed opendev/system-config master: system-config-run-borg-backup: add jammy test host https://review.opendev.org/c/opendev/system-config/+/852489 | 06:42 |
opendevreview | Ian Wienand proposed openstack/project-config master: nodepool: update package maps for Rocky 9 https://review.opendev.org/c/openstack/project-config/+/852518 | 06:51 |
ianw | i always forget to check that :/ | 06:52 |
*** ysandeep is now known as ysandeep|afk | 07:02 | |
*** jpena|off is now known as jpena | 07:35 | |
*** zbr is now known as ssbarnea | 07:54 | |
opendevreview | jayaditya gupta proposed openstack/diskimage-builder master: Fix issue in extract image https://review.opendev.org/c/openstack/diskimage-builder/+/850882 | 08:23 |
*** ysandeep|afk is now known as ysandeep | 08:25 | |
*** marios|ruck is now known as marios|ruck|afk | 08:47 | |
*** marios|ruck|afk is now known as marios|ruck | 09:34 | |
*** soniya29 is now known as soniya29|afk | 09:44 | |
*** soniya29|afk is now known as soniya29 | 09:48 | |
*** rlandy__ is now known as rlandy | 10:30 | |
*** ysandeep is now known as ysandeep|afk | 10:43 | |
*** tosky is now known as Guest71 | 10:46 | |
*** tosky_ is now known as tosky | 10:46 | |
*** dviroel|out is now known as dviroel | 11:32 | |
*** bhagyashris is now known as bhagyashris|afk | 11:44 | |
*** tosky is now known as Guest77 | 12:01 | |
*** tosky__ is now known as tosky | 12:01 | |
*** tosky is now known as Guest78 | 12:30 | |
*** tosky_ is now known as tosky | 12:30 | |
*** bhagyashris|afk is now known as bhagyashris | 12:50 | |
*** tosky is now known as Guest85 | 13:13 | |
*** tosky__ is now known as tosky | 13:13 | |
*** dasm|off is now known as dasm | 13:30 | |
opendevreview | Jeremy Stanley proposed opendev/infra-manual master: Stop recommending PyPI project name squatting https://review.opendev.org/c/opendev/infra-manual/+/852584 | 14:06 |
*** pojadhav is now known as pojadhav|afk | 14:10 | |
*** Guest7758 is now known as diablo_rojo | 14:11 | |
*** sfinucan is now known as stephenfin | 15:16 | |
*** rlandy__ is now known as rlandy | 15:20 | |
*** dviroel is now known as dviroel|lunch | 15:39 | |
*** dasm is now known as dasm|off | 15:54 | |
*** ysandeep is now known as ysandeep|out | 15:58 | |
TheJulia | out of curiosity, are there any known issues on centos stream 9 with connecting to gerrit? | 16:05 |
clarkb | TheJulia: there is a now long-standing issue with gerrit's mina sshd where it cannot support ssh clients that default to rsa + sha1. Unfortunately, although new openssh has dropped support for rsa + sha1 by default, it still falls back to rsa + sha1 if the server doesn't negotiate sha2 (it should fall back to sha2 when sha1 is invalid) | 16:14 |
clarkb | Gerrit 3.6 will fix this when we eventually upgrade. Until then you can manually re-enable sha1 support (the openssh 8.8 release notes have details on this) or use a different key type. I personally created a new key rather than use sha1 | 16:15 |
TheJulia | clarkb: is it possible to use a non-rsa key? | 16:24 |
TheJulia | and just to confirm, this presents as an error in libcrypto to ssh? | 16:24 |
clarkb | TheJulia: I have no idea about libcrypto. The underlying mechanism is that when ssh implementations started removing rsa+sha1, they didn't update their rsa fallback to rsa+sha2. That means when the server does not support key exchange extensions to negotiate rsa+sha2, rsa+sha1 is the only option and it fails on the client side. OpenSSH 8.8 has this issue, as do earlier versions on | 16:26 |
clarkb | fedora, because fedora patched it locally. Yes, using a different key type is what we suggest | 16:26 |
clarkb | I switched to ed25519 keys in response to this | 16:27 |
TheJulia | ok | 16:27 |
*** marios|ruck is now known as marios|out | 16:28 | |
clarkb | ianw, fungi and I managed to work with upstream MINA SSHD and Gerrit to finally get their sshd implementation to support key exchange extensions, but that required updating the MINA lib, which they don't typically backport | 16:28 |
clarkb | that means Gerrit 3.6 is the oldest Gerrit with the fix in it | 16:28 |
clarkb | note this doesn't have anything to do with your key material on disk. This is entirely a protocol negotiation problem and poor default choices in clients | 16:29 |
clarkb | The ssh RFC even says clients should update to sha2 as their default when removing sha1... unfortunately no one seems to have done this | 16:30 |
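The failure mode described above comes down to how the client picks a signature algorithm for an RSA key. The following is a simplified model and not OpenSSH's actual code. It assumes the client learns about `rsa-sha2-256`/`rsa-sha2-512` support only through the `server-sig-algs` extension (RFC 8308). Otherwise, it falls back to `ssh-rsa` (SHA-1), which newer clients refuse by default. Non-RSA key types such as `ssh-ed25519` sidestep the negotiation entirely.

```python
# Simplified model of client-side signature algorithm selection for
# public key authentication. Not a real SSH implementation.

SHA2_RSA = ("rsa-sha2-512", "rsa-sha2-256")


def pick_signature_alg(key_type, server_sig_algs, client_allows_sha1):
    """Return the algorithm the client would sign with, or None on failure.

    key_type: e.g. "ssh-rsa" or "ssh-ed25519".
    server_sig_algs: set of algorithms from the server-sig-algs extension,
        or None if the server never sent the extension.
    client_allows_sha1: whether the client still accepts "ssh-rsa" (SHA-1).
    """
    if key_type != "ssh-rsa":
        # Non-RSA keys have a single signature algorithm: nothing to negotiate.
        return key_type
    if server_sig_algs is not None:
        for alg in SHA2_RSA:
            if alg in server_sig_algs:
                return alg
    # No extension (or no SHA-2 advertised): fall back to SHA-1 RSA,
    # which fails if the client has disabled it.
    return "ssh-rsa" if client_allows_sha1 else None
```

In this model, a server without key exchange extension support corresponds to `server_sig_algs=None`, and a client with OpenSSH 8.8-style defaults corresponds to `client_allows_sha1=False`. That combination yields `None`, a failed login, unless the key is switched to a non-RSA type.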
clarkb | fungi: left a note on https://review.opendev.org/c/opendev/infra-manual/+/852584 and didn't approve as its parent needs to go first | 16:32 |
clarkb | But I think it's fine as is | 16:32 |
fungi | thanks. i also responded (i think) | 16:33 |
clarkb | oh ya I have to refresh. At some point the gerrit web ui stopped alerting you that there are new comments and you should reload :/ | 16:35 |
*** jpena is now known as jpena|off | 16:38 | |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 16:41 |
clarkb | That latest patchset makes some small changes to the Let's Encrypt setup for the service to address ianw's comments. Still nothing substantial enough to require a newly held node | 16:47 |
fungi | so still 104.130.26.212 at the moment? hopefully i can poke at it after my next meeting | 16:48 |
clarkb | yup that node is still the one | 16:49 |
*** dviroel|lunch is now known as dviroel | 16:51 | |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 17:59 |
clarkb | Just a bug in the last ps. Old hold still good | 17:59 |
*** dasm|off is now known as dasm | 18:23 | |
clarkb | hrm I appear to have a race in creating the admin user: https://zuul.opendev.org/t/openstack/build/7500b1062f1949ebac0d2093a77d57dd/log/job-output.txt#15210-15330 I already wait for the db table to be present and for the rest api to respond. Maybe I need to wait for all the DB tables to be present | 18:38 |
clarkb | I've rechecked it to see if it is consistent. In theory it could be something else updating in the system that makes this flaky. But I suspect it is just a race and I need a better delay mechanism | 18:39 |
fungi | that would make sense | 18:56 |
ianw | https://104.130.26.212/ gives me "bad request", is that a known issue? | 19:34 |
clarkb | ianw: yes, you have to use one of the mailman domain names | 19:34 |
ianw | ahh | 19:34 |
*** dviroel is now known as dviroel|biab | 19:34 | |
clarkb | django will refuse to serve content for hostnames that aren't in its configured list. We could add the public IP of the server to the list. But since we are doing vhosted mailman it doesn't make much sense | 19:34 |
clarkb | This means if you go to lists.openstack.org or lists.opendev.org etc at that IP you'll see different content | 19:35 |
clarkb | (and that part of the deployment does seem to be working) | 19:35 |
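Django's refusal to serve the bare IP comes from its `ALLOWED_HOSTS` check on the request's `Host` header. The function below is a simplified, self-contained approximation of that matching and is not Django's own code. It handles exact names, a leading dot as a subdomain wildcard, and `*` for anything. It ignores ports and IPv6 literals. The host list is illustrative.

```python
def host_allowed(host, allowed_hosts):
    """Approximate Django-style ALLOWED_HOSTS matching (simplified)."""
    host = host.lower()
    if host.endswith("."):
        host = host[:-1]  # treat "example.org." like "example.org"
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*":
            return True
        if pattern.startswith("."):
            # ".example.com" matches example.com and any subdomain of it.
            if host == pattern[1:] or host.endswith(pattern):
                return True
        elif host == pattern:
            return True
    return False


# Illustrative vhost list for a multi-domain list server.
ALLOWED = ["lists.opendev.org", "lists.openstack.org"]
```

With only the list domains configured, a request made to the raw IP is rejected, which is the "bad request" seen above. Each vhost name, in contrast, is accepted and can be routed to its own content.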
ianw | right, yeah i saw the hostname setup bit of the ansible | 19:35 |
ianw | i tried running through account creation at https://lists.opendev.org/accounts/signup/ (with redirect) and got a 500 error back | 19:38 |
ianw | after entering details, and choosing a password that satisfied its internal desires | 19:39 |
clarkb | oh heh I'd only created accounts on the command line. Definitely should've tried through the UI | 19:39 |
clarkb | my hunch is something about the outbound smtp settings for user verification are sad. But the logs should tell us | 19:39 |
clarkb | /var/lib/mailman/core/var/logs and/or /var/lib/mailman/web-data/logs/ | 19:39 |
clarkb | ya "smtp recipients refused" for your email addr | 19:41 |
clarkb | definitely the sort of thing I'll need to lean on others to help configure properly | 19:41 |
ianw | smtplib.SMTPRecipientsRefused: {'<mail>': (451, b'Temporary local problem - please try later')} | 19:41 |
clarkb | /var/lib/mailman/web-data/logs/mailmanweb.log | 19:41 |
ianw | jinx :) | 19:41 |
fungi | that may be a safety measure we added to keep the test deployment from sending e-mail to people | 19:45 |
clarkb | fungi: no, I haven't added anything like that yet since the test lists are just the default mailman@domain lists | 19:45 |
clarkb | I definitely attempted to configure mailman-web to talk to local exim on port 25 to forward email | 19:46 |
clarkb | we may have an exim config issue or a firewall problem | 19:46 |
fungi | ahh okay. so it's not configuring the lists from our data | 19:46 |
clarkb | fungi: not yet. I pruned the list back due to that concern | 19:46 |
fungi | i know we have safety mechanisms in place in our mm2 tests to avoid having them spam people | 19:47 |
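The 451 in that traceback is an SMTP transient failure. Under RFC 5321, reply codes starting with 4 mean "try again later," while codes starting with 5 are permanent. When debugging delivery problems like this, a small helper can classify the per-recipient codes carried by Python's `smtplib.SMTPRecipientsRefused`. That exception's `recipients` attribute maps each address to a `(code, message)` tuple. The helper names below are hypothetical and are not part of mailman.

```python
import smtplib


def classify_reply(code):
    """Classify an SMTP reply code by its first digit (RFC 5321)."""
    first = code // 100
    if first == 2:
        return "success"
    if first == 3:
        return "intermediate"
    if first == 4:
        return "transient"
    if first == 5:
        return "permanent"
    return "unknown"


def summarize_refusal(exc):
    """Map each refused recipient to (classification, decoded message)."""
    return {
        addr: (classify_reply(code), msg.decode("utf-8", "replace"))
        for addr, (code, msg) in exc.recipients.items()
    }


if __name__ == "__main__":
    # Same shape as the error in mailmanweb.log, with a placeholder address.
    err = smtplib.SMTPRecipientsRefused(
        {"user@example.org": (451, b"Temporary local problem - please try later")}
    )
    print(summarize_refusal(err))
```

A "transient" result points toward the local MTA, such as exim config or relaying, rather than a permanently bad address.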
clarkb | ianw: for https://review.opendev.org/c/opendev/system-config/+/852487 that will install a virtualenv then upgrade pip and setuptools. Any concern that the pip and setuptools install will hit the same issue where it will install the wrong version of pip and/or setuptools for the local python version? | 19:47 |
clarkb | latest setuptools for example needs python3.7 or newer | 19:47 |
ianw | hrm, maybe? perhaps those are coming in via wheels, and the version we have on bionic knows how to get the right wheel, but not things that don't have wheels? (maybe?) | 19:49 |
clarkb | would it be worth holding a test node with that change in place to see what the resulting venv looks like? | 19:49 |
clarkb | I'm slightly worried that CI might pass because the installation works, but when you actually run the software it could break and then we'd break backups in prod | 19:50 |
ianw | i'm not too worried about that because the backup jobs do actually run a full backup so exercise that | 19:50 |
ianw | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_bcf/852479/1/check/system-config-run-borg-backup/bcf669c/bridge.openstack.org/ara-report/results/432.html | 19:50 |
ianw | that is the original error | 19:50 |
clarkb | ah ok | 19:51 |
clarkb | we can also check on it afterwards and do venv surgery if necessary | 19:51 |
clarkb | ianw: I +2'd it but didn't approve just so that you can confirm you are happy with that plan before we dive in | 19:52 |
clarkb | I also need to step away for a bit and find lunch here | 19:53 |
ianw | i don't know why i called these "test0X" instead of "test-<distro>" | 19:55 |
ianw | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c96/852487/1/check/system-config-run-borg-backup/c96cb5e/bridge.openstack.org/ara-report/results/433.html | 19:57 |
ianw | is the install/update | 19:57 |
ianw | so you're right, it's chosen versions that are technically 3.7+ ... :/ | 19:59 |
clarkb | in that case I think what we want to do is add python3.6 annotations for what we install | 19:59 |
clarkb | So add a task for bionic nodes that installs pip<foo and setuptools<bar, and then the default should just work because focal and jammy have new enough pip | 20:00 |
ianw | Requirement already satisfied: pip in ./venv-3.6/lib/python3.6/site-packages (21.3.1) | 20:01 |
ianw | Requirement already satisfied: setuptools in ./venv-3.6/lib/python3.6/site-packages (59.6.0) | 20:01 |
ianw | if you create a 3.6 venv with tools that seem to get this right, that is the versions they choose | 20:01 |
fungi | looks right to me | 20:02 |
fungi | pip<22 setuptools<60 | 20:02 |
clarkb | "with tools" ? | 20:02 |
clarkb | oh I see updated tools that know what version to do | 20:02 |
clarkb | ++ to fungi's versions | 20:03 |
ianw | yeah, if you start with pip/setuptools in the venv that get it right | 20:03 |
ianw | now i think about it, this is also going to affect putting ansible in a venv on bionic | 20:03 |
ianw | i think it might be best to abstract this | 20:03 |
ianw | when we have purged bionic, we can then just remove one workaround place | 20:04 |
clarkb | a new "install venv" role that knows how to handle bionic maybe? | 20:04 |
clarkb | then anywhere we need a venv we can just use that | 20:04 |
ianw | yeah | 20:04 |
fungi | if you do it as two steps, first installing pip<22 in the venv and then installing the rest, it should pick correct versions | 20:04 |
ianw | it seems to be setuptools causing the problems, so i think we probably want that at the later version first? | 20:05 |
clarkb | maybe you do three steps: 1) create venv with pip<22 2) install setuptools in venv 3) install $application | 20:06 |
fungi | sure, just saying that if you have the right pip in the venv, then it will pick a correct newer setuptools without needing to pin | 20:07 |
ianw | this is such a groundhog day of how we got into a huge mess with "ensure-pip-and-virtualenv" :) | 20:07 |
clarkb | when you install setuptools alongside $application in the same pip invocation you use the old setuptools to install the application not the one you just installed | 20:07 |
ianw | yeah, that was my concern | 20:07 |
fungi | though technically not the case for pep 517 compliant installation | 20:08 |
fungi | since it should pull in a newer build backend for the isolated build environment | 20:08 |
fungi | (where setuptools is the default build backend) | 20:08 |
fungi | so in that case the version of setuptools in your venv may not even be used at all | 20:09 |
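The approach discussed above can be sketched as a staged venv bootstrap. First, create the venv and upgrade pip on its own. On Python 3.6, pip is pinned below 22, following the versions suggested in the log. Next, upgrade setuptools with that pip. Finally, install the application in a separate invocation, so it is not installed using the old setuptools. This is a sketch under those assumptions and not the `create-venv` role itself. The function names are hypothetical, and pinning `setuptools<60` alongside `pip<22` on 3.6 follows the suggestion in the log.

```python
import subprocess
from pathlib import Path


def venv_bootstrap_commands(venv_dir, python, python_version, packages):
    """Return the ordered commands to create and populate a venv.

    python: interpreter used to create the venv (e.g. "python3").
    python_version: (major, minor) tuple for that interpreter.
    The pins for Python < 3.7 are the ones suggested in the discussion.
    """
    pip = str(Path(venv_dir) / "bin" / "pip")
    if python_version < (3, 7):
        pip_req, setuptools_req = "pip<22", "setuptools<60"
    else:
        pip_req, setuptools_req = "pip", "setuptools"
    return [
        # 1a) create the venv with the interpreter's bundled pip
        [python, "-m", "venv", str(venv_dir)],
        # 1b) upgrade pip by itself so later steps resolve with the new pip
        [pip, "install", "--upgrade", pip_req],
        # 2) upgrade setuptools using that pip
        [pip, "install", "--upgrade", setuptools_req],
        # 3) install the application in a separate pip invocation
        [pip, "install", *packages],
    ]


def bootstrap_venv(venv_dir, python, python_version, packages):
    """Run each bootstrap command, stopping on the first failure."""
    for cmd in venv_bootstrap_commands(venv_dir, python, python_version, packages):
        subprocess.run(cmd, check=True)
```

As fungi notes, for PEP 517 builds pip installs the build backend in an isolated environment. The venv's own setuptools therefore matters mainly for packages built without build isolation.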
*** dviroel|biab is now known as dviroel | 20:13 | |
ianw | ok i will look at this after some breakfast :) | 20:15 |
*** dasm is now known as dasm|off | 20:43 | |
clarkb | hrm that mm3 change admin user creation issue happened on my recheck. I'm going to push a naive pause up to see if that helps | 21:14 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 21:15 |
ianw | is this a "wait for the db to be up" thing? i feel like we did have something for that in one of the roles... | 21:15 |
clarkb | ianw: it's a wait for the db to be populated with the necessary tables. | 21:16 |
clarkb | ianw: I'm already waiting for the rest api to respond and for the table that records the admin user to be in place. But the command to create the admin user appears to rely on other bits in the database | 21:17 |
clarkb | specifically django.db.utils.ProgrammingError: (1146, "Table 'mailmandb.hyperkitty_profile' doesn't exist") | 21:17 |
*** dviroel is now known as dviroel|out | 21:17 | |
ianw | https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gerrit/templates/docker-compose.yaml.j2#L24 was what i was thinking of, "wait-for-it.sh" | 21:18 |
clarkb | but I think there are other tables and stuff. I don't know that there is a singular "it's up" notification or something | 21:18 |
ianw | ahh :/ | 21:19 |
clarkb | for production it's hardly worth worrying about, because once the db is populated it will be happy from then on. But making CI more reliable is proving tricky | 21:19 |
clarkb | it does use alembic to manage the schema. I wonder if alembic records an "I'm done" record somewhere | 21:22 |
*** diablo_rojo_phone is now known as Guest139 | 21:23 | |
clarkb | ianw: any idea why I can't get a tty in the mariadb container using `sudo docker exec -it mailman-compose_database_1 bash` on that jammy mm3 node? I suspect that maybe cgroupsv2 are to blame? | 21:26 |
ianw | hrm, do you see a "open /dev/pts/0: operation not permitted: unknown"? | 21:27 |
ianw | ls -l /dev/pts/0 | 21:27 |
ianw | crw--w---- 1 clarkb tty 136, 0 Aug 9 21:26 /dev/pts/0 | 21:27 |
ianw | perhaps you don't :) | 21:27 |
clarkb | ya I see that | 21:28 |
clarkb | huh screen isn't installed either | 21:29 |
ianw | https://github.com/opencontainers/runc/issues/3551 seems like it | 21:30 |
ianw | i have not seen this before | 21:30 |
clarkb | oh cool. I guess upstream knows about it so patience is the best thing unless I want to actually fix it | 21:31 |
opendevreview | Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server https://review.opendev.org/c/opendev/system-config/+/851248 | 21:41 |
clarkb | If that works that will be better than the naive pause | 21:41 |
ianw | $ touch /tmp/foo.txt | 21:45 |
ianw | $ echo "hi" | sudo tee -a /tmp/foo.txt | 21:45 |
ianw | tee: /tmp/foo.txt: Permission denied | 21:45 |
ianw | i feel like this has something to do with it | 21:45 |
ianw | maybe /tmp has always been sticky and i've never noticed? | 21:46 |
fungi | uh... root can't append to a user-owned file? | 21:46 |
clarkb | fwiw I was able to docker exec -it last friday | 21:47 |
clarkb | I did that to browse the container fs's to find where the logs were written to | 21:47 |
ianw | https://paste.opendev.org/show/b6M5uw1XZ4YuwTOPNo5D/ | 21:47 |
clarkb | oh it's failing to open the tty for that part of tee, not the tmpfile | 21:48 |
clarkb | still odd that root can't open that file | 21:48 |
ianw | sorry that trace is from the docker exec | 21:50 |
clarkb | ah, but I bet that is what breaks with tee | 21:51 |
clarkb | it's probably not writing to the tmp file but writing to your tty? | 21:51 |
clarkb | or does it just inherit stdout to avoid that? maybe | 21:52 |
ianw | https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1877151 | 21:57 |
ianw | this is definitely TIL territory | 21:57 |
ianw | root not being able to append is the idea of fs.protected_regular=2 | 21:58 |
ianw | well setting all the "protected" sysctls to 0 on that host still doesn't make it work. so might be a red-herring | 22:01 |
clarkb | that bug indicates that focal should have this behavior too. I wonder why we haven't seen it there? Though maybe we just haven't tried it | 22:01 |
clarkb | ah ok | 22:01 |
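The `fs.protected_regular` rule explains why even root's `tee -a`, which opens with `O_CREAT`, was refused on a file owned by another user in `/tmp`. Below is a simplified model of the kernel's check for opening an existing regular file with `O_CREAT` in a sticky directory, following the kernel's sysctl documentation. It is illustrative only. It ignores FIFOs and mount ID mapping. As the log notes, relaxing these sysctls did not fix the `docker exec` failure.

```python
import stat


def o_creat_open_denied(protected_regular, dir_mode, dir_uid, file_uid, caller_uid):
    """Model whether opening an existing regular file with O_CREAT is
    refused (EACCES) under fs.protected_regular. Root is not exempt.
    """
    if protected_regular == 0 or not dir_mode & stat.S_ISVTX:
        return False
    # Files owned by the caller or by the directory's owner are allowed.
    if file_uid in (caller_uid, dir_uid):
        return False
    if dir_mode & stat.S_IWOTH:
        return True  # world-writable sticky dir (e.g. /tmp), level >= 1
    if dir_mode & stat.S_IWGRP and protected_regular >= 2:
        return True  # group-writable sticky dir, only at level 2
    return False
```

For `/tmp` (mode 1777, owned by uid 0), a file owned by uid 1000, and a caller of uid 0, the model reports a denial. Neither the directory owner nor the caller owns the file, which matches the `tee: Permission denied` above.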
*** undefined is now known as Guest145 | 22:02 | |
*** Guest145 is now known as rcastillo | 22:03 | |
fungi | that's... so strange | 22:03 |
ianw | 28674 prctl(PR_SET_NAME, "runc:[2:INIT]" <unfinished ...> : it is runc that gets this error | 22:04 |
ianw | https://patchwork.ozlabs.org/project/buildroot/patch/35a6c994-5279-4823-8864-be52f6f925ef@cesnet.cz/ | 22:07 |
ianw | looks related | 22:07 |
ianw | The container boots, but | 22:08 |
ianw | `machinectl login mycontainer` fails. The culprit is /dev/pts/ptmx with | 22:08 |
ianw | 0000 perms. | 22:08 |
ianw | $ ls -l /dev/pts/ptmx | 22:08 |
ianw | c--------- 1 root root 5, 2 Aug 8 21:12 /dev/pts/ptmx | 22:08 |
clarkb | https://www.kernel.org/doc/Documentation/filesystems/devpts.txt | 22:10 |
clarkb | I'm not sure I understand what that is trying to tell me, but it does indicate the old system used more permissive perms | 22:12 |
ianw | well the trace of this is /tmp/containerd.txt ; 28674 is the pid that fails | 22:13 |
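The permission strings in this exchange can be decoded with Python's `stat.filemode`. For example, `crw--w----` is a character device with mode 0620, and `c---------` is a character device with mode 0000. The small sketch below reports a device node's type and permission bits; the helper name is hypothetical. The paths are the ones discussed above. The devpts documentation describes the `ptmxmode` mount option that controls the mode of `/dev/pts/ptmx`.

```python
import os
import stat


def describe_node(path):
    """Return (ls-style mode string, octal permission bits, owner uid)."""
    st = os.stat(path)
    return stat.filemode(st.st_mode), oct(stat.S_IMODE(st.st_mode)), st.st_uid


if __name__ == "__main__":
    for p in ("/dev/pts/ptmx", "/dev/pts/0"):
        try:
            print(p, *describe_node(p))
        except OSError as e:
            print(p, "unavailable:", e)
```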
clarkb | the wait on alembic to show the db version is at head passed. I'm going to recheck it a couple of times just to see if we can catch that problem again | 22:14 |
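Waiting for the schema to reach alembic's head revision, rather than for one particular table, generalizes to a poll-with-timeout helper. The following is a minimal sketch. `wait_until` is generic, and the commented-out predicate is hypothetical, showing only the kind of check involved. Alembic records the applied revision in an `alembic_version` table by default. This sketch is not what the change under review actually does.

```python
import time


def wait_until(predicate, timeout=300.0, interval=5.0):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical predicate: compare the revision stored in alembic's
# alembic_version table with the expected head revision.
# def schema_at_head():
#     return query("SELECT version_num FROM alembic_version") == EXPECTED_HEAD
#
# wait_until(schema_at_head, timeout=600, interval=10)
```

Polling on a single "schema complete" marker avoids having to enumerate every table the admin-creation command depends on, such as the missing `hyperkitty_profile` table above.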
ianw | clarkb: https://5c3613e862b57d687223-af2016a5632320f910deb9dcbf495ac6.ssl.cf1.rackcdn.com/847204/9/check/system-config-run-gitea/eb70227/bridge.openstack.org/screenshots/gitea-project-system-config.png that seems like a new warning box on the screenshot | 23:16 |
ianw | i think with the recent cert work, we should be able to hit that as opendev.org, i'll put it on the list | 23:16 |
Clark[m] | ianw: yup it's a new gitea feature. I think it is nice in that it helps confirm we aren't screenshotting production somehow | 23:17 |
opendevreview | Ian Wienand proposed opendev/system-config master: create-venv: add role; use in install-borg https://review.opendev.org/c/opendev/system-config/+/852487 | 23:36 |
opendevreview | Ian Wienand proposed opendev/system-config master: system-config-run: bump base timeout to 3600 https://review.opendev.org/c/opendev/system-config/+/852479 | 23:36 |
opendevreview | Ian Wienand proposed opendev/system-config master: install-borg: update to borg 1.1.18 https://review.opendev.org/c/opendev/system-config/+/852488 | 23:36 |
opendevreview | Ian Wienand proposed opendev/system-config master: system-config-run-borg-backup: add jammy test host https://review.opendev.org/c/opendev/system-config/+/852489 | 23:36 |
opendevreview | Ian Wienand proposed opendev/system-config master: gate-groups: remove old backup group https://review.opendev.org/c/opendev/system-config/+/852684 | 23:56 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!