Tuesday, 2022-08-09

[00:00] <ianw> admittedly, trying to land a stack of 7-8 changes really multiplies your exposure to it, but it does look like our timeouts aren't sufficient for ovh gra1
[00:04] *** diablo_rojo is now known as Guest7758
[00:05] <clarkb> ya, I wonder if we can identify any particularly bad spots and see if it is related to disk io
[00:06] <ianw> from a quick look they all seem to get into testinfra, but nothing is clearly a slow point.  it's just, slow :)
[00:16] <ianw> https://paste.opendev.org/show/boadCZ8kQZg4UcDaM2jN/ is the last ~100 jobs that timed out, bucketed
[00:28] *** rlandy|bbl is now known as rlandy
[00:46] *** rlandy is now known as rlandy|out
[01:13] <opendevreview> Ian Wienand proposed opendev/system-config master: install-ansible: remove testinfra version install workaround  https://review.opendev.org/c/opendev/system-config/+/852475
[01:13] <opendevreview> Ian Wienand proposed opendev/system-config master: testinfra: install with ansible extras  https://review.opendev.org/c/opendev/system-config/+/852476
[01:15] <opendevreview> Ian Wienand proposed opendev/system-config master: install-ansible: remove stevedore workaround  https://review.opendev.org/c/opendev/system-config/+/852477
[01:25] <opendevreview> Ian Wienand proposed opendev/system-config master: install-ansible: remove stub install for ARA  https://review.opendev.org/c/opendev/system-config/+/852478
[01:46] <opendevreview> Ian Wienand proposed opendev/system-config master: system-config-run: bump base timeout to 3600  https://review.opendev.org/c/opendev/system-config/+/852479
[01:47] <ianw> clarkb/fungi: ^ i noticed we've got a lot of overrides to 3600 anyway, so i think that is probably sane
[01:47] <ianw> https://review.opendev.org/c/opendev/system-config/+/850435 and https://review.opendev.org/c/opendev/system-config/+/696211 hit timeouts just now
[01:48] *** prometheanfire is now known as Guest1
[01:48] *** osmanlicilegi is now known as Guest0
[01:48] *** Guest1 is now known as prometheanfire
[02:10] <opendevreview> OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/c/openstack/project-config/+/852483
[02:41] <opendevreview> Merged openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/c/openstack/project-config/+/852483
[04:36] <opendevreview> Ian Wienand proposed opendev/system-config master: system-config-run: bump base timeout to 3600  https://review.opendev.org/c/opendev/system-config/+/852479
[04:36] <opendevreview> Ian Wienand proposed opendev/system-config master: install-borg: update pip/setuptools in venv  https://review.opendev.org/c/opendev/system-config/+/852487
[04:36] <opendevreview> Ian Wienand proposed opendev/system-config master: install-borg: update to borg 1.1.18  https://review.opendev.org/c/opendev/system-config/+/852488
[04:48] <opendevreview> Ian Wienand proposed opendev/system-config master: system-config-run-borg-backup: add jammy test host  https://review.opendev.org/c/opendev/system-config/+/852489
[05:12] *** ysandeep|out is now known as ysandeep
[05:27] *** persia is now known as Guest23
[05:30] *** marios is now known as marios|ruck
[06:42] <opendevreview> Ian Wienand proposed opendev/system-config master: install-borg: update to borg 1.1.18  https://review.opendev.org/c/opendev/system-config/+/852488
[06:42] <opendevreview> Ian Wienand proposed opendev/system-config master: system-config-run-borg-backup: add jammy test host  https://review.opendev.org/c/opendev/system-config/+/852489
[06:51] <opendevreview> Ian Wienand proposed openstack/project-config master: nodepool: update package maps for Rocky 9  https://review.opendev.org/c/openstack/project-config/+/852518
[06:52] <ianw> i always forget to check that :/
[07:02] *** ysandeep is now known as ysandeep|afk
[07:35] *** jpena|off is now known as jpena
[07:54] *** zbr is now known as ssbarnea
[08:23] <opendevreview> jayaditya gupta proposed openstack/diskimage-builder master: Fix issue in extract image  https://review.opendev.org/c/openstack/diskimage-builder/+/850882
[08:25] *** ysandeep|afk is now known as ysandeep
[08:47] *** marios|ruck is now known as marios|ruck|afk
[09:34] *** marios|ruck|afk is now known as marios|ruck
[09:44] *** soniya29 is now known as soniya29|afk
[09:48] *** soniya29|afk is now known as soniya29
[10:30] *** rlandy__ is now known as rlandy
[10:43] *** ysandeep is now known as ysandeep|afk
[10:46] *** tosky is now known as Guest71
[10:46] *** tosky_ is now known as tosky
[11:32] *** dviroel|out is now known as dviroel
[11:44] *** bhagyashris is now known as bhagyashris|afk
[12:01] *** tosky is now known as Guest77
[12:01] *** tosky__ is now known as tosky
[12:30] *** tosky is now known as Guest78
[12:30] *** tosky_ is now known as tosky
[12:50] *** bhagyashris|afk is now known as bhagyashris
[13:13] *** tosky is now known as Guest85
[13:13] *** tosky__ is now known as tosky
[13:30] *** dasm|off is now known as dasm
[14:06] <opendevreview> Jeremy Stanley proposed opendev/infra-manual master: Stop recommending PyPI project name squatting  https://review.opendev.org/c/opendev/infra-manual/+/852584
[14:10] *** pojadhav is now known as pojadhav|afk
[14:11] *** Guest7758 is now known as diablo_rojo
[15:16] *** sfinucan is now known as stephenfin
[15:20] *** rlandy__ is now known as rlandy
[15:39] *** dviroel is now known as dviroel|lunch
[15:54] *** dasm is now known as dasm|off
[15:58] *** ysandeep is now known as ysandeep|out
[16:05] <TheJulia> out of curiosity, are there any known issues on centos stream 9 with connecting to gerrit?
[16:14] <clarkb> TheJulia: there is a now long-standing issue with gerrit's mina sshd where it cannot support ssh clients that default to rsa + sha1. Unfortunately, even though new openssh drops support for rsa + sha1 by default, it still falls back to it if the server doesn't negotiate sha2 (they should fall back to sha2 if sha1 is invalid...)
[16:15] <clarkb> Gerrit 3.6 will fix this when we eventually upgrade. Until then you can manually re-enable sha1 support (the openssh 8.8 release notes have details on this) or use a different key type. I personally created a new key rather than use sha1
[16:24] <TheJulia> clarkb: is it possible to use a non-rsa key?
[16:24] <TheJulia> and just to confirm, this presents as an error in libcrypto to ssh?
[16:26] <clarkb> TheJulia: I have no idea about libcrypto. The underlying mechanism is that when ssh implementations started removing rsa+sha1 they didn't update their rsa fallback to rsa+sha2. That means when the server does not support key exchange extensions to negotiate rsa+sha2, rsa+sha1 is the only option and it fails on the client side. OpenSSH 8.8 has this issue, as do earlier versions on
[16:26] <clarkb> fedora because fedora patched locally. Yes, using a different key type is what we suggest
[16:27] <clarkb> I switched to ed25519 keys in response to this
[16:28] *** marios|ruck is now known as marios|out
[16:28] <clarkb> ianw, fungi and myself managed to work with upstream MINA SSHD and Gerrit to finally get their sshd implementation to support key exchange extensions, but that required updating the MINA lib, which they don't typically backport
[16:28] <clarkb> that means Gerrit 3.6 is the oldest Gerrit with the fix in it
[16:29] <clarkb> note this doesn't have anything to do with your key material on disk. This is entirely a protocol negotiation problem and poor default choices in clients
[16:30] <clarkb> The ssh RFC even says clients should update to sha2 as their default when removing sha1... unfortunately no one seems to have done this
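The following is a minimal Python sketch of the RSA fallback problem clarkb describes. It uses a simplified model in which the server either sends the `server-sig-algs` extension (RFC 8308) or sends nothing, as Gerrit's MINA SSHD did before 3.6. The function `pick_rsa_sig_alg` and its parameters are hypothetical, and the sketch does not reproduce OpenSSH's actual logic.

```python
from typing import Optional, Sequence

# Wire names of the RSA signature algorithms for user keys.
RSA_SHA1 = "ssh-rsa"
RSA_SHA2 = ("rsa-sha2-512", "rsa-sha2-256")


def pick_rsa_sig_alg(
    server_sig_algs: Optional[Sequence[str]],
    client_allowed: Sequence[str],
    fallback: str = RSA_SHA1,
) -> Optional[str]:
    """Return the RSA signature algorithm a client would try, or None.

    server_sig_algs is None when the server sent no server-sig-algs
    extension (hypothetical simplified model, not OpenSSH's code).
    """
    if server_sig_algs is not None:
        # The server told us what it accepts: prefer a SHA-2 variant both sides allow.
        for alg in RSA_SHA2:
            if alg in server_sig_algs and alg in client_allowed:
                return alg
    # No usable negotiation: fall back to a fixed default. If that default is
    # ssh-rsa and the client has disabled SHA-1, nothing is left to try and
    # authentication fails on the client side.
    return fallback if fallback in client_allowed else None
```

With no extension from the server and a client that has dropped `ssh-rsa`, the sketch returns `None`, which corresponds to the client-side failure. Passing `fallback="rsa-sha2-512"` models the SHA-2 default that clarkb argues clients should use. An ed25519 key avoids the RSA signature choice entirely.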
[16:32] <clarkb> fungi: left a note on https://review.opendev.org/c/opendev/infra-manual/+/852584 and didn't approve as its parent needs to go first
[16:32] <clarkb> But I think it's fine as is
[16:33] <fungi> thanks. i also responded (i think)
[16:35] <clarkb> oh ya I have to refresh. At some point the gerrit web ui stopped alerting you that there are new comments and you should reload :/
[16:38] *** jpena is now known as jpena|off
[16:41] <opendevreview> Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server  https://review.opendev.org/c/opendev/system-config/+/851248
[16:47] <clarkb> That latest patchset makes some small changes to the Let's Encrypt setup for the service to address ianw's comments. Still nothing substantial enough to require a newly held node
[16:48] <fungi> so still at the moment? hopefully i can poke at it after my next meeting
[16:49] <clarkb> yup that node is still the one
[16:51] *** dviroel|lunch is now known as dviroel
[17:59] <opendevreview> Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server  https://review.opendev.org/c/opendev/system-config/+/851248
[17:59] <clarkb> Just a bug in the last ps. Old hold still good
[18:23] *** dasm|off is now known as dasm
[18:38] <clarkb> hrm I appear to have a race in creating the admin user: https://zuul.opendev.org/t/openstack/build/7500b1062f1949ebac0d2093a77d57dd/log/job-output.txt#15210-15330 I already delay by waiting for the db table to be present and for the rest api to respond. Maybe I need to wait for all the DB tables to be present
[18:39] <clarkb> I've rechecked it to see if it is consistent. In theory it could be something else updating in the system that makes this flaky. But I suspect it is just a race and I need a better delay mechanism
[18:56] <fungi> that would make sense
[19:34] <ianw> https:// gives me "bad request", is that a known issue?
[19:34] <clarkb> ianw: yes, you have to use one of the mailman domain names
[19:34] *** dviroel is now known as dviroel|biab
[19:34] <clarkb> django will refuse to serve content that isn't for the actual named stuff. We could add the public IP of the server to the list. But since we are doing vhosted mailman it doesn't make much sense
[19:35] <clarkb> This means if you go to lists.openstack.org or lists.opendev.org etc at that IP you'll see different content
[19:35] <clarkb> (and that part of the deployment does seem to be working)
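Django answers a request whose `Host` header does not match its `ALLOWED_HOSTS` setting with a 400 Bad Request. That is why the bare IP fails while the vhost names work. Below is a simplified Python model of that matching, not Django's own implementation. `host_allowed` is a hypothetical name, and the example IP used with it is a documentation address, not the held node's.

```python
from typing import Iterable


def host_allowed(host: str, allowed_hosts: Iterable[str]) -> bool:
    """Simplified model of Django-style ALLOWED_HOSTS matching (not Django's code).

    - "*" matches any host.
    - ".example.org" matches example.org and any subdomain of it.
    - anything else must match exactly, case-insensitively.
    """
    # Strip an optional ":port" suffix; IPv6 literals are not handled in this sketch.
    host = host.rsplit(":", 1)[0].lower() if host.count(":") == 1 else host.lower()
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*":
            return True
        if pattern.startswith("."):
            if host == pattern[1:] or host.endswith(pattern):
                return True
        elif host == pattern:
            return True
    return False
```

For example, `host_allowed("203.0.113.10", ["lists.opendev.org", "lists.openstack.org"])` is `False`, while either list name is accepted. Adding the server IP to the list would make the check pass, but as clarkb notes that has little value for a vhosted setup.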
[19:35] <ianw> right, yeah i saw the hostname setup bit of the ansible
[19:38] <ianw> i tried running through account creation at https://lists.opendev.org/accounts/signup/ (with redirect) and got a 500 error back
[19:39] <ianw> after entering details, and choosing a password that satisfied its internal desires
[19:39] <clarkb> oh heh I'd only created accounts on the command line. Definitely should've tried through the UI
[19:39] <clarkb> my hunch is something about the outbound smtp settings for user verification is sad. But the logs should tell us
[19:39] <clarkb> /var/lib/mailman/core/var/logs and/or /var/lib/mailman/web-data/logs/
[19:41] <clarkb> ya "smtp recipients refused" for your email addr
[19:41] <clarkb> definitely the sort of thing I'll need to lean on others to help configure properly
[19:41] <ianw> smtplib.SMTPRecipientsRefused: {'<mail>': (451, b'Temporary local problem - please try later')}
[19:41] <ianw> jinx :)
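The 451 in that traceback is an SMTP transient-failure reply (4xx), not a 5xx permanent rejection. That suggests a local relay problem rather than a bad address. Here is a small Python sketch that sorts `smtplib.SMTPRecipientsRefused.recipients` by reply class. `split_refusals` and the addresses used with it are placeholders.

```python
import smtplib
from typing import Dict, Tuple

# Maps recipient address -> (SMTP reply code, reply message).
Refusals = Dict[str, Tuple[int, bytes]]


def split_refusals(exc: smtplib.SMTPRecipientsRefused) -> Tuple[Refusals, Refusals]:
    """Split refused recipients into transient (4xx) and all other failures.

    Per RFC 5321, 4yz replies mean "try again later" and 5yz replies are
    permanent rejections; anything that is not 4xx lands in the second dict.
    """
    transient: Refusals = {}
    other: Refusals = {}
    for addr, (code, msg) in exc.recipients.items():
        bucket = transient if 400 <= code < 500 else other
        bucket[addr] = (code, msg)
    return transient, other
```

A mailer could retry the transient bucket later and report the rest. Here, a transient refusal from the local exim points at its configuration rather than at the recipient.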
[19:45] <fungi> that may be a safety measure we added to keep the test deployment from sending e-mail to people
[19:45] <clarkb> fungi: no, I haven't added anything like that yet since the test lists are just the default mailman@domain lists
[19:46] <clarkb> I definitely attempted to configure mailman-web to talk to local exim on port 25 to forward email
[19:46] <clarkb> we may have an exim config issue or a firewall problem
[19:46] <fungi> ahh okay. so it's not configuring the lists from our data
[19:46] <clarkb> fungi: not yet. I pruned the list back due to that concern
[19:47] <fungi> i know we have safety mechanisms in place in our mm2 tests to avoid having them spam people
[19:47] <clarkb> ianw: for https://review.opendev.org/c/opendev/system-config/+/852487 that will install a virtualenv then upgrade pip and setuptools. Any concern that the pip and setuptools install will hit the same issue where it will install the wrong version of pip and/or setuptools for the local python version?
[19:47] <clarkb> latest setuptools for example needs python3.7 or newer
[19:49] <ianw> hrm, maybe?  perhaps those are coming in via wheels, and the version we have on bionic knows how to get the right wheel, but not things that don't have wheels?  (maybe?)
[19:49] <clarkb> would it be worth holding a test node with that change in place to see what the resulting venv looks like?
[19:50] <clarkb> I'm slightly worried that CI might pass because the installation works but when you actually run the software it could break and then we'd break backups in prod
[19:50] <ianw> i'm not too worried about that because the backup jobs do actually run a full backup so exercise that
[19:50] <ianw> that is the original error
[19:51] <clarkb> ah ok
[19:51] <clarkb> we can also check on it afterwards and do venv surgery if necessary
[19:52] <clarkb> ianw: I +2'd it but didn't approve just so that you can confirm you are happy with that plan before we dive in
[19:53] <clarkb> I also need to step away for a bit and find lunch here
[19:55] <ianw> i don't know why i called these "test0X" instead of "test-<distro>"
[19:57] <ianw> is the install/update
[19:59] <ianw> so you're right, it's chosen versions that are technically 3.7+ ... :/
[19:59] <clarkb> in that case I think what we want to do is add python3.6 annotations for what we install
[20:00] <clarkb> So add a task for bionic nodes that installs pip<foo and setuptools<bar and then the default should just work because focal and jammy have new enough pip
[20:01] <ianw> Requirement already satisfied: pip in ./venv-3.6/lib/python3.6/site-packages (21.3.1)
[20:01] <ianw> Requirement already satisfied: setuptools in ./venv-3.6/lib/python3.6/site-packages (59.6.0)
[20:01] <ianw> if you create a 3.6 venv with tools that seem to get this right, those are the versions they choose
[20:02] <fungi> looks right to me
[20:02] <fungi> pip<22 setuptools<60
[20:02] <clarkb> "with tools" ?
[20:02] <clarkb> oh I see, updated tools that know what version to pick
[20:03] <clarkb> ++ to fungi's versions
[20:03] <ianw> yeah, if you start with pip/setuptools in the venv that get it right
[20:03] <ianw> now i think about it, this is also going to affect putting ansible in a venv on bionic
[20:03] <ianw> i think it might be best to abstract this
[20:04] <ianw> when we have purged bionic, we can then just remove one workaround place
[20:04] <clarkb> a new "install venv" role that knows how to handle bionic maybe?
[20:04] <clarkb> then anywhere we need a venv we can just use that
[20:04] <fungi> if you do it as two steps, first installing pip<22 in the venv and then installing the rest, it should pick correct versions
[20:05] <ianw> it seems to be setuptools causing the problems, so i think we probably want that at the later version first?
[20:06] <clarkb> maybe you do three steps: 1) create venv with pip<22 2) install setuptools in venv 3) install $application
[20:07] <fungi> sure, just saying that if you have the right pip in the venv, then it will pick a correct newer setuptools without needing to pin
[20:07] <ianw> this is such a groundhog day of how we got into a huge mess with "ensure-pip-and-virtualenv" :)
[20:07] <clarkb> when you install setuptools alongside $application in the same pip invocation you use the old setuptools to install the application, not the one you just installed
[20:07] <ianw> yeah, that was my concern
[20:08] <fungi> though technically not the case for pep 517 compliant installation
[20:08] <fungi> since it should pull in a newer build backend for the isolated build environment
[20:08] <fungi> (where setuptools is the default build backend)
[20:09] <fungi> so in that case the version of setuptools in your venv may not even be used at all
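The three-step plan above can be sketched in Python as a function that only builds the command lines: create the venv, upgrade pip/setuptools on their own (pinned on Python 3.6), then install the application. `venv_commands`, the `/opt/borg` path and the `borgbackup==1.1.18` requirement are illustrative assumptions, not what the eventual create-venv role does. The pins are the `pip<22 setuptools<60` bounds from the discussion.

```python
import sys
from typing import List, Sequence, Tuple

# Upper bounds agreed above for Python 3.6 (bionic); newer pip and
# setuptools releases require Python 3.7+.
PY36_PINS = ["pip<22", "setuptools<60"]


def venv_commands(venv: str, packages: Sequence[str],
                  python_version: Tuple[int, int] = sys.version_info[:2]) -> List[List[str]]:
    """Return (without running) the commands for a hypothetical create-venv step.

    Tooling is upgraded in its own pip invocation so that the application
    install in the last step runs with the updated pip/setuptools.
    """
    pip = f"{venv}/bin/pip"
    tooling = PY36_PINS if python_version <= (3, 6) else ["pip", "setuptools"]
    return [
        # 1) create the venv with the target interpreter
        [f"python{python_version[0]}.{python_version[1]}", "-m", "venv", venv],
        # 2) upgrade the build tooling by itself (pinned on 3.6)
        [pip, "install", "--upgrade", *tooling],
        # 3) install the application using the upgraded tooling
        [pip, "install", *packages],
    ]
```

Keeping step 2 separate from step 3 matters because, as clarkb notes, a single pip invocation builds the application with the setuptools already present in the venv. As fungi points out, PEP 517 builds use an isolated build environment, so in that case the venv's setuptools may not be used at all.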
[20:13] *** dviroel|biab is now known as dviroel
[20:15] <ianw> ok i will look at this after some breakfast :)
[20:43] *** dasm is now known as dasm|off
[21:14] <clarkb> hrm, the mm3 change's admin user creation issue happened again on my recheck. I'm going to push a naive pause up to see if that helps
[21:15] <opendevreview> Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server  https://review.opendev.org/c/opendev/system-config/+/851248
[21:15] <ianw> is this a "wait for the db to be up" thing?  i feel like we did have something for that in one of the roles...
[21:16] <clarkb> ianw: it's a wait for the db to be populated with the necessary tables.
[21:17] <clarkb> ianw: I'm already waiting for the rest api to respond and for the table that records the admin user to be in place. But the command to create the admin user appears to rely on other bits in the database
[21:17] <clarkb> specifically django.db.utils.ProgrammingError: (1146, "Table 'mailmandb.hyperkitty_profile' doesn't exist")
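One way to wait for "all the tables" is to poll the database catalog for an explicit list of required names. This Python sketch uses SQLite's `sqlite_master` so that it runs standalone. Against the MariaDB container, one would presumably query `information_schema.tables` for the `mailmandb` schema instead. `wait_for_tables` and its defaults are hypothetical. The approach has a weakness: the full set of tables the admin-user command needs must be known up front.

```python
import sqlite3
import time
from typing import Callable, Iterable, Set


def missing_tables(conn: sqlite3.Connection, required: Iterable[str]) -> Set[str]:
    """Return the required table names that do not exist yet (SQLite catalog)."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    present = {name for (name,) in rows}
    return set(required) - present


def wait_for_tables(connect: Callable[[], sqlite3.Connection], required: Iterable[str],
                    timeout: float = 300.0, interval: float = 2.0) -> None:
    """Poll until every required table exists, or raise TimeoutError."""
    required = list(required)
    deadline = time.monotonic() + timeout
    while True:
        conn = connect()  # fresh connection each poll, like an external checker
        try:
            missing = missing_tables(conn, required)
        finally:
            conn.close()
        if not missing:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"tables still missing: {sorted(missing)}")
        time.sleep(interval)
```

For example, `wait_for_tables(lambda: sqlite3.connect(path), ["hyperkitty_profile"])` returns once that table has been created.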
[21:17] *** dviroel is now known as dviroel|out
[21:18] <ianw> https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gerrit/templates/docker-compose.yaml.j2#L24 was what i was thinking of, "wait-for-it.sh"
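`wait-for-it.sh` blocks until a TCP port accepts connections. A rough Python equivalent is sketched below; `wait_for_port` is a hypothetical name, not part of that script. The sketch also shows why this approach would not fix the mailman race: a listening database says nothing about whether its schema has been populated.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> None:
    """Block until a TCP connect to host:port succeeds, in the spirit of wait-for-it.sh.

    This only proves that something is listening; it does not check that
    the database schema exists.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{host}:{port} not reachable after {timeout}s")
            time.sleep(interval)
```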
[21:18] <clarkb> but I think there are other tables and stuff. I don't know that there is a singular "it's up" notification or something
[21:19] <ianw> ahh :/
[21:19] <clarkb> for production it's hardly worth worrying about because once the db is populated it will be happy from then on. But making CI more reliable is proving tricky
[21:22] <clarkb> it does use alembic to manage the schema. I wonder if alembic records an "I'm done" record somewhere
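Alembic records the applied revision in an `alembic_version` table (column `version_num`). "At head" can therefore be checked by comparing that value with the expected head revision, for example as printed by `alembic heads`. Below is a minimal Python sketch against SQLite. The function names and the `abc123` revision are placeholders, and this is not necessarily how the change implements its wait.

```python
import sqlite3
from typing import Optional


def current_alembic_revision(conn: sqlite3.Connection) -> Optional[str]:
    """Return the revision stored in alembic_version, or None if there is none yet."""
    try:
        row = conn.execute("SELECT version_num FROM alembic_version").fetchone()
    except sqlite3.OperationalError:
        # The table does not exist yet: migrations have not started.
        return None
    return row[0] if row else None


def schema_at_head(conn: sqlite3.Connection, head: str) -> bool:
    """True once migrations have reached the expected head revision."""
    return current_alembic_revision(conn) == head
```

A CI wait would poll `schema_at_head` until it returns `True` or a deadline passes. Unlike a per-table check, it does not need to list every table.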
[21:23] *** diablo_rojo_phone is now known as Guest139
[21:26] <clarkb> ianw: any idea why I can't get a tty in the mariadb container using `sudo docker exec -it mailman-compose_database_1 bash` on that jammy mm3 node? I suspect that maybe cgroupsv2 are to blame?
[21:27] <ianw> hrm, do you see a "open /dev/pts/0: operation not permitted: unknown"?
[21:27] <ianw> ls -l /dev/pts/0
[21:27] <ianw> crw--w---- 1 clarkb tty 136, 0 Aug  9 21:26 /dev/pts/0
[21:27] <ianw> perhaps you don't :)
[21:28] <clarkb> ya I see that
[21:29] <clarkb> huh screen isn't installed either
[21:30] <ianw> https://github.com/opencontainers/runc/issues/3551 seems like it
[21:30] <ianw> i have not seen this before
[21:31] <clarkb> oh cool. I guess upstream knows about it so patience is the best thing unless I want to actually fix it
[21:41] <opendevreview> Clark Boylan proposed opendev/system-config master: WIP Add a mailman3 list server  https://review.opendev.org/c/opendev/system-config/+/851248
[21:41] <clarkb> If that works that will be better than the naive pause
[21:45] <ianw> $ touch /tmp/foo.txt
[21:45] <ianw> $ echo "hi" | sudo tee -a /tmp/foo.txt
[21:45] <ianw> tee: /tmp/foo.txt: Permission denied
[21:45] <ianw> i feel like this has something to do with it
[21:46] <ianw> maybe /tmp has always been sticky and i've never noticed?
[21:46] <fungi> uh... root can't append to a user-owned file?
[21:47] <clarkb> fwiw I was able to docker exec -it last friday
[21:47] <clarkb> I did that to browse the container fs's to find where the logs were written to
[21:48] <clarkb> oh it's failing to open the tty for that part of tee, not the tmpfile
[21:48] <clarkb> still odd that root can't open that file
[21:50] <ianw> sorry, that trace is from the docker exec
[21:51] <clarkb> ah, but I bet that is what breaks with tee
[21:51] <clarkb> it's probably not writing to the tmp file but writing to your tty?
[21:52] <clarkb> or does it just inherit stdout to avoid that? maybe
[21:57] <ianw> this is definitely TIL territory
[21:58] <ianw> root not being able to append is the idea of fs.protected_regular=2
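The kernel's `fs.protected_regular` sysctl refuses an O_CREAT open of an existing regular file in a sticky, world-writable directory (and, at level 2, in a sticky, group-writable one). The open is allowed only if the opener owns the file or the file belongs to the directory's owner, and root is not exempt. `tee -a` opens its output with O_CREAT, so root appending to ianw's file in /tmp is refused. Below is a Python model of that rule; `protected_regular_denies` is a hypothetical helper, not a kernel API.

```python
import stat


def protected_regular_denies(level: int, dir_mode: int, dir_uid: int,
                             file_uid: int, opener_uid: int) -> bool:
    """Model the fs.protected_regular check for an O_CREAT open of an
    existing regular file, per the kernel sysctl documentation.

    level 1: applies in world-writable sticky directories.
    level 2: additionally applies in group-writable sticky directories.
    Allowed if the opener owns the file or the file is owned by the
    directory owner; root gets no exemption.
    """
    if level == 0 or not dir_mode & stat.S_ISVTX:
        return False
    world_writable = bool(dir_mode & stat.S_IWOTH)
    group_writable = bool(dir_mode & stat.S_IWGRP)
    if not (world_writable or (level >= 2 and group_writable)):
        return False
    return file_uid != opener_uid and file_uid != dir_uid
```

For /tmp (mode 1777, owned by root), `protected_regular_denies(2, 0o1777, 0, 1000, 0)` is `True`, matching the `tee` error. `sysctl fs.protected_regular` shows the host's current level. As the next messages show, clearing the protected sysctls did not fix `docker exec`, so this rule explains the `tee` failure but not the tty one.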
[22:01] <ianw> well, setting all the "protected" sysctls to 0 on that host still doesn't make it work, so it might be a red herring
[22:01] <clarkb> that bug indicates that focal should have this behavior too. I wonder why we haven't seen it there? Though maybe we just haven't tried it
[22:01] <clarkb> ah ok
[22:02] *** undefined is now known as Guest145
[22:03] *** Guest145 is now known as rcastillo
[22:03] <fungi> that's... so strange
[22:04] <ianw> 28674 prctl(PR_SET_NAME, "runc:[2:INIT]" <unfinished ...> : it is runc that gets this error
[22:07] <ianw> looks related
[22:08] <ianw> The container boots, but
[22:08] <ianw> `machinectl login mycontainer` fails. The culprit is /dev/pts/ptmx with
[22:08] <ianw> 0000 perms.
[22:08] <ianw> $ ls -l /dev/pts/ptmx
[22:08] <ianw> c--------- 1 root root 5, 2 Aug  8 21:12 /dev/pts/ptmx
[22:12] <clarkb> I'm not sure I understand what that is trying to tell me but it does indicate the old system used more permissive perms
[22:13] <ianw> well the trace of this is /tmp/containerd.txt ; 28674 is the pid that fails
[22:14] <clarkb> the wait on alembic to show the db version is at head passed. I'm going to recheck it a couple of times just to see if we can catch that problem again
[23:16] <ianw> clarkb: https://5c3613e862b57d687223-af2016a5632320f910deb9dcbf495ac6.ssl.cf1.rackcdn.com/847204/9/check/system-config-run-gitea/eb70227/bridge.openstack.org/screenshots/gitea-project-system-config.png that seems like a new warning box on the screenshot
[23:16] <ianw> i think with the recent cert work, we should be able to hit that as opendev.org, i'll put it on the list
[23:17] <Clark[m]> ianw: yup it's a new gitea feature. I think it is nice in that it helps confirm we aren't screenshotting production somehow
[23:36] <opendevreview> Ian Wienand proposed opendev/system-config master: create-venv: add role; use in install-borg  https://review.opendev.org/c/opendev/system-config/+/852487
[23:36] <opendevreview> Ian Wienand proposed opendev/system-config master: system-config-run: bump base timeout to 3600  https://review.opendev.org/c/opendev/system-config/+/852479
[23:36] <opendevreview> Ian Wienand proposed opendev/system-config master: install-borg: update to borg 1.1.18  https://review.opendev.org/c/opendev/system-config/+/852488
[23:36] <opendevreview> Ian Wienand proposed opendev/system-config master: system-config-run-borg-backup: add jammy test host  https://review.opendev.org/c/opendev/system-config/+/852489
[23:56] <opendevreview> Ian Wienand proposed opendev/system-config master: gate-groups: remove old backup group  https://review.opendev.org/c/opendev/system-config/+/852684

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!