Tuesday, 2021-10-19

fungiyeah, i'm good either way00:00
ianwi don't feel i've been paying sufficient attention TBH00:19
fungiianw: so there was a race condition in change cache cleanup where a change could get enqueued while cleanup was underway and wind up with its entry removed if it had previously been present in the cache and was up for expiration in that pass00:48
fungithat resulted in the release-approvals pipeline getting blocked over the weekend00:48
fungiso we wanted to restart on that fix00:49
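Below is a minimal, hypothetical model of the race fungi describes. It is not Zuul's actual cache code: `ChangeCache`, `expired_keys` and the two removal methods are invented for illustration. Cleanup picks expiry candidates from a snapshot. A change is then re-enqueued before the deletions run, and the naive pass deletes the entry anyway. Re-checking each candidate at deletion time keeps it. In a threaded implementation, the re-check and delete would also need to run under the same lock that enqueue uses.

```python
from typing import Dict, List


class ChangeCache:
    """Hypothetical change cache keyed by change id, tracking last use time."""

    def __init__(self, max_age: float) -> None:
        self.max_age = max_age
        self.last_used: Dict[str, float] = {}

    def enqueue(self, key: str, now: float) -> None:
        # Enqueuing a change (re)uses its cache entry and refreshes its timestamp.
        self.last_used[key] = now

    def expired_keys(self, now: float) -> List[str]:
        # Cleanup phase 1: choose candidates from a snapshot of the timestamps.
        return [k for k, t in self.last_used.items() if now - t > self.max_age]

    def remove_naive(self, candidates: List[str]) -> None:
        # Buggy phase 2: trusts the stale snapshot and deletes unconditionally.
        for key in candidates:
            self.last_used.pop(key, None)

    def remove_rechecked(self, candidates: List[str], now: float) -> None:
        # Fixed phase 2: skip any candidate that was refreshed after the snapshot.
        for key in candidates:
            last = self.last_used.get(key)
            if last is not None and now - last > self.max_age:
                del self.last_used[key]


def survives_cleanup(recheck: bool) -> bool:
    """Replay the problematic interleaving deterministically."""
    cache = ChangeCache(max_age=100.0)
    cache.enqueue("change-1", now=0.0)
    candidates = cache.expired_keys(now=200.0)  # change-1 looks stale here...
    cache.enqueue("change-1", now=200.0)        # ...but is enqueued mid-cleanup
    if recheck:
        cache.remove_rechecked(candidates, now=200.0)
    else:
        cache.remove_naive(candidates)
    return "change-1" in cache.last_used
```

`survives_cleanup(False)` returns `False` (the freshly enqueued change loses its entry). `survives_cleanup(True)` returns `True`.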
fungiseparately there's a transitional situation where builds which are paused when the scheduler is restarted will end up perpetually running from the executor's perspective, and keep nodes locked indefinitely until the executor is restarted, so a full zuul restart would clean us up from the scheduler restart last week00:50
*** dviroel|rover|afk is now known as dviroel|out00:50
fungithere isn't a fix per se for that second problem, but it will in theory go away once everything is being tracked in zk00:51
fungithere's a third minor problem where project key deletion from zk left empty parent znodes behind, which was causing the key backup cronjob to emit errors after the rename maintenance last week, clarkb has a fix up for that00:52
fungiand we'll presumably have to manually clean up the ones we've got, i think00:53
fungithe fix there isn't urgent, we'll almost certainly have it in before the next time we need to rename any projects00:54
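For the manual cleanup, here is a sketch of how one might pick out the leftover empty parents. It takes every znode path under a root plus the set of znodes that still hold key data. Anything without a data-bearing descendant can go. Deletions run deepest first, so each delete hits a znode whose children are already gone. The `/keystorage/...` paths in the test are hypothetical, not Zuul's documented layout. Listing and deleting would be done with whatever ZooKeeper client you prefer.

```python
from typing import Iterable, List, Set


def removable_znodes(paths: Iterable[str], data_nodes: Set[str], root: str) -> List[str]:
    """Return znodes under root that have no data-bearing descendants, deepest first."""
    root_prefix = root.rstrip("/") + "/"

    def holds_data(node: str) -> bool:
        # A node is kept if it, or anything below it, still holds key data.
        prefix = node.rstrip("/") + "/"
        return node in data_nodes or any(d.startswith(prefix) for d in data_nodes)

    candidates = [p for p in set(paths) if p.startswith(root_prefix) and not holds_data(p)]
    # Deepest paths first, so a parent is only deleted after its children are gone.
    return sorted(candidates, key=lambda p: (-p.count("/"), p))
```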
Clark[m]The backups are working for the keys that do exist so that is not very urgent. Mostly just a make scary errors go away thing as they are noise00:58
fungiyeah, that too. it's really just noise01:00
Clark[m]And then I want to pull the Gerrit image for 3.3.7 build and restart to ensure the gerrit.config cleanups are happy. Also not urgent and can be done tomorrow01:02
fungisounds good, my ptg schedule tomorrow is mildly less jam-packed than it was today01:12
*** mazzy509 is now known as mazzy5001:23
*** ysandeep|out is now known as ysandeep05:02
*** ykarel_ is now known as ykarel05:20
opendevreviewSandeep Yadav proposed zuul/zuul-jobs master: multi-node-bridge: repos to install ovs in C9  https://review.opendev.org/c/zuul/zuul-jobs/+/81451606:12
opendevreviewSandeep Yadav proposed zuul/zuul-jobs master: multi-node-bridge: repos to install ovs in C9  https://review.opendev.org/c/zuul/zuul-jobs/+/81451606:20
*** ysandeep is now known as ysandeep|afk06:30
*** ysandeep|afk is now known as ysandeep|trng06:58
*** jpena|off is now known as jpena07:32
*** ykarel is now known as ykarel|lunch08:48
*** pojadhav|ruck is now known as pojadhav|lunch09:32
*** pojadhav|lunch is now known as pojadhav09:59
opendevreviewPierre Riteau proposed openstack/project-config master: [kolla] Preserve Backport-Candidate and Review-Priority scores  https://review.opendev.org/c/openstack/project-config/+/81454810:14
*** ykarel|lunch is now known as ykarel10:34
*** pojadhav is now known as pojadhav|ruck10:40
yoctozeptohas anyone tried creating a gerrit query that lists all changes *without* hashtags?11:04
yoctozeptomeaning having 0 hashtags11:04
*** dviroel|out is now known as dviroel|rover11:06
*** jpena is now known as jpena|lunch11:26
*** jpena|lunch is now known as jpena12:15
opendevreviewMerged zuul/zuul-jobs master: ensure-podman: support Debian bullseye  https://review.opendev.org/c/zuul/zuul-jobs/+/81408812:54
*** ysandeep|trng is now known as ysandeep12:56
fungiyoctozepto: i have not, but i'll readily admit i haven't messed around with hashtags much yet13:22
yoctozeptoack, thanks for responding13:23
fungiyoctozepto: did you try hashtag:'' (the hashtag parameter seems to suggest that it will only match one)13:26
opendevreviewMark Goddard proposed openstack/project-config master: kolla-cli: end gating for retirement  https://review.opendev.org/c/openstack/project-config/+/81458013:27
Clark[m]yoctozepto: fungi: newer Gerrit seems to have the inhashtag search operator. This should allow you to search for -inhashtag:"^.*" But our Gerrit isn't new enough. If you are consistent enough about hashtags you can negate a specific list of them instead.13:35
fungiinhashtag is added in 3.4?13:36
Clark[m]I'm not sure when it got added. Gerrit upstream documents it currently but our Gerrit reports it isn't valid13:37
Clark[m]Might be a 3.5 feature13:37
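Until `inhashtag` is available, the workaround Clark describes (negating a known list of hashtags) can be generated. This is a small sketch. `without_hashtags_query` and `search_url` are hypothetical helpers, and the `status:open` default is an assumption. The query only excludes the hashtags you list, so changes carrying an unlisted hashtag will still show up.

```python
from typing import Iterable
from urllib.parse import quote


def without_hashtags_query(known_hashtags: Iterable[str], base: str = "status:open") -> str:
    """Build a Gerrit search that excludes each listed hashtag via negation."""
    terms = [base] + [f'-hashtag:"{tag}"' for tag in sorted(set(known_hashtags))]
    return " ".join(terms)


def search_url(server: str, query: str) -> str:
    # The same query syntax can be passed to Gerrit's REST endpoint /changes/?q=...
    return f"{server.rstrip('/')}/changes/?q={quote(query)}"
```

For example, `without_hashtags_query(["bug", "rfe"])` yields `status:open -hashtag:"bug" -hashtag:"rfe"`, which can be pasted into the web UI search box.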
yoctozeptoClark[m]: ah, thanks13:50
*** lbragstad_ is now known as lbragstad14:15
*** ykarel_ is now known as ykarel14:22
opendevreviewMark Goddard proposed openstack/project-config master: kolla-cli: enter retirement  https://review.opendev.org/c/openstack/project-config/+/81459714:48
opendevreviewMark Goddard proposed openstack/project-config master: kolla-cli: end gating for retirement  https://review.opendev.org/c/openstack/project-config/+/81458015:05
opendevreviewMark Goddard proposed openstack/project-config master: kolla-cli: enter retirement  https://review.opendev.org/c/openstack/project-config/+/81459715:05
opendevreviewThiago Paiva Brito proposed openstack/project-config master: Adding gerritreview messages to starlingx channel  https://review.opendev.org/c/openstack/project-config/+/81460015:06
opendevreviewThiago Paiva Brito proposed openstack/project-config master: Adding gerritreview messages to starlingx channel  https://review.opendev.org/c/openstack/project-config/+/81460015:30
opendevreviewShnaidman Sagi (Sergey) proposed zuul/zuul-jobs master: Print version of installed podman  https://review.opendev.org/c/zuul/zuul-jobs/+/81460415:44
opendevreviewThiago Paiva Brito proposed openstack/project-config master: Adding gerritreview messages to starlingx channel  https://review.opendev.org/c/openstack/project-config/+/81460015:47
opendevreviewAlfredo Moralejo proposed openstack/diskimage-builder master: [WIP] Add support for CentOS Stream 9 in DIB  https://review.opendev.org/c/openstack/diskimage-builder/+/81139216:01
opendevreviewAlfredo Moralejo proposed openstack/diskimage-builder master: [WIP] Add support for CentOS Stream 9 in DIB  https://review.opendev.org/c/openstack/diskimage-builder/+/81139216:13
clarkbI'm working on fixing up my zuul change for key deletion, but then I'm free to do zuul + gerrit restarts until the infra meeting and free again after that (though a bike ride this afternoon would be good too before the rain returns)16:24
fungisame, the last ptg session in this block ends in half an hour and then i can help with restarts/reviews16:24
*** marios is now known as marios|out16:25
opendevreviewMerged openstack/project-config master: Adding gerritreview messages to starlingx channel  https://review.opendev.org/c/openstack/project-config/+/81460016:25
*** pojadhav|ruck is now known as pojadhav|out16:26
*** jpena is now known as jpena|off16:27
*** ysandeep is now known as ysandeep|out16:29
corvusclarkb: i guess we don't need to wait on your change for the restart...?  it won't fix the current situation (and we'll need to delete those znodes manually if not done already)16:31
corvusfungi, clarkb: are we doing a synchronized zuul+gerrit restart?16:32
corvusassuming "yes" to all the above, seems like aiming for 1700utc is the way to go16:32
fungicorvus: yes, we have gerrit things we want included in a restart16:33
fungii'm up for a 17:00 utc restart of both together16:33
corvuskk see you then16:33
fungiclarkb: you're okay to do that in ~25 minutes?16:34
fungii'll give #openstack-release a heads up16:34
clarkbcorvus: correct don't need to wait for my change16:34
clarkb17:00 UTC wfm16:34
clarkbfungi: should we go ahead and do a docker-compose pull on review?16:46
fungiyeah, i can do that now16:49
fungiit's running in a root screen session there now16:50
clarkbcool docker image list shows the new image now16:51
fungiopendevorg/gerrit   <none>    a7c2687bb510   3 weeks ago    793MB16:51
fungithat one?16:51
clarkbopendevorg/gerrit   3.3       33d6300c73ad   24 hours ago   795MB that one16:53
fungioh, yep, screen within tmux, i scrolled the wrong one ;)16:54
fungiwe have more than a terminal's worth of images listed now16:54
clarkbWhen we restart we'll have caught up to wikimedia in terms of gerrit version :)16:54
fungiuntil they upgrade again16:54
fungialso didn't they replace gerrit with phabricator forever ago?16:55
clarkbha yes. Though I'm hoping to get the 3.4 upgrade done by the end of the year16:55
clarkbfungi: I think replacing gerrit has become more difficult than they anticipated (I suspect they have gerrit fans)16:55
fungistatus notice Both Gerrit and Zuul services are being restarted briefly for minor updates, and should return to service momentarily; all previously running builds will be reenqueued once Zuul is fully started again16:59
fungishould i send that? i copied it from the last one we did16:59
fungi#status notice Both Gerrit and Zuul services are being restarted briefly for minor updates, and should return to service momentarily; all previously running builds will be reenqueued once Zuul is fully started again16:59
opendevstatusfungi: sending notice16:59
-opendevstatus- NOTICE: Both Gerrit and Zuul services are being restarted briefly for minor updates, and should return to service momentarily; all previously running builds will be reenqueued once Zuul is fully started again16:59
corvusi ran zuul_pull just in case (but i'm pretty sure all the images were already there)16:59
fungiawesome, i'm ready for the gerrit restart once zuul is down16:59
clarkbfungi: I'm attached to the screen now too and will follow along17:00
corvusshall i stop zuul now?17:00
clarkbI'm ready17:01
fungicorvus: go for it17:01
corvusscheduler is stopped; feel free to proceed17:01
fungigerrit is stopping17:01
corvus(waiting for okay to start zuul)17:01
fungistarting gerrit again now17:02
fungi[2021-10-19T17:02:23.083Z] [main] INFO  com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.3.7-2-g17936a0b79-dirty ready17:02
fungii can pull up the webui too17:02
fungiclarkb: lgty?17:03
clarkbweb ui is working for me17:03
clarkbya I think its happy. reports 3.3.7 as the version too17:03
fungicorvus: go for starting zuul when you're ready17:03
corvusstarting zuul17:03
clarkbre the gerrit.config updates we made the biggest thing I was worried about was change screen and theming since we removed the old theming config and removed the default change screen config17:04
clarkbbut neither is a thing in gerrit anymore, so it would take a weird interaction (and a gerrit bug) for there to be any problems17:04
clarkbtheme and change screen look as expected to me17:04
fungibefore the restart i was seeing weird fonts in the unified diff view, looks like it's still happening too17:05
fungisame in side-by-side actually17:05
clarkbfungi: are you maybe filtering web fonts and then falling back to whatever is in your browser? I don't have this issue, but I don't think I'm filtering any web fonts17:06
fungiprobably something weird with my browser settings, but all the lines of code are rendered double-size but with single size spacing17:06
clarkbhuh ya that isn't a problem for me. Let me double check in a different browser17:06
fungineither privacy badger nor ddg privacy essentials indicate anything is being blocked17:07
clarkbya I can't reproduce17:07
corvuslooks fine for me.  i tried zooming in and out and font size+leading seem to go hand-in-hand so it looks good at all zooms17:08
fungiit might be that newest firefox is assuming a smarter window manager than i'm using17:11
corvusi blame rust17:12
fungiwhen i try to use its "take a screenshot" feature it complains about a background error for "getZoomFactor is not defined"17:12
clarkbare you on wayland?17:12
funginope, zorg with ratpoison17:13
fungier, xorg17:13
clarkbfractional scaling and all that is a big deal with wayland17:13
fungilibwayland is installed, but probably just because some things link it17:14
corvus("zorg here!")17:14
corvustenants loaded17:15
corvusstarting re-enqueueeueueuing17:15
clarkbI'm glad others find that word hard to type17:16
corvusperhaps we should have the zuul cli accept "enq[ue]*"17:17
fungiwe could even just set up a conditional ladder which checks whether each of the available subcommands .contains() sys.argv[1]17:18
fungielif "enqueue".contains(sys.argv[1]): ...17:19
fungibasically allow all subcommands to be arbitrarily shortened17:19
corvusyeah, i like programs that do that17:20
fungimore magic could be added to identify ambiguous abbrevs17:20
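A sketch of the abbreviated-subcommand idea with the ambiguity check. Python strings have no `.contains()` method; a substring test is spelled `sys.argv[1] in "enqueue"`. A prefix match with `str.startswith` is stricter, because a substring test would let "que" match both enqueue and dequeue. `resolve_subcommand` and the command list in the test are hypothetical. Note that argparse's `allow_abbrev` covers long options rather than subcommand names.

```python
from typing import Iterable


class AmbiguousCommand(ValueError):
    """Raised when an abbreviation matches more than one subcommand."""


def resolve_subcommand(word: str, commands: Iterable[str]) -> str:
    """Expand an abbreviated subcommand, refusing ambiguous abbreviations."""
    commands = list(commands)
    if word in commands:  # an exact match always wins, even if it prefixes others
        return word
    matches = [c for c in commands if c.startswith(word)]
    if not matches:
        raise ValueError(f"unknown command: {word!r}")
    if len(matches) > 1:
        raise AmbiguousCommand(f"{word!r} could be: {', '.join(sorted(matches))}")
    return matches[0]
```

The exact-match rule matters when one command name prefixes another (here `enqueue` and `enqueue-ref`). Without it, the full name `enqueue` itself would be reported as ambiguous.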
clarkb| 0026931648 | rax-dfw             | ubuntu-focal              | c1e73330-d61f-423b-bc32-e4460f15ee56 |  |                                         | used     | 05:07:20:22  | locked   | still shows up in a nodepool listing but the other 59 in-use nodes that got stuck seem to be gone17:20
clarkbI suppose ^ may be a different issue?17:20
fungiclarkb: held?17:21
fungidoes --detail give you anything else useful?17:21
clarkbfungi: no it is "used". There is a held node from the timeframe too but thats fine17:21
fungioh, yeah, it wouldn't be used in that case17:21
clarkb--detail doesn't give any additional useful info17:22
corvusit doesn't say who locked it?  i thought it did17:23
clarkboh is that what nl01.opendev.org-PoolWorker.rax-dfw-main-23ef88ea5474439dac253fa13c63d4f7 is?17:24
clarkbmaybe we can restart the launcher and the lock will go away then it can retry deleting it?17:24
clarkbLauncher is the column id for that value17:25
corvusyeah.  a sigusr2 thread dump might illustrate why nl01 is sitting on a locked used node and not doing anything with it17:25
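The SIGUSR2 thread dump corvus mentions depends on a long-running Python service registering a signal handler that prints every thread's stack. The sketch below shows that generic technique; it is not nodepool's actual handler, and the function names are made up. It is POSIX-only, since Windows has no SIGUSR2. Labelling each stack with its thread name is what lets you search a dump for a provider or node identifier.

```python
import signal
import sys
import threading
import traceback


def format_thread_stacks() -> str:
    """Render every thread's current stack, labelled with the thread's name."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append(f"Thread: {names.get(ident, 'unknown')} ({ident})")
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


def install_stack_dump_handler() -> None:
    # Print all thread stacks to stderr whenever the process receives SIGUSR2.
    def handler(signum, frame):
        print(format_thread_stacks(), file=sys.stderr)

    signal.signal(signal.SIGUSR2, handler)
```

Once the handler is installed, `kill -USR2 <pid>` triggers a dump without stopping the service.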
corvusoh then that may not be the lock holder17:26
clarkbya --detail's column header doesn't seem to identify the lock holder17:26
corvusbummer, i thought we had that :/17:27
corvusre-enqueue complete17:27
corvusat worst, we can inspect zk, but that's probably something we should expose thru the cli17:27
clarkbI need to figure out food as I've somehow neglected to eat anything today. Back in a bit17:29
fungithis is a screen capture of what the gerrit diff view has been doing in ff for me: http://fungi.yuggoth.org/tmp/ff.png17:36
fungicomparing the same url, chromium looks fine17:37
fungii closed out the screen session on review just now, btw, seems like the restart was a success17:48
clarkbfungi: could it be the dark theme?17:49
fungii'll try switching it up17:50
fungiclarkb: aha, it's at least *something* to do with my account prefs, because if i load it in another container tab not authenticated, i get a reasonable looking diff17:54
fungifound it!17:56
fungiapparently changing "font size" in the diff view doesn't change the space between the lines? it was for some reason set to 24 in my preferences, dropping it to 12 seems to have fixed things17:57
clarkbthat is similar to the etherpad issue where the tops of subsequent lines get chopped off17:58
clarkbcorvus: I'm looking at the lock contents in zk for that node. Is the uuid looking thing at the front of the path identifying a connection?18:01
clarkbhrm no it seems zk identifies connections with a session id which is a different type of value. Any idea how to map the lock to the connection?18:05
corvusif the lock doesn't have an id, then you may be able to map it to a session, then find an ephemeral node owned by the same session that identifies it.18:06
clarkboh right there is a way to list ephemeral nodes by sessions iirc and the locks are ephemeral?18:07
clarkbdump on the leader18:07
clarkbzk05 is the leader18:07
clarkbcorvus: nl01 is holding the lock. I ran `dump` on the leader, found the session id associated with the lock, then listed sessions with `cons`, which gave me an ip address that dig -x resolved to nl0118:10
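clarkb's manual lock-to-host mapping can be scripted. The steps are: `dump` on the leader to map an ephemeral znode to its session, `cons` to map that session to a client address, then a reverse lookup. This sketch rests on three assumptions:

- the four-letter-word commands are permitted by `4lw.commands.whitelist`;
- the client port accepts plaintext connections (TLS-only clusters will not answer like this);
- the `dump` and `cons` output resembles the sample strings used to test it.

Treat the parsing as best effort and check it against your ZooKeeper version's actual output.

```python
import re
import socket
from typing import Optional


def four_letter_word(host: str, cmd: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter-word command (e.g. 'dump', 'cons') and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")


def session_for_path(dump_output: str, path_prefix: str) -> Optional[str]:
    """Find the session id that owns an ephemeral znode under path_prefix (assumed format)."""
    session = None
    for line in dump_output.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"0x[0-9a-fA-F]+:?", stripped):
            session = stripped.rstrip(":")  # a session header line
        elif session and stripped.startswith(path_prefix):
            return session
    return None


def client_for_session(cons_output: str, session_id: str) -> Optional[str]:
    """Return the client IP from the `cons` line whose sid matches (assumed format)."""
    want = int(session_id, 16)
    for line in cons_output.splitlines():
        m = re.search(r"^\s*/?([^\[\s]+):\d+\[.*?sid=(0x[0-9a-fA-F]+)", line)
        if m and int(m.group(2), 16) == want:
            return m.group(1)
    return None
```

`socket.gethostbyaddr(ip)[0]` then plays the role of `dig -x`.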
clarkbso now I guess we sigusr2 on nl01 and see what might be holding up the used node deletion18:11
funginl01 is also the launcher responsible for that node, so i guess it makes sense18:12
clarkbneither 0026931648 nor c1e73330-d61f-423b-bc32-e4460f15ee56 show up in the thread dump. Launcher threads seem to use the nodepool id to name the thread and deleter threads the node uuid18:15
clarkbtwo threads match rax-dfw: a delete thread for a different uuid and the poolworker for rax-dfw18:16
clarkbcorvus: the last thing logged by the launcher is that it unlocked the node and it is ready18:20
clarkbI'm wondering if we somehow lost the lock during the zk problems18:21
clarkblike zuul ran the job and set it to used, then nl01 grabs the lock and immediately after the zk problems occur causing nl01 to lose track18:21
clarkband that happens before nl01 can log anything about doing cleanup of the used node18:21
clarkbI suspect that restarting the launcher on nl01 will correct this. But I still can't find an indication for how this started18:25
kopecmartinhi all, how can we publish project documentation to docs.opendev.org? we tried opendev-promote-docs and also promote-tox-docs-infra but we still can't see the docs at https://docs.opendev.org/opendev/19:20
kopecmartinI made a silly mistake somewhere 19:20
kopecmartinfor reference: https://review.opendev.org/c/openinfra/refstack/+/81463519:21
clarkbkopecmartin: I would start by looking at the job logs for the jobs that ran already19:21
clarkbhttps://static.opendev.org/docs/refstack/latest/ does seem to be updating so it may just be a matter of vhost stuff properly serving it?19:22
kopecmartinah, yeah, that would explain it 19:23
kopecmartinclarkb: thanks!19:23
clarkbkopecmartin: https://docs.opendev.org/openinfra/refstack/latest/ there you go19:24
kopecmartinnice, now i wonder whether it was done by promote-tox-docs-infra or opendev-promote-docs19:24
kopecmartini'm gonna check the logs19:24
fungiyes, project docs are namespaced on docs.opendev.org since we aim to publish documentation for multiple communities19:25
fungiinstead of writing to docs/refstack the job should be configured to publish to docs/openinfra/refstack19:26
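Following fungi's note, here is a tiny sketch of how a namespaced project maps to a publish path and a public URL. The `/latest/` suffix comes from the URL clarkb linked. `docs_locations` is a hypothetical helper, not part of any OpenDev job.

```python
from typing import Dict


def docs_locations(project: str, version: str = "latest") -> Dict[str, str]:
    """Map an 'org/name' project to its docs publish path and public URL."""
    namespace, sep, name = project.partition("/")
    if not sep or not namespace or not name:
        raise ValueError(f"expected 'namespace/name', got {project!r}")
    return {
        # Publish under the namespace, e.g. docs/openinfra/refstack, not docs/refstack.
        "publish_path": f"docs/{namespace}/{name}",
        "url": f"https://docs.opendev.org/{namespace}/{name}/{version}/",
    }
```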
clarkbianw: I think the whole dib stack should be approved now20:13
ianwclarkb: thanks!  i'll keep an eye on it all20:14
opendevreviewDouglas Viroel proposed zuul/zuul-jobs master: Add FIPS enable multinode job definition  https://review.opendev.org/c/zuul/zuul-jobs/+/81325320:21
clarkbianw: the whole stack just failed on a tox py35 failure20:29
clarkbfungi: ^ we broke xenial jobs with the bindep release20:29
clarkbfungi: we need a packaging pin for python3.520:29
opendevreviewClark Boylan proposed opendev/bindep master: Add old python packaging pin  https://review.opendev.org/c/opendev/bindep/+/81464720:32
clarkbsomething like ^ then a release?20:32
clarkbdib can also probably drop python3.5 testing?20:33
clarkbianw: ^ that might be quicker.20:33
fungiaha, we can't install latest packaging on xenial? makes sense20:46
fungithough surprised bindep's xenial job didn't catch it20:46
clarkbya I don't understand how it got through but if you look at the error at https://zuul.opendev.org/t/openstack/build/cae47da97cff44c8a855f30378634ee1/log/job-output.txt packaging complains about invalid python and their changelog says 20.9 was the last python3.5 capable release20:49
clarkbI'm going to get a bike ride in now before the rain arrives tomorrow but can dig more after if we want to fully understand that20:49
fungiand yeah, 21.0 was tagged back at the beginning of july20:50
fungihave a good ride, i'll take a closer look after dinner20:50
ianwsorry, back, looking20:53
*** dviroel|rover is now known as dviroel|rover|afk20:59
*** avass[m] is now known as AlbinVass[m]21:00
ianwi guess it's just not a path covered by the tox run21:03
ianwthat's not it.  my tox install chose packaging (20.9)21:12
ianw(tox py35)21:12
ianwseemingly so did the bindep gate tests21:12
fungixenial's default pip version is too old to support python_requires metadata in packages21:14
fungiso it probably only failed in jobs which are not using new pip21:14
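A toy model of why only old-pip jobs broke. pip 9 and later honor a release's Requires-Python metadata, while xenial's pip 8.1.1 ignores it and simply takes the newest release. The release list is trimmed and simplified to a minimum Python version, based on the changelog note that 20.9 was the last packaging release to support 3.5. `pick_release` is hypothetical.

```python
from typing import List, Optional, Tuple

# Simplified (version, minimum supported Python) pairs for packaging releases.
RELEASES: List[Tuple[Tuple[int, int], Tuple[int, int]]] = [
    ((20, 8), (2, 7)),
    ((20, 9), (2, 7)),
    ((21, 0), (3, 6)),
]


def pick_release(python: Tuple[int, int], honors_requires_python: bool) -> Optional[Tuple[int, int]]:
    """Pick the newest release an installer would choose for this interpreter."""
    candidates = [
        version
        for version, min_python in RELEASES
        # An installer that ignores Requires-Python considers every release.
        if not honors_requires_python or python >= min_python
    ]
    return max(candidates) if candidates else None
```

On Python 3.5, an old pip picks 21.0, which then fails at import time, while a metadata-aware pip settles on 20.9.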
ianwhrm, it looks like the bindep gate uses xenial for 3.5 -> https://zuul.opendev.org/t/opendev/build/e613c1b0042549c59d07825d97b5ff05/logs21:19
ianwbut that's not "openstack-tox-py35", it's "tox-py35"21:20
ianwi wonder if that's doing some pip upgrades in the tox env21:20
fungithe tox logs should say21:23
ianwoh, i think what has happened here is that the bindep jobs run "ensure-pip"21:24
ianwbindep -> https://zuul.opendev.org/t/opendev/build/e613c1b0042549c59d07825d97b5ff05/console21:25
ianwdib -> https://zuul.opendev.org/t/openstack/build/cae47da97cff44c8a855f30378634ee1/console21:25
fungiyup, that'll do it21:27
fungihuh, the fix failed tox-py35 on this: https://zuul.opendev.org/t/opendev/build/1f00b3e2c8a749eca74ee50a7cc17d44/console#1/0/16/ubuntu-xenial21:37
ianwwhy does openstack-tox-py35 run zuul-jobs/playbooks/tox/post.yaml but not zuul-jobs/playbooks/tox/pre.yaml ?21:37
ianwtox-py35 really should have run playbooks/tox/pre.yaml, right?  from https://opendev.org/zuul/zuul-jobs/src/branch/master/zuul.d/python-jobs.yaml#L4221:40
fungialso someone just e-mailed openstack-discuss asking for help logging into their gerrit account, seems like it might be another case of a duplicate resulting from an address change in ubuntuone21:44
ianwit really does seem to me that openstack-tox-py35 should ultimately parent to "tox", which should have a pre-run.yaml step that runs ensure-tox, which will run ensure-pip, which will upgrade things21:49
ianwis it possible the dib problem and the failing bindep fix both stem from some other root cause relating to pip not upgrading?21:50
ianwpossibly the zuul restart ~ 5 hours ago ... ?21:52
fungisomething that's causing some playbooks to no longer be run?21:54
ianwit seems unlikely looking at recent changes ... but i am struggling to see why that playbook wouldn't run21:55
ianwi mean compare console of21:57
ianwhttps://zuul.opendev.org/t/opendev/build/7966ab9ee15f4f3e8460b23652cfddc5/console (tox-py35, earlier run)21:58
ianwhttps://zuul.opendev.org/t/opendev/build/1f00b3e2c8a749eca74ee50a7cc17d44/console (tox-py35, failing run now)21:58
ianwit's actually missing "tox/pre.yaml" and "tox/run.yaml" ... ?21:59
ianwohh, i see: pre/unittests.yaml is the one that's failing.  further bits are skipped22:01
ianwoohhh, i further see -- the broken bindep has broken the bindep testing22:02
ianwok ... soooo ...22:14
ianwwe have created the on-image bindep 2.10.0 @ https://nb02.opendev.org/ubuntu-xenial-0000210004.log22:14
ianwfor some reason, this has created /usr/bindep-env/ with "2021-10-19 14:22:49.157 | You are using pip version 8.1.1, however version 21.3 is available."22:15
ianwthis has pulled packaging 21 into this venv (incorrectly)22:16
ianwbindep uses this to setup the tox environment to run the bindep tests, hence the recent explosion22:17
opendevreviewIan Wienand proposed openstack/project-config master: infra-package-needs: install latest pip  https://review.opendev.org/c/openstack/project-config/+/81467722:20
ianwactually we should probably do that in all venvs we prime on the images22:22
opendevreviewIan Wienand proposed openstack/project-config master: infra-package-needs: install latest pip  https://review.opendev.org/c/openstack/project-config/+/81467722:22
ianwsince that dib stack got -2'd anyway, i could rebase that on a job to remove py35 testing which would be a workaround for dib, for now22:35
opendevreviewIan Wienand proposed openstack/diskimage-builder master: epel: match replacement better  https://review.opendev.org/c/openstack/diskimage-builder/+/81392222:39
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Revert "Allowing ubuntu element use local image"  https://review.opendev.org/c/openstack/diskimage-builder/+/81409422:39
opendevreviewIan Wienand proposed openstack/diskimage-builder master: ubuntu-systemd-container: deprecate and remove jobs  https://review.opendev.org/c/openstack/diskimage-builder/+/81406822:39
opendevreviewIan Wienand proposed openstack/diskimage-builder master: ubuntu: add Focal test  https://review.opendev.org/c/openstack/diskimage-builder/+/81407222:40
opendevreviewIan Wienand proposed openstack/diskimage-builder master: functests: drop apt-sources  https://review.opendev.org/c/openstack/diskimage-builder/+/81407422:40
opendevreviewIan Wienand proposed openstack/diskimage-builder master: centos7 : drop functional testing  https://review.opendev.org/c/openstack/diskimage-builder/+/81407522:40
opendevreviewIan Wienand proposed openstack/diskimage-builder master: functests: drop minimal tests in the gate  https://review.opendev.org/c/openstack/diskimage-builder/+/81407822:40
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Remove extras job, put gentoo job in gate  https://review.opendev.org/c/openstack/diskimage-builder/+/81407922:40
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Simplify functests job  https://review.opendev.org/c/openstack/diskimage-builder/+/81408022:40
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Run functional tests on Debian Bullseye  https://review.opendev.org/c/openstack/diskimage-builder/+/81408122:40
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Update centos element for 9-stream  https://review.opendev.org/c/openstack/diskimage-builder/+/80681922:40
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Remove py35 tox jobs  https://review.opendev.org/c/openstack/diskimage-builder/+/81468022:40
clarkbfungi: oh cool you figured it out. I had a eureka moment on my bike ride. I think that when we build our images we do so with old pip but when we run tox we do so with newer pip and that made the bindep jobs pass22:40
clarkbfungi: I think my change is correct given that22:40
clarkbnow why did it retry limit in the gate on tox py3522:41
clarkbits a chicken and egg issue I think22:41
clarkbfungi: maybe we manually test it in a docker container and if that works force merge then do a release?22:42
ianwclarkb: yeah, its actually the pip in "python -m venv" -- which must be vendored?  it's 8 and even the system one is 922:42
ianwclarkb: i think we just need to rebuild images with https://review.opendev.org/c/openstack/project-config/+/81467722:42
clarkbianw: iirc we override it in zuul jobs to be 9 out of our ppa but in dib builds we dont do that and get 822:42
clarkbah yup I think that will do it too22:43
clarkbhowever it may struggle to update to latest pip for the same reason22:43
clarkbwe might have to do it in two passes. pip install -U pip<someknowngoodver && pip install -U pip22:44
clarkbianw: ^ I expect that will be necessary22:44
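A sketch of the two-pass upgrade as a helper for Python 3.5 environments like xenial's:

1. pip 8.x ignores Requires-Python, so the first pass caps pip below 21, the first release that dropped 3.5.
2. The second pass runs with a pip that honors the metadata, so it stops at the newest compatible release.

`pip_upgrade_steps` and `run_steps` are hypothetical helpers. They target whatever interpreter runs the script, via `sys.executable`.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def pip_upgrade_steps(python: Tuple[int, int]) -> List[List[str]]:
    """Return the pip upgrade commands to run for a given interpreter version."""
    pip = [sys.executable, "-m", "pip", "install", "-U"]
    steps = []
    if python < (3, 6):
        # Old pip ignores Requires-Python, so cap the first hop at a release
        # that still runs on 3.5 and does understand that metadata.
        steps.append(pip + ["pip<21"])
    # A Requires-Python-aware pip now picks the newest release that fits.
    steps.append(pip + ["pip"])
    return steps


def run_steps(steps: Sequence[Sequence[str]]) -> None:
    # Stop at the first failing command rather than continuing with a broken pip.
    for cmd in steps:
        subprocess.check_call(list(cmd))
```

For example, `run_steps(pip_upgrade_steps(sys.version_info[:2]))` would perform the upgrade in place. It is left out of the test because it modifies the environment.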
corvusclarkb, fungi, ianw, mordred: i'm starting to think that zuul may need larger test nodes to run its unit tests.22:44
corvushave we (opendev) thought about expanding the options for test node sizes?22:45
clarkbcorvus: we do have larger labels available. Maybe give them a go and see if it helps?22:45
ianwclarkb: i'm not sure i remember why updating from 8 failed?  22:45
clarkbcorvus: we already did that a couple of years ago, it's just that the availability of those nodes is more limited22:45
corvusare they multi-region?  i thought maybe it was just one region for one project22:45
fungiand if it does, we can think about how we might roll those out more broadly22:45
clarkbcorvus: they are multiregion. airship and vexxhost iirc.22:45
fungithey're in at least two regions22:45
clarkbcorvus: yes22:46
clarkband when we had the donnyd basement cloud we had them there too22:47
ianwclarkb: i think we only upgraded to 9 because of mirror issues, which wouldn't affect the dib build https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-pip/tasks/xenial.yaml#L122:47
ianw(sorry i know we have two conversations going on :)22:47
clarkbianw: but 8 doesn't do the package metadata which is necessary to install the newest pip that supports python3.5 I think22:47
corvus[as an aside, i think 2 things are at play: 1) zuul and openstack have different constraints/goals/etc; 2) even in openstack's case, it's probably reasonable to reconsider whether "standard issue laptop from 12 years ago" is still the right baseline for unit tests :) ]22:47
clarkbianw: ya pip 21 and newer doesn't support python3.522:48
clarkbcorvus: I think the broader issue is that getting these resources to fit into clouds is difficult. The bigger nodes we do the less throughput we can offer22:48
clarkband that has detrimental effects in other ways22:48
fungiwell, not entirely true if the jobs run faster22:49
clarkbfungi: for most of our workload (openstack) we are cpu constrained22:49
corvusclarkb: ++ especially if the cloud tenant is designed specifically for one flavor22:49
clarkbfungi: and usually we can get more memory but not more cpu22:49
fungiclarkb: gotta wonder what percentage of overall node-hours are consumed by dog-slow jobs grinding in swap thrash though22:49
fungithose could significantly skew the overall usage22:50
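The throughput trade-off is simple arithmetic: the number of nodes a cloud quota can hold is capped by whichever resource runs out first. The quota figures in the test are invented for illustration, not any provider's real numbers. They show clarkb's point: when a tenant is CPU-bound, doubling RAM per node costs no capacity until RAM becomes the binding limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    vcpus: int
    ram_gb: int


def nodes_that_fit(quota_vcpus: int, quota_ram_gb: int, flavor: Flavor) -> int:
    """How many instances of a flavor fit in a quota; the scarcer resource wins."""
    return min(quota_vcpus // flavor.vcpus, quota_ram_gb // flavor.ram_gb)
```

With a hypothetical 800 vCPU / 1600 GB quota, 8 vCPU nodes yield 100 slots at either 8 GB or 16 GB, but only 50 at 32 GB.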
corvusi feel like zuul-web is insufficiently responsive22:50
clarkbya thats fair its possible we could see cpu be freed for real work22:50
corvusmaybe because zuul-scheduler is busy?22:51
corvusyep, it's better now.  nm.22:51
clarkbcorvus: ya I agree it loads status but very slowly22:51
corvusi think it's building a bunch of layouts22:51
clarkbcorvus: re zuul specifically are you finding that it is memory constrained or cpu or both? or maybe disk?22:51
fungiit would be an interesting experiment to switch to 16gb flavors everywhere, but we'd probably be unable to roll it back since a bunch of projects would unknowingly merge changes which consumed a lot more ram22:51
clarkbdisk is probably the most difficult of the bunch to address22:51
corvusis ubuntu-bionic-32GB what i'm looking for?22:52
fungicorvus: we should probably add a focal version of those too22:52
clarkbianw: I just confirmed on an ubuntu xenial container that we'll have to update pip in two steps22:53
ianwclarkb: yep, me too :)  just fixing, a great observation :)22:53
clarkbcorvus: I think the flavors are called -expanded22:53
clarkbfor 16gb and then there is a 32gb flavor which is also available22:53
clarkbmight be good to check against both?22:53
corvusyep.  -32GB is only 1 region22:54
corvusoh wait, ubuntu-bionic-expanded is also only 1 region22:54
clarkbhrm when we did the vexxhost stuff did we not add them to the existing pools /me looks22:54
corvusthere's a ubuntu-bionic-expanded-vexxhost22:55
clarkbya ok so we did split them up like that hrm22:55
clarkbI think normalizing that better is a reasonable thing to do22:55
corvusbut it's also one region.  so maybe there's confusion thinking that ubuntu-bionic-expanded is 2x, but really it's ubuntu-bionic-expanded x1 and ubuntu-bionic-expanded-vexxhost x122:55
clarkbcorvus: yup exactly. I think we could do two regions but haven't22:55
corvusokay.  i'll come up with something.  gimme a few mins22:55
opendevreviewIan Wienand proposed openstack/project-config master: infra-package-needs: install latest pip  https://review.opendev.org/c/openstack/project-config/+/81467722:56
clarkbianw: pip install -U pip<21 && pip install -U pip?22:56
ianwi can make it more like that if you like22:56
clarkbianw: no your change is fine. I did suggest maybe using <21 in the xenial case but I seriously doubt we'll get a new 20.x release22:57
corvuswhat's the purpose of ubuntu-bionic-vexxhost ?  it's just an 8g node; seems the same as ubuntu-bionic22:58
ianwcorvus: i have some feeling that may be for kvm nesting?22:59
corvusoh, like it's just "get me a bionic node on vexxhost because they have kvm"?22:59
corvusnested virt22:59
clarkbcorvus: I think thats actually the big memory flavor with 8vcpus not 8gb memory23:00
clarkbcorvus: you should double check with the cloud23:00
clarkbthis really could use some normalizing and maybe comments to explain the different cloud flavors, since their choices don't necessarily mimic our naming scheme23:01
clarkbnested-virt-ubuntu-focal <- that might actually be a big memory server23:01
clarkbah but it isn't in other clouds so that would be the issue. We want a new label using that flavor name and the -expanded theme on our side23:01
fungiyes, part of why we switched vexxhost nodes out of our normal pool is their 8vcpu flavor switched to coming with 32gb ram, and then a zuul change was merged after passing testing on one of those which used more memory causing it to no longer work on 8gb nodes, so we isolated them to nonstandard labels23:02
corvus(was zuul-operator, but yeah)23:03
fungiahh, yep sorry23:03
corvusand yeah, v3-standard-8 seems to be 8cpu 32gb ram23:03
clarkbfungi: running https://review.opendev.org/c/opendev/bindep/+/814647 locally in a xenial container works and the non py35 jobs passed on that change. I think we can force merge and make a release of that23:04
clarkbfungi: but I'll defer to you on that since I wrote the change23:04
fungivexxhost's cpu:ram ratio on their hardware is apparently ~1cpu:1gb, so they wanted to align their flavors to better fit the systems23:04
fungiclarkb: so you're sure >3.5 is correct and doesn't need to be >=3.6 instead?23:05
fungi(per my inline comment on it)23:05
clarkbit seems to have worked but that is a good point. Let me update it so that we don't have human confusion at least23:06
fungii thought we'd used >= elsewhere over concerns that >3.5 would still match 3.5.x versions23:06
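fungi's concern is worth spelling out. In PEP 508 environment markers, `python_version` is only the major.minor string (for example `3.5`), while `python_full_version` includes the micro release (for example `3.5.2`). So `python_version > '3.5'` should not match a 3.5.x interpreter, but `python_full_version > '3.5'` would, which is one reason `>= '3.6'` is the less ambiguous spelling. The sketch below is a rough, stdlib-only approximation written to illustrate that difference; it is not the `packaging` library's implementation, and the helper names are hypothetical.

```python
import operator

# Hypothetical helper names; a rough stdlib-only approximation of how a
# PEP 508 marker compares release numbers (pre/post/dev releases ignored).
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def release_tuple(version: str, width: int = 3) -> tuple:
    """Turn '3.5' into (3, 5, 0) so '3.5' and '3.5.0' compare as equal."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def marker_env(full_version: str) -> dict:
    """The two PEP 508 variables an installer would see on this interpreter."""
    return {
        # python_version carries only major.minor, e.g. '3.5'
        "python_version": ".".join(full_version.split(".")[:2]),
        # python_full_version keeps the micro release, e.g. '3.5.2'
        "python_full_version": full_version,
    }


def evaluate(variable: str, op: str, value: str, full_version: str) -> bool:
    """Evaluate e.g. python_version > '3.5' for a given interpreter version."""
    lhs = marker_env(full_version)[variable]
    return OPS[op](release_tuple(lhs), release_tuple(value))


if __name__ == "__main__":
    for interp in ("3.5.2", "3.6.0"):
        print(
            interp,
            evaluate("python_version", ">", "3.5", interp),
            evaluate("python_version", ">=", "3.6", interp),
            evaluate("python_full_version", ">", "3.5", interp),
        )
```

On a 3.5.2 interpreter the two `python_version` checks are both false while the `python_full_version` check is true; on 3.6.0 all three are true.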
fungiclarkb: your xenial container had distro-supplied pip? (9.something was it?)23:07
clarkbpackaging ; python_version >= '3.6' and packaging<21.0 ; python_version < '3.6'23:07
clarkbfungi: no I had to do the two pass thing I described above before installing tox23:07
clarkbapt-get install python3-pip && pip3 install -U 'pip<21' && pip install -U pip && pip install tox23:08
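clarkb's two-pass pip upgrade can be sketched in Python as below, assuming `python -m pip` is available for the target interpreter. The helper names are hypothetical; the comments restate the reasoning from this discussion (pip 21 dropped python 3.5 support, and pip only began honoring python_requires metadata in 9.0, so the pip 8.1.1 shipped on xenial ignores it).

```python
import subprocess
import sys
from typing import List


def pip_bootstrap_commands(python: str = sys.executable) -> List[List[str]]:
    """Hypothetical helper: the two-pass pip upgrade from the chat above.

    Pass 1 caps pip below 21 because an old pip (such as 8.1.1 on xenial)
    ignores python_requires and would otherwise grab a release that no
    longer supports python 3.5. Pass 2 then runs with the newer pip, which
    does honor python_requires and only upgrades further when allowed.
    """
    return [
        # Passed as an argv list, so 'pip<21' needs no shell quoting.
        [python, "-m", "pip", "install", "-U", "pip<21"],
        [python, "-m", "pip", "install", "-U", "pip"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them in order, stopping on failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(pip_bootstrap_commands())
```

Because every later install in such an environment uses the upgraded pip, it already skips packaging releases whose python_requires excludes 3.5, which is why fungi points out that this setup does not reproduce the old-pip failure.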
opendevreviewClark Boylan proposed opendev/bindep master: Add old python packaging pin  https://review.opendev.org/c/opendev/bindep/+/81464723:09
clarkbfungi: ^ like that?23:09
fungiahh, okay. in that case you likely had new enough pip that it wouldn't have downloaded packaging 21.0 anyway, right?23:09
fungii thought the problem was you needed old pip without python_requires metadata support in order to trigger it23:10
ianwyeah i'm not sure we need the pin, we just need up-to-date pip's?23:10
fungibecause newer pip knows not to install a version of packaging which says it won't work with python 3.523:10
clarkboh right so in my test I need to downgrade pip back to 823:11
clarkbthe reason I updated was I needed to install tox23:11
clarkbI think making bindep work with older pip is a reasonable thing given its position in bootstrapping things23:11
opendevreviewJames E. Blair proposed openstack/project-config master: Add ubuntu-bionic-32GB to vexxhost-specific  https://review.opendev.org/c/openstack/project-config/+/81468323:11
clarkbother tools I wouldn't worry too much23:11
corvusclarkb, fungi, ianw: ^ that's the minimal change to get a 'large node' on 2 clouds.23:12
fungiright, i'm good with the change, just pointing out the hole in the test methodology23:12
clarkbinfra-root: on a hunch about slowness of things, given corvus' observation that zuul status was slow and being told I couldn't resolve review.o.o locally, I checked our nsX servers and only ns1 is running nsd23:12
clarkbLet me finish up this bindep checking then I can look closer if no one else has addressed that yet23:12
fungii'll look at the nameservers now23:13
corvusi confess, i'm still not sure whether that should be in the "main" pool which holds the "nested-virt-*" labels, or the "vexxhost-specific" pool which holds the "vexxhost" labels23:13
opendevreviewJames E. Blair proposed openstack/project-config master: Add ubuntu-bionic-32GB to vexxhost-specific  https://review.opendev.org/c/openstack/project-config/+/81468323:13
clarkbcorvus: I think the main pool23:14
fungisystemctl status says nsd on ns2 crashed on 2021-08-02 at 01:45:31 UTC (2 months 18 days ago), so looks like we probably no longer have a log of why23:14
corvusclarkb: why's that?  the big ones are in the vexxhost-specific pool23:15
fungiuptime for ns2 is 78 days, which looks suspiciously similar23:15
corvus"journalctl -fu nsd" says it failed to start but not why23:15
fungii think this means the server was rebooted and nsd crashed during boot?23:15
fungithis may have been during vexxhost server migrations?23:16
fungi(ns2 is in vexxhost)23:16
clarkbcorvus: I think vexxhost pool was done when our vexxhost tenant was wanting to try some stuff and didn't care about single region issues23:16
fungii vaguely recall those were going on around that time23:17
clarkbcorvus: I suspect that now we can fold the vexxhost specific stuff into main since we're doing that normally now23:17
corvusclarkb: okay i'll put it in main23:17
opendevreviewJames E. Blair proposed openstack/project-config master: Add ubuntu-bionic-32GB  https://review.opendev.org/c/openstack/project-config/+/81468323:17
fungii manually stopped and started nsd with systemctl and it's running now23:17
clarkbif my container hadn't said "I can't resolve this" I wouldn't have thought to check the nsds23:18
ianwclarkb: if you can look at https://review.opendev.org/c/openstack/diskimage-builder/+/814680 to remove py35 jobs on dib when all this is over that would be good too23:18
clarkbfungi: corvus: do we need to force a replication from adns now?23:19
clarkbotherwise ns2 might serve old stale data?23:19
clarkbianw: can do23:19
fungins1 and ns2 are serving the same serial on the opendev.org soa23:19
fungiso i think it's all good now?23:19
fungi#status log Manually restarted nsd on ns2.opendev.org, which seems to have failed to start at boot23:19
opendevstatusfungi: finished logging23:20
fungiclarkb: i'll check the logs, but nsd ought to be smart enough to not take requests until after it checks serials on its zones against adns1 and initiates any necessary zone transfers23:21
clarkbya if the serial is the same we should be good23:21
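Matching serials is the signal fungi checked. A secondary nameserver decides whether its copy of a zone is stale by comparing SOA serials using RFC 1982 serial-number arithmetic, which tolerates the 32-bit counter wrapping around; a newer serial on the primary means a zone transfer is due. The sketch below shows only that comparison. It is not nsd's code, and the function names are hypothetical.

```python
SERIAL_BITS = 32
HALF = 2 ** (SERIAL_BITS - 1)


def serial_gt(s1: int, s2: int) -> bool:
    """RFC 1982 'greater than' for SOA serials, which wrap modulo 2**32."""
    return (s1 < s2 and s2 - s1 > HALF) or (s1 > s2 and s1 - s2 < HALF)


def needs_transfer(primary_serial: int, secondary_serial: int) -> bool:
    """Hypothetical check: is the primary's copy of the zone newer?"""
    return serial_gt(primary_serial, secondary_serial)


if __name__ == "__main__":
    print(needs_transfer(2021101902, 2021101901))  # primary is newer
    print(needs_transfer(2021101901, 2021101901))  # same serial, nothing to do
    print(needs_transfer(0, 2**32 - 1))            # wrapped counter still counts as newer
```

Equal serials, as fungi saw on ns1 and ns2, mean no transfer is needed.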
clarkbheh and now is paste unhappy?23:22
clarkbthere it goes maybe my local resolver hasn't figured out ns2 is happy again23:22
fungiprobably the usual db socket timeout23:22
corvusmy git review running right now is very slow23:25
clarkbI have pretty high packet loss to paste.o.o23:25
clarkbI think that explains it23:25
clarkbping to review is fine though so I don't know that explains a slow git review23:26
clarkbfungi: ianw  https://gist.github.com/cboylan/a14e3458f187ccd3561c8fe96b82509b that should be a better test of the bindep install with pip 8.1.123:27
corvusseems better now.  :/23:27
ianwclarkb: so pbr is really in the same boat?23:28
clarkbianw: pbr is in a different boat :( pbr is a setup requires which gets installed by easy_install. easy_install doesn't support SNI on xenial (and maybe bionic? I don't remember how far back that went) and pypi is SNI only on their CDN now23:30
clarkbianw: pip does do SNI even when old like that so you have to install pbr first then install other things that use pbr :(23:30
clarkbdoes anyone else have trouble getting to paste?23:30
clarkbI'm wondering if we're going to get a message from rax saying the host it is on had trouble and got rebooted or if this is specific to me23:30
clarkbvia ipv4 fwiw23:30
fungii'm getting no icmp6 echo replies23:31
ianwagree from .au too23:31
funginor can i ping it over ipv423:31
fungioh, intermittent response now23:32
fungi55.5556% packet loss23:32
fungii'll see if i can get to the console for it23:32
fungimmm, i was able to ssh in just now23:32
fungibut extremely laggy23:32
fungi23:33:00 up 28 days, 18:47,  1 user,  load average: 0.00, 0.00, 0.0023:33
fungiso i don't think the server is being hammered23:33
ianwi just pulled up a console and there's nothing but a login prompt that responds on it23:33
fungilikely network upstream from the instance has many sads23:33
fungino ticket from rackspace about any issues yet though23:34
clarkbI guess we wait it out for a bit then?23:36
ianwi'm not sure i was aware of that pbr snafu23:38
clarkbI've just responded to a user on openstack-discuss that cannot login to gerrit because they are trying to login with a new openid that is associated with an old gerrit account and gerrit won't create a new account with conflicting ids23:39
clarkbJust a heads up here as I've asked them to reach out on IRC as it's a bit easier to debug this stuff interactively23:39
clarkbbut I did make a couple of suggestions for how we can proceed in the email should it come up when I am not around23:40
clarkbianw: I +2'd the dib removal of py35 jobs but left a note23:43
clarkbianw: we very intentionally made pbr continue to support really old stuff because setup_requires also can't effectively pin deps23:45
clarkbianw: which basically means you always get the latest thing even on the oldest system you've got23:45
clarkbwe really have to be careful with pbr to not add a bunch of fancy new python stuff23:45
fungior to sufficiently guard it so that it only gets called on new enough python and has working fallbacks23:46
ianwi forget why we added openstack-python3-wallaby-jobs instead of individual tox jobs ... to git blame!23:50
clarkbfungi: note we still expect https://review.opendev.org/c/opendev/bindep/+/814647 to fail in the gate right? Are you just double checking it against the newer pythons?23:51

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!