fungi | yeah, i'm good either way | 00:00 |
---|---|---|
ianw | i don't feel i've been paying sufficient attention TBH | 00:19 |
fungi | ianw: so there was a race condition in change cache cleanup where a change could get enqueued while cleanup was underway and wind up with its entry removed if it had previously been present in the cache and was up for expiration in that pass | 00:48 |
fungi | that resulted in the release-approvals pipeline getting blocked over the weekend | 00:48 |
fungi | so we wanted to restart on that fix | 00:49 |
fungi | separately there's a transitional situation where builds which are paused when the scheduler is restarted will end up perpetually running from the executor's perspective, and keep nodes locked indefinitely until the executor is restarted, so a full zuul restart would clean us up from the scheduler restart last week | 00:50 |
*** dviroel|rover|afk is now known as dviroel|out | 00:50 | |
fungi | there isn't a fix per se for that second problem, but it will in theory go away once everything is being tracked in zk | 00:51 |
fungi | there's a third minor problem where project key deletion from zk left empty parent znodes behind, which was causing the key backup cronjob to emit errors after the rename maintenance last week, clarkb has a fix up for that | 00:52 |
fungi | and we'll presumably have to manually clean up the ones we've got, i think | 00:53 |
fungi | the fix there isn't urgent, we'll almost certainly have it in before the next time we need to rename any projects | 00:54 |
Clark[m] | The backups are working for the keys that do exist so that is not very urgent. Mostly just a make scary errors go away thing as they are noise | 00:58 |
fungi | yeah, that too. it's really just noise | 01:00 |
Clark[m] | And then I want to pull the Gerrit image for 3.3.7 build and restart to ensure the gerrit.config cleanups are happy. Also not urgent and can be done tomorrow | 01:02 |
fungi | sounds good, my ptg schedule tomorrow is mildly less jam-packed than it was today | 01:12 |
*** mazzy509 is now known as mazzy50 | 01:23 | |
*** ysandeep|out is now known as ysandeep | 05:02 | |
*** ykarel_ is now known as ykarel | 05:20 | |
opendevreview | Sandeep Yadav proposed zuul/zuul-jobs master: multi-node-bridge: repos to install ovs in C9 https://review.opendev.org/c/zuul/zuul-jobs/+/814516 | 06:12 |
opendevreview | Sandeep Yadav proposed zuul/zuul-jobs master: multi-node-bridge: repos to install ovs in C9 https://review.opendev.org/c/zuul/zuul-jobs/+/814516 | 06:20 |
*** ysandeep is now known as ysandeep|afk | 06:30 | |
*** ysandeep|afk is now known as ysandeep|trng | 06:58 | |
*** jpena|off is now known as jpena | 07:32 | |
*** ykarel is now known as ykarel|lunch | 08:48 | |
*** pojadhav|ruck is now known as pojadhav|lunch | 09:32 | |
*** pojadhav|lunch is now known as pojadhav | 09:59 | |
opendevreview | Pierre Riteau proposed openstack/project-config master: [kolla] Preserve Backport-Candidate and Review-Priority scores https://review.opendev.org/c/openstack/project-config/+/814548 | 10:14 |
*** ykarel|lunch is now known as ykarel | 10:34 | |
*** pojadhav is now known as pojadhav|ruck | 10:40 | |
yoctozepto | has anyone tried creating a gerrit query that lists all changes *without* hashtags? | 11:04 |
yoctozepto | meaning having 0 hashtags | 11:04 |
*** dviroel|out is now known as dviroel|rover | 11:06 | |
*** jpena is now known as jpena|lunch | 11:26 | |
*** jpena|lunch is now known as jpena | 12:15 | |
opendevreview | Merged zuul/zuul-jobs master: ensure-podman: support Debian bullseye https://review.opendev.org/c/zuul/zuul-jobs/+/814088 | 12:54 |
*** ysandeep|trng is now known as ysandeep | 12:56 | |
fungi | yoctozepto: i have not, but i'll readily admit i haven't messed around with hashtags much yet | 13:22 |
yoctozepto | ack, thanks for responding | 13:23 |
fungi | yoctozepto: did you try hashtag:'' (the hashtag parameter seems to suggest that it will only match one) | 13:26 |
fungi | https://review.opendev.org/Documentation/user-search.html#hashtag | 13:26 |
opendevreview | Mark Goddard proposed openstack/project-config master: kolla-cli: end gating for retirement https://review.opendev.org/c/openstack/project-config/+/814580 | 13:27 |
Clark[m] | yoctozepto: fungi: newer Gerrit seems to have the inhashtag search operator. This should allow you to search for -inhashtag:"^.*" But our Gerrit isn't new enough. If you are consistent enough about hashtags you can negate a specific list of them instead. | 13:35 |
fungi | inhashtag is added in 3.4? | 13:36 |
Clark[m] | I'm not sure when it got added. Gerrit upstream documents it currently but our Gerrit reports it isn't valid | 13:37 |
Clark[m] | Might be a 3.5 feature | 13:37 |
fungi | ahh | 13:38 |
yoctozepto | Clark[m]: ah, thanks | 13:50 |
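For reference, the two approaches discussed above can be exercised against Gerrit's REST change-query endpoint; whether `inhashtag` is accepted depends on the Gerrit version (it was not valid on this deployment at the time), so treat these queries as hedged examples to verify against your server's user-search documentation.

```bash
# On a Gerrit new enough to support inhashtag (assumption), negate a regex that
# matches any hashtag to find changes carrying no hashtags at all.
# Note: Gerrit REST responses are JSON prefixed with the )]}' guard line.
curl -s 'https://review.opendev.org/changes/?q=status:open+-inhashtag:%22%5E.*%22'

# On older Gerrit, the fallback mentioned above: negate a known list of hashtags.
curl -s 'https://review.opendev.org/changes/?q=status:open+-hashtag:foo+-hashtag:bar'
```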
*** lbragstad_ is now known as lbragstad | 14:15 | |
*** ykarel_ is now known as ykarel | 14:22 | |
opendevreview | Mark Goddard proposed openstack/project-config master: kolla-cli: enter retirement https://review.opendev.org/c/openstack/project-config/+/814597 | 14:48 |
opendevreview | Mark Goddard proposed openstack/project-config master: kolla-cli: end gating for retirement https://review.opendev.org/c/openstack/project-config/+/814580 | 15:05 |
opendevreview | Mark Goddard proposed openstack/project-config master: kolla-cli: enter retirement https://review.opendev.org/c/openstack/project-config/+/814597 | 15:05 |
opendevreview | Thiago Paiva Brito proposed openstack/project-config master: Adding gerritreview messages to starlingx channel https://review.opendev.org/c/openstack/project-config/+/814600 | 15:06 |
opendevreview | Thiago Paiva Brito proposed openstack/project-config master: Adding gerritreview messages to starlingx channel https://review.opendev.org/c/openstack/project-config/+/814600 | 15:30 |
opendevreview | Shnaidman Sagi (Sergey) proposed zuul/zuul-jobs master: Print version of installed podman https://review.opendev.org/c/zuul/zuul-jobs/+/814604 | 15:44 |
opendevreview | Thiago Paiva Brito proposed openstack/project-config master: Adding gerritreview messages to starlingx channel https://review.opendev.org/c/openstack/project-config/+/814600 | 15:47 |
opendevreview | Alfredo Moralejo proposed openstack/diskimage-builder master: [WIP] Add support for CentOS Stream 9 in DIB https://review.opendev.org/c/openstack/diskimage-builder/+/811392 | 16:01 |
opendevreview | Alfredo Moralejo proposed openstack/diskimage-builder master: [WIP] Add support for CentOS Stream 9 in DIB https://review.opendev.org/c/openstack/diskimage-builder/+/811392 | 16:13 |
clarkb | I'm working on fixing up my zuul change for key deletion, but then I'm free to do zuul + gerrit restarts until the infra meeting and free again after that (though a bike ride this afternoon would be good too before the rain returns) | 16:24 |
fungi | same, the last ptg session in this block ends in half an hour and then i can help with restarts/reviews | 16:24 |
*** marios is now known as marios|out | 16:25 | |
opendevreview | Merged openstack/project-config master: Adding gerritreview messages to starlingx channel https://review.opendev.org/c/openstack/project-config/+/814600 | 16:25 |
*** pojadhav|ruck is now known as pojadhav|out | 16:26 | |
*** jpena is now known as jpena|off | 16:27 | |
*** ysandeep is now known as ysandeep|out | 16:29 | |
corvus | clarkb: i guess we don't need to wait on your change for the restart...? it won't fix the current situation (and we'll need to delete those znodes manually if not done already) | 16:31 |
corvus | fungi, clarkb: are we doing a synchronized zuul+gerrit restart? | 16:32 |
corvus | assuming "yes" to all the above, seems like aiming for 1700utc is the way to go | 16:32 |
fungi | corvus: yes, we have gerrit things we want included in a restart | 16:33 |
fungi | i'm up for a 17:00 utc restart of both together | 16:33 |
corvus | kk see you then | 16:33 |
fungi | thanks! | 16:33 |
fungi | clarkb: you're okay to do that in ~25 minutes? | 16:34 |
fungi | i'll give #openstack-release a heads up | 16:34 |
clarkb | corvus: correct don't need to wait for my change | 16:34 |
clarkb | 17:00 UTC wfm | 16:34 |
clarkb | fungi: should we go ahead and do a docker-compose pull on review? | 16:46 |
fungi | yeah, i can do that now | 16:49 |
fungi | it's running in a root screen session there now | 16:50 |
clarkb | cool docker image list shows the new image now | 16:51 |
fungi | opendevorg/gerrit <none> a7c2687bb510 3 weeks ago 793MB | 16:51 |
fungi | that one? | 16:51 |
clarkb | opendevorg/gerrit 3.3 33d6300c73ad 24 hours ago 795MB that one | 16:53 |
fungi | oh, yep, screen within tmux, i scrolled the wrong one ;) | 16:54 |
fungi | we have more than a terminal's worth of images listed now | 16:54 |
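The pull-and-verify step being described looks roughly like the following; the working directory containing the Gerrit compose file on the server is an assumption here.

```bash
# run from the directory holding gerrit's docker-compose.yaml (path assumed)
docker-compose pull
# confirm the freshly built 3.3 image is present before restarting
docker image list opendevorg/gerrit
```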
clarkb | When we restart we'll have caught up to wikimedia in terms of gerrit version :) | 16:54 |
fungi | until they upgrade again | 16:54 |
fungi | also didn't they replace gerrit with phabricator forever ago? | 16:55 |
clarkb | ha yes. Though I'm hoping to get the 3.4 upgrade done by the end of the year | 16:55 |
clarkb | fungi: I think replacing gerrit has become more difficult than they anticipated (I suspect they have gerrit fans) | 16:55 |
fungi | status notice Both Gerrit and Zuul services are being restarted briefly for minor updates, and should return to service momentarily; all previously running builds will be reenqueued once Zuul is fully started again | 16:59 |
fungi | should i send that? i copied it from the last one we did | 16:59 |
corvus | lgtm | 16:59 |
fungi | #status notice Both Gerrit and Zuul services are being restarted briefly for minor updates, and should return to service momentarily; all previously running builds will be reenqueued once Zuul is fully started again | 16:59 |
opendevstatus | fungi: sending notice | 16:59 |
-opendevstatus- NOTICE: Both Gerrit and Zuul services are being restarted briefly for minor updates, and should return to service momentarily; all previously running builds will be reenqueued once Zuul is fully started again | 16:59 | |
corvus | i ran zuul_pull just in case (but i'm pretty sure all the images were already there) | 16:59 |
fungi | awesome, i'm ready for the gerrit restart once zuul is down | 16:59 |
clarkb | fungi: I'm attached to the screen now too and will follow along | 17:00 |
corvus | shall i stop zuul now? | 17:00 |
clarkb | I'm ready | 17:01 |
fungi | corvus: go for it | 17:01 |
corvus | scheduler is stopped; feel free to proceed | 17:01 |
fungi | gerrit is stopping | 17:01 |
corvus | (waiting for okay to start zuul) | 17:01 |
fungi | starting gerrit again now | 17:02 |
fungi | [2021-10-19T17:02:23.083Z] [main] INFO com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.3.7-2-g17936a0b79-dirty ready | 17:02 |
fungi | i can pull up the webui too | 17:02 |
fungi | clarkb: lgty? | 17:03 |
clarkb | web ui is working for me | 17:03 |
clarkb | ya I think its happy. reports 3.3.7 as the version too | 17:03 |
fungi | corvus: go for starting zuul when you're ready | 17:03 |
corvus | starting zuul | 17:03 |
clarkb | re the gerrit.config updates we made the biggest thing I was worried about was change screen and theming since we removed the old theming config and removed the default change screen config | 17:04 |
clarkb | but neither is a thing in gerrit anymore, so it would have to be a weird interaction (and a gerrit bug) for there to be any problems | 17:04 |
clarkb | theme and change screen look as expected to me | 17:04 |
fungi | yeah | 17:04 |
fungi | before the restart i was seeing weird fonts in the unified diff view, looks like it's still happening too | 17:05 |
fungi | same in side-by-side actually | 17:05 |
clarkb | fungi: are you filtering web fonts maybe and then falling back to whatever is in your browser? I don't have this issue but I don't think I'm filtering any web fonts | 17:06 |
fungi | probably something weird with my browser settings, but all the lines of code are rendered double-size but with single size spacing | 17:06 |
clarkb | huh ya that isn't a problem for me. Let me double check in a different browser | 17:06 |
fungi | neither privacy badger nor ddg privacy essentials indicate anything is being blocked | 17:07 |
clarkb | ya I can't reproduce | 17:07 |
corvus | looks fine for me. i tried zooming in and out and font size+leading seem to go hand-in-hand so it looks good at all zooms | 17:08 |
fungi | it might be that newest firefox is assuming a smarter window manager than i'm using | 17:11 |
corvus | i blame rust | 17:12 |
corvus | ;) | 17:12 |
fungi | when i try to use its "take a screenshot" feature it complains about a background error for "getZoomFactor is not defined" | 17:12 |
clarkb | are you on wayland? | 17:12 |
fungi | nope, zorg with ratpoison | 17:13 |
fungi | er, xorg | 17:13 |
clarkb | fractional scaling and all that is a big deal with wayland | 17:13 |
fungi | libwayland is installed, but probably just because some things link it | 17:14 |
corvus | ("zorg here!") | 17:14 |
corvus | tenants loaded | 17:15 |
corvus | starting re-enqueueeueueuing | 17:15 |
clarkb | I'm glad others find that word hard to type | 17:16 |
corvus | perhaps we should have the zuul cli accept "enq[ue]*" | 17:17 |
fungi | we could even just set up a conditional ladder which checks whether each of the available subcommands .contains() sys.argv[1] | 17:18 |
fungi | elif "enqueue".contains(sys.argv[1]): ... | 17:19 |
fungi | basically allow all subcommands to be arbitrarily shortened | 17:19 |
corvus | yeah, i like programs that do that | 17:20 |
fungi | more magic could be added to identify ambiguous abbrevs | 17:20 |
clarkb | | 0026931648 | rax-dfw | ubuntu-focal | c1e73330-d61f-423b-bc32-e4460f15ee56 | 104.130.141.20 | | used | 05:07:20:22 | locked | still shows up in a nodepool listing but the other 59 in-use nodes that got stuck seem to be gone | 17:20 |
clarkb | I suppose ^ may be a different issue? | 17:20 |
fungi | clarkb: held? | 17:21 |
fungi | does --detail give you anything else useful? | 17:21 |
clarkb | fungi: no it is "used". There is a held node from the timeframe too but thats fine | 17:21 |
fungi | oh, yeah, it wouldn't be used in that case | 17:21 |
clarkb | --detail doesn't give any additional useful info | 17:22 |
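The listing being inspected here comes from nodepool's CLI; a minimal way to pull out just the stuck node (node id taken from the output pasted above) would be something like:

```bash
# show full details for the one node still marked used and locked
nodepool list --detail | grep 0026931648
```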
corvus | it doesn't say who locked it? i thought it did | 17:23 |
clarkb | oh is that what nl01.opendev.org-PoolWorker.rax-dfw-main-23ef88ea5474439dac253fa13c63d4f7 is? | 17:24 |
clarkb | maybe we can restart the launcher and the lock will go away then it can retry deleting it? | 17:24 |
clarkb | Launcher is the column id for that value | 17:25 |
corvus | yeah. a sigusr2 thread dump might illustrate why nl01 is sitting on a locked used node and not doing anything with it | 17:25 |
corvus | oh then that may not be the lock holder | 17:26 |
clarkb | ya --detail's column header doesn't seem to identify the lock holder | 17:26 |
corvus | bummer, i thought we had that :/ | 17:27 |
corvus | re-enqueue complete | 17:27 |
corvus | at worst, we can inspect zk, but that's probably something we should expose thru the cli | 17:27 |
clarkb | I need to figure out food as I've somehow neglected to eat anything today. Back in a bit | 17:29 |
fungi | this is a screen capture of what the gerrit diff view has been doing in ff for me: http://fungi.yuggoth.org/tmp/ff.png | 17:36 |
fungi | comparing the same url, chromium looks fine | 17:37 |
fungi | i closed out the screen session on review just now, btw, seems like the restart was a success | 17:48 |
clarkb | fungi: could it be the dark theme? | 17:49 |
fungi | mebbe | 17:50 |
fungi | i'll try switching it up | 17:50 |
fungi | clarkb: aha, it's at least *something* to do with my account prefs, because if i load it in another container tab not authenticated, i get a reasonable looking diff | 17:54 |
fungi | found it! | 17:56 |
fungi | apparently changing "font size" in the diff view doesn't change the space between the lines? it was for some reason set to 24 in my preferences, dropping it to 12 seems to have fixed things | 17:57 |
clarkb | that is similar to the issue in etherpad chopping off the tops of subsequent lines | 17:58 |
clarkb | corvus: I'm looking at the lock contents in zk for that node. Is the uuid looking thing at the front of the path identifying a connection? | 18:01 |
clarkb | hrm no it seems zk identifies connections with a session id which is a different type of value. Any idea how to map the lock to the connection? | 18:05 |
corvus | if the lock doesn't have an id, then you may be able to map it to a session, then find an ephemeral node owned by the same session that identifies it. | 18:06 |
clarkb | oh right there is a way to list ephemeral nodes by sessions iirc and the locks are ephemeral? | 18:07 |
clarkb | dump on the leader | 18:07 |
clarkb | zk05 is the leader | 18:07 |
clarkb | corvus: nl01 is holding the lock. I ran dump then found the session id associated with the lock then listed sessions with `cons` which gave me an ip address that dig -x resolved to nl01 | 18:10 |
clarkb | so now I guess we sigusr2 on nl01 and see what might be holding up the used node deletion | 18:11 |
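A sketch of the lock-tracing procedure clarkb describes above, assuming the ZooKeeper four-letter admin commands (`dump`, `cons`) are enabled (they can be restricted via the 4lw whitelist) and that a plain client port is reachable; the host, port, client IP, and launcher process name are placeholders/assumptions.

```bash
# on the ZK leader: list outstanding sessions and the ephemeral znodes (locks) they own
echo dump | nc localhost 2181

# list client connections with their session ids and source addresses
echo cons | nc localhost 2181

# reverse-resolve the client address that holds the session to a host name
dig -x <client-ip> +short

# then trigger the thread dump corvus suggests on that launcher
# (process name assumed; SIGUSR2 makes the launcher log stack traces)
kill -USR2 $(pgrep -f nodepool-launcher)
```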
fungi | nl01 is also the launcher responsible for that node, so i guess it makes sense | 18:12 |
clarkb | neither 0026931648 nor c1e73330-d61f-423b-bc32-e4460f15ee56 show up in the thread dump. Launcher threads seem to use the nodepool id to name the thread and deleter threads the node uuid | 18:15 |
clarkb | two threads match rax-dfw: a delete thread for a different uuid and the poolworker for rax-dfw | 18:16 |
clarkb | corvus: the last thing logged by the launcher is that it unlocked the node and it is ready | 18:20 |
clarkb | I'm wondering if we somehow lost the lock during the zk problems | 18:21 |
clarkb | like zuul ran the job and set it to used, then nl01 grabs the lock and immediately after the zk problems occur causing nl01 to lose track | 18:21 |
clarkb | and that happens before nl01 can log anything about doing cleanup of the used now | 18:21 |
clarkb | *used node | 18:21 |
clarkb | I suspect that restarting the launcher on nl01 will correct this. But I still can't find an indication for how this started | 18:25 |
kopecmartin | hi all, how can we publish project documentation to docs.opendev.org? we tried opendev-promote-docs and also promote-tox-docs-infra but we can't still see the doc https://docs.opendev.org/opendev/ | 19:20 |
kopecmartin | I made a silly mistake somewhere | 19:20 |
kopecmartin | for reference: https://review.opendev.org/c/openinfra/refstack/+/814635 | 19:21 |
clarkb | kopecmartin: I would start by looking at the job logs for the jobs that ran already | 19:21 |
clarkb | https://static.opendev.org/docs/refstack/latest/ does seem to be updating so it may just be a matter of vhost stuff properly serving it? | 19:22 |
kopecmartin | ah, yeah, that would explain it | 19:23 |
kopecmartin | clarkb: thanks! | 19:23 |
clarkb | kopecmartin: https://docs.opendev.org/openinfra/refstack/latest/ there you go | 19:24 |
kopecmartin | nice, now i wonder whether it was done by promote-tox-docs-infra or opendev-promote-docs | 19:24 |
kopecmartin | i'm gonna check the logs | 19:24 |
fungi | yes, project docs are namespaced on docs.opendev.org since we aim to publish documentation for multiple communities | 19:25 |
fungi | instead of writing to docs/refstack the job should be configured to publish to docs/openinfra/refstack | 19:26 |
clarkb | ianw: I think the whole dib stack should be approved now | 20:13 |
ianw | clarkb: thanks! i'll keep an eye on it all | 20:14 |
opendevreview | Douglas Viroel proposed zuul/zuul-jobs master: Add FIPS enable multinode job definition https://review.opendev.org/c/zuul/zuul-jobs/+/813253 | 20:21 |
clarkb | ianw: the whole stack just failed on a tox py35 failure | 20:29 |
clarkb | fungi: ^ we broke xenial jobs with the bindep release | 20:29 |
clarkb | fungi: we need a packaging pin for python3.5 | 20:29 |
opendevreview | Clark Boylan proposed opendev/bindep master: Add old python packaging pin https://review.opendev.org/c/opendev/bindep/+/814647 | 20:32 |
clarkb | something like ^ then a release? | 20:32 |
clarkb | dib can also probably drop python3.5 testing? | 20:33 |
clarkb | ianw: ^ that might be quicker. | 20:33 |
fungi | aha, we can't install latest packaging on xenial? makes sense | 20:46 |
fungi | though surprised bindep's xenial job didn't catch it | 20:46 |
clarkb | ya I don't understand how it got through but if you look at the error at https://zuul.opendev.org/t/openstack/build/cae47da97cff44c8a855f30378634ee1/log/job-output.txt packaging complains about invalid python and their changelog says 20.9 was the last python3.5 capable release | 20:49 |
clarkb | I'm going to get a bike ride in now before the rain arrives tomorrow but can dig more after if we want to fully understand that | 20:49 |
fungi | and yeah, 21.0 was tagged back at the beginning of july | 20:50 |
fungi | have a good ride, i'll take a closer look after dinner | 20:50 |
ianw | sorry, back, looking | 20:53 |
*** dviroel|rover is now known as dviroel|rover|afk | 20:59 | |
*** avass[m] is now known as AlbinVass[m] | 21:00 | |
ianw | i guess it's just not a path covered by the tox run | 21:03 |
ianw | that's not it. my tox install chose packaging (20.9) | 21:12 |
ianw | (tox py35) | 21:12 |
ianw | seemingly so did the bindep gate tests | 21:12 |
fungi | xenial's default pip version is too old to support python_requires metadata in packages | 21:14 |
fungi | so it probably only failed in jobs which are not using new pip | 21:14 |
ianw | hrm, it looks like the bindep gate uses xenial for 3.5 -> https://zuul.opendev.org/t/opendev/build/e613c1b0042549c59d07825d97b5ff05/logs | 21:19 |
ianw | but that's not "openstack-tox-py35", it's "tox-py35" | 21:20 |
ianw | i wonder if that's doing some pip upgrades in the tox env | 21:20 |
fungi | the tox logs should say | 21:23 |
ianw | oh, i think what has happened here is that the bindep jobs run "ensure-pip" | 21:24 |
ianw | compare | 21:24 |
ianw | bindep -> https://zuul.opendev.org/t/opendev/build/e613c1b0042549c59d07825d97b5ff05/console | 21:25 |
ianw | dib -> https://zuul.opendev.org/t/openstack/build/cae47da97cff44c8a855f30378634ee1/console | 21:25 |
fungi | yup, that'll do it | 21:27 |
fungi | huh, the fix failed tox-py35 on this: https://zuul.opendev.org/t/opendev/build/1f00b3e2c8a749eca74ee50a7cc17d44/console#1/0/16/ubuntu-xenial | 21:37 |
ianw | why does openstack-tox-py35 run zuul-jobs/playbooks/tox/post.yaml but not zuul-jobs/playbooks/tox/pre.yaml ? | 21:37 |
ianw | tox-py35 really should have run playbooks/tox/pre.yaml, right? from https://opendev.org/zuul/zuul-jobs/src/branch/master/zuul.d/python-jobs.yaml#L42 | 21:40 |
fungi | also someone just e-mailed openstack-discuss asking for help logging into their gerrit account, seems like it might be another case of a duplicate resulting from an address change in ubuntuone | 21:44 |
ianw | it really does seem to me that openstack-tox-py35 should ultimately parent to "tox", which should have a pre-run.yaml step that runs ensure-tox, which will run ensure-pip, which will upgrade things | 21:49 |
ianw | is it possible the dib problem and the failing bindep fix both stem from some other root cause relating to pip not upgrading? | 21:50 |
ianw | possibly the zuul restart ~ 5 hours ago ... ? | 21:52 |
fungi | something that's causing some playbooks to no longer be run? | 21:54 |
ianw | it seems unlikely looking at recent changes ... but i am struggling to see why that playbook wouldn't run | 21:55 |
ianw | i mean compare console of | 21:57 |
ianw | https://zuul.opendev.org/t/opendev/build/7966ab9ee15f4f3e8460b23652cfddc5/console (tox-py35, earlier run) | 21:58 |
ianw | https://zuul.opendev.org/t/opendev/build/1f00b3e2c8a749eca74ee50a7cc17d44/console (tox-py35, failing run now) | 21:58 |
ianw | it's actually missing "tox/pre.yaml" and "tox/run.yaml" ... ? | 21:59 |
ianw | ohh, i see: pre/unittests.yaml is the one that's failing. further bits are skipped | 22:01 |
ianw | oohhh, i further see -- the broken bindep has broken the bindep testing | 22:02 |
ianw | ok ... soooo ... | 22:14 |
ianw | we have created the on-image bindep 2.10.0 @ https://nb02.opendev.org/ubuntu-xenial-0000210004.log | 22:14 |
ianw | for some reason, this has created /usr/bindep-env/ with "2021-10-19 14:22:49.157 | You are using pip version 8.1.1, however version 21.3 is available." | 22:15 |
ianw | this has pulled packaging 21 into this venv (incorrectly) | 22:16 |
ianw | bindep uses this to setup the tox environment to run the bindep tests, hence the recent explosion | 22:17 |
opendevreview | Ian Wienand proposed openstack/project-config master: infra-package-needs: install latest pip https://review.opendev.org/c/openstack/project-config/+/814677 | 22:20 |
ianw | actually we should probably do that in all venvs we prime on the images | 22:22 |
opendevreview | Ian Wienand proposed openstack/project-config master: infra-package-needs: install latest pip https://review.opendev.org/c/openstack/project-config/+/814677 | 22:22 |
ianw | since that dib stack got -2'd anyway, i could rebase that on a job to remove py35 testing which would be a workaround for dib, for now | 22:35 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: epel: match replacement better https://review.opendev.org/c/openstack/diskimage-builder/+/813922 | 22:39 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Revert "Allowing ubuntu element use local image" https://review.opendev.org/c/openstack/diskimage-builder/+/814094 | 22:39 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: ubuntu-systemd-container: deprecate and remove jobs https://review.opendev.org/c/openstack/diskimage-builder/+/814068 | 22:39 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: ubuntu: add Focal test https://review.opendev.org/c/openstack/diskimage-builder/+/814072 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: functests: drop apt-sources https://review.opendev.org/c/openstack/diskimage-builder/+/814074 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: centos7 : drop functional testing https://review.opendev.org/c/openstack/diskimage-builder/+/814075 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: functests: drop minimal tests in the gate https://review.opendev.org/c/openstack/diskimage-builder/+/814078 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Remove extras job, put gentoo job in gate https://review.opendev.org/c/openstack/diskimage-builder/+/814079 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Simplify functests job https://review.opendev.org/c/openstack/diskimage-builder/+/814080 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Run functional tests on Debian Bullseye https://review.opendev.org/c/openstack/diskimage-builder/+/814081 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Update centos element for 9-stream https://review.opendev.org/c/openstack/diskimage-builder/+/806819 | 22:40 |
opendevreview | Ian Wienand proposed openstack/diskimage-builder master: Remove py35 tox jobs https://review.opendev.org/c/openstack/diskimage-builder/+/814680 | 22:40 |
clarkb | fungi: oh cool you figured it out. I had a eureka moment on my bike ride. I think that when we build our images we do so with old pip but when we run tox we do so with newer pip and that made the bindep jobs pass | 22:40 |
clarkb | fungi: I think my change is correct given that | 22:40 |
clarkb | now why did it retry limit in the gate on tox py35 | 22:41 |
clarkb | its a chicken and egg issue I think | 22:41 |
clarkb | fungi: maybe we manually test it in a docker container and if that works force merge then do a release? | 22:42 |
ianw | clarkb: yeah, its actually the pip in "python -m venv" -- which must be vendored? it's 8 and even the system one is 9 | 22:42 |
ianw | clarkb: i think we just need to rebuild images with https://review.opendev.org/c/openstack/project-config/+/814677 | 22:42 |
clarkb | ianw: iirc we override it in zuul jobs to be 9 out of our ppa but in dib builds we dont do that and get 8 | 22:42 |
clarkb | ah yup I think that will do it too | 22:43 |
clarkb | however it may struggle to update to latest pip for the same reason | 22:43 |
clarkb | we might have to do it in two passes. pip install -U pip<someknowngoodver && pip install -U pip | 22:44 |
clarkb | ianw: ^ I expect that will be necessary | 22:44 |
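A minimal sketch of the two-step upgrade being discussed, assuming a xenial image where the venv-vendored pip is still 8.x; the `python3 -m pip` form is an assumption about how the image-build scripts would invoke it.

```bash
# step 1: pip 8.x ignores Requires-Python metadata, so pin below 21 to avoid
#         pulling in a pip release that has dropped Python 3.5
python3 -m pip install -U 'pip<21'
# step 2: the now-capable pip resolves "latest" correctly; on 3.5 it stays on
#         the 20.x series, on newer interpreters it moves to the real latest
python3 -m pip install -U pip
```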
corvus | clarkb, fungi, ianw, mordred: i'm starting to think that zuul may need larger test nodes to run its unit tests. | 22:44 |
corvus | have we (opendev) thought about expanding the options for test node sizes? | 22:45 |
clarkb | corvus: we do have larger labels available. Maybe give them a go and see if it helps? | 22:45 |
ianw | clarkb: i'm not sure i remember why updating from 8 failed? | 22:45 |
clarkb | corvus: we've already done it a couple of years ago, it's just that the availability of those nodes is more limited | 22:45 |
corvus | are they multi-region? i thought maybe it was just one region for one project | 22:45 |
fungi | and if it does, we can think about how we might roll those out more broadly | 22:45 |
clarkb | corvus: they are multiregion. airship and vexxhost iirc. | 22:45 |
fungi | they're in at least two regions | 22:45 |
corvus | airship=citycloud? | 22:46 |
clarkb | corvus: yes | 22:46 |
clarkb | and when we had the donnyd basement cloud we had them there too | 22:47 |
ianw | clarkb: i think we only upgraded to 9 because of mirror issues, which wouldn't affect the dib build https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-pip/tasks/xenial.yaml#L1 | 22:47 |
ianw | (sorry i know we have two conversations going on :) | 22:47 |
clarkb | ianw: but 8 doesn't do the package metadata which is necessary to install the newest pip that supports python3.5 I think | 22:47 |
corvus | [as an aside, i think 2 things are at play: 1) zuul and openstack have different constraints/goals/etc; 2) even in openstack's case, it's probably reasonable to reconsider whether "standard issue laptop from 12 years ago" is still the right baseline for unit tests :) ] | 22:47 |
clarkb | ianw: ya pip 21 and newer doesn't support python3.5 | 22:48 |
clarkb | corvus: I think the broader issue is that getting these resources to fit into clouds is difficult. The bigger the nodes, the less throughput we can offer | 22:48 |
clarkb | and that has detrimental effects in other ways | 22:48 |
fungi | well, not entirely true if the jobs run faster | 22:49 |
clarkb | fungi: for most of our workload (openstack) we are cpu constrained | 22:49 |
corvus | clarkb: ++ especially if the cloud tenant is designed specifically for one flavor | 22:49 |
clarkb | fungi: and usually we can get more memory but not more cpu | 22:49 |
fungi | clarkb: gotta wonder what percentage of overall node-hours are consumed by dog-slow jobs grinding in swap thrash though | 22:49 |
fungi | those could significantly skew the overall usage | 22:50 |
corvus | i feel like zuul-web is insufficiently responsive | 22:50 |
clarkb | ya thats fair its possible we could see cpu be freed for real work | 22:50 |
corvus | maybe because zuul-scheduler is busy? | 22:51 |
corvus | yep, it's better now. nm. | 22:51 |
clarkb | corvus: ya I agree it loads status but very slowly | 22:51 |
corvus | i think it's building a bunch of layouts | 22:51 |
clarkb | corvus: re zuul specifically are you finding that it is memory constrained or cpu or both? or maybe disk? | 22:51 |
fungi | it would be an interesting experiment to switch to 16gb flavors everywhere, but we'd probably be unable to roll it back since a bunch of projects would unknowingly merge changes which consumed a lot more ram | 22:51 |
clarkb | disk is probably the most difficult of the bunch to address | 22:51 |
corvus | is ubuntu-bionic-32GB what i'm looking for? | 22:52 |
fungi | corvus: we should probably add a focal version of those too | 22:52 |
clarkb | ianw: I just confirmed on an ubuntu xenial container that we'll have to two-step update pip | 22:53 |
ianw | clarkb: yep, me too :) just fixing, a great observation :) | 22:53 |
clarkb | corvus: I think the flavors are called -expanded | 22:53 |
clarkb | for 16gb and then there is a 32gb flavor which is also available | 22:53 |
clarkb | might be good to check against both? | 22:53 |
corvus | yep. -32GB is only 1 region | 22:54 |
corvus | oh wait, ubuntu-bionic-expanded is also only 1 region | 22:54 |
clarkb | hrm when we did the vexxhost stuff did we not add them to the existing pools /me looks | 22:54 |
corvus | there's a ubuntu-bionic-expanded-vexxhost | 22:55 |
clarkb | ya ok so we did split them up like that hrm | 22:55 |
clarkb | I think normalizing that better is a reasonable thing to do | 22:55 |
corvus | but it's also one region. so maybe there's confusion thinking that ubuntu-bionic-expanded is 2x, but really it's ubuntu-bionic-expanded x1 and ubuntu-bionic-expanded-vexxhost x1 | 22:55 |
clarkb | corvus: yup exactly. I think we could do two regions but haven't | 22:55 |
corvus | okay. i'll come up with something. gimme a few mins | 22:55 |
opendevreview | Ian Wienand proposed openstack/project-config master: infra-package-needs: install latest pip https://review.opendev.org/c/openstack/project-config/+/814677 | 22:56 |
clarkb | ianw: pip install -U pip<21 && pip install -U pip? | 22:56 |
ianw | i can make it more like that if you like | 22:56 |
clarkb | ianw: no your change is fine. I did suggest maybe using <21 in the xenial case but I seriously doubt we'll get a new 20.x release | 22:57 |
corvus | what's the purpose of ubuntu-bionic-vexxhost ? it's just an 8g node; seems the same as ubuntu-bionic | 22:58 |
ianw | corvus: i have some feeling that may be for kvm nesting? | 22:59 |
corvus | oh, like it's just "get me a bionic node on vexxhost because they have kvm"? | 22:59 |
corvus | nested virt | 22:59 |
clarkb | corvus: I think thats actually the big memory flavor with 8vcpus not 8gb memory | 23:00 |
clarkb | corvus: you should double check with the cloud | 23:00 |
clarkb | this really could use some normalizing and maybe comments to explain the different cloud flavors since their choices don't necessarily mimic our choices in naming scheme | 23:01 |
clarkb | nested-virt-ubuntu-focal <- that might actually be a big memory server | 23:01 |
clarkb | ah but it isn't in other clouds so that would be the issue. We want a new label using that flavor name and the -expanded theme on our side | 23:01 |
fungi | yes, part of why we switched vexxhost nodes out of our normal pool is their 8vcpu flavor switched to coming with 32gb ram, and then a zuul change was merged after passing testing on one of those which used more memory causing it to no longer work on 8gb nodes, so we isolated them to nonstandard labels | 23:02 |
corvus | (was zuul-operator, but yeah) | 23:03 |
fungi | ahh, yep sorry | 23:03 |
corvus | and yeah, v3-standard-8 seems to be 8cpu 32gb ram | 23:03 |
clarkb | fungi: running https://review.opendev.org/c/opendev/bindep/+/814647 locally in a xenial container works and the non py35 jobs passed on that change. I think we can force merge and make a release of that | 23:04 |
clarkb | fungi: but I'll defer to you on that since I wrote the change | 23:04 |
fungi | vexxhost's cpu:ram ratio on their hardware is apparently ~1cpu:1gb, so they wanted to align their flavors to better fit the systems | 23:04 |
fungi | clarkb: so you're sure >3.5 is correct and doesn't need to be >=3.6 instead? | 23:05 |
fungi | (per my inline comment on it) | 23:05 |
clarkb | it seems to have worked but that is a good point. Let me update it so that we don't have human confusion at least | 23:06 |
fungi | i thought we'd used >= elsewhere over concerns that >3.5 would still match 3.5.x versions | 23:06 |
fungi | clarkb: your xenial container had distro-supplied pip? (9.something was it?) | 23:07 |
clarkb | packaging ; python_version >= '3.6' and packaging<21.0 ; python_version < '3.6' | 23:07 |
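Written out as separate lines (a hypothetical rendering of the markers clarkb pastes above), the split requirement would look like this in a requirements file:

```bash
# hypothetical requirements.txt fragment, expanded for readability
cat <<'EOF'
packaging ; python_version >= '3.6'
packaging<21.0 ; python_version < '3.6'
EOF
```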
clarkb | fungi: no I had to do the two pass thing I described above before installing tox | 23:07 |
clarkb | apt-get install python3-pip && pip3 install -U 'pip<21' && pip install -U pip && pip install tox | 23:08 |
opendevreview | Clark Boylan proposed opendev/bindep master: Add old python packaging pin https://review.opendev.org/c/opendev/bindep/+/814647 | 23:09 |
clarkb | fungi: ^ like that? | 23:09 |
fungi | ahh, okay. in that case you likely had new enough pip that it wouldn't have downloaded packaging 21.0 anyway, right? | 23:09 |
fungi | i thought the problem was you needed old pip without python_requires metadata support in order to trigger it | 23:10 |
ianw | yeah i'm not sure we need the pin, we just need up-to-date pip's? | 23:10 |
fungi | because newer pip knows not to install a version of packaging which says it won't work with python 3.5 | 23:10 |
clarkb | oh right so in my test I need to downgrade pip back to 8 | 23:11 |
clarkb | the reason I updated was I needed to install tox | 23:11 |
clarkb | I think making bindep work with older pip is a reasonable thing given its position in bootstrapping things | 23:11 |
opendevreview | James E. Blair proposed openstack/project-config master: Add ubuntu-bionic-32GB to vexxhost-specific https://review.opendev.org/c/openstack/project-config/+/814683 | 23:11 |
clarkb | other tools I wouldn't worry too much | 23:11 |
corvus | clarkb, fungi, ianw: ^ that's the minimal change to get a 'large node' on 2 clouds. | 23:12 |
fungi | right, i'm good with the change, just pointing out the hole in the test methodology | 23:12 |
clarkb | infra-root: on a hunch about slowness of things, re corvus' observation that zuul status was slow and my being told I couldn't resolve review.o.o locally, I checked our nsX servers and only ns1 is running nsd | 23:12 |
clarkb | Let me finish up this bindep checking then I can look closer if no one else has addressed that yet | 23:12 |
fungi | i'll look at the nameservers now | 23:13 |
corvus | i confess, i'm still not sure whether that should be in the "main" pool which holds the "nested-virt-*" labels, or the "vexxhost-specific" pool which holds the "vexxhost" labels | 23:13 |
opendevreview | James E. Blair proposed openstack/project-config master: Add ubuntu-bionic-32GB to vexxhost-specific https://review.opendev.org/c/openstack/project-config/+/814683 | 23:13 |
clarkb | corvus: I think the main pool | 23:14 |
fungi | systemctl status says nsd on ns2 crashed on 2021-08-02 at 01:45:31 UTC (2 months 18 days ago), so looks like we probably no longer have a log of why | 23:14 |
corvus | clarkb: why's that? the big ones are in the vexxhost-specific pool | 23:15 |
fungi | uptime for ns2 is 78 days, which looks suspiciously similar | 23:15 |
corvus | "journalctl -fu nsd" says it failed to start but not why | 23:15 |
fungi | i think this means the server was rebooted and nsd crashed during boot? | 23:15 |
fungi | this may have been during vexxhost server migrations? | 23:16 |
fungi | (ns2 is in vexxhost) | 23:16 |
clarkb | corvus: I think vexxhost pool was done when our vexxhost tenant was wanting to try some stuff and didn't care about single region issues | 23:16 |
fungi | i vaguely recall those were going on around that time | 23:17 |
clarkb | corvus: I suspect that now we can fold the vexxhost specific stuff into main since we're doing that normally now | 23:17 |
corvus | clarkb: okay i'll put it in main | 23:17 |
opendevreview | James E. Blair proposed openstack/project-config master: Add ubuntu-bionic-32GB https://review.opendev.org/c/openstack/project-config/+/814683 | 23:17 |
fungi | i manually stopped and started nsd with systemctl and it's running now | 23:17 |
clarkb | if my container hadn't said "I can't resolve this" I wouldn't have thought to check the nsds | 23:18 |
ianw | clarkb: if you can look at https://review.opendev.org/c/openstack/diskimage-builder/+/814680 to remove py35 jobs on dib when all this over that would be good too | 23:18 |
clarkb | fungi: corvus: do we need to force a replication from adns now? | 23:19 |
clarkb | otherwise ns2 might serve old stale data? | 23:19 |
clarkb | ianw: can do | 23:19 |
fungi | ns1 and ns2 are serving the same serial on the opendev.org soa | 23:19 |
fungi | so i think it's all good now? | 23:19 |
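The serial comparison fungi mentions can be spot-checked with something like the following (a hedged sketch; zone and server names match the ones discussed above):

```bash
# compare the SOA serial served by each authoritative nameserver
dig +short opendev.org SOA @ns1.opendev.org
dig +short opendev.org SOA @ns2.opendev.org
```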
fungi | #status log Manually restarted nsd on ns2.opendev.org, which seems to have failed to start at boot | 23:19 |
opendevstatus | fungi: finished logging | 23:20 |
fungi | clarkb: i'll check the logs, but nsd ought to be smart enough to not take requests until after it checks serials on its zones against adns1 and initiates any necessary zone transfers | 23:21 |
clarkb | ya if the serial is the same we should be good | 23:21 |
clarkb | heh and now is paste unhappy? | 23:22 |
clarkb | there it goes maybe my local resolver hasn't figured out ns2 is happy again | 23:22 |
fungi | probably the usual db socket timeout | 23:22 |
corvus | my git review running right now is very slow | 23:25 |
clarkb | I have pretty high packet loss to paste.o.o | 23:25 |
clarkb | I think that explains it | 23:25 |
clarkb | ping to review is fine though so I don't know if that explains a slow git review | 23:26 |
clarkb | fungi: ianw https://gist.github.com/cboylan/a14e3458f187ccd3561c8fe96b82509b that should be a better test of the bindep install with pip 8.1.1 | 23:27 |
corvus | seems better now. :/ | 23:27 |
ianw | clarkb: so pbr is really in the same boat? | 23:28 |
clarkb | ianw: pbr is in a different boat :( pbr is a setup_requires which gets installed by easy_install. easy_install doesn't support SNI on xenial (and maybe bionic? I don't remember how far back that went) and pypi is SNI only on their CDN now | 23:30 |
clarkb | ianw: pip does do SNI even when old like that so you have to install pbr first then install other things that use pbr :( | 23:30 |
clarkb | does anyone else have trouble getting to paste? | 23:30 |
clarkb | I'm wondering if we're going to get a message from rax saying the host it is on had trouble and got rebooted or if this is specific to me | 23:30 |
clarkb | via ipv4 fwiw | 23:30 |
fungi | i'm getting no icmp6 echo replies | 23:31 |
ianw | agree from .au too | 23:31 |
fungi | nor can i ping it over ipv4 | 23:31 |
fungi | oh, intermittent response now | 23:32 |
fungi | 55.5556% packet loss | 23:32 |
fungi | i'll see if i can get to the console for it | 23:32 |
fungi | mmm, i was able to ssh in just now | 23:32 |
fungi | but extremely laggy | 23:32 |
fungi | 23:33:00 up 28 days, 18:47, 1 user, load average: 0.00, 0.00, 0.00 | 23:33 |
fungi | so i don't think the server is being hammered | 23:33 |
ianw | i just pulled up a console and there's nothing but a login prompt that responds on it | 23:33 |
fungi | likely network upstream from the instance has many sads | 23:33 |
fungi | no ticket from rackspace about any issues yet though | 23:34 |
clarkb | I guess we wait it out for a bit then? | 23:36 |
ianw | i'm not sure i was aware of that pbr snafu | 23:38 |
clarkb | I've just responded to a user on openstack-discuss that cannot login to gerrit because they are trying to login with a new openid that is associated with an old gerrit account and gerrit won't create a new account with conflicting ids | 23:39 |
clarkb | Just a heads up here as I've asked them to reach out on IRC as its a bit easier to debug this stuff interactively | 23:39 |
clarkb | but I did make a couple of suggestions for how we can proceed in the email should it come up when I am not around | 23:40 |
clarkb | ianw: I +2'd the dib removal of py35 jobs but left a note | 23:43 |
clarkb | ianw: we very intentionally made pbr continue to support really old stuff because setup_requires also can't effectively pin deps | 23:45 |
clarkb | ianw: which basically means you always get the latest thing even on the oldest system you've got | 23:45 |
clarkb | we really have to be careful with pbr to not add a bunch of fancy new python stuff | 23:45 |
fungi | or to sufficiently guard it so that it only gets called on new enough python and has working fallbacks | 23:46 |
ianw | i forget why we added openstack-python3-wallaby-jobs instead of individual tox jobs ... to git blame! | 23:50 |
clarkb | fungi: note we still expect https://review.opendev.org/c/opendev/bindep/+/814647 to fail in the gate right? Are you just double checking it against the newer pythons? | 23:51 |