openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: install python2-pip on SuSE when required https://review.opendev.org/724777 | 01:55 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial https://review.opendev.org/724788 | 01:55 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing https://review.opendev.org/724776 | 01:55 |
ianw | does anyone know how the nb04 config started using ipv6 addresses for the zk hosts? i can't find any discussion on it afaics | 03:29 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: nodepool-base: Quote ipv6 literals for ZK hosts https://review.opendev.org/725160 | 03:48 |
ianw | mordred / infra-root: ^ i see now we're overwriting the zk hosts; either this or https://review.opendev.org/#/c/725157/ or both should get builders connected again | 03:50 |
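[Editor's note: the quoting matters because an unbracketed IPv6 literal is ambiguous once a port gets appended to it. A rough illustration only; the addresses and config path below are assumptions, not taken from the change itself:

    # host:port strings built from bare IPv6 literals are ambiguous:
    #   2001:db8::1:2181      <- is 2181 the port, or part of the address?
    #   [2001:db8::1]:2181    <- the bracketed form is unambiguous
    # quick check that the generated config on nb04 carries the fixed form
    # (config path is the conventional nodepool location, an assumption here):
    grep -A 3 zookeeper-servers /etc/nodepool/nodepool.yaml
]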
*** ykarel|away is now known as ykarel | 04:19 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: install python2-pip when running under Python 2 https://review.opendev.org/724777 | 04:46 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial https://review.opendev.org/724788 | 04:46 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing https://review.opendev.org/724776 | 04:46 |
*** ysandeep|afk is now known as ysandeep | 05:20 | |
*** dpawlik has joined #opendev | 06:04 | |
*** dpawlik has quit IRC | 06:04 | |
ianw | btw virtualenv is broken on centos-7 : see https://github.com/pypa/virtualenv/issues/1810 | 06:06 |
ianw | this makes testing of zuul-jobs related things fail, so for today i give up | 06:07 |
*** dpawlik has joined #opendev | 06:07 | |
*** dpawlik has quit IRC | 06:07 | |
*** dpawlik has joined #opendev | 06:08 | |
*** ykarel is now known as ykarel|afk | 06:21 | |
*** rchurch has quit IRC | 06:31 | |
*** rchurch has joined #opendev | 06:32 | |
*** rpittau|afk is now known as rpittau | 06:33 | |
*** ykarel|afk is now known as ykarel | 06:37 | |
*** DSpider has joined #opendev | 06:53 | |
*** tosky has joined #opendev | 07:32 | |
*** sshnaidm|off is now known as sshnaidm | 07:33 | |
dpawlik | hi. Is everything OK with mirroring centos and fedora? It seems like mirror.centos and mirror.fedora were last updated 6 days ago http://grafana.openstack.org/d/ACtl1JSmz/afs?orgId=1 | 08:01 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Fix fetch-sphinx-tarball fails https://review.opendev.org/725210 | 08:09 |
*** roman_g has joined #opendev | 08:10 | |
*** ysandeep is now known as ysandeep|lunch | 08:35 | |
*** dtantsur|afk is now known as dtantsur | 08:43 | |
*** roman_g has quit IRC | 08:50 | |
jrosser | i am also having trouble with centos jobs where i get conflicting packages trying to install git-daemon http://paste.openstack.org/show/793031/ | 09:01 |
AJaeger | I see el_7_7 and el_7_8 in there- was CentOS 7.8 released and we didn't mirror completely? | 09:03 |
*** roman_g has joined #opendev | 09:03 | |
AJaeger | infra-root, please see jrosser's and dpawlik's comments on centos and fedora mirroring | 09:04 |
jrosser | AJaeger: from my very brief poke at this a couple of days ago it didn't look like the git-daemon package i need was present in the place we mirror from | 09:06 |
jrosser | and yes it seems like an incomplete mix of 7.7 and 7.8 | 09:07 |
dpawlik | ack | 09:08 |
*** ykarel is now known as ykarel|lunch | 09:22 | |
jrosser | oops i mean git-daemon package _wasnt_ present | 09:23 |
*** lpetrut has joined #opendev | 09:25 | |
*** Dmitrii-Sh has joined #opendev | 09:25 | |
*** roman_g has quit IRC | 09:38 | |
*** panda|ruck is now known as panda|pto | 09:40 | |
*** roman_g has joined #opendev | 09:55 | |
*** ralonsoh has joined #opendev | 10:03 | |
*** rpittau is now known as rpittau|bbl | 10:14 | |
*** ykarel|lunch is now known as ykarel | 10:32 | |
*** ysandeep|lunch is now known as ysandeep | 10:47 | |
*** kevinz has quit IRC | 11:04 | |
*** olaph has joined #opendev | 11:05 | |
AJaeger | infra-root, donnyd , any idea what's up with openedge? http://mirror.us-east.openedge.opendev.org/ looks down. See also https://zuul.opendev.org/t/openstack/build/65ac4493a586466781744c093ad63392 | 11:12 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Disable openedge https://review.opendev.org/725234 | 11:15 |
AJaeger | proposal to disable for now ^ | 11:15 |
donnyd | hrm... everything else is working fine | 11:17 |
donnyd | checking now | 11:17 |
donnyd | interesting... the mirror node was magically shut down | 11:18 |
AJaeger | infra-root, today's fires that I'm aware of: 1) http://mirror.us-east.openedge.opendev.org/ down ; 2) virtualenv on CentOS broken, new virtualenv release is out, we need new nodepool images; 3) CentOS 7 and Fedora mirrors are old, CentOS has a partial update to 7.8 and needs fixing | 11:19 |
AJaeger | donnyd: thanks for looking! | 11:19 |
donnyd | infra-root the mirror at OE is fixed. the machine got shut down somehow | 11:19 |
AJaeger | donnyd: thanks! So, that problem was solved quickly | 11:21 |
AJaeger | #status log mirror.us-east.openedge.opendev.org was down, donnyd restarted the node and openedge should be fine again | 11:21 |
donnyd | 9 minutes isn't too bad of a turn around time | 11:21 |
openstackstatus | AJaeger: finished logging | 11:21 |
AJaeger | donnyd: 9 minutes is excellent ;) | 11:22 |
donnyd | AJaeger: yea I logged into the project and the instance was in "shutdown" | 11:22 |
donnyd | idk how.. but anyways its back online now | 11:22 |
*** ysandeep is now known as ysandeep|brb | 11:36 | |
*** ysandeep|brb is now known as ysandeep | 11:52 | |
*** rpittau|bbl is now known as rpittau | 12:22 | |
*** ykarel is now known as ykarel|afk | 12:38 | |
*** hashar has joined #opendev | 12:53 | |
*** sgw has joined #opendev | 13:01 | |
ttx | hey everyone... was Gerrit restarted since we merged | 13:03 |
ttx | https://review.opendev.org/#/c/718478/ | 13:03 |
ttx | (Apr 29 22:52) | 13:03 |
ttx | need to know if I can start moving things around on the GitHub side | 13:04 |
frickler | ttx: I still see replication to github in the log, so I'd assume not | 13:25 |
AJaeger | ttx, it was not | 13:26 |
ttx | ok thanks! Keep me posted when it is :) | 13:26 |
*** hashar has quit IRC | 13:33 | |
*** ralonsoh has quit IRC | 13:37 | |
*** lpetrut has quit IRC | 13:49 | |
*** ralonsoh has joined #opendev | 13:49 | |
corvus | i'm looking into the centos mirror issues | 13:58 |
corvus | it looks like the volume is locked but no actual release transaction is in progress | 13:58 |
corvus | it looks like it was updating afs02.dfw when it stopped | 14:00 |
fungi | so not likely related to the 2020-04-28 afs01.dfw outage | 14:05 |
corvus | fungi: oh, that probably was it actually. the release command is run on afs01.dfw | 14:07 |
corvus | i'm going to start a screen session on afs01.dfw, grab the mirror lock, unlock the afs volume, and start a release | 14:10 |
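[Editor's note: a sketch of the manual recovery corvus describes. The vos commands and the mirror.centos volume name match the rest of this log; the flock path used to keep the periodic mirror-update job out of the way is an assumption:

    screen -S mirror-recovery
    # hold the mirror-update lock while running the release by hand
    flock /var/run/centos-mirror.lock bash -c '
      vos unlock mirror.centos -localauth
      vos release mirror.centos -localauth -verbose
    '
]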
fungi | if memory serves, it died in such a way that it was hanging clients rather than causing them to fail over to the other server, but due to a kernel panic (presumed to be from a host migration problem) it had to be rebooted | 14:10 |
*** ykarel|afk is now known as ykarel | 14:10 | |
fungi | so makes sense that the vos release command may have hung waiting for the server to respond | 14:11 |
corvus | fungi: sorry, the vos release command was *issued* on afs01.dfw; it crashed in mid process. so in this case it's afs02 waiting for afs01 to tell it to finish and unlock. | 14:12 |
corvus | basically the reverse | 14:12 |
fungi | oh, interesting | 14:13 |
fungi | if that happened before/during the afs02 reboot, i wouldn't expect afs02 to think it was waiting on anything there, but maybe it's more stateful than i realize | 14:13 |
corvus | afs02 has been up 167 days | 14:14 |
corvus | This is a completion of a previous release | 14:15 |
corvus | Starting ForwardMulti from 536870962 to 536870962 on afs02.dfw.openstack.org (full release). | 14:15 |
corvus | that's in progress now | 14:16 |
*** ysandeep is now known as ysandeep|brb | 14:18 | |
fungi | oh, wait, it was afs01.dfw which got rebooted, caffeine not connecting this morning i guess | 14:20 |
fungi | yeah, so i guess maybe it was in the middle of that when it died | 14:20 |
corvus | infra-root: i think that of AJaeger's 3 fires: #1 is done; #3 is in progress; that leaves #2 -- centos nodepool images | 14:33 |
corvus | before i just poke nodepool to make new images -- does anyone understand why a new release of virtualenv would cause our existing centos images to break? | 14:38 |
corvus | oh https://github.com/pypa/virtualenv/issues/1810 | 14:40 |
corvus | so it looks like there are 2 new releases at issue | 14:41 |
corvus | .19 bad, and is what is on our current images | 14:41 |
corvus | .18 and .20 good | 14:41 |
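[Editor's note: until rebuilt images land, a hedged local workaround is simply to exclude the bad release (version numbers are the ones named above):

    # avoid the broken 20.0.19 release; 20.0.18 and 20.0.20 are reported good
    pip install 'virtualenv!=20.0.19'
    virtualenv --version
]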
corvus | #status log unlocked centos mirror openafs volume and manually started release | 14:45 |
openstackstatus | corvus: finished logging | 14:45 |
corvus | #status log deleted centos-7-0000124082 image to force rebuild with newer virtualenv | 14:45 |
openstackstatus | corvus: finished logging | 14:45 |
*** hashar has joined #opendev | 14:45 | |
*** panda|pto has quit IRC | 14:46 | |
corvus | infra-root: that should mean that all 3 issues are being addressed | 14:47 |
corvus | centos-7-0000124083 is the replacement dib image | 14:47 |
corvus | building now | 14:47 |
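[Editor's note: for reference, forcing a rebuild this way uses the nodepool CLI roughly as follows (image names taken from the log):

    nodepool dib-image-list | grep centos-7
    nodepool dib-image-delete centos-7-0000124082
    # a replacement build (centos-7-0000124083 here) should show up shortly
    nodepool dib-image-list | grep centos-7
]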
*** mlavalle has joined #opendev | 14:48 | |
*** hashar has quit IRC | 14:50 | |
*** panda has joined #opendev | 14:50 | |
fungi | ykarel: ^ | 14:52 |
*** ysandeep|brb is now known as ysandeep | 14:52 | |
ykarel | fungi, corvus Thanks | 14:53 |
AJaeger | corvus: thanks | 14:53 |
AJaeger | dpawlik, jrosser, FYI, CentOS 7 mirror should be up to date again. | 14:55 |
dpawlik | \o/ AJaeger | 14:56 |
dpawlik | thank you | 14:56 |
AJaeger | dpawlik: corvus did the work, I just passed messages around ;) | 14:57 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: DNM: trigger registry tests https://review.opendev.org/725294 | 14:57 |
*** olaph has quit IRC | 14:58 | |
dpawlik | AJaeger, ah | 14:58 |
dpawlik | so corvus++ | 14:58 |
dpawlik | corvus, AJaeger how often do you refresh data in grafana? Just curious because http://grafana.openstack.org/d/ACtl1JSmz/afs?orgId=1 still shows 6 days | 15:00 |
fungi | i just approved clarkb's 644432 fix which should correctly attempt to apply read-only settings to retired projects which were missing them | 15:01 |
fungi | that may run longer than usual | 15:01 |
clarkb | mordred: ^ I think that won't cause any issues other than timing out the manage-projects job potentially | 15:02 |
clarkb | in which case we can run it again I suppose | 15:02 |
clarkb | dpawlik: the grafana data is generated by a script; I expect corvus manually fixed things and the UDP packets for timing info weren't sent | 15:03 |
mordred | clarkb: ++ | 15:04 |
fungi | dpawlik: it also may not update until the image builds are completed | 15:04 |
dpawlik | clarkb, fungi thanks for explanation | 15:04 |
fungi | dpawlik: ykarel: the centos-7-0000124083 build corvus mentioned is getting logged at https://nb01.openstack.org/centos-7-0000124083.log (self-signed ssl cert, sorry) and once that's done, it still has to get uploaded to our providers which also takes a few minutes | 15:08 |
ykarel | fungi, Thanks | 15:10 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Do not use bare 'item' in build-container-image https://review.opendev.org/725298 | 15:11 |
openstackgerrit | Brian Haley proposed openstack/project-config master: Update Neutron grafana dashboard https://review.opendev.org/725299 | 15:12 |
*** ysandeep is now known as ysandeep|away | 15:13 | |
corvus | AJaeger, dpawlik: the release is still in progress; so i don't think the mirror is up to date yet | 15:23 |
dpawlik | corvus, ack | 15:24 |
clarkb | no lists ooms since robots.txt was updated | 15:27 |
clarkb | zuul scheduler looks like it might need a sigusr2 pair again | 15:28 |
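[Editor's note: the "sigusr2 pair" refers to zuul's SIGUSR2 handler, which dumps thread stack traces and toggles profiling when available, so the signal is sent once to start and once to stop. A sketch only; the container name is an assumption:

    sudo docker kill --signal=USR2 zuul-scheduler_scheduler_1
    sleep 60
    sudo docker kill --signal=USR2 zuul-scheduler_scheduler_1
]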
AJaeger | clarkb: want to restart gerrit some time so that we stop github replication? | 15:30 |
clarkb | AJaeger: maybe? I think the jeepyb thing above is to address some part of that (fungi and ttx would know more than I if we are ready to update gerrit yet) | 15:37 |
fungi | the gerrit restart would merely be to stop github replication, the jeepyb fix above isn't a blocker for that | 15:38 |
fungi | ttx is separately working on tooling to build project-config changes to set which repositories should be replicating via zuul jobs, but the current set they're applied to is fairly comprehensive | 15:39 |
openstackgerrit | Merged opendev/jeepyb master: Inspect all configs in manage-projects https://review.opendev.org/644432 | 15:43 |
*** odyssey4me has joined #opendev | 15:44 | |
*** ykarel is now known as ykarel|away | 15:47 | |
*** diablo_rojo has joined #opendev | 15:47 | |
*** hashar has joined #opendev | 15:51 | |
*** dpawlik has quit IRC | 15:59 | |
*** rpittau is now known as rpittau|afk | 16:08 | |
openstackgerrit | Merged zuul/zuul-jobs master: go: Use 'block: ... always: ...' and failed_when instead of ignore_errors https://review.opendev.org/723643 | 16:15 |
openstackgerrit | Merged zuul/zuul-jobs master: ara-report: use failed_when: false instead of ignore_errors: true https://review.opendev.org/723644 | 16:17 |
clarkb | fungi: mordred ^ it doesn't look like the jeepyb change landing caused infra-prod-manage-projects to run. Maybe that means we can run it manually without a timeout? | 16:18 |
*** smcginnis has quit IRC | 16:19 | |
openstackgerrit | Merged zuul/zuul-jobs master: fetch-subunit-output: use failed_when: instead of ignore_errors: https://review.opendev.org/723653 | 16:20 |
clarkb | also where are we with zuul python3.8 images because if they aren't close maybe we should write an hourly cron to sigusr2 zuul :/ | 16:20 |
fungi | clarkb: good idea, i can do that after my current meeting maybe | 16:22 |
clarkb | fwiw I'll plan to sigusr2 zuul after my meeting | 16:22 |
clarkb | to hopefully reset the current trend | 16:22 |
mordred | clarkb: latest zuul images should be on 3.8 | 16:24 |
mordred | clarkb: https://review.opendev.org/#/c/724908/ landed - so restarting with a pull should have us on 3.8 | 16:24 |
clarkb | cool so maybe this sigusr2 is the last one we need and we can schedule a restart to see if 3.8 is any better | 16:25 |
mordred | yeah | 16:25 |
mordred | clarkb: we also need a gerrit restart to pick up the github repl change | 16:25 |
mordred | so maybe we do them around a similar time | 16:25 |
clarkb | mordred: we should double check that the change that landed wasn't affected by the docker image promotion bugs we were working through recently | 16:25 |
clarkb | https://hub.docker.com/r/zuul/zuul-scheduler/tags I don't know how to map that back to a change | 16:26 |
mordred | clarkb: just click through on the sha: https://hub.docker.com/layers/zuul/zuul-scheduler/latest/images/sha256-80d80631a2ce593db67e3f4827bf3a22bf2152f8a160467e869aa0713b305ccb?context=explore | 16:28 |
mordred | and at least in this case you can see it built with 3.8 | 16:28 |
*** smcginnis has joined #opendev | 16:28 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Do not use bare 'item' in build-container-image https://review.opendev.org/725298 | 16:30 |
mordred | corvus: ooh - maybe we should add a label to the images when we build them to tie them back to a change - docker build has a --label option to add additional ones at build time | 16:30 |
corvus | mordred: ++ | 16:31 |
mordred | corvus: maybe one for the change, and maybe one for the git sha of the change itself (not the merge commit) - and then maybe just one that says "built by zuul.opendev.org" or something | 16:32 |
corvus | sgtm | 16:32 |
corvus | mordred: ooh | 16:32 |
corvus | mordred: could we put in a url to the build page? | 16:32 |
*** panda is now known as panda|pto | 16:35 | |
mordred | corvus: yeah | 16:37 |
mordred | corvus: we can put anything we want to :) | 16:37 |
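[Editor's note: what the idea being discussed would look like at build time. The label keys and values below are illustrative, not the ones the eventual change uses:

    docker build \
      --label "org.zuul-ci.change=725339" \
      --label "org.zuul-ci.change_url=https://review.opendev.org/725339" \
      --label "org.zuul-ci.build_url=https://zuul.opendev.org/t/zuul/build/<uuid>" \
      -t example/image:latest .

    # reading the labels back later:
    docker image inspect --format '{{json .Config.Labels}}' example/image:latest
]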
mordred | corvus: what's the best way to get the build url in a job? | 16:47 |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul-jobs master: Enable yamllint https://review.opendev.org/725091 | 16:48 |
clarkb | infra-root on closer examination the bulk of the memory use on zuul01 is zuul-web (17GB ish) not zuul-scheduler (3GB ish) | 16:50 |
clarkb | restarting zuul-web is easy compared to the scheduler; can I just restart that one on the new python3.8 image? | 16:51 |
fungi | and sending any signals to zuul-web seems to result in stopping the process, right? | 16:51 |
clarkb | mordred: ^ | 16:51 |
corvus | clarkb: seems like we may have disproven the theory about having a memleak in latest cherrypy | 16:51 |
clarkb | fungi: yes, https://review.opendev.org/#/c/724946/ is related | 16:51 |
fungi | i expect just restarting it should be fine, even on a different python version | 16:51 |
fungi | ooh, you found the handler problem i guess | 16:52 |
clarkb | looks like tobiash has a good suggestion I need to consider | 16:53 |
corvus | mordred: unsure; there's a build.uuid variable; and the artifact promote job does some stuff with the api | 16:53 |
clarkb | corvus: I guess? it could be an interaction with cherrypy and newer python since the sigusr2 seemed to unstick the scheduler | 16:53 |
corvus | clarkb: we're running old-cherrypy on 3.7 though; much like we were before the container restarts | 16:54 |
clarkb | corvus: correct, but before we had old cherrypy + python 3.5 | 16:54 |
clarkb | I' | 16:54 |
clarkb | er | 16:54 |
corvus | oooh | 16:54 |
clarkb | I'm suggesting that python3.7 is the issue here too | 16:54 |
corvus | gotcha | 16:55 |
clarkb | which is why restarting it on 3.8 may be useful | 16:55 |
fungi | seems likely the same presumed gc issue could be impacting multiple daemons | 16:55 |
clarkb | fungi: yup | 16:55 |
corvus | clarkb: agreed; my point is that we have eliminated cherrypy alone as the cause | 16:55 |
clarkb | rgr | 16:55 |
mordred | corvus: download_artifact_api: "https://zuul.opendev.org/api/tenant/{{ zuul.tenant }}" | 16:56 |
mordred | corvus: we seem to just hardcode base api in the docs promote job | 16:56 |
clarkb | looking at docker image ls I think if I do cd /etc/zuul-web ; sudo docker-compose down && sudo docker-compose up -d we'll be running zuul-web on python3.8 | 16:57 |
corvus | mordred: since the build job is opendev specific, we can probably do that there | 16:57 |
corvus | clarkb: ++ | 16:57 |
*** dtantsur is now known as dtantsur|afk | 16:57 | |
corvus | clarkb: can't hurt to do an extra docker-compose pull before starting though | 16:57 |
clarkb | corvus: k | 16:58 |
mordred | corvus: the build-docker-image role isn't - and I think there's several generic things we can do | 16:58 |
mordred | corvus: lemme push up what I've got so far and we can go from there | 16:58 |
clarkb | alright I'll run a pull. down. then up -d in /etc/zuul-web now | 16:58 |
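[Editor's note: the restart sequence, spelled out; the final version check is an extra sanity step and the container name there is an assumption:

    cd /etc/zuul-web
    sudo docker-compose pull      # make sure the freshly promoted image is local
    sudo docker-compose down
    sudo docker-compose up -d
    sudo docker exec zuul-web_web_1 python3 --version   # expect 3.8.x
]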
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Support multi-arch image builds with docker buildx https://review.opendev.org/722339 | 16:58 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: DNM Check to see if images from intermediate work https://review.opendev.org/724751 | 16:58 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Write a buildkitd config file pointing to buildset registry https://review.opendev.org/724757 | 16:58 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Write buildkitd.toml in use-buildset-registry https://review.opendev.org/724837 | 16:58 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Add zuul labels to images and ability to user-define more https://review.opendev.org/725339 | 16:58 |
clarkb | #status log Restarted zuul-web and zuul-fingergw on new container images on zuul01. We were running out of memory due to leak in zuul-web which may be caused by python3.7 and new images provide python3.8 | 17:00 |
openstackstatus | clarkb: finished logging | 17:00 |
*** ralonsoh has quit IRC | 17:07 | |
clarkb | memory use on zuul01 has been steadily climbing since the restart. We'll have to wait and see if we plateau | 17:52 |
clarkb | mordred: fungi: what is the process for running that playbook manually? do we need to lock anything? | 17:54 |
clarkb | or maybe we can trigger the job directly in zuul (that would be subject to timeouts but wouldn't have lock issues) | 17:54 |
*** mrunge_ has joined #opendev | 17:55 | |
*** mrunge has quit IRC | 17:56 | |
fungi | clarkb: for manage-projects? i was expecting to just fire the command locally on review.o.o instead of from ansible | 17:57 |
fungi | using docker exec or however ansible has been calling it | 17:58 |
clarkb | fungi: ya and oh ya that will work | 17:58 |
mordred | clarkb: yeah - what fungi said | 17:58 |
clarkb | I guess my concern is that if we land projects.yaml updates we could have competing processes | 17:58 |
mordred | although you can also touch the lockfile on bridge and run the playbook from the system-config dir | 17:58 |
clarkb | config-core ^ maybe hold off on landing new projects until we run a manage-projects by hand | 17:58 |
mordred | clarkb: /home/zuul/DISABLE-ANSIBLE | 17:59 |
mordred | clarkb: touch that on bridge and it'll prevent jobs from running - they have an hour timeout - so they'll resume once you rm it | 17:59 |
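[Editor's note: so the whole pause/resume dance described here is just:

    # pause: queued deploy jobs wait on this file (for up to an hour each)
    touch /home/zuul/DISABLE-ANSIBLE
    # ... do the manual work ...
    # resume: waiting jobs pick back up once the file is gone
    rm /home/zuul/DISABLE-ANSIBLE
]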
clarkb | gotcha | 17:59 |
fungi | alternatively, we can put review.o.o in the temporary disable list | 18:03 |
clarkb | fungi: that might be simpler? | 18:04 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Write buildkitd.toml in use-buildset-registry https://review.opendev.org/724837 | 18:04 |
clarkb | fwiw I'm not planning on doing it since you volunteered, but did want to have a short think about whether or not we needed to put safety glasses on first | 18:04 |
fungi | yes, absolutely | 18:04 |
fungi | my last scheduled meeting of the day just wrapped up, so catching up on a few urgent conversations and will start on that | 18:05 |
mordred | no - don't put it in the disable list | 18:05 |
mordred | touch the lock file | 18:05 |
mordred | you need to stop zuul from doing the things | 18:05 |
mordred | it's what it's there for :) | 18:06 |
*** sshnaidm is now known as sshnaidm|afk | 18:07 | |
mordred | or - I guess - honestly it's probably fine - ignore me | 18:07 |
clarkb | it should also be fine if we don't approve any new project additions | 18:08 |
mordred | but just saying "touch /home/zuul/DISABLE-ANSIBLE" should cover all the bases and also not cause anything to run in a half-configured state | 18:08 |
clarkb | fungi: also as a sanity check you can run manage-projects against a specific project or three first before doing the whole list | 18:08 |
clarkb | fungi: maybe run it against a retired project that hasn't had an acl update and a non retired project and make sure we get the expected results? | 18:08 |
fungi | yeah, i can do that | 18:11 |
fungi | mordred: `touch /home/zuul/DISABLE-ANSIBLE` on bridge will stop zuul from deploying anything to any server though, right? do we need to worry about it missing events then for other stuff we'd have deployed from unrelated changes? | 18:12 |
clarkb | I never had breakfast. At this point I'll call it early lunch. Back in a bit | 18:13 |
fungi | i scarfed some lentil chips and hummus while in a meeting | 18:13 |
mordred | fungi: it backs up | 18:18 |
mordred | fungi: so - the first job zuul enqueues will wait for up to an hour for the file to go away (and the jobs behind it will just be queued up in zuul) | 18:18 |
mordred | fungi: we might start missing things if it's in place for more than an hour - but at that point we should probably be disabling hosts and stuff | 18:19 |
mordred | it's basically a big pause button | 18:19 |
fungi | oh, okay that helps | 18:20 |
fungi | at the mere push of a single button! | 18:22 |
fungi | the beautiful shiny button | 18:22 |
fungi | the jolly candy-like button | 18:22 |
* fungi can't hold out, no sir-ee | 18:23 | |
fungi | pushing it now | 18:23 |
fungi | #status log temporarily paused ansible deploys from zuul by touching /home/zuul/DISABLE-ANSIBLE on bridge.o.o | 18:24 |
openstackstatus | fungi: finished logging | 18:24 |
clarkb | fungi: mordred you may need to pull a new image too? | 18:24 |
fungi | i'll check | 18:25 |
fungi | we're baking jeepyb into the gerrit image, or installing it into a separate image? i'm supposing the former since some of the gerrit hook scripts call into it | 18:25 |
mordred | it's in the gerrit image | 18:27 |
fungi | and yeah, /usr/local/bin/manage-projects is a wrapper calling docker exec | 18:27 |
mordred | we should maybe also just make a jeepyb image | 18:27 |
mordred | and use that for manage-projects but with a similar set of volume mounts | 18:27 |
mordred | so that we can update jeepyb independent of gerrit | 18:28 |
mordred | fungi, clarkb : I may have missed a thing - we have a jeepyb update that we need for manage-projects? | 18:28 |
fungi | docker says jeepyb==0.0.1.dev467 # git sha 9d733a9 | 18:28 |
clarkb | mordred: yes the fix for updating retired projects | 18:29 |
fungi | er pbr freeze via docker run says that i mean | 18:29 |
mordred | yah - but try it via exec | 18:29 |
mordred | since that'll be what manage-projects does - run will make a new container | 18:29 |
mordred | exec will use the existing gerrit one | 18:29 |
mordred | I think atm we're going to need to restart the gerrit container to pick up that jeepyb change | 18:29 |
clarkb | oh I thought we did run not exec | 18:30 |
mordred | or - you could run the command in /usr/local/bin/manage-projects but replace exec with run (and add an --rm) | 18:30 |
fungi | i was running `exec docker run ... pbr freeze` via a copy of the manage-projects wrapper script yeah | 18:30 |
mordred | oh! we do do run | 18:30 |
mordred | yeah- nevermind me - I for some idiotic reason thought we were execing (and planning to fix that) | 18:31 |
mordred | you should be fine :) | 18:31 |
fungi | anyway, that commit is too old i think | 18:31 |
fungi | so maybe we didn't build a new gerrit image when the jeepyb change merged, or i need to pull it | 18:31 |
mordred | yeah- you likely need to docker pull - and if that doesn't work - then we missed building the image on jeepyb change | 18:32 |
fungi | 9d733a9 is the previous commit before the fix | 18:32 |
fungi | well, i may as well check the zuul builds page | 18:32 |
*** redrobot has joined #opendev | 18:33 | |
fungi | we built system-config-promote-image-gerrit-2.13 after that change merged, so i guess it's just the pull we need | 18:33 |
mordred | yeah. I concur | 18:33 |
fungi | mordred: is it really just `sudo docker pull` on review.o.o then? no additional arguments? or do i need to specify the image name? | 18:34 |
fungi | and that won't restart the running container processes, right? | 18:34 |
mordred | either ... | 18:34 |
mordred | it will not | 18:34 |
mordred | docker pull opendevorg/gerrit:2.13 | 18:34 |
mordred | or | 18:34 |
fungi | ahh, yeah it needs the image name | 18:34 |
mordred | cd /etc/gerrit-compose ; docker-compose pull | 18:35 |
fungi | running the latter now | 18:35 |
fungi | #status log manually pulled updated gerrit image on review.o.o for recent jeepyb fix | 18:35 |
openstackstatus | fungi: finished logging | 18:36 |
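[Editor's note: put together, the pull-and-verify step looks roughly like this. The pbr freeze invocation mirrors what fungi describes; whether the gerrit image's entrypoint passes the command straight through is an assumption:

    cd /etc/gerrit-compose && sudo docker-compose pull   # does not restart the running container
    # confirm the jeepyb commit baked into the refreshed image
    sudo docker run --rm opendevorg/gerrit:2.13 pbr freeze | grep jeepyb
]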
mordred | clarkb, fungi: incidentally: https://review.opendev.org/#/c/725339/ should add metadata to our images that will let us inspect and see what change they were built from | 18:36 |
fungi | jeepyb==0.0.1.dev469 # git sha ab498db | 18:36 |
mordred | fungi: that seems better | 18:36 |
fungi | well, ab498db does not appear in jeepyb's master branch history | 18:37 |
fungi | where did that come from? | 18:37 |
mordred | that'll be the merge commit on the executor | 18:37 |
fungi | yes, metadata would be awesome, especially in cases like this | 18:37 |
mordred | yup | 18:37 |
fungi | oh, right, executor made a different merge commit than gerrit did | 18:37 |
mordred | yah - so yeah, I think the labels are going to be super helpful :) | 18:38 |
fungi | so we can't really expect merge commit shas to match between promoted images and git history | 18:38 |
fungi | at least not until we can have zuul push merge commits into gerrit | 18:38 |
mordred | yah | 18:38 |
fungi | anyway, 0.0.1.dev469 is 0.0.1.dev467 + 2 | 18:39 |
fungi | so the fix plus the merge commit for it | 18:39 |
fungi | okay, so one example of a lagging retirement was https://review.opendev.org/#/admin/projects/openstack/fuel-devops | 18:40 |
fungi | https://opendev.org/openstack/project-config/src/branch/master/gerrit/projects.yaml#L2985-L2987 does say it should have a read-only config | 18:42 |
fungi | so i'll run `sudo manage-projects openstack/fuel-devops` next and see if the status changes | 18:43 |
fungi | clarkb: mordred: sound good? | 18:43 |
fungi | s/status/state/ | 18:43 |
mordred | fungi: yes | 18:44 |
fungi | running | 18:45 |
fungi | and finished | 18:45 |
fungi | state: read only | 18:45 |
fungi | yay!!! | 18:45 |
fungi | so now i'll just run `sudo manage-projects` i guess to do them all? | 18:45 |
fungi | maybe i'll start a root screen session | 18:45 |
mordred | fungi: ++ | 18:46 |
fungi | --verbose is likely to overrun the buffer. should i use it anyway? | 18:46 |
fungi | or 2>&1 | tee something? | 18:46 |
fungi | i'll do that | 18:47 |
fungi | this is staged in a root screen session on review.o.o: | 18:48 |
fungi | manage-projects --verbose 2>&1 | tee manage-projects.2020-05-04.retirements.log | 18:48 |
* mordred joins | 18:48 | |
mordred | fungi: I agree - you have staged that | 18:48 |
* mordred is ready when you are | 18:48 | |
fungi | running | 18:49 |
clarkb | fungi: thanks sorry lunch is distracting me | 18:49 |
fungi | clarkb: summary is we needed new gerrit image pulled as you guessed | 18:50 |
fungi | and that merge commit shas in our containers don't match git history because they're executor/merger-constructed | 18:50 |
fungi | but that your fix works | 18:50 |
mordred | fungi: it at least doesn't look angry | 18:50 |
fungi | and we're in the full run in a root screen session on review.o.o now | 18:51 |
mordred | fungi: why did we just clone deb-cinder? | 18:51 |
mordred | is that the thing we have to do to properly retire things? | 18:51 |
fungi | clarkb's fix is to loop over the full project list and not just the not-retired projects list | 18:52 |
clarkb | mordred: I think we maintain a cache of things and ya to retire we'd have to populate the cache if it were missing | 18:52 |
fungi | though maybe if we don't want to cache we can make it smarter so that it skips any repos with a read-only state | 18:52 |
mordred | maybe - but meh, probably for the best at this point | 18:53 |
clarkb | ya we could probably optimize it more? | 18:53 |
fungi | that would in theory make un-retiring harder, but in reality that needs manual intervention anyway due to a different catch-22 | 18:53 |
fungi | (can't push updated gerrit config to a read-only repo, so need to add an api call to set the desired state out of band) | 18:54 |
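[Editor's note: that out-of-band step would be a call to Gerrit's standard Set Config REST endpoint, roughly as below; the credentials and project name are illustrative:

    # flip a read-only project back to active so config pushes work again
    curl -u admin:SECRET -X PUT \
      -H 'Content-Type: application/json' \
      -d '{"state": "ACTIVE"}' \
      https://review.opendev.org/a/projects/openstack%2Fsome-project/config
]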
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Support multi-arch image builds with docker buildx https://review.opendev.org/722339 | 18:57 |
fungi | after this completes, i think i should run it a second time and time it so we can get a rough baseline for how long a normal triggered run should require | 18:58 |
fungi | after that, assuming no surprises, i can un-pause deployment jobs | 18:58 |
clarkb | fungi: ++ though I don't expect it will be much longer than before. The noop case is much faster | 18:58 |
fungi | right, also retired repositories are something like 10% of our total repo count | 18:59 |
fungi | so skipping them wasn't a huge time savings anyway | 18:59 |
mnaser | mordred: we started the constraints support inside python-builder but never got around to wrapping that up. do you think we can find time to work together on that? | 19:01 |
* mnaser is having problems building openstack images with python-builder due to that | 19:02 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Support multi-arch image builds with docker buildx https://review.opendev.org/722339 | 19:02 |
mordred | mnaser: yes we can | 19:03 |
mordred | mnaser: I think we finally got multi-arch containers sorted | 19:03 |
mnaser | mordred: nice. i've been stuck a bit on finding ways to work around the constraints stuff (for now to unblock) but its not sustainable in the long term | 19:04 |
mnaser | turns out msgpack 1.0.0 breaks a lot of things :) | 19:04 |
mordred | mnaser: hah | 19:04 |
mordred | yeah - turns out constraints are important | 19:04 |
mnaser | and that was a fun one to find too.. | 19:04 |
mordred | mnaser: lemme unfog my brain after all this multi-arch, then I'll start poking at constraints again | 19:05 |
openstackgerrit | Mohammed Naser proposed opendev/system-config master: python-builder: drop # from line https://review.opendev.org/725374 | 19:18 |
mnaser | mordred: ^ something i caught in the midst of all of this too | 19:18 |
openstackgerrit | Merged zuul/zuul-jobs master: Add zuul labels to images and ability to user-define more https://review.opendev.org/725339 | 19:19 |
fungi | i wonder if the depsolver in pip will obsolete the need for constraints lists in situations like this | 19:19 |
mordred | fungi: it's a good question - although constraints still allow for a central single point - whereas without constraints you'd have to make sure every consumer of msgpack had a version pin | 19:22 |
fungi | yep | 19:22 |
fungi | i mean, ideally they should if they're broken by it, but... | 19:22 |
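[Editor's note: the difference mordred is pointing at, in plain pip terms; the package names and versions are illustrative:

    # without constraints, every consumer has to carry its own pin:
    pip install 'msgpack<1.0.0' some-consumer another-consumer
    # with a central constraints file, one line there covers everything installed:
    pip install -c upper-constraints.txt some-consumer another-consumer
]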
clarkb | that was weird, had high packet loss to my irc bouncer for a minute or 5 | 19:22 |
mordred | clarkb: it wanted you to take some time off | 19:23 |
fungi | okay, the manage-projects run completed. i'll rerun it without --verbose and time it | 19:23 |
mordred | mnaser: hah. nice | 19:23 |
clarkb | we are still seeing zuul-web memory use climb but we are nowhere near danger yet | 19:23 |
clarkb | (unfortunately I think that may be pointing to python3.8 not fixing it) | 19:24 |
fungi | or the problem not actually being a regression in the interpreter | 19:24 |
fungi | real 0m5.165s | 19:24 |
fungi | i'd call that fast enough that i don't care how much slower it got | 19:24 |
clarkb | fungi: that's the first or second run? | 19:24 |
fungi | second | 19:25 |
clarkb | and ya that seems quick enough for our purposes | 19:25 |
fungi | anybody want to spot-check anything before i un-pause deployments? | 19:25 |
mordred | clarkb: next thing to try would be disabling jemalloc I think | 19:25 |
clarkb | mordred: ++ | 19:25 |
mordred | clarkb: we should be able to do that just by setting LD_PRELOAD to '' in the docker-compose file | 19:25 |
clarkb | mordred: ok that should be easy enough to do. Also packet loss coming back again :( | 19:28 |
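[Editor's note: once that change lands and zuul-web is restarted, a quick way to confirm jemalloc is actually out of the picture; the container name is an assumption:

    sudo docker exec zuul-web_web_1 sh -c 'env | grep LD_PRELOAD; grep -c jemalloc /proc/1/maps'
    # expect an empty or missing LD_PRELOAD and a jemalloc mapping count of 0
]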
*** roman_g has quit IRC | 19:29 | |
fungi | #status log deployments unpaused by removing /home/zuul/DISABLE-ANSIBLE on bridge.o.o | 19:32 |
openstackstatus | fungi: finished logging | 19:32 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Fix siblings support in python-builder https://review.opendev.org/715717 | 19:32 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add constraints support to python-builder https://review.opendev.org/713972 | 19:32 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Don't pull and retag in buildx workflow https://review.opendev.org/725380 | 19:39 |
mordred | mnaser: do you have any convenient way to verify that ^^ works? | 19:50 |
mordred | mnaser: also - I'm less sure about the siblings patch - we're apparently using siblings support in nodepool jobs, so I want to check in with ianw before we land that one | 19:51 |
mordred | I'm *pretty* sure it's right, and I think we might just be getting lucky in nodepool | 19:51 |
mnaser | mordred: i could build them locally.. i'm not using git right now to build things (for now): https://opendev.org/vexxhost/openstack-operator/src/branch/master/images/keystone/Dockerfile | 19:51 |
mnaser | mordred: ACTUALLY i have something | 19:51 |
mnaser | mordred: https://review.opendev.org/#/c/713975/ | 19:52 |
mordred | mnaser: oh - yeah - with a depends-on that should be a good test case | 19:52 |
mnaser | mordred: feel free to update that, i think we will need to copy upper-constraints from requirements though.. | 19:52 |
mnaser | or wget it in.. | 19:52 |
mordred | kk | 19:52 |
mordred | lemme update that patch | 19:52 |
mnaser | because upper-constraints.txt is not inside the repo | 19:53 |
*** roman_g has joined #opendev | 19:54 | |
mordred | mnaser: ok - updated that patch and added a pre-playbook to copy in the upper-constraints file | 19:56 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Clear LD_PRELOAD variable on zuul-web containers https://review.opendev.org/725384 | 20:02 |
clarkb | mordred: ^ I think that is what you were suggesting | 20:02 |
mordred | clarkb: yes - I think that's a great next thing to try | 20:13 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add constraints support to python-builder https://review.opendev.org/713972 | 20:21 |
mordred | mnaser: I pulled the siblings patch out from under the constraints patch - I think it's distracting for now | 20:22 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Use tempfile in buildx build https://review.opendev.org/725387 | 20:29 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: ansible-lint: use matchplay instead of matchtask https://review.opendev.org/724910 | 20:33 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: use zj_image instead of image as loopvar https://review.opendev.org/725012 | 20:33 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: use zj_log_file instead of item as loop_var https://review.opendev.org/725013 | 20:34 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Check blocks recursively for loops https://review.opendev.org/724967 | 20:34 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Update ansible-lint-rules testsuite to only test with the relevant rule https://review.opendev.org/725014 | 20:34 |
*** hillpd has joined #opendev | 20:52 | |
clarkb | the LD_PRELOAD change should land soonish. I'll restart zuul-web again once that is in place | 20:52 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: fetch-logs-openshift: fix miss when replacing item with loop_var: zj_ https://review.opendev.org/725392 | 21:04 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: fetch-logs-openshift: fix miss when replacing item with loop_var: zj_ https://review.opendev.org/725392 | 21:08 |
openstackgerrit | Merged opendev/system-config master: Clear LD_PRELOAD variable on zuul-web containers https://review.opendev.org/725384 | 21:13 |
tosky | it looks like a very basic question, but... what is the API entry point for the zuul instance on zuul.openstack.org? | 21:15 |
clarkb | tosky: https://zuul.openstack.org/api | 21:17 |
tosky | clarkb: thanks! | 21:17 |
*** DSpider has quit IRC | 21:25 | |
clarkb | I'm going to restart zuul-web now without jemalloc LD_PRELOAD set | 21:32 |
clarkb | #status Log restarted zuul-web without LD_PRELOAD var set for jemalloc. | 21:33 |
openstackstatus | clarkb: finished logging | 21:33 |
clarkb | it seems incredibly stable over the last ~12 minutes | 21:45 |
clarkb | maybe the issue is jemalloc after all | 21:46 |
clarkb | (need more data to be confident) | 21:46 |
clarkb | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64792&rra_id=all is what I'm looking at. Specifically that first graph and how level the line is now | 21:55 |
clarkb | I think if that holds overnight then maybe we drop it from our images entirely? | 21:55 |
fungi | tosky: be aware that's a legacy white-labeled api endpoint, the multi-tenant url for it is https://zuul.opendev.org/api so https://zuul.opendev.org/api/tenant/openstack/projects for example | 22:00 |
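[Editor's note: a couple of concrete requests against the endpoints fungi mentions; the projects path is the one quoted above and /api/info is zuul's standard root endpoint:

    curl -s https://zuul.opendev.org/api/info
    curl -s https://zuul.opendev.org/api/tenant/openstack/projects | head -c 300
]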
clarkb | I'm due for a bike ride and zuul-web looks stable for now so going to pop out | 22:00 |
clarkb | back in a bit | 22:00 |
tosky | fungi: thanks; that part is clear in https://zuul-ci.org/docs/zuul/reference/web.html, but I missed the starting point :) | 22:01 |
fungi | tosky: also the zuul dashboard hosts dynamic api docs: https://zuul.opendev.org/openapi | 22:03 |
tosky | that's useful, thanks | 22:03 |
ianw | infra-root: i'm not seeing we merged either fix for nb04 and ipv6 addresses ... can we do either https://review.opendev.org/#/c/725160/ or https://review.opendev.org/#/c/725157/ or both? | 22:13 |
*** tobiash has quit IRC | 22:13 | |
corvus | ianw: i forgot to +2 that after fixing it; +2 on 725157 now | 22:18 |
ianw | corvus: thanks; apropos prior discussion is centos + virtualenv ok now? | 22:19 |
openstackgerrit | Merged zuul/zuul-jobs master: fetch-logs-openshift: fix miss when replacing item with loop_var: zj_ https://review.opendev.org/725392 | 22:20 |
corvus | ianw: i'll check the volume release; you want to check the image build? | 22:23 |
corvus | ianw: Released volume mirror.centos successfully | 22:24 |
corvus | ianw: looks like we should be up to date there; i'll exit out of my manual flock, so mirror updates should continue | 22:24 |
ianw | corvus: yeah, it may be getting held up as nb04 has dropped out of zk due to the ipv6 literals coming back | 22:24 |
ianw | thanks for looking in on mirror | 22:24 |
ianw | 00:06:55:36 for the last centos-7 | 22:25 |
corvus | ianw: that sounds about right | 22:25 |
corvus | 7 hours ago sounds about like when i started my day :) | 22:26 |
ianw | virtualenv 20.0.20 | 22:26 |
ianw | virtualenv almost takes the record from dib for emergency point releases :) | 22:27 |
*** avass has quit IRC | 22:35 | |
*** rchurch has quit IRC | 22:36 | |
*** rchurch has joined #opendev | 22:39 | |
*** hashar has quit IRC | 22:42 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: use python2-pip when running under Python 2 https://review.opendev.org/724777 | 22:54 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial https://review.opendev.org/724788 | 22:54 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing https://review.opendev.org/724776 | 22:54 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: use python2-pip when running under Python 2 https://review.opendev.org/724777 | 23:10 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial https://review.opendev.org/724788 | 23:10 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing https://review.opendev.org/724776 | 23:10 |
*** mlavalle has quit IRC | 23:25 | |
clarkb | ianw: corvus does https://review.opendev.org/#/c/725160/1 imply the nodepool ansiblification and containering has landed? | 23:27 |
clarkb | I was waiting on that to happen to redo my system-config reorg change | 23:28 |
*** tosky has quit IRC | 23:28 | |
ianw | clarkb: only nb04 is affected afaik atm | 23:28 |
ianw | so in a word, no | 23:28 |
clarkb | got it | 23:30 |
clarkb | ianw: is it still desirable to land that one if the other has been approved? | 23:30 |
clarkb | zuul memory use looks very stable | 23:30 |
ianw | clarkb: i'm ... not sure; zuul may have problems if we reuse that zk writing function? i didn't look into that. maybe we would want a similar check in zuul if it doesn't have one already? | 23:32 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Stop using jemalloc in python base image https://review.opendev.org/725431 | 23:32 |
clarkb | infra-root ^ I'm pushing that now and will WIP it to collect more data from zuul01, but it is looking like jemalloc is a likely source of our problems there. | 23:33 |
clarkb | ianw for now the change is specific to nodepool configs right? | 23:34 |
ianw | clarkb: yes, that's the only place it writes out the zk hosts from the inventory ATM | 23:34 |
corvus | clarkb: awesome. i guess at the end of the day, that's not a shocking conclusion is it? at least, as a hypothesis, "different malloc borks memory usage" passes the sniff test. | 23:34 |
clarkb | corvus: ya | 23:34 |
ianw | clarkb: but i imagine in the final switch we would want to do similar for zuul | 23:35 |
clarkb | corvus: also worth noting the .so version is different between xenial and our docker containers. It could be a bug in jemalloc | 23:35 |
clarkb | or a bug in python using jemalloc | 23:35 |
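[Editor's note: one way to see the version difference clarkb mentions; the library paths and the container base image name are assumptions:

    # on the xenial host:
    /sbin/ldconfig -p | grep jemalloc        # typically libjemalloc.so.1
    # inside the container base image:
    sudo docker run --rm opendevorg/python-base /sbin/ldconfig -p | grep jemalloc   # typically libjemalloc.so.2
]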
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: use python2-pip when running under Python 2 https://review.opendev.org/724777 | 23:38 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial https://review.opendev.org/724788 | 23:38 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing https://review.opendev.org/724776 | 23:38 |
*** calcmandan has quit IRC | 23:45 | |
*** calcmandan has joined #opendev | 23:46 | |
ianw | it doesn't seem 725157 deployed itself on nb04 ... looking | 23:46 |
ianw | https://zuul.openstack.org/build/cd7fc0ea5c694631a472ab3d491d346e was the last nodepool hourly run : @ 2020-05-04T23:04:14 | 23:48 |
ianw | ok, promote missed it https://zuul.opendev.org/t/zuul/build/f7b24f73bc4e4318a1cc42488493ee13 | 23:50 |
ianw | 2020-05-04T23:10:49 | 23:50 |