Monday, 2020-05-04

openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: install python2-pip on SuSE when required
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing
ianwdoes anyone know how the nb04 config started using ipv6 addresses for the zk hosts?  i can't find any discussion on it afaics03:29
openstackgerritIan Wienand proposed opendev/system-config master: nodepool-base: Quote ipv6 literals for ZK hosts
ianwmordred / infra-root: ^ i see now we're overwriting the zk hosts; either this or or both should get builders connected again03:50
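The quoting fix ianw proposes above comes down to an ambiguity in host:port strings: a bare IPv6 literal contains colons, so the port separator can't be found reliably. A minimal sketch, not the actual nodepool/ZooKeeper parsing code; the helper and addresses are illustrative.

```python
# Why ZK host lists need IPv6 literals bracket-quoted ("[addr]:port"):
# with a bare address, colons inside the literal are indistinguishable
# from the host:port separator.

def split_host_port(spec, default_port=2181):
    """Split 'host:port', honoring '[v6literal]:port' quoting."""
    if spec.startswith('['):
        host, _, rest = spec[1:].partition(']')
        rest = rest.lstrip(':')
        return host, int(rest) if rest else default_port
    # naive split on the last colon -- wrong for bare IPv6 literals
    host, sep, port = spec.rpartition(':')
    if not sep:
        return spec, default_port
    return host, int(port)

# Unquoted IPv6 literal: the final hextet is misread as a port.
print(split_host_port('2001:db8::1'))          # ('2001:db8:', 1)
# Quoted form parses as intended.
print(split_host_port('[2001:db8::1]:2181'))   # ('2001:db8::1', 2181)
```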
*** ykarel|away is now known as ykarel04:19
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: install python2-pip when running under Python 2
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing
*** ysandeep|afk is now known as ysandeep05:20
*** dpawlik has joined #opendev06:04
*** dpawlik has quit IRC06:04
ianwbtw virtualenv is broken on centos-7 : see
ianwthis makes testing of zuul-jobs related things fail, so for today i give up06:07
*** dpawlik has joined #opendev06:07
*** dpawlik has quit IRC06:07
*** dpawlik has joined #opendev06:08
*** ykarel is now known as ykarel|afk06:21
*** rchurch has quit IRC06:31
*** rchurch has joined #opendev06:32
*** rpittau|afk is now known as rpittau06:33
*** ykarel|afk is now known as ykarel06:37
*** DSpider has joined #opendev06:53
*** tosky has joined #opendev07:32
*** sshnaidm|off is now known as sshnaidm07:33
dpawlikhi. Is everything OK with mirroring centos and fedora? It seems like mirrors.centos and mirror.fedora were last updated 6 days ago
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Fix fetch-sphinx-tarball fails
*** roman_g has joined #opendev08:10
*** ysandeep is now known as ysandeep|lunch08:35
*** dtantsur|afk is now known as dtantsur08:43
*** roman_g has quit IRC08:50
jrosseri am also having trouble with centos jobs where i get conflicting packages trying to install git-daemon
AJaegerI see el_7_7 and el_7_8 in there- was CentOS 7.8 released and we didn't mirror completely?09:03
*** roman_g has joined #opendev09:03
AJaegerinfra-root, please see jrosser's and dpawlik's comments on centos and fedora mirroring09:04
jrosserAJaeger: from my very brief poke at this a couple of days ago it didn't look like the git-daemon package i need was present in the place we mirror from09:06
jrosserand yes it seems like an incomplete mix of 7.7 and 7.809:07
*** ykarel is now known as ykarel|lunch09:22
jrosseroops i mean git-daemon package _wasnt_ present09:23
*** lpetrut has joined #opendev09:25
*** Dmitrii-Sh has joined #opendev09:25
*** roman_g has quit IRC09:38
*** panda|ruck is now known as panda|pto09:40
*** roman_g has joined #opendev09:55
*** ralonsoh has joined #opendev10:03
*** rpittau is now known as rpittau|bbl10:14
*** ykarel|lunch is now known as ykarel10:32
*** ysandeep|lunch is now known as ysandeep10:47
*** kevinz has quit IRC11:04
*** olaph has joined #opendev11:05
AJaegerinfra-root, donnyd , any idea what's up with openedge? looks down. See also
openstackgerritAndreas Jaeger proposed openstack/project-config master: Disable openedge
AJaegerproposal to disable for now ^11:15
donnydhrm... everything else is working fine11:17
donnydchecking now11:17
donnydinteresting... the mirror node was magically shut down11:18
AJaegerinfra-root, today's fires that I'm aware of: 1) down ; 2) virtualenv on CentOS broken, new virtualenv release is out, we need new nodepool images; 3) CentOS 7 and Fedora mirrors are old, CentOS has partial update to 7.7 and needs fixing11:19
AJaegerdonnyd: thanks for looking!11:19
donnydinfra-root the mirrror at OE is fixed. the machine got shutdown somehow11:19
AJaegerdonnyd: thanks! So, that problem was solved quickly11:21
AJaeger#status log was down, donnyd restarted the node and openedge should be fine again11:21
donnyd9 minutes isn't too bad of a turnaround time11:21
openstackstatusAJaeger: finished logging11:21
AJaegerdonnyd: 9 minutes is excellent ;)11:22
donnydAJaeger: yea I logged into the project and the instance was in "shutdown"11:22
donnydidk how.. but anyways its back online now11:22
*** ysandeep is now known as ysandeep|brb11:36
*** ysandeep|brb is now known as ysandeep11:52
*** rpittau|bbl is now known as rpittau12:22
*** ykarel is now known as ykarel|afk12:38
*** hashar has joined #opendev12:53
*** sgw has joined #opendev13:01
ttxhey everyone... was Gerrit restarted since we merged13:03
ttx(Apr 29 22:52)13:03
ttxneed to know if I can start moving things around on the GitHub side13:04
fricklerttx: I still see replication to github in the log, so I'd assume not13:25
AJaegerttx, it was not13:26
ttxok thanks! Keep me posted when it is :)13:26
*** hashar has quit IRC13:33
*** ralonsoh has quit IRC13:37
*** lpetrut has quit IRC13:49
*** ralonsoh has joined #opendev13:49
corvusi'm looking into the centos mirror issues13:58
corvusit looks like the volume is locked but no actual release transaction is in progress13:58
corvusit looks like it was updating afs02.dfw when it stopped14:00
fungiso not like related to the 2020-04-28 afs01.dfw outage14:05
fungier, not likely14:05
corvusfungi: oh, that probably was it actually.  the release command is run on afs01.dfw14:07
corvusi'm going to start a screen session on afs01.dfw, grab the mirror lock, unlock the afs volume, and start a release14:10
fungiif memory serves, it died in such a way that it was hanging clients rather than causing them to fail over to the other server, but due to a kernel panic (presumed to be from a host migration problem) it had to be rebooted14:10
*** ykarel|afk is now known as ykarel14:10
fungiso makes sense that the vos release command may have hung waiting for the server to respond14:11
corvusfungi: sorry, the vos release command was *issued* on afs01.dfw; it crashed in mid process.  so in this case it's afs02 waiting for afs01 to tell it to finish and unlock.14:12
corvusbasically the reverse14:12
fungioh, interesting14:13
fungiif that happened before/during the afs02 reboot, i wouldn't expect afs02 to think it was waiting on anything there, but maybe it's more stateful than i realize14:13
corvusafs02 has been up 167 days14:14
corvusThis is a completion of a previous release14:15
corvusStarting ForwardMulti from 536870962 to 536870962 on (full release).14:15
corvusthat's in progress now14:16
*** ysandeep is now known as ysandeep|brb14:18
fungioh, wait, it was afs01.dfw which got rebooted, caffeine not connecting this morning i guess14:20
fungiyeah, so i guess maybe it was in the middle of that when it died14:20
corvusinfra-root: i think that of AJaeger's 3 fires: #1 is done; #3 is in progress; that leaves #2 -- centos nodepool images14:33
corvusbefore i just poke nodepool to make new images -- does anyone understand why a new release of virtualenv would cause our existing centos images to break?14:38
corvusso it looks like there are 2 new releases at issue14:41
corvus.19 bad, and is what is on our current images14:41
corvus.18 and .20 good14:41
corvus#status log unlocked centos mirror openafs volume and manually started release14:45
openstackstatuscorvus: finished logging14:45
corvus#status log deleted centos-7-0000124082 image to force rebuild with newer virtualenv14:45
openstackstatuscorvus: finished logging14:45
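The version situation corvus describes (".19 bad, .18 and .20 good") amounts to excluding one known-bad release while still taking the newest available. A toy sketch; the `20.0.x` prefix is an assumption, since only the last components appear in the discussion.

```python
# Pick the newest release while excluding a known-bad one.
# The 20.0.x version numbers are assumed; only ".18/.19/.20" is stated.

BAD = {'20.0.19'}

def pick_release(available, bad=BAD):
    usable = [v for v in available if v not in bad]
    if not usable:
        raise ValueError('no usable release')
    # compare numerically, not lexically
    return max(usable, key=lambda v: tuple(int(x) for x in v.split('.')))

print(pick_release(['20.0.18', '20.0.19', '20.0.20']))  # -> 20.0.20
```

With pip, the same exclusion can be expressed directly as a specifier, e.g. `virtualenv!=20.0.19`.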
*** hashar has joined #opendev14:45
*** panda|pto has quit IRC14:46
corvusinfra-root: that should mean that all 3 issues are being addressed14:47
corvuscentos-7-0000124083 is the replacement dib image14:47
corvusbuilding now14:47
*** mlavalle has joined #opendev14:48
*** hashar has quit IRC14:50
*** panda has joined #opendev14:50
fungiykarel: ^14:52
*** ysandeep|brb is now known as ysandeep14:52
ykarelfungi, corvus Thanks14:53
AJaegercorvus: thanks14:53
AJaegerdpawlik, jrosser, FYI, CentOS 7 mirror should be up to date again.14:55
dpawlik\o/ AJaeger14:56
dpawlikthank you14:56
AJaegerdpawlik: corvus did the work, I just passed messages around ;)14:57
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: DNM: trigger registry tests
*** olaph has quit IRC14:58
dpawlikAJaeger, ah14:58
dpawlikso corvus++14:58
dpawlikcorvus, AJaeger how often do you refresh data in grafana? Just curious because there is still 6 days15:00
fungii just approved clarkb's 644432 fix which should correctly attempt to apply read-only settings to retired projects which were missing them15:01
fungithat may run longer than usual15:01
clarkbmordred: ^ I think that won't cause any issues other than timing out the manage-projects job potentially15:02
clarkbin which case we can run it again I suppose15:02
clarkbdpawlik: the grafana data is generated by a script; I expect corvus manually fixed things and the udp packets for timing info weren't sent15:03
mordredclarkb: ++15:04
fungidpawlik: it also may not update until the image builds are completed15:04
dpawlikclarkb, fungi thanks for explanation15:04
fungidpawlik: ykarel: the centos-7-0000124083 build corvus mentioned is getting logged at (self-signed ssl cert, sorry) and once that's done, it still has to get uploaded to our providers which also takes a few minutes15:08
ykarelfungi, Thanks15:10
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Do not use bare 'item' in build-container-image
openstackgerritBrian Haley proposed openstack/project-config master: Update Neutron grafana dashboard
*** ysandeep is now known as ysandeep|away15:13
corvusAJaeger, dpawlik: the release is still in progress; so i don't think the mirror is up to date yet15:23
dpawlikcorvus, ack15:24
clarkbno lists ooms since robots.txt was updated15:27
clarkbzuul scheduler looks like it might need a sigusr2 pair again15:28
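The SIGUSR2 "pair" here is Zuul's debug handler: sending the signal makes the running scheduler dump state without stopping. The core of that pattern, dumping all thread stacks on SIGUSR2, can be sketched with the stdlib `faulthandler`; Zuul's real handler does more than this.

```python
# Dump thread stacks on SIGUSR2 without killing the process -- the
# basic idea behind `kill -USR2 <pid>` against a running scheduler.
# (Zuul's actual handler does more; this is a stdlib-only sketch.)
import faulthandler
import os
import signal
import tempfile

fd, path = tempfile.mkstemp()
out = os.fdopen(fd, 'w')
faulthandler.register(signal.SIGUSR2, file=out)

os.kill(os.getpid(), signal.SIGUSR2)   # simulate: kill -USR2 <pid>

dump = open(path).read()
print(dump.splitlines()[0])   # header line of the traceback dump
```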
AJaegerclarkb: want to restart gerrit some time so that we stop github replication?15:30
clarkbAJaeger: maybe? I think the jeepyb thing above is to address some part of that (fungi and ttx would know more than I if we are ready to update gerrit yet)15:37
fungithe gerrit restart would merely be to stop github replication, the jeepyb fix above isn't a blocker for that15:38
fungittx is separately working on tooling to build project-config changes to set which repositories should be replicating via zuul jobs, but the current set they're applied to is fairly comprehensive15:39
openstackgerritMerged opendev/jeepyb master: Inspect all configs in manage-projects
*** odyssey4me has joined #opendev15:44
*** ykarel is now known as ykarel|away15:47
*** diablo_rojo has joined #opendev15:47
*** hashar has joined #opendev15:51
*** dpawlik has quit IRC15:59
*** rpittau is now known as rpittau|afk16:08
openstackgerritMerged zuul/zuul-jobs master: go: Use 'block: ... always: ...' and failed_when instead of ignore_errors
openstackgerritMerged zuul/zuul-jobs master: ara-report: use failed_when: false instead of ignore_errors: true
clarkbfungi: mordred ^ it doesn't look like the jeepyb change landing caused infra-prod-manage-projects to run. Maybe that means we can run it manually without a timeout?16:18
*** smcginnis has quit IRC16:19
openstackgerritMerged zuul/zuul-jobs master: fetch-subunit-output: use failed_when: instead of ignore_errors:
clarkbalso where are we with zuul python3.8 images because if they aren't close maybe we should write an hourly cron to sigusr2 zuul :/16:20
fungiclarkb: good idea, i can do that after my current meeting maybe16:22
clarkbfwiw I'll plan to sigusr2 zuul after my meeting16:22
clarkbto hopefully reset the current trend16:22
mordredclarkb: latest zuul images should be on 3.816:24
mordredclarkb: landed - so restarting with a pull should have us on 3.816:24
clarkbcool so maybe this sigusr2 is the last one we need and we can schedule a restart to see if 3.8 is any better16:25
mordredclarkb: we also need a gerrit restart to pick up the github repl change16:25
mordredso maybe we do them around a similar time16:25
clarkbmordred: we should double check that that change landed wasn't affected by the docker image promotion bugs we were working through recently16:25
clarkb I don't know how to map that back to a change16:26
mordredclarkb: just click through on the sha:
mordredand at least in this case you can see it built with 3.816:28
*** smcginnis has joined #opendev16:28
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Do not use bare 'item' in build-container-image
mordredcorvus: ooh - maybe we should add a label to the images when we build them to tie them back to a change - docker build has a --label option to add additional ones at build time16:30
corvusmordred: ++16:31
mordredcorvus: maybe one for the change, and maybe one for the git sha of the change itself (not the merge commit) - and then maybe just one that says "built by" or something16:32
corvusmordred: ooh16:32
corvusmordred: could we put in a url to the build page?16:32
*** panda is now known as panda|pto16:35
mordredcorvus: yeah16:37
mordredcorvus: we can put anything we want to :)16:37
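The label idea being discussed is `docker build --label key=value` at build time. A sketch of assembling such a command with Zuul-ish metadata; the label keys and URLs below are invented for illustration, not the ones the zuul-jobs role ended up using.

```python
# Stamp change/build metadata onto an image via `docker build --label`,
# so a running image can be traced back to what produced it.
# Label keys here are invented, not the real role's names.

def build_cmd(context, change_url, build_url, commit):
    labels = {
        'org.example.change-url': change_url,
        'org.example.build-url': build_url,
        'org.example.commit': commit,
    }
    cmd = ['docker', 'build']
    for key in sorted(labels):
        cmd += ['--label', f'{key}={labels[key]}']
    cmd.append(context)
    return cmd

print(' '.join(build_cmd('.', 'https://review.example/123',
                         'https://zuul.example/build/abc', 'deadbeef')))
```

Labels set this way can be read back later with `docker inspect -f '{{ json .Config.Labels }}' <image>`.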
mordredcorvus: what's the best way to get the build url in a job?16:47
openstackgerritSorin Sbarnea (zbr) proposed zuul/zuul-jobs master: Enable yamllint
clarkbinfra-root on closer examination the bulk of the memory use on zuul01 is zuul-web (17GB ish) not zuul-scheduler (3GB ish)16:50
clarkbrestarting zuul-web is easy compared to scheduler can I just restart that one on the new python3.8 image?16:51
fungiand sending any signals to zuul-web seems to result in stopping the process, right?16:51
clarkbmordred: ^16:51
corvusclarkb: seems like we may have disproven the theory about having a memleak in latest cherrypy16:51
clarkbfungi: yes, is related16:51
fungii expect just restarting it should be fine, even on a different python version16:51
fungiooh, you found the handler problem i guess16:52
clarkblooks like tobiash has a good suggestion I need to consider16:53
corvusmordred: unsure; there's a build.uuid variable; and the artifact promote job does some stuff with the api16:53
clarkbcorvus: I guess? it could be an interaction with cherrpy and newer python since the sigusr2 seemed to unstick the scheduler16:53
corvusclarkb: we're running old-cherrypy on 3.7 though; much like we were before the container restarts16:54
clarkbcorvus: correct, but before we had old cherrypy + python 3.516:54
clarkbI'm suggesting that python3.7 is the issue here too16:54
clarkbwhich is why restarting it on 3.8 may be useful16:55
fungiseems likely the same presumed gc issue could be impacting multiple daemons16:55
clarkbfungi: yup16:55
corvusclarkb: agreed; my point is that we have eliminated cherrypy alone as the cause16:55
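One way to separate the remaining suspects (Python-level retention vs. allocator behavior) is to compare `tracemalloc` snapshots around a workload: growth visible to tracemalloc means Python objects are being retained, while flat tracemalloc alongside growing RSS points at the allocator, which is roughly the jemalloc question raised later. A sketch with a stand-in workload.

```python
# Compare tracemalloc snapshots to see Python-level memory growth.
# The retained list is a simulated workload, not zuul-web code.
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

retained = [list(range(1000)) for _ in range(100)]  # simulated retention

after = tracemalloc.take_snapshot()
stats = after.compare_to(before, 'lineno')
grew = sum(s.size_diff for s in stats)
print(f'python-level growth: {grew} bytes')
```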
mordredcorvus:         download_artifact_api: "{{ zuul.tenant }}"16:56
mordredcorvus: we seem to just hardcode base api in the docs promote job16:56
clarkblooking at docker image ls I think if I do cd /etc/zuul-web ; sudo docker-compose down && sudo docker-compose up -d we'll be running zuul-web on python3.816:57
corvusmordred: since the build job is opendev specific, we can probably do that there16:57
corvusclarkb: ++16:57
*** dtantsur is now known as dtantsur|afk16:57
corvusclarkb: can't hurt to do an extra docker-compose pull before starting though16:57
clarkbcorvus: k16:58
mordredcorvus: the build-docker-image role isn't - and I think there's several generic things we can do16:58
mordredcorvus: lemme push up what I've got so far and we can go from there16:58
clarkbalright I'll run a pull. down. then up -d in /etc/zuul-web now16:58
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Support multi-arch image builds with docker buildx
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: DNM Check to see if images from intermediate work
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Write a buildkitd config file pointing to buildset registry
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Write buildkitd.toml in use-buildset-registry
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Add zuul labels to images and ability to user-define more
clarkb#status log Restarted zuul-web and zuul-fingergw on new container images on zuul01. We were running out of memory due to leak in zuul-web which may be caused by python3.7 and new images provide python3.817:00
openstackstatusclarkb: finished logging17:00
*** ralonsoh has quit IRC17:07
clarkbmemory use on zuul01 has been steadily climbing since the restart. We'll have to wait and see if we plateau17:52
clarkbmordred: fungi: what is the process for running that playbook manually? do we need to lock anything?17:54
clarkbor maybe we can trigger the job directly in zuul (that would be subject to timeouts but wouldn't have lock issues)17:54
*** mrunge_ has joined #opendev17:55
*** mrunge has quit IRC17:56
fungiclarkb: for manage-projects? i was expecting to just fire the command locally on review.o.o instead of from ansible17:57
fungiusing docker exec or however ansible has been calling it17:58
clarkbfungi: ya and oh ya that will work17:58
mordredclarkb: yeah - what fungi said17:58
clarkbI guess my concern is that if we land projects.yaml updates we could have competing processes17:58
mordredalthough you can also touch the lockfile on bridge and run the playbook from the system-config dir17:58
clarkbconfig-core ^ maybe hold off on landing new projects until we run a manage-projects by hand17:58
mordredclarkb: /home/zuul/DISABLE-ANSIBLE17:59
mordredclarkb: touch that on bridge and it'll prevent jobs from running - they have an hour timeout - so they'll resume once you rm it17:59
fungialternatively, we can put review.o.o in the temporary disable list18:03
clarkbfungi: that might be simpler?18:04
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Write buildkitd.toml in use-buildset-registry
clarkbfwiw I'm not planning on doing it since you volunteered, but did want to have a short think about whether or not we needed to put safety glasses on first18:04
fungiyes, absolutely18:04
fungimy last scheduled meeting of the day just wrapped up, so catching up on a few urgent conversations and will start on that18:05
mordredno - don't put it in the disable list18:05
mordredtouch the lock file18:05
mordredyou need to stop zuul from doing the things18:05
mordredit's what it's there for :)18:06
*** sshnaidm is now known as sshnaidm|afk18:07
mordredor - I guess - honestly it's probably fine - ignore me18:07
clarkbit should also be fine if we don't approve any new project additions18:08
mordredbut just saying "touch /home/zuul/DISABLE-ANSIBLE" should cover all the bases and also not cause anything to run in a half-configured state18:08
clarkbfungi: also as a sanity check you can run manage-projects against a specific project or three first before doing the whole list18:08
clarkbfungi: maybe run it against a retired project that hasn't had an acl update and a non retired project and make sure we get the expected results?18:08
fungiyeah, i can do that18:11
fungimordred: `touch /home/zuul/DISABLE-ANSIBLE` on bridge will stop zuul from deploying anything to any server though, right? do we need to worry about it missing events then for other stuff we'd have deployed from unrelated changes?18:12
clarkbI never had breakfast. At this point I'll call it early lunch. Back in a bit18:13
fungii scarfed some lentil chips and hummus while in a meeting18:13
mordredfungi: it backs up18:18
mordredfungi: so - the first job zuul enqueues will wait for up to an hour for the file to go away (and the jobs behind it will just be queued up in zuul)18:18
mordredfungi: we might start missing things if it's in place for more than an hour - but at that point we should probably be disabling hosts and stuff18:19
mordredit's basically a big pause button18:19
fungioh, okay that helps18:20
fungiat the mere push of a single button!18:22
fungithe beautiful shiny button18:22
fungithe jolly candy-like button18:22
* fungi can't hold out, no sir-ee18:23
fungipushing it now18:23
fungi#status log temporarily paused ansible deploys from zuul by touching /home/zuul/DISABLE-ANSIBLE on bridge.o.o18:24
openstackstatusfungi: finished logging18:24
clarkbfungi: mordred you may need to pull a new image too?18:24
fungii'll check18:25
fungiwe're baking jeepyb into the gerrit image, or installing it into a separate image? i'm supposing the former since some of the gerrit hook scripts call into it18:25
mordredit's in the gerrit image18:27
fungiand yeah, /usr/local/bin/manage-projects is a wrapper calling docker exec18:27
mordredwe should maybe also just make a jeepyb image18:27
mordredand use that for manage-projects but with a similar set of volume mounts18:27
mordredso that we can update jeepyb independent of gerrit18:28
mordredfungi, clarkb : I may have missed a thing - we have a jeepyb update that we need for manage-projects?18:28
fungidocker says jeepyb==0.0.1.dev467  # git sha 9d733a918:28
clarkbmordred: yes the fix for updating retired projects18:29
fungier pbr freeze via docker run says that i mean18:29
mordredyah - but try it via exec18:29
mordredsince that'll be what manage-projects does - run will make a new container18:29
mordredexec will use the existing gerrit one18:29
mordredI think atm we're going to need to restart the gerrit container to pick up that jeepyb change18:29
clarkboh I thought we did run not exec18:30
mordredor - you could urn the command in /usr/local/bin/manage-projects but replace exec with run (and add an --rm)18:30
fungii was running `exec docker run ... pbr freeze` via a copy of the manage-projects wrapper script yeah18:30
mordredoh! we do do run18:30
mordredyeah- nevermind me - I for some idiotic reason thought we were execing (and planning to fix that)18:31
mordredyou should be fine :)18:31
fungianyway, that commit is too old i think18:31
fungiso maybe we didn't build a new gerrit image when the jeepyb change merged, or i need to pull it18:31
mordredyeah- you likely need to docker pull - and if that doesn't work - then we missed building the image on jeepyb change18:32
fungi9d733a9 is the previous commit before the fix18:32
fungiwell, i may as well check the zuul builds page18:32
*** redrobot has joined #opendev18:33
fungiwe built system-config-promote-image-gerrit-2.13 after that change merged, so i guess it's just the pull we need18:33
mordredyeah. I concur18:33
fungimordred: is it really just `sudo docker pull` on review.o.o then? no additional arguments? or do i need to specify the image name?18:34
fungiand that won't restart the running container processes, right?18:34
mordredeither ...18:34
mordredit will not18:34
mordreddocker pull opendevorg/gerrit:2.1318:34
fungiahh, yeah it needs the image name18:34
mordredcd /etc/gerrit-compose ; docker-compose pull18:35
fungirunning the latter now18:35
fungi#status log manually pulled updated gerrit image on review.o.o for recent jeepyb fix18:35
openstackstatusfungi: finished logging18:36
mordredclarkb, fungi: incidentally: should add metadata to our images that will let us inspect and see what change they were built from18:36
fungijeepyb==0.0.1.dev469  # git sha ab498db18:36
mordredfungi: that seems better18:36
fungiwell, ab498db does not appear in jeepyb's master branch history18:37
fungiwheere did that come from?18:37
mordredthat'll be the merge commit on the executor18:37
fungiyes, metadata would be awesome, especially in cases like this18:37
fungioh, right, executor made a different merge commit than gerrit did18:37
mordredyah - so yeah, I think the labels are going to be super helpful :)18:38
fungiso we can't really expect merge commit shas to match between promoted images and git history18:38
fungiat least not until we can have zuul push merge commits into gerrit18:38
fungianyway, 0.0.1.dev469 is 0.0.1.dev467 + 218:39
fungiso the fix plus the merge commit for it18:39
fungiokay, so one example of a lagging retirement was
fungi does say it should have a read-only config18:42
fungiso i'll run `sudo manage-projects openstack/fuel-devops` next and see if the status changes18:43
fungiclarkb: mordred: sound good?18:43
mordredfungi: yes18:44
fungiand finished18:45
fungistate: read only18:45
fungiso now i'll just run `sudo manage-projects` i guess to do them all?18:45
fungimaybe i'll start a root screen session18:45
mordredfungi: ++18:46
fungi--verbose is likely to overrun the buffer. should i use it anyway?18:46
fungior 2>&1 | tee something?18:46
fungii'll do that18:47
fungithis is staged in a root screen session on review.o.o:18:48
fungimanage-projects --verbose 2>&1 | tee manage-projects.2020-05-04.retirements.log18:48
* mordred joins18:48
mordredfungi: I agree - you have staged that18:48
* mordred is ready when you are18:48
clarkbfungi: thanks sorry lunch is distracting me18:49
fungiclarkb: summary is we needed new gerrit image pulled as you guessed18:50
fungiand that merge commit shas in our containers don't match git history because they're executor/merger-constructed18:50
fungibut that your fix works18:50
mordredfungi: it at least doesn't look angry18:50
fungiand we're in the full run in a root screen session on review.o.o now18:51
mordredfungi: why did we just clone deb-cinder?18:51
mordredis that the thing we have to do to properly retire things?18:51
fungiclarkb's fix is to loop over the full project list and not just the not-retired projects list18:52
clarkbmordred: I think we maintain a cache of things and ya to retire we'd have to populate the cache if it were missing18:52
fungithough maybe if we don't want to cache we can make it smarter so that it skips any repos with a read-only state18:52
mordredmaybe - but meh, probably for the best at this point18:53
clarkbya we could probably optimize it more?18:53
fungithat would in theory make un-retiring harder, but in reality that needs manual intervention anyway due to a different catch-2218:53
fungi(can't push updated gerrit config to a read-only repo, so need to add an api call to set the desired state out of band)18:54
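The out-of-band call fungi mentions maps onto Gerrit's "Set Config" REST endpoint (`PUT /projects/{name}/config` with a `state` field). A sketch of building that request; authentication and actually sending it are omitted, and the host is just the one under discussion.

```python
# Un-retiring needs an out-of-band Gerrit REST call, since a read-only
# repo rejects the git push that would normally update its config.
# Builds the request per Gerrit's "Set Config" endpoint; sending and
# credentials are left out of this sketch.
from urllib.parse import quote

def set_state_request(host, project, state):
    # project names contain '/', which must be percent-encoded in the URL
    url = f'https://{host}/a/projects/{quote(project, safe="")}/config'
    return 'PUT', url, {'state': state}

print(set_state_request('review.opendev.org', 'openstack/fuel-devops', 'ACTIVE'))
```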
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Support multi-arch image builds with docker buildx
fungiafter this completes, i think i should run it a second time and time it so we can get a rough baseline for how long a normal triggered run should require18:58
fungiafter that, assuming no surprises, i can un-pause deployment jobs18:58
clarkbfungi: ++ though I don't expect it will be much longer. The noop case is much faster18:58
clarkb*much longer than before18:58
fungiright, also retired repositories are something like 10% of our total repo count18:59
fungiso skipping them wasn't a huge time savings anyway18:59
mnasermordred: we started the constraints support inside python-builder but never got around to wrapping that up.  do you think we can find time to work together on that?19:01
* mnaser is having problems building openstack images with python-builder due to that19:02
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Support multi-arch image builds with docker buildx
mordredmnaser: yes we can19:03
mordredmnaser: I think we finally got multi-arch containers sorted19:03
mnasermordred: nice.  i've been stuck a bit on finding ways to work around the constraints stuff (for now to unblock) but its not sustainable in the long term19:04
mnaserturns out msgpack 1.0.0 breaks a lot of things :19:04
mordredmnaser: hah19:04
mordredyeah - turns out constraints are important19:04
mnaserand that was a fun one to find too..19:04
mordredmnaser: lemme unfog my brain after all this multi-arch, then I'll start poking at constraints again19:05
openstackgerritMohammed Naser proposed opendev/system-config master: python-builder: drop # from line
mnasermordred: ^ something i caught in the midst of all of this too19:18
openstackgerritMerged zuul/zuul-jobs master: Add zuul labels to images and ability to user-define more
fungii wonder if the depsolver in pip will obsolete the need for constraints lists in situations like this19:19
mordredfungi: it's a good question - although constraints still allow for a central single point - whereas without constraints you'd have to make sure every consumer of msgpack had a version pin19:22
fungii mean, ideally they should if they're broken by it, but...19:22
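The central-single-point argument can be sketched as: a constraints file caps the version of any package that happens to get installed, without every consumer repeating the pin. A toy model of that narrowing; real pip applies it via `pip install -c upper-constraints.txt -r requirements.txt`.

```python
# Toy model of constraints: one central mapping caps transitive deps
# (the msgpack case above) instead of a pin in every consumer.

def apply_constraints(candidates, constraints):
    """candidates: {pkg: versions, oldest->newest}; constraints: {pkg: exact}."""
    resolved = {}
    for pkg, versions in candidates.items():
        if pkg in constraints:
            if constraints[pkg] not in versions:
                raise ValueError(f'{pkg}=={constraints[pkg]} not available')
            resolved[pkg] = constraints[pkg]
        else:
            resolved[pkg] = versions[-1]  # newest wins when unconstrained
    return resolved

print(apply_constraints({'msgpack': ['0.6.2', '1.0.0']},
                        {'msgpack': '0.6.2'}))
```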
clarkbthat was weird, had high packet loss to my irc bouncer for a minute or 519:22
mordredclarkb: it wanted you to take some time off19:23
fungiokay, the manage-projects run completed. i'll rerun it without --verbose and time it19:23
mordredmnaser: hah. nice19:23
clarkbwe are still seeing zuul-web memory use climb but we are nowhere near danger yet19:23
clarkb(unfortunately I think that may be pointing to python3.8 not fixing it)19:24
fungior the problem not actually being a regression in the interpreter19:24
fungireal 0m5.165s19:24
fungii'd call that fast enough that i don't care how much slower it got19:24
clarkbfungi: thats the first or second run?19:24
clarkband ya that seems quck enough for our purposes19:25
fungianybody want to spot-check anything before i un-pause deployments?19:25
mordredclarkb: next thing to try would be disabling jemalloc I think19:25
clarkbmordred: ++19:25
mordredclarkb: we should be able to do that just by setting LD_PRELOAD to '' in the docker-compose file19:25
clarkbmordred: ok that should be easy enough to do. Also packet loss coming back again :(19:28
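The docker-compose tweak being discussed looks roughly like the fragment below: an empty `LD_PRELOAD` means the container falls back to the default glibc malloc instead of jemalloc. Service and image names here are illustrative, not the actual opendev configuration.

```yaml
# Sketch: disable the jemalloc LD_PRELOAD for one service.
# Service/image names are placeholders.
services:
  web:
    image: zuul/zuul-web
    environment:
      - LD_PRELOAD=
```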
*** roman_g has quit IRC19:29
fungi#status log deployments unpaused by removing /home/zuul/DISABLE-ANSIBLE on bridge.o.o19:32
openstackstatusfungi: finished logging19:32
openstackgerritMonty Taylor proposed opendev/system-config master: Fix siblings support in python-builder
openstackgerritMonty Taylor proposed opendev/system-config master: Add constraints support to python-builder
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Don't pull and retag in buildx workflow
mordredmnaser: do you have any convenient way to verify that ^^ works?19:50
mordredmnaser: also - I'm less sure about the siblings patch - we're apparently using siblings support in nodepool jobs, so I want to check in with ianw before we land that one19:51
mordredI'm *pretty* sure it's right, and I think we might just be getting lucky in nodepool19:51
mnasermordred: i could build them locally.. i'm not using git right now to build things (for now):
mnasermordred: ACTUALLY i have something19:51
mordredmnaser: oh - yeah - with a depends-on that should be a good test case19:52
mnasermordred: feel free to update that, i think we will need to copy upper-constraints from requirements though..19:52
mnaseror wget it in..19:52
mordredlemme update that patch19:52
mnaserbecause upper-constraints.txt is not inside the repo19:53
*** roman_g has joined #opendev19:54
mordredmnaser: ok - updated that patch and added a pre-playbook to copy in the upper-constraints file19:56
openstackgerritClark Boylan proposed opendev/system-config master: Clear LD_PRELOAD variable on zuul-web containers
clarkbmordred: ^ I think that is what you were suggesting20:02
mordredclarkb: yes - I think that's a great next thing to try20:13
openstackgerritMonty Taylor proposed opendev/system-config master: Add constraints support to python-builder
mordredmnaser: I pulled the siblings patch out from under the constraints patch - I think it's distracting for now20:22
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Use tempfile in buildx build
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: ansible-lint: use matchplay instead of matchtask
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: use zj_image instead of image as loopvar
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: use zj_log_file instead of item as loop_var
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Check blocks recursively for loops
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Update ansible-lint-rules testsuite to only test with the relevant rule
*** hillpd has joined #opendev20:52
clarkbthe LD_PRELOAD change should land soonish. I'll restart zuul-web again once that is in place20:52
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-logs-openshift: fix miss when replacing item with loop_var: zj_
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-logs-openshift: fix miss when replacing item with loop_var: zj_
openstackgerritMerged opendev/system-config master: Clear LD_PRELOAD variable on zuul-web containers
toskyit looks like a very basic question, but... what is the API entry point for the zuul instance on
toskyclarkb: thanks!21:17
*** DSpider has quit IRC21:25
clarkbI'm going to restart zuul-web now without jemalloc LD_PRELOAD set21:32
clarkb#status Log restarted zuul-web without LD_PRELOAD var set for jemalloc.21:33
openstackstatusclarkb: finished logging21:33
clarkbit seems incredibly stable over the last ~12 minutes21:45
clarkbmaybe the issue is jemalloc after all21:46
clarkb(need more data to be confident)21:46
clarkb is what I'm looking at. Specifically that first graph and how level the line is now21:55
clarkbI think if that holds overnight then maybe we drop it from our images entirely?21:55
fungitosky: be aware that's a legacy white-labeled api endpoint, the multi-tenant url for it is so for example22:00
clarkbI'm due for a bike ride and zuul-web looks stable for now so going to pop out22:00
clarkbback in a bit22:00
toskyfungi: thanks; that part is clear in, but I missed the starting point :)22:01
fungitosky: also the zuul dashboard hosts dynamic api docs:
toskythat's useful, thanks22:03
ianwinfra-root: i'm not seeing that we merged either fix for nb04 and ipv6 addresses ... can we do either or both?22:13
*** tobiash has quit IRC22:13
corvusianw: i forgot to +2 that after fixing it; +2 on 725157 now22:18
ianwcorvus: thanks; apropos prior discussion is centos + virtualenv ok now?22:19
openstackgerritMerged zuul/zuul-jobs master: fetch-logs-openshift: fix miss when replacing item with loop_var: zj_
corvusianw: i'll check the volume release; you want to check the image build?22:23
corvusianw: Released volume mirror.centos successfully22:24
corvusianw: looks like we should be up to date there; i'll exit out of my manual flock, so mirror updates should continue22:24
ianwcorvus: yeah, it may be getting held up as nb04 has dropped out of zk due to the ipv6 literals coming back22:24
ianwthanks for looking in on mirror22:24
ianw00:06:55:36 for the last centos-722:25
corvusianw: that sounds about right22:25
corvus7 hours ago sounds about like when i started my day :)22:26
ianwvirtualenv 20.0.2022:26
ianwvirtualenv almost takes the record from dib for emergency point releases :)22:27
*** avass has quit IRC22:35
*** rchurch has quit IRC22:36
*** rchurch has joined #opendev22:39
*** hashar has quit IRC22:42
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: use python2-pip when running under Python 2
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: use python2-pip when running under Python 2
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing
*** mlavalle has quit IRC23:25
clarkbianw: corvus does imply the nodepool ansiblification and containering has landed?23:27
clarkbI was waiting on that to happen to redo my system-config reorg change23:28
*** tosky has quit IRC23:28
ianwclarkb: only nb04 is affected afaik atm23:28
ianwso in a word, no23:28
clarkbgot it23:30
clarkbianw: is it still desireable to land that one if the other has been approved?23:30
clarkbzuul memory use looks very stable23:30
ianwclarkb: i'm ... not sure; zuul may have problems if we reuse that zk writing function?  i didn't look into that.  maybe we would want a similar check in zuul if it doesn't already?23:32
openstackgerritClark Boylan proposed opendev/system-config master: Stop using jemalloc in python base image
clarkbinfra-root ^ I'm pushing that now and will WIP it to collect more data from zuul01, but it is looking like jemalloc is a likely source of our problems there.23:33
clarkbianw for now the change is specific to nodepool configs right?23:34
ianwclarkb: yes, that's the only place it writes out the zk hosts from the inventory ATM23:34
corvusclarkb: awesome.  i guess at the end of the day, that's not a shocking conclusion is it?  at least, as a hypothesis, "different malloc borks memory usage" passes the sniff test.23:34
clarkbcorvus: ya23:34
ianwclarkb: but i imagine in the final switch we would want to do similar for zuul23:35
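[editor's note: the ZK-host fix discussed above boils down to bracketing IPv6 literals before joining them with a port. A minimal Python sketch of that quoting logic — the function name `zk_host_entry` and the default port are illustrative, not taken from the actual nodepool-base change:]

```python
import ipaddress


def zk_host_entry(host: str, port: int = 2181) -> str:
    """Return a host:port string, bracketing IPv6 literals.

    Without brackets, an IPv6 address like 2001:db8::1 joined with a
    port is ambiguous (which colon separates the port?), which is the
    kind of breakage the nb04 builders hit.
    """
    try:
        if ipaddress.ip_address(host).version == 6:
            return f"[{host}]:{port}"
    except ValueError:
        pass  # not an IP literal: a hostname, fall through
    return f"{host}:{port}"


print(zk_host_entry("2001:db8::1"))   # [2001:db8::1]:2181
print(zk_host_entry("zk01.example"))  # zk01.example:2181
```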
clarkbcorvus: also worth noting the .so version is different between xenial and our docker containers. It could be a bug in jemalloc23:35
clarkbor a bug in python using jemalloc23:35
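[editor's note: the debugging above hinges on whether jemalloc is actually interposed via LD_PRELOAD in a given process. A small Python sketch (not from the log) of how one might verify that from inside a running process on Linux:]

```python
import os


def jemalloc_loaded() -> bool:
    """Check /proc/self/maps for a mapped libjemalloc shared object.

    LD_PRELOAD only requests the interposition; inspecting the process
    memory maps confirms the library was really loaded.
    """
    try:
        with open("/proc/self/maps") as maps:
            return any("libjemalloc" in line for line in maps)
    except FileNotFoundError:
        return False  # not Linux, no procfs


# Unset/empty LD_PRELOAD means the default glibc malloc is in use.
print("LD_PRELOAD =", os.environ.get("LD_PRELOAD", "<unset>"))
print("jemalloc mapped:", jemalloc_loaded())
```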
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: use python2-pip when running under Python 2
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: Install backported pip for Xenial
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Add plain nodes to testing
*** calcmandan has quit IRC23:45
*** calcmandan has joined #opendev23:46
ianwit doesn't seem 725157 deployed itself on nb04 ... looking23:46
ianw was the last nodepool hourly run : @ 2020-05-04T23:04:1423:48
ianwok, promote missed it
