Monday, 2021-09-27

<opendevreview> Steve Baker proposed openstack/diskimage-builder master: Fix lower constraints
<opendevreview> Steve Baker proposed openstack/diskimage-builder master: Fix lower constraints
<opendevreview> Steve Baker proposed openstack/diskimage-builder master: Fix lower constraints
[04:43] *** ysandeep|out is now known as ysandeep
[07:29] *** jpena|off is now known as jpena
[07:56] *** ykarel is now known as ykarel|lunch
[08:37] *** ysandeep is now known as ysandeep|lunch
[09:00] *** ykarel|lunch is now known as ykarel
[09:30] *** ysandeep|lunch is now known as ysandeep
[09:32] *** bhagyashris is now known as bhagyashris|rover
[09:53] *** ykarel is now known as ykarel|afk
[10:12] *** mazzy5 is now known as mazzy
[10:38] *** bhagyashris is now known as bhagyashris|rover
[10:53] *** ykarel|afk is now known as ykarel
[11:30] *** dviroel|out is now known as dviroel
[11:32] *** jpena is now known as jpena|lunch
[12:25] *** jpena|lunch is now known as jpena
[13:06] *** marios is now known as marios|call
[13:59] *** marios|call is now known as marios
<opendevreview> Danni Shi proposed openstack/diskimage-builder master: Update keylime-agent and tpm-emulator elements
[15:09] *** redrobot is now known as Guest1129
<opendevreview> Dr. Jens Harbott proposed openstack/project-config master: Fix neutron-dynamic-routing grafana dashboard
[15:33] <clarkb> fungi: catching up now, do you think we are still good to approve and this morning with plans to restart gerrit on the updated theme later today?
[15:35] <fungi> clarkb: yeah, i think it should be fine, i expect low impact and openstack hasn't really reached a critical point in the xena release work where i'd be uncomfortable restarting for those
[15:35] <clarkb> fungi: should also be safe at this point (but not its children) if you want to review and possibly approve that one
[15:35] <clarkb> that is the gitea 1.14.7 upgrade
[15:36] <clarkb> fungi: ok I'll approve the gerrit theme changes now
[15:37] <clarkb> corvus: on the zuul side of things I think we're running a locally hacked pin back to 4.9.0? Do we need to undo or clean up anything related to that to go back to normal?
[15:37] <clarkb> I think the expected fixes have all landed into the zuul codebase at this point
<opendevreview> Merged openstack/project-config master: Fix neutron-dynamic-routing grafana dashboard
[15:41] <clarkb> Looking at zuul queues things seem generally happy. No large fallout issues like Friday :)
[15:41] <corvus> clarkb: we just tagged the images locally, so it's probably already auto-reverted (but we haven't restarted), but regardless, i'll run the pull playbook before restarting. and yeah, i'd like to do that today
[15:42] <clarkb> got it so docker tags unlike git tags will move appropriately on pull
[15:43] <corvus> yeah they behave more like git branches as far as that goes
[15:44] <clarkb> gerrit replication queues look sane too. At this point I think we can consider that issue addressed (thank you mnaser!)
[15:47] *** marios is now known as marios|out
[16:12] *** ysandeep is now known as ysandeep|out
[16:37] <mnaser> infra-root: does anyone know where i can find the script that runs to generate the wheels for mirrors?
[16:37] <clarkb> I always have to dig it up. Give me a couple minutes and I should be able to find it
[16:38] *** jpena is now known as jpena|off
[16:40] <mnaser> bash magic, thank you clarkb
[16:42] <fungi> is the meat of it
[16:42] <fungi> in case you want to look at some python magic
<opendevreview> Merged opendev/system-config master: Upgrade gitea to 1.14.7
<opendevreview> Merged opendev/system-config master: gerrit: copy static files directly into container image
<opendevreview> Merged opendev/system-config master: gerrit: host logo in static files
[17:39] <clarkb> the gitea upgrades have completed and look happy
[17:51] <clarkb> corvus: do we need to land before doing zuul restarts?
[17:58] <corvus> clarkb: i think it's not important, but let's go ahead and land it. i'd like to afk for a bit then maybe restart 30m from now or later...
[17:58] <corvus> i +wd it
[18:01] <clarkb> Releases looks quiet but I'll let them know we're likely to restart zuul and gerrit this afternoon pst
[18:04] <fungi> #status log Deleted,,, and at smarcet's request, after making a snapshot image of openstackid01 for posterity
[18:04] <opendevstatus> fungi: finished logging
[18:07] <clarkb> fungi: the review updates have happened. We should be good to restart gerrit whenever external factors are happy for it
[18:08] <clarkb> we might want to do a docker compose pull first just to double check that, but ya should be ready
<opendevreview> Merged openstack/project-config master: Remove from our zuul config
[18:08] <fungi> cool, i'm doing some engagement stats reporting with long-running queries against the server, but can rerun them later
[18:08] <clarkb> selfishly I'm thinking after lunch so that I can eat lunch without any potential impacts
[18:08] <clarkb> fungi: oh I'm happy to wait for those to finish
[18:08] <fungi> ahh, yeah that works too
[18:09] <clarkb> cool I'll check in around then and we can make sure your queries are done
[18:09] <fungi> but yeah, lmk when you're ready, and i'll do the docker pull
[18:09] <clarkb> and I've notified the openstack release team that gerrit and zuul restarts are likely in the near future. Will give them an update when we're ready to do each
[18:10] <fungi> "opendevorg/gerrit   3.2       43cea155d566   2 hours ago    793MB"
<opendevreview> Marco Vaschetto proposed openstack/diskimage-builder master: Allowing ubuntu element use local image
[19:29] <clarkb> ok lunch has been consumed. I should be in a good spot to do restarts in a few minutes if nowish works?
[19:45] <clarkb> fungi: corvus: do we want to try and do zuul and gerrit together or just go ahead and do gerrit since we're basically ready and it is quick?
<opendevreview> Clark Boylan proposed opendev/system-config master: Upgrade gitea to 1.15.3
<opendevreview> Clark Boylan proposed opendev/system-config master: DNM force gitea failure for interaction
[19:56] <fungi> i'm on hand to help with it now, my queries are finished
[19:57] <clarkb> cool maybe we just do gerrit really quickly? since I don't think we really need to coordinate that one with zuul?
[19:58] <fungi> yeah, want me to restart it or are you doing that?
[19:58] <clarkb> fungi: if you want to do it go for it. Should we do a pull first?
[19:58] <fungi> i can do another just to be sure
[19:59] <fungi> "opendevorg/gerrit   3.2       43cea155d566   4 hours ago    793MB"
[19:59] <fungi> same one i pulled earlier
[19:59] <clarkb> but there is another from 4 hours ago. Now I wonder if we tagged them out of sequence?
[19:59] <clarkb> let's double check on that really quickly
[20:00] <fungi> status notice The Gerrit service on is being restarted briefly for configuration updates and should return to service momentarily
[20:00] <fungi> that's what i propose we notify with when we restart
[20:01] <clarkb> shows the change was 809513 which is the second in the sequence
[20:01] <clarkb> I always struggle to map that to the image id that docker image list shows (you'd think being able to match an image locally to an image on the repo would be a required feature of such a system)
[20:02] <clarkb> ok if you image inspect then you can get the sha256
[20:02] <clarkb> that image tagged 3.2 is for the second change in the sequence. We should be good.
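The mapping clarkb describes works because `docker image list` shows only the first 12 hex digits of the local image ID, while a registry lists the manifest digest, which `docker image inspect` records under `RepoDigests` as `repo@sha256:...`. The sketch below parses inspect-style JSON to pull out both values; it assumes the output shape `docker image inspect` prints (a list of objects with `Id`, `RepoTags`, `RepoDigests`), and the digests in the sample are made-up placeholders, not the real opendevorg/gerrit values.

```python
import json


def short_id(image_id: str) -> str:
    """Return the 12-character ID that `docker image list` shows."""
    # Image IDs look like "sha256:<64 hex chars>"; the list view truncates them.
    return image_id.split(":", 1)[1][:12]


def repo_digests(inspect_json: str, repo: str) -> list[str]:
    """Return the registry (manifest) digests recorded for `repo`."""
    digests = []
    for image in json.loads(inspect_json):
        # RepoDigests entries have the form "<repo>@sha256:<hex>".
        for ref in image.get("RepoDigests") or []:
            name, _, digest = ref.partition("@")
            if name == repo:
                digests.append(digest)
    return digests


# Hypothetical inspect output: only the field names mirror real docker output.
sample = json.dumps([{
    "Id": "sha256:" + "1" * 64,
    "RepoTags": ["opendevorg/gerrit:3.2"],
    "RepoDigests": ["opendevorg/gerrit@sha256:" + "ab" * 32],
}])

image = json.loads(sample)[0]
print(short_id(image["Id"]))
print(repo_digests(sample, "opendevorg/gerrit"))
```

On a real host the input would be the text printed by `docker image inspect opendevorg/gerrit:3.2`, and the digest returned can be compared with the one the registry lists for the tag.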
[20:02] <clarkb> fungi: that notice lgtm
[20:03] <corvus> clarkb: fungi: i'm here and ready to restart zuul
[20:03] <corvus> it looks like you have not started gerrit restart?
[20:03] <clarkb> corvus: correct gerrit hasn't happened yet as far as I can tell
[20:04] <clarkb> But I think we are ready to do gerrit, can wait to coordinate with zuul
[20:04] <clarkb> fungi: ^
[20:04] <fungi> yeah, i can wait if we want to sync them
[20:05] <corvus> i'm ready now; i'll wait for fungi to give me a 'go' signal?
[20:05] <fungi> shall we tweak the status notice?
[20:06] <fungi> status notice Gerrit and Zuul services are being restarted briefly for configuration and code updates but should return to service momentarily
[20:06] <fungi> something like that?
[20:07] <fungi> corvus: is there any sequencing we need to do between zuul and gerrit?
[20:07] <fungi> like do i stop and start gerrit while you're restarting zuul, or do i need to wait for zuul to go down, do i need to pause between stopping and starting?
[20:07] <corvus> i think: stop zuul, stop gerrit, start gerrit, start zuul
[20:08] <fungi> perfect, i'll send the status notice now and corvus you can go ahead with stopping zuul
[20:08] <fungi> #status notice Gerrit and Zuul services are being restarted briefly for configuration and code updates but should return to service momentarily
[20:08] <opendevstatus> fungi: sending notice
[20:08] -opendevstatus- NOTICE: Gerrit and Zuul services are being restarted briefly for configuration and code updates but should return to service momentarily
[20:08] <fungi> i have the command readied to stop gerrit once you confirm it's safe to do so
[20:09] <corvus> fungi: zuul is stopped
[20:09] <fungi> stopping and starting gerrit
[20:09] <fungi> gerrit should be on its way back up now
[20:10] <fungi> gerrit webui is loading for me
[20:10] <fungi> clarkb: ^ lgty?
[20:10] <clarkb> I think gerrit is happy except it isn't loading the logo as expected
[20:10] <clarkb> that isn't fatal and we can probably proceed
[20:11] <fungi> yeah, i'm seeing the same which is why i asked
[20:11] <clarkb> we should check the cla loads?
[20:12] <corvus> i'm standing by with zuul (as i think a gerrit restart while zuul is starting would be very messy)
[20:12] <clarkb> ya I see the issue
[20:12] <clarkb> we put the files in /var/gerrit/ and not /var/gerrit/static
[20:12] <fungi> can we fix it up on the fly with the docker-compose file or do we need to roll back to the old image?
[20:12] <clarkb> fungi: I think we can mv them in the container image itself
[20:12] <clarkb> then work on a fix with a new image
[20:13] <clarkb> what I don't understand is we tested this I thought
[20:13] <clarkb> like literally have a test to fetch the file
[20:13] <fungi> okay, so roll forward with the zuul start and we can tweak the gerrit container's fs live for now?
[20:14] <clarkb> ya I think so. Let me copy the files now
[20:14] <corvus> okay, starting zuul now
[20:14] <fungi> and yeah, looks like we could just docker exec gerrit mv ...
[20:15] <clarkb> ya I did that but I did cp so we can see where the files ended up in the build
[20:15] <corvus> i think we may need to clear the zk state
[20:15] <clarkb> loads now
[20:15] <clarkb> and the logo loads too
[20:15] <fungi> and the logo, yep
[20:16] <fungi> corvus: need help with the zk flush?
[20:16] <clarkb> cla.html  opendev-sm.png  robots.txt  system-cla.html  usg-cla.html <- those are the files I copied to static/
[20:16] <corvus> what's the best way to run a "zuul" command when the container isn't running? :)
[20:16] <corvus> i guess i'm going to need to do a "docker run --rm " with the image
[20:16] <clarkb> ya that sounds right
[20:17] <corvus> oh but we also need the volumes mounted....
[20:17] <corvus> so maybe a docker-compose run
[20:18] <corvus> docker-compose run --rm scheduler zuul --help works
[20:18] <clarkb> re testing we did the test for paste but not for review
[20:18] <corvus> so i will run docker-compose run scheduler zuul delete-state
[20:19] <fungi> corvus: sounds good, thanks
[20:19] <corvus> rather: docker-compose run --rm scheduler zuul delete-state
[20:19] <corvus> running; it's not fast.
[20:22] <corvus> oops, i think i forgot to stop the rest of zuul when running that; i'll run it again
[20:22] <corvus> (i got a kazoo NotEmptyError)
[20:22] <fungi> that does seem to imply zuul was in the process of trying to start while you were deleting znodes
[20:23] <fungi> at least some of the daemons
[20:24] <fungi> (presumably executors/mergers?)
[20:24] <corvus> re-run finished without error, will restart all of zuul now
[20:25] <corvus> now we know that can take about 5 minutes
[20:25] <fungi> could be worse
[20:26] <corvus> this may be worth a note to zuul-discuss, to let folks know if they ran master after the last release, they may need to do the same
[20:26] <fungi> once everything's back up, i'm curious to know what indicated the need to flush zk
[20:26] <corvus> we have a few mins, i'll answer now :)
[20:29] <corvus> fungi: scheduler crashed with this as the last message:
[20:29] <fungi> ahh, okay
[20:29] <corvus> i think we added a field to a zk object, and i don't think we have a fallback for if that isn't there
[20:29] <corvus> we also changed the change cache schema, which also probably would have caused an error, but we didn't make it that far
[20:29] <fungi> is the expected way of dealing with that to have the code be backward-compatible, or to run something akin to a db migration?
[20:30] <clarkb> I've got a fix for the gerrit images that I'll push up once zuul has loaded configs
[20:30] <fungi> thanks clarkb!
[20:30] <corvus> fungi: the first i think; but we're not putting a lot of effort into that until we get to being able to run 2 schedulers
[20:30] <fungi> yeah, i meant down the road when these changes become less frequent
[20:31] <fungi> obviously at the moment that would result in a lot of dead code
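corvus describes the crash as reading a ZooKeeper-stored object that lacks a newly added field, and names backward-compatible code as the preferred fix. A minimal sketch of that "fallback for if that isn't there" is below, covering only the deserialization step. The class name `QueueItemState` and its fields are hypothetical illustrations, not Zuul's actual data model, and no ZooKeeper client is involved: the record is just JSON bytes of the kind a znode might hold.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QueueItemState:
    """Hypothetical record serialized as JSON into a ZooKeeper znode."""
    item_id: str
    pipeline: str
    # Field added in a newer release; records written by older code lack it.
    priority: int = 0

    def to_bytes(self) -> bytes:
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "QueueItemState":
        data = json.loads(raw.decode("utf-8"))
        # Use a default instead of raising KeyError when the record was
        # written before the new field existed.
        return cls(
            item_id=data["item_id"],
            pipeline=data["pipeline"],
            priority=data.get("priority", 0),
        )


# A record as an older writer would have stored it (no "priority" key).
old_record = json.dumps({"item_id": "abc123", "pipeline": "check"}).encode("utf-8")
state = QueueItemState.from_bytes(old_record)
print(state.priority)  # 0
```

The migration alternative fungi mentions would instead rewrite every stored record once; the reader-side default avoids that, at the cost of carrying fallback code until old records are gone (the dead-code concern raised above).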
[20:37] <clarkb> looks like pypa, vexxhost, and zuul tenants have loaded but not openstack or opendev yet?
[20:38] <clarkb> debug log shows it is still submitting merge requests
[20:40] <fungi> and not choking on github projects i guess
[20:40] <clarkb> not that I've seen yet. I did notice but I don't think that is fatal
[20:41] <clarkb> corvus: ^ some statsd bug
[20:41] <corvus> yeah, just happens on startup
[20:41] <corvus> should fix, but is low-priority noise
[20:49] <clarkb> it seems to be running through the config loads in alpha sorted order and is to the openstack/s's
[20:49] <clarkb> now t's
[20:51] <clarkb> and now the x/'s
[20:52] <corvus> maybe it's done now?
[20:52] <clarkb> thinking out loud here: two things that may help make this faster 1) don't restart gerrit as it has to repopulate its caches? also 2) prune the zuul main.yaml config to remove unused projects?
[20:52] <clarkb> doesn't show it yet
[20:53] <corvus> nope i was wrong; still cat jobs
[20:54] <clarkb> openstack is done but not opendev yet
[20:54] <corvus> clarkb: it's expected to be slow since we deleted the zuul cache
[20:55] <clarkb> I think it is done now
<opendevreview> Clark Boylan proposed opendev/system-config master: Properly copy gerrit static files
[20:55] <clarkb> that should fix the gerrit image's static/ dir contents. With testing too
[20:57] <corvus> okay now i think it's really done?
[21:11] <corvus> #status log restarted all of zuul on 0928c397937da4129122b00d2288e582bc46aabc
[21:11] <opendevstatus> corvus: finished logging
[21:17] <fungi> thanks corvus!
[21:29] *** dviroel is now known as dviroel|out
[22:01] <clarkb> that is a fun one. my logo fix change fails in testing because I curl the actual binary data and it contains content that is not valid utf8, which testinfra tries to auto convert
[22:03] <fungi> heh. adding the test for paste was easier since the logo there is svg
<opendevreview> Clark Boylan proposed opendev/system-config master: Properly copy gerrit static files
[22:04] <clarkb> I expect that will work well enough. Just using head as i don't care too much about the actual bytes just that the file exists and has the right file type
[22:08] <fungi> could even just rely on the result code for that one, since the cla gets checked for actual content
[22:09] <clarkb> ya though curl return codes are weird
[22:09] <clarkb> iirc you get a 0 return if you get a 404 because the http request functioned
[22:15] <ianw> might be able to use requests library directly, i think there's an example
[22:17] <clarkb> I think this is fine :) it will tell us if the file exists or not and give us the encoding
[22:20] <fungi> well, i meant checking for an http 200 response code, not curl's exit code
[22:20] <clarkb> ya that is what the current code does
[22:21] <fungi> right, what i was saying was you could just stick with that and drop the attempt to parse content from the png file itself
[22:21] <fungi> the cla.html test checks file content, so it's kinda already covered
[22:21] <clarkb> fungi: oh right, I wasn't actually trying to parse content from the png itself. testinfra was doing that because I examined the stdout object
[22:21] <clarkb> I solved that by only requesting the head and not get'ing the file
[22:21] <ianw> sorry really should have added testinfra originally
[22:21] <clarkb> then stdout is valid utf8
[22:21] <fungi> now i see what you're saying
[22:22] <clarkb> ianw: no worries, was a quick fix in production and we should have new images soon enough
[22:22] <fungi> i hadn't reviewed the revision yet
[22:22] <fungi> looking now
[22:22] <fungi> yep, lgtm
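The check settled on above has two parts: look at the HTTP status code rather than curl's exit code (curl without `--fail` exits 0 on a 404, because the transfer itself worked), and send a HEAD request so no binary PNG body lands in output that later gets decoded as UTF-8. A minimal sketch using only the Python standard library is below. It is not the actual testinfra test in system-config: it serves a throwaway temporary directory with `http.server`, and the file contents are a placeholder that merely starts with the PNG signature.

```python
import http.server
import tempfile
import threading
import urllib.error
import urllib.request
from functools import partial
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # not valid UTF-8: 0x89 is not a start byte


def head_status(url: str) -> tuple[int, str]:
    """Send a HEAD request and return (status code, Content-Type)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; report the status rather than treating
        # "the HTTP exchange completed" as success.
        return err.code, err.headers.get("Content-Type", "")


def serve_directory(root: str) -> http.server.ThreadingHTTPServer:
    """Serve `root` on an ephemeral localhost port in a background thread."""
    handler = partial(http.server.SimpleHTTPRequestHandler, directory=root)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


with tempfile.TemporaryDirectory() as root:
    # Placeholder stand-in for a file in gerrit's static/ directory.
    Path(root, "opendev-sm.png").write_bytes(PNG_SIGNATURE + b"\x00" * 16)
    server = serve_directory(root)
    base = f"http://127.0.0.1:{server.server_address[1]}"
    ok = head_status(f"{base}/opendev-sm.png")
    missing = head_status(f"{base}/missing.png")
    server.shutdown()
    server.server_close()

print(ok)          # (200, 'image/png')
print(missing[0])  # 404
```

The HEAD response still carries the Content-Type header, which covers "the file exists and has the right file type" without ever touching the binary body.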
[22:26] <clarkb> also devstack has another halt the world issue, but it appears to be outside of zuul configs this time. I've offered to help holding nodes etc if that would be useful but I think the nova and placement devs may need to look at this one
[22:28] <clarkb> corvus: infra-root: I think the zuul zk clearing removed all autoholds. I'm not sure if that removed any held nodes in nodepool
[22:29] <fungi> i can check
[22:29] <ianw> i still have a connection open to at least one old held node
[22:29] <ianw> 104.130.13.181 for reference
[22:29] <clarkb> ya looking on nl01 it didn't clear the nodes
[22:29] <clarkb> just the autohold records on the zuul side
[22:29] <fungi> there are still a bunch of nodes held
[22:30] <clarkb> This means you'll need to manually delete the nodes this time around
[22:30] <clarkb> rather than just clearing the autohold in zuul
[22:30] <fungi> noted, thanks
[22:30] <corvus> oh yep, that makes sense.
[22:30] <ianw> this has all my mitmproxy setup etc for zuul-registry -- i think i might trust all that enough now to get rid of those nodes :)
[22:32] <clarkb> fungi: did we status log the gerrit restart with replication timeouts and updated theming?
[22:32] <fungi> i don't recall
[22:33] <clarkb> Doesn't look like it. Do you think that is worthwhile?
[22:34] <fungi> not finding it
[22:34] <fungi> nah, it's been long enough nobody's likely wondering why it was down for two minutes
[22:40] <clarkb> also it seems like jobs queue up pretty quickly when new patchsets are pushed
[22:41] <clarkb> last week's issue appears to have been mitigated
[23:08] <clarkb> fungi: I'm going to drop the mailman server upgrade topic from the agenda now that we can track that work through your spec
[23:11] <ianw> looks like the gerrit build timed out?
[23:12] <clarkb> hrm it ran on inmotion so we may still be having general throughput issues there
[23:12] <ianw> doesn't seem like anything in particular, just being slow? it was on inmotion i see
[23:13] <clarkb> yuriys: one thing that just occurred to me is maybe we should double check we aren't using qemu emulation but are using kvm?
[23:13] <clarkb> ianw: ya I think we can recheck it
[23:40] <corvus> running fungi's zk-shell command from last week, i confirm the gerrit event queue is mostly empty, so we're keeping up with events.
[23:41] <corvus> (i managed to catch a few events sitting in it right after a burst, then they were gone on the next run of the command, so i think i confirmed that's also still a valid way to check on this)
