opendevreview | Steve Baker proposed openstack/diskimage-builder master: Fix lower constraints https://review.opendev.org/c/openstack/diskimage-builder/+/811048 | 01:01 |
---|---|---|
opendevreview | Steve Baker proposed openstack/diskimage-builder master: Fix lower constraints https://review.opendev.org/c/openstack/diskimage-builder/+/811048 | 02:43 |
opendevreview | Steve Baker proposed openstack/diskimage-builder master: Fix lower constraints https://review.opendev.org/c/openstack/diskimage-builder/+/811048 | 03:03 |
*** ysandeep|out is now known as ysandeep | 04:43 | |
*** jpena|off is now known as jpena | 07:29 | |
*** ykarel is now known as ykarel|lunch | 07:56 | |
*** ysandeep is now known as ysandeep|lunch | 08:37 | |
*** ykarel|lunch is now known as ykarel | 09:00 | |
*** ysandeep|lunch is now known as ysandeep | 09:30 | |
*** bhagyashris is now known as bhagyashris|rover | 09:32 | |
*** ykarel is now known as ykarel|afk | 09:53 | |
*** mazzy5 is now known as mazzy | 10:12 | |
*** bhagyashris is now known as bhagyashris|rover | 10:38 | |
*** ykarel|afk is now known as ykarel | 10:53 | |
*** dviroel|out is now known as dviroel | 11:30 | |
*** jpena is now known as jpena|lunch | 11:32 | |
*** jpena|lunch is now known as jpena | 12:25 | |
*** marios is now known as marios|call | 13:06 | |
*** marios|call is now known as marios | 13:59 | |
opendevreview | Danni Shi proposed openstack/diskimage-builder master: Update keylime-agent and tpm-emulator elements https://review.opendev.org/c/openstack/diskimage-builder/+/810254 | 14:54 |
*** redrobot is now known as Guest1129 | 15:09 | |
opendevreview | Dr. Jens Harbott proposed openstack/project-config master: Fix neutron-dynamic-routing grafana dashboard https://review.opendev.org/c/openstack/project-config/+/811182 | 15:13 |
clarkb | fungi: catching up now, do you think we are still good to approve https://review.opendev.org/c/opendev/system-config/+/809512 and https://review.opendev.org/c/opendev/system-config/+/809513 this morning with plans to restart gerrit on the updated theme later today? | 15:33 |
fungi | clarkb: yeah, i think it should be fine, i expect low impact and openstack hasn't really reached a critical point in the xena release work where i'd be uncomfortable restarting for those | 15:35 |
clarkb | fungi: https://review.opendev.org/c/opendev/system-config/+/810303 should also be safe at this point (but not its children) if you want to review and possibly approve that one | 15:35 |
clarkb | that is the gitea 1.14.7 upgrade | 15:35 |
clarkb | fungi: ok I'll approve the gerrit theme changes now | 15:36 |
fungi | thanks! | 15:37 |
clarkb | corvus: on the zuul side of things I think we're running a locally hacked pin back to 4.9.0? Do we need to undo or clean up anything related to that to go back to normal? | 15:37 |
clarkb | I think the expected fixes have all landed into the zuul codebase at this point | 15:37 |
opendevreview | Merged openstack/project-config master: Fix neutron-dynamic-routing grafana dashboard https://review.opendev.org/c/openstack/project-config/+/811182 | 15:40 |
clarkb | Looking at zuul queues things seem generally happy. No large fallout issues like Friday :) | 15:41 |
corvus | clarkb: we just tagged the images locally, so it's probably already auto-reverted (but we haven't restarted), but regardless, i'll run the pull playbook before restarting. and yeah, i'd like to do that today | 15:41 |
clarkb | got it so docker tags unlike git tags will move appropriately on pull | 15:42 |
corvus | yeah they behave more like git branches as far as that goes | 15:43 |
clarkb | gerrit replication queues look sane too. At this point I think we can consider that issue addressed (thank you mnaser!) | 15:44 |
mnaser | \o/ | 15:45 |
*** marios is now known as marios|out | 15:47 | |
*** ysandeep is now known as ysandeep|out | 16:12 | |
mnaser | infra-root: does anyone know where i can find the script that runs to generate the wheels for mirrors? | 16:37 |
clarkb | I always have to dig it up. Give me a couple minutes and I should be able to find it | 16:37 |
*** jpena is now known as jpena|off | 16:38 | |
clarkb | mnaser: https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/roles/build-wheel-cache/files/wheel-build.sh | 16:39 |
mnaser | bash magic, thank you clarkb | 16:40 |
fungi | https://opendev.org/openstack/project-config/src/branch/master/roles/copy-wheels/files/wheel-indexer.py is the meat of it | 16:42 |
fungi | in case you want to look at some python magic | 16:42 |
opendevreview | Merged opendev/system-config master: Upgrade gitea to 1.14.7 https://review.opendev.org/c/opendev/system-config/+/810303 | 16:54 |
opendevreview | Merged opendev/system-config master: gerrit: copy static files directly into container image https://review.opendev.org/c/opendev/system-config/+/809512 | 16:54 |
opendevreview | Merged opendev/system-config master: gerrit: host logo in static files https://review.opendev.org/c/opendev/system-config/+/809513 | 16:54 |
clarkb | the gitea upgrades have completed and look happy | 17:39 |
clarkb | corvus: do we need to land https://review.opendev.org/c/openstack/project-config/+/810530 before doing zuul restarts? | 17:51 |
corvus | clarkb: i think it's not important, but let's go ahead and land it. i'd like to afk for a bit then maybe restart 30m from now or later... | 17:58 |
clarkb | ok | 17:58 |
corvus | i +wd it | 17:58 |
clarkb | thanks | 17:58 |
clarkb | Releases looks quiet but I'll let them know we're likely to restart zuul and gerrit this afternoon pst | 18:01 |
fungi | #status log Deleted openstackid01.openstack.org, openstackid02.openstack.org, openstackid03.openstack.org, and openstackid-dev01.openstack.org at smarcet's request, after making a snapshot image of openstackid01 for posterity | 18:04 |
opendevstatus | fungi: finished logging | 18:04 |
clarkb | fungi: the review updates have happened. We should be good to restart gerrit whenever external factors are happy for it | 18:07 |
clarkb | we might want to do a docker compose pull first just to double check that, but ya should be ready | 18:08 |
opendevreview | Merged openstack/project-config master: Remove github3.py from our zuul config https://review.opendev.org/c/openstack/project-config/+/810530 | 18:08 |
fungi | cool, i'm doing some engagement stats reporting with long-running queries against the server, but can rerun them later | 18:08 |
clarkb | selfishly I'm thinking after lunch so that I can eat lunch without any potential impacts | 18:08 |
clarkb | fungi: oh I'm happy to wait for those to finish | 18:08 |
fungi | ahh, yeah that works too | 18:08 |
clarkb | cool I'll check in around then and we can make sure your queries are done | 18:09 |
fungi | but yeah, lmk when you're ready, and i'll do the docker pull | 18:09 |
clarkb | and I've notified the openstack release team that gerrit and zuul restarts are likely in the near future. Will give them an update when we're ready to do each | 18:09 |
fungi | "opendevorg/gerrit 3.2 43cea155d566 2 hours ago 793MB" | 18:10 |
opendevreview | Marco Vaschetto proposed openstack/diskimage-builder master: Allowing ubuntu element use local image https://review.opendev.org/c/openstack/diskimage-builder/+/809009 | 19:00 |
clarkb | ok lunch has been consumed. I should be in a good spot to do restarts in a few minutes if nowish works? | 19:29 |
clarkb | fungi: corvus: do we want to try and do zuul and gerrit together or just go ahead and do gerrit since we're basically ready and it is quick? | 19:45 |
opendevreview | Clark Boylan proposed opendev/system-config master: Upgrade gitea to 1.15.3 https://review.opendev.org/c/opendev/system-config/+/803231 | 19:56 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM force gitea failure for interaction https://review.opendev.org/c/opendev/system-config/+/800516 | 19:56 |
fungi | i'm on hand to help with it now, my queries are finished | 19:56 |
clarkb | cool maybe we just do gerrit really quickly? since I don't think we really need to coordinate that one with zuul? | 19:57 |
fungi | yeah, want me to restart it or are you doing that? | 19:58 |
clarkb | fungi: if you want to do it go for it. Should we do a pull first? | 19:58 |
fungi | i can do another just to be sure | 19:58 |
clarkb | ++ | 19:58 |
fungi | "opendevorg/gerrit 3.2 43cea155d566 4 hours ago 793MB" | 19:59 |
fungi | same one i pulled earlier | 19:59 |
clarkb | but there is another from 4 hours ago. Now I wonder if we tagged them out of sequence? | 19:59 |
clarkb | let's double check on that really quickly | 19:59 |
fungi | status notice The Gerrit service on review.opendev.org is being restarted briefly for configuration updates and should return to service momentarily | 20:00 |
fungi | that's what i propose we notify with when we restart | 20:00 |
clarkb | https://hub.docker.com/layers/opendevorg/gerrit/3.2/images/sha256-82e111fbd033cd0b0978a506f3d3c0797cdbe06216818b48e74152e3426bf048?context=explore shows the change was 809513 which is the second in the sequence | 20:01 |
clarkb | I always struggle to map that to the image id that docker image list shows (you'd think being able to match an image locally to an image on the repo would be a required feature of such a system) | 20:01 |
clarkb | ok if you image inspect then you can get the sha256 | 20:02 |
clarkb | that image tagged 3.2 is for the second change in the sequence. We should be good. | 20:02 |
clarkb | fungi: that notice lgtm | 20:02 |
corvus | clarkb: fungi i'm here and ready to restart zuul | 20:03 |
corvus | it looks like you have not started gerrit restart? | 20:03 |
clarkb | corvus: correct gerrit hasn't happened yet as far as I can tell | 20:03 |
clarkb | But I think we are ready to do gerrit, can wait to coordinate with zuul | 20:04 |
clarkb | fungi: ^ | 20:04 |
fungi | yeah, i can wait if we want to sync them | 20:04 |
corvus | i'm ready now; i'll wait for fungi to give me a 'go' signal? | 20:05 |
fungi | shall we tweak the status notice? | 20:05 |
fungi | status notice Gerrit and Zuul services are being restarted briefly for configuration and code updates but should return to service momentarily | 20:06 |
fungi | something like that? | 20:06 |
clarkb | lgtm | 20:06 |
fungi | corvus: is there any sequencing we need to do between zuul and gerrit? | 20:07 |
fungi | like do i stop and start gerrit while you're restarting zuul, or do i need to wait for zuul to go down, do i need to pause between stopping and starting? | 20:07 |
corvus | i think: stop zuul, stop gerrit, start gerrit, start zuul | 20:07 |
fungi | perfect, i'll send the status notice now and corvus you can go ahead with stopping zuul | 20:08 |
corvus | ack | 20:08 |
fungi | #status notice Gerrit and Zuul services are being restarted briefly for configuration and code updates but should return to service momentarily | 20:08 |
opendevstatus | fungi: sending notice | 20:08 |
-opendevstatus- NOTICE: Gerrit and Zuul services are being restarted briefly for configuration and code updates but should return to service momentarily | 20:08 | |
fungi | i have the command readied to stop gerrit once you confirm it's safe to do so | 20:08 |
corvus | fungi: zuul is stopped | 20:09 |
fungi | stopping and starting gerrit | 20:09 |
fungi | gerrit should be on its way back up now | 20:09 |
fungi | gerrit webui is loading for me | 20:10 |
fungi | clarkb: ^ lgty? | 20:10 |
clarkb | I think gerrit is happy except it isn't loading the logo as expected | 20:10 |
clarkb | that isn't fatal and we can probably proceed | 20:10 |
fungi | yeah, i'm seeing the same which is why i asked | 20:11 |
clarkb | we should check the cla loads? | 20:11 |
corvus | i'm standing by with zuul (as i think a gerrit restart while zuul is starting would be very messy) | 20:12 |
fungi | https://review.opendev.org/static/cla.html | 20:12 |
clarkb | ya I see the issue | 20:12 |
clarkb | we put the files in /var/gerrit/ and not /var/gerrit/static | 20:12 |
fungi | can we fix it up on the fly with the docker-compose file or do we need to roll back to the old image? | 20:12 |
clarkb | fungi: I think we can mv them in the container image itself | 20:12 |
clarkb | then work on a fix with a new image | 20:12 |
clarkb | what I don't understand is we tested this I thought | 20:13 |
clarkb | like literally have a test to fetch the file | 20:13 |
fungi | okay, so roll forward with the zuul start and we can tweak the gerrit container's fs live for now? | 20:13 |
clarkb | ya I think so. Let me copy the files now | 20:14 |
corvus | okay, starting zuul now | 20:14 |
fungi | thanks! | 20:14 |
fungi | and yeah, looks like we could just docker exec gerrit mv ... | 20:14 |
clarkb | ya I did that but I did cp so we can see where the files ended up in the build | 20:15 |
corvus | i think we may need to clear the zk state | 20:15 |
clarkb | https://review.opendev.org/static/cla.html loads now | 20:15 |
clarkb | and the logo loads too | 20:15 |
fungi | and the logo, yep | 20:15 |
fungi | corvus: need help with the zk flush? | 20:16 |
clarkb | cla.html opendev-sm.png robots.txt system-cla.html usg-cla.html <- those are the files I copied to static/ | 20:16 |
fungi | perfect | 20:16 |
corvus | what's the best way to run a "zuul" command when the container isn't running? :) | 20:16 |
corvus | i guess i'm going to need to do a "docker run --rm " with the image | 20:16 |
clarkb | ya that sounds right | 20:16 |
corvus | oh but we also need the volumes mounted.... | 20:17 |
corvus | so maybe a docker-compose run | 20:17 |
corvus | docker-compose run --rm scheduler zuul --help works | 20:18 |
clarkb | re testing we did the test for paste but not for review | 20:18 |
fungi | ohh | 20:18 |
corvus | so i will run docker-compose run scheduler zuul delete-state | 20:18 |
fungi | corvus: sounds good, thanks | 20:19 |
corvus | rather: docker-compose run --rm scheduler zuul delete-state | 20:19 |
corvus | running; it's not fast. | 20:19 |
corvus | oops, i think i forgot to stop the rest of zuul when running that; i'll run it again | 20:22 |
corvus | (i got a kazoo notemptyerror) | 20:22 |
fungi | that does seem to imply zuul was in the process of trying to start while you were deleting znodes | 20:22 |
fungi | at least some of the daemons | 20:23 |
fungi | (presumably executors/mergers?) | 20:24 |
corvus | yep | 20:24 |
corvus | re-run finished without error, will restart all of zuul now | 20:24 |
fungi | perfect | 20:24 |
corvus | now we know that can take about 5 minutes | 20:25 |
fungi | could be worse | 20:25 |
corvus | this may be worth a note to zuul-discuss, to let folks know if they ran master after the last release, they may need to do the same | 20:26 |
fungi | once everything's back up, i'm curious to know what indicated the need to flush zk | 20:26 |
corvus | we have a few mins, i'll answer now :) | 20:26 |
corvus | fungi: scheduler crashed with this as the last message: https://paste.opendev.org/show/809631/ | 20:27 |
fungi | ahh, okay | 20:29 |
corvus | i think we added a field to a zk object, and i don't think we have a fallback for if that isn't there | 20:29 |
corvus | we also changed the change cache schema, which also probably would have caused an error, but we didn't make it that far | 20:29 |
fungi | is the expected way of dealing with that to have the code be backward-compatible, or to run something akin to a db migration? | 20:29 |
clarkb | I've got a fix for the gerrit images that I'll push up once zuul has loaded configs | 20:30 |
fungi | thanks clarkb! | 20:30 |
corvus | fungi: the first i think; but we're not putting a lot of effort into that until we get to being able to run 2 schedulers | 20:30 |
fungi | yeah, i meant down the road when these changes become less frequent | 20:30 |
fungi | obviously at the moment that would result in a lot of dead code | 20:31 |
clarkb | looks like pypa, vexxhost, and zuul tenants have loaded but not openstack or opendev yet? | 20:37 |
clarkb | debug log shows it is still submitting merge requests | 20:38 |
fungi | and not choking on github projects i guess | 20:40 |
clarkb | not that I've seen yet. I did notice https://paste.opendev.org/show/809632/ but I don't think that is fatal | 20:40 |
clarkb | corvus: ^ some statsd bug | 20:41 |
corvus | yeah, just happens on startup | 20:41 |
corvus | should fix, but is low-priority noise | 20:41 |
clarkb | it seems to be running through the config loads in alpha sorted order and is to the openstack/s's | 20:49 |
clarkb | now t's | 20:49 |
clarkb | and now the x/'s | 20:51 |
corvus | maybe it's done now? | 20:52 |
clarkb | thinking out loud here: two things that may help make this faster 1) don't restart gerrit as it has to repopulate its caches? also 2) prune the zuul main.yaml config to remove unused projects? | 20:52 |
clarkb | https://zuul.opendev.org/tenants doesn't show it yet | 20:52 |
corvus | nope i was wrong; still cat jobs | 20:53 |
clarkb | openstack is done but not opendev yet | 20:54 |
corvus | clarkb: it's expected to be slow since we deleted the zuul cache | 20:54 |
clarkb | ah | 20:54 |
clarkb | I think it is done now | 20:55 |
opendevreview | Clark Boylan proposed opendev/system-config master: Properly copy gerrit static files https://review.opendev.org/c/opendev/system-config/+/811233 | 20:55 |
clarkb | that should fix the gerrit image's static/ dir contents. With testing too | 20:55 |
corvus | okay now i think it's really done? | 20:57 |
corvus | re-enqueueing | 20:57 |
corvus | complete | 21:10 |
corvus | #status log restarted all of zuul on 0928c397937da4129122b00d2288e582bc46aabc | 21:11 |
opendevstatus | corvus: finished logging | 21:11 |
fungi | thanks corvus! | 21:17 |
*** dviroel is now known as dviroel|out | 21:29 | |
clarkb | that is a fun one. my logo fix change fails in testing because I curl the actual binary data and it contains non utf8 valid content that testinfra tries to auto convert | 22:01 |
fungi | heh. adding the test for paste was easier since the logo there is svg | 22:03 |
opendevreview | Clark Boylan proposed opendev/system-config master: Properly copy gerrit static files https://review.opendev.org/c/opendev/system-config/+/811233 | 22:04 |
clarkb | I expect that will work well enough. Just using head as i don't care too much about the actual bytes just that the file exists and has the right file type | 22:04 |
fungi | could even just rely on the result code for that one, since the cla gets checked for actual content | 22:08 |
clarkb | ya though curl return codes are weird | 22:09 |
clarkb | iirc you get a 0 return if you get a 404 because the http request functioned | 22:09 |
ianw | might be able to use requests library directly, i think there's an example | 22:15 |
clarkb | I think this is fine :) it will tell us if the file exists or not and give us the encoding | 22:17 |
fungi | well, i meant checking for an http 200 response code, not curl's exit code | 22:20 |
clarkb | ya that is what the current code does | 22:20 |
fungi | right, what i was saying was you could just stick with that and drop the attempt to parse content from the png file itself | 22:21 |
fungi | the cla.html test checks file content, so it's kinda already covered | 22:21 |
clarkb | fungi: oh right, I wasn't actually trying to parse content from the png itself. testinfra was doing that because I examined the stdout object | 22:21 |
fungi | oh | 22:21 |
fungi | neat | 22:21 |
clarkb | I solved that by only requesting the head and not get'ing the file | 22:21 |
ianw | sorry really should have added testinfra originally | 22:21 |
clarkb | then stdout is valid utf8 | 22:21 |
fungi | now i see what you're saying | 22:21 |
clarkb | ianw: no worries, was a quick fix in production and we should have new images soon enough | 22:22 |
fungi | i hadn't reviewed the recision yet | 22:22 |
fungi | revision | 22:22 |
fungi | looking now | 22:22 |
fungi | yep, lgtm | 22:22 |
clarkb | also devstack has another halt the world issue, but it appears to be outside of zuul configs this time. I've offered to help holding nodes etc if that would be useful but I think the nova and placement devs may need to look at this one | 22:26 |
fungi | lovely | 22:26 |
clarkb | corvus: infra-root I think the zuul zk clearing removed all autoholds. I'm not sure if that removed any held nodes in nodepool | 22:28 |
fungi | i can check | 22:29 |
ianw | i still have a connection open to at least one old held node | 22:29 |
ianw | 104.130.13.181 for reference | 22:29 |
clarkb | ya looking on nl01 it didn't clear the nodes | 22:29 |
clarkb | just the autohold records on the zuul side | 22:29 |
fungi | there are still a bunch of nodes held | 22:29 |
clarkb | This means you'll need to manually delete the nodes this time around | 22:30 |
clarkb | rather than just clearing the autohold in zuul | 22:30 |
fungi | noted, thanks | 22:30 |
corvus | oh yep, that makes sense. | 22:30 |
ianw | this has all my mitmproxy setup etc for zuul-registry -- i think i might trust all that enough now to get rid of those nodes :) | 22:30 |
clarkb | fungi: did we status log the gerrit restart with replication timeouts and updated theming? | 22:32 |
fungi | i don't recall | 22:32 |
clarkb | Doesn't look like it. Do you think that is worthwhile? | 22:33 |
fungi | not finding it | 22:34 |
fungi | nah, it's been long enough nobody's likely wondering why it was down for two minutes | 22:34 |
clarkb | k | 22:34 |
clarkb | also it seems like jobs queue up pretty quickly when new patchsets are pushed | 22:40 |
clarkb | last week's issue appears to have been mitigated | 22:41 |
clarkb | fungi: I'm going to drop the mailman server upgrade topic from the agenda now that we can track that work through your spec | 23:08 |
fungi | wfm | 23:09 |
ianw | looks like the gerrit build timed out? | 23:11 |
clarkb | hrm it ran on inmotion so we may still be having general throughput issues there | 23:12 |
ianw | doesn't seem like anything in particular, just being slow? it was on inmotion i see | 23:12 |
clarkb | yuriys: one thing that just occurred to me is maybe we should double check we aren't using qemu emulation but are using kvm? | 23:13 |
clarkb | ianw: ya I think we can recheck it | 23:13 |
corvus | running fungi's zk-shell command from last week, i confirm the gerrit event queue is mostly empty, so we're keeping up with events. | 23:40 |
corvus | (i managed to catch a few events sitting in it right after a burst, then they were gone on the next run of the command, so i think i confirmed that's also still a valid way to check on this) | 23:41 |
fungi | awesome! | 23:43 |
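
Editor's sketch for the 22:01 to 22:21 exchange about checking the Gerrit logo: the idea discussed there is to issue a HEAD request and look at the HTTP status and content type, so the binary PNG body never reaches a test harness that expects UTF-8 text. The Python below is only an illustration of that idea using the standard library; it is not the testinfra check from change 811233, and the function names `head_static_file` and `static_file_ok` are hypothetical.

```python
import urllib.error
import urllib.request


def head_static_file(url, timeout=10):
    """Send a HEAD request and return (status, content_type).

    No response body is transferred, so binary files such as a PNG
    never have to be decoded as text.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get("Content-Type")
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses. Return the status instead so
        # the caller can assert on it explicitly, rather than relying on a
        # process exit code (curl exits 0 on a 404 unless --fail is used).
        return error.code, error.headers.get("Content-Type")


def static_file_ok(url, expected_type):
    """True when the file exists (HTTP 200) and has the expected type."""
    status, content_type = head_static_file(url)
    return status == 200 and (content_type or "").startswith(expected_type)
```

A check along the lines of `static_file_ok("https://review.opendev.org/static/opendev-sm.png", "image/png")` would then cover existence and file type, while content checks stay with the text files such as cla.html. The logo path in that call is inferred from the file list pasted at 20:16 and is an assumption.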
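
Editor's sketch for the 20:29 to 20:30 exchange about ZooKeeper state: corvus suspects a field was added to a stored object without a fallback for data written by older code, and says backward-compatible code is the expected long-term approach. The Python below illustrates that general pattern with a plain JSON payload. The class `StoredRecord` and its fields are hypothetical and do not reflect Zuul's actual schema or serialization code.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class StoredRecord:
    """Hypothetical object persisted as JSON in a key-value store."""

    name: str
    revision: int
    priority: int = 0  # hypothetical field added in a newer release

    def serialize(self) -> bytes:
        return json.dumps(asdict(self)).encode("utf8")

    @classmethod
    def deserialize(cls, raw: bytes) -> "StoredRecord":
        data = json.loads(raw.decode("utf8"))
        return cls(
            name=data["name"],
            revision=data["revision"],
            # Data written before the field existed lacks this key, so
            # fall back to a default instead of raising KeyError.
            priority=data.get("priority", 0),
        )


def strict_deserialize(raw: bytes) -> StoredRecord:
    """Reader without a fallback: fails on data written by older code."""
    data = json.loads(raw.decode("utf8"))
    return StoredRecord(data["name"], data["revision"], data["priority"])
```

With the fallback, old records load with the default value. Without it, the reader fails on the first old record, which is the kind of failure that deleting the stored state works around.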