clarkb | Meeting time | 19:01 |
---|---|---|
clarkb | Anyone else here? | 19:01 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Jun 21 19:01:22 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
ianw | o/ | 19:01 |
frickler | \o | 19:02 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-June/000340.html Our Agenda | 19:02 |
clarkb | #topic Announcements | 19:02 |
clarkb | I had no announcements | 19:03 |
clarkb | And there were no actions from last meeting | 19:03 |
clarkb | That means we can dive right in | 19:03 |
clarkb | #topic Topics | 19:03 |
clarkb | #topic Improving CD throughput | 19:03 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/846195 Running Zuul cluster upgrade playbook automatically. | 19:03 |
clarkb | This change is ready for review now. It sets up a cron that fires on the weekend to upgrade and reboot our entire zuul cluster | 19:04 |
clarkb | fungi: ran the playbook by hand last week and it continued to function. I think we're as ready as we will be to do this, but let me know if you disagree. Also call out any problems with my implementation if you see them | 19:04 |
fungi | yeah, it completed without issue | 19:04 |
fungi | took over 24 hours to complete, but weekly seems like a reasonable cadence for that | 19:05 |
clarkb | and it should go quicker when zuul is calmer over the weekend | 19:06 |
fungi | yes | 19:06 |
clarkb | anyway we can followup further in review. Just wanted to call out the change exists and has had its -W removed | 19:06 |
clarkb | Anything else on this topic? | 19:06 |
clarkb | #topic Glean rpm platform static ipv6 support | 19:08 |
clarkb | This is sort of a ninja addition to the agenda, but I realized i never followed up on this whole thing due to travel | 19:08 |
clarkb | ianw: I see all the glean changes ended up merging. I guess we did the releases and things are happy on ovh now? | 19:08 |
clarkb | Is there anything else that needs to be done or can we consider this completed? | 19:09 |
ianw | i think that's complete, i haven't heard anything else on it | 19:09 |
clarkb | great. Thank you for taking care of that | 19:09 |
ianw | certainly if somebody wants something to do, there could be quite a bit of refactoring done in glean | 19:09 |
ianw | but, it works, so if it ain't broke ... :) | 19:09 |
clarkb | #topic Container Maintenance | 19:10 |
clarkb | I wanted to call out that we upgraded our first mariadb installation from 10.4 to 10.6 during the gerrit 3.5 upgrade process | 19:10 |
clarkb | As far as I can tell that went well. We should probably start thinking about upgrading the DBs for other services too | 19:11 |
clarkb | #link https://etherpad.opendev.org/p/gerrit-upgrade-3.5 captures mariadb upgrade process | 19:11 |
clarkb | This isn't urgent and does require rot access as performed on gerrit. However, there is an env var we can set to have mariadb do it automatically for us if non roots want to help us be brave and push those changes :) | 19:12 |
clarkb | s/rot/root/ | 19:12 |
fungi | root13 | 19:13 |
clarkb | #topic Gerrit 3.5 upgrade | 19:13 |
clarkb | This happened. Thank you to everyone (ianw in particular) who helped get us here | 19:13 |
clarkb | The upgrade itself went very smoothly. We have noticed a couple of issues since then. | 19:14 |
clarkb | #link https://bugs.chromium.org/p/gerrit/issues/detail?id=16018 Now fixed | 19:14 |
fungi | one of which is already addressed | 19:14 |
clarkb | yup that one I just linked is fixed upstream and the fix is deployed in opendev | 19:14 |
clarkb | The other issue which frickler noticed is that it seems any change marked WorkInProgress also shows up as merge conflicting in a change listing | 19:14 |
clarkb | I am hoping to have time to look into that probably tomorrow as I suspect that is a bug on the gerrit side where they equate WIP and merge conflict for some reason | 19:15 |
clarkb | If you notice other issues please do call them out | 19:15 |
fungi | has anyone had an opportunity to confirm whether that also happens on 3.6? | 19:15 |
frickler | actually some other kolla people noticed first | 19:15 |
clarkb | I have not. The next thing to discuss is removing 3.4 images and adding 3.6 stuff which will make it easier for us to test that | 19:15 |
ianw | ++ having a 3.6 replication in s-c would probably be a good help | 19:16 |
frickler | one other thing that seems new is the highlighting on hovering with the pointer on a word, which I think is very annoying | 19:16 |
ianw | i feel like it was doing that before; or maybe i'm just thinking of the upstream gerrit | 19:16 |
clarkb | yes that is new, and yes that is intentional and I haven't figured out if I like it or not yet | 19:16 |
clarkb | ianw: upstream gerrit did it before but 3.5 brought it to us | 19:16 |
clarkb | I wonder if we can put that behind a config and turn it off | 19:16 |
frickler | is that configurable somehow? | 19:16 |
clarkb | frickler: I looked for user config for it yesterday and couldn't find it | 19:17 |
clarkb | I think user config would be ideal, but server wide would be acceptable too. I'll have to look at it more | 19:17 |
fungi | i guess that's another behavior gertty is shielding me from | 19:17 |
fungi | since i don't get what's being described | 19:17 |
fungi | s/get/understand/ | 19:17 |
clarkb | fungi: ya if you mouse over words in the diff view gerrit now highlights all occurrences of that word in the diff | 19:17 |
clarkb | I personally prefer explicit use of ^F | 19:17 |
fungi | uh... huh | 19:17 |
frickler | in screaming yellow | 19:17 |
clarkb | not sure why they added that, but it's definitely something that seems intentional | 19:18 |
clarkb | #link https://review.opendev.org/q/topic:gerrit-3.4-cleanups | 19:18 |
clarkb | I've pushed up changes to begin some of the cleanup here. The first two are actually followups to running 3.5 | 19:18 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/847034 | 19:18 |
clarkb | #link https://review.opendev.org/c/openstack/project-config/+/847057 | 19:19 |
fungi | is ^F another gerrit override? one of the reasons i don't use the webui is that it also implements its own keybindings, which seems to somehow override my browser's keybinds, so for someone who prefers to do keyboard-driven browser navigation it's nigh unworkable | 19:19 |
clarkb | if we can get those reviewed then the rest of the stack can be checked and landed when we are happy that we are unlikely to revert | 19:19 |
clarkb | fungi: old gerrit captured ^F but modern polygerrit doesn't; it just uses the browser search function now | 19:19 |
clarkb | fungi: there are still other shortcut keys though | 19:19 |
fungi | oh, well that's an improvement at least, though my keyboard navigation plugin relies on / to search which is probably still overridden | 19:20 |
clarkb | I'm thinking maybe next week we remove 3.4 and hopefully by then we've also got the 3.6 jobs working and can land 3.6 quickly after 3.4 is removed | 19:20 |
fungi | if there were an option to disable all the gerrit webui keybindings, i might consider trying it out again | 19:20 |
clarkb | fungi: / is not overridden anymore | 19:20 |
clarkb | but [ and ] still move forward and backward through a review | 19:21 |
clarkb | oh wait / is but it grabs the normal search bar now not in page text search | 19:21 |
clarkb | sorry about that, they just changed what search meant in the context of / I guess | 19:21 |
fungi | yeah. ultimately i see that as the failing of the browser for not giving me an option to just ignore javascript keypress capture | 19:22 |
clarkb | The other thing we'll want to monitor is general memory usage by 3.5 since other users have had memory trouble. I suspect we're fine simply because we don't run extra plugins and metric gathering | 19:23 |
clarkb | But I'm remembering now that I meant to take a heap usage measurement then compare it daily or something | 19:23 |
fungi | also the earlier memory capture attempts in zuul didn't show any difference, though obviously there's no production load there | 19:23 |
clarkb | gerrit show-caches will give us this info. I'll run that after the meeting and check it every day at roughly 1900 utc for the next few days | 19:24 |
clarkb | Anything else gerrit upgrade related to call out before we move on? | 19:25 |
fungi | just the surprisingly few complaints we've received | 19:25 |
fungi | it's like hardly anyone even noticed we upgraded | 19:25 |
fungi | success | 19:25 |
clarkb | ++ | 19:25 |
clarkb | #topic Enable Nodepool Launcher Webapp | 19:27 |
frickler | actually it turns out that it is enabled | 19:27 |
frickler | just not present in the config, which doesn't work in my local setup, but somehow works in the opendev deployment | 19:27 |
frickler | so nothing to do there | 19:28 |
clarkb | I think ianw was also talking about fixing up the grafana dashboard stuff? | 19:28 |
frickler | the grafana page has two issues: the "failed" status isn't shown in red | 19:28 |
clarkb | to add missing images? | 19:28 |
frickler | and some of the timing graphs are missing | 19:28 |
clarkb | oh right the color | 19:28 |
ianw | yes i did start looking at this ... | 19:29 |
frickler | but I couldn't find a solution for any of this | 19:29 |
fungi | what's the launcher webapp path? | 19:29 |
ianw | the reason it doesn't work is because grafana has redone the way it deals with thresholds | 19:29 |
ianw | and it seems the way that grafyaml writes thresholds is in the old format, that doesn't quite get upgraded properly to the new format, so it doesn't set "red" when the value is "1" | 19:30 |
frickler | for the webapp, we should replace the apache default page with a list of the possible links. and maybe add ssl. http://nl01.opendev.org/ | 19:31 |
clarkb | ianw: ok so a grafyaml update is required. | 19:31 |
frickler | http://nl01.opendev.org/dib-image-list and http://nl01.opendev.org/image-list are the useful things | 19:31 |
fungi | what's the launcher webapp path? | 19:31 |
clarkb | is that fork you found still active? I wonder if we're better off just adopting that? | 19:31 |
ianw | so i started trying to reverse engineer the way to do new thresholds, and I find it fairly frustrating and frankly a bit pointless to be trying to rewrite grafyaml to a non-existent API | 19:31 |
fungi | thanks! | 19:31 |
clarkb | ianw: non-existent because they change it arbitrarily? | 19:32 |
fungi | and/or don't document it? | 19:32 |
ianw | both | 19:32 |
fungi | shades of jjb | 19:33 |
ianw | they do tend to write backwards compat things so that old dashboards load | 19:33 |
ianw | then also, as mentioned, i found https://github.com/deliveryhero/grafyaml who have done a bunch of reverse engineering work for new panel types, etc | 19:34 |
ianw | but have also run reformatters, and not chosen to interact with the upstream development at all | 19:34 |
ianw | which was also depressing | 19:34 |
fungi | it's apache 2.0 though, so we could just fork their changes back in | 19:35 |
fungi | oh, except for the reformatting | 19:35 |
frickler | sounds like typical german startup I must say | 19:35 |
ianw | then last time we discussed just using the raw json (which imo really isn't that hard to read) there was disagreement, which was also not fun | 19:35 |
ianw | so frankly i left the whole exercise a bit depressed over everything :) | 19:36 |
clarkb | ya maybe we need to revive that discussion and see if we can find a compromise. Like maybe we can store the raw json as yaml to make it reviewable and then have a really light weight tool that just converts yaml to json directly for grafana consumption | 19:36 |
fungi | i'm tempted to open a github issue against that project saying that their updates aren't importable upstream due to the reformatting | 19:36 |
clarkb | rather than doing a different representation entirely in yaml | 19:36 |
clarkb | I assume ruamel can do that in a pretty straightforward manner for us | 19:37 |
fungi | pyyaml could presumably do it too | 19:37 |
ianw | i don't know if we're having any different conversation than we had before, but i think if you look at the json output exported by grafana it's quite readable. they are not doing anything crazy in there | 19:38 |
fungi | unless you're worried about preserving ordering or inline comments | 19:38 |
ianw | it's just they arbitrarily choose to refactor things | 19:38 |
clarkb | ianw: I think it was corvus who had the major objection and didn't feel json was human readable at all. | 19:38 |
clarkb | (regardless of the actual json) | 19:38 |
clarkb | maybe this deserves a mailing list thread | 19:39 |
clarkb | as I don't know that corvus is watching the meeting today | 19:39 |
ianw | yes, fair enough | 19:40 |
clarkb | and we can be explicit about what the issues are that way and try to find a workaround/compromise/etc | 19:40 |
fungi | could probably stand to be a ml thread either way | 19:40 |
clarkb | ++ | 19:40 |
ianw | i mean, delivery hero or whatever have obviously seen some usefulness in it too | 19:40 |
ianw | but imo it's just a losing game if upstream don't want to give you an api to work to | 19:41 |
clarkb | ya you'll always be fighting problems like this | 19:41 |
corvus | oh hi | 19:41 |
clarkb | corvus: I think where we've ended up is we need to do something re grafyaml and grafana dashboard/graph management as the current tool is not working in ways that are annoying. But we should start up a mailing list thread to discuss it further to make sure we capture all the angles (including this random fork on github) | 19:42 |
frickler | do we know if that lack of documentation is intentional on the grafana side? | 19:43 |
fungi | fwiw, i don't see any obvious indication that the deliveryhero devs tried to upstream patches for grafyaml, just skimming the reviews there | 19:43 |
clarkb | ianw: is that something you would like to start or would it be helpful if I try to give it a go | 19:43 |
ianw | i can send a mail, sure | 19:43 |
clarkb | thanks | 19:43 |
ianw | i don't think we need to keep having the same conversation in irc, for sure | 19:43 |
clarkb | Alright anything else on this before we move on? | 19:44 |
clarkb | #topic Custom url shortener | 19:45 |
frickler | that's an easy one: still on my todo list | 19:45 |
clarkb | ok just wanted to make sure I had not missed a change | 19:45 |
frickler | nope | 19:45 |
clarkb | #topic Removing Projects from Zuul | 19:46 |
clarkb | This was not on the emailed agenda because it occurred to me just this morning | 19:46 |
clarkb | The changes I pushed up to windmill and x/vmware-nsx to remove their use of pipeline level queue definitions in the gerrit config have not been reviewed and most of them fail CI | 19:47 |
clarkb | one idea to address this is to simply remove projects like that from our zuul config. | 19:47 |
clarkb | Separately I do also notice that it seems like literally no one has addressed this problem in openstack at all | 19:47 |
clarkb | but I think for this topic I'm mostly concerned about what are very likely dead projects that we should just decouple from zuul until they become active again | 19:48 |
clarkb | Are there any objections to that or concerns with doing that? | 19:48 |
frickler | no, but an additional idea, can we also handle some of the long standing zuul config errors like that? | 19:48 |
fungi | i'm fine with that. for openstack projects, i'm happy to present the tc with the list of projects we're removing, and suggest that they can be re-added when their authors are ready to address any problems | 19:48 |
fungi | same for config errors | 19:49 |
ianw | would you commit something to the projects saying "btw this zuul config is not being processed"? | 19:49 |
ianw | i just wonder if people do try to commit something, and it goes into gerrit and nothing happens | 19:49 |
clarkb | ok a lot going on. I'm going to start with ianw's since that was one of the concerns I had too | 19:49 |
corvus | i understood the suggestion as remove them from the tenant config; and i think that sounds good | 19:49 |
clarkb | Which was how do we make people aware of this change if they are already not really paying attention | 19:49 |
fungi | though that can lead to a cascade effect, since many of those errors are due to projects which have been renamed or retired still listed as required-projects in old branches of other projects' configs | 19:49 |
corvus | (removing from the tenant config means no commits to projects necessary) | 19:49 |
clarkb | corvus: correct, but it also means if someone pushes code to that repo now they'll just be silently ignored | 19:50 |
frickler | we could add a job in project-config that just output some comment? | 19:50 |
corvus | yes, and presumably ask someone what is up and end up at service-discuss or #opendev | 19:50 |
clarkb | frickler: the problem with that is you need the project in the tenant config to run the job against the repo | 19:51 |
clarkb | I think where I've ended up on that is what corvus describes | 19:51 |
clarkb | basically it isn't ideal but they should know where to go asking questions | 19:51 |
frickler | can't we just ignore the in-project config like we do for some github projects? | 19:52 |
corvus | (you could theoretically exclude all config objects from those projects and then run a job from a config repo, but that sounds like a lot of work for people who aren't around) | 19:52 |
fungi | just so i can go back to the openstack tc with a clear message, it's that leaving project zuul configs in a broken/error state indefinitely is not okay, even if it's "just" on some old stable branches ~nobody cares about, and we will be taking those projects out of the tenant config even if their master branch configs are still working | 19:52 |
corvus | my argument would be that opendev's level of service for projects should not exceed the attention given by their developers to them. | 19:52 |
clarkb | fungi: sort of. This is specifically re http://lists.openstack.org/pipermail/openstack-discuss/2022-May/028603.html | 19:52 |
frickler | corvus: fair enough, I support that | 19:53 |
clarkb | fungi: I do think though that we're approaching what you describe which is that broken project configs regardless of the reason create problems for the projects in question and others. If they aren't going to do basic care and feeding then we'll remove from the CI system to avoid confusion | 19:53 |
fungi | well, i was extending it to frickler's suggestion that we "also handle some of the long standing zuul config errors like that" | 19:53 |
clarkb | corvus: ++ | 19:53 |
corvus | (my comments are mostly in the context of abandoned projects) | 19:53 |
clarkb | the risk with doing widespread removal is that it will chain reaction down all the dependencies | 19:53 |
fungi | yes, my point was that already a vast number of the config errors are due to retired/renamed projects no longer appearing in the tenant config | 19:54 |
fungi | so i would expect the error count to grow if we remove them | 19:54 |
clarkb | anyway to start I'm just suggesting x/vmware-nsx and windmill be removed since they both appear dead and are not part of openstack. Then separately we need to push openstack harder to actually fix this stuff | 19:55 |
fungi | also the errors are branch-specific, but tenant removal is project-level | 19:55 |
clarkb | and if pushing openstack harder doesn't result in fixing these things we should consider removing from zuul at that time | 19:55 |
clarkb | but I don't think we're quite to that point yet for openstack. But we should probably warn them that is ultimately our failsafe on the zuul side | 19:55 |
fungi | yeah, i'll give a gently firm reminder | 19:56 |
corvus | maybe separately ask openstack to appoint some janitors for those projects (midonet, etc)? | 19:57 |
fungi | definitely | 19:58 |
frickler | I also noticed that the zuul tenant has collected a set of config errors btw. | 19:58 |
corvus | yes, they are unresolvable until opendev finishes being extracted from openstack | 19:59 |
clarkb | Alright we are just about at time | 19:59 |
clarkb | #topic Open Discussion | 19:59 |
clarkb | Anything else? | 19:59 |
corvus | possibly some may remain even then | 19:59 |
fungi | i'm hoping to refresh our meetpad configs to current upstream examples/defaults | 20:00 |
fungi | not quite sure how best to minimize our differential going forward | 20:00 |
fungi | we've tried a few different things, but those files are quite large and our edits represent a small proportion of them | 20:00 |
fungi | open to ideas in #opendev if anyone has some | 20:01 |
clarkb | fungi: probably upstreaming support for flags we need is the best way | 20:01 |
clarkb | but also we are at time | 20:01 |
clarkb | thank you everyone | 20:01 |
clarkb | #endmeeting | 20:01 |
opendevmeet | Meeting ended Tue Jun 21 20:01:32 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:01 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-06-21-19.01.html | 20:01 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-06-21-19.01.txt | 20:01 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-06-21-19.01.log.html | 20:01 |
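
The Grafana discussion above floats the idea of keeping the raw dashboard JSON in the repository as YAML and converting it with a very lightweight tool. A minimal sketch of that conversion follows, assuming PyYAML is available (ruamel was also mentioned as an option). The dashboard fragment and the `yaml_to_json` helper name are hypothetical and do not come from grafyaml or from a real Grafana export.

```python
import json

import yaml  # PyYAML, a third-party package (assumption: it is installed)

# Hypothetical dashboard fragment for illustration only; the keys are not
# copied from a real Grafana export.
DASHBOARD_YAML = """
dashboard:
  title: Example Nodepool Status
  panels:
    - title: Failed builds
      type: stat
"""


def yaml_to_json(text: str) -> str:
    """Parse a YAML document and re-serialize the same structure as JSON."""
    data = yaml.safe_load(text)
    # No schema translation happens here: the YAML is only a different
    # spelling of the JSON that Grafana would consume.
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    print(yaml_to_json(DASHBOARD_YAML))
```

Because the structure is passed through untouched, a tool like this would not need to track changes in Grafana's data model, which is the pain point raised in the meeting. YAML comments are dropped by the conversion, since JSON has no way to carry them.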