Wednesday, 2024-01-24

tonybI've used kolla-ansible to perform that restart, and `cluster_status` looks good.01:27
tonybnova-compute still isn't happy.01:29
tonybOkay it's looking happier now.01:42
tonybIs there a nodepool / openstack test I can run to gather more information?01:43
tonybIt looks like there's lots to clean up.02:04
Clark[m]tonyb: no nodepool test. I think getting the mirror up again is a good test and if that works we can try enabling nodepool again and see how it does. Though cleanup first is probably a good idea03:46
Clark[m]I'd also be interested in knowing what got things working again03:46
tonybI can write that down.  it boils down to using kolla-ansible to restart rabbit04:06
tonybin terms of clean up we have lots of nodes in the deleting state, and the mirror node in reboot04:07
tonybis it fine to just delete everything all together other than the mirror VM?04:31
*** Xie is now known as liushy06:52
*** Xie is now known as liushy09:23
opendevreviewAntoine Musso proposed opendev/git-review master: Add --wip as an alias to --work-in-progress
hasharclarkb: thank you for using the Gerrit mailing list last year when you had an issue with search queries exploding due to starred changes / max terms limit ( )12:44
fricklerhashar: oh, nice, did you also encounter this issue? so far I've had the impression that I'm the only star collector ;)12:47
hasharwe have some users having more than a thousand stars (the limit is at 1024)12:48
hasharand they most probably have a real use case for that (they are some of our most active gate keepers / reviewers)12:48
hasharI also suspect that causes the `sendEmail` task to take ages to process and emit an email, which causes delays in delivery for everyone due to a queue building up, but I don't have all the details yet ;)12:49
hasharone would typically include `is:starred` in their dashboards12:50
frickleroh, so sending emails for updates on starred changes? I stopped having gerrit send me mails some time ago and work only with dashboards and attention markers nowadays12:52
hasharwe have plenty of people (including me) that are email-driven12:52
hashar(we are old folks)12:53
fricklerI'm not too young, either, but still do change things. sometimes. ;) and gerrit changing lots of things on every update isn't an easy thing to process for me, too12:56
hasharthe plus side is that there are also a lot of positive changes in each new release12:56
fricklerhashar: regarding changing the limit mentioned in that thread, we did that some time ago and have not seen any issue I think, so I'd also assume that setting a much higher limit should not be an issue, like 8k instead of 1k12:57
hasharpossibly yeah12:59
hasharI think the whole issue is the stars are now in All-Users.git and somehow on my setup jgit ends up doing a stat/fopen for each of them :)13:01
frickleroh, if you have a lot of users doing this, it could affect things differently than in our case indeed13:03
fricklerunrelated: mail to doesn't bounce immediately, but gets queued at the sender, just got the "message delayed 28 hours" warning from my mailer daemon. I wonder if this really is what we want, or whether we would want to have an actual mx running that either rejects everything right away or forwards/aliases this to our common mailbox?13:06
fungifrickler: it's been brought up a few times (mostly by me, because we also need proper e-mail addresses for release automation signing keys and bot accounts). we did at one time set up a cyrus server on firehose to receive bug notifications from launchpad to feed into mqtt, so the config we used to do that is likely still hanging out in system-config's git history, but might have been13:17
fungipuppet-based back then13:17
fricklerwell the questions would be a) do we want that at all b) do we want a fullblown mail system or just forwarding to the existing address(es)? something to add to the pre-ptg agenda?13:33
fungiyeah, we actually discussed it at a ptg around the time of the great rebranding, but it's been a few years and is worth bringing up again14:42
opendevreviewMerged openstack/project-config master: Restore cinderlib zuul jobs
*** blarnath is now known as d34dh0r5315:17
fungiheading out to grab a quick bite for lunch, should be back within the hour16:11
clarkbtonyb: (and others) yes I suspect that we can delete all nodes other than the mirror but not necessarily all images from the inmotion/openmetal cloud16:15
clarkball of those nodes should be ephemeral nodepool nodes. The only exception may be if we've held a node there but even then it probably wasn't running so not sure it was useful16:16
opendevreviewClark Boylan proposed opendev/system-config master: Document gerrit openid login failure debugging
clarkbinfra-root ^ hopefully that makes future issues like the one we had ~yesterday easier to address16:53
opendevreviewClark Boylan proposed opendev/system-config master: Document gerrit openid login failure debugging
clarkbmatrix quoting and rst quoting are different :)17:30
clarkbthat should render a bit better now17:30
fungithanks for jotting that down!17:44
clarkbI've started putting my notes for preptg stuff into an etherpad here: I'll send email with details to service-discuss today and link to that17:44
clarkbbut feel free to start adding thoughts now if you have them17:44
amorinHey all, not sure where we need to ask for this, let me know if not here. Axel and myself have +2 on mistral repos, but only for master branch. E.G. on that patch
amorinWe are only allowed to +1. That prevents any change from being cherry-picked back to stable branches17:57
amorinIs there any process to request this?17:57
clarkbamorin: let me look at the way the acls are set up17:58
fungiit's not uncommon for openstack projects to have acl carve-outs for stable branches that have a different "core" group17:59
clarkb yup there is a mistral-stable-maint17:59
clarkb,members is the current membership and they can add you to the group17:59
clarkbamorin: ^ the easiest thing is to have someone in that group add you. If they are no longer active then a gerrit admin can help you resurrect things18:00
amorinAck, Oleg is active in that list, I will ask him then. Thanks!18:01
clarkbfinally got email sent about the pre ptg planning19:02
clarkbI did my best to reread it several times to avoid confusion between the ptg and the pre ptg. Hopefully I didn't confuse things19:02
ildikovHi All, I have a quick question. I'm trying to help a person who's setting up new accounts, including one to Gerrit. His user appears in Gerrit, and I could add him to a core group. However, when he hits the 'Reply' button on a review, it says "You don't have permission to vote" under the 'Submit requirements votes' text. Does anyone know what might be the issue?19:20
clarkbildikov: instead of code-review -1 0 +1 it says that you don't have permissions text?19:21
clarkbildikov: it could be an acl problem. Can you link to the change so that I can check the acls for the project and branch? You can also see if they can vote on a change like which should allow anyone to vote +/-119:22
ildikovclarkb: here's the screenshot of the message. I believe the buttons are inactive as well.19:23
ildikovhere's the change:
clarkbya so I get the same thing on that change. It's going to be project specific I think19:24
ildikovSteve is currently not added as reviewer, but that shouldn't be an issue19:24
ildikovoh wow, I've never seen that message in Gerrit19:25
clarkbildikov: I don't think there is anything wrong here. For changes to that repo to merge they must have a +1 workflow vote. That makes it a submit requirement vote and that user (and myself) don't have permissions to vote on that label19:25
ildikovso, I even added Steve to the TSC core group for that repo19:25
ildikovclarkb: ah, so that's the workflow row19:26
ildikovOMG, that's so not intuitive... :)19:26
clarkb these are the acls for the project19:26
clarkbcode review is listed as no-block, which means it isn't a submit requirement; it's just informational (which gerrit calls a trigger vote because the information may be used to trigger actions)19:27
ildikovyep, I remember putting that file together :)19:27
clarkb people in one of these three groups will not get the message because they can add workflow votes19:27
clarkbif you are added to the group you may need a hard refresh to reload the page and get the new submission abilities19:28
ildikovclarkb: I understand the rules, no need to type it in. The visualization is confusing here.19:28
clarkbcan you be more specific about what is confusing? Is it that gerrit differentiates between the types of labels?19:29
ildikovclarkb: I didn't realize that the "You don't have permission to vote" is *replacing* the 'workflow' row, and only applies to *that* row.19:29
clarkbI think gerrit is doing that to make it clear making changes to submit requirements may allow things to merge19:29
clarkbya I think it is trying to say "You don't have permission to modify any of the submit requirements"19:30
clarkbmaking an update like that to gerrit may be straightforward and I can try pushing it19:30
ildikovthe long part of the story is that I'm trying to help a person in an email thread, and he insisted he doesn't have permissions to '+1' the change, as in doing a regular review19:33
clarkbildikov: would "You don't have permission to vote on labels used as submit requirements" be more clear?19:33
ildikovclarkb: is it possible to not display that section for people who don't have permissions to do anything there?19:33
clarkbI suspect that making that change would be rejected by upstream19:34
ildikovclarkb: like how you can't see 'rollcall-vote' unless you have rights to that vote type?19:34
clarkbya in this case gerrit is treating submit requirements special because they are special (they are the things you need to address to merge). And I suspect they won't want to stop doing that19:34
clarkbbut also if the issue is that a user refused to click the +1 button and "send" I don't know that removing the content or changing the sentence will help19:35
ildikovclarkb: yeah, I meant more from visualization perspective, since if you don't have permissions to do the vote, then you can't make that vote anyway19:35
ildikovclarkb: the "You don't have permission..." message stopped the user from going forward from that point. So not displaying that message would help a lot with that.19:36
clarkbildikov: yup, (and I hate that this is the case) but the change that added the message has a google bug id that I can't see so this was a very intentional update by google to gerrit19:36
clarkband in my experience stuff like that is not easily changed in gerrit. We can probably try and make the string more verbose but removing is likely to make someone unhappy19:37
ildikovclarkb: we're talking about someone who knows Gerrit even less than I do, and you can see I'm far from expert level myself :)19:37
ildikovclarkb: ACK and no sweat, I just wanted to clarify how I mean that being a solution to the challenge19:38
clarkbif I had to guess the motivation here is that most users want to know how to get their change to merge. So gerrit is trying to be extra verbose about the state changes necessary to make that happen19:39
ildikovclarkb: Maybe this: "You don't have permission to vote on labels used in this section as submit requirements"19:39
clarkback can try and be clear this is a distinct grouping from the others19:39
ildikovthat would be amazing, thank you!19:40
ildikovapologies for the long chat in circles19:40
ildikovuser friendliness is a very subjective topic... :)19:41
ildikovThanks clarkb !!20:14
fungiuser-friendliness is not always equivalent to google-friendliness20:57
tonybclarkb: WRT to removing the nodepool instances in inmotion.  I don't want to do anything that makes getting nodepool "in sync" harder.  23:12
clarkbtonyb: looking at a nodepool listing for inmotion `x sudo docker exec nodepool-docker_nodepool-launcher_1 nodepool list | grep inmotion` on one of nl01-04 does show we're apparently trying to delete instances in a loop23:13
clarkbtonyb: doing an openstack server list does seem to show those servers in the openstack cloud too. For whatever reason (likely fallout of the rabbitmq stuff) the deletes are failing23:14
clarkbtonyb: if we manually delete the np003* nodes in the cloud then nodepool should detect the server is gone and clean up things on its side23:15
clarkbtonyb: picking the first node in the openstack server listing is np0036422250. If you try to server delete it by uuid openstack claims the server does not exist23:17
clarkbI feel like we ran into this before and melwitt helped us debug but it was some mismatch in db entries and doing a naive realignment of the rows made it work23:17
clarkbin any case nodepool reports it is trying to delete all of the instances already and those attempts are failing. If we manually delete on the cloud side nodepool should detect this and update things on the nodepool side23:18
tonybThanks for helping me understand that nodepool will basically self heal if the underlying instances go away23:19
opendevreviewJeremy Stanley proposed opendev/system-config master: Switch from legacy to new style keycloak container
clarkbtonyb: if the nodepool listing showed a bunch of servers in a ready or used state then it becomes more dangerous because we could impact running jobs23:19
clarkbtonyb: but since they are all deleting we can clean up behind the scenes and nothing should be trying to use them23:19
clarkbI guess hold is the other state to worry about23:19
tonybI'll try and clean them out, and get the mirror node up again during my day.23:20
clarkbcool thanks23:20
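
[editor's note] The safety reasoning in the exchange above (manual cloud-side deletion is fine while every nodepool entry is in the deleting state, but risky if any are ready, used, or held) can be sketched roughly as below. This is only an illustration, not a nodepool API: the function name `cleanup_plan`, the exact state strings, and the second server name are hypothetical, and the (name, state) pairs are assumed to have been extracted by hand from a `nodepool list` style listing.

```python
from collections import Counter

# States the discussion flags as risky to clean up behind nodepool's back.
# The exact strings are assumptions; check them against the real listing.
RISKY_STATES = {"ready", "used", "in-use", "hold"}


def cleanup_plan(nodes):
    """Summarise a listing given as (server_name, state) pairs.

    Returns a dict saying whether manual cloud-side deletion looks safe,
    which nodes block it, and which nodes are already stuck deleting.
    """
    nodes = list(nodes)
    counts = Counter(state for _, state in nodes)
    risky = sorted(name for name, state in nodes if state in RISKY_STATES)
    to_delete = sorted(name for name, state in nodes if state == "deleting")
    return {
        "safe": not risky,        # safe only if nothing is ready/used/held
        "risky": risky,           # nodes that could affect running jobs
        "to_delete": to_delete,   # candidates for manual server delete
        "counts": dict(counts),
    }


# np0036422250 appears in the log; np0036422251 is a made-up example name.
example = [
    ("np0036422250", "deleting"),
    ("np0036422251", "deleting"),
]
plan = cleanup_plan(example)
print(plan["safe"], plan["to_delete"])
```

If `safe` comes back false, the `risky` list names the nodes that would need a closer look before anything is removed on the cloud side.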
opendevreviewMerged opendev/system-config master: Document gerrit openid login failure debugging
