tonyb | I've used kolla-ansible to perform that restart, and `cluster_status` looks good. | 01:27 |
tonyb | nova-compute still isn't happy. | 01:29 |
tonyb | Okay it's looking happier now. | 01:42 |
tonyb | Is there a nodepool / openstack test I can run to gather more information? | 01:43 |
tonyb | It looks like there's lots to clean up. | 02:04 |
Clark[m] | tonyb: no nodepool test. I think getting the mirror up again is a good test and if that works we can try enabling nodepool again and see how it does. Though cleanup first is probably a good idea | 03:46 |
Clark[m] | I'd also be interested in knowing what got things working again | 03:46 |
tonyb | I can write that down. it boils down to using kolla-ansible to restart rabbit | 04:06 |
tonyb | in terms of cleanup, we have lots of nodes in the deleting state, and the mirror node is in reboot | 04:07 |
tonyb | is it fine to just delete everything all together other than the mirror VM? | 04:31 |
*** Xie is now known as liushy | 06:52 | |
*** Xie is now known as liushy | 09:23 | |
opendevreview | Antoine Musso proposed opendev/git-review master: Add --wip as an alias to --work-in-progress https://review.opendev.org/c/opendev/git-review/+/906508 | 11:12 |
hashar | clarkb: thank you for using the Gerrit mailing list last year when you had an issue with search queries exploding due to starred changes / max terms limit ( https://groups.google.com/g/repo-discuss/c/ynLek7qg7bg/m/FCPFHAayAQAJ ) | 12:44 |
frickler | hashar: oh, nice, did you also encounter this issue? so far I've had the impression that I'm the only star collector ;) | 12:47 |
hashar | yeah | 12:48 |
hashar | we have some users with more than a thousand stars (the limit is 1024) | 12:48 |
hashar | and they most probably have a real use case for that (they are some of our most active gate keepers / reviewers) | 12:48 |
hashar | I also suspect that causes the `sendEmail` task to take ages to process and emit an email, which delays delivery for everyone as a queue builds up, but I don't have all the details yet ;) | 12:49 |
hashar | one would typically include `is:starred` in their dashboards | 12:50 |
frickler | oh, so sending emails for updates on starred changes? I stopped having gerrit send me mails some time ago and work only with dashboards and attention markers nowadays | 12:52 |
hashar | we have plenty of people (including me) that are emails driven | 12:52 |
hashar | (we are old folks) | 12:53 |
hashar | :D | 12:53 |
frickler | I'm not too young, either, but still do change things. sometimes. ;) and gerrit changing lots of things on every update isn't an easy thing to process for me, too | 12:56 |
hashar | indeed | 12:56 |
hashar | the plus side is that there are also a lot of positive changes in each new release | 12:56 |
frickler | hashar: regarding changing the limit mentioned in that thread, we did that some time ago and haven't seen any issues I think, so I'd also assume that setting a much higher limit, like 8k instead of 1k, should not be a problem | 12:57 |
hashar | possibly yeah | 12:59 |
hashar | I think the whole issue is the stars are now in All-Users.git and somehow on my setup jgit ends up doing a stat/fopen for each of them :) | 13:01 |
frickler | oh, if you have a lot of users doing this, it could affect things differently than in our case indeed | 13:03 |
frickler | unrelated: mail to infra-root@opendev.org doesn't bounce immediately, but gets queued at the sender; I just got the "message delayed 28 hours" warning from my mailer daemon. I wonder if this really is what we want, or whether we would want to have an actual mx running that either rejects everything right away or forwards/aliases this to our common mailbox? | 13:06 |
fungi | frickler: it's been brought up a few times (mostly by me, because we also need proper e-mail addresses for release automation signing keys and bot accounts). we did at one time set up a cyrus server on firehose to receive bug notifications from launchpad to feed into mqtt, so the config we used to do that is likely still hanging out in system-config's git history, but might have been | 13:17 |
fungi | puppet-based back then | 13:17 |
frickler | well the questions would be a) do we want that at all b) do we want a full-blown mail system or just forwarding to the existing @openstack.org address(es)? something to add to the pre-ptg agenda? | 13:33 |
fungi | yeah, we actually discussed it at a ptg around the time of the great rebranding, but it's been a few years and is worth bringing up again | 14:42 |
opendevreview | Merged openstack/project-config master: Restore cinderlib zuul jobs https://review.opendev.org/c/openstack/project-config/+/904035 | 14:52 |
*** blarnath is now known as d34dh0r53 | 15:17 | |
fungi | heading out to grab a quick bite for lunch, should be back within the hour | 16:11 |
clarkb | tonyb: (and others) yes I suspect that we can delete all nodes other than the mirror but not necessarily all images from the inmotion/openmetal cloud | 16:15 |
clarkb | all of those nodes should be ephemeral nodepool nodes. The only exception may be if we've held a node there but even then it probably wasn't running so not sure it was useful | 16:16 |
opendevreview | Clark Boylan proposed opendev/system-config master: Document gerrit openid login failure debugging https://review.opendev.org/c/opendev/system-config/+/906541 | 16:50 |
clarkb | infra-root ^ hopefully that makes future issues like the one we had ~yesterday easier to address | 16:53 |
opendevreview | Clark Boylan proposed opendev/system-config master: Document gerrit openid login failure debugging https://review.opendev.org/c/opendev/system-config/+/906541 | 17:30 |
clarkb | matrix quoting and rst quoting are different :) | 17:30 |
clarkb | that should render a bit better now | 17:30 |
fungi | thanks for jotting that down! | 17:44 |
clarkb | I've started putting my notes for preptg stuff into an etherpad here: https://etherpad.opendev.org/p/opendev-preptg-202402 I'll send email with details to service-discuss today and link to that | 17:44 |
clarkb | but feel free to start adding thoughts now if you have them | 17:44 |
amorin | Hey all, not sure where we need to ask for this, let me know if not here. Axel and myself have +2 on mistral repos, but only for master branch. E.G. on that patch https://review.opendev.org/c/openstack/mistral/+/906283 | 17:57 |
amorin | We are only allowed to +1. That prevents any change from being cherry-picked back to the stable branches | 17:57 |
amorin | Is there any process to request this? | 17:57 |
clarkb | amorin: let me look at the way the acls are set up | 17:58 |
amorin | Thx | 17:58 |
fungi | it's not uncommon for openstack projects to have acl carve-outs for stable branches that have a different "core" group | 17:59 |
clarkb | https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/openstack/mistral.config#L9-L22 yup there is a mistral-stable-maint | 17:59 |
clarkb | https://review.opendev.org/admin/groups/12a0462fb2ca6355ae2661decbfc0b7eb4c943d2,members is the current membership and they can add you to the group | 17:59 |
clarkb | amorin: ^ the easiest thing is to have someone in that group add you. If they are no longer active then a gerrit admin can help you resurrect things | 18:00 |
amorin | Ack, Oleg is active in that list, I will ask him then. Thanks! | 18:01 |
clarkb | finally got email sent about the pre ptg planning | 19:02 |
clarkb | I did my best to reread it several times to avoid confusion between the ptg and the pre ptg. Hopefully I didn't confuse things | 19:02 |
ildikov | Hi All, I have a quick question. I'm trying to help a person who's setting up new accounts, including one for Gerrit. His user appears in Gerrit, and I could add him to a core group. However, when he hits the 'Reply' button on a review, it says "You don't have permission to vote" under the 'Submit requirements votes' text. Does anyone know what might be the issue? | 19:20 |
clarkb | ildikov: instead of code-review -1 0 +1 it says that you don't have permissions text? | 19:21 |
clarkb | ildikov: it could be an acl problem. Can you link to the change so that I can check the acls for the project and branch? You can also see if they can vote on a change like https://review.opendev.org/c/opendev/system-config/+/906541 which should allow anyone to vote +/-1 | 19:22 |
ildikov | https://usercontent.irccloud-cdn.com/file/eDDKJgk6/Gerrit_no_permission.png | 19:22 |
ildikov | clarkb: here's the screenshot of the message. I believe the buttons are inactive as well. | 19:23 |
ildikov | here's the change: https://review.opendev.org/c/starlingx/governance/+/904892 | 19:24 |
clarkb | ya so I get the same thing on that change. It's going to be project specific I think | 19:24 |
ildikov | Steve is currently not added as reviewer, but that shouldn't be an issue | 19:24 |
ildikov | oh wow, I've never seen that message in Gerrit | 19:25 |
clarkb | ildikov: I don't think there is anything wrong here. For changes to that repo to merge they must have a +1 workflow vote. That makes it a submit requirement vote and that user (and myself) don't have permissions to vote on that label | 19:25 |
ildikov | so, I even added Steve to the TSC core group for that repo | 19:25 |
ildikov | clarkb: ah, so that's the workflow row | 19:26 |
ildikov | OMG, that's so not intuitive... :) | 19:26 |
clarkb | https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/starlingx/governance.config these are the acls for the project | 19:26 |
clarkb | code review is listed as no-block, which means it isn't a submit requirement; it's just informational (which gerrit calls a trigger vote because the information may be used to trigger actions) | 19:27 |
ildikov | yep, I remember putting that file together :) | 19:27 |
clarkb | https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/starlingx/governance.config#L7-L9 people in one of these three groups will not get the message because they can add workflow votes | 19:27 |
clarkb | if you are added to the group you may need a hard refresh to reload the page and get the new submission abilities | 19:28 |
ildikov | clarkb: I understand the rules, no need to type it in. The visualization is confusing here. | 19:28 |
clarkb | can you be more specific about what is confusing? Is it that gerrit differentiates between the types of labels? | 19:29 |
ildikov | clarkb: I didn't realize that the "You don't have permission to vote" message is *replacing* the 'workflow' row, and only applies to *that* row. | 19:29 |
clarkb | I think gerrit is doing that to make it clear making changes to submit requirements may allow things to merge | 19:29 |
clarkb | ya I think it is trying to say "You don't have permission to modify any of the submit requirements" | 19:30 |
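The label grouping described here (Code-Review as a no-block "trigger" vote, Workflow as a submit requirement, and the permission message replacing only the submit-requirement section when the user cannot vote on any of those labels) can be modelled with a minimal Python sketch. This is not Gerrit's code: the `Label` and `User` types, the `split_reply_rows` helper, the simplified ACL mapping, and the group names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Label:
    name: str
    # True when the label is a submit requirement (needed to merge),
    # False for an informational "trigger" vote such as a no-block Code-Review.
    submit_requirement: bool


@dataclass
class User:
    name: str
    groups: set[str] = field(default_factory=set)


def split_reply_rows(labels, acl, user):
    """Group labels roughly the way the reply dialog was described above.

    `acl` is a hypothetical, simplified stand-in for a Gerrit ACL file:
    it maps a label name to the set of groups allowed to vote on it.
    Returns (trigger_rows, submit_rows, show_no_permission_message).
    """

    def can_vote(lbl):
        # The user may vote if they share at least one group with the ACL entry.
        return bool(acl.get(lbl.name, set()) & user.groups)

    trigger_rows = [lbl.name for lbl in labels
                    if not lbl.submit_requirement and can_vote(lbl)]
    submit_labels = [lbl for lbl in labels if lbl.submit_requirement]
    submit_rows = [lbl.name for lbl in submit_labels if can_vote(lbl)]
    # The message stands in for the submit-requirement section only, and only
    # when there are such labels but the user can vote on none of them.
    show_message = bool(submit_labels) and not submit_rows
    return trigger_rows, submit_rows, show_message
```

With `Code-Review` open to all registered users and `Workflow` restricted to a core group, a new user still gets the Code-Review row and only the Workflow section is replaced by the message, which matches what was seen on the starlingx/governance change.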
clarkb | making an update like that to gerrit may be straightforward and I can try pushing it | 19:30 |
ildikov | the long part of the story is that I'm trying to help a person in an email thread, and he insisted he doesn't have permission to '+1' the change, as in doing a regular review | 19:33 |
clarkb | ildikov: would "You don't have permission to vote on labels used as submit requirements" be more clear? | 19:33 |
ildikov | clarkb: is it possible to not display that section for people who don't have permissions to do anything there? | 19:33 |
clarkb | I suspect that making that change would be rejected by upstream | 19:34 |
ildikov | clarkb: like how you can't see 'rollcall-vote' unless you have rights to that vote type? | 19:34 |
clarkb | ya in this case gerrit is treating submit requirements specially because they are special (they are the things you need to address to merge). And I suspect they won't want to stop doing that | 19:34 |
clarkb | but also if the issue is that a user refused to click the +1 button and "send" I don't know that removing the content or changing the sentence will help | 19:35 |
ildikov | clarkb: yeah, I meant more from visualization perspective, since if you don't have permissions to do the vote, then you can't make that vote anyway | 19:35 |
ildikov | clarkb: the "You don't have permission..." message stopped the user from going forward from that point. So not displaying that message would help a lot with that. | 19:36 |
clarkb | ildikov: yup, (and I hate that this is the case) but the change that added the message has a google bug id that I can't see so this was a very intentional update by google to gerrit | 19:36 |
clarkb | and in my experience stuff like that is not easily changed in gerrit. We can probably try and make the string more verbose but removing is likely to make someone unhappy | 19:37 |
ildikov | clarkb: we're talking about someone who knows Gerrit even less than I do, and you can see I'm far from expert level myself :) | 19:37 |
ildikov | clarkb: ACK and no sweat, I just wanted to clarify how I mean that being a solution to the challenge | 19:38 |
clarkb | if I had to guess the motivation here is that most users want to know how to get their change to merge. So gerrit is trying to be extra verbose about the state changes necessary to make that happen | 19:39 |
ildikov | clarkb: Maybe this: "You don't have permission to vote on labels used in this section as submit requirements" | 19:39 |
clarkb | ack, I can try and make it clear this is a distinct grouping from the others | 19:39 |
ildikov | that would be amazing, thank you! | 19:40 |
ildikov | apologies for the long chat in circles | 19:40 |
ildikov | user friendliness is a very subjective topic... :) | 19:41 |
clarkb | ildikov: https://gerrit-review.googlesource.com/c/gerrit/+/404638 | 19:48 |
ildikov | Thanks clarkb !! | 20:14 |
fungi | user-friendliness is not always equivalent to google-friendliness | 20:57 |
tonyb | clarkb: WRT removing the nodepool instances in inmotion, I don't want to do anything that makes getting nodepool "in sync" harder. | 23:12 |
clarkb | tonyb: looking at a nodepool listing for inmotion `x sudo docker exec nodepool-docker_nodepool-launcher_1 nodepool list | grep inmotion` on one of nl01-04 does show we're apparently trying to delete instances in a loop | 23:13 |
clarkb | tonyb: doing an openstack server list does seem to show those servers in the openstack cloud too. For whatever reason (likely fallout of the rabbitmq stuff) the deletes are failing | 23:14 |
clarkb | tonyb: if we manually delete the np003* nodes in the cloud then nodepool should detect the server is gone and clean up things on its side | 23:15 |
clarkb | tonyb: picking the first node in the openstack server listing, np0036422250: if you try to server delete it by uuid, openstack claims the server does not exist | 23:17 |
clarkb | I feel like we ran into this before and melwitt helped us debug but it was some mismatch in db entries and doing a naive realignment of the rows made it work | 23:17 |
clarkb | in any case nodepool reports it is trying to delete all of the instances already and those attempts are failing. If we manually delete on the cloud side nodepool should detect this and update things on the nodepool side | 23:18 |
tonyb | Thanks for helping me understand that nodepool will basically self heal if the underlying instances go away | 23:19 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Switch from legacy to new style keycloak container https://review.opendev.org/c/opendev/system-config/+/905469 | 23:19 |
clarkb | tonyb: if the nodepool listing showed a bunch of servers in a ready or used state then it becomes more dangerous because we could impact running jobs | 23:19 |
clarkb | tonyb: but since they are all deleting we can clean up behind the scenes and nothing should be trying to use them | 23:19 |
clarkb | I guess hold is the other state to worry about | 23:19 |
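The cleanup rule from this exchange (only touch nodes that nodepool already has in the deleting state, never ones that are ready, used, or held, and leave the mirror alone) can be sketched as a small Python filter. The `Node` record, the `cloud_cleanup_candidates` helper, and the provider name are hypothetical; in practice the input would come from the `nodepool list` output and the actual deletes would be done with the openstack client.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    server_name: str  # instance name as seen in the cloud, e.g. np0036422250
    provider: str     # nodepool provider name (hypothetical in the example)
    state: str        # nodepool node state, e.g. "ready", "used", "hold", "deleting"


def cloud_cleanup_candidates(nodes, provider, keep_names):
    """Return server names that look safe to delete manually on the cloud side.

    Uses an allowlist: only nodes nodepool is already trying to delete qualify,
    so anything ready, used, or held (possibly in use by jobs) is never picked.
    `keep_names` is an extra safety net for servers such as the mirror VM.
    """
    return sorted(
        n.server_name
        for n in nodes
        if n.provider == provider
        and n.state == "deleting"
        and n.server_name not in keep_names
    )
```

Once those servers are gone from the cloud, nodepool should notice and finish cleaning up its own records, as described above.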
tonyb | I'll try and clean them out, and get the mirror node up again during my day. | 23:20 |
clarkb | cool thanks | 23:20 |
opendevreview | Merged opendev/system-config master: Document gerrit openid login failure debugging https://review.opendev.org/c/opendev/system-config/+/906541 | 23:32 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!