Thursday, 2023-03-02

opendevreviewIan Wienand proposed openstack/project-config master: gerrit/acl : remove deprecated NoOp function
opendevreviewIan Wienand proposed openstack/project-config master: acls : handle submit requirements in normalise tool
opendevreviewIan Wienand proposed openstack/project-config master: gerrit/acl : convert NoBlock to submit-requirements
opendevreviewIan Wienand proposed openstack/project-config master: gerrit/acl : handle key / values with multiple =
opendevreviewIan Wienand proposed openstack/project-config master: gerrit/acl : Update Review-Priority to submit-requirements
opendevreviewIan Wienand proposed openstack/project-config master: gerrit/acl : Convert remaining AnyWithBlock to submit requirements
opendevreviewIan Wienand proposed openstack/project-config master: gerrit/acl : disallow "function"
ianwa bit bigger stack than i thought, but i think that gets us there with submit requirements in the acl bits anyway00:19
clarkbianw: I think is flawed because max with block or whatever is the default label function. You need to explicitly set the function to noop when using submit requirements00:23
clarkbianw: this is the thing I got all confused about when reading the upstream docs00:23
clarkband noblock == noop so the next change has a similar problem? We should probably standardize on a single predicate there to avoid confusion about potential behavior differences00:24
ianwahh yes i remember this discussion now00:26
ianwwe never got an official answer on that did we00:26
clarkbwe did not, but we poked around in the code and seemed to reach the conclusion you should explicitly set the deprecated values (ugh_) because the first enum entry is max with block and that would potentially override the submit requirements so you need to set noop/noblock explicitly00:27
clarkbI suspect that in gerrit 3.9 or so we'll end up deleting all the noop entries00:27
ianwthat's right, it's paging back in now :)00:28
clarkband to be clear I personally find this whole thing confusing so if someone ends up with an alternative understanding please don't be afraid to present it :000:31
clarkber :)00:31
clarkbit is entirely possible I'm wrong00:31
ianwi'm putting a 3.7 on hold and will poke to get some definitive answers (that may raise more questions, but anyway :)01:00
ianwremote: ERROR: commit 2c8917f: Cannot delete 'label.Review-Priority.function'. Label functions can only be set to {NO_BLOCK, NO_OP, PATCH_SET_LOCK}. Use submit requirements instead of label functions.03:14
ianwso yeah, we can't delete the function03:14
ianwif you use the ui, it does default to adding MaxWithBlock03:34
ianwan "is:true" submit-requirement seems to still make it a "trigger vote"03:37
ianwi can't see that the submit-requirement description shows up in the UI anywhere03:40
opendevreviewIan Wienand proposed openstack/project-config master: gerrit/acl : submit-requirements for deprecated NoOp function
opendevreviewIan Wienand proposed openstack/project-config master: gerrit/acl : add submit requirements to NoBlock labels
opendevreviewIan Wienand proposed openstack/project-config master: gerrit/acl : handle key / values with multiple =
opendevreviewIan Wienand proposed openstack/project-config master: gerrit/acl : Update Review-Priority to submit-requirements
opendevreviewIan Wienand proposed openstack/project-config master: gerrit/acl : Convert remaining AnyWithBlock to submit requirements
*** jpena|off is now known as jpena08:27
opendevreviewRadosław Piliszek proposed openstack/project-config master: Add the main NebulOuS repo
yoctozeptoinfra-root: ^ I was not sure if I should go for a new zuul tenant (as its creation is not really documented in the "Project Creator's Guide") and so I have chosen the opendev tenant10:17
yoctozeptoplease let me know what would be the best approach here10:17
opendevreviewRadosław Piliszek proposed openstack/project-config master: Add the main NebulOuS repo
frickleryoctozepto: some more information in terms of the expected scope/size of your project might be helpful in order to decide whether a dedicated tenant would be useful. like if you are talking about the "main" repo, how many do you expect to have in the end?10:52
frickleralso, do you plan to only consume resources or also to contribute, like in terms of CI capacity or administration man-power?10:53
yoctozeptofrickler: a max of roughly a dozen, let's say 15, if all components decide to split out from the initial monorepo (not likely); we can offer some administration manpower (for one, you will see more of my activity :D I can also get trained on the work behind the scenes to help ourselves and you all) but otherwise I don't believe we have any relevant computing capacity in the budget (but I will check with my supervisors/management)11:03
frickleryoctozepto: o.k., let's wait and see what the others think11:18
fungiyoctozepto: i don't object, just be aware that there will be some overhead on the repo count. at a very minimum you'll need a base jobs repo for the tenant, you might be able to get away with reusing it as your central trusted config repo too but may want to make that separate, and then you'll likely also want a single-branch untrusted repo to hold most of your central job configuration that13:04
fungidoesn't need access to things like secrets13:04
fungimaybe look at the vexxhost or zuul tenants for some smaller examples13:07
yoctozeptofungi: is this documented somewhere? as to what tradeoffs are there with each option? I don't mind running as part of the opendev tenant in general, was just curious what the approach is13:07
yoctozeptoI didn't mean to sound like I demand a separate tenant13:08
yoctozeptoack, zuul and vexxhost noted for later13:08
fungiyoctozepto: the main tradeoff is you're taking on management of the aforementioned overhead repos. the closest thing there is to a document describing that repo split is, i think,
yoctozeptofungi: I see; though what do I *gain*? :D or do I only get more work to do for no gain? :D13:12
yoctozeptothis is what I was pondering upon13:12
fungiyoctozepto: you gain the flexibility to diverge more from the job setup in other tenants, and control of all of that without having to wait for other gatekeepers to approve your patches for it13:13
fungibut as is often said of open source, if it breaks you get to keep the pieces ;)13:14
fungi(in seriousness, i'm happy to help if you run into issues with your tenant configs though, as i expect are other folks)13:14
opendevreviewdaniel.pawlik proposed zuul/zuul-jobs master: Provide deploy-microshift role
fungiyoctozepto: note that changes to jobs in trusted config repos, including your base job and anything it runs, can't safely be directly tested pre-merge. we've figured out a workflow for mostly safely making changes to ours, which you might want to follow if you go the route of having your own tenant:
yoctozeptofungi: I see, thanks for the clarifications! I will give it some thought later after doing some more reading too; one last question for now - will the separate tenant have access to the same pool of nodes?13:17
yoctozeptoah yeah, I have seen this discussion before13:18
opendevreviewEspen Flage-Larsen proposed openstack/diskimage-builder master: Removed erroneous, comment, check for grub.d
fungiyoctozepto: yes, our pools are available to all tenants13:19
fungikeep in mind that job configuration can't be inherited between tenants though, which is why you'll see openstack projects included in the projects list for tenants like vexxhost or zuul13:19
fungiif you want to run jobs or consume other configuration from specific repos, you have to explicitly add them to the tenant (even if that tenant is not gating changes for them, it needs to know to include the configuration from them)13:20
yoctozeptooh, yeah, I have seen this pattern while editing the zuul tenant config, seems reasonable enough13:22
yoctozeptoI think we won't be needing them for now at least, this is pretty independent13:22
fungiright, if you decided you wanted to run... say... devstack jobs (or worse still, tempest jobs) then you'd need to include openstack's kitchen sink13:23
fungithose sort of interdependency tentacles are one of the main reasons we've struggled to split some repositories out to different tenants13:24
yoctozeptoyeah, I can imagine13:25
fungizuul's composable configuration model is a double-edged sword. it can be really convenient to incorporate configuration from another project, but do that too often without a clear plan and you can end up hamstrung with dependency hell type problems13:27
fungiso like any complex system, design and planning are important13:28
yoctozeptoat least it makes us keep our jobs in the long term haha13:29
yoctozeptoI agree totally13:29
*** tweining[m] is now known as tweining|ghost13:34
priteauHello. Just so you know, I've had a job fail with: "ECDSA host key for has changed and you have requested strict checking."14:18
fungipriteau: we see that from time to time when nova has lost track of a vm in some cloud provider (thinks it was successfully deleted but remained running for some reason), and then neutron assigns the same ip address to a new server instance we've created, then network connections get randomly sent to the old vm resulting in that error (this time appears to have been in rackspace's iad region).14:31
fungithey usually run periodic cleanup activities in the background to find and terminate such "rogue" virtual machines so the problem ip address usually clears up after a short time14:31
fungiif we see frequent occurrences we'll often correlate the ip addresses from the errors and supply a list to the provider in order to let them know14:32
fricklerthat's the same address that was mentioned a couple of days ago, let me see if I can catch the duplicate host14:38
fricklerhmm, seems not to belong to our set of CI nodes currently, at least I cannot login via SSH. iirc contact to rackspace would be via their ticket system? not sure if twice in a week would already fall under "frequent"14:43
fricklerhere are two more (log search over the last 7 days for message: "ECDSA host key for 172.99.67.80 has changed")14:46
opendevreviewdaniel.pawlik proposed zuul/zuul-jobs master: Provide deploy-microshift role
clarkbgerrit community meeting was quick, but I got some helpful feedback.16:24
clarkbRe java 17 NasserG didn't think the workaround suggested was problematic but was also apparently surprised it worked with java 17. They also clarified java 11 support isn't going away anytime soon so unless we needed functionality in java 17 (most likely gc features) there isn't any urgency to upgrade16:25
clarkbgiven ^ I think maybe we shelve java 17 work for now and hope upstream is able to fix the problem and then we won't have to provide workarounds to java command lines16:25
clarkbfor the sshd thing it sounds like the config option probably shouldn't be secret and should be documented so I volunteered to work up a change for that. And NasserG said they would try and test ianw's patch in their systems as they had to set the flag to disable things as it created a ton of failures for them16:26
clarkbI'm going to find breakfast but then I'll wip the java 17 change and I think we come back to that when the issues are sorted or when it becomes urgent enough we're willing to apply the workaround16:28
*** xhku_ is now known as xhku16:37
opendevreviewJulia Kreger proposed openstack/diskimage-builder master: WIP: save the boot filesystem for fips workload cases
hasharclarkb: hi, good to see you are attending the Gerrit community meeting :]16:47
clarkbhashar: yup, I'm trying to get more involved with Gerrit. We've been managing to push some bug fixes and changes occasionally and have been keeping our Gerrit install far more up to date16:48
hasharabout Gerrit SshChannelNotFoundException, looks like it got talked about on the mailing list and someone mentioned that `sshd.enableChannelIdTracking` setting.  my summary is
hasharapparently I have set it to `false` back in January but I have not revisited the task since then16:49
clarkbhashar: yup that seems to be the common action people have taken. But ianw dug into it and found a bug with the original channel id tracking change. He pushed a fix but we are not sure if that fixes the underlying issue or just the source of the specific exception16:49
hasharah good ianw!!!16:50
clarkbNasserG said they would try and test ianw's change to see if it fixes the underlying problem16:50
*** bhagyashris|ruck is now known as bhagyashris16:51
hasharnice :]16:54
opendevreviewClark Boylan proposed opendev/system-config master: Remove gitea08 from haproxy
clarkbinfra-root ^ thats the next step in gitea things other than restarting gerrit to pick up the replication plugin config changes16:55
clarkbI'm going to go bring the gerrit restart up in openstack-release now to make sure we don't interfere with RCs16:55
clarkb876194 should be safe though as its just removing a node from haproxy config16:56
hasharclarkb: I have grepped our Gerrit log and apparently the issue no longer occurs so I guess sshd.enableChannelIdTracking=false did the trick :]16:59
clarkbya that disables the functionality entirely so it can't trigger the broken registration of the objects17:00
clarkbbut whether or not there is some underlying issue is still not understood I think17:00
*** jpena is now known as jpena|off17:47
opendevreviewJulia Kreger proposed openstack/diskimage-builder master: WIP: save the boot filesystem for fips workload cases
opendevreviewClark Boylan proposed opendev/ master: Add gitea10-12 to DNS
opendevreviewClark Boylan proposed opendev/system-config master: Add gitea10-12 to our inventory
clarkbinfra-root ^ production line gitea deployments. Please double check I didn't copy pasta anything poorly or miss a step18:20
clarkbI went ahead and single core approved the gitea08 haproxy removal18:31
clarkbits a pretty safe change since the server is sticking around for a little while longer and only affects the load balancer18:32
fungiwfm, thanks!18:35
clarkbI'm also thinking once 09-12 are up and running we may temporarily remove the old servers from haproxy to see how four servers cope18:36
clarkbI know you mentioned wanting to keep 8 but these servers are a fair bit bigger so I'm not sure we need to keep 8. Experiments should help us answer that question18:37
clarkbmaybe we need 6 instead of 8 etc18:37
clarkbI was going to add the gitea ssh hostkeys to gerrit at this point too until I remembered I need the port 222 hostkey which doesn't exist yet18:38
clarkbOnce 876202 and 876201 land the next hold up is going to be restarting gerrit to pick up replication changes18:38
clarkbthe openstack release team indicated after 1900 UTC would be ok for them18:39
fungiyeah, mainly thinking that we're more memory-bound when we get a backend overloaded, so the memory increase while keeping the same cpu count should mean less disruption18:39
clarkbalso I intend on removing gitea08 replication when I add 10-12 replication. This reduces the number of replication plugin reloads which should reduce chances for replication tasks getting lost18:42
fungibut then again, usually the offender ends up hitting a single backend, so more backends doesn't really make that much of a difference if the added memory per backend serves as a sufficient cushion18:42
clarkbfungi: exactly18:42
clarkbfungi: ianw: how about ~2100UTC to down up review? I think that has good overlap with my day and the two of you should anything unexpected occur (mostly thinking about the change of bind mounts)18:44
fungii'll be around, sounds fine with me18:45
fungihappy to help with it too18:46
opendevreviewFelipe Reyes proposed openstack/project-config master: Add Ironic Dashboard charm to OpenStack charms
opendevreviewFelipe Reyes proposed openstack/project-config master: Add Ironic Dashboard charm to OpenStack charms
opendevreviewMerged opendev/system-config master: Remove gitea08 from haproxy
clarkblooking on review02 the current opendevorg/gerrit:3.6 image seems to match the image at
clarkb876194's merging has triggered a manage projects job run in deploy. I think we want that to complete before the gerrit restart. I've done some extra double checking on review02 and I think things are the way we want them. Just need to do a down then up20:39
clarkblooks like ansible galaxy changed their content responses and broke our test for proxying it20:43
clarkbdo we know if that is being used yet? there is a good chance it broke any of them too20:43
clarkb(it resulted in a -1 on my gitea10-12 change)20:43
fungiTengu: ^ you were the one working on that, right?20:44
clarkbok the base job failed for 876194 which means manage-projects isn't running. However the gitea-lb job is running so it didn't care about that failure. I think this means we don't need to wait on gerrit restarts and that the haproxy config update should apply anyway. The base job failed due to apt index issues with the osuosl mirror20:52
clarkbI suspect that issue will go away on its own once the upstream mirror that node talks to is fully in sync again20:52
ianwi'm around if you want to restart gerrit :)20:56
clarkbyup I'm going to do a docker-compose pull just in case mariadb wants to update20:56
clarkbit did20:57
clarkbanyone else want to check the gerrit image lines up with what we expect before I down then up?20:58
clarkbthe jdk package change added 220mb to the image :/ but I think thats worth it should it become useful for debugging20:59
clarkbfungi: ianw: let me know when you're ready for me to do the restart. I think I've done the checks I was going to do21:00
ianwall lines up with "opendevorg/gerrit@sha256:c3bd03fcee7751849ea8cdb7045fb4d1118296037824bdc5d277158986b8cf12" from 875553 so lgtm21:02
clarkbfungi: ^ last call?21:03
sanchankunhi, i'm having troubles setting up horizon zun-ui integration. Horizon apache logs only mention glance not supporting docker image backend. zun-api,compute,ws-proxy,cni;kuryr , work as expected. Any ideas where to look for logs? :-(   21:03
clarkbsanchankun: give us a few as we are about to do a gerrit restart to pick up a config change21:04
clarkbianw: I'll give fungi a few more minutes to tell us to hold up, otherwise I'll proceed21:05
clarkbalright proceeding now. I'll do a docker-compose down, then a docker-compose up -d21:08
clarkbthats done. Web ui loads for me. New version is reported in the page footer. I see content in /home/gerrit2/review_site/data21:10
clarkbthis also picked up the brand new gerrit 3.6.4 release21:11
clarkbthey don't have release notes up for that yet :/21:13
ianwthe website was lagging on 3.7.1 too21:13
fungino need to hold up, i'm ready to help with the restart21:13
ianwlooks like both in review
clarkbfungi: its done :) at this point just double checking that things look good would be helpful. Maybe review and/or land as a good way to exercise things21:14
clarkbianw: ^ fyi21:14
clarkbsanchankun: is this for a CI job? or for a deployment? We help run the CI systems and may be able to help with issues related to CI jobs, but if this is for a deployment you may have better luck with the horizon and zun teams21:15
sanchankunit is for a deployment. allright , i will try to contact them . thanks a lot ! 21:16
ianwclarkb: interestingly i had to log back in, did you?  21:17
clarkbianw: I did not21:17
ianwok, probably just a local timeout.  wondered if it was the datadir thing21:17
clarkbI think this has come up in the past and I suspect that the cache for logins is only so long or something21:17
clarkbno as far as I know the data dir is only used by plugins and not core gerrit21:18
ianwclarkb: 876201 lgtm.  did those sshfp records spit out OK from the launch node?  i feel like i have changes out for that, but maybe not21:18
clarkbianw: they did not. I just ssh'd in and did ssh-keyscan -D to get them. I had to ssh in because our jammy image is old and won't autoupdate sosreport which leads to unattended upgrades complaints21:18
clarkbso I ssh'd in manually dist-upgraded to fix sosreport, rebooted for good measure, then did ssh-keyscan commands on the host21:19
clarkbI should probably upload a new jammy image to vexxhost before doing the next batch21:19
ianwactually it might have been
ianwi think we probe for the ssh keys before ssh is up again21:20
ianwmight be worth putting that in before launching more nodes for testing.  it *should* work but it's been lots of fussing between old bridge versions that didn't have the tools, and this21:21
clarkbok I'll take a look. First I need to look at the failing mirror testinfra tests. But should be able to look at that next21:22
clarkbmy goal/hope is that I'll be able to add gitea10-12 to the gerrit replication config tomorrow and have the replication run over the weekend21:22
clarkbbut I need the add giteas to inventory change to land after the dns change and that change is currently failing21:22
clarkbmanually trying to reproduce the mirror failure I end up reproducing what the test is looking for so I don't understand why it would fail. Maybe there was a momentary blip on the ansible galaxy side of things?21:34
clarkbI'll recheck and see if it is consistent I guess21:34
fungior they have a cdn/lb sometimes returning content from a troubled backend21:35
clarkbianw: couple of comments on the ssh busy wait loop change21:37
clarkbfungi: adds the new giteas to dns so that they can provision ssl certs when they are added to inventory21:38
clarkbunrelated to everything else I've done today I noticed manila had a change stuck in check for ~9 hours. This appears due to them retrying a 3 hour long job which appears to be retrying because disk is filling and ansible can't find a suitable /tmp or equivalent which is treated as a network connection error. I've pinged the manila team in their irc channel about this21:44
clarkbI suggested they fix the job or move it to experimental to avoid 9 hour delays getting test results back21:44
clarkbgitea08 is much quieter after being removed from haproxy too 21:50
clarkb`gerrit show-queue` shows a new task for com.googlesource.gerrit.plugins.replication.AutoReloadRunnable I guess this is what watching for config updates21:51
opendevreviewJay Faulkner proposed openstack/project-config master: Ironic program adopting virtualpdu
opendevreviewMerged opendev/ master: Add gitea10-12 to DNS
JayFif one of you will let me know if that proj-conf change looks good, I'll add it to the agenda22:02
opendevreviewClark Boylan proposed opendev/system-config master: Update gerrit image builds for 3.6.4 and 3.7.1 tags
clarkbinfra-root ^ re gerrit releases that is a bookkeeping change for us as I checked the versions we built our new images with and the new tags just retag old commits with one exception and we were already checking out master for that one exception.22:07
clarkbJayF: that looks about right. I think the docs ask that you use a specific topic (which you can set in the web ui)22:08
JayFclarkb: is it OK to add to the list before the governance patch lands?22:09
JayFalthough I would assume it's a fait accompli22:09
clarkbJayF: then generally the way the rename process works is we make sure everyone has the prep work and sign offs done. We schedule a gerrit downtime. During that downtime we stop gerrit. Move things around. We start gerrit and then quickly merge the change you pushed to avoid any creation of the old project again and we're in sync at that point22:12
clarkbwith more than two changes proposed for projects renames I will need to go and rediscover how it is that two manage-project runs in a row don't create problems. I think it is because we always update openstack/project-config from master and don't use the zuul version22:13
clarkb(thats a me thing to worry about not you)22:13
opendevreviewJay Faulkner proposed openstack/project-config master: Ironic program adopting virtualpdu
opendevreviewIan Wienand proposed opendev/system-config master: launch: add a probe for ssh after reboot
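[editor's sketch] The problem ianw describes above (probing for ssh host keys before sshd is back up after a reboot) is commonly handled with a bounded busy-wait loop. The following is a minimal illustration only, not the code from the proposed change; the names `ssh_port_open` and `wait_until`, and all default values, are hypothetical.

```python
import socket
import time


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def wait_until(probe, attempts=30, delay=2.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, sleeping `delay` seconds
    between failed tries. Return True on first success, else False."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A caller might run `wait_until(lambda: ssh_port_open(address))` for the new server's address before running ssh-keyscan against it. Passing `sleep` as a parameter is a design choice that keeps the loop testable without real delays.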
clarkbianw: did you catch my notes from the gerrit community meeting earlier today? tldr is I think we'll hold off on java 17 since there isn't any urgency upstream or on our end to move to 17 yet and that gives them time to work out the issue I had to workaround. And for the ssh thing NasserG said they would try to test it in their env since they had to set the secret flag to false. I'll22:30
clarkbalso try to make a change to document the secret flag22:30
ianwyeah, agree on keeping on 11 for now.  17 definitely feels like maybe a mid-release thing we can update to and roll back easily from22:35
clarkbyup, the main benefits are in the garbage collector(s) and we haven't had issues with that recently so I think we can hold off22:35
clarkbapparently newer javas have all done improvement to the garbage collection systems and that is a big reason people often upgrade22:36
ianwthe flag i have to think about22:36
clarkbianw: I get the sense that everyone was just setting it to false22:36
clarkbso its good to call attention to it and have someone whose installations were more affected by it check the change you pushed22:37
ianwthe fix as merged isn't, iiuc, really doing anything.  so then the flag is turning off the not-working workaround22:38
clarkbright, but your change should fix the fix. At least it will expose if the fix is fixing anything rather than exploding22:38
ianwagree, it would be great to test with that.  but i'm not sure what the secret flag is doing to get things working22:40
clarkboh I see. I think maybe it removes the errors from your logs so you stop worrying about it :)22:40
ianwthat might be correct ... end users still have the same result, but you just don't know about it from your logs22:41
clarkb has a +1 now. The mirror error must've been transient22:41
ianwthanks, lgtm; left it for you to +w if you want to watch it?22:45
clarkbthanks ya I'll probably approve it now so that things are ready for friday replication starts22:46
clarkbthe only question is whether or not I need to double check the osuosl mirror can update its apt indexes yet22:47
clarkbas I think failed base jobs will break the gitea deployments on those nodes22:47
clarkbit failed to fetch focal-updates previously. Manually running apt-get update on the host was successful I think thats fine22:48
clarkbI've approved it and will keep an eye on it. This step will deploy the empty giteas and is pretty low impact if anything goes wrong22:49
clarkbianw: where did we end up with the function = NoOp stuff?23:57
clarkbit sounded like your testing indicated we did need to keep it after all?23:57
ianwclarkb: yes, you can't push an acl change without a function = line23:58
ianwso i've changed them all to NoOp in the series now23:58
ianwand implemented submit-requirements where required23:58
ianwwhat i'm doing now is just getting a diff together and doc updates for All-Projects23:58
clarkbcool, good to have that run down. I'll have to rereview those changes23:59
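[editor's sketch] One change in the ACL series above is titled "handle key / values with multiple =". Submit-requirement expressions can themselves contain "=", so a tool that splits config lines on every "=" would mangle them. A minimal sketch of splitting at the first "=" only follows; it is not the project-config normalise tool, `parse_acl_line` is a hypothetical name, and the sample line is illustrative.

```python
def parse_acl_line(line):
    """Split an ACL config line into (key, value) at the first '=' only."""
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"not a key/value line: {line!r}")
    return key.strip(), value.strip()


# Illustrative input only; the label name and expression are assumptions.
line = "submittableIf = label:Review-Priority=MAX AND -label:Review-Priority=MIN"
key, value = parse_acl_line(line)
print(key)
print(value)
```

Using `str.partition` rather than `str.split("=")` leaves any later "=" characters in the value untouched.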
clarkbI've just realized that the gitea deployment is likely to take a number of hours due to the inventory updates. I'll try to check on it later this evening but may just end up digging into the results tomorrow morning23:59
