Wednesday, 2022-11-30

opendevreviewMerged opendev/system-config master: rax: remove identity_api_version 2 pin
ianw... so yeah, rerunning seems to "update" the same 24 changesets in neutron00:03
clarkbfwiw I think if we are failing to update them it isn't a critical failure. Those changes will just need someone to manually add those votes if they are still active?00:04
clarkbit might be worth an email to the gerrit mailing list asking them about this behavior and they can hopefully point us to methods for checking if it applied properly?00:05
ianwyep, just seeing if i can glean anything from backend logs00:06
fungiany obvious commonalities between a random sample of those 24 changes?01:07
fungicorvus: i get a crash when trying to open basically any change with that gertty patch... sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such column: comment_1.unresolved01:45
fungiam i missing a db migration or just need to start with a clean slate?01:45
fungirolling back to tip of master works again01:47
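[Editor's note: the crash above ("no such column: comment_1.unresolved") is the classic symptom of a schema that is behind the code. A quick way to answer fungi's "am i missing a db migration" question is to inspect the sqlite schema directly; the snippet below is a generic sketch (the gertty database path and table name would need to match the local config, which is an assumption here):]

```python
import sqlite3

def has_column(db_path, table, column):
    """Return True if `table` in the sqlite db at `db_path` has `column`.

    PRAGMA table_info yields rows of (cid, name, type, notnull, dflt, pk);
    index 1 is the column name.
    """
    conn = sqlite3.connect(db_path)
    try:
        cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        return column in cols
    finally:
        conn.close()

# e.g. has_column("~/.gertty.db", "comment", "unresolved") -- path is hypothetical
```

If the column is absent, the new code needs its migration run (or a fresh database) before it can open changes.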
opendevreviewJeremy Stanley proposed opendev/system-config master: Improve launch-node deps and fix script bugs
fungiclarkb: ^ addressed your comment01:54
ianwChange 84223 may not exceed 1000 updates. It may still be abandoned, submitted and you can add/remove reviewers to/from the attention-set. To continue working on this change, recreate it with a new Change-Id, then abandon this one.02:36
ianwso that's one problem02:36
fungioh, right, that02:37
* fungi shudders02:38
ianwbefore that is "Error in a slice of project openstack/neutron, will retry and skip corrupt meta-refs [CONTEXT request="SSH" ]"02:39
ianwit almost feels like this change has caused an entire "slice" to be skipped, maybe?  02:40
ianwthere's two other errors02:40
ianwcannot check change kind of new patch set c095827ea8f82a286c84da1ad3d1aab38a2e1328 in openstack/neutron 02:40
ianwcannot check change kind of new patch set f24a40fa5e727f176b63233c6b75476213ffc506 in openstack/neutron 02:40
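[Editor's note: the "may not exceed 1000 updates" error above comes from Gerrit's per-change update cap. If the team later bumps limits for a quiet-hours re-run of copy-approvals, the relevant gerrit.config section looks roughly like the following; values are illustrative only, and option names should be verified against the documentation for the running Gerrit version:]

```ini
# gerrit.config -- illustrative values, not OpenDev's actual config
[change]
    maxUpdates = 2000
    maxComments = 10000
```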
ianw <-- email about issues03:17
*** pojadhav|out is now known as pojadhav|ruck04:23
*** yadnesh|away is now known as yadnesh04:31
*** frenzy_friday|rover is now known as frenzy_friday|rover|doc06:56
fricklerinfra-root: reviews on would be nice07:09
*** yadnesh is now known as yadnesh|afk07:50
*** frenzy_friday|rover|doc is now known as frenzy_friday|rover08:17
*** jpena|off is now known as jpena08:25
*** yadnesh|afk is now known as yadnesh08:41
*** dviroel_ is now known as dviroel10:47
*** rlandy|out is now known as rlandy11:14
*** ysandeep is now known as ysandeep|PTO12:38
*** dasm|off is now known as dasm12:43
opendevreviewMerged zuul/zuul-jobs master: Use recent node version for markdownlint job
*** frenzy_friday|rover is now known as frenzy_friday|rover|lunch12:58
*** yadnesh is now known as yadnesh|afk13:02
*** yadnesh|afk is now known as yadnesh13:32
*** pojadhav|ruck is now known as pojadhav|dr_appt13:41
*** frenzy_friday|rover|lunch is now known as frenzy_friday|rover13:50
opendevreviewJames E. Blair proposed ttygroup/gertty master: WIP: support inline comment threads
opendevreviewCedric Jeanneret proposed opendev/system-config master: Correct how ansible-galaxy is proxified
Tengufungi: -^^  fyi14:53
corvusfungi: i forgot the git add.  should be gtg now.  but you may want to back up your db in case you need to roll back.14:54
Tengulocally tested [without TLS], it seems to work as expected, I could at least fetch some collection, and follow things in my local httpd logs.14:54
*** yadnesh is now known as yadnesh|away14:59
*** dviroel is now known as dviroel|lunch15:17
fungithanks corvus!15:22
fungiTengu: that's excellent news15:23
Tengufungi: yeah - bit of a headache to get it right, especially with ansible-galaxy being dumb as a donkey, but at least: it's apparently working good.15:23
Tengufungi: not sure if adding some more tests may help?15:24
dmsimardHello o/ There is an account for RDO's third party CI which is under that has been lost to time since I haven't been at Red Hat for a while. Would it be possible to change the email for that account to ?15:28
Tengufungi: hmmm yeah. maybe adding some (better) tests may help. Now I have a good overview of the queries made by ansible-galaxy, and should be able to mimic them using curl/wget/others.15:28
opendevreviewCedric Jeanneret proposed opendev/system-config master: Correct how ansible-galaxy is proxified
Tengulet's see.15:42
fungidmsimard: might be better to ask in #rdo since opendev doesn't really get involved in that sort of thing15:51
fricklerfungi: I read that as request about an opendev gerrit account15:55
dmsimardyeah it's a gerrit account, they're the ones asking me about it since the password has been lost but they can't reset it15:56
dmsimardor so is my understanding15:56
fricklerbut why not simply create a new account?15:57
dmsimardI don't have a strong opinion, I suppose it's easier to keep the same account if possible ¯\_(ツ)_/¯16:00
fungidmsimard: frickler: oh! thanks, i definitely misread the comment16:02
fungii think we can probably update that via the gerrit rest api, i'll have to take a look after the tc meeting16:03
dmsimardfungi: that would be great and much appreciated, it's not an emergency, thanks :)16:04
Clark[m]We cannot change the openid16:07
Clark[m]We can only delete openids currently not change them unless we take an outage16:08
Clark[m]So depending on what lost password means this may not be possible16:08
fungioh, by "that" i meant changing the preferred e-mail address. but yes we can't really recover the account and hand it to someone else16:10
Clark[m]Yes we need more specifics as to what the problem is cc dmsimard 16:11
fungidmsimard: though if you still have login information for the launchpad account used, you could in theory give that to whoever maintains the rdo ci system and they could use it to log into gerrit and update the accounts both there and in launchpad16:12
dmsimardyeah I see what you mean and that makes sense, I'll let them know and report back16:12
dmsimardfungi: *nod*, if I still had access to it that's what I would have done16:13
dmsimardI handed over credentials but the email was never updated :(16:14
fungioh, well maybe they still have it in that case16:14
opendevreviewCedric Jeanneret proposed opendev/system-config master: Correct how ansible-galaxy is proxified
fungiif they have a working ssh key, they may also be able to use the ssh api to update things via set-account16:16
fungidmsimard: Clark[m]: ^16:16
*** dviroel|lunch is now known as dviroel16:19
*** pojadhav|dr_appt is now known as pojadhav|out16:27
*** marios is now known as marios|out16:27
clarkbfwiw I would avoid using a personal account for third party ci. In this case my hunch is that the openid login is associated with that personal account and in that case the account should be shutdown and a new one tied to the org should be used16:32
clarkbjrosser: hey, for your bfv instance rescue setup is the backing volume system ceph? If so what were the special image metadata settings and values that you set? Is that set on the instance you are rescuing's image too or just the image used to rescue?16:32
jrosserclarkb: yes it was ceph - i think we have the values set on all the images so you can then use anything to rescue anything else16:33
jrosserclarkb: then the general purpose images we upload for users all have these properties
clarkbthanks. I'm wondering if the rescued instance also needs those settings for things to work correctly16:36
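[Editor's note: for boot-from-volume rescue on ceph-backed clouds, Nova's "stable device rescue" feature keys off image properties. The fragment below is a guess at the properties jrosser is alluding to, taken from that feature; exact names and values depend on the deployed release and should be checked against the Nova docs:]

```yaml
# illustrative glance image properties for stable device rescue (assumption)
hw_rescue_device: disk
hw_rescue_bus: virtio
```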
opendevreviewCedric Jeanneret proposed opendev/system-config master: Correct how ansible-galaxy is proxified
*** jpena is now known as jpena|off17:05
clarkbno new cert errors today as expected17:37
opendevreviewMerged opendev/system-config master: Improve launch-node deps and fix script bugs
fungiunfortunately the mm3 deployment still hasn't completed since it was skipped due to errors at original deployment18:12
fungiwhat's the best way to get it to finish, merge a minimal edit to something or manually run ansible or...?18:13
clarkbfungi: you should be able to manually rerun ansible against the lists3 playbook18:14
clarkboh that doesn't do le though18:14
clarkbso you'd need to manually rerun the le playbook too (ianw did that recently for other hosts and has commands in scrollback somewhere)18:15
clarkbits possible that step completed overnight due to the daily runs though18:15
clarkband you just need to run the lists3 playbook now18:15
clarkbI guess the lists3 playbook isn't in the daily list (we should fix that if not; if it is, then it may have deployed more than you expect)18:15
fungioh, actually it should have run in periodic but failed:
fungiAnsibleUndefinedVariable: 'mailman3_db_password' is undefined18:18
fungiwhoops. missed a step ;)18:18
fungiwould it make more sense to add things for this under host_vars or group_vars?18:19
fungii guess for the test strings we put them under host_vars18:20
frickleris that a local db? would we have a different db when running a second mm3 host?18:20
clarkbfrickler: it is a local db. We shouldn't need a second mm3 host18:21
fungithat's the question. not that we plan to run more than one mm3 host unless we're moving from one to another18:21
fricklerwe might if we migrate to ubuntu 26.04 in the future18:21
fungiso if there are ever two they'll only coexist for a short time18:21
fungithe question is whether those need to share passwords or not18:21
clarkbin this case I think group vars is probably fine?18:21
clarkbunless you are worried our firewalls will break and allow external access from a secondary host accidentally18:22
fricklerI would prefer host_vars anyway, but not too strongly18:23
clarkbya host vars is probably most correct. Just more effort for the future18:23
clarkbI'm happy either way18:23
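[Editor's note: under the host_vars approach just agreed on, the secret from the earlier `mailman3_db_password is undefined` failure would live in a per-host file in the private inventory. A sketch with a hypothetical path and placeholder value:]

```yaml
# inventory/host_vars/lists01.opendev.org.yaml  (hypothetical path)
mailman3_db_password: "<generated secret>"
```

Host vars keep a second mm3 host (e.g. during a future migration) from silently sharing credentials, at the cost of a little more setup next time.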
fungii guess are the things i need to generate18:23
fungiwell, generate and/or set18:24
fungiwe don't seem to set any of those for production in system-config18:24
fungi(not even username/email)18:24
clarkbwe might be able to set a couple of them but ya I tried to make it clear which things are mm3 specific with that prefix18:24
fricklerwhat host is whois says internap, but no rdns
clarkbI don't know that config came out of the original servers exim config18:29
fungifrickler: oh, i think that got cargo-culted from before old lists.o.o was in puppet18:29
fungicould have sworn i cleaned that up already18:30
fungilooks like we have it cargo-culted into several other configs too if you git grep for it18:30
opendevreviewJeremy Stanley proposed opendev/system-config master: Clean up an old raw IP address from our MTAs
fungifrickler: clarkb: ^18:34
fungii think i asked corvus and mordred about that a while back too, and neither of them remembered why it was there either18:34
clarkbfungi: I guess it is already out of the prod mm3 config18:36
fungiyeah, i must have cleaned it up in our new config and that's what i was remembering18:36
fungiwon't be in git history because it was removed before we approved the change18:37
fungiinfra-root: last call for edits on the mm3 migration announcement at
fungii added a sentence to the final paragraph to clarify about future maintenance for migrating the other sites18:42
opendevreviewMerged openstack/project-config master: Add an Ubuntu FIPS testing token
ianwclarkb: yeah, and doesn't explain the 24 changes that keep updating19:43
ianwalthough it's probably worth bumping the comment limit, running against neutron and seeing if it goes away19:43
ianwthat way we can file a bug that "comments with more than limit cause entire slice to re-run endlessly" (presuming it does)19:44
clarkbianw: ya but that requires an outage19:44
clarkbit does look like luca agrees they shouldn't error in this case. Did you want to follow up with them?19:45
ianwyeah, i'll ask if it would cause the recurring updates19:45
clarkbianw: it does look like 84223 is a merged change too19:46
clarkbbut reading luca's response we might also want to check if any of the changes are active and if not we can probably get away with this as is19:47
clarkbif there are active changes that are sad we can ask reviewers to manually apply their votes on the latest patchsets as a workaround19:47
ianwthe 24 changes that may or may not be fixed are 19:56
clarkbianw: I think the way to check is likely going to involve inspecting the notedb content for the changes19:57
clarkbas luca mentioned a new note comment is added to capture the votes on the patchset so I think we need to look for those after identifying which votes we are trying to carry forward19:57
clarkbianw: re that paste it looks a lot like the other successful run that you shared (for dib iirc). How do we know if others haven't failed?19:58
clarkbI guess the hint is that only neutron is rerunning each time?19:58
opendevreviewMerged openstack/project-config master: Deprecate OpenStack-Ansible rsyslog roles
ianwclarkb: yeah, when rerunning over all projects, it's only this batch that appears again20:03
ianwi just went through them all, they are all either abandoned or closed20:04
ianwbut interestingly, i just noticed 84223 is in there *twice*20:04
clarkbthat might explain the double error we get about the missing sha20:05
clarkbI think there was some wonder over whether or not multiple changes had shared an object20:05
ianwf24a40fa5e727f176b63233c6b75476213ffc506 doesn't appear to be a ps in 8422320:09
clarkbit might be an object and not a commit?20:11
ianwit does say "Cannot check change kind of new patch set f24a40fa5e727f176b63233c6b75476213ffc506" but that also may mean nothing20:12
opendevreviewMerged openstack/project-config master: Add repository for Skyline installation by OpenStack-Ansible
ianwit seems to send me to before it barfs20:13
fungiyeah, i had noted that earlier in the other channel where the error came up20:14
ianwc095827ea8f82a286c84da1ad3d1aab38a2e1328 sends me to 22128 as well20:14
fungii wonder if they're all broken revisions for the same change20:15
opendevreviewMerged openstack/project-config master: Add the cinder-infinidat charm to Openstack charms
ianw22128 *doesn't* appear in the list, updated or not20:16
opendevreviewMerged openstack/project-config master: Add the infinidat-tools charm to Openstack charms
opendevreviewMerged openstack/project-config master: Add manila-infinidat charm to OpenStack charms
fungiianw: according to git notes, f9aba49e05af6f69a6fca8b61c2e7a14d9b78e11 is the commit which merged for that change20:21
fungi10 years ago20:22
fungior nearly (early 2013)20:23
ianwmemories ... like the corners of my mind ... :)20:24
ianwi think we will never find f24a40fa5e727f176b63233c6b75476213ffc506 again20:25
ianwis it just a co-incidence that 84223 has been hashed into this "slice" of changes to update?  why is 22128 not appearing in the list?  does any of this matter?20:26
clarkbhrm maybe 22128 is in the slice and it is failing while gerrit tries to process it20:33
clarkband that causes things to fail enough that it tries to retry every time. fwiw if every change in that list is closed (apparently so) then i think the impact is basically nil according to what luca has said20:34
opendevreviewMerged openstack/project-config master: And ansible-role-proxysql repo to zuul jobs
opendevreviewMerged openstack/project-config master: Telemetry: Switch back to launchpad from storyboard
opendevreviewMerged openstack/project-config master: Add feature branch notifications to openstack-sdks
ianwyeah, the only testing option i see is quickly shutting down gerrit, bumping the max comments/thread and trying to copy-approvals on neutron and see what happens21:03
ianwif we think that's worthwhile i can try this afternoon when it's quiet21:03
*** dviroel is now known as dviroel|afk21:04
ianwotherwise i can just file a issue about it, but since all the changes being reported are closed, we can assume we're ok21:04
*** Guest305 is now known as atmark21:07
clarkbinfra-root I'm testing nodepool with new openstacksdk (actually running the test setup with known good openstacksdk first to make sure I don't confuse problems in test setup with sdk update issues) and I'm noticing that we may need to do a node cleanup and the inmotion cloud has leaked stuff due to placement I think (though there are a number of building servers too..)21:20
clarkball that to say we should dig into that. I'll try to take a better accounting once I'm done with nodepool testing21:21
clarkbok testing against iweb mtl01 with openstacksdk 0.103.0 produces: keystoneauth1.exceptions.catalog.EndpointNotFound: public endpoint for compute service in mtl01 region not found21:25
clarkbthis error does not occur when running against rax21:25
clarkbok the iweb issue is an issue with the vendor data in clouds.yaml21:28
clarkbI'll try to manually write that content out and not rely on the profile21:28
corvusclarkb: which are the old/new version numbers in question, just for those of us watching along at home?21:28
clarkbcorvus: old is openstacksdk==0.61.0 new is openstacksdk==0.103.021:30
clarkbrax worked21:30
clarkbfor booting and deleting a node21:30
clarkbI think I've got iweb working via clouds.yaml update. I'll write a change to update our iweb configs to work around this. Then assuming we're ok with updating nodepool to try 0.103 all over the place that might be worthwhile?21:33
clarkbI feel like doing exhaustive testing of all the clouds and image uploads etc might be more effort than it is worth? We can update. Then check if we boot nodes and upload images? I'm open to more testing though just the list is long if we try to do it all21:34
corvusclarkb: i think you've checked the clouds most likely to be affected, so at least we know if we update the launchers we won't faceplant.  if there are uploads that fail, we'll continue using old images, so risk isn't crazy there.  starting to try it in prod sgtm.21:36
clarkbgood point re image upload failures being low impact21:36
opendevreviewClark Boylan proposed opendev/system-config master: Update iweb clouds.yaml for old and new openstacksdk
clarkbcorvus: ^ thats the fix on our side21:39
clarkbinfra-root in rax-dfw we appear to have leaked a number of old instances (I found one from 2020 I manually deleted just to check if it would delete)21:56
clarkbI'm a bit confused as to why nodepool leaked instance cleanup wouldn't take care of this21:56
clarkbthey don't show up in the nodepool listing (for the ones I've spot checked) so they should be cleaned up by leak cleanup?21:58
clarkbdfw, iad, ord all seem to have the same situation21:59
corvusclarkb: maybe they were missing nodepool metadata?21:59
clarkboh right we check that to be sure we only leak cleanup instances created by nodepool. I can check that21:59
clarkbcleaning this up should be good for capacity in nodepool. Then I/we also need to look at cleaning up inmotion's sadness22:00
clarkball this made more important by the impending iweb removal22:00
clarkbcorvus: that was a good hunch. Currently active nodes have properties set with the nodepool info but spot checking these other nodes they do not22:02
clarkband that would prevent leak cleanup from cleaning them up. In that case I guess it must've been an issue in the cloud or a temporary bug in nodepool? and we should go ahead and manually clean up22:02
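[Editor's note: the cleanup logic being discussed boils down to: a server is only safe for nodepool's automatic leak cleanup if it carries nodepool metadata; servers without it, like the ones found here, need manual deletion. A minimal sketch using plain dicts in place of real openstacksdk server objects; the metadata key name `nodepool_node_id` is an assumption for illustration:]

```python
def find_unmanaged_servers(servers, known_node_ids):
    """Split servers into leak-cleanup categories.

    Returns (manual, automatic):
      manual    -- no nodepool metadata at all; nodepool won't touch these
      automatic -- tagged by nodepool but unknown to it; leak cleanup applies
    """
    manual, automatic = [], []
    for server in servers:
        meta = server.get("metadata") or {}
        node_id = meta.get("nodepool_node_id")  # assumed metadata key
        if node_id is None:
            manual.append(server)
        elif node_id not in known_node_ids:
            automatic.append(server)
    return manual, automatic
```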
corvusclarkb: yep one of those sounds most likely and i think manual cleanup is called for22:08
clarkbthanks for confirming. I'll start work on that shortly22:09
ianwclarkb: instances as in running vm's right?  I recall cleaning up a ton of leaked images and blobs quite a while ago, but iirc we identified and fixed that issue22:17
*** dasm is now known as dasm|off22:23
clarkbyes these are running VMs22:23
clarkbI'm starting in ord, taking the list of likely leaked nodes then will check them against what nodepool thinks its got. Any that nodepool doesn't know about will be removed22:30
clarkbthat process is running now for rax-ord22:33
clarkba few of these have servers with the same name too and will need to be removed via uuid22:35
clarkbianw: can I delete ianw-f34-test in rax-ord too?22:36
ianwumm, yes :)22:39
ianweven a script sending an email alerting us of very very old instances in the CI clouds would probably be helpful, it's a bit embarrassing i left that behind22:39
clarkbok done. There are three relatively recent nodes (~1 month old) that appear stuck in building that do have properties and updated at fields that indicate nodepool is trying to delete them. Other than those rax-ord looks clean. Now rax-iad22:42
ianwclarkb: while the script is in a loop -- i think my proposal to wrap up copy-approvals is to bump the comment limit when gerrit is quiet and re-run against neutron, and see if the 24 repeated updates goes away.  if it does, that was the cause.  either way i file an issue22:45
ianwif it doesn't then it seems like the missing object on 84223 is the cause.  nothing we can realistically do about that, but considering all of the 24 changes are abandoned or closed, we'll just leave it alone22:45
ianwassuming we don't have to worry about it, even if it didn't apply correctly22:46
clarkbsounds good. According to luca we don't have to worry about it because all it does is add a new comment; it doesn't change the actual schema22:48
ianwyep, i think for purposes of our upgrade, the todo is done.  we'll just try to report upstream so it's easier for the next person trying :)22:49
clarkbin iad we've got more nodes that nodepool is trying to delete and some nodes that nodepool doesn't know to delete22:51
clarkbI'm going to try manually deleting all of them to see if that makes a difference22:51
*** rlandy is now known as rlandy|out23:01
clarkbianw: and ianw-xenial in dfw is good to go?23:03
ianwclarkb: basically anything with my name on it is gtg :)23:10
ianwi'm not actively using any23:10
clarkbianw: hrm it says that instance is locked. I don't know what that means but it won't let me delete it23:16
clarkbspot checking iweb and ovh they look fine. Whatever caused this to happen also seems to coincide with colliding node names (those I had to delete by uuid) so I suspect it was something on our side23:19
ianwi can't imagine i meant to lock it23:20
clarkbthere is a server unlock command23:21
clarkbI've put digging into the inmotion stuff on my todo list for tomorrow. It will require me to page in a fair bit of stuff so I don't want to tackle that now23:22
clarkbI see nodes properly held through zuul for corvus (zuul-tox-py38, frickler devstack-plugin-ceph, and ade_lee cinder-tempest-plugin-lvm-lio-barbican-fips)23:23
ianwbecause i'm poking at the error_logs, i also notice "hook[patchset-created] exited with error status: 2" seems to be happening constantly23:24
clarkbianw: I want to say that is a known issue and due to us not having access to the db anymore23:24
clarkbfor the welcome message maybe when you push for first patch?23:24
clarkbnot running things that don't work is a good idea though23:24
ianwi feel like that should be commented out ->
clarkboh I guess we addressed that then. Something else23:25
ianwi wonder if some of the python updates affected it23:26
clarkb#status log Cleaned up leaked nodepool instances in rax. Nodepool couldn't clean them up automatically due to missing metadata.23:26
opendevstatusclarkb: finished logging23:26
clarkbianw: I thought gerrit recorded the stdout/stderr along with those errors and put them in the error_log23:26
ianwseems like we need to up the logging to debug level23:27
ianwthis might be something useful to do when i restart to debug the max comment limit thing23:27
clarkboof that is probably going to be very chatty23:27
clarkbbut if it is temporary for ya that then might not be too big of a deal23:27
clarkbconfiguring gerrit logging is its own set of expertise though :(23:27
ianwi wonder if this can work23:28
ianwcom.googlesource.gerrit.plugins.hooks.HookTask is the log id, i think23:29
ianwcom.googlesource.gerrit.plugins.hooks.HookTask: INFO23:29
ianwok, i set that to DEBUG ... TIL ... let's see if anything comes up in logs23:31
clarkbya that is new to me as well.23:31
ianwoutput: update-blueprint: error: unrecognized arguments: --change-owner-username ... blah23:32
ianwit seems to be an argument per
clarkb is it possible that we aren't running an up to date jeepyb there?23:34
ianwi'm thinking so23:34
clarkboh wait23:34
clarkbits this script
clarkband that one doesn't have the argument23:35
clarkbI think we half updated?23:35
clarkbya I think that must be it.23:36
clarkbside note jeepyb changes trigger gerrit image rebuilds. We may need to double check that integration is still good (though I try to get it up to date after we upgrade gerrit)23:36
ianwthe container has jeepyb-0.0.1.dev485.dist-info/23:36
clarkbianw: ya I think the issue is in update-blueprint which didn't get updated23:37
clarkbupdate-blueprint was disabled previously because it needed the db; then melwitt updated it to use the rest api but it must've never worked because we didn't have the correct arglist in the command?23:37
ianwahhh, so basically port
melwittclarkb: I noticed that it didn't work but didn't get time to figure out what's wrong with it23:38
ianwi guess we found it :)  23:39
melwittif so, that's great news :)23:39
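[Editor's note: the failure mode debugged above is a Gerrit hook script invoked with newer arguments its parser doesn't know, so argparse exits with status 2 ("unrecognized arguments"). A minimal sketch of the shape of the fix; jeepyb's real parser differs, and the argument name is taken from the error in the log. Using parse_known_args additionally tolerates arguments future Gerrit versions might add:]

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--change", default=None)
    # newer Gerrit passes this to patchset-created hooks; without it,
    # argparse exits with status 2 on "unrecognized arguments"
    parser.add_argument("--change-owner-username", default=None)
    return parser

def parse(argv):
    # parse_known_args silently ignores any args the parser doesn't know,
    # instead of exiting -- a defensive choice for hook scripts
    args, _unknown = build_parser().parse_known_args(argv)
    return args
```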
opendevreviewIan Wienand proposed opendev/jeepyb master: update_blueprint: handle recent gerrit arguments
clarkblooks like jeepyb is configured to build a gerrit 3.5 image23:45
clarkbwhich is in sync with what we are running so that bit should be good23:45
clarkbianw: small thing on that change23:46
opendevreviewIan Wienand proposed opendev/jeepyb master: update_blueprint: handle recent gerrit arguments
ianwthanks, i knew i couldn't get it without at least one typo!23:49
