Wednesday, 2021-10-06

ianwfungi: if you get a chance before we do gerrit things too; adds its/actions.config which we seemed to drop in the puppet switch.  should be a no-op in production00:08
fungioh, yep thanks!00:08
opendevreviewIan Wienand proposed opendev/system-config master: [wip] letsencrypt : don't hit staging in the gate
opendevreviewMerged opendev/system-config master: gerrit: add its actions.config file
opendevreviewIan Wienand proposed opendev/system-config master: [wip] letsencrypt : don't hit staging in the gate
opendevreviewIan Wienand proposed opendev/system-config master: [wip] letsencrypt : don't hit staging in the gate
ianwclarkb: one thing to confirm maybe is i see the "avatar" image broken @ whilst it is something random on your new instance.  is that a fix, or something we have different in production?03:00
opendevreviewIan Wienand proposed opendev/system-config master: [wip] letsencrypt : don't hit staging in the gate
opendevreviewIan Wienand proposed opendev/system-config master: [wip] letsencrypt : don't hit staging in the gate
*** ykarel|away is now known as ykarel04:18
opendevreviewIan Wienand proposed opendev/system-config master: letsencrypt : don't use staging in the gate
opendevreviewIan Wienand proposed opendev/system-config master: Setup Letsencrypt for ptgbot site
opendevreviewIan Wienand proposed opendev/system-config master: Setting Up Ansible For ptgbot
opendevreviewIan Wienand proposed opendev/system-config master: ptgbot: setup web interface
*** ysandeep|out is now known as ysandeep05:27
priteauGood morning. Is Zuul struggling again today? Seeing many of the recently submitted jobs all in queued or waiting state.06:33
*** jpena|off is now known as jpena07:29
opendevreviewMarios Andreou proposed openstack/diskimage-builder master: Correct path for CentOS 9 stream base image
*** ykarel is now known as ykarel|lunch08:15
*** ysandeep is now known as ysandeep|lunch08:28
*** ykarel|lunch is now known as ykarel09:45
*** ysandeep|lunch is now known as ysandeep09:46
opendevreviewRadosław Piliszek proposed opendev/irc-meetings master: Cancel Masakari team meeting
*** hjensas is now known as hjensas|lunch09:58
fricklerpriteau: seems we have a bit of backlog, likely due to the release being in progress. hopefully that should resolve itself later today10:02
fungipriteau: are the ones you were looking at a few hours ago still a problem or have they been making progress?10:11
priteauThey've completed. It looks like there's around 30 minutes queue for new jobs.10:12
fungiyeah, seems the cycle-trailing deployment projects (tripleo and osa in particular) have approved a bunch of changes, and also the numerous periodic jobs which start around 06:00 utc are still finishing up10:15
fungiaccording to we're backlogged on available job nodes and on executor memory10:18
*** ysandeep is now known as ysandeep|away11:00
opendevreviewMerged openstack/project-config master: Add neutron-dynamic-routing-stable-maint group
opendevreviewMerged opendev/irc-meetings master: Cancel Masakari team meeting
*** dviroel|out is now known as dviroel11:41
opendevreviewMerged openstack/diskimage-builder master: Drop lower version requirement for networkx
*** jpena is now known as jpena|lunch12:00
*** ysandeep|away is now known as ysandeep12:44
opendevreviewMerged opendev/system-config master: letsencrypt : don't use staging in the gate
opendevreviewMerged opendev/system-config master: Setup Letsencrypt for ptgbot site
Clark[m]fungi: anything I can help look at with the release?13:05
Clark[m]Are we waiting on the items enqueued to the tag pipeline? I wonder if that pipeline's priority is lower than the release pipeline's priority13:07
Clark[m]No, tag has precedence high like release and other post merge pipelines13:14
clarkbaha it is slow due to a semaphore13:21
clarkbin that case working as designed I guess13:21
opendevreviewMerged openstack/diskimage-builder master: Fix cron not installed in debian
clarkbfungi: I think we're looking at at least 5 hours or so to get through that tag queue. Likely longer due to waiting for nodes after getting the semaphore13:23
clarkbfungi: any idea if that is a problem?13:23
*** jpena|lunch is now known as jpena13:24
clarkbianw: for the avatar thing the current gitea requests them using a sequential numeric id which 404s and the updated gitea uses what appears to be more like a uuid13:28
clarkbianw: I can't say for sure that 1.15.3 will fix that issue, but it does seem the behavior changes so it wouldn't surprise me. It is possible they updated the db table for that but then didn't update the api request side or something13:28
clarkbI don't think it is critical, but it is something we can follow up on after the upgrade I guess13:28
fungiclarkb: the release notes builds aren't blockers for the release, no13:36
clarkbdiablo_rojo_phone: ianw: fungi: note that isn't running eavesdrop job(s) so there may be missing file matchers13:47
clarkbthe child changes are probably more likely to match and then run the job though13:47
opendevreviewShnaidman Sagi (Sergey) proposed zuul/zuul-jobs master: Include podman installation with molecule
opendevreviewMerged opendev/system-config master: Setting Up Ansible For ptgbot
fungiclarkb: yeah, i wondered why that didn't deploy14:22
fungiinfra-prod-service-eavesdrop is queued in deploy for 803190 now14:26
fungibut it's also queued in the opendev-prod-hourly pipeline so may get a cert sooner14:27
*** dviroel is now known as dviroel|afk14:41
fungimmm, infra-prod-base and infra-prod-letsencrypt failed, will dig into those momentarily14:54
fungilooks like the base failure is due to an interrupted package upgrade on afs01.ord.openstack.org15:23
fungii'll see what's up with that15:23
clarkbThe task includes an option with an undefined variable. The error was: 'letsencrypt_self_generate_tokens' is undefined is the LE issue. I'll push a fix for that one15:24
fungirerunning dpkg --configure -a15:24
fungiclarkb: thanks! i guess we missed setting a default there15:24
clarkbyup I think we just need to add it to the defaults file15:25
fungisomehow we ended up with mismatched kernel header packages on afs01.ord15:26
fungiThe following NEW packages will be installed: linux-headers-5.4.0-88 linux-headers-5.4.0-88-generic linux-image-5.4.0-88-generic linux-modules-5.4.0-88-generic linux-modules-extra-5.4.0-88-generic15:27
fungiThe following packages will be upgraded: linux-generic linux-headers-5.4.0-84-generic linux-headers-generic linux-image-generic15:27
fungionce those are in place, it should be able to recover from the dkms build failure15:28
opendevreviewClark Boylan proposed opendev/system-config master: Fix letsencrypt_self_generate_tokens defaults
clarkbfungi: ^ I think that will fix the LE issue15:31
fungiclarkb: should we exercise that by dropping the explicit False in the letsencrypt test job?15:33
fungi(assuming that works the way i think it does)15:33
clarkbfungi: ++ let me see if I can do that15:35
opendevreviewClark Boylan proposed opendev/system-config master: Fix letsencrypt_self_generate_tokens defaults
clarkbthat should do it I think15:36
fungithat way if the default doesn't work like we expect, the job should now fail15:36
*** ykarel is now known as ykarel|away15:38
fungiclarkb: mmm, no i don't think that's going to do what i was thinking15:39
clarkbfungi: oh right I agree it will see the zuul specific override15:39
fungii guess we override the default in playbooks/zuul/templates/group_vars/letsencrypt.yaml.j215:39
clarkbya I don't think there is a way to do that unless we use a different group vars for that specific job15:39
clarkband figuring that out in ansible is likely to be annoying. I'll revert the latest update for now15:39
fungiright, i'm fine with patchset 1 then15:40
opendevreviewClark Boylan proposed opendev/system-config master: Fix letsencrypt_self_generate_tokens defaults
fungithanks, sorry for the runaround15:41
clarkbno worries, it was a good idea15:41
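A rough way to picture the defaults discussion above, assuming Ansible's usual rule that role defaults have the lowest precedence and inventory group_vars override them: a `collections.ChainMap` that searches the gate's group_vars before the role defaults. The variable values are assumptions for illustration; the point is only that an override in `playbooks/zuul/templates/group_vars/letsencrypt.yaml.j2` shadows the role default, so the test job cannot exercise it.

```python
from collections import ChainMap

VAR = "letsencrypt_self_generate_tokens"

# The fix: give the role a default so production no longer sees it as
# undefined. (The default value here is an assumption.)
role_defaults = {VAR: False}

# Production inventory does not set the variable at all.
prod_group_vars = {}

# The gate's templated group_vars sets it explicitly (value assumed).
gate_group_vars = {VAR: True}


def resolve(name, *layers):
    """Look a variable up through layers ordered highest precedence first.

    Raises KeyError when no layer defines it, which is roughly what Ansible
    reports as "'...' is undefined".
    """
    return ChainMap(*layers)[name]
```

Before the fix, production effectively evaluated `resolve(VAR, prod_group_vars)` and hit the KeyError; the gate never did, because its own group_vars layer answered first.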
*** jpena is now known as jpena|off15:43
*** jpena|off is now known as jpena15:44
fungishould this worry us?
clarkbfungi: yes, that points to a likely znode leak in zuul15:48
clarkbcorvus: ^ fyi15:48
clarkbI guess introduced with a restart on the 4th? was there a restart on monday?15:48
fungilooks like it was fairly steady for the week prior to that15:49
clarkbah ok15:49
clarkbinspecting the zk db directly will probably make it clear where the leak is happening15:49
fungisteady as in flat15:49
fungiand then on or around the 4th started to grow linearly15:49
fungithe last zuul upgrade was 2021-09-30 according to
opendevreviewMerged opendev/system-config master: Update our gitea images to bullseye
clarkbthe big jumps we see are periodic jobs queuing up I think15:53
clarkbnote based on napkin math we did a while back I think we expect our cluster to be able to do millions of nodes without too much trouble. I don't think this is a restart and delete all znodes reset just yet15:55
clarkbthat said zuul does seem a bit slow to work through some of the events it's got, though it is also a busy morning15:58
funginah, more "why is this steadily going upward and never really downward but just since monday" sort of thing15:59
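Inspecting the ZooKeeper data directly usually comes down to counting znodes under each subtree and seeing which one keeps growing. A minimal sketch, assuming the znode paths have already been dumped (for example by walking the tree with a ZooKeeper client); the sample paths in any usage are hypothetical, not Zuul's actual layout.

```python
from collections import Counter


def count_by_prefix(paths, depth=2):
    """Count znode paths grouped by their first `depth` path components."""
    counts = Counter()
    for path in paths:
        parts = [p for p in path.split("/") if p]
        counts["/" + "/".join(parts[:depth])] += 1
    return counts


def growth(before, after):
    """Per-prefix change between two snapshots, largest increase first."""
    keys = set(before) | set(after)
    deltas = {k: after.get(k, 0) - before.get(k, 0) for k in keys}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
```

Comparing two snapshots taken some hours apart with `growth()` points at the subtree that only ever grows, which is the "steadily upward, never downward" shape noticed on the graph.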
*** sboyron_ is now known as sboyron16:01
*** sboyron is now known as Guest200216:02
clarkbthe gitea job has begun. It will work through giteas one by one upgrading them and will stop if any fail16:02
clarkbit does them in order too so you can check 01 first. one thing we should probably double check is replication too in addition to web updates since we removed the bullseye backports openssh install since we are on bullseye now16:03
clarkbI don't expect issues since it should be the same openssh, but it is probably the main thing affected by this since gitea itself is a go binary16:03
fungiwe no longer have the bug where the sshd discards git pushes while gitea is restarting, right?16:06
clarkbcorrect, that was fixed by starting ssh after gitea web was started16:06
clarkbthe problem before was starting ssh first allowed gerrit to do the push but then the web component was never made aware of it16:06
clarkb web seems to be working16:07
clarkbI see replication events exiting the queue (with a much smaller number of retries likely due to the restarts)16:10
clarkbI think we're good, but a good final check would be to check your refs/changes/xyz/yz is present after pushing a new patchset or similar for after merging something16:11
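For the replication check, Gerrit stores each patchset under `refs/changes/<last two digits of the change number, zero-padded>/<change number>/<patchset>`. A small helper to build that ref name; the `git ls-remote` usage afterwards is a suggestion, not something from the discussion.

```python
from typing import Optional


def gerrit_change_ref(change: int, patchset: Optional[int] = None) -> str:
    """Build Gerrit's ref name for a change (and optionally one patchset).

    Gerrit shards change refs by the last two digits of the change number,
    zero-padded, e.g. change 812739 lives under refs/changes/39/812739/.
    """
    shard = f"{change % 100:02d}"
    ref = f"refs/changes/{shard}/{change}"
    return ref if patchset is None else f"{ref}/{patchset}"
```

For example, `git ls-remote https://opendev.org/opendev/system-config 'refs/changes/39/812739/*'` should list the patchset refs for 812739 if replication has caught up.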
corvusclarkb, fungi: thanks, i'll take a look at znodes16:13
*** ysandeep is now known as ysandeep|out16:15
clarkbThe gitea image upgrade to bullseye appears to have completed. I haven't seen anything that concerns me yet16:25
clarkbI'd like to get a bike ride in in a bit so won't land the 1.15.3 upgrade just yet but would like to do that next when I can help babysit16:25
corvusah, looks like there's a problem with the change cache cleanup16:33
corvusi need to use the repl to investigate more16:34
clarkbthe openstack release is basically done at this point so that should be safe. There are 38 release notes build jobs still in the queue but the release team didn't think those were super important16:35
fungithose are mainly just refreshing the release notes to make sure the tagged versions show up instead of "under development"16:35
corvuscool.  i'm still planning on proceeding lightly and don't expect any interruption yet.  i'll let you know if what i find changes that.16:36
fungi(or instead of release candidate versions)16:36
clarkbyuriys: related to the above message I think you're clear to set instance quota to 0 in the inmotion cloud and fiddle with placement settings whenever you like16:37
corvusclarkb, fungi: i understand the issue.  it's not something i can fix easily with the repl; i think we should fix the zuul bug and restart as soon as is convenient.  i do think we can allow this to grow for a while (days?) without it being urgent.16:46
fungicorvus: yes, as clark mentioned we estimated being able to support orders of magnitude more znodes16:47
clarkbcorvus: I agree, last time we did some metrics on this we figured millions of znodes would be fine on our cluster16:47
fungii just happened to notice its growth looked unbounded16:47
clarkbfungi if you have time is the next change I'd like to land post bike ride16:49
clarkbfungi: the child change has a held node on the gitea job which you can use to inspect things16:49
fungiyeah, i'll look it over16:49
*** jpena is now known as jpena|off17:00
corvusi have shut down the zuul repl17:21
fungithanks corvus!17:23
opendevreviewMerged zuul/zuul-jobs master: Revert "Revert "Include tox_extra_args in tox siblings tasks""
fungijust a heads up, that ^ caused some problems for tripleo jobs the first time it merged, it's now exercised with the case they ran into (and fixed), but there could be more corner cases lingering we didn't spot before it got reverted17:35
*** dviroel|afk is now known as dviroel17:41
fungiclarkb: also when you're back, ianw commented in here earlier about suddenly having generated avatars on org pages like
fungidoesn't seem like it presents a problem, but does seem to be a difference17:51
fungiand (as opposed to openstack) is the result of a rename test, correct?17:53
fungi#status log Manually corrected an incomplete unattended upgrade on afs01.ord.openstack.org17:56
opendevstatusfungi: finished logging17:57
*** ysandeep|out is now known as ysandeep18:03
opendevreviewMerged opendev/system-config master: Fix letsencrypt_self_generate_tokens defaults
opendevreviewDanni Shi proposed openstack/diskimage-builder master: Update keylime-agent and tpm-emulator elements
opendevreviewJeremy Stanley proposed opendev/system-config master: Finish ptgbot configuration
fungidiablo_rojo_phone: clarkb: ianw: ^19:10
Clark[m]fungi: ya OpenDev/system-config is pushed in to show us what a real repo looks like and dib is part of rename testing19:11
Clark[m]Re the avatar stuff I think it generated one before and they may have broken it and now it is fixed again? I'm thinking we upgrade and dig further from there if necessary19:11
fungialso we have a cert for ptgbot now, the le fix worked!19:11
fungii'll approve the web config change19:12
yuriysI've set time aside for Friday clark! but appreciate the greenlight.19:12
Clark[m]fungi I chose system-config as the push repo since it is already available on the test node. Made it easy19:15
*** ysandeep is now known as ysandeep|out19:24
clarkbfungi: I guess we can have ianw review it and ack the avatar thing and I can be around to babysit it from there19:34
fungiclarkb: if you have a moment, can you take a look at 812739? as far as i know that's all we're still missing for the ptgbot deployment work19:47
clarkbI sure do, let me see19:47
fungiaside from ianw's 812419 which i approved a few minutes ago19:48
clarkbI went ahead and approved 812739 since impact if anything isn't quite right is minimal19:49
fungiyeah, worst case it continues not working ;)19:49
opendevreviewMerged opendev/system-config master: ptgbot: setup web interface
opendevreviewMerged opendev/system-config master: Finish ptgbot configuration
ianwfungi: so it seems like we're just waiting for 812739 to deploy and then we expect ptgbot to be up?21:22
ianwthe other thing was we probably want to cname; i can look into that if you like21:23
clarkbianw: re deployment yes that was my understanding. Also if you want to rereview I think we can land that now and debug avatar stuff afterwards if the issue persists21:24
clarkbI'm around to babysit that for the next 3-4 hours or so21:24
ianwavatar stuff wasn't a big issue, just the broken image looks a bit ugly21:26
clarkbthe debian upgrade happened this morning. My rough plan is to look at the gerrit stuff tomorrow. I think the foundation has some updates to the cla that we can batch into that restart21:27
ianwok sounds good, yeah i was going to write a checklist tomorrow so that sounds good21:28
clarkbthe debian upgrade for gitea I mean21:30
ianwlooks like that deployed, looks stubborn21:34
ianwUp About an hour                     ptgbot-docker_ptgbot_121:35
ianwi'll do a manual restart, but we can double check the restart logic21:35
ianwlocalhost:8000 is responding21:36
ianwoh i see, a typo in the http side. works21:37
opendevreviewIan Wienand proposed opendev/system-config master: ptgbot: fix servername on http side
clarkbianw: +A'd21:39
ianw{{ ptgbot_config_copied is changed | ternary('--force-recreate', '') }}"21:41
ianwit definitely feels like that deploy should have force recreated the container21:41
jrosseri always get nervous when the thing preceding a ternary is not in ( )21:42
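The deploy task decides whether to recreate the container from whether the config-copy task reported a change. As far as we understand Jinja2's parser, `is` tests and `|` filters are applied left to right, so the expression reads as `(ptgbot_config_copied is changed) | ternary(...)`; parenthesizing the test, as jrosser prefers, just makes that explicit. A Python sketch of the same decision, where `ternary()` is a simplified stand-in for Ansible's filter and the `docker-compose up -d` base command is an assumption, not the role's actual task:

```python
def ternary(value, true_val, false_val):
    """Simplified stand-in for Ansible's `ternary` filter (no none_val)."""
    return true_val if value else false_val


def compose_up_command(config_changed: bool) -> str:
    # Hypothetical base command; only the flag logic mirrors the template.
    flag = ternary(config_changed, "--force-recreate", "")
    # Drop the empty string so no trailing space is left behind.
    return " ".join(part for part in ("docker-compose", "up", "-d", flag) if part)
```

This is why the `"changed": false` on the copy task matters: with no reported change, no `--force-recreate` is passed and the existing container keeps running.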
ianwTASK [ptgbot : Put ptgbot config in place]21:43
ianw"changed": false,21:43
ianwfungi: is it possible you hand-edited in the fixes, and since it reapplied the same thing didn't flag as changed?21:43
fungiianw: i did not, no21:44
fungionly looked at what was deployed and proposed changes based on that problem21:44
ianw... interesting ...21:44
ianwit changed in "Running 2021-10-06T20:26:12Z"21:49
ianwwhere it did recreate.  maybe i just fooled myself thinking it wasn't working because i was trying to hit http21:51
ianwthat seems to be it21:52
clarkbI'm not actually in the channel on oftc, what is the channel? I probably should join it21:52
diablo_rojo_phone#openinfra-events ?21:52
clarkbthats the one, thanks21:52
diablo_rojo_phoneNo problem.21:53
clarkbI don't think I see the bot in there? So it might still be having problems21:53
clarkbianw: ^ fyi21:53
ianwhrm, indeed21:53
clarkbDid this bot get the necessary updates to properly identify with oftc since they don't do sasl?21:54
clarkbI wonder if that is the problem21:54
ianwi believe it did21:54
fungithe account in private hostvars was created and tested during the oftc migration21:56
fungior maybe i only thought i'd done that21:57
fungialso entirely possible, there was rather a lot going on at that time21:57
clarkbheh hpe is still sending us email with my name on it from when I set up the hpcloud account22:05
clarkbthat one was fun too because they charged my corporate account the first billing cycle and I had to go explain to people why it was cheaper for them to not bill me and have me do an expense report22:05
ianwok; problem part 2 : openinfraptg openinfra-events :Illegal channel name22:06
clarkbdoes it need the #22:07
ianwi think so, let me try22:07
ianwok, it's there now22:08
clarkbI confirm I see it in the channel22:08
clarkbianw: I can fast approve an update to the channel list for the eavesdrop group ansible vars file22:08
clarkbor let me know if you want me to push that update22:09
clarkbI think you might have to quote the name because # is a yaml comment starter22:09
opendevreviewIan Wienand proposed opendev/system-config master: ptgbot: add leading # to channel name
clarkbapproved, hopefully yaml is happy with that22:11
fungii guess it's not like statusbot in that regard22:13
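Both channel-name problems above could be caught by a small sanity check on the bot's channel list before it connects. In YAML, an unquoted list item like `- #openinfra-events` is parsed as a null entry, because `#` after whitespace starts a comment; quoting the name avoids that. The validator below is a hypothetical sketch, not part of ptgbot, and only covers the two failure modes seen here.

```python
def validate_channels(channels):
    """Return the channel list unchanged, or raise ValueError on bad entries.

    Catches the two failures seen here: a null entry (what YAML yields for
    an unquoted `- #name`) and a name without its leading '#'.
    """
    for entry in channels:
        if entry is None:
            raise ValueError("null channel entry; was a '#' name left unquoted?")
        if not entry.startswith("#"):
            raise ValueError(f"channel {entry!r} is missing its leading '#'")
    return channels
```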
opendevreviewIan Wienand proposed opendev/system-config master: ptgbot: add certificate for
ianw^ i've created the _acme-challenge record for that22:21
clarkbI've +2'd but will let fungi double check if he likes (otherwise I think you can +A)22:23
ianwhrm, hang on ...22:25
ianwwe've got "" and ""22:25
clarkboh I suspect we may want both to be ptg.o.o22:26
clarkbsince we don't care too much about it being an irc bot on the web front end22:26
clarkbfungi: diablo_rojo_phone  ^ does that make sense?22:26
ianwyeah i think we might have all just been so used to typing "bot" it snuck in there22:26
diablo_rojo_phoneYeah that makes sense.22:28
fungithe old site was, the new site is, right22:29
fungior are you saying we should make it
clarkbfungi: correct ptg.opendev.org22:30
fungii thought i had already asked that on earlier changes and the consensus was for as the new site name (because i had the same question)22:30
clarkbsince the 'bot' is an implementation detail22:30
diablo_rojo_phoneI feel like it should just change to opendev from openstack. 22:30
diablo_rojo_phoneRight. I don't think bot needs to be included in the url22:30
clarkbdiablo_rojo_phone: ya I suspect we'll do a redirect but you still need an ssl cert for the old name to make that work22:30
fungiokay, that's fine, having ptgbot as the site name seemed to have been intentional in the ansible change22:31
ianwfungi: yeah, i think it's just a situation normal wires crossed :)22:31
diablo_rojo_phoneI think that was mostly because I was using status bot as an example.22:31
diablo_rojo_phoneExactly like what ianw said. 22:31
ianwi think at this point, we'd be best just to put in some s/ptgbot/ptg/ edits before anyone links to it22:32
diablo_rojo_phoneThat's definitely my mistake. 22:32
fungiahh, okay wfm, i had previously told foundation folks though since that's what the changes were implementing22:32
fungiso i think it's already written into draft announcements, we'll just need to make sure they fix those22:32
fungisorry, i assumed the site name used in the earlier changes was intentional22:33
fungi(i had actually recommended, for that matter)22:34
fungiinfra-prod-remote-puppet-else failed in deploy for 812739:
clarkbthat shouldn't matter too much since there isn't any puppet left in eavesdrop22:36
clarkbbut probably want to understand why in general22:36
diablo_rojo_phoneI can make sure that gets updated in future emails fungi 22:36
opendevreviewIan Wienand proposed opendev/ master: Rename ptgbot to ptg
clarkbI was 30 seconds too late on that one22:39
clarkby'all are quick22:39
opendevreviewIan Wienand proposed opendev/system-config master: ptgbot: rename site to
clarkband again22:42
ianwin my defense, everything was pretty much open in emacs :)22:42
clarkbno its a good thing. Less thinking for me :)22:43
opendevreviewIan Wienand proposed opendev/system-config master: ptgbot: rename site to
ianw^ missed testinfra22:44
clarkbthe gitea upgrade should be merging shortly22:45
opendevreviewMerged opendev/system-config master: Upgrade gitea to 1.15.3
opendevreviewMerged opendev/system-config master: ptgbot: fix servername on http side
opendevreviewMerged opendev/ master: Rename ptgbot to ptg
clarkb has been upgraded22:55
clarkbavatars are still broken22:55
fungiso maybe their working is an artifact of our tests22:56
clarkbya, I suspect something around our upgrading from old versions maybe22:57
clarkbif you look at the requests they are different in that we have a simple numeric id request in prod and a uuid request in testing22:57
clarkbbut other than that it is looking happy from what I can see so far22:57
clarkband that isn't a new regression22:57
clarkbwe have upgraded through 04 at this point. Still seems good. I expect it will get to 08 without any problems23:01
clarkbianw: fungi: I expect the foundation should be proposing cla.html file updates, should we go ahead and approve the bullseye update for gerrit now? then plan to restart once the cla.html is updated?23:03
clarkbI'm not super concerned about that update since most of the software runs in its own vm (similar to gitea running in its own independent binary)23:03
clarkbthe biggest risk is going to be jeepyb but that has decent testing on python3 now too iirc23:03
fungisounds good to me, i'm around if things go sideways23:04
clarkbfungi: is the gerrit bullseye change if you want to give that a look23:05
fungialready there23:05
ianw++ on the upgrade23:08
fungiinteresting that the undifferentiated openjdk:11 image is buster (oldstable) instead of bullseye (stable)23:08
clarkbfungi: ya I don't know when they tend to update those23:08
clarkbI figure it is better to be explicit either way (there is a buster specific image too)23:08
fungiagreed, lgtm23:09
clarkbgitea08 is done. the job is still running but this looks like a successful gitea upgrade23:10
clarkbcue "we did it, we did it, we did it, yay!"23:11
clarkb(Dora the Explorer for anyone curious)23:11
ianwnow i'm going to have "i'm a map i'm a map i'm a map i'm a map" in my head23:12
ianw(i think i forgot a i'm a map)23:12
opendevreviewMerged opendev/system-config master: ptgbot: add leading # to channel name
fungii'm counting myself lucky to barely know what you're talking about23:14
clarkbfungi: haha, there are definitely worse programs but yes23:15
ianwfungi: i think for we need a separate vhost to do a 301 right?23:22
ianwi note for something like we don't do that, but i think it just happens to work because there's only one site running on and it defaults to that23:22
fungiianw: no need for a separate vhost, you can make it a serveralias and then redirect within that same vhost to the preferred url23:24
fungioh, though with multiple sites and sni that does become challenging if the serveralias isn't a subjectaltname on the same cert23:24
fungiwith sni in play, apache will be using the cert to match up the vhost i think23:26
clarkbI've always done it as a separate vhost. Possibly in the same file though23:26
ianwin this case i have added to the same cert23:28
fungithen it should be able to just be a redirect/rewrite within the vhost23:28
fungithe rewritecond we use on codesearch is a fine example, yep23:30
fungibut we need it as a serveralias in that vhost23:30
fungiwhich wasn't necessary on codesearch because it has only one vhost23:31
opendevreviewIan Wienand proposed opendev/system-config master: ptgbot: Add ServerAlias for
ianwfungi: ^ that?23:35
clarkbI see we already have the redirect there we just need the alias to match and then do the redirect23:36
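A toy model of the ServerAlias approach: a request whose Host/SNI name matches the vhost's ServerName is served, one matching a ServerAlias gets a 301 to the canonical name, and anything else would fall through to another (or the default) vhost. It also encodes the SNI caveat that every served name must be on the certificate. `ptg.opendev.org` comes from the discussion; the alias is a hypothetical placeholder, and this models the behaviour rather than being Apache configuration.

```python
CANONICAL = "ptg.opendev.org"
ALIASES = {"old-name.example.org"}  # hypothetical stand-in for the old site name
CERT_NAMES = {CANONICAL} | ALIASES  # names on the cert (subjectAltNames)


def check_config():
    """SNI caveat: every name the vhost answers for must be on the cert."""
    return sorted(({CANONICAL} | ALIASES) - CERT_NAMES)


def route(host, path="/"):
    """Return (status, location) for a request, or None if not this vhost."""
    host = host.split(":")[0].lower()
    if host == CANONICAL:
        return (200, None)
    if host in ALIASES:
        # Roughly what a RewriteCond on the host plus a 301 RewriteRule does.
        return (301, f"https://{CANONICAL}{path}")
    return None  # Apache would pick another vhost, or the default one
```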
ianwcan anyone think of a files: .zuul match example for "everything but this file"?23:57
ianwthe letsencrypt job shouldn't match on  playbooks/roles/letsencrypt-create-certs/handlers/main.yaml 23:58
Clark[m]I think zuul uses re2 for those so no negative lookahead match23:59
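If the `files:` matcher really is re2, as suspected, there is no negative lookahead, so "everything but handlers/main.yaml" has to be spelled as positive patterns. A simplified Python model, assuming a job runs when any changed file matches any pattern anchored at the start of the path; the subdirectory names are assumptions about the role's layout. Zuul's separate `irrelevant-files` job attribute is the other direction worth checking in the Zuul docs.

```python
import re

# Only positive patterns (no lookahead), so re2 would accept them too.
# The subdirectories listed are assumptions about the role's layout.
LETSENCRYPT_FILES = [
    r"playbooks/roles/letsencrypt-create-certs/(defaults|tasks|templates)/.*",
]


def job_matches(changed_files, patterns):
    """Simplified model: run if any changed file matches any pattern."""
    compiled = [re.compile(p) for p in patterns]
    return any(rx.match(path) for path in changed_files for rx in compiled)
```

The trade-off is maintenance: a subdirectory added to the role later will silently stop triggering the job until the list is updated.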
