Nick | Message | Time |
---|---|---|
corvus | rm_you: i see an "rm_work\|" (with a pipe) as a puppet user | 00:00 |
rm_work\| | heh | 00:01 |
rm_work\| | will spin this down soon | 00:01 |
clarkb | ok time for dinner | 00:03 |
clarkb | I've learned more things about matrix :) | 00:03 |
rm_work | me too :) | 00:07 |
fungi | catching up, as for the tangent about edits, the weeslack slack plugin for weechat totally uses s/this/that/ or 3s/this/that/ to edit the third most recent message you sent | 00:11 |
rm_work | yeah neat, would be cool for element to do that as well :P | 00:12 |
opendevreview | Ghanshyam proposed openstack/project-config master: Properly retire neutron-lbaas https://review.opendev.org/c/openstack/project-config/+/800147 | 00:21 |
opendevreview | Ghanshyam proposed openstack/project-config master: Properly retire neutron-lbaas https://review.opendev.org/c/openstack/project-config/+/800147 | 00:32 |
opendevreview | Takashi Kajinami proposed openstack/project-config master: Retire tripleo-common-tempest-plugin - Step 1: End project Gating https://review.opendev.org/c/openstack/project-config/+/800154 | 04:51 |
*** ysandeep\|away is now known as ysandeep | 04:54 | |
*** ykarel\|away is now known as ykarel | 05:04 | |
opendevreview | Takashi Kajinami proposed openstack/project-config master: Retire tripleo-common-tempest-plugin - Step 3: Remove Project https://review.opendev.org/c/openstack/project-config/+/800157 | 05:11 |
*** bhagyashris\|ruck is now known as bhagyashris\|out | 05:53 | |
*** ykarel_ is now known as ykarel | 06:17 | |
*** jpena\|off is now known as jpena | 07:11 | |
*** ysandeep is now known as ysandeep\|lunch | 07:29 | |
*** amoralej\|off is now known as amoralej | 08:05 | |
*** ykarel is now known as ykarel\|lunch | 08:30 | |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Create repo for Hashicorp Vault deployment https://review.opendev.org/c/openstack/project-config/+/799822 | 08:37 |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Add Vault role to Zuul jobs https://review.opendev.org/c/openstack/project-config/+/799825 | 08:37 |
*** mgoddard- is now known as mgoddard | 08:40 | |
*** ysandeep\|lunch is now known as ysandeep | 08:42 | |
*** mrunge_ is now known as mrunge | 08:59 | |
*** ykarel\|lunch is now known as ykarel | 10:05 | |
opendevreview | chzhang8 proposed openstack/project-config master: register andd return back tricircle under x namespaces https://review.opendev.org/c/openstack/project-config/+/800196 | 10:08 |
*** jpena is now known as jpena\|lunch | 11:20 | |
*** jpena\|lunch is now known as jpena | 12:25 | |
*** amoralej is now known as amoralej\|lunch | 13:07 | |
*** ysandeep is now known as ysandeep\|away | 13:21 | |
opendevreview | Matthias Runge proposed openstack/project-config master: Enable deleting left-overs of panko deprecation https://review.opendev.org/c/openstack/project-config/+/800241 | 13:34 |
*** amoralej\|lunch is now known as amoralej | 14:00 | |
clarkb | fungi: I think it's not so much weeslack as it is slack | 14:34 |
fungi | oh? | 14:34 |
clarkb | they removed that feature and people complained and then added it back again iirc (to slack proper) | 14:35 |
fungi | the s/foo/bar/ editing? | 14:35 |
clarkb | yup | 14:35 |
fungi | interesting | 14:35 |
clarkb | Looking through my todo list I think the big one for ensuring ianw's monday goes smoothly is ensuring that zuul restarts cleanly | 14:44 |
clarkb | corvus mentioned being able to do that today and I offered to help. corvus are there other changes we want to land really quickly before doing that? | 14:44 |
fungi | i'm happy to switch focus at some point to help with that too if needed | 14:45 |
corvus | clarkb: nothing urgent, but maybe we can do a quick pass over hashtag:sos and see if there's anything else ready to land? | 14:50 |
clarkb | corvus: sure | 15:04 |
clarkb | corvus: https://review.opendev.org/c/zuul/zuul/+/800066 is the one I see | 15:05 |
clarkb | I can review that shortly | 15:05 |
corvus | clarkb: yep, i just -1d the other 2, so that's the only candidate. | 15:07 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update gitea to 1.14.4 https://review.opendev.org/c/opendev/system-config/+/800274 | 15:31 |
clarkb | I'll mark ^ wip as the set of changes is not small. But reviews would still probably be helpful particularly in double checking the template updates. | 15:32 |
clarkb | now to review that zuul change | 15:32 |
*** jpena is now known as jpena\|off | 16:11 | |
clarkb | corvus: mordred: in the original well known patchset you used opendev.ems.host. I'm looking at setting up the EMS instance now and it seems that they are all suffixed with element.io. I guess domains have simply changed since you set things up? | 16:11 |
corvus | clarkb: i'm unaware of that... but honestly, that seems surprising... i wouldn't think they would want to use element.io | 16:16 |
*** amoralej is now known as amoralej\|off | 16:18 | |
clarkb | "First you need to choose a hostname. This will be where you will log in to your Element chat webapp." Then the field says your_subdomain .element.io | 16:23 |
clarkb | I'm going with opendev | 16:23 |
clarkb | for custom homeserver domain I want opendev.org and for custom client domain we want matrix.opendev.org ? | 16:27 |
clarkb | mordred: ^ I think this is the bit you were talking about | 16:28 |
clarkb | oh actually I think we don't need custom client domain; this is what we talked about yesterday with admins being able to talk directly to it | 16:28 |
clarkb | when I punch in opendev.org the results match what mordred pushed in ps1 of the well known files change | 16:32 |
clarkb | I think they use different domains for that stuff then (it complains that the files 404 but says I can deal with that later so I will proceed) | 16:32 |
fungi | bizarre | 16:33 |
clarkb | I'm disabling guests and public registration. I believe we have already decided we never want public registration but maybe we want guests later | 16:34 |
clarkb | I'm going to take the secrets lock | 16:38 |
clarkb | Ok I've put the credentials in the usual location and logged back in again using them. If someone else can give that a go that would be great | 16:50 |
clarkb | I'm going to restore mordred's original ps now | 16:51 |
mordred | ++ | 16:52 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add matrix well-known files for opendev https://review.opendev.org/c/opendev/system-config/+/800120 | 16:52 |
clarkb | They do note there are some cors requirements for one of those files which may force us to host it with apache, but I think we can do that if this doesn't work | 16:52 |
clarkb | corvus: ^ do you want to send that one in if it looks good to you? | 16:53 |
mordred | yah. there's no rush - that wizard will happily sit open for ages | 16:53 |
clarkb | mordred: ya there is a button to tell it to recheck | 16:53 |
clarkb | we are on the trial level nicket setup with 5 users. | 16:54 |
clarkb | *nickel | 16:54 |
mordred | I'm betting that'll be more than enough | 16:55 |
corvus | i don't see any reason for us to even use the ems-hosted element service | 16:57 |
corvus | i don't think we should do that, even as admins | 16:57 |
corvus | so if it's broken by cors: great :) | 16:57 |
clarkb | I'm not entirely sure what the next steps are other than to land the well known file. I suppose to create an admin account in the server. | 16:58 |
clarkb | And then a gerritbot account and then run a gerritbot? | 16:58 |
clarkb | oh and I guess some channels | 16:59 |
clarkb | I'll probably defer to order of operations on that to others but I'm happy to try and help do the work | 16:59 |
corvus | clarkb: that sounds good | 16:59 |
clarkb | I've been reminded that I need to get on the bike soon before the outside becomes extra warm. I'll start working my way out the door now then when I get back can help with zuul restart stuff. | 17:00 |
clarkb | infra-root but definitely see if you can get to the EMS dashboard and maybe glance over the options there | 17:01 |
corvus | i may take the same break actually :) | 17:01 |
*** melwitt is now known as Guest322 | 17:31 | |
mordred | clarkb: I have logged in to the dashboard and things look good. | 17:49 |
fungi | i logged into it enough to confirm the recorded credentials work, but that's all | 17:52 |
*** melwitt_ is now known as melwitt | 17:57 | |
*** melwitt is now known as jgwentworth | 17:58 | |
*** TheJulia is now known as needssleep | 18:04 | |
opendevreview | Goutham Pacha Ravi proposed openstack/project-config master: Add channel ops to #openstack-manila https://review.opendev.org/c/openstack/project-config/+/800296 | 18:44 |
opendevreview | Merged openstack/project-config master: Add channel ops to #openstack-manila https://review.opendev.org/c/openstack/project-config/+/800296 | 19:08 |
clarkb | I'm back and can help with restarts whenever we are ready. | 19:24 |
clarkb | looks like the nodepool launchers are on a 2 day old image which i think includes the change we need on them | 19:24 |
fungi | i'm not sure if we were waiting on images from 800282 for restarting the executors, though i suppose it's immaterial since that didn't switch the default behavior anyway | 19:25 |
fungi | and yes, i restarted the launchers on the new version already | 19:26 |
fungi | it just happened to be convenient to do that since i already needed to restart one to free up those hanging arm node requests | 19:26 |
corvus | clarkb: the promote job for the latest patch was successful, so i think we can pull & restart now | 19:26 |
corvus | fungi: i think we should restart with images from 800282 -- that's the promote job i meant | 19:27 |
corvus | since it succeeded, we only need to do a pull and we'll be sure we have it | 19:27 |
fungi | right, that's what i assumed | 19:27 |
fungi | okay | 19:27 |
clarkb | we do ignore errors now though right? | 19:27 |
corvus | clarkb: i think just on tag deletes? | 19:28 |
fungi | errors for what? | 19:28 |
fungi | oh, on promote | 19:28 |
fungi | yeah, that situation | 19:28 |
fungi | it was just ignoring failure to delete tags on dockerhub | 19:28 |
clarkb | corvus: oh right just the cleanups | 19:28 |
clarkb | and ya checking docker hub images appear up to date | 19:28 |
corvus | i think/hope/intended it to still fail if it can't create the desired tag | 19:28 |
clarkb | corvus: yes, I believe that is the case | 19:29 |
corvus | (is there an english word for think/hope/intend ?) | 19:29 |
corvus | ("would be surprised if not" is about all i can come up with right now) | 19:30 |
corvus | pulling now | 19:30 |
fungi | "wager" ;) | 19:30 |
corvus | fungi: ideally a word without unintended financial side-effects :) | 19:31 |
fungi | d'oh! | 19:31 |
JayF | corvus: expect? | 19:32 |
opendevreview | Merged opendev/system-config master: Add matrix well-known files for opendev https://review.opendev.org/c/opendev/system-config/+/800120 | 19:33 |
clarkb | oh hrm ^ restarting while that promotes and runs might get weird | 19:33 |
clarkb | I guess worst case I can manually do the service pulls and updates for that | 19:33 |
corvus | JayF: yeah, that's probably the closest :) that's like 98% what i wanted to say, then throw in 2% of "fingers crossed" :) | 19:34 |
JayF | I always joke about the word "should" for stuff like that | 19:34 |
corvus | clarkb: it's waiting on the infra-prod playbook semaphore, which i guess is held by the hourly jobs | 19:34 |
JayF | because should is the weasel word that means "this is what I expect but ANYTHING IS POSSIBLE" | 19:34 |
JayF | e.g. The patch should fail in CI if it creates a bug. | 19:35 |
clarkb | corvus: yup it should get it once the infra-prod-service-nodepool job finishes. I guess if we stop now and reenqueue we'll probably be ok | 19:35 |
corvus | JayF: if we capitalize it like an RFC it takes on that specific meaning, right? :) | 19:35 |
clarkb | corvus: or we wait for the nodepool job and the gitea jobs to finish as another graceful option | 19:35 |
JayF | corvus: I've worked with enough protocol implementations to know that even in an RFC, should means "this is expected but ANYTHING IS POSSIBLE" lol | 19:36 |
corvus | clarkb: maybe let's see what happens in the next 2 minutes? | 19:36 |
clarkb | corvus: wfm | 19:37 |
corvus | it's running a "cleanup old logs" task, but with 10237 logs in that directory, i'm not sure it's succeeding | 19:38 |
corvus | anyway, on to gitea now, which takes ~16 minutes | 19:39 |
corvus | that's kind of interesting that the semaphore shifted pipelines like that, but i guess it makes sense due to priority | 19:40 |
clarkb | yup I think that is normal | 19:41 |
clarkb | where normal == what it has always done | 19:41 |
clarkb | I see the well known files on gitea01 now | 19:44 |
clarkb | it has finished through 06 I think | 19:57 |
clarkb | now 7 is done | 19:59 |
clarkb | 8 is being restarted right now | 20:00 |
corvus | playbook is done; just waiting for the job to tidy up | 20:00 |
corvus | clarkb: look good to restart now? | 20:05 |
clarkb | corvus: I think so | 20:05 |
clarkb | the zuul playbook is running and i think it will keep running | 20:05 |
corvus | running | 20:05 |
clarkb | but it doesn't start anything iirc | 20:05 |
clarkb | so we should be fine | 20:06 |
corvus | #status log restarted all of zuul on commit 657d8c6fb284261f1213b9eaf1cf5c51f47c383b | 20:06 |
opendevstatus | corvus: finished logging | 20:06 |
johnsom | Do we need to restart our jobs or will they eventually requeue? | 20:14 |
corvus | re-enqueuing | 20:14 |
corvus | johnsom: no action necessary | 20:14 |
johnsom | Ok | 20:14 |
fungi | yeah, i see them starting to show up in the status interface now | 20:14 |
clarkb | and the jobs that are running seem to be doing work | 20:18 |
clarkb | which is good because there were changes to node requests so if we're doing real work then we're probably in a happy spot? | 20:18 |
fungi | i see some have already succeeded | 20:19 |
fungi | so, yes | 20:19 |
fungi | i need to go run a quick errand since this looks probably sane, but should be back in 30-45 minutes tops | 20:19 |
clarkb | no worries | 20:20 |
corvus | re-enqueue done | 20:22 |
clarkb | I'll be around to keep an eye on it during that time. When fungi gets back I may do some local network updates though | 20:22 |
corvus | clarkb: what's the matrix status? | 20:22 |
clarkb | also apparently the 5.13 kernel is available now I should update to that | 20:23 |
clarkb | corvus: I need to log back in and hit the recheck button for the well known file stuff | 20:23 |
clarkb | I'll do that now | 20:23 |
clarkb | corvus: the server well known doc is happy now and has a green check mark. The client homeserver well known doc is not due to cors as we suspected. You indicated you thought that was fine so we'll just leave it as is for now? | 20:25 |
corvus | clarkb: i think so | 20:27 |
clarkb | Looking at the user management menu I can add a user, give it a @name:opendev.org and check a box if I want it to be admin or not. Do we want an admin account called @admin:opendev.org ? | 20:28 |
corvus | clarkb: sounds good | 20:29 |
clarkb | ok I'll add that info to the usual location | 20:29 |
clarkb | so now I think if I click on our element url I'll get a new webbrowser element client which I can use to login as admin and make a channel? | 20:34 |
corvus | clarkb: sounds right; modulo maybe that has issues due to cors? | 20:34 |
clarkb | ya I think the cors thing is if we want element to live under opendev.org | 20:35 |
clarkb | the url they give me is an element.io url so I'm hoping it will work | 20:35 |
corvus | you should be able to use any element webapp to talk to any homeserver (ie, i could use my own) | 20:36 |
corvus | basically, you're just trusting that site to serve you an application which then runs entirely in-browser | 20:36 |
corvus | so nobody else should trust me, but i do :) | 20:36 |
corvus | clarkb: if you're going to make a room, maybe just a 'test' room for now? | 20:38 |
clarkb | corvus: sure. The first thing it does is invite me to server alerts and then when I join that channel the server tells me I must agree to a tos to use the service (wasn't really expecting that) | 20:39 |
clarkb | I'm going to go ahead and accept those for our admin user :) | 20:39 |
corvus | clarkb: neat, that's all EMS stuff :) | 20:39 |
clarkb | corvus: yup. Also it auto created a #general for me. Should we just use that instead of #test? | 20:40 |
corvus | wfm | 20:40 |
corvus | we can tombstone it later if we want | 20:40 |
clarkb | I'll try and invite those of us I know that are already on matrix | 20:41 |
clarkb | then we can change the channel settings to be more public as we get more comfortable with this | 20:41 |
clarkb | mordred: corvus: you have been invited. I couldn't find fungi's matrix name | 20:45 |
fungi | clarkb: it's "fungicide" on element because "fungi" was taken by someone else who obviously shares my impeccable tastes and unrivaled sense of style | 21:02 |
clarkb | fungi: cool one sec | 21:02 |
fungi | i'll get around to setting up a homeserver at some point to rectify that | 21:02 |
clarkb | fungi: kinrui is the display name? | 21:03 |
fungi | yeah, that was a test display name (fungi in romaji) so i wouldn't have to ghost my irc connection | 21:03 |
corvus | your irc nick can differ from your matrix display name | 21:04 |
fungi | i think i just set them to the same at the time | 21:04 |
fungi | since i was fiddling with clearing out the [m] | 21:05 |
clarkb | fungi: you should have invites to the two channels we're using now. Though really we're only going to use the one #test | 21:05 |
fungi | thanks, i haven't set up a persistent client yet anyway, will try to knock that out over the weekend | 21:06 |
clarkb | One thing to be careful of in element is adding aliases | 21:07 |
clarkb | There are two areas to do that. One allows you to create the alias on any server you have permissions to do that on I guess and the other will add it specifically to your homeserver | 21:08 |
clarkb | anyway I set the alias when I was admin on the opendev homeserver so that all worked fine | 21:08 |
clarkb | but when I looked at the settings as myself with admin perms noticed the subtle shift in behavior there | 21:08 |
corvus | clarkb: yep, i think we don't generally want people to add room aliases to official opendev rooms | 21:10 |
clarkb | ++ | 21:11 |
corvus | but that is how you can achieve some fault tolerance, and it's how you move rooms to other servers | 21:11 |
clarkb | it might also be good to have someone else login as admin via element and make sure that is working for them and my browser isn't now forever stuck being the browser to do that. The element url is in the EMS dashboard under server info iirc | 21:12 |
clarkb | the credentials to log into the ems dashboard and then to the admin account are in the typical location | 21:12 |
clarkb | (can you tell I don't want to end up as the spof here? :) ) | 21:13 |
clarkb | ok I'm going to do those network upgrades now. I'll be back soon | 21:13 |
corvus | i'm going to make an account for an eavesdrop bot | 21:32 |
corvus | (which should exercise the second-user testing clarkb asked for too :) | 21:33 |
clarkb | network and local kernel updates complete. I'm still here so they went ok I guess :) | 21:34 |
corvus | clarkb, fungi: i notice our hostname is no longer 'eavesdrop.*' is there a rebranding effort i should be aware of? | 21:35 |
corvus | i'm inclined to make a user @eavesdrop:opendev.org but don't want to undermine that. | 21:36 |
corvus | naming the user @meetings:opendev.org doesn't seem quite right though; it's not a meeting bot | 21:36 |
corvus | could go with 'logs' i guess? | 21:37 |
clarkb | corvus: one sec for me to double check something | 21:37 |
clarkb | eavesdrop01.opendev.org is still a thing where we run the bots (it's a new server replacing the old one though not sure that is complete) | 21:38 |
corvus | but the cname is 'meetings.' now, yeah? | 21:38 |
clarkb | as part of that replacement the hosting was centralized under the meetings and logs website at meetings.opendev.org. | 21:38 |
clarkb | ya the idea behind that was humans interacting with the system think meetings rather than eavesdrop to find meeting logs | 21:39 |
corvus | as a project that uses logs w/o meetings, that doesn't help, but i guess it's too late to provide feedback on that change :) | 21:39 |
clarkb | ya ianw did that as part of the cleanup from the oftc move | 21:39 |
clarkb | I don't think the eavesdrop name will go away (though it is a redirect) | 21:40 |
corvus | well, there never was an eavesdrop.opendev.org afaik | 21:40 |
clarkb | oh hrm ya there may not have been | 21:40 |
clarkb | There isn't now | 21:40 |
corvus | anyway, i don't expect this to get any less weird for the zuul project, like i said, that ship seems to have sailed | 21:41 |
corvus | just wondering what the name of the bot should be :) | 21:41 |
clarkb | eavesdrop is sufficiently generic and also descriptive that I still like that. Though I wonder if anyone had concerns with that name being creepy or something | 21:42 |
clarkb | I'm happy with 'eavesdrop' or 'logs' and iirc we easily disable accounts in EMS if we want to | 21:42 |
clarkb | that means if we go with one and don't like it switching to another should be easy | 21:42 |
clarkb | (without consuming user slots) | 21:42 |
corvus | okay, i'll just go with logs | 21:43 |
mordred | corvus: are we thinking we'll have a log bot and a meet bot as separate bots? | 21:54 |
corvus | mordred: well, i'm writing a log bot as we speak, but i don't plan on writing a meetbot since it's not required for zuul | 21:55 |
corvus | if someone wants to extend what i'm writing, i have no objections | 21:55 |
mordred | nod | 21:55 |
fungi | yeah, i didn't question the name of the new service, though it was hurried in so we could stop running a fork of supybot, which meant limnoria not deployed via puppet, and solving the irc-meetings publication which went to static.o.o so got a new site name | 21:57 |
corvus | i have it on good authority naming is hard | 21:57 |
fungi | i expect we could bring back the eavesdrop name for the location of channel logs, and keep meeting logs on the meetings site, would just need to figure out the details | 21:57 |
mordred | it's log. it's log. it's big, it's heavy, it's wood. | 21:58 |
fungi | basically we had static content (irc-meetings publication which we'd prefer to serve out of afs as a normal publication job) and dynamic content written by the bot to the server | 21:58 |
fungi | it's better than bad, it's good | 21:58 |
fungi | since the meeting logs and channel logs are in separate file trees, it should be fairly easy to have those be different sites | 21:59 |
fungi | having the eavesdrop site served from static.o.o while the bots ran on an eavesdrop server which did not host that site would have been kinda weird to manage, so i understand ianw's need to come up with a separate name for the former | 22:01 |
fungi | but to the original question, if there was any conscious rebranding i think it was merely to have something on the opendev.org domain since we were rebuilding it anyway, and make the openstack.org names legacy cnames/redirects | 22:02 |
corvus | oy, the bots need to agree to the T&C | 22:03 |
corvus | but neat: the error for that includes the url to agree, so we can do so without having to log in (i have just done so for logs) | 22:04 |
fungi | anybody happen to know what might have modified /etc/accessbot/channels.yaml on eavesdrop01 around 21:43z? the last change to that file was from 800296 which merged and deployed hours prior | 22:33 |
corvus | i have not logged into that server recently | 22:34 |
fungi | i'm trying to figure out why the accessbot run log acts like that change hadn't been included | 22:34 |
clarkb | wasn't me. Are you saying the file was modified at that time but did not include the modifications you expected? | 22:34 |
fungi | wondering if we have some sort of order of operations problem, though /var/log/ansible/run-accessbot.yaml.log indicates it copied the file into place before it ran accessbot | 22:34 |
fungi | the file appears to have been modified a couple hours after the deploy job completed: https://zuul.opendev.org/t/openstack/build/b35fd7771bd4495d84eb5332109f6e2e | 22:35 |
fungi | just judging from the last modified timestamp on it | 22:35 |
fungi | so i can't tell if it was updated more recently than that build to include the change which triggered that build and the deploy used a stale copy of the file | 22:36 |
fungi | basically i don't know what the file on disk there actually looked like when infra-prod-run-accessbot ran, because it looks like something has touched that file since then | 22:37 |
fungi | i'm half expecting that if i just manually run accessbot now, the change from 800296 will get applied (it seems to be in the config currently present on disk) | 22:38 |
clarkb | the file comes from project config right? I suppose it is possible there is a race somewhere around updating project-config in the deploy job and running the deploy job | 22:39 |
fungi | but without trying that, i can't say whether there's a bug in accessbot which caused it to ignore what was added with 800296 or a bug in our deployment logic that we're not deploying the version of that file which is triggering the deployment | 22:39 |
fungi | similar changes have taken effect before now, so yes i've got a strong suspicion we're "eventually consistent" by applying changes on the run after the run triggered by their merging | 22:40 |
fungi | and related to that, i'm wondering what else touched the config file since it seems to have been something other than the infra-prod-run-accessbot job | 22:41 |
fungi | maybe we have multiple deploy jobs writing that file and they interfere with one another? | 22:41 |
fungi | i only see playbooks/roles/accessbot/tasks/main.yaml in system-config writing it though | 22:43 |
fungi | so maybe we have more than one job running that role | 22:44 |
clarkb | I think we have an hourly job or maybe daily? | 22:49 |
fungi | yep, it's infra-prod-service-eavesdrop doing it | 22:49 |
fungi | just found it in the log | 22:49 |
fungi | though do those jobs share a semaphore? | 22:50 |
fungi | i guess not, because they're triggered from different projects in different pipelines | 22:50 |
fungi | so i suppose it's possible for them to race one another | 22:50 |
fungi | both infra-prod-run-accessbot and infra-prod-service-eavesdrop install /etc/accessbot/channels.yaml on the server, but only infra-prod-run-accessbot also runs accessbot | 22:52 |
clarkb | fungi: we can probably make them share a semaphore? | 22:53 |
fungi | so if infra-prod-service-eavesdrop started shortly before the change merged but wrote what had been the master branch copy of the config between when infra-prod-run-accessbot wrote it and when it actually ran eavesdrop (there's an image pull and other stuff which happens in between those steps so could be a lengthy window) we'd end up applying a stale config | 22:54 |
fungi | all of this also happened right around the time of the zuul restart and reenqueue | 22:56 |
fungi | well, maybe not quite. the task to write that file in the run job logged at 19:26:10 and the zuul restart was 40 minutes later | 22:57 |
fungi | there was an hourly run of infra-prod-service-eavesdrop which logs writing the file at 18:45:36 and again at 21:45:07 but we don't have any logs for two more runs which would have happened between those | 23:00 |
fungi | so those were probably delayed by the zuul restart | 23:00 |
clarkb | the 20:00ish run would've been caught by the restart | 23:00 |
clarkb | ya | 23:00 |
fungi | anyway, i don't have a smoking gun showing they were both writing the file around the same time | 23:01 |
fungi | trying a manual run of accessbot now to see if it applies the current configuration | 23:04 |
fungi | yep, it's applying it correctly | 23:07 |
fungi | the #openstack-manila access list just jumped from 11 to 18 entries | 23:08 |
fungi | the log from the previous run showed it only checking the entries from our global admins/ops for that channel, so i'm fairly certain it ran with an old copy of the channels.yaml | 23:09 |
fungi | maybe this isn't a race... | 23:10 |
fungi | one thing i don't see the infra-prod-run-accessbot job doing is pushing an updated project-config onto the eavesdrop server | 23:10 |
fungi | which i think would explain this behavior? | 23:11 |
fungi | yeah, there's a sync-project-config task which happens in infra-prod-service-eavesdrop but not in infra-prod-run-accessbot | 23:13 |
clarkb | ya that could do it too | 23:14 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Sync project-config before deploying accessbot https://review.opendev.org/c/opendev/system-config/+/800314 | 23:18 |
opendevreview | Clark Boylan proposed opendev/system-config master: Preserve zuul executor SIGTERM behavior https://review.opendev.org/c/opendev/system-config/+/800315 | 23:23 |
fungi | that would have been much easier to spot if infra-prod-service-eavesdrop hadn't run, since i would have seen the stale config and stale copy of project-config on the server | 23:23 |
clarkb | Now we won't forget to do that | 23:23 |
fungi | thanks! | 23:24 |
fungi | that should be fine to merge sooner, since the executors currently ignore that envvar anyway | 23:25 |
clarkb | that's true. I put the depends-on in there anyway though | 23:27 |
fungi | that at least ensures we don't wind up needing to amend it | 23:30 |
fungi | so fine by me | 23:31 |
corvus | mordred, fungi: i'm working on a container image for this eavesdrop bot; i need libolm3 but i only see libolm2 in our python-builder (debian buster i think?) | 23:55 |
corvus | should i add some backport thing, or clone+install from upstream repo? | 23:56 |
corvus | looks like it may be in buster-backports, so maybe i just add that to sources.list? | 23:58 |
fungi | corvus: yep, i just confirmed buster-backports has it | 23:59 |
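
A note on the s/this/that/ editing fungi describes at 00:11: the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how weeslack or Slack implement it. The function name `apply_edit` and the plain list-of-strings history are hypothetical, the pattern is treated as a Python regular expression, and slashes inside the pattern or replacement are not supported.

```python
import re

# Hypothetical command grammar: optional count, then s/pattern/replacement/
# (the trailing slash is optional). Not taken from any real client.
EDIT_COMMAND = re.compile(r"(\d*)s/([^/]+)/([^/]*)/?")


def apply_edit(history, command):
    """Return a copy of `history` with one message edited.

    `history` is a list of messages you sent, oldest first.
    `command` is 's/this/that/' (edit the most recent message) or
    'Ns/this/that/' (edit the Nth most recent message).
    """
    match = EDIT_COMMAND.fullmatch(command)
    if match is None:
        raise ValueError(f"not an edit command: {command!r}")

    # An empty count means "the most recent message".
    nth = int(match.group(1) or 1)
    if not 1 <= nth <= len(history):
        raise IndexError(f"no message {nth} back in history")

    pattern, replacement = match.group(2), match.group(3)
    index = len(history) - nth

    # Only the first occurrence is replaced, as with sed without the g flag.
    edited = re.sub(pattern, replacement, history[index], count=1)
    return history[:index] + [edited] + history[index + 1:]


if __name__ == "__main__":
    sent = ["alpha", "i se an error", "beta", "gamma"]
    print(apply_edit(sent, "3s/se/see/"))
```

Here `3s/se/see/` counts back three messages from the end of `sent`, so it rewrites "i se an error" and leaves the other three untouched.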
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
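
For readers following the well-known discussion (16:11 through 20:25), here is a rough Python sketch of the two Matrix discovery documents and why only the client one ran into CORS. The document keys (`m.server`, `m.homeserver`, `base_url`) and the `Access-Control-Allow-Origin` header come from the Matrix specification. The function name is hypothetical, and the hostname `opendev.ems.host` is only the value mentioned for the first patchset at 16:11; whether port 443 is right for it is an assumption. This is not the content of change 800120.

```python
import json


def well_known_documents(homeserver_host, port=443):
    """Build the two Matrix .well-known documents for a delegated homeserver.

    Hypothetical helper for illustration. `homeserver_host` is where the
    homeserver actually runs; the files themselves are served from the
    domain that appears in user IDs (opendev.org in the log above).
    """
    return {
        # Fetched server-to-server by other homeservers, so no CORS needed.
        "/.well-known/matrix/server": {
            "headers": {"Content-Type": "application/json"},
            "body": {"m.server": f"{homeserver_host}:{port}"},
        },
        # Fetched by browser-based clients such as Element, which is why
        # the spec asks for a permissive CORS header on this one.
        "/.well-known/matrix/client": {
            "headers": {
                "Content-Type": "application/json",
                "Access-Control-Allow-Origin": "*",
            },
            "body": {"m.homeserver": {"base_url": f"https://{homeserver_host}"}},
        },
    }


if __name__ == "__main__":
    for path, doc in well_known_documents("opendev.ems.host").items():
        print(path, doc["headers"], json.dumps(doc["body"]))
```

This matches what clarkb reports at 20:25: the server document passes the check as soon as the file exists, while the client document also needs the header, so a static host that cannot set it fails the check.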