fungi | neat! | 00:00 |
mordred | clarkb: that's not a terrible approach | 00:00 |
fungi | yeah, sounds about right | 00:00 |
clarkb | horizon really hides the clouds.yaml file | 00:05 |
fungi | maybe horizon would rather you used horizon ;) | 00:07 |
mordred | clarkb: it does eventually provide one though right? | 00:08 |
mordred | (that took a while to get added) | 00:08 |
clarkb | mordred: ya you just have to dig through the ui. if you do the account dropdown it only shows openrc. for clouds.yaml you have to go to Project -> Api Access -> Download OpenRC file -> change it to clouds.yaml | 00:09 |
*** corvus has quit IRC | 00:09 | |
* mordred should make a patch | 00:10 | |
*** corvus has joined #opendev | 00:10 | |
fungi | indeed, part of the effort to raise up the universal sdk should at least mean the clouds.yaml is listed by default and you need to switch it to openrc if that's what you really want | 00:11 |
mordred | yeah | 00:11 |
mordred | also - I keep meaning to make an "import clouds.yaml" command for osc | 00:11 |
mordred | so that you could download a clouds.yaml from horizon, run "osc config import downloaded-file.yaml" and it would add that entry into your clouds.yaml - and potentially prompt you for a password (since the password will be missing from the downloaded file) | 00:12 |
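(A sketch of the idea: the file horizon hands out would look roughly like the fragment below, with no password field, which is what the hypothetical "osc config import" would have to prompt for. Cloud name, URL and user/project values are placeholders.)

```yaml
clouds:
  downloaded-cloud:
    auth:
      auth_url: https://cloud.example.com:5000/v3
      username: someuser
      project_name: someproject
      user_domain_name: Default
      project_domain_name: Default
      # password deliberately omitted by horizon's export
    region_name: RegionOne
    interface: public
    identity_api_version: 3
```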
fungi | that'd be neat | 00:12 |
clarkb | ya looking at this a bit more I think the "manual tasks" so far are the actual button clicking to make a cloud, then adding the other roots' ssh keys to the hosts, then creating the project/users for nodepool things. | 00:12 |
clarkb | I'm sure other stuff will come up too like quotas | 00:12 |
clarkb | but so far that's my list of what is required | 00:13 |
fungi | maybe horizon should even start to suggest that command (once it exists) | 00:13 |
mordred | ++ | 00:13 |
fungi | clarkb: well, (re)building the mirror has rather a few manual steps later | 00:13 |
clarkb | fungi: ya I mean more from the perspective of "how is this different than the other cloud we consume" | 00:13 |
fungi | oh, yep | 00:14 |
*** corvus has quit IRC | 00:15 | |
*** corvus has joined #opendev | 00:17 | |
clarkb | https://docs.flexmetal.net/day-1-getting-started-with-openstack/ those are their bootstrapping docs too fwiw which can help others who may want to take a look | 00:22 |
clarkb | essentially there is a cloud provisioning dashboard on their website and from that you get a cloud with horizon and apis and such | 00:22 |
clarkb | next up I've discovered I need to learn about roles | 00:24 |
clarkb | there are also groups? | 00:25 |
clarkb | https://docs.openstack.org/keystone/latest/admin/service-api-protection.html | 00:26 |
fungi | fancy modern keystone, yep | 00:26 |
*** tosky has quit IRC | 00:27 | |
clarkb | I think what we want is two users with role member, then a project for each, to most properly mimic our other setups. I don't think we need any groups | 00:27 |
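(For illustration, that plan maps to roughly these openstack CLI calls; the project and user names are made up, and the same three commands would be repeated for the second project/user pair.)

```sh
openstack project create nodepool-a
openstack user create --project nodepool-a --password-prompt nodepool-a-user
openstack role add --project nodepool-a --user nodepool-a-user member
```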
fungi | sounds right to me | 00:27 |
clarkb | ah ok I think I've hit my first major snag. I don't see https for horizon or api access | 00:29 |
clarkb | the cloud provisioner api is all ssl'd, but the layer under that for the cloud itself isn't | 00:30 |
clarkb | I can do a port forward for bootstrapping things, but I think we really want nodepool to talk https, so why don't I ask if they have a way to do that already | 00:30 |
fungi | yep, better than hacking something together ourselves when they already have a clean solution | 00:31 |
fungi | (hopefully they do) | 00:32 |
clarkb | I'm looking around on the host too but not finding a 443 listener | 00:32 |
corvus | clarkb: agree re two users with member role | 00:47 |
clarkb | sounds like there isn't a magic ssl setup we can lean on. But I think we may be able to set up a proxy on the hosts ourselves? I've provided this as feedback that something even self signed is important. I also let them know that clouds.yaml can verify self signed pretty easily so for api access at least it's not terrible | 00:48 |
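(For reference, pointing clouds.yaml at a self-signed certificate is a one-line addition; the cloud name and path below are placeholders.)

```yaml
clouds:
  newcloud:
    auth:
      auth_url: https://203.0.113.10:5000/v3
    cacert: /etc/openstack/newcloud-ca.pem   # trust the self-signed/private CA
    # verify: false                          # last resort: skip TLS verification entirely
```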
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: add a few more global excludes https://review.opendev.org/c/opendev/system-config/+/774179 | 00:49 |
corvus | clarkb: seems like theoretically a service offering could use LE. or if we use a proxy, maybe we could use LE? | 00:49 |
clarkb | corvus: yes, and yes. The first thing is sorting out how the external access is all routed through kolla and stuff (there is a VIP of some sort) and I figured self signed would make that easy | 00:50 |
clarkb | but ya we should be able to have them generate LE certs | 00:50 |
*** dviroel has quit IRC | 00:50 | |
fungi | can it be exposed through whatever's serving horizon? | 00:52 |
clarkb | no I think we'll be putting it in front of horizon | 00:53 |
clarkb | horizon is running out of containers as part of their deployment and I worry that if you start changing that too much you'll trip up the automation for the cloud stuff | 00:54 |
clarkb | I should go help figure out dinner now. I'll look at the network setup and try to figure out how packets for the vip end up on the controllers and from that we can make plans for terminating ssl | 00:55 |
clarkb | er in the morning I mean | 00:55 |
fungi | yes, definitely go dinner | 00:56 |
fungi | and i'm a gonna drink | 00:56 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ask: stream db backup https://review.opendev.org/c/opendev/system-config/+/774181 | 01:21 |
*** akrpan-pure has quit IRC | 01:23 | |
*** DSpider has quit IRC | 01:54 | |
*** hamalq has quit IRC | 02:14 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ask: stream db backup https://review.opendev.org/c/opendev/system-config/+/774181 | 02:25 |
*** lbragstad has quit IRC | 02:35 | |
fungi | zuul's getting close enough to idle i'm going to restart the zuul-executor containers on ze08 and ze12 to restore their missing console streamers | 02:43 |
ianw | reverting mysql-client-5.7_5.7.32-0ubuntu0.16.04.1_amd64.deb makes dumps work again | 02:44 |
fungi | #status log restarted the zuul-executor container on ze08 and ze12 to restore their console streamers, previously lost to recent oom events | 02:47 |
openstackstatus | fungi: finished logging | 02:47 |
openstackgerrit | Yanos Angelopoulos proposed openstack/diskimage-builder master: Prevent pip2 to try to update to version > 21 https://review.opendev.org/c/openstack/diskimage-builder/+/774186 | 02:56 |
openstackgerrit | Merged opendev/system-config master: borg-backup: add a few more global excludes https://review.opendev.org/c/opendev/system-config/+/774179 | 03:01 |
openstackgerrit | Merged opendev/system-config master: ask: stream db backup https://review.opendev.org/c/opendev/system-config/+/774181 | 03:02 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: translate: backup zanata db directly to borg https://review.opendev.org/c/opendev/system-config/+/774189 | 03:09 |
*** lbragstad has joined #opendev | 03:15 | |
ianw | after the askdb stuff there merges, i will purge it and re-create it, which should give us enough room to start backing up the wiki server | 03:26 |
ianw | the end result being we can retire bup | 03:26 |
*** brinzhang_ has quit IRC | 03:26 | |
*** hemanth_n has joined #opendev | 03:35 | |
*** brinzhang_ has joined #opendev | 03:35 | |
openstackgerrit | Merged opendev/system-config master: translate: backup zanata db directly to borg https://review.opendev.org/c/opendev/system-config/+/774189 | 04:10 |
*** zimmerry has quit IRC | 04:55 | |
*** ysandeep|away is now known as ysandeep | 04:55 | |
*** ysandeep is now known as ysandeep|ruck | 04:55 | |
*** zimmerry has joined #opendev | 04:56 | |
*** ykarel has joined #opendev | 05:02 | |
*** DSpider has joined #opendev | 05:28 | |
*** ykarel has quit IRC | 05:50 | |
*** ykarel has joined #opendev | 05:51 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ask: fix backup typo and ignore live postgresql https://review.opendev.org/c/opendev/system-config/+/774197 | 06:41 |
*** eolivare has joined #opendev | 06:47 | |
*** slaweq has joined #opendev | 07:02 | |
*** marios has joined #opendev | 07:02 | |
openstackgerrit | Merged opendev/system-config master: ask: fix backup typo and ignore live postgresql https://review.opendev.org/c/opendev/system-config/+/774197 | 07:14 |
*** rpittau|afk is now known as rpittau | 07:57 | |
*** ralonsoh has joined #opendev | 08:02 | |
*** sboyron has joined #opendev | 08:06 | |
*** ysandeep|ruck is now known as ysandeep|lunch | 08:09 | |
*** andrewbonney has joined #opendev | 08:18 | |
*** hashar has joined #opendev | 08:24 | |
*** ysandeep|lunch is now known as ysandeep | 08:33 | |
*** ykarel is now known as ykarel|away | 08:39 | |
*** tosky has joined #opendev | 08:41 | |
*** ykarel|away has quit IRC | 08:48 | |
*** fressi has joined #opendev | 08:52 | |
*** jpena|off is now known as jpena | 08:57 | |
*** ysandeep is now known as ysandeep|rover | 09:07 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 09:19 |
*** JayF has quit IRC | 09:32 | |
*** zimmerry has quit IRC | 09:32 | |
*** JayF has joined #opendev | 09:36 | |
openstackgerrit | Luigi Toscano proposed opendev/irc-meetings master: Fix the ID of the OpenStackSDK meetings https://review.opendev.org/c/opendev/irc-meetings/+/774212 | 09:43 |
*** zbr is now known as zbr|pto | 09:54 | |
zbr|pto | fungi: I did my bit on https://review.opendev.org/q/topic:%22bindep-2.9%22+(status:open%20OR%20status:merged) --- now we need ok from others, maybe ianw clarkb or mordred | 09:55 |
*** dtantsur|afk is now known as dtantsur | 10:01 | |
*** brinzhang_ has quit IRC | 10:08 | |
*** Tengu has quit IRC | 10:30 | |
*** hemanth_n has quit IRC | 10:52 | |
*** dviroel has joined #opendev | 10:57 | |
*** Tengu has joined #opendev | 11:08 | |
*** mciecierski has joined #opendev | 11:24 | |
*** ysandeep|rover is now known as ysandeep|afk | 11:35 | |
*** zimmerry has joined #opendev | 11:43 | |
*** dmellado has quit IRC | 12:21 | |
*** dmellado has joined #opendev | 12:23 | |
*** ysandeep|afk is now known as ysandeep|rover | 12:32 | |
*** jpena is now known as jpena|lunch | 12:32 | |
*** artom has joined #opendev | 12:37 | |
*** eolivare_ has joined #opendev | 12:56 | |
*** lbragstad_ has joined #opendev | 12:56 | |
*** eolivare has quit IRC | 12:56 | |
*** lbragstad has quit IRC | 12:56 | |
openstackgerrit | Merged opendev/irc-meetings master: Fix the ID of the OpenStackSDK meetings https://review.opendev.org/c/opendev/irc-meetings/+/774212 | 13:18 |
*** jpena|lunch is now known as jpena | 13:31 | |
*** tkajinam has quit IRC | 13:50 | |
openstackgerrit | Daniel Blixt proposed zuul/zuul-jobs master: Allow mirror push to delete current branch https://review.opendev.org/c/zuul/zuul-jobs/+/764152 | 13:51 |
*** jpena is now known as jpena|dojo | 13:51 | |
openstackgerrit | Daniel Blixt proposed zuul/zuul-jobs master: Allow mirror push to delete current branch https://review.opendev.org/c/zuul/zuul-jobs/+/764152 | 13:53 |
*** zoharm has joined #opendev | 13:54 | |
openstackgerrit | Dinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 13:56 |
*** chandankumar is now known as raukadah | 13:59 | |
*** whoami-rajat__ has joined #opendev | 14:01 | |
mordred | fungi: was just reviewing the bindep stack (all lgtm) - the description-file -> long-description change made me think that should be a pbr patch | 14:30 |
mordred | like - pbr should translate use of description-file to the two long description fields, no? | 14:31 |
mordred | (obviously if somoene gives the long description fields, that's cool) | 14:31 |
mordred | I mean - actually the more that I think of it, in keeping with pbr's philosophy - I'm not sure why pbr can't just find README, README.rst or README.md and dtrt without having someone have to provide that info at all | 14:32 |
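(Roughly the metadata change being discussed, in setup.cfg terms; the pbr-specific key on top, the plain setuptools declarative equivalents below. The README filename is just an example.)

```ini
[metadata]
# pbr style:
description-file = README.rst

# setuptools style equivalent:
long_description = file: README.rst
long_description_content_type = text/x-rst
```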
*** artom has quit IRC | 14:48 | |
openstackgerrit | Merged opendev/bindep master: Move all jobs in-repo https://review.opendev.org/c/opendev/bindep/+/773797 | 14:50 |
openstackgerrit | Merged opendev/bindep master: Build docs for OpenDev https://review.opendev.org/c/opendev/bindep/+/773796 | 14:50 |
openstackgerrit | Merged opendev/bindep master: Remove release note about rpm path references https://review.opendev.org/c/opendev/bindep/+/774031 | 14:50 |
*** mciecierski has quit IRC | 14:54 | |
fungi | mordred: yeah, it could be argued that pbr should also simply step out of the way if there are standard setup.cfg fields setuptools already understands | 14:54 |
fungi | but with setuptools use on the decline i don't know how much i want to wrestle it | 14:55 |
fungi | anyway, this is basically the same change i applied to zuul-client | 14:55 |
fungi | and am expecting to do similar for, e.g., git-review | 14:55 |
frickler | mordred: fungi: the bindep docs promote job has failed, but I'm not sure what the error is https://74e9c0429c6a046612ec-9df07301c72e495bf406244d2eaedc4c.ssl.cf1.rackcdn.com/773797/4/promote/opendev-promote-docs/730edad/job-output.txt | 14:59 |
mordred | fungi: yah - I could see the argument of having it step out of the way - but on the other hand I've always seen part of its job to be taking care of fiddly stuff for you so you don't have to | 15:00 |
*** artom has joined #opendev | 15:00 | |
mordred | and while those long description fields are neat - they aren't really a pleasant interface :) | 15:00 |
frickler | ah, maybe it was superseded, it passed on the next patch | 15:00 |
*** ysandeep|rover is now known as ysandeep|away | 15:03 | |
*** lbragstad_ is now known as lbragstad | 15:03 | |
fungi | frickler: oh, thanks for spotting, i'll double-check that but yeah there was a patch to move jobs in-repo and another to add the opendev docs publication jobs, so maybe i should have squashed something | 15:12 |
fungi | i half expect to find fallout i need to clean up from switching zuul tenants anyway, so need to check over a few things closely | 15:14 |
*** rpittau is now known as rpittau|afk | 15:17 | |
openstackgerrit | Merged opendev/bindep master: Overhaul Python package metadata https://review.opendev.org/c/opendev/bindep/+/774106 | 15:43 |
*** redrobot3 has joined #opendev | 15:45 | |
*** redrobot has quit IRC | 15:49 | |
*** redrobot3 is now known as redrobot | 15:49 | |
openstackgerrit | Merged opendev/bindep master: Update contributor doc and readme https://review.opendev.org/c/opendev/bindep/+/774107 | 15:55 |
*** tosky has quit IRC | 16:29 | |
*** fressi_ has joined #opendev | 16:36 | |
*** fressi has quit IRC | 16:36 | |
*** fressi_ is now known as fressi | 16:37 | |
fungi | clarkb: ianw: almost glacial, but the stable proposed updates request to approve a patched openafs into debian/buster was filed today: https://bugs.debian.org/982002 | 16:49 |
openstack | Debian bug 982002 in release.debian.org "buster-pu: package openafs/1.8.2-1" [Normal,Open] | 16:49 |
*** marios is now known as marios|out | 16:49 | |
clarkb | fungi: better late than never. I guess it was only 1.6 that started working again in february | 16:49 |
fungi | though i don't think we consume tpu in our nodes, so might still need to wait for a stable point release of buster | 16:50 |
clarkb | our ppa should work until then | 16:50 |
fungi | they weren't able to convince the security team that it qualified for inclusion in debian-security | 16:51 |
fungi | and non-security-related package updates into stable take waaaaay longer | 16:51 |
clarkb | I suppose that it isn't a security issue if it just fails to work :) | 16:51 |
fungi | yup | 16:51 |
*** marios|out has quit IRC | 16:56 | |
*** zoharm has quit IRC | 17:00 | |
fungi | frickler: taking a closer look at that initial opendev-promote-docs failure i think the problem is it ran even though we didn't do a docs build in the gate, so it failed to download the nonexistent artifact from that | 17:01 |
fungi | technically not a download failure, but it tried to reference a list index in a null list | 17:02 |
fungi | as a result of having no artifact to retrieve | 17:02 |
fungi | we could probably make the "download-artifact: Parse build response" task a little more robust there, but it's a corner case i'm tempted to just ignore | 17:03 |
*** klonn has joined #opendev | 17:05 | |
corvus | clarkb: your gerrit status is turkey time | 17:08 |
fungi | it's always turkey time! | 17:09 |
corvus | agreed | 17:09 |
fungi | kinda like how everyday is hallowe'en | 17:09 |
corvus | and the eternal september? | 17:09 |
fungi | we don't speak of that | 17:10 |
clarkb | I never updated it after november | 17:10 |
*** ralonsoh has quit IRC | 17:14 | |
*** diablo_rojo has joined #opendev | 17:20 | |
fungi | i'm sort of surprised moving bindep out of the openstack tenant hasn't created any new zuul config errors, but no matches for bindep anywhere in the mess at https://zuul.opendev.org/t/openstack/config-errors | 17:24 |
fungi | and aside from the aforementioned docs promote first-time race and a mysterious retry on a single tox-pep8 run, all the bindep jobs are succeeding consistently | 17:25 |
fungi | and documentation is now correctly redirected to https://docs.opendev.org/opendev/bindep and we have release notes getting published for it, so i think it's time to tag | 17:26 |
fungi | infra-root: any objections to tagging a bindep 2.9.0 release shortly? | 17:27 |
fungi | actually i may tag 2.9.0.0rc1 just to make sure the release jobs are functional after all the job switcheroo | 17:28 |
clarkb | tagging an rc first sounds like a good idea | 17:29 |
clarkb | no objections from me | 17:29 |
*** eolivare_ has quit IRC | 17:30 | |
fungi | bombs away | 17:31 |
fungi | didn't trigger anything, checking the pipeline defs | 17:36 |
fungi | yeah, our release pipeline doesn't match on prerelease strings, working on a patch | 17:38 |
openstackgerrit | Jeremy Stanley proposed opendev/project-config master: Recognize PEP-440 prereleases in release pipeline https://review.opendev.org/c/opendev/project-config/+/774291 | 17:42 |
fungi | infra-root: ^ i expect my explanation there makes sense for why we shouldn't need a separate prerelease pipeline, but please do let me know if it seems like a bad idea | 17:43 |
fungi | (note this is specifically for the opendev tenant in zuul) | 17:43 |
clarkb | fungi: do you remember why we split them before? | 17:44 |
fungi | clarkb: in the beforetime, in the longlongago, pip didn't know to avoid installing prereleases automatically, so we only published wheels and not tarballs, hence ran separate jobs between the two pipelines | 17:45 |
fungi | i think we may have in some cases also avoided running specific jobs to do things like update documentation if it was a prerelease | 17:46 |
clarkb | aha | 17:46 |
*** klonn has quit IRC | 17:47 | |
fungi | one other thing i'm not sure how to tackle cleanly, looks like we only publish docs when new changes merge, so they'll have references to the old release until the first change merges after a release is tagged. an obvious workaround is to merge something trivial soon after releasing | 17:48 |
*** jpena|dojo is now known as jpena|off | 17:48 | |
fungi | back when we did docs builds and uploads in post, it was fairly easy to make those jobs also be able to run similarly in release | 17:49 |
fungi | but with the gate+promote it seems like that's not as simple to apply to something you trigger after tagging a release | 17:50 |
fungi | probably not urgent to solve, just means we have an incentive to review/merge stuff soon after any release | 17:50 |
clarkb | gerrit account cleanup update: tbachman has two accounts, one of which is the actively used account and that one had a "preferred email address has no external id" issue. The preferred email address that was set was in use by the other account (but the other account wasn't used for reviews or code pushing) | 17:53 |
clarkb | talking to tbachman we decided that setting the preferred email address on the active account to match the email address associated with the openids on that account was the easiest option. Trying to add external ids for the preferred email address would not work because that would conflict with the other account | 17:54 |
clarkb | tbachman made this change through the web ui on review-test, then we tested that log out and log in shows the account id did not change. We also checked that reviews and code pushes could still be made | 17:54 |
clarkb | since that all looked good tbachman went ahead and made the change on prod and that is one less account with consistency issues | 17:54 |
clarkb | My next step is to disable the other account to reduce confusion when people try to add reviewers | 17:55 |
clarkb | I have not done that yet, but am proceeding with it now | 17:55 |
clarkb | posting all this to channel as I think there may be some other accounts where we just set the preferred email to match the openid (though maybe without user intervention) | 17:55 |
fungi | thanks for the walkthrough! i guess after the old account is disabled you're going to rerun the consistency check and see if those accounts no longer appear as errors? | 17:56 |
clarkb | fungi: "yes". I was hoping to get through a few more accounts before rerunning the consistency check, but if we think that is prudent to do first I can go ahead and do that | 17:57 |
clarkb | fungi: review-test:~clarkb/gerrit-consistency-notes/further-preferred-email-cleanups <- for my notes on cleaning up another 20 or so. They are arranged in roughly most confident to least confident in plan from beginning to end of that file if you want to take a look | 17:57 |
clarkb | sort of related: I was just looking at gerrit set-account to get the command right for --inactive. It looks like the ssh api for accounts can remove and add and set preferred emails. Maybe we can use this as a way around the lack of a rest api for updating external ids? at least for emails | 17:59 |
clarkb | I'll file that away for further testing on review-test if it looks like it will be useful | 18:00 |
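(The kind of invocation meant here, to be verified on review-test before trusting it on broken accounts; the account id and email addresses are examples.)

```sh
# mark the duplicate account inactive
ssh -p 29418 review.opendev.org gerrit set-account --inactive 12345

# adjust email external ids and the preferred email on an account
ssh -p 29418 review.opendev.org gerrit set-account \
    --delete-email old@example.com \
    --add-email new@example.com \
    --preferred-email new@example.com 12345
```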
clarkb | and the second account for tbachman has been set inactive | 18:02 |
fungi | yeah, ideally we could. having seen the conditions under which those cli calls fail though, i'm skeptical we'll be able to rely on it to fix broken accounts | 18:02 |
clarkb | ya will definitely need testing | 18:02 |
fungi | it seems to give up in the face of incomplete account values | 18:02 |
*** dtantsur is now known as dtantsur|afk | 18:06 | |
clarkb | fungi: maybe if you get a chance you can look over the first set of accounts in that further-preferred-email-cleanups, there are 8, and I can run the script from last time to retire them if that plan looks good to you. Then rerun the consistency checks? | 18:06 |
clarkb | that set are accounts that I suspect we merged but left behind ssh username external ids (since you can't merge those like the openids as far as I know) | 18:07 |
fungi | can do | 18:07 |
clarkb | I will run a consistency check against -test now and confirm tbachman's account is happy there though | 18:09 |
clarkb | fungi: I've +2'd the pipeline change for opendev releases. | 18:13 |
clarkb | consistency check on -test shows tbachman's problem is gone from that perspective too \o/ | 18:16 |
fungi | excellent! | 18:18 |
fungi | clarkb: looking at your notes, are ssh usernames now separate from http usernames? | 18:22 |
fungi | or are they differentiated by whether or not they have a password? | 18:23 |
clarkb | fungi: oh I guess not. I've always thought of them as ssh because the http is newer (though old at this point) | 18:23 |
clarkb | I believe none of them had passwords because those are external ids as well and I tried to note all the different types of external ids they had but let me double check that | 18:23 |
fungi | okay, so when it says "ssh username external id" that's specifically a "username:something" external id | 18:24 |
fungi | used to be username external ids had a password in that record as a separate column/field | 18:25 |
clarkb | yup they still do. But that first account in my list does not have that record. | 18:25 |
clarkb | I can check the other 7 really quickly too | 18:25 |
clarkb | so ya if the http passwd is set it would show up in the username:foo record. But if not set it doesn't have a line for that | 18:25 |
fungi | so usually if there was a username and an ssh public key then it might be in use for ssh access, while a username with a password then it might have been used for rest api access | 18:26 |
fungi | (the ssh public key being in a separate table/store of course) | 18:26 |
fungi | also they could have a username with no password and no ssh keys for the account, fairly common for older accounts since the openid used to autopopulate a username | 18:27 |
fungi | in those cases the account was at most only being used with the webui | 18:27 |
fungi | similar to accounts with no username | 18:27 |
clarkb | all 8 of those have no http passwd. 7 of them have only the one external id for the username. The odd one out has a mailto external id as well | 18:28 |
clarkb | I had classified these separately because they all had a second account that appeared to be the actual one | 18:28 |
fungi | right, so almost guaranteed to be relics of older duplicate account merges/fixes | 18:29 |
clarkb | the other account having openids and such | 18:29 |
clarkb | the lack of openids on these being the main clue | 18:29 |
clarkb | well when coupled with a secondary account having openids | 18:29 |
fungi | yeah, only accounts which had been hand-fixed for situations like that or old manually created service accounts for bots and third-party ci systems would lack openids entirely | 18:30 |
clarkb | The next group is the third party ci category | 18:31 |
clarkb | there are two in that one, neither shows up on the wiki today so I was thinking we can retire them too and if they need CI for that a new account set up in the modern method would be best | 18:32 |
fungi | yeah, i agree with your analysis on both those groups | 18:32 |
clarkb | then as you go further in that file my confidence in various classifications becomes weaker, though for the most part I think we can retire those accounts. tbachman was top of the "ok we need to actually figure this out and not retire it" list | 18:32 |
clarkb | fungi: cool, should I go ahead and use my script from last time to retire the first 2 groups (10 accounts) ? | 18:33 |
fungi | yes, i think that's safe | 18:34 |
fungi | did you want to adjust the invocation to catch stderr this time? | 18:34 |
clarkb | I can | 18:34 |
fungi | otherwise the set -x isn't super helpful as we miss it in the log | 18:34 |
clarkb | then I can upload the resulting log file next to the existing one on review | 18:34 |
fungi | sounds great | 18:35 |
clarkb | will take me a few minutes to page in how that all worked and get perms sorted out but proceeding with that shortly | 18:35 |
*** klonn has joined #opendev | 18:35 | |
fungi | infra-root: i'm self approving 774291 to augment the release pipeline in the opendev tenant for prerelease version strings for now, but if you disagree please don't hesitate to revert it | 18:35 |
openstackgerrit | Merged opendev/project-config master: Recognize PEP-440 prereleases in release pipeline https://review.opendev.org/c/opendev/project-config/+/774291 | 18:37 |
*** tosky has joined #opendev | 18:37 | |
fungi | and now that's merged, i'm manually reenqueuing the bindep 2.9.0.0rc1 tag ref | 18:40 |
fungi | an opendev-release-python build is queued for it now | 18:41 |
fungi | and it's running | 18:42 |
fungi | yay! https://pypi.org/project/bindep/#history shows a 2.9.0.0rc1 now | 18:45 |
fungi | the new project links appear correctly in the left sidebar of https://pypi.org/project/bindep/2.9.0.0rc1/ too | 18:46 |
fungi | once i'm done inspecting the sdist and wheel contents and doing some install tests from pypi, i'll tag 2.9.0 proper and put together a release announcement with the release notes we've got for it | 18:49 |
clarkb | nice | 18:49 |
clarkb | the first set of 8 account updates is done. Doing the couple of CIs next (split them in order to change the commit message) | 18:49 |
fungi | awesome | 18:49 |
clarkb | and the CI accounts are done now too. That is a total of 8 + 2 + tbachman's 1 = 11 more fixed account issues | 18:54 |
clarkb | I'll work on rerunning the consistency checker next | 18:54 |
fungi | strange, i thought the license_files key in setup.cfg would cause AUTHORS and LICENSE to be included in the wheel, but they aren't as far as i can see | 18:56 |
clarkb | you may need MANIFEST.in? | 18:56 |
clarkb | since pbr is figuring out the file list itself maybe it overrides some setuptools defaults? | 18:56 |
clarkb | I put my log file at review:~clarkb/gerrit_user_cleanups/user-retirement.log.20210205 too btw | 18:57 |
fungi | actually, i'm mildly confused by which files it's decided to include | 18:57 |
fungi | for example the wheel includes bindep/releasenotes/notes/use-distro-library-db71244a0a5cf1dd.yaml but nothing else from the releasenotes tree | 18:57 |
clarkb | I thought pbr was supposed to do anything it thought was part of the python package (eg things in dirs with __init__.py) and then also things in MANIFEST.in | 18:59 |
clarkb | but I'd have to go read docs/code to be sure of that | 18:59 |
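(For context, the license_files stanza fungi mentions above would look roughly like this; whether the older setuptools/wheel versions on the build node honor it is exactly the open question here.)

```ini
[metadata]
license_files =
    AUTHORS
    LICENSE
```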
*** klonn has quit IRC | 19:02 | |
clarkb | I have requested a consistency check from prod. Now we wait | 19:02 |
clarkb | diffing older consistency report and one just generated looks like I expected. There were 11 additional issues last time. Those are now gone. There are no new issues | 19:09 |
fungi | excellent | 19:09 |
clarkb | there are 17 preferred email missing external id issues now. (Down from 109 when we started) | 19:09 |
clarkb | the rest of them are classified in my document I noted above. I just think that more careful review will be needed on those | 19:10 |
clarkb | I'm going to deescalate my privs now as things look to check out to me | 19:10 |
fungi | comparing the published bindep wheel to one of my personal projects, i would expect to see AUTHORS and LICENSE inside the dist-info tree | 19:12 |
fungi | huh, when i build a bindep wheel on my workstation, those files are included | 19:14 |
fungi | so this may come down to setuptools/wheel versions | 19:15 |
*** andrewbonney has quit IRC | 19:15 | |
clarkb | weird | 19:16 |
mordred | yeah - I thought pbr put the two of those into the dist by default | 19:17 |
mordred | oh - I think pbr ensures they wind up in the source dist | 19:17 |
fungi | yeah, this may be because we're running the job on ubuntu-bionic, i wonder if ubuntu-focal will fare better | 19:18 |
openstackgerrit | Jeremy Stanley proposed opendev/bindep master: Build releases on ubuntu-focal https://review.opendev.org/c/opendev/bindep/+/774299 | 19:25 |
fungi | stepping away to prep for dinner while that runs | 19:27 |
openstackgerrit | Martin Kopec proposed opendev/system-config master: Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 19:28 |
openstackgerrit | James E. Blair proposed openstack/project-config master: Disallow fail-fast in Zuul https://review.opendev.org/c/openstack/project-config/+/774300 | 19:32 |
corvus | fungi: ^ i believe some of us are under the impression that patch was already written and merged, but apparently it was not. apparently octavia has managed to find and use that option without us knowing about it; and iurygregory was asking about using it. | 19:33 |
corvus | i'm not in a position to advocate for or against it at this point, but i did want to point out the omission and offer a correction. feel free to reject that patch if the policy has changed. | 19:34 |
johnsom | corvus It was documented.... Octavia only uses it for the gate pipeline, which isn't going to bring much developer value. | 19:35 |
corvus | johnsom: documented where? | 19:35 |
corvus | i'm sorry if i missed the policy change | 19:36 |
johnsom | https://zuul-ci.org/docs/zuul/reference/project_def.html?highlight=fail%20fast#attr-project.%3Cpipeline%3E.fail-fast | 19:37 |
corvus | johnsom: oh, yeah, i know it's documented in zuul, i was intimately involved in the creation and review of the change to implement it. :) | 19:38 |
johnsom | The only docs we have... grin | 19:38 |
corvus | johnsom: but i mean the openstack-infra team looked closely at the situation some time ago and decided that we did not want it used in openstack's zuul | 19:39 |
johnsom | The value for us is the gate jobs can run for hours, so if one of them fails there really is no point holding up a bunch of instances for hours bringing no value. | 19:39 |
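(What enabling the documented option looks like in a project stanza; the job names below are only placeholders.)

```yaml
- project:
    gate:
      fail-fast: true
      jobs:
        - example-dsvm-scenario-job
        - example-dsvm-scenario-job-ipv6
```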
corvus | so we actually designed the feature in zuul to make it possible to force it off | 19:39 |
johnsom | Hmmm, well, that discussion wasn't communicated. The Octavia team voted to enable it two years ago | 19:40 |
corvus | yeah, clearly there was an oversight | 19:40 |
corvus | like i said, i think most of us thought the option was already set that way | 19:41 |
*** sboyron has quit IRC | 19:41 | |
corvus | johnsom: i see the argument for allowing it in gate, since, iff you have a clean-check system where check is required for gate, then users should have been exposed to all the errors already. it's certainly a nuance worth considering. | 19:42 |
mordred | ++ | 19:42 |
corvus | (however, i'm an advocate of not having clean-check, so patches can go directly to gate, and in that case, fail-fast would be detrimental) | 19:43 |
corvus | (but openstack *does* have clean-check) | 19:43 |
mordred | the whole stack should likely be considered holistically - developer patterns are a bit different now than they were when we put in clean-check in the first place | 19:43 |
johnsom | Yeah. I agree it's not helpful for the check pipeline, but gate is a different story. | 19:44 |
corvus | the biggest concern is that at a quick glance, most people say "oh i want fail-fast so we use less resources and get info faster" and that's counterintuitively often the opposite. | 19:45 |
johnsom | Yeah, totally for check. But most gate pipeline failures are infra/nodepool instance/etc. failures. | 19:46 |
corvus | to be honest, i'm not sure we actually evaluated the impact of fail-fast in a dependent pipeline | 19:46 |
johnsom | No point in letting them run 2+ hours when they are all going to fail because someone broke a post script | 19:46 |
mordred | also - in gate - quicker gate resets have a potential knock-on effect | 19:47 |
mordred | corvus: yah - I don't tknow that I'd really thought about it for gate before today | 19:47 |
corvus | what happens to a change in gate if fail-fast is enabled? | 19:47 |
corvus | (the user story for adding the feature was all about check; there are no tests of it in gate, so it technically has undefined behavior) | 19:47 |
johnsom | The job fails, the other jobs are cancelled, and zuul votes -1 | 19:47 |
corvus | johnsom: what if it's not the leading change? | 19:48 |
johnsom | It has worked perfectly for two years... lol | 19:48 |
johnsom | It's the same, the finished jobs still show complete. | 19:48 |
corvus | but it stays in the queue, right? | 19:49 |
corvus | (ie, if change B follows A in the gate pipeline, and change B fails one job, then change B should cancel all remaining jobs, and wait there until A completes; if A fails, B should restart, and if A succeeds, B should report -2). | 19:50 |
mordred | corvus: what about change C behind B - C should restart immediately when B fails on nearest non-failing, right? but then if B restarts if A fails, C should restart again? | 19:52 |
johnsom | Oh, you mean a patch chain. I can't say I know for sure. | 19:52 |
johnsom | Wouldn't it be the same as any Zuul -2 vote? | 19:53 |
corvus | mordred: yes | 19:53 |
johnsom | "dependent patch failed" or something like that if I remember right | 19:53 |
corvus | johnsom: not necessarily a git dependency; zuul establishes ordered dependencies in dependent pipelines | 19:54 |
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Create a template for ssh-key and size https://review.opendev.org/c/zuul/zuul-jobs/+/773474 | 19:54 |
corvus | (so those could be unrelated changes) | 19:54 |
johnsom | Well, I threw a vote on there for the conversation. | 20:08 |
*** auristor has quit IRC | 20:21 | |
*** auristor has joined #opendev | 20:21 | |
*** slaweq has quit IRC | 20:37 | |
iurygregory | so it's not ok to use fail-fast? (for example in check pipeline?) | 20:38 |
*** klonn has joined #opendev | 20:45 | |
fungi | iurygregory: it's not a zuul feature the opendev systems administrators intended to expose, and its addition to zuul explicitly allowed disablement so that it wouldn't be accidentally enabled in opendev, we just neglected to actually disable it once it became available | 21:02 |
fungi | but if projects in some tenants have been using it for two years, i suppose it's worth discussing keeping it | 21:03 |
fungi | in unrelated news, running build-python-release on ubuntu-focal (a la 774299) does seem to correctly include the files from the license_files metadata in the resulting wheels, so i think that's our solution | 21:09 |
fungi | i'm just going to self-approve that and push an rc2 tag | 21:09 |
corvus | fungi, iurygregory: i feel somewhat strongly it should not be used in openstack in check. i have confidence in the analysis we did previously that doing so would result in more resource usage rather than less. | 21:10 |
fungi | corvus: is it worse or better than the workaround some projects are using of setting slower jobs as dependent on their faster jobs? | 21:10 |
corvus | i'm less confident that holds true for gate in openstack specifically, though i still lean in that direction. | 21:10 |
corvus | fungi: what is being worked around? | 21:11 |
corvus | it's definitely better than that | 21:11 |
fungi | the desire to have changes report sooner if faster/simpler jobs fail | 21:11 |
fungi | i've seen a number of projects, counter to our recommendations, make things like devstack jobs depend on pep8 jobs | 21:11 |
fungi | also not a good idea for the same reasons, i think, but we can't really disable that | 21:12 |
corvus | is that why there's a conversation about how long it takes for jobs to report? | 21:12 |
fungi | the main conversation on why it takes so long for jobs to report has to do with backup in the available node quota | 21:13 |
fungi | people are trying to find ways to reduce overall node utilization in openstack | 21:13 |
corvus | i know, i read it; i just didn't see anything about devstack depending on pep8 with that | 21:14 |
corvus | which is why i asked | 21:14 |
fungi | oh, that was going on for much longer, i don't think it was brought up specifically | 21:15 |
corvus | if devstack depends on pep8, then *of course* the system is going to be slow :( | 21:15 |
fungi | it's been in specific projects (e.g. tripleo), but i don't know who might be doing it today | 21:15 |
*** slaweq has joined #opendev | 21:16 | |
corvus | ftr, that is about the best way to slow down the overall throughput. so it's just another of the ways that this can be counterintuitive. | 21:17 |
corvus | (if you tasked me with figuring out a way to slow things down, that would be #2 on my list) | 21:18 |
corvus | ("that" == slow jobs depending on fast ones) | 21:19 |
clarkb | the slowness conversation started because last week (or was it the week before last?) the rtt for a new nova change was > 1 day | 21:20 |
clarkb | we ran at node capacity for basically the whole work week | 21:21 |
corvus | i know that | 21:21 |
*** slaweq has quit IRC | 21:22 | |
corvus | i saw dan's analysis, i think that's great and i'm supportive | 21:22 |
fungi | yeah, here's a current example of tripleo still doing that: https://opendev.org/openstack/tripleo-common/src/branch/master/zuul.d/layout.yaml#L22-L27 | 21:22 |
corvus | i was just alarmed to see fail-fast suggested as a solution since we know it's not. | 21:22 |
fungi | they only run their heavier jobs if linters and unit tests succeed | 21:23 |
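(The pattern in question, abbreviated; see the linked layout for the real job list. The heavy job name here is a placeholder, the lint/unit jobs are the usual openstack ones.)

```yaml
- project:
    check:
      jobs:
        - openstack-tox-linters
        - openstack-tox-py38
        - heavy-deployment-job:
            dependencies:
              - openstack-tox-linters
              - openstack-tox-py38
```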
corvus | that's going to seriously slow down the overall throughput | 21:23 |
corvus | but, tbh, that will probably make things better for nova if they don't share a change queue | 21:23 |
clarkb | they don't; tripleo is in one queue and nova + cinder + glance + neutron + tempest + devstack and probably a few others are in another | 21:24 |
corvus | it's all interconnected. tripleo doing that frees up nodes for nova, but it means an individual tripleo change takes much much longer than it otherwise would to report, and they are subject to more revisions and rechecks. | 21:27 |
corvus | (which eventually ends up eating more capacity if there is any available) | 21:28 |
fungi | unless it manages to drive away contributors ;) | 21:28 |
corvus | it looks like our effective capacity is only 750 nodes? | 21:30 |
*** hashar has quit IRC | 21:31 | |
fungi | yeah, we're trying to get in touch with inap to see if they can clean up rogue instances. every time we spin them back up we get a bunch of ssh timeouts and changed host keys | 21:31 |
fungi | it's been that way for months | 21:31 |
fungi | mgagne: looks like you're around at the moment! any chance you can get someone to take a look at that? | 21:31 |
mgagne | @fungi: sorry about that, the "maintenance" has been in progress for several months now. | 21:31 |
fungi | mgagne: oh, no worries, just didn't know if it was something we had lost track of | 21:32 |
mgagne | I tried poking people with sticks but those people depends on other people as well. I didn't know about the duplicated IPs issue until I took a look at previous merged changes. | 21:32 |
fungi | the inmotion cloud clarkb's been working on bringing up may help relieve some pressure too | 21:33 |
mgagne | I think it's better to keep it disabled for now until we figure out that situation. They are aware that you disabled the whole inap region, which I find unfortunate. But I don't have much control over it. | 21:33 |
clarkb | mgagne: fwiw dansmith and melwitt thought there may be some known nova + cells issues to explain that behavior | 21:34 |
clarkb | fungi: re inmotion I think the initial IP allocation is quite small ~/28 so unlikely to immediately help, but they say ipv6 is planned so hopefully we'd be able to transition from /28 small resources to ipv6 many resources | 21:34 |
mgagne | I'm sure there is, we are stuck with Mitaka right now, there is unfortunately no plan to upgrade it. | 21:34 |
fungi | clarkb: presumably you mean ipv4 /28 (for ipv6 that'd be... huge?) | 21:35 |
clarkb | fungi: yes /28 for ipv4 | 21:35 |
fungi | but yeah, ~14 usable addresses probably not a big win yet | 21:35 |
clarkb | ipv6 isn't deployed yet but is planned | 21:36 |
fungi | that'll be cool | 21:37 |
melwitt | the thing I was talking about re: duplicate IPs that time was if an instance is deleted while nova-compute is "down" and 'nova-manage db archive_deleted_rows' happens to run [via a cron or such] while nova-compute is down, it will leave a libvirt guest running that will never be reaped and it could be using an IP that was freed and could be given back out to a new instance | 21:42 |
melwitt | the way to avoid that is to make use of the '--before <date/time>' option to archive_deleted_rows to give a buffer zone for down nova-compute issues to be resolved before its instances are swept away by an archive | 21:43 |
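(A sketch of such a cron entry; the 90-day window is only an example, and the --before option is only available on sufficiently new nova releases.)

```sh
nova-manage db archive_deleted_rows --until-complete \
    --before "$(date -d '-90 days' '+%Y-%m-%d')"
```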
mgagne | we do run the archive_deleted_rows cron | 21:43 |
iurygregory | fungi, corvus gotcha =) | 21:44 |
mgagne | our maintenance includes putting down a bunch of compute nodes. those have been down for several weeks so our buffer would need to be... like months. | 21:44 |
openstackgerrit | Merged opendev/bindep master: Build releases on ubuntu-focal https://review.opendev.org/c/opendev/bindep/+/774299 | 21:44 |
corvus | remote: https://review.opendev.org/c/zuul/zuul/+/774311 Add a test for fail-fast in the dependent pipeline [NEW] | 21:45 |
melwitt | ack. it's defaulted to --before 90 days in tripleo fwiw | 21:45 |
corvus | johnsom, mordred: ^ i have confirmed the behavior in zuul is as i described earlier, so that's good. there's a test so we don't regress. | 21:46 |
mgagne | melwitt: thanks for teaching me about that option, I'll see if we can enable it. Unfortunately, I don't think it would have prevented that specific situation since computes were down for several weeks. But it could help "normal" use in the future. | 21:49 |
mgagne | but it could also be that the issue was caused by "normal" use, we did have that issue several times in the past without any down compute nodes. | 21:50 |
fungi | i've pushed 2.9.0.0rc2 for bindep, looks like it correctly triggered the release pipeline this time, will confirm the uploaded wheel has the missing files once that completes | 21:51 |
melwitt | mgagne: can you remind me, are you on cells v1? I'm remembering now that dansmith was thinking along the lines of things related to cells v1 could be going on | 21:53 |
mgagne | yes, cells v1 is used, we have Nova Mitaka /shame | 21:53 |
mgagne | ~1 year ago, we had plan to upgrade to Queens, politics happened and here we are today. | 21:54 |
melwitt | ok, yeah. dan was pointing out how cells v1 involves syncing data up and down to/from the "api cell" and the "compute cells" and failure to sync could maybe present this kind of issue | 21:54 |
melwitt | that is, we've had and have a lot of problems around that syncing mechanism which drove the change to "cells v2" | 21:55 |
mgagne | I'm sure it does, I also found that the "reaper" doesn't work well with cells v1. You can end up with orphans on the compute nodes and it will never find them out. | 21:55 |
fungi | okay, bindep-2.9.0.0rc2-py2.py3-none-any.whl on pypi has the expected contents, so i'll tag 2.9.0 now | 21:57 |
mgagne | One challenge we had for the upgrade is that Nova is kind of coupled with Ironic in our case. We can't easily fast-forward without upgrading both. | 21:57 |
mgagne | Ironic changed drivers architecture so we would have to address that too since we do have custom drivers. + introduction of placement. + migration to cells v2. Lot at the same time to push forward. | 21:57 |
melwitt | ok yeah, then you've already found this I think. that reap task is how the rogue vms get cleaned up and if it's not working right, then that would definitely get you the duplicated IP problem (rogue vm still using IP and it's given out to a new vm) | 21:57 |
mgagne | yep... | 21:58 |
mgagne | I think instance is deleted at API cell level but not compute and reaper reads from compute cell database. and there is nothing to fix the discrepancies when that happens. | 21:59 |
mgagne | or could it be 2 years ago? my memory is very bad with time, now I feel old. | 21:59 |
melwitt | yeah... that makes sense. I don't recall off the top of my head about how what we call "local delete" works in cells v1 but what you're saying makes sense | 22:00 |
mgagne | It seems we should be able to bring back the region online next week. I'll sure poke this channel back when ready. | 22:01 |
melwitt | hm, looks like we don't free the network during local delete, so it seems like it shouldn't result in the IP being given out again | 22:03 |
melwitt | (in mitaka) | 22:03 |
melwitt | er sorry, I think it would. I misread this 'if self.cell_type != api' as meaning it wouldn't free the network but it would do it at the compute cell level https://github.com/openstack/nova/blob/mitaka-eol/nova/compute/api.py#L1851 | 22:06 |
openstackgerrit | Jeremy Stanley proposed opendev/project-config master: Correctly match releases as well as prereleases https://review.opendev.org/c/opendev/project-config/+/774312 | 22:06 |
fungi | clarkb: ^ i've tested that locally with both release and prerelease tag refs | 22:06 |
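(Illustrative only, not necessarily the exact expression in 774312: a tag-ref filter that matches both final releases and PEP 440 prerelease tags would look something like the following.)

```yaml
- pipeline:
    name: release
    trigger:
      gerrit:
        - event: ref-updated
          ref: ^refs/tags/[0-9]+(\.[0-9]+)*(\.0(a|b|rc)[0-9]+)?$
```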
* fungi sighs | 22:07 | |
fungi | bindep 2.9.0 didn't get enqueued when pushed because of that. i'll manually reenqueue it once that merges | 22:08 |
fungi | no hurry on it | 22:08 |
*** rchurch has quit IRC | 22:29 | |
*** rchurch has joined #opendev | 22:31 | |
*** whoami-rajat__ has quit IRC | 22:39 | |
*** DSpider has quit IRC | 23:08 | |
clarkb | fungi: +2 on the prerelease fix | 23:19 |
clarkb | sorry I should've regex'd harder in my original review | 23:19 |
fungi | nah, me too | 23:19 |
fungi | thanks | 23:19 |
openstackgerrit | Merged opendev/project-config master: Correctly match releases as well as prereleases https://review.opendev.org/c/opendev/project-config/+/774312 | 23:21 |
*** JayF has quit IRC | 23:30 |