fungi | neat! | 00:00 |
mordred | clarkb: that's not a terrible approach | 00:00 |
fungi | yeah, sounds about right | 00:00 |
clarkb | horizon really hides the clouds.yaml file | 00:05 |
fungi | maybe horizon would rather you used horizon ;) | 00:07 |
mordred | clarkb: it does eventually provide one though right? | 00:08 |
mordred | (that took a while to get added) | 00:08 |
clarkb | mordred: ya you just have to dig through the ui. if you do the account dropdown it only shows openrc. for clouds.yaml you have to go to Project -> Api Access -> Download OpenRC file -> change it to clouds.yaml | 00:09 |
*** corvus has quit IRC | 00:09 | |
* mordred should make a patch | 00:10 | |
*** corvus has joined #opendev | 00:10 | |
fungi | indeed, part of the effort to raise up the universal sdk should at least mean the clouds.yaml is listed by default and you need to switch it to openrc if that's what you really want | 00:11 |
mordred | yeah | 00:11 |
mordred | also - I keep meaning to make an "import clouds.yaml" command for osc | 00:11 |
mordred | so that you could download a clouds.yaml from horizon, run "osc config import downloaded-file.yaml" and it would add that entry into your clouds.yaml - and potentially prompt you for a password (since the password will be missing from the downloaded file) | 00:12 |
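(A sketch of the idea: the file horizon hands out would look roughly like the fragment below, with no password field, which is what the hypothetical "osc config import" would have to prompt for. Cloud name, URL and user/project values are placeholders.)

```yaml
clouds:
  downloaded-cloud:
    auth:
      auth_url: https://cloud.example.com:5000/v3
      username: someuser
      project_name: someproject
      user_domain_name: Default
      project_domain_name: Default
      # password deliberately omitted by horizon's export
    region_name: RegionOne
    interface: public
    identity_api_version: 3
```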
fungi | that'd be neat | 00:12 |
clarkb | ya looking at this a bit more I think the "manual tasks" so far are the actual button clicking to make a cloud, then adding the other roots' ssh keys to the hosts, then creating the project/users for nodepool things. | 00:12 |
clarkb | I'm sure other stuff will come up too like quotas | 00:12 |
clarkb | but so far that's my list of what is required | 00:13 |
fungi | maybe horizon should even start to suggest that command (once it exists) | 00:13 |
mordred | ++ | 00:13 |
fungi | clarkb: well, (re)building the mirror has rather a few manual steps later | 00:13 |
clarkb | fungi: ya I mean more from the perspective of "how is this different than the other cloud we consume" | 00:13 |
fungi | oh, yep | 00:14 |
*** corvus has quit IRC | 00:15 | |
*** corvus has joined #opendev | 00:17 | |
clarkb | https://docs.flexmetal.net/day-1-getting-started-with-openstack/ those are their bootstrapping docs too fwiw which can help others who may want to take a look | 00:22 |
clarkb | essentially there is a cloud provisioning dashboard on their website and from that you get a cloud with horizon and apis and such | 00:22 |
clarkb | next up I've discovered I need to learn about roles | 00:24 |
clarkb | there are also groups? | 00:25 |
clarkb | https://docs.openstack.org/keystone/latest/admin/service-api-protection.html | 00:26 |
fungi | fancy modern keystone, yep | 00:26 |
*** tosky has quit IRC | 00:27 | |
clarkb | I think what we want is two users with role member, then a project for each, to most properly mimic our other setups. I don't think we need any groups | 00:27 |
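(For illustration, that plan maps to roughly these openstack CLI calls; the project and user names are made up, and the same three commands would be repeated for the second project/user pair.)

```sh
openstack project create nodepool-a
openstack user create --project nodepool-a --password-prompt nodepool-a-user
openstack role add --project nodepool-a --user nodepool-a-user member
```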
fungi | sounds right to me | 00:27 |
clarkb | ah ok I think I've hit my first major snag. I don't see https for horizon or api access | 00:29 |
clarkb | the cloud provisioner api is all ssl'd, but the layer under that for the cloud itself isn't | 00:30 |
clarkb | I can do a port forward for bootstrapping things, but I think we really want nodepool to talk https, so why don't I ask if they have a way to do that already | 00:30 |
fungi | yep, better than hacking something together ourselves when they already have a clean solution | 00:31 |
fungi | (hopefully they do) | 00:32 |
clarkb | I'm looking around on the host too but not finding a 443 listener | 00:32 |
corvus | clarkb: agree re two users with member role | 00:47 |
clarkb | sounds like there isn't a magic ssl setup we can lean on. But I think we may be able to set up a proxy on the hosts ourselves? I've provided this as feedback that something even self signed is important. I also let them know that clouds.yaml can verify self signed pretty easily so for api access at least it's not terrible | 00:48 |
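(For reference, pointing clouds.yaml at a self-signed certificate is a one-line addition; the cloud name and path below are placeholders.)

```yaml
clouds:
  newcloud:
    auth:
      auth_url: https://203.0.113.10:5000/v3
    cacert: /etc/openstack/newcloud-ca.pem   # trust the self-signed/private CA
    # verify: false                          # last resort: skip TLS verification entirely
```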
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg-backup: add a few more global excludes https://review.opendev.org/c/opendev/system-config/+/774179 | 00:49 |
corvus | clarkb: seems like theoretically a service offering could use LE. or if we use a proxy, maybe we could use LE? | 00:49 |
clarkb | corvus: yes, and yes. The first thing is sorting out how the external access is all routed through kolla and stuff (there is a VIP of some sort) and I figured self signed would make that easy | 00:50 |
clarkb | but ya we should be able to have them generate LE certs | 00:50 |
*** dviroel has quit IRC | 00:50 | |
fungi | can it be exposed through whatever's serving horizon? | 00:52 |
clarkb | no I think we'll be putting it in front of horizon | 00:53 |
clarkb | horizon is running out of containers as part of their deployment and I worry that if you start changing that too much you'll trip up the automation for the cloud stuff | 00:54 |
clarkb | I should go help figure out dinner now. I'll look at the network setup and try to figure out how packets for the vip end up on the controllers and from that we can make plans for terminating ssl | 00:55 |
clarkb | er in the morning I mean | 00:55 |
fungi | yes, definitely go dinner | 00:56 |
fungi | and i'm a gonna drink | 00:56 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ask: stream db backup https://review.opendev.org/c/opendev/system-config/+/774181 | 01:21 |
*** akrpan-pure has quit IRC | 01:23 | |
*** DSpider has quit IRC | 01:54 | |
*** hamalq has quit IRC | 02:14 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ask: stream db backup https://review.opendev.org/c/opendev/system-config/+/774181 | 02:25 |
*** lbragstad has quit IRC | 02:35 | |
fungi | zuul's getting close enough to idle i'm going to restart the zuul-executor containers on ze08 and ze12 to restore their missing console streamers | 02:43 |
ianw | reverting mysql-client-5.7_5.7.32-0ubuntu0.16.04.1_amd64.deb makes dumps work again | 02:44 |
fungi | #status log restarted the zuul-executor container on ze08 and ze12 to restore their console streamers, previously lost to recent oom events | 02:47 |
openstackstatus | fungi: finished logging | 02:47 |
openstackgerrit | Yanos Angelopoulos proposed openstack/diskimage-builder master: Prevent pip2 to try to update to version > 21 https://review.opendev.org/c/openstack/diskimage-builder/+/774186 | 02:56 |
openstackgerrit | Merged opendev/system-config master: borg-backup: add a few more global excludes https://review.opendev.org/c/opendev/system-config/+/774179 | 03:01 |
openstackgerrit | Merged opendev/system-config master: ask: stream db backup https://review.opendev.org/c/opendev/system-config/+/774181 | 03:02 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: translate: backup zanata db directly to borg https://review.opendev.org/c/opendev/system-config/+/774189 | 03:09 |
*** lbragstad has joined #opendev | 03:15 | |
ianw | after the askdb stuff there merges, i will purge it and re-create it, which should give us enough room to start backing up the wiki server | 03:26 |
ianw | the end result being we can retire bup | 03:26 |
*** brinzhang_ has quit IRC | 03:26 | |
*** hemanth_n has joined #opendev | 03:35 | |
*** brinzhang_ has joined #opendev | 03:35 | |
openstackgerrit | Merged opendev/system-config master: translate: backup zanata db directly to borg https://review.opendev.org/c/opendev/system-config/+/774189 | 04:10 |
*** zimmerry has quit IRC | 04:55 | |
*** ysandeep|away is now known as ysandeep | 04:55 | |
*** ysandeep is now known as ysandeep|ruck | 04:55 | |
*** zimmerry has joined #opendev | 04:56 | |
*** ykarel has joined #opendev | 05:02 | |
*** DSpider has joined #opendev | 05:28 | |
*** ykarel has quit IRC | 05:50 | |
*** ykarel has joined #opendev | 05:51 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ask: fix backup typo and ignore live postgresql https://review.opendev.org/c/opendev/system-config/+/774197 | 06:41 |
*** eolivare has joined #opendev | 06:47 | |
*** slaweq has joined #opendev | 07:02 | |
*** marios has joined #opendev | 07:02 | |
openstackgerrit | Merged opendev/system-config master: ask: fix backup typo and ignore live postgresql https://review.opendev.org/c/opendev/system-config/+/774197 | 07:14 |
*** rpittau|afk is now known as rpittau | 07:57 | |
*** ralonsoh has joined #opendev | 08:02 | |
*** sboyron has joined #opendev | 08:06 | |
*** ysandeep|ruck is now known as ysandeep|lunch | 08:09 | |
*** andrewbonney has joined #opendev | 08:18 | |
*** hashar has joined #opendev | 08:24 | |
*** ysandeep|lunch is now known as ysandeep | 08:33 | |
*** ykarel is now known as ykarel|away | 08:39 | |
*** tosky has joined #opendev | 08:41 | |
*** ykarel|away has quit IRC | 08:48 | |
*** fressi has joined #opendev | 08:52 | |
*** jpena|off is now known as jpena | 08:57 | |
*** ysandeep is now known as ysandeep|rover | 09:07 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 09:19 |
*** JayF has quit IRC | 09:32 | |
*** zimmerry has quit IRC | 09:32 | |
*** JayF has joined #opendev | 09:36 | |
openstackgerrit | Luigi Toscano proposed opendev/irc-meetings master: Fix the ID of the OpenStackSDK meetings https://review.opendev.org/c/opendev/irc-meetings/+/774212 | 09:43 |
*** zbr is now known as zbr|pto | 09:54 | |
zbr|pto | fungi: I did my bit on https://review.opendev.org/q/topic:%22bindep-2.9%22+(status:open%20OR%20status:merged) --- now we need ok from others, maybe ianw clarkb or mordred | 09:55 |
*** dtantsur|afk is now known as dtantsur | 10:01 | |
*** brinzhang_ has quit IRC | 10:08 | |
*** Tengu has quit IRC | 10:30 | |
*** hemanth_n has quit IRC | 10:52 | |
*** dviroel has joined #opendev | 10:57 | |
*** Tengu has joined #opendev | 11:08 | |
*** mciecierski has joined #opendev | 11:24 | |
*** ysandeep|rover is now known as ysandeep|afk | 11:35 | |
*** zimmerry has joined #opendev | 11:43 | |
*** dmellado has quit IRC | 12:21 | |
*** dmellado has joined #opendev | 12:23 | |
*** ysandeep|afk is now known as ysandeep|rover | 12:32 | |
*** jpena is now known as jpena|lunch | 12:32 | |
*** artom has joined #opendev | 12:37 | |
*** eolivare_ has joined #opendev | 12:56 | |
*** lbragstad_ has joined #opendev | 12:56 | |
*** eolivare has quit IRC | 12:56 | |
*** lbragstad has quit IRC | 12:56 | |
openstackgerrit | Merged opendev/irc-meetings master: Fix the ID of the OpenStackSDK meetings https://review.opendev.org/c/opendev/irc-meetings/+/774212 | 13:18 |
*** jpena|lunch is now known as jpena | 13:31 | |
*** tkajinam has quit IRC | 13:50 | |
openstackgerrit | Daniel Blixt proposed zuul/zuul-jobs master: Allow mirror push to delete current branch https://review.opendev.org/c/zuul/zuul-jobs/+/764152 | 13:51 |
*** jpena is now known as jpena|dojo | 13:51 | |
openstackgerrit | Daniel Blixt proposed zuul/zuul-jobs master: Allow mirror push to delete current branch https://review.opendev.org/c/zuul/zuul-jobs/+/764152 | 13:53 |
*** zoharm has joined #opendev | 13:54 | |
openstackgerrit | Dinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 13:56 |
*** chandankumar is now known as raukadah | 13:59 | |
*** whoami-rajat__ has joined #opendev | 14:01 | |
mordred | fungi: was just reviewing the bindep stack (all lgtm) - the description-file -> long-description change made me think that should be a pbr patch | 14:30 |
mordred | like - pbr should translate use of description-file to the two long description fields, no? | 14:31 |
mordred | (obviously if somoene gives the long description fields, that's cool) | 14:31 |
mordred | I mean - actually the more that I think of it, in keeping with pbr's philosophy - I'm not sure why pbr can't just find README, README.rst or README.md and dtrt without having someone have to provide that info at all | 14:32 |
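(Roughly the metadata change being discussed, in setup.cfg terms; the pbr-specific key on top, the plain setuptools declarative equivalents below. The README filename is just an example.)

```ini
[metadata]
# pbr style:
description-file = README.rst

# setuptools style equivalent:
long_description = file: README.rst
long_description_content_type = text/x-rst
```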
*** artom has quit IRC | 14:48 | |
openstackgerrit | Merged opendev/bindep master: Move all jobs in-repo https://review.opendev.org/c/opendev/bindep/+/773797 | 14:50 |
openstackgerrit | Merged opendev/bindep master: Build docs for OpenDev https://review.opendev.org/c/opendev/bindep/+/773796 | 14:50 |
openstackgerrit | Merged opendev/bindep master: Remove release note about rpm path references https://review.opendev.org/c/opendev/bindep/+/774031 | 14:50 |
*** mciecierski has quit IRC | 14:54 | |
fungi | mordred: yeah, it could be argued that pbr should also simply step out of the way if there are standard setup.cfg fields setuptools already understands | 14:54 |
fungi | but with setuptools use on the decline i don't know how much i want to wrestle it | 14:55 |
fungi | anyway, this is basically the same change i applied to zuul-client | 14:55 |
fungi | and am expecting to do similar for, e.g., git-review | 14:55 |
frickler | mordred: fungi: the bindep docs promote job has failed, but I'm not sure what the error is https://74e9c0429c6a046612ec-9df07301c72e495bf406244d2eaedc4c.ssl.cf1.rackcdn.com/773797/4/promote/opendev-promote-docs/730edad/job-output.txt | 14:59 |
mordred | fungi: yah - I could see the argument of having it step out of the way - but on the other hand I've always seen part of its job to be taking care of fiddly stuff for you so you don't have to | 15:00 |
*** artom has joined #opendev | 15:00 | |
mordred | and while those long description fields are neat - they aren't really a pleasant interface :) | 15:00 |
frickler | ah, maybe it was superseded, it passed on the next patch | 15:00 |
*** ysandeep|rover is now known as ysandeep|away | 15:03 | |
*** lbragstad_ is now known as lbragstad | 15:03 | |
fungi | frickler: oh, thanks for spotting, i'll double-check that but yeah there was a patch to move jobs in-repo and another to add the opendev docs publication jobs, so maybe i should have squashed something | 15:12 |
fungi | i half expect to find fallout i need to clean up from switching zuul tenants anyway, so need to check over a few things closely | 15:14 |
*** rpittau is now known as rpittau|afk | 15:17 | |
openstackgerrit | Merged opendev/bindep master: Overhaul Python package metadata https://review.opendev.org/c/opendev/bindep/+/774106 | 15:43 |
*** redrobot3 has joined #opendev | 15:45 | |
*** redrobot has quit IRC | 15:49 | |
*** redrobot3 is now known as redrobot | 15:49 | |
openstackgerrit | Merged opendev/bindep master: Update contributor doc and readme https://review.opendev.org/c/opendev/bindep/+/774107 | 15:55 |
*** tosky has quit IRC | 16:29 | |
*** fressi_ has joined #opendev | 16:36 | |
*** fressi has quit IRC | 16:36 | |
*** fressi_ is now known as fressi | 16:37 | |
fungi | clarkb: ianw: almost glacial, but the stable proposed updates request to approve a patched openafs into debian/buster was filed today: https://bugs.debian.org/982002 | 16:49 |
openstack | Debian bug 982002 in release.debian.org "buster-pu: package openafs/1.8.2-1" [Normal,Open] | 16:49 |
*** marios is now known as marios|out | 16:49 | |
clarkb | fungi: better late than never. I guess it was only 1.6 that started working again in february | 16:49 |
fungi | though i don't think we consume tpu in our nodes, so might still need to wait for a stable point release of buster | 16:50 |
clarkb | our ppa should work until then | 16:50 |
fungi | they weren't able to convince the security team that it qualified for inclusion in debian-security | 16:51 |
fungi | and non-security-related package updates into stable take waaaaay longer | 16:51 |
clarkb | I suppose that it isn't a security issue if it just fails to work :) | 16:51 |
fungi | yup | 16:51 |
*** marios|out has quit IRC | 16:56 | |
*** zoharm has quit IRC | 17:00 | |
fungi | frickler: taking a closer look at that initial opendev-promote-docs failure i think the problem is it ran even though we didn't do a docs build in the gate, so it failed to download the nonexistent artifact from that | 17:01 |
fungi | technically not a download failure, but it tried to reference a list index in a null list | 17:02 |
fungi | as a result of having no artifact to retrieve | 17:02 |
fungi | we could probably make the "download-artifact: Parse build response" task a little more robust there, but it's a corner case i'm tempted to just ignore | 17:03 |
*** klonn has joined #opendev | 17:05 | |
corvus | clarkb: your gerrit status is turkey time | 17:08 |
fungi | it's always turkey time! | 17:09 |
corvus | agreed | 17:09 |
fungi | kinda like how everyday is hallowe'en | 17:09 |
corvus | and the eternal september? | 17:09 |
fungi | we don't speak of that | 17:10 |
clarkb | I never updated it after november | 17:10 |
*** ralonsoh has quit IRC | 17:14 | |
*** diablo_rojo has joined #opendev | 17:20 | |
fungi | i'm sort of surprised moving bindep out of the openstack tenant hasn't created any new zuul config errors, but no matches for bindep anywhere in the mess at https://zuul.opendev.org/t/openstack/config-errors | 17:24 |
fungi | and aside from the aforementioned docs promote first-time race and a mysterious retry on a single tox-pep8 run, all the bindep jobs are succeeding consistently | 17:25 |
fungi | and documentation is now correctly redirected to https://docs.opendev.org/opendev/bindep and we have release notes getting published for it, so i think it's time to tag | 17:26 |
fungi | infra-root: any objections to tagging a bindep 2.9.0 release shortly? | 17:27 |
fungi | actually i may tag 2.9.0.0rc1 just to make sure the release jobs are functional after all the job switcheroo | 17:28 |
clarkb | tagging an rc first sounds like a good idea | 17:29 |
clarkb | no objections from me | 17:29 |
*** eolivare_ has quit IRC | 17:30 | |
fungi | bombs away | 17:31 |
fungi | didn't trigger anything, checking the pipeline defs | 17:36 |
fungi | yeah, our release pipeline doesn't match on prerelease strings, working on a patch | 17:38 |
openstackgerrit | Jeremy Stanley proposed opendev/project-config master: Recognize PEP-440 prereleases in release pipeline https://review.opendev.org/c/opendev/project-config/+/774291 | 17:42 |
fungi | infra-root: ^ i expect my explanation there makes sense for why we shouldn't need a separate prerelease pipeline, but please do let me know if it seems like a bad idea | 17:43 |
fungi | (note this is specifically for the opendev tenant in zuul) | 17:43 |
clarkb | fungi: do you remember why we split them before? | 17:44 |
fungi | clarkb: in the beforetime, in the longlongago, pip didn't know to avoid installing prereleases automatically, so we only published wheels and not tarballs, hence ran separate jobs between the two pipelines | 17:45 |
fungi | i think we may have in some cases also avoided running specific jobs to do things like update documentation if it was a prerelease | 17:46 |
clarkb | aha | 17:46 |
*** klonn has quit IRC | 17:47 | |
fungi | one other thing i'm not sure how to tackle cleanly, looks like we only publish docs when new changes merge, so they'll have references to the old release until the first change merges after a release is tagged. an obvious workaround is to merge something trivial soon after releasing | 17:48 |
*** jpena|dojo is now known as jpena|off | 17:48 | |
fungi | back when we did docs builds and uploads in post, it was fairly easy to make those jobs also be able to run similarly in release | 17:49 |
fungi | but with the gate+promote it seems like that's not as simple to apply to something you trigger after tagging a release | 17:50 |
fungi | probably not urgent to solve, just means we have an incentive to review/merge stuff soon after any release | 17:50 |
clarkb | gerrit account cleanup update: tbachman has two accounts, one of which is the actively used account and that one had a "preferred email address has no external id" issue. The preferred email address that was set was in use by the other account (but the other account wasn't used for reviews or code pushing) | 17:53 |
clarkb | talking to tbachman we decided that setting the preferred email address on the active account to match the email address associated with the openids on that account was the easiest option. Trying to add external ids for the preferred email address would not work because that would conflict with the other account | 17:54 |
clarkb | tbachman made this change through the web ui on review-test, then we tested that log out and log in shows the account id did not change. We also checked that reviews and code pushes could still be made | 17:54 |
clarkb | since that all looked good tbachman went ahead and made the change on prod and that is one less account with consistency issues | 17:54 |
clarkb | My next step is to disable the other account to reduce confusion when people try to add reviewers | 17:55 |
clarkb | I have not done that yet, but am proceeding with it now | 17:55 |
clarkb | posting all this to channel as I think there may be some other accounts where we just set the preferred email to match the openid (though maybe without user intervention) | 17:55 |
fungi | thanks for the walkthrough! i guess after the old account is disabled you're going to rerun the consistency check and see if those accounts no longer appear as errors? | 17:56 |
clarkb | fungi: "yes". I was hoping to get through a few more accounts before rerunning the consistency check, but if we think that is prudent to do first I can go ahead and do that | 17:57 |
clarkb | fungi: review-test:~clarkb/gerrit-consistency-notes/further-preferred-email-cleanups <- for my notes on cleaning up another 20 or so. They are arranged in roughly most confident to least confident in plan from beginning to end of that file if you want to take a look | 17:57 |
clarkb | sort of related: I was just looking at gerrit set-account to get the command right for --inactive. It looks like the ssh api for accounts can remove and add and set preferred emails. Maybe we can use this as a way around the lack of a rest api for updating external ids? at least for emails | 17:59 |
clarkb | I'll file that away for further testing on review-test if it looks like it will be useful | 18:00 |
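(The kind of invocation meant here, to be verified on review-test before trusting it on broken accounts; the account id and email addresses are examples.)

```sh
# mark the duplicate account inactive
ssh -p 29418 review.opendev.org gerrit set-account --inactive 12345

# adjust email external ids and the preferred email on an account
ssh -p 29418 review.opendev.org gerrit set-account \
    --delete-email old@example.com \
    --add-email new@example.com \
    --preferred-email new@example.com 12345
```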
clarkb | and the second account for tbachman has been set inactive | 18:02 |
fungi | yeah, ideally we could. having seen the conditions under which those cli calls fail though, i'm skeptical we'll be able to rely on it to fix broken accounts | 18:02 |
clarkb | ya will definitely need testing | 18:02 |
fungi | it seems to give up in the face of incomplete account values | 18:02 |
*** dtantsur is now known as dtantsur|afk | 18:06 | |
clarkb | fungi: maybe if you get a chance you can look over the first set of accounts in that further-preferred-email-cleanups, there are 8, and I can run the script from last time to retire them if that plan looks good to you. Then rerun the consistency checks? | 18:06 |
clarkb | that set are accounts that I suspect we merged but left behind ssh username external ids (since you can't merge those like the openids as far as I know) | 18:07 |
fungi | can do | 18:07 |
clarkb | I will run a consistency check against -test now and confirm tbachman's account is happy there though | 18:09 |
clarkb | fungi: I've +2'd the pipeline change for opendev releases. | 18:13 |
clarkb | consistency check on -test shows tbachman's problem is gone from that perspective too \o/ | 18:16 |
fungi | excellent! | 18:18 |
fungi | clarkb: looking at your notes, are ssh usernames now separate from http usernames? | 18:22 |
fungi | or are they differentiated by whether or not they have a password? | 18:23 |
clarkb | fungi: oh I guess not. I've always thought of them as ssh because the http is newer (though old at this point) | 18:23 |
clarkb | I believe none of them had passwords because those are external ids as well and I tried to note all the different types of external ids they had but let me double check that | 18:23 |
fungi | okay, so when it says "ssh username external id" that's specifically a "username:something" external id | 18:24 |
fungi | used to be username external ids had a password in that record as a separate column/field | 18:25 |
clarkb | yup they still do. But that first account in my list does not have that record. | 18:25 |
clarkb | I can check the other 7 really quickly too | 18:25 |
clarkb | so ya if the http passwd is set it would show up in the username:foo record. But if not set it doesn't have a line for that | 18:25 |
fungi | so usually if there was a username and an ssh public key then it might be in use for ssh access, while a username with a password then it might have been used for rest api access | 18:26 |
fungi | (the ssh public key being in a separate table/store of course) | 18:26 |
fungi | also they could have a username with no password and no ssh keys for the account, fairly common for older accounts since the openid used to autopopulate a username | 18:27 |
fungi | in those cases the account was at most only being used with the webui | 18:27 |
fungi | similar to accounts with no username | 18:27 |
clarkb | all 8 of those have no http passwd. 7 of them have only the one external id for the username. The odd one out has a mailto external id as well | 18:28 |
clarkb | I had classified these separately because they all had a second account that appeared to be the actual one | 18:28 |
fungi | right, so almost guaranteed to be relics of older duplicate account merges/fixes | 18:29 |
clarkb | the other account having openids and such | 18:29 |
clarkb | the lack of openids on these being the main clue | 18:29 |
clarkb | well when coupled with a secondary account having openids | 18:29 |
fungi | yeah, only accounts which had been hand-fixed for situations like that or old manually created service accounts for bots and third-party ci systems would lack openids entirely | 18:30 |
clarkb | The next group is the third party ci category | 18:31 |
clarkb | there are two in that one, neither shows up on the wiki today so I was thinking we can retire them too and if they need CI for that a new account set up in the modern method would be best | 18:32 |
fungi | yeah, i agree with your analysis on both those groups | 18:32 |
clarkb | then as you go further in that file my confidence in various classifications becomes weaker, though for the most part I think we can retire those accounts. tbachman was top of the "ok we need to actually figure this out and not retire it" list | 18:32 |
clarkb | fungi: cool, should I go ahead and use my script from last time to retire the first 2 groups (10 accounts) ? | 18:33 |
fungi | yes, i think that's safe | 18:34 |
fungi | did you want to adjust the invocation to catch stderr this time? | 18:34 |
clarkb | I can | 18:34 |
fungi | otherwise the set -x isn't super helpful as we miss it in the log | 18:34 |
clarkb | then I can upload the resulting log file next to the existing one on review | 18:34 |
fungi | sounds great | 18:35 |
clarkb | will take me a few minutes to page in how that all worked and get perms sorted out but proceeding with that shortly | 18:35 |
*** klonn has joined #opendev | 18:35 | |
fungi | infra-root: i'm self approving 774291 to augment the release pipeline in the opendev tenant for prerelease version strings for now, but if you disagree please don't hesitate to revert it | 18:35 |
openstackgerrit | Merged opendev/project-config master: Recognize PEP-440 prereleases in release pipeline https://review.opendev.org/c/opendev/project-config/+/774291 | 18:37 |
*** tosky has joined #opendev | 18:37 | |
fungi | and now that's merged, i'm manually reenqueuing the bindep 2.9.0.0rc1 tag ref | 18:40 |
fungi | an opendev-release-python build is queued for it now | 18:41 |
fungi | and it's running | 18:42 |
fungi | yay! https://pypi.org/project/bindep/#history shows a 2.9.0.0rc1 now | 18:45 |
fungi | the new project links appear correctly in the left sidebar of https://pypi.org/project/bindep/2.9.0.0rc1/ too | 18:46 |
fungi | once i'm done inspecting the sdist and wheel contents and doing some install tests from pypi, i'll tag 2.9.0 proper and put together a release announcement with the release notes we've got for it | 18:49 |
clarkb | nice | 18:49 |
clarkb | the first set of 8 account updates is done. Doing the couple of CIs next (split them in order to change the commit message) | 18:49 |
fungi | awesome | 18:49 |
clarkb | and the CI accounts are done now too. That is a total of 8 + 2 + tbachman's 1 = 11 more fixed account issues | 18:54 |
clarkb | I'll work on rerunning the consistency checker next | 18:54 |
fungi | strange, i thought the license_files key in setup.cfg would cause AUTHORS and LICENSE to be included in the wheel, but they aren't as far as i can see | 18:56 |
clarkb | you may need MANIFEST.in? | 18:56 |
clarkb | since pbr is figuring out the file list itself maybe it overrides some setuptools defaults? | 18:56 |
clarkb | I put my log file at review:~clarkb/gerrit_user_cleanups/user-retirement.log.20210205 too btw | 18:57 |
fungi | actually, i'm mildly confused by which files it's decided to include | 18:57 |
fungi | for example the wheel includes bindep/releasenotes/notes/use-distro-library-db71244a0a5cf1dd.yaml but nothing else from the releasenotes tree | 18:57 |
clarkb | I thought pbr was supposed to do anything it thought was part of the python package (eg things in dirs with __init__.py) and then also things in MANIFEST.in | 18:59 |
clarkb | but I'd have to go read docs/code to be sure of that | 18:59 |
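(For context, the license_files stanza fungi mentions above would look roughly like this; whether the older setuptools/wheel versions on the build node honor it is exactly the open question here.)

```ini
[metadata]
license_files =
    AUTHORS
    LICENSE
```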
*** klonn has quit IRC | 19:02 | |
clarkb | I have requested a consistency check from prod. Now we wait | 19:02 |
clarkb | diffing older consistency report and one just generated looks like I expected. There were 11 additional issues last time. Those are now gone. There are no new issues | 19:09 |
fungi | excellent | 19:09 |
clarkb | there are 17 preferred email missing external id issues now. (Down from 109 when we started) | 19:09 |
clarkb | the rest of them are classified in my document I noted above. I just think that more careful review will be needed on those | 19:10 |
clarkb | I'm going to deescalate my privs now as things look to check out to me | 19:10 |
fungi | comparing the published bindep wheel to one of my personal projects, i would expect to see AUTHORS and LICENSE inside the dist-info tree | 19:12 |
fungi | huh, when i build a bindep wheel on my workstation, those files are included | 19:14 |
fungi | so this may come down to setuptools/wheel versions | 19:15 |
*** andrewbonney has quit IRC | 19:15 | |
clarkb | weird | 19:16 |
mordred | yeah - I thought pbr put the two of those into the dist by default | 19:17 |
mordred | oh - I think pbr ensures they wind up in the source dist | 19:17 |
fungi | yeah, this may be because we're running the job on ubuntu-bionic, i wonder if ubuntu-focal will fare better | 19:18 |
openstackgerrit | Jeremy Stanley proposed opendev/bindep master: Build releases on ubuntu-focal https://review.opendev.org/c/opendev/bindep/+/774299 | 19:25 |
fungi | stepping away to prep for dinner while that runs | 19:27 |
openstackgerrit | Martin Kopec proposed opendev/system-config master: Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 19:28 |
openstackgerrit | James E. Blair proposed openstack/project-config master: Disallow fail-fast in Zuul https://review.opendev.org/c/openstack/project-config/+/774300 | 19:32 |
corvus | fungi: ^ i believe some of us are under the impression that patch was already written and merged, but apparently it was not. apparently octavia has managed to find and use that option without us knowing about it; and iurygregory was asking about using it. | 19:33 |
corvus | i'm not in a position to advocate for or against it at this point, but i did want to point out the omission and offer a correction. feel free to reject that patch if the policy has changed. | 19:34 |
johnsom | corvus It was documented.... Octavia only uses it for the gate pipeline, which isn't going to bring much developer value. | 19:35 |
corvus | johnsom: documented where? | 19:35 |
corvus | i'm sorry if i missed the policy change | 19:36 |
johnsom | https://zuul-ci.org/docs/zuul/reference/project_def.html?highlight=fail%20fast#attr-project.%3Cpipeline%3E.fail-fast | 19:37 |
corvus | johnsom: oh, yeah, i know it's documented in zuul, i was intimately involved in the creation and review of the change to implement it. :) | 19:38 |
johnsom | The only docs we have... grin | 19:38 |
corvus | johnsom: but i mean the openstack-infra team looked closely at the situation some time ago and decided that we did not want it used in openstack's zuul | 19:39 |
johnsom | The value for us is the gate jobs can run for hours, so if one of them fails there really is no point holding up a bunch of instances for hours bringing no value. | 19:39 |
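(What enabling the documented option looks like in a project stanza; the job names below are only placeholders.)

```yaml
- project:
    gate:
      fail-fast: true
      jobs:
        - example-dsvm-scenario-job
        - example-dsvm-scenario-job-ipv6
```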
corvus | so we actually designed the feature in zuul to make it possible to force it off | 19:39 |
johnsom | Hmmm, well, that discussion wasn't communicated. The Octavia team voted to enable it two years ago | 19:40 |
corvus | yeah, clearly there was an oversight | 19:40 |
corvus | like i said, i think most of us thought the option was already set that way | 19:41 |
*** sboyron has quit IRC | 19:41 | |
corvus | johnsom: i see the argument for allowing it in gate, since, iff you have a clean-check system where check is required for gate, then users should have been exposed to all the errors already. it's certainly a nuance worth considering. | 19:42 |
mordred | ++ | 19:42 |
corvus | (however, i'm an advocate of not having clean-check, so patches can go directly to gate, and in that case, fail-fast would be detrimental) | 19:43 |
corvus | (but openstack *does* have clean-check) | 19:43 |
mordred | the whole stack should likely be considered holistically - developer patterns are a bit different now than they were when we put in clean-check in the first place | 19:43 |
johnsom | Yeah. I agree it's not helpful for the check pipeline, but gate is a different story. | 19:44 |
corvus | the biggest concern is that at a quick glance, most people say "oh i want fail-fast so we use less resources and get info faster" and that's counterintuitively often the opposite. | 19:45 |
johnsom | Yeah, totally for check. But most gate pipeline failures are infra/nodepool instance/etc. failures. | 19:46 |
corvus | to be honest, i'm not sure we actually evaluated the impact of fail-fast in a dependent pipeline | 19:46 |
johnsom | No point in letting them run 2+ hours when they are all going to fail because someone broke a post script | 19:46 |
mordred | also - in gate - quicker gate resets have a potential knock-on effect | 19:47 |
mordred | corvus: yah - I don't tknow that I'd really thought about it for gate before today | 19:47 |
corvus | what happens to a change in gate if fail-fast is enabled? | 19:47 |
corvus | (the user story for adding the feature was all about check; there are no tests of it in gate, so it technically has undefined behavior) | 19:47 |
johnsom | The job fails, the other jobs are cancelled, and zuul votes -1 | 19:47 |
corvus | johnsom: what if it's not the leading change? | 19:48 |
johnsom | It has worked perfectly for two years... lol | 19:48 |
johnsom | It's the same, the finished jobs still show complete. | 19:48 |
corvus | but it stays in the queue, right? | 19:49 |
corvus | (ie, if change B follows A in the gate pipeline, and change B fails one job, then change B should cancel all remaining jobs, and wait there until A completes; if A fails, B should restart, and if A succeeds, B should report -2). | 19:50 |
mordred | corvus: what about change C behind B - C should restart immediately when B fails on nearest non-failing, right? but then if B restarts if A fails, C should restart again? | 19:52 |
johnsom | Oh, you mean a patch chain. I can't say I know for sure. | 19:52 |
johnsom | Wouldn't it be the same as any Zuul -2 vote? | 19:53 |
corvus | mordred: yes | 19:53 |
johnsom | "dependent patch failed" or something like that if I remember right | 19:53 |
corvus | johnsom: not necessarily a git dependency; zuul establishes ordered dependencies in dependent pipelines | 19:54 |
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Create a template for ssh-key and size https://review.opendev.org/c/zuul/zuul-jobs/+/773474 | 19:54 |
corvus | (so those could be unrelated changes) | 19:54 |
johnsom | Well, I threw a vote on there for the conversation. | 20:08 |
*** auristor has quit IRC | 20:21 | |
*** auristor has joined #opendev | 20:21 | |
*** slaweq has quit IRC | 20:37 | |
iurygregory | so it's not ok to use fail-fast? (for example in check pipeline?) | 20:38 |
*** klonn has joined #opendev | 20:45 | |
fungi | iurygregory: it's not a zuul feature the opendev systems administrators intended to expose, and its addition to zuul explicitly allowed disablement so that it wouldn't be accidentally enabled in opendev, we just neglected to actually disable it once it became available | 21:02 |
fungi | but if projects in some tenants have been using it for two years, i suppose it's worth discussing keeping it | 21:03 |
fungi | in unrelated news, running build-python-release on ubuntu-focal (a la 774299) does seem to correctly include the files from the license_files metadata in the resulting wheels, so i think that's our solution | 21:09 |
fungi | i'm just going to self-approve that and push an rc2 tag | 21:09 |
corvus | fungi, iurygregory: i feel somewhat strongly it should not be used in openstack in check. i have confidence in the analysis we did previously that doing so would result in more resource usage rather than less. | 21:10 |
fungi | corvus: is it worse or better than the workaround some projects are using of setting slower jobs as dependent on their faster jobs? | 21:10 |
corvus | i'm less confident that holds true for gate in openstack specifically, though i still lean in that direction. | 21:10 |
corvus | fungi: what is being worked around? | 21:11 |
corvus | it's definitely better than that | 21:11 |
fungi | the desire to have changes report sooner if faster/simpler jobs fail | 21:11 |
fungi | i've seen a number of projects, counter to our recommendations, make things like devstack jobs depend on pep8 jobs | 21:11 |
fungi | also not a good idea for the same reasons, i think, but we can't really disable that | 21:12 |
corvus | is that why there's a conversation about how long it takes for jobs to report? | 21:12 |
fungi | the main conversation on why it takes so long for jobs to report has to do with backup in the available node quota | 21:13 |
fungi | people are trying to find ways to reduce overall node utilization in openstack | 21:13 |
corvus | i know, i read it; i just didn't see anything about devstack depending on pep8 with that | 21:14 |
corvus | which is why i asked | 21:14 |
fungi | oh, that was going on for much longer, i don't think it was brought up specifically | 21:15 |
corvus | if devstack depends on pep8, then *of course* the system is going to be slow :( | 21:15 |
fungi | it's been in specific projects (e.g. tripleo), but i don't know who might be doing it today | 21:15 |
*** slaweq has joined #opendev | 21:16 | |
corvus | ftr, that is about the best way to slow down the overall throughput. so it's just another of the ways that this can be counterintuitive. | 21:17 |
corvus | (if you tasked me with figuring out a way to slow things down, that would be #2 on my list) | 21:18 |
corvus | ("that" == slow jobs depending on fast ones) | 21:19 |
clarkb | the slowness conversation started because last week (or was it the week before last?) the rtt for a new nova change was > 1 day | 21:20 |
clarkb | we ran at node capacity for basically the whole work week | 21:21 |
corvus | i know that | 21:21 |
*** slaweq has quit IRC | 21:22 | |
corvus | i saw dan's analysis, i think that's great and i'm supportive | 21:22 |
fungi | yeah, here's a current example of tripleo still doing that: https://opendev.org/openstack/tripleo-common/src/branch/master/zuul.d/layout.yaml#L22-L27 | 21:22 |
corvus | i was just alarmed to see fail-fast suggested as a solution since we know it's not. | 21:22 |
fungi | they only run their heavier jobs if linters and unit tests succeed | 21:23 |
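(The pattern in question, abbreviated; see the linked layout for the real job list. The heavy job name here is a placeholder, the lint/unit jobs are the usual openstack ones.)

```yaml
- project:
    check:
      jobs:
        - openstack-tox-linters
        - openstack-tox-py38
        - heavy-deployment-job:
            dependencies:
              - openstack-tox-linters
              - openstack-tox-py38
```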
corvus | that's going to seriously slow down the overall throughput | 21:23 |
corvus | but, tbh, that will probably make things better for nova if they don't share a change queue | 21:23 |
clarkb | they don't; tripleo is in one queue and nova + cinder + glance + neutron + tempest + devstack and probably a few others are in another | 21:24 |
corvus | it's all interconnected. tripleo doing that frees up nodes for nova, but it means an individual tripleo change takes much much longer than it otherwise would to report, and they are subject to more revisions and rechecks. | 21:27 |
corvus | (which eventually ends up eating more capacity if there is any available) | 21:28 |
fungi | unless it manages to drive away contributors ;) | 21:28 |
corvus | it looks like our effective capacity is only 750 nodes? | 21:30 |
*** hashar has quit IRC | 21:31 | |
fungi | yeah, we're trying to get in touch with inap to see if they can clean up rogue instances. every time we spin them back up we get a bunch of ssh timeouts and changed host keys | 21:31 |
fungi | it's been that way for months | 21:31 |
fungi | mgagne: looks like you're around at the moment! any chance you can get someone to take a look at that? | 21:31 |
mgagne | @fungi: sorry about that, the "maintenance" has been in progress for several months now. | 21:31 |
fungi | mgagne: oh, no worries, just didn't know if it was something we had lost track of | 21:32 |
mgagne | I tried poking people with sticks but those people depends on other people as well. I didn't know about the duplicated IPs issue until I took a look at previous merged changes. | 21:32 |
fungi | the inmotion cloud clarkb's been working on bringing up may help relieve some pressure too | 21:33 |
mgagne | I think it's better to keep it disabled for now until we figure out that situation. They are aware that you disabled the whole inap region, which I find unfortunate. But I don't have much control over it. | 21:33 |
clarkb | mgagne: fwiw dansmith and melwitt thought there may be some known nova + cells issues to explain that behavior | 21:34 |
clarkb | fungi: re inmotion I think the initial IP allocation is quite small ~/28 so unlikely to immediately help, but they say ipv6 is planned so hopefully we'd be able to transition from /28 small resources to ipv6 many resources | 21:34 |
mgagne | I'm sure there is, we are stuck with Mitaka right now, there is unfortunately no plan to upgrade it. | 21:34 |
fungi | clarkb: presumably you mean ipv4 /28 (for ipv6 that'd be... huge?) | 21:35 |
clarkb | fungi: yes /28 for ipv4 | 21:35 |
fungi | but yeah, ~14 usable addresses probably not a big win yet | 21:35 |
clarkb | ipv6 isn't deployed yet but is planned | 21:36 |
fungi | that'll be cool | 21:37 |
melwitt | the thing I was talking about re: duplicate IPs that time was if an instance is deleted while nova-compute is "down" and 'nova-manage db archive_deleted_rows' happens to run [via a cron or such] while nova-compute is down, it will leave a libvirt guest running that will never be reaped and it could be using an IP that was freed and could be given back out to a new instance | 21:42 |
melwitt | the way to avoid that is to make use of the '--before <date/time>' option to archive_deleted_rows to give a buffer zone for down nova-compute issues to be resolved before its instances are swept away by an archive | 21:43 |
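(A sketch of such a cron entry; the 90-day window is only an example, and the --before option is only available on sufficiently new nova releases.)

```sh
nova-manage db archive_deleted_rows --until-complete \
    --before "$(date -d '-90 days' '+%Y-%m-%d')"
```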
mgagne | we do run the archive_deleted_rows cron | 21:43 |
iurygregory | fungi, corvus gotcha =) | 21:44 |
mgagne | our maintenance includes putting down a bunch of compute nodes. those have been down for several weeks so our buffer would need to be... like months. | 21:44 |
openstackgerrit | Merged opendev/bindep master: Build releases on ubuntu-focal https://review.opendev.org/c/opendev/bindep/+/774299 | 21:44 |
corvus | remote: https://review.opendev.org/c/zuul/zuul/+/774311 Add a test for fail-fast in the dependent pipeline [NEW] | 21:45 |
melwitt | ack. it's defaulted to --before 90 days in tripleo fwiw | 21:45 |
corvus | johnsom, mordred: ^ i have confirmed the behavior in zuul is as i described earlier, so that's good. there's a test so we don't regress. | 21:46 |
mgagne | melwitt: thanks for teaching me about that option, I'll see if we can enable it. Unfortunately, I don't think it would have prevented that specific situation since computes were down for several weeks. But it could help "normal" use in the future. | 21:49 |
mgagne | but it could also be that the issue was caused by "normal" use, we did have that issue several times in the past without any down compute nodes. | 21:50 |
fungi | i've pushed 2.9.0.0rc2 for bindep, looks like it correctly triggered the release pipeline this time, will confirm the uploaded wheel has the missing files once that completes | 21:51 |
melwitt | mgagne: can you remind me, are you on cells v1? I'm remembering now that dansmith was thinking along the lines of things related to cells v1 could be going on | 21:53 |
mgagne | yes, cells v1 is used, we have Nova Mitaka /shame | 21:53 |
mgagne | ~1 year ago, we had plan to upgrade to Queens, politics happened and here we are today. | 21:54 |
melwitt | ok, yeah. dan was pointing out how cells v1 involves syncing data up and down to/from the "api cell" and the "compute cells" and failure to sync could maybe present this kind of issue | 21:54 |
melwitt | that is, we've had and have a lot of problems around that syncing mechanism which drove the change to "cells v2" | 21:55 |
mgagne | I'm sure it does, I also found that the "reaper" doesn't work well with cells v1. You can end up with orphans on the compute nodes and it will never find them out. | 21:55 |
fungi | okay, bindep-2.9.0.0rc2-py2.py3-none-any.whl on pypi has the expected contents, so i'll tag 2.9.0 now | 21:57 |
mgagne | One challenge we had for the upgrade is that Nova is kind of coupled with Ironic in our case. We can't easily fast-forward without upgrading both. | 21:57 |
mgagne | Ironic changed drivers architecture so we would have to address that too since we do have custom drivers. + introduction of placement. + migration to cells v2. Lot at the same time to push forward. | 21:57 |
melwitt | ok yeah, then you've already found this I think. that reap task is how the rogue vms get cleaned up and if it's not working right, then that would definitely get you the duplicated IP problem (rogue vm still using IP and it's given out to a new vm) | 21:57 |
mgagne | yep... | 21:58 |
mgagne | I think instance is deleted at API cell level but not compute and reaper reads from compute cell database. and there is nothing to fix the discrepancies when that happens. | 21:59 |
mgagne | or could it be 2 years ago? my memory is very bad with time, now I feel old. | 21:59 |
melwitt | yeah... that makes sense. I don't recall off the top of my head about how what we call "local delete" works in cells v1 but what you're saying makes sense | 22:00 |
mgagne | It seems we should be able to bring back the region online next week. I'll sure poke this channel back when ready. | 22:01 |
melwitt | hm, looks like we don't free the network during local delete, so it seems like it shouldn't result in the IP being given out again | 22:03 |
melwitt | (in mitaka) | 22:03 |
melwitt | er sorry, I think it would. I misread this 'if self.cell_type != api' as meaning it wouldn't free the network but it would do it at the compute cell level https://github.com/openstack/nova/blob/mitaka-eol/nova/compute/api.py#L1851 | 22:06 |
openstackgerrit | Jeremy Stanley proposed opendev/project-config master: Correctly match releases as well as prereleases https://review.opendev.org/c/opendev/project-config/+/774312 | 22:06 |
fungi | clarkb: ^ i've tested that locally with both release and prerelease tag refs | 22:06 |
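(Illustrative only, not necessarily the exact expression in 774312: a tag-ref filter that matches both final releases and PEP 440 prerelease tags would look something like the following.)

```yaml
- pipeline:
    name: release
    trigger:
      gerrit:
        - event: ref-updated
          ref: ^refs/tags/[0-9]+(\.[0-9]+)*(\.0(a|b|rc)[0-9]+)?$
```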
* fungi sighs | 22:07 | |
fungi | bindep 2.9.0 didn't get enqueued when pushed because of that. i'll manually reenqueue it once that merges | 22:08 |
fungi | no hurry on it | 22:08 |
*** rchurch has quit IRC | 22:29 | |
*** rchurch has joined #opendev | 22:31 | |
*** whoami-rajat__ has quit IRC | 22:39 | |
*** DSpider has quit IRC | 23:08 | |
clarkb | fungi: +2 on the prerelease fix | 23:19 |
clarkb | sorry I should've regex'd harder in my original review | 23:19 |
fungi | nah, me too | 23:19 |
fungi | thanks | 23:19 |
openstackgerrit | Merged opendev/project-config master: Correctly match releases as well as prereleases https://review.opendev.org/c/opendev/project-config/+/774312 | 23:21 |
*** JayF has quit IRC | 23:30 |