Friday, 2021-02-05

mordredclarkb: that's not a terrible approach00:00
fungiyeah, sounds about right00:00
clarkbhorizon really hides the clouds.yaml file00:05
fungimaybe horizon would rather you used horizon ;)00:07
mordredclarkb: it does eventually provide one though right?00:08
mordred(that took a while to get added)00:08
clarkbmordred: ya you just have to dig through the ui. if you do the account dropdown it only shows openrc. for clouds.yaml you have to go to Project -> Api Access -> Download OpenRC file -> change it to clouds.yaml00:09
*** corvus has quit IRC00:09
* mordred should make a patch00:10
*** corvus has joined #opendev00:10
fungiindeed, part of the effort to raise up the universal sdk should at least mean the clouds.yaml is listed by default and you need to switch it to openrc if that's what you really want00:11
mordredalso - I keep meaning to make an "import clouds.yaml" command for osc00:11
mordredso that you could download a clouds.yaml from horizon, run "osc config import downloaded-file.yaml" and it would add that entry into your clouds.yaml - and potentially prompt you for a password (since the password will be missing from the downloaded file)00:12
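A minimal sketch of the import idea described above, assuming both files have already been parsed into dictionaries with a top-level `clouds` key. The function name `import_cloud` and the example cloud names and URLs are hypothetical; no such `osc` command existed at the time of this conversation.

```python
import copy


def import_cloud(existing, downloaded, password=None):
    """Return a new config with every cloud from `downloaded` merged in.

    Neither input is modified. If a downloaded entry has no password
    (as with files downloaded from horizon), `password` fills the gap.
    """
    merged = copy.deepcopy(existing)
    clouds = merged.setdefault("clouds", {})
    for name, entry in downloaded.get("clouds", {}).items():
        entry = copy.deepcopy(entry)
        auth = entry.setdefault("auth", {})
        if "password" not in auth and password is not None:
            auth["password"] = password
        clouds[name] = entry
    return merged


# Hypothetical example data standing in for the two parsed YAML files.
existing = {
    "clouds": {
        "old": {"auth": {"auth_url": "https://old.example/v3",
                         "username": "a", "password": "x"}},
    }
}
downloaded = {
    "clouds": {
        "new": {"auth": {"auth_url": "https://new.example/v3",
                         "username": "b"}},
    }
}
result = import_cloud(existing, downloaded, password="s3cret")
```

A real command would presumably prompt for the password interactively rather than take it as an argument.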
fungithat'd be neat00:12
clarkbya looking at this a bit more I think the "manual tasks" so far are the actual button clicking to make a cloud, then will be adding other root's ssh keys to the hosts, then creating the project/users for nodepool things.00:12
clarkbI'm sure other stuff will come up too like quotas00:12
clarkbbut so far that's my list of what is required00:13
fungimaybe horizon should even start to suggest that command (once it exists)00:13
fungiclarkb: well, (re)building the mirror has rather a few manual steps later00:13
clarkbfungi: ya I mean more from the perspective of "how is this different than the other cloud we consume"00:13
fungioh, yep00:14
*** corvus has quit IRC00:15
*** corvus has joined #opendev00:17
clarkb those are their bootstrapping docs too fwiw which can help others who may want to take a look00:22
clarkbessentially there is a cloud provisioning dashboard on their website and from that you get a cloud with horizon and apis and such00:22
clarkbnext up I've discovered I need to learn about roles00:24
clarkbthere are also groups?00:25
fungifancy modern keystone, yep00:26
*** tosky has quit IRC00:27
clarkbI think what we want is two users with role member. Then a project for each. To most properly mimic our other setups. I don't think we need any groups00:27
fungisounds right to me00:27
clarkbah ok I think I've hit my first major snag. I don't see https for horizon or api access00:29
clarkbthe cloud provisioner api is all ssl'd, but the layer under that for the cloud itself isn't00:30
clarkbI can do a port forward for bootstrapping things, but I think we really want nodepool to talk https, so why don't I ask if they have a way to do that already00:30
fungiyep, better than hacking something together ourselves when they already have a clean solution00:31
fungi(hopefully they do)00:32
clarkbI'm looking around on the host too but not finding a 443 listener00:32
corvusclarkb: agree re two users with member role00:47
clarkbsounds like there isn't a magic ssl setup we can lean on. But I think we may be able to set up a proxy on the hosts ourselves? I've provided this as feedback that something even self signed is important. I also let them know that clouds.yaml can verify self signed pretty easily so for api access at least it's not terrible00:48
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup: add a few more global excludes
corvusclarkb: seems like theoretically a service offering could use LE.  or if we use a proxy, maybe we could use LE?00:49
clarkbcorvus: yes, and yes. The first thing is sorting out how the external access is all routed through kolla and stuff (there is a VIP of some sort) and I figured self signed would make that easy00:50
clarkbbut ya we should be able to have them generate LE certs00:50
*** dviroel has quit IRC00:50
fungican it be exposed through whatever's serving horizon?00:52
clarkbno I think we'll be putting it in front of horizon00:53
clarkbhorizon is running out of containers as part of their deployment and I worry that if you start changing that too much you'll trip up the automation for the cloud stuff00:54
clarkbI should go help figure out dinner now. I'll look at the network setup and try to figure out how packets for the vip end up on the controllers and from that we can make plans for terminating ssl00:55
clarkber in the morning I mean00:55
fungiyes, definitely go dinner00:56
fungiand i'm a gonna drink00:56
openstackgerritIan Wienand proposed opendev/system-config master: ask: stream db backup
*** akrpan-pure has quit IRC01:23
*** DSpider has quit IRC01:54
*** hamalq has quit IRC02:14
openstackgerritIan Wienand proposed opendev/system-config master: ask: stream db backup
*** lbragstad has quit IRC02:35
fungizuul's getting close enough to idle i'm going to restart the zuul-executor containers on ze08 and ze12 to restore their missing console streamers02:43
ianwreverting mysql-client-5.7_5.7.32-0ubuntu0.16.04.1_amd64.deb makes dumps work again02:44
fungi#status log restarted the zuul-executor container on ze08 and ze12 to restore their console streamers, previously lost to recent oom events02:47
openstackstatusfungi: finished logging02:47
openstackgerritYanos Angelopoulos proposed openstack/diskimage-builder master: Prevent pip2 to try to update to version > 21
openstackgerritMerged opendev/system-config master: borg-backup: add a few more global excludes
openstackgerritMerged opendev/system-config master: ask: stream db backup
openstackgerritIan Wienand proposed opendev/system-config master: translate: backup zanata db directly to borg
*** lbragstad has joined #opendev03:15
ianwafter the askdb stuff there merges, i will purge it and re-create it, which should give us enough room to start backing up the wiki server03:26
ianwthe end result being we can retire bup03:26
*** brinzhang_ has quit IRC03:26
*** hemanth_n has joined #opendev03:35
*** brinzhang_ has joined #opendev03:35
openstackgerritMerged opendev/system-config master: translate: backup zanata db directly to borg
*** zimmerry has quit IRC04:55
*** ysandeep|away is now known as ysandeep04:55
*** ysandeep is now known as ysandeep|ruck04:55
*** zimmerry has joined #opendev04:56
*** ykarel has joined #opendev05:02
*** DSpider has joined #opendev05:28
*** ykarel has quit IRC05:50
*** ykarel has joined #opendev05:51
openstackgerritIan Wienand proposed opendev/system-config master: ask: fix backup typo and ignore live postgresql
*** eolivare has joined #opendev06:47
*** slaweq has joined #opendev07:02
*** marios has joined #opendev07:02
openstackgerritMerged opendev/system-config master: ask: fix backup typo and ignore live postgresql
*** rpittau|afk is now known as rpittau07:57
*** ralonsoh has joined #opendev08:02
*** sboyron has joined #opendev08:06
*** ysandeep|ruck is now known as ysandeep|lunch08:09
*** andrewbonney has joined #opendev08:18
*** hashar has joined #opendev08:24
*** ysandeep|lunch is now known as ysandeep08:33
*** ykarel is now known as ykarel|away08:39
*** tosky has joined #opendev08:41
*** ykarel|away has quit IRC08:48
*** fressi has joined #opendev08:52
*** jpena|off is now known as jpena08:57
*** ysandeep is now known as ysandeep|rover09:07
openstackgerritMartin Kopec proposed opendev/system-config master: Deploy refstack with ansible docker
*** JayF has quit IRC09:32
*** zimmerry has quit IRC09:32
*** JayF has joined #opendev09:36
openstackgerritLuigi Toscano proposed opendev/irc-meetings master: Fix the ID of the OpenStackSDK meetings
*** zbr is now known as zbr|pto09:54
zbr|ptofungi: I did my bit on --- now we need ok from others, maybe ianw clarkb or mordred09:55
*** dtantsur|afk is now known as dtantsur10:01
*** brinzhang_ has quit IRC10:08
*** Tengu has quit IRC10:30
*** hemanth_n has quit IRC10:52
*** dviroel has joined #opendev10:57
*** Tengu has joined #opendev11:08
*** mciecierski has joined #opendev11:24
*** ysandeep|rover is now known as ysandeep|afk11:35
*** zimmerry has joined #opendev11:43
*** dmellado has quit IRC12:21
*** dmellado has joined #opendev12:23
*** ysandeep|afk is now known as ysandeep|rover12:32
*** jpena is now known as jpena|lunch12:32
*** artom has joined #opendev12:37
*** eolivare_ has joined #opendev12:56
*** lbragstad_ has joined #opendev12:56
*** eolivare has quit IRC12:56
*** lbragstad has quit IRC12:56
openstackgerritMerged opendev/irc-meetings master: Fix the ID of the OpenStackSDK meetings
*** jpena|lunch is now known as jpena13:31
*** tkajinam has quit IRC13:50
openstackgerritDaniel Blixt proposed zuul/zuul-jobs master: Allow mirror push to delete current branch
*** jpena is now known as jpena|dojo13:51
openstackgerritDaniel Blixt proposed zuul/zuul-jobs master: Allow mirror push to delete current branch
*** zoharm has joined #opendev13:54
openstackgerritDinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos
*** chandankumar is now known as raukadah13:59
*** whoami-rajat__ has joined #opendev14:01
mordredfungi: was just reviewing the bindep stack (all lgtm) - the description-file -> long-description change made me think that should be a pbr patch14:30
mordredlike - pbr should translate use of description-file to the two long description fields, no?14:31
mordred(obviously if someone gives the long description fields, that's cool)14:31
mordredI mean - actually the more that I think of it, in keeping with pbr's philosophy - I'm not sure why pbr can't just find README, README.rst or and dtrt without having someone have to provide that info at all14:32
*** artom has quit IRC14:48
openstackgerritMerged opendev/bindep master: Move all jobs in-repo
openstackgerritMerged opendev/bindep master: Build docs for OpenDev
openstackgerritMerged opendev/bindep master: Remove release note about rpm path references
*** mciecierski has quit IRC14:54
fungimordred: yeah, it could be argued that pbr should also simply step out of the way if there are standard setup.cfg fields setuptools already understand14:54
fungibut with setuptools use on the decline i don't know how much i want to wrestle it14:55
fungianyway, this is basically the same change i applied to zuul-client14:55
fungiand am expecting to do similar for, e.g., git-review14:55
fricklermordred: fungi: the bindep docs promote job has failed, but I'm not sure what the error is
mordredfungi: yah - I could see the argument of having it step out of the way - but on the other hand I've always seen part of its job to be taking care of fiddly stuff for you so you don't have to15:00
*** artom has joined #opendev15:00
mordredand while those long description fields are neat - they aren't really a pleasant interface :)15:00
fricklerah, maybe it was superseded, it passed on the next patch15:00
*** ysandeep|rover is now known as ysandeep|away15:03
*** lbragstad_ is now known as lbragstad15:03
fungifrickler: oh, thanks for spotting, i'll double-check that but yeah there was a patch to move jobs in-repo and another to add the opendev docs publication jobs, so maybe i should have squashed something15:12
fungii half expect to find fallout i need to clean up from switching zuul tenants anyway, so need to check over a few things closely15:14
*** rpittau is now known as rpittau|afk15:17
openstackgerritMerged opendev/bindep master: Overhaul Python package metadata
*** redrobot3 has joined #opendev15:45
*** redrobot has quit IRC15:49
*** redrobot3 is now known as redrobot15:49
openstackgerritMerged opendev/bindep master: Update contributor doc and readme
*** tosky has quit IRC16:29
*** fressi_ has joined #opendev16:36
*** fressi has quit IRC16:36
*** fressi_ is now known as fressi16:37
fungiclarkb: ianw: almost glacial, but the stable proposed updates request to approve a patched openafs into debian/buster was filed today:
openstackDebian bug 982002 in "buster-pu: package openafs/1.8.2-1" [Normal,Open]16:49
*** marios is now known as marios|out16:49
clarkbfungi: better late than never. I guess it was only 1.6 that started working again in february16:49
fungithough i don't think we consume tpu in our nodes, so might still need to wait for a stable point release of buster16:50
clarkbour ppa should work until then16:50
fungithey weren't able to convince the security team that it qualified for inclusion in debian-security16:51
fungiand non-security-related package updates into stable take waaaaay longer16:51
clarkbI suppose that it isn't a security issue if it just fails to work :)16:51
*** marios|out has quit IRC16:56
*** zoharm has quit IRC17:00
fungifrickler: taking a closer look at that initial opendev-promote-docs failure i think the problem is it ran even though we didn't do a docs build in the gate, so it failed to download the nonexistent artifact from that17:01
fungitechnically not a download failure, but it tried to reference a list index in a null list17:02
fungias a result of having no artifact to retrieve17:02
fungiwe could probably make the "download-artifact: Parse build response" task a little more robust there, but it's a corner case i'm tempted to just ignore17:03
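A rough sketch of the kind of defensive parsing being discussed, assuming the build query returns a list of build dictionaries that each carry an `artifacts` list. The response shape, the function name `first_artifact_url` and the `docs_archive` type are illustrative assumptions, not the actual task in zuul-jobs.

```python
def first_artifact_url(builds, artifact_type="docs_archive"):
    """Return the URL of the first matching artifact, or None.

    Handles a missing or empty build list and builds without artifacts,
    instead of indexing into a list that may be empty.
    """
    for build in builds or []:
        for artifact in build.get("artifacts") or []:
            metadata = artifact.get("metadata") or {}
            if metadata.get("type") == artifact_type:
                return artifact.get("url")
    return None


# No gate build produced an artifact: the lookup returns None
# rather than raising an index error.
missing = first_artifact_url([])

# Hypothetical response with one docs artifact.
found = first_artifact_url([
    {"artifacts": [
        {"name": "docs", "url": "https://example.invalid/docs.tar.gz",
         "metadata": {"type": "docs_archive"}},
    ]},
])
```

A caller can then skip the download step cleanly when the result is None.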
*** klonn has joined #opendev17:05
corvusclarkb: your gerrit status is turkey time17:08
fungiit's always turkey time!17:09
fungikinda like how everyday is hallowe'en17:09
corvusand the eternal september?17:09
fungiwe don't speak of that17:10
clarkbI never updated it after november17:10
*** ralonsoh has quit IRC17:14
*** diablo_rojo has joined #opendev17:20
fungii'm sort of surprised moving bindep out of the openstack tenant hasn't created any new zuul config errors, but no matches for bindep anywhere in the mess at
fungiand aside from the aforementioned docs promote first-time race and a mysterious retry on a single tox-pep8 run, all the bindep jobs are succeeding consistently17:25
fungiand documentation is now correctly redirected to and we have release notes getting published for it, so i think it's time to tag17:26
fungiinfra-root: any objections to tagging a bindep 2.9.0 release shortly?17:27
fungiactually i may tag just to make sure the release jobs are functional after all the job switcheroo17:28
clarkbtagging an rc first sounds like a good idea17:29
clarkbno objections from me17:29
*** eolivare_ has quit IRC17:30
fungibombs away17:31
fungididn't trigger anything, checking the pipeline defs17:36
fungiyeah, our release pipeline doesn't match on prerelease strings, working on a patch17:38
openstackgerritJeremy Stanley proposed opendev/project-config master: Recognize PEP-440 prereleases in release pipeline
fungiinfra-root: ^ i expect my explanation there makes sense for why we shouldn't need a separate prerelease pipeline, but please do let me know if it seems like a bad idea17:43
fungi(note this is specifically for the opendev tenant in zuul)17:43
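To illustrate why a prerelease tag would not trigger a release-only pipeline, here is a small sketch with two hypothetical tag patterns. Neither regex is taken from the actual project-config change, and the tag strings are examples only.

```python
import re

# Hypothetical pattern matching only final releases such as 2.9.0.
RELEASE_TAG = re.compile(r"^[0-9]+(\.[0-9]+)*$")

# Hypothetical pattern that also accepts PEP 440 alpha, beta and
# release-candidate suffixes such as 2.9.0a1, 2.9.0b2 or 2.9.0rc1.
RELEASE_OR_PRE_TAG = re.compile(r"^[0-9]+(\.[0-9]+)*((a|b|rc)[0-9]+)?$")


def matches(pattern, tag):
    """Return True if the whole tag name matches the pattern."""
    return pattern.match(tag) is not None


final_only = [t for t in ("2.9.0", "2.9.0rc1") if matches(RELEASE_TAG, t)]
with_pre = [t for t in ("2.9.0", "2.9.0rc1") if matches(RELEASE_OR_PRE_TAG, t)]
```

With the first pattern only the final release tag is selected; the second selects both.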
clarkbfungi: do you remember why we split them before?17:44
fungiclarkb: in the beforetime, in the longlongago, pip didn't know to avoid installing prereleases automatically, so we only published wheels and not tarballs, hence ran separate jobs between the two pipelines17:45
fungii think we may have in some cases also avoided running specific jobs to do things like update documentation if it was a prerelease17:46
*** klonn has quit IRC17:47
fungione other thing i'm not sure how to tackle cleanly, looks like we only publish docs when new changes merge, so they'll have references to the old release until the first change merges after a release is tagged. an obvious workaround is to merge something trivial soon after releasing17:48
*** jpena|dojo is now known as jpena|off17:48
fungiback when we did docs builds and uploads in post, it was fairly easy to make those jobs also be able to run similarly in release17:49
fungibut with the gate+promote it seems like that's not as simple to apply to something you trigger after tagging a release17:50
fungiprobably not urgent to solve, just means we have an incentive to review/merge stuff soon after any release17:50
clarkbgerrit account cleanup update: tbachman has two accounts, one of which is the actively used account, and that one had a "preferred email address has no external id" issue. The preferred email address that was set was in use by the other account (but the other account wasn't used for reviews or code pushing)17:53
clarkbtalking to tbachman we decided that setting the preferred email address on the active account to match the email address associated with the openids on that account was the easiest option. Trying to add external ids for the preferred email address would not work because that would conflict with the other account17:54
clarkbtbachman made this change through the web ui on review-test, then we tested that log out and log in shows the account id did not change. We also checked that reviews and code pushes could still be made17:54
clarkbsince that all looked good tbachman went ahead and made the change on prod and that is one less account with consistency issues17:54
clarkbMy next step is to disable the other account to reduce confusion when people try to add reviewers17:55
clarkbI have not done that yet, but am proceeding with it now17:55
clarkbposting all this to channel as I think there may be some other accounts where we just set the preferred email to match the openid (though maybe without user intervention)17:55
fungithanks for the walkthrough! i guess after the old account is disabled you're going to rerun the consistency check and see if those accounts no longer appear as errors?17:56
clarkbfungi: "yes". I was hoping to get through a few more accounts before rerunning the consistency check, but if we think that is prudent to do first I can go ahead and do that17:57
clarkbfungi: review-test:~clarkb/gerrit-consistency-notes/further-preferred-email-cleanups <- for my notes on cleaning up another 20 or so. They are arranged in roughly most confident to least confident in plan from beginning to end of that file if you want to take a look17:57
clarkbsort of related: I was just looking at gerrit set-account to get the command right for --inactive. It looks like the ssh api for accounts can remove and add and set preferred emails. Maybe we can use this as a way around the lack of a rest api for updating external ids? at least for emails17:59
clarkbI'll file that away for further testing on review-test if it looks like it will be useful18:00
clarkband the second account for tbachman has been set inactive18:02
fungiyeah, ideally we could. having seen the conditions under which those cli calls fail though, i'm skeptical we'll be able to rely on it to fix broken accounts18:02
clarkbya will definitely need testing18:02
fungiit seems to give up in the face of incomplete account values18:02
*** dtantsur is now known as dtantsur|afk18:06
clarkbfungi: maybe if you get a chance you can look over the first set of accounts in that further-preferred-email-cleanups, there are 8, and I can run the script from last time to retire them if that plan looks good to you. Then rerun the consistency checks?18:06
clarkbthat set are accounts that I suspect we merged but left behind ssh username external ids (since you can't merge those like the openids as far as I know)18:07
fungican do18:07
clarkbI will run a consistency check against -test now and confirm tbachman's account is happy there though18:09
clarkbfungi: I've +2'd the pipeline change for opendev releases.18:13
clarkbconsistency check on -test shows tbachman's problem is gone from that perspective too \o/18:16
fungiclarkb: looking at your notes, are ssh usernames now separate from http usernames?18:22
fungior are they differentiated by whether or not they have a password?18:23
clarkbfungi: oh I guess not. I've always thought of them as ssh because the http is newer (though old at this point)18:23
clarkbI believe none of them had passwords because those are external ids as well and I tried to note all the different types of external ids they had but let me double check that18:23
fungiokay, so when it says "ssh username external id" that's specifically a "username:something" external id18:24
fungiused to be username external ids had a password in that record as a separate column/field18:25
clarkbyup they still do. But that first account in my list does not have that record.18:25
clarkbI can check the other 7 really quickly too18:25
clarkbso ya if the http passwd is set it would show up in the username:foo record. But if not set it doesn't have a line for that18:25
fungiso usually if there was a username and an ssh public key then it might be in use for ssh access, while a username with a password then it might have been used for rest api access18:26
fungi(the ssh public key being in a separate table/store of course)18:26
fungialso they could have a username with no password and no ssh keys for the account, fairly common for older accounts since the openid used to autopopulate a username18:27
fungiin those cases the account was at most only being used with the webui18:27
fungisimilar to accounts with no username18:27
clarkball 8 of those have no http passwd. 7 of them have only the one external id for the username. The odd one out has  a mailto externalid as well18:28
clarkbI had classified these separately because they all had a second account that appeared to be the actual one18:28
fungiright, so almost guaranteed to be relics of older duplicate account merges/fixes18:29
clarkbthe other account having openids and such18:29
clarkbthe lack of openids on these being the main clue18:29
clarkbwell when coupled with a secondary account having openids18:29
fungiyeah, only accounts which had been hand-fixed for situations like that or old manually created service accounts for bots and third-party ci systems would lack openids entirely18:30
clarkbThe next group is the third party ci category18:31
clarkbthere are two in that one, neither show up on the wiki today so I was thinking we can retire them too and if they need CI for that a new account set up in the modern method would be best18:32
fungiyeah, i agree with your analysis on both those groups18:32
clarkbthen as you go further in that file my confidence in various classifications becomes weaker, though for the most part I think we can retire those accounts. tbachman was top of the "ok we need to actually figure this out and not retire it" list18:32
clarkbfungi: cool, should I go ahead and use my script from last time to retire the first 2 groups (10 accounts) ?18:33
fungiyes, i think that's safe18:34
fungidid you want to adjust the invocation to catch stderr this time?18:34
clarkbI can18:34
fungiotherwise the set -x isn't super helpful as we miss it in the log18:34
clarkbthen I can upload the resulting log file next to the existing one on review18:34
fungisounds great18:35
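A small sketch of the logging point above: shell tracing from `set -x` is written to stderr, so a log that captures only stdout misses it. This assumes a POSIX `sh` is available; the command being run is a placeholder, not the actual retirement script.

```python
import subprocess


def run_with_trace(script):
    """Run a shell snippet with tracing and return stdout and stderr combined.

    `sh -x` behaves like `set -x` inside the script. Redirecting stderr
    into stdout keeps the trace lines in the same captured log.
    """
    completed = subprocess.run(
        ["sh", "-xc", script],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return completed.stdout


# Placeholder command; the trace line for it appears alongside its output.
log_text = run_with_trace("echo hello")
```

The returned text can then be written to the log file that gets uploaded next to the earlier one.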
clarkbwill take me a few minutes to page in how that all worked and get perms sorted out but proceeding with that shortly18:35
*** klonn has joined #opendev18:35
fungiinfra-root: i'm self approving 774291 to augment the release pipeline in the opendev tenant for prerelease version strings for now, but if you disagree please don't hesitate to revert it18:35
openstackgerritMerged opendev/project-config master: Recognize PEP-440 prereleases in release pipeline
*** tosky has joined #opendev18:37
fungiand now that's merged, i'm manually reenqueuing the bindep tag ref18:40
fungian opendev-release-python build is queued for it now18:41
fungiand it's running18:42
fungiyay! shows a now18:45
fungithe new project links appear correctly in the left sidebar of too18:46
fungionce i'm done inspecting the sdist and wheel contents and doing some install tests from pypi, i'll tag 2.9.0 proper and put together a release announcement with the release notes we've got for it18:49
clarkbthe first set of 8 account updates is done. Doing the couple of CIs next (split them in order to change the commit message)18:49
clarkband the CI accounts are done now too. That is a total of 8 + 2 + tbachman's 1 = 11 more fixed account issues18:54
clarkbI'll work on rerunning the consistency checker next18:54
fungistrange, i thought the license_files key in setup.cfg would cause AUTHORS and LICENSE to be included in the wheel, but they aren't as far as i can see18:56
clarkbyou may need MANIFEST.in?18:56
clarkbsince pbr is figuring out the file list itself maybe it overrides some setuptools defaults?18:56
clarkbI put my log file at review:~clarkb/gerrit_user_cleanups/user-retirement.log.20210205 too btw18:57
fungiactually, i'm mildly confused by which files it's decided to include18:57
fungifor example the wheel includes bindep/releasenotes/notes/use-distro-library-db71244a0a5cf1dd.yaml but nothing else from the releasenotes tree18:57
clarkbI thought pbr was supposed to do anything it thought was part of the python package (eg things in dirs with and then also things in MANIFEST.in18:59
clarkbbut I'd have to go read docs/code to be sure of that18:59
*** klonn has quit IRC19:02
clarkbI have requested a consistency check from prod. Now we wait19:02
clarkbdiffing older consistency report and one just generated looks like I expected. There were 11 additional issues last time. Those are now gone. There are no new issues19:09
clarkbthere are 17 preferred email missing external id issues now. (Down from 109 when we started)19:09
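The report comparison described above can be sketched as a set difference over report lines. The line format and the sample entries here are made up for illustration; the real consistency checker output may look different.

```python
def compare_reports(old_lines, new_lines):
    """Compare two consistency reports line by line.

    Returns the problems that disappeared and any that newly appeared.
    Blank lines and surrounding whitespace are ignored.
    """
    old = {line.strip() for line in old_lines if line.strip()}
    new = {line.strip() for line in new_lines if line.strip()}
    return {
        "resolved": sorted(old - new),
        "introduced": sorted(new - old),
    }


# Hypothetical report lines.
old_report = [
    "account 1001: preferred email has no external id",
    "account 1002: preferred email has no external id",
    "account 1003: preferred email has no external id",
]
new_report = [
    "account 1003: preferred email has no external id",
]
summary = compare_reports(old_report, new_report)
```

An empty `introduced` list corresponds to the "no new issues" result reported above.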
clarkbthe rest of them are classified in my document I noted above. I just think that more careful review will be needed on those19:10
clarkbI'm going to deescalate my privs now as things look to check out to me19:10
fungicomparing the published bindep wheel to one of my personal projects, i would expect to see AUTHORS and LICENSE inside the dist-info tree19:12
fungihuh, when i build a bindep wheel on my workstation, those files are included19:14
fungiso this may come down to setuptools/wheel versions19:15
*** andrewbonney has quit IRC19:15
mordredyeah - I thought pbr put the two of those into the dist by default19:17
mordredoh - I think pbr ensures they wind up in the source dist19:17
fungiyeah, this may be because we're running the job on ubuntu-bionic, i wonder if ubuntu-focal will fare better19:18
openstackgerritJeremy Stanley proposed opendev/bindep master: Build releases on ubuntu-focal
fungistepping away to prep for dinner while that runs19:27
openstackgerritMartin Kopec proposed opendev/system-config master: Deploy refstack with ansible docker
openstackgerritJames E. Blair proposed openstack/project-config master: Disallow fail-fast in Zuul
corvusfungi: ^ i believe some of us are under the impression that patch was already written and merged, but apparently it was not.  apparently octavia has managed to find and use that option without us knowing about it; and iurygregory was asking about using it.19:33
corvusi'm not in a position to advocate for or against it at this point, but i did want to point out the omission and offer a correction.  feel free to reject that patch if the policy has changed.19:34
johnsomcorvus It was documented....  Octavia only uses it for the gate pipeline, which isn't going to bring much developer value.19:35
corvusjohnsom: documented where?19:35
corvusi'm sorry if i missed the policy change19:36
corvusjohnsom: oh, yeah, i know it's documented in zuul, i was intimately involved in the creation and review of the change to implement it.  :)19:38
johnsomThe only docs we have... grin19:38
corvusjohnsom: but i mean the openstack-infra team looked closely at the situation some time ago and decided that we did not want it used in openstack's zuul19:39
johnsomThe value for us is the gate jobs can run hours, so if one of them fails there really is no point holding up a bunch of instances for hours bringing no value.19:39
corvusso we actually designed the feature in zuul to make it possible to force it off19:39
johnsomHmmm, well, that discussion wasn't communicated. The Octavia team voted to enable it two years ago19:40
corvusyeah, clearly there was an oversight19:40
corvuslike i said, i think most of us thought the option was already set that way19:41
*** sboyron has quit IRC19:41
corvusjohnsom: i see the argument for allowing it in gate, since, iff you have a clean-check system where check is required for gate, then users should have been exposed to all the errors already.  it's certainly a nuance worth considering.19:42
corvus(however, i'm an advocate of not having clean-check, so patches can go directly to gate, and in that case, fail-fast would be detrimental)19:43
corvus(but openstack *does* have clean-check)19:43
mordredthe whole stack should likely be considered holistically - developer patterns are a bit different now than they were when we put in clean-check in the first place19:43
johnsomYeah. I agree it's not helpful for the check pipeline, but gate is a different story.19:44
corvusthe biggest concern is that at a quick glance, most people say "oh i want fail-fast so we use less resources and get info faster" and that's counterintuitively often the opposite.19:45
johnsomYeah, totally for check. But most gate pipeline failures are infra/nodepool instance/etc. failures.19:46
corvusto be honest, i'm not sure we actually evaluated the impact of fail-fast in a dependent pipeline19:46
johnsomNo point in letting them run 2+ hours when they are all going to fail because someone broke a post script19:46
mordredalso - in gate - quicker gate resets have a potential knock-on effect19:47
mordredcorvus: yah - I don't know that I'd really thought about it for gate before today19:47
corvuswhat happens to a change in gate if fail-fast is enabled?19:47
corvus(the user story for adding the feature was all about check; there are no tests of it in gate, so it technically has undefined behavior)19:47
johnsomThe job fails, the other jobs are cancelled, and zuul votes -119:47
corvusjohnsom: what if it's not the leading change?19:48
johnsomIt has worked perfectly for two years... lol19:48
johnsomIt's the same, the finished jobs still show complete.19:48
corvusbut it stays in the queue, right?19:49
corvus(ie, if change B follows A in the gate pipeline, and change B fails one job, then change B should cancel all remaining jobs, and wait there until A completes; if A fails, B should restart, and if A succeeds, B should report -2).19:50
mordredcorvus: what about change C behind B - C should restart immediately when B fails on nearest non-failing, right? but then if B restarts if A fails, C should restart again?19:52
johnsomOh, you mean a patch chain. I can't say I know for sure.19:52
johnsomWouldn't it be the same as any Zuul -2 vote?19:53
corvusmordred: yes19:53
johnsom"dependent patch failed" or something like that if I remember right19:53
corvusjohnsom: not necessarily a git dependency; zuul establishes ordered dependencies in dependent pipelines19:54
openstackgerritGomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Create a template for ssh-key and size
corvus(so those could be unrelated changes)19:54
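A toy model of the behavior corvus describes for a change that sits behind another change in a dependent pipeline. It only encodes the expectation stated in this conversation; it is not how Zuul is implemented, and the function and state names are hypothetical.

```python
def follower_outcome(follower_job_failed, ahead_result):
    """Decide what a change should do given the change ahead of it.

    follower_job_failed: True once any job of this change has failed.
    ahead_result: None while the change ahead is still running,
                  otherwise "success" or "failure".
    """
    if not follower_job_failed:
        return "keep running"
    if ahead_result is None:
        # Fail fast: stop spending nodes, but stay in the queue.
        return "cancel remaining jobs and wait"
    if ahead_result == "failure":
        # The change ahead leaves the queue, so this change is retested
        # against the new queue state.
        return "restart"
    # The change ahead merged, so the failure belongs to this change.
    return "report failure"


waiting = follower_outcome(True, None)
after_ahead_fails = follower_outcome(True, "failure")
after_ahead_merges = follower_outcome(True, "success")
```

The same reasoning would apply again to any change queued behind this one, which is the cascade mordred asks about.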
johnsomWell, I threw a vote on there for the conversation.20:08
*** auristor has quit IRC20:21
*** auristor has joined #opendev20:21
*** slaweq has quit IRC20:37
iurygregoryso it's not ok to use fail-fast? (for example in check pipeline?)20:38
*** klonn has joined #opendev20:45
fungiiurygregory: it's not a zuul feature the opendev systems administrators intended to expose, and its addition to zuul explicitly allowed disablement so that it wouldn't be accidentally enabled in opendev, we just neglected to actually disable it once it became available21:02
fungibut if projects in some tenants have been using it for two years, i suppose it's worth discussing keeping it21:03
fungiin unrelated news, running build-python-release on ubuntu-focal (a la 774299) does seem to correctly include the files from the license_files metadata in the resulting wheels, so i think that's our solution21:09
fungii'm just going to self-approve that and push an rc2 tag21:09
corvusfungi, iurygregory: i feel somewhat strongly it should not be used in openstack in check.  i have confidence in the analysis we did previously that doing so would result in more resource usage rather than less.21:10
fungicorvus: is it worse or better than the workaround some projects are using of setting slower jobs as dependent on their faster jobs?21:10
corvusi'm less confident that holds true for gate in openstack specifically, though i still lean in that direction.21:10
corvusfungi: what is being worked around?21:11
corvusit's definitely better than that21:11
fungithe desire to have changes report sooner if faster/simpler jobs fail21:11
fungii've seen a number of projects, counter to our recommendations, make things like devstack jobs depend on pep8 jobs21:11
fungialso not a good idea for the same reasons, i think, but we can't really disable that21:12
corvusis that why there's a conversation about how long it takes for jobs to report?21:12
fungithe main conversation on why it takes so long for jobs to report has to do with backup in the available node quota21:13
fungipeople are trying to find ways to reduce overall node utilization in openstack21:13
corvusi know, i read it; i just didn't see anything about devstack depending on pep8 with that21:14
corvuswhich is why i asked21:14
fungioh, that was going on for much longer, i don't think it was brought up specifically21:15
corvusif devstack depends on pep8, then *of course* the system is going to be slow :(21:15
fungiit's been in specific projects (e.g. tripleo), but i don't know who might be doing it today21:15
*** slaweq has joined #opendev21:16
corvusftr, that is about the best way to slow down the overall throughput.  so it's just another of the ways that this can be counterintuitive.21:17
corvus(if you tasked me with figuring out a way to slow things down, that would be #2 on my list)21:18
corvus("that" == slow jobs depending on fast ones)21:19
clarkbthe slowness conversation started because last week (or was it week before last?) the rtt for a new nova change was > 1 day21:20
clarkbwe ran at node capacity for basically the whole work week21:21
corvusi know that21:21
*** slaweq has quit IRC21:22
corvusi saw dan's analysis, i think that's great and i'm supportive21:22
fungiyeah, here's a current example of tripleo still doing that:
corvusi was just alarmed to see fail-fast suggested as a solution since we know it's not.21:22
fungithey only run their heavier jobs if linters and unit tests succeed21:23
corvusthat's going to seriously slow down the overall throughput21:23
corvusbut, tbh, that will probably make things better for nova if they don't share a change queue21:23
clarkbthey don't; tripleo is in one queue and nova + cinder + glance + neutron + tempest + devstack and probably a few others are in another21:24
corvusit's all interconnected.  tripleo doing that frees up nodes for nova, but it means an individual tripleo change takes much much longer than it otherwise would to report, and they are subject to more revisions and rechecks.21:27
corvus(which eventually ends up eating more capacity if there is any available)21:28
fungiunless it manages to drive away contributors ;)21:28
corvusit looks like our effective capacity is only 750 nodes?21:30
*** hashar has quit IRC21:31
fungiyeah, we're trying to get in touch with inap to see if they can clean up rogue instances. every time we spin them back up we get a bunch of ssh timeouts and changed host keys21:31
fungiit's been that way for months21:31
fungimgagne: looks like you're around at the moment! any chance you can get someone to take a look at that?21:31
mgagne@fungi: sorry about that, the "maintenance" has been in progress for several months now.21:31
fungimgagne: oh, no worries, just didn't know if it was something we had lost track of21:32
mgagneI tried poking people with sticks but those people depend on other people as well. I didn't know about the duplicated IPs issue until I took a look at previous merged changes.21:32
fungithe inmotion cloud clarkb's been working on bringing up may help relieve some pressure too21:33
mgagneI think it's better to keep it disabled for now until we figure out that situation. They are aware that you disabled the whole inap region, which I find unfortunate. But I don't have much control over it.21:33
clarkbmgagne: fwiw dansmith and melwitt thought there may be some known nova + cells issues to explain that behavior21:34
clarkbfungi: re inmotion I think the initial IP allocation is quite small ~/28 so unlikely to immediately help, but they say ipv6 is planned so hopefully we'd be able to transition from /28 small resources to ipv6 many resources21:34
mgagneI'm sure there is, we are stuck with Mitaka right now, there is unfortunately no plan to upgrade it.21:34
fungiclarkb: presumably you mean ipv4 /28 (for ipv6 that'd be... huge?)21:35
clarkbfungi: yes /28 for ipv421:35
fungibut yeah, ~14 usable addresses probably not a big win yet21:35
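[editor's note] The "~14 usable addresses" figure for an IPv4 /28, and fungi's remark that an IPv6 /28 would be huge, can be checked with Python's standard ipaddress module. The prefixes below are documentation ranges chosen purely for illustration; they are not the cloud's real allocation.

```python
import ipaddress

# Example prefixes only (documentation ranges), not the real allocation.
v4 = ipaddress.ip_network("192.0.2.0/28")
v6 = ipaddress.ip_network("2001:db8::/28", strict=False)

v4_total = v4.num_addresses       # every address in the block
v4_usable = len(list(v4.hosts())) # excludes network and broadcast addresses
v6_total = v6.num_addresses

print(v4_total, v4_usable)
print(v6_total == 2 ** 100)
```

In practice a gateway or other infrastructure addresses may reduce the usable count further, which is consistent with the "~" in the log.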
clarkbipv6 isn't deployed yet but is planned21:36
fungithat'll be cool21:37
melwittthe thing I was talking about re: duplicate IPs that time was if an instance is deleted while nova-compute is "down" and 'nova-manage db archive_deleted_rows' happens to run [via a cron or such] while nova-compute is down, it will leave a libvirt guest running that will never be reaped and it could be using an IP that was freed and could be given back out to a new instance21:42
melwittthe way to avoid that is to make use of the '--before <date/time>' option to archive_deleted_rows to give a buffer zone for down nova-compute issues to be resolved before its instances are swept away by an archive21:43
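[editor's note] A minimal sketch of the buffer melwitt describes: compute a cutoff date some number of days in the past and pass it to archive_deleted_rows via --before. The helper name and the date format are assumptions for illustration, and the buffer length is up to the operator. Whether --before is available depends on the nova release in use, so check that before relying on it.

```python
from datetime import datetime, timedelta
from typing import List


def archive_command(now: datetime, buffer_days: int) -> List[str]:
    """Build a nova-manage invocation that only archives rows deleted
    before (now - buffer_days). Hypothetical helper, not part of nova."""
    cutoff = now - timedelta(days=buffer_days)
    return [
        "nova-manage", "db", "archive_deleted_rows",
        "--before", cutoff.strftime("%Y-%m-%d"),
    ]


# Illustrative values: the date of this log and a 90-day buffer.
cmd = archive_command(datetime(2021, 2, 5), 90)
print(" ".join(cmd))
```

The buffer has to be longer than the longest expected nova-compute outage, otherwise rows for instances on a down host can still be archived before the host comes back to clean up its guests.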
mgagnewe do run the archive_deleted_rows cron21:43
iurygregoryfungi, corvus gotcha =)21:44
mgagneour maintenance includes putting down a bunch of compute nodes. those have been down for several weeks so our buffer would need to be... like months.21:44
openstackgerritMerged opendev/bindep master: Build releases on ubuntu-focal
corvusremote: Add a test for fail-fast in the dependent pipeline [NEW]21:45
melwittack. it's defaulted to --before 90 days in tripleo fwiw21:45
corvusjohnsom, mordred: ^ i have confirmed the behavior in zuul is as i described earlier, so that's good.  there's a test so we don't regress.21:46
mgagnemelwitt: thanks for teaching me about that option, I'll see if we can enable it. Unfortunately, I don't think it would have prevented that specific situation since computes were down for several weeks. But it could help "normal" use in the future.21:49
mgagnebut it could also be that the issue was caused by "normal" use, we did have that issue several times in the past without any down compute nodes.21:50
fungii've pushed for bindep, looks like it correctly triggered the release pipeline this time, will confirm the uploaded wheel has the missing files once that completes21:51
melwittmgagne: can you remind me, are you on cells v1? I'm remembering now that dansmith was thinking along the lines of things related to cells v1 could be going on21:53
mgagneyes, cells v1 is used, we have Nova Mitaka /shame21:53
mgagne~1 year ago, we had a plan to upgrade to Queens, politics happened and here we are today.21:54
melwittok, yeah. dan was pointing out how cells v1 involves syncing data up and down to/from the "api cell" and the "compute cells" and failure to sync could maybe present this kind of issue21:54
melwittthat is, we've had and have a lot of problems around that syncing mechanism which drove the change to "cells v2"21:55
mgagneI'm sure it does, I also found that the "reaper" doesn't work well with cells v1. You can end up with orphans on the compute nodes and it will never find them out.21:55
fungiokay, bindep- on pypi has the expected contents, so i'll tag 2.9.0 now21:57
mgagneOne challenge we had for the upgrade is that Nova is kind of coupled with Ironic in our case. We can't easily fast-forward without upgrading both.21:57
mgagneIronic changed drivers architecture so we would have to address that too since we do have custom drivers. + introduction of placement. + migration to cells v2. Lot at the same time to push forward.21:57
melwittok yeah, then you've already found this I think. that reap task is how the rogue vms get cleaned up and if it's not working right, then that would definitely get you the duplicated IP problem (rogue vm still using IP and it's given out to a new vm)21:57
mgagneI think instance is deleted at API cell level but not compute and reaper reads from compute cell database. and there is nothing to fix the discrepancies when that happens.21:59
mgagneor could it be 2 years ago? my memory is very bad with time, now I feel old.21:59
melwittyeah... that makes sense. I don't recall off the top of my head about how what we call "local delete" works in cells v1 but what you're saying makes sense22:00
mgagneIt seems we should be able to bring the region back online next week. I'll be sure to poke this channel when ready.22:01
melwitthm, looks like we don't free the network during local delete, so it seems like it shouldn't result in the IP being given out again22:03
melwitt(in mitaka)22:03
melwitter sorry, I think it would. I misread this 'if self.cell_type != api' as meaning it wouldn't free the network but it would do it at the compute cell level
openstackgerritJeremy Stanley proposed opendev/project-config master: Correctly match releases as well as prereleases
fungiclarkb: ^ i've tested that locally with both release and prerelease tag refs22:06
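[editor's note] The actual pattern from the project-config change is not shown in the log. The sketch below uses a hypothetical regex and made-up tag names only to illustrate the kind of local check fungi mentions: one pattern that accepts both release and prerelease tag refs while rejecting other refs.

```python
import re

# Hypothetical pattern, not the one merged in project-config.
TAG_REF = re.compile(r"^refs/tags/\d+(?:\.\d+)*(?:(?:a|b|rc)\d+)?$")

# Made-up refs for illustration.
samples = [
    "refs/tags/1.2.3",     # release
    "refs/tags/1.2.3rc1",  # prerelease
    "refs/tags/foo",       # not a version
    "refs/heads/master",   # not a tag
]

results = {ref: bool(TAG_REF.match(ref)) for ref in samples}
for ref, ok in results.items():
    print(ref, ok)
```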
* fungi sighs22:07
fungibindep 2.9.0 didn't get enqueued when pushed because of that. i'll manually reenqueue it once that merges22:08
fungino hurry on it22:08
*** rchurch has quit IRC22:29
*** rchurch has joined #opendev22:31
*** whoami-rajat__ has quit IRC22:39
*** DSpider has quit IRC23:08
clarkbfungi: +2 on the prerelease fix23:19
clarkbsorry I should've regex'd harder in my original review23:19
funginah, me too23:19
openstackgerritMerged opendev/project-config master: Correctly match releases as well as prereleases
*** JayF has quit IRC23:30
