*** lseki has joined #openstack-infra | 00:00 | |
*** tosky has quit IRC | 00:26 | |
*** yamamoto has joined #openstack-infra | 00:32 | |
*** yamamoto has quit IRC | 00:39 | |
*** yamamoto has joined #openstack-infra | 00:48 | |
*** yamamoto has quit IRC | 00:48 | |
*** yamamoto has joined #openstack-infra | 00:49 | |
*** mmethot_ has quit IRC | 00:51 | |
*** yamamoto has quit IRC | 00:54 | |
*** ramishra has joined #openstack-infra | 00:55 | |
openstackgerrit | wu.shiming proposed openstack/openstack-zuul-jobs master: Fix hacking min version to 3.2.0 https://review.opendev.org/760746 | 00:59 |
*** zzzeek has quit IRC | 01:13 | |
*** zzzeek has joined #openstack-infra | 01:16 | |
*** lbragstad has joined #openstack-infra | 01:37 | |
*** yamamoto has joined #openstack-infra | 01:37 | |
*** yamamoto has quit IRC | 01:39 | |
*** yamamoto has joined #openstack-infra | 01:40 | |
*** hamalq has quit IRC | 01:41 | |
*** yamamoto has quit IRC | 01:41 | |
*** yamamoto has joined #openstack-infra | 01:42 | |
*** yamamoto has quit IRC | 01:50 | |
*** lbragstad has quit IRC | 02:13 | |
*** lbragstad has joined #openstack-infra | 02:40 | |
*** kaisers has quit IRC | 02:48 | |
*** rcernin has quit IRC | 02:50 | |
*** ianychoi has quit IRC | 03:00 | |
*** rcernin has joined #openstack-infra | 03:00 | |
*** rcernin has quit IRC | 03:07 | |
*** rcernin has joined #openstack-infra | 03:20 | |
*** ianychoi has joined #openstack-infra | 03:21 | |
*** rcernin has quit IRC | 03:25 | |
*** lbragstad has quit IRC | 03:27 | |
*** rcernin has joined #openstack-infra | 03:31 | |
*** lbragstad has joined #openstack-infra | 03:33 | |
*** ociuhandu has joined #openstack-infra | 03:40 | |
*** rcernin has quit IRC | 03:40 | |
*** rcernin has joined #openstack-infra | 03:40 | |
*** ociuhandu has quit IRC | 03:49 | |
*** auristor has quit IRC | 03:49 | |
*** auristor has joined #openstack-infra | 03:53 | |
*** ykarel|away has joined #openstack-infra | 04:43 | |
*** lbragstad has quit IRC | 04:44 | |
*** mtreinish has quit IRC | 04:44 | |
*** ykarel|away is now known as ykarel | 04:47 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #openstack-infra | 05:33 | |
*** matt_kosut has joined #openstack-infra | 05:53 | |
*** ysandeep|away is now known as ysandeep | 05:53 | |
*** ysandeep is now known as ysandeep|ruck | 05:53 | |
*** lmiccini has joined #openstack-infra | 06:35 | |
*** rcernin has quit IRC | 06:45 | |
*** gnuoy has joined #openstack-infra | 06:48 | |
*** ramishra has quit IRC | 07:14 | |
*** vishalmanchanda has joined #openstack-infra | 07:22 | |
*** dklyle has quit IRC | 07:28 | |
*** ramishra has joined #openstack-infra | 07:37 | |
*** ralonsoh has joined #openstack-infra | 07:41 | |
*** eolivare has joined #openstack-infra | 07:42 | |
*** zzzeek has quit IRC | 07:55 | |
*** zzzeek has joined #openstack-infra | 07:56 | |
*** slaweq has joined #openstack-infra | 08:01 | |
*** hashar has joined #openstack-infra | 08:03 | |
*** jcapitao has joined #openstack-infra | 08:05 | |
*** sboyron has joined #openstack-infra | 08:11 | |
*** andrewbonney has joined #openstack-infra | 08:14 | |
*** rpittau|afk is now known as rpittau | 08:19 | |
*** gfidente has joined #openstack-infra | 08:35 | |
*** lpetrut has joined #openstack-infra | 08:36 | |
*** tosky has joined #openstack-infra | 08:40 | |
*** ykarel has quit IRC | 08:42 | |
*** matbu has quit IRC | 08:51 | |
*** matbu has joined #openstack-infra | 08:52 | |
*** jpena|off is now known as jpena | 08:56 | |
*** ysandeep|ruck is now known as ysandeep|lunch | 08:58 | |
*** lyarwood has quit IRC | 09:01 | |
*** lucasagomes has joined #openstack-infra | 09:04 | |
*** sboyron has quit IRC | 09:12 | |
*** sboyron has joined #openstack-infra | 09:12 | |
*** ociuhandu has joined #openstack-infra | 09:19 | |
*** dtantsur|afk is now known as dtantsur | 09:35 | |
*** derekh has joined #openstack-infra | 09:37 | |
*** nightmare_unreal has joined #openstack-infra | 09:41 | |
Tengu | hello there | 09:52 |
Tengu | fyi there are apparently CI jobs failing with oom-killer on a regular basis. Not sure if it has to be reported here though... ? | 09:53 |
*** sshnaidm|afk is now known as sshnaidm|rover | 09:56 | |
*** ysandeep|lunch is now known as ysandeep|ruck | 10:02 | |
*** ykarel has joined #openstack-infra | 10:06 | |
frickler | infra-root: ^^ see https://review.opendev.org/757488 which seems to fix this, do we want to globally revert the swapsize to 8G? | 10:33 |
*** d34dh0r53 has quit IRC | 10:50 | |
*** dchen has quit IRC | 10:51 | |
*** soniya29 is now known as soniya29|afk | 11:11 | |
*** dtantsur is now known as dtantsur|bbl | 11:16 | |
*** jcapitao is now known as jcapitao_lunch | 11:36 | |
openstackgerrit | Jens Harbott (frickler) proposed openstack/openstack-zuul-jobs master: Set default swap size back to 8G https://review.opendev.org/761119 | 11:43 |
frickler | Tengu: ^^ FYI | 11:44 |
*** mtreinish has joined #openstack-infra | 11:47 | |
*** rfolco has joined #openstack-infra | 12:00 | |
Tengu | frickler: ah, thanks. So I should update my depends-on then | 12:09 |
Tengu | frickler: and moving to 8gb makes the two jobs pass in my review! | 12:09 |
*** janders has joined #openstack-infra | 12:16 | |
janders | I'm having permissions issues with my gerrit account, I suspect it has to do with the cleanup after the security issue a couple weeks back. Do you happen to know how to get some assistance with this? Thanks in advance. | 12:17 |
*** eolivare has quit IRC | 12:19 | |
frickler | Tengu: I don't think a dependency will work on this patch since it is a config project | 12:23 |
*** rlandy has joined #openstack-infra | 12:26 | |
*** jcapitao_lunch is now known as jcapitao | 12:27 | |
*** jpena is now known as jpena|lunch | 12:32 | |
Tengu | janders: did you update to fedora-33? if so, it might also be due to some new policies being enforced in ssh client | 12:33 |
janders | Tengu yes! | 12:33 |
janders | Tengu setenforce 0 and retest? | 12:33 |
Tengu | janders: nope, not related to selinux | 12:33 |
Tengu | wait a second | 12:33 |
janders | Tengu ACK | 12:34 |
Tengu | janders: I didn't upgrade to fc-33, but there's a thing that changed in /etc/ssh/ssh_config.d/50-redhat.conf | 12:34 |
Tengu | janders: as a quick workaround, you can comment out the "Include /etc/ssh/ssh_config.d/*.conf" in /etc/ssh/ssh_config file | 12:34 |
Tengu | no restart is needed, since it's only for the ssh client, which re-reads this file upon each invocation. | 12:35 |
Tengu | not sure about the proper correction - maybe it's related to the ssh key.... as said, I didn't upgrade yet. | 12:35 |
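A minimal shell sketch of the workaround Tengu describes above, assuming the stock Fedora openssh client layout (back up the file first; this only disables the distribution drop-ins for the ssh client, it does not touch sshd):

    # comment out the drop-in include so /etc/ssh/ssh_config.d/50-redhat.conf stops being read
    sudo cp /etc/ssh/ssh_config /etc/ssh/ssh_config.bak
    sudo sed -i 's|^Include /etc/ssh/ssh_config\.d/\*\.conf|#&|' /etc/ssh/ssh_config
    # no restart needed: the ssh client re-reads its config on every invocation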
janders | Tengu trying this now | 12:36 |
Tengu | janders: is it better? | 12:37 |
janders | Tengu no change | 12:37 |
Tengu | oh? err... weird. | 12:38 |
Tengu | janders: https://www.reddit.com/r/Fedora/comments/jhxbdh/no_ssh_public_key_auth_after_upgrade_to_fedora_33/ | 12:38 |
Tengu | just found that one. | 12:38 |
Tengu | might be better. | 12:38 |
ykarel | Tengu, you seen https://fedoraproject.org/wiki/Changes/StrongCryptoSettings2 ? | 12:42 |
ykarel | if things not work, ^ suggests update-crypto-policies --set DEFAULT:FEDORA32 | 12:42 |
ykarel | or update-crypto-policies --set LEGACY | 12:42 |
Tengu | ykarel: one of the proposals in the reddit link points to that update-crypto-policies | 12:43 |
Tengu | guess I should be good with my key(s) here. | 12:44 |
Tengu | but it will become funky once I update my private infra. Time to re-key a bit. | 12:44 |
ykarel | Tengu, ack, i have not faced the issue myself, but someone shared the above link for this issue, so i shared it here | 12:45 |
Tengu | ykarel: thanks for sharing :). as said - I'm still on fc-32 | 12:46 |
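For reference, a minimal shell sketch of checking and switching the system-wide policy ykarel mentions (the tool and policy names are the ones quoted above; reverting to DEFAULT afterwards is an assumption here):

    # show the active system-wide crypto policy
    update-crypto-policies --show
    # relax to the Fedora 32 defaults (or LEGACY) if the stricter F33 policy rejects existing ssh keys
    sudo update-crypto-policies --set DEFAULT:FEDORA32
    # switch back once keys have been rotated
    sudo update-crypto-policies --set DEFAULT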
*** lbragstad has joined #openstack-infra | 12:53 | |
*** dtantsur|bbl is now known as dtantsur | 12:54 | |
janders | thank you for your assistance frickler Tengu ykarel I'm back in business | 12:55 |
janders | it was some weirdness with ssh ports | 12:55 |
frickler | janders: great to hear that, let us know if you see any further issues | 12:58 |
janders | frickler thank you, will do! | 12:58 |
*** eolivare has joined #openstack-infra | 12:59 | |
*** __ministry1 has joined #openstack-infra | 13:01 | |
*** lpetrut has quit IRC | 13:02 | |
*** lbragstad has quit IRC | 13:06 | |
*** lbragstad has joined #openstack-infra | 13:09 | |
*** __ministry1 has quit IRC | 13:21 | |
*** __ministry1 has joined #openstack-infra | 13:21 | |
*** derekh has quit IRC | 13:27 | |
*** mmethot has joined #openstack-infra | 13:31 | |
*** jpena|lunch is now known as jpena | 13:32 | |
*** raukadah is now known as chandankumar | 13:32 | |
*** __ministry1 has quit IRC | 13:38 | |
*** __ministry1 has joined #openstack-infra | 13:38 | |
*** slaweq has quit IRC | 13:42 | |
*** __ministry1 has quit IRC | 13:46 | |
*** __ministry1 has joined #openstack-infra | 13:46 | |
*** slaweq has joined #openstack-infra | 13:47 | |
Tengu | frickler: fyi the depends-on is working :). | 13:49 |
Tengu | let's get your patch merged now. | 13:49 |
*** __ministry1 has quit IRC | 13:49 | |
*** __ministry1 has joined #openstack-infra | 13:51 | |
*** __ministry1 has quit IRC | 13:52 | |
*** __ministry1 has joined #openstack-infra | 13:59 | |
*** __ministry1 has quit IRC | 14:01 | |
*** __ministry1 has joined #openstack-infra | 14:02 | |
*** derekh has joined #openstack-infra | 14:05 | |
*** __ministry1 has quit IRC | 14:06 | |
*** d34dh0r53 has joined #openstack-infra | 14:10 | |
*** jamesdenton has joined #openstack-infra | 14:17 | |
*** Goneri has joined #openstack-infra | 14:36 | |
*** ykarel is now known as ykarel|away | 14:44 | |
*** lmiccini has quit IRC | 14:45 | |
*** ysandeep|ruck is now known as ysandeep|away | 14:55 | |
*** lmiccini has joined #openstack-infra | 14:55 | |
*** wolsen has quit IRC | 15:00 | |
*** mordred has quit IRC | 15:00 | |
*** kwazar has joined #openstack-infra | 15:05 | |
*** ykarel|away has quit IRC | 15:14 | |
fungi | frickler: Tengu: increasing the swap space (unless we go back to sparse fallocate which i thought breaks on newer kernels?) is going to break other jobs which rely on that available disk. we've already had folks run into the 1gb decrease from the current swap dd causing job failures in providers with limited disk | 15:18 |
fungi | you know you can adjust how much swap devstack creates in your job configuration, right? | 15:18 |
fungi | it seems to me that any build trying to use >1gb swap is in for a rough time for other reasons, and probably means severe slowdowns occupying nodes out of our quota for far longer than necessary | 15:20 |
clarkb | also 8gb of swap will take much longer to allocate. I've pointed out periodically for years now that openstack software and jobs could really use a diet. Many of them over-deploy services and memory use has skyrocketed since the beginning of the project | 15:20 |
Tengu | fungi: 1gb of swap is kind of similar to nothing.... ? Frankly, I don't care - a backport is blocked due to resource limitations, and it's a bit annoying. Passing to 8gb makes it pass, and that swap thing is NOT related to the change itself. | 15:21 |
Tengu | maybe we can cut that thing in two and push 4gb instead? | 15:22 |
Tengu | a bit less than 8, but a bit more than 1, and everyone will be happy? | 15:22 |
clarkb | you can set it on your job is what fungi is saying | 15:22 |
*** hashar is now known as hasharOut | 15:22 | |
clarkb | rather than trying to make it one size fits all | 15:22 |
clarkb | but also 1gb should be plenty | 15:22 |
Tengu | and do some more hit'n'fail the whole way down.. how to lose time -.- | 15:23 |
clarkb | the swap on the test nodes is there to keep a job on the edge from failing. It is not there to double available "memory" | 15:23 |
* Tengu remembers the old days where swap was usually twice the ram | 15:23 | |
fungi | yeah, but that doesn't tend to work so well on virtual machines | 15:24 |
fungi | the idea is to add enough swap so that inactive or infrequently accessed pages can be stashed on disk, but 1gb is usually far more than enough for that especially in an hour-long job | 15:25 |
fungi | it's so that you can free up more ram for things like filesystem cache to speed up execution time | 15:25 |
Tengu | doesn't seem to be enough for the 2 devstack jobs on this change: https://review.opendev.org/760911 though it's not really part of the change itself. | 15:25 |
frickler | one day I would also like to learn from our resource providers whether our quotas are really limited by ram or more by cpu. in my own clouds, a flavor with 8 vcpu would naturally have 16-24 gig of ram | 15:26 |
fungi | it's not that there's not enough swap, it's that there's not enough ram. if the kernel is trying to stick things in swap because there's truly no more room in ram, then you're sunk | 15:26 |
Tengu | usually the limit is more CPU than RAM in cloud providers.... | 15:26 |
Tengu | remembering that time I worked with AWS, the main issue was getting enough cores, not enough ram. | 15:26 |
Tengu | ram is cheap, compared to cpu. | 15:27 |
clarkb | Tengu: yes and for years I have warned openstack of this problem | 15:27 |
clarkb | no one seems to care | 15:27 |
Tengu | and with motherboards supporting 4TB of ram... :) | 15:27 |
Tengu | clarkb: well.... I know tripleo is something terrible for the ram (and the whole openstack world in the end) | 15:27 |
clarkb | some services are much larger than others and the privsep daemons are not small | 15:28 |
fungi | part of why we limit the amount of ram to a consistent amount across our providers is that's what jobs tend to try to use the most of, and this is a means of assuring that projects design their tests and test frameworks to not need more than that (so that people can also run them in their personal developer environments or in local virtual machines) | 15:28 |
fungi | when jobs start running out of memory, "just throw some more ram at the problem" isn't indefinitely scalable | 15:29 |
fungi | throw some more effort at not wasting as much ram | 15:29 |
fungi | the root of the issue is that, for the most part, developers are happy not to really have to care how much ram a piece of software uses until they run out, and then it's a scramble to try and figure out how to put the entire project on a diet | 15:31 |
Tengu | so in this case, it's not on me - just trying to push a change that missed a backport. basically, I just don't care, downstream has been done - I'm not the right one to talk to about resources for tests -.- | 15:32 |
fungi | also stated as "enjoy your tragedy of the commons" | 15:32 |
fungi | (it's not my fault the park is full of litter, i just wanted to take a walk in it) | 15:32 |
Tengu | and, well, going from 8gb to 1gb of swap IS an infra change, and it's no wonder it makes things fail - the project I'm hitting now didn't get any update for the past 5 months or so in that branch.... so it's a bit "meh" in the end. | 15:33 |
clarkb | do the jobs run cinder backup service? | 15:33 |
fungi | if more people brought a trash back with them when they went for a walk, this would be far less of an issue | 15:33 |
fungi | er, a trash bag | 15:34 |
clarkb | iirc the one thing that was done before was disabling that since no tempest tests exercise it | 15:34 |
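If that is the approach, a minimal local.conf sketch of what disabling the backup service looks like in devstack (assuming c-bak is still the devstack service name for cinder-backup):

    [[local|localrc]]
    # drop cinder-backup from the deployed services to save memory
    disable_service c-bak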
Tengu | not that I know... it's apparently testing the openstacksdk itself. | 15:34 |
fungi | Tengu: yes, it's an infra change to allow you to use newer linux kernels. we could instead just decide not to let you test on newer kernels | 15:34 |
Tengu | no idea what it does, but it's apparently running multiple things in the same fashion tempest does | 15:34 |
clarkb | Tengu: that infra change is a result of ext4 breaking sparse allocation for swap files | 15:34 |
clarkb | we dont have control over that | 15:34 |
*** dklyle has joined #openstack-infra | 15:34 | |
Tengu | would putting 4g as a global value be acceptable? | 15:35 |
*** wolsen has joined #openstack-infra | 15:35 | |
fungi | well, for starters, this discussion is about devstack specifically, so speaking about defaults in devstack it's a question of how many projects will need to adjust the swap value up in their builds to get more virtual memory, vs how many will need to adjust it down to get more disk | 15:36 |
fungi | ideally a good default would minimize both of those | 15:36 |
Tengu | 4gigs is probably low enough to have no impact on the actual disk space. | 15:37 |
Tengu | there were 8, before.... | 15:37 |
frickler | fungi: do you know which project would be low on disk? the change also affected at least tripleo | 15:37 |
clarkb | it was sparseallocated before | 15:37 |
Tengu | unless the disk size was also reduced? | 15:37 |
clarkb | so not actually 8gb in most cases | 15:37 |
Tengu | humpf.. | 15:37 |
clarkb | it would be 0 unless used | 15:37 |
Tengu | thanks ext4 -.- | 15:37 |
fungi | yeah, it was using 0gb disk before unless you started paging things out to swap | 15:37 |
fungi | however, there's the added problem that jobs using more swap are likely doing so not to improve performance but because they've run out of ram, so they're going to start crawling when active memory is being read from/written to swap, which means the job will run way longer and consume more overall quota in our providers which could be used to run different (better performing) jobs instead | 15:38 |
frickler | that's all fine in theory, but nobody is maintaining devstack anymore, in particular stable branches | 15:39 |
fungi | and on top of that, swap thrash tortures the storage in those providers and in many cases drags down overall i/o for other server instances on the same compute nodes or the same storage networks/backends, leading to "noisy neighbor" issues which actually impact our other builds | 15:39 |
fungi | so choosing a smaller default swap value helps force jobs to abort early (oom) if they try to use too much memory | 15:40 |
fungi | rather than making things worse for everyone | 15:40 |
Tengu | is there a way to edit the .zuul.yaml thing in order to actually set the ram we need to make it pass? | 15:40 |
fungi | frickler: it's okay if people don't maintain software in stable branches any more. just eol those branches | 15:41 |
fungi | that's literally what eol means: "nobody's maintaining this" | 15:41 |
Tengu | stable/train isn't going EOL anytime soon. | 15:41 |
Tengu | devstack, on the other hand..... | 15:41 |
fungi | it totally can if openstack wants it to | 15:41 |
*** mordred has joined #openstack-infra | 15:41 | |
Tengu | and, well, it's more masking the actual problem.... | 15:42 |
Tengu | like passing those jobs as non-voting, in the end. | 15:42 |
fungi | it's a project decision. either find people who want to maintain those jobs and frameworks, or don't and acknowledge to your users that it's not happening | 15:42 |
Tengu | yay - all green. | 15:42 |
tosky | clarkb: I've just noticed the question - in cinder we re-enabled cinder-backup on a few jobs | 15:43 |
fungi | i'm not an active core reviewer on devstack so i'm not going to tell the devstack maintainers not to change their default swap size, but i think it's likely to lead to problems and it's not overall a responsible use of test resources. on the other hand, stable branches get far fewer changes so maybe changing the default in stable/train and earlier is an acceptable compromise? | 15:44 |
fungi | or maybe just un-backporting the fallocate->dd change on branches which don't run on newer kernels | 15:45 |
clarkb | ironic in particular had disk space issues iirc | 15:45 |
clarkb | may want to double check with them at least | 15:45 |
sean-k-mooney | fungi: well really none of the ci jobs should be swapping | 15:45 |
sean-k-mooney | we all know the amount of ram available in the ci vms is 8GB | 15:46 |
sean-k-mooney | we should not be running tests in such a way that we swap in the first place | 15:46 |
*** mordred has quit IRC | 15:47 | |
sean-k-mooney | if we need to drop the test concurrency to make that happen or use smaller guest vms that is what we should do instead of swapping | 15:47 |
*** wolsen has quit IRC | 15:47 | |
Tengu | (for the record, the undercloud is supposed to get 23gb or so.... just saying) | 15:48 |
sean-k-mooney | 23GB of what? cinder space or swap? | 15:50 |
Tengu | memory. | 15:50 |
sean-k-mooney | no its not | 15:50 |
sean-k-mooney | not in the ci | 15:50 |
Tengu | according to the specs we provide customers. | 15:50 |
*** bnemec has quit IRC | 15:50 | |
sean-k-mooney | not according to the spec we provide for test vms | 15:50 |
Tengu | 8gigs without any swap for the undercloud is not working (tested here, even the undercloud deploy has issues). anyway.... | 15:51 |
Tengu | we'll do the change in devstack itself, and be off with it. | 15:51 |
Tengu | next time I get this kind of issue, I'll probably just abandon the patch and go on other tasks. | 15:52 |
sean-k-mooney | https://docs.opendev.org/opendev/infra-manual/latest/testing.html | 15:52 |
sean-k-mooney | There is at least 8GB of system memory available. | 15:52 |
clarkb | Tengu: right we're dealing with the resources available to us | 15:52 |
clarkb | unfortunately there are no good answers and everything is essentially a compromise | 15:52 |
clarkb | but we're doing our best and when we noticed memory was a problem I sent out the alarm | 15:52 |
clarkb | but its been many months and very little has happened as a result | 15:53 |
Tengu | couldn't we move to xfs instead? iirc it's the default for centos/rhel anyway.. ? | 15:53 |
sean-k-mooney | Tengu: this is really a ooo issue in that it has not kept its minimum requirements within the resources available to test it | 15:53 |
clarkb | Tengu: xfs doesn't do sparse allocations either | 15:53 |
clarkb | aiui it never has whereas ext4 has regressed | 15:53 |
Tengu | sean-k-mooney: not even ooo in fact... devstack isn't ooo, is it? | 15:53 |
sean-k-mooney | no but devstack runs fine within the requirements you have depending on what services you enable | 15:54 |
Tengu | clarkb: or maybe allocate an actual volume to the VM? though it makes things a bit more complicated to handle. | 15:54 |
sean-k-mooney | you just have to configure the services to work within the 8GB ram and 80GB disk requirements | 15:54 |
clarkb | Tengu: we've tried that too and it doesn't work reliably because nova and cinder leak the volumes | 15:54 |
clarkb | Tengu: then you run out of volume quota and can't boot more instances | 15:54 |
Tengu | sean-k-mooney: yeah - well. in this case, again, I didn't do anything with the job description. Just applying a |#ßæð backport that applied without any issue..... so meh, again. | 15:55 |
clarkb | but you also can't delete the leaked volumes because cinder and nova disagree on them being used or something | 15:55 |
clarkb | (also not all clouds support that) | 15:55 |
Tengu | clarkb: erf.... "great".... | 15:55 |
clarkb | and then you can't delete the old images that the volumes use ... | 15:56 |
clarkb | it turns into a large chain reaction of pain | 15:56 |
Tengu | guess so. Happy to NOT use anything like that for my local lab then. just plain libvirt with some ansible... | 15:57 |
sean-k-mooney | clarkb: the image deletion depends on if you are doing a shallow copy or not but ya | 15:57 |
Tengu | anyway. we'll push the change on devstack, since it's the one thing causing the issue. | 15:57 |
Tengu | frickler: guess you'll abandon your patch against openstack-zuul-jobs? | 15:58 |
clarkb | my suggestion for right now would be to bump up the value on a per job basis | 15:58 |
clarkb | that way we don't break ironic with a global change | 15:58 |
Tengu | basically https://review.opendev.org/757488 | 15:59 |
clarkb | and longer term figuring out how to make the jobs leaner would be excellent. I think my original thread on this had a bunch of notes on how people can do that | 15:59 |
*** mordred has joined #openstack-infra | 15:59 | |
Tengu | or deprecate devstack. | 16:00 |
Tengu | that's probably what will happen. eventually. | 16:00 |
sean-k-mooney | we can deprecate devstack after we deprecate ooo | 16:00 |
Tengu | first time I see devstack jobs actually. | 16:00 |
sean-k-mooney | devstack is actully one of the better installers IMO | 16:00 |
clarkb | it works well for CI | 16:01 |
clarkb | definitely has its problems | 16:01 |
sean-k-mooney | it does but its easy to debug vs the complexity of others | 16:01 |
* Tengu sticks to his tripleo-lab | 16:01 | |
sean-k-mooney | i find myself gravitating to the simpler installers like devstack and kolla-ansible more | 16:01 |
sean-k-mooney | i had hopes for ooo standalone but since i cant get our deployment folks to support that in multinode i have basically given up on that too | 16:02 |
Tengu | standalone isn't for multinode, indeed.. | 16:02 |
Tengu | anyway. we'll see if it merges - downstream has been done, I'm fine with it. | 16:03 |
Tengu | now is EOD for me. see you around | 16:03 |
zbr | fungi: clarkb: small doc fix on bindep regarding undocumented compile tag: https://review.opendev.org/#/c/751253/ | 16:03 |
zbr | i think some would find it useful | 16:03 |
*** gyee has joined #openstack-infra | 16:14 | |
clarkb | http://lists.openstack.org/pipermail/openstack-discuss/2019-July/007493.html found the original thread from last year | 16:14 |
clarkb | Tengu: sean-k-mooney ^ fyi | 16:14 |
*** dwilde has joined #openstack-infra | 16:22 | |
*** d34dh0r53 has quit IRC | 16:24 | |
*** dwilde has quit IRC | 16:24 | |
*** d34dh0r53 has joined #openstack-infra | 16:24 | |
*** gyee has quit IRC | 16:31 | |
*** dave-mccowan has joined #openstack-infra | 16:36 | |
*** wolsen has joined #openstack-infra | 16:36 | |
*** gyee has joined #openstack-infra | 16:41 | |
*** gyee has quit IRC | 16:45 | |
*** rpittau is now known as rpittau|afk | 16:46 | |
*** dtantsur is now known as dtantsur|afk | 16:47 | |
*** bnemec has joined #openstack-infra | 17:00 | |
*** hamalq has joined #openstack-infra | 17:01 | |
*** lpetrut has joined #openstack-infra | 17:02 | |
ade_lee | clarkb, hey - so I tried an experiment yesterday where I commented out the fips bits and just had the node reboot. https://zuul.opendev.org/t/openstack/build/16680efdff1a4b9d8153783ec002a779/log/job-output.txt#871-882 | 17:05 |
ade_lee | clarkb, other than the weirdness that the reboot did not just do it immediately, what is notable is that it doesn't come back | 17:05 |
clarkb | looks like it did come back though? | 17:06 |
clarkb | you need to reenable the logger daemon after the reboot which was the original role I shared | 17:06 |
ade_lee | also - I noticed this from the previous run -- where fips is enabled .. https://zuul.opendev.org/t/openstack/build/94645ae4c6b741f0ac352028f29e45a6 | 17:06 |
clarkb | but it appears the job continues post reboot | 17:06 |
clarkb | does fips change the hostkeys somehow? | 17:07 |
ade_lee | clarkb, well - when it continues without fips, we get this .. | 17:07 |
ade_lee | https://zuul.opendev.org/t/openstack/build/16680efdff1a4b9d8153783ec002a779 | 17:07 |
ade_lee | see the perms denied .. | 17:07 |
clarkb | right but the reboot succeeds? | 17:08 |
clarkb | basically reboot failing seems to be related to fips somehow? | 17:08 |
ade_lee | clarkb, several folks I've chatted to -- weshay|ruck for instance, indicated to me that nodepool reclaims the node when its rebooted | 17:08 |
clarkb | no it does not | 17:08 |
clarkb | many many many years ago during the zuulv2 days this was true but it hasn't been true since the zuulv3 upgrade | 17:09 |
clarkb | your example shows that this seems to work fine? | 17:09 |
ade_lee | ok cool - thats good to know -- can you repost the role that restarts the logger? | 17:10 |
clarkb | https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/start-zuul-console is the role to run post reboot to get the console log back | 17:10 |
clarkb | without that you see the 2020-11-02 21:12:38.076687 | [controller] Waiting on logger entries from your example | 17:10 |
fungi | so just including "start-zuul-console" after the reboot ought to do it | 17:11 |
clarkb | and then it is later that the job fails | 17:11 |
ade_lee | I'll try it - as for fips mucking with the keys -- that would be unexpected | 17:11 |
*** elod has quit IRC | 17:11 | |
*** elod has joined #openstack-infra | 17:12 | |
ade_lee | but if it works when I restart the logger, then something fips related is clearly a problem .. | 17:12 |
ade_lee | clarkb, fungi thanks -- off to experiment some more .. | 17:12 |
*** dave-mccowan has quit IRC | 17:12 | |
*** jcapitao has quit IRC | 17:15 | |
*** Ajohn has joined #openstack-infra | 17:16 | |
*** dave-mccowan has joined #openstack-infra | 17:16 | |
*** gfidente is now known as gfidente|afk | 17:21 | |
*** fresta has quit IRC | 17:22 | |
*** sshnaidm|rover is now known as sshnaidm|afk | 17:22 | |
*** lucasagomes has quit IRC | 17:24 | |
*** dikonoor has joined #openstack-infra | 17:27 | |
*** ociuhandu has quit IRC | 17:31 | |
*** dikonoor has quit IRC | 17:36 | |
clarkb | just to confirm the mkswap man page seems to indicate that xfs + fallocate wouldn't work either | 17:45 |
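A minimal shell sketch of the difference being discussed (sizes and paths are illustrative; the point is that a dd-written swap file consumes real disk up front, while the old sparse fallocate file did not):

    # old approach: instant and sparse, using ~0 bytes until pages are actually swapped out,
    # but newer kernels refuse such files because they appear to have holes
    fallocate -l 8G /swapfile
    # current approach: works everywhere, but every byte is written to disk up front
    dd if=/dev/zero of=/swapfile bs=1M count=1024   # 1G default
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile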
*** tosky has quit IRC | 17:57 | |
*** derekh has quit IRC | 18:03 | |
*** dwilde has joined #openstack-infra | 18:05 | |
*** d34dh0r53 has quit IRC | 18:05 | |
*** lpetrut has quit IRC | 18:26 | |
*** hamalq has quit IRC | 18:27 | |
*** hamalq has joined #openstack-infra | 18:27 | |
*** arxcruz has quit IRC | 18:28 | |
*** gyee has joined #openstack-infra | 18:32 | |
*** jpena is now known as jpena|off | 18:33 | |
*** ralonsoh has quit IRC | 18:49 | |
*** eolivare has quit IRC | 18:52 | |
*** jamesdenton has quit IRC | 18:54 | |
*** ianw_pto is now known as ianw | 18:59 | |
*** vishalmanchanda has quit IRC | 19:02 | |
*** arxcruz has joined #openstack-infra | 19:28 | |
ade_lee | clarkb, fungi I think we figured out the problem. This is the error message we are getting .. http://paste.openstack.org/show/799667/ | 19:32 |
fungi | ade_lee: indeed, looks like maybe fips mode could have cleared and regenerated the ssh host key? | 19:33 |
fungi | or otherwise changed how it's being served | 19:34 |
ade_lee | clarkb, fungi the problem is not that the keys are regenerated, but that we default to using ed25519 keys when we first set things up | 19:34 |
ade_lee | and those types of keys are disabled in fips | 19:34 |
fungi | aha! | 19:34 |
fungi | good point | 19:34 |
fungi | we'd need zuul to add all available host keys to the known_hosts for that build i guess | 19:35 |
ade_lee | yup -- or default to using ecdsa perhaps | 19:36 |
clarkb | I thought ecdsa was considered less secure ... | 19:36 |
clarkb | something about unexplainable magic numbers | 19:36 |
*** andrewbonney has quit IRC | 19:37 | |
ade_lee | clarkb, not sure - but the logs show that its trying to use ecdsa and getting a ed25519 key | 19:38 |
clarkb | ya the issue is we're only registering the ed25519 key | 19:38 |
clarkb | so when it gets something else it can't verify it. As fungi mentions we can probably have zuul write them all down | 19:38 |
fungi | fips isn't about being "secure" it's about being predictable and vetted | 19:39 |
fungi | the usg (via fips) doesn't trust some newer and ostensibly stronger ciphers and hashes because it hasn't confidently vetted them yet | 19:40 |
ade_lee | clarkb, fungi so - how do we get zuul to write all the keys down? | 19:40 |
fungi | ade_lee: i'll have to think about it more when we're not in the middle of our weekly meeting | 19:40 |
ade_lee | fungi, ack thanks | 19:41 |
fungi | corvus: ^ this is an interesting corner case for zuul's handling of node ssh host keys | 20:01 |
fungi | i'm refreshing my memory now on where we do the ssh-keygen call to fetch those | 20:01 |
clarkb | fungi: its in nodepool utils.nodescan | 20:06 |
clarkb | then passed all the way up the chain from there | 20:06 |
fungi | aha, so we'd need to plumb it through zk, maybe need to change the message format? | 20:09 |
fungi | though perhaps multiline content there is fine | 20:09 |
clarkb | well it seems nodepool grabs all the keys | 20:09 |
clarkb | I'm working from there to see if I can find where we might select the key type to use | 20:09 |
clarkb | hrm it seems like zuul writes all the keys it gets | 20:11 |
clarkb | is it possible that centos-8 is only generating ed25519 by default? | 20:11 |
clarkb | oh unless it is paramiko only grabbing a single key | 20:13 |
clarkb | https://opendev.org/zuul/nodepool/src/branch/master/nodepool/nodeutils.py#L88 that may be it | 20:13 |
clarkb | everything else assumes a list but there we may only grab a single key | 20:13 |
clarkb | the comment in there also says paramiko only grabs the rsa key | 20:14 |
clarkb | I'm guessing that has changed :) | 20:14 |
clarkb | http://docs.paramiko.org/en/stable/api/transport.html#paramiko.transport.Transport.get_remote_server_key is very quiet about what that actually does | 20:15 |
*** jamesdenton has joined #openstack-infra | 20:17 | |
clarkb | skimming the code I think it is using the host key that was used to establish the connection, so not a true scan | 20:17 |
clarkb | what we want is an ssh-keyscan equivalent for paramiko | 20:21 |
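For comparison, a quick sketch of how the openssh tooling does this from the command line (the address is a placeholder); ssh-keyscan probes each requested algorithm separately, which is the behaviour being looked for in paramiko:

    # collect every host key type the server offers and append them to known_hosts
    ssh-keyscan -t rsa,ecdsa,ed25519 203.0.113.10 >> ~/.ssh/known_hosts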
*** tosky has joined #openstack-infra | 20:27 | |
clarkb | reading rfc 4253 I think what you can do is a series of connections each with a different supported host key type by the client | 20:32 |
clarkb | not sure if paramiko exposes that though | 20:32 |
*** slaweq has quit IRC | 20:33 | |
*** slaweq has joined #openstack-infra | 20:44 | |
*** zxiiro has joined #openstack-infra | 20:59 | |
*** raissa has joined #openstack-infra | 21:01 | |
*** hasharOut is now known as hashar | 21:12 | |
*** rcernin has joined #openstack-infra | 21:18 | |
*** rcernin has quit IRC | 21:18 | |
*** rcernin has joined #openstack-infra | 21:19 | |
clarkb | ade_lee: fungi https://review.opendev.org/761229 is a rudimentary attempt at addressing this in nodepool | 21:21 |
*** rfolco has quit IRC | 21:24 | |
*** rfolco has joined #openstack-infra | 21:24 | |
ade_lee | clarkb, cool thanks -- I'll add as a depends-on in my patch and see if it works! | 21:24 |
clarkb | well depends on won't help here | 21:24 |
clarkb | that is part of the nodepool service running in production that hands out test nodes and isn't speculatively applied | 21:25 |
ade_lee | clarkb, ah ok | 21:25 |
*** raissa has quit IRC | 21:25 | |
ade_lee | clarkb, so I have to wait for it to be merged to try it out? | 21:26 |
clarkb | fwiw it seems that paramiko is what prefers ed25519 | 21:26 |
*** ChanServ has quit IRC | 21:26 | |
clarkb | https://github.com/paramiko/paramiko/blob/dccc563c3a581a14ceaa261b5fe0d08539f84b87/paramiko/transport.py#L171-L178 is the list I think | 21:26 |
clarkb | and that is why we end up using that by default | 21:27 |
ade_lee | clarkb, one of the reasons paramiko is not fips certified .. | 21:28 |
clarkb | I mean you can disable it which is essentially how my change works | 21:28 |
clarkb | but also I'm not really convinced of the value of fips if it bypasses ed25519 in favor of ecdsa | 21:28 |
clarkb | since literally all the writing on the subject seems to be that ed25519 is far superior | 21:28 |
*** rfolco has quit IRC | 21:29 | |
ade_lee | clarkb, as fungi mentioned, fips is all about using things that are vetted and predictable -- you have to use fips certified algorithms etc. | 21:30 |
clarkb | right but the reason people don't like ecdsa is it has unvettable magic numbers | 21:30 |
clarkb | whereas the other algorithm is actually understandable | 21:30 |
clarkb | (which makes me think it isn't even achieving that goal) | 21:31 |
clarkb | but I guess if you are the same entity creating the fips vetting and the magic numbers... | 21:31 |
ade_lee | :/ | 21:31 |
*** ChanServ has joined #openstack-infra | 21:32 | |
*** tepper.freenode.net sets mode: +o ChanServ | 21:32 | |
clarkb | and ya apparently it is NIST in both cases | 21:32 |
ade_lee | yup | 21:32 |
fungi | clarkb: "unvetted" depends on who's doing the vetting | 21:46 |
fungi | ahh, yeah, as you said | 21:46 |
fungi | fips is really mostly only for the usg and contractors... anyone who doesn't trust "certain three-letter agencies" should probably not use fips mode | 21:47 |
*** gfidente|afk has quit IRC | 21:52 | |
*** matt_kosut has quit IRC | 21:59 | |
*** hashar has quit IRC | 22:01 | |
*** slaweq has quit IRC | 22:02 | |
*** aedc has joined #openstack-infra | 22:03 | |
*** aedc has quit IRC | 22:04 | |
*** sboyron has quit IRC | 22:34 | |
*** tkajinam has joined #openstack-infra | 22:59 | |
*** tosky has quit IRC | 23:01 | |
*** mmethot has quit IRC | 23:18 | |
*** dchen has joined #openstack-infra | 23:34 | |
*** mmethot has joined #openstack-infra | 23:34 |