*** lseki has joined #openstack-infra | 00:00 | |
*** tosky has quit IRC | 00:26 | |
*** yamamoto has joined #openstack-infra | 00:32 | |
*** yamamoto has quit IRC | 00:39 | |
*** yamamoto has joined #openstack-infra | 00:48 | |
*** yamamoto has quit IRC | 00:48 | |
*** yamamoto has joined #openstack-infra | 00:49 | |
*** mmethot_ has quit IRC | 00:51 | |
*** yamamoto has quit IRC | 00:54 | |
*** ramishra has joined #openstack-infra | 00:55 | |
openstackgerrit | wu.shiming proposed openstack/openstack-zuul-jobs master: Fix hacking min version to 3.2.0 https://review.opendev.org/760746 | 00:59 |
*** zzzeek has quit IRC | 01:13 | |
*** zzzeek has joined #openstack-infra | 01:16 | |
*** lbragstad has joined #openstack-infra | 01:37 | |
*** yamamoto has joined #openstack-infra | 01:37 | |
*** yamamoto has quit IRC | 01:39 | |
*** yamamoto has joined #openstack-infra | 01:40 | |
*** hamalq has quit IRC | 01:41 | |
*** yamamoto has quit IRC | 01:41 | |
*** yamamoto has joined #openstack-infra | 01:42 | |
*** yamamoto has quit IRC | 01:50 | |
*** lbragstad has quit IRC | 02:13 | |
*** lbragstad has joined #openstack-infra | 02:40 | |
*** kaisers has quit IRC | 02:48 | |
*** rcernin has quit IRC | 02:50 | |
*** ianychoi has quit IRC | 03:00 | |
*** rcernin has joined #openstack-infra | 03:00 | |
*** rcernin has quit IRC | 03:07 | |
*** rcernin has joined #openstack-infra | 03:20 | |
*** ianychoi has joined #openstack-infra | 03:21 | |
*** rcernin has quit IRC | 03:25 | |
*** lbragstad has quit IRC | 03:27 | |
*** rcernin has joined #openstack-infra | 03:31 | |
*** lbragstad has joined #openstack-infra | 03:33 | |
*** ociuhandu has joined #openstack-infra | 03:40 | |
*** rcernin has quit IRC | 03:40 | |
*** rcernin has joined #openstack-infra | 03:40 | |
*** ociuhandu has quit IRC | 03:49 | |
*** auristor has quit IRC | 03:49 | |
*** auristor has joined #openstack-infra | 03:53 | |
*** ykarel|away has joined #openstack-infra | 04:43 | |
*** lbragstad has quit IRC | 04:44 | |
*** mtreinish has quit IRC | 04:44 | |
*** ykarel|away is now known as ykarel | 04:47 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #openstack-infra | 05:33 | |
*** matt_kosut has joined #openstack-infra | 05:53 | |
*** ysandeep|away is now known as ysandeep | 05:53 | |
*** ysandeep is now known as ysandeep|ruck | 05:53 | |
*** lmiccini has joined #openstack-infra | 06:35 | |
*** rcernin has quit IRC | 06:45 | |
*** gnuoy has joined #openstack-infra | 06:48 | |
*** ramishra has quit IRC | 07:14 | |
*** vishalmanchanda has joined #openstack-infra | 07:22 | |
*** dklyle has quit IRC | 07:28 | |
*** ramishra has joined #openstack-infra | 07:37 | |
*** ralonsoh has joined #openstack-infra | 07:41 | |
*** eolivare has joined #openstack-infra | 07:42 | |
*** zzzeek has quit IRC | 07:55 | |
*** zzzeek has joined #openstack-infra | 07:56 | |
*** slaweq has joined #openstack-infra | 08:01 | |
*** hashar has joined #openstack-infra | 08:03 | |
*** jcapitao has joined #openstack-infra | 08:05 | |
*** sboyron has joined #openstack-infra | 08:11 | |
*** andrewbonney has joined #openstack-infra | 08:14 | |
*** rpittau|afk is now known as rpittau | 08:19 | |
*** gfidente has joined #openstack-infra | 08:35 | |
*** lpetrut has joined #openstack-infra | 08:36 | |
*** tosky has joined #openstack-infra | 08:40 | |
*** ykarel has quit IRC | 08:42 | |
*** matbu has quit IRC | 08:51 | |
*** matbu has joined #openstack-infra | 08:52 | |
*** jpena|off is now known as jpena | 08:56 | |
*** ysandeep|ruck is now known as ysandeep|lunch | 08:58 | |
*** lyarwood has quit IRC | 09:01 | |
*** lucasagomes has joined #openstack-infra | 09:04 | |
*** sboyron has quit IRC | 09:12 | |
*** sboyron has joined #openstack-infra | 09:12 | |
*** ociuhandu has joined #openstack-infra | 09:19 | |
*** dtantsur|afk is now known as dtantsur | 09:35 | |
*** derekh has joined #openstack-infra | 09:37 | |
*** nightmare_unreal has joined #openstack-infra | 09:41 | |
Tengu | hello there | 09:52 |
Tengu | fyi there are apparently CI jobs failing with oom-killer on a regular basis. Not sure if it has to be reported here though... ? | 09:53 |
*** sshnaidm|afk is now known as sshnaidm|rover | 09:56 | |
*** ysandeep|lunch is now known as ysandeep|ruck | 10:02 | |
*** ykarel has joined #openstack-infra | 10:06 | |
frickler | infra-root: ^^ see https://review.opendev.org/757488 which seems to fix this, do we want to globally revert the swapsize to 8G? | 10:33 |
*** d34dh0r53 has quit IRC | 10:50 | |
*** dchen has quit IRC | 10:51 | |
*** soniya29 is now known as soniya29|afk | 11:11 | |
*** dtantsur is now known as dtantsur|bbl | 11:16 | |
*** jcapitao is now known as jcapitao_lunch | 11:36 | |
openstackgerrit | Jens Harbott (frickler) proposed openstack/openstack-zuul-jobs master: Set default swap size back to 8G https://review.opendev.org/761119 | 11:43 |
frickler | Tengu: ^^ FYI | 11:44 |
*** mtreinish has joined #openstack-infra | 11:47 | |
*** rfolco has joined #openstack-infra | 12:00 | |
Tengu | frickler: ah, thanks. So I should update my depends-on then | 12:09 |
Tengu | frickler: and moving to 8gb makes the two jobs pass in my review! | 12:09 |
*** janders has joined #openstack-infra | 12:16 | |
janders | I'm having permissions issues with my gerrit account, I suspect it has to do with the cleanup after the security issue a couple weeks back. Do you happen to know how to get some assistance with this? Thanks in advance. | 12:17 |
*** eolivare has quit IRC | 12:19 | |
frickler | Tengu: I don't think a dependency will work on this patch since it is a config project | 12:23 |
*** rlandy has joined #openstack-infra | 12:26 | |
*** jcapitao_lunch is now known as jcapitao | 12:27 | |
*** jpena is now known as jpena|lunch | 12:32 | |
Tengu | janders: did you update to fedora-33? if so, it might also be due to some new policies being enforced in ssh client | 12:33 |
janders | Tengu yes! | 12:33 |
janders | Tengu setenforce 0 and retest? | 12:33 |
Tengu | janders: nope, not related to selinux | 12:33 |
Tengu | wait a second | 12:33 |
janders | Tengu ACK | 12:34 |
Tengu | janders: I didn't upgrade to fc-33, but there's a thing that changed in /etc/ssh/ssh_config.d/50-redhat.conf | 12:34 |
Tengu | janders: as a quick workaround, you can comment out the "Include /etc/ssh/ssh_config.d/*.conf" in /etc/ssh/ssh_config file | 12:34 |
Tengu | no restart is needed, since it's only for the ssh client, which re-reads this file upon each invocation. | 12:35 |
Tengu | not sure about the proper correction - maybe it's related to the ssh key.... as said, I didn't upgrade yet. | 12:35 |
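A minimal shell sketch of the workaround Tengu describes above, assuming the stock Fedora openssh client layout (back up the file first; this only disables the distribution drop-ins for the ssh client, it does not touch sshd):

    # comment out the drop-in include so /etc/ssh/ssh_config.d/50-redhat.conf stops being read
    sudo cp /etc/ssh/ssh_config /etc/ssh/ssh_config.bak
    sudo sed -i 's|^Include /etc/ssh/ssh_config\.d/\*\.conf|#&|' /etc/ssh/ssh_config
    # no restart needed: the ssh client re-reads its config on every invocation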
janders | Tengu trying this now | 12:36 |
Tengu | janders: is it better? | 12:37 |
janders | Tengu no change | 12:37 |
Tengu | oh? err... weird. | 12:38 |
Tengu | janders: https://www.reddit.com/r/Fedora/comments/jhxbdh/no_ssh_public_key_auth_after_upgrade_to_fedora_33/ | 12:38 |
Tengu | just found that one. | 12:38 |
Tengu | might be better. | 12:38 |
ykarel | Tengu, you seen https://fedoraproject.org/wiki/Changes/StrongCryptoSettings2 ? | 12:42 |
ykarel | if things not work, ^ suggests update-crypto-policies --set DEFAULT:FEDORA32 | 12:42 |
ykarel | or update-crypto-policies --set LEGACY | 12:42 |
Tengu | ykarel: one of the proposals in the reddit link points to that update-crypto-policies | 12:43 |
Tengu | guess I should be good with my key(s) here. | 12:44 |
Tengu | but it will become funky once I update my private infra. Time to re-key a bit. | 12:44 |
ykarel | Tengu, ack, i have not faced the issue myself, but someone shared the above link for this issue, so i shared it here | 12:45 |
Tengu | ykarel: thanks for sharing :). as said - I'm still on fc-32 | 12:46 |
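For reference, a minimal shell sketch of checking and switching the system-wide policy ykarel mentions (the tool and policy names are the ones quoted above; reverting to DEFAULT afterwards is an assumption here):

    # show the active system-wide crypto policy
    update-crypto-policies --show
    # relax to the Fedora 32 defaults (or LEGACY) if the stricter F33 policy rejects existing ssh keys
    sudo update-crypto-policies --set DEFAULT:FEDORA32
    # switch back once keys have been rotated
    sudo update-crypto-policies --set DEFAULT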
*** lbragstad has joined #openstack-infra | 12:53 | |
*** dtantsur|bbl is now known as dtantsur | 12:54 | |
janders | thank you for your assistance frickler Tengu ykarel I'm back in business | 12:55 |
janders | it was some weirdness with ssh ports | 12:55 |
frickler | janders: great to hear that, let us know if you see any further issues | 12:58 |
janders | frickler thank you, will do! | 12:58 |
*** eolivare has joined #openstack-infra | 12:59 | |
*** __ministry1 has joined #openstack-infra | 13:01 | |
*** lpetrut has quit IRC | 13:02 | |
*** lbragstad has quit IRC | 13:06 | |
*** lbragstad has joined #openstack-infra | 13:09 | |
*** __ministry1 has quit IRC | 13:21 | |
*** __ministry1 has joined #openstack-infra | 13:21 | |
*** derekh has quit IRC | 13:27 | |
*** mmethot has joined #openstack-infra | 13:31 | |
*** jpena|lunch is now known as jpena | 13:32 | |
*** raukadah is now known as chandankumar | 13:32 | |
*** __ministry1 has quit IRC | 13:38 | |
*** __ministry1 has joined #openstack-infra | 13:38 | |
*** slaweq has quit IRC | 13:42 | |
*** __ministry1 has quit IRC | 13:46 | |
*** __ministry1 has joined #openstack-infra | 13:46 | |
*** slaweq has joined #openstack-infra | 13:47 | |
Tengu | frickler: fyi the depends-on is working :). | 13:49 |
Tengu | let's get your patch merged now. | 13:49 |
*** __ministry1 has quit IRC | 13:49 | |
*** __ministry1 has joined #openstack-infra | 13:51 | |
*** __ministry1 has quit IRC | 13:52 | |
*** __ministry1 has joined #openstack-infra | 13:59 | |
*** __ministry1 has quit IRC | 14:01 | |
*** __ministry1 has joined #openstack-infra | 14:02 | |
*** derekh has joined #openstack-infra | 14:05 | |
*** __ministry1 has quit IRC | 14:06 | |
*** d34dh0r53 has joined #openstack-infra | 14:10 | |
*** jamesdenton has joined #openstack-infra | 14:17 | |
*** Goneri has joined #openstack-infra | 14:36 | |
*** ykarel is now known as ykarel|away | 14:44 | |
*** lmiccini has quit IRC | 14:45 | |
*** ysandeep|ruck is now known as ysandeep|away | 14:55 | |
*** lmiccini has joined #openstack-infra | 14:55 | |
*** wolsen has quit IRC | 15:00 | |
*** mordred has quit IRC | 15:00 | |
*** kwazar has joined #openstack-infra | 15:05 | |
*** ykarel|away has quit IRC | 15:14 | |
fungi | frickler: Tengu: increasing the swap space (unless we go back to sparse fallocate which i thought breaks on newer kernels?) is going to break other jobs which rely on that available disk. we've already had folks run into the 1gb decrease from the current swap dd causing job failures in providers with limited disk | 15:18 |
fungi | you know you can adjust how much swap devstack creates in your job configuration, right? | 15:18 |
fungi | it seems to me that any build trying to use >1gb swap is in for a rough time for other reasons, and probably means severe slowdowns occupying nodes out of our quota for far longer than necessary | 15:20 |
clarkb | also 8gb of swap will take much longer to allocate. I've pointed out periodically for years now that openstack software and jobs could really use a diet. Many of them over-deploy services and memory use has skyrocketed since the beginning of the project | 15:20 |
Tengu | fungi: 1gb of swap is kind of similar to nothing.... ? Frankly, I don't care - a backport is blocked due to resource limitations, and it's a bit annoying. Passing to 8gb makes it pass, and that swap thing is NOT related to the change itself. | 15:21 |
Tengu | maybe we can cut that thing in two and push 4gb instead? | 15:22 |
Tengu | a bit less than 8, but a bit more than 1, and everyone will be happy? | 15:22 |
clarkb | you can set it on your job is what fungi is saying | 15:22 |
*** hashar is now known as hasharOut | 15:22 | |
clarkb | rather than trying to make it one size fits all | 15:22 |
clarkb | but also 1gb should be plenty | 15:22 |
Tengu | and do some more hit'n'fail the whole way down.. how to lose time -.- | 15:23 |
clarkb | the swap on the test nodes is there to keep a job on the edge from failing. It is not there to double available "memory" | 15:23 |
* Tengu remembers the old days where swap was usually twice the ram | 15:23 | |
fungi | yeah, but that doesn't tend to work so well on virtual machines | 15:24 |
fungi | the idea is to add enough swap so that inactive or infrequently accessed pages can be stashed on disk, but 1gb is usually far more than enough for that especially in an hour-long job | 15:25 |
fungi | it's so that you can free up more ram for things like filesystem cache to speed up execution time | 15:25 |
Tengu | doesn't seem to be enough for the 2 devstack jobs on this change: https://review.opendev.org/760911 though it's not really part of the change itself. | 15:25 |
frickler | one day I would also like to learn from our resource providers whether our quotas are really limited by ram or more by cpu. in my own clouds, a flavor with 8 vcpu would naturally have 16-24 gig of ram | 15:26 |
fungi | it's not that there's not enough swap, it's that there's not enough ram. if the kernel is trying to stick things in swap because there's truly no more room in ram, then you're sunk | 15:26 |
Tengu | usually the limit is more CPU than RAM in cloud providers.... | 15:26 |
Tengu | remembering that time I worked with AWS, the main issue was getting enough cores, not enough ram. | 15:26 |
Tengu | ram is cheap, compared to cpu. | 15:27 |
clarkb | Tengu: yes and for years I have warned openstack of this problem | 15:27 |
clarkb | no one seems to care | 15:27 |
Tengu | and with motherboards supporting 4TB of ram... :) | 15:27 |
Tengu | clarkb: well.... I know tripleo is something terrible for the ram (and the whole openstack world in the end) | 15:27 |
clarkb | some services are much larger than others and the privsep daemons are not small | 15:28 |
fungi | part of why we limit the amount of ram to a consistent amount across our providers is that's what jobs tend to try to use the most of, and this is a means of assuring that projects design their tests and test frameworks to not need more than that (so that people can also run them in their personal developer environments or in local virtual machines) | 15:28 |
fungi | when jobs start running out of memory, "just throw some more ram at the problem" isn't indefinitely scalable | 15:29 |
fungi | throw some more effort at not wasting as much ram | 15:29 |
fungi | the root of the issue is that, for the most part, developers are happy not to really have to care how much ram a piece of software uses until they run out, and then it's a scramble to try and figure out how to put the entire project on a diet | 15:31 |
Tengu | so in this case, it's not on me - just trying to push a change that missed a backport. basically, I just don't care, downstream has been done - I'm not the right one to talk to about resources for tests -.- | 15:32 |
fungi | also stated as "enjoy your tragedy of the commons" | 15:32 |
fungi | (it's not my fault the park is full of litter, i just wanted to take a walk in it) | 15:32 |
Tengu | and, well, going from 8gb to 1gb of swap IS an infra change, and it's no wonder it makes things fail - the project I'm hitting now didn't get any update for the past 5 months or so in that branch.... so it's a bit "meh" in the end. | 15:33 |
clarkb | do the jobs run cinder backup service? | 15:33 |
fungi | if more people brought a trash back with them when they went for a walk, this would be far less of an issue | 15:33 |
fungi | er, a trash bag | 15:34 |
clarkb | iirc the one thing that was done before was disabling that since no tempest tests exercise it | 15:34 |
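If that is the approach, a minimal local.conf sketch of what disabling the backup service looks like in devstack (assuming c-bak is still the devstack service name for cinder-backup):

    [[local|localrc]]
    # drop cinder-backup from the deployed services to save memory
    disable_service c-bak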
Tengu | not that I know... it's apparently testing the openstacksdk itself. | 15:34 |
fungi | Tengu: yes, it's an infra change to allow you to use newer linux kernels. we could instead just decide not to let you test on newer kernels | 15:34 |
Tengu | no idea what it does, but it's apparently running multiple things in the same fashion tempest does | 15:34 |
clarkb | Tengu: that infra change is a result of ext4 breaking sparse allocation for swap files | 15:34 |
clarkb | we dont have control over that | 15:34 |
*** dklyle has joined #openstack-infra | 15:34 | |
Tengu | would putting 4g as a global value be acceptable? | 15:35 |
*** wolsen has joined #openstack-infra | 15:35 | |
fungi | well, for starters, this discussion is about devstack specifically, so speaking about defaults in devstack it's a question of how many projects will need to adjust the swap value up in their builds to get more virtual memory, vs how many will need to adjust it down to get more disk | 15:36 |
fungi | ideally a good default would minimize both of those | 15:36 |
Tengu | 4gigs is probably low enough to have no impact on the actual disk space. | 15:37 |
Tengu | there were 8, before.... | 15:37 |
frickler | fungi: do you know which project would be low on disk? the change also affected at least tripleo | 15:37 |
clarkb | it was sparseallocated before | 15:37 |
Tengu | unless the disk size was also reduced? | 15:37 |
clarkb | so not actually 8gb in most cases | 15:37 |
Tengu | humpf.. | 15:37 |
clarkb | it would be 0 unless used | 15:37 |
Tengu | thanks ext4 -.- | 15:37 |
fungi | yeah, it was using 0gb disk before unless you started paging things out to swap | 15:37 |
fungi | however, there's the added problem that jobs using more swap are likely doing so not to improve performance but because they've run out of ram, so they're going to start crawling when active memory is being read from/written to swap, which means the job will run way longer and consume more overall quota in our providers which could be used to run different (better performing) jobs instead | 15:38 |
frickler | that's all fine in theory, but nobody is maintaining devstack anymore, in particular stable branches | 15:39 |
fungi | and on top of that, swap thrash tortures the storage in those providers and in many cases drags down overall i/o for other server instances on the same compute nodes or the same storage networks/backends, leading to "noisy neighbor" issues which actually impact our other builds | 15:39 |
fungi | so choosing a smaller default swap value helps force jobs to abort early (oom) if they try to use too much memory | 15:40 |
fungi | rather than making things worse for everyone | 15:40 |
Tengu | is there a way to edit the .zuul.yaml thing in order to actually set the ram we need to make it pass? | 15:40 |
fungi | frickler: it's okay if people don't maintain software in stable branches any more. just eol those branches | 15:41 |
fungi | that's literally what eol means: "nobody's maintaining this" | 15:41 |
Tengu | stable/train isn't going EOL anytime soon. | 15:41 |
Tengu | devstack, on the other hand..... | 15:41 |
fungi | it totally can if openstack wants it to | 15:41 |
*** mordred has joined #openstack-infra | 15:41 | |
Tengu | and, well, it's more masking the actual problem.... | 15:42 |
Tengu | like passing those jobs as non-voting, in the end. | 15:42 |
fungi | it's a project decision. either find people who want to maintain those jobs and frameworks, or don't and acknowledge to your users that it's not happening | 15:42 |
Tengu | yay - all green. | 15:42 |
tosky | clarkb: I've just noticed the question - in cinder we re-enabled cinder-backup on a few jobs | 15:43 |
fungi | i'm not an active core reviewer on devstack so i'm not going to tell the devstack maintainers not to change their default swap size, but i think it's likely to lead to problems and it's not overall a responsible use of test resources. on the other hand, stable branches get far fewer changes so maybe changing the default in stable/train and earlier is an acceptable compromise? | 15:44 |
fungi | or maybe just un-backporting the fallocate->dd change on branches which don't run on newer kernels | 15:45 |
clarkb | ironic in particular had disk space issues iirc | 15:45 |
clarkb | may want to double check with them at least | 15:45 |
sean-k-mooney | fungi: well really none of the ci jobs should be swapping | 15:45 |
sean-k-mooney | we all know the amount of ram available in the ci vms is 8GB | 15:46 |
sean-k-mooney | we should not be running tests in such a way that we swap in the first place | 15:46 |
*** mordred has quit IRC | 15:47 | |
sean-k-mooney | if we need to drop the test concurrency to make that happen or use smaller guest vms that is what we should do instead of swapping | 15:47 |
*** wolsen has quit IRC | 15:47 | |
Tengu | (for the record, the undercloud is supposed to get 23gb or so.... just saying) | 15:48 |
sean-k-mooney | 23GB of what? cinder space or swap? | 15:50 |
Tengu | memory. | 15:50 |
sean-k-mooney | no its not | 15:50 |
sean-k-mooney | not in the ci | 15:50 |
Tengu | according to the specs we provide customers. | 15:50 |
*** bnemec has quit IRC | 15:50 | |
sean-k-mooney | not according to the spec we provide for test vms | 15:50 |
Tengu | 8gigs without any swap for the undercloud is not working (tested here, even the undercloud deploy has issues). anyway.... | 15:51 |
Tengu | we'll do the change in devstack itself, and be off with it. | 15:51 |
Tengu | next time I get this kind of issue, I'll probably just abandon the patch and go on other tasks. | 15:52 |
sean-k-mooney | https://docs.opendev.org/opendev/infra-manual/latest/testing.html | 15:52 |
sean-k-mooney | There is at least 8GB of system memory available. | 15:52 |
clarkb | Tengu: right we're dealing with the resources available to us | 15:52 |
clarkb | unfortunately there are no good answers and everything is essentially a compromise | 15:52 |
clarkb | but we're doing our best and when we noticed memory was a problem I sent out the alarm | 15:52 |
clarkb | but its been many months and very little has happened as a result | 15:53 |
Tengu | couldn't we move to xfs instead? iirc it's the default for centos/rhel anyway.. ? | 15:53 |
sean-k-mooney | Tengu: this is really a ooo issue in that it has not kept its minimum requirements within the resources available to test it | 15:53 |
clarkb | Tengu: xfs doesn't do sparse allocations either | 15:53 |
clarkb | aiui it never has whereas ext4 has regressed | 15:53 |
Tengu | sean-k-mooney: not even ooo in fact... devstack isn't ooo, is it? | 15:53 |
sean-k-mooney | no but devstack runs fine within the requirements you have depending on what services you enable | 15:54 |
Tengu | clarkb: or maybe allocate an actual volume to the VM? though it makes things a bit more complicated to handle. | 15:54 |
sean-k-mooney | you just have to configure the services to work within the 8GB ram and 80GB disk requirements | 15:54 |
clarkb | Tengu: we've tried that too and it doesn't work reliably because nova and cinder leak the volumes | 15:54 |
clarkb | Tengu: then you run out of volume quota and can't boot more instances | 15:54 |
Tengu | sean-k-mooney: yeah - well. in this case, again, I didn't do anything with the job description. Just applying a |#ßæð backport that applied without any issue..... so meh, again. | 15:55 |
clarkb | but you also can't delete the leaked volumes because cinder and nova disagree on them being used or something | 15:55 |
clarkb | (also not all clouds support that) | 15:55 |
Tengu | clarkb: erf.... "great".... | 15:55 |
clarkb | and then you can't delete the old images that the volumes use ... | 15:56 |
clarkb | it turns into a large chain reaction of pain | 15:56 |
Tengu | guess so. Happy to NOT use anything like that for my local lab then. just plain libvirt with some ansible... | 15:57 |
sean-k-mooney | clarkb: the image deletion depends on if you are doing a shallow copy or not but ya | 15:57 |
Tengu | anyway. we'll push the change on devstack, since it's the one thing causing the issue. | 15:57 |
Tengu | frickler: guess you'll abandon your patch against openstack-zuul-jobs? | 15:58 |
clarkb | my suggestion for right now would be to bump up the value on a per job basis | 15:58 |
clarkb | that way we don't break ironic with a global change | 15:58 |
Tengu | basically https://review.opendev.org/757488 | 15:59 |
clarkb | and longer term figuring out how to make the jobs leaner would be excellent. I think my original thread on this had a bunch of notes on how people can do that | 15:59 |
*** mordred has joined #openstack-infra | 15:59 | |
Tengu | or deprecate devstack. | 16:00 |
Tengu | that's probably what will happen. eventually. | 16:00 |
sean-k-mooney | we can deprecate devstack after we deprecate ooo | 16:00 |
Tengu | first time I see devstack jobs actually. | 16:00 |
sean-k-mooney | devstack is actully one of the better installers IMO | 16:00 |
clarkb | it works well for CI | 16:01 |
clarkb | definitely has its problems | 16:01 |
sean-k-mooney | it does but its easy to debug vs the complexity of others | 16:01 |
* Tengu sticks to his tripleo-lab | 16:01 | |
sean-k-mooney | i find myself gravitating to the simpler installers like devstack and kolla-ansible more | 16:01 |
sean-k-mooney | i had hopes for ooo standalone but since i cant get our deployment folks to support that in multinode i have basically given up on that too | 16:02 |
Tengu | standalone isn't for multinode, indeed.. | 16:02 |
Tengu | anyway. we'll see if it merges - downstream has been done, I'm fine with it. | 16:03 |
Tengu | now is EOD for me. see you around | 16:03 |
zbr | fungi: clarkb: small doc fix on bindep regarding undocumented compile tag: https://review.opendev.org/#/c/751253/ | 16:03 |
zbr | i think some would find it useful | 16:03 |
*** gyee has joined #openstack-infra | 16:14 | |
clarkb | http://lists.openstack.org/pipermail/openstack-discuss/2019-July/007493.html found the original thread from last year | 16:14 |
clarkb | Tengu: sean-k-mooney ^ fyi | 16:14 |
*** dwilde has joined #openstack-infra | 16:22 | |
*** d34dh0r53 has quit IRC | 16:24 | |
*** dwilde has quit IRC | 16:24 | |
*** d34dh0r53 has joined #openstack-infra | 16:24 | |
*** gyee has quit IRC | 16:31 | |
*** dave-mccowan has joined #openstack-infra | 16:36 | |
*** wolsen has joined #openstack-infra | 16:36 | |
*** gyee has joined #openstack-infra | 16:41 | |
*** gyee has quit IRC | 16:45 | |
*** rpittau is now known as rpittau|afk | 16:46 | |
*** dtantsur is now known as dtantsur|afk | 16:47 | |
*** bnemec has joined #openstack-infra | 17:00 | |
*** hamalq has joined #openstack-infra | 17:01 | |
*** lpetrut has joined #openstack-infra | 17:02 | |
ade_lee | clarkb, hey - so I tried an experiment yesterday where I commented out the fips bits and just had the node reboot. https://zuul.opendev.org/t/openstack/build/16680efdff1a4b9d8153783ec002a779/log/job-output.txt#871-882 | 17:05 |
ade_lee | clarkb, other than the weirdness that the reboot did not just do it immediately, what is notable is that it doesn't come back | 17:05 |
clarkb | looks like it did come back though? | 17:06 |
clarkb | you need to reenable the logger daemon after the reboot which was the original role I shared | 17:06 |
ade_lee | also - I noticed this from the previous run -- where fips is enabled .. https://zuul.opendev.org/t/openstack/build/94645ae4c6b741f0ac352028f29e45a6 | 17:06 |
clarkb | but it appears the job continues post reboot | 17:06 |
clarkb | does fips change the hostkeys somehow? | 17:07 |
ade_lee | clarkb, well - when it continues without fips, we get this .. | 17:07 |
ade_lee | https://zuul.opendev.org/t/openstack/build/16680efdff1a4b9d8153783ec002a779 | 17:07 |
ade_lee | see the perms denied .. | 17:07 |
clarkb | right but the reboot succeeds? | 17:08 |
clarkb | basically reboot failing seems to be related to fips somehow? | 17:08 |
ade_lee | clarkb, several folks I've chatted to -- weshay|ruck for instance, indicated to me that nodepool reclaims the node when its rebooted | 17:08 |
clarkb | no it does not | 17:08 |
clarkb | many many many years ago during the zuulv2 days this was true but it hasn't been true since the zuulv3 upgrade | 17:09 |
clarkb | your example shows that this seems to work fine? | 17:09 |
ade_lee | ok cool - thats good to know -- can you repost the role that restarts the logger? | 17:10 |
clarkb | https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/start-zuul-console is the role to run post reboot to get the console log back | 17:10 |
clarkb | without that you see the 2020-11-02 21:12:38.076687 | [controller] Waiting on logger entries from your example | 17:10 |
fungi | so just including "start-zuul-console" after the reboot ought to do it | 17:11 |
clarkb | and then it is later that the job fails | 17:11 |
ade_lee | I'll try it - as for fips mucking with the keys -- that would be unexpected | 17:11 |
*** elod has quit IRC | 17:11 | |
*** elod has joined #openstack-infra | 17:12 | |
ade_lee | but if it works when I restart the logger, then something fips related is clearly a problem .. | 17:12 |
ade_lee | clarkb, fungi thanks -- off to experiment some more .. | 17:12 |
*** dave-mccowan has quit IRC | 17:12 | |
*** jcapitao has quit IRC | 17:15 | |
*** Ajohn has joined #openstack-infra | 17:16 | |
*** dave-mccowan has joined #openstack-infra | 17:16 | |
*** gfidente is now known as gfidente|afk | 17:21 | |
*** fresta has quit IRC | 17:22 | |
*** sshnaidm|rover is now known as sshnaidm|afk | 17:22 | |
*** lucasagomes has quit IRC | 17:24 | |
*** dikonoor has joined #openstack-infra | 17:27 | |
*** ociuhandu has quit IRC | 17:31 | |
*** dikonoor has quit IRC | 17:36 | |
clarkb | just to confirm the mkswap man page seems to indicate that xfs + fallocate wouldn't work either | 17:45 |
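A minimal shell sketch of the difference being discussed (sizes and paths are illustrative; the point is that a dd-written swap file consumes real disk up front, while the old sparse fallocate file did not):

    # old approach: instant and sparse, using ~0 bytes until pages are actually swapped out,
    # but newer kernels refuse such files because they appear to have holes
    fallocate -l 8G /swapfile
    # current approach: works everywhere, but every byte is written to disk up front
    dd if=/dev/zero of=/swapfile bs=1M count=1024   # 1G default
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile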
*** tosky has quit IRC | 17:57 | |
*** derekh has quit IRC | 18:03 | |
*** dwilde has joined #openstack-infra | 18:05 | |
*** d34dh0r53 has quit IRC | 18:05 | |
*** lpetrut has quit IRC | 18:26 | |
*** hamalq has quit IRC | 18:27 | |
*** hamalq has joined #openstack-infra | 18:27 | |
*** arxcruz has quit IRC | 18:28 | |
*** gyee has joined #openstack-infra | 18:32 | |
*** jpena is now known as jpena|off | 18:33 | |
*** ralonsoh has quit IRC | 18:49 | |
*** eolivare has quit IRC | 18:52 | |
*** jamesdenton has quit IRC | 18:54 | |
*** ianw_pto is now known as ianw | 18:59 | |
*** vishalmanchanda has quit IRC | 19:02 | |
*** arxcruz has joined #openstack-infra | 19:28 | |
ade_lee | clarkb, fungi I think we figured out the problem. This is the error message we are getting .. http://paste.openstack.org/show/799667/ | 19:32 |
fungi | ade_lee: indeed, looks like maybe fips mode could have cleared and regenerated the ssh host key? | 19:33 |
fungi | or otherwise changed how it's being served | 19:34 |
ade_lee | clarkb, fungi the problem is not that the keys are regenerated, but that we default to using ed25519 keys when we first set things up | 19:34 |
ade_lee | and those types of keys are disabled in fips | 19:34 |
fungi | aha! | 19:34 |
fungi | good point | 19:34 |
fungi | we'd need zuul to add all available host keys to the known_hosts for that build i guess | 19:35 |
ade_lee | yup -- or default to using ecdsa perhaps | 19:36 |
clarkb | I thought ecdsa was considered less secure ... | 19:36 |
clarkb | something about unexplainable magic numbers | 19:36 |
*** andrewbonney has quit IRC | 19:37 | |
ade_lee | clarkb, not sure - but the logs show that its trying to use ecdsa and getting a ed25519 key | 19:38 |
clarkb | ya the issue is we're only registering the ed25519 key | 19:38 |
clarkb | so when it gets something else it can't verify it. As fungi mentions we can probably have zuul write them all down | 19:38 |
fungi | fips isn't about being "secure" it's about being predictable and vetted | 19:39 |
fungi | the usg (via fips) doesn't trust some newer and ostensibly stronger ciphers and hashes because it hasn't confidently vetted them yet | 19:40 |
ade_lee | clarkb, fungi so - how do we get zuul to write all the keys down? | 19:40 |
fungi | ade_lee: i'll have to think about it more when we're not in the middle of our weekly meeting | 19:40 |
ade_lee | fungi, ack thanks | 19:41 |
fungi | corvus: ^ this is an interesting corner case for zuul's handling of node ssh host keys | 20:01 |
fungi | i'm refreshing my memory now on where we do the ssh-keygen call to fetch those | 20:01 |
clarkb | fungi: its in nodepool utils.nodescan | 20:06 |
clarkb | then passed all the way up the chain from there | 20:06 |
fungi | aha, so we'd need to plumb it through zk, maybe need to change the message format? | 20:09 |
fungi | though perhaps multiline content there is fine | 20:09 |
clarkb | well it seems nodepool grabs all the keys | 20:09 |
clarkb | I'm working from there to see if I can find where we might select the key type to use | 20:09 |
clarkb | hrm it seems like zuul writes all the keys it gets | 20:11 |
clarkb | is it possible that centos-8 is only generating ed25519 by default? | 20:11 |
clarkb | oh unless it is paramiko only grabbing a single key | 20:13 |
clarkb | https://opendev.org/zuul/nodepool/src/branch/master/nodepool/nodeutils.py#L88 that may be it | 20:13 |
clarkb | everything else assumes a list but there we may only grab a single key | 20:13 |
clarkb | the comment in there also says paramiko only grabs the rsa key | 20:14 |
clarkb | I'm guessing that has changed :) | 20:14 |
clarkb | http://docs.paramiko.org/en/stable/api/transport.html#paramiko.transport.Transport.get_remote_server_key is very quiet about what that actually does | 20:15 |
*** jamesdenton has joined #openstack-infra | 20:17 | |
clarkb | skimming the code I think it is using the host key that was used to establish the connection, so not a true scan | 20:17 |
clarkb | what we want is an ssh-keyscan equivalent for paramiko | 20:21 |
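For comparison, a quick sketch of how the openssh tooling does this from the command line (the address is a placeholder); ssh-keyscan probes each requested algorithm separately, which is the behaviour being looked for in paramiko:

    # collect every host key type the server offers and append them to known_hosts
    ssh-keyscan -t rsa,ecdsa,ed25519 203.0.113.10 >> ~/.ssh/known_hosts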
*** tosky has joined #openstack-infra | 20:27 | |
clarkb | reading rfc 4253 I think what you can do is a series of connections each with a different supported host key type by the client | 20:32 |
clarkb | not sure if paramiko exposes that though | 20:32 |
*** slaweq has quit IRC | 20:33 | |
*** slaweq has joined #openstack-infra | 20:44 | |
*** zxiiro has joined #openstack-infra | 20:59 | |
*** raissa has joined #openstack-infra | 21:01 | |
*** hasharOut is now known as hashar | 21:12 | |
*** rcernin has joined #openstack-infra | 21:18 | |
*** rcernin has quit IRC | 21:18 | |
*** rcernin has joined #openstack-infra | 21:19 | |
clarkb | ade_lee: fungi https://review.opendev.org/761229 is a rudimentary attempt at addressing this in nodepool | 21:21 |
*** rfolco has quit IRC | 21:24 | |
*** rfolco has joined #openstack-infra | 21:24 | |
ade_lee | clarkb, cool thanks -- I'll add as a depends-on in my patch and see if it works! | 21:24 |
clarkb | well depends on won't help here | 21:24 |
clarkb | that is part of the nodepool service running in production that hands out test nodes and isn't speculatively applied | 21:25 |
ade_lee | clarkb, ah ok | 21:25 |
*** raissa has quit IRC | 21:25 | |
ade_lee | clarkb, so I have to wait for it to be merged to try it out? | 21:26 |
clarkb | fwiw it seems that paramiko is what prefers ed25519 | 21:26 |
*** ChanServ has quit IRC | 21:26 | |
clarkb | https://github.com/paramiko/paramiko/blob/dccc563c3a581a14ceaa261b5fe0d08539f84b87/paramiko/transport.py#L171-L178 is the list I think | 21:26 |
clarkb | and that is why we end up using that by default | 21:27 |
ade_lee | clarkb, one of the reasons paramiko is not fips certified .. | 21:28 |
clarkb | I mean you can disable it which is essentially how my change works | 21:28 |
clarkb | but also I'm not really convinced of the value of fips if it bypasses ed25519 in favor of ecdsa | 21:28 |
clarkb | since literally all the writing on the subject seems to be that ed25519 is far superior | 21:28 |
*** rfolco has quit IRC | 21:29 | |
ade_lee | clarkb, as fungi mentioned, fips is all about using things that are vetted and predictable -- you have to use fips certified algorithms etc. | 21:30 |
clarkb | right but the reason people don't like ecdsa is it has unvettable magic numbers | 21:30 |
clarkb | whereas the other algorithm is actually understandable | 21:30 |
clarkb | (which makes me think it isn't even achieving that goal) | 21:31 |
clarkb | but I guess if you are the same entity creating the fips vetting and the magic numbers... | 21:31 |
ade_lee | :/ | 21:31 |
*** ChanServ has joined #openstack-infra | 21:32 | |
*** tepper.freenode.net sets mode: +o ChanServ | 21:32 | |
clarkb | and ya apparently it is NIST in both cases | 21:32 |
ade_lee | yup | 21:32 |
fungi | clarkb: "unvetted" depends on who's doing the vetting | 21:46 |
fungi | ahh, yeah, as you said | 21:46 |
fungi | fips is really mostly only for the usg and contractors... anyone who doesn't trust "certain three-letter agencies" should probably not use fips mode | 21:47 |
*** gfidente|afk has quit IRC | 21:52 | |
*** matt_kosut has quit IRC | 21:59 | |
*** hashar has quit IRC | 22:01 | |
*** slaweq has quit IRC | 22:02 | |
*** aedc has joined #openstack-infra | 22:03 | |
*** aedc has quit IRC | 22:04 | |
*** sboyron has quit IRC | 22:34 | |
*** tkajinam has joined #openstack-infra | 22:59 | |
*** tosky has quit IRC | 23:01 | |
*** mmethot has quit IRC | 23:18 | |
*** dchen has joined #openstack-infra | 23:34 | |
*** mmethot has joined #openstack-infra | 23:34 |