15:01:40 <lamt> #startmeeting openstack-helm
15:01:44 <openstack> Meeting started Tue Nov 17 15:01:40 2020 UTC and is due to finish in 60 minutes. The chair is lamt. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:47 <openstack> The meeting name has been set to 'openstack_helm'
15:02:14 <lamt> #link https://etherpad.opendev.org/p/openstack-helm-weekly-meeting agenda
15:02:20 <stevthedev> Hello
15:02:31 <lamt> o/
15:02:41 <miniroy> \o
15:02:42 <cliffparsons> o/
15:03:02 <lamt> Our fearless leader is feeling unwell. He asked me to host this week's meeting
15:03:31 <miniroy> I am glad he/she decided to take a day off
15:04:12 <stevthedev> I hope they are getting some rest
15:04:32 <lamt> I hope *he* is too.
15:04:59 <lamt> I think we can get started.
15:05:29 <lamt> Quick reminder, there is no meeting next week
15:05:41 <lamt> #topic OSH compute gate failure
15:06:05 <lamt> This is a follow up from last week, but the gate is still broken
15:06:32 <miniroy> X)
15:06:35 <lamt> so no OSH patches can merge
15:07:05 <sangeet> oh no!
15:07:59 <stevthedev> Do we know how it is broken?
15:08:26 <lamt> Not exactly, conjecture is some system library in Bionic
15:08:37 <lamt> starts to act up
15:09:01 <miniroy> to be precise, causing openvswitch to gag when it can't create threads
15:09:32 <stevthedev> Sounds tricky :/
15:09:40 <miniroy> let me try that tweak today
15:09:52 <lamt> I tried to update all the old stuff in the gate (minikube version, helm version, k8s version)
15:10:14 <miniroy> all signs point to a system issue though
15:10:22 <lamt> (I think they should be updated anyway) - here was my last attempt that sort of worked
15:10:41 <miniroy> oh really... did that actually help?
15:10:45 <lamt> https://review.opendev.org/#/c/762361/
15:10:55 <lamt> Updating the host to focal worked
15:11:02 <lamt> "worked" - but
15:11:09 <lamt> it introduced new problems
15:11:33 <lamt> it crashes ceph and some other python/C library errored with Rocky
15:12:35 <lamt> I assume some system library difference between focal and bionic was the cause, but I wasn't able to pinpoint it
15:12:59 <miniroy> hmm.... so at least it confirms our suspicion
15:13:00 <lamt> if anyone wants to take a look, I'd appreciate that
15:13:21 <miniroy> I will take another crack at it today
15:14:00 <lamt> Thanks - if upgrading to focal isn't the correct path, we have to find the fix for ovs
15:14:07 <miniroy> so any other host we can try on besides vexx and focal?
15:14:36 <lamt> I believe Andrii tried to address the max task parameter, but from what I can tell it was never limited
15:14:43 <lamt> so setting it to infinity doesn't do anything
15:15:15 <lamt> I tried the 32gb node, and same error - doesn't look like it is system memory constrained.
15:17:07 <lamt> Thanks miniroy for taking a look
15:17:29 <miniroy> I wonder if something in the image changed.....
15:17:37 <miniroy> but these are great data points to have
15:17:58 <lamt> I tried that too - I reverted the ovs image to the date the last gate passed
15:18:04 <lamt> same failure :/
15:18:31 <miniroy> =(
15:19:19 <lamt> others can chime in as well - but I exhausted all the possibilities why ovs keeps failing
15:20:18 <miniroy> we don't have local access to any of these hosts right?
15:20:30 <lamt> I do not think so
15:22:20 <lamt> if it is not a system library - it is some constraint preventing pthread creation
15:22:51 <miniroy> any idea what kernel is running on focal?
15:24:00 <lamt> ansible_kernel: 5.4.0-53-generic
15:24:39 <miniroy> oh and that works actually... wow
15:24:58 <lamt> it breaks other things
15:25:03 <lamt> like ceph and rocky
15:25:09 <lamt> with cffi
15:26:10 <miniroy> so should we move ahead and try to get it to work with focal? or try to get it working with vexx?
15:26:52 <lamt> I will let others chime in - both require a bit of work
15:27:33 <lamt> it is asking if you want to fix errors A or errors B.
15:28:44 <miniroy> well let me poke at that ovs/vexx issue today more then
15:28:55 <lamt> thanks miniroy
15:28:58 <miniroy> sounds like probably more work at this point with upgrade to focal
15:29:11 <miniroy> looks like an onion to me I am afraid
15:29:25 <lamt> the alternative is to fix ovs
15:29:38 <miniroy> ceph is probably just the first layer I am afraid
15:29:56 <lamt> yup
15:30:14 <miniroy> in the absence of our fearless leader, I say let's focus on fixing ovs for now
15:30:30 <lamt> ++
15:31:16 <lamt> we can take this offline to troubleshoot
15:31:41 <miniroy> +
15:31:43 <lamt> let's move on
15:31:50 <lamt> #topic Open Discussion
15:32:28 <lamt> there is no other agenda item - so opening the floor for discussion
15:34:12 <lamt> if nothing else, we can end the meeting. Everyone have a great rest of the week.
15:34:19 <lamt> #endmeeting
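
The discussion above narrows the ovs failure to something preventing pthread creation, with the max task setting reported as already unlimited. A minimal sketch of how the usual Linux thread-creation limits could be dumped on a gate node follows. It is illustrative only: the helper names (read_first_line, fmt_limit, thread_limits) are hypothetical, it assumes the "max task parameter" maps to a cgroup pids limit, and it assumes a job step could run it on the host, since nobody has local access.

```python
import resource
from pathlib import Path


def read_first_line(path):
    """Return the first line of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text().splitlines()[0].strip()
    except (OSError, IndexError):
        return None


def fmt_limit(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def thread_limits():
    """Collect limits that commonly cap thread creation on Linux."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "RLIMIT_NPROC soft": fmt_limit(soft),
        "RLIMIT_NPROC hard": fmt_limit(hard),
        "kernel.threads-max": read_first_line("/proc/sys/kernel/threads-max"),
        "kernel.pid_max": read_first_line("/proc/sys/kernel/pid_max"),
        # Assumed cgroup v2 location; the file may be absent (for example in
        # the root cgroup or on cgroup v1 hosts), in which case None is kept.
        "cgroup pids.max": read_first_line("/sys/fs/cgroup/pids.max"),
    }


if __name__ == "__main__":
    for name, value in thread_limits().items():
        print(f"{name}: {value if value is not None else 'not available'}")
```

Running the same script inside the openvswitch container and on the host would show whether any limit differs between the two; if every value is unlimited or very large in both places, that would be consistent with the system-library theory rather than a resource cap.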