15:03:11 <bnemec> #startmeeting oslo
15:03:12 <openstack> Meeting started Mon Feb 11 15:03:11 2019 UTC and is due to finish in 60 minutes. The chair is bnemec. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:15 <openstack> The meeting name has been set to 'oslo'
15:03:18 <bnemec> courtesy ping for amotoki, amrith, ansmith, bnemec, dansmith, dhellmann, dims
15:03:18 <bnemec> courtesy ping for dougwig, e0ne, electrocucaracha, flaper87, garyk, gcb, haypo
15:03:18 <bnemec> courtesy ping for hberaud, jd__, johnsom, jungleboyj, kgiusti, kragniz, lhx_
15:03:18 <bnemec> courtesy ping for moguimar, njohnston, raildo, redrobot, sileht, sreshetnyak, stephenfin
15:03:19 <bnemec> courtesy ping for stevemar, therve, thinrichs, toabctl, zhiyan, zxy, zzzeek
15:03:23 <stephenfin> o/
15:03:29 <redrobot> ol
15:03:32 <redrobot> o/
15:03:34 <redrobot> \o
15:03:36 <moguimar> xD
15:03:36 <jungleboyj> o/
15:03:44 <moguimar> I was waiting in the wrong channel
15:03:56 <bnemec> My previous meeting ran over a bit.
15:04:05 <bnemec> #link https://wiki.openstack.org/wiki/Meetings/Oslo#Agenda_for_Next_Meeting
15:04:07 <johnsom> o/
15:05:20 <bnemec> #topic Red flags for/from liaisons
15:05:53 <bnemec> I've been kind of out of touch for the past two weeks, so if there's anything going on let me know.
15:05:59 <johnsom> Nothing to report here.
15:06:41 * jungleboyj has been out of touch as well. :-)
15:07:04 <bnemec> jungleboyj: Yours involved a lot more bbq than mine. :-)
15:07:22 <jungleboyj> He he he. Yeah, and the scale shows it this morning. :-(
15:08:34 <bnemec> Okay, as always you don't have to wait for the meeting to bring up issues, so if there is anything just let the Oslo team know.
15:08:37 <bnemec> #topic Releases
15:09:04 <bnemec> I released oslo.utils before I left on PTO so we could get the EventletEvent fix out there.
15:09:20 <bnemec> Skimming my emails this morning I didn't see that it made anything explode.
15:09:34 <bnemec> I'll do the usual set of releases today.
15:10:02 <bnemec> #topic Action items from last meeting
15:10:10 <bnemec> "kgiusti to try to reproduce https://bugs.launchpad.net/oslo.messaging/+bug/1800957 with higher thread count"
15:10:11 <openstack> Launchpad bug 1800957 in oslo.messaging "Upgrading to pike version causes rabbit timeouts with ssl" [High,Confirmed] - Assigned to Ken Giusti (kgiusti)
15:10:19 <bnemec> I believe this is done.
15:10:35 <bnemec> And IIRC there was even a fix posted to the bug.
15:11:35 <bnemec> And that was it for action items.
15:11:52 <bnemec> #topic Privsep and code that forks
15:11:59 <bnemec> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001831.html
15:12:41 <bnemec> This came up because the Neutron team was having another issue with the threaded privsep change.
15:13:00 <bnemec> It turns out that the library they called had a fork in it.
15:13:16 <johnsom> I think we ran into this as well. Someone was attempting to add privsep to our agent but was having problems.
15:13:27 <bnemec> This very much does not play nicely with threads, as you can see from the links in my email.
15:14:44 <bnemec> I added this to the agenda because I don't have a great answer to it.
15:15:28 <bnemec> We may just need to come up with a way to make a fork-safe privsep call (whether that's running in the main thread or a completely separate daemon) and deal with these issues as we find them.
15:15:49 <bnemec> But if anyone has a better idea please share. :-)
15:16:04 <dhellmann> is the fork happening inside secure side of the privsep call?
15:16:19 <bnemec> dhellmann: Yes
15:16:26 <bnemec> It was in a library called by the privileged function.
15:16:27 * stephenfin tends to bury his head in the sand when it comes to all things privsep and is of absolutely no help :(
15:16:50 <dhellmann> what is the fork doing?
15:16:55 <bnemec> stephenfin: This is why "privsep expert" is high on my list of wants in every Oslo project update. ;-)
15:17:58 <dhellmann> is it the ssh that triggers the failure?
15:18:03 <bnemec> It was something to do with netns management: https://github.com/svinota/pyroute2/blob/master/pyroute2/netns/nslink.py#L146-L147
15:18:49 <bnemec> Ah: https://github.com/svinota/pyroute2/blob/master/pyroute2/netns/nslink.py#L108
15:18:51 <dhellmann> ok, well, the whole point of privsep is to avoid having to do forks
15:18:56 <gsantomaggio> about : https://bugs.launchpad.net/oslo.messaging/+bug/1800957
15:18:58 <gsantomaggio> we fixed the problem here: https://github.com/celery/py-amqp/pull/247
15:18:58 <openstack> Launchpad bug 1800957 in oslo.messaging "Upgrading to pike version causes rabbit timeouts with ssl" [High,Confirmed] - Assigned to Ken Giusti (kgiusti)
15:18:58 <bnemec> That's why it's forking.
15:19:23 <bnemec> gsantomaggio: Yep, thanks for looking into that!
15:19:33 <dhellmann> it sounds like maybe the privsep API needs an "init hook" to set stuff like this up
15:19:58 <bnemec> The problem is it isn't our code.
15:20:04 <dhellmann> well, it's the structure
15:20:19 <dhellmann> we need to give the secure side of the privsep api a chance to do some setup before we start making threads
15:20:48 <bnemec> I'm not sure this is something that could be done ahead of time though.
15:20:57 <dhellmann> oh, no?
15:20:59 <bnemec> Neutron doesn't have a complete list of netns's at startup.
15:21:03 <dhellmann> ah
15:21:18 <bnemec> At least I don't think so, since they can be created by new networks and such.
15:21:22 <dhellmann> oh, sure
15:22:02 <dhellmann> I guess we need to give the secure functions a way to do something in the "main" thread then
15:22:20 <dhellmann> or maybe we just say this is not a good candidate for privsep, I don't know
15:23:08 <bnemec> Yeah, it seems like we could add an in-process command to the privsep protocol.
15:23:33 <dhellmann> I guess the trick is that the secure code in the thread has to be able to trigger it
15:23:39 <bnemec> Alternatively, neutron got around this by just synchronizing all of the privileged calls in this module.
15:23:50 <dhellmann> that seems like a reasonable approach, too
15:23:58 <dhellmann> at least as a stop-gap
15:24:03 <bnemec> I don't know whether that's entirely safe either, but it at least keeps multiple calls from stepping on each other.
15:24:52 * dhellmann shrugs
15:25:46 <bnemec> Okay, sounds like adding in-process execution is probably our best bet here.
15:26:14 <bnemec> #action Investigate adding main thread execution to privsep
15:26:35 <bnemec> in-process really isn't the right term since all the calls run in the same process, just different threads.
15:26:38 <dhellmann> oh, maybe we can just flag certain privsep functions to run in process instead of in threads
15:26:49 <dhellmann> that might be simpler than having something in the main thread that the other threads can communicate with
15:27:03 <dhellmann> sorry, flag them to run in the main thread instead of a worker
15:28:05 <bnemec> Yeah, I'm wondering if we could add a privileged_synchronous decorator or something to indicate that we can't run the call asynchronously.
15:28:17 <dhellmann> right, I think that's likely to be the simplest approach
15:28:39 <dhellmann> I was thinking originally we'd have something the worker thread could do to cause work to happen in the main thread, but that's silly
15:28:52 <dhellmann> at some point we'd be building an operating system scheduler
15:29:17 <bnemec> Yeah, I _think_ we can do it this way, but I don't know the privsep code well enough to say for sure.
15:29:44 <dhellmann> it should just be an if statement at the point where we dispatch the work to a thread on the secure side
15:30:05 <dhellmann> either do the work immediately, or throw it into a thread
15:30:05 <bnemec> Yeah
15:31:06 <bnemec> Okay, sounds like we have a plan then.
15:31:13 <bnemec> I'll reply to the mailing list thread too for visibility.
15:31:31 <bnemec> #action bnemec to update openstack-discuss about privsep/fork issue
15:32:08 <bnemec> #topic Weekly Wayward Review
15:32:44 <bnemec> #link https://review.openstack.org/579186
15:33:01 <bnemec> Slightly different this week.
15:33:22 <dhellmann> it looks like we need a recheck there to build new versions of the docs, since the old logs have expired
15:33:23 <bnemec> We need someone to take this patch over and eliminate the duplication in the docs.
15:33:58 <dhellmann> also that, yes
15:34:54 <bnemec> Basically it's a good change that needs a bit of massaging. I've been meaning to do that but I keep not having time, so I'm putting out a call for help. :-)
15:35:44 <moguimar> o/
15:36:16 <moguimar> I can take it dhellmann
15:36:26 <bnemec> moguimar: Thanks!
15:36:31 <dhellmann> thanks, moguimar !
15:36:49 <bnemec> #action moguimar to take over https://review.openstack.org/579186
15:37:09 <bnemec> #topic Open discussion
15:37:17 <bnemec> Anything else?
15:37:26 <moguimar> yep
15:37:38 <moguimar> I spoke about oslo.config last week at FOSDEM
15:37:40 <moguimar> https://fosdem.org/2019/schedule/event/python_application_configuration/
15:37:58 * dhellmann still has that video queued up to watch
15:38:13 <bnemec> Nice
15:38:18 <moguimar> it is basically the same talk I gave at Python Brasil
15:38:39 <moguimar> I thought that I had enough rehearsal but it was in Portuguese
15:39:00 <moguimar> so I kinda translated on the fly and next time will do some English rehearsal too
15:39:06 <bnemec> :-)
15:39:23 <dhellmann> :-)
15:41:06 <bnemec> Any other topics?
15:41:59 <moguimar> not on my end
15:43:11 <bnemec> Okay, we'll give everyone 15 minutes back then.
15:43:14 <bnemec> Thanks for joining!
15:43:19 <bnemec> #endmeeting
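
Editor's note on the "Privsep and code that forks" topic: the plan agreed in the meeting (flag certain privileged functions so the secure side runs them immediately instead of handing them to a worker thread) might look roughly like the sketch below. This is only an illustration of the "if statement at the point where we dispatch" idea. Every name in it (privileged, synchronous, dispatch, fork_sensitive_call, ordinary_call) is hypothetical and is not the oslo.privsep API, and the real daemon's dispatch loop is not shown in the log.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-in for the worker pool on the secure side.
_pool = ThreadPoolExecutor(max_workers=4)


def privileged(func=None, *, synchronous=False):
    """Mark a function as privileged (hypothetical decorator).

    synchronous=True means the call must not be handed to a worker
    thread, for example because a library it calls forks internally.
    """
    def decorate(f):
        f._synchronous = synchronous
        return f
    return decorate(func) if func is not None else decorate


def dispatch(func, *args, **kwargs):
    """Either do the work immediately or throw it into a thread.

    A Future is returned on both paths so callers handle them alike.
    """
    if getattr(func, "_synchronous", False):
        future = Future()
        try:
            # Runs in the dispatching thread, not in the pool.
            future.set_result(func(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future
    return _pool.submit(func, *args, **kwargs)


@privileged(synchronous=True)
def fork_sensitive_call():
    # Stand-in for a call into a library that forks (assumption).
    return threading.current_thread().name


@privileged
def ordinary_call():
    return threading.current_thread().name
```

Calling dispatch(fork_sensitive_call).result() reports the name of the thread that called dispatch, while dispatch(ordinary_call).result() reports a pool worker. Whether running in the dispatching thread is enough to make a forking library safe in the real daemon is the open question recorded in the #action item above.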