15:03:11 <bnemec> #startmeeting oslo
15:03:12 <openstack> Meeting started Mon Feb 11 15:03:11 2019 UTC and is due to finish in 60 minutes.  The chair is bnemec. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:15 <openstack> The meeting name has been set to 'oslo'
15:03:18 <bnemec> courtesy ping for amotoki, amrith, ansmith, bnemec, dansmith, dhellmann, dims
15:03:18 <bnemec> courtesy ping for dougwig, e0ne, electrocucaracha, flaper87, garyk, gcb, haypo
15:03:18 <bnemec> courtesy ping for hberaud, jd__, johnsom, jungleboyj, kgiusti, kragniz, lhx_
15:03:18 <bnemec> courtesy ping for moguimar, njohnston, raildo, redrobot, sileht, sreshetnyak, stephenfin
15:03:19 <bnemec> courtesy ping for stevemar, therve, thinrichs, toabctl, zhiyan, zxy, zzzeek
15:03:23 <stephenfin> o/
15:03:29 <redrobot> ol
15:03:32 <redrobot> o/
15:03:34 <redrobot> \o
15:03:36 <moguimar> xD
15:03:36 <jungleboyj> o/
15:03:44 <moguimar> I was waiting in the wrong channel
15:03:56 <bnemec> My previous meeting ran over a bit.
15:04:05 <bnemec> #link https://wiki.openstack.org/wiki/Meetings/Oslo#Agenda_for_Next_Meeting
15:04:07 <johnsom> o/
15:05:20 <bnemec> #topic Red flags for/from liaisons
15:05:53 <bnemec> I've been kind of out of touch for the past two weeks, so if there's anything going on let me know.
15:05:59 <johnsom> Nothing to report here.
15:06:41 * jungleboyj has been out of touch as well.  :-)
15:07:04 <bnemec> jungleboyj: Yours involved a lot more bbq than mine. :-)
15:07:22 <jungleboyj> He he he.  Yeah, and the scale shows it this morning.  :-(
15:08:34 <bnemec> Okay, as always you don't have to wait for the meeting to bring up issues, so if there is anything just let the Oslo team know.
15:08:37 <bnemec> #topic Releases
15:09:04 <bnemec> I released oslo.utils before I left on PTO so we could get the EventletEvent fix out there.
15:09:20 <bnemec> Skimming my emails this morning I didn't see that it made anything explode.
15:09:34 <bnemec> I'll do the usual set of releases today.
15:10:02 <bnemec> #topic Action items from last meeting
15:10:10 <bnemec> "kgiusti to try to reproduce https://bugs.launchpad.net/oslo.messaging/+bug/1800957 with higher thread count"
15:10:11 <openstack> Launchpad bug 1800957 in oslo.messaging "Upgrading to pike version causes rabbit timeouts with ssl" [High,Confirmed] - Assigned to Ken Giusti (kgiusti)
15:10:19 <bnemec> I believe this is done.
15:10:35 <bnemec> And IIRC there was even a fix posted to the bug.
15:11:35 <bnemec> And that was it for action items.
15:11:52 <bnemec> #topic Privsep and code that forks
15:11:59 <bnemec> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-January/001831.html
15:12:41 <bnemec> This came up because the Neutron team was having another issue with the threaded privsep change.
15:13:00 <bnemec> It turns out that the library they called had a fork in it.
15:13:16 <johnsom> I think we ran into this as well. Someone was attempting to add privsep to our agent but was having problems.
15:13:27 <bnemec> This very much does not play nicely with threads, as you can see from the links in my email.
15:14:44 <bnemec> I added this to the agenda because I don't have a great answer to it.
15:15:28 <bnemec> We may just need to come up with a way to make a fork-safe privsep call (whether that's running in the main thread or a completely separate daemon) and deal with these issues as we find them.
15:15:49 <bnemec> But if anyone has a better idea please share. :-)
15:16:04 <dhellmann> is the fork happening inside the secure side of the privsep call?
15:16:19 <bnemec> dhellmann: Yes
15:16:26 <bnemec> It was in a library called by the privileged function.
15:16:27 * stephenfin tends to bury his head in the sand when it comes to all things privsep and is of absolutely no help :(
15:16:50 <dhellmann> what is the fork doing?
15:16:55 <bnemec> stephenfin: This is why "privsep expert" is high on my list of wants in every Oslo project update. ;-)
15:17:58 <dhellmann> is it the ssh that triggers the failure?
15:18:03 <bnemec> It was something to do with netns management: https://github.com/svinota/pyroute2/blob/master/pyroute2/netns/nslink.py#L146-L147
15:18:49 <bnemec> Ah: https://github.com/svinota/pyroute2/blob/master/pyroute2/netns/nslink.py#L108
15:18:51 <dhellmann> ok, well, the whole point of privsep is to avoid having to do forks
15:18:56 <gsantomaggio> about : https://bugs.launchpad.net/oslo.messaging/+bug/1800957
15:18:58 <gsantomaggio> we fixed the problem here: https://github.com/celery/py-amqp/pull/247
15:18:58 <openstack> Launchpad bug 1800957 in oslo.messaging "Upgrading to pike version causes rabbit timeouts with ssl" [High,Confirmed] - Assigned to Ken Giusti (kgiusti)
15:18:58 <bnemec> That's why it's forking.
15:19:23 <bnemec> gsantomaggio: Yep, thanks for looking into that!
15:19:33 <dhellmann> it sounds like maybe the privsep API needs an "init hook" to set stuff like this up
15:19:58 <bnemec> The problem is it isn't our code.
15:20:04 <dhellmann> well, it's the structure
15:20:19 <dhellmann> we need to give the secure side of the privsep api a chance to do some setup before we start making threads
15:20:48 <bnemec> I'm not sure this is something that could be done ahead of time though.
15:20:57 <dhellmann> oh, no?
15:20:59 <bnemec> Neutron doesn't have a complete list of netns's at startup.
15:21:03 <dhellmann> ah
15:21:18 <bnemec> At least I don't think so, since they can be created by new networks and such.
15:21:22 <dhellmann> oh, sure
15:22:02 <dhellmann> I guess we need to give the secure functions a way to do something in the "main" thread then
15:22:20 <dhellmann> or maybe we just say this is not a good candidate for privsep, I don't know
15:23:08 <bnemec> Yeah, it seems like we could add an in-process command to the privsep protocol.
15:23:33 <dhellmann> I guess the trick is that the secure code in the thread has to be able to trigger it
15:23:39 <bnemec> Alternatively, neutron got around this by just synchronizing all of the privileged calls in this module.
15:23:50 <dhellmann> that seems like a reasonable approach, too
15:23:58 <dhellmann> at least as a stop-gap
15:24:03 <bnemec> I don't know whether that's entirely safe either, but it at least keeps multiple calls from stepping on each other.
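The stop-gap described here, serializing every privileged call in one module, can be sketched roughly as below. This is an illustration only, not Neutron's actual code: the `synchronized` decorator and `_module_lock` are hypothetical names, and a plain `threading.Lock` is assumed as the synchronization primitive.

```python
import functools
import threading

# Hypothetical module-wide lock shared by every privileged function in the
# module, so that at most one of them runs at any given time.
_module_lock = threading.Lock()


def synchronized(func):
    """Hypothetical decorator: serialize calls to func behind _module_lock."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Only one decorated call may be inside this block at a time, so two
        # calls that fork internally cannot step on each other.
        with _module_lock:
            return func(*args, **kwargs)

    return wrapper
```

As noted in the discussion, this only keeps the decorated calls from overlapping with one another. Other threads in the process may still be running when the fork happens, so it does not by itself make the fork safe.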
15:24:52 * dhellmann shrugs
15:25:46 <bnemec> Okay, sounds like adding in-process execution is probably our best bet here.
15:26:14 <bnemec> #action Investigate adding main thread execution to privsep
15:26:35 <bnemec> in-process really isn't the right term since all the calls run in the same process, just different threads.
15:26:38 <dhellmann> oh, maybe we can just flag certain privsep functions to run in process instead of in threads
15:26:49 <dhellmann> that might be simpler than having something in the main thread that the other threads can communicate with
15:27:03 <dhellmann> sorry, flag them to run in the main thread instead of a worker
15:28:05 <bnemec> Yeah, I'm wondering if we could add a privileged_synchronous decorator or something to indicate that we can't run the call asynchronously.
15:28:17 <dhellmann> right, I think that's likely to be the simplest approach
15:28:39 <dhellmann> I was thinking originally we'd have something the worker thread could do to cause work to happen in the main thread, but that's silly
15:28:52 <dhellmann> at some point we'd be building an operating system scheduler
15:29:17 <bnemec> Yeah, I _think_ we can do it this way, but I don't know the privsep code well enough to say for sure.
15:29:44 <dhellmann> it should just be an if statement at the point where we dispatch the work to a thread on the secure side
15:30:05 <dhellmann> either do the work immediately, or throw it into a thread
15:30:05 <bnemec> Yeah
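The "if statement at the point where we dispatch" idea can be sketched as follows. This is a toy model under stated assumptions, not oslo.privsep code: `privileged_synchronous`, `Dispatcher`, and the `_run_in_main_thread` marker are hypothetical names, and a `ThreadPoolExecutor` stands in for whatever worker mechanism the secure side really uses.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical marker attribute; not part of the oslo.privsep API.
_SYNC_ATTR = "_run_in_main_thread"


def privileged_synchronous(func):
    """Hypothetical decorator: flag func as unsafe to run in a worker thread."""
    setattr(func, _SYNC_ATTR, True)
    return func


class Dispatcher:
    """Toy stand-in for the secure-side loop that receives privileged calls."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def dispatch(self, func, *args, **kwargs):
        """Return a Future holding the result of func(*args, **kwargs)."""
        if getattr(func, _SYNC_ATTR, False):
            # Flagged call: do the work immediately in the dispatching thread.
            future = Future()
            try:
                future.set_result(func(*args, **kwargs))
            except Exception as exc:
                future.set_exception(exc)
            return future
        # Everything else is thrown into a worker thread.
        return self._pool.submit(func, *args, **kwargs)

    def shutdown(self):
        self._pool.shutdown(wait=True)


@privileged_synchronous
def forks_internally():
    # Placeholder for a call into a library that forks.
    return threading.current_thread().name


def ordinary_call():
    return threading.current_thread().name
```

One trade-off of this shape: while a flagged call runs, the dispatching thread cannot accept further requests, so such calls would need to be short or rare.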
15:31:06 <bnemec> Okay, sounds like we have a plan then.
15:31:13 <bnemec> I'll reply to the mailing list thread too for visibility.
15:31:31 <bnemec> #action bnemec to update openstack-discuss about privsep/fork issue
15:32:08 <bnemec> #topic Weekly Wayward Review
15:32:44 <bnemec> #link https://review.openstack.org/579186
15:33:01 <bnemec> Slightly different this week.
15:33:22 <dhellmann> it looks like we need a recheck there to build new versions of the docs, since the old logs have expired
15:33:23 <bnemec> We need someone to take this patch over and eliminate the duplication in the docs.
15:33:58 <dhellmann> also that, yes
15:34:54 <bnemec> Basically it's a good change that needs a bit of massaging. I've been meaning to do that but I keep not having time, so I'm putting out a call for help. :-)
15:35:44 <moguimar> o/
15:36:16 <moguimar> I can take it dhellmann
15:36:26 <bnemec> moguimar: Thanks!
15:36:31 <dhellmann> thanks, moguimar !
15:36:49 <bnemec> #action moguimar to take over https://review.openstack.org/579186
15:37:09 <bnemec> #topic Open discussion
15:37:17 <bnemec> Anything else?
15:37:26 <moguimar> yep
15:37:38 <moguimar> I spoke about oslo.config last week at FOSDEM
15:37:40 <moguimar> https://fosdem.org/2019/schedule/event/python_application_configuration/
15:37:58 * dhellmann still has that video queued up to watch
15:38:13 <bnemec> Nice
15:38:18 <moguimar> it is basically the same talk I gave at Python Brasil
15:38:39 <moguimar> I thought that I had enough rehearsal but it was in Portuguese
15:39:00 <moguimar> so I kinda translated on the fly and next time will do some English rehearsal too
15:39:06 <bnemec> :-)
15:39:23 <dhellmann> :-)
15:41:06 <bnemec> Any other topics?
15:41:59 <moguimar> not on my end
15:43:11 <bnemec> Okay, we'll give everyone 15 minutes back then.
15:43:14 <bnemec> Thanks for joining!
15:43:19 <bnemec> #endmeeting