14:06:56 <dirk> #startmeeting rpm_packaging
14:06:57 <openstack> Meeting started Wed Oct 31 14:06:56 2018 UTC and is due to finish in 60 minutes.  The chair is dirk. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:06:58 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:07:00 <openstack> The meeting name has been set to 'rpm_packaging'
14:07:23 <dirk> ping toabctl, dirk, apevec, aplanas, IgorYozhikov, jpena, jruzicka, number80, kaslcrof, ykarel
14:07:27 <dirk> #topic roll call
14:08:27 * dirk struggles with etherpad
14:11:19 <jpena> o/
14:16:50 <cmurphy> o/
14:18:49 <dirk> #chair jpena  cmurphy
14:18:50 <openstack> Current chairs: cmurphy dirk jpena
14:19:02 <dirk> does anyone have topics? I can't access etherpad for some reason
14:19:16 <jpena> nothing special for me
14:19:24 <dirk> jpena: are you coming to the berlin summit?
14:19:36 <jpena> dirk: no, not this time
14:22:34 <dirk> anyone else coming that is worth talking to ?
14:22:46 <dirk> I mean, sorry, in the context of contributions for the openstack rpm packaging
14:23:21 <jpena> from the Red Hat side I'm not aware of many people going to Berlin
14:30:03 <dirk> k, thanks
14:30:07 <dirk> #topic reviews
14:30:11 <dirk> do we have reviews to talk about?
14:30:40 <dirk> https://review.openstack.org/#/c/613652/ looks like an easy merge
14:31:06 <dirk> ah, I see cmurphy and jpena would need to look at https://review.openstack.org/#/c/610011/ again - I would like to close out this topic
14:31:13 <dirk> any strong opinions on how to get this merged?
14:31:17 <dirk> or if it should be merged?
14:32:43 <jpena> I don't have a strong opinion on the limit increase. I saw cmurphy didn't like the increase in TasksMax
14:32:58 <cmurphy> I'm still unclear on why raising both nproc and tasksmax is okay
14:33:00 <jpena> dirk: how did you come up with those numbers?
14:33:23 <dirk> jpena: I copied them from ceph-osd.service
14:33:39 <cmurphy> when i happened to check this on a customer's production site the number of tasks was at 22 for cinder-volume
14:33:44 <dirk> the issue we had was with a user that was running cinder-volume with rbd at scale.
14:34:16 <dirk> and librbd just asserts() (which crashes all of cinder-volume) when it can't start a new thread
14:34:40 <dirk> so TasksMax=infinity basically removes the systemd cgroup pid controller
14:34:54 <dirk> and the rest is just bumping the limits beyond sanity imho
14:35:32 <dirk> cmurphy: so you considered tasksmax=infinity the unsafe part? or the NPROC?
14:38:20 <cmurphy> dirk: the combination of raising both is what seems unsafe to me
14:38:23 <cmurphy> but i'm not an expert
14:39:35 <cmurphy> but i noted that ubuntu doesn't bother with these limits http://paste.openstack.org/show/732149/
14:39:43 <dirk> so LimitNPROC=1234\nTasksMax=500 means "this service and its children are allowed to use 500 pids (threads, processes), and all processes of the same uid together are allowed 1234"
14:40:14 <dirk> LimitNPROC=1234\nTasksMax=infinity means "this service is allowed to use up to the user rlimit (1234)"
14:40:36 <jpena> oh, so nproc still affects
14:40:43 <dirk> yes
14:40:53 <dirk> nproc is number of processes for this *user* (not this process tree)
14:41:02 <dirk> so e.g. cinder-api and cinder-volume share limits
14:41:07 <dirk> (as they run as the same user)
14:41:20 <dirk> TasksMax is a systemd feature to limit cinder-volume so that it's not "eating" the quota of cinder-api
14:41:41 <dirk> but exactly this feature caused the issue for the customer (they had 8191 threads in ceph and then it asserted
14:41:52 <dirk> when it tried to spawn the 8192nd)
14:42:29 <dirk> given the tradeoff between "all of cinder-volume is down for every user" and "well, ceph goes slightly beyond sanity but if the hardware is powerful enough we allow it", this change moves us towards "we keep cinder-volume up at all cost"
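[editor's note: the LimitNPROC/TasksMax interplay described above can be sketched as a systemd drop-in; the file path and values are only illustrative, matching the numbers used in the discussion]

```ini
# /etc/systemd/system/openstack-cinder-volume.service.d/limits.conf
# Hypothetical drop-in illustrating the two knobs discussed above.
[Service]
# Per-uid rlimit: caps processes/threads for *all* services running as
# the cinder user combined (cinder-api, cinder-volume, ...).
LimitNPROC=1234
# Per-unit cgroup pids controller: caps tasks of this unit alone,
# so it cannot "eat" the quota it shares with cinder-api.
TasksMax=500
```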
14:43:21 <cmurphy> is librbd supposed to be using that many threads?
14:43:26 <dirk> cmurphy: I have to admit I ignore the ubuntu reasoning, because it just means they either have some other way of overriding the defaults (e.g. via a limits drop-in from their orchestration) or they never hit the issue so far (for example because they use an older systemd or have DefaultTasksMax=infinity configured)
14:43:59 <dirk> so if they don't set it, the global default applies, and if that one is infinity then they never have cgroup pid limits
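[editor's note: the fallback mentioned here is the global default in systemd's manager configuration; if a unit sets no TasksMax of its own, this value applies — shown as an illustrative excerpt]

```ini
# /etc/systemd/system.conf (excerpt, illustrative)
[Manager]
# If a unit does not set TasksMax itself, this default applies; with
# DefaultTasksMax=infinity there is effectively no cgroup pid limit.
DefaultTasksMax=infinity
```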
14:44:09 <dirk> cmurphy: well.. that's a whole other debate.. :-)
14:44:23 <dirk> IMHO 8192 is plenty but then again we saw that this limit is being reached
14:45:40 <cmurphy> but is it reached because it's a runaway, or because it's functioning normally? if it's a runaway and malfunctioning then raising the limits for it is just going to delay the inevitable crash
14:45:52 <dirk> so, to draw a conclusion here, a simple TasksMax=infinity change without raising LimitNOFILE and limitNPROC would be acceptable?
14:46:19 <dirk> cmurphy: there wasn't a leak, it's just a load spike; it returns to normal a few seconds later
14:46:32 <cmurphy> okay
14:46:40 <dirk> e.g. the user is spawning a heat stack with a few dozen volumes inside
14:47:33 <dirk> and while I agree librbd could be less aggressive about spawning threads, still any decent hardware should be capable of handling 8192 threads, so it's not that the machine would grind to a halt with that many threads
14:48:03 <dirk> (although it will eat like 64GB of RAM just for stack space)
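[editor's note: the 64GB figure follows from the default pthread stack size; a quick sanity check, assuming the common 8 MiB default stack, and noting this is mostly reserved address space rather than resident RAM]

```shell
# Default thread stack size is usually 8 MiB (check with: ulimit -s → 8192 KiB).
threads=8192
stack_mib=8
# 8192 threads * 8 MiB = 65536 MiB = 64 GiB of (mostly virtual) stack space
echo "$((threads * stack_mib / 1024)) GiB"   # prints "64 GiB"
```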
14:48:17 <cmurphy> okay, tasksmax=infinity without the other limit changes sounds fine to me
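[editor's note: the conclusion reached here — TasksMax=infinity only, leaving LimitNOFILE/LimitNPROC untouched — would look roughly like the drop-in below; file and unit names are illustrative, the actual change is in the gerrit review linked earlier]

```ini
# openstack-cinder-volume.service drop-in (illustrative)
[Service]
# Disable the cgroup pids controller for this unit only; the per-uid
# LimitNPROC rlimit still applies and remains the effective ceiling.
TasksMax=infinity
```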
14:53:01 <dirk> #topic open floor
14:54:29 <openstackgerrit> Dirk Mueller proposed openstack/rpm-packaging master: cinder-volume: Raise limits and disable cgroup limtis  https://review.openstack.org/610011
14:54:34 <dirk> cmurphy: done ^^ :)
14:54:43 <dirk> anything else?
14:54:49 <dirk> T-2 min before ending ;)
14:54:54 <dirk> (happy halloween)
14:54:57 * cmurphy nothing
14:56:03 <jpena> nope
14:56:58 <dirk> thanks :)
14:57:00 <dirk> #endmeeting