*** tosky has quit IRC | 00:06 | |
*** kgiusti has left #openstack-oslo | 00:34 | |
*** lbragstad has quit IRC | 01:09 | |
*** rcernin has quit IRC | 01:31 | |
*** rcernin has joined #openstack-oslo | 01:32 | |
*** dave-mccowan has joined #openstack-oslo | 02:18 | |
*** ducnv has joined #openstack-oslo | 02:54 | |
*** ducnv has quit IRC | 02:56 | |
*** ducnv has joined #openstack-oslo | 02:56 | |
*** lbragstad has joined #openstack-oslo | 03:11 | |
*** dave-mccowan has quit IRC | 04:00 | |
*** bnemec has quit IRC | 04:26 | |
*** ebbex has quit IRC | 04:26 | |
*** cburgess has quit IRC | 04:26 | |
*** stephenfin has quit IRC | 04:26 | |
*** django has quit IRC | 04:26 | |
*** sorrison has quit IRC | 04:26 | |
*** bnemec has joined #openstack-oslo | 04:53 | |
*** ebbex has joined #openstack-oslo | 04:53 | |
*** cburgess has joined #openstack-oslo | 04:53 | |
*** stephenfin has joined #openstack-oslo | 04:53 | |
*** django has joined #openstack-oslo | 04:53 | |
*** sorrison has joined #openstack-oslo | 04:53 | |
*** rcernin has quit IRC | 04:56 | |
*** rcernin has joined #openstack-oslo | 05:02 | |
*** lbragstad has quit IRC | 05:33 | |
*** jhesketh has quit IRC | 05:53 | |
*** jhesketh has joined #openstack-oslo | 05:54 | |
openstackgerrit | Merged openstack/automaton master: add python 3.7 unit test job https://review.openstack.org/610513 | 06:27 |
*** e0ne has joined #openstack-oslo | 06:32 | |
*** lxkong has quit IRC | 06:52 | |
*** Luzi has joined #openstack-oslo | 06:54 | |
*** e0ne has quit IRC | 07:17 | |
*** lxkong has joined #openstack-oslo | 07:21 | |
*** e0ne has joined #openstack-oslo | 07:21 | |
*** pcaruana has joined #openstack-oslo | 07:25 | |
*** rcernin has quit IRC | 07:27 | |
*** aojea has joined #openstack-oslo | 07:33 | |
*** e0ne has quit IRC | 08:04 | |
*** moguimar has joined #openstack-oslo | 08:22 | |
*** tosky has joined #openstack-oslo | 08:27 | |
*** e0ne has joined #openstack-oslo | 10:25 | |
openstackgerrit | Merged openstack/oslo.vmware master: SDRS recommendation for create VM https://review.openstack.org/640639 | 11:19 |
openstackgerrit | Merged openstack/oslo.vmware master: Return None if no suitable datastore is found https://review.openstack.org/640640 | 11:20 |
*** a-pugachev has joined #openstack-oslo | 11:25 | |
*** a-pugachev_ has joined #openstack-oslo | 12:02 | |
*** a-pugachev has quit IRC | 12:05 | |
*** a-pugachev_ is now known as a-pugachev | 12:05 | |
*** raildo has joined #openstack-oslo | 12:14 | |
*** a-pugachev has quit IRC | 12:25 | |
*** a-pugachev has joined #openstack-oslo | 12:28 | |
openstackgerrit | Stephen Finucane proposed openstack/oslo.db stable/rocky: Resolve SAWarning in Query.soft_delete() https://review.openstack.org/641650 | 12:48 |
stephenfin | bnemec: When you're about, would you mind looking at https://review.openstack.org/641650 ? | 12:49 |
*** a-pugachev has quit IRC | 12:57 | |
*** kgiusti has joined #openstack-oslo | 13:34 | |
*** lbragstad has joined #openstack-oslo | 13:48 | |
*** ansmith has joined #openstack-oslo | 13:49 | |
*** moguimar has quit IRC | 15:00 | |
*** Luzi has quit IRC | 15:35 | |
*** a-pugachev has joined #openstack-oslo | 15:45 | |
openstackgerrit | Merged openstack/oslo.db stable/rocky: Resolve SAWarning in Query.soft_delete() https://review.openstack.org/641650 | 15:45 |
*** a-pugachev_ has joined #openstack-oslo | 15:56 | |
*** a-pugachev has quit IRC | 15:58 | |
*** a-pugachev has joined #openstack-oslo | 16:00 | |
*** a-pugachev_ has quit IRC | 16:01 | |
openstackgerrit | Stephen Finucane proposed openstack/oslo.db stable/queens: Resolve SAWarning in Query.soft_delete() https://review.openstack.org/641724 | 16:44 |
*** e0ne has quit IRC | 17:07 | |
*** a-pugachev has quit IRC | 17:46 | |
*** irclogbot_2 has joined #openstack-oslo | 18:02 | |
*** e0ne has joined #openstack-oslo | 18:19 | |
*** e0ne has quit IRC | 18:24 | |
*** e0ne has joined #openstack-oslo | 18:29 | |
*** e0ne has quit IRC | 18:33 | |
*** pcaruana has quit IRC | 18:46 | |
*** pcaruana has joined #openstack-oslo | 19:05 | |
openstackgerrit | Ben Nemec proposed openstack/oslo.cache stable/rocky: Fix memcache pool client in monkey-patched environments https://review.openstack.org/640500 | 19:30 |
*** pcaruana has quit IRC | 19:33 | |
*** mnaser has joined #openstack-oslo | 20:08 | |
mnaser | dhellmann: mind if i pick your brain about a change you helped push..... 6 years ago? :) | 20:08 |
dhellmann | mnaser : uhm, you can try. :-) | 20:09 |
mnaser | https://review.openstack.org/#/c/63444/3/openstack/common/service.py L195 in service.py | 20:09 |
mnaser | i'm noticing an issue with nova where a reload is causing really weird behaviour (troubleshot it a bit with the nova team) | 20:09 |
mnaser | and i'm coming to the conclusion that https://github.com/openstack/oslo.service/blob/471bfe770808a0348fcce1e83753dcc414b361c1/oslo_service/service.py#L395-L396 should really be "if _is_sighup_and_daemon(signo):" | 20:10 |
mnaser | because what that does is: if not (sighup and we're running as a daemon): break. this means a reload results in self.restart() being called | 20:10 |
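For context, a self-contained toy reproducing the control flow mnaser is describing; the class and function names below are illustrative stand-ins, not the real oslo.service API:

    import signal

    def _is_sighup_and_daemon(signo, daemonized):
        # true only for "SIGHUP while daemonized", per the check under discussion
        return signo == signal.SIGHUP and daemonized

    class ToyLauncher:
        def __init__(self, daemonized=True):
            self.daemonized = daemonized

        def restart(self):
            print("restart(): stop() + reset() + start()")

        def handle(self, signo):
            # mirrors: if not _is_sighup_and_daemon(signo): break
            if not _is_sighup_and_daemon(signo, self.daemonized):
                print("break: launcher exits its wait loop")
                return
            # a SIGHUP "reload" falls through to a full restart()
            self.restart()

    ToyLauncher(daemonized=True).handle(signal.SIGHUP)   # -> restart()
    ToyLauncher(daemonized=True).handle(signal.SIGTERM)  # -> break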
dhellmann | is that not how the reload works? | 20:11 |
mnaser | well, shouldn't the reload call self.reset() and not self.restart() ? | 20:11 |
mnaser | nova assumes that a reload (sighup) only calls reset() -- not a full service restart | 20:12 |
mnaser | maybe dansmith can comment better about this, if he has time.. i've just kinda tried to slowly figure this out | 20:12 |
dansmith | dhellmann: the oslo service code assumes that we should only do a reload type action on sighup if we've self-daemonized, | 20:13 |
dansmith | which is not really legit for a systemd-managed process, or anything running under a supervisor (like regular init for respawn, monit, etc) | 20:13 |
dhellmann | I wonder if this logic predates that? | 20:13 |
dansmith | predates the idea of running a service foreground-style under a supervisor? no, | 20:14 |
dhellmann | bnemec: you've looked at the service code more recently than I have, I think | 20:14 |
dansmith | but it probably does pre-date the uptick in systemd usage | 20:14 |
dhellmann | yeah, systemd | 20:14 |
dhellmann | I trust you both if you say it's wrong, but I don't know this lib well enough to actually confirm it if you're not sure | 20:15 |
mnaser | i mean, i've only started running into this because our gates are failing with some odd behaviour showing up | 20:15 |
mnaser | our = OSA | 20:15 |
dansmith | dhellmann: I disagree with the assertion made in the commit message of the code that added this, so...I'm pretty sure :D | 20:15 |
dhellmann | let's get a fix posted then | 20:17 |
dansmith | like, it's not that the code is broken, it's that the thing the code is (properly) doing is fundamentally naive, I think | 20:17 |
* bnemec passes the buck to zaneb :-) | 20:17 | |
mnaser | given this is breaking the openstack-ansible gates, i can volunteer myself to fix it if someone knows what needs to be done (and doesn't have the time to :<) | 20:18 |
* zaneb hides | 20:18 | |
dansmith | mnaser: honestly, I'd just revert the is-daemon check part and call it good | 20:18 |
bnemec | We have a maybe related bug too: https://bugs.launchpad.net/oslo.service/+bug/1794708 | 20:18 |
openstack | Launchpad bug 1794708 in oslo.service "Service children uselessly restart on signals" [Undecided,New] | 20:18 |
dansmith | but I'd defer to osloians about notice, deprecation, etc | 20:18 |
mnaser | also dansmith found a related bug in neutron | 20:19 |
mnaser | where it just dies on a sighup too | 20:19 |
dansmith | bnemec: yeah, we have another bug where someone is doing something wrong in nova because they assume this is just how it's supposed to work | 20:19 |
dansmith | and that | 20:19 |
mnaser | https://bugs.launchpad.net/neutron/+bug/1780139 | 20:20 |
openstack | Launchpad bug 1780139 in neutron "Sending SIGHUP to neutron-server process causes it to hang" [Undecided,Triaged] - Assigned to Bernard Cafarelli (bcafarel) | 20:20 |
mnaser | this one | 20:20 |
mnaser | oh jeez | 20:21 |
mnaser | dansmith: https://bugs.launchpad.net/nova/+bug/1715374 | 20:21 |
openstack | Launchpad bug 1715374 in OpenStack Compute (nova) "Reloading compute with SIGHUP prevents instances from booting" [High,In progress] - Assigned to Ralf Haferkamp (rhafer) | 20:21 |
mnaser | would ya look at that | 20:21 |
dansmith | mnaser: didn't I say "I hope to god this is not really happening because ... bad things" ? :) | 20:21 |
dansmith | shutting down rpc and all | 20:22 |
mnaser | well i guess we can reference that bug to fix it but it's been around a while | 20:22 |
dansmith | mnaser: I have oslo.service pulled and am working on it | 20:22 |
mnaser | dansmith: is it safe to reclassify those bugs as "invalid" for nova (not sure about neutron, but at least nova) and add that it affects oslo.service instead? | 20:23 |
bnemec | FWIW, it sounds like TripleO hit this as well and worked around it instead: https://bugs.launchpad.net/tripleo/+bug/1789680 | 20:23 |
openstack | Launchpad bug 1789680 in tripleo "mistral MessagingTimeout correlates with containerized undercloud uptime" [Critical,Fix released] - Assigned to Dougal Matthews (d0ugal) | 20:23 |
dansmith | I don't think so | 20:23 |
dansmith | they're legit they're just not bugs in nova | 20:23 |
dansmith | bnemec: right, which is what nova tried to do.. | 20:23 |
dansmith | and maybe neutron too | 20:23 |
mnaser | dansmith: right, so just move the bug to be an oslo.service bug rather than a nova bug | 20:24 |
dansmith | add oslo.service I think | 20:24 |
dansmith | so they affect both | 20:25 |
mnaser | ok, updated https://bugs.launchpad.net/oslo.service/+bug/1715374 | 20:25 |
openstack | Launchpad bug 1715374 in OpenStack Compute (nova) "Reloading compute with SIGHUP prevents instances from booting" [High,In progress] - Assigned to Ralf Haferkamp (rhafer) | 20:25 |
dansmith | I assume that since that change was merged with zero tests (!), this should be easy ... | 20:26 |
mnaser | guess by the logic of things you don't have to write any either :) | 20:27 |
dansmith | le woot :) | 20:27 |
dansmith | mnaser: can we do gate hackage to test this against your stuff before we merge | 20:28 |
dansmith | ? | 20:28 |
mnaser | dansmith: i think so. i'm trying to figure out if it can work or not, because it's not a direct dependency (i.e. nova or neutron) but a dependency of one | 20:28 |
dansmith | mriedem is a ninja with that stuff | 20:29 |
bnemec | Wow, I still have a draft comment on https://review.openstack.org/#/c/63444 | 20:29 |
mnaser | i think one way to hack around it is to make a dnm change to nova pointing requirements at that wip oslo.service change | 20:30 |
bnemec | However, it was a commit message nitpick as opposed to anything useful. | 20:30 |
mnaser | and then we should be able to clone nova with updated requirements and pull it down | 20:30 |
* mnaser wonders if there's an easy way to update requirements.txt to somehow point at a specific oslo change | 20:30 | |
openstackgerrit | Dan Smith proposed openstack/oslo.service master: Revert assertion made about SIGHUP behavior for non-daemon services https://review.openstack.org/641793 | 20:32 |
dansmith | mnaser: ^ | 20:32 |
dansmith | I tagged the oslo and nova bugs there. I think the others have been worked around so probably not worth mentioning, but if others care, we can | 20:32 |
mnaser | dansmith: shouldn't it be "if is_sighup(signo): break"? | 20:33 |
mnaser | because we want to break out and not hit the self.restart()? | 20:33 |
dansmith | isn't it? | 20:33 |
mnaser | because `_is_sighup_and_daemon` evaluated to true when i ran it under systemd (i actually did a LOG.debug() with a systemd reload afterwards) | 20:33 |
mnaser | it's "if not _is_sighup(signo): break" which means break out _only_ if it's not sighup, unless i'm really misreading it | 20:34 |
dansmith | oh, I was blindly reverting the 'and daemon' part, but it was "not sighup" before | 20:34 |
mnaser | the daemon check seemed to be working ok (under systemd it was coming out true), but if `_is_sighup_and_daemon` returns true on a reload, "if not true: break" means it won't break and will hit self.restart() | 20:35 |
dansmith | restart() is not what we want though right? | 20:35 |
mnaser | if we do sighup i assume we _dont_ want a restart | 20:35 |
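Why the daemon check comes out true under systemd is worth spelling out. A rough sketch of the usual controlling-terminal test (assumed shape; the real check lives in the linked service.py):

    import os
    import sys

    def _is_daemon():
        # a foreground process shares its process group with the
        # controlling terminal; anything else looks "daemonized"
        try:
            return os.getpgrp() != os.tcgetpgrp(sys.stdout.fileno())
        except OSError:
            # no controlling terminal at all (e.g. stdout is a pipe or
            # the journal under systemd), which matches what mnaser
            # observed with his LOG.debug()
            return True

    print(_is_daemon())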
*** e0ne has joined #openstack-oslo | 20:35 | |
zaneb | I feel like if the 'not' was the wrong way around we'd have noticed that no (maskable) signals other than SIGHUP work for killing the process before now... | 20:36 |
mnaser | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L395 | 20:36 |
dansmith | zaneb: right that's what I was thinking | 20:36 |
mnaser | from what i understand, that code will break out only if it's NOT (sighup and daemon) | 20:36 |
mnaser | so if you get a reload, the condition evaluates false, and it'll call self.restart() | 20:36 |
mnaser | and i assume self.restart() is not what we want, but self.reset() ? | 20:36 |
dansmith | mnaser: well we did handle_signal which calls the reset I think | 20:37 |
dansmith | we want to not run restart though, AFAIK | 20:37 |
mnaser | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L778-L784 | 20:37 |
mnaser | we call .stop() | 20:37 |
mnaser | then .reset() | 20:37 |
mnaser | so restart() is not an actual restart of the process, but a reset | 20:38 |
mnaser | so we do need to call it, that code is right, but it seems like we are calling .stop() then .reset() | 20:38 |
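To make the chain mnaser just traced concrete, a minimal sketch (assumed shape based on the linked lines, not the verbatim oslo.service code):

    # illustrative stand-ins, not the real oslo.service classes
    class ToyService:
        def stop(self):
            # in nova this reaches manager.cleanup_host(), which tears
            # down RPC: the breakage discussed later in this conversation
            print("service.stop()")

        def reset(self):
            print("service.reset()")

        def start(self):
            print("service.start()")

    def launcher_restart(service):
        # the "restart" that runs on SIGHUP, per the links above
        service.stop()
        service.reset()
        service.start()

    launcher_restart(ToyService())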
dansmith | stop where? from _reload_service? | 20:38 |
dansmith | I'm actually not sure why we're not exiting there now | 20:38 |
mnaser | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L397 | 20:39 |
*** aojea has quit IRC | 20:39 | |
mnaser | into | 20:39 |
mnaser | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L314 | 20:39 |
mnaser | into | 20:39 |
mnaser | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L780 | 20:39 |
mnaser | which does | 20:39 |
mnaser | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L760-L762 | 20:39 |
dansmith | no, I'm saying: | 20:40 |
dansmith | https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L360 | 20:40 |
dansmith | calls https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L342 | 20:40 |
dansmith | which raises SignalExit | 20:40 |
dansmith | oh wait I see | 20:40 |
dansmith | it doesn't exit unless it's SystemExit | 20:41 |
mnaser | yeah but the stop still happens in self.restart() which _really_ is self.reset() | 20:41 |
mnaser | and self.restart() seems to be a self.stop() followed by self.reset() | 20:42 |
dansmith | not sure what you mean by that, but, | 20:42 |
dansmith | if we break on that sighup thing we'll exit the while loop and drop out right? | 20:42 |
mnaser | dansmith: yes | 20:43 |
dansmith | which will bubble up to ProcessLauncher's while loop, | 20:43 |
mnaser | but we don't actually want to break out, because without calling self.restart(), we won't reset | 20:43 |
dansmith | which will also break and then os._exit right? | 20:43 |
dansmith | I guess I'm confused about what you're suggesting needs to change, because just flipping the logic on "is not sighup" doesn't seem right | 20:44 |
mnaser | dansmith: yes, i was wrong about that, i assumed self.restart() was a full service restart | 20:45 |
mnaser | but in reality, self.restart() is a reset | 20:45 |
mnaser | so all that logic is okay, but i think your assumption that Service.stop() is _not_ being called is not true, because it is being called on resets | 20:45 |
mnaser | because of https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L778-L784 | 20:46 |
dansmith | but restart() does a stop() | 20:46 |
dansmith | mnaser: stop() can't be called while we're running or we tear down rpc | 20:46 |
mnaser | and we call self.manager.cleanup_host() in stop() in nova's service | 20:46 |
dansmith | right | 20:46 |
mnaser | which sets self._events = None and then causes the breakage | 20:47 |
mnaser | so either self.reset() needs to rebuild everything that self.stop() tore down, or we stop calling self.stop() in oslo service and let services handle their own resets | 20:47 |
dansmith | we have to make a change here that ends up with us calling Service.reset() without Service.stop() first, agreed? | 20:47 |
mnaser | correct | 20:47 |
dansmith | mnaser: it can't do the former because that means we kill in-flight rpc calls | 20:48 |
mnaser | right, it seems that right now, oslo's default behaviour is to call stop() on the service, then reset(), then start() | 20:48 |
mnaser | if you ask me, sounds like a normal restart.. | 20:48 |
mnaser | oslo should just call reset() and do nothing else | 20:49 |
dansmith | right.. reset should be pointless, | 20:49 |
dansmith | although it's probably doing a reset because it doesn't re-create the Service() object because of the sequencing | 20:49 |
dansmith | right | 20:49 |
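In contrast, the reload path mnaser is proposing would look something like this (hypothetical, just to make the difference from launcher_restart() in the earlier sketch explicit):

    def launcher_reload(service):
        # on SIGHUP, only reset() runs: no stop(), so RPC connections
        # and in-flight work stay intact
        service.reset()

    launcher_reload(ToyService())  # reusing ToyService from the sketch above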
dansmith | mnaser: I think there might be multiple paths in the same file here, | 20:50 |
dansmith | because the ProcessLauncher version looks closer | 20:50 |
dansmith | well, no, it kills all children too | 20:50 |
mnaser | yep | 20:50 |
dansmith | but ProcessLauncher.wait() is replicating things from ServiceLauncher() | 20:50 |
mnaser | i'm struggling to understand the why behind that | 20:51 |
dansmith | mnaser: so I haven't eaten anything all day yet and am pretty foggy and need to go do that.. you wanna take that patch and hack on it? | 20:51 |
dansmith | yeah, that seems weird to me too | 20:51 |
dansmith | I'm guessing it's an mpm-style "processes or threads" sort of thing | 20:51 |
mnaser | https://github.com/openstack/neutron/blob/master/neutron/service.py#L137-L146 | 20:51 |
mnaser | https://github.com/openstack/neutron/blob/9a15084375a80df830cf50c1fb20908b5d7f822c/neutron/common/config.py#L106-L112 | 20:52 |
mnaser | im gonna guess neutron makes that same assumption | 20:52 |
mnaser | and that's why it dies: it has no rpc after a reload | 20:52 |
dansmith | yeah | 20:52 |
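For illustration, the service-side pattern the neutron links suggest, where reset() only re-reads configuration (a hypothetical sketch, not neutron's actual code):

    class ToyConf:
        def reload_config_files(self):
            print("re-reading config files")

    class ToyNeutronLikeService:
        def __init__(self, conf):
            self.conf = conf

        def reset(self):
            # the only work a reload should do, per the discussion above;
            # RPC is left untouched, which is why a preceding stop() kills it
            self.conf.reload_config_files()

    ToyNeutronLikeService(ToyConf()).reset()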
mnaser | ok, this will be interesting. a fix in oslo is probably the fundamental way to go, but i'll try to study the history behind the changes | 20:52 |
mnaser | and see why we've decided to .stop() .reset() .start() | 20:52 |
dansmith | I think it has to be a fix in oslo, because we can't really prevent the stop (cleanly) from in nova or neutron | 20:53 |
mnaser | yeah, it's not like we'd be changing default behavior either, i'll dig | 20:53 |
dansmith | ack, thanks | 20:54 |
openstackgerrit | Ben Nemec proposed openstack/oslo.log master: Test bionic for legacy job https://review.openstack.org/641799 | 20:55 |
*** e0ne has quit IRC | 21:14 | |
*** mtreinish has joined #openstack-oslo | 21:30 | |
*** kgiusti has left #openstack-oslo | 21:37 | |
*** ansmith has quit IRC | 21:46 | |
*** e0ne has joined #openstack-oslo | 22:01 | |
*** raildo has quit IRC | 22:13 | |
*** a-pugachev has joined #openstack-oslo | 22:21 | |
*** a-pugachev has quit IRC | 22:22 | |
*** tosky has quit IRC | 22:22 | |
*** a-pugachev has joined #openstack-oslo | 22:22 | |
*** ansmith has joined #openstack-oslo | 22:28 | |
*** tosky has joined #openstack-oslo | 22:29 | |
*** e0ne has quit IRC | 22:34 | |
*** e0ne has joined #openstack-oslo | 22:34 | |
*** e0ne has quit IRC | 22:36 | |
*** rcernin has joined #openstack-oslo | 22:41 | |
openstackgerrit | Ben Nemec proposed openstack/oslo.log master: Test bionic for legacy job https://review.openstack.org/641799 | 22:42 |
*** purplerbot has quit IRC | 23:26 | |
*** purplerbot has joined #openstack-oslo | 23:26 | |
*** tosky has quit IRC | 23:34 |