*** Guest28904 has quit IRC | 00:34 | |
*** mmethot has quit IRC | 01:12 | |
*** mmethot has joined #openstack-oslo | 01:55 | |
*** ianychoi has joined #openstack-oslo | 02:00 | |
*** ianychoi has quit IRC | 03:49 | |
*** ianychoi has joined #openstack-oslo | 03:49 | |
*** ianychoi has quit IRC | 03:51 | |
*** ianychoi has joined #openstack-oslo | 03:52 | |
*** lbragstad has quit IRC | 03:53 | |
*** pcaruana has joined #openstack-oslo | 05:20 | |
*** larainema has quit IRC | 05:35 | |
*** Luzi has joined #openstack-oslo | 05:52 | |
*** lpetrut has joined #openstack-oslo | 06:05 | |
*** hberaud|gone is now known as hberaud | 06:47 | |
*** starborn has joined #openstack-oslo | 06:58 | |
*** tesseract has joined #openstack-oslo | 07:08 | |
*** rcernin has quit IRC | 07:19 | |
*** jaosorior has quit IRC | 07:19 | |
openstackgerrit | Hervé Beraud proposed openstack/tooz master: Update Sphinx requirement Cap grpcio https://review.opendev.org/659590 | 07:46 |
*** lpetrut has quit IRC | 08:09 | |
openstackgerrit | Natal Ngétal proposed openstack/oslo.cache stable/stein: Avoid tox_install.sh for constraints support https://review.opendev.org/659174 | 08:12 |
*** tosky has joined #openstack-oslo | 08:15 | |
*** hberaud has quit IRC | 08:22 | |
*** hberaud has joined #openstack-oslo | 08:23 | |
*** lpetrut has joined #openstack-oslo | 08:29 | |
openstackgerrit | Natal Ngétal proposed openstack/oslo.log stable/stein: Cap bandit below 1.6.0 version and update sphinx and limit monotonic. https://review.opendev.org/659755 | 08:56 |
*** e0ne has joined #openstack-oslo | 09:17 | |
*** rcernin has joined #openstack-oslo | 09:50 | |
*** hberaud is now known as hberaud|school-r | 09:56 | |
*** hberaud|school-r is now known as hberaud|lunch | 10:02 | |
*** hberaud|lunch is now known as hberaud | 11:23 | |
openstackgerrit | Merged openstack/stevedore master: Cap Bandit below 1.6.0 and update Sphinx requirement https://review.opendev.org/659551 | 11:28 |
*** raildo has joined #openstack-oslo | 11:34 | |
*** rcernin has quit IRC | 11:36 | |
openstackgerrit | Hervé Beraud proposed openstack/stevedore master: Uncap bandit. https://review.opendev.org/659554 | 11:44 |
*** yan0s has joined #openstack-oslo | 11:48 | |
*** kgiusti has joined #openstack-oslo | 12:52 | |
*** kgiusti has left #openstack-oslo | 12:52 | |
*** jaosorior has joined #openstack-oslo | 13:11 | |
*** lbragstad has joined #openstack-oslo | 13:28 | |
*** mmethot has quit IRC | 13:50 | |
*** ansmith has joined #openstack-oslo | 13:52 | |
bnemec | #success Capped all the bandits in Oslo to unblock CI | 13:53 |
openstackstatus | bnemec: Added success to Success page (https://wiki.openstack.org/wiki/Successes) | 13:53 |
stephenfin | Good riddance | 14:00 |
*** lpetrut has quit IRC | 14:06 | |
*** ansmith has quit IRC | 14:08 | |
*** kgiusti has joined #openstack-oslo | 14:11 | |
*** mmethot has joined #openstack-oslo | 14:13 | |
*** Luzi has quit IRC | 14:25 | |
gsantomaggio | >So if we take a step back this feature will have to be implemented as a new parameter to the __init__ of the RPCClient class. | 14:36 |
gsantomaggio | @kgiusti totally agree! | 14:36 |
gsantomaggio | it is easier ! | 14:36 |
kgiusti | gsantomaggio: sorry man - I should've realized that at the start :( | 14:39 |
gsantomaggio | @kgiusti don't worry! it was a good opportunity to study the code | 14:39 |
*** jaosorior has quit IRC | 14:41 | |
kgiusti | gsantomaggio: I like you Gabriele - you look on the bright side of life :) | 14:41 |
gsantomaggio | @kgiusti :) thank you! | 14:42 |
kgiusti | gsantomaggio: I think folks will like this feature. | 14:42 |
kgiusti | gsantomaggio: the RPC via a queue currently hides the possibility that there's no one listening | 14:43 |
kgiusti | gsantomaggio: which forces the caller to wait until the timeout hits, where the user would probably want to fail immediately instead. | 14:44 |
gsantomaggio | @kgiusti yes exactly and also it is useful for troubleshooting | 14:44 |
kgiusti | gsantomaggio: +1 | 14:44 |
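For illustration, here is a minimal sketch of what the fail-fast behaviour discussed above might look like from a caller's point of view. The `fail_if_no_listener` keyword is hypothetical and only stands in for the new `__init__` parameter being proposed; the real option name and placement had not been decided at this point in the conversation.

```python
# Sketch only: fail_if_no_listener is a hypothetical kwarg, not the real
# oslo.messaging API; everything else uses the standard RPCClient calls.
from oslo_config import cfg
import oslo_messaging

transport = oslo_messaging.get_rpc_transport(cfg.CONF)
target = oslo_messaging.Target(topic='compute', server='host1')

# The proposal: set the behaviour at RPCClient construction time rather than
# per call, so a call to a queue with no consumer fails immediately instead
# of blocking until rpc_response_timeout expires.
client = oslo_messaging.RPCClient(transport, target,
                                  fail_if_no_listener=True)  # hypothetical

try:
    client.call({}, 'ping')
except oslo_messaging.MessagingException:
    # With the proposed behaviour this surfaces right away rather than as a
    # MessagingTimeout after the full timeout period.
    pass
```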
*** kmalloc is now known as kmalloc_away | 14:47 | |
*** yan0s has quit IRC | 14:55 | |
*** starborn has quit IRC | 15:11 | |
*** ansmith has joined #openstack-oslo | 15:40 | |
openstackgerrit | Merged openstack/tooz master: Unblock tooz gate https://review.opendev.org/658204 | 15:44 |
*** shardy has quit IRC | 15:49 | |
*** ansmith has quit IRC | 15:58 | |
*** ansmith has joined #openstack-oslo | 16:09 | |
*** hberaud is now known as hberaud|gone | 16:14 | |
openstackgerrit | Rodolfo Alonso Hernandez proposed openstack/oslo.privsep master: Pass correct arguments to six.reraise https://review.opendev.org/659831 | 16:15 |
*** tesseract has quit IRC | 16:17 | |
*** tesseract has joined #openstack-oslo | 16:18 | |
*** ansmith has quit IRC | 16:29 | |
*** ralonsoh has joined #openstack-oslo | 16:46 | |
*** njohnston has joined #openstack-oslo | 16:46 | |
ralonsoh | bnemec, hi, one question about privsep. Is it possible to call a function with a privsep decorator and have some kind of endless loop there? Something like a monitor | 16:47 |
ralonsoh | bnemec, or is privsep only for "atomic" functions? | 16:47 |
bnemec | ralonsoh: Now that privsep can run things in parallel you could theoretically launch a long-running privileged function. | 16:52 |
bnemec | You would need to be careful not to do it too many times though. There's a limit to how many parallel calls privsep will allow. | 16:52 |
ralonsoh | bnemec, yes, I saw there is a limit in the number of threads, thanks! | 16:53 |
bnemec | And it's user-configurable, so someone could set a really low limit and block all of your other privileged calls. | 16:53 |
ralonsoh | bnemec, yes, that could be a serious problem | 16:54 |
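A rough sketch of the long-running privileged loop bnemec describes, using the standard oslo.privsep context/entrypoint pattern; the context name, config section and capability below are illustrative, not taken from any real project.

```python
# Sketch: a long-running monitor behind privsep. Viable now that privsep can
# run entrypoints in parallel, but it permanently occupies one worker thread,
# and the pool size is user-configurable and may be small.
import time

from oslo_privsep import capabilities as caps
from oslo_privsep import priv_context

monitor_context = priv_context.PrivContext(
    __name__,
    cfg_section='privsep_monitor',
    pypath=__name__ + '.monitor_context',
    capabilities=[caps.CAP_NET_ADMIN],
)


@monitor_context.entrypoint
def watch_interfaces(interval=5):
    # Runs inside the privsep daemon with elevated capabilities.
    while True:
        # ... privileged monitoring work goes here ...
        time.sleep(interval)
```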
*** tesseract has quit IRC | 16:58 | |
*** tesseract has joined #openstack-oslo | 16:59 | |
*** lbragstad has quit IRC | 17:12 | |
*** lbragstad has joined #openstack-oslo | 17:12 | |
*** tesseract has quit IRC | 17:17 | |
*** e0ne has quit IRC | 17:17 | |
*** ralonsoh has quit IRC | 17:24 | |
*** ansmith has joined #openstack-oslo | 17:35 | |
*** ansmith has quit IRC | 17:45 | |
*** ansmith has joined #openstack-oslo | 17:49 | |
*** imacdonn has joined #openstack-oslo | 18:18 | |
*** ansmith has quit IRC | 18:26 | |
imacdonn | any messaging / AMQP expert available for a consult? Trying to diagnose an issue where an RPC call (by nova-api) is attempted on a dead TCP connection. A "Connection reset by peer" error is received, a new connection is established, and the message gets resent, but then there's a timeout waiting for the reply, for some reason.... | 18:44 |
imacdonn | ^ kgiusti maybe ? | 18:47 |
kgiusti | imacdonn: why are the TCP links down - was there a network fault? | 18:48 |
gsantomaggio | I'd also check the called module, maybe the problem is simply on the called side | 18:49 |
imacdonn | kgiusti: "it's complicated" ... that's a whole other facet of the problem (has to do with eventlet monkey patching and threading in WSGI containers not getting along - resulting in breaking the AMQP heartbeat, so rabbitmq-server drops the conenction) ... trying to focus on the failure to utilise the reestablished connection at the moment | 18:50 |
gsantomaggio | @imacdonn are you using SSL connctions ? | 18:52 |
imacdonn | I've confirmed that the request is received on the target host ... the issue seems to be handling the reply, somehow .. and it only fails in the case where the conenction had just been reestablished and the request retried | 18:52 |
imacdonn | negative on SSL | 18:52 |
gsantomaggio | is there something on the rabbitmq logs? | 18:52 |
gsantomaggio | did you check them ? | 18:53 |
kgiusti | imacdonn: ok - so it's the path back from the server. Any chance you can look at the reply queues - trying to see if the reply gets to rabbitmq or not | 18:53 |
imacdonn | rabbitmq-server logs that it dropped the connection after observing no heartbeats for 60s | 18:53 |
kgiusti | imacdonn: just to clarify: is the client <--> rabbitmq the only connection that is failing? or is the rabbitmq <---> server connection also dropping? | 18:54 |
imacdonn | kgiusti: if "client" means the host that is sending the RPC request, then that is where the problem lies | 18:54 |
kgiusti | imacdonn: sorry - yes client=RPC request server=RPC reply | 18:55 |
imacdonn | log from nova-api with some debug turned on: http://paste.openstack.org/show/AZQET5IOgntK785RHtxh/ | 18:55 |
imacdonn | 23:48:19.113 is where the retried publish happens .. then nothing for 60 seconds until the timeout | 18:56 |
imacdonn | is there a way I can log AMQP packets in oslo.messaging? | 18:57 |
kgiusti | imacdonn: I'm going through that log - that's from the RPC request client side correct? | 19:01 |
kgiusti | imacdonn: I'll need a sec | 19:01 |
imacdonn | kgiusti: correct, and thanks | 19:01 |
openstackgerrit | Merged openstack/oslo-specs master: Remove policy-merge policy https://review.opendev.org/648766 | 19:02 |
kgiusti | imacdonn: do you know which oslo.messaging version you are using? | 19:03 |
imacdonn | kgiusti: it's RDO Stein .. RPM version is python2-oslo-messaging-9.5.0-1.el7.noarch | 19:04 |
kgiusti | imacdonn: regarding logging: --log_levels "oslo.messaging=DEBUG,oslo_messaging=DEBUG" should give us everything - I don't think we dump the entire msg tho... | 19:04 |
kgiusti | imacdonn: kk thanks | 19:05 |
imacdonn | yeah, I have those, and amqp=DEBUG,amqplib=DEBUG , and a couple of env vars to make kombu talk too | 19:05 |
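For reference, the log levels mentioned above can also be set from Python through oslo.log; a minimal sketch follows (most deployments would instead put `default_log_levels` in the service's config file, and the amqp/amqplib entries are the extra ones imacdonn mentions):

```python
# Sketch: enabling messaging-level DEBUG logging via oslo.log defaults.
from oslo_config import cfg
from oslo_log import log as logging

CONF = cfg.CONF
logging.register_options(CONF)
logging.set_defaults(
    default_log_levels=logging.get_default_log_levels() + [
        'oslo.messaging=DEBUG',
        'oslo_messaging=DEBUG',
        'amqp=DEBUG',
        'amqplib=DEBUG',
    ])
logging.setup(CONF, 'nova')
```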
kgiusti | imacdonn: on the "server side" - is there anything in the logs around that time with the text "sending reply msg id:" ? | 19:29 |
imacdonn | kgiusti: well, I don't have the uber-debugging enabled on the server side, but I do see that it gets the request, and I have no reason to doubt that it is replying, since it always works, except in the case where the client had to reestablish its connection and try publishing the request | 19:30 |
kgiusti | imacdonn: ok - that log message prints the msg_id and reply_q - I wanted to see if it matched the logs you sent | 19:32 |
imacdonn | kgiusti: OK, I can try to add debug on the server side | 19:33 |
imacdonn | trying to reproduce the problem now .. it's a bit tricky, with the timing ... working on it | 19:42 |
kgiusti | imacdonn: np | 19:42 |
*** imacdonn has quit IRC | 19:45 | |
*** imacdonn has joined #openstack-oslo | 19:52 | |
imacdonn | kgiusti: reproduced the problem .. client said: | 19:53 |
imacdonn | 2019-05-17 19:50:32.041 25492 ERROR nova.api.openstack.wsgi [req-02b27aed-9f3e-448b-a26b-eb689ad2d456 d451459e393247d2a571ea2ec6914b7f bc885705e450495ca3c5b5f5a54f7355 - default default] Unexpected exception in API method: MessagingTimeout: Timed out waiting for a reply to message ID bd16e129e7e544689ab7997fc8e3e270 | 19:53 |
imacdonn | despite server saying: | 19:53 |
imacdonn | 2019-05-17 19:49:32.082 17993 DEBUG oslo_messaging._drivers.amqpdriver [req-02b27aed-9f3e-448b-a26b-eb689ad2d456 d451459e393247d2a571ea2ec6914b7f bc885705e450495ca3c5b5f5a54f7355 - default default] sending reply msg_id: bd16e129e7e544689ab7997fc8e3e270 reply queue: reply_6579029399d848e7a4a481970fc0d3af time elapsed: 0.0391101880086s _send_reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:125 | 19:53 |
kgiusti | imacdonn: can you do a "sudo rabbitmqctl list_queues" against your broker? | 19:54 |
imacdonn | kgiusti: anything in particular to look for in that ? | 19:55 |
kgiusti | imacdonn: sorry - use the "-n <domain>" option to point to the broker | 19:55 |
kgiusti | imacdonn: yeah I wanted to see if there's a queue called reply_6579<etc> and if there's a message stuck on it | 19:56 |
imacdonn | appears so: reply_6579029399d848e7a4a481970fc0d3af1 | 19:56 |
kgiusti | imacdonn: taa-daa! | 19:57 |
kgiusti | imacdonn: we've narrowed the problem down to the requester side. | 19:57 |
imacdonn | cool .. I think ;) | 19:57 |
kgiusti | imacdonn: I'm running a test locally where I restart rabbitmq in the middle of an RPC call (while the server is processing before the reply is sent) | 19:58 |
kgiusti | imacdonn: I'm walking through the log output here - stay tuned. | 19:58 |
imacdonn | kgiusti: you may have to defeat heartbeats .... normally the heartbeat gets the connection reset and kills the old connection before anyone tries to use it | 19:59 |
imacdonn | ... or at least some of the time, that happens | 19:59 |
kgiusti | imacdonn: can you grep through the requester-side logs for "reply_657902<etc>"? | 20:00 |
imacdonn | quite a few matches for that | 20:01 |
imacdonn | I can paste a new client side log .. after a min to scrub it a little | 20:01 |
*** pcaruana has quit IRC | 20:02 | |
kgiusti | imacdonn: kk | 20:03 |
imacdonn | kgiusti: http://paste.openstack.org/show/s5tTpEgBhjagvmQ7MK2F/ | 20:03 |
imacdonn | I guess that reply queue gets reused .. most of the matches were from earlier | 20:07 |
kgiusti | imacdonn: hrm - shouldn't get reused - it's a randomly generated uuid4 string | 20:08 |
kgiusti | imacdonn: it should survive reconnects tho | 20:08 |
imacdonn | kgiusti: new one for each RPC request? or... | 20:09 |
kgiusti | imacdonn: nope - one for each rpc client (transport, actually) | 20:09 |
imacdonn | kgiusti: k, so I just meant that there were matches in the log for my earlier attempts to reproduce the problem | 20:10 |
kgiusti | imacdonn: they are somewhat long lived | 20:10 |
imacdonn | I guess "reused" was the wrong term ... but that's what I meant | 20:11 |
kgiusti | imacdonn: what I would *expect* to see in those logs is a "Queue.declare: reply_xxx" right before the "Reconnected to AMQP Server..." line | 20:12 |
* kgiusti thinks... | 20:13 | |
imacdonn | hmm, I see Queue.declare in the case where the connection reset didn't reproduce | 20:14 |
imacdonn | in that case, it's right *after* a connection was established, it appears: | 20:15 |
imacdonn | 2019-05-17 19:44:33.121 25492 DEBUG oslo.messaging._drivers.impl_rabbit [req-4a9ae3a4-5d3a-4a30-af4b-917a755797af d451459e393247d2a571ea2ec6914b7f bc885705e450495ca3c5b5f5a54f7355 - default default] [13325785-dcff-4bcf-ac48-580b78372f51] Connected to AMQP server on imot03mq.dcilab.oraclecorp.com:5672 via [amqp] client with port 33266. __init__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:543 | 20:15 |
imacdonn | 2019-05-17 19:44:33.122 25492 DEBUG oslo.messaging._drivers.impl_rabbit [req-4a9ae3a4-5d3a-4a30-af4b-917a755797af d451459e393247d2a571ea2ec6914b7f bc885705e450495ca3c5b5f5a54f7355 - default default] [13325785-dcff-4bcf-ac48-580b78372f51] Queue.declare: reply_6579029399d848e7a4a481970fc0d3af declare /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py:253 | 20:15 |
kgiusti | imacdonn: I would expect it to be logged once when the connection is first established, then again each time the connection recovers... | 20:16 |
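For context, the request/reply flow being debugged works roughly like this. This is a simplified sketch, not the actual oslo.messaging amqpdriver code, and `declare_queue`, `publish` and `wait_for_reply` are placeholder helpers standing in for the driver's connection machinery.

```python
# Simplified illustration: one long-lived reply queue per transport, a uuid4
# msg_id per call, and the caller blocking until a reply with that msg_id
# arrives on the reply queue.
import uuid


class MiniRpcClient(object):
    def __init__(self, connection):
        self.connection = connection
        # One reply queue per client/transport, named with a random uuid;
        # it must be re-declared (and re-consumed) after every reconnect.
        self.reply_q = 'reply_' + uuid.uuid4().hex
        self.connection.declare_queue(self.reply_q)

    def call(self, topic, payload, timeout=60):
        msg_id = uuid.uuid4().hex
        self.connection.publish(topic, {'msg_id': msg_id,
                                        'reply_q': self.reply_q,
                                        'payload': payload})
        # If the reply queue was never re-attached after a reconnect, the
        # server's reply sits unconsumed in RabbitMQ and this wait ends in
        # a timeout, which matches the symptom seen above.
        return self.connection.wait_for_reply(self.reply_q, msg_id, timeout)
```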
kgiusti | imacdonn: can I send you a patched impl_rabbit.py file with extra debug logs? | 20:17 |
imacdonn | kgiusti: sure .. iain.macdonnell -at- oracle.com | 20:18 |
kgiusti | imacdonn: I want to determine if the channel is not being established. kk | 20:18 |
imacdonn | kgiusti: not pressuring, but just checking in case it fell in a hole ... let me know when you've sent that | 20:38 |
kgiusti | imacdonn: np - not yet, just running tests to verify I'm tracing the right stuff... | 20:39 |
imacdonn | cool | 20:39 |
*** ansmith has joined #openstack-oslo | 20:45 | |
kgiusti | imacdonn: email sent | 20:49 |
imacdonn | kgiusti: OK, watching for it... | 20:49 |
imacdonn | nothing so-far ... must be stuck in a queue ..... ;) | 20:51 |
kgiusti | imacdonn: try https://paste.fedoraproject.org/paste/vOyS5seUjWSma~emIanwXw | 20:52 |
imacdonn | kgiusti: k, patch applied .. will try to repro | 20:54 |
imacdonn | kgiusti: http://paste.openstack.org/show/rpS2mcKyIESyG4JehLxd/ | 21:00 |
imacdonn | kgiusti: wait, paste didn't work | 21:00 |
imacdonn | grumble .. it says the paste contains spam, and makes me do captcha, then cuts off most of the content | 21:02 |
*** raildo has quit IRC | 21:03 | |
imacdonn | kgiusti: this has the timeout part: http://paste.openstack.org/show/K5Ut8teApN4In7mYcPyG/ | 21:03 |
imacdonn | I was trying to paste the entire log, including when the connection was first established earlier | 21:03 |
kgiusti | imacdonn: how well does it compress? you can try emailing it to kgiusti at gmail.com | 21:05 |
imacdonn | kgiusti: here's the earlier part, where the first request established the connection, and worked OK: http://paste.openstack.org/show/7wW4RtRW4wUKdvfrRG6i/ | 21:06 |
*** ansmith has quit IRC | 21:09 | |
kgiusti | imacdonn: sorry I gotta drop off. I can feel Wifey's burning glare on the back of my neck.... | 21:17 |
imacdonn | lol | 21:17 |
kgiusti | imacdonn: ttyl | 21:18 |
imacdonn | kgiusti: np ... I'll lurk here for whenever you want to resume | 21:18 |
*** kgiusti has left #openstack-oslo | 21:18 | |
imacdonn | melwitt sean-k-mooney: above may be interesting "light reading" ... "to be continued" | 21:19 |
melwitt | imacdonn: excellent, looking forward to learning more. good call bringing in the oslo.messaging experts | 21:46 |
imacdonn | melwitt: :) thanks for your help yesterday, and have a good weekend! | 21:47 |
melwitt | imacdonn: likewise, thank you for digging into this with us. enjoy your weekend :) | 21:48 |
*** tosky has quit IRC | 22:42 | |
*** dave-mccowan has quit IRC | 22:50 | |
*** mmethot has quit IRC | 23:53 |