dims | klindgren: all of these are in the code paths where there is a retry loop and when each try has an exception it logs it and still continues... | 00:00 |
---|---|---|
klindgren | dims - from what I can tell in the code unless I am reading it wrong we log the same error a number of times if we are reconnecting. 3 in fact. | 00:00 |
*** liusheng has quit IRC | 00:00 | |
*** liusheng has joined #openstack-oslo | 00:00 | |
klindgren | I guess I am more concerned, and thought that heartbeating was supposed to fix the fact that when we go to publish something on rabbitmq - somehow we keep publishing to a connection that has been closed | 00:04 |
klindgren | and keep sending to a socket that is in CLOSE_WAIT | 00:04 |
dims | klindgren: how long each of those happens is the question | 00:05 |
klindgren | I would agree that it seems like eventually the retry succeeds? I mean I have no way of knowing it except for the fact that I don't log any more retries | 00:05 |
dims | you have extra logging? | 00:06 |
klindgren | well that failover happened almost an hour ago - and I still see errors coming in | 00:07 |
klindgren | dims, what do you need? | 00:07 |
dims | klindgren: which one do you see now "ConnectionForced" or "IOError: Socket closed" | 00:08 |
klindgren | neither - the error that consistently repeats is [-] AMQP server on openstack-test.int.godaddy.com:5672 is unreachable: connection already closed. Trying again in 1 seconds. | 00:10 |
klindgren | with the exact same traceback as: https://gist.github.com/krislindgren/6a05e91263801de94ca1#file-fail-after-reconnect-L40 | 00:10 |
klindgren | then 1 second later I will see: Reconnected to AMQP server on openstack-test.int.godaddy.com:5672 logged at info | 00:11 |
klindgren | then minutes later - repeat | 00:12 |
dims | which process do you see this in? | 00:12 |
klindgren | nova* | 00:13 |
klindgren | nova-compute nova-metadata | 00:13 |
klindgren | mainly because I assume those are the only components that seem to do anything actively - we have VMs that call into metadata periodically | 00:13 |
klindgren | and it's always logging an error trying to talk to conductor | 00:14 |
klindgren | since we are using cells - I haven't been testing this in the api cell where neutron runs - so only nova components are hooked up to this rabbitmq | 00:15 |
dims | ok, i think i am at end of the rope with looking at the individual stack traces, would it be possible to get a full set of logs? starting just before the failover? | 00:16 |
openstackgerrit | Joshua Harlow proposed openstack/taskflow: Prototype a switch flow (WIP) https://review.openstack.org/164902 | 00:17 |
klindgren | Is it possible that we need to handle connection already closed errors differently? | 00:17 |
klindgren | dims - whatever you need | 00:17 |
dims | klindgren: thanks, davanum AT gmail.com | 00:18 |
klindgren | are logs from nova-compute at debug going to be ok? - they should be "less" noisy. | 00:18 |
klindgren | since I assume you really only need longs from a single component? | 00:18 |
klindgren | logs* | 00:18 |
*** YorikSar has quit IRC | 00:18 | |
*** amotoki has joined #openstack-oslo | 00:19 | |
dims | klindgren: sure thanks, let's try that first | 00:19 |
*** YorikSar has joined #openstack-oslo | 00:19 | |
*** sreshetn1 has joined #openstack-oslo | 00:40 | |
*** YorikSar has quit IRC | 00:41 | |
*** stevemar has joined #openstack-oslo | 00:43 | |
*** sreshetn1 has quit IRC | 00:45 | |
*** achanda_ has quit IRC | 00:48 | |
*** salv-orlando has joined #openstack-oslo | 00:50 | |
*** sputnik13 has quit IRC | 00:52 | |
openstackgerrit | Joshua Harlow proposed openstack/taskflow: Prototype a switch flow (WIP) https://review.openstack.org/164902 | 00:58 |
*** geguileo has quit IRC | 01:03 | |
*** YorikSar has joined #openstack-oslo | 01:04 | |
openstackgerrit | Joshua Harlow proposed openstack/taskflow: Prototype a switch flow (WIP) https://review.openstack.org/164902 | 01:05 |
harlowja | dhellmann an import that i always thought felt out of place, seems in line with your __init__ email; https://github.com/openstack/cinder/blob/master/cinder/volume/__init__.py#L27 | 01:08 |
harlowja | seems odd :-P | 01:08 |
harlowja | so not only import time change, but config that influences import time decision which may influence who knows what | 01:09 |
openstackgerrit | Joshua Harlow proposed openstack/taskflow: Prototype a switch flow (WIP) https://review.openstack.org/164902 | 01:15 |
*** exploreshaifali has quit IRC | 01:20 | |
*** geguileo has joined #openstack-oslo | 01:25 | |
*** dims has quit IRC | 01:26 | |
*** liusheng has quit IRC | 01:31 | |
*** liusheng has joined #openstack-oslo | 01:32 | |
*** sputnik13 has joined #openstack-oslo | 01:34 | |
*** jungleboyj has joined #openstack-oslo | 01:35 | |
*** geguileo has quit IRC | 01:42 | |
*** geguileo has joined #openstack-oslo | 01:43 | |
*** sputnik13 has quit IRC | 01:44 | |
*** geguileo has quit IRC | 01:48 | |
*** YorikSar has quit IRC | 01:49 | |
*** dims has joined #openstack-oslo | 01:54 | |
*** sputnik13 has joined #openstack-oslo | 01:57 | |
openstackgerrit | Joshua Harlow proposed openstack/taskflow: Prototype a switch flow #2 (WIP) https://review.openstack.org/164922 | 01:57 |
*** geguileo has joined #openstack-oslo | 02:01 | |
*** dims has quit IRC | 02:03 | |
*** yamahata has quit IRC | 02:07 | |
*** zzzeek has quit IRC | 02:09 | |
*** sputnik13 has quit IRC | 02:10 | |
*** salv-orlando has quit IRC | 02:11 | |
openstackgerrit | Joshua Harlow proposed openstack/taskflow: Prototype a switch flow #2 (WIP) https://review.openstack.org/164922 | 02:11 |
*** sigmavirus24_awa is now known as sigmavirus24 | 02:24 | |
*** achanda has joined #openstack-oslo | 02:27 | |
*** sputnik13 has joined #openstack-oslo | 02:38 | |
*** sputnik13 has quit IRC | 02:42 | |
*** sputnik13 has joined #openstack-oslo | 02:43 | |
*** achanda has quit IRC | 02:52 | |
*** jecarey has joined #openstack-oslo | 02:59 | |
*** sputnik13 has quit IRC | 03:02 | |
*** dims has joined #openstack-oslo | 03:04 | |
*** salv-orlando has joined #openstack-oslo | 03:11 | |
*** dims has quit IRC | 03:11 | |
*** boris-42 has quit IRC | 03:12 | |
*** harlowja is now known as harlowja_away | 03:15 | |
*** zzzeek has joined #openstack-oslo | 03:22 | |
*** zzzeek has quit IRC | 03:22 | |
*** david-lyle is now known as david-lyle_afk | 03:35 | |
*** rushiagr_away is now known as rushiagr | 03:39 | |
*** rushiagr is now known as rushiagr_away | 04:00 | |
*** achanda has joined #openstack-oslo | 04:01 | |
*** subscope_ has joined #openstack-oslo | 04:04 | |
*** stevemar has quit IRC | 04:22 | |
*** stevemar has joined #openstack-oslo | 04:23 | |
*** salv-orlando has quit IRC | 04:23 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 04:24 | |
*** rushiagr_away is now known as rushiagr | 04:39 | |
*** klindgren has quit IRC | 04:42 | |
*** enikanorov has quit IRC | 04:43 | |
*** enikanorov has joined #openstack-oslo | 04:43 | |
*** achanda has quit IRC | 04:46 | |
*** achanda has joined #openstack-oslo | 04:49 | |
*** achanda has quit IRC | 04:49 | |
*** sputnik13 has joined #openstack-oslo | 04:55 | |
openstackgerrit | Merged openstack/taskflow: Ensure we register & deregister conductor listeners https://review.openstack.org/164096 | 05:01 |
openstackgerrit | Merged openstack/taskflow: Just use the class name instead of TYPE constant https://review.openstack.org/164554 | 05:01 |
*** exploreshaifali has joined #openstack-oslo | 05:12 | |
*** salv-orlando has joined #openstack-oslo | 05:20 | |
*** exploreshaifali has quit IRC | 05:30 | |
*** subscope_ has quit IRC | 05:43 | |
*** yamahata has joined #openstack-oslo | 05:49 | |
*** rushiagr is now known as rushiagr_away | 05:51 | |
*** ajo has joined #openstack-oslo | 05:57 | |
*** mtanino has quit IRC | 06:21 | |
*** achanda has joined #openstack-oslo | 06:26 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/oslo.db: Imported Translations from Transifex https://review.openstack.org/164579 | 06:33 |
openstackgerrit | Li Ma proposed openstack/oslo.messaging: Add pluggability for matchmakers https://review.openstack.org/161615 | 06:33 |
*** amotoki has quit IRC | 06:43 | |
*** stevemar has quit IRC | 06:44 | |
openstackgerrit | Li Ma proposed openstack/oslo.messaging: ZeroMQ deployment guide https://review.openstack.org/130943 | 06:44 |
*** salv-orlando has quit IRC | 06:44 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/oslo.messaging: Imported Translations from Transifex https://review.openstack.org/164404 | 06:46 |
*** amotoki has joined #openstack-oslo | 06:49 | |
*** sreshetn1 has joined #openstack-oslo | 06:51 | |
*** yamahata has quit IRC | 06:53 | |
*** yamahata has joined #openstack-oslo | 06:53 | |
*** exploreshaifali has joined #openstack-oslo | 07:00 | |
openstackgerrit | Li Ma proposed openstack/oslo.messaging: Fix deleting keys during iteration in matchmaker heartbeat https://review.openstack.org/164972 | 07:07 |
*** yamahata has quit IRC | 07:20 | |
openstackgerrit | Li Ma proposed openstack/oslo.messaging: Add pluggability for matchmakers https://review.openstack.org/161615 | 07:25 |
*** exploreshaifali has quit IRC | 07:28 | |
*** achanda has quit IRC | 07:31 | |
*** inc0 has joined #openstack-oslo | 07:41 | |
*** ajo has quit IRC | 07:46 | |
*** ajo has joined #openstack-oslo | 07:46 | |
*** dtantsur|afk is now known as dtantsur | 08:03 | |
openstackgerrit | Mehdi Abaakouk proposed openstack/tooz: fix lock concurrency issues with certain drivers https://review.openstack.org/164642 | 08:03 |
*** e0ne has joined #openstack-oslo | 08:17 | |
openstackgerrit | Mehdi Abaakouk proposed openstack/tooz: fix lock concurrency issues with certain drivers https://review.openstack.org/164642 | 08:20 |
sileht | jd__, hi, I have written some test cases that exercise concurrency of the lock mechanism, and found some bugs in certain drivers :( | 08:20 |
sileht | jd__, the most epic issue: | 08:23 |
sileht | + NOTE(sileht): mysql allows only one lock per connection at a time: | 08:23 |
sileht | + select GET_LOCK("a", 0); | 08:23 |
sileht | + select GET_LOCK("b", 0); <-- this releases lock "a" ... | 08:23 |
*** ajo has quit IRC | 08:25 | |
*** sreshetn1 has quit IRC | 08:26 | |
*** dulek has joined #openstack-oslo | 08:30 | |
*** salv-orlando has joined #openstack-oslo | 08:31 | |
*** i159 has joined #openstack-oslo | 08:32 | |
*** exploreshaifali has joined #openstack-oslo | 08:34 | |
*** e0ne is now known as e0ne_ | 08:36 | |
openstackgerrit | Li Yingjun proposed openstack/oslo-incubator: Remove unused validate_ssl_version https://review.openstack.org/164982 | 08:36 |
*** dims has joined #openstack-oslo | 08:42 | |
*** dims has quit IRC | 08:48 | |
*** sreshetn1 has joined #openstack-oslo | 08:50 | |
*** andreykurilin_ has joined #openstack-oslo | 08:53 | |
*** boris-42 has joined #openstack-oslo | 08:59 | |
jd__ | sileht: rofl | 09:00 |
*** BrianShang has joined #openstack-oslo | 09:00 | |
jd__ | sileht: man you got to love MySQL | 09:00 |
*** dulek has quit IRC | 09:05 | |
*** viktors has joined #openstack-oslo | 09:08 | |
*** salv-orlando has quit IRC | 09:21 | |
*** salv-orlando has joined #openstack-oslo | 09:21 | |
openstackgerrit | Elena Ezhova proposed openstack/oslo-incubator: Store ProcessLauncher signal handlers on class level https://review.openstack.org/164993 | 09:24 |
*** sreshetn1 has quit IRC | 09:26 | |
*** e0ne_ has quit IRC | 09:30 | |
*** kashyapc has joined #openstack-oslo | 09:31 | |
*** rushiagr_away is now known as rushiagr | 09:39 | |
*** exploreshaifali has quit IRC | 09:45 | |
*** andreykurilin_ has quit IRC | 09:48 | |
*** dims has joined #openstack-oslo | 09:50 | |
*** sreshetn1 has joined #openstack-oslo | 09:52 | |
*** andreykurilin_ has joined #openstack-oslo | 09:56 | |
*** andreykurilin_ has quit IRC | 10:01 | |
*** sreshetn1 has quit IRC | 10:03 | |
*** ajo has joined #openstack-oslo | 10:05 | |
*** sreshetn1 has joined #openstack-oslo | 10:07 | |
openstackgerrit | Elena Ezhova proposed openstack/oslo-incubator: Store ProcessLauncher signal handlers on class level https://review.openstack.org/164993 | 10:11 |
*** sreshetn1 has quit IRC | 10:11 | |
*** mfedosin has quit IRC | 10:14 | |
*** e0ne has joined #openstack-oslo | 10:17 | |
*** YorikSar has joined #openstack-oslo | 10:28 | |
*** dulek has joined #openstack-oslo | 10:30 | |
*** achanda has joined #openstack-oslo | 10:31 | |
openstackgerrit | Julien Danjou proposed openstack-dev/pbr: Handle PEP426 markers https://review.openstack.org/165015 | 10:32 |
*** exploreshaifali has joined #openstack-oslo | 10:34 | |
*** achanda has quit IRC | 10:36 | |
*** dtantsur is now known as dtantsur|brb | 10:37 | |
*** e0ne is now known as e0ne_ | 10:40 | |
jd__ | dhellmann: is there an ETA yet on pbr 0.11 or the like (from master)? | 10:40 |
*** exploreshaifali has quit IRC | 10:41 | |
*** e0ne_ is now known as e0ne | 10:44 | |
*** rushiagr is now known as rushiagr_away | 10:56 | |
*** rushiagr_away is now known as rushiagr | 10:57 | |
*** e0ne is now known as e0ne_ | 11:08 | |
*** e0ne_ has quit IRC | 11:17 | |
*** cdent has joined #openstack-oslo | 11:22 | |
*** kashyapc has left #openstack-oslo | 11:23 | |
*** e0ne has joined #openstack-oslo | 11:27 | |
openstackgerrit | enikanorov proposed openstack/oslo.db: Avoid excessing logging of RetryRequest exception https://review.openstack.org/165032 | 11:28 |
openstackgerrit | Elena Ezhova proposed openstack/oslo-incubator: Store ProcessLauncher signal handlers on class level https://review.openstack.org/164993 | 11:31 |
*** shaohe_feng has joined #openstack-oslo | 11:38 | |
shaohe_feng | hi all, does anyone know how to avoid lazy loading in a db object? right now there is a loop between my two objects. | 11:40 |
*** mtanino has joined #openstack-oslo | 11:46 | |
*** flwang1 has quit IRC | 11:53 | |
*** exploreshaifali has joined #openstack-oslo | 11:53 | |
d0ugal | Is the python-openstackclient under the oslo project group? | 11:55 |
*** flwang has joined #openstack-oslo | 11:56 | |
*** amotoki has quit IRC | 11:59 | |
*** e0ne is now known as e0ne_ | 12:00 | |
eezhova | dhellmann, hi! I wanted to ask you a question concerning handling SIGHUP by parent process (multi worker mode, using ProcessLauncher). Why doesn't parent process reload its configuration and call reset after sending SIGHUP to its children? | 12:01 |
*** subscope has quit IRC | 12:02 | |
*** rushiagr is now known as rushiagr_away | 12:04 | |
*** stevemar has joined #openstack-oslo | 12:05 | |
*** e0ne_ is now known as e0ne | 12:06 | |
*** sreshetn1 has joined #openstack-oslo | 12:06 | |
*** ihrachyshka has joined #openstack-oslo | 12:10 | |
*** subscope has joined #openstack-oslo | 12:14 | |
*** dims has quit IRC | 12:16 | |
*** dims has joined #openstack-oslo | 12:17 | |
*** dtantsur|brb is now known as dtantsur | 12:18 | |
*** jaypipes has joined #openstack-oslo | 12:20 | |
*** sreshetn1 has quit IRC | 12:23 | |
*** salv-orlando has quit IRC | 12:30 | |
*** exploreshaifali has quit IRC | 12:33 | |
*** gordc has joined #openstack-oslo | 12:39 | |
*** flwang has quit IRC | 12:45 | |
*** flwang has joined #openstack-oslo | 12:48 | |
*** rushiagr_away is now known as rushiagr | 12:49 | |
*** exploreshaifali has joined #openstack-oslo | 12:52 | |
*** bknudson has quit IRC | 12:57 | |
*** mriedem_away is now known as mriedem | 12:58 | |
*** salv-orlando has joined #openstack-oslo | 13:00 | |
*** bknudson has joined #openstack-oslo | 13:01 | |
*** jecarey has quit IRC | 13:02 | |
*** ChuckC has quit IRC | 13:04 | |
*** ChuckC_ has joined #openstack-oslo | 13:04 | |
*** sreshetn1 has joined #openstack-oslo | 13:07 | |
*** ChuckC has joined #openstack-oslo | 13:07 | |
*** ChuckC_ has quit IRC | 13:07 | |
*** amotoki has joined #openstack-oslo | 13:09 | |
dhellmann | harlowja_away: yes, that import assumes that the config has been parsed, which isn't necessarily safe | 13:09 |
dhellmann | jd__: I need to track down mordred and find out about pbr | 13:10 |
*** BrianShang_ has joined #openstack-oslo | 13:11 | |
*** ChuckC has quit IRC | 13:12 | |
*** BrianShang has quit IRC | 13:13 | |
*** flwang has quit IRC | 13:16 | |
*** mfedosin has joined #openstack-oslo | 13:17 | |
*** flwang has joined #openstack-oslo | 13:18 | |
*** elarson_ has joined #openstack-oslo | 13:18 | |
*** liusheng has quit IRC | 13:19 | |
*** liusheng has joined #openstack-oslo | 13:19 | |
*** elarson_ has quit IRC | 13:21 | |
*** mtanino has quit IRC | 13:24 | |
*** elarson_ has joined #openstack-oslo | 13:26 | |
*** alexpilotti has joined #openstack-oslo | 13:27 | |
*** salv-orlando has quit IRC | 13:31 | |
*** alexpilotti has quit IRC | 13:32 | |
jd__ | dhellmann: thanks | 13:32 |
*** alexpilotti has joined #openstack-oslo | 13:32 | |
*** ChuckC has joined #openstack-oslo | 13:33 | |
*** jungleboyj has quit IRC | 13:36 | |
*** flwang has quit IRC | 13:37 | |
*** flwang has joined #openstack-oslo | 13:38 | |
*** prad has joined #openstack-oslo | 13:42 | |
*** zzzeek has joined #openstack-oslo | 13:50 | |
*** amrith is now known as _amrith_ | 13:50 | |
*** elarson_ has quit IRC | 13:58 | |
dims | eezhova: missing feature :) | 13:59 |
dims | d0ugal: nope. has its own core - https://review.openstack.org/#/admin/groups/?filter=openstackclient | 14:00 |
d0ugal | dims: aha, thanks! | 14:00 |
stevemar | d0ugal, what's up with osc? | 14:01 |
openstackgerrit | Mehdi Abaakouk proposed openstack/tooz: fix lock concurrency issues with certain drivers https://review.openstack.org/164642 | 14:02 |
d0ugal | stevemar: ah, are you the PTL? I have a question regarding an Ironic plugin that I'm likely to be working on soon. | 14:02 |
openstackgerrit | Li Yingjun proposed openstack/oslo-incubator: Remove unused validate_ssl_version https://review.openstack.org/164982 | 14:02 |
*** pblaho_ has joined #openstack-oslo | 14:03 | |
stevemar | d0ugal, the ptl is dtroyer (try #openstack-sdk), i'm core on the project but dean is more familiar with the plugins portion; we can help you out there | 14:03 |
*** subscope has quit IRC | 14:03 | |
*** pblaho_ has quit IRC | 14:04 | |
*** pblaho has quit IRC | 14:05 | |
*** pblaho_ has joined #openstack-oslo | 14:05 | |
*** pblaho_ has quit IRC | 14:06 | |
*** pblaho_ has joined #openstack-oslo | 14:06 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 14:07 | |
*** pblaho_ has quit IRC | 14:08 | |
*** sreshetn1 has quit IRC | 14:08 | |
*** mtanino has joined #openstack-oslo | 14:10 | |
*** pblaho__ has joined #openstack-oslo | 14:10 | |
*** inc0 has quit IRC | 14:11 | |
*** salv-orlando has joined #openstack-oslo | 14:11 | |
*** subscope has joined #openstack-oslo | 14:18 | |
*** jungleboyj has joined #openstack-oslo | 14:22 | |
*** jecarey has joined #openstack-oslo | 14:23 | |
*** stpierre has joined #openstack-oslo | 14:27 | |
*** zz_jgrimm is now known as jgrimm | 14:33 | |
*** achanda has joined #openstack-oslo | 14:34 | |
dims | sileht: dhellmann: the exception traceback(s) during reconnect/heartbeat are very very scary | 14:36 |
*** _amrith_ is now known as amrith | 14:37 | |
dims | sileht: dhellmann: we need to distinguish between transient issues and message actually dropped | 14:38 |
dims | ozamiatin: ^ | 14:38 |
*** achanda has quit IRC | 14:43 | |
sileht | dims, perhaps I should LOG.info() only the message and LOG.debug() the backtrace, WDYT? | 14:47 |
dims | sileht: sounds great. are any of these 4 spots actual errors? http://paste.openstack.org/show/192895/ | 14:49 |
dims | dhellmann: i just realized that LOG.exception prints tracebacks which have "TRACE" in the log level field | 14:50 |
dims | because of | 14:51 |
dims | cfg.StrOpt('logging_exception_prefix', | 14:51 |
dims | default='%(asctime)s.%(msecs)03d %(process)d TRACE %(name)s ' | 14:51 |
dims | '%(instance)s', | 14:51 |
*** inc0 has joined #openstack-oslo | 14:53 | |
dhellmann | dims: ah, I wondered what was causing that. We should probably change that. | 14:55 |
sileht | dims, some of these errors need to drop the backtrace | 14:55 |
dhellmann | dims: matching sdague's logging spec, which is approved now I think | 14:55 |
dhellmann | sileht: can we make it only log the error if we can't reconnect? | 14:55 |
sdague | yeh, that should be a Liberty thing though | 14:55 |
dhellmann | sdague: right | 14:55 |
dhellmann | sdague: oslo is already on liberty, since we have our stable branches cut | 14:56 |
sileht | dhellmann, I will try to do that | 14:56 |
sdague | dhellmann: gotcha | 14:56 |
*** david-lyle_afk is now known as david-lyle | 14:57 | |
dhellmann | sdague: did you see the link to the logging-related oslo spec I sent you? | 14:57 |
sdague | dhellmann: vaguely, but until Nova is basically released for Kilo I won't have any mental bw for it | 14:57 |
dhellmann | sdague: ack | 14:57 |
sdague | sileht: why do you need the traceback? is it a software bug? | 14:58 |
sileht | sdague, when we catch kombu 'recoverable_errors' we don't need a backtrace, but for other exceptions raised there is a good chance that it is a software bug | 14:59 |
sdague | ok, exceptions in logs should be exceptional circumstances. Because mostly it just scares our users on stuff they can't do anything about. | 15:00 |
eezhova | dims, so it's OK if I file a bug and add this functionality? | 15:01 |
dims | eezhova: sure thanks, we can treat as a bug since i believe we do re-read config in some circumstances, right? | 15:01 |
openstackgerrit | Li Yingjun proposed openstack/oslo-incubator: Remove unused validate_ssl_version https://review.openstack.org/164982 | 15:01 |
dims | sdague: sileht: dhellmann: +1 to exceptional circumstances | 15:03 |
dhellmann | dims, eezhova : I'm not sure we do re-read the config, do we? | 15:03 |
eezhova | dims, dhellmann: in neutron we want all processes to reload config files on SIGHUP and that includes parent process as well | 15:04 |
dims | dhellmann: see service.py (reload_config_files) | 15:04 |
dhellmann | dims: ah, I didn't realize that was there | 15:05 |
dims | eezhova: ack | 15:05 |
*** klindgren_ has joined #openstack-oslo | 15:05 | |
dhellmann | eezhova: be careful that nothing you depend on is caching the configuration settings in local variables | 15:05 |
dims | eezhova: let's at least come up with a review to see what it takes, may be late for kilo though | 15:06 |
*** jaypipes has quit IRC | 15:06 | |
*** kgiusti has joined #openstack-oslo | 15:06 | |
eezhova | dhellmann, yes, thank you for reminding. in fact, we want to use SIGHUP only to change logging options and policy_path. and I take care of all of that in reset() methods of rpc and api workers | 15:08 |
*** e0ne is now known as e0ne_ | 15:08 | |
*** sreshetn1 has joined #openstack-oslo | 15:13 | |
*** jungleboyj has quit IRC | 15:17 | |
*** crc32 has joined #openstack-oslo | 15:17 | |
*** crc32 has quit IRC | 15:18 | |
*** e0ne_ has quit IRC | 15:18 | |
*** crc32 has joined #openstack-oslo | 15:19 | |
openstackgerrit | Elena Ezhova proposed openstack/oslo-incubator: ProcessLauncher: reload config file in parent process on SIGHUP https://review.openstack.org/165104 | 15:23 |
*** e0ne has joined #openstack-oslo | 15:24 | |
*** crc32 has quit IRC | 15:28 | |
*** tsekiyama has joined #openstack-oslo | 15:32 | |
dims | sileht: klindgren_ reported some issues with the heartbeat patch. his setup has a haproxy in front of rabbitmq. have you tried out that scenario? | 15:32 |
sileht | dims, no | 15:34 |
dims | ok, if you can update the review with the muted traceback(s), i'll ask him to try and send us logs, he has some very good scenarios he is able to test | 15:35 |
sileht | good news | 15:36 |
klindgren_ | Basically what I see is after I move traffic from one rabbitmq server to another - I see Errors that continue when attempting to _publish a message on the queue | 15:36 |
dims | hey klindgren_! | 15:36 |
klindgren_ | morning :-) | 15:37 |
klindgren_ | another day devoted to rabbitmq failover testing :-) | 15:37 |
sileht | klindgren_, I like backtrace, if you have one to share :) | 15:37 |
dims | klindgren_: we were talking about actual errors vs transitional errors with sileht and dhellmann. so sileht is going to rev | 15:37 |
dims | sileht: the spots i pointed you to were from klindgren_'s traces | 15:38 |
klindgren_ | https://gist.github.com/krislindgren/6a05e91263801de94ca1#file-restart-rabbitmq-logs <- restart of rabbitmq - expected errors | 15:38 |
sileht | I am very interested to know what kombu raises | 15:38 |
sileht | thanks | 15:38 |
klindgren_ | publish: https://gist.github.com/krislindgren/6a05e91263801de94ca1#file-fail-after-reconnect | 15:38 |
dhellmann | eezhova: does the logging code reconfigure the loggers on SIGHUP? | 15:38 |
klindgren_ | ^^ keeps happening for a long period of time | 15:39 |
sileht | dims, this backtrace will be replaced by a LOG.info | 15:39 |
sileht | dims, the fact is that the error message can be logged 'rpc_conn_pool_size' times | 15:39 |
dims | sileht: ack | 15:40 |
sileht | + 1 per rpc server + 1 for the rpc connection used for reply | 15:40 |
dims | sileht: would be handy to print a count on the info message? | 15:41 |
dims | or at least the rpc_conn_pool_size? | 15:41 |
dims | sileht: my fear is if the logs are not self-explanatory to some extent, we'll be staring at logs for next 6 months :) | 15:42 |
*** yamahata has joined #openstack-oslo | 15:49 | |
eezhova | dhellmann, to reconfigure loggers one needs to call logging.setup. in neutron there is a helper function that does that: https://github.com/openstack/neutron/blob/e933891462408435c580ad42ff737f8bff428fbc/neutron/common/config.py#L187 | 15:49 |
eezhova | dhellmann, that is how the reset() method for RPCWorker should look IMO: https://review.openstack.org/#/c/161732/4/neutron/service.py | 15:51 |
*** prad has quit IRC | 15:51 | |
*** prad has joined #openstack-oslo | 15:51 | |
*** inc0 has quit IRC | 15:53 | |
*** inc0_ has joined #openstack-oslo | 15:53 | |
*** liusheng has quit IRC | 15:57 | |
*** liusheng has joined #openstack-oslo | 15:58 | |
*** inc0_ has quit IRC | 15:59 | |
*** mdbooth has quit IRC | 16:02 | |
*** amotoki has quit IRC | 16:06 | |
*** mdbooth has joined #openstack-oslo | 16:07 | |
*** sputnik13 has quit IRC | 16:20 | |
*** salv-orlando has quit IRC | 16:21 | |
*** crc32 has joined #openstack-oslo | 16:23 | |
klindgren_ | sileht, dims - have a debug log from nova-compute of before the restart of rabbitmq - the connect errors - then it running ok - then 10min later throwing an error with socket closed. Do you want me to send via email? | 16:24 |
dims | klindgren_: yes please | 16:25 |
*** jaypipes has joined #openstack-oslo | 16:27 | |
klindgren_ | dims, email on its way | 16:31 |
*** sigmavirus24 is now known as sigmavirus24_awa | 16:31 | |
*** enikanorov has quit IRC | 16:40 | |
*** enikanorov has joined #openstack-oslo | 16:41 | |
*** gordc has quit IRC | 16:48 | |
*** harlowja_away is now known as harlowja | 16:49 | |
harlowja | sileht nice mysql bug, lol | 16:50 |
sileht | harlowja, that's fixed in > 5.7.5 of mysql-server | 16:50 |
sileht | harlowja, I have also found a nice kazoo one: https://github.com/python-zk/kazoo/issues/291 | 16:51 |
harlowja | ha, nie | 16:51 |
harlowja | *nice | 16:51 |
*** BrianShang_ has quit IRC | 16:51 | |
harlowja | lock or rlock, that is the question :-P | 16:51 |
sileht | harlowja, I'm currently writing workarounds in tooz for all these issues | 16:52 |
*** BrianShang has joined #openstack-oslo | 16:52 | |
harlowja | sileht k, i can address the kazoo one i think | 16:53 |
harlowja | or should be able to :-P | 16:53 |
sileht | harlowja, in kazoo or in tooz ? | 16:53 |
*** alexpilotti has quit IRC | 16:53 | |
sileht | harlowja, I have already written a fix in tooz | 16:53 |
harlowja | both? lol | 16:53 |
*** salv-orlando has joined #openstack-oslo | 16:53 | |
harlowja | i'm a kazoo core i guess also now, so both :-P | 16:54 |
sileht | harlowja, I will do it for tooz, because I have written some tests to exercise concurrency of the tooz lock: https://review.openstack.org/#/c/164642/ | 16:55 |
harlowja | kk | 16:55 |
sileht | harlowja, and I need to fix all drivers to make them pass | 16:55 |
harlowja | right | 16:55 |
harlowja | dhellmann are u thinking that after https://review.openstack.org/#/c/162656/ is merged taskflow could release or do u want to wait further on that one? | 16:57 |
*** gordc has joined #openstack-oslo | 16:58 | |
*** sreshetn1 has quit IRC | 16:58 | |
openstackgerrit | Joshua Harlow proposed openstack/taskflow: Use the class name instead of the TYPE property in __str__ https://review.openstack.org/165148 | 17:04 |
harlowja | sileht ya, i think the kazoo definition of re-entrant is sorta off, lol | 17:06 |
sileht | lol | 17:06 |
harlowja | sorta like reentrant acquire() but forgot about the release() part | 17:07 |
harlowja | *someone forgot | 17:07 |
*** jungleboyj has joined #openstack-oslo | 17:09 | |
openstackgerrit | Mehdi Abaakouk proposed openstack/tooz: fix lock concurrency issues with certain drivers https://review.openstack.org/164642 | 17:11 |
*** i159 has quit IRC | 17:15 | |
*** dtantsur is now known as dtantsur|afk | 17:16 | |
*** ihrachyshka has quit IRC | 17:19 | |
sileht | dims, klindgren_ I ran out of time to submit a new heartbeat patch today, I will do it tomorrow (my) morning. | 17:19 |
openstackgerrit | Ken Giusti proposed openstack/oslo.messaging: Create a unique transport for each server in the functional tests https://review.openstack.org/155476 | 17:21 |
*** salv-orl_ has joined #openstack-oslo | 17:21 | |
openstackgerrit | Mehdi Abaakouk proposed openstack/tooz: fix lock concurrency issues with certain drivers https://review.openstack.org/164642 | 17:23 |
*** salv-orlando has quit IRC | 17:25 | |
*** dulek has quit IRC | 17:25 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 17:28 | |
dims | sileht: thanks | 17:29 |
*** achanda has joined #openstack-oslo | 17:34 | |
openstackgerrit | Elena Ezhova proposed openstack/oslo-incubator: Store ProcessLauncher signal handlers on class level https://review.openstack.org/164993 | 17:35 |
*** exploreshaifali has quit IRC | 17:36 | |
*** e0ne has quit IRC | 17:37 | |
*** sputnik13 has joined #openstack-oslo | 17:37 | |
openstackgerrit | Ken Giusti proposed openstack/oslo.messaging: Create a unique transport for each server in the functional tests https://review.openstack.org/155476 | 17:42 |
*** jaosorior has joined #openstack-oslo | 17:42 | |
dhellmann | harlowja: we need to wait until that update is merged into the projects that use tooz | 17:51 |
harlowja | k | 17:51 |
harlowja | i'll let the others know that are awaiting that, thx | 17:51 |
openstackgerrit | Michael Bayer proposed openstack/oslo.db: Provide working SQLA_VERSION attribute https://review.openstack.org/165166 | 17:52 |
dhellmann | harlowja: yeah, sorry about that, but it shouldn't be too long | 17:52 |
harlowja | np | 17:52 |
*** exploreshaifali has joined #openstack-oslo | 17:59 | |
*** alexpilotti has joined #openstack-oslo | 18:02 | |
*** shardy has quit IRC | 18:04 | |
harlowja | sileht https://github.com/python-zk/kazoo/pull/292 | 18:07 |
harlowja | should nail that one on the head... | 18:07 |
*** achanda has quit IRC | 18:16 | |
*** e0ne has joined #openstack-oslo | 18:16 | |
dhellmann | rpodolyaka, viktors, zzzeek : oslo.db 1.7.1 released | 18:18 |
*** pblaho__ is now known as pblaho | 18:19 | |
*** pblaho has quit IRC | 18:19 | |
*** pblaho has joined #openstack-oslo | 18:19 | |
*** achanda has joined #openstack-oslo | 18:21 | |
*** pblaho has quit IRC | 18:21 | |
*** pblaho has joined #openstack-oslo | 18:22 | |
dims | yay! dhellmann | 18:27 |
*** e0ne is now known as e0ne_ | 18:27 | |
*** sputnik13 has quit IRC | 18:30 | |
*** pblaho has quit IRC | 18:30 | |
*** e0ne_ has quit IRC | 18:32 | |
*** sputnik13 has joined #openstack-oslo | 18:32 | |
*** e0ne has joined #openstack-oslo | 18:39 | |
*** cdent has quit IRC | 18:40 | |
openstackgerrit | Joshua Harlow proposed openstack/oslo-incubator: Recommend users of service.py use `launch_service_class` https://review.openstack.org/164836 | 18:41 |
openstackgerrit | Merged openstack/oslo-incubator: Add two additional emotions to release_notes https://review.openstack.org/162651 | 18:46 |
openstackgerrit | Merged openstack/oslo-incubator: Inject a bit more emotions to our releases https://review.openstack.org/161865 | 18:47 |
dims | eezhova: when testing with neutron, can you SIGHUP multiple times, say every few mins, and check that it still reloads properly? | 18:49 |
*** kgiusti has quit IRC | 18:56 | |
*** gordc has quit IRC | 18:58 | |
*** rushiagr is now known as rushiagr_away | 18:59 | |
*** sreshetn1 has joined #openstack-oslo | 19:00 | |
*** ihrachyshka has joined #openstack-oslo | 19:13 | |
*** exploreshaifali has quit IRC | 19:17 | |
*** yamahata has quit IRC | 19:22 | |
*** achanda has quit IRC | 19:33 | |
*** yamahata has joined #openstack-oslo | 19:41 | |
*** sreshetn1 has quit IRC | 19:42 | |
*** andreykurilin_ has joined #openstack-oslo | 19:45 | |
*** sreshetn1 has joined #openstack-oslo | 19:46 | |
*** amrith is now known as _amrith_ | 19:48 | |
*** sputnik13 has quit IRC | 19:53 | |
*** sreshetn1 has quit IRC | 20:01 | |
*** achanda has joined #openstack-oslo | 20:19 | |
*** sputnik13 has joined #openstack-oslo | 20:23 | |
*** sputnik13 has quit IRC | 20:24 | |
*** sputnik13 has joined #openstack-oslo | 20:29 | |
*** ajo has quit IRC | 20:40 | |
*** boris-42 has quit IRC | 20:42 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 20:44 | |
*** achanda has quit IRC | 20:44 | |
*** achanda has joined #openstack-oslo | 20:45 | |
*** alexpilotti has quit IRC | 21:00 | |
*** sputnik13 has quit IRC | 21:01 | |
*** alexpilotti has joined #openstack-oslo | 21:02 | |
*** sigmavirus24_awa is now known as sigmavirus24 | 21:03 | |
*** harlowja has quit IRC | 21:03 | |
*** alexpilotti has quit IRC | 21:03 | |
*** harlowja_ has joined #openstack-oslo | 21:03 | |
*** _amrith_ is now known as amrith | 21:04 | |
*** ajo has joined #openstack-oslo | 21:05 | |
*** alexpilotti has joined #openstack-oslo | 21:06 | |
*** alexpilotti has quit IRC | 21:06 | |
*** sputnik13 has joined #openstack-oslo | 21:08 | |
*** sputnik13 has quit IRC | 21:09 | |
openstackgerrit | Davanum Srinivas (dims) proposed openstack/oslo.messaging: Tiny problem with notify-server in simulator https://review.openstack.org/165222 | 21:10 |
openstackgerrit | David Medberry proposed openstack/oslo.messaging: Fix a couple typos to make it easier to read. https://review.openstack.org/165223 | 21:11 |
dims | klindgren_: what haproxy version are you on? i am trying the simulator with 1.4.24 (https://github.com/openstack/oslo.messaging/blob/master/tools/simulator.py) | 21:12 |
klindgren_ | 1.5.something - hold plz | 21:12 |
klindgren_ | 1.5.2-2 | 21:13 |
dims | ok, when you disable a server, is that just the "disable server" against haproxy sock or do you actually stop rabbitmq as well? | 21:13 |
klindgren_ | disable against haproxy sock | 21:14 |
klindgren_ | which actually just prevents new connections from going to that node | 21:15 |
klindgren_ | then restart of the node that you disabled | 21:15 |
dims | restart rabbitmq | 21:15 |
klindgren_ | correct | 21:15 |
dims | ok cool. will try a few things and let you know tomorrow. hopefully sileht will have a patch for us as well | 21:16 |
dims | thanks for your patience | 21:16 |
*** ajo has quit IRC | 21:18 | |
klindgren_ | no problem. Basically - we have been tracking this patch for 1+ months. We, like other operators, have a lot of issues with rpc and rabbit connections. Mainly with processes not being connected to rabbit anymore and not realizing it - so we have high hopes that this will solve our problems. | 21:18 |
klindgren_ | I would say most of our operational issues come down to rabbit. When weird errors start happening or things aren't working - restarting rpc workers typically solves the problem. So long story short - I will give as much time/testing to this as needed to make sure rabbit stuff is not an issue anymore. | 21:21 |
klindgren_ | i.e. be able to get away from the "you did maintenance on rabbitmq? restart all the services attached to rabbitmq" paradigm. | 21:22 |
*** stpierre has quit IRC | 21:41 | |
*** crc32 has quit IRC | 21:43 | |
*** BrianShang has quit IRC | 21:45 | |
openstackgerrit | Joshua Harlow proposed openstack/taskflow: Prototype a switch flow #2 (WIP) https://review.openstack.org/164922 | 21:46 |
*** BrianShang has joined #openstack-oslo | 21:46 | |
harlowja_ | anyone know why https://review.openstack.org/#/c/164836/ has a jenkins -1 but no errors (all green in the check list) | 21:48 |
harlowja_ | pretty weird ^ | 21:48 |
openstackgerrit | Joshua Harlow proposed openstack/oslo-incubator: Recommend users of service.py use `launch_service_class` https://review.openstack.org/164836 | 21:55 |
*** ChuckC has quit IRC | 21:57 | |
*** harlowja_ has quit IRC | 22:09 | |
*** harlowja has joined #openstack-oslo | 22:11 | |
*** sigmavirus24 is now known as sigmavirus24_awa | 22:14 | |
harlowja | sileht so fix for kazoo was to remove re-entrant ability, lol | 22:17 |
openstackgerrit | Joshua Harlow proposed openstack/oslo-incubator: Recommend users of service.py use `launch_service_class` https://review.openstack.org/164836 | 22:18 |
*** jgrimm is now known as zz_jgrimm | 22:21 | |
*** harlowja has quit IRC | 22:24 | |
*** harlowja_ has joined #openstack-oslo | 22:24 | |
*** sputnik13 has joined #openstack-oslo | 22:28 | |
*** crc32 has joined #openstack-oslo | 22:29 | |
*** jecarey has quit IRC | 22:30 | |
*** sputnik13 has quit IRC | 22:30 | |
*** sputnik13 has joined #openstack-oslo | 22:31 | |
*** amrith is now known as _amrith_ | 22:35 | |
*** e0ne has quit IRC | 22:36 | |
*** andreykurilin_ has quit IRC | 22:38 | |
openstackgerrit | Joshua Harlow proposed openstack/taskflow: Prototype a switch flow #2 (WIP) https://review.openstack.org/164922 | 22:48 |
openstackgerrit | Joshua Harlow proposed openstack/taskflow: Prototype a switch flow #2 (WIP) https://review.openstack.org/164922 | 22:50 |
*** stevemar has quit IRC | 22:52 | |
*** zzzeek has quit IRC | 22:54 | |
openstackgerrit | Joshua Harlow proposed openstack/taskflow: Prototype a switch flow #2 (WIP) https://review.openstack.org/164922 | 22:55 |
*** sreshetn1 has joined #openstack-oslo | 22:57 | |
*** mriedem is now known as mriedem_away | 22:59 | |
harlowja_ | jogo since u might know, for https://review.openstack.org/#/c/164836/ its puking with 'Requirement debtcollector>=0.3.0 does not match openstack/requirements value debtcollector~=0.3.0' but when i used 'debtcollector~=0.3.0' 2.6 blew up, so wondering if u have any idea (2.6 logs @ http://logs.openstack.org/36/164836/6/check/gate-oslo-incubator-python26/121d0f8/console.html ) | 23:00 |
*** sreshetn1 has quit IRC | 23:01 | |
openstackgerrit | Joshua Harlow proposed openstack/taskflow: Prototype a switch flow #2 (WIP) https://review.openstack.org/164922 | 23:08 |
*** dims has quit IRC | 23:09 | |
jogo | harlowja_: >= != ~= | 23:14 |
jogo | harlowja_: look at the logs for g-r | 23:14 |
jogo | dhellmann: landed a patch that made the change recently | 23:14 |
*** enikanorov has quit IRC | 23:14 | |
*** enikanorov has joined #openstack-oslo | 23:15 | |
*** YorikSar has quit IRC | 23:15 | |
harlowja_ | jogo hmmm, 2.6 does not seem to like 'debtcollector~=0.3.0' though right? from http://logs.openstack.org/36/164836/6/check/gate-oslo-incubator-python26/121d0f8/console.html | 23:17 |
harlowja_ | 'ValueError: ('Expected version spec in', 'debtcollector~=0.3.0', 'at', '~=0.3.0')' | 23:17 |
harlowja_ | which sorta seems odd, but not sure, guess 2.6 is too old to understand that | 23:18 |
harlowja_ | let me try again though, just in case | 23:18 |
*** dims has joined #openstack-oslo | 23:20 | |
jogo | harlowja_: not sure sorry | 23:20 |
harlowja_ | k | 23:20 |
harlowja_ | np | 23:20 |
openstackgerrit | Joshua Harlow proposed openstack/oslo-incubator: Recommend users of service.py use `launch_service_class` https://review.openstack.org/164836 | 23:21 |
*** ihrachyshka has quit IRC | 23:21 | |
*** jaosorior has quit IRC | 23:22 | |
harlowja_ | let's see if that works | 23:22 |
*** ChuckC has joined #openstack-oslo | 23:26 | |
*** YorikSar has joined #openstack-oslo | 23:28 | |
openstackgerrit | Joshua Harlow proposed openstack/taskflow: Give the GC a break https://review.openstack.org/165248 | 23:31 |
*** crc32 has quit IRC | 23:39 | |
*** crc32 has joined #openstack-oslo | 23:44 | |
*** david-lyle is now known as david-lyle_afk | 23:50 |
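
A minimal sketch of the `>=` versus `~=` distinction raised in the 23:14 exchange, assuming plain dotted numeric versions only (no pre-releases, epochs, or local tags). The helper names `parse`, `at_least`, and `compatible` are hypothetical and are not part of pip, setuptools, or debtcollector; real tools implement PEP 440 in full.

```python
def parse(version):
    """Turn a dotted numeric version such as '0.3.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def _pad(parts, length):
    """Right-pad a version tuple with zeros so '0.3' compares like '0.3.0'."""
    return parts + (0,) * (length - len(parts))


def at_least(candidate, spec):
    """Rough model of '>=spec': any version at or above spec matches."""
    cand, base = parse(candidate), parse(spec)
    width = max(len(cand), len(base))
    return _pad(cand, width) >= _pad(base, width)


def compatible(candidate, spec):
    """Rough model of '~=spec' (compatible release).

    Matches when candidate >= spec AND candidate shares spec's prefix
    once the last component of spec is dropped, so '~=0.3.0' behaves
    like '>=0.3.0, ==0.3.*'.
    """
    base = parse(spec)
    if len(base) < 2:
        raise ValueError("'~=' needs at least two version components")
    prefix = base[:-1]
    cand = _pad(parse(candidate), len(base))
    return at_least(candidate, spec) and cand[:len(prefix)] == prefix
```

Under this model `debtcollector>=0.3.0` would accept a hypothetical 0.4.0 release while `debtcollector~=0.3.0` would not, which is why the requirements check treats the two specifiers as different. The `ValueError: Expected version spec in` quoted at 23:17 is what a parser that predates the `~=` operator would be expected to raise; whether that is the cause in the py26 job is an assumption, not something the log confirms.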