bbbbzhao_ | cgoncalves: Maybe you aren't running the health-monitor to refresh the operating_status? ;-) | 00:07 |
bbbbzhao_ | cgoncalves: Sorry, I need to run to the office, it's late.. ;-). I will reply to you when I arrive. | 00:07 |
bbbbzhao_ | johnsom: Does that mean I need to post a new revision for patch 2? | 00:08 |
johnsom | I haven't posted any comments that require a revision to patch 2. We may add a patch to the end of the chain, but I am not committed to another patch 2 yet | 00:10 |
bbbbzhao_ | johnsom: Oh, yeah. Sorry. I took the -1 as flagging an issue I should look into. Thanks | 00:23 |
johnsom | bbbbzhao_ Yeah, no problem, I just wanted to call attention to my question there, as reordering flows is higher risk | 00:26 |
*** longkb has joined #openstack-lbaas | 00:49 | |
*** abaindur has quit IRC | 01:31 | |
*** hongbin has joined #openstack-lbaas | 01:44 | |
bzhao__ | cgoncalves: Hi, for your question, did you test on centos? get_udp_listeners just searches for the UDP-specific named config files; is_udp_listener_running checks whether the keepalived that holds the specific config file is running by searching /proc for its pid.. Maybe keepalived is not running. | 01:59 |
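(Editor's note: a minimal sketch of the check bzhao__ describes, assuming one keepalived PID file per UDP listener; the file layout and function signature here are illustrative, not Octavia's exact code.)

```python
import os

def is_udp_listener_running(listener_id, base_dir='/var/lib/octavia'):
    # Hypothetical layout: a keepalived PID file kept per UDP listener.
    pid_file = os.path.join(base_dir, listener_id, 'keepalived.pid')
    if not os.path.exists(pid_file):
        return False
    with open(pid_file) as f:
        pid = f.read().strip()
    # /proc/<pid> exists only while that process is alive; this is the
    # "searching in /proc with its pid" check described above.
    return os.path.exists(os.path.join('/proc', pid))
```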
*** ramishra has joined #openstack-lbaas | 02:00 | |
*** yamamoto has joined #openstack-lbaas | 02:01 | |
bzhao__ | johnsom: I will provide full logs for the flow-reorder concern in part 2. Today I will begin the highest-priority UDP bug in storyboard, and try my best to finish the rest of them this weekend. Thank you again for all the help, and thanks to our octavia team. ;-) | 02:04 |
bzhao__ | cgoncalves: So your amp is in a bad status; the healthmonitor will remove the amp and rebuild it. | 02:05 |
bzhao__ | cgoncalves: As it cannot get the expected listeners to update into the db. | 02:05 |
bzhao__ | cgoncalves: By "it" in the above sentence I mean the health monitor process | 02:06 |
*** ramishra has quit IRC | 02:08 | |
*** yamamoto has quit IRC | 02:12 | |
*** yamamoto has joined #openstack-lbaas | 02:18 | |
*** yamamoto has quit IRC | 02:23 | |
bzhao__ | cgoncalves: sorry, s/healthmonitor/health manager/ | 02:35 |
johnsom | Thank you. I'm volunteering at the county fair tonight so I can't work on it tonight, but will work again tomorrow | 02:39 |
*** yamamoto has joined #openstack-lbaas | 03:03 | |
bzhao__ | johnsom: Thanks. ;-). Racing against the clock. | 03:08 |
bzhao__ | johnsom: I have prepared the log; I think I'll show it here. Hope it doesn't flood other folks' screens. | 03:09 |
bbbbzhao_ | https://www.irccloud.com/pastebin/tgt6zgFO/This%20is%20operation%20steps. | 03:10 |
*** yamamoto has quit IRC | 03:11 | |
bbbbzhao_ | http://paste.openstack.org/show/727198/ This is a piece of the health manager logs. | 03:20 |
bbbbzhao_ | I collected this without reordering the flows. The agent side will raise a 500 for start listener https://www.irccloud.com/pastebin/GuMt9ECo/This%20is%20the%20error%20in%20log | 03:26 |
bbbbzhao_ | johnsom: I'll mark it here and ping you so you don't miss it. Thanks. ;-). Have a good rest. | 03:26 |
*** yamamoto has joined #openstack-lbaas | 03:37 | |
*** yamamoto has quit IRC | 03:41 | |
*** yamamoto has joined #openstack-lbaas | 03:43 | |
*** hongbin has quit IRC | 03:52 | |
*** yamamoto has quit IRC | 04:02 | |
*** ramishra has joined #openstack-lbaas | 04:03 | |
*** yamamoto has joined #openstack-lbaas | 04:03 | |
*** yamamoto has quit IRC | 04:14 | |
*** yamamoto has joined #openstack-lbaas | 04:28 | |
*** yamamoto has quit IRC | 04:38 | |
*** yamamoto has joined #openstack-lbaas | 04:43 | |
*** yamamoto has quit IRC | 04:47 | |
*** yamamoto has joined #openstack-lbaas | 04:56 | |
*** yamamoto has quit IRC | 04:58 | |
*** yamamoto has joined #openstack-lbaas | 06:03 | |
*** yamamoto_ has joined #openstack-lbaas | 06:07 | |
*** yamamoto has quit IRC | 06:09 | |
*** yamamoto_ has quit IRC | 06:10 | |
*** rcernin has quit IRC | 06:54 | |
*** annp has quit IRC | 07:02 | |
*** longkb has quit IRC | 07:03 | |
*** longkb has joined #openstack-lbaas | 07:17 | |
*** longkb has quit IRC | 07:31 | |
*** longkb has joined #openstack-lbaas | 07:35 | |
*** ktibi has joined #openstack-lbaas | 07:56 | |
cgoncalves | bzhao__, yes, centos. right, health manager is failing over amp because of expected listeners not matching | 08:19 |
cgoncalves | I will continue looking at it today | 08:19 |
*** salmankhan has joined #openstack-lbaas | 08:48 | |
openstackgerrit | ZhaoBo proposed openstack/octavia master: Followup patch for UDP support https://review.openstack.org/587690 | 08:49 |
bzhao__ | cgoncalves: Thanks. My env just got flushed. Can the process be found in "/proc/PID" on centos? If yes, we need to go inside and check /var/log/messages for the reason why it cannot be set up. If not, it's another difference between the OSes. | 08:51 |
cgoncalves | bzhao__, I'm restacking with latest patches. yeah, I wanted to check that yesterday but was 2 AM for me :) | 08:55 |
cgoncalves | I'll check and keep you posted | 08:55 |
bzhao__ | cgoncalves: You work so hard. ;-) . Take a good rest. | 08:58 |
cgoncalves | thanks :) | 08:58 |
*** obre is now known as obre_ | 09:58 | |
*** obre_ is now known as obre | 09:58 | |
*** obre has quit IRC | 10:04 | |
openstackgerrit | ZhaoBo proposed openstack/octavia master: [UDP] Fix failed member always in DRAIN status https://review.openstack.org/588511 | 10:53 |
*** amuller has joined #openstack-lbaas | 11:53 | |
*** longkb has quit IRC | 12:38 | |
*** ktibi has quit IRC | 13:04 | |
*** ramishra has quit IRC | 13:22 | |
*** ktibi has joined #openstack-lbaas | 14:06 | |
cgoncalves | I no longer get unexpected # of listeners. latest PS should have fixed it | 14:12 |
cgoncalves | although pool operating_status stays OFFLINE | 14:13 |
cgoncalves | member provisioning status is ACTIVE | 14:13 |
cgoncalves | ah... netcat isn't installed on centos amp :/ | 14:16 |
*** rpittau has quit IRC | 14:18 | |
*** hongbin has joined #openstack-lbaas | 14:33 | |
*** erjacobs has joined #openstack-lbaas | 14:41 | |
cgoncalves | interesting. amphora-haproxy netns isn't created after vm reboot, and the amp isn't failed over since eth0 (lb-mgmt) is up and reports health msgs | 14:41 |
cgoncalves | https://storyboard.openstack.org/#!/story/2003306 | 14:54 |
openstackgerrit | German Eichberger proposed openstack/octavia master: Allows failover if port is not deallocated by nova https://review.openstack.org/585864 | 15:47 |
*** erjacobs has quit IRC | 15:57 | |
-openstackstatus- NOTICE: The infra team is renaming projects in Gerrit. There will be a short ~10 minute Gerrit downtime in a few minutes as a result. | 16:02 | |
johnsom | cgoncalves Hmm, so I think there is a systemd service for setting up the netns, I wonder if that is failing | 16:21 |
xgerman_ | would that run for UDP? | 16:27 |
johnsom | Well, it doesn't *depend* on any other systemd service, but the question is where in the code it is getting written out. | 16:32 |
johnsom | Yeah, it is missing on the pure UDP path | 16:34 |
*** openstackgerrit has quit IRC | 16:49 | |
cgoncalves | johnsom, can you reproduce it on ubuntu? | 16:52 |
johnsom | I haven't tried, but I can clearly see in the UDP path where this is missing. | 16:52 |
johnsom | cgoncalves https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/agent/api_server/listener.py#L190 | 16:53 |
johnsom | That section is not in the pure UDP path | 16:53 |
johnsom | And should be | 16:53 |
johnsom | Though a refactor would be nice too, but.... | 16:53 |
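(Editor's note: roughly what the linked section does on the HAProxy path, and what the pure UDP path skipped: writing and enabling a systemd unit so the amphora-haproxy netns is recreated at boot. The unit body, name, and paths below are assumptions for illustration, not Octavia's exact code.)

```python
import os
import subprocess

# Hypothetical unit; the real one is rendered from a template by the
# amphora-agent. It only needs to run before keepalived/haproxy so the
# namespace exists again after a reboot.
NETNS_UNIT = """\
[Unit]
Description=Create the amphora-haproxy network namespace
Before=keepalived.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/ip netns add amphora-haproxy

[Install]
WantedBy=multi-user.target
"""

def install_netns_service(unit_path='/usr/lib/systemd/system/amphora-netns.service'):
    if os.path.exists(unit_path):
        return
    with open(unit_path, 'w') as f:
        f.write(NETNS_UNIT)
    # If this step is skipped (the UDP-only gap discussed above), a VM
    # reboot silently loses the netns, matching the symptom cgoncalves saw.
    subprocess.check_call(['systemctl', 'enable', 'amphora-netns.service'])
```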
*** dmellado has quit IRC | 17:22 | |
johnsom | cgoncalves Can you confirm the LB with the amp you rebooted only had a UDP listener? Is there something more you want me to test here on xenial? | 17:27 |
cgoncalves | johnsom, only one UDP listener | 17:28 |
cgoncalves | johnsom, netcat being installed is a new requirement, correct? | 17:28 |
johnsom | cgoncalves Yeah, ok, so it's that missing code. I will fix in my patch today. | 17:28 |
-openstackstatus- NOTICE: Project renames and review.openstack.org downtime are complete without any major issue. | 17:28 | |
johnsom | I am stacking last night's code, going to investigate that flow re-order, then try to finish my review of #2 | 17:29 |
cgoncalves | thanks \o/ | 17:29 |
*** openstackgerrit has joined #openstack-lbaas | 17:39 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: WIP: Gate on CentOS 7 and check on Ubuntu Bionic https://review.openstack.org/587414 | 17:39 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: WIP: Gate on CentOS 7 and check on Ubuntu Bionic https://review.openstack.org/587414 | 17:44 |
cgoncalves | johnsom, we'll need to revert your Ib8677d2b85e352b19abf5fd0b79c1b8653819301 | 17:45 |
cgoncalves | "Job octavia-v2-dsvm-scenario-ubuntu.bionic in openstack/octavia-tempest-plugin is not permitted to shadow job octavia-v2-dsvm-scenario-ubuntu.bionic in openstack/octavia" | 17:45 |
johnsom | Say what? | 17:45 |
cgoncalves | or I can fix that in https://review.openstack.org/#/c/587442/ | 17:46 |
*** salmankhan has quit IRC | 17:46 | |
johnsom | cgoncalves Where are you seeing that? Ib8677d2b85e352b19abf5fd0b79c1b8653819301 is correct, however, there should be no definition for that in octavia/octavia | 17:48 |
KeithMnemonic | johnsom can you point me to the location of the code that populates octavia.conf or amphora-agent.conf in the amphora images? i.e. when changing a value on the octavia-worker node, how does it get sent to the amphora when it is freshly booted? | 17:49 |
johnsom | amphora-agent.conf is created at amp boot time and loaded via config drive. It does not get updated after boot (If I remember right). We have plans to enable that, but not implemented yet | 17:50 |
johnsom | KeithMnemonic Hi Keith BTW. | 17:51 |
johnsom | The template is here: https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/agent/templates/amphora_agent_conf.template | 17:51 |
johnsom | It gets rendered here: https://github.com/openstack/octavia/blob/master/octavia/controller/worker/tasks/compute_tasks.py#L77 | 17:52 |
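(Editor's note: conceptually the render step is plain Jinja2 fed with per-amphora values, and the result rides into the VM on the config drive; a hedged sketch with invented field names.)

```python
import jinja2

def render_agent_config(template_path, **amp_vars):
    # Fill the amphora_agent_conf.template linked above with this
    # amphora's values (IDs, controller endpoints, etc.).
    with open(template_path) as f:
        return jinja2.Template(f.read()).render(**amp_vars)

rendered = render_agent_config(
    'amphora_agent_conf.template',
    amphora_id='6bf09249-2ba9-4ce3-9572-83e61dcf5e21',  # illustrative
    controller_list=['192.0.2.10:5555'])                # illustrative
# The worker hands the rendered text to nova as a config-drive file,
# which is why it is applied only at boot time and later controller-side
# edits never reach an already-running amphora.
```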
johnsom | cgoncalves Ah, this needs to go away... https://github.com/openstack/octavia/blob/master/zuul.d/jobs.yaml#L54 | 17:53 |
johnsom | Sigh, too many things going on at once.... | 17:53 |
johnsom | cgoncalves yes, nuke it here would be great: https://review.openstack.org/#/c/587442/ | 17:53 |
cgoncalves | right. that's what I'm gonna do once I get to fix octavia-v2-dsvm-scenario-centos.7 | 17:54 |
johnsom | Thanks | 17:55 |
KeithMnemonic | johnsom Hello back, sorry to be so abrupt ;-) thanks, config drive is what I was looking for | 17:57 |
KeithMnemonic | is that the same for octavia.conf? | 17:57 |
johnsom | octavia.conf is only on the controllers | 17:57 |
johnsom | And is configured/installed by the operator or packager | 17:58 |
KeithMnemonic | ok so maybe this guy read your note wrong http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/%23openstack-lbaas.2016-05-26.log.html | 17:58 |
KeithMnemonic | johnsom: Just a second, I will send you the settings I think you need to increase | 17:58 |
KeithMnemonic | johnsom: kevo These two in octavia.conf | 17:58 |
KeithMnemonic | johnsom: # rest_request_conn_timeout = 10 | 17:58 |
KeithMnemonic | johnsom: # rest_request_read_timeout = 60 | 17:58 |
KeithMnemonic | kevo: Johnson, I'll try that out and I'll let you know. Thanks | 17:58 |
KeithMnemonic | kevo: thanks johnsom your suggestion worked. | 17:58 |
johnsom | # ls /etc/octavia/ | 17:58 |
johnsom | amphora-agent.conf certs | 17:58 |
johnsom | root@amphora-6bf09249-2ba9-4ce3-9572-83e61dcf5e21:/usr/lib/systemd/system# | 17:58 |
KeithMnemonic | i see an octavia.conf but it is all commented out | 17:59 |
KeithMnemonic | do you recall that conversation | 17:59 |
johnsom | Right, those settings are only valid in the octavia.conf (they are controller settings). When they are commented out, they are using the default values, which is the number in the comment. | 18:00 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Gate on octavia-dsvm-base based jobs and housekeeping https://review.openstack.org/587442 | 18:00 |
KeithMnemonic | ok the guy who pinged me thought they should end up on the amphorae | 18:00 |
johnsom | So, this: # rest_request_conn_timeout = 10 | 18:00 |
johnsom | Means rest_request_conn_timeout is using the coded default, which is 10. | 18:01 |
johnsom | No, that is the timeout for the controller talking to the amphora-agent. It doesn't get set or used in the amp | 18:01 |
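(Editor's note: the mechanics behind "commented out means default": the defaults live in code via oslo.config, and the sample octavia.conf merely documents them. A minimal sketch using the two option names quoted above; the [haproxy_amphora] group name is an assumption.)

```python
from oslo_config import cfg

# With "# rest_request_conn_timeout = 10" left commented in octavia.conf,
# there is no override, so these coded defaults are what actually runs.
opts = [
    cfg.IntOpt('rest_request_conn_timeout', default=10,
               help='Connect timeout for controller -> amphora-agent calls'),
    cfg.IntOpt('rest_request_read_timeout', default=60,
               help='Read timeout for controller -> amphora-agent calls'),
]
cfg.CONF.register_opts(opts, group='haproxy_amphora')
print(cfg.CONF.haproxy_amphora.rest_request_conn_timeout)  # -> 10
```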
KeithMnemonic | yes he misunderstood | 18:01 |
johnsom | Two years ago??? ha | 18:02 |
KeithMnemonic | his issue is the vip is not plugging fast enough and he saw that old thread and thought it was the same issue | 18:02 |
johnsom | Lucky if I remember what people asked me last week | 18:02 |
johnsom | Hmm, so nova isn't booting the VM fast enough? | 18:02 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Gate on octavia-dsvm-base based jobs and housekeeping https://review.openstack.org/587442 | 18:02 |
johnsom | KeithMnemonic That setting is connection_max_retries and connection_retry_interval | 18:03 |
johnsom | But the default for that is like 25 minutes. If he can't boot an amp in that, he might as well go home..... | 18:04 |
johnsom | Could be that his lb-mgmt-net isn't working | 18:04 |
johnsom | Normal time for that action is less than 30 seconds | 18:04 |
johnsom | We have it at 25 minutes for virtualbox users and super slow gate test hosts | 18:05 |
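(Editor's note: the arithmetic behind that "25 minutes", using the defaults johnsom quotes later in the log: 300 retries, 5 seconds apart.)

```python
connection_max_retries = 300    # default number of attempts
connection_retry_interval = 5   # default seconds between attempts
total = connection_max_retries * connection_retry_interval
print(total, 'seconds =', total / 60, 'minutes')  # 1500 seconds = 25.0 minutes
```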
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Gate on octavia-dsvm-base based jobs and housekeeping https://review.openstack.org/587442 | 18:05 |
* cgoncalves needs more coffee... | 18:05 | |
johnsom | Production deploys usually drop that down, but like I said, it's normal to just boot in less than 30 seconds | 18:05 |
johnsom | KeithMnemonic I would check that the lb-mgmt-net is even working. Most likely that is the problem | 18:06 |
KeithMnemonic | yeah for sure. i can check the plumbing | 18:06 |
KeithMnemonic | the log showed Error code 400: | 18:07 |
KeithMnemonic | JSON Response: | 18:07 |
KeithMnemonic | { | 18:07 |
KeithMnemonic | 'message': 'Invalid VIP', | 18:07 |
KeithMnemonic | } | 18:07 |
KeithMnemonic | I found a match for this error: | 18:07 |
KeithMnemonic | and that led him to that old thread listed above | 18:07 |
KeithMnemonic | 2018-07-03 11:44:44.977 1711 INFO werkzeug [-] 10.207.206.13 - - [03/Jul/2018 11:44:44] "POST /0.5/plug/vip/10.207.221.55 HTTP/1.1" 404 – | 18:09 |
colin- | when you guys create an LB and the provisioning state goes to ERROR, what's the first place you look for additional info? i'm not super familiar with the API at the moment and my instinct is to just check the worker's process output but i'm wondering if there's better info than that available | 18:09 |
johnsom | KeithMnemonic 400 means user error. Like the VIP address doesn't match the subnet/network specified, or something like that. | 18:16 |
cgoncalves | colin-, that is how I do it, too :/ | 18:17 |
johnsom | colin- Yes, provisioning_status ERROR means the controller ran into a problem and all of the retries/workarounds have failed and the controller needs to stop and ask for operator intervention. Like if nova goes down in the middle of booting an amphora, or neutron fails to create a port and the retries time out. The first stop is going to be the controller logs | 18:17 |
cgoncalves | today I added "Debugging - End-user friendly ERROR messages" as proposed topic for the PTG | 18:18 |
johnsom | In that state the end user has the option of escalating to the operator or deleting the object in ERROR and trying again | 18:18 |
*** openstackgerrit has quit IRC | 18:19 | |
johnsom | cgoncalves bring the popcorn. This is a fun topic. Many operators want to hide the true reason for the failures from the users... SLA contracts and such.... | 18:19 |
colin- | understood, thanks | 18:19 |
johnsom | They don't like octavia objects to say "Nova compute has been down for 8 hours, unable to create load balancer" Which is exactly what it would say for a recent issue I saw... lol | 18:20 |
cgoncalves | johnsom, I know. i was thinking something generic yet a bit more useful, like pointing fingers at the compute or network service | 18:20 |
johnsom | cgoncalves It is a good topic to discuss though, so please feel free to add it to the etherpad | 18:32 |
johnsom | https://etherpad.openstack.org/p/octavia-stein-ptg | 18:33 |
cgoncalves | I already did :) | 18:33 |
johnsom | I see that. Awesome | 18:33 |
*** openstackgerrit has joined #openstack-lbaas | 18:34 | |
openstackgerrit | German Eichberger proposed openstack/octavia master: Allows failover if port is not deallocated by nova https://review.openstack.org/585864 | 18:34 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: WIP: Gate on CentOS 7 and check on Ubuntu Bionic https://review.openstack.org/587414 | 18:36 |
*** ktibi has quit IRC | 19:04 | |
*** abaindur has joined #openstack-lbaas | 19:16 | |
*** salmankhan has joined #openstack-lbaas | 19:19 | |
*** salmankhan has quit IRC | 19:24 | |
*** abaindur has quit IRC | 19:28 | |
*** amuller has quit IRC | 19:52 | |
*** dmellado has joined #openstack-lbaas | 19:56 | |
rm_work | jiteka: ask questions here :P | 20:09 |
jiteka | I'm too shy | 20:15 |
jiteka | :D | 20:15 |
jiteka | actually I already bothered johnsom a couple of times this week | 20:15 |
jiteka | it was your turn | 20:15 |
xgerman_ | lol | 20:16 |
rm_work | lol | 20:17 |
*** harlowja has joined #openstack-lbaas | 20:21 | |
johnsom | jiteka No need to be shy, we are all friendly here | 20:27 |
jiteka | I know johnsom was joking :) | 20:29 |
johnsom | Funny how we all have the same advice... grin | 20:37 |
johnsom | Nice, our upgrade tags have shown up: https://pypi.org/project/octavia/ | 20:47 |
cgoncalves | I don't get why so often our centos job fails to build the amp image with "Cannot retrieve metalink for repository: epel/x86_64. Please verify its path and try again" | 20:48 |
cgoncalves | http://logs.openstack.org/14/587414/5/check/octavia-v2-dsvm-scenario-centos.7/18de25c/job-output.txt.gz#_2018-08-03_20_28_39_672377 | 20:48 |
cgoncalves | this is a successful run: http://logs.openstack.org/55/587255/1/check/octavia-v1-dsvm-scenario-kvm-centos.7/f145430/logs/devstacklog.txt.gz#_2018-07-31_00_16_20_969 | 20:50 |
johnsom | looking | 20:50 |
cgoncalves | even though there was that metadata 404 | 20:50 |
johnsom | Well, even on that one "updateinfo.xml.bz2: [Errno 14] HTTP Error 404 - Not Found" doesn't seem good | 20:51 |
johnsom | So, I could be lazy and say the epel mirrors are trash, but I'm not, so give me a few minutes to look at some things | 20:52 |
cgoncalves | johnsom, I asked on #openstack-infra but no luck. perhaps you could work your magic :D | 20:52 |
johnsom | lol | 20:53 |
johnsom | Maybe they don't get into the RedHat parties either | 20:53 |
cgoncalves | lol | 20:53 |
johnsom | Hmm, looks like that is occurring in the base elements from DIB too and not one of ours? I can't imagine you got an include of iscsi by me | 20:55 |
cgoncalves | diskimage-builder/diskimage_builder/elements/base/install.d/00-baseline-environment | 20:56 |
cgoncalves | install-packages -m base iscsi_package | 20:56 |
xgerman_ | with the next PTG around the corner -- we had better be on that bus ;-) | 21:01 |
johnsom | cgoncalves My initial guess is this element step is not working: 01-set-centos-mirror | 21:02 |
johnsom | But still looking | 21:02 |
cgoncalves | johnsom, I doubt that because that's for centos repos. epel is managed separately | 21:03 |
johnsom | That it is running out to the interwebs and not using the OpenStack infra mirrors. But it could be that epel isn't mirrored as well | 21:03 |
johnsom | Well, that *might* be the issue.... See what I am saying | 21:03 |
johnsom | cgoncalves Yep, ok, got it | 21:04 |
johnsom | http://logs.openstack.org/14/587414/5/check/octavia-v2-dsvm-scenario-centos.7/18de25c/job-output.txt.gz#_2018-08-03_20_28_38_556548 | 21:04 |
johnsom | This is the execution log of the DIB phase. The # prefixes are the ordering | 21:05 |
johnsom | Oh, wait, nevermind, looking in the wrong place. | 21:05 |
johnsom | 05-rpm-epel-release This one might not be working.... | 21:05 |
johnsom | lol | 21:05 |
johnsom | https://github.com/openstack/diskimage-builder/tree/master/diskimage_builder/elements/epel | 21:06 |
cgoncalves | from what I've seen, it is working. the epel-release package gets installed | 21:06 |
cgoncalves | mirror is not overwritten, which is expected | 21:07 |
cgoncalves | it's really intermittent. see http://logs.openstack.org/06/586906/2/check/octavia-v1-dsvm-scenario-kvm-centos.7/b60cf4b/logs/devstacklog.txt.gz#_2018-08-01_12_12_02_645. no errors or warnings whatsoever | 21:08 |
johnsom | Yeah, it looks like the "cache data" for fastestmirror is getting trashed? | 21:11 |
johnsom | Hmm, so your failed job ran at OVH, the success ran at inap. Is there a correlation among the failed jobs? | 21:13 |
johnsom | You can look in zuul-info/zuul-info.controller.txt for the provider | 21:13 |
johnsom | It could be one provider has a problem with the mirror | 21:14 |
cgoncalves | I'd say very likely | 21:14 |
johnsom | This still makes me wonder: | 21:15 |
johnsom | http://logs.openstack.org/06/586906/2/check/octavia-v1-dsvm-scenario-kvm-centos.7/b60cf4b/logs/devstacklog.txt.gz#_2018-08-01_12_12_02_898 | 21:15 |
johnsom | So the others seem to be mirrors at the provider, this one (success) went out to the internets | 21:15 |
johnsom | osuosl.org (which happens to be here in town) | 21:15 |
johnsom | Go beavs | 21:15 |
johnsom | mnaser Quick question, are you aware of any recent issues with the epel mirror at OVH? | 21:17 |
mnaser | johnsom: not that i know of | 21:17 |
johnsom | Ok, no smoking gun here yet, but thought I would give a quick ping to check | 21:17 |
johnsom | http://logs.openstack.org/14/587414/5/check/octavia-v2-dsvm-scenario-centos.7/18de25c/job-output.txt.gz#_2018-08-03_20_28_39_672377 for those playing along | 21:18 |
cgoncalves | http://logs.openstack.org/56/584856/3/check/octavia-v1-dsvm-scenario-kvm-centos.7/1766905/zuul-info/inventory.yaml okay with rax | 21:21 |
johnsom | Yeah, inap and rax seem to pass. | 21:21 |
johnsom | cgoncalves Is this centos 7? | 21:21 |
cgoncalves | johnsom, yes. controller and amp | 21:22 |
cgoncalves | http://logs.openstack.org/14/587414/5/check/octavia-v2-dsvm-scenario-centos.7/b7ce6d4/zuul-info/inventory.yaml this is inap and failed | 21:22 |
johnsom | Ok, so likely not a local mirror issue | 21:22 |
johnsom | cgoncalves these are running on different nodepool images. Your failed run is on a centos image while the pass is on ubuntu | 21:24 |
cgoncalves | hmm ok, looking now at nodeset level... | 21:25 |
cgoncalves | why can't you just pull your strings with infra people. would be much easier xD | 21:25 |
johnsom | Must save silver bullets for things specific and clear..... | 21:26 |
johnsom | So that path I was going down is looking at the ca-certificates files as that plays a role here. That is when I noticed the base image is different | 21:27 |
johnsom | Yeah, all four samples I have line up, so it looks like you must build centos images on ubuntu hosts... lol | 21:28 |
xgerman_ | yep, that’s how it was designed | 21:28 |
xgerman_ | cgoncalves: did you think you could build centos on centos? | 21:29 |
johnsom | I would look at clock skew on the centos hosts (i.e. is it getting a good time so the ssl can negotiate?), the packages like yum ca-certificates, etc. | 21:29 |
johnsom | xgerman_ I love the wording of this MicroFocus proxy vote letter (from hp/hpe stock you might have had): "To approve the disposal by the Company of the SUSE business segment ..." | 21:32 |
johnsom | cgoncalves I need to get back to UDP stuff. Noodle on that a bit. If you are still stuck ping your colleague ianw in #openstack-dib | 21:36 |
cgoncalves | johnsom, ok. thank you for your time! | 21:36 |
johnsom | If you are still stuck next week ping me again on it | 21:37 |
xgerman_ | Yeah, only kept the HP printer Corp. - the rest seemed too risky :-) | 21:38 |
*** pcaruana has quit IRC | 21:38 | |
johnsom | Yeah, I must have like one share floating around somewhere | 21:38 |
jiteka | Could someone confirm which parameter I need to change to increase the timeout on amphora build | 21:39 |
johnsom | Wow, really? 25 minutes isn't enough? I think that is the default | 21:39 |
jiteka | I don't have enough time to troubleshoot why the controller can't reach my VM | 21:39 |
jiteka | octavia-worker[18322]: WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='10.79.80.30', port=9443): Max retries exceeded with url: /0.5/plug/vip/10.63.69.0 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fc56b4edbd0>: Failed to establish a new connection: [Errno 111] Connection refused',)) | 21:39 |
jiteka | the VM go away in something like 1 or 2 min | 21:40 |
johnsom | jiteka connection_max_retries (default 300) and connection_retry_interval (default 5) are the two timeouts there while we wait for nova to boot the instance. | 21:41 |
johnsom | Though this also could mean your lb-mgmt-net is not working. Booting up a cirros there and checking that it is reachable/got an IP can help there | 21:41 |
johnsom | Hmmm refused, could be a firewall or some kind of customized image that is broken. | 21:42 |
jiteka | what's the difference between : | 21:43 |
jiteka | - ConnectTimeoutError | 21:43 |
jiteka | - NewConnectionError | 21:43 |
johnsom | Normally we handle the security groups, so that should not be an issue. | 21:43 |
jiteka | I noticed that it throws a few timeouts before throwing the error | 21:43 |
johnsom | Hmm, neither of those are from our code. They are things we are catching. Let me search. Are they both in that same warning message? | 21:48 |
johnsom | jiteka They are both URLLIB3 exceptions: http://urllib3.readthedocs.io/en/latest/reference/index.html#module-urllib3.exceptions | 21:49 |
johnsom | NewConnectionError appears to mean something is actively rejecting the connection, where ConnectTimeoutError is no response at all | 21:50 |
johnsom | You would see ComputeWaitTimeoutException if octavia actually gives up trying to connect | 21:51 |
KeithMnemonic | johnsom thanks again have a great weekend (need to make up for my earlier abruptness) | 21:53 |
johnsom | Or a pure "TimeOutException" with an error log entry "Connection retries (currently set to %(max_retries)s) exhausted. The amphora is unavailable." | 21:53 |
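(Editor's note: a toy reproduction of that distinction with plain urllib3, retries disabled so the raw exceptions surface. In urllib3, NewConnectionError subclasses ConnectTimeoutError, so it must be caught first; the host/port are jiteka's from the log and purely illustrative.)

```python
import urllib3
from urllib3.exceptions import ConnectTimeoutError, NewConnectionError

http = urllib3.PoolManager(cert_reqs='CERT_NONE')  # amp agent certs are self-signed
try:
    http.request('GET', 'https://10.79.80.30:9443/',
                 timeout=urllib3.Timeout(connect=5), retries=False)
except NewConnectionError:
    # Something actively refused the connection (agent not listening yet,
    # firewall RST, ...): the "[Errno 111] Connection refused" case above.
    print('connection refused')
except ConnectTimeoutError:
    # Nothing answered at all within the connect timeout.
    print('connect timed out')
```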
johnsom | KeithMnemonic o/ | 21:53 |
*** KeithMnemonic has quit IRC | 21:53 | |
rm_work | hmmm i feel like i JUST saw someone else posting about that centos build error somewhere else... | 21:57 |
johnsom | rm_work BTW, I hacked a stack with your bbq client patch (fed=False) and it doesn't appear to solve the problem. | 21:59 |
rm_work | hmmmmm | 21:59 |
johnsom | At least when using Octavia | 22:00 |
johnsom | I didn't try the CLI test | 22:00 |
rm_work | it still tries to hit the href passed in? | 22:00 |
johnsom | It hits the public URL in keystone. I don't know if it's getting that from keystone, the hardcoded one in the config file, or the href | 22:00 |
rm_work | hmmmm | 22:01 |
rm_work | yeah can you just put a print statement in to show what URL it's passing in | 22:01 |
johnsom | Sadly I don't have direct access to that stack, so my debugging is limited there. | 22:01 |
rm_work | in the secrets class (right after I do the if-statement to generate the new URL) | 22:01 |
rm_work | ah | 22:01 |
rm_work | not devstack? | 22:01 |
johnsom | No, it's an actual cloud that someone else controls | 22:02 |
johnsom | Thus why the internal and public URLs are different and we found this issue. | 22:02 |
rm_work | yeah but | 22:03 |
johnsom | Maybe next week I can setup that CLI test on devstack again. | 22:03 |
rm_work | hmmm | 22:03 |
rm_work | do you know it was done correctly then? | 22:03 |
johnsom | I'm just arms deep in a amphora-agent refactor for UDP | 22:03 |
rm_work | like, the patch was actually installed in the right place | 22:03 |
johnsom | Yeah, I watched and instructed as they installed. 85% confident | 22:03 |
rm_work | hmmmmmmmm | 22:03 |
rm_work | I can add in some debugging if you can have them try it again | 22:04 |
rm_work | then could at least verify it is running that code | 22:04 |
rm_work | there are some changes i wanted to make anyway | 22:04 |
johnsom | Yeah, maybe next week. I can also just setup a devstack and change the internal URL to broken and run my test steps in the story | 22:04 |
rm_work | yes | 22:04 |
johnsom | I just don't have the VMs for that right now | 22:04 |
rm_work | i would like to see that :P | 22:04 |
rm_work | ah yeah | 22:04 |
rm_work | cloud + bit.do/devstack? :P | 22:05 |
rm_work | i always just used RAX VMs | 22:05 |
johnsom | I don't, for the reasons you are aware of.... | 22:05 |
johnsom | lol | 22:05 |
rm_work | i mean | 22:05 |
rm_work | it's better than NO VMs | 22:05 |
rm_work | usually | 22:05 |
rm_work | but, let me see if i can spin a stack | 22:06 |
rm_work | i haven't tried in a while | 22:06 |
rm_work | do you stack on Bionic now? | 22:07 |
rm_work | i should prolly just make a clean thing | 22:07 |
johnsom | No, I haven't switched yet | 22:07 |
rm_work | hmmm | 22:08 |
johnsom | I was nervous about the major networking changes, but it seems the compatibility stuff works well enough that our amps run. | 22:08 |
rm_work | but it should be safe, right? | 22:08 |
rm_work | hmm k | 22:08 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: Add usage admin resource https://review.openstack.org/557548 | 22:11 |
*** hongbin has quit IRC | 22:19 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: WIP: Gate on CentOS 7 and check on Ubuntu Bionic https://review.openstack.org/587414 | 22:35 |
cgoncalves | got it! https://review.openstack.org/#/c/588676/ | 23:13 |
johnsom | Nice, so it was a ca-certificates issue. Probably just not installed soon enough. | 23:19 |
cgoncalves | not installed at all | 23:22 |
cgoncalves | epel is installed on the host but is http:// | 23:22 |
johnsom | Yeah, it is eventually, it was in rpm list | 23:22 |
cgoncalves | all repos I've come across are http:// in fact | 23:22 |
johnsom | http://logs.openstack.org/14/587414/5/check/octavia-v2-dsvm-scenario-centos.7/18de25c/controller/logs/rpm-qa.txt.gz | 23:24 |
johnsom | I checked that, but it must be coming in too late | 23:24 |
cgoncalves | you, sir, always amuse me with such useful logs | 23:24 |
cgoncalves | so, the mystery of octavia-tempest-plugin is still unsolved. the perms seem to be right, though | 23:25 |
* johnsom spends way too much time looking at logs for people..... | 23:25 | |
cgoncalves | guilty xD | 23:25 |
johnsom | Hmmm, have a link for the plugin issue? | 23:26 |
johnsom | Where you are dumping the perms? | 23:26 |
cgoncalves | post-run. 2 secs | 23:26 |
johnsom | My little patch for the netns service is getting bigger. I found that some of the unit tests are actually functional tests and need to be moved.... | 23:26 |
cgoncalves | johnsom, http://logs.openstack.org/14/587414/5/check/octavia-v2-dsvm-scenario-centos.7/18de25c/job-output.txt.gz#_2018-08-03_20_28_52_632600 | 23:31 |
cgoncalves | run with ca-certificate fixed: http://logs.openstack.org/14/587414/6/check/octavia-v2-dsvm-scenario-centos.7/cf0e780/job-output.txt.gz#_2018-08-03_23_24_59_017149 | 23:34 |
johnsom | So right at the top of that, it looks like only zuul can access that path. Where is that original failure? | 23:34 |
johnsom | Ah, ok | 23:34 |
rm_work | woo my devstack worked | 23:46 |
rm_work | ok time to test this patch thing | 23:47 |
rm_work | so you recommended ... setting the config for "internal" in octavia.conf for barbican | 23:47 |
rm_work | and then setting internal to something invalid | 23:47 |
johnsom | Yeah | 23:47 |
rm_work | and seeing if it still succeeds? | 23:47 |
rm_work | ... prolly i'll just do tons of debug logging | 23:47 |
johnsom | certificates section endpoint_type | 23:47 |
johnsom | Yeah | 23:48 |
johnsom | Or just use my CLI test in the story | 23:48 |
rm_work | oh, right | 23:48 |
rm_work | well | 23:48 |
rm_work | CLI doesn't have a flag... | 23:48 |
*** harlowja has quit IRC | 23:48 | |
rm_work | i'd have to default it to False | 23:48 |
rm_work | which i can do, so :) | 23:49 |
johnsom | cgoncalves Try overriding the path to the tempest plugin to be /opt/stack/octavia-tempest-plugin | 23:50 |
johnsom | I set it to /home/zuul/.... here: | 23:50 |
johnsom | vars: | 23:51 |
johnsom | devstack_localrc: | 23:51 |
johnsom | TEMPEST_PLUGINS: "'{{ ansible_user_dir }}/src/git.openstack.org/openstack/octavia-tempest-plugin'" | 23:51 |
rm_work | oh ummm | 23:51 |
rm_work | johnsom: any chance they're still using the old-style Containers? | 23:51 |
rm_work | instead of one PKCS12 secret? | 23:51 |
johnsom | No, it is pkcs12 | 23:51 |
rm_work | because... i didn't do it for Containers yet in that patch... | 23:51 |
rm_work | hmm ok | 23:51 |
rm_work | really thought i had it there for a sec :P | 23:51 |
johnsom | cgoncalves Yeah, my money is on the /home/zuul directory permissions being different on that centos nodepool instance. /opt/stack/octavia-tempest-plugin should fix you right up. Probably should just do that in the parent job for all of them | 23:53 |
cgoncalves | 2018-08-03 23:25:28.290794 | controller | /home/zuul: | 23:56 |
cgoncalves | 2018-08-03 23:25:28.290953 | controller | total 52 | 23:56 |
cgoncalves | 2018-08-03 23:25:28.291080 | controller | drwx------. 7 zuul zuul 4096 Aug 3 22:43 . | 23:56 |
johnsom | Yeah, I think the user at that point has switched to "stack" via devstack | 23:57 |
cgoncalves | isn't the ansible user zuul? if so it should have no probs in reading /home/zuul | 23:57 |
rm_work | hmmmm johnsom this seems to be working in my client, one sec | 23:57 |
cgoncalves | ah, right | 23:57 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: WIP: Gate on CentOS 7 and check on Ubuntu Bionic https://review.openstack.org/587414 | 23:59 |
cgoncalves | thanks johnsom | 23:59 |
cgoncalves | off I go | 23:59 |
johnsom | o/ | 23:59 |