openstackgerrit | Merged openstack/octavia stable/train: Update the lb_id on an amp earlier if we know it https://review.opendev.org/706615 | 00:21 |
---|---|---|
*** yamamoto has joined #openstack-lbaas | 00:36 | |
*** yamamoto has quit IRC | 00:41 | |
*** jamesden_ has joined #openstack-lbaas | 00:43 | |
*** jamesdenton has quit IRC | 00:43 | |
*** armax has quit IRC | 00:50 | |
*** armax has joined #openstack-lbaas | 00:58 | |
*** yamamoto has joined #openstack-lbaas | 01:12 | |
*** yamamoto has quit IRC | 01:38 | |
*** vishalmanchanda has joined #openstack-lbaas | 02:29 | |
*** yamamoto has joined #openstack-lbaas | 02:42 | |
*** yamamoto has quit IRC | 02:43 | |
*** yamamoto has joined #openstack-lbaas | 02:43 | |
*** hongbin has joined #openstack-lbaas | 02:59 | |
*** psachin has joined #openstack-lbaas | 03:29 | |
*** hongbin has quit IRC | 03:33 | |
*** armax has quit IRC | 04:05 | |
*** spatel has joined #openstack-lbaas | 04:17 | |
*** spatel has quit IRC | 04:21 | |
*** aannuusshhkkaa has quit IRC | 05:37 | |
*** maciejjozefczyk has joined #openstack-lbaas | 05:50 | |
*** maciejjozefczyk has quit IRC | 05:51 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: DNM: ARM64 support https://review.opendev.org/738096 | 06:22 |
*** rpittau|afk is now known as rpittau | 06:29 | |
*** riuzen has joined #openstack-lbaas | 06:59 | |
*** riuzen has quit IRC | 07:01 | |
*** maciejjozefczyk has joined #openstack-lbaas | 07:25 | |
*** stingrayza has joined #openstack-lbaas | 07:26 | |
*** stingray- has joined #openstack-lbaas | 07:28 | |
*** also_stingrayza has quit IRC | 07:29 | |
*** stingrayza has quit IRC | 07:31 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Preupgrade check for amphorav2 provider https://review.opendev.org/735556 | 07:41 |
*** ataraday_ has joined #openstack-lbaas | 07:54 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: DNM: ARM64 support https://review.opendev.org/738096 | 08:11 |
*** stingray- is now known as stingrayza | 08:36 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: DNM: CentOS 8 controller and amphora job https://review.opendev.org/698450 | 08:50 |
*** born2bake has joined #openstack-lbaas | 09:01 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: DNM: CentOS 8 controller and amphora job https://review.opendev.org/698450 | 09:05 |
*** ramishra has quit IRC | 09:09 | |
*** luksky has joined #openstack-lbaas | 09:18 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Introduce an image driver interface https://review.opendev.org/738017 | 09:23 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Add amphora image tag capability to Octavia flavors https://review.opendev.org/737528 | 09:23 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: DNM: ARM64 support https://review.opendev.org/738096 | 09:24 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: DNM: ARM64 support https://review.opendev.org/738096 | 09:49 |
*** ramishra has joined #openstack-lbaas | 09:52 | |
*** rpittau is now known as rpittau|bbl | 10:04 | |
openstackgerrit | Merged openstack/octavia stable/train: Workaround peer name starting with hyphen https://review.opendev.org/732430 | 10:15 |
cgoncalves | cores, please review these two backports so that we can cut a train dot release: https://review.opendev.org/#/c/738023/ and https://review.opendev.org/#/c/737155/ | 10:19 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: DNM: ARM64 support https://review.opendev.org/738096 | 11:22 |
*** gcheresh has joined #openstack-lbaas | 11:34 | |
*** gcheresh has quit IRC | 11:44 | |
*** rpittau|bbl is now known as rpittau | 12:12 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: DNM: ARM64 support https://review.opendev.org/738096 | 12:33 |
*** gcheresh has joined #openstack-lbaas | 12:57 | |
*** jamesden_ is now known as jamesdenton | 12:59 | |
*** stingrayza has quit IRC | 13:00 | |
*** TrevorV has joined #openstack-lbaas | 13:25 | |
ataraday_ | johnsom, Hi! I realized that the failover refactor broke some of the amphorav2 functionality: for example, the update_vrrp_conf() method changed and the v2 tasks were not updated. If you have a WIP patch for the v2 refactor, maybe you can upload it and we can work on it together to speed things up. | 13:28 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: DNM: ARM64 support https://review.opendev.org/738096 | 13:46 |
*** yamamoto has quit IRC | 13:54 | |
*** gthiemon1e is now known as gthiemonge | 13:55 | |
*** yamamoto has joined #openstack-lbaas | 13:56 | |
*** stingrayza has joined #openstack-lbaas | 13:57 | |
*** gcheresh has quit IRC | 14:19 | |
*** armax has joined #openstack-lbaas | 14:24 | |
*** ataraday_ has quit IRC | 14:51 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: DNM: ARM64 support https://review.opendev.org/738096 | 15:26 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: DNM: ARM64 support https://review.opendev.org/738096 | 15:30 |
*** gcheresh has joined #openstack-lbaas | 15:31 | |
*** yamamoto has quit IRC | 15:44 | |
*** rpittau is now known as rpittau|afk | 15:59 | |
*** vishalmanchanda has quit IRC | 16:15 | |
*** psachin has quit IRC | 16:18 | |
*** yamamoto has joined #openstack-lbaas | 16:20 | |
*** yamamoto has quit IRC | 16:27 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: DNM: ARM64 support https://review.opendev.org/738096 | 16:37 |
xgerman | cgoncalves: you are welcome | 16:45 |
cgoncalves | xgerman, lol. you've just +W'd them. danke dir! | 16:46 |
johnsom | xgerman While you are on a roll: https://review.opendev.org/#/c/738070/ | 16:47 |
johnsom | That actually broke our OSA friends | 16:48 |
johnsom | Thanks! | 16:49 |
xgerman | sure thing! | 16:49 |
johnsom | cgoncalves Thanks for getting stable/train moving again. We should try to get that endpoint_type patch in the stable/train release as the bad patch was backported there. | 16:50 |
johnsom | Wow, there is an octavia patch in experimental that has run 28 hr 21 min | 16:51 |
johnsom | It's the v2 patch | 16:53 |
*** rouk has joined #openstack-lbaas | 17:04 | |
rouk | question, i have a user who "accidentally" deleted their certs out of barbican before setting the new ones on the listener, and setting the new one on the listener errors due to the old ones, whats the best workaround? | 17:06 |
johnsom | That was fixed a while ago. What version of Octavia are you running? | 17:07 |
rouk | uhhh, sec | 17:07 |
johnsom | https://review.opendev.org/#/c/691987/ | 17:08 |
rouk | 5.0.1, latest train. | 17:11 |
johnsom | That issue was a confluence of open bugs coming together. There is an open bug in barbican to allow us to "reserve" pkcs12 bundles like we did for the raw containers format. So when we switched to pkcs12 bundles, all of a sudden people could delete in-use barbican content, which exposed that our API thought they could not do that. So we had to go fix all of that in the API. | 17:11 |
rouk | should i be building from stable/train? | 17:11 |
johnsom | Ok, one second let me check the status of that patch in train. We are planning a train stable branch release very soon. | 17:11 |
rouk | yeah, if stable/train has it, or i can safely cherrypick it, ill get that built and deployed to fix this for us. | 17:13 |
johnsom | Yeah, stable/train has these patches. I'm just not sure if it has been "released" yet. Github is not being nice to me today. | 17:15 |
johnsom | There were a series of patches related to that issue. It might be hard to cherry pick. | 17:15 |
rouk | well if its in stable/train im fine building that | 17:16 |
rouk | we are just using release packages currently for octavia | 17:16 |
johnsom | As a short term workaround, you could update the barbican href in the database to point to the new href. | 17:16 |
rouk | will the listener delete? | 17:16 |
rouk | i could dump the listener and tell the user to remake it, at worst. | 17:16 |
johnsom | No | 17:16 |
cgoncalves | fix not released yet in train | 17:17 |
johnsom | Ok. Yeah, we are cutting a new train release fairly soon. I would expect one early next week at the latest. | 17:17 |
rouk | ah okay, ill get stable/train built and deployed then, had to do that for a lot of projects in stein too... as missing patches in release versions plagues every openstack project basically. | 17:19 |
johnsom | Yeah, we hit gate breakage from the python3 switch, so we couldn't do stable releases for a while. | 17:19 |
rouk | does the patch have any messaging that tells the user that the cert is gone? i have it in my logs, but the user just gets a bricked LB with everything in error. | 17:21 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: Define and use octavia nodesets https://review.opendev.org/738246 | 17:22 |
johnsom | Well, the patches just make actions work when the content is missing. | 17:23 |
johnsom | If they try to use a missing pkcs12 bundle, the api and client both tell the user it is invalid | 17:24 |
*** gcheresh has quit IRC | 17:24 | |
rouk | alright, not really an issue, just curious is all | 17:24 |
*** maciejjozefczyk has quit IRC | 17:28 | |
johnsom | rouk I just checked on delete, it appears the delete fix was in 5.0.1, so they should be able to delete the listener and recreate. | 17:29 |
johnsom | Just update isn't in 5.0.1 | 17:29 |
rouk | delete doesnt work, stuck complaining | 17:39 |
rouk | no error returned to client, same octavia-listener-delete/update flow that is stuck complaining about certs | 17:40 |
johnsom | Yeah, I know you don't have update, but I would have expected the delete to work with 5.0.1 | 17:40 |
rouk | yeah no dice, same revert based on the cert retrieval error | 17:41 |
openstackgerrit | Merged openstack/octavia stable/train: Fix netcat option in udp_check.sh for CentOS/RHEL https://review.opendev.org/738023 | 18:35 |
johnsom | Ugh, one of the stable/train patches landed on a rax host, 600 seconds a test | 19:11 |
openstackgerrit | Merged openstack/octavia stable/train: Fix batch member create for v1 amphora driver https://review.opendev.org/737155 | 19:23 |
openstackgerrit | Merged openstack/octavia master: Fix neutron subnet lookup ignoring endpoint_type https://review.opendev.org/738070 | 19:23 |
johnsom | Well, at least it merged | 19:24 |
openstackgerrit | Michael Johnson proposed openstack/octavia stable/ussuri: Fix neutron subnet lookup ignoring endpoint_type https://review.opendev.org/738264 | 19:24 |
openstackgerrit | Michael Johnson proposed openstack/octavia stable/train: Fix neutron subnet lookup ignoring endpoint_type https://review.opendev.org/738265 | 19:24 |
rouk | this would be in the haproxy logs on the amphora, right? had every amphora in dev go error on cert rotation after 18 days on train: octavia.amphorae.drivers.haproxy.exceptions.InternalServerError: Internal Server Error | 19:31 |
johnsom | No, the reason there should be in the main controller logs. It will also be in the amphora logs, likely syslog or amphora-agent | 19:32 |
rouk | no reason i can find in controller logs | 19:33 |
johnsom | It should be in the line or two above the "internal server error line" | 19:33 |
rouk | http://paste.openstack.org/show/PRXAusVIUR5QTPo3igK9/ | 19:34 |
johnsom | Yeah, it's still above those lines. | 19:36 |
rouk | mm, figured it would be in the stack, looked higher, its just a generic message so my search didnt see it: Amphora agent returned unexpected result code 500 with response {'error': 'write() argument must be str, not bytes', 'http_code': 500} | 19:36 |
rouk | smells like python2 | 19:37 |
rouk | the cert being used is being represented as b'XXX' in the logs. | 19:37 |
johnsom | Ah, yeah, that was fixed too. It was a python issue | 19:37 |
rouk | this might be 5.0.0 | 19:38 |
rouk | checking | 19:38 |
rouk | oh, too late, already updated the containers, dont have records in this dev spot. | 19:38 |
johnsom | It was this bug: https://review.opendev.org/#/c/719922/ | 19:39 |
rouk | it was kolla as of march 25th ish | 19:39 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: DNM: ARM64 support https://review.opendev.org/738096 | 19:43 |
rouk | johnsom: yeah checked and 5.0.1 doesnt have that fix | 19:49 |
rouk | kinda a timebomb :/ | 19:49 |
rouk | glad im pushing stable/train today. | 19:49 |
*** TrevorV has quit IRC | 19:50 | |
*** yamamoto has joined #openstack-lbaas | 20:25 | |
*** yamamoto has quit IRC | 20:30 | |
*** aannuusshhkkaa has joined #openstack-lbaas | 20:31 | |
*** shtepanie has joined #openstack-lbaas | 20:41 | |
shtepanie | hi! does anyone have any advice or tips on how to debug a test case from Zuul that timed out? https://zuul.opendev.org/t/openstack/build/73e396cdd62f40cca2914f7e0be683a0 | 20:46 |
johnsom | I can take a look | 20:46 |
johnsom | shtepanie So that job is "non-voting" which means we know it is sometimes experiencing problems not related to the patch. | 20:47 |
johnsom | That said, I will still take a quick look | 20:47 |
shtepanie | ah i see, thanks! | 20:48 |
johnsom | So this one, show_loadbalancer provisioning_status failed to update to ACTIVE within the required time 900. Current status of show_loadbalancer: PENDING_CREATE | 20:49 |
johnsom | For whatever reason nova was unable to boot a VM inside the 900 second timeout. | 20:49 |
shtepanie | where did you look to find that error? | 20:50 |
johnsom | Or it never was reachable on the network. | 20:50 |
johnsom | https://zuul.opendev.org/t/openstack/build/73e396cdd62f40cca2914f7e0be683a0/log/job-output.txt#60612 | 20:50 |
johnsom | That is the direct link. I usually start with the "job-output.txt" log file, then dig into the other logs as I see where the error might be. | 20:51 |
aannuusshhkkaa | So how do we go about debugging this? | 20:51 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: DNM: ARM64 support https://review.opendev.org/738096 | 20:52 |
johnsom | Well, I would say that you don't need to debug it as it is unrelated to your patch. But if you are interested in trying to figure it out, we would drill down into the deeper level logs to see if there are hints. | 20:53 |
johnsom | My next stop is usually the screen-o-cw.txt log, which is an Octavia worker log file: https://zuul.opendev.org/t/openstack/build/73e396cdd62f40cca2914f7e0be683a0/log/controller/logs/screen-o-cw.txt#1760 | 20:54 |
johnsom | I can confirm here that the only issue/error is that the controller can't reach the VM being booted. | 20:54 |
johnsom | Again at this point it is either a nova or neutron failure. | 20:54 |
johnsom | As you can see in the logs, we try for a long time.... | 20:55 |
shtepanie | ah ok, thanks for clarifying! | 20:56 |
aannuusshhkkaa | ohh how can you be sure that it wasn't a failure on octavia side? | 20:56 |
johnsom | Next I would typically look in the libvirt logs to see if the hypervisor kernel crashed. Unfortunately this multi-node job doesn't collect those logs. | 20:57 |
johnsom | I highly suspect that was the issue here. | 20:57 |
aannuusshhkkaa | ohh.. so how do we do away with the time out error? | 20:58 |
aannuusshhkkaa | without knowing what caused it | 20:58 |
johnsom | There is a bit of art to this. I look for any log messages at the "ERROR" level; there is a filter at the top. I also then look at what is in the logs. In this case I see it's just repeatedly attempting to connect to the amphora. If that part of the code was broken, the other scenario tests would have all failed as well. | 20:59 |
johnsom | Well, since it's non-voting, it won't block your patch or stop people from reviewing it. If Zuul gives it a +1 you are good to go. | 21:00 |
johnsom | If you want to see if you can clear it, and you know it's an issue unrelated to the patch, you can post a comment of "recheck", which will tell zuul to re-run the tests. However, we use those sparingly as they use resources to run. Also in this case, it's a non-voting job, so there is no real need/reason to recheck it. | 21:01 |
shtepanie | will recheck run all of the tests? | 21:02 |
johnsom | You just got the unlucky roll of the dice on that one. It landed on a test host with a broken hypervisor | 21:03 |
aannuusshhkkaa | aah okay so we leave it as it is.. how can you be sure that it will not cause any issues in production level code? | 21:03 |
johnsom | Yes, it reruns all of the tests | 21:03 |
johnsom | The tests that don't say "non-voting" are the tests that ensure that. | 21:03 |
johnsom | Basically I can tell that the failure was not in our code by looking at our logs. To isolate it more I would dig through the nova and neutron logs. But those services' logs are not always so helpful. If we collected the libvirt logs in that job, it would probably be very clear with a kernel trace in the log. | 21:05 |
johnsom | This is one of the challenges of relying on other projects. Sometimes they are broken and impact us. | 21:06 |
aannuusshhkkaa | why dont we collect libvirt logs? | 21:06 |
aannuusshhkkaa | also, at this stage, do we inform the nova or neutron team about potential bugs? | 21:06 |
johnsom | That I don't know. The parent job created by the openstack-qa team doesn't collect those logs in the multi-node jobs. I think it really should. | 21:06 |
aannuusshhkkaa | okay cool | 21:07 |
johnsom | If we can isolate it yes, we would open a bug for them. In the case of the qemu/kvm crash, I opened a kernel bug for it: https://bugzilla.kernel.org/show_bug.cgi?id=192521 | 21:08 |
openstack | bugzilla.kernel.org bug 192521 in kvm "KVM: entry failed, hardware error 0x0" [High,New] - Assigned to virtualization_kvm | 21:08 |
*** armax has quit IRC | 21:10 | |
aannuusshhkkaa | gotcha! thanks for the clarification.. so now you will be able to review and merge our branch to master right? | 21:10 |
johnsom | Yes, the test passed so that is a good sign it is ready for reviews. | 21:10 |
aannuusshhkkaa | alrighty! | 21:11 |
johnsom | I don't think I will have time to review that today, but early next week. Maybe others will get to it sooner. | 21:12 |
aannuusshhkkaa | That's okay.. just making sure there are no blockers from our end.. | 21:13 |
johnsom | Nope, looks good | 21:13 |
shtepanie | thanks!! | 21:13 |
aannuusshhkkaa | Thank you, johnsom! :) | 21:14 |
johnsom | No problem | 21:14 |
*** armax has joined #openstack-lbaas | 21:26 | |
openstackgerrit | Merged openstack/octavia master: Fix error on devstack cleanup https://review.opendev.org/735510 | 21:38 |
*** servagem has quit IRC | 22:01 | |
*** luksky has quit IRC | 22:22 | |
rouk | aw did vip acls not make it into train? | 22:58 |
rouk | nevermind, just didnt show up in the patch notes for the release, but its in the api and yaml notes | 23:02 |
johnsom | ACLs are listed in the features of 5.0.0, just spelled out as "access control list" | 23:15 |
rouk | ah so it is, gets lost in the list, i am blind as a bat apparently. | 23:15 |
johnsom | Ha, blinded by the shiny list of new features! | 23:16 |
rouk | every release octavia gets much nicer, ill say that much | 23:16 |
johnsom | We try.... | 23:16 |
rouk | while every release i evaluate magnum and it falls over. | 23:16 |
rouk | even though its my next biggest feature request after LBs were | 23:17 |
*** born2bake has quit IRC | 23:45 |
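
The 19:36 error, "write() argument must be str, not bytes", is the kind of failure Python 3 raises when bytes are written to a file opened in text mode. The sketch below only reproduces that class of error and one common way around it; it is not the actual Octavia fix. The helper name `write_pem` and the placeholder certificate body are assumptions made for illustration.

```python
import tempfile


def write_pem(handle, pem):
    """Write PEM data to a text-mode handle, decoding bytes first.

    Hypothetical helper for illustration; not taken from Octavia.
    """
    if isinstance(pem, bytes):
        pem = pem.decode("utf-8")
    handle.write(pem)


# Placeholder certificate body, mirroring the b'XXX' form seen in the logs.
cert = b"-----BEGIN CERTIFICATE-----\nXXX\n-----END CERTIFICATE-----\n"

with tempfile.TemporaryFile(mode="w+") as handle:
    try:
        handle.write(cert)  # a Python 3 text-mode write rejects bytes
        error_message = None
    except TypeError as exc:
        error_message = str(exc)

    write_pem(handle, cert)  # decoding first lets the write succeed
    handle.seek(0)
    written = handle.read()
```

Under Python 2 the same write would have gone through silently, since `str` and bytes were the same type, which fits the "smells like python2" remark.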
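
The triage approach described at 19:33 and 20:59 (find the "ERROR" messages, then read the line or two above them) can be roughed out as a small script. This is a minimal sketch under stated assumptions: the function name `find_errors` is hypothetical, and the sample lines are made up for illustration rather than copied from the job logs.

```python
def find_errors(lines, marker="ERROR", context=2):
    """Return (line_number, preceding_lines, line) for each line containing marker."""
    hits = []
    for index, line in enumerate(lines):
        if marker in line:
            start = max(0, index - context)
            # Keep the lines just above the hit, where the cause often appears.
            hits.append((index + 1, lines[start:index], line))
    return hits


# Made-up sample lines for illustration only; real job logs look different.
sample = [
    "INFO starting test",
    "WARNING could not connect to amphora, retrying",
    "WARNING could not connect to amphora, retrying",
    "ERROR provisioning_status failed to update to ACTIVE",
    "INFO cleanup",
]

hits = find_errors(sample)
for number, before, line in hits:
    print(number, line)
    for earlier in before:
        print("    above:", earlier)
```

A plain substring match is crude, so a downloaded `job-output.txt` would likely need a stricter pattern for the log-level field.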