opendevreview | frankming proposed openstack/ironic master: Modify ESP configuring script of redfish document https://review.opendev.org/c/openstack/ironic/+/909953 | 03:38 |
opendevreview | frankming proposed openstack/ironic master: Modify ESP configuring script of redfish document https://review.opendev.org/c/openstack/ironic/+/909953 | 03:40 |
opendevreview | frankming proposed openstack/ironic master: Modify ESP configuring script of redfish document https://review.opendev.org/c/openstack/ironic/+/909953 | 03:43 |
opendevreview | frankming proposed openstack/ironic master: Fix iscsi url generate method for ipxe https://review.opendev.org/c/openstack/ironic/+/910300 | 07:00 |
dtantsur | arne_wiebalck: hey, good morning. Do you folks rely on python-hardware and extra data inspection? | 08:37 |
opendevreview | frankming proposed openstack/ironic master: Fix iscsi url generate method for ipxe https://review.opendev.org/c/openstack/ironic/+/910300 | 08:55 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: [WIP] Add inspection PXE filter service https://review.opendev.org/c/openstack/ironic/+/907991 | 09:03 |
opendevreview | Riccardo Pittau proposed openstack/ironic-python-agent-builder master: Update link to ipmitool repository https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/910216 | 09:32 |
opendevreview | Merged openstack/ironic master: neutron: do not error if no cleaning/provisioning on launch https://review.opendev.org/c/openstack/ironic/+/909937 | 12:31 |
TheJulia | good morning | 14:08 |
opendevreview | Riccardo Pittau proposed openstack/ironic master: [WIP] move back to plain pyasn1 in 2024.2 https://review.opendev.org/c/openstack/ironic/+/910342 | 14:12 |
opendevreview | Julia Kreger proposed openstack/ironic stable/2023.2: neutron: do not error if no cleaning/provisioning on launch https://review.opendev.org/c/openstack/ironic/+/910310 | 14:13 |
opendevreview | Julia Kreger proposed openstack/ironic stable/2023.1: neutron: do not error if no cleaning/provisioning on launch https://review.opendev.org/c/openstack/ironic/+/910311 | 14:13 |
opendevreview | Riccardo Pittau proposed openstack/ironic-python-agent-builder master: Update ipmitool version to 1.8.19 https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/910344 | 14:31 |
TheJulia | dtantsur: it occurs to me that some operators do for the extensive data, but maybe the path is to make it "more optional" or a dib element plugin? | 15:24 |
dtantsur | TheJulia: it is a plugin. The problem is: the library itself is maintained by rpittau alone, and we don't have a use case for it. | 15:25 |
JayF | let me have a look | 15:29 |
JayF | my downstream might use it | 15:29 |
JayF | This is a major downside of putting our stuff in libraries | 15:29 |
JayF | if this were a feature fully implemented in IPA, I suspect we wouldn't be having this conversation | 15:29 |
dtantsur | Yeah, although in this case it was the other way around: we just reused a library | 15:30 |
JayF | ah, and rpittau took it over | 15:30 |
JayF | I saw redhat-cip in the org and assumed we/you all built it | 15:30 |
dtantsur | Yep. Red Hat "inherited" it from the eNovance acquisition. | 15:30 |
dtantsur | And the OSP HardProv team, later OCP Metal team ended up responsible for it. | 15:31 |
dtantsur | (could be a good PTG topic btw) | 15:33 |
TheJulia | ++++ | 15:43 |
*** nfedorov_ is now known as jingvar | 15:56 | |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: [WIP] Add inspection PXE filter service https://review.opendev.org/c/openstack/ironic/+/907991 | 16:19 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: [WIP] Add RPC for the PXE filter service https://review.opendev.org/c/openstack/ironic/+/910365 | 16:21 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: [WIP] Add inspection PXE filter service https://review.opendev.org/c/openstack/ironic/+/907991 | 16:23 |
dtantsur | I wonder if something is still wrong with eventlet. Using Event.wait locks up the whole process so badly that it only responds to SIGKILL. | 16:29 |
JayF | What eventlet version, what python version? | 16:31 |
JayF | An easy way to tell if it's eventlet is usually to go from 3.11+ to <3.10 | 16:31 |
JayF | there's very little that changed around how that works in older pythons | 16:31 |
JayF | but if we can get a reproducing failure I can point someone at it | 16:31 |
dtantsur | Python 3.9.16, eventlet==0.35.1 | 16:31 |
JayF | oh, that's a bad version isn't it? | 16:31 |
dtantsur | huh, then why is it in u-c? | 16:31 |
JayF | ah, I was thinking 0.34.1 I think | 16:31 |
dtantsur | I'm testing https://review.opendev.org/c/openstack/ironic/+/907991. Basically just starting the new service. | 16:32 |
JayF | yeah 34.1 was the bad one, confirmed | 16:32 |
dtantsur | It starts then locks up until the event timeout passes. Then it handles Ctrl+C. | 16:32 |
* dtantsur is pondering a busy loop instead of an event.. | 16:37 | |
JayF | dtantsur: you didn't monkey_patch | 16:40 |
JayF | https://review.opendev.org/c/openstack/ironic/+/907991/8#message-6a4cad63ec16df488b72eba54288e81ced3a19a8 | 16:40 |
dtantsur | JayF: it's in cmd.__init__ already | 16:43 |
JayF | __init__ doesn't run when you have a main | 16:43 |
JayF | __init__ runs on import | 16:43 |
JayF | not on calls to main | 16:43 |
JayF | aiui | 16:43 |
dtantsur | JayF: then nothing in ironic monkey patches :) | 16:43 |
dtantsur | but no, you cannot import a submodule without running __init__ | 16:44 |
JayF | hmmmm | 16:44 |
dtantsur | (we rely on it for all our commands) | 16:44 |
JayF | yep, you're right | 16:44 |
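(A quick way to see why the cmd package approach works: importing any submodule always executes the parent package's __init__ first, which is what lets ironic put eventlet.monkey_patch() in ironic/cmd/__init__.py. A minimal sketch using a stdlib package purely as an illustration:)

```python
import sys

# Importing a submodule forces Python to initialize the parent package first,
# so logging/__init__.py executes before logging.handlers is bound.
import logging.handlers

print("logging" in sys.modules)   # True: the package __init__ has already run

# By the same mechanism, any "ironic.cmd.<service>" entry point executes
# ironic/cmd/__init__.py (and therefore eventlet.monkey_patch()) at import time.
```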
JayF | so it prints no messages? | 16:45 |
JayF | nothing angry from eventlet | 16:45 |
JayF | so that means you must be patched in | 16:45 |
dtantsur | I don't see anything.. I also use eventlet.queue and spawn explicitly. | 16:47 |
JayF | Can you lay out the exact case you're seeing, and perhaps a simple reproduction that is doable by someone less-openstacky? | 16:49 |
JayF | I'm basically trying to see if I can get this to a point where I can ask someone more eventlet-y to look at it | 16:49 |
JayF | but that's only going to happen if we can repro in a venv/outside devstack | 16:49 |
JayF | dtantsur: have we used https://docs.python.org/3/library/fcntl.html with eventlet before? | 16:50 |
JayF | It probably doesn't even get that far, does it? | 16:51 |
dtantsur | I'll see if I can come up with an easier reproducer | 16:54 |
dtantsur | (I don't quite understand the context of the question around fcntl) | 16:54 |
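(The kind of standalone reproducer being asked for might look like the sketch below. It is not the code from the patch, just an illustration of the reported symptom, assuming a monkey-patched process waiting on a threading.Event while a greenthread is expected to keep running:)

```python
import eventlet
eventlet.monkey_patch()  # must run before anything else imports threading/socket

import threading
import time

done = threading.Event()  # monkey-patched, so wait() should yield to the hub


def worker():
    print("worker greenthread running")
    time.sleep(1)   # patched sleep; cooperatively yields
    done.set()


eventlet.spawn(worker)

start = time.monotonic()
# On a healthy eventlet this returns after ~1s once the worker sets the event.
# If the process instead hangs here, ignoring Ctrl+C until the timeout expires,
# the wait is blocking the hub -- the behaviour described above.
done.wait(timeout=30)
print("woke up after %.1fs" % (time.monotonic() - start))
```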
rpittau | good night! o/ | 16:58 |
opendevreview | Verification of a change to openstack/ironic-python-agent-builder master failed: Update tinyipa to tinycore 15.x https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/910169 | 17:10 |
opendevreview | Verification of a change to openstack/ironic-python-agent-builder master failed: Update link to ipmitool repository https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/910216 | 17:10 |
opendevreview | Verification of a change to openstack/ironic-python-agent-builder master failed: Update ipmitool version to 1.8.19 https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/910344 | 17:10 |
JayF | dtantsur: I mainly just suspect it could be responsible for a deadlock; that's all | 17:42 |
TheJulia | Hey folks, https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/902171 would love a review. It would enable us to run multiple distinct boot_interface scenarios in the same overall job, specifically so we cover more boot interfaces without adding more scenario jobs. | 19:00 |
JayF | TheJulia: +2a | 19:05 |
JayF | https://review.opendev.org/c/openstack/releases/+/910398 I'll note I just +1'd this patch to move V, W, X to unmaintained/ namespace | 19:29 |
JayF | Please let me know if at some point you all stop caring about V so we can EOL it :) | 19:29 |
JayF | At this point, my personal investment stops around W | 19:29 |
JayF | (but you know me, I'll backport things anyway) | 19:29 |
TheJulia | I only care about 2023.1 and W | 19:40 |
opendevreview | Julia Kreger proposed openstack/ironic-tempest-plugin master: Invoke tests with fake interfaces https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/909939 | 19:40 |
TheJulia | wow that took a long time to post into gerrit | 19:40 |
JayF | we can't keep "W" without a chain of upgrades to master | 19:42 |
JayF | so that means you care about everything W+ | 19:42 |
*** elodilles is now known as elodilles_pto | 19:46 | |
TheJulia | care is on different levels, truthfully | 19:56 |
JayF | if it was maintained | 19:56 |
JayF | we wouldn't prefix with "un-" :D | 19:56 |
opendevreview | Julia Kreger proposed openstack/ironic master: ci: fix dnsmasq downgrade package location https://review.opendev.org/c/openstack/ironic/+/910436 | 19:57 |
TheJulia | flagged https://review.opendev.org/c/openstack/ironic/+/910436 as ironic-week-prio as it is causing numerous job failures | 19:58 |
TheJulia | well, it seeks to fix what causes many failures | 19:58 |
TheJulia | https://zuul.opendev.org/t/openstack/status#ironic seems a bit like https://media1.tenor.com/m/6031wf2pdKwAAAAd/you-got-red-on-you-you-for-a-stain.gif | 19:59 |
TheJulia | Well, looks like we may just want to disable grenade to merge it and then backport the fix to branches which are also impacted. | 20:13 |
JayF | if we need to do that, I'm onboard | 20:16 |
JayF | will likely not have free time to look at this until tomorrow or later | 20:16 |
TheJulia | I'm going to let the change on the master branch run for now | 20:17 |
TheJulia | just to make sure it is good/happy | 20:17 |
opendevreview | Verification of a change to openstack/ironic-tempest-plugin master failed: Test multiple boot interfaces as part of one CI job https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/902171 | 20:21 |
JayF | TheJulia: So looking at https://review.opendev.org/c/openstack/ironic/+/894460/18/zuul.d/ironic-jobs.yaml#769 -- the multitenant job is failing; kinda expected given how it's set up; but I wanted to ensure that changing the regexp is the right path to get that job to not run | 20:30 |
JayF | (all the other jobs pass) | 20:30 |
JayF | and yes, I'll DRY up the multinode stuff in a follow-up (I'm trying right now to get a job working to make nova happy) | 20:31 |
JayF | ignore sean's comment, he enabled the job overnight in nova so we have job outputs | 20:33 |
JayF | TheJulia: actually... I can't make sense as to why the multitenant job fails there | 20:35 |
JayF | ooooh, yeah, I know why | 20:35 |
JayF | no, I don't | 20:35 |
JayF | wtf | 20:35 |
JayF | The only difference is setting IRONIC_SHARDS and a shard name | 20:37 |
JayF | they are hooked up to the same flavor | 20:37 |
JayF | I cannot explain why one works and one does not | 20:37 |
JayF | and it's concerning to me as it could be a real bug(?) | 20:37 |
JayF | hmm, I see a comment that might be a pointer to this being broken in our devstack plugin | 20:39 |
JayF | https://zuul.opendev.org/t/openstack/build/2a3f86e4974b4a4a9e875477c77f81e0/log/controller/logs/screen-ir-api.txt#2378-2389 maybe related | 20:44 |
JayF | for now I'm skipping that job | 20:44 |
JayF | and just doing BaremetalBasicOps | 20:44 |
JayF | and going ahead and DRY'ing it | 20:44 |
opendevreview | Jay Faulkner proposed openstack/ironic master: [CI] Support for running with shards https://review.opendev.org/c/openstack/ironic/+/894460 | 20:52 |
opendevreview | Jay Faulkner proposed openstack/ironic master: [CI] Support for adding dummy shards https://review.opendev.org/c/openstack/ironic/+/910440 | 20:52 |
JayF | ^ that change swaps it to only do BaremetalBasicOps, and I went ahead and DRY'd it up | 20:53 |
JayF | sean-k-mooney: ^ that should actually run itself this time, as a bonus lol | 20:53 |
sean-k-mooney | ack | 20:54 |
sean-k-mooney | i may not get time to redo my patch tonight but if i can do it quickly i'll let it run and check it in the morning | 20:54 |
sean-k-mooney | so with the split i should just depend on https://review.opendev.org/c/openstack/ironic/+/894460 for now, right? | 20:54 |
JayF | yes | 20:55 |
JayF | 910440 is separate, just for dummy shards | 20:55 |
JayF | what does your patch need? | 20:55 |
JayF | it should be trivial, yeah? | 20:55 |
sean-k-mooney | so ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa | 20:56 |
JayF | no | 20:56 |
sean-k-mooney | is the sanity check job we run on every patch | 20:56 |
JayF | oh, yeah | 20:56 |
JayF | That's the one you want me to hook up to dummy when it's built? | 20:56 |
JayF | I can swing that | 20:56 |
sean-k-mooney | well we could or we could change what job we run | 20:56 |
JayF | let me restack it real quick to be that way, shouldn't be hard | 20:56 |
JayF | you are running the most passé job possible | 20:57 |
sean-k-mooney | you're updating ironic-tempest-uefi-redfish-vmedia | 20:57 |
JayF | BIOS! IPMI! | 20:57 |
JayF | bah! | 20:57 |
JayF | lol | 20:57 |
JayF | yeah, just because redfish is newer and shinier but it doesn't matter | 20:57 |
sean-k-mooney | basically i'm asking | 20:57 |
JayF | I'll flip it around | 20:57 |
sean-k-mooney | if you could have just one ironic job on every nova patch | 20:57 |
JayF | I want, for purposes of this conversation, the minimal change possible in nova | 20:57 |
sean-k-mooney | what would you like it to be? | 20:57 |
JayF | and do not want you to address my answer to that question :) | 20:57 |
JayF | lol | 20:57 |
JayF | probably something uefi+redfish, since that reflects the majority of how things are run | 20:58 |
sean-k-mooney | ok so i think we can just swap to ironic-tempest-uefi-redfish-vmedia | 20:58 |
sean-k-mooney | when that is working with the dummy nodes | 20:58 |
JayF | easy enough | 20:58 |
JayF | just know my choice was probably decided as much by where my cursor was in the file | 20:58 |
JayF | and I just lucked out to hit a good choice for that use lol | 20:59 |
JayF | actually, I wonder if nova should care about vmedia vs pxe job | 20:59 |
TheJulia | unlikely, tbh | 20:59 |
JayF | any ideas on that multitenant weirdness? | 20:59 |
JayF | it makes me nervous but I don't know anything about that job | 20:59 |
TheJulia | I haven't been able to look, I am in a meeting at the moment | 20:59 |
JayF | and sean-k-mooney is fairly certain it can't be because of sharding | 20:59 |
JayF | ack | 20:59 |
JayF | mainly looking for you to pat me on the head and say it's OK LOL | 21:00 |
sean-k-mooney | nova largely should not care about the particulars of the ironic driver | 21:00 |
JayF | the only thing I can really think of is that BFV might be nice | 21:01 |
JayF | because that's in our nova driver too | 21:01 |
JayF | but it probably trades coverage in one area for coverage in another | 21:01 |
JayF | and I know volume stuff is a pain point for nova jobs so... | 21:01 |
TheJulia | JayF: all the changes on the job config you posted made me raise my eyebrow immediately, but I need to look at the actual results when I have time to do so | 21:01 |
sean-k-mooney | ya so we would be happy to test bfv with ironic | 21:01 |
sean-k-mooney | well actually that's because of a libvirt regression | 21:02 |
JayF | TheJulia: the new version is DRY'd, the only changes are to tempest_test_regexp and adding IRONIC_SHARD* vars | 21:02 |
sean-k-mooney | well qemu | 21:02 |
JayF | so how about this | 21:02 |
JayF | we focus right now on "good enough for sharding to merge" | 21:02 |
sean-k-mooney | basically we have some jobs fail because qemu does not give up the volume when we ask it to | 21:02 |
JayF | but I promise you more time after to "improve nova<>ironic ci generally" | 21:02 |
sean-k-mooney | and sometimes we hit slow node issues with lvm running on a loopback device | 21:02 |
JayF | which will include making this dummy shard stuff work, hooking up a post-run thing to ensure n-cpu didn't cross shards, and picking a better job for nova to run | 21:02 |
sean-k-mooney | right BFV is not needed for sharding | 21:03 |
sean-k-mooney | but we can iterate on this after FF | 21:03 |
JayF | well ideally, we get dummy shards + the post-run stuff working well enough | 21:03 |
JayF | that we can drop the shard multinode | 21:03 |
JayF | because we haven't even evaluated the fact that I gotta get that merged here | 21:03 |
TheJulia | JayF: https://zuul.opendev.org/t/openstack/build/2a3f86e4974b4a4a9e875477c77f81e0/log/controller/logs/screen-ir-api.txt#2378-2389 is unrelated | 21:03 |
JayF | that's an NGS vif detach bug, right? | 21:03 |
TheJulia | JayF: where is the job that you're seeing fail due to https://review.opendev.org/c/openstack/ironic/+/894460/18/zuul.d/ironic-jobs.yaml#741 with the multitenant job? | 21:04 |
TheJulia | since the test is not run | 21:04 |
TheJulia | no, it is a trust/who is authoritative issue | 21:04 |
JayF | sean made it run in nova, lemme find it | 21:04 |
JayF | https://review.opendev.org/c/openstack/nova/+/910333 | 21:05 |
TheJulia | and reality of "we need to do the thing to move it forward" | 21:05 |
TheJulia | okay | 21:07 |
TheJulia | so the issue is, not all scenario jobs are appropriate for all situations | 21:07 |
TheJulia | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_2a3/910333/4/check/ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode-shard/2a3f86e/testr_results.html | 21:07 |
TheJulia | basic ops and single tenant pass | 21:08 |
JayF | TheJulia: my concern is simple: I'll note that https://review.opendev.org/c/openstack/ironic/+/894460/19/zuul.d/ironic-jobs.yaml represents the diff between this job and the one that passes | 21:08 |
TheJulia | multi-tenant is a specialized case test which requires additional configuration AFAIK | 21:08 |
JayF | TheJulia: I can't answer *why it doesn't pass on shard job* | 21:08 |
TheJulia | so you *cannot* just say run everything, *especially* in scenario | 21:08 |
JayF | because it's identical config, just with sharding on top | 21:08 |
JayF | (ignore the tempest_test_regex change in that patchset; it was modified to avoid this issue and get a clean pass, but I still wanna know *why*) | 21:09 |
JayF | because it seems to me if it were possible for sharding to cause that job to fail, and *only sharding* that may be a canary of a potential issue | 21:09 |
JayF | but I can't connect dots as to where/how it could be, so I am a little stuck | 21:09 |
TheJulia | The way I see it right now, they are unrelated, but maybe I'm not seeing the same concern | 21:10 |
sean-k-mooney | i was planning to start with this tempest_test_regex: (^tempest\..*compute\..*(reboot|rebuild).*) | 21:10 |
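(For reference, a quick check of what that regex would select; the test IDs below are just illustrative examples, not a statement of what any job actually runs:)

```python
import re

# The proposed filter: compute tests whose IDs mention reboot or rebuild.
pattern = re.compile(r"(^tempest\..*compute\..*(reboot|rebuild).*)")

examples = [
    "tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_reboot_server_hard",
    "tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_rebuild_server",
    "tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops",
]
for test_id in examples:
    print(bool(pattern.match(test_id)), test_id)
# Only the first two match; anything outside tempest.*compute.* is filtered out.
```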
JayF | TheJulia: If all I do is add sharding, and setup the two n-cpus on separate shards, why should that make multitenant tests fail? | 21:11 |
JayF | That's the exact question I'm trying to answer | 21:11 |
JayF | and "we expect it to fail because X" is a good answer | 21:11 |
JayF | but AIUI there should be *zero* about having different shards setup that causes this to fail | 21:11 |
TheJulia | my gut feeling is you've resurrected job config we just don't expect to be run anymore | 21:14 |
TheJulia | but I need to be able to pivot my focus to know for sure | 21:14 |
JayF | Yeah, that assumption is invalid. That job is passing alongside the failing job | 21:15 |
TheJulia | where? | 21:15 |
JayF | > ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode https://zuul.opendev.org/t/openstack/build/8f54392ac8ec4739ad1d782b245e7cfa : SUCCESS in 1h 30m 41s (non-voting) | 21:15 |
JayF | https://review.opendev.org/c/openstack/ironic/+/894460/19#message-114d559904d38dc5d0c0a9e0081294cbaed5f4cf | 21:15 |
JayF | that link is good to use, to results | 21:15 |
TheJulia | \o/ | 21:15 |
JayF | that is the only reason I WTF; that I see this job passing | 21:15 |
JayF | the -shard variant is not passing the same test | 21:16 |
JayF | with shards being the only difference | 21:16 |
JayF | which is hrm-inducing | 21:16 |
TheJulia | did you look at the actual failure? | 21:16 |
TheJulia | testtools.matchers._impl.MismatchError: 'PING 10.0.100.223 (10.0.100.223) 56(84) bytes of data.\n64 bytes from 10.0.100.223: icmp_seq=1 ttl=64 time=6.22 ms\n64 bytes from 10.0.100.223: icmp_seq=2 ttl=64 time=0.458 ms\n64 bytes from 10.0.100.223: icmp_seq=3 ttl=64 time=0.434 ms\n64 bytes from 10.0.100.223: icmp_seq=4 ttl=64 time=0.475 ms\n\n--- 10.0.100.223 ping statistics ---\n4 packets transmitted, 4 received, 0% packet | 21:16 |
TheJulia | loss, time 3008ms\nrtt min/avg/max/mdev = 0.434/1.895/6.216/2.494 ms\n' matches Contains(' bytes from 10.0.100.223') <-- deja vu | 21:16 |
JayF | Yeah, so in the sharding case the end nodes can communicate | 21:17 |
JayF | whereas in the non-sharding case they can't | 21:17 |
JayF | or is there something else I'm missing? | 21:17 |
TheJulia | the match is failing | 21:19 |
TheJulia | the text it is trying to compare suggests success | 21:19 |
TheJulia | but hey, don't know why it doesn't want to match it right now | 21:19 |
JayF | it's the opposite, it's a negative match | 21:19 |
JayF | and it found it | 21:19 |
JayF | that ping /is not supposed to succeed/ AIUI | 21:19 |
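(A sketch of the assumed shape of such a negative connectivity check, not the actual ironic-tempest-plugin code: the assertion is that the ping output does *not* contain a reply line, so a successful ping is exactly what produces the MismatchError quoted above:)

```python
from testtools import TestCase
from testtools.matchers import Contains, Not


class FakeTenantIsolationTest(TestCase):
    def test_other_tenant_is_unreachable(self):
        # Pretend this came from running ping inside the first tenant's instance.
        ping_output = (
            "PING 10.0.100.223 (10.0.100.223) 56(84) bytes of data.\n"
            "64 bytes from 10.0.100.223: icmp_seq=1 ttl=64 time=6.22 ms\n"
        )
        # Fails with "'PING ...' matches Contains(' bytes from 10.0.100.223')"
        # because the reply text is present even though isolation says it shouldn't be.
        self.assertThat(ping_output, Not(Contains(" bytes from 10.0.100.223")))
```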
TheJulia | oh shoot | 21:20 |
sean-k-mooney | pre-run: playbooks/ci-workarounds/pre.yaml | 21:20 |
sean-k-mooney | is there a reason you added that | 21:20 |
JayF | https://github.com/openstack/ironic/blob/master/playbooks/ci-workarounds/pre.yaml | 21:20 |
JayF | looks like it does multi-node-bridge? | 21:21 |
sean-k-mooney | that should be done in the parent job right | 21:21 |
JayF | oooh good point | 21:21 |
JayF | I see | 21:21 |
sean-k-mooney | as in the existing multinode jobs' pre playbooks will run before this job's | 21:21 |
sean-k-mooney | in an onion style | 21:21 |
sean-k-mooney | so that's probably not needed | 21:22 |
opendevreview | Jay Faulkner proposed openstack/ironic master: [CI] Support for running with shards https://review.opendev.org/c/openstack/ironic/+/894460 | 21:22 |
opendevreview | Jay Faulkner proposed openstack/ironic master: [CI] Support for adding dummy shards https://review.opendev.org/c/openstack/ironic/+/910440 | 21:22 |
JayF | that's the updated one | 21:22 |
sean-k-mooney | ack | 21:22 |
sean-k-mooney | so BaremetalBasicOps is the new tempest regex | 21:23 |
JayF | yep | 21:24 |
TheJulia | JayF: I think I would need to dig at the job log itself and compare to the test for sure, because out of the box, yeah, that should be failing. I *will* note | 21:34 |
TheJulia | err | 21:34 |
TheJulia | sorry, still on a short call that is dragging on | 21:34 |
TheJulia | so, the *test* was originally intended, if I'm remembering correctly, to be a vm in one tenant and a baremetal in another | 21:35 |
JayF | oooh | 21:35 |
JayF | I'm going to step away and walk the dog, I'd like to figure out why that's failing but we've hopped over that issue for now | 21:36 |
JayF | and sean-k-mooney is fairly confident that even if the failure is real (doubtful, still) it's not likely caused on the nova side | 21:37 |
JayF | so I think the job gives us confidence in any event w/r/t what Nova devs are lookin' | 21:37 |
sean-k-mooney | i also dont think this is nova related | 21:37 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/910333 | 21:37 |
sean-k-mooney | i have split out the job change | 21:37 |
JayF | sean-k-mooney: I'll note we're going from 4->2 nodes booted with the regexp change :) | 21:38 |
JayF | sean-k-mooney: I think we really need the post-run validation bit to have confidence the CI job will, unattended, report if things are broken (even if logs tell us it's working) | 21:39 |
sean-k-mooney | so what i'll likely do is create a nova-ironic-shard job that inherits from either the multinode or dummy node job | 21:39 |
sean-k-mooney | and just tweaks the regex to enable more nova related tests | 21:39 |
JayF | booting more instances to validate in Ironic is ... not my favorite approach, as each machine we boot makes the job take longer and exposes us to more false alarms | 21:39 |
JayF | so just have some awareness of how chonky our fake machines are | 21:39 |
JayF | nested virt is terrible, only thing more terrible is NOT nested virt ;) | 21:40 |
sean-k-mooney | well our nova libvirt jobs are also booting vms in the vms | 21:40 |
sean-k-mooney | and that's mainly what they are for | 21:40 |
JayF | I guess you have to own the libvirt, don't you | 21:41 |
sean-k-mooney | what would be nice from a nova perspective (you don't have to do this) is for use in our gate to add more usage of the nova api | 21:41 |
JayF | so no magical "rehome my top level hypervisors" | 21:41 |
sean-k-mooney | well we have libvirt installed in the zuul vms just like ye do | 21:41 |
JayF | sean-k-mooney: let's have this conversation during some of the downtime between cycles, I'm on board to improve coverage ... on the flip side, we're getting close to "feature complete" in the ironic driver | 21:42 |
sean-k-mooney | it's just nova talks to it directly instead of via ironic and vbmc | 21:42 |
JayF | we have nothing slated for it, moving forward, outside of the metadata stuff we're also working on together | 21:42 |
* TheJulia collapses after meeting | 21:42 | |
sean-k-mooney | cool but we don't have a lot of coverage of the integration in general | 21:42 |
JayF | I am going to actually go walk the dog now :) my brain keeps looking at the clock expecting it to be late because I'm exhausted | 21:42 |
JayF | sean-k-mooney: yeah, I realized that goes both ways | 21:43 |
JayF | sean-k-mooney: if it's more stable, even more reason to ensure it doesn't break | 21:43 |
sean-k-mooney | so testing BFV and other things would likely be good if we don't already have that in the existing job | 21:43 |
JayF | my feelings on our nova driver are way, way, up since we've got the client/sdk split through | 21:43 |
sean-k-mooney | i am not really sure how much ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa tests | 21:43 |
JayF | the funniest thing about that job name | 21:43 |
JayF | we haven't had "agent_ipmitool" style driver names in AGES | 21:44 |
JayF | I suspect we kept it to not break nova | 21:44 |
JayF | so getting onto a better job is not a bad idea | 21:44 |
TheJulia | I tried to rename it ages+1 or ages+2 cycles ago | 21:44 |
sean-k-mooney | ... | 21:44 |
JayF | and on the ironic side maybe even indicate it's used for nova, too | 21:44 |
sean-k-mooney | https://d393ab92b65d6ff2eea5-fc707f543607a38fac44776c15f601da.ssl.cf5.rackcdn.com/906992/5/check/ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa/f001dcc/testr_results.html | 21:44 |
TheJulia | but devstack's gate was so faily that I never succeeded | 21:44 |
sean-k-mooney | it runs one test | 21:44 |
JayF | wow, rescue, and bfv | 21:44 |
JayF | that's honestly not bad | 21:44 |
JayF | in terms of horizontal api coverage | 21:45 |
sean-k-mooney | am no | 21:45 |
sean-k-mooney | well it's only running est_baremetal_server_ops_wholedisk_image | 21:45 |
sean-k-mooney | it skipped BFV | 21:45 |
JayF | oh, yeah, I see that now heh | 21:45 |
JayF | yeah, we'll improve this too | 21:45 |
JayF | this stuff /does/ get tested in Ironic gate though | 21:46 |
sean-k-mooney | so the new job you're adding has more coverage than the existing job... | 21:46 |
JayF | so putting it in the nova gate is just an earlier warning system | 21:46 |
JayF | which I think is why our focus has been on "make that job basic and small and super reliable" over coverage | 21:46 |
JayF | that is a decision we can revisit, but I'll put it this way: I never want the "Gate Status" portion of the TC meeting to contain "Ironic jobs are keeping Nova stuff from landing" | 21:46 |
JayF | lol | 21:47 |
sean-k-mooney | sure but it's a very expensive way to boot one vm | 21:47 |
sean-k-mooney | anyway you should go walk your dog | 21:47 |
JayF | yes, I should | 21:47 |
JayF | :D | 21:47 |
JayF | https://usercontent.irccloud-cdn.com/file/2ip0E65I/FB_IMG_1633191782370.jpg (dog tax paid) | 21:48 |
JayF | brb :) | 21:48 |
TheJulia | oh noes | 21:48 |
TheJulia | the dnsmasq version 2.80 is quietly exiting again | 21:49 |
opendevreview | yatin proposed openstack/ironic master: Source install dnsmasq-2.87 https://review.opendev.org/c/openstack/ironic/+/888121 | 21:51 |
TheJulia | I bet that will be happy | 21:52 |
TheJulia | same plan, we could just disable grenade and backport to fix it | 21:52 |
* TheJulia begins wishing we never merged the multitenancy test | 22:40 |
JayF | TheJulia: I am back, I have other things I *can* do, but I'm willing to hate that test with you if you think it's helpful :) | 22:47 |
opendevreview | Julia Kreger proposed openstack/ironic master: ci: Source install dnsmasq-2.87 https://review.opendev.org/c/openstack/ironic/+/888121 | 22:50 |
JayF | sean-k-mooney: noting that ironic ci is busted now, all those jobs are going to fail fast | 22:50 |
TheJulia | The weird thing is they are on the same logical subnet, so of course ping will work | 22:51 |
JayF | yeah I'm not sure I grok how the actual test is passing | 22:51 |
JayF | that's mainly where my wtf is coming from | 22:51 |
JayF | TheJulia: +2 888121, mainly because I trust the gate to complain if you did it wrong :) | 22:52 |
JayF | TheJulia: that way also you can land it later if it passes | 22:53 |
TheJulia | JayF: yatin did it and it passed the gate ages ago, so it should still work | 22:53 |
JayF | ack; wfm | 22:53 |
JayF | there's probably value in getting a ppa up | 22:54 |
TheJulia | stevebaker[m]: fyi https://review.opendev.org/c/openstack/ironic/+/888121 which we'll need to backport to stable/2023.2 (and likely further, thanks ubuntu!) | 22:54 |
TheJulia | I guess it is just a little frustrating since it is a known issue which kind of got shrugged at if memory serves | 22:54 |
TheJulia | but the older package route was only going to work for so long | 22:55 |
JayF | I read the LP bug more as "hell if I know" and they punted :( | 22:55 |
JayF | TheJulia: you have that LP bug # at hand, perhaps? | 22:56 |
TheJulia | not handy, I just remember we noted the link where it was known in dnsmasq | 22:56 |
JayF | https://bugs.launchpad.net/ubuntu/+source/dnsmasq/+bug/2026757 | 22:56 |
stevebaker[m] | the source install looks cleaner anyway, heh | 22:58 |
JayF | that is ... ignoring the real issue in a sense | 22:58 |
JayF | I have a downstream that may be deploying Bobcat Ironic on Ubuntu. | 22:58 |
JayF | What do I tell them? | 22:58 |
JayF | It doesn't work? Build your own dnsmasq? | 22:59 |
JayF | So I think I'll try to find time to make a better answer; even if we have to doc "use our PPA because ubuntu won't fix 2026757" | 22:59 |
TheJulia | ubuntu won't pull in the updated version of dnsmasq because that is a change for a known issue on config updates in the version they are shipping | 23:00 |
JayF | *blink* | 23:01 |
JayF | do you mean because they document it as a known issue, they won't fix it? | 23:01 |
JayF | Or did I misread that? | 23:01 |
TheJulia | it is known to dnsmasq folk | 23:01 |
JayF | oooh so the idea is, we don't have a moving-forward fix | 23:01 |
JayF | only a moving-backwards unbreak | 23:01 |
JayF | ugh | 23:01 |
TheJulia | and 2.87 is known good as well, but that would mean incrementing their package version | 23:02 |
JayF | they aren't willing to backport a fix? | 23:02 |
TheJulia | ... ubuntu? | 23:02 |
JayF | yes | 23:02 |
TheJulia | I mean, they are wanting to pin it down exactly; I think thekelly's post mentioning it pins it down to like 2 different changes | 23:03 |
TheJulia | it's been a while unfortunately | 23:04 |
TheJulia | so pivoting subjects to the multitenancy job, even the exemplar job logs have them on the same network( 99253a6c-b699-4d2f-8c79-8d1ec7d9a8f8 10.0.100.16 1640f2be-f306-4a99-97a7-d9bd3f38f64e 10.0.100.249) https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8f5/894460/18/check/ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode/8f54392/controller/logs/tempest_log.txt | 23:06 |
JayF | so we've reoriented from "why does the sharding job not pass" to "why does the actual job pass" | 23:10 |
JayF | that's comforting, weirdly | 23:10 |
TheJulia | basically yeah | 23:13 |
TheJulia | so, the way the job is designed, is it uses the *same* IP range for two different networks | 23:13 |
TheJulia | and being in tenants, they should have entirely separate config | 23:14 |
TheJulia | and then we end up wiring the nodes to different vlans | 23:14 |
TheJulia | and it confirms if things are happy | 23:14 |
TheJulia | ... or not | 23:14 |
JayF | so really is just testing NGS | 23:16 |
JayF | and/or Ironic/Neutron/NGS interactions | 23:17 |
TheJulia | basically yeah | 23:18 |
TheJulia | and if the underlying contract is solid there, or not | 23:18 |
TheJulia | and in this case, I can *see* the tags getting set | 23:18 |
TheJulia | I just don't know why they don't take effect or hold | 23:18 |
JayF | so this is just a mechanism of the test thing | 23:18 |
JayF | not an actual problem, most likely | 23:18 |
JayF | if tags are getting passed through, NGS is doing the thing | 23:18 |
TheJulia | yeah, what is happening is almost like, the tags are getting lost | 23:19 |
TheJulia | ... in fact, we should have some of it preserved | 23:19 |
JayF | where are the tags set? | 23:19 |
JayF | you wanna VC while looking at it? | 23:19 |
JayF | not sure how much brain I have left, but I have time left in the day :) | 23:19 |
TheJulia | https://1f56561c0b806c2ef991-4e4c7f9e4e66be3b004244bfc6fd74b6.ssl.cf5.rackcdn.com/910436/1/check/ironic-standalone-redfish/676e82b/post-job-network-ovs.txt | 23:19 |
JayF | TheJulia: I should be seeing blank, yes? | 23:20 |
TheJulia | I don't think so | 23:20 |
TheJulia | I'd have to go look at the post job steps | 23:20 |
JayF | I'm saying more like | 23:20 |
JayF | it appears blank | 23:20 |
TheJulia | yes, it is | 23:20 |
JayF | is that a local web browser issue | 23:20 |
JayF | or an empty file | 23:20 |
JayF | lol okay | 23:20 |
JayF | TheJulia: ... all those logs load in as empty to me | 23:21 |
JayF | okay, there we go | 23:22 |
JayF | WTF | 23:22 |
TheJulia | https://github.com/openstack/ironic/blob/master/playbooks/ci-workarounds/get_extra_logging.yaml#L26 | 23:22 |
JayF | seems like this is all "ovs is in a strange/bad state" | 23:23 |
JayF | but again: how is that possible with the shard name set differently?! | 23:23 |
JayF | wait | 23:23 |
JayF | is there a possibility we're digging into a random/weird failure | 23:23 |
JayF | we've not seen this happen 2x yet | 23:23 |
TheJulia | I'm fairly sure it is a weird/random failure which is unrelated | 23:23 |
JayF | ack | 23:23 |
JayF | sgtm | 23:23 |
TheJulia | shard is all about the input into nova-compute | 23:23 |
TheJulia | and a filtered view | 23:23 |
TheJulia | that wouldn't change the behavior or outcome like 7 steps removed | 23:24 |
TheJulia | I think that sort of sums it up, there is some sort of ovs problem on the job/hosts/config where somehow everything ends up wired together | 23:26 |
TheJulia | and it shouldn't, and that test *really* is built around finding a bad/misbehaving ml2 plugin | 23:27 |
opendevreview | Julia Kreger proposed openstack/ironic stable/2023.2: ci: Source install dnsmasq-2.87 https://review.opendev.org/c/openstack/ironic/+/910444 | 23:33 |
TheJulia | it looks like we don't need to take that further | 23:36 |
opendevreview | Jay Faulkner proposed openstack/ironic master: Remove downgrade_dnsmasq; 1.90 is upstream now https://review.opendev.org/c/openstack/ironic/+/910445 | 23:47 |
JayF | for those not seeing the chat in #openstack-tc: fungi found and pointed out that 1.90 was pushed to Jammy repos a couple weeks back | 23:48 |
JayF | so while I don't wanna stop Julia's fix from hitting the gate, I stacked a full removal of the downgrade on top | 23:48 |
JayF | if that passes we can backport that down and be fixed, too | 23:48 |
JayF | but let Julia's fix land since our CI is busted already and it's close | 23:48 |
* JayF & | 23:49 | |
fungi | to be clear, i just stumbled across it when looking to see if it would make sense to ask for the version from noble to be added to jammy-backports | 23:49 |
TheJulia | 2.90 :) | 23:51 |
opendevreview | Julia Kreger proposed openstack/ironic stable/2023.2: Special case lenovo UEFI boot setup https://review.opendev.org/c/openstack/ironic/+/910446 | 23:55 |