opendevreview | Ameya Raut proposed openstack/ironic-tempest-plugin master: Detaching instance_uuid for standalone TC's https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/838462 | 02:36 |
---|---|---|
arne_wiebalck | Good morning, Ironic! | 06:05 |
janders | good morning arne_wiebalck and Ironic o/ | 06:15 |
arne_wiebalck | hey janders o/ | 06:16 |
opendevreview | Harald Jensås proposed openstack/networking-baremetal master: [DNM] TEST CI networking-baremetal-multitenant-vlans https://review.opendev.org/c/openstack/networking-baremetal/+/839298 | 06:29 |
rpittau | good morning ironic! o/ | 07:19 |
dtantsur | morning folks | 07:40 |
dtantsur | rpittau: hey, let's sync re https://review.opendev.org/c/openstack/sushy-tools/+/836801 | 07:40 |
dtantsur | the last thing we determined about lower-constraints is that they should contain all dependencies. has anything changed? | 07:40 |
rpittau | dtantsur: good morning! | 07:43 |
rpittau | I actually don't think we need all the deps in lower-constraints, some of them can be left free and adjusted if needed | 07:44 |
dtantsur | not having dependencies there is what has broken us | 07:44 |
dtantsur | essentially, MarkupSafe, Werkzeug and itsdangerous removed API that the old Flask (?) relied on | 07:44 |
dtantsur | yeah, I think it was Flask | 07:45 |
dtantsur | the alternative is to stop testing lower-constraints at all | 07:46 |
rpittau | we just need to track in lower-constraints what we track in requirements, and possibly some other deps of deps if needed | 07:49 |
rpittau | look at other projects like sushy or ipa, even if they break at some point, it will be for just one or two deps, and always the same | 07:49 |
rpittau | tracking all of them doesn't make sense | 07:49 |
rpittau | that's one of the reasons why we decided to leave the l-c test only in master | 07:49 |
dtantsur | it's probably easier to add everything than to chase a new package every time something breaks. but I can give it a try. | 07:50 |
rpittau | I see it the opposite way :) | 07:54 |
rpittau | leaving the freedom to the direct requirements to "choose" their dependencies' versions reduces the number of packages that we have to chase in case of breakage | 07:54 |
opendevreview | Dmitry Tantsur proposed openstack/sushy-tools master: Fix the CI https://review.opendev.org/c/openstack/sushy-tools/+/836801 | 07:54 |
dtantsur | rpittau: trying ^^^ | 07:54 |
janders | good morning rpittau dtantsur and Ironic o/ | 07:59 |
rpittau | hey janders :) | 08:00 |
dtantsur | rpittau: well, them "choosing" their dependencies IS the source of breakages | 08:03 |
dtantsur | anyway, the patch seems to be passing. please review. | 08:04 |
sagar | Hi ironic ! | 08:47 |
sagar | I am part of Dell team where we are working on improving the test coverage in its third-party CI. | 08:49 |
sagar | In ironic_tempest_plugin, we are currently adding test cases for different deployment scenarios with all the available drivers. To achieve the synchronize boot mode deployment scenario, we are planning to use the boot_mode parameter in test cases. | 08:49 |
sagar | We have tried to use boot_mode as a node attribute in test cases, by referring to https://github.com/openstack/ironic-tempest-plugin/blob/56af4756a993b264bac6f5c7788397ebfc7359bf/ironic_tempest_plugin/services/baremetal/v1/json/baremetal_client.py#L505 . Still, it's not getting reflected in the node properties. | 08:53 |
dtantsur | sagar: hi! boot_mode is usually set by devstack. what exactly are you trying to achieve? | 08:55 |
sagar | we are checking boot mode on server first and then sending opposite of it. For example : if boot_mode on server is uefi we need to pass "bios" boot_mode from tempest test case. | 09:03 |
dtantsur | sagar: you'll have a hard time. take a look at devstack/lib/ironic: we're setting the current boot mode on the flavor. | 09:05 |
sagar | dtantsur: Does that mean we can't update boot_mode through tempest as of now? | 09:08 |
sagar | Meanwhile I will also check : devstack/lib/ironic as you have suggested. | 09:10 |
opendevreview | Riccardo Pittau proposed openstack/bifrost master: [DNM] Test dhcp-all-interfaces fix https://review.opendev.org/c/openstack/bifrost/+/839329 | 09:52 |
dtantsur | sagar: you probably won't be able to without a lot of hackery | 10:06 |
sagar | dtantsur: ok, Thank you. | 10:09 |
opendevreview | Riccardo Pittau proposed openstack/ironic-python-agent master: Multipath Hardware path handling https://review.opendev.org/c/openstack/ironic-python-agent/+/837039 | 10:17 |
rpittau | TheJulia: re: multipath, I pushed an update to your patch based on some... testing results, and dtantsur suggestions :) | 10:21 |
dtantsur | rpittau: can we actually cache the outcome of is_multipath_enabled? | 10:27 |
dtantsur | maybe in a global variable as a temporary measure? | 10:27 |
rpittau | oh yeah | 10:28 |
opendevreview | Riccardo Pittau proposed openstack/ironic-python-agent master: Multipath Hardware path handling https://review.opendev.org/c/openstack/ironic-python-agent/+/837039 | 10:35 |
opendevreview | Riccardo Pittau proposed openstack/bifrost master: Upgrade from stable/yoga https://review.opendev.org/c/openstack/bifrost/+/839369 | 12:06 |
iurygregory | good morning Ironic | 12:10 |
rpittau | hey iurygregory :) | 12:11 |
iurygregory | o/ | 12:12 |
janders | see you tomorrow Ironic o/ | 12:57 |
TheJulia | rpittau: Okay, I'll caffeinate and look in a little bit | 12:59 |
TheJulia | dtantsur: I tried originally with a global var... I ran into all sorts of testing issues | 12:59 |
TheJulia | like... bashing my head into a wall until it was covered in blood tried | 13:00 |
rpittau | bye janders :) | 13:03 |
rpittau | good morning TheJulia :) | 13:03 |
TheJulia | The only way to make it a global is likely make overall detection of mpio capability a method invoked as part of startup and never again. Kind of similar to how the node cache works, although that only gets updated after initial check-in upon direct invocation by ?1? method | 13:21 |
dtantsur | TheJulia: in theory, we should cache it on the hardware manager | 13:22 |
TheJulia | I really tried | 13:22 |
dtantsur | but the current design makes it harder (list_all_block_devices is a global function) | 13:22 |
TheJulia | yup | 13:22 |
TheJulia | rpittau: your changes look good to me | 13:27 |
hjensas | anyone else see the Task "Generate statistics" error seen here - https://zuul.opendev.org/t/openstack/build/391677d36eed4781bf2394d590c27146 ? | 13:28 |
rpittau | thanks TheJulia :) | 13:28 |
TheJulia | hjensas: no, generate statistics is something brand new that I've seen no emails on | 13:29 |
hjensas | TheJulia: ok, I'll keep digging then. networking-baremetal CI is broken, and the failing generate stats task causes "POST_FAILURE" and no logs are collected. | 13:30 |
TheJulia | ugh | 13:30 |
* hjensas filed https://bugs.launchpad.net/devstack/+bug/1970431 | 13:36 | |
hjensas | dansmith: if you are around, can you look at ^ - I'm not sure if we want to use 'tryint()' on L34 of get-stats.py, or put zero in value, or skip the stat? | 13:50 |
dansmith | hjensas: yeah, tryint would be good.. I have no idea why systemd is reporting something like that | 13:51 |
dansmith | hjensas: we should also mark the ansible as "ignore_errors" there so we don't nail you in cases like this | 13:52 |
dansmith | gmann: ^ | 13:52 |
dansmith | hjensas: I have a patch up to fix something else, so let me just add to that | 13:52 |
hjensas | dansmith: yeah, it is strange. I proposed https://review.opendev.org/c/openstack/devstack/+/839387 - using trying() | 13:52 |
dansmith | oh okay | 13:52 |
hjensas | s/trying()/tryint()/ | 13:53 |
gmann | dansmith: you mean to mark devstack role itself with ignore_error so that it does not cause job failure. I think we did same for stackviz role case also. | 13:54 |
opendevreview | Harald Jensås proposed openstack/networking-baremetal master: [DNM] TEST CI networking-baremetal-multitenant-vlans https://review.opendev.org/c/openstack/networking-baremetal/+/839298 | 13:54 |
dansmith | gmann: yeah, I have another tweak for jobs that don't have pymysql installed, so I added it to that: https://review.opendev.org/c/openstack/devstack/+/839217 | 13:55 |
gmann | dansmith: nice | 13:56 |
opendevreview | Iury Gregory Melo Ferreira proposed openstack/ironic-python-agent stable/wallaby: Multipath Hardware path handling https://review.opendev.org/c/openstack/ironic-python-agent/+/837784 | 14:03 |
iurygregory | rpittau, "backport" updated | 14:03 |
rpittau | iurygregory: checked and lgtm | 14:06 |
rpittau | thanks! | 14:06 |
iurygregory | nice! | 14:08 |
opendevreview | Julia Kreger proposed openstack/ironic master: DNM: v6/grenade multinode jobs https://review.opendev.org/c/openstack/ironic/+/839086 | 14:14 |
TheJulia | if that doesn't work, I'm likely going to need to ask opendev folks to hold the VMs from an execution so I can poke around. | 14:15 |
TheJulia | vxlan tunnel sadness (which is a headache with every multinode job) | 14:15 |
dtantsur | looking for a 2nd +2 on https://review.opendev.org/c/openstack/sushy-tools/+/836801 | 14:43 |
TheJulia | done | 14:44 |
dtantsur | thx! | 14:45 |
TheJulia | would anyone like classix pixie boots stickers? | 14:45 |
TheJulia | classic | 14:45 |
dtantsur | bring them to the summit :) | 14:45 |
opendevreview | Riccardo Pittau proposed openstack/sushy-tools master: Use python Zed tests https://review.opendev.org/c/openstack/sushy-tools/+/838674 | 14:52 |
opendevreview | Dmitry Tantsur proposed openstack/sushy-tools master: vmedia: keep the original URL in Image https://review.opendev.org/c/openstack/sushy-tools/+/836795 | 14:54 |
dtantsur | this ^^^ is annoying when debugging | 14:54 |
TheJulia | Apparently my reward for ordering two packs of stickers is hot sauce | 14:57 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Decouple deploy callback timeout from deploy step timeout https://review.opendev.org/c/openstack/ironic/+/837690 | 15:02 |
ajya | ftarasenko: while looking through the logs and having another try in my environment I was able to reproduce the issue. I'm starting to see what is happening, but it's not yet clear why it is happening and why only from time to time. It's around the logic in Ironic with async tasks (the ones that reboot the system). | 15:08 |
ajya | As the same pattern is reused in all interfaces, it can affect all idrac async tasks. At least for now it does not look like it has anything to do with iDRAC. | 15:08 |
ajya | Another thing, in newer Ironics the error is ignored here https://opendev.org/openstack/ironic/src/commit/93dc442935d5f7553c2459d46fb1d1c1d9c8a57c/ironic/conductor/rpcapi.py#L73 | 15:09 |
ajya | My guess is that something was backported to Wallaby and because Wallaby does not get this ^, the error fails all cleaning. | 15:10 |
opendevreview | Dmitry Tantsur proposed openstack/bifrost master: Prevent the enroll/deploy commands from running without venv https://review.opendev.org/c/openstack/bifrost/+/839399 | 15:10 |
ajya | I'll continue looking into the logic to get it fixed. For now don't see any quick workarounds. | 15:10 |
TheJulia | there is nothing like waiting for a node to become available... when it is one of five. | 15:13 |
ftarasenko | ajya: thank you for your research. workaround is not a problem for me, hope the bug will be found and fixed) | 15:13 |
opendevreview | Harald Jensås proposed openstack/networking-baremetal master: Remove deprecated ironic client opts https://review.opendev.org/c/openstack/networking-baremetal/+/839298 | 15:21 |
adarobin | Is storyboard the appropriate venue for feature requests or just bug reports? | 15:32 |
TheJulia | adarobin: it is | 15:33 |
TheJulia | for both | 15:33 |
dtantsur | adarobin: yes https://docs.openstack.org/ironic/latest/contributor/contributing.html#adding-new-features | 15:33 |
adarobin | Cool -- I have patches as well, but getting my employer to sign a contributor agreement is all sorts of fun | 15:34 |
dtantsur | sigh, yeah | 15:34 |
adarobin | Only took like a year and half to get the last one :-( | 15:35 |
dtantsur | mmmmghh.. our love for JSON fields backfires from time to time :( I need to search database for things that have agent_url and I cannot (at least in a database-agnostic way) | 15:46 |
opendevreview | Merged openstack/sushy-tools master: Fix the CI https://review.opendev.org/c/openstack/sushy-tools/+/836801 | 15:48 |
* dtantsur is wondering how modern sqlalchemy deals with JSON queries | 15:53 | |
rpittau | good night! o/ | 16:05 |
dtantsur | okay, nothing backend-independent. damn. | 16:13 |
TheJulia | dtantsur: why do you need to query by agent_url? | 16:14 |
dtantsur | TheJulia: I need to know if we're running a deploy step now or just waiting for the agent to come back | 16:14 |
TheJulia | In other news, for some insane reason, port 8080 works with ipv6, and 443 does not :( | 16:14 |
dtantsur | Oo | 16:15 |
dtantsur | I can, of course, filter on the Python side.. but it means fetching all nodes in DEPLOYWAIT | 16:16 |
TheJulia | dtantsur: so teaching ironic to do a secondary query instead of just assuming the node will heartbeat quickly? I thought we had ipa code to immediately heartbeat upon task completion | 16:17 |
dtantsur | TheJulia: the timeout does not use heartbeats (and we cannot query by the last heartbeat time since it's also in JSON!) | 16:18 |
TheJulia | ahh | 16:18 |
TheJulia | I still thought we taught IPA to immediately heartbeat anyway, so I'm not sure I truly understand the why unless you're intending to just teach ironic to go ask "what's up?" explicitly | 16:19 |
dtantsur | I need to be able to tweak the deploy step timeout | 16:33 |
dtantsur | so yeah, IPA does heartbeat immediately, but it has no effect on the timeout? unless we flip the node to deploying and back. | 16:34 |
TheJulia | hmmm | 16:35 |
TheJulia | feels like a bug that we don't | 16:35 |
TheJulia | but there may be reasons there | 16:35 |
dtantsur | a counter-argument could be: heartbeats are not an indicator that a deploy step is not stuck | 16:35 |
dtantsur | whether we should handle the case of a deploy step getting stuck.. is questionable | 16:36 |
JayF | a deploy step arguably should handle the case of itself getting stuck, internally | 16:36 |
JayF | if it's doing something that long running, nothing preventing it from making sure e.g. the vendor tool it kicked off is making progress | 16:37 |
dtantsur | yeah, I'm thinking along the same lines | 16:37 |
dtantsur | maybe we don't even need a generic deploy step timeout, only a deploy callback timeout | 16:37 |
dtantsur | the hard part is to distinguish the two.. I wish the agent heartbeat timestamp was a normal field we could query against | 16:40 |
* dtantsur is curious if we even use it at all | 16:41 | |
opendevreview | Ameya Raut proposed openstack/ironic-tempest-plugin master: Add iDRAC management cleaning steps tests https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/826646 | 16:41 |
dtantsur | answering myself: we use it for fast-track detection and for PXE retries | 16:42 |
rloo | wrt dtantsur's comment about searching db for info in json ... we have a downstream hack to allow the user to filter (when doing 'list') on a property key=value pair... maybe it can be generalized to any json field, don't recall how the code works... | 16:42 |
dtantsur | rloo: it can be done quite easily, but not in a backend-agnostic way | 16:42 |
rloo | that's what i don't know, the code does something at the db layer and it might be specific to mysql. | 16:43 |
*** dking is now known as Guest2887 | 16:53 | |
*** Guest2887 is now known as dking | 17:07 | |
mallik_ | rloo: Hi | 17:31 |
rloo | hi mallik_ | 17:32 |
mallik_ | I tried anaconda based provisioning with rhel 7.9 rhel8.2 and 8.5 as per the documentation, and could not succeed | 17:33 |
mallik_ | rloo: with rhel8.5, I see in /tmp/anaconda.log "dasbus.error.DBusError: [Errno 2] No such file or directory: 'efibootmgr': 'efibootmgr'" | 17:33 |
mallik_ | rloo: with rhel8.2, I see "ValueError: Error enabling service chronyd: 1" | 17:33 |
TheJulia | mallik_: between 8.3 and 8.4, I believe efibootmgr basically became required to be in any rhel image deployed | 17:34 |
TheJulia | So maybe some sort of mismatch? | 17:34 |
rloo | so it doesn't work with rhel7. there's a bug with anaconda or something. we should prob update some doc or open a bug or ?? about that | 17:34 |
rloo | (and I don't actually have much experience with anaconda or rhel8) | 17:35 |
mallik_ | rloo: with rhel7.9, node moved to active but image did not bootup | 17:35 |
rloo | i think %onerror doesn't work with rhel7, i've already forgotten. if node moved to active, then the %onerror bug wouldn't be the issue. | 17:37 |
mallik_ | TheJulia: we checked manually in the anaconda shell prompt of rhel8.5, efibootmgr was present but it was giving that error | 17:39 |
TheJulia | mallik_: I'd honestly consider contacting RH support in that case, since you're far outside the driver at that point, and it sounds like an issue with anaconda itself | 17:40 |
mallik_ | TheJulia: I also tried with url patch id 834709, for rhel7.9 it was failing with pre-install steps. with 8.5 it crossed pre-install steps and as per anaconda logs it is waiting for some input on the screen. Input is required by ScreenData(IpmiErrorDialog,None,True) screen | 17:44 |
opendevreview | Julia Kreger proposed openstack/ironic master: Grenade: Turn up interfaces for vxlan https://review.opendev.org/c/openstack/ironic/+/839420 | 17:46 |
TheJulia | rloo or any around core, I'd appreciate just merging ^ | 17:47 |
TheJulia | I can't fix the grenade stuff without it, it seems. :\ | 17:47 |
TheJulia | and it needs to actually merge in for the subnode to pick it up. | 17:47 |
TheJulia | since it runs off master | 17:47 |
rloo | TheJulia: need 2 people to review? | 17:52 |
dtantsur | I can be the 2nd | 17:53 |
rloo | thx dtantsur! was wondering if I missed something about only needing 1 core ;) | 17:53 |
dtantsur | well.. gate fixes are one of the cases where it's legal to merge with a single core | 17:54 |
rloo | ah.. see, i WAS missing something! | 17:54 |
dtantsur | OR we can do it this way: we allow TheJulia to approve it herself if it passes the multinode job (it's non-voting) | 17:55 |
rloo | ha ha | 17:55 |
TheJulia | oh well, then... | 17:55 |
TheJulia | I guess I'll look for something else to do | 17:55 |
TheJulia | :) | 17:55 |
JayF | rloo: policy was changed a few years back to "2x core votes except in cases where it's trivial or a gate fix" | 17:56 |
JayF | I don't love anything getting into the code base without 2x independent eyes, but this is the real world where there aren't enough humans working on ironic to go around ;0 | 17:56 |
rloo | thx JayF! the 'trivial' part I remember, so I'm not totally senile ;) | 17:57 |
TheJulia | the v6 job still has me stumped | 17:57 |
rloo | (I was just musing about changing it to just one core...) | 17:57 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: [WIP] Decouple deploy callback timeout from deploy step timeout https://review.opendev.org/c/openstack/ironic/+/837690 | 17:58 |
dtantsur | this just stops timing out running steps ^^ | 17:59 |
dtantsur | we need to check that we handle an abruptly restarted agent correctly.. but this has to be done anyway | 17:59 |
dtantsur | but that's for tomorrow. o/ | 18:02 |
TheJulia | goodnight | 18:02 |
opendevreview | Julia Kreger proposed openstack/ironic master: DNM: v6/grenade multinode jobs https://review.opendev.org/c/openstack/ironic/+/839086 | 18:17 |
opendevreview | Julia Kreger proposed openstack/ironic master: CI: Turn off STP for CI jobs https://review.opendev.org/c/openstack/ironic/+/839425 | 18:40 |
TheJulia | hjensas: so I don't think we can do multipath in CI... Looks like it would only work if backed by a real block device | 19:04 |
hjensas | TheJulia: hm, a file mounted as loop dev does not work? | 19:16 |
TheJulia | That *might* work | 19:17 |
TheJulia | I'm not sure it makes sense to retool CI that much though | 19:17 |
hjensas | yeah, maybe not. | 19:20 |
TheJulia | wow, we also can't turn off stp | 19:26 |
TheJulia | in bridge mode | 19:26 |
opendevreview | Harald Jensås proposed openstack/networking-baremetal master: Register neutron common config options https://review.opendev.org/c/openstack/networking-baremetal/+/839298 | 19:48 |
hjensas | TheJulia: do we have loops if we turn off stp? | 19:56 |
TheJulia | hjensas: it won't let us because we're using a bridge | 20:00 |
hjensas | ah, right I saw something similar in Infrared downstream. They had to condition on being bridge or not. | 20:01 |
TheJulia | yeah, we default to bridge upstream which makes me wonder how many CI failures we've had over the years due to it | 20:03 |
TheJulia | hjensas: do you remember any issues with dhcpv6/slaac in ci, specifically with tinycore linux? | 20:06 |
hjensas | TheJulia: no, it's been too long. | 20:07 |
TheJulia | hjensas: would you mind glancing at an IPA console log and seeing if anything screams out at you? | 20:08 |
hjensas | TheJulia: not at all, is it on your v6/grenade patch? | 20:09 |
TheJulia | https://e850e778a47f7adac9b8-8b0899cb6c8c0582fa25b52fb6031f3e.ssl.cf1.rackcdn.com/839086/8/check/ironic-tempest-ipxe-ipv6/e2950f8/controller/logs/ironic-bm-logs/node-1_console_2022-04-26-19%3A33%3A32_log.txt | 20:11 |
TheJulia | yeah | 20:11 |
TheJulia | I'm a bit stumped... and I'm thinking maybe I should just change the type over to eliminate if it is the Os of the ramdisk | 20:11 |
TheJulia | It feels like it is working as expected though | 20:11 |
TheJulia | but nothing is really adding up | 20:11 |
TheJulia | err, rax | 20:13 |
TheJulia | so have to use tiny | 20:13 |
TheJulia | ugh | 20:13 |
TheJulia | well | 20:13 |
* TheJulia checks to see where the grenade fix is at | 20:13 | |
TheJulia | Not terribly long, so I can kick it then | 20:14 |
hjensas | It looks good, tinycore boots, and there are addresses and routes learned via RA. | 20:18 |
hjensas | I ran into an issue some time ago where I had flapping routes, I believe because of multiple interfaces all learning a default over RA proto. RHBZ#2046514, I never found the time to dig into that bug properly. - Would it be worth trying to force eth1 down? | 20:21 |
TheJulia | We only have the second interfaces because of portgroup bonding, so I think we could likely just default down the second interfaces in general | 20:28 |
opendevreview | Verification of a change to openstack/ironic master failed: Grenade: Turn up interfaces for vxlan https://review.opendev.org/c/openstack/ironic/+/839420 | 20:55 |
opendevreview | Clark Boylan proposed openstack/bifrost master: [DNM] Test dhcp-all-interfaces fix https://review.opendev.org/c/openstack/bifrost/+/839329 | 21:47 |
opendevreview | Verification of a change to openstack/ironic master failed: Grenade: Turn up interfaces for vxlan https://review.opendev.org/c/openstack/ironic/+/839420 | 21:48 |
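
The fallback dtantsur mentions at 16:16, filtering on the Python side after fetching all nodes in DEPLOYWAIT, could look roughly like the sketch below. It works on plain dictionaries rather than Ironic's database objects. The assumptions are that agent_url lives in a JSON field named `driver_internal_info` and that the provision state string is "wait call-back"; the log confirms neither, and `split_by_agent_url` is a hypothetical helper.

```python
# Hypothetical sketch: split DEPLOYWAIT nodes by whether agent_url is set.
# Field names and the state string are assumptions, not taken from the log.

DEPLOYWAIT = "wait call-back"  # assumed provision-state string


def split_by_agent_url(nodes):
    """Return (uuids with agent_url, uuids without) for DEPLOYWAIT nodes."""
    with_url, without_url = [], []
    for node in nodes:
        if node.get("provision_state") != DEPLOYWAIT:
            continue  # only nodes waiting in DEPLOYWAIT are of interest
        info = node.get("driver_internal_info") or {}
        if info.get("agent_url"):
            with_url.append(node["uuid"])
        else:
            without_url.append(node["uuid"])
    return with_url, without_url


nodes = [
    {"uuid": "a", "provision_state": DEPLOYWAIT,
     "driver_internal_info": {"agent_url": "http://192.0.2.10:9999"}},
    {"uuid": "b", "provision_state": DEPLOYWAIT, "driver_internal_info": {}},
    {"uuid": "c", "provision_state": "active", "driver_internal_info": {}},
]
waiting_with_agent, waiting_without_agent = split_by_agent_url(nodes)
```

The cost is the one raised in the log: every DEPLOYWAIT node has to be loaded before the JSON field can be inspected, since the filter cannot be pushed down to the database in a backend-agnostic way.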