opendevreview | Verification of a change to openstack/ironic master failed: api: Add schema for allocations API (requests) https://review.opendev.org/c/openstack/ironic/+/945218 | 00:30 |
---|---|---|
opendevreview | Jacob Anders proposed openstack/ironic master: Fix servicing abort to respect abortable flag https://review.opendev.org/c/openstack/ironic/+/957189 | 00:54 |
janders | TheJulia ^^ is my attempt at a fix for abortable=false being ignored in service wait, while also making a step towards completing my AI performance goal | 00:57 |
janders | I will lab-test this now and report back | 00:57 |
janders | it seems to work, avoiding two problems: 1) firmware update being interrupted (it nicely queued the abort till after it finished) and 2) having to double-tap abort to get out from service wait to service failed to active | 01:20 |
janders | once we're happy with your change I will re-spin mine with the correct verb/ in-code naming scheme | 01:20 |
janders | will be curious what you folks think (also CC dtantsur and JayF since we discussed "abortability") | 01:21 |
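For readers following the "abortability" discussion, here is a toy Python model of the behaviour janders reports from the lab test. If an abort arrives while a non-abortable step (for example a firmware update) is running, it is queued until that step finishes, and a single abort then moves the node to service failed. The class, method and field names are hypothetical and this is not how ironic's conductor implements it; the linked change is the real fix.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServiceSession:
    """Toy model of a node sitting in 'service wait' (hypothetical, not ironic code)."""
    state: str = "service wait"
    current_step: Optional[str] = None
    step_abortable: bool = True
    abort_pending: bool = False
    history: List[str] = field(default_factory=list)

    def start_step(self, name: str, abortable: bool) -> None:
        self.current_step = name
        self.step_abortable = abortable
        self.history.append(f"start {name}")

    def request_abort(self) -> str:
        if self.current_step is not None and not self.step_abortable:
            # Honour abortable=False: remember the request instead of
            # interrupting the running step.
            self.abort_pending = True
            self.history.append("abort queued")
            return "queued"
        self._abort()
        return "aborted"

    def finish_step(self) -> None:
        self.history.append(f"finish {self.current_step}")
        self.current_step = None
        if self.abort_pending:
            # The queued abort fires once the non-abortable step is done.
            self.abort_pending = False
            self._abort()

    def _abort(self) -> None:
        # One abort is enough to reach 'service failed' (no double-tap).
        self.state = "service failed"
        self.history.append("abort")
```

The design choice being modelled is "defer, don't drop": the abort request is never lost, it just waits for a safe point.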
opendevreview | cid proposed openstack/ironic master: Add periodic cleanup of stale conductors https://review.opendev.org/c/openstack/ironic/+/956500 | 01:28 |
opendevreview | cid proposed openstack/ironic master: Add periodic cleanup of stale conductors https://review.opendev.org/c/openstack/ironic/+/956500 | 01:42 |
opendevreview | Steve Baker proposed openstack/networking-generic-switch master: WIP Add security group support to ovs https://review.opendev.org/c/openstack/networking-generic-switch/+/956519 | 04:41 |
opendevreview | Takashi Kajinami proposed openstack/networking-generic-switch master: devstack: Drop explicit etcd api version https://review.opendev.org/c/openstack/networking-generic-switch/+/957207 | 07:49 |
tkajinam | I wonder if author information of ngs needs to be updated https://github.com/openstack/networking-generic-switch/blob/master/setup.cfg#L3-L4 . I'm not quite sure if we can/should update the author but I double the email is still active | 07:54 |
opendevreview | Takashi Kajinami proposed openstack/networking-generic-switch master: Update author information https://review.opendev.org/c/openstack/networking-generic-switch/+/957208 | 07:58 |
tkajinam | I *doubt* | 07:58 |
abongale | good morning ironic o/ | 08:28 |
queensly[m] | Good morning o/ | 08:32 |
kubajj | good morning abongale , queensly[m] , and Ironic! o/ | 08:32 |
darkhackernc | https://bugs.launchpad.net/ironic/+bug/2120542 | 09:34 |
darkhackernc | team any help on this | 09:34 |
FreemanBoss[m] | Good morning everyone | 09:37 |
darkhackernc | morning | 09:46 |
dtantsur | darkhackernc: "request was made with project scope" is the most important hint. With new RBAC, some commands, including this one, require a system-wide token, not a project one. | 09:54 |
darkhackernc | Thanks, Dmitry | 09:54 |
darkhackernc | dtantsur, will that block provisioning? | 09:57 |
dtantsur | darkhackernc: normal provisioning actions can be done with a project token | 10:09 |
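To make the project-scope vs system-scope distinction concrete, here is a sketch of the two Keystone v3 token request bodies (POST /v3/auth/tokens) built with plain Python dicts. The helper name `token_request` and its parameters are hypothetical; with the OpenStack CLI the same thing is typically done by requesting system scope (e.g. `OS_SYSTEM_SCOPE=all`) instead of setting a project, but check what your client version supports.

```python
def token_request(user, password, domain="Default", project=None, system=False):
    """Build a Keystone v3 password-auth token request body (hypothetical helper)."""
    if bool(project) == bool(system):
        raise ValueError("choose exactly one of project scope or system scope")
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"name": domain},
                        "password": password,
                    }
                },
            }
        }
    }
    if system:
        # System scope: what system-wide admin calls need under new RBAC.
        body["auth"]["scope"] = {"system": {"all": True}}
    else:
        # Project scope: sufficient for normal provisioning actions.
        body["auth"]["scope"] = {
            "project": {"name": project, "domain": {"name": domain}}
        }
    return body
```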
darkhackernc | dtantsur, perfect, my nodes are starting, so it should not be a blocker for me. | 10:10 |
darkhackernc | I have 2 more bugs | 10:10 |
darkhackernc | https://bugs.launchpad.net/ironic/+bug/2120549 | 10:15 |
dtantsur | Well, not broken, just not implemented :) And the bug should go to Nova, not to Ironic proper. | 10:19 |
darkhackernc | cool, thank you Dmitry | 10:49 |
darkhackernc | team, can you please share some hints for this one - https://pasteboard.co/pnSTaBENm20F.png | 11:32 |
dtantsur | darkhackernc: it's a very generic problem. Basically, the node cannot DHCP on boot. Maybe it does not reach Neutron or whatever you're using for DHCP. Using tcpdump may help; also check Neutron's resources. | 12:00 |
darkhackernc | dtantsur, yes, I am replicating and collecting the logs | 12:14 |
darkhackernc | will share that | 12:14 |
TheJulia | good morning | 12:57 |
TheJulia | tkajinam: it really should be who maintains it at this point, so broadly speaking that would be OpenStack | 13:13 |
tkajinam | TheJulia, yeah that's what I thought. | 13:15 |
TheJulia | darkhackernc: fwiw, I put comments aligned with the above discussion into the bugs and updated their status | 13:18 |
TheJulia | darkhackernc: If you manually created the baremetal port(s), I'd double check the MACs are right as a first step, then move on to ensuring the physical network the port is wired to is attached to the correct physical network on the controller nodes, and then find the dhcp service and ensure packets are making it to neutron. | 13:19 |
TheJulia | But as Dmitry said, this is very common and often just... a typo, incorrect port tagging, or the wrong physical network port being used on the controllers | 13:20 |
TheJulia | Which just needs to be worked through one step at a time :( | 13:20 |
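The first step TheJulia suggests (double-checking the MACs on manually created baremetal ports) is easy to get wrong by eye when formats differ. A minimal sketch, assuming you collect `port_macs` from `baremetal port list` and `observed_macs` from the node or hypervisor (for a vBMC lab, something like `virsh domiflist`); both function names are hypothetical.

```python
import string


def normalize_mac(mac: str) -> str:
    """Normalize 'AA-BB-..', 'aabb.ccdd.eeff' or 'aa:bb:..' to lowercase colon form."""
    digits = mac.strip().lower().replace(":", "").replace("-", "").replace(".", "")
    if len(digits) != 12 or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def mac_mismatches(port_macs, observed_macs):
    """Return (ports with no matching NIC, NICs with no matching port)."""
    ports = {normalize_mac(m) for m in port_macs}
    seen = {normalize_mac(m) for m in observed_macs}
    return sorted(ports - seen), sorted(seen - ports)
```

Any entry in either returned list is a candidate for the "just a typo" case before moving on to physical network wiring.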
darkhackernc | TheJulia++ thank you very much | 13:21 |
darkhackernc | in the logs I am getting some uncommon messages and misleading info, I'll share that in a few minutes | 13:21 |
TheJulia | dtantsur: regarding eventlet, steve is suggesting splitting the API object and indirection stuff out separately and merging them. Which could make sense; if you have any strong opinion, lmk | 13:23 |
TheJulia | JayF: Quick question, regarding eventlet, are the +1s more just hesitancy in general, or are you expecting them to be treated as, or upgraded to, +2s when the time comes? Just trying to make sure folks are on the same page, and given the timeline for end of cycle I unfortunately need to be a little blunt :) | 13:26 |
TheJulia | Folks, if I can get a single core to review https://review.opendev.org/c/openstack/ironic/+/956801, it would be appreciated. It gets grenade back in passing shape. | 13:36 |
kubajj | Hello, we are looking into fixing the other part of the RAID skip_block_devices. We think that we need to write a function which retrieves the raid_level of an existing RAID, so we looked into the possible values and how they are reported by mdadm --detail. We are a bit confused about the options: the Ironic docs say that we support a raid_level value of 2, but the manual for mdadm does not list it as a possible level value. | 13:45 |
kubajj | https://docs.openstack.org/ironic/latest/admin/raid.html#mandatory-properties | 13:45 |
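A sketch of the kind of helper kubajj describes: read the "Raid Level" line from `mdadm --detail` output and map it back to Ironic's raid_level strings. The function name and the mapping table are hypothetical; in particular, pairing md `raid10` with Ironic's "1+0" is an assumption to verify against IPA's software RAID code. As noted above, Ironic's "2" has no md counterpart, so it cannot round-trip and is deliberately absent.

```python
import re

# Hypothetical mapping from the md level printed by `mdadm --detail`
# to Ironic raid_level values. "raid10" -> "1+0" is an assumption.
MDADM_TO_IRONIC = {
    "raid0": "0",
    "raid1": "1",
    "raid5": "5",
    "raid6": "6",
    "raid10": "1+0",
}

_LEVEL_RE = re.compile(r"^\s*Raid Level\s*:\s*(\S+)", re.MULTILINE)


def raid_level_from_detail(detail_output: str) -> str:
    """Return the Ironic raid_level for one array's `mdadm --detail` text."""
    match = _LEVEL_RE.search(detail_output)
    if match is None:
        raise ValueError("no 'Raid Level' line in mdadm --detail output")
    level = match.group(1)
    try:
        return MDADM_TO_IRONIC[level]
    except KeyError:
        raise ValueError(f"md level {level!r} has no Ironic equivalent") from None
```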
JayF | TheJulia: mainly sticking to what I told you -- I think I may have missed too much context to have the confidence to be a core reviewer on that. I would hope that my vote doesn't become something that's absolutely needed, because I hope that this is a big enough change that we get lots of cores looking at it | 13:54 |
opendevreview | Julia Kreger proposed openstack/ironic master: Replace GreenThreadPoolExecutor in conductor https://review.opendev.org/c/openstack/ironic/+/952939 | 13:55 |
opendevreview | Julia Kreger proposed openstack/ironic master: Set the backend to threading https://review.opendev.org/c/openstack/ironic/+/953683 | 13:55 |
opendevreview | Julia Kreger proposed openstack/ironic master: Launch vnc proxy with no_fork https://review.opendev.org/c/openstack/ironic/+/957044 | 13:55 |
opendevreview | Julia Kreger proposed openstack/ironic master: Remove direct mapping from API -> DB https://review.opendev.org/c/openstack/ironic/+/956512 | 13:55 |
opendevreview | Julia Kreger proposed openstack/ironic master: Optional indirection API use https://review.opendev.org/c/openstack/ironic/+/956504 | 13:55 |
TheJulia | JayF: okay, glad you said that in public channel so other reviewers can be aware of that with their reviews so they don't take to thinking we're all using +1s instead or something like that | 13:57 |
opendevreview | Julia Kreger proposed openstack/ironic master: Set the backend to threading https://review.opendev.org/c/openstack/ironic/+/953683 | 13:57 |
opendevreview | Julia Kreger proposed openstack/ironic master: Revert "ci: temporary metal3 integration job disable" https://review.opendev.org/c/openstack/ironic/+/956953 | 13:58 |
opendevreview | Julia Kreger proposed openstack/ironic master: Clean-up misc eventlet references https://review.opendev.org/c/openstack/ironic/+/955632 | 13:58 |
darkhackernc | TheJulia, https://bugs.launchpad.net/ironic/+bug/2120567 | 14:09 |
TheJulia | so you're deploying VMs? | 14:12 |
TheJulia | so slight out of the box problem, ironic leans hard into UEFI by default. Octavia doesn't ship an amphora image which is UEFI compatible. | 14:13 |
TheJulia | so | 14:14 |
TheJulia | you really shouldn't be pre-creating your neutron port with the same mac as the baremetal host | 14:15 |
TheJulia | Ironic will change it properly | 14:15 |
TheJulia | and you're using the flat network interface, which means it should still work I guess, but any changes to the MAC from the default are expected by design | 14:15 |
TheJulia | darkhackernc: 1) check the neutron-dhcp-agent log to make sure it is working and it is processing the updates. 2) Make sure the network it is attaching a dhcp service to *is* the same physical network the VM is attached to. Most commonly the latter is the issue | 14:19 |
TheJulia | also, make sure dnsmasq is not crashing/erroring/restarting. That has been a pain relatively recently | 14:27 |
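Following TheJulia's advice to look for DHCP activity and leases, a small sketch that scans dnsmasq log lines for one node's MAC and reports how far the DHCP exchange got. It assumes the common dnsmasq format `DHCPDISCOVER(iface) MAC` / `DHCPOFFER(iface) IP MAC`; the exact format and log location depend on your dnsmasq version and how the Neutron DHCP agent is configured. Function names and the diagnosis strings are hypothetical.

```python
import re

_TYPE_RE = re.compile(r"\b(DHCP(?:DISCOVER|OFFER|REQUEST|ACK|NAK))\(")
_MAC_RE = re.compile(r"\b([0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5})\b")


def dhcp_exchange(log_lines, mac):
    """Return DHCP message types logged for `mac`, in first-seen order."""
    mac = mac.lower()
    seen = []
    for line in log_lines:
        msg_type = _TYPE_RE.search(line)
        found_mac = _MAC_RE.search(line)
        if msg_type and found_mac and found_mac.group(1).lower() == mac:
            if msg_type.group(1) not in seen:
                seen.append(msg_type.group(1))
    return seen


def diagnose(seen):
    """Map the furthest step reached to a rough next thing to check."""
    if "DHCPACK" in seen:
        return "lease granted"
    if "DHCPNAK" in seen:
        return "request refused (NAK)"
    if "DHCPOFFER" in seen:
        return "offer sent, no ACK: reply may not be reaching the node"
    if "DHCPDISCOVER" in seen:
        return "discover seen, no offer: check port/subnet config in Neutron"
    return "nothing logged for this MAC: packets are not reaching dnsmasq"
```

The last case corresponds to the most common problem mentioned above: the DHCP service is on a different physical network from the node.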
dtantsur | TheJulia: no objections re splitting indirection. Testing it with eventlet still present may get a useful data point. | 14:40 |
TheJulia | oh yeah, that is why I stacked them together since we don't want lock conflicts to creep into being a thing. I've noticed the before jobs seem to run a little longer by like 25% but that is still in the swing of CI in general | 14:41 |
TheJulia | before jobs being eventlet removed + metal3-integration | 14:41 |
TheJulia | but after, at least what I've looked at might be the fastest metal3-integration executions I've seen to date | 14:41 |
TheJulia | (31 minutes) | 14:50 |
TheJulia | 32 in the last run | 14:51 |
darkhackernc | TheJulia, I am simulating in a virtualized environment, so yes, I am using VMs, with vBMC (vbmc driver) handling the IPMI calls. | 14:51 |
drannou | Hello. I'm playing with soft RAID, building two distinct RAID 1 arrays with the classical JSON. I see in the doc that "is_root_volume" does not work with the mdadm implementation, so how do you tell IPA which RAID it should install the system on? Obviously my tests show that most of the time, it picks the wrong one :p | 14:52 |
darkhackernc | and I checked everything, no difference in the service; if you glance at https://pastebin.com/raw/K0Ymh6e8 you will see I have compiled all the relevant logs | 14:52 |
TheJulia | darkhackernc: you don't have neutron-dhcp-agent logs there | 14:53 |
darkhackernc | TheJulia, ohhh | 14:54 |
TheJulia | drannou: so, the model with software RAID configured by Ironic is, I think (and it has been a long time), that it's always the first device | 14:54 |
darkhackernc | let me compile that as well | 14:54 |
TheJulia | so first raid device configured | 14:54 |
TheJulia | darkhackernc: you should see leases getting logged as well, in theory. That gives you a solid indication if dhcp is working or not and starts to also point you in the direction of the root problem | 14:54 |
darkhackernc | drannou, a long time back when I was at Red Hat I tried that, let me check my git repo | 14:55 |
drannou | Hello Ironic team! I'm playing with soft RAID, building two distinct RAID 1 arrays with the classical JSON | 14:56 |
darkhackernc | drannou, https://github.com/NileshChandekar/rhosp16_2_sw_raid | 14:56 |
darkhackernc | check if this helping you | 14:56 |
drannou | Sorry, buffer issue, I sent the same message multiple times :p | 14:56 |
TheJulia | drannou: no worries | 14:58 |
drannou | darkhackernc: you also define "is_root_volume", but https://docs.openstack.org/ironic/latest/admin/raid.html#optional-properties says that it's not supported | 14:59 |
drannou | And I don't see anything implemented for that in IPA | 15:00 |
drannou | I was looking at implementing it (using the volume_name), but I was wondering whether that was the wrong direction; maybe someone already hit this and fixed it | 15:00 |
darkhackernc | drannou, then sorry, no idea, this is 3-year-old stuff | 15:02 |
drannou | TheJulia: I was also thinking that it would be the first device, but in my tests that doesn't seem to be the case. But maybe I can put the ROOT RAID at the end to verify :) | 15:03 |
TheJulia | I feel like kubajj is the new software raid expert | 15:03 |
opendevreview | Julia Kreger proposed openstack/ironic master: Clean-up misc eventlet references https://review.opendev.org/c/openstack/ironic/+/955632 | 15:03 |
kubajj | TheJulia: oh, no | 15:03 |
* | TheJulia grins evilly :) | 15:03 |
kubajj | TheJulia: I think I also came across the issue, but then solved it with forcing a root device hint | 15:04 |
TheJulia | drannou: I don't think is_root_volume is cared about/honored by the agent software raid | 15:04 |
kubajj | Can have a look into the 'is_root_volume' if needed | 15:04 |
TheJulia | ideally, yeah, you'd use a root device hint instead, but maybe there is a path where it should be honored. I guess it is also much harder to draw a direct line there | 15:05 |
drannou | The documentation seems to be outdated, as volume_name in soft raid is working: I named my raid devices and I can see them on the host | 15:06 |
TheJulia | drannou: written by different folks with different focuses. Most of the raid docs were written before software raid was a thing. | 15:07 |
kubajj | drannou: I don't think we need the volume name for the implementation, no?! | 15:07 |
drannou | I was checking hardware.py in IPA to patch it and select the corresponding named RAID in get_os_install_device, but it seems that cached_node does not have the RAID partitioning | 15:08 |
drannou | kubajj: this was just for a short test, I'm sure that (as the expert TheJulia says you are :p) you can do without it ;) | 15:09 |
kubajj | drannou: ah, makes sense. When creating RAID, you could add its volume name to the root device hints | 15:09 |
TheJulia | well, it can be refreshed, the issue at hand is unless the device is explicitly named you can't tie it to an automatic device determination short of somehow saving the intent to device name mapping | 15:10 |
kubajj | drannou: I am no expert :D just spent quite a lot of time trying to prevent RAIDs from being deleted when reinstalling | 15:10 |
drannou | kubajj: yes, and when we write the image, we just have to find the corresponding name | 15:10 |
kubajj | drannou: for finding the name, there is already a function raid_utils.get_volume_name_of_raid_device | 15:11 |
drannou | kubajj: I saw it, but I'm missing the node database information (or maybe I'm missing something) | 15:11 |
opendevreview | Doug Szumski proposed openstack/ironic-python-agent-builder master: Don't fail early if no config drive found https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/957253 | 15:12 |
kubajj | TheJulia: I am not sure if we have a different unique field of the raid which would be consistent (I am not sure if the indices of the md partitions could change, for example) | 15:14 |
kubajj | volume name should be unique | 15:14 |
opendevreview | Doug Szumski proposed openstack/ironic-python-agent-builder master: Don't fail early if no config drive found https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/957253 | 15:14 |
darkhackernc | https://bugzilla.redhat.com/show_bug.cgi?id=2032243 | 15:14 |
darkhackernc | https://bugzilla.redhat.com/show_bug.cgi?id=1317918 | 15:14 |
TheJulia | kubajj: I know the device name can change across reboots and that is sort of a know-your-platform detail. I've never done anything with custom volume names; the closest I've gotten to software RAID recently was trying to access my old NAS's hard disks, which were an LVM mirror set. | 15:15 |
drannou | kubajj: the problem is that we might need to force the customer to add a (unique) name; otherwise the matching might be complicated (or we need to store the information). On my side that would not be a problem because, for security reasons, I changed the way RAID is configured a little: the RAID is destroyed when the instance is deleted and re-created during the spawn | 15:18 |
drannou | this way we prevent a case where customer A changes the RAID configuration and customer B gets this new one (even if the host in the DB is marked with the old RAID) | 15:19 |
kubajj | drannou: the current default behavior is that if you change target raid config then you need to go to cleaning anyway, no? so should not be a problem | 15:22 |
kubajj | volume names need to be unique, enforced by https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/hardware.py#L3370-L3377 | 15:24 |
drannou | kubajj: depends on who is managing the infrastructure. In our case there is a huge separation between the admin (who configures RAID) and the final customer (public cloud way of working). In that case we want to be SURE that if the customer puts the host in rescue and changes the RAID configuration, we will destroy their change at the next recycle | 15:24 |
kubajj | drannou: so the target RAID config would not match the actual RAID config then? | 15:29 |
drannou | kubajj: if custA changes it manually? Yes, exactly. And as long as there is no check during recycle, you won't see it. Another problem we had (if I remember well) is that keeping the RAID during recycle caused heavy use of shred instead of NVMe instant erase (we only have NVMe hosts) | 15:32 |
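The missing "check during recycle" that drannou mentions amounts to comparing the intended RAID layout with what is actually on the disks. A minimal sketch, assuming every logical disk in `target_raid_config` carries a `volume_name` and that the actual arrays have already been collected (for example from `mdadm --detail`) into dicts with the same keys; `raid_drift` is a hypothetical helper, not an existing Ironic or IPA check.

```python
def raid_drift(target_raid_config, actual_arrays):
    """List differences between target and actual software RAID (hypothetical)."""
    wanted = {d["volume_name"]: d["raid_level"]
              for d in target_raid_config["logical_disks"]}
    found = {a["volume_name"]: a["raid_level"] for a in actual_arrays}
    problems = []
    for name in sorted(wanted.keys() - found.keys()):
        problems.append(f"missing array {name!r}")
    for name in sorted(found.keys() - wanted.keys()):
        problems.append(f"unexpected array {name!r}")
    for name in sorted(wanted.keys() & found.keys()):
        if wanted[name] != found[name]:
            problems.append(
                f"{name!r}: expected RAID {wanted[name]}, found RAID {found[name]}")
    return problems
```

An empty list means the layout matches; anything else would be a signal to rebuild rather than keep the existing arrays.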
darkhackernc | TheJulia++ thanks, I will look into that tomorrow | 15:34 |
kubajj | drannou: I understand, our use case is for our hypervisors. If we need to reinstall a hypervisor with RAID array, we want to keep the data of VMs, but reinstall the OS | 15:34 |
drannou | kubajj: makes sense | 15:39 |
kubajj | drannou: anyway, do you want me to have a look into the is_root_volume once I am done with debugging? https://review.opendev.org/c/openstack/ironic-python-agent/+/937342 | 15:40 |
drannou | kubajj: you might be better placed than me to fix that upstream, but I will keep reading/patching the code to better understand this part. How are you thinking of retrieving the volume_name? Add it to the cached_node information? | 15:44 |
TheJulia | cached node information can be replaced | 15:45 |
TheJulia | and comes from the api which you can't write to | 15:45 |
TheJulia | I think an internal mapping might make sense | 15:45 |
TheJulia | or to internally teach root device selection to check for software raid and attempt to identify is_root_volume | 15:45 |
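One way to read TheJulia's "internal mapping" idea: at root-device selection time, build a volume_name to /dev/mdN mapping from the name stored in each array (as shown by `mdadm --detail`), then resolve the logical disk marked is_root_volume through it. This sketch assumes, per the discussion above, that /dev/mdN indices may change across reboots while the array name does not, and that volume names are unique (which IPA already enforces). Both function names are hypothetical; in IPA the existing `raid_utils.get_volume_name_of_raid_device` would be the natural source of the names.

```python
import re

_NAME_RE = re.compile(r"^\s*Name\s*:\s*(\S+)", re.MULTILINE)


def volume_name_from_detail(detail_output):
    """Extract the array name from `mdadm --detail` output (hypothetical helper).

    mdadm usually prints 'Name : <homehost>:<name>'; keep the part after
    the last colon.
    """
    match = _NAME_RE.search(detail_output)
    if match is None:
        return None
    return match.group(1).rsplit(":", 1)[-1]


def pick_root_md_device(logical_disks, details_by_device):
    """Resolve is_root_volume to the current md device path (hypothetical)."""
    names = [d.get("volume_name") for d in logical_disks]
    if None in names or len(set(names)) != len(names):
        raise ValueError("every logical disk needs a unique volume_name")
    roots = [d["volume_name"] for d in logical_disks if d.get("is_root_volume")]
    if len(roots) != 1:
        raise ValueError("expected exactly one logical disk with is_root_volume")
    # Build the mapping at runtime instead of trusting /dev/mdN ordering.
    by_name = {volume_name_from_detail(out): dev
               for dev, out in details_by_device.items()}
    return by_name.get(roots[0])
```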
opendevreview | Clif Houck proposed openstack/ironic master: Add a new 'vendor' field to the Port object https://review.opendev.org/c/openstack/ironic/+/954966 | 16:01 |
opendevreview | Clif Houck proposed openstack/ironic master: Add a new 'category' field to the Port object https://review.opendev.org/c/openstack/ironic/+/955447 | 16:11 |
opendevreview | Clif Houck proposed openstack/ironic master: Add a new 'vendor' field to the Port object https://review.opendev.org/c/openstack/ironic/+/954966 | 16:15 |
opendevreview | Clif Houck proposed openstack/ironic master: Add a new 'category' field to the Port object https://review.opendev.org/c/openstack/ironic/+/955447 | 16:20 |
opendevreview | Clif Houck proposed openstack/ironic master: Add a new 'physical_network' field to the Portgroup object https://review.opendev.org/c/openstack/ironic/+/955625 | 16:34 |
opendevreview | Clif Houck proposed openstack/ironic master: Add a new 'physical_network' field to the Portgroup object https://review.opendev.org/c/openstack/ironic/+/955625 | 16:36 |
opendevreview | Clif Houck proposed openstack/ironic master: Add a new 'category' field to the Portgroup object https://review.opendev.org/c/openstack/ironic/+/955713 | 16:41 |
opendevreview | Clif Houck proposed openstack/ironic-tempest-plugin master: Change Portgroup minimum microversion to 1.26 https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/955799 | 16:43 |
opendevreview | Merged openstack/ironic-python-agent master: Fix for motherboards where efibootmgr returns UTF-8. https://review.opendev.org/c/openstack/ironic-python-agent/+/956068 | 16:45 |
opendevreview | Merged openstack/ironic master: Initialize variable to prevent an error https://review.opendev.org/c/openstack/ironic/+/956629 | 17:24 |
opendevreview | Nahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish https://review.opendev.org/c/openstack/sushy/+/955211 | 18:06 |
opendevreview | Nahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish https://review.opendev.org/c/openstack/sushy/+/955211 | 18:10 |
opendevreview | Nahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish https://review.opendev.org/c/openstack/sushy/+/955211 | 19:39 |
janders | good morning Ironic o/ | 20:53 |
janders | TheJulia couple questions regarding https://review.opendev.org/c/openstack/ironic/+/956972 before it gets too late in your TZ | 20:55 |
janders | 1) is it right to say we need to move from abort to unrescue as the verb and otherwise the code is close to merge ready? | 20:56 |
janders | 2) to me it feels like we also need something similar to https://review.opendev.org/c/openstack/ironic/+/957189 to enforce abortable flag in-step to be respected. Would you agree? | 20:57 |
janders | I'm working on the downstream side to this in BMO, we've got some time pressures so trying to test against in-flight patches, which is fine as long as behaviour doesn't change in a major way (can change verbs in the BMO code later). | 20:58 |
janders | coffee time in my TZ now | 21:05 |
opendevreview | cid proposed openstack/ironic master: Fix service failed state transitions for wait/hold https://review.opendev.org/c/openstack/ironic/+/957290 | 22:19 |