| opendevreview | chandan kumar proposed openstack/cyborg master: Remove broken image signature verification https://review.opendev.org/c/openstack/cyborg/+/991027 | 08:36 |
|---|---|---|
| opendevreview | chandan kumar proposed openstack/cyborg master: Remove broken image signature verification https://review.opendev.org/c/openstack/cyborg/+/991027 | 08:42 |
| opendevreview | chandan kumar proposed openstack/cyborg-tempest-plugin master: Add scenario test for FPGA programming with FakeDriver https://review.opendev.org/c/openstack/cyborg-tempest-plugin/+/991081 | 13:23 |
| jgilaber | #startmeeting cyborg | 14:00 |
| opendevmeet | Meeting started Tue Jun 2 14:00:27 2026 UTC and is due to finish in 60 minutes. The chair is jgilaber. Information about MeetBot at http://wiki.debian.org/MeetBot. | 14:00 |
| opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 14:00 |
| opendevmeet | The meeting name has been set to 'cyborg' | 14:00 |
| jgilaber | Hi all! Who is around today? | 14:00 |
| jgilaber | While we gather, feel free to add topics to the agenda https://etherpad.opendev.org/p/openstack-cyborg-irc-meeting#L48 | 14:01 |
| jgilaber | courtesy ping: sean-k-mooney amoralej bogdando rlandy chandankumar | 14:01 |
| chandankumar | o/ | 14:01 |
| sean-k-mooney | o/ | 14:01 |
| jgilaber | let's give folks a minute to join and then we can start | 14:02 |
| jgilaber | ok let's get started | 14:03 |
| jgilaber | we have a topic from chandankumar | 14:03 |
| jgilaber | #topic nvme cleanup stages spec proposal discussion | 14:03 |
| jgilaber | #link nvme cleanup stages spec proposal discussion | 14:03 |
| jgilaber | go ahead chandankumar | 14:03 |
| chandankumar | sure | 14:04 |
| chandankumar | Here is the current spec for nvme secure cleanup https://review.opendev.org/c/openstack/cyborg-specs/+/985349/10/specs/2026.2/approved/generic-nvme-driver-with-secure-cleanup.rst | 14:04 |
| chandankumar | Thank you everyone for reviewing the spec. | 14:04 |
| chandankumar | Below is the cleanup flow for nvme device once user deletes an instance: | 14:04 |
| chandankumar | 1. Instance deletion and the unbind process run separately. | 14:04 |
| chandankumar | 2. During the unbind process, cyborg will disable the nvme device, with the maintaining status by default, and set reserved=total in placement | 14:04 |
| chandankumar | 3. Unbind process will finish and cleanup will run async. | 14:04 |
| chandankumar | 4. if the cleanup finishes successfully, we set reserved=0 in placement, set a new cleanup_failed flag to false and enable the device | 14:04 |
| chandankumar | 5. if cleanup fails, keep the device disabled with the maintaining state, set cleanup_failed to true and keep reserved=total in placement. | 14:04 |
| chandankumar | 6. The operator can list the devices with cleanup_failed set and run cleanup manually on those devices. | 14:05 |
| chandankumar | Earlier, in a previous patch iteration, I went with cleaning and cleaning_failed statuses, but that was too many states. | 14:05 |
| chandankumar | So I stuck with | 14:05 |
| chandankumar | default maintaining state and cleanup_failed flag to keep the flow simpler. | 14:05 |
| chandankumar | Do we want to add a cleaning state to know the exact state? And if cleanup fails, do we stick with maintaining? | 14:05 |
| sean-k-mooney | so im not ok with reusing disabled for cleaning failed | 14:05 |
| chandankumar | How do we want to handle that? | 14:05 |
| sean-k-mooney | that was part of the feedback i gave on irc when you pushed the spec initially | 14:05 |
| amoralej | o/ | 14:05 |
| chandankumar | sean-k-mooney: sorry, I am not following | 14:06 |
| sean-k-mooney | i want to introduce a separate device_state field which will transition between available -> allocated -> cleaning and then to either error or available depending on whether cleaning succeeded | 14:06 |
| chandankumar | ok | 14:07 |
| sean-k-mooney | we also need to set reserved=total during arq bind, not during unbind | 14:07 |
| sean-k-mooney | so on bind we would transition the device to allocated and set reserved=total | 14:08 |
| sean-k-mooney | on unbind it moves to cleaning, keeping reserved=total | 14:08 |
| chandankumar | the above device state sounds good, I was focused earlier on cleanup | 14:09 |
| sean-k-mooney | and either end in error (reserved=total) or available (resetting reserved=0) | 14:09 |
| chandankumar | yes that approach seems better | 14:10 |
| chandankumar | One more thing: can the operator toggle the device state manually once cleanup finishes successfully? | 14:10 |
| sean-k-mooney | im also wondering if we want to add a /device/<uuid>/clean endpoint to manually trigger cleaning | 14:10 |
| sean-k-mooney | that would be an admin operation similar to program | 14:10 |
| sean-k-mooney | chandankumar: no | 14:11 |
| sean-k-mooney | the admin cannot | 14:11 |
| sean-k-mooney | if we add clean as a device action | 14:11 |
| sean-k-mooney | that is how they would recover it | 14:11 |
| sean-k-mooney | we could perhaps consider a way to force it | 14:11 |
| jgilaber | that endpoint would only work for devices that are in error? | 14:12 |
| sean-k-mooney | but this is really internal state that an admin should not have to change in a normal workflow | 14:12 |
| sean-k-mooney | jgilaber: i would say error or available would be ok | 14:12 |
| sean-k-mooney | but a 409 for allocated/cleaning | 14:12 |
| jgilaber | right, available would be fine, but redundant | 14:13 |
| sean-k-mooney | we could discuss in the spec if cleaning an allocated fpga before reprogramming it should be allowed or not | 14:13 |
| sean-k-mooney | the other approach here for errored devices would be a cyborg-manage command or similar for operators to manually update the state | 14:14 |
| sean-k-mooney | im a little reluctant to mirror nova's reset-state api | 14:14 |
| jgilaber | that is currently proposed in the spec, right chandankumar? | 14:14 |
| sean-k-mooney | nova and cinder allow admins to reset the state of an instance/volume after you have fixed it | 14:14 |
| chandankumar | currently, if an nvme device has cleanup_failed set to true, we need to run cleanup manually | 14:15 |
| sean-k-mooney | but we have had a lot of issues with that in the past, with customers or support causing more damage by using it than if we didnt provide it | 14:15 |
| sean-k-mooney | chandankumar: jgilaber lets follow up in the spec on the exact mechanics | 14:16 |
| sean-k-mooney | i do want to capture both the happy and error paths and ensure we have a documented workflow for both | 14:16 |
| jgilaber | ack | 14:17 |
| chandankumar | I also [...] **Manual recovery:** Operators retry failed cleanups using | 14:17 |
| chandankumar | ``cyborg-nvme-cleanup --device <uuid>`` (requires admin credentials). The CLI | 14:17 |
| chandankumar | tool re-triggers cleanup and resets ``cleanup_failed=False`` on success. | 14:17 |
| chandankumar | that I have proposed in the spec | 14:18 |
| chandankumar | anyway let me address the current comments and new design based on above discussion | 14:18 |
| chandankumar | we can follow up on spec | 14:18 |
| sean-k-mooney | im not really a fan of providing a cli for the cleaning | 14:19 |
| sean-k-mooney | but ack, i have some pending comments from when i first looked but i stopped at the problem description after my initial skim pass | 14:20 |
| sean-k-mooney | ill review it in detail this week, ideally today or tomorrow | 14:20 |
| sean-k-mooney | to be clear im not really a fan of having any driver specific clis | 14:20 |
| chandankumar | sean-k-mooney: I will ping you tomorrow for review once I update it based on the above design | 14:21 |
| sean-k-mooney | cyborgs core role is to provide a hardware independent common api over the accelerators it manages, so you should not need nvme specific clis, but a generic way to trigger cleaning or programming is ok | 14:22 |
| sean-k-mooney | ack | 14:22 |
| jgilaber | ack, is that all for this topic? | 14:23 |
| chandankumar | sure | 14:23 |
| chandankumar | thank you jgilaber sean-k-mooney ! | 14:23 |
| jgilaber | thanks Chandan, let's move to reviews | 14:24 |
| jgilaber | #topic Reviews | 14:24 |
| jgilaber | we have one | 14:24 |
| jgilaber | #link https://review.opendev.org/c/openstack/cyborg/+/991027 | 14:24 |
| jgilaber | with tempest tests https://review.opendev.org/c/openstack/cyborg-tempest-plugin/+/991081 | 14:24 |
| sean-k-mooney | i have 2 to add later | 14:24 |
| chandankumar | I was working on dropping the image verification code for device program functionality | 14:24 |
| sean-k-mooney | so the fake driver already supports program | 14:25 |
| chandankumar | It is dropping the code and marking existing verify_glance_signatures config as deprecated | 14:25 |
| sean-k-mooney | https://review.opendev.org/c/openstack/cyborg/+/991027/2/cyborg/accelerator/drivers/fake.py#127 | 14:26 |
| chandankumar | it has an update method, not a program one | 14:26 |
| sean-k-mooney | thats what update is | 14:26 |
| chandankumar | should I rename that? | 14:26 |
| sean-k-mooney | no, update is what program is called in the driver interface | 14:26 |
| chandankumar | ok | 14:26 |
| sean-k-mooney | so thats a separate question | 14:26 |
| sean-k-mooney | we could perhaps rename it but that is a large change | 14:26 |
| sean-k-mooney | since it would be changing the public api of the drivers | 14:26 |
| chandankumar | the program interface is currently used in the fpga driver only, not in the generic one | 14:27 |
| sean-k-mooney | again that is both true and untrue | 14:27 |
| sean-k-mooney | so from an api perspective we should not have any driver specific apis | 14:27 |
| sean-k-mooney | so its more correct to say that the update api is a noop for drivers other than the fpga driver | 14:28 |
| sean-k-mooney | rather than that the program api is only for fpga | 14:28 |
| sean-k-mooney | the ability to program or update a device with a glance image | 14:28 |
| sean-k-mooney | is a generic capability that is not used by other drivers | 14:29 |
| sean-k-mooney | but the nvme driver could support it as we have discussed in the past | 14:29 |
| chandankumar | yes | 14:29 |
| chandankumar | https://github.com/openstack/cyborg/blob/master/cyborg/accelerator/drivers/driver.py#L27 with respect to the current patch, how do we want to proceed? Reusing update everywhere in a separate patch and calling the update flow via the program cli? | 14:30 |
| jgilaber | I'm looking quickly but the program api can't call the fake driver right? | 14:30 |
| chandankumar | jgilaber: yes | 14:30 |
| sean-k-mooney | so the current patch is incorrect | 14:31 |
| sean-k-mooney | https://github.com/openstack/cyborg/blob/master/cyborg/accelerator/drivers/driver.py#L26-L35 | 14:31 |
| jgilaber | and is there any endpoint for update? | 14:31 |
| sean-k-mooney | update is part of the base driver interface | 14:31 |
| sean-k-mooney | and that is how programming should be invoked | 14:31 |
| sean-k-mooney | at least that is my current understanding but ill need to review that in more detail to confirm | 14:32 |
| chandankumar | there is no endpoint for update, we currently invoke program https://bugs.launchpad.net/openstack-cyborg/+bug/2144308/comments/1 | 14:32 |
| jgilaber | so the fpga implemented it incorrectly? | 14:32 |
| sean-k-mooney | so there are 2 ways to program the fpgas | 14:32 |
| chandankumar | $CYBORG_URL/deployables/$DEPLOYABLE_UUID/program - was the api interface | 14:32 |
| sean-k-mooney | you can reference the image via the device profile | 14:32 |
| sean-k-mooney | or you can manually do it via the deployable api | 14:33 |
| jgilaber | but to test chandankumar's patch we need to call the api iirc | 14:33 |
| sean-k-mooney | if we look at program its currently using PATCH, which i want to fix in the future, but PATCH is an "update" the same way put is | 14:34 |
| jgilaber | the image verification was only in the program endpoint path | 14:34 |
| chandankumar | https://review.opendev.org/c/openstack/cyborg-tempest-plugin/+/991081/1/cyborg_tempest_plugin/services/cyborg_rest_client.py#154 this is how i am calling it in tempest | 14:34 |
| sean-k-mooney | internally its calling fpga_program over the rpc bus | 14:34 |
| sean-k-mooney | which, it looks like you are right, is calling driver.program | 14:35 |
| sean-k-mooney | which is an internal api | 14:35 |
| sean-k-mooney | that should not be called by the manager | 14:36 |
| sean-k-mooney | so i think we need to fix that first | 14:36 |
| chandankumar | ah, yes correct, that needs to be fixed on tempest patch side | 14:36 |
| chandankumar | I also need to add a flag to disable it on older release | 14:37 |
| sean-k-mooney | yes you should add a support_program flag or similar | 14:37 |
| sean-k-mooney | we should have an accelerator_features config group or similar config section, like the compute-feature-enabled tempest section | 14:38 |
| sean-k-mooney | we can add fake_program True|False to that to guard the support | 14:38 |
| sean-k-mooney | so looping back | 14:39 |
| chandankumar | sure, I will update both the patches based on above suggestion | 14:39 |
| sean-k-mooney | https://github.com/openstack/cyborg/blob/2875d3c12d4484e9336ba5084f32f2acf83a2366/cyborg/accelerator/drivers/fpga/base.py | 14:39 |
| chandankumar | for the first one I need to re-read the conversation | 14:39 |
| sean-k-mooney | should likely be updated to add an implementation of update that calls program | 14:40 |
| sean-k-mooney | and program should be renamed to _program or just removed from the fpga drivers | 14:40 |
| sean-k-mooney | alternatively we could do the reverse and make program the public method and remove update | 14:40 |
| sean-k-mooney | we should not really have both, and the manager should never call a method on a driver that is not part of https://github.com/openstack/cyborg/blob/2875d3c12d4484e9336ba5084f32f2acf83a2366/cyborg/accelerator/drivers/driver.py | 14:41 |
| jgilaber | I don't understand how the current version of chandankumar patches passed in ci with the new test | 14:42 |
| jgilaber | the agent manager explicitly calls the fpga driver program method | 14:42 |
| chandankumar | jgilaber: it is just faking the device path, there is no real check | 14:42 |
| sean-k-mooney | jgilaber: yes it does | 14:42 |
| sean-k-mooney | jgilaber: well technically it doesnt | 14:43 |
| sean-k-mooney | it called the program function on the driver object | 14:43 |
| sean-k-mooney | jgilaber: by inheriting from the generic fpga class | 14:43 |
| sean-k-mooney | the fake driver gained a program function | 14:43 |
| sean-k-mooney | which is why its working, but thats not really the correct way this should work | 14:44 |
| sean-k-mooney | jgilaber: does that make sense | 14:44 |
| jgilaber | sort of, but I need to stare at it for a bit | 14:45 |
| jgilaber | I'll do that offline though | 14:45 |
| jgilaber | we can continue | 14:45 |
| sean-k-mooney | https://review.opendev.org/c/openstack/cyborg/+/991027/2/cyborg/accelerator/drivers/fake.py | 14:45 |
| sean-k-mooney | so program calls update and update returns True | 14:45 |
| sean-k-mooney | and because that added program, the agent didnt explode when it did driver.program | 14:46 |
| sean-k-mooney | here https://github.com/openstack/cyborg/blob/2875d3c12d4484e9336ba5084f32f2acf83a2366/cyborg/agent/manager.py#L184 | 14:46 |
| sean-k-mooney | but cool lets move on | 14:46 |
| jgilaber | oh I misread the variable, I get it now thanks! | 14:47 |
| chandankumar | I will propose a patch to make the program method public and remove update, and come back to that patch again | 14:47 |
| sean-k-mooney | chandankumar: so meta comment | 14:47 |
| sean-k-mooney | you need to split this patch in 2 | 14:47 |
| sean-k-mooney | https://review.opendev.org/c/openstack/cyborg/+/991027 should not both remove the image signature verification | 14:47 |
| sean-k-mooney | and try and hook up programming support in the fake driver | 14:47 |
| sean-k-mooney | those are 2 completely different activities so they should be in 2 different commits | 14:48 |
| chandankumar | yes, sure! | 14:48 |
| sean-k-mooney | and based on the discussion above we may want to fix how the agent calls the driver etc. as well, in a separate commit | 14:48 |
| chandankumar | ok | 14:49 |
| chandankumar | Now I have next course of action on these | 14:49 |
| chandankumar | thank you! | 14:50 |
| jgilaber | thanks chandankumar! | 14:50 |
| jgilaber | sean-k-mooney, you mentioned before you wanted to highlight some patches? | 14:50 |
| sean-k-mooney | yes i think im now happy enough with the current content to move forward with https://review.opendev.org/c/openstack/cyborg/+/989470 | 14:51 |
| sean-k-mooney | i do plan to add a followup with more detailed documentation on exactly how the kernel module works | 14:51 |
| sean-k-mooney | but i want to work on that in parallel with creating some tempest tests to test the pci driver | 14:51 |
| sean-k-mooney | as part of that change i have further tweaked the ci jobs | 14:52 |
| sean-k-mooney | https://review.opendev.org/c/openstack/cyborg/+/989470/5/.zuul.yaml | 14:52 |
| sean-k-mooney | so the default tempest job will now be multi node (the ipv6 one will be single node) | 14:52 |
| chandankumar | thank you for taking care of multinode changes in that patch | 14:52 |
| sean-k-mooney | the jobs are also moved to debian 13 | 14:53 |
| sean-k-mooney | i may add ubuntu 24.04 at some point but it should already work on ubuntu 26.04 | 14:53 |
| sean-k-mooney | so im hoping to avoid that instead | 14:53 |
| jgilaber | do we have a requirement to have some job on ubuntu? | 14:53 |
| sean-k-mooney | its also tbd if i will add centos 10 stream support or not | 14:53 |
| jgilaber | without the new module I mean | 14:53 |
| sean-k-mooney | if we need that for some reason we can consider it in the future | 14:53 |
| sean-k-mooney | jgilaber: no | 14:54 |
| sean-k-mooney | we can but we dont | 14:54 |
| sean-k-mooney | at least not based on the current runtimes | 14:54 |
| jgilaber | ack, then debian is fine | 14:54 |
| sean-k-mooney | but the grenade jobs are kept on ubuntu | 14:54 |
| sean-k-mooney | with the module disabled | 14:54 |
| sean-k-mooney | so we actually still have coverage | 14:54 |
| jgilaber | perfect then | 14:54 |
| sean-k-mooney | ideally once 26.04 is working properly in the ci | 14:55 |
| sean-k-mooney | i can move the ipv6 job to that | 14:55 |
| sean-k-mooney | getting 24.04 to work likely would not be hard but for now i just wanted to minimise the kernel spread | 14:56 |
| sean-k-mooney | i also want to highlight one other fix | 14:56 |
| sean-k-mooney | https://review.opendev.org/c/openstack/cyborg/+/989470/5/devstack/lib/cyborg | 14:56 |
| sean-k-mooney | its pretty minor but cyborg should not have nova's password | 14:57 |
| chandankumar | ah yes correct | 14:57 |
| sean-k-mooney | so we should be using the cyborg account to talk to nova and placement | 14:57 |
| sean-k-mooney | so i just fixed that here | 14:57 |
| jgilaber | +1 good catch | 14:58 |
| sean-k-mooney | any way im going to leave this open for a few days for ye to take a look | 14:58 |
| sean-k-mooney | and then ill move forward with it at the end of the week if ye dont find anything concerning | 14:58 |
| jgilaber | thanks, I'll prioritise this review after I'm done with the specs | 14:58 |
| sean-k-mooney | the other item very quickly is https://review.opendev.org/c/openstack/cyborg-specs/+/989003 | 14:58 |
| sean-k-mooney | ill update the assignee to be me | 14:58 |
| sean-k-mooney | jgilaber has already taken a look | 14:59 |
| sean-k-mooney | if there is no other feedback on that ill proceed with that spec again later in the week or early next week | 14:59 |
| sean-k-mooney | my overall plan is: get the pci driver testable in the gate, add some tempest tests for it, then start working on adding this functionality after | 15:00 |
| sean-k-mooney | so ya if there is any other feedback let me know, thats all i had for reviews | 15:01 |
| jgilaber | thanks! | 15:01 |
| jgilaber | we're just over time | 15:01 |
| jgilaber | is there any last minute topic that you want to raise quickly | 15:01 |
| jgilaber | ? | 15:01 |
| sean-k-mooney | not critical but i noticed the nova spec for mdevs | 15:02 |
| sean-k-mooney | merged | 15:02 |
| jgilaber | yes, it merged yesterday | 15:02 |
| sean-k-mooney | so ill also try and loop back to your cyborg one | 15:02 |
| jgilaber | I'll try to get started on that work this week | 15:02 |
| jgilaber | thanks! | 15:02 |
| jgilaber | final topic | 15:02 |
| jgilaber | #topic Volunteers to chair next meeting | 15:02 |
| jgilaber | any volunteer? | 15:03 |
| chandankumar | i will take care of next meeting | 15:03 |
| jgilaber | thanks! | 15:03 |
| sean-k-mooney | +1 | 15:03 |
| jgilaber | we can leave it here for today, thanks all! | 15:03 |
| jgilaber | #endmeeting | 15:03 |
| opendevmeet | Meeting ended Tue Jun 2 15:03:35 2026 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:03 |
| opendevmeet | Minutes: https://meetings.opendev.org/meetings/cyborg/2026/cyborg.2026-06-02-14.00.html | 15:03 |
| opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/cyborg/2026/cyborg.2026-06-02-14.00.txt | 15:03 |
| opendevmeet | Log: https://meetings.opendev.org/meetings/cyborg/2026/cyborg.2026-06-02-14.00.log.html | 15:03 |
| sean-k-mooney | jgilaber: quick question | 15:06 |
| sean-k-mooney | are you planning to start on the nova side of that spec first | 15:06 |
| sean-k-mooney | i.e. reviving the patches for the mdev ci job and then the owner traits | 15:06 |
| jgilaber | yes, I was planning to start on the nova side | 15:07 |
| sean-k-mooney | ack thats what i was going to suggest since i feel like we will have less of a review bottleneck on the cyborg side | 15:07 |
| jgilaber | I need to look at the existing patches to revive though | 15:07 |
| jgilaber | do you know all of them, or should I ask in the nova channel? | 15:08 |
| sean-k-mooney | you could just create your own but i can grab them quickly one sec | 15:08 |
| sean-k-mooney | https://review.opendev.org/q/topic:%22mtty_support%22 | 15:08 |
| jgilaber | thanks! I'll go through them | 15:09 |
| chandankumar | sean-k-mooney: when you get time, can you take a look at https://review.opendev.org/c/openstack/cyborg/+/986536 ? jgilaber has already reviewed it. | 15:11 |
| sean-k-mooney | right, i remember that being discussed before | 15:15 |
| sean-k-mooney | sure | 15:15 |
| chandankumar | thank you! | 15:16 |
| sean-k-mooney | for now i think reporting the trait is fine | 15:16 |
| sean-k-mooney | and we need to audit all the other drivers to ensure they all do this | 15:16 |
| sean-k-mooney | we need to consider the request path later | 15:16 |
| opendevreview | sean mooney proposed openstack/cyborg master: Add pci-sim developer guide https://review.opendev.org/c/openstack/cyborg/+/991177 | 19:00 |
| opendevreview | Ghanshyam Maan proposed openstack/cyborg-tempest-plugin master: Drop python 3.10 https://review.opendev.org/c/openstack/cyborg-tempest-plugin/+/991196 | 19:20 |
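
The device lifecycle sean-k-mooney outlined in the meeting (available -> allocated -> cleaning, then error or available, with reserved=total set at bind and reset to 0 only after a successful clean) can be sketched roughly as below. This is only an illustration of the discussion, not Cyborg code: the `Device` class, its method names and the `Conflict` exception (standing in for an HTTP 409) are all hypothetical, and setting reserved=total when a manual clean starts from available is an assumption the meeting did not settle.

```python
from enum import Enum


class DeviceState(Enum):
    AVAILABLE = "available"
    ALLOCATED = "allocated"
    CLEANING = "cleaning"
    ERROR = "error"


class Conflict(Exception):
    """Hypothetical stand-in for an HTTP 409 response."""


class Device:
    """Toy model of the proposed device_state field plus placement 'reserved'."""

    def __init__(self, total=1):
        self.total = total
        self.reserved = 0
        self.state = DeviceState.AVAILABLE

    def bind(self):
        # ARQ bind: mark the device allocated and reserve it in placement.
        if self.state is not DeviceState.AVAILABLE:
            raise Conflict(f"cannot bind while {self.state.value}")
        self.state = DeviceState.ALLOCATED
        self.reserved = self.total

    def unbind(self):
        # ARQ unbind: move to cleaning, keeping reserved=total.
        if self.state is not DeviceState.ALLOCATED:
            raise Conflict(f"cannot unbind while {self.state.value}")
        self.state = DeviceState.CLEANING

    def finish_cleaning(self, succeeded):
        # Async cleanup result: available (reserved=0) or error (reserved=total).
        if self.state is not DeviceState.CLEANING:
            raise Conflict(f"not cleaning (state is {self.state.value})")
        if succeeded:
            self.state = DeviceState.AVAILABLE
            self.reserved = 0
        else:
            self.state = DeviceState.ERROR

    def clean(self):
        # Manual admin-triggered clean: allowed from error or available,
        # rejected (409) for allocated or cleaning devices.
        if self.state in (DeviceState.ALLOCATED, DeviceState.CLEANING):
            raise Conflict(f"cannot clean while {self.state.value}")
        self.state = DeviceState.CLEANING
        self.reserved = self.total  # assumption: keep the device unschedulable
```

In this sketch an operator recovers an errored device by calling `clean()` again rather than by resetting the state directly, which matches the reluctance voiced in the meeting to mirror nova's reset-state API.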