Tuesday, 2026-06-02

opendevreviewchandan kumar proposed openstack/cyborg master: Remove broken image signature verification  https://review.opendev.org/c/openstack/cyborg/+/99102708:36
opendevreviewchandan kumar proposed openstack/cyborg master: Remove broken image signature verification  https://review.opendev.org/c/openstack/cyborg/+/99102708:42
opendevreviewchandan kumar proposed openstack/cyborg-tempest-plugin master: Add scenario test for FPGA programming with FakeDriver  https://review.opendev.org/c/openstack/cyborg-tempest-plugin/+/99108113:23
jgilaber#startmeeting cyborg14:00
opendevmeetMeeting started Tue Jun  2 14:00:27 2026 UTC and is due to finish in 60 minutes.  The chair is jgilaber. Information about MeetBot at http://wiki.debian.org/MeetBot.14:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.14:00
opendevmeetThe meeting name has been set to 'cyborg'14:00
jgilaberHi all! Who is around today?14:00
jgilaberWhile we gather, feel free to add topics to the agenda https://etherpad.opendev.org/p/openstack-cyborg-irc-meeting#L4814:01
jgilabercourtesy ping: sean-k-mooney amoralej bogdando rlandy chandankumar14:01
chandankumaro/14:01
sean-k-mooneyo/14:01
jgilaberlet's give folks a minute to join and then we can start14:02
jgilaberok let's get started14:03
jgilaberwe have a topic from chandankumar 14:03
jgilaber#topic nvme cleanup stages spec proposal discussion14:03
jgilaber#link nvme cleanup stages spec proposal discussion14:03
jgilabergo ahead chandankumar 14:03
chandankumarsure14:04
chandankumarHere is the current spec for nvme secure cleanup https://review.opendev.org/c/openstack/cyborg-specs/+/985349/10/specs/2026.2/approved/generic-nvme-driver-with-secure-cleanup.rst14:04
chandankumarThank you everyone for reviewing the spec.14:04
chandankumarBelow is the cleanup flow for nvme device once user deletes an instance:14:04
chandankumar1. Instance deletion and the unbind process run separately.14:04
chandankumar2. During the unbind process, cyborg will disable the nvme device with the maintaining status by default and set reserved=total in placement14:04
chandankumar3. Unbind process will finish and cleanup will run async. 14:04
chandankumar4. if the cleanup finishes successfully, we set reserved=0 in placement, set a new cleanup_failed flag to false and enable the device14:04
chandankumar5. if cleanup fails, keep the device disabled in the maintaining state, set cleanup_failed to true and keep reserved=total in placement.14:04
chandankumar6. Operator can list the devices with cleanup_failed set and run cleanup manually on those devices.14:05
chandankumarIn an earlier patch iteration, I went with cleaning and cleaning_failed statuses but that was too many states.14:05
chandankumarSo I stuck with14:05
chandankumarthe default maintaining state and a cleanup_failed flag to keep the flow simpler.14:05
chandankumarDo we want to add a cleaning state to know the exact state? and if cleanup failed, we will stick with maintaining?14:05
sean-k-mooney so im not ok with reusing disabled for cleaning failed14:05
chandankumarHow do we want to handle that?14:05
sean-k-mooneythat was one of the pieces of feedback i gave on irc when you pushed the spec initially14:05
amoralejo/14:05
chandankumarsean-k-mooney: sorry I am not getting14:06
sean-k-mooneyi want to introduce a separate device_state field which will transition between available -> allocated -> cleaning and then to either error or available depending on if cleaning succeeded14:06
chandankumarok14:07
sean-k-mooneywe also need to set reserved=total during arq bind not during unbind14:07
sean-k-mooneyso on bind we would transition the device to allocated and set reserved=total14:08
sean-k-mooneyon unbind it moves to cleaning keeping reserved=total14:08
chandankumarabove device state sounds good, I was focused earlier on cleanup14:09
sean-k-mooneyand either end in error (reserved=total) or available (resetting reserved=0)14:09
chandankumaryes that approach seems better14:10
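The device_state flow discussed above can be sketched as a small state machine. This is only an illustration under stated assumptions: the names DeviceState, TRANSITIONS, next_state and reserved_for are hypothetical and not part of cyborg, and the recovery path out of the error state is left empty because it is still to be settled in the spec.

```python
from enum import Enum


class DeviceState(Enum):
    """Hypothetical device_state values taken from the discussion."""
    AVAILABLE = "available"
    ALLOCATED = "allocated"
    CLEANING = "cleaning"
    ERROR = "error"


# Allowed transitions: bind, unbind, then the cleanup result.
TRANSITIONS = {
    DeviceState.AVAILABLE: {DeviceState.ALLOCATED},  # ARQ bind
    DeviceState.ALLOCATED: {DeviceState.CLEANING},   # ARQ unbind
    DeviceState.CLEANING: {DeviceState.AVAILABLE, DeviceState.ERROR},
    DeviceState.ERROR: set(),  # recovery is left to the spec discussion
}


def next_state(current, target):
    """Return target if the transition is allowed, otherwise raise."""
    if target not in TRANSITIONS[current]:
        raise ValueError(
            f"invalid transition {current.value} -> {target.value}")
    return target


def reserved_for(state, total):
    """Placement 'reserved' value: 0 only while the device is available."""
    return 0 if state is DeviceState.AVAILABLE else total
```

With this shape, reserved=total is set at bind time and only drops back to 0 when cleaning succeeds, which matches the flow described in the meeting.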
chandankumarOne more thing, can the operator toggle the device state manually once cleanup finishes successfully?14:10
sean-k-mooneyim also wondering if we want to add a /device/<uuid>/clean endpoint to manually trigger cleaning14:10
sean-k-mooneythat would be an admin operation similar to program14:10
sean-k-mooneychandankumar: no14:11
sean-k-mooneythe admin cannot14:11
sean-k-mooneyif we add clean as a device action14:11
sean-k-mooneythat is how they would recover it14:11
sean-k-mooneywe could perhaps consider a way to force it14:11
jgilaberthat endpoint would only work for devices that are in error?14:12
sean-k-mooneybut this is really internal state that an admin should not have to change in a normal workflow14:12
sean-k-mooneyjgilaber: i would say error or available would be ok14:12
sean-k-mooneybut a 409 for allocated/cleaning14:12
jgilaberright, available would be fine, but redundant14:13
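The proposed behaviour of a manual clean action can be sketched as a simple status check. This is a hypothetical helper, not cyborg code: the function name clean_request_status is made up, and the 202 response for an accepted request is an assumption (the discussion only settled on 409 for allocated and cleaning devices).

```python
# States in which a manual clean request would be accepted.
CLEANABLE_STATES = {"error", "available"}
# States in which a manual clean request would conflict.
BUSY_STATES = {"allocated", "cleaning"}


def clean_request_status(device_state):
    """Return the HTTP status a clean request would get for this state."""
    if device_state in CLEANABLE_STATES:
        return 202  # assumption: accepted, cleanup runs asynchronously
    if device_state in BUSY_STATES:
        return 409  # conflict, as suggested in the discussion
    raise ValueError(f"unknown device state: {device_state}")
```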
sean-k-mooneywe could discuss if cleaning an allocated fpga before reprogramming it should be allowed or not in the spec14:13
sean-k-mooneythe other approach here for errored devices would be a cyborg-manage command or similar for operators to manually update the state14:14
sean-k-mooneyim a little reluctant to mirror nova's reset-state api14:14
jgilaberthat is currently proposed in the spec, right chandankumar?14:14
sean-k-mooneynova and cinder allow admins to reset the state of an instance after you have fixed the instance/volume14:14
chandankumarcurrently if an nvme device has cleanup_failed set to true, we need to run cleanup manually14:15
sean-k-mooneybut we have had a lot of issues with that in the past with customers or support causing more damage by using it than if we didnt provide it14:15
sean-k-mooneychandankumar: jgilaber  lets follow up in the spec on the exact mechanics14:16
sean-k-mooneyi do want to capture both the happy and error paths and ensure we have a documented workflow for both14:16
jgilaberack14:17
chandankumarI also **Manual recovery:** Operators retry failed cleanups using14:17
chandankumar``cyborg-nvme-cleanup --device <uuid>`` (requires admin credentials). The CLI14:17
chandankumartool re-triggers cleanup and resets ``cleanup_failed=False`` on success.14:17
chandankumarthat I have proposed in the spec14:18
chandankumaranyway let me address the current comments and new design based on above discussion14:18
chandankumarwe can follow up on spec14:18
sean-k-mooneyim not really a fan of providing a cli for the cleaning14:19
sean-k-mooneybut ack, i have some pending comments from when i first looked but i stopped at the problem description after my initial skim pass14:20
sean-k-mooneyill review it in detail this week ideally today or tomorrow14:20
sean-k-mooneyto be clear im not really a fan of having any driver specific clis14:20
chandankumarsean-k-mooney: I will ping you tomorrow for review once I update it based on above design, 14:21
sean-k-mooneycyborgs core role is to provide a hardware independent common api over the accelerators it manages so you should not need nvme specific clis but a generic way to trigger cleaning or programming is ok14:22
sean-k-mooneyack14:22
jgilaberack, is that all for this topic?14:23
chandankumarsure14:23
chandankumarthank you jgilaber sean-k-mooney !14:23
jgilaberthanks Chandan, let's move to reviews14:24
jgilaber#topic Reviews14:24
jgilaberwe have one14:24
jgilaber#link https://review.opendev.org/c/openstack/cyborg/+/99102714:24
jgilaberwith tempest tests https://review.opendev.org/c/openstack/cyborg-tempest-plugin/+/99108114:24
sean-k-mooneyi have 2 to add later14:24
chandankumarI was working on dropping the image verification code for device program functionality14:24
sean-k-mooneyso the fake driver already supports program14:25
chandankumarIt is dropping the code and marking existing verify_glance_signatures config as deprecated14:25
sean-k-mooneyhttps://review.opendev.org/c/openstack/cyborg/+/991027/2/cyborg/accelerator/drivers/fake.py#12714:26
chandankumarit has an update method, not a program one14:26
sean-k-mooneythat is what update is14:26
chandankumarshould I rename that?14:26
sean-k-mooneyno, update is what program is called in the driver interface14:26
chandankumarok14:26
sean-k-mooneyso thats a separate question14:26
sean-k-mooneywe could perhaps rename it but that is a large change14:26
sean-k-mooneysince it would be changing the public api of the drivers14:26
chandankumarprogram interface is currently used in fpga driver only not in generic one14:27
sean-k-mooneyagain that is both true and untrue14:27
sean-k-mooneyso from an api perspective we should not have any driver specific apis14:27
sean-k-mooneyso its more correct to say that the update api is a noop for drivers other than the fpga driver14:28
sean-k-mooneyrather than the program api is only for fpga14:28
sean-k-mooneythe ability to program or update a device with a glance image14:28
sean-k-mooneyis a generic capability that is not used by other drivers14:29
sean-k-mooneybut the nvme driver could support it as we have discussed in the past14:29
chandankumaryes14:29
chandankumarhttps://github.com/openstack/cyborg/blob/master/cyborg/accelerator/drivers/driver.py#L27 with respect to the current patch, how do we want to proceed? Reusing update at all places in a separate patch and calling the update flow via the program cli?14:30
jgilaberI'm looking quickly but the program api can't call the fake driver right?14:30
chandankumarjgilaber: yes14:30
sean-k-mooneyso the current patch is incorrect14:31
sean-k-mooneyhttps://github.com/openstack/cyborg/blob/master/cyborg/accelerator/drivers/driver.py#L26-L3514:31
jgilaberand is there any endpoint for update?14:31
sean-k-mooneyupdate is part of the base driver interface14:31
sean-k-mooneyand that is how programming should be invoked14:31
sean-k-mooneyat least that is my current understanding but ill need to review that in more detail to confirm14:32
chandankumarthere is no endpoint for update, we currently invoke program https://bugs.launchpad.net/openstack-cyborg/+bug/2144308/comments/114:32
jgilaberso the fpga implemented it incorrectly?14:32
sean-k-mooneyso there are 2 ways to program the fpgas14:32
chandankumar$CYBORG_URL/deployables/$DEPLOYABLE_UUID/program - was the api interface14:32
sean-k-mooneyyou can reference the image via the device profile14:32
sean-k-mooneyor you can manually do it via the deployable api14:33
jgilaberbut to test chandankumar's patch we need to call the api iirc14:33
sean-k-mooneyif we look at program its currently using PATCH which i want to fix in the future but PATCH is an "update" the same way PUT is14:34
jgilaberthe image verification was only in the program endpoint path14:34
chandankumarhttps://review.opendev.org/c/openstack/cyborg-tempest-plugin/+/991081/1/cyborg_tempest_plugin/services/cyborg_rest_client.py#154 this is how I am calling it in tempest14:34
sean-k-mooneyinternally its calling fpga_program over the rpc bus14:34
sean-k-mooneywhich, it looks like you are right, is calling driver.program14:35
sean-k-mooneywhich is an internal api14:35
sean-k-mooneythat should not be called by the manager14:36
sean-k-mooneyso i think we need to fix that first14:36
chandankumarah, yes correct, that needs to be fixed on tempest patch side14:36
chandankumarI also need to add a flag to disable it on older release14:37
sean-k-mooneyyes you should add a support_program flag or similar14:37
sean-k-mooneywe should have an accelerator_features config group or similar config section like the compute_enabled_feature tempest section14:38
sean-k-mooneywe can add fake_program True|False to that to guard the support14:38
sean-k-mooneyso looping back14:39
chandankumarsure, I will update both the patches based on above suggestion14:39
sean-k-mooneyhttps://github.com/openstack/cyborg/blob/2875d3c12d4484e9336ba5084f32f2acf83a2366/cyborg/accelerator/drivers/fpga/base.py14:39
chandankumarfirst one I need to re-read the conversation14:39
sean-k-mooneyshould likely be updated to add an implementation of update that calls program14:40
sean-k-mooneyand program should be renamed to _program or just removed from the fpga drivers14:40
sean-k-mooneyalternatively we could do the reverse and make program the public method and remove update14:40
sean-k-mooneywe should not really have both and the manager should never call a method on a driver that is not part of https://github.com/openstack/cyborg/blob/2875d3c12d4484e9336ba5084f32f2acf83a2366/cyborg/accelerator/drivers/driver.py14:41
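The first option discussed above (keep update as the public driver method and make program private) can be sketched as follows. This is a simplified illustration, not the cyborg code: the class names, the manager_program helper and the update(control_path, image_path) signature are assumptions made for the example.

```python
import abc


class GenericDriver(abc.ABC):
    """Stand-in for the base driver interface (simplified, hypothetical)."""

    @abc.abstractmethod
    def update(self, control_path, image_path):
        """Update the device with an image; return True on success."""


class NoopDriver(GenericDriver):
    """A driver that does not support programming: update is a no-op."""

    def update(self, control_path, image_path):
        return True


class FakeFPGADriver(GenericDriver):
    """FPGA-style driver: update delegates to a private _program helper."""

    def __init__(self):
        self.programmed = []

    def update(self, control_path, image_path):
        return self._program(control_path, image_path)

    def _program(self, control_path, image_path):
        # Hypothetical programming step; here it only records the call.
        self.programmed.append((control_path, image_path))
        return True


def manager_program(driver, control_path, image_path):
    """The manager only calls methods from the base interface."""
    return driver.update(control_path, image_path)
```

Because the manager only depends on update, any driver can be passed in, and drivers that cannot be programmed simply treat the call as a no-op.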
jgilaberI don't understand how the current version of chandankumar patches passed in ci with the new test14:42
jgilaberthe agent manager explicitly calls the fpga driver program method14:42
chandankumarjgilaber: it is just faking the device path, there is no real check14:42
sean-k-mooneyjgilaber: yes it does14:42
sean-k-mooneyjgilaber: well technically it doesnt14:43
sean-k-mooneyit called the program function on the driver object14:43
sean-k-mooneyjgilaber: by inheriting from the generic fpga class14:43
sean-k-mooneythe fake driver gained a program function14:43
sean-k-mooneywhich is why its working but thats not really the correct way this should work14:44
sean-k-mooneyjgilaber: does that make sense14:44
jgilabersort of, but I need to stare at it for a bit14:45
jgilaberI'll do that offline though14:45
jgilaberwe can continue14:45
sean-k-mooneyhttps://review.opendev.org/c/openstack/cyborg/+/991027/2/cyborg/accelerator/drivers/fake.py14:45
sean-k-mooneyso program calls update and update returns True14:45
sean-k-mooneyand because that added program the agent didnt explode when it did driver.program14:46
sean-k-mooneyhere https://github.com/openstack/cyborg/blob/2875d3c12d4484e9336ba5084f32f2acf83a2366/cyborg/agent/manager.py#L18414:46
sean-k-mooneybut cool lets move on14:46
jgilaberoh I misread the variable, I get it now thanks!14:47
chandankumarI will propose a patch to make the program method public and remove update, and come back to that patch again14:47
sean-k-mooneychandankumar: so meta comment14:47
sean-k-mooneyyou need to split this patch in 214:47
sean-k-mooneyhttps://review.opendev.org/c/openstack/cyborg/+/991027 should not both remove the image signature verification14:47
sean-k-mooneyand try and hook up programming support in the fake driver14:47
sean-k-mooneythose are 2 completely different activities so they should be in 2 different commits14:48
chandankumaryes, sure!14:48
sean-k-mooneyand based on the discussion above we may want to fix how the agent calls the driver etc as well in a separate commit14:48
chandankumarok14:49
chandankumarNow I have next course of action on these14:49
chandankumarthank you!14:50
jgilaberthanks chandankumar!14:50
jgilabersean-k-mooney, you mentioned before you wanted to highlight some patches?14:50
sean-k-mooneyyes i think im now happy enough with the current content to move forward with https://review.opendev.org/c/openstack/cyborg/+/98947014:51
sean-k-mooneyi do plan to add a followup with more detailed documentation on exactly how the kernel module works14:51
sean-k-mooneybut i want to work on that in parallel with creating some tempest tests to test the pci driver14:51
sean-k-mooneyas part of that change i have further tweaked the ci jobs14:52
sean-k-mooneyhttps://review.opendev.org/c/openstack/cyborg/+/989470/5/.zuul.yaml14:52
sean-k-mooneyso the default tempest job will now be multi node (the ipv6 one will be single node)14:52
chandankumarthank you for taking care of multinode changes in that patch14:52
sean-k-mooneythe jobs are also moved to debian 1314:53
sean-k-mooneyi may add ubuntu 24.04 at some point but it should already work on ubuntu 26.0414:53
sean-k-mooneyso im hoping to avoid that instead14:53
jgilaberdo we have a requirement to have some job on ubuntu?14:53
sean-k-mooneyits also tbd if i will add centos 10 stream support or not14:53
jgilaberwithout the new module I mean14:53
sean-k-mooneyif we need that for some reason we can consider it in the future14:53
sean-k-mooneyjgilaber: no14:54
sean-k-mooneywe can but we dont14:54
sean-k-mooneyat least not based on the current runtimes14:54
jgilaberack, then debian is fine14:54
sean-k-mooneybut the grenade jobs are kept on ubuntu14:54
sean-k-mooneywith the module disabled14:54
sean-k-mooneyso we actually still have coverage14:54
jgilaberperfect then14:54
sean-k-mooneyideally once 26.04 is working properly in the ci14:55
sean-k-mooneyi can move the ipv6 job to that14:55
sean-k-mooneygetting 24.04 to work likely would not be hard but for now i just wanted to minimise the kernel spread14:56
sean-k-mooneyi also want to highlight one other fix14:56
sean-k-mooneyhttps://review.opendev.org/c/openstack/cyborg/+/989470/5/devstack/lib/cyborg14:56
sean-k-mooneyits pretty minor but cyborg should not have nova's password14:57
chandankumarah yes correct14:57
sean-k-mooneyso we should be using the cyborg account to talk to nova and placement14:57
sean-k-mooneyso i just fixed that here14:57
jgilaber+1 good catch14:58
sean-k-mooneyany way im going to leave this open for a few days for ye to take a look14:58
sean-k-mooneyand then ill move forward with it at the end of the week if ye dont find anything concerning14:58
jgilaberthanks, I'll prioritise this review after I'm done with the specs14:58
sean-k-mooneythe other item very quickly is https://review.opendev.org/c/openstack/cyborg-specs/+/98900314:58
sean-k-mooneyill update the assignee to be me14:58
sean-k-mooneyjgilaber has already taken a look14:59
sean-k-mooneyif there is no other feedback on that ill proceed with that spec again later in the week or early next week14:59
sean-k-mooneymy overall plan is, get the pci driver testable in the gate, add some tempest tests for it, then start working on adding this functionality after15:00
sean-k-mooneyso ya if there is any other feedback let me know, thats all i had for reviews15:01
jgilaberthanks! 15:01
jgilaberwe're just over time15:01
jgilaberis there any last minute topic that you want to raise quickly15:01
jgilaber?15:01
sean-k-mooneynot critical but i noticed the nova spec for mdevs15:02
sean-k-mooneymerged15:02
jgilaberyes, it merged yesterday15:02
sean-k-mooneyso ill also try and loop back to your cyborg one15:02
jgilaberI'll try to get started on that work this week15:02
jgilaberthanks!15:02
jgilaberfinal topic15:02
jgilaber#topic Volunteers to chair next meeting15:02
jgilaberany volunteer?15:03
chandankumari will take care of next meeting15:03
jgilaberthanks!15:03
sean-k-mooney+115:03
jgilaberwe can leave it here for today, thanks all!15:03
jgilaber#endmeeting15:03
opendevmeetMeeting ended Tue Jun  2 15:03:35 2026 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:03
opendevmeetMinutes:        https://meetings.opendev.org/meetings/cyborg/2026/cyborg.2026-06-02-14.00.html15:03
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/cyborg/2026/cyborg.2026-06-02-14.00.txt15:03
opendevmeetLog:            https://meetings.opendev.org/meetings/cyborg/2026/cyborg.2026-06-02-14.00.log.html15:03
sean-k-mooneyjgilaber: quick question 15:06
sean-k-mooneyare you planning to start on the nova side of that spec first15:06
sean-k-mooneyi.e. reviving the patches for the mdev ci job and then the owner traits15:06
jgilaberyes, I was planning to start on the nova side15:07
sean-k-mooneyack thats what i was going to suggest since i feel like we will have less of a review bottleneck on the cyborg side15:07
jgilaberI need to look at the existing patches to revive though15:07
jgilaberdo you know all of them, or should I ask in the nova channel?15:08
sean-k-mooneyyou could just create your own but i can grab them quickly one sec15:08
sean-k-mooneyhttps://review.opendev.org/q/topic:%22mtty_support%2215:08
jgilaberthanks! I'll go through them15:09
chandankumarsean-k-mooney: when you get time, can you take a look at https://review.opendev.org/c/openstack/cyborg/+/986536 jgilaber has already reviewed it.15:11
sean-k-mooneyright, i remember that being discussed before15:15
sean-k-mooneysure15:15
chandankumarthank you!15:16
sean-k-mooneyfor now i think reporting the trait is fine15:16
sean-k-mooneyand we need to audit all the other drivers to ensure they all do this15:16
sean-k-mooneywe need to consider the request path later15:16
opendevreviewsean mooney proposed openstack/cyborg master: Add pci-sim developer guide  https://review.opendev.org/c/openstack/cyborg/+/99117719:00
opendevreviewGhanshyam Maan proposed openstack/cyborg-tempest-plugin master: Drop python 3.10  https://review.opendev.org/c/openstack/cyborg-tempest-plugin/+/99119619:20
