*** dasm is now known as dasm|off | 02:00 | |
bauzas | good morning Nova | 06:57 |
* bauzas gets a SIGHUP after 3 weeks away | 06:57 | |
gibi | bauzas: welcome back | 07:48 |
bauzas | gibi: thanks | 07:50 |
*** elodilles_pto is now known as elodilles | 08:06 | |
gibi | the user_data patch looks good to me, but needs a second core. The rebuild of bfv also looks good to me, but I have some nits that can be fixed in the last patch or as a follow-up if needed | 09:08 |
*** brinzhang_ is now known as brinzhang | 09:09 | |
gibi | I will keep an eye on the manila series | 09:10 |
kgube | hi, I'm new to Nova development! I've been working on a blueprint+spec that I'd like to submit and get feedback on, but I have been wondering: should I hold off on this until after the Zed release? | 09:10 |
sean-k-mooney | kgube: if it's a spec you can submit it to the spec repo now | 09:17 |
gibi | kgube: welcome! | 09:17 |
sean-k-mooney | kgube: it might not get much review for the next week or two but you can feel free to create it | 09:17 |
gibi | kgube: it is a good time to submit the spec, as we will soon be freed up from Zed and start planning the AA cycle | 09:17 |
sean-k-mooney | kgube: once we release our first release candidate, the master branch will technically be open for AA cycle development | 09:18 |
sean-k-mooney | although we tend not to merge large changes until the final release is done | 09:18 |
gibi | kgube: also, if the spec is a bit controversial then we can add it to the AA PTG planning to discuss it in real time | 09:19 |
kgube | alright, thanks for the info! | 09:20 |
gibi | sean-k-mooney: I've replied to your question in https://review.opendev.org/c/openstack/nova/+/853835 | 09:20 |
sean-k-mooney | gibi: thanks, I'll look again shortly. most of the series I looked at looks good | 09:25 |
sean-k-mooney | I see a good split point again about halfway through the open patches | 09:25 |
sean-k-mooney | around here-ish https://review.opendev.org/c/openstack/nova/+/854440 | 09:26 |
kgube | sean-k-mooney: since AA is not yet available in the spec repo, should I submit it to Zed for now? | 09:26 |
sean-k-mooney | kgube: you can just create the directory locally like dan did here https://review.opendev.org/c/openstack/nova-specs/+/853837 | 09:27 |
sean-k-mooney | kgube: I'll create a patch to do it properly later today | 09:28 |
sean-k-mooney | and then you can rebase when it's merged | 09:28 |
sean-k-mooney | for now just copy the zed template and use that as the basis of the spec | 09:28 |
kgube | alright! | 09:28 |
sean-k-mooney | I don't think we plan to update it for Antelope currently | 09:29 |
bauzas | I was planning to do the AA paperwork right after Zed-3 FWIW | 09:34 |
sean-k-mooney | ack, well that's Thursday, so if you want to create it go for it, otherwise I'll go create it when I get time | 09:35 |
sean-k-mooney | we have two things to do for the specs repo | 09:35 |
sean-k-mooney | one: create the new folder and copy the template | 09:35 |
sean-k-mooney | two: run the script to move/symlink the implemented specs | 09:36 |
sean-k-mooney | that second task uses the blueprint state in launchpad to generate the list | 09:36 |
sean-k-mooney | so it's better to wait until next week, when everything is updated, before we do that | 09:36 |
gibi | sean-k-mooney: ack | 09:40 |
bauzas | sean-k-mooney: moving the specs is generally planned for RC1 https://docs.openstack.org/nova/latest/contributor/ptl-guide.html#milestone-3 | 09:52 |
bauzas | I mean approved => implemented | 09:53 |
sean-k-mooney | yep | 09:59 |
sean-k-mooney | that is to account for possible FFEs | 10:00 |
sean-k-mooney | kgube: what is the thing you have been working on by the way | 10:30 |
kgube | sean-k-mooney: support for extending attached file-based volumes (such as NFS volumes) | 10:38 |
sean-k-mooney | oh, that should already exist | 10:39 |
sean-k-mooney | provided the cinder volume supports it | 10:39 |
sean-k-mooney | I remember that case being special but I believe it should work | 10:39 |
sean-k-mooney | ah yes https://docs.openstack.org/cinder/latest/reference/support-matrix.html#operation_online_extend_support | 10:40 |
sean-k-mooney | Generic NFS Reference Driver (NFS): missing | 10:40 |
sean-k-mooney | but some vendor NFS drivers support it | 10:41 |
sean-k-mooney | Veritas Cluster NFS Driver (NFS): complete | 10:41 |
sean-k-mooney | NetApp Data ONTAP Driver (iSCSI, NFS, FC): complete | 10:41 |
sean-k-mooney | kgube: so that is likely a cinder feature rather than a nova one | 10:41 |
kgube | sean-k-mooney: https://bugs.launchpad.net/nova/+bug/1978294 | 10:41 |
sean-k-mooney | hum | 10:43 |
sean-k-mooney | I'm not sure if we really want nova to have logic to resize these volumes on behalf of the backend | 10:43 |
sean-k-mooney | I guess we can review your proposal | 10:43 |
sean-k-mooney | but I would probably push this towards os-brick | 10:44 |
kgube | the logic is already there, though | 10:44 |
sean-k-mooney | or require the backend driver to do it, not nova | 10:44 |
kgube | it's just exposed as an external server event only | 10:44 |
sean-k-mooney | changing the external events API to be synchronous is likely not something we can do | 10:44 |
kgube | The problem is that QEMU has to perform the resize | 10:45 |
sean-k-mooney | qemu or qemu image | 10:45 |
sean-k-mooney | *qemu-img | 10:45 |
kgube | qemu holds a lock on the attached file, which qemu-img respects | 10:46 |
kgube | so it refuses to resize | 10:46 |
sean-k-mooney | right, but you can call qemu-img with --force-share | 10:46 |
sean-k-mooney | my understanding is that is what the NFS driver normally did to resize the device | 10:47 |
sean-k-mooney | is that not the case? | 10:47 |
kgube | that does not work for modifying the image, afaik | 10:47 |
kgube | and the qemu-img documentation explicitly warns against modifying attached volumes | 10:48 |
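A minimal sketch of the qemu-img locking behaviour under discussion, assuming a running guest holds the image lock; the volume path is hypothetical:

```python
import subprocess

# Hypothetical path of an NFS-backed volume file attached to a running guest.
IMG = "/var/lib/nova/mnt/SHARE_HASH/volume-VOLUME_UUID"

# Read-only operations can bypass the image lock held by the running QEMU:
subprocess.run(["qemu-img", "info", "--force-share", IMG], check=True)

# Write operations such as resize respect the lock and are refused while the
# guest is running; --force-share only applies to read-only access, so this
# exits non-zero (a "Failed to get 'write' lock" style error) and check=True
# raises CalledProcessError:
subprocess.run(["qemu-img", "resize", IMG, "20G"], check=True)
```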
sean-k-mooney | i see | 10:49 |
sean-k-mooney | then rather than a new server action | 10:50 |
sean-k-mooney | I think you should model the new API on the existing /os-assisted-volume-snapshots API | 10:50 |
sean-k-mooney | https://docs.openstack.org/api-ref/compute/#create-assisted-volume-snapshots | 10:50 |
sean-k-mooney | actually https://docs.openstack.org/api-ref/compute/#assisted-volume-snapshots-os-assisted-volume-snapshots is a slightly better link | 10:51 |
sean-k-mooney | you could add an /os-assisted-volume-extend API for this use case | 10:52 |
sean-k-mooney | that can be blocking or non-blocking, as it's a new API for this exact use case | 10:52 |
sean-k-mooney | it's not really a server action like the other instance actions, hence a new top-level API | 10:53 |
kgube | ok, yeah, I guess that makes sense | 10:54 |
kgube | so Cinder does not need to care which instance the volume is attached to | 10:56 |
sean-k-mooney | well, I'm not actually sure why we don't include the instance uuid; it would make our life simpler | 10:58 |
sean-k-mooney | so I would probably include it in the request, as it's simpler for us to then make the RPC to the compute node work | 10:59 |
sean-k-mooney | I'm not actually sure that /os-assisted-volume-snapshots works properly for multi_attach volumes, for example | 11:00 |
sean-k-mooney | although I suspect that is not supported via NFS backends | 11:00 |
sean-k-mooney | so I would probably make the body something like (instance_uuid, volume_uuid, new_size) and perhaps include the attachment_uuid if that is useful, but it's probably not required | 11:02 |
kgube | hm, NetApp ONTAP seems to support multiattach | 11:02 |
sean-k-mooney | with that we can internally look up the instance.host and make an RPC to the compute to do the resize | 11:02 |
sean-k-mooney | kgube: I think that is likely only with iSCSI | 11:02 |
sean-k-mooney | you could make it work with NFS, I guess | 11:03 |
sean-k-mooney | but that will be much, much harder to support extend with | 11:03 |
sean-k-mooney | as we have multiple QEMU processes using it | 11:03 |
kgube | yeah | 11:03 |
kgube | I think this will only work if the file is read-only | 11:04 |
sean-k-mooney | for iSCSI the volume resize is going to happen on the NetApp SAN | 11:04 |
kgube | but then we can't resize it anyway | 11:04 |
kgube | or rather, QEMU won't | 11:04 |
sean-k-mooney | yep, that's an edge case which probably should be called out and validated on the cinder side | 11:07 |
sean-k-mooney | e.g. prevent multi-attach volumes on NFS from either being created or resized while attached, depending on what is more appropriate | 11:07 |
kgube | yeah, that is something cinder will have to do | 11:11 |
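For illustration, a guess at the request body shape for the proposed /os-assisted-volume-extend API, following the (instance_uuid, volume_uuid, new_size) suggestion above; every field name here is hypothetical, not an agreed design:

```python
# Hypothetical body for POST /os-assisted-volume-extend, mirroring the style
# of os-assisted-volume-snapshots; all names are illustrative only.
body = {
    "volume_extend": {
        # lets nova look up instance.host for the compute RPC
        "instance_uuid": "26f5b1f6-0000-0000-0000-000000000000",
        # the cinder volume being extended
        "volume_id": "a07f71dc-0000-0000-0000-000000000000",
        # requested size in GiB
        "new_size": 20,
    }
}
```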
opendevreview | Balazs Gibizer proposed openstack/nova master: Follow up for the PCI in placement series https://review.opendev.org/c/openstack/nova/+/855185 | 11:19 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Doc follow up for PCI in placement https://review.opendev.org/c/openstack/nova/+/855186 | 11:19 |
bauzas | reminder: nova meeting will happen today at 1600 UTC here in this channel | 12:23 |
bauzas | gibi: sean-k-mooney: I can take the meeting back | 12:24 |
* bauzas is currently looking at what was done during my 3.5-week PTO | 12:24 | |
gibi | bauzas: you can have the meeting back :) | 12:29 |
bauzas | not sure if I should say \o/ or /.o\ | 12:29 |
Uggla | Hi bauzas, I hope you enjoy your vacation. | 12:30 |
bauzas | I did | 12:30 |
Uggla | Just to let you know that unfortunately I could not attend today's meeting. :( | 12:31 |
bauzas | np | 12:31 |
kashyap | Uggla: s/enjoy/enjoyed/? I thought he's going to have another vacation :-P | 12:31 |
kashyap | (Which is also fine, if he's up for it.) | 12:31 |
Uggla | *enjoyed thx kashyap | 12:31 |
bauzas | haha, can't wait for someone saying I'm on "perpetual PTO" | 12:31 |
bauzas | you know what was fun? I was at a campsite for two weeks | 12:32 |
bauzas | the first week, the site was making its announcements in French | 12:32 |
bauzas | eventually, in the last week, like 70% of the people there were German | 12:33 |
bauzas | as it was the last week of the children's vacation in France, while some German Länder still have 3 weeks | 12:33 |
bauzas | so, the campsite folks were making the announcements in German | 12:33 |
bauzas | *in Corsica | 12:34 |
Uggla | I hope you had no problem with the storm in Corsica? | 12:34 |
bauzas | none | 12:36 |
bauzas | I was on the East | 12:36 |
bauzas | when the storm arrived around Ajaccio, we just got a small one, just some wind and a bit of rain for 10 mins | 12:38 |
bauzas | but the campsite asked us in the afternoon to make sure we could quickly move to a gym if needed the next evening, as a new storm was arriving | 12:39 |
bauzas | eventually, we didn't move | 12:39 |
Uggla | bauzas, good to know. I saw some impressive storm videos and thought about you hoping you were safe. | 12:45 |
bauzas | thanks | 12:45 |
bauzas | indeed this was impressive | 12:45 |
Uggla | clearly | 12:45 |
bauzas | I visited a small village in the mountains on the day the storm arrived | 12:46 |
bauzas | and yeah, they got problems with just the wind | 12:46 |
bauzas | 200km/h | 12:46 |
bauzas | fortunately, the mountain protected us | 12:46 |
kashyap | Yikes! | 12:54 |
ricolin | sean-k-mooney gibi stephenfin hey, can you give another review on https://review.opendev.org/c/openstack/nova/+/830646/ thanks | 13:38 |
sean-k-mooney | sure, have you started reworking https://review.opendev.org/c/openstack/nova/+/844507/ | 13:40 |
sean-k-mooney | we likely will want to merge both together | 13:41 |
sean-k-mooney | they probably should be in the other order too | 13:41 |
*** dasm|off is now known as dasm | 13:46 | |
stephenfin | ricolin: sure | 13:55 |
dansmith | whoami-rajat: are you working on gibi's comments? if not, I'll do it | 14:05 |
ricolin | sean-k-mooney: yes, see if I can push something out today | 14:09 |
gibi | stephenfin: sean-k-mooney made review progress in the PCI series, so if you have free cycles then we could merge some patches there. I also pushed a FUP on top with doc fixes based on your comments | 14:15 |
stephenfin | gibi: Also on my list | 14:16 |
gibi | stephenfin: thanks | 14:16 |
bauzas | I haven't yet had time to look at open reviews, but if people want me to look at some, \o | 14:29 |
* bauzas looks at the previous meetings and some changes | 14:29 | |
whoami-rajat | dansmith, sorry was away, I'm currently busy with the PTL nomination, after that I can address them | 14:50 |
dansmith | whoami-rajat: I'll do it | 14:51 |
whoami-rajat | dansmith, ack, thanks | 14:51 |
dansmith | just didn't want to step on toes if you were in the middle already | 14:51 |
sean-k-mooney | I have a few nits I'll be adding shortly | 14:51 |
sean-k-mooney | reviewing the series now | 14:51 |
sean-k-mooney | nothing worth holding it for, however | 14:51 |
whoami-rajat | ack, yeah I've been busy with cinder deadlines as well this week | 14:53 |
dansmith | sean-k-mooney: get them in so I can address | 14:54 |
sean-k-mooney | the nits on the first patch are up; I'll review the last two again shortly | 14:56 |
sean-k-mooney | no issues with the second patch, reviewing the last one now | 14:57 |
dansmith | oh, I thought you meant the last one | 15:01 |
dansmith | if they're just nits, I think we can/should do a FUP for the nits on the lower ones | 15:01 |
dansmith | I was going to revise the top one because it won't impact much | 15:01 |
sean-k-mooney | ack | 15:01 |
sean-k-mooney | my main gripe is _detach_device being used as the name of the function that detaches the root volume | 15:02 |
sean-k-mooney | since it's really, really ambiguous | 15:02 |
sean-k-mooney | but it's private/internal | 15:02 |
sean-k-mooney | and we can change that later | 15:02 |
sean-k-mooney | it's not going to affect the RPC or anything outside the compute manager and the tests for it | 15:02 |
sean-k-mooney | there were a couple of typos in the comments too, but again nothing requiring a respin | 15:03 |
sean-k-mooney | dansmith: do you have a way forward for https://review.opendev.org/c/openstack/nova/+/830883/27/nova/tests/functional/test_boot_from_volume.py | 15:04 |
sean-k-mooney | I can try to run that locally shortly and take a look | 15:05 |
dansmith | sean-k-mooney: I fixed it | 15:05 |
sean-k-mooney | ack | 15:05 |
dansmith | in the patchset I pushed after that | 15:05 |
dansmith | hence the zuul +1 | 15:05 |
sean-k-mooney | oh, that's an old comment | 15:05 |
dansmith | yeah | 15:05 |
sean-k-mooney | ok marked it done | 15:06 |
sean-k-mooney | ok done | 15:07 |
dansmith | ack | 15:08 |
opendevreview | Dan Smith proposed openstack/nova master: Add API support for rebuilding BFV instances https://review.opendev.org/c/openstack/nova/+/830883 | 15:09 |
sean-k-mooney | dansmith: I +2'd the userdata patch but left the +W to you or melwitt, by the way, since I respun it | 15:09 |
dansmith | sean-k-mooney: I haven't reviewed that one really yet | 15:09 |
dansmith | but I think melwitt did, so hopefully she can hit it | 15:10 |
sean-k-mooney | she had one pending comment from PS7 https://review.opendev.org/c/openstack/nova/+/816157/15/nova/objects/instance.py but I also don't think she considered it a blocker | 15:11 |
sean-k-mooney | it's a property on the OVO, so technically we could change that later without causing any object change; the one thing we can't change is the key in the system metadata table without a data migration | 15:12 |
sean-k-mooney | so I'm hoping she is ok with it as is | 15:13 |
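A rough sketch of the pattern being discussed, an object property backed by a system_metadata key: the Python-level property name can change later, but the persisted key cannot without a data migration. All names here are illustrative, not the actual patch:

```python
class Instance:  # heavily elided sketch, not the real Nova object
    @property
    def dirty_user_data(self):  # the property name is free to change later
        # 'dirty_user_data' as a system_metadata key is what persists in the
        # DB; renaming that key would require a data migration.
        return self.system_metadata.get('dirty_user_data') == 'True'
```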
jhartkopf | sean-k-mooney, gibi: Hey, just saw that the user data change already received two +2s from you. I have some adjustments (in response to nits) ready locally. Should I save these for a follow-up? | 15:23 |
sean-k-mooney | yes | 15:23 |
sean-k-mooney | please address them in a follow-up patch | 15:23 |
gibi | jhartkopf: o/ as sean-k-mooney suggests, let's address those nits in a separate patch. We intend to land the user_data patch as is today, to allow the next feature behind it to land too | 15:24 |
opendevreview | Merged openstack/nova master: Adapt websocketproxy tests for SimpleHTTPServer fix https://review.opendev.org/c/openstack/nova/+/853379 | 15:24 |
opendevreview | Balazs Gibizer proposed openstack/nova-specs master: Update the PCI in placement spec https://review.opendev.org/c/openstack/nova-specs/+/855218 | 15:26 |
gibi | sean-k-mooney, stephenfin: updated the PCI spec to reflect the implementation reality ^^ | 15:27 |
sean-k-mooney | ack | 15:27 |
jhartkopf | sean-k-mooney, gibi: Alright, just wanted to make sure :) | 15:28 |
bauzas | gibi: thanks (to you or whoever else) for creating https://etherpad.opendev.org/p/nova-zed-blueprint-status, this helps my work | 15:32 |
gibi | bauzas: I think I noted somewhere that you owe me a beer for that :) | 15:33 |
bauzas | gibi: if we were traveling to a PTG, I could have made it | 15:34 |
gibi | yeah I know :) | 15:34 |
bauzas | but, I can only give you a e-beer | 15:34 |
*** artom__ is now known as artom | 15:34 | |
* bauzas discovered this the day before he moved to Corsica | 15:34 | |
gibi | bauzas: we can take that beer in Vancouver then | 15:34 |
bauzas | we need to progress then on sustainability efforts as I would like to present those :) | 15:35 |
bauzas | need then* | 15:35 |
bauzas | reminder: nova meeting in 15 mins here | 15:46 |
gibi | stephenfin, sean-k-mooney: I prepared a respin of the PCI series to fix two issues sean-k-mooney spotted in the Create RequestGroups from InstancePCIRequests and in Support resource_class and traits in PCI alias. Should I push it now or hold on while you are reviewing? | 15:52 |
dansmith | anybody seen this yet? https://zuul.opendev.org/t/openstack/build/5088531548e2497f8a59ddd8c8b89de8 | 15:53 |
dansmith | requirements blocking job setup | 15:53 |
dansmith | I don't see a lot of other fails, so either it's unstable or just happened | 15:54 |
dansmith | but found it in a manila job 7 mins ago | 15:54 |
* dansmith moves to -qa | 15:56 | |
sean-k-mooney | no, neither has had a release recently and both are 3.7+ | 15:57 |
gibi | yeah, I haven't seen such a failure yet | 15:58 |
sean-k-mooney | I wonder if it is stable | 15:58 |
sean-k-mooney | usually you see a failure and a retry of the same package multiple times when this happens | 15:59 |
sean-k-mooney | I'm not seeing that | 15:59 |
bauzas | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Aug 30 16:00:15 2022 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
bauzas | hey folks, happy to see you all again | 16:00 |
gmann | o/ | 16:00 |
elodilles | o/ | 16:00 |
bauzas | ok, let's start, people will arrive | 16:01 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:02 |
bauzas | #topic Bugs (stuck/critical) | 16:02 |
bauzas | #info One Critical bug | 16:02 |
bauzas | but, | 16:02 |
bauzas | #link https://bugs.launchpad.net/nova/+bug/1986545 Just sent to the gate | 16:02 |
bauzas | so the fix should be merged soon | 16:03 |
bauzas | thanks melwitt for having worked on it | 16:03 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 7 new untriaged bugs (-1 since the last meeting) | 16:03 |
bauzas | #link https://storyboard.openstack.org/#!/project/openstack/placement 26 open stories (-1 since the last meeting) in Storyboard for Placement | 16:03 |
bauzas | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:03 |
gibi | o/ | 16:03 |
bauzas | Uggla wasn't able to attend this meeting, but... | 16:03 |
bauzas | #info bug baton is being passed to Uggla | 16:04 |
* bauzas is testing a new leadership, mwawawa | 16:04 | |
bauzas | mwahaha even | 16:04 |
gibi | :) | 16:04 |
bauzas | thanks all who looked at bugs while I was off, the progress is impressive | 16:05 |
bauzas | any bug to raise before we move on? | 16:05 |
bauzas | guess not | 16:06 |
bauzas | #topic Gate status | 16:06 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:06 |
bauzas | #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly Placement periodic job status | 16:06 |
bauzas | #link https://zuul.openstack.org/builds?job_name=tempest-integrated-compute-centos-9-stream&project=openstack%2Fnova&pipeline=periodic-weekly Centos 9 Stream periodic job status | 16:07 |
bauzas | #link https://zuul.opendev.org/t/openstack/builds?job_name=nova-emulation&pipeline=periodic-weekly&skip=0 Emulation periodic job runs | 16:07 |
bauzas | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:07 |
bauzas | all the job runs are green | 16:07 |
bauzas | haven't seen the gate today, but I have seen that nova-next was failing last week | 16:07 |
bauzas | thanks all who worked on fixing the gate | 16:08 |
bauzas | that being said, any gate failure recently spotted or requiring to be discussed today ? | 16:08 |
gibi | gate seems to be good given we have FF week | 16:09 |
bauzas | indeed | 16:09 |
bauzas | and what a good point for passing to the next topic | 16:10 |
bauzas | :) | 16:10 |
bauzas | #topic Release Planning | 16:10 |
bauzas | #link https://releases.openstack.org/zed/schedule.html | 16:11 |
bauzas | #info Zed-3 is in *2 days* | 16:11 |
bauzas | #link Zed tracking etherpad: https://etherpad.opendev.org/p/nova-zed-blueprint-status | 16:11 |
bauzas | #link https://etherpad.opendev.org/p/nova-zed-microversions-plan | 16:11 |
bauzas | I've looked at the blueprint status | 16:11 |
bauzas | I'll do the launchpad status cleanup later this week | 16:11 |
bauzas | to make sure it shows the right progress | 16:11 |
bauzas | correct me if I'm wrong, but for the moment I'm seeing 3 different series to focus reviews on | 16:12 |
bauzas | oh my bad, 5 | 16:13 |
bauzas | from what I've seen, there's review momentum on the userdata series | 16:13 |
gmann | for RBAC, this is the last patch, for documentation and releasenotes #link https://review.opendev.org/c/openstack/nova/+/854882 | 16:14 |
gibi | I think user_data + rebuild bfv can land today | 16:14 |
gmann | sean-k-mooney is waiting for gibi to review, I think? | 16:14 |
bauzas | gibi: and they have a proper microversion, so no conflict is expected besided a small one | 16:14 |
gibi | gmann: ohh, I thought I pushed +A there | 16:14 |
gibi | gmann: I will fix it right away | 16:14 |
bauzas | besides* | 16:14 |
gmann | gibi: thanks | 16:15 |
bauzas | I'll put my attention to the open reviews tomorrow morning | 16:15 |
gibi | gmann: done | 16:15 |
gmann | thanks again | 16:16 |
dansmith | the top of the bfv one, | 16:16 |
bauzas | fun fact: I'm asked to write our Zed cycle highlights while approx. 50% of the open blueprints are still under review | 16:16 |
bauzas | poke elodilles ;) | 16:16 |
dansmith | just hit a pypi indexing fail, so it will have to be rechecked when it finishes | 16:16 |
sean-k-mooney | gmann: yep, I was going to approve it if it was not approved by gibi by my end of day | 16:16 |
* elodilles is shocked :-o | 16:16 | |
bauzas | do people need a group discussion here at the meeting for any of those series? | 16:16 |
bauzas | or can we continue to review off the meeting? | 16:17 |
gmann | sean-k-mooney: thanks. | 16:17 |
sean-k-mooney | i think we can continue off meeting for the most part | 16:17 |
bauzas | cool | 16:17 |
bauzas | #topic Review priorities | 16:18 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2) | 16:18 |
bauzas | most of them are for bugs | 16:18 |
bauzas | I'll look at them after FF | 16:19 |
bauzas | for the two series having a review-prio flag, they are already under great review attention and close to being merged | 16:19 |
bauzas | #topic Stable Branches | 16:20 |
bauzas | elodilles: want to continue ? | 16:20 |
elodilles | yepp, thanks | 16:20 |
elodilles | though, since we are around FF, not much is happening around stable branches | 16:21 |
elodilles | #info stable/stein (and older) are blocked: grenade and other devstack based jobs fail with the same timeout issue as stable/train was previously | 16:21 |
elodilles | #info newer branches should be OK | 16:21 |
elodilles | #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci | 16:21 |
elodilles | that's it | 16:22 |
bauzas | thanks | 16:22 |
bauzas | anything to point out? | 16:23 |
bauzas | if not, we can wrap it up | 16:23 |
elodilles | nothing else from me | 16:23 |
bauzas | ok, last but not least | 16:24 |
bauzas | #topic Open discussion | 16:24 |
bauzas | nothing is on the agenda | 16:24 |
bauzas | and we're 2 days from FF | 16:24 |
bauzas | anything urgent to raise? | 16:24 |
bauzas | I maybe have one | 16:24 |
bauzas | this is premature and I don't have the full picture in my mind of the whole current Zed status, but do people feel we should grant some exceptions? | 16:25 |
bauzas | proactively I mean | 16:25 |
bauzas | not which ones | 16:25 |
bauzas | but should we all agree on discussing this if needed ? | 16:25 |
gibi | we have 2 weeks between FF and RC! | 16:26 |
gibi | RC1 | 16:26 |
sean-k-mooney | realistically, I think if we were to, I would only extend until the next meeting | 16:26 |
gibi | yeah | 16:27 |
bauzas | yup, me too | 16:27 |
bauzas | if you don't mind, let's just say we're not against that, provided at least one core has agreed to a series | 16:27 |
gibi | I want to believe that we can land the PCI series but it is a lot of patches | 16:27 |
sean-k-mooney | maybe revisit on Thursday | 16:27 |
bauzas | and provided we're able to merge by next week | 16:27 |
sean-k-mooney | gibi: i think we can land enough of it to be useful | 16:27 |
bauzas | sean-k-mooney: correct, that was my point | 16:27 |
sean-k-mooney | unsure if all of it | 16:27 |
bauzas | we're just in a meeting | 16:28 |
bauzas | ok, looks like we have a consensus | 16:28 |
gibi | sean-k-mooney: yeah I feel the same but I would like to aim for all :D | 16:28 |
sean-k-mooney | me too | 16:28 |
bauzas | we'll just leave room for this, but we'll consider it on Thursday on a case-by-case basis anyway | 16:28 |
bauzas | that's all I wanted | 16:28 |
sean-k-mooney | +1 | 16:28 |
bauzas | that's it for me then | 16:29 |
bauzas | unless someone else has something to raise, we can close | 16:29 |
sean-k-mooney | just one other thing to note: dansmith noticed a possible PyPI CDN issue that may block things from merging for a bit | 16:29 |
sean-k-mooney | hopefully it will resolve itself shortly | 16:29 |
gibi | fingers crossed | 16:29 |
bauzas | shit, ok | 16:30 |
* bauzas will send hugs to the pypi operators | 16:30 | |
bauzas | anyway, thanks all | 16:31 |
sean-k-mooney | there is nothing we can do really so no need to worry right now | 16:31 |
sean-k-mooney | o/ | 16:31 |
bauzas | #endmeeting | 16:31 |
opendevmeet | Meeting ended Tue Aug 30 16:31:17 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:31 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2022/nova.2022-08-30-16.00.html | 16:31 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2022/nova.2022-08-30-16.00.txt | 16:31 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2022/nova.2022-08-30-16.00.log.html | 16:31 |
elodilles | o/ | 16:31 |
sean-k-mooney | elodilles: before you go | 16:32 |
sean-k-mooney | elodilles: have you seen https://9f27113a1a10a64b577d-d92e3f8cc209d4a3d1e66263399702fb.ssl.cf1.rackcdn.com/855022/3/check/openstack-tox-py39/62146ba/testr_results.html | 16:32 |
sean-k-mooney | reference and actual look pretty identical to me | 16:33 |
sean-k-mooney | and it passed on py36 | 16:33 |
sean-k-mooney | I was going to try running that locally to confirm, but just wondering if I should just recheck if it passes | 16:33 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Create RequestGroups from InstancePCIRequests https://review.opendev.org/c/openstack/nova/+/852771 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Support resource_class and traits in PCI alias https://review.opendev.org/c/openstack/nova/+/853316 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Split PCI pools per PF https://review.opendev.org/c/openstack/nova/+/854440 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Map PCI pools to RP UUIDs https://review.opendev.org/c/openstack/nova/+/854118 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Make allocation candidates available for scheduler filters https://review.opendev.org/c/openstack/nova/+/854119 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Filter PCI pools based on Placement allocation https://review.opendev.org/c/openstack/nova/+/854120 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Factor out base class for candidate aware filters https://review.opendev.org/c/openstack/nova/+/854929 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Store allocated RP in InstancePCIRequest https://review.opendev.org/c/openstack/nova/+/854121 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Func test for PCI in placement scheduling https://review.opendev.org/c/openstack/nova/+/854122 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Support cold migrate and resize with PCI tracking in placement https://review.opendev.org/c/openstack/nova/+/854247 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Support evacuate with PCI in placement https://review.opendev.org/c/openstack/nova/+/854615 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Support unshelve with PCI in placement https://review.opendev.org/c/openstack/nova/+/854616 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Support same host resize with PCI in placement https://review.opendev.org/c/openstack/nova/+/854441 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Test reschedule with PCI in placement https://review.opendev.org/c/openstack/nova/+/854626 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Test multi create with PCI in placement https://review.opendev.org/c/openstack/nova/+/854663 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Allow enabling PCI scheduling in Placement https://review.opendev.org/c/openstack/nova/+/854924 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Follow up for the PCI in placement series https://review.opendev.org/c/openstack/nova/+/855185 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Doc follow up for PCI in placement https://review.opendev.org/c/openstack/nova/+/855186 | 16:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Drop InstanceEventFixture https://review.opendev.org/c/openstack/nova/+/855262 | 16:36 |
sean-k-mooney | elodilles: ya, those all passed locally with py39 | 16:37 |
sean-k-mooney | I'll run them a few times, but I think that's just an intermittent failure perhaps | 16:37 |
gibi | dansmith: if you could +A https://review.opendev.org/c/openstack/nova/+/816157 then we could start landing the rebuild bfv too | 16:38 |
gibi | I'm going to get some fresh air but I will check back later to see if my +2 is needed somewhere | 16:42 |
dansmith | gibi: I said above, but I haven't reviewed that one at all | 16:55 |
dansmith | and I think melwitt did, so probably best to let her ack it | 16:55 |
dansmith | let's give her a few hours but if she doesn't pop up today maybe I can review it enough to ack it | 16:56 |
elodilles | sean-k-mooney: hmmmm, looks strange. the difference is some alignment on the title of the tables. thanks, I'll try to look into it tomorrow. | 16:56 |
sean-k-mooney | they passed locally, so I'll recheck the first patch and see | 17:07 |
gmann | bauzas: just a heads up, your PTL nomination patch is still showing WIP/merge conflict, maybe you need to rebase on master? https://review.opendev.org/c/openstack/election/+/852630 | 17:12 |
sean-k-mooney | I'm going to go get something to eat; I might do some reviews later | 17:19 |
gibi | dansmith: ack | 17:29 |
dansmith | gibi: ah crap, I dunno why my delete of that instance event file didn't work | 18:30 |
dansmith | ah, my over-use of shell macros I bet | 18:32 |
opendevreview | Dan Smith proposed openstack/nova master: Add API support for rebuilding BFV instances https://review.opendev.org/c/openstack/nova/+/830883 | 18:32 |
dansmith | gibi: as you said, we might as well keep working on the top patch until it gets closer to the bottom, so ^ | 18:32 |
gibi | dansmith: sure, I abandoned mine | 18:46 |
gibi | and put the +2 back to the top of bfv | 18:46 |
dansmith | gibi: I'm about to commit some comments on the user_data patch, | 18:55 |
dansmith | can you have a look? | 18:55 |
dansmith | gibi: specifically this: https://review.opendev.org/c/openstack/nova/+/816157/comments/f31f7593_ff5dd9fc | 18:57 |
dansmith | isn't this just using instance sysmeta to avoid an RPC bump? | 18:58 |
dansmith | and thus it's side-stepping any RPC or object versioning we'd normally have for something like this? | 18:58 |
dansmith | rebuild is basically growing a new feature, and we need to pass it a flag, | 18:58 |
dansmith | but instead of the flag and version bump, we're stashing it in instance sysmeta | 18:58 |
dansmith | which means we can't fail the call with "sorry we're pinned to RPC 6.1 so we can't do that right now" | 18:59 |
dansmith | sean-k-mooney: ^ | 18:59 |
sean-k-mooney[m] | the original idea was to allow updating the user data and then regenerating the config drive the next time we reboot the instance | 19:01 |
dansmith | but that's not how it's implemented now right? | 19:01 |
sean-k-mooney[m] | although I think we changed our minds and blocked update with config drive | 19:01 |
dansmith | right | 19:01 |
sean-k-mooney[m] | so I'm not sure why we are blocking it now | 19:02 |
sean-k-mooney[m] | but that is why it was done that way | 19:02 |
dansmith | why does hard reboot not just always regenerate config drive? | 19:02 |
sean-k-mooney[m] | no | 19:02 |
sean-k-mooney[m] | it does not do it today at all | 19:02 |
dansmith | I'm saying.. why not just make it regenerate always | 19:03 |
dansmith | instead of the dirty flag | 19:03 |
dansmith | then you get to say it's best-effort, based on compute and virt support for doing so | 19:03 |
sean-k-mooney[m] | that would need to pull the data from the DB, but I guess we could | 19:03 |
dansmith | so? | 19:04 |
sean-k-mooney[m] | just saying that's the side effect | 19:04 |
dansmith | the way it is right now, you've basically created a shadow RPC interface with no versioning which also accumulates in the DB | 19:04 |
sean-k-mooney[m] | I think we wanted to avoid that | 19:04 |
sean-k-mooney[m] | I'm trying to think if there is any other downside to always doing it | 19:05 |
sean-k-mooney[m] | we added a trait to signal if the backend supports regenerating it | 19:05 |
dansmith | yeah, which also requires that we ask placement if *we* support a thing, which seems kinda odd :) | 19:06 |
sean-k-mooney[m] | well, it's a compute capability trait | 19:06 |
sean-k-mooney[m] | we have several like that | 19:06 |
dansmith | but we don't have to check placement for that right? | 19:06 |
sean-k-mooney[m] | um, I don't think it's in the API DB, so normally I think we do check placement | 19:07 |
dansmith | the ones that we use for scheduler filtering make sense of course, but I thought we wrote them somewhere we could get at them ourselves | 19:07 |
dansmith | anyway | 19:08 |
dansmith | the shadow RPC interface seems much worse to me | 19:08 |
sean-k-mooney[m] | I don't think it's in the compute nodes table, so I'm not sure where they would be | 19:08 |
sean-k-mooney[m] | I guess we were not really thinking of it as an RPC interface | 19:09 |
sean-k-mooney[m] | just some metadata on the instance | 19:09 |
sean-k-mooney[m] | but I see your point | 19:09 |
dansmith | well the test is, that if you ran this under grenade, | 19:09 |
dansmith | you'd allow the reboot with the new user data, but it wouldn't get honored | 19:09 |
sean-k-mooney[m] | for an un-upgraded compute | 19:09 |
sean-k-mooney[m] | hum | 19:10 |
sean-k-mooney[m] | ya, you're right | 19:10 |
dansmith | so you go to a lot of work to return a fail to the API call if it's not honor-able, but then you'll quietly say "got it" and send it off to a compute that will ignore it :) | 19:10 |
sean-k-mooney[m] | so we would also need a compute service bump | 19:10 |
sean-k-mooney[m] | and min version check | 19:10 |
dansmith | not really, | 19:10 |
dansmith | the rpc pinning will handle that for you | 19:10 |
sean-k-mooney[m] | if we changed the hard reboot API with a new parameter | 19:11 |
dansmith | the auto pin will stick to the minimum supported version, and the rpcapi.py will raise if it can't send at v6.1 so you can error the api call | 19:11 |
dansmith | right | 19:11 |
sean-k-mooney[m] | or always rebuild it, as you said | 19:11 |
dansmith | if we always rebuild, then we need a service version and check, | 19:11 |
dansmith | because then you're assuming the compute will do a thing that it might not | 19:11 |
dansmith | letting rpc handle it is simpler and more direct | 19:11 |
sean-k-mooney[m] | yep, that's why I was thinking of the check initially | 19:11 |
sean-k-mooney[m] | ya fair point | 19:12 |
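A minimal sketch of the rpcapi.py pattern dansmith describes, following the existing can_send_version style in nova/compute/rpcapi.py; the 6.1 number, the flag name, and the exception used here are assumptions for illustration, not the actual change:

```python
# Sketch only: version-gating a new reboot flag in nova/compute/rpcapi.py.
def reboot_instance(self, ctxt, instance, block_device_info, reboot_type,
                    regen_config_drive=False):
    version = '6.1'
    client = self.router.client(ctxt)
    kwargs = {'regen_config_drive': regen_config_drive}
    if not client.can_send_version(version):
        if regen_config_drive:
            # RPC is pinned below 6.1 (older computes still around), so the
            # new behaviour cannot be honored; fail the API call instead of
            # silently queueing the change. Illustrative exception only.
            raise exception.NovaException(
                'Compute RPC is pinned; cannot regenerate the config drive')
        version = '6.0'
        kwargs = {}
    cctxt = client.prepare(server=_compute_host(None, instance),
                           version=version)
    cctxt.cast(ctxt, 'reboot_instance', instance=instance,
               block_device_info=block_device_info,
               reboot_type=reboot_type, **kwargs)
```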
sean-k-mooney[m] | how do you want to proceed | 19:12 |
dansmith | I dunno, it sucks that this is going to bump the bfv one too because it already touches rpc :/ | 19:12 |
dansmith | but this also seems very wrong | 19:12 |
dansmith | I can probably bang out the RPC change pretty quick | 19:13 |
dansmith | but it might push either of these into FFE territory, | 19:13 |
sean-k-mooney[m] | I feel like this is not the first time we have done it this way. but that does not mean it was right before | 19:13 |
dansmith | well, I hope we haven't done this before | 19:14 |
dansmith | this is the whole reason we have all the rpc, object, and service version plumbing | 19:14 |
dansmith | for this exact sort of thing | 19:14 |
sean-k-mooney[m] | not via system metadata | 19:14 |
sean-k-mooney[m] | I'm actually thinking about a bug fix we did | 19:14 |
sean-k-mooney[m] | where we needed to fix it and backport | 19:15 |
sean-k-mooney[m] | but that was because we could not change the RPC | 19:15 |
sean-k-mooney[m] | I don't think we have done this in a new feature | 19:15 |
dansmith | well, it kinda depends on the mechanics as to whether or not that was okay or not | 19:15 |
dansmith | this is basically a new flag to an RPC call, so side-stepping it is pretty bad | 19:16 |
dansmith | if it's the other way around, like compute decorating an instance so the controllers can see something, that's a bit different | 19:16 |
dansmith | I'm guessing there's no tempest test for this and thus no hope that even luck would get us a test fail on a grenade job? | 19:17 |
sean-k-mooney[m] | we stashed a flag in the port profile in the migration vif object https://github.com/openstack/nova/blob/master/nova/objects/migrate_data.py#L81-L92 | 19:18 |
dansmith | I mean, it'll clearly break, I just mean to demonstrate it | 19:18 |
dansmith | because if we did, this couldn't even merge | 19:18 |
sean-k-mooney[m] | there is no tempest test for this, no | 19:18 |
dansmith | that seems to me to be the "should we merge this as-is".. if we can easily write a valid tempest test that would fail on the grenade job, that's pretty bad | 19:19 |
sean-k-mooney[m] | I don't think it would be that hard to test this, although I have not really done much with tempest. let's see if there is a similar test we can modify | 19:20 |
*** dasm is now known as dasm|off | 19:21 | |
dansmith | no, | 19:21 |
dansmith | I'm not saying we should waste time on a test | 19:21 |
dansmith | I'm saying we *could* easily write a legit test that *will* fail on a grenade job because this is broken | 19:21 |
dansmith | that should be our *semantic test* for "is this something we should merge" | 19:22 |
dansmith | sean-k-mooney[m]: this also has a bit of a race sort of condition in it | 19:23 |
dansmith | if I do a hard reboot with new user data, | 19:23 |
sean-k-mooney[m] | ah right, I think you convinced me already that we should not merge it as is | 19:23 |
dansmith | the API will write the user data to the instance record, then go to make the rpc call | 19:23 |
dansmith | if rabbit is down, or the compute is down, the rpc call will fail, | 19:23 |
dansmith | well, I guess it will still see the dirty flag in that case, but only if they do another hard reboot | 19:24 |
dansmith | point being, the pre-update of the user data is presumptuous | 19:24 |
sean-k-mooney[m] | yes | 19:24 |
dansmith | not easy to resolve that at this point though | 19:24 |
sean-k-mooney[m] | the dirty flag was intended to say: regenerate the config drive the next time you hard reboot | 19:25 |
sean-k-mooney[m] | which made sense when the intent was to allow update with config drive | 19:25 |
dansmith | another interesting wrinkle in that case, | 19:25 |
dansmith | is that we'll set that flag even if the compute is way too old, | 19:25 |
sean-k-mooney[m] | since it was lazy | 19:25 |
dansmith | then they reboot, nothing happens.. then 6mo later when the operator upgrades that compute .. poof, user data changes | 19:26 |
sean-k-mooney[m] | yes but when the compute is eventually updated it will regenerate the config on the next reboot | 19:26 |
dansmith | right, but that could be many months later | 19:27 |
dansmith | which could be quite confusing for someone, | 19:27 |
dansmith | especially if they wrote something to test, tried to set it via hard reboot, it seemed to allow it but never worked, shrugged and went off, | 19:28 |
dansmith | then 6mo later it changes suddenly... | 19:28 |
dansmith | that user_data script might have been untested, they couldn't test it, so they assumed no harm | 19:28 |
sean-k-mooney[m] | it could. so originally the problem of how/when to regenerate the config drive came up a few months ago, after the spec review | 19:29 |
dansmith | I upgraded my +0 to -1: https://review.opendev.org/c/openstack/nova/+/816157 | 19:29 |
sean-k-mooney[m] | and at the time the idea of the dirty flag was discussed to allow this | 19:29 |
sean-k-mooney[m] | it's obvious now that we should have introduced the RPC bump | 19:29 |
dansmith | yeah I understand the "lazy do it later" aspect, but if it's broken and doesn't get honored right away, that "later" being a half year or more seems like more harm than good | 19:30 |
sean-k-mooney[m] | but originally I thought that lazy rebuild was actually desirable | 19:30 |
sean-k-mooney[m] | ack | 19:30 |
dansmith | lazy is good as long as it's not TOO LAZY :P | 19:30 |
sean-k-mooney[m] | the rebuild series has both a compute service bump and an RPC bump, right | 19:31 |
dansmith | yes | 19:32 |
sean-k-mooney[m] | so either way it will have to be updated when this is addressed | 19:32 |
dansmith | yes | 19:32 |
sean-k-mooney[m] | so 6.1 would be for update_userdata and 6.2 for rebuild | 19:32 |
dansmith | yeah | 19:33 |
dansmith | are you thinking be sneaky and combine? I dunno how I feel about that | 19:33 |
sean-k-mooney[m] | no | 19:33 |
sean-k-mooney[m] | 6.1 for extending hard_reboot | 19:33 |
sean-k-mooney[m] | and 6.2 for extending rebuild | 19:34 |
sean-k-mooney[m] | not combining both into 6.1 | 19:34 |
sean-k-mooney[m] | since they're in different patches, that would be odd | 19:34 |
dansmith | ack, yep, that's what i'll be, and of course service versions for each bump | 19:34 |
dansmith | *it'll | 19:34 |
sean-k-mooney[m] | and we should not squash them obviously since the patches are unrelated | 19:34 |
dansmith | no, I just thought you were about to "get creative" :) | 19:35 |
sean-k-mooney[m] | hehe not with correctness | 19:35 |
sean-k-mooney[m] | outside of a bug fix… | 19:35 |
sean-k-mooney[m] | jhartkopf: ^ | 19:36 |
sean-k-mooney[m] | not sure you were following that | 19:37 |
dansmith | not online, AFAICT | 19:38 |
dansmith | ugh, | 19:39 |
dansmith | all of the virt and rpc stuff that the rebuild patches touch will conflict out | 19:39 |
dansmith | I so wish luck had landed these in the opposite order | 19:39 |
sean-k-mooney[m] | by the way, apparently the Matrix bridge does not actually only show you the currently online people | 19:46 |
sean-k-mooney[m] | which is why jhartkopf auto-completed for me even if they are not here | 19:47 |
dansmith | sean-k-mooney[m]: https://termbin.com/q3ev | 19:51 |
dansmith | I think that's basically what needs to happen | 19:51 |
dansmith | I wish we could get a read on this from melwitt and gibi so I know whether I should spend my evening trying to get this all changed and tested | 19:52 |
sean-k-mooney[m] | ignoring your base64 nits, that looks about right | 19:53 |
dansmith | it also seems like the tests on this are pretty lacking, no? | 19:54 |
dansmith | like, there are no tests on the virt driver changes? | 19:55 |
sean-k-mooney[m] | I'm currently trying to set up a devstack env, I don't currently have one to test it. we do still have 2 days until code freeze | 19:55 |
dansmith | like, no test that I see that actually checks that the libvirt driver will honor the flag | 19:55 |
opendevreview | Rico Lin proposed openstack/nova master: Add traits for viommu model https://review.opendev.org/c/openstack/nova/+/844507 | 19:56 |
gibi | I'm not sure I have enough brain left today for this. But I have at least two comments on the above: 1) I don't think this is a rebuild feature, it is tied to hard reboot; 2) I think we prevented the possible upgrade issue with the capability trait. | 19:56 |
ricolin | sean-k-mooney[m]: just updated the viommu traits patch :) | 19:56 |
gibi | I cannot argue that this can be done differently | 19:56 |
gibi | and dansmith you are right that the system metadata dependency makes it at least a grey interface | 19:57 |
gibi | it is sad that we figured out this issue late in the cycle | 19:57 |
dansmith | gibi: I don't understand what you mean by #1, but yeah I guess you're right on the trait.. that's preeety thin though :) | 19:58 |
gibi | #1 is probably just a misunderstanding from | 19:59 |
gibi | 20:58 < dansmith> rebuild is basically growing a new feature, and we need to pass it a flag, | 19:59 |
sean-k-mooney[m] | the trait won't be reported on a non-upgraded compute, yes | 20:00 |
sean-k-mooney[m] | which might help for the upgrade case specifically | 20:00 |
dansmith | gibi: oh yeah I meant reboot there sorry | 20:00 |
gibi | on the flag itself: the RPC flag vs the persisted field has some semantic difference. If we update the user_data in the DB in the API layer and then pass a flag via the RPC to regenerate the config drive, then a lost RPC means that the DB data and the config data get out of sync | 20:00 |
dansmith | gibi: but we can and should revert if we're reporting failure to the user | 20:01 |
sean-k-mooney[m] | so I'm not sure if we should revert | 20:01 |
dansmith | because if this happens because of the lack of a trait or version, | 20:01 |
gibi | the reboot RPC is a cast | 20:01 |
gibi | so if the RPC lost the API wont notice | 20:01 |
sean-k-mooney[m] | the reason for that is, today, if we update instance metadata | 20:01 |
dansmith | then it will pop into being in six months and be very confusing | 20:01 |
sean-k-mooney[m] | we don't update the config drive | 20:01 |
sean-k-mooney[m] | unless you do a cross cell migration | 20:02 |
sean-k-mooney[m] | so you can have a delta between the metadata API and the config drive today | 20:02 |
melwitt | ugh, my irc client had "frozen", not receiving new messages for only this network, and I didn't realize it until now. had to close and reopen the client to receive and send messages | 20:02 |
dansmith | gibi: ack, not for a version conflict, but for an actual lost RPC we'd get out of sync.. I'm not sure if that's better or worse than queuing an update for six months later on a different version of the software, but fair point | 20:03 |
gibi | yeah, both case seems problematic | 20:03 |
dansmith | indeed | 20:03 |
melwitt | I just skimmed through yalls review comments from today a little while ago and don't have a handle yet on what's going on. I will read further and add a comment once I understand it | 20:03 |
dansmith | I guess the trait eliminates the acute concern of this being actually broken, but I'm still concerned about setting the precedent for shadow RPC interfaces in metadata, even if protected by a flag like that | 20:04 |
gibi | could we do both the DB update and the config driver regeneration from the nova-compute service? | 20:04 |
sean-k-mooney[m] | do we consider the user_data to be of higher importance to update than other info in the config drive | 20:04 |
dansmith | it's what we have versioning for and how we do math about "can we do this now or not" | 20:04 |
sean-k-mooney[m] | that we don't regenerate today | 20:04 |
dansmith | gibi: well my first thought was not a flag, but pass the user data to the reboot call, and let it update it | 20:05 |
dansmith | gibi: that would be much better all around | 20:05 |
dansmith | I need to look at the potential size limit though | 20:05 |
sean-k-mooney[m] | like if you attach a volume/interface or update server metadata, that won't get updated in the config drive today | 20:05 |
gibi | ahh yeah, it is a blob | 20:05 |
dansmith | if you can pass a MiB that would be bad... | 20:05 |
sean-k-mooney[m] | it's 64k I think | 20:06 |
* dansmith notes that he said "I wish melwitt and gibi were around" .. and then it happened ;P | 20:06 | |
dansmith | sean-k-mooney[m]: is it? | 20:06 |
sean-k-mooney[m] | it's large, yes | 20:06 |
dansmith | we might want to make that bigger though at some point, so expecting to put that into an rpc message might be a bad idea long-term | 20:06 |
melwitt | it was a bit of a coincidence :) I saw sean's comment on the userdata review come in email and they said "I talked to dan" but I didn't see any talking to dan in the channel. that's when I realized my client was messed up | 20:07 |
dansmith | aha | 20:07 |
gibi | I need to drop for the night. I'm fine with pulling user_data out of the release while we design it better. I just wish we could somehow avoid, in the future, pushing contributors into a design dead end and then pulling the rug out at FF. | 20:08 |
dansmith | I know the feeling because I was arguing that we not do that for bfv rebuild either | 20:08 |
gibi | yeah | 20:09 |
dansmith | and I noted in my comment that (a) I know the implication and (b) I'm willing to scramble on the work | 20:09 |
melwitt | so I disconnected and reconnected the network and I saw yalls comments rolling in. but when I sent messages there was no acknowledgement, so I checked the irc logs and my messages weren't there. so I had to escalate to a full quit/start of my client. and now it's working 🙄 | 20:09 |
gibi | I can look at the patch / comments tomorrow morning. But now I drop. See you tomorrow | 20:09 |
dansmith | alright | 20:09 |
gibi | o/ | 20:09 |
dansmith | sean-k-mooney[m]: how about this: | 20:09 |
dansmith | sean-k-mooney[m]: how about we let this land as it is, because theoretically the trait should catch it, and we convert to an RPC interface after the BFV set lands and before the release | 20:10 |
dansmith | that won't be a behavioral change since it *should* be catching it now | 20:10 |
dansmith | if someone is deploying on master within a two week window they could have some sysmeta cruft, but highly unlikely | 20:10 |
sean-k-mooney[m] | ack we can likely get that working by the end of the week | 20:11 |
sean-k-mooney[m] | as part of the follow up patch once bfv is landed | 20:11 |
dansmith | it's mostly what I just wrote, but 6.2 | 20:11 |
sean-k-mooney[m] | yep | 20:11 |
dansmith | but I also think that this is missing a lot of testing it should have | 20:12 |
sean-k-mooney[m] | looking at it again you are right | 20:12 |
dansmith | has anyone other than the author tried this on a real devstack? with configdrive? | 20:12 |
sean-k-mooney[m] | no, I was going to see if I could do that tonight but I might just do that tomorrow at this point | 20:12 |
dansmith | also, in my defense, gibi *did* ask me to review this :) | 20:12 |
dansmith | and I *did* try to punt to melwitt | 20:13 |
dansmith | and melwitt *did* sabotage her irc client so she "didn't see that" | 20:13 |
sean-k-mooney[m] | I was going to see if I could create a tempest test for this, although I'm not sure how to force the VM to boot on the un-upgraded node for grenade | 20:13 |
dansmith | yeah you can't really, so you have to boot two and hit both I think | 20:14 |
dansmith | but at least it would non-deterministically fail | 20:14 |
sean-k-mooney[m] | oh, with the anti-affinity filter | 20:14 |
dansmith | melwitt: are you caught up yet, enough to grok that ^ plan? | 20:15 |
melwitt | dansmith: yeah I think so | 20:16 |
dansmith | and what say ye? | 20:16 |
melwitt | the plan sounds like a good compromise | 20:17 |
sean-k-mooney[m] | we can likely sync with the author tomorrow, but I can set this up and test it tomorrow in any case and perhaps look at more testing | 20:18 |
sean-k-mooney[m] | to see if we can harden this and unblock bfv | 20:19 |
dansmith | I just un-1'd it with a writeup | 20:20 |
dansmith | sean-k-mooney[m]: are you saying that because you want to test it before it merges, given it has no real testing now? | 20:20 |
dansmith | or because you expect some change to this again? | 20:20 |
sean-k-mooney[m] | they were going to work on a follow-up patch to address some nits | 20:21 |
sean-k-mooney[m] | and I want to test it and figure out what else they should add tests for in that | 20:21 |
dansmith | okay but we told them to do those as a follow-up right? | 20:21 |
sean-k-mooney[m] | yep | 20:21 |
dansmith | okay, well, tomorrow our runway is even shorter | 20:22 |
dansmith | so hopefully you can do that in the morning :) | 20:22 |
melwitt | sean-k-mooney[m]: something I was confused on was whether the admin password (if there is one) would be preserved across a configdrive recreate. it seemed like yes? based on userdata update during rebuild impl, but I couldn't find definitively how that works | 20:23 |
dansmith | I guess I thought you all were more confident in this with all the +2s it had | 20:23 |
sean-k-mooney[m] | I was prioritizing it more than I would otherwise because of the bfv series on top | 20:24 |
dansmith | yeah the ordering was unfortunate | 20:25 |
sean-k-mooney[m] | I should have reviewed it more closely, sorry. thanks for reviewing though; even if the timing is not ideal, it's better to get this right | 20:26 |
dansmith | yep, so maybe if you test in the morning and +W it we can get the others in the queue and I'll start on the proper RPC stuff based on my pastebin above when I'm around | 20:27 |
opendevreview | Rajat Dhasmana proposed openstack/python-novaclient master: Add support to rebuild boot volume https://review.opendev.org/c/openstack/python-novaclient/+/827163 | 21:23 |
opendevreview | Rajat Dhasmana proposed openstack/python-novaclient master: Add support to rebuild boot volume 2.94 https://review.opendev.org/c/openstack/python-novaclient/+/827163 | 21:31 |
whoami-rajat | ^ rebased on top of 2.93 to avoid conflicts later | 21:33 |
whoami-rajat | sean-k-mooney[m], jfyi, you have a -2 on the client patch for 2.93 https://review.opendev.org/c/openstack/python-novaclient/+/816158 | 21:34 |
sean-k-mooney[m] | oh, I didn't clear that after you respun | 21:41 |
sean-k-mooney[m] | I probably won't review that tonight, but I'll drop the -2; sorry about that | 21:41 |
sean-k-mooney[m] | oh, that's the user data one | 21:43 |
whoami-rajat | yep, you already dropped the -2 from mine some time back but without the user data one, my change can't get in :) | 21:44 |
sean-k-mooney[m] | I meant to do that when the spec was approved | 21:48 |
sean-k-mooney[m] | there is no reason your bfv series would not work with the LVM driver, right | 21:50 |
sean-k-mooney[m] | I have a devstack deploying to test the user_data update but forgot to enable ceph | 21:50 |
sean-k-mooney[m] | but I can pull in your changes afterwards and test them if I get time | 21:51 |
sean-k-mooney[m] | dansmith: https://termbin.com/6502 | 23:14 |
sean-k-mooney[m] | I'll try Ubuntu 20.04 in the morning, but on c9s the regeneration fails | 23:15 |
dansmith | ugh | 23:15 |
dansmith | I was not expecting that kind of failure | 23:15 |
dansmith | seems if we get to the mkisofs part we should be as good as otherwise | 23:16 |
sean-k-mooney[m] | not when we are just calling an existing function | 23:16 |
dansmith | so the configdrive was created properly on instance boot but failed on regenerate? | 23:16 |
sean-k-mooney[m] | ya, it might be because of the format | 23:17 |
sean-k-mooney[m] | this might be trying to update the iso | 23:17 |
dansmith | oh you mean code that works if the iso doesn't exist but fails if it already does? | 23:18 |
sean-k-mooney[m] | when we use vfat that is allowed, but for iso it needs to delete and recreate the file | 23:18 |
dansmith | right | 23:18 |
dansmith | I guess I expected it to overwrite, but maybe that's why permission denied, because qemu owns it now? | 23:18 |
sean-k-mooney[m] | ya, the previous "if" only ran that code if the file did not exist | 23:18 |
dansmith | yeah | 23:18 |
sean-k-mooney[m] | https://review.opendev.org/c/openstack/nova/+/816157/15/nova/virt/libvirt/driver.py#4950 | 23:18 |
dansmith | man, if this really doesn't work (with isofs) and nobody tested until now ... :P | 23:19 |
sean-k-mooney[m] | the reboot continues | 23:19 |
sean-k-mooney[m] | the error is logged and ignored | 23:19 |
sean-k-mooney[m] | so if you don't actually look at the logs, it looks like it worked from the API | 23:19 |
dansmith | yeah, but I hope that's not why it went unnoticed | 23:19 |
dansmith | this is also why a tempest test would be good... | 23:20 |
dansmith | early in the bfv rebuild, we were doing the same.. quietly not doing the rebuild, and I wrote a test that touches a file, then does the rebuild, and asserts that it's gone.. and that pointed out that we weren't actually doing it | 23:20 |
sean-k-mooney[m] | looking at the vfat path, I think it would reopen the file and reformat it https://github.com/openstack/nova/blob/master/nova/virt/configdrive.py#L100 | 23:22 |
sean-k-mooney[m] | although I'm kind of surprised it's failing like this for iso | 23:23 |
sean-k-mooney[m] | the upper function https://github.com/openstack/nova/blob/master/nova/virt/configdrive.py#L137 | 23:23 |
dansmith | I gotta run in a minute | 23:23 |
sean-k-mooney[m] | oh never mind | 23:23 |
dansmith | I thought you were going to do this tomorrow? | 23:23 |
sean-k-mooney[m] | yep | 23:24 |
sean-k-mooney[m] | so the upper function creates a temp dir with the metadata files and then turns that into an iso using the final path as the output | 23:24 |
sean-k-mooney[m] | I thought it was going to do it in the temp dir and move it | 23:24 |
sean-k-mooney[m] | so ya, if qemu has a lock on that file it will cause a permission denied | 23:25 |
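For reference, a paraphrased-from-memory sketch of the configdrive code path being discussed (see the nova/virt/configdrive.py links above); simplified rather than exact:

```python
# ConfigDriveBuilder.make_drive(), roughly: metadata files are staged in a
# temp dir, then written out in the configured format. For iso9660 the image
# is generated directly at the final path, which fails if a running QEMU
# still holds that file; vfat instead reopens and reformats the existing file.
def make_drive(self, path):
    with utils.tempdir() as tmpdir:
        self._write_md_files(tmpdir)
        if CONF.config_drive_format == 'iso9660':
            self._make_iso9660(path, tmpdir)  # writes straight to `path`
        elif CONF.config_drive_format == 'vfat':
            self._make_vfat(path, tmpdir)     # reformats the file at `path`
        else:
            raise exception.ConfigDriveUnknownFormat(
                format=CONF.config_drive_format)
```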
dansmith | well, I think libvirt chowns it before boot doesn't it? | 23:25 |
dansmith | I was thinking ownership not lock | 23:25 |
sean-k-mooney[m] | the issue is this is happening at the wrong time | 23:27 |
sean-k-mooney[m] | the VM is still running | 23:27 |
sean-k-mooney[m] | it needs to happen when the VM is stopped | 23:27 |
dansmith | oh right, before the destroyed message | 23:27 |
dansmith | if this really needs that level of care, I'm going to recommend we swap the order of the bfv stuff | 23:28 |
sean-k-mooney[m] | right now it's at the top of hard reboot https://review.opendev.org/c/openstack/nova/+/816157/15/nova/virt/libvirt/driver.py#3887 | 23:28 |
dansmith | yeah, before destroy | 23:28 |
sean-k-mooney[m] | ya, so I don't object to swapping the order given the issues with the user data patch as is | 23:29 |
sean-k-mooney[m] | anyway time for me to go sleep o/ | 23:29 |
dansmith | I guess if we're about to destroy, it's not *as* bad that we write to it before, but we could be competing with writes from the guest in the vfat case | 23:29 |
dansmith | still, wrong as you note | 23:29 |
dansmith | ack, thanks for testing, g'nite | 23:29 |
sean-k-mooney[m] | by the way, I tested the novaclient and osc changes too, which do work, although I'm going to ask for the osc change to take a user-data file instead of a string, or at least have that as an option | 23:32 |
sean-k-mooney[m] | I'm pretty sure that's what we do on boot | 23:32 |
sean-k-mooney[m] | passing the base64-encoded string on the command line is annoying without doing $(echo "stuff" | base64) | 23:33 |
sean-k-mooney[m] | I guess that's less important if the nova part is broken | 23:34 |
dansmith | oh yeah, I hadn't even looked | 23:35 |
dansmith | passed as a CLI string makes no sense | 23:35 |