opendevreview | Artom Lifshitz proposed openstack/nova master: POC: power up dedicated cores during pre_live_migration https://review.opendev.org/c/openstack/nova/+/909806 | 01:02 |
---|---|---|
opendevreview | Artom Lifshitz proposed openstack/nova master: POC: power up dedicated cores during pre_live_migration https://review.opendev.org/c/openstack/nova/+/909806 | 01:22 |
opendevreview | Artom Lifshitz proposed openstack/nova master: POC: power up dedicated cores during pre_live_migration https://review.opendev.org/c/openstack/nova/+/909806 | 01:24 |
opendevreview | melanie witt proposed openstack/nova master: Support create with ephemeral encryption for qcow2 https://review.opendev.org/c/openstack/nova/+/870932 | 05:18 |
opendevreview | melanie witt proposed openstack/nova master: Support (resize|cold migration) with ephemeral encryption for qcow2 https://review.opendev.org/c/openstack/nova/+/870933 | 05:18 |
opendevreview | melanie witt proposed openstack/nova master: Support live migration with ephemeral encryption for qcow2 https://review.opendev.org/c/openstack/nova/+/905512 | 05:18 |
opendevreview | melanie witt proposed openstack/nova master: Support rebuild with ephemeral encryption for qcow2 https://review.opendev.org/c/openstack/nova/+/870939 | 05:18 |
opendevreview | melanie witt proposed openstack/nova master: Support rescue with ephemeral encryption for qcow2 https://review.opendev.org/c/openstack/nova/+/873675 | 05:18 |
opendevreview | melanie witt proposed openstack/nova master: Add encryption support to qemu-img rebase https://review.opendev.org/c/openstack/nova/+/870936 | 05:18 |
opendevreview | melanie witt proposed openstack/nova master: Support snapshot with ephemeral encryption for qcow2 https://review.opendev.org/c/openstack/nova/+/870937 | 05:18 |
opendevreview | melanie witt proposed openstack/nova master: Add backing_encryption_secret_uuid to BlockDeviceMapping https://review.opendev.org/c/openstack/nova/+/907960 | 05:18 |
opendevreview | melanie witt proposed openstack/nova master: Support encrypted backing files for qcow2 https://review.opendev.org/c/openstack/nova/+/907961 | 05:18 |
opendevreview | melanie witt proposed openstack/nova master: Support cross cell resize with ephemeral encryption for qcow2 https://review.opendev.org/c/openstack/nova/+/909595 | 05:18 |
opendevreview | melanie witt proposed openstack/nova master: libvirt: Introduce support for raw with LUKS https://review.opendev.org/c/openstack/nova/+/884313 | 05:18 |
opendevreview | melanie witt proposed openstack/nova master: libvirt: Introduce support for rbd with LUKS https://review.opendev.org/c/openstack/nova/+/889912 | 05:18 |
opendevreview | Amit Uniyal proposed openstack/nova master: enforce remote console shutdown https://review.opendev.org/c/openstack/nova/+/901824 | 07:11 |
gibi | artom: sean-k-mooney: nice catch on power mgmt. should I revert https://github.com/openstack-k8s-operators/nova-operator/pull/695 ? or can we live with the limitation? | 08:30 |
*** mklejn_ is now known as mklejn | 08:34 | |
opendevreview | melanie witt proposed openstack/nova master: Add backing_encryption_secret_uuid to BlockDeviceMapping https://review.opendev.org/c/openstack/nova/+/907960 | 08:36 |
opendevreview | melanie witt proposed openstack/nova master: Support encrypted backing files for qcow2 https://review.opendev.org/c/openstack/nova/+/907961 | 08:36 |
opendevreview | melanie witt proposed openstack/nova master: Support cross cell resize with ephemeral encryption for qcow2 https://review.opendev.org/c/openstack/nova/+/909595 | 08:36 |
opendevreview | melanie witt proposed openstack/nova master: libvirt: Introduce support for raw with LUKS https://review.opendev.org/c/openstack/nova/+/884313 | 08:36 |
opendevreview | melanie witt proposed openstack/nova master: libvirt: Introduce support for rbd with LUKS https://review.opendev.org/c/openstack/nova/+/889912 | 08:36 |
noonedeadpunk | hey folks. I guess I finally realized what the use case is for root volume detach, which was bothering me and raised discussions here and there: volume resize. In order to resize the disk, it must be in the detached state, and you can't detach a root volume. | 09:17 |
tkajinam | bauzas, hi. During a discussion in another project I just noticed I made a mistake in an os-vif patch, and I submitted a fix for it. I wonder if we can merge this before we create the os-vif release for caracal... https://review.opendev.org/c/openstack/os-vif/+/909682 | 09:38 |
bauzas | tkajinam: I'm pretty busy these days, but I can take a look | 09:39 |
tkajinam | bauzas, thanks and sorry for bothering you. I've left the same comment on the os-vif release patch... | 09:41 |
bauzas | tkajinam: that's cool, I'll review it today | 09:57 |
sean-k-mooney[m] | gibi: i think we can fix the edge cases and backport, but we might want to keep a revert ready in case we need it. my inclination is that it would be ok to have as a known issue in the context of a beta release, but if we can't backport the bug fixes in time we might revert for GA | 10:05 |
sean-k-mooney[m] | tkajinam: i assume you would also like https://review.opendev.org/c/openstack/os-vif/+/909341 | 10:08 |
sean-k-mooney[m] | bauzas: i'm +2 on both the os-vif changes, so let's include them in the os-vif release | 10:09 |
sean-k-mooney[m] | @gibi the isolate issue in particular has a low probability of impacting people; the general live migration issue, however, is much more impactful, so we should try and land René's fix for that. | 10:12 |
sean-k-mooney[m] | gibi: looking at the second poc patch, it looks like it's also affecting live migration with pinned cpus | 10:17 |
sean-k-mooney[m] | that is very surprising since i thought bauzas had tested that and it was covered | 10:17 |
bauzas | I'm in a meeting | 10:18 |
bauzas | and I'm missing context, what are you guys discussing? | 10:19 |
opendevreview | Doug Szumski proposed openstack/nova master: Revert "[libvirt] Live migration fails when config_drive_format=iso9660" https://review.opendev.org/c/openstack/nova/+/909122 | 10:34 |
gibi | sean-k-mooney[m]: OK i will open a revert with a hold | 10:34 |
gibi | bauzas: yet another bug in the power mgmt feature | 10:36 |
bauzas | which is ? | 10:37 |
opendevreview | Doug Szumski proposed openstack/nova master: Revert "[libvirt] Live migration fails when config_drive_format=iso9660" https://review.opendev.org/c/openstack/nova/+/909122 | 10:38 |
sean-k-mooney[m] | bauzas: there are two: live migration is broken, and emulator thread policy=isolate | 10:39 |
bauzas | ack, any bug report to provide me? | 10:39 |
gibi | sean-k-mooney[m]: the live migration one is not specific to power mgmt, is it? | 10:39 |
bauzas | sean-k-mooney: fwiw, I haven't tested live-migration | 10:40 |
sean-k-mooney[m] | i thought you had | 10:40 |
sean-k-mooney[m] | but that explains why it was missed | 10:40 |
sean-k-mooney[m] | there is a separate live migration issue that is unrelated to power management | 10:41 |
sean-k-mooney[m] | but there is also a power management live migration bug | 10:41 |
sean-k-mooney[m] | we are not turning on the cores as part of pre-live-migration | 10:41 |
sean-k-mooney[m] | so that's just actually broken | 10:41 |
sean-k-mooney[m] | the related issue is when cpu_shared_set is used and we don't have a numa topology | 10:42 |
sean-k-mooney[m] | we don't update the cores of the vm for the destination | 10:42 |
sean-k-mooney[m] | we have code up for review to fix that | 10:42 |
sean-k-mooney[m] | that means, when combined with mixed shared and dedicated cpus on the same host, | 10:43 |
sean-k-mooney[m] | it's possible to migrate a floating instance to a core that is offline | 10:43 |
sean-k-mooney[m] | if the cpu_shared_set and cpu_dedicated_set are not the same on all hosts | 10:43 |
sean-k-mooney[m] | or rather the source and dest host | 10:44 |
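The failure mode described above can be sketched as a set computation. This is a minimal sketch, not nova code: the function name `offline_cores_for_floating_guest` and its inputs are hypothetical, and it assumes the floating instance keeps the source host's `cpu_shared_set` as its cpuset (because, per the discussion, that set is not rewritten for the destination) and that only cores in the destination's `cpu_dedicated_set` get powered off.

```python
def offline_cores_for_floating_guest(src_shared, dst_dedicated, dst_online):
    """Return destination cores the floating guest may run on but which are offline.

    Hypothetical helper. Assumptions (not confirmed nova behaviour):
    - the guest cpuset is copied unchanged from the source cpu_shared_set;
    - power management only offlines cores in the destination cpu_dedicated_set.
    """
    guest_cpuset = set(src_shared)  # not updated for the destination host
    powered_off = set(dst_dedicated) - set(dst_online)
    return guest_cpuset & powered_off
```

When `cpu_shared_set` and `cpu_dedicated_set` match on source and destination, the intersection is empty, which is why the problem only appears when the two hosts are configured differently.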
sean-k-mooney[m] | gibi: the more i think about it, the more i'm leaning towards turning it off by default again and trying to re-enable it after beta instead | 10:44 |
bauzas | I think I checked that we were turning the cores on on the right internal method that's called for *any* guest start | 10:45 |
bauzas | but maybe live-migration uses another path | 10:45 |
sean-k-mooney[m] | bauzas: the vm on the dest is started by libvirt not nova | 10:45 |
bauzas | and then paused, right? | 10:45 |
sean-k-mooney[m] | correct, it's started in the paused state, and then libvirt unpauses it when we swap from executing on the source to the dest | 10:46 |
sean-k-mooney[m] | so we need to power them on in pre-live-migration | 10:46 |
gibi | sean-k-mooney[m]: ack, we can land the revert and I reopen the tracker Jira | 10:46 |
bauzas | sean-k-mooney: I see then | 10:47 |
sean-k-mooney[m] | on the plus side, whitebox caught this | 10:47 |
bauzas | when is the target guest defined and paused ? | 10:47 |
sean-k-mooney[m] | on the down side we are not yet running whitebox on nova | 10:47 |
bauzas | I thought it was when we were calling qemu migrate | 10:48 |
gibi | would be nice to get bug reports i can link to. | 10:48 |
sean-k-mooney[m] | bauzas: it's defined and paused by libvirt when we call migrateToURI3 | 10:48 |
bauzas | which is not done in pre-live-migrate | 10:48 |
sean-k-mooney[m] | no | 10:48 |
bauzas | but I see the reason for turning on the cores before | 10:48 |
bauzas | then, in pre-migrate | 10:49 |
bauzas | gotcha | 10:49 |
sean-k-mooney[m] | it's done after pre-live-migration | 10:49 |
sean-k-mooney[m] | but it's the call to libvirt on the source host that causes libvirt to create the vm in the paused state on the dest | 10:49 |
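The ordering discussed above (cores must already be online before the source calls libvirt's migrateToURI3, because libvirt then starts the paused guest on the destination) can be sketched as a destination-side step. This is a hypothetical illustration, not Artom's POC patch: it assumes the Linux sysfs CPU hotplug interface (`/sys/devices/system/cpu/cpuN/online`), and the name `power_up_dedicated_cores` is made up.

```python
import os

SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def power_up_dedicated_cores(cores, sysfs_root=SYSFS_CPU_ROOT):
    """Bring host cores online before the incoming guest is started.

    Hypothetical helper meant to run on the destination during
    pre_live_migration; returns the cores that were actually switched on.
    """
    powered_up = []
    for core in sorted(cores):
        online_file = os.path.join(sysfs_root, f"cpu{core}", "online")
        if not os.path.exists(online_file):
            # Some cores (often cpu0) cannot be hot-unplugged and expose no
            # 'online' file; treat them as always online.
            continue
        with open(online_file) as f:
            if f.read().strip() == "1":
                continue  # already online, nothing to do
        with open(online_file, "w") as f:
            f.write("1")
        powered_up.append(core)
    return powered_up
```

The `sysfs_root` parameter exists only so the sketch can be exercised against a scratch directory instead of real hardware.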
bauzas | yeah, same point as for the mdev things we discussed :) | 10:49 |
bauzas | bad news is that the internals of live-migration were a blind spot for me in the Antelope timeframe | 10:50 |
bauzas | now, this is no longer the case | 10:50 |
sean-k-mooney[m] | yes, it's the same code paths as the mdevs | 10:50 |
gibi | https://github.com/openstack-k8s-operators/nova-operator/pull/702 revert is up | 10:50 |
sean-k-mooney[m] | by the way nova-next is blocked | 10:51 |
sean-k-mooney[m] | im going to fix that shortly | 10:51 |
sean-k-mooney[m] | we are messing with the port binding profile for some reason in the post hook | 10:51 |
sean-k-mooney[m] | we should not be doing that | 10:51 |
sean-k-mooney[m] | gibi: approved. im going to grab coffee and ill work on unblocking the gate when i get back | 10:54 |
opendevreview | Doug Szumski proposed openstack/nova master: Revert "[libvirt] Live migration fails when config_drive_format=iso9660" https://review.opendev.org/c/openstack/nova/+/909122 | 10:54 |
bauzas | sean-k-mooney: gibi: we could add a note in our upstream doc saying we don't support live-migration yet due to a bug | 11:04 |
gibi | I would rather fix the bugs instead | 11:19 |
sean-k-mooney | bauzas: same, i'm not ok with doc fixes like that in general; i would prefer an api block or an actual fix | 11:21 |
bauzas | cool then | 11:21 |
sean-k-mooney | bauzas: artom has started working on a fix | 11:22 |
bauzas | gtk | 11:22 |
sean-k-mooney | so let's just help him do that, and we can file an upstream bug to track it properly | 11:22 |
opendevreview | sean mooney proposed openstack/nova master: [S-RBAC] adapt nova-next for port's binding:profile field change https://review.opendev.org/c/openstack/nova/+/909859 | 12:11 |
sean-k-mooney | bauzas: gibi ^ that will fix the gate blocker | 12:11 |
opendevreview | Merged openstack/os-vif master: Drop wrong stacklevel https://review.opendev.org/c/openstack/os-vif/+/909682 | 12:29 |
gibi | +2 with some followup thinking | 12:30 |
artom | gibi, sean-k-mooney, there's still something weird going on, https://review.opendev.org/c/openstack/whitebox-tempest-plugin/+/909785 isn't fully passing | 12:31 |
artom | It's _better_ with the two POC patches and Uggla's cpu_shared_set live migration patches, but still red | 12:32 |
sean-k-mooney | gibi: we can likely refactor this to use python to set the extra attributes on the port | 12:55 |
sean-k-mooney | just not right now | 12:55 |
sean-k-mooney | we can, however, just drop a python file into the hooks dir that imports nova's service user code and uses nova's code to set the my_key data | 12:56 |
gibi | sean-k-mooney: ack, i definitely wont block on this | 13:04 |
sean-k-mooney | i believe we have unit tests that make sure we don't clobber, but i agree the integration test was nice to have | 13:07 |
gibi | artom: Feb 22 03:45:17.085381 np0036833327 nova-compute[103912]: ERROR oslo_messaging.rpc.server NotImplementedError: Cannot load 'emulator_pins' in the base class | 13:08 |
sean-k-mooney | ya, so that depends on a patch that is not in the patch chain, i believe | 13:08 |
sean-k-mooney | gibi: i think the whitebox job is using two Depends-On against nova | 13:09 |
sean-k-mooney | https://review.opendev.org/c/openstack/whitebox-tempest-plugin/+/909785 | 13:09 |
sean-k-mooney | yep | 13:09 |
sean-k-mooney | or not | 13:10 |
gibi | it seems the numa_info has no emulator_pin field | 13:10 |
sean-k-mooney | i thought it might be in https://review.opendev.org/c/openstack/nova/+/877773/8 | 13:10 |
gibi | a functional test in that patch probably can reproduce this issue | 13:11 |
sean-k-mooney | the emulator threads are meant to be included in cpu_pins | 13:11 |
sean-k-mooney | gibi: one of the things i discussed with artom was https://github.com/openstack/nova/blob/master/nova/objects/instance_numa.py#L291-L296 | 13:12 |
sean-k-mooney | is meant to have the emulator pinning in it too | 13:13 |
sean-k-mooney | actually, looking at https://github.com/openstack/nova/blob/master/nova/objects/migrate_data.py#L144 | 13:13 |
sean-k-mooney | emulator_pins is in the object | 13:14 |
sean-k-mooney | so artom is just missing an s | 13:14 |
sean-k-mooney | since it's a set, i.e. a range | 13:14 |
sean-k-mooney | hmm, no, the patch is correct, which means it's not populated? | 13:16 |
sean-k-mooney | https://github.com/openstack/nova/blob/3209f6551652cff7bef0b9d9719ab940dd05a0f8/nova/virt/libvirt/migration.py#L113-L117 | 13:16 |
sean-k-mooney | @artom i mostly agree with this comment, but i believe we don't always set emulatorpin in the xml | 13:17 |
sean-k-mooney | https://github.com/openstack/nova/blob/3209f6551652cff7bef0b9d9719ab940dd05a0f8/nova/virt/libvirt/migration.py#L113-L126 | 13:17 |
artom | Yeah, I think I need an if there | 13:18 |
artom | It won't be set if there's no emulator thread policy | 13:18 |
sean-k-mooney | if emulatorpin is not found you need to default it to vcpupin | 13:18 |
sean-k-mooney | so ya you need an if here https://github.com/openstack/nova/blob/3209f6551652cff7bef0b9d9719ab940dd05a0f8/nova/virt/libvirt/migration.py#L127 | 13:19 |
sean-k-mooney | well | 13:19 |
sean-k-mooney | actually, probably not there | 13:19 |
sean-k-mooney | but where we build the dst_numa_info in the first place | 13:19 |
sean-k-mooney | well, either works i guess | 13:19 |
sean-k-mooney | for backport reasons we probably want _update_numa_xml to have the fallback if it's not set in dst_numa_info | 13:20 |
sean-k-mooney | but we should fix where we are creating the numa info to also set it properly | 13:20 |
sean-k-mooney | so add something like | 13:22 |
sean-k-mooney | emulator_pin = info.emulator_pins if info.emulator_pins else info.cpu_pins.values() | 13:23 |
sean-k-mooney | emulatorpin.set('cpuset',hardware.format_cpu_spec(emulator_pin)) | 13:23 |
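A self-contained sketch of the fallback suggested above. Assumptions: `cpu_pins` maps each guest vCPU to a set of host CPUs, so its values need to be unioned rather than passed to `format_cpu_spec` as a list of sets; `format_cpu_spec` below is a simplified stand-in for `nova.virt.hardware.format_cpu_spec`, not the real function; the other helper names are hypothetical. On an oslo.versionedobjects object, reading a field that was never set raises `NotImplementedError` (the "Cannot load 'emulator_pins'" error in the log), so a real caller would need a check such as `obj_attr_is_set('emulator_pins')` before reading it; here the caller passes plain values.

```python
import xml.etree.ElementTree as ET


def format_cpu_spec(cpus):
    """Simplified stand-in for nova.virt.hardware.format_cpu_spec:
    collapse a set of ints into a cpuset string such as '1-3,6'."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current contiguous run
        else:
            ranges.append([cpu, cpu])  # start a new run
    return ",".join(str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges)


def emulator_cpuset(emulator_pins, cpu_pins):
    """Use explicit emulator pins if present; otherwise let the emulator
    threads float over every host CPU the vCPUs are pinned to."""
    if emulator_pins:
        return set(emulator_pins)
    return set().union(*cpu_pins.values())


def update_emulatorpin(cputune, emulator_pins, cpu_pins):
    """Rewrite <emulatorpin cpuset=...> under <cputune> for the destination.

    Returns the updated element, or None if the XML has no <emulatorpin>.
    """
    emulatorpin = cputune.find("emulatorpin")
    if emulatorpin is None:
        return None
    emulatorpin.set("cpuset", format_cpu_spec(emulator_cpuset(emulator_pins, cpu_pins)))
    return emulatorpin
```

Keeping the fallback inside the XML update path (rather than only where the NUMA info is built) matches the backport concern raised above, since older sources may send NUMA info without the field.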
opendevreview | Amit Uniyal proposed openstack/nova master: enforce remote console shutdown https://review.opendev.org/c/openstack/nova/+/901824 | 13:42 |
*** haleyb|out is now known as haleyb | 14:52 | |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Auto set heartbeat_in_pthread for wsgi services https://review.opendev.org/c/openstack/nova/+/909880 | 14:53 |
mnaser | It seems `nova-grenade-multinode` is broken in `stable/zed`. Does anyone know of anything that sticks out | 15:31 |
mnaser | https://review.opendev.org/c/openstack/nova/+/909098 we're eating up resources with folks rechecking it, we're at 3x, i think the job is deffo broken | 15:32 |
mnaser | ok, that's neutron being broken | 15:33 |
mnaser | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_39b/909098/2/check/nova-grenade-multinode/39bc664/controller/logs/screen-q-svc.txt | 15:33 |
frickler | iiuc grenade jobs should be dropped from zed now that yoga is unmaintained? | 15:34 |
dansmith | idk that we actually said that (removing grenade if the source is unmaintained) but that makes sense to me | 15:43 |
clarkb | historically we've dropped grenade jobs for N+1 when N is no longer maintained | 15:43 |
clarkb | because the grenade jobs start at N and upgrade to N+1 and you very quickly bitrot | 15:44 |
dansmith | you mean historically...removed right? I meant in response to the recent policy to have branches that are specifically _unmaintained_ but still around | 15:46 |
clarkb | correct historically | 15:46 |
dansmith | in the past, IIRC, we'd remove a grenade when it upgrades from a branch that gets deleted because we can't land a fix to make it installable, but it's slightly less clear when the branch exists but is in the newly-minted unmaintained state | 15:47 |
opendevreview | Artom Lifshitz proposed openstack/nova master: POC: power up dedicated cores during pre_live_migration https://review.opendev.org/c/openstack/nova/+/909806 | 15:47 |
artom | Let's see if the emulator_pins fix is enough | 15:48 |
frickler | those grenade runs also seem to have been in a weird situation where the stable/yoga branches were already removed in the repos, but still present in our in-image-git-cache. but I also don't understand why n-cond seems to get started but doesn't produce any log | 15:53 |
melwitt | I dunno if y'all have seen, but it looks like nova-next is at a 100% fail rate, POST_FAILURE | 15:57 |
melwitt | Deleting allocation key from the binding:profile of the bandwidth aware port | 15:57 |
melwitt | + openstack port unset --binding-profile allocation port-normal-qos | 15:57 |
melwitt | ForbiddenException: 403: Client Error for url: https://213.32.74.123:9696/networking/v2.0/ports/f5b273ab-1c98-44f6-9664-8c1a16d5abda, (rule:update_port and rule:update_port:binding:profile) is disallowed by policy | 15:57 |
melwitt | not sure what changed that could have affected policy. I'll look around | 15:58 |
sean-k-mooney | melwitt: i have a fix for it | 16:02 |
melwitt | ok thank goodness | 16:02 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/909859 | 16:02 |
sean-k-mooney | i will readd the test coverage later | 16:02 |
sean-k-mooney | tldr: modifying binding_profile now needs a service token | 16:03 |
sean-k-mooney | hence the 403 | 16:03 |
melwitt | ahh | 16:03 |
melwitt | sean-k-mooney: looks like there's at least one more thing that should be removed https://review.opendev.org/c/openstack/nova/+/909859/1/gate/post_test_hook.sh#174 | 16:14 |
sean-k-mooney | oh right | 16:15 |
sean-k-mooney | yep, unset would also not work | 16:15 |
sean-k-mooney | i'm conflicted about whether we should also remove that coverage | 16:16 |
sean-k-mooney | or if i should bite the bullet and try and replace it with a python script | 16:16 |
melwitt | yeah, I wondered that too. not sure how hard it would be to get something with service token working | 16:18 |
sean-k-mooney | the my-key stuff is not really that important | 16:18 |
sean-k-mooney | but the unset there is a key part of that test | 16:18 |
sean-k-mooney | melwitt: well, we have nova available in this env | 16:18 |
sean-k-mooney | so i was hoping we could just import some nova code for that | 16:19 |
sean-k-mooney | this is running on the controller | 16:19 |
sean-k-mooney | so the nova.conf should have all the settings set correctly | 16:19 |
sean-k-mooney | so in theory it's: import the nova.conf module | 16:19 |
melwitt | yeah, it would | 16:19 |
sean-k-mooney | create the neutron client from our client module | 16:19 |
sean-k-mooney | and do port update | 16:19 |
melwitt | I should know bc I added it but π | 16:19 |
melwitt | hm k | 16:20 |
bauzas | JayF: are you around ? | 16:20 |
JayF | What's up? | 16:20 |
bauzas | JayF: about https://review.opendev.org/c/openstack/nova/+/903915/ I wonder if you could provide a new revision due to sean-k-mooney's -1 ? | 16:21 |
bauzas | also, do we now have a Tempest job testing it ? | 16:21 |
JayF | Sean and I talked in depth about a testing plan earlier this week, and today and tomorrow is the time I set aside to implement it. At which point I'll re-stack all the sharding changes, make sure they pass CI, and do some manual testing | 16:22 |
JayF | We have implemented significant sharding testing in ironic for ensuring the API works properly, the second step I'm working on today is the scenario tests | 16:23 |
sean-k-mooney | cool, just an fyi: tomorrow is a "recharge day" at redhat, meaning none of us will be here tomorrow | 16:23 |
sean-k-mooney | but we can take a look on monday | 16:23 |
JayF | Yeah I think I have all the information I need based on our conversation earlier, I just need to not have my day stolen by a thousand tiny conversations | 16:23 |
melwitt | sean-k-mooney: the new_websocket_client method is what gets called when a new request to the proxy comes in, so I also dunno why that would need to be called to close a connection ... ? | 16:32 |
sean-k-mooney | auniyal: can you try https://review.opendev.org/c/openstack/nova/+/901824/18/nova/console/websocketproxy.py#154 | 16:37 |
auniyal | sean-k-mooney, ack thanks | 16:46 |
auniyal | sean-k-mooney, can we raise an Exception as invalid target token, inside | 16:48 |
auniyal | reason I am saying is https://github.com/openstack/nova/blob/master/nova/console/websocketproxy.py#L265 | 16:49 |
sean-k-mooney | that is coming from self.do_proxy | 16:50 |
sean-k-mooney | right | 16:51 |
auniyal | yes, so there some exception will be raised if something happened in the network or at the client | 16:51 |
sean-k-mooney | so if we think about the reason why that might happen | 16:52 |
sean-k-mooney | the only reason i can think of is the client already disconnected before the time expired | 16:52 |
sean-k-mooney | in which case i don't think we should be raising an exception, as that is an expected scenario | 16:52 |
sean-k-mooney | so i'm not convinced raising an exception is correct | 16:53 |
auniyal | ack, | 16:53 |
sean-k-mooney | a debug level log would be fine | 16:53 |
sean-k-mooney | but do you have any other edge case in mind? | 16:53 |
auniyal | okay, also if something really happened at the client side https://review.opendev.org/c/openstack/nova/+/901824/18/nova/console/websocketproxy.py#294 | 16:53 |
auniyal | no, so as you said, everytime we come to close connection we will have valid tsock | 16:54 |
auniyal | ack, will remove if-else then, thanks | 16:55 |
sean-k-mooney | you have already protected against the socket being closed with the OSError and the fileno being -1 | 16:56 |
sean-k-mooney | so i think you have handled all the edge cases that are required there | 16:56 |
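The defensive close the review converged on (treat an already-closed target socket as an expected case and log at debug level instead of raising) might look roughly like this. It is a standalone sketch using only the Python standard library; `close_target_socket` is a hypothetical name and this is not the code in the "enforce remote console shutdown" patch.

```python
import logging
import socket

LOG = logging.getLogger(__name__)


def close_target_socket(tsock):
    """Close the proxy's target socket if it is still open.

    Returns True if this call closed the socket, False if it was already gone.
    """
    if tsock is None or tsock.fileno() == -1:
        # The client (or the other end) already disconnected: expected, not an error.
        LOG.debug("Target socket already closed; nothing to do")
        return False
    try:
        tsock.shutdown(socket.SHUT_RDWR)
    except OSError as exc:
        # The peer may have gone away between the fileno check and shutdown.
        LOG.debug("Ignoring error while shutting down target socket: %s", exc)
    tsock.close()
    return True
```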
auniyal | ack, I'll just run tests locally and respin | 16:59 |
opendevreview | sean mooney proposed openstack/nova master: [S-RBAC] adapt nova-next for port's binding:profile field change https://review.opendev.org/c/openstack/nova/+/909859 | 16:59 |
sean-k-mooney | melwitt: by the way using blockcopy instead of blockrebase is a nice find | 17:06 |
sean-k-mooney | melwitt: it reminded me of something for future us | 17:07 |
sean-k-mooney | we currently have some legacy code in how we do live snapshots to work around some bugs with older versions of qemu/libvirt | 17:07 |
sean-k-mooney | my understanding is those issues have been fixed | 17:07 |
sean-k-mooney | so we may be able to simplify that code a lot in the future | 17:07 |
sean-k-mooney | by removing the legacy workaround | 17:08 |
melwitt | sean-k-mooney: yeah, I couldn't stop thinking about live snapshot not working so I asked in #virt and they advised to use blockcopy in order to provide encryption options | 17:08 |
melwitt | yeah, I think we likely could | 17:08 |
sean-k-mooney | we do this complicated dance where we start a rebase or copy job after we copy a base image, then abort it to do the live snapshot | 17:09 |
sean-k-mooney | i.e. we copy the backing file, start a job to sync the changes, briefly freeze the guest file system and then abort | 17:09 |
sean-k-mooney | this https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3412 | 17:10 |
melwitt | right yeah | 17:11 |
sean-k-mooney | so i need to do git blame locally because github hates doing it on that file | 17:11 |
sean-k-mooney | but that is a hack and should not be required | 17:12 |
bauzas | JayF: (sorry, forgot to see your reply) sure, I'll look at your patch tomorrow then (even if it's a day off at RH, I'll work tomorrow morning) | 17:12 |
JayF | Today and tomorrow :) So look tomorrow but don't expect until Monday | 17:12 |
opendevreview | Amit Uniyal proposed openstack/nova master: enforce remote console shutdown https://review.opendev.org/c/openstack/nova/+/901824 | 17:14 |
sean-k-mooney | melwitt: so this dates from when we first added live snapshot https://github.com/uggla/nova/commit/46de2d1e2d0abd6fdcd4da13facaf3225c721f5e | 17:16 |
melwitt | quite a bit ago :) | 17:17 |
sean-k-mooney | yep, so the simpler ways of doing this were not "bug-free" in qemu 1.3 | 17:18 |
sean-k-mooney | but we haven't used that in half a decade or more | 17:19 |
melwitt | yeah, agree it would be good to clean that up | 17:21 |
sean-k-mooney | based on https://libvirt.org/kbase/domainstatecapture.html we have a few options | 17:25 |
sean-k-mooney | i would hope we could use the direct backup functionality or similar | 17:27 |
sean-k-mooney | something like this https://libvirt.org/formatbackup.html#examples | 17:28 |
melwitt | ah, ok | 17:29 |
sean-k-mooney | the thing is, i don't know if any of that will work with encryption | 17:30 |
sean-k-mooney | if it does not, virDomainSnapshotCreateXML may, or we might just need to keep what we are doing | 17:31 |
sean-k-mooney | but it would be a good conversation to have with the virt folks | 17:31 |
sean-k-mooney | "if we were to do it from scratch today, what would you recommend?" | 17:32 |
melwitt | yeah. I think they mentioned the real snapshot API would be the better way to do it (with encryption) | 17:32 |
sean-k-mooney | this one https://libvirt.org/html/libvirt-libvirt-domain-snapshot.html#virDomainSnapshotCreateXML | 17:33 |
sean-k-mooney | ya, qemu can also do memory snapshots and other fancy things via that | 17:34 |
sean-k-mooney | we can pass flags=VIR_DOMAIN_SNAPSHOT_CREATE_LIVE|VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY | 17:34 |
sean-k-mooney | and there is VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT | 17:35 |
sean-k-mooney | which allows you to reuse a pre-created file, i think | 17:35 |
melwitt | yeah that's right | 17:36 |
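For the virDomainSnapshotCreateXML route mentioned above, a disk-only external snapshot is described by a `<domainsnapshot>` document. The sketch below only builds that XML with the standard library and makes no libvirt call; the disk target, overlay path, and function name are hypothetical. The result would then be passed to libvirt-python's `virDomain.snapshotCreateXML(xml, flags)`, for example with `VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY | VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT` when the overlay file is pre-created. Whether the qemu driver accepts `VIR_DOMAIN_SNAPSHOT_CREATE_LIVE` together with `DISK_ONLY`, and whether this works with encrypted disks, are open questions to check against the libvirt documentation.

```python
import xml.etree.ElementTree as ET


def build_disk_only_snapshot_xml(name, disk_target, overlay_path):
    """Build a <domainsnapshot> document requesting an external qcow2 overlay
    for a single disk (hypothetical helper; no libvirt connection is made)."""
    snap = ET.Element("domainsnapshot")
    ET.SubElement(snap, "name").text = name
    disks = ET.SubElement(snap, "disks")
    disk = ET.SubElement(disks, "disk", name=disk_target, snapshot="external")
    ET.SubElement(disk, "driver", type="qcow2")
    ET.SubElement(disk, "source", file=overlay_path)
    return ET.tostring(snap, encoding="unicode")
```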
sean-k-mooney | anyway, that's a long way to say i'm +1 on the blockcopy usage and we may be able to simplify later | 17:39 |
melwitt | ack ++ :) | 17:40 |
opendevreview | Dan Smith proposed openstack/nova master: Catch ImageNotFound on snapshot failure https://review.opendev.org/c/openstack/nova/+/905316 | 19:22 |
opendevreview | Dan Smith proposed openstack/nova master: Support glance's new location API https://review.opendev.org/c/openstack/nova/+/891036 | 19:22 |
opendevreview | Dan Smith proposed openstack/nova master: DNM: Test glance new location api https://review.opendev.org/c/openstack/nova/+/891207 | 19:22 |
melwitt | sean-k-mooney: nova-next is still unhappy :( https://zuul.opendev.org/t/openstack/build/3c2c7955a4b34112a377a857623d6a73/log/job-output.txt#37662 | 19:25 |
melwitt | guess we can't check for healed port allocations | 19:26 |
sean-k-mooney | really | 19:32 |
sean-k-mooney | or did i forget to delete one of the checks | 19:33 |
sean-k-mooney | so that, to me, says that we have not configured bandwidth qos properly | 19:36 |
sean-k-mooney | um, i need to go have something to eat, but ok, i'll just drop all the port stuff | 19:36 |
sean-k-mooney | and if that does not work we can drop all the heal code | 19:36 |
mnaser | https://bugs.launchpad.net/nova/+bug/2052915 is affecting nova right now but i have no idea where to look for the reference | 19:39 |
sean-k-mooney | that sounds like neutron broke upgrades | 19:40 |
melwitt | sean-k-mooney: I'm not sure .. it's failing on "bandwidth_allocations=$(echo "$allocations" | grep NET_BW_EGR_KILOBIT_PER_SEC)" | 19:40 |
sean-k-mooney | yep, so if bandwidth qos was working, that allocation should exist | 19:40 |
melwitt | the weird thing is that it's supposed to "echo "Failed to heal port allocations."" if the above returned "" but it's not echoing that | 19:41 |
sean-k-mooney | yep, we have allocations but not for that type | 19:42 |
melwitt | oh, I see. ok | 19:42 |
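The hook's check (`grep NET_BW_EGR_KILOBIT_PER_SEC` over the consumer's allocations) can be expressed in Python as below. This sketch assumes the response shape of placement's `GET /allocations/{consumer_uuid}`, where `allocations` maps each resource provider UUID to a dict with a `resources` mapping; the helper name is hypothetical and no HTTP call is made.

```python
def has_bandwidth_allocation(allocations_response, rc="NET_BW_EGR_KILOBIT_PER_SEC"):
    """Return True if any resource provider in a placement allocations
    response holds a non-zero amount of the given resource class."""
    for provider in allocations_response.get("allocations", {}).values():
        if provider.get("resources", {}).get(rc, 0) > 0:
            return True
    return False
```

Running the same check both before the allocations are deleted and after the heal would distinguish "the port never requested bandwidth" from "healing did not restore it".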
sean-k-mooney | so what i meant by not configured correctly is i think we may not have enabled the qos extension in the job | 19:42 |
sean-k-mooney | or something like that | 19:43 |
opendevreview | sean mooney proposed openstack/nova master: [S-RBAC] adapt nova-next for port's binding:profile field change https://review.opendev.org/c/openstack/nova/+/909859 | 19:43 |
melwitt | doesn't the fact that it used to work before the service token change mean the extension is present? | 19:44 |
sean-k-mooney | i would have to read the code, but not necessarily | 19:44 |
melwitt | ok | 19:45 |
sean-k-mooney | the allocation is created by nova, but if the qos extension is not loaded | 19:45 |
sean-k-mooney | what will happen is neutron will not make a placement resource request in the port | 19:45 |
sean-k-mooney | melwitt: the hook was not asserting the bandwidth resource request existed before we healed | 19:46 |
sean-k-mooney | or that the port has port resource requests | 19:46 |
melwitt | 😵‍💫 | 19:47 |
sean-k-mooney | it does look like the qos plugin was loaded https://7afe9236b7eaaf244ae9-d98f699786987cf0f7a232f67c9b09f7.ssl.cf2.rackcdn.com/909859/2/check/nova-next/3c2c795/controller/logs/etc/neutron/neutron_conf.txt | 19:55 |
sean-k-mooney | and we see | 19:56 |
sean-k-mooney | 2024-02-22 18:53:27.154019 | controller | | resource_request | {'request_groups': [{'id': '35e2b68f-9d87-54ff-a7c1-c0772986407e', 'required': ['CUSTOM_PHYSNET_PUBLIC', 'CUSTOM_VNIC_TYPE_NORMAL'], 'resources': {'NET_BW_EGR_KILOBIT_PER_SEC': 1000, 'NET_BW_IGR_KILOBIT_PER_SEC': 1000}}], 'same_subtree': ['35e2b68f-9d87-54ff-a7c1-c0772986407e']} | | 19:56 |
sean-k-mooney | in the neutron port | 19:56 |
sean-k-mooney | so the neutron port appears to have the port resource request in it | 19:57 |
sean-k-mooney | it's also in allocations before we delete them | 19:57 |
sean-k-mooney | melwitt: but it's not after we heal the allocations | 19:58 |
sean-k-mooney | https://paste.opendev.org/show/bltzmAmf8ZoEE2NEimRR/ | 19:58 |
sean-k-mooney | melwitt: i need to finish for today but i wonder if there is another regression here | 20:00 |
melwitt | sean-k-mooney: hm, ok. thanks for looking o/ | 20:01 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L1749 | 20:06 |
sean-k-mooney | i would expect that to print somewhere first | 20:06 |
sean-k-mooney | or this https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L1760C1-L1785C62 | 20:07 |
sean-k-mooney | ok, i'm actually going to leave now, but this is the neutron client we are using in nova_manage | 20:16 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/network/neutron.py#L248 | 20:16 |
sean-k-mooney | we are creating it with admin=true | 20:16 |
sean-k-mooney | and this is where the service auth token should be enabled | 20:16 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/network/neutron.py#L219-L233 | 20:16 |
sean-k-mooney | so it should have an admin client with service tokens | 20:17 |
sean-k-mooney | but the nova code seems to not be healing the port, so we should see why | 20:18 |
melwitt | sean-k-mooney: thanks for the guidance | 20:20 |
melwitt | looks like nova-ceph-multistore is at a 100% fail too with "Details: b'400 Bad Request\n\nThe Store URI was malformed.\n\n '" | 21:55 |
melwitt | 😩 | 21:58 |
opendevreview | melanie witt proposed openstack/nova-specs master: Update ephemeral encryption specs to reflect implementation https://review.opendev.org/c/openstack/nova-specs/+/907654 | 22:58 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!