*** brault has quit IRC | 00:16 | |
*** brault has joined #openstack-nova | 00:18 | |
*** TxGirlGeek has joined #openstack-nova | 00:21 | |
*** spatel has joined #openstack-nova | 00:25 | |
*** mdbooth has quit IRC | 00:26 | |
*** mdbooth has joined #openstack-nova | 00:28 | |
*** ociuhandu has joined #openstack-nova | 00:31 | |
*** ociuhandu has quit IRC | 00:35 | |
*** tkajinam_ has quit IRC | 00:49 | |
*** tkajinam has joined #openstack-nova | 00:49 | |
*** TxGirlGeek has quit IRC | 00:56 | |
*** hamzy has joined #openstack-nova | 01:01 | |
*** adriant has quit IRC | 01:04 | |
openstackgerrit | Arthur Dayne proposed openstack/nova master: libvirt:volume:Disallow AIO=native when no 'O_DIRECT' is available https://review.opendev.org/682772 | 01:04 |
*** bnemec has joined #openstack-nova | 01:12 | |
*** bnemec has quit IRC | 01:25 | |
*** brinzhang_ has joined #openstack-nova | 01:30 | |
*** brinzhang has quit IRC | 01:33 | |
*** BjoernT_ has quit IRC | 01:35 | |
*** BjoernT has joined #openstack-nova | 01:37 | |
*** bnemec has joined #openstack-nova | 01:42 | |
*** brinzhang has joined #openstack-nova | 01:46 | |
*** adriant has joined #openstack-nova | 01:47 | |
*** brinzhang_ has quit IRC | 01:49 | |
*** brinzhang_ has joined #openstack-nova | 01:51 | |
*** spatel has quit IRC | 01:54 | |
*** brinzhang has quit IRC | 01:54 | |
*** Garyx has quit IRC | 01:55 | |
*** Garyx has joined #openstack-nova | 01:56 | |
openstackgerrit | Brin Zhang proposed openstack/python-novaclient master: Add minor version [21] to test_versions https://review.opendev.org/688599 | 02:03 |
openstackgerrit | Huachang Wang proposed openstack/nova master: cleanup to objects.fields https://review.opendev.org/688600 | 02:05 |
*** Garyx has quit IRC | 02:05 | |
*** dave-mccowan has joined #openstack-nova | 02:06 | |
*** Garyx has joined #openstack-nova | 02:07 | |
*** SonPham has joined #openstack-nova | 02:20 | |
*** SonPham has quit IRC | 02:20 | |
openstackgerrit | Huachang Wang proposed openstack/nova master: Set instance CPU policy to 'share' when 'hw_cpu_policy==share' https://review.opendev.org/688603 | 02:22 |
*** brinzhang has joined #openstack-nova | 02:28 | |
*** BjoernT has quit IRC | 02:28 | |
*** brinzhang_ has quit IRC | 02:31 | |
*** markvoelker has joined #openstack-nova | 02:32 | |
*** brinzhang has quit IRC | 02:37 | |
*** brinzhang has joined #openstack-nova | 02:38 | |
*** ricolin has joined #openstack-nova | 02:40 | |
*** gregwork has joined #openstack-nova | 02:47 | |
*** markvoelker has quit IRC | 02:52 | |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Provider Config File: YAML file loading and schema validation https://review.opendev.org/673341 | 02:54 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Provider Config File: Function to further validate and retrieve configs https://review.opendev.org/676029 | 02:54 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Provider Config File: Merge provider configs to provider tree https://review.opendev.org/676522 | 02:54 |
*** gbarros has quit IRC | 02:57 | |
*** mkrai_ has joined #openstack-nova | 03:05 | |
*** mkrai_ has quit IRC | 03:22 | |
*** mkrai__ has joined #openstack-nova | 03:22 | |
*** slaweq has joined #openstack-nova | 03:31 | |
*** hongbin has joined #openstack-nova | 03:35 | |
*** slaweq has quit IRC | 03:35 | |
*** psachin has joined #openstack-nova | 03:38 | |
*** awalende has joined #openstack-nova | 03:40 | |
*** awalende has quit IRC | 03:44 | |
*** brinzhang_ has joined #openstack-nova | 03:53 | |
*** brinzhang has quit IRC | 03:56 | |
*** BjoernT has joined #openstack-nova | 03:58 | |
*** dave-mccowan has quit IRC | 04:09 | |
*** BjoernT has quit IRC | 04:17 | |
*** FlorianFa has quit IRC | 04:18 | |
*** brault has quit IRC | 04:19 | |
*** hongbin has quit IRC | 04:22 | |
*** FlorianFa has joined #openstack-nova | 04:26 | |
*** larainema has joined #openstack-nova | 04:30 | |
*** brinzhang has joined #openstack-nova | 04:31 | |
*** brinzhang_ has quit IRC | 04:34 | |
*** mkrai__ has quit IRC | 04:45 | |
*** brinzhang_ has joined #openstack-nova | 04:50 | |
*** brinzhang has quit IRC | 04:53 | |
*** brinzhang has joined #openstack-nova | 04:59 | |
*** brinzhang_ has quit IRC | 05:03 | |
*** ratailor has joined #openstack-nova | 05:05 | |
*** Luzi has joined #openstack-nova | 05:16 | |
*** brinzhang_ has joined #openstack-nova | 05:28 | |
*** brinzhang has quit IRC | 05:31 | |
*** udesale has joined #openstack-nova | 05:33 | |
*** udesale has quit IRC | 05:38 | |
*** udesale has joined #openstack-nova | 05:39 | |
*** markvoelker has joined #openstack-nova | 05:47 | |
*** takamatsu has quit IRC | 05:54 | |
*** mkrai_ has joined #openstack-nova | 05:55 | |
openstackgerrit | Brin Zhang proposed openstack/python-novaclient master: Microversion 2.80: Add user_id/project_id to migration-list API https://review.opendev.org/675023 | 05:57 |
*** ccamacho has joined #openstack-nova | 05:59 | |
*** brinzhang has joined #openstack-nova | 06:07 | |
*** slaweq has joined #openstack-nova | 06:10 | |
*** igordc has quit IRC | 06:10 | |
*** brinzhang_ has quit IRC | 06:10 | |
*** brinzhang_ has joined #openstack-nova | 06:17 | |
*** brinzhang has quit IRC | 06:20 | |
*** sapd1 has joined #openstack-nova | 06:21 | |
*** pcaruana has joined #openstack-nova | 06:30 | |
*** mkrai_ has quit IRC | 06:30 | |
*** mkrai_ has joined #openstack-nova | 06:33 | |
*** vesper11- has quit IRC | 06:33 | |
*** vesper11 has joined #openstack-nova | 06:36 | |
*** ratailor_ has joined #openstack-nova | 06:38 | |
*** ratailor has quit IRC | 06:40 | |
*** brinzhang has joined #openstack-nova | 06:40 | |
*** brinzhang_ has quit IRC | 06:44 | |
*** dpawlik has joined #openstack-nova | 06:52 | |
*** trident has quit IRC | 06:52 | |
*** trident has joined #openstack-nova | 06:56 | |
openstackgerrit | Brin Zhang proposed openstack/python-novaclient master: Add functional test for migration-list in v2.80 https://review.opendev.org/688635 | 06:58 |
*** maciejjozefczyk has joined #openstack-nova | 07:00 | |
*** Luzi_ has joined #openstack-nova | 07:04 | |
*** jangutter_ has joined #openstack-nova | 07:05 | |
*** nanzha has joined #openstack-nova | 07:05 | |
*** Luzi has quit IRC | 07:08 | |
*** damien_r has joined #openstack-nova | 07:08 | |
*** jangutter has quit IRC | 07:08 | |
*** mkrai_ has quit IRC | 07:09 | |
*** jawad_axd has joined #openstack-nova | 07:12 | |
*** tesseract has joined #openstack-nova | 07:12 | |
*** udesale has quit IRC | 07:13 | |
*** udesale has joined #openstack-nova | 07:13 | |
*** rcernin has quit IRC | 07:16 | |
*** awalende has joined #openstack-nova | 07:16 | |
*** awalende has quit IRC | 07:18 | |
*** awalende has joined #openstack-nova | 07:19 | |
*** jangutter has joined #openstack-nova | 07:22 | |
*** jangutter_ has quit IRC | 07:25 | |
*** tkajinam has quit IRC | 07:25 | |
*** tkajinam has joined #openstack-nova | 07:26 | |
*** ttsiouts has joined #openstack-nova | 07:26 | |
*** ralonsoh has joined #openstack-nova | 07:28 | |
*** mkrai_ has joined #openstack-nova | 07:28 | |
*** ociuhandu has joined #openstack-nova | 07:30 | |
*** ociuhandu has quit IRC | 07:35 | |
gibi | good morning nova | 07:35 |
*** ociuhandu has joined #openstack-nova | 07:35 | |
openstackgerrit | Brin Zhang proposed openstack/python-novaclient master: Add functional test for migration-list in v2.80 https://review.opendev.org/688635 | 07:41 |
*** ociuhandu has quit IRC | 07:42 | |
*** dpawlik has quit IRC | 07:43 | |
gibi | stephenfin: just to be sure you also got the message: I got a mail from the summit organizers that the project updates will not be recorded. Instead they would like to get an etherpad with the main points. | 07:48 |
*** ivve has joined #openstack-nova | 07:51 | |
*** priteau has joined #openstack-nova | 07:53 | |
*** dtantsur|afk is now known as dtantsur | 07:53 | |
jkulik | Hi, regarding https://bugs.launchpad.net/nova/+bug/1648501 I see a problem with code assuming instance.image_ref being None equals boot-from-volume as mentioned here https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3371-L3372 | 08:02 |
openstack | Launchpad bug 1648501 in OpenStack Compute (nova) "providing different imageRef when using block_device_mapping (image -> volume) " [Low,Confirmed] | 08:02 |
*** tkajinam has quit IRC | 08:02 | |
jkulik | Because even if imageRef is set to the same values as in the block-device-mapping, instance.image_ref will then be set even though it's boot-from-volume. | 08:03 |
*** sapd1 has quit IRC | 08:04 | |
jkulik | As the bug mentions, the cli already forbids providing imageRef and block-device-mapping at the same time. But the API accepts it. Shouldn't this be made consistent? | 08:05 |
*** rpittau|afk is now known as rpittau | 08:07 | |
*** ttsiouts has quit IRC | 08:12 | |
*** ttsiouts has joined #openstack-nova | 08:12 | |
*** ociuhandu has joined #openstack-nova | 08:13 | |
*** dpawlik has joined #openstack-nova | 08:15 | |
*** ttsiouts has quit IRC | 08:17 | |
*** ttsiouts has joined #openstack-nova | 08:17 | |
*** dpawlik has quit IRC | 08:20 | |
*** ociuhandu has quit IRC | 08:20 | |
*** ociuhandu has joined #openstack-nova | 08:25 | |
*** takamatsu has joined #openstack-nova | 08:26 | |
*** takamatsu has quit IRC | 08:29 | |
*** ociuhandu has quit IRC | 08:31 | |
*** ociuhandu has joined #openstack-nova | 08:31 | |
*** ociuhandu has quit IRC | 08:35 | |
*** ociuhandu has joined #openstack-nova | 08:35 | |
*** takamatsu has joined #openstack-nova | 08:36 | |
*** ttsiouts has quit IRC | 08:37 | |
*** ttsiouts has joined #openstack-nova | 08:37 | |
*** takamatsu has quit IRC | 08:41 | |
*** ttsiouts has quit IRC | 08:42 | |
*** ociuhandu has quit IRC | 08:42 | |
*** tssurya has joined #openstack-nova | 08:45 | |
*** dpawlik has joined #openstack-nova | 08:47 | |
*** takamatsu has joined #openstack-nova | 08:48 | |
*** ttsiouts has joined #openstack-nova | 08:49 | |
*** dpawlik has quit IRC | 08:52 | |
*** bnemec has quit IRC | 08:53 | |
*** brinzhang has joined #openstack-nova | 09:23 | |
*** ttsiouts has quit IRC | 09:24 | |
*** ttsiouts has joined #openstack-nova | 09:24 | |
*** ttsiouts has quit IRC | 09:29 | |
*** derekh has joined #openstack-nova | 09:43 | |
*** ociuhandu has joined #openstack-nova | 09:47 | |
*** ociuhandu has quit IRC | 09:53 | |
*** SonPham has joined #openstack-nova | 09:53 | |
SonPham | Hi. I'm working with nova and horizon. I traced the code from Horizon's start-instance button: Horizon calls novaclient: novaclient(request).servers.start(instance_id) | 09:54 |
SonPham | and I checked that the python client's server start is: | 09:55 |
SonPham | def start(self, server): | 09:55 |
SonPham | and I think it calls /nova/nova/compute/api.py / def start() | 09:55 |
SonPham | how does it work? | 09:56 |
*** dpawlik has joined #openstack-nova | 10:02 | |
*** brinzhang_ has joined #openstack-nova | 10:04 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove unused 'nova-dsvm-base' job https://review.opendev.org/688389 | 10:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Stop testing Python 2 https://review.opendev.org/687954 | 10:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: zuul: Make functional job inherit from openstack parents https://review.opendev.org/688425 | 10:05 |
gibi | SonPham: the python client calls the nova-api service via HTTP (REST). The server create request is handled by https://github.com/openstack/nova/blob/63fb66e39a2590f00541f36d94e31372c2fe82ee/nova/api/openstack/compute/servers.py#L598 | 10:05 |
*** dpawlik has quit IRC | 10:06 | |
*** brinzhang has quit IRC | 10:08 | |
*** Luzi has joined #openstack-nova | 10:08 | |
gibi | jkulik: I agree it feels strange that you can provide an imageRef in POST /servers while at the same time your server is not booted from that image but from a volume with different content | 10:09 |
gibi | jkulik: so I think the nova-api can reject such a situation | 10:09 |
*** Luzi_ has quit IRC | 10:10 | |
*** ttsiouts has joined #openstack-nova | 10:11 | |
*** ratailor_ has quit IRC | 10:12 | |
*** ratailor has joined #openstack-nova | 10:13 | |
jkulik | for the vmware driver in queens at least, we get a volume and an ephemeral disk, because it seems to only take into account instance.image_ref when creating the ephemeral one | 10:16 |
jkulik | (still boots from the volume, though) | 10:16 |
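The rejection gibi suggests could look roughly like the check below. This is only a minimal sketch, not nova's actual validation: the function name `check_image_ref`, the exception `ImageRefConflict`, and the use of plain dicts (loosely modelled on `block_device_mapping_v2` entries) are assumptions made for illustration.

```python
class ImageRefConflict(Exception):
    """Hypothetical error for an imageRef that contradicts the boot mapping."""


def check_image_ref(image_ref, block_device_mappings):
    """Raise if image_ref disagrees with the boot_index 0 mapping.

    Mappings are plain dicts here, loosely modelled on block_device_mapping_v2.
    """
    if not image_ref:
        return  # no imageRef given, nothing to cross-check
    for bdm in block_device_mappings:
        if bdm.get("boot_index") != 0:
            continue  # only the boot device matters for this check
        if bdm.get("source_type") == "image" and bdm.get("uuid") == image_ref:
            return  # imageRef and boot mapping agree
        raise ImageRefConflict(
            "imageRef %s does not match boot device source %s"
            % (image_ref, bdm.get("uuid")))


# Hypothetical request data: boot from a volume created from "image-a".
boot_from_image_a = [{"boot_index": 0, "source_type": "image",
                      "destination_type": "volume", "uuid": "image-a"}]

check_image_ref("image-a", boot_from_image_a)  # accepted
try:
    check_image_ref("image-b", boot_from_image_a)
except ImageRefConflict as exc:
    print("rejected:", exc)
```

Note that this only covers the API side; as jkulik points out, code that treats `instance.image_ref` being None as "boot-from-volume" would still need its own fix for the case where both values match.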
openstackgerrit | Martin Midolesov proposed openstack/nova master: Implementing graceful shutdown. https://review.opendev.org/666245 | 10:17 |
SonPham | gibi: so servers.py in nova-api then calls def start() in compute/api.py via self.compute_api.start(context, instance)? | 10:18 |
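To make the flow in this exchange concrete: the client does not import nova code, it sends an HTTP request to nova-api, and the API handler in servers.py then calls `self.compute_api.start(context, instance)`. The sketch below only builds the request for the start action without sending it. The helper name `build_start_request`, the endpoint URL, and the token value are placeholders assumed for illustration.

```python
import json


def build_start_request(compute_endpoint, server_id, token):
    """Return (method, url, headers, body) for the server start action."""
    url = "%s/servers/%s/action" % (compute_endpoint.rstrip("/"), server_id)
    headers = {"Content-Type": "application/json", "X-Auth-Token": token}
    # The action name is the key of the JSON body; the start action
    # takes no arguments, so its value is null.
    body = json.dumps({"os-start": None})
    return "POST", url, headers, body


# Placeholder endpoint, server id and token, for illustration only.
method, url, headers, body = build_start_request(
    "http://controller:8774/v2.1", "my-server-id", "my-token")
print(method, url)
print(body)
```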
*** ttsiouts has quit IRC | 10:27 | |
*** ttsiouts has joined #openstack-nova | 10:28 | |
*** SonPham has quit IRC | 10:30 | |
*** ociuhandu has joined #openstack-nova | 10:31 | |
*** brinzhang has joined #openstack-nova | 10:33 | |
*** ttsiouts has quit IRC | 10:33 | |
*** markvoelker has quit IRC | 10:33 | |
*** brinzhang_ has quit IRC | 10:36 | |
*** ociuhandu has quit IRC | 10:36 | |
*** ratailor_ has joined #openstack-nova | 10:38 | |
*** ratailor has quit IRC | 10:40 | |
*** tbachman has quit IRC | 10:42 | |
*** dpawlik has joined #openstack-nova | 10:42 | |
*** dpawlik has quit IRC | 10:46 | |
*** bbowen has quit IRC | 10:47 | |
*** takamatsu has quit IRC | 10:56 | |
*** kaliya has joined #openstack-nova | 11:00 | |
*** kaliya has quit IRC | 11:05 | |
*** brinzhang_ has joined #openstack-nova | 11:07 | |
*** brinzhang has quit IRC | 11:09 | |
*** factor has joined #openstack-nova | 11:15 | |
*** udesale has quit IRC | 11:16 | |
*** ttsiouts has joined #openstack-nova | 11:18 | |
*** markvoelker has joined #openstack-nova | 11:19 | |
*** takamatsu has joined #openstack-nova | 11:34 | |
*** dpawlik has joined #openstack-nova | 11:35 | |
*** brinzhang has joined #openstack-nova | 11:38 | |
*** ttsiouts has quit IRC | 11:38 | |
*** ttsiouts has joined #openstack-nova | 11:39 | |
*** markvoelker has quit IRC | 11:39 | |
*** brinzhang_ has quit IRC | 11:41 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Allow evacuating server with port resource request https://review.opendev.org/688387 | 11:41 |
*** ttsiouts has quit IRC | 11:43 | |
*** ttsiouts has joined #openstack-nova | 11:44 | |
*** ccamacho has quit IRC | 11:46 | |
*** takamatsu has quit IRC | 11:51 | |
*** sapd1 has joined #openstack-nova | 11:51 | |
*** nanzha has quit IRC | 11:52 | |
*** dpawlik has quit IRC | 11:52 | |
*** nanzha has joined #openstack-nova | 11:52 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Enable evacuation with qos ports https://review.opendev.org/688688 | 11:53 |
*** mkrai_ has quit IRC | 12:00 | |
*** tbachman has joined #openstack-nova | 12:03 | |
*** bbowen has joined #openstack-nova | 12:04 | |
*** brinzhang_ has joined #openstack-nova | 12:04 | |
gibi | dansmith: when you are up. the runway link in the channel topic need some low and as far as I remember you can have op in this channel to fix it | 12:07 |
gibi | s/low/love/ | 12:07 |
*** brinzhang has quit IRC | 12:07 | |
*** tbachman has quit IRC | 12:08 | |
efried | brinzhang_: you around? | 12:09 |
*** tbachman has joined #openstack-nova | 12:10 | |
*** brinzhang_ has quit IRC | 12:14 | |
*** takamatsu has joined #openstack-nova | 12:14 | |
*** brinzhang_ has joined #openstack-nova | 12:14 | |
*** takamatsu has quit IRC | 12:16 | |
*** larainema has quit IRC | 12:18 | |
*** dpawlik has joined #openstack-nova | 12:21 | |
*** takamatsu has joined #openstack-nova | 12:21 | |
*** ratailor_ has quit IRC | 12:21 | |
*** belmoreira has joined #openstack-nova | 12:24 | |
*** dpawlik has quit IRC | 12:26 | |
*** takamatsu has quit IRC | 12:29 | |
*** yaawang has quit IRC | 12:29 | |
*** yaawang_ has joined #openstack-nova | 12:29 | |
*** takamatsu has joined #openstack-nova | 12:32 | |
*** hamzy has quit IRC | 12:34 | |
*** xek has joined #openstack-nova | 12:36 | |
*** hamzy has joined #openstack-nova | 12:37 | |
*** markvoelker has joined #openstack-nova | 12:38 | |
slaweq | efried: gibi: hi | 12:43 |
slaweq | did you see a failure like https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_a28/681671/10/check/neutron-tempest-dvr/a281c2b/testr_results.html.gz before? | 12:43 |
gibi | slaweq: hi! | 12:43 |
slaweq | I saw it at least 3-4 times during last week | 12:43 |
slaweq | maybe it's an already known issue, then I will not report a new bug | 12:44 |
gibi | slaweq: I haven't seen it before. But it seems like a race condition between the build and the delete | 12:44 |
slaweq | gibi: ok, I will report a new bug then | 12:45 |
slaweq | it's more for placement, right? | 12:45 |
gibi | slaweq: thanks. it is for nova. As nova calls placement with a wrong generation | 12:45 |
efried | was melwitt working on that? | 12:45 |
stephenfin | gibi: I saw that, yeah. If we've slides, that ought to do the trick | 12:46 |
slaweq | gibi: ok, thx | 12:46 |
gibi | efried: I searched gerrit now but I haven't found a patch | 12:47 |
gibi | stephenfin: OK, cool | 12:47 |
efried | yeah, me neither, hm... | 12:47 |
efried | I was sure there was a bug at least... | 12:48 |
*** brinzhang has joined #openstack-nova | 12:49 | |
efried | gibi, slaweq: does this look right? https://bugs.launchpad.net/nova/+bug/1836754 | 12:50 |
openstack | Launchpad bug 1836754 in OpenStack Compute (nova) "Conflict when deleting allocations for an instance that hasn't finished building" [Medium,Confirmed] | 12:50 |
slaweq | efried: yes, I just found it now :) | 12:50 |
*** xek has quit IRC | 12:51 | |
slaweq | efried: thx for looking for it | 12:51 |
gibi | efried: yeah, down in the comments there is another failure when the compute host is deleted | 12:51 |
*** xek has joined #openstack-nova | 12:52 | |
*** CeeMac has joined #openstack-nova | 12:52 | |
efried | gibi: this is another case where it would be useful to be able to distinguish between consumer and provider conflicts | 12:53 |
*** brinzhang_ has quit IRC | 12:53 | |
*** xek has quit IRC | 12:53 | |
efried | We don't care if we get a provider conflict here, but if we get a consumer conflict that's bad. | 12:53 |
gibi | efried: noted. I still have a TODO to make progress on that | 12:53 |
*** xek has joined #openstack-nova | 12:54 | |
efried | we could solve this with a retry, but without making that ^ distinction, I'm not sure it's the right thing to do. | 12:54 |
efried | ...or that it's better than just using an old (generation-less) microversion to drop the allocs. | 12:54 |
efried | ...or using the DELETE route which IIRC doesn't do generations at all. | 12:55 |
efried | (because no payload). | 12:55 |
efried | (and we don't put that stuff in headers) | 12:55 |
gibi | I think we need to retry on consumer conflict as well. This bug basically means that the server delete codepath is racing with the server create codepath. The end user wants the server to be deleted, so even if the create codepath updated the server allocation we need to delete the updated allocation | 12:57 |
*** markvoelker has quit IRC | 12:57 | |
gibi | so if we want to delete that allocation in every case, then we can even call DELETE without the generation checking | 12:58 |
efried | If we can be sure we're in the instance delete flow, I agree with you. | 12:58 |
efried | Meaning we can't just go hard at the report.py level; we have to {call a different method | send a specific flag} indicating we want to force it. | 12:59 |
*** mdbooth has quit IRC | 12:59 | |
*** dpawlik has joined #openstack-nova | 12:59 | |
efried | because in the general case if we're deleting allocations and something changes the consumer, it doesn't necessarily mean we want to proceed. | 12:59 |
gibi | efried: you are right | 13:00 |
*** mdbooth has joined #openstack-nova | 13:00 | |
efried | Though I'm not sure how we could race e.g. a resize and a migrate | 13:00 |
gibi | efried: we need to be careful and only force the delete from the server delete codepath | 13:00 |
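The two options discussed here (retry on a generation conflict, or an unconditional delete that only the server delete codepath requests) can be sketched against a toy in-memory stand-in for placement. Nothing below is nova's report client or the real placement API: `FakePlacement`, `Conflict`, `remove_allocations`, and the `force` flag are hypothetical names, and the generation handling is deliberately simplified.

```python
class Conflict(Exception):
    """Stand-in for an HTTP 409 from placement."""


class FakePlacement:
    """Toy in-memory store: consumer -> (generation, allocations)."""

    def __init__(self):
        self._consumers = {}

    def get(self, consumer):
        return self._consumers.get(consumer, (None, {}))

    def put(self, consumer, allocations, generation):
        current, _ = self.get(consumer)
        if generation != current:
            raise Conflict("consumer generation mismatch")
        if allocations:
            new_generation = 0 if current is None else current + 1
            self._consumers[consumer] = (new_generation, allocations)
        else:
            self._consumers.pop(consumer, None)

    def delete(self, consumer):
        # No payload, so no generation check is possible.
        self._consumers.pop(consumer, None)


def remove_allocations(placement, consumer, force=False, retries=3):
    """Drop a consumer's allocations; force is meant for server delete only."""
    if force:
        placement.delete(consumer)
        return True
    for _ in range(retries):
        generation, allocations = placement.get(consumer)
        if not allocations:
            return True
        try:
            placement.put(consumer, {}, generation)
            return True
        except Conflict:
            continue  # re-read the generation and try again
    return False


placement = FakePlacement()
placement.put("5b5b12dc", {"rp1": {"VCPU": 1}}, None)
try:
    placement.put("5b5b12dc", {}, None)  # stale generation, as in the race
except Conflict as exc:
    print("conflict:", exc)
print(remove_allocations(placement, "5b5b12dc", force=True))
```

Keeping `force` as an explicit argument mirrors efried's point that the general case should not silently override a changed consumer; only a caller that knows it is in the delete flow would pass it.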
*** panda|off is now known as panda | 13:01 | |
efried | btw I checked the placement logs in slaweq's repro and they don't include the message for the 409, so I can't tell whether it was indeed a provider conflict. | 13:01 |
*** xek has quit IRC | 13:01 | |
*** mriedem has joined #openstack-nova | 13:01 | |
*** xek has joined #openstack-nova | 13:02 | |
*** dpawlik has quit IRC | 13:04 | |
*** takamatsu has quit IRC | 13:06 | |
gibi | efried: in the placement log here is the conflict Oct 14 14:45:26.823202 | 13:08 |
gibi | efried: around that there are multiple PUT requests for consumer 5b5b12dc | 13:09 |
*** takamatsu has joined #openstack-nova | 13:10 | |
gibi | efried: nova-api is doing a local delete for server 5b5b12dc | 13:10 |
gibi | efried: so at least slaweq's repro is a race between a server create and a server delete | 13:12 |
efried | mm. So we could fix this with a hard delete -- but what actually worries me is the reverse problem. | 13:14 |
efried | what if the delete happens first, and then the create comes in? We would have leaked allocations. | 13:15 |
*** nanzha has quit IRC | 13:15 | |
*** nanzha has joined #openstack-nova | 13:16 | |
*** liuyulong has joined #openstack-nova | 13:16 | |
*** brinzhang_ has joined #openstack-nova | 13:16 | |
efried | we should have some way for the delete to abort the create... | 13:17 |
efried | But I guess that's another problem for another day. | 13:17 |
*** nweinber has joined #openstack-nova | 13:19 | |
*** brinzhang has quit IRC | 13:19 | |
*** Luzi has quit IRC | 13:19 | |
*** lpetrut has joined #openstack-nova | 13:21 | |
*** jangutter_ has joined #openstack-nova | 13:24 | |
*** jangutter_ has quit IRC | 13:24 | |
*** jangutter has quit IRC | 13:27 | |
*** sapd1 has quit IRC | 13:28 | |
mriedem | stephenfin: since you care about the py2 droppage, grenade jobs are failing on some weird package things | 13:28 |
mriedem | https://zuul.opendev.org/t/openstack/build/4da3c44dcbcd4ed7aa04a8dcaa19c011/log/logs/grenade.sh.txt.gz#35366 | 13:28 |
*** gbarros has joined #openstack-nova | 13:28 | |
*** dave-mccowan has joined #openstack-nova | 13:28 | |
mriedem | that's with py2 on the old (train) side and py3 on the new (ussuri) side | 13:28 |
mriedem | ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall. | 13:29 |
mriedem | gibi: efried: i'm pretty sure melwitt brought up that bug the other day (the local delete conflict), | 13:29 |
mriedem | did someone report a bug for it? | 13:29 |
mriedem | looks like artom already opened one for the same thing https://bugs.launchpad.net/nova/+bug/1836754 | 13:30 |
openstack | Launchpad bug 1836754 in OpenStack Compute (nova) "Conflict when deleting allocations for an instance that hasn't finished building" [Medium,Confirmed] | 13:30 |
gibi | mriedem: yes, that one | 13:30 |
*** awalende has quit IRC | 13:31 | |
*** brinzhang has joined #openstack-nova | 13:31 | |
mriedem | last time i looked we just need a retry in the local delete case | 13:31 |
*** awalende has joined #openstack-nova | 13:31 | |
gibi | mriedem: yeah, that was my understanding above as well | 13:32 |
*** pcaruana has quit IRC | 13:32 | |
mriedem | "what if the delete happens first, and then the create comes in? We would have leaked allocations." | 13:33 |
mriedem | yeah that is a problem | 13:33 |
mriedem | since the create will just re-create the consumer and allocations in placement | 13:33 |
mriedem | hear me out, SOFT DELETE-ABLE CONSUMERS! | 13:33 |
*** dpawlik has joined #openstack-nova | 13:34 | |
*** brinzhang_ has quit IRC | 13:34 | |
mriedem | efried: i'd think we could trap that case in conductor once the response comes from the scheduler, | 13:35 |
mriedem | b/c we'll check if the build request has been deleted in the interim during scheduling and if so halt the build process - at that point we could cleanup allocations (do we not already?) | 13:35 |
*** takamatsu has quit IRC | 13:36 | |
*** awalende has quit IRC | 13:36 | |
*** awalende has joined #openstack-nova | 13:37 | |
*** ChanServ sets mode: +o dansmith | 13:38 | |
*** dansmith changes topic to "Current runways: https://etherpad.openstack.org/p/nova-runways-ussuri -- This channel is for Nova development. For support of Nova deployments, please use #openstack." | 13:38 | |
*** ChanServ sets mode: -o dansmith | 13:38 | |
dansmith | gibi: got it | 13:38 |
gibi | dansmith: thanks! | 13:39 |
*** dpawlik has quit IRC | 13:39 | |
efried | mriedem: we might already, yeah. | 13:41 |
*** spsurya has joined #openstack-nova | 13:42 | |
*** awalende has quit IRC | 13:42 | |
mriedem | gibi: efried: i might work up a functional test to recreate it since unit tests aren't going to cut it for that kind of interaction | 13:42 |
efried | cool | 13:43 |
*** bnemec has joined #openstack-nova | 13:43 | |
*** KeithMnemonic has joined #openstack-nova | 13:43 | |
efried | brinzhang: yt? | 13:44 |
gibi | mriedem: cool, sorry I'm busy today - tomorrow so I did not bite | 13:44 |
mriedem | np | 13:44 |
*** jamesdenton has quit IRC | 13:46 | |
*** jawad_axd has quit IRC | 13:52 | |
*** jawad_axd has joined #openstack-nova | 13:53 | |
*** xek_ has joined #openstack-nova | 13:55 | |
*** jawad_ax_ has joined #openstack-nova | 13:57 | |
*** xek has quit IRC | 13:58 | |
*** ociuhandu has joined #openstack-nova | 13:58 | |
*** jawad_axd has quit IRC | 13:58 | |
dansmith | mriedem: I think you oughta drop your -W on that (now) base patch and I'll +2.. I know you're going to add the test later and it sounds like you don't think it's likely to break as it is | 13:59 |
*** brinzhang_ has joined #openstack-nova | 14:01 | |
*** jawad_ax_ has quit IRC | 14:01 | |
*** pcaruana has joined #openstack-nova | 14:04 | |
*** brinzhang has quit IRC | 14:04 | |
mriedem | dansmith: ok | 14:09 |
dansmith | I already +2d | 14:09 |
*** dklyle has quit IRC | 14:18 | |
*** markvoelker has joined #openstack-nova | 14:23 | |
*** jangutter has joined #openstack-nova | 14:25 | |
*** brinzhang has joined #openstack-nova | 14:28 | |
*** brinzhang has quit IRC | 14:30 | |
*** brinzhang has joined #openstack-nova | 14:31 | |
*** brinzhang_ has quit IRC | 14:31 | |
*** brinzhang has quit IRC | 14:32 | |
*** brinzhang has joined #openstack-nova | 14:32 | |
*** brinzhang has quit IRC | 14:33 | |
*** ociuhandu has quit IRC | 14:33 | |
*** brinzhang has joined #openstack-nova | 14:34 | |
*** brinzhang has quit IRC | 14:35 | |
*** brinzhang has joined #openstack-nova | 14:35 | |
*** brinzhang has quit IRC | 14:35 | |
stephenfin | mriedem: Ugh, that's pip's total lack of a dependency resolution biting us in the ass. I've no idea how to fix that | 14:38 |
stephenfin | Is that the log from my patch or something else? | 14:38 |
*** dtantsur is now known as dtantsur|brb | 14:40 | |
dansmith | so, there's a cinder tempest test that is running assertEqual() and failing because an updated_at stamp isn't exactly what it expects... | 14:42 |
dansmith | I don't see that up on e-r | 14:42 |
*** dklyle has joined #openstack-nova | 14:43 | |
*** mlavalle has joined #openstack-nova | 14:43 | |
dansmith | it's the minimum basic scenario, where it expects the post-volume-create list to exactly match a show a couple statements later, but seems like something else has touched that volume | 14:44 |
dansmith | mriedem: know anything about such a thing? | 14:45 |
*** ivve has quit IRC | 14:45 | |
*** eharney has joined #openstack-nova | 14:49 | |
mriedem | updated_at or something is different right? | 14:51 |
mriedem | i've seen that before, maybe a new regression | 14:51 |
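The failure mode dansmith describes (an exact equality check between a volume listing and a later show, broken by a changed `updated_at`) can be illustrated with plain dicts. This is not the tempest test or a proposed fix, only a sketch of why exact comparison is racy; the helper `diff_ignoring`, the field values, and the timestamps are made up for illustration.

```python
def diff_ignoring(expected, actual, volatile=("updated_at",)):
    """Return {key: (expected, actual)} for keys that differ, skipping volatile ones."""
    keys = (set(expected) | set(actual)) - set(volatile)
    return {key: (expected.get(key), actual.get(key))
            for key in sorted(keys)
            if expected.get(key) != actual.get(key)}


# Made-up volume representations: the later "show" differs only in updated_at.
listed = {"id": "vol-1", "status": "available", "attachments": [],
          "updated_at": "2019-10-14T14:45:26"}
shown = dict(listed, updated_at="2019-10-14T14:45:27")

print(listed == shown)               # exact comparison sees a mismatch
print(diff_ignoring(listed, shown))  # no differences once updated_at is skipped
```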
mriedem | stephenfin: it was a log from my devstack patch to default USE_PYTHON3=True which your patch depends on | 14:51 |
mriedem | note i brought it up in -tc since they were talking about this upgrade testing yesterday | 14:51 |
dansmith | mriedem: yep. | 14:51 |
mriedem | dansmith: got a link to a job failure? | 14:53 |
dansmith | mriedem: https://365c4224c221ec730c2d-019bc8f0795daf4dab730f80e83974fa.ssl.cf1.rackcdn.com/627891/62/check/nova-next/58f7f91/testr_results.html.gz | 14:53 |
stephenfin | mriedem: I've a minimal reproducer here: http://paste.openstack.org/show/783995/ | 14:53 |
mriedem | oh https://bugs.launchpad.net/tempest/+bug/1838202 | 14:53 |
openstack | Launchpad bug 1838202 in tempest "TestMinimumBasicScenario.test_minimum_basic_scenario race fail comparing volume to expected values with updated_at diff" [Undecided,New] | 14:53 |
mriedem | dansmith: ^ | 14:54 |
openstackgerrit | Merged openstack/nova master: VMware: Update flavor-related metadata on resize https://review.opendev.org/681004 | 14:54 |
stephenfin | Stupid pip | 14:54 |
dansmith | mriedem: ah nice, I didn't see that on e-r | 14:54 |
mriedem | because it's not....let me look | 14:54 |
*** mkrai_ has joined #openstack-nova | 14:54 | |
*** cfriesen has joined #openstack-nova | 14:55 | |
dansmith | if my logstashing is right, looks like it started around 10/7 although not sure how much history we have, that's about a week ago | 14:55 |
mriedem | yeah the pain in the ass is the mismatch is singleline indexing | 14:56 |
dansmith | yeah | 14:56 |
mriedem | logstash only goes back 10 days | 14:56 |
mriedem | i reported that bug 78 days ago | 14:56 |
openstack | bug 78 in Baz (deprecated) "When asking you to sign something; baz should tell you what" [Medium,Won't fix] https://launchpad.net/bugs/78 | 14:56 |
mriedem | heh bug 666 | 14:56 |
openstack | bug 666 in Launchpad itself "can't file a bug on Ubuntu" [Medium,Invalid] https://launchpad.net/bugs/666 | 14:56 |
dansmith | yeah | 14:56 |
*** TxGirlGeek has joined #openstack-nova | 14:59 | |
*** cfriesen has quit IRC | 15:01 | |
mriedem | it looks like the dict keys are at least sorted so i can do: | 15:05 |
mriedem | http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22actual%20%20%20%20%3D%20%7B'attachments'%3A%20%5B%5D%2C%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d | 15:05 |
dansmith | hmm, that only shows two hits? | 15:06 |
dansmith | maybe those are the two I looked at | 15:06 |
*** ociuhandu has joined #openstack-nova | 15:07 | |
stephenfin | mriedem: https://review.opendev.org/688731 | 15:07 |
mriedem | dansmith: yeah it's on 627891 | 15:08 |
mriedem | it's just rare | 15:08 |
dansmith | yeah | 15:08 |
mriedem | stephenfin: ack i'll make my devstack change depend on that | 15:10 |
*** ociuhandu has quit IRC | 15:10 | |
mriedem | oh i lost all of my beautiful meaningless +1s......what a world what a world | 15:10 |
*** ociuhandu has joined #openstack-nova | 15:11 | |
stephenfin | Cool. Looks like a fix in pip _is_ underway but it could be months/years before that lands https://pradyunsg.me/blog/2019/06/23/oss-update-1/ | 15:11 |
* stephenfin didn't do anything cool like that in college :/ | 15:11 | |
mriedem | cool like what? write a dep resolver for pip? | 15:12 |
mriedem | fwiw i think lifeless shed many years from his life working on a dep resolver for pip.... | 15:12 |
mriedem | before or during just saying f it and doing the constraints stuff in openstack | 15:13 |
*** gyee has joined #openstack-nova | 15:13 | |
dansmith | ah the good old days of openstack | 15:13 |
sean-k-mooney | mdbooth: is this https://review.opendev.org/#/c/663382/ the patch from stephenfin you planned to add a functional test to? if not i might add one in a few days | 15:14 |
*** mriedem has quit IRC | 15:18 | |
*** mriedem has joined #openstack-nova | 15:18 | |
sean-k-mooney | stephenfin: by the way we still need to land this https://review.opendev.org/#/c/675776/ and backport it to train | 15:20 |
sean-k-mooney | although it looks like it has lots of other changes mixed in | 15:21 |
*** ttsiouts has quit IRC | 15:21 | |
*** ttsiouts has joined #openstack-nova | 15:22 | |
sean-k-mooney | so ya we should merge v5 https://review.opendev.org/#/c/675776/5 not v6. i'll -1 the patch and ask for it to be fixed. | 15:24 |
*** ttsiouts has quit IRC | 15:27 | |
*** markvoelker has quit IRC | 15:28 | |
*** jawad_axd has joined #openstack-nova | 15:28 | |
*** jawad_axd has quit IRC | 15:33 | |
*** igordc has joined #openstack-nova | 15:37 | |
*** damien_r has quit IRC | 15:43 | |
*** jamesdenton has joined #openstack-nova | 15:44 | |
*** xek__ has joined #openstack-nova | 15:46 | |
*** xek_ has quit IRC | 15:48 | |
*** jawad_axd has joined #openstack-nova | 15:49 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Remove compute compat checks for aborting queued live migrations https://review.opendev.org/688409 | 15:49 |
*** liuyulong has quit IRC | 15:51 | |
*** maciejjozefczyk has quit IRC | 15:52 | |
*** belmoreira has quit IRC | 15:53 | |
*** jawad_axd has quit IRC | 15:53 | |
*** rpittau is now known as rpittau|afk | 16:03 | |
*** xek__ has quit IRC | 16:05 | |
*** xek__ has joined #openstack-nova | 16:05 | |
*** xek_ has joined #openstack-nova | 16:09 | |
*** jawad_axd has joined #openstack-nova | 16:10 | |
*** hemna_afk is now known as hemna_ | 16:11 | |
*** xek__ has quit IRC | 16:12 | |
*** dtantsur|brb is now known as dtantsur | 16:13 | |
*** jawad_axd has quit IRC | 16:14 | |
*** jawad_axd has joined #openstack-nova | 16:30 | |
*** igordc has quit IRC | 16:31 | |
*** jawad_axd has quit IRC | 16:34 | |
*** henriqueof has joined #openstack-nova | 16:36 | |
*** ivve has joined #openstack-nova | 16:36 | |
*** dtantsur is now known as dtantsur|afk | 16:37 | |
*** nanzha has quit IRC | 16:38 | |
*** mkrai_ has quit IRC | 16:38 | |
*** tssurya has quit IRC | 16:40 | |
*** ociuhandu_ has joined #openstack-nova | 16:40 | |
*** ociuhandu has quit IRC | 16:44 | |
*** derekh has quit IRC | 16:47 | |
*** ociuhandu_ has quit IRC | 16:47 | |
*** takamatsu has joined #openstack-nova | 16:49 | |
*** dviroel_ has joined #openstack-nova | 16:54 | |
*** ociuhandu has joined #openstack-nova | 16:56 | |
*** ociuhandu has quit IRC | 17:01 | |
*** lpetrut has quit IRC | 17:01 | |
*** priteau has quit IRC | 17:03 | |
*** takamatsu has quit IRC | 17:12 | |
*** nweinber_ has joined #openstack-nova | 17:16 | |
*** tbachman has quit IRC | 17:18 | |
*** nweinber has quit IRC | 17:18 | |
*** ricolin has quit IRC | 17:20 | |
*** mlavalle has quit IRC | 17:23 | |
*** mlavalle has joined #openstack-nova | 17:24 | |
*** dviroel_ is now known as dviroel | 17:24 | |
*** ralonsoh has quit IRC | 17:25 | |
*** tbachman has joined #openstack-nova | 17:25 | |
*** xek__ has joined #openstack-nova | 17:27 | |
*** xek_ has quit IRC | 17:30 | |
*** ociuhandu has joined #openstack-nova | 17:31 | |
*** spsurya has quit IRC | 17:33 | |
*** ociuhandu has quit IRC | 17:35 | |
*** psachin has quit IRC | 17:38 | |
*** eharney has quit IRC | 17:43 | |
*** ociuhandu has joined #openstack-nova | 17:48 | |
*** jawad_axd has joined #openstack-nova | 17:50 | |
mriedem | i'm finding that writing a functional test for bug 1836754 is so hacky to order the events that it's probably not worth it and we should just throw a retry decorator on delete_allocation_for_instance when we hit a consumer generation conflict | 17:54 |
openstack | bug 1836754 in OpenStack Compute (nova) "Conflict when deleting allocations for an instance that hasn't finished building" [Low,Confirmed] https://launchpad.net/bugs/1836754 | 17:54 |
mriedem | efried: also conductor deletes the allocations created by the scheduler if the server is gone by the time we get back from scheduling https://github.com/openstack/nova/blob/149327a3abb12418cdf65316e7c1d4924767bfdf/nova/conductor/manager.py#L1402 | 17:55 |
mriedem | so that's covered | 17:55 |
*** ociuhandu has quit IRC | 17:56 | |
artom | mriedem, we never hit it that often, did we? | 17:59 |
artom | And only in CI | 17:59 |
artom | So I think no func test is fine | 18:00 |
artom | We can use the e-r query to see if merging the commit makes the hits disappear | 18:00 |
openstackgerrit | Dan Smith proposed openstack/python-novaclient master: Add aggregate-cache-images command and client routines https://review.opendev.org/687141 | 18:01 |
*** markvoelker has joined #openstack-nova | 18:07 | |
*** openstackgerrit has quit IRC | 18:07 | |
*** jangutter has quit IRC | 18:08 | |
*** priteau has joined #openstack-nova | 18:11 | |
*** markvoelker has quit IRC | 18:12 | |
mriedem | artom: yeah it's very rare | 18:15 |
*** markvoelker has joined #openstack-nova | 18:21 | |
*** ociuhandu has joined #openstack-nova | 18:26 | |
*** markvoelker has quit IRC | 18:31 | |
*** eharney has joined #openstack-nova | 18:32 | |
*** nweinber__ has joined #openstack-nova | 18:33 | |
*** openstackgerrit has joined #openstack-nova | 18:34 | |
openstackgerrit | Merged openstack/nova master: Filter migrations by user_id/project_id https://review.opendev.org/674243 | 18:34 |
*** nweinber_ has quit IRC | 18:36 | |
*** igordc has joined #openstack-nova | 18:36 | |
*** tbachman has quit IRC | 18:54 | |
efried | mriedem: are you working on the forced alloc delete rn? | 18:55 |
*** ociuhandu has quit IRC | 18:55 | |
*** tbachman has joined #openstack-nova | 18:57 | |
mriedem | yeah, need to write a test | 19:01 |
*** markvoelker has joined #openstack-nova | 19:01 | |
efried | mriedem: imo we should use DELETE when we're serious rather than using a retry. | 19:01 |
efried | and indicate seriousness explicitly from the caller | 19:02 |
mriedem | why was it using PUT with an empty allocations dict to begin with then? | 19:02 |
efried | mriedem: exactly so we can capture 409s. | 19:03 |
*** tbachman_ has joined #openstack-nova | 19:03 | |
efried | point being, we only want to do that some of the time. | 19:03 |
efried | IIRC we used to use DELETE until I made a stink about it. | 19:04 |
*** tbachman has quit IRC | 19:04 | |
*** tbachman_ is now known as tbachman | 19:04 | |
efried | mriedem: I was about to dig in; if you want to write a test, I can take a swing at the patch. | 19:05 |
efried | unless you're doing both already | 19:05 |
mriedem | so you must have gotten stinky here https://review.opendev.org/#/c/591597/ | 19:05 |
*** markvoelker has quit IRC | 19:06 | |
efried | yeah, that was a step on the path. I think I remember complaining about the fact that that wasn't really helping us, because of the tininess of the window we were leaving. | 19:06 |
efried | though clearly we're hitting that window, hence the bug. | 19:07 |
efried | mriedem: is this something you think we should backport? Because I'd like to do a series where we split the existing one into forced-or-normal, also getting rid of @safe_connect. But can do a "tactical" version for backport if that's a consideration. | 19:08 |
efried | artom: did you see this bug in real life or just in CI? | 19:09 |
mriedem | reading some of the comments on that patch it sounds like you were pushing for keeping DELETE rather than going with PUT and consumer generation handling: | 19:09 |
mriedem | "This patch does not make a conflict visible unless it happens to occur in the teeny window *within* the method itself. IMO introducing this code could make one *think* we're doing so, which is bad. Because the vast, vast majority of cases where such conflicts occur (allocation gets mucked with between when the instance is created and when the deletion process begins) will go completely unnoticed." | 19:09 |
efried | wise words. Who said them? | 19:09 |
mriedem | you | 19:09 |
mriedem | but you said above you made a stink about using DELETE | 19:09 |
efried | Yes, I pushed for s/DELETE/PUT{...generation}/ in, I think, Dublin. | 19:10 |
mriedem | because you wanted the caller to decide what to do, | 19:10 |
efried | because the way it *should* be done is that the GET happens somewhere earlier in the process | 19:10 |
mriedem | so in this case the API would catch the conflict and retry rather than blindly adding a @retries decorator to the method | 19:10 |
efried | right, that would be one way to approach it. | 19:10 |
efried | the other way would be, when we don't care about conflicts and really just want to do the delete, we use DELETE (again, via a new reportclient method, or a flag to the existing, which amounts to the same thing) and never have to do multiple calls. | 19:11 |
efried | I would advocate for the latter, because this isn't the only place we're going to want to force from, and it's easier to force=True than to write the same retry logic from multiple places, not to mention the fewer calls. | 19:12 |
mriedem | going back to your other question i would not mangle up removing @safe_connect and all that in the same patch that resolves the bug | 19:12 |
*** tbarron_ is now known as tbarron | 19:13 | |
mriedem | speaking for artom (i'll take the liberty here) i think it's seen in CI only but that doesn't mean it's not in the real world | 19:13 |
efried | yeah, I was going to do a bottom patch with @safe_connect backward compat and then kill it after the bugfix. | 19:13 |
mriedem | in fact, if you see this in real life i'm not sure a normal user can even retry the delete if we changed the task_state and didn't reset it on error | 19:13 |
efried | right, you have to heal allocations via nova-manage or whatever. | 19:14 |
efried | I think | 19:14 |
mriedem | no, | 19:14 |
mriedem | i mean you as a user should be able to just retry the delete of the server since you got a 409 from the compute API (not a 500) | 19:14 |
efried | right, you *should* be able to, but you can't, you have to use nova-manage today. | 19:14 |
efried | I gotta chauffeur a kid, back in 20. | 19:15 |
*** efried is now known as efried_afk | 19:15 | |
mriedem | i think we're talking past each other but ok | 19:15 |
*** henriqueof has quit IRC | 19:21 | |
*** markvoelker has joined #openstack-nova | 19:25 | |
*** trident has quit IRC | 19:29 | |
*** bbowen has quit IRC | 19:31 | |
*** efried_afk is now known as efried | 19:33 | |
efried | mriedem: I get what you're saying now -- in this particular path we don't actually leak the allocation, we just fail to finish deleting the instance. | 19:33 |
mriedem | correct | 19:33 |
*** trident has joined #openstack-nova | 19:33 | |
mriedem | i'm working something up and will post it so you can see before working on tests | 19:34 |
efried | okay. be warned I'm likely to be -1 on a solution involving retrying. | 19:35 |
mriedem | i'm not retrying | 19:36 |
efried | cool | 19:36 |
*** factor has quit IRC | 19:43 | |
artom | efried, mriedem, I... don't remember? | 19:44 |
artom | I went IRC log diving | 19:47 |
artom | Jul 16 10:07:27 <artom> efried, hah, see where else that error popped up: http://logs.openstack.org/09/666409/8/check/tempest-full-py3/38bf84e/job-output.txt#_2019-07-14_17_19_10_677555 | 19:47 |
artom | Jul 16 10:08:33 <artom> efried, there aren't that many hits, but yeah, our theory from last night is pretty much confirmed | 19:47 |
efried | okay, well anyway, mriedem is working on a fix. | 19:47 |
artom | Jul 15 16:05:11 <efried> artom: This is interesting, the failure on that skip patch http://logs.openstack.org/48/670848/1/check/neutron-tempest-dvr/ed2b81c/testr_results.html.gz | 19:48 |
artom | So, I think we first hit this in CI back when we were testing that hybrid plug revert resize thing | 19:48 |
artom | So yeah, CI only | 19:48 |
efried | ack | 19:48 |
artom | That we know of, anyways | 19:48 |
mriedem | it's an extremely tight window between GETing the allocations and PUTing them back with allocations={} | 19:49 |
efried | I would think it would be fairly tough to hit otherwise | 19:49 |
*** tesseract has quit IRC | 19:49 | |
mriedem | tempest creates a server and then immediately deletes it | 19:49 |
efried | yeah, there's basically nothing in that window. | 19:49 |
mriedem | when we hit this | 19:49 |
artom | Right, it's coming back | 19:51 |
artom | It's a specific tempest test | 19:51 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Add force kwarg to delete_allocation_for_instance https://review.opendev.org/688802 | 19:56 |
mriedem | i'm a bit troubled by the various places that should use force=True in ^ | 19:56 |
mriedem | meaning, wtf did we change the behavior of delete_allocation_for_instance in the first place? it feels like we did because we could. | 19:56 |
*** nweinber__ has quit IRC | 19:59 | |
*** markvoelker has quit IRC | 20:00 | |
melwitt | efried, gibi, mriedem: can confirm I ran into the bug but didn't file one (and I saw in the backscroll that artom filed one) | 20:00 |
efried | mriedem: agreed, given the change we actually ended up with, we would have been better off not doing it. | 20:00 |
mriedem | i've annotated that places i'm using force=True now to try and justify the reasoning | 20:03 |
mriedem | *the places | 20:03 |
mriedem | maybe gibi can say "no we shouldn't force b/c resource requests" or something, idk | 20:03 |
efried | mriedem: if we're going to use generation-based allocation management at all, we should really be doing the GET early in the flow (except for spawn*) so that the race window actually means what it should. | 20:04 |
efried | *for spawn, we should always be using NULL, and if we get a conflict, it means we're racing with some other operation (delete, resize, etc) and should abort the spawn | 20:04 |
mriedem | *port resource requests and nested allocations | 20:04 |
efried | but the existing use of generations is worse than useless. | 20:04 |
efried | (I should have objected harder instead of +2ing that patch on the promise of "we'll improve it later") | 20:04 |
mriedem | yeah idk, i don't remember being very involved in this, in irc, or the meetings. i didn't comment in the ML thread and i didn't get into the "why"s in the patch, just reviewed it, likely to keep the series moving and trust gibi and everyone else's decisions on this (since you, chris, gibi and jay were all involved) | 20:08 |
mriedem | going back to the feeling of "we did it because we could" | 20:09 |
efried | we did it because we *should*, but we should do it right, and were planning on doing so eventually. | 20:11 |
mriedem | that makes more sense in the other patches in that series which dealt with PUTing allocations with updates rather than removing them when deleting a server | 20:12 |
*** jawad_axd has quit IRC | 20:21 | |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Provider Config File: YAML file loading and schema validation https://review.opendev.org/673341 | 20:24 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Provider Config File: Function to further validate and retrieve configs https://review.opendev.org/676029 | 20:24 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Provider Config File: Merge provider configs to provider tree https://review.opendev.org/676522 | 20:24 |
mriedem | dansmith: a few things on https://review.opendev.org/#/c/687140/ | 20:24 |
mriedem | did you intend to drop the release note? | 20:24 |
dansmith | mriedem: ah thanks. No, I tried to get all smart with the renumber on the microversion which caused me to lose several things.. it was a real timesaver | 20:25 |
*** pcaruana has quit IRC | 20:25 | |
KeithMnemonic | melwitt, mriedem, that patch from hemna finally verified. Thanks for all of your help. Reviews when someone gets time are appreciated in advance https://review.opendev.org/#/c/683008/ | 20:26 |
melwitt | thanks for the heads up | 20:27 |
*** dpawlik has joined #openstack-nova | 20:29 | |
efried | what's the ironic ring thing called? | 20:30 |
efried | rebalance puts your node in another.... "X"? | 20:30 |
efried | yeah, it's a "ring", I'm not too crazy. | 20:31 |
openstackgerrit | Dan Smith proposed openstack/nova master: Add image caching API for aggregates https://review.opendev.org/687140 | 20:31 |
dansmith | mriedem: ^ | 20:31 |
dansmith | I gotta step away for a bit..got a raging headache | 20:31 |
dansmith | and no matter what my wife says, it is NOT because I emptied a whole can of Brakleen on my valve cover last night in the closed-up garage | 20:31 |
mriedem | some are saying fumes are good for the brain | 20:32 |
mriedem | efried: yeah hashring | 20:33 |
efried | thx | 20:33 |
KeithMnemonic | looking for some tips/suggestions on another odd issue I am investigating. This is Rocky with ceph backed instances. i.e instance boots from ceph directly. per https://docs.ceph.com/docs/master/rbd/rbd-openstack/ . When doing an evacuate from a compute that is powered off it fails with "Invalid state of instance files on shared storage" it looks like somewhere here it is failing on this "Checking instance | 20:33 |
KeithMnemonic | files accessibility /var/lib/nova/instances/... nova/virt/libvirt/driver.py:8893 " my guess is maybe a permission or something but was wondering if anyone ever ran into something like this | 20:33 |
KeithMnemonic | the instance ran fine of the source compute. my next step is to try and see if the same happens with a migrate | 20:34 |
*** pcaruana has joined #openstack-nova | 20:34 | |
KeithMnemonic | and other instances in the same ceph pool are running on the target | 20:34 |
KeithMnemonic | so both computes can talk to ceph | 20:34 |
efried | mriedem: do you have to disable a compute service before you delete it? | 20:34 |
*** macz has joined #openstack-nova | 20:35 | |
*** tbachman has quit IRC | 20:35 | |
mriedem | nope | 20:35 |
mriedem | you should stop the actual process though | 20:35 |
mriedem | see https://docs.openstack.org/api-ref/compute/?expanded=delete-compute-service-detail#delete-compute-service | 20:35 |
mriedem | which is somewhat related to our old friend https://review.opendev.org/#/c/678100/ | 20:36 |
efried | right, so technically you could race service deletion with an instance operation. | 20:36 |
mriedem | yup | 20:36 |
efried | even though it means you were bad. | 20:36 |
mriedem | and we fail to delete the providers | 20:36 |
efried | rite | 20:36 |
efried | swhat I'm looking at now. | 20:36 |
mriedem | in the related ML thread for that patch we talked about making the API only proceed if the service was down but nacked that idea for some reason | 20:37 |
*** slaweq has quit IRC | 20:38 | |
mriedem | fun it looks like my rechecks are being ignored | 20:44 |
mriedem | dan rechecked https://review.opendev.org/#/c/634832/ hours ago with no results and it's not queued and i just rechecked it and it's still not queued | 20:44 |
mriedem | fungi: ^ | 20:44 |
*** bbowen has joined #openstack-nova | 20:45 | |
mriedem | my guess is because the comment doesn't start with "recheck", it starts with "(3 comments)" | 20:45 |
mriedem | yup, now it's queued | 20:45 |
mriedem | is that new behavior? | 20:45 |
*** ociuhandu has joined #openstack-nova | 20:46 | |
fungi | nope, it's just the way zuul is configured via a regular expression on the text of the comment event gerrit emits | 20:46 |
mriedem | could have sworn i've issued rechecks while leaving comments before, but can't say for certain | 20:47 |
fungi | if you leave a vote at the same time you add a recheck comment, it won't match the regex | 20:47 |
mriedem | ok | 20:48 |
fungi | https://opendev.org/openstack/project-config/src/branch/master/zuul.d/pipelines.yaml#L24 | 20:48 |
fungi | that's the current regex | 20:48 |
*** tbachman has joined #openstack-nova | 20:48 | |
fungi | i've noticed before that it ignores the recheck if i leave a vote with the same comment | 20:48 |
fungi | i've never dug in with a sample comment event to see if that regex could be extended to accommodate it | 20:49 |
mriedem | efried: are you working through bug 1841481 ? | 20:50 |
openstack | bug 1841481 in OpenStack Compute (nova) "Race during ironic re-balance corrupts local RT ProviderTree and compute_nodes cache" [Medium,In progress] https://launchpad.net/bugs/1841481 - Assigned to Matt Riedemann (mriedem) | 20:50 |
efried | mriedem: stale, but in my backlog | 20:50 |
efried | is it time to get back to it? | 20:50 |
mriedem | well, was just going to point out https://review.opendev.org/#/c/684840/ and above | 20:50 |
efried | I feel guilty any time I do real code. | 20:50 |
efried | there's always some f'in ptl thing that needs doing. | 20:50 |
mriedem | your series deals with the corrupt provider tree cache, mine deals with the corrupt RT.compute_nodes cache | 20:51 |
efried | so they need to be combined? | 20:51 |
efried | or at least reconciled | 20:51 |
*** pcaruana has quit IRC | 20:51 | |
mriedem | last i looked at yours i said, | 20:53 |
mriedem | "Simply dealing with the ResourceTracker.compute_nodes invalid cache (issue #1 in the bug report) resolves the issue assuming the ProviderTree associations are considered stale. If the associations are stale in the ProviderTree cache, we likely still have a problem which is what Eric's series here is dealing with (but Eric's series doesn't deal with the ResourceTracker.compute_nodes aspect of the bug)." | 20:53 |
mriedem | i'm not totally sure my functional recreate test hits all of the nuance with the provider tree cache but it does check _associations_stale | 20:54 |
mriedem | the actual RT.compute_nodes cache fix is simple https://review.opendev.org/#/c/684849/2/nova/compute/resource_tracker.py | 20:55 |
mriedem | don't cache the node unless RT._update is OK | 20:55 |
*** dpawlik has quit IRC | 20:55 | |
*** tbachman has quit IRC | 21:02 | |
mriedem | http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Skipping%20removal%20of%20allocations%20for%20deleted%20instances%3A%20Failed%20to%20retrieve%20allocations%20for%20resource%20provider%5C%22%20AND%20message%3A%5C%22No%20resource%20provider%20with%20uuid%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22%20AND%20project%3A%5C%22openstack%2Fironic%5C%22&from=7d | 21:04 |
mriedem | we do see it in ironic multinode jobs | 21:04 |
*** dpawlik has joined #openstack-nova | 21:06 | |
mriedem | maybe the functional test is too much in there, idk - could just do a simple unit test to make sure we don't save the node in RT.compute_nodes if _update fails like we did here https://review.opendev.org/#/c/675704/ | 21:08 |
*** dpawlik has quit IRC | 21:11 | |
openstackgerrit | Dan Smith proposed openstack/nova master: Add image caching API for aggregates https://review.opendev.org/687140 | 21:14 |
efried | clearly I'll need to set aside some time to swap that all in | 21:32 |
efried | which isn't today unfortunately. | 21:32 |
*** ociuhandu has quit IRC | 21:40 | |
*** TxGirlGeek has quit IRC | 21:46 | |
*** markvoelker has joined #openstack-nova | 22:00 | |
*** markvoelker has quit IRC | 22:04 | |
*** ivve has quit IRC | 22:12 | |
*** rcernin has joined #openstack-nova | 22:30 | |
*** francoisp has quit IRC | 22:35 | |
*** bnemec has quit IRC | 22:38 | |
*** jmlowe has joined #openstack-nova | 22:41 | |
*** tbachman has joined #openstack-nova | 22:44 | |
openstackgerrit | Merged openstack/nova master: Add cache_images() to conductor https://review.opendev.org/687139 | 22:56 |
openstackgerrit | Merged openstack/nova master: Fix legacy issues in filter migrations by user_id/project_id https://review.opendev.org/682198 | 22:56 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Add negative test to delete server during cross-cell resize claim https://review.opendev.org/688832 | 23:10 |
*** tkajinam has joined #openstack-nova | 23:16 | |
mriedem | drats, have to rebase the cross-cell resize series again | 23:16 |
mriedem | dansmith: we can talk about it tomorrow or whenever but i wrote that negative test you asked for https://review.opendev.org/688832 and it exposes a latent bug in how MigrationTask.rollback works, which i think affects same-cell resize as well wrt leaked allocations on the source host | 23:17 |
mriedem | but i'm basically done for the day as well | 23:17 |
*** macz has quit IRC | 23:22 | |
*** gbarros has quit IRC | 23:23 | |
*** mriedem has quit IRC | 23:27 | |
*** mlavalle has quit IRC | 23:35 | |
*** eharney has quit IRC | 23:41 | |
*** markvoelker has joined #openstack-nova | 23:45 | |
*** brinzhang has joined #openstack-nova | 23:48 | |
*** brinzhang_ has joined #openstack-nova | 23:51 | |
*** brinzhang has quit IRC | 23:54 | |
*** markvoelker has quit IRC | 23:55 |
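
Note on the 19:01 to 20:12 discussion of bug 1836754: the two deletion strategies being compared (a GET followed by a PUT of allocations={} guarded by the consumer generation, versus an unconditional DELETE selected by a force flag) can be sketched roughly as below. This is only an illustration against a made-up in-memory stand-in, not nova's report client or the change proposed in 688802; `FakePlacement`, `ConsumerGenerationConflict`, `delete_allocation_sketch` and the `racer` hook are all hypothetical names introduced for the example.

```python
class ConsumerGenerationConflict(Exception):
    """Hypothetical stand-in for the 409 returned on a generation mismatch."""


class FakePlacement:
    """Hypothetical in-memory stand-in for a generation-checked allocation store."""

    def __init__(self):
        self._allocations = {}  # consumer uuid -> allocations dict
        self._generations = {}  # consumer uuid -> consumer generation

    def get(self, consumer):
        # Returns (allocations, generation); generation is None for a new consumer.
        return self._allocations.get(consumer, {}), self._generations.get(consumer)

    def put(self, consumer, allocations, generation):
        # Reject the write unless the caller saw the latest generation.
        if self._generations.get(consumer) != generation:
            raise ConsumerGenerationConflict(consumer)
        if not allocations:
            # An empty allocations dict removes the consumer entirely.
            self.delete(consumer)
            return
        self._allocations[consumer] = allocations
        self._generations[consumer] = 0 if generation is None else generation + 1

    def delete(self, consumer):
        # Unconditional removal: no generation check at all.
        self._allocations.pop(consumer, None)
        self._generations.pop(consumer, None)


def delete_allocation_sketch(placement, consumer, force=False, racer=None):
    """Delete allocations, optionally ignoring concurrent updates.

    racer is a test-only hook (an assumption of this sketch) that simulates
    another writer landing in the window between the GET and the PUT.
    """
    if force:
        # The caller does not care about conflicts: one call, no race window.
        placement.delete(consumer)
        return
    _, generation = placement.get(consumer)
    if racer is not None:
        racer()
    # Raises ConsumerGenerationConflict if someone else wrote in between.
    placement.put(consumer, {}, generation)
```

With force=False the conflict surfaces to the caller, who decides what to do with it; with force=True the deletion goes through regardless and takes a single call, which is the trade-off efried and mriedem were weighing.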