*** tosky has quit IRC | 00:01 | |
*** macz_ has quit IRC | 00:15 | |
*** bbowen has quit IRC | 00:36 | |
*** bbowen has joined #openstack-nova | 00:37 | |
*** LinPeiWen has joined #openstack-nova | 00:40 | |
*** mlavalle has quit IRC | 00:47 | |
*** kevinz has joined #openstack-nova | 01:03 | |
*** LinPeiWen has quit IRC | 01:08 | |
*** LinPeiWen94 has joined #openstack-nova | 01:19 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Replaces tenant_id with project_id from List/Update Servers APIs https://review.opendev.org/c/openstack/nova/+/764292 | 01:28 |
openstackgerrit | Brin Zhang proposed openstack/nova master: Replace all_tenants with all_projects in List Server APIs https://review.opendev.org/c/openstack/nova/+/765311 | 01:28 |
*** dklyle has quit IRC | 01:40 | |
*** tkajinam has quit IRC | 01:41 | |
*** tkajinam has joined #openstack-nova | 01:42 | |
*** dklyle has joined #openstack-nova | 01:48 | |
*** chengsheng1 is now known as chengsheng | 02:00 | |
*** tinwood has quit IRC | 02:08 | |
*** tkajinam has quit IRC | 02:09 | |
*** tkajinam has joined #openstack-nova | 02:10 | |
*** tinwood has joined #openstack-nova | 02:11 | |
*** macz_ has joined #openstack-nova | 02:16 | |
*** zenkuro has quit IRC | 02:18 | |
*** macz_ has quit IRC | 02:21 | |
*** ccstone has quit IRC | 02:26 | |
*** ccstone has joined #openstack-nova | 02:26 | |
*** zzzeek has quit IRC | 02:30 | |
*** spatel has joined #openstack-nova | 02:31 | |
*** zzzeek has joined #openstack-nova | 02:31 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Replaces tenant_id with project_id from Rebuild Server API https://review.opendev.org/c/openstack/nova/+/766380 | 02:35 |
*** zzzeek has quit IRC | 02:36 | |
*** zzzeek has joined #openstack-nova | 02:40 | |
*** dklyle has quit IRC | 02:45 | |
*** hamalq has quit IRC | 02:56 | |
*** rcernin has quit IRC | 02:57 | |
*** sapd1 has joined #openstack-nova | 02:58 | |
*** sapd1 has quit IRC | 03:03 | |
*** mkrai has joined #openstack-nova | 03:04 | |
*** rcernin has joined #openstack-nova | 03:18 | |
*** rcernin has quit IRC | 03:21 | |
*** rcernin has joined #openstack-nova | 03:21 | |
*** psachin has joined #openstack-nova | 03:33 | |
*** sapd1 has joined #openstack-nova | 03:36 | |
*** swp20 has quit IRC | 03:44 | |
*** swp20 has joined #openstack-nova | 03:45 | |
*** sapd1 has quit IRC | 04:00 | |
*** sapd1 has joined #openstack-nova | 04:09 | |
*** rcernin has quit IRC | 04:35 | |
*** rcernin has joined #openstack-nova | 04:35 | |
*** ratailor has joined #openstack-nova | 04:52 | |
openstackgerrit | sean mooney proposed openstack/os-traits master: add vdpa trait https://review.opendev.org/c/openstack/os-traits/+/770530 | 04:57 |
*** sapd1 has quit IRC | 05:01 | |
*** gyee has quit IRC | 05:08 | |
*** vishalmanchanda has joined #openstack-nova | 05:11 | |
openstackgerrit | sean mooney proposed openstack/os-traits master: add vdpa trait https://review.opendev.org/c/openstack/os-traits/+/770530 | 05:12 |
openstackgerrit | sean mooney proposed openstack/nova master: [WIP] add vdpa nodedev parsing and interface config gen https://review.opendev.org/c/openstack/nova/+/770532 | 05:14 |
openstackgerrit | sean mooney proposed openstack/nova master: [WIP] add vdpa trait reporting. https://review.opendev.org/c/openstack/nova/+/770533 | 05:14 |
openstackgerrit | sean mooney proposed openstack/nova master: add constants for vnic type vdpa https://review.opendev.org/c/openstack/nova/+/770474 | 05:21 |
openstackgerrit | sean mooney proposed openstack/nova master: [WIP] add vdpa nodedev parsing and interface config gen https://review.opendev.org/c/openstack/nova/+/770532 | 05:21 |
openstackgerrit | sean mooney proposed openstack/nova master: [WIP] add vdpa trait reporting. https://review.opendev.org/c/openstack/nova/+/770533 | 05:21 |
*** alex_xu has joined #openstack-nova | 05:26 | |
*** rcernin_ has joined #openstack-nova | 05:42 | |
*** rcernin has quit IRC | 05:42 | |
openstackgerrit | sean mooney proposed openstack/nova master: [WIP] add vdpa prefilter https://review.opendev.org/c/openstack/nova/+/770534 | 05:47 |
*** sapd1 has joined #openstack-nova | 05:55 | |
*** spatel has quit IRC | 05:58 | |
*** hemanth_n has joined #openstack-nova | 06:47 | |
*** mkrai has quit IRC | 06:54 | |
*** mkrai has joined #openstack-nova | 06:54 | |
*** mkrai has quit IRC | 07:13 | |
*** ralonsoh has joined #openstack-nova | 07:19 | |
*** zzzeek has quit IRC | 07:28 | |
*** rcernin_ has quit IRC | 07:28 | |
*** zzzeek has joined #openstack-nova | 07:31 | |
*** openstackgerrit has quit IRC | 07:47 | |
*** mkrai has joined #openstack-nova | 07:52 | |
*** nightmare_unreal has joined #openstack-nova | 07:53 | |
gibi | good morning | 07:55 |
*** slaweq has joined #openstack-nova | 07:59 | |
*** slaweq has quit IRC | 08:04 | |
*** rcernin_ has joined #openstack-nova | 08:06 | |
*** slaweq has joined #openstack-nova | 08:10 | |
*** andrewbonney has joined #openstack-nova | 08:13 | |
*** mkrai has quit IRC | 08:13 | |
*** tesseract has joined #openstack-nova | 08:17 | |
*** rpittau|afk is now known as rpittau | 08:25 | |
*** rcernin_ has quit IRC | 08:26 | |
*** tosky has joined #openstack-nova | 08:39 | |
*** mkrai has joined #openstack-nova | 08:44 | |
lyarwood | Morning | 08:48 |
gibi | lyarwood: melwitt explained one of my questions in the detach patch, so I have things to do with that patch, but if you have any other hints about the open questions then I would be glad to discuss | 08:49 |
lyarwood | gibi: I've just got the change open now, let me take a look | 08:51 |
gibi | cool | 08:51 |
gibi | I promise I will not disappear now for a couple of hours :) | 08:52 |
*** songwenping_ has joined #openstack-nova | 09:12 | |
lyarwood | gibi: okay updated, I need to check if there's an internal libvirt timeout for these detach events | 09:12 |
*** swp20 has quit IRC | 09:14 | |
lyarwood | gibi: ah nope, it's raised on a sync failure, there's no async checking within libvirtd that raises it | 09:15 |
lyarwood | I didn't post my comments anyway, doh! | 09:16 |
*** dasp_ has quit IRC | 09:16 | |
*** dasp has joined #openstack-nova | 09:17 | |
gibi | lyarwood: thanks | 09:18 |
gibi | lyarwood: yeah, the persistent/live error comes synchronously | 09:19 |
gibi | lyarwood: do you happen to know whether, when we check that the device is in the domain, that check looks at the live domain? | 09:20 |
lyarwood | gibi: iirc we use XMLDesc(0) to dump the domain and that's the live config | 09:23 |
lyarwood | gibi: there was a bug about this for paused instances iirc | 09:23 |
lyarwood | gibi: where we need to provide the VIR_DOMAIN_XML_INACTIVE flag https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainXMLFlags | 09:24 |
gibi | lyarwood: thanks, so we check the live config, that's good; then if the sync error comes we can simply check the live domain, and if the device is still there we can retry | 09:27 |
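A minimal sketch of the check gibi describes: given the live domain XML (lyarwood notes nova dumps it with `XMLDesc(0)`), decide whether a device is still attached. The helper name `device_in_domain_xml`, the sample XML, and matching disks by their `<target dev=...>` are assumptions for illustration, not nova's actual code.

```python
import xml.etree.ElementTree as ET


def device_in_domain_xml(domain_xml: str, target_dev: str) -> bool:
    """Return True if a <disk> whose <target dev=...> matches is present.

    Hypothetical helper: `domain_xml` is assumed to be the live config,
    e.g. the string returned by virDomain.XMLDesc(0).
    """
    root = ET.fromstring(domain_xml)
    # Only disks under <devices> are inspected; other device kinds are ignored.
    for target in root.findall("./devices/disk/target"):
        if target.get("dev") == target_dev:
            return True
    return False


# Illustrative live XML with a root disk (vda) and one data disk (vdb).
LIVE_XML = """
<domain type='kvm'>
  <name>instance-00000001</name>
  <devices>
    <disk type='file' device='disk'><target dev='vda' bus='virtio'/></disk>
    <disk type='block' device='disk'><target dev='vdb' bus='virtio'/></disk>
  </devices>
</domain>
"""

print(device_in_domain_xml(LIVE_XML, "vdb"))  # True: still attached, so retry
print(device_in_domain_xml(LIVE_XML, "vdc"))  # False: nothing to retry
```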
lyarwood | gibi: yeah I'd continue to retry on a direct sync error if the device is still there, VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED (but that should be a direct sync failure?) and a configurable timeout within n-cpu | 09:29 |
gibi | VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED is the failed event | 09:29 |
gibi | so that is async | 09:29 |
gibi | I can unify the retry: if we get a sync or async failure and the device is still in the live domain, then we retry | 09:30 |
lyarwood | right sorry my point was that within libvirt at least it looks like that's only actually raised synchronously with the failure of the initial request to QEMU and that should bubble up directly to our call to libvirt | 09:30 |
lyarwood | yup cool that works | 09:31 |
lyarwood | I'm likely missing something in the libvirt code anyway regarding where VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED is being raised so that sounds like the best approach | 09:31 |
gibi | ack, thanks for the help | 09:32 |
gibi | if you get the libvirt timeout value for detach event then let me know and I will update the nova timeout value to be bigger | 09:32 |
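The unified retry gibi proposes (treat a sync error, an async removal-failed event, or a timeout the same way, and retry only while the device is still in the live domain) could look roughly like this. Everything here is hypothetical: the three callbacks stand in for the libvirt detach call, waiting on the device-removed event, and the live-domain check, and `max_attempts`/`event_timeout` are placeholder knobs, not real nova config options.

```python
from typing import Callable


class DetachFailed(Exception):
    """Raised when the device is still attached after every attempt."""


def detach_with_retry(
    request_detach: Callable[[], None],
    wait_for_removal: Callable[[float], bool],
    device_still_present: Callable[[], bool],
    max_attempts: int = 8,
    event_timeout: float = 20.0,
) -> int:
    """Retry a device detach until the device leaves the live domain.

    Hypothetical callbacks:
      request_detach()        issue the detach; raises on a sync failure
      wait_for_removal(t)     wait up to t seconds; True only if a
                              device-removed event arrived
      device_still_present()  check the live domain XML
    Returns the attempt number on which the device was gone.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            request_detach()
        except Exception:
            # Sync failure (libvirt would raise libvirtError): skip waiting
            # for an event and go straight to the live-domain check.
            pass
        else:
            if wait_for_removal(event_timeout):
                return attempt
        # Sync error, removal-failed event or timeout are all handled the
        # same way: stop if the device is gone, otherwise try again.
        if not device_still_present():
            return attempt
    raise DetachFailed(f"device still attached after {max_attempts} attempts")


# Demo: the first request fails synchronously, the second times out,
# and the device-removed event arrives for the third.
state = {"calls": 0}


def fake_detach():
    state["calls"] += 1
    if state["calls"] == 1:
        raise RuntimeError("simulated sync failure")


def fake_wait(timeout):
    return state["calls"] >= 3


def fake_present():
    return state["calls"] < 3


print(detach_with_retry(fake_detach, fake_wait, fake_present))  # 3
```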
lyarwood | kashyap: https://review.opendev.org/c/openstack/nova/+/770246 ; you might be interested in this, gibi is trying to rewrite our detach device logic in the libvirt driver to use events. I've made some comments in the change but if you have any more context feel free to add it there. | 09:35 |
*** derekh has joined #openstack-nova | 09:35 | |
* lyarwood will try to join the libvirt channel again later today and ask about the behaviour of the events when we fail to detach | 09:35 | |
kashyap | lyarwood: Yeah, was just skimming the chat here. Was responding to something downstream that was breathing down my neck | 09:35 |
lyarwood | np | 09:35 |
lyarwood | switching topics, stephenfin how's your SQL/sqlalchemy foo? trying to work out if 1. the following is a valid query for a nova-status command and 2. if it would work in sqlalchemy. | 09:37 |
lyarwood | select distinct instances.uuid from instances left join instance_system_metadata on instances.uuid = instance_system_metadata.instance_uuid where instances.uuid not in (select instance_system_metadata.instance_uuid from instance_system_metadata where instance_system_metadata.key = 'hw_machine_type'); | 09:37 |
lyarwood | tl;dr I'm trying to list the instance uuids that *don't* have a `hw_machine_type` key set in instance_system_metadata | 09:38 |
lyarwood | and it has been waaaaaaaaaaaay too long since I wrote any SQL so this might be entirely wrong | 09:38 |
*** songwenping_ has quit IRC | 09:39 | |
*** songwenping_ has joined #openstack-nova | 09:39 | |
kashyap | gibi: Thx for taking up that; I just skimmed the patch. I'll look deeper; once I switch context. | 09:40 |
gibi | kashyap: thanks | 09:41 |
lyarwood | oh and that reminds me, sean-k-mooney, you know how you asked if we could stash image metadata properties in instance_system_metadata? Well they are already there. | 09:41 |
stephenfin | lyarwood: It's not my strongest skill, but that does look reasonable to me. I don't think the subquery is necessary, but the syntax I'm thinking of could be backend-specific | 09:42 |
lyarwood | sean-k-mooney: https://github.com/openstack/nova/blob/e6f5e814050a19d6f027037424556b2889514ec3/nova/objects/image_meta.py#L113-L127 | 09:42 |
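As a rough illustration of how those image properties sit in `instance_system_metadata`: each one is stored under an `image_` prefix, so recovering them is a prefix strip. The helper below is hypothetical (nova has its own utilities for this, see the link above), and the sample keys are illustrative; it also shows that a separately recorded `hw_machine_type` key is not picked up as an image property.

```python
IMAGE_PREFIX = "image_"  # prefix used for image properties in system_metadata


def image_props_from_sysmeta(sysmeta: dict) -> dict:
    """Hypothetical helper: return image properties with the prefix stripped."""
    return {
        key[len(IMAGE_PREFIX):]: value
        for key, value in sysmeta.items()
        if key.startswith(IMAGE_PREFIX)
    }


sysmeta = {
    "image_hw_machine_type": "pc-q35-5.2",
    "image_hw_disk_bus": "virtio",
    "hw_machine_type": "pc-q35-5.2",  # recorded separately, not an image prop
}
print(image_props_from_sysmeta(sysmeta))
# {'hw_machine_type': 'pc-q35-5.2', 'hw_disk_bus': 'virtio'}
```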
lyarwood | stephenfin: yeah I couldn't work out the SQL to select instances.uuid where instance_system_metadata.key doesn't contain 'hw_machine_type' | 09:43 |
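One way to express "instances with no `hw_machine_type` row" without the `NOT IN` subquery is a LEFT JOIN anti-join: move the key filter into the `ON` clause and keep only rows where nothing matched. Whether this is the syntax stephenfin had in mind is a guess. The sketch runs both forms against an in-memory SQLite database with a heavily simplified schema (the real nova tables also carry ids, timestamps and soft-delete columns, ignored here) and assumes at most one `hw_machine_type` row per instance, so the anti-join needs no `DISTINCT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instances (uuid TEXT PRIMARY KEY);
CREATE TABLE instance_system_metadata (
    instance_uuid TEXT, key TEXT, value TEXT);
INSERT INTO instances VALUES ('a'), ('b'), ('c');
INSERT INTO instance_system_metadata VALUES
    ('a', 'hw_machine_type', 'pc-q35-5.2'),
    ('a', 'image_hw_disk_bus', 'virtio'),
    ('b', 'image_hw_disk_bus', 'virtio');
""")

# lyarwood's query from the log (NOT IN subquery form).
SUBQUERY = """
SELECT DISTINCT instances.uuid FROM instances
LEFT JOIN instance_system_metadata
  ON instances.uuid = instance_system_metadata.instance_uuid
WHERE instances.uuid NOT IN (
  SELECT instance_system_metadata.instance_uuid
  FROM instance_system_metadata
  WHERE instance_system_metadata.key = 'hw_machine_type')
"""

# Anti-join: the key filter lives in ON, so instances without that key
# still produce one row, with NULLs on the metadata side.
ANTI_JOIN = """
SELECT instances.uuid FROM instances
LEFT JOIN instance_system_metadata AS ism
  ON ism.instance_uuid = instances.uuid AND ism.key = 'hw_machine_type'
WHERE ism.instance_uuid IS NULL
"""

for name, sql in (("subquery", SUBQUERY), ("anti-join", ANTI_JOIN)):
    rows = sorted(row[0] for row in conn.execute(sql))
    print(name, rows)  # both print ['b', 'c']
```

In the subquery form the outer LEFT JOIN does not change the result; it only produces duplicate rows that `DISTINCT` then removes, so it can be dropped. The anti-join should map fairly directly onto an SQLAlchemy `outerjoin` with an `and_(...)` condition plus an `.is_(None)` filter, though that translation is untested here.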
*** zenkuro has joined #openstack-nova | 09:43 | |
*** hoonetorg has joined #openstack-nova | 09:44 | |
lyarwood | stephenfin: I'll convert this into sqla for now and go from there, thanks | 09:45 |
stephenfin | lyarwood: 0c441e636ba9d287909584b6ddf15eab5d479f0e would be good prior art also | 09:47 |
stephenfin | If not an exact match, at least it might help in terms of wiring up the machinery for an online migration | 09:48 |
lyarwood | stephenfin: I wasn't going to write an online migration for this | 09:49 |
lyarwood | stephenfin: this is something n-cpu will populate at startup | 09:49 |
lyarwood | stephenfin: and nova-status can warn about later prior to changing defaults | 09:50 |
stephenfin | ah, gotcha | 09:50 |
lyarwood | stephenfin: I don't see a query in that change FWIW | 09:50 |
lyarwood | well not a join etc | 09:50 |
*** openstackgerrit has joined #openstack-nova | 10:28 | |
openstackgerrit | YumengBao proposed openstack/os-traits master: add owner traits for accelerator resources https://review.opendev.org/c/openstack/os-traits/+/770569 | 10:28 |
*** tesseract has quit IRC | 10:31 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Replaces tenant_id with project_id from List SG API https://review.opendev.org/c/openstack/nova/+/766726 | 10:32 |
*** tesseract has joined #openstack-nova | 10:33 | |
gibi | brinzhang, alex_xu: responded in https://review.opendev.org/c/openstack/nova/+/729563 (finally) | 10:34 |
openstackgerrit | Kashyap Chamarthy proposed openstack/os-traits master: Add a trait for UEFI Secure Boot support https://review.opendev.org/c/openstack/os-traits/+/770570 | 10:34 |
*** ociuhandu has joined #openstack-nova | 10:40 | |
*** dtantsur|afk is now known as dtantsur | 10:41 | |
openstackgerrit | Stephen Finucane proposed openstack/python-novaclient master: Add support for microversion v2.88 https://review.opendev.org/c/openstack/python-novaclient/+/770573 | 10:42 |
brinzhang | gibi: so we won't merge this patch, right? | 10:49 |
*** hemanth_n has quit IRC | 11:01 | |
*** songwenping__ has joined #openstack-nova | 11:02 | |
*** songwenping_ has quit IRC | 11:05 | |
*** mkrai has quit IRC | 11:26 | |
*** ociuhandu has quit IRC | 11:36 | |
*** zenkuro has quit IRC | 11:40 | |
*** zenkuro has joined #openstack-nova | 11:41 | |
gibi | brinzhang: we need a separate bugfix, that is all I said | 11:46 |
brinzhang | IMO, the bug fix should not prevent this patch from going in | 11:48 |
brinzhang | we should register a bug, and then fix it | 11:48 |
gibi | brinzhang: yepp, that works for me | 11:51 |
brinzhang | gibi: thanks, I hope we can get this patch merged; it also matches what alex_xu meant | 11:53 |
*** sapd1 has quit IRC | 12:01 | |
*** mgariepy has quit IRC | 12:04 | |
*** raildo has joined #openstack-nova | 12:16 | |
*** ratailor has quit IRC | 12:26 | |
sean-k-mooney | lyarwood: i know the image metadata is in the instance_system_metadata table | 12:26 |
sean-k-mooney | lyarwood: that's why i wanted you to set the value there | 12:26 |
sean-k-mooney | they are just prefixed with image_ | 12:27 |
lyarwood | sean-k-mooney: ah I thought you were also suggesting that we dump all of the image metadata props in there as well | 12:27 |
lyarwood | sean-k-mooney: I just missed that they were there already, prefixed by image_ as you said | 12:28 |
sean-k-mooney | yeah so i was suggesting setting the effective values of all image props there instead of just the set values | 12:28 |
sean-k-mooney | e.g. if you don't have hw_vif_model set today it will normally default to virtio | 12:29 |
lyarwood | ah right | 12:29 |
sean-k-mooney | so we would store image_hw_vif_model=virtio | 12:29 |
sean-k-mooney | or whatever it is | 12:29 |
sean-k-mooney | as if it had been set | 12:29 |
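A sketch of sean-k-mooney's "effective values" idea: merge whatever the image sets over the defaults the driver would otherwise apply, then store the result under the `image_` prefix lyarwood mentions, so later code has a single place to look. `DRIVER_DEFAULTS` and `record_effective_props` are hypothetical, and `virtio` is used only because it is the example given above.

```python
IMAGE_PREFIX = "image_"

# Hypothetical defaults a driver might apply when the image leaves them unset.
DRIVER_DEFAULTS = {"hw_vif_model": "virtio", "hw_disk_bus": "virtio"}


def record_effective_props(image_props: dict, defaults: dict) -> dict:
    """Return system_metadata entries holding the effective image properties.

    Explicitly set image properties win; unset ones are filled in from
    `defaults`, so they are stored as if they had been set.
    """
    effective = {**defaults, **image_props}
    return {IMAGE_PREFIX + key: value for key, value in effective.items()}


print(record_effective_props({"hw_disk_bus": "scsi"}, DRIVER_DEFAULTS))
# {'image_hw_vif_model': 'virtio', 'image_hw_disk_bus': 'scsi'}
```

The trade-off discussed next in the log applies here too: once defaults are materialised, a later change to the driver default no longer affects existing instances.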
lyarwood | FWIW I'm not overwriting image_hw_machine_type at the moment | 12:30 |
lyarwood | I'm just dumping it into hw_machine_type | 12:30 |
lyarwood | image_hw_machine_type just remains on the original value | 12:30 |
sean-k-mooney | ya you could do that; the only issue with that approach is you now have to check both in the code | 12:30 |
sean-k-mooney | well if image_hw_machine_type was set then you would not be setting hw_machine_type | 12:31 |
sean-k-mooney | since you only need to set that if the machine type is not set in the image | 12:31 |
lyarwood | it's just copied from image_meta in that case | 12:31 |
sean-k-mooney | yep | 12:32 |
sean-k-mooney | i was trying to avoid having two sources of truth | 12:32 |
sean-k-mooney | e.g. image_hw_machine_type and hw_machine_type | 12:32 |
lyarwood | image_hw_machine_type is just the original, hw_machine_type is the single source of truth from now on | 12:33 |
lyarwood | as we can change it over time etc | 12:33 |
sean-k-mooney | well no we can't, that's the point | 12:33 |
sean-k-mooney | if it's set in the image it can't be changed | 12:33 |
lyarwood | you can through the versioned machine types | 12:33 |
sean-k-mooney | no, if it's set in the image that's it, we don't use the config values at all | 12:34 |
lyarwood | why? moving forward through the versioned machine types provides a stable ABI etc | 12:34 |
sean-k-mooney | it would break backwards compatibility with the existing usage | 12:35 |
lyarwood | how? | 12:35 |
kashyap | Yes to what lyarwood said on versioned machine types | 12:35 |
kashyap | sean-k-mooney: When talking of this topic, a clear example would make sure we're not talking of different things. | 12:35 |
sean-k-mooney | the existing usage is that if you set a versioned machine type in the image metadata it will have that version for the lifetime of the instance | 12:35 |
sean-k-mooney | if you set the unversioned one then it will use the latest version on the host it spawns on | 12:35 |
sean-k-mooney | we should not be changing that behavior in your spec | 12:36 |
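The versioned/unversioned distinction sean-k-mooney describes can be illustrated with a small resolver: an alias such as `q35` maps to the newest matching versioned type the host supports, while a versioned name is returned unchanged. The alias table (x86 `q35` and `pc` only), the host list and the function name are hypothetical, and downstream names such as `pc-q35-rhel8.2.0` are deliberately not handled.

```python
import re

# Hypothetical alias -> versioned-name prefix mapping (x86 only).
ALIASES = {"q35": "pc-q35-", "pc": "pc-i440fx-"}


def resolve_machine_type(requested: str, host_types: list) -> str:
    """Map an unversioned alias to the newest versioned type on this host.

    Versioned names (e.g. 'pc-q35-4.2') are returned unchanged, so they
    stay fixed for the lifetime of the instance.
    """
    prefix = ALIASES.get(requested)
    if prefix is None:
        return requested
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.(\d+)$")
    candidates = []
    for name in host_types:
        match = pattern.match(name)
        if match:
            # Compare numerically so that 5.2 sorts above 2.11.
            candidates.append(((int(match.group(1)), int(match.group(2))), name))
    if not candidates:
        raise ValueError(f"host has no versioned type for alias {requested!r}")
    return max(candidates)[1]


HOST = ["pc-i440fx-5.2", "pc-q35-4.2", "pc-q35-5.2", "pc-q35-2.11"]
print(resolve_machine_type("q35", HOST))         # pc-q35-5.2
print(resolve_machine_type("pc-q35-4.2", HOST))  # pc-q35-4.2
```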
lyarwood | the only part I'm changing is that instead of being fixed for the lifetime of the instance, operators can now update the versioned machine type | 12:36 |
lyarwood | aliases from the image would stay, I don't switch them out for the versioned machine types etc | 12:37 |
sean-k-mooney | lyarwood: that was not part of the spec | 12:37 |
sean-k-mooney | we did not provide any mechanism to update the machine type over the instance lifetime | 12:38 |
sean-k-mooney | that is what the recreate api would provide | 12:38 |
lyarwood | that's between types, why would we ask users to rebuild for a version update? | 12:39 |
sean-k-mooney | operators could always update the versioned machine type by updating the config for instances that don't have hw_machine_type set | 12:39 |
lyarwood | and I'm pretty sure it's in the spec | 12:39 |
sean-k-mooney | lyarwood: recreate with the same image and flavor was meant to just update the metadata; it's not the same as rebuild in that case | 12:40 |
lyarwood | sean-k-mooney: ah sorry you're talking about an API that doesn't exist :) | 12:40 |
sean-k-mooney | yes, the part that was deferred/rejected at the ptg, meaning we had no agreed way to update the machine type | 12:41 |
lyarwood | sean-k-mooney: and I agree that would be nicer and would mean we wouldn't need a nova-manage command for this | 12:41 |
* lyarwood is being called for lunch, brb | 12:41 | |
sean-k-mooney | ya so in the spec the only way to change the machine type is via the nova manage command | 12:43 |
openstackgerrit | Vlad Gusev proposed openstack/nova stable/victoria: Use subqueryload() instead of joinedload() for (system_)metadata https://review.opendev.org/c/openstack/nova/+/761809 | 12:43 |
sean-k-mooney | but the image metadata still has precedence | 12:43 |
sean-k-mooney | so it only matters for vms without that | 12:43 |
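The precedence sean-k-mooney describes in this exchange, sketched as a function. This reflects his description only, not merged nova code, and the parameter names are hypothetical: a machine type set in the image always wins, otherwise the value recorded for the instance (which, per the spec as he reads it, only the nova-manage command would change), otherwise the configured default.

```python
from typing import Optional


def effective_machine_type(image_machine_type: Optional[str],
                           recorded_machine_type: Optional[str],
                           config_default: str) -> str:
    """Pick the machine type following the precedence described above."""
    if image_machine_type:
        # 1. Set in the image: never overridden.
        return image_machine_type
    if recorded_machine_type:
        # 2. Recorded for this instance (only relevant without an image value).
        return recorded_machine_type
    # 3. Nothing recorded yet: fall back to the configured default.
    return config_default


print(effective_machine_type("pc-q35-4.2", "pc-q35-5.2", "q35"))  # pc-q35-4.2
print(effective_machine_type(None, "pc-q35-5.2", "q35"))          # pc-q35-5.2
print(effective_machine_type(None, None, "q35"))                  # q35
```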
*** ociuhandu has joined #openstack-nova | 12:45 | |
*** spatel has joined #openstack-nova | 12:47 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Replaces tenant_id with project_id from Flavor Access APIs https://review.opendev.org/c/openstack/nova/+/767704 | 12:51 |
*** spatel has quit IRC | 12:52 | |
*** bbowen has quit IRC | 12:53 | |
brinzhang | bauzas: do you have time to check my responses to your comments in https://review.opendev.org/c/openstack/nova/+/729563? there is a -1 on it | 12:54 |
brinzhang | bauzas: it may take you some time, but thanks either way | 12:55 |
*** ociuhandu has quit IRC | 12:55 | |
*** ociuhandu has joined #openstack-nova | 12:56 | |
*** ociuhandu has quit IRC | 12:56 | |
*** vishalmanchanda has quit IRC | 13:01 | |
*** ociuhandu has joined #openstack-nova | 13:01 | |
*** brinzhang has quit IRC | 13:11 | |
*** brinzhang has joined #openstack-nova | 13:11 | |
*** mgariepy has joined #openstack-nova | 13:21 | |
*** bbowen has joined #openstack-nova | 13:21 | |
bauzas | brinzhang: ack, will look at your replies then | 13:31 |
*** whoami-rajat__ has joined #openstack-nova | 13:46 | |
*** links has joined #openstack-nova | 13:47 | |
*** links has quit IRC | 13:47 | |
*** nweinber has joined #openstack-nova | 13:55 | |
*** nweinber has quit IRC | 14:01 | |
*** liuyulong has joined #openstack-nova | 14:01 | |
*** nweinber has joined #openstack-nova | 14:02 | |
*** liuyulong has quit IRC | 14:03 | |
*** jmlowe has joined #openstack-nova | 14:26 | |
*** vishalmanchanda has joined #openstack-nova | 14:35 | |
*** mkrai has joined #openstack-nova | 14:40 | |
openstackgerrit | Takashi Natsume proposed openstack/python-novaclient master: Deprecate agent commands and APIs https://review.opendev.org/c/openstack/python-novaclient/+/769068 | 14:44 |
*** belmoreira has joined #openstack-nova | 14:53 | |
*** ociuhandu_ has joined #openstack-nova | 14:58 | |
*** zenkuro has quit IRC | 15:02 | |
*** ociuhandu has quit IRC | 15:02 | |
*** zenkuro has joined #openstack-nova | 15:03 | |
*** macz_ has joined #openstack-nova | 15:13 | |
*** macz_ has quit IRC | 15:17 | |
*** psachin has quit IRC | 15:28 | |
*** ociuhandu_ has quit IRC | 15:35 | |
*** ociuhandu has joined #openstack-nova | 15:35 | |
*** dklyle has joined #openstack-nova | 15:46 | |
*** sapd1 has joined #openstack-nova | 15:59 | |
*** macz_ has joined #openstack-nova | 16:02 | |
*** mkrai has quit IRC | 16:03 | |
*** mkrai_ has joined #openstack-nova | 16:03 | |
*** rnoriega_ is now known as rnoriega123 | 16:06 | |
*** rnoriega123 is now known as rnoriega_ | 16:06 | |
*** rnoriega_ is now known as rnoriega | 16:09 | |
*** ociuhandu_ has joined #openstack-nova | 16:11 | |
*** ociuhandu has quit IRC | 16:11 | |
melwitt | stephenfin, artom: would like to have your numa expert review on this patch (and the func test below it) please if you could spare some time this week https://review.opendev.org/c/openstack/nova/+/769614 | 16:19 |
artom | melwitt, will take a look tomorrow | 16:19 |
*** mgariepy has quit IRC | 16:20 | |
melwitt | thanks! | 16:20 |
artom | Hopefully before then, actually | 16:20 |
*** mkrai_ has quit IRC | 16:22 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP libvirt: Record the machine_type of instances in system_metadata https://review.opendev.org/c/openstack/nova/+/767533 | 16:29 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP nova-manage: Add commands for managing instance machine type https://review.opendev.org/c/openstack/nova/+/769548 | 16:29 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP nova-status: Add hw_machine_type check for libvirt instances https://review.opendev.org/c/openstack/nova/+/770643 | 16:29 |
*** slaweq has quit IRC | 16:34 | |
*** slaweq has joined #openstack-nova | 16:36 | |
*** adrianc has quit IRC | 16:41 | |
*** tosky has quit IRC | 16:41 | |
*** adrianc has joined #openstack-nova | 16:42 | |
*** tosky has joined #openstack-nova | 16:42 | |
*** tesseract has quit IRC | 16:54 | |
sean-k-mooney | artom: i haven't made any changes yet but i responded to your questions in https://review.opendev.org/c/openstack/nova-specs/+/764999/2/specs/wallaby/approved/libvirt-vdpa-support.rst can you re-review | 16:56 |
sean-k-mooney | i have some WIP patches up as well https://review.opendev.org/q/topic:%22vhost-vdpa%22+(status:open%20OR%20status:merged) | 16:57 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: DNM try to replace retry with libvirt event in detach https://review.opendev.org/c/openstack/nova/+/770246 | 16:57 |
sean-k-mooney | it's not fully complete as i still need to extend the pci tracker and the code where we create the pci request | 16:57 |
sean-k-mooney | and i need to test it locally and write tests and docs | 16:58 |
sean-k-mooney | but it has the outline of about 2 thirds of the code | 16:58 |
artom | sean-k-mooney, cool - I want to un-WIP my socket affinity spec, then I'll take a look | 16:59 |
*** gyee has joined #openstack-nova | 17:00 | |
*** ociuhandu_ has quit IRC | 17:06 | |
*** markguz_ has joined #openstack-nova | 17:08 | |
*** ociuhandu has joined #openstack-nova | 17:09 | |
markguz_ | Hi nova folks. The good people at #openstack-ironic thought it might be a good idea for me to ask about my problem here. | 17:10 |
markguz_ | I've got an issue where my ironic instance spawning is getting stuck for +/- 10mins just after i start deployment. The nova-scheduler picks the correct compute node and then the nova-compute node reports "Starting instance... _do_build_and_run_instance" | 17:12 |
markguz_ | then nothing happens for about 10 mins. then suddenly the process starts and deployment continues.. | 17:12 |
*** ociuhandu_ has joined #openstack-nova | 17:13 | |
markguz_ | TheJulia over at ironic thinks that the process is getting stuck at the scheduling stage. The vm reports Building, but the task status sits at "none" during that 10mins | 17:13 |
markguz_ | For VMs there is no delay, only for BMs. This is Rocky, and it is not a busy deployment. Very little activity going on. We can go days without spawning bm or vm instances | 17:14 |
markguz_ | I've been trying to dig around the code to see what happens when the compute node reports "Starting instance... _do_build_and_run_instance" but it's very hard to follow | 17:15 |
markguz_ | If anyone could help me follow the rabbit through the rabbit hole and see exactly what is happening, I'd be most appreciative | 17:16 |
*** ociuhandu has quit IRC | 17:16 | |
*** ociuhandu_ has quit IRC | 17:17 | |
*** mgariepy has joined #openstack-nova | 17:20 | |
*** ociuhandu has joined #openstack-nova | 17:22 | |
melwitt | markguz_: if it's landed on the compute node already, we don't consider that to be "scheduling" as it's already been scheduled/placed. but to dig into this further you'll want to trace the request id of the line that says "Starting instance" in the nova-compute log and see if you can see where it stops making progress. that would help | 17:23 |
sean-k-mooney | if this is ironic, by the way, when we get to do_build_and_run_instance at some point the compute manager will hand off to the ironic driver, which will call ironic to provision the node | 17:25 |
melwitt | right. would want to look and see if he can verify it's gotten to that point | 17:26 |
sean-k-mooney | https://opendev.org/openstack/nova/src/branch/master/nova/compute/manager.py#L2186 starting to build instance is right at the top | 17:26 |
*** ociuhandu has quit IRC | 17:27 | |
sean-k-mooney | then we save the task state as None and vm state building | 17:27 |
melwitt | yeah, I know. just saying he can trace the request id to see how far it gets | 17:27 |
melwitt | after that | 17:27 |
markguz_ | here's a grep of the req id for an instance out of the logs http://paste.openstack.org/show/801601/ | 17:29 |
markguz_ | nothing between 9.40 and 10.05 | 17:29 |
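To find where a request stalls, as melwitt suggests, one option is to collect every line carrying the request id and report the largest gap between consecutive timestamps. This sketch assumes the lines are in time order and start with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp (the usual oslo.log layout; adjust the slice if yours differs); the sample lines and the `req-1234` id are made up.

```python
from datetime import datetime


def largest_gap(lines, request_id):
    """Return (seconds, line_before, line_after) for the longest silence."""
    stamped = []
    for line in lines:
        if request_id in line:
            # First 23 characters assumed to be 'YYYY-MM-DD HH:MM:SS.mmm'.
            ts = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
            stamped.append((ts, line))
    best = (0.0, None, None)
    for (t1, l1), (t2, l2) in zip(stamped, stamped[1:]):
        gap = (t2 - t1).total_seconds()
        if gap > best[0]:
            best = (gap, l1, l2)
    return best


LOG = [
    "2021-01-14 09:40:01.123 4242 INFO nova.compute.manager [req-1234] Starting instance...",
    "2021-01-14 09:41:00.000 4242 DEBUG nova.compute.manager [req-9999] unrelated request",
    "2021-01-14 10:05:03.123 4242 DEBUG nova.compute.manager [req-1234] next step",
]
gap, before, after = largest_gap(LOG, "req-1234")
print(f"{gap:.0f}s of silence")  # 1502s of silence
print(before)
print(after)
```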
sean-k-mooney | do you have the concurrent build limit set | 17:31 |
stephenfin | melwitt: Comments left | 17:31 |
stephenfin | sean-k-mooney: ^ | 17:31 |
sean-k-mooney | for the compute service | 17:31 |
sean-k-mooney | it defaults to 10 i believe | 17:31 |
*** zenkuro has quit IRC | 17:31 | |
TheJulia | markguz_: something between scheduling and initial network setup :\ | 17:31 |
sean-k-mooney | Lock "compute_resources" acquired by "nova.compute.resource_tracker.instance_claim" :: waited 1495.269s | 17:31 |
sean-k-mooney | it looks like it was just waiting on the RT lock | 17:32 |
melwitt | yep, waited 24 min for the lock | 17:32 |
melwitt | thank you stephenfin | 17:32 |
markguz_ | why would the lock take 25mins? | 17:33 |
sean-k-mooney | the compute service is likely busy starting up | 17:34 |
sean-k-mooney | you mentioned this is only after the initial start, right? | 17:34 |
sean-k-mooney | or is this for each spawn | 17:34 |
lyarwood | just grep for the compute_resources lock and see what was holding it before? | 17:34 |
sean-k-mooney | ya that too | 17:35 |
sean-k-mooney | you could see how many other instances got the lock in that interval | 17:35 |
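Following lyarwood's and sean-k-mooney's suggestion to grep for the `compute_resources` lock, a small parser can total how long each function waited for and held that lock. The acquire format is taken from the line quoted above; the release format (`released` or `"released"` followed by `:: held Ns`) is an assumption about oslo.concurrency's debug logging, and the second sample line is invented purely to exercise the parser.

```python
import re
from collections import defaultdict

LOCK_RE = re.compile(
    r'Lock "(?P<name>[^"]+)" "?(?P<action>acquired|released)"? '
    r'by "(?P<func>[^"]+)" :: (?:waited|held) (?P<secs>\d+(?:\.\d+)?)s'
)


def summarize_lock(lines, lock_name="compute_resources"):
    """Total wait and hold time (seconds) per function for one lock."""
    waited = defaultdict(float)
    held = defaultdict(float)
    for line in lines:
        match = LOCK_RE.search(line)
        if not match or match.group("name") != lock_name:
            continue
        bucket = waited if match.group("action") == "acquired" else held
        bucket[match.group("func")] += float(match.group("secs"))
    return dict(waited), dict(held)


SAMPLE = [
    # From the log above.
    'Lock "compute_resources" acquired by '
    '"nova.compute.resource_tracker.instance_claim" :: waited 1495.269s',
    # Invented release line.
    'Lock "compute_resources" released by '
    '"hypothetical.periodic_task" :: held 2.500s',
]
waited, held = summarize_lock(SAMPLE)
print(waited)  # {'nova.compute.resource_tracker.instance_claim': 1495.269}
print(f"{waited['nova.compute.resource_tracker.instance_claim'] / 60:.1f} min")  # 24.9 min
```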
markguz_ | sean-k-mooney: it's every spawn. usually 10mins, sometimes longer and sometimes shorter | 17:36 |
lyarwood | does the resource tracker make external API calls in the Ironic driver? | 17:36 |
sean-k-mooney | the RT is shared so i dont think so | 17:38 |
lyarwood | yeah sorry I mean the code that refreshes it within the driver | 17:39 |
sean-k-mooney | im pretty sure this is the lock in question https://opendev.org/openstack/nova/src/branch/stable/rocky/nova/compute/manager.py#L2221-L2222 | 17:40 |
melwitt | markguz_: I think you might be hitting https://bugs.launchpad.net/nova/+bug/1864122 | 17:41 |
openstack | Launchpad bug 1864122 in OpenStack Compute (nova) "Instances (bare metal) queue for 30-60 seconds when managing a large amount of Ironic nodes" [Medium,Fix released] - Assigned to Jason Anderson (jasonandersonatuchicago) | 17:41 |
sean-k-mooney | yep that was what grabbed the lock https://opendev.org/openstack/nova/src/branch/stable/rocky/nova/compute/resource_tracker.py#L159-L160 | 17:41 |
markguz_ | i have +/- 240 nodes | 17:42 |
markguz_ | is that a large amount? | 17:42 |
sean-k-mooney | melwitt: yep a race on the lock with the update periodic task seems likely | 17:42 |
melwitt | if you read the bug, it says it can be seen with > 100 nodes | 17:42 |
sean-k-mooney | markguz_: how many ironic compute services do you have | 17:43 |
melwitt | that fix is available in ussuri and onward, it was not backported because it requires a newer version of oslo.concurrency | 17:43 |
markguz_ | sean-k-mooney: 1 | 17:43 |
sean-k-mooney | i believe the periodic will only update the resource usage for the nodes that are assigned to it | 17:43 |
sean-k-mooney | so i think you can scale it by deploying more compute service instances; TheJulia, is that correct? | 17:44 |
sean-k-mooney | markguz_: if you have 3 controllers i would suggest running an ironic nova compute service instance on each, assuming that makes sense to TheJulia or others | 17:45 |
TheJulia | sean-k-mooney: yes, you can, you should just be able to run multiple instances | 17:45 |
TheJulia | markguz_: ^^^ instances of nova-compute configured for ironic | 17:45 |
stephenfin | melwitt: comments left on the bug report too | 17:45 |
* stephenfin knocks off for the evening o/ | 17:45 | |
sean-k-mooney | markguz_: the other thing you could do is reduce the interval of the periodic | 17:46 |
melwitt | hm, I thought you needed to configure node partitioning to do that | 17:46 |
sean-k-mooney | we fixed it by changing the type of lock we use | 17:46 |
melwitt | "conductor groups" | 17:46 |
TheJulia | melwitt: only to force specific grouping/allocation into specific grouping | 17:46 |
sean-k-mooney | that's on the ironic side i think | 17:46 |
melwitt | it's not | 17:47 |
TheJulia | its on both sides | 17:47 |
sean-k-mooney | ah ok | 17:47 |
markguz_ | periodic_task_interval is set to 240 | 17:47 |
melwitt | well, it might be but you have to do it on the nova side too | 17:47 |
TheJulia | otherwise it runs a hash ring based upon the node list | 17:47 |
TheJulia | and the group is just a key in the hash ring | 17:47 |
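To picture TheJulia's point: without conductor groups, the ironic nodes are spread over the running nova-compute services with a hash ring keyed on the node list. Below is a generic consistent-hash ring, an illustration of the technique rather than the code nova or ironic ship; the host names, `node-N` identifiers and replica count are made up, with 240 nodes chosen to match markguz_'s deployment. It also hints at why the `avail_zone:compute_host:bm_uuid` trick gets harder with several services: the ring, not the operator, decides which host manages a node.

```python
import bisect
import hashlib
from collections import Counter


class HashRing:
    """Minimal consistent-hash ring; illustrative, not nova's or ironic's code."""

    def __init__(self, hosts, replicas=64):
        # Each host gets `replicas` points on the ring to even out the spread.
        points = []
        for host in hosts:
            for i in range(replicas):
                points.append((self._hash(f"{host}-{i}"), host))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._hosts = [host for _, host in points]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def get_host(self, node_id):
        """Return the host owning the first ring point at or after node_id."""
        idx = bisect.bisect(self._hashes, self._hash(node_id)) % len(self._hashes)
        return self._hosts[idx]


HOSTS = ["controller-1", "controller-2", "controller-3"]
ring = HashRing(HOSTS)
counts = Counter(ring.get_host(f"node-{n}") for n in range(240))
print(counts)                    # roughly a third of the nodes per host
print(ring.get_host("node-42"))  # the compute host that manages node-42
```

Adding or removing a host only moves the nodes whose ring segment changes, which is the usual reason for choosing consistent hashing over a plain modulo.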
sean-k-mooney | we improved this in nova by using oslo's fair locks | 17:48 |
TheJulia | the nova side name is a little different because naming_is_fun^TM | 17:48 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/711528/2 | 17:48 |
markguz_ | so we use this in a lab env and when we spin up baremetal we need to spin up a specific node as they are connected to specific hardware that is being tested | 17:48 |
sean-k-mooney | but that was only done in ussuri | 17:48 |
TheJulia | sean-k-mooney: ohhhhh neat | 17:48 |
sean-k-mooney | so we would have to backport it; unfortunately i'm not sure oslo has the required support in rocky, let me check | 17:49 |
melwitt | ok conductor groups are not available until stein anyways | 17:49 |
sean-k-mooney | we would need oslo.concurrency 3.29.0 to backport it | 17:49 |
markguz_ | if i run multiple computes i'm guessing that i will need to change how i call an instance. right now i use the "avail_zone:compute_host:bm_uuid" trick | 17:49 |
melwitt | yeah, I said all of that earlier | 17:49 |
sean-k-mooney | stable rocky is oslo.concurrency===3.27.0 | 17:49 |
TheJulia | markguz_: yeah, :\ | 17:50 |
melwitt | yes, the patch that added fair locks bumped the oslo.concurrency version | 17:50 |
sean-k-mooney | so it can go back to stein | 17:50 |
sean-k-mooney | but not rocky | 17:50 |
melwitt | so it wasn't bumped until ussuri | 17:50 |
sean-k-mooney | markguz_: ya you would need to know which host has it | 17:51 |
markguz_ | what's ironic (pun intended) is that i was upgrading with the intention of getting to ussuri but when ironic failed at the rocky step i didn't want to compound the problem by continuing to upgrade | 17:51 |
melwitt | huh yeah actually it could be backported to stein because the upper constraint is 3.29.1 for whatever reason | 17:52 |
melwitt | I did not expect that | 17:52 |
markguz_ | assuming the bug is the problem, going to ussuri will fix it? but that will break a lot of our automation due to the way we're calling the nodes | 17:53 |
markguz_ | i mean it's not the end of the world, but ugh.. more work :-( | 17:53 |
markguz_ | at least i finally have a better idea of what's wrong. i seriously was losing the will to live over this ;-) | 17:53 |
melwitt | yeah. you could try to haxx and apply the patch to see if it helps. you just need oslo.concurrency >= 3.29.0 | 17:54 |
melwitt | (so you know for sure whether you're hitting that bug) | 17:54 |
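To check whether a compute node meets the `oslo.concurrency >= 3.29.0` requirement before applying the patch, a small stdlib-only helper is enough. This is a sketch that assumes plain dotted-integer version strings (no `rc`/`dev` suffixes; `packaging.version` handles the general case), and the function names are hypothetical.

```python
from importlib import metadata

# Minimum oslo.concurrency release with fair-lock support, per the discussion.
MIN_FAIR_LOCK_VERSION = (3, 29, 0)


def parse_version(version):
    """Turn '3.29.1' into (3, 29, 1); pads short versions like '3.29' with zeros.

    Assumes a plain dotted-integer release string.
    """
    parts = [int(part) for part in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def supports_fair_locks(version):
    return parse_version(version) >= MIN_FAIR_LOCK_VERSION


def installed_oslo_concurrency():
    """Return the installed oslo.concurrency version string, or None."""
    try:
        return metadata.version("oslo.concurrency")
    except metadata.PackageNotFoundError:
        return None


print(supports_fair_locks("3.27.0"))  # stable/rocky upper constraint: False
print(supports_fair_locks("3.29.1"))  # stable/stein upper constraint: True
```

On a node, `supports_fair_locks(installed_oslo_concurrency() or "0")` gives a quick yes/no, treating a missing package as unsupported.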
markguz_ | melwitt: does the patch need to go on the scheduler or the compute node? or both? | 17:55 |
melwitt | markguz_: compute node | 17:55 |
markguz_ | melwitt: then i can probably crowbar that in | 17:56 |
*** mlavalle has joined #openstack-nova | 17:57 | |
melwitt | bleh, there's merge conflicts but it's really just adding fair=True to all the @utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE, fair=True) | 17:58 |
markguz_ | ok. i'll give it a try and see what happens. | 17:59 |
markguz_ | will i need to upgrade the other oslo components or just concurrency? | 18:00 |
melwitt | just concurrency | 18:00 |
openstackgerrit | melanie witt proposed openstack/nova stable/train: Use fair locks in resource tracker https://review.opendev.org/c/openstack/nova/+/770585 | 18:02 |
*** hamalq has joined #openstack-nova | 18:03 | |
sean-k-mooney | we could probably implement a version of the patch for rocky too | 18:04 |
sean-k-mooney | that just did not use the fair lock from oslo | 18:04 |
*** rpittau is now known as rpittau|afk | 18:05 | |
*** derekh has quit IRC | 18:07 | |
sean-k-mooney | markguz_: this was the implementation of the fair lock https://github.com/openstack/oslo.concurrency/commit/2b55da68ae45ff45cba68672cdbc24342cf115f6 | 18:10 |
*** andrewbonney has quit IRC | 18:12 | |
sean-k-mooney | markguz_: if you wanted to backport the upstream patch and then backport the implementation of the fair lock into nova, we could evaluate that, or at least the stable team could | 18:13 |
sean-k-mooney | it's just using https://fasteners.readthedocs.io/en/latest/api/lock.html#fasteners.lock.ReaderWriterLock | 18:13 |
sean-k-mooney | to actually provide the fifo behavior | 18:14 |
sean-k-mooney | the version of fasteners on stable rocky has the required functionality | 18:15 |
sean-k-mooney | that said, i think you can just bump the oslo.concurrency version locally, apply the nova patch locally, and it should run fine | 18:16 |
openstackgerrit | melanie witt proposed openstack/nova stable/stein: Use fair locks in resource tracker https://review.opendev.org/c/openstack/nova/+/770657 | 18:19 |
*** ralonsoh has quit IRC | 18:20 | |
sean-k-mooney | stephenfin: by the way, as far as i am aware we never use the cpu_topology field in the numa cell object to generate the xml at all | 18:20 |
openstackgerrit | melanie witt proposed openstack/nova stable/stein: Use fair locks in resource tracker https://review.opendev.org/c/openstack/nova/+/770657 | 18:20 |
sean-k-mooney | the numa topology of the guest or host should have no impact on the cpu topology of the guest, period | 18:21 |
sean-k-mooney | any other behavior is inconsistent with the intended behavior as described by the specs | 18:22 |
*** dtantsur is now known as dtantsur|afk | 18:43 | |
openstackgerrit | Merged openstack/nova stable/victoria: Omit resource inventories from placement update if zero https://review.opendev.org/c/openstack/nova/+/766177 | 18:58 |
*** belmoreira has quit IRC | 19:03 | |
markguz_ | sean-k-mooney: i think it would be simpler for me to just upgrade to ussuri | 19:14 |
sean-k-mooney | if that is an option yes | 19:14 |
openstackgerrit | Merged openstack/nova stable/victoria: Add upgrade check about old computes https://review.opendev.org/c/openstack/nova/+/761924 | 19:15 |
markguz_ | fortunately for me this is an internal deployment that is not used by paying customers, so i have some degree of flexibility on its availability | 19:15 |
sean-k-mooney | what do you use to deploy/manage it | 19:16 |
markguz_ | sean-k-mooney: originally i deployed kilo with rdo packstack. since then it's become a bit of a hodgepodge of manual installs. i mostly use ansible to keep things up to date | 19:21 |
sean-k-mooney | ah i see | 19:21 |
sean-k-mooney | packstack has more or less been unsupported for a few years now | 19:22 |
sean-k-mooney | i think it still technically exists, but redhat stopped supporting it with our product around queens | 19:22 |
markguz_ | yeah. i generally just install from the package manager. and have some ansible plays that configure compute nodes etc etc. | 19:22 |
sean-k-mooney | e.g. we moved to requiring all customers to deploy with tripleo around queens | 19:22 |
markguz_ | i've got some bits and pieces that are installed from git master for things like magnum and heat. | 19:23 |
sean-k-mooney | markguz_: you should look into openstack-ansible then | 19:23 |
markguz_ | it's on my todo list :-) | 19:23 |
markguz_ | The patch seems to have fixed my problem... spawning is happening at the expected rate now | 19:26 |
sean-k-mooney | did you just bump the oslo.concurrency version and apply the nova patch? | 19:26 |
markguz_ | sean-k-mooney: yup | 19:26 |
sean-k-mooney | there is still going to be contention on the lock, but it should now preserve order | 19:27 |
markguz_ | so splitting to multiple compute nodes is still probably the best path | 19:27 |
*** hamalq has quit IRC | 19:29 | |
sean-k-mooney | long term probably, but at least your current issue is mitigated | 19:30 |
sean-k-mooney | i won't say solved, but manageable | 19:30 |
openstackgerrit | Artom Lifshitz proposed openstack/nova-specs master: `socket` PCI NUMA-affinity Policy https://review.opendev.org/c/openstack/nova-specs/+/765551 | 19:33 |
artom | sean-k-mooney, stephenfin (though I suspect you're done for the day) ^^ | 19:33 |
* artom looks at sean-k-mooney's specs next | 19:34 | |
sean-k-mooney | more or less, i'll leave it open for tomorrow however | 19:34 |
artom | ... with some snow shovelling and kid driving thrown in the mix | 19:34 |
*** zenkuro has joined #openstack-nova | 19:50 | |
*** whoami-rajat__ has quit IRC | 19:55 | |
*** hoonetorg has quit IRC | 20:04 | |
*** nightmare_unreal has quit IRC | 20:16 | |
*** slaweq has quit IRC | 20:41 | |
*** adeberg has quit IRC | 20:53 | |
*** nweinber has quit IRC | 21:09 | |
*** jdillaman has joined #openstack-nova | 21:09 | |
*** hoonetorg has joined #openstack-nova | 21:18 | |
*** vishalmanchanda has quit IRC | 21:41 | |
*** hoonetorg has quit IRC | 21:43 | |
*** rcernin has joined #openstack-nova | 21:59 | |
*** xek has quit IRC | 22:00 | |
*** xek has joined #openstack-nova | 22:01 | |
*** xek has quit IRC | 22:05 | |
*** brinzhang_ has joined #openstack-nova | 23:02 | |
*** songwenping_ has joined #openstack-nova | 23:02 | |
*** songwenping__ has quit IRC | 23:05 | |
*** brinzhang has quit IRC | 23:05 | |
openstackgerrit | Merged openstack/nova stable/stein: [stable-only] Cap bandit and make lower-constraints job non-voting https://review.opendev.org/c/openstack/nova/+/766487 | 23:54 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!