*** gyee has quit IRC | 00:00 | |
*** BrinZhang has quit IRC | 00:04 | |
*** r-daneel has quit IRC | 00:10 | |
*** zhurong has joined #openstack-nova | 00:12 | |
*** tetsuro has joined #openstack-nova | 00:15 | |
openstackgerrit | Merged openstack/nova master: Add functional test for forced live migration rollback allocs https://review.openstack.org/586636 | 00:24 |
---|---|---|
*** erlon has quit IRC | 00:26 | |
*** moshele has quit IRC | 00:35 | |
*** gbarros has joined #openstack-nova | 00:40 | |
*** mhen has quit IRC | 01:24 | |
*** mhen has joined #openstack-nova | 01:25 | |
*** ircuser-1 has quit IRC | 01:27 | |
openstackgerrit | Merged openstack/nova stable/ocata: [stable only] Add functional regression test for bug 1783613 https://review.openstack.org/588416 | 01:36 |
openstack | bug 1783613 in OpenStack Compute (nova) ocata "[ocata only] quota usage not decremented during boot/delete race" [Undecided,In progress] https://launchpad.net/bugs/1783613 - Assigned to melanie witt (melwitt) | 01:36 |
*** gbarros has quit IRC | 01:39 | |
*** Dinesh_Bhor has joined #openstack-nova | 01:44 | |
*** zhurong has quit IRC | 01:47 | |
*** gbarros has joined #openstack-nova | 01:48 | |
*** gbarros has quit IRC | 01:52 | |
*** gbarros has joined #openstack-nova | 01:56 | |
*** mrsoul has quit IRC | 01:59 | |
*** Dinesh_Bhor has quit IRC | 02:03 | |
*** gbarros has quit IRC | 02:04 | |
*** mriedem has quit IRC | 02:08 | |
openstackgerrit | Chen proposed openstack/nova stable/queens: Fix bad links for admin-guide https://review.openstack.org/590068 | 02:08 |
*** Dinesh_Bhor has joined #openstack-nova | 02:10 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/pike: Update nova network info when doing rebuild for evacuate operation https://review.openstack.org/590070 | 02:13 |
openstackgerrit | Chen proposed openstack/nova stable/queens: Fix bad links for admin-guide https://review.openstack.org/590068 | 02:16 |
openstackgerrit | Chen proposed openstack/nova stable/pike: Fix bad links for admin-guide https://review.openstack.org/590072 | 02:16 |
*** Bhujay has joined #openstack-nova | 02:30 | |
*** Bhujay has quit IRC | 02:30 | |
*** Bhujay has joined #openstack-nova | 02:31 | |
*** zhurong has joined #openstack-nova | 02:33 | |
*** Nel1x has joined #openstack-nova | 02:36 | |
openstackgerrit | Chen proposed openstack/nova master: Update ssh configuration doc https://review.openstack.org/589844 | 02:39 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Update the parameter explain when updating a volume attachment https://review.openstack.org/565181 | 02:45 |
*** dklyle has joined #openstack-nova | 02:46 | |
*** psachin has joined #openstack-nova | 02:47 | |
*** hongbin has joined #openstack-nova | 02:49 | |
*** Bhujay has quit IRC | 02:51 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: [placement] api-ref: add description for 1.29 https://review.openstack.org/589407 | 02:56 |
openstackgerrit | Vishakha Agarwal proposed openstack/nova master: Quota details for key_pair "in_use" is 0. https://review.openstack.org/590081 | 03:12 |
openstackgerrit | Chen proposed openstack/nova master: Trivial fix on migration doc https://review.openstack.org/589028 | 03:17 |
*** hongbin has quit IRC | 03:41 | |
*** dklyle has quit IRC | 03:42 | |
*** zhurong has quit IRC | 03:53 | |
*** _ix has quit IRC | 03:54 | |
*** Dinesh_Bhor has quit IRC | 03:55 | |
*** janki has joined #openstack-nova | 03:57 | |
*** hemna_ has quit IRC | 03:57 | |
*** ratailor has joined #openstack-nova | 04:11 | |
openstackgerrit | Merged openstack/nova stable/queens: Fix bad links for admin-guide https://review.openstack.org/590068 | 04:13 |
*** Nel1x has quit IRC | 04:14 | |
*** liuyulong has joined #openstack-nova | 04:22 | |
*** Bhujay has joined #openstack-nova | 04:32 | |
*** Bhujay has quit IRC | 04:34 | |
*** markvoelker has joined #openstack-nova | 04:41 | |
*** Dinesh_Bhor has joined #openstack-nova | 04:54 | |
*** ratailor has quit IRC | 04:57 | |
*** ratailor has joined #openstack-nova | 04:58 | |
*** ircuser-1 has joined #openstack-nova | 05:04 | |
*** udesale has joined #openstack-nova | 05:06 | |
openstackgerrit | Vishakha Agarwal proposed openstack/nova master: Quota details for key_pair "in_use" is 0. https://review.openstack.org/590081 | 05:32 |
*** nicolasbock has joined #openstack-nova | 05:34 | |
*** tetsuro has quit IRC | 05:35 | |
*** links has joined #openstack-nova | 05:50 | |
openstackgerrit | Vishakha Agarwal proposed openstack/nova master: Quota details for key_pair "in_use" is 0. https://review.openstack.org/590081 | 06:05 |
*** hamzy_ has quit IRC | 06:29 | |
*** hamzy_ has joined #openstack-nova | 06:29 | |
*** nicolasbock has quit IRC | 06:35 | |
*** pcaruana has joined #openstack-nova | 06:38 | |
*** nicolasbock has joined #openstack-nova | 06:41 | |
*** holser_ has joined #openstack-nova | 06:45 | |
*** udesale has quit IRC | 06:54 | |
*** evrardjp has joined #openstack-nova | 06:55 | |
openstackgerrit | Sergii Golovatiuk proposed openstack/nova master: libvirt: Always escape IPv6 addresses when used in migration URI https://review.openstack.org/589548 | 06:55 |
*** hshiina has joined #openstack-nova | 06:56 | |
*** ccamacho has joined #openstack-nova | 06:58 | |
*** udesale has joined #openstack-nova | 06:58 | |
*** ratailor_ has joined #openstack-nova | 07:00 | |
*** ratailor has quit IRC | 07:03 | |
*** luksky has joined #openstack-nova | 07:03 | |
*** ispp has joined #openstack-nova | 07:06 | |
*** stakeda has joined #openstack-nova | 07:06 | |
*** rmart04 has joined #openstack-nova | 07:14 | |
openstackgerrit | liuyamin proposed openstack/python-novaclient master: Replace os-client-config to openstacksdk https://review.openstack.org/590141 | 07:15 |
*** Bhujay has joined #openstack-nova | 07:16 | |
*** rcernin has quit IRC | 07:16 | |
*** tetsuro has joined #openstack-nova | 07:16 | |
*** zhangbailin_ has quit IRC | 07:21 | |
*** zhangbailin_ has joined #openstack-nova | 07:21 | |
*** Bhujay has quit IRC | 07:22 | |
*** dpawlik has joined #openstack-nova | 07:22 | |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: Adds a test for _get_provider_ids_matching() https://review.openstack.org/590150 | 07:23 |
*** dpawlik has quit IRC | 07:24 | |
*** dpawlik has joined #openstack-nova | 07:24 | |
*** zhangbailin_ has quit IRC | 07:26 | |
*** BrinZhang has joined #openstack-nova | 07:26 | |
*** rmart04_ has joined #openstack-nova | 07:26 | |
*** rmart04 has quit IRC | 07:26 | |
*** rmart04_ is now known as rmart04 | 07:26 | |
*** brinzh has joined #openstack-nova | 07:27 | |
*** BrinZhang has quit IRC | 07:27 | |
*** ratailor_ has quit IRC | 07:34 | |
*** ratailor__ has joined #openstack-nova | 07:35 | |
*** Bhujay has joined #openstack-nova | 07:40 | |
*** ratailor_ has joined #openstack-nova | 07:50 | |
*** jpena|off is now known as jpena | 07:50 | |
*** XueFeng has quit IRC | 07:50 | |
*** ratailor__ has quit IRC | 07:52 | |
*** udesale has quit IRC | 07:53 | |
*** johnthetubaguy has joined #openstack-nova | 07:54 | |
gibi | melwitt, mriedem: opened a versioned notification bp for stein https://blueprints.launchpad.net/nova/+spec/versioned-notification-transformation-stein | 08:04 |
*** tetsuro has quit IRC | 08:11 | |
*** _ix has joined #openstack-nova | 08:11 | |
*** rmart04 has quit IRC | 08:13 | |
*** mdbooth has joined #openstack-nova | 08:17 | |
*** Dinesh_Bhor has quit IRC | 08:17 | |
mdbooth | lyarwood: Passing: https://review.openstack.org/#/c/587013/ ! | 08:19 |
*** tssurya has joined #openstack-nova | 08:19 | |
*** dulek has joined #openstack-nova | 08:21 | |
mdbooth | lyarwood: Also passed without the rebase workaround. I'll merge them and resubmit. | 08:21 |
*** Dinesh_Bhor has joined #openstack-nova | 08:23 | |
*** udesale has joined #openstack-nova | 08:24 | |
openstackgerrit | Matthew Booth proposed openstack/nova master: fixtures: Track volume attachments within CinderFixtureNewAttachFlow https://review.openstack.org/587013 | 08:27 |
*** dtantsur|afk is now known as dtantsur | 08:33 | |
openstackgerrit | Jose Castro Leon proposed openstack/nova master: Fix get_device_path from network mounted volume https://review.openstack.org/590188 | 08:36 |
*** janki has quit IRC | 08:36 | |
*** ratailor_ has quit IRC | 08:46 | |
*** ratailor_ has joined #openstack-nova | 08:46 | |
*** priteau has quit IRC | 08:47 | |
*** priteau has joined #openstack-nova | 08:48 | |
openstackgerrit | Matthew Booth proposed openstack/nova master: Add regression test for bug#1784353 https://review.openstack.org/587014 | 08:49 |
*** rmart04 has joined #openstack-nova | 09:00 | |
*** cdent has joined #openstack-nova | 09:08 | |
*** ispp has quit IRC | 09:10 | |
*** Dinesh_Bhor has quit IRC | 09:11 | |
lyarwood | mdbooth: cool, thanks again for working through this :) | 09:13 |
mdbooth | lyarwood: np. | 09:14 |
mdbooth | lyarwood: Started accidentally while reviewing and seemed silly to stop :) | 09:14 |
*** josecastroleon has quit IRC | 09:14 | |
*** janki has joined #openstack-nova | 09:15 | |
*** josecastroleon has joined #openstack-nova | 09:19 | |
lyarwood | mdbooth: are you going to rebase https://review.openstack.org/#/c/587071/ ? | 09:21 |
lyarwood | mdbooth: np if not, I have time to work on it this morning finally | 09:21 |
mdbooth | lyarwood: I was going to leave that for you, that's the real bit :) | 09:21 |
lyarwood | mdbooth: tis cool, thanks again | 09:21 |
mdbooth | lyarwood: Incidentally, did you consider the 'fix it in compute' approach | 09:22 |
mdbooth | I know that's where the patch started, then you moved to conductor | 09:22 |
mdbooth | But in fixing the fixture I came across other cleanup in compute which already does exactly what I was talking about | 09:22 |
lyarwood | mdbooth: yeah the remove_volume_connections call | 09:23 |
mdbooth | Yeah | 09:23 |
lyarwood | mdbooth: yeah I'll take a look now, it's a shame to flip back again but meh | 09:24 |
mdbooth | lyarwood: I'm not saying do it, just asking if it's feasible/worth considering | 09:25 |
mdbooth | Or if you've already considered and rejected it, in fact | 09:25 |
lyarwood | mdbooth: yeah understood, I think it is feasible and ultimately a better approach, I just haven't looked into the knock-on impact of changing shutdown_instance. | 09:28 |
mdbooth | lyarwood: ack | 09:28 |
*** amarao has joined #openstack-nova | 09:32 | |
*** derekh has joined #openstack-nova | 09:35 | |
*** goutham1 has joined #openstack-nova | 09:37 | |
*** Dinesh_Bhor has joined #openstack-nova | 09:38 | |
goutham1 | Hi all, I am facing this issue in rally: when I try to create a deployment it throws this error: Env manager got invalid spec: | 09:39 |
goutham1 | ["There is no Platform plugin with name: 'existing@openstack'"] | 09:39 |
goutham1 | 2:57 | 09:39 |
goutham1 | any idea on how to fix it ?? | 09:39 |
*** dpawlik has quit IRC | 09:39 | |
goutham1 | rally deployment create --fromenv --name=existing | 09:39 |
goutham1 | Env manager got invalid spec: | 09:39 |
goutham1 | ["There is no Platform plugin with name: 'existing@openstack'"] | 09:39 |
goutham1 | it shows something of this sort any idea on how to fix this ?? | 09:40 |
*** dpawlik has joined #openstack-nova | 09:41 | |
mdbooth | goutham1: You'll want to try #openstack for user issues, I think | 09:41 |
goutham1 | thanks mdbooth | 09:41 |
*** dpawlik has quit IRC | 09:45 | |
*** dpawlik has joined #openstack-nova | 09:49 | |
tobasco | gibi: maybe a stupid question but is the os-server-external-events interface related to legacy or versioned notifications in nova? | 09:52 |
gibi | tobasco: no it isn't | 09:52 |
gibi | tobasco: os-server-external-events is in the REST API | 09:52 |
tobasco | ok thanks | 09:53 |
gibi | tobasco: when I say nova notifications I mean notifications emitted on the notifications or versioned_notifications RPC topic | 09:53 |
*** neiljerram has joined #openstack-nova | 09:55 | |
neiljerram | Good morning everyone. | 09:55 |
*** luksky has quit IRC | 09:56 | |
neiljerram | I am struggling with a problem in Queens where I can do novaclient.images.list() if novaclient is for the admin tenant, but I get 401 if novaclient is for some other tenant/project. | 09:58 |
neiljerram | This was working in a Pike installation, and using Keystone v2 for authentication. In my Queens install I don't have Keystone v2 so am now using Keystone v3 for auth. | 09:59 |
*** maciejjozefczyk has quit IRC | 09:59 | |
neiljerram | Any thoughts? | 09:59 |
neiljerram | I believe any tenant/project should be able to list images, right? | 10:00 |
*** maciejjozefczyk has joined #openstack-nova | 10:01 | |
*** goutham1 has quit IRC | 10:04 | |
neiljerram | When I do novaclient.images.list() with a non-admin tenant, and get 401, there is no new logging in nova-api.log. (Whereas when I do a successful list with the admin tenant, I see a 200 log line in nova-api.log.) | 10:04 |
neiljerram | Therefore I guess that this 401 is coming from some middleware before nova-api? But I don't know how to debug or see any logging for that middleware... | 10:05 |
neiljerram | Ah, just realized that I should be asking all this in #openstack instead... | 10:09 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Use placement 1.28 in scheduler report client https://review.openstack.org/583667 | 10:11 |
*** claudiub|2 has joined #openstack-nova | 10:17 | |
*** claudiub|2 is now known as claudiub | 10:17 | |
openstackgerrit | Sergii Golovatiuk proposed openstack/nova master: libvirt: Always escape IPv6 addresses when used in migration URI https://review.openstack.org/589548 | 10:18 |
mdbooth | neiljerram: :) You're also best talking to glance directly for listing images. Pretty sure nova would just proxy it. Actually, I wonder if novaclient just talks to glance instead? When can we kill that, btw? | 10:21 |
*** Dinesh_Bhor has quit IRC | 10:22 | |
neiljerram | mdbooth, I think you're right that nova proxies this to glance. When I try this with an admin tenant, I see a 200 log line in nova-api.log - which I think means that it can't be going directly to glance; right? | 10:22 |
mdbooth | neiljerram: Yeah, it means we saw it. | 10:23 |
neiljerram | mdbooth, I think my problem may be more to do with not understanding how users and tokens work in Keystone v3... | 10:24 |
*** bhagyashris has joined #openstack-nova | 10:25 | |
*** stakeda has quit IRC | 10:26 | |
neiljerram | mdbooth, If it's OK to ask this here: if I have just created some new project and user (in that project), do I also need to explicitly set up some token(s) for that user? Or will that happen under the covers when I first try to do something with that user? | 10:26 |
*** goutham1 has joined #openstack-nova | 10:26 | |
*** ratailor_ has quit IRC | 10:33 | |
*** ratailor has joined #openstack-nova | 10:33 | |
*** hshiina has quit IRC | 10:34 | |
*** goutham1 has quit IRC | 10:34 | |
*** luksky has joined #openstack-nova | 10:35 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Use placement 1.28 in scheduler report client https://review.openstack.org/583667 | 10:40 |
*** sahid has joined #openstack-nova | 10:45 | |
*** erlon has joined #openstack-nova | 10:49 | |
lyarwood | mdbooth: https://review.openstack.org/#/c/589567/3 - would you mind taking a look at my latest comment here, basically we wanted to introduce a workaround that avoids using qemu-img so the RT stops blocking other operations, however this path is also used during LM to get the sizes of disks before we recreate them on the dest if required. I'm tempted to just close this change and move on but I might be | 10:55 |
lyarwood | missing something here. | 10:55 |
mdbooth | lyarwood: ack. Looking now. | 10:55 |
lyarwood | mdbooth: note the logic in the current change is all backwards, I was rewriting it to always default to using os.path.getsize and only use qemu-img when the workaround was enabled. | 10:56 |
lyarwood | mdbooth: but that reintroduces 1770640 | 10:56 |
mdbooth | lyarwood: We should really just write that data into disk.info and read it from there. | 10:58 |
lyarwood | mdbooth: yeah looks like get_instance_disk_info always checks the config | 11:00 |
lyarwood | the actual instance config that is, not disk.info | 11:01 |
*** jpena is now known as jpena|lunch | 11:01 | |
lyarwood | mdbooth: brb need to drop for 10 mins | 11:02 |
mdbooth | lyarwood: k | 11:02 |
*** holser_ has quit IRC | 11:06 | |
openstackgerrit | Chen proposed openstack/nova master: Add additional info to resource provider aggregates update API https://review.openstack.org/590243 | 11:07 |
*** maciejjozefczyk has quit IRC | 11:08 | |
*** maciejjozefczyk has joined #openstack-nova | 11:09 | |
*** maciejjozefczyk has quit IRC | 11:10 | |
*** brinzh has quit IRC | 11:12 | |
mdbooth | lyarwood: If you're back, why the hell didn't we just stat it? | 11:15 |
mdbooth | lyarwood: Probably because qemu-img is there and we weren't considering the performance impact. | 11:15 |
mdbooth | lyarwood: But for allocated size, stat should give us exactly what we want, be really fast, and we don't need a workaround. | 11:16 |
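A minimal sketch of the stat-based idea mdbooth describes here, not the nova implementation. It assumes a POSIX system, where `st_blocks` is reported in 512-byte units; the function names are hypothetical. For a sparse raw file the apparent size (what `os.path.getsize` returns) includes holes, while the allocated size counts only blocks actually backed by storage.

```python
import os


def apparent_size(path):
    """Length of the file in bytes; for a sparse file this includes holes."""
    return os.path.getsize(path)


def allocated_size(path):
    """Bytes actually allocated on disk.

    Assumption: POSIX semantics, where st_blocks counts 512-byte units
    regardless of the filesystem block size.
    """
    return os.stat(path).st_blocks * 512
```

Neither call spawns a process, which is why it is so much cheaper than running qemu-img for every disk. How many blocks a filesystem reports for a freshly written or sparse file varies, so the allocated value should be treated as filesystem-dependent.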
openstackgerrit | Merged openstack/nova master: Docs: Add guide to migrate instance with snapshot https://review.openstack.org/584442 | 11:27 |
*** sambetts|afk has quit IRC | 11:29 | |
*** sambetts_ has joined #openstack-nova | 11:31 | |
*** maciejjozefczyk has joined #openstack-nova | 11:32 | |
openstackgerrit | Matthew Booth proposed openstack/nova master: Improve performance of get_allocated_disk_size https://review.openstack.org/590253 | 11:33 |
mdbooth | lyarwood: How about ^^^ instead? Just throwing that out as a proposal. | 11:33 |
*** sambetts_ is now known as sambetts | 11:35 | |
*** sambetts is now known as sambetts|afk | 11:35 | |
* mdbooth -> lunch | 11:35 | |
*** bhagyashris has quit IRC | 11:35 | |
*** links has quit IRC | 11:36 | |
openstackgerrit | Merged openstack/nova master: Fix host validity check for live-migration https://review.openstack.org/401009 | 11:36 |
*** sambetts|afk has quit IRC | 11:46 | |
*** sambetts_ has joined #openstack-nova | 11:47 | |
*** s10 has joined #openstack-nova | 11:52 | |
openstackgerrit | Rajesh Tailor proposed openstack/nova stable/queens: Fix host validity check for live-migration https://review.openstack.org/590262 | 11:55 |
openstackgerrit | Rajesh Tailor proposed openstack/nova stable/pike: Fix host validity check for live-migration https://review.openstack.org/590263 | 11:58 |
*** jpena|lunch is now known as jpena | 11:58 | |
lyarwood | mdbooth: sorry that took a while | 11:59 |
* lyarwood reads | 12:00 | |
lyarwood | mdbooth: yeah but we also need the virtual size | 12:00 |
mdbooth | lyarwood: Where? | 12:00 |
lyarwood | mdbooth: and that's the more important value tbh | 12:00 |
*** ratailor has quit IRC | 12:01 | |
lyarwood | mdbooth: so we need it to work out over_committed_disk_size but also during LM to ensure we create the dest disks correctly | 12:01 |
lyarwood | mdbooth: your change is still good | 12:02 |
lyarwood | mdbooth: but I don't think it stops us from calling qemu-img to get the virtual size | 12:02 |
mdbooth | lyarwood: Yes, you're right | 12:02 |
mdbooth | Something somewhere said the regression was introduced in a particular change, and that change only added get_allocated_disk_size | 12:03 |
lyarwood | mdbooth: yeah the bug for this highlights that change first I think | 12:03 |
*** sambetts_ has quit IRC | 12:04 | |
mdbooth | Ok. We could also eliminate get_disk_size, but that would be more complex | 12:04 |
lyarwood | mdbooth: that introduced the first call to qemu-img, then we noticed that broke LM so I introduced the virtual size call | 12:04 |
*** sambetts_ has joined #openstack-nova | 12:04 | |
mdbooth | We'd have to cache it | 12:04 |
lyarwood | mdbooth: yeah I think we can do that for virtual size | 12:05 |
mdbooth | I think. Not hard, but harder. | 12:05 |
mdbooth | Possibly not worth it harder | 12:05 |
mdbooth | Incidentally, that is the only use of get_allocated_disk_size | 12:05 |
openstackgerrit | Merged openstack/nova master: Update nova network info when doing rebuild for evacuate operation https://review.openstack.org/382853 | 12:06 |
lyarwood | mdbooth: yeah as I introduced it to fix the original over commit issue a while ago | 12:06 |
lyarwood | mdbooth: where we originally used os.path.getsize | 12:06 |
lyarwood | mdbooth: I guess we could just use that for the virtual size for files and avoid the call to qemu-img | 12:07 |
lyarwood | mdbooth: and use your change to get the allocated size | 12:08 |
lyarwood | mdbooth: then everyone is happy | 12:08 |
mdbooth | lyarwood: Ok, so now it's a judgement call. The workaround is frankly ugly and puts the onus on users to fix it. However, there shouldn't be many of those users and it's technically simpler. | 12:08 |
mdbooth | I think os.path.getsize() is unreliable | 12:09 |
mdbooth | Depends what qcow2 allocation we use | 12:09 |
lyarwood | true | 12:09 |
mdbooth | qemu-img is the right tool to use for that | 12:09 |
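For reference, a hedged sketch of reading the virtual size through qemu-img, as opposed to `os.path.getsize()`. This is not nova's wrapper; the function names are hypothetical. It relies only on `qemu-img info --output=json`, whose JSON output contains a `virtual-size` key and, for regular files, an `actual-size` key. Parsing is kept separate from the subprocess call so it can be exercised without qemu-img installed.

```python
import json
import subprocess


def parse_qemu_img_info(output):
    """Return (virtual_size, actual_size) from `qemu-img info --output=json` text.

    actual_size is None when the key is absent from the output.
    """
    info = json.loads(output)
    return info["virtual-size"], info.get("actual-size")


def qemu_img_sizes(path):
    """Run qemu-img against path; this is the expensive step discussed above."""
    out = subprocess.check_output(
        ["qemu-img", "info", "--output=json", path])
    return parse_qemu_img_info(out)
```

For a qcow2 image the file length on disk says little about the guest-visible size, which is the point being made here; the cost is one process spawn per disk.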
mdbooth | lyarwood: So my fix is only going to halve the performance impact. Is that enough? | 12:10 |
lyarwood | mdbooth: it's not going to change the impact | 12:11 |
lyarwood | mdbooth: we still make a single qemu-img call for the virtual size | 12:11 |
lyarwood | mdbooth: we only make one now | 12:11 |
mdbooth | lyarwood: Well we'll only call it once instead of twice, right? | 12:11 |
lyarwood | mdbooth: nope, https://review.openstack.org/#/c/589513/ reduced it down to one | 12:12 |
mdbooth | lyarwood: Ah, with that landed there are *no* calls to get_allocated_disk_size | 12:12 |
mdbooth | \o/ dead code | 12:12 |
lyarwood | mdbooth: well before you rm -rf it | 12:13 |
lyarwood | mdbooth: I still think we could use the getsize approach for raw disks | 12:13 |
lyarwood | mdbooth: and your stat call | 12:13 |
lyarwood | mdbooth: that way, no workaround, just an additional bugfix for RAW disks | 12:14 |
mdbooth | Honestly, I'd prefer to avoid doing anything complicated for a legacy code path. Adding if <raw>, elif <qcow2>, elif <lvm>... | 12:14 |
mdbooth | Doesn't seem like a good plan. I'll take your hack over that. | 12:15 |
lyarwood | mdbooth: well we already do that here anyway tbh | 12:16 |
lyarwood | disk_type == file driver_type == ploop etc | 12:16 |
*** eharney has quit IRC | 12:17 | |
mdbooth | lyarwood: Right, but we'd be adding at least 1 new code path. Also, we'd still be slow for qcow2 at least sometimes. | 12:23 |
mdbooth | lyarwood: I'm ambivalent here in case you hadn't picked up :) I'm not necessarily against your hack, just thinking if it's worth the effort to do better. | 12:25 |
lyarwood | mdbooth: yeah I really don't enjoy touching this stuff tbh as something always comes up but I think the stat/getsize approach for RAW is the best we can offer | 12:26 |
mdbooth | Incidentally, what scheme do we now have which isn't interested in actual disk usage? | 12:27 |
mdbooth | That seems odd. | 12:27 |
s10 | update_available_resource() with this call to get_allocated_disk_size is blocking some other operations only in post_live_migration: https://github.com/openstack/nova/commit/ab1e48f4683315db631be3f0995be6258edf6997 ? Do we really need this call here now? | 12:27 |
mdbooth | s10: Intuitively I'd say yes, but based on the same assumptions which tell me we should still be interested in actual disk usage. | 12:29 |
mdbooth | ...which we're apparently not. | 12:29 |
lyarwood | yeah placement should handle that now, so I think we can actually remove this? | 12:30 |
mdbooth | lyarwood: But how does placement handle it? | 12:30 |
mdbooth | Placement doesn't know anything about actual disk usage which the hypervisor didn't tell it. | 12:30 |
lyarwood | mdbooth: yeah but I didn't think that was coming from the RT but I'm likely wrong. | 12:31 |
mdbooth | Unless we deprecated disk overcommit? | 12:32 |
*** gbarros has joined #openstack-nova | 12:34 | |
*** sambetts_ has quit IRC | 12:36 | |
*** sambetts_ has joined #openstack-nova | 12:37 | |
mdbooth | lyarwood: An alternate (but not necessarily 'better'): write virtual size into disk.info. It's possible to do this in a backwards compatible way. We would just read it out for virtual size, update it automatically if it's missing so we don't have to fetch it again, and use state for allocated size. | 12:39 |
mdbooth | s/state/stat/ | 12:39 |
mdbooth | lyarwood: It would be fast, accurate, and secure. | 12:39 |
mdbooth | It would also be a bit more complex, so only worth it if we continue to need the data. | 12:40 |
mdbooth | If a fast get_disk_size() is required ongoing, I think we should do ^^^. If not, I think we should go with the workaround. | 12:40 |
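The caching idea above could look roughly like the following sketch. It is an illustration under stated assumptions, not nova code: the cache is a hypothetical standalone JSON file mapping disk path to virtual size (the real disk.info layout is not reproduced), and `compute_size` stands in for whatever expensive lookup is being avoided, such as a qemu-img call.

```python
import json
import os


def cached_virtual_size(cache_path, disk_path, compute_size):
    """Return the virtual size for disk_path, computing it at most once.

    cache_path is a hypothetical JSON file of {disk_path: size_in_bytes}.
    A missing file or missing entry is treated as "not cached yet", which
    is what makes the scheme backwards compatible with existing instances.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)

    if disk_path not in cache:
        cache[disk_path] = compute_size(disk_path)
        # Write to a temp file and rename so readers never see a partial file.
        tmp_path = cache_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(cache, f)
        os.replace(tmp_path, cache_path)

    return cache[disk_path]
```

The sketch ignores invalidation: anything that changes the virtual size, such as a resize, would have to update or drop the cached entry.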
*** oanson has joined #openstack-nova | 12:42 | |
*** edmondsw has joined #openstack-nova | 12:45 | |
*** lbragstad has joined #openstack-nova | 12:45 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP libvirt: rewrite _get_instance_disk_info_from_config https://review.openstack.org/589567 | 12:47 |
lyarwood | mdbooth: I'd rather keep this simple if possible, what about ^ | 12:47 |
* lyarwood isn't sure about the ploop logic tbh | 12:49 | |
mdbooth | lyarwood: That doesn't eliminate the qemu-img call, though | 12:49 |
mdbooth | For qcow2 | 12:49 |
lyarwood | mdbooth: yeah I don't think we can | 12:49 |
lyarwood | mdbooth: the issue was reported against RAW FWIW | 12:49 |
mdbooth | You can if you cache it | 12:49 |
lyarwood | true | 12:49 |
mdbooth | And then you've also got 1 less code path to test | 12:50 |
lyarwood | well you still need it if it isn't cached | 12:50 |
lyarwood | for now | 12:50 |
lyarwood | but longer term we can remove it | 12:50 |
mdbooth | Right, but you can put that in a utility call with separate tests | 12:51 |
lyarwood | mdbooth: are there util methods for reading/writing to disk.info btw? | 12:51 |
mdbooth | No, we'd need to pull it out of imagebackend | 12:51 |
mdbooth | (A good thing) | 12:51 |
*** _ix has quit IRC | 12:52 | |
mdbooth | lyarwood: But that's conditional on us continuing to need this stuff. If it has a limited shelf life it's not worth it. | 12:54 |
*** sambetts_ has quit IRC | 12:54 | |
mdbooth | Although I still don't understand why we don't need allocated disk any more. | 12:55 |
*** sambetts_ has joined #openstack-nova | 12:56 | |
lyarwood | mdbooth: well we still need it for LM | 12:57 |
*** josecastroleon has quit IRC | 12:57 | |
*** josecastroleon has joined #openstack-nova | 12:57 | |
lyarwood | mdbooth: tbh I'd rather land something simple like this as a bugfix and then work to switch to disk.info outside of this in a bp or something | 12:57 |
*** eharney has joined #openstack-nova | 12:58 | |
mdbooth | lyarwood: Sure. I'd prefer the workaround over the extra code paths for sure. | 12:59 |
lyarwood | mdbooth: wait, getting confused now, which workaround? | 12:59 |
lyarwood | mdbooth: os.stat? | 12:59 |
mdbooth | lyarwood: No, your original one. | 12:59 |
mdbooth | os.stat() doesn't fix it, we established that | 12:59 |
*** _ix has joined #openstack-nova | 13:00 | |
mdbooth | lyarwood: So... land your original workaround, with the disk.info thing in reserve. | 13:01 |
lyarwood | mdbooth: that workaround breaks LM | 13:01 |
lyarwood | mdbooth: that's why I'm suggesting using os.stat and os.path.getsize for RAW at least | 13:01 |
mdbooth | How does it break LM, btw? | 13:02 |
lyarwood | mdbooth: see my comment, LM with non-shared storage where we are creating images on the dest in pre_live_migration | 13:03 |
lyarwood | mdbooth: if we don't get an accurate virtual size we end up creating images that are too small | 13:03 |
*** josecastroleon has quit IRC | 13:03 | |
lyarwood | https://review.openstack.org/#/c/589567/3 - my comment there sorry | 13:04 |
*** josecastroleon has joined #openstack-nova | 13:04 | |
lyarwood | https://bugs.launchpad.net/nova/+bug/1770640 | 13:04 |
openstack | Launchpad bug 1770640 in nova (Ubuntu Bionic) "live block migration of instance with vfat config drive fails" [High,Fix committed] | 13:04 |
mdbooth | lyarwood: We shouldn't be live migrating a config disk anyway | 13:10 |
mdbooth | That sounds like a different bug | 13:10 |
mdbooth | We should just host->host copy it | 13:10 |
lyarwood | mdbooth: yeah the issue wasn't with the config disk but the main instance disk iirc | 13:11 |
mdbooth | Ok. | 13:12 |
lyarwood | hmm actually that's vdb | 13:12 |
lyarwood | but you can see we are mirroring | 13:12 |
*** holser_ has joined #openstack-nova | 13:12 | |
*** sambetts_ has quit IRC | 13:15 | |
odyssey4me | Hi folks. Is there a conf entry for the number of workers nova-scheduler fires up? | 13:17 |
mdbooth | lyarwood: Ok, now I understand the interaction. | 13:17 |
odyssey4me | I can't seem to find one in the references. | 13:17 |
*** sambetts_ has joined #openstack-nova | 13:18 | |
odyssey4me | ok, it would appear that there is one: https://github.com/openstack/nova/blob/master/nova/cmd/scheduler.py#L49 | 13:19 |
*** sambetts_ has quit IRC | 13:22 | |
*** sambetts_ has joined #openstack-nova | 13:23 | |
*** mriedem has joined #openstack-nova | 13:24 | |
*** amarao has left #openstack-nova | 13:28 | |
*** gbarros has quit IRC | 13:28 | |
mriedem | dansmith: i'm +2 on https://review.openstack.org/#/c/582413/ if you want to re-apply your +2 | 13:28 |
mriedem | bauzas: can you go through these backports? https://review.openstack.org/#/q/topic:bug/1784705+status:open | 13:29 |
*** edmondsw has quit IRC | 13:29 | |
mriedem | mel said she was looking to cut stable releases today | 13:29 |
mriedem | so i'm going to try and flush some of these out | 13:29 |
*** _ix has quit IRC | 13:29 | |
dansmith | okay | 13:30 |
*** edmondsw has joined #openstack-nova | 13:37 | |
*** sambetts_ has quit IRC | 13:37 | |
*** sambetts has joined #openstack-nova | 13:38 | |
*** jistr is now known as jistr|call | 13:39 | |
mriedem | stephenfin_: e. gads. https://review.openstack.org/#/q/topic:bug/1746393+status:open | 13:40 |
*** edmondsw has quit IRC | 13:41 | |
mriedem | feels like a feature as a bug fix | 13:41 |
mriedem | especially nervous when we have 0 CI of any of this stuff | 13:42 |
*** ccamacho has quit IRC | 13:44 | |
*** edmondsw has joined #openstack-nova | 13:44 | |
*** ccamacho has joined #openstack-nova | 13:44 | |
*** eharney has quit IRC | 13:45 | |
mriedem | sean-k-mooney[m]: i guess the intel 3rd party PCI/NFV CI must be dead huh? | 13:46 |
mriedem | dansmith: here are the queens backports with a +2 ready to go https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/queens+label:Code-Review=2 - the rest are in stephen's series, which i'd want to hold off on for a bit | 13:47 |
*** ccamacho has quit IRC | 13:48 | |
mriedem | oh and https://review.openstack.org/#/c/590062/ would be nice - that fix sat for over a year | 13:49 |
*** eharney has joined #openstack-nova | 13:49 | |
melwitt | nova meeting in 10 minutes | 13:50 |
*** sean-k-mooney has joined #openstack-nova | 13:51 | |
melwitt | cdent: is this bug considered closed/fixed now that both patches have landed? neither patch used Closes-Bug in the commit message https://bugs.launchpad.net/nova/+bug/1786055 | 13:53 |
openstack | Launchpad bug 1786055 in OpenStack Compute (nova) "performance degradation in placement with large number of resource providers" [High,In progress] - Assigned to Chris Dent (cdent) | 13:53 |
*** takashin has joined #openstack-nova | 13:53 | |
cdent | melwitt: hmmm. There is more that can be done, but not likely that more will be done _now_, so I would guess closed is probably a reasonable state. The major factor has been addressed. Fixing the rest will involve considerable refactoring | 13:54 |
*** sambetts has quit IRC | 13:55 | |
melwitt | cdent: I see, thanks | 13:56 |
*** sambetts_ has joined #openstack-nova | 13:58 | |
*** awaugama has joined #openstack-nova | 13:58 | |
*** jistr|call is now known as jistr | 13:59 | |
*** amotoki_ is now known as amotoki | 14:01 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: libvirt: Use os.stat and os.path.getsize for RAW disk inspection https://review.openstack.org/589567 | 14:01 |
*** _ix has joined #openstack-nova | 14:03 | |
*** gbarros has joined #openstack-nova | 14:05 | |
*** sambetts_ has quit IRC | 14:05 | |
*** jaypipes has quit IRC | 14:06 | |
*** sambetts_ has joined #openstack-nova | 14:07 | |
*** josecastroleon has quit IRC | 14:08 | |
*** jaypipes has joined #openstack-nova | 14:10 | |
mriedem | i guess we don't need to wait for translations https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:zanata/translations | 14:17 |
*** josecastroleon has joined #openstack-nova | 14:19 | |
melwitt | yeah, I was thinking it seems like openstack doesn't do translations anymore but I didn't know how to check. I'll add that link to the release checklist wiki | 14:20 |
melwitt | I know we don't translate log messages | 14:20 |
melwitt | but other user-facing messages, still translate? I wasn't sure | 14:20 |
efried | afaik we're still supposed to _() for exception messages. | 14:21 |
lyarwood | mdbooth: remind me again where the compute code was that deleted and recreated an attachment? | 14:21 |
melwitt | aye, I have seen that | 14:21 |
*** gbarros has quit IRC | 14:21 | |
*** sambetts_ has quit IRC | 14:21 | |
mdbooth | lyarwood: _terminate_volume_connections | 14:22 |
lyarwood | mdbooth: urgh was looking at remove_volume_connection | 14:24 |
*** sambetts_ has joined #openstack-nova | 14:24 | |
mdbooth | lyarwood: IIRC it was triggering a bug in the cinder fixture, which assumed only 1 attachment | 14:25 |
mdbooth | But with this we've briefly got 2 attachments | 14:25 |
*** ccamacho has joined #openstack-nova | 14:25 | |
*** ccamacho has quit IRC | 14:25 | |
*** ccamacho has joined #openstack-nova | 14:26 | |
mdbooth | lyarwood: I don't love the raw-only fix, tbh, because I think it increases the test and maintenance burden. If that code needs to live on I'd prefer to bring it together somehow. | 14:26 |
mdbooth | I'll abandon the stat thing, though, because as you point out it's not a solution | 14:27 |
lyarwood | mdbooth: kk, well it improves performance for the raw images user that reported the issue in the short term until we start using disk.info to store the virtual size | 14:28 |
lyarwood | mdbooth: and given that means we also need to refactor code out of imagebackend I'd rather land something simple first then focus on that | 14:29 |
mdbooth | I don't think it's a refactor, btw. Just code motion really iirc. | 14:30 |
mdbooth | Would just be moving it elsewhere to make it easier to call. | 14:30 |
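A minimal sketch of what a stat-based inspection of a RAW image could look like, assuming a POSIX host where `st_blocks` is reported in 512-byte units. The helper name `inspect_raw_disk` is hypothetical and this is not the code from the review discussed above; it only illustrates that for a RAW file the virtual size is the file length, while the allocated size comes from the block count.

```python
import os


def inspect_raw_disk(path):
    """Return (virtual_size, allocated_size) in bytes for a RAW image file.

    Hypothetical helper for illustration only, not nova code.
    """
    # For a RAW image the virtual disk size is simply the file length.
    virtual_size = os.path.getsize(path)
    # st_blocks counts 512-byte units on POSIX; it is not available on Windows.
    allocated_size = os.stat(path).st_blocks * 512
    return virtual_size, allocated_size
```

For a sparse file the allocated size can be much smaller than the virtual size, which is why the two values are reported separately.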
mriedem | GET /kashyap returns me a 404 | 14:31 |
mriedem | is his nick not registered? | 14:31 |
mdbooth | mriedem: Yep. He's out for a few more days yet, I think | 14:31 |
mriedem | blarg | 14:31 |
mriedem | wanted him to read the "nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?" thread in the ML | 14:31 |
mriedem | wondering if there is a good reason why we don't set guest.arch in the libvirt domain xml based on the hw_architecture image property | 14:32 |
mriedem | maybe someone could ask danpb? | 14:32 |
mdbooth | stephenfin_: might have an opinion | 14:33 |
mdbooth | mriedem: Is it tagged [nova]? | 14:33 |
* mdbooth doesn't see it | 14:34 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: Add regression test for bug#1784353 https://review.openstack.org/587014 | 14:34 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP compute: Terminate volume connections during _shutdown_instance https://review.openstack.org/590348 | 14:34 |
lyarwood | mdbooth: ^ that's the terminate_volume_connections alternative btw, without unit test changes | 14:34 |
mdbooth | lyarwood: That was fast! Looking. | 14:34 |
mdbooth | mriedem: Do you have an opinion on ^^^, btw? | 14:35 |
mdbooth | mriedem: Basically leaves us with a blank attachment when calling _shutdown_instance | 14:36 |
lyarwood | hmm that removes the call to detach with cinderv2 | 14:36 |
lyarwood | why would we do that during shutdown? | 14:37 |
mdbooth | lyarwood: remind me what v2 detach() does | 14:38 |
mriedem | mdbooth: seems fun at first since it's very much the same code, | 14:38 |
mriedem | but see inline comments | 14:38 |
*** sambetts_ has quit IRC | 14:39 | |
mdbooth | terminate_connection() causes the storage backend to kill the connection | 14:39 |
mriedem | mdbooth: v2 detach is just the os-detach volume action which changes the volume status to 'available' | 14:39 |
mriedem | does nothing on the volume backend | 14:39 |
mdbooth | mriedem: Got it. | 14:39 |
mriedem | so if you did this, | 14:40 |
*** sambetts_ has joined #openstack-nova | 14:40 | |
*** hongbin has joined #openstack-nova | 14:41 | |
mriedem | you'd have to have volume attachment cleanup code in both compute manager if we don't reschedule (max_attempts=1 or force_hosts/nodes is set) or if we do reschedule and conductor build_instances hits MaxRetriesExceeded | 14:41 |
mriedem | which is essentially what we have for cleaning up ports | 14:41 |
sean-k-mooney | mdbooth: stephenfin_ is on vacation for the next week and a half just fyi | 14:41 |
mriedem | i assume that was meant for me | 14:42 |
mdbooth | mriedem: See, over here in communist Europe everybody takes vacation in August | 14:42 |
sean-k-mooney | mriedem: well both you and mdbooth since he suggested stephenfin_ would have an opinion on something | 14:42 |
sean-k-mooney | i was still scrolling back to see what | 14:42 |
*** dtantsur is now known as dtantsur|brb | 14:43 | |
lyarwood | mriedem: kk thanks, so this doesn't really simplify the fix at all | 14:43 |
mriedem | lyarwood: not really | 14:43 |
mriedem | reschedules are a minefield | 14:44 |
mriedem | mdbooth: is that what bauzas is doing as well? | 14:44 |
mriedem | france gets august off right? | 14:44 |
lyarwood | pretty much | 14:44 |
sean-k-mooney | mriedem: oh ye were talking about setting the arch in libvirtxml based on hw_architecture | 14:44 |
mriedem | for wine and cheese and love making | 14:44 |
mdbooth | mriedem: Yep | 14:44 |
melwitt | efried: haha, I am literally looking at that review already right now | 14:44 |
efried | melwitt: Cool. I wanted to make sure it got attention from someone who knows how to spell "quota" (which ain't me). | 14:45 |
dansmith | mriedem: question in here: https://review.openstack.org/#/c/590062/1 | 14:45 |
dansmith | I know it's a backport, but want to make sure I understand at least | 14:45 |
melwitt | efried: yeah, makes sense. I'm conflicted about the sentence they're proposing because it's not true that it's not possible to count keypairs. it's just, for legacy reasons (before pike) it always returned zero so I kept the behavior with the quota work in pike | 14:46 |
mdbooth | mriedem lyarwood: I think that approach is fundamentally good. We do need to think about the attachment cleanup after the last reschedule failure, but if we're not doing that then we've always been leaking there. | 14:46 |
efried | melwitt: Ah, okay, then perhaps it should just say, "For legacy reasons, this value is always zero. We'll fix it in a future microversion. Maybe. If you're lucky." | 14:46 |
mdbooth | That is, we can leak there right now, because it's possible to schedule to a compute and have it fail before touching volumes, so the 'reservation' still exists after failure. | 14:47 |
mdbooth | ^^^ * 3 == leak | 14:47 |
melwitt | efried: haha, yeah. | 14:47 |
mdbooth | Unless we already handle it | 14:47 |
mriedem | mdbooth: what i'd be most comfortable with is if we suspect we leak today, that we add a functional regression test for that which does reschedules with a volume attached, and asserts at the end of the reschedules when we get novalidhost (for max retries exceeded) that we've cleaned up all attachments to the volume | 14:48 |
mriedem | mdbooth: because this is too hairy to go on based on review alone | 14:49 |
mriedem | *functional test (not really a regression if it's always leaked) | 14:49 |
mriedem | that may have been fixed recently though | 14:49 |
mriedem | after about 7 years of being broken | 14:49 |
mdbooth | Got it. I did a bit of an audit here, btw: https://review.openstack.org/#/c/587071/9/nova/tests/unit/conductor/test_conductor.py@1004 | 14:50 |
mriedem | I8b1c05317734e14ea73dc868941351bb31210bf0 | 14:51 |
*** hvvcben has joined #openstack-nova | 14:51 | |
mriedem | yeah so we'll call _cleanup_volumes which will detach if we abort the build | 14:52 |
mriedem | but not if we reschedule | 14:52 |
*** priteau has quit IRC | 14:52 | |
mriedem | and conductor doesn't do any volume cleanup on MaxRetriesExceeded | 14:52 |
mriedem | so that's probably a separate bug, | 14:52 |
mriedem | and would benefit from a functional test since it involves more than a single service | 14:52 |
mriedem | (really 3 - conductor and 2 computes for the reschedule) | 14:52 |
mriedem | we might already have a func test that does reschedules with a volume attached | 14:53 |
mriedem | i don't see one though | 14:54 |
mriedem | but shouldn't be hard to write | 14:54 |
lyarwood | mdbooth: https://review.openstack.org/#/c/587014/ does that | 14:58 |
mdbooth | lyarwood: Thought I'd seen it recently :) | 14:59 |
hvvcben | Hi - probably a newb question and if this isn't the correct channel please advise. I am trying to rework an old neutron ML2 driver for Queens and am having issues with nova-compute during port creation, because there is no bind_host_id = $nodeID during instance create. The nova-compute api call to neutron to create the port doesn't have bind_host_id set. On older versions like Mitaka, that parameter is in | 15:00 |
hvvcben | the API call, i.e. "binding:host_id": "mymitkaComputehost", | 15:00 |
hvvcben | any help or links to doc pertaining to these changes would be greatly appreciated | 15:00 |
*** Swami has joined #openstack-nova | 15:03 | |
*** ivve has quit IRC | 15:03 | |
*** janki has quit IRC | 15:03 | |
mriedem | melwitt: is your link to irc in https://review.openstack.org/#/c/589972/ wrong? | 15:03 |
melwitt | oh, yeah it is now because I used "latest". derp. I think I've done that a few times lately | 15:04 |
melwitt | added a new comment with the right link | 15:05 |
hvvcben | mitaka nova-compute api call to neutron = "binding:host_id": "mymitkaComputehost", Queens nova-compute api call to neutron is more like "binding:host_id": "", | 15:06 |
mriedem | hvvcben: i see bind_host_id in the neutronv2/api.py code in queens | 15:06 |
mriedem | are you saying bind_host_id isn't being passed down from the compute manager to allocate_for_instance? | 15:06 |
hvvcben | yes, | 15:07 |
mriedem | i think that was only ever used by the ironic driver | 15:07 |
*** rmart04 has quit IRC | 15:07 | |
mriedem | it's still used in queens https://github.com/openstack/nova/blob/stable/queens/nova/compute/manager.py#L1390 | 15:08 |
hvvcben | I am just having trouble figuring out why it does it in mitaka and not in later versions. I was thinking the port creation process has been modified and that value would come later in the process | 15:08 |
mriedem | but as i said, that would only ever have a value for ironic | 15:08 |
hvvcben | this particular driver does interact with hardware and in its present state fails if no Host_id is passed | 15:09 |
hvvcben | hardware meaning switch hardware | 15:10 |
mriedem | the only difference i see when setting binding:host_id between mitaka and queens is that in mitaka we only set that if the neutron port binding extension was available, and we stopped looking for that sometime later and just assumed it would be available | 15:11 |
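A rough sketch of the difference described above, under stated assumptions: the function name `build_port_request` and its `has_port_binding_ext` flag are hypothetical and are not nova's actual code. Only the `binding:host_id` key comes from the discussion. Falling back to an empty string when the host is unset is an assumption made here to mirror the symptom reported in the channel, not a confirmed behavior.

```python
def build_port_request(instance_host, has_port_binding_ext=True):
    """Build a port-create request body (hypothetical helper, not nova code).

    has_port_binding_ext=True models the later behavior of assuming the
    port binding extension is always available; passing False models the
    older behavior of skipping the key when the extension is missing.
    """
    port = {}
    if has_port_binding_ext:
        # Assumption: an unset host ends up as an empty string, matching
        # the "binding:host_id": "" symptom reported in the channel.
        port["binding:host_id"] = instance_host or ""
    return {"port": port}


print(build_port_request("mymitkaComputehost"))
print(build_port_request(None))
```

If the request really carries an empty value, the thing to debug is whether the instance's host field is populated before the network allocation runs.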
hvvcben | i was just curious, since i have a default install of mitaka and it passes it (using openvswitch as driver) and the queens version doesn't: was there some point where that was changed? That's what I am having trouble finding. I thought it may relate to live migration | 15:11 |
mriedem | you said you're trying to create an instance, not live migrate it, right? | 15:11 |
mriedem | there is nothing immediately obviously different between mitaka and queens in how binding:host_id is handled, | 15:12 |
hvvcben | yes yes, but I thought there was some rework done on port creation that affected how nova and neutron deal with port creation as a whole, in an effort to smooth out live migrations | 15:12 |
mriedem | so you're going to have to debug | 15:12 |
mriedem | that's in rocky | 15:12 |
hvvcben | yes been trying | 15:12 |
mriedem | maybe you mean the migrating_to stuff? | 15:13 |
mriedem | for dvr | 15:13 |
*** sambetts_ has quit IRC | 15:13 | |
mriedem | if you're not live migrating, you wouldn't hit any of that so shouldn't be a problem | 15:13 |
hvvcben | gotcha. has the port creation process changed significantly from mitaka to queens? | 15:14 |
hvvcben | as far as what nova does etc? | 15:14 |
mriedem | you're talking about like a 2 year window of dev here :) | 15:16 |
mriedem | i'm not aware of anything significant changing in that flow in that time though, no | 15:16 |
*** sambetts_ has joined #openstack-nova | 15:16 | |
hvvcben | : ) I know i know | 15:17 |
mriedem | are you sure you're not using now-invalid config in queens? | 15:17 |
mriedem | like, we could have deprecated some config options in mitaka/newton and they are gone by the time you get to queens | 15:17 |
hvvcben | ... appreciate it, i will dig further.. I just mainly need to find a way to get the nova-compute host_id and pass it to neutron during create_port_precommit | 15:17 |
*** s10 has quit IRC | 15:18 | |
mriedem | which virt driver are you using? | 15:18 |
mriedem | libvirt? | 15:18 |
mriedem | https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/driver.py#L1587 | 15:18 |
hvvcben | probably all of the above... -- the driver was designed to work with Mitaka and not maintained; there have been quite a few changes in neutron since then obviously (in a good way) | 15:19 |
hvvcben | yes it is libvirt | 15:19 |
mriedem | https://github.com/openstack/nova/blob/stable/queens/nova/virt/driver.py#L1657 | 15:19 |
mriedem | the only other thing i can think is by the time we call network_binding_host_id in the compute manager, the instance.host field isn't set yet | 15:19 |
hvvcben | yea i think that may be part of a port staging process now where back then it was more like "Create it right now" | 15:20 |
mriedem | melwitt: i'm +2 on the reno https://review.openstack.org/589303 and the rpc alias https://review.openstack.org/589972 so you will need to bug another core | 15:20 |
melwitt | mriedem: ack, thanks | 15:20 |
*** Bhujay has quit IRC | 15:21 | |
mriedem | hvvcben: shouldn't have changed this, the ResourceTracker.instance_claim sets the instance.host, | 15:22 |
mriedem | and that happens before we start the network allocation stuff | 15:22 |
openstackgerrit | Jay Pipes proposed openstack/nova master: split gigantor SQL placement query into multiple https://review.openstack.org/590041 | 15:22 |
openstackgerrit | Jay Pipes proposed openstack/nova master: placement: use simple code paths when possible https://review.openstack.org/590388 | 15:22 |
*** alex_xu has quit IRC | 15:24 | |
hvvcben | thanks mriedem: thanks for the assistance | 15:25 |
mriedem | dansmith: replied in https://review.openstack.org/#/c/590062/ | 15:26 |
mriedem | hvvcben: np, good luck | 15:26 |
openstackgerrit | Eric Fried proposed openstack/nova master: Nix 'new in 1.19' from 1.19 sections for rp aggs https://review.openstack.org/590389 | 15:27 |
*** hvvcben has quit IRC | 15:27 | |
*** dtantsur|brb is now known as dtantsur | 15:29 | |
*** ccamacho has quit IRC | 15:29 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: fixtures: Track volume attachments within CinderFixtureNewAttachFlow https://review.openstack.org/587013 | 15:29 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: Add regression test for bug#1784353 https://review.openstack.org/587014 | 15:29 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: conductor: Recreate volume attachments during a reschedule https://review.openstack.org/587071 | 15:29 |
dansmith | mriedem: ah dang, I saw update_cells and stopped reading | 15:31 |
dansmith | thinking that was just the v1 sync thing like instance save | 15:32 |
dansmith | so nevermind | 15:32 |
*** takashin has left #openstack-nova | 15:37 | |
*** priteau has joined #openstack-nova | 15:38 | |
mriedem | lyarwood: can you hit this? https://review.openstack.org/#/c/590062/ | 15:39 |
*** dklyle has joined #openstack-nova | 15:39 | |
lyarwood | mriedem: yup looking | 15:43 |
*** psachin has quit IRC | 15:44 | |
*** rmart04 has joined #openstack-nova | 15:48 | |
*** tssurya has quit IRC | 15:48 | |
*** sahid has quit IRC | 15:49 | |
melwitt | mriedem: I just happened upon the patch for adding the zvm driver to the support matrix https://review.openstack.org/532720 | 15:53 |
melwitt | other doc updates are stacked on top | 15:53 |
melwitt | and I found that no reno was added for the zvm driver at the time of the changes, so I think someone needs to add that | 15:54 |
mriedem | if you want it, it's likely going to have to be you | 15:55 |
mriedem | jichen is probably gone for the day | 15:55 |
melwitt | yeah. I was thinking that, given the time factor | 15:56 |
openstackgerrit | Jay Pipes proposed openstack/nova master: placement: use simple code paths when possible https://review.openstack.org/590388 | 15:56 |
openstackgerrit | Jay Pipes proposed openstack/nova master: split gigantor SQL placement query into multiple https://review.openstack.org/590041 | 15:56 |
*** dklyle has quit IRC | 15:59 | |
*** dklyle has joined #openstack-nova | 16:01 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Adds a test for _get_provider_ids_matching() https://review.openstack.org/590150 | 16:04 |
*** jpena is now known as jpena|off | 16:07 | |
*** Bhujay has joined #openstack-nova | 16:08 | |
mriedem | -1 on the zvm feature support matrix patch | 16:10 |
mriedem | so if we do an rc2, the zvm docs and such might need to fall into that | 16:10 |
melwitt | ack | 16:10 |
mriedem | the only mention that it was added will be in your prelude reno | 16:11 |
*** holser_ has quit IRC | 16:12 | |
melwitt | I know ... I'm writing up its own reno based on the patches, and hopefully efried can help. I would like it to have its own reno with the details and not have the only mention be in the prelude | 16:12 |
melwitt | I didn't realize it was missing a reno of its own | 16:13 |
mriedem | ok, i personally don't think we should hold up https://review.openstack.org/#/c/589303/ on that, | 16:13 |
mriedem | but ok | 16:14 |
efried | I would only be guessing in writing up that reno. I guess it prolly needs to be done by EOB though, huh? | 16:14 |
efried | mriedem: No, I agree, I was holding on the other issue. | 16:14 |
*** derekh has quit IRC | 16:14 | |
mriedem | the grammar nit? | 16:14 |
mriedem | then let's just fix it inline and approve? | 16:14 |
efried | yeah, sounds good. | 16:15 |
mriedem | melwitt: ^? | 16:15 |
melwitt | mriedem: you think it's ok for that to be the only mention? if so, I'm fine with it. you know a lot more about this than I do | 16:15 |
mriedem | i'm fine with it | 16:15 |
mriedem | the bigger docs series is what really matters | 16:15 |
mriedem | but that's not ready for rc1 | 16:15 |
efried | I can try to rip out a reno real quick | 16:15 |
melwitt | ok, inline fix it and approve is cool with me then | 16:15 |
mriedem | you can always do the docs in rc2 | 16:16 |
efried | mriedem: I'll do the grammar fix and push it. | 16:16 |
mriedem | ack | 16:16 |
openstackgerrit | Eric Fried proposed openstack/nova master: Add a prelude release note for the 18.0.0 Rocky GA https://review.openstack.org/589303 | 16:16 |
melwitt | ok, I wasn't sure if a docs-only thing was rc2 worthy. if it is, then we can do that | 16:16 |
efried | melwitt, mriedem: done. If we're not worried about getting the reno landed today, I'll happily wait for jichenjc to do it. | 16:17 |
mriedem | i'm not losing sleep over a detailed reno for the zvm driver which does a very small number of things - the docs are more important to me | 16:18 |
mriedem | the prelude mentions it | 16:18 |
mriedem | if someone wants to learn more, they can find the driver docs | 16:18 |
mriedem | the way i look at renos, if it's really detailed, it likely needs to be a doc, because release notes are a one time only thing | 16:19 |
mriedem | melwitt: should probably hold up stable releases on https://review.openstack.org/#/q/topic:bug/1784705+status:open | 16:22 |
melwitt | ok, that's helpful. fwiw, I was thinking basic detail in the reno like, what operations are supported (spawn, destroy, snapshot, get console output, power actions) | 16:22 |
melwitt | mriedem: ok, will do | 16:22 |
mriedem | anyone using ironic + ComputeCapabilitiesFilter will be hit by those | 16:22 |
mriedem | melwitt: supported ops for the zvm driver are in the feature support matrix | 16:22 |
mriedem | so i wouldn't put that in the reno | 16:22 |
melwitt | ok. efried, we don't need an additional reno ^ | 16:23 |
efried | Okay, wfm. jichenjc, in case you're snooping later ^ | 16:24 |
efried | The prelude could have linked to the admin config doc, if it was landed, but it ain't, so... | 16:24 |
*** rmart04 has quit IRC | 16:25 | |
*** rmart04 has joined #openstack-nova | 16:25 | |
melwitt | based on what mriedem said, we can have a rc2 because of the docs, and land them there | 16:26 |
melwitt | so I'll add those to https://etherpad.openstack.org/p/nova-rocky-release-candidate-todo | 16:27 |
melwitt | ok he already added them, thanks | 16:27 |
mriedem | lyarwood: should i start reviewing https://review.openstack.org/#/c/587013/ again or do you expect more from mdbooth/ | 16:28 |
mriedem | ? | 16:28 |
lyarwood | mriedem: that should be good now | 16:28 |
melwitt | dansmith, lyarwood: could you pls review these changes for the ironic bug, we're holding the stable releases on those fixes https://review.openstack.org/#/q/topic:bug/1784705+status:open | 16:31 |
*** awaugama has quit IRC | 16:31 | |
lyarwood | melwitt: ack'd the Pike changes | 16:32 |
melwitt | thanks | 16:33 |
*** rmart04 has quit IRC | 16:35 | |
*** hongbin has quit IRC | 16:37 | |
*** Swami has quit IRC | 16:37 | |
*** s10 has joined #openstack-nova | 16:38 | |
mriedem | lyarwood: just a couple of small things in https://review.openstack.org/#/c/587013/ | 16:38 |
mriedem | random musings in your functional test too; i.e. i wonder how many volumes we orphan when nova creates the root volume and we reschedule | 16:44 |
mriedem | nice fun way to go over volume quota | 16:44 |
lyarwood | mriedem: hmmm I forgot the compute did that, does it not see the existing bdm? | 16:46 |
mriedem | the existing bdm will have source_type=image on it or whatever | 16:48 |
mriedem | right? | 16:48 |
mriedem | we don't update that | 16:48 |
mriedem | so it will be transformed to a DriverImageBlockDevice or whatever | 16:48 |
openstackgerrit | Merged openstack/nova stable/pike: Fix bad links for admin-guide https://review.openstack.org/590072 | 16:48 |
mriedem | DriverVolImageBlockDevice | 16:48 |
mriedem | anyway, haven't tested it, but i'm pretty sure that's been busted since forever | 16:48 |
mriedem | we cleanup after ourselves for ports but not volumes | 16:49 |
lyarwood | ah right understood, should be easy enough to show in another functional test | 16:51 |
mriedem | maybe.....i'm not sure the fixture is setup for that really | 16:51 |
mriedem | devstack is probably much easier/faster to start | 16:51 |
mriedem | if you have 2 nodes... | 16:52 |
lyarwood | mriedem: I don't have that to hand but I'll make a note to give this a go | 16:57 |
*** s10 has quit IRC | 16:57 | |
*** itlinux has joined #openstack-nova | 16:58 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: block_device: Rollback volumes to in-use on DeviceDetachFailed https://review.openstack.org/590439 | 17:21 |
*** udesale has quit IRC | 17:23 | |
*** luksky has quit IRC | 17:27 | |
*** cdent has quit IRC | 17:29 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: placement: ignore policy scope check failures if not enforcing scope https://review.openstack.org/590445 | 17:34 |
*** gouthamr is now known as gouthamr_away | 17:37 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: placement: ignore policy scope check failures if not enforcing scope https://review.openstack.org/590445 | 17:38 |
*** Bhujay has quit IRC | 17:39 | |
openstackgerrit | Merged openstack/nova master: Update the parameter explain when updating a volume attachment https://review.openstack.org/565181 | 17:44 |
*** rmart04 has joined #openstack-nova | 17:47 | |
*** gyee has joined #openstack-nova | 17:47 | |
*** rmart04 has quit IRC | 17:50 | |
*** rmart04 has joined #openstack-nova | 17:51 | |
*** psachin has joined #openstack-nova | 17:51 | |
*** awaugama has joined #openstack-nova | 18:07 | |
*** priteau has quit IRC | 18:15 | |
*** sambetts_ has quit IRC | 18:18 | |
*** sambetts_ has joined #openstack-nova | 18:21 | |
*** panda|ruck is now known as panda|ruck|off | 18:22 | |
*** dtantsur is now known as dtantsur|afk | 18:24 | |
*** owalsh has quit IRC | 18:50 | |
*** owalsh has joined #openstack-nova | 18:51 | |
*** gbarros has joined #openstack-nova | 18:58 | |
*** luksky has joined #openstack-nova | 18:58 | |
*** rmart04 has quit IRC | 19:02 | |
*** mriedem has quit IRC | 19:11 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: placement: ignore policy scope check failures if not enforcing scope https://review.openstack.org/590445 | 19:11 |
*** mriedem has joined #openstack-nova | 19:11 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Fix image-defined numa claims during evacuate https://review.openstack.org/588657 | 19:15 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add encrypted volume support to feature matrix docs https://review.openstack.org/570255 | 19:18 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: api-ref: fix GET /flavors?is_public description https://review.openstack.org/588092 | 19:28 |
*** pcaruana has quit IRC | 19:33 | |
*** gbarros has quit IRC | 19:35 | |
*** prometheanfire has joined #openstack-nova | 19:37 | |
prometheanfire | is https://github.com/openstack/nova/commit/ff747792b8f5aefe1bebb01bdf49dacc01353348#diff-f4019782d93a196a0d026479e6aa61b1R6938 run multiple times or only checked once? | 19:37 |
openstackgerrit | Merged openstack/nova stable/queens: Fix host validity check for live-migration https://review.openstack.org/590262 | 19:37 |
openstackgerrit | Merged openstack/nova master: [placement] api-ref: add description for 1.29 https://review.openstack.org/589407 | 19:37 |
prometheanfire | live migrations seem to be limited to 1M a sec and never increase | 19:37 |
melwitt | hm | 19:44 |
melwitt | do you know anything about that mriedem ^ | 19:45 |
mriedem | prometheanfire: linuxbridge? | 19:45 |
mriedem | https://github.com/openstack/nova/commit/ff747792b8f5aefe1bebb01bdf49dacc01353348#diff-f4019782d93a196a0d026479e6aa61b1R5380 | 19:46 |
mriedem | are you using linuxbridge i mean | 19:46 |
prometheanfire | ya, lb | 19:47 |
prometheanfire | vlan interface on the VM | 19:47 |
mriedem | well, we should be waiting on network-vif-plugged events from neutron and if we get them, we set the bw back up and resume the live migration, else we should fail the live migration | 19:48 |
mriedem | i'm assuming you have vif_plugging_timeout left at the default config of 300? | 19:48 |
prometheanfire | the migration finishes, it's just slow | 19:49 |
prometheanfire | I don't think we changed it | 19:49 |
mriedem | does it finish in under 5 minutes? | 19:49 |
prometheanfire | takes ~10 min | 19:49 |
prometheanfire | debug log | 19:49 |
prometheanfire | https://gist.githubusercontent.com/mheler/475d21b741aa58f320a456c3ac0d0f45/raw/ff76c45c3b6968b0d0514a9a7dcf478f054f6b70/gistfile1.txt | 19:49 |
mriedem | we should have either timed out and failed by then or reconfigured the guest to go back to the normal bw | 19:49 |
prometheanfire | which includes x-auth info, great | 19:50 |
mriedem | i don't see either the timeout or "VIF events received, continuing migration" messages in those logs | 19:50 |
prometheanfire | ya, neither do I, which is why I'm confused | 19:50 |
melwitt | I'm not seeing that message "LOG.debug('VIF events received, continuing migration with max bandwidth configured" in your logs | 19:51 |
prometheanfire | that was the first thing I looked for | 19:51 |
mriedem | _http_log_request /openstack/venvs/nova-r16.2.2/lib/python2.7/site-packages/keystoneauth1/session.py:375 for that x-auth thing | 19:51 |
melwitt | bah, lag | 19:51 |
mriedem | besides me, sahid and dansmith are the other two that know about that change, | 19:51 |
mriedem | but are you sure you actually have that code? | 19:51 |
mriedem | would be nice if we logged something like "waiting for events" in that block | 19:52 |
prometheanfire | ya, set using b58c7f033771e3ea228e4b40c796d1bc95a087f5 from nova | 19:52 |
mriedem | prometheanfire: well your token thing isn't a problem :) https://github.com/openstack/keystoneauth/blob/master/keystoneauth1/session.py#L371 | 19:54 |
mriedem | it's redacted | 19:54 |
mriedem | prometheanfire: do you know the instance id in question here? | 19:54 |
mriedem | checking logs w/o an instance id is kind of hard | 19:54 |
prometheanfire | yes | 19:54 |
mriedem | also, | 19:54 |
mriedem | are these logs from the source or dest host? | 19:55 |
mriedem | b/c what we're looking for would be source host | 19:55 |
prometheanfire | the logs are from grepping it for c37d7489-a67b-47ea-a4f7-9323804cc552 | 19:55 |
prometheanfire | ya, source | 19:55 |
mriedem | 2018-08-09 21:20:20.557 12111 DEBUG nova.compute.manager [req-84ca4a17-0d3d-4597-91d6-f5721989dd41 143ee57edd4d4e3b9a165d375d0e7e1a a727713d2c0a4ed69b730d9cb2116af6 - default default] [instance: c37d7489-a67b-47ea-a4f7-9323804cc552] Received event network-vif-plugged-490f6f25-8b88-487c-a76b-62d16e3c0da1 external_instance_event /openstack/venvs/nova-r16.2.2/lib/python2.7/site-packages/nova/compute/manager.py:7071 | 19:56 |
mriedem | 2018-08-09 21:20:20.558 12111 DEBUG nova.compute.manager [req-84ca4a17-0d3d-4597-91d6-f5721989dd41 143ee57edd4d4e3b9a165d375d0e7e1a a727713d2c0a4ed69b730d9cb2116af6 - default default] [instance: c37d7489-a67b-47ea-a4f7-9323804cc552] No waiting events found dispatching network-vif-plugged-490f6f25-8b88-487c-a76b-62d16e3c0da1 pop_instance_event /openstack/venvs/nova-r16.2.2/lib/python2.7/site-packages/nova/compute/manager.py:36 | 19:56 |
mriedem | 2018-08-09 21:20:20.559 12111 WARNING nova.compute.manager [req-84ca4a17-0d3d-4597-91d6-f5721989dd41 143ee57edd4d4e3b9a165d375d0e7e1a a727713d2c0a4ed69b730d9cb2116af6 - default default] [instance: c37d7489-a67b-47ea-a4f7-9323804cc552] Received unexpected event network-vif-plugged-490f6f25-8b88-487c-a76b-62d16e3c0da1 for instance | 19:56 |
prometheanfire | this is a pike install btw | 19:56 |
mriedem | maybe we're getting the event before we're waiting for it? | 19:56 |
prometheanfire | but it was backported to pike, so meh | 19:56 |
prometheanfire | perhaps | 19:56 |
mriedem | we could also be getting ^ from the vif plug that happens on the dest host during pre-live migration | 19:57 |
mriedem | the events are going to go to the source host | 19:57 |
mriedem | which isn't waiting for those | 19:57 |
prometheanfire | not yet at least, ya | 19:57 |
*** eharney has quit IRC | 19:57 | |
mriedem | https://github.com/openstack/nova/commit/ff747792b8f5aefe1bebb01bdf49dacc01353348#diff-f4019782d93a196a0d026479e6aa61b1R6899 | 19:57 |
mriedem | "They are going to be created by libvirt at the very beginning of the live-migration process." | 19:58 |
prometheanfire | yep | 19:58 |
prometheanfire | I read that :P | 19:58 |
mriedem | that must mean plug_vifs during pre-live migration on the dest host | 19:58 |
mriedem | which triggers the event from neutron to the source host | 19:58 |
mriedem | and we're getting it before we start waiting it looks like | 19:58 |
prometheanfire | which isn't waiting yet? | 19:59 |
prometheanfire | ya | 19:59 |
mriedem | but, you should then hit this https://github.com/openstack/nova/commit/ff747792b8f5aefe1bebb01bdf49dacc01353348#diff-f4019782d93a196a0d026479e6aa61b1R6933 | 19:59 |
mriedem | and the migration should fail | 19:59 |
prometheanfire | also yes | 19:59 |
mriedem | added https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L675 in rocky but that doesn't help you here on pike | 20:00 |
prometheanfire | yarp | 20:01 |
mriedem | we can't backport that either b/c it's got rpc changes in it | 20:01 |
mriedem | so maybe the raised MigrationError isn't really doing anything? | 20:01 |
* prometheanfire shrugs | 20:01 | |
mriedem | i expect you to know exactly how all of this nova code works | 20:02 |
prometheanfire | lolol | 20:02 |
*** psachin has quit IRC | 20:02 | |
prometheanfire | ya, I'm honestly surprised that it finished, I expected it to fail | 20:02 |
mriedem | no it should raise up, | 20:02 |
mriedem | i was thinking we might be threaded but that's here https://github.com/openstack/nova/blob/ff747792b8f5aefe1bebb01bdf49dacc01353348/nova/virt/libvirt/driver.py#L6928 | 20:03 |
prometheanfire | maybe eventlet.timeout.Timeout isn't the actual error getting raised? | 20:03 |
openstackgerrit | Merged openstack/nova stable/queens: Update nova network info when doing rebuild for evacuate operation https://review.openstack.org/590062 | 20:03 |
prometheanfire | maybe we finish the migration before the timeout occurs? | 20:04 |
mriedem | Timeout is the right error | 20:04 |
mriedem | prometheanfire: well that's why i asked how long it took, | 20:04 |
mriedem | but the default timeout is 5 min | 20:04 |
mriedem | you said it completed in 10 min | 20:04 |
prometheanfire | went from 21:20 to 21:25 ish | 20:05 |
mriedem | oh, well that's not 10 min L( | 20:05 |
mriedem | :) | 20:05 |
prometheanfire | ya | 20:05 |
mriedem | so yeah i bet you completed before the timeout | 20:05 |
*** gbarros has joined #openstack-nova | 20:05 | |
prometheanfire | 21:20:21.043 to 21:27:12.788 at least | 20:06 |
mriedem | but honestly, | 20:06 |
mriedem | the threading here is messing with my head | 20:06 |
prometheanfire | Migration running for 410 secs | 20:06 |
prometheanfire | so over 5 min | 20:06 |
prometheanfire | yep | 20:06 |
mriedem | wait_for_instance_event is meant to register events to wait | 20:06 |
mriedem | then run some code and wait or timeout | 20:06 |
mriedem | "opthread = utils.spawn(self._live_migration_operation" is what code gets run | 20:07 |
mriedem | but that should mean we wait until we do "opthread.link(thread_finished, finish_event)" | 20:08 |
mriedem | even if we get the timeout, it seems this is pretty dangerous if we've already started the live migration in the hypervisor | 20:08 |
mriedem | i +2ed this code too... | 20:09 |
prometheanfire | so we can blame you :P | 20:09 |
mriedem | kinda need dansmith here | 20:09 |
prometheanfire | ya | 20:09 |
mriedem | i still don't know why you wouldn't see the timeout message | 20:09 |
prometheanfire | same | 20:09 |
mriedem | from your log | 20:09 |
mriedem | 2018-08-09 21:27:20.197 12111 DEBUG nova.virt.libvirt.driver [req-84ca4a17-0d3d-4597-91d6-f5721989dd41 143ee57edd4d4e3b9a165d375d0e7e1a a727713d2c0a4ed69b730d9cb2116af6 - default default] [instance: c37d7489-a67b-47ea-a4f7-9323804cc552] Migration operation thread notification thread_finished /openstack/venvs/nova-r16.2.2/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:6952 | 20:09 |
mriedem | sahid will be around in the morning if you can catch him | 20:11 |
dansmith | catch me up? | 20:11 |
*** nicolasbock has quit IRC | 20:11 | |
mriedem | dansmith: prometheanfire is running a linuxbridge live migration in pike, | 20:11 |
mriedem | with that bw slow down patch of sahid's | 20:11 |
mriedem | the live migration takes longer than our default vif plugging timeout | 20:12 |
dansmith | that got backported I assume? | 20:12 |
mriedem | yeah (from us). and looks like we actually get the event before sahid's code registers to wait, | 20:12 |
mriedem | but the weird thing is we don't get the timeout event after 5 minutes | 20:12 |
mriedem | https://gist.githubusercontent.com/mheler/475d21b741aa58f320a456c3ac0d0f45/raw/ff76c45c3b6968b0d0514a9a7dcf478f054f6b70/gistfile1.txt | 20:12 |
mriedem | instance is c37d7489-a67b-47ea-a4f7-9323804cc552 | 20:12 |
dansmith | if it comes before we register it gets dropped, | 20:12 |
prometheanfire | that's the sender | 20:12 |
dansmith | but obviously the point is it's supposed to come after the register as you said | 20:13 |
mriedem | gets dropped but won't we wait for something that doesn't come and timeout? | 20:13 |
dansmith | should timeout yeah | 20:13 |
mriedem | the network-vif-plugged is triggered via plug_vifs during pre_live_migration on the dest, | 20:13 |
mriedem | which happens before his code runs to register the waiter | 20:13 |
mriedem | so it's a total race window | 20:14 |
dansmith | ugh | 20:14 |
mriedem | which is why we added https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L675 | 20:14 |
mriedem | but not backportable | 20:14 |
dansmith | I thought it gets triggered by the actual guest starting on the other side, which came from the actual live migration op | 20:14 |
mriedem | i'd need sahid to confirm that | 20:14 |
mriedem | but network-vif-plugged, as far as i know, comes from plug_vifs on the dest during pre_live_migration | 20:15 |
mriedem | which is before his code runs | 20:15 |
mriedem | on the source | 20:15 |
dansmith | so, | 20:15 |
dansmith | the even comes from the tap being created actually | 20:15 |
dansmith | *event | 20:15 |
dansmith | so maybe plug is creating a tap before libvirt does but I'm not sure how we'd give it to it | 20:16 |
mriedem | prometheanfire: i'm assuming you have this https://review.openstack.org/#/c/586965/ | 20:16 |
mriedem | ^ fix for the pike backport | 20:16 |
prometheanfire | ya | 20:17 |
dansmith | I learned this after we were working on that patch though | 20:17 |
prometheanfire | that's within the sha I posted earlier | 20:17 |
prometheanfire | otherwise it wouldn't succeed at all :P | 20:17 |
openstackgerrit | Jay Pipes proposed openstack/nova master: placement: use simple code paths when possible https://review.openstack.org/590388 | 20:17 |
openstackgerrit | Jay Pipes proposed openstack/nova master: split gigantor SQL placement query into multiple https://review.openstack.org/590041 | 20:17 |
mriedem | prometheanfire: yeah | 20:17 |
mriedem | very obvious explosion | 20:17 |
openstackgerrit | Jay Pipes proposed openstack/nova master: Adds a test for _get_provider_ids_matching() https://review.openstack.org/590150 | 20:17 |
mriedem | prometheanfire: you have this? https://review.openstack.org/#/c/510013/ | 20:18 |
sean-k-mooney | mriedem: network-vif-plugged comes from neutron when it finishes wiring up the port | 20:18 |
mriedem | prometheanfire: this one was fun in that it depended on neutron backports as well | 20:18 |
sean-k-mooney | also i just saw plugging stuff so need to scroll back to get context | 20:18 |
prometheanfire | that one I'm not sure, but probably | 20:18 |
mriedem | might want to check | 20:19 |
*** awaugama has quit IRC | 20:19 | |
prometheanfire | checking | 20:19 |
prometheanfire | merged dec 3 into stable pike https://github.com/openstack/neutron/commits/stable/pike?after=ad8f00236cc57ce9a8f077dd2d32c6fada00e817+139 | 20:20 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Handle binding_failed vif plug errors on compute restart https://review.openstack.org/587498 | 20:22 |
dansmith | if the event comes from stuff we're doing in pre_dest, it seems unlikely we'd ever win the race in gate | 20:22 |
prometheanfire | using at least this version of neutron https://github.com/openstack/openstack-ansible/blob/5c341a7bada78edab5f3d132d55adb00eaf2413f/playbooks/defaults/repo_packages/openstack_services.yml#L125 | 20:23 |
prometheanfire | which is from 2018 in may | 20:23 |
mriedem | idk this is where i thought we'd generate the event https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7775 | 20:23 |
prometheanfire | ok, I have to go for a bit, but will be back | 20:24 |
dansmith | well, | 20:24 |
*** tbachman has quit IRC | 20:24 | |
dansmith | I did some debugging on plug stuff with the godaddy people a month or so ago, | 20:24 |
dansmith | and read through all the neutron code related to this | 20:24 |
dansmith | and I was surprised to learn that actually what happens is, | 20:24 |
dansmith | something creates an interface with the right name, | 20:25 |
dansmith | a periodic in the neutron agent notices, hooks it up and sends the event | 20:25 |
dansmith | so it's a little less connected to us than I would have thought | 20:25 |
mriedem | i knew the linuxbridge-agent does polling only b/c sean mooney explained that when we had the issue with waiting for hard reboot events for linuxbridge in the gate | 20:26 |
mriedem | ovs agent listens for an actual event from ovs itself | 20:26 |
dansmith | maybe that's why we can win it in the gate, since there was a polling loop | 20:26 |
*** Sundar has joined #openstack-nova | 20:26 | |
mriedem | we use ovs in the gate | 20:27 |
mriedem | for most everything | 20:27 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Handle binding_failed vif plug errors on compute restart https://review.openstack.org/587498 | 20:27 |
dansmith | I thought we had a LB job? | 20:27 |
mriedem | neutron has a couple of lb jobs i think | 20:27 |
dansmith | regardless, you did a LB migration job and it passed a few times at least | 20:27 |
mriedem | i had rigged up a patch that ran linuxbridge multinode to test sahid's patch | 20:27 |
mriedem | yeah, could have won the race though right? | 20:27 |
Sundar | efried: Please ping me when you have a moment | 20:27 |
efried | Hi Sundar, what's up? | 20:28 |
dansmith | I guess because of the polling loop we have a decent chance | 20:28 |
mriedem | and all of our controller services on a single node slowing things down maybe, idk | 20:28 |
dansmith | because pre-dest is pretty long before we get to the event wait, which seems ...crazy to ever win it | 20:28 |
mriedem | don't know what prometheanfire's env setup is like | 20:28 |
dansmith | if so, presumably this patch broke live migration for anyone with a fast system | 20:28 |
*** owalsh_ has joined #openstack-nova | 20:29 | |
mriedem | god i hope so | 20:29 |
mriedem | that would be magical | 20:29 |
dansmith | so, sahid has been arguing to make the timeout non-fatal, reportedly because someone was using a custom driver or something | 20:29 |
dansmith | but I wonder if it's actually because this is actually totes broken | 20:29 |
Sundar | We were discussing the relative roles of plugins and drivers. It seems to me that the distinction is not a hard one. We just need some extension agent with clear APIs for both os-acc and Cyborg. It could be the same module providing two sets of APIs. Does that work for you? | 20:29 |
efried | Sundar: Absolutely. | 20:30 |
dansmith | mriedem: melwitt: tbh, if we really think this is that broken (and sounds like it is) then we should probably revert it from the release immediately | 20:30 |
mriedem | well the other thing i was saying above was, | 20:31 |
melwitt | oof | 20:31 |
Sundar | Great. I was trying to delineate the two and it was getting rather ambiguous. Good to have this sorted out. Will send out the os-acc spec with this update | 20:31 |
mriedem | even if we get the TimeoutError, we raise MigrationError or whatever, | 20:31 |
mriedem | but i *think* by that point we've already started the guest transfer | 20:31 |
mriedem | b/c we call _live_migration_operation | 20:31 |
efried | Sundar: Well, we should still delineate the two, if they're going to be *able* to be separate modules. | 20:31 |
dansmith | correct | 20:31 |
mriedem | then wait | 20:31 |
*** owalsh has quit IRC | 20:31 | |
mriedem | so starting the guest transfer and then timing out on the waiter b/c we registered late kills the nova live migration option but not what's in the hypervisor | 20:32 |
mriedem | right? | 20:32 |
mriedem | s/option/operation/ | 20:32 |
dansmith | mriedem: um, what? | 20:32 |
mriedem | if we get the timeout and raise MigrationError | 20:32 |
mriedem | but have already started the guest transfer | 20:32 |
mriedem | the only thing that happens in nova is we call the _rollback_live_migration code, | 20:33 |
mriedem | we don't attempt to kill any live migration job that's running in libvirt | 20:33 |
mriedem | right? | 20:33 |
Sundar | efried: There will certainly be two sets of APIs: one for instance half of the attach and one for the device half. But both can do device-specific, platform-specific and vendor-specific actions. | 20:33 |
efried | Sundar: I dig it. | 20:33 |
efried | Sundar: And I like the idea of being able to supply that code in one module or two. | 20:33 |
dansmith | lemme look | 20:33 |
mriedem | in other words, | 20:34 |
*** owalsh has joined #openstack-nova | 20:34 | |
mriedem | nova will say "migration failed" but the guest might actually get transferred | 20:34 |
mriedem | just really f'ing slowly | 20:34 |
dansmith | well, yeah, I mean, the point of this code was to not raise the speed limit until it came | 20:34 |
*** owalsh_ has quit IRC | 20:35 | |
mriedem | so did sahid want the timeout to just log and we'd have a finally that always set the bw back up? | 20:35 |
dansmith | yes | 20:35 |
mriedem | given what seems to be a pretty easy race to fail, that seems like it would have been better | 20:36 |
dansmith | which means you let it go to the other side but without networking | 20:36 |
Sundar | efried: So, it may be superfluous to have two separate modules, which are separately loaded by Stevedore. os-acc would have to load both, and the distinction in terms of what each module does seems to come down to APIs, rather than anything else. So, we might as well define two sets of APIs, and have one module do both. Internally, of course, the module may have separate packages/sub-modules for different functionalities. | 20:36 |
dansmith | mriedem: but the goal of the patch wasn't to "maybe catch the plug event", so if it never came in, it really should stop | 20:36 |
dansmith | mriedem: so I think it should cancel | 20:36 |
dansmith | which I said on the patch a couple of times, but I guess we never even got it that far | 20:37 |
Sundar | We may also provide common functions in os-acc for specific hypervisors | 20:37 |
mriedem | dansmith: ok, well the waiter is in the wrong place then, and https://review.openstack.org/#/c/558001/ was the right thing, | 20:37 |
mriedem | but not backportable | 20:37 |
Sundar | which any driver/plugin/module can invoke | 20:37 |
efried | Sundar: Offhand I don't see a problem with that. Is it ever going to be the case that you need to run one but not both (i.e. a driver but not its corresponding plugin, or vice versa) on a given system? | 20:37 |
sean-k-mooney | efried: Sundar provided there is a well defined versioned interface it's ok but Sundar i don't think os-acc should be able to alter the hypervisor context | 20:37 |
efried | Sundar: Let's move to #openstack-cyborg so we're not cross-talking with the others. | 20:38 |
sean-k-mooney | e.g. just like os-vif os-acc should not be able to modify the libvirt xml | 20:38 |
dansmith | mriedem: yeah, I was just looking through compute manager wondering why the fsck it was in there too | 20:38 |
dansmith | mriedem: does that not work for LB for some reason? | 20:38 |
*** owalsh_ has joined #openstack-nova | 20:39 | |
mriedem | does what not work? | 20:39 |
openstackgerrit | Merged openstack/nova stable/queens: Reload oslo_context after calling monkey_patch() https://review.openstack.org/589249 | 20:39 |
mriedem | https://review.openstack.org/#/c/558001/ ? | 20:39 |
dansmith | yeah | 20:39 |
openstackgerrit | Merged openstack/nova stable/queens: Fix message for unexpected external event https://review.openstack.org/589505 | 20:39 |
*** owalsh has quit IRC | 20:39 | |
mriedem | prometheanfire is failing in pike | 20:39 |
mriedem | https://review.openstack.org/#/c/558001/ is rocky | 20:39 |
openstackgerrit | Merged openstack/nova master: Trivial fix on migration doc https://review.openstack.org/589028 | 20:39 |
mriedem | b/c we backported sahid's patch | 20:39 |
openstackgerrit | Merged openstack/nova master: Add a prelude release note for the 18.0.0 Rocky GA https://review.openstack.org/589303 | 20:39 |
Sundar | sean-k-mooney: os-acc may provide device-specific XML snippets, for example, which libvirt driver would compose into a domain XML. | 20:39 |
Sundar | efried: Sure, joined #openstack-cyborg | 20:40 |
dansmith | mriedem: no, I realize that | 20:40 |
sean-k-mooney | Sundar: i really hope not or that code should live in the nova tree | 20:40 |
efried | sean-k-mooney: Can you join us in -cyborg? | 20:40 |
sean-k-mooney | efried: sure | 20:40 |
dansmith | mriedem: what I'm saying is, because the event gets triggered from pre-migration, the wait should really be up a level in compute manager, which you added in rocky | 20:40 |
dansmith | mriedem: and I'm asking if there's some reason why the wait in compute manager can't work with LB | 20:41 |
dansmith | mriedem: so we like just rip sahid's stuff out of everywhere and make sure that you're including events in the compute manager wait | 20:41 |
sean-k-mooney | efried: #openstack-cyborg? | 20:42 |
efried | yes | 20:42 |
mriedem | dansmith: it should work for LB as far as i know | 20:44 |
dansmith | seems like it | 20:44 |
mriedem | the only backend i know that won't work, | 20:44 |
mriedem | is ODL | 20:44 |
dansmith | in fact | 20:45 |
mriedem | because that doesn't send events on vif plug/unplug, only port host binding changes | 20:45 |
sean-k-mooney | mriedem: lb polls for new interfaces and can miss the addition and removal of interfaces in some cases | 20:45 |
dansmith | mriedem: you're not only waiting for ovs interfaces there right? | 20:45 |
sean-k-mooney | so we can't rely on lb to emit the event | 20:45 |
mriedem | dansmith: correct | 20:45 |
sean-k-mooney | or rather the lb l2 agent | 20:45 |
dansmith | mriedem: so you should be eating them up there, and then his wait is definitely never going to get them right? | 20:45 |
dansmith | so in pike, I expect it races, | 20:46 |
dansmith | and in rocky it never ever works at all | 20:46 |
mriedem | yeah maybe | 20:47 |
mriedem | i could dig up my 2 node lb ci patch, | 20:47 |
dansmith | oh sweet baby jesus thank you for this day | 20:47 |
mriedem | and enable this waiter in nova on master, | 20:47 |
mriedem | and we'd have to probably turn the vif plugging timeout way down to actually see if we hit a timeout | 20:47 |
mriedem | otherwise i'd expect in the gate, live migration with a tiny cirros guest not doing anything transfers pretty fast | 20:47 |
dansmith | yeah | 20:48 |
mriedem | heh, and i was just going to start mowing and packing | 20:48 |
dansmith | melwitt: so, honestly, reverting sahid's thing for rocky needs to be high prio I think | 20:48 |
dansmith | melwitt: live migration with LB is completely broken I expect | 20:48 |
mriedem | i'll update that ci patch | 20:48 |
melwitt | ok, so is this a RC1 thing or a RC2 thing? | 20:49 |
dansmith | melwitt: and it probably needs to be reverted out of the older releases too | 20:49 |
dansmith | melwitt: your call but RCsomething, IMHO | 20:49 |
prometheanfire | back | 20:49 |
prometheanfire | but leaving soonish | 20:49 |
dansmith | I would think we could do something like what mriedem did in rocky for those releases | 20:49 |
melwitt | ok, definitely RC2. trying to do it by the end of today will be hard unless it gets approved, stat | 20:50 |
melwitt | *definitely RC2, at least | 20:50 |
dansmith | well, maybe not since he needed a signal from the remote machine that it was going to do the wait ... | 20:50 |
prometheanfire | mriedem: I'll try to get a bug reported if you think that's the next step | 20:51 |
dansmith | although the event should still trigger | 20:51 |
mriedem | prometheanfire: yes please | 20:51 |
dansmith | prometheanfire: yes | 20:51 |
mriedem | we'll track that for rc2 | 20:51 |
prometheanfire | rc2? | 20:51 |
mriedem | was going to try and see how long a guest transfer takes in the gate | 20:51 |
mriedem | prometheanfire: today is release candidate 1 day | 20:51 |
prometheanfire | I thought this didn't hit rocky | 20:51 |
prometheanfire | but I'll leave that to you | 20:51 |
mriedem | prometheanfire: new wrinkle | 20:51 |
*** lbragstad has quit IRC | 20:51 | |
prometheanfire | oh, nice | 20:51 |
dansmith | worse wrinkle | 20:52 |
mriedem | (1) probably race fail on pike | 20:52 |
prometheanfire | happy to help :P | 20:52 |
mriedem | (2) totes broken on master | 20:52 |
prometheanfire | even better | 20:52 |
dansmith | prometheanfire: hook me up with a bug number and I'll propose the revert | 20:53 |
dansmith | and I can comment on the bug with all the deets | 20:53 |
*** gouthamr_away is now known as gouthamr | 20:53 | |
dansmith | since mriedem will be busy with the ci patch and packing for da nang | 20:53 |
mriedem | isn't da nang vietnam? | 20:53 |
dansmith | yes | 20:54 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: placement: ignore policy scope check failures if not enforcing scope https://review.openstack.org/590445 | 20:54 |
prometheanfire | I'm hoping to get the user to report, but I will if he went home | 20:54 |
dansmith | the revert is complete conflict | 20:55 |
dansmith | wonderful. | 20:55 |
dansmith | the other benefit of doing this in the manager is that we don't need the silly artificial speed limit | 20:56 |
dansmith | although we probably need sahid and libvirt people to confirm that there's not something we're missing here | 20:56 |
dansmith | because I asked him specifically about doing this early on in his patch and he said it wasn't possible, but I believed that qemu/libvirt on the dest machine were responsible for the plugging at the time | 20:57 |
dansmith | and maybe he did too | 20:57 |
mriedem | meanwhile, our granite seller is being an ass and i have to get back to our plumber | 21:00 |
dansmith | #firstworldrichpersonproblems | 21:01 |
mriedem | ok so looking at a job, we register waiting for events starting here http://logs.openstack.org/98/587498/1/check/nova-live-migration/5ff805a/logs/screen-n-cpu.txt.gz#_Jul_31_17_18_19_562999 | 21:06 |
mriedem | Jul 31 17:18:19.562999 ubuntu-xenial-inap-mtl01-0001077052 nova-compute[2320]: DEBUG nova.compute.manager [None req-942f438c-3cbb-4ce7-8afb-ecd250c98f75 tempest-LiveAutoBlockMigrationV225Test-1515676049 tempest-LiveAutoBlockMigrationV225Test-1515676049] [instance: 7f68c430-f565-433a-8f87-27b9a00d29a0] Preparing to wait for external event network-vif-plugged-6b030652-5fe6-471a-b7ec-0b70e95159a4 {{(pid=2320) prepare_for_instanc | 21:06 |
mriedem | ent /opt/stack/new/nova/nova/compute/manager.py:328}} | 21:06 |
mriedem | pre_live_migration takes about 7 seconds | 21:07 |
mriedem | Jul 31 17:18:26.178239 ubuntu-xenial-inap-mtl01-0001077052 nova-compute[2320]: INFO nova.compute.manager [None req-994d9893-545f-47b9-b93e-1e21cb439db7 tempest-LiveMigrationTest-233418614 tempest-LiveMigrationTest-233418614] [instance: 7f68c430-f565-433a-8f87-27b9a00d29a0] Took 6.61 seconds for pre_live_migration on destination host ubuntu-xenial-inap-mtl01-0001077053. | 21:07 |
mriedem | Jul 31 17:18:26.237174 ubuntu-xenial-inap-mtl01-0001077052 nova-compute[2320]: DEBUG nova.virt.libvirt.driver [None req-994d9893-545f-47b9-b93e-1e21cb439db7 tempest-LiveMigrationTest-233418614 tempest-LiveMigrationTest-233418614] [instance: 7f68c430-f565-433a-8f87-27b9a00d29a0] Starting monitoring of live migration {{(pid=2320) _live_migration /opt/stack/new/nova/nova/virt/libvirt/driver.py:7555}} | 21:09 |
mriedem | start monitoring the live migration there ^ | 21:09 |
mriedem | live migration complete: | 21:09 |
mriedem | Jul 31 17:18:27.486275 ubuntu-xenial-inap-mtl01-0001077052 nova-compute[2320]: INFO nova.compute.manager [None req-14ccce2e-8610-47b7-aba2-77f6fc468b61 tempest-LiveMigrationRemoteConsolesV26Test-255629851 tempest-LiveMigrationRemoteConsolesV26Test-255629851] [instance: 7f68c430-f565-433a-8f87-27b9a00d29a0] VM Migration completed (Lifecycle Event) | 21:09 |
mriedem | heh 1 second? | 21:10 |
dansmith | makes sense.. I doubt a cirros guest has more than a hundred meg of dirty ram | 21:10 |
dansmith | which is 1 second at gigE | 21:11 |
mriedem | i never see "VIF events received, continuing migration" | 21:11 |
dansmith | it's not LB | 21:11 |
dansmith | right? | 21:11 |
mriedem | oh right duh | 21:11 |
prometheanfire | you'll probably want to retitle the bug https://bugs.launchpad.net/nova/+bug/1786346 | 21:11 |
openstack | Launchpad bug 1786346 in OpenStack Compute (nova) "live migrations slow" [Undecided,New] | 21:11 |
prometheanfire | mriedem: dansmith ^ | 21:11 |
dansmith | prometheanfire: thanks | 21:11 |
prometheanfire | if you can let me know when you update the bug with details I'd appreciate it | 21:11 |
dansmith | I'm trying to get the revert to even pass tests and then I will | 21:12 |
prometheanfire | thanks | 21:14 |
*** dosaboy has joined #openstack-nova | 21:14 | |
*** dave-mccowan has quit IRC | 21:17 | |
mriedem | dansmith: ok so https://review.openstack.org/553608 should do the wait in compute now | 21:18 |
*** rmart04 has joined #openstack-nova | 21:18 | |
dansmith | mriedem: cool, updating the bug now and working on the revert in parallel, so we can make that depend on the revert to be sure we don't get the timeout message at least right? | 21:19 |
*** mhen has quit IRC | 21:19 | |
mriedem | well, that's why i was looking at timings, | 21:19 |
mriedem | because this means we'll consume the network-vif-plugged event from pre_live_migration before we call driver.live_migration which does the bw stuff, | 21:20 |
*** gouthamr is now known as gouthamr|brb | 21:20 | |
mriedem | the event won't come for that waiter, | 21:20 |
mriedem | but the guest transfer is so fast, won't we just finish the operation before we ever had a chance to timeout? | 21:20 |
mriedem | like, do i need a patch that puts a fake sleep in the driver's live migration metohd? | 21:20 |
mriedem | *method | 21:20 |
*** gouthamr|brb is now known as gouthamr | 21:20 | |
*** rmart04 has quit IRC | 21:20 | |
*** gouthamr is now known as gouthamr|brb | 21:21 | |
dansmith | even at 1MB/s? | 21:21 |
mriedem | could set the vif_plugging_timeout to like 1 minute, and add a 30 second sleep in the driver | 21:21 |
mriedem | well, | 21:21 |
dansmith | should go slower there, but I guess it won't take long enough | 21:21 |
mriedem | maybe not, but the test will timeout before the 5 minute vif_plugging_timeout i think | 21:21 |
dansmith | yeah okay so we'll have to force it down I guess | 21:21 |
mriedem | just wondering if i should set the vif_plugging_timeout to like 1 minute | 21:22 |
dansmith | 30sec but yeah | 21:23 |
*** mhen has joined #openstack-nova | 21:23 | |
mriedem | ok updated; hopefully my zuul fu is strong enough | 21:23 |
jaypipes | melwitt, dansmith: regarding https://review.openstack.org/#/c/540258, even if we fix the scheduler/top-level issues around server group affinity and multiple cells, that's still not going to fix the eleventh-hour on-the-compute-node checks that currently run just for affinity groups, though, right? I mean, the computes can't talk cross-cell anyway so there would be no way for those on-compute-node checks to run... | 21:23 |
mriedem | jaypipes: yes https://review.openstack.org/#/c/540258/8/nova/scheduler/utils.py@738 | 21:24 |
mriedem | "Also note that we could be racing if we have multiple server create requests for the same affinity group and the scheduler decides to put them each in different cells - the late affinity check in the compute won't resolve that because the upcall check is targeted to the cell the compute is in, and won't see any other hosts for other members in other cells. Separate bug though..." | 21:25 |
*** owalsh_ is now known as owalsh | 21:25 | |
jaypipes | mriedem: ack, ok, just wanted to verify I wasn't crazypants. | 21:26 |
*** slaweq has quit IRC | 21:26 | |
mriedem | good to have more than just me thinking that | 21:26 |
mriedem | *that it's an issue.. | 21:27 |
mriedem | not that you're (not?) crazy | 21:27 |
dansmith | prometheanfire: commented on the bug | 21:27 |
melwitt | yeah, it makes sense. I was re-thinking about the upcalls described in https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#operations-requiring-upcalls and it's true that they can only work if you're single cell | 21:28 |
melwitt | once you're multi-cell/split-MQ I think none of them can work | 21:28 |
smcginnis | melwitt: Howdy. How are things coming along for the RC? | 21:29 |
mriedem | ha | 21:30 |
*** rmart04 has joined #openstack-nova | 21:30 | |
dansmith | smcginnis: awesome | 21:30 |
dansmith | they're going awesome | 21:30 |
dansmith | thanks for asking | 21:30 |
melwitt | haha .... | 21:30 |
smcginnis | And I know no one here would ever be sarcastic so... that's great! | 21:31 |
smcginnis | :P | 21:31 |
melwitt | smcginnis: we stumbled upon something that we'll need to fix for rc2. but as for rc1, I'm waiting on the RPC version alias patch to land, then will propose the release for rc1 | 21:31 |
smcginnis | melwitt: Cool, sounds good. I know a few others already know they will need to get an RC2, so that's no big deal. Thanks. | 21:32 |
mriedem | dansmith: so one comment on the bug, | 21:33 |
mriedem | by default the compute manager won't wait for the event | 21:33 |
mriedem | the config is false for backward compat | 21:33 |
melwitt | mriedem: I guess, since we're having a rc2, should I just leave the rpc version alias for then? or get it for rc1? | 21:33 |
melwitt | smcginnis: we're not alone... :) | 21:33 |
dansmith | mriedem: oh? I didn't see a config valve | 21:34 |
mriedem | yes, live_migration_wait_for_vif_plug | 21:34 |
mriedem | b/c not all network backends send the event for just vif plugging (ODL) | 21:34 |
*** rmart04 has quit IRC | 21:34 | |
smcginnis | melwitt: We can wait a bit. Unless you think it will be several hours yet. | 21:34 |
mriedem | melwitt: probably fine either way | 21:35 |
dansmith | mriedem: ah, so we handled that in sahid's patch by looking for the vif_Type | 21:35 |
melwitt | mriedem: thx. probably will just go ahead now and let that be in rc2 | 21:35 |
dansmith | making people know to opt into proper behavior kinda sucks | 21:35 |
melwitt | the version alias was rechecked at 13:30 so it's got awhile before it will have a chance to merge | 21:36 |
mriedem | true, | 21:36 |
mriedem | but ODL shows up as ovs vif_type | 21:36 |
dansmith | oh | 21:36 |
dansmith | well, that sucks | 21:36 |
mriedem | which means we have no idea how to not wait for ODL | 21:36 |
mriedem | remember the hard reboot wait fiasco? | 21:36 |
dansmith | well, I'd say let them opt out | 21:36 |
dansmith | personally | 21:36 |
dansmith | but whatever | 21:36 |
mriedem | that's why i plan on making the option True by default in stein | 21:36 |
melwitt | smcginnis: the version alias patch was rechecked an hour ago, so it'll be awhile to merge, if the gate doesn't fail on us again. but since we're having a rc2, that patch will be fine to go into rc2, so I can just cut rc1 now | 21:37 |
dansmith | oh I see, I just read your comment | 21:37 |
dansmith | okay | 21:37 |
mriedem | i might have also defaulted to False when i originally thought we'd backport this | 21:37 |
smcginnis | melwitt: Up to you. I'm fine waiting awhile. | 21:37 |
mriedem | smcginnis doesn't have to go to china tomorrow | 21:37 |
melwitt | heh | 21:37 |
smcginnis | :) | 21:38 |
openstackgerrit | Dan Smith proposed openstack/nova master: Revert "libvirt: slow live-migration to ensure network is ready" https://review.openstack.org/590538 | 21:40 |
dansmith | that was a super nasty revert, fyi | 21:40 |
dansmith | so look at it with critical eyes | 21:40 |
melwitt | mriedem, get out soul crusher #3 | 21:40 |
*** gouthamr|brb is now known as gouthamr | 21:40 | |
melwitt | smcginnis: am I to include cycle-highlights in the patch? I see that was done for queens | 21:45 |
melwitt | *in the release patch | 21:45 |
prometheanfire | dansmith: thanks | 21:46 |
smcginnis | melwitt: Ideally, yes. Marketing type folks would love to have that. | 21:48 |
melwitt | ok, I will include them | 21:48 |
smcginnis | melwitt: It can be a follow up patch too though. | 21:48 |
melwitt | thanks | 21:48 |
*** lbragstad has joined #openstack-nova | 21:55 | |
mriedem | dansmith: ok done | 21:56 |
mriedem | the params stuff looks like it made that terrible | 21:56 |
dansmith | yes, yes it did | 21:57 |
dansmith | mriedem: you want a reno that says what exactly? | 21:57 |
*** mchlumsky has quit IRC | 21:58 | |
mriedem | well we can revert this because the original bug is fixed with the new config option right? | 21:58 |
dansmith | that bug $orig was solved automatically but because of bug $new you must now enable $conf? | 21:58 |
mriedem | right | 21:58 |
dansmith | the original bug was arguably less bad than the current state | 21:58 |
dansmith | okay | 21:58 |
mriedem | the chances of anyone even having picked up that fix on stable already and be relying on it are pretty slim, at least for upstream, but you guys sound like you had at least one major customer that needed this | 21:59 |
dansmith | they did, but it never worked for them.. I think I now know why :) | 21:59 |
dansmith | mriedem: and are you asking me to actually clean up those tests here or just commenting about later reverts? | 22:01 |
melwitt | mriedem: rc1 release proposed https://review.openstack.org/590574 | 22:04 |
melwitt | I tried to pick the top highlights, let me know if I should add/remove based on your opinion | 22:05 |
mriedem | dansmith: just commenting, and that we can clean up that other unused stuff in the later separate revert | 22:05 |
openstackgerrit | Dan Smith proposed openstack/nova master: Revert "libvirt: slow live-migration to ensure network is ready" https://review.openstack.org/590538 | 22:07 |
*** luksky has quit IRC | 22:07 | |
*** neiljerram has quit IRC | 22:08 | |
dansmith | since we're doing this in rc2, and since melwitt will be up early tomorrow to ping him anyway, I assume we're going to wait for sahid's ack before putting this in? | 22:08 |
mriedem | melwitt: lgtm | 22:08 |
mriedem | yeah, also waiting on the recreate in my ci patch | 22:09 |
dansmith | yeah | 22:09 |
melwitt | yeah, let's talk to sahid tomorrow | 22:09 |
*** rcernin has joined #openstack-nova | 22:09 | |
*** slaweq has joined #openstack-nova | 22:11 | |
*** imacdonn has quit IRC | 22:12 | |
*** imacdonn has joined #openstack-nova | 22:12 | |
*** tobasco is now known as tobias-urdin | 22:14 | |
mriedem | and, | 22:15 |
mriedem | just wrapped up my plumbing thing | 22:15 |
mriedem | it's all coming together | 22:15 |
*** slaweq has quit IRC | 22:15 | |
melwitt | granite all stars | 22:15 |
mriedem | well you see the granite tops come with a free sink but i need to know the dimensions for the plumber otherwise we needed to order our own which costs extra obviously and there is a time crunch and just omwoeoweitew | 22:16 |
melwitt | haha | 22:16 |
* mriedem goes to mow the lawn - the CI job is running tempest now | 22:21 | |
mriedem | https://review.openstack.org/#/c/553608/ | 22:21 |
*** itlinux has quit IRC | 22:23 | |
*** _ix has quit IRC | 22:29 | |
openstackgerrit | Merged openstack/nova stable/ocata: [stable only] Handle quota usage during create/delete races https://review.openstack.org/582413 | 22:31 |
openstackgerrit | Merged openstack/nova master: Update ssh configuration doc https://review.openstack.org/589844 | 22:31 |
openstackgerrit | Merged openstack/nova stable/queens: [placement] Retry allocation writes server side https://review.openstack.org/588569 | 22:35 |
openstackgerrit | Merged openstack/nova stable/pike: Reload oslo_context after calling monkey_patch() https://review.openstack.org/589251 | 22:38 |
*** sambetts_ has quit IRC | 22:52 | |
*** claudiub has quit IRC | 22:55 | |
*** evrardjp has quit IRC | 22:55 | |
*** sambetts_ has joined #openstack-nova | 22:58 | |
*** sambetts_ has quit IRC | 23:02 | |
openstackgerrit | melanie witt proposed openstack/nova master: Add functional test for affinity with multiple cells https://review.openstack.org/585073 | 23:02 |
openstackgerrit | melanie witt proposed openstack/nova master: Make scheduler.utils.setup_instance_group query all cells https://review.openstack.org/540258 | 23:02 |
*** sambetts_ has joined #openstack-nova | 23:06 | |
*** gyee has quit IRC | 23:06 | |
melwitt | guess I'll be holding off on stable releases because of the slow live migration issue | 23:06 |
*** slaweq has joined #openstack-nova | 23:11 | |
*** gbarros has quit IRC | 23:11 | |
*** slaweq has quit IRC | 23:16 | |
*** efried has quit IRC | 23:31 | |
*** efried has joined #openstack-nova | 23:31 | |
*** gbarros has joined #openstack-nova | 23:50 | |
*** slagle has joined #openstack-nova | 23:55 | |