*** tetsuro_ has quit IRC | 00:01 | |
*** tbachman has joined #openstack-nova | 00:02 | |
*** gyee has quit IRC | 00:06 | |
*** Nel1x has joined #openstack-nova | 00:09 | |
openstackgerrit | zhufl proposed openstack/nova master: Fix none-ascii char in doc https://review.openstack.org/588422 | 00:13 |
*** frankwang has joined #openstack-nova | 00:17 | |
*** gbarros has quit IRC | 00:19 | |
*** zhurong has joined #openstack-nova | 00:22 | |
*** harlowja has quit IRC | 00:26 | |
*** mvkr has quit IRC | 00:26 | |
*** mvkr has joined #openstack-nova | 00:27 | |
*** mvkr has quit IRC | 00:39 | |
*** mvkr has joined #openstack-nova | 00:40 | |
*** gbarros has joined #openstack-nova | 00:40 | |
*** frankwang has quit IRC | 00:42 | |
*** tetsuro_ has joined #openstack-nova | 00:44 | |
openstackgerrit | Merged openstack/nova master: Refactor AllocationFixture in placement test https://review.openstack.org/588159 | 00:46 |
openstackgerrit | Merged openstack/nova master: Adds a test for getting allocations API https://review.openstack.org/588886 | 00:50 |
*** mvkr has quit IRC | 00:50 | |
*** mvkr has joined #openstack-nova | 00:51 | |
*** gbarros has quit IRC | 00:59 | |
*** tetsuro_ has quit IRC | 01:01 | |
*** mvkr has quit IRC | 01:01 | |
*** mvkr has joined #openstack-nova | 01:02 | |
*** hongbin has joined #openstack-nova | 01:03 | |
*** erlon has joined #openstack-nova | 01:06 | |
*** tetsuro_ has joined #openstack-nova | 01:09 | |
openstackgerrit | Merged openstack/nova master: Not use project table for user table https://review.openstack.org/588887 | 01:11 |
*** erlon has quit IRC | 01:13 | |
mriedem | gibi: fyi i can't attend the notification meeting this week | 01:15 |
*** mriedem has quit IRC | 01:15 | |
*** erlon has joined #openstack-nova | 01:26 | |
*** zhurong has quit IRC | 01:26 | |
*** tetsuro_ has quit IRC | 01:27 | |
*** tetsuro_ has joined #openstack-nova | 01:28 | |
*** tetsuro_ has quit IRC | 01:34 | |
openstackgerrit | zhufl proposed openstack/nova master: xx_instance_type_id in list_migrations should be integer https://review.openstack.org/588481 | 01:40 |
*** tetsuro_ has joined #openstack-nova | 01:41 | |
*** tetsuro_ has quit IRC | 01:44 | |
*** gbarros has joined #openstack-nova | 01:49 | |
*** tetsuro_ has joined #openstack-nova | 01:53 | |
*** Dinesh_Bhor has joined #openstack-nova | 02:01 | |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: Use common functions in granular fixture https://review.openstack.org/588113 | 02:04 |
*** naichuans has joined #openstack-nova | 02:12 | |
*** lbragstad has quit IRC | 02:18 | |
*** tetsuro_ has quit IRC | 02:19 | |
*** tetsuro_ has joined #openstack-nova | 02:23 | |
*** tetsuro_ has quit IRC | 02:26 | |
*** donghm has joined #openstack-nova | 02:26 | |
*** BrinZhang has joined #openstack-nova | 02:38 | |
*** psachin has joined #openstack-nova | 02:40 | |
*** erlon has quit IRC | 02:45 | |
*** tetsuro_ has joined #openstack-nova | 02:50 | |
openstackgerrit | melanie witt proposed openstack/nova-specs master: Add a script for counting blueprints https://review.openstack.org/581914 | 02:52 |
*** tetsuro__ has joined #openstack-nova | 02:53 | |
*** tetsuro_ has quit IRC | 02:54 | |
openstackgerrit | Merged openstack/nova master: Define irrelevant-files for tempest-full-py3 job https://review.openstack.org/589039 | 02:58 |
*** tetsuro__ has quit IRC | 02:59 | |
*** bbbbzhao_ has quit IRC | 03:08 | |
*** jhesketh_ is now known as jhesketh | 03:14 | |
*** vivsoni_ has quit IRC | 03:19 | |
*** Nel1x has quit IRC | 03:22 | |
*** gbarros has quit IRC | 03:26 | |
*** dave-mccowan has quit IRC | 03:31 | |
*** liuyulong has joined #openstack-nova | 03:34 | |
*** tetsuro_ has joined #openstack-nova | 03:39 | |
*** janki has joined #openstack-nova | 03:41 | |
*** tetsuro_ has quit IRC | 03:47 | |
*** tetsuro_ has joined #openstack-nova | 03:48 | |
*** udesale has joined #openstack-nova | 03:48 | |
*** tetsuro_ has quit IRC | 03:51 | |
*** jaypipes has quit IRC | 04:02 | |
*** jaypipes has joined #openstack-nova | 04:02 | |
*** Dinesh_Bhor has quit IRC | 04:04 | |
*** tetsuro_ has joined #openstack-nova | 04:12 | |
*** Bhujay has joined #openstack-nova | 04:13 | |
*** vivsoni has joined #openstack-nova | 04:14 | |
*** hongbin has quit IRC | 04:15 | |
*** Bhujay has quit IRC | 04:30 | |
openstackgerrit | Merged openstack/python-novaclient master: Refactor the getid method in novaclient/base.py https://review.openstack.org/588983 | 04:56 |
*** links has joined #openstack-nova | 04:57 | |
*** Dinesh_Bhor has joined #openstack-nova | 05:01 | |
openstackgerrit | Merged openstack/nova master: Use common functions in NonSharedStorageFixture https://review.openstack.org/588114 | 05:01 |
*** stakeda has joined #openstack-nova | 05:04 | |
*** ratailor has joined #openstack-nova | 05:11 | |
*** tetsuro_ has quit IRC | 05:15 | |
*** psachin has quit IRC | 05:18 | |
*** tetsuro_ has joined #openstack-nova | 05:22 | |
*** psachin has joined #openstack-nova | 05:25 | |
*** jaosorior has quit IRC | 05:26 | |
*** tetsuro_ has quit IRC | 05:27 | |
*** nicolasbock has joined #openstack-nova | 05:46 | |
*** Luzi has joined #openstack-nova | 05:47 | |
*** moshele has joined #openstack-nova | 05:56 | |
*** tetsuro_ has joined #openstack-nova | 05:59 | |
*** tetsuro_ has quit IRC | 06:02 | |
*** tetsuro_ has joined #openstack-nova | 06:02 | |
*** jaosorior has joined #openstack-nova | 06:05 | |
openstackgerrit | Chen proposed openstack/nova master: Trivial fix on migration doc https://review.openstack.org/589028 | 06:24 |
openstackgerrit | Takashi NATSUME proposed openstack/python-novaclient master: Fix server strings in reboot operation https://review.openstack.org/588981 | 06:30 |
openstackgerrit | Takashi NATSUME proposed openstack/python-novaclient master: Fix server strings in reboot operation https://review.openstack.org/588981 | 06:31 |
*** hoonetorg has quit IRC | 06:33 | |
*** pcaruana has joined #openstack-nova | 06:36 | |
*** Dinesh_Bhor has quit IRC | 06:36 | |
*** tetsuro_ has quit IRC | 06:36 | |
*** Dinesh_Bhor has joined #openstack-nova | 06:37 | |
*** Bhujay has joined #openstack-nova | 06:42 | |
*** whoami-rajat has joined #openstack-nova | 06:45 | |
*** hoonetorg has joined #openstack-nova | 06:49 | |
*** Dinesh_Bhor has quit IRC | 06:52 | |
*** luksky has joined #openstack-nova | 06:54 | |
*** adrianc has joined #openstack-nova | 06:55 | |
*** amarao has joined #openstack-nova | 07:01 | |
*** dpawlik has joined #openstack-nova | 07:23 | |
*** wznoinsk has quit IRC | 07:27 | |
*** wznoinsk has joined #openstack-nova | 07:27 | |
*** avolkov has joined #openstack-nova | 07:28 | |
*** adrianc_ has joined #openstack-nova | 07:30 | |
*** adrianc has quit IRC | 07:34 | |
*** jpena|off is now known as jpena | 07:35 | |
*** sahid has joined #openstack-nova | 07:36 | |
*** luksky has quit IRC | 07:41 | |
*** rmart04 has joined #openstack-nova | 07:44 | |
*** jaosorior has quit IRC | 07:49 | |
*** maciejjozefczyk has quit IRC | 07:57 | |
*** slunkad has quit IRC | 07:58 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: [placement] Add version directives in the history doc https://review.openstack.org/589392 | 07:59 |
*** maciejjozefczyk has joined #openstack-nova | 07:59 | |
*** maciejjozefczyk has joined #openstack-nova | 08:00 | |
*** jaosorior has joined #openstack-nova | 08:02 | |
*** jarod_ has joined #openstack-nova | 08:07 | |
*** rcernin has quit IRC | 08:11 | |
*** cdent has joined #openstack-nova | 08:14 | |
openstackgerrit | Chris Dent proposed openstack/nova master: [placement] Add /reshaper handler for POST https://review.openstack.org/576927 | 08:27 |
openstackgerrit | Chris Dent proposed openstack/nova master: reshaper: Look up provider if not in inventories https://review.openstack.org/585033 | 08:27 |
openstackgerrit | Chris Dent proposed openstack/nova master: Make get_allocations_for_resource_provider sane https://review.openstack.org/584598 | 08:27 |
openstackgerrit | Chris Dent proposed openstack/nova master: Report client: Real get_allocs_for_consumer https://review.openstack.org/584599 | 08:27 |
openstackgerrit | Chris Dent proposed openstack/nova master: Report client: get_allocations_for_provider_tree https://review.openstack.org/584648 | 08:27 |
openstackgerrit | Chris Dent proposed openstack/nova master: Report client: _reshape helper, placement min bump https://review.openstack.org/585034 | 08:27 |
openstackgerrit | Chris Dent proposed openstack/nova master: Report client: update_from_provider_tree w/reshape https://review.openstack.org/585049 | 08:27 |
openstackgerrit | Chris Dent proposed openstack/nova master: Compute: Handle reshaped provider trees https://review.openstack.org/576236 | 08:27 |
*** sayalilunkad has joined #openstack-nova | 08:31 | |
*** owalsh_ is now known as owalsh | 08:33 | |
*** panda|rover-ish is now known as panda|rover | 08:33 | |
*** mhen has joined #openstack-nova | 08:37 | |
mhen | hello everybody! can you tell me the reason for using binascii.hexlify() to transform secrets into a hex representation before passing them to cryptsetup? example: https://github.com/openstack/nova/blob/25fa2470e220a83ce632fed70ad41e55aabda0da/nova/privsep/libvirt.py#L53 | 08:38 |
openstackgerrit | zhufl proposed openstack/nova master: xx_instance_type_id in list_migrations should be integer https://review.openstack.org/588481 | 08:39 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: fixtures: Track volume attachments within CinderFixtureNewAttachFlow https://review.openstack.org/587013 | 08:42 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: Add regression test for bug#1784353 https://review.openstack.org/587014 | 08:42 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: conductor: Recreate volume attachments during a reschedule https://review.openstack.org/587071 | 08:42 |
lyarwood | mhen: so at present we are getting a binary key from barbican that we need to turn into an ascii string to provide to cryptsetup / libvirt | 08:43 |
lyarwood | mhen: the binascii calls are legacy code that we've not changed in order to not break existing users | 08:44 |
lyarwood | mhen: really we need barbican to provide actual passphrases instead of binary keys to avoid using this in the future | 08:44 |
lyarwood | mdbooth: ^ respun my changes above, thanks for catching the off by one mistake! | 08:45 |
lyarwood | hopefully the py35-functional tests should also be fixed now | 08:46 |
*** derekh has joined #openstack-nova | 08:46 | |
mhen | lyarwood, ok thanks I figured as much - is there any reason for using a hex encoding specifically? When talking about passphrases, you effectively double the string length since hex can only represent half as much in a single byte/character iirc. | 08:46 |
mdbooth | lyarwood: :) Do you know where the attachment delete code is, btw? | 08:46 |
mdbooth | lyarwood: Any reason it doesn't remove the attachment from the BDM. | 08:46 |
mdbooth | ? | 08:46 |
lyarwood | mdbooth: yeah I left a comment, https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2503-L2505, as bdm.attachment_id is used all over the place to determine if we are using v3 | 08:47 |
mdbooth | lyarwood: Eurgh | 08:48 |
lyarwood | but we'd still need this code to recreate the attachments even if we do remove it | 08:48 |
lyarwood | as we don't get back up to the API where the initial attachments are created | 08:48 |
mdbooth | lyarwood: Right, it's just that by getting out of whack we're opening ourselves to potential errors, and we also have to ping cinder to see if that's what we've done | 08:48 |
mdbooth | That's not a problem in your patch, but it's unfortunate | 08:49 |
lyarwood | mhen: I don't know of any reason, it's just a legacy choice that I've wanted to remove for ages but can't with existing users | 08:49 |
lyarwood | mdbooth: yeah, I guess without the attachment_id we would need to do the v2 / v3 checks again in the conductor etc | 08:49 |
mdbooth | lyarwood: Would you, though? Why wouldn't you just use v3 if conn_info and attachment_id are both unset? | 08:50 |
*** luksky has joined #openstack-nova | 08:51 | |
lyarwood | mdbooth: iirc the API has a series of compute version checks it goes through before creating the attachments | 08:51 |
*** adrianc__ has joined #openstack-nova | 08:51 | |
*** Kevin_Zheng has joined #openstack-nova | 08:51 | |
*** tetsuro_ has joined #openstack-nova | 08:53 | |
*** donghm has quit IRC | 08:54 | |
mdbooth | lyarwood: Is it possible to have a reschedule *without* the original 'reservation' attachments having been deleted? | 08:54 |
mdbooth | e.g. by an early failure which doesn't cause cleanup to go through _shutdown_instance? | 08:55 |
*** adrianc_ has quit IRC | 08:55 | |
*** priteau has joined #openstack-nova | 08:55 | |
lyarwood | mdbooth: yeah but that should be fine, the destination compute should just UPDATE the existing attachment at that point and get an updated connection_info dict in return from cinder | 08:56 |
lyarwood | mdbooth: nova/virt/block_device.py -> _volume_attach | 08:58 |
gibi | dansmith: double checked the legacy allocation handling and we are lucky as the legacy codepath does the right thing for revert. Anyhow, I filed a bug https://bugs.launchpad.net/nova/+bug/1785776 | 08:59 |
openstack | Launchpad bug 1785776 in OpenStack Compute (nova) "resize revert still hitting the legacy allocation handling " [Undecided,New] | 08:59 |
mhen | lyarwood, I assume that the hex representation was chosen to have a predictable and limited set of string characters in order to avoid any problems related to special characters? | 09:01 |
lyarwood | mhen: yeah that could very well be the case | 09:01 |
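A minimal Python sketch of the hex conversion discussed above, assuming a 32-byte random key as a stand-in for the Barbican key material (the key length is an assumption, and this is not nova's code). It shows both points mhen raises: the passphrase is twice as long as the key, and it only ever contains the characters 0-9 and a-f.

```python
import binascii
import os

# Stand-in for the binary key returned by the key manager (assumed 32 bytes).
key = os.urandom(32)

# Hex-encode the raw bytes into a printable ASCII passphrase.
passphrase = binascii.hexlify(key).decode("ascii")

print(len(key), len(passphrase))  # 32 64

# Only lowercase hex digits appear, so no characters need quoting or escaping.
assert set(passphrase) <= set("0123456789abcdef")

# The encoding is lossless: the original key bytes can be recovered.
assert binascii.unhexlify(passphrase) == key
```

Each key byte becomes two hex characters, which is where the doubling comes from; the trade-off is a passphrase alphabet that can never contain special characters.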
*** jaosorior has quit IRC | 09:02 | |
*** josecastroleon has joined #openstack-nova | 09:14 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: [placement] api-ref: add description for 1.29 https://review.openstack.org/589407 | 09:15 |
kosamara | efried: I studied your device-passthrough spec for nova-powervm. Do you plan to propose this or part of it in nova? | 09:18 |
*** finucannot is now known as stephenfin | 09:30 | |
*** maciejjozefczyk has quit IRC | 09:32 | |
*** sahid has quit IRC | 09:39 | |
openstackgerrit | Rajesh Tailor proposed openstack/nova master: Fix host validity check for live-migration https://review.openstack.org/401009 | 09:42 |
*** tetsuro_ has quit IRC | 09:45 | |
*** tetsuro_ has joined #openstack-nova | 09:45 | |
mdbooth | lyarwood: I still think you're missing a test, btw. https://review.openstack.org/#/c/587071/8/nova/tests/unit/conductor/test_conductor.py | 09:50 |
lyarwood | mdbooth: kk, where it exists? | 09:50 |
lyarwood | mdbooth: either way the functional tests still aren't happy so I'll respin later today | 09:51 |
mdbooth | lyarwood: yeah. I just checked the code and there's a reasonably wide window for errors which don't result in attachment deletion. | 09:51 |
*** rmart04 has quit IRC | 09:53 | |
*** tetsuro_ has quit IRC | 09:59 | |
*** tetsuro has joined #openstack-nova | 09:59 | |
*** adrianc_ has joined #openstack-nova | 10:01 | |
*** Dinesh_Bhor has joined #openstack-nova | 10:02 | |
*** adrianc__ has quit IRC | 10:05 | |
*** sahid has joined #openstack-nova | 10:05 | |
*** jamesdenton has quit IRC | 10:18 | |
*** tetsuro has quit IRC | 10:21 | |
*** sahid has quit IRC | 10:27 | |
*** links has quit IRC | 10:27 | |
*** tetsuro has joined #openstack-nova | 10:28 | |
*** tetsuro has quit IRC | 10:28 | |
*** Dinesh_Bhor has quit IRC | 10:30 | |
*** jamesdenton has joined #openstack-nova | 10:37 | |
*** sahid has joined #openstack-nova | 10:38 | |
*** BrinZhang has quit IRC | 10:45 | |
*** liuyulong has quit IRC | 10:45 | |
*** jaosorior has joined #openstack-nova | 10:46 | |
*** dave-mccowan has joined #openstack-nova | 10:51 | |
*** avolkov has quit IRC | 10:55 | |
*** Dinesh_Bhor has joined #openstack-nova | 10:56 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Fix resize revert to use non-legacy alloc handling https://review.openstack.org/589425 | 11:03 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Fix resize revert to use non-legacy alloc handling https://review.openstack.org/589425 | 11:04 |
openstackgerrit | Merged openstack/nova master: Grease some more tests hitting RetryDecorator https://review.openstack.org/588391 | 11:09 |
openstackgerrit | Merged openstack/nova master: Grease test_try_deallocate_network_retry_direct https://review.openstack.org/588364 | 11:09 |
*** artom has joined #openstack-nova | 11:11 | |
*** Dinesh_Bhor has quit IRC | 11:14 | |
*** udesale has quit IRC | 11:15 | |
*** Dinesh_Bhor has joined #openstack-nova | 11:15 | |
*** Bhujay has quit IRC | 11:18 | |
*** panda|rover is now known as panda|rover|lunc | 11:18 | |
*** ltomasbo has left #openstack-nova | 11:22 | |
*** vivsoni has quit IRC | 11:24 | |
*** stakeda has quit IRC | 11:26 | |
*** vivsoni has joined #openstack-nova | 11:31 | |
*** Dinesh_Bhor has quit IRC | 11:36 | |
*** slagle has joined #openstack-nova | 11:39 | |
*** erlon has joined #openstack-nova | 11:56 | |
*** s10 has joined #openstack-nova | 11:57 | |
*** jaosorior has quit IRC | 11:59 | |
*** edmondsw has joined #openstack-nova | 12:04 | |
*** ratailor has quit IRC | 12:05 | |
*** s10 has quit IRC | 12:09 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: Update installation guide to be more clear about cellsv2 https://review.openstack.org/584244 | 12:09 |
*** s10 has joined #openstack-nova | 12:10 | |
*** s10 has quit IRC | 12:10 | |
*** jpena is now known as jpena|lunch | 12:15 | |
*** panda|rover|lunc is now known as panda|rover | 12:16 | |
*** jaosorior has joined #openstack-nova | 12:19 | |
*** Bhujay has joined #openstack-nova | 12:24 | |
*** zhangbailin_ has joined #openstack-nova | 12:26 | |
*** zhangbailin_ has quit IRC | 12:27 | |
*** jaosorior has quit IRC | 12:27 | |
*** BrinZhang has joined #openstack-nova | 12:27 | |
*** BrinZhang has quit IRC | 12:27 | |
*** BrinZhang has joined #openstack-nova | 12:28 | |
*** BrinZhang has quit IRC | 12:29 | |
*** jaosorior has joined #openstack-nova | 12:42 | |
*** sapcc-bot3 has joined #openstack-nova | 12:46 | |
*** d063130_ has quit IRC | 12:50 | |
*** sapcc-bot has quit IRC | 12:50 | |
*** thomasem has quit IRC | 12:50 | |
*** weshay has quit IRC | 12:50 | |
*** gbarros has joined #openstack-nova | 12:56 | |
*** ccamacho has quit IRC | 12:58 | |
*** dklyle has quit IRC | 12:59 | |
*** bigdogstl has joined #openstack-nova | 13:00 | |
mdbooth | lyarwood: FYI, I accidentally started looking at your functional test failure, btw | 13:02 |
mdbooth | lyarwood: Haven't fixed it yet, but I can continue or not as you like. | 13:02 |
*** janki has quit IRC | 13:03 | |
*** janki has joined #openstack-nova | 13:04 | |
lyarwood | mdbooth: I've not got back around to it yet but I assume there are multiple attachments in the self.attachments[instance_uuid] list when I've used [0] to delete things from self.volume_to_attachment right? | 13:05 |
lyarwood | mdbooth: if you already have something feel free to continue and push it up when you're done | 13:05 |
mdbooth | lyarwood: Ok, will do. | 13:05 |
*** tssurya has joined #openstack-nova | 13:07 | |
mdbooth | lyarwood: What's the thinking here, btw: | 13:07 |
mdbooth | def fake_get(self_api, context, volume_id, microversion=None): | 13:07 |
mdbooth | + attachment_id = self.volume_to_attachment.get(volume_id, volume_id) | 13:07 |
mdbooth | Why would you want fake attachment_id to default to volume_id if not present? | 13:07 |
mdbooth | Ah... that's what it did before | 13:09 |
lyarwood | yarp | 13:09 |
* mdbooth wonders if that's the bug... | 13:09 | |
lyarwood | hmm this was working with a previous PS with that in place | 13:09 |
*** jamesdenton has quit IRC | 13:10 | |
mdbooth | lyarwood: I'm speculating. Issue is that we end up trying to fetch an attachment id not present. If the attachment id is a volume id, that might be it. | 13:10 |
mdbooth | I'm going to make it unique and stash it before returning. | 13:11 |
*** lbragstad has joined #openstack-nova | 13:11 | |
*** jamesdenton has joined #openstack-nova | 13:11 | |
lyarwood | mdbooth: https://review.openstack.org/#/c/587013/6..7/nova/tests/fixtures.py@1708 - I bet it's that line | 13:12 |
mdbooth | Might be in more than 1 place. This is attachment_update | 13:13 |
mdbooth | But yeah, that looks like another candidate | 13:13 |
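The fallback in the fixture snippet above can be illustrated with a small, hypothetical fake (FakeAttachmentTracker and its method names are invented for this sketch; it is not the real CinderFixtureNewAttachFlow). It contrasts the `.get(volume_id, volume_id)` fallback with mdbooth's idea of generating a unique attachment ID and stashing it before returning.

```python
import uuid


class FakeAttachmentTracker:
    """Hypothetical, cut-down fake; not the real CinderFixtureNewAttachFlow."""

    def __init__(self):
        self.volume_to_attachment = {}  # volume_id -> attachment_id
        self.attachments = {}           # attachment_id -> attachment record

    def attachment_create(self, volume_id):
        # Generate a unique ID and stash it before returning it.
        attachment_id = str(uuid.uuid4())
        self.volume_to_attachment[volume_id] = attachment_id
        self.attachments[attachment_id] = {"id": attachment_id,
                                           "volume_id": volume_id}
        return attachment_id

    def lookup_with_fallback(self, volume_id):
        # Mirrors .get(volume_id, volume_id): an untracked volume silently
        # resolves to its own volume ID, which is not an attachment ID.
        return self.volume_to_attachment.get(volume_id, volume_id)

    def lookup_strict(self, volume_id):
        # Raises KeyError, so a missing mapping fails at the point of lookup.
        return self.volume_to_attachment[volume_id]


fake = FakeAttachmentTracker()
att_id = fake.attachment_create("vol-1")
print(fake.lookup_strict("vol-1") == att_id)  # True
print(fake.lookup_with_fallback("vol-2"))  # vol-2
print("vol-2" in fake.attachments)  # False: a later fetch by that ID would fail
```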
*** s10 has joined #openstack-nova | 13:15 | |
*** mriedem has joined #openstack-nova | 13:16 | |
*** mriedem has joined #openstack-nova | 13:16 | |
efried | kosamara: Yes, in Stein or the T release. Is this something that interests you? | 13:18 |
*** eharney has joined #openstack-nova | 13:22 | |
*** tbachman has quit IRC | 13:23 | |
kosamara | efried: yes. I'm working at CERN and I think you had a relevant discussion with Belmiro late June. | 13:26 |
efried | kosamara: I would be happy to talk through it further. I assume you're interested in doing this with libvirt, or do you have Power systems in your deployment? | 13:27 |
kosamara | efried: We would like at least a simplified version of what you're developing. With the prospect of working on it, I was looking for any relevant specs, but only came across yours recently. | 13:27 |
*** namnh has joined #openstack-nova | 13:27 | |
kosamara | efried: libvirt | 13:27 |
efried | kosamara: Are you aware of the cyborg project? | 13:28 |
*** ccamacho has joined #openstack-nova | 13:28 | |
kosamara | efried: yes, but it appears to be doing much more than what we need, so a solution within nova to use GPUs seems better at this point. | 13:29 |
efried | kosamara: How much have you done with the existing PCI passthrough framework? | 13:30 |
*** jpena|lunch is now known as jpena | 13:30 | |
kosamara | efried: we are already using it in testing and slowly moving to production. | 13:30 |
kosamara | efried: and only for GPUs | 13:32 |
s10 | Hello. I've found, that function update_available_resource(), https://github.com/openstack/nova/blob/stable/pike/nova/compute/resource_tracker.py#L694 , driver.get_available_resource(nodename) with Libvirt driver takes 30 seconds to execute on the host with 150 instances on local storage. | 13:32 |
s10 | Is this behaviour normal or is it a regression introduced by https://github.com/openstack/nova/commit/d88b75e81eabfbd463007f6a4f27e6966a466530 and following commits? Before this commit it was 1-2 seconds for all of them. | 13:32 |
efried | kosamara: Okay. That's going to be your best bet for the near future. And it should do pretty much everything you need if all you're trying to do is pass through whole GPUs. | 13:32 |
efried | s10: You've specifically nailed it down to that commit? If you revert it, your performance goes back to normal? | 13:33 |
kosamara | efried: Whole GPUs is our main use case ATM. "that"? Our main problems with the current pci passthrough in nova are quotas and scheduling without an extra filter. So basically, implementing RPs/RCs for GPUs. | 13:34 |
s10 | Yes, if I restore this function to how it was before this commit, performance of this function goes back to normal. But then we will lose all fixes introduced by this commit and the following ones, like https://github.com/openstack/nova/commit/938c0a745325fa73d098c6d5ddd20b2a599f9624 | 13:35 |
*** bigdogstl has quit IRC | 13:36 | |
s10 | efried: if I turn on logging for oslo_concurrency, I see that a lot of time is taken by 300 calls of "qemu-img info" for /var/lib/nova/instances/UUID/disk and disk.config | 13:37 |
s10 | efried: at least 0.052 seconds for every call. 300 calls - 15 seconds. | 13:38 |
*** moshele has quit IRC | 13:39 | |
*** adrianc__ has joined #openstack-nova | 13:40 | |
efried | kosamara: Not sure if we've really started making plans to implement quotas around placement artifacts yet. alex_xu, were you working on that? | 13:40 |
efried | s10: Let me take a look at what's piled on top of that commit. Trying to figure out what the effect would be of *just* reverting that one fix. | 13:41 |
s10 | efried: so for 150 instances qemu_img info is being called 600 times. 2 times for dk_size = disk_api.get_allocated_disk_size(path) and 2 times for virt_size = disk_api.get_disk_size(path). | 13:41 |
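The numbers s10 reports are consistent with each other. A quick back-of-the-envelope check, using only figures from this discussion (150 instances, two disk files each, one `qemu-img info` call per disk for the allocated size and one for the virtual size, about 0.052 s per call):

```python
instances = 150
disks_per_instance = 2        # disk and disk.config
qemu_img_calls_per_disk = 2   # get_allocated_disk_size + get_disk_size
seconds_per_call = 0.052      # lower bound measured by s10

calls = instances * disks_per_instance * qemu_img_calls_per_disk
total_seconds = calls * seconds_per_call
print(calls, round(total_seconds, 1))  # 600 31.2
```

600 calls at roughly 0.052 s each comes to about 31 s, in line with the ~30 s s10 measured for a single get_available_resource() call; half of them (300 calls) account for the ~15 s mentioned earlier.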
dansmith | s10: maybe talk to lyarwood about it | 13:42 |
efried | s10: Note that we have some work ongoing to cut that in half at least: https://review.openstack.org/#/c/520024/ | 13:42 |
efried | s10: But that still wouldn't make it okay as performance regressions go. | 13:43 |
sean-k-mooney | s10: we could probably cache it, but is it an appreciable portion of the total boot time? | 13:43 |
*** adrianc_ has quit IRC | 13:43 | |
sean-k-mooney | there are other cases where we could cache things in the boot process but don't, because it was felt the caching introduced complexity that did not result in a significant performance increase when taken as a proportion of the total boot time | 13:45 |
s10 | efried: I added a timer before driver.get_available_resource(nodename), stopped it after the call and logged it, so a single call of this function takes 30 seconds. | 13:46 |
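One way to act on the caching suggestion, sketched under assumptions: a hypothetical per-pass cache (DiskInfoCache is not part of nova) that runs `qemu-img info --output=json` once per disk path and serves both the virtual and the allocated size from that single result. The JSON keys `virtual-size` and `actual-size` are the ones qemu-img prints; the runner is injectable so the sketch can be exercised without qemu-img installed.

```python
import json
import subprocess
from typing import Callable, Dict


def run_qemu_img_info(path: str) -> dict:
    # Runs the real CLI; requires qemu-img to be installed.
    out = subprocess.check_output(["qemu-img", "info", "--output=json", path])
    return json.loads(out)


class DiskInfoCache:
    """Hypothetical per-pass cache of `qemu-img info` results by path."""

    def __init__(self, runner: Callable[[str], dict] = run_qemu_img_info):
        self._runner = runner
        self._cache: Dict[str, dict] = {}

    def info(self, path: str) -> dict:
        # Only the first lookup for a given path runs qemu-img.
        if path not in self._cache:
            self._cache[path] = self._runner(path)
        return self._cache[path]

    def virtual_size(self, path: str) -> int:
        return self.info(path)["virtual-size"]

    def allocated_size(self, path: str) -> int:
        return self.info(path)["actual-size"]


# Exercise it with a fake runner that records each invocation.
calls = []


def fake_runner(path):
    calls.append(path)
    return {"virtual-size": 10 * 1024 ** 3, "actual-size": 2 * 1024 ** 3}


cache = DiskInfoCache(runner=fake_runner)
for p in ("/var/lib/nova/instances/UUID/disk",
          "/var/lib/nova/instances/UUID/disk.config"):
    cache.virtual_size(p)
    cache.allocated_size(p)
print(len(calls))  # 2: one qemu-img run per disk instead of two
```

A cache like this would need to be discarded after each periodic pass so sizes never go stale; whether the saving justifies the added complexity is exactly the trade-off raised above.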
efried | s10: I think your best course of action right now is to open a bug and make sure lyarwood sees it. | 13:46 |
efried | s10: Do you have the ability to apply a single patch on the fly in your env and reproduce the flow? | 13:47 |
*** tbachman has joined #openstack-nova | 13:47 | |
s10 | efried: yes, I have such ability | 13:47 |
efried | s10: I can try throwing out a quick revert just to confirm that it does the trick. Stand by. | 13:48 |
*** psachin has quit IRC | 13:51 | |
openstackgerrit | Eric Fried proposed openstack/nova stable/pike: DNM: Revert d88b75e https://review.openstack.org/589479 | 13:51 |
efried | s10: ^ | 13:51 |
*** jamesdenton has quit IRC | 13:52 | |
* lyarwood reads up | 13:52 | |
lyarwood | s10: and this is with an images_type of raw? | 13:52 |
efried | s10: Note: a real revert would be tied to a bug and introduced in master and backported. This is just to confirm. | 13:52 |
kosamara | efried: alex_xu had proposed a spec for that. The first step is to have pci passthrough GPUs in placement of course. | 13:53 |
mriedem | yay https://review.openstack.org/#/q/If642e51a4e186833349a8e30b04224a3687f5594 | 13:53 |
s10 | efried: Before: Took 20.89 seconds to get available resources for nodename. update_available_resource /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:705 | 13:53 |
*** jamesdenton has joined #openstack-nova | 13:53 | |
s10 | efried: After: Took 11.04 seconds to get available resources for nodename | 13:53 |
sean-k-mooney | kosamara: the first step to having pcie gpus in placement is having pci devices in placement | 13:53 |
s10 | If I change virt_size = disk_api.get_disk_size(path) same way, this time reduces to 2 seconds. | 13:53 |
efried | Okay s10, I'll leave you in lyarwood's capable hands at this point. | 13:54 |
s10 | lyarwood: yes, image_type is raw | 13:54 |
mriedem | it's going to be slower b/c it's exec'ing qemu-img info | 13:55 |
lyarwood | s10: unfortunately I'm just between calls, could you write this up in a bug and I'll get back to you in ~60mins or so | 13:55 |
s10 | lyarwood: and preallocate_images=space. I will file a bug report. | 13:55 |
mriedem | dansmith: want to send https://review.openstack.org/#/q/topic:bug/1784705+(status:open+OR+status:merged)+branch:stable/queens to their maker? | 13:58 |
dansmith | mriedem: yeah | 13:58 |
*** maciejjozefczyk has joined #openstack-nova | 13:59 | |
maciejjozefczyk | mriedem: efried hey, about https://review.openstack.org/#/c/520024; we have it on production and it works properly; release is newton | 14:00 |
efried | maciejjozefczyk: Sweet, thanks for the info. mriedem cdent Ima +2 that sucker. Shall we backport it too? | 14:00 |
maciejjozefczyk | the only thing is that, if I remember correctly, on nova master there was an issue with https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L866 def _resource_change() | 14:01 |
maciejjozefczyk | efried: so could you please give me a moment to confirm whether it's fixed on master or not? | 14:01 |
cdent | efried: assuming maciejjozefczyk's concerns there are okay, I think a backport would be nice but not critical? | 14:01 |
mriedem | maciejjozefczyk: you said you think https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L866 should be removed yes? | 14:02 |
maciejjozefczyk | mriedem: as far as I know it returns 'False' all the time, so the update is not sent to the DB | 14:03 |
maciejjozefczyk | and it ends with old timestamp for compute row | 14:03 |
maciejjozefczyk | so it could be deleted | 14:04 |
maciejjozefczyk | It was running all the time because in memory some resources were changed (because of this double-update) so it was always returning 'True' | 14:04 |
mriedem | is it always False because we've already updated the compute node before we get here? | 14:04 |
*** awaugama has joined #openstack-nova | 14:05 | |
maciejjozefczyk | nope, if you apply my patch, def _update() is called only once, but inside def _update() there is a check whether something changed about ram, disk etc. in memory | 14:06 |
*** _ix has quit IRC | 14:06 | |
maciejjozefczyk | so without any change - it ends with False and the DB update is not called | 14:06 |
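The behaviour described here can be modeled in a few lines. This is a simplified, hypothetical sketch (ComputeNodeView, Tracker and _usage_changed are invented; only the idea of skipping the DB write when nothing changed comes from the discussion): identical periodic updates produce no writes, so updated_at goes stale, while two passes per period that disagree with each other trigger a write every time.

```python
import copy
from dataclasses import dataclass


@dataclass
class ComputeNodeView:
    # Hypothetical subset of the usage fields kept in memory.
    memory_mb_used: int
    local_gb_used: int
    vcpus_used: int


class Tracker:
    """Hypothetical tracker: writes to the DB only when usage changed."""

    def __init__(self, node):
        self._last_saved = copy.deepcopy(node)
        self.db_writes = 0

    def _usage_changed(self, node):
        # Dataclass equality compares every field.
        changed = node != self._last_saved
        if changed:
            self._last_saved = copy.deepcopy(node)
        return changed

    def update(self, node):
        if self._usage_changed(node):
            self.db_writes += 1  # stands in for saving the compute node row


stable = ComputeNodeView(2048, 20, 2)
tracker = Tracker(stable)
for _ in range(3):
    tracker.update(stable)
print(tracker.db_writes)  # 0: nothing written, so updated_at is not refreshed

# Two passes per period that disagree (e.g. one ignoring shut-down instances).
partial = ComputeNodeView(1024, 10, 1)
for _ in range(3):
    tracker.update(partial)
    tracker.update(stable)
print(tracker.db_writes)  # 6: every pass looks like a change
```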
s10 | https://bugs.launchpad.net/nova/+bug/1785827 | 14:06 |
openstack | Launchpad bug 1785827 in OpenStack Compute (nova) "Performance regression in libvirt get_available_resource()" [Undecided,New] | 14:06 |
mriedem | is there any reason to update the compute node record in the db if nothing changed? | 14:07 |
s10 | lyarwood: ^ i've written a bug report | 14:07 |
maciejjozefczyk | without my change it was called basically all the time, because the first update was without, for example, shutdown instances, and the second was with them (so the amount of ram, disk, etc. was always changing, and def _resource_change() was returning 'True') | 14:07 |
lyarwood | s10: ack thanks | 14:07 |
maciejjozefczyk | mriedem: except timestamp, no | 14:07 |
kosamara | sean-k-mooney: yes, which is what efried 's spec does. | 14:07 |
maciejjozefczyk | mriedem: but for me if 'nova compute-show' shows that compute is UP and timestamp is new - it says, yea, compute is working | 14:08 |
maciejjozefczyk | mriedem: maybe somebody uses it as a source for some scripts | 14:08 |
maciejjozefczyk | mriedem: but yeah, service-list should be used anyway for that purpose | 14:09 |
mriedem | i want to say i think there is something in the scheduler that cares about the compute node updated_at time and uses it for some refresh threshold | 14:09 |
maciejjozefczyk | mriedem: no idea | 14:09 |
mriedem | HostManager._check_for_nodes_rebalance | 14:09 |
mriedem | if (self.updated and compute.updated_at | 14:10 |
mriedem | and self.updated > compute.updated_at): | 14:10 |
mriedem | return | 14:10 |
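A hypothetical simplification of the updated_at guard pasted above (HostStateSketch is not nova's HostState, and the `now` parameter stands in for the scheduler recording its own refresh time). Suppose the in-memory copy was adjusted locally after a refresh: as long as the compute node row is never re-saved, its updated_at stays older than the host state's timestamp, so the DB values are not re-read. As mriedem notes, HostStates are no longer cached, so this may not matter in practice.

```python
from datetime import datetime, timedelta


class HostStateSketch:
    """Hypothetical, simplified stand-in for a scheduler host state."""

    def __init__(self):
        self.updated = None      # when this host state was last refreshed
        self.free_ram_mb = None

    def update_from_compute_node(self, compute, now):
        # Same shape as the guard above: skip when our copy is newer.
        if (self.updated and compute["updated_at"]
                and self.updated > compute["updated_at"]):
            return False
        self.free_ram_mb = compute["free_ram_mb"]
        self.updated = now       # record the time of this successful refresh
        return True


t0 = datetime(2018, 1, 1, 12, 0)  # arbitrary timestamp
state = HostStateSketch()
row = {"updated_at": t0, "free_ram_mb": 4096}
print(state.update_from_compute_node(row, now=t0 + timedelta(seconds=1)))  # True

# Suppose the in-memory copy is adjusted locally...
state.free_ram_mb -= 1024
# ...while the row, never re-saved, still carries the old updated_at.
print(state.update_from_compute_node(row, now=t0 + timedelta(seconds=60)))  # False
print(state.free_ram_mb)  # 3072: the DB value 4096 was not re-read
```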
maciejjozefczyk | yep, so thats the issue | 14:10 |
maciejjozefczyk | mriedem: the code is from placement? | 14:11 |
mriedem | no that's in the nova scheduler HostManager | 14:11 |
maciejjozefczyk | mriedem: https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L929 | 14:11 |
sean-k-mooney | kosamara: oh I did not know efried had a pci device in placement spec up for Stein, cool, I should review that | 14:12 |
maciejjozefczyk | mriedem: ah, so it bases on the db, right? | 14:12 |
kosamara | efried: I would like to contribute. I think I could start with making the libvirt driver report PCI RPs, similar to your spec. I can propose a small spec just for that for Stein. | 14:12 |
mriedem | right | 14:12 |
mriedem | https://github.com/openstack/nova/blob/536acbfe0572f10ea84f330f2f29b07ca9279114/nova/scheduler/host_manager.py#L194 | 14:12 |
kosamara | sean-k-mooney: not in nova, in the nova-powervm fork: https://review.openstack.org/#/c/579359/10/doc/source/specs/rocky/device-passthrough.rst | 14:12 |
maciejjozefczyk | ok, so I vote for dropping resource_change logic, but first let me see how we use it in our deployment | 14:13 |
efried | kosamara: That would be cool, please add me as a reviewer. | 14:13 |
sean-k-mooney | kosamara: if you are doing that, can you also make sure to pull the pci feature flags from the pci_devices table in the nova db and add them as traits to the RP | 14:13 |
maciejjozefczyk | mriedem: anyway my RUN team doesn't scream about this issue anymore | 14:13 |
mriedem | maciejjozefczyk: since I40c17ed88f50ecbdedc4daf368fff10e90e7be11 i'm not sure this check in the HostState object even matters | 14:13 |
mriedem | we don't cache HostStates in the scheduler anymore | 14:14 |
*** Bhujay has quit IRC | 14:18 | |
*** nicolasbock has quit IRC | 14:18 | |
efried | kosamara: One main aspect of the nova-powervm spec that I expect to be carried through to nova proper is a YAML configuration file allowing the operator to: | 14:21 |
efried | - Identify devices to be permitted for passthrough | 14:21 |
efried | - Specify a resource class | 14:21 |
efried | - Specify traits | 14:21 |
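A minimal sketch of validating such an operator file. It assumes a hypothetical schema with `identification`, `resource_class` and `traits` keys (not the schema from the nova-powervm spec), and the dict stands in for what a YAML loader would return. For brevity it only accepts `CUSTOM_` names; a real schema would also allow standard resource classes and os-traits names.

```python
import re

# Placement custom names: CUSTOM_ prefix, then upper-case letters, digits, underscores.
CUSTOM_NAME = re.compile(r"^CUSTOM_[A-Z0-9_]+$")

# Hypothetical parsed config; the vendor/product IDs are placeholders.
config = {
    "devices": [
        {
            "identification": {"vendor_id": "1234", "product_id": "abcd"},
            "resource_class": "CUSTOM_ACCELERATOR",
            "traits": ["CUSTOM_FAST_MATH"],
        },
    ],
}


def validate(cfg):
    """Return a list of error strings; an empty list means the config is valid."""
    errors = []
    for i, dev in enumerate(cfg.get("devices", [])):
        if not dev.get("identification"):
            errors.append(f"device {i}: missing identification")
        rc = dev.get("resource_class", "")
        if not CUSTOM_NAME.match(rc):
            errors.append(f"device {i}: bad resource class {rc!r}")
        for trait in dev.get("traits", []):
            if not CUSTOM_NAME.match(trait):
                errors.append(f"device {i}: bad trait {trait!r}")
    return errors
```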
kosamara | sean-k-mooney: I wasn't aware of this info in the pci_devices table. What exactly is it? | 14:22 |
maciejjozefczyk | mriedem: anyway I need to go, I'll check this resource_updated() logic with my patch once again, I'll leave comment tonight | 14:22 |
sean-k-mooney | efried: um, im not sure about that... given how we did the numa aware vswitch spec i would have assumed we would use dynamic config instead of a yaml file | 14:22 |
efried | maciejjozefczyk: Thanks! | 14:22 |
efried | sean-k-mooney: -1 to dynamic config. | 14:23 |
kosamara | efried: I may propose something more basic, with the existing passthrough_whitelist conf. | 14:23 |
efried | sean-k-mooney: I think the only reason we did that instead of yaml is to minimize the effort. | 14:23 |
sean-k-mooney | kosamara: for network devices we use ethtool ioctls via libvirt to get the nic feature flags, such as tcp checksum offload | 14:23 |
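A minimal sketch of turning such feature-flag names into placement trait names. The `CUSTOM_NIC_` prefix and the mapping are hypothetical; os-traits may already cover some offloads with standard `HW_NIC_OFFLOAD_*` traits, which a real implementation should prefer over custom ones.

```python
import re


def flags_to_traits(feature_flags):
    """Turn NIC feature-flag names into CUSTOM_ trait names (hypothetical mapping)."""
    traits = set()
    for flag in feature_flags:
        # Placement trait names may only contain A-Z, 0-9 and underscores.
        name = re.sub(r"[^A-Z0-9_]", "_", flag.upper())
        traits.add("CUSTOM_NIC_" + name)
    return traits
```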
efried | kosamara: Oh dear gods please no. | 14:23 |
sean-k-mooney | kosamara: adding more stuff to passthrough_whitelist is basically an automatic -3 | 14:24 |
mdbooth | lol | 14:24 |
*** tbachman has quit IRC | 14:24 | |
*** maciejjozefczyk has quit IRC | 14:25 | |
efried | sean-k-mooney: We talked about using yaml in Denver. It really makes the most sense for this kind of thing, because trying to manage nested hierarchical data via oslo_config is a major PITA. | 14:25 |
*** tbachman has joined #openstack-nova | 14:25 | |
sean-k-mooney | efried: well, there is an argument to be made that today we dont use yaml for any other configs, so we should not introduce it for this feature | 14:25 |
efried | kosamara: ...Also, automatically generated traits. In my spec we've "namespaced" the generated traits with _POWERVM_ but some of them will potentially overlap on any platform (e.g. vendor & product IDs) | 14:25 |
sean-k-mooney | efried: that said im not really against it either | 14:26 |
kosamara | efried: I had it like that in my mind following a previous discussion with sean-k-mooney and gibi. I'll rethink it. | 14:26 |
efried | sean-k-mooney: IIRC jaypipes was a proponent and even dansmith was in agreement. | 14:26 |
efried | kosamara: In case the prospect of yaml schema/parsing is intimidating, here's code: https://review.openstack.org/#/c/579289/ | 14:27 |
sean-k-mooney | efried: well, jaypipes didnt want to add more semantics to the whitelist, and having a dedicated config not in nova was cleaner as nova was not using dynamic config at the time. as i said im not against it, but its adding another dependency to nova, e.g. yaml parsing | 14:28 |
sean-k-mooney | efried: are you going to propose that spec to nova proper for stein? | 14:29 |
sean-k-mooney | efried: if so, is the scope just pci devices or generic device passthrough? we had talked about expanding it to usb/sata devices in denver too, but not sure if that should be a different spec | 14:29 |
efried | sean-k-mooney: I hadn't yet decided whether to propose a nova spec for Stein or wait until T, but it sounds like kosamara may be interested in doing it for Stein. | 14:31 |
efried | sean-k-mooney: The way I've written the nova-powervm spec, the schema would be easily extensible to support non-PCI. | 14:31 |
jaypipes | efried: vendor and product IDs should not be traits. | 14:31 |
efried | sigh | 14:31 |
jaypipes | just sayin. | 14:31 |
mdbooth | lyarwood: Hey, this is interesting | 14:32 |
sean-k-mooney | efried: i abandoned my nic feature based scheduling work in favour of using this in the future, but not sure ill be working on that now | 14:32 |
mdbooth | lyarwood: Still investigating your functional failures, came across _terminate_volume_connections in ComputeManager, which does exactly what I proposed | 14:32 |
sean-k-mooney | jaypipes: im assuming you would prefer a resource_class to track that | 14:33 |
jaypipes | sean-k-mooney: no. | 14:33 |
sean-k-mooney | jaypipes: no? | 14:33 |
mdbooth | lyarwood: Specifically create a blank attachment, delete the old attachment, update the bdm to point to the blank. | 14:33 |
*** dklyle has joined #openstack-nova | 14:33 | |
jaypipes | sean-k-mooney: I'm just saying traits are capabilities. they aren't key/value metadata items. | 14:33 |
efried | sean-k-mooney: I think jaypipes wants traits for the *capabilities* associated with a vendor/product | 14:33 |
sean-k-mooney | ah ok | 14:33 |
openstackgerrit | Vlad Gusev proposed openstack/nova stable/pike: Fix message for unexpected external event https://review.openstack.org/589503 | 14:33 |
jaypipes | efried: bingo. | 14:33 |
sean-k-mooney | yep, i agree that they are capabilities not metadata | 14:33 |
efried | which irl will entail maintaining a matrix of vendor/product to capabilities | 14:34 |
openstackgerrit | Vlad Gusev proposed openstack/nova stable/queens: Fix message for unexpected external event https://review.openstack.org/589505 | 14:34 |
jaypipes | efried: correct. | 14:34 |
sean-k-mooney | i was implying that vendor/product id is metadata that should be associated with the resource_class | 14:34 |
efried | which means whenever a new device is introduced that we want to support, we need to change code. | 14:34 |
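A minimal sketch of the vendor/product-to-capabilities matrix being discussed. All IDs and trait names are made up; the point is that device IDs act only as lookup keys and only the capabilities become traits, so an unknown device yields nothing until someone updates the matrix.

```python
# Hypothetical matrix: (vendor_id, product_id) -> capability traits.
CAPABILITY_MATRIX = {
    ("1234", "0001"): {"CUSTOM_CRYPTO_OFFLOAD", "CUSTOM_CHECKSUM_OFFLOAD"},
    ("1234", "0002"): {"CUSTOM_CHECKSUM_OFFLOAD"},
}


def traits_for(vendor_id, product_id):
    """Return the capability traits for a device, or an empty set if unknown."""
    # Return a copy so callers cannot mutate the shared matrix.
    return set(CAPABILITY_MATRIX.get((vendor_id, product_id), set()))
```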
jaypipes | efried: which... *gasp* the friggin vendors should be responsible for. | 14:34 |
efried | jaypipes: The vendors should be responsible for proposing nova patches to support their devices? | 14:34 |
sean-k-mooney | i had asked previously about introducing resource class metadata at some point, but did not really push the point in the past | 14:34 |
*** amarao has quit IRC | 14:35 | |
jaypipes | efried: no. the vendors should be responsible for keeping the matrix of capabilities up to date with their product lines. | 14:35 |
efried | jaypipes: And that matrix of capabilities should be discoverable by querying the device somehow | 14:35 |
*** josecastroleon has quit IRC | 14:35 | |
jaypipes | efried: in the same way they are responsible for ensuring the pciids database is kept up to date with all their vendor, subvendor/reseller and product information. | 14:35 |
sean-k-mooney | jaypipes: well the vendor id/product id is important for other reasons such as knowing what driver is required for the device | 14:35 |
*** josecastroleon has joined #openstack-nova | 14:36 | |
efried | If that were the case, and if vendors could agree on names (IDs?) for capabilities across the board, I could get behind it. | 14:36 |
sean-k-mooney | efried: well, intel has been pushing to try and standardise some of them in etsi and dmtf (redfish) | 14:36 |
efried | But I think we need to have a realistic fallback plan so that, if such a nirvana does not come into being, operators will still be able to ask for a GPU by product ID. | 14:37 |
sean-k-mooney | from an openstack point of view that's what the standard traits in os-traits are for | 14:37 |
jaypipes | efried: well, if vendors want to enable their technology in OpenStack, they should work with us. Instead, nova needs to jump through a bunch of hacky hoops to work with vendors. ala https://review.openstack.org/#/c/579897/. which is complete bullshit, IMHO. | 14:37 |
efried | yes, I noticed you getting excited about that yesterday :) | 14:38 |
*** hongbin has joined #openstack-nova | 14:38 | |
sean-k-mooney | jaypipes: ya.. well, that is working around a limitation that nvidia put in their driver to prevent you using their gpus in a vm unless you bought their datacenter gpus | 14:39 |
jaypipes | sean-k-mooney: I'm fully aware of that, yes. | 14:39 |
jaypipes | sean-k-mooney: please see my comment on whether we are OpenStack or WorkaroundForClosedStack | 14:40 |
jaypipes | sean-k-mooney: I think the answer has already been answered on that, actually. we have been VendorStack for about 5 years now. | 14:40 |
sean-k-mooney | oh i see it, i have it open now. i agree that we should not need to do this, but its a use case they have | 14:40 |
sean-k-mooney | i think there are use cases beyond the gpu case for wanting to hide the hypervisor signature | 14:41 |
jaypipes | name one. | 14:42 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Use placement 1.28 in scheduler report client https://review.openstack.org/583667 | 14:42 |
sean-k-mooney | linux kernel dev | 14:42 |
*** maciejjozefczyk has joined #openstack-nova | 14:42 | |
*** _ix has joined #openstack-nova | 14:42 | |
sean-k-mooney | anyway, i know that is a reach, but as an ovs-dpdk dev i have used tricks in the past to allow me to use openstack as a dev env | 14:42 |
*** READ10 has joined #openstack-nova | 14:43 | |
sean-k-mooney | such as setting the nic model to e1000 so i could test the non-virtualised nic binding workflow | 14:43 |
jaypipes | sean-k-mooney: you're performing use case gymnastics here. | 14:44 |
sean-k-mooney | yes and no. in the early days of dpdk support i had to do things like set the cpu-mode to host-passthrough because i did not have the cpu features required to even compile it otherwise | 14:45 |
dansmith | jaypipes: it's a common case for kernel development to use virtualization to simulate environments for that dev.. like fake numa nodes, pci devices, etc | 14:46 |
sean-k-mooney | as i said i was reaching a bit, but if i want to do driver dev and i want to do that in a vm on openstack then i might need to hide the hypervisor signature. that said, i also recognise that openstack is not a virtualisation stack | 14:46 |
dansmith | jaypipes: that's not to say that I think IaaS really needs to worry about such things | 14:46 |
*** maciejjozefczyk has quit IRC | 14:47 | |
dansmith | jaypipes: given that developers can use bare hypervisors on their laptops to do that with much better control | 14:47 |
*** tbachman has quit IRC | 14:47 | |
sean-k-mooney | dansmith: yep that said for ci of that code openstack is a much more compelling option | 14:47 |
dansmith | yup, CI is another reasonable use case imho | 14:48 |
*** maciejjozefczyk has joined #openstack-nova | 14:48 | |
*** maciejjozefczyk has quit IRC | 14:48 | |
sean-k-mooney | the alternative without hypervisor hiding is ironic, but that is arguably more complex. the difference is the complexity is on the end user, not the technical debt nova has to maintain | 14:49 |
dansmith | sean-k-mooney: note that I'm defending the dev case for needing finer grained control, but *not* defending the hypervisor-hiding case | 14:49 |
dansmith | the latter is mostly total closed-source proprietary BS | 14:49 |
jaypipes | sean-k-mooney, dansmith: in that case, the use case would be to be able to *set* the vendor ID to a particular value, not hardcode it to "1234567890ab" just for the HyperV emulator just to avoid licensing issues with Nvidia: https://review.openstack.org/#/c/579897/4/nova/virt/libvirt/config.py@2241 | 14:50 |
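A minimal sketch of jaypipes' suggestion: build the libvirt `<features>` fragment from an operator-supplied Hyper-V vendor ID instead of a hard-coded one. The function and its parameters are hypothetical, not nova's libvirt config classes; the element names follow libvirt's domain XML (`<hyperv><vendor_id/>`, `<kvm><hidden/>`), and libvirt documents a 12-character limit on the vendor ID value.

```python
import xml.etree.ElementTree as ET


def hyperv_vendor_features(vendor_id, hide_kvm=True):
    """Build a libvirt <features> fragment with a configurable Hyper-V vendor ID."""
    if not 1 <= len(vendor_id) <= 12:
        raise ValueError("Hyper-V vendor_id must be 1-12 characters")
    features = ET.Element("features")
    hyperv = ET.SubElement(features, "hyperv")
    ET.SubElement(hyperv, "vendor_id", state="on", value=vendor_id)
    if hide_kvm:
        # Optionally hide the KVM signature from the guest as well.
        kvm = ET.SubElement(features, "kvm")
        ET.SubElement(kvm, "hidden", state="on")
    return ET.tostring(features, encoding="unicode")
```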
*** priteau has quit IRC | 14:50 | |
lyarwood | s10: hey, thanks for the report, so what's the actual knock-on impact of update_available_resource taking longer here? Scheduling accuracy? | 14:50 |
sean-k-mooney | jaypipes: well, i had originally asked whether we should just remove the hypervisor section entirely if hiding was asked for, to fully hide the fact we were in a vm, but that has performance impacts | 14:51 |
sean-k-mooney | but yes, making this more generic to allow setting a specific vendor id would be just as valid | 14:52 |
s10 | lyarwood: It looks like when I try to live migrate instances from this host, all operations have to wait for update_available_resource, which runs every minute. So I have a small live migration window: 30 seconds every minute, only after update_available_resource ends and before it starts again. | 14:53 |
sean-k-mooney | s10: why? | 14:54 |
jaypipes | s10: why are you running update_available_resource every minute? | 14:55 |
*** josecastroleon has quit IRC | 14:55 | |
s10 | sean-k-mooney: I don't know, that's what I see in the logs. No live migration can be performed during update_available_resource. If I change libvirt/driver.py back to how it was before the first commits, migrations go through without pause. | 14:55 |
cdent | jaypipes: that's the periodic job default | 14:56 |
sean-k-mooney | s10: well that makes more sense | 14:56 |
jaypipes | quoi? wtf is it that often? :( | 14:56 |
sean-k-mooney | we are running the periodic job on a green thread | 14:56 |
cdent | jaypipes: dunno, but that's what it is | 14:56 |
jaypipes | sean-k-mooney: there is a semaphore lock around it. | 14:56 |
*** hoonetorg has quit IRC | 14:57 | |
*** josecastroleon has joined #openstack-nova | 14:57 | |
sean-k-mooney | so if update_available_resource is taking 30sec on a node with 150 instances, then the compute agent would be tied up for 30secs every minute executing it | 14:57 |
s10 | jaypipes: because update_resources_interval runs at the default periodic interval: update_resources_interval defaults to 0, and periodic_task_interval is 60 by default. | 14:58 |
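A small sketch of the interval semantics described here (0 falls back to the default periodic interval, assumed to be 60 seconds; a negative value disables the task) and of the resulting migration window, assuming the audit holds the resource-tracker lock for its entire run. Both function names are hypothetical.

```python
def effective_interval(configured, default=60):
    """Resolve a periodic task's interval in seconds; None means disabled."""
    if configured < 0:
        return None       # negative: task disabled
    if configured == 0:
        return default    # 0: run at the default periodic interval
    return configured     # positive: use as-is


def migration_window(interval, audit_duration):
    """Seconds per interval during which the resource-tracker lock is free."""
    return max(interval - audit_duration, 0)
```

With the defaults and a 30 second audit, `migration_window(effective_interval(0), 30)` leaves 30 seconds per minute, matching the window s10 reports.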
sean-k-mooney | jaypipes: the point im trying to make is the periodic jobs are time sharing with the rest of the compute agent | 14:58 |
stephenfin | efried: Does POWER expose NUMA to the OS? | 14:59 |
sean-k-mooney | stephenfin: as in powerpc ? | 14:59 |
stephenfin | sean-k-mooney: yup | 14:59 |
dansmith | sean-k-mooney: but most of that is waiting for IO so it's not keeping the process busy | 14:59 |
mdbooth | lyarwood: I have another meeting now, so I've chucked a brain dump in the functional failure review. | 14:59 |
efried | stephenfin: My understanding is that POWER handles NUMA under the covers, and does it well enough that the deployer doesn't need control. | 14:59 |
sean-k-mooney | stephenfin: numa is exposed to linux via the bios, so it should work the same on powerpc | 15:00 |
sean-k-mooney | the question is whether the hypervisor exposes it or not | 15:00 |
stephenfin | efried: Is that the hardware doing the work or the hypervisor? | 15:00 |
stephenfin | sean-k-mooney: yeah ^ | 15:00 |
stephenfin | (the work of abstracting NUMA'ness) | 15:00 |
*** jaosorior has quit IRC | 15:01 | |
sean-k-mooney | dansmith: good point, it should yield execution | 15:01 |
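A toy model of the point being made, using asyncio as a stand-in for nova's eventlet green threads and an `asyncio.Lock` as a stand-in for the resource-tracker semaphore; all names are hypothetical. The audit yields while waiting on I/O, so unrelated work keeps running, but anything that needs the same lock (presumably a migration claim) still waits for the whole audit.

```python
import asyncio

events = []


async def update_available_resource(lock):
    async with lock:  # held for the entire audit
        events.append("audit start")
        await asyncio.sleep(0.05)  # stands in for slow qemu-img calls; yields
        events.append("audit end")


async def heartbeat():
    # Work that does not need the lock runs while the audit waits on I/O.
    await asyncio.sleep(0.01)
    events.append("heartbeat")


async def live_migration_claim(lock):
    await asyncio.sleep(0.01)
    async with lock:  # blocked until the audit releases the lock
        events.append("migration claim")


async def main():
    lock = asyncio.Lock()
    await asyncio.gather(update_available_resource(lock),
                         heartbeat(),
                         live_migration_claim(lock))


asyncio.run(main())
```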
*** panda|rover is now known as panda|backin2h | 15:01 | |
jaypipes | cdent, dansmith: is there any reason now that placement claims are doing most of the resource consumption work (in an atomic manner) that we can't set the update_available_resource default to something sensible like 15 minutes? | 15:01 |
* cdent thinks | 15:02 | |
sean-k-mooney | jaypipes: just trying to think if there are any late claims still done in the compute node. i think not | 15:02 |
cdent | there's not | 15:02 |
dansmith | jaypipes: I would have to look, because I think we have other things hung off that process | 15:02 |
cdent | jaypipes: do we do anything with host states and Filters where that info needs to be up to date? | 15:03 |
cdent | non-placement-related filtering | 15:03 |
sean-k-mooney | stephenfin: im pretty sure that hyperv hides the numa info from nova | 15:03 |
jroll | jaypipes: one catch would be picking up new/changed ironic nodes, but maybe we can just doc that caveat | 15:04 |
sean-k-mooney | stephenfin: i know they have made numa affinity of instance memory a hypervisor config option | 15:04 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP libvirt: Reduce calls to qemu-img during update_available_resource https://review.openstack.org/589513 | 15:04 |
jroll | "changed" e.g. ironic node goes to / comes out of maintenance mode | 15:05 |
lyarwood | s10: ^ we can remove half of the qemu-img calls with that | 15:05 |
*** mriedem is now known as mriedem_afk | 15:06 | |
stephenfin | sean-k-mooney: Fair enough. I was trying to write a high-level overview of NUMA but needed to figure out what platforms would be affected. I'll just leave that piece out :) | 15:07 |
*** nicolasbock has joined #openstack-nova | 15:07 | |
*** Luzi has quit IRC | 15:07 | |
sean-k-mooney | stephenfin: i would recommend putting it in the context of which hypervisor rather than which platform (i.e. architecture) if you do | 15:08 |
sean-k-mooney | i would expect power-kvm via libvirt to work the same as kvm on x86, but i would expect the powervm direct driver to work very differently | 15:09 |
*** dpawlik has quit IRC | 15:10 | |
stephenfin | sean-k-mooney: Good call. I'll do that | 15:11 |
*** pcaruana has quit IRC | 15:11 | |
cdent | jaypipes: we could consider having a different periodic job for the placement calls? There's a bazillion periodic jobs in the compute manager already, aren't there? What's one more :() | 15:11 |
sean-k-mooney | cdent: they mainly all share the same config option today, however. we dont have an interval option for each | 15:12 |
*** hemna_ has joined #openstack-nova | 15:12 | |
*** s10 has quit IRC | 15:12 | |
sean-k-mooney | that said i dont see an issue with adding a new config option and default it to the current periodic job interval | 15:12 |
sean-k-mooney | or if you want the behavior change to be implicit on upgrade then default to 15 mins or whatever makes sense | 15:13 |
cdent | sean-k-mooney: there are several *_interval conf settings in nova/conf/compute.py and they all fall back to the default for any periodic in an oslo_service based service | 15:13 |
*** s10 has joined #openstack-nova | 15:15 | |
efried | stephenfin, sean-k-mooney: Sorry, had to step away for a sec. From your point of view, the "hardware" and the "hypervisor" are the same thing. | 15:15 |
sean-k-mooney | efried: meaning we cant see the hardware, so we should only care about what the hypervisor reports | 15:16 |
efried | sean-k-mooney: That's probably a sane way to think about it. I don't think the hypervisor reports NUMA cells at all. You see procs and memory. | 15:17 |
efried | I'm not an expert here, for sure. | 15:17 |
sean-k-mooney | efried: that is how the host is reported from hyperv also, as far as i know | 15:17 |
sean-k-mooney | just one big pool for ram and cpus | 15:18 |
sean-k-mooney | they have a hypervisor config option (not in openstack) to turn on numa affinity at the host level | 15:18 |
efried | yeah, see, I think NUMA affinity Just Happens (tm) on Power. | 15:19 |
jaypipes | cdent: perhaps, yes | 15:19 |
sean-k-mooney | efried: i dont think vsphere exposes it either. its just libvirt that exposes it as far as i can tell | 15:19 |
efried | frickin libvirt <rolls eyes> | 15:19 |
sean-k-mooney | hehe, well libvirt exposes it because libvirt is not a hypervisor, its a hypervisor abstraction layer, and since qemu does not do this by default it fell to nova to fill in the gaps | 15:20 |
efried | :) | 15:21 |
jaypipes | sean-k-mooney: WRONG! libvirt is an XML file management system. | 15:22 |
* cdent uses xpath and xslt to launch vms | 15:22 | |
sean-k-mooney | jaypipes: lol | 15:22 |
sean-k-mooney | jaypipes: it does a little bit more than that, but more or less | 15:22 |
* jaypipes crafts artisanal VMs from unicorns and rainbows | 15:23 | |
*** Bhujay has joined #openstack-nova | 15:23 | |
sean-k-mooney | you know, it could be argued that a new service could be inserted between nova and libvirt that did all the nfv stuff, and the libvirt driver could be made way simpler | 15:24 |
cdent | I sure hope those unicorns are free range | 15:25 |
*** priteau has joined #openstack-nova | 15:26 | |
*** tbachman has joined #openstack-nova | 15:27 | |
*** adrianc__ has quit IRC | 15:28 | |
*** Sigyn2 has joined #openstack-nova | 15:29 | |
*** gyee has joined #openstack-nova | 15:32 | |
s10 | lyarwood: with https://review.openstack.org/589513 update_available_resource() lasts 10 seconds for 100 instances instead of 20 seconds without. | 15:35 |
*** dpawlik has joined #openstack-nova | 15:38 | |
lyarwood | s10: kk, that's a single qemu-img info call per disk to avoid the other issues fixed by the original changes | 15:39 |
*** s10 has quit IRC | 15:39 | |
dansmith | s10: this sounds like a silly question, but what does it matter how long that takes? | 15:39 |
dansmith | it's not blocking other work right? | 15:39 |
efried | stephenfin: I think https://review.openstack.org/#/c/588422/ is ready for your +A now | 15:40 |
stephenfin | ack | 15:40 |
*** s10 has joined #openstack-nova | 15:40 | |
lyarwood | dansmith: I did ask above and it's causing issues with LM apparently | 15:40 |
dansmith | lyarwood: oh sorry I missed that.. maybe because it's holding the RT semaphore? | 15:41 |
lyarwood | dansmith: yes I think so | 15:41 |
dansmith | okay makes sense I guess | 15:41 |
dansmith | lyarwood: the new call checks the allocated value, which doesn't change over the lifecycle of the image right? | 15:42 |
sean-k-mooney | dansmith: correct it should not | 15:42 |
*** dpawlik has quit IRC | 15:42 | |
sean-k-mooney | dansmith: unless the call is checking the actual used space on the host and we are not preallocating? | 15:43 |
lyarwood | dansmith: allocated can, virtual shouldn't | 15:43 |
lyarwood | yeah pretty much | 15:43 |
dansmith | lyarwood: but which are we looking at now? | 15:43 |
lyarwood | dansmith: now it's just a single qemu-img info call that grabs both | 15:44 |
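A minimal sketch of getting both values from one call by parsing the JSON that `qemu-img info --output=json <disk>` prints, which includes `virtual-size` and `actual-size` in bytes. The sample output below is made up and trimmed, and lyarwood's patch may well parse the output differently.

```python
import json

# Trimmed, made-up sample of `qemu-img info --output=json disk` output.
SAMPLE = """
{
    "virtual-size": 21474836480,
    "filename": "disk",
    "format": "qcow2",
    "actual-size": 1073741824
}
"""


def disk_sizes(qemu_img_json):
    """Return (virtual_size, allocated_size) in bytes from one info call."""
    info = json.loads(qemu_img_json)
    return info["virtual-size"], info["actual-size"]


virtual, allocated = disk_sizes(SAMPLE)
over_commit = virtual - allocated  # space the sparse disk could still grow into
```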
sean-k-mooney | lyarwood: the resource tracker should be tracking the virtual amount, right, not the currently allocated amount? | 15:44 |
dansmith | lyarwood: ah, both so one of them is dynamic and we can't really cache it yeah? | 15:44 |
lyarwood | sean-k-mooney: the allocated amount is used to work out over commit etc | 15:44 |
sean-k-mooney | lyarwood: that seems wrong, the maximum it could use should be used for that | 15:44 |
dansmith | sean-k-mooney: IIRC, this is not for reporting to placement but for some of the legacy values (right lyarwood ?) | 15:45 |
jaypipes | dansmith: what are your thoughts on https://bugs.launchpad.net/nova/+bug/1784826? I can't tell what the expected behaviour there should be... | 15:45 |
openstack | Launchpad bug 1784826 in OpenStack Compute (nova) "Guest remain in origin host after evacuate and unset force-down nova-compute" [Undecided,In progress] - Assigned to huanhongda (hongda) | 15:45 |
dansmith | because auditing on the compute node doesn't change what placement is counting | 15:45 |
lyarwood | dansmith: yeah correct, this doesn't make it to placement AFAIK | 15:46 |
dansmith | lyarwood: so we really only need to be collecting this info for old RT, which is ignored if you don't have DiskFilter enabled, and only useful for people using the deprecated CachingScheduler yeah? | 15:47 |
sean-k-mooney | lyarwood: what advantage is there to using allocated though? since placement would already be using its allocation ratio to filter the hosts before the compute ever needs to check its over-subscription ratio setting | 15:47 |
lyarwood | dansmith: in master but this series was backported all the way back to ocata | 15:47 |
lyarwood | dansmith: wasn't it still used back then? | 15:48 |
dansmith | lyarwood: yeah I know, but diskfilter hasn't been needed since claims in the scheduler | 15:48 |
dansmith | *maybe* ocata, but not pike or later IIRC | 15:48 |
*** maciejjozefczyk has joined #openstack-nova | 15:50 | |
lyarwood | sean-k-mooney: right, I think the only reason this came up before is that it's visible from the CLI | 15:50 |
dansmith | lyarwood: anyway, just wondering if maybe we should add a workaround config tweak to disable this extra inspection so you can turn it off if you're not using DiskFilter and/or are on a new enough version | 15:50 |
lyarwood | dansmith: yeah that sounds like the way to go tbh | 15:51 |
dansmith | lyarwood: ah, well, that makes it even more pain than gain if it was just "make the numbers line up" :) | 15:51 |
dansmith | lyarwood: probably best to get some sign-offs from the PTL(s) of the affected releases, but that's kinda where I'm thinking | 15:52 |
sean-k-mooney | lyarwood: so if its visible via the cli, i think thats even more reason to use the virtual, not allocated, disk size to have it correlate with placement | 15:52 |
sean-k-mooney | lyarwood: that said, that would be a behavior change i guess. | 15:52 |
sean-k-mooney | lyarwood: i assume this is visible via the hypervisors api? | 15:53 |
lyarwood | sean-k-mooney: yeah via a hypervisor-show - https://bugs.launchpad.net/nova/+bug/1764489 | 15:54 |
openstack | Launchpad bug 1764489 in OpenStack Compute (nova) queens "Preallocated disks are deducted twice from disk_available_least when using preallocated_images = space" [Medium,Fix committed] - Assigned to Lee Yarwood (lyarwood) | 15:54 |
*** maciejjozefczyk has quit IRC | 15:55 | |
sean-k-mooney | right so its messing up disk_available_least not local_gb_used... | 15:56 |
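A simplified sketch of the arithmetic behind a "least available" disk figure; it is not nova's exact code. Each sparse disk could still grow by its virtual size minus its allocated size, so that amount is subtracted from the currently free space. If a preallocated disk's allocated size were under-reported, its space would be subtracted once through the free-space figure and again through the over-commit term, which is the kind of double deduction bug 1764489's title describes.

```python
GiB = 1024 ** 3


def disk_available_least_gb(free_bytes, disks):
    """Worst-case free disk in GiB.

    disks is a list of (virtual_size, allocated_size) tuples in bytes.
    """
    # Room each disk could still grow into; fully allocated disks add nothing.
    over_committed = sum(max(virtual - allocated, 0)
                         for virtual, allocated in disks)
    return (free_bytes - over_committed) // GiB
```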
*** adrianc__ has joined #openstack-nova | 15:58 | |
*** Bhujay has quit IRC | 16:00 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Use placement 1.28 in scheduler report client https://review.openstack.org/583667 | 16:01 |
*** luksky has quit IRC | 16:02 | |
*** tbachman has quit IRC | 16:08 | |
*** namnh has quit IRC | 16:10 | |
mdbooth | lyarwood: I'm partially through the functional failure. Got past the shelve/unshelve failure, error is now in _test_attach_volume_error | 16:11 |
*** mriedem_afk has quit IRC | 16:11 | |
mdbooth | lyarwood: However, I'm going to leave shortly, it doesn't pass yet, and there are still print statements all over it | 16:11 |
melwitt | . | 16:12 |
mdbooth | lyarwood: So I'm not inclined to push it this evening unless that would be especially helpful to you | 16:12 |
lyarwood | mdbooth: yeah np leave it for now | 16:15 |
*** jaypipes_ has joined #openstack-nova | 16:16 | |
*** jaypipes has quit IRC | 16:17 | |
*** jaypipes_ has quit IRC | 16:17 | |
*** burt has quit IRC | 16:19 | |
*** jpena is now known as jpena|off | 16:21 | |
*** burt has joined #openstack-nova | 16:22 | |
*** lbragstad[m] has quit IRC | 16:32 | |
*** jaypipes has joined #openstack-nova | 16:33 | |
*** tbachman has joined #openstack-nova | 16:36 | |
*** s10 has quit IRC | 16:38 | |
*** imacdonn has quit IRC | 16:38 | |
*** imacdonn has joined #openstack-nova | 16:38 | |
*** gbarros has quit IRC | 16:39 | |
*** tbachman_ has joined #openstack-nova | 16:40 | |
*** takedakn has joined #openstack-nova | 16:42 | |
*** tbachman has quit IRC | 16:42 | |
*** tbachman_ is now known as tbachman | 16:42 | |
*** moshele has joined #openstack-nova | 16:44 | |
*** gbarros has joined #openstack-nova | 16:44 | |
*** takedakn has left #openstack-nova | 16:45 | |
*** tssurya has quit IRC | 16:45 | |
*** janki has quit IRC | 16:47 | |
openstackgerrit | Sergii Golovatiuk proposed openstack/nova master: Fix URI for IPv6 https://review.openstack.org/589548 | 16:50 |
melwitt | dansmith: this review looks in your wheelhouse https://review.openstack.org/589425 | 16:57 |
dansmith | yeah, gibi and I have been discussing it | 16:58 |
dansmith | I'll hit it in a few | 16:58 |
*** sahid has quit IRC | 16:58 | |
melwitt | k | 16:59 |
*** derekh has quit IRC | 17:00 | |
*** panda|backin2h is now known as panda|rover | 17:08 | |
efried | stephenfin, sean-k-mooney: I found out a little more about NUMA affinity on Power. It is indeed done dynamically; you can't ask for it to be done a certain way. However, the NUMA cell and affinity information *is* exposed to the guest, so if the guest has the savvy and ability to use such information for clever scheduling of threads/processes, it can do so. | 17:10 |
efried | stephenfin, sean-k-mooney: Also of note, the system may swizzle affinities dynamically for various reasons (including, apparently, at the behest of a generic "improve my affinity" background job you can run) so the guest needs to be able to hang with that. | 17:11 |
*** ThomasWhite has quit IRC | 17:15 | |
openstackgerrit | Dan Smith proposed openstack/nova master: Fix resize revert to use non-legacy alloc handling https://review.openstack.org/589425 | 17:26 |
*** harlowja has joined #openstack-nova | 17:31 | |
*** adrianc__ has quit IRC | 17:35 | |
sean-k-mooney | efried: stephenfin so am i right in saying i can request a particular virtual numa topology for the guest, but powervm will dynamically map that as it sees fit and adjust as needed | 17:38 |
sean-k-mooney | brb | 17:39 |
*** adrianc__ has joined #openstack-nova | 17:41 | |
*** harlowja has quit IRC | 17:43 | |
*** markvoelker_ has quit IRC | 17:45 | |
*** moshele has quit IRC | 17:50 | |
*** gbarros has quit IRC | 18:06 | |
*** Swami has joined #openstack-nova | 18:12 | |
*** luksky has joined #openstack-nova | 18:14 | |
efried | sean-k-mooney: You can *not* request a particular topology for a guest. PowerVM will set it up dynamically and adjust as needed. | 18:25 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP libvirt: Reduce calls to qemu-img during update_available_resource https://review.openstack.org/589513 | 18:29 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP libvirt: Add workaround to stop use of qemu-img by the RT https://review.openstack.org/589567 | 18:29 |
lyarwood | dansmith: ^ was that what you had in mind for the qemu-img workaround? If it is I'll clean it up and ping the ML to see if others agree. | 18:29 |
dansmith | lyarwood: kinda, but we weren't just returning zero before right? what were we doing? | 18:30 |
dansmith | I was thinking just make the workaround trigger "do the old thing" | 18:31 |
lyarwood | dansmith: well before we were calling os.path.getsize | 18:31 |
dansmith | ah right, so .. I would just make the workaround revert to that I think | 18:32 |
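A minimal sketch of the workaround dansmith suggests, where a hypothetical flag makes the audit revert to the old `os.path.getsize()` call rather than returning zero. Note that `os.path.getsize()` reports a file's apparent size, which for sparse or preallocated images can differ from the space actually allocated on disk.

```python
import os


def disk_size_for_audit(path, skip_qemu_img, qemu_img_allocated_size):
    """Return a disk's size for the resource tracker's legacy accounting.

    skip_qemu_img stands in for a hypothetical [workarounds] option: when set,
    fall back to os.path.getsize() (the old behaviour) and never spawn
    qemu-img. qemu_img_allocated_size is any callable that takes a path.
    """
    if skip_qemu_img:
        return os.path.getsize(path)
    return qemu_img_allocated_size(path)
```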
sean-k-mooney | efried: does that imply that from within the guest i can see that the topology is changing? | 18:45 |
efried | sean-k-mooney: Yeah. Is that crazy? | 18:45 |
sean-k-mooney | efried: yes | 18:45 |
efried | sean-k-mooney: It's possible that the topo only changes if you run that job I mentioned. | 18:45 |
sean-k-mooney | windows and linux are really not going to do the right thing | 18:46 |
sean-k-mooney | im pretty sure that neither will check the numa topology on an ongoing basis | 18:46 |
sean-k-mooney | so the linux scheduler is going to detect the numa topology when it starts up and then continue to use the same mappings for the lifetime of the vm | 18:47 |
*** openstackgerrit has quit IRC | 18:49 | |
*** moshele has joined #openstack-nova | 18:50 | |
sean-k-mooney | if the hypervisor hides the physical topology from the guest but dynamically remaps the ram across the host numa nodes, then at least the info presented to the guest is consistent and it can try to make sane decisions, but if the virtual topology the guest sees is dynamic, things like numactl are not going to work correctly in the guest | 18:51 |
efried | sean-k-mooney: Yeah, I don't know to this level of detail. I imagine it'd be in response to other LPARs going away or whatever, and thus freeing up the system to shuffle my stuff to make it more efficient. Or maybe I hot plug a device and now it's better to move my procs/mem to the cell that's affined to that device. But I don't know how that actually appears to the guest. | 18:53 |
*** adrianc__ has quit IRC | 18:53 | |
sean-k-mooney | efried: well, the guest view is what i was asking about when i said can you request a virtual numa topology. as long as the guest view does not change, the hypervisor can do what it likes underneath without worrying it will break the guest | 18:55 |
sean-k-mooney | the guest can still make bad decisions if the hypervisor decides to split a virtual numa node across host physical numa nodes, but it won't break anything | 18:56 |
sean-k-mooney | if the guest view is changing, things like hugepages within the guest would break | 18:57 |
efried | sean-k-mooney: When you say "request a virtual numa topology" do you mean "ask what the topology is" or do you mean "request that the virtual topology look like XYZ" ? | 18:57 |
efried | s/look like/get set up as/ | 18:57 |
sean-k-mooney | i mean "power vm please create a vm that to the guest looks like it has 2 numa nodes, do whatever you like on the host" | 18:58 |
sean-k-mooney | that is effectively what hw:numa_nodes=2 means | 18:59 |
efried | sean-k-mooney: Right, okay, so I'm saying you definitely can't do that in PowerVM. I'm not sure whether you can do it in PowerKVM. | 18:59 |
sean-k-mooney | ok cool | 18:59 |
sean-k-mooney | you can do that with hyperv i think, but i don't think it guarantees they are mapped to 2 physical host numa nodes | 19:00 |
sean-k-mooney | or rather, whether they are mapped/affinitised to host numa nodes is governed by a hypervisor config option as far as i know | 19:01 |
*** tbachman has quit IRC | 19:14 | |
sean-k-mooney | efried: it's buried in the docs but ya, it looks like you can do numa affinity with powerkvm https://www.ibm.com/support/knowledgecenter/SSZJY4_3.1.0/liabp/liabphotplugcpunuma.htm | 19:14 |
sean-k-mooney | you can also do cpu pinning | 19:15 |
efried | orly? | 19:15 |
*** maciejjozefczyk has joined #openstack-nova | 19:15 | |
sean-k-mooney | yep, and describe the virtual cpu topology in terms of sockets, cores and threads | 19:15 |
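Describing a virtual CPU topology in sockets, cores and threads boils down to factoring the vCPU count. Nova exposes this via the hw:cpu_sockets / hw:cpu_cores / hw:cpu_threads flavor extra specs, plus hw:cpu_max_* limits. The sketch below only enumerates the candidate factorizations under optional limits. It is an illustration, not Nova's selection algorithm, and the function name is hypothetical.

```python
def possible_topologies(vcpus, max_sockets=None, max_cores=None,
                        max_threads=None):
    """List every (sockets, cores, threads) whose product equals vcpus."""
    def within(value, limit):
        return limit is None or value <= limit

    result = []
    for sockets in range(1, vcpus + 1):
        if vcpus % sockets:
            continue
        remaining = vcpus // sockets
        for cores in range(1, remaining + 1):
            if remaining % cores:
                continue
            threads = remaining // cores
            if (within(sockets, max_sockets) and within(cores, max_cores)
                    and within(threads, max_threads)):
                result.append((sockets, cores, threads))
    return result
```

For example, 4 vCPUs with max_threads=1 leaves (1, 4, 1), (2, 2, 1) and (4, 1, 1). All of these are purely guest-visible shapes, which is the point being made about these docs describing virtual topology.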
efried | sean-k-mooney: This appears to be talking about just virtual topology, yah? | 19:16 |
sean-k-mooney | yes, unless you can use the numatune elements | 19:16 |
*** s10 has joined #openstack-nova | 19:17 | |
sean-k-mooney | efried: so ya, this all appears to be virtual, though the docs are so hard to navigate one could almost believe ibm did not want you to use those features :P | 19:18 |
*** moshele has quit IRC | 19:18 | |
efried | sean-k-mooney: Oh, don't worry, that's not specific to NUMA. | 19:18 |
sean-k-mooney | haha well i particularly like how the numa stuff is not linked to from the "manage processor and memory" section https://www.ibm.com/support/knowledgecenter/en/SSZJY4_3.1.0/liabp/liabpmanageprocessors.htm | 19:20 |
efried | sean-k-mooney: I think I can now officially say you've spent more time on the IBM "knowledge" center in the past twenty minutes than I have in the past five years. | 19:22 |
sean-k-mooney | haha well from what i can tell, if you used the libvirt kvm driver and pointed it at a powerkvm host, i think it would "just work" with all of the numa and epa stuff we do for normal kvm | 19:23 |
*** mriedem has joined #openstack-nova | 19:24 | |
efried | sean-k-mooney: I have found out that there is some kind of notification the OS can get when affinities change. | 19:24 |
sean-k-mooney | probably some acpi interrupt or something | 19:24 |
sean-k-mooney | i know there are interrupts that can be sent for memory and cpu hotplug, which arguably could change numa topology | 19:25 |
sean-k-mooney | so it's not out of the question that linux or windows could handle a dynamic topology if you were to notify it. not sure i would like a numa node to go away, but adding one might be ok | 19:26 |
*** openstackgerrit has joined #openstack-nova | 19:27 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP libvirt: Add workaround to stop use of qemu-img by the RT https://review.openstack.org/589567 | 19:27 |
efried | sean-k-mooney: I think it's more likely to be like, "You thought procs 0 and 2 were next to each other, and 1 and 3 were next to each other. Well, now it's 0-1 and 2-3. Deal." | 19:27 |
sean-k-mooney | haha ya maybe. if you ever test it let me know | 19:28 |
efried | sean-k-mooney: I feel confident in saying that will never happen. | 19:28 |
efried | ...for some value of "never". | 19:28 |
sean-k-mooney | efried: you obviously are not spending enough time talking to telcos | 19:29 |
efried | If it's anything like talking to the AT&T help desk, I assure you I don't want to spend any more. | 19:30 |
*** READ10 has quit IRC | 19:40 | |
*** cdent has quit IRC | 19:42 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/pike: Make ResourceTracker.stats node-specific https://review.openstack.org/588037 | 19:46 |
mriedem | was there a zuul reset or something such that we need to recheck things? | 19:49 |
dansmith | 513 backlog, | 19:51 |
dansmith | so I imagine it's catching up | 19:51 |
dansmith | backlog has been up and down all day so I figured something was going on | 19:51 |
sean-k-mooney | nothing on the openstack-infra twitter for the last 4 days. they normally post something if they need to do a reset | 19:53 |
sean-k-mooney | well, it auto-posts when they set a status message on the infra channel, but in either case the gate is probably just busy/backed up | 19:54 |
mriedem | ok | 19:56 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/ocata: Make ResourceTracker.stats node-specific https://review.openstack.org/588077 | 20:00 |
*** tbachman has joined #openstack-nova | 20:16 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: doc: mark the max microversion for rocky https://review.openstack.org/589598 | 20:20 |
*** hoonetorg has joined #openstack-nova | 20:30 | |
maciejjozefczyk | mriedem: efried please check my response: https://review.openstack.org/#/c/520024 | 20:31 |
efried | Thanks maciejjozefczyk | 20:32 |
maciejjozefczyk | efried: u2 for pinging me, I forgot about this topic | 20:32 |
mriedem | melwitt: comments in the reno prelude https://review.openstack.org/#/c/589303/ | 20:37 |
melwitt | ack | 20:37 |
*** maciejjozefczyk has quit IRC | 20:42 | |
-openstackstatus- NOTICE: Due to a bug, Zuul has been unable to report on cherry-picked changes over the last 24 hours. This has now been fixed; if you encounter a cherry-picked change missing its results (or was unable to merge), please recheck now. | 20:43 | |
mriedem | jroll: you might want to check the rebalance question in here https://review.openstack.org/#/c/520024/ | 20:43 |
*** dpawlik has joined #openstack-nova | 20:45 | |
*** nicolasbock has quit IRC | 20:45 | |
jroll | "want" is a weird word to use :P | 20:49 |
jroll | looking | 20:49 |
mriedem | i'm just asking for a witness that i asked for an ironic person to look before we break the rebalance stuff | 20:53 |
jroll | heh | 20:53 |
jroll | so I'm thinking it's fine to remove that, but... I don't know everything that _update does and I'm not sure things like _update_usages_from_instances() and such will do the right thing, if we haven't updated which compute hosts we moved active ironic instances to | 20:54 |
jroll | I feel like I'm nervous to remove it | 20:55 |
sean-k-mooney | mriedem: is there an ironic job we can trigger, maybe via the experimental pipeline, to test that patch? | 20:56 |
jroll | sean-k-mooney: there isn't one that actively kills compute hosts and such to trigger rebalances | 20:57 |
sean-k-mooney | jroll: it looks like the only ironic job that did run on this skipped 22 out of the 23 tests ... http://logs.openstack.org/24/520024/9/check/ironic-tempest-dsvm-ipa-wholedisk-bios-agent_ipmitool-tinyipa/1fe9001/testr_results.html.gz | 20:58 |
*** slaweq has quit IRC | 20:58 | |
*** tbachman has quit IRC | 21:06 | |
jroll | sean-k-mooney: yes, we run a very small set of tests on nova changes because our tests take a long time | 21:07 |
jroll | sean-k-mooney: that doesn't change the fact that there aren't any full-stack tests that exercise this code | 21:08 |
jroll | because it's really hard to do - would involve multiple nova-compute instances and orchestrating bringing them up and down and such | 21:08 |
efried | just mock everything | 21:08 |
* jroll thinks that's probably sarcasm | 21:09 | |
jroll | anyway, I'm gonna post my comments on the change and then not computer anymore today. see y'all later | 21:09 |
sean-k-mooney | jroll: i'm not saying there would be a test for this, just noting that the ironic testing is much lighter than i would have expected | 21:11 |
sean-k-mooney | jroll: that is not a slight against ironic, i just would have expected more tests, even if they only ran in the gate pipeline and not check | 21:12 |
jroll | sean-k-mooney: figure out how to get nested virt in the gate and we can run them fast enough to test more :) | 21:12 |
jroll | that one instance boot takes like 20 minutes or something iirc | 21:13 |
sean-k-mooney | well, i honestly don't know why we don't use nested virt in the gate, other than the fact that rackspace provides xen-based instances and most of our tests would like kvm | 21:14 |
jroll | there's a long history of why that I don't have time to get into | 21:14 |
* jroll really out now | 21:14 | |
sean-k-mooney | jroll: i have been trying to get nested virt in the gate since 2013, i'm well aware of the history but no worries | 21:14 |
* sean-k-mooney sometimes i hate dpdk ... | 21:17 | |
melwitt | only sometimes? | 21:18 |
sean-k-mooney | i just spent 6 hours debugging a network connectivity issue with ovs-dpdk because it has buggy numa detection logic | 21:18 |
sean-k-mooney | melwitt: normally it works fine, but when it doesn't it's a pain in the ass to figure out why | 21:19 |
*** awaugama has quit IRC | 21:21 | |
*** slaweq has joined #openstack-nova | 21:23 | |
*** tbachman has joined #openstack-nova | 21:26 | |
*** edmondsw has quit IRC | 21:29 | |
mriedem | dansmith: do you have any idea why we have a Service.availability_zone field on the versioned object? | 21:31 |
mriedem | i see that we try to lazy-load it for notifications | 21:31 |
mriedem | 2018-08-07 17:24:53,165 DEBUG [nova.objects.service] Lazy-loading 'availability_zone' on Service id 2 | 21:31 |
mriedem | 2018-08-07 17:24:53,166 DEBUG [nova.notifications.objects.base] Defaulting the value of the field 'availability_zone' to None in ServiceStatusPayload due to 'Object action obj_load_attr failed because: attribute availability_zone not lazy-loadable' | 21:31 |
mriedem | i guess for https://github.com/openstack/nova/commit/a2c6838ff5ff095940a76ebd4d578e24575c30d8#diff-b9be5fa188b7efd457da79e9c543344bR110 | 21:34 |
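The two DEBUG lines above describe a pattern: the payload asks the Service object for a field, the lazy-load fails because availability_zone has no loader, and the notification code defaults that field to None instead of failing. The sketch below mimics that behaviour with hypothetical classes (FakeService, NotLazyLoadable, populate_payload). It is not oslo.versionedobjects or nova.notifications code.

```python
import logging

LOG = logging.getLogger(__name__)


class NotLazyLoadable(Exception):
    """Stand-in for 'attribute ... not lazy-loadable' from the log above."""


class FakeService(object):
    """Hypothetical object: only fields set at construction are available."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def get_field(self, name):
        if name not in self._fields:
            LOG.debug("Lazy-loading '%s' on Service", name)
            # No loader exists for this field, so the lazy-load fails.
            raise NotLazyLoadable('attribute %s not lazy-loadable' % name)
        return self._fields[name]


def populate_payload(source, field_names):
    """Copy fields into a payload dict, defaulting unloadable ones to None."""
    payload = {}
    for name in field_names:
        try:
            payload[name] = source.get_field(name)
        except NotLazyLoadable as exc:
            LOG.debug("Defaulting the value of the field '%s' to None "
                      "due to '%s'", name, exc)
            payload[name] = None
    return payload
```

The net effect matches the log: the notification still goes out, just with availability_zone set to None.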
sean-k-mooney | jaypipes: are you around? | 21:39 |
*** tbachman has quit IRC | 21:42 | |
sean-k-mooney | i just realised that we missed | 21:43 |
sean-k-mooney | * i just realised that https://review.openstack.org/#/c/587378/3/vif_plug_ovs/ovs.py misses passing the ovsdb_connection on one of the vhost-user code paths. should i just submit a patch for the missing function or revert and submit an updated version? | 21:45 |
sean-k-mooney | i'm thinking just add a patch on top to get the missing function call, but just thought i would ask | 21:46 |
*** tbachman has joined #openstack-nova | 21:47 | |
*** bitskrie1 has quit IRC | 21:54 | |
*** slagle has quit IRC | 21:56 | |
sean-k-mooney | mriedem: by the way, i'm assuming you did not have time to test live migrating between different neutron backends as part of the multiple port binding blueprint? | 22:01 |
mriedem | sean-k-mooney: mlavalle did it between ovs and linuxbridge using neutron directly, not via nova | 22:02 |
mriedem | i don't have a mixed vif type env setup no | 22:02 |
sean-k-mooney | mriedem: now that i have figured out why ovs-dpdk was not working for me, i'll try and test it out later this week. i'll also try it via linux bridge if i get a chance | 22:02 |
sean-k-mooney | cool no worries | 22:02 |
sean-k-mooney | i normally have ovs, ovs-dpdk and linux bridge deployed concurrently, or at least i did before i moved. i'll test the matrix of all 3 setups and let you know how it goes | 22:04 |
*** dpawlik has quit IRC | 22:09 | |
*** owalsh_ has joined #openstack-nova | 22:09 | |
*** priteau has quit IRC | 22:09 | |
*** owalsh has quit IRC | 22:12 | |
*** owalsh- has joined #openstack-nova | 22:12 | |
*** owalsh- is now known as owalsh | 22:13 | |
*** beagles has quit IRC | 22:13 | |
*** owalsh_ has quit IRC | 22:14 | |
*** beagles has joined #openstack-nova | 22:20 | |
*** rcernin has joined #openstack-nova | 22:20 | |
mriedem | Kevin_Zheng: comments all over https://review.openstack.org/#/q/topic:bug/1781880+(status:open+OR+status:merged) so it should be clear to update now | 22:30 |
mriedem | and abandon the functional test since it will never work | 22:30 |
*** luksky has quit IRC | 22:37 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Update really old comments about vmware hosts managing multiple nodes https://review.openstack.org/589666 | 22:39 |
*** hongbin has quit IRC | 22:39 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add microversion info in the os-server-groups API samples https://review.openstack.org/589006 | 22:47 |
melwitt | mriedem: I'm not sure what to make of this comment after trying to make the changes to make things fail if a cell raises an exception https://review.openstack.org/#/c/540258/10/nova/scheduler/utils.py@843 | 22:50 |
melwitt | it's not clear to me if a cell raising an exception will be part of our "down cell" handling down the line or if we should expect to potentially fail a server create with a 500 if a cell raises an exception when we query it | 22:51 |
melwitt | pre-down-cell handling | 22:51 |
melwitt | initially I was thinking yes, it makes sense to fail the create if a cell raises an exception, but that would propagate up to the user as a 500 because there's something wrong with a cell, and then I became unsure if that falls under the "down cell" stuff or if it's just something we should do now | 22:53 |
*** edmondsw has joined #openstack-nova | 22:54 | |
efried | mriedem: I assume that +1 is so we wait until stein to land it | 22:58 |
efried | (the "Update resources once" patch) | 22:58 |
mriedem | efried: yeah | 22:58 |
efried | mriedem: I'm going to -2 it just in case, but I'm with ya. | 22:59 |
mriedem | efried: don't think you need to -2 | 22:59 |
*** edmondsw has quit IRC | 22:59 | |
openstackgerrit | Merged openstack/os-traits master: Update reno for stable/rocky https://review.openstack.org/586103 | 22:59 |
mriedem | efried: i -W'ed it | 23:00 |
efried | mriedem: You mean because no other cores are going to "accidentally" merge it? :) | 23:00 |
mriedem | melwitt: how does a cell raise an exception? | 23:00 |
mriedem | if the called function does? | 23:00 |
mriedem | like the DB API query explodes or something? | 23:00 |
melwitt | mriedem: yeah, exactly. if whatever is called under target_cell raises | 23:00 |
efried | mriedem: ack, +2ed | 23:00 |
melwitt | it makes sense to fail the boot over that if someone is trying to boot with affinity and a cell somehow raises an exception (gibi pointed it out on the review) | 23:01 |
mriedem | melwitt: it wouldn't be a 500 to the user for server create | 23:01 |
mriedem | b/c we've already cast from api to conductor which is what calls setup_instance_group | 23:01 |
melwitt | I'm just getting mixed up about whether that is in the scope of the bug fix I'm working on right now, or if that's going to be later on with the handling of a down cell work | 23:02 |
mriedem | cold/live migrate + evacuate + unshelve might return a 500.... | 23:02 |
mriedem | well gibi said it was ok to add a TODO and deal with it later yeah? | 23:02 |
melwitt | oh, okay. my bad, I was thinking setup_instance_group was called from compute/api but was mistaken | 23:02 |
mriedem | it's called from conductor, but whether or not we've already returned 202 to the user depends on the operation | 23:03 |
melwitt | he said a TODO was fine for the "did not respond" case, but for the raised exception case he suggested failing the boot | 23:03 |
mriedem | i would lump that into tssurya's bp in stein | 23:03 |
mriedem | or as a separate bug fix | 23:03 |
mriedem | it's not the issue for this patch | 23:03 |
melwitt | okay. thanks. that's what I was thinking as I went to add it, it's adding a lot to the scope | 23:04 |
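A sketch of the scope agreed above, assuming the cross-cell query returns a dict keyed by cell, with sentinel objects marking cells that timed out or raised (as Nova's scatter-gather helpers do, as far as I know). Both failure kinds are skipped with a TODO, and the "fail the operation on a raised exception" decision is deferred to the down-cell work. The sentinels and collect_group_hosts are local, hypothetical stand-ins, not Nova's names.

```python
import logging

LOG = logging.getLogger(__name__)

# Local stand-ins for the sentinels a scatter-gather helper returns for
# cells that timed out or raised.
DID_NOT_RESPOND = object()
RAISED_EXCEPTION = object()


def collect_group_hosts(results_by_cell):
    """Union the member hosts reported by every cell that answered."""
    hosts = set()
    for cell, result in results_by_cell.items():
        if result is DID_NOT_RESPOND:
            # TODO: handle as part of the down-cell work.
            LOG.warning('Cell %s did not respond; skipping it', cell)
            continue
        if result is RAISED_EXCEPTION:
            # TODO: decide whether this should fail the operation instead
            # (deferred; out of scope for this bug fix).
            LOG.warning('Cell %s raised an exception; skipping it', cell)
            continue
        hosts.update(result)
    return hosts
```

If failing turns out to be the right call later, only the RAISED_EXCEPTION branch needs to change. Whether the user then sees a 500 depends on whether the API has already returned 202 for that operation.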
mriedem | commented | 23:04 |
mriedem | melwitt: i also looked at https://review.openstack.org/#/c/582332/ and i'm not sure what it changes, | 23:04 |
mriedem | but i did notice we're logging in a greenthread in one place there and i thought that was a real no-no | 23:05 |
mriedem | dansmith: yeah? ^ | 23:05 |
*** _ix has quit IRC | 23:05 | |
melwitt | mriedem: I'm not 100% sure where the methods that were changed get called during the scheduler run, so I have to look at that to see if the result is visible during a gate run | 23:07 |
melwitt | basically, any time those methods run, they will replace the thread local context that oslo.context stores underneath, and oslo.log pulls from that to log request-ids | 23:07 |
melwitt | so what would happen is a request-id for a thread would change midway if one of the methods that created RequestContext without overwrite=False ran during it | 23:08 |
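The mechanism being described can be mimicked with a stdlib threading.local. oslo.context's RequestContext takes an overwrite argument that defaults to True and stores the new context as the thread's "current" one, which oslo.log then reads. The toy classes below are not oslo.context's code. They only show why an incidental RequestContext(), created without overwrite=False, changes the request-id in later log lines, and why another thread (under eventlet, to my understanding, another greenthread) is unaffected.

```python
import itertools
import threading

# Per-thread storage for the "current" context. Under eventlet monkey-patching,
# threading.local is, as far as I know, per-greenthread instead.
_store = threading.local()
_request_ids = itertools.count(1)


class RequestContext(object):
    """Toy context that, by default, replaces the stored current context."""

    def __init__(self, overwrite=True):
        self.request_id = 'req-%d' % next(_request_ids)
        if overwrite:
            _store.context = self


def get_current():
    """Return the context this thread last stored, or None."""
    return getattr(_store, 'context', None)
```

In use: a handler creates ctx = RequestContext(), and a helper it calls creates RequestContext() for its own purposes. From then on get_current() returns the helper's context, so log lines switch request-ids midway. Passing overwrite=False in the helper leaves the handler's context in place.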
mriedem | both run on startup | 23:08 |
melwitt | okay, I think that's why it wouldn't show up | 23:08 |
mriedem | well, one does | 23:08 |
mriedem | the cells refresh one | 23:08 |
mriedem | the other doesn't, presumably b/c it's in a greenthread | 23:08 |
mriedem | or maybe b/c we haven't discovered any hosts yet | 23:09 |
*** itlinux_ has joined #openstack-nova | 23:13 | |
melwitt | yeah, looks like the greenthread is spawned during startup so it's effectively during startup too | 23:15 |
openstackgerrit | Merged openstack/nova master: [placement] Add version directives in the history doc https://review.openstack.org/589392 | 23:15 |
openstackgerrit | Merged openstack/nova master: Avoid joins in _server_group_count_members_by_user https://review.openstack.org/580764 | 23:16 |
openstackgerrit | Merged openstack/nova master: Use common functions in granular fixture https://review.openstack.org/588113 | 23:16 |
*** priteau has joined #openstack-nova | 23:17 | |
melwitt | but I see now, it doesn't run at all, don't find it in the log | 23:17 |
melwitt | oh, because logging in the greenthread is expected not to work? | 23:17 |
melwitt | oh, it's because [filter_scheduler]/track_instance_changes = False | 23:19 |
mriedem | melwitt: no i think it's because we disable CONF.filter_scheduler.track_instance_changes in superconductor mode in devstack | 23:19 |
mriedem | http://logs.openstack.org/32/582332/5/check/neutron-grenade/b319aa4/logs/screen-n-sch.txt.gz#_Jul_31_18_48_24_316837 | 23:19 |
melwitt | ahh | 23:20 |
mriedem | grenade runs in singleconductor mode so it's logged there | 23:20 |
melwitt | nice | 23:20 |
melwitt | so let's see if there's a difference in that job before the patch | 23:20 |
melwitt | looks like it, new request-id as of async_init_instance_info http://logs.openstack.org/58/540258/10/check/neutron-grenade/54f96c0/logs/screen-n-sch.txt.gz#_Jul_24_06_24_31_353068 | 23:22 |
melwitt | wait, but I still see the request-id prior to that being logged too | 23:22 |
melwitt | hm | 23:22 |
melwitt | oh, bc it has its own greenthread. duh. yeah so that one wouldn't show anything anyway | 23:28 |
melwitt | it can't cause a change in any other greenthread's local context | 23:29 |
* melwitt will bbl | 23:34 | |
*** s10 has quit IRC | 23:37 | |
*** s10 has joined #openstack-nova | 23:38 | |
*** s10 has quit IRC | 23:38 | |
jaypipes | sean-k-mooney: sure, just add a patch on top. | 23:38 |
*** s10 has joined #openstack-nova | 23:39 | |
*** s10 has quit IRC | 23:39 | |
*** s10 has joined #openstack-nova | 23:39 | |
*** s10 has quit IRC | 23:40 | |
*** s10 has joined #openstack-nova | 23:40 | |
*** s10 has quit IRC | 23:41 | |
*** s10 has joined #openstack-nova | 23:41 | |
*** s10 has quit IRC | 23:41 | |
*** Swami has quit IRC | 23:46 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: api-ref: fix min_version for parent_provider_uuid in responses https://review.openstack.org/579577 | 23:47 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: api-ref: Add descriptions for rebuild https://review.openstack.org/588931 | 23:57 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!