*** gyee has quit IRC | 00:00 | |
*** cdent has quit IRC | 00:02 | |
*** tbachman has quit IRC | 00:04 | |
*** mvkr has quit IRC | 00:05 | |
*** mvkr has joined #openstack-nova | 00:18 | |
*** prometheanfire has left #openstack-nova | 00:24 | |
*** hongbin has joined #openstack-nova | 00:26 | |
*** macza has quit IRC | 00:41 | |
*** erlon has quit IRC | 00:41 | |
*** sapd1 has joined #openstack-nova | 00:46 | |
*** imacdonn has quit IRC | 00:49 | |
*** imacdonn has joined #openstack-nova | 00:49 | |
*** moshele has quit IRC | 00:53 | |
*** bhagyashris has joined #openstack-nova | 01:08 | |
*** bzhao__ has joined #openstack-nova | 01:10 | |
*** mhen has quit IRC | 01:13 | |
*** mhen has joined #openstack-nova | 01:17 | |
*** mrsoul has quit IRC | 01:25 | |
*** mriedem has quit IRC | 01:26 | |
*** janki has joined #openstack-nova | 01:28 | |
*** Dinesh_Bhor has joined #openstack-nova | 01:35 | |
*** brinzhang has joined #openstack-nova | 01:51 | |
*** Bhujay has joined #openstack-nova | 02:00 | |
*** bhagyashris has quit IRC | 02:00 | |
*** gcb_ has joined #openstack-nova | 02:07 | |
*** Dinesh_Bhor has quit IRC | 02:25 | |
*** bhagyashris has joined #openstack-nova | 02:25 | |
*** gbarros has quit IRC | 02:28 | |
*** gbarros has joined #openstack-nova | 02:37 | |
*** Dinesh_Bhor has joined #openstack-nova | 02:40 | |
*** gbarros has quit IRC | 02:53 | |
*** markvoelker has joined #openstack-nova | 03:01 | |
*** bhagyashris has quit IRC | 03:01 | |
*** psachin has joined #openstack-nova | 03:06 | |
*** janki has quit IRC | 03:06 | |
*** Bhujay has quit IRC | 03:24 | |
*** yikun has joined #openstack-nova | 03:24 | |
*** Nel1x has quit IRC | 03:34 | |
*** udesale has joined #openstack-nova | 04:00 | |
*** Dinesh_Bhor has quit IRC | 04:07 | |
*** hongbin has quit IRC | 04:10 | |
*** janki has joined #openstack-nova | 04:27 | |
*** psachin has quit IRC | 04:27 | |
*** hoonetorg has quit IRC | 04:29 | |
*** gbarros has joined #openstack-nova | 04:31 | |
*** markvoelker has quit IRC | 04:32 | |
*** markvoelker has joined #openstack-nova | 04:43 | |
*** Bhujay has joined #openstack-nova | 04:45 | |
*** hoonetorg has joined #openstack-nova | 04:47 | |
*** Bhujay has quit IRC | 04:51 | |
*** markvoelker has quit IRC | 04:52 | |
*** Bhujay has joined #openstack-nova | 04:53 | |
*** hoonetorg has quit IRC | 04:54 | |
*** holser_ has joined #openstack-nova | 04:59 | |
*** ratailor has joined #openstack-nova | 05:04 | |
*** hoonetorg has joined #openstack-nova | 05:06 | |
*** Dinesh_Bhor has joined #openstack-nova | 05:10 | |
*** psachin has joined #openstack-nova | 05:14 | |
*** gbarros has quit IRC | 05:42 | |
*** jaosorior has quit IRC | 05:45 | |
*** jaosorior has joined #openstack-nova | 05:47 | |
*** Luzi has joined #openstack-nova | 05:51 | |
vishakha | mriedem: Hi, is https://review.openstack.org/#/c/580271/ a valid change? Kindly review; I am a little confused by the comments. Thanks | 05:52 |
*** abhishekk has joined #openstack-nova | 05:55 | |
*** moshele has joined #openstack-nova | 06:01 | |
*** holser_ has quit IRC | 06:02 | |
*** janki has quit IRC | 06:06 | |
*** yikun has quit IRC | 06:11 | |
*** abhishekk has quit IRC | 06:18 | |
*** alexchadin has joined #openstack-nova | 06:27 | |
*** ccamacho has joined #openstack-nova | 06:28 | |
openstackgerrit | Chen proposed openstack/nova stable/rocky: Update ssh configuration doc https://review.openstack.org/594041 | 06:31 |
*** maciejjozefczyk has joined #openstack-nova | 06:33 | |
*** ratailor_ has joined #openstack-nova | 06:34 | |
*** ratailor has quit IRC | 06:35 | |
*** ratailor_ has quit IRC | 06:36 | |
*** ratailor has joined #openstack-nova | 06:36 | |
*** abhishekk has joined #openstack-nova | 06:36 | |
openstackgerrit | Chen proposed openstack/nova stable/rocky: Revisons on notifications doc https://review.openstack.org/594042 | 06:37 |
*** rcernin has quit IRC | 06:38 | |
*** ratailor_ has joined #openstack-nova | 06:38 | |
*** ratailor__ has joined #openstack-nova | 06:40 | |
*** rcernin has joined #openstack-nova | 06:40 | |
*** ratailor has quit IRC | 06:41 | |
*** ratailor__ has quit IRC | 06:41 | |
*** ratailor__ has joined #openstack-nova | 06:41 | |
*** pcaruana has joined #openstack-nova | 06:42 | |
*** ratailor_ has quit IRC | 06:43 | |
*** abhishekk has quit IRC | 06:47 | |
*** luksky has joined #openstack-nova | 06:50 | |
*** adrianc has joined #openstack-nova | 06:51 | |
*** rcernin has quit IRC | 06:51 | |
*** tssurya has joined #openstack-nova | 06:52 | |
*** NostawRm has quit IRC | 07:09 | |
openstackgerrit | Jiri Suchomel proposed openstack/nova master: Ignore deleted instances when populating with availability zones https://review.openstack.org/594050 | 07:13 |
gmann | alex_xu: i am on vacation for 2 weeks (till 31st Aug), so i will not be able to do the API office hour. | 07:17 |
gmann | melwitt: ^^ i will be able to provide API updates from 31st Aug onward (on vacation for 2 weeks.) | 07:17 |
*** alexchadin has quit IRC | 07:21 | |
*** sahid has joined #openstack-nova | 07:22 | |
*** alexchadin has joined #openstack-nova | 07:32 | |
*** tetsuro has quit IRC | 07:36 | |
*** jpena|off is now known as jpena | 07:40 | |
*** macza has joined #openstack-nova | 08:06 | |
*** tetsuro has joined #openstack-nova | 08:08 | |
*** yankcrime has joined #openstack-nova | 08:11 | |
*** macza has quit IRC | 08:11 | |
*** burt has quit IRC | 08:20 | |
*** burt has joined #openstack-nova | 08:21 | |
*** cdent has joined #openstack-nova | 08:33 | |
*** macza has joined #openstack-nova | 08:48 | |
*** sahid has quit IRC | 08:53 | |
*** alexchadin has quit IRC | 08:53 | |
*** macza has quit IRC | 08:53 | |
*** holser_ has joined #openstack-nova | 08:53 | |
*** macza has joined #openstack-nova | 09:09 | |
*** macza has quit IRC | 09:14 | |
*** alexchadin has joined #openstack-nova | 09:14 | |
*** luksky has quit IRC | 09:28 | |
*** macza has joined #openstack-nova | 09:30 | |
*** macza has quit IRC | 09:35 | |
*** Dinesh_Bhor has quit IRC | 09:39 | |
*** mriedem has joined #openstack-nova | 09:42 | |
mriedem | o/ | 09:43 |
kashyap | Isn't it terribly early for you there? | 09:44 |
* cdent checks the time | 09:44 | |
cdent | jet lag? | 09:44 |
sean-k-mooney | mriedem: o/ | 09:44 |
* kashyap waves hi, getting back after a 2-ish week PTO | 09:44 | |
mriedem | jet lag and work on the brain | 09:45 |
lbragstad | mriedem: same, my sleep schedule is completely screwed | 09:45 |
sean-k-mooney | mriedem: well, do what jay pipes used to do: if you're up early, get work done early, be done by 1 or 2 pm and enjoy the rest of your day | 09:46 |
* kashyap just bought the book (everybody and their dog are recommending it) -- https://www.amazon.com/Why-We-Sleep-Unlocking-Dreams/dp/1501144316. Almost at every page the author backs up his claims based on solid science | 09:47 | |
kashyap | mriedem: ^ Might want to check it out, when you're not sleeping :P | 09:48 |
kashyap | (FWIW, the author is not a "journalist" writing junk 'pop science'; he's a serious researcher on that topic.) | 09:49 |
*** sahid has joined #openstack-nova | 09:56 | |
mriedem | sean-k-mooney: question in https://bugs.launchpad.net/nova/+bug/1788014 | 09:57 |
openstack | Launchpad bug 1788014 in OpenStack Compute (nova) "when live migration fails due to a internal error rollback is not handeled correctly." [Undecided,New] | 09:57 |
*** dtantsur|afk is now known as dtantsur | 09:59 | |
*** adrianc has quit IRC | 10:00 | |
*** luksky has joined #openstack-nova | 10:02 | |
*** dpawlik_ has quit IRC | 10:03 | |
sean-k-mooney | mriedem: hi, i think that is the issue, yes. i have not had time to pin down the exact cause, but i suspect it's because we are not activating the source binding after deleting the dest one | 10:03 |
mriedem | hmm, i guess i would have expected neutron to automatically activate the source host port bindings when the dest host bindings were deleted | 10:04 |
mriedem | because when we activate the dest host bindings, neutron automatically de-activates the source host bindings, | 10:04 |
mriedem | so my thinking was when we delete the dest bindings on rollback, neutron would say, oh i need to activate the only other bindings (source) left | 10:04 |
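A toy model of the port-binding behaviour being debated may help frame the rollback question. It is a sketch under stated assumptions, not neutron or nova code: the Port class and rollback helper are hypothetical, activating one binding deactivates the others (as mriedem describes), and deleting a binding is assumed NOT to re-activate any remaining binding, which is the behaviour sean-k-mooney suspects.

```python
class Port:
    """Hypothetical port with one binding per host; at most one is ACTIVE."""

    def __init__(self, source_host):
        # The port starts out bound, and active, on the source host.
        self.bindings = {source_host: "ACTIVE"}

    def create_binding(self, host):
        # pre_live_migration: the dest binding is created INACTIVE.
        self.bindings[host] = "INACTIVE"

    def activate(self, host):
        # Activating one binding deactivates every other binding.
        for h in self.bindings:
            self.bindings[h] = "INACTIVE"
        self.bindings[host] = "ACTIVE"

    def delete_binding(self, host):
        # Assumption: deleting a binding does not promote a remaining one.
        self.bindings.pop(host, None)

    def active_host(self):
        for h, status in self.bindings.items():
            if status == "ACTIVE":
                return h
        return None


def rollback(port, source, dest):
    """Hypothetical rollback that explicitly re-activates the source binding."""
    port.delete_binding(dest)
    if port.active_host() is None:
        port.activate(source)
```

Under this model, deleting the dest binding after it was activated (post-copy or post-live-migration) leaves the port with no active binding, while a rollback that happens before activation leaves the source binding untouched.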
mriedem | i could have a wip patch for you to test with if you still have that live migration env available | 10:05 |
*** dpawlik has joined #openstack-nova | 10:08 | |
*** adrianc has joined #openstack-nova | 10:09 | |
*** macza has joined #openstack-nova | 10:12 | |
sean-k-mooney | mriedem: i have the devstack vms shut down but i can have it set up quickly again | 10:16 |
openstackgerrit | Slawek Kaplonski proposed openstack/os-vif master: Avoid os-vif to add ovs ports as trunk by default https://review.openstack.org/594118 | 10:16 |
*** macza has quit IRC | 10:16 | |
sean-k-mooney | i'm going to work on the first two neutron bugs first | 10:16 |
sean-k-mooney | ^ that is confusing.... we don't | 10:17 |
sapd1 | sean-k-mooney: Are you working on SR-IOV attach/detach? | 10:20 |
sapd1 | :D | 10:20 |
sean-k-mooney | sapd1: it's on my todo list | 10:21 |
*** alexchadin has quit IRC | 10:25 | |
*** Bhujay has quit IRC | 10:26 | |
openstackgerrit | Jiri Suchomel proposed openstack/nova master: Set default AZ explicitely for instances without host. Ignore deleted instances when populating with availability zones https://review.openstack.org/594050 | 10:26 |
openstackgerrit | Jiri Suchomel proposed openstack/nova master: Set default AZ explicitely for instances without host. https://review.openstack.org/594050 | 10:27 |
*** neha30 has quit IRC | 10:30 | |
mriedem | tssurya: on ^, we should just filter out instances w/o a host | 10:31 |
mriedem | default_availability_zone isn't the right config option for instances | 10:32 |
mriedem | default_schedule_zone is, but it defaults to None so it wouldn't fix the bug | 10:32 |
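A minimal sketch of the "just filter out instances without a host" approach when populating instance AZs. The Instance tuple, instances_to_populate and populate_azs are hypothetical stand-ins, not the nova online data migration; 'nova' is used as the fallback only because that is the default of default_availability_zone per this discussion. The assumption illustrated: if host-less instances are selected as still needing an AZ but can never be given one, a loop that repeats until nothing is left to migrate never finishes, so they are excluded up front.

```python
from collections import namedtuple

# Hypothetical stand-in for an instance record, with only the fields used here.
Instance = namedtuple("Instance", "uuid host availability_zone")


def instances_to_populate(instances):
    """Yield instances that still lack an AZ and actually have a host."""
    for inst in instances:
        if inst.availability_zone is not None:
            continue  # already populated
        if inst.host is None:
            continue  # e.g. NoValidHost or shelved offloaded: skip it
        yield inst


def populate_azs(instances, host_to_az, default_az="nova"):
    """Return {uuid: az}, looking up each host's AZ with a fallback."""
    return {
        inst.uuid: host_to_az.get(inst.host, default_az)
        for inst in instances_to_populate(instances)
    }
```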
*** macza has joined #openstack-nova | 10:33 | |
*** Dinesh_Bhor has joined #openstack-nova | 10:34 | |
*** alexchadin has joined #openstack-nova | 10:36 | |
sean-k-mooney | mriedem: isn't the default availability zone nova? | 10:36 |
mriedem | the default for default_availability_zone is nova | 10:36 |
mriedem | if the instance is on a host | 10:36 |
mriedem | default_schedule_zone is the thing we set on instance.availability_zone if the user didn't request an az | 10:37 |
mriedem | and that defaults to None | 10:37 |
*** ratailor__ has quit IRC | 10:37 | |
*** ratailor has joined #openstack-nova | 10:37 | |
*** macza has quit IRC | 10:38 | |
sean-k-mooney | oh ok, does horizon handle that differently? | 10:38 |
sean-k-mooney | or does devstack set them both to nova? | 10:38 |
mriedem | no | 10:39 |
mriedem | GET /servers/{id} will return '' for the az if the instance doesn't have a host set https://github.com/openstack/nova/blob/722d5b477219f0a2435a9f4ad4d54c61b83219f1/nova/api/openstack/compute/views/servers.py#L170 | 10:40 |
*** ratailor_ has joined #openstack-nova | 10:40 | |
*** ratailor_ has quit IRC | 10:40 | |
sean-k-mooney | mriedem: when can an instance not have a host set? when it's shelved? | 10:41 |
mriedem | if it fails during scheduling | 10:42 |
mriedem | NoValidHost | 10:42 |
sean-k-mooney | ah ok | 10:42 |
mriedem | and yes if it's shelved offloaded | 10:42 |
mriedem | i'm not sure that we clear out the instance availability_zone on shelve offload though | 10:42 |
sean-k-mooney | well, it makes sense: if it's not scheduled to a node it should not have an az, right? | 10:42 |
mriedem | if not, that's likely another bug | 10:42 |
mriedem | correct | 10:42 |
mriedem | that's what i'm saying in the review | 10:43 |
*** ratailor has quit IRC | 10:43 | |
mriedem | melwitt: i've marked https://bugs.launchpad.net/nova/+bug/1788115 for rc potential | 10:44 |
openstack | Launchpad bug 1788115 in OpenStack Compute (nova) "nova-manage db online_data_migrations hangs on instances with no host set" [Medium,In progress] - Assigned to Jiri Suchomel (jsuchome) | 10:44 |
mriedem | yeah we don't clear out the instance.az on shelve offload | 10:46 |
mriedem | we will update it on unshelve though | 10:46 |
mriedem | https://github.com/openstack/nova/blob/722d5b477219f0a2435a9f4ad4d54c61b83219f1/nova/conductor/manager.py#L815 | 10:46 |
mriedem | which reminds me https://review.openstack.org/#/c/559828/ | 10:47 |
mriedem | as noted in ^ we also don't clear the port binding | 10:48 |
tssurya | mriedem: back from lunch. oh okay, yeah, it would be better to filter out the ones where host is None altogether | 10:49 |
mriedem | tssurya: i think that still works for your original bug too | 10:49 |
mriedem | comments inline on why | 10:50 |
* tssurya looking | 10:50 | |
*** macza has joined #openstack-nova | 10:54 | |
*** macza has quit IRC | 10:59 | |
tssurya | mriedem: I agree, | 10:59 |
*** Dinesh_Bhor has quit IRC | 11:05 | |
*** jpena is now known as jpena|lunch | 11:10 | |
*** Bhujay has joined #openstack-nova | 11:13 | |
*** udesale has quit IRC | 11:25 | |
*** macza has joined #openstack-nova | 11:35 | |
*** macza has quit IRC | 11:40 | |
openstackgerrit | Brin Zhang proposed openstack/nova-specs master: Resource retrieving: add change-before filter https://review.openstack.org/591976 | 11:44 |
openstackgerrit | Jiri Suchomel proposed openstack/nova master: Filter out instances without a host when populating AZ https://review.openstack.org/594050 | 11:47 |
*** MasterofJOKers has quit IRC | 11:49 | |
*** ujjain has quit IRC | 11:51 | |
mriedem | sean-k-mooney: see how ^ floats your boat | 11:53 |
mriedem | oops | 11:53 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Re-activate source host port bindings on live migration rollback https://review.openstack.org/594139 | 11:53 |
mriedem | sean-k-mooney: ^ | 11:53 |
mriedem | i want to see what miguel thinks about that too | 11:53 |
sean-k-mooney | mriedem: i'll test that and see. the other thing is i'm not sure what state qemu is in when this particular bug happens, given that the monitor closed. | 11:55 |
sean-k-mooney | that is kind of a separate bug, however | 11:56 |
sean-k-mooney | at least with this patch, if i do a hard reboot i think everything should work properly again. | 11:56 |
*** MasterofJOKers has joined #openstack-nova | 11:56 | |
mriedem | i am a bit surprised that qemu would bomb out after we went into post-copy mode | 11:56 |
mriedem | we only activate the dest host port bindings during post-copy or post-live migration | 11:56 |
mriedem | so your qemu failure must have happened after post-copy for us to activate the dest host bindings | 11:57 |
mriedem | you'd know if you saw "Binding ports to destination host" in the source host compute logs | 11:57 |
*** yikun_ has joined #openstack-nova | 11:57 | |
sean-k-mooney | i'll reproduce and check that, then apply your patch and see what happens | 11:58 |
*** ujjain has joined #openstack-nova | 12:00 | |
*** macza has joined #openstack-nova | 12:04 | |
openstackgerrit | Yikun Jiang (Kero) proposed openstack/nova master: [placement] Use oslotest uuidsentinel https://review.openstack.org/594144 | 12:05 |
mriedem | cdent: ^ | 12:05 |
cdent | nice | 12:05 |
openstackgerrit | Brin Zhang proposed openstack/nova-specs master: Resource retrieving: add change-before filter https://review.openstack.org/591976 | 12:07 |
cdent | mriedem: wait | 12:07 |
cdent | nm | 12:08 |
cdent | efried: did you see https://review.openstack.org/#/c/594068/ | 12:08 |
efried | looking... | 12:08 |
*** macza has quit IRC | 12:08 | |
efried | cdent: Thanks, I was just starting to poke on that. | 12:09 |
cdent | efried: cool, wasn't sure if you had started your own | 12:09 |
*** macza has joined #openstack-nova | 12:11 | |
mriedem | clearly that will have to go through the release dance | 12:11 |
*** macza has quit IRC | 12:12 | |
*** jpena|lunch is now known as jpena | 12:18 | |
openstackgerrit | Brin Zhang proposed openstack/nova-specs master: Resource retrieving: add change-before filter https://review.openstack.org/591976 | 12:23 |
openstackgerrit | Slawek Kaplonski proposed openstack/os-vif master: Avoid os-vif to add untagged ports to ovs ports by default https://review.openstack.org/594118 | 12:23 |
*** brinzhang has quit IRC | 12:24 | |
openstackgerrit | Jiri Suchomel proposed openstack/nova master: Filter out instances without a host when populating AZ https://review.openstack.org/594050 | 12:26 |
*** Luzi has quit IRC | 12:28 | |
openstackgerrit | Slawek Kaplonski proposed openstack/os-vif master: DNM Testing different CDN projects with DEAD_VLAN_TAG https://review.openstack.org/594153 | 12:28 |
*** tbachman has joined #openstack-nova | 12:32 | |
*** macza has joined #openstack-nova | 12:35 | |
*** tbachman_ has joined #openstack-nova | 12:35 | |
*** Luzi has joined #openstack-nova | 12:35 | |
*** tbachman has quit IRC | 12:37 | |
*** tbachman_ is now known as tbachman | 12:37 | |
*** macza has quit IRC | 12:40 | |
tssurya | alex_xu, gmann or other api-experts: nova list seems to have this filter "--instance-name" which doesn't seem to be processed anywhere; I was wondering where/why it is used? | 12:58 |
*** NostawRm has joined #openstack-nova | 13:00 | |
alex_xu | tssurya: I didn't even know we had an '--instance-name' filter in the API | 13:00 |
alex_xu | I don't think we have that in the API | 13:01 |
tssurya | alex_xu: I didn't find it either, but it's listed in the options | 13:01 |
mriedem | what is it translated to in novaclient? | 13:01 |
mriedem | added in 2011 by rackspace so it was probably something in rax | 13:03 |
*** efried is now known as efried_goatin | 13:03 | |
mriedem | not upstream | 13:03 |
mriedem | looks like it's meant to filter on OS-EXT-SRV-ATTR:instance_name | 13:04 |
alex_xu | we have attribute 'OS-EXT-SRV-ATTR:instance_name', try to find out if there is any translate | 13:04 |
alex_xu | mriedem: yea | 13:04 |
mriedem | right which would be instance.name | 13:04 |
mriedem | filtering on --name would be instance.display_name | 13:04 |
tssurya | but we also have the "--name" | 13:04 |
tssurya | ah okay | 13:04 |
mriedem | https://github.com/openstack/python-novaclient/commit/dcd5544133f1cc1171f8078b2ed54143b52fb064 | 13:06 |
alex_xu | but it doesn't work in server side | 13:07 |
tssurya | alex_xu: yea, was just trying it | 13:07 |
*** s10 has joined #openstack-nova | 13:08 | |
tssurya | because we don't have it on the server side right ? | 13:08 |
alex_xu | yes, I think so | 13:09 |
alex_xu | gmann: enjoy your vacation! | 13:09 |
tssurya | cool, should I open a bug against the client to remove the option for users? or do we plan to support it on the server side? | 13:10 |
mriedem | that initial novaclient change wasn't even correct, | 13:11 |
mriedem | it was later updated in https://github.com/openstack/python-novaclient/commit/fc8e5e3fe3a1164eb2e923ed599e63a2af1a4f3c | 13:11 |
alex_xu | tssurya: we have a filter called 'name'; it should be the one behind '--instance-name' | 13:11 |
tssurya | either way, I was trying to skip the minimal constructs for down cells for "all" filters and came across this one, which doesn't follow the rules and does nothing except print everything | 13:11 |
alex_xu | tssurya: sorry, that 'name' maps to 'display_name' as well | 13:12 |
tssurya | alex_xu: yea, the documentation for those options need to be more clear to explain what means what if we are going to have both | 13:12 |
alex_xu | tssurya: yea | 13:12 |
tssurya | mriedem: oh, so you want to keep both ? | 13:13 |
mriedem | not necessarily, | 13:13 |
mriedem | clearly there is a bug in novaclient which needs to be reported | 13:13 |
tssurya | mriedem: right, I can open a bug now and we can see if this option is really useful to implement on the server side, else we can punt it. At least the documentation should be clearer for those options | 13:14 |
mriedem | so the --name filter in nova list is being mapped to filter on instance.name rather than display_name? | 13:15 |
mriedem | https://github.com/openstack/python-novaclient/commit/fc8e5e3fe3a1164eb2e923ed599e63a2af1a4f3c | 13:16 |
mriedem | oops | 13:16 |
mriedem | filter_mapping = { | 13:16 |
mriedem | 'image': 'image_ref', | 13:16 |
mriedem | 'name': 'display_name', | 13:16 |
mriedem | so we map name to display_name in the compute API code | 13:16 |
mriedem | and instance_name should map to 'name' | 13:16 |
mriedem | is what i think alex_xu was saying | 13:16 |
mriedem | if we want to support that in the server | 13:16 |
mriedem | but the client side --instance-name filter doesn't do anything today, right? | 13:16 |
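The compute API renames API filter keys to instance field names using a mapping like the filter_mapping fragment quoted above. The sketch below is hypothetical and partial: the 'image' and 'name' entries come from the chat, while the 'instance_name' entry only illustrates what server-side support would look like (which mriedem notes would likely need a microversion). FILTER_MAPPING and translate_filters are made-up names, not nova's.

```python
# Partial mapping: first two entries as quoted in the chat.
FILTER_MAPPING = {
    "image": "image_ref",
    "name": "display_name",
    # Hypothetical: only if server-side --instance-name support were re-added.
    "instance_name": "name",
}


def translate_filters(api_filters, mapping=FILTER_MAPPING):
    """Rename API-level filter keys to instance field names.

    Keys without a mapping entry are passed through unchanged.
    """
    return {mapping.get(key, key): value for key, value in api_filters.items()}
```

This also shows why the two CLI options are easy to confuse: --name filters on instance.display_name, while --instance-name would filter on instance.name.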
alex_xu | mriedem: we have instance_name filter long time before https://github.com/openstack/nova/commit/1c90eb34085dbb69f37e2f63dea7496afabb06b3#diff-516904cc81cade24a9122ecf96707bf0R702 | 13:17 |
mriedem | right | 13:17 |
mriedem | (8:16:28 AM) mriedem: and instance_name should map to 'name' | 13:17 |
*** psachin has quit IRC | 13:17 | |
mriedem | so when was that removed? | 13:17 |
alex_xu | mriedem: probably the folsom release; I see that filter in that release, but it disappears after grizzly | 13:20 |
*** eharney has quit IRC | 13:26 | |
alex_xu | mriedem: tssurya here https://review.openstack.org/#/c/10917/3 | 13:26 |
mriedem | ah ok, and forgot to remove the novaclient side of that | 13:27 |
tssurya | ah thanks | 13:28 |
mriedem | and apparently no one has noticed since folsom | 13:28 |
*** erlon has joined #openstack-nova | 13:28 | |
openstackgerrit | Chris Dent proposed openstack/nova master: Set policy_opt defaults in placement gabbi fixture https://review.openstack.org/594172 | 13:28 |
mriedem | https://bugs.launchpad.net/python-novaclient/+bug/1295126/comments/3 | 13:29 |
openstack | Launchpad bug 1295126 in python-novaclient "Admin only shown for args that can be used by non-admin" [Wishlist,Fix released] - Assigned to Verónica Musso (veronica-a-musso) | 13:29 |
mriedem | "and --instance-name has no effect for both" | 13:29 |
mriedem | tssurya: alex_xu: i'd probably just deprecate the --instance-name option in nova list, it's not done anything since essex | 13:30 |
mriedem | adding the support server-side at this point is likely a microversion | 13:30 |
alex_xu | mriedem: yea, and it is admin-only filter so we can deprecate it | 13:30 |
tssurya | mriedem: ack, I don't think it's that essential a filter | 13:30 |
tssurya | it's not admin-only.. | 13:31 |
alex_xu | tssurya: can the instance_name field only be seen by the admin? | 13:33 |
mriedem | no, | 13:35 |
mriedem | OS-EXT-SRV-ATTR:instance_name is also shown for non-admins | 13:35 |
mriedem | it's in ExtendedServerAttributesController | 13:35 |
mriedem | oh wait no, alex_xu is correct | 13:36 |
mriedem | os_compute_api:os-extended-server-attributes defaults to admin-only | 13:36 |
mriedem | https://docs.openstack.org/nova/latest/configuration/policy.html | 13:37 |
*** efried_goatin is now known as efried | 13:37 | |
mriedem | stephenfin: see https://docs.openstack.org/nova/latest/configuration/policy.html and os_compute_api:os-extended-server-attributes - i thought we had reStructuredText formatting on policy option help? | 13:38 |
mriedem | maybe that's only in oslo.config option help? | 13:38 |
mriedem | efried: ? ^ | 13:38 |
*** ccamacho has quit IRC | 13:38 | |
efried | mriedem: Patch not merged. Lemme grab it... | 13:39 |
tssurya | oh, well it's confusing because the help for the options doesn't say it's admin-only, and the bug above ^ says they changed it: https://bugs.launchpad.net/python-novaclient/+bug/1295126/comments/6 | 13:39 |
openstack | Launchpad bug 1295126 in python-novaclient "Admin only shown for args that can be used by non-admin" [Wishlist,Fix released] - Assigned to Verónica Musso (veronica-a-musso) | 13:39 |
mriedem | tssurya: yup, | 13:39 |
mriedem | despite that one person saying it was never even used | 13:39 |
mriedem | tssurya: so just report a bug and deprecate --instance-name | 13:39 |
mriedem | i'll +2 that | 13:39 |
tssurya | mriedem: cool | 13:39 |
mriedem | it predates gerrit so i'm not surprised it's a mess | 13:40 |
stephenfin | mriedem: Yeah, just oslo.config, I think | 13:40 |
stephenfin | though I had it in my head oslo.policy wasn't broken in the first place. Obviously not | 13:40 |
efried | mriedem: | 13:41 |
efried | - nova patch to twiddle a couple of options to prove it works | 13:41 |
efried | - oslo.config patch to address complaint that using the rst role in help text shows up ugly in the sample: https://review.openstack.org/#/c/583064/ | 13:41 |
efried | https://review.openstack.org/#/c/583025/ shoulda been that first link, sorry | 13:41 |
stephenfin | efried: I think that's a different issue | 13:41 |
stephenfin | efried: mriedem's asking why newlines and the likes in policy.help aren't being parsed | 13:42 |
stephenfin | ...in the HTML output. Your patch affects the ini output, right? | 13:42 |
tssurya | okay, so instance.display_name is name and instance.hostname is OS-EXT-SRV-ATTR:hostname and we don't care about OS-EXT-SRV-ATTR:instance_name. | 13:43 |
efried | stephenfin: the oslo.config patch, yes. | 13:43 |
efried | oh, reread what mriedem was actually saying. Yeah, I don't know about that, sorry. | 13:44 |
efried | I would have asked stephenfin :) | 13:44 |
mriedem | ha | 13:44 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Filter out instances without a host when populating AZ https://review.openstack.org/594050 | 13:44 |
mriedem | ^ is likely an RC3 issue | 13:45 |
mriedem | regarding petr's email about install guide testing, | 13:47 |
stephenfin | mriedem: Agreed | 13:47 |
mriedem | i wonder how valid it would be, or how much time would be saved, to start up devstack without enabling nova, so that you can install nova manually after keystone/glance/cinder/neutron are already set up | 13:48 |
mriedem | i think the only major thing in the install guide in rocky was the placement db | 13:48 |
sean-k-mooney | mriedem: i think you will hit dependency issues | 13:49 |
*** awaugama has joined #openstack-nova | 13:49 | |
mriedem | on other openstack services? | 13:50 |
mriedem | or things like setting up libvirt? | 13:50 |
sean-k-mooney | well, neutron would expect to be able to talk to placement for things like routed networks | 13:50 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Filter out instances without a host when populating AZ https://review.openstack.org/594178 | 13:50 |
mriedem | routed networks are optional and devstack doesn't set those up anyway | 13:50 |
mriedem | we definitely *should* have a ci job that uses routed networks | 13:51 |
mriedem | across a 2-node deploy where each host is in a separate aggregate | 13:51 |
mriedem | but that would require time and people that care to make sure it continues to work | 13:51 |
sean-k-mooney | in theory devstack should be able to help, i guess | 13:52 |
*** gbarros has joined #openstack-nova | 13:52 | |
sean-k-mooney | mriedem: is placement installation considered part of the nova install guide | 13:52 |
*** moshele has quit IRC | 13:53 | |
*** alexchadin has quit IRC | 13:55 | |
dansmith | tssurya: looks like the down-cell stack needs rebasing again | 13:55 |
dansmith | presumably it's reviewable regardless? | 13:56 |
efried | mriedem: https://review.openstack.org/594179 <== alternative uuidsentinel impl | 13:56 |
tssurya | dansmith: yeah, it's ready for a first review | 13:56 |
tssurya | still working on filtering part | 13:56 |
dansmith | okay | 13:56 |
tssurya | but would be nice to get opinions | 13:56 |
tssurya | I have them as separate patches for now, will squash them with the version BUMP | 13:56 |
tssurya | once we review the approach | 13:57 |
tssurya | and, mriedem: sorry about missing the instance.host None case earlier on and the backport headaches. | 13:57 |
openstackgerrit | Jiri Suchomel proposed openstack/nova stable/pike: Filter out instances without a host when populating AZ https://review.openstack.org/594184 | 13:58 |
*** alexchadin has joined #openstack-nova | 13:58 | |
openstackgerrit | Surya Seetharaman proposed openstack/nova stable/queens: Filter out instances without a host when populating AZ https://review.openstack.org/594185 | 14:00 |
stephenfin | mriedem: https://bugs.launchpad.net/oslo.policy/+bug/1788183 | 14:02 |
openstack | Launchpad bug 1788183 in oslo.policy "Rule description not rendered as rST" [Undecided,New] | 14:02 |
mriedem | tssurya: not your fault, we have reviewers for a reason | 14:04 |
mriedem | and i obviously missed it as well | 14:04 |
mriedem | sean-k-mooney: i think so yes | 14:04 |
mriedem | efried: why not in oslotest? because of the circular dep? | 14:05 |
efried | mriedem: And because it's... a UUID util. And because just because I can't think of a reason for it to be used outside of test, doesn't mean it can't be. See commit message. | 14:05 |
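Both the oslotest change and efried's alternative move the uuidsentinel helper; the pattern itself is small. Attribute access on a shared object returns a random UUID string that stays stable for that name for the life of the process, so tests can write uuids.instance instead of hard-coding UUIDs. A minimal sketch of that pattern follows; the class name is hypothetical and this is not the code from either review.

```python
import uuid


class UUIDSentinels:
    """Hand out one stable, random UUID string per attribute name."""

    def __init__(self):
        self._registry = {}

    def __getattr__(self, name):
        # Only called when normal lookup fails. Reject private and dunder
        # names so that copy/pickle probing does not mint sentinels.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._registry:
            self._registry[name] = str(uuid.uuid4())
        return self._registry[name]


# Shared instance, used as e.g. uuids.instance or uuids.host.
uuids = UUIDSentinels()
```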
openstackgerrit | Matt Riedemann proposed openstack/nova stable/queens: Filter out instances without a host when populating AZ https://review.openstack.org/594185 | 14:06 |
sean-k-mooney | mriedem: in that case, if you wanted to test the nova install guide and use devstack to help, you would just have devstack install keystone, mysql and rabbitmq, and perhaps memcached | 14:07 |
mriedem | but nova also needs neutron | 14:07 |
mriedem | and i don't want to go through the neutron install guide to test nova's install | 14:07 |
mriedem | same for glance | 14:07 |
mriedem | cue kevin fox to say it should all be one monolithic install | 14:08 |
*** ccamacho has joined #openstack-nova | 14:08 | |
sean-k-mooney | well i would not expect the install guide for nova to cover the glance or neutron parts | 14:09 |
sean-k-mooney | i also would not assume you could boot a vm after finishing it | 14:09 |
sean-k-mooney | i would just assume i had the nova components deployed and functioning | 14:09 |
sean-k-mooney | e.g. nova hypervisor-list should show all the resources, but openstack server create would fail | 14:10 |
mriedem | well, if i'm installing nova, i would like to be able to create a vm by the end of it | 14:10 |
mriedem | otherwise i don't know if i f'ed up the install somewhere | 14:10 |
sean-k-mooney | in that case it does have to be a multi service install guide | 14:11 |
mriedem | also, https://docs.openstack.org/nova/latest/install/controller-install-ubuntu.html#install-and-configure-components refers to the neutron install guide | 14:11 |
*** alexchadin has quit IRC | 14:13 | |
sean-k-mooney | i guess referring to the other guide also works. that said, until nova-network is fully dead, neutron is technically not a nova dependency | 14:13 |
sean-k-mooney | but i could see adding neutron to the devstack install. | 14:13 |
*** alexchadin has joined #openstack-nova | 14:13 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/pike: Filter out instances without a host when populating AZ https://review.openstack.org/594184 | 14:14 |
sean-k-mooney | glance i guess would also be required, because there is no way to boot a vm otherwise, unless you used the fake drivers | 14:14 |
*** Luzi has quit IRC | 14:23 | |
*** eharney has joined #openstack-nova | 14:23 | |
*** mlavalle has joined #openstack-nova | 14:25 | |
*** vivsoni has quit IRC | 14:25 | |
*** lbragstad has quit IRC | 14:26 | |
*** munimeha1 has joined #openstack-nova | 14:27 | |
mriedem | jroll: pretty sure this has always been true yeah? https://bugs.launchpad.net/nova/+bug/1787509 | 14:27 |
openstack | Launchpad bug 1787509 in OpenStack Compute (nova) "Baremetal filters and default filters cannot be used simultaneously in the same nova" [Undecided,New] | 14:27 |
mriedem | until pike anyway | 14:27 |
dansmith | mriedem: so reviewing tssurya's series just now made me (re-)realize | 14:34 |
dansmith | we're still iterating all of the instances from a cell before returning them in order to do the fault stuff | 14:35 |
mriedem | dansmith: i think when you were adding instance lister, | 14:35 |
mriedem | you pre-joined faults in the db api and it didn't seem to make a difference in perf | 14:35 |
mriedem | and it might have caused some other issue | 14:35 |
dansmith | so I'm surprised we gained as much as we did by my batching, and so I wonder if we push the faults into the batches if it would help | 14:36 |
dansmith | mriedem: yeah, I remember that now | 14:36 |
mriedem | we also only show fault if the vm state is ERROR or DELETED | 14:36 |
dansmith | right | 14:36 |
*** Luzi has joined #openstack-nova | 14:36 | |
mriedem | well this bug says nova list is too slow https://bugs.launchpad.net/nova/+bug/1788149 | 14:37 |
openstack | Launchpad bug 1788149 in OpenStack Compute (nova) "nova list too slow" [Undecided,New] | 14:37 |
mriedem | so there is that | 14:37 |
*** Luzi has quit IRC | 14:37 | |
dansmith | heh | 14:37 |
tssurya | nice | 14:38 |
mriedem | alex_xu: did you say gmann was on vacation? https://review.openstack.org/#/c/584223/ | 14:41 |
mriedem | (8:09:57 AM) alex_xu: gmann: enjoy your vacation! | 14:41 |
tssurya | mriedem: yes untill 31st | 14:41 |
tssurya | until* | 14:42 |
sean-k-mooney | mriedem: regarding the migration issue i dont see "Binding ports to destination host" in either the source or dest compute logs | 14:45 |
sean-k-mooney | the dest does have "Plugging VIFs using destination host port bindings before live migration." and "Deleted binding for port 3218fd70-ea82-4ee1-9a5b-2d3c9d8b9fa0 and host devstack2." | 14:46 |
mriedem | the former is when we do pre_live_migration on the dest host, | 14:46 |
mriedem | at that point port bindings are still active for the source host | 14:46 |
mriedem | the latter is when we're rolling back after the failed migration | 14:47 |
mriedem | so i'm not sure that my patch would fix your issue if we never deactivated the source host bindings | 14:47 |
*** r-daneel has joined #openstack-nova | 14:47 | |
mriedem | that's why i was saying i'd be surprised if it fixed it b/c it would mean the failure happened after post-copy | 14:47 |
sean-k-mooney | ya. i expected to see boot. could deleting the port bindings be the root of the issue? | 14:47 |
mriedem | maybe? | 14:47 |
mriedem | maybe neutron f's up and screws up the active source host port binding? | 14:48 |
sean-k-mooney | ill apply the patch in anycase and see what happens | 14:48 |
s10 | mriedem: bug https://bugs.launchpad.net/nova/+bug/1788149 could be eventlet related and maybe caused by nova-neutron connection on every nova show/nova list (https://bugs.launchpad.net/nova/+bug/1567655) | 14:50 |
openstack | Launchpad bug 1788149 in OpenStack Compute (nova) "nova list too slow" [Undecided,Incomplete] | 14:50 |
openstack | Launchpad bug 1567655 in OpenStack Compute (nova) "500 error when trying to list instances and neutron-server is down" [Medium,Confirmed] | 14:50 |
mriedem | s10: hmm yeah good point re eventlet | 14:51 |
mriedem | also if they are running nova-api via wsgi in pike we aren't monkey patching eventlet | 14:51 |
*** gbarros has quit IRC | 14:52 | |
mriedem | and yeah https://bugs.launchpad.net/nova/+bug/1567655 came up again with my product team overlords last week | 14:52 |
openstack | Launchpad bug 1567655 in OpenStack Compute (nova) "500 error when trying to list instances and neutron-server is down" [Medium,Confirmed] | 14:52 |
mriedem | regarding perf/scaling issues with nova api | 14:52 |
*** gbarros has joined #openstack-nova | 14:52 | |
mriedem | tl;dr we should cache the port security group information in instance.info_cache like we do for other port information | 14:52 |
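A minimal sketch of the caching idea, assuming each cached VIF in `instance.info_cache` is a dict and that a new, hypothetical `security_groups` key would hold the port's group names. `lookup_port_sgs` is a hypothetical stand-in for the per-port Neutron query that listing currently pays for; writing the answer back on a miss is a simplification of nova's real info_cache refresh.

```python
from typing import Callable, Dict, List


def security_groups_for_instance(
        info_cache: List[Dict],
        lookup_port_sgs: Callable[[str], List[str]],
) -> List[str]:
    """Collect security group names for an instance's ports.

    Uses the hypothetical 'security_groups' key on each cached VIF and
    only falls back to the Neutron lookup on a cache miss.
    """
    names: List[str] = []
    for vif in info_cache:
        cached = vif.get("security_groups")
        if cached is None:
            cached = lookup_port_sgs(vif["id"])  # network round trip
            vif["security_groups"] = cached      # remember it for next time
        for name in cached:
            if name not in names:  # de-duplicate, keep first-seen order
                names.append(name)
    return names
```

With the cache populated, listing N servers needs no Neutron calls for security groups, which also sidesteps the "500 when neutron-server is down" failure mode for this particular field.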
*** ccamacho has quit IRC | 14:53 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Merge config drive extension response into server controller https://review.openstack.org/584223 | 14:53 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Merge extended server attributes extension response https://review.openstack.org/584590 | 14:53 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Merge keypair extension response into server view builder https://review.openstack.org/584748 | 14:53 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Merge server usage extension response into server view builder https://review.openstack.org/585262 | 14:53 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Merge security groups extension response into server view builder https://review.openstack.org/585475 | 14:53 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Merge extended_status extension response into server view builder https://review.openstack.org/592092 | 14:53 |
openstackgerrit | Dan Smith proposed openstack/nova master: WIP: Record cell success/failure/timeout in CrossCellLister https://review.openstack.org/594265 | 14:56 |
dansmith | tssurya: ^ | 14:56 |
*** alexchadin has quit IRC | 14:56 | |
dansmith | tssurya: what if we do that, and then in get_instance_objects_sorted() (or above) we just get a handle on InstanceLister itself | 14:56 |
dansmith | tssurya: then we can construct the missing instances from the failed cells separately from the intricate multi-cell-listing logic? | 14:57 |
tssurya | dansmith: sounds good, guess it's better to move it out of the generator | 14:57 |
*** alexchadin has joined #openstack-nova | 14:57 | |
sean-k-mooney | mriedem: this is really annoying. im not seeing the issue going from ovs-dpdk to kernel ovs. the migration fails but the bindings are fine. i was hitting this going from ovs to ovs-dpdk when i reported the bug so ill test that next. | 14:58 |
dansmith | tssurya: yeah I actually would like it higher than get_instance_objects_sorted(), but definitely want it out of the low-level logic if possible | 14:58 |
tssurya | dansmith: when you say higher than get_instance_objects_sorted(), then you mean we return the list of non-responsive cells instead from ^^ patch and constrcut it in compute/api ? | 15:01 |
tssurya | construct* | 15:01 |
dansmith | tssurya: yes, I'd like that better personally, just to keep "api logic" closer to the api | 15:01 |
dansmith | tssurya: maybe just return a tuple from get_instance_objects_sorted() indicating (failed_cell_uuids, instances_i_actually_got) | 15:02 |
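A minimal sketch of the tuple-returning shape suggested here, assuming a hypothetical per-cell result map in which each cell holds either its list of instances or a failure/timeout marker. The function name, marker strings and dict-based instances are illustrative only, not nova's actual `instance_list` API; real cross-cell listing merges already-sorted per-cell streams rather than re-sorting one list.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical per-cell outcome markers a cross-cell lister might record.
CELL_FAILED = "failed"
CELL_TIMEOUT = "timeout"

CellResult = Union[List[dict], str]


def get_instance_objects_sorted_sketch(
        results_by_cell: Dict[str, CellResult],
        sort_key: str = "created_at",
) -> Tuple[List[str], List[dict]]:
    """Return (failed_cell_uuids, instances_actually_got).

    Failed or timed-out cells are reported separately instead of being
    handled inside the merge logic, so the API layer can decide how to
    represent them.
    """
    failed_cell_uuids: List[str] = []
    instances: List[dict] = []
    for cell_uuid, result in results_by_cell.items():
        if isinstance(result, str):  # CELL_FAILED or CELL_TIMEOUT marker
            failed_cell_uuids.append(cell_uuid)
        else:
            instances.extend(result)
    instances.sort(key=lambda inst: inst[sort_key])
    return sorted(failed_cell_uuids), instances
```

The caller unpacks both halves, e.g. `failed, got = get_instance_objects_sorted_sketch(results)`, and the low-level listing code never has to fabricate instances for unreachable cells.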
*** lbragstad has joined #openstack-nova | 15:02 | |
*** dpawlik has quit IRC | 15:03 | |
*** alexchadin has quit IRC | 15:03 | |
tssurya | dansmith: okay looks doable, I will try it out | 15:03 |
dansmith | cool | 15:04 |
tssurya | thanks for the review btw, was getting kind of lost in the details | 15:04 |
tssurya | :) | 15:04 |
dansmith | tssurya: no problem, this is complicated and I've been neglecting it too long | 15:04 |
tssurya | hehe :D | 15:04 |
openstackgerrit | Jose Castro Leon proposed openstack/nova master: Add extend in-use volumes support for RBD https://review.openstack.org/594273 | 15:04 |
jroll | mriedem: yeah, death to the baremetal filters | 15:05 |
mriedem | i left comments on the bug, | 15:05 |
mriedem | was mostly looking for input on how people have done vm/bm in a single compute endpoint, i assume host aggregates | 15:05 |
mriedem | but i've heard there are also quota issues when doing it that way | 15:06 |
*** pcaruana has quit IRC | 15:09 | |
*** gbarros has quit IRC | 15:09 | |
*** gbarros has joined #openstack-nova | 15:11 | |
*** priteau has joined #openstack-nova | 15:14 | |
sean-k-mooney | mriedem: you can use capabilities in the flavor to avoid the need for host aggregates | 15:15 |
sean-k-mooney | not sure how many people go that route vs AZs or host aggregates | 15:16 |
*** sambetts|afk is now known as sambetts | 15:16 | |
sean-k-mooney | actually with more recent releases you can just use resource classes + dedicated baremetal or vm flavor and let placement handle it | 15:17 |
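A minimal sketch of the resource-class approach, assuming the documented nova/ironic convention of `resources:<CLASS>=<amount>` flavor extra specs. The helper names are hypothetical; the name normalization follows, to our understanding, the rule placement applies to ironic node resource classes (upper-case, collapse anything outside `[0-9A-Z]` to `_`, prefix `CUSTOM_`).

```python
import re
from typing import Dict


def custom_resource_class(node_resource_class: str) -> str:
    """Map an ironic node's resource_class to a placement custom class."""
    return "CUSTOM_" + re.sub(r"[^0-9A-Z]+", "_", node_resource_class.upper())


def baremetal_flavor_extra_specs(node_resource_class: str) -> Dict[str, str]:
    """Extra specs for a flavor that should only land on matching ironic nodes."""
    return {
        # Request exactly one unit of the node's custom resource class.
        "resources:%s" % custom_resource_class(node_resource_class): "1",
        # Zero out the standard classes, which ironic nodes do not report,
        # so the request is satisfied by the custom class alone.
        "resources:VCPU": "0",
        "resources:MEMORY_MB": "0",
        "resources:DISK_GB": "0",
    }
```

VM flavors keep requesting `VCPU`/`MEMORY_MB`/`DISK_GB`, which ironic nodes do not provide, while only ironic nodes provide the custom class, so placement keeps the two kinds of flavor apart without the removed baremetal filters.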
mriedem | that's why the baremetal filter options were deprecated in pike and removed in rocky | 15:18 |
*** s10 has quit IRC | 15:18 | |
mriedem | which is why i marked the bug as won't fix | 15:18 |
*** mhen has quit IRC | 15:18 | |
*** mhen has joined #openstack-nova | 15:20 | |
dansmith | mriedem: do you know if Kevin_Zheng and yikun_ are still working on tests? because I think if they have no ERROR instances, I could hack up a generator they could test to compare apples to apples on whether or not that object list loop could go faster | 15:20 |
jrock_cfdg | hello - i'm trying to add a serial device with specific parameters to an instance at creation time (source mode=connect host=0.0.0.0 port=4555) ; I think i've narrowed it down to these 3 scripts (/usr/lib/python-2.7/site-packages/nova/virt/libvirt/{config,driver,guest}.py - which is the correct place to make this change? And has anyone here done anything like this and maybe have some examples? | 15:20 |
mriedem | idk | 15:20 |
dansmith | meaning still have their profiling setup accessible or whatever | 15:20 |
dansmith | okay | 15:20 |
mriedem | i'm sure it's still setup | 15:21 |
Kevin_Zheng | we can still test | 15:21 |
mriedem | it's just a bash script on a devstack deploy on a baremetal host | 15:21 |
dansmith | Kevin_Zheng: ohai | 15:21 |
mriedem | the lurker | 15:21 |
mriedem | oh right, monday, tuesday thursday are work late days for kevin and yikun | 15:21 |
dansmith | ah | 15:22 |
dansmith | Kevin_Zheng: I assume all your test instances are ACTIVE or something right? | 15:22 |
*** moshele has joined #openstack-nova | 15:22 | |
Kevin_Zheng | Yes | 15:22 |
Kevin_Zheng | All active | 15:22 |
dansmith | Kevin_Zheng: so right here, we iterate all the instances: https://github.com/openstack/nova/blob/master/nova/compute/instance_list.py#L124-L126 | 15:22 |
dansmith | Kevin_Zheng: and so I'm wondering if removing that would also help your perf a bit.. the problem is we have to handle faults, which are handled in that list method right now | 15:23 |
*** alex_xu has quit IRC | 15:23 | |
mriedem | maciejjozefczyk: if you're around https://review.openstack.org/#/c/591607/ | 15:23 |
dansmith | Kevin_Zheng: but with the batching, we *might* be better off doing that in the batches instead of at the top to reduce latency | 15:23 |
mriedem | maciejjozefczyk: our public cloud ops team reported the same issue | 15:24 |
Kevin_Zheng | So instead all instances, we do what? | 15:24 |
dansmith | Kevin_Zheng: well, we'd do it in the batch handler, so we fill faults on ~100 instances at a time in "parallel" instead of on 1000 instances serially | 15:24 |
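A minimal sketch of per-batch fault filling, assuming instances are dicts with `uuid` and `vm_state`, and that `load_faults` is a hypothetical stand-in for the fault DB lookup. It applies the point made earlier that faults are only shown for ERROR or DELETED instances. The batches here are processed one after another, not concurrently: the gain is that the first results can be yielded after one small query instead of after a query over the whole list.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

# Faults are only surfaced for instances in these vm_states.
FAULT_VM_STATES = frozenset({"error", "deleted"})


def batched(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fill_faults_in_batches(
        instances: Iterable[dict],
        load_faults: Callable[[List[str]], Dict[str, dict]],
        batch_size: int = 100,
) -> Iterator[dict]:
    """Attach faults one batch at a time instead of over the whole list."""
    for batch in batched(instances, batch_size):
        wanted = [i["uuid"] for i in batch if i["vm_state"] in FAULT_VM_STATES]
        # Skip the query entirely when no instance in this batch can have a fault.
        faults = load_faults(wanted) if wanted else {}
        for inst in batch:
            inst["fault"] = faults.get(inst["uuid"])
            yield inst
```

For an all-ACTIVE test run like Kevin_Zheng's, `load_faults` is never called, which is why a second run with some ERROR instances is needed to see what the fault handling costs.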
mriedem | efried: i guess we can land this now huh https://review.openstack.org/#/c/520024/ | 15:25 |
Kevin_Zheng | Oh OK | 15:25 |
dansmith | Kevin_Zheng: sounds like if I come up with a test patch you could run it again and compare to without the patch just to see if it helps or hurts? | 15:25 |
Kevin_Zheng | Guess I have to generate some error instances then | 15:25 |
cdent | yay! on 520024 | 15:26 |
Kevin_Zheng | Yeah we can do it | 15:26 |
mriedem | you insert them right into the cell db right? | 15:26 |
Kevin_Zheng | Yes | 15:26 |
dansmith | Kevin_Zheng: well, the first test would still be all active, just to measure what the perf impact of unrolling that loop is | 15:26 |
dansmith | Kevin_Zheng: then we'd test a patch with some error instances to see if we lose all of that with the fault handling, or only a fraction of the gain we made | 15:27 |
*** Bhujay has quit IRC | 15:27 | |
Kevin_Zheng | Ok | 15:27 |
dansmith | Kevin_Zheng: I'll try cooking something up and will add you to the review | 15:30 |
Kevin_Zheng | Cool, I will go to bed and check in the morning | 15:30 |
dansmith | thanks | 15:31 |
sean-k-mooney | mriedem: so regarding the live migration bug. the source node is activating the dest host binding after the migration aborts, and this is also racing with the deletion of the binding on the dest host ... | 15:33 |
efried | mriedem: Yes, on 024, thanks. | 15:33 |
*** gbarros has quit IRC | 15:33 | |
*** gbarros has joined #openstack-nova | 15:34 | |
mriedem | sean-k-mooney: so we're hitting post-copy and then aborting? | 15:37 |
mriedem | there are only 2 places that live migration activates the dest host port binding: | 15:38 |
mriedem | 1. post-copy event callback from libvirt | 15:38 |
mriedem | 2. _post_live_migration after the hypervisor said the live migration was successful | 15:38 |
*** dklyle has quit IRC | 15:41 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Explicitly fail if trying to attach SR-IOV port https://review.openstack.org/591898 | 15:42 |
*** luzC has quit IRC | 15:42 | |
sean-k-mooney | mriedem: http://paste.openstack.org/show/728534/ | 15:42 |
dansmith | ugh, the expectation that we return an instancelist from get_all makes this harder than I thought | 15:42 |
sean-k-mooney | i think we are getting an update from neutron and that is triggering the activate. perhaps hitting the _post_live_migration code | 15:42 |
*** dklyle has joined #openstack-nova | 15:44 | |
sean-k-mooney | mriedem: lines 43-50 are the ones im suspicious of | 15:45 |
mriedem | a neutron event wouldn't make us activate a port | 15:45 |
mriedem | just refresh the info cache | 15:45 |
mriedem | Aug 21 16:19:17 devstack2 nova-compute[25894]: WARNING nova.compute.manager [None req-594840ec-7af2-47d2-929b-cef9dda07bb8 service nova] [instance: fead1ca6-beab-4c47-a73e-a3ab7f7c4de2] Received unexpected event network-vif-unplugged-ef02ea3f-9a11-4519-bcd3-2bfca97edf26 for instance with vm_state active and task_state migrating. | 15:46 |
mriedem | means we ignore it | 15:46 |
sean-k-mooney | hum ok well on line 50 we activate the port binding for devstack5 which was the destination node. but the migration has already aborted | 15:47 |
mriedem | hmm | 15:47 |
mriedem | Aug 21 16:19:17 devstack2 nova-compute[25894]: DEBUG nova.network.neutronv2.api [None req-c8b07cbc-52f7-4d20-aacc-f3036ad90c8d None None] Activated binding for port ef02ea3f-9a11-4519-bcd3-2bfca97edf26 and host devstack5. {{(pid=25894) activate_port_binding /opt/stack/nova/nova/network/neutronv2/api.py:1352}} | 15:47 |
mriedem | indeed | 15:48 |
*** luksky has quit IRC | 15:48 | |
mriedem | sean-k-mooney: do you see any "(Lifecycle Event)" messages right before the traceback on the source node? | 15:50 |
sean-k-mooney | checking | 15:50 |
mriedem | should have also seen "Binding ports to destination host" if handle_lifecycle_event was what was activating the binding | 15:51 |
mriedem | er, | 15:51 |
mriedem | sean-k-mooney: are these the logs before or after my patch from a few hours ago? | 15:51 |
sean-k-mooney | before. and ya the migration completes... | 15:53 |
sean-k-mooney | ill paste the log section | 15:53 |
mriedem | so you're seeing the "Migration completed" lifecycle event | 15:53 |
mriedem | ? | 15:53 |
mriedem | maybe that's sent in both failure and success cases | 15:54 |
sean-k-mooney | http://paste.openstack.org/show/728539/ | 15:54 |
mriedem | bingo | 15:54 |
mriedem | Aug 21 16:19:16 devstack2 nova-compute[25894]: INFO nova.compute.manager [None req-c8b07cbc-52f7-4d20-aacc-f3036ad90c8d None None] [instance: fead1ca6-beab-4c47-a73e-a3ab7f7c4de2] VM Migration completed (Lifecycle Event) | 15:54 |
sean-k-mooney | line 25 is the completion and line 31 is the failure | 15:55 |
mriedem | Aug 21 16:19:17 devstack2 nova-compute[25894]: DEBUG nova.compute.manager [None req-c8b07cbc-52f7-4d20-aacc-f3036ad90c8d None None] [instance: fead1ca6-beab-4c47-a73e-a3ab7f7c4de2] Binding ports to destination host: devstack5 {{(pid=25894) handle_lifecycle_event /opt/stack/nova/nova/compute/manager.py:1130}} | 15:55 |
sean-k-mooney | ya i just saw that too | 15:55 |
mriedem | Aug 21 16:19:17 devstack2 nova-compute[25894]: ERROR nova.virt.libvirt.driver [-] [instance: fead1ca6-beab-4c47-a73e-a3ab7f7c4de2] Live Migration failure: internal error: qemu unexpectedly closed the monitor: 2018-08-21T15:19:15.187710Z qemu-kvm: -chardev socket,id=charnet0,path=/var/run/openvswitch/vhuef02ea3f-9a,server: info: QEMU waiting | 15:55 |
mriedem | yeah so the driver is sending the 'migration completed' event even though the job failed | 15:55 |
mriedem | that's the bug | 15:55 |
mriedem | and that's why we are activating the dest host port bindings on failure | 15:55 |
mriedem | and then deleting them in rollback :) | 15:56 |
sean-k-mooney | ya. so libvirt bug? | 15:56 |
mriedem | libvirt driver bug yeah | 15:56 |
*** gyee has joined #openstack-nova | 15:56 | |
sean-k-mooney | well the live migration completion event is coming from libvirt no? | 15:57 |
mriedem | yes, but the driver should check the job status to see if it failed or not | 15:57 |
mriedem | if we can | 15:57 |
mriedem | otherwise i don't think we can rely on that event | 15:57 |
sean-k-mooney | let me see if danpb is about | 15:57 |
openstackgerrit | Chris Dent proposed openstack/nova master: Set policy_opt defaults in placement deploy unit test https://review.openstack.org/594334 | 15:57 |
*** danpb has joined #openstack-nova | 15:58 | |
*** itlinux has joined #openstack-nova | 15:59 | |
sean-k-mooney | danpb: thanks. am regarding http://paste.openstack.org/show/728539/. does the live migration completion event from libvirt have a status we can check for failures? | 15:59 |
mriedem | i've updated https://review.openstack.org/#/c/594139/ with comments | 15:59 |
danpb | sean-k-mooney: you summoned me :-) | 15:59 |
mriedem | how many goats had to be sacrificed? | 16:00 |
sean-k-mooney | haha TBD | 16:00 |
*** macza has joined #openstack-nova | 16:00 | |
danpb | sean-k-mooney: you have any more context than just that log file ? | 16:01 |
sean-k-mooney | danpb: yes im testing live migration from ovs to ovs-dpdk in this case | 16:01 |
sean-k-mooney | that is causing qemu to have an internal error | 16:02 |
danpb | yep, looks like QEMU on target saw error in expected state & exited | 16:02 |
mdbooth | mriedem: I think if we've reached the point of post-copy, we shouldn't rollback. | 16:02 |
sean-k-mooney | nova is assuming that when we get the live migration complete event that everything worked fine, but in this case qemu explodes and the migration fails | 16:02 |
danpb | which should have caused libvirt to abort migration & nova to rollback | 16:02 |
mdbooth | mriedem: Because the guest was actually running on the destination at that point. | 16:02 |
mdbooth | Guessing that could be tricky with the current code structure, though... | 16:03 |
danpb | mdbooth: this logfile isn't showing post-copy is it ? looks like normal pre-copy to me | 16:03 |
sean-k-mooney | mdbooth: the guest is still running fine after this. its networking is messed up but it's still running on the host | 16:03 |
mriedem | mdbooth: we aren't hitting post-copy | 16:03 |
mdbooth | danpb: We could be talking about different things. I was referring to https://review.openstack.org/#/c/594139/ | 16:03 |
danpb | oh fun, two different live migration discussions in parallel :-) | 16:04 |
sean-k-mooney | mdbooth: danpb its the same one | 16:04 |
mriedem | danpb: so this is all new code since you've been in nova | 16:04 |
sean-k-mooney | i reproduced it | 16:04 |
mriedem | we're just assuming "migration completed" means it was successful, which is wrong in this case | 16:04 |
*** sahid has quit IRC | 16:05 | |
mriedem | so just need to not send that event callback up to the compute manager if the job failed | 16:05 |
mriedem | which i think we can glean from the jobState object | 16:05 |
mriedem | assuming that info is available to us in the params to _event_lifecycle_callback | 16:05 |
mriedem | i'm not sure that it does though, we get event and detail | 16:06 |
mriedem | but not the job status | 16:06 |
danpb | mriedem: which migration events are you referring to ? | 16:07 |
mdbooth | I don't think we're actually consuming events. We're polling the migration job. | 16:07 |
mriedem | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L179-L184 | 16:07 |
mriedem | mdbooth: no | 16:07 |
mriedem | not for this | 16:08 |
danpb | oh, so you're just looking at the lifecycle events | 16:08 |
mdbooth | mriedem: ack. Was looking at _live_migration_monitor. | 16:08 |
danpb | i'm not convinced that's a desirable way to determine success vs failure | 16:08 |
mriedem | we're getting VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED | 16:09 |
mriedem | and assuming it's success | 16:09 |
danpb | the job status from the _live_migration_monitor is a better way to check for failure | 16:09 |
mriedem | yeah, but we're on different threads here | 16:09 |
mriedem | we have the domain, could we get the jobState from that? | 16:09 |
danpb | mriedem: that VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED just says that the guest has been paused, as a result of the live migration operation | 16:09 |
danpb | it doesn't say anything about the operation being success or failure | 16:10 |
mriedem | right, and that's our bug :) | 16:10 |
danpb | so you definitely can't assume success from that | 16:10 |
mriedem | right, | 16:11 |
mriedem | so i can remove that to fix this quick | 16:11 |
mriedem | or try to find the jobState from the domain and check the status? | 16:11 |
dansmith | Kevin_Zheng: okay I've changed my mind for the moment.. the api code is so generator-unfriendly that a quick hack to test this is more involved than I thought | 16:11 |
danpb | mriedem: if there's some action that needs to take place during the migration operation | 16:11 |
danpb | mriedem: then my gut feeling would be to have the _live_migration_monitor thread take care of it | 16:12 |
sean-k-mooney | mriedem: well im not sure we need to change that code. where is the EVENT_LIFECYCLE_MIGRATION_COMPLETED event consumed? we have stopped moving stuff at this point, we just don't know if it succeeded | 16:12 |
mriedem | danpb: yeah most likely - and that's in line with what dansmith said on the review for this change | 16:12 |
mriedem | since it was baking libvirt logic into the compute manager lifecycle callback handler | 16:12 |
mriedem | sean-k-mooney: ComputeManager.handle_lifecycle_event | 16:13 |
danpb | if the lifecycle events are needed, then forward those onto that thread too | 16:13 |
sean-k-mooney | mriedem: so what we actually need to do is check the job status here https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1126-L1139 | 16:15 |
mriedem | we're not going to do that in the compute manager | 16:16 |
*** adrianc has quit IRC | 16:16 | |
openstackgerrit | Dan Smith proposed openstack/nova master: Batch results per cell when doing cross-cell listing https://review.openstack.org/592698 | 16:16 |
openstackgerrit | Dan Smith proposed openstack/nova master: List instances from all cells explicitly https://review.openstack.org/593717 | 16:16 |
openstackgerrit | Dan Smith proposed openstack/nova master: Make instance_list perform per-cell batching https://review.openstack.org/593131 | 16:16 |
openstackgerrit | Eric Fried proposed openstack/nova master: [placement] Add /reshaper handler for POST https://review.openstack.org/576927 | 16:16 |
openstackgerrit | Eric Fried proposed openstack/nova master: reshaper: Look up provider if not in inventories https://review.openstack.org/585033 | 16:16 |
openstackgerrit | Eric Fried proposed openstack/nova master: Make get_allocations_for_resource_provider sane https://review.openstack.org/584598 | 16:17 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: Real get_allocs_for_consumer https://review.openstack.org/584599 | 16:17 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: get_allocations_for_provider_tree https://review.openstack.org/584648 | 16:17 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: _reshape helper, placement min bump https://review.openstack.org/585034 | 16:17 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: update_from_provider_tree w/reshape https://review.openstack.org/585049 | 16:17 |
openstackgerrit | Eric Fried proposed openstack/nova master: Compute: Handle reshaped provider trees https://review.openstack.org/576236 | 16:17 |
openstackgerrit | Eric Fried proposed openstack/nova master: [placement] Regex consts for placement schema https://review.openstack.org/591863 | 16:17 |
danpb | mriedem: yeah you'd want to check status in the libvirt driver, and if some action is required in the compute manager, trigger some callout for the compute manager to act on i guess | 16:17 |
mriedem | how does one even determine job status based on https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainJobInfo ? | 16:18 |
mriedem | https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainJobType ? | 16:18 |
*** jpena is now known as jpena|off | 16:19 | |
openstackgerrit | Eric Fried proposed openstack/nova master: [placement] Regex consts for placement schema https://review.openstack.org/591863 | 16:19 |
openstackgerrit | Eric Fried proposed openstack/nova master: [placement] Add /reshaper handler for POST https://review.openstack.org/576927 | 16:19 |
openstackgerrit | Eric Fried proposed openstack/nova master: reshaper: Look up provider if not in inventories https://review.openstack.org/585033 | 16:20 |
openstackgerrit | Eric Fried proposed openstack/nova master: Make get_allocations_for_resource_provider sane https://review.openstack.org/584598 | 16:20 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: Real get_allocs_for_consumer https://review.openstack.org/584599 | 16:20 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: get_allocations_for_provider_tree https://review.openstack.org/584648 | 16:20 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: _reshape helper, placement min bump https://review.openstack.org/585034 | 16:20 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: update_from_provider_tree w/reshape https://review.openstack.org/585049 | 16:20 |
openstackgerrit | Eric Fried proposed openstack/nova master: Compute: Handle reshaped provider trees https://review.openstack.org/576236 | 16:20 |
mriedem | ah nvm i see how we get this info in nova | 16:21 |
danpb | mriedem: yeah the job type field is what we're hooking off | 16:21 |
mriedem | yup | 16:21 |
mriedem | elif info.type == libvirt.VIR_DOMAIN_JOB_FAILED: | 16:21 |
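A minimal sketch of turning a migration job type into an outcome, along the lines of the `info.type` check quoted above. `JobType` is a stand-in enum named after libvirt's `virDomainJobType` constants; its values are arbitrary strings, not libvirt's integers, and real code would compare against `libvirt.VIR_DOMAIN_JOB_*`. Reading the job info from an actual domain is left out.

```python
import enum


class JobType(enum.Enum):
    """Stand-ins named after libvirt's virDomainJobType constants."""
    NONE = "none"
    BOUNDED = "bounded"
    UNBOUNDED = "unbounded"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


def classify_migration_job(job_type: JobType) -> str:
    """Return 'running', 'succeeded', 'failed' or 'unknown'.

    CANCELLED is grouped with FAILED: either way the guest did not end
    up on the destination, so its port bindings must not be activated.
    """
    if job_type in (JobType.BOUNDED, JobType.UNBOUNDED):
        return "running"
    if job_type is JobType.COMPLETED:
        return "succeeded"
    if job_type in (JobType.FAILED, JobType.CANCELLED):
        return "failed"
    # NONE: the job is already gone (e.g. an older libvirt that does not
    # keep completed job info), so the outcome cannot be read from it.
    return "unknown"
```

The "unknown" branch is exactly the race mdbooth raises next: without the job info, something else (a heuristic, or waiting for `_post_live_migration`) has to decide.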
mriedem | danpb: alright thanks i think i know what to do here, | 16:22 |
mriedem | sean-k-mooney: i probably won't have something for you to test by your eod | 16:22 |
mriedem | although your eod varies wildly | 16:22 |
mriedem | but i'm in serious need of a shower and lunch at this point....i'm devolving | 16:23 |
mdbooth | mriedem danpb: IIRC we encountered limitations with this in the block rebase operation. Isn't there a race with the job info disappearing? If the job is no longer present, we no longer know if it failed or not, and the solution was supposed to be to consume events? | 16:23 |
sean-k-mooney | haha yes it does. today i need to drive home, which is an hour and a half away, so ill be leaving shortly. if you have something ill test it as soon as im back online | 16:23 |
mdbooth | Yeah, I wrote one of my comment essays about it | 16:24 |
mdbooth | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L827-L838 | 16:25 |
sean-k-mooney | mdbooth: right. am can we check if the domain is still present on the source node? if it is it would mean it failed right? | 16:28 |
danpb | mdbooth: with new enough libvirt the job will stick around | 16:31 |
danpb | mdbooth: with older libvirt the _live_migration_monitor code has heuristic to try to figure out if no-job == failed vs success | 16:32 |
mdbooth | danpb: I got the impression at the time that eric was piling on heuristics in there for us, but really we weren't supposed to be doing that. Sounds like that's out of date? | 16:33 |
*** panda is now known as panda|off | 16:33 | |
danpb | mdbooth: what do you mean ? | 16:34 |
mdbooth | There was also the heuristic for status.end. | 16:36 |
mdbooth | I just got the strong impression at the time that consuming events was the intended approach here. | 16:36 |
mdbooth | If we've sat on the problem for that to be out of date... result :) | 16:36 |
*** dtantsur is now known as dtantsur|afk | 16:38 | |
mriedem | mdbooth: if i get no job, i'm going to not send the callback event to compute manager to trigger the port binding activation, | 16:38 |
mriedem | because worst case is the job failed and we're screwing up networking, which is what sean is seeing, | 16:39 |
mriedem | best case is we don't know, but post live migration will still activate the port bindings, | 16:39 |
mriedem | you just have a bigger window of network downtime | 16:39 |
mriedem | *plus*, if the job was successful and we go into post-copy, we activate the port bindings then too | 16:39 |
*** s10 has joined #openstack-nova | 16:39 | |
mriedem | i'm fairly certain this is 100% fool proof and will forever be bug free | 16:40 |
*** s10 has quit IRC | 16:46 | |
tssurya | dansmith: would you prefer me returning (1) the failed_cell_uuids from get_instance_objects_sorted only if cell_down_support is set ? or (2) you don't want this flag creeping down even to that level and so we just return the tuple under all conditions ? | 16:47 |
tssurya | and deal with it in the api | 16:47 |
tssurya | I am asking because it's called "get_instance_objects_sorted" and returning the tuple under all conditions might be kind of weird ? | 16:49 |
dansmith | tssurya: just return it always, and let the api decide what to do with it based on the version I think | 16:49 |
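A minimal sketch of "return it always, let the API decide based on the version", assuming the lister hands back `(failed_cell_uuids, instances)`. Everything named here is hypothetical: `min_down_cell_version` stands in for whatever microversion would introduce down-cell handling, `uuids_by_cell` stands in for the instance mappings kept in the API database, and the `UNKNOWN` status for placeholder records is illustrative.

```python
from typing import Dict, List, Tuple


def parse_microversion(version: str) -> Tuple[int, int]:
    """'2.65' -> (2, 65), so versions compare numerically, not as strings."""
    major, minor = version.split(".")
    return int(major), int(minor)


def build_server_list(
        request_version: str,
        min_down_cell_version: str,
        lister_result: Tuple[List[str], List[dict]],
        uuids_by_cell: Dict[str, List[str]],
) -> List[dict]:
    """Turn (failed_cell_uuids, instances) into API records.

    Requests below the (hypothetical) threshold just get what was actually
    fetched; newer requests also get minimal placeholders for instances
    that live in unreachable cells.
    """
    failed_cell_uuids, instances = lister_result
    servers = [{"id": i["uuid"], "status": i["status"]} for i in instances]
    if parse_microversion(request_version) >= parse_microversion(min_down_cell_version):
        for cell_uuid in failed_cell_uuids:
            for uuid in uuids_by_cell.get(cell_uuid, []):
                servers.append({"id": uuid, "status": "UNKNOWN"})
    return servers
```

Keeping the version check here means the lister's return shape never depends on a request flag, which is the point of not letting `cell_down_support` creep down into `get_instance_objects_sorted`.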
tssurya | dansmith: ack | 16:49 |
dansmith | tssurya: you can change the name if you think that's important | 16:49 |
tssurya | I will put it up for review and we can see | 16:50 |
*** markvoelker has joined #openstack-nova | 16:50 | |
tssurya | thanks | 16:51 |
dansmith | cool | 16:51 |
*** sambetts is now known as sambetts|afk | 16:52 | |
sean-k-mooney | mriedem: can we get that on a tee shirt. | 16:52 |
*** NostawRm has quit IRC | 16:52 | |
mriedem | sean-k-mooney: my bug free guarantee? | 16:57 |
mriedem | it only applies from today through labor day | 16:57 |
*** nicolasbock has joined #openstack-nova | 16:59 | |
*** danpb has quit IRC | 17:00 | |
*** sean-k-mooney has quit IRC | 17:00 | |
dansmith | oof, 329 in check | 17:02 |
melwitt | . | 17:02 |
mriedem | so ima also mark https://bugs.launchpad.net/nova/+bug/1788014 as rc potential | 17:04 |
openstack | Launchpad bug 1788014 in OpenStack Compute (nova) "when live migration fails due to a internal error rollback is not handeled correctly." [Medium,In progress] - Assigned to Matt Riedemann (mriedem) | 17:04 |
mriedem | given it's a regression when live migration fails | 17:04 |
mriedem | my only question on that one is doing a tactical fix for the GA | 17:05 |
melwitt | ok, so rc3 now | 17:06 |
*** NostawRm has joined #openstack-nova | 17:11 | |
*** tbachman has quit IRC | 17:11 | |
openstackgerrit | Merged openstack/nova master: Update resources once in update_available_resource https://review.openstack.org/520024 | 17:36 |
openstackgerrit | Merged openstack/nova master: Set policy_opt defaults in placement gabbi fixture https://review.openstack.org/594172 | 17:36 |
openstackgerrit | Merged openstack/nova master: Set policy_opt defaults in placement deploy unit test https://review.openstack.org/594334 | 17:39 |
dansmith | mriedem: what's the plan here? https://review.openstack.org/#/c/591735/ | 17:40 |
dansmith | we would like that to be in all current upstream stable, but will backport it ourselves if we're not going to do it upstream, so I just wanna know if I should hold off or not | 17:41 |
*** tbachman has joined #openstack-nova | 17:45 | |
*** tssurya has quit IRC | 17:50 | |
mriedem | dansmith: i was waiting for you to rebase it | 17:53 |
dansmith | oh heh sorry | 17:53 |
mriedem | sni | 17:53 |
mriedem | snip snap | 17:53 |
mriedem | melwitt: yeah so probably rc3 | 17:53 |
mriedem | these are the 2 as of today https://bugs.launchpad.net/nova/+bugs?field.tag=rocky-rc-potential | 17:54 |
mriedem | looks like final rc is thursday | 17:54 |
mriedem | that first one has a fix in the gate | 17:54 |
mriedem | i wouldn't mind bouncing of a few of you on the 2nd one | 17:54 |
mriedem | *off a few | 17:54 |
melwitt | ack | 17:55 |
mriedem | dansmith: melwitt: so tl;dr, the issue in https://bugs.launchpad.net/nova/+bug/1788014 is that live migration fails and we trigger a lifecycle event which activates the port binding on the dest host incorrectly, it shouldn't do that, | 17:58 |
openstack | Launchpad bug 1788014 in OpenStack Compute (nova) "when live migration fails due to a internal error rollback is not handeled correctly." [Medium,In progress] - Assigned to Matt Riedemann (mriedem) | 17:58 |
mriedem | we're getting an event from libvirt but don't know if it's success or failure for the job, | 17:58 |
mriedem | so what i could do if we want to be low risk with the fix for rocky GA is just not listen on that event and we'll activate port bindings on success like we always did before the change, and we'll still do the early port activating on post-copy events if the live migration is successful | 17:59 |
mriedem | long-term we could check the actual job status and if failed, don't trigger our lifecycle event, but that's riskier for rocky GA at this point IMO | 17:59 |
mriedem | so i'd propose a 2-part fix, one partial that we backport and one for just stein | 18:00 |
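Editor's note: the two-part fix described above can be sketched as two event handlers. This is a minimal illustration under stated assumptions. `LifecycleEvent`, `JobStatus`, `get_job_status` and `activate_port_bindings` are hypothetical stand-ins, not nova's or libvirt-python's actual names. The real changes are the two reviews proposed later in this log (594508 and 594527).

```python
from enum import Enum


class LifecycleEvent(Enum):
    # Hypothetical stand-ins for the two events nova started reacting to in Rocky.
    POSTCOPY_STARTED = "postcopy-started"
    SUSPENDED_MIGRATED = "suspended-migrated"


class JobStatus(Enum):
    # Hypothetical job states; real code would ask libvirt for the job info.
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


def handle_event_tactical(event, activate_port_bindings):
    """Low-risk, backportable option: ignore SUSPENDED_MIGRATED entirely.

    Dest port bindings are then activated on success, as before Rocky,
    while post-copy still activates them early.
    """
    if event is LifecycleEvent.POSTCOPY_STARTED:
        activate_port_bindings()
    # SUSPENDED_MIGRATED is deliberately ignored.


def handle_event_checked(event, get_job_status, activate_port_bindings):
    """Longer-term option: react to SUSPENDED_MIGRATED only if the job completed."""
    if event is LifecycleEvent.POSTCOPY_STARTED:
        activate_port_bindings()
    elif event is LifecycleEvent.SUSPENDED_MIGRATED:
        if get_job_status() is JobStatus.COMPLETED:
            activate_port_bindings()
        # On failure do nothing, so the source host's bindings stay active
        # and rollback can proceed.
```

The first handler is the "don't listen on that event" variant. The second adds the job-status check that mriedem considers too risky for the Rocky GA.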
*** luksky has joined #openstack-nova | 18:02 | |
melwitt | ok | 18:03 |
openstackgerrit | Dan Smith proposed openstack/nova stable/queens: Fix cancel_all_events event name parsing https://review.openstack.org/592086 | 18:04 |
openstackgerrit | Dan Smith proposed openstack/nova stable/queens: Wait for network-vif-plugged before starting live migration https://review.openstack.org/591735 | 18:04 |
openstackgerrit | Dan Smith proposed openstack/nova stable/queens: DNM: Debug patch to test live migration waiting https://review.openstack.org/591775 | 18:04 |
melwitt | I guess I don't understand what the lifecycle event gives us if we already know success or failure | 18:05 |
melwitt | without listening for it | 18:05 |
mriedem | in rocky we started listening on 2 new events, | 18:07 |
mriedem | one is post-copy and one is migration completed | 18:07 |
mriedem | the idea is that as soon as we switch we activate the port bindings on the dest host for minimal downtime | 18:07 |
mriedem | the problem is we get the latter event even if live migration fails | 18:07 |
mriedem | and we're not doing any conditional logic in that one to see if the job failed or not | 18:08 |
melwitt | I think I understand that part, it's just when you said "we'll activate port bindings on success like we always did before the change" it makes me not understand what gain the lifecycle event was supposed to give, if we already know success or failure without it | 18:09 |
*** tbachman has quit IRC | 18:11 | |
mriedem | because if we can activate the network on the dest at the point the guest is paused to complete the transfer, it makes the network downtime window shorter | 18:11 |
*** tbachman has joined #openstack-nova | 18:12 | |
melwitt | and without the lifecycle event, we activate the network later on after the guest is paused | 18:14 |
mriedem | we might still need my other fix for rollback, i'm not sure; what sean is hitting isn't a failure after post-copy | 18:14 |
mriedem | we activate the network after the guest transfer is complete and resumed on the dest | 18:14 |
*** tbachman has quit IRC | 18:16 | |
melwitt | ok, got it | 18:16 |
*** tbachman has joined #openstack-nova | 18:18 | |
*** cfriesen has joined #openstack-nova | 18:21 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: libvirt: Don't react to VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED events https://review.openstack.org/594508 | 18:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: libvirt: Don't react to VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED events https://review.openstack.org/594508 | 18:33 |
*** NobodyCam has quit IRC | 18:37 | |
*** Kevin_Zheng has quit IRC | 18:37 | |
*** kklimonda has quit IRC | 18:37 | |
*** leifz has quit IRC | 18:37 | |
*** andrewbogott has quit IRC | 18:37 | |
*** ttx has quit IRC | 18:37 | |
*** alaski has quit IRC | 18:37 | |
*** kencjohnston has quit IRC | 18:37 | |
*** zer0c00l has quit IRC | 18:37 | |
*** fungi has quit IRC | 18:37 | |
*** kencjohnston_ has joined #openstack-nova | 18:38 | |
*** kosamara has quit IRC | 18:42 | |
*** TheJulia has quit IRC | 18:42 | |
*** fyx has quit IRC | 18:42 | |
*** mnaser has quit IRC | 18:42 | |
*** nicholas has quit IRC | 18:42 | |
*** TheJulia has joined #openstack-nova | 18:48 | |
*** fungi has joined #openstack-nova | 18:48 | |
*** eharney has quit IRC | 18:52 | |
*** eharney has joined #openstack-nova | 18:59 | |
*** priteau has quit IRC | 19:00 | |
*** mnaser has joined #openstack-nova | 19:09 | |
*** luksky11 has joined #openstack-nova | 19:14 | |
*** cdent has quit IRC | 19:16 | |
*** luksky has quit IRC | 19:17 | |
*** bnemec has quit IRC | 19:20 | |
*** bnemec has joined #openstack-nova | 19:20 | |
*** r-daneel has quit IRC | 19:24 | |
*** r-daneel has joined #openstack-nova | 19:24 | |
*** moshele has quit IRC | 19:25 | |
*** theanalyst has quit IRC | 19:33 | |
*** melwitt has quit IRC | 19:33 | |
*** sdake has quit IRC | 19:33 | |
*** melwitt has joined #openstack-nova | 19:34 | |
*** sdake has joined #openstack-nova | 19:34 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: libvirt: check job status for VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED event https://review.openstack.org/594527 | 19:36 |
*** moshele has joined #openstack-nova | 19:36 | |
*** luksky has joined #openstack-nova | 19:36 | |
mriedem | melwitt: dansmith: alright fyi ^ would need sean-k-mooney to test out the 2nd more complicated fix since he has the env that recreates the bug | 19:37 |
mriedem | and i need to run to an appt | 19:37 |
*** luksky11 has quit IRC | 19:38 | |
*** evrardjp has quit IRC | 19:39 | |
*** Vek has quit IRC | 19:39 | |
*** pcarver has quit IRC | 19:39 | |
*** jistr|off has quit IRC | 19:39 | |
*** aarents has quit IRC | 19:39 | |
*** jcosmao has quit IRC | 19:39 | |
*** ingy has quit IRC | 19:39 | |
*** gryf has quit IRC | 19:39 | |
mriedem | also threw that stuff in the rc todo etherpad | 19:39 |
*** mriedem is now known as mriedem_afk | 19:39 | |
*** sambetts|afk has quit IRC | 19:42 | |
melwitt | ack | 19:43 |
*** gryf has joined #openstack-nova | 19:44 | |
*** sambetts_ has joined #openstack-nova | 19:45 | |
*** moshele has quit IRC | 19:45 | |
*** samueldmq has quit IRC | 19:46 | |
*** _hemna has joined #openstack-nova | 19:47 | |
*** jistr has joined #openstack-nova | 19:47 | |
*** BlackDex has quit IRC | 19:48 | |
*** samueldmq has joined #openstack-nova | 19:49 | |
*** BlackDex has joined #openstack-nova | 19:52 | |
openstackgerrit | Eric Fried proposed openstack/nova master: [placement] Add /reshaper handler for POST https://review.openstack.org/576927 | 19:54 |
openstackgerrit | Eric Fried proposed openstack/nova master: reshaper: Look up provider if not in inventories https://review.openstack.org/585033 | 19:54 |
openstackgerrit | Eric Fried proposed openstack/nova master: Make get_allocations_for_resource_provider raise https://review.openstack.org/584598 | 19:54 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: Real get_allocs_for_consumer https://review.openstack.org/584599 | 19:54 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: get_allocations_for_provider_tree https://review.openstack.org/584648 | 19:55 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: _reshape helper, placement min bump https://review.openstack.org/585034 | 19:55 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: update_from_provider_tree w/reshape https://review.openstack.org/585049 | 19:55 |
openstackgerrit | Eric Fried proposed openstack/nova master: Compute: Handle reshaped provider trees https://review.openstack.org/576236 | 19:55 |
*** luksky11 has joined #openstack-nova | 19:56 | |
*** eharney has quit IRC | 19:59 | |
*** theanalyst has joined #openstack-nova | 19:59 | |
*** luksky has quit IRC | 20:00 | |
*** r-daneel has quit IRC | 20:03 | |
*** r-daneel has joined #openstack-nova | 20:03 | |
*** tbachman has quit IRC | 20:08 | |
*** Tahvok_ has joined #openstack-nova | 20:12 | |
*** gmann_ has joined #openstack-nova | 20:13 | |
*** jaosorior_ has joined #openstack-nova | 20:14 | |
dansmith | so I | 20:14 |
dansmith | am pretty sure my batching breaks tssurya's down cell in its current form | 20:14 |
dansmith | more specifically, it'll cause scatter gather to never notice failures | 20:15 |
*** jaosorior has quit IRC | 20:17 | |
*** edmondsw_ has joined #openstack-nova | 20:18 | |
*** Tahvok has quit IRC | 20:19 | |
*** edmondsw has quit IRC | 20:19 | |
*** gmann has quit IRC | 20:19 | |
*** edmondsw_ is now known as edmondsw | 20:19 | |
*** gmann_ is now known as gmann | 20:19 | |
*** Tahvok_ is now known as Tahvok | 20:19 | |
melwitt | dansmith: it's specific to the batching? it looks like pre-batch the code still removes the "did not respond" and "raised exception" results | 20:24 |
dansmith | still removes? | 20:24 |
melwitt | dansmith: this part looks like it's removing "down cells" from results? https://review.openstack.org/#/c/592698/5/nova/compute/multi_cell_list.py@311 | 20:25 |
dansmith | right but with the batching we never hit that because we don't start executing the queries until the heapq | 20:26 |
melwitt | oh, I see | 20:27 |
melwitt | hmm | 20:27 |
dansmith | I'll just have to get a little more into the middle of that process and we just won't get the standard handlers from scatter gather | 20:29 |
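Editor's note: the batching problem above comes down to lazy evaluation. A scatter-gather helper that checks each cell's result for a failure sentinel right after dispatch never sees an error that is only raised later, when `heapq.merge` pulls the first row. The toy sketch below illustrates this. `scatter_gather`, `RAISED_EXCEPTION` and `query_cell` are hypothetical simplifications, not nova's scatter-gather or `multi_cell_list` code.

```python
import heapq

RAISED_EXCEPTION = object()  # hypothetical sentinel for a cell whose query failed


def query_cell(cell, rows):
    """Hypothetical per-cell query; a generator, so nothing runs until iterated."""
    if cell == "down-cell":
        raise ConnectionError(f"{cell} is unreachable")
    yield from sorted(rows)


def scatter_gather(cells, fn):
    """Toy scatter-gather: call fn per cell, replacing failures with a sentinel."""
    results = {}
    for cell, rows in cells.items():
        try:
            results[cell] = fn(cell, rows)
        except Exception:
            results[cell] = RAISED_EXCEPTION
    return results


def _merge(results):
    down = [c for c, r in results.items() if r is RAISED_EXCEPTION]
    ok = [r for r in results.values() if r is not RAISED_EXCEPTION]
    return list(heapq.merge(*ok)), down


def list_eager(cells):
    # Pre-batching: rows are materialized inside fn, so failures hit the sentinel.
    return _merge(scatter_gather(cells, lambda c, r: list(query_cell(c, r))))


def list_lazy(cells):
    # Batched: fn returns an unstarted generator, so scatter_gather sees no error;
    # the exception fires later inside heapq.merge and skips the sentinel check.
    return _merge(scatter_gather(cells, query_cell))


def guarded(cell, gen, failed):
    """Record a cell failure where it actually happens: during iteration."""
    try:
        yield from gen
    except Exception:
        failed.append(cell)


def list_lazy_guarded(cells):
    failed = []
    gens = [guarded(c, query_cell(c, r), failed) for c, r in cells.items()]
    return list(heapq.merge(*gens)), failed
```

Wrapping each generator is one way to "get a little more into the middle of that process". Compare the later patch in this log, "Record cell success/failure/timeout in CrossCellLister".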
*** erlon has quit IRC | 20:39 | |
*** holser_ has quit IRC | 20:44 | |
*** awaugama has quit IRC | 20:51 | |
*** harlowja has joined #openstack-nova | 21:02 | |
*** mriedem_afk is now known as mriedem | 21:04 | |
mriedem | huh, you don't see instance.save() messaging timeouts in the gate very often http://logs.openstack.org/98/591898/3/check/nova-next/2d5e60c/logs/screen-n-cpu.txt.gz#_Aug_21_17_10_17_732263 | 21:13 |
mriedem | guessing this isn't good http://logs.openstack.org/98/591898/3/check/nova-next/2d5e60c/logs/screen-n-cpu.txt.gz#_Aug_21_17_10_17_426420 | 21:14 |
mriedem | seen here too http://logs.openstack.org/85/567785/7/check/nova-tox-functional-py35/d5a8036/job-output.txt#_2018-08-17_09_06_20_241427 | 21:16 |
*** munimeha1 has quit IRC | 21:21 | |
*** luksky11 has quit IRC | 21:29 | |
mriedem | melwitt: what do you think is missing from placement for shared storage providers support? as far as i know, it's the nova stuff that's lacking as we identified ~2 weeks ago | 21:36 |
mriedem | for sure move operations are not ready for shared storage providers | 21:37 |
mriedem | the placement side of shared storage is pretty simple though, and has been done for a long time | 21:37 |
openstackgerrit | Dmitry Sutyagin proposed openstack/nova-specs master: Allow disabling KSM / mem-merge via extra spec https://review.openstack.org/593197 | 21:40 |
melwitt | mriedem: it's not that I think anything is missing, I'm pragmatically thinking of the integration work and if there is something unforeseen we need to fix. I expect bugs to shake out when we integrate something for the first time. I think not having bugs shake out will be the rarer case | 21:41 |
dansmith | mriedem: I think we were saying the same about aggregates being done in placement before we added the placement filter stuff and realized we needed tweaks | 21:41 |
dansmith | and surely thought what was being done in placement for NRPs was going to be usable by nova until we thought about it | 21:41 |
mriedem | dansmith: the member_of thing with aggs right? | 21:46 |
dansmith | we needed member_of with and and or | 21:46 |
dansmith | and I meant prefilter above | 21:46 |
dansmith | or request filter | 21:46 |
dansmith | or whatever that guy called it | 21:46 |
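Editor's note: "member_of with and and or" refers to placement's `member_of` query parameter. `member_of=in:<agg1>,<agg2>` means "in any of these aggregates" (OR). Starting with placement microversion 1.24, repeating `member_of` requires a provider to satisfy every occurrence (AND). The sketch below only builds such query strings with the standard library. `member_of_params` is a hypothetical helper, and `agg-a`, `agg-b` and `agg-c` are placeholders for aggregate UUIDs.

```python
from urllib.parse import urlencode


def member_of_params(*groups):
    """Build member_of query params for a placement request.

    Aggregates within one group are OR'd using the "in:" prefix; separate
    groups are AND'd by repeating member_of (needs microversion >= 1.24).
    """
    values = []
    for group in groups:
        if len(group) == 1:
            values.append(group[0])
        else:
            values.append("in:" + ",".join(group))
    return urlencode([("member_of", v) for v in values])
```

For example, `member_of_params(["agg-a", "agg-b"], ["agg-c"])` asks for providers in (agg-a OR agg-b) AND agg-c.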
dansmith | the granular request stuff is another similar example, where we try to use what we think is just clean resource requests for actual nova stuff and realize we need this giant complex syntax instead | 21:47 |
dansmith | all that could be developed in two separate rooms for sure, just like multiattach or multiple host bindings | 21:47 |
dansmith | and hot damn, in a few years we'll be golden | 21:48 |
dansmith | mriedem: by the way, I wonder if the object action rpc methods to conductor ought to be long_rpc_timeouts | 21:49 |
melwitt | yeah. in case it wasn't clear, the thing I care about is delivering stuff that operators and users need, that we know they need, and I don't see how becoming two separate groups helps that | 21:49 |
dansmith | mriedem: they really should never hang for a long time, but just piling up more because we time out, run the periodic again and generate more traffic is probably worse | 21:49 |
dansmith | mriedem: re: your save timeout | 21:50 |
*** itlinux has quit IRC | 21:51 | |
mriedem | dansmith: it seems something weird happened with the servicegroup in that failure | 21:52 |
dansmith | yeah, not related to your actual thing, but just thinking of what that reminds me of, which is conductor is overwhelmed | 21:52 |
mriedem | dansmith: also, i was talking with efried about kevin's exclusive trait thing, and guess what https://review.openstack.org/#/c/593475/ | 21:58 |
mriedem | it's already been proposed :) | 21:58 |
dansmith | well, having not read it and skimmed the -1, I'm assuming it's for the same reason I think it's a non-starter | 22:00 |
*** rha has quit IRC | 22:02 | |
*** rha has joined #openstack-nova | 22:03 | |
mriedem | encoding metadata in a trait name | 22:03 |
dansmith | and it's only one special prefix, | 22:05 |
dansmith | which means one class | 22:05 |
mriedem | CUSTOM_INTEL_FOR_SERIOUS_WORKLOADS | 22:06 |
mriedem | i can see it now | 22:06 |
efried | POST /traits/CUSTOM_INTEL_FOR_SERIOUS_WORKLOADS | 22:08 |
efried | { 'name': 'CUSTOM_INTEL_FOR_SERIOUS_WORKLOADS', | 22:08 |
efried | 'required': true, | 22:08 |
efried | 'allowed_user_ids': [...], | 22:08 |
efried | 'allowed_project_ids': [...], | 22:08 |
efried | ... | 22:08 |
efried | } | 22:08 |
mriedem | queue jay vomit | 22:08 |
mriedem | *cue | 22:09 |
efried | we could do the same thing with aggregates | 22:09 |
mriedem | dansmith: re granular, we could still do POST queries.... | 22:09 |
mriedem | just saying | 22:09 |
mriedem | granular request group syntax is likely something that could benefit from some kind of flavor extra specs validate api | 22:10 |
*** rcernin has joined #openstack-nova | 22:10 | |
efried | no argument there | 22:13 |
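Editor's note: a flavor extra specs validation API for granular request groups, as mriedem suggests, might start with checks like the toy validator below. It only checks the key shapes the granular syntax uses: `resources[N]:<RESOURCE_CLASS>` with a non-negative integer amount, `trait[N]:<TRAIT>` with `required` or `forbidden`, and `group_policy` with `none` or `isolate`. The function name and exact rules are assumptions for illustration, not an existing nova API.

```python
import re

# Hypothetical patterns approximating the granular extra spec key syntax.
_RESOURCES = re.compile(r"resources(\d*):([A-Z0-9_]+)")
_TRAIT = re.compile(r"trait(\d*):([A-Z0-9_]+)")


def validate_extra_specs(specs):
    """Return a list of error strings for malformed granular extra specs."""
    errors = []
    for key, value in specs.items():
        if _RESOURCES.fullmatch(key):
            if not re.fullmatch(r"[0-9]+", value):
                errors.append(f"{key}: amount must be a non-negative integer, got {value!r}")
        elif _TRAIT.fullmatch(key):
            if value not in ("required", "forbidden"):
                errors.append(f"{key}: must be 'required' or 'forbidden', got {value!r}")
        elif key == "group_policy":
            if value not in ("none", "isolate"):
                errors.append(f"group_policy: must be 'none' or 'isolate', got {value!r}")
        elif key.startswith(("resources", "trait")):
            errors.append(f"{key}: malformed granular key")
        # Other keys (e.g. hw:cpu_policy) are outside this sketch's scope.
    return errors
```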
*** mriedem is now known as mriedem_away | 22:22 | |
*** moshele has joined #openstack-nova | 22:46 | |
openstackgerrit | Eric Fried proposed openstack/nova master: [placement] Regex consts for placement schema https://review.openstack.org/591863 | 23:05 |
openstackgerrit | Eric Fried proposed openstack/nova master: [placement] Add /reshaper handler for POST https://review.openstack.org/576927 | 23:05 |
openstackgerrit | Eric Fried proposed openstack/nova master: reshaper: Look up provider if not in inventories https://review.openstack.org/585033 | 23:05 |
openstackgerrit | Eric Fried proposed openstack/nova master: Make get_allocations_for_resource_provider raise https://review.openstack.org/584598 | 23:05 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: Real get_allocs_for_consumer https://review.openstack.org/584599 | 23:05 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: get_allocations_for_provider_tree https://review.openstack.org/584648 | 23:05 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: _reshape helper, placement min bump https://review.openstack.org/585034 | 23:05 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: update_from_provider_tree w/reshape https://review.openstack.org/585049 | 23:05 |
openstackgerrit | Eric Fried proposed openstack/nova master: Compute: Handle reshaped provider trees https://review.openstack.org/576236 | 23:05 |
*** erlon has joined #openstack-nova | 23:06 | |
*** r-daneel has quit IRC | 23:09 | |
*** erlon has quit IRC | 23:22 | |
* melwitt will bbl | 23:28 | |
*** Kevin_Zheng has joined #openstack-nova | 23:35 | |
*** macza has quit IRC | 23:37 | |
openstackgerrit | Brin Zhang proposed openstack/nova-specs master: Resource retrieving: add change-before filter https://review.openstack.org/591976 | 23:42 |
*** mlavalle has quit IRC | 23:42 | |
openstackgerrit | Dan Smith proposed openstack/nova master: Batch results per cell when doing cross-cell listing https://review.openstack.org/592698 | 23:48 |
openstackgerrit | Dan Smith proposed openstack/nova master: List instances from all cells explicitly https://review.openstack.org/593717 | 23:48 |
openstackgerrit | Dan Smith proposed openstack/nova master: Make instance_list perform per-cell batching https://review.openstack.org/593131 | 23:48 |
openstackgerrit | Dan Smith proposed openstack/nova master: Record cell success/failure/timeout in CrossCellLister https://review.openstack.org/594265 | 23:48 |
openstackgerrit | Dan Smith proposed openstack/nova master: Make CELL_TIMEOUT a constant https://review.openstack.org/594570 | 23:48 |
openstackgerrit | Dan Smith proposed openstack/nova master: Stash the cell uuid on the context when targeting https://review.openstack.org/594571 | 23:48 |
openstackgerrit | Dan Smith proposed openstack/nova master: Make RecordWrapper record RequestContext and expose cell_uuid https://review.openstack.org/594572 | 23:48 |
*** gbarros has quit IRC | 23:49 | |
*** gbarros has joined #openstack-nova | 23:51 |