*** ttsiouts has quit IRC | 00:03 | |
*** itlinux_ has joined #openstack-nova | 00:04 | |
*** rcernin has quit IRC | 00:07 | |
*** itlinux has quit IRC | 00:07 | |
*** frankwang has joined #openstack-nova | 00:08 | |
*** samueldmq has quit IRC | 00:12 | |
*** markvoelker has quit IRC | 00:31 | |
*** brinzhang has joined #openstack-nova | 00:34 | |
*** ttsiouts has joined #openstack-nova | 00:35 | |
*** gyee has quit IRC | 00:39 | |
*** ttsiouts has quit IRC | 00:40 | |
*** sapd1_x has quit IRC | 00:41 | |
*** rcernin has joined #openstack-nova | 01:05 | |
*** ykarel has joined #openstack-nova | 01:11 | |
*** ykarel has quit IRC | 01:15 | |
*** ttsiouts has joined #openstack-nova | 01:15 | |
*** bhagyashris has joined #openstack-nova | 01:19 | |
*** itlinux_ has quit IRC | 01:26 | |
*** itlinux has joined #openstack-nova | 01:30 | |
*** cfriesen has quit IRC | 01:47 | |
*** minmin has joined #openstack-nova | 01:48 | |
*** ttsiouts has quit IRC | 01:49 | |
*** ykarel has joined #openstack-nova | 01:55 | |
*** itlinux has quit IRC | 01:56 | |
openstackgerrit | melanie witt proposed openstack/nova master: Use instance mappings to count server group members https://review.opendev.org/638324 | 02:03 |
---|---|---|
openstackgerrit | melanie witt proposed openstack/nova master: Add documentation for counting quota usage from placement https://review.opendev.org/653845 | 02:03 |
*** itlinux has joined #openstack-nova | 02:04 | |
*** altlogbot_0 has quit IRC | 02:12 | |
*** altlogbot_2 has joined #openstack-nova | 02:13 | |
*** tbachman has quit IRC | 02:14 | |
*** ykarel has quit IRC | 02:15 | |
*** markvoelker has joined #openstack-nova | 02:32 | |
*** gmann has joined #openstack-nova | 02:42 | |
openstackgerrit | Merged openstack/nova stable/stein: Disable limit if affinity(anti)/same(different)host is requested https://review.opendev.org/659239 | 02:42 |
*** tbachman has joined #openstack-nova | 02:46 | |
*** ttsiouts has joined #openstack-nova | 02:55 | |
*** markvoelker has quit IRC | 03:06 | |
*** nicolasbock has quit IRC | 03:09 | |
*** whoami-rajat has joined #openstack-nova | 03:11 | |
*** dave-mccowan has quit IRC | 03:14 | |
yaawang | bauzas: Hi, in spec expose-auto-converge-post-copy(https://review.opendev.org/#/c/651681/), what do you mean by "preffering"? | 03:23 |
*** ykarel has joined #openstack-nova | 03:26 | |
*** ttsiouts has quit IRC | 03:29 | |
*** psachin has joined #openstack-nova | 03:31 | |
*** ricolin has joined #openstack-nova | 03:31 | |
*** itlinux has quit IRC | 03:40 | |
*** itlinux has joined #openstack-nova | 03:43 | |
*** boxiang has joined #openstack-nova | 03:59 | |
*** frankwang has quit IRC | 04:00 | |
*** ttsiouts has joined #openstack-nova | 04:02 | |
*** markvoelker has joined #openstack-nova | 04:02 | |
*** itlinux has quit IRC | 04:03 | |
*** itlinux has joined #openstack-nova | 04:06 | |
*** ttsiouts has quit IRC | 04:07 | |
*** ykarel has quit IRC | 04:08 | |
*** tbachman has quit IRC | 04:14 | |
*** boxiang has quit IRC | 04:15 | |
*** boxiang has joined #openstack-nova | 04:16 | |
*** ykarel has joined #openstack-nova | 04:24 | |
*** markvoelker has quit IRC | 04:35 | |
*** ttsiouts has joined #openstack-nova | 04:39 | |
*** itlinux has quit IRC | 04:42 | |
*** itlinux has joined #openstack-nova | 04:49 | |
openstackgerrit | Sundar Nadathur proposed openstack/nova-specs master: Nova Cyborg interaction specification. https://review.opendev.org/603955 | 04:58 |
*** itlinux has quit IRC | 05:04 | |
*** itlinux has joined #openstack-nova | 05:05 | |
*** pcaruana has joined #openstack-nova | 05:08 | |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: ksa auth conf and client for cyborg access https://review.opendev.org/631242 | 05:10 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: WIP: Add Cyborg device profile groups to request spec. https://review.opendev.org/631243 | 05:10 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: WIP: Create and bind Cyborg ARQs. https://review.opendev.org/631244 | 05:10 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: WIP: Get resolved Cyborg ARQs and add PCI BDFs to VM's domain XML. https://review.opendev.org/631245 | 05:10 |
*** ttsiouts has quit IRC | 05:11 | |
*** pcaruana has quit IRC | 05:16 | |
*** dpawlik has joined #openstack-nova | 05:16 | |
*** pcaruana has joined #openstack-nova | 05:17 | |
*** sapd1_x has joined #openstack-nova | 05:23 | |
*** markvoelker has joined #openstack-nova | 05:32 | |
*** tkajinam has quit IRC | 05:32 | |
*** tkajinam has joined #openstack-nova | 05:34 | |
*** sapd1_x has quit IRC | 05:35 | |
*** imacdonn has quit IRC | 05:42 | |
*** imacdonn has joined #openstack-nova | 05:42 | |
*** frankwang has joined #openstack-nova | 05:43 | |
*** itlinux has quit IRC | 05:44 | |
*** tbachman has joined #openstack-nova | 05:44 | |
*** lpetrut has joined #openstack-nova | 05:45 | |
*** lpetrut has quit IRC | 05:46 | |
*** lpetrut has joined #openstack-nova | 05:46 | |
*** tbachman has quit IRC | 05:49 | |
eandersson | What is the general recommended value for heal_instance_info_cache_interval on a large deployment (~1k nodes)? | 05:49 |
eandersson | With the default we couldn't scale neutron fast enough to actually handle the load. | 05:50 |
*** Luzi has joined #openstack-nova | 06:01 | |
*** sapd1_x has joined #openstack-nova | 06:05 | |
*** markvoelker has quit IRC | 06:05 | |
*** kaisers has quit IRC | 06:13 | |
*** kaisers has joined #openstack-nova | 06:16 | |
openstackgerrit | Boxiang Zhu proposed openstack/nova master: Add compute_nodes_uuid field to Destination object https://review.opendev.org/661188 | 06:18 |
*** sapd1_x has quit IRC | 06:18 | |
*** stakeda has joined #openstack-nova | 06:19 | |
*** ttsiouts has joined #openstack-nova | 06:27 | |
mnaser | melwitt, efried: wrt nova-consoleauth, we've killed it in production (and we've killed it in OSA and looks like Matt proposed a change to remove the cleanup code) -- https://review.opendev.org/#/c/661126/ | 06:31 |
*** slaweq has joined #openstack-nova | 06:35 | |
openstackgerrit | Leehom Li proposed openstack/nova master: RT keep report usage in case CPUPinning conflict https://review.opendev.org/661208 | 06:42 |
openstackgerrit | Leehom Li proposed openstack/nova master: RT keep report usage in case CPUPinning conflict https://review.opendev.org/661208 | 06:44 |
*** ivve has joined #openstack-nova | 06:52 | |
*** dpawlik has quit IRC | 06:56 | |
*** ttsiouts has quit IRC | 06:58 | |
*** markvoelker has joined #openstack-nova | 07:03 | |
*** ratailor has joined #openstack-nova | 07:06 | |
*** awalende has joined #openstack-nova | 07:08 | |
*** minmin has quit IRC | 07:11 | |
*** minmin has joined #openstack-nova | 07:12 | |
*** tssurya has joined #openstack-nova | 07:15 | |
jangutter | If only this comic came out at the start of the month: https://xkcd.com/2153/ | 07:24 |
*** dpawlik has joined #openstack-nova | 07:26 | |
*** gmann has quit IRC | 07:28 | |
*** ttsiouts has joined #openstack-nova | 07:30 | |
*** dtantsur|afk is now known as dtantsur | 07:30 | |
*** donnyd has quit IRC | 07:34 | |
*** icey has quit IRC | 07:34 | |
*** markvoelker has quit IRC | 07:35 | |
*** mnaser has quit IRC | 07:35 | |
*** helenafm has joined #openstack-nova | 07:36 | |
*** fyx has quit IRC | 07:36 | |
*** rcernin has quit IRC | 07:36 | |
*** guilhermesp has quit IRC | 07:37 | |
*** seyeongkim has quit IRC | 07:39 | |
openstackgerrit | Yongli He proposed openstack/nova master: clean up orphan instances https://review.opendev.org/627765 | 07:39 |
*** jungleboyj has quit IRC | 07:40 | |
*** tkajinam has quit IRC | 07:41 | |
*** fyx has joined #openstack-nova | 07:43 | |
*** jungleboyj has joined #openstack-nova | 07:43 | |
*** seyeongkim has joined #openstack-nova | 07:43 | |
*** guilhermesp has joined #openstack-nova | 07:43 | |
*** mnaser has joined #openstack-nova | 07:43 | |
*** ralonsoh has joined #openstack-nova | 07:50 | |
*** mnaser has quit IRC | 08:02 | |
*** tkajinam has joined #openstack-nova | 08:04 | |
*** jungleboyj has quit IRC | 08:04 | |
*** mnaser has joined #openstack-nova | 08:05 | |
*** geekinutah has quit IRC | 08:06 | |
*** jungleboyj has joined #openstack-nova | 08:07 | |
*** bbowen_ has joined #openstack-nova | 08:07 | |
*** rpittau|afk is now known as rpittau | 08:08 | |
*** rm_work has quit IRC | 08:09 | |
*** rm_work has joined #openstack-nova | 08:10 | |
*** geekinutah has joined #openstack-nova | 08:10 | |
*** fyx has quit IRC | 08:10 | |
*** bbowen has quit IRC | 08:11 | |
*** boxiang has quit IRC | 08:11 | |
*** boxiang has joined #openstack-nova | 08:11 | |
*** fyx has joined #openstack-nova | 08:13 | |
*** yankcrime has joined #openstack-nova | 08:16 | |
*** icey has joined #openstack-nova | 08:17 | |
*** zbr has joined #openstack-nova | 08:20 | |
*** ykarel is now known as ykarel|lunch | 08:21 | |
lyarwood | ganso: LGTM but as tonyb said, as this is stable-only we typically get full Nova Core members to also sign off on things | 08:25 |
*** davidsha has joined #openstack-nova | 08:26 | |
lyarwood | dansmith / efried ; https://review.opendev.org/#/c/659338 - as above, as this is stable-only would you mind giving this a once over? | 08:26 |
lyarwood | mdbooth: https://review.opendev.org/#/c/658903/ - tgif, am I missing the point here? | 08:28 |
mdbooth | lyarwood: looking | 08:28 |
lyarwood | ah nvm I think I see it now | 08:31 |
lyarwood | they want to pull directly from rbd when using a file based backend | 08:31 |
*** awalende has quit IRC | 08:32 | |
*** awalende has joined #openstack-nova | 08:32 | |
*** markvoelker has joined #openstack-nova | 08:32 | |
*** trident has quit IRC | 08:33 | |
mdbooth | lyarwood: Right | 08:34 |
*** trident has joined #openstack-nova | 08:34 | |
*** ttsiouts has quit IRC | 08:35 | |
mdbooth | lyarwood: I hesitate to say it, but I think we want an os-glance library for this | 08:35 |
mdbooth | Otherwise we're embedding details of glance backend drivers in nova | 08:36 |
mdbooth | I suspect cinder would also use it | 08:36 |
lyarwood | we already have rbd_utils.py | 08:36 |
mdbooth | That's generic | 08:36 |
mdbooth | And glance presumably has other backend drivers | 08:36 |
lyarwood | as is this stuff tbh | 08:36 |
mdbooth | which aren't ceph | 08:37 |
lyarwood | there's nothing glance specific you'd need here | 08:37 |
lyarwood | you already have the location | 08:37 |
*** awalende has quit IRC | 08:37 | |
lyarwood | if anything rbd_utils etc could all live in os-brick | 08:37 |
mdbooth | lyarwood: See: file, http, rbd, swift, sheepdog, cinder, vmware | 08:38 |
openstackgerrit | Boxiang Zhu proposed openstack/nova master: Add valid compute nodes with host and/or node in api layer https://review.opendev.org/661237 | 08:42 |
*** tesseract has joined #openstack-nova | 08:48 | |
lyarwood | mdbooth: right, I just don't think we need yet another lib for this | 08:53 |
*** janki has joined #openstack-nova | 08:53 | |
*** ccamacho has joined #openstack-nova | 08:56 | |
*** derekh has joined #openstack-nova | 08:57 | |
*** bhagyashris has quit IRC | 09:03 | |
*** markvoelker has quit IRC | 09:06 | |
*** ttsiouts has joined #openstack-nova | 09:11 | |
*** ykarel|lunch is now known as ykarel | 09:11 | |
*** sapd1_x has joined #openstack-nova | 09:14 | |
*** ttsiouts has quit IRC | 09:15 | |
*** tkajinam has quit IRC | 09:36 | |
*** cdent has joined #openstack-nova | 09:44 | |
*** lpetrut has quit IRC | 09:46 | |
*** sapd1_x has quit IRC | 09:46 | |
*** ttsiouts has joined #openstack-nova | 09:52 | |
*** awalende has joined #openstack-nova | 09:52 | |
*** awalende has quit IRC | 09:56 | |
*** tesseract has quit IRC | 09:56 | |
*** tesseract has joined #openstack-nova | 09:59 | |
*** awalende has joined #openstack-nova | 09:59 | |
*** tesseract has quit IRC | 10:00 | |
*** tesseract has joined #openstack-nova | 10:01 | |
*** markvoelker has joined #openstack-nova | 10:03 | |
openstackgerrit | Silvan Kaiser proposed openstack/nova stable/stein: Added mount fstype based validation of Quobyte mounts https://review.opendev.org/660706 | 10:08 |
openstackgerrit | Merged openstack/nova stable/stein: Fix target used in nova.policy.check_is_admin https://review.opendev.org/660330 | 10:23 |
openstackgerrit | Merged openstack/nova stable/queens: Define irrelevant-files for tempest-full-py3 job https://review.opendev.org/650460 | 10:23 |
openstackgerrit | Merged openstack/nova stable/queens: Error out migration when confirm_resize fails https://review.opendev.org/652150 | 10:23 |
*** ttsiouts has quit IRC | 10:25 | |
openstackgerrit | Merged openstack/nova stable/queens: Teardown networking when rolling back live migration even if shared disk https://review.opendev.org/658149 | 10:32 |
openstackgerrit | Merged openstack/nova master: Link versioned notification talk into docs https://review.opendev.org/661115 | 10:32 |
*** markvoelker has quit IRC | 10:35 | |
*** panda has quit IRC | 10:38 | |
*** stakeda has quit IRC | 10:40 | |
*** brinzhang has quit IRC | 10:42 | |
*** boxiang has quit IRC | 10:45 | |
*** panda has joined #openstack-nova | 10:46 | |
*** luksky has quit IRC | 10:48 | |
*** panda is now known as panda|rover | 10:48 | |
openstackgerrit | Adam Spiers proposed openstack/nova master: Move patch_exists() to nova.test.TestCase for reuse https://review.opendev.org/660500 | 10:57 |
*** gmann has joined #openstack-nova | 11:02 | |
openstackgerrit | Adam Spiers proposed openstack/nova master: Move patch_exists() to nova.test.TestCase for reuse https://review.opendev.org/660500 | 11:07 |
*** tesseract has quit IRC | 11:10 | |
*** luksky has joined #openstack-nova | 11:17 | |
*** psachin has quit IRC | 11:29 | |
*** markvoelker has joined #openstack-nova | 11:32 | |
*** ttsiouts has joined #openstack-nova | 11:33 | |
*** cdent has quit IRC | 11:35 | |
*** cdent has joined #openstack-nova | 11:36 | |
*** psachin has joined #openstack-nova | 11:38 | |
lyarwood | has anyone ever seen LM try to claim resources on the source at the start of a migration? | 11:39 |
lyarwood | IHAC suggesting that's happening but that seems totally wrong | 11:39 |
*** cdent has quit IRC | 11:40 | |
*** dave-mccowan has joined #openstack-nova | 11:46 | |
*** ratailor has quit IRC | 11:47 | |
*** ratailor has joined #openstack-nova | 11:48 | |
*** amorin has quit IRC | 11:49 | |
*** dave-mccowan has quit IRC | 11:50 | |
*** amorin has joined #openstack-nova | 11:51 | |
*** cdent has joined #openstack-nova | 11:55 | |
*** markvoelker has quit IRC | 11:57 | |
*** xek has joined #openstack-nova | 12:02 | |
openstackgerrit | Adam Spiers proposed openstack/nova master: Move patch_exists() to nova.test.TestCase for reuse https://review.opendev.org/660500 | 12:03 |
*** jaosorior has joined #openstack-nova | 12:06 | |
*** ttsiouts has quit IRC | 12:06 | |
*** psachin has quit IRC | 12:08 | |
efried | stephenfin: in case you didn't see http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-05-24.log.html#t2019-05-24T06:31:39 | 12:09 |
openstackgerrit | Adam Spiers proposed openstack/nova master: Move patch_exists() to nova.test.TestCase for reuse https://review.opendev.org/660500 | 12:09 |
*** derekh has quit IRC | 12:15 | |
aspiers | god sometimes Python really pisses me off | 12:15 |
aspiers | Python 2/3 unicode stuff is a never-ending nightmare | 12:16 |
aspiers | I'm literally seeing heisenbugs with it | 12:17 |
aspiers | one minute everything's fine, the next it magically breaks | 12:17 |
cdent | i've never met a language that does it in a way that makes me happy. python is one of many that make me want to flip tables | 12:17 |
cdent | aspiers: bugs on your own machine or remote nodes? if the latter there are so many variables involved (system encoding, python build, blah blah) | 12:17 |
aspiers | local | 12:18 |
* cdent flips all the tables | 12:18 | |
aspiers | I'm on the verge | 12:18 |
aspiers | I just want to slurp in the contents of __file__ (sounds weird I know, but there is a good reason) and then do a substring match on it | 12:18 |
aspiers | if I try to decode('utf-8') on each line I get UnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 1: unexpected end of data | 12:19 |
aspiers | despite it being an ASCII file (nova/nova/tests/unit/test_test.py) | 12:19 |
*** dims has left #openstack-nova | 12:20 | |
aspiers | ascii is no better: UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 1: ordinal not in range(128) | 12:20 |
aspiers | WTF!! I literally ran the test twice in a row and it only failed the second time!! | 12:24 |
aspiers | OK, I've spotted a pattern | 12:27 |
aspiers | It always passes on py36 | 12:27 |
aspiers | It *only* passes on py27 if I run it after editing and saving the file | 12:28 |
aspiers | If I re-run it, subsequent runs fail | 12:28 |
* aspiers <- head explodes | 12:28 | |
aspiers | WT actual F | 12:28 |
*** eharney has joined #openstack-nova | 12:30 | |
cdent | if __file__ always the .py file and never the .pyc file? | 12:31 |
aspiers | ohhhhhhh | 12:31 |
cdent | s/if/is/ | 12:31 |
aspiers | you genius | 12:31 |
aspiers | that will be it | 12:31 |
cdent | it was a guess | 12:31 |
aspiers | that would perfectly explain it though | 12:32 |
cdent | my genius is never expecting things to be sensible | 12:32 |
aspiers | haha | 12:32 |
cdent | "what if it's this totally stupid thing that shouldn't happen" | 12:32 |
aspiers | my folly is always expecting things to be sensible | 12:32 |
cdent | how long have you been in this business ? :) | 12:32 |
aspiers | (except in politics. I'm not *that* stupid) | 12:32 |
*** ttsiouts has joined #openstack-nova | 12:35 | |
aspiers | cdent: works! __file__.rstrip('c') | 12:35 |
cdent | wacky | 12:35 |
aspiers | you just saved me bashing my head over this for hours - THANK YOU | 12:35 |
cdent | you're very welcome | 12:35 |
aspiers | no amount of rubber-ducking would have helped on that one, I fear | 12:36 |
* cdent quacks | 12:36 | |
aspiers | :) | 12:36 |
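
The `.pyc` pitfall from the exchange above can be sketched as follows. This is a minimal illustration, not the code from the review: `source_path` and `file_contains` are hypothetical helper names, and the sketch assumes the only problem is that `__file__` may name the compiled file on Python 2.

```python
import os


def source_path(module_file):
    """Return the .py path for a module's __file__ value.

    On Python 2, __file__ may point at the compiled .pyc/.pyo file, whose
    bytes are not decodable as text. This hypothetical helper maps those
    back to the .py source and leaves every other path unchanged.
    """
    root, ext = os.path.splitext(module_file)
    if ext in ('.pyc', '.pyo'):
        return root + '.py'
    return module_file


def file_contains(module_file, needle):
    """Substring-match against the module's source text (assumed UTF-8)."""
    with open(source_path(module_file), 'rb') as f:
        return needle in f.read().decode('utf-8')
```

Compared with `__file__.rstrip('c')`, checking the extension explicitly avoids mangling a path that merely ends in `c`; for example `'foo.c'.rstrip('c')` yields `'foo.'`.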
openstackgerrit | Adam Spiers proposed openstack/nova master: Move selective patching of open() to nova.test for reuse https://review.opendev.org/661266 | 12:38 |
aspiers | cdent: ^^^ | 12:38 |
aspiers | BTW I think I fixed the contextlib issue in the parent of that one | 12:39 |
*** ttsiouts has quit IRC | 12:39 | |
cdent | \o/ | 12:41 |
efried | tonyb: done | 12:51 |
aspiers | cdent: just noticed some functional tests are calling os.path.exists('/etc/policy.d') | 12:54 |
aspiers | presumably that's not good in a functional test environment | 12:55 |
cdent | There's a few bleeds like that in the nova functional tests | 12:55 |
aspiers | also .netrc, yikes | 12:55 |
cdent | I remember there were some (since fixed, I think) which would change behavior by using /etc/nova/nova.conf | 12:56 |
aspiers | my new helpers might help clean some of those cases up | 12:58 |
*** tbachman has joined #openstack-nova | 13:05 | |
openstackgerrit | Silvan Kaiser proposed openstack/nova stable/stein: Fixes multi-registry config in Quobyte driver https://review.opendev.org/660706 | 13:06 |
*** jistr is now known as jistr|afk | 13:11 | |
*** mriedem has joined #openstack-nova | 13:12 | |
*** ttsiouts has joined #openstack-nova | 13:12 | |
*** awalende has quit IRC | 13:16 | |
*** awalende has joined #openstack-nova | 13:16 | |
*** awalende_ has joined #openstack-nova | 13:20 | |
*** awalende has quit IRC | 13:21 | |
*** dpawlik has quit IRC | 13:23 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Clarify --before help text in nova manage https://review.opendev.org/661289 | 13:24 |
*** awalende_ has quit IRC | 13:24 | |
*** boxiang has joined #openstack-nova | 13:29 | |
*** ykarel is now known as ykarel|meeting | 13:29 | |
stephenfin | efried: I did not. Thanks for the heads up :) | 13:29 |
*** luksky has quit IRC | 13:31 | |
stephenfin | efried: So TripleO is being handled (mschuppert is on the case) and I'm working through the Kolla change. That's pretty much all we need, right? | 13:35 |
*** cmart has quit IRC | 13:35 | |
efried | stephenfin: IIUC from yesterday's meeting, yes. | 13:37 |
lyarwood | mriedem: prior to https://review.opendev.org/#/q/topic:bug/1469179+(status:open+OR+status:merged) the only real way to avoid over allocation issues when booting from volumes was to use flavors with a disk size of 0 right? | 13:37 |
efried | stephenfin: Assume you caught up on those logs (which prompted your ML thread)? | 13:38 |
*** ttsiouts has quit IRC | 13:40 | |
*** tbachman has quit IRC | 13:40 | |
*** ratailor has quit IRC | 13:40 | |
*** ttsiouts has joined #openstack-nova | 13:40 | |
mriedem | lyarwood: sounds right | 13:49 |
openstackgerrit | Dan Smith proposed openstack/nova master: Make nova-next archive using --before https://review.opendev.org/661002 | 13:49 |
*** mlavalle has joined #openstack-nova | 13:50 | |
mriedem | can i get a novaclient core to review https://review.opendev.org/#/c/659886/ since it's holding up a release? both tssurya and i have things in the upcoming novaclient release that are needed for enabling changes in other projects (osc and watcher) | 13:50 |
*** lorenjan has quit IRC | 13:50 | |
*** betherly has joined #openstack-nova | 13:55 | |
efried | dansmith: | 13:55 |
efried | efried@efried-ThinkPad-W520:~/openstack/nova$ f() { | 13:55 |
efried | > echo $1 | 13:55 |
efried | > } | 13:55 |
efried | efried@efried-ThinkPad-W520:~/openstack/nova$ f $(date) | 13:55 |
efried | Fri | 13:55 |
efried | efried@efried-ThinkPad-W520:~/openstack/nova$ f "$(date)" | 13:55 |
efried | Fri May 24 08:53:45 CDT 2019 | 13:55 |
efried | unless argparse is doing something magical... | 13:55 |
dansmith | efried: what? | 13:56 |
*** jistr|afk is now known as jistr | 13:56 | |
efried | see comment on https://review.opendev.org/661002 | 13:56 |
aspiers | Yeah, that comment looks right | 13:56 |
dansmith | oh yeah | 13:56 |
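
The quoting issue efried demonstrates in the shell can also be seen from the parser's side. The sketch below assumes a plain `argparse` parser with a single-valued `--before` option; the parser name is made up, and the real nova-manage option handling may differ.

```python
import argparse

# Hypothetical stand-in parser: --before takes exactly one value.
parser = argparse.ArgumentParser(prog='archive-sketch')
parser.add_argument('--before')

# Unquoted $(date): the shell splits the output into separate words.
unquoted = ['--before', 'Fri', 'May', '24', '08:53:45', 'CDT', '2019']
# Quoted "$(date)": the whole string arrives as a single argument.
quoted = ['--before', 'Fri May 24 08:53:45 CDT 2019']

# parse_known_args returns the leftover words instead of exiting on them.
args_unquoted, extras = parser.parse_known_args(unquoted)
args_quoted, no_extras = parser.parse_known_args(quoted)

print(args_unquoted.before, extras)
print(args_quoted.before, no_extras)
```

In the unquoted case only the first word reaches `--before`, and the remaining words are left over as unrecognized arguments, so nothing magical happens in the parser.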
openstackgerrit | Dan Smith proposed openstack/nova master: Make nova-next archive using --before https://review.opendev.org/661002 | 13:57 |
aspiers | efried: BTW I can't believe you are still using a W520 | 13:57 |
aspiers | I mean, I am too, but only as a VPN server hidden under my desk | 13:57 |
efried | aspiers: Heh, it was my IBM-provided workstation, did a nice SSD and memory upgrade, it was really stable and solid, so I bought it back when I left. | 13:58 |
aspiers | Mine weighed a ton and the battery used to last 2 hours on a good day | 13:58 |
efried | My Intel-provided HP thingy has roughly equivalent specs, weighs about a quarter as much... and performs like a dog. | 13:58 |
efried | ...and won't run not-Windows. | 13:58 |
efried | (because of Intel gorp) | 13:59 |
efried | (without major hoops of fire) | 13:59 |
*** ttsiouts has quit IRC | 13:59 | |
*** betherly has quit IRC | 14:00 | |
*** whoami-rajat has quit IRC | 14:00 | |
*** hemna has quit IRC | 14:05 | |
openstackgerrit | Silvan Kaiser proposed openstack/nova stable/stein: Fixes multi-registry config in Quobyte driver https://review.opendev.org/660706 | 14:05 |
*** hemna has joined #openstack-nova | 14:05 | |
*** hongbin has joined #openstack-nova | 14:06 | |
*** tbachman has joined #openstack-nova | 14:08 | |
mriedem | efried: we might want to see https://review.opendev.org/#/c/661002/ working in action before +2ing :) | 14:10 |
efried | mriedem: It worked at a previous PS | 14:11 |
efried | as in, didn't blow up | 14:11 |
efried | syntax weirdness notwithstanding | 14:11 |
efried | i.e. archived rows older than [earlier date than intended] | 14:11 |
efried | presumably all the rows in that job are newer than "yesterday" anyway, so the change is a no-op? | 14:12 |
mriedem | http://logs.openstack.org/02/661002/1/check/nova-next/d88596c/logs/devstack-gate-post_test_hook.txt.gz#_2019-05-23_23_53_30_031 | 14:12 |
mriedem | nova-manage: error: unrecognized arguments: --max-rows | 14:12 |
efried | oh, that doesn't cause the test to fail? Guess that should be fixed :) | 14:12 |
mriedem | we should probably have a set -e around this b/c it's blowing up but not failing | 14:12 |
efried | okay | 14:13 |
*** bnemec is now known as beekneemech | 14:14 | |
*** stephenfin is now known as finucannot | 14:14 | |
mriedem | set -e in a separate change so we can backport it imo | 14:14 |
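
The "blowing up but not failing" behaviour has a direct analogue outside shell scripts. The sketch below is a Python illustration of the same idea, not the post-test hook itself: `failing_step` is a made-up stand-in for a command that exits non-zero, and `check=True` plays the role that `set -e` would play in the hook.

```python
import subprocess
import sys

# Stand-in for a hook step that fails, e.g. a CLI rejecting an argument.
failing_step = [sys.executable, '-c', 'import sys; sys.exit(2)']

# Without checking, the non-zero exit is recorded but nothing stops,
# much like a shell script running without set -e.
result = subprocess.run(failing_step)
ignored_code = result.returncode

# With check=True, a non-zero exit raises and would abort the job.
try:
    subprocess.run(failing_step, check=True)
    aborted = False
except subprocess.CalledProcessError as exc:
    aborted = True
    aborted_code = exc.returncode
```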
finucannot | efried: I did, yup, cheers. The kolla patch has a +2 already too. Anything else needed to get that approved now? :) | 14:15 |
openstackgerrit | Dan Smith proposed openstack/nova master: Make nova-next archive using --before https://review.opendev.org/661002 | 14:21 |
*** _erlon_ has joined #openstack-nova | 14:22 | |
*** jaosorior has quit IRC | 14:23 | |
*** Anticime1 is now known as Anticimex | 14:26 | |
melwitt | dansmith: do you recall whether we exclude certain exceptions from incrementing the build_failed counter for the auto-disabling of computes? I thought it was discussed in the past but looking at the code I don't see anything like that https://github.com/openstack/nova/blob/stable/queens/nova/compute/manager.py#L1789 | 14:30 |
dansmith | yes, there are two classes | 14:31 |
dansmith | so things we catch and handle in a certain way don't count, | 14:32 |
dansmith | and anything we don't catch counts | 14:32 |
dansmith | and then some gray in the middle | 14:32 |
dansmith | and part of the problem is we aren't nearly as good at distinguishing as we thought we were | 14:33 |
dansmith | hmm, although looking again, maybe we lumped them all together in the end after that realization | 14:34 |
*** jchhatbar has joined #openstack-nova | 14:34 | |
melwitt | yeah, I don't remember this, I thought there was some kind of whitelist that wouldn't increment the counter but not seeing it now | 14:34 |
dansmith | I think it was the failed/rescheduled result | 14:35 |
dansmith | but we increment for both now | 14:35 |
melwitt | I see | 14:35 |
*** janki has quit IRC | 14:36 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Clarify --before help text in nova manage https://review.opendev.org/661289 | 14:36 |
mriedem | there is no whitelist | 14:36 |
mriedem | we talked about it | 14:36 |
mriedem | but it got out of hand | 14:36 |
mriedem | i had a wip patch for a bit for one type of exception but it was gross and dropped it | 14:37 |
*** liuyulong has joined #openstack-nova | 14:37 | |
mriedem | there are about a billion ways you can get an exception in that build path in the compute | 14:37 |
dansmith | yeah, that's what I mean by we thought it would be clearer which things are terminal failures or not, but it's not | 14:37 |
mriedem | and since it's python and not strictly typing any of that stuff, we can't really just catch one | 14:37 |
dansmith | well, I dunno about _that_ :) | 14:38 |
mriedem | well, meaning like in java where you have to declare and explicitly handle exceptions that get raised up during compiling | 14:39 |
dansmith | ah sure | 14:39 |
dansmith | i.e. "the most annoying part of java" | 14:39 |
*** ttsiouts has joined #openstack-nova | 14:39 | |
mriedem | zvm driver could add a new MainframeKaputError to raise and we'd not handle it | 14:39 |
mriedem | if you're ocd like me it's not so bad :) | 14:39 |
mriedem | i'd like some credit for my witty use of a german word in that zvm joke please... | 14:40 |
melwitt | ok, this is sounding familiar now, I vaguely remember your wip patch about it | 14:40 |
melwitt | oh yeah, kaputt, nicely done | 14:40 |
mriedem | melwitt: so is your customer on just a release that is too old to have the weigher? | 14:41 |
efried | finucannot: If two out of three of melwitt, mriedem, and sean-k-mooney ack it, I'll be happy to approve it (nova-consoleauth removal bp) | 14:41 |
melwitt | no, they have the weigher, it's just incrementing the counter for user-caused build failures we think | 14:41 |
dansmith | that's the whole thing | 14:42 |
dansmith | that's why we moved it to a weigher | 14:42 |
dansmith | because we suck so bad at knowing why a thing failed that auto-disable isn't really sane | 14:42 |
mriedem | melwitt: malicious user? | 14:42 |
dansmith | so we made it a weigher so it doesn't take it out of rotation and you can adjust how much score you give to that decision | 14:42 |
melwitt | mriedem: I doubt it but not sure | 14:42 |
mriedem | so jed at the bank wrote a script to boot 100 vms at once and it's DoSing their private cloud | 14:43 |
melwitt | dansmith: got it. yeah, IIUC this customer is trying to pack instances and they see it start spreading instances when they don't want it to, because of the user-caused failures | 14:43 |
*** jchhatbar has quit IRC | 14:44 | |
*** ttsiouts has quit IRC | 14:44 | |
dansmith | melwitt: right so null out that weigher and move on | 14:44 |
*** ivve has quit IRC | 14:45 | |
melwitt | mdbooth: turns out I was mistaken. we had tried some exception whitelist patches but it got unwieldy, so that idea was abandoned. the recommended way to handle this scenario is to disable the failed build weigher | 14:45 |
mdbooth | melwitt: Ack, thanks. | 14:46 |
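
The trade-off discussed above (penalize hosts with failed builds by weight rather than disabling them, and zero the weight when packing matters more) can be sketched like this. It is a toy model under stated assumptions, not nova's weigher: the function names, the dictionary input, and the alphabetical tie-break standing in for a packing preference are all hypothetical.

```python
def failure_penalty(failed_builds, multiplier):
    """Map each host to a penalty proportional to its failed build count.

    failed_builds: dict of host name -> failed build count (hypothetical).
    multiplier: how strongly failures count; 0 disables the penalty.
    """
    return {host: multiplier * count for host, count in failed_builds.items()}


def pick_host(failed_builds, multiplier):
    """Pick the host with the lowest penalty.

    Ties go to the first name alphabetically, which stands in for a
    packing preference in this sketch.
    """
    penalties = failure_penalty(failed_builds, multiplier)
    return min(sorted(penalties), key=penalties.get)


counts = {'host-a': 3, 'host-b': 0}
print(pick_host(counts, 1.0))  # failures push placement away from host-a
print(pick_host(counts, 0.0))  # penalty disabled: packing preference wins
```

With a non-zero multiplier, user-caused failures on `host-a` are enough to spread new instances to `host-b`; with the multiplier at zero, the failure count no longer influences the choice.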
*** cdent has quit IRC | 14:46 | |
*** Sundar has joined #openstack-nova | 14:47 | |
*** ykarel|meeting is now known as ykarel | 14:48 | |
mriedem | once again let me state i dislike the non-border table rendering in our docs now https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#id17 | 14:48 |
gibi | mriedem: do you have an opinion about the CLI syntax proposal in https://review.opendev.org/#/c/651783/3/osc_placement/resources/resource_provider.py@130 ? | 14:51 |
*** boxiang has quit IRC | 14:53 | |
*** KH-Jared has joined #openstack-nova | 14:54 | |
mriedem | sec | 14:54 |
gibi | thanks | 14:54 |
finucannot | mriedem: I'll trade you a fix for that table issue for your ack on my remove-consoleauth bp | 15:02 |
finucannot | been meaning to get to that for weeks now | 15:03 |
*** itlinux has joined #openstack-nova | 15:03 | |
*** markmcclain has joined #openstack-nova | 15:07 | |
*** jangutter has quit IRC | 15:10 | |
*** gyee has joined #openstack-nova | 15:13 | |
*** itlinux has quit IRC | 15:15 | |
aspiers | efried: asking on #openstack-requirements about the contextlib2 issue, I have not much clue what's going on | 15:17 |
*** itlinux has joined #openstack-nova | 15:17 | |
aspiers | bizarrely it's moaning about sphinx too | 15:17 |
aspiers | where's stephenfin when you need him? | 15:17 |
*** hemna has quit IRC | 15:17 | |
*** hemna has joined #openstack-nova | 15:17 | |
finucannot | aspiers: I'm here (it's Friday) | 15:18 |
aspiers | hah | 15:18 |
* aspiers learns his nova IRC nicks | 15:18 | |
openstackgerrit | Arnaud Morin proposed openstack/nova master: Force refresh instance network info on deletion https://review.opendev.org/660761 | 15:18 |
finucannot | What's the specific Sphinx issue? | 15:19 |
finucannot | aspiers: ^ | 15:19 |
aspiers | finucannot: see #openstack-requirements | 15:19 |
*** itlinux has quit IRC | 15:19 | |
*** ttsiouts has joined #openstack-nova | 15:20 | |
finucannot | Man, I wish we'd drop Python 2 this cycle instead of in U :( | 15:20 |
*** luksky has joined #openstack-nova | 15:21 | |
*** itlinux has joined #openstack-nova | 15:21 | |
amorin | hi efried, mriedem and others. I updated the patch to refresh instance network cache on deletion, thanks for commenting/reviewing! | 15:22 |
ganso | dansmith: Hi! When you have a few minutes, could you please take a look at and sign off https://review.opendev.org/#/c/659338/ if everything is ok? thanks in advance =) | 15:23 |
openstackgerrit | Adam Spiers proposed openstack/nova master: Move patch_exists() to nova.test.TestCase for reuse https://review.opendev.org/660500 | 15:24 |
dansmith | ganso: I was going to challenge the directness of your query, but I see tonyb threw me under the bus, so.. okay :) | 15:24 |
mriedem | efried: finucannot: i've put my ack in https://blueprints.launchpad.net/nova/+spec/remove-consoleauth | 15:25 |
*** owalsh has quit IRC | 15:25 | |
mriedem | amorin: questions in there, | 15:30 |
mriedem | but without the force will this even fix the case in your bug? have you tested it without the force_refresh=True? | 15:31 |
mriedem | b/c the point of the force flag was to rebuild the cache from neutron rather than the cache itself | 15:31 |
*** owalsh has joined #openstack-nova | 15:32 | |
*** cdent has joined #openstack-nova | 15:33 | |
amorin | mriedem: good point | 15:33 |
amorin | hum | 15:34 |
amorin | I need to test that, it worked in my dev infra, but I need to double check that | 15:34 |
*** macza has joined #openstack-nova | 15:34 | |
amorin | I won't have time today, but I will do it next week | 15:35 |
mriedem | "it" being the original patch with force_refresh=True? | 15:35 |
mriedem | or the current version of https://review.opendev.org/#/c/660761/3 ? | 15:35 |
mriedem | if terminate is just racing with build and the copy of the instance during terminate just doesn't have the info cache data from the db, then the refresh should fix your problem i think, | 15:36 |
mriedem | but if the info cache in the db is empty when you refresh, that bottom change isn't going to help | 15:36 |
amorin | the cache is populated in the DB | 15:36 |
mriedem | it won't make things worse of course | 15:36 |
mriedem | ok, well in that case you might be saved, | 15:36 |
amorin | but after the copy of instance is used for deletion | 15:37 |
amorin | yup, anyway I will double check that point | 15:37 |
melwitt | dansmith: do you remember where we landed when we were talking about disabling oslo.messaging heartbeats in nova-api, as far as whether we would recommend running that configuration in our upstream docs? | 15:37 |
mriedem | because https://github.com/openstack/nova/blob/976d1b89c2b754729903291050e01c9bf49704b9/nova/network/neutronv2/api.py#L1757 | 15:37 |
sean-k-mooney | efried: sorry was on a call for the last 90 mins | 15:37 |
amorin | _gather_port_ids_and_networks is using the nova cache DB right? | 15:37 |
sean-k-mooney | efried: did ye reach an agreement on the nova-console auth blueprint | 15:37 |
openstackgerrit | Adam Spiers proposed openstack/nova master: Move selective patching of open() to nova.test for reuse https://review.opendev.org/661266 | 15:37 |
melwitt | dansmith: this is about the idle nova-api wsgi app heartbeat error messages stuff | 15:38 |
mriedem | amorin: yeah | 15:38 |
amorin | mriedem, by the way, thanks for the tip about mocking in setup, I was pretty sure that this was possible | 15:38 |
mriedem | amorin: yeah so tl;dr if you're just racing to refresh from the db on delete then your fix is probably ok | 15:38 |
amorin | but I am a beginner in that subject | 15:38 |
mriedem | np | 15:38 |
mriedem | i've only recently-ish started using that trick | 15:39 |
efried | sean-k-mooney: It sounds like all the deployment projects are either sorted or have patches in flight to sort. I was looking for you & others to vet and +1 basically so I could approve the bp | 15:39 |
melwitt | efried, finucannot: I'm also +1 on removing nova-consoleauth since it looks like everything is sorted with deployment tools | 15:39 |
sean-k-mooney | efried: kolla has an open bug but no progress in a while | 15:40 |
sean-k-mooney | if that is the only blocker i can go write the patches needed | 15:40 |
sean-k-mooney | it is already optional but it would just involve deleting the container on upgrade and removing the deployment code | 15:40 |
efried | finucannot: do you have any further words about kolla ^ ? | 15:41 |
sean-k-mooney | so yes i'm +1 on removing it and can help the kolla folks as needed since i used to work on kolla and kolla-ansible a few releases ago | 15:41 |
finucannot | sean-k-mooney: So there's more to it than this? https://review.opendev.org/#/c/661251/ | 15:42 |
finucannot | efried: ^ | 15:42 |
sean-k-mooney | that removed it from kolla | 15:42 |
sean-k-mooney | but they also need to remove it in kolla ansible | 15:42 |
sean-k-mooney | e.g. you just deleted the container | 15:42 |
*** helenafm has quit IRC | 15:43 | |
sean-k-mooney | but the upgrade playbook needs to koll all running instances of the container | 15:43 |
sean-k-mooney | *kill | 15:43 |
finucannot | Ah, I figured there was more to it than that. I can try drafting those patches based on what they did in OSA. I'd need one to kill the running instances (which would be backported to stable/stein) then a follow up to remove all references entirely | 15:43 |
sean-k-mooney | we need to kill this handler https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/nova/handlers/main.yml#L112-L136 | 15:44 |
efried | finucannot: Okay, mriedem, melwitt, and sean-k-mooney are all +1, I'm approving the bp assuming the above is fairly easy and will be taken care of before we pull the trigger. | 15:44 |
sean-k-mooney | and remove like 2 or 3 other things but its a fairly small patch | 15:44 |
*** rpittau is now known as rpittau|afk | 15:45 | |
finucannot | sean-k-mooney: I'll do that before I clock off | 15:46 |
sean-k-mooney | ok you need to kill the service default https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/nova/defaults/main.yml#L83-L93 and as i said, just update https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/nova/tasks/rolling_upgrade.yml and https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/nova/tasks/legacy_upgrade.yml to nuke the running container on upgrade | 15:47 |
*** itlinux has quit IRC | 15:49 | |
sean-k-mooney | let me know if you need any help navigating kolla-ansible | 15:49 |
*** Luzi has quit IRC | 15:50 | |
*** itlinux has joined #openstack-nova | 15:50 | |
mriedem | gibi: replied, i think your suggestion is ok | 15:50 |
finucannot | mriedem, efried: https://review.opendev.org/661344 | 15:52 |
dansmith | melwitt: I think disabling heartbeats will increase recovery time if the connection goes stale, and will cause stale connections to stay open on the rabbit side | 15:53 |
dansmith | melwitt: so I think threads=1, leave heartbeat behavior | 15:53 |
*** ttsiouts has quit IRC | 15:54 | |
melwitt | sean-k-mooney, mdbooth ^ | 15:54 |
aspiers | efried: I'm pondering whether my helpers fall foul of https://docs.python.org/3/library/unittest.mock.html#where-to-patch | 15:54 |
melwitt | dansmith: thanks | 15:54 |
*** itlinux has quit IRC | 15:55 | |
*** ivve has joined #openstack-nova | 15:55 | |
aspiers | efried: although that wouldn't explain why they have been working so far | 15:56 |
*** ykarel is now known as ykarel|away | 15:56 | |
gibi | mriedem: thanks. | 15:56 |
sean-k-mooney | dansmith: if we leave the heartbeat enabled should we at least change the log level in the nova-api for the oslo messaging logs? | 15:57 |
sean-k-mooney | or the heartbeat one in particular | 15:57 |
efried | finucannot: why does starlingx get different zebra stripe? | 15:59 |
dansmith | I dunno, but we might want to soften the log level in o.msg where it raises holy hell about the reconnect | 15:59 |
efried | *I* want special zebra stripe | 15:59 |
efried | aspiers: I'm going to have to get back to you (possibly much) later | 16:00 |
aspiers | efried: it's OK. Attacking it with a debugger now | 16:00 |
sean-k-mooney | dansmith: so we suggested that and got some pushback on dropping it to info in oslo | 16:00 |
*** dtantsur is now known as dtantsur|afk | 16:01 | |
finucannot | efried: Whoops, missed that | 16:01 |
finucannot | fixed | 16:01 |
dansmith | sean-k-mooney: so what is your thought, to specifically set o.msg level to error in nova-api? | 16:01 |
*** wwriverrat has joined #openstack-nova | 16:01 | |
sean-k-mooney | dansmith: effectively but i think we can be more granular than that. we do this for some privsep stuff right? | 16:02 |
dansmith | sean-k-mooney: we can set defaults on the logger name level, I think | 16:02 |
sean-k-mooney | dansmith: actually setting it to error won't help as o.msg is currently logging it as an error | 16:02 |
sean-k-mooney | so it's more that we would have to install a log filter to specifically drop that error | 16:03 |
dansmith | sean-k-mooney: well, I dunno, it *really* seems like that shouldn't be an error, it's just a reconnect, which is exactly what it's supposed to be doing | 16:04 |
dansmith | so whatever.. however you make it not show up, or leave it and document it or whatever, I care about that less | 16:04 |
dansmith | what I do care about is that we not disable heartbeats just because it's making a statement in the logs we don't want | 16:04 |
mriedem | monkeypatch the oslo.messaging code! | 16:04 |
beekneemech | mriedem: Crazy talk! :-P | 16:05 |
sean-k-mooney | dansmith: ya i agree it's not an error. maybe we could make it a warning in o.msg and then set the log level to error for o.msg in nova | 16:05 |
dansmith | sean-k-mooney: yeah | 16:05 |
mriedem | if only DEFAULT_LOG_LEVELS wasn't a gd list | 16:07 |
*** betherly has joined #openstack-nova | 16:07 | |
*** ykarel|away has quit IRC | 16:07 | |
mriedem | makes it a pita to parse and override from code | 16:08 |
melwitt | dansmith: IIUC from past conversation, the ideal way to handle the eventlet monkey patch issue longer term would be to separate the wsgi app part of nova-api from the rest of it, so we don't have behavior that doesn't fit in with the wsgi app happening inside of it. what would that look like? a separate process for each of nova-api-wsgi and nova-api? | 16:08 |
dansmith | melwitt: yeah | 16:10 |
dansmith | that's how wsgi apps are really supposed to be done, AFAIK, | 16:11 |
dansmith | the wsgi bit is just the "view" and you delegate everything to the controller | 16:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/queens: Delete allocations even if _confirm_resize raises https://review.opendev.org/652153 | 16:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/queens: Add functional confirm_migration_error test https://review.opendev.org/658136 | 16:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/queens: [stable-only] Delete allocations even if _confirm_resize raises (part 2) https://review.opendev.org/661349 | 16:11 |
mriedem | delicious MVC | 16:12 |
*** betherly has quit IRC | 16:12 | |
mriedem | ganso: ^ likely needs some work b/c the functional test in queens isn't failing when i remove the fix | 16:13 |
* cdent can smell mriedem yearning for java again | 16:13 | |
mriedem | struts baby! | 16:13 |
*** BjoernT has joined #openstack-nova | 16:13 | |
mriedem | that's the last time i did UI Work | 16:13 |
mriedem | *work | 16:13 |
mriedem | Faces and AJAX was a brand new world to me when i left that dept | 16:13 |
melwitt | dansmith: ok, cool. I wanted to get an idea about what's involved to see if we could make that happen this cycle. | 16:14 |
melwitt | to do it properly for future | 16:14 |
cdent | melwitt, dansmith : are you talking about a "controller" as a separate process? | 16:15 |
dansmith | melwitt: uh, that would be a significant undertaking | 16:15 |
dansmith | cdent: yeah | 16:15 |
melwitt | oh :( | 16:15 |
cdent | yeah, that's why I ask. It would be significant | 16:15 |
dansmith | cdent: definitely | 16:15 |
*** brault has joined #openstack-nova | 16:16 | |
cdent | a somewhat shorter step would be to make a "scatter-gather-cells-agent" that can be threaded (in whatever way) that the wsgi app talks to over a unix socket | 16:16 |
dansmith | melwitt: going from a thing designed to be eventlet-based to just slapping a wsgi frontend on it is the kind of thing that doesn't end with a small refactor to substantially change how it works :) | 16:16 |
*** hongbin has quit IRC | 16:17 | |
cdent | but even that agent idea would be an undertaking | 16:17 |
melwitt | the concern was, the near term things we can do are to somehow hide the log message and set threads=1. but what's the long term thing to do, or is the near term fix going to be the forever fix | 16:17 |
dansmith | cdent: yeah, could do that, but that's a lot of deployment change for just that thing | 16:17 |
* mriedem goes to solve his short-term problem of an empty stomach | 16:17 | |
cdent | dansmith: yeah, no doubt. (I guess you could also fork it as needed) | 16:17 |
cdent | call scatter-gather-cells over rpc to the conductor ? | 16:18 |
* cdent is spitballing | 16:19 | |
dansmith | cdent: that will slow down instance list quite a bit | 16:19 |
* cdent nods | 16:19 | |
*** hongbin has joined #openstack-nova | 16:19 | |
cdent | Yeah, I don't really have much in the way of ideas. We've got impedance mismatches galore. | 16:19 |
cdent | s/ideas/ideas that can be realized/ | 16:20 |
melwitt | yeah... just wanted to understand whether we'll have a next step here or if this is it. sounds like realistically, there's not more we could do | 16:20 |
dansmith | yep.. IMHO, fix the short term and then let's properly design the bigger change after we see how that goes | 16:20 |
dansmith | and not armchair quarterback it the day before a holiday weekend (for some of us) :) | 16:20 |
melwitt | haha, no, I was only looking for a high-level ballpark idea of what it would look like | 16:21 |
*** brault has quit IRC | 16:21 | |
* cdent was the one playing football | 16:21 | |
melwitt | because when you said "separate wsgi part" initially, I didn't really know what that meant | 16:21 |
* cdent goes outside not for football | 16:25 | |
*** cdent has quit IRC | 16:25 | |
*** ccamacho has quit IRC | 16:30 | |
*** itlinux has joined #openstack-nova | 16:32 | |
*** davidsha has quit IRC | 16:32 | |
*** tssurya has quit IRC | 16:44 | |
openstackgerrit | Merged openstack/nova master: Add --before to nova-manage db archive_deleted_rows https://review.opendev.org/556751 | 16:50 |
*** manjeets has quit IRC | 16:51 | |
*** ttsiouts has joined #openstack-nova | 16:56 | |
*** hongbin has quit IRC | 16:58 | |
aspiers | gibi: you around? I'm trying to get a functional test to restart the compute service in a way which triggers init_host() | 16:58 |
aspiers | gibi: and I see you worked on restart_compute_service() | 16:59 |
aspiers | unfortunately that only restarts the Host, not the Service | 16:59 |
*** itlinux has quit IRC | 17:01 | |
aspiers | I tried self.compute.stop(); self.start_compute() but the latter invokes nova.test.TestCase.start_service() which borks by trying to create a duplicate HostMapping | 17:02 |
*** ricolin has quit IRC | 17:03 | |
aspiers | dansmith: I see your fingerprints on this code too :) | 17:03 |
aspiers | This is about the point where my newbie-ness makes it pretty hard to figure out a way forward | 17:03 |
*** ralonsoh has quit IRC | 17:06 | |
openstackgerrit | Adam Spiers proposed openstack/nova master: Provide HW_CPU_AMD_SEV trait when SEV is supported https://review.opendev.org/638680 | 17:10 |
melwitt | aspiers: I don't know much about that but you reminded me of a patch of gibi's that might be of help https://review.opendev.org/512552 | 17:11 |
aspiers | melwitt: thanks! yeah that's the exact area of code I'm talking about | 17:12 |
aspiers | maybe that change will yield some clues | 17:13 |
melwitt | that's what I'm hoping :) | 17:13 |
*** ykarel|away has joined #openstack-nova | 17:14 | |
dansmith | aspiers: "restart the host not the service" does not make sense to me, fwiw | 17:14 |
*** itlinux has joined #openstack-nova | 17:15 | |
dansmith | aspiers: calling start_service() again is not the right thing to do, because it's really create_and_start_service() I think | 17:15 |
aspiers | yes, I think that's the problem | 17:15 |
dansmith | aspiers: if you want to re-run init_host() I would just run it | 17:15 |
*** itlinux has quit IRC | 17:16 | |
*** itlinux_ has joined #openstack-nova | 17:16 | |
dansmith | aspiers: the restart_compute_service() comment explains why doing a restart in a functional test isn't super realistic, and that's why it saves/restores the RT across the start/stop | 17:17 |
dansmith | so yeah, I think you just want to re-run init if that's what you're trying to do | 17:18 |
aspiers | OK, thanks | 17:18 |
aspiers | I wasn't sure if init_host() depended on a bunch of other stuff being run before it in the same context | 17:18 |
dansmith | well, you can see what it's doing | 17:19 |
aspiers | I guess self.basic_config_check() is the only real thing done before | 17:19 |
aspiers | and I don't need to re-run that | 17:19 |
*** itlinux has joined #openstack-nova | 17:19 | |
dansmith | init_host doesn't really do much other than run the driver's init really.. what about it do you need to restart? | 17:20 |
dansmith | re-run the init_instance parts? | 17:20 |
*** itlinux_ has quit IRC | 17:20 | |
aspiers | the SEV check I've added to it | 17:20 |
dansmith | oh okay | 17:20 |
aspiers | oh, actually I don't think this will be good enough by itself | 17:20 |
aspiers | because the SEV check only sets an instance variable | 17:20 |
aspiers | which is later consumed in u_p_t() to provide the trait | 17:21 |
*** itlinux_ has joined #openstack-nova | 17:21 | |
aspiers | but maybe _run_periodics() will take care of that | 17:21 |
*** BjoernT has quit IRC | 17:22 | |
aspiers | ah, looks like my previous confusion was an incorrect assumption that in functional tests, self.compute is a Host object, but it's actually a Service or something | 17:23 |
dansmith | ...as in a real deployment | 17:24 |
aspiers | I think I just misread the code somewhere | 17:24 |
aspiers | or had a stack overflow in my brain | 17:24 |
*** itlinux has quit IRC | 17:25 | |
*** ttsiouts has quit IRC | 17:30 | |
aspiers | interesting LibvirtDriver.init_host() has an unused host parameter | 17:33 |
*** cmart has joined #openstack-nova | 17:39 | |
*** JamesBenson has joined #openstack-nova | 17:46 | |
*** itlinux_ has quit IRC | 17:47 | |
artom | aspiers, probably another virt driver needs it | 17:48 |
aspiers | yeah probably | 17:48 |
aspiers | artom: nope, none of them do | 17:50 |
artom | aspiers, heh, you actually checked :) | 17:50 |
aspiers | just | 17:50 |
artom | Historic then, maybe? Still needed by something out of tree? | 17:51 |
aspiers | https://opendev.org/openstack/nova/commit/f51526b596f3d89cda2ec4501e1 | 17:51 |
aspiers | historic | 17:51 |
aspiers | I think someone probably forgot to remove it during some redesign | 17:52 |
aspiers | it was used here https://opendev.org/openstack/nova/src/commit/f51526b596f3d89cda2ec4501e19baf085c534e0/nova/virt/libvirt_conn.py#L163 | 17:52 |
aspiers | would be a bit of a pain to trawl through the git history to find out what removed it | 17:53 |
aspiers | or maybe not | 17:53 |
aspiers | let's see | 17:53 |
ganso | mriedem: hmm this is weird, as it failed in the gate on PS-3 (parent was fix part 1 only) | 17:53 |
*** pcaruana has quit IRC | 17:53 | |
mriedem | ganso: i rebased the queens backport series so there is the (1) original fix, (2) the part 2 fix which makes the functional test pass and then (3) the functional test | 17:55 |
*** bbowen_ has quit IRC | 17:55 | |
mriedem | the problem is, i commented out the fix from (2) and (3) still passed on queens | 17:55 |
mriedem | just got back to my desk though so haven't investigate | 17:55 |
mriedem | *investigated | 17:55 |
aspiers | artom: found it https://opendev.org/openstack/nova/commit/8d97118be776fcaad3053d1f93f61d339685a4ae | 17:56 |
ganso | mriedem: hmm this is weird, as it failed in the gate on PS-3 (parent was fix part 1 only) | 17:57 |
*** ttsiouts has joined #openstack-nova | 17:57 | |
artom | aspiers, OK :) | 17:57 |
aspiers | artom: should I submit a patch to remove it? seems like technical debt | 17:57 |
aspiers | ah, would that break out-of-tree drivers? | 17:58 |
aspiers | in fact, are there any? | 17:58 |
mriedem | seems like ever since we enabled novnc in the multi-cell job there is a novnc test that continues to fail | 17:58 |
aspiers | lol, this is how little I know about nova | 17:58 |
artom | aspiers, https://github.com/openstack/nova-powervm | 17:58 |
mriedem | which is maybe bad for the consoleauth removal stuff... | 17:58 |
aspiers | artom: weird, so why is powervm also in-tree? | 17:59 |
mriedem | aspiers: there are several out of tree drivers | 17:59 |
artom | aspiers, ¯\_(ツ)_/¯ | 17:59 |
aspiers | :) | 17:59 |
aspiers | mriedem: thanks | 17:59 |
mriedem | the out of tree drivers are maintained by vendors that push more features into them than what's in the in-tree versions | 17:59 |
aspiers | gotcha | 17:59 |
mriedem | some aren't in tree at all, like lxd | 17:59 |
aspiers | thought it might be something like that | 17:59 |
aspiers | OK, so just randomly removing parameters from the ComputeDriver interface isn't cool | 18:00 |
mriedem | some we could probably deprecate from in-tree like xen, zvm and powervm | 18:00 |
mriedem | there are no guarantees on the virt driver interface since it's internal and not versoined | 18:01 |
mriedem | *versioned | 18:01 |
mriedem | we generally try to be nice and at least email to the list if we're changing an interface | 18:01 |
aspiers | right | 18:01 |
mriedem | also, yup, enabling n-novnc in the multi-cell job on 5/20 is when the test started failing https://github.com/openstack/nova/commit/c5b83c3fbca83726f4a956009e1788d26bcedde0#diff-7415f5ff7beee2cdf9ffe31e12e4c086 | 18:02 |
*** ttsiouts has quit IRC | 18:02 | |
mriedem | melwitt: finucannot: ^ | 18:02 |
mriedem | i'm going to report a bug since i don't have time to dig in, | 18:02 |
mriedem | but that could put a damper on the remove-consoleauth shindig | 18:02 |
aspiers | well I suppose I could at least add a NOTE to the base class referencing https://opendev.org/openstack/nova/commit/8d97118be776fcaad3053d1f93f61d339685a4ae and https://opendev.org/openstack/nova/commit/f51526b596f3d89cda2ec4501e1 | 18:02 |
aspiers | to save anyone else from needing ancient archaeology to understand why the parameter is there | 18:03 |
mriedem | i've walked backward into this conversation so i don't know what the context is | 18:03 |
aspiers | mriedem: I just noticed that the host parameter to ComputeDriver.init_host() is never used | 18:03 |
mriedem | i'd say we want the host arg in init_host | 18:03 |
aspiers | at least, not by anything in-tree | 18:03 |
mriedem | because we can use that for our fake virt driver to avoid global config | 18:03 |
mriedem | in fact i have a patch doing that i think | 18:04 |
aspiers | oh, I didn't think to check the fakevirt driver | 18:04 |
mriedem | CONF.host doesn't work in tests when you have multiple computes | 18:04 |
mriedem | https://review.opendev.org/#/c/656709/2/nova/virt/fake.py@149 | 18:04 |
artom | Wait, so we officially have a precedent for writing code *just* for CI to work? | 18:05 |
finucannot | mriedem: I'll stick that on my queue to investigate early next week | 18:05 |
*** finucannot is now known as stephenfin | 18:05 | |
aspiers | mriedem: OK, that's a bit over my head at this stage so I'll just defer to your judgement and not do anything | 18:06 |
*** wwriverrat has quit IRC | 18:06 | |
mriedem | melwitt: stephenfin: https://bugs.launchpad.net/nova/+bug/1830417 | 18:07 |
openstack | Launchpad bug 1830417 in OpenStack Compute (nova) "NoVNCConsoleTestJSON.test_novnc fails in nova-multi-cell job since 5/20" [Medium,Confirmed] | 18:07 |
aspiers | at least the history is now in eavesdrop if we ever need it again | 18:07 |
artom | aspiers, the premier searchable, accessible, information repository ;) | 18:08 |
aspiers | artom ;-) | 18:08 |
*** cmart has quit IRC | 18:11 | |
mriedem | melwitt: in this case i don't see anything going on in the novnc service for cell2 http://logs.openstack.org/73/638073/31/check/nova-multi-cell/6ea3306/controller/logs/screen-n-novnc-cell2.txt.gz | 18:16 |
mriedem | and that's where the instance is | 18:16 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Skip novnc tests in multi-cell job until bug 1830417 is fixed https://review.opendev.org/661371 | 18:24 |
openstack | bug 1830417 in OpenStack Compute (nova) "NoVNCConsoleTestJSON.test_novnc fails in nova-multi-cell job since 5/20" [Medium,Confirmed] https://launchpad.net/bugs/1830417 | 18:24 |
mriedem | efried: you said in the nova meeting yesterday that the multi-cell job was wonky, well, that's why ^ | 18:24 |
mriedem | oooo this is great http://logs.openstack.org/04/656304/1/check/nova-grenade-live-migration/083d5ec/logs/subnode-2/screen-n-cpu.txt.gz?level=TRACE#_May_24_16_55_08_219083 | 18:26 |
mriedem | May 24 16:55:08.219083 ubuntu-bionic-rax-ord-0006536941 nova-compute[867]: ERROR nova.virt.libvirt.driver [-] [instance: da0c957c-7a7d-4673-bd2d-0336d22f6fff] Live Migration failure: internal error: process exited while connecting to monitor: Failed to initialize module: /usr/lib/x86_64-linux-gnu/qemu/block-rbd.so May 24 16:55:08.219083 ubuntu-bionic-rax-ord-0006536941 nova-compute[867]: Note: only modules from the same build can be loaded. | 18:26 |
*** ttsiouts has joined #openstack-nova | 18:29 | |
mriedem | i now realize there was no release note with the eventlet monkeypatch change in stable/stein https://review.opendev.org/#/c/647310/ | 18:36 |
mriedem | i wonder if we should have had a release note on that | 18:36 |
mriedem | to at least mention overriding that with OS_NOVA_DISABLE_EVENTLET_PATCHING | 18:36 |
melwitt | mriedem: gah, looking | 18:37 |
sean-k-mooney | mriedem: well OS_NOVA_DISABLE_EVENTLET_PATCHING is not meant to be a public thing for people to use | 18:37 |
mriedem | iow, because of bug 1829062 on stable/stein, should we have a release note for ^ | 18:37 |
openstack | bug 1829062 in StarlingX "nova placement api non-responsive due to eventlet error" [Critical,In progress] https://launchpad.net/bugs/1829062 - Assigned to Gerry Kopec (gerry-kopec) | 18:37 |
mriedem | sean-k-mooney: starlingx is using it | 18:37 |
sean-k-mooney | it was intended for use in tox | 18:37 |
sean-k-mooney | mriedem: well i know mdbooth never intended it to be used by anything in production | 18:38 |
mriedem | ok no reno it is | 18:39 |
sean-k-mooney | have we told starlingx that they can work around the issue by setting the wsgi server threads=1 instead | 18:40 |
mriedem | i haven't | 18:40 |
mriedem | and there is nothing in that bug | 18:40 |
mriedem | as far as i know when you guys all talk about this every other week none of the results of the conversation are written down publicly | 18:40 |
sean-k-mooney | ok... i'll try and write something up as a post to the mailing list on monday | 18:42 |
*** Sundar has quit IRC | 18:43 | |
*** xek has quit IRC | 18:47 | |
melwitt | mriedem: I think what's wrong is that the vnc proxy urls in the nova-cpu.conf files are both using port 6080 when one of them (the compute in cell2) should be using 6084 | 18:47 |
melwitt | http://logs.openstack.org/73/638073/31/check/nova-multi-cell/6ea3306/controller/logs/etc/nova/nova-cpu_conf.txt.gz vs http://logs.openstack.org/73/638073/31/check/nova-multi-cell/6ea3306/compute1/logs/etc/nova/nova-cpu_conf.txt.gz | 18:48 |
*** pcaruana has joined #openstack-nova | 18:48 | |
*** xek has joined #openstack-nova | 18:48 | |
*** pcaruana has quit IRC | 18:48 | |
mriedem | ah yup | 18:48 |
melwitt | and from what I recall in devstack, I don't know how we can configure separate ones and use them in an easy way | 18:49 |
mriedem | we can pass that information into each node from the job config if we want to hard-code them | 18:49 |
melwitt | oh, I see, so if we know the multi-cell job only has two cells and we know what cell2 port is going to be, hard-code it into the job config | 18:50 |
mriedem | right we have NOVA_SERVICE_LISTEN_ADDRESS in devstack | 18:51 |
mriedem | for the host | 18:51 |
mriedem | which points back at the controller | 18:51 |
melwitt | because having devstack properly set up a separate nova-cpu.conf per cell and then getting the right ones passed in job configs would be a pretty big change | 18:51 |
mriedem | i think we'd add a devstack variable alongside NOVA_SERVICE_LISTEN_ADDRESS but for NOVNC_PROXY_PORT or something and default to '' but allow the job to set that via zuul/ansible on the subnode | 18:52 |
melwitt | I think anyway. I wondered about this when I did the patch for the different ports to avoid collisions but then it was working in your devstack env so I thought I was missing something. I briefly looked through thinking how different nova-cpu.conf could be made and it didn't look straightforward | 18:53 |
*** xek_ has joined #openstack-nova | 18:53 | |
*** xek has quit IRC | 18:54 | |
mriedem | it might just be a matter of taking the NOVA_CPU_CELL variable into account on the subnode | 18:55 |
mriedem | https://github.com/openstack/nova/blob/master/.zuul.yaml#L275 | 18:55 |
melwitt | oh, and letting devstack configure nova-cpu.conf accordingly? that would be a better way, if we can. because I'm thinking, in theory, there's more than just the novnc port, there's all the other possible console proxy ports too | 18:56 |
sean-k-mooney | mriedem: wait that bug was for the placement api | 18:56 |
mriedem | so rather than this https://review.opendev.org/#/c/649473/3/lib/nova@613 | 18:56 |
sean-k-mooney | we don't monkeypatch the placement api? | 18:57 |
mriedem | maybe configure_console_proxies should just be using NOVA_CPU_CELL as the offset | 18:57 |
mriedem | but... | 18:57 |
melwitt | oh, hmm | 18:57 |
mriedem | then we won't have separate n-novnc services for cell1 and cell2 on the controller | 18:57 |
mriedem | sean-k-mooney: in stein, placement is in nova | 18:58 |
mriedem | for some | 18:58 |
sean-k-mooney | starlingx is not using extracted placement? | 18:58 |
*** lbragstad has joined #openstack-nova | 18:59 | |
sean-k-mooney | eventlet and wsgi should normally be ok if you don't have long lived threads like the heartbeat | 18:59 |
sean-k-mooney | so even using the in-tree placement they should have been fine | 18:59 |
sean-k-mooney | oh they are hitting the other issue | 19:00 |
sean-k-mooney | with the "cannot switch to a different thread" error | 19:00 |
mriedem | most people are probably not going to be using extracted placement at stein ga | 19:01 |
sean-k-mooney | my comment still applies then | 19:01 |
*** luksky has quit IRC | 19:01 | |
mriedem | your comment about threads=1? | 19:01 |
sean-k-mooney | ya https://bugs.launchpad.net/nova/+bug/1829062/comments/7 | 19:01 |
openstack | Launchpad bug 1829062 in StarlingX "nova placement api non-responsive due to eventlet error" [Critical,In progress] - Assigned to Gerry Kopec (gerry-kopec) | 19:01 |
mriedem | if so, then my comment about a known issue release note applies :) | 19:02 |
sean-k-mooney | if they set wsgi to use 1 thread per process then they won't get the context switch error | 19:02 |
sean-k-mooney | *thread switch | 19:02 |
melwitt | mriedem: I'm actually thinking that in devstack lib/nova, we could use NOVA_CPU_CELL to mimic the other offset code to set the ports properly in the one nova-cpu.conf it writes. I'll give it a try and see if it could work | 19:02 |
*** ttsiouts has quit IRC | 19:03 | |
sean-k-mooney | also i spent the last 5 minutes trying to figure out how the starlingx repos work and i am more confused than when i started | 19:03 |
openstackgerrit | Adam Spiers proposed openstack/nova master: Provide HW_CPU_AMD_SEV trait when SEV is supported https://review.opendev.org/638680 | 19:03 |
melwitt | I didn't realize there was a variable that would tell us what cell we (nova-compute) we are | 19:03 |
melwitt | -we | 19:04 |
aspiers | efried: this might be ready to go now ^^^ | 19:04 |
sean-k-mooney | melwitt: we do? | 19:04 |
mriedem | melwitt: ok, yeah that's probably easy to test with a nova patch depending on it since nova-multi-cell doesn't run on devstack changes (we could add it to the devstack experimental queue for testing stuff like this on-demand) | 19:04 |
mriedem | i'd just add nova-multi-cell to devstack experimental, i can push a change for that | 19:04 |
melwitt | sean-k-mooney: yeah NOVA_CPU_CELL, zuul job configs can set it | 19:04 |
sean-k-mooney | melwitt: oh a devstack variable | 19:04 |
melwitt | yes | 19:04 |
melwitt | mriedem: ok, sounds good | 19:05 |
sean-k-mooney | ah ok i thought you meant in the nova code | 19:05 |
melwitt | heh, yeah no | 19:05 |
openstackgerrit | Adam Spiers proposed openstack/nova master: Reduce logging of host hypervisor capabilities to DEBUG level https://review.opendev.org/661379 | 19:05 |
efried | aspiers: I was partway through PS12, was going to suggest moving your trait setting into _get_cpu_traits itself. And also, where's the bit where the SEV trait was going to be moved under the X86 namespace? | 19:09 |
aspiers | efried: I did consider putting it in _get_cpu_traits but that already seemed pretty over-grown | 19:10 |
openstackgerrit | Merged openstack/nova stable/rocky: [stable-only] Delete allocations even if _confirm_resize raises (part 2) https://review.opendev.org/659338 | 19:11 |
openstackgerrit | Merged openstack/nova stable/rocky: Add functional confirm_migration_error test https://review.opendev.org/658834 | 19:11 |
aspiers | efried: maybe it can be split up | 19:11 |
*** xek__ has joined #openstack-nova | 19:11 | |
aspiers | efried: oh, I remember now - that wasn't the only reason | 19:11 |
aspiers | efried: I'm still behind with the latest news on the great CPU trait taxonomy debate | 19:13 |
aspiers | efried: this None check gave me pause https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/driver.py#L6777 | 19:14 |
*** xek_ has quit IRC | 19:14 | |
aspiers | efried: *theoretically* all the traits could vanish from a host, and then this code would fail to remove them | 19:14 |
aspiers | but if that actually happened, maybe it would be a sure sign things have gone very badly wrong, in which case maybe that's a desirable accident anyway | 19:15 |
dansmith | efried: mriedem I'll admit I had my head on backwards with what I was expecting, but: http://logs.openstack.org/02/661002/4/check/nova-next/39819ea/job-output.txt.gz#_2019-05-24_18_14_23_954469 | 19:16 |
dansmith | seems to work to make sure we don't archive recent stuff | 19:16 |
mriedem | i just -1ed that :) | 19:16 |
mriedem | we are simpatico | 19:16 |
dansmith | my intent was to make the job actually call this with "tomorrow" or something | 19:16 |
mriedem | heh yeah | 19:16 |
efried | okay, that was clear to me, thought it was intentional. But I guess if this is the only place we test the script, it makes sense to invoke it in a way that does something. | 19:18 |
*** whoami-rajat has joined #openstack-nova | 19:18 | |
dansmith | efried: I was expecting to actually land this, so it needs to not not run | 19:18 |
dansmith | I was expecting to make it do what we do today, but with a date, which would be tomorrow not yesterday | 19:19 |
dansmith | but in reality, making sure it doesn't eff up today's records when called for yesterday is what we wanted to sanity check | 19:19 |
mriedem | plus note that purge failed | 19:19 |
mriedem | as a result | 19:19 |
mriedem | but b/c no set -e the job doesn't fail | 19:19 |
efried | dansmith: What about running it twice, once like this to make sure it outputs "Nothing was archived" and then the second time without --before (or with --before $(date -d tomorrow)) to make sure it works? | 19:19 |
mriedem | dansmith: you could run it both ways | 19:19 |
mriedem | jinx | 19:19 |
dansmith | yep, doing that now | 19:19 |
efried | what went wrong with purge? | 19:20 |
efried | And is there a fup change to turn on -e? | 19:20 |
mriedem | purge fails with rc=3 if nothing was purged | 19:20 |
mriedem | "fails" | 19:20 |
mriedem | i'm all fup'ed out | 19:20 |
efried | yeah, so 3 shouldn't be a failure condition, from the pov of the calling script, right? | 19:22 |
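A minimal sketch of the calling-script behaviour efried describes here, assuming (as mriedem says above) that purge exits with 3 when nothing was purged. The helper names `run_tolerating` and `fake_command` are hypothetical, and the child process is only a stand-in; a real job would invoke the actual purge command instead.

```python
import subprocess
import sys

# Exit code that, per the discussion above, means "nothing was purged".
# Treating it as non-fatal is the assumption this sketch illustrates.
NOTHING_PURGED_RC = 3


def run_tolerating(cmd, ok_codes=(0, NOTHING_PURGED_RC)):
    """Run cmd and return its exit code; raise only for unexpected codes."""
    rc = subprocess.run(cmd).returncode
    if rc not in ok_codes:
        raise RuntimeError(f"{cmd!r} failed with rc={rc}")
    return rc


def fake_command(rc):
    """Hypothetical stand-in for the real purge command: just exits with rc."""
    return [sys.executable, "-c", f"import sys; sys.exit({rc})"]
```

With a wrapper like this, strict error handling could be switched on in the job without rc=3 failing it, while any other non-zero code would still stop the run.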
openstackgerrit | Dan Smith proposed openstack/nova master: Make nova-next archive using --before https://review.opendev.org/661002 | 19:24 |
dansmith | I'm just about done for the holiday, we can finish this up next week | 19:24 |
*** kaiokmo has joined #openstack-nova | 19:24 | |
efried | aspiers: If the traits all vanished, that dict would still not be None. It would just have all False values. | 19:25 |
*** bbowen has joined #openstack-nova | 19:26 | |
aspiers | efried: OK, because CPU_TRAITS_MAPPING is hardcoded. Makes sense | 19:28 |
efried | yah | 19:28 |
aspiers | _get_cpu_traits() is still too big though | 19:28 |
efried | aspiers: When mriedem was doing the trait reporting in the RT, I made sure we were covering that corner case | 19:28 |
aspiers | nice | 19:29 |
efried | aspiers: Sure, I would be fine splitting that up. | 19:29 |
aspiers | efried: well, shouldn't be too hard to do that and then move the SEV bit in there | 19:29 |
aspiers | efried: other than that and maybe some trait taxonomy thing which I don't yet know about, hopefully everything else is in order with this patch | 19:30 |
efried | aspiers: https://review.opendev.org/#/c/538498/6/nova/compute/resource_tracker.py@904 (patch look familiar?) | 19:30 |
aspiers | efried: yeah, I do remember reading that and vaguely understanding | 19:31 |
aspiers | I'm sure a re-read will make a lot more sense now | 19:31 |
*** ykarel|away has quit IRC | 19:35 | |
*** panda|rover has quit IRC | 19:36 | |
*** panda has joined #openstack-nova | 19:40 | |
mriedem | ganso: ok so i think things in queens are ok except the functional test is failing because of some missing test handling stuff on teardown in queens, working on that now | 19:47 |
ganso | mriedem: thanks Matt! =D | 19:48 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/queens: Add functional confirm_migration_error test https://review.opendev.org/658136 | 19:52 |
mriedem | ganso: i think you can drop the -1 on https://review.opendev.org/#/c/652153/ now | 19:52 |
ganso | mriedem: yup, dropped =) | 19:53 |
kaiokmo | hey nova folks! I was wondering if something changed regarding how metadata are defined/handled by nova-scheduler | 20:07 |
kaiokmo | specifically on the aggregate_instance_extra_specs filter | 20:08 |
kaiokmo | I have a Pike deployment in which this filter works properly | 20:08 |
kaiokmo | last week I deployed an environment on Rocky (tag is 18.1.7), and this filter is not working as expected | 20:10 |
*** ttsiouts has joined #openstack-nova | 20:10 | |
kaiokmo | nova-scheduler is ignoring the filter and all hosts are passing the filtering phase | 20:10 |
kaiokmo | nova.conf is configured as follows (only the aggregate_instance_extra_specs is enabled): http://paste.openstack.org/show/752054/ | 20:12 |
kaiokmo | I appreciate any help. | 20:12 |
kaiokmo | thanks in advance | 20:13 |
*** slaweq has quit IRC | 20:19 | |
aspiers | alright, I'm off. happy weekend o/ | 20:22 |
melwitt | kaiokmo: not aware of any change off the top of my head. will look around the code to see if anything has changed in that area | 20:28 |
kaiokmo | melwitt: thank you very much | 20:32 |
melwitt | I'm not seeing that anything has changed in the area | 20:36 |
melwitt | kaiokmo: have you enabled debug logging in nova-scheduler and looked at what's going on? if so, do you see DEBUG messages like these? https://github.com/openstack/nova/blob/master/nova/scheduler/filters/aggregate_instance_extra_specs.py#L64 | 20:37 |
*** aspiers has quit IRC | 20:38 | |
*** amodi has quit IRC | 20:38 | |
melwitt | interesting, my devstack change started running on a node within 14 minutes of uploading it | 20:39 |
melwitt | reminds me of something in a ML thread where things in the nova queue have to wait longer because of something to do with zuul scheduling? | 20:40 |
efried | kaiokmo: Does this happen to be on a rebuild? | 20:41 |
kaiokmo | melwitt: debug is set to True on nova.conf but can't see messages like that on nova-scheduler log | 20:42 |
kaiokmo | efried: no. RUN_ON_REBUILD is set to false on the filter, and I'm only spawning new instances, not rebuilding an existing one | 20:43 |
*** ttsiouts has quit IRC | 20:43 | |
melwitt | kaiokmo: hm, did you restart nova-scheduler after adding debug=True? that config should work. we'll probably need to look at the debug logs to get to the bottom of this because I didn't find any open bugs mentioning the AggregateInstanceExtraSpecsFilter and no recent code changes | 20:44 |
efried | kaiokmo: I'm looking specifically at https://review.opendev.org/#/c/523212/ which introduced RUN_ON_REBUILD and wondering if you're running afoul of the logic at https://review.opendev.org/#/c/523212/2/nova/scheduler/host_manager.py | 20:45 |
efried | that's some complex stuff with check type and force hosts/nodes, I don't really understand it. | 20:46 |
efried | kaiokmo: If you felt like experimenting, you could try reverting (pieces of) that change and see if the problem goes away. | 20:46 |
kaiokmo | melwitt: yes, I did restart the service. right now I can see some DEBUG messages on the logs | 20:47 |
melwitt | oh, good. argh, I just realized you had said the problem is all hosts are included, not the other way around | 20:48 |
kaiokmo | efried: I can test reverting this change, although I don't feel like it is going to change anything for me | 20:48 |
melwitt | so you wouldn't see messages like that if it thinks the hosts are good when they should not be | 20:48 |
*** aspiers has joined #openstack-nova | 20:49 | |
kaiokmo | I see. I'm wondering if even the host_passes function is being called. I managed to put some LOG.debug myself inside the function, but none of my messages are displayed on the logs | 20:50 |
efried | kaiokmo: It's just the only thing that changed on pike in that particular filter, so it's the first suspect for root cause. | 20:50 |
efried | right | 20:50 |
melwitt | kaiokmo: does your flavor have extra_specs in it? because the filter code is showing it will "pass" the host if the flavor does not have any extra_specs | 20:50 |
melwitt | oh really, ok that's not good. lemme see... | 20:50 |
*** BjoernT has joined #openstack-nova | 20:51 | |
kaiokmo | btw, the filter is being loaded correctly by the scheduler. LOG.debug placed before host_passes appears on the logs | 20:51 |
melwitt | kaiokmo: this might be a problem with your config, the enabled_filters and available_filters are supposed to be lists. I wonder if that's what's wrong? | 20:53 |
melwitt | I can never remember how to format lists in ini conf | 20:56 |
melwitt | looks like it's just comma separated: enabled_filters = RetryFilter,AvailabilityZoneFilter | 20:57 |
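A rough sketch of the comma-separated list format melwitt mentions, using the standard library `configparser` purely as a stand-in; nova's real option parsing is done by oslo.config and is not reproduced here. The helper `parse_list` is hypothetical, and the `[filter_scheduler]` section name is taken from the option names that show up in the scheduler logs later in this conversation.

```python
import configparser

# Sample config text using the list format quoted in the chat.
SAMPLE = """
[filter_scheduler]
enabled_filters = RetryFilter,AvailabilityZoneFilter
"""


def parse_list(raw):
    """Split a comma-separated option value into a list of stripped names."""
    return [item.strip() for item in raw.split(",") if item.strip()]


parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
enabled = parse_list(parser["filter_scheduler"]["enabled_filters"])
```

Printing `enabled` is a quick way to compare what a config line yields against the values the service reports at startup.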
efried | iiuc you're using the same config file as before? | 20:58 |
efried | kaiokmo: Oh, disregard the thing about reverting that patch. I misread your original issue, you said this was working on pike and broke on rocky. That patch went back to pike, so probably isn't to blame. | 20:59 |
melwitt | I was wondering the same thing. because according to this, it should be the class name only, not the whole path. but I don't know whether having the whole path would hurt or not | 20:59 |
kaiokmo | melwitt: yes. metadata on aggregate is custom: "aggregate" with "test" value | 20:59 |
kaiokmo | metadata on flavor is "aggregate_instance_extra_specs:aggregate" with "test" value | 21:00 |
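To make the expected matching concrete, here is a simplified sketch of the kind of check the filter is meant to perform with this metadata. It is an illustration only, not nova's actual code: the function name `host_passes_sketch` is hypothetical, it assumes exact string equality (the real filter may support richer comparison operators), and it models aggregate metadata as a plain dict of key to set of values.

```python
# Scope prefix used on the flavor key quoted above.
SCOPE = "aggregate_instance_extra_specs"


def host_passes_sketch(aggregate_metadata, flavor_extra_specs):
    """Simplified stand-in for the filter's check; exact-match only.

    aggregate_metadata maps a key to the set of values found across the
    host's aggregates; flavor_extra_specs is the flavor's extra_specs dict.
    """
    # A flavor with no extra specs passes every host.
    if not flavor_extra_specs:
        return True
    for key, required in flavor_extra_specs.items():
        scope, sep, name = key.partition(":")
        if sep:
            # Scoped keys that belong to some other scope are ignored.
            if scope != SCOPE:
                continue
            key = name
        values = aggregate_metadata.get(key)
        if not values or required not in values:
            return False
    return True
```

Under these assumptions, a host in an aggregate with `aggregate=test` passes for the flavor described above, while a host in no matching aggregate is rejected. That is the behaviour kaiokmo reports seeing on Pike but not on Rocky.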
melwitt | kaiokmo: when you added your own debug messages, did you put one here? https://github.com/openstack/nova/blob/master/nova/scheduler/filters/aggregate_instance_extra_specs.py#L48 wondering if you would see that one? | 21:01 |
kaiokmo | melwitt: yes, it is LOG.debug("***OMG***") :) | 21:02 |
melwitt | haha xD | 21:03 |
melwitt | did you put one right at the beginning of host_passes before anything else? because if that's not showing, then you're right the filter isn't running | 21:04 |
melwitt | if the filter isn't running, I'd try to use config options like available_filters = AggregateInstanceExtraSpecsFilter enabled_filters = AggregateInstanceExtraSpecsFilter and then see if you see the filter run | 21:05 |
kaiokmo | yes. I put one on the beginning of the function, and it is not showing. but, I also did put one right before the function and this one appears on the logs when the nova-scheduler service is restarted (which I think means that the filter was loaded) | 21:06 |
kaiokmo | that's what I did: this is the filter section on nova.conf for nova-scheduler http://paste.openstack.org/show/752054/ | 21:07 |
melwitt | yeah, I agree, the message you do see indicates the class was loaded by something | 21:07 |
melwitt | I'd try also changing available_filters to match enabled_filters, i.e. use class name only | 21:08 |
kaiokmo | ok. trying that now | 21:08 |
melwitt | because available_filters is a superset of enabled_filters | 21:08 |
melwitt | if that doesn't work to get the filter to run, then I will be lost again | 21:09 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Fix hard-delete of instance with soft-deleted referential constraints https://review.opendev.org/661398 | 21:10 |
*** slaweq has joined #openstack-nova | 21:11 | |
*** mriedem has quit IRC | 21:13 | |
*** ttsiouts has joined #openstack-nova | 21:13 | |
kaiokmo | melwitt: well, if available_filter is class only, oslo importutils (called by nova-scheduler) doesn't seem to be able to load the filter | 21:14 |
kaiokmo | ValueError: Empty module name | 21:14 |
melwitt | :\ | 21:14 |
melwitt | ok, then I guess the config help on that one is wrong. these available_filters and enabled_filters are supposed to work the same way | 21:16 |
melwitt | next I would try commenting out the available_filters setting altogether and let it default to all filters | 21:16 |
melwitt | it will still only use what you have in enabled_filters and won't use any additional filters | 21:16 |
melwitt | and if that enables the filter to run, then it sounds like we have some kind of bug or config help doc problem with the available_filters option | 21:17 |
*** ttsiouts has quit IRC | 21:17 | |
melwitt | I can't find any examples of it being used and not being left as the default, so I don't know it's intended to be set as a list | 21:18 |
melwitt | *how it's intended | 21:19 |
*** slaweq has quit IRC | 21:24 | |
*** whoami-rajat has quit IRC | 21:28 | |
*** xek__ has quit IRC | 21:31 | |
kaiokmo | melwitt: did as you said. the filter was loaded (like before), but it did not run (also like before) | 21:34 |
melwitt | ... I don't understand what is going on | 21:36 |
melwitt | I'm going to try on my devstack | 21:37 |
*** yankcrime has quit IRC | 21:37 | |
kaiokmo | me neither | 21:38 |
melwitt | ok, when I appended AggregateInstanceExtraSpecsFilter to enabled_filters in devstack it runs the filter | 21:40 |
kaiokmo | runs it or loads it? did you create a host aggregate and a flavor (both with the proper metadata)? | 21:42 |
*** ttsiouts has joined #openstack-nova | 21:42 | |
melwitt | runs it, I added a log message right at the beginning inside the host_passes method | 21:42 |
melwitt | no, I didn't do anything with aggregate and flavor. the filter not running would be unrelated to aggregate/flavor AFAIK | 21:43 |
melwitt | I'll try it | 21:44 |
kaiokmo | doesn't seem to work for me. I configured nova.conf as you suggested, with enabled_filters=AggregateInstanceExtraSpecsFilter and I left available_filters commented to load all filters by default | 21:44 |
kaiokmo | in which tag are your nova/devstack on? | 21:45 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Introduces SDK to IronicDriver and uses for node.get https://review.opendev.org/642899 | 21:45 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Use SDK instead of ironicclient for node.list https://review.opendev.org/656027 | 21:45 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Use SDK instead of ironicclient for validating instance and node https://review.opendev.org/656028 | 21:45 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: Use SDK instead of ironicclient for setting instance id https://review.opendev.org/659690 | 21:45 |
openstackgerrit | Dustin Cowles proposed openstack/nova master: WIP: Use SDK instead of ironicclient for add/remove instance info from node https://review.opendev.org/659691 | 21:45 |
melwitt | I'm at a5e3054e1d6df248fc4c00b9abd7289dde160393 in train | 21:46 |
melwitt | when I set enabled_filters = AggregateInstanceExtraSpecsFilter I get only the one filter running | 21:47 |
*** lbragstad has quit IRC | 21:52 | |
melwitt | gonna try now with aggregate and flavor | 21:52 |
kaiokmo | ok, thank you. it is probably gonna work, since the filter is running | 21:53 |
kaiokmo | I'm at 18.0.0, deployed with OSA | 21:54 |
*** JamesBenson has quit IRC | 22:00 | |
melwitt | ok, booting with matching metadata works. now I'll try no match | 22:00 |
kaiokmo | my knowledge about nova behaviou (api and etcetera) is limited. could this be related with the placement service or something? | 22:01 |
melwitt | nonmatch fails to boot | 22:01 |
kaiokmo | s/behaviou/behavior | 22:01 |
melwitt | I thought about whether placement could be involved, but that would *not* affect whether your filter runs | 22:02 |
melwitt | when something changes related to placement, it would be a change where the host candidates coming back from placement are problematic in some way | 22:03 |
melwitt | placement will pre-filter hosts before they run through the nova scheduler filters | 22:03 |
kaiokmo | ah. I see | 22:03 |
melwitt | the fact that your filter isn't running at all is bizarre and I can't think of how that could be happening other than a config problem | 22:04 |
*** BjoernT has quit IRC | 22:06 | |
*** slaweq has joined #openstack-nova | 22:11 | |
melwitt | kaiokmo: when the nova-scheduler service starts up, the DEBUG log will show what values it picked up for available_filters and enabled_filters. you might take a look at that to see if it yields any clues | 22:14 |
openstackgerrit | Merged openstack/nova master: Move get_pci_mapping_for_migration to MigrationContext https://review.opendev.org/643023 | 22:14 |
openstackgerrit | Merged openstack/nova master: Allow driver to properly unplug VIFs on destination on confirm resize https://review.opendev.org/643024 | 22:14 |
melwitt | for example May 24 21:46:45 ubuntu-xenial nova-scheduler[15474]: DEBUG oslo_service.service [None req-3351fca8-ca06-4931-bebb-e1f40771a478 None None] filter_scheduler.enabled_filters = ['AggregateInstanceExtraSpecsFilter'] | 22:14 |
melwitt | May 24 21:46:45 ubuntu-xenial nova-scheduler[15474]: DEBUG oslo_service.service [None req-3351fca8-ca06-4931-bebb-e1f40771a478 None None] filter_scheduler.available_filters = ['nova.scheduler.filters.all_filters'] | 22:15 |
kaiokmo | mine is filter_scheduler.available_filters = ['nova.scheduler.filters.all_filters'] | 22:16 |
kaiokmo | and filter_scheduler.enabled_filters = ['AggregateInstanceExtraSpecs Filter'] | 22:16 |
kaiokmo | which seems correct | 22:17 |
melwitt | is the space in the middle of ['AggregateInstanceExtraSpecs Filter'] just a typo in chat or? | 22:17 |
melwitt | because it should not have a space | 22:17 |
kaiokmo | typo in chat. | 22:19 |
kaiokmo | for a moment I thought "omg, it can't be" :) | 22:20 |
melwitt | scheduler.driver = filter_scheduler should be in the log too | 22:23 |
*** slaweq has quit IRC | 22:25 | |
kaiokmo | yeah, it is here | 22:25 |
melwitt | w t f | 22:26 |
kaiokmo | doesn't make sense for me either | 22:27 |
kaiokmo | been trying to get this working since yesterday. | 22:28 |
melwitt | yeah, I dunno what to tell you. I'd be putting prints all over the code and just try to trace it at this point | 22:30 |
melwitt | do you see a log like this at least? DEBUG nova.filters [None req-4d2d1a33-c0da-4285-a500-3806a3313c4d admin admin] Starting with 1 host(s) | 22:30 |
melwitt | that's what it says before it starts running filters | 22:30 |
kaiokmo | not really. only a bunch of "Lock acquired...", "Running periodic tasks...", and "Successfully synced instances..." | 22:33 |
kaiokmo | so, the filters are not running at all? that's not good | 22:33 |
melwitt | backing up, you are seeing a server successfully boot right? and it's going onto hosts you don't want? | 22:34 |
melwitt | if so, are you running one scheduler or multiple? at least one of them should be showing filtering related messages if debug=True in the nova-scheduler config | 22:35 |
melwitt | unless you are forcing to specific host or something in your nova boot command | 22:35 |
*** _erlon_ has quit IRC | 22:36 | |
kaiokmo | yeah, I can boot servers but they are going to whatever host chosen by scheduler. I'm running three schedulers on separated infra nodes, behind a haproxy. | 22:39 |
kaiokmo | yes, I'm looking for the messages on the three of them, and replicating the same configuration. | 22:40 |
melwitt | O.o | 22:40 |
kaiokmo | this is the entire nova.conf I'm using http://paste.openstack.org/show/752060/ | 22:42 |
kaiokmo | doesn't seem to be anything wrong with scheduler and filter_scheduler sections | 22:42 |
melwitt | I see a few config options whose names have changed in later releases but nothing that should be hurting | 22:45 |
melwitt | should also see a log message like this on at least one scheduler | 22:46 |
melwitt | DEBUG nova.scheduler.filter_scheduler [None req-4d2d1a33-c0da-4285-a500-3806a3313c4d admin admin] Filtered | 22:46 |
kaiokmo | none of the sort. nothing like "nova.scheduler.filter_scheduler" | 22:49 |
melwitt | is your deployment custom patched or anything? | 22:53 |
tonyb | dansmith: thanks! and sorry | 22:56 |
*** macza has quit IRC | 23:02 | |
*** luksky has joined #openstack-nova | 23:04 | |
*** rcernin has joined #openstack-nova | 23:05 | |
openstackgerrit | Merged openstack/nova master: Move patch_exists() to nova.test.TestCase for reuse https://review.opendev.org/660500 | 23:07 |
*** slaweq has joined #openstack-nova | 23:11 | |
*** rcernin has quit IRC | 23:11 | |
*** KH-Jared has quit IRC | 23:16 | |
*** mkarpiarz has quit IRC | 23:24 | |
*** slaweq has quit IRC | 23:24 | |
openstackgerrit | Adam Spiers proposed openstack/nova master: Move selective patching of open() to nova.test for reuse https://review.opendev.org/661266 | 23:39 |
openstackgerrit | Adam Spiers proposed openstack/nova master: Provide HW_CPU_AMD_SEV trait when SEV is supported https://review.opendev.org/638680 | 23:46 |
openstackgerrit | Adam Spiers proposed openstack/nova master: Reduce logging of host hypervisor capabilities to DEBUG level https://review.opendev.org/661379 | 23:47 |
openstackgerrit | Adam Spiers proposed openstack/nova master: Add <launchSecurity> element to libvirt guest XML for AMD SEV https://review.opendev.org/636318 | 23:49 |
openstackgerrit | Adam Spiers proposed openstack/nova master: Extract SEV-specific bits on host detection https://review.opendev.org/636334 | 23:52 |
*** ttsiouts has quit IRC | 23:53 | |
openstackgerrit | Merged openstack/nova master: Skip novnc tests in multi-cell job until bug 1830417 is fixed https://review.opendev.org/661371 | 23:58 |
openstack | bug 1830417 in devstack "NoVNCConsoleTestJSON.test_novnc fails in nova-multi-cell job since 5/20" [Undecided,In progress] https://launchpad.net/bugs/1830417 - Assigned to melanie witt (melwitt) | 23:58 |