*** jmlowe has quit IRC | 00:00 | |
*** markvoelker has joined #openstack-nova | 00:10 | |
*** markvoelker has quit IRC | 00:15 | |
sean-k-mooney | it was an automatic rebase against master | 00:16 |
---|---|---|
sean-k-mooney | no conflicts | 00:16 |
sean-k-mooney | so just commit and rebase against master and push | 00:16 |
sean-k-mooney | artom: ^ | 00:16 |
artom | sean-k-mooney, ack, thanks | 00:16 |
sean-k-mooney | also something is broken with hw:mem_page_size=small. i get an error about the pagesize changing when i try to migrate with that set. | 00:18 |
*** gyee has quit IRC | 00:18 | |
sean-k-mooney | not sure if that is related to your changes or not | 00:18 |
*** threestrands has joined #openstack-nova | 00:24 | |
*** hongbin has joined #openstack-nova | 00:37 | |
*** tbachman has joined #openstack-nova | 00:38 | |
*** brault has joined #openstack-nova | 00:52 | |
artom | sean-k-mooney, do the 2 hosts have different page sizes? | 00:52 |
artom | (Or it could be a legit bug with that code, as it's new) | 00:52 |
*** ociuhandu has joined #openstack-nova | 00:53 | |
*** brault has quit IRC | 00:57 | |
*** ociuhandu has quit IRC | 00:57 | |
artom | Oh snap, rollback works | 00:59 |
artom | *actually* works | 00:59 |
artom | If I don't drop the move claim on the dest (comment that bit out), the test fails with the expected pinning assertion | 01:00 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Introduce live_migration_claim() https://review.opendev.org/635669 | 01:01 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: New objects for NUMA live migration https://review.opendev.org/634827 | 01:01 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: LM: add support for augmenting migrate_data with info from claims https://review.opendev.org/634828 | 01:01 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: LM: add support for updating NUMA-related XML on the source https://review.opendev.org/635229 | 01:01 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: NUMA live migration support https://review.opendev.org/634606 | 01:01 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration https://review.opendev.org/640021 | 01:01 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration https://review.opendev.org/672595 | 01:01 |
* artom -> gym | 01:01 | |
*** slaweq has joined #openstack-nova | 01:11 | |
*** rcernin has quit IRC | 01:13 | |
*** slaweq has quit IRC | 01:16 | |
*** factor has joined #openstack-nova | 01:33 | |
openstackgerrit | Merged openstack/nova master: [Trivial]Remove unused helper get_vif_devname_with_prefix https://review.opendev.org/678136 | 01:35 |
*** zhubx has quit IRC | 01:57 | |
*** zhubx has joined #openstack-nova | 01:57 | |
*** zhubx has quit IRC | 02:08 | |
*** slaweq has joined #openstack-nova | 02:11 | |
*** rcernin has joined #openstack-nova | 02:13 | |
*** slaweq has quit IRC | 02:16 | |
*** larainema has joined #openstack-nova | 02:23 | |
openstackgerrit | Merged openstack/nova stable/stein: rt: only map compute node if we created it https://review.opendev.org/676278 | 02:30 |
openstackgerrit | Merged openstack/nova stable/stein: Add functional regression recreate test for bug 1839560 https://review.opendev.org/676507 | 02:30 |
openstack | bug 1839560 in OpenStack Compute (nova) stein "ironic: moving node to maintenance makes it unusable afterwards" [High,In progress] https://launchpad.net/bugs/1839560 - Assigned to Matt Riedemann (mriedem) | 02:30 |
openstackgerrit | Merged openstack/nova stable/stein: Restore soft-deleted compute node with same uuid https://review.opendev.org/676509 | 02:30 |
*** dannins has joined #openstack-nova | 02:38 | |
*** gbarros has quit IRC | 03:04 | |
*** markvoelker has joined #openstack-nova | 03:10 | |
*** markvoelker has quit IRC | 03:15 | |
*** nicolasbock has quit IRC | 03:25 | |
*** psachin has joined #openstack-nova | 03:33 | |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: NUMA live migration support https://review.opendev.org/634606 | 03:47 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration https://review.opendev.org/640021 | 03:47 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration https://review.opendev.org/672595 | 03:47 |
*** udesale has joined #openstack-nova | 04:06 | |
*** markvoelker has joined #openstack-nova | 04:10 | |
*** slaweq has joined #openstack-nova | 04:11 | |
*** hongbin has quit IRC | 04:13 | |
*** markvoelker has quit IRC | 04:15 | |
*** slaweq has quit IRC | 04:16 | |
*** ratailor has joined #openstack-nova | 04:20 | |
*** mkrai has joined #openstack-nova | 04:31 | |
*** shilpasd has joined #openstack-nova | 04:33 | |
openstackgerrit | Merged openstack/nova master: [Trivial]Remove unused helper _get_min_service_version https://review.opendev.org/678446 | 04:35 |
*** dave-mccowan has quit IRC | 04:36 | |
*** ociuhandu has joined #openstack-nova | 04:53 | |
*** ociuhandu has quit IRC | 04:57 | |
*** macz has joined #openstack-nova | 04:58 | |
*** shilpasd has quit IRC | 05:02 | |
*** macz has quit IRC | 05:03 | |
*** sridharg has joined #openstack-nova | 05:10 | |
*** sridharg has quit IRC | 05:10 | |
*** markvoelker has joined #openstack-nova | 05:10 | |
openstackgerrit | Dustin Cowles proposed openstack/nova master: WIP: Load the custom resource providers to resource tracker https://review.opendev.org/676522 | 05:10 |
*** sridharg has joined #openstack-nova | 05:11 | |
*** slaweq has joined #openstack-nova | 05:11 | |
*** dustinc has joined #openstack-nova | 05:11 | |
*** Sundar has joined #openstack-nova | 05:12 | |
*** markvoelker has quit IRC | 05:15 | |
*** slaweq has quit IRC | 05:16 | |
*** Sundar has quit IRC | 05:18 | |
*** boxiang has joined #openstack-nova | 05:23 | |
*** factor has quit IRC | 05:41 | |
*** Luzi has joined #openstack-nova | 05:43 | |
*** damien_r has quit IRC | 05:47 | |
*** ccamacho has quit IRC | 05:52 | |
*** hamzy has quit IRC | 05:59 | |
*** lpetrut has joined #openstack-nova | 06:02 | |
*** macz has joined #openstack-nova | 06:11 | |
*** zhubx has joined #openstack-nova | 06:11 | |
*** slaweq has joined #openstack-nova | 06:11 | |
*** boxiang has quit IRC | 06:11 | |
*** macz has quit IRC | 06:15 | |
*** slaweq has quit IRC | 06:15 | |
*** prometheanfire has quit IRC | 06:18 | |
openstackgerrit | melanie witt proposed openstack/nova master: nova-manage db archive_deleted_rows is not multi-cell aware https://review.opendev.org/507486 | 06:22 |
openstackgerrit | melanie witt proposed openstack/nova master: Verify archive_deleted_rows --all-cells in post test hook https://review.opendev.org/672840 | 06:22 |
*** prometheanfire has joined #openstack-nova | 06:22 | |
*** zhubx has quit IRC | 06:24 | |
*** zhubx has joined #openstack-nova | 06:24 | |
*** zhubx has quit IRC | 06:26 | |
*** lee1 has joined #openstack-nova | 06:32 | |
*** brault has joined #openstack-nova | 06:38 | |
*** macz has joined #openstack-nova | 06:39 | |
*** epoojad has joined #openstack-nova | 06:40 | |
*** macz has quit IRC | 06:44 | |
*** trident has quit IRC | 07:00 | |
*** mkrai_ has joined #openstack-nova | 07:02 | |
*** mkrai__ has joined #openstack-nova | 07:05 | |
*** mkrai has quit IRC | 07:05 | |
*** markvoelker has joined #openstack-nova | 07:06 | |
*** mkrai_ has quit IRC | 07:08 | |
*** trident has joined #openstack-nova | 07:10 | |
*** slaweq has joined #openstack-nova | 07:11 | |
*** markvoelker has quit IRC | 07:15 | |
*** slaweq has quit IRC | 07:15 | |
*** ivve has joined #openstack-nova | 07:17 | |
*** macz has joined #openstack-nova | 07:17 | |
*** macz has quit IRC | 07:22 | |
*** sapd1_x has joined #openstack-nova | 07:25 | |
*** xek has joined #openstack-nova | 07:30 | |
*** threestrands has quit IRC | 07:32 | |
*** markvoelker has joined #openstack-nova | 07:35 | |
*** dtantsur|afk is now known as dtantsur | 07:37 | |
*** damien_r has joined #openstack-nova | 07:39 | |
*** rcernin has quit IRC | 07:40 | |
*** markvoelker has quit IRC | 07:40 | |
*** lee1 is now known as lyarwood | 07:46 | |
*** slaweq has joined #openstack-nova | 07:52 | |
*** ralonsoh has joined #openstack-nova | 07:52 | |
*** panda has quit IRC | 08:02 | |
*** panda has joined #openstack-nova | 08:02 | |
*** tkajinam has quit IRC | 08:07 | |
*** epoojad has quit IRC | 08:12 | |
*** mkrai__ has quit IRC | 08:15 | |
*** dougsz has joined #openstack-nova | 08:17 | |
*** tetsuro has joined #openstack-nova | 08:21 | |
*** bhagyashris has joined #openstack-nova | 08:26 | |
aspiers | kashyap: welcome back | 08:29 |
* kashyap waves hi; thanks | 08:29 | |
aspiers | kashyap: hope you had a good time! | 08:36 |
*** jangutter has quit IRC | 08:37 | |
kashyap | Partly, yes. And partly it was a "responsibility trip" :-) | 08:37 |
*** markvoelker has joined #openstack-nova | 08:40 | |
openstackgerrit | zhangyujun proposed openstack/nova master: Should not raise when restore power on failed https://review.opendev.org/624854 | 08:41 |
*** mkrai__ has joined #openstack-nova | 08:44 | |
*** markvoelker has quit IRC | 08:45 | |
*** avolkov has joined #openstack-nova | 08:46 | |
*** ccamacho has joined #openstack-nova | 08:47 | |
*** shilpasd has joined #openstack-nova | 08:47 | |
*** priteau has joined #openstack-nova | 08:50 | |
*** derekh has joined #openstack-nova | 08:50 | |
*** slaweq has quit IRC | 08:53 | |
*** macz has joined #openstack-nova | 08:53 | |
*** mkrai__ has quit IRC | 08:54 | |
*** macz has quit IRC | 08:58 | |
*** jangutter has joined #openstack-nova | 09:01 | |
*** cdent has joined #openstack-nova | 09:02 | |
*** cdent has quit IRC | 09:04 | |
*** licanwei has joined #openstack-nova | 09:05 | |
licanwei | #join #airshipit | 09:05 |
*** cdent has joined #openstack-nova | 09:06 | |
*** slaweq has joined #openstack-nova | 09:10 | |
*** mkrai__ has joined #openstack-nova | 09:10 | |
*** CeeMac has joined #openstack-nova | 09:12 | |
*** zbr has joined #openstack-nova | 09:13 | |
*** slaweq has quit IRC | 09:14 | |
*** slaweq has joined #openstack-nova | 09:20 | |
*** slaweq has quit IRC | 09:25 | |
*** dtantsur is now known as dtantsur|bbl | 09:31 | |
*** brinzhang_ has quit IRC | 09:32 | |
*** brinzhang_ has joined #openstack-nova | 09:32 | |
aspiers | sean-k-mooney: let me know if I can help with bp/image-metadata-prefiltering | 09:34 |
aspiers | sean-k-mooney: that's not me being generous, you're just ahead of me on the runway ;-) | 09:34 |
aspiers | looks like there are some CI failures, maybe I could fix those | 09:35 |
*** psachin has quit IRC | 09:37 | |
*** psachin has joined #openstack-nova | 09:41 | |
*** jaosorior has quit IRC | 09:43 | |
openstackgerrit | Luyao Zhong proposed openstack/nova master: db: Add resources column in instance_extra table https://review.opendev.org/678447 | 09:44 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: object: Introduce Resource and ResouceList objs https://review.opendev.org/678448 | 09:44 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Add resources dict into _Provider https://review.opendev.org/678449 | 09:44 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Retrive the allocations early https://review.opendev.org/678450 | 09:44 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Track orphan instances and error migrations in resource tracker https://review.opendev.org/678451 | 09:44 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Claim resources in resource tracker https://review.opendev.org/678452 | 09:44 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces https://review.opendev.org/678453 | 09:44 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree https://review.opendev.org/678454 | 09:44 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup https://review.opendev.org/678455 | 09:44 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec https://review.opendev.org/678456 | 09:44 |
openstackgerrit | Luyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory https://review.opendev.org/678470 | 09:44 |
*** psachin has quit IRC | 09:48 | |
*** brinzhang_ has quit IRC | 09:49 | |
*** brinzhang_ has joined #openstack-nova | 09:50 | |
*** sapd1_x has quit IRC | 09:59 | |
openstackgerrit | mathieu bultel proposed openstack/nova master: Return security groups by name https://review.opendev.org/678776 | 10:02 |
*** bbobrov has quit IRC | 10:04 | |
*** dpawlik has quit IRC | 10:04 | |
*** bbobrov has joined #openstack-nova | 10:04 | |
*** cgoncalves has quit IRC | 10:04 | |
*** cgoncalves has joined #openstack-nova | 10:04 | |
*** bhagyashris has quit IRC | 10:11 | |
aspiers | bauzas: I was just looking at fixing some of the issues in sean-k-mooney's patch you just reviewed https://review.opendev.org/#/c/666914/ | 10:18 |
bauzas | cool | 10:18 |
aspiers | although I'm not sure if I'd be treading on his toes | 10:18 |
bauzas | well, sean is there :) | 10:19 |
aspiers | https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_14/666914/11/check/openstack-tox-py27/ee133dd/testr_results.html.gz shows an ordering error which is probably the dict.values issue you mention | 10:19 |
aspiers | only on py27 not py3 | 10:19 |
aspiers | I'm not sure if he's around today | 10:19 |
*** macz has joined #openstack-nova | 10:19 | |
sean-k-mooney | bauzas: yep i know values change between py27 and py3X | 10:20 |
sean-k-mooney | i normally would use six.iteritems | 10:20 |
sean-k-mooney | but aspiers and stephenfin prefer that i dont | 10:21 |
openstackgerrit | Gorka Eguileor proposed openstack/nova master: Use os-brick locking for volume attach and detach https://review.opendev.org/614190 | 10:21 |
aspiers | sean-k-mooney: I don't personally care but it seems that everyone else does | 10:21 |
bauzas | why so ? | 10:21 |
sean-k-mooney | aspiers: i dont like us writing slow code by default | 10:21 |
bauzas | wait, six.itervalues() isn't exactly slow | 10:21 |
sean-k-mooney | no thats what i prefer to use | 10:22 |
sean-k-mooney | for values | 10:22 |
sean-k-mooney | or iteritems for items | 10:22 |
stephenfin | I only care insofar as it generally doesn't make any difference and I'd rather one less 'six' thing to clean up when we move to Python 3 only | 10:22 |
sean-k-mooney | stephenfin: in aggregate it does | 10:22 |
bauzas | https://github.com/benjaminp/six/blob/master/six.py#L584 | 10:22 |
stephenfin | on python 2, yes | 10:23 |
bauzas | that's what does six.itervalues | 10:23 |
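For readers who do not want to follow the link: the helper under discussion is very small. The following is a minimal paraphrase of its behaviour, not the verbatim six source; the `PY3` flag and the sample `traits` dict are assumptions made for illustration.

```python
import sys

# Assumption: only the major version matters for choosing the code path.
PY3 = sys.version_info[0] == 3


def itervalues(d):
    """Return an iterator over d's values without building a list first."""
    if PY3:
        # Python 3: dict.values() is already a lazy view; just wrap it.
        return iter(d.values())
    # Python 2: dict.values() copies into a new list, itervalues() does not.
    return d.itervalues()


# Hypothetical sample data, only here to exercise the helper.
traits = {'virtio': 1, 'scsi': 2, 'ide': 3}
total = sum(itervalues(traits))
```

On Python 3 the wrapper adds nothing over calling `d.values()` directly, which is the cleanup cost stephenfin and aspiers refer to; on Python 2 it avoids the list copy sean-k-mooney is concerned about.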
aspiers | I'm with stephenfin on this. The majority of cases we are talking about involve like 3 elements | 10:23 |
bauzas | stephenfin: of course | 10:23 |
stephenfin | but OSP 16 will be deployed with Python 3 (RHEL 8) | 10:23 |
aspiers | so the cost of cleaning up six stuff is higher than the performance hit | 10:23 |
stephenfin | I suspect SUSE and Canonical will be doing something similar | 10:23 |
stephenfin | Yeah, that ^ | 10:23 |
sean-k-mooney | aspiers: yes but if we ever backport the code we pay that cost for years | 10:23 |
bauzas | technically we support py2 for a while | 10:23 |
sean-k-mooney | hell if we dont we still do | 10:23 |
bauzas | and the "we" is about the upstream code | 10:24 |
aspiers | if there are exceptions with like 10000 element dicts then fine we can make an exception for those | 10:24 |
sean-k-mooney | we have people only finally moving to queens | 10:24 |
stephenfin | bauzas: Correct. But no distro is packaging it that I'm aware of | 10:24 |
sean-k-mooney | we will have people on py27 downstream until 2023 | 10:24 |
stephenfin | sean-k-mooney: There's no performance impact either, right? It's a memory impact | 10:24 |
stephenfin | Not for new code | 10:24 |
bauzas | stephenfin: sure but then that's not a good reason for not supporting py27 :) | 10:24 |
*** macz has quit IRC | 10:24 | |
sean-k-mooney | stephenfin: no its slower to use .items or .values on py27 too | 10:24 |
bauzas | anyway | 10:24 |
aspiers | this is a classic bikeshed discussion X-D | 10:25 |
stephenfin | no, but it is a good reason to make an already mostly unnecessary thing even more unnecessary | 10:25 |
bauzas | aspiers: this | 10:25 |
stephenfin | I want blue. | 10:25 |
sean-k-mooney | we are making a copy of all the values and constructing a new list | 10:25 |
bauzas | stephenfin: i want snow | 10:25 |
aspiers | stephenfin: you can have any blue, as long as it's black | 10:25 |
sean-k-mooney | bauzas: but anyway do like the new version in general | 10:25 |
stephenfin | bauzas: You can't handle the snow. | 10:25 |
sean-k-mooney | bauzas: its closer to what i had originally planned for the function | 10:25 |
bauzas | testtools.matchers._impl.MismatchError: [1, 2, 3, 4] != [1, 2, 4, 3] | 10:25 |
bauzas | this ^ | 10:25 |
aspiers | right | 10:25 |
sean-k-mooney | ya i saw that | 10:25 |
sean-k-mooney | i was going to fix it | 10:25 |
cdent | These conversations make me laugh. We're talking about trying to maintain py2 code when real people won't see it for many months, they are all still long time behind now. | 10:25 |
sean-k-mooney | im relying on dict order | 10:26 |
sean-k-mooney | which is only a py36+ thing | 10:26 |
sean-k-mooney | ill fix that | 10:26 |
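The `[1, 2, 3, 4] != [1, 2, 4, 3]` mismatch is what relying on dict insertion order looks like on py27: plain dicts keep insertion order only as a CPython 3.6 implementation detail, and as a language guarantee from 3.7. A minimal sketch of two common ways to make such code order-stable on every version follows; the helper name `values_by_sorted_key` and the sample data are hypothetical and not taken from the patch.

```python
import collections


def values_by_sorted_key(mapping):
    """Return the values ordered by key, independent of dict ordering."""
    return [mapping[key] for key in sorted(mapping)]


# Hypothetical data, deliberately inserted out of key order.
unordered = {'d': 4, 'a': 1, 'c': 3, 'b': 2}

# Option 1: impose an explicit order when reading the dict.
stable = values_by_sorted_key(unordered)

# Option 2: use a container that guarantees order on py27 as well.
ordered = collections.OrderedDict(
    [('a', 1), ('b', 2), ('c', 3), ('d', 4)])
explicit = list(ordered.values())
```

In a test, comparing `sorted(actual)` against `sorted(expected)` is a third option when the order itself is not part of the contract.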
aspiers | sean-k-mooney: you want me to submit my typo/pep8 fixes? | 10:26 |
stephenfin | bauzas: Oh, here's another one you'll probably want to look at https://review.opendev.org/#/c/674894/ | 10:26 |
sean-k-mooney | aspiers: you can if you like. | 10:26 |
cdent | We should be on at least py3.7 everywhere, because by the time the majority of people are using the code... | 10:26 |
stephenfin | I think you know that code from the vGPU work | 10:26 |
stephenfin | cdent: only ~ two months to go | 10:27 |
stephenfin | tick-tock tick-tock | 10:27 |
cdent | :fist bump: | 10:27 |
cdent | or :something: | 10:27 |
bauzas | stephenfin: looking | 10:27 |
stephenfin | and when we do, I'm doing this _everywhere_ https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/integrate-mypy-type-checking | 10:27 |
stephenfin | starting with oslo | 10:28 |
sean-k-mooney | bauzas: i can move the _project_traits function out of the class too. | 10:28 |
bauzas | sean-k-mooney: honestly, that really depends on what you want | 10:28 |
bauzas | sean-k-mooney: I'm not particularly opinionated | 10:28 |
sean-k-mooney | i want it to be a staticmethod | 10:28 |
bauzas | sean-k-mooney: but I feel we should just make it simpler | 10:28 |
bauzas | sean-k-mooney: what's fun is that you explain that this method is absolutely unrelated to the driver, but you call it by its libvirt class name :) | 10:29 |
sean-k-mooney | if we can make it simpler without reducing functionality then sure. have you looked at the follow up patches | 10:29 |
sean-k-mooney | bauzas: no i want to call it with self | 10:29 |
*** mkrai__ has quit IRC | 10:30 | |
bauzas | sean-k-mooney: no I haven't looked at other patches yet | 10:30 |
openstackgerrit | Adam Spiers proposed openstack/nova master: Libvirt: report storage bus traits https://review.opendev.org/666914 | 10:30 |
aspiers | sean-k-mooney: ^^^ | 10:30 |
sean-k-mooney | @staticmethod is intended to allow you to associate free functions with a class so that they are callable like methods | 10:30 |
aspiers | sean-k-mooney: also, might be just me, but I would personally prefer test_flatten_iterable to be split into multiple smaller test cases | 10:31 |
sean-k-mooney | am i could do that | 10:31 |
aspiers | sean-k-mooney: currently when an assertion fails, you can't immediately see which one failed | 10:31 |
sean-k-mooney | i built it iteratively so it was simpler to build up with more test cases | 10:31 |
aspiers | also giving each a separate method name would add some clarity to exactly what is being tested | 10:31 |
aspiers | sure | 10:32 |
sean-k-mooney | i kind of grouped them intentionally expecting that comment :) | 10:32 |
sean-k-mooney | so ill split it in the next version | 10:32 |
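As an illustration of aspiers' point about splitting the test: with one method per case, a failure report names the case that broke. This is a sketch only; `flatten` is a stand-in defined here so the example is self-contained, and the test names are hypothetical rather than those of the real test_flatten_iterable.

```python
import unittest


def flatten(nested):
    """Stand-in helper: yield the leaves of nested lists and tuples."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            for leaf in flatten(item):
                yield leaf
        else:
            yield item


class TestFlatten(unittest.TestCase):
    """One behaviour per method, so a failure names the broken case."""

    def test_empty_input_yields_nothing(self):
        self.assertEqual([], list(flatten([])))

    def test_flat_list_is_unchanged(self):
        self.assertEqual([1, 2, 3], list(flatten([1, 2, 3])))

    def test_nested_sequences_are_flattened(self):
        self.assertEqual([1, 2, 3, 4], list(flatten([1, [2, (3, [4])]])))
```

If the nested case regresses, the runner reports `test_nested_sequences_are_flattened` directly instead of a single failing method containing many assertions.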
bauzas | sean-k-mooney: have you clicked on my link ? :) | 10:32 |
sean-k-mooney | yes | 10:32 |
sean-k-mooney | that is not my usecase and it is not a good argument | 10:33 |
sean-k-mooney | i want to be able to call it via self not via the class name | 10:33 |
sean-k-mooney | i can call it via the class name if im in a non instance/class method | 10:34 |
sean-k-mooney | but in an instance method i wanted to call it via self | 10:34 |
*** tetsuro has quit IRC | 10:34 | |
aspiers | sean-k-mooney: cool thanks | 10:35 |
bauzas | sean-k-mooney: just make it a classmethod then | 10:36 |
bauzas | anyway, I need to eat | 10:36 |
sean-k-mooney | it does not need to use any class method or variables. | 10:36 |
sean-k-mooney | anyway i dont really want to argue about this. static method would be the correct decorator to use but maybe instead of saying what i actually mean i should write the dumbest version that works... | 10:40 |
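The behaviour being debated can be shown in a few lines. In this sketch `FakeDriver`, `_prefix_traits` and the `CUSTOM_BUS_` prefix are all hypothetical and are not the names used in the patch; the only point is that a `@staticmethod` receives neither `self` nor `cls`, yet is callable through an instance as well as through the class.

```python
class FakeDriver(object):
    """Hypothetical stand-in for a driver class."""

    @staticmethod
    def _prefix_traits(names, prefix):
        # No self or cls parameter: this is a plain function that merely
        # lives in the class namespace.
        return sorted(prefix + name.upper() for name in names)

    def get_traits(self):
        # Called via self from an instance method, as sean-k-mooney wants.
        return self._prefix_traits(['virtio', 'scsi'], 'CUSTOM_BUS_')


via_instance = FakeDriver().get_traits()
# The same function is also reachable through the class name.
via_class = FakeDriver._prefix_traits(['ide'], 'CUSTOM_BUS_')
```

A `@classmethod` would work the same way at the call site but would add a `cls` argument the function never uses, which is the objection raised above.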
*** macz has joined #openstack-nova | 10:40 | |
openstackgerrit | Chris Dent proposed openstack/nova master: Add a "Caveats" section to the eventlet profiling docs https://review.opendev.org/676672 | 10:41 |
*** macz has quit IRC | 10:45 | |
sean-k-mooney | bauzas: by the way do you think flatten_iterable should yield the values as it does now or the item tuples. i started with the tuples but it was making the code more complex. i feel like a separate flatten_mapping would be better if we need it in the future | 10:49 |
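To make the two options concrete, here is a rough sketch of what a values-only `flatten_iterable` and a separate `flatten_mapping` could look like. The semantics are assumptions, not the code under review: strings are treated as atoms, mappings contribute only their values, and the sketch is Python 3 only because it uses `yield from`.

```python
from collections import abc


def flatten_iterable(obj):
    """Yield leaf values from arbitrarily nested iterables (assumed semantics)."""
    if isinstance(obj, (str, bytes)):
        # Treat strings as atoms; iterating them would recurse forever.
        yield obj
    elif isinstance(obj, abc.Mapping):
        # Values only: keys are dropped, which keeps this function simple.
        for value in obj.values():
            yield from flatten_iterable(value)
    elif isinstance(obj, abc.Iterable):
        for item in obj:
            yield from flatten_iterable(item)
    else:
        yield obj


def flatten_mapping(mapping):
    """Yield (top-level key, leaf value) pairs; a hypothetical companion."""
    for key, value in mapping.items():
        for leaf in flatten_iterable(value):
            yield key, leaf


values = list(flatten_iterable([1, [2, (3,)], {'a': 4}]))
pairs = list(flatten_mapping({'a': [1, 2], 'b': 3}))
```

Keeping the tuple-yielding variant separate means callers that only want values never have to unpack pairs, which matches the complexity concern in the message above.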
openstackgerrit | Balazs Gibizer proposed openstack/nova master: allow getting resource request of every bound ports of an instance https://review.opendev.org/655110 | 10:49 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Pass network API to the conducor's MigrationTask https://review.opendev.org/655111 | 10:51 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Add request_spec to server move RPC calls https://review.opendev.org/655721 | 10:54 |
*** slaweq has joined #openstack-nova | 10:55 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: re-calculate provider mapping during migration https://review.opendev.org/655112 | 10:56 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: update allocation in binding profile during migrate https://review.opendev.org/656422 | 10:58 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Extend NeutronFixture to handle migrations https://review.opendev.org/655114 | 11:01 |
*** macz has joined #openstack-nova | 11:01 | |
*** nicolasbock has joined #openstack-nova | 11:02 | |
*** udesale has quit IRC | 11:02 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: prepare func test env for moving servers with bandwidth https://review.opendev.org/655109 | 11:03 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Func test for migrate server with ports having resource request https://review.opendev.org/655113 | 11:05 |
*** macz has quit IRC | 11:06 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Make _rever_allocation nested allocation aware https://review.opendev.org/676138 | 11:08 |
*** brinzhang_ has quit IRC | 11:09 | |
*** brinzhang_ has joined #openstack-nova | 11:09 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Support reverting migration / resize with bandwidth https://review.opendev.org/676140 | 11:10 |
openstackgerrit | Merged openstack/nova master: [Trivial]Remove unused helper get_vm_ref_from_name https://review.opendev.org/678444 | 11:10 |
*** tesseract has joined #openstack-nova | 11:11 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Func test for migrate re-schedule with bandwidth https://review.opendev.org/676972 | 11:12 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Support migrating SRIOV port with bandwidth https://review.opendev.org/676980 | 11:15 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Allow migrating server with port resource request https://review.opendev.org/671497 | 11:17 |
sean-k-mooney | gibi: does ^ work with sriov live migration or just cold migration | 11:17 |
*** hamzy has joined #openstack-nova | 11:18 | |
*** dpawlik has joined #openstack-nova | 11:18 | |
gibi | sean-k-mooney: just cold so far | 11:18 |
gibi | sean-k-mooney: I still have to work on live migrate, evacuate, and shelve offload | 11:18 |
sean-k-mooney | ok do you have live migration support in general | 11:18 |
sean-k-mooney | ah ok | 11:18 |
sean-k-mooney | i dont think you will have to do anything special for sriov in that regard | 11:19 |
gibi | sean-k-mooney: I expect resize to work out of the box but I have to add functional coverage for it | 11:19 |
sean-k-mooney | if you get generic live migration working then it should work for sriov too | 11:19 |
gibi | sean-k-mooney: the selection of pci devices needs to be driven the same way for live migration as it is now done for cold migration | 11:20 |
*** jaosorior has joined #openstack-nova | 11:20 | |
gibi | sean-k-mooney: but besides that I don't expect any sriov specific thing | 11:20 |
sean-k-mooney | we handle that differently for sriov | 11:20 |
sean-k-mooney | we do not use move claims | 11:20 |
sean-k-mooney | so i hope you are not assuming we do | 11:21 |
sean-k-mooney | ill take a look at the patch in either case | 11:21 |
gibi | sean-k-mooney: how does live migration select the pci device on the target host? | 11:22 |
*** macz has joined #openstack-nova | 11:22 | |
gibi | sean-k-mooney: if it uses the InstancePCIRequest then I think my strategy will work | 11:23 |
sean-k-mooney | we claim them in the pci resource tracker in check_can_live_migrate_at_dest | 11:23 |
sean-k-mooney | we create new InstancePCIRequest objects i think | 11:23 |
gibi | sean-k-mooney: OK, then I need to patch those requests with https://review.opendev.org/#/c/676980/6/nova/compute/manager.py@2141 | 11:24 |
sean-k-mooney | then we store the pci addresses of the new vif objects in the migrate_data | 11:25 |
sean-k-mooney | oh the provider_mappings | 11:25 |
sean-k-mooney | ya probably | 11:25 |
sean-k-mooney | did you see my comment by the way regarding the length of that parameter/variable | 11:25 |
gibi | sean-k-mooney: I'm going through the series today so I will | 11:26 |
sean-k-mooney | basically request_group_resource_providers_mapping -> provider_mappings | 11:27 |
*** macz has quit IRC | 11:27 | |
aspiers | sean-k-mooney: shall I rebase the other 2 patches on top so that they have a chance of passing pep8 at least? | 11:27 |
sean-k-mooney | if we need to explain it is a request group resource provider mapping we can state that in the doc string | 11:28 |
aspiers | hmm, I guess they will still fail on test_utils | 11:28 |
sean-k-mooney | aspiers: no need ill do that later | 11:28 |
gibi | sean-k-mooney: ack | 11:28 |
sean-k-mooney | was getting through my email | 11:28 |
*** slaweq has quit IRC | 11:31 | |
sean-k-mooney | aspiers: grabbing tea but ill rebase the follow up patch and fix the test to work on py27 shortly | 11:32 |
*** slaweq has joined #openstack-nova | 11:33 | |
*** dave-mccowan has joined #openstack-nova | 11:34 | |
*** tbachman has quit IRC | 11:36 | |
*** slaweq has quit IRC | 11:41 | |
*** slaweq has joined #openstack-nova | 11:52 | |
*** slaweq has quit IRC | 11:57 | |
*** tbachman has joined #openstack-nova | 11:59 | |
aspiers | sean-k-mooney: cool | 12:00 |
*** bjolo has joined #openstack-nova | 12:00 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: allow getting resource request of every bound ports of an instance https://review.opendev.org/655110 | 12:04 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Pass network API to the conducor's MigrationTask https://review.opendev.org/655111 | 12:04 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Add request_spec to server move RPC calls https://review.opendev.org/655721 | 12:04 |
openstackgerrit | zhangyujun proposed openstack/nova master: Should not raise when restore power on failed https://review.opendev.org/624854 | 12:07 |
*** dtantsur|bbl is now known as dtantsur | 12:08 | |
*** ratailor_ has joined #openstack-nova | 12:08 | |
*** ratailor has quit IRC | 12:08 | |
*** markvoelker has joined #openstack-nova | 12:10 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: re-calculate provider mapping during migration https://review.opendev.org/655112 | 12:11 |
*** slaweq has joined #openstack-nova | 12:12 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: update allocation in binding profile during migrate https://review.opendev.org/656422 | 12:13 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Extend NeutronFixture to handle migrations https://review.opendev.org/655114 | 12:13 |
*** slaweq has quit IRC | 12:17 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: prepare func test env for moving servers with bandwidth https://review.opendev.org/655109 | 12:18 |
*** ratailor_ has quit IRC | 12:18 | |
gibi | sean-k-mooney: addressed your comments ^^ | 12:19 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Func test for migrate server with ports having resource request https://review.opendev.org/655113 | 12:20 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Make _rever_allocation nested allocation aware https://review.opendev.org/676138 | 12:22 |
sean-k-mooney | cool ill try and make it through the full series end to end this week | 12:24 |
sean-k-mooney | ill also try and deploy it but my backlog of things to deploy and test is kind of long at the moment | 12:25 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Func test for migrate re-schedule with bandwidth https://review.opendev.org/676972 | 12:25 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Support migrating SRIOV port with bandwidth https://review.opendev.org/676980 | 12:27 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Allow migrating server with port resource request https://review.opendev.org/671497 | 12:27 |
openstackgerrit | Peter Penchev proposed openstack/nova master: libvirt: use native AIO mode for StorPool Cinder volumes. https://review.opendev.org/676172 | 12:27 |
*** larainema has quit IRC | 12:35 | |
*** jaosorior has quit IRC | 12:39 | |
*** jaosorior has joined #openstack-nova | 12:39 | |
*** slaweq has joined #openstack-nova | 12:40 | |
gibi | sean-k-mooney: thanks | 12:41 |
*** nweinber has joined #openstack-nova | 12:43 | |
*** priteau has quit IRC | 12:44 | |
*** dougsz has quit IRC | 12:46 | |
*** gbarros has joined #openstack-nova | 12:48 | |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration https://review.opendev.org/672595 | 12:53 |
*** dougsz has joined #openstack-nova | 12:57 | |
*** slaweq has quit IRC | 13:04 | |
*** jmlowe has joined #openstack-nova | 13:04 | |
*** jaosorior has quit IRC | 13:09 | |
*** macz has joined #openstack-nova | 13:10 | |
*** brinzhang_ has quit IRC | 13:12 | |
*** brinzhang has joined #openstack-nova | 13:14 | |
*** KeithMnemonic has joined #openstack-nova | 13:14 | |
*** macz has quit IRC | 13:15 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Do not query allocations twice in finish_revert_resize https://review.opendev.org/678827 | 13:16 |
openstackgerrit | sean mooney proposed openstack/nova master: Libvirt: report storage bus traits https://review.opendev.org/666914 | 13:17 |
openstackgerrit | sean mooney proposed openstack/nova master: libvirt: use domain capabilities to get supported device models https://review.opendev.org/666915 | 13:17 |
openstackgerrit | sean mooney proposed openstack/nova master: Add transform_image_metadata request filter https://review.opendev.org/665775 | 13:17 |
*** mriedem has joined #openstack-nova | 13:22 | |
*** eharney has joined #openstack-nova | 13:23 | |
*** shilpasd has quit IRC | 13:25 | |
mriedem | lyarwood: if you're hitting stable reviews, https://review.opendev.org/#/c/678254/ is a simple docs fix that has confused many operators | 13:29 |
lyarwood | mriedem: yup trying to, one more downstream fire to go first... | 13:30 |
*** udesale has joined #openstack-nova | 13:32 | |
*** jaosorior has joined #openstack-nova | 13:33 | |
*** dklyle has quit IRC | 13:38 | |
*** david-lyle has joined #openstack-nova | 13:38 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Add resize tests to nova-grenade job https://review.opendev.org/678841 | 13:41 |
*** priteau has joined #openstack-nova | 13:44 | |
*** macz has joined #openstack-nova | 13:48 | |
stephenfin | gibi: I'm seeing the following in a functional test. Any idea why that might be? 'AllocationDeleteFailed: Failed to delete allocations for consumer [uuid]' | 13:48 |
stephenfin | Being raised by 'delete_allocation_for_instance' in 'nova/scheduler/client/report.py' | 13:48 |
stephenfin | I suspect it's to do with cleanup, but I've no idea why this test is triggering it but no others | 13:49 |
gibi | stephenfin: I'm on a call, I will get back to you in an hour | 13:49 |
stephenfin | (y) | 13:49 |
*** macz has quit IRC | 13:52 | |
*** davee_ has joined #openstack-nova | 13:53 | |
artom | dansmith, did you notice the new TODO above https://review.opendev.org/#/c/634827/50/nova/objects/migrate_data.py? I thought it would answer your question. | 13:54 |
dansmith | artom: yes | 13:54 |
artom | dansmith, OK, then I don't understand your question :) | 13:55 |
dansmith | artom: is it the current instance's numa topology? | 13:55 |
artom | Yes | 13:55 |
*** mlavalle has joined #openstack-nova | 13:55 | |
dansmith | so you're passing a whole extra nested object, which is a thing that the other side already has, to function as a sentinel? | 13:55 |
artom | Well if you put it that way, of course it's going to sound dumb ;) | 13:56 |
artom | So... just a boolean flag then? | 13:56 |
dansmith | we have a couple other boolean fields above it that indicate dest or source support for a feature | 13:56 |
sean-k-mooney | we also need the data it contains for updating the xml, right? | 13:57 |
dansmith | that's why I'm asking, but I don't think so | 13:57 |
artom | sean-k-mooney, the dest can pull it from the database | 13:57 |
artom | I need to double check, but it's probably already loaded | 13:57 |
dansmith | and/or has it on the instance which is sent via rpc anyway | 13:57 |
*** mkrai has joined #openstack-nova | 13:58 | |
artom | And we can always add it to expected_attrs in the initial REST API method | 13:58 |
dansmith | you claim you already do that | 13:58 |
dansmith | also, | 13:58 |
sean-k-mooney | by the way i'm not really following what you are referring to. are ye talking about the LibvirtLiveMigrateNUMAInfo object? | 13:58 |
dansmith | it should be lazy-loadable on the instance, and if not, make it | 13:59 |
dansmith | sean-k-mooney: no | 13:59 |
sean-k-mooney | oh ok | 13:59 |
artom | dansmith, am I? It's possible, it's been a while :) | 13:59 |
dansmith | artom: I left a question about it a few minutes ago | 13:59 |
artom | Right, but before that | 13:59 |
sean-k-mooney | oh the numa_topology field in https://review.opendev.org/#/c/634827/50/nova/objects/migrate_data.py@247 | 13:59 |
dansmith | artom: I think it's probably time for mriedem to take a run through these to see how he's feeling about some of the softer things I have complained about previously, | 14:00 |
dansmith | like the two nano patches in the middle, and some of the dead code stuff you're setting up | 14:00 |
dansmith | I also need to look at the functional test patch yet | 14:00 |
artom | dansmith, aha, it's loaded in the REST API layer since stephenfin's "disable LM for NUMA" patch | 14:01 |
dansmith | remind me why we can't test this with multinode in the gate? We can't have arbitrarily simple numa topo on a flavor that will always work, but exercise some of this? | 14:01 |
sean-k-mooney | do we need it for revert? initially i did not think we should need to have the source node numa topology in the LibvirtLiveMigrateData | 14:01 |
dansmith | artom: okay, well, I think that's a long walk for a short drink of water, and selective backports could break that assumption | 14:02 |
*** Luzi has quit IRC | 14:02 | |
artom | dansmith, you're saying we could backport something that would stop loading instance.numa_topology in _migrate_live()? | 14:03 |
artom | That would break a whole lot of things | 14:03 |
dansmith | or backport this without that, or backport them both but at different times such that it's no longer a useful sentinel | 14:03 |
artom | NUMA LM isn't backportable... | 14:03 |
sean-k-mooney | we can't backport object changes though | 14:03 |
mriedem | what are you talking about? why can't the API pre-load the instance.numa_topology field? | 14:04 |
dansmith | I'm just saying, tie the flag to the actual support, not to some other side effect you think is recent enough | 14:04 |
*** mrjk has joined #openstack-nova | 14:04 | |
artom | dansmith, oh, sorry, we're talking about different things. I'll definitely replace numa_topology in migrate_data with a boolean sentinel | 14:04 |
mriedem | is there just some random, "if 'numa_topology' in instance: <make numa lm work>" logic? | 14:04 |
sean-k-mooney | mriedem: we are talking about whether https://review.opendev.org/#/c/634827/50/nova/objects/migrate_data.py@247 is required or whether we can just make it a bool | 14:04 |
dansmith | mriedem: artom is saying the api is pre-loading it, and then he's checking somewhere to see if it's pre-loaded and determining whether or not the source is new enough to do the numa lm | 14:04 |
stephenfin | Wow, 'NUMAServersWithNetworksTest.test_cold_migrate_with_physnet' takes 12 seconds to run. That's a long-ass functional test | 14:04 |
*** liuyulong has joined #openstack-nova | 14:04 | |
artom | I was saying the destination will have the instance object with numa_topology loaded already | 14:05 |
sean-k-mooney | can't we just check the compute node version in the conductor | 14:05 |
artom | sean-k-mooney, dansmith was saying there's caching that makes this fragile, IIRC | 14:05 |
dansmith | uh what? | 14:06 |
sean-k-mooney | we do it for sriov live migration | 14:06 |
artom | dansmith, there's no way I'll find the exact gerrit review now, but I definitely remember you being against checking individual compute hosts' service versions for this | 14:06 |
dansmith | artom: we do it for other things, but there was probably some other wrinkle | 14:07 |
dansmith | but | 14:07 |
dansmith | I'm not arguing for that here | 14:07 |
dansmith | I'm saying the thing you're using does not make any sense | 14:07 |
mriedem | piling on | 14:07 |
artom | dansmith, and I agree :) It's a good point, I'll change it to a boolean sentinel, and the dest can pull numa_topology from the instance object it already has | 14:08 |
mriedem | can't the dest check the source compute service version like we did for file-backed memory live migration? https://github.com/openstack/nova/blob/18.0.0/nova/virt/libvirt/driver.py#L6612 | 14:08 |
mriedem | do we need the boolean sentinel at all? | 14:08 |
sean-k-mooney | artom: we do this https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L35-L58 | 14:08 |
dansmith | mriedem: we have a boolean for file-backed in that same object | 14:08 |
artom | mriedem, we need *something* for both computes to agree that they can do NUMA-LM | 14:08 |
sean-k-mooney | i would emulate https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L203-L237 | 14:09 |
mriedem | so we do https://github.com/openstack/nova/blob/18.0.0/nova/objects/migrate_data.py#L237 | 14:09 |
mriedem | which is used when generating the xml on the source to pass to the dest https://github.com/openstack/nova/blob/46a3bcd80b41e99ec4923c7cf3d0f8dd8505e97c/nova/virt/libvirt/migration.py#L269 | 14:10 |
bauzas | just catching the discussion, but why can't we assume checking the RPC versions ? | 14:12 |
artom | Thinking about it, I believe the service version check wasn't accepted because we still want to allow operators to shoot themselves in the foot, so to speak | 14:12 |
bauzas | like we do for a couple of RPC intercommunication ? | 14:12 |
dansmith | bauzas: you can't see the client version from the server side in rpc | 14:13 |
sean-k-mooney | artom: we can still do it but not make it fail | 14:13 |
artom | So we don't just block the LM entirely if one of the hosts is too old | 14:13 |
dansmith | there's still no reason to even be talking about this | 14:13 |
dansmith | all we need is to pass a boolean and we're done | 14:13 |
*** kashyap has quit IRC | 14:13 | |
artom | Yeah, I'm with dansmith | 14:13 |
artom | Seems way more elegant | 14:13 |
bauzas | cool, we did that with scheduler, IIRC | 14:13 |
dansmith | conductor will strip it out if you're talking to an old compute automatically | 14:13 |
bauzas | we had a sentinel | 14:13 |
sean-k-mooney | well we can set the bool in the conductor | 14:14 |
dansmith | no | 14:14 |
dansmith | it's migrate_data, set it in the compute | 14:14 |
dansmith | again, the only reason we're even talking about this, is artom was sending a whole topology object instead of saying "yes". | 14:15 |
sean-k-mooney | you could. we need to tell the dest if the source supports the feature | 14:15 |
sean-k-mooney | does that line up with the calls | 14:15 |
dansmith | he's already sending it from the compute at the right time | 14:15 |
dansmith | he's just saying too much. much like this discussion. | 14:15 |
sean-k-mooney | then ok | 14:15 |
artom | sean-k-mooney, I'll be blunt and pull rank (though not my own): in case two things work the same dansmith's opinion is worth more than yours :P And I say that with massive admiration for the amount of information you keep in your head | 14:15 |
sean-k-mooney | artom: fair. it's annoying not to have these checks in the same place in the conductor with the other migration checks | 14:16 |
sean-k-mooney | but i agree we should just have a bool flag | 14:16 |
dansmith | no, this isn't a rank thing, it's just a simplicity thing... | 14:16 |
sean-k-mooney | so just update your current code | 14:16 |
artom | dansmith, a not very convincing thing, apparently :) | 14:17 |
dansmith | migratedata is created after we're done talking to conductor for the last time, AFAIK | 14:17 |
dansmith | so I'm not sure how conductor would ever set it for us | 14:17 |
artom | Yeah, it's created by the source compute, IIRC | 14:17 |
artom | The source driver, in fact | 14:17 |
dansmith | hence why this flag should be set by the computes | 14:17 |
dansmith | it also depends on the config of each compute, | 14:18 |
dansmith | which you can't inspect from conductor | 14:18 |
dansmith | if the two computes involved are pinned to 5.0, then this can't happen even if they both support it | 14:18 |
artom | https://github.com/openstack/nova/blob/18.0.0/nova/compute/manager.py#L6056-L6089 | 14:18 |
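(Editor's aside.) The "just pass a boolean" idea discussed above can be sketched roughly as follows. This is a minimal illustration in plain Python, not nova's actual object code: the `MigrateData` class, the `supports_numa_lm` field name, and the version numbers are all hypothetical. It assumes the sender sets a flag, that serializing for an older peer drops fields that peer does not know about, and that the receiver treats a missing flag as "not supported".

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical object version in which the flag was introduced.
FLAG_ADDED_IN = (1, 10)


@dataclass
class MigrateData:
    """Hypothetical stand-in for a migrate_data object (not nova's class)."""
    is_shared_block_storage: bool = False
    # None means "never set"; True/False is the sender's explicit answer.
    supports_numa_lm: Optional[bool] = None

    def to_primitive(self, target_version):
        """Serialize, dropping fields the target version does not know."""
        data = {k: v for k, v in asdict(self).items() if v is not None}
        if target_version < FLAG_ADDED_IN:
            data.pop("supports_numa_lm", None)
        return data


def peer_supports_numa_lm(primitive):
    """A missing flag is treated the same as an explicit False."""
    return bool(primitive.get("supports_numa_lm", False))


sent = MigrateData(supports_numa_lm=True)
new_peer_view = sent.to_primitive((1, 10))
old_peer_view = sent.to_primitive((1, 9))
```

With this shape, both sides only proceed with the new behaviour when the flag survives the round trip, which ties the decision to actual support rather than to whether some other attribute happens to be loaded.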
stephenfin | gibi: calling 'self._delete_server' at the end of the test did the trick. I don't know why. Missing mocks or something like that. Good enough for me though | 14:19 |
artom | I need to get one kid to daycare, after that it's hax time | 14:19 |
artom | mriedem, you OK with squashing, as per dansmith's -1 here: https://review.opendev.org/#/c/635229/48? | 14:20 |
sean-k-mooney | ok i was suggesting passing in the bool here https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L318-L322 to check_can_live_migrate_destination but i'm fine with what dan suggests too | 14:21 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta https://review.opendev.org/671801 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Fold in argument to '_update_provider_tree_for_vgpu' https://review.opendev.org/676729 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add reshaper for PCPU https://review.opendev.org/674895 | 14:23 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Simplify 'fakelibvirt.HostInfo' object https://review.opendev.org/678861 | 14:23 |
* stephenfin declares cpu-resources complete. | 14:23 | |
mriedem | artom: my head is buried in something else atm so you're going to have to wait | 14:23 |
artom | mriedem, understood | 14:24 |
stephenfin | dansmith: If you're bored, I went and wrote that online migration to allow us to drop the Liberty-era NUMA topology blobs https://review.opendev.org/#/c/537414 I don't want to brag, but I think it's perfect | 14:25 |
dansmith | stephenfin: you know that if you say that, I double-fist red pens right? :D | 14:27 |
* sean-k-mooney clicks and prepares to run screaming out of the room | 14:27 | |
stephenfin | dansmith: I was counting on it ;) | 14:27 |
dansmith | heh | 14:27 |
gibi | stephenfin: I'm happy that you solved it. I have no idea what was the problem without digging | 14:29 |
sean-k-mooney | stephenfin: pet peeve but i find the use of ~ for negation in filters really hard to spot | 14:30 |
sean-k-mooney | it took me 3 times to notice it was "not contains" given how far apart they are | 14:30 |
dansmith | artom: did you see my question about the multinode test? | 14:31 |
dansmith | artom: earlier in here | 14:31 |
*** udesale has quit IRC | 14:32 | |
artom | dansmith, yeah... I'd have to think about it some more, but I don't think it's possible to test it in the current gate | 14:33 |
artom | Or if it is, it'd be a very small part | 14:33 |
dansmith | stephenfin: you just added an online migration to the patch with the removal as before right? | 14:33 |
dansmith | artom: if I had a guest that just requested one numa node, wouldn't that boot in our current gate workers? | 14:33 |
artom | I *think* I'd rather concentrate on getting whitebox under openstack-qa and getting multi-NUMA node flavors from Fort Nebula | 14:33 |
stephenfin | dansmith: Yeah. That's what I should do, right? | 14:33 |
artom | dansmith, it might, but then what? | 14:34 |
dansmith | stephenfin: it doesn't change the fact that you can't remove that stuff in the same release | 14:34 |
dansmith | artom: live migrate such an instance between workers and that will exercise all of this code, even if it's a boring example right? | 14:34 |
*** spsurya has joined #openstack-nova | 14:34 | |
stephenfin | Oh, because it would have to be a blocker migration to do that? | 14:35 |
dansmith | stephenfin: online migrations run after you've already rolled out new code | 14:35 |
*** dpawlik has quit IRC | 14:35 | |
dansmith | stephenfin: you can't have a blocker and a removal in the same release either, but I was saying a blocker migration is likely too expensive here because of what it has to check, so we should do it with time and warnings | 14:35 |
dansmith | stephenfin: i.e. land an online migration now, and a nova-status check, wait a release or two, then remove the compat code after everyone has had a chance to patch up all their data | 14:36 |
dansmith | stephenfin: a blocker migration helps make sure they clean up before they move, but again, too expensive here I think | 14:36 |
stephenfin | Damn. So how long does that have to hang around for once I've the migration in-place? | 14:36 |
stephenfin | mkay | 14:36 |
dansmith | stephenfin: well, at least a cycle, but with no blocker, it'd be nice to leave it longer | 14:36 |
* stephenfin respins | 14:37 | |
dansmith | maintaining compat sucks | 14:37 |
stephenfin | I'll probably pull that out of the series too, if I can | 14:37 |
dansmith | it's a good reason to try to get stuff right the first time | 14:37 |
dansmith | yeah, good idea | 14:37 |
stephenfin | true dat | 14:37 |
artom | dansmith, it'll exercise the code, but assuming the workers have identical NUMA topologies, we'll have no way of knowing | 14:38 |
artom | Because in those cases, current live migration "works" | 14:38 |
stephenfin | sean-k-mooney: fyi, six doesn't have a wrapper for the collections -> collections.abc thing | 14:39 |
*** priteau has quit IRC | 14:39 | |
stephenfin | there's an open bug but no patch for it | 14:39 |
artom | Both hosts being identical means we don't need to update the XML or claim any "new" resources | 14:39 |
sean-k-mooney | stephenfin: right but it does for is this python 3 or not | 14:39 |
sean-k-mooney | stephenfin: but ya i need to still fix that actually so i'll respin shortly | 14:40 |
stephenfin | Oh, right, gotcha | 14:40 |
stephenfin | Could you spin out 'flatten_iterable' into a separate patch too, in that case? | 14:40 |
stephenfin | If you have uses for it elsewhere | 14:40 |
dansmith | artom: no way of knowing what? if we see the debugs that show that you're doing the stuff, we'll know it's at least running your code, and not breaking something simple | 14:40 |
sean-k-mooney | i had thought you were using sum in a few places to flatten lists which was the other usecase i wanted to address with it | 14:40 |
dansmith | artom: even if it generates the same xml on the other side, it's still doing all that work, even if for just a boring topo | 14:41 |
stephenfin | Yeah, I still don't think it's necessary when the sum thing is a well known pattern, but if you're going that way then it's definitely a patch in its own right | 14:41 |
sean-k-mooney | but i cant find them anymore. did you get rid of them recently | 14:41 |
dansmith | artom: seems like a worthwhile thing to have when a reviewer feels uneasy | 14:41 |
sean-k-mooney | and yes ill break it out | 14:41 |
bauzas | stephenfin: are you about to respin ? | 14:41 |
stephenfin | oh, no other uses? | 14:41 |
stephenfin | Hmm | 14:41 |
bauzas | I was on https://review.opendev.org/#/c/674894/11//COMMIT_MSG | 14:41 |
dansmith | artom: and I would think it would be pretty much just tweaking the flavor we create for a regular live migration job | 14:41 |
stephenfin | do we _really_ need it then? | 14:41 |
sean-k-mooney | yes | 14:41 |
* stephenfin is amazed Python doesn't have something like this in stdlib | 14:41 | |
sean-k-mooney | the sum pattern is not a common idiom | 14:42 |
sean-k-mooney | you are the only person i know of that has suggested it | 14:42 |
sean-k-mooney | so i would like to have a function that names that algorithm | 14:42 |
sean-k-mooney | so we can reuse it and not have to rely on tribal knowledge | 14:42 |
stephenfin | could we use itertools.chain? | 14:43 |
stephenfin | that does this same thing, right? | 14:43 |
sean-k-mooney | its close but it does not do the right things for maps | 14:43 |
stephenfin | no? | 14:43 |
sean-k-mooney | it also does not flatten a list | 14:43 |
sean-k-mooney | *list of list | 14:44 |
sean-k-mooney | i could use it internally | 14:44 |
*** lpetrut has quit IRC | 14:44 | |
stephenfin | you can use '*' for that | 14:44 |
stephenfin | I mean, you've one caller | 14:44 |
stephenfin | This is so overengineered :D | 14:44 |
bauzas | sean-k-mooney: I honestly feel flattening an iterable of iterables is just a common pattern that doesn't really need to have a general method | 14:44 |
sean-k-mooney | but based on your feedback to not use the functional part of the standard library like reduce i wrote it as a generator | 14:44 |
stephenfin | Yeah, I prefer reduce to this /o\ | 14:45 |
bauzas | sean-k-mooney: it's more like, you have so many ways to do what you want, pick one wisely | 14:45 |
stephenfin | I mean, I'm not in love with it, but 2 lines >> ~50 lines | 14:45 |
sean-k-mooney | ok i am going to take a break from irc now before i say something i should not | 14:45 |
bauzas | sean-k-mooney: like, returning a generator if you pass a couple of lists isn't really helpful, right? | 14:46 |
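(Editor's aside.) For reference, the flattening options named in this exchange (`sum`, `reduce`, and `itertools.chain`) can be compared side by side. The `model` dict below is made-up sample data, assumed only to be a dict whose values are lists; it is not taken from the patch under review.

```python
import itertools
import operator
from functools import reduce

# Hypothetical sample data: a dict whose values are lists.
model = {"disk": ["virtio", "scsi"], "cdrom": ["ide", "sata"]}

# itertools.chain with argument unpacking; chain() returns a lazy iterator,
# so list() is needed when an actual list is wanted.
flat_chain = list(itertools.chain(*model.values()))

# Same result without unpacking the outer iterable into arguments.
flat_from_iterable = list(itertools.chain.from_iterable(model.values()))

# The "sum" idiom: repeated list concatenation starting from [].
flat_sum = sum(model.values(), [])

# The functools.reduce equivalent of the sum idiom.
flat_reduce = reduce(operator.add, model.values(), [])
```

All four flatten exactly one level of nesting. The `sum` and `reduce` forms build a new intermediate list at every step, while the `chain` forms iterate lazily, which is why they need the `list()` wrapper.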
*** shilpasd has joined #openstack-nova | 14:47 | |
mriedem | i think at this point i might just take this over to move it along https://review.opendev.org/#/c/663851/42 | 14:56 |
mriedem | alex_xu: gmann: efried: ^ any problem with me doing that? | 14:56 |
efried | I haven't been following it | 14:57 |
mriedem | it's close, in a runway for a couple more days, and there are at least 3 other changes open for review that are conflicting for the same microversion | 14:57 |
efried | you planning to +2 it after you "move it along"? | 14:57 |
mriedem | efried: i know, just saying from a PM standpoint | 14:57 |
mriedem | that's the question - my changes would be docs/test really | 14:57 |
mriedem | but alex is the only other core that's really gone through it much | 14:58 |
mriedem | so yeah i'd still want to be able to +2 | 14:58 |
efried | If your changes are docs/test, then I'm good with it. | 14:58 |
*** david-lyle has quit IRC | 15:01 | |
*** david-lyle has joined #openstack-nova | 15:01 | |
*** sridharg has quit IRC | 15:02 | |
*** mtanino has joined #openstack-nova | 15:10 | |
*** tbachman has quit IRC | 15:12 | |
*** jmlowe has quit IRC | 15:13 | |
artom | dansmith, true. I don't mind doing something like that as a follow-up, as I think func tests that test more scenarios are more important to get done first | 15:13 |
dansmith | artom: they're important for sure, they just don't give me all that much confidence | 15:13 |
artom | dansmith, honest question, why? | 15:13 |
artom | We have no way to be sure real libvirt would accept the XML we generate? | 15:14 |
*** jmlowe has joined #openstack-nova | 15:14 | |
*** mkrai has quit IRC | 15:14 | |
*** damien_r has quit IRC | 15:14 | |
dansmith | artom: because every other thing we've landed like this has been fine in the contrived tests, but not actually work in the real world | 15:15 |
artom | dansmith, heh, fair | 15:15 |
dansmith | like because we generate bad libvirt xml, or were digesting not-real-world host xml, etc | 15:15 |
dansmith | numa itself was DoA, IIRC, and pci wasn't actually usable for a long time, etc | 15:15 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Make _rever_allocation nested allocation aware https://review.opendev.org/676138 | 15:16 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Support reverting migration / resize with bandwidth https://review.opendev.org/676140 | 15:16 |
artom | dansmith, would the current combination of manual tests + func tests be enough initially? | 15:17 |
artom | ... I guess it doesn't address your point, because all previous things you mention were exactly in that situation | 15:18 |
dansmith | artom: you mean, similar to the manual + func tests we had for things we landed borken in the past? | 15:18 |
*** itlinux has quit IRC | 15:18 | |
dansmith | artom: it just seems like this should be relatively simple to try, by altering the flavor on a LM job, while you're waiting for review or something | 15:18 |
dansmith | mriedem: how hard is it to float a new job or changed job which does one of our multinode LM tests, but with a different flavor that specifies a simple numa topo? | 15:18 |
mriedem | dansmith: the hard part there is likely just hacking the flavor, | 15:20 |
mriedem | but we could probably do that within the script that runs the live migration tests themselves after configuring tempest.conf to use the new flavor | 15:20 |
dansmith | yeah, I was thinking that would be doable | 15:20 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Func test for migrate re-schedule with bandwidth https://review.opendev.org/676972 | 15:20 |
mriedem | https://github.com/openstack/nova/blob/master/nova/tests/live_migration/hooks/run_tests.sh | 15:20 |
mriedem | dansmith: in there ^ | 15:21 |
dansmith | not necessarily landable, but just hacked into place so we can see it run | 15:21 |
dansmith | artom: ^ | 15:21 |
mriedem | we should be able to create a new flavor and then re-configure tempest (on the controller host) to use that for the flavor before running the tests | 15:21 |
artom | dansmith, yeah, I was looking at that as well | 15:21 |
mriedem | you can see we already configure tempest from the script using ansible | 15:22 |
mriedem | $ANSIBLE primary --become -f 5 -i "$WORKSPACE/inventory" -m ini_file -a "dest=$BASE/new/tempest/etc/tempest.conf | 15:22 |
mriedem | so yeah, just create a new flavor and re-configure tempest with the id | 15:22 |
mriedem | this one https://github.com/openstack/tempest/blob/master/tempest/config.py#L285 | 15:23 |
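(Editor's aside.) The "re-configure tempest with the id" step amounts to rewriting one INI option. A rough Python equivalent of what the `ini_file` call does is sketched below, assuming the option in question is `flavor_ref` in the `[compute]` section of `tempest.conf`. The helper name and the flavor id are made up for illustration, and the job itself uses Ansible rather than this code.

```python
import configparser
import io


def set_flavor_ref(conf_text, flavor_id):
    """Return conf_text with [compute]/flavor_ref set to flavor_id.

    Hypothetical helper; assumes an INI-style tempest.conf.
    """
    # Disable interpolation so '%' characters in existing values are kept.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("compute"):
        parser.add_section("compute")
    parser.set("compute", "flavor_ref", flavor_id)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Made-up flavor id standing in for a newly created NUMA flavor.
updated = set_flavor_ref("[compute]\nflavor_ref = 1\n", "numa-flavor-id")
```

Creating the flavor itself (with a simple NUMA extra spec) would still have to happen before this step; that part is not shown here.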
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Support migrating SRIOV port with bandwidth https://review.opendev.org/676980 | 15:23 |
*** ccamacho has quit IRC | 15:23 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Allow migrating server with port resource request https://review.opendev.org/671497 | 15:25 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Do not query allocations twice in finish_revert_resize https://review.opendev.org/678827 | 15:25 |
artom | mriedem, ack, appreciated :) | 15:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: objects: Add online migration for legacy NUMA objects https://review.opendev.org/537414 | 15:27 |
dansmith | artom: sean-k-mooney wrote these functional tests didn't he? | 15:27 |
artom | dansmith, no... You're implying there's loads of typos? | 15:27 |
dansmith | hehe, yes | 15:27 |
dansmith | you must be hanging around him too much then | 15:28 |
sean-k-mooney | :) | 15:28 |
artom | I suck up to the best :D | 15:29 |
sean-k-mooney | eventrully my influnce will spread to all in nova | 15:29 |
*** dpawlik has joined #openstack-nova | 15:29 | |
dansmith | over my dead body | 15:29 |
artom | That can be arranged | 15:29 |
dansmith | believe me, I know | 15:31 |
artom | o_O | 15:31 |
artom | Did you upset the mafia or something | 15:31 |
sean-k-mooney | bauzas: stephenfin i'm just running the test locally but i have dropped the flatten_iterable function and replaced it with itertools.chain(*model.values()) as stephen suggested. it does more or less the same thing so it will do for my usecase | 15:31 |
dansmith | artom: I assume you have russian mob connections | 15:31 |
artom | dansmith, I like to pretend I do :) | 15:32 |
bauzas | sean-k-mooney: honestly, I don't want to overthink this :p | 15:32 |
bauzas | itertools.chain() is cool with me | 15:32 |
sean-k-mooney | neither do i but i want something that will merge without me having to keep rewriting it | 15:32 |
bauzas | sean-k-mooney: because of the unpredictability of the dict ordering ? | 15:33 |
sean-k-mooney | ok | 15:33 |
stephenfin | bauzas: RE: https://review.opendev.org/#/c/674894/11/, will I just generate a stub 'RequestSpec' for the 'resources_from_flavor' caller? | 15:33 |
*** dpawlik has quit IRC | 15:33 | |
stephenfin | as in to address https://review.opendev.org/#/c/674894/11/nova/scheduler/utils.py@393 | 15:33 |
bauzas | sean-k-mooney: there are ways to predict a dict ordering but whatever | 15:33 |
sean-k-mooney | no this is the 4th way i have done this | 15:33 |
sean-k-mooney | in those patches | 15:33 |
bauzas | stephenfin: yeah, I think it would be fine | 15:33 |
stephenfin | (y) | 15:34 |
* stephenfin respins | 15:34 | |
sean-k-mooney | bauzas: it has nothing to do with dict ordering | 15:34 |
sean-k-mooney | that was just a side effect of how i wrote the tests | 15:34 |
bauzas | stephenfin: I was cool with (flavor, image) as parameters | 15:34 |
bauzas | stephenfin: but is_bfv really did hurt my guts :p | 15:34 |
bauzas | sean-k-mooney: ok | 15:34 |
stephenfin | understandable :) | 15:35 |
openstackgerrit | Merged openstack/nova stable/stein: lxc: make use of filter python3 compatible https://review.opendev.org/676496 | 15:37 |
openstackgerrit | Merged openstack/nova stable/stein: Add an issue releasenote for placement eventlet stall https://review.opendev.org/676973 | 15:37 |
*** gyee has joined #openstack-nova | 15:37 | |
*** tbachman has joined #openstack-nova | 15:38 | |
*** mkrai has joined #openstack-nova | 15:42 | |
*** macz has joined #openstack-nova | 15:42 | |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: DNM: Run LM integration tests with NUMA flavor https://review.opendev.org/678887 | 15:45 |
artom | I'm pretty sure ^^ is entirely broken, but it's a first step | 15:46 |
*** jangutter has quit IRC | 15:46 | |
* artom goes back to Nova code | 15:46 | |
dansmith | artom: seems plausible | 15:48 |
sean-k-mooney | artom: that will fail because we don't have nested virt or a new enough qemu to enable the mttcg backend | 15:50 |
artom | sean-k-mooney, I thought vexxhost enabled that? | 15:50 |
sean-k-mooney | they do | 15:50 |
sean-k-mooney | did you change the nodeset to the vexxhost one | 15:50 |
*** brault has quit IRC | 15:50 | |
artom | ... no :( | 15:51 |
sean-k-mooney | it could land on vexxhost still | 15:51 |
sean-k-mooney | and pass | 15:51 |
sean-k-mooney | actually it will still fail | 15:51 |
sean-k-mooney | we disable kvm in the gate too | 15:51 |
sean-k-mooney | you would have to change the node set and then set the virt type to kvm | 15:51 |
sean-k-mooney | then it should work | 15:51 |
*** brinzhang has quit IRC | 15:52 | |
*** brinzhang has joined #openstack-nova | 15:52 | |
dansmith | sean-k-mooney: why will it fail without kvm? is it because the host will have no numa info so we'll fail to schedule anything? | 15:53 |
*** gbarros has quit IRC | 15:53 | |
sean-k-mooney | no it fails because when we set hw:numa_nodes=1 we pin the guest cpus to float over just 1 host numa node | 15:54 |
sean-k-mooney | we do that using the vcpupin element instead of the cpuset attribute on the vcpu element | 15:54 |
sean-k-mooney | the vcpupin element which does per core pinning is only supported with kvm or the mttcg backend | 15:55 |
dansmith | ah okay | 15:55 |
sean-k-mooney | the normal tcg backend that qemu uses only supports pinning via the <vcpu cpuset=""> method | 15:56 |
sean-k-mooney | so in principle there is no reason it can't work, it's just the way we generate the xml that breaks it | 15:56 |
sean-k-mooney | or use a newer qemu/libvirt | 15:57 |
*** shilpasd has quit IRC | 15:57 | |
sean-k-mooney | i believe the one in fedora 31 will be new enough by default | 15:57 |
sean-k-mooney | f29 and i think f30 still need the virt preview repo enabled | 15:57 |
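(Editor's aside.) The two pinning styles contrasted above correspond to two different pieces of libvirt domain XML: a `cpuset` attribute on the `<vcpu>` element, versus per-vCPU `<vcpupin>` elements under `<cputune>`. The sketch below builds both fragments with Python's standard library. The helper names and the CPU range are invented for illustration, and this is not how nova's libvirt driver generates its XML.

```python
import xml.etree.ElementTree as ET


def vcpu_with_cpuset(vcpus, host_cpus):
    """Whole-guest form: <vcpu cpuset="...">N</vcpu>."""
    el = ET.Element("vcpu", cpuset=host_cpus)
    el.text = str(vcpus)
    return ET.tostring(el, encoding="unicode")


def cputune_with_vcpupin(vcpus, host_cpus):
    """Per-vCPU form: one <vcpupin> child of <cputune> per guest CPU."""
    cputune = ET.Element("cputune")
    for i in range(vcpus):
        ET.SubElement(cputune, "vcpupin", vcpu=str(i), cpuset=host_cpus)
    return ET.tostring(cputune, encoding="unicode")


# Hypothetical example: a 2-vCPU guest floating over host CPUs 0-3.
whole_guest_xml = vcpu_with_cpuset(2, "0-3")
per_vcpu_xml = cputune_with_vcpupin(2, "0-3")
```

In both cases every guest CPU may float over the same set of host CPUs; the difference is only in which XML element expresses it, which is the distinction that matters for the backend support described above.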
dansmith | sean-k-mooney: maybe you could fix up that patch for artom to do whatever it is that needs doing? | 15:58 |
sean-k-mooney | i can try yes | 15:59 |
sean-k-mooney | i also need to fix https://review.opendev.org/#/c/652197/ | 15:59 |
sean-k-mooney | that is my upstream nfv job | 15:59 |
sean-k-mooney | it uses the really new qemu from the virt preview repo but devstack/zuul jobs move a file or change the permissions | 16:00 |
sean-k-mooney | so it's currently broken | 16:00 |
sean-k-mooney | you can see i was testing hugepage + numa + pinning in the gate | 16:00 |
sean-k-mooney | https://review.opendev.org/#/c/652197/18/playbooks/nfv/nfv.yaml | 16:00 |
sean-k-mooney | if i can get that working again we can test artom's live migration stuff in the gate and stephenfin's cpu pinning spec | 16:01 |
*** mtanino has quit IRC | 16:02 | |
sean-k-mooney | The destination directory (/opt/stack/devstack) is not writable by the current user. Error was: [Errno 13] Permission denied | 16:03 |
sean-k-mooney | artom: is it /opt/stack/new/devstack? | 16:03 |
artom | sean-k-mooney, at this point, whatever's quicker/more demonstrative | 16:05 |
sean-k-mooney | i think if i just add "become: yes" to my patch it might work | 16:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: scheduler: Flatten 'ResourceRequest.from_extra_specs', 'from_image_props' https://review.opendev.org/674894 | 16:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'hw:cpu_policy', 'hw:mem_page_size' extra specs from API samples https://review.opendev.org/675338 | 16:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Start reporting PCPU inventory to placement https://review.opendev.org/671793 | 16:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: '_get_(v|p)cpu_total' to '_get_(v|p)cpu_available' https://review.opendev.org/672693 | 16:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Rewrap definitions of 'NUMACell' https://review.opendev.org/674395 | 16:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: hardware: Differentiate between shared and dedicated CPUs https://review.opendev.org/671800 | 16:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: objects: Rename 'fields' import to 'obj_fields' https://review.opendev.org/674103 | 16:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Start reporting 'HW_CPU_HYPERTHREADING' trait https://review.opendev.org/675571 | 16:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Simplify 'fakelibvirt.HostInfo' object https://review.opendev.org/678861 | 16:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta https://review.opendev.org/671801 | 16:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Fold in argument to '_update_provider_tree_for_vgpu' https://review.opendev.org/676729 | 16:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add reshaper for PCPU https://review.opendev.org/674895 | 16:05 |
sean-k-mooney | ill do that and then look at yours | 16:05 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: DNM: Run LM integration tests with NUMA flavor https://review.opendev.org/678887 | 16:06 |
openstackgerrit | sean mooney proposed openstack/nova master: Libvirt: report storage bus traits https://review.opendev.org/666914 | 16:08 |
openstackgerrit | sean mooney proposed openstack/nova master: libvirt: use domain capabilities to get supported device models https://review.opendev.org/666915 | 16:08 |
openstackgerrit | sean mooney proposed openstack/nova master: Add transform_image_metadata request filter https://review.opendev.org/665775 | 16:08 |
mriedem | sean-k-mooney: we can override the virt_type in local.conf for the job to use kvm | 16:09 |
mriedem | since this is a hack test patch anyway | 16:10 |
sean-k-mooney | yes we can | 16:10 |
*** jawad_axd has joined #openstack-nova | 16:10 | |
sean-k-mooney | artom: i think the nodeset you want is ubuntu-bionic-vexxhost but i need to double check | 16:11 |
artom | sean-k-mooney, yeah, I'm going in completely blind and relying on the gate to tell me what's wrong | 16:11 |
artom | So that's entirely possible | 16:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: DNM: Run LM integration tests with NUMA flavor https://review.opendev.org/678887 | 16:11 |
sean-k-mooney | you use it like this | 16:11 |
sean-k-mooney | https://opendev.org/openstack/magnum/src/branch/master/.zuul.yaml#L152-L158 | 16:11 |
artom | mriedem, *tips fedora* your help is appreciated | 16:12 |
sean-k-mooney | do you want to test master or your change? | 16:13 |
openstackgerrit | sean mooney proposed openstack/nova master: Libvirt: add nfv job https://review.opendev.org/652197 | 16:17 |
sean-k-mooney | i think ^ will fix my nfv job | 16:17 |
*** mkrai has quit IRC | 16:19 | |
sean-k-mooney | the multi node version of that enables live migration, although i might need to add the flag to allow that | 16:19 |
*** ivve has quit IRC | 16:22 | |
*** cdent has quit IRC | 16:22 | |
*** david-lyle is now known as dklyle | 16:23 | |
openstackgerrit | sean mooney proposed openstack/nova master: Libvirt: add nfv job https://review.opendev.org/652197 | 16:25 |
*** gbarros has joined #openstack-nova | 16:32 | |
*** slaweq has joined #openstack-nova | 16:33 | |
*** liuyulong has quit IRC | 16:33 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta https://review.opendev.org/671801 | 16:36 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Fold in argument to '_update_provider_tree_for_vgpu' https://review.opendev.org/676729 | 16:36 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add reshaper for PCPU https://review.opendev.org/674895 | 16:36 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Start checking compute usage in functional tests https://review.opendev.org/678902 | 16:36 |
stephenfin | bauzas: fwiw, I think I've everything in https://review.opendev.org/674894 resolved if you wanted to take another look | 16:37 |
stephenfin | I, however, am out of here o/ | 16:37 |
openstackgerrit | sean mooney proposed openstack/nova master: DNM: Run LM integration tests with NUMA flavor https://review.opendev.org/678887 | 16:39 |
sean-k-mooney | artom: mriedem dansmith i think ^ will fix it | 16:39 |
*** dtantsur is now known as dtantsur|afk | 16:42 | |
*** amotoki is now known as amotoki_ | 16:42 | |
efried | stephenfin: you still around? | 16:45 |
*** mdbooth_ has quit IRC | 16:46 | |
efried | I've been in meetings for 2h wanting to talk to you about this, but afraid I may have missed my window :( | 16:46 |
artom | sean-k-mooney, ack, much thanks! | 16:50 |
sean-k-mooney | we should have 3 jobs trying to test numa with the gate as we speak | 16:50 |
sean-k-mooney | the single and multinode jobs i wrote are running, and yours | 16:51 |
sean-k-mooney | i assume i should be able to go test the latest version of your code again manually? | 16:51 |
*** psachin has joined #openstack-nova | 16:53 | |
sean-k-mooney | oh i just rediscovered something quite useful | 16:56 |
sean-k-mooney | the experimental queue is way faster than check. i kind of knew other pipelines are faster but i forgot | 16:57 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Specify availability_zone to unshelve https://review.opendev.org/663851 | 16:57 |
*** dougsz has quit IRC | 16:59 | |
*** derekh has quit IRC | 17:02 | |
mriedem | alex_xu: for your tomorrow, please take another pass on ^ | 17:03 |
mriedem | so we can get 2.77 in and move on | 17:03 |
*** jawad_axd has quit IRC | 17:03 | |
dansmith | artom: it might be easier to discuss https://review.opendev.org/#/c/635669/39/nova/compute/resource_tracker.py here | 17:06 |
artom | dansmith, shoot :) | 17:06 |
dansmith | artom: my point is that you're assuming that if we're live migrating an instance, you can pass an empty PCIRequests object into the migration claim, because a live-migrating instance couldn't possibly have pci devices, right? | 17:07 |
artom | dansmith, no, because we don't want the claim to think it has any PCI devices | 17:07 |
dansmith | um | 17:08 |
dansmith | artom: why is that? | 17:08 |
artom | Because for a subset of PCI devices (SRIOV Neutron ports), that's handled by Sean's code | 17:08 |
artom | So we don't want to refuse based on something that should otherwise work | 17:09 |
artom | The missing piece here is that all other PCI devices *should* get refused | 17:09 |
dansmith | should get refused somewhere earlier in the stack right? | 17:09 |
artom | Yeah | 17:09 |
dansmith | that's the part I don't like | 17:09 |
artom | I agree - it's also current/latent | 17:10 |
artom | IIRC we just blow up if we try to migrate an instance with PCI alias-based passthrough | 17:10 |
sean-k-mooney | we check and fail the migration here https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L216-L221 | 17:11 |
dansmith | artom: it's not latent because it's the code you're adding | 17:11 |
sean-k-mooney | if the instance has a pcirequest that is not a neutron port we never call artoms code | 17:11 |
dansmith | sean-k-mooney: currently. | 17:11 |
artom | sean-k-mooney, aha, thanks for that - that confirms my idea that we should do claims *after* your MigrationPreCheck | 17:11 |
artom | So I'll have to change that | 17:12 |
sean-k-mooney | artom: https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L317-L326 | 17:12 |
sean-k-mooney | we do the pci check before calling check_can_live_migrate_destination | 17:12 |
sean-k-mooney | so we should do the claims in check_can_live_migrate_destination | 17:13 |
artom | Ah, sorry, missed the wider context in your link. OK, then we're good. | 17:13 |
dansmith | artom: so my concern is that you say "we should never get here" and throw up an empty PCIRequests object, which I think goes into the migration context, and eventually gets applied to the instance | 17:13 |
artom | dansmith, ah, indeed | 17:13 |
sean-k-mooney | dansmith: we get there for cold migration | 17:13 |
dansmith | artom: so if the conductor stuff changes in the future and we *do* get here, I worry that we could dump the PCIRequests from the instance when we apply | 17:13 |
dansmith | sean-k-mooney: I know, but cold migration is still handled.. that's not my problem.. my problem is that live migration is *begging* to become a silent data corruption problem in the future I think | 17:14 |
artom | dansmith, so for live migration _test_pci in the claim would have to be skipped entirely | 17:14 |
sean-k-mooney | we put the check in specifically to prevent issues related to pci devices | 17:14 |
sean-k-mooney | we would only remove it if we added support for it | 17:15 |
artom | ... while still keeping pci_requests? | 17:15 |
dansmith | artom: it looks to me like for cold, we get the requests and filter them based on their handled-by-neutron-ness or something | 17:15 |
sean-k-mooney | which i dont think we ever will as we cant safely hot unplug generic pci devices | 17:15 |
* artom double-checks how the migration context is applied/dropped for pci devices | 17:15 | |
dansmith | artom: blindly without checking | 17:16 |
sean-k-mooney | ok so why don't we just do the same check again | 17:16 |
dansmith | you guys are saying it's fine to make dangerous assumptions because "nobody will ever ask for X" but that's *always* a bad assumption | 17:16 |
*** jaosorior has quit IRC | 17:16 | |
dansmith | sean-k-mooney: that's my point, assert that the thing is the way you think it is, instead of just assuming you can ignore all of it because something upstream of you did the check | 17:16 |
artom | dansmith, so if we keep pci_requests, but skip _test_pci... | 17:16 |
dansmith | because in the future when that check is gone, different, mixed-versions, etc... | 17:16 |
sean-k-mooney | dansmith: no i'm not, i'm saying we specifically thought about this edge case in the spec and code and added a check to prevent the dangerous edge case for the sriov migration | 17:17 |
dansmith | artom: would it make sense to just make _test_pci check the neutron-ness (or whatever) and not freak out over those things? | 17:17 |
sean-k-mooney | but i'm totally fine with adding another check for the numa path | 17:17 |
sean-k-mooney | just copy paste | 17:17 |
sean-k-mooney | for pci_request in self.instance.pci_requests.requests: | 17:17 |
sean-k-mooney | if pci_request.source != objects.InstancePCIRequest.NEUTRON_PORT: | 17:17 |
artom | dansmith, so move the check that sean-k-mooney linked into the claim? | 17:17 |
sean-k-mooney | # allow only VIF related PCI requests in live migration. | 17:17 |
sean-k-mooney | raise exception.MigrationPreCheckError( | 17:17 |
sean-k-mooney | reason= "non-VIF related PCI requests for instance " | 17:17 |
sean-k-mooney | "are not allowed for live migration.") | 17:17 |
sean-k-mooney | no | 17:18 |
sean-k-mooney | we dont want to fail late | 17:18 |
sean-k-mooney | but just check again on the compute node | 17:18 |
dansmith | can we not do the same check in _test_pci, and ignore any that are vif-related, and fail if we find non-vif-related ones? | 17:18 |
*** jungleboyj has joined #openstack-nova | 17:18 | |
artom | dansmith, conversely, make apply_migration_context() smarter, and only set the new_ fields on the instance if they're set on the migration_context? | 17:19 |
dansmith | artom: no | 17:19 |
dansmith | artom: because that is deeply-buried silent ignoring of data you're trying to save, just because you only think that will happen from the migration path | 17:19 |
dansmith | this is all about making claims work for pci devices, so make it work, don't just "fix" the problem with lots of side effects everywhere else | 17:19 |
artom | dansmith, claims *already* work for PCI devices | 17:20 |
dansmith | sorry, making claims work during live migration | 17:20 |
dansmith | mind-o there | 17:20 |
artom | They're used when booting and cold migrating | 17:20 |
dansmith | yes yes I know | 17:20 |
artom | This is about having PCI live migration in different places | 17:20 |
artom | *PCI live migration logic in different | 17:21 |
artom | dansmith, so correct me if I'm wrong, but looking at https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L1004-L1007 | 17:21 |
artom | Seems apply_migration_context is already smart enough to only look at fields in migration_context that are actually set | 17:21 |
dansmith | what's your point? | 17:22 |
artom | So what's so bad about removing pci_requests from live migration claims entirely? | 17:22 |
artom | Well, I'm assuming that'll also remove them from the migration context | 17:22 |
dansmith | because that's obscure behavior to solve the actual problem, which is that you need to keep pci_requests, but only allow the claim to work if they're of the right type | 17:23 |
*** brault has joined #openstack-nova | 17:23 | |
dansmith | why just ignore them entirely and keep them out of the context as a special case when you can keep all the other code the same, and just not proceed if they're not valid for live migration? | 17:23 |
artom | I'd dispute that - pci_requests is handled outside the claim, so why does the claim need to work with them? | 17:23 |
sean-k-mooney | why do we not just check the migration type here https://review.opendev.org/#/c/635669/39/nova/compute/resource_tracker.py@307 and raise an exception if any pci devices are requested that are not related to neutron | 17:23 |
sean-k-mooney | does ^ that not solve the issue? | 17:24 |
dansmith | sean-k-mooney: YES | 17:24 |
dansmith | it's what I'm asking for | 17:24 |
sean-k-mooney | it should never raise but if it does it means we fucked up and removed the conductor check by mistake | 17:24 |
dansmith | or something changed | 17:24 |
sean-k-mooney | ya | 17:25 |
dansmith | if that were to happen with the current code, we would just blow away pci_requests on the instance | 17:25 |
sean-k-mooney | but it would catch such a change | 17:25 |
*** ivve has joined #openstack-nova | 17:25 | |
dansmith | right | 17:25 |
artom | sean-k-mooney, will the claim pass though? | 17:25 |
dansmith | we're already examining the requests for other migration types and doing things like this, we should do the same for live migration, but whatever different behavior we need | 17:25 |
artom | I'm worried about double-claiming | 17:25 |
sean-k-mooney | we should not try to claim them if it is a live migration | 17:26 |
sean-k-mooney | if it's not we should | 17:26 |
sean-k-mooney | but we should check instead of just nulling it out | 17:26 |
artom | sean-k-mooney, exactly, if this is a live migration, the claim cannot have any pci_requests, because your code handles all that | 17:26 |
*** ivve has quit IRC | 17:26 | |
sean-k-mooney | yes. | 17:26 |
*** slaweq has quit IRC | 17:27 | |
sean-k-mooney | we could make the other code use the move claims | 17:27 |
artom | You said you don't want to fail late | 17:27 |
sean-k-mooney | but we didn't want to do that since that code makes so many assumptions about it not being a live migration | 17:27 |
*** ivve has joined #openstack-nova | 17:27 | |
sean-k-mooney | i dont | 17:27 |
sean-k-mooney | i still want to keep the check we have in the conductor | 17:27 |
dansmith | me too | 17:27 |
sean-k-mooney | im just saying we check twice and dont care that we already checked | 17:28 |
*** slaweq has joined #openstack-nova | 17:28 | |
artom | OK, so say we keep that check, and add the same kind of check at https://review.opendev.org/#/c/635669/39/nova/compute/resource_tracker.py@307 | 17:28 |
artom | And we live migrate an instance with NUMA and a Neutron SRIOV port | 17:28 |
artom | Both of those checks will pass, right? | 17:28 |
sean-k-mooney | yes | 17:28 |
sean-k-mooney | and in that case you can set pci_requests=[] safely or we can rewrite the sriov stuff to get the device from the move claim | 17:29 |
*** psachin has quit IRC | 17:29 | |
*** dpawlik has joined #openstack-nova | 17:29 | |
*** tesseract has quit IRC | 17:30 | |
artom | Well we can't just set pci_requests=[], because as dansmith pointed out, when we apply that claim, we'll clobber the instance's pci requests with the empty one from the claim | 17:30 |
artom | So we have to keep the current logic | 17:30 |
artom | Except | 17:30 |
artom | Your code is claiming PCI devices here: https://review.opendev.org/#/c/634606/58/nova/compute/manager.py@6465 | 17:31 |
artom | And mine will run the MoveClaim either immediately before or after yours | 17:31 |
sean-k-mooney | we would only do it in the neutron case and we create new request in that case in my code anyway | 17:31 |
dansmith | why is it that this needs to be different at all between live and cold migration? | 17:31 |
artom | Now this may be ignorance on my part, but to me it would mean one of those claims will fail because the previous one will have used up the device | 17:31 |
artom | Or can fail, at least | 17:32 |
sean-k-mooney | mainly, when we reviewed the sriov spec it was decided that it did not make sense to make move claims work for live migration, but we decided to use them for numa | 17:32 |
* dansmith is confused and frustrated | 17:33 | |
*** slaweq has quit IRC | 17:33 | |
*** psachin has joined #openstack-nova | 17:33 | |
artom | dansmith, I get ya | 17:33 |
*** dpawlik has quit IRC | 17:34 | |
artom | I'm proposing the "workaround" of unsetting pci_requests from live migration claims to avoid any clobbering, and we can consolidate SRIOV and NUMA claims in a follow up? | 17:34 |
dansmith | -2 on that | 17:34 |
artom | So... straight up consolidation right away? | 17:35 |
dansmith | answer my question above about why this needs to be different | 17:35 |
sean-k-mooney | if we must, but i really don't like move claims | 17:35 |
dansmith | not sure if sean-k-mooney answered it or not, but I didn't understand the words | 17:36 |
sean-k-mooney | but it should not be too hard to make it work | 17:36 |
artom | dansmith, why live migration claims need to be different with respect to PCI devices? | 17:36 |
dansmith | sean-k-mooney: isn't the whole point of this work to integrate the move claims with this stuff so that it works? | 17:36 |
dansmith | artom: yes | 17:36 |
sean-k-mooney | we thought that was too much work to justify doing that for sriov | 17:36 |
sean-k-mooney | also if i recall jay was not a fan of that approach | 17:37 |
*** bbowen has quit IRC | 17:37 | |
dansmith | so the plan was just to re-use most of the same code paths but special-case out the live migration? | 17:37 |
sean-k-mooney | which is why we built on the multiple port binding feature and do not use port bindings | 17:37 |
artom | dansmith, because for the subset of pci_requests that's supported for live migration, claiming resources on the destination is already being handled in https://review.opendev.org/#/c/634606/58/nova/compute/manager.py@6465 | 17:38 |
sean-k-mooney | no, sriov migration just does not use move claims because live migration does not use them | 17:38 |
artom | sean-k-mooney, so you left me to do the gruntwork l) | 17:38 |
*** ralonsoh has quit IRC | 17:38 | |
artom | ;) | 17:38 |
sean-k-mooney | artom: honestly i was hoping you would not use move claims either but i kind of gave up on that | 17:39 |
artom | dansmith, and that code predates using claims for live migration at all | 17:39 |
*** bbowen has joined #openstack-nova | 17:39 | |
dansmith | I honestly don't understand this | 17:39 |
artom | dansmith, can you expand on what "this" is :) | 17:39 |
dansmith | you want to use move claims for numa but not pci and have a separate code path just for pci live migration? | 17:39 |
artom | dansmith, PCI predates NUMA | 17:40 |
dansmith | and? | 17:40 |
artom | And they decided not to use claims, presumably because the PCI tracker makes it easier to claim resources without going through actual MoveClaim objects | 17:40 |
sean-k-mooney | and both specs proposed different approaches to the same thing because different people reviewed them and had different preferences | 17:40 |
*** itlinux has joined #openstack-nova | 17:40 | |
dansmith | you seem to be saying "I can make the mess bigger because it's already a mess" and "even though we're rounding out live support for a thing that already works cold, I'm going to do it a different way because there are already lots of fragments" | 17:40 |
dansmith | neither of those really satisfy me | 17:41 |
artom | For NUMA LM, MoveClaim was a handy way to get both claiming of resources and the new instance NUMA topology in a "single operation" | 17:41 |
sean-k-mooney | move claims are not strictly required for numa migration either | 17:41 |
sean-k-mooney | but it would require a little more work | 17:42 |
artom | No, but they do make things less racy and simplify getting the new instance numa topology | 17:42 |
sean-k-mooney | artom was trying to reuse the existing cold migration code to reduce the code change | 17:42 |
sean-k-mooney | artom: i dont think they do | 17:42 |
sean-k-mooney | we both claim in the same place. more or less | 17:42 |
*** gbarros has quit IRC | 17:42 | |
artom | sean-k-mooney, yeah, maybe there was a way to do it less racy-ly even without claims | 17:44 |
dansmith | tbh, I don't really care whether this uses claims or not, | 17:44 |
dansmith | but if you're going to, | 17:44 |
dansmith | I think that just "if live migration: do different thing" in a bunch of random places is not moving us forward | 17:44 |
*** psachin has quit IRC | 17:45 | |
dansmith | especially when it comes to things that may silently break data | 17:45 |
mriedem | "if pci: something something dragons that no one but sean understands" | 17:45 |
artom | dansmith, so that would mean folding SRIOV live migration into the claims | 17:45 |
mriedem | ^ since juno | 17:45 |
dansmith | artom: btw, you've done all this local testing of this.. have you included sriov migration and not seen it clobber pci_requests? | 17:45 |
sean-k-mooney | we currently do the claim here for sriov https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L6428-L6467 | 17:45 |
artom | dansmith, SRIOV with this is completely untested :( | 17:45 |
sean-k-mooney | right after we call check_can_live_migrate_source in check_can_live_migrate_destination | 17:46 |
dansmith | artom: so it's very likely that you're blowing those away with this yeah? | 17:46 |
artom | Although sean-k-mooney's saying apparently all recent-ish NICs can do SRIOV, so maybe I *do* have the hardware? | 17:46 |
sean-k-mooney | artom: i have hardware and i set up port forwarding so you can ssh in. | 17:46 |
sean-k-mooney | but im going to check both again after dinner | 17:47 |
*** hemna has quit IRC | 17:47 | |
*** itlinux has quit IRC | 17:47 | |
sean-k-mooney | i'm going to grab dinner but i'll kick off a devstack run before i go and set up the test environment | 17:49 |
artom | dansmith, seems likely, yeah | 17:49 |
*** itlinux has joined #openstack-nova | 17:50 | |
sean-k-mooney | we don't use the pci request spec object from the instance for sriov migration by the way | 17:51 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L6428-L8993 | 17:51 |
dansmith | sean-k-mooney: because why? we store details in neutron about it? | 17:51 |
artom | Hrmm | 17:52 |
artom | So actually | 17:52 |
sean-k-mooney | the only info we need to pass back is the pci address of the new device | 17:52 |
artom | AFAICT | 17:52 |
artom | The PCI stuff in the claim doesn't actually *claim* any resources | 17:52 |
sean-k-mooney | so we store that in the vif port profile which is where we pass it to neutron | 17:52 |
artom | Just tests that it's supported | 17:52 |
dansmith | sean-k-mooney: okay, but nova's own structure would still have them in the pci_requests right? | 17:53 |
sean-k-mooney | yes | 17:53 |
dansmith | so what about like rebuild or something? | 17:53 |
artom | Ah, no, it does the _decrease_pool_count() thing, which "claims" the resource | 17:53 |
*** eharney has quit IRC | 17:54 | |
sean-k-mooney | how does it interact with this? | 17:54 |
sean-k-mooney | the claims have 3 states | 17:54 |
sean-k-mooney | free, claimed and allocated | 17:54 |
sean-k-mooney | something like that | 17:54 |
sean-k-mooney | one of them means it's reserved for an instance and the other is that it's in use | 17:55 |
dansmith | I'm saying I would expect we use pci_requests if we're doing a rebuild on the instance | 17:55 |
dansmith | or other operations | 17:55 |
*** itlinux has quit IRC | 17:55 | |
dansmith | point is just that corrupting our own accounting because sriov is tracked in neutron is not okay I don't think | 17:55 |
sean-k-mooney | so we are claiming them to reserve them and later we update the instance with them when we move it to the allocated state i think. it's been a while and i don't really remember the details | 17:55 |
sean-k-mooney | this is the important change https://review.opendev.org/#/c/620115/35/nova/compute/manager.py | 17:56 |
mriedem | we don't claim for rebuild | 17:56 |
*** itlinux has joined #openstack-nova | 17:56 | |
mriedem | it's a noop claim | 17:56 |
dansmith | mriedem: sure, not related to claim | 17:56 |
dansmith | cross-cell-migrate looks at pci_requests | 17:56 |
dansmith | looks like regular live migration also looks at them to determine if they're all neutron-related | 17:57 |
dansmith | just trying to confirm that blowing them away is not something we can just ignore :) | 17:58 |
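The concern dansmith raises in this thread, and the guard sean-k-mooney proposes, can be sketched in a few lines. This is a toy model only, not nova code: `Instance`, `MigrationContext`, `PCIRequest`, `live_migration_claim` and the constants are simplified stand-ins invented for illustration, and the assumption that an unset context field is represented by `None` is ours. It does not address the double-claiming question artom asks about.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins for the request sources discussed above.
NEUTRON_PORT = "neutron_port"
ALIAS = "alias"


class MigrationPreCheckError(Exception):
    """Stand-in for the exception raised by the conductor pre-check."""


@dataclass
class PCIRequest:
    source: str


@dataclass
class MigrationContext:
    # Assumption: None means "field not set". An empty list IS a set field.
    new_pci_requests: Optional[List[PCIRequest]] = None


@dataclass
class Instance:
    pci_requests: List[PCIRequest] = field(default_factory=list)


def apply_migration_context(instance: Instance, ctxt: MigrationContext) -> None:
    """Copy only the fields that are set on the context onto the instance."""
    if ctxt.new_pci_requests is not None:
        instance.pci_requests = list(ctxt.new_pci_requests)


def assert_live_migratable(pci_requests: List[PCIRequest]) -> None:
    """Repeat the conductor check so a missing upstream check fails loudly."""
    for req in pci_requests:
        if req.source != NEUTRON_PORT:
            raise MigrationPreCheckError(
                "non-VIF related PCI requests for instance "
                "are not allowed for live migration.")


def live_migration_claim(instance: Instance) -> MigrationContext:
    """Hypothetical claim: re-check, then carry the requests through as-is."""
    assert_live_migratable(instance.pci_requests)
    return MigrationContext(new_pci_requests=list(instance.pci_requests))


# Passing an empty (but set) request list clobbers the instance's requests.
clobbered = Instance([PCIRequest(NEUTRON_PORT)])
apply_migration_context(clobbered, MigrationContext(new_pci_requests=[]))

# Re-checking and keeping the requests leaves the instance intact.
kept = Instance([PCIRequest(NEUTRON_PORT)])
apply_migration_context(kept, live_migration_claim(kept))
```

In this model `clobbered.pci_requests` ends up empty while `kept.pci_requests` still holds the Neutron request, and an alias-based request makes `live_migration_claim` raise instead of being dropped silently.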
*** jaosorior has joined #openstack-nova | 18:00 | |
efried | dansmith: I'm looking at consecutive_build_service_disable_threshold and not seeing how it could be working | 18:01 |
dansmith | efried: we removed the compute side of that, if that's what you're looking at | 18:01 |
efried | "the compute side" | 18:01 |
efried | meaning... the part that makes it behave in any way other than a bool? | 18:01 |
dansmith | the part where the compute node self-disables | 18:01 |
dansmith | there should be renos about this | 18:02 |
efried | ack | 18:02 |
dansmith | change-consecutive-boot-failure-counter-to-weigher-428de7da0ed2033a.yaml | 18:02 |
efried | dansmith: okay | 18:03 |
efried | so | 18:03 |
efried | this is now up to deployers installing their own weigher? | 18:04 |
dansmith | eh? | 18:04 |
dansmith | did you read the reno? | 18:04 |
dansmith | efried: https://review.opendev.org/#/c/572195/ | 18:04 |
efried | ack | 18:05 |
dansmith | added a weigher, changed the meaning of the threshold for compatibility reasons | 18:05 |
mriedem | https://docs.openstack.org/nova/latest/user/filter-scheduler.html#weights | 18:05 |
mriedem | BuildFailureWeigher | 18:05 |
dansmith | efried: you hip to that jive? | 18:07 |
efried | Still trying to grok the implications. | 18:07 |
efried | So in a biggish cloud, if I have a bunch of failures in a row on a particular host, even if I don't hit the default multiplier, it's still going to make scheduling to that node way less likely | 18:08 |
dansmith | mriedem: beat you to the youtube ref: https://www.youtube.com/watch?v=sgW3RxKdN0Q | 18:08 |
*** dpawlik has joined #openstack-nova | 18:08 | |
dansmith | efried: hit the multiplier meaning "change" it? | 18:09 |
efried | sorry, no, I just mean... | 18:09 |
* mriedem feels like efried has been tricked into doing bug triage for the RH team | 18:09 | |
efried | IIUC the default multiplier is really big so that by default you won't effectively-disable a compute until it's seen a really lot of build failures. | 18:09 |
dansmith | efried: no | 18:10 |
dansmith | efried: read the reno and commit for reasoning about the multiplier | 18:10 |
dansmith | efried: it has to compete against disk weigher, which is like scored by mb free or something | 18:10 |
dansmith | efried: a big multiplier makes it more likely to have an effect, not les | 18:11 |
dansmith | *less | 18:11 |
efried | ah | 18:11 |
dansmith | and the threshold is now just "report failures if nonzero", so it's not a threshold | 18:11 |
dansmith | if you want it off, set the "threshold" to zero and computes won't even report the number | 18:11 |
efried | ack | 18:12 |
dansmith | and if you do, then you tune the multiplier based on whatever else you have configured | 18:12 |
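A rough illustration of why a bigger multiplier makes the build failure count matter more, following dansmith's description that it has to compete with a disk score "like scored by mb free or something". This is a deliberately simplified model, not nova's weigher code: `host_score`, `pick_host`, the host data and the multiplier values are all made up for the example, and real weighers may normalize scores differently.

```python
def host_score(free_disk_mb, failed_builds, build_failure_multiplier,
               disk_multiplier=1.0, report_failures=True):
    """Toy score: free disk counts for a host, recent failures against it."""
    # A zero "threshold" means computes do not report the failure count.
    reported = failed_builds if report_failures else 0
    return disk_multiplier * free_disk_mb - build_failure_multiplier * reported


def pick_host(hosts, build_failure_multiplier, report_failures=True):
    """Return the name of the highest-scoring host."""
    return max(
        hosts,
        key=lambda name: host_score(
            hosts[name]["free_disk_mb"],
            hosts[name]["failed_builds"],
            build_failure_multiplier,
            report_failures=report_failures,
        ),
    )


# Hypothetical hosts: one with lots of disk but recent build failures.
hosts = {
    "flaky": {"free_disk_mb": 500_000, "failed_builds": 3},
    "healthy": {"free_disk_mb": 100_000, "failed_builds": 0},
}

small = pick_host(hosts, build_failure_multiplier=1.0)
large = pick_host(hosts, build_failure_multiplier=1_000_000.0)
off = pick_host(hosts, build_failure_multiplier=1_000_000.0,
                report_failures=False)
```

With the small multiplier the disk score swamps three failures and `flaky` still wins; with the large one `healthy` wins; with reporting turned off the failures have no effect at all.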
*** gbarros has joined #openstack-nova | 18:12 | |
*** dpawlik has quit IRC | 18:13 | |
efried | got it. Thanks dansmith. (And no, mriedem, I'm helping donnyd figure out why his CI hosts get effectively-disabled-without-actually-being-disabled when they spend some time trying to boot with not-yet-ready images) | 18:15 |
mriedem | :P | 18:16 |
mriedem | o-) | 18:16 |
efried | dansmith: save me tracing the code, does it still have the "reset to zero" behavior as soon as we get one success? | 18:16 |
dansmith | IIRC yes | 18:16 |
dansmith | efried: https://review.opendev.org/#/c/572195/6/nova/compute/manager.py | 18:16 |
donnyd | thats pending it ever actually gets rescheduled | 18:16 |
donnyd | I have enough space that essentially it doesn't | 18:17 |
efried | yeah | 18:17 |
dansmith | https://review.opendev.org/#/c/572195/6/nova/compute/stats.py | 18:17 |
efried | dansmith: is this one of those things that you should theoretically be able to reset via SIGHUP? | 18:17 |
dansmith | see my FIXME in there | 18:17 |
dansmith | efried: for sure | 18:17 |
efried | k. btw, not sure if you saw the update, but between bnemec and me, SIGHUP is (soon to be) fixed. | 18:18 |
efried | soon as we can get this code merged & released | 18:18 |
donnyd | What i really need is a faster way to download glance images so this isn't an issue | 18:21 |
efried | just walk a flash drive across the room | 18:24 |
donnyd | LOL efried | 18:24 |
donnyd | Well the glance image store is on an nvme drive that will move at the speed of the rest of the network... however glance doesn't feel that same way, and limits my download speeds to what one core can do... which is about 100M/s | 18:25 |
donnyd | I think it has something to do with requests if I am not mistaken | 18:25 |
sean-k-mooney | donnyd: are you using a 1G network for your management network? | 18:33 |
donnyd | 10 | 18:33 |
donnyd | and the controllers are all 40 | 18:33 |
donnyd | So on the control plane for the compute side 10, and controller side 40 | 18:33 |
mriedem | donnyd: are you using ceph? | 18:33 |
donnyd | no | 18:33 |
donnyd | filestore is the fastest i have measured thus far | 18:34 |
melwitt | mriedem: just fyi if you didn't see, I updated the multi-cell archive patches yesterday night | 18:34 |
mriedem | melwitt: i didn't, and was going to look about 30 minutes ago, but was distracted, but i'll look in a bit, thanks | 18:34 |
donnyd | i tried cinder and swift backends, and they only made it slower | 18:34 |
melwitt | mriedem: coolness, thanks | 18:35 |
mriedem | donnyd: have you tried pre-caching the images on the computes when you have a new image? | 18:35 |
donnyd | if I was on ceph, there would be no wait time at all | 18:35 |
melwitt | and sorry for the delay | 18:35 |
mriedem | donnyd: the only thing i'd have to check is if there is some config to keep the image cache manager periodic from deleting the images if there are no guests on the host using them | 18:35 |
donnyd | mriedem: I didn't know that was an option... but I am also pretty sure nodepool is all over it whenever there is a new image | 18:36 |
mriedem | the image cache stuff in nova isn't documented at all outside of config (i don't think anyway), so i wouldn't be surprised | 18:36 |
*** gbarros has quit IRC | 18:36 | |
donnyd | mostly the issue is when new images are loaded, they can't be downloaded fast enough by compute | 18:36 |
mriedem | i don't know it all that well either | 18:36 |
mriedem | right, by pre-caching you wouldn't need compute to download them, the images would be there | 18:37 |
mriedem | but that's something you'd have to orchestrate outside of nova | 18:37 |
donnyd | well nodepool asks for them nearly the instant that they are active in glance | 18:37 |
donnyd | at least from what i can see | 18:37 |
mriedem | ok yeah so maybe wouldn't help | 18:38 |
donnyd | when you say pre-caching, i am thinking you mean have nova launch something based on the new image and then kill it immediately after its active. | 18:38 |
mriedem | that's one way | 18:39 |
*** jmlowe has quit IRC | 18:39 | |
mriedem | probably the easiest | 18:39 |
donnyd | I don't know how to setup the http store for glance and have it download direct from say an apache server | 18:39 |
mriedem | or you could have something push the image here https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.image_cache_subdirectory_name | 18:39 |
mriedem | jroll: doesn't verizon do extensive image pre-caching? | 18:39 |
melwitt | pre-caching (warming the cache by booting instance with new image 1 per compute host) is the only other way I know to deal with avoiding download speed issues, other than using ceph. I know at least in the past they did the cache warming thing at yahoo | 18:40 |
mriedem | ok yeah that ^ | 18:40 |
jroll | if we do it idk about it | 18:40 |
melwitt | mriedem: haha, my slow message composition strikes again | 18:40 |
mriedem | was going to ask penick but he's not here | 18:40 |
jroll | rax did some of that too | 18:40 |
donnyd | I thought about setting up a share from my block image server that would share _base across hypervisors | 18:40 |
donnyd | making it only have to download once | 18:40 |
donnyd | which is acceptable | 18:40 |
donnyd | but i am not sure of the other performance implications of doing such a thing | 18:41 |
jroll | mriedem: is there something I can help with around that? | 18:41 |
mriedem | yeah, people do that https://bugs.launchpad.net/nova/+bug/1804262 | 18:41 |
openstack | Launchpad bug 1804262 in OpenStack Compute (nova) "ComputeManager._run_image_cache_manager_pass times out when running on NFS" [Medium,In progress] - Assigned to Matthew Booth (mbooth-9) | 18:41 |
mriedem | jroll: not really | 18:41 |
jroll | k :) | 18:41 |
mriedem | i thought i saw something (blog/talk?) where penick was talking about doing this | 18:41 |
jroll | possibly | 18:41 |
mriedem | i was probably surprised because it was some non-baremetal thing | 18:42 |
donnyd | I was only going to mount _base so the first hypervisor to download the image would speed it up for the rest | 18:42 |
jroll | just not sure if there's a question to be answered or if you're just curious | 18:42 |
jroll | rax did bare metal image caching and it was awesomesauce | 18:42 |
mriedem | donnyd: yeah i'm pretty sure i was talking with someone in here awhile back that had the same idea | 18:42 |
melwitt | yeah sharing _base/ makes sense | 18:42 |
donnyd | jroll: Well when new images are loaded from nodepool, it takes a while for compute to download from glance X # of hypervisors | 18:43 |
donnyd | so some pre-caching or sharing of _base would likely speed things up | 18:43 |
donnyd | Glance is seemingly limited to one core's worth of power when downloading | 18:44 |
donnyd | which for my controllers is unfortunately not fast | 18:45 |
donnyd | melwitt: my only concern with mounting _base on a shared drive is what happens when multiple hypervisors are all trying to download the same image at the same time | 18:46 |
mriedem | there might be a lock in the code, but would need to verify | 18:47 |
melwitt | hm, yeah | 18:47 |
mriedem | in fact, i think starlingx people added a configurable lock there | 18:47 |
jroll | donnyd: gotcha, thanks for context | 18:47 |
donnyd | would they all download it anyways, or as mriedem just said would they see another hypervisor already has that in-flight | 18:47 |
artom | sean-k-mooney, do you recall if the SRIOV live migration code ever updated instance.pci_requests? | 18:48 |
artom | I don't see anything quickly glancing through your patches | 18:48 |
mriedem | donnyd: this is what i'm thinking of https://opendev.org/openstack/nova/src/branch/master/nova/virt/libvirt/imagebackend.py#L257 | 18:48 |
sean-k-mooney | we do update them when we allocate the claimed pci device i believe | 18:49 |
mriedem | https://docs.openstack.org/nova/latest/configuration/config.html#compute.max_concurrent_disk_ops was the thing starlingx added | 18:50 |
donnyd | well I should give it a spin.. probably cut the "new image loaded so i will fail a bunch" down to almost nothing | 18:51 |
openstackgerrit | Merged openstack/python-novaclient master: Follow up for microversion 2.75 https://review.opendev.org/678473 | 18:51 |
sean-k-mooney | artom: i think it gets saved here https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L7229-L7256 | 18:51 |
mriedem | donnyd: if you do let us know how it goes, | 18:51 |
sean-k-mooney | in post_live_migration_at_destination | 18:51 |
sean-k-mooney | so we only update it if the migration succeeds | 18:51 |
mriedem | donnyd: i also threw it in https://bugs.launchpad.net/nova/+bug/1838819/comments/7 so we don't forget docs about it | 18:51 |
openstack | Launchpad bug 1838819 in OpenStack Compute (nova) "Docs needed for tunables at large scale" [Undecided,Confirmed] | 18:51 |
sean-k-mooney | if it was reverted we do not modify the instance pci_requests | 18:52 |
mriedem | image cache docs in general would be great but stephenfin just refuses to support us documenting things | 18:52 |
mriedem | o-) | 18:52 |
donnyd | LOL mriedem | 18:52 |
artom | sean-k-mooney, AFAICT that calls down to https://github.com/openstack/nova/blob/master/nova/objects/pci_device.py#L373 | 18:54 |
artom | sean-k-mooney, which saves pci_devices, but not pci_requests | 18:54 |
artom | Does it even make sense for pci_requests to change for a live migration? | 18:56 |
sean-k-mooney | well the number and type of requests won't change | 18:56 |
artom | The spec might? | 18:57 |
mriedem | i would hope that pci requests don't change for a live migration | 18:58 |
mriedem | otherwise we f'ed up somewhere in the data modeling | 18:58 |
sean-k-mooney | what do you mean | 18:58 |
mriedem | pci devices are the per-compute/instance thing that track inventory and what's allocated | 18:58 |
mriedem | the pci requests should just be the host agnostic request | 18:58 |
sean-k-mooney | ya they more or less are | 18:58 |
sean-k-mooney | they can have an alias | 18:58 |
sean-k-mooney | or may have tags like a neutron physnet | 18:59 |
melwitt | donnyd, mriedem: I just asked penick and he said they've only done image cache warming once or twice on an ad hoc basis, for example for a 40G windows image that was needed on certain hypervisors. he said even still, downloads don't take too long (couple minutes at most) bc they always have their glance in the same datacenter. he also said they disable image checksums to avoid that slowness | 18:59 |
sean-k-mooney | or device type e.g. type-VF | 18:59 |
artom | sean-k-mooney, right, but that can't change during a live migration, right? | 18:59 |
sean-k-mooney | but the request does not have any info about the specific device | 18:59 |
sean-k-mooney | artom: correct it can't | 18:59 |
*** eharney has joined #openstack-nova | 19:00 | |
donnyd | melwitt: My downloads are in the same rack on a 10G network (compute side) and they are painfully slow for the gear that underpins them | 19:00 |
melwitt | donnyd: I didn't notice if you mentioned how long is a long time in your scenario. is it a few minutes or like 20 minutes | 19:00 |
artom | So we seem fairly confident pci_requests can't change during a live migration. So back to our original problem, it should be fine for a Claim to write them back to the instance | 19:02 |
artom | In the sense that, the SRIOV live migration code won't have changed them | 19:03 |
donnyd | well the drive glance is hosted on is an nvme device with 4G/s in read speed and the machine its on has 40G networking... so I expected to get somewhere around 1/4 of that or about 1G/s (ish) in download speed | 19:03 |
dansmith | right, which was my original thing.. why not just keep them? | 19:03 |
donnyd | the reality is 100M/s | 19:03 |
artom | dansmith, I was worried about them changing under us | 19:03 |
artom | So by keeping them we'd end up clobbering the new ones set by the SRIOV live migration code | 19:04 |
donnyd | If I could speed up the downloading part, i wouldn't even notice | 19:04 |
artom | Which is why I wanted so strongly to avoid touching them altogether | 19:04 |
donnyd | because a new image would be downloaded in 30-40 seconds | 19:04 |
dansmith | artom: we stash a copy of the requests to be applied in the migration context with other things.. if the sriov migration code is not playing nice with that, then it's wrong | 19:05 |
sean-k-mooney | artom: we wont clobber anything | 19:05 |
sean-k-mooney | but we will end up trying to claim pci device twice | 19:05 |
artom | sean-k-mooney, well no, dansmith's point was that we still skip the actual claiming (right?) | 19:05 |
artom | Just don't mess with any DB stuff | 19:05 |
sean-k-mooney | and then we might not move them from claimed to allocated correctly and leak pci devices | 19:05 |
dansmith | artom: I don't think I made that claim | 19:06 |
sean-k-mooney | e.g. ones that are claimed for the instance but not allocated to it | 19:06 |
dansmith | artom: I might have asked that | 19:06 |
mriedem | "I don't think I made that claim" | 19:06 |
* mriedem cues rimshot | 19:06 | |
artom | We need to stop overloading "claim" >_< | 19:06 |
dansmith | I still don't understand why this needs to be different for cold and live migration, with respect to the accounting | 19:06 |
melwitt | donnyd: right... that does sound strange, but I'm definitely not that knowledgeable about what is reasonable to expect there. I'll run those details by penick and find out if it rings a bell | 19:06 |
sean-k-mooney | claimed and allocated are states in the pci resource tracker. | 19:06 |
sean-k-mooney | and claims generally refer to the RT | 19:07 |
*** markvoelker has quit IRC | 19:07 | |
artom | dansmith, it doesn't :) But SRIOV live migration was implemented without it, so now we're in this mess | 19:07 |
sean-k-mooney | we have moved to using allocation to refer to placement | 19:07 |
dansmith | artom: yeah, sounds like that is the real problem here | 19:07 |
donnyd | I think the last time we dug into it, it had something to do with the requests library only being able to use one core, which in turn makes sense because my controllers' cores aren't super fast | 19:08 |
sean-k-mooney | i can probably make the sriov stuff work with move claims. but i would prefer to keep move claims for cold migration | 19:08 |
sean-k-mooney | sriov migration did not use claims because we did not need to and live migration never used them before | 19:08 |
*** markvoelker has joined #openstack-nova | 19:08 | |
mriedem | so something was bolted on instead and now we have a mess | 19:09 |
donnyd | I thought the http.store option in glance would allow compute to grab from a http server (like apache or something), but i have no idea how to configure it | 19:09 |
mriedem | is the summary yeah? | 19:09 |
dansmith | right, | 19:09 |
dansmith | that's the problem | 19:09 |
dansmith | "live migration is different so I can be more different" | 19:09 |
artom | mriedem, I think sean-k-mooney would object to the "bolted on" wording, but yeah | 19:10 |
sean-k-mooney | when we were first proposing sriov migration we were not planning to use move claims for numa migration | 19:10 |
artom | It's also on me for not having reviewed that spec/code | 19:10 |
sean-k-mooney | i am testing your code right now by the way | 19:11 |
artom | Could have said "hey let's use claims since we'll need them for NUMA LM anyways" | 19:11 |
sean-k-mooney | well we don't; you could also do the claims the way we do. but anyway, should i start looking at how to convert the sriov code? | 19:11 |
dansmith | which cores were reviewing that? | 19:12 |
dansmith | I don't think it was me, | 19:12 |
mriedem | jay and stephen | 19:12 |
dansmith | okay | 19:12 |
melwitt | donnyd: the last comment confuses me because compute downloads images over glance API (http), so what is the difference between that and the http.store option you mention, I wonder? | 19:12 |
dansmith | Seems like jay was actually in favor of managing the pci stuff like the rest of it: https://review.opendev.org/#/c/620115/ | 19:15 |
dansmith | "If we managed PCI devices using the same system we do for CPU, memory and local disk, then we could do the resource *claim* in the scheduler's claim_resources() " | 19:16 |
dansmith | actually, | 19:16 |
dansmith | maybe he means more placement-like by that | 19:16 |
sean-k-mooney | yes | 19:16 |
sean-k-mooney | he wanted to avoid the move claims because he wanted us to use placement as much as possible | 19:17 |
dansmith | I don't really see anywhere that he said that, but I don't doubt it | 19:17 |
artom | I really didn't want to start a witch hunt :( | 19:19 |
sean-k-mooney | that was my preference too by the way, so i'm biased in my recollection, but i think if we need to i can adapt the sriov code to get the device from the move claim | 19:19 |
dansmith | artom: no, just looking for context | 19:20 |
artom | Choices were made, I'm sure at the time with the available info they were optimal | 19:20 |
artom | dansmith, ack | 19:20 |
dansmith | artom: this is the problem with bolting on incremental "okay we'll just support live migration with pci if they're like this" kind of features | 19:20 |
dansmith | after you do that a couple times, you end up here | 19:20 |
donnyd | melwitt: well I am really just guessing because I don't understand how the http.store option works... I was hopeful that I could configure the filestore to save images in a particular location, and then have that exposed directly by apache (or something of the like) https://opendev.org/openstack/glance_store/src/branch/master/glance_store/_drivers/http.py | 19:20 |
sean-k-mooney | well we eventually wanted to use the multiple port binding workflow for cold migration and then move the sriov port claiming to always use the live migration flow when we did that | 19:21 |
* dansmith goes for a shearing | 19:22 | |
artom | dansmith, isn't there a "law" about that? | 19:22 |
mriedem | ooo i'm scheduled for that tomorrow | 19:22 |
mriedem | we're sympatico | 19:22 |
artom | Products reproduce the communication structures of their organisations, or something like that? | 19:22 |
donnyd | melwitt: but i honestly don't know because I have never had to get that far into glance... in my normal use cases... it just works | 19:22 |
artom | So us, being distributed, loose and non-cohesive, will produce features that are the same? | 19:22 |
artom | Aha, https://en.wikipedia.org/wiki/Conway%27s_law | 19:23 |
mriedem | "organizations that are a clusterfuck will produce software that is a clusterfuck" | 19:24 |
mriedem | got it | 19:24 |
artom | I mean, yes. :P | 19:24 |
sean-k-mooney | fyi i just migrated an instance with a numa topology and an sriov macvtap port | 19:25 |
melwitt | donnyd: understood. I'm grepping around and also can't find the config option in glance about how to disable checksum verification for the image_cache. I'll need to chat with penick and get back to you. I'll ask him if he knows anything about http.store while I'm at it | 19:25 |
sean-k-mooney | i need to now look at the xml and see what happened | 19:25 |
sean-k-mooney | but it succeeded | 19:25 |
donnyd | melwitt: much appreciated... or just any possible way to make it faster would be a massive help | 19:26 |
mriedem | melwitt: that stuff should be disabled by default in nova | 19:26 |
mriedem | https://docs.openstack.org/nova/latest/configuration/config.html#glance.verify_glance_signatures | 19:26 |
melwitt | mriedem: yeah, he said it's different than the signature verify | 19:26 |
melwitt | and I just can't find it anywhere | 19:26 |
mriedem | is this based on his ocata cloud? | 19:26 |
melwitt | it might be | 19:27 |
mriedem | https://docs.openstack.org/nova/pike/configuration/config.html#libvirt.checksum_base_images | 19:27 |
mriedem | that stuff has been removed | 19:27 |
melwitt | oh, dangit | 19:27 |
mriedem | dagnabbit even | 19:27 |
melwitt | yeah, for sure | 19:28 |
melwitt | there goes that lead :( | 19:28 |
sean-k-mooney | artom: the xml looks correct and it was updated | 19:28 |
donnyd | what would be real slick is to just mount the same share for glance filestore backend and nova image cache | 19:29 |
mriedem | make way | 19:29 |
donnyd | that way once the image is uploaded the hypervisor would have immediate access | 19:29 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add nova.compute.utils.delete_image https://review.opendev.org/637605 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Refactor ComputeManager.remove_volume_connection https://review.opendev.org/642183 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add power_on kwarg to ComputeDriver.spawn() method https://review.opendev.org/642590 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add Destination.allow_cross_cell_move field https://review.opendev.org/614035 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Change HostManager to allow scheduling to other cells https://review.opendev.org/614037 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add prep_snapshot_based_resize_at_dest compute method https://review.opendev.org/633293 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add PrepResizeAtDestTask https://review.opendev.org/627890 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add prep_snapshot_based_resize_at_source compute method https://review.opendev.org/634832 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add PrepResizeAtSourceTask https://review.opendev.org/627891 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add finish_snapshot_based_resize_at_dest compute method https://review.opendev.org/635080 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add FinishResizeAtDestTask https://review.opendev.org/635646 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Execute CrossCellMigrationTask from MigrationTask https://review.opendev.org/635668 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Plumb allow_cross_cell_resize into compute API resize() https://review.opendev.org/635684 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Filter duplicates from compute API get_migrations_sorted() https://review.opendev.org/636224 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Start functional testing for cross-cell resize https://review.opendev.org/636253 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Handle target host cross-cell cold migration in conductor https://review.opendev.org/642591 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Validate image/create during cross-cell resize functional testing https://review.opendev.org/642592 | 19:30 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add zones wrinkle to TestMultiCellMigrate https://review.opendev.org/643450 | 19:30 |
mriedem | gibi: stephenfin: dansmith: ^ i've rebased and moved the already +2'ed and less controversial things to the bottom to help move those through | 19:31 |
sean-k-mooney | artom: http://paste.openstack.org/show/765671/ | 19:32 |
donnyd | mriedem: you are supposed to wait for Friday afternoons for that kind of action.. so my CI has something to do... it gets lonely and bored | 19:32 |
sean-k-mooney | that's the xml before and after the migration | 19:32 |
mriedem | donnyd: i wanted to test your CI's image download capabilities | 19:33 |
*** itlinux has quit IRC | 19:33 | |
donnyd | LOL - thanks | 19:33 |
*** itlinux has joined #openstack-nova | 19:33 | |
artom | sean-k-mooney, I think the best thing would be to write up a comprehensive test report | 19:35 |
artom | I'm fairly confident that the XML part works, that's been there since Stein | 19:35 |
artom | It's more the new surprises - pci_requests from the migration_context, etc | 19:35 |
artom | It doesn't help that the patch series keeps changing under you | 19:36 |
artom | But yeah | 19:36 |
artom | I think testing by a more or less independent third party would be awesome :) | 19:36 |
artom | I need to drop for a bit while I reconnect to phone's hotspot | 19:36 |
artom | brb | 19:36 |
*** artom has quit IRC | 19:36 | |
sean-k-mooney | well i'm not sure i qualify in that regard but i'll test it nonetheless | 19:37 |
sean-k-mooney | the migration context is here http://paste.openstack.org/show/765743/ | 19:42 |
sean-k-mooney | and the pci requests in the instance_extra table are empty | 19:42 |
*** itlinux has quit IRC | 19:42 | |
*** artom has joined #openstack-nova | 19:43 | |
sean-k-mooney | mysql> select pci_requests from instance_extra as ie where ie.instance_uuid='b304fedf-aa31-4990-9a74-29729eed336d'; | 19:43 |
sean-k-mooney | +--------------+ | 19:43 |
sean-k-mooney | | pci_requests | | 19:43 |
sean-k-mooney | +--------------+ | 19:43 |
sean-k-mooney | | [] | | 19:43 |
sean-k-mooney | +--------------+ | 19:43 |
artom | Yey :( | 19:44 |
artom | At least we know we weren't wrong to worry | 19:44 |
sean-k-mooney | the pci device will be claimed with the instance uuid in the pci tracker | 19:45 |
sean-k-mooney | but its not correct. | 19:45 |
artom | Yeah, but won't that screw up future operations? | 19:45 |
artom | Suddenly you can evacuate to a host with no SRIOV, for example | 19:45 |
sean-k-mooney | i would have to do more testing but not necessarily | 19:46 |
artom | Still seems dangerous | 19:47 |
artom | I mean, I want NUMA LM to land a *lot*, but there's a minimum level of quality here ;) | 19:47 |
artom | And that's not part of it | 19:47 |
sean-k-mooney | ya i know. the request_spec still has the original pci_request in it and we do not support pci hot attach | 19:49 |
sean-k-mooney | that is the request spec http://paste.openstack.org/show/765824/ | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: FUP for I66d8f06f19c5c631e33208580428aa843abb38d2 https://review.opendev.org/678951 | 19:49 |
sean-k-mooney | stephenfin: by the way i think we need to apply https://review.opendev.org/#/c/675776/ | 19:52 |
sean-k-mooney | i am seeing some other issue that that will likely fix | 19:52 |
sean-k-mooney | artom: with the same code if i use a non numa flavor the pci requests are not removed | 19:54 |
sean-k-mooney | artom: we might be able to use the temporary mutation thing to ensure that the change is not saved | 19:55 |
*** jmlowe has joined #openstack-nova | 19:56 | |
artom | sean-k-mooney, errr... | 19:56 |
artom | We want to save the *other* stuff | 19:56 |
*** nweinber has quit IRC | 19:56 | |
mriedem | sean-k-mooney: so definite -1 here right? https://review.opendev.org/#/c/635669/39/nova/compute/resource_tracker.py@307 | 19:59 |
mriedem | can someone do that? | 19:59 |
sean-k-mooney | mriedem: yes | 19:59 |
sean-k-mooney | commented and -1'd | 20:02 |
artom | Sheesh, rub it in, will'ya | 20:06 |
mriedem | i can run a -2 into it if that helps | 20:07 |
mriedem | *rub | 20:07 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Update help for image_cache_manager_interval option https://review.opendev.org/678954 | 20:07 |
sean-k-mooney | rubbing salt in a wound helps it heal :P | 20:07 |
artom | mriedem, no, rubbing a -2 does something entirely different | 20:07 |
*** prometheanfire has quit IRC | 20:08 | |
*** prometheanfire has joined #openstack-nova | 20:09 | |
*** dpawlik has joined #openstack-nova | 20:09 | |
*** weshay is now known as weshay_pto | 20:11 | |
sean-k-mooney | in other news https://bugs.launchpad.net/nova/+bug/1804502 is a pain | 20:12 |
openstack | Launchpad bug 1804502 in OpenStack Compute (nova) "Rebuild server with NUMATopologyFilter enabled fails (in some cases)" [Undecided,In progress] - Assigned to David Hill (david-hill-ubisoft) | 20:12 |
sean-k-mooney | and yes i know its a feature not a bug but i think i need to go figure out how to make that work for U ... | 20:12 |
mriedem | cfriesen has a duplicate of that i think | 20:12 |
sean-k-mooney | ya it's a long-known issue | 20:13 |
sean-k-mooney | none of the filters are really aware of rebuild/resize | 20:13 |
sean-k-mooney | and they all treat the request as a new spawn | 20:13 |
*** dpawlik has quit IRC | 20:13 | |
sean-k-mooney | so we need double the capacity yada yada yada | 20:13 |
sean-k-mooney | it's fixable, it's just a pain | 20:14 |
sean-k-mooney | and we have had more important things like moving this stuff to placement | 20:14 |
mriedem | the filters are definitely aware of rebuild now | 20:14 |
sean-k-mooney | not all of them | 20:14 |
sean-k-mooney | or rather they have to be coded to be aware | 20:15 |
mriedem | correct, https://github.com/openstack/nova/blob/master/nova/scheduler/filters/numa_topology_filter.py#L26 | 20:15 |
sean-k-mooney | we now at least go back to the scheduler when the image changes | 20:15 |
dansmith | artom: even if there's technically nothing that breaks because we nuke pci_requests, I still don't think it makes sense to land it in a state where we're clobbering data, which I think is what you're getting at | 20:15 |
*** bbowen has quit IRC | 20:15 | |
sean-k-mooney | yes it needs to run on rebuild but it does not have special handling for rebuild | 20:16 |
mriedem | dansmith: i wouldn't be surprised if a secondary operation on the instance after the clobbered instance live migration blows up | 20:17 |
dansmith | mriedem: right, I said that earlier | 20:17 |
mriedem | have seen all kinds of fun stuff with things like that and accidentally persisting crap to the request spec | 20:17 |
dansmith | mriedem: I just didn't go looking for it | 20:17 |
sean-k-mooney | i can test that if ye like | 20:17 |
sean-k-mooney | i still have it | 20:17 |
dansmith | even if we can't poke something, it's still not okay to clobber them I think | 20:17 |
sean-k-mooney | but we obviously should still fix it | 20:18 |
dansmith | anything on the compute or cell conductor that needs pci requests wouldn't be able to get it from the reqspec | 20:18 |
dansmith | and as it is with pci, it's all very untested, so it'd be easy to miss something we're breaking | 20:18 |
sean-k-mooney | yes that is true | 20:18 |
mriedem | i'm not even sure what you'd test after the clobbered live migration to show it fails, or how we'd recreate that in a functional test to prevent future regressions, but i know that having sean manually test all these weird things isn't sustainable | 20:19 |
dansmith | obviously | 20:20 |
mriedem | i guess you could like cold migrate back to the source host and verify there are any claimed pci devices for the instance | 20:20 |
sean-k-mooney | the pci devices are still tracked in the pci tracker as used by the instance | 20:21 |
mriedem | oh great | 20:21 |
sean-k-mooney | actually i should check that | 20:21 |
mriedem | didn't we just talk about something like this...vpmem? | 20:21 |
sean-k-mooney | kind of | 20:21 |
*** artom has quit IRC | 20:23 | |
sean-k-mooney | yes, just checked: we are correctly tracking things at the host level in the pci tracker at least | 20:23 |
*** gbarros has joined #openstack-nova | 20:24 | |
*** artom has joined #openstack-nova | 20:25 | |
donnyd | mriedem: melwitt I have the image cache dir all setup and running | 20:25 |
*** tbachman has quit IRC | 20:25 | |
donnyd | will see tomorrow if it fixes the issue | 20:25 |
zigo | I got 3 unit test failures in Nova when building Stein in sid, because the tests are expecting from libvirt: | 20:29 |
zigo | <target bus="virtio" dev="vda"/> | 20:29 |
zigo | when libvirt really returns: | 20:29 |
zigo | <target dev="vda" bus="virtio"/> | 20:29 |
zigo | Has this been fixed in master? | 20:29 |
zigo | mriedem: ^ | 20:30 |
*** gbarros has quit IRC | 20:31 | |
mriedem | zigo: no idea, but it's likely some test using dicts and not handling sort order on the keys | 20:32 |
*** gbarros has joined #openstack-nova | 20:32 | |
zigo | Exactly. | 20:32 |
mriedem | open a bug with the test traceback | 20:32 |
zigo | Sure. | 20:32 |
zigo | mriedem: Still in Launchpad, or in Storyboard? | 20:32 |
zigo | Launchpad, it seems. | 20:33 |
sean-k-mooney | if we cold migrate after live migration it fails with a port update error | 20:33 |
sean-k-mooney | Error: Failed to perform requested operation on instance "sriov-macvtap", the instance has an error status: Please try again later [Error: Port update failed for port 485937a6-7611-4948-8420-4bfd73e15ea8: Unable to correlate PCI slot 0000:01:10.1]. | 20:34 |
*** xek has quit IRC | 20:34 | |
zigo | mriedem: https://bugs.launchpad.net/nova/+bug/1841667 | 20:36 |
openstack | Launchpad bug 1841667 in OpenStack Compute (nova) "failing libvirt tests: need ordering" [Undecided,New] | 20:36 |
zigo | Feel free to rewrite the title if you find it not good enough. | 20:36 |
*** damien_r has joined #openstack-nova | 20:36 | |
sean-k-mooney | well that is unexpected.. | 20:38 |
sean-k-mooney | it errored after setting the host to the other host, e.g. when migrating from cloud-4->cloud-3 it was in error on cloud-3 but libvirt did not have a domain on cloud-3 | 20:41 |
sean-k-mooney | it still had a domain on cloud-4 but it was shutoff | 20:41 |
sean-k-mooney | after a hard reboot it went from error to running on cloud-3 | 20:41 |
*** damien_r has quit IRC | 20:41 | |
*** gbarros has quit IRC | 20:44 | |
mriedem | zigo: what version of lxml is being used? | 20:46 |
mriedem | prometheanfire: efried: wasn't there a recent issue with newer lxml and nova unit tests? | 20:47 |
sean-k-mooney | ya i delete that after the hard reboot it was running on cloud-3 with the device claimed on cloud-4 so that is not good. | 20:47 |
sean-k-mooney | im going to unstack and call it a night. | 20:47 |
efried | mriedem: yes, sean-k-mooney was looking into it. | 20:47 |
sean-k-mooney | mriedem: ya i'm meant to be looking into that | 20:47 |
efried | had to do with attribute ordering. | 20:47 |
sean-k-mooney | it was what i had planned to do today... | 20:47 |
mriedem | yup https://bugs.launchpad.net/nova/+bug/1841667 | 20:47 |
openstack | Launchpad bug 1841667 in OpenStack Compute (nova) stein "failing libvirt tests: need ordering" [Undecided,New] | 20:47 |
zigo | mriedem: 4.4.1. | 20:48 |
mriedem | efried: sean-k-mooney: did we have a bug for it? | 20:48 |
zigo | That's kind of new in Sid... | 20:48 |
sean-k-mooney | there was one filed yes | 20:48 |
sean-k-mooney | mriedem: prometheanfire filed it i think | 20:48 |
*** brault has quit IRC | 20:48 | |
mriedem | https://bugs.launchpad.net/nova/+bug/1838666 | 20:48 |
openstack | Launchpad bug 1838666 in OpenStack Compute (nova) "lxml 4.4.0 causes failed tests in nova" [Undecided,Confirmed] | 20:48 |
mriedem | got it | 20:48 |
zigo | mriedem: Uploaded 10 days ago: https://tracker.debian.org/pkg/lxml | 20:49 |
*** lpetrut has joined #openstack-nova | 20:50 | |
*** lpetrut has quit IRC | 20:51 | |
*** lpetrut has joined #openstack-nova | 20:51 | |
zigo | Wow, with my brand new laptop, it only takes me 5 minutes to run all Nova unit tests now! :) | 20:51 |
mriedem | sounds like a challenge | 20:51 |
sean-k-mooney | ya its nice when you can throw more cores at it and it scales well | 20:53 |
openstackgerrit | Eric Fried proposed openstack/nova master: Raise when inventory retrieval & refresh fails https://review.opendev.org/678957 | 20:53 |
openstackgerrit | Eric Fried proposed openstack/nova master: Remove @safe_connect from _get_provider_aggregates https://review.opendev.org/678958 | 20:53 |
openstackgerrit | Eric Fried proposed openstack/nova master: Remove @safe_connect from _get_sharing_providers https://review.opendev.org/678959 | 20:53 |
openstackgerrit | Eric Fried proposed openstack/nova master: Invalidate cache when _refresh_associations fails https://review.opendev.org/678960 | 20:53 |
efried | mriedem: This ^ may be sufficient for bug 1841481, assuming "the next refresh" is quick enough for us to repopulate the cache. Otherwise we may wish to wrap _refresh_associations in a one-loop @retry (but not on ClientException). Let me know your thoughts if you please. | 20:53 |
openstack | bug 1841481 in OpenStack Compute (nova) "Race during ironic re-balance corrupts local RT ProviderTree and compute_nodes cache" [Medium,Triaged] https://launchpad.net/bugs/1841481 - Assigned to Eric Fried (efried) | 20:53 |
* efried ==> school runs | 20:53 | |
zigo | sean-k-mooney: Got a 6 core Xeon (so 12 threads...)... | 20:55 |
sean-k-mooney | avoids the "whose system is faster" trap. | 20:56 |
*** lpetrut has quit IRC | 20:58 | |
zigo | The nice thing is that I do have enough NVME space to have a full Debian mirror ... | 20:59 |
zigo | deb file:///home/ftp/debian ... :P | 20:59 |
sean-k-mooney | that's like somewhere between 230-400GB right | 21:00 |
sean-k-mooney | i used to use apt-cacher-ng to cache the things i actually was using | 21:00 |
sean-k-mooney | i should set that up again. | 21:00 |
zigo | sean-k-mooney: 463 GB right now, only amd64 + sources, all available suites. | 21:00 |
zigo | The point is, it directly fetches on drive, avoiding network socket indirections... | 21:01 |
sean-k-mooney | right we had an ubuntu mirror for the intel nfv ci and i remember it was somewhere in that range | 21:01 |
zigo | sean-k-mooney: I used approx. | 21:01 |
sean-k-mooney | but that was like 3 years ago | 21:01 |
sean-k-mooney | so i assumed it grew | 21:01 |
mriedem | melwitt: time for you to bug dansmith https://review.opendev.org/#/c/507486/69 | 21:03 |
sean-k-mooney | prometheanfire: do you know if the lxml thing only happens on a specific python version | 21:03 |
sean-k-mooney | i'm running the tests under py36 at the moment | 21:03 |
mriedem | zigo: are you testing on py27 or py36? | 21:04 |
sean-k-mooney | i can test both but py36 is faster for me to test with | 21:04 |
sean-k-mooney | i have 4 test failures on py36 | 21:04 |
sean-k-mooney | all of which are not using our xml matcher | 21:06 |
sean-k-mooney | they're doing self.assertEqual(diska_xml.strip(), actual_diska_xml.strip()) | 21:06 |
sean-k-mooney | or assert called with | 21:06 |
*** spsurya has quit IRC | 21:07 | |
*** itlinux has joined #openstack-nova | 21:07 | |
*** damien_r has joined #openstack-nova | 21:07 | |
*** damien_r has quit IRC | 21:07 | |
prometheanfire | sean-k-mooney: I don't know :( | 21:08 |
sean-k-mooney | it broke on py36 so i can reproduce | 21:08 |
sean-k-mooney | so its fine | 21:08 |
zigo | mriedem: python 3.7 | 21:08 |
zigo | sean-k-mooney: py37 | 21:09 |
sean-k-mooney | it's likely not python specific | 21:09 |
zigo | Right. | 21:09 |
sean-k-mooney | actually at least one of the tests fails on py3 and passes on py27 | 21:11 |
sean-k-mooney | so maybe it is but i'll fix it for both in any case | 21:11 |
*** itlinux has quit IRC | 21:13 | |
donnyd | so here is some interesting information about the image cache | 21:14 |
donnyd | I keep getting this error thrown | 21:14 |
donnyd | https://www.irccloud.com/pastebin/9eB2NYg4/ | 21:14 |
sean-k-mooney | ya i have seen that | 21:15 |
donnyd | most notably this nova.exception.DiskNotFound: No disk at /var/lib/nova/instances/_base/1675f8d33da6cfdd6354d2078b61c3c11ff417d0 | 21:15 |
donnyd | however that image is there, and is continually downloaded over and over | 21:15 |
sean-k-mooney | i think mdbooth was fixing a case where if we fail to resize an image we delete the base image backing file in the cache | 21:15 |
donnyd | https://www.irccloud.com/pastebin/yllMTx02/ | 21:16 |
donnyd | its not a resize | 21:17 |
donnyd | -rw-r--r-- 1 libvirt-qemu kvm 7054688256 Aug 27 21:13 1675f8d33da6cfdd6354d2078b61c3c11ff417d0 | 21:17 |
donnyd | -rw-r--r-- 1 nova nova 0 Aug 27 21:17 1675f8d33da6cfdd6354d2078b61c3c11ff417d0.part | 21:17 |
mriedem | wonder if there is an access issue? | 21:17 |
mriedem | the DiskNotFound might just be a shitty generic exception that is thrown | 21:18 |
mriedem | *raised | 21:18 |
donnyd | well it's owned by nova until download and then changed to libvirt-qemu | 21:18 |
*** markvoelker has quit IRC | 21:19 | |
mriedem | raising from here though so not an access issue https://github.com/openstack/nova/blob/master/nova/virt/images.py#L58 | 21:19 |
donnyd | If I mount the whole directory for nova/instances this error goes away | 21:20 |
sean-k-mooney | donnyd: normally nova is in the libvirt group | 21:21 |
*** gbarros has joined #openstack-nova | 21:21 | |
sean-k-mooney | so nova should be able to read and write to image owned by nova | 21:22 |
donnyd | Well there isnt an issue if I purge the cache, it seems to download and then run with it fine | 21:22 |
sean-k-mooney | *libvirt/qemu | 21:22 |
donnyd | But each hypervisor that tries to read this image throws that error and then redownloads | 21:23 |
*** bbowen has joined #openstack-nova | 21:24 | |
*** ivve has quit IRC | 21:25 | |
sean-k-mooney | the way the xml matcher works is dumb / not documented | 21:27 |
sean-k-mooney | why does returning nothing mean it matched | 21:29 |
sean-k-mooney | now that i know that i can actually use it | 21:30 |
sean-k-mooney | oh this comes from testtools... | 21:30 |
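
The "returning nothing means it matched" convention discussed above can be sketched with the standard library alone. This is only an illustration and not nova's matcher: `xml_mismatch` is a hypothetical helper that returns `None` when two XML strings are equivalent regardless of attribute order, and a short reason string otherwise, mirroring the testtools convention where `match()` returns `None` on success. It assumes child element order is still significant and ignores tail text.

```python
import xml.etree.ElementTree as ET


def xml_mismatch(expected, actual):
    """Hypothetical helper: return None if the XML matches, else a reason."""

    def compare(e, a, path):
        if e.tag != a.tag:
            return "%s: tag %r != %r" % (path, e.tag, a.tag)
        # attrib is a dict, so this comparison ignores attribute order.
        if e.attrib != a.attrib:
            return "%s: attributes %r != %r" % (path, e.attrib, a.attrib)
        if (e.text or "").strip() != (a.text or "").strip():
            return "%s: text %r != %r" % (path, e.text, a.text)
        if len(e) != len(a):
            return "%s: %d children != %d" % (path, len(e), len(a))
        for index, (ec, ac) in enumerate(zip(e, a)):
            reason = compare(ec, ac, "%s/%s[%d]" % (path, ec.tag, index))
            if reason is not None:
                return reason
        return None

    root_e = ET.fromstring(expected)
    root_a = ET.fromstring(actual)
    return compare(root_e, root_a, "/" + root_e.tag)


# Same document, attributes written in a different order.
a = '<disk type="file" device="disk"><source file="/tmp/a"/></disk>'
b = '<disk device="disk" type="file"><source file="/tmp/a"/></disk>'
```

A plain string comparison of `a` and `b` fails even though the documents are equivalent, which is the kind of failure the `assertEqual` on stripped XML strings is exposed to when attribute ordering changes.
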
donnyd | well its reporting back exactly what qemu-img is telling it | 21:32 |
sean-k-mooney | donnyd: what os are you on | 21:32 |
donnyd | sudo qemu-img info /var/lib/nova/instances/_base/1675f8d33da6cfdd6354d2078b61c3c11ff417d0 | 21:33 |
donnyd | qemu-img: Could not open '/var/lib/nova/instances/_base/1675f8d33da6cfdd6354d2078b61c3c11ff417d0' | 21:33 |
donnyd | ubuntu 18.04 | 21:33 |
sean-k-mooney | can you check if apparmor or systemd is blocking it | 21:33 |
sean-k-mooney | e.g. check dmesg | 21:33 |
donnyd | possible... | 21:34 |
donnyd | not sure what i am looking for apparmor to say | 21:35 |
sean-k-mooney | something like this | 21:36 |
sean-k-mooney | audit: type=1400 audit(1565883657.055:70): apparmor="ALLOWED" operation="open" profile="libreoffice-soffice" name="/proc/31189/comm" pid=31189 comm="soffice.bin" requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000 | 21:36 |
sean-k-mooney | except you are looking for apparmor="DENIED" | 21:36 |
donnyd | when i grep for DENIED it returns nothing | 21:37 |
sean-k-mooney | it's probably not an apparmor thing but it could be if the file permissions allow you to read the file as root | 21:37 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Introduce live_migration_claim() https://review.opendev.org/635669 | 21:38 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: New objects for NUMA live migration https://review.opendev.org/634827 | 21:38 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: LM: Use Claims to update numa-related XML on the source https://review.opendev.org/635229 | 21:38 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: NUMA live migration support https://review.opendev.org/634606 | 21:38 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration https://review.opendev.org/640021 | 21:38 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Functional tests for NUMA live migration https://review.opendev.org/672595 | 21:38 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: DNM: Run LM integration tests with NUMA flavor https://review.opendev.org/678887 | 21:38 |
mriedem | sean-k-mooney: https://blueprints.launchpad.net/nova/+spec/image-metadata-prefiltering is now in a runway | 21:46 |
sean-k-mooney | mriedem: ok | 21:47 |
*** mriedem has quit IRC | 21:48 | |
sean-k-mooney | the nova-next failure in the middle patch looks unrelated so i just rechecked it | 21:49 |
sean-k-mooney | the rest are clean | 21:49 |
sean-k-mooney | if you see anything i'll address it tomorrow | 21:50 |
*** trident has quit IRC | 22:05 | |
*** gbarros has quit IRC | 22:08 | |
*** dpawlik has joined #openstack-nova | 22:10 | |
sean-k-mooney | prometheanfire: i think i have fixed the lxml issue locally | 22:13 |
*** trident has joined #openstack-nova | 22:13 | |
*** dpawlik has quit IRC | 22:15 | |
*** rcernin has joined #openstack-nova | 22:15 | |
sean-k-mooney | fungi: ya that looks fine | 22:15 |
sean-k-mooney | * cve description | 22:15 |
*** markvoelker has joined #openstack-nova | 22:15 | |
fungi | thanks again sean-k-mooney! | 22:16 |
*** markvoelker has quit IRC | 22:20 | |
openstackgerrit | sean mooney proposed openstack/nova master: fix lxml compatiblity issues https://review.opendev.org/678964 | 22:25 |
sean-k-mooney | zigo: efried prometheanfire ^ | 22:25 |
sean-k-mooney | i think that should fix it | 22:25 |
zigo | sean-k-mooney: Nice ! :) | 22:27 |
zigo | I've already blacklisted the unit tests in my last upload to Sid though, but if it passes, I'll add your patch instead. | 22:27 |
* zigo is currently rebuilding Ceph Nautilus for Buster. | 22:27 | |
*** mlavalle has quit IRC | 22:29 | |
efried | sean-k-mooney: assertXmlEqual is the one-step alias for what you're doing in there. | 22:29 |
sean-k-mooney | yep | 22:30 |
sean-k-mooney | i knew that was a thing too but blanked when writing it | 22:30 |
sean-k-mooney | but i'll fix that tomorrow | 22:30 |
sean-k-mooney | have people seen "ValueError: invalid literal for int() with base 10: '12.1'" | 22:37 |
sean-k-mooney | oslo_utils imageutils | 22:37 |
sean-k-mooney | https://zuul.opendev.org/t/openstack/build/177a7183dd234f68886e736ca4d82cd5/log/compute/logs/screen-n-cpu.txt.gz?severity=4#1367 | 22:40 |
sean-k-mooney | apparently that is a thing | 22:41 |
sean-k-mooney | >>> int('12.1') | 22:41 |
sean-k-mooney | Traceback (most recent call last): | 22:41 |
sean-k-mooney | File "<stdin>", line 1, in <module> | 22:42 |
sean-k-mooney | ValueError: invalid literal for int() with base 10: '12.1' | 22:42 |
efried | I think it may be telling you that '12.1' is an invalid literal for int() with base 10. | 22:43 |
sean-k-mooney | this is the oslo code https://github.com/openstack/oslo.utils/blob/master/oslo_utils/imageutils.py#L108 | 22:43 |
sean-k-mooney | they are trying to cast what i assume is a string to an int | 22:44 |
sean-k-mooney | int('12') works | 22:44 |
sean-k-mooney | but it does not truncate | 22:45 |
efried | I'm guessing the caller either omitted units or forgot to convert or forgot to round. | 22:45 |
*** gbarros has joined #openstack-nova | 22:45 | |
sean-k-mooney | well this is the nova call site https://github.com/openstack/nova/blob/master/nova/virt/images.py#L95 | 22:46 |
sean-k-mooney | and out is the standard out from running "'env', 'LC_ALL=C', 'LANG=C', 'qemu-img', 'info', path" | 22:47 |
*** avolkov has quit IRC | 22:47 | |
sean-k-mooney | so either the qemu output has changed to include decimal | 22:47 |
sean-k-mooney | or this has always been broken | 22:47 |
sean-k-mooney | i think qemu has changed. it's probably both | 22:48 |
sean-k-mooney | that is from an f29 job with the virt preview repo so it's basically the nightly build of qemu and libvirt | 22:48 |
*** brault has joined #openstack-nova | 22:49 | |
sean-k-mooney | oslo does not handle that input correctly and we pass the output from qemu directly | 22:49 |
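
The failure above comes down to `int()` rejecting a decimal string rather than truncating it. A minimal sketch of a more tolerant parse follows, assuming the value is a magnitude with an optional binary unit suffix. `parse_size` and its unit table are hypothetical and are not the oslo.utils implementation; the exact qemu-img output that triggered the error is not shown in this log.

```python
import re

# Hypothetical unit table; assumes binary (1024-based) multiples.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}

# Magnitude with optional decimals, optional unit letter, optional "B"/"iB".
_SIZE_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([KMGT]?)(?:i?B)?\s*$", re.IGNORECASE)


def parse_size(text):
    """Hypothetical helper: parse '12', '12.1' or '1.5 GiB' into whole bytes."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError("unrecognised size: %r" % (text,))
    magnitude, unit = match.group(1), match.group(2).upper()
    # Go through float first so a decimal magnitude does not raise;
    # int() then truncates the scaled value toward zero.
    return int(float(magnitude) * _UNITS[unit])
```

Going through `float` first is what avoids the `ValueError`; whether truncating or rounding is the right behaviour for a reported disk size is a separate decision for the caller.
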
*** brault has quit IRC | 22:53 | |
*** tkajinam has joined #openstack-nova | 23:06 | |
*** markvoelker has joined #openstack-nova | 23:15 | |
*** markvoelker has quit IRC | 23:20 | |
*** prometheanfire has quit IRC | 23:33 | |
*** prometheanfire has joined #openstack-nova | 23:33 | |
*** macz has quit IRC | 23:36 | |
*** tbachman has joined #openstack-nova | 23:46 | |
*** markvoelker has joined #openstack-nova | 23:55 |