*** macza has joined #openstack-nova | 00:33 | |
*** macza has quit IRC | 00:37 | |
*** gyee has quit IRC | 00:42 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Retry on MessagingTimeout to init compute RPC API during n-cpu start https://review.openstack.org/597330 | 00:44 |
---|---|---|
*** mriedem has quit IRC | 00:44 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Retry on MessagingTimeout to init compute RPC API during n-cpu start https://review.openstack.org/597330 | 00:46 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Retry on MessagingTimeout to init compute RPC API during n-cpu start https://review.openstack.org/597330 | 00:48 |
*** tetsuro has joined #openstack-nova | 00:52 | |
*** liuyulong has joined #openstack-nova | 00:56 | |
*** donghm has joined #openstack-nova | 01:02 | |
*** itlinux is now known as itlinux-away | 01:09 | |
*** alex_xu has joined #openstack-nova | 01:14 | |
*** fungi has quit IRC | 01:18 | |
*** imacdonn has quit IRC | 01:18 | |
*** fungi has joined #openstack-nova | 01:19 | |
*** imacdonn has joined #openstack-nova | 01:19 | |
*** odyssey4me has quit IRC | 01:19 | |
*** dosaboy has quit IRC | 01:19 | |
*** zioproto has quit IRC | 01:19 | |
*** beagles has quit IRC | 01:19 | |
*** dtantsur|afk has quit IRC | 01:19 | |
*** beagles has joined #openstack-nova | 01:19 | |
*** odyssey4me has joined #openstack-nova | 01:20 | |
*** itlinux-away is now known as itlinux | 01:21 | |
*** dtantsur has joined #openstack-nova | 01:22 | |
*** itlinux is now known as itlinux-away | 01:22 | |
*** gbarros has joined #openstack-nova | 01:22 | |
*** itlinux-away is now known as itlinux | 01:24 | |
*** itlinux is now known as itlinux-away | 01:24 | |
*** yonglihe has quit IRC | 01:25 | |
*** Bhujay has joined #openstack-nova | 01:25 | |
*** yonglihe has joined #openstack-nova | 01:25 | |
*** itlinux-away is now known as itlinux | 01:27 | |
*** itlinux is now known as itlinux-away | 01:27 | |
*** itlinux-away is now known as itlinux | 01:28 | |
*** itlinux is now known as itlinux-away | 01:29 | |
*** dpawlik has joined #openstack-nova | 01:29 | |
*** dpawlik has quit IRC | 01:33 | |
*** erlon has quit IRC | 01:35 | |
*** Dinesh_Bhor has joined #openstack-nova | 01:45 | |
*** hongbin has joined #openstack-nova | 01:48 | |
*** ujjain has quit IRC | 01:49 | |
*** pas-ha has quit IRC | 01:49 | |
*** simondodsley_ has quit IRC | 01:49 | |
*** ianw has quit IRC | 01:49 | |
*** johanssone has quit IRC | 01:49 | |
*** andreykurilin has quit IRC | 01:49 | |
*** zigo has quit IRC | 01:49 | |
*** spsurya has quit IRC | 01:49 | |
*** jbernard has quit IRC | 01:49 | |
*** geekinutah has quit IRC | 01:49 | |
*** rm_work has quit IRC | 01:49 | |
*** jbernard has joined #openstack-nova | 01:49 | |
*** andreykurilin has joined #openstack-nova | 01:49 | |
*** itlinux-away is now known as itlinux | 01:51 | |
*** itlinux is now known as itlinux-away | 01:51 | |
*** ccdevil has joined #openstack-nova | 01:52 | |
*** ujjain has joined #openstack-nova | 01:53 | |
*** ianw has joined #openstack-nova | 01:53 | |
*** rm_work has joined #openstack-nova | 01:55 | |
*** bhagyashris has joined #openstack-nova | 01:56 | |
*** ccdevil has left #openstack-nova | 01:57 | |
*** lei-zh has joined #openstack-nova | 01:57 | |
*** Dinesh_Bhor has quit IRC | 01:58 | |
*** lei-zh has quit IRC | 02:00 | |
*** lei-zh has joined #openstack-nova | 02:00 | |
*** itlinux-away is now known as itlinux | 02:01 | |
*** ianw has quit IRC | 02:03 | |
*** ianw has joined #openstack-nova | 02:04 | |
*** vishakha has quit IRC | 02:08 | |
*** Dinesh_Bhor has joined #openstack-nova | 02:09 | |
*** itlinux is now known as itlinux-away | 02:11 | |
*** psachin has joined #openstack-nova | 02:16 | |
*** hamzy has quit IRC | 02:17 | |
*** hamzy has joined #openstack-nova | 02:18 | |
*** itlinux-away is now known as itlinux | 02:29 | |
*** itlinux is now known as itlinux-away | 02:29 | |
*** sapd1 has joined #openstack-nova | 02:34 | |
*** openstack has joined #openstack-nova | 02:51 | |
*** zzzeek has joined #openstack-nova | 02:52 | |
*** ChanServ sets mode: +o openstack | 02:52 | |
*** itlinux-away is now known as itlinux | 02:53 | |
*** itlinux is now known as itlinux-away | 02:53 | |
*** itlinux-away is now known as itlinux | 02:54 | |
*** manjeets has quit IRC | 02:55 | |
*** dtroyer has joined #openstack-nova | 02:57 | |
*** jhesketh has joined #openstack-nova | 02:57 | |
*** openstackstatus has joined #openstack-nova | 03:01 | |
*** ChanServ sets mode: +v openstackstatus | 03:01 | |
*** psachin has quit IRC | 03:13 | |
*** bhagyashris has quit IRC | 03:14 | |
*** psachin has joined #openstack-nova | 03:14 | |
*** hongbin has quit IRC | 03:51 | |
*** Dinesh_Bhor has quit IRC | 03:52 | |
*** spsurya has joined #openstack-nova | 03:53 | |
*** nicolasbock has quit IRC | 03:58 | |
*** udesale has joined #openstack-nova | 03:59 | |
*** tetsuro has quit IRC | 04:04 | |
*** tetsuro has joined #openstack-nova | 04:26 | |
*** dpawlik has joined #openstack-nova | 04:29 | |
*** Dinesh_Bhor has joined #openstack-nova | 04:31 | |
*** dpawlik has quit IRC | 04:49 | |
*** dpawlik has joined #openstack-nova | 04:50 | |
*** dpawlik has quit IRC | 04:56 | |
*** tetsuro has quit IRC | 05:04 | |
*** janki has joined #openstack-nova | 05:09 | |
*** itlinux has quit IRC | 05:31 | |
*** ratailor has joined #openstack-nova | 05:35 | |
*** macza has joined #openstack-nova | 05:36 | |
*** macza has quit IRC | 05:40 | |
*** maciejjozefczyk has joined #openstack-nova | 05:40 | |
*** maciejjozefczyk has quit IRC | 05:41 | |
*** hongda has joined #openstack-nova | 05:44 | |
*** hongda has quit IRC | 05:45 | |
*** alexchadin has joined #openstack-nova | 05:47 | |
*** links has joined #openstack-nova | 05:48 | |
*** Luzi has joined #openstack-nova | 05:51 | |
openstackgerrit | zhangyangyang proposed openstack/nova master: Replace assertRaisesRegexp with assertRaisesRegex https://review.openstack.org/597378 | 05:54 |
*** dpawlik has joined #openstack-nova | 06:01 | |
*** ccamacho has joined #openstack-nova | 06:03 | |
*** dpawlik has quit IRC | 06:03 | |
*** dpawlik has joined #openstack-nova | 06:03 | |
*** dpawlik has quit IRC | 06:04 | |
*** dpawlik has joined #openstack-nova | 06:04 | |
*** BlackDex has joined #openstack-nova | 06:05 | |
*** udesale has quit IRC | 06:07 | |
openstackgerrit | Yikun Jiang (Kero) proposed openstack/nova master: [placement] Use osloutils uuidsentinel https://review.openstack.org/594144 | 06:15 |
*** sahid has joined #openstack-nova | 06:18 | |
*** alexchadin has quit IRC | 06:26 | |
*** alexchadin has joined #openstack-nova | 06:33 | |
*** adrianc has joined #openstack-nova | 06:35 | |
*** markvoelker has joined #openstack-nova | 06:39 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Fix a broken conf file description in networking doc https://review.openstack.org/597391 | 06:47 |
*** kashyap has joined #openstack-nova | 06:47 | |
*** takashin has left #openstack-nova | 06:48 | |
*** pcaruana has joined #openstack-nova | 06:50 | |
*** maciejjozefczyk has joined #openstack-nova | 06:52 | |
*** tssurya has joined #openstack-nova | 06:53 | |
*** udesale has joined #openstack-nova | 06:55 | |
*** rcernin has quit IRC | 07:00 | |
*** maciejjozefczyk has quit IRC | 07:11 | |
*** maciejjozefczyk has joined #openstack-nova | 07:13 | |
*** takamatsu has quit IRC | 07:16 | |
*** ratailor_ has joined #openstack-nova | 07:19 | |
*** ratailor has quit IRC | 07:20 | |
*** moshele has joined #openstack-nova | 07:20 | |
moshele | stephenfin: hi, can you review https://review.openstack.org/#/c/595592/ ? | 07:21 |
*** udesale has quit IRC | 07:31 | |
*** crazik has left #openstack-nova | 07:32 | |
*** ttsiouts has joined #openstack-nova | 07:45 | |
*** ttsiouts has quit IRC | 07:49 | |
*** threestrands has quit IRC | 07:50 | |
*** jpena|off is now known as jpena | 07:51 | |
*** zigo has joined #openstack-nova | 07:56 | |
openstackgerrit | Merged openstack/nova master: Mention (unused) RP generation in POST /allocs/{c} https://review.openstack.org/597304 | 07:57 |
stephenfin | moshele: I can | 08:07 |
openstackgerrit | Stephen Finucane proposed openstack/nova stable/rocky: Don't use '_TransactionContextManager._async' https://review.openstack.org/597421 | 08:08 |
stephenfin | efried: For when you're about https://review.openstack.org/597421 | 08:09 |
*** owalsh_ is now known as owalsh | 08:10 | |
*** ccamacho has quit IRC | 08:13 | |
*** ccamacho has joined #openstack-nova | 08:14 | |
*** ttsiouts has joined #openstack-nova | 08:16 | |
*** alexchadin has quit IRC | 08:27 | |
*** cdent has joined #openstack-nova | 08:39 | |
*** Dinesh_Bhor has quit IRC | 08:51 | |
*** ccamacho has quit IRC | 08:55 | |
*** ccamacho has joined #openstack-nova | 08:56 | |
*** Dinesh_Bhor has joined #openstack-nova | 08:56 | |
*** dpawlik has quit IRC | 09:03 | |
*** dpawlik has joined #openstack-nova | 09:04 | |
stephenfin | moshele: Picked myself up a ConnectX-4 NIC and things are slightly...different to what I'm used to | 09:05 |
stephenfin | moshele: I'm seeing "No net device was found for VF xxx" in the logs. I assume there's something I need to do to resolve this? | 09:05 |
*** Bhujay has quit IRC | 09:10 | |
kosamara | efried: I'm here now | 09:11 |
*** moshele has quit IRC | 09:12 | |
*** moshele has joined #openstack-nova | 09:15 | |
openstackgerrit | Merged openstack/nova master: [placement] Add /reshaper handler for POST https://review.openstack.org/576927 | 09:22 |
openstackgerrit | Merged openstack/nova master: Document no content on POST /reshaper 204 https://review.openstack.org/596494 | 09:22 |
*** dpawlik has quit IRC | 09:24 | |
*** maciejjozefczyk has quit IRC | 09:25 | |
*** dpawlik has joined #openstack-nova | 09:28 | |
*** maciejjozefczyk has joined #openstack-nova | 09:28 | |
*** moshele has quit IRC | 09:28 | |
*** holser_ has joined #openstack-nova | 09:30 | |
openstackgerrit | huanhongda proposed openstack/nova master: List soft-deleted instances by "--status" option https://review.openstack.org/597434 | 09:31 |
*** priteau has joined #openstack-nova | 09:46 | |
*** kosamara has quit IRC | 09:50 | |
*** sambetts_ is now known as sambetts|afk | 09:51 | |
*** udesale has joined #openstack-nova | 09:52 | |
*** mdbooth has joined #openstack-nova | 09:55 | |
mdbooth | lyarwood: Do you happen to remember why we're running qemu-img on volumes? | 09:55 |
mdbooth | lyarwood: I'm assuming that's the context of the MIN_LIBVIRT_MULTIATTACH thing? | 09:55 |
*** adrianc has quit IRC | 09:57 | |
lyarwood | mdbooth: I'm not sure that we are on volumes, I just assumed the MIN_LIBVIRT_MULTIATTACH downstream thing we were talking about was just a locking bug between two domains using the same disk etc | 09:59 |
*** kosamara has joined #openstack-nova | 09:59 | |
*** ttsiouts has quit IRC | 10:07 | |
mdbooth | lyarwood: Ok, so it's not just qemu-img? | 10:12 |
*** davidsha has joined #openstack-nova | 10:14 | |
*** adrianc has joined #openstack-nova | 10:14 | |
*** ratailor_ has quit IRC | 10:23 | |
*** psachin has quit IRC | 10:28 | |
*** adrianc_ has joined #openstack-nova | 10:34 | |
*** adrianc has quit IRC | 10:34 | |
*** moshele has joined #openstack-nova | 10:38 | |
openstackgerrit | Chen proposed openstack/nova master: Fix filter servers with SOFT_DELETED status https://review.openstack.org/597443 | 10:40 |
*** moshele has quit IRC | 10:46 | |
*** hshiina has joined #openstack-nova | 10:46 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova-specs master: Placement: any traits in allocation_candidate query https://review.openstack.org/565730 | 10:46 |
openstackgerrit | Balazs Gibizer proposed openstack/nova-specs master: Placement: support mixing required traits with any traits https://review.openstack.org/565741 | 10:46 |
openstackgerrit | Radoslav Gerganov proposed openstack/nova master: doc: add info how to troubleshoot vmware specific problems https://review.openstack.org/597446 | 10:49 |
*** maciejjozefczyk has quit IRC | 10:49 | |
*** dave-mccowan has joined #openstack-nova | 10:50 | |
*** maciejjozefczyk has joined #openstack-nova | 10:58 | |
*** maciejjozefczyk has quit IRC | 10:59 | |
*** macza has joined #openstack-nova | 11:01 | |
*** udesale has quit IRC | 11:04 | |
*** macza has quit IRC | 11:06 | |
*** Dinesh_Bhor has quit IRC | 11:08 | |
*** brinzhang has quit IRC | 11:09 | |
*** nicolasbock has joined #openstack-nova | 11:11 | |
*** lpetrut has joined #openstack-nova | 11:13 | |
*** maciejjozefczyk has joined #openstack-nova | 11:16 | |
*** ttsiouts has joined #openstack-nova | 11:20 | |
*** dpawlik has quit IRC | 11:21 | |
*** Dinesh_Bhor has joined #openstack-nova | 11:31 | |
*** Dinesh_Bhor has quit IRC | 11:31 | |
*** moshele has joined #openstack-nova | 11:37 | |
*** jpena is now known as jpena|lunch | 11:37 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Transform missing delete notifications https://review.openstack.org/410297 | 11:40 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Send soft_delete from context manager https://review.openstack.org/476459 | 11:40 |
openstackgerrit | Merged openstack/nova master: Refix disk size during live migration with disk over-commit https://review.openstack.org/536351 | 11:42 |
openstackgerrit | Merged openstack/nova master: Fix a broken conf file description in networking doc https://review.openstack.org/597391 | 11:42 |
*** alex_xu has quit IRC | 11:49 | |
*** fanzhang_ has quit IRC | 11:50 | |
openstackgerrit | Jay Pipes proposed openstack/os-traits master: clean up CUDA traits https://review.openstack.org/597170 | 11:54 |
*** vivsoni has quit IRC | 11:58 | |
kosamara | jaypipes: I think a sentence spilled over | 11:58 |
jaypipes | kosamara: crap. | 11:58 |
kosamara | I'm doing a change inline | 11:58 |
*** vivsoni has joined #openstack-nova | 11:59 | |
openstackgerrit | Jay Pipes proposed openstack/os-traits master: clean up CUDA traits https://review.openstack.org/597170 | 11:59 |
jaypipes | kosamara: done :) | 12:00 |
openstackgerrit | Konstantinos Samaras-Tsakiris proposed openstack/os-traits master: clean up CUDA traits https://review.openstack.org/597170 | 12:03 |
kosamara | jaypipes: ok? | 12:04 |
*** vivsoni has quit IRC | 12:05 | |
jaypipes | kosamara: ty sir! | 12:05 |
jaypipes | kosamara: I must have badly fat-fingered the copy-paste in vim! :) | 12:05 |
*** vivsoni has joined #openstack-nova | 12:05 | |
*** vivsoni has quit IRC | 12:06 | |
*** janki has quit IRC | 12:06 | |
kosamara | haha! | 12:06 |
*** vivsoni has joined #openstack-nova | 12:07 | |
*** ccamacho has quit IRC | 12:07 | |
*** ccamacho has joined #openstack-nova | 12:09 | |
*** beagles has quit IRC | 12:12 | |
*** beagles has joined #openstack-nova | 12:13 | |
*** dpawlik has joined #openstack-nova | 12:14 | |
*** udesale has joined #openstack-nova | 12:14 | |
*** udesale has quit IRC | 12:16 | |
*** udesale has joined #openstack-nova | 12:19 | |
*** udesale has quit IRC | 12:19 | |
*** udesale has joined #openstack-nova | 12:28 | |
*** udesale has quit IRC | 12:28 | |
*** janki has joined #openstack-nova | 12:30 | |
*** udesale has joined #openstack-nova | 12:31 | |
*** mchlumsky has joined #openstack-nova | 12:35 | |
*** erlon has joined #openstack-nova | 12:36 | |
*** mriedem has joined #openstack-nova | 12:42 | |
openstackgerrit | Chris Dent proposed openstack/nova master: DNM: [placement] Make _ensure_aggregate context not independent https://review.openstack.org/597486 | 12:42 |
*** jpena|lunch is now known as jpena | 12:46 | |
*** eharney has joined #openstack-nova | 12:47 | |
*** awaugama has joined #openstack-nova | 12:48 | |
moshele | mriedem: can you review https://review.openstack.org/#/c/595592/ ? we updated the macvtap and SR-IOV CIs with rx_queue_size/tx_queue_size and they are both passing with this patch | 12:53 |
zigo | I'm currently trying to validate Rocky Debian packages. | 12:56 |
zigo | When I try to boot a new instance, I got the scheduler spitting in the logs: | 12:57 |
zigo | Got no allocation candidates from the Placement API. This could be due to insufficient resources or a temporary occurrence as compute nodes start up. select_destinations /usr/lib/python3/dist-packages/nova/scheduler/manager.py:150 | 12:57 |
zigo | What should I look into then? | 12:57 |
zigo | I normally do have enough resources on that machine ... | 12:57 |
zigo | jaypipes: ^ Help ? :) | 12:57 |
zigo | Is there a placement client somehow? | 12:58 |
*** ccamacho has quit IRC | 12:58 | |
*** ccamacho has joined #openstack-nova | 12:59 | |
*** dtantsur is now known as dtantsur|bbl | 13:04 | |
openstackgerrit | Chen proposed openstack/nova master: Fix filter server list with SOFT_DELETED status https://review.openstack.org/597443 | 13:04 |
*** pcaruana has quit IRC | 13:04 | |
cdent | zigo: there's an osc-placement plugin for the openstack client | 13:15 |
cdent | zigo: did you do a discover_hosts? | 13:15 |
cdent | but more likely it sounds like your compute node hasn't reported resources yet | 13:16 |
cdent | 'openstack resource provider list' might work | 13:16 |
zigo | cdent: It's puppet-openstack that I'm running, so normally, yeah ... | 13:16 |
cdent | s/work/provide some insight/ | 13:16 |
zigo | Thanks, trying. | 13:16 |
zigo | cdent: So, I should package that osc-placement in Debian then? | 13:17 |
mriedem | https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-list | 13:17 |
mriedem | packaging osc-placement would be nice for anyone that wants to deploy nova yeah | 13:17 |
mriedem | scheduling in nova has been critically dependent on placement since ocata | 13:17 |
mriedem | if this is a single-node setup, there should be 1 resource provider for the compute node | 13:18 |
zigo | discover_hosts was run, so that's not the issue. | 13:18 |
zigo | I'm doing the packaging right away for placement client. | 13:18 |
zigo | Why was it called osc-placement, rather than python-placementclient btw? | 13:19 |
mriedem | b/c it's an openstack client plugin | 13:19 |
mriedem | and that's the naming convention | 13:19 |
mriedem | it's not a python API binding / sdk type thing like novaclient | 13:19 |
zigo | mriedem: So it will be renamed once placement is extracted from Nova? | 13:20 |
mriedem | no | 13:20 |
mriedem | it's only cli, not any python bindings | 13:20 |
zigo | ok | 13:20 |
mriedem | there is no python binding client for placement, maybe openstacksdk is working on one, would have to ask mordred or edleafe | 13:20 |
mriedem | i don't see a placement dir in the sdk yet https://github.com/openstack/openstacksdk/tree/master/openstack | 13:24 |
mriedem | not sure about plans for one though | 13:24 |
zigo | FYI, I don't bother doing py2 clients anymore. :) | 13:26 |
zigo | sphinx>=1.2.1,!=1.3b1,<1.4 # BSD <--- test-requirements.txt needs upgrade ... | 13:27 |
zigo | :P | 13:27 |
*** ttsiouts has quit IRC | 13:34 | |
mriedem | test-reqs in osc-placement? | 13:36 |
zigo | yep | 13:36 |
mriedem | please report a bug https://bugs.launchpad.net/placement-osc-plugin | 13:37 |
edleafe | mriedem: AFAIK that is not being worked on | 13:38 |
mriedem | edleafe: yeah i didn't see any open reviews | 13:38 |
*** ttsiouts has joined #openstack-nova | 13:40 | |
openstackgerrit | Matt Riedemann proposed openstack/osc-placement master: Random names for functional tests https://review.openstack.org/542745 | 13:43 |
zigo | https://bugs.launchpad.net/placement-osc-plugin/+bug/1789649 | 13:45 |
openstack | Launchpad bug 1789649 in placement-osc-plugin "test-requirements out of date" [Undecided,New] | 13:45 |
mriedem | zigo: thanks | 13:45 |
zigo | mriedem: Now that I have the package built and installed, how do I test if placement has resources ? | 13:46 |
mriedem | as an admin, | 13:47 |
mriedem | openstack resource provider lst | 13:47 |
mriedem | *list | 13:47 |
zigo | Ok, found my host, then? | 13:47 |
zigo | Show it? | 13:48 |
mriedem | inventory is more useful | 13:48 |
zigo | Hmm... not much shows up then... | 13:48 |
mriedem | openstack resource provider inventory show <rp_uuid> | 13:48 |
mriedem | oh wait | 13:48 |
mriedem | openstack resource provider inventory list <rp_uuid> | 13:48 |
mriedem | https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-inventory-list | 13:48 |
mriedem | you should see VCPU, MEMORY_MB and DISK_GB | 13:49 |
zigo | It works... | 13:49 |
zigo | http://paste.openstack.org/show/729051/ | 13:49 |
zigo | But then? How come placement didn't find resources? | 13:49 |
mriedem | let's make sure the compute node in nova is matching that same uuid | 13:50 |
mriedem | from nova cli, | 13:50 |
mriedem | as admin, | 13:50 |
mriedem | nova hypervisor-list | 13:50 |
zigo | I see my host, and it's up and enabled. | 13:50 |
zigo | I did all this before asking! :P | 13:50 |
mriedem | does its id equal f9716941-356f-4a2e-b5ea-31c3c1630892 ? | 13:51 |
zigo | Yup. | 13:51 |
mriedem | ok what type of flavor did you use when you tried creating the server? | 13:51 |
zigo | Is it normal that I get allocation_ratio 0.0 for all resources? | 13:51 |
mriedem | so, | 13:51 |
mriedem | that sounds exactly like the same problem the xenserver CI guys are having in the ML right now | 13:51 |
mriedem | and no, | 13:51 |
mriedem | cpu should be 16.0, | 13:52 |
zigo | 256 RAM, 5 GB HDD, 1 VCPU | 13:52 |
mriedem | ram and disk should also be > 0 | 13:52 |
mriedem | jaypipes: efried: naichuans: ^ | 13:52 |
*** markvoelker has quit IRC | 13:53 | |
mriedem | zigo: using libvirt right? | 13:53 |
zigo | mriedem: Yeah, normal qemu in a virtualbox right now. | 13:54 |
zigo | mriedem: Also using it in the OpenStack CI with puppet-openstack stuff ... | 13:54 |
zigo | mriedem: Is there a way to force something in the allocation_ratio to fix things? | 13:55 |
mriedem | yes via osc-placement, but i'd have to look it up quick | 13:55 |
*** lbragstad has quit IRC | 13:56 | |
mriedem | https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-inventory-class-set | 13:56 |
mriedem | so: openstack resource provider inventory class set --allocation_ratio 16.0 --total 4 f9716941-356f-4a2e-b5ea-31c3c1630892 VCPU | 13:56 |
mriedem | cdent: ever remember talking about making allocation_ratio a minimum of 1.0 in placement? | 13:57 |
mriedem | allowing anything to set that to 0.0 seems like a bad idea | 13:57 |
*** dtantsur|bbl is now known as dtantsur | 13:58 | |
gibi | mriedem: having < 1.0 could make sense for handling overhead, but I agree that 0.0 doesn't make sense | 13:59 |
zigo | mriedem: The max_unit for VCPUs looks funny now: http://paste.openstack.org/show/729052/ | 13:59 |
cdent | mriedem: the only vague tickle I have in my mind was "if you have a float, how do you set a min_unit that is the min-est" | 13:59 |
zigo | Shouldn't it be num-of-vcpu * allocation_ratio ? | 13:59 |
cdent | What i'm not clear on is how/what is coming along with these 0s | 13:59 |
*** Luzi has quit IRC | 13:59 | |
mriedem | cdent: i guess it couldn't be done with jsonschema and would have to be in python | 14:00 |
mriedem | if allocation_ratio <= 0.0: 400 | 14:00 |
mriedem | or use a custom jsonschema validator | 14:00 |
zigo | Can I consider there's a bug somewhere, and wait for the fix? :) | 14:00 |
mriedem | so clearly something in nova is creating resource providers with 0.0 allocation ratios from nova.conf, which is the default, but it should be getting the values from the ComputeNode object | 14:00 |
cdent | I can't remember any specific conversation about this, but I wonder if 0 was allowed as a way to disable inventory? | 14:01 |
mriedem | zigo: yes | 14:01 |
mriedem | zigo: naichuans: if you haven't already, it would be great to have a nova bug for tracking this | 14:01 |
cdent | (while still keeping a record of the "actual" inventory) | 14:01 |
mriedem | yeah idk, maybe jaypipes remembers | 14:01 |
mriedem | so https://github.com/openstack/nova/blob/6522ea3ecfe99cca3fb33258b11e5a1f34e6e8f0/nova/compute/resource_tracker.py#L84 in the RT is what is supposed to set the default allocation ratio for each inventory on the compute node provider | 14:02 |
*** knikolla has joined #openstack-nova | 14:02 | |
mriedem | assuming the virt driver doesn't provide an allocation ratio override, which neither libvirt nor xenapi do | 14:02 |
zigo | mriedem: cdent: What is max_unit supposed to represent? Isn't it total * allocation_ratio? | 14:02 |
jaypipes | mriedem: sorry, reading back was on a call | 14:02 |
cdent | zigo: no, it is an expression of the largest amount any individual resource allocation can allocate | 14:02 |
jaypipes | zigo: max_unit is not total * allocation_ratio, no | 14:02 |
cdent | so in the case of something like disk or vcpu it is generally the max physical amount | 14:03 |
cdent | as you don't want a single instance to have more than is physically available, regardless of allocation ratio | 14:03 |
zigo | Ah, great, meaning I can have hosts with up to 2 billion VCPUs ! :) | 14:03 |
zigo | I can start writing another bug, ok. | 14:04 |
cdent | total: the real physical amount, allocation ratio: multiplier for over or under commit, min_unit: smallest individual request, max_unit: largest individual request, reserved: what the system is using for itself | 14:04 |
mriedem | i wonder if there is a bug in ProviderTree.update_inventory where it's not updating the allocation_ratio when we create/update the provider in placement | 14:06 |
mriedem | zigo: you're testing with rocky right? | 14:07 |
*** lbragstad has joined #openstack-nova | 14:07 | |
*** mlavalle has joined #openstack-nova | 14:08 | |
zigo | https://bugs.launchpad.net/nova/+bug/1789654 | 14:10 |
openstack | Launchpad bug 1789654 in OpenStack Compute (nova) "placement allocation_ratio initialized with 0.0" [Undecided,New] | 14:10 |
zigo | mriedem: Correct ! | 14:10 |
zigo | mriedem: Fresh packages just built this and last week in a record time. | 14:11 |
zigo | It took me less than 2 weeks, this time ... :P | 14:11 |
openstackgerrit | Merged openstack/osc-placement master: Update reno for stable/rocky https://review.openstack.org/586115 | 14:11 |
mriedem | so i thought we used to log something when inventory changed on a provider... | 14:13 |
mriedem | not seeing that | 14:13 |
mriedem | from one of the failed xenserver ci logs, the RP is created here http://logs.openstack.org/41/590041/17/check/tempest-full/b3f9ddd/controller/logs/screen-n-cpu.txt.gz#_Aug_27_14_18_25_580517 | 14:13 |
mriedem | then the generation changes twice but we don't log why | 14:14 |
zigo | mriedem: If you want the full logs and everything, you can find it here: https://review.openstack.org/#/c/597175/ | 14:14 |
mriedem | any of those failed jobs? | 14:14 |
zigo | That commit just switches repo from queens to rocky for Debian and puppet-openstack. | 14:14 |
zigo | Yep. | 14:15 |
zigo | The first one for example. | 14:15 |
mriedem | yup so in this case we create the RP here http://logs.openstack.org/75/597175/1/check/puppet-openstack-integration-4-scenario001-tempest-debian-stable-luminous/fd38fcf/logs/nova/nova-compute.txt.gz#_2018-08-28_17_09_40_686 | 14:15 |
mriedem | here we POST to placement http://logs.openstack.org/75/597175/1/check/puppet-openstack-integration-4-scenario001-tempest-debian-stable-luminous/fd38fcf/logs/nova/nova-placement-api.txt.gz#_2018-08-28_17_09_40_683 | 14:16 |
zigo | mriedem: What I can do is artificially change the default 0.0 allocation ratio in the package. Would you advise me to do that? | 14:16 |
mriedem | here we add inventory http://logs.openstack.org/75/597175/1/check/puppet-openstack-integration-4-scenario001-tempest-debian-stable-luminous/fd38fcf/logs/nova/nova-placement-api.txt.gz#_2018-08-28_17_09_40_794 | 14:17 |
mriedem | zigo: i would probably not recommend that at this time no | 14:18 |
mriedem | i personally would like to figure out what we're dealing with first | 14:18 |
cdent | agreed | 14:18 |
cdent | as far as I can tell the nova-compute is never logging when it sets inventory? | 14:18 |
mriedem | no, it used to | 14:18 |
openstackgerrit | Merged openstack/osc-placement master: Add image link in README.rst https://review.openstack.org/586839 | 14:20 |
efried | stephenfin: Done (though I'm not a stable core): https://review.openstack.org/597421 | 14:20 |
stephenfin | efried: Ah, indeed. Good enough though | 14:20 |
dansmith | mriedem: we did discuss having the compute node not override the allocation ratio set on a provider, only use that value when creating it | 14:21 |
dansmith | did that ever happen? | 14:21 |
dansmith | I don't see those extra values in conf/compute | 14:22 |
mriedem | i don't remember that happening no | 14:22 |
dansmith | https://review.openstack.org/#/c/552105/ | 14:22 |
mriedem | yup, i have unanswered questions in there | 14:23 |
dansmith | yep, just offering evidence that it didn't | 14:23 |
mriedem | efried: would be helpful if provider tree's _update_generation method took an "operation" param or something, | 14:24 |
mriedem | i.e. we're updating the generation b/c inventory was updated or something | 14:24 |
sean-k-mooney | mriedem: i might add a live migration item to the nova-neutron ptg slot. | 14:24 |
sean-k-mooney | mriedem: i have started to address the neutron bugs i have found but looks like there are other nova ones too | 14:24 |
efried | kosamara: I started going through the spec to make some content, but found it's really going to be easier if we can both post our updates to gerrit. | 14:24 |
efried | kosamara: I also wanted to know how much you'd looked at the cyborg project, if at all. | 14:25 |
mriedem | zigo: let me know if/when you have a nova bug posted and i can push some debug patches | 14:25 |
mriedem | so, | 14:25 |
efried | mriedem: I would expect the report client side to be logging what it's doing there. | 14:26 |
zigo | mriedem: Yeah, here: https://bugs.launchpad.net/nova/+bug/1789654 | 14:26 |
openstack | Launchpad bug 1789654 in OpenStack Compute (nova) "placement allocation_ratio initialized with 0.0" [Undecided,New] | 14:26 |
mriedem | my guess is that update_from_provider_tree is somehow not pushing inventory changes which include the updated allocation_ratio b/c of a cache | 14:26 |
openstackgerrit | Merged openstack/osc-placement master: Resource provider examples https://review.openstack.org/553461 | 14:26 |
zigo | mriedem: I can add some debian specific patches to my package, and rerun puppet, if you like. | 14:26 |
efried | mriedem: Ahem, libvirt's update_provider_tree is not setting allocation ratio. | 14:27 |
mriedem | efried: i know | 14:27 |
mriedem | the RT does | 14:27 |
mriedem | see the _normalize method | 14:27 |
efried | no | 14:27 |
zigo | RT stands for what? | 14:27 |
mriedem | ResourceTracker | 14:27 |
efried | that doesn't get run if you implemented update_provider_tree | 14:27 |
mriedem | sure it does | 14:27 |
efried | That said, it should be getting defaulted to 1.0, per handler/inventory | 14:27 |
mriedem | https://github.com/openstack/nova/blob/6522ea3ecfe99cca3fb33258b11e5a1f34e6e8f0/nova/compute/resource_tracker.py#L901 | 14:27 |
mriedem | and we don't log anything in the report client when we go to flush provider tree inventory changes if the provider tree doesn't think inventory has changed | 14:28 |
efried | oh wtf... | 14:28 |
efried | I totally traced this exact thing like last week and was sure we weren't hitting that. /me needs to rework a thing... | 14:28 |
mriedem | if we didn't call _normalize_inventory_from_cn_obj in this case, libvirt/xen/ironic wouldn't ever have allocation_ratio or reserved values set | 14:28 |
*** moshele has quit IRC | 14:29 | |
mriedem | and it looks like in some racy cases we never update the inventory b/c the provider tree cache never thinks there is a change | 14:29 |
openstackgerrit | Konstantinos Samaras-Tsakiris proposed openstack/nova-specs master: Placement model for passthrough devices https://review.openstack.org/591037 | 14:29 |
efried | edmondsw: Redundant goofy code alert --^ | 14:29 |
kosamara | efried: No problem, we'll coordinate. | 14:30 |
efried | mriedem: Are you saying it's "normal" for us to set alloc ratio to 0.0 for a sec, because we expect the next update to hit _normalize and then push the updated inventory? | 14:32 |
kosamara | efried: I had looked at Cyborg in spring, but haven't been following it since. The primary use case that I saw Cyborg enabling was FPGA function-aaS. | 14:32 |
mriedem | efried: i don't know what is pushing the initial inventory data | 14:32 |
efried | kosamara: Yes, that's kind of the initial motivator, but the project is supposed to subsume all device management. | 14:32 |
mriedem | so i can't really say | 14:32 |
mriedem | we don't log anything | 14:32 |
mriedem | so i'm going to push some debug patches so we can .... debug | 14:33 |
efried | ack, let me know if I can help. | 14:33 |
mriedem | my guess is the local provider tree cache is current but remote is not, and we never update | 14:33 |
efried | mriedem: Well, looking at has_inventory_changed, it *should* be paying attention to alloc ratio updates, whether the field exists and is being changed, or doesn't exist and is being added. | 14:39 |
*** maciejjozefczyk has quit IRC | 14:39 | |
efried | and has_inventory_changed being false is the only way we would avoid sending the update down to placement. | 14:40 |
efried | ...barring actual errors which we would see in the compute log | 14:40 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Log the operation when updating generation in ProviderTree https://review.openstack.org/597553 | 14:43 |
efried | mriedem: FYI: http://paste.openstack.org/show/729053/ <== looks fine. | 14:44 |
*** markvoelker has joined #openstack-nova | 14:44 | |
*** lpetrut has quit IRC | 14:45 | |
kosamara | efried: Does that mean that instead of proposing improvements to the way Nova manages devices, we should contribute this functionality to Cyborg, because in the future Cyborg will have exclusive responsibility in that domain? | 14:47 |
efried | kosamara: Well, that's what needs to be discussed, really. | 14:47 |
efried | I am pretty far behind on the massive volume of cyborg specs that are out there | 14:47 |
efried | but I haven't yet seen one where they deal with discovery and whitelisting. | 14:48 |
kosamara | efried: I was going to ask if you have any pointers to similar work there | 14:48 |
efried | It's entirely likely that they've got that proposed and I just haven't gotten to it yet. | 14:48 |
kosamara | and modelling in Placement with RPs? | 14:48 |
efried | oh, yes, that's definitely their plan. | 14:48 |
*** sahid has quit IRC | 14:49 | |
efried | I'm going to try to ask about it in #openstack-cyborg, if you'd like to join me there. | 14:49 |
kosamara | But can Cyborg create RPs now, or it has to happen through nova, like neutron does in the network bandwidth providers spec? | 14:49 |
kosamara | ok | 14:49 |
*** pcaruana has joined #openstack-nova | 14:50 | |
efried | kosamara: Yeah, that's the question. The providers are intended to be created and "owned" by cyborg code, but I'm still not 100% clear whether that's at the behest/prompting of a nova flow or totally independent. | 14:50 |
efried | kosamara: Because obviously somebody has to coordinate those device RPs being parented to the compute node RP. | 14:51 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Making instance listing skipping down cells configurable https://review.openstack.org/592428 | 14:54 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add debug logs for when provider inventory changes https://review.openstack.org/597560 | 14:56 |
mriedem | naichuans: efried: cdent: jaypipes: zigo: ^ hopefully we can learn something from this | 14:56 |
mriedem | zigo: i updated your patch with a depends-on to that nova change | 14:57 |
efried | mriedem: dig | 14:57 |
cdent | mriedem: do we have a way of setting our test environments so they are more like what zigo and naichuans were experiencing? because apparently our test environments are configuring too much to reflect reality? | 14:57 |
mriedem | cdent: xenserver ci uses devstack | 14:58 |
mriedem | it's pretty stock outside of saying use the xen driver rather than libvirt | 14:58 |
zigo | mriedem: cdent: Would it help if I added these patches to my package and re-run puppet? | 14:58 |
mriedem | zigo's is using puppet and non-ubuntu but with libvirt | 14:58 |
mriedem | zigo: so my depends-on nova change won't get pulled into that CI run? | 14:59 |
zigo | non-ubuntu: Debian Stretch ... :P | 14:59 |
cdent | mriedem: I get that. The root of my question is: How come we didn't fail tempest or functional? | 14:59 |
mriedem | cdent: well that's what i'm trying to figure out... | 14:59 |
zigo | deb http://stretch-rocky.debian.net/debian stretch-rocky-backports main + deb http://stretch-rocky.debian.net/debian stretch-rocky-backports-nochange main | 14:59 |
mriedem | we don't configure allocation ratios in nova.conf in devstack | 14:59 |
cdent | I know, I'm not being a dick (at least I hope not), I'm just questioning-out-loud | 15:00 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Add scatter-gather-single-cell utility https://review.openstack.org/594947 | 15:00 |
cdent | it seems that if we know what the difference is between the standard gate tests and e.g. the xen tests, we can make our tests fail and work from that | 15:01 |
mriedem | as far as i know, the xen tests are mostly stock | 15:02 |
mriedem | they did not hard-code the allocation ratios until jay told them to in the ML as a workaround | 15:02 |
mriedem | which makes me think, my debug patch probably won't fail the xen ci now b/c of that... | 15:02 |
*** janki has quit IRC | 15:04 | |
openstackgerrit | Merged openstack/osc-placement master: Random names for functional tests https://review.openstack.org/542745 | 15:04 |
openstackgerrit | Konstantinos Samaras-Tsakiris proposed openstack/nova-specs master: Placement model for passthrough devices https://review.openstack.org/591037 | 15:05 |
openstackgerrit | Chris Dent proposed openstack/nova master: [placement] Make _ensure_aggregate context not independent https://review.openstack.org/597486 | 15:10 |
mriedem | anyone remember where the hell the xenserver CI repo is in github? | 15:11 |
*** ttsiouts has quit IRC | 15:11 | |
mriedem | guessing https://github.com/citrix-openstack/qa | 15:13 |
*** knikolla has quit IRC | 15:14 | |
*** knikolla has joined #openstack-nova | 15:15 | |
*** tbachman has quit IRC | 15:17 | |
*** tbachman has joined #openstack-nova | 15:22 | |
jaypipes | mriedem: sorry? | 15:22 |
*** dklyle has quit IRC | 15:22 | |
*** dklyle has joined #openstack-nova | 15:23 | |
*** manjeets has joined #openstack-nova | 15:23 | |
openstackgerrit | Dan Smith proposed openstack/nova master: DNM: Tester for grenade job https://review.openstack.org/597566 | 15:24 |
*** ttsiouts has joined #openstack-nova | 15:28 | |
*** sahid has joined #openstack-nova | 15:28 | |
openstackgerrit | Chris Dent proposed openstack/nova master: [placement] Make _ensure_aggregate context not independent https://review.openstack.org/597486 | 15:30 |
*** READ10 has joined #openstack-nova | 15:31 | |
dansmith | mriedem: tssurya: melwitt: do we need a cells meeting? my patches merged, I think we got melwitt's from last week as well, and tssurya and I are crushing the down cell stuff | 15:32 |
*** macza has joined #openstack-nova | 15:32 | |
dansmith | and when I say we, I mean tssurya is doing it and I'm throwing tomatoes | 15:32 |
tssurya | I am okay to skip, sorry about the slowness in the spec implementation; it's because I am a little stuck with the US visa stuff for the next week | 15:32 |
tssurya | but it should be settled tomorrow finally! | 15:33 |
mriedem | i don't really have any news; some info is coming in on the cross-cell migration stuff and i'm trying to probe internally (yes wording intended) on how our product team did this for cascading aka cells v1 | 15:33 |
dansmith | tssurya: oh so you'll be in denver yeah? | 15:33 |
tssurya | yes :) | 15:33 |
mriedem | Kevin_Zheng knows more about what our product team did than i ever will | 15:34 |
dansmith | tssurya: that's cool, although I'm disappointed you'll be seeing the least nice venue in our fair country :/ | 15:34 |
tssurya | dansmith: haha really ? its actually gonna be my first time in your country | 15:34 |
dansmith | tssurya: I know, which is why I'm disappointed :/ | 15:35 |
mriedem | if i remember kevin's email correctly, tl;dr was that ports and volumes are accessible across cells for them | 15:35 |
mriedem | colorado is "nice" depending on where you are | 15:35 |
mriedem | out by the airport is not so much | 15:35 |
dansmith | yes, colorado is for sure | 15:35 |
dansmith | but the hotel and area we're in sucks | 15:35 |
tssurya | heh, I booked into the same venue as the conference | 15:35 |
tssurya | oops | 15:35 |
mriedem | everyone did | 15:35 |
dansmith | unless you like living next to a freeway onramp with a big train | 15:36 |
dansmith | tssurya: you have to be there, yeah | 15:36 |
dansmith | tssurya: it's not close to anything, which is part of the problem | 15:36 |
mriedem | i just hope you like ihop | 15:36 |
mriedem | every | 15:36 |
mriedem | morning | 15:36 |
dansmith | mriedem: don't think we're not | 15:36 |
mriedem | oh i know | 15:36 |
mriedem | my stretch pants are on order | 15:36 |
dansmith | we will put 20lbs on tssurya before she's gone | 15:36 |
tssurya | rolf | 15:36 |
dansmith | her family won't recognize her | 15:36 |
tssurya | :P that is gonna be hard considering the amount I eat | 15:37 |
dansmith | tssurya: the denver ptg comes with a complimentary case of type 2 diabetes | 15:37 |
kashyap | LOL | 15:37 |
tssurya | :D | 15:37 |
mriedem | luckily she probably gets free health care in switzerland | 15:37 |
dansmith | they won't know what to do with her | 15:37 |
kashyap | Is the venue so bad? I read something about a train being noisy apparently last time. | 15:37 |
dansmith | they don't have the IHOP antidote | 15:38 |
mriedem | kashyap: that is reportedly fixed | 15:38 |
kashyap | (And you're in the same venue this time -- assuming that problem is fixed) | 15:38 |
kashyap | mriedem: Ah, I see. | 15:38 |
tssurya | you both are making it sound really bad, hopefully you are just kidding | 15:38 |
dansmith | mriedem: med made it sound like maybe not | 15:38 |
dansmith | tssurya: it's pretty bad | 15:38 |
mriedem | i'm bringing my white noise machine either way | 15:38 |
jroll | that's an odd name for your child | 15:38 |
tssurya | :( | 15:38 |
dansmith | tssurya: last year my room had a hole in the wall because they weren't finished with the remodel | 15:38 |
tssurya | really ?! | 15:39 |
tssurya | so it's not really gonna be like the Dublin PTG then.. | 15:39 |
dansmith | tssurya: the foundation got a *smashing* deal.. | 15:39 |
dansmith | tssurya: probably less snow, but no promises | 15:39 |
jroll | I missed the last one in denver, dansmith is making me so excited | 15:39 |
tssurya | heh, | 15:39 |
dansmith | jroll: HOOOOOOOOOONK | 15:40 |
*** tbachman has quit IRC | 15:40 | |
dansmith | jroll: HOOOOOOOOOOOOOOOOOOOOOOOOOOOONK | 15:40 |
jroll | :P | 15:40 |
dansmith | every ten minutes | 15:40 |
dansmith | every | 15:40 |
dansmith | ten | 15:40 |
dansmith | minutes | 15:40 |
tssurya | I will be there on Thursday, so I have like 4 days to go around, maybe I can see the nice part of colorado | 15:40 |
dansmith | sometimes at night I still hear it | 15:40 |
* jroll wonders how much beer it takes to sleep through a train | 15:40 | |
dansmith | jroll: you know it's bad when at checkin they hand you a noise machine and earplugs | 15:41 |
dansmith | literally. | 15:41 |
dansmith | until they ran out of course | 15:41 |
jroll | yeah, so I heard | 15:41 |
edleafe | tssurya: Rocky Mountain National Park may help you if you're missing the Alps | 15:41 |
jroll | I get in monday night. so. I guess I'll bring my own | 15:41 |
tssurya | edleafe: nice I will see if I can go there, just waiting for the visa to be in my hand.. | 15:42 |
mriedem | tssurya: you're in luck! https://www.google.com/search?q=denver+broncos+schedule&ie=utf-8&oe=utf-8&client=firefox-b-1-ab#sie=m;/g/11hdb0xw6z;6;/m/059yj;dt;fp;1 | 15:42 |
mriedem | you can go watch the broncos get destroyed at home! | 15:42 |
tssurya | mriedem: ha! | 15:43 |
mriedem | https://deadspin.com/why-your-team-sucks-2018-denver-broncos-1828078886 | 15:43 |
*** ttsiouts has quit IRC | 15:43 | |
*** fnordahl has joined #openstack-nova | 15:45 | |
cfriesen | mriedem: the "four friends kitchen" in Denver is really good, but a bit further away. | 15:46 |
*** cdent has quit IRC | 15:47 | |
*** r-daneel has joined #openstack-nova | 15:48 | |
*** markvoelker has quit IRC | 15:48 | |
*** tbachman has joined #openstack-nova | 15:50 | |
*** cdent has joined #openstack-nova | 15:51 | |
mriedem | gibi: i think i know what's going on in https://bugs.launchpad.net/nova/+bug/1781648 | 15:54 |
openstack | Launchpad bug 1781648 in OpenStack Compute (nova) "heal_allocations test randomly failing with "ValueError: Field `compute_node_uuid' cannot be None"" [Medium,Confirmed] | 15:54 |
mriedem | we're racing with the caching scheduler starting up and initializing the cache of compute nodes | 15:54 |
mriedem | so by the time we hit the scheduler, the cache is empty | 15:54 |
gibi | mriedem: that sounds like a reasonable explanation. I guess I misunderstood the logs when I stated in the bug that the host creation overlaps with the scheduling | 15:56 |
*** efried is now known as efried_rollin | 15:57 | |
gibi | mriedem: so I guess the fix is that we need to wait in the test a bit | 15:59 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Restart scheduler in TestNovaManagePlacementHealAllocations https://review.openstack.org/597571 | 16:00 |
mriedem | sort of ^ | 16:00 |
mriedem | i thought about also changing this https://github.com/openstack/nova/blob/master/nova/scheduler/caching_scheduler.py#L78 to "if not self.all_host_states:" to handle the empty list case, but that could still be a weird race failure if we have 1 host but not both in the cache | 16:02 |
gibi | mriedem: your proposed start / stop is way better than a simple sleep in the test | 16:02 |
mriedem | i learned it from watching you gibi | 16:02 |
gibi | :) | 16:04 |
*** tssurya has quit IRC | 16:04 | |
melwitt | . | 16:05 |
gibi | mriedem: I'm wondering how many other functional tests use CachingScheduler and therefore might be affected by the same race | 16:05 |
mriedem | gibi: some do, but they might start the scheduler after the computes, i was just starting to audit that | 16:06 |
gibi | mriedem: cool | 16:06 |
*** adrianc_ has quit IRC | 16:10 | |
*** slaweq has quit IRC | 16:11 | |
mriedem | mlavalle: smcginnis: on this cross-cell cold migration thing nova will definitely need some pre-validation of the selected host, | 16:14 |
*** links has quit IRC | 16:14 | |
mriedem | i'm wondering if simply trying to create a volume attachment/port binding on the selected target host *in another cell* would be enough to tell us if that's not going to work for storage/networking | 16:15 |
mriedem | or in the case of port bindings, will neutron only fuss when we try to activate the target host port binding? | 16:15 |
*** slaweq has joined #openstack-nova | 16:16 | |
*** udesale has quit IRC | 16:16 | |
jaypipes | zigo, mriedem: we have a rough timeframe of when this began occurring (the allocation ratio 0 thing...)? | 16:16 |
*** READ10 has quit IRC | 16:16 | |
mriedem | earliest i know of is when naichuans reported it in the ML | 16:17 |
*** READ10 has joined #openstack-nova | 16:17 | |
mlavalle | mriedem: the port binding is the result of asking the mechanism managers if they can bind the port in the indicated host. so the binding process is what you call the pre-validation of the selected host | 16:18 |
melwitt | dansmith: +1 to meeting skippage | 16:18 |
mlavalle | mechanism drivers^^^^ | 16:18 |
mlavalle | so when you say create_port_binding, you are asking the mechanism drivers whether any of them can bind the port in the designated host | 16:19 |
mriedem | mlavalle: and if the network that the port is in doesn't cover that dest host, the port binding creation should fail? | 16:20 |
sean-k-mooney | mriedem: it will fail if none of the ml2 drivers can bind the port to the correct network | 16:20 |
mlavalle | mriedem: correct. the question that the drivers ask themselves is whether they can a segment of the network reachable in that host | 16:20 |
mlavalle | they have a segment^^^^ | 16:21 |
*** slaweq has quit IRC | 16:21 | |
melwitt | can anyone confirm whether PCI and SRIOV PCI devices are treated the same as far as the scheduler is concerned? | 16:21 |
sean-k-mooney | mlavalle: that segment is only relevant for provider networks with the multi-segment network extension, correct? normally it's 1 segment per network | 16:21 |
sean-k-mooney | mlavalle: as in in the pci filter? | 16:22 |
sean-k-mooney | melwitt: ^ | 16:22 |
sean-k-mooney | melwitt: they are more or less treated the same but they do have a different type in the pci manager (type-pci for passthrough and type-PF or type-VF for sriov) | 16:23 |
melwitt | sean-k-mooney: maybe? just in general, for NUMA scheduling, if SRIOV PCI devices are treated differently at all or no | 16:23 |
mlavalle | sean-k-mooney: but regardless, today we can bind a port in multiple cells deployments, right? | 16:23 |
melwitt | sean-k-mooney: I see, thank you | 16:24 |
sean-k-mooney | melwitt: from a numa perspective they are treated the same. we store the numa node info the same way in the db | 16:25 |
sean-k-mooney | mlavalle: yes i think so but never tried it | 16:25 |
*** tbachman has quit IRC | 16:26 | |
mlavalle | sean-k-mooney: so that means that the mechanism drivers are being able to answer the question, can I bind a port here or not? in multiple port bindings, we are asking that same question, it's just that the port is also bound somewhere else | 16:26 |
sean-k-mooney | mlavalle: cells is a nova-only thing. neutron has availability zones. if you had a different neutron availability zone per cell, can a network span neutron availability zones? | 16:26 |
*** READ10 has quit IRC | 16:26 | |
mriedem | i would think, | 16:27 |
sean-k-mooney | mlavalle: well we are talking about cross cell cold migration so just like the live migration case we will have multiple port bindings so from a neutron point of view it should be identical | 16:28 |
mriedem | when we have an attached port, bound to the source host, | 16:28 |
mriedem | nova puts the az on the port binding information, | 16:28 |
mriedem | and neutron would be relying on that to know if we can bind to another host in another a | 16:28 |
mriedem | *az | 16:28 |
mlavalle | sean-k-mooney: that is what I say | 16:28 |
jaypipes | mriedem: I have a sneaking suspicion this patch is the cause: https://github.com/openstack/nova/commit/c9b74bcfa09d11c2046ce1bfb6dd8463b3a2f3b0 | 16:28 |
mriedem | jaypipes: that's only in master | 16:28 |
mriedem | zigo is hitting this in rocky | 16:28 |
sean-k-mooney | mlavalle: yep i am agreeing :) | 16:28 |
mlavalle | sean-k-mooney: LOL | 16:28 |
mriedem | btw, is it strange we don't have a cross_az_attach=False thing for nova/neutron like we have for nova/cinder? | 16:29 |
mlavalle | sean-k-mooney: will you be in Denver? | 16:29 |
sean-k-mooney | mriedem: neutron does not use the nova AZ as part of port binding, just the hostname | 16:29 |
sean-k-mooney | mlavalle: yes i will | 16:29 |
mriedem | so why do we set the device_owner on the port? | 16:29 |
mlavalle | Great! | 16:29 |
mriedem | using the instance az? | 16:29 |
mlavalle | because you want to keep track of where it is, on the Nova side, but I am just speculating | 16:30 |
sean-k-mooney | mriedem: that is a good question to which i do not know the answer | 16:30 |
mriedem | it might be used by the neutron callback code | 16:30 |
jaypipes | mriedem: oh... I thought you said this only recently occurred.. | 16:31 |
mriedem | jaypipes: it is only recently reported | 16:31 |
mriedem | but i also thought about that _update change but it's master only | 16:31 |
mriedem | and zigo said he's hitting it on rocky | 16:31 |
*** jpena is now known as jpena|off | 16:31 | |
sean-k-mooney | mriedem: i'm pretty sure it's not used by neutron at all. my guess is this might be from nova networks and we just kept doing it | 16:31 |
sean-k-mooney | the AZ that is | 16:32 |
jaypipes | mriedem: ack | 16:32 |
mriedem | nova net doesn't have any kind of device_owner thing | 16:32 |
*** davidsha has quit IRC | 16:32 | |
mlavalle | yeah, the device_owner stuff is a Neutron port concept | 16:32 |
sean-k-mooney | mriedem: oh i was going to speculate we might have needed it to do the nova-net multihost thing | 16:32 |
mriedem | bingo http://git.openstack.org/cgit/openstack/neutron/tree/neutron/notifiers/nova.py#n77 | 16:32 |
mriedem | it's what i said it was | 16:32 |
mriedem | among apparently a lot of other things | 16:33 |
sean-k-mooney | mriedem: i don't see why we need the az from that | 16:33 |
mriedem | we probably dont | 16:34 |
mlavalle | there we only use the compute prefix | 16:34 |
mlavalle | as we do in other places of the code | 16:34 |
mriedem | probably explains why https://launchpad.net/bugs/1759924 isn't that big a deal | 16:35 |
openstack | Launchpad bug 1759924 in OpenStack Compute (nova) "Port device owner isn't updated with new host availability zone during unshelve" [Medium,In progress] - Assigned to Matt Riedemann (mriedem) | 16:35 |
mriedem | except it causes confusiong | 16:35 |
mriedem | *confusion | 16:35 |
*** slaweq has joined #openstack-nova | 16:35 | |
sean-k-mooney | mriedem: silvanb is still on pto but i was discussing this with him a few weeks ago about should we remove setting it or not | 16:35 |
mriedem | bauzas you mean? | 16:35 |
sean-k-mooney | mriedem: yes | 16:36 |
mriedem | if there is one thing sylvain loves to talk about more than cheese and skiing, it's AZs | 16:36 |
mlavalle | LOL | 16:36 |
jaypipes | mriedem: zigo's using libvirt, right? | 16:36 |
mriedem | yup | 16:36 |
jaypipes | k | 16:37 |
sean-k-mooney | there was concern over whether allowing live migration across availability zones breaks the contract with a user. | 16:37 |
mriedem | well that reminds me of another bug fix https://review.openstack.org/#/c/567701/ | 16:37 |
sean-k-mooney | there are some open bugs where instances with floating ips break if you do this. | 16:37 |
mriedem | with neutron dvr? | 16:38 |
mriedem | or just in general? | 16:38 |
sean-k-mooney | i think it was in general. i should find out i will check when i have my live migration setup running again | 16:38 |
*** psachin has joined #openstack-nova | 16:40 | |
sean-k-mooney | mriedem: mlavalle actually while ye are both here i added some talking points to the nova neutron cross project session since it was blank | 16:40 |
sean-k-mooney | https://etherpad.openstack.org/p/nova-ptg-stein line 147 | 16:40 |
*** r-daneel has quit IRC | 16:40 | |
sean-k-mooney | cross cell migration should probably be there | 16:41 |
mriedem | cross-cell migration is in the cells section | 16:41 |
mriedem | but sure | 16:41 |
melwitt | cross-cell migration will apply to the cells section, the cinder section, and the neutron section, I think | 16:42 |
mlavalle | yesterday I asked rubasov, | 16:42 |
sean-k-mooney | mlavalle: so just so we can confirm, does neutron allow neutron networks to span neutron availability zones? | 16:42 |
mlavalle | who asked gibi, whether we needed to discuss bandwidth based scheduling, and the answer was no | 16:42 |
mlavalle | sean-k-mooney: I think it does but i'll confirm | 16:43 |
jaypipes | mriedem: hmm, I've gone through all patches to the nova source tree in the rocky branch in the last three months and don't see anything at all that hits the code paths involved in setting allocation ratios... I'm a little stumped, to tell the truth. | 16:44 |
stephenfin | jaypipes: Are you planning to work on that "move CPU tracking to placement" spec this cycle? | 16:44 |
* stephenfin wonders if it's PTG worthy | 16:44 | |
jaypipes | mriedem: unless this is a super latent bug that is just recently rearing its head... perhaps.. | 16:44 |
sean-k-mooney | mlavalle: i guess if it does not and it causes port binding to fail then that will be enough for nova to know not to schedule to that node | 16:45 |
jaypipes | stephenfin: yeah, I guess I have to. I'd RATHER shove hot pokers in my eyeballs, though. | 16:45 |
mlavalle | sean-k-mooney: yes, that's what I would say | 16:45 |
mlavalle | sean-k-mooney: I left a comment in the etherpad, L166 regarding whether we need to discuss bandwidth based scheduling | 16:48 |
*** hshiina has quit IRC | 16:48 | |
*** hshiina has joined #openstack-nova | 16:48 | |
*** slaweq has quit IRC | 16:49 | |
sean-k-mooney | mlavalle: cool, well it was more of a "what is the current state of this and should we be planning to schedule review time to get this finished in stein" topic. | 16:49 |
mlavalle | sean-k-mooney: ping them tomorrow, they are closer to your tz | 16:50 |
mlavalle | now it is very late for them | 16:50 |
sean-k-mooney | mlavalle: sure will do | 16:51 |
mlavalle | and maybe it is getting late for you as well | 16:51 |
gibi | mlavalle, sean-k-mooney: I and rubasov can give a status of the bandwidth work on the PTG if needed | 16:52 |
mlavalle | gibi: thanks | 16:53 |
*** dtantsur is now known as dtantsur|afk | 16:54 | |
*** ccamacho has quit IRC | 16:54 | |
*** psachin has quit IRC | 16:56 | |
*** gyee has joined #openstack-nova | 16:59 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova-specs master: Resource provider - request group mapping in allocation candidate https://review.openstack.org/597601 | 17:02 |
gibi | mlavalle: ^^ this spec is only impacting placement (and or Nova) but connected to the bandwidth work. I think we will discuss it in the placement related sessions rather than in the nova-neutron cross session. | 17:03 |
gibi | mlavalle: but you might be interested still | 17:03 |
mlavalle | gibi: ack, thanks for the heads up | 17:04 |
sean-k-mooney | mlavalle: i tend to start late and work late. | 17:13 |
mriedem | i work hard and i play hard | 17:14 |
mriedem | mtreinish: | 17:14 |
sean-k-mooney | gibi: sure if you want to cover that in the placement sessions then that works too. just wanted to make sure it did not slip through the cracks | 17:15 |
*** tbachman has joined #openstack-nova | 17:15 | |
gibi | sean-k-mooney: at the moment the resource mapping does not affect Neutron; it affects either placement and nova or nova only. | 17:16 |
gibi | sean-k-mooney: neutron will get the network device RP uuid from nova during the port binding anyhow | 17:16 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: (Re)start caching scheduler after starting computes in tests https://review.openstack.org/597606 | 17:17 |
sean-k-mooney | gibi: provided we modify nova to pass them to you :) | 17:18 |
sean-k-mooney | gibi: looking at the spec this looks pretty familiar to what we discussed back in dublin | 17:18 |
gibi | sean-k-mooney: code is up that passes the RP from nova to neutron https://review.openstack.org/#/c/569459/26/nova/network/neutronv2/api.py@3129 | 17:19 |
*** tbachman has quit IRC | 17:19 | |
gibi | sean-k-mooney: it only works for the simple cases | 17:19 |
gibi | sean-k-mooney: I can make it work for the general case but it won't scale | 17:20 |
gibi | sean-k-mooney: so I proposed the spec to do the mapping in placement | 17:20 |
gibi | sean-k-mooney: I have to leave for today; I'm happy to continue the discussion tomorrow, or on the review, and eventually at the PTG | 17:21 |
sean-k-mooney | gibi: oh cool i had not seen that series. i will have to pull it down and try it out | 17:21 |
sean-k-mooney | gibi: no worries have a good evening | 17:21 |
gibi | sean-k-mooney: same to you | 17:21 |
mriedem | so my debug logging patch didn't fail xenserver ci since they hard-coded the allocation ratios in nova.conf http://dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/53/597553/1/check/dsvm-tempest-neutron-network/971ea88/logs/etc/nova/nova.conf.txt.gz | 17:24 |
mriedem | i need to find their repo to revert that change | 17:24 |
*** gbarros has quit IRC | 17:25 | |
jaypipes | mriedem: my bad, sorry. | 17:26 |
mriedem | help me find the repo | 17:26 |
*** tbachman has joined #openstack-nova | 17:26 | |
sean-k-mooney | mriedem: could you just hard code the nova.conf passing code to return none | 17:26 |
sean-k-mooney | mriedem: or whatever it returns when its not set | 17:27 |
*** gbarros has joined #openstack-nova | 17:27 | |
sean-k-mooney | mriedem: they are hard coding it in the local.conf http://dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/53/597553/1/check/dsvm-tempest-neutron-network/971ea88/logs/local.conf.txt.gz | 17:29 |
mriedem | i didn't think git://git.openstack.org/openstack/os-xenapi was it | 17:30 |
mriedem | yeah it's not | 17:30 |
cdent | mriedem: did you say that zigo's thing was changed to depends-on your debuggery? | 17:32 |
mriedem | yeah but i'm not sure it would help | 17:32 |
dansmith | mriedem: yeah that's the pack of xenapi plugins I think | 17:32 |
mriedem | https://review.openstack.org/#/c/597175/ but that doesn't actually get the nova change into the nova package | 17:33 |
mriedem | sean-k-mooney: oh you meant the allocation ratios - yes i knew that | 17:34 |
mriedem | and yeah i might have to hack the nova code to ignore the config | 17:34 |
mriedem | but first lunch | 17:34 |
sean-k-mooney | mriedem: yes i meant hack the nova code to ignore the configs so that you can work around their hack to hard code them :) | 17:35 |
jaypipes | mriedem: https://review.openstack.org/#/c/597428/ | 17:36 |
sean-k-mooney | jaypipes: oh you found it. am i the only one that is bothered by the fact the repo is xenapi-os-testing when the other one is os-xenapi | 17:38 |
sean-k-mooney | jaypipes: that change is not merged however which implies the ci is not running off of master of that repo? | 17:39 |
mriedem | they might be patching that into all CI runs | 17:42 |
jaypipes | what mriedem said. | 17:43 |
sean-k-mooney | mriedem: you could see if a depends on would override it. e.g. make a noop patch to xenapi-os-testing then depend on it to force the unpatched version? | 17:43 |
jaypipes | sean-k-mooney: that would assume the xenserver CI is honouring depends-on, no? | 17:44 |
*** sahid has quit IRC | 17:44 | |
sean-k-mooney | true | 17:45 |
mriedem | i'm pretty sure they do, | 17:45 |
mriedem | i'll just revert that change and depends-on the nova logging patch | 17:45 |
sean-k-mooney | is https://bugs.launchpad.net/nova/+bug/1789654 only happening with the xen driver by the way? | 17:46 |
openstack | Launchpad bug 1789654 in OpenStack Compute (nova) rocky "placement allocation_ratio initialized with 0.0" [High,Confirmed] | 17:46 |
mriedem | no, | 17:47 |
mriedem | zigo is hitting it on rocky with libvirt | 17:47 |
melwitt | mriedem: I'm unsure whether to +1 this rocky final releases patch given the regression you're investigating. I assume we are too late to fix it for GA, but not 100% sure https://review.openstack.org/597529 | 17:48 |
mriedem | melwitt: i assume the GA ship has sailed | 17:49 |
mriedem | given we aren't hitting this in the 'normal' gate and we don't have root cause, i don't think we can hold anything up | 17:50 |
melwitt | ok. I wasn't clear whether the latest find in xenserver CI yielded a root cause or not. thanks | 17:51 |
sean-k-mooney | mriedem: strange just looking at my devstack install everything looks fine on libvirt. | 17:53 |
sean-k-mooney | mriedem: do you know how to reproduce or is that what you're currently investigating | 17:53 |
*** slaweq has joined #openstack-nova | 17:53 | |
mriedem | ... | 17:53 |
melwitt | in the past, the 0.0 was supposed to be a signal for the scheduler to use different default values, which I always found confusing | 17:54 |
mriedem | what about "don't know root cause" is .... | 17:54 |
mriedem | yes it should read from hard-coded values in the compute node | 17:54 |
mriedem | this isn't the scheduler, | 17:54 |
mriedem | it's what goes into the resource providers in placement, | 17:54 |
mriedem | via the RT / ComputeNode object | 17:54 |
mriedem | here is a revert on the xen ci patch https://review.openstack.org/#/c/597613/1 | 17:55 |
mriedem | plus nova logs | 17:55 |
melwitt | this is the quote I was thinking of from the config option help, "NOTE: This can be set per-compute, or if set to 0.0, the value set on the scheduler node(s) or compute node(s) will be used and defaulted to 16.0." | 17:57 |
mriedem | yes | 17:57 |
mriedem | the 16.0 comes from the compute node object code | 17:57 |
melwitt | oh, ok | 17:57 |
mriedem | https://github.com/openstack/nova/blob/master/nova/objects/compute_node.py#L188 | 17:57 |
mriedem | which is used here https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L106 | 17:58 |
mriedem | jaypipes: hmm, maybe we aren't reading from a compute node that's come out of the db | 17:59 |
melwitt | ok, so somehow that is not being effected into the actual ratio being used (the bug) | 17:59 |
mriedem | jaypipes: nvm, ComputeNode.create calls _from_db_object | 18:00 |
mriedem | so those allocation ratio fields will be set after create | 18:00 |
jaypipes | mriedem: yea | 18:02 |
mriedem | plus if that were the case we'd always fail this in the gate | 18:02 |
*** slaweq has quit IRC | 18:03 | |
sean-k-mooney | mriedem: if it will help i can try to run the reproduce.sh script without the hard coded ratios in a clean vm and see if it will result in the 0.0 allocations? | 18:07 |
melwitt | after it pulls the defaults from the compute node object, what is the thing that is supposed to set those default values in placement? | 18:08 |
melwitt | the RT? makes a call to update the inventory? (reading from the bug) | 18:09 |
melwitt | ok, yeah here https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L902 | 18:11 |
sean-k-mooney | i would have assumed the ratios were ultimately set in the virt driver in update_provider_tree | 18:11 |
mtreinish | mriedem: ? | 18:13 |
sean-k-mooney | melwitt: oh so we get the tree from placement, pass it to the driver in the update_provider_tree call, then normalise, which sets the defaults, then the report client updates placement? | 18:15 |
mriedem | mtreinish: "we work hard and we play hard" | 18:17 |
sean-k-mooney | melwitt: the xenapi driver does not implement update_provider_tree yet so we are hitting the except block https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L907-L921 | 18:18 |
mriedem | mtreinish: if you're going to be around at all anymore, i need it to be for my quick simpsons references | 18:18 |
mriedem | melwitt: https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L902 doesn't update placement | 18:18 |
mriedem | it updates a view of the provider tree locally | 18:18 |
melwitt | sean-k-mooney: was just thinking about that and whether it's related. does that mean it won't update placement? | 18:19 |
mriedem | reportclient.update_from_provider_tree(context, prov_tree) is the call that is meant to send the changes from local to remote (placement) | 18:19 |
melwitt | mriedem: I see, ok, I was just about to look for that | 18:19 |
sean-k-mooney | mriedem: but we dont call that in the except block... | 18:19 |
melwitt | mriedem: but that will fail because the xen driver has not implemented it, so then there's a fallback that will do it? trying to understand what is here | 18:19 |
mriedem | i actually thought the xen driver implemented update_provider_tree, but even so, self.scheduler_client.set_inventory_for_provider( should update the thing | 18:20 |
mriedem | in placement | 18:20 |
mtreinish | mriedem: heh, I didn't realize you were in the steel industry | 18:20 |
mriedem | rust belt baby | 18:20 |
melwitt | ok, hm | 18:21 |
*** slaweq has joined #openstack-nova | 18:22 | |
mriedem | this is likely related since the report client still relies on a provider tree for caching | 18:23 |
melwitt | yeah, like you mentioned in the bug. but how could the cache think the allocation ratio is already set to 16.0, for example? and think nothing changed | 18:24 |
melwitt | if it's currently 0.0 | 18:24 |
sean-k-mooney | mriedem: well we dont call prov_tree.update_inventory(nodename, inv_data) in the except block you mentioned that updated the internal view | 18:26 |
mriedem | so my logging patch is most likely not going to help the xen case here because it's not going down that alternative route, i'll update in a bit | 18:26 |
mriedem | sean-k-mooney: no but the report client does internally | 18:26 |
cdent | mriedem: what are the odds it was related to this change: https://review.openstack.org/#/c/520024/ | 18:26 |
mriedem | on "it's" version | 18:26 |
mriedem | cdent: heh jaypipes already brought that up :) | 18:26 |
mriedem | and it was the first thing i thought of today | 18:27 |
mriedem | but that isn't on rocky and zigo said he hit this on rocky | 18:27 |
cdent | oh, sorry, I was out walking and it crossed my mind | 18:27 |
mriedem | unless of course zigo is actually hitting stein code somehow | 18:27 |
cdent | I'm not certain that what zigo is experiencing is the same as what the xen stuff is experiencing | 18:27 |
melwitt | do we see this log message, LOG.warning('Unable to refresh my resource provider record')? says "# NOTE(danms): Either we failed to fetch/create the RP on our first attempt, or a previous attempt had to invalidate the cache, and we were unable to refresh it. Bail and try again next time." | 18:27 |
cdent | because it's not clear what setup zigo has or hasn't done | 18:27 |
jaypipes | yeah, cdent, that was SOOOOO one hour ago, geez! :P | 18:28 |
cdent | heh, I was fetching elderberries | 18:28 |
cdent | which is like | 18:28 |
cdent | so british | 18:28 |
jaypipes | :) | 18:28 |
cdent | it was the first thing I came across when trying to figure out why the ComputeNode after get_inventory might be wrong | 18:29 |
cdent | because tracing the code the ComputeNode at that point has to have a cpu_allocation_ratio of 0 for us to see the results that are happening | 18:30 |
cdent | (was also prepared to fetch blackberries but they aren't ready) | 18:30 |
*** slaweq has quit IRC | 18:31 | |
mriedem | are dingleberries in season yet? | 18:31 |
jaypipes | cdent: yeah, after finding out that zigo was on rocky, I went and reviewed the last 3 months of patches to rocky that could have anything at all do with allocation ratio or inventory setting and couldn't find anything at all. Also note that the patch you mentioned above isn't in Rocky | 18:32 |
jaypipes | mriedem: always. | 18:32 |
mriedem | ha | 18:32 |
*** wolverineav has joined #openstack-nova | 18:32 | |
mriedem | melwitt: http://logs.openstack.org/41/590041/17/check/tempest-full/b3f9ddd/controller/logs/screen-n-cpu.txt.gz#_Aug_27_14_18_24_078058 is a failed xen run if you want to dig for logs | 18:33 |
melwitt | thx | 18:33 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add contributor guide for upgrade status checks https://review.openstack.org/596902 | 18:36 |
melwitt | not seeing this message in the log LOG.debug('Updated inventory for %s at generation %i', which should be there if we've ever successfully updated inventory. which supports the theory that self._provider_tree.has_inventory_changed is returning False | 18:41 |
melwitt | I don't see any of the error log messages associated with a failure to update inventory | 18:41 |
melwitt | looks like it would be helpful to have a debug message "Inventory has not changed, skipping update" when it skips | 18:43 |
sean-k-mooney | it's kind of dumb but i'm going to hard code a not implemented exception here https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L895 and restack and see if i get the same behavior | 18:43 |
mriedem | melwitt: yeah that's what my debug patch is doing | 18:44 |
sean-k-mooney | melwitt: i think this is where we check if things have changed https://github.com/openstack/nova/blob/722d5b477219f0a2435a9f4ad4d54c61b83219f1/nova/scheduler/client/report.py#L865 | 18:44 |
mriedem | but now i'm distracted by fracas in the tc channel | 18:45 |
melwitt | mriedem: yeah, just saw that and was about to say, that's what you're already doing and are going to update it to also do it for the non-provider tree route | 18:45 |
melwitt | since that's where we're going for xen anyway | 18:45 |
mriedem | yes in progress | 18:45 |
mriedem | despite fracas | 18:45 |
melwitt | sean-k-mooney: yes, that's it | 18:46 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add debug logs for when provider inventory changes https://review.openstack.org/597560 | 18:51 |
mriedem | updated for alt path ^ note hyperv and vmware etc would all be failing from this as well if that alternate path is the issue | 18:51 |
sean-k-mooney | mriedem: i wonder if the hyperv ci also hardcodes the allocation ratios in the conf | 18:54 |
mriedem | we can find out | 18:56 |
sean-k-mooney | mriedem: the hyperv ones look ok | 18:58 |
sean-k-mooney | as in not set | 18:58 |
melwitt | oh, well, if xen is hard-coding values that match the 0.0 defaults, then placement would _not_ be updated right? | 18:58 |
melwitt | oh, nevermind. reportclient should be comparing with what's in placement | 18:59 |
sean-k-mooney | mlavalle: xen were hardcoding real ratios | 18:59 |
sean-k-mooney | * melwitt: ^ | 18:59 |
sean-k-mooney | melwitt: http://dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/60/597560/1/check/dsvm-tempest-neutron-network/0b2f0a8/logs/local.conf.txt.gz | 18:59 |
sean-k-mooney | [DEFAULT] | 19:00 |
sean-k-mooney | disk_allocation_ratio = 2.0 | 19:00 |
sean-k-mooney | ram_allocation_ratio = 1.5 | 19:00 |
sean-k-mooney | cpu_allocation_ratio = 16.0 | 19:00 |
melwitt | yeah. I was trying to think if that would appear to reportclient as "no change" and therefore not update placement | 19:00 |
sean-k-mooney | all valid. that said i would never advise setting the disk_allocation_ratio over 1 | 19:00 |
melwitt | but, reportclient should be comparing what placement has with those new values, so it should see a change. but from what we know so far, it looks like it isn't seeing a change. mriedem's debug logs will confirm | 19:01 |
sean-k-mooney | melwitt: if you could somehow get the value to the report client as 0.0 then yes | 19:01 |
melwitt | right, yeah | 19:02 |
*** jamesdenton has quit IRC | 19:02 | |
mriedem | sean-k-mooney: they are *now* | 19:03 |
mriedem | they weren't when they reported the issue | 19:03 |
mriedem | they are hard-coding them in config as a workaround for the CI failure | 19:03 |
mriedem | which is why i've reverted that change to try and actually get a recreate with logging | 19:03 |
sean-k-mooney | right so before they were not set | 19:04 |
cdent | mriedem: have you added logs which watch the value of the cn.*_allocation_ratio in some way? | 19:04 |
cdent | i'm looking at https://review.openstack.org/#/c/597560/2/nova/compute/resource_tracker.py,unified and wonder if we want more info about the state of the compute node along the way | 19:05 |
cdent | when cn.save() is called if those values are weird for some reason the ratio adjustment stuff in _from_db_object _might_ not be behaving as expected | 19:06 |
mriedem | i could add that | 19:06 |
cdent | (of course you have probably already analyzed this while I was getting elderberries) | 19:06 |
melwitt | sean-k-mooney: yeah, so focusing on the values somehow being 0.0 _after_ the normalize from compute node object, that's what got mentioned earlier, how that could possibly happen | 19:06 |
melwitt | the normalize function is supposed to be filling in with the defaults 16.0 etc | 19:07 |
sean-k-mooney | ... so my raise NotImplemented to force the alt path with the libvirt driver still resulted in the correct values | 19:08 |
*** mrjk has quit IRC | 19:09 | |
*** pcaruana has quit IRC | 19:09 | |
*** mchlumsky has quit IRC | 19:09 | |
melwitt | ? so how is xen failing? I thought it was taking the alt path | 19:09 |
sean-k-mooney | melwitt: it is but taking the alt path is apparently not enough | 19:09 |
melwitt | oh | 19:09 |
sean-k-mooney | i have matt's debug patch applied also but im not sure that is going to show where the default values got applied | 19:11 |
*** mrjk has joined #openstack-nova | 19:12 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add debug logs for when provider inventory changes https://review.openstack.org/597560 | 19:12 |
mriedem | cdent: like this? ^ | 19:13 |
*** awaugama has quit IRC | 19:13 | |
*** jamesdenton has joined #openstack-nova | 19:16 | |
*** mrjk has quit IRC | 19:17 | |
* cdent looks | 19:17 | |
cdent | mriedem: yeah, nice. that combined with the other stuff ought to help see the flow | 19:18 |
*** mrjk has joined #openstack-nova | 19:19 | |
cdent | s/see/better see/ | 19:19 |
cdent | The difficulty with creating an MTC for this makes me anxious | 19:19 |
*** mrjk has quit IRC | 19:23 | |
*** efried_rollin is now known as efried | 19:26 | |
sean-k-mooney | im restacking in offline mode (with libvirt) we are expecting to see the defaulting to ... message if the compute node object is setting the defaults right | 19:26 |
sean-k-mooney | i can deploy a xen node tomorrow if needed to see if i can reproduce | 19:28 |
*** mriedem has quit IRC | 19:31 | |
*** mriedem has joined #openstack-nova | 19:31 | |
*** gbarros has quit IRC | 19:36 | |
*** gbarros has joined #openstack-nova | 19:39 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Revert "libvirt: add method to configure migration speed" https://review.openstack.org/590814 | 19:41 |
cfriesen | jaypipes: re the cold migration with PCI devices. were you talking about the difference between it being theoretically supported and actually doing it? StarlingX integration tests do cold migration with PCI/SRIOV regularly, but I realize that doesn't answer the question for upstream. | 19:43 |
sean-k-mooney | cfriesen: i think i have done it in the past also. i had thought it was meant to be supported. that said not sure it updated the resource tracker correctly | 19:45 |
jaypipes | cfriesen: yes, I'm referring to real-world deployments who do migrations where the instances hold on to their IP addresses, GPUs, and everything else and are migrated to a different rack/region whatever | 19:45 |
jaypipes | cfriesen: but whatever, I'm running from that conversation screaming. | 19:46 |
cfriesen | jaypipes: :) | 19:46 |
*** gbarros has quit IRC | 19:51 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional test for live migrate with anti-affinity group https://review.openstack.org/588935 | 19:55 |
mriedem | cfriesen: upstream supports cold migration with pci devices | 19:56 |
mriedem | remember moshe and ludovic got that working | 19:56 |
mriedem | there was also 3rd party ci from mellanox for it at one time i think | 19:57 |
mriedem | but that might be dead now | 19:57 |
*** slaweq has joined #openstack-nova | 20:01 | |
*** r-daneel has joined #openstack-nova | 20:01 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Delete instance_group_member records from API DB during archive https://review.openstack.org/588943 | 20:16 |
mriedem | melwitt: just noticed you had commented on this ^ test should cover the case you noted now | 20:16 |
melwitt | ok, will look | 20:17 |
mriedem | crap forgot to update the bug reference in the commit message | 20:20 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Delete instance_group_member records from API DB during archive https://review.openstack.org/588943 | 20:21 |
*** mgagne has joined #openstack-nova | 20:22 | |
*** priteau has quit IRC | 20:23 | |
*** mrjk has joined #openstack-nova | 20:24 | |
*** gbarros has joined #openstack-nova | 20:26 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Remove old check_attach version check in API https://review.openstack.org/588348 | 20:28 |
*** cdent has quit IRC | 20:28 | |
*** gbarros has quit IRC | 20:33 | |
*** luksky11 has joined #openstack-nova | 20:34 | |
*** gbarros has joined #openstack-nova | 20:36 | |
dansmith | melwitt: mriedem: this is going to pass tests in a few minutes: https://review.openstack.org/#/c/597206 | 20:39 |
dansmith | if you could ack it with a +1 (or tell me what you want changed), I will go about trying to figure out how I'm going to get that merged :) | 20:39 |
melwitt | will do | 20:40 |
dansmith | also i was going to verify resource providers before/after and then realized we can't really do that since other projects might create providers, and we have no "service type" field | 20:43 |
mriedem | one hack way to determine a compute node provider is via the VCPU inventory | 20:43 |
dansmith | for the moment, yeah, but meh | 20:44 |
dansmith | I'd rather get this in and work on the other stuff | 20:44 |
dansmith | because this was a PITA to get working | 20:44 |
mriedem | dansmith: need to recheck https://review.openstack.org/#/c/597566/ ? | 20:44 |
dansmith | just because I don't want to run it locally | 20:44 |
dansmith | mriedem: no, it's about to pass soon too | 20:45 |
mriedem | ok | 20:45 |
zigo | mriedem: As I told you, if you wish, I can push your patches into the packages... | 20:45 |
zigo | Package is building with the patch... | 20:45 |
mriedem | zigo: sure, but that's not something you'll release is it? with the debug log patch? | 20:46 |
mriedem | i'm just hoping to debug a recreate with ci logs | 20:46 |
zigo | mriedem: It just lives in my Stretch backport, until I remove the patch. | 20:47 |
zigo | I don't have the intention to push that to Debian Sid / Experimental, no. | 20:47 |
zigo | mriedem: once the package is built by my jenkins (you can see the build process there: https://stretch-queens.infomaniak.ch/job/nova/) then we just need to re-trigger the puppet-openstack CI. | 20:48 |
*** markvoelker has joined #openstack-nova | 20:48 | |
zigo | mriedem: Otherwise, I can teach you how to re-produce it on a local Stretch VM. It's very easy . | 20:50 |
mriedem | that's ok, looks like we have a recreate again in the xen ci https://review.openstack.org/#/c/597613/ | 20:53 |
*** markvoelker has quit IRC | 20:55 | |
mriedem | that doesn't have the logging i need though, so rechecking the xenserver ci job | 20:56 |
zigo | Silly me, wrong jenkins ... | 20:57 |
melwitt | dansmith: are you intentionally not checking for DISK_GB in the verify inventory step? | 20:59 |
dansmith | melwitt: um, duh, of course I'm not | 20:59 |
dansmith | I mean | 20:59 |
dansmith | who would verify DISK_GB | 20:59 |
dansmith | that'd be kinda, like, really stupid right? | 21:00 |
* dansmith looks away | 21:00 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add encrypted volume support to feature matrix docs https://review.openstack.org/570255 | 21:00 |
melwitt | lol | 21:00 |
dansmith | melwitt: like six of those patchsets were me getting resource classes wrong | 21:02 |
*** tbachman has quit IRC | 21:02 | |
dansmith | melwitt: just pushed to use a central list | 21:02 |
melwitt | heh. central list = good | 21:02 |
dansmith | since it takes about 90 minutes to test each one, I've been trying to make minimal change | 21:03 |
dansmith | you better hope this one works and I don't have to spend another couple days throwing things at the wall :) | 21:03 |
melwitt | it's gotta work | 21:04 |
*** erlon has quit IRC | 21:04 | |
*** mchlumsky has joined #openstack-nova | 21:11 | |
*** eharney has quit IRC | 21:15 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Default AZ for instance if cross_az_attach=False and checking from API https://review.openstack.org/469675 | 21:18 |
*** wolverineav has quit IRC | 21:20 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Time how long select_destinations() takes in conductor https://review.openstack.org/517108 | 21:23 |
zigo | mriedem: Package built, waiting for recheck now. | 21:24 |
zigo | It probably will end when I'll be sleeping ... | 21:25 |
*** tbachman has joined #openstack-nova | 21:27 | |
mriedem | yeah i'm t-15 minutes from parenting duties | 21:31 |
*** Sundar has joined #openstack-nova | 21:33 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Combine error handling blocks in _do_build_and_run_instance https://review.openstack.org/545960 | 21:40 |
*** liuyulong has quit IRC | 21:42 | |
*** mchlumsky has quit IRC | 21:44 | |
*** mriedem is now known as mriedem_away | 21:44 | |
*** rcernin has joined #openstack-nova | 21:49 | |
*** mriedem_away has quit IRC | 21:49 | |
Sundar | efried: Please ping me | 21:50 |
*** gbarros has quit IRC | 21:51 | |
*** takashin has joined #openstack-nova | 21:51 | |
*** luksky11 has quit IRC | 22:03 | |
*** r-daneel_ has joined #openstack-nova | 22:05 | |
*** r-daneel has quit IRC | 22:05 | |
*** r-daneel_ is now known as r-daneel | 22:05 | |
openstackgerrit | Merged openstack/nova master: doc: add info how to troubleshoot vmware specific problems https://review.openstack.org/597446 | 22:07 |
*** mlavalle has quit IRC | 22:12 | |
*** slaweq has quit IRC | 22:17 | |
*** threestrands has joined #openstack-nova | 22:19 | |
*** threestrands has quit IRC | 22:19 | |
*** threestrands has joined #openstack-nova | 22:22 | |
*** markvoelker has joined #openstack-nova | 22:46 | |
*** r-daneel has quit IRC | 22:56 | |
*** macza has quit IRC | 23:03 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Document differences and similaries between extra specs and hints https://review.openstack.org/581410 | 23:09 |
*** slaweq has joined #openstack-nova | 23:11 | |
*** markvoelker has quit IRC | 23:12 | |
*** slaweq has quit IRC | 23:15 | |
*** holser_ has quit IRC | 23:29 | |
*** macza has joined #openstack-nova | 23:31 | |
*** macza has quit IRC | 23:36 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Add TODO note for mox removal https://review.openstack.org/576758 | 23:51 |
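
Note on the allocation-ratio discussion above (bug 1789654): the behaviour the channel expected can be sketched roughly as below. This is a minimal illustration, not nova's actual code. The function names and dictionary layout are hypothetical, the 16.0 CPU fallback is the one quoted in the chat, and the 1.5 RAM and 1.0 disk fallbacks are assumptions. A ratio of 0.0 is treated as "unset" and replaced by a fallback, and placement should only be skipped when the resulting inventory matches the cached view.

```python
# Hypothetical sketch; names and structure are illustrative, not nova's API.
# 16.0 for CPU is quoted in the chat; 1.5 (RAM) and 1.0 (disk) are assumed.
DEFAULT_RATIOS = {"cpu": 16.0, "ram": 1.5, "disk": 1.0}


def normalize_ratios(configured):
    """Replace unset (0.0 or missing) ratios with the fallback defaults."""
    return {
        key: (configured.get(key) or default)
        for key, default in DEFAULT_RATIOS.items()
    }


def inventory_changed(cached, new):
    """Return True when the new inventory differs from the cached view."""
    return cached != new


# The cached view still holds the bad 0.0 ratio for VCPU.
cached = {"VCPU": {"total": 8, "allocation_ratio": 0.0}}

# Config leaves every ratio at 0.0, so each one falls back to its default.
ratios = normalize_ratios({"cpu": 0.0, "ram": 0.0, "disk": 0.0})
new = {"VCPU": {"total": 8, "allocation_ratio": ratios["cpu"]}}

# A change is detected, so an update would be sent to placement.
needs_update = inventory_changed(cached, new)
```

Under these assumptions a 0.0 can only reach placement if normalization is skipped, or if the comparison is made against a cached view that already holds the normalized values. Those are the two cases the debug logging patch was meant to tell apart.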