*** brinzhang has joined #openstack-nova | 00:06 | |
*** Sundar has quit IRC | 00:12 | |
*** efried1 has joined #openstack-nova | 00:50 | |
*** efried has quit IRC | 00:51 | |
*** efried1 is now known as efried | 00:51 | |
openstackgerrit | Merged openstack/nova master: [placement] Make _ensure_aggregate context not independent https://review.openstack.org/597486 | 00:52 |
---|---|---|
*** efried has quit IRC | 00:58 | |
openstackgerrit | Merged openstack/nova master: Add explanatory prefix to post_test_perf output https://review.openstack.org/591850 | 01:05 |
openstackgerrit | Merged openstack/nova master: Add trait query to placement perf check https://review.openstack.org/592624 | 01:12 |
openstackgerrit | Merged openstack/nova master: Restart scheduler in TestNovaManagePlacementHealAllocations https://review.openstack.org/597571 | 01:12 |
openstackgerrit | Merged openstack/nova master: reshaper: Look up provider if not in inventories https://review.openstack.org/585033 | 01:12 |
*** imacdonn has quit IRC | 01:19 | |
*** alex_xu has joined #openstack-nova | 01:19 | |
*** imacdonn has joined #openstack-nova | 01:19 | |
*** efried has joined #openstack-nova | 01:20 | |
*** gyee has quit IRC | 01:22 | |
*** jiapei has joined #openstack-nova | 01:23 | |
*** hongbin has joined #openstack-nova | 01:54 | |
*** jamesdenton has quit IRC | 01:59 | |
*** slaweq has joined #openstack-nova | 02:11 | |
*** slaweq has quit IRC | 02:16 | |
*** psachin has joined #openstack-nova | 02:50 | |
*** Dinesh_Bhor has joined #openstack-nova | 03:17 | |
openstackgerrit | Merged openstack/nova master: Make get_allocations_for_resource_provider raise https://review.openstack.org/584598 | 03:28 |
openstackgerrit | Merged openstack/nova master: api-ref: fix volume attachment update policy note https://review.openstack.org/596489 | 03:28 |
*** jiapei has quit IRC | 03:33 | |
*** dave-mccowan has quit IRC | 03:38 | |
*** nicolasbock has quit IRC | 03:40 | |
*** moshele has joined #openstack-nova | 04:00 | |
moshele | melwitt: hi | 04:00 |
moshele | melwitt: did I answer your question on https://review.openstack.org/#/c/595592? It's an old legacy bug that was revealed because tripleo started to configure the rx/tx queues by default | 04:02 |
*** moshele has quit IRC | 04:04 | |
*** janki has joined #openstack-nova | 04:05 | |
*** moshele has joined #openstack-nova | 04:25 | |
*** markvoelker has joined #openstack-nova | 04:34 | |
*** ivve has joined #openstack-nova | 04:47 | |
*** hongbin has quit IRC | 04:56 | |
*** moshele has quit IRC | 04:59 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Fix a failure to format config sample https://review.openstack.org/597986 | 05:08 |
*** slaweq has joined #openstack-nova | 05:11 | |
*** tojuvone has joined #openstack-nova | 05:12 | |
*** slaweq has quit IRC | 05:15 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova stable/rocky: Fix a broken conf file description in networking doc https://review.openstack.org/597987 | 05:20 |
*** ccamacho has joined #openstack-nova | 05:38 | |
*** tetsuro has joined #openstack-nova | 05:53 | |
*** udesale has joined #openstack-nova | 05:53 | |
*** moshele has joined #openstack-nova | 05:54 | |
*** openstackgerrit has quit IRC | 06:07 | |
*** udesale has quit IRC | 06:07 | |
*** udesale has joined #openstack-nova | 06:08 | |
*** openstackgerrit has joined #openstack-nova | 06:21 | |
openstackgerrit | huanhongda proposed openstack/nova master: [WIP]Forbidden non-admin user to list deleted instances https://review.openstack.org/598012 | 06:21 |
*** alexchadin has joined #openstack-nova | 06:23 | |
*** janki has quit IRC | 06:30 | |
*** jchhatbar has joined #openstack-nova | 06:30 | |
*** hoonetorg has quit IRC | 06:39 | |
*** dklyle has quit IRC | 06:43 | |
*** adrianc has joined #openstack-nova | 06:43 | |
*** dklyle has joined #openstack-nova | 06:44 | |
*** hoonetorg has joined #openstack-nova | 06:46 | |
*** hshiina has quit IRC | 06:47 | |
*** luksky11 has joined #openstack-nova | 06:47 | |
*** tetsuro has quit IRC | 07:02 | |
*** alexchadin has quit IRC | 07:04 | |
*** alexchadin has joined #openstack-nova | 07:04 | |
*** alexchadin has quit IRC | 07:05 | |
*** tetsuro has joined #openstack-nova | 07:06 | |
*** slaweq has joined #openstack-nova | 07:11 | |
*** pcaruana has joined #openstack-nova | 07:14 | |
*** slaweq has quit IRC | 07:16 | |
*** rcernin has quit IRC | 07:20 | |
*** rtjure has quit IRC | 07:20 | |
*** maciejjozefczyk has joined #openstack-nova | 07:20 | |
*** slaweq has joined #openstack-nova | 07:23 | |
*** Dinesh_Bhor has quit IRC | 07:34 | |
*** Luzi has joined #openstack-nova | 07:35 | |
*** Dinesh_Bhor has joined #openstack-nova | 07:35 | |
*** janki has joined #openstack-nova | 07:43 | |
*** jchhatbar has quit IRC | 07:44 | |
*** sahid has joined #openstack-nova | 07:48 | |
*** jchhatbar has joined #openstack-nova | 07:49 | |
*** janki has quit IRC | 07:49 | |
*** jpena|off is now known as jpena | 07:54 | |
*** hoonetorg has quit IRC | 07:55 | |
openstackgerrit | Merged openstack/nova stable/rocky: Fix a broken conf file description in networking doc https://review.openstack.org/597987 | 07:56 |
*** cdent has joined #openstack-nova | 07:59 | |
*** maciejjozefczyk has quit IRC | 08:02 | |
*** maciejjozefczyk has joined #openstack-nova | 08:03 | |
bauzas | good morning Nova | 08:18 |
* bauzas is back after 3 weeks off | 08:18 | |
*** tetsuro has quit IRC | 08:20 | |
*** tetsuro has joined #openstack-nova | 08:22 | |
*** sahid has quit IRC | 08:23 | |
openstackgerrit | huanhongda proposed openstack/nova master: Fix instance delete stuck in deleting task_state https://review.openstack.org/598084 | 08:24 |
*** Dinesh_Bhor has quit IRC | 08:25 | |
*** sahid has joined #openstack-nova | 08:25 | |
*** Dinesh_Bhor has joined #openstack-nova | 08:27 | |
gibi | bauzas: welcome back | 08:29 |
bauzas | gibi: thanks | 08:29 |
* bauzas is just depiling his emails, so please ping me directly if you want me to review some stuff | 08:32 | |
*** Bhujay has joined #openstack-nova | 08:37 | |
*** markvoelker has quit IRC | 08:38 | |
*** luksky11 has quit IRC | 08:41 | |
openstackgerrit | Rong Han proposed openstack/nova master: Reset global variable after unit test is completed. https://review.openstack.org/598088 | 08:42 |
*** ttsiouts has joined #openstack-nova | 08:43 | |
*** donghm has left #openstack-nova | 08:48 | |
*** dtantsur|afk is now known as dtantsur | 08:49 | |
*** panda is now known as panda|rover | 08:52 | |
lyarwood | bauzas: welcome back o/ | 08:55 |
bauzas | lyarwood: thanks | 08:55 |
lyarwood | bauzas: https://review.openstack.org/#/q/topic:bug/1787606 - would you mind sticking that on your review queue if you have time today or tomorrow? | 08:55 |
dpawlik | Hello, which rules from policy.json are used by the placement API? I have another role for admin and it's raising an error on the compute host that "Failed to retrieve resource provider tree from placement API for UUID 573c492d-7387-4a3a-b21c-20ce531eb483. Got 403: {"errors": [{"status": 403, "request_id": "req-50ecb39c-8bea-4f52-85f2-7b92db9ae9cf", "detail": "Access was denied to this resource.\n\n admin required ", "title" | 08:56 |
dpawlik | : "Forbidden"}]}. | 08:56 |
lyarwood | after rm -rf'ing all of your emails obviously | 08:56 |
bauzas | lyarwood: today could be difficult but I can try | 08:56 |
lyarwood | bauzas: yeah no rush | 08:56 |
bauzas | lyarwood: rm -rf is one option, the other involves reading | 08:56 |
gibi | sean-k-mooney, melwitt: fyi, we are planning to show some demo on the PTG about the state of the bandwidth work http://lists.openstack.org/pipermail/openstack-dev/2018-August/134015.html | 08:56 |
lyarwood | bauzas: both are WIP I just wanted to get some input on the bug and potential fix | 08:56 |
dpawlik | the problem is that I changed admin_api to role:my_role in the policy.json file | 08:56 |
bauzas | lyarwood: I'm not sure which one is the best | 08:56 |
dpawlik | but it's not working on queens | 08:56 |
lyarwood | bauzas: ^_^ rm -rf every time | 08:57 |
lyarwood | bauzas: if it's that important people will send another email | 08:57 |
bauzas | and wait for others yelling at you that you haven't replied to them? That could work | 08:57 |
kashyap | Yeah, "selective reading" is the way to taming e-mail. | 09:01 |
* kashyap pretty aggressively filters out ("mark as read") based on how coherently one writes. | 09:01 | |
kashyap | The more meandering a message, the faster it gets marked as read. | 09:01 |
*** holser_ has joined #openstack-nova | 09:02 | |
*** holser_ has quit IRC | 09:02 | |
*** holser_ has joined #openstack-nova | 09:03 | |
*** maciejjozefczyk has quit IRC | 09:05 | |
*** dpawlik has quit IRC | 09:06 | |
*** maciejjozefczyk has joined #openstack-nova | 09:06 | |
*** dpawlik has joined #openstack-nova | 09:07 | |
*** takashin has quit IRC | 09:09 | |
*** markvoelker has joined #openstack-nova | 09:13 | |
*** markvoelker has quit IRC | 09:15 | |
openstackgerrit | huanhongda proposed openstack/nova master: Fix instance delete stuck in deleting task_state https://review.openstack.org/598084 | 09:15 |
*** sambetts|afk is now known as sambetts | 09:18 | |
*** luksky11 has joined #openstack-nova | 09:18 | |
*** priteau has joined #openstack-nova | 09:20 | |
*** maciejjozefczyk has quit IRC | 09:21 | |
*** tetsuro has quit IRC | 09:23 | |
*** udesale has quit IRC | 09:26 | |
*** maciejjozefczyk has joined #openstack-nova | 09:31 | |
*** maciejjozefczyk has quit IRC | 09:32 | |
*** takashin has joined #openstack-nova | 09:34 | |
*** takashin has left #openstack-nova | 09:34 | |
*** ccamacho has quit IRC | 09:39 | |
*** ccamacho has joined #openstack-nova | 09:39 | |
*** adrianc has quit IRC | 09:46 | |
stephenfin | bauzas: Welcome back. Here's a nice, easy (low priority) review to get you started https://review.openstack.org/#/c/530924/ | 09:51 |
* bauzas opening a new tab | 09:52 | |
sean-k-mooney | gibi: oh cool i look forward to seeing it | 09:52 |
*** alexchadin has joined #openstack-nova | 09:57 | |
*** kosamara has quit IRC | 10:01 | |
*** cdent has quit IRC | 10:02 | |
*** kosamara has joined #openstack-nova | 10:03 | |
Dinesh_Bhor | sean-k-mooney: Hi, May I have your 5 min? | 10:04 |
sean-k-mooney | Dinesh_Bhor: hi, sure. what can i help with? | 10:04 |
Dinesh_Bhor | sean-k-mooney: we have a quota for instances currently. Actually we have some hypervisors which are specifically dedicated to "Rich VMs", so we want to have separate quotas for normal VMs and Rich VMs per project. | 10:06 |
Dinesh_Bhor | sean-k-mooney: do you think it's a good idea to submit a blueprint for this? Or can it be managed somehow with metadata? | 10:07 |
Dinesh_Bhor | Or something else maybe | 10:07 |
sean-k-mooney | Dinesh_Bhor: this is similar to preemptible instances. in that case we also had the idea of two classes of instance that may have different SLAs | 10:08 |
*** maciejjozefczyk has joined #openstack-nova | 10:08 | |
sean-k-mooney | Dinesh_Bhor: so if you were to submit a blueprint for this it may be worth trying to address that use case also so it's a more general solution. | 10:09 |
sean-k-mooney | Dinesh_Bhor: that said the rich vms | 10:09 |
sean-k-mooney | is their richness determined by the host they land on or is it an aspect of the flavor | 10:09 |
sean-k-mooney | e.g. certain flavors you would like to have a separate quota for. | 10:10 |
*** maciejjozefczyk has quit IRC | 10:10 | |
*** deepak_mourya_ has joined #openstack-nova | 10:11 | |
sean-k-mooney | Dinesh_Bhor: submitting a blueprint that clearly states what your use case and constraints are is never a bad idea, but that high-level use case is what we would like it to capture rather than "i think it should be done X way" | 10:11 |
Dinesh_Bhor | sean-k-mooney: we have dedicated rich flavors which land on predefined hypervisors. It's like giving a bare-metal kind of experience with Rich-VMs | 10:12 |
Dinesh_Bhor | sean-k-mooney: hosts are high in memory, cpus to give bare metal kind of performance. So we want Rich-Flavors to be deployed on those hosts and want to manage quota for them. | 10:14 |
Dinesh_Bhor | sean-k-mooney: similar to normal vms per project | 10:14 |
sean-k-mooney | Dinesh_Bhor: right, in that case rather than an instance quota, the ability to set a flavor quota for a specific flavor would be enough to suit your particular use case, correct? | 10:14 |
sean-k-mooney | you could then use flavor to host affinity to ensure those flavors got scheduled to the correct hosts | 10:15 |
Dinesh_Bhor | A project can have normal as well as rich vms | 10:15 |
Dinesh_Bhor | sean-k-mooney: yes, we are managing that with AggregateInstanceExtraSpecFilter | 10:16 |
Dinesh_Bhor | sean-k-mooney: we are on Mitaka so can not use placement. | 10:16 |
sean-k-mooney | Dinesh_Bhor: yes, when i said flavor quota i meant a project can have 10 instances of flavor X and any number of instances of flavors without a quota, provided they don't exceed other quotas such as cpus | 10:16 |
*** adrianc has joined #openstack-nova | 10:17 | |
sean-k-mooney | Dinesh_Bhor: the only non-invasive way i can conceive of to achieve this in mitaka would be to write a scheduler filter that would check your usage of the rich vms and fail all hosts if you exceed a quota | 10:18 |
Dinesh_Bhor | sean-k-mooney: yes, but for that I thought of storing "number of rich-vms allowed" metadata in the host aggregate per project, but that will again degrade the performance of the scheduler, I think, if we have 1000+ projects. | 10:21 |
Dinesh_Bhor | sean-k-mooney: okay, let me check the flavor quota thing. | 10:22 |
Dinesh_Bhor | first | 10:22 |
sean-k-mooney | Dinesh_Bhor: yes it would degrade the performance. | 10:22 |
sean-k-mooney | Dinesh_Bhor: i would speak to melwitt about this also. i believe she will be looking at how we can start to use the new keystone limits api in nova going forward. that might help with this use case | 10:23 |
Dinesh_Bhor | sean-k-mooney: yes, thank you so much | 10:24 |
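[editor's note] The scheduler-filter idea discussed above (fail all hosts once a project has used up its rich-VM quota) could look roughly like the sketch below. It is plain Python and does not use nova's real filter API; `Instance`, `filter_hosts`, `rich_flavors` and `rich_quota` are hypothetical names chosen for illustration, and the per-project quota is assumed to live in a simple dict rather than in host-aggregate metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical stand-in for an existing instance record."""
    project_id: str
    flavor: str


def filter_hosts(hosts, project_id, flavor, instances, rich_flavors, rich_quota):
    """Return the hosts that pass the (illustrative) rich-VM quota check.

    Requests for non-rich flavors pass every host through unchanged.
    Requests for rich flavors pass only while the project is below its
    quota; otherwise every host is rejected, as suggested in the chat.
    """
    if flavor not in rich_flavors:
        return list(hosts)

    # Count the rich VMs the project already owns.
    used = sum(
        1 for inst in instances
        if inst.project_id == project_id and inst.flavor in rich_flavors
    )
    # Assumption: projects with no entry get a quota of zero rich VMs.
    limit = rich_quota.get(project_id, 0)
    return list(hosts) if used < limit else []
```

Counting usage on every scheduling request is the part that was expected to hurt performance in the discussion; the sketch makes that cost visible as a full pass over `instances`.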
*** jaosorior has quit IRC | 10:25 | |
*** jaosorior has joined #openstack-nova | 10:27 | |
*** udesale has joined #openstack-nova | 10:33 | |
*** dave-mccowan has joined #openstack-nova | 10:35 | |
*** tbachman has quit IRC | 10:39 | |
*** udesale has quit IRC | 10:44 | |
*** udesale has joined #openstack-nova | 10:44 | |
*** claudiub has joined #openstack-nova | 10:45 | |
*** nicolasbock has joined #openstack-nova | 10:47 | |
*** erlon has joined #openstack-nova | 10:48 | |
*** ttsiouts has quit IRC | 10:56 | |
*** priteau has quit IRC | 11:00 | |
openstackgerrit | Rong Han proposed openstack/nova master: Reset global variable after unit test is completed. https://review.openstack.org/598088 | 11:00 |
*** Dinesh_Bhor has quit IRC | 11:03 | |
*** macza has joined #openstack-nova | 11:04 | |
*** priteau has joined #openstack-nova | 11:08 | |
*** macza has quit IRC | 11:09 | |
*** holser_ has quit IRC | 11:14 | |
*** holser_ has joined #openstack-nova | 11:15 | |
*** cdent has joined #openstack-nova | 11:15 | |
openstackgerrit | Merged openstack/nova master: Report client: Real get_allocs_for_consumer https://review.openstack.org/584599 | 11:17 |
*** cdent has quit IRC | 11:23 | |
*** tetsuro has joined #openstack-nova | 11:27 | |
*** jpena is now known as jpena|lunch | 11:27 | |
*** rnm has joined #openstack-nova | 11:34 | |
*** rnm is now known as rmart04 | 11:36 | |
openstackgerrit | Rong Han proposed openstack/nova master: Reset global variable after unit test is completed. https://review.openstack.org/598088 | 11:36 |
*** tetsuro has quit IRC | 11:38 | |
*** threestrands has quit IRC | 11:38 | |
*** rmart04 has quit IRC | 11:40 | |
*** rnm has joined #openstack-nova | 11:40 | |
*** rnm has quit IRC | 11:42 | |
*** rnm has joined #openstack-nova | 11:42 | |
*** ttsiouts has joined #openstack-nova | 11:43 | |
*** rnm is now known as rmart04 | 11:43 | |
*** cdent has joined #openstack-nova | 11:49 | |
*** gouthamr has quit IRC | 11:49 | |
*** alexchadin has quit IRC | 11:59 | |
*** tetsuro has joined #openstack-nova | 12:03 | |
bauzas | stephenfin: https://review.openstack.org/#/c/530924/6 needs a new oslo.config version, right? | 12:04 |
*** tetsuro has quit IRC | 12:07 | |
bauzas | stephenfin: nevermind, I can see it's from oslo.config 5.2.0 | 12:07 |
bauzas | https://docs.openstack.org/releasenotes/oslo.config/queens.html#relnotes-5-2-0-stable-queens | 12:07 |
bauzas | and nova uses the latest https://github.com/openstack/nova/blob/master/requirements.txt#L40 | 12:08 |
*** udesale has quit IRC | 12:08 | |
*** tetsuro has joined #openstack-nova | 12:08 | |
*** dpawlik has quit IRC | 12:08 | |
*** dpawlik has joined #openstack-nova | 12:09 | |
*** dpawlik has quit IRC | 12:10 | |
*** dpawlik has joined #openstack-nova | 12:10 | |
*** udesale has joined #openstack-nova | 12:11 | |
zigo | bauzas: Don't worry, Rocky doesn't even build with 5.2.0 anyway ... :) | 12:12 |
zigo | Once more, requirements are just plain wrong. | 12:12 |
zigo | As usual, I'd say... | 12:12 |
zigo | Some packages are using oslo_config.sphinxconfiggen which isn't available in oslo.config 5.2.0. | 12:15 |
zigo | networking-bagpipe for example. | 12:15 |
bauzas | stephenfin: heh, I found you a new Friday nick <finucannitbacktick> :p | 12:16 |
bauzas | zigo: that's a project related issue | 12:17 |
zigo | bauzas: Ok, you need another example ... | 12:17 |
bauzas | zigo: the reviewers should look at the needed oslo version when they merge a new feature | 12:17 |
zigo | bauzas: Neutron has 1700+ unit test failures with current lower bounds ! :) | 12:18 |
*** brinzhang has quit IRC | 12:20 | |
*** brinzhang has joined #openstack-nova | 12:21 | |
*** alexchadin has joined #openstack-nova | 12:28 | |
*** oanson has joined #openstack-nova | 12:30 | |
*** jpena|lunch is now known as jpena|off | 12:31 | |
*** jpena|off is now known as jpena | 12:32 | |
*** slaweq has quit IRC | 12:34 | |
*** slaweq has joined #openstack-nova | 12:34 | |
*** alexchadin has quit IRC | 12:38 | |
*** udesale has quit IRC | 12:40 | |
openstackgerrit | Jay Pipes proposed openstack/os-traits master: clean up CUDA traits https://review.openstack.org/597170 | 12:41 |
*** tbachman has joined #openstack-nova | 12:42 | |
stephenfin | bauzas: :D | 12:48 |
stephenfin | bauzas: People will eventually learn :) | 12:48 |
stephenfin | zigo: What do you mean, it doesn't build? | 12:49 |
*** vivsoni has quit IRC | 12:49 | |
stephenfin | bauzas: Also, we have oslo.config 6.1.0 in lower-constraints so I think we're all good there | 12:49 |
*** vivsoni_ has joined #openstack-nova | 12:49 | |
bauzas | yep | 12:50 |
*** mchlumsky has joined #openstack-nova | 12:50 | |
*** udesale has joined #openstack-nova | 12:53 | |
*** mriedem has joined #openstack-nova | 12:54 | |
mriedem | cdent: can you remind someone internally to re-propose the spec for this for stein? https://blueprints.launchpad.net/nova/+spec/vmware-live-migration | 12:56 |
mriedem | or you can if you want, it's just procedural | 12:56 |
mriedem | https://specs.openstack.org/openstack/nova-specs/readme.html#previously-approved-specifications | 12:57 |
cdent | mriedem: yeah, I think rado's gonna take care of it, he was on pto for a while | 12:58 |
*** tbachman has quit IRC | 12:59 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/rocky: Restart scheduler in TestNovaManagePlacementHealAllocations https://review.openstack.org/598152 | 12:59 |
brinzhang | mriedem: Could you please review this specs, https://review.openstack.org/#/c/591976/? If you have time :) | 12:59 |
mriedem | sure | 13:00 |
brinzhang | thanks ^^ | 13:00 |
*** tssurya has joined #openstack-nova | 13:01 | |
cdent | mriedem: is there any new insight on the allocation thing yet, or has that not come round on the radar yet? | 13:05 |
*** moshele has quit IRC | 13:06 | |
mriedem | i think i got a failing xen ci result last night after i knocked off for the day, was going to investigate this morning | 13:08 |
mriedem | 1 sip into coffee ... | 13:08 |
*** erlon has quit IRC | 13:08 | |
openstackgerrit | Chen proposed openstack/nova master: Fix filter server list by multiple vm or task states https://review.openstack.org/598154 | 13:09 |
*** mchlumsky has quit IRC | 13:10 | |
*** mchlumsky has joined #openstack-nova | 13:12 | |
*** brinzhang has quit IRC | 13:13 | |
*** alexchadin has joined #openstack-nova | 13:13 | |
mriedem | alex_xu: ^ looks like a behavior change | 13:16 |
mriedem | cdent: oh i remember now, i had to re-run the xen ci patch last night b/c it wasn't picking up my dependency in depends-on: <url> form b/c it's still using zuul v2 | 13:17 |
cdent | mriedem: fun! | 13:18 |
mriedem | the logs are there in the latest failed run though | 13:18 |
mriedem | http://dd6b71949550285df7dc-dda4e480e005aaa13ec303551d2d8155.r49.cf1.rackcdn.com/13/597613/2/check/dsvm-tempest-neutron-network/cc81140/logs/screen-n-cpu.txt.gz | 13:18 |
mriedem | looking at req-99d9d496-6720-4837-a2ee-560605fd1afe | 13:18 |
mriedem | naichuans: efried: ^ | 13:18 |
mriedem | Aug 29 16:56:06.926641 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: INFO nova.compute.resource_tracker [None req-99d9d496-6720-4837-a2ee-560605fd1afe None None] Using cpu_allocation_ratio 16.0 for node: 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 | 13:19 |
mriedem | Aug 29 16:56:06.926926 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: INFO nova.compute.resource_tracker [None req-99d9d496-6720-4837-a2ee-560605fd1afe None None] RT: Sending compute node inventory changes back toplacement for node: 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 | 13:19 |
mriedem | WAH WAH | 13:19 |
mriedem | Aug 29 16:56:06.965945 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: INFO nova.compute.provider_tree [None req-99d9d496-6720-4837-a2ee-560605fd1afe None None] Inventory has not changed in ProviderTree for provider: 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 | 13:19 |
mriedem | hmm, but then it says it does update inventory | 13:20 |
mriedem | Aug 29 16:56:07.057208 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: DEBUG nova.compute.provider_tree [None req-99d9d496-6720-4837-a2ee-560605fd1afe None None] Updating resource provider 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 generation from 0 to 1 during operation: update_inventory {{(pid=24436) _update_generation /opt/stack/new/nova/nova/compute/provider_tree.py:161}} Aug 29 16:56:07.057499 dsvm-devstack-citr | 13:20 |
mriedem | ia-nodepool-1379368 nova-compute[24436]: INFO nova.compute.provider_tree [None req-99d9d496-6720-4837-a2ee-560605fd1afe None None] Updating inventory in ProviderTree for provider 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 with inventory: {'VCPU': {'allocation_ratio': 16.0, 'total': 8, 'reserved': 0, 'step_size': 1, 'min_unit': 1, 'max_unit': 8}, 'MEMORY_MB': {'allocation_ratio': 1.5, 'total': 12795, 'reserved': 512, 'step_size': 1, | 13:20 |
mriedem | n_unit': 1, 'max_unit': 12795}, 'DISK_GB': {'allocation_ratio': 1.0, 'total': 47, 'reserved': 0, 'step_size': 1, 'min_unit': 1, 'max_unit': 47}} | 13:20 |
mriedem | there the allocation ratios are all correct | 13:20 |
mriedem | Aug 29 16:56:07.058213 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: DEBUG nova.scheduler.client.report [None req-99d9d496-6720-4837-a2ee-560605fd1afe None None] Updated inventory for 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 at generation 1: {'VCPU': {'allocation_ratio': 16.0, 'total': 8, 'reserved': 0, 'step_size': 1, 'min_unit': 1, 'max_unit': 8}, 'MEMORY_MB': {'allocation_ratio': 1.5, 'total': 12795, 'reserved | 13:21 |
mriedem | 12, 'step_size': 1, 'min_unit': 1, 'max_unit': 12795}, 'DISK_GB': {'allocation_ratio': 1.0, 'total': 47, 'reserved': 0, 'step_size': 1, 'min_unit': 1, 'max_unit': 47}} {{(pid=24436) _update_inventory_attempt /opt/stack/new/nova/nova/scheduler/client/report.py:965}} | 13:21 |
cdent | goes to zero at 16:58:05.613741 | 13:21 |
cdent | right after an "Inventory has not changed in ProviderTree for provider" | 13:23 |
openstackgerrit | Radoslav Gerganov proposed openstack/nova-specs master: VMware: add support for live migration https://review.openstack.org/598163 | 13:23 |
*** erlon has joined #openstack-nova | 13:24 | |
mriedem | Aug 29 16:58:05.483508 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: INFO nova.compute.resource_tracker [None req-a869fa19-aa9d-4335-9816-42ff29b64d48 None None] Using cpu_allocation_ratio 0.0 for node: 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 | 13:24 |
mriedem | yeah wtf | 13:24 |
mriedem | that's in the _normalize_inventory_from_cn_obj method | 13:25 |
cdent | I'm gonna go with "something is being side-effecty" | 13:25 |
mriedem | somehow the ComputeNode.cpu_allocation_ratio is getting persisted as 0.0 maybe? | 13:25 |
cdent | you added a log for that didn't you? | 13:25 |
mriedem | yes and i don't see either of them | 13:26 |
mriedem | https://review.openstack.org/#/c/597560/3/nova/objects/compute_node.py. | 13:26 |
cdent | w & the t & the actual f | 13:26 |
cdent | right before the correct inventory is sent we have this line "Using cpu_allocation_ratio 0.0 for node [...]". that value, I would guess, is somehow being used for the _next_ inventory | 13:31 |
mriedem | i noticed that also, | 13:32 |
cdent | we're getting update inventories within 2 ms of one another. first one right, second one wrong | 13:32 |
mriedem | we have this with the wrong value | 13:32 |
mriedem | Aug 29 16:58:05.483508 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: INFO nova.compute.resource_tracker [None req-a869fa19-aa9d-4335-9816-42ff29b64d48 None None] Using cpu_allocation_ratio 0.0 for node: 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 | 13:32 |
mriedem | and we have a good update here: | 13:32 |
mriedem | Aug 29 16:58:05.525151 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: INFO nova.compute.provider_tree [None req-a869fa19-aa9d-4335-9816-42ff29b64d48 None None] Updating inventory in ProviderTree for provider 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 with inventory: {u'VCPU': {u'allocation_ratio': 16.0, u'total': 8, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 8}, u'MEMORY_MB': {u'allocation_ratio': | 13:32 |
mriedem | , u'total': 12795, u'reserved': 512, u'step_size': 1, u'min_unit': 1, u'max_unit': 12795}, u'DISK_GB': {u'allocation_ratio': 1.0, u'total': 47, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 47}} | 13:32 |
mriedem | and then the bad update: | 13:32 |
mriedem | Aug 29 16:58:05.613741 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: INFO nova.compute.provider_tree [None req-a869fa19-aa9d-4335-9816-42ff29b64d48 None None] Updating inventory in ProviderTree for provider 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 with inventory: {'VCPU': {'allocation_ratio': 0.0, 'total': 8, 'reserved': 0, 'step_size': 1, 'min_unit': 1, 'max_unit': 8}, 'MEMORY_MB': {'allocation_ratio': 0.0, 'tot | 13:32 |
mriedem | 12795, 'reserved': 512, 'step_size': 1, 'min_unit': 1, 'max_unit': 12795}, 'DISK_GB': {'allocation_ratio': 0.0, 'total': 47, 'reserved': 0, 'step_size': 1, 'min_unit': 1, 'max_unit': 47}} | 13:32 |
*** jamesdenton has joined #openstack-nova | 13:38 | |
*** awaugama has joined #openstack-nova | 13:39 | |
*** davidsha has joined #openstack-nova | 13:40 | |
*** eharney has joined #openstack-nova | 13:40 | |
mriedem | hmm https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L622 | 13:42 |
mriedem | ^ we set the ComputeNode.cpu_allocation_ratio based on the config, which is 0.0 | 13:43 |
sean-k-mooney | mriedem: so it had the correct allocation ratio then it was changed to 0.0 | 13:43 |
mriedem | i bet that is the problem | 13:43 |
*** toabctl has joined #openstack-nova | 13:44 | |
mriedem | _copy_resources is called from _init_compute_node, | 13:45 |
cdent | mriedem: but none of that stuff is new is it? | 13:45 |
mriedem | and on initial create of the compute node record, the ComputeNode.create() method will call _from_db_object at the end and fix the 0.0 allocation ratio to the hard-coded one, | 13:45 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Transform compute_task notifications https://review.openstack.org/482629 | 13:45 |
mriedem | then in a periodic run, the cn already exists, we'll copy over the busted 0.0 allocations from config, but b/c we removed the _update calls there, we don't fix the allocation ratios | 13:46 |
cdent | ah, there's the rub | 13:46 |
mriedem | which is https://review.openstack.org/#/c/520024/ | 13:46 |
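[editor's note] The failure mode described above can be modelled in a few lines. This is a simplified sketch, not nova's code: `ComputeNodeSketch`, `copy_resources` and `periodic_update` are hypothetical stand-ins for the ComputeNode object's load-time fallback, `_copy_resources`, and the periodic resource tracker run. The fallback defaults (16.0, 1.5, 1.0) are the values visible in the pasted logs.

```python
# Fallback ratios applied when the stored value is unset (values from the logs).
DEFAULT_RATIOS = {"cpu": 16.0, "ram": 1.5, "disk": 1.0}


class ComputeNodeSketch:
    """Toy compute node object with a load-time fallback for ratios."""

    def __init__(self):
        self.ratios = {}

    def load_from_db(self, db_row):
        # A stored 0.0 or NULL means "not set", so fall back to the default.
        for key, default in DEFAULT_RATIOS.items():
            value = db_row.get(key)
            self.ratios[key] = default if value in (None, 0.0) else value


def copy_resources(node, conf):
    # Overwrites the in-memory ratios with whatever the config says,
    # including the 0.0 "use the default" sentinel.
    for key in DEFAULT_RATIOS:
        node.ratios[key] = conf[key]


def periodic_update(node, conf, resave):
    """One periodic run; `resave` models the removed save-and-reload step."""
    copy_resources(node, conf)
    if resave:
        # Going back through the load path re-applies the fallback.
        node.load_from_db(dict(node.ratios))
    return dict(node.ratios)
```

With `resave=False` the 0.0 sentinel from config leaks into the values that would be reported as inventory; with `resave=True` the fallback repairs them, which matches the before and after behaviour described in the chat.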
mriedem | but that doesn't explain how zigo was hitting this on rocky | 13:46 |
mriedem | or how we're *not* hitting this in the normal gate | 13:46 |
openstackgerrit | Merged openstack/nova master: (Re)start caching scheduler after starting computes in tests https://review.openstack.org/597606 | 13:47 |
cdent | does the normal gate set conf? | 13:47 |
mriedem | no | 13:47 |
sean-k-mooney | mriedem: is there a reason we do not set the defults here https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L413-L416 | 13:47 |
*** Bhujay has quit IRC | 13:47 | |
mriedem | http://logs.openstack.org/24/520024/9/check/tempest-full/cbd025d/controller/logs/etc/nova/nova-cpu_conf.txt.gz | 13:47 |
mriedem | sean-k-mooney: yes read the help text | 13:48 |
alex_xu | mriedem: yea, sounds like | 13:48 |
*** udesale has quit IRC | 13:49 | |
sean-k-mooney | mriedem: hum right... so we can signal to use the scheduler node's value | 13:49 |
mriedem | this was the change to make the defaults 0.0 https://github.com/openstack/nova/commit/4a9e14a7a73832b6b878160ba4a45f259d078d27 | 13:50 |
*** alexchadin has quit IRC | 13:50 | |
*** udesale has joined #openstack-nova | 13:50 | |
*** psachin has quit IRC | 13:51 | |
sean-k-mooney | mriedem: "That compat mode (having ratios defaulted to 0.0) is only planned to be kept for | 13:51 |
sean-k-mooney | Liberty and will be removed in the next release (Mitaka)" | 13:51 |
*** jistr is now known as jistr|call | 13:51 | |
sean-k-mooney | well that never happened | 13:52 |
mriedem | talk to bauzas | 13:52 |
*** _hemna has quit IRC | 13:54 | |
bauzas | I'm back | 13:54 |
sean-k-mooney | mriedem: it's just one of those things: if there is not an explicit TODO in the code you can grep for at the end of a release, it's easy to miss removing this stuff | 13:54 |
mriedem | in a few months i'll be able to remove all the req spec compat code | 13:54 |
mriedem | sean-k-mooney: there are todos | 13:54 |
mriedem | # TODO(sbauza): Remove that in the next major version bump where | 13:54 |
mriedem | # we break compatibility with old Liberty computes | 13:54 |
bauzas | I think the comments explained the 0.0 values | 13:54 |
bauzas | it's because of an upgrade concern between Liberty and Mitaka | 13:55 |
mriedem | https://github.com/openstack/nova/blob/master/nova/objects/compute_node.py#L186 | 13:55 |
mriedem | cdent: so i'm adding more debug logs to verify where i think this is breaking down and will get another xen run | 13:55 |
bauzas | it was for knowing whether the operator was modifying the options directly, or using the defaults | 13:55 |
bauzas | sean-k-mooney: ^ | 13:55 |
bauzas | (a signal) | 13:55 |
cdent | mriedem: sounds like a good plan | 13:55 |
sean-k-mooney | bauzas: yes the commit and comments make that clear but presumably we should have deleted it before rocky | 13:56 |
bauzas | indeed | 13:56 |
*** gbarros has joined #openstack-nova | 13:56 | |
sean-k-mooney | bauzas: i assume the reason that there was not an explicit opt-in bool flag was so that we did not need to modify the configs to get the new behavior if the operator had not overridden it before | 13:57 |
*** adrianc has quit IRC | 13:58 | |
bauzas | sean-k-mooney: it was because we changed so the options were per compute | 13:58 |
mriedem | in case you haven't noticed, the allocation ratio stuff is still biting us in the ass, so this isn't a very clear "now we can just remove stuff" case | 13:58 |
mriedem | i think jaypipes has at least 10 specs for dealing with this | 13:58 |
bauzas | mriedem: because we now use the ratio values directly without using the ComputeNode object | 13:59 |
mriedem | and this is such a mine field i'm afraid to change anything | 13:59 |
jaypipes | mriedem: yeah. :( | 13:59 |
mriedem | bauzas: not really | 13:59 |
mriedem | bauzas: this is where we get the allocation ratio to put into placement inventory https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L107 | 14:00 |
mriedem | if that's what you're referring to | 14:00 |
mriedem | which uses the compute node facade | 14:00 |
sean-k-mooney | the main issue we have at the moment is we have 2 different sets of defaults that get applied depending on the code path we take for the same value | 14:00 |
*** tetsuro has quit IRC | 14:00 | |
mriedem | and because of https://review.openstack.org/#/c/520024/ it looks like we've side-stepped the facade from breaking us | 14:00 |
*** mlavalle has joined #openstack-nova | 14:03 | |
bauzas | mriedem: I'm trying to understand the problems, I should look at jaypipes's specs | 14:03 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add debug logs for when provider inventory changes https://review.openstack.org/597560 | 14:03 |
mriedem | posted debug notes in https://bugs.launchpad.net/nova/+bug/1789654 | 14:04 |
openstack | Launchpad bug 1789654 in OpenStack Compute (nova) rocky "placement allocation_ratio initialized with 0.0" [High,Confirmed] | 14:04 |
mriedem | if the allocation ratio in the db record is 16.0 but the object value in the RT is 0.0, we know that _copy_resources is what's changing our in-memory value and we're not persisting the change | 14:05 |
sean-k-mooney | mriedem: well we dont actually want to persist the change to the db in this case | 14:06 |
*** Luzi has quit IRC | 14:06 | |
sean-k-mooney | the db has the correct default | 14:06 |
mriedem | the db values are likely actually NULL | 14:06 |
mriedem | which is what the compute node object keys off of | 14:07 |
bauzas | mriedem: so we directly change the object value without reading it from the DB thru the facade ? | 14:07 |
mriedem | https://github.com/openstack/nova/blob/f534495a427d1683bc536cf003ec02edbf6d8a45/nova/objects/compute_node.py#L194 | 14:07 |
sean-k-mooney | the db perhaps but the object that is constructed from the db entry gets defaulted correctly | 14:07 |
sean-k-mooney | yep that is the line i was thinking of | 14:07 |
mriedem | sean-k-mooney: yes but my theory is we're not "fixing" the allocation ratios within the object after setting the values to 0.0 | 14:07 |
mriedem | because of https://review.openstack.org/#/c/520024/ | 14:07 |
mriedem | zigo: is it possible that you have ^ in your nova package somehow? | 14:08 |
sean-k-mooney | right because we removed the update call which updates the resource tracker | 14:08 |
mriedem | zigo: iow, are your nova rocky packages based on stable/rocky or 18.0.0 tags rather than just pulling from master? | 14:08 |
sean-k-mooney | mriedem: we probably should have kept the self._update on line 574 | 14:09 |
mriedem | that wouldn't have helped us in this case, | 14:09 |
mriedem | that's for a nova-compute restart where the cn record already exists, | 14:09 |
mriedem | what we're hitting is the condition above | 14:10 |
bauzas | mriedem: just to be clear, https://github.com/openstack/nova/blob/f534495a427d1683bc536cf003ec02edbf6d8a45/nova/objects/compute_node.py#L199-L207 is only intended to be executed if on nova-scheduler | 14:10 |
bauzas | mriedem: because https://github.com/openstack/nova/blob/f534495a427d1683bc536cf003ec02edbf6d8a45/nova/objects/compute_node.py#L198 will always tell you a value if you're on nova-compute | 14:10 |
mriedem | umm | 14:10 |
mriedem | that will also always tell you a value if you're on nova-scheduler | 14:11 |
mriedem | b/c conf is global | 14:11 |
mriedem | and the value defaults to 0.0 in config | 14:11 |
bauzas | if executed in separate workers, CONF.cpu_allocation_ratio wouldn't be defined for nova-scheduler | 14:12 |
bauzas | oh wait, sec | 14:12 |
mriedem | it doesn't need to be defined in config, we have a default | 14:12 |
mriedem | which is global | 14:12 |
mriedem | anyway, that doesn't really matter for this bug | 14:13 |
mriedem | the compute reports the inventory to placement and is reporting 0.0 allocation ratios | 14:13 |
cdent | mriedem: one thing that remains unclear for me (because apparently I can't read python code) is why the second inventory PUT (the one with the 0.0) is happening at all (and so soon) | 14:14 |
*** adrianc has joined #openstack-nova | 14:14 | |
*** adrianc has quit IRC | 14:17 | |
bauzas | mriedem: I think your working theory you stated in the bug comment is valid | 14:18 |
bauzas | I'm trying to wrap my head around on exactly when we pull the DB values | 14:18 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Revert "Update resources once in update_available_resource" https://review.openstack.org/598176 | 14:19 |
*** munimeha1 has joined #openstack-nova | 14:19 | |
*** alexchadin has joined #openstack-nova | 14:20 | |
bauzas | mriedem: I think we began having problems with https://github.com/openstack/nova/commit/7b95b4d60726b6c8d0e0fe939c408a91ada79e0c | 14:21 |
bauzas | I'm not saying it's the cause of all the bugs | 14:21 |
*** jistr|call is now known as jistr | 14:21 | |
bauzas | just that it implicitly creates a dependency on us calling .save() after it | 14:21 |
bauzas | because if not, we would then have 0.0 values | 14:22 |
mriedem | well, it seems pretty obvious to me that we just shouldn't set 0.0 values on the compute node record in _copy_resources if the config is still just 0.0 | 14:23 |
bauzas | mriedem: sure, we could leave the defaults | 14:24 |
bauzas | because once we pull the object, the facade fixes it for free | 14:24 |
mriedem | except we're no longer pulling the object in the RT | 14:27 |
mriedem | which is i think the problem | 14:27 |
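
The idea mriedem raises above (skip copying a ratio into the compute node record while the config value is still the 0.0 "unset" signal) can be sketched roughly as follows. This is only an illustration: `FakeConf`, `FakeComputeNode`, and `copy_allocation_ratios` are hypothetical stand-ins, not nova's real classes or the actual `_copy_resources` code, and the non-zero values are sample numbers.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the global config; 0.0 means "operator did not set it".
@dataclass
class FakeConf:
    cpu_allocation_ratio: float = 0.0
    ram_allocation_ratio: float = 0.0
    disk_allocation_ratio: float = 0.0


# Hypothetical stand-in for a compute node record that already holds usable ratios.
@dataclass
class FakeComputeNode:
    cpu_allocation_ratio: float = 16.0
    ram_allocation_ratio: float = 1.5
    disk_allocation_ratio: float = 1.0


RATIO_FIELDS = (
    "cpu_allocation_ratio",
    "ram_allocation_ratio",
    "disk_allocation_ratio",
)


def copy_allocation_ratios(conf, compute_node):
    """Copy only ratios the operator actually set; leave the rest untouched."""
    for field in RATIO_FIELDS:
        value = getattr(conf, field)
        if value != 0.0:  # 0.0 is the "unset" signal, so do not overwrite with it
            setattr(compute_node, field, value)
    return compute_node
```

With this guard, an in-memory record would keep its existing ratios unless the operator overrides them, so nothing downstream would see 0.0 just because the config was left at its default.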
zigo | mriedem: Sure, let me add it. | 14:27 |
zigo | mriedem: Yes, I package tags. | 14:28 |
mriedem | zigo: you don't want to add it | 14:29 |
bauzas | mriedem: sorry, you mean we're getting the CPU, RAM and disk values out of the DB directly ? | 14:29 |
bauzas | I thought we were still getting by id ? | 14:29 |
mriedem | zigo: i'm asking because https://review.openstack.org/#/c/520024/ is in master only, not rocky, but it looks like the culprit of the failure - but that wouldn't then explain how you'd be failing in rocky | 14:29 |
bauzas | me looks at https://github.com/openstack/nova/blob/f534495a427d1683bc536cf003ec02edbf6d8a45/nova/compute/resource_tracker.py#L85 | 14:30 |
zigo | mriedem: I have rc2 packaged, not rc3. | 14:31 |
zigo | Maybe I should update? | 14:31 |
mriedem | yeah probably | 14:32 |
mriedem | or just the GA | 14:32 |
mriedem | which was released today? | 14:32 |
mriedem | https://github.com/openstack/nova/tree/18.0.0 | 14:32 |
mriedem | yeah 20 minutes ago http://git.openstack.org/cgit/openstack/nova/tag/?h=18.0.0 | 14:32 |
zigo | Ah, right ! | 14:35 |
zigo | Doing that. | 14:36 |
*** tbachman has joined #openstack-nova | 14:36 | |
cdent | I'm thinking that the positive way of looking at this is that the second _update has been masking a nasty bug for a long time | 14:36 |
* cdent takes more happy pills | 14:36 | |
*** _hemna has joined #openstack-nova | 14:39 | |
mriedem | oh i'm not surprised that there would be a big pile of tight coupling tape holding this all together | 14:41 |
mriedem | and why i didn't want to merge that change before the GA | 14:42 |
*** alexchadin has quit IRC | 14:42 | |
*** jchhatbar has quit IRC | 14:43 | |
*** vivsoni_ has quit IRC | 14:43 | |
*** alexchadin has joined #openstack-nova | 14:44 | |
*** lei-zh has joined #openstack-nova | 14:46 | |
*** r-daneel has joined #openstack-nova | 14:47 | |
bauzas | mriedem: again, the more I read, the more I think we possibly had the original problem once we had https://github.com/openstack/nova/commit/7b95b4d60726b6c8d0e0fe939c408a91ada79e0c | 14:49 |
bauzas | and .update() was just hiding it | 14:49 |
bauzas | mriedem: so, do you want me to modify the above and only set the values if not 0.0 ? | 14:50 |
bauzas | or are you working on this ? | 14:50 |
*** vivsoni has joined #openstack-nova | 14:51 | |
bauzas | disclaimer: looking at a long list of openstack-dev threads | 14:52 |
*** ttsiouts has quit IRC | 14:55 | |
mriedem | Kevin_Zheng: yikun: you might be interested in https://review.openstack.org/#/c/591976/ | 15:00 |
mriedem | bauzas: i'm working it | 15:00 |
bauzas | okay | 15:00 |
Kevin_Zheng | Got it | 15:01 |
*** ttsiouts has joined #openstack-nova | 15:01 | |
*** luksky11 has quit IRC | 15:02 | |
zigo | Nova 18.0.0 built, doing a recheck... | 15:02 |
openstackgerrit | Merged openstack/nova master: Fix soft deleting vm fails after "nova resize" vm https://review.openstack.org/546920 | 15:03 |
*** udesale has quit IRC | 15:04 | |
melwitt | . | 15:04 |
*** tbachman has quit IRC | 15:07 | |
*** pcaruana has quit IRC | 15:12 | |
*** r-daneel has quit IRC | 15:13 | |
*** r-daneel_ has joined #openstack-nova | 15:13 | |
melwitt | stephenfin: on https://review.openstack.org/595592, if the bug is new for rocky, how did moshele run into it on OSP13 (queens)? | 15:16 |
*** r-daneel_ is now known as r-daneel | 15:16 | |
stephenfin | melwitt: If he's using OSP, then it's because we backported it downstream | 15:16 |
stephenfin | Even if not, I'd imagine it's a feature backport | 15:16 |
*** dpawlik has quit IRC | 15:18 | |
melwitt | stephenfin: ohh, ok. I didn't think of that, backport of a feature | 15:18 |
mriedem | efried: does any kind of nova/cyborg integration actually exist to do anything with servers in rocky/ | 15:30 |
mriedem | ? | 15:30 |
mriedem | b/c https://docs.openstack.org/releasenotes/cyborg/rocky.html says it does | 15:31 |
*** macza has joined #openstack-nova | 15:31 | |
mriedem | 1) rocky release notes talk about queens specs | 15:31 |
mriedem | 2) it sounds like "we completed specs" rather than "we have functioning code" | 15:31 |
Kevin_Zheng | mriedem: you made me laugh | 15:32 |
mriedem | i just don't want people showing up in -nova saying "hey why can't i attach fpgas to my vm?" | 15:33 |
efried | mriedem: No, cyborg has nothing in nova atm | 15:38 |
efried | oh, yeah, that's poorly worded. | 15:38 |
Kevin_Zheng | mriedem: we were talking today, about the batch when listing patch | 15:41 |
Kevin_Zheng | The batch size could vary depending on sort key and dir | 15:42 |
Kevin_Zheng | Like for sort with uuid | 15:42 |
*** gyee has joined #openstack-nova | 15:42 | |
Kevin_Zheng | It could be evenly distributed | 15:43 |
dansmith | Kevin_Zheng: you mean the *optimal* batch size? | 15:43 |
Kevin_Zheng | dansmith: yeah | 15:43 |
dansmith | Kevin_Zheng: obviously if you're sorting by uuid it should be fairly evenly distributed and sorting by other things could be massively less well-distributed, but I'm not sure how we could (efficiently) optimize that at runtime | 15:44 |
dansmith | Kevin_Zheng: do you have ideas? | 15:44 |
Kevin_Zheng | I was thinking a tool that analyzes dB data | 15:45 |
Kevin_Zheng | And feed to nova periodically | 15:45 |
Kevin_Zheng | But that seems too much :) | 15:45 |
dansmith | yeah :) | 15:45 |
dansmith | it likely varies by tenant, sort_key, cloud layout, scheduler weights, etc | 15:46 |
Kevin_Zheng | Just came up when we introduced the new approach to our product team | 15:46 |
dansmith | it'd be hard to pin that down except for a single-tenant cloud I think | 15:46 |
sean-k-mooney | mriedem: today i believe you can use cyborg to program a pci device and then you can use nova to pass it through via a pci passthrough flavor alias but there is no way to force landing on the host with the device you just programmed | 15:46 |
Kevin_Zheng | I’d say it is a powerful tool:) | 15:46 |
dansmith | Kevin_Zheng: if the goal is to get larger batches from cells likely to have many results, we could do things like scale up the batch size each time you hit a cell again | 15:47 |
dansmith | Kevin_Zheng: so that if your query is likely to get most results from one cell, we get $batch_size, then $batch_size*2, then $batch_size*4, etc | 15:48 |
dansmith | but I think I'd want to see a benchmark showing that as worthwhile before I approved it, | 15:48 |
Kevin_Zheng | Hmm | 15:48 |
dansmith | because I expect that since the db query time is so small compared to the processing time, I'm not sure it matters that much (even your hyper-optimized batch sizing tool :) | 15:48 |
Kevin_Zheng | That could be a good way | 15:48 |
Kevin_Zheng | Yeah, they are just guessing as always | 15:49 |
*** luksky11 has joined #openstack-nova | 15:49 | |
dansmith | Kevin_Zheng: yeah :D | 15:49 |
dansmith | Kevin_Zheng: it's a common trap: One big optimization on batch size gives 60% improvement, so assume there are more 60% improvements to be gained through hyper-optimization :) | 15:50 |
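
The scale-up idea dansmith floats above ($batch_size, then $batch_size*2, then $batch_size*4 each time the same cell is hit again) could look something like this minimal sketch. The function names and the `max_batch_size` cap are assumptions added for illustration; nothing here is taken from nova's actual instance-listing code.

```python
def next_batch_size(base_batch_size, hits, max_batch_size=1000):
    """Batch size for a cell that has already been queried `hits` times.

    The size doubles on every repeat visit and is capped at max_batch_size
    (the cap is an assumption, not something discussed in the channel).
    """
    return min(base_batch_size * (2 ** hits), max_batch_size)


def batch_sizes(base_batch_size, queries, max_batch_size=1000):
    """Sizes used for the first `queries` visits to a single cell."""
    return [
        next_batch_size(base_batch_size, n, max_batch_size)
        for n in range(queries)
    ]
```

As dansmith notes, whether this is worth doing would depend on a benchmark, since the db query time may be small compared to the processing time.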
openstackgerrit | Merged openstack/nova-specs master: VMware: add support for live migration https://review.openstack.org/598163 | 15:50 |
mriedem | leave some optimizations for the enterprise fellas | 15:59 |
*** hamzy has quit IRC | 16:00 | |
mriedem | gibi: you were +2 on this before i robustified the test per mel's prodding https://review.openstack.org/#/c/588943/ | 16:01 |
Kevin_Zheng | maybe left some place for them to be able to do that, like a call to my powerful tool backend :P | 16:03 |
mriedem | is the toronto lab already working on that? | 16:04 |
mriedem | research people gotta get grant money somehow | 16:05 |
sean-k-mooney | mriedem: i will likely be fixing a few things in cyborg in the near future. do you want me to fix the release note regarding nova integration | 16:05 |
mriedem | need i remind everybody https://www.openstack.org/videos/vancouver-2018/revisiting-scalability-and-applicability-of-openstack-placement-1 | 16:05 |
mriedem | sean-k-mooney: i guess? | 16:05 |
mriedem | revising release notes is sometimes a tricky business | 16:06 |
sean-k-mooney | well we update specs retroactively i dont really see release notes as any different | 16:06 |
mriedem | because release notes are built from git history | 16:07 |
mriedem | specs are not | 16:07 |
stephenfin | sean-k-mooney: Any reason we don't squash these? https://review.openstack.org/#/q/topic:bug/1759420+(status:open+OR+status:merged) | 16:07 |
sean-k-mooney | stephenfin: i wanted to specifically demonstrate that the behavior was wrong | 16:07 |
sean-k-mooney | other than that no | 16:07 |
Kevin_Zheng | No, Xian lab can work on that:) | 16:08 |
stephenfin | sean-k-mooney: I'm guessing if we reverted the functional part then we'd see the test fail, right? Any chance you could squash them? | 16:08 |
*** dpawlik has joined #openstack-nova | 16:09 | |
sean-k-mooney | stephenfin: sure but i need to go get my car NCT tested so ill do it later this evening/tomorrow | 16:10 |
stephenfin | sean-k-mooney: all good | 16:10 |
sean-k-mooney | anything else you want me to change while im doing it? | 16:10 |
sean-k-mooney | stephenfin: i might add mel's notes as comments too | 16:11 |
mriedem | mdbooth: are you ok with the wording here? https://review.openstack.org/#/c/596492/ | 16:11 |
sean-k-mooney | anyway got to run. | 16:12 |
*** alexchadin has quit IRC | 16:13 | |
*** dpawlik has quit IRC | 16:13 | |
cdent | sean-k-mooney: my MOT (which I guess is the same thing) is tomorrow and it's almost certainly going to fail | 16:14 |
*** tbachman has joined #openstack-nova | 16:15 | |
stephenfin | sahid: I've still got open comments on https://review.openstack.org/#/c/532168/ | 16:18 |
*** ccamacho has quit IRC | 16:19 | |
*** gaoyan has joined #openstack-nova | 16:27 | |
stephenfin | lyarwood: Can I move this to MODIFIED too? I'm not sure what the process is for non-hotfixes as I didn't have to kick off any builds myself https://bugzilla.redhat.com/show_bug.cgi?id=1187945 | 16:27 |
openstack | bugzilla.redhat.com bug 1187945 in openstack-nova "[RFE] Take into account NUMA locality of physical NICs when plugging instance VIFS from Neutron networks" [Urgent,Post] - Assigned to sfinucan | 16:27 |
mnaser | so i never ended up doing the full clean up from the stale cell stuff | 16:29 |
*** eharney has quit IRC | 16:29 | |
mnaser | but if i have instances with an instance_mapping entry, no build_request, they don't exist in any cells (cell0 or anything else), i can just drop the instance_mapping entry to get rid of it from the listing? | 16:29 |
*** gaoyan has quit IRC | 16:29 | |
lyarwood | stephenfin: ^_^ | 16:29 |
dansmith | mnaser: yeah | 16:30 |
dansmith | mnaser: that should be the case for any instances you've deleted and then purged from the db | 16:30 |
dansmith | if you've done that | 16:30 |
*** moshele has joined #openstack-nova | 16:30 | |
dansmith | recently archive started nuking the BR at least | 16:30 |
mnaser | dansmith: yeah they're not even purged, cell_id = NULL too | 16:30 |
dansmith | not sure about the mapping | 16:30 |
dansmith | oh okay well, if they're really gone there's no need for the mapping | 16:30 |
stephenfin | lyarwood: 🙈 | 16:30 |
mnaser | this was a whole thing related to the adding entries into nova_api in a single transaction | 16:31 |
mnaser | which i think i put a patch that i *think* works but i dont know how to test that it works in a single transaction | 16:31 |
*** sambetts is now known as sambetts|afk | 16:31 | |
mnaser | https://review.openstack.org/#/c/586824/1 was supposed to be backportable interim solution to avoid listing stuff that shouldnt be there and https://review.openstack.org/#/c/586742/2 was the more fundamental fix but i havent had time to look over them more | 16:32 |
*** gbarros has quit IRC | 16:35 | |
*** bnemec has quit IRC | 16:35 | |
*** lei-zh has quit IRC | 16:37 | |
*** rmart04 has quit IRC | 16:41 | |
*** gbarros has joined #openstack-nova | 16:41 | |
melwitt | sahid: your review would be appreciated on this bug fix for handling disk_bus for root disk https://review.openstack.org/584999 | 16:42 |
sahid | stephenfin: surprising that I did not notice them :) | 16:42 |
sahid | melwitt: sure i will do that | 16:43 |
melwitt | thanks | 16:43 |
*** r-daneel_ has joined #openstack-nova | 16:47 | |
*** r-daneel has quit IRC | 16:47 | |
*** r-daneel_ is now known as r-daneel | 16:47 | |
*** gouthamr has joined #openstack-nova | 16:49 | |
*** gbarros has quit IRC | 16:49 | |
*** sahid has quit IRC | 16:55 | |
*** ccamacho has joined #openstack-nova | 16:58 | |
*** ttsiouts has quit IRC | 16:59 | |
*** davidsha has quit IRC | 16:59 | |
*** ttsiouts has joined #openstack-nova | 17:00 | |
*** dtantsur is now known as dtantsur|afk | 17:04 | |
*** ttsiouts has quit IRC | 17:04 | |
*** mriedem is now known as mriedem_afk | 17:11 | |
*** bnemec has joined #openstack-nova | 17:17 | |
cfriesen | in nova/compute/flavors.py we call "from nova.api.validation import parameter_types". This appears to be really expensive (~6 seconds in a recent test) due to the regex stuff. One possibility would be to do the import right before the flavor creation so that it doesn't impact all nova processes. Thoughts? | 17:19 |
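
cfriesen's suggestion above amounts to a function-local (deferred) import, so that only the code path that needs the expensive module pays for loading it. A minimal sketch of the pattern follows; the stdlib `string` module is only a cheap stand-in for `nova.api.validation.parameter_types`, and `is_valid_flavor_name` with its character rule is hypothetical, not nova's real validation.

```python
def is_valid_flavor_name(name):
    """Validate a flavor name, loading the validation module only when called."""
    # Deferred import: Python runs this on the first call and then reuses the
    # cached module from sys.modules, so processes that never create flavors
    # never pay the import cost.
    import string  # stand-in for the expensive module in the discussion

    # Hypothetical rule, purely for illustration.
    allowed = set(string.ascii_letters + string.digits + "._- ")
    return bool(name) and all(ch in allowed for ch in name)
```

The trade-off is that the first flavor creation in a process would absorb the import delay instead of process startup.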
openstackgerrit | Merged openstack/nova master: reshaper gabbit: Nix comments re doubled max_unit https://review.openstack.org/597220 | 17:21 |
*** ccamacho has quit IRC | 17:24 | |
*** eharney has joined #openstack-nova | 17:25 | |
*** jpena is now known as jpena|away | 17:27 | |
*** hamzy has joined #openstack-nova | 17:28 | |
openstackgerrit | Merged openstack/nova master: Fix race condition in reshaper handler https://review.openstack.org/596497 | 17:35 |
*** holser_ has quit IRC | 17:35 | |
sean-k-mooney | melwitt: im just back, am would you like me to squash those two disk bus patches together? | 17:37 |
sean-k-mooney | melwitt: i used the functional regression style partly to prove to myself that the test case was correct since you pointed out my original test case worked without the patch applied | 17:38 |
melwitt | sean-k-mooney: not right now, maybe only if you need to respin. I don't have a strong opinion about it, just pointing it out | 17:38 |
melwitt | yeah, understood | 17:38 |
sean-k-mooney | cdent: just got back and ya MOT and NCT are basically the same. | 17:42 |
sean-k-mooney | cdent: happily in my case it passed the second time. | 17:43 |
sean-k-mooney | cdent: that said i dont drive my car enough i have only done 8000KM/5000 miles in the last two years... | 17:44 |
cdent | sean-k-mooney: my car is 21 years old. the emissions check is going to be an issue, I fear. Apparently the trick is to take it in to the test good and hot after racing around like a crazy person | 17:44 |
*** jamesdenton has quit IRC | 17:44 | |
sean-k-mooney | cdent: if its 21 years old it should qualify as a vintage car now right? | 17:45 |
cdent | hmm, that's a good point. | 17:45 |
* cdent checks | 17:45 | |
sean-k-mooney | i cant remember what the cut off is in ireland but there is an emissions cut off at some point in ireland where provided you have converted from lead based fuel to unleaded the co2 emissions are basically ignored | 17:46 |
openstackgerrit | Merged openstack/nova master: Report client: get_allocations_for_provider_tree https://review.openstack.org/584648 | 17:46 |
*** eharney has quit IRC | 17:46 | |
*** tssurya has quit IRC | 17:46 | |
cdent | sean-k-mooney: it looks like it may be 40 years here :( | 17:47 |
sean-k-mooney | cdent: well hopefully it will keep running that long :) | 17:50 |
cdent | i can only try | 17:53 |
cdent | sean-k-mooney: in other vaguely related to sean-k-mooney news: I'm sending in my application for an irish passport today | 17:53 |
sean-k-mooney | oh. cutting it a little close with brexit no? | 17:54 |
cdent | i had to get a hold of my mother's birth certificate | 17:54 |
*** r-daneel has quit IRC | 17:55 | |
*** gyee has quit IRC | 17:56 | |
sean-k-mooney | ya i love that to get a passport which is meant to be the most secure id you can get in the country you need a copy of your birth cert which is the only id i have that cant even be used to buy alcohol | 17:56 |
*** eharney has joined #openstack-nova | 17:56 | |
cdent | \o/ | 17:56 |
sean-k-mooney | i know they use it in theory to prove that you or your parent are entitled to citizenship in this case but still | 17:56 |
*** moshele has quit IRC | 17:58 | |
*** jamesdenton has joined #openstack-nova | 17:59 | |
*** r-daneel has joined #openstack-nova | 17:59 | |
sean-k-mooney | ok time for food. laters o/ | 18:00 |
*** r-daneel_ has joined #openstack-nova | 18:08 | |
*** r-daneel has quit IRC | 18:09 | |
*** r-daneel_ is now known as r-daneel | 18:09 | |
*** tzumainn has joined #openstack-nova | 18:18 | |
melwitt | dansmith or jaypipes: could one of you hit this to move rocky implemented specs? https://review.openstack.org/592622 | 18:19 |
*** moshele has joined #openstack-nova | 18:26 | |
openstackgerrit | Merged openstack/nova-specs master: Move rocky implemented specs https://review.openstack.org/592622 | 18:36 |
cfriesen | sean-k-mooney: stephenfin: either of you care to take a look at review.openstack.org/588657 ? not my patch, I just think it's useful and it's stalled | 18:38 |
*** toabctl has quit IRC | 18:51 | |
tzumainn | hi! I'm working with ironic, and running into an issue where, after enrolling baremetal nodes, I can see them in the compute_nodes database table but they never get processed or whatever and show up when I run 'openstack hypervisor list' | 19:00 |
tzumainn | the nova-compute.log does have this error, which is suspicious: | 19:00 |
tzumainn | 018-08-30 17:00:51.142 7 ERROR nova.compute.manager [req-73ba9d4b-b51d-4ab7-88c8-5fc3f27fd89e - - - - -] Error updating resources for node 0e57\ | 19:00 |
tzumainn | 05cc-e872-49aa-aff4-1a91278b5cb3.: NotImplementedError: Cannot load 'id' in the base class | 19:00 |
tzumainn | 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager Traceback (most recent call last): | 19:00 |
tzumainn | 2018-08-30 17:00:51.142 7 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7729, in _update_av\ | 19:00 |
tzumainn | ailable_resource_for_node | 19:00 |
tzumainn | anyone have experience with this? sorry if the questions are vague, I'm a bit new to this | 19:00 |
*** moshele has quit IRC | 19:07 | |
*** toabctl has joined #openstack-nova | 19:07 | |
*** awaugama has quit IRC | 19:08 | |
*** mriedem has joined #openstack-nova | 19:11 | |
*** mriedem_afk has quit IRC | 19:12 | |
*** r-daneel has quit IRC | 19:17 | |
mriedem | melwitt: jungleboyj: is there a specific nova/cinder etherpad for the ptg? i just see a topic section for cinder on thursday in the nova ptg | 19:17 |
melwitt | mriedem: not that I know of. I suggested people add topics on the nova ptg etherpad in the section in my email to the ML | 19:18 |
melwitt | we can have a separate etherpad to link to if you want | 19:18 |
mriedem | ack | 19:21 |
openstackgerrit | Eric Fried proposed openstack/nova master: Fix reshaper report client functonal test nits https://review.openstack.org/598330 | 19:21 |
melwitt | tzumainn: can you pastebin the full traceback? | 19:22 |
tzumainn | melwitt, it's at http://pastebin.test.redhat.com/639596 | 19:23 |
melwitt | thanks | 19:25 |
mriedem | how about a public paste? | 19:27 |
mriedem | paste.openstack.org or gist.github.com | 19:28 |
tzumainn | whoops, sorry! | 19:28 |
tzumainn | melwitt, http://paste.openstack.org/show/729177/ | 19:29 |
melwitt | the error is saying there's the 'id' field missing from the ComputeNode object, which means it wasn't created/obtained from the database (where the 'id' field comes from). but in the code, I see a cn.create() before _setup_pci_tracker is called, so 'id' should be populated | 19:29 |
mriedem | cdent: vmware ci might be hitting the same scheduling issues? http://207.189.188.190/logs/16/270116/12/check-vote/ext-nova-zuul/7a47690/ | 19:29 |
mriedem | lots of novalidhost in there | 19:30 |
mriedem | that's on the live migration for vmware change | 19:30 |
melwitt | tzumainn: what release is this? | 19:30 |
tzumainn | melwitt, this is rocky | 19:30 |
melwitt | ok | 19:30 |
mriedem | i was going to say https://review.openstack.org/#/c/520024/ but that's not in rocky | 19:31 |
melwitt | heh, that patch again | 19:33 |
mriedem | it's in the same code path | 19:34 |
*** dpawlik has joined #openstack-nova | 19:34 | |
mriedem | by the time we call _setup_pci_tracker there in that block on L563 we should have an existing instance, either created from the RT or pulled from the DB | 19:36 |
mriedem | *existing compute node record | 19:36 |
melwitt | yeah, according to the trace, the ComputeNode object in self.compute_nodes has no 'id' field populated | 19:36 |
melwitt | so there must be a way we're adding things to self.compute_nodes that are object shells, not gotten from the DB or newly created | 19:37 |
*** jpena|away is now known as jpena|off | 19:38 | |
*** awaugama has joined #openstack-nova | 19:40 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Default AZ for instance if cross_az_attach=False and checking from API https://review.openstack.org/469675 | 19:41 |
cdent | thanks mriedem will do some poking and prodding | 19:49 |
*** Sundar has joined #openstack-nova | 19:49 | |
openstackgerrit | Chris Dent proposed openstack/nova master: VMware: Live migration of instances https://review.openstack.org/270116 | 19:49 |
mriedem | tzumainn: which virt driver? libvirt? ironic? | 19:52 |
mriedem | and are you doing anything when this happens? | 19:52 |
melwitt | it's ironic | 19:52 |
mriedem | like, is this on start of nova-compute or during a periodic task? | 19:52 |
mriedem | hmmm, is a rebalance happening? | 19:52 |
tzumainn | mriedem, ah, this is ironic - I've just enrolled four nodes, and am trying to figure out why they don't show up in 'openstack hypervisor list' | 19:53 |
mriedem | they won't show up in openstack hypervisor list until you've "discovered" them | 19:53 |
mriedem | see the nova-manage cell_v2 discover_hosts command | 19:54 |
mriedem | you'll need to discover by service | 19:54 |
*** hamzy has quit IRC | 19:54 | |
*** r-daneel has joined #openstack-nova | 19:55 | |
tzumainn | ah, okay! I wasn't aware - I'm following the instructions in http://tripleo.org/install/advanced_deployment/baremetal_overcloud.html which I guess are out of date | 19:55 |
*** hamzy has joined #openstack-nova | 19:55 | |
melwitt | the traceback was unrelated then. still don't see how the condition of no 'id' on a ComputeNode in self.compute_nodes can happen (though it obviously can happen somehow) | 19:56 |
*** ttsiouts has joined #openstack-nova | 19:56 | |
mriedem | yeah i don't really know how that's being hit | 19:56 |
mriedem | tzumainn: no idea re tripleo deployment, they have their own irc channel for that | 19:57 |
melwitt | remove_node removes things from the dict if orphaned, all of the setting of self.compute_nodes seem to be covered by actual DB gets or creates. weird | 19:57 |
mriedem | plus like 50 red hat cores | 19:57 |
tzumainn | mriedem, haha, yep - I started out talking with some ironic folks, and confusion all around has led me here : ) | 19:58 |
tzumainn | thanks for the information, I really appreciate it! | 19:58 |
mriedem | maybe related to https://review.openstack.org/#/c/587922/ ? | 19:59 |
jungleboyj | mriedem: I don't have a separate one. Asked people to mark the topics and then I was going to collect them up. | 20:00 |
mriedem | remove_node should likely be in the same semaphore as _update_available_resource_for_node | 20:00 |
mriedem | i guess i already said that https://review.openstack.org/#/c/587922/2/nova/compute/resource_tracker.py | 20:00 |
mriedem | so self.old_resources will default a ComputeNode object if an entry isn't in the dict... | 20:01 |
zigo | mriedem: Should I try your patch at https://review.openstack.org/#/c/598176/ and report the result? | 20:02 |
melwitt | does self.compute_nodes refer to self.old_resources at all? | 20:02 |
mriedem | they are compared in _resource_change | 20:03 |
mriedem | to determine if we should call ComputeNode.save() | 20:03 |
mriedem | zigo: we aren't going to ship that revert i don't think so probably would be a waste of your time | 20:03 |
zigo | mriedem: If it's only a temporary fix that I can use to validate all of Rocky, that's nice already, then I can still remove the patch... | 20:04 |
zigo | Hum... | 20:04 |
zigo | It doesn't apply at all anyway. | 20:04 |
zigo | mriedem: This wasn't in rocky. | 20:05 |
mriedem | cdent: melwitt: was also wondering if this somehow is contributing to the allocation ratio bug https://review.openstack.org/#/c/518294/ | 20:05 |
mriedem | but that was in queens | 20:05 |
mriedem | zigo: right | 20:05 |
*** mchlumsky has quit IRC | 20:05 | |
melwitt | ack | 20:05 |
zigo | mriedem: Could you see something with the added logs in https://review.openstack.org/#/c/597175/ ? | 20:06 |
zigo | Both your patches were added there ... | 20:06 |
zigo | (the ones for logging...) | 20:06 |
mriedem | i added more debug logs this morning after a recreate in the xen ci, was just about to check those results | 20:08 |
mriedem | tzumainn: you might want to report a nova bug regardless so we don't lose track of what you hit | 20:08 |
mriedem | i'm not sure *how* you're hitting it, but that seems to be the theme this week with all bugs | 20:08 |
mriedem | "how in the hell is this even possible?" | 20:09 |
tzumainn | mriedem, hahaha - okay, I'll do that, thanks! | 20:09 |
openstackgerrit | Merged openstack/nova master: Report client: _reshape helper, placement min bump https://review.openstack.org/585034 | 20:09 |
mriedem | dansmith: is it weird that ComputeNode.save() doesn't check to see if 'id' is changed? | 20:10 |
mriedem | or not set? | 20:10 |
dansmith | create or save? | 20:10 |
mriedem | i guess if it weren't set we'd blow up | 20:10 |
mriedem | save | 20:10 |
melwitt | I was thinking save() would just fail if there's no id, because then how could it find the thing to update | 20:10 |
dansmith | not sure we usually check that it's set on save, but we could | 20:10 |
dansmith | on create we usually check to avoid re-create | 20:11 |
mriedem | melwitt: right we'd blow up if id wasn't set on save | 20:11 |
melwitt | yeah | 20:11 |
mriedem | yeah we check on create | 20:11 |
mriedem | i was wondering if id changed though and we saved | 20:11 |
mriedem | i guess we don't really check for that anywhere | 20:11 |
dansmith | we pop id out of the changes, | 20:11 |
melwitt | even if we did, that doesn't explain the complete lack of an 'id' on a ComputeNode object in self.compute_nodes | 20:12 |
dansmith | but I guess we try to save anyway even if that was the only thing | 20:12 |
melwitt | I still don't see how we could lose that, even with the self.old_resources compare | 20:12 |
dansmith | if id is actually not set then it's either an object created with no id (not from the db) or someone del'd it off an object | 20:13 |
mriedem | i don't either, i was wondering if we were getting a blank ComputeNode from old_resources which uses defaultdict and somehow shoved that blank one into self.compute_nodes | 20:13 |
mriedem | but we don't | 20:13 |
melwitt | yeah | 20:13 |
mriedem | there is a very small window where self.compute_nodes could have a ComputeNode in it without an id | 20:16 |
mriedem | in _init_compute_node | 20:16 |
mriedem | self.compute_nodes[nodename] = cn | 20:16 |
mriedem | cn.create() | 20:16 |
mriedem | b/c cn.create() is what sets the id | 20:16 |
mriedem | but, that code is all in a lock on the same host when we call update_available_resource | 20:16 |
mriedem | so not sure how anything could race and hit that | 20:16 |
melwitt | yeah, I wondered about that | 20:16 |
mriedem | tzumainn: did you by chance have multiple nova-compute services running on the same host? | 20:17 |
mriedem | no that still wouldn't do this | 20:17 |
mriedem | b/c the compute nodes map is in memory | 20:18 |
mriedem | i give up | 20:18 |
tzumainn | mriedem, I only have one, in any case | 20:18 |
tzumainn | mriedem, no worries, thanks for taking a look : ) | 20:18 |
mriedem | so uh, the xen ci passed this time with the debug patch https://review.openstack.org/#/c/597613/ | 20:20 |
mriedem | that's nice and consistent | 20:20 |
*** erlon has quit IRC | 20:21 | |
melwitt | great | 20:23 |
*** eharney has quit IRC | 20:24 | |
*** mugsie has quit IRC | 20:25 | |
melwitt | keeping in the theme of Impossible Bug Week | 20:26 |
Sundar | melwitt: I am proposing a session for Cyborg/Nova to sort out os-acc. The main session could be in Cyborg time (Mon/Tue) and hopefully placement folks would attend. It may be helpful to get some time on Nova schedule as well to get everybody on board. What do you think? | 20:26 |
cdent | sorry mriedem, I'm sort of afk: if it was that patch from back in queens, then the twisted narrow passageways are way twisted and have a lot of explaining to do to say why it's only showing up now | 20:26 |
*** tzumainn has quit IRC | 20:27 | |
melwitt | Sundar: do you think we need two sessions? we could just join the monday or tuesday session since nova doesn't start until wednesday | 20:29 |
cdent | mriedem: can you think of a way we could model the issue in a functional test? the feedback latency is hurting my head | 20:29 |
mriedem | cdent: given the xen ci is now passing, i don't really know | 20:30 |
mriedem | and why this isn't failing with the libvirt driver in master CI, again don't know | 20:31 |
cdent | fun fun | 20:31 |
mriedem | efried: we now have a new ERROR in the n-cpu logs on startup http://logs.openstack.org/98/584598/21/check/tempest-full/85acbda/controller/logs/screen-n-cpu.txt.gz?level=TRACE#_Aug_29_21_43_10_675029 | 20:31 |
mriedem | efried: b/c we hit ^ before the resource provider is created | 20:31 |
mriedem | new from https://review.openstack.org/#/c/584598/ | 20:31 |
dansmith | hmm, I got that disk not found one the other day locally | 20:32 |
dansmith | I thought it was residue from my old machine | 20:32 |
dansmith | is that because we're re-raising too? | 20:33 |
mriedem | https://bugs.launchpad.net/nova/+bug/1789998 | 20:34 |
openstack | Launchpad bug 1789998 in OpenStack Compute (nova) "ResourceProviderAllocationRetrievalFailed ERROR log message on fresh n-cpu startup" [Low,Triaged] | 20:34 |
mriedem | the DiskNotFound during periodic is usually b/c we're deleting an instance on that host at the same time | 20:34 |
mriedem | http://logs.openstack.org/98/584598/21/check/tempest-full/85acbda/controller/logs/screen-n-cpu.txt.gz#_Aug_29_21_51_13_055225 | 20:35 |
dansmith | okay this was on startup for me, which prevented it from ever reporting initial inventory to placement | 20:35 |
mriedem | right before the DiskNotFound | 20:35 |
mriedem | Aug 29 21:51:13.055225 ubuntu-xenial-rax-iad-0001643010 nova-compute[16853]: INFO nova.virt.libvirt.driver [None req-af17b138-d942-4bbf-bf67-4d03b492ed3c tempest-ImagesTestJSON-1980657675 tempest-ImagesTestJSON-1980657675] [instance: 21227895-e216-463f-8e76-998fe637bab2] Deletion of /opt/stack/data/nova/instances/21227895-e216-463f-8e76-998fe637bab2_del complete | 20:35 |
dansmith | but I had what I think was a stale disk image in my instances directory | 20:35 |
mriedem | Aug 29 21:51:13.167422 ubuntu-xenial-rax-iad-0001643010 nova-compute[16853]: ERROR nova.compute.manager [None req-6f4285cc-ce12-4c29-b879-99fdaaae59be None None] Error updating resources for node ubuntu-xenial-rax-iad-0001643010.: DiskNotFound: No disk at /opt/stack/data/nova/instances/21227895-e216-463f-8e76-998fe637bab2/disk | 20:35 |
dansmith | nuking that fixed it, but I just had to do one more thing and then killed it all | 20:35 |
mriedem | yeah on startup would be a different weirdness | 20:35 |
Sundar | melwitt: If we can settle everything in one session, that would be great. Given the quantum of reviews, we want to make sure that there is enough convergence to close the spec https://review.openstack.org/#/c/577438 after the PTG. | 20:37 |
Sundar | Could we keep a 30-minute placeholder? | 20:38 |
*** mugsie has joined #openstack-nova | 20:39 | |
*** tzumainn has joined #openstack-nova | 20:40 | |
mriedem | so on the 2nd go around in the RT.update_available_resource in this xen CI run, we see the RT say the tracked compute node has changed: | 20:40 |
mriedem | http://paste.openstack.org/show/729180/ | 20:40 |
mriedem | b/c of _copy_resources updating the *_allocation_ratio values to 0.0 | 20:40 |
mriedem | Aug 30 08:03:09.458344 dsvm-devstack-citrix-lon-nodepool-1379396 nova-compute[24292]: INFO nova.compute.resource_tracker [None req-9b1b9924-b89e-4a03-9a69-c9fff17594e3 None None] ComputeNode.cpu_allocation_ratio changed from 16.0 to 0.0 in _copy_resources. | 20:42 |
*** harlowja has joined #openstack-nova | 20:43 | |
melwitt | Sundar: 30-minute placeholder for monday or tuesday? | 20:44 |
melwitt | Sundar: also, I think we should have the nova bits of the interaction proposed to nova-specs for review, that way we can more easily organize our review on that part | 20:46 |
*** tbachman has quit IRC | 20:46 | |
Sundar | 30 min in Nova's schedule, anytime you want. Cyborg should allocate at least 1 hour on mon/tue. | 20:46 |
*** rmart04 has joined #openstack-nova | 20:47 | |
Sundar | The os-acc spec as a whole is Nova/Cyborg, so do we want another spec? | 20:47 |
melwitt | Sundar: okay, so you're saying you think we need two sessions. I can try to find a 30-minute slot on our schedule | 20:48 |
melwitt | Sundar: we need a nova spec for the proposed changes to the nova code, so the nova team can review those proposed changes | 20:48 |
Sundar | melwitt: Thanks a lot. OK, will propose a Nova spec too. BTW, I have been interacting a lot with efried and any Nova developer who gives feedback on os-acc spec. | 20:50 |
melwitt | nova meeting in 10 minutes | 20:50 |
melwitt | Sundar: thanks | 20:50 |
mriedem | cdent: just so you can rest easy tonight, i think we can safely confirm that we have some sort of weird multi-thread race with shared ProviderTree cache | 20:52 |
mriedem | https://bugs.launchpad.net/nova/+bug/1789654/comments/9 | 20:52 |
openstack | Launchpad bug 1789654 in OpenStack Compute (nova) "placement allocation_ratio initialized with 0.0" [High,In progress] - Assigned to Matt Riedemann (mriedem) | 20:52 |
melwitt | mriedem: so _copy_resources is bypassing the compute node normalization routine? | 20:52 |
mriedem | efried: ^ | 20:53 |
mriedem | melwitt: yes, and from my log digging in that comment it shows that if we hit this in the right order, we call ComputeNode.save() which will change the CN.cpu_allocation_ratio from 0.0 (from _copy_resources) back to 16.0 | 20:53 |
cdent | oh great, I love weird multi-thread races, especially when caches are involved | 20:53 |
mriedem | which goes to the RT._normalize_inventory_from_cn_obj method which puts 16.0 back into the inventory dict | 20:53 |
melwitt | mriedem: ah, ok. nice sleuthing | 20:54 |
mriedem | but there is clearly something else hitting ProviderTree.update_inventory at the same time that RT.update_available_resource is running | 20:54 |
*** takashin has joined #openstack-nova | 20:54 | |
mriedem | and i don't think it's coming from the RT | 20:54 |
mriedem | the only place in RT that we call ProviderTree.update_inventory is after driver.update_provider_tree, which isn't implemented for xen | 20:54 |
mriedem | so i think it's the SchedulerReportClient's provider tree cache | 20:55 |
cdent | I suspect (or at least hope) that efried will have some insight on the meanderings of the cache | 20:57 |
efried | I may, once I'm not trying to do several things at once. | 20:58 |
cdent | efried: this seems to be an unfortunately common problem | 20:59 |
cdent | let's all quit | 20:59 |
cdent | (that'll show em) | 20:59 |
mriedem | meeting in 1 min? | 20:59 |
melwitt | yes, I gave a 10 minute warning 9 minutes ago | 20:59 |
*** hamzy has quit IRC | 20:59 | |
*** awaugama has quit IRC | 21:02 | |
mriedem | far as i can tell the report client gets inventory like 500 times a second | 21:05 |
mriedem | it calls _refresh_and_get_inventory *a lot* | 21:05 |
mriedem | for every _ensure_resource_provider | 21:05 |
mriedem | where again, _ensure_resource_provider is less ensure and more "refresh the world" now | 21:05 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Log the operation when updating generation in ProviderTree https://review.openstack.org/597553 | 21:13 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add debug logs for when provider inventory changes https://review.openstack.org/597560 | 21:13 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add debug logs for when provider inventory changes https://review.openstack.org/597560 | 21:15 |
*** rmart04 has quit IRC | 21:16 | |
*** dpawlik has quit IRC | 21:17 | |
*** eharney has joined #openstack-nova | 21:17 | |
openstackgerrit | Dan Smith proposed openstack/nova master: WIP: Move conductor wait_until_ready() delay before manager init https://review.openstack.org/598353 | 21:18 |
dansmith | mriedem: can we see if this helps? ^ | 21:19 |
*** tbachman has joined #openstack-nova | 21:19 | |
*** ttsiouts has quit IRC | 21:19 | |
*** holser_ has joined #openstack-nova | 21:19 | |
*** takashin has left #openstack-nova | 21:20 | |
mriedem | we sure can | 21:20 |
*** ttsiouts has joined #openstack-nova | 21:20 | |
*** ttsiouts has quit IRC | 21:20 | |
*** ttsiouts has joined #openstack-nova | 21:20 | |
mriedem | i didn't know conductor_api.wait_until_ready existed | 21:21 |
mriedem | so see, i did need you | 21:22 |
dansmith | you need better grep skills, that's all | 21:22 |
mriedem | i even said in my patch after WIPing it something like "we should really wait until conductor is ready, but what makes it 'ready'" | 21:25 |
mriedem | i guess this | 21:25 |
melwitt | wait_until_ready, of course | 21:25 |
melwitt | (I didn't know about it either) | 21:25 |
mriedem | ConductorAPI.easy_button() | 21:25 |
mriedem | what i'd like to do, del nova.compute.resource_tracker | 21:26 |
*** luksky11 has quit IRC | 21:28 | |
melwitt | you and jaypipes and bauzas and everyone else | 21:28 |
dansmith | well, so far that patch is doing super awesome in the gate | 21:29 |
*** tzumainn has quit IRC | 21:30 | |
jaypipes | dansmith: which patch? | 21:30 |
jaypipes | damn it, I've got more reading back to do... | 21:31 |
dansmith | it's annoying that this didn't even capture some basic logs | 21:31 |
dansmith | unless something else just broke real bad | 21:32 |
dansmith | ...which is the case, woot. | 21:35 |
*** holser_ has quit IRC | 21:37 | |
*** ttsiouts has quit IRC | 21:37 | |
*** ttsiouts has joined #openstack-nova | 21:40 | |
* lbragstad hands mriedem https://plugins.jetbrains.com/plugin/7125-grep-console | 21:41 | |
*** holser_ has joined #openstack-nova | 21:42 | |
mriedem | jebus h c | 21:42 |
lbragstad | all the grep skillz you need is just one plugin away | 21:42 |
mriedem | ctrl+shift+f my man | 21:43 |
*** priteau has quit IRC | 21:47 | |
efried | how does a guy install a pycharm plugin? | 21:47 |
*** rcernin has joined #openstack-nova | 21:49 | |
*** ttsiouts has quit IRC | 21:49 | |
*** claudiub has quit IRC | 21:50 | |
*** ttsiouts has joined #openstack-nova | 21:50 | |
jaypipes | efried: sudo apt install vim? | 21:51 |
lbragstad | git clone https://github.com/$USER/dotfiles | 21:51 |
efried | found it. Thanks for nothing, snarks | 21:53 |
zigo | jaypipes: Real man use joe editor ... | 21:54 |
*** dpawlik has joined #openstack-nova | 21:54 | |
*** dpawlik has quit IRC | 21:59 | |
jaypipes | zigo: luckily, I'm not a real man. | 21:59 |
dansmith | mriedem: so should I just make up a fake test for that so we can merge and see if the problem goes away? presumably that's the only way we're really going to know? | 22:00 |
*** cfriesen has quit IRC | 22:04 | |
mriedem | dansmith: is it passing in the gate? | 22:04 |
mriedem | if so, yeah sure | 22:05 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Avoid spurious ComputeNode.save during update_available_resource periodic https://review.openstack.org/598365 | 22:05 |
mriedem | efried: melwitt: ^ is a thing related to that xen bug | 22:05 |
mriedem | for allocation ratios | 22:05 |
mriedem | since i'm not sure how we're actually racing here, i'm not sure if it will actually fix it, | 22:05 |
melwitt | ack | 22:06 |
mriedem | but it's about the only thing i can think of that would get us this which f's up the inventory in placement during the periodic: | 22:06 |
mriedem | Aug 29 16:58:05.483508 dsvm-devstack-citrix-mia-nodepool-1379368 nova-compute[24436]: INFO nova.compute.resource_tracker [None req-a869fa19-aa9d-4335-9816-42ff29b64d48 None None] Using cpu_allocation_ratio 0.0 for node: 2f5a2e04-1b61-4437-ab6e-8dbbf797dc07 | 22:06 |
efried | mriedem: Doesn't seem to be doing what the commit title says... | 22:06 |
mriedem | that's logs from the normalize method in the RT | 22:06 |
dansmith | mriedem: it's passing the stuff that isn't dead on the floor for other reasons | 22:07 |
mriedem | efried: oh but you must read the full message my friend | 22:07 |
mriedem | it's a rich tapestry of suck | 22:07 |
efried | yeah yeah | 22:07 |
mriedem | and with that, i'm putting my lawn mowin' clothes on and hitting nature | 22:07 |
efried | I don't see it hurting anything to never write 0.0 to an allocation ratio. | 22:07 |
efried | unless, as you say, some other suckpoint is using that as a signal to refresh the real values from somewhere else. | 22:08 |
mriedem | right, i don't think this hurts, it might help | 22:08 |
efried | In which case that should be changed. | 22:08 |
mriedem | except i have that todo in there - mostly a question for reviewers to check my brain | 22:08 |
openstackgerrit | Dan Smith proposed openstack/nova master: Move conductor wait_until_ready() delay before manager init https://review.openstack.org/598353 | 22:10 |
*** purplerbot has quit IRC | 22:16 | |
*** purplerbot has joined #openstack-nova | 22:17 | |
*** mriedem is now known as mriedem_lawnboy | 22:17 | |
jaypipes | mriedem_lawnboy, efried: prescient? https://review.openstack.org/#/c/598365/1/nova/tests/unit/compute/test_resource_tracker.py@1381 | 22:26 |
efried | Mm | 22:27 |
efried | I thought it was a bug in the test. | 22:27 |
efried | Clearly not. | 22:27 |
*** holser_ has quit IRC | 22:30 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Fix nits: Compute: Handle reshaped provider trees https://review.openstack.org/598387 | 22:37 |
*** ttsiouts has quit IRC | 22:49 | |
*** ttsiouts has joined #openstack-nova | 22:50 | |
*** ttsiouts has quit IRC | 22:54 | |
*** eharney has quit IRC | 22:59 | |
*** macza has quit IRC | 23:08 | |
*** cdent has quit IRC | 23:22 | |
*** erlon has joined #openstack-nova | 23:27 | |
*** mlavalle has quit IRC | 23:34 | |
*** r-daneel has quit IRC | 23:36 | |
*** rcernin has quit IRC | 23:56 | |
*** rcernin has joined #openstack-nova | 23:56 |
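
The allocation-ratio problem discussed between 20:40 and 22:08 can be sketched with a toy model. This is not nova's code: `ToyComputeNode`, `copy_resources_naive`, and `copy_resources_guarded` are hypothetical names used only for illustration, and the guarded variant is an assumption about what "never write 0.0 to an allocation ratio" might look like, not a copy of the proposed patch. The 16.0 value comes from the log line quoted at 20:42.

```python
from dataclasses import dataclass

DEFAULT_CPU_RATIO = 16.0  # value seen in the 20:42 log line


@dataclass
class ToyComputeNode:
    # Hypothetical stand-in for a tracked compute node record.
    cpu_allocation_ratio: float = DEFAULT_CPU_RATIO


def copy_resources_naive(cn, resources):
    # Blindly copies whatever was reported, including an unset 0.0,
    # which makes the tracked node look "changed" on every periodic run.
    cn.cpu_allocation_ratio = resources.get("cpu_allocation_ratio", 0.0)


def copy_resources_guarded(cn, resources):
    # Treats 0.0 as "not configured" and leaves the existing ratio alone.
    ratio = resources.get("cpu_allocation_ratio", 0.0)
    if ratio != 0.0:
        cn.cpu_allocation_ratio = ratio


naive = ToyComputeNode()
copy_resources_naive(naive, {"cpu_allocation_ratio": 0.0})

guarded = ToyComputeNode()
copy_resources_guarded(guarded, {"cpu_allocation_ratio": 0.0})
```

In the naive variant the ratio drops from 16.0 to 0.0, matching the "changed from 16.0 to 0.0 in _copy_resources" message; in the guarded variant the node keeps 16.0, so a comparison against the previous state would not see a spurious change.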
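
The small window mentioned at 20:16, where a node sits in the map before `create()` has set its `id`, can be illustrated in isolation. This is a simplified sketch under stated assumptions: `ToyComputeNode`, `init_compute_node`, and `read_id` are hypothetical names, the id value is made up, and a plain `threading.Lock` stands in for whatever lock the real resource tracker holds.

```python
import threading


class ToyComputeNode:
    # Hypothetical stand-in: id stays None until create() runs.
    def __init__(self):
        self.id = None

    def create(self):
        self.id = 1  # made-up id; a real create() would get it from the DB


compute_nodes = {}
lock = threading.Lock()


def init_compute_node(nodename):
    with lock:
        cn = ToyComputeNode()
        compute_nodes[nodename] = cn  # in the map, id still None
        cn.create()                   # id is set only here


def read_id(nodename):
    # Readers take the same lock, so they cannot observe the window above.
    with lock:
        cn = compute_nodes.get(nodename)
        return None if cn is None else cn.id


init_compute_node("node1")
```

As long as every reader goes through the same lock, the id-less state is never visible, which is consistent with the conclusion in the log that this window alone does not explain the missing `id`.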