Wednesday, 2018-10-24

*** erlon has joined #openstack-nova00:03
*** hamzy has joined #openstack-nova00:05
*** slaweq has joined #openstack-nova00:11
*** tetsuro has joined #openstack-nova00:12
*** trungnv has quit IRC00:25
*** trungnv has joined #openstack-nova00:25
*** mlavalle has quit IRC00:32
*** moshele has joined #openstack-nova00:35
*** slaweq has quit IRC00:45
*** medberry has quit IRC01:02
*** medberry has joined #openstack-nova01:03
*** jackyzhu has joined #openstack-nova01:08
*** slaweq has joined #openstack-nova01:12
*** sapd1 has quit IRC01:16
*** mrsoul has joined #openstack-nova01:17
*** sapd1 has joined #openstack-nova01:18
*** hongbin has joined #openstack-nova01:19
*** yikun has joined #openstack-nova01:21
*** takashin has joined #openstack-nova01:21
*** imacdonn has quit IRC01:23
*** imacdonn has joined #openstack-nova01:24
*** tiendc has joined #openstack-nova01:33
*** erlon has quit IRC01:36
*** alex_xu has quit IRC01:37
*** litao has joined #openstack-nova01:39
litaohi01:40
*** mhen has quit IRC01:42
*** Dinesh_Bhor has joined #openstack-nova01:43
*** slaweq has quit IRC01:44
*** moshele has quit IRC01:46
*** itlinux has quit IRC01:46
*** mhen has joined #openstack-nova01:46
openstackgerritliuming proposed openstack/nova master: Deletes evacuated instance files when source host is ok  https://review.openstack.org/60598701:56
*** fanzhang has joined #openstack-nova01:57
*** alex_xu has joined #openstack-nova01:59
*** cfriesen has quit IRC02:08
openstackgerritVu Cong Tuan proposed openstack/nova-specs master: Switch to stestr  https://review.openstack.org/58128402:10
*** slaweq has joined #openstack-nova02:11
alex_xugmann: sorry, I can't join office hour today02:26
gmannalex_xu: ok, i will skip for today then. thanks for informing.02:27
*** slaweq has quit IRC02:44
*** psachin has joined #openstack-nova02:55
*** sambetts|afk has quit IRC02:56
*** sambetts_ has joined #openstack-nova03:00
*** takashin has left #openstack-nova03:09
*** slaweq has joined #openstack-nova03:11
*** slaweq has quit IRC03:45
*** udesale has joined #openstack-nova03:50
*** udesale has quit IRC03:50
*** udesale has joined #openstack-nova03:51
openstackgerritMerged openstack/nova master: Fix up compute rpcapi version for pike release  https://review.openstack.org/61223104:00
*** Dinesh_Bhor has quit IRC04:06
openstackgerritmelanie witt proposed openstack/nova stable/rocky: Fix up compute rpcapi version for pike release  https://review.openstack.org/61256104:11
*** slaweq has joined #openstack-nova04:12
*** janki has joined #openstack-nova04:23
*** spsurya has joined #openstack-nova04:24
*** hongbin has quit IRC04:25
*** Dinesh_Bhor has joined #openstack-nova04:26
*** tiendc has quit IRC04:26
*** brinzhang has joined #openstack-nova04:29
*** slaweq has quit IRC04:45
*** ratailor has joined #openstack-nova04:57
*** dave-mccowan has quit IRC04:57
*** slaweq has joined #openstack-nova05:11
*** jackyzhu007 has joined #openstack-nova05:15
*** pvradu has joined #openstack-nova05:16
*** jackyzhu007 has quit IRC05:17
*** jackyzhu has quit IRC05:19
*** phillu has joined #openstack-nova05:29
*** bhagyashris has joined #openstack-nova05:29
*** phillu has quit IRC05:43
*** slaweq has quit IRC05:44
*** tetsuro has quit IRC06:04
*** Luzi has joined #openstack-nova06:04
*** slaweq has joined #openstack-nova06:11
*** pvc has quit IRC06:13
*** phillu has joined #openstack-nova06:16
*** artom has quit IRC06:18
*** pvradu has quit IRC06:18
openstackgerritMerged openstack/nova stable/rocky: Fix formatting non-templated cell URLs with no config  https://review.openstack.org/61132706:19
*** artom has joined #openstack-nova06:20
*** tetsuro has joined #openstack-nova06:27
*** raginbajin has quit IRC06:32
*** raginbajin has joined #openstack-nova06:34
*** pvradu has joined #openstack-nova06:38
*** alex_xu has quit IRC06:40
*** pvradu has quit IRC06:42
*** pvradu has joined #openstack-nova06:43
*** jangutter has quit IRC06:47
*** jangutter has joined #openstack-nova06:47
*** ccamacho has joined #openstack-nova06:47
*** phillu has quit IRC06:47
*** Dinesh_Bhor has quit IRC07:00
*** rcernin has quit IRC07:03
*** ivve has joined #openstack-nova07:03
*** gibi_off is now known as gibi07:04
*** artom has quit IRC07:05
*** pcaruana has joined #openstack-nova07:05
*** artom has joined #openstack-nova07:07
*** Dinesh_Bhor has joined #openstack-nova07:09
*** threestrands has quit IRC07:13
*** ralonsoh has joined #openstack-nova07:19
*** helenafm has joined #openstack-nova07:19
*** cfriesen has joined #openstack-nova07:21
*** _pewp_ has quit IRC07:30
*** Cardoe has quit IRC07:31
*** Cardoe has joined #openstack-nova07:31
*** _pewp_ has joined #openstack-nova07:31
openstackgerritYongli He proposed openstack/nova-specs master: add spec "show-server-numa-topology"  https://review.openstack.org/61225607:33
*** cfriesen has quit IRC07:39
*** lpetrut has joined #openstack-nova07:41
*** adrianc has joined #openstack-nova07:43
*** adrianc_ has joined #openstack-nova07:43
*** pvc has joined #openstack-nova07:46
pvchi anyone sean-k-mooney or bauzas?07:46
*** sahid has joined #openstack-nova07:47
*** jpena|off is now known as jpena07:48
pvci already launched an instance with vgpu on it07:50
pvcbut which driver should i use?07:50
pvcto run a gpu application07:51
bauzasgood morning nova07:53
*** tetsuro has quit IRC07:56
*** Dinesh_Bhor has quit IRC08:01
*** pvradu_ has joined #openstack-nova08:08
*** pvradu has quit IRC08:11
bauzaspvc: you should use the grid guest driver08:18
* bauzas apologies for yesterday, had a fucking power outage08:18
bauzaswhich was originally planned, but not lasting so long08:19
*** liuyulong|away has quit IRC08:20
*** brinzhang has quit IRC08:24
*** brinzhang has joined #openstack-nova08:24
*** sayalilunkad has quit IRC08:33
*** sayalilunkad has joined #openstack-nova08:33
pvci've successfully run the nvidia-smi bauzas08:33
pvcProduct Name                    : GRID P100-2B408:34
bauzascool08:34
pvcVirtualization mode         : VGPU08:34
pvcbut i cannot test the tensorflow-gpu :(08:34
pvcdo you have any preferred docs for that?08:34
*** ttsiouts has joined #openstack-nova08:34
*** moshele has joined #openstack-nova08:38
*** derekh has joined #openstack-nova08:39
*** pvc has quit IRC08:43
*** cdent has joined #openstack-nova08:44
openstackgerritBrin Zhang proposed openstack/nova-specs master: Support delete_on_termination in volume attach api  https://review.openstack.org/61294908:44
openstackgerritMatthew Booth proposed openstack/nova master: Add regression test for bug 1550919  https://review.openstack.org/59173308:46
openstackbug 1550919 in OpenStack Compute (nova) "[Libvirt]Evacuate fail may cause disk image be deleted" [Medium,In progress] https://launchpad.net/bugs/1550919 - Assigned to Matthew Booth (mbooth-9)08:46
openstackgerritMatthew Booth proposed openstack/nova master: Don't delete disks on shared storage during evacuate  https://review.openstack.org/57884608:46
*** pvc_ has joined #openstack-nova08:47
pvc_are you running tensorflow on your instance bauzas08:47
*** priteau has joined #openstack-nova08:48
*** vabada has quit IRC08:51
*** vabada has joined #openstack-nova08:52
*** phillu has joined #openstack-nova08:54
openstackgerritBrin Zhang proposed openstack/nova-specs master: Support deleting data volume when destroy instance  https://review.openstack.org/58033608:54
*** ttsiouts has quit IRC08:55
*** ttsiouts has joined #openstack-nova08:58
*** Dinesh_Bhor has joined #openstack-nova08:58
*** maciejjozefczyk has quit IRC09:02
*** maciejjozefczyk has joined #openstack-nova09:02
pvc_hi bauzas u using cuda09:03
pvc_tensorflow.python.framework.errors_impl.InternalError: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error :(09:06
sean-k-mooneypvc_: are you using one of the Q series mdev-types09:07
sean-k-mooneyhttps://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#cuda-open-cl-support-vgpu09:07
*** sambetts_ is now known as sambetts|afk09:07
pvc_NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB]09:08
sean-k-mooneyfor your gpu https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#vgpu-types-tesla-p10009:09
sean-k-mooneyyou need to find the mdev-type that corresponds to one of P100-1Q P100-2Q P100-4Q P100-8Q or P100-16Q09:09
pvc_videocard of my instance: 00:05.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:15f8] (rev a1)09:10
sean-k-mooneypvc_: yes, that is likely not going to change regardless of the mdev type you choose09:10
pvc_im using nvidia-21109:12
sean-k-mooneysure that does not help us map to the nvidia docs. you will need to look at the vendor data reported for the mdev type and see if you can find the vgpu type09:13
pvc_GRID P100-2B409:14
pvc_the name of nvidia-21109:14
sean-k-mooneyright so that is a B series vgpu and does not support compute via opencl or cuda09:15
sean-k-mooneypvc_: find the one corresponding to P100-2Q09:15
sean-k-mooneythat will be the most similar09:15
pvc_okay wait09:15
pvc_cat nvidia-84/name GRID P100-2Q09:16
sean-k-mooneycool so if you set that in the nova.conf and boot a new vm it should work with cuda09:17
sean-k-mooneyyou should look at https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#vgpu-types-tesla-p100 and determine which one meets your needs09:17
pvc_wait09:17
pvc_may i know where i can find that information?09:17
pvc_thank you i'll try09:17
pvc_what docs can i read to determine where cuda will run?09:18
sean-k-mooneyall the vGPU types that end in Q support cuda according to https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#cuda-open-cl-support-vgpu.09:18
sean-k-mooneyit's the same doc, just later, in section 1.6.209:19
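The lookup pvc_ did by hand (`cat nvidia-84/name`) can be sketched in Python. This is only an illustration: it assumes a directory laid out like the kernel's `mdev_supported_types` directory (one sub-directory per mdev type, each with a `name` file), and the function name `cuda_capable_mdev_types` is hypothetical. The "name ends in Q" test simply encodes the rule quoted from the NVIDIA guide above.

```python
from pathlib import Path


def cuda_capable_mdev_types(supported_types_dir):
    """Return {mdev type: vGPU name} for types whose name ends in 'Q'.

    supported_types_dir is assumed to contain one sub-directory per mdev
    type (for example nvidia-84), each holding a 'name' file.
    """
    result = {}
    for type_dir in sorted(Path(supported_types_dir).iterdir()):
        name_file = type_dir / "name"
        if not name_file.is_file():
            continue
        name = name_file.read_text().strip()
        # Per the guide linked in the chat, Q-series types support CUDA.
        if name.upper().endswith("Q"):
            result[type_dir.name] = name
    return result
```

On the host from this conversation, a call like this would be expected to include `nvidia-84` (GRID P100-2Q) and leave out `nvidia-211` (GRID P100-2B4).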
pvc_wow that's so cool :)09:19
pvc_thank you so much09:19
pvc_i'll launch an instance again09:19
pvc_can i use P100-1Q sean-k-mooney?09:20
sean-k-mooneyso decoding this a bit more, it appears the number before the Q in their naming is the amount of ram, so the P100-2Q has 2GB of frame buffer and you can run 8 of them on the 16GB P10009:20
sean-k-mooneyyes09:20
sean-k-mooneythat will simply have 1GB of vRAM allocated to its framebuffer instead09:21
sean-k-mooneythat will allow you to run 16 vms on each physical p10009:21
pvc_2Q is better right09:21
pvc_so i can launch 8 instances with 2Q09:21
sean-k-mooneyyep09:21
pvc_i have 2 tesla p10009:21
sean-k-mooney8 with 2Q 4 with 4Q 2 with 8Q and 1 with 16Q09:22
pvc_http://paste.openstack.org/show/732950/09:22
sean-k-mooneyso it's a trade-off between the number of vms you can run and the performance that each vm will have09:22
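The arithmetic behind those numbers, as a small Python sketch. It assumes, as sean-k-mooney inferred above, that the digit before the Q is the per-vGPU frame buffer in GB and that a P100 has 16GB to divide; the function name is made up for illustration.

```python
def vgpus_per_card(card_framebuffer_gb, vgpu_framebuffer_gb):
    """How many vGPUs of a given frame buffer size fit on one card."""
    if vgpu_framebuffer_gb <= 0:
        raise ValueError("vGPU frame buffer size must be positive")
    return card_framebuffer_gb // vgpu_framebuffer_gb


P100_FRAMEBUFFER_GB = 16  # assumption: the 16GB Tesla P100 from the chat

if __name__ == "__main__":
    for size_gb in (1, 2, 4, 8, 16):
        count = vgpus_per_card(P100_FRAMEBUFFER_GB, size_gb)
        print("P100-%dQ: %d per card" % (size_gb, count))
```

With two physical P100 cards, as pvc_ has, each count would double.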
sean-k-mooneypvc_: what software license do you have by the way.09:25
sean-k-mooneyyou will need the Quadro vDWS license to be able to use the higher performance Q series vgpu types09:25
pvc_i didnt use any license for now09:26
pvc_do i need to install it09:26
pvc_but i think i have the license09:27
sean-k-mooneyaccording to https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#licensing-grid-vgpu cuda will be disabled until you add the license key09:27
pvc_sean-k-mooney http://paste.openstack.org/show/732952/09:28
pvc_licensing on compute node or on the instance?09:29
sean-k-mooneyyou have to install the licensing server somewhere on your network and then i believe you need to point your instance at the licensing server09:29
sean-k-mooneyits all covered in https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#licensing-grid-vgpu09:30
pvc_can i just install it on my compute node?09:31
pvc_is that possible09:31
sean-k-mooneypvc_: no not as far as i can tell09:31
sean-k-mooneywhat you can do is configure the address of the licensing server once, then snapshot the vm and use the new image as your base image in glance for tenants09:32
pvc_wait i just install the licensing server09:33
*** k_mouza has joined #openstack-nova09:33
*** dtantsur|afk is now known as dtantsur09:34
bauzassean-k-mooney: pvc_: sorry was on meeting09:37
sean-k-mooneypvc_: if you are using linux guest looks like you can also just add a config file and enable the nvidia-gridd service https://docs.nvidia.com/grid/latest/grid-licensing-user-guide/index.html#licensing-grid-software-linux-config-file09:37
sean-k-mooneybauzas: no worries09:37
bauzassean-k-mooney: thanks for helping pvc_, all of what you said is valid :)09:37
bauzasnvidia requires a specific license for CUDA, and implies a GPU profile09:37
pvc_im using 90days trial, may i know if it have a free license?09:39
sean-k-mooneypvc_: the trial should work for now but you will need a Quadro vDWS license09:40
sean-k-mooneypvc_: the pricing is covered here https://images.nvidia.com/content/grid/pdf/161207-GRID-Packaging-and-Licensing-Guide.pdf09:40
pvc_can i continue without license?09:41
pvc_or do i need to buy09:41
sean-k-mooneyouch $450 per concurrent user for  perpetual license09:41
sean-k-mooneypvc_: without cuda support and with the frame rate capped at 3 frames per second, yes, but realistically no, you need a license09:42
*** bhagyashris has quit IRC09:43
* bauzas makes no comment09:43
pvc_as per nvidia support  yes, The evaluation license provides access to 128 CCU's of NVIDIA Quadro Virtual Data Center Workstation (Quadro vDWS) edition for up to 90 days.09:47
bauzaspvc_: just to make it clear, the client licensing is in https://docs.nvidia.com/grid/6.0/grid-licensing-user-guide/index.html09:49
*** panda has quit IRC09:54
*** panda has joined #openstack-nova09:55
openstackgerritIvaylo Mitev proposed openstack/nova master: VMware: VIF info and utils for image as template  https://review.openstack.org/61297409:58
openstackgerritIvaylo Mitev proposed openstack/nova master: VMware: Inventory path utils for image as template  https://review.openstack.org/61297609:59
*** k_mouza has quit IRC10:00
*** k_mouza has joined #openstack-nova10:01
*** aloga has quit IRC10:08
*** k_mouza has quit IRC10:09
*** sahid has quit IRC10:12
openstackgerritStephen Finucane proposed openstack/nova master: Fail to live migration if instance has a NUMA topology  https://review.openstack.org/61108810:15
openstackgerritIvaylo Mitev proposed openstack/nova master: VMware: OVA and StrOpt images as VM templates  https://review.openstack.org/60973610:19
*** k_mouza has joined #openstack-nova10:20
BlackDexHello there. I'm seeing some invalid (RFC Validation) json blob data in the instance_extra table of nova. It has nested json data and also a key is not quoted.10:22
BlackDexis this by design?10:22
*** phillu has quit IRC10:24
*** pvc_ has quit IRC10:26
BlackDexwell, not nested, but multiple root elements actually10:26
*** alexchadin has joined #openstack-nova10:28
*** phillu has joined #openstack-nova10:30
*** pvradu_ has quit IRC10:38
*** pvradu has joined #openstack-nova10:38
*** tbachman has quit IRC10:40
openstackgerritFan Zhang proposed openstack/nova master: Retry after hitting libvirt error VIR_ERR_OPERATION_INVALID in live migration.  https://review.openstack.org/61227210:53
*** phillu has quit IRC10:57
*** udesale has quit IRC11:01
*** ttsiouts has quit IRC11:06
*** adrianc_ has quit IRC11:07
*** adrianc has quit IRC11:07
*** erlon has joined #openstack-nova11:09
*** adrianc_ has joined #openstack-nova11:10
*** adrianc has joined #openstack-nova11:10
*** ratailor has quit IRC11:10
*** fghaas has joined #openstack-nova11:11
*** k_mouza has quit IRC11:13
*** dave-mccowan has joined #openstack-nova11:14
*** Dinesh_Bhor has quit IRC11:15
*** moshele has quit IRC11:16
*** ttsiouts has joined #openstack-nova11:19
*** k_mouza has joined #openstack-nova11:22
*** adrianc_ has quit IRC11:29
*** adrianc has quit IRC11:29
*** Dinesh_Bhor has joined #openstack-nova11:33
*** litao has quit IRC11:34
*** jpena is now known as jpena|lunch11:36
*** Dinesh_Bhor has quit IRC11:45
*** moshele has joined #openstack-nova11:56
*** adrianc has joined #openstack-nova11:58
janguttersean-k-mooney: in os-vif (neutron api v3), what's going to be the names of the two sides passing objects? The docs refer to 'provider host' and 'networking host'? (respectively on compute and controller, I presume).11:59
*** pvradu_ has joined #openstack-nova12:04
sean-k-mooneyam which doc? i have not worked on the spec for this in like 18 months12:06
sean-k-mooneythe two entities are the neutron api (specifically the ml2 drivers which handle the binding request) and the nova compute agent12:07
*** pvradu has quit IRC12:08
sean-k-mooneythe intent was that the nova compute agent would pass a filtered host info object (in a serialised form) to neutron as part of port binding and the neutron ml2 driver would bind the port and respond with a serialised os-vif vif object12:09
janguttersean-k-mooney: I was looking at the vestigial docs in os-vif itself. and I understood it as you describe it too.12:10
sean-k-mooneyoh ok cool12:10
sean-k-mooneyi had previously debated creating a request and response object pair in os-vif also12:11
*** aloga has joined #openstack-nova12:11
sean-k-mooneyi think the details are something we can iterate on when we actually move forward on this.12:12
janguttersean-k-mooney: yep, just wanted to kinda capture the proposed intent and sequence a bit clearer.12:15
sean-k-mooneycool12:16
*** udesale has joined #openstack-nova12:17
sean-k-mooneyanything else you wanted to know on that topic or did i cover it above?12:17
janguttersean-k-mooney: I think it's sufficient, I'm not implementing the entire sequence here, basically just roughly sketching out things like sequence, the direction of filtering, and entities in the system.12:18
sean-k-mooneycool so you're pulling together all the required reading to write a spec to address it :)12:19
janguttersean-k-mooney: heh, writing a paragraph in the doc explaining: hey, this is a stub, this is why it's a stub, and it might look like this after it's unstubbed.12:21
*** k_mouza has quit IRC12:23
*** udesale has quit IRC12:25
*** moshele has quit IRC12:26
openstackgerritDaniel Abad proposed openstack/nova master: Fix ironic client ironic_url deprecation warning  https://review.openstack.org/61187212:26
*** brinzhang has quit IRC12:27
*** moshele has joined #openstack-nova12:27
openstackgerritDaniel Abad proposed openstack/nova master: Fix ironic client ironic_url deprecation warning  https://review.openstack.org/61187212:28
*** tbachman has joined #openstack-nova12:29
*** mvkr has quit IRC12:29
*** k_mouza has joined #openstack-nova12:30
*** jangutter has quit IRC12:36
*** jpena|lunch is now known as jpena12:37
openstackgerritGaudenz Steinlin proposed openstack/nova master: Ignore misleading resource updates from virt driver  https://review.openstack.org/52300612:39
*** jangutter has joined #openstack-nova12:39
janguttersean-k-mooney: one more clarification, when you say "filtered host-info", what's the filter that's applied? Is it just filtering based on the plugin info gathered on the compute node, filtering out out-of-range plugins?12:44
sean-k-mooneybasically the idea was that nova would prefilter the host info object on 2 factors12:45
sean-k-mooneyfirst, what plugins/viftypes were supported by the hypervisor on that node12:46
sean-k-mooneyand second, any aspect of the guest that would restrict what vif types could be used.12:46
janguttersean-k-mooney: I see, only info locally available at the compute node at the time, analogous to "capabilities".12:47
sean-k-mooneye.g. vhost-user requires hugepages to work so if the flavor did not have a hugepage request it would be removed from the list12:47
sean-k-mooneyyep12:47
* sean-k-mooney technically hugepages aren't the requirement but close enough12:48
sean-k-mooneythe main motivation is to enable nova to say: based on my knowledge of the hypervisor and the instance request (flavor and image metadata), this is the set of vif types i could support12:49
sean-k-mooneyand then neutron can say ok, from that set i can support Y, and select an optimal vif type to use12:50
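A rough Python sketch of that two-stage idea, under loud assumptions: the function names, the plain string VIF type names, and the "vhostuser needs a hugepage request" rule are all illustrative stand-ins taken from the example in the chat, not the os-vif or neutron API (and, as noted above, hugepages are only an approximation of the real requirement).

```python
def nova_candidate_vif_types(hypervisor_vif_types, flavor_has_hugepages):
    """Nova side: start from what the hypervisor supports on this node,
    then drop anything the guest request rules out."""
    candidates = set(hypervisor_vif_types)
    if not flavor_has_hugepages:
        # Simplified rule from the chat: no hugepage request, no vhost-user.
        candidates.discard("vhostuser")
    return candidates


def neutron_select_vif_type(candidates, backend_preference):
    """Neutron side: pick the most preferred type that nova also offered.

    backend_preference is an ordered list, best first (hypothetical).
    Returns None when there is no overlap, i.e. binding should fail early.
    """
    for vif_type in backend_preference:
        if vif_type in candidates:
            return vif_type
    return None
```

In this sketch a flavor without hugepages never offers `vhostuser`, so the selection falls through to the next preferred type or fails at binding time instead of producing a VM that boots with no networking.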
stephenfinnp12:50
sean-k-mooneyonce we have that capability we can potentially schedule on that in the future too12:51
janguttersean-k-mooney: right, makes sense and saves a round-trip with mis-scheduling or a failed portbinding.12:52
sean-k-mooneyyep or worse in the vhost user case where the vm boots with no error and no networking12:52
* sean-k-mooney part of me likes the symmetry of how terrible that user experience is for both tenants and operators12:53
janguttersean-k-mooney: even the qemu error in libvirt is tricky to trace and very misleading in that case.12:56
sean-k-mooneyjangutter: qemu does not provide an error in that case at least it did not in the past12:56
sean-k-mooneythe only error i have ever seen for this is a debug only error in dpdk logs related to not being able to map the memory12:57
sean-k-mooneybut in any case it would allow us on both the nova and neutron side to filter down to only the interfaces we think should work instead of relying on the operator/deployer to get this right12:58
*** eharney has joined #openstack-nova12:58
janguttersean-k-mooney: I remember finding something in the qemu logs about "falling back on userspace virtio" when that happened. No hard error.12:59
sean-k-mooneyreally well the fallback does not work so i guess its nice they tried but ya i think we have all been bit by that at some point if we have used vhost-user13:00
janguttersean-k-mooney: one day, there'll be unscarred users, developers and operators.13:02
*** dave-mccowan has quit IRC13:02
*** dave-mccowan has joined #openstack-nova13:04
*** bnemec has joined #openstack-nova13:04
*** dtantsur has quit IRC13:04
sean-k-mooneyjangutter: you mean when ai bans humans from coding and does it all itself, i totally agree13:04
sean-k-mooneythat or when they kill all the users ...13:05
*** mchlumsky has joined #openstack-nova13:07
*** dtantsur has joined #openstack-nova13:09
*** beagles is now known as beagles_mtg13:12
*** mvkr has joined #openstack-nova13:17
openstackgerritsean mooney proposed openstack/nova master: harden placement init under wsgi  https://review.openstack.org/61003413:18
*** udesale has joined #openstack-nova13:22
*** cdent has quit IRC13:22
*** cdent has joined #openstack-nova13:30
*** tbachman has quit IRC13:32
*** adrianc has quit IRC13:34
*** adrianc has joined #openstack-nova13:34
*** mdbooth_ has joined #openstack-nova13:35
*** mdbooth_ is now known as mdbooth13:35
sean-k-mooneyzzzeek: mdbooth cdent so regarding https://review.openstack.org/#/c/610034/4/nova/api/openstack/placement/db_api.py i think we have 3 paths forward13:36
mdboothsean-k-mooney zzzeek: Continuing our previously downstream discussion of https://review.openstack.org/#/c/610034/13:36
sean-k-mooney1: agree my code is awesome and merge it. 2: add a flag on the consumer's side to track if we have configured it already, or 3: extend oslo.db to allow reconfiguring a transaction_context that has been started13:37
mdboothsean-k-mooney: zzzeek to confirm, but I suspect 3 is not a thing13:37
mdboothI was going to propose a 2 phase approach:13:37
mdbooth1. Set a flag in the module on configuration, assert configuration only happens once, emit an unconditional warning on reconfiguration that reconfiguration did not happen.13:38
cdentcan someone explain what's wrong with sean's option 'one'?13:39
mdbooth2. Update all decorators which currently close over placement_context_manager to call get_placement_context_manager(), and go with the original plan of creating a new one on reconfiguration.13:39
mdboothcdent: The TypeError will bite us when there's a bug in oslo.db, or it's changed inadvertently as code is moved around. It's not a deliberate API.13:40
mdboothAnd the configure flag is trivial to implement and better.13:40
cdentdo we have to do step 2? that's idiomatic throughout all of placement and nova13:40
cdentit is perhaps messy, but a considerable change13:41
mdboothcdent: We don't have to do step 2, but it's the only way I can think of that we'll get reconfiguration across a restart.13:41
mdboothAgree, hence the existence of step 1.13:41
sean-k-mooneycdent: well we could still keep the decorator, we just need to have a different one that does the dispatch to the current instance of the global rather than the one that was bound on import13:42
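What "dispatch to the current instance of the global" could look like, as a hedged Python sketch. All names here (`set_context_manager`, `get_context_manager`, `uses_context_manager`, `describe`) are invented for illustration; the only point is that the global is looked up when the decorated function is called, not when the decorator is applied at import time.

```python
import functools

_context_manager = None  # hypothetical module-level global


def set_context_manager(manager):
    """Replace the global, e.g. after a reconfiguration."""
    global _context_manager
    _context_manager = manager


def get_context_manager():
    return _context_manager


def uses_context_manager(func):
    """Pass the *current* manager as the first argument on every call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Late binding: read the global now, not at decoration time.
        return func(get_context_manager(), *args, **kwargs)
    return wrapper


@uses_context_manager
def describe(manager, what):
    return "%s via %s" % (what, manager)
```

A decorator that closed over the global's value at import would keep using the old manager after `set_context_manager` is called; this variant picks up the replacement on the next call.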
cdentis "reconfiguration across a restart[1]" required?13:42
cdent[1] I think maybe you mean reload in apache terms?13:42
mdboothcdent: Yeah. I understood it was required, but I'm prepared to hear it's not.13:43
sean-k-mooneycdent: ya i was thinking that an operator may have changed the config at some point and when there was a failure it could pick up those changes13:43
sean-k-mooneycdent: i'm not sure it's required but it was a question i had13:43
mdboothIf it's not required, there's no reason for such a noisy change.13:43
mdboothI think the warning makes sense, though.13:43
sean-k-mooneye.g. is skipping reconfiguration always valid13:44
sean-k-mooneymdbooth: ya i had a debug level log on the placement version of the change13:44
sean-k-mooneynova did not have the logger so i left it out of the nova version13:44
cdenthas anyone checked to see if mod-wsgi's behavior can be changed to be more like uwsgi's?13:45
*** k_mouza has quit IRC13:46
sean-k-mooneyno, but i had considered seeing if we could change the kolla images to use uwsgi also, but no time13:46
sean-k-mooneyi understand they did this for performance reasons in mod-wsgi to have quicker reloads but it seems wrong to me13:47
openstackgerritMatt Riedemann proposed openstack/nova master: Fix min config value for shutdown_timeout option  https://review.openstack.org/61302813:47
*** k_mouza has joined #openstack-nova13:47
*** moshele has quit IRC13:47
*** whoami-rajat has quit IRC13:48
sean-k-mooneythis seems to be the only related thing in their configs https://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIScriptReloading.html13:48
mdboothcdent: This could easily be mechanical, btw13:52
mdboothcdent: I'm going to chuck up a quick POC/strawman13:52
cdentmdbooth: I'm not directly opposed to changing it, I'm just much too familiar with being able to change things13:53
cdentbeing difficult13:53
mdboothcdent: Hehe, we're on the same page :)13:53
cdenthere we go: https://modwsgi.readthedocs.io/en/develop/user-guides/application-issues.html#reloading-python-interpreters13:53
mdboothcdent: This is why, downstream, I prioritised the deployment framework workaround for the same issue :)13:53
cdentlooks like that reloading solution has some issues with c extensions13:55
cdentsean-k-mooney: did you see the link ^13:56
sean-k-mooneyyes, reading. the interpreter reload option was removed in version 2.0 so we can't use it anyway13:57
cdent"As an alternative, daemon mode of mod_wsgi should be used and the “Process” reload mechanism added with mod_wsgi 2.0."?13:58
mdboothcdent sean-k-mooney: I just threw something together. Running a smoke test before sharing.13:58
cdentI continue to think that we're trying to fix a problem in placement when it should be fix in how mod-wsgi is being managed13:59
sean-k-mooneymdbooth: ok, but the decorator change should be really trivial. i know it will work, but the original question is whether it is needed13:59
*** janki has quit IRC14:00
sean-k-mooneycdent: well it's more a question of what is the abstract machine we expect to execute the code in14:00
*** liuyulong has joined #openstack-nova14:00
cdentsean-k-mooney: sure, and WSGI, as a protocol, has some pretty simple guidelines14:00
cdentit is, by design, super fast to start up a new process with the application in it14:01
mdboothThe little I read didn't seem to define whether you get a new python vm or not.14:01
cdentbecause it just assumes you do14:01
mdboothassume != define14:01
cdentthat's kind of part of the WSGI-nature14:01
cdentsure, but you're asking placement to take on additional complexity in the wrong place14:02
mdboothUnless we're saying that mod_wsgi is architecturally broken and shouldn't be used by anyone?14:02
cdentWSGI exists so that http server-related concerns can exist outside the wsgi application code14:02
mdboothThat could be true, I wouldn't know14:02
cdentI do say that, these days :)14:02
cdentand so do many other people14:03
mdboothcdent: Are those the same folks using undefined behaviour? :P14:03
cdentthey are people who don't want the wsgi server having any impact on the wsgi application, which mod-wsgi always has14:03
cdentbut for a long time it was the only performant choice14:04
sean-k-mooneymdbooth: how mod_wsgi works, i would say, falls into the same camp as undefined behavior in c/c++14:04
*** moshele has joined #openstack-nova14:05
cdentI'm happy to consider changes in placement for all of this stuff, but it would make me much happier if there was also visible effort to inquire with Graham about whether there are ways to achieve the same thing in mod-wsgi14:05
cdentwe seem to be trying to take the local path, when a more global path _might_ be the right thing14:05
sean-k-mooneycdent: well long term i think reloading a script by spawning an entirely separate process with its own python interpreter would be the correct thing to do in mod_wsgi14:07
*** udesale has quit IRC14:07
sean-k-mooneythat said tripleo also should not be starting the placement api container while upgrading the database14:07
*** awaugama has joined #openstack-nova14:08
mdboothsean-k-mooney: Yep, of course. But that's not the only reason why it might fail on startup. This is still an issue.14:08
sean-k-mooneyshort term we probably need a local solution14:08
mdboothSuper short term is the tripleo fix.14:08
*** tbachman has joined #openstack-nova14:08
sean-k-mooneymdbooth: well the tripleo fix would have been correct in any case14:09
cdentI don't feel like I have all the info14:09
mdboothsean-k-mooney: ack14:09
*** tbachman has quit IRC14:09
mdboothcdent: We have a downstream failure because we're starting placement while still running db_sync on its db14:09
cdentbut I also don't feel like we (the three of us) have all the info about how mod-wsgi works14:09
mdboothWhen we restart it, we get a failure because placement_context_manager is already configured.14:10
mdboothSo obviously we shouldn't be doing that, but it highlights that *any* restart of placement like this will fail for the same reason.14:11
cdentwhat kind of 'restart' is being done?14:11
sean-k-mooneymdbooth: well we are not exactly restarting it. the application is crashing somehow and reloading14:11
mdboothcdent: We're not restarting apache. I did ask about that.14:11
sean-k-mooneymdbooth: if we were to restart the container then we definetly would have had a clean env14:11
mdboothsean-k-mooney: ack14:12
cdentanother option is to try touching the nova-placement-api or placement-api file14:12
openstackgerritGaudenz Steinlin proposed openstack/nova master: Extend volume for libvirt network volumes (RBD)  https://review.openstack.org/61303914:12
cdentthat _may_ cause the daemon process (if that is what you are using) to clean itself up14:12
cdentthat's the "normal" way to do code reload and process reload with daemon process based mod-wsgi14:12
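The "touch the file" approach, sketched in Python for completeness. It relies on the mod_wsgi daemon-mode behaviour cdent describes (a changed modification time on the WSGI script file leads to the daemon processes being reloaded); the function name and any script path passed to it are assumptions, since the deployed location of the nova-placement-api script is not given here.

```python
import os
from pathlib import Path


def touch_wsgi_script(script_path):
    """Bump the mtime of a WSGI script file and return the new mtime.

    script_path is whatever file the WSGI server was pointed at; it must
    already exist, so a typo cannot silently create a stray file.
    """
    path = Path(script_path)
    if not path.is_file():
        raise FileNotFoundError(str(path))
    os.utime(path, None)  # None sets access and modification time to now
    return path.stat().st_mtime
```

As discussed right after this in the chat, doing this inside a supposedly immutable container is arguably a hack, and restarting the container is the cleaner option.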
sean-k-mooneycdent: updating the time stamp on an "immutable" container feels kind of hacky14:13
cdentlong term: use uwsgi in the container, and have a FEP in some other container14:13
cdentsean-k-mooney: i agree, in that case, just start the container back up14:13
cdentthe rules about how containers operate seems to be being selected sort of randomly14:14
cdentin normal container life: if it doesn't work, you kill it and try again14:14
*** pvradu_ has quit IRC14:14
sean-k-mooneyso part of the issue is i think httpd does not exit but the application cannot reload properly so the container won't restart14:14
cdentand you don't _ever_ run something as heavy as apache2 in a container14:14
* cdent needs to join the kolla project14:14
sean-k-mooneye.g. if the whole thing exploded when we hit the unrecoverable error then docker would just restart the container and we would be fine14:14
*** pvradu has joined #openstack-nova14:14
* cdent nods14:14
cdentyou might be able to achieve that by not using daemon mode with mod-wsgi14:15
cdentbut uwsgi would be easier :) ;)14:15
sean-k-mooneythis is happening on Rocky by the way so in osp13 for us downstream14:15
cdentfor future reference sean-k-mooney, have you looked at the way placedock works? https://github.com/cdent/placedock14:16
sean-k-mooneyfor stein + we could look at swapping to uwsgi in kolla i guess as an additional mitigation14:16
sean-k-mooneycdent: i have run it once14:16
sean-k-mooneyi was trying to figure out whether i could use it with the osc-placement functional test without running devstack14:17
cdentthe set up there is designed to make it easy to have some other thing in the front (like a load balancer or k8s ingress thing)14:17
cdentif the application fails to start, it quits14:17
sean-k-mooneycdent: so in the kolla world we are running placement under mod_wsgi then putting haproxy in front of it ...14:17
cdentvery wasteful14:18
sean-k-mooneyyep but it "works" going to uwsgi would be a good thing in general for kolla i think14:19
*** ttsiouts has quit IRC14:19
sean-k-mooneybut back to the short term fix e.g. by end of this week/day14:19
sean-k-mooneycdent: mdbooth are we going with the flag, exception catching or decorator change14:19
cdentsean-k-mooney: i think we're waiting to see what mdbooth's change looks like?14:20
mdboothcdent: I think regardless of my change, we want to go for the flag in the first instance14:20
openstackgerritSylvain Bauza proposed openstack/nova-specs master: Proposes NUMA topology with RPs  https://review.openstack.org/55292414:20
mdboothJust because it's simple and an obvious improvement14:20
sean-k-mooneymdbooth: ok ill add a flag instead of catching the exception14:21
mdboothThen we can consider the finer points of wsgi, and whether a refactor is worth it later14:21
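
The "flag" approach agreed on above can be sketched roughly as follows. This is a minimal illustration, not the patch that was proposed: FakeContextManager, AlreadyConfiguredError and the module-level configure() helper are hypothetical stand-ins, and the real oslo.db object behaves differently in detail. The point is only that a guard flag makes a second configure call (for example when the WSGI application is reloaded in the same process) a no-op instead of an error.

```python
import threading


class AlreadyConfiguredError(RuntimeError):
    """Stand-in for the error raised when configure() runs twice."""


class FakeContextManager:
    """Hypothetical stand-in for a DB transaction context manager."""

    def __init__(self):
        self.options = None

    def configure(self, **options):
        # Mimic a manager that refuses to be configured a second time.
        if self.options is not None:
            raise AlreadyConfiguredError("already configured")
        self.options = options


placement_context_manager = FakeContextManager()
_configured = False
_lock = threading.Lock()


def configure(**options):
    """Configure the manager once; later calls return False and do nothing."""
    global _configured
    with _lock:
        if _configured:
            return False
        placement_context_manager.configure(**options)
        _configured = True
        return True
```

With this guard, an application reload that runs the init path again calls configure() a second time and simply gets False back.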
*** ttsiouts has joined #openstack-nova14:21
openstackgerritBalazs Gibizer proposed openstack/nova master: Consider allocations invovling child providers during allocation cleanup  https://review.openstack.org/60605014:21
dansmithjaypipes: could you look at this for me? It's been a while since I wrote it and my context is fading, so I'd like to get it reviewed: https://review.openstack.org/#/c/61166514:23
*** moshele has quit IRC14:24
*** k_mouza has quit IRC14:24
cdentsean-k-mooney, mdbooth looks like zzzeek just did : https://review.openstack.org/#/c/613040/14:25
sean-k-mooneycdent: that is in oslo db.14:26
sean-k-mooneythe intent was to add the flag to the callee code not the lib code14:26
cdentthis allows the callee to check for already started before configuring14:27
sean-k-mooneythat said i can use that but then it's not easily backportable14:27
cdentright, I'm not suggesting you use it _now_14:27
cdentjust that it is available in the future14:27
cdentand the change of exception is handy14:28
* cdent goes to do something not on the computer for a while14:30
sean-k-mooneyactually i can use hasattr to see if it exists so i can conditionally use it. ill submit a patch soon14:31
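
The hasattr idea can be illustrated with a small feature-detection sketch. Everything here is an assumption made for illustration: the attribute name is_started, the OldManager/NewManager classes and configure_once() are hypothetical and do not claim to match the oslo.db change referenced above. The sketch uses the library's own state when the attribute exists and falls back to a local record when it does not, which is what would keep the fix usable against older library versions.

```python
class OldManager:
    """Hypothetical older library object with no way to query its state."""

    def __init__(self):
        self.configure_calls = 0

    def configure(self, **options):
        self.configure_calls += 1


class NewManager(OldManager):
    """Hypothetical newer object exposing an `is_started` attribute."""

    is_started = False

    def configure(self, **options):
        super().configure(**options)
        # Simplification for the sketch: treat "configured" as "started".
        self.is_started = True


# Local record used only when the library cannot tell us its state.
_fallback_configured = set()


def configure_once(manager, **options):
    """Configure `manager` unless it (or our fallback record) says it is done."""
    if hasattr(manager, "is_started"):
        already = manager.is_started
    else:
        already = id(manager) in _fallback_configured
    if already:
        return False
    manager.configure(**options)
    _fallback_configured.add(id(manager))
    return True
```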
*** k_mouza has joined #openstack-nova14:31
sean-k-mooneycdent: enjoy your non compute thing :)14:31
*** beagles_mtg is now known as beagles_food14:33
jaypipesdansmith: done14:38
dansmithjaypipes: ah thanks, will fix those typos14:39
*** cdent has quit IRC14:41
openstackgerritDan Smith proposed openstack/nova master: Make CellDatabases fixture reentrant  https://review.openstack.org/61166514:42
openstackgerritDan Smith proposed openstack/nova master: Modify get_by_cell_and_project() to get_not_deleted_by_cell_and_projects()  https://review.openstack.org/60766314:42
openstackgerritDan Smith proposed openstack/nova master: Minimal construct plumbing for nova list when a cell is down  https://review.openstack.org/56778514:42
openstackgerritDan Smith proposed openstack/nova master: Refactor scatter-gather utility to return exception objects  https://review.openstack.org/60793414:42
openstackgerritDan Smith proposed openstack/nova master: Return a minimal construct for nova show when a cell is down  https://review.openstack.org/59165814:42
openstackgerritDan Smith proposed openstack/nova master: Return a minimal construct for nova service-list when a cell is down  https://review.openstack.org/58482914:42
openstackgerritMatt Riedemann proposed openstack/python-novaclient stable/rocky: Fix up userdata argument to rebuild.  https://review.openstack.org/61305714:42
openstackgerritMatthew Booth proposed openstack/nova master: Allow placement_context_manager to be replaced on reconfiguration  https://review.openstack.org/61305814:46
mdboothsean-k-mooney: ^^^14:46
mdboothsean-k-mooney: Not quite as clean as I'd hoped because python syntax doesn't allow @db_api.placement_context_manager().writer14:46
efriedmdbooth: Not having looked at the patch at all, why do you need () ?14:48
efriedoh, I think I get it.14:48
openstackgerritMatt Riedemann proposed openstack/python-novaclient master: Deprecate the unused instance-name  https://review.openstack.org/60252014:49
bauzasefried: the placement modeling for https://review.openstack.org/#/c/602474/2/specs/stein/approved/vgpu-stein.rst@103 is already made by the reshaper change https://review.openstack.org/#/c/599208/14:50
efriedbauzas: I thought that might be the case.14:51
bauzasmelwitt: once you're up, not sure I understand your concern about upgrade on https://review.openstack.org/#/c/602474 since I already commented this in the upgrade section14:51
*** mlavalle has joined #openstack-nova14:51
sean-k-mooneymdbooth: i was assuming you would have made it @db_api.writer but ya ill take a look when my browser stops crashing from the giant log i tried to open14:51
bauzasefried: I just wanted to keep minimalistic changes to the already approved spec14:52
efriedbauzas: Is that described in the reshaper spec?14:52
bauzasefried: no, that's direct code14:52
efriedbauzas: I think I'm trying to say it should be described in *some* spec *somewhere*.14:52
bauzasefried: I could amend https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/add-support-for-vgpu.html if you wish14:52
efriedI don't disagree we should minimize changes to a spec reapproval in theory, but this seems like something worth including.14:52
efriedbauzas: That would be okay too.14:52
bauzaswhat I really want is a possible quick approval14:53
efriedbauzas: swhy I didn't downvote :)14:53
sean-k-mooneymdbooth: that will still not rebind the context on reconfiguration14:53
bauzasand then, if left comments, a possible follow-up14:53
efriedsure14:53
bauzasefried: or I could amend https://review.openstack.org/#/c/602474 in a follow-up if you prefer14:54
sean-k-mooneymdbooth: you will need to do something like this https://stackoverflow.com/a/3350730814:56
efriedbauzas: There was some question (discussion with mriedem) as to whether these vgpu reshaper patches should be associated with the reshaper bp or the vgpu bp. I'm starting to think it's more appropriate to do the latter. The reshaper bp enables the work, but we're not going to go back and tag every future reshape impl against that same bp.14:56
bauzashonestly, it's just a gerrit tag14:57
bauzasso I don't really care14:57
efriedThat being the case, IMO the text in question ought to go into https://review.openstack.org/#/c/602474 (the vgpu spec).14:57
bauzasprovided I have reviews :)14:57
mdboothsean-k-mooney: Ah, you're right14:57
* mdbooth facepalms14:57
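
The rebinding problem discussed above comes from decorators being evaluated once, at import time. A small sketch of the difference between early and late binding follows. FakeManager, get_manager(), replace_manager() and late_writer() are hypothetical names used only for illustration; this is neither the abandoned patch nor the code from the linked answer.

```python
import functools


class FakeManager:
    """Hypothetical stand-in; `writer` wraps a function and tags its result."""

    def __init__(self, name):
        self.name = name

    def writer(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return (self.name, func(*args, **kwargs))
        return wrapper


_manager = FakeManager("first")


def get_manager():
    return _manager


def replace_manager(new):
    global _manager
    _manager = new


def late_writer(func):
    """Look up the current manager on every call, not at decoration time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return get_manager().writer(func)(*args, **kwargs)
    return wrapper


@_manager.writer  # early binding: permanently tied to the "first" manager
def create_early(x):
    return x


@late_writer  # late binding: follows whatever replace_manager() installed
def create_late(x):
    return x
```

After replace_manager(FakeManager("second")), create_early still goes through the original manager, while create_late picks up the replacement.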
bauzasefried: fair, I'll write a follow-up14:58
efriedIt's more than a gerrit tag. It feeds into being able to claim completion of a blueprint, etc.14:58
bauzasI understand this but meh14:58
bauzaseither way, looks like it's a priority14:58
sean-k-mooneymdbooth: ill submit the version with the flag for review. ill see if i can create a simple decorator after once the simple fix is up14:59
*** cfriesen has joined #openstack-nova14:59
efriedbauzas: I'm not a spec core, so I can't approve it either way.14:59
mdboothsean-k-mooney: In lighter news, putting an emoji in a gerrit comment causes a 500 :)14:59
bauzasefried: I know, but your comments are still valid15:00
jaypipesmelwitt, dansmith: do we actually support quota classes other than "default"?15:00
sean-k-mooneyhehe im not sure if that is a feature or a bug15:00
dansmithjaypipes: I think no15:00
*** Luzi has quit IRC15:02
*** beagles_food is now known as beagles15:03
melwittjaypipes: we don't have anything in tree that uses anything other than "default" but if we were to wire it up, it would work. we've thrown around ideas of using them for things like preemptible instances but nothing has materialized yet. and alex_xu's "quota by resource class" proposed to leverage them if you've seen that spec15:08
openstackgerritBalazs Gibizer proposed openstack/nova master: Consider allocations invovling child providers during allocation cleanup  https://review.openstack.org/60605015:09
jaypipesmelwitt: well, the quota by resource class is different. quota *classes* are more templates of default limit values for the set of registered resource types.15:11
jaypipesand highly coupled to RAX's turnstile middleware...15:11
mdboothsean-k-mooney: Actually I'm just going to abandon that patch. It's dumb and nothing like it can work.15:11
melwittjaypipes: I know, but if you read the spec, we could use them to set limits for resource classes in nova. but I don't think that's gonna happen because people would rather wait until we move to keystone limits and oslo.limit15:12
mdboothsean-k-mooney: At least sed's feelings won't be hurt.15:12
jaypipesmelwitt: ack15:12
bauzasdansmith: based on the numerous feedback, could you please review https://review.openstack.org/#/c/602474/ ? I'll provide a follow-up on some efried's details15:13
bauzasit's a re-approval15:13
sean-k-mooneymdbooth: well https://stackoverflow.com/a/33507308 will work because i wrote it specifically for doing this kind of thing but ya lets just stick with the simple fix until it breaks15:14
*** gyee has joined #openstack-nova15:15
openstackgerritSylvain Bauza proposed openstack/nova-specs master: Proposes NUMA topology with RPs  https://review.openstack.org/55292415:16
*** alexchadin has quit IRC15:16
bauzasefried: just fixed the typo you mentioned ^15:16
bauzasthanks for the review15:16
efriedbauzas: But just one of them :)15:16
dansmithbauzas: I'll add it to the queue15:17
openstackgerritMatt Riedemann proposed openstack/nova master: Add restrictions on updated_at when getting migrations  https://review.openstack.org/60779815:17
mdboothsean-k-mooney: However, I think I didn't demonstrate that mechanically updating all uses of placement_context_manager() is pretty easy.15:17
openstackgerritMatt Riedemann proposed openstack/nova master: Add restrictions on updated_at when getting instance action records  https://review.openstack.org/60780115:17
openstackgerritMatt Riedemann proposed openstack/nova master: Document restrictions on changes-since/before when listing servers  https://review.openstack.org/61307015:17
mdbooths/didn't/did/15:17
mdboothThat was a weird typo15:17
*** lpetrut has quit IRC15:18
sean-k-mooneymdbooth: ya i suspected that would be easy to do but getting the new decorator correct is the tricky bit. anyway the more i talk about it the less time i spend doing it. ill have the patch up in a few minutes15:20
*** ttsiouts has quit IRC15:22
*** tbachman has joined #openstack-nova15:23
bauzasdansmith: heh, np15:25
*** ttsiouts has joined #openstack-nova15:26
*** fghaas has quit IRC15:42
*** pcaruana has quit IRC15:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Consider allocations invovling child providers during allocation cleanup  https://review.openstack.org/60605015:50
openstackgerritBalazs Gibizer proposed openstack/nova master: Reject forced move with nested source allocation  https://review.openstack.org/60578515:50
openstackgerritBalazs Gibizer proposed openstack/nova master: Run negative server moving tests with nested RPs  https://review.openstack.org/60412515:50
openstackgerritBalazs Gibizer proposed openstack/nova master: Handle allocations consuming only from the child RPs  https://review.openstack.org/60829815:50
gibimriedem, efried, jaypipes: I have fixed up the use-nested-allocation-candidates series ^^15:51
efriedgibi: Cool, I'm sure it's perfect now.15:53
openstackgerritMatt Riedemann proposed openstack/python-novaclient stable/rocky: Follow up "Fix up userdata argument to rebuild"  https://review.openstack.org/61308615:53
gibiefried: :)15:53
*** ccamacho has quit IRC15:54
*** cdent has joined #openstack-nova15:55
jaypipesgibi: thx gibi15:56
*** fghaas has joined #openstack-nova15:57
openstackgerritMatt Riedemann proposed openstack/python-novaclient stable/queens: Fix up userdata argument to rebuild.  https://review.openstack.org/61309015:57
openstackgerritMatt Riedemann proposed openstack/python-novaclient stable/queens: Follow up "Fix up userdata argument to rebuild"  https://review.openstack.org/61309115:57
*** munimeha1 has joined #openstack-nova15:58
openstackgerritDaniel Abad proposed openstack/nova master: Fix ironic client ironic_url deprecation warning  https://review.openstack.org/61187216:00
*** dave-mccowan has quit IRC16:02
*** pvc has joined #openstack-nova16:06
pvcHi sean-k-mooney my problem is i cannot run nvidia x settings on my instance16:07
pvchttps://docs.nvidia.com/grid/latest/grid-licensing-user-guide/index.html#licensing-grid-vgpu16:07
sean-k-mooneythe docs have an advanced section that shows how to set the licensing info using a config file on linux or the registry on windows16:10
*** pvc has quit IRC16:12
*** pvc has joined #openstack-nova16:16
*** psachin has quit IRC16:16
pvcHi sean-k-mooney can i use conf for adding a license right?16:16
openstackgerritDan Smith proposed openstack/nova master: Always read-deleted=yes on lazy-load  https://review.openstack.org/57519016:16
dansmithmelwitt: the down cell series could use some review if you have time. Everything up to the api change (which I orphaned while working on it) should be passing tests now16:18
*** itlinux has joined #openstack-nova16:18
bauzaspvc: I pointed you to the nvidia guest licensing documentation this morning16:18
bauzaspvc: https://docs.nvidia.com/grid/6.0/grid-licensing-user-guide/index.html#licensing-grid-software-linux-config-file16:19
*** pvc has quit IRC16:20
melwittdansmith: thanks for the heads up, I'll go through it. I was also thinking about the handling of quota behavior in the presence of down cells. I don't think we have a patch for that yet. if not, I can look at proposing that on top of the api change16:21
dansmithyep, not that I know of16:21
melwittack16:22
cdentsean-k-mooney: I built a place to store wood for the fire. good break. I saw mdbooth abandoned his thing, so where does stuff stand now?16:24
sean-k-mooneycdent: i was in meetings and some other stuff so ill have the simple boolean flag version up soon16:25
*** pvradu has quit IRC16:26
openstackgerritMerged openstack/nova-specs master: Re-proposes multiple vGPU types in libvirt  https://review.openstack.org/60247416:26
mdboothcdent: Yeah, having thought about that again, I think it would need a different oslo.db api to do that. The decorator is returned by the object we want to replace, so there's no getting round that.16:26
*** pvradu has joined #openstack-nova16:26
*** helenafm has quit IRC16:29
*** pvradu has quit IRC16:31
*** lpetrut has joined #openstack-nova16:35
melwittbauzas: just noticed another thing for the vgpu spec follow up https://review.openstack.org/#/c/602474/2/specs/stein/approved/vgpu-stein.rst@1116:35
melwittbp name16:35
*** jmlowe has quit IRC16:39
*** ttsiouts has quit IRC16:39
*** imacdonn has quit IRC16:40
*** ttsiouts has joined #openstack-nova16:40
*** imacdonn has joined #openstack-nova16:43
*** ttsiouts has quit IRC16:45
*** dtantsur is now known as dtantsur|afk16:46
*** pcaruana has joined #openstack-nova16:46
*** jmlowe has joined #openstack-nova16:55
*** icey has quit IRC16:56
*** panda is now known as panda|off16:59
*** adrianc has quit IRC17:00
*** moshele has joined #openstack-nova17:00
*** k_mouza_ has joined #openstack-nova17:01
*** jdillaman1 has quit IRC17:04
*** icey has joined #openstack-nova17:04
*** k_mouza has quit IRC17:05
*** k_mouza_ has quit IRC17:06
*** derekh has quit IRC17:07
*** irclogbot_4 has quit IRC17:08
*** irclogbot_4 has joined #openstack-nova17:08
openstackgerritMatt Riedemann proposed openstack/nova master: Add functional recreate test for bug 1799727  https://review.openstack.org/61311517:09
openstackbug 1799727 in OpenStack Compute (nova) "CPU_Allocation_Ratio from nova.conf doesn't update exisiting providers" [Undecided,Confirmed] https://launchpad.net/bugs/179972717:09
openstackgerritJan Gutter proposed openstack/os-vif master: Update port profile unit tests in host_info  https://review.openstack.org/61063617:13
openstackgerritMatt Riedemann proposed openstack/nova master: Add functional recreate test for bug 1799727  https://review.openstack.org/61311517:14
openstackbug 1799727 in OpenStack Compute (nova) "CPU_Allocation_Ratio from nova.conf doesn't update exisiting providers" [High,Confirmed] https://launchpad.net/bugs/179972717:14
*** jpena is now known as jpena|off17:15
*** jmlowe has quit IRC17:18
*** Swami has joined #openstack-nova17:18
*** irclogbot_4 has quit IRC17:21
*** pvradu has joined #openstack-nova17:23
cfriesenjaypipes: stephenfin:  regarding the "show server numa topology" spec, are you okay with showing the *guest* topology for regular users if we clean up the various issues you raised in the spec?17:25
sean-k-mooneycfriesen: provided the show numa topology spec does not show any host topology info then i think it's fine17:27
*** pvradu has quit IRC17:27
sean-k-mooneycfriesen: if you want it to show how the virtual topology is mapped to a host's physical topology then that would be admin only17:27
cfriesensean-k-mooney: agreed.  I think showing the "expected" host details to the admin would be useful, since we've run into cases where expected didn't match actual. :)17:31
sean-k-mooneycfriesen: it should not upstream. the intel nfv ci actually ssh's into the host that the vm is running on and validates it's pinned correctly17:33
*** mvkr has quit IRC17:34
cfriesensean-k-mooney: live migration17:34
sean-k-mooneycfriesen: but for an admin yes it could be useful when debugging17:34
sean-k-mooneycfriesen: what about it? i said we validated it was pinned as nova told it to17:35
cfriesen(at least until the patch goes in to fail the live migration if there's a numa topology)17:35
sean-k-mooneyi did not say nova pinned it correctly17:35
sean-k-mooneycfriesen: ya on that i have asked stephen to make that conditional and off by default17:36
cfriesensean-k-mooney: we ran into some bugs during aborted/failed operations17:36
openstackgerritMerged openstack/nova stable/rocky: Fix up compute rpcapi version for pike release  https://review.openstack.org/61256117:37
sean-k-mooneycfriesen: i know of at least one production large-scale deployment that uses ovs-dpdk, which means the guests have hugepages and a numa topology, and that uses live migration17:37
sean-k-mooneyit is still true that live migration can fail because there are not enough free hugepages on the numa node but the failure rate was low enough that they were happy to continue to use it17:38
sean-k-mooneymainly since they could just specify a host that they knew would fit the instance17:39
jaypipescfriesen: I'm not thrilled about it, no..17:40
openstackgerritElod Illes proposed openstack/nova master: Transform scheduler.select_destinations notification  https://review.openstack.org/50850617:41
sean-k-mooneyjaypipes: even if it's just the computed topology from the flavor+image with no host info?17:41
cfriesenjaypipes: so currently an end-user can't tell their topology without logging into the guest and checking it.  if they're using a per-user keypair, other users in the same tenant can't see what the topology is.17:42
dansmithcfriesen: what's the use case for that though?17:42
sean-k-mooneycfriesen: well they can, they can look at the flavor and image metadata17:42
cfriesensean-k-mooney: not all clouds allow end users to see flavor extra specs17:42
sean-k-mooneycfriesen: wait they don't? how do you know what the flavor does without that info17:43
dansmithsean-k-mooney: historically those were admin only17:43
cfriesendansmith: for an admin it's useful for showing what nova expects the virt/phys mapping to be, which can then be checked against the actual mapping on the hypervisor.17:43
sean-k-mooneydansmith: huh  ok i guess i just always am an admin so never noticed17:44
cfriesenfor a normal user, it's useful in the same way knowing how many cpus or how much ram your instance has is useful17:44
dansmithcfriesen: you mean it's useful for an admin to make sure nova is doing the thing it expects? that seems like a weak case to me..17:44
dansmithcfriesen: but .. the user can get that from the guest if they're the user. I just have a hard time understanding what they can do with that info from just the API, other than maybe complain, or notice that it changed when their admin migrates them17:45
sean-k-mooneycfriesen: is any of this stuff already in the metadata api?17:46
cfriesendansmith: only a user with a suitable keypair can login to the guest.  other users in the same tenant can't.17:46
cfriesenor at least might not be able to17:46
dansmithcfriesen: right, it's that case I don't get17:46
dansmithcfriesen: like, if I can log into the guest, what does it matter what the topology is?17:47
dansmithI mean I can come up with completely synthetic reasons, but they're exceedingly weak, which is what I said above17:47
dansmithsorry.. "can't log into the guest"17:49
*** ralonsoh has quit IRC17:52
openstackgerritMatt Riedemann proposed openstack/nova master: WIP: Update allocation_ratios in placement inventory if config changes  https://review.openstack.org/61312617:54
*** lpetrut has quit IRC17:55
cfriesenmost of the uses that we added it for were admin-level, admittedly.  Like figuring out why something is having a hard time scheduling on a migration or showing the expected virt/phys mapping.  For an end user it's really just about showing all the available information about the instance and making it so they don't have to jump through hoops to get it.17:55
cdent"making it so they don't have to jump through hoops to get it" <- that ought to be compelling enough?17:56
openstackgerritGaudenz Steinlin proposed openstack/nova master: Extend volume for libvirt network volumes (RBD)  https://review.openstack.org/61303918:00
sean-k-mooneycdent: its 3 command to 1 but if some clouds hide flavor extra spec then i guess maybe they cant run the 3 commands18:01
sean-k-mooneycfriesen: had you also planned to show cpu topology or just numa topology18:01
sean-k-mooneycfriesen: as we both know they can be and usually are very different by default on openstack18:02
openstackgerritJay Pipes proposed openstack/nova master: quota: remove unused code  https://review.openstack.org/61312718:03
openstackgerritJay Pipes proposed openstack/nova master: quota: remove unused Quota driver methods  https://review.openstack.org/61312818:03
openstackgerritJay Pipes proposed openstack/nova master: quota: remove QuotaDriver.destroy_all_by_project()  https://review.openstack.org/61312918:03
openstackgerritJay Pipes proposed openstack/nova master: quota: remove default kwarg on get_class_quotas()  https://review.openstack.org/61313018:03
jaypipesmelwitt, dansmith: some cleanups of the quota system ^^18:04
cfriesensean-k-mooney: we currently show memory size, page size, and which guest CPUs are associated with each guest numa node.  showing virtual CPU topology (sockets/cores/threads) would be a lot trickier since that's not in the nova DB.18:05
sean-k-mooneywell it is if set in the flavor or image, else it's up to the virt driver18:06
jaypipesmelwitt, dansmith: more patches on the way but those are a good first chunk of removing cruft18:06
*** cdent has quit IRC18:06
*** jdillaman has joined #openstack-nova18:06
sean-k-mooneycfriesen: so if we were going this route it would be nice to include it if set in image or flavor18:06
cfriesensean-k-mooney: don't we have scenarios where we say "max cpus per socket"?  in that case only the virt driver knows the actual number18:07
*** tbachman has quit IRC18:09
*** tbachman has joined #openstack-nova18:12
sean-k-mooneycfriesen: yes but we can also say 2 cpus per socket instead of max18:13
sean-k-mooneyanyway that is just a thought18:13
sean-k-mooneyit sounds like dansmith and jaypipes would prefer this not to be in the api anyway so maybe you could do it as an osc or nova client feature18:14
*** mvkr has joined #openstack-nova18:17
*** ivve has quit IRC18:18
*** tbachman has quit IRC18:18
cfriesendansmith: jaypipes: currently there's no way for an admin to look at the expected virt/phys mapping without going into the database.  do we expect that nova admins will always have raw DB access?18:21
sean-k-mooneythey don't need db access18:22
sean-k-mooneythey just need to be able to do a flavor show and image show + look at the libvirt xml18:22
cfriesensean-k-mooney: no, I'm talking about which specific guest vcpu maps to which specific host CPU18:22
sean-k-mooneythat's in the libvirt xml18:22
cfriesensean-k-mooney: that's the hypervisor view, not nova's view (which in buggy cases can be different)18:23
sean-k-mooneythe database is not going to help you in those cases to debug what went wrong18:23
sean-k-mooneywe have no way to get the numa_topology blob from the moment when nova was calculating the pinning18:24
cfriesensure it can...if I can see that the entry in the database matched the previous mappings from before I did  a migration...18:24
sean-k-mooneywait you're talking about migration with cpu pinning which today is not supported18:25
cfriesencold migration is18:25
sean-k-mooneycold migration yes but that wont break in this case18:25
sean-k-mooneythe xml will be regenerated on the new host18:26
cfriesenthrow in power outages and downed compute nodes and lost messages and migration reverts18:26
sean-k-mooneyso when the db is in an undefined state it may not agree with the hypervisor18:27
sean-k-mooneyyes that is true. not sure this will help with that18:27
cfriesensean-k-mooney: it'll at least tell us what the problem is18:28
sean-k-mooneythe problem being the db is borked18:28
sean-k-mooneyif the vm is running it means the hypervisor pinned it correctly based on the info it had at the time.18:29
sean-k-mooneyit should never be the case that the db is correct and vm is wrong in the cold migrate case18:30
sean-k-mooneylive migrate this can invert18:30
sean-k-mooneycfriesen: would an error log message generated by one of the periodic tasks on the compute agent not be more useful?18:31
sean-k-mooneye.g. discovered instance x with pinning y, expected z18:31
openstackgerritsean mooney proposed openstack/nova master: harden placement init under wsgi  https://review.openstack.org/61003418:32
sean-k-mooneymelwitt: i updated the placement wsgi patch again based on more talks with mdbooth and cdent earlier18:34
sean-k-mooneymelwitt: do you still want me to drop the second unit test https://review.openstack.org/#/c/610034/5/nova/tests/unit/api/openstack/placement/test_db_api.py18:34
sean-k-mooneyif so i can respin it  quickly18:34
cfriesensean-k-mooney: an error log like that is not a bad idea, actually.18:37
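
The periodic check suggested above could look something like the sketch below. It is only an illustration under stated assumptions: find_pinning_mismatches() is a hypothetical helper, and both inputs are plain dicts mapping an instance UUID to a {guest vCPU: host CPU} mapping. How nova and the hypervisor would actually supply those mappings is not shown.

```python
import logging

LOG = logging.getLogger(__name__)


def find_pinning_mismatches(expected, actual):
    """Return {uuid: (expected_pins, actual_pins)} for instances that differ.

    Both arguments map instance UUID -> {guest vCPU: host CPU}. Instances
    missing from `actual` (e.g. not running on this host) are skipped.
    """
    mismatches = {}
    for uuid, want in expected.items():
        have = actual.get(uuid)
        if have is not None and have != want:
            mismatches[uuid] = (want, have)
            LOG.error("discovered instance %s with pinning %s, expected %s",
                      uuid, have, want)
    return mismatches
```

An operator exporting logs to a central store could then alert on that error message rather than querying the API for each instance.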
*** tbachman has joined #openstack-nova18:38
sean-k-mooneyim kind of assuming any reasonably sized cloud that is going to have this problem is likely exporting their logs to elasticsearch or similar and could set up an alert for it18:38
melwittsean-k-mooney: commented18:39
*** mchlumsky_ has joined #openstack-nova18:39
*** slaweq_ has joined #openstack-nova18:40
sean-k-mooneymelwitt: thanks18:41
*** jistr_ has joined #openstack-nova18:42
*** itlinux_ has joined #openstack-nova18:43
*** aloga_ has joined #openstack-nova18:43
*** tridde has joined #openstack-nova18:43
*** jhesketh_ has joined #openstack-nova18:44
cfriesensean-k-mooney: actually, I was wrong.  we do have the actual guest CPU topology in the InstanceNUMACell, so we could display it too.18:46
sean-k-mooneyin the instance request spec im assuming18:47
sean-k-mooneyor somewhere in the instance extra stuff in the db18:47
cfriesenno, InstanceNUMACell.cpu_topology18:47
sean-k-mooneydoes that actually give you the cpu_topology or the vcpu to pcpu mappings18:48
sean-k-mooneyi have learned that we are terrible at naming anything related to numa in the code18:48
*** icey has quit IRC18:48
*** itlinux has quit IRC18:48
*** mchlumsky has quit IRC18:48
*** aloga has quit IRC18:48
*** erlon has quit IRC18:48
*** priteau has quit IRC18:48
*** sayalilunkad has quit IRC18:48
*** raginbajin has quit IRC18:48
*** slaweq has quit IRC18:48
*** FlorianFa has quit IRC18:48
*** zzzeek has quit IRC18:48
*** kevinbenton has quit IRC18:48
*** trident has quit IRC18:48
*** gryf has quit IRC18:48
*** jistr has quit IRC18:48
*** SpamapS has quit IRC18:48
*** spotz has quit IRC18:48
*** lyarwood has quit IRC18:48
*** gibi has quit IRC18:48
*** jhesketh has quit IRC18:48
*** kevinbenton has joined #openstack-nova18:49
*** icey has joined #openstack-nova18:49
*** SpamapS has joined #openstack-nova18:50
cfriesentopology...threads/cores/sockets18:50
*** sayalilunkad has joined #openstack-nova18:50
*** erlon has joined #openstack-nova18:50
cfriesenthere's also InstanceNUMACell.siblings to show guest HT siblings18:51
*** moshele has quit IRC18:51
*** spotz has joined #openstack-nova18:52
sean-k-mooneycfriesen: cool. i still think you need to convince dansmith and jaypipes there is a need for it. the periodic task i think would have value and be an easier sell as it will actively detect there is an issue you should investigate18:54
*** gryf has joined #openstack-nova18:54
*** jmlowe has joined #openstack-nova19:00
*** openstackgerrit has quit IRC19:06
*** ivve has joined #openstack-nova19:20
*** openstackgerrit has joined #openstack-nova19:23
openstackgerritMerged openstack/nova stable/rocky: Move live_migration.pre.start to the start of the method  https://review.openstack.org/61271419:23
openstackgerritMerged openstack/nova stable/rocky: Ensure attachment cleanup on failure in driver.pre_live_migration  https://review.openstack.org/61271519:23
*** moshele has joined #openstack-nova19:26
openstackgerritMatt Riedemann proposed openstack/nova master: Add functional recreate test for bug 1799727  https://review.openstack.org/61311519:28
openstackbug 1799727 in OpenStack Compute (nova) "CPU_Allocation_Ratio from nova.conf doesn't update exisiting providers" [High,In progress] https://launchpad.net/bugs/1799727 - Assigned to Matt Riedemann (mriedem)19:28
openstackgerritMatt Riedemann proposed openstack/nova master: Update reserved/allocation_ratio in placement inventory if config changes  https://review.openstack.org/61312619:28
*** moshele has quit IRC19:29
*** slaweq_ is now known as slaweq19:40
*** awaugama has quit IRC19:42
*** irclogbot_4 has joined #openstack-nova19:44
*** irclogbot_4 has quit IRC19:46
*** irclogbot_4 has joined #openstack-nova19:47
*** jmlowe has quit IRC19:50
*** READ10 has joined #openstack-nova20:03
*** tbachman has quit IRC20:06
*** jmlowe has joined #openstack-nova20:11
openstackgerritMatt Riedemann proposed openstack/nova master: Add more documentation for online_data_migrations CLI  https://review.openstack.org/60583620:14
*** ivve has quit IRC20:19
*** irclogbot_4 has quit IRC20:21
openstackgerritMerged openstack/nova stable/queens: Fix up compute rpcapi version for pike release  https://review.openstack.org/61256220:25
*** READ10 has quit IRC20:34
*** tbachman has joined #openstack-nova20:34
*** pcaruana has quit IRC20:50
*** tbachman has quit IRC20:54
openstackgerritDan Smith proposed openstack/nova master: Make CellDatabases fixture reentrant  https://review.openstack.org/61166520:58
openstackgerritDan Smith proposed openstack/nova master: Modify get_by_cell_and_project() to get_not_deleted_by_cell_and_projects()  https://review.openstack.org/60766320:58
openstackgerritDan Smith proposed openstack/nova master: Minimal construct plumbing for nova list when a cell is down  https://review.openstack.org/56778520:58
openstackgerritDan Smith proposed openstack/nova master: Refactor scatter-gather utility to return exception objects  https://review.openstack.org/60793420:58
openstackgerritDan Smith proposed openstack/nova master: Return a minimal construct for nova show when a cell is down  https://review.openstack.org/59165820:58
openstackgerritDan Smith proposed openstack/nova master: Return a minimal construct for nova service-list when a cell is down  https://review.openstack.org/58482920:58
*** openstack has quit IRC21:03
*** openstack has joined #openstack-nova21:05
*** ChanServ sets mode: +o openstack21:05
*** erlon has quit IRC21:06
*** k_mouza has joined #openstack-nova21:07
*** fghaas has left #openstack-nova21:10
*** k_mouza has quit IRC21:11
*** spsurya has quit IRC21:21
openstackgerritMerged openstack/nova master: libvirt: fix disk_bus handling for root disk  https://review.openstack.org/58499921:22
cfriesenso are we recommending setting send_service_user_token to True now?  would we ever change the default to be True?21:23
*** itlinux_ has quit IRC21:33
openstackgerritMatt Riedemann proposed openstack/nova master: Create volume attachment during boot from volume in compute  https://review.openstack.org/54142021:35
openstackgerritMerged openstack/nova stable/queens: Move live_migration.pre.start to the start of the method  https://review.openstack.org/61277321:41
openstackgerritMerged openstack/nova stable/queens: Ensure attachment cleanup on failure in driver.pre_live_migration  https://review.openstack.org/61277421:41
*** spatel has joined #openstack-nova21:44
spatelsean-k-mooney: hey21:44
spatelI am having issue with block migration21:45
spatelit migrate full instance but didn't copy full disk.img file and my VM is failed to boot21:45
sean-k-mooneyare you using config drive21:45
spatelno21:45
sean-k-mooneyok config drive breaks that i think21:46
sean-k-mooneyhum that is strange21:46
spatelhttp://paste.openstack.org/show/732988/  This is what i have in nova.conf21:46
sean-k-mooneyso the migration succeeded from a nova api point of view but actually failed?21:46
spatelHorizon migrating vm but when i reboot machine it put me in emergency mode of linux ( when i check disk.img file its just few KB in size)21:47
spatelMigration succeeded and i can see my full VM migration to new compute node and its running also but as soon as i reboot it put me in emergency mode let me show you logs21:48
sean-k-mooneyso img images are usually raw images. for qcow images i know we have a disk layering/caching thing we use21:48
spatelThis is what instance look like after reboot http://paste.openstack.org/show/732989/21:49
spatelraw image21:49
sean-k-mooneyim not sure if we do the same disk caching thing with raw images21:49
spatelYou think it could be image issue ?21:49
*** tbachman has joined #openstack-nova21:50
openstackgerritMatt Riedemann proposed openstack/nova master: Add functional test for AggregateMultiTenancyIsolation + migrate  https://review.openstack.org/57126521:50
sean-k-mooneyum, no, not necessarily, but if it was using the same disk offset stuff we use for qcow the vm image would be small unless you wrote a lot of data to it after it booted21:50
sean-k-mooneythat said line 11 XFS (sda1): last sector read failed21:51
spatelon source node size was  101MB21:51
sean-k-mooneythat looks like file system corruption21:51
spateland destination node its few KB21:52
sean-k-mooneyright, um, you are not on shared storage by the way?21:52
spatelNo i don't have shared storage21:52
sean-k-mooneye.g. you dont have /var/libvirt/... on nfs21:52
spatelno21:53
sean-k-mooneyok cool just checking21:53
spatelsure21:53
sean-k-mooneydo you have logs from nova for the migration21:53
spatellet me pull21:53
sean-k-mooneye.g. the n-cpu logs for source and dest21:53
sean-k-mooneyit seems like the image just got truncated but i dont know why that would happen21:53
spatelhttp://paste.openstack.org/show/732990/21:55
spatelthis is destination compute node logs21:55
*** bnemec has quit IRC21:55
spatelI am using swap disk it shouldn't be an issue21:57
sean-k-mooneyswap will be copied also so ya it should be fine21:57
openstackgerritVladyslav Drok proposed openstack/nova stable/pike: Fix resize_instance rpcapi call  https://review.openstack.org/60343921:57
spatellet me try again and see21:58
sean-k-mooneythis is a little suspicious Unknown base file: /var/lib/nova/instances/_base/68e4d13dacff5cffeaacecf533afab659ec3e17021:59
spatelhmm!22:00
spatellet me try again and capture fresh log22:00
spateli have spun up new vm and it has 98M disk file22:00
openstackgerritMatt Riedemann proposed openstack/nova master: Migrate old style volume attachments on nova-compute startup  https://review.openstack.org/54913022:00
spatelThis is what i am doing for migration  Horizon > live migration > select block migration22:01
sean-k-mooneyany luck reproducing?22:03
sean-k-mooneyby the way im assuming you are not using sriov interface on that vm right22:03
spatelno SR-IOV22:04
sean-k-mooneyok good because that does not work :)22:04
sean-k-mooneyat least not yet22:04
spatelwithin 30 seconds nova said migration completed and i can see host also updated in horizon22:04
spatelhere is the fresh log http://paste.openstack.org/show/732991/22:05
spatelnow going to reboot my instance22:05
spatelfailed to boot22:06
spatelEntering emergency mode. Exit the shell to continue.22:06
*** slaweq has quit IRC22:06
sean-k-mooneyhum if you log in to host ostack-compute-27.v1v0x.net does its n-cpu log have any errors22:06
spateldisk size is 3.3M22:06
spateln-cpu log?22:07
spatelis that a log file?22:07
sean-k-mooneynova compute agent22:07
sean-k-mooneyin the development installer devstack it's referred to as n-cpu for short22:07
sean-k-mooneyso same log but for source node22:08
spatelThis is the log of compute-27 http://paste.openstack.org/show/732992/22:08
spatelno error there22:09
spatelwhat is this ?  [instance: 2ad8cdf5-4db7-4024-a35f-343340ad27ee] Instance not resizing, skipping migration.22:09
sean-k-mooneywe use the same code path for migration and resizing vms since it basically the same thing22:11
spatelhmm22:11
spatelDo you think my nova config is wrong?22:12
sean-k-mooneyits being logged from here https://github.com/openstack/nova/blob/12fcfc5e2b51b563529a5bc0b2990816bbbda80b/nova/compute/resource_tracker.py#L112722:12
sean-k-mooneyno i think your nova config is fine, that said we might need to enable debug logging to get to the bottom of what is going on22:13
spatelSo i have manually copied disk file from source to destination and VM is up22:13
spatellooks like compute node is not copying full disk file, or source is just deleting instance before copy finishes22:14
openstackgerritMerged openstack/nova master: Add debug logs for when provider inventory changes  https://review.openstack.org/59756022:14
sean-k-mooneyya that should not be required22:14
sean-k-mooneyya can you enable debug logs on both nodes and try again22:14
*** efried has quit IRC22:15
spatelLet me go home and collect all nova debug and file BUG22:15
*** efried has joined #openstack-nova22:15
spatelit seems source just deleting instance before nova migration complete...22:15
sean-k-mooneyspatel: sure, it's late here too so i was going to be dropping offline soon once my pizza is ready22:16
sean-k-mooneyif you can file a bug and attach some debug logs for the source and destination we can see why it is not copying the disk22:16
spateli will22:17
spatelthanks!22:17
sean-k-mooneyspatel: by the way what release are you running22:18
spatelqueens22:18
sean-k-mooneythere was an error we caught just as Rocky was shipping where we treated migration completion as a success without checking if there was an error22:18
sean-k-mooneyah so ya you might be hitting that bug22:18
spateldamn it!! !22:19
spatelreally!!22:19
spatelsend me that BUG and i will check code22:19
sean-k-mooneybasically if libvirt threw an error after migration started we did not handle it properly22:19
*** mlavalle has quit IRC22:20
spatelI am leaving and need to shutdown my pc but you can send me email on satish.txt@gmail.com22:20
*** spatel has quit IRC22:21
openstackgerritMerged openstack/nova stable/pike: Revert "Make host_aggregate_map dictionary case-insensitive"  https://review.openstack.org/60526822:35
openstackgerritMerged openstack/nova stable/pike: Enforce case-sensitive hostnames in aggregate host add  https://review.openstack.org/60526922:35
*** vabada has quit IRC22:35
*** vabada has joined #openstack-nova22:37
*** eharney has quit IRC22:38
*** rcernin has joined #openstack-nova23:05
*** itlinux has joined #openstack-nova23:05
*** READ10 has joined #openstack-nova23:17
*** openstackgerrit has quit IRC23:20
*** spatel has joined #openstack-nova23:33
*** spatel has quit IRC23:37
*** Swami has quit IRC23:45
