*** erlon has joined #openstack-nova | 00:03 | |
*** hamzy has joined #openstack-nova | 00:05 | |
*** slaweq has joined #openstack-nova | 00:11 | |
*** tetsuro has joined #openstack-nova | 00:12 | |
*** trungnv has quit IRC | 00:25 | |
*** trungnv has joined #openstack-nova | 00:25 | |
*** mlavalle has quit IRC | 00:32 | |
*** moshele has joined #openstack-nova | 00:35 | |
*** slaweq has quit IRC | 00:45 | |
*** medberry has quit IRC | 01:02 | |
*** medberry has joined #openstack-nova | 01:03 | |
*** jackyzhu has joined #openstack-nova | 01:08 | |
*** slaweq has joined #openstack-nova | 01:12 | |
*** sapd1 has quit IRC | 01:16 | |
*** mrsoul has joined #openstack-nova | 01:17 | |
*** sapd1 has joined #openstack-nova | 01:18 | |
*** hongbin has joined #openstack-nova | 01:19 | |
*** yikun has joined #openstack-nova | 01:21 | |
*** takashin has joined #openstack-nova | 01:21 | |
*** imacdonn has quit IRC | 01:23 | |
*** imacdonn has joined #openstack-nova | 01:24 | |
*** tiendc has joined #openstack-nova | 01:33 | |
*** erlon has quit IRC | 01:36 | |
*** alex_xu has quit IRC | 01:37 | |
*** litao has joined #openstack-nova | 01:39 | |
litao | hi | 01:40 |
---|---|---|
*** mhen has quit IRC | 01:42 | |
*** Dinesh_Bhor has joined #openstack-nova | 01:43 | |
*** slaweq has quit IRC | 01:44 | |
*** moshele has quit IRC | 01:46 | |
*** itlinux has quit IRC | 01:46 | |
*** mhen has joined #openstack-nova | 01:46 | |
openstackgerrit | liuming proposed openstack/nova master: Deletes evacuated instance files when source host is ok https://review.openstack.org/605987 | 01:56 |
*** fanzhang has joined #openstack-nova | 01:57 | |
*** alex_xu has joined #openstack-nova | 01:59 | |
*** cfriesen has quit IRC | 02:08 | |
openstackgerrit | Vu Cong Tuan proposed openstack/nova-specs master: Switch to stestr https://review.openstack.org/581284 | 02:10 |
*** slaweq has joined #openstack-nova | 02:11 | |
alex_xu | gmann: sorry, I can't join office hour today | 02:26 |
gmann | alex_xu: ok, i will skip for today then. thanks for informing. | 02:27 |
*** slaweq has quit IRC | 02:44 | |
*** psachin has joined #openstack-nova | 02:55 | |
*** sambetts|afk has quit IRC | 02:56 | |
*** sambetts_ has joined #openstack-nova | 03:00 | |
*** takashin has left #openstack-nova | 03:09 | |
*** slaweq has joined #openstack-nova | 03:11 | |
*** slaweq has quit IRC | 03:45 | |
*** udesale has joined #openstack-nova | 03:50 | |
*** udesale has quit IRC | 03:50 | |
*** udesale has joined #openstack-nova | 03:51 | |
openstackgerrit | Merged openstack/nova master: Fix up compute rpcapi version for pike release https://review.openstack.org/612231 | 04:00 |
*** Dinesh_Bhor has quit IRC | 04:06 | |
openstackgerrit | melanie witt proposed openstack/nova stable/rocky: Fix up compute rpcapi version for pike release https://review.openstack.org/612561 | 04:11 |
*** slaweq has joined #openstack-nova | 04:12 | |
*** janki has joined #openstack-nova | 04:23 | |
*** spsurya has joined #openstack-nova | 04:24 | |
*** hongbin has quit IRC | 04:25 | |
*** Dinesh_Bhor has joined #openstack-nova | 04:26 | |
*** tiendc has quit IRC | 04:26 | |
*** brinzhang has joined #openstack-nova | 04:29 | |
*** slaweq has quit IRC | 04:45 | |
*** ratailor has joined #openstack-nova | 04:57 | |
*** dave-mccowan has quit IRC | 04:57 | |
*** slaweq has joined #openstack-nova | 05:11 | |
*** jackyzhu007 has joined #openstack-nova | 05:15 | |
*** pvradu has joined #openstack-nova | 05:16 | |
*** jackyzhu007 has quit IRC | 05:17 | |
*** jackyzhu has quit IRC | 05:19 | |
*** phillu has joined #openstack-nova | 05:29 | |
*** bhagyashris has joined #openstack-nova | 05:29 | |
*** phillu has quit IRC | 05:43 | |
*** slaweq has quit IRC | 05:44 | |
*** tetsuro has quit IRC | 06:04 | |
*** Luzi has joined #openstack-nova | 06:04 | |
*** slaweq has joined #openstack-nova | 06:11 | |
*** pvc has quit IRC | 06:13 | |
*** phillu has joined #openstack-nova | 06:16 | |
*** artom has quit IRC | 06:18 | |
*** pvradu has quit IRC | 06:18 | |
openstackgerrit | Merged openstack/nova stable/rocky: Fix formatting non-templated cell URLs with no config https://review.openstack.org/611327 | 06:19 |
*** artom has joined #openstack-nova | 06:20 | |
*** tetsuro has joined #openstack-nova | 06:27 | |
*** raginbajin has quit IRC | 06:32 | |
*** raginbajin has joined #openstack-nova | 06:34 | |
*** pvradu has joined #openstack-nova | 06:38 | |
*** alex_xu has quit IRC | 06:40 | |
*** pvradu has quit IRC | 06:42 | |
*** pvradu has joined #openstack-nova | 06:43 | |
*** jangutter has quit IRC | 06:47 | |
*** jangutter has joined #openstack-nova | 06:47 | |
*** ccamacho has joined #openstack-nova | 06:47 | |
*** phillu has quit IRC | 06:47 | |
*** Dinesh_Bhor has quit IRC | 07:00 | |
*** rcernin has quit IRC | 07:03 | |
*** ivve has joined #openstack-nova | 07:03 | |
*** gibi_off is now known as gibi | 07:04 | |
*** artom has quit IRC | 07:05 | |
*** pcaruana has joined #openstack-nova | 07:05 | |
*** artom has joined #openstack-nova | 07:07 | |
*** Dinesh_Bhor has joined #openstack-nova | 07:09 | |
*** threestrands has quit IRC | 07:13 | |
*** ralonsoh has joined #openstack-nova | 07:19 | |
*** helenafm has joined #openstack-nova | 07:19 | |
*** cfriesen has joined #openstack-nova | 07:21 | |
*** _pewp_ has quit IRC | 07:30 | |
*** Cardoe has quit IRC | 07:31 | |
*** Cardoe has joined #openstack-nova | 07:31 | |
*** _pewp_ has joined #openstack-nova | 07:31 | |
openstackgerrit | Yongli He proposed openstack/nova-specs master: add spec "show-server-numa-topology" https://review.openstack.org/612256 | 07:33 |
*** cfriesen has quit IRC | 07:39 | |
*** lpetrut has joined #openstack-nova | 07:41 | |
*** adrianc has joined #openstack-nova | 07:43 | |
*** adrianc_ has joined #openstack-nova | 07:43 | |
*** pvc has joined #openstack-nova | 07:46 | |
pvc | hi anyone sean-k-mooney or bauzas? | 07:46 |
*** sahid has joined #openstack-nova | 07:47 | |
*** jpena|off is now known as jpena | 07:48 | |
pvc | i already launched an instance with vgpu on it | 07:50 |
pvc | but which driver should i use? | 07:50 |
pvc | to run a gpu application | 07:51 |
bauzas | good morning nova | 07:53 |
*** tetsuro has quit IRC | 07:56 | |
*** Dinesh_Bhor has quit IRC | 08:01 | |
*** pvradu_ has joined #openstack-nova | 08:08 | |
*** pvradu has quit IRC | 08:11 | |
bauzas | pvc: you should use the grid guest driver | 08:18 |
* bauzas apologies for yesterday, had a fucking power outage | 08:18 | |
bauzas | which was originally planned, but not lasting so long | 08:19 |
*** liuyulong|away has quit IRC | 08:20 | |
*** brinzhang has quit IRC | 08:24 | |
*** brinzhang has joined #openstack-nova | 08:24 | |
*** sayalilunkad has quit IRC | 08:33 | |
*** sayalilunkad has joined #openstack-nova | 08:33 | |
pvc | i've successfully run the nvidia-smi bauzas | 08:33 |
pvc | Product Name : GRID P100-2B4 | 08:34 |
bauzas | cool | 08:34 |
pvc | Virtualization mode : VGPU | 08:34 |
pvc | but i cannot test the tensorflow-gpu :( | 08:34 |
pvc | do you have any preferred docs for that? | 08:34 |
*** ttsiouts has joined #openstack-nova | 08:34 | |
*** moshele has joined #openstack-nova | 08:38 | |
*** derekh has joined #openstack-nova | 08:39 | |
*** pvc has quit IRC | 08:43 | |
*** cdent has joined #openstack-nova | 08:44 | |
openstackgerrit | Brin Zhang proposed openstack/nova-specs master: Support delete_on_termination in volume attach api https://review.openstack.org/612949 | 08:44 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Add regression test for bug 1550919 https://review.openstack.org/591733 | 08:46 |
openstack | bug 1550919 in OpenStack Compute (nova) "[Libvirt]Evacuate fail may cause disk image be deleted" [Medium,In progress] https://launchpad.net/bugs/1550919 - Assigned to Matthew Booth (mbooth-9) | 08:46 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Don't delete disks on shared storage during evacuate https://review.openstack.org/578846 | 08:46 |
*** pvc_ has joined #openstack-nova | 08:47 | |
pvc_ | are you running tensorflow on your instance bauzas | 08:47 |
*** priteau has joined #openstack-nova | 08:48 | |
*** vabada has quit IRC | 08:51 | |
*** vabada has joined #openstack-nova | 08:52 | |
*** phillu has joined #openstack-nova | 08:54 | |
openstackgerrit | Brin Zhang proposed openstack/nova-specs master: Support deleting data volume when destroy instance https://review.openstack.org/580336 | 08:54 |
*** ttsiouts has quit IRC | 08:55 | |
*** ttsiouts has joined #openstack-nova | 08:58 | |
*** Dinesh_Bhor has joined #openstack-nova | 08:58 | |
*** maciejjozefczyk has quit IRC | 09:02 | |
*** maciejjozefczyk has joined #openstack-nova | 09:02 | |
pvc_ | hi bauzas, are you using cuda? | 09:03 |
pvc_ | tensorflow.python.framework.errors_impl.InternalError: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error :( | 09:06 |
sean-k-mooney | pvc_: are you using one of the Q series mdev-types | 09:07 |
sean-k-mooney | https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#cuda-open-cl-support-vgpu | 09:07 |
*** sambetts_ is now known as sambetts|afk | 09:07 | |
pvc_ | NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] | 09:08 |
sean-k-mooney | for your gpu https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#vgpu-types-tesla-p100 | 09:09 |
sean-k-mooney | you need to find the mdev-type that corresponds to one of P100-1Q, P100-2Q, P100-4Q, P100-8Q or P100-16Q | 09:09 |
pvc_ | videocard of my instance: 00:05.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:15f8] (rev a1) | 09:10 |
sean-k-mooney | pvc_: yes, that is likely not going to change regardless of the mdev type you choose | 09:10 |
pvc_ | im using nvidia-211 | 09:12 |
sean-k-mooney | sure that does not help us map to the nvidia docs. you will need to look at the vendor data reported for the mdev type and see if you can find the vgpu type | 09:13 |
pvc_ | GRID P100-2B4 | 09:14 |
pvc_ | the name of nvidia-211 | 09:14 |
sean-k-mooney | right so that is a B series vgpu and does not support compute via opencl or cuda | 09:15 |
sean-k-mooney | pvc_: find the one corresponding to P100-2Q | 09:15 |
sean-k-mooney | that will be the most similar | 09:15 |
pvc_ | okay wait | 09:15 |
pvc_ | cat nvidia-84/name GRID P100-2Q | 09:16 |
sean-k-mooney | cool, so if you set that in the nova.conf and boot a new vm it should work with cuda | 09:17 |
sean-k-mooney | you should look at https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#vgpu-types-tesla-p100 and determine which one meets your needs | 09:17 |
pvc_ | wait | 09:17 |
pvc_ | may i know where i can find that information? | 09:17 |
pvc_ | thank you i'll try | 09:17 |
pvc_ | what docs can i read to determine where cuda will run? | 09:18 |
sean-k-mooney | all the vGPU types that end in Q support cuda according to https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#cuda-open-cl-support-vgpu. | 09:18 |
sean-k-mooney | its the same doc just later in section 1.6.2 | 09:19 |
pvc_ | wow that's so cool :) | 09:19 |
pvc_ | thank you so much | 09:19 |
pvc_ | i'll launch an instance again | 09:19 |
pvc_ | can i use P100-1Q sean-k-mooney? | 09:20 |
sean-k-mooney | so decoding this a bit more, it appears the number before the Q in their naming is the amount of ram, so the P100-2Q has 2GB of frame buffer and you can run 8 of them on the 16GB p100 | 09:20 |
sean-k-mooney | yes | 09:20 |
sean-k-mooney | that will simply have 1GB of vRAM allocated to its framebuffer instead | 09:21 |
sean-k-mooney | that will allow you to run 16 vms on each physical p100 | 09:21 |
pvc_ | 2Q is better right | 09:21 |
pvc_ | so i can launch 8 instances with 2Q | 09:21 |
sean-k-mooney | yep | 09:21 |
pvc_ | i have 2 tesla p100 | 09:21 |
sean-k-mooney | 8 with 2Q 4 with 4Q 2 with 8Q and 1 with 16Q | 09:22 |
pvc_ | http://paste.openstack.org/show/732950/ | 09:22 |
sean-k-mooney | so it's a trade-off between the number of vms you can run and the performance that each vm will have | 09:22 |
sean-k-mooney | pvc_: what software license do you have by the way. | 09:25 |
sean-k-mooney | you will need the Quadro vDWS license to be able to use the higher performance Q series vgpu types | 09:25 |
pvc_ | i didnt use any license for now | 09:26 |
pvc_ | do i need to install it | 09:26 |
pvc_ | but i think i have the license | 09:27 |
sean-k-mooney | according to https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#licensing-grid-vgpu cuda will be disabled until you add the license key | 09:27 |
pvc_ | sean-k-mooney http://paste.openstack.org/show/732952/ | 09:28 |
pvc_ | licensing on compute node or on the instance? | 09:29 |
sean-k-mooney | you have to install the licensing server somewhere on your network and then i believe you need to point your instance at the licensing server | 09:29 |
sean-k-mooney | its all covered in https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#licensing-grid-vgpu | 09:30 |
pvc_ | can i just install in on my compute node? | 09:31 |
pvc_ | is that possible | 09:31 |
sean-k-mooney | pvc_: no not as far as i can tell | 09:31 |
sean-k-mooney | what you can do is configure the address of the licensing server once, then snapshot the vm and use the new image as your base image in glance for tenants | 09:32 |
pvc_ | wait i just install the licensing server | 09:33 |
*** k_mouza has joined #openstack-nova | 09:33 | |
*** dtantsur|afk is now known as dtantsur | 09:34 | |
bauzas | sean-k-mooney: pvc_: sorry was on meeting | 09:37 |
sean-k-mooney | pvc_: if you are using linux guest looks like you can also just add a config file and enable the nvidia-gridd service https://docs.nvidia.com/grid/latest/grid-licensing-user-guide/index.html#licensing-grid-software-linux-config-file | 09:37 |
sean-k-mooney | bauzas: no worries | 09:37 |
bauzas | sean-k-mooney: thanks for helping pvc_, all of what you said is valid :) | 09:37 |
bauzas | nvidia requires a specific license for CUDA, and implies a GPU profile | 09:37 |
pvc_ | i'm using the 90-day trial, may i know if it has a free license? | 09:39 |
sean-k-mooney | pvc_: the trial should work for now but you will need a Quadro vDWS license | 09:40 |
sean-k-mooney | pvc_: the pricing is covered here https://images.nvidia.com/content/grid/pdf/161207-GRID-Packaging-and-Licensing-Guide.pdf | 09:40 |
pvc_ | can i continue without license? | 09:41 |
pvc_ | or do i need to buy | 09:41 |
sean-k-mooney | ouch $450 per concurrent user for perpetual license | 09:41 |
sean-k-mooney | pvc_: without cuda support and with a frame rate capped at 3 frames per second, yes, but realistically no, you need a license | 09:42 |
*** bhagyashris has quit IRC | 09:43 | |
* bauzas makes no comment | 09:43 | |
pvc_ | as per nvidia support yes, The evaluation license provides access to 128 CCU's of NVIDIA Quadro Virtual Data Center Workstation (Quadro vDWS) edition for up to 90 days. | 09:47 |
bauzas | pvc_: just to make it clear, the client licensing is in https://docs.nvidia.com/grid/6.0/grid-licensing-user-guide/index.html | 09:49 |
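
The selection rule worked out above (only the Q-series types support CUDA, and the number before the Q is the frame buffer size in GB) can be sketched in Python. This is a minimal sketch and not part of nova: the function names are hypothetical, the name pattern is inferred only from the two names quoted in this log (GRID P100-2B4 and GRID P100-2Q), and the directory argument is assumed to be an mdev_supported_types directory like the one pvc_ ran `cat nvidia-84/name` in.

```python
import re
from pathlib import Path

# Matches names such as "GRID P100-2Q" or "GRID P100-2B4":
# board, frame buffer size in GB, series letter, optional numeric suffix.
# The pattern is an assumption based on the names seen in this log.
_NAME_RE = re.compile(r"^GRID (?P<board>\w+)-(?P<fb_gb>\d+)(?P<series>[A-Z])\d*$")


def parse_vgpu_name(name):
    """Split a vGPU type name into (board, framebuffer_gb, series)."""
    match = _NAME_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognised vGPU type name: {name!r}")
    return match["board"], int(match["fb_gb"]), match["series"]


def cuda_capable_types(supported_types_dir, board_memory_gb=16):
    """Map each Q-series mdev type to how many instances fit on one board.

    supported_types_dir is a directory whose children (for example
    nvidia-84) each contain a "name" file. board_memory_gb defaults to
    the 16GB P100 discussed above.
    """
    result = {}
    for name_file in sorted(Path(supported_types_dir).glob("*/name")):
        try:
            _board, fb_gb, series = parse_vgpu_name(name_file.read_text())
        except ValueError:
            continue  # skip types whose names do not follow the pattern
        if series == "Q":
            result[name_file.parent.name] = board_memory_gb // fb_gb
    return result
```

For the two types named in this log, this would keep nvidia-84 (P100-2Q, 8 per board) and drop nvidia-211 (P100-2B4). The chosen type is what goes into nova.conf; in this era of nova that is, as far as I know, the enabled_vgpu_types option in the [devices] section.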
*** panda has quit IRC | 09:54 | |
*** panda has joined #openstack-nova | 09:55 | |
openstackgerrit | Ivaylo Mitev proposed openstack/nova master: VMware: VIF info and utils for image as template https://review.openstack.org/612974 | 09:58 |
openstackgerrit | Ivaylo Mitev proposed openstack/nova master: VMware: Inventory path utils for image as template https://review.openstack.org/612976 | 09:59 |
*** k_mouza has quit IRC | 10:00 | |
*** k_mouza has joined #openstack-nova | 10:01 | |
*** aloga has quit IRC | 10:08 | |
*** k_mouza has quit IRC | 10:09 | |
*** sahid has quit IRC | 10:12 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Fail to live migration if instance has a NUMA topology https://review.openstack.org/611088 | 10:15 |
openstackgerrit | Ivaylo Mitev proposed openstack/nova master: VMware: OVA and StrOpt images as VM templates https://review.openstack.org/609736 | 10:19 |
*** k_mouza has joined #openstack-nova | 10:20 | |
BlackDex | Hello there. I'm seeing some invalid (RFC Validation) json blob data in the instance_extra table of nova. It has nested json data and also a key is not quoted. | 10:22 |
BlackDex | is this by design? | 10:22 |
*** phillu has quit IRC | 10:24 | |
*** pvc_ has quit IRC | 10:26 | |
BlackDex | well, not nested, but multiple root elements actually | 10:26 |
*** alexchadin has joined #openstack-nova | 10:28 | |
*** phillu has joined #openstack-nova | 10:30 | |
*** pvradu_ has quit IRC | 10:38 | |
*** pvradu has joined #openstack-nova | 10:38 | |
*** tbachman has quit IRC | 10:40 | |
openstackgerrit | Fan Zhang proposed openstack/nova master: Retry after hitting libvirt error VIR_ERR_OPERATION_INVALID in live migration. https://review.openstack.org/612272 | 10:53 |
*** phillu has quit IRC | 10:57 | |
*** udesale has quit IRC | 11:01 | |
*** ttsiouts has quit IRC | 11:06 | |
*** adrianc_ has quit IRC | 11:07 | |
*** adrianc has quit IRC | 11:07 | |
*** erlon has joined #openstack-nova | 11:09 | |
*** adrianc_ has joined #openstack-nova | 11:10 | |
*** adrianc has joined #openstack-nova | 11:10 | |
*** ratailor has quit IRC | 11:10 | |
*** fghaas has joined #openstack-nova | 11:11 | |
*** k_mouza has quit IRC | 11:13 | |
*** dave-mccowan has joined #openstack-nova | 11:14 | |
*** Dinesh_Bhor has quit IRC | 11:15 | |
*** moshele has quit IRC | 11:16 | |
*** ttsiouts has joined #openstack-nova | 11:19 | |
*** k_mouza has joined #openstack-nova | 11:22 | |
*** adrianc_ has quit IRC | 11:29 | |
*** adrianc has quit IRC | 11:29 | |
*** Dinesh_Bhor has joined #openstack-nova | 11:33 | |
*** litao has quit IRC | 11:34 | |
*** jpena is now known as jpena|lunch | 11:36 | |
*** Dinesh_Bhor has quit IRC | 11:45 | |
*** moshele has joined #openstack-nova | 11:56 | |
*** adrianc has joined #openstack-nova | 11:58 | |
jangutter | sean-k-mooney: in os-vif (neutron api v3), what's going to be the names of the two sides passing objects? The docs refer to 'provider host' and 'networking host'? (respectively on compute and controller, I presume). | 11:59 |
*** pvradu_ has joined #openstack-nova | 12:04 | |
sean-k-mooney | am which doc? i have not worked on the spec for this in like 18 months | 12:06 |
sean-k-mooney | the two entities are the neutron api (specifically the ml2 drivers which handle the binding request) and the nova compute agent | 12:07 |
*** pvradu has quit IRC | 12:08 | |
sean-k-mooney | the intent was that the nova compute agent would pass a filtered host info object (in a serialised form) to neutron as part of port binding and the neutron ml2 driver would bind the port and respond with a serialised os-vif vif object | 12:09 |
jangutter | sean-k-mooney: I was looking at the vestigial docs in os-vif itself. and I understood it as you describe it too. | 12:10 |
sean-k-mooney | oh ok cool | 12:10 |
sean-k-mooney | i had previously debated creating a request and response object pair in os-vif also | 12:11 |
*** aloga has joined #openstack-nova | 12:11 | |
sean-k-mooney | i think the details are something we can iterate on when we actually move forward on this. | 12:12 |
jangutter | sean-k-mooney: yep, just wanted to kinda capture the proposed intent and sequence a bit clearer. | 12:15 |
sean-k-mooney | cool | 12:16 |
*** udesale has joined #openstack-nova | 12:17 | |
sean-k-mooney | anything else you wanted to know on that topic or did i cover it above? | 12:17 |
jangutter | sean-k-mooney: I think it's sufficient, I'm not implementing the entire sequence here, basically just roughly sketching out things like sequence, the direction of filtering, and entities in the system. | 12:18 |
sean-k-mooney | cool, so you're pulling together all the required reading to write a spec to address it :) | 12:19 |
jangutter | sean-k-mooney: heh, writing a paragraph in the doc explaining: hey, this is a stub, this is why it's a stub, and it might look like this after it's unstubbed. | 12:21 |
*** k_mouza has quit IRC | 12:23 | |
*** udesale has quit IRC | 12:25 | |
*** moshele has quit IRC | 12:26 | |
openstackgerrit | Daniel Abad proposed openstack/nova master: Fix ironic client ironic_url deprecation warning https://review.openstack.org/611872 | 12:26 |
*** brinzhang has quit IRC | 12:27 | |
*** moshele has joined #openstack-nova | 12:27 | |
openstackgerrit | Daniel Abad proposed openstack/nova master: Fix ironic client ironic_url deprecation warning https://review.openstack.org/611872 | 12:28 |
*** tbachman has joined #openstack-nova | 12:29 | |
*** mvkr has quit IRC | 12:29 | |
*** k_mouza has joined #openstack-nova | 12:30 | |
*** jangutter has quit IRC | 12:36 | |
*** jpena|lunch is now known as jpena | 12:37 | |
openstackgerrit | Gaudenz Steinlin proposed openstack/nova master: Ignore misleading resource updates from virt driver https://review.openstack.org/523006 | 12:39 |
*** jangutter has joined #openstack-nova | 12:39 | |
jangutter | sean-k-mooney: one more clarification, when you say "filtered host-info", what's the filter that's applied? Is it just filtering based on the plugin info gathered on the compute node, filtering out out-of-range plugins? | 12:44 |
sean-k-mooney | basically the idea was that nova would prefilter the host info object on 2 factors | 12:45 |
sean-k-mooney | first, what plugins/vif types were supported by the hypervisor on that node | 12:46 |
sean-k-mooney | and second, any aspect of the guest that would restrict what vif types could be used. | 12:46 |
jangutter | sean-k-mooney: I see, only info locally available at the compute node at the time, analogous to "capabilities". | 12:47 |
sean-k-mooney | e.g. vhost-user requires hugepages to work, so if the flavor did not have a hugepage request it would be removed from the list | 12:47 |
sean-k-mooney | yep | 12:47 |
* sean-k-mooney technically hugepages aren't the requirement but close enough | 12:48 | |
sean-k-mooney | the main motivation is to enable nova to say: based on my knowledge of the hypervisor and the instance request (flavor and image metadata), this is the set of vif types i could support | 12:49 |
sean-k-mooney | and then neutron can say: ok, from that set i can support Y, and select an optimal vif type to use | 12:50 |
stephenfin | np | 12:50 |
sean-k-mooney | once we have that capability we can potentially schedule on that in the future too | 12:51 |
jangutter | sean-k-mooney: right, makes sense and saves a round-trip with mis-scheduling or a failed portbinding. | 12:52 |
sean-k-mooney | yep or worse in the vhost user case where the vm boots with no error and no networking | 12:52 |
* sean-k-mooney part of me likes the symmetry of how terrible that user experience is for both tenants and operators | 12:53 | |
jangutter | sean-k-mooney: even the qemu error in libvirt is tricky to trace and very misleading in that case. | 12:56 |
sean-k-mooney | jangutter: qemu does not provide an error in that case at least it did not in the past | 12:56 |
sean-k-mooney | the only error i have ever seen for this is a debug only error in dpdk logs related to not being able to map the memory | 12:57 |
sean-k-mooney | but in any case it would allow us on both the nova and neutron side to filter down to only the interfaces we think should work instead of relying on the operator/deployer to get this right | 12:58 |
*** eharney has joined #openstack-nova | 12:58 | |
jangutter | sean-k-mooney: I remember finding something in the qemu logs about "falling back on userspace virtio" when that happened. No hard error. | 12:59 |
sean-k-mooney | really well the fallback does not work so i guess its nice they tried but ya i think we have all been bit by that at some point if we have used vhost-user | 13:00 |
jangutter | sean-k-mooney: one day, there'll be unscarred users, developers and operators. | 13:02 |
*** dave-mccowan has quit IRC | 13:02 | |
*** dave-mccowan has joined #openstack-nova | 13:04 | |
*** bnemec has joined #openstack-nova | 13:04 | |
*** dtantsur has quit IRC | 13:04 | |
sean-k-mooney | jangutter: you mean when AIs ban humans from coding and do it all themselves? i totally agree | 13:04 |
sean-k-mooney | that or when they kill all the users ... | 13:05 |
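
The two-stage negotiation described above (nova prefilters the VIF types it could support, neutron picks one from that set) could look roughly like the Python below. It is a sketch under stated assumptions and not the os-vif or nova API: both function names, the plain-list and dict inputs, and the single hugepage rule are hypothetical, and, as noted above, hugepages are only an approximation of the real vhost-user requirement.

```python
def prefilter_vif_types(hypervisor_vif_types, flavor_extra_specs):
    """Return the VIF types nova could offer neutron for this instance.

    Both arguments and the hugepage rule are illustrative only.
    """
    # Guest-side constraints: each VIF type maps to a predicate on the flavor.
    guest_requirements = {
        # Approximation: treat an explicit page size request as "hugepages".
        "vhostuser": lambda specs: "hw:mem_page_size" in specs,
    }
    candidates = []
    for vif_type in hypervisor_vif_types:
        requirement = guest_requirements.get(vif_type)
        if requirement is None or requirement(flavor_extra_specs):
            candidates.append(vif_type)
    return candidates


def select_vif_type(candidates, neutron_preference):
    """Neutron side: pick the first preferred type that nova offered."""
    for vif_type in neutron_preference:
        if vif_type in candidates:
            return vif_type
    return None  # no common type: fail the binding rather than the boot
```

With a flavor that makes no hugepage request, vhostuser drops out of the candidate list before neutron ever sees it, which is the point of the prefilter: the mismatch surfaces at port binding instead of as a VM that boots with no networking.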
*** mchlumsky has joined #openstack-nova | 13:07 | |
*** dtantsur has joined #openstack-nova | 13:09 | |
*** beagles is now known as beagles_mtg | 13:12 | |
*** mvkr has joined #openstack-nova | 13:17 | |
openstackgerrit | sean mooney proposed openstack/nova master: harden placement init under wsgi https://review.openstack.org/610034 | 13:18 |
*** udesale has joined #openstack-nova | 13:22 | |
*** cdent has quit IRC | 13:22 | |
*** cdent has joined #openstack-nova | 13:30 | |
*** tbachman has quit IRC | 13:32 | |
*** adrianc has quit IRC | 13:34 | |
*** adrianc has joined #openstack-nova | 13:34 | |
*** mdbooth_ has joined #openstack-nova | 13:35 | |
*** mdbooth_ is now known as mdbooth | 13:35 | |
sean-k-mooney | zzzeek: mdbooth cdent so regarding https://review.openstack.org/#/c/610034/4/nova/api/openstack/placement/db_api.py i think we have 3 paths forward | 13:36 |
mdbooth | sean-k-mooney zzzeek: Continuing our previously downstream discussion of https://review.openstack.org/#/c/610034/ | 13:36 |
sean-k-mooney | one: agree my code is awesome and merge it. 2: add a flag on the consumer's side to track if we have configured it already. or 3: extend oslo.db to allow reconfiguring a transaction_context that has been started | 13:37 |
mdbooth | sean-k-mooney: zzzeek to confirm, but I suspect 3 is not a thing | 13:37 |
mdbooth | I was going to propose a 2 phase approach: | 13:37 |
mdbooth | 1. Set a flag in the module on configuration, assert configuration only happens once, emit an unconditional warning on reconfiguration that reconfiguration did not happen. | 13:38 |
cdent | can someone explain what's wrong with sean's option 'one'? | 13:39 |
mdbooth | 2. Update all decorators which currently close over placement_context_manager to call get_placement_context_manager(), and go with the original plan of creating a new one on reconfiguration. | 13:39 |
mdbooth | cdent: The TypeError will bite us when there's a bug in oslo.db, or it's changed inadvertently as code is moved around. It's not a deliberate API. | 13:40 |
mdbooth | And the configure flag is trivial to implement and better. | 13:40 |
cdent | do we have to do step 2? that's idiomatic throughout all of placement and nova | 13:40 |
cdent | it is perhaps messy, but a considerable change | 13:41 |
mdbooth | cdent: We don't have to do step 2, but it's the only way I can think of that we'll get reconfiguration across a restart. | 13:41 |
mdbooth | Agree, hence the existence of step 1. | 13:41 |
sean-k-mooney | cdent: well we could still keep the decorator, we just need to have a different one that does the dispatch to the current instance of the global rather than the one that was bound on import | 13:42 |
cdent | is "reconfiguration across a restart[1]" required? | 13:42 |
cdent | [1] I think maybe you mean reload in apache terms? | 13:42 |
mdbooth | cdent: Yeah. I understood it was required, but I'm prepared to hear it's not. | 13:43 |
sean-k-mooney | cdent: ya i was thinking that an operator may have changed the config at some point and when there was a failure it could pick up those changes | 13:43 |
sean-k-mooney | cdent: i'm not sure it's required but it was a question i had | 13:43 |
mdbooth | If it's not required, there's no reason for such a noisy change. | 13:43 |
mdbooth | I think the warning makes sense, though. | 13:43 |
sean-k-mooney | e.g. is skipping reconfiguration always valid | 13:44 |
sean-k-mooney | mdbooth: ya i had a debug level log on the placement version of the change | 13:44 |
sean-k-mooney | nova did not have the logger so i left it out of the nova version | 13:44 |
cdent | has anyone checked to see if mod-wsgi's behavior can be changed to be more like uwsgi's? | 13:45 |
*** k_mouza has quit IRC | 13:46 | |
sean-k-mooney | no, but i had considered seeing if we could change the kolla images to use uwsgi also, but no time | 13:46 |
sean-k-mooney | i understand they did this for performance reasons in mod-wsgi to have quicker reloads but it seems wrong to me | 13:47 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Fix min config value for shutdown_timeout option https://review.openstack.org/613028 | 13:47 |
*** k_mouza has joined #openstack-nova | 13:47 | |
*** moshele has quit IRC | 13:47 | |
*** whoami-rajat has quit IRC | 13:48 | |
sean-k-mooney | this seems to be the only related thing in their configs https://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIScriptReloading.html | 13:48 |
mdbooth | cdent: This could easily be mechanical, btw | 13:52 |
mdbooth | cdent: I'm going to chuck up a quick POC/strawman | 13:52 |
cdent | mdbooth: I'm not directly opposed to changing it, I'm just much too familiar with being able to change things | 13:53 |
cdent | being difficult | 13:53 |
mdbooth | cdent: Hehe, we're on the same page :) | 13:53 |
cdent | here we go: https://modwsgi.readthedocs.io/en/develop/user-guides/application-issues.html#reloading-python-interpreters | 13:53 |
mdbooth | cdent: This is why, downstream, I prioritised the deployment framework workaround for the same issue :) | 13:53 |
cdent | looks like that reloading solution has some issues with c extensions | 13:55 |
cdent | sean-k-mooney: did you see the link ^ | 13:56 |
sean-k-mooney | yes, reading. the interpreter reload option was removed in version 2.0 so we can't use it anyway | 13:57 |
cdent | "As an alternative, daemon mode of mod_wsgi should be used and the “Process” reload mechanism added with mod_wsgi 2.0."? | 13:58 |
mdbooth | cdent sean-k-mooney: I just threw something together. Running a smoke test before sharing. | 13:58 |
cdent | I continue to think that we're trying to fix a problem in placement when it should be fix in how mod-wsgi is being managed | 13:59 |
sean-k-mooney | mdbooth: ok, but the decorator change should be really trivial, i know it will work, but the original question is whether it is needed | 13:59 |
*** janki has quit IRC | 14:00 | |
sean-k-mooney | cdent: well it's more a question of what is the abstract machine we expect to execute the code in | 14:00 |
*** liuyulong has joined #openstack-nova | 14:00 | |
cdent | sean-k-mooney: sure, and WSGI, as a protocol, has some pretty simple guidelines | 14:00 |
cdent | it is, by design, super fast to start up a new process with the application in it | 14:01 |
mdbooth | The little I read didn't seem to define whether you get a new python vm or not. | 14:01 |
cdent | because it just assumes you do | 14:01 |
mdbooth | assume != define | 14:01 |
cdent | that's kind of part of the WSGI-nature | 14:01 |
cdent | sure, but you're asking placement to take on additional complexity in the wrong place | 14:02 |
mdbooth | Unless we're saying that mod_wsgi is architecturally broken and shouldn't be used by anyone? | 14:02 |
cdent | WSGI exists so that http server-related concerns can exist outside the wsgi application code | 14:02 |
mdbooth | That could be true, I wouldn't know | 14:02 |
cdent | I do say that, these days :) | 14:02 |
cdent | and so do many other people | 14:03 |
mdbooth | cdent: Are those the same folks using undefined behaviour? :P | 14:03 |
cdent | they are people who don't want the wsgi server having any impact on the wsgi application, which mod-wsgi always has | 14:03 |
cdent | but for a long time it was the only performant choice | 14:04 |
sean-k-mooney | mdbooth: how mod_wsgi works here i would say falls into the same camp as undefined behavior in c/c++ | 14:04 |
*** moshele has joined #openstack-nova | 14:05 | |
cdent | I'm happy to consider changes in placement for all of this stuff, but it would make me much happier if there was also visible effort to inquire with Graham about whether there are ways to achieve the same thing in mod-wsgi | 14:05 |
cdent | we seem to be trying to take the local path, when a more global path _might_ be the right thing | 14:05 |
sean-k-mooney | cdent: well long term i think reloading a script by spawning an entirely separate process with its own python interpreter would be the correct thing to do in mod_wsgi | 14:07 |
*** udesale has quit IRC | 14:07 | |
sean-k-mooney | that said tripleo also should not be starting the placement api container while upgrading the database | 14:07 |
*** awaugama has joined #openstack-nova | 14:08 | |
mdbooth | sean-k-mooney: Yep, of course. But that's not the only reason why it might fail on startup. This is still an issue. | 14:08 |
sean-k-mooney | short term we probably need a local solution | 14:08 |
mdbooth | Super short term is the tripleo fix. | 14:08 |
*** tbachman has joined #openstack-nova | 14:08 | |
sean-k-mooney | mdbooth: well the tripleo fix would have been correct in any case | 14:09 |
cdent | I don't feel like I have all the info | 14:09 |
mdbooth | sean-k-mooney: ack | 14:09 |
*** tbachman has quit IRC | 14:09 | |
mdbooth | cdent: We have a downstream failure because we're starting placement while still running db_sync on its db | 14:09 |
cdent | but I also don't feel like we (the three of us) have all the info about how mod-wsgi works | 14:09 |
mdbooth | When we restart it, we get a failure because placement_context_manager is already configured. | 14:10 |
mdbooth | So obviously we shouldn't be doing that, but it highlights that *any* restart of placement like this will fail for the same reason. | 14:11 |
cdent | what kind of 'restart' is being done? | 14:11 |
sean-k-mooney | mdbooth: well we are not exactly restarting it. the application is crashing somehow and reloading | 14:11 |
mdbooth | cdent: We're not restarting apache. I did ask about that. | 14:11 |
sean-k-mooney | mdbooth: if we were to restart the container then we definetly would have had a clean env | 14:11 |
mdbooth | sean-k-mooney: ack | 14:12 |
cdent | another option is to try touching the nova-placement-api or placement-api file | 14:12 |
openstackgerrit | Gaudenz Steinlin proposed openstack/nova master: Extend volume for libvirt network volumes (RBD) https://review.openstack.org/613039 | 14:12 |
cdent | that _may_ cause the daemon process (if that is what you are using) to clean itself up | 14:12 |
cdent | that's the "normal" way to do code reload and process reload with daemon process based mod-wsgi | 14:12 |
sean-k-mooney | cdent: updating the time stamp on an "immutable" container feels kind of hacky | 14:13 |
cdent | long term: use uwsgi in the container, and have a FEP in some other container | 14:13 |
cdent | sean-k-mooney: i agree, in that case, just start the container back up | 14:13 |
cdent | the rules about how containers operate seems to be being selected sort of randomly | 14:14 |
cdent | in normal container life: if it doesn't work, you kill it and try again | 14:14 |
*** pvradu_ has quit IRC | 14:14 | |
sean-k-mooney | so part of the issue is i think httpd does not exit but the application cannot reload properly so the container wont restart | 14:14 |
cdent | and you don't _ever_ run something as heavy as apache2 in a container | 14:14 |
* cdent needs to join the kolla project | 14:14 | |
sean-k-mooney | e.g. if the whole thing exploded when we hit the unrecoverable error then docker would just restart the container and we would be fine | 14:14 |
*** pvradu has joined #openstack-nova | 14:14 | |
* cdent nods | 14:14 | |
cdent | you might be able to achieve that by not using daemon mode with mod-wsgi | 14:15 |
cdent | but uwsgi would be easier :) ;) | 14:15 |
sean-k-mooney | this is happening on Rocky by the way so in osp13 for us downstream | 14:15 |
cdent | for future reference sean-k-mooney, have you looked at the way placedock works? https://github.com/cdent/placedock | 14:16 |
sean-k-mooney | for stein + we could look at swapping to uwsgi in kolla i guess as an additional mitigation | 14:16 |
sean-k-mooney | cdent: i have ran it once | 14:16 |
sean-k-mooney | i was trying to figure out whether i could use it with the osc-placement functional tests without running devstack | 14:17 |
cdent | the set up there is designed to make it easy to have some other thing in the front (like a load balancer or k8s ingress thing) | 14:17 |
cdent | if the application fails to start, it quits | 14:17 |
sean-k-mooney | cdent: so in the kolla world we are running placement under mod_wsgi then putting haproxy in front of it ... | 14:17 |
cdent | very wasteful | 14:18 |
sean-k-mooney | yep but it "works" going to uwsgi would be a good thing in general for kolla i think | 14:19 |
*** ttsiouts has quit IRC | 14:19 | |
sean-k-mooney | but back to the short term fix e.g. by end of this week/day | 14:19 |
sean-k-mooney | cdent: mdbooth are we going with the flag, exception catching or decorator change | 14:19 |
cdent | sean-k-mooney: i think we're waiting to see what mdbooth's change looks like? | 14:20 |
mdbooth | cdent: I think regardless of my change, we want to go for the flag in the first instance | 14:20 |
openstackgerrit | Sylvain Bauza proposed openstack/nova-specs master: Proposes NUMA topology with RPs https://review.openstack.org/552924 | 14:20 |
mdbooth | Just because it's simple and an obvious improvement | 14:20 |
sean-k-mooney | mdbooth: ok ill add a flag instead of catching the exception | 14:21 |
mdbooth | Then we can consider the finer points of wsgi, and whether a refactor is worth it later | 14:21 |
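The flag approach agreed on above can be sketched roughly as follows. This is a minimal illustration, assuming a context manager whose `configure()` refuses to run twice; `FakeContextManager`, `configure_once`, `_CONFIGURED` and `_CONFIGURE_LOCK` are hypothetical names, not the nova or oslo.db code.

```python
import threading


class FakeContextManager:
    """Hypothetical stand-in for an object that may only be configured once."""

    def __init__(self):
        self.options = None

    def configure(self, **options):
        if self.options is not None:
            raise TypeError("already configured")
        self.options = options


placement_context_manager = FakeContextManager()
_CONFIGURED = False  # module-level flag, survives a script reload in the same interpreter
_CONFIGURE_LOCK = threading.Lock()


def configure_once(**options):
    """Configure the context manager unless an earlier load already did so."""
    global _CONFIGURED
    with _CONFIGURE_LOCK:
        if _CONFIGURED:
            return False  # second load in the same interpreter: skip
        placement_context_manager.configure(**options)
        _CONFIGURED = True
        return True
```

The appeal of this shape is that a second initialisation in the same Python interpreter becomes a no-op instead of an error, without touching library code.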
*** ttsiouts has joined #openstack-nova | 14:21 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Consider allocations invovling child providers during allocation cleanup https://review.openstack.org/606050 | 14:21 |
dansmith | jaypipes: could you look at this for me? It's been a while since I wrote it and my context is fading, so I'd like to get it reviewed: https://review.openstack.org/#/c/611665 | 14:23 |
*** moshele has quit IRC | 14:24 | |
*** k_mouza has quit IRC | 14:24 | |
cdent | sean-k-mooney, mdbooth looks like zzzeek just did : https://review.openstack.org/#/c/613040/ | 14:25 |
sean-k-mooney | cdent: that is in oslo db. | 14:26 |
sean-k-mooney | the intent was to add the flag to the callee code not the lib code | 14:26 |
cdent | this allows the callee to check for already started before configuring | 14:27 |
sean-k-mooney | that said i can use that but then its not easily backportable | 14:27 |
cdent | right, I'm not suggesting you use it _now_ | 14:27 |
cdent | just that it is available in the future | 14:27 |
cdent | and the change of exception is handy | 14:28 |
* cdent goes to do something not on the computer for a while | 14:30 | |
sean-k-mooney | actually i can use hasattr to see if it exists so i can conditionally use it. ill submit a patch soon | 14:31 |
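The `hasattr` idea can be sketched as below: use the library's own state when it is exposed, and fall back to local bookkeeping otherwise. All names here are assumptions for illustration; `is_started` in particular is a hypothetical attribute, and what the referenced oslo.db change actually exposes is not shown in this log.

```python
class OldManager:
    """Hypothetical older library object with no way to query its state."""

    def __init__(self):
        self.options = None

    def configure(self, **options):
        if self.options is not None:
            raise TypeError("already configured")
        self.options = options


class NewManager(OldManager):
    """Hypothetical newer object that exposes an `is_started` attribute."""

    @property
    def is_started(self):
        return self.options is not None


_locally_configured = set()  # fallback bookkeeping for old library versions


def configure_if_needed(manager, **options):
    """Configure `manager` once, preferring the library's own state if present."""
    if hasattr(manager, "is_started"):
        already = manager.is_started
    else:
        already = id(manager) in _locally_configured
    if already:
        return False
    manager.configure(**options)
    _locally_configured.add(id(manager))
    return True
```

Because the check degrades gracefully, the same caller code would work against both library versions, which is what makes it easier to backport.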
*** k_mouza has joined #openstack-nova | 14:31 | |
sean-k-mooney | cdent: enjoy your non compute thing :) | 14:31 |
*** beagles_mtg is now known as beagles_food | 14:33 | |
jaypipes | dansmith: done | 14:38 |
dansmith | jaypipes: ah thanks, will fix those typos | 14:39 |
*** cdent has quit IRC | 14:41 | |
openstackgerrit | Dan Smith proposed openstack/nova master: Make CellDatabases fixture reentrant https://review.openstack.org/611665 | 14:42 |
openstackgerrit | Dan Smith proposed openstack/nova master: Modify get_by_cell_and_project() to get_not_deleted_by_cell_and_projects() https://review.openstack.org/607663 | 14:42 |
openstackgerrit | Dan Smith proposed openstack/nova master: Minimal construct plumbing for nova list when a cell is down https://review.openstack.org/567785 | 14:42 |
openstackgerrit | Dan Smith proposed openstack/nova master: Refactor scatter-gather utility to return exception objects https://review.openstack.org/607934 | 14:42 |
openstackgerrit | Dan Smith proposed openstack/nova master: Return a minimal construct for nova show when a cell is down https://review.openstack.org/591658 | 14:42 |
openstackgerrit | Dan Smith proposed openstack/nova master: Return a minimal construct for nova service-list when a cell is down https://review.openstack.org/584829 | 14:42 |
openstackgerrit | Matt Riedemann proposed openstack/python-novaclient stable/rocky: Fix up userdata argument to rebuild. https://review.openstack.org/613057 | 14:42 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Allow placement_context_manager to be replaced on reconfiguration https://review.openstack.org/613058 | 14:46 |
mdbooth | sean-k-mooney: ^^^ | 14:46 |
mdbooth | sean-k-mooney: Not quite as clean as I'd hoped because python syntax doesn't allow @db_api.placement_context_manager().writer | 14:46 |
efried | mdbooth: Not having looked at the patch at all, why do you need () ? | 14:48 |
efried | oh, I think I get it. | 14:48 |
openstackgerrit | Matt Riedemann proposed openstack/python-novaclient master: Deprecate the unused instance-name https://review.openstack.org/602520 | 14:49 |
bauzas | efried: the placement modeling for https://review.openstack.org/#/c/602474/2/specs/stein/approved/vgpu-stein.rst@103 is already made by the reshaper change https://review.openstack.org/#/c/599208/ | 14:50 |
efried | bauzas: I thought that might be the case. | 14:51 |
bauzas | melwitt: once you're up, not sure I understand your concern about upgrade on https://review.openstack.org/#/c/602474 since I already commented this in the upgrade section | 14:51 |
*** mlavalle has joined #openstack-nova | 14:51 | |
sean-k-mooney | mdbooth: i was assuming you would have made it @db_api.writer but ya ill take a look when my browser stops crashing from the giant log i tried to open | 14:51 |
bauzas | efried: I just wanted to keep minimalistic changes to the already approved spec | 14:52 |
efried | bauzas: Is that described in the reshaper spec? | 14:52 |
bauzas | efried: no, that's direct code | 14:52 |
efried | bauzas: I think I'm trying to say it should be described in *some* spec *somewhere*. | 14:52 |
bauzas | efried: I could amend https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/add-support-for-vgpu.html if you wish | 14:52 |
efried | I don't disagree we should minimize changes to a spec reapproval in theory, but this seems like something worth including. | 14:52 |
efried | bauzas: That would be okay too. | 14:52 |
bauzas | what I really want is a possible quick approval | 14:53 |
efried | bauzas: swhy I didn't downvote :) | 14:53 |
sean-k-mooney | mdbooth: that will still not rebind the context on reconfiguration | 14:53 |
bauzas | and then, if left comments, a possible follow-up | 14:53 |
efried | sure | 14:53 |
bauzas | efried: or I could amend https://review.openstack.org/#/c/602474 in a follow-up if you prefer | 14:54 |
sean-k-mooney | mdbooth: you will need to do something like this https://stackoverflow.com/a/33507308 | 14:56 |
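The rebinding problem comes from decorators being evaluated once, at import time, so they capture whichever manager exists then. A rough sketch of a late-binding decorator follows. It is not a reproduction of the linked answer, and `Manager`, `get_manager`, `replace_manager`, `eager_writer` and `lazy_writer` are hypothetical names.

```python
import functools


class Manager:
    """Hypothetical stand-in for a replaceable context manager."""

    def __init__(self, name):
        self.name = name

    def writer(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Tag the result so it is visible which manager handled the call.
            return (self.name, func(*args, **kwargs))
        return wrapper


_manager = Manager("first")


def get_manager():
    return _manager


def replace_manager(new):
    global _manager
    _manager = new


def eager_writer(func):
    # Binds to whatever manager exists at decoration (import) time.
    return get_manager().writer(func)


def lazy_writer(func):
    # Looks the manager up on every call, so a replacement is picked up.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return get_manager().writer(func)(*args, **kwargs)
    return wrapper


@eager_writer
def eager_save(value):
    return value


@lazy_writer
def lazy_save(value):
    return value
```

After `replace_manager(Manager("second"))`, `eager_save` still goes through the first manager while `lazy_save` goes through the second, which is the difference being pointed out here.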
efried | bauzas: There was some question (discussion with mriedem) as to whether these vgpu reshaper patches should be associated with the reshaper bp or the vgpu bp. I'm starting to think it's more appropriate to do the latter. The reshaper bp enables the work, but we're not going to go back and tag every future reshape impl against that same bp. | 14:56 |
bauzas | honestly, it's just a gerrit tag | 14:57 |
bauzas | so I don't really care | 14:57 |
efried | That being the case, IMO the text in question ought to go into https://review.openstack.org/#/c/602474 (the vgpu spec). | 14:57 |
bauzas | provided I have reviews :) | 14:57 |
mdbooth | sean-k-mooney: Ah, you're right | 14:57 |
* mdbooth facepalms | 14:57 | |
bauzas | efried: fair, I'll write a follow-up | 14:58 |
efried | It's more than a gerrit tag. It feeds into being able to claim completion of a blueprint, etc. | 14:58 |
bauzas | I understand this but meh | 14:58 |
bauzas | either way, looks like it's a priority | 14:58 |
sean-k-mooney | mdbooth: ill submit the version with the flag for review. ill see if i can create a simple decorator after once the simple fix is up | 14:59 |
*** cfriesen has joined #openstack-nova | 14:59 | |
efried | bauzas: I'm not a spec core, so I can't approve it either way. | 14:59 |
mdbooth | sean-k-mooney: In lighter news, putting an emoji in a gerrit comment causes a 500 :) | 14:59 |
bauzas | efried: I know, but your comments are still valid | 15:00 |
jaypipes | melwitt, dansmith: do we actually support quota classes other than "default"? | 15:00 |
sean-k-mooney | hehe im not sure if that is a feature or a bug | 15:00 |
dansmith | jaypipes: I think no | 15:00 |
*** Luzi has quit IRC | 15:02 | |
*** beagles_food is now known as beagles | 15:03 | |
melwitt | jaypipes: we don't have anything in tree that uses anything other than "default" but if we were to wire it up, it would work. we've thrown around ideas of using them for things like preemptible instances but nothing has materialized yet. and alex_xu's "quota by resource class" proposed to leverage them if you've seen that spec | 15:08 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Consider allocations invovling child providers during allocation cleanup https://review.openstack.org/606050 | 15:09 |
jaypipes | melwitt: well, the quota by resource class is different. quota *classes* are more templates of default limit values for the set of registered resource types. | 15:11 |
jaypipes | and highly coupled to RAX's turnstile middleware... | 15:11 |
mdbooth | sean-k-mooney: Actually I'm just going to abandon that patch. It's dumb and nothing like it can work. | 15:11 |
melwitt | jaypipes: I know, but if you read the spec, we could use them to set limits for resource classes in nova. but I don't think that's gonna happen because people would rather wait until we move to keystone limits and oslo.limit | 15:12 |
mdbooth | sean-k-mooney: At least sed's feelings won't be hurt. | 15:12 |
jaypipes | melwitt: ack | 15:12 |
bauzas | dansmith: based on the numerous feedback, could you please review https://review.openstack.org/#/c/602474/ ? I'll provide a follow-up on some efried's details | 15:13 |
bauzas | it's a re-approval | 15:13 |
sean-k-mooney | mdbooth: well https://stackoverflow.com/a/33507308 will work because i wrote it specifically for doing this kind of thing but ya lets just stick with the simple fix until it breaks | 15:14 |
*** gyee has joined #openstack-nova | 15:15 | |
openstackgerrit | Sylvain Bauza proposed openstack/nova-specs master: Proposes NUMA topology with RPs https://review.openstack.org/552924 | 15:16 |
*** alexchadin has quit IRC | 15:16 | |
bauzas | efried: just fixed the typo you mentioned ^ | 15:16 |
bauzas | thanks for the review | 15:16 |
efried | bauzas: But just one of them :) | 15:16 |
dansmith | bauzas: I'll add it to the queue | 15:17 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add restrictions on updated_at when getting migrations https://review.openstack.org/607798 | 15:17 |
mdbooth | sean-k-mooney: However, I think I didn't demonstrate that mechanically updating all uses of placement_context_manager() is pretty easy. | 15:17 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add restrictions on updated_at when getting instance action records https://review.openstack.org/607801 | 15:17 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Document restrictions on changes-since/before when listing servers https://review.openstack.org/613070 | 15:17 |
mdbooth | s/didn't/did/ | 15:17 |
mdbooth | That was a weird typo | 15:17 |
*** lpetrut has quit IRC | 15:18 | |
sean-k-mooney | mdbooth: ya i suspected that would be easy to do but getting the new decorator correct is the tricky bit. anyway the more i talk about it the less time i spend doing it. ill have the patch up in a few minutes | 15:20 |
*** ttsiouts has quit IRC | 15:22 | |
*** tbachman has joined #openstack-nova | 15:23 | |
bauzas | dansmith: heh, np | 15:25 |
*** ttsiouts has joined #openstack-nova | 15:26 | |
*** fghaas has quit IRC | 15:42 | |
*** pcaruana has quit IRC | 15:49 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Consider allocations invovling child providers during allocation cleanup https://review.openstack.org/606050 | 15:50 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Reject forced move with nested source allocation https://review.openstack.org/605785 | 15:50 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Run negative server moving tests with nested RPs https://review.openstack.org/604125 | 15:50 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Handle allocations consuming only from the child RPs https://review.openstack.org/608298 | 15:50 |
gibi | mriedem, efried, jaypipes: I have fixed up the use-nested-allocation-candidates series ^^ | 15:51 |
efried | gibi: Cool, I'm sure it's perfect now. | 15:53 |
openstackgerrit | Matt Riedemann proposed openstack/python-novaclient stable/rocky: Follow up "Fix up userdata argument to rebuild" https://review.openstack.org/613086 | 15:53 |
gibi | efried: :) | 15:53 |
*** ccamacho has quit IRC | 15:54 | |
*** cdent has joined #openstack-nova | 15:55 | |
jaypipes | gibi: thx gibi | 15:56 |
*** fghaas has joined #openstack-nova | 15:57 | |
openstackgerrit | Matt Riedemann proposed openstack/python-novaclient stable/queens: Fix up userdata argument to rebuild. https://review.openstack.org/613090 | 15:57 |
openstackgerrit | Matt Riedemann proposed openstack/python-novaclient stable/queens: Follow up "Fix up userdata argument to rebuild" https://review.openstack.org/613091 | 15:57 |
*** munimeha1 has joined #openstack-nova | 15:58 | |
openstackgerrit | Daniel Abad proposed openstack/nova master: Fix ironic client ironic_url deprecation warning https://review.openstack.org/611872 | 16:00 |
*** dave-mccowan has quit IRC | 16:02 | |
*** pvc has joined #openstack-nova | 16:06 | |
pvc | Hi sean-k-mooney my problem is i cannot run nvidia x settings on my instance | 16:07 |
pvc | https://docs.nvidia.com/grid/latest/grid-licensing-user-guide/index.html#licensing-grid-vgpu | 16:07 |
sean-k-mooney | the docs have an advanced section that shows how to set the licensing info using a config file on linux or the registry on windows | 16:10 |
*** pvc has quit IRC | 16:12 | |
*** pvc has joined #openstack-nova | 16:16 | |
*** psachin has quit IRC | 16:16 | |
pvc | Hi sean-k-mooney can i use conf for adding a license right? | 16:16 |
openstackgerrit | Dan Smith proposed openstack/nova master: Always read-deleted=yes on lazy-load https://review.openstack.org/575190 | 16:16 |
dansmith | melwitt: the down cell series could use some review if you have time. Everything up to the api change (which I orphaned while working on it) should be passing tests now | 16:18 |
*** itlinux has joined #openstack-nova | 16:18 | |
bauzas | pvc: I pointed you to the nvidia guest licensing documentation this morning | 16:18 |
bauzas | pvc: https://docs.nvidia.com/grid/6.0/grid-licensing-user-guide/index.html#licensing-grid-software-linux-config-file | 16:19 |
*** pvc has quit IRC | 16:20 | |
melwitt | dansmith: thanks for the heads up, I'll go through it. I was also thinking about the handling of quota behavior in the presence of down cells. I don't think we have a patch for that yet. if not, I can look at proposing that on top of the api change | 16:21 |
dansmith | yep, not that I know of | 16:21 |
melwitt | ack | 16:22 |
cdent | sean-k-mooney: I built a place to store wood for the fire. good break. I saw mdbooth abandoned his thing, so where does stuff stand now? | 16:24 |
sean-k-mooney | cdent: i was in meetings and some other stuff so ill have the simple boolean flag version up soon | 16:25 |
*** pvradu has quit IRC | 16:26 | |
openstackgerrit | Merged openstack/nova-specs master: Re-proposes multiple vGPU types in libvirt https://review.openstack.org/602474 | 16:26 |
mdbooth | cdent: Yeah, having thought about that again, I think it would need a different oslo.db api to do that. The decorator is returned by the object we want to replace, so there's no getting round that. | 16:26 |
*** pvradu has joined #openstack-nova | 16:26 | |
*** helenafm has quit IRC | 16:29 | |
*** pvradu has quit IRC | 16:31 | |
*** lpetrut has joined #openstack-nova | 16:35 | |
melwitt | bauzas: just noticed another thing for the vgpu spec follow up https://review.openstack.org/#/c/602474/2/specs/stein/approved/vgpu-stein.rst@11 | 16:35 |
melwitt | bp name | 16:35 |
*** jmlowe has quit IRC | 16:39 | |
*** ttsiouts has quit IRC | 16:39 | |
*** imacdonn has quit IRC | 16:40 | |
*** ttsiouts has joined #openstack-nova | 16:40 | |
*** imacdonn has joined #openstack-nova | 16:43 | |
*** ttsiouts has quit IRC | 16:45 | |
*** dtantsur is now known as dtantsur|afk | 16:46 | |
*** pcaruana has joined #openstack-nova | 16:46 | |
*** jmlowe has joined #openstack-nova | 16:55 | |
*** icey has quit IRC | 16:56 | |
*** panda is now known as panda|off | 16:59 | |
*** adrianc has quit IRC | 17:00 | |
*** moshele has joined #openstack-nova | 17:00 | |
*** k_mouza_ has joined #openstack-nova | 17:01 | |
*** jdillaman1 has quit IRC | 17:04 | |
*** icey has joined #openstack-nova | 17:04 | |
*** k_mouza has quit IRC | 17:05 | |
*** k_mouza_ has quit IRC | 17:06 | |
*** derekh has quit IRC | 17:07 | |
*** irclogbot_4 has quit IRC | 17:08 | |
*** irclogbot_4 has joined #openstack-nova | 17:08 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional recreate test for bug 1799727 https://review.openstack.org/613115 | 17:09 |
openstack | bug 1799727 in OpenStack Compute (nova) "CPU_Allocation_Ratio from nova.conf doesn't update exisiting providers" [Undecided,Confirmed] https://launchpad.net/bugs/1799727 | 17:09 |
openstackgerrit | Jan Gutter proposed openstack/os-vif master: Update port profile unit tests in host_info https://review.openstack.org/610636 | 17:13 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional recreate test for bug 1799727 https://review.openstack.org/613115 | 17:14 |
openstack | bug 1799727 in OpenStack Compute (nova) "CPU_Allocation_Ratio from nova.conf doesn't update exisiting providers" [High,Confirmed] https://launchpad.net/bugs/1799727 | 17:14 |
*** jpena is now known as jpena|off | 17:15 | |
*** jmlowe has quit IRC | 17:18 | |
*** Swami has joined #openstack-nova | 17:18 | |
*** irclogbot_4 has quit IRC | 17:21 | |
*** pvradu has joined #openstack-nova | 17:23 | |
cfriesen | jaypipes: stephenfin: regarding the "show server numa topology" spec, are you okay with showing the *guest* topology for regular users if we clean up the various issues you raised in the spec? | 17:25 |
sean-k-mooney | cfriesen: provided the show numa topology spec does not show any host topology info then i think its fine | 17:27 |
*** pvradu has quit IRC | 17:27 | |
sean-k-mooney | cfriesen: if you want it to show how the virtual topology is mapped to a hosts physical topology then that would be admin only | 17:27 |
cfriesen | sean-k-mooney: agreed. I think showing the "expected" host details to the admin would be useful, since we've run into cases where expected didn't match actual. :) | 17:31 |
sean-k-mooney | cfriesen: it should not upstream. the intel nfv ci actually ssh's into the host that the vm is running on and validates its pinned correctly | 17:33 |
*** mvkr has quit IRC | 17:34 | |
cfriesen | sean-k-mooney: live migration | 17:34 |
sean-k-mooney | cfriesen: but for an admin yes it could be useful when debugging | 17:34 |
sean-k-mooney | cfriesen: what about it? i said we validated it was pinned as nova told it to | 17:35 |
cfriesen | (at least until the patch goes in to fail the live migration if there's a numa topology) | 17:35 |
sean-k-mooney | i did not say nova pinned it correctly | 17:35 |
sean-k-mooney | cfriesen: ya on that i have asked stephen to make that conditional and off by default | 17:36 |
cfriesen | sean-k-mooney: we ran into some bugs during aborted/failed operations | 17:36 |
openstackgerrit | Merged openstack/nova stable/rocky: Fix up compute rpcapi version for pike release https://review.openstack.org/612561 | 17:37 |
sean-k-mooney | cfriesen: i know of at least one large-scale production deployment using ovs-dpdk, which means the guests have hugepages and a numa topology, that uses live migration | 17:37 |
sean-k-mooney | it is still true that live migration can fail because there are not enough free hugepages on the numa node but the failure rate was low enough that they were happy to continue to use it | 17:38 |
sean-k-mooney | mainly since they could just specify a host that they knew would fit the instance | 17:39 |
jaypipes | cfriesen: I'm not thrilled about it, no.. | 17:40 |
openstackgerrit | Elod Illes proposed openstack/nova master: Transform scheduler.select_destinations notification https://review.openstack.org/508506 | 17:41 |
sean-k-mooney | jaypipes: even if its just the computed topology from the flavor+image with no host info? | 17:41 |
cfriesen | jaypipes: so currently an end-user can't tell their topology without logging into the guest and checking it. if they're using a per-user keypair, other users in the same tenant can't see what the topology is. | 17:42 |
dansmith | cfriesen: what's the use case for that though? | 17:42 |
sean-k-mooney | cfriesen: well they can they can look at the flavor and image metadata | 17:42 |
cfriesen | sean-k-mooney: not all clouds allow end users to see flavor extra specs | 17:42 |
sean-k-mooney | cfriesen: wait they dont? how do you know what the flavor does without that info | 17:43 |
dansmith | sean-k-mooney: historically those were admin only | 17:43 |
cfriesen | dansmith: for an admin it's useful for showing what nova expects the virt/phys mapping to be, which can then be checked against the actual mapping on the hypervisor. | 17:43 |
sean-k-mooney | dansmith: huh ok i guess i just always am an admin so never noticed | 17:44 |
cfriesen | for a normal user, it's useful in the same way knowing how many cpus or how much ram your instance has is useful | 17:44 |
dansmith | cfriesen: you mean it's useful for an admin to make sure nova is doing the thing it expects? that seems like a weak case to me.. | 17:44 |
dansmith | cfriesen: but .. the user can get that from the guest if they're the user. I just have a hard time understanding what they can do with that info from just the API, other than maybe complain, or notice that it changed when their admin migrates them | 17:45 |
sean-k-mooney | cfriesen: is any of this stuff already in the metadata api? | 17:46 |
cfriesen | dansmith: only a user with a suitable keypair can login to the guest. other users in the same tenant can't. | 17:46 |
cfriesen | or at least might not be able to | 17:46 |
dansmith | cfriesen: right, it's that case I don't get | 17:46 |
dansmith | cfriesen: like, if I can log into the guest, what does it matter what the topology is? | 17:47 |
dansmith | I mean I can come up with completely synthetic reasons, but they're exceedingly weak, which is what I said above | 17:47 |
dansmith | sorry.. "can't log into the guest" | 17:49 |
*** ralonsoh has quit IRC | 17:52 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Update allocation_ratios in placement inventory if config changes https://review.openstack.org/613126 | 17:54 |
*** lpetrut has quit IRC | 17:55 | |
cfriesen | most of the uses that we added it for were admin-level, admittedly. Like figuring out why something is having a hard time scheduling on a migration or showing the expected virt/phys mapping. For an end user it's really just about showing all the available information about the instance and making it so they don't have to jump through hoops to get it. | 17:55 |
cdent | "making it so they don't have to jump through hoops to get it" <- that ought to be compelling enough? | 17:56 |
openstackgerrit | Gaudenz Steinlin proposed openstack/nova master: Extend volume for libvirt network volumes (RBD) https://review.openstack.org/613039 | 18:00 |
sean-k-mooney | cdent: its 3 command to 1 but if some clouds hide flavor extra spec then i guess maybe they cant run the 3 commands | 18:01 |
sean-k-mooney | cfriesen: had you also planned to show cpu topology or just numa topology | 18:01 |
sean-k-mooney | cfriesen: as we both know they can be and usually are very different by default on openstack | 18:02 |
openstackgerrit | Jay Pipes proposed openstack/nova master: quota: remove unused code https://review.openstack.org/613127 | 18:03 |
openstackgerrit | Jay Pipes proposed openstack/nova master: quota: remove unused Quota driver methods https://review.openstack.org/613128 | 18:03 |
openstackgerrit | Jay Pipes proposed openstack/nova master: quota: remove QuotaDriver.destroy_all_by_project() https://review.openstack.org/613129 | 18:03 |
openstackgerrit | Jay Pipes proposed openstack/nova master: quota: remove default kwarg on get_class_quotas() https://review.openstack.org/613130 | 18:03 |
jaypipes | melwitt, dansmith: some cleanups of the quota system ^^ | 18:04 |
cfriesen | sean-k-mooney: we currently show memory size, page size, and which guest CPUs are associated with each guest numa node. showing virtual CPU topology (sockets/cores/threads) would be a lot trickier since that's not in the nova DB. | 18:05 |
sean-k-mooney | well it is if set in the flavor or image else it up to the virt driver | 18:06 |
jaypipes | melwitt, dansmith: more patches on the way but those are a good first chunk of removing cruft | 18:06 |
*** cdent has quit IRC | 18:06 | |
*** jdillaman has joined #openstack-nova | 18:06 | |
sean-k-mooney | cfriesen: so if we were going this route it would be nice to include it if set in image or flavor | 18:06 |
cfriesen | sean-k-mooney: don't we have scenarios where we say "max cpus per socket"? in that case only the virt driver knows the actual number | 18:07 |
*** tbachman has quit IRC | 18:09 | |
*** tbachman has joined #openstack-nova | 18:12 | |
sean-k-mooney | cfriesen: yes but we can also say 2 cpus per socket instead of max | 18:13 |
sean-k-mooney | anyway that is just a thought | 18:13 |
sean-k-mooney | it sound like dansmith and jaypipes would prefer this not to be in the api anyway so maybe you could do it as an osc or nova client feature | 18:14 |
*** mvkr has joined #openstack-nova | 18:17 | |
*** ivve has quit IRC | 18:18 | |
*** tbachman has quit IRC | 18:18 | |
cfriesen | dansmith: jaypipes: currently there's no way for an admin to look at the expected virt/phys mapping without going into the database. do we expect that nova admins will always have raw DB access? | 18:21 |
sean-k-mooney | they dont need db access | 18:22 |
sean-k-mooney | they just need to be able to do a flavor show and image show + look at the libvirt xml | 18:22 |
cfriesen | sean-k-mooney: no, I'm talking about which specific guest vcpu maps to which specific host CPU | 18:22 |
sean-k-mooney | that's in the libvirt xml | 18:22 |
cfriesen | sean-k-mooney: that's the hypervisor view, not nova's view (which in buggy cases can be different) | 18:23 |
sean-k-mooney | the database is not going to help you in those cases to debug what went wrong | 18:23 |
sean-k-mooney | we have no way to get the numa_topology blob from the moment when nova was calculating the pinning | 18:24 |
cfriesen | sure it can...if I can see that the entry in the database matched the previous mappings from before I did a migration... | 18:24 |
sean-k-mooney | wait you're talking about migration with cpu pinning which today is not supported | 18:25 |
cfriesen | cold migration is | 18:25 |
sean-k-mooney | cold migration yes but that wont break in this case | 18:25 |
sean-k-mooney | the xml will be regenerated on the new host | 18:26 |
cfriesen | throw in power outages and downed compute nodes and lost messages and migration reverts | 18:26 |
sean-k-mooney | so when the db is in an undefined state it may not agree with the hypervisor | 18:27 |
sean-k-mooney | yes that is true. not sure this will help with that | 18:27 |
cfriesen | sean-k-mooney: it'll at least tell us what the problem is | 18:28 |
sean-k-mooney | the problem being the db is borked | 18:28 |
sean-k-mooney | if the vm is running it means the hypervisor pinned it correctly based on the info it had at the time. | 18:29 |
sean-k-mooney | it should never be the case that the db is correct and vm is wrong in the cold migrate case | 18:30 |
sean-k-mooney | live migrate this can invert | 18:30 |
sean-k-mooney | cfriesen: would an error log message generated by one of the periodic tasks on the compute agent not be more useful? | 18:31 |
sean-k-mooney | e.g. discovered instance x with pinning y, expected z | 18:31 |
openstackgerrit | sean mooney proposed openstack/nova master: harden placement init under wsgi https://review.openstack.org/610034 | 18:32 |
sean-k-mooney | melwitt: i updated the placement wsgi patch again based on more talks with mdbooth and cdent earlier | 18:34 |
sean-k-mooney | melwitt: do you still want me to drop the second unit test https://review.openstack.org/#/c/610034/5/nova/tests/unit/api/openstack/placement/test_db_api.py | 18:34 |
sean-k-mooney | if so i can respin it quickly | 18:34 |
cfriesen | sean-k-mooney: an error log like that is not a bad idea, actually. | 18:37 |
*** tbachman has joined #openstack-nova | 18:38 | |
sean-k-mooney | i'm kind of assuming any reasonably sized cloud that is going to have this problem is likely exporting their logs to elasticsearch or similar and could set up an alert for it | 18:38 |
melwitt | sean-k-mooney: commented | 18:39 |
*** mchlumsky_ has joined #openstack-nova | 18:39 | |
*** slaweq_ has joined #openstack-nova | 18:40 | |
sean-k-mooney | melwitt: thanks | 18:41 |
*** jistr_ has joined #openstack-nova | 18:42 | |
*** itlinux_ has joined #openstack-nova | 18:43 | |
*** aloga_ has joined #openstack-nova | 18:43 | |
*** tridde has joined #openstack-nova | 18:43 | |
*** jhesketh_ has joined #openstack-nova | 18:44 | |
cfriesen | sean-k-mooney: actually, I was wrong. we do have the actual guest CPU topology in the InstanceNUMACell, so we could display it too. | 18:46 |
sean-k-mooney | in the instance request spec i'm assuming | 18:47 |
sean-k-mooney | or somewhere in the instance extra stuff in the db | 18:47 |
cfriesen | no, InstanceNUMACell.cpu_topology | 18:47 |
sean-k-mooney | does that actually give you the cpu_topology or the vcpu to pcpu mappings? | 18:48 |
sean-k-mooney | i have learned that we are terrible at naming anything related to numa in the code | 18:48 |
*** icey has quit IRC | 18:48 | |
*** itlinux has quit IRC | 18:48 | |
*** mchlumsky has quit IRC | 18:48 | |
*** aloga has quit IRC | 18:48 | |
*** erlon has quit IRC | 18:48 | |
*** priteau has quit IRC | 18:48 | |
*** sayalilunkad has quit IRC | 18:48 | |
*** raginbajin has quit IRC | 18:48 | |
*** slaweq has quit IRC | 18:48 | |
*** FlorianFa has quit IRC | 18:48 | |
*** zzzeek has quit IRC | 18:48 | |
*** kevinbenton has quit IRC | 18:48 | |
*** trident has quit IRC | 18:48 | |
*** gryf has quit IRC | 18:48 | |
*** jistr has quit IRC | 18:48 | |
*** SpamapS has quit IRC | 18:48 | |
*** spotz has quit IRC | 18:48 | |
*** lyarwood has quit IRC | 18:48 | |
*** gibi has quit IRC | 18:48 | |
*** jhesketh has quit IRC | 18:48 | |
*** kevinbenton has joined #openstack-nova | 18:49 | |
*** icey has joined #openstack-nova | 18:49 | |
*** SpamapS has joined #openstack-nova | 18:50 | |
cfriesen | topology...threads/cores/sockets | 18:50 |
*** sayalilunkad has joined #openstack-nova | 18:50 | |
*** erlon has joined #openstack-nova | 18:50 | |
cfriesen | there's also InstanceNUMACell.siblings to show guest HT siblings | 18:51 |
*** moshele has quit IRC | 18:51 | |
*** spotz has joined #openstack-nova | 18:52 | |
sean-k-mooney | cfriesen: cool. i still think you need to convince dansmith and jaypipes there is a need for it. the periodic task i think would have value and be an easier sell as it will actively detect that there is an issue you should investigate | 18:54 |
*** gryf has joined #openstack-nova | 18:54 | |
*** jmlowe has joined #openstack-nova | 19:00 | |
*** openstackgerrit has quit IRC | 19:06 | |
*** ivve has joined #openstack-nova | 19:20 | |
*** openstackgerrit has joined #openstack-nova | 19:23 | |
openstackgerrit | Merged openstack/nova stable/rocky: Move live_migration.pre.start to the start of the method https://review.openstack.org/612714 | 19:23 |
openstackgerrit | Merged openstack/nova stable/rocky: Ensure attachment cleanup on failure in driver.pre_live_migration https://review.openstack.org/612715 | 19:23 |
*** moshele has joined #openstack-nova | 19:26 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional recreate test for bug 1799727 https://review.openstack.org/613115 | 19:28 |
openstack | bug 1799727 in OpenStack Compute (nova) "CPU_Allocation_Ratio from nova.conf doesn't update exisiting providers" [High,In progress] https://launchpad.net/bugs/1799727 - Assigned to Matt Riedemann (mriedem) | 19:28 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Update reserved/allocation_ratio in placement inventory if config changes https://review.openstack.org/613126 | 19:28 |
*** moshele has quit IRC | 19:29 | |
*** slaweq_ is now known as slaweq | 19:40 | |
*** awaugama has quit IRC | 19:42 | |
*** irclogbot_4 has joined #openstack-nova | 19:44 | |
*** irclogbot_4 has quit IRC | 19:46 | |
*** irclogbot_4 has joined #openstack-nova | 19:47 | |
*** jmlowe has quit IRC | 19:50 | |
*** READ10 has joined #openstack-nova | 20:03 | |
*** tbachman has quit IRC | 20:06 | |
*** jmlowe has joined #openstack-nova | 20:11 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add more documentation for online_data_migrations CLI https://review.openstack.org/605836 | 20:14 |
*** ivve has quit IRC | 20:19 | |
*** irclogbot_4 has quit IRC | 20:21 | |
openstackgerrit | Merged openstack/nova stable/queens: Fix up compute rpcapi version for pike release https://review.openstack.org/612562 | 20:25 |
*** READ10 has quit IRC | 20:34 | |
*** tbachman has joined #openstack-nova | 20:34 | |
*** pcaruana has quit IRC | 20:50 | |
*** tbachman has quit IRC | 20:54 | |
openstackgerrit | Dan Smith proposed openstack/nova master: Make CellDatabases fixture reentrant https://review.openstack.org/611665 | 20:58 |
openstackgerrit | Dan Smith proposed openstack/nova master: Modify get_by_cell_and_project() to get_not_deleted_by_cell_and_projects() https://review.openstack.org/607663 | 20:58 |
openstackgerrit | Dan Smith proposed openstack/nova master: Minimal construct plumbing for nova list when a cell is down https://review.openstack.org/567785 | 20:58 |
openstackgerrit | Dan Smith proposed openstack/nova master: Refactor scatter-gather utility to return exception objects https://review.openstack.org/607934 | 20:58 |
openstackgerrit | Dan Smith proposed openstack/nova master: Return a minimal construct for nova show when a cell is down https://review.openstack.org/591658 | 20:58 |
openstackgerrit | Dan Smith proposed openstack/nova master: Return a minimal construct for nova service-list when a cell is down https://review.openstack.org/584829 | 20:58 |
*** openstack has quit IRC | 21:03 | |
*** openstack has joined #openstack-nova | 21:05 | |
*** ChanServ sets mode: +o openstack | 21:05 | |
*** erlon has quit IRC | 21:06 | |
*** k_mouza has joined #openstack-nova | 21:07 | |
*** fghaas has left #openstack-nova | 21:10 | |
*** k_mouza has quit IRC | 21:11 | |
*** spsurya has quit IRC | 21:21 | |
openstackgerrit | Merged openstack/nova master: libvirt: fix disk_bus handling for root disk https://review.openstack.org/584999 | 21:22 |
cfriesen | so are we recommending setting send_service_user_token to True now? would we ever change the default to be True? | 21:23 |
*** itlinux_ has quit IRC | 21:33 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Create volume attachment during boot from volume in compute https://review.openstack.org/541420 | 21:35 |
openstackgerrit | Merged openstack/nova stable/queens: Move live_migration.pre.start to the start of the method https://review.openstack.org/612773 | 21:41 |
openstackgerrit | Merged openstack/nova stable/queens: Ensure attachment cleanup on failure in driver.pre_live_migration https://review.openstack.org/612774 | 21:41 |
*** spatel has joined #openstack-nova | 21:44 | |
spatel | sean-k-mooney: hey | 21:44 |
spatel | I am having issue with block migration | 21:45 |
spatel | it migrates the full instance but doesn't copy the full disk.img file, and my VM fails to boot | 21:45 |
sean-k-mooney | are you using config drive | 21:45 |
spatel | no | 21:45 |
sean-k-mooney | ok config drive breaks that i think | 21:46 |
sean-k-mooney | hum that is strange | 21:46 |
spatel | http://paste.openstack.org/show/732988/ This is what i have in nova.conf | 21:46 |
sean-k-mooney | so the migration succeeded from a nova api point of view but actually failed? | 21:46 |
spatel | Horizon migrates the vm but when i reboot the machine it puts me in linux emergency mode (when i check the disk.img file it's just a few KB in size) | 21:47 |
spatel | Migration succeeded and i can see my full VM migrated to the new compute node and it's running also, but as soon as i reboot it puts me in emergency mode. let me show you logs | 21:48 |
sean-k-mooney | so img images are usually raw images. for qcow images i know we have a disk layering/caching thing we use | 21:48 |
spatel | This is what the instance looks like after reboot http://paste.openstack.org/show/732989/ | 21:49 |
spatel | raw image | 21:49 |
sean-k-mooney | im not sure if we do the same disk caching thing with raw images | 21:49 |
spatel | You think it could be image issue ? | 21:49 |
*** tbachman has joined #openstack-nova | 21:50 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional test for AggregateMultiTenancyIsolation + migrate https://review.openstack.org/571265 | 21:50 |
sean-k-mooney | am, no, not necessarily, but if it was using the same disk offset stuff we use for qcow the vm image would be small unless you wrote a lot of data to it after it booted | 21:50 |
sean-k-mooney | that said line 11 XFS (sda1): last sector read failed | 21:51 |
spatel | on source node size was 101MB | 21:51 |
sean-k-mooney | that looks like file system corruption | 21:51 |
spatel | and destination node its few KB | 21:52 |
sean-k-mooney | right, am, you're not on shared storage by the way? | 21:52 |
spatel | No i don't have shared storage | 21:52 |
sean-k-mooney | e.g. you don't have /var/libvirt/... on nfs | 21:52 |
spatel | no | 21:53 |
sean-k-mooney | ok cool just checking | 21:53 |
spatel | sure | 21:53 |
sean-k-mooney | do you have logs from nova for the migration | 21:53 |
spatel | let me pull | 21:53 |
sean-k-mooney | e.g. the n-cpu logs for source and dest | 21:53 |
sean-k-mooney | it seems like the image just got truncated but i don't know why that would happen | 21:53 |
spatel | http://paste.openstack.org/show/732990/ | 21:55 |
spatel | this is destination compute node logs | 21:55 |
*** bnemec has quit IRC | 21:55 | |
spatel | I am using swap disk it shouldn't be an issue | 21:57 |
sean-k-mooney | swap will be copied also so ya it should be fine | 21:57 |
openstackgerrit | Vladyslav Drok proposed openstack/nova stable/pike: Fix resize_instance rpcapi call https://review.openstack.org/603439 | 21:57 |
spatel | let me try again and see | 21:58 |
sean-k-mooney | this is a little suspicious Unknown base file: /var/lib/nova/instances/_base/68e4d13dacff5cffeaacecf533afab659ec3e170 | 21:59 |
spatel | hmm! | 22:00 |
spatel | let me try again and capture a fresh log | 22:00 |
spatel | i have spun up a new vm and it has a 98M disk file | 22:00 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Migrate old style volume attachments on nova-compute startup https://review.openstack.org/549130 | 22:00 |
spatel | This is what i am doing for migration: Horizon > live migration > select block migration | 22:01 |
sean-k-mooney | any luck reproducing? | 22:03 |
sean-k-mooney | by the way i'm assuming you are not using an sriov interface on that vm, right? | 22:03 |
spatel | no SR-IOV | 22:04 |
sean-k-mooney | ok good because that does not work :) | 22:04 |
sean-k-mooney | at least not yet | 22:04 |
spatel | within 30 seconds nova said the migration completed and i can see the host also updated in horizon | 22:04 |
spatel | here is the fresh log http://paste.openstack.org/show/732991/ | 22:05 |
spatel | now going to reboot my instance | 22:05 |
spatel | failed to boot | 22:06 |
spatel | Entering emergency mode. Exit the shell to continue. | 22:06 |
*** slaweq has quit IRC | 22:06 | |
sean-k-mooney | hum if you log in to host ostack-compute-27.v1v0x.net does its n-cpu log have any errors | 22:06 |
spatel | disk size is 3.3M | 22:06 |
spatel | n-cpu log? | 22:07 |
spatel | is that a log file? | 22:07 |
sean-k-mooney | nova compute agent | 22:07 |
sean-k-mooney | in the development installer devstack it's referred to as n-cpu for short | 22:07 |
sean-k-mooney | so same log but for source node | 22:08 |
spatel | This is the log of compute-27 http://paste.openstack.org/show/732992/ | 22:08 |
spatel | no error there | 22:09 |
spatel | what is this ? [instance: 2ad8cdf5-4db7-4024-a35f-343340ad27ee] Instance not resizing, skipping migration. | 22:09 |
sean-k-mooney | we use the same code path for migration and resizing vms since it's basically the same thing | 22:11 |
spatel | hmm | 22:11 |
spatel | Do you think my nova config is wrong? | 22:12 |
sean-k-mooney | its being logged from here https://github.com/openstack/nova/blob/12fcfc5e2b51b563529a5bc0b2990816bbbda80b/nova/compute/resource_tracker.py#L1127 | 22:12 |
sean-k-mooney | no i think your nova config is fine. that said we might need to enable debug logging to get to the bottom of what is going on | 22:13 |
spatel | So i manually copied the disk file from source to destination and the VM is up | 22:13 |
spatel | looks like the compute node is not copying the full disk file, or the source is just deleting the instance before the copy finishes | 22:14 |
openstackgerrit | Merged openstack/nova master: Add debug logs for when provider inventory changes https://review.openstack.org/597560 | 22:14 |
sean-k-mooney | ya that should not be required | 22:14 |
sean-k-mooney | ya can you enable debug logs on both nodes and try again | 22:14 |
*** efried has quit IRC | 22:15 | |
spatel | Let me go home and collect all nova debug and file BUG | 22:15 |
*** efried has joined #openstack-nova | 22:15 | |
spatel | it seems the source is just deleting the instance before the nova migration completes... | 22:15 |
sean-k-mooney | spatel: sure, it's late here too so i was going to be dropping offline soon once my pizza is ready | 22:16 |
sean-k-mooney | if you can file a bug and attach some debug logs for the source and destination we can see why it is not copying the disk | 22:16 |
spatel | i will | 22:17 |
spatel | thanks! | 22:17 |
sean-k-mooney | spatel: by the way what release are you running | 22:18 |
spatel | queens | 22:18 |
sean-k-mooney | there was an error we caught just as Rocky was shipping where we treated migration completion as a success without checking if there was an error | 22:18 |
sean-k-mooney | ah so ya you might be hitting that bug | 22:18 |
spatel | damn it!! ! | 22:19 |
spatel | really!! | 22:19 |
spatel | send me that BUG and i will check code | 22:19 |
sean-k-mooney | basically if libvirt threw an error after migration started we did not handle it properly | 22:19 |
*** mlavalle has quit IRC | 22:20 | |
spatel | I am leaving and need to shutdown my pc but you can send me email on satish.txt@gmail.com | 22:20 |
*** spatel has quit IRC | 22:21 | |
openstackgerrit | Merged openstack/nova stable/pike: Revert "Make host_aggregate_map dictionary case-insensitive" https://review.openstack.org/605268 | 22:35 |
openstackgerrit | Merged openstack/nova stable/pike: Enforce case-sensitive hostnames in aggregate host add https://review.openstack.org/605269 | 22:35 |
*** vabada has quit IRC | 22:35 | |
*** vabada has joined #openstack-nova | 22:37 | |
*** eharney has quit IRC | 22:38 | |
*** rcernin has joined #openstack-nova | 23:05 | |
*** itlinux has joined #openstack-nova | 23:05 | |
*** READ10 has joined #openstack-nova | 23:17 | |
*** openstackgerrit has quit IRC | 23:20 | |
*** spatel has joined #openstack-nova | 23:33 | |
*** spatel has quit IRC | 23:37 | |
*** Swami has quit IRC | 23:45 |