*** CeeMac has quit IRC | 00:12 | |
*** k_mouza has quit IRC | 00:16 | |
*** rcernin has joined #openstack-nova | 00:21 | |
*** sapd1_x has quit IRC | 00:40 | |
*** swp20 has quit IRC | 00:50 | |
*** __ministry has joined #openstack-nova | 01:19 | |
*** swp20 has joined #openstack-nova | 01:27 | |
*** LinPeiWen has quit IRC | 01:44 | |
*** k_mouza has joined #openstack-nova | 01:48 | |
*** hamalq has quit IRC | 01:52 | |
*** hamalq has joined #openstack-nova | 01:52 | |
*** k_mouza has quit IRC | 01:52 | |
*** priteau has quit IRC | 02:03 | |
*** sorrison has quit IRC | 02:22 | |
*** LinPeiWen has joined #openstack-nova | 02:39 | |
*** sapd1_x has joined #openstack-nova | 02:40 | |
*** hemanth_n has joined #openstack-nova | 02:49 | |
*** mkrai has joined #openstack-nova | 03:46 | |
*** sapd1_x has quit IRC | 04:26 | |
*** markmcclain has quit IRC | 04:27 | |
*** markmcclain has joined #openstack-nova | 04:28 | |
*** sapd1_x has joined #openstack-nova | 04:34 | |
*** ratailor has joined #openstack-nova | 04:37 | |
*** vishalmanchanda has joined #openstack-nova | 04:40 | |
*** mkrai_ has joined #openstack-nova | 04:53 | |
*** mkrai has quit IRC | 04:54 | |
*** ralonsoh has joined #openstack-nova | 05:27 | |
*** sapd1_x has quit IRC | 05:39 | |
*** slaweq has joined #openstack-nova | 06:00 | |
*** k_mouza has joined #openstack-nova | 06:01 | |
*** k_mouza has quit IRC | 06:06 | |
*** damien_r has joined #openstack-nova | 06:06 | |
*** damien_r has quit IRC | 06:11 | |
*** lpetrut has joined #openstack-nova | 06:16 | |
*** swp20 has quit IRC | 06:31 | |
*** luksky has joined #openstack-nova | 06:34 | |
*** gyee has quit IRC | 06:35 | |
*** mkrai_ has quit IRC | 06:35 | |
*** dklyle has quit IRC | 06:59 | |
*** hamalq has quit IRC | 07:03 | |
*** gokhani has joined #openstack-nova | 07:11 | |
*** andrewbonney has joined #openstack-nova | 07:15 | |
*** rcernin has quit IRC | 07:25 | |
*** rpittau|afk is now known as rpittau | 07:36 | |
*** hamalq has joined #openstack-nova | 07:39 | |
*** k_mouza has joined #openstack-nova | 07:45 | |
*** k_mouza has quit IRC | 07:50 | |
*** yonglihe has joined #openstack-nova | 07:50 | |
*** martinkennelly has joined #openstack-nova | 07:58 | |
gibi | fyi it seems grenade is broken on master https://zuul.opendev.org/t/openstack/builds?job_name=nova-grenade-multinode&project=openstack%2Fnova&branch=master | 07:59 |
---|---|---|
gibi | with an error in cinder upgrade | 07:59 |
gibi | https://zuul.opendev.org/t/openstack/build/7ee322bd1c024bf0a1855c7625534c29/log/logs/grenade.sh.txt#46867 | 07:59 |
gibi | "Failed to start rtslib-fb-targetctl.service: Unit rtslib-fb-targetctl.service is not loaded properly: Exec format error." | 07:59 |
*** derekh has joined #openstack-nova | 08:01 | |
*** hamalq has quit IRC | 08:02 | |
*** lucasagomes has joined #openstack-nova | 08:03 | |
lyarwood | crap, I bet that's my lioadm devstack change | 08:04 |
gibi | it seems it interferes with the bionic jobs | 08:04 |
lyarwood | https://review.opendev.org/c/openstack/devstack/+/779624 | 08:04 |
lyarwood | I was sure I pinned the bionic jobs to tgtadm | 08:05 |
gibi | http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Failed%20to%20start%20rtslib-fb-targetctl.service%5C%22 this query shows that it is mostly hit on bionic | 08:05 |
lyarwood | oh it's because this is still the old grenade multinode job | 08:07 |
lyarwood | gah | 08:07 |
lyarwood | we really need to land https://review.opendev.org/c/openstack/nova/+/778885 soonish assuming everyone is okay with the job | 08:08 |
gibi | I'm on it | 08:08 |
openstackgerrit | Brin Zhang proposed openstack/nova master: Replaces tenant_id with project_id from List SG API https://review.opendev.org/c/openstack/nova/+/766726 | 08:09 |
*** hamalq has joined #openstack-nova | 08:09 | |
gibi | lyarwood: fyi there are jobs other than the nova grenade jobs out there that are blocked now, so while I totally agree to move forward with the new grenade jobs in nova it is not a full fix for the gate (it fixes the nova gate though :)) | 08:10 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: zuul: Pin nova-grenade-multinode to tgtadm CINDER_ISCSI_HELPER https://review.opendev.org/c/openstack/nova/+/788425 | 08:14 |
lyarwood | gibi: so the other fix is that ^ | 08:14 |
* lyarwood looks at the logstash query | 08:14 | |
lyarwood | yeah these are all dsvm bionic jobs, gah | 08:15 |
*** ociuhandu has joined #openstack-nova | 08:15 | |
lyarwood | let me fix this in devstack actually | 08:17 |
lyarwood | these jobs really need to be updated anyway | 08:18 |
gibi | OK | 08:20 |
gibi | I'll continue reviewing the patch that adds the new grenade job to nova | 08:20 |
*** rcernin has joined #openstack-nova | 08:21 | |
*** ociuhandu has quit IRC | 08:25 | |
*** ociuhandu has joined #openstack-nova | 08:26 | |
*** k_mouza has joined #openstack-nova | 08:30 | |
*** gokhani has quit IRC | 08:31 | |
*** ociuhandu has quit IRC | 08:32 | |
*** rcernin has quit IRC | 08:32 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: DNM - testing nova-grenade-multinode fix https://review.opendev.org/c/openstack/nova/+/788430 | 08:35 |
*** gokhani has joined #openstack-nova | 08:37 | |
*** sapd1_x has joined #openstack-nova | 08:37 | |
*** brinzhang0 has joined #openstack-nova | 08:39 | |
lyarwood | gibi: https://review.opendev.org/c/openstack/devstack/+/788429 should fix all jobs | 08:39 |
*** ociuhandu has joined #openstack-nova | 08:41 | |
*** brinzhang_ has quit IRC | 08:42 | |
gibi | does that ^^ supersede https://review.opendev.org/c/openstack/nova/+/788425 ? | 08:45 |
lyarwood | gibi: well I've posted three changes in all, one that fixes just nova-grenade-multinode as it currently is, one that fixes nova-grenade-multinode by moving to the zuulv3 based jobs and a devstack change to fix all bionic based jobs | 08:47 |
lyarwood | gibi: https://review.opendev.org/c/openstack/nova/+/788425 is something we can land quickly IMHO | 08:47 |
gibi | I've just approved the zuul v3 move | 08:47 |
lyarwood | ah there we go then | 08:47 |
lyarwood | I'll drop https://review.opendev.org/c/openstack/nova/+/788425 | 08:47 |
gibi | Ok | 08:48 |
gibi | thanks | 08:48 |
*** bnemec has quit IRC | 08:48 | |
lyarwood | the zuulv3 change should also move our grenade testing to focal to focal finally | 08:48 |
lyarwood | we dropped bionic back in ussuri | 08:48 |
*** bnemec has joined #openstack-nova | 08:49 | |
*** mkrai has joined #openstack-nova | 08:53 | |
sean-k-mooney | does anyone know why we have an os_type column in the instances table | 09:11 |
sean-k-mooney | as far as i can tell we always use the os_type image metadata property when generating xmls or scheduling | 09:12 |
sean-k-mooney | not an os_type from the instance | 09:12 |
*** k_mouza has quit IRC | 09:18 | |
*** k_mouza has joined #openstack-nova | 09:19 | |
*** ratailor_ has joined #openstack-nova | 09:22 | |
*** priteau has joined #openstack-nova | 09:22 | |
stephenfin | bauzas: Are you okay with me addressing your nits and then reviewing https://review.opendev.org/c/openstack/nova/+/695012 myself? I think the bulk of the change will stay the same | 09:23 |
bauzas | stephenfin: woah, long story here, I can't remember the context but sure | 09:23 |
sean-k-mooney | we really should try and get that series merged | 09:24 |
*** ratailor has quit IRC | 09:25 | |
*** sapd1_x has quit IRC | 09:26 | |
sean-k-mooney | hum i wonder why i never left any review comments on that unless it's a different series from mark | 09:26 |
sean-k-mooney | oh i was thinking of https://review.opendev.org/c/openstack/nova/+/710848 https://review.opendev.org/c/openstack/nova/+/710847 and https://review.opendev.org/c/openstack/nova/+/760354 | 09:29 |
*** k_mouza has quit IRC | 09:30 | |
*** rcernin has joined #openstack-nova | 09:39 | |
*** k_mouza has joined #openstack-nova | 09:41 | |
*** hamalq has quit IRC | 09:44 | |
*** tosky has joined #openstack-nova | 09:45 | |
*** sapd1_x has joined #openstack-nova | 09:49 | |
stephenfin | gibi: Could I get you to review the small diff on https://review.opendev.org/c/openstack/nova/+/676209 yet again, please? /o\ | 09:50 |
stephenfin | lyarwood: Any chance you'd be able to slog through that so I can finally close it out? I genuinely think it's a useful addition | 09:51 |
*** ratailor__ has joined #openstack-nova | 09:55 | |
*** ratailor_ has quit IRC | 09:57 | |
lyarwood | stephenfin: yup I'll queue it up for later today if that's okay | 09:58 |
stephenfin | wfm. Thanks | 09:58 |
lyarwood | stephenfin: and I'll likely ask for the same later in the cycle for the block layer | 09:58 |
stephenfin | fair :) | 09:58 |
*** vishalmanchanda has quit IRC | 10:00 | |
*** sapd1_x has quit IRC | 10:06 | |
*** sapd1_x has joined #openstack-nova | 10:06 | |
*** vishalmanchanda has joined #openstack-nova | 10:11 | |
gibi | stephenfin: I will check it after my lunch | 10:12 |
*** whoami-rajat_ has joined #openstack-nova | 10:22 | |
openstackgerrit | Lee Yarwood proposed openstack/nova stable/victoria: libvirt: Ignore device already in the process of unplug errors https://review.opendev.org/c/openstack/nova/+/788467 | 10:24 |
openstackgerrit | Lee Yarwood proposed openstack/nova stable/ussuri: libvirt: Ignore device already in the process of unplug errors https://review.opendev.org/c/openstack/nova/+/788468 | 10:24 |
openstackgerrit | Lee Yarwood proposed openstack/nova stable/train: libvirt: Ignore device already in the process of unplug errors https://review.opendev.org/c/openstack/nova/+/788469 | 10:26 |
*** dtantsur is now known as dtantsur|brb | 10:32 | |
*** ociuhandu has quit IRC | 10:39 | |
sean-k-mooney | stephenfin: +1 on the first patch for nova.pci | 10:41 |
stephenfin | thanks :) | 10:41 |
sean-k-mooney | stephenfin: i have 2 other reviews i want to complete first then i'll see if i can get back to the rest | 10:41 |
openstackgerrit | Brin Zhang proposed openstack/nova master: Replaces tenant_id with project_id from Flavor Access APIs https://review.opendev.org/c/openstack/nova/+/767704 | 10:45 |
*** tbachman has joined #openstack-nova | 10:48 | |
*** ociuhandu has joined #openstack-nova | 10:57 | |
*** brinzhang_ has joined #openstack-nova | 10:59 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: DNM: Testing with sqlalchemy 1.4 https://review.opendev.org/c/openstack/nova/+/788471 | 11:00 |
gibi | fyi we are expected failures and therefore work here ^^ | 11:01 |
gibi | s/expected/expecting/ | 11:01 |
*** brinzhang0 has quit IRC | 11:02 | |
*** k_mouza_ has joined #openstack-nova | 11:04 | |
*** k_mouza has quit IRC | 11:04 | |
*** k_mouza has joined #openstack-nova | 11:05 | |
*** mkrai has quit IRC | 11:06 | |
*** ociuhandu has quit IRC | 11:07 | |
*** k_mouza_ has quit IRC | 11:08 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Replaces tenant_id with project_id from List/Show usage APIs https://review.opendev.org/c/openstack/nova/+/768509 | 11:11 |
gibi | reported https://bugs.launchpad.net/nova/+bug/1926426 to track the work | 11:11 |
openstack | Launchpad bug 1926426 in OpenStack Compute (nova) "Nova is not compatible with sqlalchemy 1.4" [High,Triaged] | 11:11 |
*** mkrai has joined #openstack-nova | 11:12 | |
sean-k-mooney | is the issue the test or the production code? | 11:13 |
sean-k-mooney | looks like maybe both | 11:13 |
sean-k-mooney | have we capped it in UC | 11:14 |
*** sapd1_x has quit IRC | 11:14 | |
*** ociuhandu has joined #openstack-nova | 11:15 | |
*** k_mouza_ has joined #openstack-nova | 11:15 | |
sean-k-mooney | ah we had to 1.3.23 previously | 11:15 |
gibi | it is a test coming from a proposed UC bump | 11:15 |
gibi | so far we see one oslo.db issue but we also seem to have some nova specific failure modes as well | 11:16 |
*** k_mouza has quit IRC | 11:18 | |
*** k_mouza has joined #openstack-nova | 11:19 | |
*** ociuhandu has quit IRC | 11:19 | |
*** ociuhandu has joined #openstack-nova | 11:20 | |
*** k_mouza_ has quit IRC | 11:22 | |
*** mkrai has quit IRC | 11:23 | |
*** ociuhandu has quit IRC | 11:25 | |
*** ociuhandu has joined #openstack-nova | 11:36 | |
*** kinpaa12389 has joined #openstack-nova | 11:38 | |
*** ociuhandu has quit IRC | 11:41 | |
*** dtantsur|brb is now known as dtantsur | 11:41 | |
*** ociuhandu has joined #openstack-nova | 11:52 | |
gibi | sean-k-mooney: Did I understand the comment from the PTG well about when we need direction aware and when we need directionless pps resource? https://review.opendev.org/c/openstack/neutron-specs/+/785236/3/specs/xena/qos-minimum-guaranteed-packet-rate.rst#266 | 11:56 |
sean-k-mooney | sorry have not got to your spec yet today but i'll take a look at that comment now one sec | 11:58 |
*** __ministry has quit IRC | 11:59 | |
sean-k-mooney | gibi: so ya | 11:59 |
sean-k-mooney | we can either put the sum of the direction based values in the request | 12:00 |
sean-k-mooney | or we can have 3 resource classes and just make the user choose the correct policy | 12:00 |
sean-k-mooney | i kind of like having 3 resource classes PPS_TOTAL, PPS_INGRESS and PPS_EGRESS | 12:01 |
gibi | so then in case of hw offloaded OVS the packet processing bottleneck is not some kind of shared resource (like the CPU in case of normal OVS) but some direction specific hardware resource? | 12:01 |
gibi | hence we need to track resource inventory per direction separately | 12:02 |
sean-k-mooney | well yes and no. the pcie bandwidth is a shared resource but hardware offloaded ovs uses sriov VFs for the dataplane | 12:02 |
sean-k-mooney | so we can pretend it's just sriov in this case | 12:03 |
sean-k-mooney | vdpa is more or less the same in that the parent of the vdpa device at least today is a VF | 12:03 |
gibi | so in this case the logic that is implemented in OVS runs on the specific hardware? | 12:03 |
gibi | anyhow I think I get it, the HW offload OVS is like the SRIOV VF case | 12:04 |
sean-k-mooney | how it works is the ovs vswitchd process calculates openflow rules as normal and then uses the tc flower protocol to install those flows into the hardware dataplane if it supports them | 12:05 |
gibi | sean-k-mooney: thanks that helps visualizing the situation for me | 12:05 |
sean-k-mooney | basically you can think of the programming model like a cache: if the hardware can cache the rule in its hardware switch it will process them in hardware. the ovs process just calculates them as normal and they are then copied to the hardware | 12:06 |
sean-k-mooney | what that means in practice is the first packet(s) of each connection is processed in software and then hardware takes over | 12:06 |
gibi | I see | 12:08 |
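The cache analogy above can be illustrated with a toy model. This is only a sketch under stated assumptions: `OffloadedSwitch`, `hw_capacity` and `_compute_action` are hypothetical names, and a simple rule-count limit stands in for whatever actually decides whether a NIC can hold a given rule. It is not OVS code.

```python
from dataclasses import dataclass, field


@dataclass
class OffloadedSwitch:
    """Toy model: the software path computes rules, the hardware caches them."""

    hw_capacity: int  # hypothetical limit on rules the hardware can hold
    hw_rules: dict = field(default_factory=dict)
    software_hits: int = 0
    hardware_hits: int = 0

    def _compute_action(self, flow: tuple) -> str:
        # Stand-in for the vswitchd process calculating the rule as normal.
        return "forward"

    def process(self, flow: tuple) -> str:
        """Return which path handled the packet for this flow."""
        if flow in self.hw_rules:
            self.hardware_hits += 1
            return "hardware"
        # First packet(s) of a connection: handled in software.
        self.software_hits += 1
        action = self._compute_action(flow)
        # Copy the rule to hardware only if the hardware can cache it.
        if len(self.hw_rules) < self.hw_capacity:
            self.hw_rules[flow] = action
        return "software"


switch = OffloadedSwitch(hw_capacity=1)
flow_a = ("10.0.0.1", "10.0.0.2", 80)
flow_b = ("10.0.0.3", "10.0.0.4", 443)
paths = [switch.process(f) for f in (flow_a, flow_a, flow_b, flow_b)]
print(paths)
```

In this model the second packet of `flow_a` is handled in hardware, while `flow_b` stays in software because the (assumed) hardware table is already full.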
gibi | and in this hardware the incoming and outgoing packet processing is implemented independently so we can run out of one direction while the other direction still has capacity | 12:09 |
gibi | and therefore we need separated resource tracking | 12:09 |
sean-k-mooney | i think that depends on the nic but in principle yes the internal capacity of the switch is higher than the physical connectors | 12:10 |
sean-k-mooney | so it can run out of capacity to transmit to the datacenter network before its ability to switch between vms on the same host | 12:10 |
sean-k-mooney | although intel and mellanox until recently had a habit of oversubscribing their port bandwidth | 12:11 |
*** ociuhandu has quit IRC | 12:11 | |
sean-k-mooney | i.e. intel 2*40G fortvilles only had 56Gb/s of pcie bandwidth | 12:11 |
sean-k-mooney | which is fine for the 2*25G card but ya pcie bandwidth and physical port bandwidth can often be less than the chip bandwidth in the data center space | 12:12 |
sean-k-mooney | the assumption often is that you are using 2 port cards for HA not more bandwidth/pps | 12:13 |
gibi | when you say "run out of capacity to transmit to the datacenter" do you mean it runs out of physical network bandwidth? | 12:14 |
sean-k-mooney | yep | 12:14 |
gibi | so while it can still switch packets, it cannot transmit them | 12:14 |
gibi | but then this is showing just that the packet processing capacity is independent from the bandwidth | 12:15 |
gibi | it does not show that switching incoming packets is independent from switching outgoing packets | 12:15 |
sean-k-mooney | ya the 2*25G intel fortvilles actually had 2 40G chips one per uplink port but had 25G phys | 12:15 |
sean-k-mooney | gibi: you are right it does not show they are independent. on the receive side if you have an intel cpu with vt-c | 12:17 |
sean-k-mooney | that allows supported nics to place received packets directly into the cpu's l3 cache | 12:17 |
sean-k-mooney | on transmit i know the cpu can do a dma transfer to the nic queue or indeed do a zerocopy transmit in some cases | 12:18 |
*** ratailor__ has quit IRC | 12:19 | |
gibi | OK, I rest my case. I will allow the admin to configure either two direction aware pps resource inventories (in case of offloaded OVS) or a single directionless resource inventory (for the normal OVS) | 12:20 |
sean-k-mooney | well no | 12:20 |
sean-k-mooney | if you think that scoping this down to just normal ovs for now is the best way forward that would be ok | 12:21 |
sean-k-mooney | it would probably be nice to support both models as you said but we could start with the one you want | 12:21 |
gibi | I can do both, this is not the hard part (I guess :D) | 12:22 |
sean-k-mooney | i just would not like to see logic to translate direction based requests into directionless | 12:22 |
sean-k-mooney | the translation is the bit i found slightly unclean about the proposal | 12:23 |
*** eharney has quit IRC | 12:23 | |
gibi | the QoS rule will be direction aware. So that is good. | 12:24 |
gibi | Then in case of normal OVS should we still allow configuring direction aware resource inventory? | 12:24 |
*** ociuhandu has joined #openstack-nova | 12:24 | |
gibi | that seems wrong to me as both directions are handled by the same set of CPUs running the OVS software | 12:24 |
sean-k-mooney | well i kind of think we should have 3 policies as i said above | 12:24 |
gibi | ohh | 12:24 |
sean-k-mooney | 2 that are direction aware and 1 that is directionless | 12:24 |
sean-k-mooney | that map directly to the PPS_TOTAL, PPS_INGRESS and PPS_EGRESS resource classes | 12:25 |
gibi | so for vnic_type=normal the user should only use a directionless QoS policy | 12:25 |
sean-k-mooney | yep | 12:25 |
gibi | OK now I got your point | 12:25 |
*** eharney has joined #openstack-nova | 12:25 | |
gibi | 3 resource classes and 3 QoS rule types | 12:26 |
sean-k-mooney | so ovs would report PPS_TOTAL inventories and the directionless pps rule would consume it | 12:26 |
sean-k-mooney | yep | 12:26 |
sean-k-mooney | and no need for special logic to translate between them | 12:26 |
gibi | OK so it is a bit more complexity for the user to select the proper QoS rule but in return we never lie in the QoS policy about the direction awareness | 12:27 |
sean-k-mooney | sorry i thought i had said that at the PTG but maybe i was not very clear | 12:27 |
gibi | sean-k-mooney: no it is my bad, I mixed the QoS policy part with the resource classes | 12:27 |
gibi | next step | 12:28 |
*** gokhani has quit IRC | 12:28 | |
sean-k-mooney | but ya it's a little more complex for the tenant but simpler in code | 12:28 |
sean-k-mooney | at least i think it is | 12:28 |
gibi | yepp | 12:28 |
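The "3 resource classes and 3 QoS rule types" outcome can be sketched as a direct one-to-one mapping with no translation step. The resource class names come from the discussion; `Direction`, `resource_request` and `fits` are hypothetical helpers for illustration only, not neutron or placement code.

```python
from enum import Enum
from typing import Optional


class Direction(Enum):
    INGRESS = "ingress"
    EGRESS = "egress"


# One resource class per rule type; None means a directionless rule.
RESOURCE_CLASS_BY_DIRECTION = {
    None: "PPS_TOTAL",
    Direction.INGRESS: "PPS_INGRESS",
    Direction.EGRESS: "PPS_EGRESS",
}


def resource_request(min_pps: int, direction: Optional[Direction] = None) -> dict:
    """Map a minimum packet rate rule straight onto its resource class."""
    return {RESOURCE_CLASS_BY_DIRECTION[direction]: min_pps}


def fits(inventory: dict, request: dict) -> bool:
    """True if the inventory holds every requested resource class."""
    return all(inventory.get(rc, 0) >= amount for rc, amount in request.items())


# Assumed inventories: normal OVS reports only PPS_TOTAL, an offloaded
# backend reports the two direction aware classes.
normal_ovs = {"PPS_TOTAL": 10000}
offloaded_ovs = {"PPS_INGRESS": 5000, "PPS_EGRESS": 5000}

directionless = resource_request(1000)
ingress_only = resource_request(1000, Direction.INGRESS)
print(directionless, ingress_only)
```

Because nothing is translated, a direction aware rule simply does not match a backend that only reports `PPS_TOTAL`, which is the trade-off discussed above: the user has to pick the rule type that fits the backend.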
*** gokhani has joined #openstack-nova | 12:28 | |
*** ociuhandu has quit IRC | 12:29 | |
gibi | so next step, when we have the directionless QoS policy and later we want to implement data plane enforcement for that, then how do we map the directionless min guarantee to direction aware pps rate limit rules to do enforcement? | 12:29 |
gibi | this is the packet rate limiter work https://bugs.launchpad.net/neutron/+bug/1912460 from Liu | 12:31 |
openstack | Launchpad bug 1912460 in neutron "[RFE] [QoS] add qos rule type packet per second (pps)" [Wishlist,In progress] - Assigned to LIU Yulong (dragon889) | 12:31 |
gibi | and as I understood from ralonso the basic solution for the guarantee is to set up max limits := min guarantee for all the traffic | 12:31 |
sean-k-mooney | that is a good question we have 3 choices i guess, 1 don't, 2 split it evenly between each direction (assuming it's a shared resource), 3 assume they are independent so allow it for both | 12:31 |
*** gokhani has quit IRC | 12:31 | |
*** whoami-rajat_ is now known as whoami-rajat | 12:32 | |
*** gokhani has joined #openstack-nova | 12:32 | |
*** hemanth_n has quit IRC | 12:32 | |
sean-k-mooney | gibi: ya so like bandwidth, the mins for ovs are not enforced by ovs so you would use max pps instead to do enforcement | 12:33 |
sean-k-mooney | for any backend that can enforce it then min should be enough | 12:33 |
sean-k-mooney | and you should not need max rules | 12:33 |
gibi | OK so in case of normal OVS where we have the directionless QoS we know that it is not an independent resource and therefore we would need to split it but splitting it equally might not be what the user intended originally | 12:34 |
sean-k-mooney | gibi: well since the enforcement will be done by the max policy they could use a directionless min policy with a directionful? pair of max policies | 12:35 |
sean-k-mooney | so min_total_pps=1000 max_ingress_pps=200 max_egress_pps=800 | 12:36 |
gibi | so we are sure that OVS will never have the capability to enforce min alone? so we always need to manually define both min and max | 12:36 |
sean-k-mooney | well intel were trying to get this into ovs-dpdk for a few years but could never get around the fact they would basically have to dedicate 1 core to doing the min enforcement | 12:37 |
sean-k-mooney | i don't think the kernel implementation has made any progress either | 12:37 |
sean-k-mooney | so i dont know | 12:37 |
gibi | OK, thanks I think I got the full picture | 12:37 |
gibi | I will go and document it now in the neutron spec | 12:38 |
gibi | I really appreciate your help | 12:39 |
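One possible reading of the min_total_pps / max_ingress_pps / max_egress_pps example is that the two directional caps together should not exceed the directionless guarantee, so a port can never consume more than the share reserved for it. The check below is a sketch of that reading only; the function name and the rule itself are assumptions, not something the spec or neutron defines.

```python
def check_pps_policy(min_total_pps: int, max_ingress_pps: int,
                     max_egress_pps: int) -> list:
    """Return a list of problems found in a min/max pps policy triple.

    Assumed rule: the directional max limits, added together, must stay
    within the directionless minimum guarantee.
    """
    problems = []
    values = (min_total_pps, max_ingress_pps, max_egress_pps)
    if any(value < 0 for value in values):
        problems.append("pps values must not be negative")
    if max_ingress_pps + max_egress_pps > min_total_pps:
        problems.append(
            "max_ingress_pps + max_egress_pps exceeds min_total_pps"
        )
    return problems


# The values from the discussion: an uneven 200/800 split of 1000.
ok = check_pps_policy(min_total_pps=1000, max_ingress_pps=200,
                      max_egress_pps=800)
too_much = check_pps_policy(min_total_pps=1000, max_ingress_pps=600,
                            max_egress_pps=800)
print(ok, too_much)
```

Letting the user choose the split explicitly avoids guessing an even 500/500 division that might not match the intended traffic pattern.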
sean-k-mooney | just did some googling maybe they have some support https://mail.openvswitch.org/pipermail/ovs-dev/2016-July/317678.html | 12:40 |
*** ociuhandu has joined #openstack-nova | 12:40 | |
sean-k-mooney | i don't actually see min support | 12:41 |
sean-k-mooney | although it's in the examples http://www.openvswitch.org/support/dist-docs/ovs-vsctl.8.txt | 12:42 |
sean-k-mooney | min-rate is bandwidth though not pps | 12:43 |
*** lpetrut has quit IRC | 12:44 | |
gibi | OK | 12:45 |
*** johanssone has quit IRC | 12:51 | |
*** johanssone has joined #openstack-nova | 12:54 | |
sean-k-mooney | gibi: so looking at https://www.openvswitch.org/support/dist-docs/ovs-vswitchd.conf.db.5.html the queue table does support min-rate now but if i remember correctly it is very challenging to use that correctly | 12:55 |
sean-k-mooney | other_config : min-rate: optional string, containing an integer, at | 12:55 |
sean-k-mooney | least 1 | 12:55 |
sean-k-mooney | Minimum guaranteed bandwidth, in bit/s. | 12:55 |
gibi | yapp that's just bandwidth not packet rate | 12:55 |
sean-k-mooney | that sounds great but to use this you have to manually create queues on each port and then instead of output to the port or using the normal action you have to output to the queue explicitly | 12:56 |
sean-k-mooney | neutron could use that for traffic ingressing to the vm but only if it stopped using the normal action in ml2/ovs | 12:57 |
sean-k-mooney | ovn could use that for ingress to the vm but on egress from the vm it would not really work | 12:57 |
sean-k-mooney | and ya it's bandwidth not pps so it won't help in your case | 12:58 |
gibi | ack | 12:58 |
*** iurygregory has quit IRC | 12:59 | |
*** macz_ has joined #openstack-nova | 13:05 | |
*** eharney has quit IRC | 13:18 | |
*** eharney has joined #openstack-nova | 13:18 | |
openstackgerrit | Daniel Bengtsson proposed openstack/nova master: Use the new type HostDomainOpt. https://review.opendev.org/c/openstack/nova/+/788240 | 13:20 |
*** johanssone has quit IRC | 13:21 | |
*** johanssone has joined #openstack-nova | 13:28 | |
*** iurygregory has joined #openstack-nova | 13:35 | |
*** ociuhandu has quit IRC | 13:52 | |
*** ociuhandu has joined #openstack-nova | 13:53 | |
*** ociuhandu has quit IRC | 13:58 | |
*** k_mouza has quit IRC | 14:14 | |
*** k_mouza_ has joined #openstack-nova | 14:14 | |
*** ociuhandu has joined #openstack-nova | 14:14 | |
*** ociuhandu has quit IRC | 14:14 | |
*** zoharm has joined #openstack-nova | 14:14 | |
*** ociuhandu has joined #openstack-nova | 14:16 | |
*** k_mouza_ has quit IRC | 14:20 | |
*** k_mouza has joined #openstack-nova | 14:20 | |
*** ociuhandu has quit IRC | 14:20 | |
*** ociuhandu has joined #openstack-nova | 14:21 | |
*** dave-mccowan has joined #openstack-nova | 14:36 | |
*** kinpaa12389 has quit IRC | 14:38 | |
*** jobewan has quit IRC | 14:43 | |
lyarwood | elod / bauzas / melwitt ; https://review.opendev.org/c/openstack/nova/+/787943 - If you have time today can I get reviews on this backport please. | 14:51 |
bauzas | lyarwood: we have our meeting in 9 mins and then I'll need to go to a blood donation but I'll try | 14:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add functional regression test for bug 1853009 https://review.opendev.org/c/openstack/nova/+/695012 | 14:53 |
openstack | bug 1853009 in OpenStack Compute (nova) ussuri "Ironic node rebalance race can lead to missing compute nodes in DB" [High,In progress] https://launchpad.net/bugs/1853009 - Assigned to Mark Goddard (mgoddard) | 14:53 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Clear rebalanced compute nodes from resource tracker https://review.opendev.org/c/openstack/nova/+/695187 | 14:53 |
lyarwood | bauzas: yup no issues if you can't get to it | 14:53 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Invalidate provider tree when compute node disappears https://review.opendev.org/c/openstack/nova/+/695188 | 14:54 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Prevent deletion of a compute node belonging to another host https://review.opendev.org/c/openstack/nova/+/694802 | 14:54 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Fix inactive session error in compute node creation https://review.opendev.org/c/openstack/nova/+/695189 | 14:54 |
bauzas | lyarwood: /me clicks anyway | 14:54 |
lyarwood | ah crap the patch below hasn't actually landed | 14:55 |
lyarwood | sorry I thought it had, this isn't urgent in that case | 14:55 |
*** jobewan has joined #openstack-nova | 14:57 | |
*** prometheanfire has left #openstack-nova | 14:57 | |
*** zoharm has quit IRC | 14:59 | |
*** rcernin has quit IRC | 15:01 | |
*** ociuhandu has quit IRC | 15:07 | |
*** ociuhandu has joined #openstack-nova | 15:08 | |
*** rcernin has joined #openstack-nova | 15:12 | |
*** ociuhandu has quit IRC | 15:13 | |
*** ociuhandu has joined #openstack-nova | 15:15 | |
*** rcernin has quit IRC | 15:17 | |
*** mlavalle has joined #openstack-nova | 15:22 | |
melwitt | lyarwood: ack, I can look | 15:23 |
*** gokhani has quit IRC | 15:29 | |
*** dklyle has joined #openstack-nova | 15:38 | |
*** damien_r has joined #openstack-nova | 15:41 | |
*** ociuhandu has quit IRC | 15:42 | |
*** jobewan has quit IRC | 15:43 | |
*** ociuhandu has joined #openstack-nova | 15:43 | |
*** jobewan has joined #openstack-nova | 15:44 | |
*** ociuhandu has quit IRC | 15:48 | |
stephenfin | Is it just me, or is nova-grenade-multinode failing 100% right now? | 15:54 |
* stephenfin compares two jobs | 15:55 | |
stephenfin | Yeah, looks like Ubuntu is broken | 15:56 |
stephenfin | Failed to start rtslib-fb-targetctl.service: Unit rtslib-fb-targetctl.service is not loaded properly: Exec format error. | 15:56 |
lyarwood | yeah it is | 15:57 |
lyarwood | stephenfin: https://bugs.launchpad.net/devstack/+bug/1926411 https://review.opendev.org/c/openstack/devstack/+/788429 | 15:57 |
openstack | Launchpad bug 1926411 in devstack "legacy bionic based dsvm jobs failing with Failed to start rtslib-fb-targetctl.service" [Undecided,In progress] | 15:57 |
stephenfin | aha, thanks | 15:58 |
stephenfin | I saw something about bionic scroll by earlier but didn't realize it was related | 15:58 |
lyarwood | argh and that's blocked by https://bugs.launchpad.net/devstack/+bug/1926434 | 15:59 |
openstack | Launchpad bug 1926434 in devstack "devstack@memory_tracker.service: Main process exited, code=exited, status=1/FAILURE" [High,In progress] | 15:59 |
lyarwood | and https://review.opendev.org/c/openstack/devstack/+/620198 | 15:59 |
*** rcernin has joined #openstack-nova | 15:59 | |
*** lucasagomes has quit IRC | 16:04 | |
*** rcernin has quit IRC | 16:05 | |
*** dklyle has quit IRC | 16:12 | |
lyarwood | stephenfin: oh and https://review.opendev.org/c/openstack/nova/+/778885 hasn't landed yet but is also seeing failures related to https://bugs.launchpad.net/devstack/+bug/1926434 | 16:12 |
openstack | Launchpad bug 1926434 in devstack "devstack@memory_tracker.service: Main process exited, code=exited, status=1/FAILURE" [High,In progress] | 16:12 |
*** dklyle has joined #openstack-nova | 16:20 | |
*** dtantsur is now known as dtantsur|afk | 16:22 | |
*** rcernin has joined #openstack-nova | 16:38 | |
*** rpittau is now known as rpittau|afk | 16:39 | |
*** rcernin has quit IRC | 16:43 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova-specs master: QoS minimum guaranteed packet rate https://review.opendev.org/c/openstack/nova-specs/+/785014 | 16:56 |
*** rcernin has joined #openstack-nova | 16:57 | |
*** rcernin has quit IRC | 17:02 | |
*** derekh has quit IRC | 17:04 | |
*** hamalq has joined #openstack-nova | 17:04 | |
*** hamalq has quit IRC | 17:05 | |
*** hamalq has joined #openstack-nova | 17:06 | |
*** k_mouza has quit IRC | 17:10 | |
*** bnemec has quit IRC | 17:16 | |
*** bnemec has joined #openstack-nova | 17:16 | |
*** lbragstad has quit IRC | 17:37 | |
*** lbragstad has joined #openstack-nova | 17:38 | |
*** gyee has joined #openstack-nova | 17:39 | |
*** ralonsoh has quit IRC | 17:53 | |
*** andrewbonney has quit IRC | 18:00 | |
*** jmlowe has quit IRC | 18:05 | |
openstackgerrit | Jessie Lass proposed openstack/nova master: Add emulation support if host arch != guest arch. https://review.opendev.org/c/openstack/nova/+/772156 | 18:21 |
*** k_mouza has joined #openstack-nova | 19:10 | |
*** k_mouza has quit IRC | 19:14 | |
*** amodi has quit IRC | 19:20 | |
*** vishalmanchanda has quit IRC | 19:40 | |
*** clarkb has joined #openstack-nova | 19:59 | |
clarkb | Hello nova. Hitting a weird error that doesn't happen 100% of the time. https://zuul.opendev.org/t/openstack/build/c7c698c95f2b4590ba95f14a6c97ffc0/log/nodepool/openstack/screen-n-api.txt#1735-1805 seems like when doing a server create sometimes nova forgets about the keystone domain? | 19:59 |
clarkb | note this would've been run against master devstack (which is master nova). And the rough sequence of events that sometimes trips it is: `source /opt/devstack/openrc admin admin && openstack --os-project-name demo --os-username demo server create --flavor cirros256 --image $cirros_image unmanaged-vm --network public` | 20:00 |
clarkb | one suspicion for why it doesn't fail 100% of the time is that the traceback shows nova hitting _refresh_neutron_extensions_cache() which maybe isn't called on every request? | 20:01 |
*** lbragstad_ has joined #openstack-nova | 20:01 | |
clarkb | note that the devstack openrc should be providing the keystone user and project domain values | 20:01 |
clarkb | Is this familiar to anyone? I'm happy to help debug further but quickly running out of thread to pull on | 20:02 |
*** lbragstad has quit IRC | 20:03 | |
*** dave-mccowan has quit IRC | 20:22 | |
*** dave-mccowan has joined #openstack-nova | 20:26 | |
*** jmlowe has joined #openstack-nova | 20:39 | |
*** _erlon_ has joined #openstack-nova | 20:42 | |
*** whoami-rajat has quit IRC | 20:52 | |
*** amodi has joined #openstack-nova | 20:54 | |
*** brinzhang0 has joined #openstack-nova | 20:59 | |
*** brinzhang_ has quit IRC | 21:02 | |
openstackgerrit | Merged openstack/nova stable/train: libvirt: 'video.vram' property must be an integer https://review.opendev.org/c/openstack/nova/+/757618 | 21:24 |
gmann | clarkb: nova works on the default domain, and there is no recent change on the devstack side to remove the default domain or anything like that. | 21:25 |
*** slaweq has quit IRC | 21:25 | |
gmann | i am not sure why that is happening. | 21:25 |
*** andrewbogott has joined #openstack-nova | 21:31 | |
andrewbogott | This is mostly a question for dansmith but I welcome advice from anyone :) I've just upgraded my install to V and all of my hypervisors are saying "Current Nova version does not support computes older than Ussuri but the minimum compute service level in your cell is 40..." | 21:32 |
andrewbogott | A totally reasonable warning except I cannot for the life of me figure out /why/ that's the minimum service level. All my nodes are updated, | 21:33 |
andrewbogott | I've read the code, looked in all my DBs, no idea where that 40 is coming from. Can you tell me what the process is for calculating that? Is it cached someplace? | 21:33 |
dansmith | it should be stored on the service record | 21:34 |
dansmith | and yeah I'd guess maybe some old record is being considered (or you have one node not updated like you think) | 21:34 |
andrewbogott | tell me more about 'the service record'? | 21:35 |
dansmith | the records in the services table in the db | 21:35 |
andrewbogott | so you would think that 'delete from services where topic='compute' and version != 53;' would fix it right? | 21:36 |
andrewbogott | (which I already did and it didn't) | 21:36 |
dansmith | no, I'd say look at them first and let's figure out what's going on | 21:36 |
dansmith | they're all the new version? | 21:36 |
andrewbogott | there were several entries in there from nodes that don't exist anymore referring to old versions | 21:37 |
andrewbogott | that's why I ran that delete. | 21:37 |
dansmith | okay, and you've restarted services? | 21:38 |
andrewbogott | maybe not, which services would I need to restart? apis? | 21:38 |
dansmith | where are you seeing the warning? | 21:39 |
andrewbogott | nova-compute logs | 21:39 |
andrewbogott | on startup | 21:39 |
andrewbogott | (so yeah, I've definitely restarted those) | 21:40 |
andrewbogott | well -- to be clear, I haven't restarted all my compute services, only the one I'm watching the logs for | 21:40 |
*** rcernin has joined #openstack-nova | 21:40 | |
andrewbogott | but I definitely restarted them all after the version upgrade | 21:41 |
andrewbogott | I'm happy to hack in a debug line if there's someplace I can dump the name of the service it's finding that's sub-40 | 21:41 |
andrewbogott | (thank you, btw, for immediately responding to my unfunded tech support request!) | 21:42 |
dansmith | okay, well, tbh I barely remember how this works, and I don't think I wrote it | 21:43 |
dansmith | but restarting nova-compute should be plenty for nova-compute to re-survey things | 21:43 |
dansmith | do you have multiple cell dbs? | 21:44 |
andrewbogott | I have an empty cell0 and one actual in-use cell | 21:44 |
*** bnemec has quit IRC | 21:44 | |
andrewbogott | I can also just ignore the warning since it's clearly harmless in the current version. Patch comments imply that it might be a hard error in the future though. | 21:45 |
dansmith | yeah, so all of the services run that code, not just nova-compute | 21:45 |
dansmith | right, hard error now | 21:45 |
andrewbogott | ok, should probably sort it out before I hit that wall then | 21:46 |
* andrewbogott frantically restarts things | 21:46 | |
*** bnemec has joined #openstack-nova | 21:46 | |
dansmith | victoria is 52 according to the code | 21:46 |
dansmith | you said 53 above | 21:46 |
andrewbogott | maybe it's a subversion? That's what I see in my db | 21:47 |
dansmith | no, 53 was some time in wallaby | 21:48 |
dansmith | oh, | 21:49 |
dansmith | the alias is wrong | 21:49 |
andrewbogott | ok, I just ran a fleet-wide restart of all services (api, scheduler, conductor, api-metadata, and placement-api for good measure) | 21:49 |
andrewbogott | and I think the warning has gone away. Let me double check on another host... | 21:49 |
andrewbogott | yah, it's gone | 21:50 |
andrewbogott | So it was cached... somewhere! | 21:50 |
andrewbogott | thanks for talking me through it dansmith, this is my mistake for despairing before observing rule one | 21:51 |
dansmith | well, it's really not cached anywhere other than in memory in a process, fwiw | 21:51 |
dansmith | but...good? | 21:51 |
dansmith | however, it looks to me like the alias in V is wrong, it should be 53 as you note | 21:52 |
andrewbogott | you're right, 'cached' isn't really the right word for it | 21:52 |
dansmith | I don't think it would cause the situation you were seeing | 21:52 |
andrewbogott | I wonder if there should be some clean-up stage that removes records from obviously-no-longer-running services | 21:52 |
andrewbogott | Or at least that version check could be restricted to running services | 21:52 |
dansmith | well, if you do service delete that will happen, but we don't want to do that automatically since we don't know if you've just got something offline for a long time | 21:53 |
dansmith | well, | 21:53 |
andrewbogott | yeah, I assume the issue is me decom'ing hardware before doing the service delete | 21:53 |
andrewbogott | although this was for hosts that no longer appeared in the cli | 21:53 |
andrewbogott | they were hidden in the DB | 21:54 |
dansmith | we want to not regress the minimum version by starting something old, nor do we want to migrate some data and then have an older service get turned on and be confused by it | 21:54 |
dansmith | hidden in the db? | 21:54 |
dansmith | if it's not ignoring deleted records then that's definitely a bug | 21:54 |
andrewbogott | just -- I found references to hosts in the db that didn't appear in 'openstack compute service list' | 21:54 |
andrewbogott | but I guess since I didn't restart all services after every db change I don't know which bit was causing the problem :( | 21:54 |
dansmith | well, there might still be an issue with considering deleted records it sounds like, which would be a bug if you want to file it | 21:55 |
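The point about deleted records can be illustrated with a small, self-contained sketch. It is not nova's actual code: the table below is a heavily simplified stand-in for the services table (only `host`, `binary`, `version`, and a soft-delete flag `deleted` are assumed), and `minimum_compute_version` is a hypothetical helper. It shows how a live row left behind by a decommissioned host drags the minimum down, while a soft-deleted row is ignored.

```python
import sqlite3


def minimum_compute_version(conn):
    """Return the lowest version among non-deleted nova-compute rows.

    Hypothetical helper over a simplified schema; returns None if no rows match.
    """
    row = conn.execute(
        """
        SELECT MIN(version) FROM services
        WHERE "binary" = 'nova-compute' AND deleted = 0
        """
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE services (
        host TEXT, "binary" TEXT, version INTEGER, deleted INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO services VALUES (?, ?, ?, ?)",
    [
        ("compute1", "nova-compute", 53, 0),
        ("compute2", "nova-compute", 53, 0),
        # Decommissioned host whose row was never removed: still counted.
        ("old-node", "nova-compute", 40, 0),
        # Soft-deleted row: excluded by the deleted = 0 filter.
        ("older-node", "nova-compute", 35, 1),
    ],
)

stale_min = minimum_compute_version(conn)

# Removing the leftover row changes what a fresh query would see.
conn.execute("DELETE FROM services WHERE host = 'old-node'")
clean_min = minimum_compute_version(conn)

print(stale_min, clean_min)
```

In this toy setup the soft-deleted version-35 row never influences the result. If a real deployment showed the opposite behavior, that would match the bug described above.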
andrewbogott | at the very least there's an issue with <unknown> service still reporting the version of a deleted service, at least until it's restarted | 21:56 |
dansmith | well, do we know if it's deleted or just not being shown in the API? | 21:56 |
andrewbogott | good point, it could be either | 21:56 |
dansmith | I don't know how the api behaves | 21:56 |
andrewbogott | I guess I need to figure out if I can reproduce it | 21:56 |
dansmith | but yes, if it's that, then definitely a bug | 21:56 |
andrewbogott | dansmith: does that minimum_version_check happen via RPC? I stuck a bunch of debug lines in the version check on my compute node and they were never traversed. | 21:57 |
dansmith | computes can't talk to the database, so yes | 21:57 |
dansmith | but other services go straight to the db for it | 21:57 |
andrewbogott | I don't think I've ever seen @base.remotable_classmethod but I take that to mean 'this happens on a totally different system' | 21:58 |
andrewbogott | or at least can | 21:58 |
dansmith | it does, if coming from compute, | 21:59 |
andrewbogott | ok | 21:59 |
clarkb | gmann: ya and it works like 90% of the time | 21:59 |
andrewbogott | that solves another mystery then | 21:59 |
dansmith | and yes, you might be right.. you might need to restart conductor which does that on behalf of the compute | 21:59 |
clarkb | gmann: that is why I suspect maybe it has to do with the cache refresh since in theory that only happens if you exceed timeouts or similar | 21:59 |
dansmith | yeah, I hadn't really thought of that, but I guess we'd get conductor's in-memory "cache" of that version | 21:59 |
andrewbogott | That seems like the simplest explanation. Not easy to fix though | 22:00 |
andrewbogott | other than having the 'delete' command print | 22:00 |
dansmith | well, | 22:00 |
andrewbogott | "ok now restart your conductor" | 22:00 |
dansmith | not sure that needs a fix | 22:00 |
andrewbogott | d'you think the conductor would've refreshed its state eventually? | 22:00 |
dansmith | no, all the nodes will hold that value in memory until restart, that's kinda the point | 22:01 |
clarkb | gmann: fungi mentioned it could possibly be an openstack(client|sdk) regression where it isn't passing that info properly | 22:01 |
dansmith | API and scheduler will do the same | 22:01 |
dansmith | it's just that restarting an api worker will resolve it, whereas computes are dependent on the conductors for it | 22:01 |
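The "held in memory until restart" behavior described here can be sketched as a process-local value that is computed once and then reused. This is only an illustration under stated assumptions: `MinVersionCache`, its `reset` method (standing in for a restart or HUP), and the `db_versions` dict (standing in for the database) are all hypothetical names, not nova's implementation.

```python
class MinVersionCache:
    """Process-local holder for a minimum service version (hypothetical)."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # callable that queries the "database"
        self._cached = None

    def get(self):
        # Only the first call hits the database; later calls reuse the value.
        if self._cached is None:
            self._cached = self._load()
        return self._cached

    def reset(self):
        # Stand-in for restarting the process, which drops the in-memory value.
        self._cached = None


# Stand-in for the service records: host name -> service version.
db_versions = {"compute1": 53, "old-node": 40}

conductor = MinVersionCache(lambda: min(db_versions.values()))

before = conductor.get()        # computed from the database on first use
del db_versions["old-node"]     # the stale record is removed afterwards
after_delete = conductor.get()  # still the old in-memory value
conductor.reset()               # "restart" the conductor
after_restart = conductor.get() # recomputed from the cleaned-up records

print(before, after_delete, after_restart)
```

Under these assumptions, a compute that asks the conductor for the minimum version keeps getting the stale answer until the conductor itself is restarted, which is consistent with the warning disappearing only after the fleet-wide restart.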
andrewbogott | It's really the conductor's job to provide /current/ database state though isn't it? | 22:03 |
andrewbogott | Are there reasons why we would want the conductor to not re-read for every query? (Other than performance) | 22:03 |
dansmith | yeah, it's expensive, and it also controls conductor's RPC pin, which we don't want to shift at runtime unless you restart or HUP it | 22:04 |
dansmith | the conductor also translates things according to the version pin and the versions supported by the compute that is asking for a thing | 22:05 |
dansmith | it's not just a transparent proxy | 22:05 |
andrewbogott | hmmmm | 22:06 |
andrewbogott | So really it should be pushed -- when a service is deleted it should force some kind of reload (rather than the conductor re-reading the db periodically) | 22:06 |
*** rcernin has quit IRC | 22:06 | |
andrewbogott | that sounds like a pain to implement for a tiny gain | 22:06 |
*** rcernin has joined #openstack-nova | 22:07 | |
dansmith | you want to fan-out to every service in the cluster every time a service is deleted just in case some of them need to re-calculate the min version? | 22:07 |
dansmith | seems like a lot of pain for little gain | 22:07 |
dansmith | or maybe that's what you meant. | 22:07 |
andrewbogott | yeah, a lot of pain | 22:08 |
andrewbogott | So a real bug but not one that's worth fixing | 22:09 |
openstackgerrit | Dmitrii Shcherbakov proposed openstack/nova-specs master: Introduce Transport Nodes https://review.opendev.org/c/openstack/nova-specs/+/787458 | 22:09 |
*** k_mouza has joined #openstack-nova | 22:09 | |
andrewbogott | dansmith: I need to run -- thanks again for getting me unstuck! | 22:10 |
*** k_mouza has quit IRC | 22:14 | |
dansmith | yup | 22:15 |
*** macz_ has quit IRC | 22:42 | |
*** tosky has quit IRC | 22:48 | |
*** luksky has quit IRC | 22:56 | |
*** andrewbogott has left #openstack-nova | 23:25 | |
*** martinkennelly has quit IRC | 23:56 | |
*** mlavalle has quit IRC | 23:59 |