Wednesday, 2021-04-28

*** CeeMac has quit IRC00:12
*** k_mouza has quit IRC00:16
*** rcernin has joined #openstack-nova00:21
*** sapd1_x has quit IRC00:40
*** swp20 has quit IRC00:50
*** __ministry has joined #openstack-nova01:19
*** swp20 has joined #openstack-nova01:27
*** LinPeiWen has quit IRC01:44
*** k_mouza has joined #openstack-nova01:48
*** hamalq has quit IRC01:52
*** hamalq has joined #openstack-nova01:52
*** k_mouza has quit IRC01:52
*** priteau has quit IRC02:03
*** sorrison has quit IRC02:22
*** LinPeiWen has joined #openstack-nova02:39
*** sapd1_x has joined #openstack-nova02:40
*** hemanth_n has joined #openstack-nova02:49
*** mkrai has joined #openstack-nova03:46
*** sapd1_x has quit IRC04:26
*** markmcclain has quit IRC04:27
*** markmcclain has joined #openstack-nova04:28
*** sapd1_x has joined #openstack-nova04:34
*** ratailor has joined #openstack-nova04:37
*** vishalmanchanda has joined #openstack-nova04:40
*** mkrai_ has joined #openstack-nova04:53
*** mkrai has quit IRC04:54
*** ralonsoh has joined #openstack-nova05:27
*** sapd1_x has quit IRC05:39
*** slaweq has joined #openstack-nova06:00
*** k_mouza has joined #openstack-nova06:01
*** k_mouza has quit IRC06:06
*** damien_r has joined #openstack-nova06:06
*** damien_r has quit IRC06:11
*** lpetrut has joined #openstack-nova06:16
*** swp20 has quit IRC06:31
*** luksky has joined #openstack-nova06:34
*** gyee has quit IRC06:35
*** mkrai_ has quit IRC06:35
*** dklyle has quit IRC06:59
*** hamalq has quit IRC07:03
*** gokhani has joined #openstack-nova07:11
*** andrewbonney has joined #openstack-nova07:15
*** rcernin has quit IRC07:25
*** rpittau|afk is now known as rpittau07:36
*** hamalq has joined #openstack-nova07:39
*** k_mouza has joined #openstack-nova07:45
*** k_mouza has quit IRC07:50
*** yonglihe has joined #openstack-nova07:50
*** martinkennelly has joined #openstack-nova07:58
gibifyi it seems grenade is broken on master https://zuul.opendev.org/t/openstack/builds?job_name=nova-grenade-multinode&project=openstack%2Fnova&branch=master07:59
gibiwith an error in cinder upgrade07:59
gibihttps://zuul.opendev.org/t/openstack/build/7ee322bd1c024bf0a1855c7625534c29/log/logs/grenade.sh.txt#4686707:59
gibi"Failed to start rtslib-fb-targetctl.service: Unit rtslib-fb-targetctl.service is not loaded properly: Exec format error."07:59
*** derekh has joined #openstack-nova08:01
*** hamalq has quit IRC08:02
*** lucasagomes has joined #openstack-nova08:03
lyarwoodcrap, I bet that's my lioadm devstack change08:04
gibiit seems it interferes with the bionic jobs08:04
lyarwoodhttps://review.opendev.org/c/openstack/devstack/+/77962408:04
lyarwoodI was sure I pinned the bionic jobs to tgtadm08:05
gibihttp://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Failed%20to%20start%20rtslib-fb-targetctl.service%5C%22 this query shows that it is mostly hit on bionic08:05
lyarwoodoh it's because this is still the old grenade multinode job08:07
lyarwoodgah08:07
lyarwoodwe really need to land https://review.opendev.org/c/openstack/nova/+/778885 soonish assuming everyone is okay with the job08:08
gibiI'm on it08:08
openstackgerritBrin Zhang proposed openstack/nova master: Replaces tenant_id with project_id from List SG API  https://review.opendev.org/c/openstack/nova/+/76672608:09
*** hamalq has joined #openstack-nova08:09
gibilyarwood: fyi there are jobs other than the nova grenade jobs out there that are blocked now, so while I totally agree with moving forward with the new grenade jobs in nova it is not a full fix for the gate (it fixes the nova gate though :))08:10
openstackgerritLee Yarwood proposed openstack/nova master: zuul: Pin nova-grenade-multinode to tgtadm CINDER_ISCSI_HELPER  https://review.opendev.org/c/openstack/nova/+/78842508:14
lyarwoodgibi: so the other fix is that ^08:14
* lyarwood looks at the logstash query08:14
lyarwoodyeah these are all dsvm bionic jobs, gah08:15
*** ociuhandu has joined #openstack-nova08:15
lyarwoodlet me fix this in devstack actually08:17
lyarwoodthese jobs really need to be updated anyway08:18
gibiOK08:20
gibiI'll continue reviewing the patch that adds the new grenade job to nova08:20
*** rcernin has joined #openstack-nova08:21
*** ociuhandu has quit IRC08:25
*** ociuhandu has joined #openstack-nova08:26
*** k_mouza has joined #openstack-nova08:30
*** gokhani has quit IRC08:31
*** ociuhandu has quit IRC08:32
*** rcernin has quit IRC08:32
openstackgerritLee Yarwood proposed openstack/nova master: DNM - testing nova-grenade-multinode fix  https://review.opendev.org/c/openstack/nova/+/78843008:35
*** gokhani has joined #openstack-nova08:37
*** sapd1_x has joined #openstack-nova08:37
*** brinzhang0 has joined #openstack-nova08:39
lyarwoodgibi: https://review.opendev.org/c/openstack/devstack/+/788429 should fix all jobs08:39
*** ociuhandu has joined #openstack-nova08:41
*** brinzhang_ has quit IRC08:42
gibidoes that ^^ supersede https://review.opendev.org/c/openstack/nova/+/788425 ?08:45
lyarwoodgibi: well I've posted three changes in all: one that fixes just nova-grenade-multinode as it currently is, one that fixes nova-grenade-multinode by moving to the zuulv3 based jobs, and a devstack change to fix all bionic based jobs08:47
lyarwoodgibi: https://review.opendev.org/c/openstack/nova/+/788425 is something we can land quickly IMHO08:47
gibiI've just approved the zuul v3 move08:47
lyarwoodah there we go then08:47
lyarwoodI'll drop https://review.opendev.org/c/openstack/nova/+/78842508:47
gibiOk08:48
gibithanks08:48
*** bnemec has quit IRC08:48
lyarwoodthe zuulv3 change should also move our grenade testing to focal to focal finally08:48
lyarwoodwe dropped bionic back in ussuri08:48
*** bnemec has joined #openstack-nova08:49
*** mkrai has joined #openstack-nova08:53
sean-k-mooneydoes anyone know why we have an os_type column in the instances table09:11
sean-k-mooneyas far as i can tell we always use the os_type image metadata property when generating xmls or scheduling09:12
sean-k-mooneynot an os_type from the instance09:12
*** k_mouza has quit IRC09:18
*** k_mouza has joined #openstack-nova09:19
*** ratailor_ has joined #openstack-nova09:22
*** priteau has joined #openstack-nova09:22
stephenfinbauzas: Are you okay with me addressing your nits and then reviewing https://review.opendev.org/c/openstack/nova/+/695012 myself? I think the bulk of the change will stay the same09:23
bauzasstephenfin: woah, long story here, I can't remember the context but sure09:23
sean-k-mooneywe really should try and get that series merged09:24
*** ratailor has quit IRC09:25
*** sapd1_x has quit IRC09:26
sean-k-mooneyhmm i wonder why i never left any review comments on that, unless it's a different series from mark09:26
sean-k-mooneyoh i was thinking of https://review.opendev.org/c/openstack/nova/+/710848 https://review.opendev.org/c/openstack/nova/+/710847 and https://review.opendev.org/c/openstack/nova/+/76035409:29
*** k_mouza has quit IRC09:30
*** rcernin has joined #openstack-nova09:39
*** k_mouza has joined #openstack-nova09:41
*** hamalq has quit IRC09:44
*** tosky has joined #openstack-nova09:45
*** sapd1_x has joined #openstack-nova09:49
stephenfingibi: Could I get you to review the small diff on https://review.opendev.org/c/openstack/nova/+/676209 yet again, please? /o\09:50
stephenfinlyarwood: Any chance you'd be able to slog through that so I can finally close it out? I genuinely think it's a useful addition09:51
*** ratailor__ has joined #openstack-nova09:55
*** ratailor_ has quit IRC09:57
lyarwoodstephenfin: yup I'll queue it up for later today if that's okay09:58
stephenfinwfm. Thanks09:58
lyarwoodstephenfin: and I'll likely ask for the same later in the cycle for the block layer09:58
stephenfinfair :)09:58
*** vishalmanchanda has quit IRC10:00
*** sapd1_x has quit IRC10:06
*** sapd1_x has joined #openstack-nova10:06
*** vishalmanchanda has joined #openstack-nova10:11
gibistephenfin: I will check it after my lunch10:12
*** whoami-rajat_ has joined #openstack-nova10:22
openstackgerritLee Yarwood proposed openstack/nova stable/victoria: libvirt: Ignore device already in the process of unplug errors  https://review.opendev.org/c/openstack/nova/+/78846710:24
openstackgerritLee Yarwood proposed openstack/nova stable/ussuri: libvirt: Ignore device already in the process of unplug errors  https://review.opendev.org/c/openstack/nova/+/78846810:24
openstackgerritLee Yarwood proposed openstack/nova stable/train: libvirt: Ignore device already in the process of unplug errors  https://review.opendev.org/c/openstack/nova/+/78846910:26
*** dtantsur is now known as dtantsur|brb10:32
*** ociuhandu has quit IRC10:39
sean-k-mooneystephenfin: +1 on the first patch for nova.pci10:41
stephenfinthanks :)10:41
sean-k-mooneystephenfin: i have 2 other reviews i want to complete first, then i'll see if i can get back to the rest10:41
openstackgerritBrin Zhang proposed openstack/nova master: Replaces tenant_id with project_id from Flavor Access APIs  https://review.opendev.org/c/openstack/nova/+/76770410:45
*** tbachman has joined #openstack-nova10:48
*** ociuhandu has joined #openstack-nova10:57
*** brinzhang_ has joined #openstack-nova10:59
openstackgerritBalazs Gibizer proposed openstack/nova master: DNM: Testing with sqlalchemy 1.4  https://review.opendev.org/c/openstack/nova/+/78847111:00
gibifyi we are expected failures and therefore work here ^^11:01
gibis/expected/expecting/11:01
*** brinzhang0 has quit IRC11:02
*** k_mouza_ has joined #openstack-nova11:04
*** k_mouza has quit IRC11:04
*** k_mouza has joined #openstack-nova11:05
*** mkrai has quit IRC11:06
*** ociuhandu has quit IRC11:07
*** k_mouza_ has quit IRC11:08
openstackgerritBrin Zhang proposed openstack/nova master: Replaces tenant_id with project_id from List/Show usage APIs  https://review.opendev.org/c/openstack/nova/+/76850911:11
gibireported https://bugs.launchpad.net/nova/+bug/1926426 to track the work11:11
openstackLaunchpad bug 1926426 in OpenStack Compute (nova) "Nova is not compatible with sqlalchemy 1.4" [High,Triaged]11:11
*** mkrai has joined #openstack-nova11:12
sean-k-mooneyis the issue the test or the production code?11:13
sean-k-mooneylooks like maybe both11:13
sean-k-mooneyhave we capped it in UC11:14
*** sapd1_x has quit IRC11:14
*** ociuhandu has joined #openstack-nova11:15
*** k_mouza_ has joined #openstack-nova11:15
sean-k-mooneyah, we capped it to 1.3.23 previously11:15
gibiit is a test coming from a proposed UC bump11:15
gibiso far we see one oslo.db issue but we also seem to have some nova specific failure modes as well11:16
*** k_mouza has quit IRC11:18
*** k_mouza has joined #openstack-nova11:19
*** ociuhandu has quit IRC11:19
*** ociuhandu has joined #openstack-nova11:20
*** k_mouza_ has quit IRC11:22
*** mkrai has quit IRC11:23
*** ociuhandu has quit IRC11:25
*** ociuhandu has joined #openstack-nova11:36
*** kinpaa12389 has joined #openstack-nova11:38
*** ociuhandu has quit IRC11:41
*** dtantsur|brb is now known as dtantsur11:41
*** ociuhandu has joined #openstack-nova11:52
gibisean-k-mooney: Did I understand the comment from the PTG correctly about when we need a direction-aware and when we need a directionless pps resource? https://review.opendev.org/c/openstack/neutron-specs/+/785236/3/specs/xena/qos-minimum-guaranteed-packet-rate.rst#26611:56
sean-k-mooneysorry, haven't got to your spec yet today but i'll take a look at that comment now, one sec11:58
*** __ministry has quit IRC11:59
sean-k-mooneygibi: so ya11:59
sean-k-mooneywe can either put the sum of the direction-based values in the request12:00
sean-k-mooneyor we can have 3 resource classes and just make the user choose the correct policy12:00
sean-k-mooneyi kind of like having 3 resource classes PPS_TOTAL, PPS_INGRESS and PPS_EGRESS12:01
gibiso then in case of hw offloaded OVS the packet processing bottleneck is not some kind of shared resource (like the CPU in case of normal OVS) but some direction specific hardware resource?12:01
gibihence we need to track resource inventory per direction separately12:02
sean-k-mooneywell, yes and no. the pcie bandwidth is a shared resource but hardware-offloaded ovs uses sriov VFs for the dataplane12:02
sean-k-mooneyso we can pretend it's just sriov in this case12:03
sean-k-mooneyvdpa is more or less the same in that the parent of the vdpa device, at least today, is a VF12:03
gibiso in this case the logic that is implemented in OVS runs on the specific hardware?12:03
gibianyhow I think I get it, the HW offload OVS is like the SRIOV VF case12:04
sean-k-mooneyhow it works is the ovs vswitchd process calculates openflow rules as normal and then uses the tc flower protocol to install those flows into the hardware dataplane if it supports them12:05
gibisean-k-mooney: thanks, that helps me visualize the situation12:05
sean-k-mooneybasically you can think of the programming model like a cache: if the hardware can cache the rule in its hardware switch it will process them in hardware. the ovs process just calculates them as normal and they are then copied to the hardware12:06
sean-k-mooneywhat that means in practice is the first packet(s) of each connection is processed in software and then hardware takes over12:06
gibiI see12:08
gibiand in this hardware the incoming and outgoing packet processing is implemented independently so we can run out of one direction while the other direction still has capacity12:09
gibiand therefore we need separate resource tracking12:09
sean-k-mooneyi think that depends on the nic but in principle yes, the internal capacity of the switch is higher than the physical connectors12:10
sean-k-mooneyso it can run out of capacity to transmit to the datacenter network before its ability to switch between vms on the same host12:10
sean-k-mooneyalthough intel, and mellanox until recently, had a habit of oversubscribing their port bandwidth12:11
*** ociuhandu has quit IRC12:11
sean-k-mooneyi.e. intel 2*40G fortvilles only had 56Gb/s of pcie bandwidth12:11
sean-k-mooneywhich is fine for the 2*25G card but ya, pcie bandwidth and physical port bandwidth can often be less than the chip bandwidth in the data center space12:12
sean-k-mooneythe assumption often is that you are using 2 port cards for HA, not more bandwidth/pps12:13
gibiwhen you say "run out of capacity to transmit to the datacenter" do you mean it runs out of physical network bandwidth?12:14
sean-k-mooneyyep12:14
gibiso while it can still switch packets, it cannot transmit them12:14
gibibut then this is showing just that the packet processing capacity is independent from the bandwidth12:15
gibiit does not show that switching incoming packets is independent from switching outgoing packets12:15
sean-k-mooneyya the 2*25G intel fortvilles actually had 2 40G chips one per uplink port but had 25G phys12:15
sean-k-mooneygibi: you are right, it does not show they are independent on the receive side if you have an intel cpu with vt-c12:17
sean-k-mooneythat allows a supported nic to place received packets directly into the cpu's l3 cache12:17
sean-k-mooneyon transmit i know the cpu can do a dma transfer to the nic queue or indeed a zerocopy transmit in some cases12:18
*** ratailor__ has quit IRC12:19
gibiOK, I rest my case. I will allow the admin to configure either two direction-aware pps resource inventories (in case of offloaded OVS) or a single directionless resource inventory (for normal OVS)12:20
sean-k-mooneywell no12:20
sean-k-mooneyif you think that scoping this down to just normal ovs for now is the best way forward that would be ok12:21
sean-k-mooneyit would probably be nice to support both models as you said but we could start with the one you want12:21
gibiI can do both, this is not the hard part (I guess :D)12:22
sean-k-mooneyi just would not like to see logic to translate direction-based requests into directionless ones12:22
sean-k-mooneythe translation is the bit i found slightly unclean about the proposal12:23
*** eharney has quit IRC12:23
gibithe QoS rule will be direction aware. So that is good.12:24
gibiThen in case of normal OVS should we still allow configuring direction aware resource inventory?12:24
*** ociuhandu has joined #openstack-nova12:24
gibithat seems wrong to me as both directions are handled by the same set of CPUs running the OVS software12:24
sean-k-mooneywell i kind of think we should have 3 policies as i said above12:24
gibiohh12:24
sean-k-mooney2 that are direction aware and 1 that is directionless12:24
sean-k-mooneythat map directly to the PPS_TOTAL, PPS_INGRESS and PPS_EGRESS resource classes12:25
gibiso for vnic_type=normal the user should only use a directionless QoS policy12:25
sean-k-mooneyyep12:25
gibiOK now I got your point12:25
*** eharney has joined #openstack-nova12:25
gibi3  resource classes and 3 QoS rule types12:26
sean-k-mooneyso ovs would report PPS_TOTAL inventories and the directionless pps rule would consume it12:26
sean-k-mooneyyep12:26
sean-k-mooneyand no need for special logic to translate between them12:26
gibiOK so it is a bit more complexity for the user to select the proper QoS rule but in return we never lie in the QoS policy about the direction awareness12:27
sean-k-mooneysorry i thought i had said that at the PTG but maybe i was not very clear12:27
gibisean-k-mooney: no it is my bad, I mixed up the QoS policy part with the resource classes12:27
gibinext step12:28
*** gokhani has quit IRC12:28
sean-k-mooneybut ya it's a little more complex for the tenant but simpler in code12:28
sean-k-mooneyat least i think it is12:28
gibiyepp12:28
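The model the two converge on above can be sketched as follows. This is a sketch only, not nova/neutron code: the rule type names and the one-to-one mapping onto the three resource classes follow the discussion, and the final spec may well use different names.

```python
# Sketch (an assumption, not nova/neutron code): three QoS minimum
# packet rate rule types mapping one-to-one onto three placement
# resource classes, with no logic translating direction-aware requests
# into directionless ones or vice versa.
RULE_TYPE_TO_RC = {
    "any": "PPS_TOTAL",        # directionless rule, e.g. for normal OVS
    "ingress": "PPS_INGRESS",  # direction-aware, e.g. hw-offloaded OVS
    "egress": "PPS_EGRESS",
}

def pps_resource_request(direction, min_kpps):
    """Placement resources dict for a single min pps QoS rule."""
    return {RULE_TYPE_TO_RC[direction]: min_kpps}
```

So a directionless rule consumes only PPS_TOTAL inventory (which normal OVS would report), and a direction-aware rule consumes only the matching directional inventory.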
*** gokhani has joined #openstack-nova12:28
*** ociuhandu has quit IRC12:29
gibiso next step: when we have the directionless QoS policy and later we want to implement data plane enforcement for that, how do we map the directionless min guarantee to direction-aware pps rate limit rules to do enforcement?12:29
gibithis is the packet rate limiter work https://bugs.launchpad.net/neutron/+bug/1912460 from Liu12:31
openstackLaunchpad bug 1912460 in neutron "[RFE] [QoS] add qos rule type packet per second (pps)" [Wishlist,In progress] - Assigned to LIU Yulong (dragon889)12:31
gibiand as I understood from ralonso the basic solution for the guarantee is to set up max limits := min guarantee for all the traffic12:31
sean-k-mooneythat is a good question, we have 3 choices i guess: 1 don't, 2 split it evenly between each direction (assuming it's a shared resource), 3 assume they are independent so allow it for both12:31
*** gokhani has quit IRC12:31
*** whoami-rajat_ is now known as whoami-rajat12:32
*** gokhani has joined #openstack-nova12:32
*** hemanth_n has quit IRC12:32
sean-k-mooneygibi: ya so like bandwidth, the mins for ovs are not enforced by ovs so you would use max pps instead to do enforcement12:33
sean-k-mooneyfor any backend that can enforce it then min should be enough12:33
sean-k-mooneyand you should not need max rules12:33
gibiOK so in case of normal OVS where we have the directionless QoS we know that it is not an independent resource and therefore we would need to split it, but splitting it equally might not be what the user intended originally12:34
sean-k-mooneygibi: well since the enforcement will be done by the max policy, they could use a directionless min policy with a directional pair of max policies12:35
sean-k-mooneyso min_total_pps=1000 max_ingress_pps=200 max_egress_pps=80012:36
gibiso we are sure that OVS will never have the capability to enforce min alone? so we always need to manually define both min and max12:36
sean-k-mooneywell intel were trying to get this into ovs-dpdk for a few years but could never get around the fact they would basically have to dedicate 1 core to doing the min enforcement12:37
sean-k-mooneyi dont think the kernel implementation has made any progress either12:37
sean-k-mooneyso i dont know12:37
gibiOK, thanks I think I got the full picture12:37
gibiI will go and document it now in the neutron spec12:38
gibiI really appreciate your help12:39
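The enforcement workaround in the min_total_pps=1000 / max_ingress_pps=200 / max_egress_pps=800 example above implies a simple consistency condition. The equality check below is an assumption drawn from "set up max limits := min guarantee": the operator-chosen directional max limits should together add up to the directionless guaranteed total.

```python
# Sketch (an assumption, not nova/neutron code): with no native min-pps
# enforcement in OVS, a directionless min guarantee is approximated by a
# directional pair of max limits whose sum equals the guaranteed total.
def max_pair_matches_min(min_total_pps, max_ingress_pps, max_egress_pps):
    return max_ingress_pps + max_egress_pps == min_total_pps
```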
sean-k-mooneyjust did some googling, maybe they have some support https://mail.openvswitch.org/pipermail/ovs-dev/2016-July/317678.html12:40
*** ociuhandu has joined #openstack-nova12:40
sean-k-mooneyi dont actually see min support12:41
sean-k-mooneyalthough it's in the examples http://www.openvswitch.org/support/dist-docs/ovs-vsctl.8.txt12:42
sean-k-mooneymin-rate is bandwidth though, not pps12:43
*** lpetrut has quit IRC12:44
gibiOK12:45
*** johanssone has quit IRC12:51
*** johanssone has joined #openstack-nova12:54
sean-k-mooneygibi: so looking at https://www.openvswitch.org/support/dist-docs/ovs-vswitchd.conf.db.5.html the queue table does support min-rate now but if i remember correctly it is very challenging to use that correctly12:55
sean-k-mooney other_config : min-rate: optional string, containing an integer, at least 112:55
sean-k-mooney              Minimum guaranteed bandwidth, in bit/s.12:55
gibiyapp that's just bandwidth, not packet rate12:55
sean-k-mooneythat sounds great, but to use this you have to manually create queues on each port, and then instead of outputting to the port or using the normal action you have to output to the queue explicitly12:56
sean-k-mooneyneutron could use that for traffic ingressing to the vm, but only if it stopped using the normal action in ml2/ovs12:57
sean-k-mooneyovn could use that for ingress to the vm but on egress from the vm it would not really work12:57
sean-k-mooneyand ya, it's bandwidth not pps so it won't help in your case12:58
gibiack12:58
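For reference, the queue-based min-rate setup being discussed follows the shape of the QoS example in the linked ovs-vsctl docs. The sketch below only builds the argv for such an invocation; the port name and rates are made-up values, and as noted above min-rate is bandwidth in bit/s, not pps.

```python
# Sketch of a multi-command ovs-vsctl invocation (modeled on the QoS
# example in the ovs-vsctl man page linked above) that attaches a
# linux-htb QoS with one min-rate queue to a port. Values are
# illustrative only; this just constructs the command, it does not run it.
def min_rate_qos_cmd(port, min_rate_bps, max_rate_bps):
    return [
        "ovs-vsctl",
        # attach a new QoS record to the port
        "--", "set", "port", port, "qos=@newqos",
        # create the QoS record with one queue (queue id 0)
        "--", "--id=@newqos", "create", "qos", "type=linux-htb",
        f"other-config:max-rate={max_rate_bps}", "queues:0=@q0",
        # create the queue carrying the min-rate guarantee (bit/s)
        "--", "--id=@q0", "create", "queue",
        f"other-config:min-rate={min_rate_bps}",
    ]
```

As the discussion notes, flows must then explicitly output to that queue rather than to the port for the guarantee to apply.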
*** iurygregory has quit IRC12:59
*** macz_ has joined #openstack-nova13:05
*** eharney has quit IRC13:18
*** eharney has joined #openstack-nova13:18
openstackgerritDaniel Bengtsson proposed openstack/nova master: Use the new type HostDomainOpt.  https://review.opendev.org/c/openstack/nova/+/78824013:20
*** johanssone has quit IRC13:21
*** johanssone has joined #openstack-nova13:28
*** iurygregory has joined #openstack-nova13:35
*** ociuhandu has quit IRC13:52
*** ociuhandu has joined #openstack-nova13:53
*** ociuhandu has quit IRC13:58
*** k_mouza has quit IRC14:14
*** k_mouza_ has joined #openstack-nova14:14
*** ociuhandu has joined #openstack-nova14:14
*** ociuhandu has quit IRC14:14
*** zoharm has joined #openstack-nova14:14
*** ociuhandu has joined #openstack-nova14:16
*** k_mouza_ has quit IRC14:20
*** k_mouza has joined #openstack-nova14:20
*** ociuhandu has quit IRC14:20
*** ociuhandu has joined #openstack-nova14:21
*** dave-mccowan has joined #openstack-nova14:36
*** kinpaa12389 has quit IRC14:38
*** jobewan has quit IRC14:43
lyarwoodelod / bauzas / melwitt ; https://review.opendev.org/c/openstack/nova/+/787943 - If you have time today can I get reviews on this backport please.14:51
bauzaslyarwood: we have our meeting in 9 mins and then I'll need to go donate blood, but I'll try14:52
openstackgerritStephen Finucane proposed openstack/nova master: Add functional regression test for bug 1853009  https://review.opendev.org/c/openstack/nova/+/69501214:53
openstackbug 1853009 in OpenStack Compute (nova) ussuri "Ironic node rebalance race can lead to missing compute nodes in DB" [High,In progress] https://launchpad.net/bugs/1853009 - Assigned to Mark Goddard (mgoddard)14:53
openstackgerritStephen Finucane proposed openstack/nova master: Clear rebalanced compute nodes from resource tracker  https://review.opendev.org/c/openstack/nova/+/69518714:53
lyarwoodbauzas: yup no issues if you can't get to it14:53
openstackgerritStephen Finucane proposed openstack/nova master: Invalidate provider tree when compute node disappears  https://review.opendev.org/c/openstack/nova/+/69518814:54
openstackgerritStephen Finucane proposed openstack/nova master: Prevent deletion of a compute node belonging to another host  https://review.opendev.org/c/openstack/nova/+/69480214:54
openstackgerritStephen Finucane proposed openstack/nova master: Fix inactive session error in compute node creation  https://review.opendev.org/c/openstack/nova/+/69518914:54
bauzaslyarwood: /me clicks anyway14:54
lyarwoodah crap the patch below hasn't actually landed14:55
lyarwoodsorry I thought it had, this isn't urgent in that case14:55
*** jobewan has joined #openstack-nova14:57
*** prometheanfire has left #openstack-nova14:57
*** zoharm has quit IRC14:59
*** rcernin has quit IRC15:01
*** ociuhandu has quit IRC15:07
*** ociuhandu has joined #openstack-nova15:08
*** rcernin has joined #openstack-nova15:12
*** ociuhandu has quit IRC15:13
*** ociuhandu has joined #openstack-nova15:15
*** rcernin has quit IRC15:17
*** mlavalle has joined #openstack-nova15:22
melwittlyarwood: ack, I can look15:23
*** gokhani has quit IRC15:29
*** dklyle has joined #openstack-nova15:38
*** damien_r has joined #openstack-nova15:41
*** ociuhandu has quit IRC15:42
*** jobewan has quit IRC15:43
*** ociuhandu has joined #openstack-nova15:43
*** jobewan has joined #openstack-nova15:44
*** ociuhandu has quit IRC15:48
stephenfinIs it just me, or is nova-grenade-multinode failing 100% right now?15:54
* stephenfin compares two jobs15:55
stephenfinYeah, looks like Ubuntu is broken15:56
stephenfinFailed to start rtslib-fb-targetctl.service: Unit rtslib-fb-targetctl.service is not loaded properly: Exec format error.15:56
lyarwoodyeah it is15:57
lyarwoodstephenfin: https://bugs.launchpad.net/devstack/+bug/1926411 https://review.opendev.org/c/openstack/devstack/+/78842915:57
openstackLaunchpad bug 1926411 in devstack "legacy bionic based dsvm jobs failing with Failed to start rtslib-fb-targetctl.service" [Undecided,In progress]15:57
stephenfinaha, thanks15:58
stephenfinI saw something about bionic scroll by earlier but didn't realize it was related15:58
lyarwoodargh and that's blocked by https://bugs.launchpad.net/devstack/+bug/192643415:59
openstackLaunchpad bug 1926434 in devstack "devstack@memory_tracker.service: Main process exited, code=exited, status=1/FAILURE" [High,In progress]15:59
lyarwoodand https://review.opendev.org/c/openstack/devstack/+/62019815:59
*** rcernin has joined #openstack-nova15:59
*** lucasagomes has quit IRC16:04
*** rcernin has quit IRC16:05
*** dklyle has quit IRC16:12
lyarwoodstephenfin: oh and https://review.opendev.org/c/openstack/nova/+/778885 hasn't landed yet but is also seeing failures related to https://bugs.launchpad.net/devstack/+bug/192643416:12
openstackLaunchpad bug 1926434 in devstack "devstack@memory_tracker.service: Main process exited, code=exited, status=1/FAILURE" [High,In progress]16:12
*** dklyle has joined #openstack-nova16:20
*** dtantsur is now known as dtantsur|afk16:22
*** rcernin has joined #openstack-nova16:38
*** rpittau is now known as rpittau|afk16:39
*** rcernin has quit IRC16:43
openstackgerritBalazs Gibizer proposed openstack/nova-specs master: QoS minimum guaranteed packet rate  https://review.opendev.org/c/openstack/nova-specs/+/78501416:56
*** rcernin has joined #openstack-nova16:57
*** rcernin has quit IRC17:02
*** derekh has quit IRC17:04
*** hamalq has joined #openstack-nova17:04
*** hamalq has quit IRC17:05
*** hamalq has joined #openstack-nova17:06
*** k_mouza has quit IRC17:10
*** bnemec has quit IRC17:16
*** bnemec has joined #openstack-nova17:16
*** lbragstad has quit IRC17:37
*** lbragstad has joined #openstack-nova17:38
*** gyee has joined #openstack-nova17:39
*** ralonsoh has quit IRC17:53
*** andrewbonney has quit IRC18:00
*** jmlowe has quit IRC18:05
openstackgerritJessie Lass proposed openstack/nova master: Add emulation support if host arch != guest arch.  https://review.opendev.org/c/openstack/nova/+/77215618:21
*** k_mouza has joined #openstack-nova19:10
*** k_mouza has quit IRC19:14
*** amodi has quit IRC19:20
*** vishalmanchanda has quit IRC19:40
*** clarkb has joined #openstack-nova19:59
clarkbHello nova. Hitting a weird error that doesn't happen 100% of the time. https://zuul.opendev.org/t/openstack/build/c7c698c95f2b4590ba95f14a6c97ffc0/log/nodepool/openstack/screen-n-api.txt#1735-1805 seems like when doing a server create sometimes nova forgets about the keystone domain?19:59
clarkbnote this would've been run against master devstack (which is master nova). And the rough sequence of events that sometimes trips it is: `source /opt/devstack/openrc admin admin && openstack --os-project-name demo --os-username demo server create --flavor cirros256 --image $cirros_image unmanaged-vm --network public`20:00
clarkbone suspicion for why it doesn't fail 100% of the time is that the traceback shows nova hitting _refresh_neutron_extensions_cache() which maybe isn't called on every request?20:01
*** lbragstad_ has joined #openstack-nova20:01
clarkbnote that the devstack openrc should be providing the keystone user and project domain values20:01
clarkbIs this familar to anyone? I'm happy to help debug further but quickly running out of thread to pull on20:02
*** lbragstad has quit IRC20:03
*** dave-mccowan has quit IRC20:22
*** dave-mccowan has joined #openstack-nova20:26
*** jmlowe has joined #openstack-nova20:39
*** _erlon_ has joined #openstack-nova20:42
*** whoami-rajat has quit IRC20:52
*** amodi has joined #openstack-nova20:54
*** brinzhang0 has joined #openstack-nova20:59
*** brinzhang_ has quit IRC21:02
openstackgerritMerged openstack/nova stable/train: libvirt: 'video.vram' property must be an integer  https://review.opendev.org/c/openstack/nova/+/75761821:24
gmannclarkb: nova works on the default domain, and there is no recent change on the devstack side to remove the default domain or anything like that.21:25
*** slaweq has quit IRC21:25
gmanni am not sure why that is happening.21:25
*** andrewbogott has joined #openstack-nova21:31
andrewbogottThis is mostly a question for dansmith but I welcome advice from anyone :)  I've just upgraded my install to V and all of my hypervisors are saying "Current Nova version does not support computes older than Ussuri but the minimum compute service level in your cell is 40..."21:32
andrewbogottA totally reasonable warning except I cannot for the live of me figure out /why/ that's the minimum service level.  All my nodes are updated,21:33
andrewbogottI've read the code, looked in all my DBs, no idea where that 40 is coming from.  Can you tell me what the process is for calculating that?  Is it cached someplace?21:33
dansmithit should be stored on the service record21:34
dansmithand yeah I'd guess maybe some old record is being considered (or you have one node not updated like you think)21:34
andrewbogotttell me more about 'the service record'?21:35
dansmiththe records in the services table in the db21:35
andrewbogottso you would think that 'delete from services where topic='compute' and version != 53;' would fix it right?21:36
andrewbogott(which I already did and it didn't)21:36
dansmithno, I'd say look at them first and let's figure out what's going on21:36
dansmiththey're all the new version?21:36
andrewbogottthere were several entries in there from nodes that don't exist anymore referring to old versions21:37
andrewbogottthat's why I ran that delete.21:37
dansmithokay, and you've restarted services?21:38
andrewbogottmaybe not, which services would I need to restart?  apis?21:38
dansmithwhere are you seeing the warning?21:39
andrewbogottnova-compute logs21:39
andrewbogotton startup21:39
andrewbogott(so yeah, I've definitely restarted those)21:40
andrewbogottwell -- to be clear, I haven't restarted all my compute services, only the one I'm watching the logs for21:40
*** rcernin has joined #openstack-nova21:40
andrewbogottbut I definitely restarted them all after the version upgrade21:41
andrewbogottI'm happy to hack in a debug line if there's someplace I can dump the name of the service it's finding that's sub-4021:41
andrewbogott(thank you, btw, for immediately responding to my unfunded tech support request!)21:42
dansmithokay, well, tbh I barely remember how this works, and I don't think I wrote it21:43
dansmithbut restarting nova-compute should be plenty for nova-compute to re-survey things21:43
dansmithdo you have multiple cell dbs?21:44
andrewbogottI have an empty cell0 and one actual in-use cell21:44
*** bnemec has quit IRC21:44
andrewbogottI can also just ignore the warning since it's clearly harmless in the current version.  Patch comments imply that it might be a hard error in the future though.21:45
dansmithyeah, so all of the services run that code, not just nova-compute21:45
dansmithright, hard error now21:45
andrewbogottok, should probably sort it out before I hit that wall then21:46
* andrewbogott frantically restarts things21:46
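[editor's note] The check being discussed — a warning in older releases that is now a hard startup error — can be sketched roughly as below. This is an illustrative model of the behavior described in the conversation, not nova's actual code; the names `SERVICE_VERSION`, `TooOldComputeService`, and `check_minimum_version` are assumptions.

```python
# Hedged sketch (not nova's real implementation): a minimum-service-version
# gate that refuses to start, rather than merely warning, when any service
# record in the DB reports a version older than the supported minimum.

SERVICE_VERSION = 53  # version this binary understands (illustrative)


class TooOldComputeService(Exception):
    """Raised when the deployment still reports a too-old service."""


def check_minimum_version(minimum_seen, oldest_supported):
    # minimum_seen: the lowest 'version' found across service records.
    # In newer releases this is a hard error instead of a log warning,
    # which is why stale rows from decommissioned hosts matter.
    if minimum_seen < oldest_supported:
        raise TooOldComputeService(
            "Oldest service version %d is below the supported minimum %d"
            % (minimum_seen, oldest_supported))
    return True
```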
*** bnemec has joined #openstack-nova21:46
dansmithvictoria is 52 according to the code21:46
dansmithyou said 53 above21:46
andrewbogottmaybe it's a subversion?  That's what I see in my db21:47
dansmithno, 53 was some time in wallaby21:48
dansmithoh,21:49
dansmiththe alias is wrong21:49
andrewbogottok, I just ran a fleet-wide restart of all services (api, scheduler, conductor, api-metadata, and placement-api for good measure)21:49
andrewbogottand I think the warning has gone away.  Let me double check on another host...21:49
andrewbogottyah, it's gone21:50
andrewbogottSo it was cached... somewhere!21:50
andrewbogottthanks for talking me through it dansmith, this is my mistake for despairing before observing rule one21:51
dansmithwell, it's really not cached anywhere other than in memory in a process, fwiw21:51
dansmithbut...good?21:51
dansmithhowever, it looks to me like the alias in V is wrong, it should be 53 as you note21:52
andrewbogottyou're right, 'cached' isn't really the right word for it21:52
dansmithI don't think it would cause the situation you were seeing21:52
andrewbogottI wonder if there should be some clean-up stage that removes records from obviously-no-longer-running services21:52
andrewbogottOr at least that version check could be restricted to running services21:52
dansmithwell, if you do service delete that will happen, but we don't want to do that automatically since we don't know if you've just got something offline for a long time21:53
dansmithwell,21:53
andrewbogottyeah, I assume the issue is me decom'ing hardware before doing the service delete21:53
andrewbogottalthough this was for hosts that no longer appeared in the cli21:53
andrewbogottthey were hidden in the DB21:54
dansmithwe want to not regress the minimum version by starting something old, nor do we want to migrate some data such that an older service that might get turned back on would be confused21:54
dansmithhidden in the db?21:54
dansmithif it's not ignoring deleted records then that's definitely a bug21:54
andrewbogottjust -- I found references to hosts in the db that didn't appear in 'openstack compute service list'21:54
andrewbogottbut I guess since I didn't restart all services after every db change I don't know which bit was causing the problem :(21:54
dansmithwell, there might still be an issue with considering deleted records it sounds like, which would be a bug if you want to file it21:55
andrewbogottat the very least there's an issue with <unknown> service still reporting the version of a deleted service, at least until it's restarted21:56
dansmithwell, do we know if it's deleted or just not being shown in the API?21:56
andrewbogottgood point, it could be either21:56
dansmithI don't know how the api behaves21:56
andrewbogottI guess I need to figure out if I can reproduce it21:56
dansmithbut yes, if it's that, then definitely a bug21:56
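[editor's note] The suspected bug — rows "hidden in the DB" dragging the minimum down — hinges on whether the min-version query skips soft-deleted records. A minimal sketch of the intended filtering, with illustrative field names (nova soft-deletes rows via a `deleted` column rather than removing them):

```python
# Hedged sketch: compute the minimum service version while ignoring
# soft-deleted rows. If the real query forgot the deleted filter, stale
# records from decommissioned hosts would incorrectly lower the minimum,
# which is the bug being speculated about above.

def minimum_service_version(rows):
    # rows: dicts standing in for service table records.
    live = [r["version"] for r in rows if not r["deleted"]]
    return min(live) if live else None
```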
andrewbogottdansmith: does that minimum_version_check happen via RPC?  I stuck a bunch of debug lines in the version check on my compute node and they were never traversed.21:57
dansmithcomputes can't talk to the database, so yes21:57
dansmithbut other services go straight to the db for it21:57
andrewbogottI don't think I've ever seen @base.remotable_classmethod but I take that to mean 'this happens on a totally different system'21:58
andrewbogottor at least can21:58
dansmithit does, if coming from compute,21:59
andrewbogottok21:59
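[editor's note] The idea behind `@base.remotable_classmethod` (from oslo.versionedobjects) can be sketched as follows: when an "indirection API" (the conductor) is configured, the call is shipped over RPC instead of running locally, which is why debug lines in the compute-side code are never hit. Everything here except the decorator's name is an illustrative stand-in, not nova's actual classes.

```python
# Hedged sketch of remotable-classmethod dispatch. If cls.indirection_api
# is set (as it is on nova-compute, which cannot talk to the DB), the call
# is forwarded to it; otherwise the local body runs against the DB.

import functools


class RecordingConductor:
    """Stand-in for the RPC indirection API; records what it was asked."""

    def __init__(self):
        self.calls = []

    def object_class_action(self, cls, method, args, kwargs):
        self.calls.append(method)
        return 53  # canned "remote" answer for the sketch


def remotable_classmethod(fn):
    @functools.wraps(fn)
    def wrapper(cls, *args, **kwargs):
        if cls.indirection_api is not None:
            # In a real deployment this runs on a totally different host.
            return cls.indirection_api.object_class_action(
                cls, fn.__name__, args, kwargs)
        return fn(cls, *args, **kwargs)
    return classmethod(wrapper)


class Service:
    indirection_api = None  # conductor sets this on computes

    @remotable_classmethod
    def get_minimum_version(cls, binary):
        # Direct DB path; only reachable when no indirection is set.
        return 52
```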
clarkbgmann: ya and it works like 90% of the time21:59
andrewbogottthat solves another mystery then21:59
dansmithand yes, you might be right.. you might need to restart conductor which does that on behalf of the compute21:59
clarkbgmann: that is why I suspect maybe it has to do with the cache refresh since in theory that only happens if you exceed timeouts or similar21:59
dansmithyeah, I hadn't really thought of that, but I guess we'd get conductor's in-memory "cache" of that version21:59
andrewbogottThat seems like the simplest explanation.  Not easy to fix though22:00
andrewbogottother than having the 'delete' command print22:00
dansmithwell,22:00
andrewbogott"ok now restart your conductor"22:00
dansmithnot sure that needs a fix22:00
andrewbogottd'you think the conductor would've refreshed its state eventually?22:00
dansmithno, all the nodes will hold that value in memory until restart, that's kinda the point22:01
clarkbgmann: fungi mentioned it could possibly be an openstack(client|sdk) regression where it isn't passing that info properly22:01
dansmithAPI and scheduler will do the same22:01
dansmithit's just that restarting an api worker will resolve it, whereas computes are dependent on the conductors for it22:01
andrewbogottIt's really the conductor's job to provide /current/ database state though isn't it?22:03
andrewbogottAre there reasons why we would want the conductor to not re-read for every query?  (Other than performance)22:03
dansmithyeah, it's expensive, and it also controls conductor's RPC pin, which we don't want to shift at runtime unless you restart or HUP it22:04
dansmiththe conductor also translates things according to the version pin and the versions supported by the compute that is asking for a thing22:05
dansmithit's not just a transparent proxy22:05
andrewbogotthmmmm22:06
andrewbogottSo really it should be pushed -- when a service is deleted it should force some kind of reload (rather than the conductor re-reading the db periodically)22:06
*** rcernin has quit IRC22:06
andrewbogottthat sounds like a pain to implement for a tiny gain22:06
*** rcernin has joined #openstack-nova22:07
dansmithyou want to fan-out to every service in the cluster every time a service is deleted just in case some of them need to re-calculate the min version?22:07
dansmithseems like a lot of pain for little gain22:07
dansmithor maybe that's what you meant.22:07
andrewbogottyeah, a lot of pain22:08
andrewbogottSo a real bug but not one that's worth fixing22:09
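[editor's note] The behavior worked out in this thread — each service computes the minimum version once and holds it in memory until restart or HUP, so deleting stale service records has no visible effect until processes are bounced — can be sketched as a memoized lookup with no invalidation. Class and method names are illustrative.

```python
# Hedged sketch of the "held in memory until restart" behavior: the DB
# is consulted exactly once per process lifetime, by design (it is
# expensive, and it anchors the RPC version pin), so the value goes
# stale if service records change underneath it.

class MinVersionCache:
    def __init__(self, db_lookup):
        self._db_lookup = db_lookup  # callable hitting the DB
        self._cached = None

    def get(self):
        if self._cached is None:      # computed once per process
            self._cached = self._db_lookup()
        return self._cached           # stale until the process restarts
```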
openstackgerritDmitrii Shcherbakov proposed openstack/nova-specs master: Introduce Transport Nodes  https://review.opendev.org/c/openstack/nova-specs/+/78745822:09
*** k_mouza has joined #openstack-nova22:09
andrewbogottdansmith: I need to run -- thanks again for getting me unstuck!22:10
*** k_mouza has quit IRC22:14
dansmithyup22:15
*** macz_ has quit IRC22:42
*** tosky has quit IRC22:48
*** luksky has quit IRC22:56
*** andrewbogott has left #openstack-nova23:25
*** martinkennelly has quit IRC23:56
*** mlavalle has quit IRC23:59

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!