16:00:09 <gibi> #startmeeting nova
16:00:12 <openstack> Meeting started Thu Aug 27 16:00:09 2020 UTC and is due to finish in 60 minutes. The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:15 <openstack> The meeting name has been set to 'nova'
16:00:34 <dansmith> o/
16:00:36 <gibi> o/
16:00:43 <gmann> o/
16:00:47 <gibi> let's get started
16:00:49 <gibi> #topic Bugs (stuck/critical)
16:00:57 <gibi> We have a high severity CVE #link https://bugs.launchpad.net/nova/+bug/1890501 that has been fixed on master and ussuri and patches are going into older stable branches.
16:00:59 <openstack> Launchpad bug 1890501 in OpenStack Compute (nova) stein "Soft reboot after live-migration reverts instance to original source domain XML (CVE-2020-17376)" [Critical,In progress] - Assigned to Lee Yarwood (lyarwood)
16:01:06 <bauzas> \o
16:01:11 <stephenfin> o/
16:01:27 <gibi> besides that I don't see any critical bugs
16:01:36 <gibi> #link 31 new untriaged bugs (-5 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
16:01:45 <gibi> #link 6 untagged untriaged bugs (-2 change since the last meeting): https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
16:01:54 <gibi> please look at the untriaged bug list and try to push some bugs forward
16:02:01 <gibi> also I pinged some of you about specific bugs
16:02:12 <bauzas> thanks gibi for scrubbing the list
16:02:18 <lyarwood> \o
16:02:27 <gibi> do we need to talk about some of the open bugs here?
16:02:51 <lyarwood> https://review.opendev.org/#/c/747358/ looks jammed btw
16:03:07 <lyarwood> not sure what we need to do to move it into the gate
16:03:24 <lyarwood> sorry ignore that
16:03:31 <lyarwood> thought it was the CVE, too many bugs :)
16:03:40 <gibi> :)
16:03:52 <gibi> #topic Runways
16:03:57 <gibi> etherpad #link https://etherpad.opendev.org/p/nova-runways-victoria
16:04:02 <gibi> bp/provider-config-file has been merged \o/ so its slot is freed up in the runway
16:04:11 <gibi> the first patch of bp/cyborg-rebuild-and-evacuate now has +2 from me
16:04:20 <gibi> the spawn part of the bp/add-emulated-virtual-tpm has been merged, the migrate and resize part has a +2 from me
16:04:27 <gibi> nothing is in the queue and we have a free runway slot, so if you have a feature ready then it is a good time to add it to the queue
16:04:51 <gibi> do we need to talk about the items in the runway slots?
16:05:20 <bauzas> gibi: I'll look at the vtpm series again tomorrow morning
16:05:29 <gibi> cool
16:05:29 <gibi> thanks
16:05:33 <bauzas> this should potentially free yet another slot
16:05:45 <bauzas> *potentially* :)
16:06:09 <gibi> #topic Release Planning
16:06:14 <gibi> We have 2 weeks until Milestone 3 which is Feature Freeze
16:06:23 <gibi> I've set up the release tracking etherpad #link https://etherpad.opendev.org/p/nova-victoria-rc-potential
16:06:29 <gibi> Next week is the last release of non-client libraries for Victoria. For os-vif we would like to get #link https://review.opendev.org/#/c/744816/ merged before the release
16:06:44 <gibi> Sooner rather than later we have to talk about cycle highlights and the reno prelude
16:07:16 <gibi> anything else about the coming release that we have to discuss?
16:07:51 <harsha24> https://blueprints.launchpad.net/nova/+spec/parallel-filter-scheduler
16:08:30 <gibi> harsha24: let's get back to that in the Open Discussion
16:08:41 <harsha24> okay
16:08:52 <gibi> #topic Stable Branches
16:09:08 <gibi> lyarwood: do you have any news?
16:10:16 <gibi> I guess we release from all the stable branches that are not in EM due to the CVE I mentioned above
16:10:19 <lyarwood> elod: has been busy cutting releases
16:10:43 <gibi> cool
16:10:45 <lyarwood> elod: no other news aside from that really
16:10:52 <elod> yes, the ussuri release is done,
16:10:55 <lyarwood> oops sorry
16:11:04 <elod> the train patch is open
16:11:13 <elod> stein will come as soon as the patch lands :)
16:11:22 <elod> np :)
16:11:36 <bauzas> I guess we can release ussuri but we need to hold for train, no?
16:11:49 * bauzas is lost with all the train queue
16:12:07 <elod> the train patch is: https://review.opendev.org/#/c/748383/
16:12:18 <lyarwood> https://review.opendev.org/#/c/747358/ would be nice to land as we did in ussuri
16:12:25 <lyarwood> that's the change jammed at the moment
16:12:26 <elod> bauzas: there are some patches in the queue, but do we want to wait?
16:12:39 <bauzas> I don't know
16:12:45 <bauzas> ask the owner :p
16:12:48 <elod> ok, I'll -W the Train release patch then
16:13:15 <lyarwood> elod: ack, I'll reset the +W flags on that change, hopefully that should get it back into the gate
16:13:35 <gibi> OK
16:13:41 <elod> lyarwood: not jammed actually
16:13:48 <elod> just the parent needs to get merged
16:13:51 <elod> :]
16:14:04 <lyarwood> oh my bad, I was sure that had already landed
16:14:19 <lyarwood> gerrit-- needs to make these dots bigger!
16:14:19 <gibi> moving on
16:14:24 <gibi> Libvirt (bauzas)
16:14:37 <gibi> there is one thing on the agenda
16:14:39 <gibi> (lyarwood) Looking at bumping MIN_{QEMU,LIBVIRT}_VERSION in V
16:14:40 <bauzas> nothing to report honestly, I know aarents had changes but I didn't pay attention
16:14:45 <gibi> #link https://review.opendev.org/#/q/topic:bump-libvirt-qemu-victoria+(status:open+OR+status:merged)
16:14:50 <gibi> This would mean using UCA on bionic ahead of our move to focal, is anyone against that?
16:15:17 <lyarwood> I forgot to update the agenda
16:15:22 <gibi> looking at these patches, most of them are in merge conflict
16:15:28 <lyarwood> this looks like it's blocked on the same focal detach device bug
16:15:30 <gibi> lyarwood: ohh, so this was discussed last week
16:15:35 <lyarwood> yeah sorry
16:15:38 <gibi> np
16:15:45 <sean-k-mooney> gibi: we used to use UCA on 16.04
16:16:05 <lyarwood> #link https://bugs.launchpad.net/nova/+bug/1882521 is the bug reported against focal
16:16:06 <openstack> Launchpad bug 1882521 in OpenStack Compute (nova) "Failing device detachments on Focal" [High,New]
16:16:23 <lyarwood> I'm seeing the same issue with UCA on bionic but I can't reproduce locally
16:17:03 <lyarwood> if we are supposed to be moving to focal anyway this needs to get resolved
16:17:08 <gibi> yeah
16:17:20 <sean-k-mooney> I think we were meant to move at M2
16:17:31 <sean-k-mooney> or try to, so we should swap sooner rather than later
16:17:38 <lyarwood> yeah I know gmann was still trying to move
16:17:47 <gibi> I'm hoping somebody will crack that as I did not have time to look into it
16:18:04 <lyarwood> same, I'm out until Tuesday after this, which doesn't help
16:18:08 <gmann> lyarwood: yeah it is not clear why those fail
16:18:46 <gmann> as of now those failing tests are skipped to keep doing the testing.
16:18:54 <sean-k-mooney> lyarwood: could this be due to a persistent vs transient domain difference
16:19:03 <gmann> but yes this is one of the blockers to move to Focal
16:19:50 <lyarwood> sean-k-mooney: maybe, I really can't tell tbh
16:19:54 <gibi> #action we need a volunteer to look into the Focal bug #link https://bugs.launchpad.net/nova/+bug/1882521
16:19:55 <openstack> Launchpad bug 1882521 in OpenStack Compute (nova) "Failing device detachments on Focal" [High,New]
16:20:08 <sean-k-mooney> for what it's worth, focal and centos8 both use the same version of libvirt
16:20:23 <sean-k-mooney> so I would expect this to affect centos8 too
16:20:38 <sean-k-mooney> and by extension rhel
16:21:16 <gibi> let's move on now but please continue discussing this bug on #openstack-nova
16:21:27 <gibi> #topic Stuck Reviews
16:21:46 <gibi> nothing on the agenda. Is there anything that is stuck?
16:22:28 <tosky> I'd say https://review.opendev.org/#/c/711604/
16:22:42 <gmann> I think you missed API updates
16:22:50 <tosky> (or I can mention it when talking about community goals)
16:23:23 <gibi> gmann: ack, sorry, I will get back to that
16:23:29 <gibi> tosky: as I see you need review
16:23:31 <gibi> on that patch
16:23:45 <gibi> tosky: I can look at it tomorrow
16:23:53 <bauzas> we should rename this section
16:24:07 <bauzas> this is confusing, we're not asking for code waiting to be reviewed
16:24:13 <gibi> bauzas: sure
16:24:22 <bauzas> but for patches that are stuck because of conflicting opinions
16:24:42 <gibi> bauzas: I will rename it to reviews with conflicting opinions
16:24:53 <bauzas> I suck at naming things
16:24:55 <dansmith> when was the last time we had one?
16:25:06 <bauzas> so I surely won't propose a thing
16:25:07 <dansmith> maybe we could just remove it from the agenda altogether?
16:25:14 <gibi> a real one? in the meeting? a pretty long time ago
16:25:15 <bauzas> this could work too
16:25:32 <bauzas> open discussions are there anyway, so people can argue there
16:25:42 <gibi> #action gibi to remove stuck reviews from the agenda
16:25:52 <gibi> moving on
16:25:52 <gibi> #topic PTG and Forum planning
16:26:01 <gibi> The next Forum and PTG is less than 2 months from now
16:26:05 <gibi> summary mail #link http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016770.html
16:26:10 <gibi> please indicate your acceptable PTG timeslots in #link https://doodle.com/poll/a5pgqh7bypq8piew
16:26:15 <gibi> please collect topics in #link https://etherpad.opendev.org/p/nova-wallaby-ptg
16:26:43 <gibi> anything to be discussed about the coming PTG and Forum?
16:27:35 <gibi> #topic Open discussion
16:27:46 <gibi> harsha24: what can we do for you?
16:28:12 <bauzas> he's proposing a new design for processing filters that uses threads
16:28:30 <bauzas> we had this discussion in the past and we always said filters aren't a performance bottleneck
16:28:35 <harsha24> I am thinking, is there a way to reduce the latency to spawn a VM using a multithreading concept
16:28:35 <bauzas> but,
16:28:49 <bauzas> things changed slightly since we have placement
16:29:29 <bauzas> harsha24: the point is, I wouldn't argue for more complexity unless you prove to me there are huge benefits in doing such parallelism
16:29:30 <sean-k-mooney> bauzas: well with placement you have less need for filters
16:29:37 <bauzas> sean-k-mooney: correct
16:29:58 <gibi> harsha24: do you have measurements of the latency specific to the filter executions?
16:30:09 <dansmith> imagine how much worse NoValidHost will be to debug when it depends on the ordering of threads :)
16:30:11 <bauzas> tbc, all filter processing is memory-based
16:30:18 <bauzas> dansmith: right, that too
16:30:31 <bauzas> filter ordering is crucial for most of our ops
16:30:41 <dansmith> yep
16:30:55 <sean-k-mooney> well, all the filters for a given request could probably be put in their own thread without that being an issue
16:30:59 <artom> dansmith, well, if it's done properly we'd join() all the threads *then* print out a summary
16:31:09 <dansmith> sean-k-mooney: we can do that today with the number of workers on the scheduler, right?
16:31:10 <bauzas> I remember johnthetubaguy making a proposal about some smarter filter processing in the past
16:31:13 <artom> I think the question is - is the complexity and effort worth it?
16:31:13 <sean-k-mooney> individual filters on the other hand would be a problem
16:31:28 <sean-k-mooney> dansmith: yeah, more or less
16:31:31 <bauzas> but that's... yeah, artom said it loudly
16:31:38 <sean-k-mooney> harsha24: what was your suggestion
16:31:40 <harsha24> no, actually each filter would output its own list of hosts and we make an intersection of the filtered hosts at the end
16:32:06 <dansmith> so each filter runs on the full set of hosts, which means expensive filters cost even more
16:32:07 <bauzas> sean-k-mooney: a classic parallelism approach
16:32:19 <harsha24> yeah
16:32:20 <bauzas> dansmith: yup
16:32:28 <sean-k-mooney> so the current filter list has a defined order
16:32:36 <bauzas> honestly, again, the CPU time is very short
16:32:42 <artom> Thinking out loud, perhaps just doing the external REST API calls async with something like concurrent.futures might be a much bigger speed gain
16:32:45 <bauzas> and we use generators
16:32:54 <sean-k-mooney> if we run them in parallel they need to filter more hosts and we then need to calculate the intersection
16:32:58 <bauzas> honestly, again
16:32:59 <sean-k-mooney> so that will use more CPU and memory
16:33:13 <artom> IOW - do other stuff while we wait for Cinder/Neutron to answer. I know it's a completely different thing, just continuing the effort/results thought...
16:33:13 <bauzas> operators never saw the filters' performance as a bottleneck
16:33:15 <dansmith> sean-k-mooney: that was my point yeah
16:33:20 <bauzas> ask CERN if you don't know
16:33:32 <johnthetubaguy> do we know which filters are expensive, last time I looked it's the DB queries that dominated
16:33:34 <dansmith> artom: what filters call out to cinder and neutron?
16:33:42 <bauzas> johnthetubaguy: my whole point
16:33:47 <dansmith> exactly
16:33:47 <artom> dansmith, not the filters, in the compute manager for example
16:33:53 <dansmith> artom: this is about filters
16:33:53 <artom> dansmith, as I said, completely different thing :)
16:33:54 <bauzas> we stopped querying the DB in the filters ages ago
16:33:57 <dansmith> ack
16:33:57 <aarents> sean-k-mooney: bauzas we need fewer filters with placement AND filters are run only on the few hosts with room for allocation (others are filtered before by placement)
16:33:57 <sean-k-mooney> the aggregate* filters and numa probably are the most expensive
16:34:16 <dansmith> numa I'm sure
16:34:18 <artom> dansmith, I know - it was more the "where can we invest to improve speed" idea
16:34:18 <sean-k-mooney> aarents: yes
16:34:26 <harsha24> what about weighers
16:34:30 <bauzas> AGAIN, filters don't call the DB or neutron
16:34:44 <johnthetubaguy> numa and affinity were fairly expensive last time I looked
16:34:54 <sean-k-mooney> weighers are a separate phase, they could be parallelised but I think they are cheap
16:34:54 <dansmith> harsha24: even worse.. weigh all hosts even though we're going to exclude most?
16:35:05 <bauzas> they just manipulate in-memory python objects to answer a simple question
16:35:19 <bauzas> sean-k-mooney: again, the ordering is crucial
16:35:23 <sean-k-mooney> dansmith: well we could weigh the filtered hosts in parallel
16:35:25 <bauzas> for weighers
16:35:32 <sean-k-mooney> bauzas: not for weighers
16:35:37 <sean-k-mooney> that is not an ordered list
16:35:50 <sean-k-mooney> or is it? I should check that
16:35:51 <harsha24> I mean after the hosts are filtered we can then use parallelism for the weighers
16:35:57 <sean-k-mooney> if it's ordered then yes
16:36:03 <bauzas> sean-k-mooney: trust me :)
16:36:08 <dansmith> sean-k-mooney: that seems like a tiny potential for improvement
16:36:18 <harsha24> filter order is not required as we intersect the sets at the end
16:36:21 <sean-k-mooney> dansmith: correct
16:36:21 <bauzas> this seems to be adding more complexity for no gain
16:36:33 <johnthetubaguy> bauzas: +1
16:36:42 <sean-k-mooney> harsha24: filter order is currently used to improve performance
16:36:54 <dansmith> sean-k-mooney: there's still overhead in spinning up those threads, joining, and then the sorting is still linear, so I'd super doubt it's worth it :)
16:36:55 <johnthetubaguy> sean-k-mooney: +1
16:36:57 <sean-k-mooney> you can change the order of the filters to eliminate more hosts in the initial filters
16:37:19 <bauzas> I thought ops were maybe wanting to describe filters and weights another way, but they were okay with a sequential approach
16:37:20 <gibi> OK so in summary, we don't see filters as a performance bottleneck of instance spawn. Therefore adding the complexity of parallel execution is not worth it for us.
16:37:22 <harsha24> Again, the number of filters should be less than the number of threads available
16:37:24 <sean-k-mooney> dansmith: yes, I'm not arguing for threads by the way
16:37:33 <sean-k-mooney> harsha24: can you provide any data for this
16:37:35 <dansmith> maybe this request is actually just someone's master's thesis and not really reality? :P
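A minimal sketch of the trade-off discussed above, assuming invented filter classes and a toy host list rather than Nova's real FilterScheduler and BaseHostFilter API: in the ordered chain each filter only sees the hosts the previous filters kept, while the proposed per-filter parallelism runs every filter over the full host set and pays for an intersection afterwards.

    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical stand-ins for scheduler filters; real Nova filters implement
    # host_passes(host_state, spec_obj) on BaseHostFilter, which is not modelled here.
    class CheapFilter:
        def host_passes(self, host):
            return host % 2 == 0       # pretend this quickly rejects half the hosts

    class ExpensiveFilter:
        def host_passes(self, host):
            return host % 10 == 0      # pretend this is the costly NUMA-style check

    def filter_sequentially(hosts, filters):
        # Ordered chain (current behaviour): each filter only sees the hosts the
        # previous filters kept, so a cheap, selective filter placed first shrinks
        # the input for the expensive one, and a NoValidHost points at one filter.
        for f in filters:
            hosts = [h for h in hosts if f.host_passes(h)]
            if not hosts:
                break
        return set(hosts)

    def filter_in_parallel(hosts, filters):
        # Proposed variant: every filter scans the *full* host set in its own
        # thread and the per-filter results are intersected afterwards, so the
        # expensive filter no longer benefits from the cheap one running first.
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(lambda f=f: {h for h in hosts if f.host_passes(h)})
                       for f in filters]
            results = [fut.result() for fut in futures]
        return set.intersection(*results) if results else set(hosts)

    hosts = range(1000)
    filters = [CheapFilter(), ExpensiveFilter()]
    assert filter_sequentially(hosts, filters) == filter_in_parallel(hosts, filters)
    # Same result either way, but the parallel version evaluated ExpensiveFilter
    # against all 1000 hosts instead of the 500 that CheapFilter would have left it.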
16:37:45 <sean-k-mooney> harsha24: so we can look at a specific case
16:38:02 <gibi> sean-k-mooney: ++
16:38:17 <gibi> I would like to see data that proves what we could gain
16:38:25 <gibi> before moving forward with this
16:38:29 <dansmith> the claim is also latency,
16:38:32 <dansmith> not throughput
16:38:37 <sean-k-mooney> harsha24: we have had inefficiently implemented filters in the past which have been optimised either at the DB layer or in python
16:38:44 <bauzas> we're discussing a finite number of possibilities
16:38:51 <artom> dansmith, oh be nice :) If harsha24 is looking to improve Nova performance, we should guide them to where we think the biggest problems are
16:38:58 <bauzas> in general, we have less than 1000 nodes
16:39:06 <bauzas> so parallelism doesn't really help
16:39:23 <bauzas> even 100000 with say 10 filters is reasonable
16:39:24 <sean-k-mooney> well even if we have more, placement will have narrowed it
16:39:30 <bauzas> right
16:39:40 <bauzas> and again, the whole thing was sucking because of the DB access times
16:39:47 <bauzas> again, latency as dansmith said
16:40:08 <dansmith> also, you can improve latency by reducing the number of results from placement, if you're claiming latency on an empty cloud where tons of hosts are considered by the filters
16:40:17 <bauzas> at least that's what a couple of summit sessions from CERN taught me
16:40:30 <sean-k-mooney> CERN reduces it excessively
16:40:37 <bauzas> aarents: what are your findings btw ?
16:40:37 <sean-k-mooney> but yes
16:40:38 <dansmith> which is far less complex than threading any of the rest of the process
16:41:15 <bauzas> ++
16:41:29 <johnthetubaguy> more request filters to remove filters?
16:41:45 <dansmith> johnthetubaguy: that's covered under the placement topic I think, but always better yeah
16:41:47 <sean-k-mooney> where possible
16:41:59 <johnthetubaguy> dansmith: true
16:42:00 <bauzas> johnthetubaguy: well, I wouldn't be that opinionated
16:42:07 <dansmith> meaning, as we do more of that, fewer real filters are required because placement gives us more accurate results
16:42:14 <artom> harsha24, are you following any of this, by the way?
16:42:28 <bauzas> johnthetubaguy: I'd say some ops would prefer having a fine-grained placement query and others would prefer looping over the filters logic
16:42:28 <artom> harsha24, was your idea to implement threading, or more generally to try and improve performance?
16:42:33 <johnthetubaguy> dansmith: +1
16:42:34 <harsha24> what if we use this as a plugin when there are few filters, like less than 3 filters or so
16:42:45 <aarents> bauzas: we live with clusters of 1500 nodes on newton (placement free).. it is slow but it is "usable"
16:42:50 <aarents> live
16:43:08 <bauzas> aarents: ah, I forgot you lag :p
16:43:32 <harsha24> artom: yeah
16:43:57 <sean-k-mooney> harsha24: we had plugins in the past that tried to use the caching scheduler and other tricks to speed up scheduling
16:44:05 <sean-k-mooney> in the long run that approach has not worked out
16:44:29 <sean-k-mooney> you could do an experiment but I'm not sure scheduling is the bottleneck in spawning an instance
16:44:38 <johnthetubaguy> well, the caching scheduler became pointless, now placement does it better
16:44:41 <harsha24> oh okay, then it's not feasible
16:44:57 <gibi> OK, let's wrap this up.
16:45:02 <sean-k-mooney> johnthetubaguy: yeah, more or less
16:45:06 <harsha24> does this method improve performance in weighers?
16:45:08 <bauzas> the caching scheduler was there because of the DB bottleneck, not because of the filters :)
16:45:24 <gibi> so the direct idea of parallelizing the filter execution seems not worth it. Please provide data to prove the gain.
16:45:33 <bauzas> +1
16:45:49 <harsha24> I will try a PoC based on this
16:45:50 <gibi> harsha24: if you are here to improve performance in general, then I think artom had some idea where to gain more
16:45:51 <bauzas> with a devstack env ideally, please
16:45:52 <johnthetubaguy> bauzas: agreed, and the filters, with loads of hosts, too basically zero time, when I measured it in prod at a public cloud
16:46:01 <johnthetubaguy> s/too/took/
16:46:27 <gibi> Is there any other open discussion topic for today?
16:46:37 <artom> gibi, it was literally a random thought in my head, but I guess?
16:46:47 <gibi> artom: :)
16:46:59 <artom> End this! ;)
16:47:00 <bauzas> artom: I'd dare say good luck with improving I/Os on filters
16:47:07 <dansmith> we do async the neutron stuff already, IIRC,
16:47:13 <bauzas> right
16:47:15 <dansmith> and the cinder stuff has dependencies that make it hard, IIRC
16:47:29 <bauzas> we don't call cinder in a filter either way
16:47:37 <bauzas> and I'd -2 any proposal like it
16:47:40 <dansmith> bauzas: artom was randomly picking non-filter things
16:47:46 <bauzas> hah
16:47:52 <artom> *performance related* random things!
16:47:58 <artom> I wasn't talking about, like, cabbages!
16:48:01 <bauzas> but I thought we were discussing filter parallelism
16:48:07 <dansmith> bauzas: so this is not filter-related, but it was confusing in a discussion about...filters :)
16:48:12 <artom> Yes, to improve *performance* :)
16:48:13 <bauzas> right
16:48:14 <dansmith> bauzas: we were...
16:48:16 <sean-k-mooney> let's go to #openstack-nova
16:48:21 <gibi> if nothing else for today then let's close the meeting.
16:48:25 <dansmith> please
16:48:27 <artom> I said "End this!" like 13 lines ago
16:48:28 <sean-k-mooney> and we can continue this there if we want
16:48:31 <gibi> thanks for joining
16:48:31 <bauzas> okay, my tongue is dry
16:48:33 <sean-k-mooney> o/
16:48:36 <gibi> #endmeeting
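A minimal sketch of the concurrent.futures idea artom floated during the open discussion, assuming hypothetical fetch_network_info/fetch_volume_info helpers; they only stand in for independent external REST round-trips and are not real Neutron or Cinder client calls. As dansmith notes above, nova already does the neutron work asynchronously and the cinder calls have ordering dependencies, so this only illustrates the general pattern of overlapping independent calls instead of serialising their latencies.

    import time
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical stand-ins for two independent external REST calls; the sleeps
    # simply model round-trip latency to the respective services.
    def fetch_network_info(instance_uuid):
        time.sleep(0.2)                      # pretend round-trip to Neutron
        return {"ports": [], "instance": instance_uuid}

    def fetch_volume_info(instance_uuid):
        time.sleep(0.2)                      # pretend round-trip to Cinder
        return {"volumes": [], "instance": instance_uuid}

    def gather_external_info(instance_uuid):
        # Run the two independent calls concurrently so the caller waits roughly
        # one round-trip instead of the sum of both.
        with ThreadPoolExecutor(max_workers=2) as pool:
            net = pool.submit(fetch_network_info, instance_uuid)
            vol = pool.submit(fetch_volume_info, instance_uuid)
            return net.result(), vol.result()

    start = time.monotonic()
    gather_external_info("fake-uuid")
    print("elapsed: %.2fs" % (time.monotonic() - start))   # ~0.2s rather than ~0.4s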