16:00:09 #startmeeting nova
16:00:12 Meeting started Thu Aug 27 16:00:09 2020 UTC and is due to finish in 60 minutes. The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:15 The meeting name has been set to 'nova'
16:00:34 o/
16:00:36 o/
16:00:43 o/
16:00:47 let's get started
16:00:49 #topic Bugs (stuck/critical)
16:00:57 We have a high-severity CVE #link https://bugs.launchpad.net/nova/+bug/1890501 that has been fixed on master and ussuri, and patches are going in to the older stable branches.
16:00:59 Launchpad bug 1890501 in OpenStack Compute (nova) stein "Soft reboot after live-migration reverts instance to original source domain XML (CVE-2020-17376)" [Critical,In progress] - Assigned to Lee Yarwood (lyarwood)
16:01:06 \o
16:01:11 o/
16:01:27 besides that I don't see any critical bugs
16:01:36 #link 31 new untriaged bugs (-5 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
16:01:45 #link 6 untagged untriaged bugs (-2 since the last meeting): https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
16:01:54 please look at the untriaged bug list and try to push some bugs forward
16:02:01 also I pinged some of you about specific bugs
16:02:12 thanks gibi for scrubbing the list
16:02:18 \o
16:02:27 do we need to talk about some of the open bugs here?
16:02:51 https://review.opendev.org/#/c/747358/ looks jammed btw
16:03:07 not sure what we need to do to move it into the gate
16:03:24 sorry, ignore that
16:03:31 thought it was the CVE, too many bugs :)
16:03:40 :)
16:03:52 #topic Runways
16:03:57 etherpad #link https://etherpad.opendev.org/p/nova-runways-victoria
16:04:02 bp/provider-config-file has been merged \o/ so its slot in the runway is freed up
16:04:11 the first patch of bp/cyborg-rebuild-and-evacuate now has a +2 from me
16:04:20 the spawn part of bp/add-emulated-virtual-tpm has been merged, and the migrate and resize part has a +2 from me
16:04:27 nothing is in the queue and we have a free runway slot, so if you have a feature ready then it is a good time to add it to the queue
16:04:51 do we need to talk about the items in the runway slots?
16:05:20 gibi: I'll look at the vtpm series again tomorrow morning
16:05:29 cool
16:05:29 thanks
16:05:33 this should potentially free up yet another slot
16:05:45 *potentially* :)
16:06:09 #topic Release Planning
16:06:14 We have 2 weeks until Milestone 3, which is Feature Freeze
16:06:23 I've set up the release tracking etherpad #link https://etherpad.opendev.org/p/nova-victoria-rc-potential
16:06:29 Next week is the last release of non-client libraries for Victoria. For os-vif we would like to get #link https://review.opendev.org/#/c/744816/ merged before the release
16:06:44 Sooner rather than later we have to talk about cycle highlights and the reno prelude
16:07:16 anything else about the coming release that we have to discuss?
16:07:51 https://blueprints.launchpad.net/nova/+spec/parallel-filter-scheduler
16:08:30 harsha24: let's get back to that in the Open Discussion
16:08:41 okay
16:08:52 #topic Stable Branches
16:09:08 lyarwood: do you have any news?
16:10:16 I guess we release from all the stable branches that are not in EM, due to the CVE I mentioned above
16:10:19 elod has been busy cutting releases
16:10:43 cool
16:10:45 elod: no other news aside from that really
16:10:52 yes, the ussuri release is done,
16:10:55 oops, sorry
16:11:04 the train patch is open
16:11:13 stein will come as soon as the patch lands :)
16:11:22 np :)
16:11:36 I guess we can release ussuri but we need to hold for train, no?
16:11:49 * bauzas is lost with all the train queue
16:12:07 the train patch is: https://review.opendev.org/#/c/748383/
16:12:18 https://review.opendev.org/#/c/747358/ would be nice to land, as we did in ussuri
16:12:25 that's the change that's jammed at the moment
16:12:26 bauzas: there are some patches in the queue, but do we want to wait?
16:12:39 I don't know
16:12:45 ask the owner :p
16:12:48 ok, I'll -W the Train release patch then
16:13:15 elod: ack, I'll reset the +W flags on that change, hopefully that should get it back into the gate
16:13:35 OK
16:13:41 lyarwood: it's not jammed actually
16:13:48 just the parent needs to get merged
16:13:51 :]
16:14:04 oh, my bad, I was sure that had already landed
16:14:19 gerrit-- needs to make these dots bigger!
16:14:19 moving on
16:14:24 Libvirt (bauzas)
16:14:37 there is a thing on the agenda
16:14:39 (lyarwood) Looking at bumping MIN_{QEMU,LIBVIRT}_VERSION in V
16:14:40 nothing to report honestly, I know aarents had changes but I didn't pay attention
16:14:45 #link https://review.opendev.org/#/q/topic:bump-libvirt-qemu-victoria+(status:open+OR+status:merged)
16:14:50 This would mean using UCA on bionic ahead of our move to focal, is anyone against that?
16:15:17 I forgot to update the agenda
16:15:22 looking at these patches, most of them are in merge conflict
16:15:28 this looks like it's blocked on the same focal detach device bug
16:15:30 lyarwood: ohh, so this was discussed last week
16:15:35 yeah, sorry
16:15:38 np
16:15:45 gibi: we used to use UCA on 16.04
16:16:05 #link https://bugs.launchpad.net/nova/+bug/1882521 is the bug reported against focal
16:16:06 Launchpad bug 1882521 in OpenStack Compute (nova) "Failing device detachments on Focal" [High,New]
16:16:23 I'm seeing the same issue with UCA on bionic but I can't reproduce it locally
16:17:03 if we are supposed to be moving to focal anyway this needs to get resolved
16:17:08 yeah
16:17:20 I think we were meant to move at m2
16:17:31 or try to, so we should swap sooner rather than later
16:17:38 yeah, I know gmann was still trying to move
16:17:47 I'm hoping somebody will crack that as I did not have time to look into it
16:18:04 same, I'm out until Tuesday after this, which doesn't help
16:18:08 lyarwood: yeah, it is not clear why those fail
16:18:46 as of now those failing tests are skipped to keep the testing going.
16:18:54 lyarwood: could this be due to the persistent vs transient domain difference?
16:19:03 but yes, this is one of the blockers for the move to Focal
16:19:50 sean-k-mooney: maybe, I really can't tell tbh
16:19:54 #action we need a volunteer to look into the Focal bug #link https://bugs.launchpad.net/nova/+bug/1882521
16:19:55 Launchpad bug 1882521 in OpenStack Compute (nova) "Failing device detachments on Focal" [High,New]
16:20:08 for what it's worth, focal and centos8 both use the same version of libvirt
16:20:23 so I would expect this to affect centos8 too
16:20:38 and, by extension, rhel
16:21:16 let's move on now but please continue discussing this bug on #openstack-nova
16:21:27 #topic Stuck Reviews
16:21:46 nothing on the agenda. Is there anything that is stuck?
16:22:28 I'd say https://review.opendev.org/#/c/711604/
16:22:42 I think you missed the API updates
16:22:50 (or I can mention it when talking about community goals)
16:23:23 gmann: ack, sorry, I will get back to that
16:23:29 tosky: as I see it, you need reviews
16:23:31 on that patch
16:23:45 tosky: I can look at it tomorrow
16:23:53 we should rename this section
16:24:07 this is confusing, we're not asking for code waiting to be reviewed
16:24:13 bauzas: sure
16:24:22 but for patches that are stuck because of conflicting opinions
16:24:42 bauzas: I will rename it to reviews with conflicting opinions
16:24:53 I suck at naming things
16:24:55 when was the last time we had one?
16:25:06 so I surely won't propose a thing
16:25:07 maybe we could just remove it from the agenda altogether?
16:25:14 a real one? in the meeting? a pretty long time ago
16:25:15 this could work too
16:25:32 the open discussion is there anyway, so people can argue there
16:25:42 #action gibi to remove stuck reviews from the agenda
16:25:52 moving on
16:25:52 #topic PTG and Forum planning
16:26:01 The next Forum and PTG is less than 2 months from now
16:26:05 summary mail #link http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016770.html
16:26:10 please indicate your acceptable PTG timeslots in #link https://doodle.com/poll/a5pgqh7bypq8piew
16:26:15 please collect topics in #link https://etherpad.opendev.org/p/nova-wallaby-ptg
16:26:43 anything to be discussed about the coming PTG and Forum?
16:27:35 #topic Open discussion
16:27:46 harsha24: what can we do for you?
16:28:12 he's proposing a new design for processing filters that uses threads
16:28:30 we had this discussion in the past and we always said filters aren't a performance bottleneck
16:28:35 I am thinking whether there is a way to reduce the latency of spawning a VM using multithreading
16:28:35 but...
16:28:49 things changed slightly since we have placement
16:29:29 harsha24: the point is, I wouldn't argue for more complexity unless you prove to me there are huge benefits in doing such parallelism
16:29:30 bauzas: well, with placement you have less need for filters
16:29:37 sean-k-mooney: correct
16:29:58 harsha24: do you have measurements of the latency specific to the filter execution?
16:30:09 imagine how much worse NoValidHost will be to debug when it depends on the ordering of threads :)
16:30:11 tbc, all filter processing is memory-based
16:30:18 dansmith: right, that too
16:30:31 filter ordering is crucial for most of our ops
16:30:41 yep
16:30:55 well, all the filters for a given request could probably be put in their own thread without that being an issue
16:30:59 dansmith, well, if it's done properly we'd join() all the threads *then* print out a summary
16:31:09 sean-k-mooney: we can do that today with the number of workers on the scheduler, right?
16:31:10 I remember johnthetubaguy making a proposal about some smarter filter processing in the past
16:31:13 I think the question is - is the complexity and effort worth it?
16:31:13 individual filters on the other hand would be a problem
16:31:28 dansmith: ya, more or less
16:31:31 but that's... yeah, artom said it loudly
16:31:38 harsha24: what was your suggestion?
16:31:40 no, actually each filter outputs its own list of hosts and we make an intersection of the filtered hosts at the end
16:32:06 so each filter runs on the full set of hosts, which means expensive filters cost even more
16:32:07 sean-k-mooney: a classic parallelism approach
16:32:19 yeah
16:32:20 dansmith: yup
16:32:28 so the current filter list has a defined order
16:32:36 honestly, again, the CPU time is very short
16:32:42 Thinking out loud, perhaps just doing the external REST API calls async with something like concurrent.futures might be a much bigger speed gain
16:32:45 and we use generators
16:32:54 if we run them in parallel they need to filter more hosts and we then need to calculate the intersection
16:32:58 honestly, again
16:32:59 so that will use more cpu and memory
16:33:13 IOW - do other stuff while we wait for Cinder/Neutron to answer. I know it's a completely different thing, just continuing the effort/results thought...
16:33:13 operators never saw the filters' performance as a bottleneck
16:33:15 sean-k-mooney: that was my point, yeah
16:33:20 ask CERN if you don't know
16:33:32 do we know which filters are expensive? last time I looked it was the DB queries that dominated
16:33:34 artom: what filters call out to cinder and neutron?
16:33:42 johnthetubaguy: my whole point
16:33:47 exactly
16:33:47 dansmith, not the filters, in the compute manager for example
16:33:53 artom: this is about filters
16:33:53 dansmith, as I said, completely different thing :)
16:33:54 we stopped querying the DB in the filters ages ago
16:33:57 ack
16:33:57 sean-k-mooney: bauzas: we need fewer filters with placement AND the filters run only on the few hosts with room for the allocation (the others are filtered out by placement beforehand)
16:33:57 the aggregate* filters and numa are probably the most expensive
16:34:16 numa I'm sure
16:34:18 dansmith, I know - it was more the "where can we invest to improve speed" idea
16:34:18 aarents: yes
16:34:26 what about weighers?
16:34:30 AGAIN, filters don't call the DB or neutron
16:34:44 numa and affinity were fairly expensive last time I looked
16:34:54 weighers are a separate phase, they could be parallelised but I think they are cheap
16:34:54 harsha24: even worse.. weigh all hosts even though we're going to exclude most?
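To make the trade-off above concrete, here is a minimal sketch (not Nova's actual scheduler code; the filter objects, host list and request spec are stand-ins) of the current sequential style, where each filter only sees the survivors of the previous one, versus the proposed parallel run over the full host list followed by an intersection:

    from concurrent.futures import ThreadPoolExecutor

    def filter_sequentially(filters, hosts, spec):
        # Current style: each filter only sees hosts that survived the
        # previous one, so cheap/strict filters placed first shrink the
        # set before the expensive ones (NUMA, affinity) run.
        for flt in filters:
            hosts = [h for h in hosts if flt.host_passes(h, spec)]
        return hosts

    def filter_in_parallel(filters, hosts, spec):
        # Proposed style: every filter scans the *full* host list in its
        # own thread, then the per-filter results are intersected - the
        # extra work (and lost ordering benefit) pointed out above.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(
                lambda flt: {h for h in hosts if flt.host_passes(h, spec)},
                filters))
        surviving = set.intersection(*results) if results else set(hosts)
        # Preserve the original host order in the returned list.
        return [h for h in hosts if h in surviving]

Even in this toy form the parallel variant does strictly more per-filter work plus a join and an intersection, which is the objection raised in the discussion.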
16:35:05 they just manipulate in-memory python objects to answer a simple question
16:35:19 sean-k-mooney: again, the ordering is crucial
16:35:23 dansmith: well, we could weigh the filtered hosts in parallel
16:35:25 for weighers
16:35:32 bauzas: not for weighers
16:35:37 that is not an ordered list
16:35:50 or is it? I should check that
16:35:51 I mean after the hosts are filtered we can then use parallelism for the weighers
16:35:57 if it's ordered then yes
16:36:03 sean-k-mooney: trust me :)
16:36:08 sean-k-mooney: that seems like a tiny potential for improvement
16:36:18 filter order is not required as we intersect the sets at the end
16:36:21 dansmith: correct
16:36:21 this seems like adding more complexity for no gain
16:36:33 bauzas: +1
16:36:42 harsha24: filter order is currently used to improve performance
16:36:54 sean-k-mooney: there's still overhead in spinning up those threads, joining, and then the sorting is still linear, so I'd super doubt it's worth it :)
16:36:55 sean-k-mooney: +1
16:36:57 you can change the order of the filters to eliminate more hosts in the initial filters
16:37:19 I thought ops were maybe wanting to describe filters and weights another way, but they were okay with a sequential approach
16:37:20 OK so in summary, we don't see filters as a performance bottleneck of instance spawn. Therefore adding the complexity of parallel execution is not worth it for us.
16:37:22 Again, the number of filters should be less than the number of threads available
16:37:24 dansmith: yes, I'm not arguing for threads by the way
16:37:33 harsha24: can you provide any data for this?
16:37:35 maybe this request is actually just someone's master's thesis and not really reality? :P
16:37:45 harsha24: so we can look at a specific case
16:38:02 sean-k-mooney: ++
16:38:17 I would like to see data that proves what we could gain
16:38:25 before moving forward with this
16:38:29 the claim is also latency,
16:38:32 not throughput
16:38:37 harsha24: we have had inefficiently implemented filters in the past which have been optimised either at the db layer or in python
16:38:44 we're discussing a finite number of possibilities
16:38:51 dansmith, oh be nice :) If harsha24 is looking to improve Nova performance, we should guide them to where we think the biggest problems are
16:38:58 in general, we have less than 1000 nodes
16:39:06 so parallelism doesn't really help
16:39:23 even 100000 with say 10 filters is reasonable
16:39:24 well, even if we have more, placement will have narrowed it down
16:39:30 right
16:39:40 and again, the whole thing was sucking because of the DB access times
16:39:47 again, latency, as dansmith said
16:40:08 also, you can improve latency by reducing the number of results from placement, if you're claiming latency on an empty cloud where tons of hosts are considered by the filters
16:40:17 at least that's what a couple of summit sessions from CERN taught me
16:40:30 CERN reduces it excessively
16:40:37 aarents: what are your findings btw?
16:40:37 but yes
16:40:38 which is far less complex than threading any of the rest of the process
16:41:15 ++
16:41:29 more request filters to remove filters?
16:41:45 johnthetubaguy: that's covered under the placement topic I think, but always better, yeah
16:41:47 where possible
16:41:59 dansmith: true
16:42:00 johnthetubaguy: well, I wouldn't be that opinionated
16:42:07 meaning, as we do more of that, fewer real filters are required because placement gives us more accurate results
16:42:14 harsha24, are you following any of this, by the way?
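Two of the knobs touched on above are plain nova.conf settings rather than code changes: the "change the order of the filters to eliminate more hosts in the initial filters" point is controlled by the order of [filter_scheduler] enabled_filters, and the "reduce the number of results from placement" point maps to [scheduler] max_placement_results. A hedged illustration; the filter list and the limit value below are examples only, not recommendations:

    [filter_scheduler]
    # Filters run in the order listed: cheap, coarse checks first so the
    # expensive ones (NUMA, affinity) only see the surviving candidates.
    enabled_filters = ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,NUMATopologyFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter

    [scheduler]
    # Cap the allocation candidates requested from placement so the
    # in-memory filters and weighers have fewer hosts to look at.
    max_placement_results = 100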
16:42:28 johnthetubaguy: I'd say some ops would prefer having a fine-grained placement query and others would prefer looping over the filters logic
16:42:28 harsha24, was your idea to implement threading, or more generally to try and improve performance?
16:42:33 dansmith: +1
16:42:34 what if we use this as a plugin when there are few filters, like less than 3 filters or so?
16:42:45 bauzas: we live with a cluster of 1500 nodes on newton (placement free).. it is slow but it is "usable"
16:43:08 aarents: ah, I forgot you lag :p
16:43:32 artom: yeah
16:43:57 harsha24: we had plugins in the past that tried to use the caching scheduler and other tricks to speed up scheduling
16:44:05 in the long run that approach has not worked out
16:44:29 you could do an experiment but I'm not sure the scheduling is the bottleneck in spawning an instance
16:44:38 well, the caching scheduler became pointless, now placement does it better
16:44:41 oh okay, then it's not feasible
16:44:57 OK, let's wrap this up.
16:45:02 johnthetubaguy: ya, more or less
16:45:06 does this method improve performance in weighers?
16:45:08 the caching scheduler was there because of the DB bottleneck, not because of the filters :)
16:45:24 so the direct idea of parallelising the filter execution seems not worth it. please provide data to prove the gain.
16:45:33 +1
16:45:49 I will try a PoC based on this
16:45:50 harsha24: if you are here to improve performance in general, then I think artom had some ideas about where to gain more
16:45:51 with a devstack env ideally, please
16:45:52 bauzas: agreed, and the filters, even with loads of hosts, took basically zero time when I measured it in prod at a public cloud
16:46:27 Is there any other open discussion topic for today?
16:46:37 gibi, it was literally a random thought in my head, but I guess?
16:46:47 artom: :)
16:46:59 End this! ;)
16:47:00 artom: I'd dare say good luck with improving I/Os on filters
16:47:07 we do async the neutron stuff already, IIRC,
16:47:13 right
16:47:15 and the cinder stuff has dependencies that make it hard, IIRC
16:47:29 we don't call cinder in a filter either way
16:47:37 and I'd -2 any proposal like it
16:47:40 bauzas: artom was randomly picking non-filter things
16:47:46 hah
16:47:52 *performance related* random things!
16:47:58 I wasn't talking about, like, cabbages!
16:48:01 but I thought we were discussing filter parallelism
16:48:07 bauzas: so this is not filter-related, but it was confusing in a discussion about... filters :)
16:48:12 Yes, to improve *performance* :)
16:48:13 right
16:48:14 bauzas: we were...
16:48:16 let's go to #openstack-nova
16:48:21 if nothing else for today then let's close the meeting.
16:48:25 please
16:48:27 I said "End this!" like 13 lines ago
16:48:28 and we can continue this there if we want
16:48:31 thanks for joining
16:48:31 okay, my tongue is dry
16:48:33 o/
16:48:36 #endmeeting
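As a footnote to artom's earlier suggestion ("do other stuff while we wait for Cinder/Neutron to answer"), here is a minimal sketch of overlapping two slow external REST calls with concurrent.futures. This is not Nova code: the get_volume/get_port helpers, their URLs and the IDs are purely hypothetical stand-ins for real service client calls.

    from concurrent.futures import ThreadPoolExecutor

    import requests

    def get_volume(volume_id):
        # Hypothetical stand-in for a Cinder client call.
        return requests.get(
            'https://cinder.example/v3/volumes/%s' % volume_id).json()

    def get_port(port_id):
        # Hypothetical stand-in for a Neutron client call.
        return requests.get(
            'https://neutron.example/v2.0/ports/%s' % port_id).json()

    with ThreadPoolExecutor(max_workers=2) as pool:
        # Start both lookups, then keep doing other work while they run.
        volume_future = pool.submit(get_volume, 'vol-uuid')
        port_future = pool.submit(get_port, 'port-uuid')
        # ... other spawn preparation could happen here ...
        volume = volume_future.result()
        port = port_future.result()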