16:00:09 <gibi> #startmeeting nova
16:00:12 <openstack> Meeting started Thu Aug 27 16:00:09 2020 UTC and is due to finish in 60 minutes.  The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:15 <openstack> The meeting name has been set to 'nova'
16:00:34 <dansmith> o/
16:00:36 <gibi> o/
16:00:43 <gmann> o/
16:00:47 <gibi> lets get started
16:00:49 <gibi> #topic Bugs (stuck/critical)
16:00:57 <gibi> We have a high severity CVE #link https://bugs.launchpad.net/nova/+bug/1890501 that has been fixed on master and ussuri and patches are going into older stable branches.
16:00:59 <openstack> Launchpad bug 1890501 in OpenStack Compute (nova) stein "Soft reboot after live-migration reverts instance to original source domain XML (CVE-2020-17376)" [Critical,In progress] - Assigned to Lee Yarwood (lyarwood)
16:01:06 <bauzas> \o
16:01:11 <stephenfin> o/
16:01:27 <gibi> besides that I don't see any critical bugs
16:01:36 <gibi> #link 31 new untriaged bugs (-5 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
16:01:45 <gibi> #link 6 untagged untriaged bugs (-2 change since the last meeting): https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
16:01:54 <gibi> please look at the untriaged bug list and try to push some bugs forward
16:02:01 <gibi> also I pinged some of you about specific bugs
16:02:12 <bauzas> thanks gibi for scrubbing the list
16:02:18 <lyarwood> \o
16:02:27 <gibi> do we need to talk about some of the open bugs here?
16:02:51 <lyarwood> https://review.opendev.org/#/c/747358/ looks jammed btw
16:03:07 <lyarwood> not sure what we need to do to move it into the gate
16:03:24 <lyarwood> sorry ignore that
16:03:31 <lyarwood> thought it was the CVE, too many bugs :)
16:03:40 <gibi> :)
16:03:52 <gibi> #topic Runways
16:03:57 <gibi> etherpad #link https://etherpad.opendev.org/p/nova-runways-victoria
16:04:02 <gibi> bp/provider-config-file has been merged \o/ so its slot is freed up in the runway
16:04:11 <gibi> the first patch of bp/cyborg-rebuild-and-evacuate now has +2 from me
16:04:20 <gibi> the spawn part of the bp/add-emulated-virtual-tpm has been merged, the migrate and resize part has a +2 from me
16:04:27 <gibi> nothing is in the queue and we have a free runway slot, so if you have a feature ready then it is a good time to add it to the queue
16:04:51 <gibi> do we need to talk about the items in the runway slots?
16:05:20 <bauzas> gibi: I'll look at the vtpm series again tomorrow morning
16:05:29 <gibi> cool
16:05:29 <gibi> thanks
16:05:33 <bauzas> this should potentially free yet another slot
16:05:45 <bauzas> *potentially* :)
16:06:09 <gibi> #topic Release Planning
16:06:14 <gibi> We have 2 weeks until Milestone 3 which is Feature Freeze
16:06:23 <gibi> I've set up the release tracking etherpad #link https://etherpad.opendev.org/p/nova-victoria-rc-potential
16:06:29 <gibi> Next week is the last release of non-client libraries for Victoria. For os-vif we would like to get #link https://review.opendev.org/#/c/744816/ merged before the release
16:06:44 <gibi> Sooner rather than later we have to talk about cycle highlights and the reno prelude
16:07:16 <gibi> anything else about the coming release that we have to discuss?
16:07:51 <harsha24> https://blueprints.launchpad.net/nova/+spec/parallel-filter-scheduler
16:08:30 <gibi> harsha24: let's get back to that in the Open Discussion
16:08:41 <harsha24> okay
16:08:52 <gibi> #topic Stable Branches
16:09:08 <gibi> lyarwood: do you have any news?
16:10:16 <gibi> I guess we release from all the stable branches that are not in EM due to the CVE I mentioned above
16:10:19 <lyarwood> elod: has been busy cutting releases
16:10:43 <gibi> cool
16:10:45 <lyarwood> elod: no other news aside from that really
16:10:52 <elod> yes, ussuri release is done,
16:10:55 <lyarwood> oops sorry
16:11:04 <elod> train patch is open
16:11:13 <elod> stein will come as soon as the patch lands :)
16:11:22 <elod> np :)
16:11:36 <bauzas> I guess we can release ussuri but we need to hold for train, nope ?
16:11:49 * bauzas is lost with all the train queue
16:12:07 <elod> the train patch is: https://review.opendev.org/#/c/748383/
16:12:18 <lyarwood> https://review.opendev.org/#/c/747358/ would be nice to land as we did in ussuri
16:12:25 <lyarwood> that's the change jammed at the moment
16:12:26 <elod> bauzas: there are some patches in the queue, but do we want to wait?
16:12:39 <bauzas> I don't know
16:12:45 <bauzas> ask the owner :p
16:12:48 <elod> ok, I'll -W the Train release patch then
16:13:15 <lyarwood> elod: ack, I'll reset the +W flags on that change, hopefully that should get it back into the gate
16:13:35 <gibi> OK
16:13:41 <elod> lyarwood: not jammed actually
16:13:48 <elod> just the parent needs to get merged
16:13:51 <elod> :]
16:14:04 <lyarwood> oh my bad I was sure that had already landed
16:14:19 <lyarwood> gerrit-- needs to make these dots bigger!
16:14:19 <gibi> moving on
16:14:24 <gibi> Libvirt (bauzas)
16:14:37 <gibi> there is a thing on the agenda
16:14:39 <gibi> (lyarwood) Looking at bumping MIN_{QEMU,LIBVIRT}_VERSION in V
16:14:40 <bauzas> nothing to report honestly, I know aarents had changes but I didn't pay attention
16:14:45 <gibi> #link https://review.opendev.org/#/q/topic:bump-libvirt-qemu-victoria+(status:open+OR+status:merged)
16:14:50 <gibi> This would mean using UCA on bionic ahead of our move to focal, is anyone against that?
16:15:17 <lyarwood> I forgot to update the agenda
16:15:22 <gibi> looking at these patches most of them are in merge conflict
16:15:28 <lyarwood> this looks like it's blocked on the same focal detach device bug
16:15:30 <gibi> lyarwood: ohh, so this was discussed last week
16:15:35 <lyarwood> yeah sorry
16:15:38 <gibi> np
16:15:45 <sean-k-mooney> gibi: we used to use uca on 16.04
16:16:05 <lyarwood> #link https://bugs.launchpad.net/nova/+bug/1882521 is the bug reported against focal
16:16:06 <openstack> Launchpad bug 1882521 in OpenStack Compute (nova) "Failing device detachments on Focal" [High,New]
16:16:23 <lyarwood> I'm seeing the same issue with UCA on bionic but I can't reproduce locally
16:17:03 <lyarwood> if we are supposed to be moving to focal anyway this needs to get resolved
16:17:08 <gibi> yeah
16:17:20 <sean-k-mooney> i think we were meant to move at m2
16:17:31 <sean-k-mooney> or try to, so we should swap sooner rather than later
16:17:38 <lyarwood> yeah I know gmann was still trying to move
16:17:47 <gibi> I'm hoping somebody will crack that as I did not have time to look into it
16:18:04 <lyarwood> same, I'm out until Tuesday after this that doesn't help
16:18:08 <gmann> lyarwood: yeah it is not clear why those fail
16:18:46 <gmann> as of now those failing tests are skipped to keep doing the testing.
16:18:54 <sean-k-mooney> lyarwood: could this be due to a persistent vs transient domain difference
16:19:03 <gmann> but yes this is one of the blockers to move to Focal
16:19:50 <lyarwood> sean-k-mooney: maybe, I really can't tell tbh
16:19:54 <gibi> #action we need a volunteer to look into the Focal bug #link https://bugs.launchpad.net/nova/+bug/1882521
16:19:55 <openstack> Launchpad bug 1882521 in OpenStack Compute (nova) "Failing device detachments on Focal" [High,New]
16:20:08 <sean-k-mooney> for what it's worth, focal and centos8 both use the same version of libvirt
16:20:23 <sean-k-mooney> so i would expect this to affect centos8 too
16:20:38 <sean-k-mooney> and by extension rhel
16:21:16 <gibi> let's move on now but please continue discussing this bug on #openstack-nova
16:21:27 <gibi> #topic Stuck Reviews
16:21:46 <gibi> nothing on the agenda. Is there anything that is stuck?
16:22:28 <tosky> I'd say https://review.opendev.org/#/c/711604/
16:22:42 <gmann> i think you missed API updates
16:22:50 <tosky> (or I can mention it when talking about community goals)
16:23:23 <gibi> gmann: ack, sorry, I will get back to that
16:23:29 <gibi> tosky: as I see you need review
16:23:31 <gibi> on that patch
16:23:45 <gibi> tosky: I can look at it tomorrow
16:23:53 <bauzas> we should rename this section
16:24:07 <bauzas> this is confusing, we're not asking for code waiting to be reviewed
16:24:13 <gibi> bauzas: sure
16:24:22 <bauzas> but for patches that are stuck because of conflicting opinions
16:24:42 <gibi> bauzas: I will rename it to reviews with conflicting opinions
16:24:53 <bauzas> I suck at naming things
16:24:55 <dansmith> when was the last time we had one?
16:25:06 <bauzas> so I won't surely propose a thing
16:25:07 <dansmith> maybe we could just remove it from the agenda altogether?
16:25:14 <gibi> real one? on the meeting? pretty long time ago
16:25:15 <bauzas> this could work too
16:25:32 <bauzas> open discussions are there anyway, so people can argue there
16:25:42 <gibi> #action gibi to remove stuck review from the agenda
16:25:52 <gibi> moving on
16:25:52 <gibi> #topic PTG and Forum planning
16:26:01 <gibi> The next Forum and PTG is less than 2 months from now
16:26:05 <gibi> summary mail #link http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016770.html
16:26:10 <gibi> please indicate your acceptable PTG timeslots in #link https://doodle.com/poll/a5pgqh7bypq8piew
16:26:15 <gibi> please collect topics in #link https://etherpad.opendev.org/p/nova-wallaby-ptg
16:26:43 <gibi> anything to be discussed about the coming PTG and Forum?
16:27:35 <gibi> #topic Open discussion
16:27:46 <gibi> harsha24: what can we do for you?
16:28:12 <bauzas> he's proposing a new design for processing filters using threads
16:28:30 <bauzas> we had this discussion in the past and we always said filters aren't a performance bottleneck
16:28:35 <harsha24> i am thinking is there a way to reduce the latency to spawn a vm using a multithreading concept
16:28:35 <bauzas> but,
16:28:49 <bauzas> things changed slightly since we have placement
16:29:29 <bauzas> harsha24: the point is, I wouldn't argue for more complexity unless you prove to me there are huge benefits in doing such parallelism
16:29:30 <sean-k-mooney> bauzas: well with placement you have less need for filters
16:29:37 <bauzas> sean-k-mooney: correct
16:29:58 <gibi> harsha24: do you have measurement about the latency specific to the filter executions?
16:30:09 <dansmith> imagine how much worse NoValidHost will be to debug when it depends on the ordering of threads :)
16:30:11 <bauzas> tbc, all filter processing is memory-based
16:30:18 <bauzas> dansmith: right, that too
16:30:31 <bauzas> filters ordering is crucial for most of our ops
16:30:41 <dansmith> yep
16:30:55 <sean-k-mooney> well all the filters for a given request could probably be put in their own thread without that being an issue
16:30:59 <artom> dansmith, well, if it's done properly we'd join() all the threads *then* print out a summary
16:31:09 <dansmith> sean-k-mooney: we can do that today with number of workers on the scheduler right?
16:31:10 <bauzas> I remember johnthetubaguy making a proposal about some smarter filter processing in the past
16:31:13 <artom> I think the question is - is the complexity and effort worth it?
16:31:13 <sean-k-mooney> individual filters on the other hand would be a problem
16:31:28 <sean-k-mooney> dansmith: ya more or less
16:31:31 <bauzas> but that's... yeah, artom said it loudly
16:31:38 <sean-k-mooney> harsha24: what was your suggestion
16:31:40 <harsha24> no, actually each filter would output its own list of hosts and we make an intersection of the filtered hosts at the end
16:32:06 <dansmith> so each filter runs on the full set of hosts, which means expensive filters cost even more
16:32:07 <bauzas> sean-k-mooney: a classic parallelism approach
16:32:19 <harsha24> yeah
16:32:20 <bauzas> dansmith: yup
16:32:28 <sean-k-mooney> so the current filter list has a defined order
16:32:36 <bauzas> honestly, again, the CPU time is very short
16:32:42 <artom> Thinking out loud, perhaps just doing the external REST API calls async with something like concurrent.futures might be a much bigger speed gain
16:32:45 <bauzas> and we use generators
16:32:54 <sean-k-mooney> if we run them in parallel they need to filter more hosts and we then need to calculate the intersection
16:32:58 <bauzas> honestly, again
16:32:59 <sean-k-mooney> so that will use more cpu and memory
16:33:13 <artom> IOW - do other stuff while we wait for Cinder/Neutron to answer. I know it's a completely different thing, just continuing the effort/results thought...
16:33:13 <bauzas> operators never saw the filters performance as a bottleneck
16:33:15 <dansmith> sean-k-mooney: that was my point yeah
16:33:20 <bauzas> ask CERN if you don't know
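A minimal sketch of the trade-off discussed above, assuming a simplified, hypothetical filter interface (filter_sequentially, filter_in_parallel and host_passes are illustrative names, not nova's actual scheduler code):

    from concurrent.futures import ThreadPoolExecutor

    def filter_sequentially(filters, hosts, spec):
        # Each filter only sees the hosts the previous filters kept, so
        # putting the cheapest/most selective filters first shrinks the
        # work for the expensive ones -- this is why ordering matters.
        for f in filters:
            hosts = [h for h in hosts if f.host_passes(h, spec)]
        return hosts

    def filter_in_parallel(filters, hosts, spec):
        # Every filter runs over the full host list and the results are
        # intersected at the end -- more total CPU and memory, plus the
        # overhead of starting and joining the threads.
        with ThreadPoolExecutor() as pool:
            results = pool.map(
                lambda f: {h for h in hosts if f.host_passes(h, spec)},
                filters)
            return set.intersection(*results)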
16:33:32 <johnthetubaguy> do we know which filters are expensive, last time I looked its the DB queries that dominated
16:33:34 <dansmith> artom: what filters call out to cinder and neutron?
16:33:42 <bauzas> johnthetubaguy: my whole point
16:33:47 <dansmith> exactly
16:33:47 <artom> dansmith, not the filters, in the compute manager for example
16:33:53 <dansmith> artom: this is about filters
16:33:53 <artom> dansmith, as I said, completely different thing :)
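Separately, artom's aside about overlapping the external REST calls could look roughly like the sketch below; get_neutron_ports and get_cinder_volumes are placeholder names, not real client APIs:

    from concurrent.futures import ThreadPoolExecutor

    def gather_external_info(instance_uuid):
        # Fire both requests, then wait on each; total latency is roughly
        # the slower of the two calls instead of their sum.
        with ThreadPoolExecutor(max_workers=2) as pool:
            ports_future = pool.submit(get_neutron_ports, instance_uuid)
            vols_future = pool.submit(get_cinder_volumes, instance_uuid)
            return ports_future.result(), vols_future.result()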
16:33:54 <bauzas> we stopped querying the DB in the filters ages ago
16:33:57 <dansmith> ack
16:33:57 <aarents> sean-k-mooney: bauzas we need fewer filters with placement AND filters are run only on the few hosts with room for allocation (others are filtered out before by placement)
16:33:57 <sean-k-mooney> the aggregate* filters and numa are probably the most expensive
16:34:16 <dansmith> numa I'm sure
16:34:18 <artom> dansmith, I know - it was more the "where can we invest to improve speed" idea
16:34:18 <sean-k-mooney> aarents: yes
16:34:26 <harsha24> what about weighers
16:34:30 <bauzas> AGAIN, filters don't call DB or neutron
16:34:44 <johnthetubaguy> numa and affinity were fairly expensive last time I looked
16:34:54 <sean-k-mooney> weighers are a separate phase, they could be parallelised but i think they are cheap
16:34:54 <dansmith> harsha24: even worse.. weigh all hosts even though we're going to exclude most?
16:35:05 <bauzas> they just manipulate in-memory python objects to answer a simple question
16:35:19 <bauzas> sean-k-mooney: again, the ordering is crucial
16:35:23 <sean-k-mooney> dansmith: well we could weigh the filtered hosts in parallel
16:35:25 <bauzas> for weighers
16:35:32 <sean-k-mooney> bauzas: not for weighers
16:35:37 <sean-k-mooney> that is not an ordered list
16:35:50 <sean-k-mooney> or is it? i should check that
16:35:51 <harsha24> I mean after we have the filtered hosts we can then use parallelism for the weighers
16:35:57 <sean-k-mooney> if it's ordered then yes
16:36:03 <bauzas> sean-k-mooney: trust me :)
16:36:08 <dansmith> sean-k-mooney: that seems like a tiny potential for improvement
16:36:18 <harsha24> filter order is not required as we intersect the sets at the end
16:36:21 <sean-k-mooney> dansmith: correct
16:36:21 <bauzas> this seems putting more complexity for no gain
16:36:33 <johnthetubaguy> bauzas: +1
16:36:42 <sean-k-mooney> harsha24: filter order is currently used to improve performance
16:36:54 <dansmith> sean-k-mooney: there's still overhead in spinning up those threads, joining, and then the sorting is still linear, so I'd super doubt it's worth it :)
16:36:55 <johnthetubaguy> sean-k-mooney: +1
16:36:57 <sean-k-mooney> you can change the order of the filters to eliminate more hosts in the initial filters
16:37:19 <bauzas> I thought ops were maybe wanting to describe filters and weights another way, but they were okay with a sequential approach
16:37:20 <gibi> OK so in summary, we don't see filters as a performance bottleneck of instance spawn. Therefore adding the complexity of parallel execution is not worth it for us.
16:37:22 <harsha24> Again the number of filters should be less than the number of threads available
16:37:24 <sean-k-mooney> dansmith: yes im not arguing for threads by the way
16:37:33 <sean-k-mooney> harsha24: can you provide any data for this
16:37:35 <dansmith> maybe this request is actually just someone's master's thesis and not really reality? :P
16:37:45 <sean-k-mooney> harsha24: so we can look at a specific case
16:38:02 <gibi> sean-k-mooney: ++
16:38:17 <gibi> I would like to see data that proves what we could gain
16:38:25 <gibi> before moving forward with this
16:38:29 <dansmith> the claim is also latency,
16:38:32 <dansmith> not throughput
16:38:37 <sean-k-mooney> harsha24: we have had inefficiently implemented filters in the past which have been optimised either at the db layer or in python
16:38:44 <bauzas> we're discussing a finite number of possibilities
16:38:51 <artom> dansmith, oh be nice :) If harsha24 is looking to improve Nova performance, we should guide them to where we think the biggest problems are
16:38:58 <bauzas> in general, we have less than 1000 nodes
16:39:06 <bauzas> so parallelism doesn't really help
16:39:23 <bauzas> even 100000 with say 10 filters is reasonable
16:39:24 <sean-k-mooney> well even if we have more, placement will have narrowed it
16:39:30 <bauzas> right
16:39:40 <bauzas> and again, the whole thing was sucking because of the DB access times
16:39:47 <bauzas> again, latency as dansmith said
16:40:08 <dansmith> also, you can improve latency by reducing the number of results from placement, if you're claiming latency on an empty cloud where tons of hosts are considered by the filters
16:40:17 <bauzas> at least that's what a couple of summit sessions from CERN taught me
16:40:30 <sean-k-mooney> cern reduce it excessively
16:40:37 <bauzas> aarents: what are your findings btw ?
16:40:37 <sean-k-mooney> but yes
16:40:38 <dansmith> which is far less complex than threading any of the rest of the process
16:41:15 <bauzas> ++
16:41:29 <johnthetubaguy> more request filters to remove filters?
16:41:45 <dansmith> johnthetubaguy: that's covered under the placement topic I think, but always better yeah
16:41:47 <sean-k-mooney> where possible
16:41:59 <johnthetubaguy> dansmith: true
16:42:00 <bauzas> johnthetubaguy: well, I wouldn't be that opiniated
16:42:07 <dansmith> meaning, as we do more of that, fewer real filters are required because placement gives us more accurate results
16:42:14 <artom> harsha24, are you following any of this, by the way?
16:42:28 <bauzas> johnthetubaguy: I'd say some ops would prefer having a fine-grained placement query and others would prefer looping over the filters logic
16:42:28 <artom> harsha24, was your idea to implement threading, or more generally to try and improve performance?
16:42:33 <johnthetubaguy> dansmith: +1
16:42:34 <harsha24> what if we use this as a plugin when there are few filters like less than 3 filters or so
16:42:45 <aarents> bauzas: we live with a cluster of 1500 nodes on newton (placement free).. it is slow but it is "usable"
16:43:08 <bauzas> aarents: ah, I forgot you lag :p
16:43:32 <harsha24> artom yeah
16:43:57 <sean-k-mooney> harsha24: we had plugins in the past that tried to use the caching scheduler and other tricks to speed up scheduling
16:44:05 <sean-k-mooney> in the long run that approach has not worked out
16:44:29 <sean-k-mooney> you could do an experiment but im not sure the scheduling is the bottleneck in spawning an instance
16:44:38 <johnthetubaguy> well, caching scheduler became pointless, now placement does it better
16:44:41 <harsha24> oh okay then its not feasible
16:44:57 <gibi> OK lets wrap this up.
16:45:02 <sean-k-mooney> johnthetubaguy: ya more or less
16:45:06 <harsha24> does this method improve performance for weighers?
16:45:08 <bauzas> caching scheduler was there because of the DB bottleneck, not because of the filters :)
16:45:24 <gibi> so the direct idea of parallelizing the filter execution does not seem worth it. please provide data to prove the gain.
16:45:33 <bauzas> +1
16:45:49 <harsha24> I will try a PoC based on this
16:45:50 <gibi> harsha24: if you are here to improve performance in general, then I think artom had some ideas on where to gain more
16:45:51 <bauzas> with a devstack env ideally, please
16:45:52 <johnthetubaguy> bauzas: agreed, and the filters, even with loads of hosts, took basically zero time when I measured it in prod at a public cloud
16:46:27 <gibi> Is there any other open discussion topic for today?
16:46:37 <artom> gibi, it was literally a random thought in my head, but I guess?
16:46:47 <gibi> artom: :)
16:46:59 <artom> End this! ;)
16:47:00 <bauzas> artom: I'd dare say good luck with improving I/Os on filters
16:47:07 <dansmith> we do async the neutron stuff already, IIRC,
16:47:13 <bauzas> right
16:47:15 <dansmith> and the cinder stuff has dependencies that make it hard, IIRC
16:47:29 <bauzas> we don't call cinder on a filter either way
16:47:37 <bauzas> and I'd -2 any proposal like it
16:47:40 <dansmith> bauzas: artom was randomly picking non-filter things
16:47:46 <bauzas> hah
16:47:52 <artom> *performance related* random things!
16:47:58 <artom> I wasn't talking about, like cabbages!
16:48:01 <bauzas> but I thought we were discussing filter parallelism
16:48:07 <dansmith> bauzas: so this is not filter-related, but it was confusing in a discussion about...filters :)
16:48:12 <artom> Yes, to improve *performance* :)
16:48:13 <bauzas> right
16:48:14 <dansmith> bauzas: we were...
16:48:16 <sean-k-mooney> lets go to #openstack-nova
16:48:21 <gibi> if nothing else for today then lets close the meeting.
16:48:25 <dansmith> please
16:48:27 <artom> I said "End this!" like 13 lines ago
16:48:28 <sean-k-mooney> and we can continue this there if we want
16:48:31 <gibi> thanks for joining
16:48:31 <bauzas> okay, my tongue is dry
16:48:33 <sean-k-mooney> o/
16:48:36 <gibi> #endmeeting