14:00:30 <efried> #startmeeting nova
14:00:31 <openstack> Meeting started Thu May 16 14:00:30 2019 UTC and is due to finish in 60 minutes.  The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:35 <openstack> The meeting name has been set to 'nova'
14:00:37 <cdent> o/
14:00:40 <edleafe> \o
14:00:40 <mriedem> HI
14:00:40 <artom> ~o~
14:00:41 <takashin> o/
14:00:45 <artom> (my name is)
14:00:52 <mriedem> slim shady
14:01:07 <artom> *scratching noises*
14:01:13 <efried> wow
14:01:17 * johnthetubaguy lurks until he has to run to a doctors appointment
14:01:43 <artom> Cell service in the subway ftw
14:02:01 <efried> #link agenda https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
14:02:11 <efried> #topic Last meeting
14:02:11 <efried> #link Minutes from last meeting: http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-05-09-21.01.html
14:02:18 <efried> A few items from last time to follow up on...
14:02:27 <efried> fup efried action to track down owner of review status page http://status.openstack.org/reviews/#nova
14:02:47 <efried> I have gotten as far as finding out where the source is (openstack/reviewday project) but haven't dug in yet.
14:03:08 <mriedem> wow
14:03:08 <mriedem> Page refreshed at 2019-05-09 06:38:29 UTC - 466 active reviews
14:03:16 <efried> Anyone wants to hack around, knock yourself out. Lemme know what you find.
14:03:21 <mriedem> might ping infra (fungi) to see if it's busted
14:03:28 <efried> why, is that wrong?
14:03:58 <mriedem> he's been pinged
14:04:32 <efried> oh, yeah, looks like there's a bit over 700 actually open
14:04:51 <cdent> what does that page even mean?
14:04:58 <fungi> yeah, when our current fires are extinguished hopefully someone can check on status.o.o and find out if there's any error from the cron or whatever that regenerates that content
14:05:00 <efried> that's what we'd like to figure out.
14:05:11 <efried> ^ to cdent
14:05:16 <mriedem> cdent: we're not sure how the scoring is calculated
14:05:18 <efried> ...and figure out how we can use it.
14:05:28 <mriedem> but otherwise it's just a place with all the open nova reviews, sortable
14:05:40 <efried> sortable by some criteria we don't understand.
14:05:41 <mriedem> i think part of the heat factor is age
14:05:46 <fungi> you may have to dig into the reviewstats source code, but i think the scoring has to do with launchpad bug priority
14:05:52 <fungi> and age
14:06:02 <mriedem> and maybe lp heat value, idk
14:06:05 <mriedem> but that would make sense
14:06:28 <efried> #link cycle themes are still up for review https://review.opendev.org/657171
14:06:59 <efried> This has a couple +2s and a number of +1s. I'm tempted to say I'll merge it in a week if no objections from this point.
14:07:07 <efried> does that work?
14:07:16 <johnthetubaguy> +1 doing the merge
14:07:44 <efried> ight
14:08:06 <mriedem> i'll take a look after this meeting
14:08:12 <efried> thanks.
14:08:14 <efried> and last fup, couple of patches were highlighted for review last week.
14:08:20 <efried> https://review.opendev.org/#/c/643023/
14:08:20 <efried> https://review.opendev.org/#/c/643024/
14:08:33 <artom> Got dragged away for downstream bugfixes/backports, didn't get a chance to look :(
14:08:58 <efried> It looked like sean-k-mooney was into them as well, but was on vacation last week; I'll poke.
14:09:37 <efried> done
14:09:45 <efried> #topic Release News
14:10:04 <efried> anything?
14:10:28 <efried> #topic Bugs (stuck/critical)
14:10:28 <efried> No Critical bugs
14:10:28 <efried> #link 78 new untriaged bugs (up 2 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
14:10:28 <efried> #link 10 untagged untriaged bugs (no change since the last meeting): https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
14:11:21 <efried> From last week, bug 1827083
14:11:22 <openstack> bug 1827083 in OpenStack-Gate "ERROR: Could not install packages due to an EnvironmentError: HTTPSConnectionPool(host='git.openstack.org', port=443): Max retries exceeded with url: /cgit/openstack/requirements/plain/upper-constraints.txt (Caused by NewConnectionError('<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x7febbf6ae630>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)) in vexxhost-sjc1" [Undecided,Confirmed] https://launchpad.net/bugs/1827083
14:11:39 <efried> looks like this has at least been worked around by making vexxhost not use ipv6
14:11:47 <efried> and then some conditional ipv6ing
14:11:55 <efried> but the bug isn't closed. mriedem, whassa deal, yo?
14:11:56 <mriedem> yes http://status.openstack.org/elastic-recheck/#1827083
14:12:01 <mriedem> mnaser is in china
14:12:09 <mriedem> so the workaround is forced ipv4
14:12:16 <mriedem> next steps are up to infra, not me
14:12:21 <mnaser> hi
14:12:23 <efried> flatline since those fixes merged, so that's good.
14:12:39 <efried> still open because more permanent solution pending?
14:12:57 <mriedem> well we want ipv6 testing in the gate per one of the release goals i believe,
14:13:06 <mriedem> and that region was all ipv6 until this week i think
14:13:11 <mnaser> ah yes, that.  IPv6 works till it doesn’t and I don’t know why only that hits it. Anyways, I’ll try to dig deeper soon with Infra hopefully.
14:13:14 <mriedem> so yeah i'm sure people (again, infra) will be working on it
14:13:30 <efried> cool. Meantime mitigated, so \o/
14:13:38 <mriedem> yes thank clarkb
14:13:42 <artom> But... the underlying setup can be done with IPv4 even if we then test IPv6 in the tenant networks, no?
14:14:14 <mriedem> yes, we've had ipv6 testing in the gate with tempest for a long time
14:15:16 <clarkb> ya the switch here was to use external dns via ipv4 instead of ipv6
14:15:28 <clarkb> the tests themselves can still use ipv6 internally
14:15:45 <clarkb> it was external connectivity we struggled with
14:15:58 <artom> Yeah, we're still testing IPv6 correctly
14:16:32 <efried> otherwise gate looks pretty healthy (keinehorah, ptoo-ptoo-ptoo)
14:16:52 <efried> 3rd party CI
14:16:52 <efried> #link 3rd party CI status http://ciwatch.mmedvede.net/project?project=nova&time=7+days
14:16:52 <efried> ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa looks bad - anyone know anything about this?
14:17:34 <mriedem> dtantsur was asking about some ironic job n-cpu logs the other day, but for a stable branch (rocky) i think
14:17:36 <mriedem> not sure if that would be related
14:17:58 <mriedem> http://logs.openstack.org/32/634832/29/check/ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa/fba9197/controller/logs/devstacklog.txt.gz#_2019-05-16_03_28_21_590
14:18:04 <mriedem> looks like the job is f'ed
14:18:12 <efried> also this http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006314.html
14:18:16 <efried> not sure if that's related
14:18:20 <mriedem> die 1865 'Timed out waiting for Nova to track 1 nodes'
14:18:42 <efried> when did that job stop voting?
14:18:43 <mriedem> looks like they are waiting for some CUSTOM_GOLD trait to show up
14:18:48 <mriedem> i don't think it ever was voting
14:18:53 <efried> hmph, okay.
14:18:55 <mriedem> it used to timeout all the time (years ago)
14:19:20 <mriedem> TheJulia: ^ known issue?
14:19:25 <mriedem> ++ /opt/stack/ironic/devstack/lib/ironic:wait_for_nova_resources:1865 :   die 1865 'Timed out waiting for Nova to track 1 nodes'
14:19:38 <mriedem> anyway we can sort that out and track it outside of the meeting
14:19:50 <efried> cool
14:20:01 <efried> Anything else on bugs, gate, CI, etc?
14:20:11 <cdent> I've raised a flag internally (again) on the lack of health from vmware ci. there was some enthusiasm last week about "move everything to zuul v3" but that's dependent on locating some "lost" hardware
14:20:32 <mriedem> starlingx has reported what is for them a critical bug https://bugs.launchpad.net/nova/+bug/1829062
14:20:34 <openstack> Launchpad bug 1829062 in StarlingX "nova placement api non-responsive due to eventlet error" [Critical,In progress] - Assigned to Gerry Kopec (gerry-kopec)
14:20:37 <mriedem> related to the eventlet wsgi stuff
14:21:02 <mriedem> it sounds like the ultimate fix is melwitt's series to drop eventlet usage from the api
14:21:25 <efried> who's qualified to deep-review ^ ?
14:21:25 <mriedem> https://review.opendev.org/#/q/topic:cell-scatter-gather-futurist+(status:open+OR+status:merged)
14:21:41 <efried> mdbooth?
14:21:41 <mriedem> i haven't been paying much attention to it, but i know mdbooth has,
14:21:55 <cdent> I continue to think (as I said on the review) that we should only scatter gather when there are >2 cells
14:21:56 <mriedem> it sounds like the open nagging issue is not knowing if a thread is hung or something?
14:22:15 <cdent> that's an aspect yes, but mdbooth thinks that shouldn't be a "real" problem
14:22:30 <mriedem> there was some talk about down cells behavior with that change and i dropped my testing guide patch for down cells if people want to test that out with melwitt's patches applied
14:22:37 <sean-k-mooney> mriedem: the concern was if you had several threads hang waiting for a response then you could exhaust the thread pool
14:23:07 <mriedem> but we don't actually know if we have a case for hung threads right?
14:23:11 <mriedem> this is just conjecture?
14:23:34 <sean-k-mooney> yes more or less
14:23:38 <mriedem> we can find out if a down cell breaks this by testing it with devstack, it's pretty easy
14:23:39 <cdent> aye
14:23:39 <sean-k-mooney> i mean it could happen
14:23:47 <mriedem> anything can happen...
14:23:55 <mriedem> so we know eventlet + wsgi is bad
14:24:11 <mriedem> we're not sure what can happen with mel's changes, but we can get more info by testing it with a down cell
14:24:12 <sean-k-mooney> it was raised by gibi and dan on the review which is why we are giving it credence
14:24:23 <mriedem> ok gibi is out for a bit
14:24:30 <mriedem> i'm not sure what dansmith's current thoughts are on it
14:24:40 * dansmith is on a call
14:24:41 <mriedem> sounds like next step is testing her patches with down cells?
14:24:57 <sean-k-mooney> for a down cell it should not be an issue
14:25:00 <dansmith> it needs to be multiple down cells with lots of api traffic,
14:25:23 <sean-k-mooney> the edge case was if the request hangs after the connection to the cell has started
14:25:31 <dansmith> but I also kinda don't see the point of doing this tbh, and I thought there were a couple things we could do to get the monkey patching in order to fix the acute problem
14:25:58 <dansmith> I'm super wary of having two threading models in code that doesn't have a strong separation... asking for trouble, IMHO,
14:26:07 <dansmith> but I don't really have time to dig deep on this
14:26:19 <sean-k-mooney> well the issue is the api was not monkey patched before when running under wsgi, and wsgi + eventlet has issues
14:26:53 <mriedem> ok i don't know what the alternatives are that dan's referring to, like i said i'm not heavily involved in this one
14:27:24 <mriedem> we could punt and only scatter/gather if there are >2 cells, but that just punts the problem to someone like cern to hit it when they get to stein
14:27:53 <mriedem> so i'm not in love with that option personally
14:28:04 <mriedem> anyway i guess we can move on
14:28:19 <mriedem> seems like by now people would have figured out problems with wsgi and multi-threading in python?
14:28:37 * mriedem re-writes nova-api in EJBs!
14:28:42 <artom> Well, eventlet isn't real multithreading...
14:28:43 <cdent> yes, several ideas are discussed on the review, but nothing has congealed out of the goo
14:28:55 <mriedem> artom: i mean without eventlet
14:28:58 <edleafe> EJB does sound promising!
14:29:04 <mriedem> if there are dangers with wsgi + python std lib concurrency stuff
14:29:10 <sean-k-mooney> by the way the workaround for people until we fix this is to go back to running the api via the console script command
14:29:16 <efried> edleafe is going to rewrite nova with graph databases
14:29:32 <artom> It'd immediately solve NUMA in placement ;)
14:29:36 <edleafe> efried: s/graph/distributed
14:29:36 <mriedem> i was also going to kill the nova-api eventlet stuff about a year ago...
14:29:38 <sean-k-mooney> that is a performance hit but it works
14:29:40 <mriedem> good thing i got busy
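[editor's note: for readers following the eventlet discussion above, here is a minimal sketch of the general thread-pool scatter-gather approach melwitt's cell-scatter-gather-futurist series moves toward. It uses the standard library's concurrent.futures rather than nova's actual futurist-based code; the function names, structure, and timeout value are illustrative assumptions, not taken from the patches. The per-result timeout keeps one unresponsive cell from making the API wait forever, while the hung worker thread itself can linger in the background, which is the pool-exhaustion concern raised in the discussion.]

    # Minimal sketch (not nova's implementation): query several cell
    # databases in parallel with plain OS threads instead of eventlet
    # greenthreads, treating any cell that exceeds the timeout as down.
    from concurrent.futures import ThreadPoolExecutor, TimeoutError

    CELL_TIMEOUT = 60  # seconds; illustrative value only

    def query_cell(cell_name):
        """Placeholder for a per-cell database query."""
        return {"cell": cell_name, "instances": []}

    def scatter_gather(cell_names):
        pool = ThreadPoolExecutor(max_workers=len(cell_names))
        futures = {pool.submit(query_cell, name): name for name in cell_names}
        results = {}
        for future, name in futures.items():
            try:
                results[name] = future.result(timeout=CELL_TIMEOUT)
            except TimeoutError:
                # Record the cell as unreachable instead of blocking the API.
                results[name] = None
        # Don't wait for any still-hung workers; they may keep running,
        # which is why a bounded pool can still be exhausted over time.
        pool.shutdown(wait=False)
        return results

    if __name__ == "__main__":
        print(scatter_gather(["cell0", "cell1"]))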
14:29:51 <efried> moving on.
14:29:51 <efried> #topic Reminders
14:29:51 <efried> Summit, Forum, and PTG happened
14:29:51 <efried> #link PTG summary emails (searching for ".*[nova].*[ptg] Summary" will get most of them) http://lists.openstack.org/pipermail/openstack-discuss/2019-May/
14:30:08 <efried> Any other reminders?
14:31:09 <efried> #topic Stable branch status
14:31:09 <efried> #link Stein regressions: http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005637.html
14:31:09 <efried> No change since last week (one bug still open, bug 1824435, no great solution yet)
14:31:11 <openstack> bug 1824435 in OpenStack Compute (nova) stein "fill_virtual_interface_list migration fails on second attempt" [High,Triaged] https://launchpad.net/bugs/1824435
14:31:25 <efried> #link stable/stein: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/stein
14:31:25 <efried> #link stable/rocky: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/rocky
14:31:25 <efried> #link stable/queens: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/queens
14:31:25 <efried> #link stable/pike: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/pike
14:31:55 <efried> There was a question from cdent a few days ago about backporting something to ocata. It sounded like it was de-confusing a message?
14:32:09 <cdent> mriedem and I worked out it wasn't worth doing
14:32:13 <efried> okay, cool.
14:32:18 <cdent> as it wouldn't be a backport as pike changed it
14:32:29 <cdent> s/it/it a lot/
14:32:54 <efried> so it would be an ocata-only change, and noncritical, so punt?
14:33:11 <efried> Anything else stable-related?
14:33:37 <cdent> (yes on the punt)
14:34:21 <efried> #topic Sub/related team Highlights
14:34:25 <efried> Placement
14:34:25 <efried> cdent was traveling, but we had a brief meeting on Monday without him
14:34:25 <efried> #link placement meeting log http://eavesdrop.openstack.org/meetings/placement/2019/placement.2019-05-13-14.00.log.html
14:34:30 <efried> The main nova-related things were...
14:34:38 <efried> #link WIP spec for nested magic https://review.opendev.org/#/c/658510/
14:34:38 <efried> #link spec for rg/rp mapping https://review.opendev.org/#/c/657582/
14:34:56 <efried> It would be nice if nova folks could have a look at those ^ and make sure they're going to satisfy nova use cases
14:35:17 * artom adds the nested magic one to his queue
14:35:19 <efried> also look with an eye for how we could simplify ^ and still meet the use cases :) (especially the nested magic one)
14:35:44 <artom> Will get to it when all the downstream fires have been put out. So, next year :P
14:36:08 <efried> artom: it's a scintillating read, I promise you.
14:36:15 <efried> cdent: anything else placement-that-affects-nova you want to go over?
14:36:27 <cdent> no sir
14:36:36 <efried> anyone else?
14:36:46 <efried> API (gmann)
14:36:59 <efried> no notes in the agenda, no gmann in the channel. Anyone have anything here?
14:38:09 <efried> #topic Stuck Reviews
14:38:17 <efried> nothing on the agenda. Anyone?
14:38:34 <efried> #topic Review status page
14:39:18 <efried> we talked about this above. fup with infra to make sure it's working. fup hacking the repo to see wtf it's doing. fup brainstorm on whether/how to use it to make the world a better place.
14:39:28 <efried> #topic Open discussion
14:39:47 <efried> in the spirit of being good little community citizens, I have started
14:39:47 <efried> #link WIP TC Vision Reflection https://review.opendev.org/658932
14:39:56 <efried> #help with this, please.
14:40:33 <efried> Any other opens?
14:40:45 <jangutter> Sorry for asking a question that might already have been answered: has anyone managed to dig up a mirror to the train ptg etherpad somewhere?
14:41:12 <efried> jangutter: Yeah, sean-k-mooney sent a copy (undecorated, unfortunately) to the ML
14:41:30 <efried> #link nova train ptg etherpad backup http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006243.html
14:41:40 <jangutter> efried: thanks!
14:41:50 <aspiers> #link alternative etherpad backup from infra team http://paste.openstack.org/show/751315/
14:42:10 <efried> thanks aspiers.
14:42:21 <efried> unfortunately, same lack of formatting, but content is there.
14:43:00 <efried> we had lost the authorship colors in the various transitions anyway, so the main loss is just the strikethroughs
14:43:09 <efried> and we had struck through everything pretty much anyway, so...
14:43:18 <efried> Okay, anything else before we wrap?
14:43:31 <artom> Is there going to be an open discussion thing?
14:43:40 <cdent> you're in it
14:43:51 <efried> artom: your mic
14:43:53 <artom> Wanted to quickly ask about stable branch same company approvals - for instance, who would be able to +W https://review.opendev.org/#/c/657125/
14:44:02 <efried> o right
14:44:39 <efried> my knee-jerk reaction is that same-company approvals don't really apply to backports.
14:44:41 <artom> (Are we writing down the master branch policy anywhere? Might be good to add the stable branch policy as well)
14:44:58 <efried> stable decisions are based on suitability for backporting; the technical decisions were already made on the master patch.
14:45:04 <sean-k-mooney> artom: we decided to not write it down
14:45:14 <efried> (except for the long email thread that's written down)
14:45:15 <artom> sean-k-mooney, aha, keep it in the cloud ;)
14:45:36 <mriedem> as a stable core i would be able to +2 it, if i don't -1 it first
14:45:42 <efried> artom: in case you missed it:
14:45:42 <efried> #link same-company approvals ML thread http://lists.openstack.org/pipermail/openstack-discuss/2019-May/thread.html#5865
14:45:54 <artom> mriedem, right, but you're not RH
14:45:58 <sean-k-mooney> mriedem: :)
14:46:04 <artom> I was more wondering if melwitt, for example, could come along and +W it
14:46:15 <mriedem> she should not IMO
14:46:18 <artom> efried, yeah, I followed that
14:46:22 <mriedem> especially given this change is arguably a feature
14:46:28 <sean-k-mooney> artom: um, she could if a non-redhatter was the first +2
14:46:48 <artom> So stable is also 2 +2s?
14:46:57 <mriedem> not necessarily
14:47:08 <efried> assume we count the author of the master patch, not the proposer of the backport, as the author of record for purposes of same-company approval decisions?
14:47:10 <mriedem> backport from a stable core is generally considered a proxy +2 if it's a clean backport
14:47:33 <mriedem> efried: the original author of this change is RH
14:47:52 <efried> right, I'm confirming that the original author, not the backport proposer, is who we care about when talking about same-company.
14:47:53 <artom> mriedem, ah, so Lee backports a thing, that's a +2 in the bag if it's clean
14:48:04 <mriedem> artom: for some things yes
14:48:11 <sean-k-mooney> efried: yes it is
14:48:11 <mriedem> not really for this b/c it's big as hell
14:48:13 <artom> Dammit, nothing is black and white!
14:48:14 <mriedem> and feature-y
14:48:27 <mriedem> sorry, i'll get right to work on that law degree
14:48:28 <efried> artom: That's why we don't want to write it down.
14:48:42 <artom> efried, fair enough.
14:48:59 <artom> Anyways, don't want to take up too much time. My takeaway is, use your judgment and don't piss off mriedem  :D
14:49:09 <sean-k-mooney> anyway i think people are aware of this patch and can review it now
14:49:32 <efried> so the answer to the general question is "case by case". And sounds like the answer for this specific patch is "let's not allow same-company".
14:49:44 <artom> sean-k-mooney, for the record, I was using the patch as an example because I got asked that question about that patch
14:50:02 <mriedem> is there a day where you guys downstream aren't talking about this same company approval thing?
14:50:13 <artom> mriedem, no, we all dream about it
14:50:26 <sean-k-mooney> ya well for that patch the answer is mriedem, johnthetubaguy, or claudiu can +w
14:50:39 <mriedem> just start forking the code and be done with it
14:50:52 <mriedem> claudiu isn't really upstream anymore
14:51:00 <mriedem> anyway, can we move on?
14:51:04 <artom> Dammit dude, the reason I'm asking here is because I actually care about upstream :)
14:51:19 <mriedem> let's hug
14:51:25 <artom> Bring it in brah
14:51:28 <mriedem> i know you do, but i shoot the messenger
14:51:35 <mriedem> sorry
14:51:40 <efried> can we end on another rap tune?
14:51:51 <efried> you don't
14:51:51 <efried> wanna f with mriedem
14:51:51 <efried> cause mriedem
14:51:51 <efried> will f'in hug you
14:52:05 <artom> Got 99 problems but upstream ain't one?
14:52:05 <efried> #endmeeting