14:00:30 #startmeeting nova
14:00:31 Meeting started Thu May 16 14:00:30 2019 UTC and is due to finish in 60 minutes. The chair is efried. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:32 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:35 The meeting name has been set to 'nova'
14:00:37 o/
14:00:40 \o
14:00:40 HI
14:00:40 ~o~
14:00:41 o/
14:00:45 (my name is)
14:00:52 slim shady
14:01:07 *scratching noises*
14:01:13 wow
14:01:17 * johnthetubaguy lurks until he has to run to a doctors appointment
14:01:43 Cell service in the subway ftw
14:02:01 #link agenda https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
14:02:11 #topic Last meeting
14:02:11 #link Minutes from last meeting: http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-05-09-21.01.html
14:02:18 A few items from last time to follow up on...
14:02:27 fup efried action to track down owner of review status page http://status.openstack.org/reviews/#nova
14:02:47 I have gotten as far as finding out where the source is (openstack/reviewday project) but haven't dug in yet.
14:03:08 wow
14:03:08 Page refreshed at 2019-05-09 06:38:29 UTC 466 active reviews
14:03:16 Anyone wants to hack around, knock yourself out. Lemme know what you find.
14:03:21 might ping infra (fungi) to see if it's busted
14:03:28 why, is that wrong?
14:03:58 he's been pinged
14:04:32 oh, yeah, looks like there's a bit over 700 actually open
14:04:51 what does that page even mean?
14:04:58 yeah, when our current fires are extinguished hopefully someone can check on status.o.o and find out if there's any error from the cron or whatever that regenerates that content
14:05:00 that's what we'd like to figure out.
14:05:11 ^ to cdent
14:05:16 cdent: we're not sure how the scoring is calculated
14:05:18 ...and figure out how we can use it.
14:05:28 but otherwise it's just a place with all the open nova reviews, sortable
14:05:40 sortable by some criteria we don't understand.
14:05:41 i think part of the heat factor is age
14:05:46 you may have to dig into the reviewstats source code, but i think the scoring has to do with launchpad bug priority
14:05:52 and age
14:06:02 and maybe lp heat value, idk
14:06:05 but that would make sense
14:06:28 #link cycle themes are still up for review https://review.opendev.org/657171
14:06:59 This has a couple +2s and a number of +1s. I'm tempted to say I'll merge it in a week if no objections from this point.
14:07:07 does that work?
14:07:16 +1 doing the merge
14:07:44 ight
14:08:06 i'll take a look after this meeting
14:08:12 thanks.
14:08:14 and last fup, couple of patches were highlighted for review last week.
14:08:20 https://review.opendev.org/#/c/643023/
14:08:20 https://review.opendev.org/#/c/643024/
14:08:33 Got dragged away for downstream bugfixes/backports, didn't get a chance to look :(
14:08:58 It looked like sean-k-mooney was into them as well, but was on vacation last week; I'll poke.
14:09:37 done
14:09:45 #topic Release News
14:10:04 anything?
14:10:28 #topic Bugs (stuck/critical)
14:10:28 No Critical bugs
14:10:28 #link 78 new untriaged bugs (up 2 since the last meeting): https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
14:10:28 #link 10 untagged untriaged bugs (no change since the last meeting): https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=NEW
14:11:21 From last week, bug 1827083
14:11:22 bug 1827083 in OpenStack-Gate "ERROR: Could not install packages due to an EnvironmentError: HTTPSConnectionPool(host='git.openstack.org', port=443): Max retries exceeded with url: /cgit/openstack/requirements/plain/upper-constraints.txt (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)) in vexxhost-sjc1" [Undecided,Confirmed] https://launchpad.net/bugs/1827083
14:11:39 looks like this has at least been worked around by making vexxhost not use ipv6
14:11:47 and then some conditional ipv6ing
14:11:55 but the bug isn't closed. mriedem, whassa deal, yo?
14:11:56 yes http://status.openstack.org/elastic-recheck/#1827083
14:12:01 mnaser is in china
14:12:09 so the workaround is forced ipv4
14:12:16 next steps are up to infra, not me
14:12:21 hi
14:12:23 flatline since those fixes merged, so that's good.
14:12:39 still open because more permanent solution pending?
14:12:57 well we want ipv6 testing in the gate per one of the release goals i believe,
14:13:06 and that region was all ipv6 until this week i think
14:13:11 ah yes, that. IPv6 works till it doesn’t and I don’t know why only that hits it. Anyways, I’ll try to dig deeper soon with Indra hopefully.
14:13:14 Infra**
14:13:14 so yeah i'm sure people (again, infra) will be working on it
14:13:30 cool. Meantime mitigated, so \o/
14:13:38 yes thank clarkb
14:13:42 But... the underlying setup can be done with IPv4 even if we then test IPv6 in the tenant networks, no?
14:14:14 yes, we've had ipv6 testing in the gate with tempest for a long time
14:15:16 ya the switch here was to use external dns via ipv4 instead of ipv6
14:15:28 the tests themselves can still use ipv6 internally
14:15:45 it was external connectivity we struggled with
14:15:58 Yeah, we're still testing IPv6 correctly
14:16:32 otherwise gate looks pretty healthy (keinehorah, ptoo-ptoo-ptoo)
14:16:52 3rd party CI
14:16:52 #link 3rd party CI status http://ciwatch.mmedvede.net/project?project=nova&time=7+days
14:16:52 ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa looks bad - anyone know anything about this?
14:17:34 dtantsur was asking about some ironic job n-cpu logs the other day, but for a stable branch (rocky) i think
14:17:36 not sure if that would be related
14:17:58 http://logs.openstack.org/32/634832/29/check/ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa/fba9197/controller/logs/devstacklog.txt.gz#_2019-05-16_03_28_21_590
14:18:04 looks like the job is f'ed
14:18:12 also this http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006314.html
14:18:16 not sure if that's related
14:18:20 die 1865 'Timed out waiting for Nova to track 1 nodes'
14:18:42 when did that job stop voting?
14:18:43 looks like they are waiting for some CUSTOM_GOLD trait to show up
14:18:48 i don't think it ever was voting
14:18:53 hmph, okay.
14:18:55 it used to timeout all the time (years ago)
14:19:20 TheJulia: ^ known issue?
14:19:25 ++ /opt/stack/ironic/devstack/lib/ironic:wait_for_nova_resources:1865 : die 1865 'Timed out waiting for Nova to track 1 nodes'
14:19:38 anyway we can sort that out and track it outside of the meeting
14:19:50 cool
14:20:01 Anything else on bugs, gate, CI, etc?
14:20:11 I've raised a flag internally (again) on the lack of health from vmware ci. there was some enthusiasm last week about "move everything to zuul v3" but that's dependent on locating some "lost" hardware
14:20:32 starlingx has reported what is for them a critical bug https://bugs.launchpad.net/nova/+bug/1829062
14:20:34 Launchpad bug 1829062 in StarlingX "nova placement api non-responsive due to eventlet error" [Critical,In progress] - Assigned to Gerry Kopec (gerry-kopec)
14:20:37 related to the eventlet wsgi stuff
14:21:02 it sounds like the ultimate fix is melwitt's series to drop eventlet usage from the api
14:21:25 who's qualified to deep-review ^ ?
14:21:25 https://review.opendev.org/#/q/topic:cell-scatter-gather-futurist+(status:open+OR+status:merged)
14:21:41 mdbooth?
14:21:41 i haven't been paying much attention to it, but i know mdbooth has,
14:21:55 I continue to think (as I said on the review) that we should only scatter gather when there are >2 cells
14:21:56 it sounds like the open nagging issue is not knowing if a thread is hung or something?
14:22:15 that's an aspect yes, but mdbooth thinks that shouldn't be a "real" problem
14:22:30 there was some talk about down cells behavior with that change and i dropped my testing guide patch for down cells if people want to test that out with melwitt's patches applied
14:22:37 mriedem: the concern was if you had several threads hang waiting for a response then you could exhaust the thread pool
14:23:07 but we don't actually know if we have a case for hung threads right?
14:23:11 this is just conjecture?
14:23:34 yes more or less
14:23:38 we can find out if a down cell breaks this by testing it with devstack, it's pretty easy
14:23:39 aye
14:23:39 i mean it could happen
14:23:47 anything can happen...
14:23:55 so we know eventlet + wsgi is bad
14:24:11 we're not sure what can happen with mel's changes, but we can get more info by testing it with a down cell
14:24:12 it was raised by gibi and dan on the review which is why we are giving it credence
14:24:23 ok gibi is out for a bit
14:24:30 i'm not sure what dansmith's current thoughts are on it
14:24:40 * dansmith is on a call
14:24:41 sounds like next step is testing her patches with down cells?
14:24:57 for a down cell it should not be an issue
14:25:00 it needs to be multiple down cells with lots of api traffic,
14:25:23 the edge case was if the request hangs after the connection to the cell has started
14:25:31 but I also kinda don't see the point of doing this tbh, and I thought there were a couple things we could do to get the monkey patching in order to fix the acute problem
14:25:58 I'm super wary of having two threading models in code that doesn't have a strong separation... asking for trouble, IMHO,
14:26:07 but I don't really have time to dig deep on this
14:26:19 well the issue is the api was not monkey patched before when running under wsgi and wsgi + eventlet has issues
14:26:53 ok i don't know what the alternatives are that dan's referring to, like i said i'm not heavily involved in this one
14:27:24 we could punt and only scatter/gather if there are >2 cells, but that just punts the problem to someone like cern to hit it when they get to stein
14:27:53 so i'm not in love with that option personally
14:28:04 anyway i guess we can move on
14:28:19 seems like by now people would have figured out problems with wsgi and multi-threading in python?
14:28:37 * mriedem re-writes nova-api in EJBs!
14:28:42 Well, eventlet isn't real multithreading...
14:28:43 yes, several ideas are discussed on the review, but nothing has congealed out of the goo
14:28:55 artom: i mean without eventlet
14:28:58 EJB does sound promising!
14:29:04 if there are dangers with wsgi + python std lib concurrency stuff
14:29:10 by the way the workaround for people until we fix this is to go back to running the api via the console script command
14:29:16 edleafe is going to rewrite nova with graph databases
14:29:32 It'd immediately solve NUMA in placement ;)
14:29:36 efried: s/graph/distributed
14:29:36 i was also going to kill the nova-api eventlet stuff about a year ago...
14:29:38 that is a performance hit but it works
14:29:40 good thing i got busy
14:29:51 moving on.
14:29:51 #topic Reminders
14:29:51 Summit, Forum, and PTG happened
14:29:51 #link PTG summary emails (searching for ".*[nova].*[ptg] Summary" will get most of them) http://lists.openstack.org/pipermail/openstack-discuss/2019-May/
14:30:08 Any other reminders?
14:31:09 #topic Stable branch status
14:31:09 #link Stein regressions: http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005637.html
14:31:09 No change since last week (one bug still open, bug 1824435, no great solution yet)
14:31:11 bug 1824435 in OpenStack Compute (nova) stein "fill_virtual_interface_list migration fails on second attempt" [High,Triaged] https://launchpad.net/bugs/1824435
14:31:25 #link stable/stein: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/stein
14:31:25 #link stable/rocky: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/rocky
14:31:25 #link stable/queens: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/queens
14:31:25 #link stable/pike: https://review.openstack.org/#/q/status:open+(project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/nova)+branch:stable/pike
14:31:55 There was a question from cdent a few days ago about backporting something to ocata. It sounded like it was de-confusing a message?
14:32:09 mriedem and I worked out it wasn't worth doing
14:32:13 okay, cool.
14:32:18 as it wouldn't be a backport as pike changed it
14:32:29 s/it/it alot/
14:32:54 so it would be an ocata-only change, and noncritical, so punt?
14:33:11 Anything else stable-related?
14:33:37 (yes on the punt)
14:34:21 #topic Sub/related team Highlights
14:34:25 Placement
14:34:25 cdent was traveling, but we had a brief meeting on Monday without him
14:34:25 #link placement meeting log http://eavesdrop.openstack.org/meetings/placement/2019/placement.2019-05-13-14.00.log.html
14:34:30 The main nova-related things were...
14:34:38 #link WIP spec for nested magic https://review.opendev.org/#/c/658510/
14:34:38 #link spec for rg/rp mapping https://review.opendev.org/#/c/657582/
14:34:56 It would be nice if nova folks could have a look at those ^ and make sure they're going to satisfy nova use cases
14:35:17 * artom adds the nested magic one to this queue
14:35:19 also look with an eye for how we could simplify ^ and still meet the use cases :) (especially the nested magic one)
14:35:44 Will get to it when all the downstream fires have been put out. So, next year :P
14:36:08 artom: it's a scintillating read, I promise you.
14:36:15 cdent: anything else placement-that-affects-nova you want to go over?
14:36:27 no sir
14:36:36 anyone else?
14:36:46 API (gmann)
14:36:59 no notes in the agenda, no gmann in the channel. Anyone have anything here?
14:38:09 #topic Stuck Reviews
14:38:17 nothing on the agenda. Anyone?
14:38:34 #topic Review status page
14:39:18 we talked about this above. fup with infra to make sure it's working. fup hacking the repo to see wtf it's doing. fup brainstorm on whether/how to use it to make the world a better place.
14:39:28 #topic Open discussion
14:39:47 in the spirit of being good little community citizens, I have started
14:39:47 #link WIP TC Vision Reflection https://review.opendev.org/658932
14:39:56 #help with this, please.
14:40:33 Any other opens?
14:40:45 Sorry for asking a question that might already have been answered: has anyone managed to dig up a mirror to the train ptg etherpad somewhere?
14:41:12 jangutter: Yeah, sean-k-mooney sent a copy (undecorated, unfortunately) to the ML
14:41:30 #link nova train ptg etherpad backup http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006243.html
14:41:40 efried: thanks!
14:41:50 #link alternative etherpad backup from infra team http://paste.openstack.org/show/751315/
14:42:10 thanks aspiers.
14:42:21 unfortunately, same lack of formatting, but content is there.
14:43:00 we had lost the authorship colors in the various transitions anyway, so the main loss is just the strikethroughs
14:43:09 and we had struck through everything pretty much anyway, so...
14:43:18 Okay, anything else before we wrap?
14:43:31 Is there going to be an open discussion thing?
14:43:40 you're in it
14:43:51 artom: your mic
14:43:53 Wanted to quickly ask about stable branch same company approvals - for instance, who would be able to +W https://review.opendev.org/#/c/657125/
14:44:02 o right
14:44:39 my knee-jerk reaction is that same-company approvals don't really apply to backports.
14:44:41 (Are we writing down the master branch policy anywhere? Might be good to add the stable branch policy as well)
14:44:58 stable decisions are based on suitability for backporting; the technical decisions were already made on the master patch.
14:45:04 artom: we decided to not write it down
14:45:14 (except for the long email thread that's written down)
14:45:15 sean-k-mooney, aha, keep it in the cloud ;)
14:45:36 as a stable core i would be able to +2 it, if i don't -1 it first
14:45:42 artom: in case you missed it:
14:45:42 #link same-company approvals ML thread http://lists.openstack.org/pipermail/openstack-discuss/2019-May/thread.html#5865
14:45:54 mriedem, right, but you're not RH
14:45:58 mriedem: :)
14:46:04 I was more wondering if melwitt, for example, could come along and +W it
14:46:15 she should not IMO
14:46:18 efried, yeah, I followed that
14:46:22 especially given this change is arguably a feature
14:46:28 artom: um, she could if a non-redhatter was the first +2
14:46:48 So stable is also 2 +2s?
14:46:57 not necessarily
14:47:08 assume we count the author of the master patch, not the proposer of the backport, as the author of record for purposes of same-company approval decisions?
14:47:10 backport from a stable core is generally considered a proxy +2 if it's a clean backport
14:47:33 efried: the original author of this change is RH
14:47:52 right, I'm confirming that the original author, not the backport proposer, is who we care about when talking about same-company.
14:47:53 mriedem, ah, so Lee backports a thing, that's a +2 in the bag if it's clean
14:48:04 artom: for some things yes
14:48:11 efried: yes it is
14:48:11 not really for this b/c it's big as hell
14:48:13 Dammit, nothing is black and white!
14:48:14 and feature-y
14:48:27 sorry, i'll get right to work on that law degree
14:48:28 artom: That's why we don't want to write it down.
14:48:42 efried, fair enough.
14:48:59 Anyways, don't want to take up too much time. My takeaway is, use your judgment and don't piss off mriedem :D
14:49:09 anyway i think people are aware of this patch and can review it now
14:49:32 so the answer to the general question is "case by case". And sounds like the answer for this specific patch is "let's not allow same-company".
14:49:44 sean-k-mooney, for the record, I was using the patch as an example because I got asked that question about that patch
14:50:02 is there a day where you guys downstream aren't talking about this same company approval thing?
14:50:13 mriedem, no, we all dream about it
14:50:26 ya well for that patch the answer is mriedem johnthetubaguy or claudiu can +w
14:50:39 just start forking the code and be done with it
14:50:52 claudiu isn't really upstream anymore
14:51:00 anyway, can we move on?
14:51:04 Dammit dude, the reason I'm asking here is because I actually care about upstream :)
14:51:19 let's hug
14:51:25 Bring it in brah
14:51:28 i know you do, but i shoot the messenger
14:51:35 sorry
14:51:40 can we end on another rap tune?
14:51:51 you don't
14:51:51 wanna f with mriedem
14:51:51 cause mriedem
14:51:51 will f'in hug you
14:52:05 Got 99 problems but upstream ain't one?
14:52:05 #endmeeting
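
[Editor's addendum] For readers following the thread-exhaustion concern raised around melwitt's cell-scatter-gather-futurist series above, here is a minimal sketch of the general pattern being discussed: query each cell from a worker thread, wait with a timeout, and mark cells that did not answer as "did not respond". This is illustrative only and is not the nova code; the query_cell stub, the timeout value, and the pool size are assumptions, and the standard library concurrent.futures is used here in place of the futurist executors the actual patches are built on (futurist follows a very similar executor interface).

    import concurrent.futures

    # Illustrative values only; not taken from the nova patches.
    CELL_TIMEOUT = 60           # seconds to wait before giving up on a cell
    DID_NOT_RESPOND = object()  # sentinel marking a cell that timed out

    # Assumption for illustration: a long-lived pool shared across API requests.
    _POOL = concurrent.futures.ThreadPoolExecutor(max_workers=5)

    def query_cell(cell):
        # Stand-in for a per-cell database query.
        return []

    def scatter_gather(cells):
        futures = {_POOL.submit(query_cell, cell): cell for cell in cells}
        done, not_done = concurrent.futures.wait(futures, timeout=CELL_TIMEOUT)
        results = {}
        for fut in done:
            results[futures[fut]] = fut.result()
        for fut in not_done:
            # The timeout lets the API request finish, but the worker thread is
            # still occupied; enough hung workers would exhaust the shared pool,
            # which is the concern raised on the review.
            results[futures[fut]] = DID_NOT_RESPOND
        return results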