19:00:52 <clarkb> #startmeeting infra
19:00:53 <openstack> Meeting started Tue Nov 14 19:00:52 2017 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:57 <openstack> The meeting name has been set to 'infra'
19:00:58 <clarkb> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:12 <clarkb> we do have a few items on the agenda so let's go ahead and get started
19:01:31 <bswartz> .o/
19:01:34 <Shrews> o/
19:01:48 <clarkb> mordred: I think I've seen you on IRC today as well so ping :)
19:01:51 <clarkb> #topic Announcements
19:02:19 <clarkb> I don't really have anything other than Summit happened and people are likely jet-lagged, still traveling, or sightseeing
19:02:25 <clarkb> so we may be on the slow end of things for a bit
19:02:37 <clarkb> #topic Actions from last meeting
19:02:47 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-10-31-19.01.txt Minutes from last meeting
19:02:56 <clarkb> #action fungi Document secrets backup policy
19:03:03 <clarkb> I don't think I've seen a change for ^ yet so will keep that on there
19:03:12 <clarkb> #topic Specs approval
19:03:26 <clarkb> Our current spec list seems to be mostly work in progress
19:03:42 <clarkb> if I've missed something important let me know, but I don't think we need to spend time on this today
19:03:56 <clarkb> #topic Priority Efforts
19:04:02 <clarkb> #topic Zuul v3
19:04:55 <clarkb> This morning we had problems with release jobs inheriting from final jobs and not failing until tags were made
19:05:21 <pabelanger> oh, I missed that
19:05:33 <clarkb> #info release note job is final but projects have inherited from it to add required-projects. This error was not discovered until tags were made and those release note jobs were actually run
19:05:53 <clarkb> I expect that addressing this properly will require modifications to how zuul processes proposed config changes so I doubt we'll be able to design a fix here :)
19:06:06 <clarkb> but something to be aware of when reviewing jobs, make sure we aren't modifying things that are marked final
19:06:27 <mordred> o/
19:06:31 <AJaeger> o/
19:06:33 <pabelanger> we also somehow landed a zuul.yaml file that broke zuulv3: https://review.openstack.org/519442/ this reported a syntax error only after being merged, and we had to revert
19:06:39 <clarkb> #info when reviewing jobs be careful to not allow inheritance of final jobs
19:06:57 <AJaeger> sorry, missed the start ;(
19:07:00 <clarkb> pabelanger: it is possible that the error checking for both cases is related and similarly broken
19:07:29 <AJaeger> clarkb: we also have to review current state - check which repos in project-config and templates in openstack-zuul-jobs are wrong
19:07:37 <clarkb> AJaeger: good point
19:07:48 <clarkb> there may be other cases of inherited and modified final jobs
19:08:15 <clarkb> I'll make a note to add my understanding of the problem to the zuul issues etherpad (I haven't yet)
19:08:16 <pabelanger> clarkb: maybe, I'm going to make a note and get help from jeblair when he returns to debug
19:08:33 <clarkb> pabelanger: ya let's write it down on the issues etherpad and then remember to bring it up with jeblair when he returns
19:08:39 <pabelanger> ++
19:08:50 <clarkb> ok, any other zuulv3 related items we want to talk about or make sure others are aware of?
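
As a rough illustration of the review AJaeger mentions above, something like the following could surface the current state. This is a sketch only, assuming local clones of project-config and openstack-zuul-jobs; the angle-bracket names are placeholders, not values from the meeting:

    # list job definitions marked final so reviewers know which parents are off limits
    grep -rn --include='*.yaml' -B5 'final: true' project-config/ openstack-zuul-jobs/ | grep 'name:'

    # then check whether a project's in-repo config names one of those jobs as a parent
    grep -n 'parent: <final-job-name>' <project-clone>/.zuul.yaml
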
19:09:25 <jlvillal> o/
19:09:44 <ianw> sorry, just to summarise, a job marked job.final was able to be inherited from incorrectly?
19:10:02 <AJaeger> ianw: it failed when executed
19:10:07 <clarkb> ianw: correct we were able to merge at least one change to trove-dashboard that inherited from a job marked final
19:10:26 <clarkb> ianw: then didn't have any errors until we tried to run the job when a tag was made for releasing that project
19:10:49 <ianw> ahh, ok, so it merges but doesn't run. ok, something to keep an eye out for. thanks
19:11:09 <clarkb> yup it does error properly in the end so that bit works, it would just be ideal to error pre-merge and have people fix it before trying to make a release
19:11:14 <AJaeger> clarkb: trove and neutron-fwaas-dashboard - trove-dashboard was something else
19:11:40 <AJaeger> http://lists.openstack.org/pipermail/openstack-dev/2017-November/124535.html
19:11:52 <clarkb> #link http://lists.openstack.org/pipermail/openstack-dev/2017-November/124535.html
19:12:21 * clarkb gives it another minute for zuulv3 items
19:12:30 <AJaeger> #link http://lists.openstack.org/pipermail/openstack-dev/2017-November/124480.html
19:12:39 <AJaeger> those two are the failures ^
19:12:42 <clarkb> AJaeger: thanks
19:13:28 <clarkb> #topic General topics
19:13:36 <clarkb> #topic rax-ord instance clean up
19:13:48 <clarkb> pabelanger: re ^ did you get the info you needed? I think I saw cleanup was happening?
19:13:53 <clarkb> pabelanger: anything else to discuss on this one?
19:14:01 <AJaeger> clarkb: I was wrong and you're right: trove-dashboard it was, trove was a different failure ;(
19:14:33 <pabelanger> clarkb: oh, that was last week?
19:14:39 <pabelanger> yah, i did delete fg-test
19:14:51 <pabelanger> I think ianw and I are ready to delete pypi.slave
19:15:03 <clarkb> #info fg-test was deleted and we are ready to delete pypi.slave.openstack.org
19:15:09 <clarkb> sounds good, and thanks for working on cleanup
19:15:15 <pabelanger> np
19:15:39 <clarkb> #topic New backup server
19:16:02 <clarkb> ianw: my plan today is to restore zuul config(s) after lunch to confirm the new server is working as expected
19:16:20 <ianw> ok, if i get external confirmation on that, i think we can do https://review.openstack.org/516159
19:16:23 <ianw> #link https://review.openstack.org/516159
19:16:46 <ianw> maybe give me an action item to report next week that all backups are happening correctly on the new server
19:16:57 <ianw> that way, i/we won't forget to check things are working
19:17:07 <clarkb> #action ianw to confirm backups are working properly next week (after we migrate to new backup server)
19:17:38 <pabelanger> ianw: +2
19:17:46 <clarkb> other than a second set of eyes checking backups and reviews on 516159, anything else you need on this?
19:17:50 <ianw> </eot>
19:18:00 <clarkb> #topic Puppetmaster health
19:18:22 <ianw> yeah, so when i found that zombie host that had been up for ~400 days running jobs ...
19:18:41 <ianw> i also found puppetmaster dead. since the credentials to reboot puppetmaster were on puppetmaster ...
19:18:54 * jlvillal remembers ianw finding that zombie host and wonders if there are more...
19:19:00 <ianw> the rax guy told me it was oom messages on the console
19:19:16 <ianw> jlvillal: i did an audit, and didn't find any (that's how i noticed the other pypi.slave node etc)
19:19:34 <jlvillal> ianw: cool and thanks
19:19:40 <ianw> 2gb is small for this i think, i know we've discussed it ... is it on anyone's plate to migrate?
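
A rough sketch of the kind of cross-cloud audit ianw describes above, assuming openstackclient with credentials in clouds.yaml; the cloud and server names here are placeholders, not the actual infra tenants:

    # list servers in each cloud so anything unfamiliar stands out
    for cloud in cloud-a cloud-b; do
        openstack --os-cloud "$cloud" server list -f value -c Name -c Status
    done
    # check how long a suspect instance has been around
    openstack --os-cloud cloud-a server show some-old-instance -f value -c created
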
19:19:48 <pabelanger> I think we discussed maybe moving infracloud off to a new puppetmaster first, see what we learn, then move everything else
19:20:10 <clarkb> maybe call it something other than puppetmaster too?
19:20:30 <clarkb> as that no longer accurately reflects its duties
19:20:34 <pabelanger> or just go all in on zuulv3 post playbooks :D
19:20:38 * bswartz recommends "zombiemaster"
19:20:49 <clarkb> bswartz: necromancer?
19:20:57 <bswartz> +1
19:21:14 <jlvillal> despot? Pharaoh? tyrant?
19:21:25 <ianw> ok, so step 1 is split off infracloud control to this new thing?
19:21:29 <jlvillal> overlord? enforcer!
19:21:39 <clarkb> ianw: I think it would probably be a good idea to come up with a rough plan for any migration (spec I guess) just because the instance is fairly important. But ya splitting off infracloud control sounds like a good place to start to me
19:22:17 <ianw> alright then, let me investigate and write a spec for this then, with that as a starting point
19:22:29 <clarkb> (also let me know if we think a spec is too heavyweight, maybe just an email thread to the list. Mostly I want to make sure it's fairly well communicated)
19:22:40 <clarkb> ianw: thanks!
19:22:53 <ianw> i think spec is fine. clearly a lot of opinion on names :)
19:23:04 <ianw> </eot>
19:23:08 <clarkb> #topic bindep & external repositories
19:23:25 <ianw> we can leave this until last if others are more important
19:23:28 <clarkb> I want to say that the 5-second reading of meeting topic notes makes me want this feature in bindep...
19:23:40 <clarkb> ianw: ok let's come back to it then
19:23:44 <clarkb> #undo
19:23:44 <openstack> Removing item from minutes: #topic bindep & external repositories
19:23:52 <clarkb> #topic Jobs requiring IPv6
19:23:56 <clarkb> bswartz: you are up
19:24:00 <bswartz> This might be better handled as a question in the channel, but I knew you were meeting today so I figured I'd try here first.
19:24:06 <bswartz> If a test job requires an IPv6 address on the test node for ${reasons} is there a way to guarantee that?
19:24:13 <bswartz> My (limited) observation is that some test nodes are IPv4-only and some are mixed. IDK if IPv6-only nodes exist or not.
19:24:42 <clarkb> bswartz: do you need external ipv6 connectivity or just between the test nodes?
19:25:10 <bswartz> just the existence of a non-link-local IPv6 address on the host would be sufficient for our use case
19:25:20 <clarkb> ok
19:25:32 <bswartz> we're going to install nfs-kernel-server on the test node and connect to it from nova VMs over IPv6
19:25:39 <clarkb> and yes, we currently have clouds that do not provide ipv6 addrs to interfaces by default
19:26:10 <bswartz> is there a way to set up jobs to only run on nodes with IPv6?
19:26:23 <clarkb> bswartz: no, even in our ipv6 only cloud we had ipv4 for reasons
19:26:25 <bswartz> and if not, can I request it?
19:26:39 <clarkb> I don't expect that would go away any time soon. (also we don't have any ipv6 only clouds left)
19:26:42 <ianw> so multi-node testing, one node running nfs server & the other connecting to it?
19:26:45 <bswartz> having ipv4 also is no problem
19:26:52 <frickler> iirc devstack does setup a public v6 network and it should be possible to give the host an address too, if it doesn't already, would that be enough?
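
A minimal sketch of the "give the host an address" approach frickler suggests here (and which clarkb expands on below); the interface name and the ULA prefix are made-up examples, not values from any actual job:

    # add a non-link-local IPv6 address to the test node's interface
    sudo ip -6 addr add fd00:dead:beef::1/64 dev eth0
    # confirm a non-link-local (non fe80::) address is now present
    ip -6 addr show dev eth0 scope global
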
19:26:53 <clarkb> instead I think you can just modify local networking to have ipv6
19:26:55 <bswartz> just having no ipv6 at all causes problems
19:27:01 <clarkb> frickler: ya it does for testing ipv6 iirc
19:27:31 <clarkb> frickler: bswartz so that is how I would approach this here. If you need ipv6 set it up locally and it should work fine. Even for multinode you should be able to run ipv6 over the inter-node overlay just fine
19:28:07 <bswartz> okay I may follow up with more questions about that approach later
19:28:14 <bswartz> but it sounds like the answer to my first question is no
19:28:20 <ianw> yeah, it sounds a bit like a devstack setup problem
19:28:25 <bswartz> it's not possible to schedule zuul jobs to nodes with or without ipv6
19:28:35 <clarkb> bswartz: ya I wouldn't expect something like that at least not in the near future since it is largely to do with clouds and they can and do change over time as well
19:28:44 <bswartz> okay thank you
19:29:00 <bswartz> that's all I wanted to know
19:29:13 <clarkb> bswartz: I would start by looking at how devstack + tempest do ipv6 testing. They already do it today where you connect via v6 between instances and stuff
19:29:32 <bswartz> yes I know that setting up IPv6 within neutron is quite easy
19:30:03 <ianw> bswartz: i don't know all that much about the guts of ipv6 in neutron in devstack, but happy to help out and learn things along the way, just ping me
19:30:07 <bswartz> we need it to be on the dsvm too, because manila installs services there that are "outside the cloud" from neutron's perspective
19:30:27 <clarkb> bswartz: yup all you need is an address in the same range and local attached routing rules should just work for that
19:30:50 <clarkb> bswartz: that is how we do floating IP routing on multinode tests with an overlay network
19:31:20 <clarkb> the host VM gets addrs in the first /24 of a /23 and the floating IPs are assigned from the second /24
19:31:24 <pabelanger> maybe we should set up ipv6 by default on the overlay in multinode?
19:31:28 <bswartz> I believe we can attempt to leverage the "public" ipv6 network that neutron places on the external bridge
19:31:29 <clarkb> I imagine you could do something similar with ipv6 address assignment
19:31:52 <clarkb> pabelanger: maybe, though I think this is an issue on single node as well so probably start there
19:32:06 <pabelanger> ya
19:32:46 <clarkb> bswartz: definitely come back and ask questions if you find unexpected things trying ^
19:32:59 <bswartz> I'm sure I will
19:33:03 <bswartz> ty
19:33:15 <clarkb> bswartz: anything else we can help with or is this a good start?
19:33:21 <bswartz> that's all for now
19:33:26 <clarkb> #topic Zanata upgrade
19:33:54 <clarkb> At this point this is mostly informational, but at the summit aeng reached out and was asking about upgrading our zanata servers
19:34:27 <clarkb> it sounds like the translators would like to have a new version in place for string freeze which means getting upgraded by early/mid January if I remember the release calendar properly
19:35:00 <clarkb> my understanding is this new version of zanata doesn't require newer java so this upgrade should be much simpler than the last one. It will just be a config update and then the actual zanata update
19:35:07 <clarkb> also wildfly doesn't need an upgrade
19:35:44 <clarkb> I have asked them to push changes to translate-dev's puppet config to do the upgrade (config and zanata) and then we can test there and work out problems in order to get a production upgrade planned
19:35:50 <clarkb> so be on the lookout for those changes.
19:35:52 <pabelanger> ++
19:36:19 <clarkb> #topic bindep & external repositories
19:36:33 <ianw> oh, so back to that ...
19:36:34 <clarkb> we have time to bikeshed bindep config grammar :)
19:36:46 <ianw> just wanted to see what people thought about this before i wrote anything
19:37:15 <ianw> i'm wondering if bindep should get involved in this ... or just have a rule that it deals with the packages it can see from the package manager, and that's it
19:37:26 <clarkb> `liberasurecode-devel [platform:centos enable_repo:epel] -> yum install --enablerepo=epel liberasurecode-devel` is the example
19:37:59 <ianw> right, any sort of package in bindep.txt that isn't in the base repos ... how do you know where it comes from
19:38:21 <clarkb> zypper install has a --from flag which I think would be the equivalent there
19:38:35 <clarkb> does apt/apt-get/aptitude know how to enable things for a single install?
19:38:39 <pabelanger> yah, this has been asked for in the past
19:39:01 <ianw> i guess apt has "-t"
19:39:10 <ianw> although not sure if that's for disabled repos
19:39:38 <pabelanger> I think with apt you set up pins for packages
19:39:51 <pabelanger> if you want to only pull something from a specific repo
19:39:54 <clarkb> pabelanger: ya it's priority based and much more painful/complicated iirc
19:39:59 <pabelanger> yup
19:40:22 <pabelanger> I'm leaning towards not adding that into bindep, and maybe seeing how to work around it
19:40:37 <ianw> we can just put in a role to enable repos before running bindep
19:41:05 <pabelanger> yah, I think we've been suggesting that for now
19:41:14 <pabelanger> but I do like --enablerepo foo with yum
19:41:33 <pabelanger> if only we could figure out an easy win for apt
19:42:09 <ianw> it could be a sort of distro-specific flag
19:42:34 <pabelanger> I'll look into dpkg_options and see what there is
19:43:48 <ianw> i guess, when i think this through, you're still going to have to have like the epel-release package pre-installed before bindep anyway
19:43:59 <ianw> you can't really specify that *in* bindep
19:44:16 <clarkb> ianw: ya I think end of day maybe making it distro specific is fine
19:44:17 <ianw> at which point, you've already had to manually set up the repos
19:44:23 <pabelanger> yah, in our case, repo files already exist but are just disabled
19:45:05 * mordred played with adding repo indications to bindep a couple of months ago - gave up because it got super complex
19:45:20 <ianw> right, but at that point you've had to work outside bindep anyway to get the repo setup
19:45:32 <ianw> so you might as well enable/disable around the bindep call
19:45:48 <ianw> mordred: good to see we're reaching the same conclusion :)
19:46:13 <mordred> the thing I wanted to do was just add a list of repos that needed to be added - having them be permanent vs temporary adds didn't occur to me
19:47:08 <ianw> that kind of gets complex because on redhat for epel and openstack-* packages you'd just install the -release packages which dumps the repo files
19:47:43 <ianw> whereas on deb you'd write in apt sources i guess to things like libvirt repos
19:47:57 <pabelanger> so, I am in favor of removing some of the packages from bindep-fallback.txt: https://review.openstack.org/519533/
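
To illustrate the per-distro options discussed above, a sketch only: the yum line is the example from the discussion; the zypper repo alias, the apt target release, the extra package name, and the enable/disable wrapper (yum-config-manager comes from yum-utils) are assumptions about how a given repo might be configured, not anything decided in the meeting:

    # rpm: enable an extra repo for a single install
    sudo yum install --enablerepo=epel liberasurecode-devel
    # suse: --from limits the install to one configured repo (alias is a placeholder)
    sudo zypper install --from extra-repo liberasurecode-devel
    # debian/ubuntu: -t selects a target release; anything finer needs apt pinning
    sudo apt-get install -t stretch-backports some-package
    # or enable the repo just around the bindep run, as suggested above
    sudo yum-config-manager --enable epel
    sudo yum install -y $(bindep -b)
    sudo yum-config-manager --disable epel
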
19:48:24 <pabelanger> maybe we should just give a heads up to the ML and help projects that might be affected set up bindep.txt files
19:48:32 * mordred is in favor of removing bindep fallback ...
19:48:40 <pabelanger> or that
19:48:46 <ianw> right, this was just about what do we tell people to do when they need the package and it's in a different repo
19:48:52 <mordred> ++
19:49:13 <pabelanger> ianw: I think we continue with, you need to manually enable repo X before bindep
19:49:16 <ianw> i think we just provide roles to enable/disable around the bindep calls myself
19:49:24 <mordred> ++
19:49:29 <pabelanger> this is what we've said with OSA and kolla up until now
19:49:34 <pabelanger> ++
19:50:16 <ianw> ok, well if people are happy with 519533 ... that removes the packages that use non-base things on mostly centos, but others too
19:50:47 <clarkb> I think it would be good to get a bit more settled in zuulv3 first
19:50:51 <ianw> however, before we remove centos-release-openstack-ocata (519535) i will look at having roles to put it back
19:51:02 <clarkb> we still have job issues outside of that and adding more fuel to the fire won't help anyone
19:51:50 <ianw> ok, i just don't want to grow too many more dependencies on this
19:52:21 <clarkb> ya I don't think we have to wait very long, just thinking this has potential to break release again and we are still sorting out v3 related changes there
19:52:23 <ianw> we already have tripleo having to turn these repos off before they start. that's a bit of a silly situation and creates potential for weird version skews
19:52:28 <clarkb> and will be harder to debug if multiple things are changing
19:52:55 <ianw> i don't think release will really be involved, as this really only affects centos & suse
19:53:02 <clarkb> oh right
19:53:14 <clarkb> in that case maybe it's safe enough (sorry was thinking of it as a more global issue)
19:53:21 <ianw> and not devstack jobs, because they pull in the repos, so only unit testing really
19:53:46 <clarkb> maybe email to the dev list, say we'd like to do it sometime next week and please speak up if this is a problem for you?
19:54:02 <clarkb> osa, puppet, tripleo, kolla are likely to be broken by it if their bindep isn't up to date?
19:54:03 <ianw> ok, yep can do. not a "put it in on friday and disappear" thing
19:54:10 <clarkb> ya
19:55:20 <clarkb> #topic open discussion
19:56:09 <pabelanger> do we want to consider a virtual sprint for control plane upgrades (xenial) before or after PTG?
19:56:20 <clarkb> this week is going to be a weird one for me. I've got the kids while my wife recovers from surgery. Hoping things will be more normal next week
19:56:43 <pabelanger> clarkb: np, I think most people are still recovering from summit
19:56:52 <pabelanger> I know I am
19:56:58 <clarkb> pabelanger: if we are able to take advantage of holiday slowdown that may be a good time to knock out a bunch of services with few people noticing
19:57:05 <jlvillal> I really would like to get this fix in: https://review.openstack.org/509670 Since I keep hitting cases where there is a CRITICAL log level but if I select ERROR log level I don't see it. Reviews appreciated :)
19:57:16 <clarkb> pabelanger: but that depends on availability. I think sooner is better than later though
19:57:46 <clarkb> jlvillal: I will give that a review
19:57:53 <pabelanger> clarkb: yah, that is what I was thinking too. I can see what people's schedules look like and propose a time
19:57:56 <jlvillal> clarkb: Awesome, thanks :)
19:58:13 <clarkb> pabelanger: maybe an ethercalc/etherpad of proposed times and see who is available when? like we do for ptg/summit dinner things
19:58:28 <pabelanger> ++
19:59:07 <clarkb> alright out of time. Thanks everyone and you can find us in #openstack-infra or on the infra mailing list if anything else comes up
19:59:10 <clarkb> #endmeeting