19:00:52 <clarkb> #startmeeting infra
19:00:53 <openstack> Meeting started Tue Nov 14 19:00:52 2017 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:57 <openstack> The meeting name has been set to 'infra'
19:00:58 <clarkb> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:12 <clarkb> we do have a few items on the agenda so let's go ahead and get started
19:01:31 <bswartz> .o/
19:01:34 <Shrews> o/
19:01:48 <clarkb> mordred: I think I've seen you on IRC today as well so ping :)
19:01:51 <clarkb> #topic Announcements
19:02:19 <clarkb> I don't really have anything other than Summit happened and people are likely jet lagged or still traveling or sightseeing
19:02:25 <clarkb> so we may be on the slow end of things for a bit
19:02:37 <clarkb> #topic Actions from last meeting
19:02:47 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-10-31-19.01.txt Minutes from last meeting
19:02:56 <clarkb> #action fungi Document secrets backup policy
19:03:03 <clarkb> I don't think I've seen a change for ^ yet so will keep that on there
19:03:12 <clarkb> #topic Specs approval
19:03:26 <clarkb> Our current spec list seems to be mostly work in progress
19:03:42 <clarkb> if I've missed something important let me know, but I don't think we need to spend time on this today
19:03:56 <clarkb> #topic Priority Efforts
19:04:02 <clarkb> #topic Zuul v3
19:04:55 <clarkb> This morning we had problems with release jobs inheriting from final jobs and not failing until tags were made
19:05:21 <pabelanger> oh, I missed that
19:05:33 <clarkb> #info release note job is final but projects have inherited from it to add required-projects. This error was not discovered until tags were made and those release note jobs were actually run
19:05:53 <clarkb> I expect that addressing this properly will require modifications to how zuul processes proposed config changes so doubt we'll be able to design a fix here :)
19:06:06 <clarkb> but something to be aware of when reviewing jobs, make sure we aren't modifying things that are marked final
19:06:27 <mordred> o/
19:06:31 <AJaeger> o/
19:06:33 <pabelanger> we also somehow landed a zuul.yaml file that broke zuulv3: https://review.openstack.org/519442/ this reported a syntax error only after being merged, and we had to revert
19:06:39 <clarkb> #info when reviewing jobs be careful to not allow inheritance of final jobs
19:06:57 <AJaeger> sorry, missed the start ;(
19:07:00 <clarkb> pabelanger: it is possible that the error checking for both cases is related and similarly broken
19:07:29 <AJaeger> clarkb: we also have to review current state - check which repos in project-config and templates in openstack-zuul-jobs are wrong
19:07:37 <clarkb> AJaeger: good point
19:07:48 <clarkb> there may be other cases of inherited and modified final jobs
19:08:15 <clarkb> I'll make a note to add my understanding of the problem to the zuul issues etherpad (I haven't yet)
19:08:16 <pabelanger> clarkb: maybe, I'm going to make note and get help from jeblair when he returns to debug
19:08:33 <clarkb> pabelanger: ya let's write it down on the issues etherpad and then remember to bring it up with jeblair when he returns
19:08:39 <pabelanger> ++
19:08:50 <clarkb> ok, any other zuulv3 related items we want to talk about or make sure others are aware of?
19:09:25 <jlvillal> o/
19:09:44 <ianw> sorry, just to summarise, a job marked job.final was able to be inherited from incorrectly?
19:10:02 <AJaeger> ianw: it failed when executed
19:10:07 <clarkb> ianw: correct we were able to merge at least one change to trove-dashboard that inherited from a job marked final
19:10:26 <clarkb> ianw: then didn't have any errors until we tried to run the job when a tag was made for releasing that project
19:10:49 <ianw> ahh, ok, so it merges but doesn't run.  ok, something to keep an eye out for.  thanks
19:11:09 <clarkb> yup it does error properly in the end so that bit works, it would just be ideal to error pre-merge and have people fix it before trying to make a release
19:11:14 <AJaeger> clarkb: trove and neutron-fwaas-dashboard - trove-dashboard was something else
19:11:40 <AJaeger> http://lists.openstack.org/pipermail/openstack-dev/2017-November/124535.html
19:11:52 <clarkb> #link http://lists.openstack.org/pipermail/openstack-dev/2017-November/124535.html
19:12:21 * clarkb gives it another minute for zuulv3 items
19:12:30 <AJaeger> #link http://lists.openstack.org/pipermail/openstack-dev/2017-November/124480.html
19:12:39 <AJaeger> those two are the failures ^
19:12:42 <clarkb> AJaeger: thanks
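
For reference, a minimal sketch of the pattern described above, with illustrative job and project names (the real jobs and repos differ); Zuul rejects changes to a job marked final, but in this case the error only surfaced when the tag actually triggered the job:

    # parent job marked final (name is illustrative)
    - job:
        name: publish-release-notes
        final: true
        description: Build and publish release notes.

    # in-repo variant that tries to add required-projects to the final job;
    # this is invalid, but here it was only reported at run time on a tag
    - project:
        release:
          jobs:
            - publish-release-notes:
                required-projects:
                  - openstack/some-dashboard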
19:13:28 <clarkb> #topic General topics
19:13:36 <clarkb> #topic rax-ord instance clean up
19:13:48 <clarkb> pabelanger: re ^ did you get the info you needed? I think I saw cleanup was happening?
19:13:53 <clarkb> pabelanger: anything else to discuss on this one?
19:14:01 <AJaeger> clarkb: I was wrong and you're right: trove-dashboard it was, trove was a different failure ;(
19:14:33 <pabelanger> clarkb: oh, that was last week?
19:14:39 <pabelanger> yah, i did delete fg-test
19:14:51 <pabelanger> I think ianw and I are ready to delete pypi.slave
19:15:03 <clarkb> #info fg-test was deleted and we are ready to delete pypi.slave.openstack.org
19:15:09 <clarkb> sounds good, and thanks for working on cleanup
19:15:15 <pabelanger> np
19:15:39 <clarkb> #topic New backup server
19:16:02 <clarkb> ianw: my plan today is to restore zuul config(s) after lunch to confirm the new server is working as expected
19:16:20 <ianw> ok, if i get external confirmation on that, i think we can do https://review.openstack.org/516159
19:16:23 <ianw> #link https://review.openstack.org/516159
19:16:46 <ianw> maybe give me an action item to report next week that all backups are happening correctly on the new server
19:16:57 <ianw> that way, i/we won't forget to check things are working
19:17:07 <clarkb> #action ianw to confirm backups are working properly next week (after we migrate to new backup server)
19:17:38 <pabelanger> ianw: +2
19:17:46 <clarkb> other than second set of eyes checking backups and reviews on 516159, anything else you need on this?
19:17:50 <ianw> </eot>
19:18:00 <clarkb> #topic Puppetmaster health
19:18:22 <ianw> yeah, so when i found that zombie host that had been up for ~400 days running jobs ...
19:18:41 <ianw> i also found puppetmaster dead.  since the credentials to reboot puppetmaster were on puppetmaster ...
19:18:54 * jlvillal remembers ianw finding that zombie host and wonders if there are more...
19:19:00 <ianw> the rax guy told me it was oom messages on the console
19:19:16 <ianw> jlvillal: i did an audit, and didn't find any (that's how i noticed the other pypi.slave node etc)
19:19:34 <jlvillal> ianw: cool and thanks
19:19:40 <ianw> 2gb is small for this i think, i know we've discussed it ... is it on anyone's plate to migrate?
19:19:48 <pabelanger> I think we discussed moving infracloud off to a new puppetmaster first, see what we learn, then move everything else
19:20:10 <clarkb> maybe call it something other than puppetmaster too?
19:20:30 <clarkb> as that no longer accurately reflects its duties
19:20:34 <pabelanger> or just go all in on zuulv3 post playbooks :D
19:20:38 * bswartz recommends "zombiemaster"
19:20:49 <clarkb> bswartz: necromancer?
19:20:57 <bswartz> +1
19:21:14 <jlvillal> despot? Pharaoh? tyrant?
19:21:25 <ianw> ok, so step 1 is split off infracloud control to this new thing?
19:21:29 <jlvillal> overlord? enforcer!
19:21:39 <clarkb> ianw: I think it would probably be a good idea to come up with a rough plan for any migration (spec I guess) just because the instance is fairly important. But ya splitting off infracloud control sounds like a good place to start to me
19:22:17 <ianw> alright then, let me investigate and write a spec for this then, with that as a starting point
19:22:29 <clarkb> (also let me know if we think a spec is too heavyweight, maybe just an email thread to the list. Mostly I want to make sure it's fairly well communicated)
19:22:40 <clarkb> ianw: thanks!
19:22:53 <ianw> i think spec is fine.  clearly a lot of opinion on names :)
19:23:04 <ianw> </eot>
19:23:08 <clarkb> #topic bindep & external repositories
19:23:25 <ianw> we can skip to this last if others are more important
19:23:28 <clarkb> I want to say that the 5 second reading of meeting topic notes makes me want this feature in bindep...
19:23:40 <clarkb> ianw: ok lets come back to it then
19:23:44 <clarkb> #undo
19:23:44 <openstack> Removing item from minutes: #topic bindep & external repositories
19:23:52 <clarkb> #topic Jobs requiring IPv6
19:23:56 <clarkb> bswartz: you are up
19:24:00 <bswartz> This might be better handled as a question in the channel, but I knew you were meeting today so I figured I'd try here first.
19:24:06 <bswartz> If a test job requires an IPv6 address on the test node for ${reasons} is there a way to guarantee that?
19:24:13 <bswartz> My (limited) observation is that some test nodes are IPv4-only and some are mixed. IDK if IPv6-only nodes exist or not.
19:24:42 <clarkb> bswartz: do you need external ipv6 connectivity or just between the test nodes?
19:25:10 <bswartz> just the existence of a non-link-local IPv6 address on the host would be sufficient for our use case
19:25:20 <clarkb> ok
19:25:32 <bswartz> we're going to install nfs-kernel-server on the test node and connect to it from nova VMs over IPv6
19:25:39 <clarkb> and yes, we currently have clouds that do not provide ipv6 addrs to interfaces by default
19:26:10 <bswartz> is there a way to setup jobs to only run on nodes with IPv6?
19:26:23 <clarkb> bswartz: no, even in our ipv6 only cloud we had ipv4 for reasons
19:26:25 <bswartz> and if not, can I request it?
19:26:39 <clarkb> I don't expect that would go away any time soon. (also we don't have any ipv6 only clouds left)
19:26:42 <ianw> so multi-node testing, one node running nfs server & the other connecting to it?
19:26:45 <bswartz> having ipv4 also is no problem
19:26:52 <frickler> iirc devstack does setup a public v6 network and it should be possible to give the host an address too, if it doesn't already, would that be enough?
19:26:53 <clarkb> instead I think you can just modify local networking to have ipv6
19:26:55 <bswartz> just having no ipv6 at all causes problems
19:27:01 <clarkb> frickler: ya it does for testing ipv6 iirc
19:26:53 <clarkb> frickler: bswartz so that is how I would approach this here. If you need ipv6 set it up locally and it should work fine. Even for multinode you should be able to do ipv6 over the inter-node overlay just fine
19:28:07 <bswartz> okay I may follow up with more questions about that approach later
19:28:14 <bswartz> but it sounds like the answer to my first question is no
19:28:20 <ianw> yeah, it sounds a bit like a devstack setup problem
19:28:25 <bswartz> it's not possible to schedule zuul jobs to nodes with or without ipv6
19:28:35 <clarkb> bswartz: ya I wouldn't expect something like that at least not in the near future since it is largely to do with clouds and they can and do change over time as well
19:28:44 <bswartz> okay thank you
19:29:00 <bswartz> that's all I wanted to know
19:29:13 <clarkb> bswartz: I would start by looking at how devstack + tempest do ipv6 testing. They already do it today where you connect via v6 between instances and stuff
19:29:32 <bswartz> yes I know that setting up IPv6 within neutron is quite easy
19:30:03 <ianw> bswartz: i don't know all that much about the guts of ipv6 in neutron in devstack, but happy to help out and learn things along the way, just ping me
19:30:07 <bswartz> we need it to be on the dsvm too, because manila installs services there that are "outside the cloud" from neutron's perspective
19:30:27 <clarkb> bswartz: yup all you need is an address in the same range and local attached routing rules should just work for that
19:30:50 <clarkb> bswartz: that is how we do floating IP routing on multinode tests with an overlay network
19:31:20 <clarkb> the host VM gets addrs in the first /24 of a /23 and the floating IPs are assigned from the second /24
19:31:24 <pabelanger> maybe we should setup ipv6 by default on overlay in multinode?
19:31:28 <bswartz> I believe we can attempt to leverage the "public" ipv6 network that neutron places on the external bridge
19:31:29 <clarkb> I imagine you could do something similar with ipv6 address assignment
19:31:52 <clarkb> pabelanger: maybe, though I think this is an issue on single node as well so probably start there
19:32:06 <pabelanger> ya
19:32:46 <clarkb> bswartz: definitely come back and ask questions if you find unexpected things trying ^
19:32:59 <bswartz> I'm sure I will
19:33:03 <bswartz> ty
19:33:15 <clarkb> bswartz: anything else we can help with or is this a good start?
19:33:21 <bswartz> that's all for now
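
As a rough illustration of the "set it up locally" suggestion above (the interface name and ULA prefix here are assumptions, not what any existing job uses), something along these lines gives a node a non-link-local IPv6 address even on clouds that do not hand one out:

    # make sure IPv6 is not disabled on the node
    sudo sysctl -w net.ipv6.conf.all.disable_ipv6=0
    # add a unique-local (non-link-local) address to the primary interface
    sudo ip -6 addr add fd00:1234:5678::1/64 dev eth0
    # verify the address is present
    ip -6 addr show dev eth0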
19:33:26 <clarkb> #topic Zanata upgrade
19:33:54 <clarkb> At this point this is mostly informational, but at the summit aeng reached out and was asking about upgrading our zanata servers
19:34:27 <clarkb> it sounds like the translators would like to have a new version in place for string freeze which means getting upgraded by early/mid January if I remember the release calendar properly
19:35:00 <clarkb> my understanding is this new version of zanata doesn't require newer java so this upgrade should be much simpler than the last one. Will just be a config update then the actual zanata update
19:35:07 <clarkb> also wildfly doesn't need an upgrade
19:35:44 <clarkb> I have asked them to push changes to translate-dev's puppet config to do the upgrade (config and zanata) and then we can test there and work out problems in order to get a production upgrade planned
19:35:50 <clarkb> so be on the lookout for those changes.
19:35:52 <pabelanger> ++
19:36:19 <clarkb> #topic bindep & external repositories
19:36:33 <ianw> oh, so back to that ...
19:36:34 <clarkb> we have time to bikeshed bindep config grammar :)
19:36:46 <ianw> just wanted to see what people thought about this before i wrote anything
19:37:15 <ianw> i'm wondering if bindep should get involved in this ... or just have a rule that it deals with the packages it can see from the package manager, and that's it
19:37:26 <clarkb> `liberasurecode-devel [platform:centos enable_repo:epel] -> yum install --enablerepo=epel liberasurecode-devel` is the example
19:37:59 <ianw> right, any sort of package in bindep.txt that isn't in the base repos ... how do you know where it comes from
19:38:21 <clarkb> zypper install has a --from flag which I think would be the equivalent there
19:38:35 <clarkb> does apt/apt-get/aptitude know how to enable things for a single install?
19:38:39 <pabelanger> yah, this has been asked for in the past
19:39:01 <ianw> i guess apt has "-t"
19:39:10 <ianw> although not sure if that's for disabled repos
19:39:38 <pabelanger> I think with apt you setup pins for packages
19:39:51 <pabelanger> if you want to only pull something from a specific repo
19:39:54 <clarkb> pabelanger: ya its priority based and much more painful/complicated iirc
19:39:59 <pabelanger> yup
19:40:22 <pabelanger> I'm leaning towards not adding that into bindep, and maybe seeing how to work around it
19:40:37 <ianw> we can just put in a role to enable repos before running bindep
19:41:05 <pabelanger> yah, I think we've been suggesting that for now
19:41:14 <pabelanger> but I do like --enablerepo foo with yum
19:41:33 <pabelanger> if only we could figure out an easy win for apt
19:42:09 <ianw> it could be a sort of distro-specific flag
19:42:34 <pabelanger> I'll look into dpkg_options and see what there is
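
For context, the per-package-manager options being compared look roughly like this (repo and package names are illustrative); note that apt's -t only prefers a release that is already configured in sources, it does not enable a disabled repo:

    sudo yum install --enablerepo=epel liberasurecode-devel       # centos/rhel: enable a repo for one transaction
    sudo zypper install --from extra-repo liberasurecode-devel    # suse: --from <repo alias>
    sudo apt-get install -t xenial-backports libsomething-dev     # debian/ubuntu: prefer an already-enabled release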
19:43:48 <ianw> i guess, when i think this through, you're still going to have to have like the epel-release package pre-installed before bindep anyway
19:43:59 <ianw> you can't really specify that *in* bindep
19:44:16 <clarkb> ianw: ya I think end of day maybe making it distro specific is fine
19:44:17 <ianw> at which point, you've already had to manually setup the repos
19:44:23 <pabelanger> yah, in our case, repo files already exists but just disabled
19:45:05 * mordred played with adding repo indications to bindep a couple of months ago - gave up because it got super complex
19:45:20 <ianw> right, but at that point you've had to work outside bindep anyway to get the repo setup
19:45:32 <ianw> so you might as well enable/disable around the bindep call
19:45:48 <ianw> mordred: good to see we're reaching the same conclusion :)
19:46:13 <mordred> the thing I wanted to do was just add a list of repos that needed to be added - having them be permanent vs temporary adds didn't occur to me
19:47:08 <ianw> that kind of gets complex because on redhat for epel and openstack-* packages you'd just install the -release packages which dumps the repo files
19:47:43 <ianw> whereas on deb you'd write in apt sources i guess to things like libvirt repos
19:47:57 <pabelanger> so, I am in favor of removing some of the packages from bindep-fallback.txt: https://review.openstack.org/519533/
19:48:24 <pabelanger> maybe we should just give a heads up to ML and help projects that might be affected setup bindep.txt files
19:48:32 * mordred is in favor of removing bindep fallback ...
19:48:40 <pabelanger> or that
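
For projects that would be affected by shrinking or dropping the fallback, the kind of bindep.txt entries involved look roughly like this (package names are examples, not a definitive list):

    # platform-specific build dependencies
    liberasurecode-devel [platform:rpm]
    liberasurecode-dev [platform:dpkg]
    # only installed when the "test" profile is requested
    mysql-server [test]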
19:48:46 <ianw> right, this was just about what do we tell people to do when they need the package and it's in the different repo
19:48:52 <mordred> ++
19:49:13 <pabelanger> ianw: I think we continue with, you need to manually enable repo X before bindep
19:49:16 <ianw> i think we just provide roles to enable/disable around the bindep calls myself
19:49:24 <mordred> ++
19:49:29 <pabelanger> this is what we've said with OSA and kolla up until now
19:49:34 <pabelanger> ++
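
A minimal sketch of the kind of pre-bindep repo enablement role being suggested (playbook layout, module choice and repo name are assumptions, not an existing role):

    - name: Enable extra package repositories before bindep runs
      hosts: all
      tasks:
        - name: Enable EPEL on RPM-based nodes
          become: yes
          command: yum-config-manager --enable epel
          when: ansible_os_family == 'RedHat'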
19:50:16 <ianw> ok, well if people are happy with 519533 ... that removes the packages that use non-base things on mostly centos, but others too
19:50:47 <clarkb> I think it would be good to get a bit more settled in zuulv3 first
19:50:51 <ianw> however, before we remove centos-release-openstack-ocata (519535) i will look at having roles to put it back
19:51:02 <clarkb> we still have job issues outside of that and adding more fuel to the fire won't help anyone
19:51:50 <ianw> ok, i just don't want to grow too many more dependencies on this
19:52:21 <clarkb> ya I don't think we have to wait very long, just thinking this has potential to break release again and we are still sorting out v3 related changes there
19:52:23 <ianw> we already have tripleo having to turn these repos off before they start.  that's a bit of a silly situation and creates potential for weird version skews
19:52:28 <clarkb> and will be harder to debug if multiple things are changing
19:52:55 <ianw> i don't think release will really be involved, as this really only affects centos & suse
19:53:02 <clarkb> oh right
19:53:14 <clarkb> in that case maybe its safe enough (sorry was thinking of it as a more global issue)
19:53:21 <ianw> and not devstack jobs, because they pull in the repos, so only unit testing really
19:53:46 <clarkb> maybe email to the dev list, say we'd like to do it sometime next week and please speak up if this is a problem for you?
19:54:02 <clarkb> osa, puppet, tripleo, kolla are likely to be broken by it if their bindep isn't up to date?
19:54:03 <ianw> ok, yep can do.  not a "put it in on friday and disappear" thing
19:54:10 <clarkb> ya
19:55:20 <clarkb> #topic open discussion
19:56:09 <pabelanger> do we want to consider a virtual sprint for control plane upgrades (xenial) before or after PTG?
19:56:20 <clarkb> this week is going to be a weird one for me. I've got the kids while my wife recovers from surgery. Hoping things will be more normal next week
19:56:43 <pabelanger> clarkb: np, I think most people are still recovering from summit
19:56:52 <pabelanger> I know I am
19:56:58 <clarkb> pabelanger: if we are able to take advantage of holiday slowdown that may be a good time to knock out a bunch of services with few people noticing
19:57:05 <jlvillal> I really would like to get this fix in: https://review.openstack.org/509670  Since I keep hitting cases where there is a CRITICAL log level but if I select ERROR log level I don't see it. Reviews appreciated :)
19:57:16 <clarkb> pabelanger: but that depends on availability. I think sooner is better than later though
19:57:46 <clarkb> jlvillal: I will give that a review
19:57:53 <pabelanger> clarkb: yah, that is what I was thinking too. I can see what peoples schedules look like and propose a time
19:57:56 <jlvillal> clarkb: Awesome, thanks :)
19:58:13 <clarkb> pabelanger: maybe an ethercalc/etherpad of proposed times and see who is available when? like we do for ptg/summit dinner things
19:58:28 <pabelanger> ++
19:59:07 <clarkb> alright out of time. Thanks everyone and you can find us in #openstack-infra or on the infra mailing list if anything else comes up
19:59:10 <clarkb> #endmeeting