19:01:15 #startmeeting infra
19:01:16 Meeting started Tue Jan 9 19:01:15 2018 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:19 The meeting name has been set to 'infra'
19:01:28 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:37 #topic Announcements
19:01:53 * mordred waves
19:02:02 I will be traveling for meetings on January 23rd so will not be able to run or attend the infra meeting
19:02:06 i'm back, just in the nick of time
19:02:10 that is the meeting after next
19:02:17 I think fungi is in this same boat too.
19:02:18 fungi: I thought you were back in the nick of fungi
19:02:24 i'll be on the same trip, so can't volunteer to chair
19:02:43 would be great if someone else was able to run that assuming we aren't all traveling at the same time
19:02:47 let me know if you are interested
19:03:03 pabelanger and ianw have chaired meetings in my stead in the past, fwiw
19:03:21 happy to if nobody else wants to
19:03:25 yah, happy to help, unless somebody else wants to try :)
19:03:32 who ever _wants_ to chair a meeting? ;)
19:03:53 #topic Actions from last meeting
19:04:00 #link http://eavesdrop.openstack.org/meetings/infra/2018/infra.2018-01-02-19.06.txt minutes from last meeting
19:04:07 there were none \o/
19:04:20 success!
19:04:25 #topic Specs approval
19:04:53 I don't think there are any specs up for approval right now. But I do still intend on cleaning up our specs list after fungi's jenkins vote removal bubbled a bunch of old ones to the front of the list
19:05:15 * fungi is happy to tale the blame for people noticing old changes in review
19:05:15 i didn't put it on the agenda, but i'd like to issue a request for collaboration on https://review.openstack.org/531242
19:05:15 Also worth mentioning there is an irc bot improvement spec that we should be reviewing which is related to handling the irc spam freenode has been seeing recently
19:05:21 clarkb: ya that :)
19:05:22 er, s/tale/take/
19:05:53 it would be helpful if you could rebase your old specs if they are still valid and abandon them otherwise
19:06:06 I will be going through and doing this for specs whose owners are less active now
19:06:21 #link https://review.openstack.org/531242 irc bot improvement spec
19:06:36 Yah, I have a spec or 2 to rebase
19:06:39 please get me feedback on that soon, and i'll put it on the agenda for approval
19:06:51 thanks for the reminder on the irc spec
19:07:13 \o sorry I'm late
19:07:41 #topic Priority Efforts
19:07:47 #topic Storyboard
19:07:57 There is also the ARA spec, I was waiting on one more review or two before spinning up a new draft: https://review.openstack.org/#/c/527500/
19:08:05 oops
19:08:13 #undo
19:08:14 Removing item from minutes: #topic Storyboard
19:08:16 #undo
19:08:17 Removing item from minutes: #topic Priority Efforts
19:08:34 #link https://review.openstack.org/#/c/527500/ ARA dashboard spec
19:08:59 dmsimard: you might want to rebase it and push a new patchset anyways as it is merge conflicted which a lot of people filter out
19:09:11 I can do that.
19:09:31 #topic Priority Efforts
19:09:40 #topic Storyboard
19:10:02 We don't often have storyboard topics in this meeting but I wanted to mention that I have nominated diablo_rojo as storyboard core
19:10:21 diablo_rojo has been actively reviewing and shepherding storyboard changes recently and want to encourage more of that.
19:10:43 (this in addition to helping projects work through the migration to storyboard)
19:10:56 there is an email thread on the infra list if you want to chime in
19:11:05 #topic Zuul v3
19:11:28 See my email about mailing lists
19:11:43 I've almost got the zuulv3-issues etherpad cleared out by either tracking down things as fixed, fixing them, or creating bugs.
19:11:52 #link zuul mailing list email http://lists.openstack.org/pipermail/openstack-infra/2018-January/005758.html
19:12:13 zuul mailing lists double-plus good
19:12:25 clarkb: thanks!
19:12:35 note that clarkb added two 3.0 release blocking bugs to storyboard
19:12:54 mmedvede'
19:12:56 so there's work available there
19:13:04 thanks clarkb :)
19:13:06 s puppet changes are looking good
19:13:22 fungi: oh great!
19:13:28 fungi: is there a topic or other gerrit query we can use to find those?
19:13:29 * fungi apologizes for having his return key adjacent to his apostrophe)
19:13:40 fungi: i fixed that, it's next to my
19:13:40 -
19:13:41 key
19:13:44 (they are on my list of things to followup on to make sure we are progressing towards a merged feature branch)
19:13:45 hah
19:14:04 fyi for folks, we're working on turning on testing ansible PRs from their github repo - https://review.openstack.org/#/q/topic:turn-on-ansible is the stack ATM
19:14:08 clarkb: good question. i simply found them looking in openstack-infra/puppet-openstackci open changes
19:14:10 fungi: are they just ready for a +3 or are we going to need to be careful?
19:14:15 fungi: thanks
19:14:33 #link puppet-openstack changes https://review.openstack.org/523951
19:14:45 i think they're probably ready for +3 since they default to zuul 2.x code paths, but eatra reviews are of course strongly recommended
19:14:59 er, extra
19:15:08 not eatra
19:15:13 but that should totally be a word
19:15:17 mordred: that's so cool. i'm hoping to follow that up shortly with cross-source deps which will be nice to have in place before we make any big announcements (that way we can show people a change in ansible depending on a change in shade!)
19:15:50 mordred: corvus will/are we reporting to github yet?
19:15:59 or is it currently just commenting on gerrit changes?
19:16:03 we are not yet reporting to github but will be by the end of that stack
19:16:11 * fungi wants a change in nodepool depending on a change in ansible depending on a change in shade
19:16:20 and ansible knows that this is happening and won't hunt us down?
19:16:20 currently iterating on the test job to make sure it's actually, you know, testing the right thing
19:16:25 i recently had a change in pbr which depended on a change in setuptools
19:16:35 but alas, no zuul support for me
19:16:38 so we'll start reporting... poke at things a bit, make sure it looks good, add in cross-source deps, test that out, then show people how cool it all is
19:16:39 clarkb: yup!
19:16:50 cool
19:17:05 exciting times
19:17:21 and not even in the ancient chinese proverb sense
19:17:24 o/
19:17:46 mmedvede: other than reviews is there anything we need to be doing or aware of in updating puppet-openstackci?
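(For context on the cross-source dependency work corvus and mordred describe above: Zuul v3's Depends-On footer refers to changes by URL, so the intent is that once both Gerrit and GitHub are connected, a Gerrit change could declare a dependency on an Ansible pull request with a commit message footer along the lines of `Depends-On: https://github.com/ansible/ansible/pull/12345` -- that PR URL is purely hypothetical -- and a GitHub pull request could likewise point at a Gerrit change URL. This is a reading of the feature as discussed here, not a description of anything merged yet.)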
19:18:07 clarkb: luckily, we have robyn to prevent ansible people from _actually_ hunting us down
19:18:30 incidentally, the implementation path i've chosen for cross-source deps should let us leave error messages on changes when there are dependency cycles. that's a long-requested feature.
19:18:30 clarkb: well, it is more work than I initially thought it would be :)
19:18:50 corvus: i missed that tidbit... amazing!
19:19:08 corvus++
19:19:08 fungi: it's not there yet -- will probably happen in a followup. but i'm laying groundwork.
19:19:09 corvus: \o/
19:19:18 corvus: I filed two bugs related to that that are not marked 3.0 just 3
19:19:20 color me excited
19:19:37 corvus: one is lack of reporting the other is detecting cycles when there isn't a proper cycle
19:20:17 alright any other zuul v3 related items?
19:20:52 nak
19:21:03 clarkb: I am restarting work on puppet-openstackci, would follow up outside of meeting with you if you got a moment
19:21:14 mmedvede: sure
19:21:20 #topic General Topics
19:21:27 Meltdown/Spectre!
19:21:45 it's the 9th is ubuntu updated?
19:21:45 i've melted down at least a few spectres already
19:21:54 corvus: they have not
19:22:08 corvus: there are kernels in testing though, but they haven't listed one for trusty unless you use the xenial kernel on trusty :/
19:22:15 debian was pretty johnny-on-the-spot, but it's hard to beat openbsd getting patches in more than a decade ago
19:22:28 fungi: debian still doesn't have kernel for jessie for me :/
19:22:32 #link ubuntu meltdown/spectre https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/SpectreAndMeltdown
19:22:44 clarkb: it's called 'apt dist-upgrade' to stretch :/
19:23:01 fungi: ya I should probably upgrade the entire OS
19:23:02 lts security is not handled by the debian security team, alas
19:23:02 meanwhile, the meltdown POC was published today...
19:23:20 In the meantime, I've (for another purpose) made what should be a cross-distro playbook that allows us to see which hosts are patched and which aren't
19:23:22 anyways tl;dr for us is this:
19:23:26 fungi: so we should be migrating to openbsd, yeah?
19:23:46 I'll clean it up and we can use it to keep an inventory of which host needs to be patched still
19:23:48 mordred: i'm running a bunch of it at home since long ago
19:23:52 Our CentOS machines (git*.openstack.org) are all patched and running with page table isolation. We are waiting for ubuntu to publish kernel updates for the rest of our control plane.
19:23:58 dmsimard: sounds cool, thx
19:24:09 and i gather you can run openbsd virtual machines as long as you don't need pv guests, but i still need to try that myself
19:24:12 The test node VMs will automagically pick up new kernels as part of daily rebuilds
19:24:26 Rackspace as a Xen user is for the most part not affected by meltdown
19:24:40 so I think concern about side channel attacks attacking our control plane is relatively minor
19:24:47 We don't upload any secrets from the executor to the test nodes, do we ?
19:24:59 their hypervisor layer not particularly impacted still doesn't change that we need to upgrade our guest layer kernels though
19:25:05 (since we don't control the state of each nodepool provider)
19:25:06 dmsimard: yes, AFS credentials
19:25:18 fungi: correct
19:25:39 clarkb: i'm assuming etherpad/ethercalc are pretty much where we left them before break?
19:25:41 also everyone should use firefox and make sure it is up to date as it has meltdown mitigation built in. Chrom* will get updated on the 23rd
19:25:45 ianw: ya
19:26:00 ok, should get back to that, as part of this too
19:26:02 pabelanger: what job uploads afs creds to nodepool nodes?
19:26:02 pabelanger: we mount the AFS on the test nodes ?
19:26:21 actually, creds should only be required for writing I would guess ? since read only is unauthenticated
19:26:33 corvus: dmsimard our wheel builders
19:26:39 let me get jobs
19:27:03 pabelanger: the contents aren't recovered by the executor like logs or other artifacts would be ?
19:27:15 * dmsimard is totally unfamiliar with the wheel build process
19:27:37 the bigger concern is whether we put decrypted secrets on systems where we run untrusted code. to the extent that our (config-core) reviews are thorough, that shouldn't be the case
19:28:24 as soon as Ubuntu does publish kernels (which is supposed to be today), it would be much appreciated if we could have a semi organized patching "sprint"
19:28:26 that's not running proposed code though, that's a job we control centrally. however, it may be running code in python packages that build wheels. but that's not a new attack vector.
19:28:35 yeah, I thought secrets were meant to stay on the executor -- but that tells me there's nothing preventing a project from sending the secret(s) to the test nodes
19:28:36 publish-wheel-mirror-centos-7 for example will use secrets on nodepool node
19:28:37 maybe we can all hop in the incident channel and start working through it and partition out servers
19:28:52 dmsimard: secrets are meant to be used however needed
19:28:54 corvus: +1
19:29:01 er +1 to not new attack vector
19:29:23 agreed, ping infra-root in #-incident as soon as someone notices ubuntu trusty and/or xenial kernel updates addressing meltdown
19:29:39 i'm happy to pitch in on upgrading/rebooting if i'm at a computer
19:30:00 I will likely end up making that my priority once we have packages
19:30:00 as with anything, each piece of privileged information needs to be thought about and risk evaluated. there's no hard rule we can make about secrets.
19:30:06 corvus: what I am saying is that if secrets are used in test nodes, then they are vulnerable if the test node is running in an unpatched nodepool cloud provider
19:30:30 dmsimard: i agree.
19:30:45 yup, but those creds have always been vulnerable to attacks through the package builds
19:30:51 dmsimard: to the extent that the jobs/code running there aren't approved by the same people who know the secret
19:31:07 fungi: no, just running the job opens it to attack
19:31:19 or maybe thats what you mean by approved
19:31:21 attack from...
19:31:26 fungi: meltdown
19:31:33 in $vulnerablecloud
19:31:37 via other cloud users
19:31:38 using secrets in jobs which run proposed patches is already a huge no-no
19:31:57 fungi: the attack vector is ovh customers
19:31:58 regardless if it's proposed, post, or release
19:31:58 ohm, in providers who haven't secured their hypervisor hosts? yes indeed
19:32:02 completely unrelated to us
19:32:12 right
19:32:28 so this increases the risk of exposure of the credentials but I don't think it does so greatly
19:32:32 so that job is now vulnerable to evil package maintainers, and other customers on unpatched nodepool cloud providers
19:32:49 (whereas it used to only be the first -- a risk we deemed acceptable)
19:33:13 It's probably not that big of a risk, but it's worth keeping in mind and generally try to keep secrets off of test nodes where possible
19:33:19 dmsimard: yup
19:33:32 agreed
19:33:44 dmsimard: i'm not going to agree with we should try to keep secrets off test nodes where possible
19:34:03 i'm only going to go as far as we need to keep that risk in mind when we evaluate where we use secrets
19:34:15 yep, if a job using a secret doesn't need to run arbitrary code and can leverage the bwrap environment on the executor, that's best
19:34:19 corvus: I think it is part of the risk assessment, and then we decide if we can accept possible insecure_cpu/cloud bugs
19:34:29 corvus: ++
19:34:35 Given the choice of using the secret on the executor or on the test node, I'd keep it on the executor if it doesn't mean jumping through 10 hoops
19:34:43 what about jobs that are using docker credentials, should we post something to warn about exposure? Or hope they understand it
19:34:48 but secrets on executors are only actually safe if the other rackspace guests on the same hosts aren't 64-bit pv flavor
19:34:56 (and to be really specific, i would like to eventually put a secret on the test node so that we don't have to route logs through the executor)
19:35:03 so, it's still up to the provider to provide proper security
19:35:19 corvus: an approach we have for RDO is that we have the logserver pull the logs instead of having the test nodes push the logs
19:35:22 so i'm not going to climb on board any rule that precludes that. i'm happy to have a discussion about the risks when we do.
19:35:34 i.e, the log server has a ssh key that grants access to the test nodes
19:35:53 We talked about that before, but didn't implement it for some reason
19:36:05 let's not turn this into a discussion about that. i don't have a proposal.
19:36:13 pabelanger: it may be worth a note, maybe when the dust settles and we can speak with a bit more concrete info (eg find out when $clouds say they are patched, after we are patched and so on)
19:36:15 i merely cite it as an example of why i disagree with the blanket statement.
19:36:27 yeah, this is not the venue for redesigning test log publishing
19:36:29 sure
19:36:33 pabelanger: right now I don't think we have enough concrete info other than "please don't panic but be aware of this"
19:36:43 clarkb: agree
19:37:04 other than the general situation update and call for help once we have packages I'm not sure there is much more to talk about here
19:37:47 #topic Project Renames
19:37:53 clarkb: you missed a topic
19:38:10 or he's reordering them
19:38:10 dmsimard: refresh confirms
19:38:16 no I had missed it
19:38:17 #undo
19:38:18 Removing item from minutes: #topic Project Renames
19:38:19 was added late
19:38:30 added it shortly before meeting, sorry :)
19:38:30 sshnaidm: are you around?
19:38:43 clarkb, yep
19:39:04 "Allow jobs to send data to graphite.openstack.org"
19:39:19 the question I wanted to ask - is it possible to do in general?
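(Stepping back briefly to the Meltdown thread above: dmsimard's cross-distro "which hosts are patched" check can be approximated by reading the vulnerabilities directory that patched kernels expose under sysfs. The following is only an illustrative Python sketch, not the actual playbook; the hostnames, the use of plain ssh, and the assumption that every fixed kernel exposes this sysfs entry are all assumptions for the example.)

    #!/usr/bin/env python3
    # Sketch: report Meltdown mitigation status for a set of hosts by
    # reading /sys/devices/system/cpu/vulnerabilities/meltdown over ssh.
    import subprocess

    HOSTS = ["example01.openstack.org", "example02.openstack.org"]  # placeholders
    SYSFS = "/sys/devices/system/cpu/vulnerabilities/meltdown"

    for host in HOSTS:
        # On kernels without the fix the sysfs entry usually does not exist,
        # which is itself a hint that the host still needs a kernel update.
        proc = subprocess.run(
            ["ssh", host, "cat {} 2>/dev/null || echo 'no sysfs entry'".format(SYSFS)],
            stdout=subprocess.PIPE, universal_newlines=True, timeout=30)
        print("{}: {}".format(host, proc.stdout.strip() or "no response"))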
19:39:25 I know now it's firewalled
19:39:27 i'd like to make sure everyone is aware of and has read this spec:
19:39:32 #link qa counter spec https://specs.openstack.org/openstack/qa-specs/specs/devstack/counter-inspection.html
19:40:08 sshnaidm: the spec corvus linked has a specific way to accomplish this despite the firewall
19:40:12 a target of mitaka-2, nice
19:40:22 I think that approach is still valid
19:40:57 it is partially implemented, and probably needs a little bit of updating because it predates zuulv3 and the changes we made to subunit processing, but the approach is still largely viable and is important prior work
19:40:57 clarkb, yeah, I read it briefly
19:41:11 from skimming the spec, it's the same approach as openstack-health and logstash-workers ?
19:41:14 clarkb, it seems close, but not really what is done in jobs
19:41:26 the approach should also be generally applicable to all jobs
19:41:41 sshnaidm: meaning it doesn't meet your usecase?
19:42:21 clarkb, maybe I'm missing something in the docs, but what I actually need is to send some custom statistics to a statistics server (graphite/influx/whatever)
19:42:34 sshnaidm: yes that is what that spec is for
19:42:36 clarkb, during the job run
19:42:47 i haven't looked closely at the spec, but is there no sane way to ship the stats back to the executor and let it emit them?
19:42:51 sshnaidm: why would it be a problem if you send the data after the job is complete ?
19:42:53 sshnaidm: jobs will collect the data then as part of the log publishing and processing we write the info to graphite
19:43:01 clarkb, ok, just maybe confused by subunit, mysql and other stuff there..
19:43:19 sshnaidm: it is just pointing out that subunit2sql and logstash already use a similar data flow
19:43:21 dmsimard, post playbook is ok, of course
19:43:31 sshnaidm: and SpamapS was explicitly interested in mysql stats
19:43:39 ahh, so basically what i was asking. i should definitely look over the spec
19:43:51 clarkb, ok, so we just need to wait afaiu
19:43:54 Probably the best approach here is to work to implement that spec and update it where it needs to be updated
19:43:57 sshnaidm: yeah, the approach in how we currently ask logstash and openstack-health to index data involves a post playbook
19:44:10 sshnaidm: I don't think anyone is actively working it.
19:44:14 sshnaidm: oh, no. SpamapS is not working on that spec. if you want this to happen, someone will need to work on it.
19:44:15 waiting isn't likely to get you much unless there is someone else actively implementing
19:44:30 sshnaidm: you might be interested in something like https://review.openstack.org/#/c/434625/ ; that is an example of adding coverage output to the subunit stream
19:44:39 corvus, clarkb ok.. is it tracked anywhere?
19:44:45 hmmm?
19:44:55 sshnaidm: i'd recommend talking to SpamapS to find out what's completed and what maybe should be changed.
19:44:55 sshnaidm: you'd have to ask the qa team probably, or SpamapS
19:45:03 Indeed, I don't, unfortunately, have bandwidth to finish the perf stats spec. :-/
19:45:16 Happy to advise anyone who wants to pick it up.
19:45:33 ok, thanks, will try to poke there
19:45:34 sshnaidm: ^ does that work as a starting point?
19:45:38 I'm sure a lot of projects could benefit from that spec -- I've seen kolla implement a self-contained collectd recently
19:45:56 (for the purpose of tracking gate job resource usage and performance data)
19:46:01 Yeah!
19:46:06 I wanted to do that too. :)
19:46:18 sshnaidm: I can help point you in the right direction
19:46:31 cool sounds like we have a rough starting point/plan
19:46:34 dmsimard, great
19:46:38 dmsimard: wait -- they implemented it just for kolla?
19:46:49 corvus: I think it's more that kolla deployed clouds need that tooling anyways
19:47:00 corvus: so they implemented it for their users (and may also take advantage of it in jobs)
19:47:07 corvus: it's a collectd implementation that generates a html page at the end and the html page is stored in the logs
19:47:09 dmsimard, well, in tripleo we also run dstat that collects data.. just not sending it anywhere
19:47:25 dmsimard: oh that makes sense... i thought you were saying they stood up a private collectd
19:47:26 sshnaidm: https://review.openstack.org/#/c/436768/5/specs/coverage-tracking.rst also has a bunch of info on subunit streams and how they plug in
19:47:39 to openstack-health, etc
19:47:46 dmsimard: i agree, that's a great first step that can feed into what sshnaidm wants to accomplish and make things better for everyone
19:47:56 ianw, thanks, will look
19:48:08 #topic Project Renames
19:48:11 another question I have is - how can I inject credentials for some 3rd-party server in jobs? for example to authenticate with some http api in a 3rd party to send data there
19:48:15 oops
19:48:32 I don't want to derail the previous discussion but we are running out of time and would like to talk about project renames before the meeting ends
19:48:37 sshnaidm: why not send the data to a first party server?
19:48:53 ok, let's talk later about it
19:48:53 sshnaidm: it's a question for #openstack-infra, we can help there
19:49:11 great, thanks
19:49:18 short answer...
19:49:20 #link https://docs.openstack.org/infra/zuul/feature/zuulv3/user/encryption.html#encryption Zuul v3 secrets
19:49:21 fungi: mordred ianw I know everyone has been afk and snowed in or under a heat wave, but curious if any progress has been made on sorting out a project rename plan
19:49:57 clarkb: AJaeger has been asking that we discuss the plan for pypi registration for new projects. is that still in need of discussion?
19:50:32 fungi: I thought there was rough consensus on the plan there to register empty projects? I may have missed some subtle details that need ironing out
19:50:42 i'm not aware of any progress on redesigning the renames process yet, unfortunately
19:50:47 clarkb: the project name not needed patch landed to zuul, so that part is good, but we've still got required-projects entries to consider
19:51:00 mordred: oh cool
19:51:12 oh, and i guess we've restarted zuul since then
19:51:21 i'll start pushing up patches to remove project names
19:51:29 sounds like good progress for everyone being on holiday
19:51:37 mordred: clarkb: also we had talked about switching to implicit required jobs where desired, and dropping the system-required project-template?
19:51:45 #action corvus patchbomb project name removals from zuul.yaml
19:51:50 clarkb, fungi: on pypi - I've got patches up to verify that a project either doesn't exist in pypi or exists but has openstackci properly as the owner ...
19:52:11 mordred: cool, saw those, still need to find a moment to review, but seems like a solution is in progress anyway
19:52:19 for required-projects maybe we can have project aliases
19:52:30 where we map old to new and zuul will attempt to use both
19:52:37 then after a rename is complete we can remove the aliases
19:52:40 or we could remove the jobs, rename, then readd.
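(Tying back to sshnaidm's graphite question above: the counter-inspection spec's approach amounts to jobs writing their measurements to a log artifact and a trusted post-processing step emitting them to graphite via statsd. A minimal Python sketch of that final emission step might look like the following; the endpoint, file name, and metric prefix are assumptions for illustration, not what the spec actually prescribes.)

    #!/usr/bin/env python3
    # Sketch: push collected job counters to graphite using the statsd UDP
    # line protocol, run from a trusted post step rather than the test node.
    import json
    import socket

    STATSD = ("graphite.openstack.org", 8125)  # illustrative endpoint

    def emit(name, value, metric_type="g"):
        # statsd line format: <metric>:<value>|<type> ("c" counter, "g" gauge)
        payload = "{}:{}|{}".format(name, value, metric_type).encode("utf8")
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.sendto(payload, STATSD)
        finally:
            sock.close()

    # Hypothetical artifact the job wrote alongside its logs.
    with open("job-counters.json") as f:
        for name, value in json.load(f).items():
            emit("jobs.custom.{}".format(name), value)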
19:54:02 though really, required-projects failures should probably soft-fail -- they should not fail at startup, but should still error on proposed reconfigurations.
19:54:02 clarkb, fungi: https://review.openstack.org/#/q/topic:check-pypi ... it does not contain anything to actually auto-register yet (figure let's get verification working solidly first)
19:54:31 since we should protect zuul from external renames happening, but still prevent folks from botching a config change
19:54:44 ya external renames would be annoying
19:54:44 ++
19:54:50 sounds right
19:54:53 especially if you couldn't start zuul as a result
19:54:54 i think that would cover this case? the job would just be 'broken' till someone updated its required projects...
19:55:10 corvus: ya I think it would. Job would be broken but zuul would work for everyone else
19:55:24 last thing we want is for zuul to stop being able to start due to a change in external data
19:55:25 and renames already tend to put projects in a half working state anyways
19:55:28 of course, if it's an important job, fallout could be large, but hey.
19:56:01 #topic open discussion
19:56:04 o/
19:56:16 we have a few minutes for whatever else we missed :)
19:56:16 a wild Zara appears
19:56:18 I'm late but congrats diablo_rojo!
19:56:28 I arrived *right* after the storyboard topic and have been lurking haha
19:56:36 diablo_rojo: ++ :)
19:56:39 I'm very happy about the nomination :)
19:56:46 Thanks :)
19:56:52 Happy to share the load ;)
19:56:58 woot
19:57:01 hopefully there are more new sb contributors on their way too
19:57:11 :D
19:57:15 seems like it's gaining visibility
19:57:36 and mordred's been hacking on it some again
19:58:02 yep!
19:58:14 * diablo_rojo writes a dozen 'You are eligible for migration' emails
19:58:14 oh! PTG is coming up
19:58:44 krotscheck has even been resurfacing on occasion
19:58:53 There are a few patches up to add puppetlabs to AFS mirrors, if anybody else wants some hands on XP on AFS changes: https://review.openstack.org/#/q/topic:puppetlabs/mirrors
19:58:58 I'll probably start soliciting rough planning thoughts for PTG once we get past meltdown
19:59:05 happy to walk people through it or will likely do it in a day or so
19:59:26 pabelanger: you might want to check with frickler directly as the timezones don't make it easy to see pings
19:59:32 I expect frickler will be interested
19:59:42 wfm
20:00:06 ok we are out of time. Thank you everyone
20:00:08 #endmeeting