15:01:11 <portdirect> #startmeeting openstack-helm
15:01:12 <openstack> Meeting started Tue Apr 9 15:01:11 2019 UTC and is due to finish in 60 minutes. The chair is portdirect. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:16 <openstack> The meeting name has been set to 'openstack_helm'
15:01:21 <evrardjp> \o
15:01:23 <alanmeadows> o/
15:01:25 <gagehugo> o/
15:01:25 <pgaxatte> o/
15:01:29 <portdirect> let's give it a couple of mins for people to arrive
15:01:41 <itxaka> \o o/
15:01:43 <portdirect> the agenda for today is here: https://etherpad.openstack.org/p/openstack-helm-meeting-2019-04-09
15:01:49 <mattmceuen> o/
15:01:51 <portdirect> please add to it :)
15:02:20 <evrardjp> itxaka: your blue is... quite punchy :D
15:02:42 <itxaka> I know, I don't know what to change it to so it's easily discernible but doesn't make eyes bleed
15:03:13 <evrardjp> =)
15:03:27 <portdirect> it reminds me of word processing in displaywrite4
15:04:27 <evrardjp> souvenirs!
15:04:39 <srwilkers> o/
15:05:09 <prabhusesha> Hi
15:05:18 <portdirect> ok - let's go
15:05:27 <portdirect> #topic Office hours
15:05:43 <prabhusesha> This is Prabhu from Juniper
15:05:54 <portdirect> so after last week's meeting I did a straw poll amongst cores
15:06:23 <evrardjp> I thought you were sending this publicly on the ML?
15:06:25 <portdirect> and we would like to run office hours on Wednesday each week from 20:00-21:00 UTC
15:06:54 <portdirect> as this is the time that would allow us to ensure that we have at least a couple of cores present every week
15:07:18 <itxaka> are there no EU cores? seems a very late time for EU people to ask questions :)
15:07:25 <evrardjp> itxaka: there are none
15:07:42 <portdirect> and that was the critical thing we needed to ensure to give us a chance of giving real value to these sessions
15:08:04 <portdirect> itxaka: at the moment only lost Europeans - though it would be great to change that
15:08:06 <evrardjp> portdirect: that's correct, without cores, it's impossible to get mentored
15:08:40 <evrardjp> Let's see if you have enough people to mentor. I am pretty sure jayahn will love this news :)
15:08:56 <evrardjp> It's a good step forward, thanks portdirect for organising this!
15:09:20 <evrardjp> itxaka: we can still have jhesketh as our representative for questions :)
15:09:25 <itxaka> :)
15:09:56 <portdirect> ok - I'll send out a note to the ML after this and get the wiki updated
15:10:00 <jayahn> I have been traveling, and someone mentioned my name. :)
15:10:26 <jayahn> I am in Las Vegas for the NAB show. Doing some exhibition. :)
15:10:26 <evrardjp> I did -- please don't hesitate to read the meeting log -- there is a proposed office hours time.
15:10:44 <jayahn> I will surely read through
15:10:49 <evrardjp> portdirect: thanks
15:11:10 <jayahn> portdirect: thanks as well. :)
15:11:45 <portdirect> np - I'm hoping that we can get some real flow out of these
15:11:51 <portdirect> so let's move on
15:11:55 <portdirect> #topic Status of Tempest chart
15:11:58 <itxaka> that's me!
15:12:04 <portdirect> floor's yours :)
15:12:24 <itxaka> So looking yesterday at the tempest chart and trying to deploy it, I found out that it has some weird and missing default values which make it impossible to deploy
15:12:55 <itxaka> i.e. wrong image (does not exist), default null values which are not really null, and so on
15:13:18 <itxaka> was wondering if we need more testing in there, as it's currently not possible to deploy tempest properly with the default values
15:13:38 <itxaka> btw, sent some patches already for those things, but it makes me wonder what else may be broken as well
15:14:03 <portdirect> I think this really speaks to the gaps we have in our testing atm
15:14:08 <srwilkers> we used to trigger an experimental job with the following: https://github.com/openstack/openstack-helm/blob/master/tools/deployment/multinode/900-tempest.sh
15:14:11 <itxaka> which links with the next point, which is basically the same but for spiceproxy :)
15:14:16 <srwilkers> albeit, it's a bit dated at this point
15:14:45 <portdirect> jayahn: I think you are currently using the tempest chart?
15:15:07 <itxaka> I'm not sure what a solution could be, other than getting more people to use the tempest chart and report things :)
15:15:32 <evrardjp> itxaka: making this part of the jobs, and voting?
15:15:40 <srwilkers> not voting
15:15:48 <srwilkers> as currently it's not exercised at all
15:15:54 <evrardjp> not immediately
15:15:59 <evrardjp> I mean as a long-term goal
15:16:04 <itxaka> if it doesn't vote it will get ignored forever :D
15:16:21 <srwilkers> also in my experience, running tempest as an experimental check previously would more often than not put a large amount of strain on the nodepool VMs
15:16:21 <evrardjp> if it's maintained, I don't see a reason to not make it voting
15:16:36 <srwilkers> it needs a promotional period like all the other jobs we introduce
15:16:50 <evrardjp> srwilkers: I have noticed the impact when doing this in parallel; running in serial takes longer but has less impact.
15:16:51 <srwilkers> where we can vet it and determine whether it's reliable or not
15:16:53 <portdirect> I really think it would be great to make this a goal
15:17:12 <portdirect> as tempest should be the de facto way we validate the basic sanity of an OSH-deployed cloud
15:17:25 <srwilkers> I don't disagree
15:17:25 <evrardjp> srwilkers: totally -- it's not like we should drop in a new job and say "hey, it's the new world order"
15:17:32 <evrardjp> it's up to cores to +W anyway :p
15:17:42 <itxaka> well, if you make it sound that cool, maybe we should
15:18:22 <portdirect> would anyone like to step up here?
15:18:28 <evrardjp> tempest has multiple tests, we don't need to run everything. I think we could run only smoke tests
15:18:36 <portdirect> ++
15:18:49 <jsuchome> +1 for some smoke tests
15:18:51 <evrardjp> portdirect: I guess the first action point would be to fix things, and discuss it with jayahn?
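[Editor's note: a minimal sketch of the smoke-only run evrardjp proposes at 15:18:28, plus the whitelist idea he raises just after this note. Flag names are from the tempest CLI of roughly this era; check them against the deployed tempest version.]

```shell
# Run only tests tagged as smoke, serially, to limit the strain on
# nodepool VMs that srwilkers mentions at 15:16:21:
tempest run --smoke --concurrency 1

# Once the job proves reliable, widen the selection with an explicit
# whitelist of test regexes (the file path is a placeholder):
tempest run --whitelist-file /path/to/smoke-whitelist.txt
```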
15:19:07 <portdirect> evrardjp: yes
15:19:09 <srwilkers> tempest is set up to run smoke tests by default, if I recall
15:19:20 <itxaka> it is, srwilkers
15:19:23 <evrardjp> if we get jayahn to review (as the main user of said helm chart), we should be good to re-introduce a non-voting job
15:19:51 <itxaka> I can take that unless somebody really, really, *really* wants it :)
15:19:57 <itxaka> fix, test, add job
15:20:05 <evrardjp> srwilkers: we should maybe whitelist until confidence, then move to more complete scenarios
15:20:17 <evrardjp> because there are lots of smoke tests in the default test list
15:20:21 <srwilkers> evrardjp: I'm fine with that. however, I'd also like to see more individuals take interest in maintaining the jobs we currently have. we've got a small subset of people actually looking at our periodic jobs and attempting to fix what's broken
15:20:30 <portdirect> itxaka: I think from what jayahn just posted in #openstack-helm, that would be great - and I'd add jaesang to reviews
15:20:37 <evrardjp> srwilkers: that's a fair point.
15:21:01 <evrardjp> but if those jobs are voting, people will have no choice but to fix things? :p
15:21:07 <srwilkers> that's a lazy approach
15:21:19 <portdirect> srwilkers: you have been doing pretty much all the work here - and really deserve thanks for that
15:21:21 <itxaka> srwilkers, should we bring that up in the agenda to discuss? I'm interested in that
15:21:38 <evrardjp> the lazy approach IMO would be to make the jobs non-voting or experimental when they start to fail...
15:21:47 <evrardjp> but I agree it's always the problem when new jobs are introduced
15:21:54 <srwilkers> it's not even new jobs, evrardjp
15:22:18 <srwilkers> we've had failing voting jobs before, and my previous statement applies: it's a small set of people working to triage and fix them as soon as possible to get the pipes moving again
15:22:43 <srwilkers> which is why I'm super skeptical about just jamming in new voting jobs, until that changes
15:22:52 <portdirect> srwilkers: agreed
15:22:53 <evrardjp> that's everyone's duty IMO.
15:23:05 <evrardjp> you see something failing, you fix it
15:23:21 <portdirect> I think we should probably try and implement what we planned at the last PTG, evrardjp
15:23:21 <evrardjp> some people are just unaware of some of the issues though, and they don't monitor periodics
15:23:38 <portdirect> but aware that we are all time-poor there
15:23:40 * itxaka didn't even know there were periodic jobs...
15:25:45 <itxaka> did I drop or did everyone go silent?
15:25:50 <evrardjp> http://zuul.openstack.org/builds?project=openstack%2Fopenstack-helm
15:26:21 <evrardjp> the scarier one is here:
15:26:27 <evrardjp> http://zuul.openstack.org/builds?project=openstack%2Fopenstack-helm&pipeline=periodic
15:27:46 <itxaka> should we... move on to the next point?
15:27:59 <evrardjp> I would be fine with that
15:28:32 <portdirect> #topic Status of Spiceproxy
15:28:41 <portdirect> itxaka: you're up again :)
15:28:47 <itxaka> same thing, different name
15:29:08 <portdirect> here I think jayahn has a production cloud using this for VDI?
15:29:09 <itxaka> spiceproxy was missing serviceaccounts, which leads me to think that there was no testing in there for it
15:29:24 <itxaka> damn, jayahn has everything which is broken :P
15:29:40 <portdirect> no - I think this is just the tip of the iceberg
15:29:49 <portdirect> thanks for digging into it itxaka
15:29:54 <itxaka> so same thing as with tempest, should we fix, test, add non-voting?
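[Editor's note: for readers unfamiliar with the missing-serviceaccount problem itxaka describes at 15:29:09, a hypothetical minimal example follows. In openstack-helm the real fix belongs in the chart templates (itxaka's merged patch), not in ad-hoc kubectl; the names here are placeholders, and the account still has to be referenced from the pod spec and granted any RBAC rules it needs, which may explain the permission errors discussed next.]

```shell
# Create the kind of ServiceAccount the spiceproxy pods were missing;
# name and namespace are illustrative only.
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nova-spiceproxy
  namespace: openstack
EOF
```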
15:30:02 <evrardjp> lgtm
15:30:04 <portdirect> I think that's the prudent thing
15:30:14 <portdirect> itxaka: you already added the svc account, I think?
15:30:23 <itxaka> I could help there, unfortunately I have never used spiceproxy so I'm not sure what the configs for it are
15:30:27 <itxaka> portdirect, yep, I did
15:30:31 <evrardjp> would do this in two different patches though (fixing and changing testing)
15:30:44 <portdirect> I could take that on if you like, I used to use spice a bit
15:30:48 <itxaka> but it still didn't work for me due to permission errors, so I'm wondering what else is missing in there
15:31:00 <evrardjp> (except if tempest testing of this is already present, which I doubt)
15:31:11 <srwilkers> the spice must flow
15:31:12 <itxaka> portdirect, that sounds good, I'll post in #openstack-helm the exact error that I got then
15:31:26 <portdirect> thx - I'll update the etherpad
15:32:09 <portdirect> ok to move on?
15:32:14 <itxaka> :+1:
15:32:18 <portdirect> #topic Status of image registry authentication
15:32:26 <portdirect> not sure who added this?
15:32:31 <pgaxatte> that would be me
15:32:48 <pgaxatte> I'm looking into using a private registry to pull images
15:32:55 <pgaxatte> and I need to use authentication
15:33:06 <pgaxatte> stumbled upon angiewang's spec
15:33:21 <portdirect> yeah - it's a nice bit of work
15:33:24 <pgaxatte> but it seems to need some work, and I was wondering what the status on it was
15:33:49 <portdirect> we should chase up with Angie, but since it was merged I don't think we have managed to make any progress
15:34:09 <pgaxatte> I could not find any related patch in progress
15:34:11 <portdirect> though the spec makes pretty clear the work that needs to be done, so I think anyone could pick it up
15:34:43 <itxaka> wow, that spec is pretty cool, the code is basically there :o
15:34:47 <pgaxatte> I would gladly try it, but I have very little experience with helm
15:35:12 <angiewang> hi, currently the implementation for private registry support with auth has not started yet.
15:35:55 <itxaka> sounds like an opportunity for some mentoring, no?
15:36:07 <evrardjp> itxaka: +1
15:36:09 <pgaxatte> that would be great
15:37:16 <evrardjp> portdirect: sorry to hijack the topic, but there is an onboarding session at the next summit for OSH, right?
15:37:23 <portdirect> evrardjp: there is
15:37:37 <pgaxatte> awesome
15:37:47 <evrardjp> sadly those are not recorded, but the eventual slides can be re-used/shared.
15:37:52 <evrardjp> pgaxatte: will you be there?
15:37:59 <pgaxatte> evrardjp: yes
15:38:05 <evrardjp> perfect.
15:38:16 <portdirect> angiewang: do you have time to work on this atm? it would be great if we could get a reference implementation to spread out across charts
15:38:44 <evrardjp> angiewang: and if you could bring pgaxatte along the way, that would be helpful to grow the community
15:39:17 <evrardjp> (no pressure though!)
15:40:11 <angiewang> do you mean pgaxatte will be implementing the changes in my spec and I will be helping with any issues?
15:41:20 <evrardjp> I just mean there are opportunities to work together
15:41:30 <evrardjp> for example, pgaxatte could learn by reviewing
15:41:45 <portdirect> together we can spread the load
15:41:48 <evrardjp> or the other way around: pgaxatte could learn by doing and you would mentor in the reviews
15:42:04 <pgaxatte> anyway, I'm ok with it
15:42:10 <evrardjp> or what portdirect said.. basically communication is key :)
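[Editor's note: a sketch of the standard Kubernetes mechanism that, as far as I can tell, angiewang's spec (discussed from 15:33:06) builds on: a docker-registry secret referenced through imagePullSecrets. Registry URL, credentials, and names are placeholders.]

```shell
# Store the registry credentials as an image-pull secret in the
# namespace the charts deploy into:
kubectl create secret docker-registry private-registry-key \
  --docker-server=https://registry.example.com \
  --docker-username=deployer \
  --docker-password=s3cret \
  --namespace=openstack

# Attach the secret to a ServiceAccount so every pod running under it
# pulls images with these credentials:
kubectl patch serviceaccount default -n openstack \
  -p '{"imagePullSecrets": [{"name": "private-registry-key"}]}'
```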
15:42:46 <pgaxatte> let it be noted that I still have a lot of things to understand on openstack-helm and openstack-helm-infra
15:43:05 <portdirect> pgaxatte: don't worry - we all do :)
15:43:30 <pgaxatte> especially on helm-toolkit
15:43:55 <portdirect> are you working on the tungsten-fabric charts atm, pgaxatte?
15:44:30 <angiewang> I have other tasks right now, so I am not available to implement this at the moment (in the next month). I can help pgaxatte in reviews though.
15:44:38 <pgaxatte> portdirect: no, we're looking at core components and mistral at the moment
15:45:26 <pgaxatte> angiewang: help with the reviews would be great too
15:45:42 <portdirect> ok - mistral falls well within that bucket of things that have regressed in tests since the last PTG - we should also work to restore that
15:46:31 <pgaxatte> portdirect: that's something we plan on working on too, we'll probably have the occasion to contribute back
15:46:58 <angiewang> pgaxatte: yeah, I can help in reviews.
15:47:04 <angiewang> I can also help with implementing a month later
15:47:19 <angiewang> if it's not rushed
15:47:40 <portdirect> angiewang: will you be in Denver at the end of the month?
15:47:40 <evrardjp> I think the introductions are done now
15:47:44 <pgaxatte> no problem, we can bypass that for now in our roadmap
15:48:11 <angiewang> portdirect, no
15:48:16 <portdirect> pgaxatte: as a short-term measure, you can add a .dockercfg with the appropriate creds to the kubelet's home dir
15:48:34 <portdirect> but let's see if we can get the ball rolling on implementing the correct approach
15:48:50 <portdirect> angiewang: that's a shame, would have been great to have you out there
15:49:16 <portdirect> ok to move on?
15:49:29 <pgaxatte> ok for me
15:49:29 <prabhusesha> Sorry to jump in.
15:49:43 <prabhusesha> I'm Prabhu, first time at this mtg.
15:49:52 <prabhusesha> Need some time at the end
15:50:01 <portdirect> will do, prabhusesha
15:50:08 <prabhusesha> I didn't add my stuff to the agenda
15:50:15 <portdirect> can you add that now? https://etherpad.openstack.org/p/openstack-helm-meeting-2019-04-09
15:50:20 <prabhusesha> thanks!
15:50:24 <portdirect> #topic reviews
15:50:47 <portdirect> so - we have quite a few outstanding reviews this week that could do with some love
15:51:11 <portdirect> https://review.openstack.org/#/c/647493/ Add internal tenant id in conf (irc: LiangFang)
15:52:05 <portdirect> ^ would be great to get some more feedback on this - would be really nice to have this feature
15:52:10 <portdirect> https://review.openstack.org/#/c/651140/ WIP: Use real rpc methods to find out if the service is alive
15:52:10 <portdirect> problem - maybe not enough read-only RPC calls?
15:52:10 <portdirect> problem - RPC service versioning changes per objective and sometimes per release (versions could be tracked with release-specific values overrides)
15:52:46 <portdirect> jsuchome raises a good point here, and one we were painfully aware of when implementing the probes initially
15:53:03 <jsuchome> hi
15:53:10 <portdirect> it would be nice to re-visit and see if there's a way we can implement this without causing tracebacks in logs
15:53:28 <jsuchome> originally I wanted to ask if it could be the right approach to use some real methods instead of fake ones
15:53:35 <jsuchome> but I was looking at it today
15:53:39 <portdirect> it would be lovely if oslo.messaging supported a `ping`, but there we are ;)
15:53:44 <itxaka> maybe a different approach? do we have to check for RPC aliveness? is there anywhere else we can look at for how to do this?
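[Editor's note: stepping back to the registry topic for a moment - a sketch of the short-term workaround portdirect mentions at 15:48:16, i.e. a legacy-format .dockercfg that the kubelet consults when pulling images. Path, registry, and credentials are placeholders; confirm the .dockercfg search path for your kubelet version.]

```shell
# Write a legacy-format .dockercfg into the kubelet's home directory on
# each node; the kubelet can then perform authenticated image pulls
# without any per-pod imagePullSecrets.
cat > /root/.dockercfg <<EOF
{
  "https://registry.example.com": {
    "auth": "$(echo -n 'deployer:s3cret' | base64)",
    "email": "deployer@example.com"
  }
}
EOF
```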
15:53:59 <jsuchome> and it seems to me that some components, like conductor and scheduler, do not support read-only rpc calls
15:54:14 <portdirect> this was the challenge we hit, jsuchome :(
15:54:35 <jsuchome> ah, so that explains the weird implementation
15:55:00 <portdirect> itxaka: in the absence of richer reporting from the agents, I'm at a loss as to how we can do this another way
15:55:07 <portdirect> but would love to have one
15:55:45 <jsuchome> I guess it's the wrong approach, trying to add such 'ping' calls to nova code...?
15:55:49 <evrardjp> get hypervisor status for compute?
15:56:07 <portdirect> evrardjp: but that does not cover the hard ones
15:56:22 <evrardjp> yeah. I suppose we should discuss this in self-healing
15:56:22 <portdirect> we need read-only rpc across the board
15:56:27 <jsuchome> compute actually is solvable, as proposed in that patch
15:56:30 <evrardjp> because they probably have ideas there
15:56:43 <portdirect> evrardjp: agreed
15:56:59 <evrardjp> looks like a good collaboration topic for the PTG
15:57:02 <jsuchome> would anyone else (apart from the helm project) benefit from new calls?
15:57:04 <portdirect> it would be awesome if we could move off this approach
15:57:05 <evrardjp> with the oslo team
15:57:24 <portdirect> jsuchome: if done properly - every deployment/management project could
15:57:52 <itxaka> anything that implements monitoring would take advantage of that
15:57:53 <evrardjp> it doesn't need to be a community goal, but something like that can be approached by a pop-up team
15:58:09 <jsuchome> hm, might be a good idea to bring it up in the nova meeting
15:58:34 <portdirect> there have been a few efforts in the past here, though normally they have stalled
15:58:46 <portdirect> getting a project like nova to lead the way would be great
15:58:59 * portdirect we ok to run over a few mins?
15:59:29 <evrardjp> portdirect: in this room? I think it's better to continue in #openstack-helm if necessary
15:59:40 <evrardjp> being a good citizen :)
15:59:41 <portdirect> prabhusesha: you ok if we discuss your topic in the main #openstack-helm channel? `Keystone failure with all-in-one setup`
16:00:01 <prabhusesha> ok
16:00:06 <jsuchome> is anyone here active in nova? if not, I can try to ask there myself (still the RPC topic)
16:00:18 <portdirect> jsuchome: that would rock
16:00:37 <portdirect> itxaka: did my comments in the etherpad help you with your questions re reviews?
16:00:58 <portdirect> if not, let's take it to the #openstack-helm channel
16:01:01 <itxaka> portdirect, yep, feel free to ignore those, I got enough to move them forward
16:01:14 <portdirect> awesome - thanks :)
16:01:18 <jsuchome> I mean, it would be better if it were someone better known to the nova community, but I'll do that if no one else fits such a description
16:01:25 <portdirect> sorry, we're out of time folks
16:01:35 <portdirect> #endmeeting openstack-helm