15:01:13 <portdirect> #startmeeting openstack-helm
15:01:14 <openstack> Meeting started Tue Jul 23 15:01:13 2019 UTC and is due to finish in 60 minutes.  The chair is portdirect. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:15 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:17 <openstack> The meeting name has been set to 'openstack_helm'
15:01:21 <portdirect> lets give it a few mins
15:01:27 <lamt> o/
15:01:28 <dwalt> o/
15:01:29 <gagehugo> o/
15:01:29 <dwalt> I made it
15:01:30 <howell> o/
15:01:33 <roman_g> o/
15:01:33 <cliffparsons> o/
15:01:34 <portdirect> agenda is here: https://etherpad.openstack.org/p/openstack-helm-meeting-2019-07-23
15:01:42 <srwilkers> o/
15:01:52 <srwilkers> happy tuesday e'rybody
15:01:59 <cheng1> o/
15:02:33 <zhipeng[m]> hello
15:02:43 <zhipeng[m]> have a good day, everyone
15:03:08 <rihabb2> o/
15:06:00 <portdirect> ok - lets go
15:06:09 <portdirect> #topic Apologies
15:06:37 <zhipeng[m]> lets go:)
15:06:41 <portdirect> I need to hold my hands up here, and apologise for the admin backlog that's built up
15:06:59 <portdirect> in particular im behind on the meeting time rearrangement
15:07:08 <portdirect> and the release versioning spec
15:07:16 <portdirect> im sorry - things have been hectic
15:07:24 <portdirect> and im trying to get back on track
15:07:30 <portdirect> though may take another week or two
15:07:32 <srwilkers> portdirect: you're only human
15:09:18 <portdirect> #topic Gating for more than just the ocata release.
15:09:46 <portdirect> so we have had checks in place for all openstack releases from ocata to stein for some time now
15:09:56 <portdirect> and also opensuse support for rocky
15:10:14 <portdirect> its probably time we considered them for promotion to gates
15:10:43 <portdirect> or at least a subset of them
15:10:45 <portdirect> thoughts?
15:11:43 <srwilkers> i think that makes sense
15:11:49 <lamt> ++, but do we need to cover all the releases from ocata -> stein? or just a subset of those releases
15:12:27 <srwilkers> i'm checking through now and don't see any particular subset of jobs that seems to fail often enough to cause concern, at least from my perspective
15:13:18 <evrardjp> srwilkers: the simple fact of enabling things in gates will make things fail, Murphy's law still applies
15:13:21 <srwilkers> well, i take that back -- it seems the compute jobs fail more frequently than anything else, but that's not any different than what i've noticed in the past both with single node and multinode jobs
15:13:29 <srwilkers> evrardjp: obviously
15:13:48 <portdirect> lamt: im wondering if just a subset makes sense?
15:14:12 <evrardjp> I would ofc vote (pun intended) to make SUSE voting on rocky :D
15:14:14 <portdirect> possibly gate on the default images, and the current 'supported' openstack releases?
15:14:15 <lamt> I don't recall any major failure - most of the issues I remember are with supporting various distributions like fedora and centos - sometimes the package repos change and things break
15:14:50 <evrardjp> portdirect: or we can just slowly advance, and report on the progress in this meeting?
15:15:08 <evrardjp> "this week we enabled x as voting, and things seems to be still fine"
15:15:14 <jsuchome> what is slowly advancing?
15:15:19 <portdirect> im more worried about srwilkers concern
15:15:23 <evrardjp> "next week we're gonna add x"
15:15:34 <portdirect> in that if we have too many gates - then murphy will hit us hard
15:15:45 <evrardjp> that's happening in every project
15:15:48 <portdirect> its pretty common atm to see one check fail each time
15:16:00 <portdirect> and the distribution of that is fairly even :(
15:16:03 <evrardjp> is the test flaky ?
15:16:12 <portdirect> i think mostly infra
15:16:31 <evrardjp> so maybe we should start on improving reliability, by listing the causes of the rechecks?
15:16:33 <srwilkers> it's not just the tests -- at times, its the initial chart deployments that fail
15:16:53 <portdirect> running openstack in k8s, in an 8gb ram, 4vcpu vm with horrible i/o is fun
15:16:58 <srwilkers> ^
15:17:00 <evrardjp> srwilkers: on that note, reliability is for everyone, gates or not
15:17:12 <srwilkers> evrardjp: I dont think i stated otherwise
15:17:16 <portdirect> the horrible i/o is the real killer
15:17:31 <evrardjp> srwilkers: I know, I am just pointing out the fact it's important to be reliable at all times :D
15:17:48 <srwilkers> evrardjp: as am i.  however, we're limited by the infrastructure we're provided, as portdirect pointed out
15:17:54 <srwilkers> unfortunately, thats outside of our control
15:18:06 <portdirect> how about 'walking back down'
15:18:17 <portdirect> ie what evrardjp suggested, but starting with stein
15:18:19 <evrardjp> portdirect: what do you mean?
15:18:22 <portdirect> and then adding rocky
15:18:37 <portdirect> and so on progressively, until we feel we've reached the nadir
15:18:48 <evrardjp> I think the concern is not about the way up or down, it's the fact we can trust gate reliability
15:18:58 <portdirect> yup
15:19:21 <evrardjp> (and the fact that we have ppl monitoring those jobs)
15:20:25 <srwilkers> that's a valid point and i'm glad you brought it up
15:22:05 <portdirect> ok - sounds like we dont think we are ready to add new gates atm?
15:22:14 <portdirect> evrardjp / srwilkers ?
15:22:15 <evrardjp> I know I won't check periodics for things outside suse/rocky. When I do a PR, I will check if things work, and try to fix things if they're not working, whether SUSE or not. But I can't promise more.
15:22:37 <evrardjp> I prefer to under-promise and over-perform rather than the other way around. :p
15:22:53 <srwilkers> if a change is made, it should ideally be vetted against all jobs we've decided to promote to gates or periodics
15:23:15 <srwilkers> quite frankly, i find it frustrating that there's not much concern outside of myself for making sure the periodic jobs run successfully
15:23:15 <evrardjp> could you clarify srwilkers?
15:23:41 <evrardjp> srwilkers: I understand, and feel your pain.
15:23:59 <evrardjp> I am sorry there aren't more eyes on this.
15:24:12 <evrardjp> you know you can ping ppl if necessary too, right? :p
15:24:53 <evrardjp> portdirect: I let the decision to ppl monitoring said jobs, and to your authority
15:25:01 <evrardjp> I leave the decision to*
15:25:11 <evrardjp> or whatever it is in english
15:26:45 <portdirect> ok - lets revisit this once we have some stats on where we see failures currently
15:26:55 <portdirect> and until then leave the voting jobs as they are.
15:27:16 <portdirect> #topic Backup/restore Enhancements for MariaDB and Postgresql Databases
15:27:29 <evrardjp> portdirect: sounds good
15:27:58 <portdirect> cliff is this you?
15:28:07 <cliffparsons> portdirect: yes
15:28:10 <cliffparsons> Hello everyone, good morning!
15:28:23 <cliffparsons> I wanted to make everyone aware of the pending backup/restore enhancements for MariaDB and Postgresql.
15:28:32 <cliffparsons> The patchsets that introduce these enhancements: https://review.opendev.org/#/c/655231/ and https://review.opendev.org/#/c/661853/
15:29:08 <cliffparsons> The Postgresql and MariaDB releases already have automatic backup and restore capability from a local PVC or host path volume.
15:29:24 <cliffparsons> The two patchsets above are enhancing it so that the databases can be backed up to a remote platform via Swift API;
15:29:25 <cliffparsons> in our case we have chosen Ceph RGW, which will be running in a different cluster.  But to test it in the OSH gate environment, we
15:29:25 <cliffparsons> use the ceph rgw in the same cluster. The keystone dependency is required to create a keystone user to use the Swift API.
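[editor's note] For readers of the log, a minimal sketch of what the remote backup path described above could look like from the client side, assuming python-swiftclient and keystoneauth1; the endpoint, credentials, container, and object names are placeholders for illustration, not the values the charts actually use:

```python
# Hypothetical sketch: push a database backup archive to a Swift-compatible
# endpoint (e.g. Ceph RGW) using a Keystone user. All names are placeholders.
from keystoneauth1 import session
from keystoneauth1.identity import v3
from swiftclient import client as swift_client

auth = v3.Password(
    auth_url="http://keystone.openstack.svc.cluster.local/v3",  # placeholder
    username="mariadb-backup",   # e.g. a user created by the ks-user job
    password="changeme",
    project_name="service",
    user_domain_name="default",
    project_domain_name="default",
)
conn = swift_client.Connection(session=session.Session(auth=auth))

# Ensure the target container exists, then upload the backup archive.
conn.put_container("mariadb-backups")
with open("/tmp/mariadb.2019-07-23.tar.gz", "rb") as archive:
    conn.put_object("mariadb-backups", "mariadb.2019-07-23.tar.gz",
                    contents=archive)
```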
15:30:20 <cliffparsons> So I was wondering if anyone has seen these patchsets, and if so, are there any concerns?
15:30:45 <srwilkers> taking a peek real quick
15:32:18 <portdirect> i'll need to look properly at: https://review.opendev.org/#/c/655231/56/tools/deployment/backup-restore/backup-restore-tests.sh
15:32:23 <srwilkers> https://review.opendev.org/#/c/661853/62/helm-toolkit/templates/manifests/_job-ks-user.yaml.tpl seems like a sane change, but maybe it should have been a separate change, along with adding defaults for these to the jobs that consume that template
15:33:12 <cliffparsons> srwilkers: I can split out the job definition if that makes more sense
15:33:26 <cliffparsons> just wanted to show that it's tested
15:33:41 <portdirect> cliffparsons: that would be great - it would probably also make sense to add this to all the jobs?
15:34:17 <srwilkers> portdirect: i think it does -- at the moment, we've got some jobs that support these overrides, and others that dont
15:34:27 <cliffparsons> portdirect: assuming you're referring to enabling the backups and testing even on the voting gates?
15:34:28 <srwilkers> would be nice to make that standard
15:34:51 <portdirect> cliffparsons: i was referring to the params added here: https://review.opendev.org/#/c/661853/62/helm-toolkit/templates/manifests/_job-ks-user.yaml.tpl
15:35:05 <portdirect> would be a great addition to all jobs
15:35:12 <cliffparsons> oh
15:35:19 <srwilkers> also, wondering if it makes sense to just add testing this functionality to the openstack-support job we already have
15:35:27 <portdirect> also i wonder if we should make the default to retry 'indefinitely'
15:35:34 <srwilkers> instead of creating an entirely new job for testing backup functionality
15:35:47 <portdirect> i think that makes sense
15:35:51 <srwilkers> portdirect: that default would make me happy honestly
15:36:02 <cliffparsons> srwilkers: originally started doing that but was a little unsure if we wanted to do that
15:36:05 <lamt> ++
15:36:17 <cliffparsons> portdirect: thanks for the suggestion, I'll work on that
15:36:24 <srwilkers> for a start, it would make debugging chart job failures in our zuul jobs easier
15:36:57 <cliffparsons> srwilkers: agree
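[editor's note] The retry behaviour discussed just above corresponds to the backoffLimit and activeDeadlineSeconds fields of a Kubernetes Job. A minimal sketch with the Python Kubernetes client follows, using placeholder names and a deliberately large backoffLimit to approximate "retry indefinitely"; this illustrates the mechanism only and is not the helm-toolkit template itself:

```python
# Illustration of the Job-level retry knobs; not the chart's actual template.
from kubernetes import client

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="keystone-user-setup"),  # placeholder
    spec=client.V1JobSpec(
        backoff_limit=1000,            # effectively "keep retrying"
        active_deadline_seconds=None,  # no overall wall-clock cut-off
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="OnFailure",
                containers=[
                    client.V1Container(name="ks-user",
                                       image="placeholder:latest"),
                ],
            ),
        ),
    ),
)
```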
15:37:23 <cliffparsons> Any other concerns or comments?
15:37:43 <portdirect> none atm - thanks cliffparsons for taking this on
15:37:46 <srwilkers> cliffparsons: nope
15:37:54 <cliffparsons> portdirect: Thanks for the floor
15:38:10 <portdirect> #topic OVS-DPDK CI needs KVM support in nodepool VM and DPDK support of the virtual NIC in VM
15:38:18 <cheng1> this is mine
15:38:19 <portdirect> cheng1: floor is yours
15:38:35 <cheng1> I am working on ovs-dpdk CI job
15:39:03 <cheng1> As far as I can see, we have two kinds of VMs in openstack CI
15:39:17 <cheng1> allocated from nodepool
15:39:31 <cheng1> #1. Has two interfaces, one with a public ip and the other with a private ip. They don't have pci addresses. KVM is not supported in the VM
15:39:47 <cheng1> #2. With only one interface, which has a pci address. KVM is supported in the VM
15:40:14 <srwilkers> ah
15:40:21 <cheng1> For the ovs-dpdk test, we need kvm support and the NIC should have a pci address
15:41:14 <cheng1> Not sure if anyone here is familiar with openstack CI VMs
15:41:36 <cheng1> any way to customize the VM
15:41:39 <cheng1> ?
15:41:51 <portdirect> ok so we need to run this in #2?
15:42:08 <portdirect> or do neither of the profiles provide what we need?
15:42:52 <cheng1> #2 is good except it has only 1 NIC
15:43:06 <cheng1> which has a public ip
15:43:07 <portdirect> ok - lets see what we can do
15:43:25 <cheng1> thanks portdirect
15:43:28 <portdirect> following this meeting do you have time to join #openstack-infra, and we can ask them there?
15:44:10 <cheng1> let me check the meeting time first. It's already 11:44 PM
15:44:20 <portdirect> ok - that makes sense!
15:44:40 <portdirect> I'll ping them while we discuss the next topic and see if we can circle back here
15:44:46 <portdirect> ok to move on for now?
15:44:57 <cheng1> sure, portdirect thanks
15:44:59 <portdirect> #topic How to make metadata update/create/delete available on Horizon?
15:45:41 <zhipeng[m]> y
15:46:03 <zhipeng[m]> Anyone can help on this question?
15:46:36 <portdirect> not off the top of my head
15:46:51 * zhipeng[m] sent a long message:  < https://matrix.org/_matrix/media/v1/download/matrix.org/JWUOrYLfNTMegzKgndZRZXzC >
15:46:55 <portdirect> can you provide the config that you'd be looking to apply at the pod end
15:47:10 <portdirect> and we can then work out how to get that into inputs for the chart
15:47:17 <portdirect> or make any required changes
15:48:04 <zhipeng[m]> ok
15:48:11 <lamt> I can help with that if you can provide the configuration you have in mind
15:49:09 <zhipeng[m]> overrides for horizon?
15:49:47 <cheng1> Do you have the same issue with other interfaces besides horizon? zhipeng[m]
15:50:01 <cheng1> like openstack cmdline
15:50:22 <zhipeng[m]> not sure about it
15:51:39 <srwilkers> zhipeng[m]: i think the ask here from portdirect and lamt is to provide them the required configuration for enabling those operations on metadata through horizon, so that they can work to get that functionality into the horizon chart if it's missing
15:51:51 <srwilkers> as i'm not entirely sure what configuration is required to make that happen either
15:52:00 <zhipeng[m]> @lamt,  which config you want to check?
15:52:03 <portdirect> though here we need to bottom out cheng1's question as well
15:52:16 <cheng1> Then you can't be sure whether the problem is in horizon or nova itself
15:52:21 <portdirect> if its not just horizon, then it's a policy.json issue somewhere else
15:53:02 <portdirect> and right now - our policy.jsons in the charts are default ocata era ones
15:53:03 <lamt> ++, want to make sure the problem is just with horizon and not with nova
15:53:08 <portdirect> so that may well be it for starlingx
15:53:37 <zhipeng[m]> OK, I will check further, thanks!
15:53:41 <lamt> does starlingX use ocata as well?
15:53:51 <zhipeng[m]> no
15:54:01 <cheng1> stein I think
15:54:07 <zhipeng[m]> we switched to stein now
15:54:10 <lamt> zhipeng[m]: if you have questions, feel free to ping folks in the #openstack-helm channel
15:54:11 <lamt> ah
15:54:41 <zhipeng[m]> thanks lamt!   will ping you later:)
15:55:04 <portdirect> #topic In cinder pod, the default type is missing. Is it a helm chart issue?
15:55:31 <portdirect> zhipeng[m] think this is you?
15:56:52 <portdirect> ok - lets discuss this further in the channel
15:57:08 <portdirect> #topic DPDK checks revisited
15:58:10 <zhipeng[m]> y
15:58:15 <portdirect> ok - I reached out to -infra and both fungi and clarkb were very helpful
15:58:30 <portdirect> unfortunately we cannot select which provider we use in nodepool
15:58:35 <fungi> helpful as we can be anyway
15:58:48 <portdirect> and if we could, there is no guarantee that the profile of the vms would be the same
15:58:59 <portdirect> https://docs.openstack.org/infra/manual/testing.html
15:59:03 <portdirect> ^ a great ref here
15:59:41 <cheng1> If that's the case, we can only test very limited cases for the ovs-dpdk feature
15:59:55 <portdirect> cheng1: that would be a great start
15:59:55 <fungi> the short answer is that the environment we provide is targeted at testing software in isolation, it's not really geared toward testing features of the environment itself
16:00:48 <fungi> so jobs should avoid depending on test nodes having a specific number of network interfaces, hypervisor, processor flags, et cetera
16:00:55 <portdirect> fungi: thats fair - and we love what we can get out of -infra atm :D
16:03:04 <portdirect> ok - we're out of time
16:03:14 <portdirect> thanks for coming peeps
16:03:17 <portdirect> see you in #openstack-helm
16:03:21 <portdirect> #endmeeting