15:01:13 #startmeeting openstack-helm
15:01:14 Meeting started Tue Jul 23 15:01:13 2019 UTC and is due to finish in 60 minutes. The chair is portdirect. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:15 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:17 The meeting name has been set to 'openstack_helm'
15:01:21 lets give it a few mins
15:01:27 o/
15:01:28 o/
15:01:29 o/
15:01:29 I made it
15:01:30 o/
15:01:33 o/
15:01:33 o/
15:01:34 agenda is here: https://etherpad.openstack.org/p/openstack-helm-meeting-2019-07-23
15:01:42 o/
15:01:52 happy tuesday e'rybody
15:01:59 o/
15:02:33 hello
15:02:43 have a good day, everyone
15:03:08 o/
15:06:00 ok - lets go
15:06:09 #topic Apologies
15:06:37 lets go :)
15:06:41 I need to hold my hands up here, and apologise for the admin backlog that's built up
15:06:59 in particular im behind on the meeting time re-arrangement
15:07:08 and the release versioning spec
15:07:16 im sorry - things have been hectic
15:07:24 and im trying to get back on track
15:07:30 though it may take another week or two
15:07:32 portdirect: you're only human
15:09:18 #topic Gating for more than just the ocata release
15:09:46 so we have had checks in place for all openstack releases from ocata to stein for some time now
15:09:56 and also opensuse support for rocky
15:10:14 its probably time we considered them for promotion to gates
15:10:43 or at least a subset of them
15:10:45 thoughts?
15:11:43 i think that makes sense
15:11:49 ++, but do we need to cover all the releases from ocata -> stein? or just a subset of those releases
15:12:27 i'm checking through now and don't see any particular subset of jobs that seems to fail often enough to cause concern, at least from my perspective
15:13:18 srwilkers: the simple fact of enabling things in gates will make things fail, Murphy's law still applies
15:13:21 well, i take that back -- it seems the compute jobs fail more frequently than anything else, but that's no different from what i've noticed in the past with both single node and multinode jobs
15:13:29 evrardjp: obviously
15:13:48 lamt: im wondering if just a subset makes sense?
15:14:12 I would ofc vote (pun intended) to make SUSE voting on rocky :D
15:14:14 possibly gate on the default images, and the current 'supported' openstack releases?
15:14:15 I don't recall any major failure - most of the issues I remember are with supporting various distributions like fedora and centos - sometimes the package repo changes and things break
15:14:50 portdirect: or we can just slowly advance, and report on the progress in this meeting?
15:15:08 "this week we enabled x as voting, and things still seem to be fine"
15:15:14 what is slowly advancing?
15:15:19 im more worried about srwilkers' concern
15:15:23 "next week we're gonna add x"
15:15:34 in that if we have too many gates - then murphy will hit us hard
15:15:45 that's happening in every project
15:15:48 its pretty common atm to see one check fail each time
15:16:00 and the distribution of that is fairly even :(
15:16:03 is the test flaky?
15:16:12 i think mostly infra
15:16:31 so maybe we should start by improving reliability, by listing the causes of the rechecks?
15:16:33 it's not just the tests -- at times, it's the initial chart deployments that fail
15:16:53 running openstack in k8s, in an 8gb ram, 4vcpu vm with horrible i/o is fun
15:16:58 ^
15:17:00 srwilkers: on that note, reliability is for everyone, gates or not
15:17:12 evrardjp: I dont think i stated otherwise
15:17:16 the horrible i/o is the real killer
15:17:31 srwilkers: I know, I am just pointing out that it's important to be reliable at all times :D
15:17:48 evrardjp: as am i. however, we're limited by the infrastructure we're provided, as portdirect pointed out
15:17:54 unfortunately, that's outside of our control
15:18:06 how about 'walking back down'
15:18:17 ie what evrardjp suggested, but starting with stein
15:18:19 portdirect: what do you mean?
15:18:22 and then adding rocky
15:18:37 and so on progressively, until we feel we've reached the nadir
15:18:48 I think the concern is not about the way up or down, it's whether we can trust gate reliability
15:18:58 yup
15:19:21 (and the fact that we have ppl monitoring those jobs)
15:20:25 that's a valid point and i'm glad you brought it up
15:22:05 ok - sounds like we dont think we are ready to add new gates atm?
15:22:14 evrardjp / srwilkers ?
15:22:15 I know I won't check periodics for things outside suse/rocky. When I do a PR, I will check if things work, and try to fix things if they're not working, whether SUSE or not. But I can't promise more.
15:22:37 I prefer under-promising and over-performing to the other way around :p
15:22:53 if a change is made, it should ideally be vetted against all jobs we've decided to promote to gates or periodics
15:23:15 quite frankly, i find it frustrating that there's not much concern outside of myself for making sure the periodic jobs run successfully
15:23:15 could you clarify srwilkers?
15:23:41 srwilkers: I understand, and feel your pain.
15:23:59 I am sorry there aren't more eyes on this.
15:24:12 you know you can ping ppl if necessary too, right? :p
15:24:53 portdirect: I leave the decision to the ppl monitoring said jobs, and to your authority
15:26:45 ok - lets revisit this once we have some stats on where we see failures currently
15:26:55 and until then leave the voting jobs as they are.
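[Editor's note: for readers unfamiliar with the check/gate distinction above - in Zuul, a job is "promoted" by listing it in the gate pipeline (and dropping any `voting: false` flag it carried in check), so that a failure blocks the merge. Below is a minimal sketch of what such a promotion looks like in a project's .zuul.yaml; the job name is illustrative, not one of the project's actual jobs.]

```yaml
# Hypothetical .zuul.yaml excerpt; the job name is illustrative.
- project:
    check:
      jobs:
        # A non-voting check would be written as a variant with
        # `voting: false`; removing that flag makes the check voting.
        - openstack-helm-multinode-rocky
    gate:
      jobs:
        # Listing the job under gate as well makes it gating: a failure
        # now blocks the change from merging, which is why the reliability
        # concerns raised above matter before promotion.
        - openstack-helm-multinode-rocky
```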
15:27:16 #topic Backup/restore Enhancements for MariaDB and Postgresql Databases
15:27:29 portdirect: sounds good
15:27:58 cliff is this you?
15:28:07 portdirect: yes
15:28:10 Hello everyone, good morning!
15:28:23 I wanted to make everyone aware of the pending backup/restore enhancements for MariaDB and Postgresql.
15:28:32 The patchsets that introduce these enhancements: https://review.opendev.org/#/c/655231/ and https://review.opendev.org/#/c/661853/
15:29:08 The Postgresql and MariaDB releases already have automatic backup and restore capability from a local PVC or host path volume.
15:29:24 The two patchsets above enhance this so that the databases can be backed up to a remote platform via the Swift API;
15:29:25 in our case we have chosen Ceph RGW, which will be running in a different cluster. But to test it in the OSH gate environment, we
15:29:25 use the ceph rgw in the same cluster. The keystone dependency is required to create a keystone user to use the Swift API.
15:30:20 So I was wondering if anyone has seen these patchsets, and if so, are there any concerns?
15:30:45 taking a peek real quick
15:32:18 i'll need to look properly at: https://review.opendev.org/#/c/655231/56/tools/deployment/backup-restore/backup-restore-tests.sh
15:32:23 https://review.opendev.org/#/c/661853/62/helm-toolkit/templates/manifests/_job-ks-user.yaml.tpl seems like a sane change, but maybe should have been a separate change, along with defaults for these in the jobs that consume that template
15:33:12 srwilkers: I can split out the job definition if that makes more sense
15:33:26 just wanted to show that it's tested
15:33:41 cliffparsons: that would be great - it would probably also make sense to add this to all the jobs?
15:34:17 portdirect: i think it does -- at the moment, we've got some jobs that support these overrides, and others that dont
15:34:27 portdirect: assuming you're referring to enabling the backups and testing even on the voting gates?
15:34:28 would be nice to make that standard
15:34:51 cliffparsons: i was referring to the params added here: https://review.opendev.org/#/c/661853/62/helm-toolkit/templates/manifests/_job-ks-user.yaml.tpl
15:35:05 would be a great addition to all jobs
15:35:12 oh
15:35:19 also, wondering if it makes sense to just add testing this functionality to the openstack-support job we already have
15:35:27 also i wonder if we should make the default to retry 'indefinitely'
15:35:34 instead of creating an entirely new job for testing backup functionality
15:35:47 i think that makes sense
15:35:51 portdirect: that default would make me happy honestly
15:36:02 srwilkers: originally started doing that but was a little unsure if we wanted to do that
15:36:05 ++
15:36:17 portdirect: thanks for the suggestion, I'll work on that
15:36:24 for a start, it would make debugging chart job failures in our zuul jobs easier
15:36:57 srwilkers: agree
15:37:23 Any other concerns or comments?
15:37:43 none atm - thanks cliffparsons for taking this on
15:37:46 cliffparsons: nope
15:37:54 portdirect: Thanks for the floor
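[Editor's note: the "retry indefinitely" default discussed above maps onto standard Kubernetes Job fields, which is what the helm-toolkit template ultimately renders. A minimal sketch follows - `backoffLimit` and `activeDeadlineSeconds` are real batch/v1 Job spec fields, but the job name, image, and command are illustrative, and the exact chart parameters added by the patchset should be checked against the review itself.]

```yaml
# Hypothetical rendered Job; the spec-level retry knobs are the point.
apiVersion: batch/v1
kind: Job
metadata:
  name: keystone-ks-user       # illustrative name
spec:
  backoffLimit: 6              # pod retries before the Job is marked failed;
                               # a very large value approximates "retry indefinitely"
  activeDeadlineSeconds: 600   # optional hard wall-clock cap on the whole Job
  template:
    spec:
      restartPolicy: OnFailure # restart the failed pod rather than fail fast
      containers:
        - name: ks-user
          image: docker.io/openstackhelm/heat:stein  # illustrative
          command: ["/tmp/ks-user.sh"]               # illustrative
```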
15:38:10 #topic OVS-DPDK CI needs KVM support in nodepool VM and DPDK support of the virtual NIC in VM
15:38:18 this is mine
15:38:19 cheng1: floor is yours
15:38:35 I am working on the ovs-dpdk CI job
15:39:03 As far as I can see, we have two kinds of VMs in openstack CI
15:39:17 allocated from nodepool
15:39:31 #1. Has two interfaces, one with a public ip and the other with a private ip. They don't have a pci address. KVM is not supported in the VM
15:39:47 #2. Has only one interface, which has a pci address. KVM is supported in the VM
15:40:14 ah
15:40:21 For the ovs-dpdk test, we need kvm support and the NIC should have a pci address
15:41:14 Not sure if anyone here is familiar with the openstack CI VMs
15:41:36 any way to customize the VM?
15:41:51 ok so we need to run this in #2?
15:42:08 or does neither profile provide what we need?
15:42:52 #2 is good except it has only 1 NIC
15:43:06 which has a public ip
15:43:07 ok - lets see what we can do
15:43:25 thanks portdirect
15:43:28 following this meeting do you have time to join #openstack-infra, so we can ask them there?
15:44:10 let me check the meeting time first. It's already 11:44 PM
15:44:20 ok - that makes sense!
15:44:40 I'll ping them while we discuss the next topic and see if we can circle back here
15:44:46 ok to move on for now?
15:44:57 sure, portdirect thanks
15:44:59 #topic How to make metadata update/create/delete available on Horizon?
15:45:41 y
15:46:03 Can anyone help with this question?
15:46:36 not off the top of my head
15:46:51 * zhipeng[m] sent a long message: < https://matrix.org/_matrix/media/v1/download/matrix.org/JWUOrYLfNTMegzKgndZRZXzC >
15:46:55 can you provide the config that you'd be looking to apply at the pod end
15:47:10 and we can then work out how to get that into inputs for the chart
15:47:17 or make any required changes
15:48:04 ok
15:48:11 I can help with that if you can provide the configuration you have in mind
15:49:09 overrides for horizon?
15:49:47 zhipeng[m]: do you have the same issue with interfaces other than horizon?
15:50:01 like the openstack cmdline
15:50:22 not sure about it
15:51:39 zhipeng[m]: i think the ask here from portdirect and lamt is to provide them the required configuration for enabling those operations on metadata through horizon, so that they can work to get that functionality into the horizon chart if it's missing
15:51:51 as i'm not entirely sure what configuration is required to make that happen either
15:52:00 lamt: which config do you want to check?
15:52:03 though here we need to bottom out cheng1's question as well
15:52:16 otherwise you can't be sure whether the problem is in horizon or in nova itself
15:52:21 if its not just horizon, then it's a policy.json issue somewhere else
15:53:02 and right now - our policy.jsons in the charts are default ocata-era ones
15:53:03 ++, want to make sure the problem is just with horizon and not with nova
15:53:08 so that may well be it for starlingx
15:53:37 OK, I will check further, thanks!
15:53:41 does starlingX use ocata as well?
15:53:51 no
15:54:01 stein I think
15:54:07 we've switched to stein now
15:54:10 zhipeng[m]: if you have questions, feel free to ping folks in the #openstack-helm channel
15:54:11 ah
15:54:41 thanks lamt! will ping you later :)
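[Editor's note: the policy angle above is testable without touching horizon - if nova's policy is what blocks the metadata operations, overriding it in the nova chart should change the behaviour for the CLI as well. A minimal sketch, assuming the chart exposes policy under `conf.policy` as the OSH charts of this era generally did; the rule names are standard nova policy targets, but both the values path and the stein-era defaults should be verified before use.]

```yaml
# Hypothetical nova values override; verify the conf.policy path against
# the chart's values.yaml and the rules against the stein policy defaults.
conf:
  policy:
    "os_compute_api:server-metadata:create": "rule:admin_or_owner"
    "os_compute_api:server-metadata:update": "rule:admin_or_owner"
    "os_compute_api:server-metadata:delete": "rule:admin_or_owner"
```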
15:55:04 #topic In cinder pod, the default type is missing. Is it a helm chart issue?
15:55:31 zhipeng[m] think this is you?
15:56:52 ok - lets discuss this further in the channel
15:57:08 #topic DPDK checks revisited
15:58:10 y
15:58:15 ok - I reached out to -infra and both fungi and clarkb were very helpful
15:58:30 unfortunately we cannot select which provider we use in nodepool
15:58:35 helpful as we can be anyway
15:58:48 and if we could, there is no guarantee that the profile of the vms would be the same
15:58:59 https://docs.openstack.org/infra/manual/testing.html
15:59:03 ^ a great ref here
15:59:41 If that's the case, we can test only very limited cases of the ovs-dpdk feature
15:59:55 cheng1: that would be a great start
15:59:55 the short answer is that the environment we provide is targeted at testing software in isolation; it's not really geared toward testing features of the environment itself
16:00:48 so jobs should avoid depending on test nodes having a specific number of network interfaces, hypervisor, processor flags, et cetera
16:00:55 fungi: thats fair - and we love what we can get out of -infra atm :D
16:03:04 ok - we're out of time
16:03:14 thanks for coming peeps
16:03:17 see you in #openstack-helm
16:03:21 #endmeeting