15:00:16 #startmeeting kolla
15:00:20 Meeting started Wed Jan 27 15:00:16 2021 UTC and is due to finish in 60 minutes. The chair is mgoddard. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:21 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:23 The meeting name has been set to 'kolla'
15:00:26 #topic rollcall
15:00:28 \o
15:00:34 \o
15:00:46 o/
15:00:55 [o]
15:03:09 /o\
15:03:48 o/
15:04:49 #topic agenda
15:04:57 * Roll-call
15:04:59 * Announcements
15:05:01 * Review action items from the last meeting
15:05:03 * CI status
15:05:04 * Cleaning up disabled components and services (e.g. monasca/storm) https://review.opendev.org/c/openstack/kolla-ansible/+/769900
15:05:07 * Allowed to fail images - next steps https://review.opendev.org/c/openstack/kolla/+/765807
15:05:09 * Wallaby release planning
15:05:11 #topic announcements
15:05:18 I have none. Anyone else?
15:06:01 #topic Review action items from the last meeting
15:06:05 There were none
15:06:10 #topic CI status
15:06:53 How are we looking?
15:07:05 We've had some issues with train, let's start there
15:07:08 good
15:07:34 everything works
15:08:27 great. Thanks to those involved in rescuing it
15:08:35 Stein has some issues
15:08:38 https://review.opendev.org/c/openstack/kolla-ansible/+/772501
15:08:44 https://review.opendev.org/c/openstack/kolla/+/772490
15:08:51 Those two should address them
15:09:54 optional: propose CI Slim Down in K-A (T-W)
15:09:57 remove l-c jobs in K-A U, V & W (as agreed; already dropped in older)
15:10:11 let's either do those or remove them from the board
15:10:16 do these
15:11:10 I guess I'm undecided on the slim down
15:12:14 has anyone actually run 'check experimental' in kolla since we removed the jobs?
15:12:26 I
15:12:27 once
15:12:40 I'd prefer we make CI work better
15:12:46 but we can't easily
15:12:58 work better how?
15:14:56 I honestly don't think anybody will use check experimental
15:15:20 So, if we could have a list of problematic jobs and vote on what to do with them - that would probably be better
15:15:36 most people don't know about it, and those who do are likely to forget
15:15:57 it seems like a reasonable trade-off for kolla, but not so sure about k-a
15:16:24 maybe we should have only one of each scenario
15:16:34 and put the other in experimental
15:16:35 mgoddard: as in dockerhub not failing
15:16:43 and no disk_full
15:16:46 and no timeout
15:16:56 and no multiple retries
15:17:11 we got ourselves a ton of jobs
15:17:14 I liked adding them
15:17:21 but I don't like waiting for them nowadays :D
15:17:33 * yoctozepto practicing self-blame
15:17:53 well, I prefer to wait for them, instead of writing check experimental and waiting for them to fail :)
15:18:08 +1
15:18:31 we do have various things that make CI unreliable, but it does seem workable at the moment
15:19:21 I would rather keep a list of jobs that need to be looked into; if there's no interest, we can decide to remove them or just disable them.
15:19:48 or combine them
15:19:52 +1
15:20:05 let's not overwork ourselves for the moment
15:20:12 remove the point on slim down
15:20:20 ok
15:20:22 and we will reconsider our options when it gets worse
15:20:31 kolla has it already
15:20:42 it sounds like a reasonable tradeoff
15:20:50 I know the ceph jobs are failing often, but I plan to move to cephadm, and then investigate failures if they happen.
15:21:26 yeah, cephadm should help a bit; ceph-ansible is quite quirky
15:21:27 are there legit ceph failures we should look into?
15:21:48 or perhaps we should just spin up ceph manually for our simple scenario
15:21:55 it's a no-brainer xD
15:22:10 mgoddard: I don't think so
15:22:20 it just feels worse than before
15:22:21 ok
15:22:26 let's move on, tired of CI :)
15:22:27 perhaps just due to multinode being worse
15:22:29 indeed!
15:22:38 #topic Cleaning up disabled components and services (e.g. monasca/storm) https://review.opendev.org/c/openstack/kolla-ansible/+/769900
15:22:50 * yoctozepto to deprecate everything
15:23:09 This came up while reviewing one of dougsz's patches
15:23:14 https://review.opendev.org/c/openstack/kolla-ansible/+/769900
15:23:17 monasca goes out?
15:23:20 no
15:23:33 but it's going on a diet :)
15:23:49 some things will be removed
15:24:09 and we'll add a flag to allow disabling the alerting pipeline
15:24:12 https://review.opendev.org/c/openstack/kolla-ansible/+/769902/
15:24:17 so...
15:24:34 there are two cases here
15:24:52 1. a container is no longer required and should be cleaned up
15:25:16 2. a container is disabled based on some feature flag, and should be cleaned up
15:25:18 ah, this general thing
15:25:38 1. should be done on upgrade, I think
15:25:46 we should address both at once
15:26:22 2. is less clear. I don't really want a bunch of code that cleans up on every deploy
15:26:23 perhaps treat being disabled as removal?
15:26:30 it will be messy and slow
15:26:33 and have a stub for those removed completely due to a new release
15:26:50 hmm
15:27:06 there is also a 3. which I forget
15:27:21 3. a service (enable_X) is disabled and should be cleaned up
15:27:53 3 is tricky because the site.yml playbook doesn't execute plays for disabled services
15:28:09 we need undeploy.yml in each role and a separate command to clean up?
15:28:19 that is my preference
15:28:42 that sounds simple and doable in a short time
15:28:54 right
15:29:07 however, I think dougsz didn't want to overload his patch
15:29:16 so we came to a compromise
15:29:38 he will implement a specific command to clean up the monasca alerting pipeline, which can be adapted into a general cleanup command
15:29:53 e.g. kolla-ansible monasca-disable-alerting
15:30:01 does that seem reasonable?
15:30:15 what about kolla-ansible destroy --tags service ?
15:30:25 (dougsz is busy right now so can't join in)
15:30:55 wuchunyang: I think destroy is subtly different
15:31:16 destroy will clean up everything, regardless of enable flags, etc.
15:31:30 cleanup should only destroy things that have been disabled
15:32:22 and it's still questionable whether it should delete database entries and so on
15:32:33 is this making sense?
15:32:42 hmm, hadn't considered DB
15:32:51 that would only apply if the whole service is disabled
15:33:02 it will probably grow some flags
15:33:21 it's tricky and sounds like a good topic for the next ptg
15:33:25 --clean-db --clean-keystone --clean-mq
15:33:36 as the need grows :D
15:33:57 yeah, but the initial approach with Monasca sounds fine, we need to start somewhere
15:33:58 --yes-i-really-really-want-it-clean
15:33:59 sounds good
15:34:01 but a simple deletion of containers and volumes would be a good start
15:34:08 yes
15:34:21 just need to make sure we adapt the reno if we change the command name :)
15:34:26 (over time)
15:34:41 I think it would be a new command
15:34:54 +1
15:35:37 any other thoughts or shall we progress?
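For context on the approach discussed above, here is a minimal sketch, using the Docker SDK for Python, of a cleanup pass that removes only the containers and volumes of services that have been disabled, in contrast to destroy, which removes everything regardless of enable flags. The service-to-resource mapping and all names are hypothetical illustrations, not the actual kolla-ansible implementation.

```python
# Illustrative sketch only - not the kolla-ansible implementation.
# Assumes the Docker SDK for Python (pip install docker).
import docker

# Hypothetical mapping of disabled services to the resources they own.
DISABLED_SERVICES = {
    "monasca_thresh": {
        "containers": ["monasca_thresh"],
        "volumes": ["monasca_thresh"],
    },
}


def cleanup_disabled(client: docker.DockerClient) -> None:
    """Remove containers and volumes, but only for disabled services."""
    for service, resources in DISABLED_SERVICES.items():
        for name in resources["containers"]:
            try:
                client.containers.get(name).remove(force=True)
                print(f"removed container {name} (service: {service})")
            except docker.errors.NotFound:
                pass  # already gone - nothing to clean up
        for name in resources["volumes"]:
            try:
                client.volumes.get(name).remove()
                print(f"removed volume {name} (service: {service})")
            except docker.errors.NotFound:
                pass


if __name__ == "__main__":
    cleanup_disabled(docker.from_env())
```

The point of keeping this separate from destroy is that it can be run safely against a live deployment: anything still enabled is never touched.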
15:36:13 #topic Allowed to fail images - next steps https://review.opendev.org/c/openstack/kolla/+/765807
15:36:16 ping hrw
15:36:30 #link https://review.opendev.org/c/openstack/kolla/+/765807
15:36:46 pong
15:36:48 we had some discussion about this in IRC earlier
15:37:12 that patch adds an option to kolla to mark some images as being allowed to fail
15:37:29 the next step is to make use of this in our CI
15:37:45 some key requirements for me:
15:38:15 * a change that breaks an image must fail CI
15:38:42 * ideally we have some visibility if an external change breaks an image
15:39:14 * hrw waits for end of list
15:39:23
15:39:26 ok.
15:39:47 1. 'change that breaks an image must fail CI' == what we have by default
15:40:02 allowed-to-fail does not change anything here
15:40:54 this feature is more for 'omg, the xyz image failed again and the only core who understands it is on holidays. let us ignore it for a week with this patch'
15:41:14 ok
15:41:26 otherwise we would switch to profile builds which can list 'those images we build and they cannot fail'
15:41:46 I think if we keep the list empty by default it will work
15:41:47 + a non-voting job building everything to check whether there are images which fail
15:41:55 that's the plan
15:42:03 (empty list)
15:42:06 ok
15:42:23 'ideally we have some visibility if an external change breaks an image' == normal CI too
15:43:06 allowed-to-fail is kind of 'one step' before moving to UNBUILDABLE_IMAGES
15:43:07 the original goal was for tiers of images, so I imagined having a static list of tier 3 images
15:43:34 in which case they could have started failing silently
15:43:51 default empty list works for me
15:44:10 so perhaps you are right in saying that the feature is complete :)
15:44:13 anyone disagree?
15:44:51 great
15:44:53 we have 3 levels for images: BUILD, allowed to fail, UNBUILDABLE
15:45:16 with the middle one being for emergencies
15:45:25 hrw: could you add a few sentences on that to the contributor docs?
15:45:28 no disagreement
15:45:44 mgoddard: #action please. will handle
15:46:22 #action hrw to document allowed to fail & unbuildable images in contributor docs
15:46:42 good job - nice and simple, should be useful
15:46:44 and 'allowed to fail' is basically CI stuff which can be used by anyone
15:47:14 PTL just too stoopid to understand it :D
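To make the three levels just described concrete (BUILD, allowed to fail, UNBUILDABLE), here is a rough sketch of how a CI check might treat build failures against an allowed-to-fail list. The names and the function are hypothetical; this is not the code from the linked patch.

```python
# Hypothetical sketch of the three-level idea; not the implementation in
# https://review.opendev.org/c/openstack/kolla/+/765807.
# UNBUILDABLE images are skipped entirely, so they never appear in the results;
# the interesting distinction is between normal images and allowed-to-fail ones.
ALLOWED_TO_FAIL = set()  # empty by default, as agreed in the meeting


def ci_passes(failed_images: set) -> bool:
    """Return True if the job should pass despite the given image failures."""
    # Failures of allowed-to-fail images are reported but do not block the gate;
    # any other failure is a hard failure.
    for image in failed_images & ALLOWED_TO_FAIL:
        print(f"WARNING: {image} failed to build (allowed to fail)")
    return not (failed_images - ALLOWED_TO_FAIL)
```

With the list empty by default, behaviour is identical to today; an image would be added only as a temporary emergency measure, one step before moving it to UNBUILDABLE_IMAGES.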
15:47:32 #topic Wallaby release planning
15:47:44 mgoddard: after adding UNBUILDABLE I have a feeling that I understand all those lists. and that's scary
15:48:00 ~3 months to release?
15:48:08 I don't think we've talked dates yet
15:48:21 April is the usual date ;D
15:48:24 I'll add a section to the whiteboard
15:48:52 ++
15:49:02 #link https://releases.openstack.org/wallaby/schedule.html
15:49:18 I think that infra stuff would be nice. I may take a look and rewrite the patches adding functionality and skip the CI changes
15:49:51 Apr 12 - Apr 16
15:50:15 ^ is OpenStack Wallaby GA
15:50:59 so 1st April is the day when we should be ready with features, then freeze, fix issues and release on 1st May
15:51:03 Kolla feature freeze: Mar 29 - Apr 02
15:51:06 Kolla RC1: Apr 12 - Apr 16
15:51:39 CentOS 8 stream is a risk
15:51:58 RDO is targeting stream for wallaby
15:52:55 it will freeze our blood
15:52:56 :-)
15:52:56 just to reiterate - there are around 2 months until feature freeze
15:53:10 now is the time to get working on features
15:53:11 only sad reactions!
15:53:12 if RDO goes stream then so do we.
15:53:14 (and reviewing)
15:53:20 :D
15:53:58 which may end in centos:8 -> stream or ubi:8 as a base image
15:54:02 7 minutes to go. Shall we look at the release priorities?
15:55:01 go, quickly
15:55:06 this one is ever present, and never done:
15:55:09 High level documentation, e.g. examples of networking config, diagrams, justification of using containers rather than k8s, etc.
15:55:27 but if anyone has time for docs, they will be appreciated
15:55:37 Ability to run CI jobs locally (without Zuul, but possibly with Ansible)
15:55:41 yoctozepto: any progress?
15:56:31 mgoddard: none
15:56:49 ok
15:57:12 Switch to CentOS 8 stream
15:57:17 Anyone want to pick this up?
15:58:00 I may if no one wants
15:58:14 ok
15:58:20 Infra images
15:58:33 hrw: you mentioned adding features but skipping CI
15:58:43 AFAIR, CI was the blocking part
15:58:54 the CentOS combined job fails
15:59:14 so we either need to fix that, or implement the multi-job pipeline
16:00:02 I want to rewrite the patches to have the functionality up for review while waiting for someone with ideas for the CI changes
16:00:35 it is quite an old patchset so I first have to remind myself how it works
16:01:14 I do wonder if the combined job could be made to work with a bit of debugging
16:01:39 it could be a disk issue, try removing images after each build
16:02:20 anyway, we are over time
16:02:23 thanks all
16:02:24 the good part is that CI changes can be done separately
16:02:34 #endmeeting
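On the disk-space suggestion near the end ("try removing images after each build"): a minimal sketch of one way to do that with the Docker SDK for Python. The build_one() helper is a hypothetical stand-in; the real combined job drives kolla-build differently.

```python
# Illustrative sketch of pruning between builds to bound disk usage;
# not how the kolla CI jobs are actually implemented.
import subprocess

import docker


def build_one(image_regex: str) -> None:
    # Hypothetical stand-in: build a single image (or matching set) with kolla-build.
    subprocess.run(["kolla-build", image_regex], check=False)


def build_with_pruning(image_regexes: list) -> None:
    client = docker.from_env()
    for regex in image_regexes:
        build_one(regex)
        # Drop dangling layers left over from the previous build before starting
        # the next one, so disk usage does not accumulate across the whole run.
        client.images.prune(filters={"dangling": True})
```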