15:00:16 <mgoddard> #startmeeting kolla
15:00:20 <openstack> Meeting started Wed Jan 27 15:00:16 2021 UTC and is due to finish in 60 minutes. The chair is mgoddard. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:23 <openstack> The meeting name has been set to 'kolla'
15:00:26 <mgoddard> #topic rollcall
15:00:28 <mgoddard> \o
15:00:34 <wuchunyang> \o
15:00:46 <osmanlicilegi> o/
15:00:55 <hrw> [o]
15:03:09 <yoctozepto> /o\
15:03:48 <mnasiadka> o/
15:04:49 <mgoddard> #topic agenda
15:04:57 <mgoddard> * Roll-call
15:04:59 <mgoddard> * Announcements
15:05:01 <mgoddard> * Review action items from the last meeting
15:05:03 <mgoddard> * CI status
15:05:04 <mgoddard> * Cleaning up disabled components and services (e.g. monasca/storm) https://review.opendev.org/c/openstack/kolla-ansible/+/769900
15:05:07 <mgoddard> * Allowed to fail images - next steps https://review.opendev.org/c/openstack/kolla/+/765807
15:05:09 <mgoddard> * Wallaby release planning
15:05:11 <mgoddard> #topic announcements
15:05:18 <mgoddard> I have none. Anyone else?
15:06:01 <mgoddard> #topic Review action items from the last meeting
15:06:05 <mgoddard> There were none
15:06:10 <mgoddard> #topic CI status
15:06:53 <mgoddard> How are we looking?
15:07:05 <mgoddard> We've had some issues with train, let's start there
15:07:08 <yoctozepto> good
15:07:34 <yoctozepto> everything works
15:08:27 <mgoddard> great. Thanks to those involved in rescuing it
15:08:35 <mgoddard> Stein has some issues
15:08:38 <mgoddard> https://review.opendev.org/c/openstack/kolla-ansible/+/772501
15:08:44 <mgoddard> https://review.opendev.org/c/openstack/kolla/+/772490
15:08:51 <mgoddard> Those two should address them
15:09:54 <mgoddard> optional: propose CI Slim Down in K-A (T-W)
15:09:57 <mgoddard> remove l-c jobs in K-A U, V & W (as agreed; already dropped in older)
15:10:11 <mgoddard> let's either do those or remove them from the board
15:10:16 <yoctozepto> do these
15:11:10 <mgoddard> I guess I'm undecided on the slim down
15:12:14 <mgoddard> has anyone actually run 'check experimental' in kolla since we removed the jobs?
15:12:26 <yoctozepto> I
15:12:27 <yoctozepto> once
15:12:40 <yoctozepto> I'd prefer we make CI work better
15:12:46 <yoctozepto> but we can't easily
15:12:58 <mgoddard> work better how?
15:14:56 <mnasiadka> I honestly don't think anybody will use check experimental
15:15:20 <mnasiadka> So, if we could have a list of problematic jobs, and vote on what to do with them - that would probably be better
15:15:36 <mgoddard> most people don't know about it, and those who do are likely to forget
15:15:57 <mgoddard> it seems like a reasonable trade-off for kolla, but not so sure about k-a
15:16:24 <mgoddard> maybe we should have only one of each scenario
15:16:34 <mgoddard> and put the other in experimental
15:16:35 <yoctozepto> mgoddard: as in dockerhub not failing
15:16:43 <yoctozepto> and no disk_full
15:16:46 <yoctozepto> and no timeout
15:16:56 <yoctozepto> and no multiple retries
15:17:11 <yoctozepto> we got ourselves a ton of jobs
15:17:14 <yoctozepto> I liked adding them
15:17:21 <yoctozepto> but I don't like waiting for them nowadays :D
15:17:33 * yoctozepto practicing self-blame
15:17:53 <mnasiadka> well, I prefer to wait for them, instead of writing check experimental and waiting for them to fail :)
15:18:08 <mgoddard> +1
15:18:31 <mgoddard> we do have various things that make CI unreliable, but it does seem workable at the moment
15:19:21 <mnasiadka> I would rather keep a list of jobs that need to be looked into; if there's no interest, we can decide to remove them, or just disable them.
15:19:48 <mgoddard> or combine them
15:19:52 <yoctozepto> +1
15:20:05 <yoctozepto> let's not overwork ourselves for the moment
15:20:12 <yoctozepto> remove the point on slim down
15:20:20 <mgoddard> ok
15:20:22 <yoctozepto> and we will reconsider our options when it gets worse
15:20:31 <yoctozepto> kolla has it already
15:20:42 <yoctozepto> it sounds like a reasonable tradeoff
15:20:50 <mnasiadka> I know the ceph jobs are failing often, but I plan to move to using cephadm, and then investigate failures if they happen.
15:21:26 <yoctozepto> yeah, cephadm should help a bit; ceph-ansible is quite quirky
15:21:27 <mgoddard> are there legit ceph failures we should look into?
15:21:48 <yoctozepto> or perhaps we should just spin up ceph manually for our simple scenario
15:21:55 <yoctozepto> it's a no-brainer xD
15:22:10 <yoctozepto> mgoddard: I don't think so
15:22:20 <yoctozepto> it just feels worse than before
15:22:21 <mgoddard> ok
15:22:26 <mgoddard> let's move on, tired of CI :)
15:22:27 <yoctozepto> perhaps just due to multinode being worse
15:22:29 <yoctozepto> indeed!
15:22:38 <mgoddard> #topic Cleaning up disabled components and services (e.g. monasca/storm) https://review.opendev.org/c/openstack/kolla-ansible/+/769900
15:22:50 * yoctozepto to deprecate everything
15:23:09 <mgoddard> This came up while reviewing one of dougsz's patches
15:23:14 <mgoddard> https://review.opendev.org/c/openstack/kolla-ansible/+/769900
15:23:17 <hrw> monasca goes out?
15:23:20 <mgoddard> no
15:23:33 <mgoddard> but it's going on a diet :)
15:23:49 <mgoddard> some things will be removed
15:24:09 <mgoddard> and we'll add a flag to allow disabling the alerting pipeline
15:24:12 <mgoddard> https://review.opendev.org/c/openstack/kolla-ansible/+/769902/
15:24:17 <mgoddard> so...
15:24:34 <mgoddard> there are two cases here
15:24:52 <mgoddard> 1. a container is no longer required and should be cleaned up
15:25:16 <mgoddard> 2. a container is disabled based on some feature flag, and should be cleaned up
15:25:18 <yoctozepto> ah, this general thing
15:25:38 <mgoddard> 1. I think should be done on upgrade
15:25:46 <yoctozepto> we should address both at once
15:26:22 <mgoddard> 2. is less clear. I don't really want a bunch of code that cleans up on every deploy
15:26:23 <yoctozepto> perhaps treat being disabled as removal?
15:26:30 <mgoddard> it will be messy and slow
15:26:33 <yoctozepto> and have a stub for those removed completely due to a new release
15:26:50 <yoctozepto> hmm
15:27:06 <mgoddard> there is also a 3. which I forget
15:27:21 <mgoddard> 3. a service (enable_X) is disabled and should be cleaned up
15:27:53 <mgoddard> 3 is tricky because the site.yml playbook doesn't execute plays for disabled services
15:28:09 <mnasiadka> we need undeploy.yml in each role and a separate command to clean up?
15:28:19 <mgoddard> that is my preference
15:28:42 <mnasiadka> that sounds simple and doable in a short time
15:28:54 <mgoddard> right
15:29:07 <mgoddard> however, I think dougsz didn't want to overload his patch
15:29:16 <mgoddard> so we came to a compromise
15:29:38 <mgoddard> he will implement a specific command to clean up the monasca alerting pipeline, which can be adapted into a general cleanup command
15:29:53 <mgoddard> e.g. kolla-ansible monasca-disable-alerting
15:30:01 <mgoddard> does that seem reasonable?
15:30:15 <wuchunyang> what about kolla-ansible destroy --tags service?
15:30:25 <mgoddard> (dougsz is busy right now so can't join in)
15:30:55 <mgoddard> wuchunyang: I think destroy is subtly different
15:31:16 <mgoddard> destroy will clean up everything, regardless of enable flags, etc.
15:31:30 <mgoddard> cleanup should only destroy things that have been disabled
15:32:22 <mnasiadka> and it's still questionable whether it should delete database entries and so on
15:32:33 <mgoddard> is this making sense?
15:32:42 <mgoddard> hmm, hadn't considered the DB
15:32:51 <mgoddard> that would only apply if the whole service is disabled
15:33:02 <mgoddard> probably it will evolve some flags
15:33:21 <yoctozepto> it's tricky and sounds like a good topic for the next PTG
15:33:25 <mgoddard> --clean-db --clean-keystone --clean-mq
15:33:36 <yoctozepto> as the need grows :D
15:33:57 <mnasiadka> yeah, but the initial approach with Monasca sounds fine, we need to start somewhere
15:33:58 <yoctozepto> --yes-i-really-really-want-it-clean
15:33:59 <wuchunyang> sounds good
15:34:01 <mgoddard> but a simple deletion of containers and volumes would be a good start
15:34:08 <yoctozepto> yes
15:34:21 <mnasiadka> just need to make sure we adapt the reno if we change the command name :)
15:34:26 <mnasiadka> (over time)
15:34:41 <mgoddard> I think it would be a new command
15:34:54 <wuchunyang> +1
15:35:37 <mgoddard> any other thoughts or shall we progress?
15:36:13 <mgoddard> #topic Allowed to fail images - next steps https://review.opendev.org/c/openstack/kolla/+/765807
15:36:16 <mgoddard> ping hrw
15:36:30 <mgoddard> #link https://review.opendev.org/c/openstack/kolla/+/765807
15:36:46 <hrw> pong
15:36:48 <mgoddard> we had some discussion about this in IRC earlier
15:37:12 <mgoddard> that patch adds an option to kolla to mark some images as being allowed to fail
15:37:29 <mgoddard> the next step is to make use of this in our CI
15:37:45 <mgoddard> some key requirements for me:
15:38:15 <mgoddard> * a change that breaks an image must fail CI
15:38:42 <mgoddard> * ideally we have some visibility if an external change breaks an image
15:39:14 * hrw waits for end of list
15:39:23 <mgoddard> </end>
15:39:26 <hrw> ok.
15:39:47 <hrw> 1. 'a change that breaks an image must fail CI' == what we have by default
15:40:02 <hrw> allowed-to-fail does not change anything here
15:40:54 <hrw> this feature is more for 'omg, xyz image failed again and the only core who understands it is on holidays. let us ignore it for a week with this patch'
15:41:14 <mgoddard> ok
15:41:26 <hrw> otherwise we would switch to profile builds which can list 'those images we build and they cannot fail'
15:41:46 <mgoddard> I think if we keep the list empty by default it will work
15:41:47 <hrw> + a non-voting job building everything to check whether there are images which fail
15:41:55 <hrw> that's the plan
15:42:03 <hrw> (empty list)
15:42:06 <mgoddard> ok
15:42:23 <hrw> 2. 'ideally we have some visibility if an external change breaks an image' == normal CI too
15:43:06 <hrw> allowed-to-fail is kind of 'one step' before moving to UNBUILDABLE_IMAGES
15:43:07 <mgoddard> the original goal was for tiers of images, so I imagined having a static list of tier 3 images
15:43:34 <mgoddard> in which case they could have started failing silently
15:43:51 <mgoddard> default empty list works for me
15:44:10 <mgoddard> so perhaps you are right in saying that the feature is complete :)
15:44:13 <mgoddard> anyone disagree?
15:44:51 <mgoddard> great
15:44:53 <hrw> we have 3 levels for images: BUILD, allowed to fail, UNBUILDABLE
15:45:16 <hrw> with the middle one being for emergencies
15:45:25 <mgoddard> hrw: could you add a few sentences on that to the contributor docs?
15:45:28 <yoctozepto> no disagreement
15:45:44 <hrw> mgoddard: #action please. will handle
15:46:22 <mgoddard> #action hrw to document allowed to fail & unbuildable images in contributor docs
15:46:42 <mgoddard> good job - nice and simple, should be useful
15:46:44 <hrw> and 'allowed to fail' is basically CI stuff which can be used by anyone
15:47:14 <mgoddard> PTL just too stoopid to understand it :D
15:47:32 <mgoddard> #topic Wallaby release planning
15:47:44 <hrw> mgoddard: after adding UNBUILDABLE I have a feeling that I understand all those lists. and that's scary
15:48:00 <hrw> ~3 months to release?
15:48:08 <mgoddard> I don't think we've talked dates yet
15:48:21 <hrw> April is the usual date ;D
15:48:24 <mgoddard> I'll add a section to the whiteboard
15:48:52 <yoctozepto> ++
15:49:02 <mgoddard> #link https://releases.openstack.org/wallaby/schedule.html
15:49:18 <hrw> I think that the infra stuff would be nice. I may take a look and rewrite the patches adding functionality and skip the CI changes
15:49:51 <mgoddard> Apr 12 - Apr 16
15:50:15 <mgoddard> ^ is OpenStack Wallaby GA
15:50:59 <hrw> so 1st April is the day when we should be ready with features, then freeze, fix issues and release on 1st May
15:51:03 <mgoddard> Kolla feature freeze: Mar 29 - Apr 02
15:51:06 <mgoddard> Kolla RC1: Apr 12 - Apr 16
15:51:39 <mgoddard> CentOS 8 Stream is a risk
15:51:58 <mgoddard> RDO is targeting Stream for Wallaby
15:52:55 <yoctozepto> it will freeze our blood
15:52:56 <yoctozepto> :-)
15:52:56 <mgoddard> just to reiterate - there are around 2 months until feature freeze
15:53:10 <mgoddard> now is the time to get working on features
15:53:11 <yoctozepto> only sad reactions!
15:53:12 <hrw> if RDO goes Stream then so do we.
15:53:14 <mgoddard> (and reviewing)
15:53:20 <yoctozepto> :D
15:53:58 <hrw> which may end in centos:8 -> stream or ubi:8 as a base image
15:54:02 <mgoddard> 7 minutes to go. Shall we look at the release priorities?
15:55:01 <hrw> go, quickly
15:55:06 <mgoddard> this one is ever present, and never done:
15:55:09 <mgoddard> High-level documentation, e.g. examples of networking config, diagrams, justification of the use of containers, not k8s, etc.
15:55:27 <mgoddard> but if anyone has time for docs, they will be appreciated
15:55:37 <mgoddard> Ability to run CI jobs locally (without Zuul, but possibly with Ansible)
15:55:41 <mgoddard> yoctozepto: any progress?
15:56:31 <yoctozepto> mgoddard: none
15:56:49 <mgoddard> ok
15:57:12 <mgoddard> Switch to CentOS 8 Stream
15:57:17 <mgoddard> Anyone want to pick this up?
15:58:00 <hrw> I may if no one else wants to
15:58:14 <mgoddard> ok
15:58:20 <mgoddard> Infra images
15:58:33 <mgoddard> hrw: you mentioned adding features but skipping CI
15:58:43 <mgoddard> AFAIR, CI was the blocking part
15:58:54 <mgoddard> the CentOS combined job fails
15:59:14 <mgoddard> so we either need to fix that, or implement the multi-job pipeline
16:00:02 <hrw> I want to rewrite the patches to have the functionality up for review while waiting for someone with ideas for the CI changes
16:00:35 <hrw> it is quite an old patchset so I first have to remind myself how it works
16:01:14 <mgoddard> I do wonder if the combined job could be made to work with a bit of debugging
16:01:39 <mgoddard> it could be a disk issue, try removing images after each build
16:02:20 <mgoddard> anyway, we are over time
16:02:23 <mgoddard> thanks all
16:02:24 <hrw> the good part is that the CI changes can be done separately
16:02:34 <mgoddard> #endmeeting
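
For illustration only, a minimal sketch of the "keep one of each scenario in check, put the other in experimental" idea raised under the CI status topic, as a Zuul project stanza. The job names are placeholders, not kolla-ansible's real job list:

# Hypothetical .zuul.d/project.yaml fragment. Jobs listed under "experimental"
# only run when a reviewer comments "check experimental" on a change.
- project:
    check:
      jobs:
        - kolla-ansible-ubuntu-source-example-scenario
    experimental:
      jobs:
        - kolla-ansible-centos-source-example-scenario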
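
Similarly, a minimal sketch of the per-role cleanup idea agreed under the disabled-services topic (an undeploy.yml in each role, driven by a new kolla-ansible command, deleting only the containers and volumes of disabled services). The file path, the enable_monasca_alerting flag, and the container/volume names are illustrative assumptions, not the contents of dougsz's patch:

# Hypothetical roles/monasca/tasks/undeploy.yml sketch of the "simple deletion
# of containers and volumes" starting point. The kolla_docker module actions
# used below are assumed to be available in this form.
- name: Remove alerting pipeline containers when alerting is disabled
  become: true
  kolla_docker:
    action: "remove_container"
    name: "{{ item }}"
  with_items:
    - monasca_thresh
    - storm_nimbus
    - storm_worker
  when: not enable_monasca_alerting | bool

- name: Remove alerting pipeline volumes when alerting is disabled
  become: true
  kolla_docker:
    action: "remove_volume"
    name: "{{ item }}"
  with_items:
    - storm
  when: not enable_monasca_alerting | bool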