15:01:22 <mgoddard> #startmeeting kolla
15:01:23 <openstack> Meeting started Wed Feb 10 15:01:22 2021 UTC and is due to finish in 60 minutes. The chair is mgoddard. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:26 <openstack> The meeting name has been set to 'kolla'
15:01:44 <mgoddard> #topic rollcall
15:01:48 <mgoddard> \o
15:01:51 <yoctozepto> *o*
15:02:04 <openstackgerrit> Michal Nasiadka proposed openstack/kolla-ansible master: CI: Move from ceph-ansible to cephadm https://review.opendev.org/c/openstack/kolla-ansible/+/771418
15:02:10 <hrw> Io/
15:03:06 <openstackgerrit> Mark Goddard proposed openstack/kolla-ansible master: CI: fix kolla-ansible installation after cryptography 3.4 release https://review.opendev.org/c/openstack/kolla-ansible/+/774622
15:04:21 <mgoddard> #topic agenda
15:04:25 <mgoddard> * Roll-call
15:04:26 <mgoddard> * Announcements
15:04:28 <mgoddard> * Review action items from the last meeting
15:04:30 <mgoddard> * CI status
15:04:32 <mgoddard> * CI resources & performance
15:04:34 <mgoddard> ** http://lists.openstack.org/pipermail/openstack-discuss/2021-February/020235.html
15:04:36 <mgoddard> ** https://docs.google.com/spreadsheets/d/1U_TaLoLTpX7ZC3v9cQsvBfoJtUI7ClAhkx0HYvUaFrM/edit#gid=1217205417
15:04:38 <mgoddard> * Review requests
15:04:40 <mgoddard> * Wallaby release planning
15:04:42 <mgoddard> #topic announcements
15:04:46 <mgoddard> Nothing from me. Anyone else?
15:05:20 <hrw> no
15:05:39 <mgoddard> #topic Review action items from the last meeting
15:05:54 <mgoddard> hrw to document allowed to fail & unbuildable images in contributor docs
15:05:56 <mgoddard> hrw to build libvirt 7.0.0 for Debian
15:06:00 <hrw> https://review.opendev.org/c/openstack/kolla/+/774618
15:06:01 <mgoddard> both proposed
15:06:05 <hrw> https://review.opendev.org/c/openstack/kolla/+/774351
15:06:35 <hrw> allowed to fail are not documented yet - have to find a place where to put it
15:07:08 <mgoddard> true
15:07:26 <mgoddard> #topic CI status
15:07:30 <mgoddard> RED
15:07:49 <mgoddard> https://review.opendev.org/c/openstack/kolla-ansible/+/774622 is the WIP fix
15:08:50 <hrw> train is RED too. https://review.opendev.org/c/openstack/kolla/+/774602 fixes sensu/ubuntu but bifrost is broken for centos 7/8
15:09:10 <hrw> ussuri is RED on aarch64 - https://review.opendev.org/c/openstack/kolla/+/774493 fixes it
15:10:45 <mgoddard> #action mgoddard fix bifrost on Train
15:12:22 <mgoddard> anything else for CI?
15:13:01 <hrw> probably that's all
15:13:10 <mgoddard> #topic CI resources & performance
15:13:21 <mgoddard> There is a thread on openstack-discuss
15:13:27 <mgoddard> #link http://lists.openstack.org/pipermail/openstack-discuss/2021-February/020235.html
15:13:35 <mgoddard> about CI resource usage
15:13:41 <mgoddard> kolla is in the top 5
15:14:03 <mgoddard> although way behind tripleo
15:14:05 <mgoddard> Project     % of total  Node Hours  Nodes
15:14:07 <mgoddard> ------------------------------------------
15:14:09 <mgoddard> 1. TripleO    38%       31 hours     20
15:14:11 <mgoddard> 2. Neutron    13%       38 hours     32
15:14:13 <mgoddard> 3. Nova        9%       21 hours     25
15:14:15 <mgoddard> 4. Kolla       5%       12 hours     18
15:14:17 <mgoddard> 5. OSA         5%       22 hours     17
15:14:34 <mnasiadka> but we have less node hours than OSA
15:14:50 <hrw> were we asked to lower usage?
15:14:57 <mnasiadka> whatever that means
15:15:13 <mgoddard> node hours is per changeset
15:15:17 <openstackgerrit> Michal Nasiadka proposed openstack/kolla-ansible master: CI: Move from ceph-ansible to cephadm https://review.opendev.org/c/openstack/kolla-ansible/+/771418
15:15:29 <hrw> mnasiadka: node hours == how much time CI run takes
15:15:42 <hrw> that's how I understood it
15:15:48 <yoctozepto> yeah and I think this was max
15:15:59 <yoctozepto> so it's very biased
15:16:02 <mnasiadka> True, I don't understand what Nodes mean, like at the same time?
15:16:24 <mgoddard> how many nodes to run jobs for one changeset
15:16:31 <yoctozepto> again, I think it's max for changeset
15:16:36 <yoctozepto> mhm
15:16:40 <mgoddard> I would expect average
15:16:49 <mgoddard> max for kolla could be huge
15:17:02 <yoctozepto> true that
15:17:10 <yoctozepto> though then nodehours
15:17:13 <yoctozepto> can't be avg
15:17:24 <mgoddard> why not?
15:17:38 <yoctozepto> you think we average 12 hours?
15:17:43 <mgoddard> sure
15:17:48 <mgoddard> most jobs are ~1 hour
15:17:55 <mgoddard> some more
15:18:08 <yoctozepto> ah, you mean like consumed node time
15:18:10 <mgoddard> multinode upgrade jobs might be 4-5 node hours
15:18:17 <yoctozepto> true
15:18:20 <yoctozepto> makes sense
15:18:23 <mgoddard> nodes * time = node hours
15:18:33 <yoctozepto> yeah, yeah
15:18:42 <mgoddard> i.e.
what you would pay for on a cloud
15:18:45 <yoctozepto> now it makes sense
15:18:58 <mgoddard> anyways
15:19:10 <mgoddard> I promised I would raise it at this meeting
15:19:14 <mgoddard> now let's move on
15:19:15 <mgoddard> :p
15:19:21 <yoctozepto> you are my hero
15:19:47 <mgoddard> #link https://docs.google.com/spreadsheets/d/1U_TaLoLTpX7ZC3v9cQsvBfoJtUI7ClAhkx0HYvUaFrM/edit#gid=1217205417
15:19:57 <mgoddard> I did some playing with zuul API
15:20:14 <mgoddard> last 1000 jobs across all branches & kolla projects
15:20:35 <mgoddard> pivot table is most interesting
15:21:50 <mgoddard> I expect we could slice the data more
15:22:07 <yoctozepto> top offenders - ceph jobs
15:22:28 <mgoddard> yes
15:22:57 <yoctozepto> though I wonder if it's simply because of using ceph
15:23:04 <yoctozepto> or because we have to bring up and test cinder
15:23:09 <mgoddard> interestingly the scenario jobs are generally very low
15:23:22 <yoctozepto> they run much less often
15:23:33 <mgoddard> right
15:23:45 <mgoddard> but that does make a good argument against moving them to experimental
15:23:50 <mnasiadka> Well, all ceph jobs use ceph-ansible to deploy, it takes reasonable amount of time to deploy even a tiny ceph cluster
15:23:55 <mgoddard> assuming 1000 jobs is enough data
15:24:17 <mgoddard> mnasiadka: multinode is the main killer - immediate 3x
15:24:23 <yoctozepto> yeah, but
15:24:28 <yoctozepto> compare with multinode ipv6
15:25:11 <yoctozepto> about 75% penalty on ceph jobs
15:25:27 <yoctozepto> but in them we do ceph and cinder
15:25:31 <mgoddard> yes, but significant extra coverage
15:25:50 <yoctozepto> yeah, but I do wonder if we switch to cephadm
15:25:58 <yoctozepto> or perhaps to a custom way
15:26:07 <yoctozepto> (we don't really do a production cluster anyway)
15:26:14 <mnasiadka> then we can compare once we switch, I have time to work on that now - so we'll know next week I guess.
15:26:20 <yoctozepto> great
15:26:27 <openstackgerrit> Michal Nasiadka proposed openstack/kolla-ansible master: CI: Move from ceph-ansible to cephadm https://review.opendev.org/c/openstack/kolla-ansible/+/771418
15:26:32 <mnasiadka> we probably could have even a single node ceph
15:26:44 <yoctozepto> and then only continue with optimisations we do for k-a ansible
15:26:47 <mnasiadka> and it wouldn't hurt
15:27:01 <mgoddard> should we consider two nodes in multinode jobs?
15:27:02 <mnasiadka> (if there's a big time difference between deploying on 1 and 3 nodes) :)
15:27:21 <yoctozepto> mnasiadka: true that
15:27:28 <yoctozepto> mgoddard: we can evaluate that as well
15:27:31 <mnasiadka> two nodes? I don't think it's our recommendation in HA - the docs recommend 3 nodes?
15:27:33 <mgoddard> mnasiadka: from a resource usage perspective it's 3x
15:27:50 <yoctozepto> mgoddard, mnasiadka: we can still test clustering correctness
15:27:52 <yoctozepto> in jobs like mariadb
15:27:56 <yoctozepto> which are much faster
15:28:18 <yoctozepto> we could create a simple job that test mariadb+rabbitmq+anything_else_that_we_need_to_ha_cluster
15:28:29 <yoctozepto> and avoid multinode in biggur scenarios
15:28:48 <yoctozepto> we could use some unit testing
15:28:53 <yoctozepto> if we loop correctly in templates
15:28:57 <yoctozepto> and things like that
15:29:04 <yoctozepto> but it is much less likely to break it
15:29:18 <yoctozepto> than simply break the process of deploying/upgrading clustered resources
15:30:32 <yoctozepto> we can also avoid multinode jobs duplicates per distro
15:30:41 <yoctozepto> as it's less likely that clustering logic breaks between distros
15:30:44 <mgoddard> it is good to test 3 nodes.
I guess we just need to be careful about when we use it
15:30:49 <yoctozepto> it's more likely that we break altogether
15:31:06 <yoctozepto> let's do it in small steps
15:31:26 <mgoddard> we could do centos multinode ceph, ubuntu single node ceph
15:31:27 <yoctozepto> perhaps ditching ceph-ansible will give us a sane boost to performance and reliability
15:31:34 <mgoddard> then the reverse for the ceph upgrade jobs
15:31:41 <yoctozepto> yeah, that works too
15:31:56 <mnasiadka> we could run ceph upgrade jobs as periodicals, I think they are the main killer :)
15:32:00 <hrw> if weeknumber is odd then centos. if even then ubuntu
15:32:01 <mnasiadka> they/those
15:32:16 <yoctozepto> yup, periodics are another path
15:32:24 <yoctozepto> hrw: oh, I like it
15:32:27 <hrw> such check would not require changing CI scripts
15:32:27 <yoctozepto> random CI
15:32:35 <yoctozepto> and then
15:32:40 <yoctozepto> let's merge it on an odd week
15:32:52 <yoctozepto> because it fails on even ones
15:34:40 <mgoddard> let's try to keep it predictable :)
15:35:47 <mgoddard> ok, I think we have some things to try
15:36:11 <mgoddard> I'm going to propose restricting the cells job to nova changes
15:37:10 <yoctozepto> and soon mariadb ones
15:37:13 <yoctozepto> (for shards)
15:37:19 <mgoddard> right
15:37:26 <mgoddard> #topic Review requests
15:37:30 <mgoddard> Same as last time
15:37:41 <mgoddard> Does anyone have a patch they would like to be reviewed?
15:38:26 <hrw> one patch: https://review.opendev.org/c/openstack/kolla/+/772507
15:38:53 <hrw> passed zuul, yoctozepto gave +2
15:39:16 <openstackgerrit> Radosław Piliszek proposed openstack/kolla-ansible master: Lint and fix renos https://review.opendev.org/c/openstack/kolla-ansible/+/759370
15:39:53 <yoctozepto> well, you can w+1 ^
15:40:17 <yoctozepto> I am a bit overloaded and don't seem to have anything ready proposed at the moment
15:41:07 <hrw> lgtm
15:41:15 <mgoddard> don't +A
15:41:21 <mgoddard> linters job is broken
15:41:36 <yoctozepto> oh
15:41:37 <yoctozepto> sad
15:41:42 <mgoddard> yeah
15:41:45 <mgoddard> ansible-lint
15:41:46 <openstackgerrit> Michal Nasiadka proposed openstack/kolla-ansible master: CI: Move from ceph-ansible to cephadm https://review.opendev.org/c/openstack/kolla-ansible/+/771418
15:41:50 <yoctozepto> argh
15:41:53 <mgoddard> worst linter ever
15:41:55 <mnasiadka> ansible-lint broken again?
15:42:07 <yoctozepto> just drop it
15:42:08 <hrw> mgoddard: add --exclude releasenotes and be done
15:42:14 <yoctozepto> or make it non-voting
15:42:15 <mgoddard> hrw: not required
15:42:27 <mgoddard> it's fixed in my patch
15:42:32 <hrw> mgoddard: ok
15:42:36 <yoctozepto> let it help us, not hinder us
15:42:42 <mgoddard> we put in the effort to get it passing, may as well keep it
15:43:15 <yoctozepto> we did
15:43:25 <mgoddard> I'm going to make https://review.opendev.org/c/openstack/kolla-ansible/+/695432/ my review request this week
15:43:50 <yoctozepto> that's a review bomb for sure
15:44:27 <mgoddard> boom
15:44:29 <yoctozepto> it conflicts with half the proposed changes as well I see :-)
15:44:36 <yoctozepto> tough nut
15:44:51 <mgoddard> indeed
15:44:52 <hrw> can we add 'must pass zuul' to this request queue?
15:45:05 <mnasiadka> mgoddard: eek, balance source in haproxy?
15:45:25 <hrw> or 'must pass zuul if core wrote' ;D
15:45:43 <hrw> ok, I have to go.
thanks for meeting
15:45:46 <mgoddard> mnasiadka: it that's exactly the kind of comment that should be in the review :)
15:45:52 <mgoddard> thanks headphoneJames
15:45:55 <mgoddard> sorry, hrw
15:46:02 <mgoddard> back to sleep headphoneJames
15:46:04 <mnasiadka> mgoddard: I'll force myself to review that tomorrow ;)
15:46:30 <mgoddard> it has been around for a long time and the authors have been quite responsive, helpful, and receptive to change
15:46:44 <mgoddard> I'd like to at least merge that patch and adapt it if we need to
15:47:16 <mgoddard> mnasiadka: just don't mention SAML ;)
15:47:35 <mnasiadka> Of course I will, OIDC is not 90% of usage, I would rather claim otherwise :)
15:48:04 <yoctozepto> I have added my first comment
15:48:07 <mnasiadka> Anyway, let's not incite religious wars.
15:48:24 <yoctozepto> XML4life
15:48:38 <mnasiadka> I never had the ability to read XML files.
15:48:58 <yoctozepto> they are the best test for patience
15:49:04 <mnasiadka> It's like obfuscated poetry from an oppressed country.
15:49:23 <yoctozepto> hahahaha
15:49:26 <mgoddard> it stands for XXXX my life
15:49:38 <yoctozepto> the water I was drinking left me through my nose
15:49:41 <yoctozepto> thank you very much
15:49:49 <mgoddard> hope your computer is dry
15:49:56 <yoctozepto> drying it
15:50:23 <yoctozepto> don't drink and kollameet
15:51:03 <mgoddard> #topic Wallaby release planning
15:51:07 <mnasiadka> Around reviews - can somebody look at this: https://review.opendev.org/c/openstack/kolla-ansible/+/774222 ? It solves cases when Neutron does not support OVN native DHCP yet.
15:51:32 <yoctozepto> wallaby is the cycle we spend on fixing our ci fires
15:51:56 <mgoddard> Mar 29 - Apr 02
15:52:13 <mgoddard> yoctozepto: every cycle is the cycle we spend on fixing our ci fires
15:52:21 <mnasiadka> and not only our
15:52:37 <yoctozepto> this one feels sadder
15:53:19 <mgoddard> I'll take a look
15:53:39 <mgoddard> mnasiadka: what are those cases? bare metal?
15:53:55 <mnasiadka> mgoddard: yes, bare metal
15:54:21 <mnasiadka> OVN has functionality that would fit, but Neutron seems to be lacking here and there...
15:55:13 <mgoddard> hmm, surprised lucasgomes is not on top of it
15:55:27 <mnasiadka> well, he's the one proposing to use neutron-dhcp-agent )
15:55:29 <mnasiadka> :)
15:55:32 <mgoddard> ah
15:55:55 <mgoddard> well there is no better intersection of ironic & ovn knowledge
15:56:20 <mgoddard> #topic open discussion
15:56:26 <mgoddard> anyone have anything this week?
15:58:23 <mnasiadka> I was thinking about extending kayobe's seed-custom-containers to overcloud-custom-containers - is that something we want to do?
15:58:47 <mgoddard> mnasiadka: I guess I see kolla-ansible as the thing that deploys containers
15:59:07 <mgoddard> is there a particular use case in mind?
15:59:48 <yoctozepto> nothing new from me
15:59:50 <mnasiadka> well, there are some components, that are not covered by kolla-ansible - I can write an external role to do it, or just do ^^
16:00:02 <mnasiadka> like I don't know... consul and hashicorp vault
16:00:23 <mgoddard> I see
16:01:29 <mgoddard> it probably needs some thought
16:01:57 <mnasiadka> for sure, we need to make sure we don't have a naming collision for containers and volumes
16:02:15 <mnasiadka> which might be not so straightforward to write
16:02:16 <mgoddard> we did discuss making some plugin/extension mechanism for kolla-ansible at the PTG
16:02:39 <mgoddard> prometheus exporters being the initial use case
16:02:46 <mgoddard> but potentiall this fits too
16:02:59 <mnasiadka> yeah, while adding roles from a different path (or using ansible galaxy) is not problematic, we would need to find a way to resolve dependencies/order of running
16:03:27 <mgoddard> I think there is a question of when you would want to deploy those things - are they coupled to openstack, or not?
16:03:37 <mgoddard> probably you would not want to touch vault too often
16:04:07 <mnasiadka> well - touching vault means unsealing - so no :)
16:04:08 <mgoddard> anyway, meeting over folks
16:04:14 <mgoddard> thanks
16:04:16 <mgoddard> #endmeeting
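
Editor's note on the "CI resources & performance" topic: the node-hours accounting agreed on in the discussion ("nodes * time = node hours", summed per changeset) can be sketched as below. This is only an illustration under stated assumptions: the `Job` type, the job names, and the durations are hypothetical and are not taken from the Zuul data or the spreadsheet linked in the meeting. The 3-node figure for multinode jobs follows the "immediate 3x" remark.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """One CI job run for a changeset (hypothetical record, not a Zuul type)."""
    name: str
    nodes: int      # number of nodes the job occupies
    hours: float    # wall-clock duration of the job, in hours


def node_hours(job: Job) -> float:
    """Node hours consumed by one job: nodes * time."""
    return job.nodes * job.hours


def changeset_node_hours(jobs: list[Job]) -> float:
    """Total node hours consumed by all jobs run for one changeset."""
    return sum(node_hours(job) for job in jobs)


# Hypothetical changeset: one single-node job and one 3-node upgrade job.
jobs = [
    Job(name="single-node-deploy", nodes=1, hours=1.0),
    Job(name="multinode-upgrade", nodes=3, hours=1.5),
]

for job in jobs:
    print(f"{job.name}: {node_hours(job)} node hours")
print(f"changeset total: {changeset_node_hours(jobs)} node hours")
```

With these made-up durations, the 3-node job alone accounts for 4.5 of the 5.5 node hours, which is the effect described in the meeting: a multinode job costs three times what a single-node job of the same length does.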