15:01:22 #startmeeting kolla
15:01:23 Meeting started Wed Feb 10 15:01:22 2021 UTC and is due to finish in 60 minutes. The chair is mgoddard. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:24 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:26 The meeting name has been set to 'kolla'
15:01:44 #topic rollcall
15:01:48 \o
15:01:51 *o*
15:02:04 Michal Nasiadka proposed openstack/kolla-ansible master: CI: Move from ceph-ansible to cephadm https://review.opendev.org/c/openstack/kolla-ansible/+/771418
15:02:10 Io/
15:03:06 Mark Goddard proposed openstack/kolla-ansible master: CI: fix kolla-ansible installation after cryptography 3.4 release https://review.opendev.org/c/openstack/kolla-ansible/+/774622
15:04:21 #topic agenda
15:04:25 * Roll-call
15:04:26 * Announcements
15:04:28 * Review action items from the last meeting
15:04:30 * CI status
15:04:32 * CI resources & performance
15:04:34 ** http://lists.openstack.org/pipermail/openstack-discuss/2021-February/020235.html
15:04:36 ** https://docs.google.com/spreadsheets/d/1U_TaLoLTpX7ZC3v9cQsvBfoJtUI7ClAhkx0HYvUaFrM/edit#gid=1217205417
15:04:38 * Review requests
15:04:40 * Wallaby release planning
15:04:42 #topic announcements
15:04:46 Nothing from me. Anyone else?
15:05:20 no
15:05:39 #topic Review action items from the last meeting
15:05:54 hrw to document allowed to fail & unbuildable images in contributor docs
15:05:56 hrw to build libvirt 7.0.0 for Debian
15:06:00 https://review.opendev.org/c/openstack/kolla/+/774618
15:06:01 both proposed
15:06:05 https://review.opendev.org/c/openstack/kolla/+/774351
15:06:35 allowed to fail are not documented yet - have to find a place to put it
15:07:08 true
15:07:26 #topic CI status
15:07:30 RED
15:07:49 https://review.opendev.org/c/openstack/kolla-ansible/+/774622 is the WIP fix
15:08:50 train is RED too. https://review.opendev.org/c/openstack/kolla/+/774602 fixes sensu/ubuntu but bifrost is broken for centos 7/8
15:09:10 ussuri is RED on aarch64 - https://review.opendev.org/c/openstack/kolla/+/774493 fixes it
15:10:45 #action mgoddard fix bifrost on Train
15:12:22 anything else for CI?
15:13:01 probably that's all
15:13:10 #topic CI resources & performance
15:13:21 There is a thread on openstack-discuss
15:13:27 #link http://lists.openstack.org/pipermail/openstack-discuss/2021-February/020235.html
15:13:35 about CI resource usage
15:13:41 kolla is in the top 5
15:14:03 although way behind tripleo
15:14:05 Project      % of total   Node Hours   Nodes
15:14:07 ---------------------------------------------
15:14:09 1. TripleO   38%          31 hours     20
15:14:11 2. Neutron   13%          38 hours     32
15:14:13 3. Nova      9%           21 hours     25
15:14:15 4. Kolla     5%           12 hours     18
15:14:17 5. OSA       5%           22 hours     17
15:14:34 but we have fewer node hours than OSA
15:14:50 were we asked to lower usage?
15:14:57 whatever that means
15:15:13 node hours is per changeset
15:15:17 Michal Nasiadka proposed openstack/kolla-ansible master: CI: Move from ceph-ansible to cephadm https://review.opendev.org/c/openstack/kolla-ansible/+/771418
15:15:29 mnasiadka: node hours == how much time CI run takes
15:15:42 that's how I understood it
15:15:48 yeah and I think this was max
15:15:59 so it's very biased
15:16:02 True, I don't understand what Nodes mean, like at the same time?
15:16:24 how many nodes to run jobs for one changeset
15:16:31 again, I think it's max for changeset
15:16:36 mhm
15:16:40 I would expect average
15:16:49 max for kolla could be huge
15:17:02 true that
15:17:10 though then nodehours
15:17:13 can't be avg
15:17:24 why not?
15:17:38 you think we average 12 hours?
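A toy calculation makes the max-versus-average question concrete. This is a minimal sketch, not how the OpenDev report was produced: it assumes each job is a hypothetical (changeset, nodes, hours) tuple and uses nodes * hours as the node-hour cost, the definition the meeting settles on a few minutes later. All sample numbers are invented.

```python
from statistics import mean

# Hypothetical jobs as (changeset, nodes, wall-clock hours).
# Illustrative numbers only; they are not taken from the OpenDev report.
JOBS = [
    ("change-a", 1, 1.0),
    ("change-a", 3, 1.5),   # e.g. a 3-node multinode job
    ("change-b", 1, 1.0),
    ("change-b", 1, 1.25),
    ("change-c", 3, 1.5),
    ("change-c", 3, 1.5),
    ("change-c", 1, 1.0),
]


def node_hours_per_changeset(jobs):
    """Sum nodes * hours over all jobs, grouped by changeset."""
    totals = {}
    for change, nodes, hours in jobs:
        totals[change] = totals.get(change, 0.0) + nodes * hours
    return totals


totals = node_hours_per_changeset(JOBS)
worst = max(totals.values())
average = mean(totals.values())

print("per changeset:", totals)
print(f"max: {worst:.2f} node hours, mean: {average:.2f} node hours")
```

With one heavy changeset in the sample the maximum sits well above the mean, which is why a max-based figure would look "very biased" for a project whose changesets occasionally trigger several multinode jobs.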
15:17:43 sure
15:17:48 most jobs are ~1 hour
15:17:55 some more
15:18:08 ah, you mean like consumed node time
15:18:10 multinode upgrade jobs might be 4-5 node hours
15:18:17 true
15:18:20 makes sense
15:18:23 nodes * time = node hours
15:18:33 yeah, yeah
15:18:42 i.e. what you would pay for on a cloud
15:18:45 now it makes sense
15:18:58 anyways
15:19:10 I promised I would raise it at this meeting
15:19:14 now let's move on
15:19:15 :p
15:19:21 you are my hero
15:19:47 #link https://docs.google.com/spreadsheets/d/1U_TaLoLTpX7ZC3v9cQsvBfoJtUI7ClAhkx0HYvUaFrM/edit#gid=1217205417
15:19:57 I did some playing with the zuul API
15:20:14 last 1000 jobs across all branches & kolla projects
15:20:35 pivot table is most interesting
15:21:50 I expect we could slice the data more
15:22:07 top offenders - ceph jobs
15:22:28 yes
15:22:57 though I wonder if it's simply because of using ceph
15:23:04 or because we have to bring up and test cinder
15:23:09 interestingly the scenario jobs are generally very low
15:23:22 they run much less often
15:23:33 right
15:23:45 but that does make a good argument against moving them to experimental
15:23:50 Well, all ceph jobs use ceph-ansible to deploy; it takes a reasonable amount of time to deploy even a tiny ceph cluster
15:23:55 assuming 1000 jobs is enough data
15:24:17 mnasiadka: multinode is the main killer - immediate 3x
15:24:23 yeah, but
15:24:28 compare with multinode ipv6
15:25:11 about 75% penalty on ceph jobs
15:25:27 but in them we do ceph and cinder
15:25:31 yes, but significant extra coverage
15:25:50 yeah, but I do wonder if we switch to cephadm
15:25:58 or perhaps to a custom way
15:26:07 (we don't really do a production cluster anyway)
15:26:14 then we can compare once we switch, I have time to work on that now - so we'll know next week I guess.
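The spreadsheet pivot built from the last 1000 Zuul jobs can be approximated in a few lines of Python. This is a hypothetical sketch, not the script used for the spreadsheet: it assumes the builds were already fetched and reduced to dicts with a job name, a node count and a duration in seconds. The job names and numbers are invented, and "nodes" is an assumed field rather than something guaranteed to be present in Zuul's build records.

```python
from collections import defaultdict

# Hypothetical build rows, e.g. after fetching recent builds from the Zuul API
# and adding a node count per job. Job names, node counts and durations are
# invented for illustration; "nodes" is an assumed field.
BUILDS = [
    {"job_name": "ceph-multinode", "nodes": 3, "duration": 5400},
    {"job_name": "ceph-multinode", "nodes": 3, "duration": 6300},
    {"job_name": "multinode-ipv6", "nodes": 3, "duration": 3600},
    {"job_name": "multinode-ipv6", "nodes": 3, "duration": 4500},
    {"job_name": "aio", "nodes": 1, "duration": 3600},
    {"job_name": "aio", "nodes": 1, "duration": 2700},
]


def pivot_by_job(builds):
    """Return (job, runs, total node hours, mean node hours), largest total first."""
    runs = defaultdict(int)
    totals = defaultdict(float)
    for build in builds:
        runs[build["job_name"]] += 1
        # duration is in seconds; node hours = nodes * hours
        totals[build["job_name"]] += build["nodes"] * build["duration"] / 3600
    rows = [(job, runs[job], totals[job], totals[job] / runs[job]) for job in totals]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def extra_cost(rows, job, baseline):
    """Relative per-run cost of `job` compared with `baseline` (0.5 == 50% more)."""
    means = {name: mean_hours for name, _, _, mean_hours in rows}
    return means[job] / means[baseline] - 1


rows = pivot_by_job(BUILDS)
for job, count, total, avg in rows:
    print(f"{job:16} runs={count} total={total:.2f} mean={avg:.2f}")
print(f"ceph vs ipv6: {extra_cost(rows, 'ceph-multinode', 'multinode-ipv6'):.0%} extra")
```

Comparing per-run means rather than totals separates "this job is expensive" from "this job runs often", which is the distinction raised about the scenario jobs. The ratio printed here comes from the invented sample, not from the real spreadsheet.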
15:26:20 great
15:26:27 Michal Nasiadka proposed openstack/kolla-ansible master: CI: Move from ceph-ansible to cephadm https://review.opendev.org/c/openstack/kolla-ansible/+/771418
15:26:32 we probably could have even a single node ceph
15:26:44 and then only continue with optimisations we do for k-a ansible
15:26:47 and it wouldn't hurt
15:27:01 should we consider two nodes in multinode jobs?
15:27:02 (if there's a big time difference between deploying on 1 and 3 nodes) :)
15:27:21 mnasiadka: true that
15:27:28 mgoddard: we can evaluate that as well
15:27:31 two nodes? I don't think it's our recommendation in HA - the docs recommend 3 nodes?
15:27:33 mnasiadka: from a resource usage perspective it's 3x
15:27:50 mgoddard, mnasiadka: we can still test clustering correctness
15:27:52 in jobs like mariadb
15:27:56 which are much faster
15:28:18 we could create a simple job that tests mariadb+rabbitmq+anything_else_that_we_need_to_ha_cluster
15:28:29 and avoid multinode in bigger scenarios
15:28:48 we could use some unit testing
15:28:53 if we loop correctly in templates
15:28:57 and things like that
15:29:04 but it is much less likely to break it
15:29:18 than simply break the process of deploying/upgrading clustered resources
15:30:32 we can also avoid multinode job duplicates per distro
15:30:41 as it's less likely that clustering logic breaks between distros
15:30:44 it is good to test 3 nodes. I guess we just need to be careful about when we use it
15:30:49 it's more likely that we break altogether
15:31:06 let's do it in small steps
15:31:26 we could do centos multinode ceph, ubuntu single node ceph
15:31:27 perhaps ditching ceph-ansible will give us a sane boost to performance and reliability
15:31:34 then the reverse for the ceph upgrade jobs
15:31:41 yeah, that works too
15:31:56 we could run ceph upgrade jobs as periodicals, I think they are the main killer :)
15:32:00 if weeknumber is odd then centos. if even then ubuntu
15:32:01 they/those
15:32:16 yup, periodics are another path
15:32:24 hrw: oh, I like it
15:32:27 such a check would not require changing CI scripts
15:32:27 random CI
15:32:35 and then
15:32:40 let's merge it on an odd week
15:32:52 because it fails on even ones
15:34:40 let's try to keep it predictable :)
15:35:47 ok, I think we have some things to try
15:36:11 I'm going to propose restricting the cells job to nova changes
15:37:10 and soon mariadb ones
15:37:13 (for shards)
15:37:19 right
15:37:26 #topic Review requests
15:37:30 Same as last time
15:37:41 Does anyone have a patch they would like to be reviewed?
15:38:26 one patch: https://review.opendev.org/c/openstack/kolla/+/772507
15:38:53 passed zuul, yoctozepto gave +2
15:39:16 Radosław Piliszek proposed openstack/kolla-ansible master: Lint and fix renos https://review.opendev.org/c/openstack/kolla-ansible/+/759370
15:39:53 well, you can w+1 ^
15:40:17 I am a bit overloaded and don't seem to have anything ready proposed at the moment
15:41:07 lgtm
15:41:15 don't +A
15:41:21 linters job is broken
15:41:36 oh
15:41:37 sad
15:41:42 yeah
15:41:45 ansible-lint
15:41:46 Michal Nasiadka proposed openstack/kolla-ansible master: CI: Move from ceph-ansible to cephadm https://review.opendev.org/c/openstack/kolla-ansible/+/771418
15:41:50 argh
15:41:53 worst linter ever
15:41:55 ansible-lint broken again?
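Two of the ideas above are simple selection rules: running the cells job only for nova (and later mariadb) changes, and hrw's half-joking suggestion of picking the distro by week number. A minimal Python sketch of both, assuming hypothetical job names and path regexes; in real Zuul configuration the path filter would be expressed through the job's `files` attribute rather than code like this.

```python
import datetime
import re

# Hypothetical path filters: a job listed here only runs when a change touches
# a matching file. The regexes are illustrative, not kolla-ansible's real
# Zuul configuration.
JOB_FILE_FILTERS = {
    "cells": [r"^ansible/roles/nova", r"^ansible/roles/mariadb"],
}


def job_applies(job, changed_files, filters=JOB_FILE_FILTERS):
    """True if `job` should run for a change touching `changed_files`."""
    patterns = filters.get(job)
    if not patterns:
        return True  # unfiltered jobs run for every change
    return any(re.search(p, path) for p in patterns for path in changed_files)


def distro_for_week(day):
    """Week-parity rotation: odd ISO weeks use CentOS, even weeks Ubuntu."""
    week = day.isocalendar()[1]
    return "centos" if week % 2 == 1 else "ubuntu"


print(job_applies("cells", ["ansible/roles/nova-cell/tasks/deploy.yml"]))
print(job_applies("cells", ["ansible/roles/glance/defaults/main.yml"]))
print(distro_for_week(datetime.date(2021, 2, 10)))  # meeting date, ISO week 6
```

The week-parity rule needs no CI script changes, but it makes a job's distro depend on the calendar date, so the same change can pass one week and fail the next; that is the predictability concern raised in the meeting. The fixed split also suggested (CentOS multinode ceph, Ubuntu single-node ceph, reversed for the upgrade jobs) keeps results reproducible.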
15:42:07 just drop it
15:42:08 mgoddard: add --exclude releasenotes and be done
15:42:14 or make it non-voting
15:42:15 hrw: not required
15:42:27 it's fixed in my patch
15:42:32 mgoddard: ok
15:42:36 let it help us, not hinder us
15:42:42 we put in the effort to get it passing, may as well keep it
15:43:15 we did
15:43:25 I'm going to make https://review.opendev.org/c/openstack/kolla-ansible/+/695432/ my review request this week
15:43:50 that's a review bomb for sure
15:44:27 boom
15:44:29 it conflicts with half the proposed changes as well I see :-)
15:44:36 tough nut
15:44:51 indeed
15:44:52 can we add 'must pass zuul' to this request queue?
15:45:05 mgoddard: eek, balance source in haproxy?
15:45:25 or 'must pass zuul if core wrote' ;D
15:45:43 ok, I have to go. thanks for the meeting
15:45:46 mnasiadka: that's exactly the kind of comment that should be in the review :)
15:45:52 thanks headphoneJames
15:45:55 sorry, hrw
15:46:02 back to sleep headphoneJames
15:46:04 mgoddard: I'll force myself to review that tomorrow ;)
15:46:30 it has been around for a long time and the authors have been quite responsive, helpful, and receptive to change
15:46:44 I'd like to at least merge that patch and adapt it if we need to
15:47:16 mnasiadka: just don't mention SAML ;)
15:47:35 Of course I will, OIDC is not 90% of usage, I would rather claim otherwise :)
15:48:04 I have added my first comment
15:48:07 Anyway, let's not incite religious wars.
15:48:24 XML4life
15:48:38 I never had the ability to read XML files.
15:48:58 they are the best test for patience
15:49:04 It's like obfuscated poetry from an oppressed country.
15:49:23 hahahaha
15:49:26 it stands for XXXX my life
15:49:38 the water I was drinking left me through my nose
15:49:41 thank you very much
15:49:49 hope your computer is dry
15:49:56 drying it
15:50:23 don't drink and kollameet
15:51:03 #topic Wallaby release planning
15:51:07 Around reviews - can somebody look at this: https://review.opendev.org/c/openstack/kolla-ansible/+/774222 ? It handles cases where Neutron does not support OVN native DHCP yet.
15:51:32 wallaby is the cycle we spend on fixing our ci fires
15:51:56 Mar 29 - Apr 02
15:52:13 yoctozepto: every cycle is the cycle we spend on fixing our ci fires
15:52:21 and not only ours
15:52:37 this one feels sadder
15:53:19 I'll take a look
15:53:39 mnasiadka: what are those cases? bare metal?
15:53:55 mgoddard: yes, bare metal
15:54:21 OVN has functionality that would fit, but Neutron seems to be lacking here and there...
15:55:13 hmm, surprised lucasgomes is not on top of it
15:55:27 well, he's the one proposing to use neutron-dhcp-agent )
15:55:29 :)
15:55:32 ah
15:55:55 well there is no better intersection of ironic & ovn knowledge
15:56:20 #topic open discussion
15:56:26 anyone have anything this week?
15:58:23 I was thinking about extending kayobe's seed-custom-containers to overcloud-custom-containers - is that something we want to do?
15:58:47 mnasiadka: I guess I see kolla-ansible as the thing that deploys containers
15:59:07 is there a particular use case in mind?
15:59:48 nothing new from me
15:59:50 well, there are some components that are not covered by kolla-ansible - I can write an external role to do it, or just do ^^
16:00:02 like I don't know... consul and hashicorp vault
16:00:23 I see
16:01:29 it probably needs some thought
16:01:57 for sure, we need to make sure we don't have a naming collision for containers and volumes
16:02:15 which might not be so straightforward to write
16:02:16 we did discuss making some plugin/extension mechanism for kolla-ansible at the PTG
16:02:39 prometheus exporters being the initial use case
16:02:46 but potentially this fits too
16:02:59 yeah, while adding roles from a different path (or using ansible galaxy) is not problematic, we would need to find a way to resolve dependencies/order of running
16:03:27 I think there is a question of when you would want to deploy those things - are they coupled to openstack, or not?
16:03:37 probably you would not want to touch vault too often
16:04:07 well - touching vault means unsealing - so no :)
16:04:08 anyway, meeting over folks
16:04:14 thanks
16:04:16 #endmeeting
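The two open problems raised for an overcloud custom-containers or plugin mechanism, avoiding container and volume name collisions and resolving the order in which extra roles run, map onto a set intersection and a topological sort. This is a hypothetical illustration, not an existing kolla-ansible or kayobe feature: the container, volume and role names and the role dependencies are invented, and it relies on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical names: what an existing kolla-ansible deployment already uses,
# and what a custom role (e.g. for Consul and Vault) wants to create.
EXISTING = {
    "containers": {"mariadb", "rabbitmq", "nova_api"},
    "volumes": {"mariadb", "kolla_logs"},
}
CUSTOM = {
    "containers": {"consul", "vault"},
    "volumes": {"vault_data", "kolla_logs"},
}

# Hypothetical role dependencies: each role maps to the roles that must run first.
ROLE_DEPS = {
    "consul": set(),
    "vault": {"consul"},
    "custom-exporter": {"consul", "vault"},
}


def find_collisions(existing, custom):
    """Return {kind: sorted names} for names the custom roles would reuse."""
    clashes = {}
    for kind, names in custom.items():
        overlap = existing.get(kind, set()) & names
        if overlap:
            clashes[kind] = sorted(overlap)
    return clashes


def run_order(deps):
    """Order roles so dependencies run first; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(deps).static_order())


print(find_collisions(EXISTING, CUSTOM))
print(run_order(ROLE_DEPS))
```

Failing early on any overlap is the simplest policy; a real mechanism would also need to decide whether custom roles run as part of the normal deploy or separately, which matters for something like Vault that has to be unsealed after every restart.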