#openstack-kolla log

15:01:22 <mgoddard> #startmeeting kolla
15:01:23 <openstack> Meeting started Wed Feb 10 15:01:22 2021 UTC and is due to finish in 60 minutes.  The chair is mgoddard. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:26 <openstack> The meeting name has been set to 'kolla'
15:01:44 <mgoddard> #topic rollcall
15:01:48 <mgoddard> \o
15:01:51 <yoctozepto> *o*
15:02:04 <openstackgerrit> Michal Nasiadka proposed openstack/kolla-ansible master: CI: Move from ceph-ansible to cephadm  https://review.opendev.org/c/openstack/kolla-ansible/+/771418
15:02:10 <hrw> Io/
15:03:06 <openstackgerrit> Mark Goddard proposed openstack/kolla-ansible master: CI: fix kolla-ansible installation after cryptography 3.4 release  https://review.opendev.org/c/openstack/kolla-ansible/+/774622
15:04:21 <mgoddard> #topic agenda
15:04:25 <mgoddard> * Roll-call
15:04:26 <mgoddard> * Announcements
15:04:28 <mgoddard> * Review action items from the last meeting
15:04:30 <mgoddard> * CI status
15:04:32 <mgoddard> * CI resources & performance
15:04:34 <mgoddard> ** http://lists.openstack.org/pipermail/openstack-discuss/2021-February/020235.html
15:04:36 <mgoddard> ** https://docs.google.com/spreadsheets/d/1U_TaLoLTpX7ZC3v9cQsvBfoJtUI7ClAhkx0HYvUaFrM/edit#gid=1217205417
15:04:38 <mgoddard> * Review requests
15:04:40 <mgoddard> * Wallaby release planning
15:04:42 <mgoddard> #topic announcements
15:04:46 <mgoddard> Nothing from me. Anyone else?
15:05:20 <hrw> no
15:05:39 <mgoddard> #topic Review action items from the last meeting
15:05:54 <mgoddard> hrw to document allowed to fail & unbuildable images in contributor docs
15:05:56 <mgoddard> hrw to build libvirt 7.0.0 for Debian
15:06:00 <hrw> https://review.opendev.org/c/openstack/kolla/+/774618
15:06:01 <mgoddard> both proposed
15:06:05 <hrw> https://review.opendev.org/c/openstack/kolla/+/774351
15:06:35 <hrw> allowed to fail are not documented yet - have to find a place to put it
15:07:08 <mgoddard> true
15:07:26 <mgoddard> #topic CI status
15:07:30 <mgoddard> RED
15:07:49 <mgoddard> https://review.opendev.org/c/openstack/kolla-ansible/+/774622 is the WIP fix
15:08:50 <hrw> train is RED too. https://review.opendev.org/c/openstack/kolla/+/774602 fixes sensu/ubuntu but bifrost is broken for centos 7/8
15:09:10 <hrw> ussuri is RED on aarch64 - https://review.opendev.org/c/openstack/kolla/+/774493 fixes it
15:10:45 <mgoddard> #action mgoddard fix bifrost on Train
15:12:22 <mgoddard> anything else for CI?
15:13:01 <hrw> probably that's all
15:13:10 <mgoddard> #topic CI resources & performance
15:13:21 <mgoddard> There is a thread on openstack-discuss
15:13:27 <mgoddard> #link http://lists.openstack.org/pipermail/openstack-discuss/2021-February/020235.html
15:13:35 <mgoddard> about CI resource usage
15:13:41 <mgoddard> kolla is in the top 5
15:14:03 <mgoddard> although way behind tripleo
15:14:05 <mgoddard> Project     % of total  Node Hours  Nodes
15:14:07 <mgoddard> ------------------------------------------
15:14:09 <mgoddard> 1. TripleO    38%       31 hours     20
15:14:11 <mgoddard> 2. Neutron    13%       38 hours     32
15:14:13 <mgoddard> 3. Nova       9%        21 hours     25
15:14:15 <mgoddard> 4. Kolla      5%        12 hours     18
15:14:17 <mgoddard> 5. OSA        5%        22 hours     17
15:14:34 <mnasiadka> but we have less node hours than OSA
15:14:50 <hrw> were we asked to lower usage?
15:14:57 <mnasiadka> whatever that means
15:15:13 <mgoddard> node hours is per changeset
15:15:17 <openstackgerrit> Michal Nasiadka proposed openstack/kolla-ansible master: CI: Move from ceph-ansible to cephadm  https://review.opendev.org/c/openstack/kolla-ansible/+/771418
15:15:29 <hrw> mnasiadka: node hours == how much time CI run takes
15:15:42 <hrw> that's how I understood it
15:15:48 <yoctozepto> yeah and I think this was max
15:15:59 <yoctozepto> so it's very biased
15:16:02 <mnasiadka> True, I don't understand what Nodes means, like at the same time?
15:16:24 <mgoddard> how many nodes to run jobs for one changeset
15:16:31 <yoctozepto> again, I think it's max for changeset
15:16:36 <yoctozepto> mhm
15:16:40 <mgoddard> I would expect average
15:16:49 <mgoddard> max for kolla could be huge
15:17:02 <yoctozepto> true that
15:17:10 <yoctozepto> though then nodehours
15:17:13 <yoctozepto> can't be avg
15:17:24 <mgoddard> why not?
15:17:38 <yoctozepto> you think we average 12 hours?
15:17:43 <mgoddard> sure
15:17:48 <mgoddard> most jobs are ~1 hour
15:17:55 <mgoddard> some more
15:18:08 <yoctozepto> ah, you mean like consumed node time
15:18:10 <mgoddard> multinode upgrade jobs might be 4-5 node hours
15:18:17 <yoctozepto> true
15:18:20 <yoctozepto> makes sense
15:18:23 <mgoddard> nodes * time = node hours
15:18:33 <yoctozepto> yeah, yeah
15:18:42 <mgoddard> i.e. what you would pay for on a cloud
15:18:45 <yoctozepto> now it makes sense
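
The "nodes * time = node hours" definition above can be sketched in a few lines of Python. This is only an illustration: the function name and the example job mix are hypothetical, chosen so that a 3-node job running 1.5 hours lands in the "4-5 node hours" range mentioned for multinode upgrade jobs.

```python
def node_hours(jobs):
    """Sum nodes * wall-clock hours over the jobs run for one changeset.

    `jobs` is an iterable of (nodes, hours) pairs. This is the quantity
    you would be billed for on a cloud, not the elapsed time of the run.
    """
    return sum(nodes * hours for nodes, hours in jobs)


# Hypothetical changeset (not real kolla data): eight single-node jobs
# of about one hour each, plus one 3-node job taking 1.5 hours.
example_jobs = [(1, 1.0)] * 8 + [(3, 1.5)]
total = node_hours(example_jobs)
print(f"{total} node hours")
```

With these assumed figures the single multinode job accounts for 4.5 of the 12.5 node hours, which is why the number of nodes per job matters as much as job duration.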
15:18:58 <mgoddard> anyways
15:19:10 <mgoddard> I promised I would raise it at this meeting
15:19:14 <mgoddard> now let's move on
15:19:15 <mgoddard> :p
15:19:21 <yoctozepto> you are my hero
15:19:47 <mgoddard> #link https://docs.google.com/spreadsheets/d/1U_TaLoLTpX7ZC3v9cQsvBfoJtUI7ClAhkx0HYvUaFrM/edit#gid=1217205417
15:19:57 <mgoddard> I did some playing with zuul API
15:20:14 <mgoddard> last 1000 jobs across all branches & kolla projects
15:20:35 <mgoddard> pivot table is most interesting
15:21:50 <mgoddard> I expect we could slice the data more
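
A minimal sketch of the kind of per-job pivot described above, assuming build records shaped like those returned by the Zuul builds API (dicts with a "job_name" and a "duration" in seconds). The sample records, job names, and the nodes-per-job mapping are hypothetical; the actual spreadsheet may have been built differently.

```python
from collections import defaultdict


def node_hours_by_job(builds, nodes_per_job):
    """Aggregate node hours per job name, largest consumers first.

    builds:        iterable of dicts with "job_name" and "duration" (seconds).
    nodes_per_job: mapping of job name -> node count; jobs not listed are
                   assumed to be single-node (an assumption of this sketch).
    """
    totals = defaultdict(float)
    for build in builds:
        duration = build.get("duration")
        if duration is None:  # skipped or aborted builds may lack a duration
            continue
        nodes = nodes_per_job.get(build["job_name"], 1)
        totals[build["job_name"]] += nodes * duration / 3600.0
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data, not taken from the real CI.
sample_builds = [
    {"job_name": "example-aio", "duration": 3600},
    {"job_name": "example-ceph-multinode", "duration": 5400},
    {"job_name": "example-aio", "duration": None},
]
pivot = node_hours_by_job(sample_builds, {"example-ceph-multinode": 3})
for job, hours in pivot:
    print(f"{job}: {hours:.1f} node hours")
```

Slicing further (per branch or per project) would only require grouping on an extra key from each record.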
15:22:07 <yoctozepto> top offenders - ceph jobs
15:22:28 <mgoddard> yes
15:22:57 <yoctozepto> though I wonder if it's simply because of using ceph
15:23:04 <yoctozepto> or because we have to bring up and test cinder
15:23:09 <mgoddard> interestingly the scenario jobs are generally very low
15:23:22 <yoctozepto> they run much less often
15:23:33 <mgoddard> right
15:23:45 <mgoddard> but that does make a good argument against moving them to experimental
15:23:50 <mnasiadka> Well, all ceph jobs use ceph-ansible to deploy, it takes a considerable amount of time to deploy even a tiny ceph cluster
15:23:55 <mgoddard> assuming 1000 jobs is enough data
15:24:17 <mgoddard> mnasiadka: multinode is the main killer - immediate 3x
15:24:23 <yoctozepto> yeah, but
15:24:28 <yoctozepto> compare with multinode ipv6
15:25:11 <yoctozepto> about 75% penalty on ceph jobs
15:25:27 <yoctozepto> but in them we do ceph and cinder
15:25:31 <mgoddard> yes, but significant extra coverage
15:25:50 <yoctozepto> yeah, but I do wonder if we switch to cephadm
15:25:58 <yoctozepto> or perhaps to a custom way
15:26:07 <yoctozepto> (we don't really do a production cluster anyway)
15:26:14 <mnasiadka> then we can compare once we switch, I have time to work on that now - so we'll know next week I guess.
15:26:20 <yoctozepto> great
15:26:27 <openstackgerrit> Michal Nasiadka proposed openstack/kolla-ansible master: CI: Move from ceph-ansible to cephadm  https://review.opendev.org/c/openstack/kolla-ansible/+/771418
15:26:32 <mnasiadka> we probably could have even a single node ceph
15:26:44 <yoctozepto> and then only continue with optimisations we do for k-a ansible
15:26:47 <mnasiadka> and it wouldn't hurt
15:27:01 <mgoddard> should we consider two nodes in multinode jobs?
15:27:02 <mnasiadka> (if there's a big time difference between deploying on 1 and 3 nodes) :)
15:27:21 <yoctozepto> mnasiadka: true that
15:27:28 <yoctozepto> mgoddard: we can evaluate that as well
15:27:31 <mnasiadka> two nodes? I don't think it's our recommendation in HA - the docs recommend 3 nodes?
15:27:33 <mgoddard> mnasiadka: from a resource usage perspective it's 3x
15:27:50 <yoctozepto> mgoddard, mnasiadka: we can still test clustering correctness
15:27:52 <yoctozepto> in jobs like mariadb
15:27:56 <yoctozepto> which are much faster
15:28:18 <yoctozepto> we could create a simple job that tests mariadb+rabbitmq+anything_else_that_we_need_to_ha_cluster
15:28:29 <yoctozepto> and avoid multinode in biggur scenarios
15:28:48 <yoctozepto> we could use some unit testing
15:28:53 <yoctozepto> if we loop correctly in templates
15:28:57 <yoctozepto> and things like that
15:29:04 <yoctozepto> but it is much less likely to break it
15:29:18 <yoctozepto> than simply break the process of deploying/upgrading clustered resources
15:30:32 <yoctozepto> we can also avoid multinode jobs duplicates per distro
15:30:41 <yoctozepto> as it's less likely that clustering logic breaks between distros
15:30:44 <mgoddard> it is good to test 3 nodes. I guess we just need to be careful about when we use it
15:30:49 <yoctozepto> it's more likely that we break altogether
15:31:06 <yoctozepto> let's do it in small steps
15:31:26 <mgoddard> we could do centos multinode ceph, ubuntu single node ceph
15:31:27 <yoctozepto> perhaps ditching ceph-ansible will give us a sane boost to performance and reliability
15:31:34 <mgoddard> then the reverse for the ceph upgrade jobs
15:31:41 <yoctozepto> yeah, that works too
15:31:56 <mnasiadka> we could run ceph upgrade jobs as periodicals, I think they are the main killer :)
15:32:00 <hrw> if weeknumber is odd then centos. if even then ubuntu
15:32:01 <mnasiadka> they/those
15:32:16 <yoctozepto> yup, periodics are another path
15:32:24 <yoctozepto> hrw: oh, I like it
15:32:27 <hrw> such check would not require changing CI scripts
15:32:27 <yoctozepto> random CI
15:32:35 <yoctozepto> and then
15:32:40 <yoctozepto> let's merge it on an odd week
15:32:52 <yoctozepto> because it fails on even ones
15:34:40 <mgoddard> let's try to keep it predictable :)
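
For illustration only, hrw's week-parity idea ("odd week: centos, even week: ubuntu") could be expressed as below. The function name is hypothetical, and the use of ISO week numbers is an assumption; nothing like this was agreed in the meeting.

```python
from datetime import date


def distro_for(day):
    """Pick the distro for multinode jobs from the ISO week number.

    Odd weeks select "centos", even weeks select "ubuntu". The result is
    deterministic for a given date, but a change tested in one week is
    not exercised on the other distro until the following week.
    """
    week = day.isocalendar()[1]  # ISO week number, 1..53
    return "centos" if week % 2 == 1 else "ubuntu"


# The day of this meeting, 2021-02-10, falls in ISO week 6 (even).
print(distro_for(date(2021, 2, 10)))
```

The trade-off raised in the discussion still applies: the same change can pass or fail depending on the week it is tested in, which is the predictability concern.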
15:35:47 <mgoddard> ok, I think we have some things to try
15:36:11 <mgoddard> I'm going to propose restricting the cells job to nova changes
15:37:10 <yoctozepto> and soon mariadb ones
15:37:13 <yoctozepto> (for shards)
15:37:19 <mgoddard> right
15:37:26 <mgoddard> #topic Review requests
15:37:30 <mgoddard> Same as last time
15:37:41 <mgoddard> Does anyone have a patch they would like to be reviewed?
15:38:26 <hrw> one patch: https://review.opendev.org/c/openstack/kolla/+/772507
15:38:53 <hrw> passed zuul, yoctozepto gave +2
15:39:16 <openstackgerrit> Radosław Piliszek proposed openstack/kolla-ansible master: Lint and fix renos  https://review.opendev.org/c/openstack/kolla-ansible/+/759370
15:39:53 <yoctozepto> well, you can w+1 ^
15:40:17 <yoctozepto> I am a bit overloaded and don't seem to have anything ready proposed at the moment
15:41:07 <hrw> lgtm
15:41:15 <mgoddard> don't +A
15:41:21 <mgoddard> linters job is broken
15:41:36 <yoctozepto> oh
15:41:37 <yoctozepto> sad
15:41:42 <mgoddard> yeah
15:41:45 <mgoddard> ansible-lint
15:41:46 <openstackgerrit> Michal Nasiadka proposed openstack/kolla-ansible master: CI: Move from ceph-ansible to cephadm  https://review.opendev.org/c/openstack/kolla-ansible/+/771418
15:41:50 <yoctozepto> argh
15:41:53 <mgoddard> worst linter ever
15:41:55 <mnasiadka> ansible-lint broken again?
15:42:07 <yoctozepto> just drop it
15:42:08 <hrw> mgoddard: add --exclude releasenotes and be done
15:42:14 <yoctozepto> or make it non-voting
15:42:15 <mgoddard> hrw: not required
15:42:27 <mgoddard> it's fixed in my patch
15:42:32 <hrw> mgoddard: ok
15:42:36 <yoctozepto> let it help us, not hinder us
15:42:42 <mgoddard> we put in the effort to get it passing, may as well keep it
15:43:15 <yoctozepto> we did
15:43:25 <mgoddard> I'm going to make https://review.opendev.org/c/openstack/kolla-ansible/+/695432/ my review request this week
15:43:50 <yoctozepto> that's a review bomb for sure
15:44:27 <mgoddard> boom
15:44:29 <yoctozepto> it conflicts with half the proposed changes as well I see :-)
15:44:36 <yoctozepto> tough nut
15:44:51 <mgoddard> indeed
15:44:52 <hrw> can we add 'must pass zuul' to this request queue?
15:45:05 <mnasiadka> mgoddard: eek, balance source in haproxy?
15:45:25 <hrw> or 'must pass zuul if core wrote' ;D
15:45:43 <hrw> ok, I have to go. thanks for meeting
15:45:46 <mgoddard> mnasiadka: that's exactly the kind of comment that should be in the review :)
15:45:52 <mgoddard> thanks headphoneJames
15:45:55 <mgoddard> sorry, hrw
15:46:02 <mgoddard> back to sleep headphoneJames
15:46:04 <mnasiadka> mgoddard: I'll force myself to review that tomorrow ;)
15:46:30 <mgoddard> it has been around for a long time and the authors have been quite responsive, helpful, and receptive to change
15:46:44 <mgoddard> I'd like to at least merge that patch and adapt it if we need to
15:47:16 <mgoddard> mnasiadka: just don't mention SAML ;)
15:47:35 <mnasiadka> Of course I will, OIDC is not 90% of usage, I would rather claim otherwise :)
15:48:04 <yoctozepto> I have added my first comment
15:48:07 <mnasiadka> Anyway, let's not incite religious wars.
15:48:24 <yoctozepto> XML4life
15:48:38 <mnasiadka> I never had the ability to read XML files.
15:48:58 <yoctozepto> they are the best test for patience
15:49:04 <mnasiadka> It's like obfuscated poetry from an oppressed country.
15:49:23 <yoctozepto> hahahaha
15:49:26 <mgoddard> it stands for XXXX my life
15:49:38 <yoctozepto> the water I was drinking left me through my nose
15:49:41 <yoctozepto> thank you very much
15:49:49 <mgoddard> hope your computer is dry
15:49:56 <yoctozepto> drying it
15:50:23 <yoctozepto> don't drink and kollameet
15:51:03 <mgoddard> #topic Wallaby release planning
15:51:07 <mnasiadka> Around reviews - can somebody look at this: https://review.opendev.org/c/openstack/kolla-ansible/+/774222 ? It solves cases when Neutron does not support OVN native DHCP yet.
15:51:32 <yoctozepto> wallaby is the cycle we spend on fixing our ci fires
15:51:56 <mgoddard> Mar 29 - Apr 02
15:52:13 <mgoddard> yoctozepto: every cycle is the cycle we spend on fixing our ci fires
15:52:21 <mnasiadka> and not only ours
15:52:37 <yoctozepto> this one feels sadder
15:53:19 <mgoddard> I'll take a look
15:53:39 <mgoddard> mnasiadka: what are those cases? bare metal?
15:53:55 <mnasiadka> mgoddard: yes, bare metal
15:54:21 <mnasiadka> OVN has functionality that would fit, but Neutron seems to be lacking here and there...
15:55:13 <mgoddard> hmm, surprised lucasgomes is not on top of it
15:55:27 <mnasiadka> well, he's the one proposing to use neutron-dhcp-agent )
15:55:29 <mnasiadka> :)
15:55:32 <mgoddard> ah
15:55:55 <mgoddard> well there is no better intersection of ironic & ovn knowledge
15:56:20 <mgoddard> #topic open discussion
15:56:26 <mgoddard> anyone have anything this week?
15:58:23 <mnasiadka> I was thinking about extending kayobe's seed-custom-containers to overcloud-custom-containers - is that something we want to do?
15:58:47 <mgoddard> mnasiadka: I guess I see kolla-ansible as the thing that deploys containers
15:59:07 <mgoddard> is there a particular use case in mind?
15:59:48 <yoctozepto> nothing new from me
15:59:50 <mnasiadka> well, there are some components, that are not covered by kolla-ansible - I can write an external role to do it, or just do ^^
16:00:02 <mnasiadka> like I don't know... consul and hashicorp vault
16:00:23 <mgoddard> I see
16:01:29 <mgoddard> it probably needs some thought
16:01:57 <mnasiadka> for sure, we need to make sure we don't have a naming collision for containers and volumes
16:02:15 <mnasiadka> which might be not so straightforward to write
16:02:16 <mgoddard> we did discuss making some plugin/extension mechanism for kolla-ansible at the PTG
16:02:39 <mgoddard> prometheus exporters being the initial use case
16:02:46 <mgoddard> but potentially this fits too
16:02:59 <mnasiadka> yeah, while adding roles from a different path (or using ansible galaxy) is not problematic, we would need to find a way to resolve dependencies/order of running
16:03:27 <mgoddard> I think there is a question of when you would want to deploy those things - are they coupled to openstack, or not?
16:03:37 <mgoddard> probably you would not want to touch vault too often
16:04:07 <mnasiadka> well - touching vault means unsealing - so no :)
16:04:08 <mgoddard> anyway, meeting over folks
16:04:14 <mgoddard> thanks
16:04:16 <mgoddard> #endmeeting