15:00:32 <mgoddard> #startmeeting kolla
15:00:32 <openstack> Meeting started Wed Feb 17 15:00:32 2021 UTC and is due to finish in 60 minutes. The chair is mgoddard. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:33 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:36 <openstack> The meeting name has been set to 'kolla'
15:00:44 <mgoddard> #topic rollcall
15:00:51 <yoctozepto> \o/
15:00:51 <mgoddard> \o
15:00:55 <rafaelweingartne> \o
15:01:34 <risson> \o
15:02:32 <hrw> /o\
15:03:00 <hrw> /ō\ even
15:03:45 <mgoddard> #topic agenda
15:03:53 <mgoddard> * Roll-call
15:03:55 <mgoddard> * Announcements
15:03:57 <mgoddard> * Review action items from the last meeting
15:03:59 <mgoddard> * CI status
15:04:01 <mgoddard> * Review requests
15:04:03 <mgoddard> * Keystone federation & HAProxy session stickiness https://review.opendev.org/c/openstack/kolla-ansible/+/695432/56/ansible/roles/keystone/defaults/main.yml
15:04:05 <mgoddard> * Dockerhub pull limits: publish weekly master images? https://review.opendev.org/c/openstack/kolla/+/775995
15:04:07 <mgoddard> * Wallaby release planning
15:04:09 <mgoddard> #topic announcements
15:04:32 <mgoddard> None from me
15:05:49 <mgoddard> #topic Review action items from the last meeting
15:06:07 <mgoddard> mgoddard fix bifrost on Train
15:06:44 <mgoddard> Fixing bifrost itself proved tricky, but there is a part of the bifrost fix that we can apply via config
15:07:12 <mgoddard> I added the fix to https://review.opendev.org/c/openstack/kolla/+/774602, which seems to have worked
15:07:20 <mgoddard> but now there are other issues
15:07:40 <mgoddard> something to do with the elasticsearch 5.x repo
15:07:50 <mgoddard> I'm wondering if it's a mirror sync issue
15:07:55 <yoctozepto> yeah, it fails randomly
15:07:58 <yoctozepto> but weirdly
15:08:03 <mgoddard> it fails every time
15:08:07 <mgoddard> on ubuntu source
15:08:16 <yoctozepto> hmm, but that ubuntu binary built
15:08:34 <yoctozepto> something fishy, I would say
15:08:40 <mgoddard> yes
15:08:49 <mgoddard> retry tomorrow
15:08:52 <yoctozepto> let's leave it be for today
15:08:53 <yoctozepto> yes
15:09:12 <mgoddard> #topic CI status
15:10:00 <mgoddard> Generally looks better
15:10:10 <mgoddard> kolla is failing on Train & earlier due to the aforementioned issues
15:11:33 <mgoddard> #topic Review requests
15:11:47 <mgoddard> Does anyone have a patch they would like to be reviewed?
15:11:54 <risson> Yep! https://review.opendev.org/c/openstack/kolla-ansible/+/772886
15:12:08 <risson> It has been discussed here before between you and Mr_Freezeex
15:12:16 <hrw> https://review.opendev.org/c/openstack/kolla/+/772841 from me (centos 8 stream)
15:12:33 <kevko> yeah, https://review.opendev.org/q/hashtag:%22proxysql%22+(status:open%20OR%20status:merged) :)
15:12:37 <kevko> :D
15:13:31 <hrw> kevko: could you look at this one: https://review.opendev.org/c/openstack/kolla/+/772479?
15:14:00 <kevko> will do
15:15:16 <mgoddard> risson: I've added the review priority +1 label to the patch
15:15:27 <risson> thanks!
15:16:48 <mgoddard> added RP+1 to those
15:16:52 <mgoddard> Anyone else?
15:17:41 <mgoddard> I'm going to request the same as last week,
15:17:43 <mgoddard> https://review.opendev.org/c/openstack/kolla-ansible/+/695432
15:17:46 <mgoddard> keystone federation
15:18:21 <mgoddard> on that topic...
15:18:27 <mgoddard> #topic Keystone federation & HAProxy session stickiness
15:18:34 <mgoddard> #link https://review.opendev.org/c/openstack/kolla-ansible/+/695432/56/ansible/roles/keystone/defaults/main.yml
15:18:45 <mgoddard> rafaelweingartne: hi
15:19:21 <rafaelweingartne> Hello
15:20:08 <risson> We applied that patch on our deployment and we needed the balance source option for session stickiness, as explained by Pedro in his comment
15:20:31 <mgoddard> We have one main point of contention in the keystone federation patch: session stickiness
15:20:41 <mgoddard> the aim here is to talk it out
15:20:49 <mgoddard> argh, Fl1nt isn't here
15:20:49 <rafaelweingartne> Exactly. We explained it a few times to different people, and when Fl1nt asked the same question we probably skipped over it.
15:21:22 <yoctozepto> I saw the explanation, I am buying it
15:21:31 <mgoddard> I would say that he's done quite a good job of explaining himself now, and I haven't seen a decent response yet, although maybe I missed it
15:21:34 <rafaelweingartne> A few days ago Fl1nt explicitly showed what he wanted to address there, which is the sticky session mode being used, not the use of sticky sessions per se
15:22:10 <risson> yes, sticky sessions should be achieved based on the user's cookies, not with `balance source`
15:22:12 <mgoddard> right, I think we're in agreement that stickiness is required
15:22:23 <risson> I'm not sure if HAProxy permits that though
15:22:29 <rafaelweingartne> We do not actually mind changing that; if that had been said, we would have done it.
15:22:35 <rafaelweingartne> risson: we also do not know that
15:22:54 <rafaelweingartne> we started experimenting with some options; we normally only use source, because it is easier :)
15:23:15 <rafaelweingartne> to avoid more problems, what alternatives to source would you guys prefer?
15:24:04 <mgoddard> Fl1nt made a comment on PS57: https://review.opendev.org/c/openstack/kolla-ansible/+/695432/56
15:24:04 <rafaelweingartne> a custom cookie-based sticky session? Session ID? a configurable load balancing mode (least connection/round-robin)?
15:24:15 <risson> there's an rdp-cookie option that can be passed to `balance`, not sure if it is what we're looking for
15:25:07 <rafaelweingartne> yes, that seems to be the implementation Fl1nt prefers
15:25:15 <mgoddard> "It would be better to use roundrobin or leastconn with a session cookie; that would let HAProxy route you to the correct backend if the node you were connected to died or if your client's lease expired."
15:25:17 <mgoddard> "Additionally, there is an extra option that can be used to be more deterministic about how HAProxy chooses the backend for your session: hash-type, which can be set to many options such as consistent / map-based / sdbm, etc. (see the HAProxy doc about that)."
15:25:19 <mgoddard> "We use consistent on our side, but that could be something up to the operators to choose."
15:25:24 <mgoddard> quoting Fl1nt there
15:25:56 <rafaelweingartne> yes
15:26:00 <mgoddard> TBH, balance source is what we use for horizon, so it's not going to make things any worse
15:26:16 <rafaelweingartne> actually, it does not make any difference
15:26:22 <yoctozepto> ^ exactly, mgoddard
15:26:32 <rafaelweingartne> you know, the sticky session is only needed during the authentication phase, to validate the token generated by the IdP
15:26:44 <risson> what was the argument against balance source again?
15:26:45 <yoctozepto> exactly, it should be either short enough to be irrelevant
15:26:53 <rafaelweingartne> that is the moment we need the sticky session; after that, it does not make much difference
15:26:55 <yoctozepto> or slow enough that it needs fixing elsewhere anyhow
15:27:08 <openstackgerrit> Arthur Outhenin-Chalandre proposed openstack/kolla-ansible master: Add `kolla_externally_managed_cert` option https://review.opendev.org/c/openstack/kolla-ansible/+/772886
15:27:15 <yoctozepto> but the problem is obviously that 'balance source' stays with us forever
15:27:23 <yoctozepto> in that token verifications
15:27:27 <yoctozepto> still hit it
15:28:20 <mgoddard> very old blog post with info on using HAProxy to insert cookies: https://www.haproxy.com/blog/load-balancing-affinity-persistence-sticky-sessions-what-you-need-to-know/#session-cookie-setup-by-the-load-balancer
15:28:34 <risson> damn, review priority has been removed from https://review.opendev.org/c/openstack/kolla-ansible/+/772886
15:29:17 <yoctozepto> risson: it's baaaack
15:29:25 <rafaelweingartne> if the node that initiated the authentication dies, the user will get an error when presenting this token to other mod-OIDC instances
15:30:13 <risson> yes, but they can just try again and it'll work, right?
15:30:52 <risson> the proper way of fixing this would be to not use apache2 for authentication, but to have keystone do it and store its state in its db
15:31:26 <mgoddard> how about this for a path forward:
15:31:40 <rafaelweingartne> yes,
15:31:48 <rafaelweingartne> but balance source would do the same
15:31:56 <mgoddard> keep the current patch with balance source, enabled only with federation
15:32:01 <rafaelweingartne> it validates that the node is up before sending the request to the backend
15:32:29 <mgoddard> consider switching to another method for horizon and keystone together, as a follow-up
15:33:02 <rafaelweingartne> the only difference between sticky sessions with balance source and the other methods is a "more optimal" balancing of load between nodes
15:33:32 <rafaelweingartne> considering that we could have one IP (NAT) with many different users
15:34:03 <mgoddard> right
15:34:22 <mgoddard> with a central service such as keystone, that is something worth considering
15:34:58 <mgoddard> thoughts on my suggestion?
15:35:23 <risson> I think that going with balance source is a good idea for now
15:35:28 <yoctozepto> mgoddard: love it
15:36:12 <mgoddard> wonderful
15:36:29 <rafaelweingartne> I like your suggestion
15:36:31 <mgoddard> let's aim to get it merged before the next meeting
15:36:48 <rafaelweingartne> because we have not extensively tested this with other balance methods
15:36:58 <risson> ^ this
15:37:01 <mgoddard> and rafaelweingartne and Pedro can stop pulling their hair out :)
15:37:06 <rafaelweingartne> :)
15:37:24 <rafaelweingartne> we do understand that the patch is huge. I also hate it
15:37:39 <rafaelweingartne> but it was the first load of code to handle federation in Kolla-Ansible
15:37:39 <mnasiadka> so next time make smaller patches :)
15:37:40 <mgoddard> I've seen bigger ;)
15:37:48 <rafaelweingartne> the improvements will be much easier
15:38:11 <mnasiadka> around haproxy balance source - it's a bit of a non-ideal solution, but I guess we can live with it for a while.
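For reference, a minimal haproxy.cfg sketch of the two approaches discussed above. Only one of the two backend definitions would be used at a time, and the backend/server names and addresses are illustrative, not taken from the merged kolla-ansible templates:

    # Approach in the current patch: stick each client to a backend by
    # source IP; hash-type consistent (as Fl1nt suggests) limits how many
    # clients get remapped when a backend server goes down.
    backend keystone_external_back
        balance source
        hash-type consistent
        server ctl1 192.0.2.11:5000 check
        server ctl2 192.0.2.12:5000 check

    # Cookie-based alternative: HAProxy inserts its own session cookie,
    # so many users behind a single NAT IP still spread across backends.
    backend keystone_external_back
        balance roundrobin
        cookie KOLLASRV insert indirect nocache
        server ctl1 192.0.2.11:5000 check cookie ctl1
        server ctl2 192.0.2.12:5000 check cookie ctl2

Note that clients which do not store cookies (most non-browser API clients) would fall back to plain round-robin under the second scheme; the browser-driven WebSSO flow is where the cookie helps.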
15:38:12 <mgoddard> I think the main obstacle is the subject matter, rather than the size of the code
15:38:44 <yoctozepto> I'll re-review this week
15:38:48 <mgoddard> anyway, we have some level of agreement, let's move on
15:38:50 <yoctozepto> but I expect to merge it
15:38:56 <mgoddard> thanks for joining, rafaelweingartne
15:39:12 <mgoddard> #topic Dockerhub pull limits: publish weekly master images?
15:39:21 <mgoddard> #link https://review.opendev.org/c/openstack/kolla/+/775995
15:39:24 <yoctozepto> y not
15:39:26 <rafaelweingartne> awesome, thanks guys
15:39:42 <mgoddard> priteau and I were discussing the pull limit issue
15:40:02 <yoctozepto> it sucks
15:40:25 <mgoddard> what if we publish master images weekly and daily?
15:41:17 <mgoddard> some projects could use the weekly images in CI
15:41:21 <mgoddard> e.g. kayobe
15:41:27 <mgoddard> possibly kolla-ansible
15:42:19 <mgoddard> how often would we get hit by broken images, or blocked by images being out of date?
15:42:59 <priteau> Hard to say. I suppose if we get blocked we could override CI to use daily.
15:43:08 <mgoddard> right
15:43:20 <yoctozepto> I think we need to give ourselves the ability to publish on demand
15:43:22 <mgoddard> well, maybe for broken images
15:43:32 <yoctozepto> we can publish on specific commits we merge
15:43:35 <yoctozepto> fugly but worky
15:43:37 <mgoddard> probably not just for a feature that depends on images
15:43:53 <mgoddard> or we could publish twice weekly
15:44:01 <mgoddard> that could be a better compromise
15:44:05 <yoctozepto> that's getting overly complicated
15:44:17 <mgoddard> not really
15:44:18 <yoctozepto> Sunday feels better
15:44:28 <mnasiadka> or we could build on every deployment; how long is the build?
15:44:40 <mnasiadka> (on master only)
15:44:46 <mgoddard> it just feels wrong
15:44:51 <yoctozepto> feels wrong
15:44:55 <mnasiadka> I think often we are dependent on something failing in the image
15:44:56 <yoctozepto> but might make CI saner
15:44:59 <mnasiadka> and then we're stuck for a week?
15:45:11 <yoctozepto> we don't build all the images
15:45:21 <yoctozepto> but indeed it might be quite a bit of extra work
15:45:23 <mgoddard> well, like yoctozepto said, we'd need an override
15:45:34 <yoctozepto> yeah, we can practice the override
15:45:43 <yoctozepto> empty commits with metadata are pretty cheap
15:46:18 <yoctozepto> we can publish from pipelines other than periodic
15:46:21 <yoctozepto> just not check
15:46:31 <yoctozepto> as it runs untrusted code
15:46:41 <mgoddard> which pipeline would be appropriate?
15:46:46 <yoctozepto> on that note, remember W+1 makes the change trusted
15:46:47 <mgoddard> gate?
15:46:56 <yoctozepto> nope, it should be after gating
15:47:07 <yoctozepto> either post or promote
15:47:30 <yoctozepto> but we should really keep the images built in gate
15:47:34 <yoctozepto> for publishing later
15:47:56 <yoctozepto> gate is technically fine, but we all know we can end up overpublishing :-)
15:48:30 <openstackgerrit> Doug Szumski proposed openstack/kolla-ansible master: Support bypassing Monasca Log API for control plane logs https://review.opendev.org/c/openstack/kolla-ansible/+/776219
15:48:39 <mgoddard> alternatively, we have a nightly publish job that is a noop unless:
15:48:45 <mnasiadka> well, can we publish master to quay.io or github? will it work better?
15:48:53 <mgoddard> * it is one of the selected publishing days
15:49:07 <yoctozepto> mnasiadka: yeah, we could test that as well
15:49:18 <yoctozepto> lots of ideas; need triage :-)
15:49:22 <mgoddard> * or we modify the zuul config to override
15:49:43 <mnasiadka> yoctozepto: I just don't like those zuul dances, because it seems like a lot of work with random success :)
15:50:10 <yoctozepto> mnasiadka: I feel you
15:50:18 <hrw> what is wrong with daily publishing? do we mirror images in CI?
15:50:46 <mgoddard> hrw: new images -> invalidated registry caches -> docker pull -> pull request limit
15:51:05 <mgoddard> hrw: we now do weekly publishing on stable branches, and it has helped a lot
15:51:32 <hrw> can we publish daily to some opendev infra registry?
15:51:38 <hrw> and then use them in CI?
15:52:10 <mgoddard> we have discussed all these solutions before
15:52:24 <mgoddard> the problem is, I don't see anyone putting in time to implement them
15:52:31 <mnasiadka> hrw: that solution is nice, but it requires somebody to work with infra to get it implemented
15:52:39 <mgoddard> so this topic was aiming to be another stop-gap measure
15:52:53 <yoctozepto> yeah
15:52:59 <hrw> k
15:53:13 <mgoddard> we can very easily reduce our publishing frequency
15:53:31 <yoctozepto> so let's do it
15:53:33 <mgoddard> although it does come with gotchas
15:53:37 <mgoddard> as discussed :)
15:53:41 <yoctozepto> and cry* when we get blocked
15:53:44 <yoctozepto> * discuss
15:54:01 <yoctozepto> better than continuous rechecks
15:54:15 <yoctozepto> and now gimme open discussion
15:54:36 <mgoddard> #topic open discussion
15:54:45 <yoctozepto> hrw: I like https://michael-prokop.at/blog/2021/02/16/how-to-properly-use-3rd-party-debian-repository-signing-keys-with-apt/
15:54:52 <yoctozepto> it is essentially what we have in centos
15:55:02 <yoctozepto> and I was wondering once if we could have the same for debuntu
15:55:15 <yoctozepto> so I'm all in
15:55:23 <hrw> yoctozepto: I looked closer into it and we can have it for Debian. Ubuntu uses 3 keys fetched directly from a keyserver, so gnupg is still needed
15:55:49 <yoctozepto> perhaps we can override that as well
15:55:55 <yoctozepto> but a mixed solution is fine for now
15:56:05 <yoctozepto> do it everywhere it's simple
15:56:11 <hrw> yoctozepto: https://paste.centos.org/view/e526b842 is the start of the cleanup
15:56:51 <yoctozepto> ++
15:56:54 <yoctozepto> let it continue
15:58:59 <openstackgerrit> Mark Goddard proposed openstack/kolla master: CI: publish images on a weekly basis https://review.opendev.org/c/openstack/kolla/+/776221
16:01:11 * hrw out
16:01:16 <mgoddard> all done for this week
16:01:18 <mgoddard> thanks
16:01:21 <yoctozepto> thanks
16:01:22 <mgoddard> #endmeeting
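For reference, the weekly publishing proposed in https://review.opendev.org/c/openstack/kolla/+/776221 could take roughly this shape in the Zuul project configuration. This is only a sketch, assuming opendev's periodic-weekly pipeline; the job name is illustrative:

    # Sketch: attach a publish job to the weekly periodic pipeline
    # instead of the daily one (job name illustrative).
    - project:
        periodic-weekly:
          jobs:
            - kolla-publish-ubuntu-source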
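For reference, the signed-by pattern from the blog post linked in the open discussion pins each third-party repository to its own keyring rather than the global apt trust store; the repository URL, suite, and paths below are illustrative:

    # One-time conversion of the vendor's ASCII-armoured key into a
    # binary keyring; this can be done ahead of time, so the image build
    # itself needs no gnupg (the Ubuntu keyserver case hrw mentions is
    # the exception).
    curl -fsSL https://example.org/repo/key.asc \
        | gpg --dearmor -o /usr/share/keyrings/example-archive-keyring.gpg

    # Trust that keyring for this repository only:
    echo "deb [signed-by=/usr/share/keyrings/example-archive-keyring.gpg] https://example.org/repo bullseye main" \
        > /etc/apt/sources.list.d/example.list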