15:00:32 <mgoddard> #startmeeting kolla
15:00:32 <openstack> Meeting started Wed Feb 17 15:00:32 2021 UTC and is due to finish in 60 minutes.  The chair is mgoddard. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:33 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:36 <openstack> The meeting name has been set to 'kolla'
15:00:44 <mgoddard> #topic rollcall
15:00:51 <yoctozepto> \o/
15:00:51 <mgoddard> \o
15:00:55 <rafaelweingartne> \o
15:01:34 <risson> \o
15:02:32 <hrw> /o\
15:03:00 <hrw> /ō\ even
15:03:45 <mgoddard> #topic agenda
15:03:53 <mgoddard> * Roll-call
15:03:55 <mgoddard> * Announcements
15:03:57 <mgoddard> * Review action items from the last meeting
15:03:59 <mgoddard> * CI status
15:04:01 <mgoddard> * Review requests
15:04:03 <mgoddard> * Keystone federation & HAProxy session stickiness https://review.opendev.org/c/openstack/kolla-ansible/+/695432/56/ansible/roles/keystone/defaults/main.yml
15:04:05 <mgoddard> * Dockerhub pull limits: publish weekly master images? https://review.opendev.org/c/openstack/kolla/+/775995
15:04:07 <mgoddard> * Wallaby release planning
15:04:09 <mgoddard> #topic announcements
15:04:32 <mgoddard> None from me
15:05:49 <mgoddard> #topic Review action items from the last meeting
15:06:07 <mgoddard> mgoddard fix bifrost on Train
15:06:44 <mgoddard> Fixing bifrost itself proved tricky, but there is a part of the bifrost fix that we can apply via config
15:07:12 <mgoddard> I added the fix to https://review.opendev.org/c/openstack/kolla/+/774602, which seems to have worked
15:07:20 <mgoddard> but now there are other issues
15:07:40 <mgoddard> something to do with the elasticsearch 5.x repo
15:07:50 <mgoddard> I'm wondering if it's a mirror sync issue
15:07:55 <yoctozepto> yeah, it fails randomly
15:07:58 <yoctozepto> but weirdly
15:08:03 <mgoddard> fails every time
15:08:07 <mgoddard> on ubuntu source
15:08:16 <yoctozepto> hmm, but that ubuntu binary built
15:08:34 <yoctozepto> something fishy I would say
15:08:40 <mgoddard> yes
15:08:49 <mgoddard> retry tomorrow
15:08:52 <yoctozepto> let's leave it be for today
15:08:53 <yoctozepto> yes
15:09:12 <mgoddard> #topic CI status
15:10:00 <mgoddard> Generally looks better
15:10:10 <mgoddard> kolla failing in Train & earlier due to aforementioned issues
15:11:33 <mgoddard> #topic Review requests
15:11:47 <mgoddard> Does anyone have a patch they would like to be reviewed?
15:11:54 <risson> Yep! https://review.opendev.org/c/openstack/kolla-ansible/+/772886
15:12:08 <risson> It has been discussed here before between you and Mr_Freezeex
15:12:16 <hrw> https://review.opendev.org/c/openstack/kolla/+/772841 from me (centos 8 stream)
15:12:33 <kevko> yeah, https://review.opendev.org/q/hashtag:%22proxysql%22+(status:open%20OR%20status:merged)  :)
15:12:37 <kevko> :D
15:13:31 <hrw> kevko: could you look at https://review.opendev.org/c/openstack/kolla/+/772479 one?
15:14:00 <kevko> will
15:15:16 <mgoddard> risson: I've added review priority +1 label to the patch
15:15:27 <risson> thanks!
15:16:48 <mgoddard> added RP+1 to those
15:16:52 <mgoddard> Anyone else?
15:17:41 <mgoddard> I'm going to request the same as last week,
15:17:43 <mgoddard> https://review.opendev.org/c/openstack/kolla-ansible/+/695432
15:17:46 <mgoddard> keystone federation
15:18:21 <mgoddard> on that topic...
15:18:27 <mgoddard> #topic Keystone federation & HAProxy session stickiness
15:18:34 <mgoddard> #link https://review.opendev.org/c/openstack/kolla-ansible/+/695432/56/ansible/roles/keystone/defaults/main.yml
15:18:45 <mgoddard> rafaelweingartne: hi
15:19:21 <rafaelweingartne> Hello
15:20:08 <risson> We applied that patch on our deployment and we needed the balance source option for session stickiness as explained by Pedro in his comment
15:20:31 <mgoddard> We have one main point of contention in the keystone federation patch: session stickiness
15:20:41 <mgoddard> the aim here is to talk it out
15:20:49 <mgoddard> argh, Fl1nt isn't here
15:20:49 <rafaelweingartne> Exactly, we explained it a few times to different people, and when Fl1nt asked the same question we probably just skipped over it.
15:21:22 <yoctozepto> I saw the explanation, I am buying it
15:21:31 <mgoddard> I would say that he's done quite a good job of explaining himself now, and I haven't seen a decent response yet, although maybe I missed it
15:21:34 <rafaelweingartne> A few days ago flint explicitly showed what he wanted to address there, which is the "sticky session mode" that is being used, and not the use of the sticky session per se
15:22:10 <risson> yes, sticky sessions should be achieved based on the user's cookies, not with `balance source`
15:22:12 <mgoddard> right, I think we're in agreement that stickiness is required
15:22:23 <risson> I'm not sure if HAProxy permits that though
15:22:29 <rafaelweingartne> We do not actually mind changing that; if that had been said, we would have done it.
15:22:35 <rafaelweingartne> risson: we also do not know that
15:22:54 <rafaelweingartne> we started experimenting with some options, we normally only use source, because it is easier :)
15:23:15 <rafaelweingartne> to avoid more problems, what alternatives to source would you guys prefer?
15:24:04 <mgoddard> Fl1nt made a comment on PS57: https://review.opendev.org/c/openstack/kolla-ansible/+/695432/56
15:24:04 <rafaelweingartne> custom cookie based sticky session? Session ID? a configurable load balancing mode (least connection/round-robin)?
15:24:15 <risson> there's an rdp-cookie option that can be passed to `balance`, not sure if it is what we're looking for
15:25:07 <rafaelweingartne> yes, that seems to be the implementation Flint prefers
15:25:15 <mgoddard> It would be better to use roundrobin or leastconn with a session cookie; that would let HAProxy route you to the correct backend even if the node you were connected to died or your client's lease expired.
15:25:17 <mgoddard> Additionally, there is an optional hash-type setting that makes the way HAProxy picks the backend for your session more deterministic; it can be set to consistent, map-based, sdbm, etc. (see the HAProxy docs).
15:25:19 <mgoddard> We use consistent on our side but that could be something up to the operators to choose.
15:25:24 <mgoddard> quoting Fl1nt there
15:25:56 <rafaelweingartne> yes
15:26:00 <mgoddard> TBH, balance source is what we use for horizon, so it's not going to be making things any worse
15:26:16 <rafaelweingartne> actually, it does not make any difference
15:26:22 <yoctozepto> ^ exactly mgoddard
15:26:32 <rafaelweingartne> you know, the sticky session is only needed during the authentication phase to validate the token generated by the IdP
15:26:44 <risson> what was the argument against balance source again?
15:26:45 <yoctozepto> exactly, it should be either short enough to be irrelevant
15:26:53 <rafaelweingartne> that is the moment we need the sticky session, after that, it does not make much difference
15:26:55 <yoctozepto> or long enough that it needs fixing elsewhere anyhow
15:27:08 <openstackgerrit> Arthur Outhenin-Chalandre proposed openstack/kolla-ansible master: Add `kolla_externally_managed_cert` option  https://review.opendev.org/c/openstack/kolla-ansible/+/772886
15:27:15 <yoctozepto> but the problem is obviously that 'balance source' stays with us forever
15:27:23 <yoctozepto> in that token verifications
15:27:27 <yoctozepto> still hit it
15:28:20 <mgoddard> very old blog with info on using haproxy to insert cookies: https://www.haproxy.com/blog/load-balancing-affinity-persistence-sticky-sessions-what-you-need-to-know/#session-cookie-setup-by-the-load-balancer
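A minimal HAProxy sketch of the two stickiness approaches under discussion; the backend names, server names, and addresses are illustrative placeholders, not the actual kolla-ansible template output:

```
# What the patch (and horizon) use today: source-IP stickiness.
# All requests from a given client IP land on the same backend, which
# over-concentrates load when many users share one NAT'd IP.
backend keystone_external_back
    balance source
    hash-type consistent          # optional, per Fl1nt's comment above
    server ctl1 192.0.2.11:5000 check
    server ctl2 192.0.2.12:5000 check

# The cookie-based alternative from the blog linked above: HAProxy
# inserts a cookie naming the chosen server, so stickiness follows the
# client's session rather than its source IP.
backend keystone_external_back_cookie
    balance roundrobin
    cookie KOLLA_SRV insert indirect nocache
    server ctl1 192.0.2.11:5000 check cookie ctl1
    server ctl2 192.0.2.12:5000 check cookie ctl2
```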
15:28:34 <risson> damn, review priority has been removed from https://review.opendev.org/c/openstack/kolla-ansible/+/772886
15:29:17 <yoctozepto> risson: it's baaaack
15:29:25 <rafaelweingartne> if the node that initiated the authentication dies, the user will get an error when presenting this token to other mod-OIDC instances
15:30:13 <risson> yes, but they can just try again and it'll work, right?
15:30:52 <risson> the proper way of fixing this would not be using apache2 for authentication, but keystone doing it and storing its state in its db
15:31:26 <mgoddard> how about this for a path forward:
15:31:40 <rafaelweingartne> yes,
15:31:48 <rafaelweingartne> but source-balance would do the same
15:31:56 <mgoddard> keep the current patch with balance source, enabled only with federation
15:32:01 <rafaelweingartne> it checks that the node is up before sending the request to the backend
15:32:29 <mgoddard> consider switching to another method for horizon and keystone together, as a follow up
15:33:02 <rafaelweingartne> the only difference between sticky session with source-balance and the others is the "more optimal" balance of load between nodes
15:33:32 <rafaelweingartne> considering that we could have one IP (NAT) with many different users
15:34:03 <mgoddard> right
15:34:22 <mgoddard> with a central service such as keystone that is something worth considering
15:34:58 <mgoddard> thoughts on my suggestion?
15:35:23 <risson> I think that going with balance source is a good idea for now
15:35:28 <yoctozepto> mgoddard: love it
15:36:12 <mgoddard> wonderful
15:36:29 <rafaelweingartne> I like your suggestion
15:36:31 <mgoddard> let's aim to get it merged before the next meeting
15:36:48 <rafaelweingartne> because we have not extensively tested this with other balance methods
15:36:58 <risson> ^ this
15:37:01 <mgoddard> and rafaelweingartne and Pedro can stop pulling their hair out :)
15:37:06 <rafaelweingartne> :)
15:37:24 <rafaelweingartne> we do understand that the patch is huge. I also hate it
15:37:39 <rafaelweingartne> but, it was the first load of code to handle federation in Kolla-ansible
15:37:39 <mnasiadka> so next time make smaller patches :)
15:37:40 <mgoddard> I've seen bigger ;)
15:37:48 <rafaelweingartne> the improvements will be much easier
15:38:11 <mnasiadka> around haproxy balance source - that's a bit of a non-ideal solution, but I guess we can live with it for a while.
15:38:12 <mgoddard> I think the main obstacle is the subject matter, rather than the size of the code
15:38:44 <yoctozepto> I'll re-review this week
15:38:48 <mgoddard> anyways, we have some level of agreement, let's move on
15:38:50 <yoctozepto> but I expect to merge it
15:38:56 <mgoddard> thanks for joining rafaelweingartne
15:39:12 <mgoddard> #topic Dockerhub pull limits: publish weekly master images?
15:39:21 <mgoddard> #link https://review.opendev.org/c/openstack/kolla/+/775995
15:39:24 <yoctozepto> y not
15:39:26 <rafaelweingartne> awesome thanks guys
15:39:42 <mgoddard> priteau and I were discussing the pull limit issue
15:40:02 <yoctozepto> it sucks
15:40:25 <mgoddard> what if we publish master images weekly and daily?
15:41:17 <mgoddard> some projects could use the weekly images in CI
15:41:21 <mgoddard> e.g. kayobe
15:41:27 <mgoddard> possibly kolla-ansible
15:42:19 <mgoddard> how often would we get hit by broken images, or blocked by images being out of date?
15:42:59 <priteau> Hard to say. I suppose if we get blocked we could override CI to use daily.
15:43:08 <mgoddard> right
15:43:20 <yoctozepto> I think we need to add ourselves the ability to publish on demand
15:43:22 <mgoddard> well, maybe for broken images
15:43:32 <yoctozepto> we can publish on specific commits we merge
15:43:35 <yoctozepto> fugly but worky
15:43:37 <mgoddard> probably not just for a feature that depends on images
15:43:53 <mgoddard> or we could publish twice-weekly
15:44:01 <mgoddard> that could be a better compromise
15:44:05 <yoctozepto> that's getting overly complicated
15:44:17 <mgoddard> not really
15:44:18 <yoctozepto> Sunday feels better
15:44:28 <mnasiadka> or we could build on every deployment, how long is the build?
15:44:40 <mnasiadka> (on master only)
15:44:46 <mgoddard> it just feels wrong
15:44:51 <yoctozepto> feels wrong
15:44:55 <mnasiadka> I think often we are dependent on something failing in the image
15:44:56 <yoctozepto> but might make CI saner
15:44:59 <mnasiadka> and then we're stuck for a week?
15:45:11 <yoctozepto> we don't build all the images
15:45:21 <yoctozepto> but indeed it might be quite a bit of extra work
15:45:23 <mgoddard> well, like yoctozepto said we'd need an override
15:45:34 <yoctozepto> yeah, we can practice the override
15:45:43 <yoctozepto> empty commits with metadata are pretty cheap
15:46:18 <yoctozepto> we can publish from other pipelines than periodic
15:46:21 <yoctozepto> just not check
15:46:31 <yoctozepto> as it runs untrusted code
15:46:41 <mgoddard> which pipeline would be appropriate?
15:46:46 <yoctozepto> on that note, remember W+1 makes the change trusted
15:46:47 <mgoddard> gate?
15:46:56 <yoctozepto> nope, it should be after gating
15:47:07 <yoctozepto> either post or promote
15:47:30 <yoctozepto> but we should really keep the images built in gate
15:47:34 <yoctozepto> for publishing later
15:47:56 <yoctozepto> gate is technically fine but we all know we can end up overpublishing :-)
15:48:30 <openstackgerrit> Doug Szumski proposed openstack/kolla-ansible master: Support bypassing Monasca Log API for control plane logs  https://review.opendev.org/c/openstack/kolla-ansible/+/776219
15:48:39 <mgoddard> alternatively we have a nightly publish job that is a noop unless (sketched below):
15:48:45 <mnasiadka> well, can we publish master to quay.io or github? will it work better?
15:48:53 <mgoddard> * it is one of the selected publishing days
15:49:07 <yoctozepto> mnasiadka: yeah, we could test that as well
15:49:18 <yoctozepto> lots of ideas; need triage :-)
15:49:22 <mgoddard> * or we modify zuul config to override
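A hypothetical shell guard for such a job; PUBLISH_DAYS and FORCE_PUBLISH are invented names for illustration, not existing kolla CI variables:

```sh
#!/bin/bash
# Run at the top of the nightly publish job: exit early unless today is
# a designated publishing day or an override forces a publish.
PUBLISH_DAYS="${PUBLISH_DAYS:-7}"       # ISO weekday numbers; 7 = Sunday
FORCE_PUBLISH="${FORCE_PUBLISH:-false}" # flip via a zuul config change

today=$(date -u +%u)
if [[ "${FORCE_PUBLISH}" != "true" && " ${PUBLISH_DAYS} " != *" ${today} "* ]]; then
    echo "Day ${today} is not a publishing day; skipping image push."
    exit 0
fi
echo "Publishing images..."
```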
15:49:43 <mnasiadka> yoctozepto: I just don't like those zuul dances, because it seems like a lot of work with random success :)
15:50:10 <yoctozepto> mnasiadka: i feel you
15:50:18 <hrw> what is wrong with each-day publish? do we mirror images on CI?
15:50:46 <mgoddard> hrw: new images -> invalidated registry caches -> docker pull -> pull request limit
15:51:05 <mgoddard> hrw: we now do weekly publishing on stables, and it has helped a lot
15:51:32 <hrw> can we publish daily to some opendev infra registry?
15:51:38 <hrw> and then use them on CI?
15:52:10 <mgoddard> we have discussed all these solutions before
15:52:24 <mgoddard> the problem is, I don't see anyone putting in time to implement them
15:52:31 <mnasiadka> hrw: and that solution is nice, but requires somebody to work with infra to get this implemented
15:52:39 <mgoddard> so this topic was aiming to be another stop-gap measure
15:52:53 <yoctozepto> yeah
15:52:59 <hrw> k
15:53:13 <mgoddard> we can very easily reduce our publishing frequency
15:53:31 <yoctozepto> so let's do it
15:53:33 <mgoddard> although it does come with gotchas
15:53:37 <mgoddard> as discussed :)
15:53:41 <yoctozepto> and cry* when we get blocked
15:53:44 <yoctozepto> * discuss
15:54:01 <yoctozepto> better than continuous rechecks
15:54:15 <yoctozepto> and now gimme open discussion
15:54:36 <mgoddard> #topic open discussion
15:54:45 <yoctozepto> hrw: I like https://michael-prokop.at/blog/2021/02/16/how-to-properly-use-3rd-party-debian-repository-signing-keys-with-apt/
15:54:52 <yoctozepto> it is essentially what we have in centos
15:55:02 <yoctozepto> and I was wondering once if we could have the same for debuntu
15:55:15 <yoctozepto> so I'm all in
15:55:23 <hrw> yoctozepto: I looked closer into it and we can have it for Debian. Ubuntu fetches 3 keys directly from a keyserver, so gnupg is still needed
15:55:49 <yoctozepto> perhaps we can override that as well
15:55:55 <yoctozepto> but a mixed solution is fine for now
15:56:05 <yoctozepto> do it everywhere it's simple
15:56:11 <hrw> yoctozepto: https://paste.centos.org/view/e526b842 is start of cleanup
15:56:51 <yoctozepto> ++
15:56:54 <yoctozepto> let it continue
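The pattern from the linked post, sketched with placeholder URLs and filenames (not the actual Kolla Dockerfile content): fetch the third-party signing key into its own keyring file and reference it with signed-by, instead of adding it to the system-wide trust via apt-key:

```sh
# Download the repository signing key to a dedicated keyring; modern
# apt accepts armored (.asc) keys in signed-by directly.
curl -fsSL https://example.org/repo/signing-key.asc \
    -o /usr/share/keyrings/example-archive-keyring.asc

# Reference the keyring explicitly so the key is trusted for this
# repository only.
echo "deb [signed-by=/usr/share/keyrings/example-archive-keyring.asc] https://example.org/debian bullseye main" \
    > /etc/apt/sources.list.d/example.list
```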
15:58:59 <openstackgerrit> Mark Goddard proposed openstack/kolla master: CI: publish images on a weekly basis  https://review.opendev.org/c/openstack/kolla/+/776221
16:01:11 * hrw out
16:01:16 <mgoddard> all done for this week
16:01:18 <mgoddard> thanks
16:01:21 <yoctozepto> thanks
16:01:22 <mgoddard> #endmeeting