15:01:18 #startmeeting kolla
15:01:20 Meeting started Wed Dec 16 15:01:18 2020 UTC and is due to finish in 60 minutes. The chair is mgoddard. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:21 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:23 The meeting name has been set to 'kolla'
15:01:39 #topic rollcall
15:01:41 \o
15:01:47 o/
15:01:48 o/
15:02:29 \o
15:04:22 #topic agenda
15:05:02 * Roll-call
15:05:04 * Announcements
15:05:06 * Review action items from the last meeting
15:05:08 * CI status
15:05:10 * Victoria release planning
15:05:12 * Dockerhub pull rate limits https://etherpad.opendev.org/p/docker-pull-limits
15:05:14 * CentOS 8.3 & stream https://lists.centos.org/pipermail/centos-devel/2020-December/075451.html
15:05:16 * Cinder active/active https://bugs.launchpad.net/kolla-ansible/+bug/1904062
15:05:17 Launchpad bug 1904062 in kolla-ansible wallaby "external ceph cinder volume config breaks volumes on ussuri upgrade" [High,In progress] - Assigned to Michal Nasiadka (mnasiadka)
15:05:18 * Wallaby PTG actions
15:05:20 #topic announcements
15:05:28 I suppose we should cancel some meetings
15:05:57 the next two?
15:05:57 for the next 3 weeks I guess, 6th Jan is a designated holiday in Poland
15:06:02 oh ok
15:06:03 ++
15:06:22 I am running masakari's on the 22nd
15:06:29 but yeah, the 23rd with kolla might be a bit late
15:06:31 #info The next 3 meetings are cancelled
15:06:59 #action mgoddard send email to openstack-discuss about meeting cancellations
15:07:00 ok
15:07:00 2020-01-13 is the next kolla meeting
15:07:17 happy new year
15:07:19 2021-01-13 *
15:07:26 happy new year! :D
15:07:34 Any others?
15:07:46 we got our CI back on track
15:07:55 #topic CI status
15:08:11 * yoctozepto sends off fireworks
15:08:15 hooray
15:08:42 thank you to everyone involved in firefighting over the last few weeks
15:09:03 we're not done yet - kayobe is still busted
15:09:23 I had it passing yesterday, now there are two new failures
15:09:27 ah, yeah
15:09:29 :O
15:09:33 that went quick
15:09:42 what kind of failures?
15:09:44 well, passing in review anyway
15:09:56 bifrost changed a default
15:10:00 oh gosh
15:10:05 and some weird ironic locking issue
15:10:19 :-(
15:11:01 * hrw is in 2 meetings at the same time
15:11:01 anyway, let's go over the CI status
15:11:20 centos8-ceph-upgrade jobs seem to be retried 3 times only to fail in some weird way
15:11:23 still seeing it?
15:12:09 yes
15:12:16 obviously not always
15:13:06 https://zuul.openstack.org/builds?job_name=kolla-ansible-centos8-source-upgrade-ceph-ansible&project=openstack%2Fkolla-ansible&branch=master
15:13:28 a SUCCESS is rare
15:14:02 fwiw, ubuntu ain't looking any better https://zuul.openstack.org/builds?job_name=kolla-ansible-ubuntu-source-upgrade-ceph-ansible&project=openstack%2Fkolla-ansible&branch=master
15:14:06 wonderful
15:14:19 well, I suppose DISK_FULL is kinda success
15:14:35 I like your optimism
15:15:46 guys, I have probably finished the refactor and completed the switch from haproxy to maxscale; everyone who wants to check a working maxscale can log in here
15:15:47 http://185.21.197.26:8990/#/dashboard/servers
15:16:01 just ask for the password in a private message ..
15:16:12 kevko: meeting time
15:16:15 yup, these recent ones still failed on "Ensuring config directories exist"
15:17:25 let's move on from CI.
At least we're in a better position than before, even if we have loose ends
15:17:35 #topic Victoria release planning
15:17:40 ubuntu does not exhibit that particular behaviour
15:17:59 all right, Victoria not so victorious
15:18:14 an end-of-year release is not looking too likely
15:18:25 we're still basically blocked on the cinder issue
15:18:28 mnasiadka: any progress?
15:18:29 yup, super sad this time
15:19:13 mgoddard: I have a test env, testing the migration from non-cluster to cluster, should have something to push this week
15:19:13 perhaps we need to make a call about whether to release without the cinder fix
15:19:53 backend_host probably won't take too much time, it's not much different - we just have one cinder-volume service instead of 3 or more
15:20:21 mgoddard: well, we could release with an issue-type reno, but it won't look very good :)
15:20:56 no, but it's no worse than what we have already
15:21:07 ++
15:21:51 that doesn't mean we should take the pressure off fixing the issue
15:21:59 ++
15:22:17 vote: release victoria without the cinder active/active fix?
15:23:12 if I don't push any updates to the change tomorrow - I'll make the releases change myself, deal? :)
15:23:31 mnasiadka: otherwise you'll make mgoddard do them? :P
15:23:35 unless we want to test the migration in CI
15:23:42 then it will take more time for sure
15:24:48 let's release with an issue reno
15:24:55 ussuri is already "broken"
15:24:59 mention that
15:25:11 it is not a *new issue*
15:25:34 and it can be worked around if you just control the process yourself
15:25:41 which might be the case with an external ceph ;-)
15:27:30 mgoddard, mnasiadka: decisions, decisions
15:27:34 ugh
15:27:50 let's give mnasiadka a day or two
15:27:59 this week indeed
15:28:16 I might think about extending CI testing
15:28:27 we may not even have the right people around to get a release approved before 2021
15:28:37 that's true as well
15:28:43 #topic Dockerhub pull rate limits https://etherpad.opendev.org/p/docker-pull-limits
15:28:50 oh noez
15:29:03 it haunts me in my nightmares
15:29:16 you have reached your dream limit
15:30:05 http 429 no sweet dreams for you
15:30:21 sooo
15:30:22 lol
15:30:31 are we pushing towards an internal registry?
15:30:40 I've crossed out docker devil pact option 2
15:30:41 because I have no idea what to do about this
15:30:59 so, mostly k-a jobs suffer?
15:31:02 which leaves switching to another devil, or using the infra registry
15:31:14 mnasiadka: k too
15:31:27 so, in kolla's case we pull only one image from docker hub, right?
15:31:30 maybe let's try another devil?
15:31:43 mnasiadka: yeah, 1 per distro version
15:31:53 and we fail with that, that's some crazy stuff
15:32:10 because we kill it with k-a jobs
15:32:30 ok, so option 1: another devil
15:32:33 or, well
15:32:43 we are not using a registry mirror in kolla
15:32:52 when building those images
15:33:02 they seem to be coming directly from the dockerhub
15:33:07 maybe that is where our issue is
15:33:08 I don't think centos:8, ubuntu:something and debian:devilish_number change a lot
15:33:21 they don't
15:33:25 if we switch to another registry, I don't know if we get the caching or the mirror
15:33:31 so we could at least cross out those failures by using a cache
15:33:52 mgoddard, mnasiadka: could you confirm that in k we are not using any mirror at the moment
15:33:57 do we know how many pull credits we use per standard k-a CI job?
15:34:15 mnasiadka: 20-30
15:34:59 *(mirror meaning caching proxy)
15:35:57 Michal Arbet proposed openstack/kolla-ansible master: Add maxscale support for database https://review.opendev.org/c/openstack/kolla-ansible/+/767370
15:37:26 Michal Arbet proposed openstack/kolla-ansible master: Add maxscale support for database https://review.opendev.org/c/openstack/kolla-ansible/+/767370
15:37:29 yoctozepto: 20-30 is not bad, we have 200 pulls per 6 hours, right?
15:37:31 or 100?
15:37:35 100
15:37:42 it's not enough
15:37:47 not enuff
15:38:03 ok, key question
15:38:21 so, if we disable using the cache in k-a jobs, and add a check in pre to verify we have enough pulls for the current host, is that something we could live with?
15:38:23 if I pull from github's registry using a registry mirror, will it cache the image?
15:39:04 mgoddard: good question, I guess it should - but maybe we are both wrong in that assumption :) needs testing
15:39:19 I remember reading it doesn't
15:40:47 you mean pull-through?
15:40:51 yes
15:41:33 I also think it only caches the primary one
15:41:38 might be worth gooling
15:41:41 googling*
15:41:42 let me see,
15:42:14 the standard one doesn't
15:42:21 but there is stuff like https://hub.docker.com/r/tiangolo/docker-registry-proxy
15:42:28 or probably something else we could use
15:42:59 Gotcha
15:43:01 It's currently not possible to mirror another private registry. Only the central Hub can be mirrored.
15:44:11 https://github.com/docker/distribution/blob/master/ROADMAP.md#proxying-to-other-registries
15:44:53 mnasiadka: the thing is: infra has to
15:45:14 yoctozepto: I know, that's why we need to think of something we can do now
15:45:17 yeah
15:45:20 and they already have their hands full with the new great gerrit
15:45:54 I think we should add a cache to k
15:46:01 as far as I can see it is not there
15:46:07 I wanted you to cross-validate me
15:46:23 what do you mean?
15:46:25 and judging by the amount of storage our images need - it might be complicated to find a registry local to nodepool providers :)
15:46:51 no hits for registry-mirrors in kolla job plays
15:47:12 oh, I see
15:47:12 kolla-ansible yes
15:47:14 kolla not
15:47:18 it would help for some build failures
15:47:28 mgoddard: now you know why I thought what I thought ;-)
15:48:03 but while it might save the build jobs, the deploy jobs are likely to fail
15:48:24 yeah, at least we can fix kolla for now - it's an easy step
15:48:31 yes, I thought this
15:48:36 we can move to another registry
15:48:41 but we still need to fetch
15:48:44 distro images
15:48:49 so cache in kolla
15:48:51 and that is enough
15:48:54 for kolla
15:49:08 then move publishing to quay
15:49:13 or something
15:49:16 and why is quay better?
15:49:21 higher limits?
15:49:26 because it has only burst limits
15:49:40 if we move to another registry we can no longer use the registry mirror, and that will increase external traffic significantly
15:49:42 if you do too many reqs per sec
15:49:45 but it has no mirror
15:49:48 so it is tricky
15:49:49 indeed
15:50:18 so, in reality the zuul pull through cache is having going over pull limits?
15:50:35 uh, english is complicated
15:50:46 yes, rephrase pretty please
15:50:53 yes, all pulls come from the caches
15:51:01 in k-a jobs
15:51:14 and the caches have one IP, so it's no problem for them to use up the quota
15:51:27 are we sure that disabling the cache is not a better solution? :)
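(The "add a cache to k[olla]" idea discussed above amounts to pointing the Docker daemon on the build nodes at a pull-through caching proxy via the daemon's registry-mirrors option. The snippet below is only a minimal sketch of that, assuming a hypothetical mirror URL; in the real CI the equivalent change would live in the Zuul/Ansible job plays rather than a script like this. Note too that, per the Docker docs quoted above at 15:43:01, such a pull-through mirror only caches Docker Hub itself, which is why it would not directly help if publishing moved to another registry such as quay.io.)

#!/usr/bin/env python3
"""Sketch: add a pull-through registry mirror to the local Docker daemon."""
import json
from pathlib import Path

DAEMON_JSON = Path("/etc/docker/daemon.json")
# Hypothetical pull-through cache URL; each nodepool provider would have its own.
MIRROR_URL = "http://mirror.example.opendev.org:8082"


def add_registry_mirror(mirror: str) -> None:
    """Merge a registry mirror entry into the Docker daemon config."""
    # Load the existing daemon config, if any, and append the mirror.
    config = json.loads(DAEMON_JSON.read_text()) if DAEMON_JSON.exists() else {}
    mirrors = config.setdefault("registry-mirrors", [])
    if mirror not in mirrors:
        mirrors.append(mirror)
    DAEMON_JSON.write_text(json.dumps(config, indent=2) + "\n")
    # The daemon must be restarted afterwards (e.g. systemctl restart docker)
    # so that pulls of base images like centos:8 go through the cache.


if __name__ == "__main__":
    add_registry_mirror(MIRROR_URL)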
15:51:35 one IP addr per cloud, more or less
15:51:50 we are pretty sure regarding the external traffic
15:52:00 well, not better in terms of traffic, but better in terms of pull quotas
15:53:02 hard to say without trying it
15:53:14 it depends on how nodepool is set up
15:53:34 if all nodes are in a single project using one router, they'll all use the same IP anyway
15:53:42 indeed
15:54:13 well then it's even worse, if they all share one external IP via SNAT
15:54:18 yes
15:54:52 so the only way forward is deploying a registry in each nodepool provider and trying to sync them (or check for images older than X and build them)
15:55:24 or cache quay
15:55:33 or... check the usage of the docker pull quota and decide if we need to build the images, or just pull them (if we have the same IP as the pull-through cache)
15:55:45 (docker exposes the number of pull requests left in an API)
15:55:45 I wonder
15:55:54 if we are not better off rebuilding the images each time now :P
15:56:02 it's a bit racy with multiple jobs though
15:56:22 well, we can build on the first host, push into a registry and use that registry
15:56:30 and build only a subset that is required for CI
15:56:35 that is what we are doing
15:56:42 when we need to build them
15:56:46 we are smart (TM)
15:57:00 so what do you say
15:57:07 we just enable building by default
15:57:12 and see how badly that works for us
15:57:31 well, how much longer will it take? 10-20 minutes?
15:57:44 why don't we try another registry
15:57:52 but then there's no cache?
15:58:02 all other options seem to lack a cache
15:58:22 folks, 2 minutes
15:58:26 or require pulling in data for building
15:58:32 enable building images each time and see how badly it goes?
15:58:39 for the time being
15:58:42 on master
15:58:53 could do it on master
15:59:13 yeah, let's do it on master - and see how that goes
15:59:19 or even only for source
15:59:24 then basically kolla-ansible jobs work like they always do in kolla
15:59:39 mnasiadka: no, always, otherwise we might still trigger the duckery
15:59:41 kolla-ansible pull | kolla-build
15:59:45 ||
16:00:01 doesn't work for pre-upgrade though
16:00:08 also true of building
16:00:26 no, just build on master
16:00:31 stable branches have their own jobs
16:01:06 so we would still be pulling something extra
16:01:21 eh, dockerhub
16:01:28 was that really necessary
16:01:54 a'ight, time is up
16:02:45 yup
16:03:05 maybe we should just accept the Ts & Cs :D
16:03:41 basically - ideally, if infra would deploy something like Harbor in each mirror site and set up replication rules - it would be solved like that
16:03:51 we would just publish both to docker hub and the local registry
16:04:12 wonder what traffic volume we create by publishing
16:04:25 ok, let's finish
16:04:28 thanks
16:04:30 #endmeeting
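(The "docker exposes the number of pull requests left in an API" remark at 15:55:45 refers to Docker Hub's documented rate-limit headers, which can be read via a HEAD request that does not itself consume a pull. Below is a minimal sketch of the "check the quota in pre and decide whether to pull or build" idea; the threshold and the helper names in the trailing comment are hypothetical, not part of any existing job.)

#!/usr/bin/env python3
"""Sketch: check the anonymous Docker Hub pull quota remaining for this IP."""
import requests

AUTH_URL = ("https://auth.docker.io/token"
            "?service=registry.docker.io"
            "&scope=repository:ratelimitpreview/test:pull")
MANIFEST_URL = ("https://registry-1.docker.io/v2/"
                "ratelimitpreview/test/manifests/latest")


def remaining_pulls():
    """Return the anonymous pulls left for this IP, or None if no limit applies."""
    # Fetch an anonymous token, then read the rate-limit headers from a HEAD
    # request against the special ratelimitpreview/test repository.
    token = requests.get(AUTH_URL, timeout=30).json()["token"]
    resp = requests.head(MANIFEST_URL,
                         headers={"Authorization": f"Bearer {token}"},
                         timeout=30)
    remaining = resp.headers.get("ratelimit-remaining")  # e.g. "76;w=21600"
    return int(remaining.split(";")[0]) if remaining else None


if __name__ == "__main__":
    left = remaining_pulls()
    print(f"anonymous Docker Hub pulls remaining for this IP: {left}")
    # A CI pre step could then branch on this, e.g. (hypothetical helpers):
    # pull_images() if left is None or left >= 30 else build_images()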