*** mlavalle has quit IRC | 00:04 | |
*** CeeMac has quit IRC | 00:24 | |
*** hamalq has quit IRC | 01:00 | |
ianw | 2021-04-15 00:55:22,992 ERROR nodepool.driver.NodeRequestHandler[nl03.opendev.org-PoolWorker.osuosl-main-d45571466bf647cb9f2d43d71a7981ae]: [e: 73669cd29bfa4e80adf58a9eedb5e450] [node_request: 300-0013700920] Declining node request due to exception in NodeRequestHandler: | 01:00 |
ianw | Exception: Unable to find flavor with min ram: 8000 | 01:00 |
ianw | i may have messed up the flavors... | 01:00 |
ianw | or, i may have screwed up the password? | 01:02 |
ianw | that couldn't be though, because it uploaded the images | 01:03 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: OSU OSL : fix typo in project id https://review.opendev.org/c/opendev/system-config/+/786346 | 01:07 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: OSU OSL : use correct flavor name https://review.opendev.org/c/openstack/project-config/+/786347 | 01:09 |
ianw | ^ that should fix it, looking for the wrong name | 01:09 |
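A small model of why a wrong flavor name produces the "Unable to find flavor with min ram" error above. This assumes the matching works as nodepool's docs describe min-ram combined with flavor-name: take the smallest flavor with enough RAM whose name contains flavor-name. The flavor names below are hypothetical, not the real OSU OSL flavors, and this is a sketch rather than nodepool's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flavor:
    name: str
    ram: int  # MiB, as reported by the cloud


def find_flavor(flavors: List[Flavor], min_ram: int,
                flavor_name: Optional[str] = None) -> Flavor:
    """Return the smallest flavor with at least min_ram MiB of RAM.

    When flavor_name is given, only flavors whose name contains it are
    considered, so a wrong name filters out every candidate.
    """
    candidates = [
        f for f in flavors
        if f.ram >= min_ram and (flavor_name is None or flavor_name in f.name)
    ]
    if not candidates:
        # Message mirrors the nodepool log line; the exception type is ours
        raise LookupError(f"Unable to find flavor with min ram: {min_ram}")
    return min(candidates, key=lambda f: f.ram)


# Hypothetical flavor list for illustration only
flavors = [Flavor("a1.medium", 4096), Flavor("a1.large", 8192), Flavor("a1.xlarge", 16384)]
```

With min_ram=8000 and no name filter this picks "a1.large"; with a name that matches nothing (say "m1.") it raises, which is the failure mode a typo in the configured flavor name would trigger.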
*** ysandeep|away is now known as ysandeep | 01:16 | |
openstackgerrit | Merged openstack/project-config master: OSU OSL : use correct flavor name https://review.opendev.org/c/openstack/project-config/+/786347 | 01:37 |
*** brinzhang_ is now known as brinzhang | 02:06 | |
ianw | ok, next problem | 02:20 |
ianw | 409: Client Error for url: http://arm-openstack.osuosl.org:8774/v2.1/59d5d8ec7d0b416d9a6fe92e51718d64/servers, Multiple possible networks found, use a Network ID to be more specific. | 02:20 |
ianw | this might be interesting | 02:25 |
ianw | we have two subnets 140.211.167.64/27 & 140.211.169.0/26 | 02:25 |
ianw | i guess the /27 has enough room for our current 15 hosts | 02:28 |
ianw | maybe? if we do need to alternate, i'm not sure how to specify that | 02:28 |
fungi | a /27 yields 32 total addresses (minus base, broadcast, and probably a gateway) | 02:31 |
fungi | 64 for a /26 | 02:31 |
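The subnet arithmetic, worked with Python's ipaddress module. It assumes one address beyond network and broadcast goes to the gateway; the cloud may reserve more (for example for DHCP ports), so treat the result as an upper bound.

```python
import ipaddress


def instance_capacity(cidr, reserved=1):
    """Return (total addresses, addresses left for instances) for an IPv4 subnet.

    hosts() already excludes the network and broadcast addresses; `reserved`
    covers anything else the cloud keeps, assumed here to be just the gateway.
    """
    net = ipaddress.ip_network(cidr)
    usable = sum(1 for _ in net.hosts())
    return net.num_addresses, usable - reserved


for cidr in ("140.211.167.64/27", "140.211.169.0/26"):
    total, free = instance_capacity(cidr)
    print(f"{cidr}: {total} total, {free} for instances")
# 140.211.167.64/27: 32 total, 29 for instances
# 140.211.169.0/26: 64 total, 61 for instances
```

So the /27 alone should hold the current 15 nodes with room to spare.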
openstackgerrit | Ian Wienand proposed opendev/system-config master: OSU OSL : add default network https://review.opendev.org/c/opendev/system-config/+/786351 | 02:33 |
openstackgerrit | Merged opendev/system-config master: OSU OSL : fix typo in project id https://review.opendev.org/c/opendev/system-config/+/786346 | 02:34 |
ianw | yeah, i've reached out to find out what the difference is from their end. i'll manually patch in the first (called "public4") and see | 02:34 |
*** hemanth_n has joined #opendev | 02:40 | |
openstackgerrit | Merged opendev/system-config master: nodepool-builder: configure upload workers, reduce nb03 https://review.opendev.org/c/opendev/system-config/+/786341 | 03:27 |
*** gry has left #opendev | 03:28 | |
ianw | No fixed IP addresses available for network: 48bfc43c-c99e-4395-afbd-97d02ef0116a, not rescheduling. | 03:36 |
*** ykarel has joined #opendev | 03:45 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: OSU OSL : add default network https://review.opendev.org/c/opendev/system-config/+/786351 | 04:11 |
*** snapdeal has joined #opendev | 04:16 | |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Fix OSU OSL typo (again! this is so easy to turn around!) https://review.opendev.org/c/openstack/project-config/+/786353 | 04:26 |
openstackgerrit | Merged openstack/project-config master: Fix OSU OSL typo (again! this is so easy to turn around!) https://review.opendev.org/c/openstack/project-config/+/786353 | 04:40 |
*** brinzhang_ has joined #opendev | 04:45 | |
*** brinzhang has quit IRC | 04:48 | |
*** whoami-rajat has joined #opendev | 04:50 | |
*** marios has joined #opendev | 05:03 | |
openstackgerrit | Ian Wienand proposed openstack/project-config master: nodepool: make ARM64 config names consistent https://review.opendev.org/c/openstack/project-config/+/786357 | 05:18 |
openstackgerrit | Merged opendev/system-config master: OSU OSL : add default network https://review.opendev.org/c/opendev/system-config/+/786351 | 05:21 |
*** vishalmanchanda has joined #opendev | 05:41 | |
*** sshnaidm|pto has quit IRC | 05:47 | |
*** sshnaidm|pto has joined #opendev | 05:48 | |
*** slaweq has quit IRC | 05:55 | |
*** slaweq_ has joined #opendev | 05:55 | |
*** slaweq_ is now known as slaweq | 05:55 | |
frickler | fungi: since you didn't generate the new signing keys yet (or did I just look wrong?), maybe we can discuss key parameters. my initial idea was to move to rsa4096, which I seemed to remember having discussed some time ago already, but couldn't find it in my logs | 05:59 |
*** ykarel has quit IRC | 06:00 | |
*** ykarel has joined #opendev | 06:00 | |
frickler | then I looked at your original spec, which mentions a possible transition to ed25519 at some point, which makes me wonder whether it would be feasible as a first step to have both key types and do two signatures for everything | 06:01 |
*** eolivare has joined #opendev | 06:09 | |
*** ralonsoh has joined #opendev | 06:10 | |
*** sboyron has joined #opendev | 06:29 | |
*** slaweq_ has joined #opendev | 06:30 | |
*** icey has quit IRC | 06:34 | |
*** icey has joined #opendev | 06:35 | |
*** slaweq has quit IRC | 06:36 | |
*** slaweq_ is now known as slaweq | 06:36 | |
*** fressi has joined #opendev | 06:47 | |
*** ykarel_ has joined #opendev | 06:49 | |
*** ykarel has quit IRC | 06:51 | |
*** jpena|off is now known as jpena | 06:55 | |
*** amoralej|off is now known as amoralej | 07:05 | |
*** jaicaa has quit IRC | 07:10 | |
*** jaicaa has joined #opendev | 07:13 | |
*** andrewbonney has joined #opendev | 07:24 | |
*** ykarel_ is now known as ykarel | 07:30 | |
*** rpittau|afk is now known as rpittau | 07:45 | |
*** tosky has joined #opendev | 07:46 | |
*** dtantsur|afk is now known as dtantsur | 07:59 | |
*** hemanth_n has quit IRC | 08:32 | |
*** ykarel is now known as ykarel|lunch | 08:33 | |
*** ysandeep is now known as ysandeep|lunch | 08:49 | |
*** slaweq has quit IRC | 08:51 | |
*** slaweq has joined #opendev | 08:51 | |
*** vishalmanchanda has quit IRC | 09:27 | |
*** ysandeep|lunch is now known as ysandeep | 09:32 | |
*** DSpider has joined #opendev | 09:35 | |
*** ykarel|lunch is now known as ykarel | 09:38 | |
*** dpawlik9 has quit IRC | 09:46 | |
*** vishalmanchanda has joined #opendev | 09:48 | |
*** snapdeal has quit IRC | 09:52 | |
*** dpawlik4 has joined #opendev | 09:59 | |
*** ykarel_ has joined #opendev | 10:29 | |
*** ykarel has quit IRC | 10:32 | |
*** hrw has joined #opendev | 10:35 | |
hrw | morning | 10:35 |
hrw | kevinz: INFO:kolla.common.utils.placement-base:E: Failed to fetch http://mirror.regionone.linaro-us.opendev.org/debian/dists/bullseye/main/binary-arm64/Packages 403 Forbidden [IP: 2604:1380:4111:3e54:f816:3eff:fe17:6b17 80] | 10:36 |
hrw | ok, works again | 10:38 |
hrw | is there a way to recheck only check-arm64 queue? | 10:43 |
*** ykarel_ has quit IRC | 10:44 | |
*** ykarel_ has joined #opendev | 10:45 | |
*** ykarel_ is now known as ykarel | 11:15 | |
*** jpena is now known as jpena|lunch | 11:31 | |
*** ykarel_ has joined #opendev | 11:54 | |
*** ykarel has quit IRC | 11:56 | |
*** ykarel_ is now known as ykarel | 12:02 | |
*** CeeMac has joined #opendev | 12:05 | |
*** jpena|lunch is now known as jpena | 12:33 | |
*** amoralej is now known as amoralej|lunch | 12:36 | |
*** snapdeal has joined #opendev | 12:37 | |
*** snapdeal has quit IRC | 12:42 | |
*** snapdeal has joined #opendev | 12:44 | |
fungi | frickler: i haven't generated it yet, no, was probably going to do that today. switching the key type would be easy enough, but signing with two keys would need changes to the underlying ansible role | 13:06 |
fungi | also what's wrong with 3072-bit rsa? seems to still be gnupg's default | 13:08 |
*** marios is now known as marios|call | 13:15 | |
fungi | aha, looks like gnupg 2.3.0 changes the default to rsa4096 | 13:18 |
*** amoralej|lunch is now known as amoralej | 13:29 | |
fungi | er, actually that's the python-gnupg docs, not the gnupg docs | 13:34 |
*** klonn has joined #opendev | 13:39 | |
*** klonn has quit IRC | 13:42 | |
*** marios|call is now known as marios | 13:55 | |
*** diablo_rojo has joined #opendev | 13:56 | |
mordred | LinkedIn advertised this to me today: https://www.linkedin.com/learning/parallel-and-concurrent-programming-with-c-plus-plus-part-1/learn-parallel-programming-basics ... I totally thought it was a joke, because it sure does look like a cooking show. but it does, indeed, seem to be a c++ course, and they sure do look like they're going to be using food and cooking metaphors as they write multi-threaded code | 14:25 |
mordred | (the intro seems to be available without login or subscription) | 14:25 |
fungi | also when they reach the point where it's time to put the source into the compiler, they just pull out one they compiled earlier to save time | 14:26 |
fungi | cooking show methods totally work for programming classes | 14:26 |
fungi | hrw: yes! you want "check arm64" according to https://opendev.org/openstack/project-config/src/branch/master/zuul.d/pipelines.yaml#L386 | 14:28 |
fungi | hrw: for those 403 errors from the mirror, we see evidence of intermittent network connectivity in that cloud, ianw was looking into it yesterday | 14:29 |
*** avass has quit IRC | 14:42 | |
*** avass has joined #opendev | 14:43 | |
*** snapdeal has quit IRC | 14:44 | |
hrw | fungi: thanks | 15:25 |
fungi | my pleasure | 15:26 |
*** ysandeep is now known as ysandeep|dinner | 15:36 | |
*** mlavalle has joined #opendev | 15:36 | |
fungi | frickler: okay, so gnupg 2.3.0 (2021-04-07) switched the default key algorithm to ed25519/cv25519, that's all the convincing i need: https://git.gnupg.org/cgi-bin/gitweb.cgi?p=gnupg.git;a=blob;f=NEWS;h=6aec353e359b2502018b572b0bc869e3dac518cb;hb=refs/heads/master#l27 | 15:40 |
fungi | willing to give that a try as long as the 2.2.4 on bridge has support (which i believe it does) | 15:41 |
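A sketch of the two gpg invocations this discussion implies: generating an ed25519 signing key with the quick-generate-key syntax (USER-ID [ALGO [USAGE [EXPIRE]]]) available in gnupg 2.2, and detach-signing one artifact with two keys at once, which is how frickler's "both key types" idea could work, since gpg emits one signature per --local-user. The user ID, key IDs and two-year expiry are hypothetical, passphrase handling is left to the usual pinentry, and this is not how the existing ansible role signs.

```python
def gen_key_argv(uid, algo="ed25519", usage="sign", expire="2y"):
    """argv for gpg's quick key generation: USER-ID [ALGO [USAGE [EXPIRE]]]."""
    return ["gpg", "--quick-generate-key", uid, algo, usage, expire]


def dual_sign_argv(artifact, key_ids):
    """argv to detach-sign one artifact with every key in key_ids.

    gpg creates one signature per --local-user, so the .asc holds all of them.
    """
    argv = ["gpg", "--armor", "--detach-sign", "--output", artifact + ".asc"]
    for key_id in key_ids:
        argv += ["--local-user", key_id]
    argv.append(artifact)
    return argv
```

Each argv would be passed to subprocess.run(argv, check=True); a `gpg --verify` on the resulting .asc should then report both signatures.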
fungi | i'll know shortly | 15:43 |
*** ykarel has quit IRC | 15:47 | |
johnsom | Cloning into 'releases'... | 15:49 |
johnsom | fatal: unable to access 'https://opendev.org/openstack/releases.git/': GnuTLS recv error (-110): The TLS connection was non-properly terminated. | 15:49 |
fungi | that doesn't look good | 15:50 |
fungi | i'll check the servers, see if one is having a bad day | 15:50 |
johnsom | Hmmm, that was on a focal host that updated yesterday. So, not sure if it's just me or ... | 15:50 |
johnsom | subject: CN=gitea02.opendev.org | 15:51 |
fungi | established tcp connections on the lb shot waaay up in the past few minutes, suggesting one of the backends has turned into a tarpit | 15:51 |
fungi | yep, 02 looks about to oom | 15:52 |
fungi | i'll take it out of the lb pool | 15:52 |
fungi | #status log Temporarily disabled the gitea02 backend in haproxy due to impending memory exhaustion | 15:53 |
openstackstatus | fungi: finished logging | 15:53 |
fungi | johnsom: working now? | 15:53 |
fungi | i have a feeling, based on what we saw from the openstackansible user a few weeks back, that the wallaby release is going to trigger a bunch of this | 15:54 |
johnsom | Hmm, nope, the TLS isn't completing for me now | 15:54 |
clarkb | ya we're just redirecting the hose when we do that | 15:54 |
johnsom | Ah, just slow. Got 6 this time. | 15:54 |
fungi | 01 is also being slammed | 15:54 |
fungi | might be all of them | 15:54 |
johnsom | Yeah, not cloning | 15:55 |
fungi | 03 as well | 15:55 |
fungi | seems we're under a distributed denial of service attack. i'll see if i can map them all to a single netblock | 15:55 |
clarkb | fungi: let me know if I can help | 15:55 |
fungi | i've put 02 back into the pool for now while i dig deeper | 15:56 |
fungi | unfortunately nothing as simple as connection count is going to tell us who's doing it | 15:57 |
fungi | pretty sure this is going to wind up being another case of "i told my 1000 node cluster to upgrade openstack, so all the servers are cloning nova now" | 15:58 |
fungi | i guess i'll try to sample some clone requests from one of the backends and then map that in the haproxy log to client addresses | 15:59 |
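Once a sample of client addresses has been pulled out of the haproxy log (the extraction itself is not shown and depends on the log format), grouping them by /24 shows whether one netblock dominates. A minimal sketch, IPv4 only; the sample addresses are from documentation ranges, not real clients.

```python
import ipaddress
from collections import Counter


def top_netblocks(addresses, prefix=24, n=5):
    """Count IPv4 client addresses per /prefix network, most frequent first."""
    counts = Counter(
        # strict=False masks the host bits, e.g. 192.0.2.10/24 -> 192.0.2.0/24
        ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        for addr in addresses
    )
    return counts.most_common(n)


sample = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "192.0.2.200"]
for net, hits in top_netblocks(sample):
    print(net, hits)
```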
*** hamalq has joined #opendev | 16:00 | |
fungi | made much harder by the fact that they're in swap thrash now | 16:01 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Upgrade gitea to 1.13.7 https://review.opendev.org/c/opendev/system-config/+/786466 | 16:01 |
*** marios is now known as marios|out | 16:01 | |
clarkb | fungi: ^ I don't think that will help, but always a good reminder when this sort of thing happens | 16:01 |
clarkb | fungi: ya that is roughly what I have done in the past, look at the access log to see which requests are very slow/large on a gitea server. Then map back to load balancer | 16:02 |
*** hamalq has quit IRC | 16:02 | |
*** hamalq has joined #opendev | 16:02 | |
fungi | i have a feeling this might be wal-mart | 16:05 |
*** jpena is now known as jpena|off | 16:05 | |
fungi | need to map more samples back, but this is somewhat tedious and takes time | 16:06 |
*** rpittau is now known as rpittau|afk | 16:09 | |
fungi | it seems like nobody besides us uses reverse dns these days | 16:11 |
johnsom | Looks like there is a gitea patch proposed to add PROXY protocol support, so there is hope in sight that you can get the actual client addresses in the near-ish future | 16:12 |
johnsom | https://github.com/go-gitea/gitea/pull/12527 | 16:12 |
clarkb | oh that is great news | 16:13 |
fungi | johnsom: well, we don't do layer 7 proxying at the moment though, so likely won't help us | 16:13 |
fungi | this is straight up layer 4, tls is terminated on the backends | 16:13 |
johnsom | PROXY protocol works with all protocols via HAProxy | 16:13 |
fungi | oh, wait, that's right | 16:14 |
johnsom | Yeah, that is not a problem | 16:14 |
fungi | i was thinking of x-forwarded-for | 16:14 |
clarkb | ya this is an haproxy thing | 16:14 |
clarkb | and then you need webservers that also support it (not sure if apache does out of the box but one problem at a time) | 16:14 |
johnsom | Well, almost everything supports it now, but HAProxy started it, yes | 16:14 |
fungi | so not plain tcp, but apache can parse it | 16:14 |
fungi | a wrapper basically | 16:15 |
fungi | (tcp-in-tcp essentially) | 16:15 |
johnsom | It's prefix bytes to the TCP connection | 16:15 |
johnsom | So, right before the TLS in your case | 16:15 |
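To make "prefix bytes" concrete, here is a sketch of the binary PROXY protocol v2 header (what haproxy's send-proxy-v2 emits) for a TCP-over-IPv4 connection: a fixed 12-byte signature, a version/command byte, a family/protocol byte, a length, then the original source and destination addresses and ports. The receiver strips it and the TLS handshake follows untouched. Only the PROXY/TCP4 case is handled; the addresses are documentation examples.

```python
import ipaddress
import struct

# 12-byte signature that starts every PROXY protocol v2 header
PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


def proxy_v2_header(src_ip, src_port, dst_ip, dst_port):
    """Build a PROXY v2 header describing a proxied TCP-over-IPv4 connection."""
    addresses = (ipaddress.IPv4Address(src_ip).packed
                 + ipaddress.IPv4Address(dst_ip).packed
                 + struct.pack("!HH", src_port, dst_port))
    # 0x21: version 2, PROXY command; 0x11: AF_INET, STREAM (TCP)
    return PP2_SIGNATURE + struct.pack("!BBH", 0x21, 0x11, len(addresses)) + addresses


def parse_proxy_v2(data):
    """Return ((src_ip, src_port), (dst_ip, dst_port), bytes after the header)."""
    if data[:12] != PP2_SIGNATURE:
        raise ValueError("not a PROXY v2 header")
    ver_cmd, fam_proto, length = struct.unpack("!BBH", data[12:16])
    if ver_cmd != 0x21 or fam_proto != 0x11:
        raise ValueError("only PROXY/TCP4 is handled in this sketch")
    body = data[16:16 + length]
    src = str(ipaddress.IPv4Address(body[0:4]))
    dst = str(ipaddress.IPv4Address(body[4:8]))
    src_port, dst_port = struct.unpack("!HH", body[8:12])
    return (src, src_port), (dst, dst_port), data[16 + length:]
```

Appending the first bytes of a TLS ClientHello (b"\x16\x03\x01") after the header and parsing returns the real client address plus those TLS bytes unchanged, which is what lets the backend web server log the true client while still terminating TLS itself.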
fungi | yep, i remember investigating it before for something else | 16:15 |
fungi | anyway, we could be using it now, we don't actually terminate ssl with gitea, we terminate it with an apache running on the gitea servers | 16:16 |
johnsom | Ah, yeah, you can totally switch that on now if you have a frontend web server for gitea | 16:16 |
fungi | so we could just do it between haproxy and apache | 16:17 |
fungi | we're relying on apache to be able to implement user agent based filters for certain really annoying botnets | 16:17 |
clarkb | mod remoteip apparently | 16:18 |
johnsom | Yep, mod_remoteip | 16:21 |
johnsom | mod_proxy_protocol is getting merged into mod_remoteip | 16:21 |
clarkb | fungi: fwiw I can hit 01 now without any issue | 16:21 |
clarkb | perhaps whatever it was has settled down? | 16:21 |
johnsom | Yeah, I just got a clone | 16:22 |
*** marios|out has quit IRC | 16:22 | |
fungi | unfortunately the investigation on gitea02 so far indicates that between 15:40 and 16:00 utc, the most frequent requesters of an openstack/nova git-upload-pack were a couple of addresses whois says are registered to wal-mart | 16:23 |
fungi | i'll check 06 for a second data point | 16:23 |
*** mlavalle has quit IRC | 16:23 | |
*** _mlavalle_1 has joined #opendev | 16:24 | |
fungi | yep, two different addresses in the same netblock | 16:24 |
clarkb | since things seem to be settling I'm going to look at my todo list again which says it is time to clean up external ids | 16:24 |
johnsom | FYI, I recommend using proxy protocol v2, the config in haproxy is send-proxy-v2. There are additional options if you are TLS offloading, but it sounds like you are not. | 16:25 |
clarkb | ping me if you need help with gitea (or anything else) but I'm going to focus on running the externalid cleanups for the accounts I recently retired, then will also rerun consistency checks on gerrit | 16:25 |
fungi | right, it's probably too late to go blocking them now, but i've also got a method to be able to map these a little faster next time it hits | 16:25 |
fungi | established connections in haproxy is falling sharply at this point so i expect it's over for the moment | 16:26 |
fungi | in the span of 20 minutes we saw what seemed to be ~500 attempts to clone nova from addresses in that netblock | 16:27 |
fungi | extrapolating across our backends | 16:27 |
smcginnis | Looks like still issues. I've been waiting for https://opendev.org/openstack/ to load for several minutes. | 16:32 |
clarkb | smcginnis: ya I think its falling off, not necessarily completely happy yet | 16:34 |
fungi | well, in the latest sample, looks like it could be spiking back up again | 16:34 |
fungi | if this is similar to what we saw with the osa user a few weeks back, they tried to deploy an upgrade, that broke with a bunch of git errors, so once they saw the failures reported back they tried again | 16:34 |
fungi | and again | 16:35 |
fungi | and again | 16:35 |
clarkb | ah yup there is a recent spike on cacti | 16:35 |
fungi | i'll see if it's still the same addresses | 16:35 |
clarkb | it was falling off :) | 16:35 |
*** dtantsur is now known as dtantsur|afk | 16:36 | |
fungi | yep, same class c network | 16:37 |
clarkb | the external id cleanups ran and log file has been put in the normal spot on review | 16:37 |
clarkb | I'm going to run consistency checking next | 16:38 |
fungi | i'm blocking it temporarily on the lb | 16:38 |
clarkb | fungi: that seems reasonable and maybe they will reach out and we can ask them to not ddos us | 16:38 |
jrosser | i wonder if jmccrory is still at walmart | 16:38 |
fungi | #status log Temporarily blocked 161.170.233.0/24 in iptables on gitea-lb01.opendev.org to limit impact from excessive git clone requests | 16:39 |
openstackstatus | fungi: finished logging | 16:39 |
fungi | jrosser: are they likely to also be using osa, like our previous incident? maybe hitting the same bug? | 16:39 |
jrosser | well i just put 2 and 2 together and get 5 maybe as jimmy was with walmart and also a known OSA contributor/user | 16:41 |
fungi | ahh, okay | 16:41 |
*** _mlavalle_1 has quit IRC | 16:42 | |
jrosser | fwiw the behaviour that the folk from uvic.ca had where it cloned everything 100s of times is totally wrong and i've reached out to them since to try and unpick what's happening in their environment | 16:45 |
fungi | memory utilization seems to be falling on the backends again, so seems like those addresses were probably the source of the expensive requests | 16:46 |
clarkb | fungi: connection count is falling way off on the lb too | 16:46 |
fungi | yup | 16:46 |
fungi | though that's a secondary indicator | 16:46 |
fungi | basically backends start responding very slowly and all the normal incoming requests pile up | 16:47 |
fungi | in theory we'll see the swap thrash subside on the backends before the connection count recovers | 16:47 |
*** amoralej is now known as amoralej|off | 16:48 | |
clarkb | 281 non-unique errors now (down from 334) | 16:49 |
fungi | for the gerrit account collisions? | 16:51 |
clarkb | yup | 16:51 |
fungi | excellent! | 16:51 |
clarkb | and that is down overall from 643 | 16:51 |
clarkb | 334 was the previous state before this round of cleanups | 16:52 |
fungi | need to take a paperwork break to scan some paper records, but i'll check back in on the gitea farm in a bit | 16:52 |
*** eolivare has quit IRC | 16:55 | |
clarkb | doesn't seem to be spiking back up again | 16:56 |
fungi | yep, load averages have also all fallen <1 now | 16:58 |
fungi | so whatever needed to be paged back in has been i think | 16:58 |
fungi | and caches are rewarmed | 16:58 |
*** eolivare has joined #opendev | 16:59 | |
clarkb | I've got my audit script rerunning so that I don't see the old data for the accounts that were just cleaned up. When that is done I'm going to context switch to reviewing zuul changes and booting new zk servers though. Don't think I'll get to the next batch of account cleanups for a little bit | 17:04 |
*** mlavalle has joined #opendev | 17:05 | |
*** ralonsoh has quit IRC | 17:12 | |
*** eolivare has quit IRC | 17:14 | |
*** ysandeep|dinner is now known as ysandeep | 17:16 | |
clarkb | cacti continues to make gitea look stable | 17:20 |
fungi | yup | 17:22 |
fungi | though i feel bad for the wal-mart sysadmins, i hope they reach out soon | 17:23 |
*** zul has joined #opendev | 17:26 | |
clarkb | audit has completed and it looks about how I expect it. I think I'll call that done for now as everything looks good | 17:33 |
fungi | thanks! | 17:34 |
*** andrewbonney has quit IRC | 17:50 | |
mnaser | infra-root, infra-core: docker hub registry is fully down right now -- so just a heads up in case jobs start reporting failures =) | 18:00 |
clarkb | neat | 18:03 |
clarkb | mnaser: do you know if they have a status page where they will track that? usually we pass that sort of thing along to people who ask about when it might be fixed | 18:03 |
clarkb | (I can google for it too when I finish these reviews) | 18:03 |
mnaser | clarkb: https://status.docker.com/pages/533c6539221ae15e3f000031 :) | 18:04 |
mnaser | and more specifically, https://status.docker.com/pages/incident/533c6539221ae15e3f000031/60787e0cfb9e67053616ba8a is the incident | 18:04 |
clarkb | thanks | 18:04 |
fungi | that's certainly fun | 18:08 |
* fungi bets it's all the kolla users upgrading to wallaby | 18:08 | |
fungi | though i think kolla hasn't tagged wallaby yet | 18:08 |
fungi | things are still looking okay with the git servers, it seems | 18:10 |
clarkb | I'm starting to boot zk04.opendev.org, zk05.opendev.org, and zk06.opendev.org. I don't expect issues related to the docker hub outage as we don't do docker things at launch | 18:14 |
clarkb | then tomorrow we can start stumbling through replacement of the old servers | 18:15 |
clarkb | https://status.docker.com/pages/533c6539221ae15e3f000031 reports they are operational now | 18:16 |
clarkb | so from at least 17:55UTC to ~18:16UTC there could be problems in job logs | 18:17 |
clarkb | I suspect it may have started a bit before 17:55UTC too but that is what they recorded on the incident page | 18:17 |
*** vishalmanchanda has quit IRC | 18:27 | |
*** sboyron has quit IRC | 18:42 | |
openstackgerrit | Clark Boylan proposed opendev/zone-opendev.org master: Add new zookeeper servers to DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/786484 | 19:01 |
*** hamalq has quit IRC | 19:17 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add zk04.opendev.org https://review.opendev.org/c/opendev/system-config/+/786487 | 19:20 |
clarkb | I'm WIP'ing ^ but that should be the first change we need to implement option A at https://etherpad.opendev.org/p/opendev-zookeeper-upgrade-2021 | 19:21 |
clarkb | corvus: ^ for that zk work one of the steps in my document is to update the client configs by hand to see all three new servers when the first two are rotated in. This way we only need to do one restart of the clients. The only clients that currently matter are zuul-scheduler, nodepool launchers and builders? Should I plan to do the mergers and executors too? | 19:22 |
clarkb | I did just confirm that 02 is still the leader so the order there is correct at least for now | 19:24 |
*** hamalq has joined #opendev | 19:31 | |
corvus | clarkb: currently scheduler and nodepool yes. i expect us to add executors to that very soon (next week?), but will require executor restarts of course, so as long as system-config tracks your by-hand work closely (so the new config is on disk when the executors are restarted), shouldn't be an issue. | 19:43 |
clarkb | corvus: yup I believe zuul will be writing out the updates as the group membership changes | 19:45 |
clarkb | so it should track, the by hand step is merely there to ensure we don't have to restart frequently (we can lie and tell zuul/nodepool there is a future config state we haven't quite reached yet ) | 19:45 |
clarkb | fwiw I did spend some time today double checking the dynamic reconfiguration support in zk and it seems complicated enough that not relying on it for now seems ideal | 19:46 |
clarkb | (there are new config files involved and you have to configure acls etc) | 19:46 |
clarkb | that said, their docs give clients a short recipe for tracking those changes and we might want to consider switching to it in the future and then updating zuul and nodepool to auto shift their connections based on watches of the config data | 19:47 |
*** whoami-rajat has quit IRC | 19:47 | |
corvus | clarkb: note if we do that we may need to look into details about how we're handling the config in the containers | 19:48 |
clarkb | corvus: yup that is why I'm punting, it seems like a neat feature to support but also complicated so I didn't want to mix that into this process | 19:48 |
corvus | we're mostly steamrolling over the dynamic update feature to make our "git is authoritative" approach work | 19:49 |
clarkb | corvus: it is also disabled by default due to security concerns | 19:49 |
clarkb | which is another aspect to consider | 19:49 |
clarkb | I've also started thinking about what a zuul scheduler replacement looks like. It appears that we don't actually auto start the zuul scheduler when deploying it. I think that means we can deploy a zuul02.opendev.org, have ansible write out configs and lay down dirs as well as pull containers. Then we stop 01's zuul, sync secrets (and maybe even git repos) to 02, then start zuul on 02 | 19:50 |
clarkb | update DNS and in theory we've done a fairly seamless upgrade there | 19:50 |
*** auristor has quit IRC | 19:52 | |
clarkb | I think we can also shrink the new zuul server. We may run into OOM problems more quickly but that may even be a good thing? | 19:53 |
clarkb | the new zk servers should be the same flavor as the old ones fwiw | 19:53 |
*** diablo_rojo has quit IRC | 19:54 | |
*** slittle1 has joined #opendev | 19:57 | |
*** auristor has joined #opendev | 19:59 | |
slittle1 | Hmmm... having trouble creating our StarlingX release branches | 19:59 |
slittle1 | My notes say this used to work ... | 20:00 |
*** ysandeep is now known as ysandeep|away | 20:01 | |
slittle1 | git push --tags gerrit r/stx.5.0 | 20:01 |
clarkb | slittle1: branches need to be created through the web ui | 20:02 |
clarkb | the command you have pasted looks like one to push tags | 20:02 |
clarkb | (I suppose you could do it via the rest api too, but branch creation via git requires force push access and that isn't typically available) | 20:03 |
openstackgerrit | Clark Boylan proposed opendev/gerritlib master: Add function to set project parent https://review.opendev.org/c/opendev/gerritlib/+/786500 | 20:04 |
fungi | branch creation via git requires direct push rights yes (but not push --force) | 20:09 |
openstackgerrit | Clark Boylan proposed opendev/jeepyb master: Set gerrit project parents https://review.opendev.org/c/opendev/jeepyb/+/786501 | 20:09 |
slittle1 | I thought membership in group starlingx-release gave me the power | 20:10 |
clarkb | fungi: ^ I don't know that those are complete and it may be worth WIPing the jeepyb change until we're happy with it | 20:10 |
slittle1 | if not, tell me more about the rest api | 20:11 |
clarkb | slittle1: it will depend on the exact acl config you've got in place. But typically the power you have allows you to create branches via the web ui or the rest api but not through git | 20:11 |
fungi | slittle1: membership in group starlingx-release currently gives power to create branches via the webui or rest api, but you need a different permission to do it via git push (and it would also allow you to completely bypass code review) | 20:11 |
*** diablo_rojo has joined #opendev | 20:11 | |
clarkb | slittle1: https://review.opendev.org/Documentation/rest-api.html#authentication is the first thing to look at for the rest api as you will need to authenticate for this. | 20:11 |
clarkb | slittle1: then you can use https://review.opendev.org/Documentation/rest-api-projects.html#create-branch to create the branch | 20:12 |
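A minimal sketch of the Create Branch REST call with only the Python standard library. It assumes HTTP basic auth with the account's generated HTTP password (not an SSH key) via the authenticated /a/ prefix; the user, password and starting revision are placeholders. Project and branch names contain slashes, so both must be URL-encoded, and Gerrit prefixes JSON responses with a ")]}'" line that has to be stripped before decoding.

```python
import base64
import json
import urllib.parse
import urllib.request

GERRIT_XSSI_PREFIX = ")]}'"


def create_branch_request(base_url, project, branch, revision, user, http_password):
    """Build (but do not send) a PUT /a/projects/{project}/branches/{branch} request."""
    url = "{}/a/projects/{}/branches/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(project, safe=""),  # starlingx/tools -> starlingx%2Ftools
        urllib.parse.quote(branch, safe=""),   # r/stx.5.0 -> r%2Fstx.5.0
    )
    body = json.dumps({"revision": revision}).encode("utf-8")
    token = base64.b64encode(f"{user}:{http_password}".encode()).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json; charset=UTF-8",
            "Authorization": f"Basic {token}",
        },
    )


def parse_gerrit_json(raw):
    """Strip Gerrit's anti-XSSI prefix line and decode the JSON body."""
    text = raw.decode("utf-8")
    if text.startswith(GERRIT_XSSI_PREFIX):
        text = text[len(GERRIT_XSSI_PREFIX):]
    return json.loads(text)
```

Sending it would look like `with urllib.request.urlopen(req) as resp: info = parse_gerrit_json(resp.read())`; an HTTP error such as 409 raises urllib.error.HTTPError, which a release script walking many repos would want to catch per repo.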
fungi | i wouldn't recommend setting direct push permissions for a group unless the members of that group use separate accounts for it so they won't accidentally bypass code review and wind up pushing code directly to a branch | 20:12 |
clarkb | fungi: I have WIP'd 786501 | 20:13 |
fungi | clarkb: ooh, thanks! i'll take a look after i finish dinner | 20:13 |
slittle1 | the main idea here is to have a script that walks over all our git repos and pushes a new branch, a new tag, and a .gitreview update, more or less without review | 20:14 |
clarkb | slittle1: creating the branch and tag you can do without review (in fact there isn't a way to review those). Then you could push the .gitreview change up for review and autoapprove it | 20:15 |
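For the .gitreview part, a sketch of rewriting the file so git-review targets the new branch by default, using configparser. It assumes the usual [gerrit] section with a defaultbranch key; interpolation is disabled so stray % characters are not misread. Note that configparser drops comments when it rewrites a file.

```python
import configparser
import io


def set_default_branch(gitreview_text, branch):
    """Return .gitreview contents with [gerrit] defaultbranch set to branch."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(gitreview_text)
    cfg["gerrit"]["defaultbranch"] = branch
    out = io.StringIO()
    # .gitreview files conventionally use key=value without spaces
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


original = """[gerrit]
host=review.opendev.org
port=29418
project=starlingx/tools.git
"""
print(set_default_branch(original, "r/stx.5.0"))
```

The result is committed on the new branch and pushed for review as usual, which is the step clarkb suggests auto-approving.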
fungi | slittle1: you could do it via curl and an api password in that case, or use a dedicated account which isn't normally used for interactive git activity to avoid accidents | 20:15 |
fungi | the gerrit rest api is scriptable | 20:16 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add zk04.opendev.org https://review.opendev.org/c/opendev/system-config/+/786487 | 20:20 |
clarkb | neat if you push a new ps to a wip change it stays wip by default | 20:21 |
clarkb | slittle1: https://opendev.org/opendev/system-config/src/branch/master/tools/gerrit-account-inconsistencies/remove-user-external-ids.py is a python script I wrote semi-recently that uses the rest api for other purposes, but it does both reads and writes and may be helpful | 20:23 |
slittle1 | hmm, Found the release branch scripts of an old colleague... he was using 'git push' the same way i was | 20:27 |
slittle1 | At some point, that must have been permitted for our release group | 20:28 |
fungi | what repository are you trying to create a branch on? | 20:28 |
fungi | and was that script previously used to create branches in gerrit or on the earlier starlingx github repos? | 20:29 |
clarkb | also possible there was a behavior change we didn't expect in the gerrit upgrade | 20:29 |
clarkb | but the behavior slittle1 describes now is what I would've expected pre upgrade too (and I expect it now) | 20:29 |
slittle1 | pretty much all the starlingx/* repos ... e.g starlingx/tools.git | 20:30 |
clarkb | slittle1: https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/starlingx/tools.config is the acl config for that repo | 20:32 |
clarkb | slittle1: it says you can create the branch (which is what allows you to do rest api or web ui branch creation), but there is no push permission | 20:32 |
clarkb | you would need a push permission on refs/heads/* to push a new branch to it that way using git | 20:33 |
clarkb | are you sure the old script you are looking at wasn't just pushing tags? | 20:33 |
clarkb | because the command you pasted would be the sort of thing I would expect for someone pushing a tag | 20:33 |
slittle1 | It did both | 20:34 |
clarkb | and https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/starlingx/tools.config#L11 does allow pushing tags | 20:34 |
fungi | even on the older version of gerrit we were running (2.13) the only way to allow branch creation via git push was to grant push in the [access "refs/heads/*"] section of the acl. the create permission which is currently there in your acls has only permitted creation via the webui and rest api. the access control docs point out the difference (and also indicate that push rights allow bypassing code review for | 20:34 |
fungi | pushing commits directly to branches): https://review.opendev.org/Documentation/access-control.html#category_create | 20:34 |
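The distinction fungi and clarkb describe (the "create" right covers web UI, REST API and ssh CLI branch creation, while creating a branch with `git push` also needs "push" on `refs/heads/*`) can be captured in a toy model. This is a hedged sketch of the rule as stated in this discussion, not Gerrit's actual permission code; the `RefAcl` type and the method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class RefAcl:
    """Hypothetical: rights one group holds on refs/heads/* (toy model only)."""
    permissions: Set[str] = field(default_factory=set)


def can_create_branch(acl: RefAcl, method: str) -> bool:
    """Return True if `method` may create a branch under this toy model."""
    if method in ("webui", "rest", "ssh"):
        # The create right alone covers these non-git-push paths.
        return "create" in acl.permissions
    if method == "git-push":
        # Assumption for the model: pushing a new branch needs push as well.
        return {"create", "push"} <= acl.permissions
    raise ValueError(f"unknown method: {method}")
```

With only `create` granted (what the starlingx-release group has), `can_create_branch(RefAcl({"create"}), "git-push")` is False while the `"rest"` and `"ssh"` paths are True.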
slittle1 | look at release/branch-repo.sh within starlingx/tools.git | 20:34 |
slittle1 | perhaps we had the power a few years ago and lost it during one of the upgrades | 20:35 |
fungi | it looks to me like that script expected the target branch to be precreated in gerrit, and it's pushing the release tags for it? | 20:36 |
fungi | ahh, it's referencing a tag for a local branch | 20:37 |
slittle1 | No, it creates the branch if not found, which was the normal case | 20:39 |
clarkb | https://opendev.org/starlingx/tools/src/branch/master/release/branch-repo.sh#L184-L187 I think that is the key clue? | 20:39 |
clarkb | it does seem to expect SRC_BRANCH and BRANCH to be identical? | 20:40 |
clarkb | oh wait, never mind, I see what that is saying | 20:40 |
clarkb | update the gitreview file if the branch has shifted | 20:40 |
slittle1 | No, they should be different | 20:40 |
clarkb | so ya I'm not sure how that would've ever worked looking at the acl configs for a few starlingx projects | 20:40 |
clarkb | the git push to create a branch requires permissions that are not there | 20:40 |
slittle1 | The stuff you are keying in on was just there to resume a run that failed part way through | 20:41 |
clarkb | however, the permissions that are there allow you to create the branch via a different method | 20:41 |
slittle1 | yep, gotta figure out the new method | 20:41 |
clarkb | too bad dtroyer doesn't appear to be on irc anymore, we could ask :) | 20:43 |
fungi | slittle1: there is also an ssh cli command you can use with the current permissions: https://review.opendev.org/Documentation/cmd-create-branch.html | 20:44 |
fungi | if you're authenticating other things via ssh, that may be the easiest solution | 20:44 |
clarkb | fungi: oh good catch | 20:45 |
slittle1 | ssh -p 29418 review.opendev.org gerrit create-branch > | 20:46 |
slittle1 | ? | 20:46 |
fungi | yep | 20:47 |
fungi | ssh -p 29418 review.opendev.org gerrit create-branch starlingx/tools newbranch deadbeef12345678... | 20:48 |
fungi | relies on the create reference permission, which is what the acl has for the starlingx-release group | 20:49 |
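If a release script needs to call the ssh CLI fungi points to, one option is to build the argument list explicitly so project, branch and revision are never shell-split. A minimal sketch, assuming the host and port shown above; the helper names are hypothetical and not part of any existing starlingx tooling.

```python
import shlex
import subprocess
from typing import List, Optional

GERRIT_HOST = "review.opendev.org"
GERRIT_SSH_PORT = 29418


def create_branch_cmd(project: str, branch: str, revision: str,
                      user: Optional[str] = None) -> List[str]:
    """Build argv for `gerrit create-branch <PROJECT> <NAME> <REVISION>`."""
    target = f"{user}@{GERRIT_HOST}" if user else GERRIT_HOST
    return ["ssh", "-p", str(GERRIT_SSH_PORT), target,
            "gerrit", "create-branch", project, branch, revision]


def create_branch(project: str, branch: str, revision: str,
                  user: Optional[str] = None) -> None:
    """Run the command; raises CalledProcessError if Gerrit refuses it."""
    cmd = create_branch_cmd(project, branch, revision, user)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `create_branch("starlingx/tools", "newbranch", "<sha>")` mirrors fungi's command line; it only works for accounts holding the create reference right on that project.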
clarkb | fungi: maybe we should think about resurrecting https://review.opendev.org/c/opendev/system-config/+/774023 re the gitea sadness earlier today | 20:56 |
clarkb | the trick with that change remains figuring out a viable limit that allows NAT'd users through while also restricting floods | 20:56 |
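The trade-off clarkb mentions comes from keying the limit on source address: many users behind one NAT share an address, so a low cap blocks them, while a high cap barely slows a flood. The toy limiter below illustrates only that mechanism; it is an assumption for illustration and is not how change 774023 implements its limit.

```python
from collections import defaultdict
from typing import DefaultDict


class PerSourceLimiter:
    """Toy cap on concurrent connections per source address (illustrative only)."""

    def __init__(self, max_per_source: int):
        self.max_per_source = max_per_source
        self.active: DefaultDict[str, int] = defaultdict(int)

    def try_acquire(self, addr: str) -> bool:
        """Admit a new connection from `addr` unless it is already at the cap."""
        if self.active[addr] >= self.max_per_source:
            return False
        self.active[addr] += 1
        return True

    def release(self, addr: str) -> None:
        """Mark one connection from `addr` as finished."""
        if self.active[addr] > 0:
            self.active[addr] -= 1
        if self.active[addr] == 0:
            del self.active[addr]
```

Every client behind a shared NAT address draws from the same `max_per_source` budget, which is why picking the number is the hard part.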
slittle1 | is there a create-tag as well ? | 21:00 |
clarkb | slittle1: no, that you git push | 21:01 |
fungi | tags need local key material to sign, so gerrit couldn't technically "create" those anyway | 21:01 |
slittle1 | git push gerrit test_tag_123:test_tag_123 | 21:02 |
slittle1 | ? | 21:02 |
clarkb | the rest api does have a method for that but that must be an unsigned tag | 21:02 |
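For reference, Gerrit's REST API exposes `PUT /projects/{project-name}/tags/{tag-name}` (tags, unsigned as clarkb notes) and `PUT /projects/{project-name}/branches/{branch-id}` (branches). The sketch below only builds the requests; it assumes the `/a/` authenticated prefix with an HTTP password, and the helper names are hypothetical. Project names containing `/` must be URL-encoded, and Gerrit prefixes JSON responses with `)]}'`.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

GERRIT_URL = "https://review.opendev.org"
JSON_HEADERS = {"Content-Type": "application/json; charset=UTF-8"}


def _enc(name: str) -> str:
    # "starlingx/tools" must travel as "starlingx%2Ftools".
    return urllib.parse.quote(name, safe="")


def create_branch_request(project: str, branch: str, revision: str) -> urllib.request.Request:
    url = f"{GERRIT_URL}/a/projects/{_enc(project)}/branches/{_enc(branch)}"
    body = json.dumps({"revision": revision}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT", headers=dict(JSON_HEADERS))


def create_tag_request(project: str, tag: str, revision: str,
                       message: Optional[str] = None) -> urllib.request.Request:
    payload = {"revision": revision}
    if message:
        payload["message"] = message  # annotated, but still not GPG-signed
    url = f"{GERRIT_URL}/a/projects/{_enc(project)}/tags/{_enc(tag)}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT", headers=dict(JSON_HEADERS))


def add_basic_auth(req: urllib.request.Request, user: str, http_password: str) -> None:
    token = base64.b64encode(f"{user}:{http_password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")


def parse_gerrit_json(raw: bytes) -> Any:
    # Strip Gerrit's anti-XSSI prefix before decoding.
    text = raw.decode("utf-8")
    if text.startswith(")]}'"):
        text = text[4:]
    return json.loads(text)
```

Sending is then `urllib.request.urlopen(req)` after `add_basic_auth(...)`; since starlingx-release may only push signed tags, a REST-created tag would presumably still need a matching ACL grant.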
clarkb | `git push gerrit tag test_tag_123` iirc | 21:03 |
clarkb | "tag <tag> means the same as refs/tags/<tag>:refs/tags/<tag>" from the manpage for git push | 21:03 |
clarkb | so that is shorthand for `git push gerrit refs/tags/test_tag_123:refs/tags/test_tag_123` | 21:03 |
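The `tag <tag>` shorthand quoted from the git push manpage is a purely mechanical expansion, which the small function below reproduces. It is an illustrative sketch of that one rule, not git's own argument parser.

```python
from typing import List


def expand_push_args(args: List[str]) -> List[str]:
    """Expand `tag <name>` into `refs/tags/<name>:refs/tags/<name>`; keep other args."""
    out: List[str] = []
    i = 0
    while i < len(args):
        if args[i] == "tag":
            if i + 1 >= len(args):
                raise ValueError("'tag' must be followed by a tag name")
            name = args[i + 1]
            out.append(f"refs/tags/{name}:refs/tags/{name}")
            i += 2
        else:
            out.append(args[i])
            i += 1
    return out
```

So `expand_push_args(["tag", "test_tag_123"])` yields the single refspec `refs/tags/test_tag_123:refs/tags/test_tag_123`, matching clarkb's long form.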
slittle1 | git push gerrit tag test_tag_123 | 21:04 |
slittle1 | ! [remote rejected] test_tag_123 -> test_tag_123 (prohibited by Gerrit: not permitted: create) | 21:04 |
fungi | git push gerrit test_tag_123 | 21:04 |
fungi | but also it needs to be a signed tag | 21:04 |
fungi | starlingx-release has permission to push signed tags, not unsigned tags | 21:04 |
fungi | https://docs.opendev.org/opendev/infra-manual/latest/drivers.html#tagging-a-release | 21:05 |
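Since starlingx-release may push signed tags only, the tag has to be created locally with a GPG key first. A minimal sketch of the two commands, assuming a `gerrit` remote and a configured signing key; the helper name is hypothetical and the linked infra-manual section remains the authoritative procedure.

```python
from typing import List


def signed_tag_cmds(tag: str, message: str, commit: str = "HEAD",
                    remote: str = "gerrit") -> List[List[str]]:
    """Return argv lists that create a GPG-signed tag and push it."""
    return [
        # -s signs with the default GPG key; -m supplies the tag message.
        ["git", "tag", "-s", tag, "-m", message, commit],
        # Push only this tag (same as refs/tags/<tag>:refs/tags/<tag>).
        ["git", "push", remote, "tag", tag],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order; remember clarkb's warning that a pushed tag cannot be deleted afterwards.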
clarkb | Note you won't be able to delete that tag if you push it | 21:05 |
clarkb | (not sure how much you'll care about that) | 21:05 |
clarkb | as a side note, I think git has the ability to tell you where a branch diverged from its parent. If that is what this tag is for it may not be necessary (though perhaps simpler to just check the tag value than inspect history) | 21:10 |
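The git feature clarkb alludes to is `git merge-base <a> <b>`, which prints the best common ancestor of two commits, i.e. where a release branch split from its parent. A small wrapper sketch; the branch names in the usage note are hypothetical.

```python
import subprocess
from typing import List


def merge_base_cmd(branch: str, parent: str) -> List[str]:
    """argv for `git merge-base`, which prints the common ancestor's SHA."""
    return ["git", "merge-base", branch, parent]


def divergence_point(branch: str, parent: str, repo: str = ".") -> str:
    """Return the commit where `branch` diverged from `parent` in `repo`."""
    result = subprocess.run(merge_base_cmd(branch, parent), cwd=repo,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example `divergence_point("origin/r/stx.5.0", "origin/master")` (hypothetical names) gives the branch point without needing a marker tag, though reading a tag may still be simpler for scripts.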
fungi | spelunking through the infra-manual history, back in 2014 we added instructions for openstack stable branch creation which look like they involved openstack's release managers using git push to create them (but i don't recall that ever working unless they were also administrators). unfortunately then in march of last year a change was made to the manual copying this option into the general section on branch | 21:25 |
fungi | creation | 21:25 |
clarkb | The only time I can remember promoting release managers was for deletions not creations | 21:26 |
fungi | so up until roughly a year ago, we at least had a sentence in the manual saying branch creation by git push isn't expected to work, but that was lost in the branch creation section refactor | 21:29 |
clarkb | fungi: https://zuul.opendev.org/t/openstack/build/41126504fb3a4359ad6d282e76ef091a/log/job-output.txt#1589-1600 looks like it may be working to set the project parent | 21:31 |
clarkb | we should double check that jeepyb is being run multiple times against that project to ensure the cache update is working. We may also want to start inspecting the resulting state a bit better | 21:32 |
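One way to set and then inspect a project's parent uses Gerrit's own interfaces: the `gerrit set-project-parent --parent <NAME> <PROJECT>` ssh command, and `GET /projects/{project-name}/parent`, which returns the parent name as a JSON string. This sketch is an assumption about how such a check could look; it is not the gerritlib function proposed in 786500.

```python
import json
from typing import List


def set_parent_cmd(project: str, parent: str, host: str = "review.opendev.org",
                   port: int = 29418) -> List[str]:
    """argv for Gerrit's ssh `set-project-parent` command."""
    return ["ssh", "-p", str(port), host, "gerrit", "set-project-parent",
            "--parent", parent, project]


def parse_parent_response(raw: bytes) -> str:
    """Decode the body of GET /projects/{name}/parent, e.g. )]}'\\n"All-Projects"."""
    text = raw.decode("utf-8")
    if text.startswith(")]}'"):
        text = text[4:]  # drop Gerrit's anti-XSSI prefix
    return json.loads(text)
```

Comparing `parse_parent_response(...)` against the expected parent after each jeepyb run would confirm the state directly instead of relying on log output.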
fungi | it was in the feature branch section before: "To get started with a feature branch you will need to create the new branch in Gerrit with the 'feature/' prefix. Note that Gerrit ACLs do not allow for pushing of new branches via git, but specific groups of Gerrit users can create new branches." | 21:32 |
fungi | added by https://review.openstack.org/138200 in 2014 | 21:32 |
fungi | strangely the git push recommendation for creating proposed/.* branches was added to the manual that same month by https://review.openstack.org/138206 | 21:35 |
fungi | clarkb: i don't think 774023 would have helped much today since it was lots of different addresses spread out over the entire cluster... i expect they were each only doing a connection or two at a time, but retrying after not getting any response | 21:54 |
clarkb | I see so not one doing a bunch of concurrent requests * bignumber but one doing one request * big number | 21:55 |
fungi | however the git-upload-pack requests were probably continuing to be processed by the backend even after the client disconnected and retried | 21:55 |
fungi | once it gets into a sad state, it likely starts a chain reaction which won't end until the client gives up and stops retrying to clone | 21:56 |
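fungi's explanation implies the backend load scales with how long abandoned git-upload-pack work keeps running, not with per-client concurrency. A toy steady-state estimate, assuming each client has one request open at a time and retries on a fixed interval while the server keeps processing every abandoned attempt; all numbers are hypothetical.

```python
import math


def backend_in_flight(clients: int, retry_interval: float, service_time: float) -> int:
    """Concurrent backend requests when abandoned attempts keep running.

    Each client starts a new attempt every `retry_interval` seconds and each
    attempt occupies the backend for `service_time` seconds, so roughly
    ceil(service_time / retry_interval) attempts per client overlap.
    """
    if clients < 0 or retry_interval <= 0 or service_time < 0:
        raise ValueError("invalid parameters")
    if service_time == 0:
        return 0
    return clients * math.ceil(service_time / retry_interval)
```

With 100 clients, a 10 s retry interval and 60 s of backend work per attempt, the backend handles 600 concurrent requests even though every client only ever holds one connection, which is why a per-source limit like 774023 would not have caught it. In reality service time would also grow under that load, making the estimate optimistic.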
clarkb | fungi: https://review.opendev.org/c/opendev/zone-opendev.org/+/786484 do you think we can land that one in prep for using those new zk servers sometime soon (kinda sounds like airship might be delaying more ...) | 22:00 |
fungi | lgtm, approved it | 22:04 |
clarkb | thanks | 22:05 |
clarkb | the WIP change to do the swapout of the first node passes testing now too | 22:06 |
openstackgerrit | Merged opendev/zone-opendev.org master: Add new zookeeper servers to DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/786484 | 22:07 |
openstackgerrit | Clark Boylan proposed opendev/gerritlib master: Add function to set project parent https://review.opendev.org/c/opendev/gerritlib/+/786500 | 22:22 |
clarkb | fungi: ^ that adds a bit more testing confirmation of state we want. I've rechecked the jeepyb change too | 22:23 |
fungi | git status | 22:29 |
fungi | heh, you're not my command shell! | 22:29 |
clarkb | on branch irc | 22:29 |
clarkb | Your branch is up to date with freenode/irc | 22:29 |
fungi | nice | 22:29 |
openstackgerrit | Jeremy Stanley proposed opendev/infra-manual master: Update branch creation for PolyGerrit https://review.opendev.org/c/opendev/infra-manual/+/786512 | 22:29 |
fungi | capturing the earlier conversation while it's fresh in our minds ^ | 22:31 |
*** tosky has quit IRC | 22:34 | |
clarkb | +2 | 22:34 |
openstackgerrit | Clark Boylan proposed opendev/gerritlib master: Add function to set project parent https://review.opendev.org/c/opendev/gerritlib/+/786500 | 22:53 |
*** gothicserpent has quit IRC | 23:02 | |
*** cenne is now known as cenne|out | 23:08 | |
clarkb | cool ^ that fails the way I expect it to now. The jeepyb side should pass | 23:12 |
clarkb | hrm jeepyb fails with the same error | 23:13 |
openstackgerrit | Clark Boylan proposed opendev/gerritlib master: Add function to set project parent https://review.opendev.org/c/opendev/gerritlib/+/786500 | 23:15 |
*** gothicserpent has joined #opendev | 23:16 | |
clarkb | and the latest batch of testing hit the dockerhub rate limit. Might be time to call it a day | 23:26 |
fungi | dockerhub sounds the shift change whistle | 23:31 |
fungi | i've still gotta stick around to at least see the end of the tc vacancy poll through | 23:32 |
fungi | but that's just in a little over 10 minutes at this point | 23:32 |