opendevreview | Merged openstack/project-config master: Add charms-stable-maint group for charms projects https://review.opendev.org/c/openstack/project-config/+/872406 | 00:28 |
opendevreview | Merged openstack/project-config master: Retire puppet-tacker - Step 1: End project Gating https://review.opendev.org/c/openstack/project-config/+/874539 | 00:29 |
fungi | clarkb: ^ you were wanting to see successive project-config changes and how checkouts impacted things at deployment | 00:30 |
fungi | i approved a few | 00:30 |
opendevreview | Merged openstack/project-config master: Periodically update Puppetfile_unit https://review.opendev.org/c/openstack/project-config/+/875302 | 00:31 |
clarkb | thanks. I think there is a weird interaction where the job fails successfully or something in the rename case though? it's definitely something I want to dig into to understand better and probably document | 00:32 |
opendevreview | Merged openstack/project-config master: Add the main NebulOuS repos https://review.opendev.org/c/openstack/project-config/+/876054 | 00:32 |
opendevreview | Merged openstack/project-config master: Add Ironic Dashboard charm to OpenStack charms https://review.opendev.org/c/openstack/project-config/+/876205 | 00:32 |
clarkb | Tomorrow I'll look at finishing up the gitea05-07 deletions | 00:36 |
clarkb | I don't expect anyone has stashed anything they need on those servers but infra-root consider this your warning | 00:37 |
fungi | i definitely haven't | 00:37 |
opendevreview | Merged openstack/project-config master: Add the NebulOuS tenant https://review.opendev.org/c/openstack/project-config/+/876414 | 01:43 |
fungi | yoctozepto: ^ that's deployed | 01:57 |
fungi | https://zuul.opendev.org/t/nebulous/jobs | 01:58 |
fungi | just inherited stuff for now, but it's there and ready for next steps | 01:59 |
opendevreview | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/877057 | 02:15 |
ianw | i took the liberty of adding yoctozepto to nebulous-core | 02:41 |
fungi | oh, good thinking! | 02:41 |
fungi | ianw: nebulous-project-config-core was made as a separate group too | 02:43 |
ianw | ok i added to that too :) luckily i still have the admin console up from pushing changes yesterday | 02:44 |
fungi | thanks! | 02:45 |
opendevreview | Merged openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/877057 | 02:54 |
WhoisJMH | hello, I have a question. In an openstack environment built using devstack on Ubuntu 20.04, the existing instances were created and are operating well without problems. But when I try to create a new instance, the message "Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance" appears and the creation fails. Although it is a single-node environment, the server has enough resources for cpu, ram, | 07:12 |
WhoisJMH | Which part should I check first to solve this problem? | 07:12 |
yoctozepto | WhoisJMH: hi, the nova-compute logs will be the best place to look for the reason for the rejection; just note this channel is not devoted to openstack support, please go to #openstack for further queries | 07:38 |
yoctozepto | ianw, fungi, clarkb, frickler: thanks for all your feedback on the new project+tenant and for merging that; I will proceed with setting up the tenant today and let you know how it goes | 07:43 |
*** jpena|off is now known as jpena | 07:46 | |
yoctozepto | just one last request for now - please also add me to nebulous-release ;D | 07:57 |
ianw | yoctozepto: done :) | 08:22 |
yoctozepto | ianw: many thanks | 08:23 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Provide ensure-microshift role https://review.opendev.org/c/zuul/zuul-jobs/+/876081 | 09:00 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Provide ensure-microshift role https://review.opendev.org/c/zuul/zuul-jobs/+/876081 | 09:10 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Provide ensure-microshift role https://review.opendev.org/c/zuul/zuul-jobs/+/876081 | 09:31 |
bbezak | Hi, I'm having quite a lot of network connectivity issues - only involves 'provider: inmotion-iad3' ones. | 10:52 |
bbezak | Interestingly it happens towards tarballs.openstack.org. On both ubuntu (focal, jammy) and centos stream 8 jobs. More often than not, I'm afraid (but I saw good runs too on occasion on iad3, just less often). I haven't seen those issues on the 'rax' provider for instance - https://paste.opendev.org/raw/btENz9poC0tQ0p3t7Hny/ | 10:52 |
bbezak | by the look of it, it started yesterday | 10:52 |
fungi | bbezak: looks like it's not just tarballs.o.o, the first failure i pulled up was complaining about reaching the releases site: https://zuul.opendev.org/t/openstack/build/3a9c0f69727f47ba8e7747eba3f2d678/log/primary/ansible/tenks-deploy#2030-2036 | 12:36 |
fungi | but still from a node in inmotion-iad3 | 12:37 |
bbezak | I've seen issues with releases as well. But not in last several runs, so I didn't mention it | 12:37 |
fungi | well, it helps to know that there's more than one site the jobs are having trouble reaching from there | 12:40 |
fungi | and the nodes in that region are ipv4-only so we can rule out ipv6-related issues | 12:40 |
bbezak | however those are resolving to the same static01.opendev.org fungi | 12:43 |
bbezak | (at least from my end) | 12:44 |
fungi | oh, yes that's a good point, they're different sites on the same vm | 12:47 |
fungi | anyway, i'm checking for connectivity issues between that provider region and those sites | 12:48 |
bbezak | thx fungi | 12:48 |
fungi | not seeing any packet loss at the moment | 12:50 |
bbezak | it just failed on 173.231.253.119 fungi | 13:16 |
bbezak | it got 200 on https://tarballs.openstack.org/ironic-python-agent/tinyipa/files/tinyipa-stable-xena.vmlinuz.sha256, but got Network is unreachable on https://tarballs.openstack.org/ironic-python-agent/tinyipa/files/tinyipa-stable-xena.vmlinuz | 13:18 |
fungi | yeah, whatever it is, it's clearly intermittent | 13:19 |
bbezak | yeah, "the best" kind | 13:20 |
fungi | could be an overloaded router in that provider's core network and only some flows are getting balanced through it, for example | 13:20 |
fungi | i'm still trying to reproduce connectivity errors with lower-level tools | 13:20 |
fungi | could also be farther out on the internet in some backbone provider | 13:22 |
fungi | the route between those environments is, unfortunately, asymmetric, so it will be harder to track down if so | 13:22 |
fungi | looks like from inmotion to rackspace (where the static server resides) both providers peer with zayo, while in the other direction they both peer with level3 | 13:24 |
fungi | going through zayo it transits their atl facility to get from iad to dfw, though the level3 hop between dfw and iad is not identifying itself currently | 13:27 |
fungi | mtr from rackspace to inmotion is recording around 0.2-0.3% packet loss at the moment | 13:28 |
fungi | not seeing any in the other direction, which is strange, but maybe just not a statistically significant volume of samples yet | 13:29 |
bbezak | ok | 13:31 |
fungi | bbezak: one thing to keep in mind, jobs shouldn't normally need to fetch urls like https://releases.openstack.org/constraints/upper/yoga since they can access the same constraints file from the openstack/requirements repository checkout provided on the test node | 13:32 |
fungi | and we could look into baking the tinyipa kernels into our node images in order to reduce further traffic across the internet, or add the tarballs site to our mirrors in all providers (they're both backed by data in afs, so it would just be a matter of adding a path in the apache vhost to expose that to clients) | 13:34 |
fungi | making connections across the internet in a job should be avoided whenever possible (though we perform some brief internet connectivity tests in pre-run for all jobs in order to weed out test nodes with obviously bad internet connections) | 13:38 |
bbezak | yeah, that makes sense, we have the var for requirements_src_dir already in the job, so it shouldn't be difficult to override it for the CI only | 13:40 |
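For illustration, a minimal sketch of that kind of override, assuming the job already consumes a requirements_src_dir variable as bbezak mentions (the job name and the constraints variable name here are hypothetical, not taken from the actual role):

```yaml
# Hypothetical job override: read upper-constraints from the
# openstack/requirements checkout Zuul provides on the test node instead of
# fetching https://releases.openstack.org/constraints/upper/yoga remotely.
- job:
    name: example-deploy-job
    required-projects:
      - openstack/requirements
    vars:
      requirements_src_dir: "{{ ansible_user_dir }}/src/opendev.org/openstack/requirements"
      # hypothetical variable the deploy role would pass to pip as a constraints file
      upper_constraints_file: "{{ requirements_src_dir }}/upper-constraints.txt"
```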
fungi | setting up some mtr runs from montreal and san jose to static.o.o as well for a baseline | 13:40 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Provide ensure-microshift role https://review.opendev.org/c/zuul/zuul-jobs/+/876081 | 13:57 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Provide ensure-microshift role https://review.opendev.org/c/zuul/zuul-jobs/+/876081 | 14:03 |
opendevreview | daniel.pawlik proposed zuul/zuul-jobs master: Provide ensure-microshift role https://review.opendev.org/c/zuul/zuul-jobs/+/876081 | 14:48 |
fungi | mtr wasn't turning up any packet loss to static.o.o from other providers either, and the 0.3% loss i initially saw from there to inmotion dropped to 0.2%, then to 0.1% and eventually 0.0%, so it seems there may have been a very brief blip early in the mtr run but that's it | 15:02 |
fungi | i'm currently downloading tinyipa-stable-xena.vmlinuz on a machine in inmotion-iad3 in a loop with a 1-second delay, trying to get it to fail | 15:04 |
fungi | over 2k downloads so far with no failures | 16:02 |
fungi | whatever the issue, i don't think it's steady, must come and go in small bursts | 16:03 |
clarkb | as expected the gitea13 and 14 replication continues this morning | 16:15 |
clarkb | it's a bit more than halfway done | 16:15 |
clarkb | fungi: sounds like typical internet behavior | 16:17 |
fungi | yeah, close to 3k successful downloads and no failures. i'm going to stop the loop before i waste any more bandwidth | 16:17 |
fungi | though i do think exposing the tarballs afs volume on our mirrors might be useful for some stuff like the ipa kernel downloads | 16:18 |
clarkb | as far as adding the tinyipa image to test nodes, the main struggle there is you end up with a bunch of versions and no one knows when it is safe to remove them. If we do that I think we should explicitly state we can do latest and latest-1, and then older versions which are used less often can continue to be fetched remotely. This is basically what we're moving towards with cirros | 16:18 |
clarkb | oh ya simply making use of our afs caches isn't a bad idea | 16:18 |
fungi | speaking of cirros, should i go ahead and self-approve 873735? it's been about a month with no objections | 16:19 |
clarkb | I've got no objections though I worry it may disrupt the openstack release somehow (the latest 6.1 version isn't used anywhere because it changes dhcp clients and tempest doesn't know how to interact with it or something to check dhcp things are working) | 16:20 |
clarkb | but I think 5.2 is used and not 5.1? | 16:21 |
fungi | yeah, i'll keep it on the back burner until post-release | 16:22 |
fungi | good call | 16:23 |
yoctozepto | infra-root: I think I need your help with merging this initial change: https://review.opendev.org/c/nebulous/project-config/+/877107 | 16:25 |
yoctozepto | or could you help me set it up to allow me to merge things on demand from gerrit? | 16:25 |
fungi | if it's what i think it is (haven't looked yet), yes there's a bootstrapping step where manual merging is needed to add a base job | 16:26 |
yoctozepto | (in case we break this base config in the future) | 16:26 |
yoctozepto | fungi: yeah, it's adding the noop job to the nebulous/project-config repo as well as pipelines | 16:26 |
yoctozepto | based on opendev/project-config | 16:26 |
clarkb | I think what you've got is correct. Add pipelines and a noop job | 16:27 |
clarkb | then you can land changes from there | 16:27 |
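For reference, such a bootstrap change typically boils down to something like the following sketch (pipeline set trimmed to one example; this is illustrative, not the literal content of 877107):

```yaml
# A minimal check pipeline copied in spirit from opendev/project-config
# (the gate pipeline and others are omitted here for brevity).
- pipeline:
    name: check
    manager: independent
    trigger:
      gerrit:
        - event: patchset-created
    success:
      gerrit:
        Verified: 1
    failure:
      gerrit:
        Verified: -1

# Gate the config-project itself with only the built-in noop job.
- project:
    name: nebulous/project-config
    check:
      jobs:
        - noop
    gate:
      jobs:
        - noop
```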
fungi | adding verified+2 and submit perms in the project-config repo for a special group might make sense... infra-root: ^ opinions? | 16:27 |
clarkb | and ya that would need a gerrit admin to apply a verified +2 and hit submit | 16:27 |
yoctozepto | btw, the CI/CD side will be Apache 2.0 licensed; the project itself will be MPL 2.0 because that is what we have in the grant agreement | 16:28 |
fungi | definitely a line to walk between risk for the user and needing to involve our gerrit admins more often | 16:28 |
clarkb | fungi: I seem to recall there was some consideration for that in the past. | 16:28 |
clarkb | I want to say it implies a higher level of trust than what is limited to the tenant for some reason but I may be misremembering | 16:28 |
yoctozepto | for one, not many people will be allowed to approve anything in that repo | 16:29 |
yoctozepto | likely just me and some other person that we have not found yet | 16:29 |
clarkb | but ya ultimately if they can land changes normally then allowing them to bypass ci is not much extra | 16:30 |
clarkb | I'm willing to give it a go | 16:30 |
yoctozepto | thanks | 16:30 |
yoctozepto | what should I do? | 16:30 |
clarkb | yoctozepto: it will require an acl update to give some group verified -2/+2 perms and allow the submit button | 16:30 |
yoctozepto | ook | 16:31 |
clarkb | then instructing that group to do their best to avoid relying on those perms and only perform the actions when you can't get around zuul being stuck due to the config you are trying to update | 16:31 |
yoctozepto | so verified +2 I think I know how to do | 16:31 |
clarkb | this situation is an example of that | 16:31 |
yoctozepto | but the submit button | 16:31 |
yoctozepto | :-) | 16:31 |
clarkb | yoctozepto: once the necessary votes are applied the button shows up in the top left panel of the change | 16:32 |
clarkb | next to rebase/abandon/edit | 16:32 |
clarkb | you apply the required votes, then click the button | 16:32 |
yoctozepto | oook, I see you mean top-right | 16:32 |
yoctozepto | then I will reconfigure the group to allow V+2 | 16:33 |
clarkb | oh sorry yes | 16:33 |
fungi | yoctozepto: one thing you might consider is having separate admin accounts you add to your administrative group, it's what we (opendev sysadmins) do for our gerrit admin access in order to minimize risk of accidentally doing something we didn't mean to over the course of normal use of the system or unnecessarily exposing the more privileged account to compromise | 16:36 |
fungi | also lp/uo sso 2fa is highly recommended | 16:36 |
yoctozepto | fungi: thanks for the hints! I think we are less impactful so a separate account is overkill, but it surely would be handy to disallow non-2FA logins going forward | 16:45 |
clarkb | unfortunately I don't think we're able to control that via gerrit | 16:46 |
fungi | right, well what you can do is make sure you have 2fa set up for the account(s) you use | 16:47 |
clarkb | yup and you can configure your account to require 2fa, but I don't know that we can enforce it on the service side with the way things are currently implemented | 16:48 |
fungi | and you can always separate your roles into multiple accounts later pretty easily since you control the group membership anyway, so nothing you need to decide right now | 16:48 |
opendevreview | Radosław Piliszek proposed openstack/project-config master: Allow nebulous-project-config-core to add V+2 https://review.opendev.org/c/openstack/project-config/+/877108 | 16:52 |
yoctozepto | clarkb, fungi: yeah, I meant for other project members to also be good citizens with 2FA :D but thankfully nothing makes it obligatory for us so it's good as it is atm | 16:53 |
yoctozepto | anyways, the change is up ^ | 16:53 |
clarkb | yoctozepto: yup just reviewed | 16:54 |
opendevreview | Radosław Piliszek proposed openstack/project-config master: Allow nebulous-project-config-core to add V+2 https://review.opendev.org/c/openstack/project-config/+/877108 | 16:56 |
yoctozepto | clarkb: fixed&replied | 16:56 |
yoctozepto | q if you are sure about the "submit =" part | 16:57 |
yoctozepto | because nothing else has it | 16:57 |
yoctozepto | happy to oblige otherwise | 16:57 |
clarkb | yoctozepto: I'm pretty sure. Yesterday when we were testing things and adding all the +2 votes the submit button would show up but was greyed out because you also need explicit submit perms. If you look in system-config/doc/source/gerrit.rst you'll see where we document that only zuul and the project creation tool have it by default | 16:57 |
clarkb | fungi can double check | 16:58 |
yoctozepto | ok, you are right, I feel convinced | 16:58 |
opendevreview | Radosław Piliszek proposed openstack/project-config master: Allow nebulous-project-config-core to add V+2 https://review.opendev.org/c/openstack/project-config/+/877108 | 16:58 |
opendevreview | Radosław Piliszek proposed openstack/project-config master: Allow nebulous-project-config-core to submit changes https://review.opendev.org/c/openstack/project-config/+/877108 | 16:59 |
yoctozepto | clarkb, fungi: all done ^ | 16:59 |
yoctozepto | even adapted the commit message | 17:00 |
yoctozepto | fingers crossed | 17:00 |
*** jpena is now known as jpena|off | 17:02 | |
corvus | yoctozepto: clarkb fungi i think we can remove these permissions. this is a one-time event | 17:04 |
yoctozepto | corvus: unless I manage to break the zuul config there and then come here begging for help ;D | 17:05 |
corvus | currently both opendev and zuul tenants have only noop jobs configured for their config-projects, so it's exceedingly unlikely that further involvement from infra-root would be needed | 17:05 |
fungi | well, it's a one-time event until someone accidentally merges breakage to the base job and then gerrit admins need to step in again | 17:05 |
corvus | yoctozepto: well, that's part of it... | 17:05 |
fungi | ahh, good point with noop | 17:05 |
clarkb | ya I explicitly noted that you can stick with noop and non voting in my +1 review | 17:05 |
corvus | you can't break the tenant if the config project is gated with noop. but you can break it if you have submit perms | 17:06 |
clarkb | basically the change does what fungi suggested earlier, but I wanted more feedback and gave alternatives | 17:06 |
yoctozepto | what if I break the pipelines? | 17:06 |
yoctozepto | :D | 17:06 |
yoctozepto | I mean | 17:06 |
yoctozepto | I can only really ever break pipelines there | 17:06 |
yoctozepto | as it will, like, 99.9% stay "tested" with noop | 17:06 |
corvus | it would be exceedingly hard to break the pipelines if the config-project is gated | 17:07 |
corvus | it is easy to break them if it is not | 17:07 |
fungi | note i wasn't necessarily suggesting this, but asking what others thought about the tradeoffs | 17:07 |
clarkb | I think pipelines are unlikely to change often, and considering that other project configs haven't resorted to this I think I'm coming around to corvus' reasoning and we can try it with noop for now | 17:07 |
clarkb | fungi: ack | 17:07 |
corvus | okay sorry i saw a flurry of changes and messages and am not 100% sure what the current status is | 17:07 |
yoctozepto | ok, so someone just merge this for me and I abandon the extra perms | 17:07 |
yoctozepto | I don't mind either way for now :D | 17:07 |
corvus | so we're at "consider adding perms", not "we just added perms"; that's cool, then i'm jumping into the conversation about evaluating what to do :) | 17:08 |
yoctozepto | :D | 17:08 |
clarkb | corvus: yup the change has not merged yet. Just at the point where a change that does that has been proposed and is in early review | 17:08 |
corvus | yoctozepto: anyway, not trying to make things hard, and if it becomes a problem, i'm not opposed to more perms in principle. i think that not having perms is sufficient and the most safe, and should not actually block you | 17:08 |
yoctozepto | I agree, I don't like exceptions in my stuff either | 17:09 |
clarkb | the main trick would be avoiding gate jobs that vote, or making sure that if they vote they always return success | 17:09 |
yoctozepto | if I never break the change and gate pipelines, then I should be fine, right? | 17:09 |
fungi | i'm also not against helping bypass zuul to merge things on rare occasions where there are no alternative solutions, mainly just want to avoid it becoming a frequent activity | 17:09 |
yoctozepto | as in | 17:09 |
yoctozepto | if I misconfigure some other pipeline | 17:09 |
corvus | yeah, and we've never seen fit to add anything other than noop to the opendev or zuul tenants, so i would expect the same for nebulous too | 17:09 |
yoctozepto | ++ | 17:09 |
corvus | yoctozepto: yes -- and if you follow the pattern in the opendev or zuul tenants, hopefully only gate would matter. | 17:10 |
clarkb | eg no clean check requirement | 17:10 |
yoctozepto | ah, right | 17:10 |
yoctozepto | that's true | 17:10 |
yoctozepto | so I can even break the check, sweet | 17:11 |
yoctozepto | :D | 17:11 |
yoctozepto | let there be havoc | 17:11 |
corvus | yoctozepto: consider carefully which tenants to base your pipelines on. opendev and zuul tenants do not have a clean check requirement, which means only gate is needed to work (and merging changes can be much faster); openstack has a clean check requirement (because people don't always follow best practices) | 17:11 |
yoctozepto | corvus: yeah, I went for the quicker way for now and will see how our partners behave | 17:12 |
yoctozepto | in the worst case, it will be the openstack way | 17:12 |
yoctozepto | which is not bad | 17:12 |
yoctozepto | just slower for some stuff | 17:12 |
corvus | yoctozepto: i think it's great to start with no clean check and add only if needed | 17:13 |
clarkb | ++ | 17:13 |
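For context, a clean check requirement is implemented as a gate pipeline requirement on a prior Verified vote from Zuul; a rough sketch of that shape, based on the openstack tenant's approach (details may differ from the actual pipeline config):

```yaml
- pipeline:
    name: gate
    manager: dependent
    require:
      gerrit:
        open: True
        current-patchset: True
        approval:
          # the "clean check" part: only enqueue changes that already carry a
          # passing Verified vote from Zuul's check run
          - Verified: [1, 2]
            username: zuul
          - Workflow: 2
    # trigger and reporter configuration omitted for brevity
```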
yoctozepto | happy to hear that I am making the blessed choices :-) | 17:13 |
yoctozepto | soo | 17:14 |
yoctozepto | I think we have a consensus | 17:14 |
yoctozepto | let me abandon the extra perms | 17:14 |
yoctozepto | and some of you merge me that nebulous/project-config change | 17:14 |
yoctozepto | https://review.opendev.org/c/nebulous/project-config/+/877107 | 17:15 |
corvus | infra-root: i can do the force-merge -- i think that's probably a non-controversial action that i could do immediately? | 17:19 |
fungi | corvus: thanks! i have no objection | 17:20 |
clarkb | corvus: yup as long as the pipeline config doesn't look broken I guess. But if it is functional enough to land a followup that isn't a big deal | 17:20 |
corvus | #status log force-merged https://review.opendev.org/877107 to bootstrap nebulous tenant | 17:22 |
opendevstatus | corvus: finished logging | 17:22 |
corvus | yoctozepto: https://zuul.opendev.org/t/nebulous/status | 17:23 |
yoctozepto | thanks, corvus | 17:23 |
yoctozepto | it's a verbatim copy of opendev/project-config now | 17:24 |
yoctozepto | I think I made it explicit in the commit message | 17:24 |
opendevreview | John L. Villalovos proposed openstack/diskimage-builder master: chore: support building Fedora on arm64 AKA aarch64 https://review.opendev.org/c/openstack/diskimage-builder/+/877112 | 17:32 |
fungi | clarkb: what do you think about further increases to the launch timeout in rax-ord? it looks like whenever we have a spike in node requests, we end up with lots of launch timeouts there even with the timeout increased to 15 minutes, but the instances do seem to eventually boot after nodepool gives up waiting | 17:38 |
fungi | my concern is that the longer we make the timeout, the longer some jobs will spend waiting for node requests to be filled | 17:38 |
clarkb | fungi: I suspect that reducing the max-servers count may result in better throughput? | 17:39 |
fungi | if we had some way to limit the number of nodes booting in parallel that might help, since the cloud does appear to be capable of handling a large number of nodes once they've booted | 17:39 |
clarkb | that will reduce the size of potential rushes there and may keep us booting nodes in a reasonable time frame | 17:39 |
clarkb | fungi: ooh that's a good idea, but I'm not sure we have support for that yet? | 17:40 |
fungi | it's the region in rax where we have the largest quota, but we've already reduced max-servers there by half (so it's now ~2/3 of the other two rax regions) | 17:40 |
clarkb | one problem with increasing the timeout is that we retry 3 times too | 17:41 |
clarkb | so in the worst case a job may wait 3 * timeout | 17:41 |
fungi | it did represent 40% of our theoretical rax capacity, now it's 25% | 17:41 |
clarkb | maybe what we should do is raise the timeout and not allow retries in that region | 17:41 |
clarkb | that should also help with the thundering herd problem since we won't retry so much | 17:42 |
clarkb | (maybe, if we are at capacity chances are another request will show up soon enough though) | 17:42 |
clarkb | I think that is what I would do. don't allow retries and increase timeout a bit | 17:42 |
fungi | we're able to control retries independently per provider? i'll give that a shot | 17:43 |
clarkb | I think we are | 17:43 |
fungi | launch-retries is per provider, yep | 17:44 |
fungi | we currently don't set it for any provider and take the default from nodepool | 17:44 |
fungi | https://zuul-ci.org/docs/nodepool/latest/openstack.html#attr-providers.[openstack].launch-retries | 17:45 |
fungi | slightly misleadingly named/documented, since it's the number of times to try, not the number of times to retry | 17:45 |
fungi | so we want it set to 1 for a single try (i.e. no retries) | 17:45 |
clarkb | fungi: and also maybe we want to increase the api rate limit | 17:48 |
clarkb | but that impact is unlikely to matter much | 17:49 |
fungi | you mean the delay between api calls? already did by an order of magnitude in an earlier change but can do it some more | 17:49 |
clarkb | (it would slow down boots when a thundering herd happens, just not much compared to the timeouts) | 17:49 |
clarkb | ya it's still 100 requests a second | 17:49 |
clarkb | we may want 1 a second? I dunno | 17:49 |
fungi | worth a try i guess | 17:49 |
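For illustration, those knobs all live in the nodepool provider configuration; a rough sketch of the direction being discussed, with example values rather than the ones in the actual 877113 change:

```yaml
providers:
  - name: rax-ord
    rate: 1.0             # seconds to wait between API calls, i.e. roughly 1 per second
    launch-timeout: 1800  # example: give slow boots longer before giving up
    launch-retries: 1     # a single attempt, i.e. no retries
```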
opendevreview | Jeremy Stanley proposed openstack/project-config master: Further increase rax-ord launch timeout no retries https://review.opendev.org/c/openstack/project-config/+/877113 | 17:53 |
fungi | if it ends up helping, maybe we can try turning the max-servers there back up some | 17:54 |
clarkb | corvus: ^ that may interest you from a general nodepool functionality perspective | 17:55 |
fungi | it's a fairly pathological case though, not sure how many knobs for dealing with a situation like that make sense | 17:59 |
corvus | oh oh | 18:00 |
clarkb | I think being able to control the parallelism of inflight node creations is worth considering though | 18:00 |
corvus | what about https://zuul-ci.org/docs/nodepool/latest/configuration.html#attr-providers.max-concurrency ? | 18:00 |
clarkb | oh do we have that TIL | 18:00 |
clarkb | ++ that is exactly what we need | 18:01 |
fungi | whoa. mind *blown* | 18:01 |
clarkb | I think we still drop the retries to avoid stalling builds out | 18:01 |
clarkb | but maybe we keep the old api rate and set parallelism to something the grafana graphs suggest it can handle | 18:01 |
fungi | i'll start it off at 10 | 18:02 |
corvus | (the docs need updating because it's not actually threads anymore, but the statemachine framework does honor that -- it still controls the concurrency for new requests) | 18:02 |
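And the knob corvus points at is another per-provider attribute; a sketch with the starting value fungi mentions:

```yaml
providers:
  - name: rax-ord
    max-concurrency: 10   # cap how many node launches are in flight at once
```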
fungi | if this helps, we can probably back the launch-timeout down to something smaller as a safety, and then possibly re-add retries? | 18:03 |
clarkb | fungi: I suspect that we're more likely to hit timeouts than valid failures needing a retry? | 18:05 |
clarkb | basically in a cloud suffering these issues it is probably better in all cases to have a longer timeout and not keep trying if you fail | 18:05 |
fungi | yeah, maybe | 18:07 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Limit rax-ord launch concurrency and don't retry https://review.opendev.org/c/openstack/project-config/+/877113 | 18:09 |
fungi | corvus: clarkb: ^ | 18:09 |
corvus | ++ | 18:10 |
fungi | you can sort of see the misbehavior by comparing the three test node history graphs at https://grafana.opendev.org/d/a8667d6647/nodepool-rackspace?orgId=1&from=now-24h&to=now | 18:11 |
fungi | though the error node launch attempts graph is across all three providers, from the logs it seems to be mainly rax-ord | 18:12 |
fungi | and you can clearly see the variability for that region in the time to ready graph | 18:12 |
clarkb | it does seem to be able to handle ~30 booting nodes. I guess we monitor things and increase the concurrency if it holds up | 18:12 |
fungi | well, that's potentially misleading. it "handles" accepting that many boot requests in parallel, but definitely does look like things get a lot worse when we ask for all its capacity at once | 18:14 |
clarkb | right you can see where it requests far more than 30 and has a sad. But there are a couple instances where it requests 30ish and seems to do well with that | 18:15 |
clarkb | and an instance of about 38 where it falls over | 18:15 |
clarkb | I suspect the tipping point is around 30 for this reason | 18:16 |
fungi | potentially, but i also wouldn't rule out external factors since we're not the only user of that cloud | 18:16 |
fungi | does anybody know what the multiple stats for the providers are in the api server operations graphs? | 18:19 |
fungi | not all regions have the same number of them either | 18:19 |
fungi | for example in the post server graph there are 5 lines for dfw, 3 each for iad and ord | 18:20 |
fungi | i guess i could look at the yaml for that | 18:21 |
clarkb | if you hit the graph menu there is an inspect option | 18:21 |
clarkb | it looks like only one of them actually has data | 18:22 |
fungi | looks like the same thing i found in git, aliasByNode(stats.timers.nodepool.task.$region.compute.POST.servers.*.mean, 4) | 18:22 |
clarkb | somehow $region is ending up with multiple identical results? And i think that variable list can be retrieved by querying graphite? | 18:23 |
fungi | yeah, i'm fiddling with graphite | 18:23 |
corvus | one for each response code? | 18:24 |
clarkb | oh maybe | 18:25 |
fungi | that seems to be it, yep | 18:25 |
corvus | there are 5 for dfw and 3 for ord | 18:25 |
fungi | dfw has returned 500 and 503, in addition to the 202, 400 and 403 returned by the other regions | 18:25 |
fungi | i wonder if the intent was to aggregate those | 18:26 |
corvus | should probably either aggregate or adjust the alias so it's rax-ord 200; either is possible with a different graphite function | 18:27 |
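For illustration, one way to express that aggregation in the grafyaml dashboard definition; the function arguments below are inferred from the metric path quoted above and are untested:

```yaml
targets:
  # sum away the per-response-code wildcard (node 8) so each region renders
  # as a single line, still labeled by region (node 4)
  - target: aliasByNode(sumSeriesWithWildcards(stats.timers.nodepool.task.$region.compute.POST.servers.*.mean, 8), 4)
  # or, to label by region and status code instead (e.g. "rax-ord 202"):
  # - target: aliasByNode(stats.timers.nodepool.task.$region.compute.POST.servers.*.mean, 4, 8)
```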
opendevreview | Merged openstack/project-config master: Limit rax-ord launch concurrency and don't retry https://review.opendev.org/c/openstack/project-config/+/877113 | 18:29 |
clarkb | alright I'm going to work on deleting gitea05-07 now | 18:30 |
clarkb | if they are anything like gitea08 I will need to delete their backing boot volume separately | 18:30 |
clarkb | #status log Deleted gitea05-07 and their associated boot volumes as they have been replaced with gitea10-12. | 18:41 |
opendevstatus | clarkb: finished logging | 18:41 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Remove gitea05-07 from DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/877117 | 18:43 |
clarkb | under 1k replication tasks for gitea13 and 14 now too | 18:43 |
clarkb | the cryptography min requirement discussion for openstack prompted me to look and cryptography is actually requiring a pretty reasonable rust compiler version | 19:10 |
clarkb | A lot of projects (firefox famously) require a prerelease compiler, but cryptography wants 1.45.0 or newer and ubuntu has 1.65.0 (in universe though) | 19:10 |
fungi | yeah, however jammy ships python3-cryptography 3.4.8 because it was contemporary at the time, and would need to backport a rust-built version of the package with numerous new build-dependencies to update to a newer version | 19:25 |
fungi | compare https://packages.ubuntu.com/source/jammy/python-cryptography with https://packages.ubuntu.com/source/lunar/python-cryptography | 19:25 |
fungi | that's the sort of complex change a stable distro is generally going to avoid | 19:26 |
clarkb | for sure. My point is more that I don't think the issue is rust-specific so much as that stable distros typically don't wholesale backport new releases | 19:27 |
clarkb | and that is made worse when the old software they ship has been partly replaced with something completely different, so any updates may require a lot of effort | 19:29 |
fungi | but also it may not just be rust itself in this case, if cryptography needs newer versions of other toolchain components (cargo, setuptools-rust extension, various rust libs) | 19:30 |
clarkb | right, I just think it's distracting to bring up rust in this case. Ubuntu basically never replaces a stable library version with one that's 2 years newer, regardless of the compiler | 19:31 |
fungi | looks like the python3-cryptography in lunar has 9 different rust-based libs it also depends on | 19:31 |
clarkb | this is more about "the stable distro has a stable library version, we need to continue to do our best to support that". It's no different than not requiring a newer libvirt | 19:32 |
fungi | sure, the discussion has fixated on rust because it's what coreycb brought up as the challenge for that particular package, but i tried to point out earlier that it's really just one example | 19:32 |
fungi | the rax-ord graph looks considerably better | 20:49 |
fungi | though it'll be hard to know for sure until monday | 20:50 |
fungi | possibly just wishful thinking on my part | 20:51 |
clarkb | one replication event is still processing. Once that is done I'll trigger global replication for the replication config reload and then we should be good to land the change to add gitea13 and 14 to haproxy | 21:08 |
clarkb | full replication started | 21:23 |
clarkb | fungi: the boot times don't seem to have dropped significantly though | 21:26 |
clarkb | we may still need to bump the timeout up as a result | 21:27 |
clarkb | I've approved https://review.opendev.org/c/opendev/system-config/+/877047 as replication is complete | 21:47 |
clarkb | fungi: if you are still around, https://review.opendev.org/c/opendev/zone-opendev.org/+/877117 should be an easy one | 21:47 |
clarkb | (cleans up gitea05-07 dns records) | 21:47 |
clarkb | thank you! | 21:53 |
fungi | np, just cooking dinner and reviewing changes | 21:56 |
opendevreview | Merged opendev/zone-opendev.org master: Remove gitea05-07 from DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/877117 | 21:58 |
opendevreview | Merged opendev/system-config master: Add gitea13 and gitea14 to the haproxy load balancer https://review.opendev.org/c/opendev/system-config/+/877047 | 23:00 |