Thursday, 2023-01-19

ianw	2ebdabe1-799f-4bb0-9ed6-758d9ee34bbc \| test-server \| ACTIVE \| auto_allocated_network=10.100.0.147 \| centos-8-stream-arm64-1674085448 \| opendev-no-ephemeral	00:14
ianw	ok ... so the new linaro cloud can boot a raw image uploaded, with no ephemeral storage. that's a start	00:14
fungi	progress!	00:17
opendevreview	Ian Wienand proposed opendev/system-config master: nodepool config: set linaro cloud to use raw images https://review.opendev.org/c/opendev/system-config/+/871010	00:38
opendevreview	Merged openstack/project-config master: nb04: use linaro region mirror https://review.opendev.org/c/openstack/project-config/+/871006	00:39
*** dasm is now known as dasm\|off		01:10
opendevreview	Merged opendev/system-config master: Update git in gitea images https://review.opendev.org/c/opendev/system-config/+/871009	02:21
opendevreview	Ian Wienand proposed opendev/system-config master: openafs: use consistent name for cache size https://review.opendev.org/c/opendev/system-config/+/871014	02:31
ianw	^ i've also manually fixed the linaro mirror to start openafs correctly; but that will do it permanently	02:33
clarkb	inventory/service/host_vars/mirror01.regionone.linaro.opendev.org.yaml is where the smaller size is set if anyone is wondering	02:36
ianw	huh, that went into a recursive error	02:57
ianw	i guess you can not call a role with openafs_client_cache_size: "{{ openafs_client_cache_size \| default(10000000) }}" # 10GiB	03:33
opendevreview	Ian Wienand proposed opendev/system-config master: linaro mirror: fix afs cache size https://review.opendev.org/c/opendev/system-config/+/871014	03:37
ianw	it's confusing but i don't have motivation to do anything more fancy now	03:37
fungi	makes sense	03:45
opendevreview	Ian Wienand proposed opendev/system-config master: hound: use updated git packages https://review.opendev.org/c/opendev/system-config/+/871016	03:46
opendevreview	Merged opendev/system-config master: nodepool config: set linaro cloud to use raw images https://review.opendev.org/c/opendev/system-config/+/871010	04:21
opendevreview	Merged opendev/system-config master: linaro mirror: fix afs cache size https://review.opendev.org/c/opendev/system-config/+/871014	04:25
ianw	i've recreated the "opendev" flavor on the new linaro cloud to not have ephemeral storage. however nodepool still isn't sending work there as it doesn't have the right images, yet. nb04 is building them (after nb03 went missing)	04:37
opendevreview	Merged opendev/system-config master: hound: use updated git packages https://review.opendev.org/c/opendev/system-config/+/871016	05:09
*** soniya is now known as soniya29\|rover		05:41
*** ysandeep is now known as ysandeep\|afk		06:13
*** ysandeep\|afk is now known as ysandeep		07:35
*** jpena\|off is now known as jpena		08:05
*** soniya29\|rover is now known as soniya29\|rover\|brb		09:44
*** ysandeep is now known as ysandeep\|afk		11:00
*** dviroel\|afk is now known as dviroel		11:13
*** rlandy\|out is now known as rlandy		11:14
*** soniya29\|rover\|brb is now known as soniya29\|rover		11:25
*** ysandeep\|afk is now known as ysandeep		12:48
*** dasm\|off is now known as dasm		13:08
*** ysandeep is now known as ysandeep\|dinner		15:18
*** dasm is now known as Guest1846		15:29
*** Guest1846 is now known as dasm		15:30
*** ysandeep\|dinner is now known as ysandeep		15:32
Tengu	folks, I have a really, really weird behavior with the ansible-galaxy proxy: locally, it works. The vhost has the exact same configuration, though I don't have TLS enabled. But the mirror on opendev infra seems to have a difference making it unreliable.	15:32
Tengu	for instance, using the proxy, it's impossible to install "community.general" collection, while it does work through my local config.	15:33
Tengu	and I really don't know why this is failing. Especially since installing (so far) any other collection is working fine.	15:33
fungi	have you tested more than one mirror server?	15:35
fungi	what is the error you receive from it?	15:35
Tengu	fungi: I don't know the URI for other proxies, so I tested only via https://mirror.iad3.inmotion.opendev.org:4448	15:35
fungi	can you replicate it consistently, or is it intermittent?	15:36
Tengu	consistent on that one. here's the command: ansible ~/.ansible/galaxy_cache/api.json ; ansible-galaxy collection install -vvvvvvv -s https://mirror.iad3.inmotion.opendev.org:4448 -p ./ansible community.general	15:36
Tengu	err... it's missing the beginning.	15:37
Tengu	here: rm -rf ansible ~/.ansible/galaxy_cache/api.json ; ansible-galaxy collection install -vvvvvvv -s https://mirror.iad3.inmotion.opendev.org:4448 -p ./ansible community.general	15:37
Tengu	the first part is to clear all local cache. the "-p ansible" ensures we're using a local directory, in order to not pollute the system.	15:37
fungi	https://mirror.dfw.rax.opendev.org:4448/ https://mirror.bhs1.ovh.opendev.org:4448/ https://mirror.sjc1.vexxhost.opendev.org:4448/	15:38
fungi	those are a few in more providers	15:38
Tengu	let's see.	15:38
Tengu	same on https://mirror.sjc1.vexxhost.opendev.org:4448/	15:38
fungi	what is the error you receive from it?	15:39
Tengu	it's an ansible CLI error - and it doesn't really provide data. I tried to compare things with the actual ansible-galaxy server, but didn't find anything. lemme paste the stack.	15:39
fungi	to paste.opendev.org please ;)	15:40
Tengu	https://paste.openstack.org/show/biRnXTrehuHU0b0GNRZ5/	15:40
fungi	thanks	15:40
Tengu	(no, I won't paste 60+ lines on IRC ;))	15:40
fungi	much appreciated	15:40
Tengu	;)	15:40
Tengu	and if we point to another collection, say "ansible.utils", it just works fine.	15:41
fungi	https://mirror.sjc1.vexxhost.opendev.org:4448/api/v2/collections/community/general/versions/ seems to paginate when i hit it with a browser	15:41
Tengu	it's expected.	15:42
Tengu	for instance, ansible-galaxy CLI does it on its own: Calling Galaxy at https://mirror.dfw.rax.opendev.org:4448/api/v2/collections/community/general/versions/?page_size=100	15:42
Tengu	and then, on the next line: Calling Galaxy at https://mirror01.dfw.rax.opendev.org:4448/api/v2/collections/community/general/versions/?page=2&page_size=100	15:42
Tengu	basically, ansible-galaxy wants to get the full index	15:42
fungi	so, there's one problem we observed with mod_substitute when proxying pypi	15:43
Tengu	wait.... you may have a thing	15:43
Tengu	ansible.utils, a working collection, doesn't paginate	15:43
fungi	if a json response is all in one line, it may exceed the maximum line length supported by the mod. this can be adjusted with a setting	15:43
Tengu	hmmm nope, it seems to be fine with the substitute	15:43
Tengu	ansible.posix doesn't paginate either	15:44
Tengu	hmmmmm.	15:44
Tengu	fungi: are there settings in httpd set outside of playbooks/roles/mirror/templates/mirror.vhost.j2 ?	15:46
fungi	looking at the pypi proxy we set SubstituteMaxLineLength 20m because of https://github.com/pypi/warehouse/issues/11919	15:46
Tengu	asking since the same vhost config is working locally - maybe there's something global I don't have, creating the issue.	15:46
Tengu	fungi: I copied the setting in the galaxy vhost	15:46
fungi	looks like the limit in mod_substitute is 1m characters to a line	15:47
Tengu	fungi: and... well, it would fail on my local env, but it's working fine.	15:47
fungi	just trying to rule that out real quick	15:47
Tengu	lemme try to paste my httpd.conf from my container somewhere.	15:47
Tengu	it's pretty ugly, 1-file, but...	15:47
fungi	total response size is 12584 bytes, so definitely not that	15:48
fungi	an order of magnitude lower than would be needed to hit that problem	15:49
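The check fungi runs here can be sketched in Python. This is only an illustration of the reasoning: the 1 MiB figure assumes that the "1m" limit mentioned above means 1024 * 1024 bytes, the helper names are hypothetical, and the sample body is a stand-in with the same size as the response fungi measured.

```python
# Assumption: mod_substitute's "1m" per-line limit is 1024 * 1024 bytes.
ASSUMED_LINE_LIMIT = 1024 * 1024


def longest_line(body: bytes) -> int:
    """Return the length in bytes of the longest newline-delimited line."""
    return max(len(line) for line in body.split(b"\n"))


def exceeds_limit(body: bytes, limit: int = ASSUMED_LINE_LIMIT) -> bool:
    """True if any single line in the body is longer than the limit."""
    return longest_line(body) > limit


# Stand-in for a single-line JSON response of 12584 bytes.
sample_body = b"x" * 12584
```

A body of that size stays well under the assumed limit even if it arrives as one line, which matches fungi's conclusion that the line-length limit is not the cause.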
Tengu	fungi: I also ruled out some internal cache issue within ansible code - I suspected that the "localhost:8080" being far shorter than the "mirror01......" used in the CI job, it may be truncated or something - but apparently it's not the case.	15:50
Tengu	the trace is really weird.	15:50
Gue___________________________	Greetings #opendev. Quick question: is it possible to delete an etherpad that I created on your site or to delete the content including its history? We are hoping to use the pad for a brainstorm but would prefer if the convo did not live on forever. Thank you in advance for considering.	15:50
fungi	Gue___________________________: that's not a supported use for our etherpad server. i think etherpad.org may have a public server which expires pads after a while, you might check there	15:52
Tengu	aha. yeah. ok. fungi I think you get a thing with the paginate actually.	15:52
fungi	Gue___________________________: our etherpad is intended for public collaboration, and we make every attempt to preserve the history there for posterity	15:53
Tengu	the ansible cache is far, far different.	15:53
Tengu	yessssssss	15:53
Tengu	jm1: I found a workaround!	15:53
Tengu	and it won't hit too hard: add "--no-cache" to the ansible-galaxy command	15:53
fungi	Tengu: so local caching impacts it?	15:54
Tengu	jm1: that will tell ansible to NOT touch its ~/.ansible/galaxy_cache/api.json	15:54
Tengu	fungi: in a weird way - maybe due to something sent by the proxy, still.	15:54
Tengu	grumpf... isn't there some CLI one can easily use to paste a long file ?!	15:54
Tengu	instead of copy-pasting blocks after blocks..	15:55
fungi	i would probably resort to hacking some debug logging into resolvelib to get more detail about the dict that it's trying to access	15:55
fungi	Tengu: there's the pastebinit tool	15:55
Tengu	http://paste.scsys.co.uk/2216 here	15:55
Tengu	nopaste < httpd.conf	15:55
Tengu	fungi: I checked the file on-disk	15:56
Tengu	its content is indeed "slightly" different when there's a paginate.	15:56
fungi	for future reference, pastebinit can paste to paste.opendev.org as well	15:56
* Tengu takes note		15:56
Tengu	ah, via -b paste.opendev.org I guess.	15:57
Tengu	ok.	15:57
Gue___________________________	@fungi Thank you, understood.	16:01
*** dviroel is now known as dviroel\|lunch		16:01
fungi	Tengu: yeah, i have an "opaste" alias to that in my shell, for convenience	16:11
Tengu	fungi: it failed to get the generated link. bah. I usually don't have to paste 100+ lines.	16:12
Tengu	anyway. I have a workaround, but it would still be nice to understand why it fails on the "prod", while dev env is fine :/.	16:12
fungi	i would probably resort to hacking some debug logging into resolvelib to get more detail about the dict that it's trying to access	16:13
Tengu	fungi: so I checked the JSON (yeah, the local cache is plain JSON), and it seems to miss things when it comes to that specific collection.	16:28
Tengu	it's... weird.	16:28
fungi	what part of the json is missing?	16:29
*** ysandeep is now known as ysandeep\|out		16:38
*** dviroel\|lunch is now known as dviroel		16:57
*** marios is now known as marios\|out		16:59
Tengu	fungi: (sorry, was on some other discussion) the whole part matching the key shown in the trace	17:01
Tengu	so basically, it's as if it's flushing all of the data related to the versions	17:01
Tengu	i.e. loads page one, injects data in the file, and that entry is dropped at some point when it comes to load the second page and tries to update it.	17:02
Tengu	and since it's supposed to be there, it crashes instead of re-creating (which is probably better).	17:02
Tengu	but this happens if and only if we're using the opendev proxies. My local httpd, with the configuration I pasted earlier, doesn't crash ansible-galaxy.	17:03
Tengu	this is why I'm wondering if there are some other configurations in httpd, set outside of that mirror thingy.	17:03
Tengu	fungi: I'm running my local proxy like this: podman run --rm --security-opt label=disable -v ./httpd.conf:/usr/local/apache2/conf/httpd.conf:ro -v ./cache:/var/cache/apache2/proxy:rw -v ./logs:/usr/local/apache2/logs:rw -p 8080:8080 httpd:2.4	17:04
Tengu	and then pointing ansible-galaxy -s http://localhost:8080	17:04
fungi	Tengu: looking at one of the mirror servers, we have https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/files/apache-connection-tuning added	17:07
fungi	but aside from that, just the default configuration from ubuntu bionic (18.04 lts)	17:08
Tengu	hmmmmmm not sure it would really be a thing	17:08
Tengu	yeah. shouldn't do	17:08
Tengu	weird...	17:08
Tengu	anyway... getting late here. I'm a bit puzzled by that behavior, but I don't know what I can do. Yeah, adding some debug, of course - maybe getting the file copied before update, that may help.	17:09
Tengu	if I have some time... though I doubt.	17:09
fungi	i confirm that i don't find community.general in the json we get back from the proxy, but it's not in the json from galaxy.ansible.com either	17:13
fungi	i guess community.general is a key in the local cache	17:17
fungi	not in the json response	17:17
fungi	i suppose that's the result of the earlier warning line	17:17
fungi	Tengu: i suppose one difference might be that mirror.dfw.rax.opendev.org is a cname to mirror01.dfw.rax.opendev.org and while we're calling the former it's the latter we find being substituted in the response	17:19
fungi	do you see the same error if you use https://mirror01.dfw.rax.opendev.org:4448/ instead of the hostname without the 01 in it?	17:19
clarkb	note we also use the internal rax ip address in CI for the rax mirrors specifically (gets better throughput)	17:21
clarkb	but I wouldn't expect that to matter too much as they are both CNAMEs so should be able to reproduce using the public name	17:21
fungi	though it's worth noting that we'll end up using the non-internal interface for subsequent calls recursed from the initial request since mod_substitute is writing the server name in there. maybe we need to substitute the hostname from the request instead	17:22
clarkb	I half expected that it already did that? I guess not	17:23
fungi	it would get extra broken if we started doing sni with different hostname-specific vhosts later	17:23
fungi	if you curl https://mirror.dfw.rax.opendev.org:4448/api/v2/collections/community/general/ you'll see the json says mirror01 instead of mirror	17:24
clarkb	does it do that with pypi?	17:25
clarkb	(it also substitutes iirc)	17:26
fungi	also in https://paste.opendev.org/show/biRnXTrehuHU0b0GNRZ5/ you can see the initial requests are to mirror.dfw.rax but then a subsequent request goes to mirror01.dfw.rax	17:26
fungi	clarkb: we don't embed the hostname in pypi responses, we use relative hrefs	17:27
fungi	rewriting to /pypifiles	17:27
fungi	er, not relative, but local	17:27
clarkb	aha	17:27
*** jpena is now known as jpena\|off		17:30
fungi	i guess technically we could try that with the ansible galaxy substitutions too	17:35
fungi	basically just drop the scheme, servername and port from https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2#L584	17:37
fungi	Substitute "s\|https://galaxy.ansible.com/\|/\|ni"	17:37
fungi	also would allow to get rid of the scheme lookup conditionals	17:38
fungi	oh, though the comment immediately above there states "ansible-galaxy CLI needs a fully qualified URI"	17:38
fungi	so maybe that was already attempted	17:38
fungi	maybe we can do it like	17:49
fungi	Substitute "s\|https://galaxy.ansible.com/\|%{REQUEST_SCHEME}://%{HTTP_HOST}:%{SERVER_PORT}/\|ni"	17:55
fungi	oh, i see it's a feature of apache 2.5.1: https://httpd.apache.org/docs/trunk/en/mod/mod_substitute.html (see the bit on expr= syntax)	18:02
fungi	some of our older mirror servers are still on bionic, so not new enough for that	18:03
fungi	oh, even focal's isn't	18:04
fungi	yeah, nevermind. ubuntu lunar is even still on apache 2.4	18:05
fungi	so yeah, the options are dwindling	18:09
fungi	i guess we could make the preferred mirror hostname a separate ansible var and jinja that into the Substitute directive	18:10
fungi	assuming this ends up being the problem	18:11
frickler	wow, that's some monster job logs that make my firefox choke when I open the corresponding zuul page https://184e5731741af40c59ec-11b479ab8ac0999ee2009c93a602f83a.ssl.cf1.rackcdn.com/870988/1/check/cross-nova-functional/1cb7591/	18:43
clarkb	frickler: definitely worth encouraging the nova team to address that. It will make jobs run faster too	18:45
clarkb	I usually open large logs with vim and it mostly handles it	18:45
frickler	the problem is that the build page loads the huge job-output.json and then seems to break it	18:49
frickler	https://zuul.opendev.org/t/openstack/buildset/e8968c8bf1be4caa86d4d6ef0fb23cd9 is the buildset page, the failed job is the one in question	18:49
clarkb	ya I'm not sure what zuul can do about that. I guess show an error?	18:49
frickler	maybe it should truncate oversized logs even before uploading them	18:51
clarkb	the problem with that is you lose the information necessary to address the problem in many cases	18:52
frickler	then maybe just rename oversized job-output.json so it doesn't get autoloaded. it will still be available for manual inspection	18:54
clarkb	lots of "WARNING [oslo_messaging.rpc.client] Using RPCClient manually to instantiate client. Please use get_rpc_client to obtain an RPC client instance."	18:54
clarkb	2617487 log lines in the job-output.txt. 2101438 are that line above	18:55
clarkb	melwitt: ^ fyi	18:55
clarkb	who hacks on oslo these days? stephenfin? Maybe that warning should be emitted once per process?	18:56
frickler	not sure if or how that could be related to the eventlet bump though. doesn't seem to happen for other patches	18:56
melwitt	I see stephenfin around oslo occasionally	18:57
clarkb	frickler: I don't think it is. I suspect this is a change in oslo_messaging	18:57
melwitt	I'll look at nova, should be "easy" to fix I would think	18:57
clarkb	I think the january 5 release of oslo.messaging version 14.1.0 added it. Commit 4ead7cb2dcf376032f7bf9532a375256db6d3784 was the change and appears to be after 14.0.0	19:01
clarkb	tobias-urdin: ^ fyi	19:02
melwitt	clarkb: looks like it was fixed two days agoi https://github.com/openstack/nova/commit/c59db128a00477f6163d71ea1454da4286dad708	19:26
melwitt	*ago	19:26
clarkb	hrm that log is from about 26 hours ago	19:27
clarkb	oh maybe the change landed more recently than two days ago. The commit time would be earlier	19:27
clarkb	yup it merged 22 hours ago or so. That explains it	19:28
melwitt	ah, yeah	19:28
clarkb	thank you for looking into it	19:29
melwitt	np, thanks for the heads up about it	19:30
Tengu	fungi: oh! using the host you pointed (mirror01.dfw.rax.opendev.org) , it seems to work now!	19:47
fungi	Tengu: thanks for testing, that at least narrows down the cause. now to figure out what to do about it	19:47
Tengu	clarkb: I'd expect it to actually play a role, because the ansible-galaxy cache is using the servername (as passed in -s <servername>)	19:47
fungi	that also makes a lot more sense as to why skipping the local cache works around the issue	19:48
Tengu	yup	19:48
Tengu	so in the CI case, skipping local cache is OK, since it's a one-show.	19:48
fungi	would still be nice to figure out how to do that substitution so that you don't have to use that workaround	19:49
Tengu	and the local cache is a dict, built as {'host': {'module_name': {'path1': ..., 'path2': ...}}}	19:49
Tengu	or something like that	19:49
Tengu	fungi: iirc httpd itself should know its actual name?	19:50
fungi	well, that's the tricky part. there's not necessarily any single name, that vhost supports multiple names	19:50
Tengu	fungi: yeah, so, confirmation: cleaning local cache, running with ansible-galaxy collection install -vvvvvv -s https://mirror.dfw.rax.opendev.org:4448 -p /tmp/foo_test__ community.general it fails; cleaning cache, re-running with the mirror01, it works.	19:51
fungi	however, we can probably specify a preferred name we want to use in the rewrites. for example doing mirror-int for the ones where we want the nodes to use the mirror's internal/private interface for performance reasons	19:51
Tengu	fungi: hmmm.... so there's a mismatch between the ansible variable (don't remember its name) and the actual vhost in apache config?	19:51
Tengu	jm1: we found the root cause apparently :)	19:52
Tengu	jm1: well, actually, fungi pointed the missing piece :)	19:52
fungi	not a mismatch. mirror and mirror01 are both valid names for the server and the vhost will serve (currently the same) content for either	19:52
Tengu	fungi: ~> the ansible_var we get in the zuul job then?	19:53
fungi	the challenge is deciding which name we want nodes using in their requests, which may not be the primary hostname for the server	19:53
fungi	but setting that statically in the vhost configuration, because as your inline comment points out, mod_substitute on apache 2.4.x doesn't support expressions	19:53
jm1	Tengu, fungi 🥳	19:54
jm1	I just wanted to give up on this :D	19:54
Tengu	fungi: hmmm..... so mod_substitute doesn't support the httpd internal variables?	19:54
fungi	not until apache 2.5.1 (currently under development)	19:54
Tengu	dang	19:54
fungi	however, we can probably add an ansible var in our deployment inventory for each mirror host to contain the name we plan to tell clients to access it as	19:55
Tengu	that would be great :)	19:55
fungi	and then jinja splat that into the substitute rule	19:56
fungi	other root sysadmins with a better grasp of ansible can tell me if i'm smoking something with that idea	19:56
Tengu	we can do whatever you want actually :). host_vars are here for that.	19:57
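The idea of a per-host variable feeding the substitute rule can be sketched in Python. This is an illustration under stated assumptions, not the actual system-config change: the variable name `mirror_preferred_hostname`, the `HOST_VARS` mapping, and both helper functions are hypothetical, and plain string formatting stands in for the Jinja template.

```python
# Hypothetical per-host variables, standing in for Ansible host_vars.
HOST_VARS = {
    "mirror02.iad3.inmotion.opendev.org": {
        "mirror_preferred_hostname": "mirror.iad3.inmotion.opendev.org",
    },
    # No override set: fall back to the inventory hostname.
    "mirror01.dfw.rax.opendev.org": {},
}


def preferred_name(inventory_hostname: str) -> str:
    """Return the name clients are told to use, defaulting to the host itself."""
    host_vars = HOST_VARS.get(inventory_hostname, {})
    return host_vars.get("mirror_preferred_hostname", inventory_hostname)


def render_substitute(host: str, port: int = 4448, scheme: str = "https") -> str:
    """Build a static mod_substitute rule that rewrites upstream URLs to host."""
    return (
        'Substitute "s|https://galaxy.ansible.com/|'
        f'{scheme}://{host}:{port}/|ni"'
    )


rule = render_substitute(preferred_name("mirror02.iad3.inmotion.opendev.org"))
```

Because the name is baked in when the configuration is generated, this does not depend on expression support in mod_substitute.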
clarkb	fungi: the main issue with that is you'd break public access/testing of the name in rax (since we'd have to use the internal name since that is what the jobs get). Elsewhere it is fine and we can probably use elsewhere as testing proxies	20:00
fungi	clarkb: yes, i think more generally it's impossible to proxy ansible-galaxy completely with apache if you want to serve it from multiple arbitrary hostnames for the same server	20:01
fungi	so it's already broken in that way	20:01
clarkb	ya	20:02
fungi	just trying to think of a solution which breaks it in favor of the hostnames we want test nodes using rather than in favor of some other hostname	20:02
fungi	where the latter is what we have at the moment	20:02
fungi	another way would be to give the rackspace mirrors two vhosts and use sni to route requests to the correct one for internal vs external interface hostnames	20:04
Tengu	or maybe pass a secondary zuul_site_mirror_fqdn var such as zuul_site_mirror_fqdn_fixed (or the like) that will then match the actual name of the host (i.e. mirror01.dfw.rax.opendev.org) ?	20:05
Tengu	that way, actual jobs will have to get the config, but it doesn't really change anything on the mirror config itself?	20:05
fungi	Tengu: the problem (and the reason for the mention of rackspace) is that in rackspace our mirror servers are dual-homed and we'd prefer nodes to connect to their non-public interfaces	20:06
Tengu	hmm ok.	20:07
fungi	so we really do want nodes to use urls like https://mirror-int.dfw.rax.opendev.org/... which isn't reachable from outside	20:07
Tengu	sounds legit	20:07
fungi	mainly because we get improved efficiency and stability for connections across their private internal network	20:08
fungi	so making the apache configuration on that server know the hostname we're telling nodes to connect to would give us something to bake into the substitute rule	20:08
clarkb	ya that might be the best approach but more effort than simply substituting the internal name always	20:09
Tengu	fungi: so for instance http://mirror.iad3.inmotion.opendev.org:8085 would actually be another name ?	20:09
fungi	and yeah, an apache host_var containing that name would be one way to go about it	20:09
fungi	Tengu: we tell clients to connect to mirror.iad3.inmotion.opendev.org which is currently a cname to mirror02.iad3.inmotion.opendev.org	20:10
fungi	the server knows itself as mirror02 but considers mirror to be an available alias	20:10
Tengu	ok, so this means the substitute talks about mirror02, while galaxy knows "mirror".... and crashes.	20:11
Tengu	ok.	20:11
fungi	right	20:11
Tengu	and it fails once we get to the second page for #reason.	20:11
Tengu	because single paging is fine.	20:11
fungi	so if we tweak the substitute rule to use mirror.iad3.inmotion.opendev.org like the client requests do, then it should work	20:12
Tengu	go figure.... they probably messed big time at some point, but that's really a corner case.	20:12
Tengu	fungi: so if we have a way to inject that name "mirror.iad3..." in the ansible generating the config, we're good.	20:12
fungi	the reason the pagination seems to break it is that the pages include "previous" and "next" fields which use fully qualified urls	20:12
Tengu	does it?	20:13
fungi	and the client is probably following the "next" url from the first page, which then takes it to mirror02 instead of mirror	20:13
Tengu	oh.... dang.	20:13
Tengu	yeah	20:13
Tengu	that's exactly that	20:13
Tengu	that's the trick	20:13
Tengu	you got it, fungi !	20:14
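The failure mode described above can be sketched in a few lines of Python. This is an illustration only and not ansible-galaxy's actual code: the cache layout follows Tengu's rough description of a dict keyed by server name, and the example.org hostnames, the canned responses, and the helpers `fetch_page` and `collect_versions` are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical paginated responses. The proxy rewrites upstream URLs to its
# own primary name (mirror02) while the client was pointed at the alias.
PAGES = {
    "https://mirror.example.org:4448/versions/?page_size=100": {
        "next": "https://mirror02.example.org:4448/versions/?page=2&page_size=100",
        "results": ["1.0.0", "1.1.0"],
    },
    "https://mirror02.example.org:4448/versions/?page=2&page_size=100": {
        "next": None,
        "results": ["1.2.0"],
    },
}


def fetch_page(url):
    """Stand-in for an HTTP GET; returns the canned JSON for a URL."""
    return PAGES[url]


def collect_versions(server, first_url):
    """Follow 'next' links, storing results in a cache keyed by server host."""
    cache = {urlsplit(server).netloc: {}}
    mismatches = []
    url = first_url
    while url:
        page = fetch_page(url)
        host = urlsplit(url).netloc
        if host not in cache:
            # A stricter client would raise KeyError here instead.
            mismatches.append(host)
            cache[host] = {}
        cache[host].setdefault("versions", []).extend(page["results"])
        url = page["next"]
    return cache, mismatches


cache, mismatches = collect_versions(
    "https://mirror.example.org:4448",
    "https://mirror.example.org:4448/versions/?page_size=100",
)
```

In this sketch the second page lands under a different host key than the first, so the version list for the requested server is incomplete. A single-page collection never follows a "next" link, which would explain why only paginated collections were affected.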
fungi	anyway, the solution seems fairly straightforward, we probably just need to get consensus among the sysadmins as to the best way to encode the "preferred" request name for the mirror sites (like do we leverage some mechanism to generate them on the fly in group_vars or something similar which doesn't need us to list them individually)	20:16
Tengu	fungi: maybe as a first thing we may just make a simple mapping in the jinja, using {% set %} and some if/elif/else things...	20:16
tobias-urdin	clarkb: yeah, I'm working on getting everything moved over to the new API there, started with Nova gonna continue with Neutron but got blocked needing this https://review.opendev.org/c/openstack/oslo.messaging/+/869899 and been struggling some with getting CI green with new tox etc	20:17
tobias-urdin	I will continue look into it for sure	20:17
Tengu	fungi: though... if I understand correctly, mirror.foo.bar is a cname to, at least, mirror01.foo.bar, but may also be a cname to mirror02.foo.bar - would that mean both are up, and both are answering, meaning galaxy may end on 01 first, then re-request mirror.foo.bar and end on 02?	20:18
fungi	Tengu: we add and remove mirror servers frequently is the reason for the cnames	20:19
fungi	you can't have a cname resolve to multiple names though, you're probably thinking of round-robin address records	20:20
Tengu	fungi: err yeah, round-robin address record indeed.	20:20
Tengu	and yeah, cname can't match multiple names. indeed. so it's more a "service" address that may be attached to any of the used server.	20:21
Tengu	fungi: maybe... why is the proxy able/configured to answer to multiple names?	20:21
fungi	anyway, there's enough churn that setting vars is probably cleaner than doing some sort of name mapping	20:21
Tengu	:) pretty sure you'll figure something out	20:22
Tengu	lemme know if there's a need for testing or just pushing ideas.	20:22
Tengu	though.... not today - it's getting late here.	20:22
Tengu	but at least, the root cause is known	20:22
fungi	Tengu: mainly because we haven't told the proxies not to, and in some cases (like the multi-homed mirrors in rackspace) it's useful to be able to test them over the internet on externally reachable interfaces	20:22
Tengu	fungi: makes sense.	20:22
Tengu	a pity mod_substitute doesn't support vars yet ;_;. that would solve everything	20:23
fungi	we could make the apache vhosts name-specific rather than wildcarded, as i mentioned earlier, it would just mean a lot of duplication in the configs or more apache templating	20:23
jrosser	is the idea to generally proxy/cache any ansible collection, or are there a subset of them we're more interested in?	20:24
fungi	so there are several ways we could go about it, mainly just trying to work out the least intrusive	20:24
Tengu	jrosser: no actual idea. I thought it would be 2 or 3, but apparently that's already wrong.	20:24
fungi	jrosser: to cache general ansible-galaxy access. if you're going to want to test with unreleased or un-merged commits from specific collections, still better to use required-projects in zuul	20:25
jrosser	well, in projects i'm involved in we rewrite the collection URLs on the fly to use any that happen to be cached on the CI node	20:25
clarkb	tobias-urdin: my main suggestion would be to look into using the python warnings library and emit the warning once	20:25
Tengu	jrosser: yeah - and if not cached on the ci node? adding more and more and more isn't good either, since they end up being moved around during the node bootstrap	20:26
Tengu	using the caching-proxy is more flexible imho. and... we're not that far from a working setup, once we get over that hostname "mismatch". And there's a "clean" workaround (passing --no-cache to ansible-galaxy CLI)	20:27
Tengu	also, that issue seems to be affecting only ansible >2.9 - because the galaxy cache was implemented in later release (2.11 I think)	20:28
clarkb	they ultimately provide different functionality some of which is useful at different times. In particular you should use the Zuul case if you are doing testing against unreleased collections and if you want depends-on support	20:29
Tengu	yep.	20:29
Tengu	anyway.... getting really late, I'll check back tomorrow :)	20:32
fungi	however, if you're just doing some ansible testing and need something from galaxy, it's nice not to need to wait for someone to add another github repo to the zuul tenant config	20:32
fungi	have a good night Tengu!	20:32
Tengu	thanks fungi for the pointers :).	20:32
jrosser	the most widespread problem i have seen with galaxy in ci jobs is the API returning 5xx and just bailing out, so server side problem at their end	20:33
jrosser	as a result our jobs get the collections from github with git rather than galaxy with the API wherever possible	20:34
jrosser	though having said that, occurrences of that kind of error have been very infrequent lately, something must have been fixed/improved in the galaxy server	20:35
fungi	also, in theory, proxying and caching requests for galaxy local to our test nodes should reduce the load we impose on their servers by running test jobs, while improving latency, packet loss, and bandwidth availability/speed for the requests if hitting a warmed cache	20:41
tobias-urdin	clarkb: i guess the problem is that it gets logged everytime for example an API worker is spawned which means it will be a new interpreter (atleast for mod_wsgi) every time for the lifetime of that worker atleast	20:57
tobias-urdin	did it shrink down a bit when nova was fixed? or is there some other ones causing potential issues	20:58
*** dviroel is now known as dviroel\|out		21:05
clarkb	tobias-urdin: https://5ee1e6c6ba7962bf8d90-9f271c6f9270f1e424d49ce4325dabf5.ssl.cf2.rackcdn.com/871001/1/gate/cross-nova-functional/f996c0a/ it shrunk down significantly. Its more that if you are going to warn a user or operator about something repeating the warning in a tight loop is not helpful as it fills disks/logs and irritates them. That is why the python warnings library	21:15
clarkb	allows you to emit such warnings once and move on	21:15
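What clarkb suggests can be sketched with the standard library `warnings` module. This is a minimal illustration, not the oslo.messaging code: `make_client` and `count_emitted` are hypothetical names, the warning text is borrowed from the log line quoted earlier, and the filters are set explicitly inside a recording context so the two behaviours can be compared.

```python
import warnings

MESSAGE = (
    "Using RPCClient manually to instantiate client. "
    "Please use get_rpc_client to obtain an RPC client instance."
)


def make_client():
    """Hypothetical constructor that warns about the old calling style."""
    warnings.warn(MESSAGE, DeprecationWarning, stacklevel=2)
    return object()


def count_emitted(calls, action):
    """Call make_client repeatedly and count the warnings actually emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        for _ in range(calls):
            make_client()
    return len(caught)


always_count = count_emitted(5, "always")  # one warning per call
once_count = count_emitted(5, "once")      # only the first call warns
```

As tobias-urdin notes, this kind of suppression is per interpreter, so each newly spawned worker process would still emit the warning once.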
tobias-urdin	clarkb: yeah i agree, just wondering if using python warning once would actually solve all such issues but hm yea probably some of them atleast	21:26
opendevreview	Clark Boylan proposed opendev/system-config master: Fix Gerrit 3.6 image build https://review.opendev.org/c/opendev/system-config/+/870118	21:28
opendevreview	Clark Boylan proposed opendev/system-config master: Build Gerrit on top of our python-base images https://review.opendev.org/c/opendev/system-config/+/870874	21:28
opendevreview	Clark Boylan proposed opendev/system-config master: Switch Gerrit to Java 17 https://review.opendev.org/c/opendev/system-config/+/870877	21:28
clarkb	this is neat I've got github notifications for a gitea release that doesn't show up in github yet	21:32
clarkb	tobias-urdin: I would expect it to cut down quite a bit since wsgi should have some process reuse right?	21:33
tobias-urdin	clarkb: yeah think so, i've proposed patches to all projects now atleast, not sure if i should change oslo.m also	21:34
*** arxcruz\|ruck is now known as arxcruz		21:50
opendevreview	Ian Wienand proposed openstack/project-config master: nodepool: drop linaro-us https://review.opendev.org/c/openstack/project-config/+/871196	21:55
clarkb	ianw: ^ re that did the new flavor get things going on the new cloud?	21:55
clarkb	ianw: also left a thought on that new change	21:57
clarkb	hrm that tag is still not there I wonder if that implies they immediately deleted it	22:02
ianw	not really. for some reason, it hasn't chosen to upload all the image types, i'm not sure why. there's nothing in the nb04 logs that i can see, it just doesn't seem to try uploading	22:02
clarkb	has it built them? if so thats weird	22:03
ianw	kevinz gave me access to the cloud, but i'm a little worried it's out of disk	22:03
ianw	/dev/nvme1n1p2 196G 186G 792M 100% /	22:04
clarkb	that is one downside to raw images	22:04
clarkb	they are a lot bigger	22:04
clarkb	I wonder if we shouldn't consider trimming what we support on arm64 way back. Like Jammy and Rocky 9	22:05
clarkb	thats 4 images (2 * 2) at about 20GB each raw we'd be under that limit	22:05
ianw	Local Volumes space usage:	22:07
ianw	glance 1 122.9GB	22:07
ianw	it does seem to me that is probably where the images are being stored. i'm still just trying to understand the layout and kolla deployment	22:08
ianw	t] Failed to upload image data due to HTTP error: webob.exc.HTTPRequestEntity	22:12
ianw	TooLarge: Image storage media is full: There is not enough disk space on the image storage media.	22:12
ianw	yeah, glance is not happy	22:12
ianw	ok, kevinz did explain this, but i see now ... there's 2 1tb disks on this	22:16
ianw	nvme0n1 259:0 0 894.3G 0 disk	22:16
ianw	nvme1n1 259:1 0 894.3G 0 disk	22:16
ianw	nvme1n1 is the boot disk -- it has a 1gb efi partition, and 200gb / and then the rest is in lvm for cinder volumes	22:17
ianw	nvme0n1 is 100% in the cinder lvm	22:17
ianw	i think we probably want to make glance use cinder	22:21
opendevreview	Merged opendev/system-config master: Fix Gerrit 3.6 image build https://review.opendev.org/c/opendev/system-config/+/870118	22:21
ianw	since that is where the space is	22:21
clarkb	if that is possible that seems like a good idea	22:23
ianw	it doesn't say cinder -> https://docs.openstack.org/kolla-ansible/latest/reference/shared-services/glance-guide.html#glance-backends	22:25
ianw	but https://opendev.org/openstack/kolla-ansible/commit/fa49b2692de1b38bfdf47e1468296770d5dfff89 suggests maybe otherwise	22:27
*** dasm is now known as dasm\|off		23:13
*** rlandy is now known as rlandy\|out		23:29
