Monday, 2022-11-28

*** swalladge is now known as Guest138		01:38
*** yadnesh|away is now known as yadnesh		04:48
*** dasm|off is now known as Guest188		05:30
opendevreview	Cedric Jeanneret proposed openstack/project-config master: Ensure NetworkManager doesn't override /etc/resolv.conf https://review.opendev.org/c/openstack/project-config/+/865433	08:25
*** jpena|off is now known as jpena		08:42
*** prometheanfire is now known as Guest217		09:11
*** yadnesh is now known as yadnesh|afk		09:43
*** yadnesh|afk is now known as yadnesh		10:12
*** anbanerj is now known as frenzy_friday|rover		10:47
*** dviroel|out is now known as dviroel		10:58
*** frenzy_friday|rover is now known as frenzy_friday|rover|food		12:22
opendevreview	Merged openstack/openstack-zuul-jobs master: Add py310 master template jobs https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/862286	13:06
*** frenzy_friday|rover|food is now known as frenzy_friday|rover		13:38
Tengu	fungi: heya! glad you appreciate my python in-lining :)	13:52
fungi	i do, it's clever	13:55
Tengu	:)	13:55
*** akekane is now known as abhishekk		13:57
*** Guest188 is now known as dasm		13:58
*** yadnesh is now known as yadnesh|away		13:59
Tengu	fungi: while I'm at it - probing: what would be your thoughts on getting a squid proxy with dedicated CA in order to make proper web content caching, even from TLS sources such as ansible-galaxy?	14:49
Tengu	context: everyday, we're seeing failures from ansible-galaxy, usually 502 errors from their side, and this breaks TripleO CI jobs, meaning "more recheck" that would be otherwise avoided/not needed.	14:50
Tengu	I'm trying to find a "nice" way out of this situation, and using some caching-proxy, if possible at infra level, seems like a possible way.	14:51
Tengu	afaik, there are already RPM mirrors available. Not 100% sure who manages them though.	14:52
fungi	Tengu: we have a content caching proxy in each region already, with valid ssl certs. it's using apache mod_proxy/mod_cache instead of squid, but is the specific proxy software decision important in that case?	14:54
*** dviroel is now known as dviroel|afk		14:55
fungi	right now we use them to cache pypi, npm, dockerhub... i forget what else	14:55
fungi	we could add the ansible galaxy site too, i expect, depending on how proxyable they've designed it	14:56
Tengu	fungi: ah, so if we configure our ansible tasks calling ansible-galaxy to use those existing content proxies, that would just work out of the box?	14:57
Tengu	fungi: what would be required to do some tests?	14:57
fungi	Tengu: here's where we'd add the proxy: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2	14:58
Tengu	fungi: is there some doc?	14:58
Tengu	ah, yeah. ok. got it.	14:58
Tengu	so I was thinking "squid" because it's "slightly" easier to put in place :).	14:59
Tengu	no need to make new vhost/others	14:59
fungi	and yeah, whatever's pulling from ansible-galaxy would need to be told to pull from the local proxy url instead (it's not a transparent proxy, it presents itself as a copy of the site being proxied)	14:59
opendevreview	Dr. Jens Harbott proposed openstack/project-config master: Use kolla.config for kolla-ansible in gerrit https://review.opendev.org/c/openstack/project-config/+/865686	14:59
Tengu	and it generates certificates on-demand, using a provided CA	14:59
fungi	no need to make a new vhost, just use a specific path on the main mirror vhost unless whatever's pulling from galaxy is very particular about the relative path to the content	15:00
Tengu	I'm pretty sure caching ansible-galaxy shouldn't be too hard - though it may take some space.	15:00
Tengu	yeah - right. /galaxy or the like.	15:00
Tengu	we'd need to do some testing I gues.	15:00
Tengu	*guess	15:00
fungi	most of our proxied content shares a single vhost on each mirror server	15:00
fungi	but if you look at the template i linked, you'll see a bunch of examples	15:00
fungi	at least enough to get some ideas and do some initial manual investigation into feasibility	15:01
Tengu	I see pypi has a rewrite.	15:01
Tengu	guess galaxy would need the same.	15:01
Tengu	fungi: guess getting URI hit by an "ansible-galaxy collection install foo" would be a good start?	15:03
Tengu	uho. wait. you're using that same apache thing to cache container layers? It soulds wrong..	15:04
Tengu	*sounds	15:04
Tengu	fungi: so for instance: Downloading https://galaxy.ansible.com/download/tripleo-operator-0.9.0.tar.gz that's the URI hit by `ansible-galaxy -v collection download tripleo.operator'	15:08
fungi	Tengu: i'm afraid i'm not familiar enough with container distribution to know why caching layers is bad	15:12
fungi	Tengu: does the ansible-galaxy command provide an option to specify a different url, or some other way to configure the url it will go to?	15:12
Tengu	fungi: overall size. using a docker-registry instance as caching-proxy is probably better, and allows a clean management of the registry content (hence cache). but it's a detail.	15:13
Tengu	as for galaxy - I have to check, but it supports the "http(s)_proxy" environment variable.	15:13
fungi	Tengu: we've looked into registry options, but the last time we dug into it all the ones you could run yourself lacked a safe way to live purge old content	15:14
Tengu	as far as I can tell, it doesn't support passing tweaked URI. though I have to check a doc	15:14
fungi	it's been a while though, so maybe they've improved	15:14
Tengu	fungi: I have one running here, it's cleaning things older than 7 days by default (can be configured of course)	15:14
fungi	Tengu: these proxies are not transparent proxies, so http(s)_proxy envvars won't be much help	15:14
Tengu	ah, that's what you call "transparent proxy" - sorry, not same definition on my side ^^'.	15:15
fungi	Tengu: yeah, if memory serves, the ones we looked into at the time had to restart the registry to purge old images/layers, so went offline briefly when doing so	15:15
Tengu	heh, yeah, bad	15:15
fungi	Tengu: basically we have no way to access control the proxies, so we need to make sure they can't be used to proxy arbitrary content	15:16
Tengu	gimme a moment, have to check a doc about "on-premise ansible-galaxy mirror", it should point to the way to set custom host.	15:16
fungi	i agree, for container images, a pull-through registry backed by other registries with some configurable retention policy would probably work better	15:17
Tengu	I can help on that if needed. I'm running such a registry in a podman pod, alongside redis for index data	15:18
Tengu	seems to work pretty well	15:18
Tengu	so - apparently there's a way to pass an API_SERVER to ansible-galaxy collection install	15:18
fungi	what we have was basically state of the art for 2017 or thereabouts, so we should continue to investigate whether the landscape for centralized container image caching has improved	15:18
Tengu	but I'll have to do some tests.	15:18
Tengu	:)	15:19
Tengu	fungi: stupid question if I may: why using httpd with mod_proxy/mod_cache instead of an actual caching software?	15:19
fungi	mod_cache is caching software, isn't it?	15:19
Tengu	not the best afaik.. ?	15:20
fungi	but if you mean "why not squid" it's that we already have a need for apache on those servers in order to serve the afs caches of our package mirrors	15:20
Tengu	ok..	15:20
Tengu	and I guess the signed certificate was also a reason	15:21
fungi	well, we could configure squid to use the cert when serving connections directly	15:21
fungi	if we wanted to install both on the server	15:22
Tengu	i.e. with squid as a TLS MitM, a crafted CA would be needed so that it can create certificates on the fly - such CA would need to be added then	15:22
fungi	oh, yes we really don't want to complicate things with a mitm configuration. is that what you consider a "real proxy"?	15:22
Tengu	:)	15:22
Tengu	yeah	15:22
Tengu	something able to cache TLS content directly.	15:23
fungi	okay, then no real proxies is one of our basic requirements for this ;)	15:23
Tengu	by default, squid can't decrypt.	15:23
Tengu	it just opens a pass-through tunnel and doesn't see anything	15:23
fungi	again, we have no way of access controlling these proxies, so don't want them able to be used to proxy arbitrary content. they're reachable from arbitrary hosts on the internet	15:23
Tengu	no filtering at all? ok	15:24
Tengu	anyway. I think we should be able to tweak the ansible-galaxy command to use "-s {PROXY}"	15:24
fungi	we "filter" by configuring which specific websites they're backed by	15:24
Tengu	though I'll need to take some time to test it properly.	15:24
fungi	so test nodes can't just generally use them to proxy all requests to the web	15:25
Tengu	the doc is.. well.	15:25
fungi	since we don't control the network topology for the cloud resources donated to us, filtering clients based on source ip address or the like isn't really an option	15:26
Tengu	right. (squid allows authentication)	15:26
fungi	also we give proposers of untrusted changes root access to the "clients" so they'd be able to read any authentication tokens local to the test nodes	15:27
fungi	hence the need to design this so that it's unlikely to be abused as a clandestine web access anonymizer	15:27
Tengu	note that squid also has ACL based on backend host names ;)	15:28
Tengu	but anyway	15:28
Tengu	mod_proxy/mod_cache it is	15:28
Tengu	makes things a bit more complex for ansible-galaxy apparently.	15:28
fungi	yes, we could limit access through the proxy to content for specific sites, but then people couldn't just set http(s)_proxy envvars globally and would need some way to switch them for specific tools/sites only	15:29
Tengu	no_proxy - but yeah. I've worked on that in tripleo, and proxies are always messy to manage.	15:29
Tengu	especially when operator doesn't know what they're doing -.-.	15:29
fungi	anyway, not to belabor the point, but we've ruled out operating transparent/mitm web proxies for a number of security and manageability reasons	15:30
Tengu	hmmmmmm so "-s" seems to be the correct param.	15:31
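A minimal sketch of what pointing ansible-galaxy at the mirror could look like, assuming the `-s`/`--server` option found above and a hypothetical mirror URL (`mirror.example.opendev.org/galaxy/` is a placeholder, not a real endpoint). It only builds the command line; whether a mirror path can actually stand in for the Galaxy API behind that option is exactly what still needed testing.

```python
import shlex

def galaxy_install_argv(collection, server_url):
    """Build an ansible-galaxy command that installs a collection from a given server URL."""
    # -s/--server tells ansible-galaxy which Galaxy server to talk to
    return ["ansible-galaxy", "collection", "install", "-s", server_url, collection]

# Hypothetical mirror URL; the real path depends on how a /galaxy/ entry gets set up.
MIRROR = "https://mirror.example.opendev.org/galaxy/"
argv = galaxy_install_argv("tripleo.operator", MIRROR)
print(shlex.join(argv))
# → ansible-galaxy collection install -s https://mirror.example.opendev.org/galaxy/ tripleo.operator
```

Building an argv list (rather than a shell string) keeps it usable directly with `subprocess.run(argv)` once the endpoint exists.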
fungi	hence the odd rewrite gymnastics necessary for sites like pypi and dockerhub that like to split indexes and content between different domains	15:31
Tengu	lemme start a dumb container with apache/mod_proxy and see how it goes.	15:31
fungi	any time we've brought up with maintainers of those sorts of sites the idea of redesigning their content to make it easier for direct proxying, the response has generally been "why are you bothering to proxy? we have a cdn already"	15:33
Tengu	heh	15:33
Tengu	ppl don't get the actual use of cache.	15:34
fungi	even after very detailed explanations, no	15:34
Tengu	and we're wondering why web content delivery is so slow, why website layouts are so terrible and so on..	15:35
fungi	i think people who are old enough to remember metered residential network access or uucp batching understand, but a lot of folks have grown up treating the internet as a limitless utility	15:35
Tengu	"back then", it was better :)	15:35
Tengu	yep	15:35
* Tengu feels old now		15:36
Tengu	thank you fungi -.-	15:36
fungi	i feel the aches and pains of old age every morning when i wake up, it's all the reminding i really need ;)	15:36
Tengu	.. I try to forget about aches and pains, especially in the back -.-	15:37
Tengu	shhhhhh ;)	15:37
Tengu	fungi: my httpd skills are rusty (thank you nginx) - is there a way to ensure "/galaxy/" is removed from the query done in the backend?	15:52
fungi	Tengu: the pypi config does what i think you're asking: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2#L242-L245	15:56
fungi	Tengu: for example https://mirror.dfw.rax.opendev.org/pypi/simple/bindep/ goes to https://pypi.org/simple/bindep/	15:58
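The prefix mapping fungi describes (a path on the shared mirror vhost standing in for one specific upstream site) can be sketched as below. This is a simplified model, not the actual mirror.vhost.j2 logic: the `BACKENDS` table and its `/galaxy/` entry are hypothetical, and the real pypi config also has to rewrite links to a separate file-hosting domain. Anything not in the table is refused, which is the "filter by configured backends" property that keeps the mirror from acting as an open proxy.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Allowlist of local path prefixes -> upstream sites (hypothetical table).
BACKENDS = {
    "/pypi/": "https://pypi.org/",
    "/galaxy/": "https://galaxy.ansible.com/",
}

def upstream_url(mirror_url: str) -> Optional[str]:
    """Map a mirror URL to its upstream URL, roughly like ProxyPass /prefix/ https://site/."""
    parts = urlsplit(mirror_url)
    for prefix, backend in BACKENDS.items():
        if parts.path.startswith(prefix):
            rest = parts.path[len(prefix):]  # strip the local prefix
            target = urlsplit(backend + rest)
            # keep the client's query string untouched
            return urlunsplit((target.scheme, target.netloc, target.path, parts.query, ""))
    return None  # not an allowlisted site: refuse instead of proxying

print(upstream_url("https://mirror.dfw.rax.opendev.org/pypi/simple/bindep/"))
# → https://pypi.org/simple/bindep/
print(upstream_url("https://mirror.dfw.rax.opendev.org/example.com/anything"))
# → None
```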
Tengu	hmm.	15:58
Tengu	weird.	15:58
Tengu	ah, there are redirect..	16:01
Tengu	though I thought ProxyPassReverse was supposed to take care of them.	16:01
Tengu	fungi: so, "in theory", getting a new ProxyPass /galaxy/ https://galaxy.ansible.com/ would work.	16:02
Tengu	now there's some tweaking - and since my httpd config skills are so rusty, it will take some time for me to come up with something that actually works.	16:03
Tengu	fungi: is there a place to push a request for a new endpoint?	16:03
Tengu	so yeah - proxypassreverse should catch the 301 and rewrite it properly. pfff. probably missing something in my local httpd config.	16:14
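For reference, ProxyPassReverse adjusts redirect headers such as `Location` so that a backend redirect under the proxied URL points back at the mirror path. A rough Python model of that rule, with hypothetical mirror and S3 URLs, also shows its limit: a redirect to a different host (for example an S3 download location) is not under the backend URL, so it passes through unchanged and the client leaves the mirror.

```python
def reverse_location(location, backend, local_prefix, mirror_base):
    """Rewrite a redirect Location header roughly the way ProxyPassReverse does.

    Only redirects pointing under the backend URL are mapped back onto the
    mirror; a redirect to any other host is returned unchanged.
    """
    if location.startswith(backend):
        return mirror_base.rstrip("/") + local_prefix + location[len(backend):]
    return location

BACKEND = "https://galaxy.ansible.com/"          # upstream site behind the proxy
PREFIX = "/galaxy/"                              # hypothetical path on the mirror vhost
MIRROR = "https://mirror.example.opendev.org"    # hypothetical mirror hostname

print(reverse_location("https://galaxy.ansible.com/api/", BACKEND, PREFIX, MIRROR))
# → https://mirror.example.opendev.org/galaxy/api/
print(reverse_location("https://s3.example.com/bucket/tripleo-operator-0.9.0.tar.gz", BACKEND, PREFIX, MIRROR))
# → https://s3.example.com/bucket/tripleo-operator-0.9.0.tar.gz
```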
vishalmanchanda	clarkb: hi, could you fix the linters on your patch https://review.opendev.org/c/zuul/zuul-jobs/+/865459 once you have time.	16:18
clarkb	vishalmanchanda: yes I can take a look shortly	16:20
vishalmanchanda	clarkb: thanks.	16:20
clarkb	Tengu: fungi: correct, we do not use transparent proxies because they can be abused. Instead we reverse proxy specific content. For docker container caching fungi's memory is also correct. We haven't found a registry that can live prune data which is kind of important for caching.	16:20
Tengu	clarkb: heya :). Thanks for the confirmation - I was more focused on the "TLS impact" of a transparent proxy, actually. Regarding the container layer caching, I'm using this currently: https://paste.openstack.org/show/bI1JecmqTjtFREczs7te/	16:22
Tengu	there's ONE issue with the docker-registry: it can only proxy one backend. Meaning: you have to run as many registries as backends. Which is a bit stupid, but understandable.	16:23
clarkb	Tengu: the docker registry cannot be pruned	16:23
Tengu	and if the mod_proxy is able to manage the layers, well.	16:23
clarkb	* cannot be pruned while up	16:23
Tengu	clarkb: well, apparently yes.	16:23
clarkb	so it is a non starter	16:24
Tengu	at least there are options allowing to clean things older than UPLOADPURGING_AGE (in my case, 7 days)	16:24
Tengu	it's in the registry "maintenance" config section	16:24
clarkb	aiui you cannot do that safely while it is up. It may create failed requests	16:24
clarkb	we investigated this a fair bit before we added the apache caching for docker hub	16:25
clarkb	and it was extremely disappointing. Other issues included the swift implementation not working	16:25
clarkb	(it would return 0 byte layers often)	16:25
Tengu	for my use-case, it's ok like that. still: my point is more about the ansible-galaxy caching needs :)	16:25
clarkb	sure, we have tools for that. They aren't perfect but they address the varying demands of the system reasonably well	16:26
Tengu	I just proposed https://review.opendev.org/c/opendev/system-config/+/865869 - not really sure how it can actually be tested, i.e. if Infra has some sandbox/playground.	16:26
clarkb	Tengu: check the CI jobs for that change. There should be a job that deploys a mirror and you can use the testinfra tests to query it	16:26
clarkb	I'm pretty sure we already have tests that check the pypi (I think pypi) proxy	16:26
Tengu	hmmm care to point to that "testinfra" repository?	16:27
clarkb	Tengu: opendev/system-config/testinfra	16:28
Tengu	ok	16:28
Tengu	directly in. ok.	16:28
Tengu	good - in "test_mirror.py"	16:28
Tengu	I can add a thing for galaxy.	16:28
clarkb	separately, I believe that tripleo had avoided needing a galaxy cache by having zuul cache the git repos for the various ansible roles instead	16:29
clarkb	it would probably be a good idea to clean that up if it is no longer used	16:29
Tengu	it's failing because there are actual runs of `ansible-galaxy collection install', usually in the molecule tests	16:29
Tengu	but yeah, if we can get the cache, that would be useless. Lemme add a note, I have a call with TripleO CI tomorrow about that matter.	16:30
clarkb	ansible-galaxy can install from disk though iirc	16:30
clarkb	we do it for a role or two that we use in the infrastructure iirc	16:30
clarkb	anyway, I don't really care what installation method you use. I just don't want to keep caching git repos we don't need to cache anymore if that is the case	16:31
Tengu	note added, thanks clarkb for that info!	16:32
Tengu	clarkb: ah, if you're willing to check on some other change requests, care to have a look at https://review.opendev.org/q/topic:unbound%252Fnetworkmanager ? Note, ianw has an open question, so maybe not fit for a merge right now.	16:40
clarkb	why are there two changes?	16:42
clarkb	fwiw I think the correct place to fix this is in the simple init element for disk image builder. Not base-jobs or our infra specific elements	16:43
Tengu	clarkb: well, unbound seems to be configured in another location than the disk image builder - that's why I also edited that "configure-unbound" role.	16:44
clarkb	its just updating the resolvers to ip version specific resolver config in the job	16:44
clarkb	whether or not NM touches it is completely separate	16:44
Tengu	well, it's also configuring unbound actually	16:45
Tengu	what's the point of configuring unbound and setting the resolv.conf content if it's then squashed by NM?	16:45
Tengu	(that's what we're seeing in tripleo jobs - hence those 2 patches)	16:45
clarkb	Tengu: unbound is expected to be fully configured at that point	16:46
Tengu	yes. but it won't be used	16:46
Tengu	because NM will override it pretty fast	16:46
clarkb	But some clouds NAT all ipv4 outbound	16:46
clarkb	NAT + UDP (eg DNS) can be unreliable. What the base-jobs role is doing is checking for an ipv6 capable instance and flipping its forwarding config over to ipv6 resolvers to avoid nat	16:46
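The base-jobs behaviour clarkb describes (switch unbound to IPv6 forwarders when the node has IPv6, so UDP DNS avoids IPv4 NAT) could be modelled roughly like this. It is a sketch, not the actual configure-unbound role: the forwarder addresses are placeholders from documentation ranges, and "has a global IPv6 address" is an assumed stand-in for however the role really detects IPv6 capability.

```python
import ipaddress

# Placeholder forwarders (documentation ranges), not opendev's real resolvers.
IPV4_FORWARDERS = ["192.0.2.53", "198.51.100.53"]
IPV6_FORWARDERS = ["2001:db8::53", "2001:db8:1::53"]

def pick_forwarders(local_addrs):
    """Prefer IPv6 forwarders when the node has a global IPv6 address."""
    has_v6 = any(
        isinstance(ip, ipaddress.IPv6Address) and ip.is_global
        for ip in map(ipaddress.ip_address, local_addrs)
    )
    return IPV6_FORWARDERS if has_v6 else IPV4_FORWARDERS

def unbound_forward_zone(forwarders):
    """Render an unbound forward-zone stanza sending all queries to the given forwarders."""
    lines = ["forward-zone:", '  name: "."']
    lines += [f"  forward-addr: {addr}" for addr in forwarders]
    return "\n".join(lines) + "\n"

# A node with only IPv4 and link-local IPv6 keeps the IPv4 forwarders;
# a node with a global IPv6 address gets the IPv6 ones.
print(unbound_forward_zone(pick_forwarders(["10.0.0.5", "fe80::1"])), end="")
print(unbound_forward_zone(pick_forwarders(["10.0.0.5", "2600::5"])), end="")
```

As clarkb points out right after this, doing the switch inside a job only helps if nothing (such as NetworkManager) has already replaced resolv.conf by then.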
clarkb	you should not need to change anything in base-jobs to fix the network manager problem	16:47
Tengu	well.... in that case, there's no use to override the /etc/resolv.conf in the first place.	16:47
Tengu	nor to configure unbound actually.	16:47
clarkb	I don't understand	16:47
clarkb	the point is we are using unbound	16:47
clarkb	some clouds need unbound configured to use ipv6 forwarders	16:47
Tengu	well, it's NOT used during the CI job lifetime.	16:48
Tengu	because NM overrides the /etc/resolv.conf at some point	16:48
clarkb	yes I understand that	16:48
Tengu	(lease refresh, service restart, whatever)	16:48
Tengu	so the configuration I inject there is to ensure this doesn't happen	16:48
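As an illustration only: one common way to stop NetworkManager from rewriting /etc/resolv.conf is a `conf.d` snippet setting `dns=none` in the `[main]` section. Whether change 865433 uses this exact setting is an assumption; the sketch just renders such a snippet with Python's configparser, and any target file name under /etc/NetworkManager/conf.d/ would be hypothetical.

```python
import configparser
import io

def nm_dns_none_snippet():
    """Render a NetworkManager conf.d snippet telling NM not to manage resolv.conf."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep key case exactly as written
    # dns=none: NetworkManager leaves /etc/resolv.conf alone
    cfg["main"] = {"dns": "none"}
    buf = io.StringIO()
    # write "key=value" without spaces, the usual style for NM config files
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()

print(nm_dns_none_snippet(), end="")
```

Baking a snippet like this into the image (rather than writing it from a job) matches the point made below that the fix needs to be in place at boot time.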
clarkb	but unbound is configured in the base image.	16:48
Tengu	I don't touch unbound config	16:48
clarkb	Configuring it in the job is too late	16:48
clarkb	basically updating base-jobs is redundant and confusing	16:48
clarkb	(as this conversation illustrates)	16:49
clarkb	you should only do the configuration in the base image	16:49
Tengu	by not touching the configure-unbound then?	16:49
clarkb	I guess simple-init doesn't assume unbound so maybe project-config is fine. But base-jobs is just redundant	16:49
clarkb	Tengu: when the instance boots it is using unbound for all DNS by default configured to ipv4 resolvers. Early in the test jobs we check if we have ipv6 and flip the resolvers over to ipv6 resolvers to avoid ipv4 NAT. If you fix network manager at that stage it is already too late as something may have broken DNS	16:50
clarkb	to properly fix this problem you need to have it fixed at boot time is what I am saying	16:50
Tengu	clarkb: so for you, only https://review.opendev.org/c/openstack/project-config/+/865433 is valid - the second one affecting ansible role "configure-unbound" is useless - what about jobs not using the nodepool image? (is it a valid case?)	16:50
clarkb	Tengu: no that is not a valid use case. You cannot run a job in opendev outside of our images	16:51
Tengu	ok. so I can, indeed, discard the ansible version of the enforcement.	16:52
clarkb	(you actually can through ansible inventory manipulation but if/when you do that you are on your own)	16:52
clarkb	(and any such inventory manipulation should happen well after base jobs playbooks have run)	16:52
Tengu	ok, abandoned the base-jobs one.	16:53
clarkb	ok ya it is nodepool-base that configures unbound for our images and not an element in dib so project-config is the correct place to add the override	16:53
Tengu	so we keep only the thing in the disk-image-builder	16:54
Tengu	\o/	16:54
Tengu	half wrong, half right :)	16:54
Tengu	fungi: for the proxy test, I guess the one I want to check is "system-config-run-mirror-x86" job? if green, means my /galaxy/ endpoint is good?	16:56
clarkb	Tengu: side note, I would suggest against pushing every change as a WIP	16:57
clarkb	I did not review the NM stuff last week because it was marked WIP	16:57
Tengu	clarkb: why so?	16:57
clarkb	and because it prevents others from landing your change without your intervention if it is actually ready to go	16:58
clarkb	basically use it when you know the change is not ready	16:58
Tengu	well, sure, that's the actual advantage of WIP: not bother ppl	16:58
clarkb	but if the only question is CI then let it be mergable	16:58
Tengu	that said, I can mark the mirror one as "active", since I was able to add a test	16:59
Tengu	so it's "just" a matter of getting green CI.	16:59
clarkb	basically you should mark it WIP when you know you don't want it to merge. Which is different than asking someone else to help evaluate if it is mergable	16:59
Tengu	'k, divergent view on the WIP flag, no issue for me :)	17:00
clarkb	Tengu: left a comment on the cache change	17:02
fungi	part of the reason for that workflow is also because a lot of the projects' acls don't grant core reviewers the ability to un-set wip on your change, therefore they still need another round of action from you to un-wip before approving	17:03
Tengu	clarkb: hmm ok. wasn't aware of the need for the name - I'll indeed update to ansible-galaxy	17:03
fungi	(the ability to delegate that is a more recent addition in gerrit than the wip implementation, so projects are only just now starting to update their acls to grant it)	17:03
clarkb	I'm looking at ansible galaxy and I'm fairly certain we will never cache the search/index results due to the url parameters	17:05
clarkb	The downloads themselves are served by s3 so won't be cached either	17:06
clarkb	however, I suspect something like the pypi cache setup would work	17:07
Tengu	fun - while using the "-vvv" params with ansible-galaxy, it doesn't show anything else than the galaxy.ansible.com/download tree	17:07
clarkb	but I'm not sure as there are parameters in the s3 redirect as well	17:07
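A small model of clarkb's point about query parameters: a cache keyed on the full URL, query string included, never gets a hit if every redirect carries fresh signature parameters. The presigned-URL parameter names below follow the S3 style but are hypothetical examples, not captured from Galaxy. Ignoring the query makes the two requests share a key, but doing that for signed URLs is a policy decision with its own risks, so this only illustrates why the download redirects are hard to cache.

```python
from urllib.parse import urlsplit

def cache_key(url, ignore_query=False):
    """Build a cache key from a URL; by default the query string is part of it."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    if ignore_query or not parts.query:
        return base
    return f"{base}?{parts.query}"

def simulate(urls, ignore_query=False):
    """Replay requests against an empty dict cache and report hit/miss for each."""
    cache, results = {}, []
    for url in urls:
        key = cache_key(url, ignore_query)
        results.append("hit" if key in cache else "miss")
        cache[key] = True  # placeholder for the stored response body
    return results

# Hypothetical presigned redirects for the same tarball, differing only in signature params.
first = ("https://bucket.s3.example.com/tripleo-operator-0.9.0.tar.gz"
         "?X-Amz-Date=20221128T160000Z&X-Amz-Signature=aaa")
second = ("https://bucket.s3.example.com/tripleo-operator-0.9.0.tar.gz"
          "?X-Amz-Date=20221128T160500Z&X-Amz-Signature=bbb")

print(simulate([first, second]))                     # ['miss', 'miss']
print(simulate([first, second], ignore_query=True))  # ['miss', 'hit']
```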
clarkb	I think docker does this too? so there are examples you can look at at least	17:07
Tengu	oh, ok. I get it. nice 302 hidden.	17:08
Tengu	so now I can use WIP? :)	17:08
clarkb	Tengu: I don't think that is necessary as the change is already -1?	17:08
clarkb	Tengu: but if you are worried about someone merging it with a -1 then sure	17:08
Tengu	-1 was dropped with the new push.	17:08
clarkb	oh you pushed a new patch.	17:09
clarkb	I can -1 again :)	17:09
Tengu	just -W it :)	17:10
Tengu	I'll work on that tomorrow - it's late here (EMEA)	17:10
Tengu	thanks for the help clarkb :)	17:10
*** jpena is now known as jpena|off		17:54
*** dviroel|afk is now known as dviroel		18:39
*** rlandy is now known as rlandy|afk		19:06
*** dviroel is now known as dviroel|afk		21:00
*** swalladge is now known as Guest277		21:19
*** rlandy|afk is now known as rlandy		21:39
*** dasm is now known as dasm|off		22:01
*** Guest217 is now known as prometheanfire		23:12

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!