*** swalladge is now known as Guest138 | 01:38 | |
*** yadnesh|away is now known as yadnesh | 04:48 | |
*** dasm|off is now known as Guest188 | 05:30 | |
opendevreview | Cedric Jeanneret proposed openstack/project-config master: Ensure NetworkManager doesn't override /etc/resolv.conf https://review.opendev.org/c/openstack/project-config/+/865433 | 08:25 |
---|---|---|
*** jpena|off is now known as jpena | 08:42 | |
*** prometheanfire is now known as Guest217 | 09:11 | |
*** yadnesh is now known as yadnesh|afk | 09:43 | |
*** yadnesh|afk is now known as yadnesh | 10:12 | |
*** anbanerj is now known as frenzy_friday|rover | 10:47 | |
*** dviroel|out is now known as dviroel | 10:58 | |
*** frenzy_friday|rover is now known as frenzy_friday|rover|food | 12:22 | |
opendevreview | Merged openstack/openstack-zuul-jobs master: Add py310 master template jobs https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/862286 | 13:06 |
*** frenzy_friday|rover|food is now known as frenzy_friday|rover | 13:38 | |
Tengu | fungi: heya! glad you appreciate my python in-lining :) | 13:52 |
fungi | i do, it's clever | 13:55 |
Tengu | :) | 13:55 |
*** akekane is now known as abhishekk | 13:57 | |
*** Guest188 is now known as dasm | 13:58 | |
*** yadnesh is now known as yadnesh|away | 13:59 | |
Tengu | fungi: while I'm at it - probing: what would be your thoughts on getting a squid proxy with dedicated CA in order to make proper web content caching, even from TLS sources such as ansible-galaxy? | 14:49 |
Tengu | context: everyday, we're seeing failures from ansible-galaxy, usually 502 errors from their side, and this breaks TripleO CI jobs, meaning "more recheck" that would be otherwise avoided/not needed. | 14:50 |
Tengu | I'm trying to find a "nice" way out of this situation, and using some caching-proxy, if possible at infra level, seems like a possible way. | 14:51 |
Tengu | afaik, there are already RPM mirrors available. Not 100% sure about who manages them though. | 14:52 |
fungi | Tengu: we have a content caching proxy in each region already, with valid ssl certs. it's using apache mod_proxy/mod_cache instead of squid, but is the specific proxy software decision important in that case? | 14:54 |
*** dviroel is now known as dviroel|afk | 14:55 | |
fungi | right now we use them to cache pypi, npm, dockerhub... i forget what else | 14:55 |
fungi | we could add the ansible galaxy site too, i expect, depending on how proxyable they've designed it | 14:56 |
Tengu | fungi: ah, so if we configure our ansible tasks calling ansible-galaxy to use those existing content proxy, that would be just working out of the box? | 14:57 |
Tengu | fungi: what would be required to do some tests? | 14:57 |
fungi | Tengu: here's where we'd add the proxy: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2 | 14:58 |
Tengu | fungi: is there some doc? | 14:58 |
Tengu | ah, yeah. ok. got it. | 14:58 |
Tengu | so I was thinking "squid" because it's "slightly" easier to put in place :). | 14:59 |
Tengu | no need to make new vhost/others | 14:59 |
fungi | and yeah, whatever's pulling from ansible-galaxy would need to be told to pull from the local proxy url instead (it's not a transparent proxy, it presents itself as a copy of the site being proxied) | 14:59 |
opendevreview | Dr. Jens Harbott proposed openstack/project-config master: Use kolla.config for kolla-ansible in gerrit https://review.opendev.org/c/openstack/project-config/+/865686 | 14:59 |
Tengu | and it generates certificates on-demand, using a provided CA | 14:59 |
fungi | no need to make a new vhost, just use a specific path on the main mirror vhost unless whatever's pulling from galaxy is very particular about the relative path to the content | 15:00 |
Tengu | I'm pretty sure caching ansible-galaxy shouldn't be too hard - though it may take some space. | 15:00 |
Tengu | yeah - right. /galaxy or the like. | 15:00 |
Tengu | we'd need to do some testing I guess. | 15:00 |
fungi | most of our proxied content shares a single vhost on each mirror server | 15:00 |
fungi | but if you look at the template i linked, you'll see a bunch of examples | 15:00 |
fungi | at least enough to get some ideas and do some initial manual investigation into feasibility | 15:01 |
Tengu | I see pypi has a rewrite. | 15:01 |
Tengu | guess galaxy would need the same. | 15:01 |
Tengu | fungi: guess getting URI hit by an "ansible-galaxy collection install foo" would be a good start? | 15:03 |
Tengu | uho. wait. you're using that same apache thing to cache container layers? It sounds wrong.. | 15:04 |
Tengu | fungi: so for instance: Downloading https://galaxy.ansible.com/download/tripleo-operator-0.9.0.tar.gz that's the URI hit by `ansible-galaxy -v collection download tripleo.operator' | 15:08 |
fungi | Tengu: i'm afraid i'm not familiar enough with container distribution to know why caching layers is bad | 15:12 |
fungi | Tengu: does the ansible-galaxy command provide an option to specify a different url, or some other way to configure the url it will go to? | 15:12 |
Tengu | fungi: overall size. using a docker-registry instance as caching-proxy is probably better, and allows a clean management of the registry content (hence cache). but it's a detail. | 15:13 |
Tengu | as for galaxy - I have to check, but it supports the "http(s)_proxy" environment variable. | 15:13 |
fungi | Tengu: we've looked into registry options, but the last time we dug into it all the ones you could run yourself lacked a safe way to live purge old content | 15:14 |
Tengu | as far as I can tell, it doesn't support passing tweaked URI. though I have to check a doc | 15:14 |
fungi | it's been a while though, so maybe they've improved | 15:14 |
Tengu | fungi: I have one running here, it's cleaning things older than 7 days by default (can be configured of course) | 15:14 |
fungi | Tengu: these proxies are not transparent proxies, so http(s)_proxy envvars won't be much help | 15:14 |
Tengu | ah, that's what you call "transparent proxy" - sorry, not same definition on my side ^^'. | 15:15 |
fungi | Tengu: yeah, if memory serves, the ones we looked into at the time had to restart the registry to purge old images/layers, so went offline briefly when doing so | 15:15 |
Tengu | heh, yeah, bad | 15:15 |
fungi | Tengu: basically we have no way to access control the proxies, so we need to make sure they can't be used to proxy arbitrary content | 15:16 |
Tengu | gimme a moment, have to check a doc about "on-premise ansible-galaxy mirror", it should point to the way to set custom host. | 15:16 |
fungi | i agree, for container images, a pull-through registry backed by other registries with some configurable retention policy would probably work better | 15:17 |
Tengu | I can help on that if needed. I'm running such a registry in a podman pod, alongside redis for index data | 15:18 |
Tengu | seems to work pretty well | 15:18 |
Tengu | so - apparently there's a way to pass an API_SERVER to ansible-galaxy collection install | 15:18 |
fungi | what we have was basically state of the art for 2017 or thereabouts, so we should continue to investigate whether the landscape for centralized container image caching has improved | 15:18 |
Tengu | but I'll have to do some tests. | 15:18 |
Tengu | :) | 15:19 |
Tengu | fungi: stupid question if I may: why using httpd with mod_proxy/mod_cache instead of an actual caching software? | 15:19 |
fungi | mod_cache is caching software, isn't it? | 15:19 |
Tengu | not the best afaik.. ? | 15:20 |
fungi | but if you mean "why not squid" it's that we already have a need for apache on those servers in order to serve the afs caches of our package mirrors | 15:20 |
Tengu | ok.. | 15:20 |
Tengu | and I guess the signed certificate was also a reason | 15:21 |
fungi | well, we could configure squid to use the cert when serving connections directly | 15:21 |
fungi | if we wanted to install both on the server | 15:22 |
Tengu | i.e. with squid as a TLS MitM, a crafted CA would be needed so that it can create certificates on the fly - such CA would need to be added then | 15:22 |
fungi | oh, yes we really don't want to complicate things with a mitm configuration. is that what you consider a "real proxy"? | 15:22 |
Tengu | :) | 15:22 |
Tengu | yeah | 15:22 |
Tengu | something able to cache TLS content directly. | 15:23 |
fungi | okay, then no real proxies is one of our basic requirements for this ;) | 15:23 |
Tengu | by default, squid can't decrypt. | 15:23 |
Tengu | it just opens a pass-through tunnel and doesn't see anything | 15:23 |
fungi | again, we have no way of access controlling these proxies, so don't want them able to be used to proxy arbitrary content. they're reachable from arbitrary hosts on the internet | 15:23 |
Tengu | no filtering at all? ok | 15:24 |
Tengu | anyway. I think we should be able to tweak the ansible-galaxy command to use "-s {PROXY}" | 15:24 |
fungi | we "filter" by configuring which specific websites they're backed by | 15:24 |
Tengu | though I'll need to take some time to test it properly. | 15:24 |
fungi | so test nodes can't just generally use them to proxy all requests to the web | 15:25 |
Tengu | the doc is.. well. | 15:25 |
fungi | since we don't control the network topology for the cloud resources donated to us, filtering clients based on source ip address or the like isn't really an option | 15:26 |
Tengu | right. (squid allows authentication) | 15:26 |
fungi | also we give proposers of untrusted changes root access to the "clients" so they'd be able to read any authentication tokens local to the test nodes | 15:27 |
fungi | hence the need to design this so that it's unlikely to be abused as a clandestine web access anonymizer | 15:27 |
Tengu | note that squid also has ACL based on backend host names ;) | 15:28 |
Tengu | but anyway | 15:28 |
Tengu | mod_proxy/mod_cache it is | 15:28 |
Tengu | makes things a bit more complex for ansible-galaxy apparently. | 15:28 |
fungi | yes, we could limit access through the proxy to content for specific sites, but then people couldn't just set http(s)_proxy envvars globally and would need some way to switch them for specific tools/sites only | 15:29 |
Tengu | no_proxy - but yeah. I've worked on that in tripleo, and proxies are always messy to manage. | 15:29 |
Tengu | especially when operator doesn't know what they're doing -.-. | 15:29 |
fungi | anyway, not to belabor the point, but we've ruled out operating transparent/mitm web proxies for a number of security and manageability reasons | 15:30 |
Tengu | hmmmmmm so "-s" seems to be the correct param. | 15:31 |
fungi | hence the odd rewrite gymnastics necessary for sites like pypi and dockerhub that like to split indexes and content between different domains | 15:31 |
Tengu | lemme start a dumb container with apache/mod_proxy and see how it goes. | 15:31 |
fungi | any time we've brought up with maintainers of those sorts of sites the idea of redesigning their content to make it easier for direct proxying, the response has generally been "why are you bothering to proxy? we have a cdn already" | 15:33 |
Tengu | heh | 15:33 |
Tengu | ppl don't get the actual use of cache. | 15:34 |
fungi | even after very detailed explanations, no | 15:34 |
Tengu | and we're wondering why web content delivery is so slow, why website layout are so terrible and so on.. | 15:35 |
fungi | i think people who are old enough to remember metered residential network access or uucp batching understand, but a lot of folks have grown up treating the internet as a limitless utility | 15:35 |
Tengu | "back then", it was better :) | 15:35 |
Tengu | yep | 15:35 |
* Tengu feels old now | 15:36 | |
Tengu | thank you fungi -.- | 15:36 |
fungi | i feel the aches and pains of old age every morning when i wake up, it's all the reminding i really need ;) | 15:36 |
Tengu | .. I try to forget about aches and pains, especially in the back -.- | 15:37 |
Tengu | shhhhhh ;) | 15:37 |
Tengu | fungi: my httpd skills are rusted (thank you nginx) - is there a way to ensure "/galaxy/" is removed from the query done in the backend? | 15:52 |
fungi | Tengu: the pypi config does what i think you're asking: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2#L242-L245 | 15:56 |
fungi | Tengu: for example https://mirror.dfw.rax.opendev.org/pypi/simple/bindep/ goes to https://pypi.org/simple/bindep/ | 15:58 |
Tengu | hmm. | 15:58 |
Tengu | weird. | 15:58 |
Tengu | ah, there are redirects.. | 16:01 |
Tengu | though I thought ProxyPassReverse was supposed to take care of them. | 16:01 |
Tengu | fungi: so, "in theory", getting a new ProxyPass /galaxy/ https://galaxy.ansible.com/ would work. | 16:02 |
Tengu | now there's some tweaking - and since my httpd config skills are so rusted, it will take some time for me to come to something that is actually working. | 16:03 |
Tengu | fungi: is there a place where to push a request for new endpoint? | 16:03 |
Tengu | so yeah - proxypassreverse should catch the 301 and rewrite it properly. pfff. probably missing something in my local httpd config. | 16:14 |
vishalmanchanda | clarkb:hi, could you fix linters on your patch https://review.opendev.org/c/zuul/zuul-jobs/+/865459 once you have time. | 16:18 |
clarkb | vishalmanchanda: yes I can take a look shortly | 16:20 |
vishalmanchanda | clarkb: thanks. | 16:20 |
clarkb | Tengu: fungi: correct, we do not use transparent proxies because they can be abused. Instead we reverse proxy specific content. For docker container caching fungi's memory is also correct. We haven't found a registry that can live prune data which is kind of important for caching. | 16:20 |
Tengu | clarkb: heya :). Thanks for the confirmation - I was more focused on the "TLS impact" of a transparent proxy, actually. Regarding the container layer caching, I'm using this currently: https://paste.openstack.org/show/bI1JecmqTjtFREczs7te/ | 16:22 |
Tengu | there's ONE issue with the docker-registry: it can only proxy one backend. Meaning: you have to run as many registries as backend. Which is a bit stupid, but understandable. | 16:23 |
clarkb | Tengu: the docker registry cannot be pruned | 16:23 |
Tengu | and if the mod_proxy is able to manage the layers, well. | 16:23 |
clarkb | * cannot be pruned while up | 16:23 |
Tengu | clarkb: well, apparently yes. | 16:23 |
clarkb | so it is a non starter | 16:24 |
Tengu | at least there are options allowing to clean things older than UPLOADPURGING_AGE (in my case, 7 days) | 16:24 |
Tengu | it's in the registry "maintenance" config section | 16:24 |
clarkb | aiui you cannot do that safely while it is up. It may create failed requests | 16:24 |
clarkb | we investigated this a fair bit before we added the apache caching for docker hub | 16:25 |
clarkb | and it was extremely disappointing. Other issues included the swift implementation not working | 16:25 |
clarkb | (it would return 0 byte layers often) | 16:25 |
Tengu | for my use-case, it's ok like that. still: my point is more about the ansible-galaxy caching needs :) | 16:25 |
clarkb | sure, we have tools for that. They aren't perfect but they address the varying demands of the system reasonably well | 16:26 |
Tengu | I just proposed https://review.opendev.org/c/opendev/system-config/+/865869 - not really sure how it can actually be tested, i.e. if Infra has some sandbox/playground. | 16:26 |
clarkb | Tengu: check the CI jobs for that change. There should be a job that deploys a mirror and you can use the testinfra tests to query it | 16:26 |
clarkb | I'm pretty sure we already have tests that check the pypi (I think pypi) proxy | 16:26 |
Tengu | hmmm care to point to that "testinfra" repository? | 16:27 |
clarkb | Tengu: opendev/system-config/testinfra | 16:28 |
Tengu | ok | 16:28 |
Tengu | directly in. ok. | 16:28 |
Tengu | good - in "test_mirror.py" | 16:28 |
Tengu | I can add a thing for galaxy. | 16:28 |
clarkb | separately, I believe that tripleo had avoided needing a galaxy cache by having zuul cache the git repos for the various ansible roles instead | 16:29 |
clarkb | it would probably be a good idea to clean that up if it is no longer used | 16:29 |
Tengu | it's failing because there are actual runs of `ansible-galaxy collection install', usually in the molecule tests | 16:29 |
Tengu | but yeah, if we can get the cache, that would be useless. Lemme add a note, I have a call with TripleO CI tomorrow about that matter. | 16:30 |
clarkb | ansible-galaxy can install from disk though iirc | 16:30 |
clarkb | we do it for a role or two that we use in the infrastructure iirc | 16:30 |
clarkb | anyway, I don't really care what installation method you use. I just don't want to keep caching git repos we don't need to cache anymore if that is the case | 16:31 |
Tengu | note added, thanks clarkb for that info! | 16:32 |
Tengu | clarkb: ah, if you're willing to check on some other change requests, care to have a look at https://review.opendev.org/q/topic:unbound%252Fnetworkmanager ? Note, ianw has an open question, so maybe not fit for a merge right now. | 16:40 |
clarkb | why are there two changes? | 16:42 |
clarkb | fwiw I think the correct place to fix this is in the simple init element for disk image builder. Not base-jobs or our infra specific elements | 16:43 |
Tengu | clarkb: well, unbound seems to be configured in another location than the disk image builder - that's why I also edited that "configure-unbound" role. | 16:44 |
clarkb | its just updating the resolvers to ip version specific resolver config in the job | 16:44 |
clarkb | whether or not NM touches it is completely separate | 16:44 |
Tengu | well, it's also configuring unbound actually | 16:45 |
Tengu | what's the point of configuring unbound and setting the resolv.conf content if it's then squashed by NM? | 16:45 |
Tengu | (that's what we're seeing in tripleo jobs - hence those 2 patches) | 16:45 |
clarkb | Tengu: unbound is expected to be fully configured at that point | 16:46 |
Tengu | yes. but it won't be used | 16:46 |
Tengu | because NM will override it pretty fast | 16:46 |
clarkb | But some clouds NAT all ipv4 outbound | 16:46 |
clarkb | NAT + UDP (eg DNS) can be unreliable. What the base-jobs role is doing is checking for an ipv6 capable instance and flipping its forwarding config over to ipv6 resolvers to avoid nat | 16:46 |
clarkb | you should not need to change anything in base-jobs to fix the network manager problem | 16:47 |
Tengu | well.... in that case, there's no use to override the /etc/resolv.conf in the first place. | 16:47 |
Tengu | nor to configure unbound actually. | 16:47 |
clarkb | I don't understand | 16:47 |
clarkb | the point is we are using unbound | 16:47 |
clarkb | some clouds need unbound configured to use ipv6 forwarders | 16:47 |
Tengu | well, it's NOT used during the CI job lifetime. | 16:48 |
Tengu | because NM overrides the /etc/resolv.conf at some point | 16:48 |
clarkb | yes I understand that | 16:48 |
Tengu | (lease refresh, service restart, whatever) | 16:48 |
Tengu | so the configuration I inject there is to ensure this doesn't happen | 16:48 |
clarkb | but unbound is configured in the base image. | 16:48 |
Tengu | I don't touch unbound config | 16:48 |
clarkb | Configuring it in the job is too late | 16:48 |
clarkb | basically updating base-jobs is redundant and confusing | 16:48 |
clarkb | (as this conversation illustrates) | 16:49 |
clarkb | you should only do the configuration in the base image | 16:49 |
Tengu | by not touching the configure-unbound then? | 16:49 |
clarkb | I guess simple-init doesn't assume unbound so maybe project-config is fine. But base-jobs is just redundant | 16:49 |
clarkb | Tengu: when the instance boots it is using unbound for all DNS by default configured to ipv4 resolvers. Early in the test jobs we check if we have ipv6 and flip the resolvers over to ipv6 resolvers to avoid ipv4 NAT. If you fix network manager at that stage it is already too late as something may have broken DNS | 16:50 |
clarkb | to properly fix this problem you need to have it fixed at boot time is what I am saying | 16:50 |
Tengu | clarkb: so for you, only https://review.opendev.org/c/openstack/project-config/+/865433 is valid - the second one affecting ansible role "configure-unbound" is useless - what about jobs not using the nodepool image? (is it a valid case?) | 16:50 |
clarkb | Tengu: no that is not a valid use case. You cannot run a job in opendev outside of our images | 16:51 |
Tengu | ok. so I can, indeed, discard the ansible version of the enforcement. | 16:52 |
clarkb | (you actually can through ansible inventory manipulation but if/when you do that you are on your own) | 16:52 |
clarkb | (and any such inventory manipulation should happen well after base jobs playbooks have run) | 16:52 |
Tengu | ok, abandoned the base-jobs one. | 16:53 |
clarkb | ok ya it is nodepool-base that configures unbound for our images and not an element in dib so project-config is the correct place to add the override | 16:53 |
Tengu | so we keep only the thing in the disk-image-builder | 16:54 |
Tengu | \o/ | 16:54 |
Tengu | half wrong, half right :) | 16:54 |
Tengu | fungi: for the proxy test, I guess the one I want to check is "system-config-run-mirror-x86" job? if green, means my /galaxy/ endpoint is good? | 16:56 |
clarkb | Tengu: side note, I would suggest against pushing every change as a WIP | 16:57 |
clarkb | I did not review the NM stuff last week because it was marked WIP | 16:57 |
Tengu | clarkb: why so? | 16:57 |
clarkb | and because it prevents others from landing your change without your intervention if it is actually ready to go | 16:58 |
clarkb | basically use it when you know the change is not ready | 16:58 |
Tengu | well, sure, that's the actual advantage of WIP: not bother ppl | 16:58 |
clarkb | but if the only question is CI then let it be mergable | 16:58 |
Tengu | that said, I can mark the mirror one as "active", since I was able to add a test | 16:59 |
Tengu | so it's "just" a matter of getting green CI. | 16:59 |
clarkb | basically you should mark it WIP when you know you don't want it to merge. Which is different than asking someone else to help evaluate if it is mergable | 16:59 |
Tengu | 'k, divergent view on the WIP flag, no issue for me :) | 17:00 |
clarkb | Tengu: left a comment on the cache change | 17:02 |
fungi | part of the reason for that workflow is also because a lot of the projects' acls don't grant core reviewers the ability to un-set wip on your change, therefore they still need another round of action from you to un-wip before approving | 17:03 |
Tengu | clarkb: hmm ok. wasn't aware of the need for the name - I'll indeed update to ansible-galaxy | 17:03 |
fungi | (the ability to delegate that is a more recent addition in gerrit than the wip implementation, so projects are only just now starting to update their acls to grant it) | 17:03 |
clarkb | I'm looking at ansible galaxy and I'm fairly certain we will never cache the search/index results due to the url parameters | 17:05 |
clarkb | The downloads themselves are served by s3 so won't be cached either | 17:06 |
clarkb | however, I suspect something like the pypi cache setup would work | 17:07 |
Tengu | fun - while using the "-vvv" params with ansible-galaxy, it doesn't show anything else than the galaxy.ansible.com/download tree | 17:07 |
clarkb | but I'm not sure as there are parameters in the s3 redirect as well | 17:07 |
clarkb | I think docker does this too? so there are examples you can look at at least | 17:07 |
Tengu | oh, ok. I get it. nice 302 hidden. | 17:08 |
Tengu | so now I can use WIP? :) | 17:08 |
clarkb | Tengu: I don't think that is necessary as the change is already -1? | 17:08 |
clarkb | Tengu: but if you are worried about someone merging it with a -1 then sure | 17:08 |
Tengu | -1 was dropped with the new push. | 17:08 |
clarkb | oh you pushed a new patch. | 17:09 |
clarkb | I can -1 again :) | 17:09 |
Tengu | just -W it :) | 17:10 |
Tengu | I'll work on that tomorrow - it's late here (EMEA) | 17:10 |
Tengu | thanks for the help clarkb :) | 17:10 |
*** jpena is now known as jpena|off | 17:54 | |
*** dviroel|afk is now known as dviroel | 18:39 | |
*** rlandy is now known as rlandy|afk | 19:06 | |
*** dviroel is now known as dviroel|afk | 21:00 | |
*** swalladge is now known as Guest277 | 21:19 | |
*** rlandy|afk is now known as rlandy | 21:39 | |
*** dasm is now known as dasm|off | 22:01 | |
*** Guest217 is now known as prometheanfire | 23:12 |
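
Note on the proxy design discussed in the log: the mirrors are reverse proxies that only serve a fixed set of upstream sites, each under its own path prefix, rather than forwarding arbitrary requests. A minimal Python sketch of that mapping follows. It only illustrates the idea and is not the actual Apache configuration; the `/pypi/` entry follows the example fungi gave, while the `/galaxy/` prefix, the `BACKENDS` table, and the `backend_url` helper are assumptions made for illustration.

```python
from typing import Optional

# Path prefix on the mirror vhost -> the single upstream site backing it.
# /pypi/ matches the example in the log; /galaxy/ is the proposed
# (hypothetical) prefix for ansible-galaxy discussed in the conversation.
BACKENDS = {
    "/pypi/": "https://pypi.org/",
    "/galaxy/": "https://galaxy.ansible.com/",
}


def backend_url(path: str) -> Optional[str]:
    """Map a request path on the mirror to the upstream URL it proxies.

    Returns None for any path outside the configured prefixes, which is
    what keeps the mirror from being usable as a general-purpose proxy.
    """
    for prefix, upstream in BACKENDS.items():
        if path.startswith(prefix):
            # Strip the local prefix and append the remainder to the upstream.
            return upstream + path[len(prefix):]
    return None


if __name__ == "__main__":
    print(backend_url("/pypi/simple/bindep/"))
    print(backend_url("/galaxy/download/tripleo-operator-0.9.0.tar.gz"))
    print(backend_url("/some-other-site/index.html"))
```

Because clients must ask for the mirror URL explicitly, setting http(s)_proxy does not help here; the tool has to be pointed at the mirror path itself, which is why the `-s` option of ansible-galaxy came up in the discussion.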