clarkb | spot checking more kna1 job failures I've yet to find anything that looks like a growroot failure | 00:01 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: ensure-zookeeper: add use_tls role var https://review.opendev.org/c/zuul/zuul-jobs/+/776290 | 00:06 |
clarkb | changing tactics a bit and using this query: node_provider:"airship-kna1" AND build_status:"FAILURE" AND message:"The filesystem on /dev/vda1 is now" doesn't really produce any catches either | 00:25 |
*** tosky has quit IRC | 00:30 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: refstack: Edit URL of public RefStackAPI https://review.opendev.org/c/opendev/system-config/+/776292 | 00:37 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] gerrit : use mariadb container https://review.opendev.org/c/opendev/system-config/+/775961 | 00:42 |
corvus | clarkb: completely at random i saw this patch; i have no idea if it at all relates to that earlier behavior that finally got cleared up: https://gerrit-review.googlesource.com/c/gerrit/+/297216 | 00:52 |
clarkb | I'm not sure I fully understand what that commit message is trying to tell me | 00:54 |
clarkb | I guess a sticky vote is one which has carried over from one patchset to the next | 00:55 |
*** LowKey has quit IRC | 00:59 | |
ianw | i've noticed the zuul plugin doesn't match on skipped jobs | 00:59 |
ianw | they don't have a time | 01:00 |
ianw | system-config-build-image-gerrit-3.2 https://zuul.opendev.org/t/openstack/build/None : SKIPPED | 01:00 |
clarkb | ya skipped jobs never start aiui | 01:01 |
ianw | i'll put that on the todo list | 01:01 |
ianw | unless anyone feels like javascript regex hacking :) | 01:01 |
*** mlavalle has quit IRC | 01:32 | |
*** auristor has quit IRC | 02:18 | |
*** auristor has joined #opendev | 02:26 | |
*** hemanth_n has joined #opendev | 02:53 | |
*** dviroel has quit IRC | 02:55 | |
*** dmsimard8 has joined #opendev | 02:56 | |
*** dmsimard has quit IRC | 02:59 | |
*** lourot has quit IRC | 02:59 | |
*** priteau has quit IRC | 02:59 | |
*** gouthamr has quit IRC | 02:59 | |
*** clayg has quit IRC | 02:59 | |
*** dmsimard8 is now known as dmsimard | 02:59 | |
*** ianw has quit IRC | 02:59 | |
*** clayg has joined #opendev | 03:00 | |
*** ianw has joined #opendev | 03:00 | |
*** lourot has joined #opendev | 03:05 | |
*** dirk has quit IRC | 03:25 | |
*** dirk has joined #opendev | 03:25 | |
*** zoharm1 has joined #opendev | 03:34 | |
*** whoami-rajat__ has joined #opendev | 04:38 | |
*** ykarel has joined #opendev | 05:00 | |
*** ysandeep|away is now known as ysandeep|ruck | 05:04 | |
*** DSpider has joined #opendev | 05:20 | |
*** LowKey has joined #opendev | 05:53 | |
*** gouthamr has joined #opendev | 06:01 | |
*** marios has joined #opendev | 06:18 | |
*** LowKey has quit IRC | 06:37 | |
*** LowKey has joined #opendev | 06:37 | |
openstackgerrit | Dinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 07:01 |
openstackgerrit | Dinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 07:03 |
openstackgerrit | Dinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 07:04 |
*** slaweq has joined #opendev | 07:13 | |
*** ralonsoh has joined #opendev | 07:21 | |
*** DSpider has quit IRC | 07:31 | |
*** jpena|off is now known as jpena | 07:36 | |
*** slaweq has quit IRC | 07:39 | |
*** slaweq has joined #opendev | 07:42 | |
*** eolivare has joined #opendev | 07:44 | |
*** fressi has joined #opendev | 07:45 | |
*** brinzhang has quit IRC | 07:45 | |
*** brinzhang has joined #opendev | 07:45 | |
*** fressi has quit IRC | 07:52 | |
*** fressi has joined #opendev | 08:06 | |
*** rpittau|afk is now known as rpittau | 08:14 | |
*** hashar has joined #opendev | 08:21 | |
*** sboyron has joined #opendev | 08:24 | |
*** andrewbonney has joined #opendev | 08:26 | |
*** ysandeep|ruck is now known as ysandeep|lunch | 08:26 | |
*** roman_g has joined #opendev | 08:27 | |
*** klonn has joined #opendev | 08:42 | |
*** tosky has joined #opendev | 08:45 | |
openstackgerrit | Dinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 08:51 |
*** brinzhang has quit IRC | 09:00 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 09:05 |
*** ykarel_ has joined #opendev | 09:05 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 09:07 |
*** DSpider has joined #opendev | 09:08 | |
*** ykarel has quit IRC | 09:08 | |
openstackgerrit | Merged openstack/project-config master: Add ansible-role-pki to zuul https://review.opendev.org/c/openstack/project-config/+/773387 | 09:08 |
*** ysandeep|lunch is now known as ysandeep|ruck | 09:15 | |
*** dtantsur|afk is now known as dtantsur | 09:31 | |
*** ykarel_ is now known as ykarel | 09:49 | |
*** roman_g has quit IRC | 10:46 | |
*** JayF has quit IRC | 11:03 | |
*** JayF has joined #opendev | 11:07 | |
*** dviroel has joined #opendev | 11:21 | |
openstackgerrit | Merged zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 11:31 |
*** priteau has joined #opendev | 11:33 | |
frickler | lest anyone miss this, 19:15 UTC today: https://mars.nasa.gov/mars2020/timeline/landing/watch-online/ | 11:59 |
*** hemanth_n has quit IRC | 12:16 | |
*** jpena is now known as jpena|lunch | 12:30 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: refstack: Edit URL of public RefStackAPI https://review.opendev.org/c/opendev/system-config/+/776292 | 12:47 |
*** klonn has quit IRC | 12:57 | |
*** jpena|lunch is now known as jpena | 13:31 | |
*** klonn has joined #opendev | 13:46 | |
*** iurygregory has quit IRC | 13:47 | |
*** iurygregory has joined #opendev | 13:50 | |
*** mlavalle has joined #opendev | 13:55 | |
*** ysandeep|ruck is now known as ysandeep|afk | 14:11 | |
*** ykarel_ has joined #opendev | 14:14 | |
*** ykarel has quit IRC | 14:17 | |
*** marios is now known as marios|call | 14:18 | |
fungi | frickler: i just hope we're able to slip it through while the martians aren't watching | 14:21 |
*** ykarel_ has quit IRC | 14:27 | |
*** whoami-rajat__ is now known as whoami-rajat | 14:48 | |
*** tosky has quit IRC | 14:51 | |
*** tosky_ has joined #opendev | 14:51 | |
*** tosky_ is now known as tosky | 14:51 | |
*** fressi has quit IRC | 14:53 | |
*** fressi has joined #opendev | 14:55 | |
*** ysandeep|afk is now known as ysandeep|ruck | 15:02 | |
*** fressi has quit IRC | 15:12 | |
*** fressi has joined #opendev | 15:13 | |
*** hashar has quit IRC | 15:18 | |
*** marios|call is now known as marios | 15:28 | |
*** zoharm1 has quit IRC | 15:29 | |
*** fressi has quit IRC | 15:33 | |
*** ysandeep|ruck is now known as ysandeep|away | 15:35 | |
*** roman_g has joined #opendev | 15:36 | |
*** ykarel_ has joined #opendev | 15:48 | |
roman_g | Good morning. Is there a way to get job logs if it timed out? For example, this one: https://zuul.opendev.org/t/openstack/build/16329de0aeb64a208542d8f6a3ccc15b | 16:07 |
roman_g | 3 lines of logs: | 16:07 |
roman_g | 2021-02-17 05:15:42.776228 | TASK [make] | 16:07 |
roman_g | 2021-02-17 07:13:51.873764 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/airship/images/playbooks/airship-images-build.yaml@master] | 16:07 |
roman_g | 2021-02-17 07:13:51.875093 | POST-RUN START: [untrusted : opendev.org/airship/images/playbooks/airship-collect-logs.yaml@master] | 16:07 |
roman_g | I'd like to see logs which are from TASK [make]. | 16:08 |
roman_g | It hung for about 2 hours, and then timed out. Need to find out how far it got and where it hung. | 16:09 |
*** klonn has quit IRC | 16:10 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Try to make gatling-git work with our test gerrit https://review.opendev.org/c/opendev/system-config/+/775883 | 16:12 |
clarkb | roman_g: if your job isn't writing logs that can be collected while it runs that task, I think your best bet is the ansible record json file | 16:13 |
clarkb | roman_g: the zuul job console is rendered for that but it won't render a killed/timed out playbook. However, the json should still have an incomplete record there showing you roughly how far along it got | 16:13 |
clarkb | oh huh nevermind it seems not to. I thought I had used this method before | 16:13 |
clarkb | roman_g: you can update the job such that make outputs to stdout/stderr and that will end up in the console log. Or have it write to a log file that is collected | 16:15 |
roman_g | clarkb Yes, that's what I have been thinking of. Thank you. | 16:15 |
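A minimal sketch of what clarkb suggests, assuming the job invokes make from a shell task and that the post-run playbook collects a logs directory (the paths here are hypothetical):

```bash
# Hypothetical: stream make output to a file the log-collection playbook
# gathers, so partial output survives a RESULT_TIMED_OUT kill.
mkdir -p "$HOME/logs"
make 2>&1 | tee "$HOME/logs/make.log"
```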
jrosser | hi, is it ok to add more things to the CI mirrors, or are there storage / other things that need considering? | 16:19 |
clarkb | jrosser: storage is definitely something to factor in. https://grafana.opendev.org/d/Zic1IwPGk/afs?orgId=1 gives you an overview of what current storage looks like | 16:20 |
jrosser | erlang-solutions got themselves on my "things that break" list for improving osa CI reliability | 16:20 |
clarkb | afs01.dfw is getting a bit close to full | 16:20 |
*** ykarel_ is now known as ykarel | 16:21 | |
fungi | jrosser: also a lot of what is "served" from our mirrors is really just a caching http proxy in reality, so that's often not hard to add (depending on the size of the files, and how effectively the application protocol/api can be proxied) | 16:21 |
jrosser | the trouble with erlang-solutions is that they repeatedly release new packages and break their repo in the process of doing that | 16:22 |
jrosser | so if reprepro were able to mitigate that it would be a small reliability win for osa | 16:22 |
jrosser | but obv. to balance against the storage cost of doing that | 16:22 |
fungi | oh, is it a deb package repository? | 16:22 |
clarkb | and I guess using the distro provided erlange packages we already mirror is a problem? | 16:23 |
jrosser | yes, and they break the checksums relatively often | 16:23 |
jrosser | for focal we use the distro package, but not for bionic down all the stable branches | 16:23 |
jrosser | mainly in order to keep the same exact versions on all the OS | 16:23 |
fungi | jrosser: rough bar napkin calculation on how much data at rest you're talking about? | 16:24 |
fungi | (order of magnitude is fine) | 16:24 |
jrosser | i have a local mirror here as it happens, 15G for packages.erlang-solutions.com | 16:26 |
fungi | so not tiny, but not huge | 16:26 |
jrosser | and looks like i have bionic and focal in it | 16:27 |
clarkb | sort of related, I think we can remove fedora-31 soonish if we remove dib's job for it | 16:27 |
clarkb | then we can clean up that portion of the fedora mirror | 16:27 |
jrosser | i thought i'd ask because it's one of the things that has failed jobs for us since the ML post about CI usage | 16:28 |
*** ykarel has quit IRC | 16:29 | |
clarkb | one other concern with the one offs like that (ceph is another good example) is they don't tend to be updated by anyone once put in place | 16:30 |
clarkb | we keep things like the distro mirrors up to date as well as we can, but for something like say ceph the problem becomes that some versions are built for some distro releases and some are not, figuring out the config is significantly more effort, and the folks who put stuff in place in the first place tend not to update it later | 16:30 |
jrosser | i did patches this week to use the ceph mirror | 16:31 |
jrosser | octopus was only recently added | 16:31 |
clarkb | right, I'm not saying they don't get updated but they tend to significantly lag | 16:32 |
clarkb | (I'm just calling this out as a risk, because we are unlikely to keep up with say ceph releases for various distros or erlang releases for various distros) | 16:32 |
clarkb | 15GB is probably fine in the current mirrors, but we should also look at trimming things due to afs01.dfw's available disk space | 16:34 |
clarkb | my rough math says we've got about 160GB of headroom currently | 16:35 |
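For reference, a hedged example of checking that headroom directly, assuming access to the OpenAFS tools; the fileserver FQDN below is a guess based on the short name used above:

```bash
# vos partinfo lists free and total space (in KB) for each /vicepX
# partition on an AFS fileserver; the hostname here is an assumption.
vos partinfo afs01.dfw.openstack.org
```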
*** LowKey has quit IRC | 16:35 | |
fungi | tend to significantly lag on additions, but also tend to lag heavily on cleanup | 16:36 |
*** LowKey has joined #opendev | 16:36 | |
fungi | in many cases we don't know whether some of the suites being copied target distro releases we've otherwise dropped, due to how projects name them | 16:36 |
jrosser | i really don't mind either way, it falls into the category of something i understand how to fix but admittedly the size of issue it's addressing is really quite small | 16:37 |
clarkb | another approach or possibly something to do as well, is to reach out to them and ask if they can publish consistent repos | 16:37 |
clarkb | its not super difficult, but does require some care | 16:37 |
jrosser | https://twitter.com/thejrosser/status/1222226703105298433 :) | 16:38 |
fungi | add packages, replace checksums/signatures atomically, pause for a while, then delete unreferenced packages | 16:38 |
fungi | is the generally safe order of operations | 16:38 |
fungi | also never replace package files without up-revving the versions in their filenames | 16:39 |
fungi | overwriting package files is bad, bad, bad | 16:39 |
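A hedged outline of that ordering for a hand-rolled apt repository; the layout and paths are illustrative, and a real repo would also re-sign Release/InRelease in the same swap:

```bash
cd /srv/repo
# 1. add packages under new filenames only; never overwrite existing debs
cp /incoming/*.deb pool/main/
# 2. regenerate the index into a temp file, then swap it in atomically
#    (mv on the same filesystem), together with the re-signed Release
apt-ftparchive packages pool > dists/stable/main/binary-amd64/Packages.new
mv dists/stable/main/binary-amd64/Packages.new \
   dists/stable/main/binary-amd64/Packages
# 3. pause long enough for client and proxy caches to expire, and only
#    then delete packages no longer referenced by any index
```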
jrosser | i guess they have some slightly wonky means of updating their repo | 16:39 |
clarkb | fungi: found a grenade failure in kna1, but it failed because it is trying to install a package that doesn't exist (I assume it's a stale/bitrotted stable branch) | 16:44 |
clarkb | and its growroot log looks fine | 16:45 |
fungi | best part about heisenbugs is as long as you keep measuring them, they'll stay fixed | 16:45 |
clarkb | fungi: ya I'm beginning to wonder if journalctl -u is blocking long enough for the job to finish a growroot before doing real work | 16:46 |
fungi | entirely possible, if it's truly just a race | 16:46 |
clarkb | the journalctl -u growroot timestamps show growroot takes 2 or 3 seconds and completes before we ever manage to start running ansible | 16:46 |
clarkb | however the ansible task to run journalctl -u growroot takes ~30 seconds? | 16:47 |
clarkb | though ansible itself reports that task took less than a second | 16:48 |
clarkb | not super confident that the bug has been fixed by looking at it, but certainly some odd enough timing there that I can't rule it out | 16:48 |
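A rough sketch of the comparison being made here, assuming the unit is literally named growroot and the node's clock is sane by the time the journal is read:

```bash
# When did the growroot unit run, relative to boot?
journalctl -u growroot --no-pager -o short-monotonic | tail -n 2
# Boot time, to line up against the first ansible task timestamp
uptime -s
```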
*** fressi has joined #opendev | 16:48 | |
fungi | could also be time weirdness from ntp still synchronizing, if we're catching nodes that early in their lifecycles | 16:50 |
*** jpena is now known as jpena|off | 16:51 | |
*** roman_g has quit IRC | 17:03 | |
*** roman_g has joined #opendev | 17:03 | |
*** roman_g has quit IRC | 17:03 | |
*** roman_g has joined #opendev | 17:04 | |
*** roman_g has quit IRC | 17:04 | |
*** marios is now known as marios|out | 17:05 | |
*** roman_g has joined #opendev | 17:05 | |
*** roman_g has quit IRC | 17:05 | |
*** roman_g has joined #opendev | 17:06 | |
*** roman_g has quit IRC | 17:06 | |
*** roman_g has joined #opendev | 17:06 | |
*** roman_g has quit IRC | 17:07 | |
*** hashar has joined #opendev | 17:16 | |
openstackgerrit | Clark Boylan proposed openstack/diskimage-builder master: Remove fedora-31 testing https://review.opendev.org/c/openstack/diskimage-builder/+/776503 | 17:18 |
openstackgerrit | Clark Boylan proposed openstack/diskimage-builder master: Add fedora 33 testing https://review.opendev.org/c/openstack/diskimage-builder/+/776504 | 17:18 |
*** marios|out has quit IRC | 17:19 | |
clarkb | I think we don't actually need to land ^ that to clean up the fedora-31 image | 17:19 |
clarkb | since the job runs on some other platform and builds a fedora-31 image. Still a reasonable cleanup I think | 17:19 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Try to make gatling-git work with our test gerrit https://review.opendev.org/c/opendev/system-config/+/775883 | 17:23 |
clarkb | oh hrm looking at nodepool we don't have fedora-33 yet there | 17:25 |
clarkb | however, I don't think we need 33 before removing 31 | 17:25 |
fungi | i think there were some challenges getting f33 built, maybe still unresolved | 17:25 |
fungi | i don't remember specifics, unfortunately | 17:26 |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Stop launch fedora-31 nodes nodepool https://review.opendev.org/c/openstack/project-config/+/776510 | 17:31 |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Remove fedora-31 disk image config https://review.opendev.org/c/openstack/project-config/+/776511 | 17:31 |
clarkb | we should probably double check with ianw that this doesn't affect any existing plans with dib etc but I think the cleanup looks like that to start | 17:32 |
clarkb | fungi: we do mirror fedora 33 at least | 17:42 |
*** rpittau is now known as rpittau|afk | 17:47 | |
*** dtantsur is now known as dtantsur|afk | 18:00 | |
*** ralonsoh has quit IRC | 18:15 | |
frickler | meh, no live stream, everything shown with 7 mins delay, /me feels betrayed. also seems the actual landing is still some hours away, does anyone have an exact timetable? | 18:18 |
clarkb | frickler: 12:55 pm pacific time which is 20:55 UTC is the roughly expected landing time aiui | 18:19 |
clarkb | at 19:15 UTC they will start live streams (looks like on twitch and youtube and probably nasa tv) | 18:24 |
clarkb | not sure if those will be delayed though | 18:24 |
mnaser | realistically -- what are the odds of opendev running a container registry? | 18:25 |
mnaser | if someone was to do the work | 18:25 |
mnaser | or maybe just even provided it as a community service, we have a registry we can make available. i don't even know why dockerhub failures are hitting at this point and they are getting incredibly frustrating | 18:26 |
frickler | mnaser: hah, before I answer that question, did you see we talked about your IPv6 issues in the last meeting? | 18:27 |
mnaser | frickler: i have not | 18:28 |
frickler | mnaser: and I think if vexxhost provided a registry that we could use, I wouldn't object to that | 18:28 |
clarkb | the failures are largely due to docker hub's new rate limiting and how it subverts our existing caching | 18:28 |
frickler | mnaser: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-16-19.01.log.html#l-103 mainly as a reminder to keep nagging you about the IRR setup | 18:28 |
clarkb | what we talked about at the end of last year is that we'd be happy to run a better caching system that isn't subverted by the rate limiting | 18:29 |
mnaser | frickler: yeah, IRR's aren't exactly trivial to get going, but we're working on that alongside with a big network overhaul so i'm hoping to have that done at the same time (alongside adding rpki for bgp, etc) | 18:29 |
clarkb | I don't know that we want to be a registry of record though | 18:29 |
clarkb | (side note quay.io supposedly does not rate limit and we do proxy cache them too) | 18:29 |
frickler | mnaser: rpki would have been my next topic, cool. do you at least have a rough timeframe for that? then I could stay quiet about it for some time ;) | 18:30 |
mnaser | frickler: Q2 2021? it's a big overhaul and it's already in place | 18:30 |
fungi | mnaser: also keep in mind that at least some providers are still likely to be following longstanding filtering guidance which says to ignore any prefixes longer than /23 inside 2600::/12 | 18:30 |
fungi | er, longer than /32 | 18:31 |
clarkb | specifically dockerhub moved from rate limiting blob requests (which we cache) to rate limiting manifest requests which we do not cache. The manifest requests are not cached because you have to be authenticated to fetch them (even anonymously) and apache won't cache requests with authorization headers even if the cache-control header on the response says it is cacheable | 18:31 |
mnaser | i think all of our announcements are /48s | 18:31 |
* fungi has dyslexic fingers | 18:31 | |
fungi | yeah, /48 is longer than /32 | 18:31 |
mnaser | clarkb: that is annoying. so what is the infra-suggested-workaround for this? | 18:31 |
mnaser | use quay.io ? | 18:31 |
*** hashar has quit IRC | 18:32 | |
clarkb | Ideally we'd find some tool (squid or docker-registry if it can be convinced to be space bound and prunable, etc) and update the mirrors to run that on the docker hub proxy ports | 18:32 |
frickler | istr that someone said quay.io wasn't super reliable, too | 18:32 |
mnaser | yeah i remember seeing some quay.io wasn't reliable talk too | 18:32 |
clarkb | ya they have had a couple semi recent outages that were annoying | 18:32 |
fungi | basically many years ago arin said it's only allocating /19 through /32 sized blocks from 2600::/12 so some providers assume anything longer than that (like a /48) is invalid and filter it at their peers | 18:32 |
mnaser | fungi: welp that's annoying, arin gives one allocation only and it's not like i want /32 per site | 18:33 |
mnaser | i am just trying to see what osh can do to drop our failure rate with rate limits | 18:33 |
clarkb | anyway my memory from our meetings back in ~december was that if people were interested in making this better our suggestion was a better caching proxy whatever that might look like | 18:33 |
frickler | mnaser: fungi: I think that is mostly no longer applied. since there is no IPv6 PI (at least in RIPE), people are using /48s out of PA space as a replacement | 18:33 |
mnaser | in this case, they are images that are owned by osh | 18:33 |
*** andrewbonney has quit IRC | 18:34 | |
frickler | so in my understanding, announcing /48 is o.k., it just needs a matching route object | 18:34 |
clarkb | my hunch is that squid would work well | 18:34 |
fungi | the way we worked around it at $oldjob was to announce our aggregate /32 from any border along with our longer prefixes, and then by the time packets reached backbone providers who weren't filtering those routes they'd forward based on longest (most specific) prefix | 18:34 |
clarkb | it is frustrating because we were already doing our best to be good docker citizens and caching the blobs which are the bulk of the data transfer. But according to dockerhub people found rate limits on the blob requests super confusing because when you pull an image the number of blobs is variable so it is hard to say we will be under limits with any set of requests | 18:35 |
clarkb | so they updated it to be a thing that tools don't want to cache | 18:35 |
clarkb | and represents a tiny fraction of the data transfer | 18:35 |
fungi | usually packets leaving sites filtering longer prefixes only went through at most one additional peering point before reaching a network which had more specific routes anyway | 18:36 |
clarkb | woot that last gatling-git ps has only a single failure | 18:37 |
clarkb | I think I finally figured out how it wants to function | 18:37 |
fungi | in reality we had an ipv6 /29 allocation from arin and assigned a /32 per site, but eventually ended up wanting to subdivide the /32 because that was faster than going back to arin to ask to have our /29 expanded to a /28 | 18:40 |
clarkb | and with the pushes working the better load avg on 3.3 goes away but the larger memory use remains | 18:40 |
fungi | once we grew past 8 different multi-homed data centers | 18:40 |
clarkb | fungi: I based the gatling work on https://review.opendev.org/c/opendev/system-config/+/775051/ so that we could compare system level metrics. Any chance you could take a look at reviewing that? | 18:40 |
fungi | sure | 18:41 |
clarkb | mnaser: just thinking out loud here: other options could be to bypass our mirror so that jobs are more likely to fetch from unique IPs (not true in ipv6 only clouds because docker hub is ipv4 only so you funnel through NAT). Docker hub offers an "open source images not rate limited for open source projects that agree to our terms" thing. Unfortunately the terms are weird enough that I'm not super comfortable with them (in particular they seem to try to say you can only ever use docker tools to interact with those images) | 18:42 |
fungi | clarkb: also that's specifically about access to images you publish, not your access to images other orgs publish | 18:43 |
clarkb | correct | 18:43 |
clarkb | which includes the base images I think | 18:43 |
mnaser | so something else that comes to my mind is uh | 18:44 |
fungi | so we could maybe get the images we publish to dockerhub flagged that way as long as we renounce all software which competes with docker and run advertisements on their behalf, but our dependencies would not be covered | 18:44 |
mnaser | we could grab openstackci/ and pay for a pro plan, i'm happy to pay the $7 or whatever, and that drops all pull limits | 18:44 |
clarkb | mnaser: the trouble with that is you'd need to authenticate as that user | 18:44 |
fungi | mnaser: and reupload copies of all our dependencies there? | 18:44 |
fungi | and yeah, we'd need special proxies which know how to auth to dockerhub | 18:45 |
mnaser | i think having a pro account drops the pull limits for your images | 18:45 |
clarkb | fungi: no the way that works is you authenticate as the user and then all requests by the user are not limited (or less limited) | 18:45 |
mnaser | https://www.docker.com/pricing | 18:45 |
mnaser | >Not Applicable to Pro Accounts | 18:45 |
mnaser | under data transfer / anonymous users | 18:45 |
clarkb | it's doable, but not there yet. And if we're redoing the caching system anyway to make that work I'm more inclined to make it work without a special account | 18:46 |
fungi | yep, either we'd need to trust our docker credentials to all jobs which might use them, or make an authenticating proxy | 18:46 |
clarkb | but I've also not had time to figure it out (and honestly their complete subversion of the thing I helped make that worked has made motivation low) | 18:47 |
clarkb | but I guess that's the risk when you're the only person on the planet trying to be nice to your upstream services :( | 18:47 |
fungi | if we make an authenticating proxy, that's probably about the same amount of work as making a proper caching proxy registry | 18:48 |
clarkb | fungi: we'd also need to authenticate jobs to that proxy somehow | 18:48 |
mnaser | dumb question | 18:48 |
clarkb | or otherwise prevent our mirrors from becoming a docker hub rate limit back door | 18:48 |
mnaser | have we tried to talk to | 18:49 |
mnaser | folks @ dockerhub? | 18:49 |
fungi | clarkb: oh, great point. we'd be quickly open to abuse | 18:49 |
clarkb | mnaser: yes they sent us a large legal contract like document we had to agree to | 18:49 |
clarkb | mnaser: with terms like "you can only use docker tools with these images" | 18:49 |
fungi | which said things like you have to document that all your projects require the use of docker inc. software, and you have to participate in periodic promotional press releases and run ads for docker | 18:50 |
clarkb | mnaser: we asked jbryce to take a look but it was a busy end of year last year. It is possible it would be worth looking at it again but I'm not super comfortable with the terms they sent us | 18:50 |
mnaser | i mean those are all things we could likely negotiate | 18:50 |
mnaser | i'm not massively against maybe adding dockerhub to our infrastructure donors if they technically become one | 18:51 |
fungi | the interested parties seem likely to be far more efficient at developing a different solution than negotiating legal contracts | 18:51 |
clarkb | I'm not against it either which is why we asked, but the response was not very friendly | 18:51 |
clarkb | but also I think if others want to pick up that avenue thats fine | 18:52 |
mnaser | fungi: right, i just worry we may be signing ourselves up for a little more than we can handle for now :> | 18:52 |
mnaser | but- i don't know anything about that :) | 18:52 |
fungi | mnaser: agreed, that includes trying to tilt at docker's windmill in my opinion | 18:52 |
clarkb | but we did go through a bunch of this several months ago and where we ended up was that likely the best option for us is a proxy cache that works with the new rate limits | 18:52 |
clarkb | and basically asked for help (at the time it was tripleo asking about it, I think they changed stuff to rebuild images from base images which aren't rate limited) | 18:53 |
mnaser | hmm but i guess that sounds like tripleo is building images on each run | 18:53 |
clarkb | mnaser: yes | 18:53 |
clarkb | mnaser: they do a central image build then other jobs hang off of that | 18:53 |
fungi | but as you point out, because our nodes don't (by design, for trust reasons) authenticate to stuff, and we don't have control over what ip addresses they'll get from one minute to the next, we likely can't strictly prohibit random users from pointing their own systems at our proxies to get around docker's rate limiting, so we're opening ourselves up for potential abuse | 18:54 |
clarkb | fungi: yup that is why I'm more comfortable with caching what anonymous users can already do | 18:54 |
clarkb | but we could also probably set up some sort of authentication to an authenticated proxy that was job specific | 18:55 |
clarkb | people have had thoughts about this and setting up squid as an HTTP_PROXY | 18:56 |
clarkb | I think the underyling issues are very similar | 18:56 |
fungi | well, i'm saying, if we cache what anonymous users can already do, and that makes users of that cache less likely to hit dockerhub rate limits, there's nothing stopping joe's garage ci from pointing at our cache instead of dockerhub to get around rate limiting too | 18:56 |
clarkb | oh sure. But we've always done that since we first added the caching proxy | 18:56 |
clarkb | it isn't exposing anything that requires special privileges | 18:57 |
clarkb | so the badness factor is much lower | 18:57 |
*** klonn has joined #opendev | 18:57 | |
fungi | another option would be to do something like our wheel builder cache, and just (proactively) cache things listed in, or which are dependencies of things listed in, some list maintained in a git repo. though that means some infrastructure to populate the cache and also someone to review additions for the list | 18:58 |
fungi | we could probably use off the shelf registry software to do that | 18:58 |
fungi | we'd also have lag/races between publication of our own new images and their inclusion into the cache | 18:59 |
fungi | similar to what we see with wheels today | 18:59 |
clarkb | oh ya the other thought was updating zuul-registry to do something like this | 18:59 |
clarkb | since other zuul users may need similar too | 18:59 |
clarkb | fungi: one struggle with existing off the shelf options is you can't prune them while they are running | 18:59 |
clarkb | I think that is where the zuul-registry ideas came in. Since it can prune | 18:59 |
fungi | right, the partial abuse mitigation if we have reusable software to solve the problem is that potential abusers may see running their own zuul-registry proxy as a cheaper alternative to hitting an external one which might go offline or change names periodically | 19:00 |
clarkb | mnaser: I forwarded the docker response to you so you can see the rules they laid out | 19:00 |
clarkb | separately, I wonder if apache would ever want to update mod proxy and mod cache to cache things that cache-control headers say are cacheable | 19:01 |
fungi | good point. i've patched and recompiled squid for similar reasons in the past | 19:02 |
fungi | "yes i know the relevant rfcs say you should never cache these things, but i want to anyway, please don't stop me" | 19:02 |
clarkb | fungi: well in this case the rfcs specifically say you can cache these things | 19:03 |
fungi | oh, right, i misread what you said as "uncacheable" | 19:03 |
clarkb | cache-control: public means "The response may be stored by any cache, even if the response is normally non-cacheable." | 19:04 |
fungi | these are flagged as cacheable, but mod_proxy doesn't want to because they're authenticated requests | 19:04 |
clarkb | docker hub does set cache-control: public on the public image manifests so an rfc respecting cache should be able to cache them | 19:04 |
clarkb | yup | 19:04 |
fungi | and the challenge is the dockerhub protocol requires "anonymous" access be authenticated? | 19:04 |
clarkb | correct | 19:05 |
clarkb | you request a token for an anonymous user then use that to fetch the manifests | 19:05 |
clarkb | and then you fetch the sha256sum addressed blobs out of their cdn (and I don't think this is authenticated, though they may give you time bound redirects for them if pulling private info) | 19:06 |
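A sketch of that anonymous flow against Docker Hub's public endpoints; library/alpine is just an example image:

```bash
# 1. request an anonymous bearer token scoped to one repository
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/alpine:pull" \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["token"])')
# 2. fetch the manifest; this is the request that is now rate limited,
#    and the Authorization header is why apache declines to cache it
#    even though the response may carry cache-control: public
curl -sI -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  "https://registry-1.docker.io/v2/library/alpine/manifests/latest" \
  | grep -i -e ratelimit -e cache-control
```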
frickler | mnaser: well it wasn't much talking to dockerhub actually, I received that contract proposal after filling out their online form. if you want to continue talking to them, I can forward that email to you | 19:06 |
clarkb | frickler: I forwarded it already :) | 19:07 |
frickler | clarkb: ah, great | 19:07 |
clarkb | frickler: mnaser: fwiw I had asked jbryce to take a look given some of the terms were a bit odd like the docker tools requirement. May be a good idea to try and run those terms by him again before reaching out further (but I expect this week is bad for that given the texas weather) | 19:07 |
*** eolivare has quit IRC | 19:19 | |
mnaser | yeah -- i suspect austin/tx folks have got a few other things on their plate | 19:24 |
mnaser | :( | 19:24 |
fungi | like rebooting their power grid | 19:41 |
clarkb | I followed up with the gerrit gatling-git thread on some of the stuff I discovered, but my latest patchset seems to be working | 19:45 |
clarkb | clones, pulls, and pushes are all happening \o/ | 19:45 |
fungi | nice! | 19:46 |
fungi | i'm about to start reviewing it | 19:46 |
fungi | er, start reviewing the dstat change i mean | 19:46 |
clarkb | ya that one is far more important I think | 19:47 |
clarkb | the gatling-git one is still quite a bit hacky. Half hoping there will be a response to my email upstream saying "oh you can do this properly this way" :) | 19:47 |
fungi | and then your change will be 50% smaller | 19:48 |
clarkb | the next step is for me to fetch the gatling-git report and return it to our logging system | 19:48 |
fungi | would gatling-git be useful for torturing gitea too, or is it gerrit-specific? | 19:48 |
clarkb | fungi: I think it could be used for gitea too | 19:49 |
clarkb | it has gerrit specific things in it like change id generation support but it uses jgit and should be able to talk to any git server | 19:49 |
fungi | right, and for gitea we're probably more interested in how well it can serve large numbers of requests (though also handling pushes since that's how gerrit writes to it) | 19:52 |
clarkb | ya, I haven't sorted out ssh testing with gatling-git yet either | 19:53 |
clarkb | adding that too is probably a good next step as well | 19:53 |
fungi | oh, right, most of our gerrit users are doing push via ssh, not https | 19:54 |
clarkb | any idea if host_copy_output can use a glob for the source? | 19:56 |
clarkb | oh actually nevermind I need to copy out of the container first and can use a glob for that to a static location then host_copy_output can move things to recordable locations | 19:57 |
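A hedged sketch of that copy-then-collect idea; the container name and paths are purely illustrative:

```bash
# docker cp does not expand globs itself, but a shell inside the
# container can, staging matching files into one directory first
docker exec gatling sh -c 'mkdir -p /tmp/reports && cp /workspace/results/*.html /tmp/reports/'
docker cp gatling:/tmp/reports /home/zuul/gatling-reports
```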
openstackgerrit | Clark Boylan proposed opendev/system-config master: Try to make gatling-git work with our test gerrit https://review.opendev.org/c/opendev/system-config/+/775883 | 20:07 |
clarkb | that might be what we need for the reports | 20:07 |
*** slaweq has quit IRC | 20:21 | |
corvus | clarkb: https://www.youtube.com/watch?v=kPrbJ63qUc4 is interesting | 20:33 |
clarkb | I'm watching the nasa twitch feed, seems the clean feed is a bit ahead | 20:38 |
*** tbarron|out has joined #opendev | 20:38 | |
corvus | yeah, i've got both up; clean feed is less gabbing and more callouts | 20:39 |
fungi | i'm watching the nasa.tv feed, it's saying 5 minutes from entry | 20:44 |
fungi | sounds like the twitch feed is ~30 seconds faster | 20:50 |
*** mgagne has quit IRC | 21:08 | |
*** sboyron has quit IRC | 21:10 | |
*** whoami-rajat has quit IRC | 21:21 | |
*** DSpider has quit IRC | 21:22 | |
clarkb | that was cool | 21:26 |
ianw | grafana 7.4.2 released now with a fix to our reported issue ... https://github.com/grafana/grafana/pull/31263/commits/bf00580f9b63290cdef436bdd46d560f90e27a3e | 22:51 |
ianw | we've blocked the endpoint anyway | 22:51 |
clarkb | ianw: and we pin the image in our container right? | 22:51 |
clarkb | so we could bump the pin then be double covered? | 22:51 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: grafana: update to 7.4.2 https://review.opendev.org/c/opendev/system-config/+/776553 | 22:55 |
ianw | clarkb: ^ great minds think alike :) | 22:55 |
ianw | https://review.opendev.org/c/opendev/system-config/+/775553 is an easy one too that just creates screenshots of grafana, like we do for gerrit | 22:57 |
ianw | opportunities to pull some of these bits out into more library functions i imagine if we keep doing this | 22:57 |
clarkb | ianw: one thing I was thinking about recently is if we should try to do a playbooks/tests/ dir and push all those testing playbooks down a level | 22:58 |
clarkb | but it isn't clear to me if that would break host vars and stuff due to relative paths | 22:58 |
ianw | fungi / clarkb : https://review.opendev.org/c/opendev/system-config/+/766630 and https://review.opendev.org/c/opendev/system-config/+/775733 are a couple of backup changes if you have some time | 22:59 |
ianw | the big one is 766630 that removes bup; it's worth checking the wiki backups in particular as i think that's the only one that nobody else has validated | 22:59 |
clarkb | ya I can do reviews. Mostly been trying to get gatling-git further along and help nodepool zk tls changes | 22:59 |
clarkb | I'll have to defer to fungi on the wiki stuff | 23:00 |
ianw | thanks | 23:01 |
fungi | yeah, looking, thanks for the reminder | 23:01 |
ianw | clarkb / kopecmartin : i'll put a hold on https://review.opendev.org/c/opendev/system-config/+/776292 and we can re-run the job -- i have no idea what's failing and i think live debugging will be the best way forward | 23:02 |
ianw | infra-root: another one to consider is https://review.opendev.org/c/opendev/system-config/+/771445 which expands the comment area; it's been sitting for a while. we should either yes or no it i guess. i'm about +1.5 and rounding up, poking through the shadow dom feels icky. but it's also what we do in CI to take screenshots, so ... | 23:07 |
clarkb | ianw: for BORG_UNDER_CRON we set that globally for all crontab entries right? I guess it doesn't really matter too much. Can we set it on the command instead to apply it only to the specific crontab entry though? | 23:10 |
*** klonn has quit IRC | 23:11 | |
ianw | umm, yes i guess so | 23:17 |
clarkb | (I'm mostly thinking out loud here I don't know that this sort of thing matters all that much) | 23:19 |
ianw | i also wanted to do a bit of a grep of the output to send some stats, similar to what we do for the mirror updates, so we can graph our incrementals | 23:19 |
fungi | yeah, i don't think i feel all that strongly about envvar pollution except where the variables are poorly named and might be used by another tool unexpectedly | 23:23 |
*** tosky has quit IRC | 23:31 | |
ianw | wait-for-it.sh is a pure bash script ... nc -z $WAITFORIT_HOST $WAITFORIT_PORT | 23:34 |
ianw | i bet you could do something with proc and opening a socket in actual pure bash | 23:35 |
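For what it's worth, bash can already do this without nc via its /dev/tcp pseudo-path (a redirection feature of bash itself, not a real file under /proc); a minimal sketch:

```bash
host=example.com port=80   # illustrative values
# opening fd 3 in a subshell succeeds only if the TCP connect succeeds;
# the descriptor closes again when the subshell exits
if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "port ${port} on ${host} is open"
fi
```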
*** LowKey has quit IRC | 23:37 | |
*** LowKey has joined #opendev | 23:37 | |
openstackgerrit | Merged opendev/system-config master: grafana: take some screenshots during testing https://review.opendev.org/c/opendev/system-config/+/775553 | 23:37 |
clarkb | ianw: guillaumec I'm trying to make sense of that comment width change. It seems like it is doing an n^2 search over a single list of comments? | 23:51 |
clarkb | I'm not sure I understand this loop | 23:51 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] gerrit : use mariadb container https://review.opendev.org/c/opendev/system-config/+/775961 | 23:51 |
clarkb | oh no last_change_idx is there to move things ahead on the next pass of the outer loop | 23:51 |
clarkb | so it's still a linear scan | 23:52 |
fungi | ianw: is there a change already to add the new borg servers to cacti? | 23:54 |
fungi | or are they already in there and i'm just blind? | 23:55 |
ianw | fungi: ahh, nope, i may well have forgotten that | 23:55 |
fungi | i'm happy to push a change up for that | 23:55 |
fungi | i was being lazy and using cacti to find their hostnames ;) | 23:55 |
fungi | backup02.ca-ymq-1.vexxhost.opendev.org and backup01.ord.rax.opendev.org are the borg servers, right? | 23:56 |
ianw | yep, that's right | 23:56 |
fungi | cool, will push momentarily | 23:57 |
ianw | clarkb: yeah, the overall level of "i don't want to debug this and i wish gerrit had a way to style comments properly" is my main concern | 23:57 |
clarkb | ianw: heh that resembles the comments I'm writing :) | 23:58 |
clarkb | though with specific concerns related to ^ | 23:58 |