clarkb | spot checking more kna1 job failures I've yet to find anything that looks like a growroot failure | 00:01 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: ensure-zookeeper: add use_tls role var https://review.opendev.org/c/zuul/zuul-jobs/+/776290 | 00:06 |
clarkb | changing tactics a bit and using this query: node_provider:"airship-kna1" AND build_status:"FAILURE" AND message:"The filesystem on /dev/vda1 is now" doesn't really produce any catches either | 00:25 |
*** tosky has quit IRC | 00:30 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: refstack: Edit URL of public RefStackAPI https://review.opendev.org/c/opendev/system-config/+/776292 | 00:37 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] gerrit : use mariadb container https://review.opendev.org/c/opendev/system-config/+/775961 | 00:42 |
corvus | clarkb: completely at random i saw this patch; i have no idea if it at all relates to that earlier behavior that finally got cleared up: https://gerrit-review.googlesource.com/c/gerrit/+/297216 | 00:52 |
clarkb | I'm not sure I fully understand what that commit message is trying to tell me | 00:54 |
clarkb | I guess a sticky vote is one which has carried over from one patchset to the next | 00:55 |
*** LowKey has quit IRC | 00:59 | |
ianw | i've noticed the zuul plugin doesn't match on skipped jobs | 00:59 |
ianw | they don't have a time | 01:00 |
ianw | system-config-build-image-gerrit-3.2 https://zuul.opendev.org/t/openstack/build/None : SKIPPED | 01:00 |
clarkb | ya skipped jobs never start aiui | 01:01 |
ianw | i'll put that on the todo list | 01:01 |
ianw | unless anyone feels like javascript regex hacking :) | 01:01 |
*** mlavalle has quit IRC | 01:32 | |
*** auristor has quit IRC | 02:18 | |
*** auristor has joined #opendev | 02:26 | |
*** hemanth_n has joined #opendev | 02:53 | |
*** dviroel has quit IRC | 02:55 | |
*** dmsimard8 has joined #opendev | 02:56 | |
*** dmsimard has quit IRC | 02:59 | |
*** lourot has quit IRC | 02:59 | |
*** priteau has quit IRC | 02:59 | |
*** gouthamr has quit IRC | 02:59 | |
*** clayg has quit IRC | 02:59 | |
*** dmsimard8 is now known as dmsimard | 02:59 | |
*** ianw has quit IRC | 02:59 | |
*** clayg has joined #opendev | 03:00 | |
*** ianw has joined #opendev | 03:00 | |
*** lourot has joined #opendev | 03:05 | |
*** dirk has quit IRC | 03:25 | |
*** dirk has joined #opendev | 03:25 | |
*** zoharm1 has joined #opendev | 03:34 | |
*** whoami-rajat__ has joined #opendev | 04:38 | |
*** ykarel has joined #opendev | 05:00 | |
*** ysandeep|away is now known as ysandeep|ruck | 05:04 | |
*** DSpider has joined #opendev | 05:20 | |
*** LowKey has joined #opendev | 05:53 | |
*** gouthamr has joined #opendev | 06:01 | |
*** marios has joined #opendev | 06:18 | |
*** LowKey has quit IRC | 06:37 | |
*** LowKey has joined #opendev | 06:37 | |
openstackgerrit | Dinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 07:01 |
openstackgerrit | Dinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 07:03 |
openstackgerrit | Dinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 07:04 |
*** slaweq has joined #opendev | 07:13 | |
*** ralonsoh has joined #opendev | 07:21 | |
*** DSpider has quit IRC | 07:31 | |
*** jpena|off is now known as jpena | 07:36 | |
*** slaweq has quit IRC | 07:39 | |
*** slaweq has joined #opendev | 07:42 | |
*** eolivare has joined #opendev | 07:44 | |
*** fressi has joined #opendev | 07:45 | |
*** brinzhang has quit IRC | 07:45 | |
*** brinzhang has joined #opendev | 07:45 | |
*** fressi has quit IRC | 07:52 | |
*** fressi has joined #opendev | 08:06 | |
*** rpittau|afk is now known as rpittau | 08:14 | |
*** hashar has joined #opendev | 08:21 | |
*** sboyron has joined #opendev | 08:24 | |
*** andrewbonney has joined #opendev | 08:26 | |
*** ysandeep|ruck is now known as ysandeep|lunch | 08:26 | |
*** roman_g has joined #opendev | 08:27 | |
*** klonn has joined #opendev | 08:42 | |
*** tosky has joined #opendev | 08:45 | |
openstackgerrit | Dinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 08:51 |
*** brinzhang has quit IRC | 09:00 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 09:05 |
*** ykarel_ has joined #opendev | 09:05 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 09:07 |
*** DSpider has joined #opendev | 09:08 | |
*** ykarel has quit IRC | 09:08 | |
openstackgerrit | Merged openstack/project-config master: Add ansible-role-pki to zuul https://review.opendev.org/c/openstack/project-config/+/773387 | 09:08 |
*** ysandeep|lunch is now known as ysandeep|ruck | 09:15 | |
*** dtantsur|afk is now known as dtantsur | 09:31 | |
*** ykarel_ is now known as ykarel | 09:49 | |
*** roman_g has quit IRC | 10:46 | |
*** JayF has quit IRC | 11:03 | |
*** JayF has joined #opendev | 11:07 | |
*** dviroel has joined #opendev | 11:21 | |
openstackgerrit | Merged zuul/zuul-jobs master: Allow customization of helm charts repos https://review.opendev.org/c/zuul/zuul-jobs/+/767354 | 11:31 |
*** priteau has joined #opendev | 11:33 | |
frickler | lest anyone miss this, 19:15 UTC today: https://mars.nasa.gov/mars2020/timeline/landing/watch-online/ | 11:59 |
*** hemanth_n has quit IRC | 12:16 | |
*** jpena is now known as jpena|lunch | 12:30 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: refstack: Edit URL of public RefStackAPI https://review.opendev.org/c/opendev/system-config/+/776292 | 12:47 |
*** klonn has quit IRC | 12:57 | |
*** jpena|lunch is now known as jpena | 13:31 | |
*** klonn has joined #opendev | 13:46 | |
*** iurygregory has quit IRC | 13:47 | |
*** iurygregory has joined #opendev | 13:50 | |
*** mlavalle has joined #opendev | 13:55 | |
*** ysandeep|ruck is now known as ysandeep|afk | 14:11 | |
*** ykarel_ has joined #opendev | 14:14 | |
*** ykarel has quit IRC | 14:17 | |
*** marios is now known as marios|call | 14:18 | |
fungi | frickler: i just hope we're able to slip it through while the martians aren't watching | 14:21 |
*** ykarel_ has quit IRC | 14:27 | |
*** whoami-rajat__ is now known as whoami-rajat | 14:48 | |
*** tosky has quit IRC | 14:51 | |
*** tosky_ has joined #opendev | 14:51 | |
*** tosky_ is now known as tosky | 14:51 | |
*** fressi has quit IRC | 14:53 | |
*** fressi has joined #opendev | 14:55 | |
*** ysandeep|afk is now known as ysandeep|ruck | 15:02 | |
*** fressi has quit IRC | 15:12 | |
*** fressi has joined #opendev | 15:13 | |
*** hashar has quit IRC | 15:18 | |
*** marios|call is now known as marios | 15:28 | |
*** zoharm1 has quit IRC | 15:29 | |
*** fressi has quit IRC | 15:33 | |
*** ysandeep|ruck is now known as ysandeep|away | 15:35 | |
*** roman_g has joined #opendev | 15:36 | |
*** ykarel_ has joined #opendev | 15:48 | |
roman_g | Good morning. Is there a way to get job logs if it timed out? For example, this one: https://zuul.opendev.org/t/openstack/build/16329de0aeb64a208542d8f6a3ccc15b | 16:07 |
roman_g | 3 lines of logs: | 16:07 |
roman_g | 2021-02-17 05:15:42.776228 | TASK [make] | 16:07 |
roman_g | 2021-02-17 07:13:51.873764 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/airship/images/playbooks/airship-images-build.yaml@master] | 16:07 |
roman_g | 2021-02-17 07:13:51.875093 | POST-RUN START: [untrusted : opendev.org/airship/images/playbooks/airship-collect-logs.yaml@master] | 16:07 |
roman_g | I'd like to see logs which are from TASK [make]. | 16:08 |
roman_g | It hung for about 2 hours, and then timed out. Need to find out how far it got and where it hung. | 16:09 |
*** klonn has quit IRC | 16:10 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Try to make gatling-git work with our test gerrit https://review.opendev.org/c/opendev/system-config/+/775883 | 16:12 |
clarkb | roman_g: if your job isn't writing logs that can be collected while it runs that task, I think your best bet is the ansible record json file | 16:13 |
clarkb | roman_g: the zuul job console is rendered for that but it won't render a killed/timed out playbook. However, the json should still have an incomplete record there showing you roughly how far along it got | 16:13 |
clarkb | oh huh nevermind it seems not to. I thought I had used this method before | 16:13 |
clarkb | roman_g: you can update the job such that make outputs to stdout/stderr and that will end up in the console log. Or have it write to a log file that is collected | 16:15 |
roman_g | clarkb Yes, that's what I have been thinking of. Thank you. | 16:15 |
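A minimal sketch of what clarkb suggests, assuming the job invokes make from a shell task and that the post-run playbook collects a logs directory (the paths here are hypothetical):

```bash
# Hypothetical: stream make output to a file the log-collection playbook
# gathers, so partial output survives a RESULT_TIMED_OUT kill.
mkdir -p "$HOME/logs"
make 2>&1 | tee "$HOME/logs/make.log"
```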
jrosser | hi, is it ok to add more things to the CI mirrors, or are there storage / other things that need considering? | 16:19 |
clarkb | jrosser: storage is definitely something to factor in. https://grafana.opendev.org/d/Zic1IwPGk/afs?orgId=1 gives you an overview of what current storage looks like | 16:20 |
jrosser | erlang-solutions got themselves on my "things that break" list for improving osa CI reliability | 16:20 |
clarkb | afs01.dfw is getting a bit close to full | 16:20 |
*** ykarel_ is now known as ykarel | 16:21 | |
fungi | jrosser: also a lot of what is "served" from our mirrors is really just a caching http proxy in reality, so that's often not hard to add (depending on the size of the files, and how effectively the application protocol/api can be proxied) | 16:21 |
jrosser | the trouble with erlang-solutions is that they repeatedly release new packages and break their repo in the process of doing that | 16:22 |
jrosser | so if reprepro were able to mitigate that it would be a small reliability win for osa | 16:22 |
jrosser | but obv. to balance against the storage cost of doing that | 16:22 |
fungi | oh, is it a deb package repository? | 16:22 |
clarkb | and I guess using the distro provided erlange packages we already mirror is a problem? | 16:23 |
jrosser | yes, and they break the checksums relatively often | 16:23 |
jrosser | for focal we use the distro package, but not for bionic down all the stable branches | 16:23 |
jrosser | mainly in order to keep the same exact versions on all the OS | 16:23 |
fungi | jrosser: rough bar napkin calculation on how much data at rest you're talking about? | 16:24 |
fungi | (order of magnitude is fine) | 16:24 |
jrosser | i have a local mirror here as it happens, 15G for packages.erlang-solutions.com | 16:26 |
fungi | so not tiny, but not huge | 16:26 |
jrosser | and looks like i have bionic and focal in it | 16:27 |
clarkb | sort of related, I think we can remove fedora-31 soonish if we remove dib's job for it | 16:27 |
clarkb | then we can clean up that portion of the fedora mirror | 16:27 |
jrosser | i thought i'd ask because it's one of the things that has failed jobs for us since the ML post about CI usage | 16:28 |
*** ykarel has quit IRC | 16:29 | |
clarkb | one other concern with the one offs like that (ceph is another good example) is they don't tend to be updated by anyone once put in place | 16:30 |
clarkb | we keep things like the distro mirrors up to date as well as we can, but for something like say ceph the problem becomes that some versions are built for some distro releases and some are not, figuring out the config is significantly more effort, and the folks who put stuff in place in the first place tend not to update it later | 16:30 |
jrosser | i did patches this week to use the ceph mirror | 16:31 |
jrosser | octopus was only recently added | 16:31 |
clarkb | right, I'm not saying they don't get updated but they tend to significantly lag | 16:32 |
clarkb | (I'm just calling this out as a risk, because we are unlikely to keep up with say ceph releases for various distros or erlang releases for various distros) | 16:32 |
clarkb | 15GB is probably fine in the current mirrors, but we should also look at trimming things due to afs01.dfw's available disk space | 16:34 |
clarkb | my rough math says we've got about 160GB of headroom currently | 16:35 |
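For reference, a hedged example of checking that headroom directly, assuming access to the OpenAFS tools; the fileserver FQDN below is a guess based on the short name used above:

```bash
# vos partinfo lists free and total space (in KB) for each /vicepX
# partition on an AFS fileserver; the hostname here is an assumption.
vos partinfo afs01.dfw.openstack.org
```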
*** LowKey has quit IRC | 16:35 | |
fungi | tend to significantly lag on additions, but also tend to lag heavily on cleanup | 16:36 |
*** LowKey has joined #opendev | 16:36 | |
fungi | in many cases we don't know whether some of the suites being copied target distro releases we've otherwise dropped, due to how projects name them | 16:36 |
jrosser | i really don't mind either way, it falls into the category of something i understand how to fix but admittedly the size of issue it's addressing is really quite small | 16:37 |
clarkb | another approach or possibly something to do as well, is to reach out to them and ask if they can publish consistent repos | 16:37 |
clarkb | its not super difficult, but does require some care | 16:37 |
jrosser | https://twitter.com/thejrosser/status/1222226703105298433 :) | 16:38 |
fungi | add packages, replace checksums/signatures atomically, pause for a while, then delete unreferenced packages | 16:38 |
fungi | is the generally safe order of operations | 16:38 |
fungi | also never replace package files without up-revving the versions in their filenames | 16:39 |
fungi | overwriting package files is bad, bad, bad | 16:39 |
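A hedged outline of that ordering for a hand-rolled apt repository; the layout and paths are illustrative, and a real repo would also re-sign Release/InRelease in the same swap:

```bash
cd /srv/repo
# 1. add packages under new filenames only; never overwrite existing debs
cp /incoming/*.deb pool/main/
# 2. regenerate the index into a temp file, then swap it in atomically
#    (mv on the same filesystem), together with the re-signed Release
apt-ftparchive packages pool > dists/stable/main/binary-amd64/Packages.new
mv dists/stable/main/binary-amd64/Packages.new \
   dists/stable/main/binary-amd64/Packages
# 3. pause long enough for client and proxy caches to expire, and only
#    then delete packages no longer referenced by any index
```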
jrosser | i guess they have some slightly wonky means of updating their repo | 16:39 |
clarkb | fungi: found a grenade failure in kna1, but it failed because it is trying to install a package that doesn't exist (I assume it's a stale/bitrotted stable branch) | 16:44 |
clarkb | and its growroot log looks fine | 16:45 |
fungi | best part about heisenbugs is as long as you keep measuring them, they'll stay fixed | 16:45 |
clarkb | fungi: ya I'm beginning to wonder if journalctl -u is blocking long enough for the job to finish a growroot before doing real work | 16:46 |
fungi | entirely possible, if it's truly just a race | 16:46 |
clarkb | the journalctl -u growroot timestamps show growroot takes 2 or 3 seconds and completes before we ever manage to start running ansible | 16:46 |
clarkb | however the ansible task to run journalctl -u growroot takes ~30 seconds? | 16:47 |
clarkb | though ansible itself reports that task took less than a second | 16:48 |
clarkb | not super confident that the bug has been fixed by looking at it, but certainly some odd enough timing there that I can't rule it out | 16:48 |
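A rough sketch of the comparison being made here, assuming the unit is literally named growroot and the node's clock is sane by the time the journal is read:

```bash
# When did the growroot unit run, relative to boot?
journalctl -u growroot --no-pager -o short-monotonic | tail -n 2
# Boot time, to line up against the first ansible task timestamp
uptime -s
```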
*** fressi has joined #opendev | 16:48 | |
fungi | could also be time weirdness from ntp still synchronizing, if we're catching nodes that early in their lifecycles | 16:50 |
*** jpena is now known as jpena|off | 16:51 | |
*** roman_g has quit IRC | 17:03 | |
*** roman_g has joined #opendev | 17:03 | |
*** roman_g has quit IRC | 17:03 | |
*** roman_g has joined #opendev | 17:04 | |
*** roman_g has quit IRC | 17:04 | |
*** marios is now known as marios|out | 17:05 | |
*** roman_g has joined #opendev | 17:05 | |
*** roman_g has quit IRC | 17:05 | |
*** roman_g has joined #opendev | 17:06 | |
*** roman_g has quit IRC | 17:06 | |
*** roman_g has joined #opendev | 17:06 | |
*** roman_g has quit IRC | 17:07 | |
*** hashar has joined #opendev | 17:16 | |
openstackgerrit | Clark Boylan proposed openstack/diskimage-builder master: Remove fedora-31 testing https://review.opendev.org/c/openstack/diskimage-builder/+/776503 | 17:18 |
openstackgerrit | Clark Boylan proposed openstack/diskimage-builder master: Add fedora 33 testing https://review.opendev.org/c/openstack/diskimage-builder/+/776504 | 17:18 |
*** marios|out has quit IRC | 17:19 | |
clarkb | I think we don't actually need to land ^ that to clean up the fedora-31 image | 17:19 |
clarkb | since the job runs on some other platform and builds a fedora-31 image. Still a reasonable cleanup I think | 17:19 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Try to make gatling-git work with our test gerrit https://review.opendev.org/c/opendev/system-config/+/775883 | 17:23 |
clarkb | oh hrm looking at nodepool we don't have fedora-33 yet there | 17:25 |
clarkb | however, I don't think we need 33 before removing 31 | 17:25 |
fungi | i think there were some challenges getting f33 built, maybe still unresolved | 17:25 |
fungi | i don't remember specifics, unfortunately | 17:26 |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Stop launch fedora-31 nodes nodepool https://review.opendev.org/c/openstack/project-config/+/776510 | 17:31 |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Remove fedora-31 disk image config https://review.opendev.org/c/openstack/project-config/+/776511 | 17:31 |
clarkb | we should probably double check with ianw that this doesn't affect any existing plans with dib etc but I think the cleanup looks like that to start | 17:32 |
clarkb | fungi: we do mirror fedora 33 at least | 17:42 |
*** rpittau is now known as rpittau|afk | 17:47 | |
*** dtantsur is now known as dtantsur|afk | 18:00 | |
*** ralonsoh has quit IRC | 18:15 | |
frickler | meh, no live stream, everything shown with 7 mins delay, /me feels betrayed. also seems the actual landing is still some hours away, does anyone have an exact timetable? | 18:18 |
clarkb | frickler: 12:55 pm pacific time which is 20:55 UTC is the roughly expected landing time aiui | 18:19 |
clarkb | at 19:15 UTC they will start live streams (looks like on twitch and youtube and probably nasa tv) | 18:24 |
clarkb | not sure if those will be delayed though | 18:24 |
mnaser | realistically -- what are the odds of opendev running a container registry? | 18:25 |
mnaser | if someone was to do the work | 18:25 |
mnaser | or maybe just even provided it as a community service, we have a registry we can make available. i don't even know why dockerhub failures are hitting at this point and they are getting incredibly frustrating | 18:26 |
frickler | mnaser: hah, before I answer that question, did you see we talked about your IPv6 issues in the last meeting? | 18:27 |
mnaser | frickler: i have not | 18:28 |
frickler | mnaser: and I think if vexxhost provided a registry that we could use, I wouldn't object to that | 18:28 |
clarkb | the failures are largely due to docker hub's new rate limiting and how it subverts our existing caching | 18:28 |
frickler | mnaser: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-16-19.01.log.html#l-103 mainly as a reminder to keep nagging you about the IRR setup | 18:28 |
clarkb | what we talked about at the end of last year is that we'd be happy to run a better caching system that isn't subverted by the rate limiting | 18:29 |
mnaser | frickler: yeah, IRR's aren't exactly trivial to get going, but we're working on that alongside with a big network overhaul so i'm hoping to have that done at the same time (alongside adding rpki for bgp, etc) | 18:29 |
clarkb | I don't know that we want to be a registry of record though | 18:29 |
clarkb | (side note quay.io supposedly does not rate limit and we do proxy cache them too) | 18:29 |
frickler | mnaser: rpki would have been my next topic, cool. do you at least have a rough timeframe for that? then I could stay quiet about it for some time ;) | 18:30 |
mnaser | frickler: Q2 2021? it's a big overhaul and it's already in place | 18:30 |
fungi | mnaser: also keep in mind that at least some providers are still likely to be following longstanding filtering guidance which says to ignore any prefixes longer than /23 inside 2600::/12 | 18:30 |
fungi | er, longer than /32 | 18:31 |
clarkb | specifically dockerhub moved from rate limiting blob requests (which we cache) to rate limiting manifest requests which we do not cache. The manifest requests are not cached because you have to be authenticated to fetch them (even anonymously) and apache won't cache requests with authorization headers even if the cache-control header on the response says it is cacheable | 18:31 |
mnaser | i think all of our announcements are /48s | 18:31 |
* fungi has dyslexic fingers | 18:31 | |
fungi | yeah, /48 is longer than /32 | 18:31 |
mnaser | clarkb: that is annoying. so what is the infra-suggested-workaround for this? | 18:31 |
mnaser | use quay.io ? | 18:31 |
*** hashar has quit IRC | 18:32 | |
clarkb | Ideally we'd find some tool (squid or docker-registry if it can be convinced to be space bound and prunable, etc) and update the mirrors to run that on the docker hub proxy ports | 18:32 |
frickler | istr that someone said quay.io wasn't super reliable, too | 18:32 |
mnaser | yeah i remember seeing some quay.io wasn't reliable talk too | 18:32 |
clarkb | ya they have had a couple semi recent outages that were annoying | 18:32 |
fungi | basically many years ago arin said it's only allocating /19 through /32 sized blocks from 2600::/12 so some providers assume anything longer than that (like a /48) is invalid and filter it at their peers | 18:32 |
mnaser | fungi: welp that's annoying, arin gives one allocation only and it's not like i want /32 per site | 18:33 |
mnaser | i am just trying to see what osh can do to drop our failure rate with rate limits | 18:33 |
clarkb | anyway my memory from our meetings back in ~december was that if people were interested in making this better our suggestion was a better caching proxy whatever that might look like | 18:33 |
frickler | mnaser: fungi: I think that is mostly no longer applied. since there is no IPv6 PI (at least in RIPE), people are using /48s out of PA space as a replacement | 18:33 |
mnaser | in this case, they are images that are owned by osh | 18:33 |
*** andrewbonney has quit IRC | 18:34 | |
frickler | so in my understanding, announcing /48 is o.k., it just needs a matching route object | 18:34 |
clarkb | my hunch is that squid would work well | 18:34 |
fungi | the way we worked around it at $oldjob was to announce our aggregate /32 from any border along with our longer prefixes, and then by the time packets reached backbone providers who weren't filtering those routes they'd forward based on longest (most specific) prefix | 18:34 |
clarkb | it is frustrating because we were already doing our best to be good docker citizens and caching the blobs which are the bulk of the data transfer. But according to dockerhub people found rate limits on the blob requests super confusing because when you pull an image the number of blobs is variable so it is hard to say we will be under limits with any set of requests | 18:35 |
clarkb | so they updated it to be a thing that tools don't want to cache | 18:35 |
clarkb | and represents a tiny fraction of the data transfer | 18:35 |
fungi | usually packets leaving sites filtering longer prefixes only went through at most one additional peering point before reaching a network which had more specific routes anyway | 18:36 |
clarkb | woot that last gatling-git ps has only a single failure | 18:37 |
clarkb | I think I finally figured out how it wants to function | 18:37 |
fungi | in reality we had an ipv6 /29 allocation from arin and assigned a /32 per site, but eventually ended up wanting to subdivide the /32 because that was faster than going back to arin to ask to have our /29 expanded to a /28 | 18:40 |
clarkb | and with the pushes working the better load avg on 3.3 goes away but the larger memory use remains | 18:40 |
fungi | once we grew past 8 different multi-homed data centers | 18:40 |
clarkb | fungi: I based the gatling work on https://review.opendev.org/c/opendev/system-config/+/775051/ so that we could compare system level metrics. Any chance you could take a look at reviewing that? | 18:40 |
fungi | sure | 18:41 |
clarkb | mnaser: just thinking out loud here: other options could be to bypass our mirror so that jobs are more likely to fetch from unique IPs (not true in ipv6 only clouds because docker hub is ipv4 only so you funnel through NAT). Docker hub offers an "open source images not rate limited for open source projects that agree to our terms" thing. Unfortunately the terms are weird enough that I'm not super comfortable with them (in particular they seem to try to say you can only ever use docker tools to interact with those images) | 18:42 |
fungi | clarkb: also that's specifically about access to images you publish, not your access to images other orgs publish | 18:43 |
clarkb | correct | 18:43 |
clarkb | which includes the base images I think | 18:43 |
mnaser | so something else that comes to my mind is uh | 18:44 |
fungi | so we could maybe get the images we publish to dockerhub flagged that way as long as we renounce all software which competes with docker and run advertisements on their behalf, but our dependencies would not be covered | 18:44 |
mnaser | we could grab openstackci/ and pay for a pro plan, i'm happy to pay the $7 or whatever, and that drops all pull limits | 18:44 |
clarkb | mnaser: the trouble with that is you'd need to authenticate as that user | 18:44 |
fungi | mnaser: and reupload copies of all our dependencies there? | 18:44 |
fungi | and yeah, we'd need special proxies which know how to auth to dockerhub | 18:45 |
mnaser | i think having a pro account drops the pull limits for your images | 18:45 |
clarkb | fungi: no the way that works is you authenticate as the user and then all requests by the user are not limited (or less limited) | 18:45 |
mnaser | https://www.docker.com/pricing | 18:45 |
mnaser | >Not Applicable to Pro Accounts | 18:45 |
mnaser | under data transfer / anonymous users | 18:45 |
clarkb | it's doable, but not there yet. And if we're redoing the caching system anyway to make that work I'm more inclined to make it work without a special account | 18:46 |
fungi | yep, either we'd need to trust our docker credentials to all jobs which might use them, or make an authenticating proxy | 18:46 |
clarkb | but I've also not had time to figure it out (and honestly their complete subversion of the thing I helped make that worked has made motivation low) | 18:47 |
clarkb | but I guess that's the risk when you're the only person on the planet trying to be nice to your upstream services :( | 18:47 |
fungi | if we make an authenticating proxy, that's probably about the same amount of work as making a proper caching proxy registry | 18:48 |
clarkb | fungi: we'd also need to authenticate jobs to that proxy somehow | 18:48 |
mnaser | dumb question | 18:48 |
clarkb | or otherwise prevent our mirrors from becoming a docker hub rate limit back door | 18:48 |
mnaser | have we tried to talk to | 18:49 |
mnaser | folks @ dockerhub? | 18:49 |
fungi | clarkb: oh, great point. we'd be quickly open to abuse | 18:49 |
clarkb | mnaser: yes they sent us a large legal contract like document we had to agree to | 18:49 |
clarkb | mnaser: with terms like "you can only use docker tools with these images" | 18:49 |
fungi | which said things like you have to document that all your projects require the use of docker inc. software, and you have to participate in periodic promotional press releases and run ads for docker | 18:50 |
clarkb | mnaser: we asked jbryce to take a look but it was a busy end of year last year. It is possible it would be worth looking at it again but I'm not super comfortable with the terms they sent us | 18:50 |
mnaser | i mean those are all things we could likely negotiate | 18:50 |
mnaser | i'm not massively against maybe adding dockerhub to our infrastructure donors if they technically become one | 18:51 |
fungi | the interested parties seem likely to be far more efficient at developing a different solution than negotiating legal contracts | 18:51 |
clarkb | I'm not against it either which is why we asked, but the response was not very friendly | 18:51 |
clarkb | but also I think if others want to pick up that avenue thats fine | 18:52 |
mnaser | fungi: right, i just worry we may be signing ourselves up for a little more than we can handle for now :> | 18:52 |
mnaser | but- i don't know anything about that :) | 18:52 |
fungi | mnaser: agreed, that includes trying to tilt at docker's windmill in my opinion | 18:52 |
clarkb | but we did go through a bunch of this several months ago and where we ended up was that likely the best option for us is a proxy cache that works with the new rate limits | 18:52 |
clarkb | and basically asked for help (at the time it was tripleo asking about it, I think they changed stuff to rebuild images from base images which aren't rate limited) | 18:53 |
mnaser | hmm but i guess that sounds like tripleo is building images on each run | 18:53 |
clarkb | mnaser: yes | 18:53 |
clarkb | mnaser: they do a central image build then other jobs hang off of that | 18:53 |
fungi | but as you point out, because our nodes don't (by design, for trust reasons) authenticate to stuff, and we don't have control over what ip addresses they'll get from one minute to the next, we likely can't strictly prohibit random users from pointing their own systems at our proxies to get around docker's rate limiting, so we're opening ourselves up for potential abuse | 18:54 |
clarkb | fungi: yup that is why I'm more comfortable with caching what anonymous users can already do | 18:54 |
clarkb | but we could also probably set up some sort of authentication to an authenticated proxy that was job specific | 18:55 |
clarkb | people have had thoughts about this and setting up squid as an HTTP_PROXY | 18:56 |
clarkb | I think the underyling issues are very similar | 18:56 |
fungi | well, i'm saying, if we cache what anonymous users can already do, and that makes users of that cache less likely to hit dockerhub rate limits, there's nothing stopping joe's garage ci from pointing at our cache instead of dockerhub to get around rate limiting too | 18:56 |
clarkb | oh sure. But we've always done that since we first added the caching proxy | 18:56 |
clarkb | it isn't exposing anything that requires special privileges | 18:57 |
clarkb | so the badness factor is much lower | 18:57 |
*** klonn has joined #opendev | 18:57 | |
fungi | another option would be to do something like our wheel builder cache, and just (proactively) cache things listed in, or which are dependencies of things listed in, some list maintained in a git repo. though that means some infrastructure to populate the cache and also someone to review additions for the list | 18:58 |
fungi | we could probably use off the shelf registry software to do that | 18:58 |
fungi | we'd also have lag/races between publication of our own new images and their inclusion into the cache | 18:59 |
fungi | similar to what we see with wheels today | 18:59 |
clarkb | oh ya the other thought was updating zuul-registry to do something like this | 18:59 |
clarkb | since other zuul users may need similar too | 18:59 |
clarkb | fungi: one struggle with existing off the shelf options is you can't prune them while they are running | 18:59 |
clarkb | I think that is where the zuul-registry ideas came in. Since it can prune | 18:59 |
fungi | right, the partial abuse mitigation if we have reusable software to solve the problem is that potential abusers may see running their own zuul-registry proxy as a cheaper alternative to hitting an external one which might go offline or change names periodically | 19:00 |
clarkb | mnaser: I forwarded the docker response to you so you can see the rules they laid out | 19:00 |
clarkb | separately, I wonder if apache would ever want to update mod proxy and mod cache to cache things that cache-control headers say are cacheable | 19:01 |
fungi | good point. i've patched and recompiled squid for similar reasons in the past | 19:02 |
fungi | "yes i know the relevant rfcs say you should never cache these things, but i want to anyway, please don't stop me" | 19:02 |
clarkb | fungi: well in this case the rfcs specifically say you can cache these things | 19:03 |
fungi | oh, right, i misread what you said as "uncacheable" | 19:03 |
clarkb | cache-control: public means "The response may be stored by any cache, even if the response is normally non-cacheable." | 19:04 |
fungi | these are flagged as cacheable, but mod_proxy doesn't want to because they're authenticated requests | 19:04 |
clarkb | docker hub does set cache-control: public on the public image manifests so an rfc respecting cache should be able to cache them | 19:04 |
clarkb | yup | 19:04 |
fungi | and the challenge is the dockerhub protocol requires "anonymous" access be authenticated? | 19:04 |
clarkb | correct | 19:05 |
clarkb | you request a token for an anonymous user then use that to fetch the manifests | 19:05 |
clarkb | and then you fetch the sha256sum addressed blobs out of their cdn (and I don't think this is authenticated, though they may give you time bound redirects for them if pulling private info) | 19:06 |
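A sketch of that anonymous flow against Docker Hub's public endpoints; library/alpine is just an example image:

```bash
# 1. request an anonymous bearer token scoped to one repository
TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/alpine:pull" \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["token"])')
# 2. fetch the manifest; this is the request that is now rate limited,
#    and the Authorization header is why apache declines to cache it
#    even though the response may carry cache-control: public
curl -sI -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  "https://registry-1.docker.io/v2/library/alpine/manifests/latest" \
  | grep -i -e ratelimit -e cache-control
```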
frickler | mnaser: well it wasn't much talking to dockerhub actually, I received that contract proposal after filling out their online form. if you want to continue talking to them, I can forward that email to you | 19:06 |
clarkb | frickler: I forwarded it already :) | 19:07 |
frickler | clarkb: ah, great | 19:07 |
clarkb | frickler: mnaser: fwiw I had asked jbryce to take a look given some of the terms were a bit odd like the docker tools requirement. May be a good idea to try and run those terms by him again before reaching out further (but I expect this week is bad for that given the texas weather) | 19:07 |
*** eolivare has quit IRC | 19:19 | |
mnaser | yeah -- i suspect austin/tx folks have got a few other things on their plate | 19:24 |
mnaser | :( | 19:24 |
fungi | like rebooting their power grid | 19:41 |
clarkb | I followed up with the gerrit gatling-git thread on some of the stuff I discovered, but my latest patchset seems to be working | 19:45 |
clarkb | clones, pulls, and pushes are all happening \o/ | 19:45 |
fungi | nice! | 19:46 |
fungi | i'm about to start reviewing it | 19:46 |
fungi | er, start reviewing the dstat change i mean | 19:46 |
clarkb | ya that one is far more important I think | 19:47 |
clarkb | the gatling-git one is still quite a bit hacky. Half hoping there will be a response to my email upstream saying "oh you can do this properly this way" :) | 19:47 |
fungi | and then your change will be 50% smaller | 19:48 |
clarkb | the next step is for me to fetch the gatling-git report and return it to our logging system | 19:48 |
fungi | would gatling-git be useful for torturing gitea too, or is it gerrit-specific? | 19:48 |
clarkb | fungi: I think it could be used for gitea too | 19:49 |
clarkb | it has gerrit specific things in it like change id generation support but it uses jgit and should be able to talk to any git server | 19:49 |
fungi | right, and for gitea we're probably more interested in how well it can serve large numbers of requests (though also handling pushes since that's how gerrit writes to it) | 19:52 |
clarkb | ya, I haven't sorted out ssh testing with gatling-git yet either | 19:53 |
clarkb | adding that too is probably a good next step as well | 19:53 |
fungi | oh, right, most of our gerrit users are doing push via ssh, not https | 19:54 |
clarkb | any idea if host_copy_output can use a glob for the source? | 19:56 |
clarkb | oh actually nevermind I need to copy out of the container first and can use a glob for that to a static location then host_copy_output can move things to recordable locations | 19:57 |
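A hedged sketch of that copy-then-collect idea; the container name and paths are purely illustrative:

```bash
# docker cp does not expand globs itself, but a shell inside the
# container can, staging matching files into one directory first
docker exec gatling sh -c 'mkdir -p /tmp/reports && cp /workspace/results/*.html /tmp/reports/'
docker cp gatling:/tmp/reports /home/zuul/gatling-reports
```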
openstackgerrit | Clark Boylan proposed opendev/system-config master: Try to make gatling-git work with our test gerrit https://review.opendev.org/c/opendev/system-config/+/775883 | 20:07 |
clarkb | that might be what we need for the reports | 20:07 |
*** slaweq has quit IRC | 20:21 | |
corvus | clarkb: https://www.youtube.com/watch?v=kPrbJ63qUc4 is interesting | 20:33 |
clarkb | I'm watching the nasa twitch feed, seems the clean feed is a bit ahead | 20:38 |
*** tbarron|out has joined #opendev | 20:38 | |
corvus | yeah, i've got both up; clean feed is less gabbing and more callouts | 20:39 |
fungi | i'm watching the nasa.tv feed, it's saying 5 minutes from entry | 20:44 |
fungi | sounds like the twitch feed is ~30 seconds faster | 20:50 |
*** mgagne has quit IRC | 21:08 | |
*** sboyron has quit IRC | 21:10 | |
*** whoami-rajat has quit IRC | 21:21 | |
*** DSpider has quit IRC | 21:22 | |
clarkb | that was cool | 21:26 |
ianw | grafana 7.4.2 released now with a fix to our reported issue ... https://github.com/grafana/grafana/pull/31263/commits/bf00580f9b63290cdef436bdd46d560f90e27a3e | 22:51 |
ianw | we've blocked the endpoint anyway | 22:51 |
clarkb | ianw: and we pin the image in our container right? | 22:51 |
clarkb | so we could bump the pin then be double covered? | 22:51 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: grafana: update to 7.4.2 https://review.opendev.org/c/opendev/system-config/+/776553 | 22:55 |
ianw | clarkb: ^ great minds think alike :) | 22:55 |
ianw | https://review.opendev.org/c/opendev/system-config/+/775553 is an easy one too that just creates screenshots of grafana, like we do for gerrit | 22:57 |
ianw | opportunities to pull some of these bits out into more library functions i imagine if we keep doing this | 22:57 |
clarkb | ianw: one thing I was thinking about recently is if we should try to do a playbooks/tests/ dir and push all those testing playbooks down a level | 22:58 |
clarkb | but it isn't clear to me if that would break host vars and stuff due to relative paths | 22:58 |
ianw | fungi / clarkb : https://review.opendev.org/c/opendev/system-config/+/766630 and https://review.opendev.org/c/opendev/system-config/+/775733 are a couple of backup changes if you have some time | 22:59 |
ianw | the big one is 766630 that removes bup; it's worth checking the wiki backups in particular as i think that's the only one that nobody else has validated | 22:59 |
clarkb | ya I can do reviews. Mostly been trying to get gatling-git further along and help nodepool zk tls changes | 22:59 |
clarkb | I'll have to defer to fungi on the wiki stuff | 23:00 |
ianw | thanks | 23:01 |
fungi | yeah, looking, thanks for the reminder | 23:01 |
ianw | clarkb / kopecmartin : i'll put a hold on https://review.opendev.org/c/opendev/system-config/+/776292 and we can re-run the job -- i have no idea what's failing and i think live debugging will be the best way forward | 23:02 |
ianw | infra-root: another one to consider is https://review.opendev.org/c/opendev/system-config/+/771445 which expands the comment area; it's been sitting for a while. we should either yes or no it i guess. i'm about +1.5 and rounding up, poking through the shadow dom feels icky. but it's also what we do in CI to take screenshots, so ... | 23:07 |
clarkb | ianw: for BORG_UNDER_CRON we set that globally for all crontab entries right? I guess it doesn't really matter too much. Can we set it on the command instead to apply it only to the specific crontab entry though? | 23:10 |
*** klonn has quit IRC | 23:11 | |
ianw | umm, yes i guess so | 23:17 |
clarkb | (I'm mostly thinking out loud here I don't know that this sort of thing matters all that much) | 23:19 |
ianw | i also wanted to do a bit of a grep of the output to send some stats, similar to what we do for the mirror updates, so we can graph our incrementals | 23:19 |
fungi | yeah, i don't think i feel all that strongly about envvar pollution except where the variables are poorly named and might be used by another tool unexpectedly | 23:23 |
*** tosky has quit IRC | 23:31 | |
ianw | wait-for-it.sh is a pure bash script ... nc -z $WAITFORIT_HOST $WAITFORIT_PORT | 23:34 |
ianw | i bet you could do something with proc and opening a socket in actual pure bash | 23:35 |
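For what it's worth, bash can already do this without nc via its /dev/tcp pseudo-path (a redirection feature of bash itself, not a real file under /proc); a minimal sketch:

```bash
host=example.com port=80   # illustrative values
# opening fd 3 in a subshell succeeds only if the TCP connect succeeds;
# the descriptor closes again when the subshell exits
if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "port ${port} on ${host} is open"
fi
```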
*** LowKey has quit IRC | 23:37 | |
*** LowKey has joined #opendev | 23:37 | |
openstackgerrit | Merged opendev/system-config master: grafana: take some screenshots during testing https://review.opendev.org/c/opendev/system-config/+/775553 | 23:37 |
clarkb | ianw: guillaumec I'm trying to make sense of that comment width change. It seems like it is doing an n^2 search over a single list of comments? | 23:51 |
clarkb | I'm not sure I understand this loop | 23:51 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] gerrit : use mariadb container https://review.opendev.org/c/opendev/system-config/+/775961 | 23:51 |
clarkb | oh no last_change_idx is there to move things ahead on the next pass of the outer loop | 23:51 |
clarkb | so it's still a linear scan | 23:52 |
fungi | ianw: is there a change already to add the new borg servers to cacti? | 23:54 |
fungi | or are they already in there and i'm just blind? | 23:55 |
ianw | fungi: ahh, nope, i may well have forgotten that | 23:55 |
fungi | i'm happy to push a change up for that | 23:55 |
fungi | i was being lazy and using cacti to find their hostnames ;) | 23:55 |
fungi | backup02.ca-ymq-1.vexxhost.opendev.org and backup01.ord.rax.opendev.org are the borg servers, right? | 23:56 |
ianw | yep, that's right | 23:56 |
fungi | cool, will push momentarily | 23:57 |
ianw | clarkb: yeah, the overall level of "i don't want to debug this and i wish gerrit had a way to style comments properly" is my main concern | 23:57 |
clarkb | ianw: heh that resembles the comments I'm writing :) | 23:58 |
clarkb | though with specific concerns related to ^ | 23:58 |