Thursday, 2021-02-18

clarkbspot checking more kna1 job failures I'm yet to find anything that looks like a growroot failure00:01
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: ensure-zookeeper: add use_tls role var  https://review.opendev.org/c/zuul/zuul-jobs/+/77629000:06
clarkbchanging tactics a bit and using this query: node_provider:"airship-kna1" AND build_status:"FAILURE" AND message:"The filesystem on /dev/vda1 is now" doesn't really produce any catches either00:25
*** tosky has quit IRC00:30
openstackgerritIan Wienand proposed opendev/system-config master: refstack: Edit URL of public RefStackAPI  https://review.opendev.org/c/opendev/system-config/+/77629200:37
openstackgerritIan Wienand proposed opendev/system-config master: [wip] gerrit : use mariadb container  https://review.opendev.org/c/opendev/system-config/+/77596100:42
corvusclarkb: completely at random i saw this patch; i have no idea if it at all relates to that earlier behavior that finally got cleared up: https://gerrit-review.googlesource.com/c/gerrit/+/29721600:52
clarkbI'm not sure I fully understand what that commit message is trying to tell me00:54
clarkbI guess a sticky vote is one which has carried over from one patchset to the next00:55
*** LowKey has quit IRC00:59
ianwi've noticed the zuul plugin doesn't match on skipped jobs00:59
ianwthey don't have a time01:00
ianwsystem-config-build-image-gerrit-3.2 https://zuul.opendev.org/t/openstack/build/None : SKIPPED01:00
clarkbya skipped jobs never start aiui01:01
ianwi'll put that on the todo list01:01
ianwunless anyone feels like javascript regex hacking :)01:01
*** mlavalle has quit IRC01:32
*** auristor has quit IRC02:18
*** auristor has joined #opendev02:26
*** hemanth_n has joined #opendev02:53
*** dviroel has quit IRC02:55
*** dmsimard8 has joined #opendev02:56
*** dmsimard has quit IRC02:59
*** lourot has quit IRC02:59
*** priteau has quit IRC02:59
*** gouthamr has quit IRC02:59
*** clayg has quit IRC02:59
*** dmsimard8 is now known as dmsimard02:59
*** ianw has quit IRC02:59
*** clayg has joined #opendev03:00
*** ianw has joined #opendev03:00
*** lourot has joined #opendev03:05
*** dirk has quit IRC03:25
*** dirk has joined #opendev03:25
*** zoharm1 has joined #opendev03:34
*** whoami-rajat__ has joined #opendev04:38
*** ykarel has joined #opendev05:00
*** ysandeep|away is now known as ysandeep|ruck05:04
*** DSpider has joined #opendev05:20
*** LowKey has joined #opendev05:53
*** gouthamr has joined #opendev06:01
*** marios has joined #opendev06:18
*** LowKey has quit IRC06:37
*** LowKey has joined #opendev06:37
openstackgerritDinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos  https://review.opendev.org/c/zuul/zuul-jobs/+/76735407:01
openstackgerritDinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos  https://review.opendev.org/c/zuul/zuul-jobs/+/76735407:03
openstackgerritDinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos  https://review.opendev.org/c/zuul/zuul-jobs/+/76735407:04
*** slaweq has joined #opendev07:13
*** ralonsoh has joined #opendev07:21
*** DSpider has quit IRC07:31
*** jpena|off is now known as jpena07:36
*** slaweq has quit IRC07:39
*** slaweq has joined #opendev07:42
*** eolivare has joined #opendev07:44
*** fressi has joined #opendev07:45
*** brinzhang has quit IRC07:45
*** brinzhang has joined #opendev07:45
*** fressi has quit IRC07:52
*** fressi has joined #opendev08:06
*** rpittau|afk is now known as rpittau08:14
*** hashar has joined #opendev08:21
*** sboyron has joined #opendev08:24
*** andrewbonney has joined #opendev08:26
*** ysandeep|ruck is now known as ysandeep|lunch08:26
*** roman_g has joined #opendev08:27
*** klonn has joined #opendev08:42
*** tosky has joined #opendev08:45
openstackgerritDinesh Garg proposed zuul/zuul-jobs master: Allow customization of helm charts repos  https://review.opendev.org/c/zuul/zuul-jobs/+/76735408:51
*** brinzhang has quit IRC09:00
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Allow customization of helm charts repos  https://review.opendev.org/c/zuul/zuul-jobs/+/76735409:05
*** ykarel_ has joined #opendev09:05
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Allow customization of helm charts repos  https://review.opendev.org/c/zuul/zuul-jobs/+/76735409:07
*** DSpider has joined #opendev09:08
*** ykarel has quit IRC09:08
openstackgerritMerged openstack/project-config master: Add ansible-role-pki to zuul  https://review.opendev.org/c/openstack/project-config/+/77338709:08
*** ysandeep|lunch is now known as ysandeep|ruck09:15
*** dtantsur|afk is now known as dtantsur09:31
*** ykarel_ is now known as ykarel09:49
*** roman_g has quit IRC10:46
*** JayF has quit IRC11:03
*** JayF has joined #opendev11:07
*** dviroel has joined #opendev11:21
openstackgerritMerged zuul/zuul-jobs master: Allow customization of helm charts repos  https://review.opendev.org/c/zuul/zuul-jobs/+/76735411:31
*** priteau has joined #opendev11:33
fricklerlest anyone misses this, 19:15 UTC today: https://mars.nasa.gov/mars2020/timeline/landing/watch-online/11:59
*** hemanth_n has quit IRC12:16
*** jpena is now known as jpena|lunch12:30
openstackgerritMartin Kopec proposed opendev/system-config master: refstack: Edit URL of public RefStackAPI  https://review.opendev.org/c/opendev/system-config/+/77629212:47
*** klonn has quit IRC12:57
*** jpena|lunch is now known as jpena13:31
*** klonn has joined #opendev13:46
*** iurygregory has quit IRC13:47
*** iurygregory has joined #opendev13:50
*** mlavalle has joined #opendev13:55
*** ysandeep|ruck is now known as ysandeep|afk14:11
*** ykarel_ has joined #opendev14:14
*** ykarel has quit IRC14:17
*** marios is now known as marios|call14:18
fungifrickler: i just hope we're able to slip it through while the martians aren't watching14:21
*** ykarel_ has quit IRC14:27
*** whoami-rajat__ is now known as whoami-rajat14:48
*** tosky has quit IRC14:51
*** tosky_ has joined #opendev14:51
*** tosky_ is now known as tosky14:51
*** fressi has quit IRC14:53
*** fressi has joined #opendev14:55
*** ysandeep|afk is now known as ysandeep|ruck15:02
*** fressi has quit IRC15:12
*** fressi has joined #opendev15:13
*** hashar has quit IRC15:18
*** marios|call is now known as marios15:28
*** zoharm1 has quit IRC15:29
*** fressi has quit IRC15:33
*** ysandeep|ruck is now known as ysandeep|away15:35
*** roman_g has joined #opendev15:36
*** ykarel_ has joined #opendev15:48
roman_gGood morning. Is there a way to get job logs if it timed out? For example, this one: https://zuul.opendev.org/t/openstack/build/16329de0aeb64a208542d8f6a3ccc15b16:07
roman_g3 lines of logs:16:07
roman_g2021-02-17 05:15:42.776228 | TASK [make]16:07
roman_g2021-02-17 07:13:51.873764 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/airship/images/playbooks/airship-images-build.yaml@master]16:07
roman_g2021-02-17 07:13:51.875093 | POST-RUN START: [untrusted : opendev.org/airship/images/playbooks/airship-collect-logs.yaml@master]16:07
roman_gI'd like to see logs which are from TASK [make].16:08
roman_gIt hung for about 2 hours, and then timed out. Need to find out how far it got and where it hung.16:09
*** klonn has quit IRC16:10
openstackgerritClark Boylan proposed opendev/system-config master: Try to make gatling-git work with our test gerrit  https://review.opendev.org/c/opendev/system-config/+/77588316:12
clarkbroman_g: if your job isn't writing logs as it runs that task that can be collected I think your best bet is the ansible record json file16:13
clarkbroman_g: the zuul job console is rendered for that but it won't render a killed/timed out playbook. However, the json should still have an incomplete record there showing you roughly how far along it got16:13
clarkboh huh nevermind it seems not to. I thought I had used this method before16:13
clarkbroman_g: you can update the job such that make outputs to stdout/stderr and that will end up in the console log. Or have it write to a log file that is collected16:15
roman_gclarkb Yes, that's what I have been thinking of. Thank you.16:15
jrosserhi, is it ok to add more things to the CI mirrors, or are there storage / other things that need considering?16:19
clarkbjrosser: storage is definitely something to factor in. https://grafana.opendev.org/d/Zic1IwPGk/afs?orgId=1 gives you an overview of what current storage looks like16:20
jrossererlang-solutions got themselves on my "things that break" list for improving osa CI reliability16:20
clarkbafs01.dfw is getting a bit close to full16:20
*** ykarel_ is now known as ykarel16:21
fungijrosser: also a lot of what is "served" from our mirrors is really just a caching http proxy in reality, so that's often not hard to add (depending on the size of the files, and how effectively the application protocol/api can be proxied)16:21
jrosserthe trouble with erlang-solutions is that they repeatedly release new packages and break their repo in the process of doing that16:22
jrosserso if reprepro were able to mitigate that it would be a small reliability win for osa16:22
jrosserbut obv. to balance against the storage cost of doing that16:22
fungioh, is it a deb package repository?16:22
clarkband I guess using the distro provided erlang packages we already mirror is a problem?16:23
jrosseryes, and they break the checksums relatively often16:23
jrosserfor focal we use the distro package, but not for bionic down all the stable branches16:23
jrossermainly in order to keep the same exact versions on all the OS16:23
fungijrosser: rough bar napkin calculation on how much data at rest you're talking about16:24
fungi?16:24
fungi(order of magnitude is fine)16:24
jrosseri have a local mirror here as it happens: 15G packages.erlang-solutions.com16:26
fungiso not tiny, but not huge16:26
jrosserand looks like i have bionic and focal in it16:27
clarkbsort of related, I think we can remove fedora-31 soonish if we remove dib's job for it16:27
clarkbthen we can clean up that portion of the fedora mirror16:27
jrosseri thought i'd ask because it's one of the things that has failed jobs for us since the ML post about CI usage16:28
*** ykarel has quit IRC16:29
clarkbone other concern with the one offs like that (ceph is another good example) is they don't tend to be updated by anyone once put in place16:30
clarkbwe keep things like the distro mirrors up to date as well as we can, but for something like say ceph the problem becomes some versions are built for some distro releases and some are not and figuring out the config is significantly more effort and the folks putting stuff in place in the first place tend not to update them later16:30
jrosseri did patches this week to use the ceph mirror16:31
jrosseroctopus was only recently added16:31
clarkbright, I'm not saying they don't get updated but they tend to significantly lag16:32
clarkb(I'm just calling this out as a risk, because we are unlikely to keep up with say ceph releases for various distros or erlang releases for various distros)16:32
clarkb15GB is probably fine in the current mirrors, but we should also look at trimming things due to afs01.dfw's available disk space16:34
clarkbmy rough math says we've got about 160GB of headroom currently16:35
*** LowKey has quit IRC16:35
fungitend to significantly lag on additions, but also tend to lag heavily on cleanup16:36
*** LowKey has joined #opendev16:36
fungiin many cases we don't know whether some of the suites being copied target distro releases we've otherwise dropped, due to how projects name them16:36
jrosseri really don't mind either way, it falls into the category of something i understand how to fix but admittedly the size of issue it's addressing is really quite small16:37
clarkbanother approach or possibly something to do as well, is to reach out to them and ask if they can publish consistent repos16:37
clarkbits not super difficult, but does require some care16:37
jrosserhttps://twitter.com/thejrosser/status/1222226703105298433 :)16:38
fungiadd packages, replace checksums/signatures atomically, pause for a while, then delete unreferenced packages16:38
fungiis the generally safe order of operations16:38
fungialso never replace package files without up-revving the versions in their filenames16:39
fungioverwriting package files is bad, bad, bad16:39
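A minimal Python sketch of the order of operations fungi describes above: add packages, swap the index atomically, pause, then prune. It is not how reprepro or erlang-solutions manage their repositories. The flat `INDEX` file, the function names, and the package names are all hypothetical stand-ins for real repository metadata such as checksummed and signed index files.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def add_packages(repo: Path, packages: dict[str, bytes]) -> None:
    """Step 1: add new package files, refusing to overwrite a published one."""
    for name, data in packages.items():
        target = repo / name
        if target.exists():
            raise FileExistsError(f"{name} already published; bump the version instead")
        target.write_bytes(data)


def replace_index(repo: Path, referenced: list[str], index_name: str = "INDEX") -> None:
    """Step 2: write the new index to a temp file, then rename it into place."""
    fd, tmp = tempfile.mkstemp(dir=repo, prefix=".index-")
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(sorted(referenced)) + "\n")
    # os.replace is atomic on POSIX when source and target share a filesystem,
    # so clients see either the old index or the new one, never a partial file.
    os.replace(tmp, repo / index_name)


def prune_unreferenced(repo: Path, index_name: str = "INDEX") -> list[str]:
    """Step 4: after a pause (step 3), delete files the current index omits."""
    keep = set((repo / index_name).read_text().splitlines())
    removed = []
    for path in sorted(repo.iterdir()):
        if path.name != index_name and path.name not in keep:
            path.unlink()
            removed.append(path.name)
    return removed
```

The pause between `replace_index` and `prune_unreferenced` is left to the caller. Its purpose is to let clients that fetched the old index finish downloading the packages it references.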
jrosseri guess they have some slightly wonky means of updating their repo16:39
clarkbfungi: found a grenade failure in kna1, but it failed because it is trying to install a package that doesn't exist (I assume it's a stale/bitrotted stable branch)16:44
clarkband its growroot log looks fine16:45
fungibest part about heisenbugs is as long as you keep measuring them, they'll stay fixed16:45
clarkbfungi: ya I'm beginning to wonder if journalctl -u is blocking long enough for the job to finish a growroot before doing real work16:46
fungientirely possible, if it's truly just a race16:46
clarkbthe journalctl -u growroot timestamps show growroot takes 2 or 3 seconds and completes before we ever manage to start running ansible16:46
clarkbhowever the ansible task to run journalctl -u growroot takes ~30 seconds?16:47
clarkbthough ansible itself reports that task took less than a second16:48
clarkbnot super confident that the bug has been fixed by looking at it, but certainly some odd enough timing there that I can't rule it out16:48
*** fressi has joined #opendev16:48
fungicould also be time weirdness from ntp still synchronizing, if we're catching nodes that early in their lifecycles16:50
*** jpena is now known as jpena|off16:51
*** roman_g has quit IRC17:03
*** roman_g has joined #opendev17:03
*** roman_g has quit IRC17:03
*** roman_g has joined #opendev17:04
*** roman_g has quit IRC17:04
*** marios is now known as marios|out17:05
*** roman_g has joined #opendev17:05
*** roman_g has quit IRC17:05
*** roman_g has joined #opendev17:06
*** roman_g has quit IRC17:06
*** roman_g has joined #opendev17:06
*** roman_g has quit IRC17:07
*** hashar has joined #opendev17:16
openstackgerritClark Boylan proposed openstack/diskimage-builder master: Remove fedora-31 testing  https://review.opendev.org/c/openstack/diskimage-builder/+/77650317:18
openstackgerritClark Boylan proposed openstack/diskimage-builder master: Add fedora 33 testing  https://review.opendev.org/c/openstack/diskimage-builder/+/77650417:18
*** marios|out has quit IRC17:19
clarkbI think we don't actually need to land ^ that to cleanup the fedora-31 image17:19
clarkbsince the job runs on some other platform and builds a fedora-31 image. Still a reasonable cleanup I think17:19
openstackgerritClark Boylan proposed opendev/system-config master: Try to make gatling-git work with our test gerrit  https://review.opendev.org/c/opendev/system-config/+/77588317:23
clarkboh hrm looking at nodepool we don't have fedora-33 yet there17:25
clarkbhowever, I don't think we need 33 before removing 3117:25
fungii think there were some challenges getting f33 built, maybe still unresolved17:25
fungii don't remember specifics, unfortunately17:26
openstackgerritClark Boylan proposed openstack/project-config master: Stop launch fedora-31 nodes nodepool  https://review.opendev.org/c/openstack/project-config/+/77651017:31
openstackgerritClark Boylan proposed openstack/project-config master: Remove fedora-31 disk image config  https://review.opendev.org/c/openstack/project-config/+/77651117:31
clarkbwe should probably double check with ianw that this doesn't affect any existing plans with dib etc but I think the cleanup looks like that to start17:32
clarkbfungi: we do mirror fedora 33 at least17:42
*** rpittau is now known as rpittau|afk17:47
*** dtantsur is now known as dtantsur|afk18:00
*** ralonsoh has quit IRC18:15
fricklermeh, no live stream, everything shown with 7 mins delay, /me feels betrayed. also seems the actual landing is still some hours away, does anyone have an exact timetable?18:18
clarkbfrickler: 12:55 pm pacific time which is 20:55 UTC is the roughly expected landing time aiui18:19
clarkbat 19:15 UTC they will start live streams (looks like on twitch and youtube and probably nasa tv)18:24
clarkbnot sure if those will be delayed though18:24
mnaserrealistically -- what are the odds of opendev running a container registry?18:25
mnaserif someone was to do the work18:25
mnaseror maybe just even provided it as a community service, we have a registry we can make available.  i don't even know why dockerhub failures are hitting at this point and they are getting incredibly frustrating18:26
fricklermnaser: hah, before I answer that question, did you see we talked about your IPv6 issues in the last meeting?18:27
mnaserfrickler: i have not18:28
fricklermnaser: and I think if vexxhost provided a registry that we could use, I wouldn't object to that18:28
clarkbthe reason for failures are largely due to docker hubs new rate limiting and how it subverts our existing caching18:28
fricklermnaser: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-16-19.01.log.html#l-103 mainly as a reminder to keep nagging you about the IRR setuo18:28
fricklers/o$/p/18:29
clarkbwhat we talked about at the end of last year is that we'd be happy to run a better caching system that isn't subverted by the rate limiting18:29
mnaserfrickler: yeah, IRR's aren't exactly trivial to get going, but we're working on that alongside with a big network overhaul so i'm hoping to have that done at the same time (alongside adding rpki for bgp, etc)18:29
clarkbI don't know that we want to be a registry of record though18:29
clarkb(side note quay.io supposedly does not rate limit and we do proxy cache them too)18:29
fricklermnaser: rpki would have been my next topic, cool. do you at least have a rough timeframe for that? then I could stay quiet about it for some time ;)18:30
mnaserfrickler: Q2 2021?  it's a big overhaul and it's already in place18:30
fungimnaser: also keep in mind that at least some providers are still likely to be following longstanding filtering guidance which says to ignore any prefixes longer than /23 inside 2600::/1218:30
fungier, longer than /3218:31
clarkbspecifically dockerhub moved from rate limiting blob requests (which we cache) to rate limiting manifest requests which we do not cache. The manifest requests are not cached because you have to be authenticated to fetch them (even anonymously) and apache won't cache requests with authorization headers even if the cache-control header on the response says it is cacheable18:31
mnaseri think all of our announcements are /48s18:31
* fungi has dyslexic fingers18:31
fungiyeah, /48 is longer than /3218:31
mnaserclarkb: that is annoying.  so what is the infra-suggested-workaround for this?18:31
mnaseruse quay.io ?18:31
*** hashar has quit IRC18:32
clarkbIdeally we'd find some tool (squid or docker-registry if it can be convinced to be space bound and prunable, etc) and update the mirrors to run that on the docker hub proxy ports18:32
frickleristr that someone said quay.io wasn't super reliable, too18:32
mnaseryeah i remember seeing some quay.io wasnt reliable talk too18:32
clarkbya they have had a couple semi recent outages that were annoying18:32
fungibasically many years ago arin said it's only allocating /19 through /32 sized blocks from 2600::/12 so some providers assume anything longer than that (like a /48) is invalid and filter it at their peers18:32
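A small Python sketch of the filtering rule fungi describes, using only the standard `ipaddress` module. The function name and the example prefixes are made up for illustration. Real router filters are written in vendor configuration languages, not Python.

```python
import ipaddress

ARIN_BLOCK = ipaddress.ip_network("2600::/12")
MAX_PREFIXLEN = 32  # longest prefix the old guidance accepts inside ARIN_BLOCK


def accepted_by_strict_filter(prefix: str) -> bool:
    """Return True if a provider applying the old guidance would keep this route."""
    net = ipaddress.ip_network(prefix)
    if net.version != 6 or not net.subnet_of(ARIN_BLOCK):
        return True  # the rule sketched here only covers routes inside 2600::/12
    return net.prefixlen <= MAX_PREFIXLEN


# Hypothetical announcements: an aggregate /32 and a per-site /48 carved from it.
print(accepted_by_strict_filter("2604:abcd::/32"))
print(accepted_by_strict_filter("2604:abcd:1::/48"))
```

Under this rule the /32 is kept and the /48 is dropped, because a /48 is a longer (more specific) prefix than /32.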
mnaserfungi: welp that's annoying, arin  gives one allocation only and its not like i want /32 per site18:33
mnaseri am just trying to see what osh can do to drop our failure rate with ratelimits18:33
clarkbanyway my memory from our meetings back in ~december was that if people were interested in making this better our suggestion was a better caching proxy whatever that might look like18:33
fricklermnaser: fungi: I think that is mostly no longer applied. since there is no IPv6 PI (at least in RIPE), people are using /48s out of PA space as a replacement18:33
mnaserin this case, they are images that are owned by osh18:33
*** andrewbonney has quit IRC18:34
fricklerso in my understanding, announcing /48 is o.k., it just needs a matching route object18:34
clarkbmy hunch is that squid would work well18:34
fungithe way we worked around it at $oldjob was to announce our aggregate /32 from any border along with our longer prefixes, and then by the time packets reached backbone providers who weren't filtering those routes they'd forward based on longest (most specific) prefix18:34
clarkbit is frustrating because we were already doing our best to be good docker citizens and caching the blobs which are the bulk of the data transfer. But according to dockerhub people found rate limits on the blob requests super confusing because when you pull an image the number of blobs is variable so it is hard to say we will be under limits with any set of requests18:35
clarkbso they updated it to be a thing that tools don't want to cache18:35
clarkband represents a tiny fraction of the data transfer18:35
fungiusually packets leaving sites filtering longer prefixes only went through at most one additional peering point before reaching a network which had more specific routes anyway18:36
clarkbwoot that last gatling-git ps has only a single failure18:37
clarkbI think I finally figured out how it wants to function18:37
fungiin reality we had an ipv6 /29 allocation from arin and assigned a /32 per site, but eventually ended up wanting to subdivide the /32 because that was faster than going back to arin to ask to have our /29 expanded to a /2818:40
clarkband with the pushes working the better load avg on 3.3 goes away but the larger memory use remains18:40
fungionce we grew past 8 different multi-homed data centers18:40
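The arithmetic behind "a /32 per site" running out at 8 sites can be checked with Python's `ipaddress` module: a /29 contains 2^(32-29) = 8 networks of size /32, and a /28 contains 16. The prefixes below are placeholders, not the allocation fungi is describing.

```python
import ipaddress


def per_site_prefixes(allocation: str, site_prefixlen: int = 32) -> list:
    """Split an allocation into equal per-site networks of the given length."""
    net = ipaddress.ip_network(allocation)
    return list(net.subnets(new_prefix=site_prefixlen))


# Placeholder /29 and /28 allocations, used only to count the /32s inside them.
print(len(per_site_prefixes("2001:db8::/29")))
print(len(per_site_prefixes("2001:db0::/28")))
```

Subdividing one site's /32 instead yields 2^(48-32) = 65536 networks of size /48, which is why splitting the /32 was the quicker route than asking for a larger allocation.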
clarkbfungi: I based the gatling work on https://review.opendev.org/c/opendev/system-config/+/775051/ so that we could compare system level metrics. Any chance you could take a look at reviewing that?18:40
fungisure18:41
clarkbmnaser: just thinking out loud here: other options could be to bypass our mirror so that jobs are more likely to fetch from unique IPs (not true in ipv6 only clouds because docker hub is ipv4 only so you funnel through NAT). Docker hub offers a "open source images not rate limited for open source projects that agree to our terms" thing. Unfortunately the terms are weird enough that I'm not super18:42
clarkbcomfortable with them (in particular they seem to try to say you can only ever use docker tools to interact with those images)18:42
fungiclarkb: also that's specifically about access to images you publish, not your access to images other orgs publish18:43
clarkbcorrect18:43
clarkbwhich includes the base images I think18:43
mnaserso something else that comes to my mind is uh18:44
fungiso we could maybe get the images we publish to dockerhub flagged that way as long as we renounce all software which competes with docker and run advertisements on their behalf, but our dependencies would not be covered18:44
mnaserwe could grab openstackci/ and pay for a pro plan, i'm happy to pay the $7 or whatever, and that drops all pull limits18:44
clarkbmnaser: the trouble with that is you'd need to authenticate as that user18:44
fungimnaser: and reupload copies of all our dependencies there?18:44
fungiand yeah, we'd need special proxies which know how to auth to dockerhub18:45
mnaseri think having a pro account drops the pull limit for your images18:45
clarkbfungi: no the way that works is you authenticate as the user and then all requests by the user are not limited (or less limited)18:45
mnaserhttps://www.docker.com/pricing18:45
mnaser>Not Applicable to Pro Accounts18:45
mnaserunder data transfer / anonymous users18:45
clarkbits doable, but not there yet. And if we're redoing the caching system anyway to make that work I'm more inclined to make it work without a special account18:46
fungiyep, either we'd need to trust our docker credentials to all jobs which might use them, or make an authenticating proxy18:46
clarkbbut I've also not had time to figure it out (and honestly their complete subversion of the thing I helped make that worked has made motivation low)18:47
clarkbbut I guess that's the risk when you're the only person on the planet trying to be nice to your upstream services :(18:47
fungiif we make an authenticating proxy, that's probably about the same amount of work as making a proper caching proxy registry18:48
clarkbfungi: we'd also need to authenticate jobs to that proxy somehow18:48
mnaserdumb question18:48
clarkbor otherwise prevent our mirrors from becoming a docker hub rate limit back door18:48
mnaserhave we tried to talk to18:49
mnaserfolks @ dockerhub?18:49
fungiclarkb: oh, great point. we'd be quickly open to abuse18:49
clarkbmnaser: yes they sent us a large legal contract like document we had to agree to18:49
clarkbmnaser: with terms like "you can only use docker tools with these images"18:49
fungiwhich said things like you have to document that all your projects require the use of docker inc. software, and you have to participate in periodic promotional press releases and run ads for docker18:50
clarkbmnaser: we asked jbryce to take a look but it was a busy end of year last year. It is possible it would be worth looking at it again but I'm not super comfortable with the terms they sent us18:50
mnaseri mean those are all things we could likely negotiate18:50
mnaseri'm not massively against maybe adding dockerhub to our infrastructure donors if they technically become one18:51
fungithe interested parties seem likely to be far more efficient at developing a different solution than negotiating legal contracts18:51
clarkbI'm not against it either which is why we asked, but the response was not very friendly18:51
clarkbbut also I think if others want to pick up that avenue thats fine18:52
mnaserfungi: right, i just worry we may be signing ourselves for a little more than we can handle for now :>18:52
mnaserbut- i don't know anything about that :)18:52
fungimnaser: agreed, that includes trying to tilt at docker's windmill in my opinion18:52
clarkbbut we did go through a bunch of this several months ago and where we ended up was that likely the best option to us is a proxy cache that works with the new rate limits18:52
clarkband basically asked for help (at the time it was tripleo asking about it, I think they changed stuff to rebuild image from base images which aren't rate limited)18:53
mnaserhmm but i guess that sounds like tripleo is building images on each run18:53
clarkbmnaser: yes18:53
clarkbmnaser: they do a central image build then other jobs hang off of that18:53
fungibut as you point out, because our nodes don't (by design, for trust reasons) authenticate to stuff, and we don't have control over what ip addresses they'll get from one minute to the next, we likely can't strictly prohibit random users from pointing their own systems at our proxies to get around docker's rate limiting, so we're opening ourselves up for potential abuse18:54
clarkbfungi: yup that is why I'm more comfortable with caching what anonymous users can already do18:54
clarkbbut we could also probably set up some sort of authentication to an authenticated proxy that was job specific18:55
clarkbpeople have had thoughts about this and setting up squid as an HTTP_PROXY18:56
clarkbI think the underlying issues are very similar18:56
fungiwell, i'm saying, if we cache what anonymous users can already do, and that makes users of that cache less likely to hit dockerhub rate limits, there's nothing stopping joe's garage ci from pointing at our cache instead of dockerhub to get around rate limiting too18:56
clarkboh sure. But we've always done that since we first added the caching proxy18:56
clarkbit isn't exposing anything that requires special privileges18:57
clarkbso the badness factor is much lower18:57
*** klonn has joined #opendev18:57
fungianother option would be to do something like our wheel builder cache, and just (proactively) cache things listed in or which are dependencies of some list maintained in a git repo. though that means some infrastructure to populate the cache and also someone to review additions for the list18:58
fungiwe could probably use off the shelf registry software to do that18:58
fungiwe'd also have lag/races between publication of our own new images and their inclusion into the cache18:59
fungisimilar to what we see with wheels today18:59
clarkboh ya the other thought was updating zuul-registry to do something like this18:59
clarkbsince other zuul users may need similar too18:59
clarkbfungi: one struggle with existing off the shelf options is you can't prune them while they are running18:59
clarkbI think that is where the zuul-registry ideas came in. Since it can prune18:59
fungiright, the partial abuse mitigation if we have reusable software to solve the problem is that potential abusers may see running their own zuul-registry proxy as a cheaper alternative to hitting an external one which might go offline or change names periodically19:00
clarkbmnaser: I forwarded the docker response to you so you can see the rules they laid out19:00
clarkbseparately, I wonder if apache would ever want to update mod proxy and mod cache to cache things that cache-control headers say are cacheable19:01
fungigood point. i've patched and recompiled squid for similar reasons in the past19:02
fungi"yes i know the relevant rfcs say you should never cache these things, but i want to anyway, please don't stop me"19:02
clarkbfungi: well in this case the rfcs specifically say you can cache these things19:03
fungioh, right, i misread what you said as "uncacheable"19:03
clarkbcache-control: public means "The response may be stored by any cache, even if the response is normally non-cacheable."19:04
fungithese are flagged as cacheable, but mod_proxy doesn't want to because they're authenticated requests19:04
clarkbdocker hub does set cache-control: public on the public image manifests so an rfc respecting cache should be able to cache them19:04
clarkbyup19:04
fungiand the challenge is the dockerhub protocol requires "anonymous" access be authenticated?19:04
clarkbcorrect19:05
clarkbyou request a token for an anonymous user then use that to fetch the manifests19:05
clarkband then you fetch the sha256sum addressed blobs out of their cdn (and I don't think this is authenticated, though they may give you time bound redirects for them if pulling private info)19:06
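A simplified Python sketch of the caching rule clarkb cites: under the HTTP caching RFC, a shared cache may store a response to a request that carried an `Authorization` header only if the response includes `public`, `must-revalidate`, or `s-maxage`. This does not model Apache mod_cache, squid, or Docker Hub. It ignores status codes, freshness, and every other storage condition, and both function names are hypothetical.

```python
def cache_directives(headers):
    """Return the lower-cased Cache-Control directive names from a header dict."""
    value = ""
    for name, content in headers.items():
        if name.lower() == "cache-control":
            value = content
    return {
        part.split("=", 1)[0].strip().lower()
        for part in value.split(",")
        if part.strip()
    }


def shared_cache_may_store(request_headers, response_headers):
    """Decide whether a shared cache may store the response (simplified)."""
    directives = cache_directives(response_headers)
    if "no-store" in directives or "private" in directives:
        return False
    authenticated = any(name.lower() == "authorization" for name in request_headers)
    if authenticated:
        # Authenticated requests need an explicit opt-in from the origin server.
        return bool(directives & {"public", "must-revalidate", "s-maxage"})
    return True


# A manifest fetch with an anonymous bearer token and a "public" response.
print(shared_cache_may_store(
    {"Authorization": "Bearer anonymous-token"},
    {"Cache-Control": "public, max-age=3600"},
))
```

By this rule, a manifest response marked `public` stays storable even though the request that fetched it carried a token.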
fricklermnaser: well it wasn't much talking to dockerhub actually, I received that contract proposal after filling out their online form. if you want to continue talking to them, I can forward that email to you19:06
clarkbfrickler: I forwarded it already :)19:07
fricklerclarkb: ah, great19:07
clarkbfrickler: mnaser: fwiw I had asked jbryce to take a look given some of the terms were a bit odd like the docker tools requirement. May be a good idea to try and run those terms by him again before reaching out further (but I expect this week is bad for that given the texas weather)19:07
*** eolivare has quit IRC19:19
mnaseryeah -- i suspect austin/tx folks have got a few other things on their plate19:24
mnaser:(19:24
fungilike rebooting their power grid19:41
clarkbI followed up with the gerrit gatling-git thread on some of the stuff I discovered, but my latest patchset seems to be working19:45
clarkbclones, pulls, and pushes are all happening \o/19:45
funginice!19:46
fungii'm about to start reviewing it19:46
fungier, start reviewing the dstat change i mean19:46
clarkbya that one is far more important I think19:47
clarkbthe gatling-git one is still quite a bit hacky. Half hoping there will be a response to my email upstream saying "oh you can do this properly this way" :)19:47
fungiand then your change will be 50% smaller19:48
clarkbthe next step is for me to fetch the gatling-git report and return it to our log changes19:48
clarkbs/log changes/logging system/19:48
fungiwould gatling-git be useful for torturing gitea too, or is it gerrit-specific?19:48
clarkbfungi: I think it could be used for gitea too19:49
clarkbit has gerrit specific things in it like change id generation support but it uses jgit and should be able to talk to any git server19:49
fungiright, and for gitea we're probably more interested in how well it can serve large numbers of requests (though also handling pushes since that's how gerrit writes to it)19:52
clarkbya, I haven't sorted out ssh testing with gatling-git yet either19:53
clarkbadding that too is probably a good next step as well19:53
fungioh, right, most of our gerrit users are doing push via ssh, not https19:54
clarkbany idea if host_copy_output can use a glob for the source?19:56
clarkboh actually nevermind I need to copy out of the container first and can use a glob for that to a static location then host_copy_output can move things to recordable locations19:57
openstackgerritClark Boylan proposed opendev/system-config master: Try to make gatling-git work with our test gerrit  https://review.opendev.org/c/opendev/system-config/+/77588320:07
clarkbthat might be what we need for the reports20:07
*** slaweq has quit IRC20:21
corvusclarkb: https://www.youtube.com/watch?v=kPrbJ63qUc4 is interesting20:33
clarkbI'm watching the nasa twitch feed seems the clean feed is a bit ahead20:38
*** tbarron|out has joined #opendev20:38
corvusyeah, i've got both up; clean feed is less gabbing and more callouts20:39
fungii'm watching the nasa.tv feed, it's saying 5 minutes from entry20:44
fungisounds like the twitch feed is ~30 seconds faster20:50
*** mgagne has quit IRC21:08
*** sboyron has quit IRC21:10
*** whoami-rajat has quit IRC21:21
*** DSpider has quit IRC21:22
clarkbthat was cool21:26
ianwgrafana 7.4.2 released now with a fix to our reported issue ... https://github.com/grafana/grafana/pull/31263/commits/bf00580f9b63290cdef436bdd46d560f90e27a3e22:51
ianwwe've blocked the endpoint anyway22:51
clarkbianw: and we pin the image in our container right?22:51
clarkbso we could bump the pin then be double covered?22:51
openstackgerritIan Wienand proposed opendev/system-config master: grafana: update to 7.4.2  https://review.opendev.org/c/opendev/system-config/+/77655322:55
ianwclarkb: ^ great minds think alike :)22:55
ianwhttps://review.opendev.org/c/opendev/system-config/+/775553 is an easy one too that just creates screenshots of grafana, like we do for gerrit22:57
ianwopportunities to pull some of these bits out into more library functions i imagine if we keep doing this22:57
clarkbianw: one thing I was thinking about recently is if we should try to do a playbooks/tests/ dir and push all those testing playbooks down a level22:58
clarkbbut it isn't clear to me if that would break host vars and stuff due to relative paths22:58
ianwfungi / clarkb : https://review.opendev.org/c/opendev/system-config/+/766630 and https://review.opendev.org/c/opendev/system-config/+/775733 are a couple of backup changes if you have some time22:59
ianwthe big one is 766630 that removes bup; it's worth checking the wiki backups in particular as i think that's the only one that nobody else has validated22:59
clarkbya I can do reviews. Mostly been trying to get gatling-git further along and help nodepool zk tls changes22:59
clarkbI'll have to defer to fungi on the wiki stuff23:00
ianwthanks23:01
fungiyeah, looking, thanks for the reminder23:01
ianwclarkb / kopecmartin : i'll put a hold on https://review.opendev.org/c/opendev/system-config/+/776292 and we can re-run the job -- i have no idea what's failing and i think live debugging will be the best way forward23:02
ianwinfra-root: another one to consider is https://review.opendev.org/c/opendev/system-config/+/771445 which expands the comment area; it's been sitting for a while.  we should either yes or no it i guess.  i'm about +1.5 and rounding up, poking through the shadow dom feels icky.  but it's also what we do in CI to take screenshots, so ...23:07
clarkbianw: for BORG_UNDER_CRON we set that globally for all crontab entries right? I guess it doesn't really matter too much. Can we set it on the command instead to apply it only to the specific crontab entry though?23:10
*** klonn has quit IRC23:11
ianwumm, yes i guess so23:17
clarkb(I'm mostly thinking out loud here I don't know that this sort of thing matters all that much)23:19
ianwi also wanted to do a bit of a grep of the output to send some stats, similar to what we do for the mirror updates, so we can graph our incrementals23:19
fungiyeah, i don't think i feel all that strongly about envvar pollution except where the variables are poorly named and might be used by another tool unexpectedl23:23
fungiy23:23
*** tosky has quit IRC23:31
ianwwait-for-it.sh is a pure bash script ... nc -z $WAITFORIT_HOST $WAITFORIT_PORT23:34
ianwi bet you could do something with proc and opening a socket in actual pure bash23:35
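[editor's note] The `nc -z` call in wait-for-it.sh only checks whether a TCP connection to the host and port succeeds. Bash can do a similar check without nc through its /dev/tcp/HOST/PORT redirection, which is probably what ianw has in mind. The sketch below shows the same open-a-socket-and-retry idea in Python; the function names, timeouts, and retry interval are hypothetical and not taken from wait-for-it.sh.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (like nc -z)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not open yet".
        return False


def wait_for_port(host, port, deadline=15.0, interval=0.5):
    """Poll until the port accepts connections or the deadline passes."""
    end = time.monotonic() + deadline
    while True:
        if port_is_open(host, port):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval)
```

A caller would typically run wait_for_port("localhost", 3306) before starting a service that depends on the database.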
*** LowKey has quit IRC23:37
*** LowKey has joined #opendev23:37
openstackgerritMerged opendev/system-config master: grafana: take some screenshots during testing  https://review.opendev.org/c/opendev/system-config/+/77555323:37
clarkbianw: guillaumec I'm trying to make sense of that comment width change. It seems like it is doing an n^2 search over a single list of comments?23:51
clarkbI'm not sure I understand this loop23:51
openstackgerritIan Wienand proposed opendev/system-config master: [wip] gerrit : use mariadb container  https://review.opendev.org/c/opendev/system-config/+/77596123:51
clarkboh no last_change_idx is there to move things ahead on the next pass of the outer loop23:51
clarkbso it's still a scan done in n23:52
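[editor's note] The pattern clarkb spots, a nested loop that looks quadratic but stays linear because a saved index moves the outer loop past everything the inner loop already consumed, can be illustrated as follows. This is not the code from the change under review: the function and its grouping task are hypothetical, and only the name last_change_idx is borrowed from the discussion.

```python
def group_runs(items, key):
    """Group consecutive items that share the same key.

    The inner while loop looks like it makes this quadratic, but
    last_change_idx records how far it got, so the outer loop skips
    those entries and every item is examined a bounded number of times.
    """
    groups = []
    last_change_idx = 0  # first index not yet placed in a group
    for i in range(len(items)):
        if i < last_change_idx:
            continue  # already consumed by an earlier inner scan
        j = i
        while j < len(items) and key(items[j]) == key(items[i]):
            j += 1
        groups.append(items[i:j])
        last_change_idx = j
    return groups
```

For example, grouping a list of comments by author would collect each uninterrupted run of comments from one author into its own sublist.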
fungiianw: is there a change already to add the new borg servers to cacti?23:54
fungior are they already in there and i'm just blind?23:55
ianwfungi: ahh, nope, i may well have forgotten that23:55
fungii'm happy to push a change up for that23:55
fungii was being lazy and using cacti to find their hostnames ;)23:55
fungibackup02.ca-ymq-1.vexxhost.opendev.org and backup01.ord.rax.opendev.org are the borg servers, right?23:56
ianwyep, that's right23:56
fungicool, will push momentarily23:57
ianwclarkb: yeah, the overall level of "i don't want to debug this and i wish gerrit had a way to style comments properly" is my main concern23:57
clarkbianw: heh that resembles the comments I'm writing :)23:58
clarkbthough with specific concerns related to ^23:58