Monday, 2021-05-24

*** DSpider has quit IRC  00:40
*** timburke has joined #opendev  01:26
*** gmann_afk is now known as gmann  01:30
*** timburke has quit IRC  02:31
*** timburke has joined #opendev  04:18
*** gnuoy has quit IRC  04:33
*** gnuoy has joined #opendev  04:36
*** ykarel|away has joined #opendev  04:54
*** ykarel|away is now known as ykarel  05:00
*** ralonsoh has joined #opendev  05:28
*** timburke has quit IRC  05:35
*** fressi has joined #opendev  05:58
*** ysandeep|away is now known as ysandeep  06:11
*** slaweq has joined #opendev  06:33
*** sshnaidm has joined #opendev  06:58
*** amoralej|off is now known as amoralej  07:01
<openstackgerrit> Dmitriy Rabotyagov proposed opendev/system-config master: Add mirroring of Erlang Solutions repo
*** tosky has joined #opendev  07:20
*** fressi has quit IRC  07:24
*** andrewbonney has joined #opendev  07:31
*** slaweq has quit IRC  07:32
*** slaweq has joined #opendev  07:35
*** vishalmanchanda has joined #opendev  07:41
*** ysandeep is now known as ysandeep|lunch  07:51
*** lucasagomes has joined #opendev  07:58
<iurygregory> Does anyone know the correct place to report that we have a link in that redirects to 404 ? =)  08:15
*** ykarel is now known as ykarel|lunch  08:27
*** stephenfin has quit IRC  08:27
<openstackgerrit> Ian Wienand proposed opendev/system-config master: static site
<openstackgerrit> Guillaume Chauvel proposed zuul/zuul-jobs master: Fix buildset-registry test on focal
*** ysandeep|lunch is now known as ysandeep  08:58
<openstackgerrit> Guillaume Chauvel proposed zuul/zuul-jobs master: Fix buildset-registry test on focal
*** ykarel|lunch is now known as ykarel  09:13
<openstackgerrit> Guillaume Chauvel proposed zuul/zuul-jobs master: Fix buildset-registry test on focal
*** DSpider has joined #opendev  09:24
*** jpenag is now known as jpena  09:30
*** ysandeep has quit IRC  09:50
*** ysandeep has joined #opendev  09:51
<openstackgerrit> Guillaume Chauvel proposed zuul/zuul-jobs master: Fix buildset-registry test on focal
*** owalsh has quit IRC  10:05
<openstackgerrit> Guillaume Chauvel proposed zuul/zuul-jobs master: Fix buildset-registry test on focal
*** owalsh has joined #opendev  10:27
*** lpetrut has joined #opendev  10:43
<openstackgerrit> Guillaume Chauvel proposed zuul/zuul-jobs master: Fix buildset-registry test on focal
*** fbo has joined #opendev  11:22
*** jpena is now known as jpena|lunch  11:34
<fbo> hi, we are experiencing a network issue contacting , curl -vi seems flaky  11:55
<openstackgerrit> Ian Wienand proposed opendev/system-config master: static site
<openstackgerrit> Ian Wienand proposed opendev/system-config master: static site
*** jpena|lunch is now known as jpena  12:24
*** amoralej is now known as amoralej|lunch  12:27
<dpawlik> cc fungi ^^  12:28
<dpawlik> cc ianw  12:28
*** slittle1 has joined #opendev  12:35
<slittle1> Are the git servers down?  12:39
*** elod has quit IRC  12:56
<slittle1> various errors  12:57
<slittle1> fatal: unable to access '': error:1408F10B:SSL routines:ssl3_get_record:wrong version number  12:57
<slittle1> fatal: unable to access '': Failed to connect to 2604:e100:3:0:f816:3eff:fe6b:ad62: Network is unreachable  12:57
<slittle1> fatal: unable to access '': SSL received a record that exceeded the maximum permissible length.  12:58
<slittle1> and some clone requests pass as well  12:58
*** dwilde is now known as d34dh0r53  12:59
*** arxcruz is now known as arxcruz|rover  13:00
*** akahat|ruck is now known as akahat  13:00
*** raghavendrat has joined #opendev  13:02
<raghavendrat> hi, we have some query regarding addition of tag "autogenerated:zuul" in CI result  13:05
*** elod has joined #opendev  13:05
<raghavendrat> if anyone can suggest whom to contact . . . that would be greatly appreciated  13:06
*** ysandeep is now known as ysandeep|ruck  13:06
*** rishabhhpe has joined #opendev  13:08
<slittle1> git clone issues started around 6:00 pm May 23 EST  13:11
<fungi> fbo: dpawlik: slittle1: sorry, i'm around now and checking on the git servers  13:13
<fungi> system resource graphs in cacti for the load balancer and all the backends look relatively normal  13:17
<fungi> i can browse and git clone something simple like with no problem  13:17
<fungi> i wonder if there's a subtle problem with one or more of the backends and i'm just getting lucky  13:18
*** amoralej|lunch is now known as amoralej  13:18
<fungi> fbo: dpawlik: slittle1: can users who are experiencing problems let us know what common name or altname besides plain is showing on the ssl certificate for (that'll tell me which backend you're hitting)?  13:20
<fungi> i'm going to start logging into a shell on all the backends and checking them out more thoroughly  13:20
<dpawlik> fungi: i tried multiple times: curl -vi and one per 5 requests are failing  13:21
<dpawlik> is failing*  13:21
<fbo> fungi: two curl attempts and the second is
<fungi> i discovered that one of the service containers on the gitea08 backend wasn't running so i've downed and upped it again  13:26
<fungi> fbo: oh, maybe this is a v6 routing error. can you try cloning over ipv4 and let me know if you see the same?  13:28
*** rishabhhpe has quit IRC  13:29
*** raghavendrat has quit IRC  13:29
<fungi> dpawlik: slittle1: are your failures also "network is unreachable" or similar?  13:29
<fungi> and are you hitting's ipv6 or ipv4 address?  13:29
*** mtreinish has quit IRC  13:29
<dpawlik> fungi: it was failing on ipv4, then it switched to v6  13:31
<dpawlik> fungi: seems to be better  13:31
<fungi> well, i'm not sure i've necessarily fixed anything  13:32
<dpawlik> nope, it failed once again  13:32
<dpawlik> now it is more often  13:33
<fungi> also it looks like i misread the process list and the container i thought was down on gitea08 wasn't  13:33
<fungi> i'm trying cloning directly from each backend in a loop now, since looping a git clone through the lb didn't break for me  13:38
<fungi> all worked for me over ipv6 at least, though i'm getting a sinking suspicion this could be a routing problem somewhere out on the internet which just isn't impacting the path my connection traverses (lucky me i guess?)  13:45
<fungi> i can retry them all over ipv4 and see if i can get any clone calls to break  13:45
<fungi> okay, tried them all 25x over both ipv4 and ipv6  13:52
<fungi> at this point i expect it's a problem which comes and goes, or it's a routing problem impacting only a subset of the internet, or it's specific to certain git repositories and not others maybe  13:53
<fungi> though if just a simple curl of is also failing, then seems like it's not going to be related to specific repositories  13:53
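[Editor's note: the repeated clone/curl probing described above can be sketched as a small shell helper. The opendev.org URL and the 25-iteration count come from the discussion; the `count_failures` function name is my own.]

```shell
# count_failures CMD N: run CMD N times and report how many runs failed.
count_failures() {
  cmd=$1
  n=$2
  fail=0
  for _ in $(seq "$n"); do
    # intentional word splitting of the command string
    $cmd >/dev/null 2>&1 || fail=$((fail + 1))
  done
  echo "$fail/$n failed"
}

# Probe the load balancer the way the discussion describes (network assumed):
# count_failures "curl -sf --max-time 10 https://opendev.org/" 25
```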
<fungi> dpawlik: fbo: are both of you testing from the same network?  13:54
<fbo> fungi: I've just tried on a VM located in Canada (I guess) where IPv6 is disabled but it still happens. Connection timeout on with curl -iv https://opendev.org  13:55
<fbo> also from my laptop at my place (france), still the same issue.  13:56
<fungi> yeah, working consistently from home for me. i'll try from machines in some of our different cloud providers and see if i can find one which is impacted  13:57
<fungi> at least then i should be able to traceroute in both directions and work out where things might be breaking  13:57
<fungi> no problems from rackspace dfw/iad/ord, ovh bhs1/gra1, limestone regionone, linaro-us regionone... but i got it to break from inap mtl01  14:13
<fungi> so i'll investigate from there  14:13
<fungi> and it's definitely as reported by others, most connections work, random ones hang  14:14
<fungi> also i think the ipv6 network unreachable error is a secondary error, this seems endemic to ipv4-only hosts where curl tries to fall back to v6 anyway after the v4 connection is not looking great  14:15
<fungi> oh this is even more worrisome... the traceroute between them suggests that inap and vexxhost are both peering with cogent, so the traceroute is entirely within that backbone provider  14:20
<fungi> cogent-only when tracerouting in both directions  14:23
<fungi> by comparison, the rackspace regions are all connecting to vexxhost through zayo (a.k.a. 360networks/pacific fiber)  14:27
<fungi> same for the ovh regions (bhs1 directly via zayo, gra1 takes abovenet to zayo to vexxhost)  14:32
<fungi> mnaser: ^ just a heads up, seems like there may be some packet loss/blackholes in cogent impacting availability of sjc1 v4 addresses from some parts of the internet  14:35
<mnaser> fungi: do you have pairs of ips that are problematic?  i can file a ticket (thanks for checking)  14:37
<mnaser> cc guilhermesp  14:37
<fungi> mnaser: trying to reach from for example  14:38
<fungi> it's random, maybe 20% of the connections i originate from to don't complete, but we have users reporting similar problems from various networks  14:38
<mnaser> fungi: what if you try hitting
<openstackgerrit> Ian Wienand proposed opendev/system-config master: static site
*** ykarel is now known as ykarel|away  14:40
<fungi> mnaser: via what protocol? i've been testing over http(s) but can try to see if i can reproduce the problem with large icmp packets or something (just normal ping doesn't seem to show packet loss though)  14:41
<mnaser> fungi: when i've seen cogent have this sort of crap, i am able to repro using mtr enabling tcp mode  14:41
<fungi> i'll give that a shot, ping -s 1472 doesn't seem to show it happening  14:43
<fungi> mnaser: bingo, mtr --tcp is showing packet loss in cogent's sfo pop  14:45
<fungi> around 20%, which is right on the mark for the failure percentage rates reported  14:46
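[Editor's note: mtr's report mode prints one line per hop with the loss percentage in the third column, so hops like the lossy cogent sfo hop can be picked out mechanically. A sketch; the 10% threshold and the `lossy_hops` helper name are mine, and the hop names in the usage comment are placeholders, not the real traceroute.]

```shell
# lossy_hops: read `mtr --report` output on stdin and print hops with >= 10% loss.
# Report lines look like: "  7.|-- some.hop.example.net  20.0%    50  ..."
lossy_hops() {
  awk '$3 + 0 >= 10 { print $2, $3 }'
}

# Typical invocation (network access assumed):
# mtr --tcp --report --report-cycles 50 opendev.org | lossy_hops
```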
<mnaser> fungi: would you mind letting it run for a bit and sharing the mtr output?  14:47
<fungi> mnaser: sure, it's in curses mode which is probably not that useful (and my terminal is rather small). i'll see if i can get it to do log output instead  14:48
<fungi> i was also working on mtr in the other direction for comparison  14:49
<fungi> in the other direction we see ~20% loss starting at slc (the hop after sfo) so seems like maybe they've got a problem link between sfo and slc or something  14:49
<mnaser> fungi: appreciate this information  14:50
*** lpetrut has quit IRC  14:50
<fungi> mnaser: is raw mode from mtr sufficient? or do you want something you don't need to parse?  14:51
<mnaser> fungi: this is mostly to be parsed by humans =)  14:52
<fungi> i'll see if i can get a capture of the curses interface in that case  14:53
<fungi> just need a tiny enough font to get it to all fit in the terminal  14:53
<clarkb> fungi: re docker processes you can use `docker ps -a` to see docker's view  15:01
<fungi> clarkb: yep, it was mainly that i was expecting to see more gitea processes based on comparison with another backend, but apparently it creates on-demand forks  15:02
<fungi> and there were a number of semi-recent gitea segfaults in dmesg which got me suspicious  15:03
<mnaser> fungi: do you have a few cycles by any chance to run the same (from say, opendev mirror at sjc1?)  15:03
<mnaser> to 198.x  15:03
<fungi> mnaser: sure  15:04
<fungi> unfortunately i can't run too many mtr cycles, it bails quickly with "address already in use" (i guess it has trouble opening enough tcp sockets?)  15:06
<fungi> slittle1: dpawlik: fbo: anyway, to summarize, it looks like there's something going on in cogent around or between their san francisco and salt lake city pops, so connections traversing that link are having problems. thankfully mnaser is a customer of theirs so can directly report it at least  15:10
<fbo> fungi: mnaser thanks a lot for the investigation!  15:11
<fungi> i need to step away for a few minutes, but will return shortly  15:11
<mnaser> fbo: are you able to maybe run an mtr as well?  15:12
<fbo> mnaser: probably, what command should I run?  15:13
<mnaser> fbo: mtr --tcp opendev.org  15:13
<mnaser> should be okay  15:13
<clarkb> raghavendrat is no longer here but you need to set that flag on review comments from the CI system. has a --tag flag when reviewing via ssh and has a tag attribute when reviewing via http REST api  15:14
<clarkb> raghavendrat's CI system will need to do one or the other. Zuul does this by default if you use zuulv3 or newer and tell it to review by http iirc  15:14
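[Editor's note: a sketch of the two mechanisms clarkb mentions for a CI system posting tagged review comments. The host, port, and change numbers are placeholders; only the `--tag` flag and the REST `tag` attribute come from the discussion.]

```shell
# Build the JSON body for Gerrit's "set review" REST endpoint with the
# autogenerated:zuul tag attribute set.
review_body() {
  printf '{"message": "%s", "tag": "autogenerated:zuul"}' "$1"
}

# SSH CLI variant (placeholder host and change number):
# ssh -p 29418 ci-bot@review.example.org gerrit review \
#     --tag autogenerated:zuul --message "'Build succeeded.'" 12345,1

# REST variant (placeholder host and change number):
# curl -X POST -H 'Content-Type: application/json' \
#     -d "$(review_body 'Build succeeded.')" \
#     https://review.example.org/a/changes/12345/revisions/current/review
```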
<mnaser> fbo: ok i see, so you're coming in from zayo, but egressing via cogent which is causing the packet loss  15:17
<openstackgerrit> Clark Boylan proposed opendev/system-config master: Provision ethercalc LE cert
<openstackgerrit> Clark Boylan proposed opendev/system-config master: Switch ethercalc to the new LE cert
*** d34dh0r53 has quit IRC  15:29
<mnaser> fungi, fbo: i've shutdown the cogent bgp peer, waiting for things to propagate  15:34
<clarkb> infra-root ^ I have gone ahead and created acme challenge records for ethercalc, wiki, storyboard, and translate. The wiki one may not be useful, but figured while I was in DNS edit mode I should add it  15:34
<clarkb> I think we can probably go ahead and land 792708 if it looks correct to others and start pushing on that  15:34
<fbo> mnaser: thanks  15:35
<clarkb> I do need to pop out for some errands this morning though (new glasses yay I'll see everything at odd angles for a few days)  15:35
*** d34dh0r53 has joined #opendev  15:43
*** lucasagomes has quit IRC  16:02
<slittle1> k, I'm back  16:27
<slittle1> holiday up here, so I can only touch base infrequently  16:28
<slittle1> git clone
<slittle1> Cloning into 'manifest'...  16:31
<slittle1> error: RPC failed; result=35, HTTP code = 0  16:31
<slittle1> fatal: The remote end hung up unexpectedly  16:31
<slittle1> want me to try something ?  16:31
*** ykarel|away has quit IRC  16:32
<fungi> slittle1: i think at this point we're waiting for mnaser's bgp announcement changes to propagate  16:33
<fungi> i also still see issues from inap's mtl01 region, and it's still going through cogent  16:33
<fungi> slittle1: not sure if you followed the rather lengthy scrollback but it's nothing we have direct control over, seems to be some routing problems between cogent's sfo and slc pops  16:34
<fungi> so any connections traversing cogent to get to vexxhost sjc1 may experience problems  16:34
<fungi> seems like roughly 1 in 5 tcp connections traversing that link end up in a black hole  16:36
*** rishabhhpe has joined #opendev  16:38
<fungi> slittle1: reports "Some customers are experiencing some routing issues in the US. This has been escalated to our IP Engineering Group for further investigation. No ETR yet. Our trouble ticket on this issue is HD11634166."  16:40
<fungi> unfortunately very vague  16:41
<slittle1> seems to have improved in the last 5 minutes or so  16:42
*** amoralej is now known as amoralej|off  16:43
<slittle1> 3 for 3 'repo sync' successful .... that's ~200 git clone/fetch ... I think it's resolved  16:46
<mnaser> slittle1: it seems to be improving indeed  16:46
<mnaser> fungi: cogent's network status page is a bit of a joke sadly  16:46
<mnaser> fungi: it's a bit like this -
<fungi> slittle1: mnaser: yes, looks like cogent may have fixed or worked around the problem. mtr still shows me routing through slc and sfo but no errors now, and i don't get any more random hangs with curl  16:49
*** andrewbonney has quit IRC  17:05
*** ysandeep|ruck is now known as ysandeep|away  17:14
*** ralonsoh has quit IRC  17:23
*** jpena is now known as jpena|off  17:23
<clarkb> infra-root should be ready to go now and is a cleanup of things I noticed when writing the LE change  17:33
*** timburke has joined #opendev  17:37
<clarkb> re ansible forks speeding things up it seems that infra-prod-service-zuul would typically take 8.5-12.5 minutes (usually on the lower end of that range) previously and now runs in 6.5-7.25 minutes  17:47
<fungi> nice! so nearly half the time  17:55
<clarkb> I suspect that the longer runtimes are due to zuul image updates  17:56
<clarkb> so the 6.5 vs 8.5 comparison may be more valuable as a typical run. That's a decent improvement too though  17:56
<clarkb> when you think about it in aggregate and the other jobs we run that should also benefit  17:56
*** sshnaidm is now known as sshnaidm|afk  18:02
*** rishabhhpe has quit IRC  18:08
<clarkb> fungi: did you want to respond to the question from ianw and the question from ianw on the opendev irc bots discussion? or would you prefer someone else give it a go?  18:34
<fungi> clarkb: i can probably reply in a bit. i'm just about there on check_irc_access.py  18:37
*** auristor has quit IRC  18:38
*** auristor has joined #opendev  18:39
*** rishabhhpe has joined #opendev  18:40
<fungi> yay! i finally have it passing locally  18:47
<openstackgerrit> Jeremy Stanley proposed openstack/project-config master: Clean up accessbot channels without services
<openstackgerrit> Jeremy Stanley proposed openstack/project-config master: Accessbot OFTC channel list stopgap
<openstackgerrit> Jeremy Stanley proposed openstack/project-config master: Switch the IRC access check to OFTC
<fungi> infra-root: JayF: ^ marking wip for the moment but that should pass  18:54
<JayF> thank you!  18:55
<fungi> i need to take a break to cook dinner, but will reply to the service-discuss thread about it once i'm done eating  18:57
<clarkb> fungi: thanks!  19:02
*** rishabhhpe has quit IRC  19:02
<fungi> i'll also try to tackle starting to make accessbot itself work after that  19:03
<fungi> now that i have the irc module paged into my brain a bit and have already worked out some of the differences in that script  19:04
<clarkb> fungi: I've reviewed the changes you pushed  19:28
*** stevebaker has joined #opendev  19:30
<openstackgerrit> Clark Boylan proposed opendev/system-config master: Provision LE cert for
<openstackgerrit> Clark Boylan proposed opendev/system-config master: Switch storyboard to LE cert
<openstackgerrit> Clark Boylan proposed opendev/system-config master: Provision LE cert for
<openstackgerrit> Clark Boylan proposed opendev/system-config master: Switch translate to LE cert
<clarkb> I'd like to get through those three, then next to look at will be openstackid  19:50
<clarkb> if someone can take a look at the ethercalc change that would be great, we can start getting the ball rolling on that  19:54
*** stevebaker has quit IRC  19:54
*** stevebaker has joined #opendev  19:54
<clarkb> the DNS records for all three (ethercalc, storyboard, and translate) should be up, but please dig them and double check :)  19:57
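[Editor's note: the double-check clarkb asks for can be done with dig. A sketch; the `acme_name` helper is mine, and the record type (CNAME vs TXT) depends on how the challenge is delegated, which the log does not state.]

```shell
# acme_name HOST: print the _acme-challenge record name for an opendev.org host.
acme_name() {
  printf '_acme-challenge.%s.opendev.org\n' "$1"
}

# Check each record mentioned in the discussion (network access assumed):
# for host in ethercalc storyboard translate; do
#   dig +short "$(acme_name "$host")" CNAME
# done
```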
*** vishalmanchanda has quit IRC  19:59
*** slaweq has quit IRC  20:14
<clarkb> nb02 has a full /opt. I believe this is what has been causing service-nodepool jobs to fail. I'll look into it shortly  20:15
<clarkb> there are a couple of intermediate vhd files there that belong to images that no longer exist so will start with cleaning those up  20:19
<clarkb> that freed ~60GB  20:21
<clarkb> there is 132GB in dib_tmp  20:34
<clarkb> I'm trying to do cleanup of old dirs in there without needing to stop everything and do the typical reboot we do to flush that  20:37
<corvus> fungi: today is a holiday in parts of europe; i'm planning on sending a reply to service-discuss after giving folks until tomorrow to reply to zuul-discuss.  20:45
<openstackgerrit> Ian Wienand proposed opendev/system-config master: static site
<fungi> corvus: no worries, thanks for the status update!  20:47
<corvus> fungi, clarkb: any chance other opendev communities might have an internal consensus/recommendation by opendev meeting time tomorrow?  20:48
<corvus> (wondering if there would be enough feedback by that point to attempt a resolution tomorrow)  20:49
<clarkb> corvus: engagement has been fairly tepid so far. Though in the openstack tc channel there was a group of people asking for them to speed things up  20:52
<clarkb> I doubt we'll have an answer from them by our meeting tomorrow though  20:52
<fungi> yeah, tomorrow seems soon  21:01
<clarkb> ok dib_image* and dib_build* cleaned up leaving the current build in place  21:02
<clarkb> there are a bunch of profiledir* contents too but those don't appear to be a major disk usage  21:02
<clarkb> I should context switch into putting the meeting agenda together. Get your stuff on there now :)  21:02
<clarkb> fungi: I put topics on the agenda for freenode things and the vexxhost nodepool changes. Any chance you are still around and can take a quick look at that and see if anything else should be added before I send that out?  21:15
<fungi> clarkb: lgtm  21:17
<fungi> honestly i'm so focused on irc things at the moment i can barely recall what else is going on  21:17
<clarkb> no worries, I'm trying to remember those other items for everyone else :)  21:17
<clarkb> ianw: it looks like we may not have built any images for about a week? maybe a little less. Was there a recent dib update?  21:31
<ianw> clarkb: no, no release  21:31
<ianw> i can take a look in a bit  21:32
<clarkb> 2021-05-24 19:37:54.300 | DEBUG diskimage_builder.block_device.utils [-] exec_sudo: losetup: /opt/dib_tmp/dib_image.3HAMohsV/image0.raw: failed to set up loop device: No such file or directory exec_sudo /usr/local/lib/python3.7/site-packages/diskimage_builder/block_device/
<clarkb> ianw: thanks. I'm going to do similar disk cleanup on nb01 that I did on nb02  21:33
<clarkb> did we maybe leak all the loopback devices so we can't make more of them?  21:34
<openstackgerrit> Jeremy Stanley proposed openstack/project-config master: Accessbot OFTC channel list stopgap
<openstackgerrit> Jeremy Stanley proposed openstack/project-config master: Switch the IRC access check to OFTC
<openstackgerrit> Jeremy Stanley proposed openstack/project-config master: Revert "Accessbot OFTC channel list stopgap"
<fungi> that should really pass now  21:42
<fungi> i was only previously running the irc access check locally, and not the config check  21:43
<clarkb> ianw: I think I may wait on further cleanup on nb01. It seems to have leaked quite a bit more than nb02 and the simplest thing there likely is cleaning out /opt/dib_tmp entirely after stopping the container and restarting the server. But I don't want to do that because it may magically fix things and then we don't have good debug info  21:52
<openstackgerrit> Merged openstack/project-config master: Remove gerritbot from channels with no discussion
<clarkb> ianw: looking at losetup --list we seem to be using 8 of 9 /dev/loop devices present in devfs  22:01
<clarkb> nb02 is only using 4 which would explain why it is building all the images and running out of disk  22:03
<clarkb> ianw: if you agree that is the issue I can do the container shutdown, reboot, cleanup, then start container again  22:03
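[Editor's note: the loop-device exhaustion check above can be scripted. A sketch; the `loop_usage` helper is mine, the device paths are standard Linux, and the total device count varies per host.]

```shell
# loop_usage TOTAL: given `losetup --list` output on stdin, report how many
# loop devices are in use out of TOTAL present.
loop_usage() {
  total=$1
  # skip the header line printed by `losetup --list`
  busy=$(tail -n +2 | wc -l)
  echo "$busy of $total loop devices in use"
}

# On a real builder (root typically required for losetup):
# losetup --list | loop_usage "$(ls /dev/loop[0-9]* | wc -l)"
```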
*** openstackgerrit has quit IRC  22:05
<clarkb> ya looks like nb02 may be about to successfully build a centos-8 image too  22:09
*** openstackgerrit has joined #opendev  22:21
<openstackgerrit> Merged opendev/system-config master: Drop meetbot/statusbot from inactive IRC channels
*** DSpider has quit IRC  22:23
<ianw> clarkb: sorry, back now  22:46
<clarkb> ianw: tldr is I think we may have leaked all our loop devices so now dib fails. Easy way to clean that up is to do the stop, reboot, cleanup disk, start process. But not sure if you want to look and see where we might be leaking those things  22:52
<ianw> yeah, the root cause is usually a looping failure of another build  22:52
<ianw> unfortunately we've probably rotated out the logs  22:53
<ianw> i will stop and restart them  22:53
<clarkb> ianw: ok and you'll clean out dib_tmp in the process? nb01 in particular has lost a lot of space to things sticking around in there  22:55
<clarkb> ianw: I also got a stack of LE conversion changes starting at, DNS should be set for those three updates too. In the process I noticed we had some puppet related cleanups we could do which ended up in you added some of those todos I think  23:06
<ianw> ++ will review today  23:07
<ianw> i'm just trying to get that static site we talked about working, but getting sidetracked into fixing ansible documentation  23:07
<clarkb> ya I saw that change go by but haven't reviewed it yet. Is it ready for review now?  23:08
<ianw> not yet, still noodling on the deployment  23:08
*** tosky has quit IRC  23:18
*** openstack has joined #opendev  23:49
*** ChanServ sets mode: +o openstack  23:49
*** jeblairtest has joined #opendev  23:52
*** jeblairtest has left #opendev  23:52
*** jeblairtest has joined #opendev  23:53
*** jeblairtest has left #opendev  23:54
*** jeblairtest has joined #opendev  23:56

Generated by 2.17.2 by Marius Gedminas - find it at!