*** DSpider has quit IRC | 00:40 | |
*** timburke has joined #opendev | 01:26 | |
*** gmann_afk is now known as gmann | 01:30 | |
*** timburke has quit IRC | 02:31 | |
*** timburke has joined #opendev | 04:18 | |
*** gnuoy has quit IRC | 04:33 | |
*** gnuoy has joined #opendev | 04:36 | |
*** ykarel|away has joined #opendev | 04:54 | |
*** ykarel|away is now known as ykarel | 05:00 | |
*** ralonsoh has joined #opendev | 05:28 | |
*** timburke has quit IRC | 05:35 | |
*** fressi has joined #opendev | 05:58 | |
*** ysandeep|away is now known as ysandeep | 06:11 | |
*** slaweq has joined #opendev | 06:33 | |
*** sshnaidm has joined #opendev | 06:58 | |
*** amoralej|off is now known as amoralej | 07:01 | |
openstackgerrit | Dmitriy Rabotyagov proposed opendev/system-config master: Add mirroring of Erlang Solutions repo https://review.opendev.org/c/opendev/system-config/+/792651 | 07:06 |
*** tosky has joined #opendev | 07:20 | |
*** fressi has quit IRC | 07:24 | |
*** andrewbonney has joined #opendev | 07:31 | |
*** slaweq has quit IRC | 07:32 | |
*** slaweq has joined #opendev | 07:35 | |
*** vishalmanchanda has joined #opendev | 07:41 | |
*** ysandeep is now known as ysandeep|lunch | 07:51 | |
*** lucasagomes has joined #opendev | 07:58 | |
iurygregory | Does anyone know the correct place to report that we have a link in https://www.openstack.org that redirects to 404 ? =) | 08:15 |
*** ykarel is now known as ykarel|lunch | 08:27 | |
*** stephenfin has quit IRC | 08:27 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ask.openstack.org static site https://review.opendev.org/c/opendev/system-config/+/792789 | 08:28 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul-jobs master: Fix buildset-registry test on focal https://review.opendev.org/c/zuul/zuul-jobs/+/792349 | 08:35 |
*** ysandeep|lunch is now known as ysandeep | 08:58 | |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul-jobs master: Fix buildset-registry test on focal https://review.opendev.org/c/zuul/zuul-jobs/+/792349 | 09:01 |
*** ykarel|lunch is now known as ykarel | 09:13 | |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul-jobs master: Fix buildset-registry test on focal https://review.opendev.org/c/zuul/zuul-jobs/+/792349 | 09:21 |
*** DSpider has joined #opendev | 09:24 | |
*** jpenag is now known as jpena | 09:30 | |
*** ysandeep has quit IRC | 09:50 | |
*** ysandeep has joined #opendev | 09:51 | |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul-jobs master: Fix buildset-registry test on focal https://review.opendev.org/c/zuul/zuul-jobs/+/792349 | 09:54 |
*** owalsh has quit IRC | 10:05 | |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul-jobs master: Fix buildset-registry test on focal https://review.opendev.org/c/zuul/zuul-jobs/+/792349 | 10:08 |
*** owalsh has joined #opendev | 10:27 | |
*** lpetrut has joined #opendev | 10:43 | |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul-jobs master: Fix buildset-registry test on focal https://review.opendev.org/c/zuul/zuul-jobs/+/792349 | 10:57 |
*** fbo has joined #opendev | 11:22 | |
*** jpena is now known as jpena|lunch | 11:34 | |
fbo | hi, we are experiencing network issues contacting opendev.org, curl -vi https://opendev.org seems flaky | 11:55 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ask.openstack.org static site https://review.opendev.org/c/opendev/system-config/+/792789 | 11:59 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ask.openstack.org static site https://review.opendev.org/c/opendev/system-config/+/792789 | 12:01 |
*** jpena|lunch is now known as jpena | 12:24 | |
*** amoralej is now known as amoralej|lunch | 12:27 | |
dpawlik | cc fungi ^^ | 12:28 |
dpawlik | cc ianw | 12:28 |
*** slittle1 has joined #opendev | 12:35 | |
slittle1 | Are the git servers down? | 12:39 |
*** elod has quit IRC | 12:56 | |
slittle1 | various errors | 12:57 |
slittle1 | fatal: unable to access 'https://opendev.org/starlingx/manifest/': error:1408F10B:SSL routines:ssl3_get_record:wrong version number | 12:57 |
slittle1 | fatal: unable to access 'https://opendev.org/starlingx/manifest/': Failed to connect to 2604:e100:3:0:f816:3eff:fe6b:ad62: Network is unreachable | 12:57 |
slittle1 | fatal: unable to access 'https://opendev.org/starlingx/manifest/': SSL received a record that exceeded the maximum permissible length. | 12:58 |
slittle1 | and some clone requests pass as well | 12:58 |
*** dwilde is now known as d34dh0r53 | 12:59 | |
*** arxcruz is now known as arxcruz|rover | 13:00 | |
*** akahat|ruck is now known as akahat | 13:00 | |
*** raghavendrat has joined #opendev | 13:02 | |
raghavendrat | hi, we have a query regarding the addition of the tag "autogenerated:zuul" in CI results | 13:05 |
*** elod has joined #opendev | 13:05 | |
raghavendrat | if anyone can suggest whom to contact . . . that would be greatly appreciated | 13:06 |
*** ysandeep is now known as ysandeep|ruck | 13:06 | |
*** rishabhhpe has joined #opendev | 13:08 | |
slittle1 | git clone issues started around 6:00 pm May 23 EST | 13:11 |
fungi | fbo: dpawlik: slittle1: sorry, i'm around now and checking on the git servers | 13:13 |
fungi | system resource graphs in cacti for the load balancer and all the backends look relatively normal | 13:17 |
fungi | i can browse opendev.org and git clone something simple like https://opendev.org/opendev/bindep with no problem | 13:17 |
fungi | i wonder if there's a subtle problem with one or more of the backends and i'm just getting lucky | 13:18 |
*** amoralej|lunch is now known as amoralej | 13:18 | |
fungi | fbo: dpawlik: slittle1: can users who are experiencing problems let us know what common name or altname besides plain opendev.org is showing on the ssl certificate for https://opendev.org/ (that'll tell me which backend you're hitting)? | 13:20 |
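One way to answer that from the client side is an openssl check along these lines (a sketch; the giteaNN.opendev.org altname pattern is an assumption based on the backend names mentioned later in this log):

    # Print the certificate names served for a single HTTPS connection to the
    # load balancer; the extra altname identifies which gitea backend answered.
    echo | openssl s_client -connect opendev.org:443 -servername opendev.org 2>/dev/null \
      | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'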
fungi | i'm going to start logging into a shell on all the backends and checking them out more thoroughly | 13:20 |
dpawlik | fungi: i tried multiple times: curl -vi https://opendev.org and one in 5 requests is failing | 13:21 |
fbo | fungi: two curl attempts and the second one is https://softwarefactory-project.io/paste/show/2012/ | 13:23 |
fungi | i discovered that one of the service containers on the gitea08 backend wasn't running so i've downed and upped it again | 13:26 |
fungi | fbo: oh, maybe this is a v6 routing error. can you try cloning over ipv4 and let me know if you see the same? | 13:28 |
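To compare the IPv4 and IPv6 paths explicitly, something like the following works with a reasonably recent curl and git (a sketch, not the exact commands used here):

    # Force the request over IPv4 only, then IPv6 only, to see which path fails
    curl -4 -vI https://opendev.org/
    curl -6 -vI https://opendev.org/
    # git (2.8+) accepts the same switch for clones
    git clone -4 https://opendev.org/opendev/bindep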
*** rishabhhpe has quit IRC | 13:29 | |
*** raghavendrat has quit IRC | 13:29 | |
fungi | dpawlik: slittle1: are your failures also "network is unreachable" or similar? | 13:29 |
fungi | and are you hitting opendev.org's ipv6 or ipv4 address? | 13:29 |
*** mtreinish has quit IRC | 13:29 | |
dpawlik | fungi: it was failing on ipv4, then it switched to v6 | 13:31 |
dpawlik | fungi: seems to be better | 13:31 |
dpawlik | fungi++ | 13:31 |
fungi | well, i'm not sure i've necessarily fixed anything | 13:32 |
dpawlik | nope, it failed once again | 13:32 |
dpawlik | now it fails more often | 13:33 |
fungi | also it looks like i misread the process list and the container i thought was down on gitea08 wasn't | 13:33 |
fungi | i'm trying clones directly from each backend in a loop now, since looping a git clone through the lb didn't break for me | 13:38 |
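A loop along these lines reproduces what is being described (a sketch; the direct-backend hostname and port are assumptions based on gitea08 being mentioned above):

    # Clone a small repo repeatedly through the load balancer; any hang, SSL
    # error, or "network is unreachable" here matches the user reports
    for i in $(seq 1 25); do
        rm -rf /tmp/bindep-test
        git clone -q https://opendev.org/opendev/bindep /tmp/bindep-test || echo "lb attempt $i failed"
    done
    # The same loop can point at a single backend directly, e.g.
    # https://gitea08.opendev.org:3081/opendev/bindep (hostname/port assumed),
    # to isolate one server from the pool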
fungi | all worked for me over ipv6 at least, though i'm getting a sinking suspicion this could be a routing problem somewhere out on the internet which just isn't impacting the path my connection traverses (lucky me i guess?) | 13:45 |
fungi | i can retry them all over ipv4 and see if i can get any clone calls to break | 13:45 |
fungi | okay, tried them all 25x over both ipv4 and ipv6 | 13:52 |
fungi | at this point i expect it's a problem which comes and goes, or it's a routing problem impacting only a subset of the internet, or it's specific to certain git repositories and not others maybe | 13:53 |
fungi | though if just a simple curl of https://opendev.org/ is also failing, then seems like it's not going to be related to specific repositories | 13:53 |
fungi | dpawlik: fbo: are both of you testing from the same network? | 13:54 |
fbo | fungi: I've just tried on a VM located in Canada (I guess) where IPv6 is disabled but it still happens. Connection timeout on 38.108.68.124 with curl -iv https://opendev.org | 13:55 |
fbo | also from my laptop at my place (france) still same issue. | 13:56 |
fungi | yeah, working consistently from home for me. i'll try from machines in some of our different cloud providers and see if i can find one which is impacted | 13:57 |
fungi | at least then i should be able to traceroute in both directions and work out where things might be breaking | 13:57 |
fbo | thanks | 13:59 |
fungi | no problems from rackspace dfw/iad/ord, ovh bhs1/gra1, limestone regionone, linaro-us regionone... but i got it to break from inap mtl01 | 14:13 |
fungi | so i'll investigate from there | 14:13 |
fungi | and it's definitely as reported by others, most connections work, random ones hang | 14:14 |
fungi | also i think the ipv6 network unreachable error is a secondary error, this seems endemic to ipv4-only hosts where curl tries to fall back to v6 anyway after the v4 connection is not looking great | 14:15 |
fungi | oh this is even more worrisome... the traceroute between them suggests that inap and vexxhost are both peering with cogent, so the traceroute is entirely within that backbone provider | 14:20 |
fungi | cogent-only when tracerouting in both directions | 14:23 |
fungi | by comparison, the rackspace regions are all connecting to vexxhost through zayo (a.k.a. 360networks/pacific fiber) | 14:27 |
fungi | same for the ovh regions (bhs1 directly via zayo, gra1 takes abovenet to zayo to vexxhost) | 14:32 |
fungi | mnaser: ^ just a heads up, seems like there may be some packet loss/blackholes in cogent impacting availability of sjc1 v4 addresses from some parts of the internet | 14:35 |
mnaser | fungi: do you have pairs of ips that are problematic? i can file a ticket (thanks for checking) | 14:37 |
mnaser | cc guilhermesp | 14:37 |
mnaser | ^ | 14:37 |
fungi | mnaser: trying to reach 38.108.68.124 from 198.72.125.6 for example | 14:38 |
fungi | it's random, maybe 20% of the connections i originate from 198.72.125.6 to 38.108.68.124 don't complete, but we have users reporting similar problems from various networks | 14:38 |
mnaser | fungi: what if you try hitting 38.140.50.137 | 14:39 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ask.openstack.org static site https://review.opendev.org/c/opendev/system-config/+/792789 | 14:39 |
*** ykarel is now known as ykarel|away | 14:40 | |
fungi | mnaser: via what protocol? i've been testing over http(s) but can try to see if i can reproduce the problem with large icmp packets or something (just normal ping doesn't seem to show packet loss though) | 14:41 |
mnaser | fungi: when i've seen cogent have this sort of crap, i am able to repro using mtr enabling tcp mode | 14:41 |
fungi | i'll give that a shot, ping -s 1472 doesn't seem to show it happening | 14:43 |
fungi | mnaser: bingo, mtr --tcp 38.140.50.137 is showing packet loss in cogent's sfo pop | 14:45 |
fungi | around 20%, which is right on the mark for the failure percentage rates reported | 14:46 |
mnaser | fungi: would you mind letting it run for a bit and sharing the mtr output? | 14:47 |
fungi | mnaser: sure, it's in curses mode which is probably not that useful (and my terminal is rather small). i'll see if i can get it to do log output instead | 14:48 |
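For a pasteable result, mtr's report mode avoids the curses display entirely (a sketch of the sort of invocation meant here):

    # Send a fixed number of TCP SYN probes to port 443 and print a plain-text
    # per-hop loss/latency report instead of the interactive curses view
    mtr --tcp --port 443 --report --report-cycles 100 38.140.50.137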
fungi | i was also working on mtr in the other direction for comparison | 14:49 |
fungi | in the other direction we see ~20% loss starting at slc (the hop after sfo) so seems like maybe they've got a problem link between sfo and slc or something | 14:49 |
mnaser | fungi: appreciate this information | 14:50 |
*** lpetrut has quit IRC | 14:50 | |
fungi | mnaser: is raw mode from mtr sufficient? or do you want something you don't need to parse? | 14:51 |
mnaser | fungi: this is mostly to be parsed by humans =) | 14:52 |
fungi | i'll see if i can get a capture of the curses interface in that case | 14:53 |
fungi | just need a tiny enough font to get it to all fit in the terminal | 14:53 |
clarkb | fungi: re docker processes you can use `docker ps -a` to see docker's view | 15:01 |
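For reference, docker's view of all containers, running or exited, can be narrowed to the useful columns (a minimal sketch):

    # Include stopped/exited containers so a crashed service container shows up
    docker ps -a --format 'table {{.Names}}\t{{.Status}}\t{{.Image}}'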
fungi | mnaser: http://paste.openstack.org/show/805650/ | 15:01 |
fungi | clarkb: yep, it was mainly that i was expecting to see more gitea processes based on comparison with another backend, but apparently it creates on-demand forks | 15:02 |
fungi | and there were a number of semi-recent gitea segfaults in dmesg which got me suspicious | 15:03 |
mnaser | fungi: do you have a few cycles by any chance to run the same (from say, opendev mirror at sjc1?) | 15:03 |
mnaser | to 198.x | 15:03 |
fungi | mnaser: sure | 15:04 |
fungi | unfortunately i can't run too many mtr cycles, it bails quickly with "address already in use" (i guess it has trouble opening enough tcp sockets?) | 15:06 |
fungi | mnaser: http://paste.openstack.org/show/805651/ | 15:08 |
fungi | slittle1: dpawlik: fbo: anyway, to summarize, it looks like there's something going on in cogent around or between their san francisco and salt lake city pops, so connections traversing that link are having problems. thankfully mnaser is a customer of theirs so can directly report it at least | 15:10 |
fbo | fungi: mnaser thanks a lot for the investigation ! | 15:11 |
fungi | i need to step away for a few minutes, but will return shortly | 15:11 |
mnaser | fbo: are you able to maybe run an mtr as well? | 15:12 |
fbo | mnaser: probably what command should I run ? | 15:13 |
mnaser | fbo: mtr --tcp opendev.org | 15:13 |
mnaser | should be okay | 15:13 |
clarkb | raghavendrat is no longer here but you need to set that flag on review comments from the CI system. https://review.opendev.org/Documentation/cmd-review.html has a --tag flag when reviewing via ssh and https://review.opendev.org/Documentation/rest-api-changes.html#review-input has a tag attribute when reviewing via http REST api | 15:14 |
clarkb | raghavendrat's CI system will need to do one or the other. Zuul does this by default if you use zuulv3 or newer and tell it to review by http iirc | 15:14 |
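Concretely, both interfaces mentioned accept the tag; in the sketch below the change number, patchset, credentials, and messages are placeholders:

    # SSH command-line interface: tag the CI comment as autogenerated
    ssh -p 29418 ci-user@review.opendev.org gerrit review \
        --tag autogenerated:zuul --message '"Build succeeded."' 12345,1

    # REST API: include the tag attribute in the ReviewInput document
    curl -u ci-user:http-password -X POST -H 'Content-Type: application/json' \
        -d '{"tag": "autogenerated:zuul", "message": "Build succeeded."}' \
        https://review.opendev.org/a/changes/12345/revisions/current/review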
fbo | mnaser: https://softwarefactory-project.io/paste/show/2013/ | 15:17 |
mnaser | fbo: ok i see, so you're coming in from zayo, but egressing via cogent which is causing the packet loss | 15:17 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Provision ethercalc LE cert https://review.opendev.org/c/opendev/system-config/+/792708 | 15:26 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Switch ethercalc to the new LE cert https://review.opendev.org/c/opendev/system-config/+/792827 | 15:26 |
*** d34dh0r53 has quit IRC | 15:29 | |
mnaser | fungi, fbo: i've shutdown the cogent bgp peer, waiting for things to propagate | 15:34 |
fungi | thanks! | 15:34 |
clarkb | infra-root ^ I have gone ahead and created acme challenge records for ethercalc, wiki, storyboard, and translate. The wiki one may not be useful, but figured while I was in DNS edit mode I should add it | 15:34 |
clarkb | I think we can probably go ahead and land 792708 if it looks correct to others and start pushing on that | 15:34 |
fbo | mnaser: thanks | 15:35 |
clarkb | I do need to pop out for some errands this morning though (new glasses yay I'll see everything at odd angles for a few days) | 15:35 |
*** d34dh0r53 has joined #opendev | 15:43 | |
*** lucasagomes has quit IRC | 16:02 | |
slittle1 | k, I'm back | 16:27 |
slittle1 | holiday up here, so I can only touch base infrequently | 16:28 |
slittle1 | git clone https://opendev.org/starlingx/manifest | 16:31 |
slittle1 | Cloning into 'manifest'... | 16:31 |
slittle1 | error: RPC failed; result=35, HTTP code = 0 | 16:31 |
slittle1 | fatal: The remote end hung up unexpectedly | 16:31 |
slittle1 | want me to try something ? | 16:31 |
*** ykarel|away has quit IRC | 16:32 | |
fungi | slittle1: i think at this point we're waiting for mnaser's bgp announcement changes to propagate | 16:33 |
fungi | i also still see issues from inap's mtl01 region, and it's still going through cogent | 16:33 |
fungi | slittle1: not sure if you followed the rather lengthy scrollback but it's nothing we have direct control over, seems to be some routing problems between cogent's sfo and slc pops | 16:34 |
fungi | so any connections traversing cogent to get to vexxhost sjc1 may experience problems | 16:34 |
fungi | seems like roughly 1 in 5 tcp connections traversing that link end up in a black hole | 16:36 |
*** rishabhhpe has joined #opendev | 16:38 | |
fungi | slittle1: https://ecogent.cogentco.com/network-status reports "Some customers are experiencing some routing issues in the US. This has been escalated to our IP Engineering Group for further investigation. No ETR yet. Our trouble ticket on this issue is HD11634166." | 16:40 |
fungi | unfortunately very vague | 16:41 |
slittle1 | seems to have improved in the last 5 minutes or so | 16:42 |
*** amoralej is now known as amoralej|off | 16:43 | |
slittle1 | 3 for 3 'repo sync' successful .... that's ~200 git clone/fetch ... I think it's resolved | 16:46 |
mnaser | slittle1: it seems to be improving indeed | 16:46 |
mnaser | fungi: cogent's network status page is a bit of a joke sadly | 16:46 |
mnaser | fungi: it's a bit like this - https://kerrywills.files.wordpress.com/2018/05/dgkhi6yw0aaq3f1.jpg | 16:47 |
fungi | hah | 16:47 |
fungi | slittle1: mnaser: yes, looks like cogent may have fixed or worked around the problem. mtr still shows me routing through slc and sfo but no errors now, and i don't get any more random hangs with curl | 16:49 |
*** andrewbonney has quit IRC | 17:05 | |
*** ysandeep|ruck is now known as ysandeep|away | 17:14 | |
*** ralonsoh has quit IRC | 17:23 | |
*** jpena is now known as jpena|off | 17:23 | |
clarkb | infra-root https://review.opendev.org/c/opendev/system-config/+/792708 should be ready to go now and https://review.opendev.org/c/opendev/system-config/+/792709 is a cleanup of things I noticed when writing the LE change | 17:33 |
*** timburke has joined #opendev | 17:37 | |
clarkb | re ansible forks speeding things up it seems that infra-prod-service-zuul would typically take 8.5-12.5 minutes (usually on the lower end of that range) previously and now runs in 6.5-7.25 minutes | 17:47 |
fungi | nice! so nearly half the time | 17:55 |
clarkb | I suspect that the longer runtimes are due to zuul image updates | 17:56 |
clarkb | so the 6.5 vs 8.5 comparison may be more valuable as a typical run. That's a decent improvement too though | 17:56 |
clarkb | when you think about it in aggregate across the other jobs we run, which should also benefit | 17:56 |
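The knob being discussed is ansible's forks setting; the value below is illustrative, not the one actually deployed:

    # ansible.cfg: run tasks against more hosts in parallel than the default of 5
    [defaults]
    forks = 20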
*** sshnaidm is now known as sshnaidm|afk | 18:02 | |
*** rishabhhpe has quit IRC | 18:08 | |
clarkb | fungi: did you want to respond to the questions from ianw on the opendev irc bots discussion? or would you prefer someone else give it a go? | 18:34 |
fungi | clarkb: i can probably reply in a bit. i'm just about there on check_irc_access.py | 18:37 |
*** auristor has quit IRC | 18:38 | |
*** auristor has joined #opendev | 18:39 | |
*** rishabhhpe has joined #opendev | 18:40 | |
fungi | yay! i finally have it passing locally | 18:47 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Clean up accessbot channels without services https://review.opendev.org/c/openstack/project-config/+/792841 | 18:54 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Accessbot OFTC channel list stopgap https://review.opendev.org/c/openstack/project-config/+/792842 | 18:54 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Switch the IRC access check to OFTC https://review.opendev.org/c/openstack/project-config/+/792843 | 18:54 |
fungi | infra-root: JayF: ^ marking wip for the moment but that should pass | 18:54 |
JayF | thank you! | 18:55 |
fungi | i need to take a break to cook dinner, but will reply to the service-discuss thread about it once i'm done eating | 18:57 |
clarkb | fungi: thanks! | 19:02 |
*** rishabhhpe has quit IRC | 19:02 | |
fungi | i'll also try to tackle starting to make accessbot itself work after that | 19:03 |
fungi | now that i have the irc module paged into my brain a bit and have already worked out some of the differences in that script | 19:04 |
clarkb | fungi: I've reviewed the changes you pushed | 19:28 |
fungi | thanks | 19:30 |
*** stevebaker has joined #opendev | 19:30 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Provision LE cert for storyboard.openstack.org https://review.opendev.org/c/opendev/system-config/+/792852 | 19:41 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Switch storyboard to LE cert https://review.opendev.org/c/opendev/system-config/+/792853 | 19:41 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Provision LE cert for translate.openstack.org https://review.opendev.org/c/opendev/system-config/+/792854 | 19:49 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Switch translate to LE cert https://review.opendev.org/c/opendev/system-config/+/792855 | 19:49 |
clarkb | I'd like to get through those three, then next to look at will be openstackid | 19:50 |
clarkb | if someone can take a look at the ethercalc change that would be great https://review.opendev.org/c/opendev/system-config/+/792708 we can start getting the ball rolling on that | 19:54 |
*** stevebaker has quit IRC | 19:54 | |
*** stevebaker has joined #opendev | 19:54 | |
clarkb | the DNS records for all three (ethercalc, storyboard, and translate) should be up, but please dig them and double check :) | 19:57 |
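A quick check along these lines confirms the records resolve; the record type, target, and the ethercalc hostname are assumptions about how the acme challenges are delegated:

    # Confirm each _acme-challenge name resolves before landing the LE changes
    for host in ethercalc.openstack.org storyboard.openstack.org translate.openstack.org; do
        echo "== $host"
        dig +short CNAME _acme-challenge.$host
    done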
*** vishalmanchanda has quit IRC | 19:59 | |
*** slaweq has quit IRC | 20:14 | |
clarkb | nb02 has a full /opt. I believe this is what has been causing service-nodepool jobs to fail. I'll look into it shortly | 20:15 |
clarkb | there are a couple of intermediate vhd files there that belong to images that no longer exist, so I'll start with cleaning those up | 20:19 |
clarkb | that freed ~60GB | 20:21 |
clarkb | there is 132GB in dib_tmp | 20:34 |
clarkb | I'm trying to do cleanup of old dirs in there without needing to stop everything and do the typical reboot we do to flush that | 20:37 |
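One cautious way to do that kind of cleanup, sketched here as an assumption rather than the exact commands run:

    # List dib temp dirs in /opt/dib_tmp untouched for more than a day; the
    # current in-progress build is recent and so stays out of the list
    find /opt/dib_tmp -maxdepth 1 \( -name 'dib_build.*' -o -name 'dib_image.*' \) -mtime +1
    # after reviewing the output, remove the stale entries, e.g.
    # sudo rm -rf /opt/dib_tmp/dib_image.XXXXXXXX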
corvus | fungi: today is a holiday in parts of europe; i'm planning on sending a reply to service-discuss after giving folks until tomorrow to reply to zuul-discuss. | 20:45 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: ask.openstack.org static site https://review.opendev.org/c/opendev/system-config/+/792789 | 20:47 |
fungi | corvus: no worries, thanks for the status update! | 20:47 |
corvus | fungi, clarkb: any chance other opendev communities might have an internal consensus/recommendation by opendev meeting time tomorrow? | 20:48 |
corvus | (wondering if there would be enough feedback by that point to attempt a resolution tomorrow) | 20:49 |
clarkb | corvus: engagement has been fairly tepid so far. Though in the openstack tc channel there was a group of people asking for them to speed things up | 20:52 |
clarkb | I doubt we'll have an answer from them by our meeting tomorrow though | 20:52 |
fungi | yeah, tomorrow seems soon | 21:01 |
clarkb | ok dib_image* and dib_build* cleaned up leaving the current build in place | 21:02 |
clarkb | there are a bunch of profiledir* contents too but those don't appear to be a major disk usage | 21:02 |
clarkb | I should context switch into putting the meeting agenda together. Get your stuff on there now :) | 21:02 |
clarkb | fungi: I put topics on the agenda for freenode things and the vexxhost nodepool changes. Any chance you are still around and can take a quick look at that and see if anything else should be added before I send that out? | 21:15 |
fungi | clarkb: lgtm | 21:17 |
fungi | honestly i'm so focused on irc things at the moment i can barely recall what else is going on | 21:17 |
clarkb | no worries I'm trying to remember those other items for everyone else :) | 21:17 |
fungi | thanks! | 21:17 |
clarkb | ianw: it looks like we may not have built any images for about a week? maybe a little less. Was there a recent dib update? | 21:31 |
ianw | clarkb: no, no release | 21:31 |
ianw | i can take a look in a bit | 21:32 |
clarkb | 2021-05-24 19:37:54.300 | DEBUG diskimage_builder.block_device.utils [-] exec_sudo: losetup: /opt/dib_tmp/dib_image.3HAMohsV/image0.raw: failed to set up loop device: No such file or directory exec_sudo /usr/local/lib/python3.7/site-packages/diskimage_builder/block_device/utils.py:135 | 21:32 |
clarkb | ianw: thanks. I'm going to do similar disk cleanup on nb01 that I did on nb02 | 21:33 |
clarkb | did we maybe leak all the loopback devices so we can't make more of them? | 21:34 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Accessbot OFTC channel list stopgap https://review.opendev.org/c/openstack/project-config/+/792842 | 21:41 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Switch the IRC access check to OFTC https://review.opendev.org/c/openstack/project-config/+/792843 | 21:41 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Revert "Accessbot OFTC channel list stopgap" https://review.opendev.org/c/openstack/project-config/+/792857 | 21:41 |
fungi | that should really pass now | 21:42 |
fungi | i was only previously running the irc access check locally, and not the config check | 21:43 |
clarkb | ianw: I think I may wait on further cleanup on nb01. It seems to have leaked quite a bit more than nb02 and the simplest thing there is likely cleaning out /opt/dib_tmp entirely after stopping the container and restarting the server. But I don't want to do that because it may magically fix things and then we don't have good debug info | 21:52 |
openstackgerrit | Merged openstack/project-config master: Remove gerritbot from channels with no discussion https://review.opendev.org/c/openstack/project-config/+/792301 | 21:59 |
clarkb | ianw: looking at losetup --list we seem to be using 8 of 9 /dev/loop devices present in devfs | 22:01 |
clarkb | nb02 is only using 4 which would explain why it is building all the images and running out of disk | 22:03 |
clarkb | ianw: if you agree that is the issue I can do the container shutdown, reboot, cleanup, then start container again | 22:03 |
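The state being described can be inspected, and a confirmed leak released, like this (a sketch; the device name is illustrative):

    # Show attached loop devices and their backing files; entries pointing at
    # deleted files under /opt/dib_tmp are the suspected leaks
    losetup --list
    # detach a leaked device once nothing is using it
    sudo losetup -d /dev/loop3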
*** openstackgerrit has quit IRC | 22:05 | |
clarkb | ya looks like nb02 may be about to successfully build a centos-8 image too | 22:09 |
*** openstackgerrit has joined #opendev | 22:21 | |
openstackgerrit | Merged opendev/system-config master: Drop meetbot/statusbot from inactive IRC channels https://review.opendev.org/c/opendev/system-config/+/792302 | 22:21 |
*** DSpider has quit IRC | 22:23 | |
ianw | clarkb: sorry, back now | 22:46 |
clarkb | ianw: tldr is I think we may have leaked all our loop devices so now dib fails. Easy way to clean that up is to do the stop, reboot, clean up disk, start process. But not sure if you want to look and see where we might be leaking those things | 22:52 |
ianw | yeah, the root cause is usually a looping failure of another build | 22:52 |
ianw | unfortunately we've probably rotated out the logs | 22:53 |
ianw | i will stop and restart them | 22:53 |
clarkb | ianw: ok and you'll clean out dib_tmp in the process? nb01 in particular has lost a lot of space to things sticking around in there | 22:55 |
ianw | yep | 22:55 |
clarkb | ianw: I also got a stack of LE conversion changes starting at https://review.opendev.org/c/opendev/system-config/+/792708, DNS should be set for those three updates too. In the process I noticed we had some puppet related cleanups we could do which ended up in https://review.opendev.org/c/opendev/system-config/+/792709 you added some of those todos I think | 23:06 |
ianw | ++ will review today | 23:07 |
ianw | i'm just trying to get that ask.openstack.org static site we talked about working, but getting sidetracked into fixing ansible documentation | 23:07 |
clarkb | ya I saw that change go by but haven't reviewed it yet. Is it ready for review now? | 23:08 |
ianw | not yet, still noodling on the deployment | 23:08 |
clarkb | ok | 23:08 |
*** tosky has quit IRC | 23:18 | |
*** openstack has joined #opendev | 23:49 | |
*** ChanServ sets mode: +o openstack | 23:49 | |
*** jeblairtest has joined #opendev | 23:52 | |
*** jeblairtest has left #opendev | 23:52 | |
*** jeblairtest has joined #opendev | 23:53 | |
*** jeblairtest has left #opendev | 23:54 | |
*** jeblairtest has joined #opendev | 23:56 |