Monday, 2021-06-14

opendevreviewMerged opendev/irc-meetings master: Move nova meeting to #openstack-nova
*** amoralej <amoralej!> has joined #opendev14:06
opendevreviewHervé Beraud proposed openstack/project-config master: Adding missing zuul for etcd3gw
fungi#status log Restarted the ircbot container on eavesdrop to troubleshoot why opendevmeet isn't joining all its configured channels14:40
opendevstatusfungi: finished logging14:40
opendevreviewHervé Beraud proposed openstack/project-config master: Adding missing zuul for etcd3gw
fungicontinuing discussion from #openstack-infra just now, to summarize, the new limnoria meetbot is joining (a seemingly random) 86 of its 88 configured channels15:03
fungilooking in i don't see any issue reports which look relevant to this problem15:03
fungiat the moment the bot has skipped joining #puppet-openstack and #starlingx at its 14:40 utc restart15:06
fungioh! the log says "Holding JOIN to None @ oftc until identified."15:11
fungii wonder if that means it didn't actually wait to start trying to join channels15:12
fungithough it also shouldn't need to be identified to join our channels now15:13
fungiso that's probably not it15:13
clarkbfungi: is it a constantly rotating set or random at start then consistent?15:17
fungiseems like random at start15:18
fungion friday when it was started for good, it skipped #openstack-neutron and at least one other channel15:18
fungihalf an hour ago when i restarted it for the first time since then, it picked different channels15:19
clarkbinteresting. I think that limnoria handles all of that and we don't do any channel joining in our plugins15:19
fungii'm currently staring at though i don't think the problem is the loop there15:20
clarkbfungi: debugging idea: what if we drop ~5 channels and restart. Do we join all of them or do we still get an off by 2?15:21
clarkbwondering if maybe the size of the list is at fault15:21
fungiyeah, i'm just hesitant to restart while there are meetings in progress since those will get truncated15:22
fungiotherwise there are a few things we could try to get a clearer definition of the behavior15:22
clarkbI'm going to start working on that reboot15:46
clarkbinfra-root: for ^ my plan is to switch the docs and tarballs RW site over to Then hold all the locks on mirror-update then reboot afs01.dfw15:47
clarkbdoes that sound correct to you?15:47
fungiyeah, just be prepared for moving those two rather large rw volumes to take some time15:51
clarkbs/also do/also move the RW volume/15:51
clarkbfungi: yup I'll run the commands in a screen on afsdb01 and use localauth15:52
fungithe smaller ones, if you want to do them, would probably go quickly15:52
clarkb`vos move -id docs -toserver -topartition vicepa -fromserver -frompartition vicepa -localauth` is an example command s/docs/project.tarballs/ and so on15:52
clarkbfungi: oh good point. Ya why don't I do these larger ones first and see what it looks like then decide on the smaller ones or not15:52
fungiyou could even test the waters with a small one and then do the bigger ones15:53
fungiplenty of options15:53
fungijust make sure there's sufficient space i guess?15:53
clarkbfungi: oh does it make a new copy when I do that?15:53
clarkbThere is a RO copy on afs02 already for each of these15:53
fungii don't know that it's a full copy, but it can diverge for sure15:53
fungii guess if both servers have the same amount of block storage then no need to worry15:54
clarkbafs02 seems to have more disk than 0115:54
clarkbI suspect that means we're ok15:54
fungiyeah, agreed15:54
clarkbs/more disk/more free disk/15:54
clarkbI'll do project.specs first15:54
clarkbas that should be small and give some info on what this looks like15:55
clarkb`vos move -id project.specs -toserver -topartition vicepa -fromserver -frompartition vicepa -localauth` will be run in screen on afsdb01 shortly15:56
clarkbI might just do all the project.* volumes if they don't take too long (other than probably tarballs)15:57
clarkbya specs went pretty quickly. I'll do all of project.* that isn't project.tarballs first. Then do tarballs then do docs. Then we can hold locks on mirror-update and reboot I think15:59
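The move order described above (the small project.* volumes first, then project.tarballs, then docs) can be sketched as a small helper that prints the `vos move` invocations. This is only an illustration: the flags are the ones quoted in the log, while `vos_move_command`, `move_order`, and the `FROM.example` / `TO.example` server names are hypothetical placeholders, since the real hostnames do not appear in the log.

```python
import shlex


def vos_move_command(volume, from_server, to_server, partition="vicepa"):
    """Build the `vos move` command line for one volume.

    The server arguments are placeholders supplied by the caller;
    the flag names are the ones quoted in the channel.
    """
    return shlex.join([
        "vos", "move",
        "-id", volume,
        "-toserver", to_server,
        "-topartition", partition,
        "-fromserver", from_server,
        "-frompartition", partition,
        "-localauth",
    ])


def move_order(volumes):
    """Order volumes: other volumes first, then project.tarballs, then docs."""
    def rank(name):
        if name == "docs":
            return 2
        if name == "project.tarballs":
            return 1
        return 0
    # sorted() is stable, so volumes with the same rank keep their input order.
    return sorted(volumes, key=rank)


if __name__ == "__main__":
    for vol in move_order(["docs", "project.tarballs", "project.specs"]):
        # FROM.example / TO.example are hypothetical hostnames.
        print(vos_move_command(vol, "FROM.example", "TO.example"))
```

Printing the commands rather than running them leaves room to review each one before pasting it into the screen session.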
fungisounds great!16:01
clarkbfungi: is a related fixup16:06
clarkbto the docs16:06
fungiyep, looks like those were typos16:07
clarkbok all but tarballs are done so not too bad for those. I'm doing tarballs now which I expect to be much slower16:09
clarkber I guess docs hasn't been done either16:09
clarkbI've started to grab locks on mirror-update too16:19
clarkbI don't expect we'll be rebooting in the immediate future but didn't want it to drag on later than necessary if I was waiting for those later too16:20
corvus   (spoiler alert: real data, not just "yes")16:24
clarkbwow that graph in the top left is telling16:27
clarkb(there is a clear trend)16:27
fungiit's nice to see that our frenzied^Wcalm and orderly relocation didn't move the needle on oftc in the slightest16:32
clarkbI have all the mirror-update locks held now but the one for ubuntu mirroring16:39
clarkbtarballs is still moving its RW site16:39
clarkbfungi: disk use on afs02 has gone up slightly since I started the tarballs move fwiw16:46
clarkband now i have all of the mirror-update locks16:49
clarkbfungi: it seems to do one large join command. Could it simply be truncation due to exceeding 512 bytes in that join?17:04
clarkbhrm it seems to try and address that in the code you linked earlier17:05
opendevreviewDanni Shi proposed openstack/diskimage-builder master: Add a keylime-agent element and a tpm-emulator element.
clarkbya ok it is trying to construct the fewest number of join commands necessary. I wonder if it is losing channels on boundaries17:06
fungithat's entirely possible17:07
clarkbyup I think I see the issue. fungi that needs to include the current channel in the loop I think because we're removing it from the list and then skipping to the next channel17:08
clarkbbasically when we hit the 512 byte limit we add the previous (old) msg to the list of join msgs. Then we skip to the next channel and I think we lose the current one17:09
fungimakes sense, so we likely are splitting the channel list into 3 messages17:10
fungiand losing two boundary entries between them17:11
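The boundary loss described above can be reproduced with a simplified sketch. This is not Limnoria's actual code: `batch_joins_buggy`, `batch_joins_fixed`, and the `#example-chanNN` channel names are hypothetical, and the only assumptions carried over are the 512-byte IRC line limit (including the trailing CRLF) and a list of 88 channels.

```python
MAX_LEN = 512  # IRC line limit in bytes, including the trailing CRLF


def join_line(channels):
    """Render one JOIN command for a batch of channels."""
    return "JOIN " + ",".join(channels) + "\r\n"


def batch_joins_buggy(channels, limit=MAX_LEN):
    """Split channels into JOIN lines, dropping one channel per boundary."""
    msgs, current = [], []
    for chan in channels:
        candidate = current + [chan]
        if len(join_line(candidate).encode()) > limit:
            msgs.append(join_line(current))
            current = []  # bug: `chan` is neither sent nor carried over
            continue
        current = candidate
    if current:
        msgs.append(join_line(current))
    return msgs


def batch_joins_fixed(channels, limit=MAX_LEN):
    """Same batching, but the overflowing channel starts the next batch."""
    msgs, current = [], []
    for chan in channels:
        candidate = current + [chan]
        if current and len(join_line(candidate).encode()) > limit:
            msgs.append(join_line(current))
            current = [chan]  # keep the channel that did not fit
        else:
            current = candidate
    if current:
        msgs.append(join_line(current))  # flush the final partial batch
    return msgs


def joined(msgs):
    """Recover the list of channels named across all JOIN lines."""
    out = []
    for msg in msgs:
        out.extend(msg[len("JOIN "):].strip().split(","))
    return out


if __name__ == "__main__":
    channels = [f"#example-chan{i:02d}" for i in range(88)]
    for batcher in (batch_joins_buggy, batch_joins_fixed):
        msgs = batcher(channels)
        missing = sorted(set(channels) - set(joined(msgs)))
        print(batcher.__name__, len(msgs), "messages, missing:", missing)
```

With 15-character names, 31 channels fit in one line, so 88 channels split into three JOIN lines. The buggy variant drops the channel at each of the two boundaries, which matches the 86 of 88 symptom. Which channels land on a boundary depends on the order and length of the names in the list.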
fungiand people with ~1/3 the number of channels aren't running into this (or maybe our channel names are longer than usual)17:12
fungii'll try to hand patch a suspected fix in and then restart the container when there's a lull in meetings17:12
fungiluckily not many teams like to meet on mondays17:12
clarkbfungi: I think you can copy the if key: else: conditionals above the 512 check to line 345 ish after we reset the lists to ensure that the current channel remains in contention on the next pass through17:13
clarkbHowever, that will still be broken if we hit the limit at the end of the loop17:14
clarkboh I think I see a better fix. Let me try and write that up17:14
clarkbfungi: also we are already running a small fork that ianw modified so we can use that if we can't get stuff upstreamed17:15
fungithanks, i thought the problem was likely around that spot, but was totally not seeing it17:16
clarkb something like that maybe17:19
fungii'll give it a shot17:19
clarkbit almost seems like it is copying the entirety of project.tarballs? we have room for it (I double checked) and docs is about 10% of tarballs size so this is the one that will give us trouble if any do17:31
fungiclarkb: okay, i've hand-patched your proposed fix into in the unpacked container tree. looks like there's no meetings underway right now so i'll restart the container17:50
fungid'oh, i should have done stop/start not down/up18:01
clarkbah down up will undo your local changes :)18:01
fungii was confused for a bit why there was no change in behavior, then realized what i'd done :/18:02
fungistill doesn't look like anyone's running a meeting, so restarting it again properly this time18:04
fungiclarkb: yep, that seems to have fixed it, thanks!18:13
clarkbyay. I guess I should make a fork on github and submit a PR18:13
clarkbgot a quick response, but they want me to add tests so trying to figure that out now18:32
clarkbfungi: do you understand the limnoria test suite? I cannot figure out how to run a single test18:54
clarkbI think I managed to hack up the actual test driver to not load extra stuff. clunky but probably good enough18:57
fungiahh, no i hadn't even tried looking at their tests18:57
clarkbtarballs has moved. docs is moving now19:02
fungihow long did tarballs take roughly?19:02
fungithankfully those servers are in the same network19:02
clarkbjust over 2 hours I think19:07
fungifar faster than i was anticipating19:08
clarkbthe config for this thing is incredibly complex and I'm having a hard time understanding how to construct one from scratch to get a working test :/19:23
fungialso i need to disappear for dinner in a few minutes, but will be back on later19:31
opendevreviewMerged opendev/system-config master: Fix some hostnames in afs docs
clarkbI figured out the tests for limnoria and pushed a new commit up. The doc volume is still moving20:10
opendevreviewJames E. Blair proposed opendev/system-config master: Use grafyaml container image
clarkbdocs is done moving20:35
clarkbinfra-root does anyone else want to double check there isn't any other volume(s) that should be moved before I reboot
clarkbI've realized that I think afs02.dfw never had a RO version of docs, but that should be fine, it just uses a bit more disk on afs02 now as a result of the RW move (and RO remain on afs01.dfw and afs01.ord)20:38
clarkbfungi: if you get back from dinner ^ can you check the listvldb state and decide if you think we're ready for a reboot? I need to step out for a few myself so will wait for your input I think20:39
clarkbI'll give it to the top of the hour then proceed with the reboot if i don't hear back. Want to make sure I can try and get things back the way they were post reboot too20:55
clarkbalright I'm going for it now21:00
clarkbreboot is done and bos status reports currently running normally21:01
clarkbI'll undo the volume moves so that we don't have a mixed set (will make future reboots easier if we don't have to move stuff to reboot 02)21:02
clarkband I'll release my mirror-update locks21:02
clarkball of project.* is moved back again except tarballs. Tarballs is in progress now. When that is done docs is the last one to move back21:09
fungisorry, back now and catching up21:51
fungiclarkb: for me `vos listvldb` shows the docs volume is currently locked but with rw site on afs02.dfw, most project.* volumes are still showing rw is on afs01 instead (though project.tarballs shows afs02)21:55
fungior most are once again showing afs01 i guess, rereading what you wrote21:55
fungii guess it went smoothly?21:56
fungiianw: if you're around, tl;dr is probably worth following22:04
fungiit's already merged, but we'll want to update our fork so we can properly replace the hand-patched situation22:05
ianwo/ just reading now :)22:05
fungicame up early in my day when it was reported that #openstack-neutron lost channel logging friday22:05
ianwthe only thing we have my fork for is setuptools with , i'm not sure what extra they want there, can give a ping22:07
ianwbut i'll pull in that fix and we can quickly rebuild the container22:07
fungialso on the oftc migration front, is finally ready to merge thanks to help from an old friend22:09
corvusinfra-root: mind if i set the avatar image for the portal room to the opendev logo?22:11
fungicorvus: not only do i not mind, i'm entirely in favor--thanks for noticing!22:12
corvusalso, lol underscores got eaten :)22:13
JayFI actually suggested matrix for a new internal openstack contributor today :) Knocking down that "now setup an IRC bouncer" wall22:13
fungiunderscores schumnerscores22:13
JayFSo thanks for making so much noise about that during the battle of Freenode22:13
fungiheh, "the battle of freenode" has a nice ring to it22:14
JayFI heard someone refer to it as "Fleenode" today, but I figured that might be a /little/ too direct for this channel :P22:14
corvusJayF: ++ i made some instructions for zuul for new users; should be suitable for s/#zuul/#something/
corvusJayF: direct link to the page as it will render:
fungii want to write a matrix mud bridge22:15
corvusalso -- lol sphinx ate that url :)22:15
fungithe sphinx hungers22:16
JayFcorvus: thanks for that; I shared it.22:16
corvusi mean, i like that the instructions can be as simple as "go to" and follow the prompts, but everyone has different learning styles, so step-by-step with screenshots can be useful22:17
JayFMy instructions were something along the lines of "google for `matrix oftc`" so yours are much better :D 22:18
fungisome people even find videos useful, i have no idea how they cut and paste from them, but not judging22:18
opendevreviewIan Wienand proposed opendev/system-config master: ircbot: update limnoria
dmsimardfreenode staff took over #ansible last weekend :/22:23
JayFapparently ##linux was taken over a couple days back22:23
JayFat least per reddit r/linux22:23
dmsimardyeah about the same time22:23
ianwfungi: i pulled both changes into my setuptools branch, 796323 will rebuild the image.  thanks so much to yourself and clarkb running this one down!22:24
mordredI really don't understand what the freenode takeover endgame was22:25
dmsimardbold of you to assume there is one22:26
fungimordred: scorched earth. rebuild irc v4.0 on the ruined rubble of the former civilization?22:26
mordreddmsimard: I know silly me22:27
fungibut yeah, occam's razor suggests the real driving force was stupidity22:27
opendevreviewSteve Baker proposed openstack/diskimage-builder master: Add a growvols utility for growing LVM volumes
ianwclarkb: if you have a sec, could you look over to make sure i didn't miss anything else with the eavesdrop deploy job22:31
corvushere's what this channel looks like in element now:
corvusat the suggestion of mordred i also set up some private "spaces" -- you can see those on the left (Ansible, Zuul, Services).  those are just for my own organization, but public spaces are also an option; that's something we could think about setting up for folks.  like an "ironic" space...22:33
corvusand, the bridge is officially operational, so i have joined  \o/22:34
mordredwoot! 22:34
ianwclarkb: also i moved the db out of /home/gerrit2/review_site as it's not really gerrit data in .  but that's the only change since your last review22:34
mordredcorvus: your screen is wider than mine :)22:35
corvusmordred: it doesn't normally run that wide; i just made it so so i could show everything in one screenshot22:35
mordredI have also now joined #ansible:libera.chat22:35
mordred(the link worked great - thank you)22:36
mordredalso - thanks for adding channel logo! I kept meaning to mention that :)22:36
corvusit really helps once you get a bunch of channels22:36
mordredcorvus: I did not get a message from a libera bridge status bot22:37
mordredcorvus: do you happen to know what its name is so I can do nickserv password stuff?22:38
corvusmordred: i did not either; i found it in the users list (at the top, admin level perms) and started a dm with it22:42
corvusmordred: it's @appservice:libera.chat22:42
corvusi used "!storepass",  "!username", then "!reconnect" commands to establish my nick setup22:43
mordredand it's using sasl22:45
opendevreviewMerged opendev/system-config master: Use tmpfiles.d to create /var/run/reprepro
clarkbianw: no problem, the upstream for limnoria seems to be pretty responsive which is nice. The test suite is weird but once i sorted it out things went well23:11
clarkbianw: I'll look at those changes shortly too23:11
clarkbas soon as the docs volume isn't locked I will move it then everything will be back to normal with afs23:12
clarkbfungi: ya things went smoothly so far23:13
ianwi'll move that nodepool-builder volume mount to a directory just for consistency23:15
ianwmy main thinking was that the container dir should never need to be exposed outside the container, and should be small23:16
clarkbthis docs volume is locked a fair bit23:23
fungiit gets written to a lot23:24
clarkbI wonder if I made it cranky by adding a third fileserver to it23:24
opendevreviewIan Wienand proposed opendev/system-config master: nodepool-builder: add volume for /var/lib/containers
clarkbI didn't realize it was only afs01.ord and afs01.dfw prior to moving RW to afs02.dfw23:24
fungiclarkb: it's probably just that every single change which merges for openstack writes to it with a post pipeline job23:24
fungibut also yes i suspect that'll need more space on afs02.dfw23:25
fungitarballs was the same way, right?23:25
fungior maybe we have three ro replicas of that one23:26
clarkbtarballs was already doing three I think23:27
fungiahh, yep23:27
clarkbfungi: docs is relatively small compared to the mirrors and tarballs I don't think we need more disk to do the move23:27
clarkbI checked that after you asked about it and we seem to have enough free headroom23:27
fungimakes sense23:28
clarkbwe have about 2x the necessary room for a tarballs move too23:29
clarkbianw: all three of those updated changes lgtm23:30
clarkbagenda has been sent23:49
clarkbstill waiting on the docs volume to be unlocked23:50
ianwi feel like we should go with because we're currently in hand-edited ephemeral mode, right?23:51
clarkbianw: ++ let me review it23:54
ianwhopefully my comment on the setuptools pull request might remind someone23:55
clarkbyour fork's branch has the two fixes I expect. I have approved it23:55
fungilooked that way to me anyway23:57
ianwi think it will pull automatically, hopefully the deploy job matches now23:57

