Tuesday, 2020-06-02

*** hamalq has quit IRC00:37
*** diablo_rojo has quit IRC03:11
*** diablo_rojo has joined #opendev-meeting04:38
*** hamalq has joined #opendev-meeting04:49
*** hamalq has quit IRC05:15
*** diablo_rojo has quit IRC10:08
*** hamalq has joined #opendev-meeting16:31
*** hamalq has quit IRC16:32
*** hamalq has joined #opendev-meeting16:32
*** diablo_rojo has joined #opendev-meeting16:34
clarkbHave we ended up with people being able to meet today?19:00
clarkbI've been distracted with parental duties (yay birthdays)19:00
clarkbif more than just myself raise their hands as being around I can do a quick informal meeting catch up thing19:02
fungioh, indeed, i forgot19:02
fungii was going to try and get a nap in before 04:00, since there are also more openstack tc sessions at 13:00 i wanted to cover19:02
corvuso/19:03
fungibut i don't need to nap just yet ;)19:03
corvusi'm around, but don't have much to say19:03
clarkb#startmeeting infra19:03
fungii guess we could talk about etherpad lower-casing, but that's not especially urgent19:03
openstackMeeting started Tue Jun  2 19:03:35 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:03
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:03
*** openstack changes topic to " (Meeting topic: infra)"19:03
openstackThe meeting name has been set to 'infra'19:03
clarkbif nothing else we'll record that we had nothing to say19:03
clarkb#topic Announcements19:04
*** openstack changes topic to "Announcements (Meeting topic: infra)"19:04
clarkbThis week the PTG is happening so we are a bit distracted19:04
clarkbfor that reason we'll have a shorter less formal meeting19:04
clarkb#topic Open Discussion19:04
*** openstack changes topic to "Open Discussion (Meeting topic: infra)"19:04
clarkbMeetpad's room urls are case insensitive due to xmpp limitations19:06
clarkbthis has created a small amount of confusion with the mapping onto etherpad urls as etherpad urls are case sensitive19:06
corvusfungi looked at the database and it looks like people get confused by that all the time19:06
AJaegero/19:06
clarkbthe workaround we are using is to use lower case urls in both and renaming pads if necessary19:06
fungiahh, yeah, so i did a bit of analysis on what it might look like if we wanted to lower-case all pad names (and presumably set up redirects)19:07
corvusso it seems like making etherpad case-insensitive in general would solve this jitsi issue for the future as well as prevent some etherpad-only mistakes19:07
fungiwe have a bit over 2k pads (out of roughly 60k) which would need to have case-insensitivity collisions resolved19:07
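A minimal sketch of how pads that collide under case-insensitivity could be found. It assumes the full list of pad names has already been fetched (for example with the Etherpad API's `listAllPads` call). This is not fungi's actual script, which was not shared, and `find_case_collisions` is a hypothetical helper name.

```python
from collections import defaultdict


def find_case_collisions(pad_names):
    """Group pad names that differ only by letter case.

    Returns a dict mapping the lower-cased name to the sorted list of
    original names, keeping only groups with more than one member.
    """
    groups = defaultdict(list)
    for name in pad_names:
        groups[name.lower()].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}
```

Each returned group is one pad that would need resolving before an all-lowercase scheme could be enforced.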
fungihowever, etherpad also has a great feature where if anyone connects to a new pad it saves that initial revision with just the intro text19:08
corvusi probably am responsible for 5k empty pads :)19:08
fungiso i did some comparisons of checksums of all the pad contents and found that if we removed those and also pads which are blank, that leaves more like 500 we'd need to look through19:08
fungistill a lot, but not insurmountable19:09
corvusbrainstorming how to resolve collisions: we could rename one of them to something with a suffix (eg "-case") and ... if it's not too hard to edit via the api... prepend a note at the top saying "if you're looking for $otherpadname it has been moved to $newurl" ?19:09
fungiif nothing else, we might take the checksum comparisons as a good opportunity to clean up the db. roughly half of our pads are blank or contain just the intro text19:09
corvusfungi: if we cleanup the blank pads, we could cron that weekly or something too19:10
fungiyeah, well, intro text only pads for sure19:10
clarkbwe can delete pads via the api right?19:10
clarkbso that bit at least should be straightforward19:10
fungii'm not 100% sure removing blank pads is a good idea, because it's possible someone blanked them in a fit of vandalism, and if we delete them then we can't get them back (excepting from our database backups)19:11
fungibut there's fewer of those19:11
corvussorry i meant intro19:11
fungithe ones which are all intro text are obviously fair game for sure19:11
fungii also found a surprising number which are intro text plus an abiword error message19:11
fungialso we have something like 5 variations of intro text floating around as it's changed over time19:12
corvusthere's an "appendText()" method, as well as "setText()"... so adding a redirect message seems plausible19:12
fungibut basically anything over 20 identical checksums (after stripping leading/trailing whitespace) is trash, i verified the texts manually19:13
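The checksum heuristic described above can be sketched as follows. This assumes the pad texts are already loaded into a name-to-text dict. The choice of SHA-256 is an assumption (the hash actually used wasn't stated), and the helper names are hypothetical. As noted above, the flagged texts were still verified manually.

```python
import hashlib
from collections import Counter

# "anything over 20 identical checksums" per the discussion
TRASH_THRESHOLD = 20


def checksum(text):
    """SHA-256 of the pad text with leading/trailing whitespace stripped."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def trash_checksums(pad_texts, threshold=TRASH_THRESHOLD):
    """Return the checksums shared by more than `threshold` pads.

    pad_texts maps pad name -> pad text.
    """
    counts = Counter(checksum(text) for text in pad_texts.values())
    return {digest for digest, n in counts.items() if n > threshold}


def trash_pads(pad_texts, threshold=TRASH_THRESHOLD):
    """Return the sorted names of pads whose text has a 'trash' checksum."""
    bad = trash_checksums(pad_texts, threshold)
    return sorted(name for name, text in pad_texts.items() if checksum(text) in bad)
```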
corvusi'm not sure how that deals with formatting (perhaps the setHTML() method would be necessary?) but it's at least something to look into19:13
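A minimal sketch of the "rename with a suffix and leave a note" idea, using the Etherpad HTTP API calls `movePad`, `getText` and `setText`. It assumes the `/api/<version>/<function>?apikey=...` URL scheme and the `{"code": 0, "message": ..., "data": ...}` response envelope. The API version string, the `-case` suffix and the helper names are illustrative assumptions. Because `setText` replaces the plain text, pad formatting is likely lost, which is the concern raised above (`setHTML` may be needed instead; untested here).

```python
import json
import urllib.parse
import urllib.request


def make_api_caller(base_url, api_key, version="1.2.13"):
    """Return call(func, **params) for the Etherpad HTTP API (assumed URL scheme)."""
    def call(func, **params):
        query = urllib.parse.urlencode(dict(params, apikey=api_key))
        url = f"{base_url}/api/{version}/{func}?{query}"
        with urllib.request.urlopen(url) as resp:
            body = json.load(resp)
        if body.get("code") != 0:
            raise RuntimeError(f"{func} failed: {body.get('message')}")
        return body.get("data")
    return call


def move_with_note(call, old_id, new_id, keep_id, pad_base_url):
    """Move a colliding pad aside and prepend a pointer to it on the kept pad.

    old_id:  the colliding pad being moved out of the way (e.g. "Foo")
    new_id:  its new name (e.g. "foo-case", hypothetical suffix)
    keep_id: the lowercase pad that visitors will land on (e.g. "foo")
    """
    call("movePad", sourceID=old_id, destinationID=new_id)
    text = call("getText", padID=keep_id)["text"]
    note = f"If you're looking for {old_id}, it has been moved to {pad_base_url}/p/{new_id}\n\n"
    # Plain-text rewrite: prepends the note but drops any rich formatting.
    call("setText", padID=keep_id, text=note + text)
```

Passing `call` in as a parameter keeps the logic testable against a fake API before pointing it at the real server.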
corvusany other brainstorms about how to resolve the conflicts?19:14
fungias for the actually empty pads, i could probably do a bit of analysis on revision count. most of them probably just deleted the intro text and that was it19:14
fungii mean, chances are a lot of the remaining 500 name collisions are also trash, i just haven't had time to take a look19:15
corvusyeah, but if more than like 20 of them are real, it may be easier to automate the whole thing19:16
corvus(after all, if we automate it, and nobody notices, it's no big deal :)19:16
clarkbcorvus: other ideas if they are the result of people mistakenly using case improperly we could merge them somehow and keep the lower case version going forward19:17
clarkbif they are distinct then your idea seems sane19:17
corvuslike concat them?  yeah19:17
corvusoh, i guess fungi implied an option that i didn't quite pick up on too --19:18
clarkbya concat is probably simplest19:18
corvusrename one pad, and add an .htaccess entry for that one19:18
corvus(that does a redir)19:18
corvusthat would work for people visiting etherpad.o.o directly, but wouldn't address confusion for folks arriving via meetpad19:19
clarkbright I think we want to force etherpad to do lower case too?19:20
clarkbat least that was what I was assuming we wanted then it would avoid confusion there and mismatched behavior with jitsi19:20
fungiyeah, i wondered if we should make etherpad just redirect to lower-case padnames (if that's possible)19:20
AJaegercan we ensure that future new pads are all lowercase?19:20
corvusyeah, i think in all cases, we have etherpad redirect to lower case19:20
fungithat avoids people creating new problem pads19:21
clarkbwe can enforce that with apache19:21
clarkb(I think)19:21
corvusthen the question is for conflicts, do we a) move and add a note to the pad (optional: add a specific redirect for the moved pad); b) concat.19:21
corvusclarkb: yeah, would be a simple mod_rewrite redirect19:22
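In Apache this would be a mod_rewrite rule (for example using a `RewriteMap` backed by the built-in `int:tolower` function). The Python sketch below only shows the redirect decision itself. It assumes pads are served under `/p/<name>` and that other paths (API, static assets) must be left alone. `lowercase_redirect` is a hypothetical name.

```python
from urllib.parse import quote, unquote

PAD_PREFIX = "/p/"


def lowercase_redirect(path):
    """Return the lower-cased pad path to redirect to, or None if no redirect is needed."""
    if not path.startswith(PAD_PREFIX):
        return None  # only pad URLs are rewritten
    name = unquote(path[len(PAD_PREFIX):])
    if name == name.lower():
        return None  # already canonical, serve as-is
    return PAD_PREFIX + quote(name.lower())
```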
corvusfungi: do you have a list of collisions?19:22
fungiyep!19:22
fungii didn't post it publicly since i don't know if anyone was relying on some random pad names to not be discoverable19:23
corvusif we remain concerned about that, that may eliminate the idea of having a .htaccess list for specific pad name redirects19:23
fungiit's also just a python script i can rerun to regenerate, but takes around an hour due to the number of queries19:24
corvusfungi: ~fungi/collisions.yaml ?19:24
fungichecking, but if that's got 504 entries then yes19:24
fungiyeah, that looks like it19:25
fungithat's the collisions which would remain if we cleaned up empty and intro text pads19:25
corvusah nice, there's some linkfarm spam there19:25
fungianyway, just wanted to strike up the discussion when it wasn't a weekend19:28
fungiscripting stuff against the etherpad rest api is not hard, and it's well-documented19:29
fungiso we could certainly consider periodic cleanup by checksum, for example19:29
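A minimal sketch of such a periodic cleanup, assuming a `call(func, **params)` wrapper around the Etherpad HTTP API (`listAllPads`, `getText`, `deletePad`) and a set of manually verified intro-text checksums. It only targets intro-text pads, not blank ones, per the vandalism concern above. The SHA-256 choice, the dry-run default and the helper name are assumptions.

```python
import hashlib


def cleanup_intro_pads(call, intro_checksums, dry_run=True):
    """Delete pads whose stripped text matches a known intro-text checksum.

    call: function call(func, **params) -> data for the Etherpad HTTP API.
    intro_checksums: SHA-256 hex digests of manually verified intro texts
        (should NOT include the digest of the empty string).
    Returns the pad IDs that were (or, in dry-run mode, would be) deleted.
    """
    doomed = []
    for pad_id in call("listAllPads")["padIDs"]:
        text = call("getText", padID=pad_id)["text"]
        digest = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if digest in intro_checksums:
            doomed.append(pad_id)
            if not dry_run:
                call("deletePad", padID=pad_id)
    return doomed
```

Running it from cron with `dry_run=True` first would give a report to review before anything is actually deleted.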
corvusspot checking these, i feel pretty confident that only one of the two of each of these is going to be important19:29
corvusso far, they're either both linkspam, or one was clearly "the wrong one"19:29
fungithat's my suspicion as well, they just weren't going to be as easy to mechanically identify19:29
clarkbcool in that case we should be able to delete the bad one, then rename if necessary, and set up redirects in apache?19:30
corvuswe could use a simple rubric to determine the "better" of the two to rename19:30
fungiso my hope is that we wouldn't need any fancy per-pad redirects or breadcrumbs19:30
corvuswell, if we go through all 500 -- do we want to?19:30
corvusi guess if we got a bunch of folks doing it, we could probably knock it out pretty quickly19:31
fungii suppose we could do it in batches (the rest api would still let us get to the redirected originals)19:31
fungiso we could still set up the mass redirect and renaming while we worked through the collisions19:31
fungibut we'd presumably want to go through the cleanup first, at least before bulk moving19:32
corvushere's what i'm thinking: if we want to delete one of the pads (or rename it to a non-public name), we'll need to go through manually and identify which one to keep.  but if, instead, we went with one of the options above (concat or rename with link) we could use a simple length rubric to 'guess' which is the best one, so we can make that the one that people land on by default.  essentially,19:32
corvusrename that to be the lowercase one if it isn't already.19:32
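The length rubric corvus describes could look roughly like this. It assumes text length is the measure of the "better" pad, that ties go to the pad already at the lowercase name, and that the others get the `-case` suffix mentioned earlier. All names here are hypothetical. The returned moves are ordered so the lowercase name is vacated before the keeper is moved onto it.

```python
def plan_collision(names, lengths, suffix="-case"):
    """Decide which of a set of case-colliding pads becomes the lowercase one.

    names: pad names that differ only by case.
    lengths: pad name -> text length.
    Returns (keeper, moves): keeper ends up at the lowercase name, and
    moves maps old name -> new name, in the order they should be applied.
    """
    canonical = names[0].lower()
    # Longest text wins; on a tie, prefer the pad already at the lowercase name.
    keeper = max(names, key=lambda n: (lengths[n], n == canonical))
    moves = {}
    others = sorted(n for n in names if n != keeper)
    for i, name in enumerate(others):
        moves[name] = f"{canonical}{suffix}" if i == 0 else f"{canonical}{suffix}{i + 1}"
    if keeper != canonical:
        moves[keeper] = canonical  # applied last, after the lowercase name is freed
    return keeper, moves
```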
clarkbfor link spam we might be able to identify those based on content? eg just a list of urls?19:33
corvusprobably so?  that might prune it a bit more19:33
fungiplan sounds reasonable, also yes spotting pads which are just lists of urls may also be scriptable19:33
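One way the "just a list of urls" check might be scripted: flag a pad when nearly all of its non-empty lines are bare URLs. The regex, the three-line minimum and the 90% ratio are arbitrary assumptions, not tuned against the real collision list.

```python
import re

# A line that is nothing but a URL (http(s):// or www.), ignoring surrounding whitespace.
URL_RE = re.compile(r"^\s*(https?://\S+|www\.\S+)\s*$", re.IGNORECASE)


def looks_like_link_list(text, min_lines=3, min_ratio=0.9):
    """Heuristically flag pads whose non-empty lines are almost all bare URLs."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < min_lines:
        return False  # too short to judge
    url_lines = sum(1 for line in lines if URL_RE.match(line))
    return url_lines / len(lines) >= min_ratio
```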
corvusalso, some of the linkspam may actually have identical content19:34
fungiyeah, some does, you can find the checksum analyses in checksums.yaml19:35
corvusokay, maybe we can putz with this the rest of this week and see if we can prune the list a bit, then send an email out next week with a suggested plan19:36
fungiwfm19:36
fungii at least wanted to be sure this was something we felt we ought to do19:37
clarkbya it seems doable19:37
clarkband considering people were having collision issues previously seems like a good idea meetpad or not19:37
fungia bunch of the url-heavy examples i'm looking at don't seem to be linkfarms for search engine purposes19:38
fungithey instead seem to be mazes of link obfuscation and url proxies19:38
corvusyeah, i was a little ambivalent about doing it just to "fix" meetpad, but i'm increasingly convinced it's a Good Idea19:40
corvusfungi: yeah, there's some interesting stuff in there, the purpose of which i don't fully understand19:40
corvusaww, i just found someone's gerrit http password :/19:41
corvusclarkb: i think we may be out of topics :)19:49
clarkbagreed19:49
clarkb#endmeeting19:49
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"19:49
openstackMeeting ended Tue Jun  2 19:49:27 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:49
openstackMinutes:        http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-06-02-19.03.html19:49
clarkbthank you everyone19:49
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-06-02-19.03.txt19:49
openstackLog:            http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-06-02-19.03.log.html19:49
clarkbwe'll be back to our regularly scheduled programming next week19:49
fungithanks clarkb!19:50
