fungi | corvus: thanks for looking into it, at least we can warn folks and offer options | 00:05 |
---|---|---|
*** factor has joined #opendev | 00:22 | |
*** stephenfin has quit IRC | 04:42 | |
*** stephenfin has joined #opendev | 04:53 | |
*** calcmandan has quit IRC | 04:57 | |
*** calcmandan has joined #opendev | 04:59 | |
*** stephenfin has quit IRC | 05:02 | |
*** ravsingh has joined #opendev | 05:22 | |
*** ravsingh has quit IRC | 06:17 | |
*** sgw has quit IRC | 06:26 | |
AJaeger | ianw: https://opendev.org/openstack/requirements/src/branch/stable/stein/.zuul.d/project.yaml#L8 https://opendev.org/openstack/requirements/src/branch/stable/train/.zuul.d/project.yaml#L7 https://opendev.org/openstack/requirements/src/branch/stable/ussuri/.zuul.d/project.yaml#L6 https://opendev.org/openstack/requirements/src/branch/stable/rocky/.zuul.d/project.yaml#L7 | 06:26 |
AJaeger | ianw: I suggest moving these master-only jobs back to project-config so that their removal is not forgotten... | 06:26 |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul-jobs master: test-playbooks: avoid warnings with shell/command https://review.opendev.org/731605 | 07:06 |
openstackgerrit | Tobias Urdin proposed openstack/project-config master: Remove retired congress https://review.opendev.org/731889 | 07:22 |
*** DSpider has joined #opendev | 08:00 | |
*** moppy has quit IRC | 08:01 | |
*** moppy has joined #opendev | 08:01 | |
*** elod has quit IRC | 09:02 | |
*** elod has joined #opendev | 09:08 | |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul-jobs master: ensure-twine: Update using same format as ensure-tox https://review.opendev.org/731854 | 09:39 |
*** elod has quit IRC | 09:57 | |
*** dpawlik has joined #opendev | 10:04 | |
*** elod has joined #opendev | 10:04 | |
*** dpawlik has quit IRC | 10:43 | |
*** tosky has joined #opendev | 11:06 | |
*** dpawlik has joined #opendev | 11:18 | |
openstackgerrit | Monty Taylor proposed opendev/puppet-openstack_infra_spec_helper master: Install hosts and group files into service location https://review.opendev.org/731583 | 11:53 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Split inventory into multiple dirs and move hostvars https://review.opendev.org/730991 | 11:53 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Override bridge hostvars directly https://review.opendev.org/731258 | 11:53 |
*** dpawlik has quit IRC | 12:00 | |
*** dpawlik has joined #opendev | 12:00 | |
*** dpawlik has quit IRC | 12:03 | |
*** dpawlik has joined #opendev | 12:03 | |
corvus | fungi, clarkb: jbg in #jitsi says xmpp muc names are case-insensitive, and suggests thinking about making etherpad the same to match. | 13:15 |
corvus | of course, we have 10 years of history in there... but maybe we could do some db queries to see how many collisions there would be if we did that | 13:16 |
corvus | honestly, it's probably not a bad idea -- the case-sensitivity here isn't really getting us much benefit, and can provide minor confusion. | 13:17 |
fungi | i can run a query | 13:20 |
fungi | and yeah, we could basically just rename any with mixed case and then redirect to case-insensitive names from any case mix | 13:21 |
corvus | ya | 13:21 |
corvus | (i doubt this is something we'd want to do right before a ptg, but thinking ahead) | 13:22 |
fungi | if nothing else, getting a count of case-insensitive collisions will help inform our decision | 13:26 |
fungi | the hardest part will be fiddling with docker-compose to invoke mysqlclient ;) | 13:26 |
fungi | ahh, got it working | 13:29 |
fungi | i think there must be something special involved with getting interactive mode to work, but i can do noninteractive with -e just fine | 13:29 |
corvus | fungi: did you pass '-it' ? | 13:30 |
fungi | ahh, so there is a special flag for that ;) | 13:32 |
fungi | anyway, after reacquainting myself with etherpad's database schema, i think listing pads with the api will be more productive | 13:32 |
*** larainema has quit IRC | 13:32 | |
corvus | fungi: oh, are pad names not a column? | 13:36 |
fungi | not even remotely | 13:42 |
fungi | it wants to use something like mongodb | 13:43 |
fungi | the etherpad-lite database has a single table called "store" with two columns, "key" and "value" | 13:43 |
fungi | that's the entirety of the db schema | 13:43 |
corvus | gotcha. i was hoping it would be ('pad', 'key', 'value') | 13:44 |
corvus | that's clearly just crazytalk | 13:44 |
fungi | so it's a matter of working out what the patterns are for the key names and filtering with key like "%something%" | 13:44 |
fungi | but the rest api is entirely serviceable: https://github.com/ether/etherpad-lite/blob/master/doc/api/http_api.md#listallpads | 13:44 |
fungi | i've got that dumping to a json file now | 13:45 |
fungi | taking a while | 13:45 |
fungi | wget -qO- 'http://localhost:9001/api/1.2.13/listAllPads?apikey='$(sudo docker-compose -f /etc/etherpad-docker/docker-compose.yaml exec etherpad cat /opt/etherpad-lite/APIKEY.txt) > padlist.json | 13:46 |
fungi | for the record | 13:46 |
fungi | no idea how many there are, since we're just one api rev shy of when they recently implemented a getStats method | 13:49 |
fungi | our deployment supports up to api 1.2.13, and getStats was implemented for api 1.2.14 | 13:49 |
fungi | i was hoping we could get https://review.opendev.org/729029 in before the ptg, but timing was tight | 13:50 |
fungi | yeah, 1.3mb json file just for the list of pads | 13:55 |
fungi | parsing with python now | 13:55 |
fungi | >>> len(pads['data']['padIDs']) | 13:56 |
fungi | 58435 | 13:56 |
fungi | now to figure out what case collisions we've got | 13:57 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Split inventory into multiple dirs and move hostvars https://review.opendev.org/730991 | 14:00 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Override bridge hostvars directly https://review.opendev.org/731258 | 14:00 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Stop cloning drupal puppet modules https://review.opendev.org/731947 | 14:00 |
fungi | 2304 collisions of two or more pad names, out of 58435 total pads | 14:07 |
fungi | so it's not going to be trivial | 14:07 |
fungi | granted, a lot of these look like user error, for example | 14:07 |
fungi | ['upstream-Institute-shanghai-2019', 'upstream-institute-shanghai-2019'] | 14:08 |
fungi | ['watcher-Boston-Meetings', 'watcher-Boston-meetings', 'watcher-boston-meetings'] | 14:08 |
fungi | ['YVR-forum-fast-forward-upgrades', 'Yvr-forum-fast-forward-upgrades', 'yvr-forum-fast-forward-upgrades'] | 14:08 |
fungi | looking at the list, we'd likely be doing our users a favor by making pad names case-insensitive | 14:09 |
fungi | lots also just have the welcome text in them. i wonder if there's a good way to cull those | 14:10 |
AJaeger | fungi: upstream-Institute-shanghai-2019 just has the default text, can you figure out via an API call whether a pad has content (at revision 0?) | 14:11 |
fungi | yeah, the getText method is what i'm using to spot check some of them | 14:11 |
fungi | it would take a while, but i could probably iterate over every padID and generate a checksum of the text, then look for checksum collisions to identify duplicate pads | 14:12 |
fungi | checksums for the various default texts we've had over time are likely to turn up orders of magnitude more often than any others | 14:13 |
fungi | i've got a first attempt at that running now, will see how far it gets | 14:37 |
*** elod has quit IRC | 15:02 | |
fungi | trying again. apparently without a retry in place, the odds that one of ~60k http calls will fail are nonzero ;) | 15:10 |
*** elod has joined #opendev | 15:19 | |
mordred | fungi: doh | 15:20 |
fungi | hrm, nope, even then some calls fail, maybe broken pads. i'll just add a try/except around it, missing some pads isn't going to hurt for this purpose anyway | 15:20 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Split inventory into multiple dirs and move hostvars https://review.opendev.org/730991 | 15:35 |
*** dpawlik has quit IRC | 15:44 | |
fungi | around half the pads (24136) have contents with the same exact checksum... that's our current default pad text | 15:56 |
fungi | we have a total of 29732 unique pad content checksums | 15:58 |
AJaeger | out of those 2304 collisions, how many are unique? | 16:09 |
AJaeger | so, should we delete those 24136 pads? | 16:09 |
AJaeger | guess, it's not worth the effort... | 16:10 |
fungi | there's a long tail of duplicate contents | 16:15 |
fungi | the most common text is what you see in https://etherpad.opendev.org/p/!! | 16:16 |
fungi | 24136 pads with that in them | 16:16 |
fungi | a representative of the next most common pad contents is https://etherpad.opendev.org/p/-3cNAME-OF-YOUR-BLUEPRINT | 16:17 |
fungi | so empty | 16:17 |
fungi | there are 1456 of those | 16:17 |
fungi | an example of the next most common is https://etherpad.opendev.org/p/-3Cdeploymentprocess-3E | 16:19 |
fungi | looks like a different default text | 16:20 |
fungi | there are 1132 of those | 16:20 |
fungi | then there's 400 like this one (another default text) https://etherpad.opendev.org/p/,,,,,,, | 16:50 |
AJaeger | you find some interesting URLs ;) | 16:51 |
fungi | those are just the ones sorting first | 16:51 |
fungi | 339 like https://etherpad.opendev.org/p/0Bcf9qsSUU where there's default text plus an abiword error | 16:51 |
fungi | 227 like https://etherpad.opendev.org/p/.xyz-a41837f9-5ea8-4652-a2ac-009c9 with another default content | 16:52 |
fungi | 114 with this default text https://etherpad.opendev.org/p/+43,-79 | 16:52 |
fungi | 112 which are empty like https://etherpad.opendev.org/p/18amNZslXZ but for some reason not the same checksum as the other empty ones | 16:53 |
fungi | 105 with default text plus some blank lines, like https://etherpad.opendev.org/p/0111 | 16:54 |
fungi | 53 more like that but with a different number of blank lines | 16:55 |
fungi | maybe i'll rerun this and strip leading/trailing whitespace to see if that helps condense the list | 16:56 |
*** dpawlik has joined #opendev | 17:39 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Split inventory into multiple dirs and move hostvars https://review.opendev.org/730991 | 17:46 |
fungi | just realized i had also been checksumming the full json payload from the getText query, not only the data.text subfield | 18:00 |
fungi | so between that and stripping leading/trailing whitespace i expect to get even more clear results | 18:00 |
fungi | should hopefully know soon | 18:00 |
fungi | yep, that's compressed the duplicates count curve | 18:30 |
*** dpawlik has quit IRC | 18:37 | |
fungi | 24417 with just our current welcome text, 1790 are empty or contain only whitespace, 1144 and 408 with a couple different older default texts, 341 default text with an abiword error appended, 171 which consist solely of an empty bullet list entry, 23 with the first line of welcome text set as a heading style | 18:39 |
fungi | we could probably stand to delete all of those | 18:40 |
fungi | but the bigger question is, what's the overlap between that and the case-insensitive padID collisions... | 18:41 |
*** sgw has joined #opendev | 19:02 | |
clarkb | fungi: the rocket should be flying over you nowish | 19:27 |
fungi | ooh, better than the one yesterday | 19:30 |
fungi | too bright out to see anything though | 19:31 |
*** sshnaidm has joined #opendev | 19:40 | |
fungi | okay, so if we were to delete all those empty and default content pads, the number of actual case-insensitive padID collisions we'd have to deal with is still 504 | 20:28 |
fungi | which, yes, is more than we're going to deal with in a weekend | 20:28 |
fungi | odds are most of those are also dupes or trash, but they'll need more careful inspection | 20:28 |
fungi | i have a list | 20:29 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul-jobs master: ensure-twine: Check executable presence using shell+bin/bash https://review.opendev.org/731854 | 20:59 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Split inventory into multiple dirs and move hostvars https://review.opendev.org/730991 | 21:19 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add focal support for ensure-pip https://review.opendev.org/731993 | 21:33 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add focal support for ensure-pip https://review.opendev.org/731993 | 21:49 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add ubuntu-focal testing https://review.opendev.org/731995 | 21:49 |
*** DSpider has quit IRC | 22:25 | |
*** tosky has quit IRC | 23:22 |
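
The two analyses fungi describes in the log (grouping pad IDs that differ only by case, and counting pads whose text shares a checksum after stripping whitespace) can be sketched roughly as follows. This is an illustration, not the script that was actually run: the function names `case_collisions`, `text_checksum` and `checksum_counts` are hypothetical, and the use of `str.casefold()` and SHA-256 is an assumption, since the log does not say which case folding or checksum was used. The sample pad IDs are taken from the log; the pad texts are made up.

```python
import hashlib
from collections import Counter, defaultdict


def case_collisions(pad_ids):
    """Return groups of pad IDs that are identical when case is ignored."""
    groups = defaultdict(list)
    for pad_id in pad_ids:
        # casefold() is an assumed stand-in for whatever folding the
        # real service would apply to pad names.
        groups[pad_id.casefold()].append(pad_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def text_checksum(text):
    """Checksum pad text after stripping leading/trailing whitespace."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def checksum_counts(pad_texts):
    """Count how many pads share each checksum (pad_texts: padID -> text)."""
    return Counter(text_checksum(text) for text in pad_texts.values())


if __name__ == "__main__":
    # Pad IDs quoted in the log, plus one without a collision.
    sample_ids = [
        "upstream-Institute-shanghai-2019",
        "upstream-institute-shanghai-2019",
        "watcher-Boston-Meetings",
        "watcher-Boston-meetings",
        "watcher-boston-meetings",
        "0111",
    ]
    for group in case_collisions(sample_ids):
        print(group)

    # Made-up pad texts: two copies of a default text that differ only
    # in trailing blank lines, plus one pad with real content.
    sample_texts = {
        "pad-a": "Welcome to Etherpad!",
        "pad-b": "Welcome to Etherpad!\n\n\n",
        "pad-c": "meeting notes",
    }
    for digest, count in checksum_counts(sample_texts).most_common():
        print(count, digest[:12])
```

With the real data, `pad_ids` would come from `pads['data']['padIDs']` in the `padlist.json` dump, and the texts from the `data.text` field of each `getText` response rather than the full JSON payload, which is the mistake noted at 18:00.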