Wednesday, 2021-07-14

corvusyeah.  my opinion is we should either try to get it in shape or discontinue it because it's an attractive nuisance.00:00
corvusas a user of the system, i'd really like to have one well-defined channel for announcements :)00:00
clarkbunfortunately I think we have subsets of users that would prefer twitter, others that would prefer a mailing list, others that want wechat and so on00:01
corvusthat is a separate question that i don't think needs to be answered now :)00:02
clarkbsure. I agree that a single well known channel is the ideal00:02
clarkbI also think that making it interrupt driven and not poll based is important00:02
corvusso let me rephrase that: as a user, i would like one well-defined channel for email service announcements :)00:02
clarkbwith poll based systems people will only check after the fact and that is often too late00:02
ianwsome sort of dismissable MOTD type box on review.opendev.org might be good.  maybe there's a plugin00:02
corvusthere used to be some js to load the status alert from there, but i'm pretty sure it got lost in all the upgrades00:03
corvusbut that's really part of the "another channel" discussion unless you want to stop sending emails00:04
corvusso to make this tractable, i'm only suggesting that if we use email to disseminate service announcements, we have a well-defined policy for where we send those announcements :)00:04
clarkbcorvus: given the current situation for that well defined location do you have any ideas for making it a better subscribed list?00:05
corvusthat would make it easier for us as admins, and easier for us and all other users because no one has to guess where they should be looking to find this info00:05
clarkbone option is that we auto subscribe people00:05
clarkb(but then you end up being a spammer)00:05
clarkbBut I guess if we sent the mailman you're subbed email with an unsub option we'd technically be following the rules around spam?00:06
corvussubscribing everyone on "all the lists" to that list so that they will get emails you were going to send to them anyway doesn't sound like spamming to me00:06
clarkbya I think the key is that there be a clear way to unsub which there would be00:07
corvusi don't love the option, but i don't think it's evil or wrong00:07
corvusor, we send out a couple of reminder emails to everyone and call their bluffs by not sending any more service emails to lists00:08
clarkbya that might be a good first step. Then we can see what membership looks like and how much it moves00:08
clarkbfwiw I think we have been using the -announce list a fair bit lately. But the review move did go to the whole set00:09
ianwi agree i should have sent the downtime to service-announce, mea culpa for not following that policy sorry00:09
clarkbOFTC move, default nodeset change, and git review list have all gone to the announce list in the last few months.00:10
clarkbI sent ELK deprecation/removal discussion to the openstack discuss list only as they are the primary (only?) users00:10
clarkbthat said it's definitely a small list membership last I checked00:10
clarkbI want to say it is less than 20?00:10
clarkbit has been a while since I looked though00:11
ianwi think i confused myself because i've filtered service-announce into a generic infra folder00:11
clarkbya I don't think it is an indication of a change in policy just a mistake. We can continue to stick to -announce. Remind people that is where we are and they should be there too. This puts my draft email in a weird spot because the coverage for that list is small though00:13
clarkband maybe a gerrit motd thing would be a better spot for that anyway. I'll have to think on this a bit more00:14
corvusclarkb: if we do want to keep service-announce around, then maybe add a note to your email saying "please subscribe to service-announce to something something something"00:14
clarkbcorvus: that is a good idea00:14
corvusclarkb: not to get too much into the weeds on that, but while i think it's useful for some things, i don't think it's a good match for this (which is indefinite in terms of time).00:15
corvusi don't think we could leave a motd up for more than a day and still have it be effective00:15
corvus(people learn to skip over things like that pretty quick :)00:16
clarkbya the best thing would be a giant warning over the delete emails button :)00:16
corvus++00:16
corvusthat's actually not a bad interim idea if the gerrit fix takes a while00:17
clarkbcorvus: I added a paragraph to the draft pushing people towards service-announce if you want to take a look at that00:17
clarkbcorvus: ya we might be able to make that as a simple patch in our builds too. Worth investigating00:17
ianwi wonder if that could be done with css00:17
clarkbI was hoping to look into the actual fix tomorrow too00:17
corvusclarkb: update lgtm00:19
clarkbI'll think about this overnight and try to look at the gerrit options more closely tomorrow. But now I need to transform soaking wet children (letting them play with the hose was a bad idea) into dry children for dinner00:19
ianwdoes anyone remember the file we put in on bridge to stop ansible runs?  i'm drawing a blank00:21
clarkbianw: there is a script to run to do it00:23
clarkbianw: https://docs.opendev.org/opendev/system-config/latest/bridge.html#running-ansible-on-nodes00:24
clarkb`disable-ansible` and you give it a reason string00:24
ianwahh, that's it, thanks00:24
ianwalright, i've pruned and reworked https://etherpad.opendev.org/p/gerrit-upgrade-2021 and i think it's pretty complete02:00
opendevreviewGhanshyam proposed openstack/project-config master: Use publish-to-pypi-stable-only template for deprecated repo  https://review.opendev.org/c/openstack/project-config/+/80055802:16
opendevreviewchzhang8 proposed openstack/project-config master: register and  bring back tricircle under x namespaces  https://review.opendev.org/c/openstack/project-config/+/80074303:51
*** ysandeep|away is now known as ysandeep03:59
opendevreviewIan Wienand proposed opendev/zone-opendev.org master: Add paste.opendev.org CNAME  https://review.opendev.org/c/opendev/zone-opendev.org/+/80074404:31
ianwok i fixed the problem with paste.opendev.org taking 2+ minutes to post a new paste; the clue was it only happened when i imported the db04:34
ianwturns out, i don't quite know why, after you put in a new paste, it looks it up and the query has "WHERE pastes.user_hash = 'f845053720d2963a3087e1bdfac6c62630a09451'"04:34
ianwwell pastes.user_hash wasn't indexed and with all the records it was reading the whole db back.  so i did a CREATE INDEX user_hash ON pastes(user_hash); and now it works04:35
ianwi'm going to cut it over, to avoid the old and new server going out of sync04:36
ianwi'm guessing somehow the ancient trove db it was connected to uses different table types, or indexes, or something else, and that's why this isn't seen there04:36
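
The effect ianw describes above (a lookup on an unindexed pastes.user_hash column reading the whole table, fixed by CREATE INDEX) can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 as a stand-in, the real paste database is not SQLite, and every column other than user_hash is a hypothetical placeholder. It compares the query plan before and after the index is created.

```python
import sqlite3

QUERY = "SELECT * FROM pastes WHERE user_hash = ?"


def query_plan(conn, user_hash):
    """Return SQLite's plan description for the user_hash lookup."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (user_hash,)).fetchall()
    # The last column of each row is the human-readable plan detail.
    return " ".join(row[-1] for row in rows)


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; only pastes.user_hash comes from the discussion.
    conn.execute(
        "CREATE TABLE pastes ("
        "paste_id INTEGER PRIMARY KEY, code TEXT, user_hash TEXT)"
    )
    conn.executemany(
        "INSERT INTO pastes (code, user_hash) VALUES (?, ?)",
        [("paste %d" % i, "hash%d" % (i % 100)) for i in range(1000)],
    )
    before = query_plan(conn, "hash7")  # no index yet: full table scan
    conn.execute("CREATE INDEX user_hash ON pastes(user_hash)")
    after = query_plan(conn, "hash7")  # index available: indexed search
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

On a table with many rows, the scan in the first plan is what makes each lookup slow; the second plan goes through the index instead.
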
opendevreviewIan Wienand proposed opendev/system-config master: lodgeit: correct database path  https://review.opendev.org/c/opendev/system-config/+/80074504:41
opendevreviewMerged opendev/zone-opendev.org master: Add paste.opendev.org CNAME  https://review.opendev.org/c/opendev/zone-opendev.org/+/80074404:43
opendevreviewMerged opendev/system-config master: lodgeit: correct database path  https://review.opendev.org/c/opendev/system-config/+/80074505:38
*** bhagyashris_ is now known as bhagyashris|ruck06:25
*** ysandeep is now known as ysandeep|afk06:51
opendevreviewMerged openstack/project-config master: Create repo for Hashicorp Vault deployment  https://review.opendev.org/c/openstack/project-config/+/79982206:59
*** amoralej|off is now known as amoralej07:03
ianw#status log paste.openstack.org migrated to paste.opendev.org07:06
opendevstatusianw: finished logging07:06
opendevreviewchzhang8 proposed openstack/project-config master: register and bring back tricircle under x namespaces  https://review.opendev.org/c/openstack/project-config/+/80075008:07
*** ykarel is now known as ykarel|lunch08:53
*** ysandeep|afk is now known as ysandeep09:53
*** ykarel|lunch is now known as ykarel10:16
*** dviroel|out is now known as dviroel11:24
*** amoralej is now known as amoralej|lunch13:01
*** ysandeep is now known as ysandeep|PTO13:31
opendevreviewDmitriy Rabotyagov proposed openstack/project-config master: Add Vault role to Zuul jobs  https://review.opendev.org/c/openstack/project-config/+/79982513:41
*** amoralej|lunch is now known as amoralej13:46
opendevreviewGhanshyam proposed openstack/project-config master: Use publish-to-pypi-stable-only template for deprecated repo  https://review.opendev.org/c/openstack/project-config/+/80055814:24
clarkba second red hatter has orphaned their account14:36
*** ykarel is now known as ykarel|away14:36
clarkbefoley: ^ is there a general directive or instruction at red hat to make these updates? I'd like to get ahead of that if so because the gerrit bug makes that process very tricky14:36
clarkbcorvus: I think I'll go ahead and send out my draft to the various lists now as well to try and get ahead of this14:37
clarkbcorvus: are you comfortable with that now that I have added the bit about joining the announce list?14:37
corvusclarkb: ++  (i think i said so yesterday, if not, sorry!)14:45
corvusdefinitely not trying to hold up the process14:45
clarkbcorvus: ah yup you said update lgtm yesterday14:45
clarkbI'll get that out momentarily. Thanks for checking14:45
clarkbcorvus: I can leave it off of the zuul list if you prefer too14:47
corvusclarkb: feel free to send it to the zuul list14:48
corvusmy concern is long-term, not short-term :)14:48
clarkbok14:49
*** dviroel is now known as dviroel|lunch14:50
clarkbemails sent14:57
efoleyclarkb: No directive, hopefully just an unfortunate coincidence.15:04
clarkbefoley: ok probably coincidence then. Still important to help others avoid this if we can.15:05
clarkbefoley: ianw mentioned that you tested things on the staging server and it went well. I guess we just wait for the downtime now :)15:05
*** dviroel|lunch is now known as dviroel15:39
*** amoralej is now known as amoralej|off16:11
ildikovHello16:19
ildikovI'm reaching out about an etherpad challenge16:20
ildikovI've been experiencing a lot of reconnects with it and I also work with the StarlingX community and they seem to hit this issue a lot too16:20
ildikovI mentioned the duplicate tab open with the same pad as root cause, but it wasn't the case16:21
ildikovThis is one of the pads where I know people had issues with: https://etherpad.opendev.org/p/stx-status16:21
ildikovso I wonder if there might be something on the server side or it's mainly client issue?16:21
ildikovAnd seeking for help and guidance here if anyone's around to help out :)16:21
ildikovThanks in advance!16:21
clarkbthe other issue we often see (other than the duplicate tab issue) is network connectivity problems16:21
clarkbetherpad is quite stateful and if it loses that connectivity it gets out of sync, then when you try to make an update it complains and forces you to reconnect (and resync the state aiui)16:22
ildikovdoes it matter how big the etherpad has grown?16:23
clarkbwe have also seen that cause problems too. Not sure if that manifests as reconnection issues though16:24
clarkbfungi might recall but isn't around this week16:24
ildikovok, I could ping him next week about that16:24
ildikovit's not that urgent, just wanted to ask as multiple people were reporting this behavior and were upset about it16:25
clarkbI was just able to reproduce the duplicate tab problem and the error we get on the server side appears to be "Error: Can't apply USER_CHANGES, because Trying to submit changes as another author in changeset $stuff"16:26
clarkbwhich we can check against16:26
clarkbildikov: you are the first to report it to us :) I would encourage people to talk to us as we can't know otherwise16:26
ildikovclarkb: will do my best to remind them! :)16:26
clarkbBut also keep in mind these services are volunteer run as best effort and our team needs help. Right now much of the focus is on tools like gerrit and zuul and we don't (at least I don't) have much time to go and debug etherpad16:27
ildikovthe meeting that the etherpad I linked above is used happened 2 and a half hours ago hitting this problem16:27
ildikovI saw the person's browser tabs and it was the only instance of the etherpad there16:27
clarkbildikov: did it happen when they started to use the pad or were they able to update the pad a bit, then later it happened?16:28
ildikovclarkb: yes, I was rather looking for information at this point, like it's a known problem, it's not but we might be able to check if it's a server side problem after all, etc16:28
clarkbildikov: yes to which thing? :)16:29
ildikovclarkb: yes to it's not a priority for this small community right now :)16:29
clarkbfwiw as far as I can tell I only experience these problems if I have the pad open in multiple tabs or experience a network connection problem. Disconnects or resuming from suspend etc16:30
clarkband they are expected in those instances as the software just doesn't handle those cases16:30
ildikovclarkb: I think it kept on happening, I mean when they started and then after that too16:30
ildikovok, noted, I will share this information with the community to keep in mind16:30
clarkbildikov: if/when it happens again it would be great to double check network connectivity as well as any other open tabs. Note that if you open it in another tab in the same browser but a different window that would still cause problems.16:31
clarkbbut also if people are experiencing those problems it is usually easiest to debug them without playing telephone16:31
ildikovwithout playing telephone?16:32
*** marios is now known as marios|out16:33
clarkbildikov: the game of telephone is where you say a message to one person who passes it along to another and so on until the end of the line and then you compare what the message is like at the end16:33
clarkbit is very difficult to debug these problems when the people that have them don't reach out to us directly16:33
clarkbildikov: at 14:06:09 and 14:06:19 I see what I believe is the duplicate tab related error message on the server immediately followed by clients entering stx-status16:39
clarkbunfortunately etherpad doesn't record the pad that triggered the error so it is hard to say for sure that occurred against stx-status16:39
clarkbI see similar again at 14:14:1016:41
clarkbseparately on other pads including stx-release I see a number of errors from Safari users (and I want to say safari is also known to have problems with etherpad?)16:42
ildikovah ok, noted16:43
ildikovthat's good info!16:43
clarkb'Uncaught Error: applySubmittedChangesToBase: no submitted changes to apply' and 'Uncaught TypeError: Cannot read property \'setStateIdle\' of null' are errors that seem to be safari specific16:43
ildikovI will pass on the browser requirements and keep reminding people to not have the same etherpad open at multiple places16:43
clarkbildikov: and if it continues to happen if we can have them pop in here for live debugging while it happens that would be great16:44
clarkbthen we can do things like check other pads (if it happens against a new random pad then it is really unlikely to be a duplicate tab issue and we can probably see the more specific logs that way too)16:44
clarkbI need to pop out now for a bit to get some exercise in. Back in a bit16:45
ildikovclarkb: cool, will do!16:49
ildikovcould try to have a debug session while the next StarlingX community call is happening16:49
ildikovI will send out some hints to people in the meantime16:49
ildikovthanks for all the help!!16:50
*** timburke_ is now known as timburke18:12
opendevreviewVishal Manchanda proposed openstack/project-config master: Retire django-openstack-auth  https://review.opendev.org/c/openstack/project-config/+/80045918:21
mnaserclarkb: "clarkb-db-tester" is a vm launched a while back (8 months ago) on our cloud, is that something you still need?19:22
clarkbmnaser: I don't think so. Do you want me to delete it?19:23
mnaserclarkb: i can if it's easier for you :)19:23
clarkblet me login really quickly and just double check it and then I should be able to delete it19:24
clarkbmnaser: done. I notice in the other region I've still got the shutoff openstackid expansion instances. We're starting to look into the future of hosting that service and I'm 95% sure I can delete those too. I'll put that on my todo list19:28
mnaserclarkb: great thank you19:39
clarkbno thank you for all the great support! :)19:40
opendevreviewClark Boylan proposed opendev/system-config master: Push a patch to try and prevent gerrit openid deletion  https://review.opendev.org/c/opendev/system-config/+/80083220:28
clarkbI wrote some java ^ I have no idea if that is anywhere close to working but I'm hoping that change will give me a test node that I can iterate a bit better on than locally20:29
clarkband a hold has been set on the gerrit 3.2 run job20:30
clarkbIn other news I've learned some things about building gerrit and using bazel.20:30
opendevreviewClark Boylan proposed opendev/system-config master: Push a patch to try and prevent gerrit openid deletion  https://review.opendev.org/c/opendev/system-config/+/80083220:52
clarkbI did bash poorly20:53
opendevreviewGhanshyam proposed openstack/project-config master: Temporarily add official-openstack-repo-jobs for retired repo  https://review.opendev.org/c/openstack/project-config/+/80084021:29
clarkbdoes anyone understand how I used patch wrong in https://32765642782fd47cbefb-da3f8077deb18415bd477eebc664cf39.ssl.cf5.rackcdn.com/800832/2/check/system-config-build-image-gerrit-3.2/f64cf33/job-output.txt ? it says it can't find the files but I set -d and those files exist relative to my local checkout21:36
opendevreviewMerged openstack/project-config master: Use publish-to-pypi-stable-only template for deprecated repo  https://review.opendev.org/c/openstack/project-config/+/80055821:42
clarkbapparently I need to use git diff with --no-prefix21:43
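
For context on the --no-prefix remark above: by default git diff writes file paths with a/ and b/ prefixes, and patch decides how many leading path components to drop based on its -p option. The log does not show how patch was invoked, so the following is only a sketch of the path arithmetic, with a hypothetical file path and a simplified model of -p (it assumes paths with single slashes and no leading slash).

```python
def strip_components(path, count):
    """Simplified model of patch -pCOUNT: drop COUNT leading path components."""
    parts = path.split("/")
    if count < 0 or count >= len(parts):
        raise ValueError("cannot strip %d components from %r" % (count, path))
    return "/".join(parts[count:])


# Hypothetical path, as it would appear in a default git diff header.
PREFIXED = "a/java/com/example/Foo.java"
# The same path as written by git diff --no-prefix.
UNPREFIXED = "java/com/example/Foo.java"

if __name__ == "__main__":
    # With -p0 the prefixed path is looked up as-is and does not exist.
    print(strip_components(PREFIXED, 0))
    # Either stripping one component or generating the diff without
    # prefixes yields a path relative to the checkout.
    print(strip_components(PREFIXED, 1))
    print(strip_components(UNPREFIXED, 0))
```

In other words, the diff and the -p level have to agree: a prefixed diff needs one component stripped, while a --no-prefix diff applies with no stripping.
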
opendevreviewClark Boylan proposed opendev/system-config master: Push a patch to try and prevent gerrit openid deletion  https://review.opendev.org/c/opendev/system-config/+/80083221:44
clarkbianw: comment on https://review.opendev.org/c/opendev/system-config/+/797564 about why it is failing CI. Should be an easy fix. Let me know and I can probably push up a fix for it though21:56
clarkbianw: comment on https://review.opendev.org/c/opendev/zone-opendev.org/+/798244 as well. I think I've reviewed the changes for gerrit stuff. Please let me know if I have missed any21:57
clarkbianw: I added some thoughts to upgrade steps 11, 15, and 17 too if you want to take a look at those21:59
clarkbSeems like it is really coming together. Thanks for all the work on this22:00
*** dviroel is now known as dviroel|out22:01
ianwi woke up at 3:30am and for some reason instantly had the thought that for the dns update, if i force merge the update review.o.o -> review02 the DNS apply playbook is still probably going to want to pull the opendev zone from review.opendev.org22:09
ianwhrm, so actually it clones from opendev.org per https://opendev.org/opendev/system-config/src/branch/master/inventory/service/group_vars/dns.yaml#L122:11
ianwso i guess that means we need to order replication before DNS updates22:12
clarkbgood catch22:24
clarkbianw: you could manually copy the replication config over from the old server to the new one too so that we aren't coordinating ansible stuff for that22:24
clarkbas a shortcut22:25
ianwyeah, i'll go through that and flesh it out22:26
clarkbhrm I'm still failing at using patch properly.22:28
ianwclarkb: how did you find the second account on that other issue that came up?22:29
ianwoh; i was grepping in external-id's branch22:29
clarkbianw: I grepped for the email addresses that they shared22:29
clarkbgit grep emailaddr; git grep otheremailaddr22:30
clarkbsince those are in the files along with the accountIds22:30
ianwbut not in the external-ids branch right?22:31
ianwohhhh, hang on, i see22:32
clarkbyes in refs/meta/external-ids22:32
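
The lookup clarkb describes (grepping the external-ids data for an email address to find which accountIds reference it) can be approximated in a few lines. This is a sketch under assumptions: it takes the note contents as already-extracted strings, assumes the git-config style layout with accountId and email fields, and uses made-up addresses and account numbers.

```python
import re

SECTION = re.compile(r'^\[externalId "(?P<key>[^"]+)"\]\s*$')
FIELD = re.compile(r"^\s*(?P<name>\w+)\s*=\s*(?P<value>.*?)\s*$")


def parse_external_id(text):
    """Parse one external-id note into a dict (assumed file layout)."""
    record = {}
    for line in text.splitlines():
        match = SECTION.match(line)
        if match:
            record["key"] = match.group("key")
            continue
        match = FIELD.match(line)
        if match:
            record[match.group("name")] = match.group("value")
    return record


def accounts_for_email(notes, email):
    """Return the sorted account ids whose external ids carry this email."""
    found = set()
    for note in notes:
        record = parse_external_id(note)
        if record.get("email") == email and "accountId" in record:
            found.add(int(record["accountId"]))
    return sorted(found)


# Hypothetical notes: the same address attached to an old and a new account.
SAMPLE_NOTES = [
    '[externalId "mailto:dev@old-work.example"]\n'
    "\taccountId = 1001\n\temail = dev@old-work.example\n",
    '[externalId "https://login.example/+id/abc"]\n'
    "\taccountId = 33000\n\temail = dev@old-work.example\n",
    '[externalId "username:someoneelse"]\n'
    "\taccountId = 2002\n\temail = other@example.org\n",
]

if __name__ == "__main__":
    print(accounts_for_email(SAMPLE_NOTES, "dev@old-work.example"))
```

More than one account id for a single address is the signal being looked for here: the older account and the newly created one both reference the same email.
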
ianwi didn't have this users @old-work.com email 22:32
ianwnow i see that has come back with the high account id22:32
ianwok, that makes sense.  so i'll add to my notes on review02 to also update this account22:33
*** dmellado_ is now known as dmellado22:34
opendevreviewClark Boylan proposed opendev/system-config master: Push a patch to try and prevent gerrit openid deletion  https://review.opendev.org/c/opendev/system-config/+/80083222:40
clarkbI think it should work now as I managed to replicate the issue locally22:40
melwittdoes anyone know which component in https://docs.opendev.org/opendev/system-config/latest/logstash.html applies tags (like "console", etc) to the log events?23:03
clarkbmelwitt: it is a combo between the logstash ruleset and the log worker processes. Let me find some links23:04
melwittthanks.. I've been looking at gearman worker and client in puppet-log_processor but I'm either missing it or it's somewhere else23:06
clarkbhttps://opendev.org/openstack/logstash-filters/src/branch/master/filters/openstack-filters.conf that is the logstash ruleset and it parses things out of the messages like the severity and timestamp etc23:06
melwittthe tl;dr is I'm slowly learning about logstash and I noticed all of our logstash entries showing in kibana are tagged with _grokparsefailure which means something in the logstash-filters failed to parse (IIUC) and I'm trying to send a known bad console log file through to see what happens and I'm failing at that haha23:07
clarkbmelwitt: https://opendev.org/opendev/base-jobs/src/branch/master/roles/submit-log-processor-jobs/library/submit_log_processor_jobs.py#L131-L17123:08
melwittI realized I need to use a file pusher to send the file lines to logstash and I know we have the log_processor for that. it'd be nice if I could just use filebeat somehow but I realized I don't understand what actually applies tags like "console" and "oslofmt" as these are used in openstack-filters.conf23:09
melwittahhh thank you23:09
clarkband it reads https://opendev.org/opendev/base-jobs/src/branch/master/roles/submit-logstash-jobs/defaults/main.yaml23:09
clarkbconsole and oslofmt are a combo of the two files I just linked. We have a config file saying "these are all the files" then we send out gearman jobs with that data to be processed by the log processors23:10
clarkbthe problem with the upstream tools for this sort of thing is they all assume they are running alongside the service as it is logging. But in a CI system we want to do it after the fact so that the log pipeline doesn't interfere with job results (and runtime)23:11
clarkbit is intentional that we do it after the fact and none of the tools that existed at the time ever considered that use case so we made some simple ones to do it23:11
melwittyeah, I see, thanks a lot23:11
opendevreviewGhanshyam proposed openstack/project-config master: Remove publish-to-pypi from retired neutron-lbaas repo  https://review.opendev.org/c/openstack/project-config/+/80085323:29
opendevreviewGhanshyam proposed openstack/project-config master: Properly retire neutron-lbaas  https://review.opendev.org/c/openstack/project-config/+/80014723:35
clarkbI'll have to check in on my gerrit images tomorrow.23:46
clarkbianw: is there anything else I should be looking at today before dinner?23:46
ianwclarkb: i think we're good thanks.  i'll correct that backup patch, sort out the replication steps and do some cleanup for the old paste server23:58
clarkbsounds good23:59
