corvus | yeah. my opinion is we should either try to get it in shape or discontinue it because it's an attractive nuisance. | 00:00 |
---|---|---|
corvus | as a user of the system, i'd really like to have one well-defined channel for announcements :) | 00:00 |
clarkb | unfortunately I think we have subsets of users that would prefer twitter, others that would prefer a mailing list, others that want wechat and so on | 00:01 |
corvus | that is a separate question that i don't think needs to be answered now :) | 00:02 |
clarkb | sure. I agree that a single well known channel is the ideal | 00:02 |
clarkb | I also think that making it interrupt driven and not poll based is important | 00:02 |
corvus | so let me rephrase that: as a user, i would like one well-defined channel for email service announcements :) | 00:02 |
clarkb | with poll based systems people will only check after the fact and that is often too late | 00:02 |
ianw | some sort of dismissable MOTD type box on review.opendev.org might be good. maybe there's a plugin | 00:02 |
corvus | there used to be some js to load the status alert from there, but i'm pretty sure it got lost in all the upgrades | 00:03 |
corvus | but that's really part of the "another channel" discussion unless you want to stop sending emails | 00:04 |
corvus | so to make this tractable, i'm only suggesting that if we use email to disseminate service announcements, we have a well-defined policy for where we send those announcements :) | 00:04 |
clarkb | corvus: given the current situation for that well defined location do you have any ideas for making it a better subscribed list? | 00:05 |
corvus | that would make it easier for us as admins, and easier for us and all other users because no one has to guess where they should be looking to find this info | 00:05 |
clarkb | one option is that we auto subscribe people | 00:05 |
clarkb | (but then you end up being a spammer) | 00:05 |
clarkb | But I guess if we sent the mailman you're subbed email with an unsub option we'd technically be following the rules around spam? | 00:06 |
corvus | subscribing everyone on "all the lists" to that list so that they will get emails you were going to send to them anyway doesn't sound like spamming to me | 00:06 |
clarkb | ya I think the key is that there be a clear way to unsub which there would be | 00:07 |
corvus | i don't love the option, but i don't think it's evil or wrong | 00:07 |
corvus | or, we send out a couple of reminder emails to everyone and call their bluffs by not sending any more service emails to lists | 00:08 |
clarkb | ya that might be a good first step. Then we can see what membership looks like and how much it moves | 00:08 |
clarkb | fwiw I think we have been using the -announce list a fair bit lately. But the review move did go to the whole set | 00:09 |
ianw | i agree i should have sent the downtime to service-announce, mea culpa for not following that policy sorry | 00:09 |
clarkb | OFTC move, default nodeset change, and git review list have all gone to the announce list in the last few months. | 00:10 |
clarkb | I sent ELK deprecation/removal discussion to the openstack discuss list only as they are the primary (only?) users | 00:10 |
clarkb | that said it's definitely a small list membership last I checked | 00:10 |
clarkb | I want to say it is less than 20? | 00:10 |
clarkb | it has been a while since I looked though | 00:11 |
ianw | i think i confused myself because i've filtered service-announce into a generic infra folder | 00:11 |
clarkb | ya I don't think it is an indication of a change in policy just a mistake. We can continue to stick to -announce. Remind people that is where we are and they should be there too. This puts my draft email in a weird spot because the coverage for that list is small though | 00:13 |
clarkb | and maybe a gerrit motd thing would be a better spot for that anyway. I'll have to think on this a bit more | 00:14 |
corvus | clarkb: if we do want to keep service-announce around, then maybe add a note to your email saying "please subscribe to service-announce to something something something" | 00:14 |
clarkb | corvus: that is a good idea | 00:14 |
corvus | clarkb: not to get too much into the weeds on that, but while i think it's useful for some things, i don't think it's a good match for this (which is indefinite in terms of time). | 00:15 |
corvus | i don't think we could leave a motd up for more than a day and still have it be effective | 00:15 |
corvus | (people learn to skip over things like that pretty quick :) | 00:16 |
clarkb | ya the best thing would be a giant warning over the delete emails button :) | 00:16 |
corvus | ++ | 00:16 |
corvus | that's actually not a bad interim idea if the gerrit fix takes a while | 00:17 |
clarkb | corvus: I added a paragraph to the draft pushing people towards service-announce if you want to take a look at that | 00:17 |
clarkb | corvus: ya we might be able to make that as a simple patch in our builds too. Worth investigating | 00:17 |
ianw | i wonder if that could be done with css | 00:17 |
clarkb | I was hoping to look into the actual fix tomorrow too | 00:17 |
corvus | clarkb: update lgtm | 00:19 |
clarkb | I'll think about this overnight and try to look at the gerrit options more closely tomorrow. But now I need to transform soaking wet children (letting them play with the hose was a bad idea) into dry children for dinner | 00:19 |
ianw | does anyone remember the file we put in on bridge to stop ansible runs? i'm drawing a blank | 00:21 |
clarkb | ianw: there is a script to run to do it | 00:23 |
clarkb | ianw: https://docs.opendev.org/opendev/system-config/latest/bridge.html#running-ansible-on-nodes | 00:24 |
clarkb | `disable-ansible` and you give it a reason string | 00:24 |
ianw | ahh, that's it, thanks | 00:24 |
ianw | alright, i've pruned and reworked https://etherpad.opendev.org/p/gerrit-upgrade-2021 and i think it's pretty complete | 02:00 |
opendevreview | Ghanshyam proposed openstack/project-config master: Use publish-to-pypi-stable-only template for deprecated repo https://review.opendev.org/c/openstack/project-config/+/800558 | 02:16 |
opendevreview | chzhang8 proposed openstack/project-config master: register and bring back tricircle under x namespaces https://review.opendev.org/c/openstack/project-config/+/800743 | 03:51 |
*** ysandeep|away is now known as ysandeep | 03:59 | |
opendevreview | Ian Wienand proposed opendev/zone-opendev.org master: Add paste.opendev.org CNAME https://review.opendev.org/c/opendev/zone-opendev.org/+/800744 | 04:31 |
ianw | ok i fixed the problem with paste.opendev.org taking 2+ minutes to post a new paste; the clue was it only happened when i imported the db | 04:34 |
ianw | turns out, i don't quite know why, after you put in a new paste, it looks it up and the query has "WHERE pastes.user_hash = 'f845053720d2963a3087e1bdfac6c62630a09451'" | 04:34 |
ianw | well pastes.user_hash wasn't indexed and with all the records it was reading the whole db back. so i did a CREATE INDEX user_hash ON pastes(user_hash); and now it works | 04:35 |
ianw | i'm going to cut it over, to avoid the old and new server going out of sync | 04:36 |
ianw | i'm guessing somehow the ancient trove db it was connected to uses different table types, or indexes, or something else, and that's why this isn't seen there | 04:36 |
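
The effect ianw describes can be reproduced in miniature. The following is a minimal sketch, assuming SQLite as a stand-in for the real paste database (the log does not say which engine is in use) and an invented three-column `pastes` table; only the `user_hash` column, the `WHERE` clause and the `CREATE INDEX` statement come from the discussion above. It asks the database for its query plan before and after the index is created.

```python
import sqlite3

# Stand-in schema: only pastes.user_hash comes from the log; the other
# columns and all row data are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pastes (paste_id INTEGER PRIMARY KEY, code TEXT, user_hash TEXT)"
)
conn.executemany(
    "INSERT INTO pastes (code, user_hash) VALUES (?, ?)",
    [(f"paste {i}", f"{i:040x}") for i in range(1000)],
)

QUERY = "SELECT * FROM pastes WHERE user_hash = ?"


def query_plan(connection):
    """Return SQLite's plan text for the user_hash lookup."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("0" * 40,)).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX user_hash ON pastes(user_hash)")
plan_after = query_plan(conn)   # the lookup can now go through the index

print(plan_before)
print(plan_after)
```

The exact plan wording differs between SQLite versions, and other engines expose plans through their own `EXPLAIN` variants, so the printed text should be read as illustrative only.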
opendevreview | Ian Wienand proposed opendev/system-config master: lodgeit: correct database path https://review.opendev.org/c/opendev/system-config/+/800745 | 04:41 |
opendevreview | Merged opendev/zone-opendev.org master: Add paste.opendev.org CNAME https://review.opendev.org/c/opendev/zone-opendev.org/+/800744 | 04:43 |
opendevreview | Merged opendev/system-config master: lodgeit: correct database path https://review.opendev.org/c/opendev/system-config/+/800745 | 05:38 |
*** bhagyashris_ is now known as bhagyashris|ruck | 06:25 | |
*** ysandeep is now known as ysandeep|afk | 06:51 | |
opendevreview | Merged openstack/project-config master: Create repo for Hashicorp Vault deployment https://review.opendev.org/c/openstack/project-config/+/799822 | 06:59 |
*** amoralej|off is now known as amoralej | 07:03 | |
ianw | #status log paste.openstack.org migrated to paste.opendev.org | 07:06 |
opendevstatus | ianw: finished logging | 07:06 |
opendevreview | chzhang8 proposed openstack/project-config master: register and bring back tricircle under x namespaces https://review.opendev.org/c/openstack/project-config/+/800750 | 08:07 |
*** ykarel is now known as ykarel|lunch | 08:53 | |
*** ysandeep|afk is now known as ysandeep | 09:53 | |
*** ykarel|lunch is now known as ykarel | 10:16 | |
*** dviroel|out is now known as dviroel | 11:24 | |
*** amoralej is now known as amoralej|lunch | 13:01 | |
*** ysandeep is now known as ysandeep|PTO | 13:31 | |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Add Vault role to Zuul jobs https://review.opendev.org/c/openstack/project-config/+/799825 | 13:41 |
*** amoralej|lunch is now known as amoralej | 13:46 | |
opendevreview | Ghanshyam proposed openstack/project-config master: Use publish-to-pypi-stable-only template for deprecated repo https://review.opendev.org/c/openstack/project-config/+/800558 | 14:24 |
clarkb | a second red hatter has orphaned their account | 14:36 |
*** ykarel is now known as ykarel|away | 14:36 | |
clarkb | efoley: ^ is there a general directive or instruction at red hat to make these updates? I'd like to get ahead of that if so because the gerrit bug makes that process very tricky | 14:36 |
clarkb | corvus: I think I'll go ahead and send out my draft to the various lists now as well to try and get ahead of this | 14:37 |
clarkb | corvus: are you comfortable with that now that I have added the bit about joining the announce list? | 14:37 |
corvus | clarkb: ++ (i think i said so yesterday, if not, sorry!) | 14:45 |
corvus | definitely not trying to hold up the process | 14:45 |
clarkb | corvus: ah yup you said update lgtm yesterday | 14:45 |
clarkb | I'll get that out momentarily. Thanks for checking | 14:45 |
clarkb | corvus: I can leave it off of the zuul list if you prefer too | 14:47 |
corvus | clarkb: feel free to send it to the zuul list | 14:48 |
corvus | my concern is long-term, not short-term :) | 14:48 |
clarkb | ok | 14:49 |
*** dviroel is now known as dviroel|lunch | 14:50 | |
clarkb | emails sent | 14:57 |
efoley | clarkb: No directive, hopefully just an unfortunate coincidence. | 15:04 |
clarkb | efoley: ok probably coincidence then. Still important to help others avoid this if we can. | 15:05 |
clarkb | efoley: ianw mentioned that you tested things on the staging server and it went well. I guess we just wait for the downtime now :) | 15:05 |
*** dviroel|lunch is now known as dviroel | 15:39 | |
*** amoralej is now known as amoralej|off | 16:11 | |
ildikov | Hello | 16:19 |
ildikov | I'm reaching out about an etherpad challenge | 16:20 |
ildikov | I've been experiencing a lot of reconnects with it and I also work with the StarlingX community and they seem to hit this issue a lot too | 16:20 |
ildikov | I mentioned the duplicate tab open with the same pad as root cause, but it wasn't the case | 16:21 |
ildikov | This is one of the pads where I know people had issues with: https://etherpad.opendev.org/p/stx-status | 16:21 |
ildikov | so I wonder if there might be something on the server side or it's mainly client issue? | 16:21 |
ildikov | And seeking for help and guidance here if anyone's around to help out :) | 16:21 |
ildikov | Thanks in advance! | 16:21 |
clarkb | the other issue we often see (other than the duplicate tab issue) is network connectivity problems | 16:21 |
clarkb | etherpad is quite stateful and if it loses that connectivity it gets out of sync, then when you try to make an update it complains and forces you to reconnect (and resync the state aiui) | 16:22 |
ildikov | does it matter how big the etherpad has grown? | 16:23 |
clarkb | we have also seen that cause problems too. Not sure if that manifests as reconnection issues though | 16:24 |
clarkb | fungi might recall but isn't around this week | 16:24 |
ildikov | ok, I could ping him next week about that | 16:24 |
ildikov | it's not that urgent, just wanted to ask as multiple people were reporting this behavior and were upset about it | 16:25 |
clarkb | I was just able to reproduce the duplicate tab problem and the error we get on the server side appears to be "Error: Can't apply USER_CHANGES, because Trying to submit changes as another author in changeset $stuff" | 16:26 |
clarkb | which we can check against | 16:26 |
clarkb | ildikov: you are the first to report it to us :) I would encourage people to talk to us as we can't know otherwise | 16:26 |
ildikov | clarkb: will do my best to remind them! :) | 16:26 |
clarkb | But also keep in mind these services are volunteer run as best effort and our team needs help. Right now much of the focus is on tools like gerrit and zuul and we don't (at least I don't) have much time to go and debug etherpad | 16:27 |
ildikov | the meeting that uses the etherpad I linked above happened two and a half hours ago and hit this problem | 16:27 |
ildikov | I saw the person's browser tabs and it was the only instance of the etherpad there | 16:27 |
clarkb | ildikov: did it happen when they started to use the pad or were they able to update the pad a bit, then later it happened? | 16:28 |
ildikov | clarkb: yes, I was rather looking for information at this point, like it's a known problem, it's not but we might be able to check if it's a server side problem after all, etc | 16:28 |
clarkb | ildikov: yes to which thing? :) | 16:29 |
ildikov | clarkb: yes to it's not a priority for this small community right now :) | 16:29 |
clarkb | fwiw as far as I can tell I only experience these problems if I have the pad open in multiple tabs or experience a network connection problem. Disconnects or resuming from suspend etc | 16:30 |
clarkb | and they are expected in those instances as the software just doesn't handle those cases | 16:30 |
ildikov | clarkb: I think it kept on happening, I mean when they started and then after that too | 16:30 |
ildikov | ok, noted, I will share this information with the community to keep in mind | 16:30 |
clarkb | ildikov: if/when it happens again it would be great to double check network connectivity as well as any other open tabs. Note that if you open it in another tab in the same browser but a different window that would still cause problems. | 16:31 |
clarkb | but also if people are experiencing those problems it is usually easiest to debug them without playing telephone | 16:31 |
ildikov | without playing telephone? | 16:32 |
*** marios is now known as marios|out | 16:33 | |
clarkb | ildikov: the game of telephone is where you say a message to one person who passes it along to another and so on until the end of the line and then you compare what the message is like at the end | 16:33 |
clarkb | it is very difficult to debug these problems when the people that have them don't reach out to us directly | 16:33 |
clarkb | ildikov: at 14:06:09 and 14:06:19 I see what I believe is the duplicate tab related error message on the server immediately followed by clients entering stx-status | 16:39 |
clarkb | unfortunately etherpad doesn't record the pad that triggered the error so it is hard to say for sure that occurred against stx-status | 16:39 |
clarkb | I see similar again at 14:14:10 | 16:41 |
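
The kind of check clarkb describes, looking for the duplicate-tab error and noting when it occurred, could be sketched as below. This is only an illustration: the error text is the one quoted earlier in the log, but the sample lines and their layout (a timestamp followed by the message) are made up, since the real etherpad log format is not shown here.

```python
# Error text quoted in the discussion above.
MARKER = "Can't apply USER_CHANGES"

# Invented sample lines; the real log layout is an assumption.
SAMPLE_LOG = [
    "10:00:01 [ERROR] Error: Can't apply USER_CHANGES, because Trying to submit changes as another author in changeset X",
    "10:00:02 [INFO] client entered a pad",
    "10:05:30 [ERROR] Error: Can't apply USER_CHANGES, because Trying to submit changes as another author in changeset Y",
]


def find_duplicate_tab_errors(lines):
    """Return (timestamp, message) pairs for lines containing the marker."""
    hits = []
    for line in lines:
        if MARKER in line:
            # Assumes the first whitespace-separated token is the timestamp.
            timestamp, _, message = line.partition(" ")
            hits.append((timestamp, message))
    return hits


for timestamp, _message in find_duplicate_tab_errors(SAMPLE_LOG):
    print(timestamp)
```

As noted above, the error itself does not name the pad, so the timestamps can only be correlated with nearby client activity by hand.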
clarkb | separately on other pads including stx-release I see a number of errors from Safari users (and I want to say safari is also known to have problems with etherpad?) | 16:42 |
ildikov | ah ok, noted | 16:43 |
ildikov | that's good info! | 16:43 |
clarkb | 'Uncaught Error: applySubmittedChangesToBase: no submitted changes to apply' and 'Uncaught TypeError: Cannot read property \'setStateIdle\' of null' are errors that seem to be safari specific | 16:43 |
ildikov | I will pass on the browser requirements and keep reminding people to not have the same etherpad open at multiple places | 16:43 |
clarkb | ildikov: and if it continues to happen if we can have them pop in here for live debugging while it happens that would be great | 16:44 |
clarkb | then we can do things like check other pads (if it happens against a new random pad then it is really unlikely to be a duplicate tab issue and we can probably see the more specific logs that way too) | 16:44 |
clarkb | I need to pop out now for a bit to get some exercise in. Back in a bit | 16:45 |
ildikov | clarkb: cool, will do! | 16:49 |
ildikov | could try to have a debug session while the next StarlingX community call is happening | 16:49 |
ildikov | I will send out some hints to people in the meantime | 16:49 |
ildikov | thanks for all the help!! | 16:50 |
*** timburke_ is now known as timburke | 18:12 | |
opendevreview | Vishal Manchanda proposed openstack/project-config master: Retire django-openstack-auth https://review.opendev.org/c/openstack/project-config/+/800459 | 18:21 |
mnaser | clarkb: "clarkb-db-tester" is a vm launched a while back (8 months ago) on our cloud, is that something you still need? | 19:22 |
clarkb | mnaser: I don't think so. Do you want me to delete it? | 19:23 |
mnaser | clarkb: i can if it's easier for you :) | 19:23 |
clarkb | let me login really quickly and just double check it and then I should be able to delete it | 19:24 |
clarkb | mnaser: done. I notice in the other region I've still got the shutoff openstackid expansion instances. We're starting to look into the future of hosting that service and I'm 95% sure I can delete those too. I'll put that on my todo list | 19:28 |
mnaser | clarkb: great thank you | 19:39 |
clarkb | no thank you for all the great support! :) | 19:40 |
opendevreview | Clark Boylan proposed opendev/system-config master: Push a patch to try and prevent gerrit openid deletion https://review.opendev.org/c/opendev/system-config/+/800832 | 20:28 |
clarkb | I wrote some java ^ I have no idea if that is anywhere close to working but I'm hoping that change will give me a test node that I can iterate a bit better on than locally | 20:29 |
clarkb | and a hold has been set on the gerrit 3.2 run job | 20:30 |
clarkb | In other news I've learned some things about building gerrit and using bazel. | 20:30 |
opendevreview | Clark Boylan proposed opendev/system-config master: Push a patch to try and prevent gerrit openid deletion https://review.opendev.org/c/opendev/system-config/+/800832 | 20:52 |
clarkb | I did bash poorly | 20:53 |
opendevreview | Ghanshyam proposed openstack/project-config master: Temporarily add official-openstack-repo-jobs for retired repo https://review.opendev.org/c/openstack/project-config/+/800840 | 21:29 |
clarkb | does anyone understand how I used patch wrong in https://32765642782fd47cbefb-da3f8077deb18415bd477eebc664cf39.ssl.cf5.rackcdn.com/800832/2/check/system-config-build-image-gerrit-3.2/f64cf33/job-output.txt ? it says it can't find the files but I set -d and those files exist relative to my local checkout | 21:36 |
opendevreview | Merged openstack/project-config master: Use publish-to-pypi-stable-only template for deprecated repo https://review.opendev.org/c/openstack/project-config/+/800558 | 21:42 |
clarkb | apparently I need to use git diff with --no-prefix | 21:43 |
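
For context on why `--no-prefix` matters: a default `git diff` writes file names with `a/` and `b/` prefixes, and `patch -pN` strips N leading path components from each name before looking for the file. The sketch below mimics that stripping rule for relative paths. The file name is hypothetical, and the log does not show which strip level the job actually used, so this illustrates the mismatch in general rather than diagnosing that job.

```python
def strip_components(path: str, count: int) -> str:
    """Drop `count` leading path components, as `patch -p<count>` does."""
    parts = path.split("/")
    return "/".join(parts[count:])


# Hypothetical file that exists relative to the directory given with -d.
existing_files = {"java/com/example/Foo.java"}

prefixed = "b/java/com/example/Foo.java"  # name in a default `git diff`
unprefixed = "java/com/example/Foo.java"  # name with `git diff --no-prefix`

# With no stripping, the prefixed name does not match any real file.
print(strip_components(prefixed, 0) in existing_files)    # False
# Stripping one component removes the b/ prefix.
print(strip_components(prefixed, 1) in existing_files)    # True
# A --no-prefix diff matches without any stripping.
print(strip_components(unprefixed, 0) in existing_files)  # True
```

Either side can be adjusted: keep the default prefixes and apply with a strip level of one, or generate the diff with `--no-prefix` and apply it without stripping.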
opendevreview | Clark Boylan proposed opendev/system-config master: Push a patch to try and prevent gerrit openid deletion https://review.opendev.org/c/opendev/system-config/+/800832 | 21:44 |
clarkb | ianw: comment on https://review.opendev.org/c/opendev/system-config/+/797564 about why it is failing CI. Should be an easy fix. Let me know and I can probably push up a fix for it though | 21:56 |
clarkb | ianw: comment on https://review.opendev.org/c/opendev/zone-opendev.org/+/798244 as well. I think I've reviewed the changes for gerrit stuff. Please let me know if I have missed any | 21:57 |
clarkb | ianw: I added some thoughts to upgrade steps 11, 15, and 17 too if you want to take a look at those | 21:59 |
clarkb | Seems like it is really coming together. Thanks for all the work on this | 22:00 |
*** dviroel is now known as dviroel|out | 22:01 | |
ianw | i woke up at 3:30am and for some reason instantly had the thought that for the dns update, if i force merge the update review.o.o -> review02 the DNS apply playbook is still probably going to want to pull the opendev zone from review.opendev.org | 22:09 |
ianw | hrm, so actually it clones from opendev.org per https://opendev.org/opendev/system-config/src/branch/master/inventory/service/group_vars/dns.yaml#L1 | 22:11 |
ianw | so i guess that means we need to order replication before DNS updates | 22:12 |
clarkb | good catch | 22:24 |
clarkb | ianw: you could manually copy the replication config over from the old server to the new one too so that we aren't coordinating ansible stuff for that | 22:24 |
clarkb | as a shortcut | 22:25 |
ianw | yeah, i'll go through that and flesh it out | 22:26 |
clarkb | hrm I'm still failing at using patch properly. | 22:28 |
ianw | clarkb: how did you find the second account on that other issue that came up? | 22:29 |
ianw | oh; i was grepping in the external-ids branch | 22:29 |
clarkb | ianw: I grepped for the email addresses that they shared | 22:29 |
clarkb | git grep emailaddr; git grep otheremailaddr | 22:30 |
clarkb | since those are in the files along with the accountIds | 22:30 |
ianw | but not in the external-ids branch right? | 22:31 |
ianw | ohhhh, hang on, i see | 22:32 |
clarkb | yes in refs/meta/external-ids | 22:32 |
ianw | i didn't have this user's @old-work.com email | 22:32 |
ianw | now i see that has come back with the high account id | 22:32 |
ianw | ok, that makes sense. so i'll add to my notes on review02 to also update this account | 22:33 |
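
The lookup clarkb describes works because each file in `refs/meta/external-ids` holds a small git-config style record carrying both an `accountId` and an `email`. The sketch below is a rough illustration under that assumption: it parses a few such records and groups account IDs by email address, which is what the two `git grep` calls achieve by eye. The record keys, addresses and account numbers are all invented, and in a real checkout each record lives in its own file.

```python
import re
from collections import defaultdict

SECTION = re.compile(r'^\[externalId "(?P<key>.+)"\]$')
OPTION = re.compile(r"^(?P<name>\w+)\s*=\s*(?P<value>.*)$")

# Invented records in the assumed layout, concatenated for brevity.
SAMPLE = """\
[externalId "https://login.example/openid/old"]
  accountId = 1001
  email = dev@old-work.example
[externalId "https://login.example/openid/new"]
  accountId = 33333
  email = dev@new-work.example
[externalId "username:someoneelse"]
  accountId = 2002
  email = other@example.org
"""


def parse_external_ids(text):
    """Parse git-config style externalId sections into dictionaries."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        section = SECTION.match(line)
        if section:
            current = {"key": section.group("key")}
            records.append(current)
            continue
        option = OPTION.match(line)
        if option and current is not None:
            current[option.group("name")] = option.group("value")
    return records


def accounts_by_email(records, emails):
    """Map each wanted email address to the account IDs that mention it."""
    found = defaultdict(set)
    for record in records:
        if record.get("email") in emails:
            found[record["email"]].add(int(record["accountId"]))
    return dict(found)


wanted = {"dev@old-work.example", "dev@new-work.example"}
print(accounts_by_email(parse_external_ids(SAMPLE), wanted))
```

Two different account IDs turning up for addresses that belong to one person is the pattern being discussed: the newer login landed on a fresh, higher-numbered account.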
*** dmellado_ is now known as dmellado | 22:34 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Push a patch to try and prevent gerrit openid deletion https://review.opendev.org/c/opendev/system-config/+/800832 | 22:40 |
clarkb | I think it should work now as I managed to replicate the issue locally | 22:40 |
melwitt | does anyone know which component in https://docs.opendev.org/opendev/system-config/latest/logstash.html applies tags (like "console", etc) to the log events? | 23:03 |
clarkb | melwitt: it is a combo between the logstash ruleset and the log worker processes. Let me find some links | 23:04 |
melwitt | thanks.. I've been looking at gearman worker and client in puppet-log_processor but I'm either missing it or it's somewhere else | 23:06 |
clarkb | https://opendev.org/openstack/logstash-filters/src/branch/master/filters/openstack-filters.conf that is the logstash ruleset and it parses things out of the messages like the severity and timestamp etc | 23:06 |
melwitt | the tl;dr is I'm slowly learning about logstash and I noticed all of our logstash entries showing in kibana are tagged with _grokparsefailure which means something in the logstash-filters failed to parse (IIUC) and I'm trying to send a known bad console log file through to see what happens and I'm failing at that haha | 23:07 |
clarkb | melwitt: https://opendev.org/opendev/base-jobs/src/branch/master/roles/submit-log-processor-jobs/library/submit_log_processor_jobs.py#L131-L171 | 23:08 |
melwitt | I realized I need to use a file pusher to send the file lines to logstash and I know we have the log_processor for that. it'd be nice if I could just use filebeat somehow but I realized I don't understand what actually applies tags like "console" and "oslofmt" as these are used in openstack-filters.conf | 23:09 |
melwitt | ahhh thank you | 23:09 |
clarkb | and it reads https://opendev.org/opendev/base-jobs/src/branch/master/roles/submit-logstash-jobs/defaults/main.yaml | 23:09 |
clarkb | console and oslofmt are a combo of the two files I just linked. We have a config file saying "these are all the files" then we send out gearman jobs with that data to be processed by the log processors | 23:10 |
clarkb | the problem with the upstream tools for this sort of thing is they all assume they are running alongside the service as it is logging. But in a CI system we want to do it after the fact so that the log pipeline doesn't interfere with job results (and runtime) | 23:11 |
clarkb | it is intentional that we do it after the fact and none of the tools that existed at the time ever considered that use case so we made some simple ones to do it | 23:11 |
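
The flow clarkb outlines (a config listing file patterns and their tags, then one job per matching log file) might look roughly like the sketch below. It is not the code from the linked module: the config shape, the patterns and the job dictionary are assumptions made for illustration, and only the tag names `console` and `oslofmt` come from the discussion.

```python
import re

# Hypothetical config in an assumed shape: a file-name pattern plus the
# tags to attach to every line of a matching file.
CONFIG = {
    "files": [
        {"name": r"job-output\.txt$", "tags": ["console"]},
        {"name": r"logs/screen-.*\.txt$", "tags": ["screen", "oslofmt"]},
    ]
}


def tags_for(path, config=CONFIG):
    """Collect the tags of every config entry whose pattern matches path."""
    tags = []
    for entry in config["files"]:
        if re.search(entry["name"], path):
            tags.extend(entry["tags"])
    return tags


def build_jobs(paths, config=CONFIG):
    """Describe one processing job per log file that matched some entry."""
    jobs = []
    for path in paths:
        tags = tags_for(path, config)
        if tags:
            jobs.append({"source": path, "tags": tags})
    return jobs


uploaded = ["job-output.txt", "logs/screen-n-api.txt", "logs/unrelated.bin"]
for job in build_jobs(uploaded):
    print(job)
```

In this picture the tags are decided before logstash sees any log line, which is why they cannot be found in the filter rules themselves; the filters only branch on tags that are already present.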
melwitt | yeah, I see, thanks a lot | 23:11 |
opendevreview | Ghanshyam proposed openstack/project-config master: Remove publish-to-pypi from retired neutron-lbaas repo https://review.opendev.org/c/openstack/project-config/+/800853 | 23:29 |
opendevreview | Ghanshyam proposed openstack/project-config master: Properly retire neutron-lbaas https://review.opendev.org/c/openstack/project-config/+/800147 | 23:35 |
clarkb | I'll have to check in on my gerrit images tomorrow. | 23:46 |
clarkb | ianw: is there anything else I should be looking at today before dinner? | 23:46 |
ianw | clarkb: i think we're good thanks. i'll correct that backup patch, sort out the replication steps and do some cleanup for the old paste server | 23:58 |
clarkb | sounds good | 23:59 |