Thursday, 2021-07-15

opendevreviewIan Wienand proposed opendev/system-config master: backups: add
opendevreviewIan Wienand proposed opendev/ master: Update to
ianwugh that made me realise we want to make sure we move review02 out of review-staging group appropriately too00:32
opendevreviewIan Wienand proposed opendev/system-config master: review02: move out of staging group
opendevreviewIan Wienand proposed opendev/system-config master: backups: add
Clark[m]ianw: are there bits in the review and Gerrit groups that we need on the server like ssh keys or LP credentials that we might need early in the migration?03:08
Clark[m]Your catch of the group move made me think of that. If so we might want to add them to the staging group now and get them landed and in place early03:09
ianwClark[m]: i think that is all pretty much there, as i basically copied the review01 hosts .yaml file on bridge to review0203:09
Clark[m]Ok. Thought I would double check03:10
ianw++ :) i'm feeling better about the plan sorting a few of these bits out03:13
Clark[m]Worst case we'll do a manual sync and then fix Ansible after :)03:15
*** ykarel|away is now known as ykarel04:32
opendevreviewIan Wienand proposed opendev/system-config master: Remove
opendevreviewIan Wienand proposed opendev/system-config master: Remove
opendevreviewIan Wienand proposed opendev/system-config master: Add to backup
opendevreviewMerged opendev/system-config master: borg-backup: exclude /var/lib/snapd
opendevreviewMerged openstack/project-config master: Remove publish-to-pypi from retired neutron-lbaas repo
*** amoralej|off is now known as amoralej06:10
opendevreviewOpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml
opendevreviewchzhang8 proposed openstack/project-config master: register and bring back tricircle under x namespaces
opendevreviewMerged opendev/system-config master: Add to backup
opendevreviewxinliang proposed opendev/system-config master: Enable openEuler mirroring
*** rpittau|afk is now known as rpittau07:24
*** ykarel is now known as ykarel|lunch09:17
*** ykarel|lunch is now known as ykarel11:00
*** dviroel|out is now known as dviroel11:02
opendevreviewAnanya Banerjee proposed opendev/elastic-recheck master: Run elastic-recheck container
opendevreviewAnanya Banerjee proposed opendev/elastic-recheck master: Run elastic-recheck container
*** amoralej is now known as amoralej|lunch11:50
*** amoralej|lunch is now known as amoralej|away11:50
*** iurygregory_ is now known as iurygregory12:07
opendevreviewThierry Carrez proposed opendev/yaml2ical master: Update hacking version
opendevreviewAnanya Banerjee proposed opendev/elastic-recheck master: Run elastic-recheck container
opendevreviewThierry Carrez proposed opendev/yaml2ical master: Report which week a meeting occurs.
*** chkumar|rover is now known as chandankumar14:05
mnaserinfra-root: how long does a gerrit restart usually take?14:37
mnaseri'm asking because since we're moving to a new dc, we might need to potentially organize a very short reboot of the instance.  i'm wondering if it is beneficial for us to 'rush' through the current gerrit instance before migration so we avoid it before the actual day of the migration14:37
clarkbmnaser: A normal restart of the gerrit service to update the docker image takes only a few minutes. Maybe 2-314:53
mnaserclarkb: ok so it's not some wild thing that will cause a ton of interruption because it has to index something14:53
clarkbmnaser: no, the reindexing is necessary during certain upgrade scenarios (we are also doing a reindex during our migration on sunday/monday to avoid needing to sync the indexes between servers)14:54
clarkbif we need to coordinate a reboot in a week or two I don't expect that to be problematic. But it would also be fine to do it in the next day or two.14:55
melwittclarkb: eventually I got my logstash working with filebeat pushing the log lines from a sample job-output.txt file and so far I'm not able to get a _grokparsefailure tag happening from it like I see in our kibana. I came across this while googling where someone observed _grokparsefailure only when log events were overloading the system,15:41
melwittwhereas in the debugger there was never a problem. it made me wonder, are our logstash batching settings in a public repo anywhere? just curious about how they are set15:41
clarkbmelwitt: I don't know that we set any config like that. We just run the service from the package install then feed it the configs in that logstash configs repo15:48
clarkbalso it is an older version of logstash (one of the many issues with the aging ELK infrastructure)15:48
melwittclarkb: ah ok. yeah apparently you can configure logstash to filter events in batches to alleviate some congestion issues if you have congestion issues15:49
melwittdo we know what version it is? I could try installing that15:49
clarkbmelwitt: should be what we grab15:50
clarkbyou're welcome15:55
opendevreviewMerged openstack/project-config master: Normalize projects.yaml
*** rpittau is now known as rpittau|afk16:17
*** marios is now known as marios|out16:25
opendevreviewRich Bowen proposed opendev/yaml2ical master: Report which week a meeting occurs.
opendevreviewRich Bowen proposed opendev/yaml2ical master: Report which week a meeting occurs.
*** ykarel is now known as ykarel|away16:32
*** sshnaidm is now known as sshnaidm|afk16:35
opendevreviewRich Bowen proposed opendev/yaml2ical master: Report which week a meeting occurs.
* melwitt is now running with logstash 2.4.1 😅16:57
melwittwhat I've gleaned so far is that logstash is by far the biggest offender for resource consumption (both memory and cpu) but elasticsearch and kibana are still considered best in class17:29
clarkbelasticsearch needs a lot of memory too since it loads indexes into memory17:30
clarkbthe problem with modern kibana is there wasn't a way to use it with a RO elasticsearch iirc17:31
clarkbthere are definitely some rough edges when it comes to the use case we've got :/17:31
melwittand apparently elasticsearch has something called ingest pipelines that can be used to do filtering (instead of logstash) and that it's possible/favored to run elasticsearch and kibana without logstash17:31
melwittI'll read up on RO elasticsearch + kibana17:33
clarkbmelwitt: I think the amazon fork may have some of the functionality needed to control access in a way that kibana might be able to use it17:33
melwittok I'll look for that. I'm not familiar with the RO mode17:35
melwittI mean, I'm not familiar with any of it really but this is the first I have heard of the RO mode heh17:35
clarkbmelwitt: well the RO mode is something we did because we want this data to be accessible. That is why you have to talk to it via that proxy17:37
clarkbthe proxy has a list of allowed api queries and they don't allow writes17:37
melwittoh, I see17:37
clarkbthe problem becomes that kibana wants to use elasticsearch to operate and to do that it needs to write to elasticsearch. But if we do that then no one can access elasticsearch17:37
clarkb(because it isn't safe to have a giant elasticsearch cluster on the internet in that way)17:37
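A minimal sketch of the kind of allowlist check such a read-only proxy could apply, assuming a simple (method, path pattern) table. The patterns, the `ALLOWED` table, and the `is_allowed` helper are hypothetical illustrations, not the rules the actual proxy uses.

```python
import re

# Hypothetical allowlist: (HTTP method, path regex) pairs treated as read-only.
# The real proxy's list of allowed API queries is not shown in this log.
ALLOWED = [
    ("GET", re.compile(r"^/[^/]*/_search$")),
    ("POST", re.compile(r"^/[^/]*/_search$")),  # searches may carry a body
    ("GET", re.compile(r"^/_cluster/health$")),
]


def is_allowed(method: str, path: str) -> bool:
    """Return True only if the request matches an allowlisted read query."""
    method = method.upper()
    return any(m == method and rx.match(path) for m, rx in ALLOWED)
```

Anything not explicitly listed is rejected, so a client that needs to write to the cluster (as kibana does for its own state) cannot work through a proxy like this.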
melwittgot it17:38
clarkbbasically the main reason we got in this lack of upgrades problem was kibana. Old kibana didn't work with new elasticsearch but new kibana didn't work with RO elasticsearch17:38
clarkband we let that rot on the vine for too long :(17:38
melwittI see. ok. so far I don't have a sense of what kibana alternatives there are, I will look deeper at that17:39
melwittso far, alternatives in general look complicated in comparison to ELK so I'm concerned about ease of use and maintenance too17:41
melwittit looks like dropping the L would be pretty simple but beyond that, I haven't seen anything clear17:42
clarkbit's possible we punt on kibana and tell people to use the api. Not a great answer but might help simplify things17:48
clarkbin trying to test my fix for the openid deletions via email deletion I've discovered that gerrit will not let you delete your preferred email. So now I need to figure out adding a second email address to my account on the test node without doing verification. I think I can do that as admin but this got more complicated17:49
melwittyeah.. it's been so nice using kibana17:58
clarkbError 409 (Conflict): Cannot remove e-mail 'testemail' which is directly associated with OPENID_SSO authentication <- success! I think17:58
clarkbinfra-root ^ It lets you click the delete button in the settings page and removes it from the ui but when you click on save settings you get this error17:59
clarkbI've got a meeting with our second broken gerrit account user to help them test the fix against review02 shortly, then lunch, but once that is done I'll try to push my patch up to gerrit18:05
clarkbI'd like to avoid forking gerrit again with our patch so will do my very best to make the change mergeable upstream18:05
clarkbianw: I had to modify the canonical web url on review02 to do testing of the gerrit account fixup. Details are in the email. I figured I'd leave that in place for a bit just in case we need to do any more similar testing, but when your day starts we can revert that to the way we want it18:59
clarkband now lunch19:00
opendevreviewVishal Manchanda proposed openstack/project-config master: Retire django-openstack-auth
clarkbinfra-root exists now19:45
clarkbif y'all can take a look at that I would appreciate it19:46
clarkbcorvus: do you know how I view the results of their CI jobs? I have a failing code style check and I can't find a way to see why it failed19:53
clarkboh wait there is a little icon next to the rerun button19:53
*** dviroel is now known as dviroel|brb19:53
corvusyeah that :)19:54
clarkbunfortunately that still doesn't tell me what went wrong just that google-java-format-1.7 produced output19:54
* clarkb looks in the local repo to see if that is runnable from there19:55
corvusi'm stumped too19:55
* fungi places ci errors next to a rerun button, thinking surely that won't encourage users to just rerun things19:56
clarkbrun `./tools/` to download a local copy and set up a wrapper script.19:56
clarkbI'll wait for the other checks to run too in case I have to fix other things19:56
clarkbI have fumbled my way through that (I didn't run their script but fetched the jar myself and ran it. Couldn't figure out how to run it against the whole repo at once so did files individually). I had an extra unnecessary import20:03
clarkbI'll wait for the other builds before pushing my fix20:03
clarkbthat's neat. Pushing my second patchset gets hit with an error20:19
clarkbI'll give it a few minutes then try again. If it fails again I guess I ask about it on slack20:20
clarkbremote: INTERNAL Internal error encountered20:21
clarkbI can't add comments to my change now either it seems and someone on slack says I should file an issue because it is probably a problem with the servers20:26
*** dviroel|brb is now known as dviroel20:27
clarkb has been filed in response to this problem20:37
clarkbgood news is that my change seems to do what I want on the test server. I can delete email addresses not associated with my openid and cannot delete ones associated with my openid. And when I show up as a new user it is able to create a new account for me21:01
*** dviroel is now known as dviroel|out21:24
ianwclarkb: thanks for working through that.  we can leave as is and add note in the checklist to update it if you like21:48
clarkbianw: at this point I suspect that we can revert it back to normal since no need for additional testing has come up21:49
ianw++ one less thing seems good21:49
clarkbI can do that in a few21:49
ianwi don't know about the plugin that seems to be highlighting references in
ianwit seems to make my mouse move slower when i'm in the green area, it's very weird21:51
clarkbianw: it could also be an update to the core software on latest gerrit21:51
ianwi can also pin a cpu by wiggling my mouse in there21:52
ianwi guess that's related21:52
clarkbianw: if you want to test it as well on the test node you need to manually add an email address to your account with the admin account because I don't think we can do verification of the email addresses there and you can't delete your preferred email (which your only email from openid will be)21:53
clarkbin my testing I logged in to create a new account. Added an email via the api as admin. Then switched my preferred email to that email. Tried to delete the other email which came from openid (that failed which we wanted). Then swapped preferred emails back around again and successfully deleted the email that I had added which isn't part of my openid21:54
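The test sequence described above can be sketched as a toy model of the two rules observed in this log: the preferred email cannot be deleted, and an email tied to the OpenID identity cannot be deleted. This is an illustration under those assumptions only, not Gerrit's implementation; every name here (`new_account`, `add_email`, `set_preferred`, `delete_email`, `EmailConflict`) is hypothetical.

```python
class EmailConflict(Exception):
    """Stands in for the HTTP 409 (Conflict) response seen in the log."""


def new_account(openid_email):
    # A first login creates an account whose only (and preferred) email
    # is the one supplied by the OpenID provider.
    return {
        "emails": [openid_email],
        "preferred": openid_email,
        "openid_emails": {openid_email},
    }


def add_email(account, email):
    # Models an admin adding an address without email verification.
    account["emails"].append(email)


def set_preferred(account, email):
    if email not in account["emails"]:
        raise ValueError("unknown email")
    account["preferred"] = email


def delete_email(account, email):
    if email == account["preferred"]:
        raise EmailConflict("cannot delete the preferred email")
    if email in account["openid_emails"]:
        raise EmailConflict("email is directly associated with OpenID auth")
    account["emails"].remove(email)
```

Walking the same steps: create the account, add a second address, make it preferred, try to delete the OpenID address (rejected), switch the preferred address back, then delete the added address (accepted).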
clarkband now I can't push my new ps to fix the CI error21:55
ianwit looks like it fairly logically fits in with what all the other authentication schemes are doing, which is good and suggests it's right to me21:56
clarkbianw: I've reverted the manual edit of review02's gerrit.config and restarted the services with docker-compose21:58
clarkbyou might want to take a look at the config just to make sure I didn't do anything silly but it was just a simple line removal and uncomment the old line21:58
ianw has been updated21:59
ianwthe only further thought i had was that as written maybe it shuts down zuul unnecessarily.  22:00
clarkbya I think I left a comment about that saying zuul should reconnect when the dns update happens22:00
ianwyeah just that the prior steps had it shut down22:00
clarkbI don't think that it hurts to shut it down. I also plan to work with corvus to restart zuul tomorrow to ensure the changes going into it don't impact the gerrit move22:01
clarkbI think there is one change and another that will probably get squashed into it that we want to have BMW rereview and if they are happy we land those and restart22:01
ianwyeah we can be conservative this time and next time i imagine things will look very different22:03
clarkbianw: oh! another thing I noticed earlier was that I think your common topic for the review changes is no longer on all of the changes. Can you check that then I can pull the topic up and do another pass of reviews?22:04
ianwahh, git review must have helpfully reset that for me22:05
clarkbok I was able to post my review comments on that upstream gerrit change and then push the new patchset22:09
clarkbthe 3 remaining changes lgtm. Not sure if you want to WIP the two that are not WIP22:16
clarkbmy upstream change passes codestyle checks now. It passed the build test previously22:18
clarkbconsidering the only change I did was to remove an unused import I won't rebuild my held test node. I think what we have there is sufficiently like what I've pushed upstream. If I get reviews asking me to make major changes I will update the test machine22:19
ianwsigh, i seem to have lost ipv6 connectivity22:23
ianwif this persists, it's not something i'm going to enjoy trying to talk to support bots about22:24
clarkbmy ISP was hoping to roll out native ipv6 in ~February and it still hasn't happened yet. Supposedly they keep finding problems with their deployment in the test env22:24
ianwletsencrypt is failing, having a look22:28
ianwfatal: []: FAILED! => {}22:28
ianwthat is really weird as i wouldn't have thought cacti was involved with letsencrypt22:32
clarkbianw: we do certcheck from cacti02 iirc. It was maybe writing the list of hosts to certcheck back to that server?22:32
clarkbbut ya it doesn't do LE directly, just does the checking I think22:32
ianwohhh, yeah, that's it22:32
ianw'ansible.vars.hostvars.HostVarsVars object' has no attribute 'letsencrypt_certcheck_domains'22:33
clarkbif we don't update any certs do we maybe fail to write the object properly? (just a thought haven't really dug in)22:34
: ok=0    changed=0    unreachable=122:35
ianwi think that might be the root cause22:35
ianwindeed that host seems to not want to talk22:36
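A rough sketch of the failure mode discussed above, assuming the certcheck list is built by reading `letsencrypt_certcheck_domains` from every host's variables: a host that was unreachable never gets the variable set, so a naive lookup fails for the whole run. The variable name comes from the error in this log; the `collect_certcheck_domains` helper and the plain-dict stand-in for Ansible's hostvars are hypothetical.

```python
def collect_certcheck_domains(hostvars, hosts):
    """Gather certcheck domains, recording hosts that never set the variable.

    hostvars is a plain dict standing in for Ansible's per-host variables.
    """
    domains = []
    missing = []
    for host in hosts:
        value = hostvars.get(host, {}).get("letsencrypt_certcheck_domains")
        if value is None:
            # e.g. the host was unreachable, so its tasks never ran
            missing.append(host)
        else:
            domains.extend(value)
    return domains, missing
```

Whether to skip such hosts or fail loudly is a design choice; returning the `missing` list keeps the problem visible either way.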
clarkbthe identity and image apis seem to talk but not the compute for that cloud22:37
clarkbthat's the openstack as a service cloud that we can poke at directly. But maybe our time is better spent this week putting that host in the emergency list then we can look at it next week?22:38
clarkb(I'm happy if someone else wants to look at it too but I think we can punt on the problem for now)22:38
ianwwe're not running ci resources there?22:40
clarkbianw: we would if the nova API stopped sending 500 errors :)22:41
clarkbI think we must not be running anything there as a result of ^22:41
ianwis there like a 5-second guide of where to look?22:43
clarkbianw: yes, in the normal location should be login details for that cloud and for the cloud as a service management system22:44
clarkbin horizon I think you should get IP addresses for the hosts involved as well and you can ssh into those. Though now I can't remember if we properly set up ssh keys for everyone on those. fungi and I should have keys on them though and can add others if that didn't happen22:44
clarkblooks like you have the file open so I'll wait my turn :)22:45
opendevreviewIan Wienand proposed openstack/project-config master: nodepool: set inmotion cloud to zero
opendevreviewIan Wienand proposed openstack/project-config master: Revert "nodepool: set inmotion cloud to zero"
opendevreviewGhanshyam proposed openstack/project-config master: Properly retire neutron-lbaas
ianwok, back to what i was actually looking at which was making sure the new paste server backs itself up :)23:15
ianwclarkb: not particularly urgent but cleans up the old bits23:16
clarkbianw: one really small thing inline but worth fixing imo23:20
clarkbalso that has merge conflicts with one of the review changes says gerrit23:20
clarkb(that's fine, just need to remember to rebase and push if we land paste cleanup first)23:20
opendevreviewIan Wienand proposed opendev/system-config master: Remove
opendevreviewMerged openstack/project-config master: nodepool: set inmotion cloud to zero
