clarkb | and so when we started the scheduler up again it got failures from those merger instances when fetching configs? | 00:00 |
ianw | i didn't, only the scheduler. that's a good point. i'll do a full restart | 00:00 |
clarkb | ya ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) shows up in the zuul error report | 00:00 |
ianw | i'm running the full restart playbook now | 00:01 |
clarkb | ok | 00:01 |
ianw | i didn't think about the mergers | 00:01 |
ianw | ok that's finished | 00:02 |
clarkb | it's still running through its processes to start up though | 00:03 |
ianw | yep | 00:03 |
clarkb | ianw: Unknown projects: opendev/meetbot | 00:05 |
clarkb | in my paranoia I wonder if that wasn't synced over to the new server properly? | 00:06 |
clarkb | hrm no it seems git/opendev/meetbot.git does exist | 00:06 |
clarkb | it might be an order of operations thing loading configs? | 00:07 |
clarkb | ya I think it's a cross tenant order of operations thing | 00:07 |
ianw | loading, loading, loading ... | 00:12 |
ianw | ok, seems back | 00:13 |
clarkb | the error list is much much smaller now too :) | 00:13 |
clarkb | I think you can recheck your system-config change | 00:13 |
ianw | yep, it's running now | 00:14 |
ianw | \o/ | 00:14 |
ianw | i think time for a cup of tea and take a breath! | 00:14 |
clarkb | I want to see the jobs actually start but I agree | 00:15 |
clarkb | https://zuul.opendev.org/t/openstack/stream/5427ef9af78943c5aafe41ca8431fa99?logfile=console.log is the tox-docs job and it did just start | 00:15 |
clarkb | I'm a bit worried that this extended queued time is due to the mergers taking longer to set up repos | 00:17 |
clarkb | but I guess it could also just be slow node launches. Far too early to say | 00:17 |
ianw | the rest are multi node jobs i think | 00:18 |
ianw | not quite i guess | 00:19 |
ianw | we have quite a few building nodes | 00:20 |
clarkb | Looking at zuul logs I think the issue is noderequest fulfilment | 00:20 |
clarkb | the scheduler has only accepted 2 completed node requests | 00:21 |
clarkb | and another job just started. I just need to learn to be patient I think | 00:21 |
ianw | inap-mtl01 has a bunch of building nodes that look exactly like what we want :) | 00:21 |
clarkb | I wonder if we are hitting its image cache problems that we run into periodically where it has a slow period. I think everything is going as expected except for slow node boots and that is independent of our work today | 00:22 |
clarkb | ianw: I think I'll take a break now since stuff seems to be moving the right direction. I'll check in later | 00:23 |
ianw | ++ thank you! | 00:24 |
clarkb | `grep 'Accepting node request' /var/log/zuul/debug.log` on the scheduler if you want to see its progress using nodesets | 00:24 |
clarkb | though I guess that isn't much different than checking the js dashboard | 00:25 |
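A minimal sketch of following that from the shell, assuming the debug log path quoted above; the exact message text may vary between Zuul versions.

```bash
# Follow node request acceptance as the scheduler works through its backlog
# (log path taken from the grep suggested above).
sudo tail -f /var/log/zuul/debug.log | grep --line-buffered 'Accepting node request'

# Or just count how many requests have been accepted so far.
sudo grep -c 'Accepting node request' /var/log/zuul/debug.log
```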
ianw | yeah it's building a ton of nodes, so i think it's just getting itself warmed up | 00:25 |
clarkb | and thank you for doing all the hard work to make this happen :) | 00:25 |
clarkb | there we go, a bunch of jobs just started on the system-config change | 00:29 |
clarkb | and really need to take a break for dinner now. | 00:30 |
fungi | i'm not really around still, sorry, but all's looking okay? | 00:47 |
Clark[m] | We've sorted through the issues that have come up so far. Currently waiting for zuul to merge the system config change to make review02 the review server in Ansible. Then we can run the playbook manually | 00:48 |
Clark[m] | Then once that is happy we can reenable Ansible and do some cleanup | 00:49 |
Clark[m] | I'm working on dinner then probably a walk then will check back in again | 00:49 |
ianw | ahh, that system-config job wasn't actually running the gerrit checks prior | 00:56 |
Clark[m] | Ya it's different because you touched the config file | 00:57 |
opendevreview | Ian Wienand proposed opendev/system-config master: review02: move out of staging group https://review.opendev.org/c/opendev/system-config/+/797563 | 00:58 |
ianw | Clark[m]: another attempt that updates the system-config-run job as well | 00:59 |
clarkb | +2 | 01:00 |
Clark[m] | ianw: looks like it is still failing | 01:33 |
ianw | argh | 01:44 |
ianw | groupadd: GID '3000' already exists | 01:46 |
Clark[m] | That's the Gerrit gid ? | 01:47 |
Clark[m] | Maybe we are running twice for some reason? | 01:47 |
ianw | calling it review02 will help avoid having to fiddle with fake letsencrypt certs | 01:50 |
ianw | (in the system-config-run test) | 01:50 |
opendevreview | Ian Wienand proposed opendev/system-config master: review02: move out of staging group https://review.opendev.org/c/opendev/system-config/+/797563 | 01:53 |
clarkb | ianw: any idea why the Create Gerrit Group task isn't the only thing creating a gid 3000? | 01:54 |
ianw | no i'm going to watch this more closely | 01:55 |
clarkb | ok | 01:55 |
ianw | it is created as 3000 on review02 | 01:55 |
clarkb | I wonder if some package on focal that we install is creating a group after we shift the min gid and uid? | 02:01 |
clarkb | ianw: do you have a hold set up for the runs of ^ | 02:01 |
clarkb | might be a good idea to see what /etc/group says about gid 3000 | 02:01 |
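A quick sketch of the check being suggested, assuming shell access to the held node; the gerrit2 group name comes from the output that shows up later in this log.

```bash
# Does anything already own GID 3000, and what is it called?
getent group 3000

# Any users with that GID as their primary group?
awk -F: '$4 == 3000 {print}' /etc/passwd

# Any files already owned by that GID (a hint at what created it)?
sudo find / -maxdepth 3 -xdev -gid 3000 2>/dev/null
```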
ianw | yep :) | 02:04 |
clarkb | gerrit has merged a change since the move (just one more indication that things are working overall) | 02:05 |
clarkb | https://review.opendev.org/c/openstack/sushy/+/801034/ that one | 02:05 |
clarkb | I'm going to double check it shows up on the giteas | 02:06 |
ianw | yeah i think it's fine. i really wish i'd noticed this job not running prior to this | 02:06 |
clarkb | ianw: I wonder if we should consider skipping ahead and reenqueuing zuul changes? | 02:08 |
clarkb | fwiw that change showed up on the giteas just fine | 02:08 |
ianw | yep if this run doesn't pass i'll skip ahead, re-enqueue the changes (there's only about 4) and make sure the backups are running | 02:15 |
ianw | the node doesn't appear to start with a gid 3000, so that's something | 02:17 |
clarkb | ianw: for general sanity should we remove the replication config on review01? maybe just comment it out in the file? | 02:20 |
clarkb | (just wondering what happens if gerrit starts there again unexpectedly and I think the only real issue would be if it replicated) | 02:21 |
ianw | sure, i can move that out of the way. the apache is still serving the maintenance page so should be hard to merge anything | 02:26 |
ianw | i've moved it to a .post-upgrade file | 02:26 |
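Roughly what that looks like on the old server; the path is a guess at the usual Gerrit site layout, and any rename the replication plugin won't pick up works.

```bash
# On review01: park the replication config so an accidental Gerrit start
# can't push refs to the giteas. Path is an assumption about the site layout.
sudo mv /home/gerrit2/review_site/etc/replication.config \
        /home/gerrit2/review_site/etc/replication.config.post-upgrade
```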
clarkb | thanks | 02:27 |
ianw | gerrit2:x:3000: | 02:33 |
ianw | so it made the group | 02:33 |
clarkb | that's good | 02:34 |
clarkb | could it have been a side effect of the LE failure somehow? | 02:34 |
ianw | i think it must have been, but i'm not sure how | 02:35 |
ianw | Destination directory /etc/netplan does not exist | 02:36 |
ianw | sigh | 02:36 |
ianw | it's going to be a bit more work getting the CI job and hence ansible run working | 02:37 |
clarkb | seems like it is getting close now though as that is the last bit of the playbook isn't it? | 02:37 |
clarkb | ianw: I think you can split the netplan fix up out into another playbook and have that not run in the test job? | 02:38 |
ianw | we only want to do that on the production host | 02:38 |
clarkb | but do run it in the infra prod job | 02:38 |
clarkb | ya | 02:38 |
clarkb | or use a testing flag and only do that when it is undefined or false? | 02:40 |
clarkb | that might be better for simplifying testing and keeping things consistent across hosts | 02:40 |
ianw | i will re-enqueue the zuul changes, put review02 in emergency and allow ansible to start running again | 02:40 |
clarkb | ok | 02:40 |
ianw | i'm happy the server is operational, it's now just making sure the ansible apply is idempotent and doesn't move it backwards :) | 02:41 |
clarkb | ++ | 02:42 |
clarkb | I'm working on an update to your change to do the its a test flag | 02:42 |
clarkb | for the netplan config | 02:42 |
opendevreview | Clark Boylan proposed opendev/system-config master: review02: move out of staging group https://review.opendev.org/c/opendev/system-config/+/797563 | 02:45 |
clarkb | ianw: ^ something like that for the netplan issue maybe | 02:45 |
ianw | thanks | 02:46 |
clarkb | that hasn't kicked the running jobs out of check yet | 02:47 |
clarkb | a new change has just entered check so why hasn't the new patchset of ^ bumped the old one out | 02:51 |
clarkb | that could be a bug in the zuul pipeline changes I guess | 02:51 |
clarkb | now it's queued up. Ya I suspect some sort of starvation processing the pipelines | 02:53 |
ianw | maybe the last job there was in its post phase or something | 02:54 |
clarkb | well we did redo the pipeline processing in zuul this last week | 02:54 |
clarkb | so it could totally be something to do with that | 02:54 |
clarkb | ianw: do you know why we need to build the gerrit images in those changes too? | 02:56 |
clarkb | hrm I bet we turned it on for test_gerrit.py but we don't really need it? probably helps in the long run | 02:56 |
clarkb | I expect we had problems where we were updating the tests and trying to test new images with depends on or similar and it wasn't working | 02:56 |
ianw | i think system-config-run-review depends on the images so it always builds them? | 03:00 |
ianw | i don't imagine we'll be taking the server down at this point, so i think we can announce that it is back online | 03:03 |
clarkb | maybe mention that we are still working through restoration of our config management processes so acl changes and new projects aren't possible yet | 03:05 |
ianw | i might keep it simple and say the update is over, and if i can't get this sorted by EOD (which I should be able to) call that out | 03:17 |
clarkb | ok | 03:18 |
ianw | I wonder if the "unknown" time remaining somehow has to do with the pause entering the gate | 03:22 |
clarkb | ya zuul only shows a number there once all jobs have at least started | 03:24 |
ianw | #status alert The maintenance of the review.opendev.org Gerrit service is now complete and service has been restored. Please alert us in #opendev if you have any issues. Thank you | 03:24 |
opendevstatus | ianw: sending alert | 03:24 |
clarkb | so having a pause and then waiting for some stuff to happen causes that to happen in the zuul web ui | 03:25 |
-opendevstatus- NOTICE: The maintenance of the review.opendev.org Gerrit service is now complete and service has been restored. Please alert us in #opendev if you have any issues. Thank you | 03:25 | |
clarkb | ianw: do alerts change the topic? | 03:25 |
clarkb | doesn't look like it. I guess | 03:25 |
ianw | not at the moment | 03:25 |
ianw | something to do with acl permissions in oftc or something or other | 03:25 |
ianw | oh, doh, there's an end command isn't there | 03:26 |
clarkb | ya you #status ok to end the alert | 03:26 |
clarkb | which sets the topics back again iirc | 03:26 |
ianw | #status ok | 03:27 |
clarkb | I usually use #notice unless I know I want it in the topics | 03:27 |
clarkb | it might not process that until it is done processing the alert (and you may need to reissue it?) | 03:27 |
ianw | oh well it's in the checklist for next time :) | 03:28 |
ianw | probably it's good that it's been so long since i sent a global alert that i forgot! | 03:29 |
ianw | review jobs running now, fingers crossed | 03:29 |
opendevstatus | ianw: sending ok | 03:30 |
ianw | system-config-run-review-3.2 success ! yay | 03:48 |
clarkb | progress | 03:48 |
ianw | i've disabled the backup cron jobs on review01 and will get backups happening on 02 once 797563 merges and i run it | 03:54 |
clarkb | ianw: ok. Keep in mind having review02 in the emergency file makes running the playbook weird | 03:54 |
ianw | yep i have a command that uses inventory out of my checkout | 03:54 |
clarkb | I think you may end up with a huge set of jobs for zuul to work through because the inventory changed. If service-review is far down the list you might get away with just running the playbook after removing 02 from the emergency file | 03:55 |
clarkb | ianw: without the emergency file? | 03:55 |
ianw | yeah, for just running review. i'll run it by hand as i want to watch it | 03:55 |
clarkb | got it | 03:55 |
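A hedged sketch of running just the review playbook by hand from a system-config checkout; the inventory path, limit, and flags here are assumptions rather than the exact command used.

```bash
# Run only the review service playbook against the new host, reading the
# inventory from a local system-config checkout. Paths are illustrative.
cd ~/src/opendev.org/opendev/system-config
sudo ansible-playbook -v \
    -i inventory/base/hosts.yaml \
    --limit review02.opendev.org \
    playbooks/service-review.yaml
```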
clarkb | ianw: I'm a little annoyed we'll get a new gerrit image we don't need, but at the same time we just updated the gerrit image so that should be fine for whenever we restart in the future | 04:01 |
ianw | yeah, i'm not sure of a way around that | 04:04 |
opendevreview | Ian Wienand proposed opendev/system-config master: gerrit: fix Launchpad credentials write https://review.opendev.org/c/opendev/system-config/+/801227 | 04:07 |
opendevreview | Merged opendev/system-config master: review02: move out of staging group https://review.opendev.org/c/opendev/system-config/+/797563 | 04:49 |
ianw | yay, it's that easy | 04:51 |
*** ykarel|away is now known as ykarel | 04:53 | |
*** dpawlik0 is now known as dpawlik | 05:10 | |
ianw | ok i have run the review playbook against the new server and everything looks good. replication config is setup, nothing out of order in the other configs, cron jobs are there for cleanup etc. | 05:17 |
ianw | i'm taking the server out of emergency as it should be fine now | 05:17 |
opendevreview | Ian Wienand proposed opendev/system-config master: backups: add review02.opendev.org https://review.opendev.org/c/opendev/system-config/+/797564 | 05:29 |
*** mgoddard- is now known as mgoddard | 06:04 | |
*** amoralej|off is now known as amoralej | 06:10 | |
opendevreview | Merged opendev/system-config master: backups: add review02.opendev.org https://review.opendev.org/c/opendev/system-config/+/797564 | 06:19 |
ianw | looks like package installation on review02 is actually borked due to https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1926918 | 06:44 |
ianw | i'm going to try the downgrade mentioned there | 06:44 |
ianw | i think we actually might need to check all our focal systems for this | 06:45 |
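A sketch of checking for and rolling back the affected glibc, per the downgrade workaround mentioned in that bug; the version string below is only an example and should be replaced with whatever apt-cache policy reports.

```bash
# What libc6 versions does apt know about, and which is installed?
apt-cache policy libc6

# Pin back to the previous archive version. The version shown here is only
# an example; substitute the candidate listed by the policy output above.
sudo apt-get install --allow-downgrades \
    libc6=2.31-0ubuntu9.2 libc-bin=2.31-0ubuntu9.2
```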
jssfr | Good morning everyone. First time contributor to OpenStack here. My company just signed the CCLA, with my address on the list. I am now looking at the gerrit UI to figure out how to apply this. The only choices I have are the "OpenStack Individual Contributor License Agreement" and two "externally managed" ones. Should I sign the ICLA (<https://docs.openstack.org/contributors/common/setup-gerrit.html#contributors-from-a-company-or-organization> seems to suggest that) or will an external process (which I may need to poke?) add some system CLA to my account once the CCLA by my employer has been processed? | 06:49 |
*** dpawlik5 is now known as dpawlik | 06:50 | |
ianw | jssfr: i'm no expert here, but *you* should sign the ICLA and the corporate one is an extra thing for company lawyers and the opendev foundation | 07:02 |
*** hashar is now known as Guest1365 | 07:02 | |
*** hashar_ is now known as hashar | 07:02 | |
ianw | ok, i finally got borg onto review02. running initial backups now | 07:02 |
opendevreview | Ian Wienand proposed opendev/system-config master: review02: skip ~gerrit2/tmp in backup https://review.opendev.org/c/opendev/system-config/+/801235 | 07:05 |
*** dpawlik7 is now known as dpawlik | 07:06 | |
jssfr | ianw, aha, that is a viewpoint which fits my mental model *and* the stuff written on the page. Thanks! | 07:17 |
ianw | it could be made more explicit, i'd probably agree | 07:22 |
jssfr | I mean the text as written is unambiguous, but combined with the slightly aged screenshots, I wasn't sure if the process is still up-to-date. | 07:25 |
ianw | jssfr: i'm sure a contribution would be welcome to https://opendev.org/openstack/contributor-guide/src/branch/master/doc/source/common/setup-gerrit.rst :) | 07:26 |
ianw | ok, long day, but all 24 of the checklist points are marked off on https://etherpad.opendev.org/p/gerrit-upgrade-2021 | 07:42 |
ianw | the server is up with no complaints, it's processed quite a few changes now, and it has had successful backup runs | 07:43 |
ianw | nothing on the cleanup list can't wait | 07:44 |
ianw | i'll try to check back in for the next few hours, but i'm mostly out now | 07:44 |
opendevreview | Merged opendev/system-config master: review02: skip ~gerrit2/tmp in backup https://review.opendev.org/c/opendev/system-config/+/801235 | 08:14 |
*** dpawlik3 is now known as dpawlik | 08:30 | |
*** dpawlik4 is now known as dpawlik | 08:44 | |
*** ykarel is now known as ykarel|lunch | 08:54 | |
*** rpittau|afk is now known as rpittau | 09:31 | |
*** kopecmartin is now known as kopecmartin|pto | 09:45 | |
*** ykarel|lunch is now known as ykarel | 09:59 | |
fungi | jssfr: just to clarify, the ccla is purely paperwork, a best-effort/honor-system tracking of affiliations for contributors to official open infrastructure foundation projects. in contrast, the icla is required for all contributors to certain projects, for example openstack, and enforced in the code review system so that it prevents contributions from being pushed for repos under the | 11:29 |
fungi | governance of those projects unless you've agreed to it | 11:29 |
jssfr | aha, understood | 11:34 |
fungi | infra-root: just a reminder, i'm still away and on the road all day today, but should be around and start catching back up tomorrow | 12:26 |
*** amoralej is now known as amoralej|lunch | 12:52 | |
opendevreview | Ananya Banerjee proposed opendev/elastic-recheck master: Run elastic-recheck container https://review.opendev.org/c/opendev/elastic-recheck/+/729623 | 13:06 |
*** sshnaidm|afk is now known as sshnaidm | 13:10 | |
opendevreview | Ananya proposed opendev/elastic-recheck master: Run elastic-recheck container https://review.opendev.org/c/opendev/elastic-recheck/+/729623 | 13:14 |
mnaser | is it me or does gerrit feel much more snappy/quick | 13:42 |
*** amoralej|lunch is now known as amoralej | 13:43 | |
rm_work | Question for folks -- when you do DB maintenance, do you just ... take down the DB briefly and expect OpenStack services to deal with retries or whatever for the duration? Do you have a more complex strategy? Turn off OpenStack services first? Keep the DB available via a mirrored DB setup using the read-only node? | 13:49 |
clarkb | rm_work: we don't operate openstack services for the most part so not in a great position to answer | 14:28 |
rm_work | heh yeah but this channel is a who's who of operators :D | 14:28 |
clarkb | mnaser: our big theory for why the old gerrit was very slow was memory contention preventing gerrit and the operating system and the web server from having enough memory to all be happy at once. The new instance is larger (thank you mnaser and vexxhost!) allowing us to allocate more memory for each of those memory consumers | 14:28 |
clarkb | mnaser: long story short I'm very glad to hear you think it is snappier and we thank you for the help in making that happen :) | 14:29 |
mnaser | clarkb: yeah, i'm happy that it actually had a positive impact -- i tried removing a topic from a change and it happened instantly vs before which would take quite a bit of time :) | 14:29 |
clarkb | rm_work: the two recent db maintenances we did (the gerrit move and a zuul upgrade that required a db migration) were both done with services down. Not ideal but things are getting better slowly. | 14:30 |
clarkb | mnaser: another good test is dansmith's giant patch bombs :) pushing those has been very slow in the past | 14:31 |
clarkb | a series of changes or change updates to a large repo in particular | 14:31 |
clarkb | since things seem to be going well this morning I'm going to go find breakfast and do my normal startup routine. I'm hoping that I can then start hacking on testing of our project rename playbook today as well in prep for the planned renames sometime next week | 14:33 |
clarkb | rm_work: back when lifeless was thinking about these problems I think he liked the idea of a transparent cutover using an intelligent proxy | 14:34 |
clarkb | rm_work: I have no idea how feasible that is with the tooling available today, but basically you mirror the database then force all reads and writes to go through a proxy to keep things in sync. Then to cut over you have the proxy halt connections momentarily while you do a catch up on the new side and then remove the old side from the proxy | 14:35 |
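Not something OpenDev runs, just a minimal sketch of that catch-up-and-cut-over step assuming a MySQL 8 primary/replica pair behind a proxy (older servers use the SLAVE spellings); hostnames and credentials are placeholders.

```bash
# 1. Stop writes on the old primary while the proxy holds client connections.
mysql -h old-db -e "SET GLOBAL super_read_only = ON"

# 2. Wait for the replica to catch up (Seconds_Behind_Source reaching 0).
mysql -h new-db -e "SHOW REPLICA STATUS\G" | grep -i seconds_behind

# 3. Promote the replica, then repoint the proxy at new-db.
mysql -h new-db -e "STOP REPLICA; RESET REPLICA ALL; SET GLOBAL read_only = OFF"
```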
rm_work | yeah the thing i've run into is a DB team that thought it'd be helpful of them to use read-only mode for cutovers rather than a hard outage, and some services (at least octavia) that are coded to understand and retry on that, but write failures cause them to behave BADLY | 14:35 |
rm_work | trying to figure out if it's reasonable and normal to just do hard-down for maintenance on the DB briefly, and if most services play nice with that | 14:36 |
rm_work | * yeah the thing i've run into is a DB team that thought it'd be helpful of them to use read-only mode for cutovers rather than a hard outage, and some services (at least octavia) that are coded to understand and retry on hard-outage, but write failures cause them to behave BADLY | 14:37 |
*** ykarel is now known as ykarel|away | 14:38 | |
rm_work | sorry for probably misusing your channel, I have a kind of bad habit of that since it's the best place I know of to catch a specific set of people 😅 | 14:39 |
clarkb | no problem, I just wanted to be clear that we don't really have direct experience with that problem and openstack. Though I suppose other channel lurkers may (like mnaser?) | 14:39 |
rm_work | yeah I was about to ping him directly :P | 14:40 |
clarkb | corvus: yesterday when we were trying to get the system-config change to specify review02 as the new gerrit server tested and landed I pushed a new patchset for the change and zuul didn't evict the old patchset as quickly as I expected it would. | 15:20 |
clarkb | corvus: https://review.opendev.org/c/opendev/system-config/+/797563 is the change and it was patchset 5 in check when I pushed patchset 6. I don't think this is currently urgent but it occurred to me that that may indicate starvation in the pipeline processing loops? | 15:22 |
clarkb | wanted to call it out in case others notice similar | 15:22 |
dtantsur | hey! are there any mirror problems with opensuse nodes? https://zuul.opendev.org/t/openstack/build/4ba8493813d440998547da49825f7440/log/job-output.txt#673 | 15:34 |
clarkb | dtantsur: we may have synced bad state from our upstream mirror | 15:35 |
clarkb | looks like we last synced opensuse 18 days ago. The upstream we are using has a different repomd.xml that points at a file present in the upstream dir http://mirror.us.leaseweb.net/opensuse/update/leap/15.2/oss/repodata/ | 15:39 |
clarkb | VLDB: vldb entry is already locked | 15:41 |
clarkb | that is why we aren't updating that volume. I'll dig into that | 15:41 |
dtantsur | thank you! | 15:45 |
clarkb | I don't see any running vos release for that on the system that does the vos releases. I've held the flock we use to do the mirror updates for opensuse and will break the vldb lock and manually run the mirror update script | 15:47 |
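Roughly the sequence being described, using standard OpenAFS tooling; the volume name, lock file, and update script here are guesses based on context rather than the exact names on the mirror-update host.

```bash
# Clear the stale VLDB lock, release the volume to the read-only sites,
# then re-run the mirror update under the same flock the cron job uses.
vos unlock mirror.opensuse -localauth
vos release mirror.opensuse -localauth

flock -n /var/run/opensuse-mirror.lock /usr/local/bin/opensuse-mirror-update
```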
*** marios is now known as marios|out | 16:29 | |
*** amoralej is now known as amoralej|off | 16:57 | |
clarkb | dtantsur: I think you should be good now. I'm rerunning the update manually one more time to convince myself that it is happy on the update side, but the mirrors show the new content as expected | 17:03 |
dtantsur | great, thanks! I'll create a test patch | 17:04 |
clarkb | infra-root I've updated https://gerrit-review.googlesource.com/c/gerrit/+/312302/ with tests and if those pass in upstream CI (figuring out how to run them locally was an experience) I'll see what I can do to get reviews upstream | 17:07 |
clarkb | dtantsur: thank you for letting us know | 17:09 |
*** rpittau is now known as rpittau|afk | 17:38 | |
opendevreview | Chao Zhang proposed zuul/zuul-jobs master: Update commits since tag calculation https://review.opendev.org/c/zuul/zuul-jobs/+/801370 | 18:03 |
opendevreview | Chao Zhang proposed zuul/zuul-jobs master: Update commits since tag calculation https://review.opendev.org/c/zuul/zuul-jobs/+/801370 | 18:04 |
opendevreview | melanie witt proposed openstack/project-config master: Set launchpad bug Fix Released after adding comment https://review.opendev.org/c/openstack/project-config/+/801376 | 18:54 |
timburke | i saw the all-clear notice went out a while ago, but i'm still getting redirects to the maintenance page when i go to https://review.opendev.org/ -- is that expected? | 19:22 |
timburke | i'm also seeing errors like "ssh: connect to host review.opendev.org port 29418: Connection refused" if i use git/git-review, which makes me curious about how the patches just above got submitted :-/ | 19:24 |
timburke | maybe i've got some stale dns? review.opendev.org and review.openstack.org both seem to resolve to 104.130.246.32 for me, fwiw | 19:27 |
opendevreview | melanie witt proposed openstack/project-config master: Set launchpad bug Fix Released after adding comment https://review.opendev.org/c/openstack/project-config/+/801376 | 19:33 |
Clark[m] | Yes, that is the old DNS record | 19:33 |
Clark[m] | timburke: any idea what might be holding on to that value? We lowered the ttl to 5 minutes last week and prior to that it was 60 minutes. Both much shorter than the time between now and when we updated dns | 19:34 |
timburke | seems to be something on my end -- dig's telling me there's a TTL of 0 (!!) coming from SERVER: 127.0.0.53#53(127.0.0.53) :-( | 19:36 |
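For anyone hitting the same symptom, a quick way to tell resolver caching apart from a local override (which is what this turned out to be), using the hostname from the discussion:

```bash
# Ask the local stub resolver and an external resolver; a mismatch points
# at local state rather than the zone itself.
dig +short review.opendev.org
dig +short review.opendev.org @1.1.1.1

# getent goes through nsswitch, so an /etc/hosts override shows up here too.
getent hosts review.opendev.org
grep -n 'review\.' /etc/hosts
```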
timburke | definitely user error! turns out i've got something in /etc/hosts with a comment like "WTF IPv6 (Nov 2020)" 🤣 | 19:37 |
timburke | ignore me :-) | 19:37 |
opendevreview | Chao Zhang proposed zuul/zuul-jobs master: Update commits since tag calculation https://review.opendev.org/c/zuul/zuul-jobs/+/801370 | 19:59 |
opendevreview | Chao Zhang proposed zuul/zuul-jobs master: Update commits since tag calculation https://review.opendev.org/c/zuul/zuul-jobs/+/801370 | 20:00 |
clarkb | I've made my edits to the team meeting agenda. I'll hold off on sending it until ianw can check it for any missing important items/details | 20:32 |
clarkb | please add your edits soon though :) | 20:32 |
clarkb | infra-root rax rebooted eavesdrop01.opendev.org a few minutes ago. A heads up if you notice bots acting weird | 20:58 |
ianw | o/ | 22:24 |
clarkb | ianw: good morning. It's been really quiet. I think things went well. I did leave a -1 over a small thing on your change to fix the lp creds file | 22:25 |
ianw | agenda lgtm thanks | 22:25 |
ianw | yep i checked my mail this morning to see if i had a bunch of "revert" emails but thankfully not :) | 22:26 |
clarkb | ianw: I was also going to suggest maybe you double check backups and such now that the ~gerrit2/tmp exclusion landed and all the jobs for that should've run | 22:26 |
clarkb | but other than that I think its mostly answer the occasional question that came up (timburke had an /etc/hosts override for review and another person was looking for firewall update details) | 22:26 |
ianw | can do | 22:27 |
clarkb | overall looking really good. I've been working on my java as a result :) wrote tests for my openid fix and am communicating with a reviewer on that now. I'm hopeful we can get this landed | 22:27 |
ianw | ++ it would be great to not deal with that again! | 22:28 |
ianw | somehow we've got two scripts trying to backup the review db | 22:32 |
mordred | that seems less than optimal | 22:33 |
ianw | oh, it's an old one from before it got the "use a local mariadb flag" | 22:34 |
mordred | clarkb: what's your gerrit change link? | 22:34 |
mordred | clarkb: nm. I see in backscroll | 22:34 |
clarkb | mordred: https://gerrit-review.googlesource.com/c/gerrit/+/312302 | 22:34 |
ianw | there is also a cron job for /usr/local/bin/track-upstream | 22:34 |
ianw | which i think we removed right? | 22:34 |
clarkb | ianw: I know fungi was working on that, but I don't know if it landed /me looks in git logs | 22:34 |
clarkb | a change titled "Good riddance to track-upstream and its cronjob" did merge | 22:35 |
clarkb | ianw: I think you can remove that cronjob | 22:35 |
clarkb | that change was in system-config | 22:35 |
ianw | I0d6edcc34f25e6bfe2bc41d328ac76618b59f62d yep; ok i'll remove the entry | 22:35 |
clarkb | mordred: I was hoping to get some feedback on my assertions there before I push a new patchset, but I'll probably push a new patchset at EOD if I don't hear back before then just to keep things moving | 22:36 |
ianw | ok, root now runs only the two cron jobs for the backups | 22:36 |
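The sort of audit being done here, sketched generically for a migrated server:

```bash
# List root's crontab and look for leftovers from the old deployment.
sudo crontab -l -u root

# System-wide cron entries live in a few other places worth checking too.
grep -r 'track-upstream\|backup' /etc/crontab /etc/cron.d/ 2>/dev/null

# Drop a stale entry interactively (the edit itself gets logged via sudo).
sudo crontab -e -u root
```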
clarkb | agenda has been sent | 22:43 |
clarkb | ianw: one good side effect of keeping the maintenance banner up on review01 has been that it is abundantly clear you are talking to the wrong gerrit | 22:47 |
clarkb | we might want to update the text to say something like "This server has moved. If you are seeing this page then double check your DNS resolution and /etc/hosts file for review.opendev.org." ? Though that may be a one off | 22:48 |
ianw | yeah we can change it to "if you are seeing this, you're in the wrong place" :) | 22:48 |
ianw | if infra-root wants to audit their home dirs etc. for anything they feel is important and migrate it, we can probably shut it down after that | 22:49 |
clarkb | I'll make a note to do that tomorrow | 22:50 |
clarkb | I do want to preserve the gerrit account cleanup records I've been keeping. I can move those | 22:50 |
opendevreview | Ian Wienand proposed openstack/project-config master: afs graphs: track openeuler mirror volume https://review.opendev.org/c/openstack/project-config/+/801397 | 22:58 |
ianw | clarkb: ^ i think this is what pushed afs up recently and will give a more complete picture in the dashboard | 22:59 |
ianw | it would probably be good to have a stacking graph that shows all the volumes usage in context | 23:00 |
clarkb | ianw: ya the opensuse mirror stopped updating (stale lock) and when I went looking I expected it was that mirror. There was talk elsewhere about maybe doing alma linux, and debian is 5GB below its quota limit | 23:00 |
clarkb | anyway wanted to discuss if we thought we needed more disk and if the mirror.yum-puppetlabs is used | 23:01 |
clarkb | I went ahead and approved ^ since it seems straightforward | 23:02 |
clarkb | ianw: if only we could make the distros smaller :) | 23:04 |
fungi | clarkb: ianw: i'm home and skimming nick highlights, but not really here properly until tomorrow... i did delete the track-upstream cronjobs on both the old and new review server, if you check there should have been a sudo crontab -e logged when i did it too. perhaps something put it back? | 23:06 |
fungi | i guess wait and see if it reappears | 23:06 |
ianw | we may have had a run against an older system-config at some point | 23:07 |
clarkb | fungi: ya no rush or worries at the moment. I think it will likely just be a bunch of small updates here and there as we find things to improve | 23:09 |
fungi | Jul 15 13:06:18 review02 sudo: fungi : TTY=pts/6 ; PWD=/home/fungi ; USER=root ; COMMAND=/usr/bin/crontab -e | 23:09 |
fungi | i wonder what replaced it | 23:09 |
clarkb | fungi: if you do have a moment https://review.opendev.org/c/opendev/system-config/+/800274 is one that I'd like to land on Wednesday probably (I should have time to watch and monitor as it goes in). Again no rush given that timeframe but review always welcome | 23:12 |
fungi | i'll load it up in my gertty at least, maybe that'll remind me | 23:12 |
opendevreview | Ian Wienand proposed opendev/system-config master: Point cacti at review02 explicitly https://review.opendev.org/c/opendev/system-config/+/801399 | 23:13 |
ianw | ^ that's one i just thought of, i'm pretty sure cacti is still hanging on to talking to the old server, but it's better to be clear about it | 23:13 |
clarkb | ianw: review.openstack.org points to the new server so it should be getting data from the new one now | 23:14 |
clarkb | but explicit is nice | 23:14 |
opendevreview | Merged openstack/project-config master: afs graphs: track openeuler mirror volume https://review.opendev.org/c/openstack/project-config/+/801397 | 23:19 |
ianw | yeah i guess http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=27&rra_id=all looks right | 23:21 |
ianw | i'm really not sure about the load average results though http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=26&rra_id=all | 23:21 |
clarkb | that looks about right given what I recall from memory before? It might be a bit lower now if we don't have as much memory/io/disk contention | 23:22 |
ianw | also the "used memory" doesn't seem to show up | 23:23 |
ianw | i wonder if cacti isn't so happy with something that focal is doing | 23:24 |
clarkb | ianw: sometimes bouncing the snmpd service on the host (review02 in this case) is sufficient to make things happy again | 23:25 |
clarkb | but it could also be due to the size of the values (they are much larger now) | 23:25 |
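A sketch of the snmpd bounce plus a sanity check from the cacti side; the community string and MIB names are the stock net-snmp ones and assume the MIB files are installed, so they may not match the real config.

```bash
# On review02: restart the SNMP agent.
sudo systemctl restart snmpd

# From the cacti host: confirm the memory counters are readable at all.
snmpget -v2c -c public review02.opendev.org \
    UCD-SNMP-MIB::memTotalReal.0 UCD-SNMP-MIB::memAvailReal.0
```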
ianw | i guess this is not worth too much effort given replacement plans | 23:26 |
opendevreview | Merged openstack/diskimage-builder master: Convert multi line if statement to case https://review.opendev.org/c/openstack/diskimage-builder/+/734479 | 23:31 |