fungi | ianw: looks like afs01.dfw was rebooted roughly 2 hours ago after the upgrade completed. all went fine i guess? | 00:04 |
---|---|---|
ianw | fungi: i think so, no complaints so far :) | 00:05 |
fungi | looks like the mirror-update crontab is active too, so all clear hopefully | 00:07 |
*** tosky has quit IRC | 00:11 | |
*** yoctozepto has quit IRC | 00:13 | |
*** yoctozepto has joined #opendev | 00:13 | |
clarkb | ze03 and ze04 have a number of zuul-owned processes but none of them appear to be ansible related anymore | 00:21 |
clarkb | ze02 is still running ansible stuff though. I'll go ahead and stop ze03 and ze04 zuul-executors now | 00:21 |
clarkb | corvus: ^ related to that I think we may be leaking git processes, is that something you would prefer I leave paused so you can look at? | 00:22 |
clarkb | looks like multiprocessing is involved | 00:23 |
clarkb | the process tree seems to be zuul-executor -> some multiprocessing thing -> git processes | 00:23 |
corvus | clarkb: which host(s)? | 00:25 |
clarkb | corvus: ze02.openstack.org through ze04.openstack.org. easier to see on 03 and 04 because there isn't ansible stuff running too | 00:26 |
corvus | k i'll look on 3 | 00:26 |
corvus | zuul is running du | 00:26 |
corvus | oh zuul is still running but paused. sorry, i'm caught up now | 00:26 |
*** hamalq has quit IRC | 00:26 | |
clarkb | yup I think the du is expected. It's the git processes with multiprocessing parents that are not | 00:27 |
corvus | they're all cat-files | 00:27 |
clarkb | and some are from 2 days ago | 00:28 |
corvus | at random 21254 is from build df2dfafb9d18432bb667001630ca8c44 | 00:28 |
corvus | (so says proc/21254/cwd) | 00:28 |
clarkb | which ls notes as (deleted) | 00:29 |
corvus | it ran the job; the job was aborted, but well past the repo setup stage | 00:30 |
corvus | i see no errors related to that | 00:32 |
corvus | no broken process pool errors around that time that i see | 00:34 |
clarkb | what is odd (at least to me) is wouldn't you expect the git processes to finish and exit? maybe they are hanging on lack of a fd to write out to? | 00:35 |
clarkb | the git processes appear to have their stdin/out/err attached to pipes on the multiprocessing python processes | 00:37 |
clarkb | strace says 21254 is reading off of fd 0 which is a pipe to the parent | 00:39 |
clarkb | I guess it expects some input? | 00:40 |
clarkb | aha the batch flags mean it takes input on stdin for the things to cat | 00:40 |
clarkb | corvus: could this happen if we cancel things before reading all the files and stopping the process? it will just sit there waiting on stdin for more input? | 00:42 |
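A rough sketch of the kind of inspection being described here, assuming psutil is installed on the executor and that the executor shows up with the process name `zuul-executor` (otherwise match on the command line instead); it walks the process tree and lists the git cat-file children with their age and working directory:

```python
# Hypothetical inspection helper, not part of zuul or opendev tooling.
# Assumes psutil is available on the executor host.
import time
import psutil

for proc in psutil.process_iter(['name']):
    if proc.info['name'] != 'zuul-executor':
        continue
    for child in proc.children(recursive=True):
        try:
            cmdline = ' '.join(child.cmdline())
            if 'cat-file' not in cmdline:
                continue
            age_hours = (time.time() - child.create_time()) / 3600
            # cwd shows which build directory the worker was in when it spawned
            print(child.pid, f'{age_hours:.1f}h', child.cwd(), cmdline)
        except psutil.Error:
            continue
```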
corvus | that job finished its git prep; i'm not sure what would have been canceled | 00:42 |
corvus | clarkb: hypothesis: these aren't leaked | 00:43 |
clarkb | corvus: would they be waiting for new inputs legitimately and the pause just exposes that they are all there? | 00:44 |
corvus | clarkb: it looks like maybe gitpython runs this as a long-running process. the cwd may just happen to be the first build dir for a newly started process worker | 00:44 |
corvus | clarkb: yeah that's what i'm thinking | 00:44 |
corvus | i'm trying to figure out how long gitpython expects these to last | 00:44 |
corvus | i think that should stick around as long as the git.Repo object is around | 00:48 |
clarkb | ah interesting | 00:48 |
corvus | however, due to bad experiences with gitpython, we try really hard to deref those objects immediately | 00:49 |
corvus | so zuul shouldn't be keeping them around after use | 00:50 |
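For context, a minimal GitPython sketch of what is being discussed: reading objects through a `git.Repo` spawns persistent `git cat-file --batch`/`--batch-check` children that sit blocked on stdin waiting for more object names, and they only go away once the Repo is closed or garbage collected. Paths here are placeholders, and this is not the actual zuul code.

```python
import git

repo = git.Repo('/tmp/example-repo')   # placeholder path
commit = repo.commit('HEAD')           # first object read starts the cat-file children
print(commit.hexsha)

# The cat-file processes persist, blocked reading stdin, until the Repo is
# released. Closing it explicitly avoids relying on garbage collection:
repo.close()

# Or scope the lifetime with a context manager:
with git.Repo('/tmp/example-repo') as scoped:
    print(scoped.head.commit.hexsha)
```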
*** brinzhang0 has joined #opendev | 00:54 | |
corvus | also we're certainly not supposed to have 139 of these. | 00:55 |
fungi | that is rather a few | 00:56 |
corvus | though we should have 8 | 00:56 |
clarkb | there are 8 multiprocessing parents | 00:56 |
corvus | ok, so maybe we're leaking git.Repo objects | 00:57 |
*** brinzhang_ has quit IRC | 00:57 | |
clarkb | I need to go help with dinner. I can leave those servers as is and pick up their cleanup in the morning. Let me know if you'd like to preserve them longer | 01:01 |
corvus | clarkb: i don't think i'll have time to dig further :( | 01:02 |
clarkb | ok, I don't right now either :), but I can also try to write up a bug at least before I shut things down tomorrow so we have hints for the future | 01:02 |
corvus | i think the only thing that could help now would be an objgraph from one of those subprocesses; and i don't think we can get one? | 01:02 |
clarkb | I'm not sure how we would get one at least | 01:03 |
clarkb | since that is outside of the zuul stuff to objgraph | 01:03 |
corvus | yeah, i mean, we can start a repl on the executor i think | 01:03 |
corvus | but i very much doubt we could start one on the subprocess | 01:04 |
corvus | it's worth noting however that if the executor multiprocessing thing is leaking git repos, it may be leaking other things, which could be exacerbating our oom issues | 01:04 |
*** mlavalle has quit IRC | 01:19 | |
*** hemanth_n has joined #opendev | 01:44 | |
*** Eighth_Doctor has quit IRC | 01:45 | |
*** Eighth_Doctor has joined #opendev | 02:14 | |
*** gothicserpent has quit IRC | 04:22 | |
*** gothicserpent has joined #opendev | 04:22 | |
*** redrobot has quit IRC | 04:27 | |
*** redrobot has joined #opendev | 04:30 | |
*** redrobot has quit IRC | 04:35 | |
*** redrobot has joined #opendev | 04:35 | |
*** ykarel has joined #opendev | 04:38 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: [wip] kerberos ansible https://review.opendev.org/c/opendev/system-config/+/778840 | 05:10 |
*** dviroel has quit IRC | 05:10 | |
*** ykarel has quit IRC | 05:50 | |
*** ykarel has joined #opendev | 05:53 | |
*** ykarel_ has joined #opendev | 06:08 | |
*** marios has joined #opendev | 06:08 | |
*** ykarel has quit IRC | 06:10 | |
*** ykarel_ is now known as ykarel | 06:10 | |
openstackgerrit | Merged openstack/project-config master: Add an nl01.opendev.org config https://review.opendev.org/c/openstack/project-config/+/776979 | 06:48 |
*** slaweq has joined #opendev | 06:59 | |
*** redrobot has quit IRC | 07:00 | |
*** sboyron has joined #opendev | 07:02 | |
*** ralonsoh has joined #opendev | 07:03 | |
openstackgerrit | Moshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse. https://review.opendev.org/c/openstack/diskimage-builder/+/778723 | 07:21 |
openstackgerrit | Moshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse. https://review.opendev.org/c/openstack/diskimage-builder/+/778723 | 07:27 |
openstackgerrit | Moshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse. https://review.opendev.org/c/openstack/diskimage-builder/+/778723 | 07:27 |
*** eolivare has joined #opendev | 07:30 | |
openstackgerrit | Moshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse. https://review.opendev.org/c/openstack/diskimage-builder/+/778723 | 07:31 |
openstackgerrit | Moshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse. https://review.opendev.org/c/openstack/diskimage-builder/+/778723 | 07:34 |
openstackgerrit | Martin Kopec proposed opendev/system-config master: refstack: Edit URL of public RefStackAPI https://review.opendev.org/c/opendev/system-config/+/776292 | 07:34 |
*** lpetrut has joined #opendev | 08:00 | |
*** rpittau|afk is now known as rpittau | 08:21 | |
*** hashar has joined #opendev | 08:50 | |
*** jpena|off is now known as jpena | 08:54 | |
*** tosky has joined #opendev | 09:23 | |
*** toomer has joined #opendev | 09:25 | |
openstackgerrit | Moshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse. https://review.opendev.org/c/openstack/diskimage-builder/+/778723 | 09:41 |
*** DSpider has joined #opendev | 09:54 | |
*** fressi has joined #opendev | 09:55 | |
*** DSpider has quit IRC | 09:56 | |
*** zoharm has joined #opendev | 10:17 | |
*** dviroel has joined #opendev | 10:18 | |
*** hashar has quit IRC | 11:03 | |
*** hashar has joined #opendev | 11:04 | |
*** artom has quit IRC | 11:16 | |
*** ykarel_ has joined #opendev | 11:19 | |
*** ykarel has quit IRC | 11:22 | |
*** ykarel_ is now known as ykarel | 11:23 | |
*** ykarel_ has joined #opendev | 12:07 | |
*** ykarel has quit IRC | 12:09 | |
*** jpena is now known as jpena|lunch | 12:34 | |
*** tkajinam has quit IRC | 12:35 | |
*** tkajinam has joined #opendev | 12:35 | |
*** hashar is now known as hasharLunch | 12:39 | |
*** whoami-rajat has joined #opendev | 12:58 | |
*** redrobot has joined #opendev | 13:06 | |
*** artom has joined #opendev | 13:10 | |
*** hasharLunch is now known as hashar | 13:14 | |
*** jpena|lunch is now known as jpena | 13:28 | |
*** hemanth_n has quit IRC | 13:43 | |
*** fressi has left #opendev | 13:45 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Add the Gerrit reviewers plugin to Gerrit builds https://review.opendev.org/c/opendev/system-config/+/724914 | 14:11 |
*** ykarel_ is now known as ykarel | 14:32 | |
TheJulia | is http://opendev.org/project/repo working? | 14:44 |
TheJulia | Hmm, can't even load the base webpage | 14:44 |
fungi | loads for me... connecting over ipv4 or ipv6? | 14:46 |
ykarel | fungi, hi, we have cleaned the open reviews for stable/ocata and pike, can u do the cleanup when u get a chance http://lists.openstack.org/pipermail/openstack-discuss/2021-March/020826.html | 14:48 |
fungi | ykarel: yep, i saw the ml post, thanks for following up! we have a very large list of branch deletions to process, and have been working with elod and the release team on integrating it with the rest of release automation similar to how branch creation is currently handled (the manual process is painfully slow) | 14:49 |
fungi | i still owe elod a review on his proposed script change, but worst case i'll handle the current backlog by hand | 14:50 |
ykarel | fungi, ok Thanks for update, and yes automating it would be very helpful | 14:50 |
*** smekala has joined #opendev | 14:51 | |
*** smekala has quit IRC | 14:56 | |
fungi | TheJulia: looking over resource graphs for our gitea backends, it appears gitea08 got slammed briefly (and is still under some fairly heavy load). my guess is you got load-balanced to that one before it popped out of the pool | 14:57 |
fungi | seems the server has recovered now (5-minute load average is back around 3) | 14:58 |
fungi | looks like it has git processes chewing up 100% of a processor each | 14:59 |
fungi | /usr/lib/git-core/git pack-objects --revs --thin --stdout --delta-base-offset | 14:59 |
fungi | child of: /usr/bin/git upload-pack --stateless-rpc /data/git/repositories/openstack/nova.git | 14:59 |
fungi | it ate memory until the oom killer knocked it out | 15:01 |
fungi | causing lots of swap thrash before that | 15:02 |
fungi | all the processing capacity was saturated with iowait | 15:02 |
fungi | as tends to happen under such circumstances | 15:03 |
*** lpetrut has quit IRC | 15:03 | |
TheJulia | fungi: most likely, looks like it is working again | 15:03 |
fungi | in fact the oom killer had to kill three processes within the span of a few seconds | 15:03 |
*** rpittau is now known as rpittau|afk | 15:03 | |
TheJulia | eek | 15:05 |
fungi | we suspect a ci system or some other automated process is trying to clone multiple copies of repos all at once from the same ip address, so gets balanced to the same backend | 15:05 |
fungi | we haven't been able to narrow it down to a particular source yet though | 15:05 |
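One way to hunt for such a source is tallying connections per client IP in the haproxy log; a rough sketch, assuming a syslog-style log at the path below where the client address follows the `haproxy[pid]:` prefix (both the path and the format are assumptions, and it only matches IPv4 clients):

```python
import re
from collections import Counter

counts = Counter()
with open('/var/log/haproxy.log') as log:   # assumed location
    for line in log:
        match = re.search(r'haproxy\[\d+\]: (\d+(?:\.\d+){3}):\d+', line)
        if match:
            counts[match.group(1)] += 1

# Top talkers over the window covered by the log file.
for ip, hits in counts.most_common(10):
    print(f'{hits:8d}  {ip}')
```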
corvus | fungi: to remind myself: we're using source ip balancing because the backends can be slightly different (it terms of git object structure) because they don't have shared storage, right? | 15:32 |
*** ykarel has quit IRC | 15:37 | |
fungi | corvus: yes, though also we're terminating ssl/tls on the gitea servers at the moment, so layer 4 is the deepest haproxy can see at the moment | 15:38 |
fungi | if we wanted to do layer 7 inspection and/or fancy distribution mechanisms like cookie injection, we'd need to move the cert to the lb | 15:39 |
* fungi doubts cookie injection would actually help this case, it was simply an example | 15:39 | |
fungi | corvus: also if we did distribute these requests across the entire pool, there's a chance we'd simply oom on all the backends and take the entire service offline instead of just slamming a single server, but that could probably be mitigated by growing the cluster even more | 15:41 |
corvus | or the other way: least-conn/round-robin we'd need shared storage | 15:41 |
fungi | yeah | 15:41 |
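The balancing mode under discussion boils down to a haproxy stanza along these lines (illustrative only, not the actual opendev haproxy.cfg; backend name, server names and ports are placeholders). `balance source` hashes the client address, so a given client always lands on the same gitea backend, which is also why one hot client can pin a single server:

```
backend balance_git_https
    mode tcp
    balance source
    server gitea01 gitea01.example.org:443 check
    server gitea02 gitea02.example.org:443 check
```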
*** zoharm has quit IRC | 15:58 | |
clarkb | ya, specifically I think the issue is that clients that connect to a new backend in the middle of some "transaction" end up not seeing the objects they expect | 16:06 |
clarkb | however, I think that was largely for older git clients, it is possible this will improve with newer clients being smarter? | 16:07 |
clarkb | in my excitement to get the new executors up yesterday I failed to configure reverse dns for them. This has now been corrected | 16:12 |
clarkb | I'm going to work on cleaning up the extra servers now | 16:12 |
clarkb | the zuul-executor processes are now stopped on the 3 old servers. I'll check grafana seems happy with that in a bit then delete the servers entirely | 16:17 |
marios | elod: o/ hey can you please check https://review.opendev.org/c/openstack/releases/+/774244 again when you have some time for reviews please thank you | 16:17 |
fungi | zuul.opendev.org/t/openstack/status | 16:26 |
fungi | heh, you're not my web browser! | 16:26 |
*** hashar is now known as hasharAway | 16:27 | |
clarkb | fungi: also if you've got time to look over the ~70 entries identified by the latest run of the audit script as having no username, an invalid openid, and no reviews or code pushes, I think I'd like to run the retire-user.sh script on that set today then follow up with external id cleanups on them early next week if there is no screaming | 16:30 |
fungi | oof, you can tell the openstack release freeze is looming... node request backlog is well over 1k even with the 25% additional quota we got this week | 16:30 |
clarkb | fungi: yup I noticed zuul is busy when double checking the executor status on grafana | 16:30 |
clarkb | fungi: re the accounts, I don't necessarily expect people to check every one of them but maybe review the latest version of the audit script and quickly skim the list to ensure nothing stands out as blatantly wrong | 16:31 |
fungi | list is in your homedir on review.o.o again? | 16:32 |
clarkb | yes should have a date suffix of yesterday | 16:32 |
fungi | external_id_conflict_classifications.20210304 | 16:32 |
fungi | that one i guess | 16:32 |
clarkb | that sounds right | 16:32 |
elod | marios: sorry, I found some new commits there again :S | 16:33 |
clarkb | fungi: then I think I'm going to try and finish up the preferred email addr has no external id errors next since I expect the remainder of the external id issues to be much more painful :) | 16:34 |
clarkb | might have another small batch of account retirements today due to that if I can make sense of the ~17 remaining there | 16:35 |
marios | elod: thank you for checking, I just replied; both commits (by me) were updates to zuul.d/layout to remove some deprecated templates | 16:36 |
marios | elod: if it really needs to be updated then I will do that. thanks for checking so carefully | 16:36 |
clarkb | #status log Deleted ze02-ze04.openstack.org as they have been replaced with new .opendev.org hosts. | 16:38 |
openstackstatus | clarkb: finished logging | 16:38 |
elod | marios: for the sake of completeness, please update, then I'll +2 immediately :X | 16:40 |
elod | marios: and please do not allow new patches into those branches until the branches are deleted | 16:41 |
marios | elod: ack OK waiting for tox validate to finish and will post update thanks | 16:44 |
marios | elod: wrt stopping patches... any ideas how i can do that other than asking folks not to post them ? | 16:44 |
clarkb | I don't know how I've missed this but new gerrit seems to show you changes that were part of rebases and not relevant to the current diff when doing inter patchset diffs | 16:45 |
marios | elod: updated https://review.opendev.org/c/openstack/releases/+/774244 when you next get a chance | 16:47 |
elod | marios: maybe discuss with stable cores to -W if such patches arrive? | 16:47 |
marios | elod: ack yeah OK I will socialise that some more now that it is actually close to happening and ask folks to help me block patches (there should be very few, if any) | 16:48 |
marios | elod: thank you for your time, i am going end of day in a few minutes. if there is anything else about the review i will deal with it next week | 16:48 |
fungi | clarkb: i thought gerrit always showed you the diffs dragged in from rebases when looking at inter-patchset diffs? | 16:52 |
elod | marios: just +2'd it, and I've pinged hberaud. thanks for your patience & have a nice weekend o:] | 16:54 |
fungi | clarkb: heh, i randomly stumbled across one of the google openids. i suppose they're all included in this set | 16:59 |
marios | thank you elod no need to apologise for doing a good job ;) | 16:59 |
marios | elod: have a good one yrself happy friday ;) | 16:59 |
clarkb | fungi: it does, I mean that it differentiates it from the diff you care about | 16:59 |
clarkb | fungi: you get red, green, and purple diff text now | 16:59 |
fungi | oh, neat! | 17:00 |
fungi | node request backlog is not really shrinking, we nearly passed 1.5k a few minutes ago | 17:01 |
clarkb | fungi: I'm going to quickly check the ~17 accounts with preferred email addr issues for invaldi openids and see if I can narrow that list down that way too | 17:08 |
*** marios is now known as marios|out | 17:09 | |
fungi | good idea | 17:10 |
fungi | clarkb: spot checking the 70 entries for "Users without username, ssh keys, valid openid, and no changes or reviews" and it looks right to me | 17:12 |
fungi | ran through a bunch of them querying by hand | 17:12 |
fungi | including double-checking the openid urls 404 for them | 17:12 |
clarkb | great, you're comfortable with running retire-user.sh on those today then removing their conflicting external ids next week? | 17:14 |
clarkb | the time delay helps ensure we haven't missed anything. If we feel strongly about it we can probably just remove the external ids now (though undoing external id removals is more difficult than undoing the retire changes) | 17:15 |
*** eolivare has quit IRC | 17:15 | |
clarkb | using the invalid openid or no openid approach against the preferred email issues identifies 4 more that can be retired (these don't need external id cleanups) | 17:20 |
clarkb | and then I've got 3 more on top of all that that I've sort of manually identified as cleanable. One is the tripleo.ci account, another is an account whose openid says "foo-unused", and the third is for hubcap who is long gone and if I have to make amends will bribe with whiskey | 17:21 |
*** marios|out has quit IRC | 17:22 | |
fungi | clarkb: yeah, comfortable retiring that set whenever you're ready | 17:24 |
fungi | whiskey seems like a fine approach | 17:25 |
clarkb | cool, I'll proceed with those now | 17:26 |
clarkb | weshay|ruck: I've set the tripleo.ci account inactive. the os-tripleo-ci account is untouched. Let us know if this causes any problems. I won't do the more extensive cleanup until next week | 17:32 |
weshay|ruck | clarkb, nice.. thanks! happy friday | 17:33 |
fungi | node request backlog is well over 1.5k now. i rechecked an openstackclient change and it took nearly 3 hours to get nodes assigned | 17:41 |
fungi | looks like there's a slew of neutron changes in the gate pipeline for the openstack tenant, i expect that's a big part of it given the number of node-hours those burn and the odds of gate resets in a deep queue there | 17:42 |
fungi | >90% of the changes in the gate are for neutron | 17:43 |
fungi | er, in the integrated gate queue i mean | 17:44 |
fungi | that said, things are moving quickly. oldest change in the gate has only been there for 3 hours | 17:45 |
fungi | and we're logging fairly steady merge events | 17:46 |
*** mlavalle has joined #opendev | 17:47 | |
fungi | oldest change in check is about to report and has only been in there for 4.5 hours. so really not too bad | 17:47 |
fungi | executors are pretty choked though, we're spending a fair amount of time with no executors accepting jobs | 17:49 |
fungi | looks like it's probably the ram governor | 17:50 |
fungi | that said, we're not under-utilizing our node quota so i don't expect it's a problem | 17:51 |
*** jpena is now known as jpena|off | 17:58 | |
*** irclogbot_3 has joined #opendev | 18:04 | |
clarkb | ok retire-user.sh has been run against all of those accounts. There was one account without an account.config file so my sed failed to update the file. I'm going to look at that next (and possibly retire it via the api?). Then rerun the audit script and all of these accounts should show up in the top list saying they can have their external ids cleaned up | 18:04 |
clarkb | oh let me upload the log really quickly first though | 18:04 |
clarkb | log is uploaded | 18:05 |
clarkb | I set the odd account inactive via the api and that seems happy | 18:08 |
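Setting an account inactive "via the api" maps to Gerrit's `DELETE /accounts/{account-id}/active` REST endpoint; a minimal sketch with placeholder host, account id, and credentials:

```python
import requests
from requests.auth import HTTPBasicAuth

# Placeholders; a real call needs admin credentials (HTTP password) and the
# numeric account id being retired.
resp = requests.delete(
    'https://review.example.org/a/accounts/1234567/active',
    auth=HTTPBasicAuth('admin-user', 'http-password'),
)
# Gerrit returns 204 No Content on success and 409 Conflict if the account
# was already inactive.
resp.raise_for_status()
```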
fungi | interesting. i wonder why it was missing an account.config | 18:10 |
clarkb | and I have started the reaudit to ensure we get these accounts classified properly as external ids removable because one account is inactive | 18:10 |
clarkb | fungi: looks like no full name or preferred email was ever set so gerrit didn't write out the config file I guess | 18:10 |
fungi | i mean, makes our decision even easier. too bad there aren't more of those? | 18:11 |
clarkb | yup | 18:11 |
fungi | probably it was the result of an almost but not quite complete merger/retirement before the 3.2 upgrade | 18:12 |
clarkb | the audit script isn't particularly quick due to its checking of the openids. Once we've fully cleaned up this invalid openid set I'll update the script to remove that (at least by default) | 18:13 |
fungi | yeah, banging launchpad with those is probably not terribly polite. sort of surprised they haven't thrown a yellow card | 18:15 |
clarkb | fungi: well our gerrit is slow enough that we sort of have a built in sleep between requests I think. Also I'm only doing HEADs | 18:15 |
fungi | heh, good point. and yeah head is sufficient and much nicer | 18:15 |
clarkb | I thought about adding a sleep but realized that the gerrit queries in between are likely long enough to spread things out | 18:15 |
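The openid liveness check being described amounts to something like the following (a sketch with made-up ids; the actual audit tooling is the system-config change linked later in the log):

```python
import time
import requests

# Hypothetical openids pulled from gerrit external ids; the real ones are
# launchpad / ubuntu one urls.
openids = [
    'https://login.ubuntu.com/+id/AAAAAAA',
    'https://login.launchpad.net/+id/BBBBBBB',
]

for url in openids:
    # HEAD is enough to tell a live id from a 404 without fetching a body.
    resp = requests.head(url, timeout=10, allow_redirects=True)
    print(resp.status_code, url)
    time.sleep(1)   # optional politeness delay; in practice the gerrit
                    # queries between checks already space these out
```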
*** irclogbot_3 has quit IRC | 18:24 | |
*** hasharAway has quit IRC | 18:27 | |
*** irclogbot_2 has joined #opendev | 18:27 | |
clarkb | and my audit crashed due to a name resolution failure. I blame my terrible wireless | 18:27 |
clarkb | I'm going to trim the input list down to the accounts we just modified as that should run much quicker and still double check things | 18:31 |
*** ralonsoh has quit IRC | 18:35 | |
*** toomer has quit IRC | 18:40 | |
fungi | i suspect the holes in our executors accepting graph represent deep gate queue resets for the openstack tenant | 18:46 |
fungi | now that the queue is stabilizing, the executors are mostly all back to accepting | 18:47 |
fungi | spoke too soon, another neutron change just blew | 18:49 |
fungi | and right on cue, executors accepting takes a nosedive to 0 | 18:52 |
fungi | so yeah i think that's what's going on | 18:52 |
clarkb | ya they get busy handling resets | 18:54 |
clarkb | the audit script runs much more quickly when you reduce the problem space. Found one minor bug in reporting for inactive-account cleanups though (note I don't think this would cause problems for any of the actions we've previously taken, as it was underreporting inactive accounts: it would only report an inactive account if there was a conflicting active account) | 18:59 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add tools being used to make sense of gerrit account inconsistencies https://review.opendev.org/c/opendev/system-config/+/777846 | 19:06 |
clarkb | bug is fixed in ^ | 19:06 |
*** elod has quit IRC | 19:12 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation. https://review.opendev.org/c/zuul/zuul-jobs/+/764808 | 19:12 |
*** elod has joined #opendev | 19:13 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation. https://review.opendev.org/c/zuul/zuul-jobs/+/764808 | 19:22 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation. https://review.opendev.org/c/zuul/zuul-jobs/+/764808 | 19:24 |
*** sboyron has quit IRC | 19:30 | |
clarkb | fungi: https://review.opendev.org/c/zuul/zuul/+/740448/7..11/tests/unit/test_scheduler.py that shows you what I meant about the diffs earlier | 19:42 |
clarkb | ze05-08 replacements are being launched now | 19:50 |
openstackgerrit | Clark Boylan proposed opendev/zone-opendev.org master: Add replacement ze05-08 servers to dns https://review.opendev.org/c/opendev/zone-opendev.org/+/779040 | 20:05 |
clarkb | inventory change coming up next | 20:05 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Replace ze05-08.openstack.org with ze05-08.opendev.org https://review.opendev.org/c/opendev/system-config/+/779041 | 20:08 |
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Create a template for ssh-key and size https://review.opendev.org/c/zuul/zuul-jobs/+/773474 | 20:09 |
clarkb | infra-root ^ I'm around today to get those in and stop old servers if you have a moment to review those changes | 20:09 |
fungi | i went ahead and approved those, neither should have direct production impact | 20:14 |
fungi | just adding/correcting dns records, and switching configuration management to start running against the new servers instead of the old ones (and also flipping names around in cacti) | 20:15 |
clarkb | yup, and we've done it now for 4 other hosts so I don't expect much trouble. The real fun happens when asking zuul-executor to gracefully stop | 20:16 |
openstackgerrit | Merged opendev/zone-opendev.org master: Add replacement ze05-08 servers to dns https://review.opendev.org/c/opendev/zone-opendev.org/+/779040 | 20:16 |
fungi | so much grace, it's all over your screen (and process list) | 20:17 |
*** slaweq has quit IRC | 20:17 | |
corvus | retro +2 | 20:19 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Fix sshfp record printing https://review.opendev.org/c/opendev/system-config/+/779044 | 20:20 |
clarkb | the zone file edits pointed out that ^ is a thing we should do | 20:20 |
fungi | thanks | 20:22 |
clarkb | linus says don't run linux 5.12-rc1 because it has swapfile problems that will write to all the wrong places in your filesystem | 20:23 |
clarkb | though we're probably some of the only people that use swapfiles because of cloud ci stuff (and none of our hosts will have a brand new kernel) | 20:23 |
fungi | yeah, i'm still on 5.10 because of the debian bullseye release freeze | 20:31 |
fungi | looks like our node request backlog may finally be on the downward slide into the weekend | 20:34 |
fungi | though a neutron change just reset the entire openstack integrated gate queue again a few minutes ago | 20:35 |
openstackgerrit | Merged opendev/system-config master: Replace ze05-08.openstack.org with ze05-08.opendev.org https://review.opendev.org/c/opendev/system-config/+/779041 | 20:50 |
clarkb | fungi: looks like a big tripleo reset just happened too. Definitely not out of the woods yet, but likely to catch up before next week starts it over again | 20:52 |
fungi | at least feature freeze will end next week | 20:54 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation. https://review.opendev.org/c/zuul/zuul-jobs/+/764808 | 21:03 |
*** hamalq has joined #opendev | 21:14 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation. https://review.opendev.org/c/zuul/zuul-jobs/+/764808 | 22:01 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation. https://review.opendev.org/c/zuul/zuul-jobs/+/764808 | 22:08 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation. https://review.opendev.org/c/zuul/zuul-jobs/+/764808 | 22:17 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation. https://review.opendev.org/c/zuul/zuul-jobs/+/764808 | 22:25 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation. https://review.opendev.org/c/zuul/zuul-jobs/+/764808 | 22:34 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation. https://review.opendev.org/c/zuul/zuul-jobs/+/764808 | 22:46 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation. https://review.opendev.org/c/zuul/zuul-jobs/+/764808 | 22:56 |
*** iurygregory has quit IRC | 22:57 | |
clarkb | ansible has found the new servers, now I wait | 23:11 |
*** iurygregory has joined #opendev | 23:22 | |
*** elod has quit IRC | 23:25 | |
clarkb | starting new executors now | 23:34 |
clarkb | and the old executors have been asked to gracefully stop | 23:39 |
fungi | awesome | 23:40 |
clarkb | ze05.openstack.org is particularly busy but it should be easing off now | 23:42 |
*** hamalq has quit IRC | 23:44 |