clarkb | zuul, static, mirror, and mirror-update appear to be all the places we run openafs-client | 00:00 |
clarkb | I agree a mirror node seems like the best option for upgrading out of that set | 00:00 |
clarkb | the inap provider appears to be disabled in nodepool. I'll test on its mirror | 00:01 |
clarkb | https://mirror.mtl01.inap.opendev.org/ubuntu/ it appears to be working now | 00:01 |
clarkb | I will apt-get update && apt-get install openafs-client on it? | 00:02 |
clarkb | then reboot? | 00:02 |
clarkb | ianw: fungi ^ that seem like a reasonable approach? | 00:02 |
ianw | clarkb: ++ | 00:02 |
clarkb | looks like we also install openafs-krb5 so I'll apt-get install that with openafs-client | 00:03 |
*** redrobot has quit IRC | 00:03 | |
*** fbo has quit IRC | 00:03 | |
*** artom has quit IRC | 00:03 | |
*** chrome0 has quit IRC | 00:03 | |
*** yoctozepto has quit IRC | 00:03 | |
*** calcmandan has quit IRC | 00:03 | |
*** cloudnull has quit IRC | 00:03 | |
*** openstackgerrit has quit IRC | 00:03 | |
*** otherwiseguy has quit IRC | 00:03 | |
*** elod has quit IRC | 00:03 | |
*** mhu has quit IRC | 00:03 | |
*** SotK has quit IRC | 00:03 | |
*** spotz has quit IRC | 00:03 | |
*** hamalq has quit IRC | 00:03 | |
*** auristor has quit IRC | 00:03 | |
*** jaicaa has quit IRC | 00:03 | |
*** tkajinam has quit IRC | 00:03 | |
*** andrii_ostapenko has quit IRC | 00:03 | |
*** dviroel has quit IRC | 00:03 | |
*** guilhermesp has quit IRC | 00:03 | |
*** clayg has quit IRC | 00:03 | |
*** rm_work has quit IRC | 00:03 | |
*** walshh_ has quit IRC | 00:03 | |
*** donnyd has quit IRC | 00:03 | |
*** persia has quit IRC | 00:03 | |
*** seongsoocho has quit IRC | 00:03 | |
*** mattmceuen has quit IRC | 00:03 | |
*** portdirect has quit IRC | 00:03 | |
*** gmann has quit IRC | 00:03 | |
*** melwitt has quit IRC | 00:03 | |
*** ttx has quit IRC | 00:03 | |
*** mordred has quit IRC | 00:03 | |
clarkb | hrm looks like we also do a dance where we install the kernel module first then those two | 00:04 |
clarkb | I'll try to figure out how to translate that to apt-get commands | 00:04 |
*** hamalq has joined #opendev | 00:05 | |
*** auristor has joined #opendev | 00:05 | |
*** jaicaa has joined #opendev | 00:05 | |
*** tkajinam has joined #opendev | 00:05 | |
*** andrii_ostapenko has joined #opendev | 00:05 | |
*** dviroel has joined #opendev | 00:05 | |
*** guilhermesp has joined #opendev | 00:05 | |
*** walshh_ has joined #opendev | 00:05 | |
*** clayg has joined #opendev | 00:05 | |
*** rm_work has joined #opendev | 00:05 | |
*** donnyd has joined #opendev | 00:05 | |
*** persia has joined #opendev | 00:05 | |
*** seongsoocho has joined #opendev | 00:05 | |
*** portdirect has joined #opendev | 00:05 | |
*** mattmceuen has joined #opendev | 00:05 | |
*** gmann has joined #opendev | 00:05 | |
*** melwitt has joined #opendev | 00:05 | |
*** ttx has joined #opendev | 00:05 | |
*** mordred has joined #opendev | 00:05 | |
fungi | clarkb: sounds great | 00:05 |
clarkb | --no-install-recommends when installing openafs-modules-dkms, then install the other packages | 00:06 |
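For reference, the install sequence being worked out above would look roughly like this on a mirror node (a sketch only; it assumes the infra PPA is already configured and simply mirrors the steps described in the conversation):

```bash
# Refresh package lists, then install the DKMS module source first without
# recommends so the kernel module gets built before the userspace tools.
sudo apt-get update
sudo apt-get install --no-install-recommends openafs-modules-dkms
# Pull in the client and Kerberos integration packages afterwards.
sudo apt-get install openafs-client openafs-krb5
# Reboot so the freshly built kernel module is actually loaded.
sudo reboot
```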
*** redrobot has joined #opendev | 00:06 | |
*** cloudnull has joined #opendev | 00:06 | |
*** fbo has joined #opendev | 00:06 | |
*** artom has joined #opendev | 00:06 | |
*** otherwiseguy has joined #opendev | 00:06 | |
*** elod has joined #opendev | 00:06 | |
*** spotz has joined #opendev | 00:06 | |
*** yoctozepto has joined #opendev | 00:06 | |
*** mhu has joined #opendev | 00:06 | |
*** calcmandan has joined #opendev | 00:06 | |
*** SotK has joined #opendev | 00:06 | |
*** mordred has quit IRC | 00:07 | |
*** Eighth_Doctor has quit IRC | 00:07 | |
*** artom has quit IRC | 00:07 | |
*** tosky has quit IRC | 00:07 | |
*** artom has joined #opendev | 00:08 | |
*** chrome0 has joined #opendev | 00:08 | |
*** guilhermesp has quit IRC | 00:09 | |
*** donnyd has quit IRC | 00:09 | |
*** gmann has quit IRC | 00:09 | |
*** donnyd has joined #opendev | 00:10 | |
fungi | i think system-config-run-static may be impacted by this | 00:10 |
*** guilhermesp has joined #opendev | 00:10 | |
clarkb | presumably if it reruns it will run with the new packages and be happy | 00:10 |
clarkb | still waiting on dkms to do its thing on the inap mirror | 00:11 |
*** gmann has joined #opendev | 00:11 | |
fungi | https://zuul.opendev.org/t/openstack/build/28c0192e984548b0a48d10451e6752fb/log/job-output.txt#45767-45775 | 00:11 |
fungi | (also wow that's a long log) | 00:11 |
fungi | look for the "Check AFS mounted" task since the autoscroll isn't going to work on progressive loading a log that long | 00:12 |
clarkb | it looks like inap's mirror may have been running 1.8.3 not 1.8.6-1 fwiw | 00:13 |
clarkb | so these may all need manual intervention? | 00:13 |
fungi | i can try to upgrade some once you're comfortable with the first one | 00:13 |
fungi | should i go ahead and recheck the change which was failing to afs mount? | 00:14 |
fungi | i guess all the relevant packages are in our ppa now | 00:14 |
clarkb | yes they should be there except for arm64 last I checked | 00:15 |
fungi | it's an x86 job so should be fine then | 00:15 |
*** artom has quit IRC | 00:15 | |
ianw | yep all published | 00:18 |
clarkb | rebooting inap mirror now | 00:19 |
ianw | so https://etherpad.opendev.org/p/infra-openafs-1.8 has the outline of what i think an emergency 1.8 upgrade would be | 00:20 |
clarkb | ianw: I was trying to follow along as you went and added some notes too. I think that captures it. I don't know if it is possible to do a no-downtime upgrade. My understanding in the past was that it was not but that may not be accurate (and this was because 1.8 and 1.6 at the server level couldn't talk to each other) | 00:21 |
clarkb | https://mirror.mtl01.inap.opendev.org/ubuntu/lists/ seems to be working post reboot | 00:22 |
clarkb | and from what I can tell it installed the new packages | 00:22 |
clarkb | I think we'll update openafs in most (all?) places when zuul does its daily runs and/or when unattended-upgrades runs | 00:23 |
clarkb | do we want to proactively upgrade them? ahead of that? if not I can check dpkg -l tomorrow and confirm they updated on their own | 00:23 |
ianw | that should be right, though i guess things might want a reboot | 00:23 |
clarkb | ya my concern with rebooting mirrors is that zuul's queue is super deep right now | 00:24 |
clarkb | trying to balance the various factors in play here (not easy) | 00:24 |
ianw | but if the mirrors are getting random failures that's also not great | 00:24 |
clarkb | ya though as far as I can tell they haven't yet. Are you concerned that they may after the upgrade happens but before a reboot? | 00:25 |
ianw | am i understanding correctly that is the current failure case? randomish failures from the 1.6 servers? | 00:25 |
clarkb | ianw: 1.8 will 100% fail apparently when it starts exhibiting the problem | 00:25 |
clarkb | 1.6 will be randomish | 00:25 |
clarkb | none of our systems should do that unless we sufficiently restart openafs (not sure what that is but reboot definitely is sufficient) | 00:26 |
clarkb | this is because all of our systems should've started before the epoch rollover thing | 00:26 |
clarkb | obviously they won't necessarily all remain in that state in the future as clouds do their cloudy thing | 00:26 |
ianw | yeah, i don't really like sitting on a time-bomb that as soon as the backend fails, we have a fire-drill to get the servers updated | 00:27 |
ianw | actually it's not a fire-drill, it's an actual fire at that point :) | 00:27 |
clarkb | agreed, but we also have a potential multiday zuul backlog that will just implode on itself if we take an outage to fix it. Trying to figure out where in my head the balance is between imploding all those jobs to take a downtime and fix this vs waiting for now and fixing it when zuul is hopefully happier | 00:28 |
clarkb | mirrors, zuul executors, and static are all involved in that | 00:29 |
clarkb | (in addition to the afs servers) | 00:29 |
ianw | i think doing the manual upgrade to 1.8 with servers in emergency probably isn't a bad thing in the long run | 00:30 |
ianw | it will give us a chance to see 1.8 in action before we upgrade the base os of the servers | 00:30 |
ianw | effectively one thing at a time. we can feel more confident about dropping in replacement servers one-by-one if everything is at 1.8 | 00:31 |
clarkb | ya agreed | 00:31 |
clarkb | for mirrors we can disable a cloud in nodepool, wait for it to drain out (up to 3 hours or so each due to tripleo jobs), update the mirror and reboot it | 00:31 |
clarkb | for zuul executors we can stop zuul, update openafs, then reboot them one at a time | 00:31 |
clarkb | zuul should retry the jobs that fail as a result | 00:32 |
clarkb | that has an impact but it is smaller | 00:32 |
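A minimal sketch of that per-executor sequence, assuming a systemd-managed zuul-executor service and illustrative hostnames (the real deployment may manage the service differently):

```bash
# Hypothetical rolling upgrade of the executors, one host at a time.
for host in ze01 ze02 ze03; do
    ssh "root@${host}.openstack.org" \
        'systemctl stop zuul-executor &&
         apt-get update &&
         apt-get install -y openafs-modules-dkms openafs-client openafs-krb5 &&
         reboot'
    # Wait for the host to come back and the executor to rejoin
    # before moving on to the next one.
done
```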
clarkb | fungi: we may also want to talk to the release team? | 00:32 |
fungi | 2021-01-15 00:03:31 <-- openstackgerrit (~openstack@eavesdrop01.openstack.org) has quit (*.net *.split) | 00:32 |
* fungi sighs | 00:32 | |
clarkb | are they making a ton of releases as part of this milestone that is plugging everything up? | 00:32 |
clarkb | static I think is largely read only and so the impact of that might be less noticeable | 00:33 |
clarkb | ianw: thinking out loud here, doing mirrors, executors, and static first is probably more straight forward and will test our packages betterer? | 00:34 |
fungi | i think the release team really only notices when the releases and tarball sites get out of date (and release notes on docs site) | 00:34 |
clarkb | fungi: well we'll potentially break our ability to write to tarballs | 00:35 |
fungi | so 770856 is probably all they'll need | 00:35 |
clarkb | (if zuul executors are not happy with the upgrade) | 00:35 |
fungi | oh, writes, yes | 00:35 |
ianw | yeah, i agree we need to make sure they are all in a state of having the latest ppa client running so that if the server does get switched on them we are ok | 00:35 |
clarkb | fungi: the new package seems to work for reads just fine | 00:35 |
clarkb | and we can do a single zuul executor first then observe it ? | 00:36 |
fungi | so the current risk is that clients running unpatched 1.8 may spontaneously reboot and stop working, which if we've upgraded the packages (unattended upgrades, ansible, et cetera) and just not rebooted them yet, is probably fine | 00:36 |
clarkb | fungi: also note the change you are trying to land will update static's install | 00:36 |
fungi | we already deal with corrupted afs caches at reboot which blocks afs from working on the mirrors on a frequent basis | 00:36 |
clarkb | fungi: ya the other risk is that the new packages don't work in some way and we may not notice until we reboot | 00:37 |
fungi | update static's install but not restart afsd or reboot the server | 00:37 |
clarkb | correct | 00:37 |
fungi | sure, but like i said, we already frequently deal with afs not working after an unclean spontaneous reboot | 00:38 |
clarkb | mirror-update, mirrors, and static normally update daily via the periodic pipeline. zuul updates hourly | 00:38 |
fungi | having to scramble to work out why it broke differently would be not great, sure | 00:38 |
clarkb | its possible that all the zuul executors have already updated? | 00:38 |
clarkb | fungi: ya I get what you are saying. basically we've reduced the risk of a reboot causing 100% failure | 00:39 |
clarkb | and the chance of any failure post reboot is minimal since reads are working | 00:39 |
fungi | ii openafs-modules-dkms 1.8.6-1ubuntu1~xenial1 | 00:39 |
fungi | that's ze01 | 00:39 |
fungi | so no, not all anyway | 00:39 |
clarkb | ze01 has not updated | 00:39 |
clarkb | ya I mean once they update | 00:39 |
clarkb | mirror-update and mirrors won't happen until ~0600 | 00:40 |
clarkb | static may happen sooner if your change lands | 00:40 |
*** hamalq has quit IRC | 00:40 | |
clarkb | zuul should happen in the next hour? | 00:40 |
clarkb | part of why I'm bringing this up is I need to pop out for dinner in a few short minutes and then ideally also call it a work day | 00:41 |
fungi | i'll be glad to stick around, and am happy to do a controlled reboot of static.o.o after the apache change triggers a fresh deployment there | 00:41 |
*** mlavalle has quit IRC | 00:42 | |
ianw | yeah, i can make a list on that etherpad page and make sure things update | 00:42 |
ianw | i can also create the vicepa snapshot in preparation for a manual server upgrade | 00:43 |
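For context, creating such a snapshot would look something like the following; the volume group and logical volume names here are placeholders for illustration, not the actual names on afs01.dfw:

```bash
# Flush pending writes, then snapshot the vice partition's logical volume.
sudo sync
sudo lvcreate --snapshot --size 1T --name vicepa-snap /dev/main/vicepa
# Check the snapshot and how much of its allocated space is in use.
sudo lvs
```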
clarkb | ze01 should've updated openafs-client and openafs-krb5 at 23:43:44 ish. Now trying to cross check with when the ppa updated for xenial | 00:43 |
*** mordred has joined #opendev | 00:43 | |
ianw | fungi: be good if you could double check the instructions | 00:43 |
clarkb | the timestamp for the amd64 openafs-client package seems to be 23:44 | 00:44 |
clarkb | missed it by seconds | 00:44 |
clarkb | ianw: fungi thanks | 00:44 |
clarkb | then maybe tomorrow I can work on rolling reboots of zuul executors and we can do a reboot of mirror-update if fungi's locks sufficiently idle that server? | 00:45 |
fungi | ianw: i'll take a look, sure | 00:45 |
clarkb | then maybe aim for Monday upgrade of the servers? | 00:45 |
fungi | i'd rather not reboot mirror-update until the tarballs volume is at least done releasing | 00:45 |
fungi | but i guess if we need to we need to | 00:46 |
clarkb | fungi: ya I think we want to wait for it to idle if we can get it to do so | 00:46 |
clarkb | since you've got all the locks held right? | 00:46 |
clarkb | so it should finish the current set of releases then do nothing | 00:46 |
clarkb | if we aim for monday for the outage we can send out comms tomorrow too and try and warn people as much as possible | 00:46 |
clarkb | "the uotage" being the main afs server outage | 00:46 |
clarkb | and that also gives time for vos releases to complete | 00:47 |
fungi | right, i terminated the other outstanding vos release calls from mirror-update.o.o (which unfortunately doesn't stop the transactions so isn't actually freeing up the afs servers) and held locks for all of them in a screen session | 00:47 |
fungi | the possible wrench in the ointment here is that the other replica sync transactions which already got initiated are likely to continue well into next week | 00:48 |
clarkb | hrm I wonder how terrible it will be to upgrade with those happening :/ | 00:49 |
fungi | we might just need to consider afs02.dfw a total loss and start all its replicas from scratch again | 00:49 |
fungi | presumably afs01.dfw will give up trying to replicate to it if the server goes away for a while? | 00:49 |
clarkb | no clue | 00:50 |
clarkb | might be good to have corvus think over some of this stuff too | 00:50 |
ianw | indeed. i can shepherd it on my monday, which is usually a very quiet time | 00:50 |
fungi | any one of these mirror volumes easily needs a weekend or more to do a full release, and we have something like 10 around that size | 00:50 |
ianw | that would give y'all your monday to fix anything :) | 00:50 |
*** mordred has quit IRC | 00:51 | |
clarkb | I've just double checked that openafs-client role is only applied to zuul-executor and not all of zuul. That is the case | 00:53 |
clarkb | and with that I need to go catch up on household/family things. Thank you for all the help today. If you discover new things or have scheduling thoughts for getting stuff updated maybe update the etherpad and I'll do my best to catch up in the morning? | 00:53 |
ianw | np. i think what i'll do is monitor all those servers and update the etherpad. i'll add the vicepa snapshot | 00:54 |
ianw | then i might send a summary email we can sync on | 00:54 |
clarkb | sounds good. ++ | 00:54 |
fungi | awesome, i'll be back around shortly, need to switch rooms | 00:54 |
clarkb | might also be good to indicate if you think an AU monday outage is a good idea given the other releases and everything when your day ends. That way we can send out a warning tomorrow about it | 00:54 |
clarkb | I guess we can always send a warning and mention it may not happen depending on server state | 00:55 |
clarkb | thanks again! | 00:55 |
fungi | i'm going to restart gerritbot, i don't see it coming back since the split | 00:56 |
fungi | #status log restarted gerritbot since it was lost in a netsplit at 00:03 utc | 00:57 |
openstackstatus | fungi: finished logging | 00:57 |
*** Eighth_Doctor has joined #opendev | 01:01 | |
*** mordred has joined #opendev | 01:19 | |
fungi | ze01 still doesn't have newer openafs packages installed yet. i suspect our hourly jobs for those aren't upgrading distro packages | 01:34 |
fungi | unattended-upgrades will take care of it anyway | 01:43 |
*** cloudnull has quit IRC | 01:49 | |
fungi | ze07 looks like it's mid-upgrade | 01:56 |
fungi | openafs-modules-dkms is at 1.8.6-5ubuntu1~xenial2 now but openafs-client and openafs-krb5 are still on 1.8.6-1ubuntu1~xenial1 there | 01:56 |
fungi | ahh, yeah, the dkms postinst build is in progress there | 01:58 |
ianw | ok, lunch done, will take a look | 01:59 |
fungi | it's parented to what looks like an ansible ssh session, so i guess we are updating them that way | 01:59 |
fungi | yeah, other executors are in a similar state now | 02:00 |
fungi | so they should be done shortly | 02:00 |
ianw | ianw@ze01:~$ dpkg --list | grep openafs | 02:20 |
ianw | ii openafs-client 1.8.6-1ubuntu1~xenial1 amd64 AFS distributed filesystem client support | 02:20 |
ianw | ii openafs-krb5 1.8.6-1ubuntu1~xenial1 amd64 AFS distributed filesystem Kerberos 5 integration | 02:20 |
ianw | ii openafs-modules-dkms 1.8.6-5ubuntu1~xenial2 all AFS distributed filesystem kernel module DKMS source | 02:20 |
fungi | yeah | 02:22 |
ianw | - name: Install client packages | 02:22 |
fungi | look at the process list and you'll see lkm compilation underway | 02:22 |
fungi | or i guess it's finished on at least some i'm checking now | 02:23 |
ianw | i think it's finished on ze01 | 02:23 |
ianw | the next step in the role should have updated the other two packages | 02:23 |
ianw | service-zuul.yaml.log just finished | 02:27 |
fungi | yeah, looks like they're fully upgraded now | 02:28 |
ianw | yep | 02:28 |
fungi | now the question whether we should reboot an executor and make sure it's still working correctly | 02:29 |
ianw | just looping them all to double check they're all upgraded | 02:29 |
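That loop could be as simple as something like this (the hostnames and executor count are illustrative):

```bash
# Verify every executor has the fixed openafs packages installed.
for n in $(seq -w 1 12); do
    echo "=== ze${n} ==="
    ssh "ze${n}.openstack.org" 'dpkg -l | grep openafs'
done
```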
*** brinzhang_ has joined #opendev | 02:31 | |
ianw | all good. probably worth restarting one ... i can do that if we like | 02:32 |
fungi | it'll restart a bunch of in-progress builds, but that's probably still better than getting caught unawares with a problem when one needs to be rebooted for other reasons | 02:34 |
*** brinzhang0 has quit IRC | 02:34 | |
*** cloudnull has joined #opendev | 02:35 | |
fungi | yeah, we've still got a nearly 3k node backlog | 02:35 |
ianw | i'll restart ze01 for sanity | 02:36 |
fungi | thanks | 02:37 |
ianw | ok, it's back, it's got afs and can look around /afs/openstack.org | 02:40 |
fungi | looks like it's up and yeah | 02:40 |
fungi | we'll want to make sure a docs build succeeds on it | 02:40 |
*** cloudnull has quit IRC | 02:46 | |
ianw | i don't think anything updates the cache for the mirror runs | 02:59 |
ianw | the package cache | 02:59 |
ianw | runs being the ansible runs | 02:59 |
*** brinzhang_ has quit IRC | 03:00 | |
*** brinzhang_ has joined #opendev | 03:00 | |
fungi | unattended-upgrades will get it at some point in 24 hours | 03:01 |
fungi | or we can manually force them earlier | 03:01 |
ianw | i'm running ansible by hand now with update_cache fix in the openafs-client role | 03:02 |
*** ysandeep|out is now known as ysandeep | 03:06 | |
*** cloudnull has joined #opendev | 03:21 | |
*** cloudnull has quit IRC | 03:53 | |
*** cloudnull has joined #opendev | 03:55 | |
fungi | not seeing any obvious job failures attributable to afs writes from ze01, good so far | 04:02 |
*** cloudnull has quit IRC | 04:02 | |
*** cloudnull has joined #opendev | 04:05 | |
ianw | ok, all mirrors should be updated | 04:12 |
fungi | thanks! i think i'm about to nod off, but i'll check through everything again when i wake up | 04:15 |
*** ysandeep is now known as ysandeep|afk | 04:49 | |
*** ysandeep|afk is now known as ysandeep | 05:21 | |
*** brinzhang0 has joined #opendev | 05:36 | |
*** brinzhang0 has quit IRC | 05:37 | |
*** brinzhang0 has joined #opendev | 05:37 | |
*** brinzhang_ has quit IRC | 05:38 | |
ianw | mail sent to discuss list to discuss how to move on from here | 05:57 |
ianw | i have confirmed that all our clients have fixed 1.8.6 packages | 05:58 |
ianw | i have implemented the vicepa snapshot on afs01.dfw; we may want to recreate this I guess but the basics are there | 05:59 |
ianw | not sure what else to do now | 06:06 |
ianw | i'll wait for some feedback and we can take it from there | 06:06 |
*** zbr has quit IRC | 06:09 | |
*** zbr has joined #opendev | 06:11 | |
*** marios has joined #opendev | 06:22 | |
*** eolivare has joined #opendev | 07:12 | |
*** openstackgerrit has joined #opendev | 07:25 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 07:25 |
*** ysandeep is now known as ysandeep|lunch | 07:47 | |
*** sboyron has joined #opendev | 07:54 | |
*** ralonsoh has joined #opendev | 07:55 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 08:09 |
*** rpittau|afk is now known as rpittau | 08:09 | |
*** andrewbonney has joined #opendev | 08:12 | |
*** jpena|off is now known as jpena | 08:34 | |
*** tosky has joined #opendev | 08:42 | |
*** ysandeep|lunch is now known as ysandeep | 08:47 | |
*** lpetrut has joined #opendev | 08:50 | |
*** hashar has joined #opendev | 08:52 | |
openstackgerrit | Martin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker https://review.opendev.org/c/opendev/system-config/+/705258 | 09:12 |
*** DSpider has joined #opendev | 10:17 | |
*** dtantsur|afk is now known as dtantsur | 10:38 | |
*** marios has quit IRC | 10:58 | |
zbr | any infra-core around? | 11:28 |
*** rosmaita has joined #opendev | 11:54 | |
*** marios has joined #opendev | 11:59 | |
*** hashar is now known as hasharLunch | 12:04 | |
*** artom has joined #opendev | 12:16 | |
*** jpena is now known as jpena|lunch | 12:31 | |
*** brinzhang_ has joined #opendev | 12:32 | |
*** brinzhang0 has quit IRC | 12:36 | |
*** whoami-rajat__ has joined #opendev | 12:43 | |
*** slittle1 has quit IRC | 13:10 | |
*** slittle1 has joined #opendev | 13:11 | |
mrunge | Hi there, is there anyone who can help me figure out a POST_FAILURE in patches in the gate phase? E.g. https://review.opendev.org/c/openstack/panko/+/764906 | 13:18 |
mrunge | the same checks work in check phase, but fail in gate phase with this post_failure | 13:19 |
mrunge | and I can't figure out why | 13:19 |
*** slittle1 has quit IRC | 13:30 | |
*** jpena|lunch is now known as jpena | 13:33 | |
frickler | mrunge: looks to me like "just" an unstable job, the same failure is also seen in check here https://zuul.opendev.org/t/openstack/build/3d7cb7959325456f98381916c95081ea | 13:36 |
frickler | it's extremely unlikely that job results should consistently depend on whether the job runs in check or gate | 13:36 |
mrunge | frickler, thank you. It looks like they are failing pretty consistently, but only in gate phase | 13:37 |
frickler | we could either hold a node for you to debug the failure in situ or you could amend the devstack post job to not fail completely in that scenario, allowing for more logs to be present in case of this failure | 13:37 |
mrunge | is it possible that *just* a signal is not sent to the right target? | 13:39 |
frickler | zbr: the keywords to highlight people are either infra-root or config-core (the latter for e.g. project-config reviews). it would also be much more productive if you could just state your issue instead of doing empty pings | 13:39 |
frickler | mrunge: not sure why exporting the journal would fail, most likely it gets oomed. without other logs present it's difficult to really tell, though | 13:40 |
mrunge | right | 13:40 |
mrunge | it looks like this is getting killed | 13:40 |
mrunge | frickler, is it possible to freeze a node in the case of failure? | 13:41 |
mrunge | although, why would the job then just fail in gate and not also in check mode? | 13:42 |
frickler | mrunge: as I said before, I don't think the latter is true. I'll set up a hold in order to keep a node online when it is failing | 13:45 |
mrunge | okay, thank you. I can retrigger a job | 13:46 |
mrunge | (or you could...) | 13:46 |
frickler | I just did, the hold is specific to the patch you mentioned above. | 13:50 |
mrunge | thank you, that's awesome | 13:50 |
zbr | frickler: it would be very useful to mention the keywords in channel topic as we have lots of channels with their own rules. | 13:56 |
*** mlavalle has joined #opendev | 13:58 | |
frickler | zbr: we did not put them into the topic in order to avoid getting spammed too often, they are mentioned somewhere in our docs, though | 13:58 |
zbr | https://review.opendev.org/q/hashtag:%22low-hanging%22+(status:open%20OR%20status:merged) | 13:58 |
zbr | but with projects like git-review it is quite hard to guess which keyword to use. | 14:00 |
zbr | i am core there, but i still need two others to get my own changes in. | 14:00 |
zbr | it is a bit tricky as low-activity projects rely more on help from infra in order to get things moving. | 14:02 |
zbr | i still have no idea how we could improve this | 14:02 |
zbr | another interesting subject would be related to elastic-search licensing: does this impact us? (long-term) | 14:07 |
frickler | well we could discuss dropping the two-review rule for projects like this. maybe add that as a topic to our meeting agenda? | 14:07 |
zbr | sure, i will do. i think that the requirement is not enforced but it could be tricky to know which project needs it or not. | 14:08 |
mrunge | zbr, you're not alone with this issue. | 14:14 |
mrunge | usually we dealt with this by distinguishing between low hanging fruits and patches where a second pair of eyes is really appreciated | 14:15 |
zbr | mrunge: yep, my impression is that simple low-risk changes should be allowed with a downgraded quorum, as in a single reviewer. still, even if that is agreed, we need to give a good set of examples (good and bad). | 14:17 |
mrunge | +1 , however it would be hard to give these do's and don'ts | 14:18 |
zbr | fixing ci pipelines, reqs, broken jobs, is likely to be low risk. but breaking changes not | 14:21 |
zbr | funny bit is that dropping support for py27 and py35 is somewhere in between; based on the project, it could have breaking impact. | 14:22 |
mrunge | yes! | 14:22 |
zbr | on the other hand, when you already have CI broken for months, you start to wonder which one is the lesser evil | 14:22 |
zbr | i added topic to https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Weekly_Project_Infrastructure_team_meeting | 14:23 |
*** rpittau is now known as rpittau|afk | 14:48 | |
frickler | mrunge: oom theory confirmed: http://paste.openstack.org/show/801664 , if you let me know your ssh key I can give you access for further debugging | 14:49 |
mrunge | good to know | 14:52 |
mrunge | frickler, https://github.com/mrunge.keys | 14:52 |
fungi | is there swap set up on that job? | 14:52 |
mrunge | tbh. I don't know | 14:52 |
mrunge | I even haven't seen that job description so far | 14:53 |
fungi | should be able to run free once logged in and find out | 14:53 |
frickler | our default of 1G, yes, and it is all used up | 14:53 |
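Checking memory and swap pressure on the held node is just:

```bash
# Human-readable memory and swap usage.
free -h
# Active swap devices/files and their sizes.
swapon --show
```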
frickler | mrunge: ssh root@158.69.69.72 | 14:54 |
frickler | /var/log/journal has 409M, so nothing super huge, though | 14:56 |
fungi | so presumably memory pressure from something else | 14:56 |
mrunge | all together may be a bit tight with 1gb | 14:56 |
mrunge | is there a possibility to give it, e.g. 1.5 gig? | 14:56 |
frickler | it's strange that xz gets killed, though; maybe it chokes on some special kind of input? or is it just unlucky? | 14:56 |
frickler | mrunge: there is a patch to have 8g of swap again, but so far it seemed only to be needed for stable/stein | 14:57 |
frickler | or was it train? | 14:57 |
mrunge | since the same job passes on check queue, the question is, what is the real difference? | 14:58 |
mrunge | between check and gate queue | 14:58 |
frickler | https://review.opendev.org/c/openstack/devstack/+/757488 | 14:58 |
frickler | mrunge: this failure was in check, it doesn't really matter which queue | 14:58 |
mrunge | ugh | 14:58 |
mrunge | there goes my theory | 14:58 |
frickler | seems just like a 50% chance of failing | 14:58 |
fungi | mrunge: frickler: it's entirely configurable. 1gb is simply the default | 14:59 |
mrunge | fungi, where would I set that? | 14:59 |
mrunge | I wouldn't want to use a lot more, since it often passes | 15:00 |
fungi | you can set a variable in the job definition to indicate how large of a swapfile you want, just be aware it subtracts from available rootfs on at least some providers so don't make it huge unless you know you won't be using a lot of disk | 15:00 |
frickler | mrunge: see the above patch. | 15:00 |
* fungi goes looking for the variable | 15:00 | |
fungi | configure_swap_size yeah | 15:02 |
mrunge | since there is 17 Gig out of 75 Gig used in rootfs, increasing swap to 8 gig would totally fit | 15:02 |
fungi | you can see folks overriding it in various jobs: https://codesearch.opendev.org/?q=configure_swap_size | 15:03 |
fungi | 8192 and 0 seem to be the popular override values | 15:03 |
fungi | depending on whether folks wanted more swap, or more disk and no swap | 15:03 |
mrunge | so, in this case, nearly all of the 1 gig swap is used | 15:04 |
fungi | for a bit of background, we basically had to make this compromise when newer linux kernels started refusing to allow sparsely allocated swapfiles | 15:04 |
fungi | the swapfiles are now preallocated instead of sparse, which means the prior 8gb default made a lot of jobs start failing on providers with smaller rootfs sizes | 15:05 |
mrunge | any idea where I would find the job description for telemetry-dsvm-integration ? | 15:06 |
mrunge | that seems to be inherited from somewhere? | 15:06 |
fungi | it will inherit from the job's parent if there's no description set in the job definition, i believe | 15:07 |
*** lpetrut has quit IRC | 15:07 | |
fungi | mrunge: https://zuul.opendev.org/t/openstack/job/telemetry-dsvm-integration says it inherits from https://zuul.opendev.org/t/openstack/job/telemetry-tempest-base | 15:08 |
mrunge | right, that's what I found so far | 15:09 |
fungi | the former has no description set in the job, but the latter does | 15:09 |
frickler | hmm, according to the manpage, xz may use multiple GB of memory when running with high compression settings like -9. we may want to add --memlimit=x, maybe 256M or so | 15:12 |
fungi | or switch to gz and accept that the files will be a bit larger | 15:13 |
fungi | though i want to say clarkb found xz was massively better at compressing systemd journals than gz | 15:14 |
frickler | well, failing with oom doesn't sound better, so there seems to be a tradeoff to make ;) | 15:15 |
fungi | yep, absolutely | 15:16 |
frickler | anyway, eod for me, maybe someone wants to add extra swap to the held node and test various xz options. or I'll do that later | 15:17 |
fungi | i suppose we could do a fallback to --memlimit=something if the normal compress attempt fails | 15:17 |
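A sketch of that fallback, with a placeholder file name; xz's --memlimit caps the compressor's memory use at the cost of a lower effective compression setting:

```bash
# Try normal compression first; if xz fails (e.g. killed by the OOM killer),
# retry with a memory cap so the log collection can still finish.
journal=devstack.journal   # placeholder name for the exported journal
xz -9 "$journal" || xz -f --memlimit=256MiB -6 "$journal"
```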
mrunge | yes, that sounds sensible | 15:17 |
fungi | but yeah, let's get some more suggestions. i need to switch to double-checking afs stuff | 15:17 |
mrunge | and then there is this change: https://github.com/openstack/devstack/commit/d02fa6f856ac5951b8a879c23b57d5a752f28918 | 15:18 |
mrunge | but not causing this issue | 15:19 |
mrunge | https://review.opendev.org/c/openstack/devstack/+/770949 | 15:23 |
mrunge | that is the change | 15:24 |
mrunge | thank you fungi and frickler | 15:24 |
mrunge | let's see how this goes | 15:24 |
clarkb | xz is much better space wise yes | 15:28 |
openstackgerrit | Merged openstack/project-config master: update gentoo from python 3.6 to python 3.8 https://review.opendev.org/c/openstack/project-config/+/770828 | 15:29 |
mrunge | fungi, frickler: I got access to a held node, I don't believe I still need it. that is 158.69.69.72 and could be released back to the pool | 15:35 |
mrunge | Or can I do that on my own, and if yes: how? | 15:35 |
clarkb | mrunge: you can't currently release it back on your own | 15:35 |
clarkb | thank you for letting us know | 15:35 |
fungi | infra-root: static.o.o has the new openafs packages installed, unless there are objections i'm going to do a quick reboot in a few minutes so it's using the fixes and then maybe we can look at merging 770856 | 15:36 |
mrunge | thank you for giving access to help debugging | 15:36 |
clarkb | fungi: looks like zuul has caught up a bit (though still well behind) | 15:36 |
fungi | clarkb: yep | 15:36 |
clarkb | fungi: no objections from me on the reboot | 15:37 |
fungi | #status log rebooted static.o.o to pick up the recent openafs fixes | 15:39 |
openstackstatus | fungi: finished logging | 15:39 |
fungi | server's been up 2 minutes now | 15:42 |
fungi | seems to be working, afs content is served | 15:43 |
fungi | since there have been no objections, i'm going to self-approve 770856 for now and keep an eye on it once it deploys to make sure we're serving up to date content | 15:44 |
clarkb | sounds good | 15:47 |
*** caiqilong has joined #opendev | 15:50 | |
caiqilong | I received a email said I have been added to "Autopatrolled users". Is that good or bad? | 15:50 |
fungi | caiqilong: it's "good" | 15:50 |
caiqilong | fungi: ok. thanks. | 15:51 |
fungi | i watch every change made to the wiki to make sure we catch and roll back any spam, but if i see a user make consistently legitimate changes i add them to the autopatrolled users and then i no longer need to review their edits in the future | 15:51 |
fungi | clarkb: enough vos releases stopped that i can quickly get a vos status on afs01.dfw again (the tarballs volume release is still underway though) | 15:54 |
caiqilong | fungi: thanks for your patience. | 15:54 |
fungi | caiqilong: you're welcome! | 16:04 |
openstackgerrit | Merged opendev/system-config master: Temporarily serve static sites from AFS R+W vols https://review.opendev.org/c/opendev/system-config/+/770856 | 16:16 |
*** slaweq has joined #opendev | 16:19 | |
*** ysandeep is now known as ysandeep|dinner | 16:21 | |
fungi | okay, weird, looks like the last time we tried to build a gentoo image was september | 16:24 |
clarkb | I think nodepool will stop building an image if we stop telling it to upload to any provider | 16:27 |
clarkb | maybe we did that? | 16:27 |
fungi | looks like we paused it | 16:28 |
clarkb | ah | 16:28 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Un-pause Gentoo image builds https://review.opendev.org/c/openstack/project-config/+/771031 | 16:32 |
fungi | prometheanfire: clarkb: ^ | 16:32 |
prometheanfire | yarp :D | 16:33 |
*** ysandeep|dinner is now known as ysandeep|away | 16:33 | |
*** dtantsur is now known as dtantsur|afk | 16:41 | |
*** iurygregory has quit IRC | 16:41 | |
*** marios is now known as marios|out | 16:47 | |
*** marios|out has quit IRC | 17:04 | |
*** iurygregory has joined #opendev | 17:06 | |
*** hasharLunch is now known as hashar | 17:08 | |
*** slaweq has quit IRC | 17:12 | |
clarkb | fungi: is tarballs the only thing releasing now or is it just fewer things overall? | 17:28 |
clarkb | I guess the mirrors are likely still releasing due to their size | 17:28 |
clarkb | fungi: I've +2'd the gentoo unpause, not sure if you want to approve it or wait for another review | 17:28 |
fungi | clarkb: what's "releasing" is a bit misleading. the previously killed vos release calls also still have sync transactions underway, should be able to spot them by listing the transactions | 17:30 |
clarkb | gotcha | 17:30 |
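Listing those in-flight transactions can be done with vos status against the fileserver (a sketch; it assumes admin/-localauth access on the server itself):

```bash
# Show outstanding volume transactions; the release syncs started before the
# vos release commands were killed still show up here.
vos status -server afs01.dfw.openstack.org -localauth
```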
fungi | clarkb: i'll watch the build log for gentoo if you approve | 17:31 |
*** mlavalle has quit IRC | 17:31 | |
clarkb | ok approving now | 17:31 |
*** jpena is now known as jpena|off | 17:31 | |
fungi | thanks! | 17:32 |
*** mlavalle has joined #opendev | 17:32 | |
clarkb | fungi: making sure I'm up to date on people's thoughts re afs upgrades. We've rebooted enough afs clients to be reasonably confident that when the other clients reboot they will be fine (and all clients have the new packages installed) | 17:38 |
clarkb | and that means we don't need to worry about doing rolling reboots of all the mirrors and zuul executors? | 17:38 |
*** hashar has quit IRC | 17:38 | |
openstackgerrit | Merged openstack/project-config master: Un-pause Gentoo image builds https://review.opendev.org/c/openstack/project-config/+/771031 | 17:41 |
fungi | clarkb: yes, we rebooted at least the inap mirror, ze01 and static.o.o | 17:45 |
clarkb | that should be a pretty good representative sample | 17:45 |
fungi | i think that's a reasonable cross-section, yeah | 17:46 |
clarkb | ok, is there anything afs related that you think I should be doing next to help? Looks like ianw did a vicepa snapshot already | 17:47 |
clarkb | are we happy with that lvm level redundancy? | 17:47 |
clarkb | I think yesterday we had suggested it as the least overhead option (lvm is often a mystery to me so I want to double check whether I should be doing anything else around backups/snapshots to help) | 17:48 |
fungi | as a temporary measure it seems like a fine solution to me | 17:48 |
fungi | but we should make sure we have enough available extents in the vg to allow the volumes to diverge for a while | 17:49 |
clarkb | fungi: is that something vgs would show us? | 17:50 |
fungi | a quick rundown of how lvm2 snapshotting works: all the extents (device blocks essentially) which belong to the original volume are marked as also belonging to the snapshot. when new writes are committed in the original volume the old extents are kept and new extents are used instead. as the volumes diverge obviously more additional extents will be used up to (eventually) the size of the original volume | 17:51 |
fungi | itself | 17:51 |
fungi | and yeah, vgs will show you your available extents/space | 17:51 |
fungi | we can always tack on another cinder device as a pv in that vg if needed | 17:52 |
clarkb | ianw did that already according to the etherpad (a full size 1tb volume) | 17:53 |
clarkb | I've just loaded my keys and will double check it too | 17:53 |
clarkb | vgs only shows vfree and vsize by default | 17:53 |
* clarkb reads manpages | 17:53 | |
clarkb | oh I may need lvs | 17:54 |
clarkb | ya the actual lv and its snap show up there | 17:54 |
fungi | right, lvs to see the volumes, vgs to see how much room there is for them to grow | 17:56 |
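Roughly, those two checks look like:

```bash
# Volume group view: VFree shows how much room the snapshot can grow into.
sudo vgs
# Logical volume view: Data% on the snapshot row is how far it has diverged
# from its origin volume.
sudo lvs -o lv_name,lv_size,origin,data_percent
```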
*** ralonsoh has quit IRC | 17:56 | |
clarkb | in this case I think we've created two lvs one at 4TB and one at 1TB which fills up the entire 5TB vg | 17:57 |
clarkb | the data% of the 1TB snapshot is .55%. Should I read that as this snapshot is using 0.55% of the space allocated to it? | 17:57 |
clarkb | if so then I think we are currently quite happy with that state? | 17:58 |
*** eolivare has quit IRC | 18:02 | |
clarkb | fungi: if you get a chance maybe you want to double check all that too? | 18:02 |
*** eolivare has joined #opendev | 18:02 | |
clarkb | corvus: if you haven't seen https://etherpad.opendev.org/p/infra-openafs-1.8 yet, it would be great if you could review that | 18:03 |
clarkb | just to get another set of eyeballs on the whole situation | 18:03 |
fungi | yep, that sounds right, but i'll take a closer look when i'm done eating | 18:04 |
clarkb | thanks, that reminds me I should find breakfast | 18:05 |
fungi | yeah, the snapshot setup on afs01.dfw looks okay to me, it's up to 0.56% in use, but at this rate of growth we've probably got plenty of time. also we'd presumably recreate a newer snapshot immediately prior to starting the upgrade | 18:35 |
fungi | also calling sync before making the snapshot may be a good idea | 18:36 |
clarkb | fungi: you should add that to the etherpad | 18:36 |
fungi | i am | 18:36 |
clarkb | thanks! | 18:36 |
*** icey has quit IRC | 18:48 | |
auristor | clarkb: if you wait until jan 31 the need to update the servers to 1.8 will no longer be necessary | 18:50 |
clarkb | auristor: oh does the bit flip over again at that point? | 18:51 |
auristor | by the 31st the number of non-zero bits in the count of seconds since unix epoch will once again be great enough to permit time based random connection ids to be random | 18:51 |
clarkb | aha | 18:51 |
fungi | auristor: ooh, excellent point, though we have other reasons for wanting to upgrade too | 18:52 |
clarkb | auristor: fwiw I think we should upgrade to 1.8 anyway. | 18:52 |
clarkb | auristor: is changing the key format the only thing that you need to do in the upgrade? And it can't be done as a rolling upgrade right? | 18:52 |
auristor | the underlying problem that openafs 1.8 had was that an effort was made to replace time based randomness with RAND_bytes and the implementation was botched. | 18:52 |
auristor | the key does not need to change. | 18:53 |
clarkb | auristor: akeyconvert was the thing that ianw found | 18:53 |
clarkb | someone mentioned that converting the file format for the key was necessary? | 18:53 |
auristor | 1.6 stores non-DES keys in a krb5 keytab. 1.8 uses the KeyFileExt that AuriStor contributed. | 18:53 |
clarkb | right, so to upgrade we stop 1.6, run akeyconvert, then start 1.8? | 18:54 |
clarkb | on the fileservers and db servers | 18:54 |
auristor | akeyconvert is used to import the keys from the keytab to the KeyFileExt. | 18:54 |
auristor | 1.6 doesn't know about KeyFileExt and doesn't care if it exists. 1.8 won't use the keytab. | 18:54 |
clarkb | got it | 18:54 |
clarkb | so we could convert, then stop 1.6 then start 1.8. But do need to convert before 1.8 starts | 18:55 |
auristor | create the KeyFileExt and distribute to all servers as you would the keytab. Then upgrade / downgrade as you wish. | 18:55 |
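Putting that together, a per-server upgrade might look roughly like this; the package and service names below are standard Debian/Ubuntu OpenAFS ones and are an assumption here rather than a confirmed runbook:

```bash
# On one db/file server at a time:
sudo systemctl stop openafs-fileserver        # plus db server units where relevant
sudo apt-get update
sudo apt-get install openafs-fileserver openafs-dbserver   # 1.8 packages from the PPA
# Import the existing rxkad.keytab keys into the KeyFileExt that 1.8 uses.
sudo akeyconvert
sudo systemctl start openafs-fileserver
# Sanity check that the server is serving its volumes again.
vos listvol -server localhost -localauth
```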
clarkb | does that also mean we need to update zuul secrets that use keytabs? | 18:55 |
clarkb | though all of our clients are already 1.8 so maybe they do the correct thing already? | 18:56 |
auristor | KeyFileExt is used by all openafs 1.8 servers and admin tools that use -localauth | 18:56 |
clarkb | got it | 18:56 |
auristor | openafs clients do not use keytabs or KeyFileExt | 18:56 |
clarkb | auristor: and we shouldn't have a mix of 1.6 and 1.8 servers? | 18:57 |
auristor | AuriStorFS clients use krb5 keytabs and provide encryption for anonymous processes | 18:57 |
clarkb | I seem to recall reading that once upon a time but haven't found confirmation of that | 18:57 |
auristor | you can mix 1.6 and 1.8 | 18:57 |
auristor | there are no protocol, database or vice partition changes | 18:57 |
clarkb | that is really good to know actually | 18:57 |
clarkb | given ^ I think we should upgrade the ord fileserver first | 18:58 |
auristor | for that matter you can mix AuriStorFS and OpenAFS | 18:58 |
clarkb | because that will have the least impact, then we could do a rolling update across the others | 18:58 |
clarkb | ianw: ^ highlight mark in irc for some interesting details | 18:58 |
auristor | build a 1.8 test fileserver and add it to the cell to make sure your KeyFileExt works | 18:58 |
auristor | once you are comfortable it does, distribute it to all the other db and file servers in the cell | 18:59 |
clarkb | auristor: ya that is essentially what the ord server is (it is in another location compared to the others and we don't use it for much because releases to it over the internet are slow) | 18:59 |
clarkb | we should be able to use it as the test in this case with minimal impact (others should double check that assertion though) | 18:59 |
*** andrewbonney has quit IRC | 19:00 | |
clarkb | I'll update our etherpad with this new info after lunch | 19:00 |
fungi | all of the above sounds great, and also greatly simplifies the planned upgrade | 19:02 |
*** slaweq has joined #opendev | 19:22 | |
openstackgerrit | Ghanshyam proposed openstack/project-config master: Combine acl file for all interop source code repo https://review.opendev.org/c/openstack/project-config/+/771066 | 19:28 |
fungi | prometheanfire: gentoo image build underway, log here: https://nb01.opendev.org/gentoo-17-0-systemd-0000143983.log | 19:30 |
fungi | hopefully we'll know shortly if we have usable images again | 19:30 |
fungi | oh, though it's been on "Emerging (1 of 1) dev-python/packaging-20.7::gentoo" for ~1.5 hours according to the timestamp. maybe that build got terminated | 19:32 |
fungi | not terminated, this is still in the process table on nb01 since 18:03... "/bin/bash /tmp/in_target.d/pre-install.d/02-gentoo-04-install-desired-python" | 19:33 |
fungi | this child of it has been running since 18:05... "/usr/bin/python3.8 -b /usr/lib/python-exec/python3.8/emerge --binpkg-respect-use --rebuilt-binaries=y --usepkg=y --with-bdeps=y --binpkg-changed-deps=y --quiet --jobs=2 --autounmask=n --oneshot --update --newuse --deep --nodeps dev-python/packaging" | 19:34 |
fungi | looks like it might be deadlocked (livelocked?), strace shows it waiting on a private futex | 19:35 |
clarkb | and it is that pid with the futex? | 19:37 |
clarkb | or is another child of ^ (might be helpful to know which exact process is sitting on the futex) | 19:37 |
clarkb | fungi: ianw I updated https://etherpad.opendev.org/p/infra-openafs-1.8 with the notes from auristor but didn't change the upgrade plan. Instead just added the new info and proposed a potentially different upgrade plan | 19:38 |
fungi | i also replied to the ml thread with some of it | 19:39 |
fungi | clarkb: the process i straced didn't have any children in the process table | 19:40 |
clarkb | you wouldn't expect emerge to deadlock since that is a tool used by many users, but maybe we've managed to trip a weird corner case in emerging python packaging | 19:41 |
fungi | maybe it doesn't like running with linux 4.15 or something | 19:45 |
*** dmellado has quit IRC | 19:47 | |
*** dmellado has joined #opendev | 19:48 | |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Set up access for #openinfra channel https://review.opendev.org/c/openstack/project-config/+/771073 | 19:55 |
*** auristor has quit IRC | 20:03 | |
*** eolivare has quit IRC | 20:04 | |
*** slaweq has quit IRC | 20:11 | |
*** auristor has joined #opendev | 20:23 | |
fungi | yeah, i'm fairly certain it's not going to terminate on its own, will try killing it and see what happens on the next attempt | 20:40 |
fungi | that caused it to fail and clean up | 20:43 |
clarkb | fungi: it looks like zuul might be caught up by tomorrow | 20:53 |
clarkb | assuming general friday trends continue | 20:53 |
fungi | that'll be nice. gerrit and scheduler restart over the weekend then? | 20:53 |
clarkb | what is the gerrit restart for? | 20:53 |
fungi | i'll be around and can drive | 20:53 |
clarkb | but ya getting a zuul scheduelr in would get us the WIP support which would be great | 20:54 |
fungi | gerrit restart will be needed for the zuul results plugin, if we approve the stack | 20:54 |
clarkb | I'll be around but in and out. Family is telling me that they are stir crazy and need to get out | 20:54 |
fungi | yeah, no worries | 20:54 |
clarkb | fungi: ah, I haven't had a chance to look at it since my last pass of reviews | 20:54 |
clarkb | did the server rename stuff get pulled out of the stack? I really do think simplifying that is a good idea for now and then rename when we actually rename | 20:55 |
fungi | checking | 20:56 |
fungi | yeah, https://review.opendev.org/767059 is still near the top of the stack | 20:58 |
fungi | we could do the gitea upgrade maybe? | 20:59 |
clarkb | oh ya thats another one that got ignored due to the afs things | 20:59 |
clarkb | ya if people are happy with the test node results on that one I think we can proceed | 21:00 |
clarkb | looks like ianw did check the held node and I think you already did so that is three of us | 21:00 |
fungi | i'm happy and seems like ianw tried it and didn't see any problem either | 21:00 |
clarkb | ya | 21:00 |
clarkb | I would probably prioritize zuul then gitea then gerrit | 21:01 |
clarkb | zuul since it is most sensitive to the load | 21:01 |
clarkb | and gerrit last since I'm not sure if we are going to rebase that and stop renaming the server in testing (my preference is that we do that) | 21:01 |
fungi | clarkb: have any suggestions on the best way to reach out to nibalizer about https://review.opendev.org/771073 ? | 21:03 |
clarkb | ya let me see | 21:04 |
fungi | strange, nodepool still doesn't seem to have started trying to build another gentoo image | 21:22 |
clarkb | if other images are building the slots may be full | 21:23 |
fungi | yeah, likely | 21:23 |
clarkb | since nodepool may have decided to start another build after you killed the broken gentoo build | 21:23 |
*** whoami-rajat__ has quit IRC | 21:26 | |
openstackgerrit | Merged opendev/system-config master: system-config-run-review: remove review-dev server https://review.opendev.org/c/opendev/system-config/+/766867 | 21:39 |
*** sboyron has quit IRC | 21:41 | |
ianw | thanks for looking in on the afs stuff | 22:18 |
ianw | i'd agree we can just start with ORD and see how it goes | 22:18 |
ianw | i think it's good to have a plan in case things restart before the end of the month though :) | 22:19 |
*** erbarr has quit IRC | 22:38 | |
*** erbarr has joined #opendev | 22:38 | |
*** logan- has quit IRC | 22:40 | |
*** logan- has joined #opendev | 22:53 | |
*** yoctozepto5 has joined #opendev | 22:54 | |
*** yoctozepto has quit IRC | 22:55 | |
*** yoctozepto5 is now known as yoctozepto | 22:55 | |
*** mwhahaha has quit IRC | 22:57 | |
*** mwhahaha has joined #opendev | 22:57 | |
*** TheJulia has quit IRC | 22:58 | |
*** TheJulia has joined #opendev | 22:58 | |
*** nautik has quit IRC | 22:58 | |
*** cap has quit IRC | 23:00 | |
*** jrosser has quit IRC | 23:01 | |
*** cap has joined #opendev | 23:01 | |
*** jrosser has joined #opendev | 23:03 | |
*** logan- has quit IRC | 23:04 | |
fungi | ianw: of course, and i'm not suggesting we wait to the end of the month to start either, just that we also can take it more slowly and carefully (events permitting) | 23:05 |
openstackgerrit | Lon Hohberger proposed openstack/diskimage-builder master: Pass DIB image's kernel version when checking modules https://review.opendev.org/c/openstack/diskimage-builder/+/771092 | 23:12 |
*** yoctozepto6 has joined #opendev | 23:15 | |
*** yoctozepto has quit IRC | 23:17 | |
*** yoctozepto6 is now known as yoctozepto | 23:17 | |
*** logan- has joined #opendev | 23:18 | |
*** ysandeep|away has quit IRC | 23:20 | |
*** ysandeep has joined #opendev | 23:22 | |
*** yoctozepto4 has joined #opendev | 23:42 | |
*** yoctozepto has quit IRC | 23:44 | |
*** yoctozepto4 is now known as yoctozepto | 23:44 | |
*** yoctozepto4 has joined #opendev | 23:58 | |
*** akrpan-pure has joined #opendev | 23:59 | |
*** yoctozepto has quit IRC | 23:59 | |
*** yoctozepto4 is now known as yoctozepto | 23:59 |