opendevreview | Merged opendev/system-config master: reprepro: stop mirroring Debian stretch https://review.opendev.org/c/opendev/system-config/+/817340 | 00:45 |
*** odyssey4me is now known as Guest5655 | 00:47 | |
ianw | i have locks in a root screen to cleanup debian-security/debian repos for ^ | 00:54 |
ianw | debian-security done | 01:29 |
opendevreview | likui proposed openstack/diskimage-builder master: Replace deprecated assertEquals https://review.opendev.org/c/openstack/diskimage-builder/+/817663 | 01:46 |
opendevreview | Chris Stone proposed openstack/diskimage-builder master: Adding allow-remove-essential for ubuntu grub install. https://review.opendev.org/c/openstack/diskimage-builder/+/817666 | 02:13 |
opendevreview | Chris Stone proposed openstack/diskimage-builder master: Adding allow-remove-essential for ubuntu grub removal. https://review.opendev.org/c/openstack/diskimage-builder/+/817666 | 02:15 |
Alex_Gaynor | Is linaro having issues? None of our arm64 jobs have started in 10 minutes | 02:16 |
corvus | Alex_Gaynor: i think that's about when the periodic jobs got enqueued, so the queues more or less instantaneously max out | 02:29 |
corvus | it seems like there should be more capacity in linaro though... | 02:30 |
corvus | ah, our max servers are higher than the quota on that provider, so the graph is misleading | 02:35 |
corvus | so yeah, they're waiting on quota | 02:35 |
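The misleading-graph situation corvus describes comes from nodepool's per-pool `max-servers` setting being allowed to exceed the cloud-side quota. A hypothetical provider snippet (names are illustrative, not the real linaro config) showing the shape of that setting:

```yaml
# hypothetical nodepool provider config: if max-servers here is higher
# than the tenant's actual cloud quota, dashboards keyed off max-servers
# overstate real capacity, and node requests wait on quota instead
providers:
  - name: example-arm64-provider
    pools:
      - name: main
        max-servers: 40   # cloud quota may only allow fewer instances
```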
corvus | Alex_Gaynor, ianw: oh, neat -- it looks like osu osl has arm64 images too? | 02:41 |
corvus | https://zuul.opendev.org/t/pyca/stream/bd04cde127334fedb6bc9b03e69a0fea?logfile=console.log is about to run on one i think | 02:42 |
*** diablo_rojo is now known as Guest5669 | 02:59 | |
ianw | Alex_Gaynor: yep, what corvus said about things maxing out at about that hour due to all periodic jobs kicking off then. we moved it forward a bit to avoid .eu timezones recently, so that might be a bit different | 03:04 |
ianw | or backwards, depending on your POV I guess. anyway, it used to happen a couple of hours from now | 03:05 |
ianw | so far Fedora 35 has managed to crash my work VM daily at a random point, generally in the middle of scrolling a largish web page. helpfully giving no message, oops, or anything but instantly dying :/ | 03:07 |
ianw | ok, debian mirror now done too https://grafana.opendev.org/d/T5zTt6PGk/afs?orgId=1 | 03:09 |
corvus | ianw: year of the linux desktop! | 03:14 |
ianw | #status log debian-stretch has been yeeted from nodepool and AFS mirrors | 03:14 |
ianw | i guess now i'm going to diagnose why the status bot has also yeeted itself | 03:15 |
ianw | we merged a couple of changes, and it looks like it restarted 29 hours ago | 03:22 |
ianw | we do not appear to get any logging out of it | 03:24 |
ianw | ok, i found the logging | 03:28 |
ianw | 2021-11-12 03:22:57,183 DEBUG irc.client: connect(server='irc.oftc.net', port=6697, nickname='opendevstatus', ...) | 03:29 |
ianw | 2021-11-12 03:22:57,355 DEBUG irc.client: TO SERVER: NICK opendevstatus | 03:29 |
ianw | 2021-11-12 03:22:57,356 DEBUG irc.client: TO SERVER: USER opendevstatus 0 * :opendevstatus | 03:29 |
ianw | 2021-11-12 03:22:57,356 DEBUG irc.client: process_forever(timeout=0.2) | 03:29 |
ianw | 2021-11-12 03:22:57,400 DEBUG irc.client: _dispatcher: disconnect | 03:29 |
ianw | it is an endless loop like that | 03:29 |
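The tight connect/disconnect loop in the log above is what a bot does when every connection attempt fails instantly (for example, speaking plaintext to a TLS-only port like OFTC's 6697). A hedged, generic sketch of how a client could back off instead of spinning; `reconnect_loop` and its parameters are hypothetical, not statusbot's actual code:

```python
import time

def reconnect_loop(connect, max_attempts=5, base_delay=1.0):
    """Call connect() with exponential backoff between failures.

    connect is any zero-argument callable returning True on success.
    Returns the attempt index that succeeded, or None if all failed.
    """
    for attempt in range(max_attempts):
        if connect():
            return attempt
        # back off 1s, 2s, 4s, ... instead of retrying immediately
        time.sleep(base_delay * (2 ** attempt))
    return None
```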
fungi | ianw: did the restart coincide with one of the changes we merged? or was it prior to them merging? | 03:42 |
ianw | yes, it restarted after those changes we merged. | 03:42 |
ianw | i'm going to try reverting the ssl one and see if that helps. i can't see the problem but it's my first suspect | 03:42 |
fungi | yeah, looks like similar timing to me | 03:43 |
fungi | and agreed, the other changes don't really go anywhere near the connection setup | 03:43 |
ianw | i've now managed to restart all containers, because git was so out of date it hadn't managed to figure out the recent letsencrypt update. not sure, but we should also double-check unattended-upgrades | 03:44 |
fungi | statusbot's running from a container, right? so it could be just about anything which changed in its dependency set or base image since the last time it was built? | 03:45 |
ianw | ok, reverting that change, it is back (i docker cp'd a reverted file, overwrote it in the container and did a docker restart on the container) | 03:48 |
ianw | #status log debian-stretch has been yeeted from nodepool and AFS mirrors | 03:48 |
ianw | oh it's still joining | 03:49 |
opendevstatus | ianw: finished logging | 03:50 |
fungi | so good hunch | 03:50 |
ianw | i guess we don't get statusbot change logs from gerritbot? | 03:51 |
ianw | https://review.opendev.org/c/opendev/statusbot/+/817694 | 03:52 |
Clark[m] | ianw: unattended upgrades on eavesdrop? | 03:54 |
ianw | oh, i also added "use_ssl=True" manually to the config and that didn't work either (although with hindsight, we shouldn't have changed the default action) | 03:55 |
ianw | Clark[m]: yeah, that's the next weird thing, there were a lot of non-upgraded packages there, including git, meaning i couldn't clone from opendev.org | 03:55 |
ianw | tristanC: ^ | 03:55 |
fungi | unattended-upgrades only applies security fixes by default, i think | 03:56 |
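fungi's point about unattended-upgrades is visible in the stock apt configuration: by default only the security origin is enabled. Roughly what the default `/etc/apt/apt.conf.d/50unattended-upgrades` looks like on Ubuntu (exact contents vary by release; this is a sketch, not the eavesdrop host's file):

```
// Only origins listed here are auto-upgraded; -updates and -backports
// ship commented out by default, so ordinary package updates (like git)
// are skipped unless an admin enables them.
Unattended-Upgrade::Allowed-Origins {
        "${distro_id}:${distro_codename}";
        "${distro_id}:${distro_codename}-security";
//      "${distro_id}:${distro_codename}-updates";
};
```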
ianw | i'm going to have to go running around to music lessons etc. in about 30 minutes -- if we want to approve the revert that should push and restart with a good image that puts us in a steady state | 03:56 |
fungi | it's possible gerritbot is broken? i would have expected it to announce that change, it's not in-channel at all | 03:57 |
fungi | 2021-11-12 03:39:29 <-- opendevreview (~opendevre@104.239.144.232) has quit (Remote host closed the connection) | 03:58 |
ianw | that was my fault, but it has restarted | 03:59 |
fungi | ahh, okay | 04:00 |
fungi | i wonder why it ignored the statusbot change | 04:00 |
fungi | its config definitely says opendev/statusbot changes should be announced to this channel | 04:01 |
ianw | yeah it tried to send it and got an exception | 04:01 |
ianw | Nov 12 03:50:20 eavesdrop01 docker-gerritbot[2961013]: irc.client.ServerNotConnectedError: Not connected. | 04:01 |
fungi | huh | 04:01 |
fungi | maybe it was still starting up? | 04:02 |
fungi | anyway, as things seem under control, i'm going to go back to losing consciousness | 04:02 |
ianw | ++ | 04:03 |
ianw | nope, it's just failed to send again. it is getting the message | 04:04 |
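The gerritbot failure mode above (a `ServerNotConnectedError` at send time losing the announcement) suggests queueing messages and only dropping them after a successful send. A hedged sketch, using stdlib `ConnectionError` as a stand-in for the real `irc.client.ServerNotConnectedError`; `AnnounceQueue` is hypothetical, not gerritbot's implementation:

```python
from collections import deque

class AnnounceQueue:
    """Hold unsent announcements until the connection is usable again."""

    def __init__(self, send):
        self.send = send          # callable that may raise ConnectionError
        self.pending = deque()

    def announce(self, msg):
        self.pending.append(msg)
        self.flush()

    def flush(self):
        # send in order; on failure keep the message for the next flush
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                return
            self.pending.popleft()
```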
opendevreview | Ian Wienand proposed opendev/statusbot master: [dnm] testing notifications https://review.opendev.org/c/opendev/statusbot/+/817695 | 04:06 |
ianw | i dunno, i just restarted it | 04:06 |
opendevreview | Merged opendev/statusbot master: Revert "Add use_ssl option" https://review.opendev.org/c/opendev/statusbot/+/817694 | 04:17 |
ianw | ^ just promoted; i pulled and restarted statusbot with it manually cause i'm under a bit of time pressure right now. it has connected | 04:21 |
*** pojadhav|sick is now known as pojadhav | 04:40 | |
*** ysandeep|out is now known as ysandeep | 06:07 | |
*** gouthamr_ is now known as gouthamr | 06:19 | |
frickler | slightly related we don't seem to have rDNS records for eavesdrop, is that intentional somehow or just an oversight? | 06:56 |
*** ysandeep is now known as ysandeep|lunch | 08:09 | |
*** diablo_rojo is now known as Guest5692 | 08:09 | |
*** gibi is now known as giblet | 08:22 | |
opendevreview | Alfredo Moralejo proposed opendev/system-config master: Enable mirroring of centos stream 9 contents https://review.opendev.org/c/opendev/system-config/+/817136 | 08:41 |
*** ykarel is now known as ykarel|lunch | 08:45 | |
opendevreview | Fabio Verboso proposed openstack/project-config master: Iotronic-pythonclient and Iotronic-UI update. Jobs moved to py38 (set in .zuul.yaml in the project repositories). https://review.opendev.org/c/openstack/project-config/+/817719 | 09:09 |
ianw | frickler: we don't really have an "eavesdrop" any more ... when we moved the services to opendev we moved it to "meetings.opendev.org" so it's a bit less creepy-sounding; the services just run on eavesdrop01 for historical reasons | 09:17 |
*** ysandeep|lunch is now known as ysandeep | 09:23 | |
frickler | ianw: well I was talking about the host eavesdrop01.opendev.org, is there a reason for it to not have rDNS? | 09:24 |
ianw | frickler: oh, you said rDNS. indeed that is an oversight, i just fixed that :) | 09:24 |
ianw | jinx :) | 09:24 |
frickler | ^5 | 09:24 |
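For context on the rDNS fix above: a PTR record lives at the address's reverse-pointer name in `in-addr.arpa` (or `ip6.arpa` for IPv6). The stdlib can compute that name without any network access; the address below is a documentation-range example, not eavesdrop01's real IP:

```python
import ipaddress

def ptr_name(addr: str) -> str:
    """Name at which the PTR (rDNS) record for addr lives."""
    return ipaddress.ip_address(addr).reverse_pointer

# e.g. the PTR for 203.0.113.5 is published at 5.113.0.203.in-addr.arpa
print(ptr_name("203.0.113.5"))
```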
*** chandankumar is now known as raukadah | 10:05 | |
*** ykarel|lunch is now known as ykarel | 10:25 | |
*** lbragstad0 is now known as lbragstad | 11:06 | |
*** ysandeep is now known as ysandeep|afk | 12:01 | |
*** ykarel_ is now known as ykarel | 12:32 | |
*** ysandeep|afk is now known as ysandeep | 12:59 | |
*** jpena|off is now known as jpena | 13:15 | |
opendevreview | Merged opendev/irc-meetings master: Update policy popup meeting time & details https://review.opendev.org/c/opendev/irc-meetings/+/817496 | 13:32 |
opendevreview | Merged opendev/irc-meetings master: Update Interop meeting details https://review.opendev.org/c/opendev/irc-meetings/+/817225 | 13:33 |
*** pojadhav is now known as pojadhav|afk | 14:09 | |
*** ykarel is now known as ykarel|away | 14:20 | |
*** ysandeep is now known as ysandeep|dinner | 15:49 | |
*** frenzy_friday is now known as frenzyfriday|PTO | 15:59 | |
*** marios is now known as marios|out | 16:47 | |
*** ysandeep|dinner is now known as ysandeep | 16:57 | |
*** ysandeep is now known as ysandeep|out | 16:58 | |
*** jpena is now known as jpena|off | 17:41 | |
clarkb | fwiw feeling a lot better today. but still going to try and take it easy. I've caught up on some PBR stuff, and next will be looking at some zuul things. Ping me if there is anything else i should be looking at that I've missed the last couple of days | 17:44 |
fungi | clarkb: it's mostly been a quiet week other than trying to shake out multi-scheduler bugs, good friday to take it easy if you ask me | 17:50 |
*** Guest5508 is now known as melwitt | 19:26 | |
*** melwitt is now known as Guest5716 | 19:27 | |
*** Guest5716 is now known as melwitt | 19:32 | |
*** melwitt is now known as jgwentworth | 19:34 | |
*** outbrito_ is now known as outbrito | 19:42 | |
corvus | i'm seeing repeated exceptions like this: | 20:44 |
corvus | 2021-11-12 20:39:50,687 ERROR zuul.Scheduler: [e: e3b2e8d75d704c9cb21fbf3954a61297] Exception while removing nodeset from build <Build 6ac065cf12574e0d9434ab215c2cf6fa of neutron-ovn-tempest-slow voting:True> for change <Change 0x7f9c5f5cb9a0 openstack/neutron 805391,13> | 20:44 |
corvus | but maybe they've stopped now? maybe that's a transient problem that was eventually corrected? possibly instigated by a gate reset? | 20:47 |
clarkb | when it says removing nodeset from build is that the actual nodeset of nodes used to run the jobs or the abstract nodeset definition? | 20:52 |
fungi | there were a couple of connection reset exceptions today | 20:58 |
fungi | looks like they were both gerrit.GerritPoller getting disconnected from googlesource so maybe they were working on something | 20:59 |
clarkb | fungi: there was a big google outage overnight | 20:59 |
clarkb | affected youtube and other things as well as gerrit upstream | 21:00 |
fungi | corvus: were those the "Exception: No job nodeset for ..." exceptions you were looking at? | 21:00 |
fungi | oh, seems at least some of those are secondary, following a zuul.exceptions.MergeFailure | 21:01 |
corvus | i haven't tracked the issue down yet, but it seems to have stopped. | 21:05 |
fungi | looks like all the ones i'm finding are changes for openstack/neutron, but different changes (some more than once) | 21:08 |
fungi | oh, though here's one where it wasn't followed by a mergefailure | 21:09 |
fungi | er, wasn't following a mergefailure | 21:09 |
fungi | still for an openstack/neutron change though | 21:10 |
corvus | it's doing it again... same change interestingly | 21:33 |
clarkb | corvus: 805391's parent is 811411 and 811411 had a failure causing jobs to be cancelled for 805391. I guess this is why you were asking about resets? | 21:44 |
corvus | yeah... i'm tracking it down now; i suspect something wrong with cancelJobs(prime=True); using the repl now | 21:44 |
corvus | i expect it to be called with prime=True, but it's not behaving like it is. i can't tell for certain whether it is called with true or not | 21:54 |
corvus | oh i totally misread that, it's being called with prime=False | 21:56 |
corvus | so basically, it's intentional that it would just try to cancel them over and over; i still feel like i'm missing something | 21:58 |
corvus | ah, i think tests.unit.test_scheduler.TestScheduler.test_failing_dependent_changes actually shows the error, it just doesn't fail (for the same reason it's not fatal in prod -- it does work, just not efficiently) | 22:07 |
corvus | affects both master and 4.10.4 | 22:08 |
corvus | okay, before sos this was all protected by a conditional; so i think i see the fix | 22:11 |
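The fix shape corvus describes (the nodeset-return path used to be protected by a conditional) can be sketched generically. This is a hypothetical illustration of guarding a cancel so it runs once per build rather than re-raising on every pass; it is not Zuul's actual code, and `cancel_jobs`/`return_nodeset` are invented names:

```python
def cancel_jobs(builds, return_nodeset, returned=None):
    """Return each build's nodeset at most once.

    returned is the set of builds already handled; without this guard,
    every gate reset would retry (and raise on) already-returned nodesets,
    which works but floods the logs, as in the exceptions above.
    """
    returned = set() if returned is None else returned
    for build in builds:
        if build in returned:
            continue              # already handled; skip instead of re-raising
        return_nodeset(build)
        returned.add(build)
    return returned
```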
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!