fungi | donnyd: note that some jobs may still be running on nodes there for a little bit after that change merges | 00:00 |
---|---|---|
fungi | if you wanted to be extra careful you could check whether nodepool has deleted all the server instances first | 00:00 |
*** ociuhandu has quit IRC | 00:00 | |
donnyd | The general purpose jobs have been disabled for a week or so | 00:00 |
fungi | right, so it would only be the "special" nodes anyway | 00:00 |
donnyd | Fn has only been running the specialized jobs | 00:01 |
fungi | of which there very well may be none at this point in the week | 00:01 |
*** mattw4 has joined #openstack-infra | 00:01 | |
donnyd | But I will surely check before I pull the plug on anything | 00:01 |
fungi | because people smarter than us are out drinking | 00:01 |
donnyd | Lol | 00:01 |
*** ociuhandu has joined #openstack-infra | 00:01 | |
clarkb | mordred: changes lgtm | 00:02 |
fungi | donnyd: also give it a few minutes for nodepool to delete the remaining images so it doesn't get stuck thinking they're a thing for all eternity | 00:02 |
fungi | and keep in mind that nodepool changes aren't applied instantaneously when config changes for them merge | 00:03 |
openstackgerrit | Merged openstack/project-config master: Full Disable of FortNebula https://review.opendev.org/709257 | 00:03 |
*** mattw4 has quit IRC | 00:07 | |
*** ociuhandu has quit IRC | 00:07 | |
*** ociuhandu has joined #openstack-infra | 00:09 | |
*** ahosam has joined #openstack-infra | 00:13 | |
donnyd | fungi: I'm not actually going to shut down the controller just yet, just need to grab the IPs from the mirrors project | 00:14 |
*** ociuhandu has quit IRC | 00:14 | |
donnyd | Also I plan to reuse as much as I can from the existing configs, it seems to be working well | 00:15 |
fungi | okay, cool | 00:16 |
*** rfolco has quit IRC | 00:17 | |
*** Goneri has quit IRC | 00:28 | |
*** ociuhandu has joined #openstack-infra | 00:36 | |
clarkb | I didn't manage to get to the pip virtualenv spec but will try to have that top of list on Monday | 00:41 |
*** ociuhandu has quit IRC | 00:46 | |
*** ociuhandu has joined #openstack-infra | 00:47 | |
*** ociuhandu has quit IRC | 00:48 | |
*** ociuhandu has joined #openstack-infra | 00:48 | |
*** ijw has quit IRC | 00:53 | |
*** ijw has joined #openstack-infra | 00:56 | |
*** ociuhandu has quit IRC | 00:59 | |
*** ociuhandu has joined #openstack-infra | 01:00 | |
*** ijw has quit IRC | 01:01 | |
*** ociuhandu has quit IRC | 01:05 | |
*** ijw has joined #openstack-infra | 01:07 | |
*** gyee has quit IRC | 01:09 | |
*** lbragstad has quit IRC | 01:16 | |
*** rkukura has quit IRC | 01:26 | |
*** artom has quit IRC | 01:28 | |
*** rh-jelabarre has quit IRC | 01:32 | |
*** rkukura has joined #openstack-infra | 01:34 | |
fungi | i've lost count of all the things i didn't get to this week | 01:42 |
*** rfolco has joined #openstack-infra | 01:54 | |
*** ahosam has quit IRC | 02:02 | |
*** redrobot has quit IRC | 02:03 | |
*** jamesmcarthur has joined #openstack-infra | 02:09 | |
*** jamesmcarthur has quit IRC | 02:43 | |
*** jamesmcarthur has joined #openstack-infra | 02:43 | |
*** jamesmcarthur has quit IRC | 02:48 | |
*** jamesmcarthur has joined #openstack-infra | 02:57 | |
*** auristor has quit IRC | 02:58 | |
*** jamesmcarthur has quit IRC | 03:07 | |
*** matt_kosut has joined #openstack-infra | 03:07 | |
*** roman_g has quit IRC | 03:09 | |
*** auristor has joined #openstack-infra | 03:09 | |
*** matt_kosut has quit IRC | 03:12 | |
*** apetrich has quit IRC | 03:13 | |
*** auristor has quit IRC | 03:13 | |
*** auristor has joined #openstack-infra | 03:17 | |
*** ijw has quit IRC | 03:37 | |
*** jamesmcarthur has joined #openstack-infra | 03:42 | |
*** igordc has quit IRC | 03:45 | |
*** rfolco has quit IRC | 03:52 | |
*** imacdonn has quit IRC | 04:45 | |
*** matt_kosut has joined #openstack-infra | 04:45 | |
*** imacdonn has joined #openstack-infra | 04:46 | |
*** jamesmcarthur has quit IRC | 04:48 | |
*** matt_kosut has quit IRC | 04:50 | |
*** ricolin has joined #openstack-infra | 04:58 | |
*** ramishra has quit IRC | 05:13 | |
*** evrardjp has quit IRC | 05:34 | |
*** evrardjp has joined #openstack-infra | 05:35 | |
*** ociuhandu has joined #openstack-infra | 06:15 | |
*** ociuhandu has quit IRC | 06:20 | |
*** stevebaker has quit IRC | 06:44 | |
*** stevebaker has joined #openstack-infra | 06:44 | |
*** dave-mccowan has quit IRC | 07:18 | |
*** ociuhandu has joined #openstack-infra | 08:00 | |
*** ociuhandu has quit IRC | 08:01 | |
*** ociuhandu has joined #openstack-infra | 08:02 | |
*** slaweq has quit IRC | 08:09 | |
*** ociuhandu has quit IRC | 08:15 | |
*** ociuhandu has joined #openstack-infra | 08:16 | |
*** slaweq has joined #openstack-infra | 08:20 | |
*** ociuhandu has quit IRC | 08:22 | |
*** slaweq has quit IRC | 08:24 | |
*** xek has joined #openstack-infra | 08:29 | |
*** ahosam has joined #openstack-infra | 08:30 | |
*** ociuhandu has joined #openstack-infra | 10:02 | |
*** ociuhandu has quit IRC | 10:15 | |
*** lumir_ has joined #openstack-infra | 10:38 | |
*** matt_kosut has joined #openstack-infra | 10:45 | |
*** matt_kosut has quit IRC | 10:50 | |
*** yamamoto has joined #openstack-infra | 11:17 | |
*** yamamoto has quit IRC | 11:29 | |
*** yamamoto has joined #openstack-infra | 11:29 | |
*** slaweq has joined #openstack-infra | 11:30 | |
*** yamamoto has quit IRC | 11:34 | |
*** slaweq has quit IRC | 11:35 | |
*** ahosam has quit IRC | 11:35 | |
*** yamamoto has joined #openstack-infra | 11:36 | |
*** yamamoto has quit IRC | 11:40 | |
*** yamamoto has joined #openstack-infra | 11:42 | |
*** tosky has joined #openstack-infra | 11:43 | |
*** slaweq has joined #openstack-infra | 11:47 | |
*** slaweq has quit IRC | 11:51 | |
*** yamamoto has quit IRC | 11:55 | |
*** slaweq has joined #openstack-infra | 12:04 | |
*** slaweq has quit IRC | 12:13 | |
*** dciabrin has quit IRC | 12:21 | |
*** ociuhandu has joined #openstack-infra | 12:22 | |
*** slaweq has joined #openstack-infra | 12:25 | |
*** ociuhandu has quit IRC | 12:27 | |
*** slaweq has quit IRC | 12:29 | |
*** nicolasbock has joined #openstack-infra | 12:39 | |
*** ociuhandu has joined #openstack-infra | 12:45 | |
*** ociuhandu has quit IRC | 12:50 | |
*** eharney has quit IRC | 13:05 | |
*** nicolasbock has quit IRC | 13:20 | |
*** yamamoto has joined #openstack-infra | 13:47 | |
*** yamamoto has quit IRC | 13:49 | |
*** yamamoto has joined #openstack-infra | 13:51 | |
*** ociuhandu has joined #openstack-infra | 14:01 | |
*** Lucas_Gray has joined #openstack-infra | 14:01 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Adds role to install hashicorp packer https://review.opendev.org/709292 | 14:04 |
*** yamamoto has quit IRC | 14:05 | |
*** ociuhandu has quit IRC | 14:06 | |
*** Lucas_Gray has quit IRC | 14:16 | |
*** surajpatil1 has joined #openstack-infra | 14:39 | |
*** bdodd has quit IRC | 14:41 | |
*** surajpatil1 has quit IRC | 14:42 | |
*** bdodd has joined #openstack-infra | 14:42 | |
*** rfolco has joined #openstack-infra | 14:54 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Replace kubectl snap with apt repo https://review.opendev.org/709253 | 14:59 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Remove snap cleanup tasks https://review.opendev.org/709293 | 14:59 |
*** rfolco has quit IRC | 14:59 | |
mordred | clarkb: ^^ there's a fun cleanup | 15:00 |
*** rfolco has joined #openstack-infra | 15:05 | |
*** yamamoto has joined #openstack-infra | 15:06 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add jobs to build gerrit 3.1 https://review.opendev.org/709295 | 15:06 |
*** ociuhandu has joined #openstack-infra | 15:07 | |
mordred | infra-root: ooh, reading 3.1 release notes: https://www.gerritcodereview.com/3.1.html#the-messageoftheday-extension-point-is-removed ... this mentions a javascript "banner" plugin entrypoint. perhaps we should write a js plugin that interfaces with statusbot - so when we have a global announcement in place it would show up in a gerrit banner in addition to in IRC | 15:10 |
*** yamamoto has quit IRC | 15:10 | |
*** ociuhandu has quit IRC | 15:11 | |
*** Lucas_Gray has joined #openstack-infra | 15:14 | |
*** bdodd has quit IRC | 15:16 | |
corvus | mordred: statusbot is ready for that: corvus@eavesdrop01:~$ cat /var/lib/statusbot/www/alert.json | 15:42 |
corvus | {"alert": null} | 15:42 |
corvus | i think that used to be served by apache, but looks like it isn't right now | 15:43 |
*** Lucas_Gray has quit IRC | 15:48 | |
*** Lucas_Gray has joined #openstack-infra | 15:51 | |
*** Lucas_Gray has quit IRC | 15:54 | |
*** Goneri has joined #openstack-infra | 15:55 | |
AJaeger | mordred: so, if 3.1 *removes* it - why do you want to use it? Still, like the idea, would be great to show alerts on review.o.o and zuul.o.o | 16:07 |
*** tosky has quit IRC | 16:37 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Add pause-buildset-registry role https://review.opendev.org/709256 | 16:38 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Fix unittests for python2 https://review.opendev.org/709302 | 16:38 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Fix unittests for python2 https://review.opendev.org/709302 | 16:47 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Fix unittests for python2 https://review.opendev.org/709302 | 16:57 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Add pause-buildset-registry role https://review.opendev.org/709256 | 16:57 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Fix cleanup of symlink fixtures https://review.opendev.org/709306 | 16:57 |
AJaeger | infra-root, did we break Zuul? I'm getting RETRY_LIMITs that I can't locate ;( | 17:01 |
AJaeger | example: http://zuul.opendev.org/t/openstack/status/change/709298,1 | 17:01 |
AJaeger | but there are more where I now see "2. attempt" | 17:01 |
mordred | AJaeger: that certainly doesn't seem awesome | 17:02 |
mordred | AJaeger: enum34 made a release recently that broke python2.7 tests for zuul - but I doubt that would have the same impact here | 17:03 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add foreground option https://review.opendev.org/635649 | 17:03 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Deprecate -d switch for running in foreground https://review.opendev.org/705185 | 17:03 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Don't enforce foreground with -d switch https://review.opendev.org/705189 | 17:03 |
AJaeger | and seems to be kind of random, 709298 after a recheck gets "2. attempt" everywhere | 17:03 |
mordred | I've got the log streaming for one of those | 17:04 |
corvus | i'll see if i can find more info about attempt #1 of linters on that change | 17:04 |
*** Goneri has quit IRC | 17:05 | |
mordred | corvus: also - do we know what the deal is with log streaming printing waiting for logger pretty constantly? | 17:05 |
mordred | oh - nevermind | 17:06 |
corvus | mordred: no, that's new to me | 17:06 |
mordred | that's for shell tasks that run before starting the console daemon | 17:06 |
AJaeger | thanks, corvus and mordred ! | 17:06 |
corvus | we have shell tasks before the console daemon? | 17:06 |
mordred | yup | 17:06 |
corvus | mordred: did you catch the releasenotes failure? it just hit retry limit | 17:06 |
mordred | no - I'm watching http://zuul.opendev.org/t/openstack/stream/fde8063280134fe7a6033e4602c8bcd7?logfile=console.log | 17:06 |
corvus | aw bad luck | 17:07 |
mordred | yeah - I made it to RUN on this one - not gonna get a retry | 17:08 |
corvus | git fetch error: stderr: 'fatal: Project not found: openstack/openstack-ansible-rabbitmq_server | 17:09 |
clarkb | you can pull up those logs via logstash too | 17:09 |
corvus | that's a failing before ansible | 17:09 |
clarkb | since we log attempts. you search for attempts 3, then search on that job for that change on attempt 1 | 17:09 |
clarkb | oh if it is before ansible then logstash may not help | 17:10 |
AJaeger | heisenbugs - when I trail, it works ;( | 17:10 |
corvus | tobiash: ^ if you're around, this may touch the merger work | 17:10 |
corvus | lemme collect a log to paste | 17:10 |
corvus | tobiash, clarkb, mordred, AJaeger: http://paste.openstack.org/show/789894/ | 17:12 |
corvus | note there are 2 builds involved there | 17:13 |
tobiash | corvus: that sounds like something severe (oom?) happened to one of the worker processes | 17:13 |
AJaeger | this is openstack-manuals, why does it care about openstack-ansible-rabbitmq_server at all? | 17:14 |
corvus | tobiash: an oom happened at 16:16 | 17:14 |
AJaeger | so, an hour ago? | 17:15 |
corvus | or 45m before this error | 17:15 |
tobiash | corvus: the second exception looks weird, but shouldn't be running in the process pool | 17:16 |
corvus | AJaeger: if i understand tobiash's theory, the oom killed one of the workers and that went unnoticed until the manuals job (build 3565916d689d46be94f957e46cd6651f) ran 45m later, at which point the process pool noticed it, and that caused the job to fail | 17:16 |
corvus | and yeah, i don't understand the second error yet, or whether or how it might be related to the first | 17:17 |
AJaeger | thanks for explanation. | 17:17 |
tobiash | corvus: so it looks like we should catch concurrent.futures.process.BrokenProcessPool and re-initialize the process pool (or even stop the executor?) | 17:18 |
AJaeger | many jobs are failing, so do we need to restart executor and the worker? | 17:18 |
corvus | tobiash: yeah, i think we should try the first thing, and if that doesn't work, we can look at the second (similar to the streamer situation; i still don't think we've merged clarkb's patches for that; we should check on that too) | 17:18 |
mordred | ++ | 17:19 |
mordred | catching broken process pool seems reasonable | 17:19 |
*** ralonsoh has joined #openstack-infra | 17:19 | |
corvus | tobiash, AJaeger i see continuing errors, so actually let me revise the hypothesis: that it may not have gone unnoticed for 45m, it may have started failing every job running on that executor. | 17:20 |
tobiash | the exception says that the process pool is not usable anymore so I'd expect that every job fails from that point on | 17:20 |
mordred | corvus: yeah - if each one is going to throw BrokenProcessPool, that would make sense absent something to rectify the pool - it would also likely let that executor accept a much higher % of the available jobs, since it's failing them quickly | 17:21 |
corvus | there are many instances of the rabbitmq access rights error too. the first instance of that is at 6:25 | 17:21 |
corvus | oh, wait, log rotation... | 17:22 |
corvus | that was happening all yesterday too | 17:23 |
corvus | mordred: git clone https://review.opendev.org/openstack/openstack-ansible-rabbitmq_server | 17:23 |
mordred | corvus: why is something git cloning from review? | 17:24 |
tobiash | ah good, so that one is unrelated | 17:24 |
AJaeger | Oops, 'fatal: remote error: Git repository not found | 17:24 |
corvus | mordred: it's not. i'm asking you to confirm that it's fubar in gerrit | 17:24 |
mordred | ah. nod | 17:24 |
corvus | mordred: (but it was a *fetch* from review by zuul that failed) | 17:25 |
corvus | https://review.opendev.org/#/admin/projects/?filter=rabbit | 17:25 |
mordred | confirm | 17:25 |
mordred | I cannot clone that repo | 17:25 |
mordred | but I can clone other repos that way | 17:25 |
corvus | i don't see it there; so it looks like that is a problem with gerrit | 17:25 |
AJaeger | indeed, I can't find it ;( | 17:25 |
corvus | i'm going to identify which executors need restarting and will restart them as a temp fix (until we can make the change that tobiash suggested); this is probably something that we can limp along with for a few days at a time until then. | 17:26 |
AJaeger | last new repo creation was https://review.opendev.org/708961 | 17:26 |
corvus | if someone wants to dig into the gerrit issue, that's open for the taking | 17:26 |
mordred | corvus: it exists in /home/gerrit2/review_site/git | 17:27 |
corvus | (i wonder if it has a broken refs/meta?) | 17:27 |
AJaeger | 708961 looks ok. Still, did creating it break something? | 17:27 |
corvus | only ze02 is broken, so i will restart only it | 17:28 |
* clarkb is headed to a kid's birthday party. Sorry, can't help right now | 17:28 | |
mordred | and I can clone it from there locally - so the git repo seems fine-ish | 17:28 |
*** ccamacho has quit IRC | 17:28 | |
*** dklyle has quit IRC | 17:28 | |
corvus | mordred: from the error log: http://paste.openstack.org/show/789895/ | 17:30 |
mordred | corvus: oh goodie | 17:30 |
corvus | #status log restarted zuul-executor on ze02 due to process pool failure | 17:31 |
AJaeger | sorry, have to step out as well | 17:33 |
corvus | interesting ebdf54221280df20522ab15cea9c9b67c0c03ca4 does exist in that repo | 17:34 |
corvus | (and it is a change to project.config) | 17:34 |
mordred | corvus: maybe it was a hiccup? | 17:34 |
corvus | mordred: i rsynced a copy of the repo and it fsck's and gc's without error | 17:34 |
*** evrardjp has quit IRC | 17:34 | |
corvus | mordred: yeah, we may just want to try restarting gerrit? | 17:34 |
*** evrardjp has joined #openstack-infra | 17:35 | |
mordred | corvus: yeah - or - maybe clearing cache? | 17:35 |
corvus | oh yeah com.google.gerrit.server.project.ProjectCacheImpl | 17:35 |
mordred | corvus: maybe just try a gerrit flush-caches and see if that fixes it? | 17:36 |
corvus | mordred: should i do "--all"? | 17:36 |
corvus | or "projects" | 17:36 |
mordred | maybe try projects first | 17:36 |
corvus | okay, i will do that now | 17:36 |
corvus | nope. moving on to --all | 17:37 |
mordred | kk | 17:37 |
corvus | all is slow | 17:38 |
mordred | yeah, I'll bet | 17:39 |
corvus | i'm not sure this is faster than restarting :/ | 17:40 |
mordred | :( | 17:40 |
corvus | done | 17:40 |
corvus | still erroring | 17:40 |
corvus | restart now? | 17:40 |
mordred | double :( | 17:41 |
mordred | yeah. I think so | 17:41 |
corvus | okay, i will stop gerrit | 17:41 |
mordred | fingers crossed | 17:42 |
corvus | starting | 17:42 |
corvus | i also restarted apache to speed up the proxy refresh | 17:43 |
corvus | \o/ https://review.opendev.org/#/admin/projects/openstack/openstack-ansible-rabbitmq_server exists | 17:43 |
mordred | THANK GOD | 17:43 |
corvus | and i can git clone that repo directly from gerrit | 17:43 |
corvus | #status log restarted gerrit to correct cached git object error with openstack/openstack-ansible-rabbitmq_server (repo on disk appears normal) | 17:44 |
corvus | er, statusbot isn't here | 17:44 |
mordred | corvus: all of the systems are taking mardi gras weekend off | 17:46 |
*** openstackstatus has joined #openstack-infra | 17:46 | |
*** ChanServ sets mode: +v openstackstatus | 17:46 | |
corvus | #statusbot log restarted statusbot because it disappeared | 17:46 |
openstackstatus | corvus: finished logging | 17:46 |
corvus | #status log restarted gerrit to correct cached git object error with openstack/openstack-ansible-rabbitmq_server (repo on disk appears normal) | 17:46 |
openstackstatus | corvus: finished logging | 17:46 |
corvus | #status log restarted zuul-executor on ze02 due to process pool failure | 17:47 |
openstackstatus | corvus: finished logging | 17:47 |
corvus | okay, i'm going to saturday now :) | 17:47 |
mordred | corvus: saturday! | 17:47 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Recover from broken process pool https://review.opendev.org/709307 | 17:53 |
tobiash | corvus, mordred: that should make the executor recover from such a situation ^ | 17:54 |
paladox | corvus we've seen that error before. | 18:40 |
paladox | just it didn't impact us as much :) | 18:40 |
paladox | corvus i think gerrit reindexes on startup? (not too sure if it does it under 2.13 though) | 18:41 |
*** jamesmcarthur has joined #openstack-infra | 18:48 | |
*** jamesmcarthur has quit IRC | 19:01 | |
*** jamesmcarthur has joined #openstack-infra | 19:02 | |
*** mugsie has quit IRC | 19:25 | |
*** mugsie has joined #openstack-infra | 19:27 | |
*** ociuhandu has joined #openstack-infra | 19:44 | |
*** ociuhandu has quit IRC | 19:48 | |
*** rfolco has quit IRC | 20:02 | |
*** rfolco has joined #openstack-infra | 20:07 | |
*** surajpatil1 has joined #openstack-infra | 20:19 | |
*** rfolco has quit IRC | 20:23 | |
*** surajpatil1 has quit IRC | 20:24 | |
*** slaweq has joined #openstack-infra | 20:31 | |
*** Lucas_Gray has joined #openstack-infra | 20:41 | |
*** jamesmcarthur has quit IRC | 20:44 | |
*** ralonsoh has quit IRC | 20:49 | |
*** ociuhandu has joined #openstack-infra | 20:54 | |
*** jamesmcarthur has joined #openstack-infra | 21:09 | |
*** ociuhandu has quit IRC | 21:18 | |
*** jamesmcarthur has quit IRC | 21:19 | |
*** ociuhandu has joined #openstack-infra | 21:30 | |
*** xek has quit IRC | 21:35 | |
*** jamesmcarthur has joined #openstack-infra | 21:54 | |
*** ijw has joined #openstack-infra | 21:54 | |
*** jamesmcarthur has quit IRC | 22:00 | |
*** ociuhandu has quit IRC | 22:01 | |
*** slaweq has quit IRC | 22:10 | |
*** slaweq has joined #openstack-infra | 22:22 | |
*** slaweq has quit IRC | 22:26 | |
*** jamesmcarthur has joined #openstack-infra | 22:40 | |
*** jamesmcarthur has quit IRC | 22:45 | |
*** ijw has quit IRC | 22:46 | |
*** dave-mccowan has joined #openstack-infra | 23:20 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: [wip] uncap hacking https://review.opendev.org/709332 | 23:26 |
*** jamesmcarthur has joined #openstack-infra | 23:41 | |
*** jamesmcarthur has quit IRC | 23:46 | |
*** jamesmcarthur has joined #openstack-infra | 23:54 |
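
Note on the 17:18 suggestion (catch concurrent.futures.process.BrokenProcessPool and re-initialize the process pool): the following is a minimal sketch of that idea only, not the code from the Zuul change proposed at 17:53. The class name RecoveringPool, its factory parameter and its rebuilds counter are hypothetical names introduced for illustration; the only real APIs assumed are ProcessPoolExecutor and BrokenProcessPool from the Python standard library.

```python
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


class RecoveringPool:
    """Hypothetical wrapper that rebuilds its process pool once if it breaks."""

    def __init__(self, max_workers=2, factory=ProcessPoolExecutor):
        # `factory` is an illustrative hook so the pool type can be swapped.
        self._factory = factory
        self._max_workers = max_workers
        self._pool = factory(max_workers=max_workers)
        self.rebuilds = 0  # how many times the pool had to be recreated

    def run(self, fn, *args):
        """Run fn(*args) in the pool and return its result."""
        try:
            return self._pool.submit(fn, *args).result()
        except BrokenProcessPool:
            # A worker died abruptly (for example, killed by the OOM killer).
            # The old pool is unusable from here on, so discard it and retry
            # the call once on a fresh pool.
            self._pool.shutdown(wait=False)
            self._pool = self._factory(max_workers=self._max_workers)
            self.rebuilds += 1
            return self._pool.submit(fn, *args).result()

    def shutdown(self):
        self._pool.shutdown()
```

Retrying once is only reasonable when the submitted function is safe to run twice; if the fresh pool breaks as well, the exception propagates to the caller, which roughly matches the fallback discussed in the channel (stop or restart the executor).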