ianw | btw i just tried again clearing out 4aa57cc6-8710-4952-87e6-ddad1db2bb12 (jeblairtest in vexxhost ymq) and it disappeared ok this time ... yay clouds \o/ :) | 00:02 |
fungi | eventually functional | 00:03 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Cleanup deprecated ansible syntax https://review.opendev.org/667710 | 00:03 |
clarkb | ianw: how are you ipv6'ing that node? | 00:04 |
clarkb | ianw: dns is A record only but then ansible_host has an ipv6 addr. Considering we can't autoconfigure ipv6 properly on those hosts maybe we should stick to ipv4 only? | 00:05 |
ianw | clarkb: i'm not, which seems to be what is done with the others | 00:05 |
ianw | oh, didn't i comment that? | 00:05 |
clarkb | ianw: you commented out public_v6 but ansible_host is the same ipv6 addr | 00:06 |
clarkb | I'm leaving a gerrit comment | 00:06 |
*** openstack has joined #openstack-infra | 13:12 | |
*** ChanServ sets mode: +o openstack | 13:12 | |
AJaeger | yeah, working again! Thanks, fungi! | 13:12 |
fungi | dirk: what bindep conflict? what new jobs? | 13:14 |
rakhmerov | AJaeger: ooh, yeah... I could have figured it out myself :) | 13:15 |
rakhmerov | thanks | 13:15 |
fungi | slaweq: i ought to be able to get to that deletion in roughly an hour | 13:15 |
slaweq | fungi: great, thx a lot | 13:16 |
rakhmerov | AJaeger: I'll just have to make it within the same patch that fixes unit tests | 13:16 |
AJaeger | fungi: regarding jobs, see https://review.opendev.org/#/c/667221 - the problem is that this is a diskimage-builder job with nodepool tests, you would probably need bindep of *both* projects (if I understand Dirk correctly) | 13:16 |
AJaeger | rakhmerov: take it over, abandon, whatever works ;) | 13:16 |
rakhmerov | ok | 13:16 |
AJaeger | rakhmerov: note that my change is for the tempest-plugin, while you work on the mistral repo | 13:17 |
rakhmerov | ooh, yes | 13:17 |
rakhmerov | right | 13:17 |
rakhmerov | 👍 | 13:17 |
AJaeger | rakhmerov: test my change with depends-on from your change... | 13:17 |
fungi | AJaeger: that sounds to me like either job-specific dependencies the job needs to incorporate directly, or the job needs to install the dependencies from each bindep file (normally they can be concatenated with no problem) | 13:17 |
fungi | consider devstack a (large) example... it needs the dependencies of all openstack projects installed | 13:18 |
AJaeger | fungi: I hope dirk will figure it out with corvus ;) | 13:19 |
fungi | if there are dependencies which need to be omitted or included that can usually be accomplished by adding custom profile names to the relevant entries and then filtering on that when running | 13:20 |
*** bobh has joined #openstack-infra | 13:20 | |
*** bobh has quit IRC | 13:22 | |
AJaeger | fungi: indeed, we could add a "nodepool" tag and include those when running... Or use the nodepool bindep file instead (if only that is needed). dirk, some food for thought | 13:22 |
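The profile-filtering approach fungi and AJaeger describe can be sketched with a few hypothetical bindep.txt entries; the package names and the "nodepool" profile tag here are illustrative, not taken from the actual repos:

```
# hypothetical bindep.txt entries; "nodepool" is a custom profile tag
gnupg2 [nodepool platform:rpm]
libffi-dev [nodepool platform:dpkg]
```

A job could then run something like `bindep nodepool` to install only the entries tagged with that profile, leaving the default profile untouched for other consumers.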
AJaeger | fungi, could you put these small cleanups on your review queue, please? https://review.opendev.org/667900 and https://review.opendev.org/667357 | 13:24 |
*** rcernin has quit IRC | 13:31 | |
*** jistr is now known as jistr|call | 13:31 | |
*** rajinir has joined #openstack-infra | 13:41 | |
*** dave-mccowan has quit IRC | 13:43 | |
*** Goneri has joined #openstack-infra | 13:45 | |
*** pcaruana has quit IRC | 13:46 | |
*** pcaruana has joined #openstack-infra | 13:47 | |
*** mtreinish has joined #openstack-infra | 13:56 | |
*** sgw has joined #openstack-infra | 13:57 | |
*** eharney has quit IRC | 13:58 | |
*** ykarel|afk is now known as ykarel | 14:02 | |
*** roman_g has quit IRC | 14:06 | |
*** kjackal has quit IRC | 14:08 | |
*** kjackal has joined #openstack-infra | 14:10 | |
corvus | dirk: what do you mean by "bindep conflict"? | 14:10 |
*** eharney has joined #openstack-infra | 14:11 | |
*** lpetrut has quit IRC | 14:13 | |
*** bobh has joined #openstack-infra | 14:16 | |
openstackgerrit | James E. Blair proposed openstack/diskimage-builder master: Replace nodepool func jobs https://review.opendev.org/667221 | 14:16 |
corvus | dirk, AJaeger, fungi: oh, i see the comment now. i've responded on 667221 | 14:18 |
sshnaidm|ruck | I think we have a problem with iad mirror: Timeout on http://mirror.iad.rax.opendev.org/centos/7/os/x86_64/Packages/tmpwatch-2.11-5.el7.x86_64.rpm: http://logs.openstack.org/62/667562/3/check/tripleo-ci-centos-7-undercloud-containers/ff7c126/logs/undercloud/home/zuul/undercloud_install.log.gz#_2019-06-27_08_36_06 | 14:26 |
*** dpawlik has quit IRC | 14:27 | |
*** jistr|call is now known as jistr | 14:31 | |
efried | Hey folks, could we please get a devstack core to approve https://review.opendev.org/#/c/667218/ -- needed to unblock nova rocky. TIA! | 14:33 |
fungi | sshnaidm|ruck: that server was indeed offline at 08:36:06 but has been up for nearly 5 hours now | 14:33 |
sshnaidm|ruck | fungi, ack, Tengu ^^ | 14:35 |
Tengu | hmm ok. | 14:37 |
Tengu | so yeah, might be that - I re-checked this morning CET | 14:37 |
fungi | sshnaidm|ruck: Tengu: cause was https://www.redhat.com/archives/linux-cachefs/2019-June/msg00009.html | 14:37 |
Tengu | thanks fungi for the info | 14:37 |
Tengu | bah, who needs system/kernel upgrades ? ;) | 14:38 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Add .gitreview https://review.opendev.org/667942 | 14:38 |
fungi | or at least we think that's (related to) what's causing it to drop to shutoff state anyway | 14:38 |
sshnaidm|ruck | fungi, thanks, interesting.. | 14:39 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Add .gitreview and .zuul.yaml https://review.opendev.org/667942 | 14:39 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Add .gitreview and .zuul.yaml https://review.opendev.org/667942 | 14:41 |
openstackgerrit | Merged zuul/zuul-operator master: Add .gitreview and .zuul.yaml https://review.opendev.org/667942 | 14:42 |
*** ykarel is now known as ykarel|away | 14:46 | |
openstackgerrit | Alex Schultz proposed openstack/project-config master: Retire tempest-tripleo-ui https://review.opendev.org/667949 | 14:50 |
dirk | fungi: I think it used to work before, but I'm too lazy to dig out a logfile | 14:52 |
dirk | argh | 14:52 |
dirk | corvus: ^^ | 14:52 |
*** xek__ has quit IRC | 14:53 | |
dirk | corvus: I actually did that change in the dependent change to not interfere with our review | 14:53 |
dirk | corvus: it still hits a timeout.. | 14:53 |
*** kjackal has quit IRC | 14:54 | |
corvus | dirk: yeah, there was definitely a gnupg2 installation in the old job -- there was just no indication that it was there for the opensuse jobs | 14:54 |
*** kjackal has joined #openstack-infra | 14:54 | |
corvus | dirk: https://opendev.org/zuul/nodepool/src/branch/master/devstack/files/debs/nodepool | 14:54 |
dirk | aha | 14:55 |
*** xek has joined #openstack-infra | 14:55 | |
dirk | corvus: so any idea why http://logs.openstack.org/37/667537/10/check/dib-nodepool-functional-openstack-opensuse-150-src/6afaba4/ is failing? | 14:55 |
corvus | dirk: looking at the run from your change in http://logs.openstack.org/37/667537/10/check/dib-nodepool-functional-openstack-opensuse-150-src/6afaba4/ it looks like the build worked | 14:55 |
corvus | dirk: it just didn't connect to port 22 | 14:56 |
dirk | yep, build works, but nodepool doesn't consider the image ready | 14:56 |
corvus | console log: http://logs.openstack.org/37/667537/10/check/dib-nodepool-functional-openstack-opensuse-150-src/6afaba4/instances/5f8ffffd-62ae-44fa-9aa4-18b57e3ee3ec/console.log | 14:56 |
corvus | glean ran, openssh started | 14:56 |
corvus | this is very similar to the problems with ubuntu we were looking at yesterday | 14:57 |
*** bhavikdbavishi has joined #openstack-infra | 14:57 | |
corvus | i don't know what the underlying problem is, but we're speculating that it's a boot sequencing issue, and the old jobs, which built 3 images and booted 3 vms at one time had enough cpu contention that it slowed down and altered the boot sequence. but the new jobs which run one vm at a time run faster and lose the race. | 14:58 |
dirk | it is complaining that /etc/resolv.conf is empty | 14:58 |
dirk | but that shouldn't be a problem | 14:58 |
*** xek_ has joined #openstack-infra | 14:58 | |
fungi | well, openssh will try to perform a reverse dns lookup on the client ip address for every incoming connection before responding | 14:58 |
corvus | dirk: do you know if glean should be working with NetworkManager in that configuration? | 14:58 |
corvus | fungi: yeah, but it usually times out relatively quickly and proceeds anyway, right? | 14:59 |
fungi | 30-ish seconds i think, if it's timing out waiting for dns responses. if it doesn't have any nameservers configured i would expect it to fail far more quickly, yes | 15:00 |
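The reverse-DNS delay fungi describes can be ruled out on the server side with sshd's UseDNS option; a minimal sshd_config fragment (a common mitigation, not something the discussion above actually applied):

```
# /etc/ssh/sshd_config -- skip reverse DNS lookups on incoming connections
UseDNS no
```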
corvus | yeah, and we've got a 10 minute window (with probably 8 minutes left on the clock after boot, i'd guess) | 15:01 |
*** xek has quit IRC | 15:01 | |
fungi | right, not it then ;) | 15:01 |
dirk | corvus: that rings a bell, did it always say networkmanager? can I see a console log of a working nodepool image right now somewhere? | 15:01 |
*** dpawlik has joined #openstack-infra | 15:01 | |
*** bhavikdbavishi has quit IRC | 15:02 | |
dirk | corvus: I remember that we had some hack in diskimage builder to turn off networkmanager.. | 15:02 |
corvus | would a console log from a live image in our production nodepool providers work? | 15:02 |
corvus | that might be easiest to get | 15:03 |
dirk | sure | 15:03 |
corvus | i can get that after i eat breakfast | 15:03 |
dirk | so how did the "ubuntu issue" get solved? | 15:05 |
*** dpawlik has quit IRC | 15:07 | |
*** bhavikdbavishi has joined #openstack-infra | 15:10 | |
openstackgerrit | Alex Schultz proposed openstack/project-config master: Retire tempest-tripleo-ui https://review.opendev.org/667949 | 15:12 |
*** hamzy has quit IRC | 15:13 | |
*** bhavikdbavishi has quit IRC | 15:14 | |
*** pcaruana has quit IRC | 15:14 | |
*** bhavikdbavishi has joined #openstack-infra | 15:15 | |
*** dpawlik has joined #openstack-infra | 15:17 | |
fungi | dirk: i think by marking it non-voting? | 15:19 |
openstackgerrit | Alex Schultz proposed openstack/project-config master: Retire tempest-tripleo-ui https://review.opendev.org/667949 | 15:19 |
fungi | so "solved" | 15:20 |
fungi | or at least that was where things were headed when i passed out last night | 15:20 |
*** dpawlik has quit IRC | 15:22 | |
fungi | infra-root: mirror.iad.rax.opendev.org is offline again. going to do an openstack server start on it | 15:22 |
openstackgerrit | Sean McGinnis proposed opendev/yaml2ical master: Add DTSTAMP and UID values to meeting instances https://review.opendev.org/667961 | 15:23 |
fungi | status: SHUTOFF | 15:23 |
fungi | same as before | 15:23 |
fungi | looks like it died at 2019-06-27T15:21:46Z according to the "updated" field? if so has only been offline for a couple minutes | 15:24 |
*** udesale has joined #openstack-infra | 15:25 | |
*** kopecmartin is now known as kopecmartin|off | 15:28 | |
fungi | #status log mirror.iad.rax.opendev.org started again at 15:24z after mysteriously entering shutoff state some time after 15:11z | 15:28 |
openstackstatus | fungi: finished logging | 15:28 |
fungi | there was another page allocation failure for apache2 at 15:07:57 | 15:29 |
dirk | fungi: hmm, great "solution" | 15:29 |
openstackgerrit | Dirk Mueller proposed openstack/diskimage-builder master: DNM: Enable nodepool testing for opensuse 15.1 https://review.opendev.org/667537 | 15:29 |
corvus | dirk: i think it's a fine solution to a pre-existing problem | 15:29 |
openstackgerrit | Sean McGinnis proposed opendev/yaml2ical master: Add DTSTAMP and UID values to meeting instances https://review.opendev.org/667961 | 15:30 |
corvus | i didn't break it, i just discovered that it was already broken | 15:30 |
fungi | yep, separate troubleshooting suspected problems with distro images from the job | 15:30 |
corvus | that discovery does not obligate me to fix it | 15:30 |
*** portdirect has quit IRC | 15:30 | |
openstackgerrit | Dirk Mueller proposed openstack/diskimage-builder master: DNM: Enable nodepool testing for opensuse 15.1 https://review.opendev.org/667537 | 15:30 |
fungi | as long as the job works on at least one platform that's enough to start with | 15:31 |
zul | hmm...i think we are having a problem merging code in x/browbeat im not sure whats going on with this one: https://review.opendev.org/#/c/661624/5 | 15:31 |
*** portdirect has joined #openstack-infra | 15:31 | |
dirk | fungi: corvus: sorry, didn't imply that, my apologies | 15:31 |
fungi | yeah, by '"solved"' i meant it was solved for the job by not worrying about trying to test with broken images | 15:31 |
fungi | but didn't solve the broken images themselves | 15:32 |
*** roman_g has joined #openstack-infra | 15:32 | |
smcginnis | zul: It's possible it got lost in a restart. Try doing a recheck and see if that gets it moving. | 15:32 |
dirk | are we sure the image is broken? | 15:32 |
zul | smcginnis: ack thanks | 15:32 |
dirk | fungi: http://logs.openstack.org/37/667537/10/check/dib-nodepool-functional-openstack-opensuse-150-src/6afaba4/nodepool/nodepool.log | 15:32 |
corvus | zul: we restarted zuul shortly after that because of a memory leak... so yeah, what smcginnis said, sorry | 15:32 |
dirk | fungi: that just means it cannot ssh to the node? does it mean there is a firewall in between potentially? | 15:33 |
zul | no worries | 15:33 |
fungi | dirk: a firewall in between the nodepool and the bionic instance but not between nodepool and the xenial instance? | 15:33 |
fungi | i suppose that's possible but it would be slightly surprising | 15:34 |
auristor | fungi: was a panic logged to the console at the time of the mirror.iad.rax.opendev.org shutdown? | 15:34 |
AJaeger | zul: it's part of a stacked change, the change below it is not approved | 15:34 |
AJaeger | zul: see "Related Changes" | 15:34 |
*** chandankumar is now known as raukadah | 15:34 | |
AJaeger | zul: https://review.opendev.org/#/c/661620/4 needs approving first | 15:35 |
zul | doh ok | 15:35 |
fungi | auristor: i haven't gotten that far yet. we did catch a cachefs problem coincident with the previous shutdown, and ianw posted about it at https://www.redhat.com/archives/linux-cachefs/2019-June/msg00009.html | 15:35 |
fungi | er, fscache | 15:35 |
zul | AJaeger: is there a way to unstack the changes | 15:35 |
fungi | zul: rebase it removing the parent from its history | 15:36 |
auristor | fungi: understood about the prior crash. The more panic stack traces the better so that dhowells can figure out exactly what is going wrong. | 15:36 |
AJaeger | zul: question is whether those are related and need to be stacked - if not, follow fungi's advice | 15:36 |
AJaeger | config-core, could you put these small cleanups on your review queue, please? https://review.opendev.org/667900 and https://review.opendev.org/667357 | 15:37 |
dirk | fungi: I guess I'm talking about a different thing than you are. let me try to explain | 15:39 |
dirk | fungi: what I see is that nodepool tries to launch that test image. and it does so by booting it and then ssh'ing it | 15:39 |
dirk | and after 10 minutes it says "timeout waiting on port 22" | 15:39 |
clarkb | infra-root https://review.opendev.org/#/c/667759/ is ready to restore gitea06 to the haproxy cluster | 15:39 |
dirk | in the console log we can see it takes about 2 minutes to boot | 15:39 |
fungi | dirk: yes, and that problem only seems to be present for some of the images, right? | 15:39 |
dirk | so I wonder if it just does a ssh while the port is not yet open, and it doesn't get a connection refused but the packet just gets dropped for some reason | 15:40 |
fungi | e.g., the ubuntu-xenial images work when booted | 15:40 |
*** hamzy has joined #openstack-infra | 15:40 | |
dirk | and then when openssh comes up, it doesn't retry | 15:40 |
fungi | yes, possible there's some sort of timing/race problem, i agree | 15:40 |
fungi | a firewall blocking one and not the other seems less likely, though not outside the realm of possibility | 15:41 |
*** factor has quit IRC | 15:41 | |
*** factor has joined #openstack-infra | 15:41 | |
corvus | dirk: a console log from a node in ovh-bhs1: http://paste.openstack.org/show/753512/ | 15:42 |
clarkb | dirk: fungi there is a good chance that this is the same glean runs after network is configured problem? | 15:42 |
clarkb | If the unit deps are wrong on one distro I wouldn't be surprised if other distros are affected | 15:42 |
clarkb | One way we could probably confirm that is to build hte image with cloud init instead | 15:43 |
clarkb | ya that log says failed to start glean for eth0 | 15:43 |
clarkb | so that is the likely cause | 15:43 |
corvus | dirk: it retries every 2 seconds for 10 minutes. | 15:43 |
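The retry behavior corvus describes (poll port 22 every 2 seconds for up to 10 minutes) can be sketched roughly as follows; this is an illustration of the polling pattern, not nodepool's actual implementation:

```python
import socket
import time


def wait_for_port(host, port, timeout=600, interval=2):
    """Poll until a TCP connection to host:port succeeds.

    Returns True once the port accepts a connection, False if the
    overall timeout expires first. Connection-refused and dropped
    packets look the same to the caller: both just keep it waiting.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out; sleep and retry until the deadline.
            time.sleep(interval)
    return False
```

Note that a firewall silently dropping packets and a service that is simply not up yet are indistinguishable here, which is why the console log is needed to tell the two apart.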
*** mattw4 has joined #openstack-infra | 15:43 | |
dirk | corvus: what?! | 15:43 |
dirk | this looks totally different | 15:43 |
dirk | especially I like the part | 15:43 |
corvus | clarkb: not sure if you saw earlier, but a console log from a test run looked similar on opensuse, with glean running and configuring NetworkManager | 15:44 |
corvus | clarkb: i should say, that the opensuse and ubuntu test runs look similar | 15:44 |
dirk | that glean fails to start and then wicked configures network | 15:44 |
fungi | clarkb: yes, could be that the common thread is systemd? | 15:44 |
corvus | ianw will be shocked to hear that | 15:44 |
dirk | any problem starts with systemd | 15:44 |
dirk | lets agree on that | 15:44 |
fungi | heh | 15:44 |
clarkb | dirk: ya looking at the most recent paste I'm guessing it falls back to dhcp then "just works" | 15:44 |
clarkb | whereas for whatever reason in testing the fallback isn't happening | 15:45 |
dirk | well, in the paste it doesn't use networkmanager, it uses wicked | 15:45 |
dirk | and glean fails to start, and wicked then configures networking | 15:45 |
dirk | I guess via dhcp | 15:45 |
clarkb | ah | 15:45 |
clarkb | ya dhcp fallback is my guess too | 15:45 |
dirk | in the nodepool job glean claims it did everything and detected networkmanager, but then probably didn't do anything | 15:46 |
dirk | I see no wicked mentioned for example | 15:46 |
corvus | is it possible that in the old devstack job we had dhcp available, but not in the new one? | 15:46 |
dirk | ah, scratch that, there is wicked | 15:46 |
clarkb | do we set DIB_SIMPLE_INIT_NETWORKMANAGER: '1' for all distros? | 15:46 |
clarkb | the test template may be over eager with that flag? it should only be set for centos and fedora | 15:47 |
dirk | corvus: can you poke at the node and get the error on why glean is failing? | 15:47 |
corvus | clarkb: i think you're right | 15:47 |
fungi | auristor: bingo! very similar-looking panic over netconsole... FS-Cache: Assertion failed 4 == 5 is false | 15:48 |
fungi | i'll get this copied out | 15:48 |
fungi | ianw: ^ | 15:48 |
auristor | thanks | 15:48 |
corvus | dirk: the production node? | 15:48 |
dirk | yep | 15:48 |
clarkb | I think we should remove network manager = 1 from distros that are not centos and fedora | 15:48 |
dirk | it looks like it has more problems to investigate | 15:48 |
corvus | should be able to, 1 sec | 15:49 |
*** icarusfactor has joined #openstack-infra | 15:49 | |
clarkb | and see if that makes ssh testing more reliable. If it does then that gives us a good tool to test fixes for glean network manager support on other platforms | 15:49 |
corvus | er, where would glean log? | 15:50 |
*** lucasagomes has quit IRC | 15:50 | |
*** tdasilva has joined #openstack-infra | 15:50 | |
dirk | journalctl | 15:50 |
*** factor has quit IRC | 15:50 | |
dirk | or systemctl status | 15:50 |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Remove job legacy-puppet-beaker-rspec https://review.opendev.org/667357 | 15:51 |
corvus | Unit glean.service could not be found. | 15:51 |
corvus | "journalctl | grep glean" produces: http://paste.openstack.org/show/753513/ | 15:52 |
corvus | "journalctl -u glean@eth0" produces: http://paste.openstack.org/show/753514/ | 15:53 |
corvus | that one might have a clue in it | 15:53 |
*** gfidente has quit IRC | 15:55 | |
*** ccamacho has quit IRC | 15:55 | |
dirk | so glean tries networkmanager which isn't there and then gives up | 15:56 |
dirk | which allows wicked to take over and finish the configuration | 15:56 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Switch functional testing to a devstack consumer job https://review.opendev.org/665023 | 15:58 |
*** yboaron_ has quit IRC | 15:59 | |
tosky | for the record, another rax mirroring issue (apparently): http://logs.openstack.org/87/651487/4/gate/sahara-tests-scenario/d55b212/job-output.txt.gz#_2019-06-27_15_24_13_041410 | 15:59 |
openstackgerrit | James E. Blair proposed openstack/diskimage-builder master: Replace nodepool func jobs https://review.opendev.org/667221 | 16:00 |
corvus | clarkb, fungi, dirk: ^ that should unwind the dib networkmanager flag | 16:00 |
dirk | ah | 16:01 |
*** sgw has quit IRC | 16:01 | |
dirk | so we used to set DIB_SIMPLE_INIT_NETWORKMANAGER | 16:01 |
*** e0ne has quit IRC | 16:02 | |
dirk | I missed that. that explains why the ci log shows that it tries networkmanager | 16:02 |
dirk | which isn't actually installed and not running | 16:02 |
*** jangutter has quit IRC | 16:02 | |
dirk | so I think one of the bugs is that nothing installs networkmanager when networkmanager is selected | 16:02 |
clarkb | ya and we don't actually want to use network manager on these non red hat distros. Red hat is defaulting to network manager (and removing the alternatives) which is why we've switched there | 16:03 |
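The fix clarkb and corvus converge on amounts to making the flag conditional per distro. A hypothetical Zuul job fragment (job names and layout are illustrative; the variable name is the one discussed above):

```yaml
# hypothetical job variants; only Red Hat family images set the flag,
# since those distros default to NetworkManager
- job:
    name: dib-nodepool-functional-openstack-centos-7-src
    vars:
      DIB_SIMPLE_INIT_NETWORKMANAGER: '1'

- job:
    name: dib-nodepool-functional-openstack-opensuse-150-src
    # no DIB_SIMPLE_INIT_NETWORKMANAGER here: opensuse uses wicked
```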
*** jcoufal has joined #openstack-infra | 16:05 | |
fungi | tosky: yep, see https://wiki.openstack.org/wiki/Infrastructure_Status | 16:05 |
fungi | tosky: with basically the same kernel panic as https://www.redhat.com/archives/linux-cachefs/2019-June/msg00009.html | 16:05 |
*** bobh has quit IRC | 16:06 | |
fungi | working to extract it from the netconsole screen currently | 16:06 |
*** sshnaidm|ruck is now known as sshnaidm|off | 16:06 | |
tosky | oook | 16:06 |
openstackgerrit | Dirk Mueller proposed openstack/diskimage-builder master: DNM: Enable nodepool testing for opensuse 15.1 https://review.opendev.org/667537 | 16:08 |
openstackgerrit | Dirk Mueller proposed openstack/diskimage-builder master: DNM: Enable nodepool testing for opensuse 15.1 https://review.opendev.org/667537 | 16:09 |
*** igordc has joined #openstack-infra | 16:11 | |
*** Lucas_Gray has quit IRC | 16:16 | |
*** xek__ has joined #openstack-infra | 16:20 | |
*** sgw has joined #openstack-infra | 16:20 | |
*** jpich has quit IRC | 16:20 | |
clarkb | fungi: can you take a look at https://review.opendev.org/#/c/667759/ so that I can finish up the gitea06 redeployment? | 16:21 |
*** jpena|mtg is now known as jpena|off | 16:21 | |
clarkb | I also want to update the db backups to only backup the gitea db as well as delete the old server and write up a doc update on what the process is for deploying a new gitea | 16:21 |
*** xek_ has quit IRC | 16:23 | |
*** pgaxatte has quit IRC | 16:25 | |
*** Lucas_Gray has joined #openstack-infra | 16:26 | |
*** mattw4 has quit IRC | 16:26 | |
*** panda has quit IRC | 16:32 | |
*** adriancz has quit IRC | 16:33 | |
clarkb | fungi: thanks! | 16:35 |
*** panda has joined #openstack-infra | 16:36 | |
yoctozepto | mirror.iad.rax.opendev.org at it again: http://logs.openstack.org/68/666068/7/check/kolla-build-ubuntu-source/7d573f8/logs/build/000_FAILED_kube-controller-manager.txt.gz | 16:36 |
yoctozepto | is there a chance we get out of this alpha testing :D | 16:37 |
yoctozepto | we have a multitude of failures due to this one from today only :-( | 16:37 |
yoctozepto | not to mention the past weeks | 16:38 |
fungi | yoctozepto: can you provide a log with timestamps? | 16:38 |
fungi | logs with no timestamp are not very useful for correlation | 16:38 |
yoctozepto | fungi: http://logs.openstack.org/68/666068/7/check/kolla-build-ubuntu-source/7d573f8/logs/build/?C=M;O=D | 16:39 |
yoctozepto | the failure is at the bottom so modtime is almost perfect | 16:39 |
yoctozepto | also, what timezone is it? | 16:39 |
fungi | utc. always utc | 16:39 |
yoctozepto | but it looks like -1 | 16:39 |
yoctozepto | because it ended just now | 16:40 |
yoctozepto | (well, almost 15 mins ago) | 16:40 |
fungi | yoctozepto: proper timestamps are available for those messages in the console log, looks like: http://logs.openstack.org/68/666068/7/check/kolla-build-ubuntu-source/7d573f8/job-output.txt.gz#_2019-06-27_15_15_27_948921 | 16:40 |
clarkb | according to the job output log (which has timestamps in it) it ended an hour and 15 minutes ago | 16:40 |
fungi | yoctozepto: did it simply not report because there were other longer running builds in the same buildset? | 16:41 |
fungi | anyway, yes, 15:15z falls in the 15:11 to 15:24 timeframe where the server suffered a kernel panic and entered shutoff state and then i booted it and got kafs set back up | 16:42 |
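The correlation fungi performs here (job timestamps are naive but always UTC, so compare them directly against the outage window) can be sketched like this, using the times from the log above:

```python
from datetime import datetime, timezone


def parse_job_ts(ts):
    """Parse a Zuul job-output timestamp (naive, always UTC) into an
    aware datetime so it can be compared against other UTC times."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f").replace(
        tzinfo=timezone.utc)


# Did the failure fall inside the known mirror outage window?
failure = parse_job_ts("2019-06-27 15:15:27.948921")
outage_start = datetime(2019, 6, 27, 15, 11, tzinfo=timezone.utc)
outage_end = datetime(2019, 6, 27, 15, 24, tzinfo=timezone.utc)
in_outage = outage_start <= failure <= outage_end
```

Because job logs never carry a timezone, forcing everything into aware UTC datetimes avoids the "looks like -1" confusion that comes from comparing against local file modification times.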
fungi | https://wiki.openstack.org/wiki/Infrastructure_Status | 16:42 |
fungi | currently working to get the panic details attached to a reply on the https://www.redhat.com/archives/linux-cachefs/2019-June/msg00009.html thread | 16:42 |
yoctozepto | fungi: thanks | 16:43 |
yoctozepto | I keep my fingers crossed for you, guys | 16:43 |
openstackgerrit | Merged opendev/system-config master: Put gitea06 back in the rotation https://review.opendev.org/667759 | 16:43 |
*** whoami-rajat has quit IRC | 16:44 | |
corvus | dirk: https://review.opendev.org/667225 is very promising | 16:46 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Only backup the gitea database on gitea hosts https://review.opendev.org/667986 | 16:54 |
*** udesale has quit IRC | 16:54 | |
clarkb | corvus: ^ that should simplify db restores in the future | 16:55 |
mnaser | http://mirror.iad.rax.opendev.org/centos/7/os/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 403 - Forbidden | 16:55 |
mnaser | had a job fail from that | 16:55 |
fungi | mnaser: yes, timestamp please? | 16:55 |
mnaser | 2019-06-27 15:25:08.583792 | centos-7 | http://mirror.iad.rax.opendev.org/centos/7/os/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 403 - Forbidden | 16:55 |
fungi | https://wiki.openstack.org/wiki/Infrastructure_Status | 16:56 |
mnaser | o-ok i guess we caught it going up | 16:56 |
mnaser | sorry for noise | 16:56 |
fungi | looks like that was just as i was bringing the server back online | 16:56 |
fungi | thanks for confirming! | 16:56 |
fungi | just wanted to make sure it wasn't a new failure | 16:56 |
fungi | we did manage to get another fscache-related kernel panic out of it, so working through ml subscription to be able to post it | 16:57 |
corvus | clarkb, fungi, dirk: omg i have never seen so much green: https://review.opendev.org/665023 https://review.opendev.org/667221 https://review.opendev.org/667225 are all green -- 54 passing jobs, one non-voting failure (gentoo) | 16:58 |
clarkb | \o/ | 16:58 |
fungi | the checks are always greener on the other tenant? | 16:59 |
*** icarusfactor has quit IRC | 16:59 | |
*** jcoufal_ has joined #openstack-infra | 16:59 | |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs https://review.opendev.org/667156 | 17:00 |
*** jcoufal has quit IRC | 17:02 | |
*** xek_ has joined #openstack-infra | 17:02 | |
*** ramishra has quit IRC | 17:03 | |
*** kaiokmo has quit IRC | 17:04 | |
*** xek__ has quit IRC | 17:05 | |
*** whoami-rajat has joined #openstack-infra | 17:06 | |
*** hamzy has quit IRC | 17:07 | |
*** kaiokmo has joined #openstack-infra | 17:07 | |
yoctozepto | fungi: older with timestamps if you are interested | 17:11 |
yoctozepto | 2019-06-27 08:22:01 | INFO:kolla.common.utils.sahara-api:http://mirror.iad.rax.opendev.org:8080/rdo/centos7/bf/3f/bf3f0b67e44128b44b56d9525f9308d6d983da55_10e135ca/repodata/repomd.xml: [Errno 12] Timeout on http://mirror.iad.rax.opendev.org:8080/rdo/centos7/bf/3f/bf3f0b67e44128b44b56d9525f9308d6d983da55_10e135ca/repodata/repomd.xml: (28, 'Connection timed out after 30001 milliseconds') | 17:11 |
*** ricolin has joined #openstack-infra | 17:12 | |
*** hamzy has joined #openstack-infra | 17:12 | |
yoctozepto | 2019-06-27 12:48 | INFO:kolla.common.utils.ceph-nfs:E: Failed to fetch http://mirror.iad.rax.opendev.org/ubuntu/pool/universe/d/daemon/daemon_0.6.4-1build1_amd64.deb 404 Not Found [IP: 104.130.4.160 80] | 17:13 |
fungi | yoctozepto: thanks, 08:22:01 was the previous outage (ianw brought it back online around 09:46z but didn't #status log it) | 17:15 |
yoctozepto | fungi: thanks, so 12:48 remains, I'm looking for more | 17:16 |
fungi | the 12:48z failure looks unrelated though. the server was up at the time so either that file didn't exist or we still have the occasional stale cache problem | 17:17 |
fungi | i'll check the mirror-update logs to see if reprepro updated daemon to 0.6.4-1build1 today | 17:17 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs https://review.opendev.org/667156 | 17:19 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs https://review.opendev.org/667156 | 17:21 |
dirk | corvus: yep, so clarkb was right, networkmanager | 17:22 |
*** weifan has joined #openstack-infra | 17:22 | |
corvus | yep, sorry i missed that. thanks clarkb :) | 17:23 |
fungi | or as i like to call it, networkmangler | 17:23 |
* fungi is actually quite a fan of it on his portable machines, it has an excellent cli | 17:23 | |
dirk | https://review.opendev.org/#/c/667537/ - this is pretty green | 17:24 |
fungi | nmcli is extremely flexible. but i still don't use nm on my servers, workstation, appliances or similar stationary devices | 17:24 |
clarkb | we always called it notworkmanager but ya I find it very useful on machines that move around | 17:24 |
yoctozepto | 2019-06-27 09:33:33.021793 | primary | http://mirror.iad.rax.opendev.org/centos/7/os/x86_64/repodata/repomd.xml: [Errno 12] Timeout on http://mirror.iad.rax.opendev.org/centos/7/os/x86_64/repodata/repomd.xml: (28, 'Connection timed out after 30001 milliseconds') | 17:26 |
*** kaisers has quit IRC | 17:27 | |
openstackgerrit | Dirk Mueller proposed openstack/diskimage-builder master: Enable nodepool testing for opensuse 15.1 https://review.opendev.org/667537 | 17:28 |
yoctozepto | 2019-06-27 09:41:10.112334 | primary | http://mirror.iad.rax.opendev.org/centos/7/os/x86_64/repodata/repomd.xml: [Errno 14] HTTP Error 403 - Forbidden | 17:28 |
yoctozepto | fungi: that seems all sans close-in-time duplicates | 17:29 |
openstackgerrit | Dirk Mueller proposed zuul/nodepool master: Add Python 3.7 testing https://review.opendev.org/667720 | 17:29 |
fungi | yoctozepto: yeah, 09:33 and 09:41 are shortly before ianw got the server back online earlier today | 17:29 |
*** rlandy is now known as rlandy|brb | 17:29 | |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Reparent nodepool-functional-openstack-src https://review.opendev.org/667995 | 17:29 |
clarkb | haproxy config has gitea06 in it now. But service need a restart to go into effect | 17:30 |
*** diablo_rojo has joined #openstack-infra | 17:30 | |
yoctozepto | <fungi> the 12:48z failure looks unrelated though. the server was up at the time so either that file didn't exist or we still have the occasional stale cache problem | 17:30 |
yoctozepto | <fungi> i'll check the mirror-update logs to see if reprepro updated daemon to 0.6.4-1build1 tod | 17:30 |
fungi | i think my messages to the linux-cachefs ml are getting greylisted. i tried to confirm my subscription by reply and never got a welcome message so ended up confirming via http instead. now my post to the ml isn't showing up after some minutes | 17:31 |
yoctozepto | did you find anything? | 17:31 |
*** kaisers has joined #openstack-infra | 17:31 | |
fungi | yoctozepto: haven't gotten to it yet. doing too many things at once | 17:31 |
yoctozepto | fungi: ok, no problem | 17:31 |
fungi | was trying to get the kernel panic details off to upstream first | 17:31 |
*** lmiccini has quit IRC | 17:32 | |
yoctozepto | fungi: proper priority I guess :-) | 17:33 |
fungi | ahh, yep, my mta logs confirm their listserv is greylisting my deliveries | 17:33 |
yoctozepto | eh :-( | 17:33 |
*** ralonsoh has quit IRC | 17:35 | |
clarkb | corvus: are those job changes for nodepool and friends ready for review now? | 17:35 |
corvus | clarkb: yes, everything in topic:nodepool-func without a -1 is ready | 17:36 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Reparent nodepool-functional-openstack-src https://review.opendev.org/667995 | 17:37 |
corvus | and that was just a git parenting error | 17:37 |
*** lmiccini has joined #openstack-infra | 17:39 | |
*** mattw4 has joined #openstack-infra | 17:40 | |
fungi | yoctozepto: yep, that package was pulled in by the reprepro run whose subsequent vos release completed at 2018-04-04T00:32:22, but what's especially bizarre is that the mirror server was rebooted between that content update and the 404 request for the file it provided. looks like it's retrievable now though | 17:41 |
fungi | oh, i should have looked at the date. wow that's more than a year ago | 17:41 |
*** Lucas_Gray has quit IRC | 17:42 | |
*** _erlon_ has quit IRC | 17:42 | |
fungi | okay, so apache gave a 404 on a file which should have been in the mirror for over a year | 17:42 |
fungi | that doesn't sound like a stale (negative) cache to me | 17:43 |
openstackgerrit | Dirk Mueller proposed zuul/nodepool master: Add Python 3.7 testing https://review.opendev.org/667720 | 17:43 |
*** xek_ has quit IRC | 17:43 | |
yoctozepto | fungi: thanks for checking; you have any ideas? | 17:44 |
fungi | i'm checking system and apache logs on the mirror from that timeframe now | 17:45 |
yoctozepto | fungi: I can give you more files from that period | 17:45 |
*** rlandy|brb is now known as rlandy | 17:45 | |
yoctozepto | just you watch | 17:45 |
fungi | Jun 27 12:48:07 mirror01 kernel: [11006.612860] kAFS: Volume 536870950 'mirror.ubuntu' is offline | 17:45 |
fungi | that was in syslog | 17:45 |
fungi | ianw: ^ | 17:45 |
yoctozepto | ok, so I don't have to :D | 17:45 |
corvus | dirk: regarding 667896 -- how does the new job run on opensuse without zookeeper? | 17:46 |
yoctozepto | (all were ubuntu) | 17:46 |
clarkb | corvus: comment on https://review.opendev.org/#/c/665023/47 but I approved it | 17:47 |
fungi | we saw similar log entries at 07:10:12 for mirror.opensuse, 16:49:07 for mirror.ubuntu and 17:09:05 for mirror.opensuse | 17:47 |
fungi | i'll see if these may correspond to anything on the afs source side | 17:47 |
*** ociuhandu has quit IRC | 17:48 | |
*** bhavikdbavishi has quit IRC | 17:48 | |
yoctozepto | fungi: ok, we (fortunately it seems) do not build for opensuse (seems it did not get enough attention) | 17:49 |
fungi | there was a vos release of mirror.ubuntu from 16:48:48 to 16:49:23 | 17:50 |
*** mattw4 has quit IRC | 17:50 | |
corvus | clarkb: agreed; there's lots to improve in the new job; it's a pretty coarse first cut. | 17:50 |
fungi | so corresponding to the 16:49:07 offline error for it | 17:50 |
*** ykarel|away has quit IRC | 17:51 | |
fungi | also a vos release of mirror.ubuntu between 12:47:50 and 12:49:00 corresponding with the 12:48:07 offline error | 17:52 |
openstackgerrit | Dirk Mueller proposed zuul/zuul master: Add Python 3.7 testing https://review.opendev.org/668006 | 17:52 |
fungi | we only do those every couple hours, so that's a mighty strong coincidence | 17:52 |
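The correlation fungi describes can be checked mechanically by extracting the kAFS offline events and comparing their timestamps against the vos release windows. A rough sketch (the sample line is copied from the syslog excerpt above; the real log path and format are assumptions):

```shell
# Pull "volume offline" events out of syslog so their timestamps can be
# lined up against vos release start/end times (paths here are assumed;
# on the mirror this would read /var/log/syslog instead of a sample file).
sample="Jun 27 12:48:07 mirror01 kernel: [11006.612860] kAFS: Volume 536870950 'mirror.ubuntu' is offline"
printf '%s\n' "$sample" > /tmp/syslog.sample
events=$(grep -o "kAFS: Volume [0-9]* '[^']*' is offline" /tmp/syslog.sample)
echo "$events"
```

On the real server, the matching lines for mirror.ubuntu and mirror.opensuse fell inside the vos release windows noted above, which is what makes the coincidence so strong.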
*** electrofelix has quit IRC | 17:53 | |
*** ociuhandu has joined #openstack-infra | 17:53 | |
*** tdasilva has quit IRC | 17:53 | |
dirk | corvus: yep, we don't need this change anymore. I was trying to run the opensuse 15.1 test on opensuse nodes but I couldn't get it to work due to nodepool not working | 17:54 |
dirk | corvus: I'll see what I can do to add zookeeper, that will take some time though | 17:54 |
*** pcaruana has joined #openstack-infra | 17:54 | |
dirk | corvus: could you review https://review.opendev.org/#/c/667537/ ? | 17:54 |
fungi | as for mirror.opensuse there were vos release calls 17:07:20-17:10:28 (the 17:09:05 offline error) and 07:08:24-07:12:08 (the 07:10:12 offline error) | 17:55 |
fungi | yeah, so basically kafs is sometimes deciding volumes are offline when there is a vos release underway | 17:56 |
fungi | that's... not great | 17:56 |
corvus | dirk: the jobs part of that looks fine, i don't know the first thing about the simple-init element, so we'll want to make sure to get another reviewer who does. you want to take the "DNM" off of that and i'll leave a vote? | 17:57 |
corvus | dirk: and that shouldn't depend on the py37 change | 17:57 |
corvus | oh you did just take the dnm off, sorry :) | 17:57 |
corvus | dirk: so let's drop the depends-on, then i think it'll be good | 17:58 |
*** ociuhandu has quit IRC | 17:58 | |
dirk | corvus: it just ensures that wicked-service is installed. it should be a noop change | 17:59 |
dirk | as it is usually pulled in via some other dependency. but it's better to be explicit about which network scripts service we want in the suse flavor | 17:59 |
openstackgerrit | Dirk Mueller proposed openstack/diskimage-builder master: Enable nodepool testing for opensuse 15.1 https://review.opendev.org/667537 | 18:00 |
yoctozepto | <fungi> yeah, so basically kafs is sometimes deciding volumes are offline when there is a vos release underway | 18:00 |
yoctozepto | hopefully you get all those issues reported and someone fixes them sooner rather than later | 18:00 |
yoctozepto | we rely on mirrors a lot | 18:00 |
clarkb | we do have the openafs fallback, but kafs upstream has also been receptive so far | 18:02 |
clarkb | so we'll want to balance that I expect | 18:02 |
corvus | fungi: perhaps it's not correctly trying another fileserver | 18:04 |
*** raissa has joined #openstack-infra | 18:05 | |
fungi | corvus: that may be. it doesn't seem to log that level of detail currently anywhere i'm finding, but ianw may have some additional ideas once he's awake | 18:07 |
*** jcoufal_ has quit IRC | 18:08 | |
*** ricolin has quit IRC | 18:09 | |
fungi | yay, they only greylisted my post for ~45 minutes: https://www.redhat.com/archives/linux-cachefs/2019-June/msg00010.html | 18:09 |
*** mattw4 has joined #openstack-infra | 18:12 | |
*** mattw4 has quit IRC | 18:17 | |
*** raissa has quit IRC | 18:23 | |
*** raissa has joined #openstack-infra | 18:23 | |
*** raissa has quit IRC | 18:23 | |
*** hamzy has quit IRC | 18:24 | |
openstackgerrit | Brian Haley proposed opendev/irc-meetings master: Create a meeting for Networking OVN project https://review.opendev.org/668013 | 18:24 |
*** mattw4 has joined #openstack-infra | 18:24 | |
*** hamzy has joined #openstack-infra | 18:25 | |
*** hamzy has quit IRC | 18:30 | |
*** mattw4 has quit IRC | 18:30 | |
fungi | ianw: i've attempted to encapsulate today's revelations in https://etherpad.openstack.org/p/opendev-mirror-afs | 18:37 |
*** weifan has quit IRC | 18:38 | |
corvus | fungi: wow, that greylisting time is ... generous | 18:39 |
fungi | indeed | 18:39 |
fungi | their greylist 451 at least says they're greylisting you, but incorrectly suggests you return in 5 minutes | 18:40 |
fungi | (also, how is some mta supposed to actually parse that out of the 451 response and know to adjust its requeue time?) | 18:41 |
yoctozepto | fungi: maybe it's a trap - if you do, you are not a legitimate MTA! | 18:44 |
fungi | heh, wouldn't be the strangest (or worst) spam mitigation strategy i've ever seen | 18:45 |
yoctozepto | ;D | 18:46 |
corvus | i usually saw it as a courtesy message to say "dear admin of remote system being greylisted, i am glad you are a real person who is examining their mta logs to determine why a message was not delivered; perhaps you are doing so at the request of a user who is confused. be at peace and understand that all will be well in a mere %i minutes." | 18:46 |
corvus | of course, sending the wrong time is just rude. | 18:47 |
yoctozepto | corvus: they should use just the phrase you invented | 18:47 |
yoctozepto | would be cool | 18:47 |
corvus | yoctozepto: yeah, we usually just get "Printer on fire?" | 18:48 |
yoctozepto | corvus: yeah, unfortunately :-( | 18:49 |
fungi | halt and pc load letter | 18:49 |
*** diablo_rojo has quit IRC | 18:50 | |
openstackgerrit | Clint 'SpamapS' Byrum proposed zuul/zuul-jobs master: Revert "install-nodejs: add support for RPM-based OSes" https://review.opendev.org/668021 | 18:50 |
*** diablo_rojo has joined #openstack-infra | 18:51 | |
*** diablo_rojo_ has joined #openstack-infra | 18:51 | |
*** diablo_rojo_ has quit IRC | 18:51 | |
*** weifan has joined #openstack-infra | 19:18 | |
openstackgerrit | Merged zuul/zuul-jobs master: Revert "install-nodejs: add support for RPM-based OSes" https://review.opendev.org/668021 | 19:19 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: Add playbooks folder and debug zuul job https://review.opendev.org/668029 | 19:20 |
*** weifan has quit IRC | 19:31 | |
*** weifan has joined #openstack-infra | 19:31 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add docs for deploying a new gitea server https://review.opendev.org/668030 | 19:32 |
clarkb | infra-root ^ that should be fairly complete based on my notes/command history | 19:32 |
*** armstrong has joined #openstack-infra | 19:39 | |
clarkb | The remaining tasks are to clean up the old server and to restart haproxy. Any objection to me getting that done after lunch? | 19:39 |
fungi | #status log deleted https://github.com/openstack/tobiko at slaweq's request as a member of https://review.opendev.org/#/admin/groups/tobiko-core | 19:40 |
openstackstatus | fungi: finished logging | 19:40 |
fungi | slaweq: ^ sorry about the delay. busy day! | 19:40 |
slaweq | fungi: no problem, thx a lot | 19:41 |
fungi | glad to be of assistance | 19:41 |
*** kjackal has quit IRC | 19:48 | |
*** kjackal has joined #openstack-infra | 19:49 | |
*** weifan has quit IRC | 19:51 | |
*** weifan has joined #openstack-infra | 19:52 | |
*** witek has quit IRC | 19:53 | |
*** whoami-rajat has quit IRC | 19:54 | |
*** jtomasek has quit IRC | 19:55 | |
*** weifan has quit IRC | 19:56 | |
*** weifan has joined #openstack-infra | 19:59 | |
*** diablo_rojo has quit IRC | 19:59 | |
*** kjackal has quit IRC | 20:00 | |
fungi | okay, finally got around to reestablishing the netconsole stream from the opendev iad mirror since the last reboot | 20:04 |
fungi | gonna go grab some very late lunch now that the openstack release meeting is over | 20:04 |
clarkb | ok, food is consumed; any objection to running docker-compose restart on opendev.org to pick up the haproxy config update? | 20:14 |
*** Goneri has quit IRC | 20:15 | |
*** weifan has quit IRC | 20:16 | |
clarkb | maybe I go on a bike ride first | 20:19 |
corvus | fungi: dhowells just sent me some info about tracing to possibly capture info about the volume going offline; does this ring a bell? | 20:19 |
*** weifan has joined #openstack-infra | 20:20 | |
corvus | fungi: i updated the etherpad -- lines 21-31 https://etherpad.openstack.org/p/opendev-mirror-afs | 20:22 |
corvus | clarkb: lgtm (docs change and lb restart) | 20:24 |
*** weifan has quit IRC | 20:27 | |
*** weifan has joined #openstack-infra | 20:30 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Split job definitions into multiple files https://review.opendev.org/668040 | 20:36 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add a simple test of the install-nodejs role https://review.opendev.org/668043 | 20:43 |
*** weifan has quit IRC | 20:45 | |
openstackgerrit | Sean McGinnis proposed openstack/project-config master: Retire the release-schedule-generator project https://review.opendev.org/668045 | 20:45 |
*** weifan has joined #openstack-infra | 20:47 | |
*** weifan has quit IRC | 20:48 | |
*** weifan has joined #openstack-infra | 20:48 | |
*** jtomasek has joined #openstack-infra | 20:51 | |
*** weifan has quit IRC | 20:53 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: DNM: install-nodejs: add support for RPM-based OSes https://review.opendev.org/668048 | 20:53 |
*** weifan has joined #openstack-infra | 20:54 | |
*** raissa has joined #openstack-infra | 20:55 | |
*** eharney has quit IRC | 20:55 | |
openstackgerrit | Sean McGinnis proposed openstack/infra-manual master: Update .gitreview wording in retire process https://review.opendev.org/668049 | 20:56 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Move upload-git-mirror test job in-repo https://review.opendev.org/668050 | 21:01 |
*** pcaruana has quit IRC | 21:05 | |
*** Goneri has joined #openstack-infra | 21:09 | |
*** diablo_rojo has joined #openstack-infra | 21:10 | |
*** sgw has quit IRC | 21:12 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add a simple test of the install-nodejs role https://review.opendev.org/668043 | 21:13 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Move upload-git-mirror test job in-repo https://review.opendev.org/668050 | 21:13 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add zuul-jobs-tox-linters https://review.opendev.org/668052 | 21:13 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Move upload-git-mirror test job in-repo https://review.opendev.org/668050 | 21:16 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add zuul-jobs-tox-linters https://review.opendev.org/668052 | 21:16 |
*** tesseract has quit IRC | 21:19 | |
ianw | fungi: thanks for looking in on iad ... there's also another [Thu Jun 27 15:53:34 2019] kAFS: afs_dir_check_page(32f): bad magic 1/2 is 0000 which i've never seen before | 21:23 |
*** rcernin has joined #openstack-infra | 21:24 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Move upload-git-mirror test job in-repo https://review.opendev.org/668050 | 21:27 |
*** jtomasek has quit IRC | 21:30 | |
dirk | is the graphite01.opendev.org access public? I am looking for a readonly access to host my own grafana on top | 21:33 |
fungi | ianw: ooh, it's like an egg hunt. i hadn't spotted that yet either. pretty! | 21:37 |
fungi | corvus: thanks for the extra notes from dhowells! | 21:37 |
pabelanger | dirk: yes, but you are also able to create your own dashboards at: https://opendev.org/openstack/project-config/src/branch/master/grafana | 21:38 |
fungi | dirk: yes, entirely public and the recommended cname alias is graphite.opendev.org | 21:39 |
openstackgerrit | Merged zuul/nodepool master: Switch functional testing to a devstack consumer job https://review.opendev.org/665023 | 21:39 |
fungi | dirk: that way when we eventually rebuild the server you won't need to make config changes | 21:39 |
fungi | and as pabelanger says, general-purpose dashboards on grafana.o.o are also most welcome | 21:40 |
dirk | Yeah, that's why I am asking | 21:40 |
dirk | I would like to prepare a dashboard | 21:40 |
dirk | I can't figure that out by writing yaml | 21:41 |
cloudnull | qq - currently reading https://zuul-ci.org/docs/zuul/user/jobs.html - when using the base job, is "zuul.executor.src_root" the path where the source code is cloned? | 21:41 |
cloudnull | or is that "work_root" ? | 21:41 |
cloudnull | or am I just misunderstanding things. | 21:42 |
pabelanger | src_root is executor path, where git repos are | 21:42 |
dirk | fungi: thanks, where can I find the magic URL to use? | 21:42 |
pabelanger | work_root, is the executor side where things like log collection goes | 21:42 |
dirk | Just using / doesn't work | 21:43 |
openstackgerrit | Merged opendev/zone-opendev.org master: Revert "Temporarily remove AAAA for mirror01.iad.rax" https://review.opendev.org/667766 | 21:43 |
cloudnull | thanks pabelanger | 21:44 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add base integration roles https://review.opendev.org/668061 | 21:44 |
pabelanger | cloudnull: usually on node side it is {{ ansible_user_dir }}/{{ zuul.project.src_dir }} where the contents of git are cloned too | 21:44 |
pabelanger | but that is customizable via base jobs | 21:44 |
cloudnull | ah ha! | 21:44 |
*** mriedem is now known as mriedem_afk | 21:45 | |
cloudnull | pabelanger whats the default log collection path on the node side? | 21:46 |
corvus | ianw: i'm trying not to draw you away from the afs stuff too much, but when you have a second, https://review.opendev.org/667221 and https://review.opendev.org/667225 could use a +3 from you | 21:47 |
pabelanger | cloudnull: {{ ansible_user_dir }}/zuul-output IIRC | 21:48 |
pabelanger | let me confirm | 21:48 |
pabelanger | yah, ensure-output-dirs: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-output-dirs | 21:49 |
pabelanger | cloudnull: so, if you copy logs into that directory, on node, zuul will collect them in post-run base job | 21:49 |
cloudnull | excellent! | 21:51 |
cloudnull | thanks | 21:51 |
pabelanger | np | 21:51 |
corvus | pabelanger, cloudnull: specifically, "{{ ansible_user_dir }}/zuul-output/logs" i think (for logs -- and artifacts/ and docs/ otherwise) | 21:52 |
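The collection convention pabelanger and corvus describe can be sketched as a few lines run on the test node (directory names follow the ensure-output-dirs role linked above; the log file name here is made up):

```shell
# Stage job output where the base job's post-run playbook will collect it:
# zuul-output/{logs,artifacts,docs} under the job user's home directory.
OUT="${HOME}/zuul-output"
mkdir -p "$OUT/logs" "$OUT/artifacts" "$OUT/docs"
# Anything a job writes under logs/ gets swept up automatically in post-run
# (demo.log is a hypothetical example file).
echo "job output" > "$OUT/logs/demo.log"
```

A job task would typically copy its service logs into that logs/ directory rather than uploading anything itself.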
fungi | dirk: not entirely sure what you mean by "magic url" but i usually browse around the https://graphite.opendev.org/ to build queries and then copy the resulting image url to get the query | 21:54 |
cloudnull | thanks corvus! | 21:54 |
*** Goneri has quit IRC | 21:54 | |
fungi | dirk: also graphite publishes documentation which explains their query language in excruciating detail, if you need it | 21:56 |
dirk | fungi: i'm looking for the datasource "url" to access. I found that "browser" seems to work | 21:57 |
*** raissa has quit IRC | 21:58 | |
fungi | ahh | 21:58 |
clarkb | ok back and restarting haproxy | 21:58 |
clarkb | fungi: dirk I would look at the grafana config | 21:58 |
clarkb | since it pulls data from there as well | 21:58 |
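For a graphite datasource the URL is just the site root; individual queries go through Graphite's render API as query parameters. A sketch of building such a query URL (the metric path here is a made-up example, not a real opendev metric):

```shell
# Construct a Graphite render API URL; format=json is what dashboard
# datasources consume, format=png gives the image fungi mentions copying.
base="https://graphite.opendev.org"
target="stats.timers.zuul.example.mean"   # hypothetical metric path
url="${base}/render?target=${target}&from=-1h&format=json"
echo "$url"
```

Pointing an external Grafana at `https://graphite.opendev.org` as the datasource URL should be enough; the UI then issues render requests like the one above.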
*** raissa has joined #openstack-infra | 21:59 | |
*** raissa has quit IRC | 21:59 | |
clarkb | haproxy is restarted and seems to function | 21:59 |
*** raissa has joined #openstack-infra | 22:00 | |
*** raissa has quit IRC | 22:00 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add base integration roles https://review.opendev.org/668061 | 22:00 |
*** raissa has joined #openstack-infra | 22:00 | |
clarkb | and now I'm going to look at deleting the old server (and its volume, if that doesn't happen automatically) | 22:01 |
*** raissa has quit IRC | 22:01 | |
*** raissa has joined #openstack-infra | 22:01 | |
*** raissa has quit IRC | 22:02 | |
*** raissa has joined #openstack-infra | 22:02 | |
*** raissa has quit IRC | 22:02 | |
*** raissa has joined #openstack-infra | 22:03 | |
*** raissa has quit IRC | 22:03 | |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Move upload-git-mirror test job in-repo https://review.opendev.org/668050 | 22:03 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Add base integration roles https://review.opendev.org/668061 | 22:03 |
clarkb | ok the volume that was attached to old gitea06 doesn't seem to want to delete | 22:07 |
clarkb | mnaser: if you get a chance volume 7cd5e56d-8d16-4a5e-b0d6-a4b982c61e80 in sjc1 for user openstackci was the boot volume for the host that had a sad after a live migrate. I've finally managed to replace that server and delete the server but now the volume won't delete | 22:08 |
clarkb | mnaser: happy for that volume to be deleted but also happy to keep it around if you want to do further debugging with it | 22:08 |
*** raissa has joined #openstack-infra | 22:09 | |
*** raissa has quit IRC | 22:10 | |
*** raissa has joined #openstack-infra | 22:10 | |
ianw | corvus: thanks for working on it all! | 22:10 |
*** raissa has quit IRC | 22:10 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Only ping6 in launch node if ping6 is present https://review.opendev.org/668065 | 22:11 |
clarkb | That was a local edit I had to make to launch node to get it to work with our minimal images | 22:11 |
clarkb | #status log Gitea06 had a corrupted root disk around the time of the Denver summit. It has been replaced with a new server and added back to the haproxy config. | 22:12 |
openstackstatus | clarkb: finished logging | 22:12 |
ianw | clarkb: hrm, same area as the skipping i did with https://review.opendev.org/#/c/667548/ ... ping6 doesn't seem like an onerous requirement for our hosts though, it certainly has had a workout lately :) | 22:13 |
clarkb | ianw: ya I think we should add it to our images | 22:14 |
clarkb | but until we do it would be good for launch node to function too | 22:14 |
clarkb | yours is a bit more opt in which is probably a good thing | 22:14 |
clarkb | I'll abandon mine and review yours | 22:15 |
clarkb | (that way it is explicit that the checks aren't happening) | 22:15 |
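The guard both changes are circling amounts to a command lookup before the IPv6 check; a minimal sketch (the target address and variable names are hypothetical, not taken from the actual launch-node script):

```shell
# Only attempt the IPv6 reachability check when ping6 is actually installed,
# so launch-node still completes on minimal images that lack it.
ipv6_addr="::1"   # hypothetical target address
if command -v ping6 >/dev/null 2>&1; then
    ping6 -c 1 "$ipv6_addr" >/dev/null 2>&1 && status=reachable || status=unreachable
else
    status=skipped   # make the skip explicit rather than failing the launch
fi
echo "$status"
```

Making the skip explicit (rather than silent) matches clarkb's point above about it being clear when the checks aren't happening.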
*** armstrong has quit IRC | 22:19 | |
*** mattw4 has joined #openstack-infra | 22:23 | |
*** slaweq has quit IRC | 22:23 | |
*** pkopec is now known as pkopec|afk | 22:27 | |
clarkb | is there a way to link to a host in cacti? | 22:27 |
melwitt | are there any docs about how to use search in storyboard? | 22:27 |
*** mattw4 has quit IRC | 22:28 | |
*** mattw4 has joined #openstack-infra | 22:28 | |
*** ekultails has quit IRC | 22:28 | |
melwitt | I start typing stuff in the box and the spinny thing spins and I think I'm doing it wrong. I want to search within one project specifically | 22:29 |
clarkb | melwitt: I'm not finding search docs in storyboard's docs. But the way I think it wants to work is you type a term like nova, give it a second to find all the term types that match and select the one you want like project:nova | 22:30 |
clarkb | and you do that for the various terms you want; e.g. if you type in Clark it should give you my user as an option to further filter by | 22:31 |
clarkb | but it wants you to select those results out of the dropdown | 22:31 |
melwitt | ok, thank you. once I figure it out I'll try to propose a patch to the storyboard ui to add some basic instructions on the search page, if that's possible | 22:32 |
melwitt | I'm trying to check if someone reported a bug for something in osc first before I open a new one | 22:32 |
corvus | also, fwiw, i'm pretty sure the storyboard team is open to patches implementing a "lucene-like" query syntax (eg, like logstash, gerrit, etc) | 22:35 |
*** igordc has quit IRC | 22:36 | |
*** Goneri has joined #openstack-infra | 22:36 | |
melwitt | oh, I would love that myself. if I could figure out how to do it, I will propose it :) | 22:37 |
*** mattw4 has quit IRC | 22:37 | |
melwitt | thanks for the pointers | 22:37 |
clarkb | ok responded to sean about gitea | 22:40 |
clarkb | load average on gitea06 is higher than I would like | 22:49 |
clarkb | I wonder if the unpacked repos are slow? | 22:49 |
corvus | clarkb: yes, they are! | 22:51 |
corvus | they're going to have toooons of extra refs | 22:51 |
clarkb | ok I'm going to manually run the weekly cron then | 22:52 |
clarkb | as it won't run until sunday looks like | 22:52 |
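The cleanup that cron performs boils down to running git gc over each bare repository, which repacks loose objects and prunes the extra refs corvus mentions. A sketch demonstrated on a throwaway repo (on the real server the loop would walk the gitea storage root, whose path is an assumption, e.g. /data/git/repositories):

```shell
# Repack every bare repository found under a root directory; unpacked
# objects and surplus refs are what drive the load average up. Demo uses
# a temporary root with one freshly-created bare repo.
root=$(mktemp -d)
git init --quiet --bare "$root/demo.git"
find "$root" -maxdepth 1 -type d -name '*.git' | while read -r repo; do
    git --git-dir="$repo" gc --quiet
done
```

Running this ahead of the scheduled Sunday cron is exactly the "manually run the weekly cron" step above.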
*** weifan has quit IRC | 22:53 | |
*** rfarr_ has quit IRC | 22:56 | |
*** rlandy has quit IRC | 22:57 | |
*** eernst has joined #openstack-infra | 22:58 | |
*** tosky has quit IRC | 23:00 | |
openstackgerrit | Merged openstack/diskimage-builder master: Replace nodepool func jobs https://review.opendev.org/667221 | 23:00 |
*** eernst has quit IRC | 23:02 | |
openstackgerrit | Merged opendev/glean master: Replace nodepool func jobs https://review.opendev.org/667225 | 23:07 |
corvus | clarkb: ++, i like your ml post and followed it up with what i hope are more supportive words | 23:12 |
clarkb | thanks! | 23:14 |
*** aaronsheffield has quit IRC | 23:20 | |
*** tonyb has quit IRC | 23:32 | |
*** tonyb has joined #openstack-infra | 23:32 | |
*** Goneri has quit IRC | 23:33 | |
*** sthussey has quit IRC | 23:40 | |
*** slaweq has joined #openstack-infra | 23:42 | |
*** slaweq has quit IRC | 23:46 | |
*** efried has quit IRC | 23:47 | |
*** efried has joined #openstack-infra | 23:48 | |
clarkb | the gc cron seems to have helped a ton on gitea06. It is still running but load average is way down now | 23:54 |
*** dchen has joined #openstack-infra | 23:59 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!