*** thorst has joined #openstack-infra | 00:04 | |
openstackgerrit | Merged openstack-infra/system-config master: Also add buildlogs.cdn.centos.org https://review.openstack.org/492256 | 00:09 |
*** dingyichen has joined #openstack-infra | 00:12 | |
*** sflanigan has joined #openstack-infra | 00:16 | |
*** thingee_ has quit IRC | 00:19 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/system-config master: Limit PTL rolls to foundation members https://review.openstack.org/492329 | 00:21 |
*** kornicameister has quit IRC | 00:25 | |
*** kornicameister has joined #openstack-infra | 00:26 | |
ianw | geez i hate the total lack of formatting available in launchpad | 00:29 |
ianw | clarkb: the plot thickens on the pypi error mismatches -> https://bugs.launchpad.net/openstack-gate/+bug/1708707/comments/1 | 00:29 |
openstack | Launchpad bug 1708707 in OpenStack-Gate "Pip finds hash mismatch for package during installation" [Undecided,New] | 00:29 |
*** thorst has quit IRC | 00:31 | |
*** gildub has joined #openstack-infra | 00:34 | |
ianw | it seems like it thought it got it from the mirror, but didn't actually | 00:36 |
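The error in bug 1708707 is pip's hash-checking mode rejecting a download. As a minimal sketch (illustrative, not pip's actual code), the check amounts to comparing the sha256 of the bytes actually fetched against the digest pinned in the requirements file; a truncated or substituted response from a proxy, which is what ianw suspects happened at the mirror, fails in exactly this way:

```python
# Minimal sketch of pip's hash check (illustrative, not pip's code): a
# requirement pinned with --hash must match the digest of the downloaded
# bytes. A truncated proxy response fails the comparison.
import hashlib

def matches_pin(data: bytes, expected_sha256: str) -> bool:
    return hashlib.sha256(data).hexdigest() == expected_sha256

wheel = b"fake wheel contents"
pinned = hashlib.sha256(wheel).hexdigest()
print(matches_pin(wheel, pinned))      # True: intact download
print(matches_pin(wheel[:5], pinned))  # False: pip reports "hash mismatch"
```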
*** yamamoto_ has quit IRC | 00:37 | |
*** yamamoto has joined #openstack-infra | 00:39 | |
*** kornicameister has quit IRC | 00:40 | |
*** Apoorva_ has quit IRC | 00:43 | |
*** sbezverk has quit IRC | 00:44 | |
*** kornicameister has joined #openstack-infra | 00:45 | |
*** xinliang has joined #openstack-infra | 00:45 | |
*** xinliang has quit IRC | 00:45 | |
*** xinliang has joined #openstack-infra | 00:45 | |
*** markvoelker has joined #openstack-infra | 00:45 | |
*** liujiong has joined #openstack-infra | 00:51 | |
pabelanger | cool: http://mirror.regionone.infracloud-vanilla.openstack.org:8080/buildlogs.cdn.centos/centos/7/cloud/x86_64/openstack-pike/ | 00:52 |
pabelanger | looks to be caching things now | 00:52 |
*** _ryan_ has quit IRC | 00:53 | |
pabelanger | we likely can update configure-mirrors.sh variables and hit it directly to avoid the redirect | 00:55 |
pabelanger | http://paste.openstack.org/show/617998/ | 00:55 |
*** EricGonczer_ has quit IRC | 00:57 | |
*** eharney has joined #openstack-infra | 00:59 | |
mnaser | fungi great, ill be able to bump it up once raw images are uploaded | 01:00 |
*** bobh has joined #openstack-infra | 01:02 | |
*** rhallisey has quit IRC | 01:02 | |
*** jkilpatr has quit IRC | 01:04 | |
fungi | post jobs all have nodes now, so we seem to finally be caught up for the day | 01:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/system-config master: Replace buildlogs.centos with buildlogs.cdn.centos https://review.openstack.org/492336 | 01:06 |
*** dave-mccowan has joined #openstack-infra | 01:07 | |
openstackgerrit | Mohammed Naser proposed openstack-infra/project-config master: Fix Grafana graphs for VEXXHOST https://review.openstack.org/492338 | 01:08 |
*** thorst has joined #openstack-infra | 01:08 | |
*** thorst has quit IRC | 01:08 | |
*** thorst has joined #openstack-infra | 01:09 | |
pabelanger | fungi: I'm hoping in the next 2 hours change pipeline for tripleo will also look much better | 01:12 |
*** gildub has quit IRC | 01:12 | |
*** xarses_ has quit IRC | 01:12 | |
pabelanger | I think that was part of the reason infracloud was seeing networking issues, every request to mirror was 302 redirected to internet | 01:13 |
*** rwsu_ has quit IRC | 01:13 | |
*** thorst has quit IRC | 01:13 | |
pabelanger | hopefully this will conserve some bandwidth once cached | 01:14 |
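A hypothetical sketch of the rewrite pabelanger is describing for configure-mirrors.sh: point package URLs at the region-local reverse proxy (the `:8080/buildlogs.cdn.centos` path he pasted at 00:52) instead of letting every request 302 out to the internet. The function name and anything beyond that one proxy path are assumptions for illustration:

```python
# Hypothetical helper (not the actual configure-mirrors.sh change): rewrite
# buildlogs URLs onto the region-local reverse proxy so requests are served
# from the cache instead of following a 302 to the internet.
def to_local_mirror(url: str, mirror: str) -> str:
    prefix = "http://buildlogs.cdn.centos.org/"
    if url.startswith(prefix):
        return f"{mirror}:8080/buildlogs.cdn.centos/{url[len(prefix):]}"
    return url  # non-buildlogs URLs pass through unchanged

print(to_local_mirror(
    "http://buildlogs.cdn.centos.org/centos/7/cloud/x86_64/openstack-pike/",
    "http://mirror.regionone.infracloud-vanilla.openstack.org"))
```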
*** mwarad has joined #openstack-infra | 01:16 | |
fungi | makes sense | 01:17 |
*** tuanluong has joined #openstack-infra | 01:19 | |
openstackgerrit | Paul Belanger proposed openstack-infra/tripleo-ci master: Stop trying to build networking-bagpipe with DLRN https://review.openstack.org/492339 | 01:24 |
*** thorst has joined #openstack-infra | 01:25 | |
*** thorst has quit IRC | 01:27 | |
*** kornicameister has quit IRC | 01:30 | |
*** mwarad has quit IRC | 01:30 | |
*** bobh has quit IRC | 01:35 | |
*** kornicameister has joined #openstack-infra | 01:35 | |
*** ramishra has quit IRC | 01:39 | |
*** rwsu has joined #openstack-infra | 01:41 | |
openstackgerrit | Paul Belanger proposed openstack-infra/system-config master: Add registry.npmjs.org reverse proxy cache https://review.openstack.org/457720 | 01:42 |
pabelanger | fungi: clarkb: ianw: ^next one up, npm reverse proxy cache, just seen a job in infracloud timeout fetching npms | 01:43 |
*** cuongnv has joined #openstack-infra | 01:44 | |
mnaser | im not seeing any data in graphite for jobs that run on our cloud and searching logstash for "node_provider:vexxhost-ca-ymq-1" yields no results | 01:44 |
mnaser | i know jobs are running.. but is there a reason why they might not be reporting ? | 01:45 |
mnaser | logstash job queue doesnt look too bad | 01:45 |
*** gongysh has joined #openstack-infra | 01:45 | |
pabelanger | mnaser: node_provider:"ymq-1" | 01:49 |
pabelanger | http://logs.openstack.org/33/492133/1/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/6505200/console.html timedout on vexxhost | 01:50 |
mnaser | pabelanger oh i see, no idea why, but good to know | 01:50 |
pabelanger | possible that job was warming up the afs cache | 01:51 |
mnaser | could be, it probably hasnt been touched for a long time and seems like its the only timeout.. ill keep an eye | 01:53 |
*** zhaozhenlong has joined #openstack-infra | 01:57 | |
*** cshastri has joined #openstack-infra | 01:58 | |
openstackgerrit | Paul Belanger proposed openstack-infra/elastic-recheck master: Add query for bug 1709744 https://review.openstack.org/492342 | 02:01 |
openstack | bug 1709744 in OpenStack-Gate "Gem fetch networking errors" [Undecided,New] https://launchpad.net/bugs/1709744 | 02:01 |
*** aeng has quit IRC | 02:03 | |
*** _ryan_ has joined #openstack-infra | 02:04 | |
dmsimard | ianw: so platform:redhat is only RHEL and CentOS ? | 02:07 |
dmsimard | I don't seem to see a different classifier but there is platform:fedora | 02:08 |
pabelanger | mnaser: I think we might need to turn off vexxhost for a bit, I don't think hostname is setup on our nodes | 02:15 |
pabelanger | http://logs.openstack.org/57/475457/4/gate/gate-tripleo-ci-centos-7-containers-multinode/54b3deb/console.html ran on vexxhost, but hostnamectl was empty | 02:15 |
*** spligak has quit IRC | 02:16 | |
mnaser | pabelanger is that something that we have to do on our side? | 02:16 |
pabelanger | mnaser: we setup hostname with glean IIRC | 02:17 |
openstackgerrit | YAMAMOTO Takashi proposed openstack-infra/project-config master: networking-midonet: Add runtime graphs to grafana dashboard https://review.openstack.org/492350 | 02:17 |
pabelanger | so because it is not running, its likely possible we don't get a valid setting | 02:18 |
mnaser | it is very much possible because i remember --ssh and --hostname was only being called when it found the configdrive | 02:18 |
pabelanger | mnaser: I propose we stop vexxhost for tonight and work on getting a new release of glean tomorrow | 02:18 |
mnaser | pabelanger no problem | 02:19 |
mnaser | i'd rather not disturb builds at a critical time | 02:19 |
pabelanger | mnaser: mind proposing a patch for max-servers 0? | 02:19 |
mnaser | sure 1 second | 02:19 |
pabelanger | mnaser: agree | 02:19 |
* fungi is on hand to review such | 02:22 | |
openstackgerrit | Mohammed Naser proposed openstack-infra/project-config master: Set max-servers down to 0 for VEXXHOST https://review.openstack.org/492351 | 02:23 |
mnaser | pabelanger fungi ^ | 02:23 |
*** Sukhdev_ has quit IRC | 02:23 | |
pabelanger | +2 thanks | 02:24 |
fungi | done and done | 02:25 |
fungi | thanks again mnaser! | 02:25 |
*** yamahata has quit IRC | 02:25 | |
*** eharney has quit IRC | 02:26 | |
*** dave-mccowan has quit IRC | 02:27 | |
*** jamesmcarthur has joined #openstack-infra | 02:27 | |
mnaser | fungi np, im writing a small glean patch now | 02:27 |
*** iyamahat has quit IRC | 02:28 | |
*** thorst has joined #openstack-infra | 02:28 | |
*** hongbin has joined #openstack-infra | 02:28 | |
*** thorst has quit IRC | 02:29 | |
*** kornicameister has quit IRC | 02:29 | |
*** Guest93343 has quit IRC | 02:29 | |
*** ramishra has joined #openstack-infra | 02:30 | |
*** jamesmcarthur has quit IRC | 02:32 | |
*** Hal has joined #openstack-infra | 02:32 | |
*** Hal is now known as Guest54759 | 02:32 | |
*** mwarad has joined #openstack-infra | 02:34 | |
openstackgerrit | Mohammed Naser proposed openstack-infra/glean master: Add checks for uppercase config drive label https://review.openstack.org/492353 | 02:35 |
mnaser | pabelanger ^ if you wanna go over that tomorrow :) | 02:35 |
mnaser | i tested the detection logic locally but not all of glean to be fully honest | 02:35 |
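The glean patch above is about config-drive label casing. A sketch of the idea (hypothetical helper, not glean's actual code): the config drive is located by filesystem label, and clouds that expose the label as uppercase `CONFIG-2` are missed by an exact-case comparison, so the `--ssh`/`--hostname` steps mnaser mentions never run:

```python
# Sketch of a case-insensitive config-drive label match (hypothetical, not
# glean's actual code). An exact comparison against "config-2" misses clouds
# that expose the label as "CONFIG-2", so glean never finds the drive.
def is_config_drive(label: str) -> bool:
    return label.lower() == "config-2"

print(is_config_drive("config-2"))    # True
print(is_config_drive("CONFIG-2"))    # True (missed by an exact-case check)
print(is_config_drive("ephemeral0"))  # False
```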
*** kornicameister has joined #openstack-infra | 02:35 | |
*** esberglu has quit IRC | 02:40 | |
*** LindaWang has joined #openstack-infra | 02:42 | |
*** EricGonczer_ has joined #openstack-infra | 02:46 | |
*** hongbin has quit IRC | 02:48 | |
*** _mwarad_ has joined #openstack-infra | 02:49 | |
*** hongbin has joined #openstack-infra | 02:49 | |
*** dklyle has quit IRC | 02:49 | |
*** zhurong has joined #openstack-infra | 02:51 | |
*** mwarad has quit IRC | 02:52 | |
*** hongbin has quit IRC | 02:52 | |
*** hongbin has joined #openstack-infra | 02:53 | |
*** zhaozhenlong has left #openstack-infra | 02:54 | |
*** bunnyKun has joined #openstack-infra | 02:54 | |
*** dave-mccowan has joined #openstack-infra | 02:54 | |
*** yamahata has joined #openstack-infra | 02:55 | |
openstackgerrit | Merged openstack-infra/project-config master: Set max-servers down to 0 for VEXXHOST https://review.openstack.org/492351 | 02:56 |
*** markvoelker has quit IRC | 03:02 | |
*** markvoelker has joined #openstack-infra | 03:03 | |
ianw | dmsimard: sorry, i'm not following? | 03:14 |
*** david-lyle has joined #openstack-infra | 03:22 | |
*** ramishra has quit IRC | 03:28 | |
*** ramishra has joined #openstack-infra | 03:29 | |
*** nicolasbock has joined #openstack-infra | 03:29 | |
*** cshastri has quit IRC | 03:32 | |
*** yamamoto has quit IRC | 03:36 | |
*** bunnyKun has quit IRC | 03:37 | |
*** bunnyKun has joined #openstack-infra | 03:41 | |
*** EricGonczer_ has quit IRC | 03:41 | |
*** ramishra has quit IRC | 03:42 | |
*** hongbin has quit IRC | 03:43 | |
*** cshastri has joined #openstack-infra | 03:45 | |
*** _mwarad_ has quit IRC | 03:47 | |
*** yamamoto has joined #openstack-infra | 03:47 | |
*** dave-mccowan has quit IRC | 03:48 | |
*** yamamoto has quit IRC | 03:48 | |
*** yamamoto has joined #openstack-infra | 03:49 | |
*** ramishra has joined #openstack-infra | 03:50 | |
*** kornicameister has quit IRC | 03:53 | |
*** kornicameister has joined #openstack-infra | 03:54 | |
*** luzC has quit IRC | 03:57 | |
*** markvoelker has quit IRC | 04:01 | |
*** markvoelker has joined #openstack-infra | 04:01 | |
*** bunnyKun has quit IRC | 04:03 | |
*** cshastri has quit IRC | 04:03 | |
*** thorst has joined #openstack-infra | 04:04 | |
*** rmcall has quit IRC | 04:07 | |
*** thorst has quit IRC | 04:09 | |
*** spligak has joined #openstack-infra | 04:12 | |
*** ykarel|away has joined #openstack-infra | 04:18 | |
*** Sukhdev has joined #openstack-infra | 04:20 | |
openstackgerrit | Eric Kao proposed openstack-infra/project-config master: Increasing job timeout https://review.openstack.org/492373 | 04:25 |
*** thorst has joined #openstack-infra | 04:26 | |
johnsom | Any thoughts on these instant timeouts? http://logs.openstack.org/43/491643/2/gate/gate-octavia-python35/aa4c23b/console.html | 04:26 |
johnsom | I guess the ansible is failing | 04:27 |
johnsom | http://logs.openstack.org/43/491643/2/gate/gate-octavia-python35/aa4c23b/_zuul_ansible/ansible_log.txt | 04:28 |
*** cshastri has joined #openstack-infra | 04:28 | |
*** dhajare has joined #openstack-infra | 04:29 | |
johnsom | The error was: OSError: [Errno 2] No such file or directory: '/home/jenkins/workspace/gate-octavia-python35' | 04:29 |
*** gongysh has quit IRC | 04:30 | |
*** thorst has quit IRC | 04:30 | |
*** gouthamr has joined #openstack-infra | 04:31 | |
*** sbezverk has joined #openstack-infra | 04:35 | |
*** adisky__ has joined #openstack-infra | 04:37 | |
*** hareesh has joined #openstack-infra | 04:38 | |
*** sbezverk has quit IRC | 04:40 | |
*** rmcall has joined #openstack-infra | 04:40 | |
*** claudiub has joined #openstack-infra | 04:40 | |
*** squid has joined #openstack-infra | 04:41 | |
openstackgerrit | Merged openstack-infra/project-config master: [magnum] Move -nv test to experimental https://review.openstack.org/492177 | 04:42 |
*** calebb has quit IRC | 04:43 | |
*** squid is now known as calebb | 04:43 | |
*** david-lyle has quit IRC | 04:44 | |
*** rmcall has quit IRC | 04:45 | |
*** dklyle has joined #openstack-infra | 04:45 | |
*** cshastri has quit IRC | 04:46 | |
mnaser | would anyone be able to review this? https://review.openstack.org/#/c/491800/ | 04:50 |
*** calebb has quit IRC | 04:52 | |
*** squid has joined #openstack-infra | 04:53 | |
*** squid is now known as calebb | 04:53 | |
*** rwsu has quit IRC | 04:58 | |
openstackgerrit | Artur Basiak proposed openstack-infra/project-config master: Provide unified gate configuration https://review.openstack.org/490790 | 04:58 |
*** gongysh has joined #openstack-infra | 05:09 | |
*** sree has joined #openstack-infra | 05:11 | |
openstackgerrit | Merged openstack-infra/system-config master: Add Fedora Atomic mirrors https://review.openstack.org/491800 | 05:13 |
mnaser | thank you ianw | 05:14 |
*** gouthamr has quit IRC | 05:16 | |
openstackgerrit | Artur Basiak proposed openstack-infra/project-config master: Change service name https://review.openstack.org/492379 | 05:29 |
*** Sukhdev has quit IRC | 05:30 | |
*** luzC has joined #openstack-infra | 05:31 | |
*** jamesdenton has quit IRC | 05:36 | |
*** sshnaidm|off has quit IRC | 05:37 | |
*** luzC has quit IRC | 05:37 | |
*** liujiong has quit IRC | 05:44 | |
*** liujiong has joined #openstack-infra | 05:45 | |
*** ccamacho has left #openstack-infra | 05:47 | |
*** ccamacho has quit IRC | 05:47 | |
*** armax has joined #openstack-infra | 05:48 | |
*** e0ne has joined #openstack-infra | 05:48 | |
*** armax has quit IRC | 05:48 | |
*** _ryan_ has quit IRC | 05:51 | |
*** psachin has joined #openstack-infra | 05:52 | |
*** luzC has joined #openstack-infra | 05:54 | |
*** ykarel_ has joined #openstack-infra | 05:54 | |
*** luzC has quit IRC | 05:57 | |
*** jamesdenton has joined #openstack-infra | 05:57 | |
*** ykarel|away has quit IRC | 05:57 | |
openstackgerrit | Artur Basiak proposed openstack-infra/project-config master: Provide unified gate configuration https://review.openstack.org/490790 | 05:59 |
*** yamamoto has quit IRC | 06:01 | |
*** e0ne has quit IRC | 06:03 | |
*** dhajare has quit IRC | 06:04 | |
*** rcernin has joined #openstack-infra | 06:05 | |
*** slaweq has quit IRC | 06:05 | |
*** yamamoto has joined #openstack-infra | 06:05 | |
*** dhajare has joined #openstack-infra | 06:06 | |
*** rwsu has joined #openstack-infra | 06:07 | |
*** pgadiya has joined #openstack-infra | 06:07 | |
*** kjackal_ has joined #openstack-infra | 06:08 | |
*** junbo has quit IRC | 06:12 | |
*** cshastri has joined #openstack-infra | 06:14 | |
*** junbo has joined #openstack-infra | 06:15 | |
*** jamesdenton has quit IRC | 06:16 | |
*** martinkopec has joined #openstack-infra | 06:19 | |
*** dhajare has quit IRC | 06:20 | |
*** dhajare has joined #openstack-infra | 06:23 | |
*** thorst has joined #openstack-infra | 06:26 | |
*** jamesmcarthur has joined #openstack-infra | 06:28 | |
*** tnovacik has quit IRC | 06:29 | |
*** abelur_ has quit IRC | 06:29 | |
*** slaweq has joined #openstack-infra | 06:29 | |
*** thorst has quit IRC | 06:31 | |
*** jamesmcarthur has quit IRC | 06:32 | |
openstackgerrit | Artur Basiak proposed openstack-infra/project-config master: Change service name https://review.openstack.org/492379 | 06:32 |
*** eranrom has joined #openstack-infra | 06:33 | |
*** jamesdenton has joined #openstack-infra | 06:34 | |
*** dhajare has quit IRC | 06:37 | |
kklimonda | what's responsible for displaying test result table on gerrit review pages? is it a plugin, or a config knob? | 06:38 |
*** jamesdenton has quit IRC | 06:39 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: [DNM] lvm testing of spanned vg https://review.openstack.org/491986 | 06:39 |
*** jamesdenton has joined #openstack-infra | 06:48 | |
*** ykarel_ is now known as ykarel | 06:48 | |
*** ccamacho has joined #openstack-infra | 06:57 | |
*** shardy has joined #openstack-infra | 07:09 | |
*** luzC has joined #openstack-infra | 07:10 | |
*** psachin has quit IRC | 07:12 | |
*** gcb has joined #openstack-infra | 07:13 | |
*** tesseract has joined #openstack-infra | 07:13 | |
*** luzC has quit IRC | 07:14 | |
*** luzC has joined #openstack-infra | 07:14 | |
*** luzC has quit IRC | 07:18 | |
*** markus_z has joined #openstack-infra | 07:23 | |
*** aarefiev_afk is now known as aarefiev | 07:29 | |
*** dizquierdo has joined #openstack-infra | 07:36 | |
*** luzC has joined #openstack-infra | 07:38 | |
openstackgerrit | Merged openstack-infra/project-config master: networking-midonet: Add runtime graphs to grafana dashboard https://review.openstack.org/492350 | 07:39 |
*** martinkopec has quit IRC | 07:39 | |
openstackgerrit | Merged openstack-infra/project-config master: Reduce infracloud by 50% https://review.openstack.org/492207 | 07:40 |
openstackgerrit | Merged openstack-infra/project-config master: Fix Grafana graphs for VEXXHOST https://review.openstack.org/492338 | 07:40 |
*** eranrom has quit IRC | 07:40 | |
*** cshastri has quit IRC | 07:41 | |
*** LindaWang has quit IRC | 07:42 | |
*** LindaWang has joined #openstack-infra | 07:43 | |
*** LindaWang has quit IRC | 07:45 | |
*** LindaWang has joined #openstack-infra | 07:45 | |
*** xinliang has quit IRC | 07:48 | |
*** ralonsoh has joined #openstack-infra | 07:48 | |
openstackgerrit | Dima Kuznetsov proposed openstack-infra/project-config master: Dragonflow: increase timeout for fullstack jobs https://review.openstack.org/492416 | 07:48 |
*** alexchadin has joined #openstack-infra | 07:48 | |
yuval | Hello infra! Tons of jobs seem to be queued (some for over 8 hours) | 07:51 |
yuval | maybe nodepool is stuck? (again :\ ) | 07:52 |
yuval | AJaeger_: yolanda: ? | 07:52 |
openstackgerrit | Bogdan Dobrelya proposed openstack-infra/tripleo-ci master: Rework the getthelogs helper script for wget recursive https://review.openstack.org/492178 | 07:52 |
yuval | ianw: fungi: ? | 07:52 |
*** martinkopec has joined #openstack-infra | 07:53 | |
openstackgerrit | Merged openstack-infra/project-config master: Add release permission for neutron-vpnaas and dashboard https://review.openstack.org/491670 | 07:54 |
*** sflanigan has quit IRC | 07:54 | |
*** makowals has quit IRC | 07:56 | |
*** makowals has joined #openstack-infra | 08:00 | |
*** cshastri has joined #openstack-infra | 08:00 | |
*** xinliang has joined #openstack-infra | 08:00 | |
*** xinliang has quit IRC | 08:00 | |
*** xinliang has joined #openstack-infra | 08:00 | |
openstackgerrit | YAMAMOTO Takashi proposed openstack-infra/project-config master: networking-midonet: Remove v2 jobs from grafana dashboard https://review.openstack.org/492424 | 08:01 |
*** martinkopec has quit IRC | 08:03 | |
*** markus_z has quit IRC | 08:03 | |
*** martinkopec has joined #openstack-infra | 08:03 | |
*** markus_z has joined #openstack-infra | 08:04 | |
*** jpena|off has quit IRC | 08:12 | |
openstackgerrit | Merged openstack-infra/project-config master: Update footer to follow docs.o.o https://review.openstack.org/490989 | 08:16 |
*** mriedem has quit IRC | 08:16 | |
*** electrofelix has joined #openstack-infra | 08:19 | |
*** xinliang has quit IRC | 08:23 | |
*** lucas-afk is now known as lucasagomes | 08:25 | |
*** derekh has joined #openstack-infra | 08:25 | |
openstackgerrit | Bogdan Dobrelya proposed openstack-infra/tripleo-ci master: Rework the getthelogs helper script for wget recursive https://review.openstack.org/492178 | 08:25 |
*** jcjo has joined #openstack-infra | 08:26 | |
*** thorst has joined #openstack-infra | 08:27 | |
*** yamamoto has quit IRC | 08:27 | |
*** yamamoto has joined #openstack-infra | 08:28 | |
*** psachin has joined #openstack-infra | 08:29 | |
*** jcjo has quit IRC | 08:30 | |
*** dingyichen has quit IRC | 08:31 | |
*** yamamoto has quit IRC | 08:31 | |
*** yamamoto has joined #openstack-infra | 08:32 | |
*** thorst has quit IRC | 08:32 | |
*** jaosorior has quit IRC | 08:37 | |
*** dizquierdo has quit IRC | 08:37 | |
*** xinliang has joined #openstack-infra | 08:39 | |
*** xinliang has quit IRC | 08:39 | |
*** xinliang has joined #openstack-infra | 08:39 | |
*** jaosorior has joined #openstack-infra | 08:41 | |
*** ociuhandu has quit IRC | 08:41 | |
ianw | yuval: hmm ... looking | 08:45 |
yuval | ianw: some are stuck for over 20 hours | 08:45 |
*** ykarel is now known as ykarel|lunch | 08:46 | |
*** jtomasek has joined #openstack-infra | 08:47 | |
ianw | it looks like jobs have timed out, but zuul has not cleaned them up | 08:47 |
tobiash | kklimonda: it's partly gerrit config (commentlinks) and partly gerrit css config | 08:48 |
tobiash | kklimonda: you should find both somewhere in the system-config repo | 08:48 |
kklimonda | thanks, I'll take a look | 08:48 |
openstackgerrit | Ilya Shakhat proposed openstack-infra/project-config master: Add gerritbot notications for osprofiler into #openstack-performance https://review.openstack.org/492437 | 08:50 |
*** eroux has joined #openstack-infra | 08:51 | |
tobiash | kklimonda: that should be the css part: https://github.com/openstack-infra/system-config/blob/master/modules/openstack_project/files/gerrit/GerritSite.css#L123 | 08:51 |
*** xinliang has quit IRC | 08:52 | |
*** jaosorior has quit IRC | 08:54 | |
*** makowals has quit IRC | 08:54 | |
ianw | no, that's not right, it is moving along. i think we're just down a couple of providers | 08:57 |
tobiash | kklimonda: in zuul you also need to configure job_name_in_report: https://github.com/openstack-infra/puppet-zuul/blob/master/templates/zuul.conf.erb#L25 | 08:58 |
*** makowals has joined #openstack-infra | 08:58 | |
*** pgadiya has quit IRC | 08:58 | |
kklimonda | great, thanks a lot - I'll pass it along | 08:59 |
tobiash | kklimonda: and this I found for the commentlink condfig in gerrit: https://review.openstack.org/#/c/42495/2 | 09:00 |
*** eroux has quit IRC | 09:00 | |
*** e0ne has joined #openstack-infra | 09:01 | |
dalvarez | hi guys, just a quick question... we saw there was a new tag in ubuntu kernel on Aug 31 and we're running that kernel right away in the gate. How does it work? We always run latest tagged kernel? Thanks!! | 09:02 |
*** xinliang has joined #openstack-infra | 09:06 | |
*** xinliang has quit IRC | 09:06 | |
*** xinliang has joined #openstack-infra | 09:06 | |
*** ralonsoh has quit IRC | 09:06 | |
ianw | dalvarez: the trusty/xenial images are built daily, and will basically be running what was apt-get'ed that day | 09:07 |
*** ralonsoh has joined #openstack-infra | 09:07 | |
*** eroux has joined #openstack-infra | 09:07 | |
ianw | but, if you really mean aug 31, well we're not quite at the point we can time travel :) | 09:07 |
*** pgadiya has joined #openstack-infra | 09:12 | |
openstackgerrit | Daniel Lublin proposed openstack-infra/git-review master: Actually output the warning https://review.openstack.org/443474 | 09:12 |
openstackgerrit | Isaku Yamahata proposed openstack-infra/project-config master: manasca: rename monitoring-log-api to monitoring-logging https://review.openstack.org/492444 | 09:12 |
openstackgerrit | Daniel Lublin proposed openstack-infra/git-review master: Allow choosing which field to use as author when naming branch https://review.openstack.org/444574 | 09:12 |
*** jcjo has joined #openstack-infra | 09:13 | |
*** Guest54759 has quit IRC | 09:13 | |
*** jcjo has quit IRC | 09:17 | |
*** ramishra has quit IRC | 09:18 | |
*** gongysh has quit IRC | 09:19 | |
openstackgerrit | Alexander Chadin proposed openstack-infra/project-config master: Register watcher-tempest-plugin jobs https://review.openstack.org/490400 | 09:19 |
*** ramishra has joined #openstack-infra | 09:20 | |
dalvarez | ianw, lol july 31 :) | 09:21 |
dalvarez | ianw, thanks for the info... and is there an "easy" way to revert to the previous image? there's been a regression in ubuntu kernel affecting neutron functional tests... not sure if other projects are affected too | 09:22 |
*** luzC has quit IRC | 09:22 | |
ianw | dalvarez: there is not an easy way to revert to something several days ago, we only keep a couple of days images | 09:23 |
*** luzC has joined #openstack-infra | 09:23 | |
ianw | pinning a kernel is going to be ... tricky | 09:23 |
yamahata | hello, now project-config gate-project-config-jenkins-project is broken. The fix is https://review.openstack.org/#/c/492444/ or https://review.openstack.org/#/c/492379/ | 09:25 |
yamahata | can you please review it? | 09:25 |
dalvarez | ianw, ack pinning would work until the regression's fixed but gotcha | 09:25 |
dalvarez | thanks a lot :) | 09:25 |
*** jamesmcarthur has joined #openstack-infra | 09:28 | |
yamahata | thanks for quick review. | 09:29 |
openstackgerrit | Flávio Ramalho proposed openstack-infra/project-config master: zuul: layout: osa-os_sahara: Add nv centos job https://review.openstack.org/492450 | 09:30 |
*** xinliang has quit IRC | 09:31 | |
*** jamesmcarthur has quit IRC | 09:32 | |
*** sree has quit IRC | 09:34 | |
*** sree has joined #openstack-infra | 09:34 | |
*** ykarel|lunch is now known as ykarel | 09:34 | |
*** sambetts|afk is now known as sambetts | 09:36 | |
openstackgerrit | Merged openstack-infra/project-config master: Change service name https://review.openstack.org/492379 | 09:37 |
*** gongysh has joined #openstack-infra | 09:37 | |
*** luzC has quit IRC | 09:39 | |
*** sree has quit IRC | 09:39 | |
*** sree has joined #openstack-infra | 09:40 | |
*** markvoelker has quit IRC | 09:42 | |
*** xinliang has joined #openstack-infra | 09:44 | |
*** xinliang has quit IRC | 09:44 | |
*** xinliang has joined #openstack-infra | 09:44 | |
*** dizquierdo has joined #openstack-infra | 09:44 | |
*** luzC has joined #openstack-infra | 09:47 | |
*** udesale has joined #openstack-infra | 09:50 | |
*** sdague has joined #openstack-infra | 09:50 | |
*** jaosorior has joined #openstack-infra | 09:54 | |
*** cuongnv has quit IRC | 09:56 | |
yamamoto | infra team, can you add me to new groups, neutron-vpnaas-release and neutron-vpnaas-dashboard-release? | 09:56 |
ianw | yamamoto: done | 10:00 |
*** Hal has joined #openstack-infra | 10:02 | |
*** Hal is now known as Guest34949 | 10:03 | |
*** gcb has quit IRC | 10:04 | |
*** lihi has joined #openstack-infra | 10:05 | |
*** shardy has quit IRC | 10:08 | |
yamamoto | ianw: thank you | 10:09 |
*** shardy has joined #openstack-infra | 10:11 | |
*** cshastri has quit IRC | 10:14 | |
Diabelko | weird, my zuul works fine, however when I do zuul-server -t /etc/zuul/layout.yaml it keeps saying FAILURE: Job X not defined | 10:14 |
Diabelko | should I worry? | 10:14 |
*** katkapilatova has joined #openstack-infra | 10:17 | |
*** yamamoto has quit IRC | 10:21 | |
openstackgerrit | Bogdan Dobrelya proposed openstack-infra/elastic-recheck master: Fix Generic job timeout bug match https://review.openstack.org/492463 | 10:22 |
*** liujiong has quit IRC | 10:22 | |
*** katkapilatova has quit IRC | 10:22 | |
*** yamamoto has joined #openstack-infra | 10:25 | |
*** LindaWang1 has joined #openstack-infra | 10:27 | |
*** yamamoto has quit IRC | 10:28 | |
*** yamamoto has joined #openstack-infra | 10:28 | |
*** thorst has joined #openstack-infra | 10:28 | |
*** LindaWang has quit IRC | 10:30 | |
*** LindaWang1 is now known as LindaWang | 10:30 | |
*** thorst has quit IRC | 10:33 | |
*** dtantsur|afk is now known as dtantsur | 10:35 | |
*** sree has quit IRC | 10:35 | |
*** sree has joined #openstack-infra | 10:35 | |
*** katkapilatova has joined #openstack-infra | 10:35 | |
*** udesale has quit IRC | 10:36 | |
*** zhurong has quit IRC | 10:39 | |
*** sree has quit IRC | 10:40 | |
*** yamamoto has quit IRC | 10:41 | |
*** ykarel_ has joined #openstack-infra | 10:41 | |
*** ykarel__ has joined #openstack-infra | 10:42 | |
*** ykarel_ has quit IRC | 10:43 | |
*** ykarel has quit IRC | 10:43 | |
*** iyamahat has joined #openstack-infra | 10:45 | |
*** iyamahat has quit IRC | 10:45 | |
*** iyamahat has joined #openstack-infra | 10:45 | |
*** yamahata has quit IRC | 10:48 | |
*** iyamahat has quit IRC | 10:50 | |
*** rwsu has quit IRC | 10:53 | |
sdague | fyi, citycloud timing out on pep8 install - http://logs.openstack.org/04/490304/1/gate/gate-nova-pep8-ubuntu-xenial/6a64b56/ | 10:57 |
openstackgerrit | Rui Chen proposed openstack-infra/shade master: Support to get resource by id https://review.openstack.org/492080 | 10:57 |
*** ykarel__ is now known as ykarel | 10:58 | |
*** aeng has joined #openstack-infra | 10:58 | |
*** jkilpatr has joined #openstack-infra | 10:59 | |
*** gongysh has quit IRC | 11:01 | |
*** psachin has quit IRC | 11:03 | |
*** katkapilatova has quit IRC | 11:07 | |
*** yamamoto has joined #openstack-infra | 11:09 | |
ianw | sdague: interesting ... that makes me suspect the mirror | 11:09 |
sdague | ianw: yeh, that would make sense | 11:10 |
ianw | logging in it's stuck at "debug1: pledge: network" | 11:10 |
ianw | which stack overflow tells me might be related to systemd / dbus | 11:10 |
ianw | remember the days when you could log in without message busses getting in the way! | 11:11 |
*** katkapilatova has joined #openstack-infra | 11:12 | |
sdague | heh | 11:13 |
ianw | "System information disabled due to load higher than 2.0" | 11:13 |
ianw | if i could get a prompt ... | 11:13 |
sdague | honestly, I noticed other citycloud long timeouts the other day | 11:13 |
ianw | i am guessing this host is very unhappy | 11:13 |
sdague | I wonder if the move from "sometimes" to "always" is overloading those cloud hosts | 11:13 |
sdague | given that we've not had much idle cycles in nodepool this week | 11:14 |
ianw | load average: 100.74 | 11:14 |
sdague | ianw: where are you getting that? | 11:15 |
ianw | sdague: i'm logged into the mirror for the region of the job you just mentioned | 11:16 |
ianw | there are a lot of htclean processes i think | 11:16 |
sdague | ah | 11:16 |
ianw | htcacheclean, all stuck on D wait channel | 11:16 |
sdague | I wonder if they are iops limitting it at the hypervisor level | 11:16 |
sdague | that would definitely explain some of these job timeouts if everything is handing on the mirror | 11:17 |
ianw | ianw@mirror:~$ ps -aef | grep htcacheclean | wc -l | 11:17 |
ianw | 172 | 11:17 |
ianw | that would be absolutely killing I/O | 11:18 |
ianw | ok ... and exim going mad. that seems bad | 11:19 |
*** dhajare has joined #openstack-infra | 11:20 | |
*** yamamoto_ has joined #openstack-infra | 11:20 | |
*** ykarel_ has joined #openstack-infra | 11:21 | |
openstackgerrit | Petr Benas proposed openstack/python-jenkins master: Allow specifying a non-standard port https://review.openstack.org/492478 | 11:23 |
ianw | ahh, i think exim was going bananas trying to send out messages from cron when i killed the htcacheclean | 11:23 |
*** yamamoto has quit IRC | 11:24 | |
*** ykarel has quit IRC | 11:25 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Run htcacheclean under lock https://review.openstack.org/492481 | 11:34 |
*** sshnaidm|off has joined #openstack-infra | 11:34 | |
ianw | sdague / infra-root: ^ I think something like this should be considered without too much delay | 11:35 |
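The fix in 492481 is to serialize htcacheclean so a run stalled on saturated I/O cannot stack up 172 concurrent copies. The same serialization that `flock(1)` gives a cron job can be sketched in Python with a non-blocking advisory lock; the lock path and cron line here are illustrative, not the actual system-config change:

```python
# Sketch of "run under lock": a non-blocking advisory file lock, the same
# idea as wrapping the cron job, e.g.
#   */30 * * * * flock -n /var/lock/htcacheclean.lock htcacheclean ...
# Lock path is illustrative; flock(2) locks on separate file descriptors
# are independent, so the second attempt below is denied.
import fcntl
import os
import tempfile

def try_lock(path):
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd   # lock held: safe to start htcacheclean
    except BlockingIOError:
        os.close(fd)
        return None  # previous run still going: skip instead of piling up

lock_path = os.path.join(tempfile.gettempdir(), "htcacheclean.demo.lock")
first = try_lock(lock_path)
second = try_lock(lock_path)
print(first is not None)  # True: first "cron run" proceeds
print(second is None)     # True: overlapping run is skipped
```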
*** ldnunes has joined #openstack-infra | 11:35 | |
sdague | ianw: good call | 11:35 |
ianw | also, rebooting mirror.lon1.citycloud.openstack.org is probably not a bad idea. it's coming back to life, but the logs are full of kernel oops's from stuck tasks. i think probably best to start fresh | 11:36 |
sdague | ianw: no objections here | 11:36 |
sdague | it seems to be failing jobs anyway | 11:36 |
sdague | ianw: I'd say do it now before the US starts hammering the gate | 11:37 |
ianw | sdague : hmm, ok, but if it doesn't come back i'm blaming you ;) | 11:38 |
*** sshnaidm|off has quit IRC | 11:39 | |
*** sshnaidm|off has joined #openstack-infra | 11:39 | |
*** dhajare has quit IRC | 11:39 | |
*** dhajare has joined #openstack-infra | 11:39 | |
*** coolno1 has joined #openstack-infra | 11:40 | |
coolno1 | hello | 11:40 |
coolno1 | Need some inputs on whether openstack CI infra can be used by some third party projects | 11:41 |
coolno1 | Including gerrit, launchpad, gating ? | 11:41 |
*** ldnunes has quit IRC | 11:42 | |
*** rhallisey has joined #openstack-infra | 11:42 | |
*** dave-mccowan has joined #openstack-infra | 11:42 | |
*** lucasagomes is now known as lucas-hungry | 11:42 | |
*** alexchadin has quit IRC | 11:43 | |
*** alexchadin has joined #openstack-infra | 11:44 | |
*** alexchadin has quit IRC | 11:44 | |
*** alexchadin has joined #openstack-infra | 11:45 | |
*** aeng has quit IRC | 11:45 | |
*** alexchadin has quit IRC | 11:45 | |
ianw | sigh ... i think something must be up with its networking | 11:45 |
*** alexchadin has joined #openstack-infra | 11:45 | |
ianw | it's booting but really slow, and i'm guessing it's dns related | 11:45 |
*** alexchadin has quit IRC | 11:46 | |
*** alexchadin has joined #openstack-infra | 11:46 | |
*** alexchadin has quit IRC | 11:51 | |
*** rhallisey has quit IRC | 11:52 | |
*** katkapilatova has left #openstack-infra | 11:53 | |
coolno1 | Hello Looking for some guidance can someone provide pointers | 11:53 |
*** katkapilatova has joined #openstack-infra | 11:53 | |
sdague | coolno1: the general statement has been that it's fine, however being specific about what you'd like to use it for would be helpful | 11:53 |
*** ldnunes has joined #openstack-infra | 11:54 | |
sdague | ianw: yeh, I kind of wonder if that is more indicative of a bigger issue on that cloud | 11:54 |
coolno1 | sdague, the gerrit review system, OpenStack gates, launchpad | 11:54 |
sdague | coolno1: right... but for what kind of project | 11:54 |
sdague | launchpad isn't openstack specific | 11:54 |
coolno1 | sdague, it is a Cloud Foundry related project capable of deploying on OpenStack | 11:55 |
openstackgerrit | wes hayutin proposed openstack-infra/tripleo-ci master: WIP: containers periodic test https://review.openstack.org/475747 | 11:57 |
sdague | coolno1: that would probably be fine | 11:57 |
coolno1 | sdague, wow | 11:57 |
sdague | ianw: I wonder if it's worth disabling lon1 for now | 11:57 |
sdague | just to not have jobs fail because of that mirror | 11:58 |
coolno1 | sdague, can you please provide me some pointers on what it takes to get this done | 11:58 |
coolno1 | sdague, some reference documentation | 11:58 |
sdague | https://docs.openstack.org/infra/manual/creators.html | 11:58 |
*** esberglu has joined #openstack-infra | 11:59 | |
coolno1 | sdague, Thanks a lot for this. I will come back to irc as well as mailing list for more information | 11:59 |
ianw | sdague: yeah, i think we have to. i've tried disconnecting & reconnecting the interface and it still won't get past cloud init | 11:59 |
sdague | yeh | 11:59 |
sdague | it's only a 50 node drop, so not the end of the world | 11:59 |
ianw | it gets an address from dhcp, but then no bueno | 11:59 |
*** ykarel__ has joined #openstack-infra | 11:59 | |
sdague | the aggregate throughput will go up if we aren't rando failing there | 12:00 |
*** ykarel_ has quit IRC | 12:00 | |
*** ykarel__ is now known as ykarel | 12:00 | |
sdague | ianw: can you hot patch disable that? | 12:01 |
sdague | it will be 4+ hours for the commit to go through | 12:01 |
sdague | I'll propose the commit | 12:02 |
*** thorst has joined #openstack-infra | 12:02 | |
coolno1 | sdague, Few questions. I hope the project can be hosted on github.com? | 12:03 |
coolno1 | sdague, Secondly I hope it can be non-python project. It is developed in nodejs | 12:03 |
*** esberglu has quit IRC | 12:04 | |
sdague | coolno1: we mirror some stuff to github, but the hosting would be in gerrit if you use that | 12:04 |
*** alexchadin has joined #openstack-infra | 12:04 | |
*** rlandy has joined #openstack-infra | 12:04 | |
*** jtomasek has quit IRC | 12:05 | |
*** tuanluong has quit IRC | 12:06 | |
*** dhajare has quit IRC | 12:07 | |
coolno1 | sdague, and the review system will be "https://review.openstack.org" | 12:07 |
openstackgerrit | Sean Dague proposed openstack-infra/project-config master: Disable citycloud lon1 https://review.openstack.org/492493 | 12:07 |
sdague | coolno1: yeh, it's probably worth reading through all the project team docs, they are pretty extensive | 12:08 |
coolno1 | sdague, Sure I am just trying to understand if it will be misleading as it is a CF related project | 12:09 |
*** alexchadin has quit IRC | 12:09 | |
*** sree has joined #openstack-infra | 12:09 | |
sdague | coolno1: that's fine, but I feel like it will be easier to ask about specific mismatches (if they exist) once you prime yourself with that documentation. | 12:10 |
*** dhajare has joined #openstack-infra | 12:10 | |
coolno1 | sdague, yeah got it | 12:10 |
*** jpena has joined #openstack-infra | 12:10 | |
ianw | #status log nodepool in emergency file and citycloud-lon1 region commented out while we investigate issues with mirror | 12:13 |
openstackstatus | ianw: finished logging | 12:13 |
sdague | ianw: thanks! | 12:13 |
*** sree has quit IRC | 12:13 | |
*** ykarel_ has joined #openstack-infra | 12:15 | |
*** trown|outtypewww is now known as trown | 12:16 | |
*** Goneri has joined #openstack-infra | 12:17 | |
*** ykarel has quit IRC | 12:18 | |
*** rwsu has joined #openstack-infra | 12:19 | |
*** sshnaidm|off is now known as sshnaidm | 12:19 | |
*** xinliang has quit IRC | 12:20 | |
*** sree has joined #openstack-infra | 12:20 | |
openstackgerrit | James Page proposed openstack-infra/project-config master: Add Gnocchi charm and associated interfaces https://review.openstack.org/489946 | 12:22 |
*** rwsu has quit IRC | 12:23 | |
trown | is there any way to remove https://review.openstack.org/#/c/485689/10 from gate? it looks hung there. I did recheck, but I am worried it won't re-enter the gate queue after rerunning check | 12:23 |
*** shardy has quit IRC | 12:23 | |
*** dhajare has quit IRC | 12:24 | |
*** shardy has joined #openstack-infra | 12:25 | |
*** pgadiya has quit IRC | 12:25 | |
ianw | #status log mirror.lon1.citycloud.openstack.org migrated to a new compute node by Kim from citycloud. appears up. nodepool conf restored & nodepool.o.o taken out of emergency file | 12:27 |
openstackstatus | ianw: finished logging | 12:27 |
*** markmcd has quit IRC | 12:31 | |
*** slaweq has quit IRC | 12:31 | |
*** slaweq has joined #openstack-infra | 12:32 | |
*** lucas-hungry is now known as lucasagomes | 12:32 | |
*** xinliang has joined #openstack-infra | 12:33 | |
*** pgadiya has joined #openstack-infra | 12:35 | |
*** jamesdenton has quit IRC | 12:35 | |
ianw | sdague / infra-root : i believe things with lon1 are back to status quo. i wrote http://lists.openstack.org/pipermail/openstack-infra/2017-August/005546.html | 12:36 |
*** dhajare has joined #openstack-infra | 12:36 | |
*** slaweq has quit IRC | 12:36 | |
*** jamesdenton has joined #openstack-infra | 12:36 | |
ianw | either fungi is awake or the robot he uses to do reviews is on autopilot. either way, i feel ok going to bed :) have a good day americans! | 12:38 |
*** sshnaidm is now known as sshnaidm|afk | 12:38 | |
fungi | go to bed ianw. still waking up so on silent running for now, but around (mostly) | 12:39 |
fungi | and thanks!!! | 12:39 |
*** alexchadin has joined #openstack-infra | 12:44 | |
*** mriedem has joined #openstack-infra | 12:45 | |
*** EricGonczer_ has joined #openstack-infra | 12:47 | |
*** sbezverk has joined #openstack-infra | 12:48 | |
*** slaweq has joined #openstack-infra | 12:51 | |
*** vhosakot has joined #openstack-infra | 12:52 | |
*** markmcd has joined #openstack-infra | 12:56 | |
*** jrist has joined #openstack-infra | 12:58 | |
*** bh526r has joined #openstack-infra | 12:58 | |
*** jpena is now known as jpena|mtg | 12:59 | |
*** esberglu has joined #openstack-infra | 13:00 | |
*** dhajare has quit IRC | 13:00 | |
mhayden | i'm still seeing CI jobs come online with old versions of project-config, unfortunately | 13:00 |
mhayden | mordred was looking into it yesterday | 13:00 |
*** EricGonczer_ has quit IRC | 13:02 | |
*** LindaWang has quit IRC | 13:02 | |
*** Goneri has quit IRC | 13:04 | |
openstackgerrit | Merged openstack-infra/project-config master: Remove opensuse-422 from jobs https://review.openstack.org/492181 | 13:13 |
*** baoli has joined #openstack-infra | 13:15 | |
*** katkapilatova_ has joined #openstack-infra | 13:15 | |
*** katkapilatova has quit IRC | 13:16 | |
*** katkapilatova_ is now known as katkapilatova | 13:16 | |
*** ramishra has quit IRC | 13:16 | |
*** jcoufal has joined #openstack-infra | 13:17 | |
*** katkapilatova1 has joined #openstack-infra | 13:17 | |
*** katkapilatova1 has quit IRC | 13:18 | |
*** ociuhandu has joined #openstack-infra | 13:18 | |
*** ramishra has joined #openstack-infra | 13:18 | |
*** rhallisey has joined #openstack-infra | 13:19 | |
*** katkapilatova1 has joined #openstack-infra | 13:20 | |
*** vhosakot has quit IRC | 13:21 | |
*** vhosakot has joined #openstack-infra | 13:21 | |
*** katkapilatova1 has quit IRC | 13:23 | |
*** bobh has joined #openstack-infra | 13:24 | |
*** LindaWang has joined #openstack-infra | 13:26 | |
*** katkapilatova1 has joined #openstack-infra | 13:29 | |
*** jaypipes has joined #openstack-infra | 13:30 | |
*** Julien-zte has quit IRC | 13:34 | |
andreaf | ianw, fungi, EmilienM: I see gate-puppet-openstack-integration-4-scenarioNNN jobs failing rather consistently with timeout | 13:34 |
*** Julien-zte has joined #openstack-infra | 13:34 | |
*** coolno1 has quit IRC | 13:34 | |
andreaf | did either the timeout or the test change somehow? | 13:35 |
andreaf | or am I hitting slow nodes perhaps | 13:35 |
andreaf | I see timeout after 1h (according to console.log) - that sounds too short | 13:37 |
openstackgerrit | Merged openstack-infra/project-config master: networking-odl: retire boron task https://review.openstack.org/491951 | 13:39 |
*** rwsu has joined #openstack-infra | 13:39 | |
*** jamesmcarthur has joined #openstack-infra | 13:51 | |
*** rwsu has quit IRC | 13:52 | |
*** marst_ has joined #openstack-infra | 13:54 | |
mnaser | andreaf we've been seeing those timeouts happen often unfortunately | 13:55 |
*** alexchadin has quit IRC | 13:55 | |
*** rwsu has joined #openstack-infra | 13:55 | |
mnaser | the issue seems to be slow nodes, i said i'd put some time to try and catch the timeouts and get some logs but need some time to work on that | 13:55 |
*** camunoz has joined #openstack-infra | 13:56 | |
mnaser | i usually notice very long puppet runs on failed jobs (~40 minutes first run, 10 minutes second run) ... doesnt leave much time for tempest | 13:56 |
andreaf | mnaser: if you look at failures from puppet jobs in https://review.openstack.org/#/c/492190/ you have a lot of timeouts | 13:56 |
*** vhosakot has quit IRC | 13:56 | |
andreaf | mnaser: but 1h timeout sounds too short anyways - does the timeout include the time for the node to boot? I would not think so | 13:57 |
*** vhosakot has joined #openstack-infra | 13:57 | |
mnaser | andreaf i think the timeout starts the second the job starts | 13:57 |
andreaf | mnaser: ok yeah that's what I thought | 13:57 |
mnaser | andreaf in that case the first puppet run seems to have taken ~30 minutes and the second ~10 minutes | 13:58 |
mnaser | instance took 8 minutes to setup too so lets round that up to 10 | 13:58 |
mnaser | that means tempest has 10 minutes to run | 13:58 |
mnaser | andreaf pabelanger noticed that some of the centos mirrors are not working properly so he was working on a solution for the caching https://review.openstack.org/#/c/492333/ -- the job runtimes in there seem promising | 13:59 |
*** gouthamr has joined #openstack-infra | 14:00 | |
*** katkapilatova1 has quit IRC | 14:00 | |
*** katkapilatova has left #openstack-infra | 14:00 | |
*** davidsha has joined #openstack-infra | 14:01 | |
*** gongysh has joined #openstack-infra | 14:02 | |
*** gongysh has quit IRC | 14:03 | |
openstackgerrit | Andrea Frittoli proposed openstack-infra/project-config master: Increase puppet integration job timeout to 90m https://review.openstack.org/492544 | 14:04 |
*** Goneri has joined #openstack-infra | 14:04 | |
andreaf | mnaser, EmilienM: what about this ^^ until the problem is sorted? | 14:04 |
andreaf | pabelanger: ^^ | 14:05 |
mnaser | andreaf id let EmilienM or mwhahaha make the call on that, they know the CI much better, but i think its a good idea cause we're always timing out on successful installs | 14:05 |
mwhahaha | I think it's related to the bandwidth problems | 14:06 |
mwhahaha | i guess we could increase it but i'm not sure that's going to really help | 14:06 |
mwhahaha | we'll probably see it take 90mins and still timeout | 14:07 |
*** rwsu has quit IRC | 14:07 | |
andreaf | mwhahaha: I see jobs usually timeout in the middle of a Tempest run, so I was hoping the extra time would help | 14:07 |
mwhahaha | yea we can go for it for now | 14:08 |
mwhahaha | i'm trying to get some tripleo-ci fixes merged so we stop resetting the gate queue | 14:08 |
mwhahaha | which is also not helping | 14:08 |
andreaf | mwhahaha: on Tempest patches I'm blind now wrt puppet jobs, so I don't want to risk breaking something | 14:08 |
mwhahaha | sure | 14:09 |
*** rlandy has quit IRC | 14:09 | |
*** spzala has joined #openstack-infra | 14:10 | |
*** ramishra has quit IRC | 14:11 | |
*** ramishra has joined #openstack-infra | 14:12 | |
mnaser | would any infra-root be able to help out with this? https://review.openstack.org/#/c/491800 merged last night but mirrors still have not appeared anywhere yet. would anyone be able to investigate? we've been blocked in magnum for quite a long time because of this - http://mirror.regionone.infracloud-vanilla.openstack.org/fedora/ shows nothing even though timestamp is updated | 14:13 |
*** jpena|mtg is now known as jpena|off | 14:14 | |
*** rbrndt has joined #openstack-infra | 14:14 | |
*** sree has quit IRC | 14:18 | |
*** slaweq has quit IRC | 14:19 | |
*** jkilpatr has quit IRC | 14:20 | |
*** EricGonczer_ has joined #openstack-infra | 14:20 | |
*** jkilpatr has joined #openstack-infra | 14:20 | |
*** EricGonc_ has joined #openstack-infra | 14:24 | |
*** EricGonczer_ has quit IRC | 14:25 | |
jeblair | mnaser: rsync: failed to set permissions on "/afs/.openstack.org/mirror/fedora/atomic/.": Permission denied (13) | 14:26 |
mnaser | jeblair: did i miss or forget to do something in my patch? | 14:26 |
jeblair | mnaser: not sure yet | 14:26 |
mnaser | jeblair i pretty much tried to replicate what was being done for fedora in the same file, but at least we know it's failing | 14:27 |
*** lbragstad has quit IRC | 14:28 | |
jeblair | mnaser: btw, how big is the atomic mirror? | 14:29 |
mnaser | jeblair: every qcow2 file is ~650M and i filtered it so that it only gets the qcow2 files, there are 10 images right now so roughly 6.5GB? | 14:30 |
fungi | oh, yep, i didn't think to check how much space that was going to add. sounded like only a few files, but... it's images so could actually be huge i guess | 14:30 |
jeblair | should be okay, but the partition is at 90% so it's probably time to make new volume and move stuff around | 14:31 |
mnaser | there was a lot more content originally but i stripped it down with excludes | 14:31 |
mnaser | i left a --dry-run output in the changeset which shows total size 6936506222 to be fully accurate (6.93gb) | 14:32 |
dimak | Hey, zuul shows there's a lot of queued jobs, some for 20 hours now. Is there an infra issue? | 14:33 |
mnaser | dimak a few providers that donate infra have been disabled because of some issues so we're heavily capped by # of instances available to run tests | 14:36 |
mnaser | you can see the number of test nodes just capped in a perfect line :p | 14:37 |
*** spzala has quit IRC | 14:37 | |
*** priteau has joined #openstack-infra | 14:39 | |
jeblair | chmod(".", 02755) = -1 EACCES (Permission denied) | 14:41 |
jeblair | it's trying to setgid on the directory | 14:42 |
fungi | ahh, we likely need to not attempt to preserve permissions with rsync? | 14:42 |
jeblair | or at least not that permission | 14:42 |
jeblair | but yeah, i would think the default perms would be fine | 14:43 |
jeblair | mnaser: so maybe just drop "p" from -rlptDvz ? | 14:43 |
jeblair | i'll try it real quick to make sure | 14:43 |
fungi | i'm wondering to what extent any perms from the source need to be kept | 14:44 |
mnaser | jeblair okay cool, i can propose a fix if it's working | 14:44 |
jeblair | fungi: yeah, probably none | 14:44 |
dimak | mnaser, I see, thanks for the answer! :) | 14:44 |
jeblair | mnaser: yeah it's happy now | 14:44 |
fungi | agreed, i'm not coming up with any scenarios. maybe corner cases where some consuming application expects a regular file to have executable permission on the mirror? but even that seems like a pathological/broken behavior | 14:45 |
jeblair | mnaser: it has actually completed the rsync; that was the last thing it was trying to do. so once the fix lands, we shouldn't need to wait long for it to release the volume. | 14:45 |
jeblair | du says 6.5G | 14:46 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add sphinx-autodoc-typehits sphinx extension https://review.openstack.org/492557 | 14:46 |
*** yamahata has joined #openstack-infra | 14:46 | |
davidsha | Hi would this be the place to ask questions about tempest tests? | 14:48 |
jrich | Odd behavior: from a fresh checkout of project-config - I can init gerrit just fine (git review -s). When trying the same on openstack-dev/sandbox - I get errors: "Could not connect to gerrit". Had to manually add a remote. This normal? | 14:48 |
openstackgerrit | Mohammed Naser proposed openstack-infra/system-config master: Stop rsync from managing setgid permissions for Fedora Atomic mirror https://review.openstack.org/492558 | 14:48 |
mnaser | jeblair fungi ^ thank you for your help/investigation | 14:48 |
jeblair | mnaser: you're welcome! | 14:49 |
fungi | davidsha: #openstack-qa is probably the channel you're looking for | 14:50 |
*** PsionTheory has joined #openstack-infra | 14:50 | |
fungi | jrich: definitely not normal. you may need to make sure you don't have any local or config changes to the sandbox repo | 14:51 |
jeblair | jrich: it looks like the .gitreview file in the sandbox repo is incorrect | 14:51 |
jeblair | or... at least... unusual | 14:51 |
fungi | wow, that may be one side effect of giving approval rights to that repo we didn't consider! | 14:51 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 14:51 |
jeblair | it has the ipv4 address rather than hostname | 14:51 |
jrich | I noticed it had an IP address instead of the resolvable name. I changed that first, but still had the issue. | 14:52 |
*** tmorin has joined #openstack-infra | 14:52 | |
davidsha | fungi: Thanks! | 14:52 |
jrich | correction: after addressing the IP address, I had a different error =) | 14:52 |
fungi | jrich: git review -s sets some remote lines in the local .git/config based on what it finds in the .gitreview file, so fixing that file doesn't necessarily correct the behavior afterward | 14:53 |
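For reference, the file in question looks like the fragment below; values shown are for the sandbox repo, and an IP address in `host=` instead of the hostname is what broke the setup. (Written to a temp dir here; after fixing a real checkout you would re-run `git review -s` so the `gerrit` remote is regenerated.)

```shell
# git-review reads [gerrit] host/port/project from the repo's .gitreview
# file to build its "gerrit" remote and do a test push.
repo=$(mktemp -d)
cat > "$repo/.gitreview" <<'EOF'
[gerrit]
host=review.openstack.org
port=29418
project=openstack-dev/sandbox.git
EOF
cat "$repo/.gitreview"
```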
*** armax has joined #openstack-infra | 14:53 | |
tmorin | infraroot ? would someone be around to help on a gate job that (apparently) keeps restarting ( https://review.openstack.org/492142 ) | 14:53 |
tmorin | (sorry not a gate job, but the full set of gate jobs for a given change) | 14:54 |
jeblair | fungi, jrich: https://review.openstack.org/492560 | 14:54 |
*** sbezverk has quit IRC | 14:55 | |
tmorin | what I observe is that: all jobs restarted, at a point where most jobs for this change had passed, except two which were in progress | 14:55 |
*** quite has quit IRC | 14:55 | |
jeblair | fungi, jrich: fwiw, i did not have a git remote set up and used git-review to submit that successfully. so based on what fungi said, it may be worth completely deleting your local sandbox repo and trying again. | 14:55 |
*** LindaWang has quit IRC | 14:55 | |
tmorin | this has been occurring a few times in the past two hours | 14:56 |
jrich | jeblair: Hah, I checked it out 10 mins ago -=) | 14:56 |
*** LindaWang has joined #openstack-infra | 14:56 | |
fungi | tmorin: that's usually a sign that something in your change completely crashes or brings down network connectivity for the job node, so zuul thinks something unrelated happened to the node (ECLOUDS) and starts the job over for you | 14:56 |
jrich | jeblair: but I'll do it anyway since I manually added the remote to fix my issue. I want to try it fresh again to see if I can reproduce. | 14:56 |
jeblair | jrich: yes, let us know if it still happens | 14:56 |
fungi | tmorin: have you streamed the console log to see what point in the job it stops? is it roughly the same point each time? | 14:57 |
tmorin | fungi: nope, I'm currently behind a fw making this painful :-/ | 14:57 |
*** makowals_ has joined #openstack-infra | 14:57 | |
jeblair | fungi, tmorin: if *all* jobs restarted, that's because zuul reset the gate queue | 14:57 |
fungi | ahh, yes, i'm switching computers so i can more easily see whether the change in question is in the gate pipeline or in check | 14:58 |
*** Swami has joined #openstack-infra | 14:58 | |
fungi | i was (perhaps wrongly) assuming the latter | 14:58 |
tmorin | jeblair: I'm not sure I know what 'reset the gate queue' means | 14:58 |
jeblair | it's in gate with about 7 changes ahead, one failing | 14:58 |
*** makowals has quit IRC | 14:58 | |
fungi | yeah, so changes failing ahead of it causing it to be retested | 14:58 |
tmorin | yes, the issue is in 'gate' queue | 14:59 |
tmorin | fungi: ah, ok | 14:59 |
fungi | completely normal behavior, just didn't expect anyone to be surprised by that so my mind jumped to jobs getting requeued | 14:59 |
jeblair | tmorin: zuul takes all of the changes that have been approved and tests each one with the ones ahead of it | 14:59 |
*** lbragstad has joined #openstack-infra | 14:59 | |
jeblair | tmorin: if one of the changes ahead fails, it pulls that change out of the line, reorders the list, and starts jobs again | 14:59 |
*** sree has joined #openstack-infra | 15:00 | |
jeblair | tmorin: if you look at the status page at http://status.openstack.org/zuul/ you'll see that your change is behind 492120 which failed | 15:00 |
*** jamesmcarthur has quit IRC | 15:00 | |
jeblair | tmorin: you'll also note that change is disconnected from the others in that queue. so the most recent time all the jobs restarted on your change was when that one failed and zuul pulled it out | 15:00 |
*** felipemonteiro has joined #openstack-infra | 15:00 | |
tmorin | fungi, jeblair: ok, I had read about that a while ago, but somehow could accept this to be the reality today, given how often jobs have seemed to fail the past few days | 15:01 |
jeblair | tmorin: you can read more about this here: https://docs.openstack.org/infra/zuul/gating.html | 15:01 |
jrich | Bloody strange. Exception: Could not connect to gerrit at ssh://jrich@review.openstack.org:29418/openstack-dev/sandbox.git | 15:01 |
jrich | nc review.openstack.org 29418 | 15:01 |
jrich | SSH-2.0-GerritCodeReview_2.11.4-22-ge0c0f29 (SSHD-CORE-0.14.0) | 15:01 |
jeblair | jrich: do you get any more information if you add the '-v' option? | 15:02 |
jrich | jeblair: will try that next. Good idea. | 15:02 |
fungi | jrich: git-review is attempting to do a test push, and the sandbox repo has contributor license agreement enforcement turned on, so it may be confused by the rejection gerrit is giving it if you haven't agreed to the icla | 15:02 |
tmorin | jeblair, fungi: one general question on the overall CI load/slowness: can we expect to see this improve when some things get fixed / improved / scaled, or should we rather live with it for a while as a result of the OSIC decommissioning? | 15:03 |
jrich | fungi: that might be it. However, that would imply the project-config repo is missing a setting. (I was able to set that one up without a problem) | 15:04 |
fungi | tmorin: right now we're down not just osic, but also all of ovh, one region in citycloud and running infra-cloud at half-capacity due to network issues. we hope once some of those are addressed it will pick back up | 15:05 |
fungi | jrich: i don't believe we enforce a cla on project-config but i'm checking now | 15:05 |
jeblair | tmorin: unfortunately, all of those are out of our control :( | 15:05 |
*** jaypipes has quit IRC | 15:05 | |
*** LindaWang has quit IRC | 15:05 | |
*** jaypipes has joined #openstack-infra | 15:05 | |
jrich | fungi: winner winner chicken dinner. I had not accepted the icla on this account. | 15:05 |
tmorin | jeblair, fungi: :-( | 15:05 |
tmorin | fungi: thanks for the answer | 15:06 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Set basepython to python3 https://review.openstack.org/491594 | 15:06 |
*** derekh has quit IRC | 15:06 | |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Setup ANSIBLE_ROLES_PATH in tox.ini https://review.openstack.org/491595 | 15:06 |
*** yamamoto_ has quit IRC | 15:08 | |
*** dhajare has joined #openstack-infra | 15:08 | |
openstackgerrit | Kevin Carter (cloudnull) proposed openstack-infra/project-config master: Up the quota within RAX https://review.openstack.org/492566 | 15:09 |
* clarkb tries to catch up on all the fun | 15:09 | |
cloudnull | infra-core ^ | 15:09 |
clarkb | pabelanger: fungi is glean's config drive label thing still a problem? | 15:09 |
cloudnull | if you all can get that through I've spoken to pub cloud folks over here and they say we're good to up the limits for now | 15:10 |
clarkb | cloudnull: I've +2'd thank you! | 15:10 |
openstackgerrit | Slawek Kaplonski proposed openstack-infra/project-config master: Enable missing "qos" extension driver for Neutron ML2 plugin https://review.openstack.org/492567 | 15:10 |
*** yamamoto has joined #openstack-infra | 15:10 | |
jrich | If I may ask a process question: I've been reading up on all the processes to create a 3rd party neutron plugin for my company. I've been reading the creators page over and over and find myself a bit confused as some of it implies you might already have a repo in the openstack tree, and others imply you might use launchpad to host your code before it is accepted to the openstack tree. At what point would I ask infra to create an empty repo in the | 15:10 |
jrich | openstack tree? (I've got launchpad all setup, working on CI related matters now) | 15:10 |
*** slaweq has joined #openstack-infra | 15:10 | |
*** jamesmcarthur has joined #openstack-infra | 15:10 | |
*** kjackal_ has quit IRC | 15:11 | |
*** pgadiya has quit IRC | 15:11 | |
*** yamamoto has quit IRC | 15:11 | |
*** hareesh has quit IRC | 15:12 | |
cloudnull | clarkb: any way that could get in without having to go through the 5+ hours of gating ? | 15:12 |
clarkb | cloudnull: we can bypass the gate if deemed necessary. I was mostly afk yesterday and catching up now so I will let others more in the know make that judgement call | 15:12 |
cloudnull | ++ | 15:12 |
mnaser | clarkb i proposed a fix yesterday https://review.openstack.org/#/c/492353/ | 15:13 |
mnaser | i didnt want to recheck as i wasnt sure the suse thing was a real issue or a timeout | 15:13 |
mnaser | but i guess i can always throw a recheck | 15:13 |
*** jamesmcarthur has quit IRC | 15:15 | |
openstackgerrit | Merged openstack-infra/system-config master: Run htcacheclean under lock https://review.openstack.org/492481 | 15:16 |
clarkb | mnaser: I've gone ahead and approved it too | 15:16 |
*** ramishra has quit IRC | 15:16 | |
fungi | clarkb: i think we still need a glean release if that label change has merged | 15:17 |
tmorin | infraroot: I think I heard someone talk about a script that retrieves all logs for a test job run, is there something available somewhere doing that ? | 15:17 |
clarkb | fungi: ya we will need one once the change merges (if it merges...) | 15:17 |
*** quite has joined #openstack-infra | 15:19 | |
*** quite has quit IRC | 15:19 | |
*** quite has joined #openstack-infra | 15:19 | |
*** jamesmcarthur has joined #openstack-infra | 15:20 | |
clarkb | tmorin: the tox test runner knows to grab common tox command related logs (things like subunit and such for unittests) and devstack-gate knows how to get openstack related logs (openstack services, libvirt, mysql, and so on) | 15:21 |
*** annegentle has joined #openstack-infra | 15:21 | |
pabelanger | morning | 15:22 |
*** rlandy has joined #openstack-infra | 15:22 | |
fungi | clarkb: tmorin: the question may be more about how to retrieve a bundle of logs from the logs site for a completed test run? | 15:22 |
clarkb | ianw: thanks for digging into the mirror related items. Re the rackspace mirror being mostly odd on its own ya I think restarting may be a good place to start then we can apply mpm changes if problem persists | 15:22 |
pabelanger | clarkb: fungi: do you mind looking at https://review.openstack.org/492336 help avoid a 302 redirect from yum client on every request to buildlogs | 15:22 |
tmorin | clarkb: yes exactly, sorry if I wasn't clear | 15:22 |
clarkb | to retrieve a bundle of logs from the logs site I just use wget | 15:23 |
*** spzala has joined #openstack-infra | 15:23 | |
fungi | i used to do something for an ol ftp archive i ran, where requesting a directory name with .tar appended would recursively bundle it up and serve that to you, but no idea if that would be generally useful for logs.o.o or a possible nuisance for us | 15:24 |
fungi | and agreed, wget can be set to recursively mirror content for you as an alternative anyway | 15:25 |
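A sketch of the recursive wget just described, exercised against a throwaway local HTTP server instead of logs.openstack.org (the path layout is illustrative):

```shell
# Mirror a job-log directory tree with wget: -r recurse, -np never ascend
# above the starting directory, -nH drop the hostname directory, --cut-dirs
# trim leading path components, -R discard autogenerated index pages.
d=$(mktemp -d); mkdir -p "$d/logs/job"; echo 'job output' > "$d/logs/job/console.txt"
( cd "$d" && exec python3 -m http.server 8731 ) >/dev/null 2>&1 &
srv=$!; sleep 1
out=$(mktemp -d)
( cd "$out" && wget -q -r -np -nH --cut-dirs=1 -R 'index.html*' \
      http://127.0.0.1:8731/logs/job/ )
kill "$srv"
cat "$out/job/console.txt"
```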
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 15:25 |
openstackgerrit | Merged openstack-infra/elastic-recheck master: Add query for bug 1709744 https://review.openstack.org/492342 | 15:25 |
openstack | bug 1709744 in OpenStack-Gate "Gem fetch networking errors" [Undecided,New] https://launchpad.net/bugs/1709744 | 15:25 |
*** annegentle has quit IRC | 15:26 | |
fungi | pabelanger: for some reason i thought i had already reviewed that one, but i guess not | 15:26 |
pabelanger | fungi: I think you did review adding the buildlogs.cdn.centos patch; this one is new and collapses it back to a single entry | 15:27 |
*** sdake_ is now known as sdake | 15:27 | |
fungi | oh, so it does | 15:28 |
*** annegentle has joined #openstack-infra | 15:28 | |
clarkb | jeblair: fungi http://status.openstack.org/elastic-recheck/#1686542 shows there may be a recent drop off in job timeouts. Do we think that is likely related to reducing max-servers in infracloud? | 15:31 |
clarkb | oh I guess there was a change to pause image uploads there too | 15:31 |
*** thingee_ has joined #openstack-infra | 15:31 | |
fungi | clarkb: yeah, i think the combination of the two impacted that | 15:31 |
pabelanger | clarkb: and fixes to buildlogs.cdn.centos have helped a lot, we're now caching RPMs properly | 15:32 |
pabelanger | ruby gems / npm is the next ones to do I think | 15:32 |
pabelanger | https://review.openstack.org/457720 for npm | 15:33 |
tmorin | fungi, clarkb: yes, ok, recursive wget will do! thanks | 15:35 |
*** LindaWang has joined #openstack-infra | 15:35 | |
openstackgerrit | Sean Handley proposed openstack-infra/project-config master: Add Public Cloud WG project. https://review.openstack.org/489548 | 15:35 |
*** LindaWang has quit IRC | 15:36 | |
openstackgerrit | Merged openstack-infra/system-config master: Replace buildlogs.centos with buildlogs.cdn.centos https://review.openstack.org/492336 | 15:36 |
*** e0ne has quit IRC | 15:39 | |
seanhandley | AJaeger_: I rebased with master ^ Was that what your last comment was suggesting? I notice there's a "Conflicts with" on the right of the UI (never seen that before). | 15:39 |
openstackgerrit | Petr Benas proposed openstack/python-jenkins master: Allow specifying a non-standard port https://review.openstack.org/492478 | 15:42 |
*** jamesmcarthur has quit IRC | 15:42 | |
*** jamesmcarthur has joined #openstack-infra | 15:43 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 15:44 |
clarkb | infra-root with more understanding of load issues do we want to direct enqueue the glean fix for config drive labels and the rax quota bump changes into the gate so they merge more quickly? | 15:44 |
jeblair | clarkb: ++ | 15:45 |
clarkb | ok I'll go ahead and do that | 15:46 |
*** jamesmcarthur has quit IRC | 15:46 | |
pabelanger | Oh, we have more quota in RAX, nice | 15:46 |
*** jamesmcarthur has joined #openstack-infra | 15:46 | |
fungi | i'm on board with that suggestion | 15:47 |
clarkb | the enqueue command is not returning as quickly as I would expect. I will attempt to practice patience :) | 15:48 |
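The direct enqueue clarkb is running uses the zuul 2.x client; this sketch only assembles that command, and both the flag names (recalled from the zuul 2.x CLI) and the change id (the glean fix discussed above) should be treated as assumptions.

```python
# Assumed zuul 2.x client syntax for force-enqueuing a change into a
# pipeline; project/change values are illustrative.
def enqueue_cmd(project, change, pipeline="gate", trigger="gerrit"):
    return [
        "zuul", "enqueue",
        "--trigger", trigger,
        "--pipeline", pipeline,
        "--project", project,
        "--change", change,
    ]


cmd = enqueue_cmd("openstack-infra/glean", "492353,1")
```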
fungi | cloudnull: somehow i missed your change in the scrollback until pabelanger just mentioned it. thanks!!! | 15:48 |
pabelanger | clarkb: fungi: maybe we should consider bringing citycloud-sto2 online and monitor mirrors.sto2 this time. I think htcacheclean could explain some networking issues we see in that region | 15:48 |
clarkb | pabelanger: does that mirror have the problem that lon1 has/had? | 15:49 |
fungi | pabelanger: it wouldn't have explained our dns resolution failures there, but worth trying i guess | 15:49 |
pabelanger | fungi: clarkb: and with the buildlogs.cdn.centos changes, I think that will also help prevent the mirror issues we've seen | 15:49 |
pabelanger | fungi: Oh, right. DNS | 15:50 |
pabelanger | I forgot about that | 15:50 |
*** jpena|off is now known as jpena | 15:50 | |
fungi | more just that it's been a few weeks since we brought it to their attention, they mentioned some things they were looking into... then nothing | 15:50 |
fungi | so maybe they've fixed it already? | 15:50 |
clarkb | ok both changes are being enqueued, will hopefully merge in the near future | 15:50 |
pabelanger | maybe? or maybe we should try live chat like ianw did | 15:51 |
clarkb | may be worthwhile | 15:51 |
mordred | pabelanger: https://review.openstack.org/#/c/492567 <-- got a sec for a trivial +A? | 15:52 |
pabelanger | fungi: do you mind also looking at 457720 | 15:52 |
pabelanger | done | 15:52 |
mordred | thanks | 15:53 |
*** Apoorva has joined #openstack-infra | 15:56 | |
*** vhosakot has quit IRC | 15:58 | |
*** camunoz has quit IRC | 15:59 | |
*** jamesmcarthur has quit IRC | 15:59 | |
*** camunoz has joined #openstack-infra | 15:59 | |
*** jamesmcarthur has joined #openstack-infra | 15:59 | |
*** aarefiev is now known as aarefiev_afk | 16:00 | |
*** skelso has joined #openstack-infra | 16:01 | |
fungi | pabelanger: i don't know about live chat for something complex like intermittent network connectivity for all instances in a region. plus they escalated it to one of the engineers already who was e-mailing us about it | 16:01 |
*** annegentle has quit IRC | 16:01 | |
*** dklyle is now known as david-lyle | 16:02 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 16:03 |
*** jamesmcarthur has quit IRC | 16:05 | |
openstackgerrit | Merged openstack-infra/project-config master: Up the quota within RAX https://review.openstack.org/492566 | 16:06 |
openstackgerrit | Witold Bedyk proposed openstack-infra/project-config master: Add documentation jobs for monasca-api https://review.openstack.org/490569 | 16:07 |
*** jamesmcarthur has joined #openstack-infra | 16:07 | |
openstackgerrit | Merged openstack-infra/system-config master: Add registry.npmjs.org reverse proxy cache https://review.openstack.org/457720 | 16:09 |
pabelanger | clarkb: fungi: just saw a job fail to download from mirror.iad.rax.openstack.org, it had load of about 6 and also multiple htcacheclean processes, I've killed them for now and load is back down to under 1.0 | 16:10 |
pabelanger | we likely should audit all of the mirrors | 16:10 |
pabelanger | going to do that now | 16:10 |
*** yamamoto has joined #openstack-infra | 16:12 | |
clarkb | ok thanks | 16:12 |
dmsimard | clarkb, pabelanger, fungi, mordred: let me know if there are any low hanging fruits I can help with ? reviews or stuff I can fix. I'm not root or core but would like to help if I can. | 16:13 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Bind secrets to their playbooks https://review.openstack.org/492307 | 16:14 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Expose final job attribute https://review.openstack.org/479382 | 16:14 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove 'auth' dict from jobs https://review.openstack.org/492309 | 16:14 |
pabelanger | dmsimard: for now, I've just been keeping a close eye on jobs that are failing in gate (with help of elastic-recheck). | 16:14 |
fungi | thanks pabelanger, good idea | 16:14 |
pabelanger | mirror.iad same issue, load 6+, back under 1 once htcacheclean killed | 16:14 |
pabelanger | dfw* sorry | 16:15 |
fungi | dmsimard: yeah, helping us sort out what's slow or failing and why, trying to identify commonalities and bucket them together, is useful | 16:15 |
*** yamamoto has quit IRC | 16:17 | |
dmsimard | mhayden and I did notice some strange unexplained behavior with some jobs that seemed to be running with outdated images or project-config, I don't know if we have been able to get to the bottom of it but maybe there is a correlation | 16:17 |
*** eroux has quit IRC | 16:17 | |
dmsimard | mhayden: did someone figure that one out ? | 16:17 |
fungi | dmsimard: that would be great to dig into. it doesn't sound like a problem i'm aware of yet (other than the very brief mention in scrollback earlier) | 16:18 |
fungi | dmsimard: do you have any details? | 16:18 |
mhayden | not yet | 16:18 |
*** Swami has quit IRC | 16:19 | |
mhayden | i supplied a few examples to mordred yesterday -- un momento | 16:19 |
*** mriedem is now known as mriedem_away | 16:20 | |
dmsimard | fungi: to make a long story short, we had jobs that were randomly not able to install a pyopenssl package, even long after the supposed fix had landed | 16:20 |
dmsimard | I can surely come up with a logstash query, hang on. | 16:20 |
*** martinkopec has quit IRC | 16:21 | |
mhayden | fungi: https://gist.github.com/major/4a6760f1f90303625061b40d16b79374 | 16:21 |
fungi | dmsimard: mhayden: a link to the expected fix would also be helpful | 16:21 |
mhayden | fungi: well, the fix is already merged into project-config, and has been for 5-6 days | 16:21 |
mhayden | however, some nodes come up with old versions of project-config | 16:22 |
mhayden | prior to the fix | 16:22 |
*** eroux has joined #openstack-infra | 16:22 | |
fungi | mhayden: right, that's why i'd like to know which one it was | 16:22 |
mhayden | oh, i see what you mean | 16:22 |
mhayden | :) | 16:22 |
fungi | nodes don't themselves necessarily rely on project-config, which is part of my confusion | 16:22 |
*** ccamacho has quit IRC | 16:22 | |
fungi | so trying to start from the bottom and work my way up | 16:22 |
mhayden | fungi: https://github.com/openstack-infra/project-config/commit/c6cc5abe77ebcab2f55fffbc8ec1ee1c27c13074 | 16:23 |
dmsimard | mhayden, fungi: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Package%20pyOpenSSL-0.15.1-1.el7.noarch%20is%20obsoleted%20by%20python2-pyOpenSSL-16.2.0-3.el7.noarch%20which%20is%20already%20installed%5C%22 | 16:23 |
dmsimard | message:"Package pyOpenSSL-0.15.1-1.el7.noarch is obsoleted by python2-pyOpenSSL-16.2.0-3.el7.noarch which is already installed" | 16:23 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename allow-secrets to allow-untrusted-secrets https://review.openstack.org/492614 | 16:23 |
mhayden | fungi: well, that last one is missing, as is https://github.com/openstack-infra/project-config/commit/5fd623e1c30cf73faf01c355b1f769a68794aa49 | 16:23 |
*** tmorin has quit IRC | 16:23 | |
pabelanger | fungi: clarkb: I think I got all the mirrors, each had multiple processes of htcacheclean running. RAX improved the most | 16:24 |
pabelanger | but did notice: htcacheclean -n -d120 -i -p/var/cache/apache2/mod_cache_disk -l300M | 16:24 |
pabelanger | not sure where that is getting called from | 16:24 |
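A minimal sketch of the audit being done by hand here: given `ps -eo pid,args` output, pick out the htcacheclean pids so stray copies can be spotted and killed. The sample text mirrors the invocation quoted above; the parsing helper is an illustration, not infra tooling.

```python
def htcacheclean_pids(ps_output):
    """Return pids of htcacheclean processes from `ps -eo pid,args` text."""
    pids = []
    for line in ps_output.strip().splitlines():
        pid, _, args = line.strip().partition(" ")
        if args.startswith("htcacheclean") or "/htcacheclean" in args:
            pids.append(int(pid))
    return pids


sample = """\
  812 /usr/sbin/apache2 -k start
 1201 htcacheclean -n -d120 -i -p/var/cache/apache2/mod_cache_disk -l300M
 1547 htcacheclean -n -d120 -i -p/var/cache/apache2/mod_cache_disk -l300M
"""
# More than one pid means cleaners have piled up (the load spikes seen on
# the rax mirrors); kill the extras and load should drop back under 1.
```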
*** pcaruana has quit IRC | 16:25 | |
*** psachin has joined #openstack-infra | 16:26 | |
dmsimard | mhayden, fungi: I'll actually go ahead and create a bug for that and create an elastic recheck query. | 16:28 |
dmsimard | since it's quite easy to track down in logstash | 16:29 |
mhayden | dmsimard: teach me the ways when you're done ;) | 16:29 |
*** lucasagomes is now known as lucas-afk | 16:32 | |
*** hongbin has joined #openstack-infra | 16:33 | |
openstackgerrit | Gabriele Cerami proposed openstack-infra/tripleo-ci master: WIP: containers periodic test https://review.openstack.org/475747 | 16:33 |
*** rcernin has quit IRC | 16:33 | |
openstackgerrit | David Moreau Simard proposed openstack-infra/elastic-recheck master: Add query for PyOpenSSL installation failures https://review.openstack.org/492616 | 16:34 |
dmsimard | mhayden: ^ | 16:34 |
mhayden | thanks | 16:34 |
dmsimard | mhayden: however ironically it might take a long time to land due to the state of the gate :( | 16:35 |
openstackgerrit | Merged openstack-infra/glean master: Add checks for uppercase config drive label https://review.openstack.org/492353 | 16:36 |
pabelanger | mnaser: clarkb: Yay^ | 16:38 |
oanson | Hi. I am trying to add neutron-dynamic-routing tempest tests to Dragonflow's tempest gate. I've added the (^neutron_dynamic_routing...) regex, but I am not sure how to tell it to load neutron-dynamic-routing's tempest plugin. Could someone please assist? | 16:38 |
clarkb | mhayden: how does redirecting stdout fix openssl? | 16:38 |
fungi | mhayden: dmsimard: okay, shifted gears for a sec so i can look into this... looks like the error from that query has appeared in logs going back at least a week according to logstash | 16:39 |
mhayden | clarkb: that was a secondary patch | 16:39 |
clarkb | oanson: for tempest plugin help #openstack-qa is probably better location to ask | 16:39 |
oanson | clarkb, Sure. Will cross post. Thanks! | 16:39 |
dmsimard | clarkb: yeah mhayden confused everyone with his devnull patch, you have to look for the one before that :p | 16:39 |
mhayden | clarkb: this was the main patch -> https://github.com/openstack-infra/project-config/commit/5fd623e1c30cf73faf01c355b1f769a68794aa49 | 16:39 |
mhayden | the CentOS image has *tons* of repos enabled | 16:40 |
mhayden | so we disable all except the basics in our gates, to more closely simulate a production environment | 16:40 |
dmsimard | yeah, it installs centos-release-openstack-ocata which bundles about 4 repos I think | 16:40 |
dmsimard | rdo, virt, virt-common, ceph | 16:40 |
dmsimard | (probably shouldn't do that btw) | 16:40 |
mhayden | i was tempted to propose a patch to stop doing that, but i don't know how the RDO/triple-o folks feel about such things ;) | 16:40 |
*** ykarel_ has quit IRC | 16:41 | |
mhayden | i would hope that their gate jobs would ensure that repo is present | 16:41 |
pabelanger | mhayden: dmsimard: yes, I want to remove that too. I wanted to discuss it with ianw first however | 16:41 |
dmsimard | mhayden: maybe pabelanger or ianw would know why those are there. I know that OOO jobs get rid of all the repos too | 16:41 |
fungi | so for starters, the suspected fix was to the {pipeline}-{name}-ansible-{scenario}-{ostype}{suffix} template | 16:41 |
fungi | most recent hit i pulled from logstash was in a job called gate-openstack-ansible-os_aodh-ansible-func-centos-7 | 16:42 |
mhayden | sounds reasonable | 16:42 |
pabelanger | dmsimard: mhayden: we add rdo-ocata repo in images right now to install things like openvswitch. Need to see when that package gets installed | 16:42 |
mhayden | pabelanger: happy to join that conversation as i seem to be the RPM flag carrier in the land of openstack-ansible ;) | 16:42 |
fungi | openstack-ansible-os_aodh is a project i guess? | 16:42 |
mhayden | fungi: yes, it deploys aodh | 16:42 |
fungi | yeah, just confirmed from the project column in logstash | 16:43 |
fungi | okay, trying to make sure we were at least looking at a job built from this template | 16:43 |
mhayden | pabelanger: ah okay -- our expectation is that we would need to ensure the proper repos are present when our gate job scripts run (the ones from the OSA repos) | 16:43 |
fungi | it's a shame the shell block there isn't set -x so we can see the commands | 16:44 |
pabelanger | mhayden: I am thinking we'd do the same thing we did for EPEL, only enable it for the package we need, but leave it disabled / removed | 16:44 |
mhayden | we don't expect the nodepool image to contain those repos, if that makes sense -- we expect to configure them ourselves since our prod envs likely wouldn't have them installed | 16:44 |
mhayden | pabelanger: 100% agreed | 16:44 |
pabelanger | mhayden: but like I said, need to confirm with ianw | 16:44 |
odyssey4me | I think ianw works in a EU timezone, so we'd likely only get feedback tomorrow. | 16:44 |
dmsimard | fungi: FWIW you'll find some of those errors in ARA gate too. Which is one of the reasons I turned those jobs non-voting .. but I suspect they run less than the jobs in the OSA gate and thus occur less often. | 16:44 |
dmsimard | odyssey4me: AU timezone :) | 16:45 |
fungi | oh, i guess it does inherit a -x | 16:45 |
fungi | zuul does that by default i guess | 16:45 |
odyssey4me | ah, I stand corrected - he's always on in my morning - so that makes sense :) | 16:45 |
fungi | and yes looks like `sudo yum-config-manager --enable epel` gets run but not the others that patch should have added | 16:45 |
openstackgerrit | Monty Taylor proposed openstack-dev/pbr master: Put test-requirements into an extra named 'test' https://review.openstack.org/492619 | 16:46 |
openstackgerrit | Monty Taylor proposed openstack-dev/pbr master: Add support for a docs extra aligned with RTD locations https://review.openstack.org/492620 | 16:46 |
fungi | so this probably suggests we're behind in updating configuration for one of the launchers | 16:46 |
openstackgerrit | Gabriele Cerami proposed openstack-infra/tripleo-ci master: WIP: containers periodic test https://review.openstack.org/475747 | 16:46 |
clarkb | iirc those repos are there for ovs | 16:46 |
pabelanger | odyssey4me: dmsimard: right now, it looks like python-pip / python2-setuptools are getting installed from centos-openstack-ocata: http://nb04.openstack.org/dib.centos-7.log | 16:46 |
fungi | "Launched by zl02" on this particular example | 16:46 |
clarkb | if rhel would be so kind to realize people use ovs without openstack this would bea non issue :) | 16:46 |
dmsimard | pabelanger: fair for python-pip, python2-setuptools is also in base OS so not required | 16:46 |
mhayden | fungi: your assessment sounds spot-on | 16:47 |
fungi | the "build_master" column in logstash gets us that | 16:47 |
fungi | and they _all_ look to be zl02 | 16:47 |
dmsimard | fungi: that seems like a winner | 16:47 |
fungi | so i think this means zl02 isn't getting config updates for some reason | 16:47 |
odyssey4me | pabelanger clarkb given that centos isn't used for devstack (or does it?) - why would OVS need to be there for jobs? | 16:47 |
pabelanger | dmsimard: right, we likely need to build a centos-7 DIB with centos-openstack-ocata and see what breaks in the build. Then find which projects (devstack) are installing packages for it, like OVS | 16:47 |
fungi | i'll check out that launcher | 16:47 |
dmsimard | odyssey4me: they are trying to make it run devstack, though. | 16:48 |
odyssey4me | dmsimard ah, I see | 16:48 |
*** dhajare has quit IRC | 16:48 | |
pabelanger | odyssey4me: we use it for multinode jobs, but need to check I think they are still non-voting | 16:48 |
clarkb | odyssey4me: it is used for devstack and multinode testing | 16:48 |
odyssey4me | surely whatever uses it should have the tolling to add the repo it needs instead of building the repo config into the image? | 16:49 |
clarkb | pabelanger: devstack-gate is what will break btw | 16:49 |
odyssey4me | *tooling | 16:49 |
fungi | the /etc/project-config/jenkins/jobs/ansible-role-jobs.yaml file on zl02 looks current (has the fix), so project-config itself is getting updates there, suggesting the launcher daemon isn't loading them successfully. now to see why | 16:49 |
clarkb | yes there are lots of options | 16:49 |
clarkb | iirc we decided to do it this way so that the images would be roughly equivalent | 16:50 |
*** Swami has joined #openstack-infra | 16:50 | |
clarkb | rather than needing to special case every silly distro derp | 16:50 |
pabelanger | clarkb: thanks | 16:51 |
*** bhavik1 has joined #openstack-infra | 16:51 | |
odyssey4me | clarkb yeah, it's hard to balance the cost/benefit for all needs | 16:52 |
jeblair | launcher-debug.log.14.gz:2017-07-27 06:54:22,262 DEBUG zuul.CommandSocket: Received reconfigure from socket | 16:52 |
jeblair | fungi: last command received ^ | 16:52 |
*** jamesmcarthur has quit IRC | 16:52 | |
fungi | jeblair: thanks, i was grepping through them for reconfig but hadn't gotten that far back in time yet | 16:52 |
fungi | probably need to look at puppet logs instead in that case | 16:53 |
*** slaweq has quit IRC | 16:53 | |
*** caphrim007 has joined #openstack-infra | 16:53 | |
fungi | Aug 10 09:39:47 zl02 puppet-user[7913]: (/Stage[main]/Zuul::Launcher/Exec[zuul-launcher-reload]) Triggered 'refresh' from 1 events | 16:54 |
*** marst_ has quit IRC | 16:54 | |
*** marst_ has joined #openstack-infra | 16:54 | |
fungi | no errors there | 16:54 |
jeblair | root@zl02:~# zuul-launcher --help | 16:55 |
jeblair | zuul-launcher: command not found | 16:55 |
fungi | it's new-style reconfigure so not signal handler based... maybe it's stopped watching its fifo for that? | 16:55 |
*** jamesmcarthur has joined #openstack-infra | 16:55 | |
fungi | oh, weird | 16:55 |
mordred | fungi, clarkb, odyssey4me: I've been thinking recently that it would be 'nice' to be able to list alternate repos that are needed for bindep | 16:56 |
fungi | cannot access /usr/local/bin/zuul-launcher: No such file or directory | 16:56 |
pabelanger | eep | 16:56 |
dmsimard | mordred: I think I created a story or a bug somewhere with ianw to support a syntax like [platform:epel] or something | 16:56 |
* dmsimard searches | 16:56 | |
fungi | pip list doesn't show zuul installed | 16:56 |
mordred | ran into a similar thing the other day where there was a tool that needed a repo added but there was no way to communicate that other than the readme, which meant putting the tool dependency into bindep was a non-starter | 16:56 |
*** Guest34949 has quit IRC | 16:56 | |
pabelanger | fungi: pip3? | 16:57 |
pabelanger | fungi: possible it was installed as python3? | 16:57 |
fungi | pabelanger: doesn't look like it either, no | 16:57 |
*** yamahata has quit IRC | 16:57 | |
pabelanger | odd indeed | 16:57 |
mordred | dmsimard: yah- something like that - although the specific case I had was like [platform:ubuntu:repo:https://example.com/ubuntu:gpg-id:223452] | 16:57 |
jeblair | fungi, pabelanger: syslog has aged out far enough for us to lose relevant info i think | 16:58 |
fungi | so best guess, zuul somehow got uninstalled late last month on zl02, but we don't have syslog going back that far | 16:58 |
fungi | right, that | 16:58 |
mordred | which is obviously a terrible ui for that - but was the thing I wanted to be able to express and couldn't | 16:58 |
mordred | fungi, jeblair: that's both very strange and also I'm sad there is no more log :( | 16:59 |
jeblair | fungi, pabelanger: shall i "pip install /opt/zuul" ? | 16:59 |
fungi | jeblair: though next question, why isn't puppet installing it? | 16:59 |
jeblair | do we want a -U in there? | 16:59 |
odyssey4me | mordred we've always wanted to be able to point to a remote bindep file to reduce the requirement to synchronise things all over the place, but that's a whole different story | 16:59 |
jeblair | fungi: i think we only install on git repo updates | 16:59 |
odyssey4me | s/always/also/ | 16:59 |
jeblair | fungi: oh, the date of the last commit on that git repo is july 27 | 17:00 |
mordred | odyssey4me: nod. that would also be nice | 17:00 |
*** dtantsur is now known as dtantsur|afk | 17:00 | |
jeblair | fungi: it seems likely that pip *uninstalled* it for us then | 17:00 |
jeblair | er puppet | 17:00 |
fungi | jeblair: okay, so that's more than coincidence at least | 17:00 |
fungi | jeblair: how about this: roll back the on-disk repo to HEAD^1 and see what puppet does next? | 17:00 |
odyssey4me | heh, the puppetmaster has stolen its ghost | 17:00 |
jeblair | possibly it uninstalled it, then hit an error, then since it only fires on repo updates, never got around to fixing it. | 17:00 |
mordred | oh - yah - the uninstall/reinstall dance maybe died in the middle | 17:01 |
mordred | since that's how upgrades work | 17:01 |
fungi | worried that a manual run of pip won't turn up the same issue as puppet's attempt for heisenbug reasons | 17:01 |
*** baoli has quit IRC | 17:01 | |
*** ralonsoh has quit IRC | 17:01 | |
mnaser | pabelanger awesome glad the glean patch merged | 17:01 |
fungi | but yeah, could have been something as innocuous as a network error hitting pypi | 17:02 |
mordred | maybe we should also put in a puppet resource that will run the pip install if /usr/local/bin/zuul-launcher doesn't exist? | 17:02 |
jeblair | fungi: i'd wager the bug was transient, but if you want to test it with HEAD^1, i think that would work and get us a little more data. have at it (i'll stand down) | 17:02 |
openstackgerrit | John L. Villalovos proposed openstack/gertty master: Change usage of exit() to sys.exit() https://review.openstack.org/492622 | 17:02 |
fungi | mordred: seems like good belt-and-braces engineering to me | 17:02 |
mordred | (assuming any of us know how to express such a thing in puppet) | 17:02 |
jeblair | mordred: ++ | 17:02 |
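The guard mordred goes on to propose in 492624 is written in puppet; purely as an illustration of the logic (paths taken from the discussion above, and the `-U` is jeblair's open question, so treat it as an assumption), it amounts to:

```python
# Sketch of the belt-and-braces check: if the console script has vanished
# (as it did on zl02), reinstall zuul from the on-disk git repo.
import os


def zuul_install_cmd(binary="/usr/local/bin/zuul-launcher",
                     src="/opt/zuul",
                     exists=os.path.exists):
    """Return the reinstall command if the console script is missing, else None."""
    if exists(binary):
        return None
    return ["pip", "install", "-U", src]


# Simulate the zl02 state: binary gone, so a reinstall is wanted.
cmd = zuul_install_cmd(exists=lambda path: False)
```

The injected `exists` predicate just keeps the decision testable without touching the filesystem.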
clarkb | pabelanger: odyssey4me also the reason we use ovs instead of linux bridges has to do with vxlan support which may not be a problem anymore for linux bridge? We can't GRE in all clouds as it is its own ip protocol and you can't reliably enable it via neutron | 17:03 |
mnaser | i'd kindly ask if it is possible to request the following to be promoted - https://review.openstack.org/#/c/492558/ -- we've been blocked in magnum for quite some time due to our dependency on fedorapeople.org and it being really slow, almost all jobs are timing out (wasting resources) | 17:03 |
clarkb | pabelanger: odyssey4me all that to say we could potentially go back to linux bridge with vxlan instead of gre assuming linux bridge vxlan support has grown enough to support it | 17:03 |
mnaser | with that patch, we will get those mirrored and we'll be able to churn less failed tests :> | 17:03 |
odyssey4me | clarkb yep, LXB has had VXLAN since trusty | 17:03 |
clarkb | and just bypass ovs repo trouble entirely | 17:03 |
clarkb | odyssey4me: ya but iirc it can't do broadcasts or something without multicast whereas ovs has hacks for that? | 17:04 |
fungi | jeblair: okay, because of merge commits i guess "Your branch is behind 'origin/master' by 2 commits" now that i've reset it to HEAD^1 | 17:04 |
clarkb | odyssey4me: its easy enough to test by pushing a change to run with linux bridge and seeing if it works though | 17:04 |
*** baoli has joined #openstack-infra | 17:04 | |
openstackgerrit | Merged openstack-infra/project-config master: Add periodic-stable jobs to oslo projecst that assert stable:follows-policy https://review.openstack.org/491980 | 17:04 |
fungi | i'm tailing syslog on zl02 now filtering for puppet | 17:04 |
odyssey4me | clarkb since the inception of OSA in its previous life as os-ansible-deployment (icehouse) we were using vxlan... so I think those days are long, long gone | 17:04 |
openstackgerrit | Merged openstack-infra/project-config master: Add release notes jobs for python-swiftclient https://review.openstack.org/491940 | 17:04 |
clarkb | odyssey4me: with multicast though or without? | 17:05 |
odyssey4me | that I couldn't answer - perhaps mhayden or cloudnull are aware of such things? | 17:05 |
*** ociuhandu has quit IRC | 17:06 | |
clarkb | but ya simple enough to switch it over in devstack gate and just see if anything breaks :) | 17:06 |
openstackgerrit | Merged openstack-infra/project-config master: Add periodic python jobs to kolla https://review.openstack.org/491133 | 17:06 |
*** bh526r has quit IRC | 17:06 | |
clarkb | if I find time today I may push that up | 17:06 |
pabelanger | clarkb: I'm going to try: yum-config-manager --disable centos-openstack-ocata; yum -y install --enablerepo=centos-openstack-ocata openvswitch. That should also work | 17:07 |
pabelanger | we do the same thing with haveged and EPEL | 17:07 |
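pabelanger's disable-then-whitelist pattern (the same approach used for haveged and EPEL) comes down to two commands; this sketch only builds them, using the repo and package names from the log.

```python
# Keep a repo disabled by default, enabling it only for the specific
# packages that need it.
def repo_pin_cmds(repo, packages):
    disable = ["yum-config-manager", "--disable", repo]
    install = ["yum", "-y", "install", "--enablerepo=%s" % repo] + list(packages)
    return disable, install


disable, install = repo_pin_cmds("centos-openstack-ocata", ["openvswitch"])
```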
clarkb | pabelanger: in devstack-gate? | 17:07 |
pabelanger | clarkb: ya, see what breaks | 17:07 |
clarkb | note that we use devstack's install routines iirc | 17:07 |
pabelanger | ok | 17:08 |
odyssey4me | clarkb please add me to the review, I plan on doing a bunch of work with OSA to use devstack-gate in Queens so I need to get more familiar with it | 17:08 |
*** electrofelix has quit IRC | 17:08 | |
odyssey4me | we'd like to converge tooling for host prep where possible to stop reinventing wheels | 17:08 |
*** sree has quit IRC | 17:08 | |
openstackgerrit | Merged openstack-infra/project-config master: Revert "Change job type for 3nodes job to move to oooq runner" https://review.openstack.org/491477 | 17:09 |
pabelanger | we're also about to start the push to ansiblify devstack-gate for zuulv3 too | 17:09 |
odyssey4me | oh, good show | 17:09 |
odyssey4me | I would be very happy to review anything along that line. In fact, let me add that repo to my review dashboard. | 17:10 |
fungi | yeah, the eventual goal there is to decompose a lot of the reusable bits of devstack-gate as ansible roles so projects can mix-n-match them separate of the devstack-gate framework as a whole | 17:10 |
openstackgerrit | Monty Taylor proposed openstack-infra/puppet-zuul master: Ensure zuul gets reinstalled if it's missing https://review.openstack.org/492624 | 17:11 |
mordred | clarkb, fungi, jeblair, pabelanger: ^^ how does that look? | 17:12 |
*** iyamahat has joined #openstack-infra | 17:12 | |
odyssey4me | certainly works as a goal - which will hopefully bring the general projects using ansible together for test/infra bits | 17:13 |
*** marst_ has quit IRC | 17:13 | |
jeblair | mordred: i wonder if install_zuul can just have its own creates? or does refreshonly prohibit that? | 17:13 |
clarkb | refresh only won't prohibit it but would undermine it, since those conditions will be ANDed iirc | 17:14 |
odyssey4me | and once PR's are enabled as a patch submission tool for openstack-infra, it might make it possible to share with non-openstack projects too | 17:14 |
*** annegentle has joined #openstack-infra | 17:14 | |
*** Apoorva_ has joined #openstack-infra | 17:14 | |
jeblair | odyssey4me: can you elaborate on "once PR's are enabled as as a patch submission tool for openstack-infra" ? | 17:14 |
odyssey4me | well, it's my understanding that PR's via github may become a possible way to submit patches to openstack repositories | 17:15 |
*** iyamahat has quit IRC | 17:15 | |
mordred | jeblair: I think refreshonly breaks that | 17:15 |
*** iyamahat has joined #openstack-infra | 17:15 | |
mordred | or, yeah, what clarkb said | 17:15 |
*** iyamahat has quit IRC | 17:16 | |
*** iyamahat has joined #openstack-infra | 17:16 | |
*** slaweq has joined #openstack-infra | 17:16 | |
jeblair | odyssey4me: is there a spec about that? | 17:16 |
fungi | odyssey4me: we'd need some tool to ingest those, and also to finish the work necessary to drop the icla in favor of the dco | 17:16 |
clarkb | odyssey4me: ah ok looks like you can set bridge fdb rules to forward all l2 traffic to the null address to all interfaces. This way you don't have to run l2pop | 17:17 |
clarkb | odyssey4me: that is probably workable for this setup since we are talking small numbers of hosts (also I think it is equivalent to how ovs hacks around this problem) | 17:17 |
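The fdb trick clarkb describes is head-end replication: one all-zeros fdb entry per remote host so broadcast/unknown-unicast frames get copied to every peer without multicast or l2pop. This sketch just builds the `bridge fdb append` commands; the device name and peer addresses are made up.

```python
# Build one flood entry per peer for a unicast-only vxlan interface.
def fdb_flood_cmds(dev, peers):
    """Return `bridge fdb append` argv lists flooding BUM traffic to each peer."""
    return [
        ["bridge", "fdb", "append", "00:00:00:00:00:00",
         "dev", dev, "dst", peer]
        for peer in peers
    ]


# Three test hosts in a multinode job, for example:
cmds = fdb_flood_cmds("vxlan42", ["10.0.0.2", "10.0.0.3", "10.0.0.4"])
```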
*** Apoorva has quit IRC | 17:18 | |
odyssey4me | jeblair everything I just said was from hearsay - don't believe a word of it ;) | 17:18 |
odyssey4me | I'm not sure how far along that is from idea to reality, or if it will ever be. | 17:18 |
jeblair | odyssey4me: heh, ok. yeah i think it's more at the idea stage. could be reality eventually, but i don't think there's any current work planned. | 17:19 |
openstackgerrit | Paul Belanger proposed openstack-infra/openstack-zuul-jobs master: WIP: Add upload-pypi job https://review.openstack.org/491926 | 17:19 |
openstackgerrit | Paul Belanger proposed openstack-infra/openstack-zuul-jobs master: Create publish-openstack-python(-branch)-tarball jobs https://review.openstack.org/491093 | 17:19 |
odyssey4me | I like any idea that broadens the community that can contribute and use what we work on. | 17:19 |
fungi | odyssey4me: it's been discussed as a hypothetical once we can rely on the dco (update the pull request closer to instead leave comments if signed-off-by is missing, git-am any with a signed-off-by and close with a link to the gerrit review) but we're far from being able to implement that for logistical reasons for now | 17:19 |
*** bhavik1 has quit IRC | 17:20 | |
fungi | odyssey4me: the main catch right now is that we need contributors to official deliverables to have agreed to the individual contributor license agreement first | 17:20 |
odyssey4me | Not that contributing to openstack is hard, or consuming what we build is hard... certainly with the CLA requirement the barrier is lower now. | 17:20 |
odyssey4me | by that I mean http://lists.openstack.org/pipermail/openstack-dev/2017-August/120771.html | 17:21 |
Diabelko | how can I push job result to Gerrit so it will get picked up by test_result_table? can't find that part anywhere | 17:21 |
fungi | puppet is starting to update zl02 now | 17:22 |
mordred | odyssey4me: also, someone will need to write and maintain a thing that can do sync/forward between PRs and gerrit changes, which would increase the surface area of 'important things that are bad when they break' related to github | 17:22 |
odyssey4me | mordred yes, there is that | 17:22 |
odyssey4me | and with the shrinking resource base of people both capable and able to do such a thing, I guess it's more likely to remain an idea | 17:22 |
fungi | mordred: i expect the first version would just close ingested pull requests with a link to the corresponding change in gerrit and some blurb about how to update it via further pushes directly to gerrit | 17:23 |
mordred | odyssey4me: yah - but discussing doing that will be legally possible with the switch to dco | 17:23 |
mordred | fungi: yup | 17:23 |
fungi | the icla->dco switch is something i want to start focusing on soon, personally | 17:24 |
fungi | just trying to get myself a little un-buried first | 17:24 |
fungi | hopefully not being a ptl will help ;) | 17:24 |
mnaser | i would like to ask if it's possible to get https://review.openstack.org/#/c/492558/ bumped up the queue? magnum is completely blocked because the image we download is failing and it's churning job timeouts for no reason | 17:24 |
*** rbrndt has quit IRC | 17:24 | |
caphrim007 | mordred: are you aware of any update to pbr breaking the shade install? | 17:26 |
*** baoli has quit IRC | 17:26 | |
fungi | infra-root: zm02 now has zuul (2.5.3.dev2) installed again, so whatever caused it to get left uninstalled back on the 27th was transient as suspected | 17:27 |
*** baoli has joined #openstack-infra | 17:27 | |
*** baoli has quit IRC | 17:27 | |
odyssey4me | mhayden ^ | 17:27 |
odyssey4me | dmsimard & | 17:28 |
odyssey4me | thanks y'all for getting that figured out | 17:28 |
mhayden | fungi: thanks so much for digging into that | 17:28 |
fungi | thanks to everyone who reported it. this was an unusual situation, to be sure | 17:28 |
odyssey4me | it'd still be interesting to understand how it happened - likely a failed install | 17:28 |
dmsimard | sweet | 17:28 |
mordred | caphrim007: I am not aware of such a thing - are you seeing such an issue? | 17:28 |
dmsimard | fungi: thanks for tracking that down, hopefully it will help the gate to some extent | 17:28 |
caphrim007 | mordred: i saw this today https://gist.github.com/caphrim007/21f12e899c212ae076e9360b8ac58287 | 17:29 |
*** jamesmcarthur has quit IRC | 17:29 | |
openstackgerrit | Omer Anson proposed openstack-infra/project-config master: Add nuetron-dynamic-routing to Dragonflow's tempest's local.conf https://review.openstack.org/492626 | 17:29 |
openstackgerrit | John L. Villalovos proposed openstack/gertty master: Change usage of exit() to sys.exit() https://review.openstack.org/492622 | 17:29 |
*** jamesmcarthur has joined #openstack-infra | 17:29 | |
caphrim007 | mordred: pbr==3.0.0 didnt show a similar error | 17:29 |
fungi | mnaser: i went ahead and enqueued 492558,1 into the gate, so hopefully should land shortly | 17:30 |
mnaser | fungi thank you so much! | 17:30 |
mordred | caphrim007: interesting. I'm not sure why pbr version would affect that - but lemme look real quick | 17:30 |
*** jpena is now known as jpena|mtg | 17:30 | |
odyssey4me | very happy to see the nodes in-use count climb after the quota increase merge | 17:30 |
*** markus_z has quit IRC | 17:30 | |
fungi | mnaser: the concern that we were using a lot of additional test resources on continually failing magnum changes seemed reasonable | 17:31 |
*** yamahata has joined #openstack-infra | 17:31 | |
* fungi thinks things may have finally quieted down enough he can finish catching up on this morning's e-mail | 17:32 | |
mnaser | fungi: yeah, i figured as much. unfortunately most of the cores are in the EU so i would have had to ask them to abandon/restore to stop the jobs till we land this + patch for mirrors | 17:32 |
mnaser | but cant have it all D: | 17:32 |
caphrim007 | mordred: erm...now i can't reproduce it. i guess nevermind? | 17:32 |
openstackgerrit | David Shrewsbury proposed openstack-infra/project-config master: Disable py2 dsvm on nodepool feature/zuulv3 branch https://review.openstack.org/492629 | 17:32 |
Shrews | pabelanger: ^^^ i think this needs to be done for nodepool too? | 17:33 |
mordred | caphrim007: maybe you got lucky! definitely ping if you see it again, it looks like an issue in a transitive depend so it would be weird for a pbr version to be involved | 17:33 |
Shrews | pabelanger: i'm not sure about the coverage job | 17:33 |
pabelanger | Shrews: left question | 17:35 |
openstackgerrit | Sean McGinnis proposed openstack-infra/project-config master: Skip bandit and functional tests for doc changes https://review.openstack.org/492630 | 17:35 |
*** jamesmcarthur has quit IRC | 17:36 | |
*** baoli has joined #openstack-infra | 17:37 | |
Shrews | pabelanger: where are the gate jobs defined? | 17:38 |
*** shardy has quit IRC | 17:38 | |
*** krtaylor has quit IRC | 17:38 | |
pabelanger | Shrews: same place, just add a gate: key | 17:39 |
Shrews | pabelanger: what controls the gate jobs when not explicitly listed? | 17:40 |
Shrews | oh, probably in the templates | 17:41 |
pabelanger | Shrews: the templates above, they have gate jobs listed also | 17:41 |
openstackgerrit | Merged openstack-infra/system-config master: Stop rsync from managing setgid permissions for Fedora Atomic mirror https://review.openstack.org/492558 | 17:43 |
*** annegentle has quit IRC | 17:44 | |
*** slaweq_ has joined #openstack-infra | 17:46 | |
*** SumitNaiksatam has joined #openstack-infra | 17:48 | |
*** mriedem_away is now known as mriedem | 17:48 | |
*** slaweq has quit IRC | 17:50 | |
*** ociuhandu has joined #openstack-infra | 17:50 | |
*** sekelso has joined #openstack-infra | 17:50 | |
*** gouthamr_ has joined #openstack-infra | 17:51 | |
*** gouthamr has quit IRC | 17:51 | |
*** xarses_ has joined #openstack-infra | 17:52 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/project-config master: Disable py2 dsvm on nodepool feature/zuulv3 branch https://review.openstack.org/492629 | 17:52 |
*** skelso has quit IRC | 17:53 | |
*** spzala has quit IRC | 17:55 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/project-config master: Disable py2 dsvm on nodepool feature/zuulv3 branch https://review.openstack.org/492629 | 17:58 |
*** dizquierdo has quit IRC | 18:00 | |
*** iyamahat has quit IRC | 18:00 | |
*** iyamahat has joined #openstack-infra | 18:01 | |
*** trown is now known as trown|lunch | 18:01 | |
openstackgerrit | Matt Riedemann proposed openstack-infra/elastic-recheck master: Add query for same-host cellsv1 fail bug 1709946 https://review.openstack.org/492636 | 18:05 |
openstack | bug 1709946 in OpenStack Compute (nova) "ServersAdminTestJSON.test_create_server_with_scheduling_hint randomly fails SameHostFilter in cells v1 job" [Low,Confirmed] https://launchpad.net/bugs/1709946 | 18:05 |
*** Apoorva_ has quit IRC | 18:08 | |
*** Apoorva has joined #openstack-infra | 18:09 | |
*** davidsha has quit IRC | 18:10 | |
*** apetrich has quit IRC | 18:10 | |
*** annegentle has joined #openstack-infra | 18:11 | |
*** apetrich has joined #openstack-infra | 18:13 | |
*** Apoorva_ has joined #openstack-infra | 18:15 | |
clarkb | mordred: pabelanger fungi mnaser so do we want to make a glean release? | 18:16 |
fungi | clarkb: i believe so | 18:16 |
fungi | have you looked at the history since the last tag yet? | 18:17 |
clarkb | I have not. Currently getting babysitter settled in then I can dig through and see what we'd be adding (my guess is not much) | 18:17 |
mnaser | clarkb i guess we could. once we get new images - i made some tweaks/changes to the flavors, so i'll probably have to submit a change - we'll be able to give 8 cores (shared) instead of 2 (dedicated). looks like 100% cpu usage is not much of an issue | 18:17 |
fungi | clarkb: no problem, just trying not to duplicate efforts. i'm looking through it now | 18:18 |
*** SumitNaiksatam has quit IRC | 18:18 | |
clarkb | kk | 18:18 |
mnaser | fungi my little hack - https://github.com/openstack-infra/glean/compare/1.9.1...master | 18:18 |
mnaser | :p | 18:18 |
*** Apoorva has quit IRC | 18:19 | |
fungi | given that 1.9.1 was a revert of a feature from 1.9.0 which is being reintroduced, we probably need to tag 1.10.0 for this | 18:19 |
fungi | TheJulia: ^ that's yours, do you concur with that version numbering choice? | 18:19 |
openstackgerrit | Sam Yaple proposed openstack-infra/project-config master: Add additional LOCI repos https://review.openstack.org/492637 | 18:23 |
*** rbrndt has joined #openstack-infra | 18:27 | |
*** ldnunes has quit IRC | 18:31 | |
*** krtaylor has joined #openstack-infra | 18:32 | |
*** rcernin has joined #openstack-infra | 18:35 | |
clarkb | fungi: that sounds right to me. It's just the systemd detection, right? | 18:38 |
fungi | i suppose? i mean that's what ci and code review said | 18:38 |
clarkb | ya arguably that is a bugfix? so bug fix of bug fix could be 1.9.2? | 18:39 |
clarkb | I don't think it matters too much | 18:39 |
fungi | pabelanger and mordred seemed to think it was an appropriate reimplementation, along with prometheanfire, wznoinsk and sambetts | 18:40 |
clarkb | ya it looks fine I'm just trying to reason about whether or not it is a feature deserving the .10 | 18:40 |
clarkb | or just a .9.2 | 18:40 |
clarkb | I think either way works since it straddles the line | 18:41 |
fungi | the reason i say 1.10.0 is that 1.9.0 introduced "enable network.service with systemd" which 1.9.1 reverted and now we're reintroducing | 18:41 |
clarkb | ah | 18:41 |
fungi | but it's a fuzzy argument, i agree | 18:42 |
clarkb | fungi: I'm not seeing that in the diff mnaser linked (but I may just be blind) | 18:42 |
*** spzala has joined #openstack-infra | 18:42 | |
fungi | hrm, yeah revising my analysis here | 18:43 |
*** Apoorva_ has quit IRC | 18:43 | |
*** ldnunes has joined #openstack-infra | 18:43 | |
*** spzala has quit IRC | 18:43 | |
openstackgerrit | Merged openstack-infra/project-config master: Enable missing "qos" extension driver for Neutron ML2 plugin https://review.openstack.org/492567 | 18:43 |
fungi | oh! "Enable network.service with systemd" merged before we tagged 1.5.0 | 18:44 |
*** spzala has joined #openstack-infra | 18:44 | |
*** Apoorva has joined #openstack-infra | 18:44 | |
fungi | wow that revert was a long time coming | 18:44 |
*** Sukhdev has joined #openstack-infra | 18:44 | |
fungi | i would have thought the revert itself warranted a 1.10.0 in that case | 18:44 |
openstackgerrit | Merged openstack-infra/project-config master: Register watcher-tempest-plugin jobs https://review.openstack.org/490400 | 18:44 |
fungi | and looking at diffs, "Revise systemd determination to verify systemctl presence" is not a reintroduction of that feature | 18:45 |
clarkb | ya I don't think the feature has been added in this delta | 18:45 |
fungi | so now i'm thinking 1.9.1 probably _should_ have been 1.10.0, but water under the bridge now | 18:45 |
fungi | so i'm cool with 1.9.2 as those changes do look like trivial fixes | 18:46 |
clarkb | ah ok now I grok | 18:46 |
AJaeger_ | seanhandley: I did not request you do anything. I left a comment with an explanation that wasn't clear, sorry. Recheck means: Run the tests again. We automatically rebase before running tests, so the new test run would have worked since it automatically rebases ... | 18:46 |
clarkb | fungi: ya I think its purely bug fixes in this one | 18:46 |
fungi | wfm. 1.9.2 it is | 18:46 |
fungi | i should have looked closer at those commit diffs | 18:46 |
*** eranrom has joined #openstack-infra | 18:47 | |
kklimonda | is openstack-infra/system-config a good starting point to deploy a copy of OS CI for a different project? | 18:50 |
fungi | kklimonda: there is an openstack-infra/puppet-openstackci module which we consider to be the main entrypoint for most of that | 18:50 |
fungi | system-config drags in a ton of other stuff like our wiki, etherpad, ethercalc, listserv, codesearch... | 18:51 |
*** psachin has quit IRC | 18:52 | |
fungi | kklimonda: what you likely want to take a look at is https://docs.openstack.org/infra/openstackci/ and then separately set up gerrit (assuming you mean a full ci system including your own gerrit code review server) separately with the puppet-gerrit module, maybe using the openstack_project::gerrit class from system-config as an example | 18:53 |
*** kjackal_ has joined #openstack-infra | 18:53 | |
fungi | you might also consider other options for installing/managing gerrit, as our puppet-gerrit module isn't so great at hands-off bootstrapping a new gerrit deployment from scratch | 18:54 |
fungi | the puppet-gerrit module depends on some separate manual steps anyway | 18:54 |
kklimonda | that's what I'm worried about, i've seen it often enough that the puppet (or config management) drifts off and is unsuitable for deploying from scratch | 18:54 |
fungi | well, in the gerrit case it was never completely suitable for deploying from scratch | 18:55 |
fungi | gerrit's a fairly complex java app running in a jvm which needs interactive setup like initial account creation and pushing in at least minimal configuration to grant permission to your automation | 18:56 |
kklimonda | mhm | 18:56 |
kklimonda | we can handle small things like that for sure | 18:57 |
fungi | my recommendation would be to first experiment with installing gerrit by hand following their instructions, and then look at our system-config documentation about how we configure and operate ours: https://docs.openstack.org/infra/system-config/gerrit.html | 18:57 |
kklimonda | btw, I was going through your system-config (and docs) and you have puppetmaster, but you use ansible for running puppet on nodes. | 18:57 |
fungi | the puppet-gerrit module is not bad for maintaining a running and configured gerrit deployment, but there are some chicken-and-egg/catch-22 issues trying to automate a from-scratch gerrit deployment | 18:58 |
kklimonda | have you repurposed puppetmaster node for some other things (like a central place to run ansible from) or is there something I'm missing | 18:58 |
kklimonda | we've installed gerrit a couple of times | 18:58 |
kklimonda | and we already have gerrit, our current CI is based on an old system-config fork I think, at least partially | 18:58 |
fungi | kklimonda: yes, our puppetmaster server is no longer a puppet master, it's just a place where we centrally host and manage our secrets, and where the cron job that calls ansible lives | 18:59 |
*** rcernin has quit IRC | 18:59 | |
*** sekelso has quit IRC | 19:00 | |
fungi | ansible in turn copies puppet manifests and secrets (in the form of hiera trees) onto individual servers in the inventory and then calls puppet apply locally on them | 19:00 |
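The puppetmaster-as-ansible-driver flow fungi describes can be sketched as a masterless playbook. All paths and task names below are hypothetical illustrations of the pattern, not the actual system-config playbook:

```yaml
# Hypothetical sketch: cron on the old puppetmaster runs ansible, which
# pushes manifests plus per-host hiera secrets to each node and then
# runs masterless "puppet apply" locally (no puppet master involved).
- hosts: all
  tasks:
    - name: Sync puppet manifests/modules to the node
      synchronize:
        src: /opt/system-config/production/
        dest: /opt/system-config/production/

    - name: Copy this host's hiera secrets
      copy:
        src: "/etc/puppet/hieradata/{{ inventory_hostname }}.yaml"
        dest: "/etc/puppet/hieradata/{{ inventory_hostname }}.yaml"
        mode: "0600"

    - name: Apply puppet locally
      command: puppet apply /opt/system-config/production/manifests/site.pp
```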
fungi | kklimonda: so anyway, if all you want is a ci system then system-config contains a bunch of extra stuff you don't need and also a lot of settings that are very specific to the openstack community's needs which would likely need removing/adjusting to suit your community's | 19:01 |
fungi | we consider system-config be the entrypoint for our entire community infrastructure, not just our ci system | 19:02 |
*** adisky__ has quit IRC | 19:02 | |
kklimonda | mhm, it also contains a lot of battle stories that we'd love to leverage :) | 19:02 |
fungi | no doubt | 19:02 |
kklimonda | (for example going with ansible for running puppet probably had a reason etc.) | 19:02 |
*** portdirect is now known as eteppete | 19:02 | |
fungi | i mean, that's probably the primary reason we make all of this available publicly under a free software license, after all | 19:02 |
*** sekelso has joined #openstack-infra | 19:02 | |
*** jamesdenton has quit IRC | 19:03 | |
fungi | we want other communities to be able to learn from our mistakes and not need to duplicate effort on problems we've already solved, after all | 19:03 |
*** openstackgerrit has quit IRC | 19:03 | |
*** jamesdenton has joined #openstack-infra | 19:03 | |
kklimonda | but I see your point - I'll look into puppet-openstackci module and then start thinking how to leverage it along with system-config to get something of our own :) | 19:03 |
*** eteppete is now known as portdirect | 19:03 | |
fungi | so i guess my point was, since you initially mentioned wanting to duplicate our ci system, system-config is a much larger proposition | 19:04 |
fungi | but if you really want to duplicate other bits of our community infrastructure besides just our ci system, it is indeed a good example | 19:04 |
kklimonda | not for now, you are probably handling mailing lists, irc channels etc. | 19:05 |
fungi | yep | 19:05 |
kklimonda | so your suggestion for the entry point makes more sense | 19:05 |
fungi | we have puppet wrapper classes and modules for doing mailing lists and irc channels, for sure | 19:05 |
*** eranrom has quit IRC | 19:06 | |
fungi | but ultimately, we've organized it the way we have because we need a team of about half a dozen root sysadmins to be able to look after the infrastructure needs of a community of many thousands of developers, operators, users, et cetera | 19:06 |
fungi | and be able to solicit help doing so from random interested members of our community and also from other communities with shared interests | 19:07 |
fungi | kklimonda: we also have some slide decks linked from https://docs.openstack.org/infra/publications/ which may provide some interesting insights (though they're in varying states of up-to-dateness so definitely don't assume all the technical specifics are current) | 19:08 |
*** openstackgerrit has joined #openstack-infra | 19:09 | |
openstackgerrit | Clark Boylan proposed openstack-infra/devstack-gate master: Hack in linux bridge vxlan support https://review.openstack.org/492654 | 19:09 |
clarkb | pabelanger: odyssey4me ^ that's a quick first pass at using linux bridge. I expect that there will be bugs and something won't work | 19:09 |
*** sree has joined #openstack-infra | 19:09 | |
clarkb | we can also refer back to the old gre code to see what might be wrong if things break | 19:09 |
*** trown|lunch is now known as trown | 19:10 | |
clarkb | pabelanger: odyssey4me feel free to push fixes if you like | 19:10 |
*** eranrom has joined #openstack-infra | 19:11 | |
seanhandley | AJaeger_: Aha ok, that makes sense. Thanks! | 19:12 |
*** baoli has quit IRC | 19:13 | |
*** sree has quit IRC | 19:13 | |
*** portdirect has quit IRC | 19:17 | |
*** portdirect has joined #openstack-infra | 19:17 | |
*** eranrom has quit IRC | 19:21 | |
*** nicolasbock has quit IRC | 19:21 | |
openstackgerrit | Matthew Treinish proposed openstack-infra/subunit2sql master: Add graph for aggregate run time grouped by run metadata https://review.openstack.org/492655 | 19:24 |
mtreinish | fungi: ^^^ if you were curious that's what I was using to generate the graphs before | 19:24 |
fungi | mtreinish: oh, neat! | 19:25 |
openstackgerrit | Matthew Treinish proposed openstack-infra/subunit2sql master: Add api func to get list of unique values for run_metadata key https://review.openstack.org/492656 | 19:25 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Bind secrets to their playbooks https://review.openstack.org/492307 | 19:26 |
*** tnovacik has joined #openstack-infra | 19:27 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Remove 'auth' dict from jobs https://review.openstack.org/492309 | 19:28 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Expose final job attribute https://review.openstack.org/479382 | 19:28 |
*** slaweq has joined #openstack-infra | 19:33 | |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Create publish-openstack-python(-branch)-tarball jobs https://review.openstack.org/491093 | 19:33 |
openstackgerrit | Ramamani Yeleswarapu proposed openstack-infra/devstack-gate master: [TESTING][DO NOT MERGE] Testing TLS in Ironic jobs https://review.openstack.org/492661 | 19:35 |
clarkb | fungi: will you be pushing the glean tag? | 19:35 |
*** xyang1 has joined #openstack-infra | 19:35 | |
*** eranrom has joined #openstack-infra | 19:35 | |
fungi | clarkb: i can, just a sec lemme make sure there's nothing special like reno or release management going on with it | 19:36 |
*** slaweq_ has quit IRC | 19:36 | |
*** sekelso has quit IRC | 19:36 | |
TheJulia | fungi: I do | 19:36 |
fungi | clarkb: looks like you pushed the last tag for it, so seems like our normal process. doing now | 19:36 |
clarkb | ya pretty sure it isn't using relmgmt tooling | 19:37 |
fungi | thanks for checking in, TheJulia! turns out i was mistaking your systemd systemctl detection fix for the other systemd-related thing we'd reverted in 1.9.1 | 19:37 |
*** iyamahat has quit IRC | 19:38 | |
*** iyamahat has joined #openstack-infra | 19:38 | |
fungi | clarkb: easy enough to tell, tags get pushed/signed by our infra release key instead of individuals if release management automation is doing it | 19:38 |
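The manual release flow fungi is doing here amounts to signing the tag with a personal key and pushing it to gerrit; a hedged sketch (version string and remote name taken from this conversation, the rest illustrative):

```shell
git tag -s 1.9.2 -m "glean 1.9.2"   # sign the tag with your personal gpg key
git tag -v 1.9.2                    # verify the signature locally
git push gerrit 1.9.2               # release automation builds/uploads from the pushed tag
```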
*** sbezverk has joined #openstack-infra | 19:39 | |
fungi | clarkb: wow, you were very detailed with your tag description on 1.9.1 | 19:40 |
fungi | i think i'm just going to let the changelog speak for me with 1.9.2 | 19:40 |
*** baoli has joined #openstack-infra | 19:40 | |
clarkb | :) | 19:40 |
*** baoli has quit IRC | 19:41 | |
fungi | weird. i can't seem to sign tags on my workstation suddenly. gimme a sec to troubleshoot | 19:41 |
fungi | "error: gpg failed to sign the data" | 19:41 |
fungi | oh, i bet it's that DISPLAY isn't set and it's trying to use x11 for the askpass | 19:42 |
fungi | nope, that's not it | 19:42 |
fungi | aha | 19:44 |
openstackgerrit | Matt Riedemann proposed openstack-infra/elastic-recheck master: Add query for rebuild timeout cellsv1 bug 1709985 https://review.openstack.org/492665 | 19:44 |
openstack | bug 1709985 in OpenStack Compute (nova) "test_rebuild_server_in_error_state randomly times out waiting for rebuilding instance to be active" [Undecided,New] https://launchpad.net/bugs/1709985 | 19:45 |
clarkb | mriedem: so whats the story on the live migration issues? | 19:45 |
mnaser | if someone has an extra minute (somehow?!) on infra-root, could you check and see if the rsync script is working ok or not? we merged the fix to get the mirrors working again but i dont see anything here - http://mirror.regionone.infracloud-vanilla.openstack.org/fedora | 19:45 |
*** baoli has joined #openstack-infra | 19:45 | |
openstackgerrit | Merged openstack-infra/elastic-recheck master: Add query for same-host cellsv1 fail bug 1709946 https://review.openstack.org/492636 | 19:45 |
openstack | bug 1709946 in OpenStack Compute (nova) "ServersAdminTestJSON.test_create_server_with_scheduling_hint randomly fails SameHostFilter in cells v1 job" [Low,Confirmed] https://launchpad.net/bugs/1709946 | 19:45 |
*** sambetts is now known as sambetts|afk | 19:45 | |
*** tnovacik has quit IRC | 19:45 | |
*** mhickey has joined #openstack-infra | 19:46 | |
clarkb | mriedem: are the two libvirt talking to each other too slowly? | 19:46 |
mriedem | clarkb: which one? | 19:46 |
*** jrist has quit IRC | 19:48 | |
*** rlandy has quit IRC | 19:48 | |
clarkb | mriedem: the citycloud one | 19:49 |
clarkb | I wonder what "run outlasted interval by 1.03 sec" means | 19:49 |
fungi | #status log glean 1.9.2 released to properly support vfat configdrive labels | 19:50 |
openstackstatus | fungi: finished logging | 19:50 |
*** e0ne has joined #openstack-infra | 19:50 | |
mriedem | clarkb: it's a periodic checkin with the service group api | 19:50 |
mriedem | every 10 seconds by default i think | 19:50 |
mriedem | if the service doesn't check in within that time, it's considered down | 19:50 |
mriedem | and you can't schedule to it | 19:50 |
mriedem | which blows up any move operation test like live migration | 19:51 |
fungi | turns out, if you're like me and you keep a short expiration on your signing key but periodically extend the expiration date on it, then other systems where you use that key to sign things need an occasional `gpg --refresh-keys` or they'll refuse to keep using it, thinking it expired | 19:51 |
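In other words, extending a key's expiration only updates the copy on the machine where you did it; every other machine holds the old, now-expired copy until it refreshes. A sketch of the fix (standard gpg commands; the key id shown is illustrative):

```shell
gpg --refresh-keys                  # pull updated expiration dates from the keyservers
gpg --list-keys DEADBEEF            # confirm the new expiry on your signing key
git tag -s 1.9.2 -m "glean 1.9.2"   # signing should succeed again
```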
clarkb | mriedem: because you need >1 computes to do moves right? | 19:51 |
mriedem | yes | 19:51 |
mriedem | unless you resize to same host | 19:51 |
mriedem | but this isn't that | 19:51 |
openstackgerrit | Sam Yaple proposed openstack-infra/project-config master: Add additional LOCI repos https://review.openstack.org/492637 | 19:51 |
clarkb | mriedem: and that happens over rabbit? | 19:51 |
clarkb | the check in | 19:52 |
mriedem | yeah | 19:53 |
mriedem | well, | 19:53 |
mriedem | i'd have to dig, | 19:53 |
clarkb | looking at http://logs.openstack.org/12/491012/12/check/gate-tempest-dsvm-py35-ubuntu-xenial/2dfbf13/logs/screen-n-cpu.txt it doesn't just happen once either | 19:54 |
*** apetrich has quit IRC | 19:54 | |
clarkb | mriedem: if we can confirm the channel over which that happens we should be able to do a bit more profiling of that specifically within citycloud | 19:55 |
fungi | https://pypi.python.org/pypi/glean has 1.9.2 now, so next image updates should get it. do we want to trigger some now? | 19:55 |
*** apetrich has joined #openstack-infra | 19:55 | |
clarkb | fungi: probably a good idea so that if there are any problems we can delete new images and that won't happen overnight | 19:55 |
mriedem | it's a thread group timer thing here https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/db.py#L53 | 19:55 |
mriedem | but i'd have to dig into it more - this is always confusing | 19:56 |
fungi | clarkb: easiest way to do that? just nodepool dib-image-delete the oldest ones? | 19:56 |
mriedem | https://github.com/openstack/nova/blob/master/nova/service.py#L186 | 19:57 |
clarkb | fungi: I think you want nodepool image-build | 19:57 |
clarkb | and then if there is a problem you dib-image-delete the newer ones | 19:57 |
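The rotation clarkb describes, sketched as nodepool CLI steps (the image name is an example; build ids come from the dib-image-list output):

```shell
nodepool image-build ubuntu-xenial   # queue a fresh build that picks up glean 1.9.2
nodepool dib-image-list              # watch for the new build to go "ready" and upload
# if the new image misbehaves, delete the newer build so nodes fall
# back to the previous known-good image:
nodepool dib-image-delete <build-id>
```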
*** e0ne has quit IRC | 19:57 | |
mriedem | jaypipes: sdague: ^ speaking of things to document at some point, the servicegroup api and how it monitors | 19:58 |
mriedem | i always have to re-learn this | 19:58 |
sdague | mriedem: yeh | 19:58 |
*** tnovacik has joined #openstack-infra | 19:58 | |
jaypipes | mriedem: ack | 19:59 |
fungi | looks like we have three ubuntu-xenial images ready according to nodepool dib-image-list, presumably due to adding raw? | 19:59 |
mriedem | i think once the service starts, it reports into the thread group timer, which runs _report_state every 10 seconds | 19:59 |
mriedem | https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/db.py#L90 | 19:59 |
mriedem | sets a counter and if the save() fails, or times out, we consider it down | 19:59 |
sdague | mriedem: you mean like this - https://github.com/openstack/nova/blob/2d2bf2a26bb49a3a8db9a3dddfef9097aea5739b/doc/source/admin/service-groups.rst#L2 | 19:59 |
clarkb | mnaser: I still see -p in the rsync command for fedora. To make sure I'm looking at the right thing can you point me at the change that was supposed to fix that? | 19:59 |
mriedem | sdague: heh yeah | 20:00 |
clarkb | fungi: the third one may not be uploaded everywhere yet so we keep the last two | 20:00 |
clarkb | fungi: once the third is uploaded everywhere we should delete the oldest one | 20:00 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Revert "Remove NPM mirror settings" https://review.openstack.org/492666 | 20:00 |
mnaser | clarkb https://review.openstack.org/#/c/492558/ | 20:00 |
pabelanger | clarkb: fungi: ^would be nice to land our NPM mirror revert. Just tested in rackspace and npm install worked as expected | 20:01 |
clarkb | mriedem: sdague ah ok so its doing a database table update | 20:01 |
fungi | hrm, we have opensuse-42.2, opensuse-422 and opensuse-423 images still. we merged changes to drop 422 and presumably 42.2 is cruft as well, should i just dib-image-delete those? | 20:01 |
clarkb | and we know the database can get pretty well loaded up here? also I thought computes couldn't talk to the db... | 20:02 |
pabelanger | ya, opensuse-422 can be removed | 20:02 |
pabelanger | I haven't done that in nodepool yet | 20:02 |
mriedem | clarkb: it goes through conductor | 20:02 |
pabelanger | fungi: want me to propose a patch for nodepool.yaml? | 20:02 |
fungi | #status log Image builds manually queued for centos-7, debian-jessie, fedora-25, fedora-26, opensuse-423, ubuntu-trusty and ubuntu-xenial to use latest glean (1.9.2) | 20:03 |
openstackstatus | fungi: finished logging | 20:03 |
fungi | pabelanger: i thought we already had approved one | 20:03 |
pabelanger | fungi: no, just removal from JJB | 20:03 |
fungi | pabelanger: yep, i concur. nodepool.yaml on the server still have 422 | 20:04 |
mriedem | clarkb: so it's not the compute service that does this really, i mean it is, kind of | 20:04 |
fungi | though 42.2 probably needs manual cleanup? | 20:04 |
mriedem | it's this on a timer | 20:04 |
mriedem | https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/db.py#L90 | 20:04 |
mriedem | which triggers this https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L614 | 20:04 |
mriedem | where we update the last_seen_up value | 20:04 |
mriedem | which is checked here https://github.com/openstack/nova/blob/master/nova/servicegroup/drivers/db.py#L60 | 20:05 |
mriedem | to tell if it's up or not | 20:05 |
mriedem | based on some window | 20:05 |
clarkb | mnaser: oh its just the one rsync of many we updated /me checks the file again | 20:05 |
fungi | clarkb: https://review.openstack.org/492558 Stop rsync from managing setgid permissions for Fedora Atomic mirror | 20:05 |
mriedem | i guess service_down_time is 60 seconds by default | 20:05 |
mriedem | probably to match the default rpc timeout | 20:06 |
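mriedem's walkthrough boils down to a timestamp freshness check: each heartbeat updates last_seen_up every report_interval (10s by default), and the service is considered down once that timestamp is older than service_down_time (60s by default). A minimal sketch of that logic, with illustrative names rather than nova's actual code:

```python
from datetime import datetime, timedelta

# Defaults mentioned in the discussion above (nova's report_interval
# and service_down_time options).
REPORT_INTERVAL = 10    # seconds between heartbeats
SERVICE_DOWN_TIME = 60  # seconds of silence before a service is "down"

def is_service_up(last_seen_up: datetime, now: datetime,
                  down_time: int = SERVICE_DOWN_TIME) -> bool:
    """True if the service heartbeated within the liveness window."""
    return (now - last_seen_up) <= timedelta(seconds=down_time)

now = datetime(2017, 8, 10, 20, 0, 0)
assert is_service_up(now - timedelta(seconds=30), now)      # recent heartbeat: up
assert not is_service_up(now - timedelta(seconds=61), now)  # window exceeded: down

# With these defaults a service gets roughly 6 heartbeat attempts
# before being marked down:
assert SERVICE_DOWN_TIME // REPORT_INTERVAL == 6
```

One slow database update is therefore survivable; the service only flips to down after several consecutive heartbeats fail to land inside the window.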
fungi | oh, mnaser already linked it and i missed that | 20:06 |
clarkb | mnaser: cron runs every two hours, change merged at 1800something UTC so its only just now rerunning at 2000UTC with new code | 20:06 |
mnaser | ahh that explains it | 20:06 |
fungi | cron to run rsync, specifically | 20:06 |
mnaser | http://mirror.regionone.infracloud-vanilla.openstack.org/fedora/ | 20:06 |
clarkb | mnaser: so it is running now, you should see things after rsync runs and afs publishes | 20:06 |
mnaser | i see it :D | 20:06 |
clarkb | mriedem: and so that log message means it checked it at 61 seconds? | 20:07 |
mnaser | oh this is awesome, this will help builds so much in magnum | 20:07 |
mnaser | and take a big pressure off network | 20:07 |
fungi | mnaser: now if only heat will review your patch! | 20:07 |
*** baoli has quit IRC | 20:07 | |
fungi | i'm surprised they were still going out to the internet for all that | 20:08 |
mriedem | clarkb: yeah i think so | 20:08 |
mnaser | fungi and given that it's fedora 24 which was in there, must have been happening for quite some time | 20:08 |
fungi | eek | 20:08 |
clarkb | slowly whittling down the list of reliability fixes :) | 20:09 |
mriedem | clarkb: the report interval is 10 seconds by default, | 20:09 |
clarkb | mriedem: gotcha so it has ~6 chances to report in before being marked as bad | 20:09 |
mriedem | which is what i think is the _report_state periodic | 20:09 |
mriedem | yar | 20:09 |
clarkb | that seems reasonable and the wall time also seems more than sufficient | 20:10 |
clarkb | I wonder if packet loss is part of hte problem there | 20:10 |
clarkb | (since I can't imagine it takes more than a minute to update the database) | 20:10 |
*** jkilpatr has quit IRC | 20:11 | |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Set opensuse-422 min-ready to -1 https://review.openstack.org/492667 | 20:11 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Remove opensuse-422 from nodepool https://review.openstack.org/492668 | 20:11 |
pabelanger | clarkb: fungi: removal of opensuse-422^ | 20:11 |
*** tnovacik has quit IRC | 20:11 | |
pabelanger | cannot remember if we need -1 on min-ready or if 0 is enough now | 20:11 |
clarkb | mriedem: so nextstep is lets hold a test env and do some testing directly. What job should I be looking at holding? | 20:12 |
clarkb | 0 | 20:12 |
pabelanger | mnaser: cool, let me know if magnum has issue. Would be good to get another project on to AFS mirrors | 20:13 |
clarkb | pabelanger: pretty sure its 0 across the board now then we just remove the images themselves to stop building and delete them (or pause) | 20:13 |
openstackgerrit | Merged openstack-infra/elastic-recheck master: Add query for rebuild timeout cellsv1 bug 1709985 https://review.openstack.org/492665 | 20:13 |
openstack | bug 1709985 in OpenStack Compute (nova) "test_rebuild_server_in_error_state randomly times out waiting for rebuilding instance to be active" [Low,Confirmed] https://launchpad.net/bugs/1709985 | 20:13 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Set opensuse-422 min-ready to 0 https://review.openstack.org/492667 | 20:14 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Remove opensuse-422 from nodepool https://review.openstack.org/492668 | 20:14 |
pabelanger | clarkb: updated | 20:14 |
*** mhickey has quit IRC | 20:14 | |
*** funzo has quit IRC | 20:14 | |
fungi | thanks pabelanger | 20:14 |
pabelanger | I'll check with ianw tonight about doing the same with fedora-25 | 20:15 |
pabelanger | and we just started running jobs for post pipeline | 20:16 |
pabelanger | extra 100 nodes does help :) | 20:16 |
mriedem | clarkb: i saw it in gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial when i reported the bug | 20:17 |
clarkb | pabelanger: fungi did we need dirk to chime in on suse things or should I just go for it? | 20:17 |
clarkb | mriedem: thanks will try to catch one of those on citycloud and hold it | 20:17 |
pabelanger | clarkb: fungi: I confirmed with dirk before removing JJB jobs, so I think we are good to proceed | 20:18 |
clarkb | pabelanger: ok first change is proceeding. Second one has +2's just needs approval when ready | 20:18 |
fungi | yeah, no need to keep old images which no longer have any jobs | 20:18 |
pabelanger | ++ | 20:19 |
*** Hunner has quit IRC | 20:19 | |
*** jaypipes has quit IRC | 20:20 | |
*** bmjen has quit IRC | 20:20 | |
*** Sukhdev has quit IRC | 20:21 | |
openstackgerrit | Sam Yaple proposed openstack-infra/project-config master: Add additional LOCI repos https://review.openstack.org/492637 | 20:22 |
openstackgerrit | Clark Boylan proposed openstack-infra/devstack-gate master: Hack in linux bridge vxlan support https://review.openstack.org/492654 | 20:24 |
*** eranrom has quit IRC | 20:24 | |
dirk | clarkb: pabelanger : I am okay with it. | 20:24 |
*** hamzy has quit IRC | 20:24 | |
*** amotoki has quit IRC | 20:24 | |
dirk | I am just unsure whether the mirroring can be removed as well, I believe dib somehow depends on it | 20:24 |
fungi | i don't think our dib elements actually rely on our mirroring | 20:25 |
dirk | There was recently a switch to set a mirror during testing so that it doesn't pull from the internet | 20:25 |
fungi | oh, you mean changes to dib might have been getting tested on 422? | 20:25 |
clarkb | mriedem: also that isn't a 100% failure in citycloud-lon1 is it? | 20:26 |
mriedem | not 100% failure no | 20:26 |
dirk | fungi: https://review.openstack.org/478443 | 20:26 |
mriedem | when the live migration job fails, and it's with that type of warning, it's 90+% in that node provider though | 20:27 |
clarkb | mriedem: ya I do see that specific region is far more common | 20:27 |
sdague | clarkb: also, it took 56 minutes to clean the node at the end | 20:27 |
mnaser | 2017-08-10 20:24:19.406877 | MAGNUM_GUEST_IMAGE_URL='\''http://mirror.mtl01.internap.openstack.org/fedora/atomic/stable/Fedora-Atomic-26-20170723.0/CloudImages/x86_64/images/Fedora-Atomic-26-20170723.0.x86_64.qcow2'\'' | 20:27 |
mnaser | thanks fungi / pabelanger and everyone else :D | 20:27 |
sdague | http://logs.openstack.org/54/487954/13/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/49d7ac4/console.html#_2017-08-10_19_08_56_206007 | 20:27 |
sdague | sorry, 58 minutes | 20:27 |
sdague | which is indicative of really bad io | 20:28 |
clarkb | huh I wonder if those nodes are on the same host as the mirror was | 20:28 |
clarkb | (maybe this is the downside to using online chat support, they will fix the one problem without doing the rest of it?) | 20:28 |
sdague | yeh, who knows | 20:28 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Bindmount /etc/lsb-release into bubblewrap https://review.openstack.org/490200 | 20:28 |
sdague | clarkb: yeh, any chance of getting a rep to irc lurk? | 20:29 |
sdague | or is that out of scope | 20:29 |
clarkb | not sure | 20:29 |
*** jcoufal has quit IRC | 20:29 | |
clarkb | pabelanger did send them email about possible bad hypervisor and sent instance uuids | 20:30 |
clarkb | pabelanger: we haven't heard back on that right (I don't see a response at least) | 20:30 |
fungi | we're also still waiting to hear back from them about the random network issues in sto2 | 20:31 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Add SSH private key for static.o.o https://review.openstack.org/492671 | 20:33 |
pabelanger | jeblair: mordred: ^ our first secret for project-config | 20:33 |
pabelanger | start the painting | 20:33 |
jeblair | pabelanger: that is much larger than i expected! i'm guessing that's the pem-encoded version. | 20:34 |
jeblair | pabelanger: can we encode the secret as binary and then pem-encode it on the other side? | 20:34 |
pabelanger | jeblair: I believe so? that was the output using ./tools/encrypt_secret.py | 20:34 |
fungi | yeah, no strict need to double-encode it | 20:34 |
jeblair | pabelanger: sorry, i mean the *input* was pem encoded | 20:35 |
pabelanger | Oh, yes | 20:35 |
jeblair | which is the normal thing for an ssh key | 20:35 |
openstackgerrit | Clark Boylan proposed openstack-infra/devstack-gate master: Hack in linux bridge vxlan support https://review.openstack.org/492654 | 20:35 |
jeblair | so we'll need to figure out how to translate that to binary, and then how to get ansible to write a pem-encoded version of the binary data (maybe a module)? | 20:35 |
pabelanger | clarkb: I did send the email, I cc both you and fungi | 20:36 |
jeblair | i expect that to be two pkcs1 blocks if we do that | 20:36 |
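jeblair's suggestion — feed the encrypter raw binary and re-add the PEM armor on the executor side — can be sketched with plain coreutils. Filenames are illustrative; the real tooling would live in encrypt_secret.py and an ansible module. PEM is just base64 plus armor lines, so stripping it shrinks the input by roughly a quarter:

```shell
# Build a stand-in "PEM" file so the sketch is self-contained; a real
# ssh key works the same way since PEM is base64 plus armor lines.
printf 'not a real key, just bytes' > secret.bin
{ echo '-----BEGIN RSA PRIVATE KEY-----'
  base64 -w 64 < secret.bin
  echo '-----END RSA PRIVATE KEY-----'
} > key.pem

# PEM -> raw DER bytes (what would actually be encrypted):
grep -v -- '-----' key.pem | base64 -d > key.der
cmp secret.bin key.der   # round-trips losslessly
```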
fungi | thanks pabelanger! | 20:36 |
clarkb | pabelanger: ya I see the one you sent just wondering if there was a reply that I may not have gotten (because they hit reply instead of reply all for example) | 20:36 |
clarkb | guessing not | 20:36 |
fungi | i'm still catching up on e-mail from this morning, regrettably | 20:36 |
*** sdake is now known as k2so | 20:37 | |
pabelanger | jeblair: I need to step away for next 45mins, feel free to iterate on 492671 if needed | 20:37 |
openstackgerrit | Merged openstack-infra/project-config master: Set opensuse-422 min-ready to 0 https://review.openstack.org/492667 | 20:37 |
clarkb | fungi: and no word from ovh either I take it? | 20:37 |
jeblair | pabelanger: i'm just going to leave some notes; i'm in the middle of other stuff | 20:37 |
fungi | clarkb: not a few hours ago when i last checked, but will let you know shortly | 20:37 |
pabelanger | jeblair: ack | 20:38 |
fungi | it's not entirely clear to me whether tickets i open through the ovh dashboard will get me e-mail replies so i'm trying to watch for the dashboard updating, potential e-mail to infra-root@ and also jean-daniel replying to me directly from my earlier request | 20:39 |
*** jamesdenton has quit IRC | 20:39 | |
*** jamesmcarthur has joined #openstack-infra | 20:40 | |
jeblair | (i also did send an email, and cc'd pierre who was on earlier threads, but have not gotten a personal reply) | 20:41 |
*** jamesdenton has joined #openstack-infra | 20:42 | |
openstackgerrit | Clark Boylan proposed openstack-infra/devstack-gate master: Hack in linux bridge vxlan support https://review.openstack.org/492654 | 20:43 |
fungi | thanks jeblair! | 20:44 |
*** sekelso has joined #openstack-infra | 20:45 | |
*** annegentle has quit IRC | 20:45 | |
*** esberglu has quit IRC | 20:47 | |
*** skelso has joined #openstack-infra | 20:48 | |
mordred | mriedem: anything change recently (last couple of days) wrt console logs from nova in devstack/devstack-gate? | 20:48 |
mordred | mriedem: we just failed a functional test here: http://logs.openstack.org/48/491248/1/check/gate-shade-functional/d3fe2ef/console.html#_2017-08-10_20_28_34_958225 on an unrelated thing and figured I'd check to see if there's anything you know of off the top of your head before I dig further | 20:49 |
fungi | heads up, http://blog.recurity-labs.com/2017-08-10/scm-vulns (git vulnerable to shell command injection via malicious ssh:// urls) | 20:50 |
*** gouthamr_ has quit IRC | 20:50 | |
*** sekelso has quit IRC | 20:50 | |
fungi | i don't think we need to worry about anything in our infrastructure being impacted (and we'll have updated git installed shortly anyway if it isn't already) but be mindful of your local dev environments | 20:51 |
*** jamesmcarthur has quit IRC | 20:53 | |
*** jamesmcarthur has joined #openstack-infra | 20:53 | |
ianw | pabelanger: https://review.openstack.org/#/c/490331/ is the last outstanding issue for devstack + fedora26 ... any non-devstack f25 jobs should be happy to switch | 20:55 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Bindmount /etc/lsb-release into bubblewrap https://review.openstack.org/490200 | 20:55 |
mriedem | mordred: what's the actual failure? | 20:56 |
mriedem | i don't see any problems in the n-api or n-cpu logs for req-d90afce6-9ef9-4059-830c-2338ad184483 | 20:56 |
clarkb | pabelanger: odyssey4me latest patchset for using linuxbridge instead of ovs actually looks pretty good. I can ping all the nodes from all the nodes on the dvr ha job setup | 20:57 |
clarkb | pabelanger: odyssey4me still to be seen if there are any problems with neutron etc running on top of it but that could be a good option | 20:57 |
clarkb | we'll also want to test on centos since its kernel is older iirc | 20:57 |
*** vhosakot has joined #openstack-infra | 20:57 | |
mordred | mriedem: I think it's a poorly written test and is subject to there just not being any actual console log content | 20:58 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Make get_server_console tests more resilient https://review.openstack.org/492683 | 20:58 |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Remove keystoneclient and ironicclient as direct depends https://review.openstack.org/492684 | 20:58 |
*** jamesmcarthur has quit IRC | 20:58 | |
mordred | mriedem: thanks for looking - I just pushed up a rework of the test that should test that shade calls nova correctly and gets a response and doesn't try to test that the guest in nova has produced console log output, which we have no real control over | 20:58 |
mriedem | mordred: http://logs.openstack.org/48/491248/1/check/gate-shade-functional/d3fe2ef/logs/screen-n-cpu.txt.gz#_Aug_10_20_28_24_170095 says there is no console lot | 21:00 |
mriedem | *log | 21:00 |
*** jkilpatr has joined #openstack-infra | 21:01 | |
*** esberglu has joined #openstack-infra | 21:01 | |
mordred | mriedem: cool - so in this case there is just legitimately no console log - and nova returned {'console': ''} appropriately | 21:01 |
*** iyamahat has quit IRC | 21:01 | |
*** trown is now known as trown|outtypewww | 21:02 | |
*** iyamahat has joined #openstack-infra | 21:02 | |
*** esberglu_ has joined #openstack-infra | 21:02 | |
*** jpena|mtg is now known as jpena|off | 21:02 | |
*** esberglu has quit IRC | 21:02 | |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Borrow some internap quota for Zuulv3 https://review.openstack.org/492685 | 21:03 |
*** rhallisey has quit IRC | 21:04 | |
ianw | pabelanger / dmsimard : on the repos ... i'm more than willing to take advice. primarily we've been focused on the devstack case but obviously all input is welcome. | 21:04 |
pabelanger | clarkb: fungi: 492668 can land, no more opensuse-422 nodes running | 21:05 |
clarkb | fungi: did the leaked nodes in vanilla get cleaned up? | 21:05 |
clarkb | pabelanger: and the job against glean etc for opensuse 422 is removed? | 21:05 |
ianw | issues we've had include openvswitch coming from rdo, a couple of python packages (i'd have to check the logs), and making sure we're using the rhev forward-port for kvm | 21:05 |
fungi | clarkb: there were no leaked nodes in vanilla as it turns out | 21:05 |
clarkb | ianw: https://review.openstack.org/#/c/492654/ is an effort to stop needing ovs in devstack-gate | 21:06 |
fungi | clarkb: eventually the differential reached two entries, both of which nova knew about (one is the mirror and the other is a test instance pabelanger appears to have left there) | 21:06 |
clarkb | ianw: if ^ ends up working we could remove that repo at least from d-eg | 21:06 |
clarkb | *d-g | 21:06 |
pabelanger | clarkb: ya, we should have no jobs using opensuse-422 | 21:06 |
clarkb | fungi: gotcha | 21:06 |
pabelanger | fungi: oh, my instance can likely be deleted if you want | 21:06 |
clarkb | pabelanger: ok I say approve at will then you should have the +2's you need | 21:06 |
dmsimard | ianw: oh btw re: making f26 voting. There were two issues. The first was bindep to get the right python-devel package and the second (still haven't dug into this one) is that it doesn't seem like f26 has a "python3.5" interpreter | 21:07 |
fungi | clarkb: the only other anomaly was the undeletable instance that nova thought was there but virsh on the compute node said didn't actually exist. not sure how to clean that up though | 21:07 |
pabelanger | okay, now I have to step away for 45mins | 21:07 |
clarkb | fungi: fun, I think we have one of those in chocolate too | 21:07 |
clarkb | and ya no good ideas on how to clean that up other than manual database munging | 21:07 |
fungi | i like to call it, "openstack mitaka" | 21:08 |
fungi | those instances will go away when we redeploy | 21:08 |
clarkb | + | 21:08 |
clarkb | + | 21:08 |
*** krtaylor has quit IRC | 21:08 | |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Zuulv3: add the gate pipeline https://review.openstack.org/492687 | 21:09 |
*** baoli has joined #openstack-infra | 21:10 | |
mordred | jeblair: \o/ ... btw, on the etherpad, when you do the check/gate swap - perhaps we should include zuul-jobs/openstack-zuul-jobs/zuul-sphinx in that too? | 21:10 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add Zuul to gate pipeline https://review.openstack.org/492689 | 21:10 |
mnaser | fungi clarkb wouldnt a nova delete just do a noop delete on those undeletable instances? | 21:11 |
*** aeng has joined #openstack-infra | 21:12 | |
jeblair | mordred: seems reasonable | 21:12 |
*** Goneri has quit IRC | 21:13 | |
clarkb | mnaser: I think nodepool is trying to nova delete them in a loop every few minutes | 21:13 |
clarkb | mnaser: so pretty sure that it isn't working | 21:13 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Add zuul-jobs to gate pipeline https://review.openstack.org/492691 | 21:14 |
*** mriedem has left #openstack-infra | 21:14 | |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Make get_server_console tests more resilient https://review.openstack.org/492683 | 21:14 |
*** mriedem has joined #openstack-infra | 21:14 | |
fungi | mnaser: clarkb: also i manually tried to nova delete and the node remained in "active" state | 21:14 |
fungi | not even error | 21:15 |
fungi | even as an admin user | 21:15 |
*** funzo has joined #openstack-infra | 21:15 | |
mnaser | is it possible that the hypervisor of that vm is no longer alive? | 21:15 |
fungi | i did not look at the nova service logs however | 21:15 |
*** ldnunes has quit IRC | 21:15 | |
fungi | mnaser: the compute node nova show (as admin) claimed it belonged to is the one where i ran the virsh show --all or whatever | 21:16 |
openstackgerrit | James E. Blair proposed openstack-infra/openstack-zuul-jobs master: Add openstack-zuul-jobs to gate pipeline https://review.openstack.org/492692 | 21:16 |
mnaser | cause that could happen if n-cpu is crashed and not responding to rabbitmq | 21:16 |
mnaser | so stuff like that gets queue'd up and never executed by it | 21:16 |
fungi | oh, maybe, except that compute node was booting and deleting other instances | 21:16 |
fungi | so seemed to be working otherwise | 21:17 |
openstackgerrit | Merged openstack-infra/project-config master: Remove opensuse-422 from nodepool https://review.openstack.org/492668 | 21:17 |
mnaser | waits | 21:17 |
jeblair | mordred: maybe not zuul-sphinx; it's not in zuulv3 at all yet | 21:17 |
mnaser | so it doesnt even go in deleting state? | 21:17 |
mordred | jeblair: ah. well, yeah - we certainly shouldn't enable it | 21:18 |
mriedem | clarkb: fyi, it's not just live migration jobs | 21:19 |
mriedem | gate-tempest-dsvm-neutron-full-ubuntu-xenial: https://bugs.launchpad.net/bugs/1709506 | 21:19 |
openstack | Launchpad bug 1709506 in OpenStack-Gate "Random live migration failures due to ComputeServiceUnavailable in citycloud-lon1 nodes" [Undecided,Confirmed] | 21:19 |
mriedem | e-r just commented on that | 21:19 |
*** funzo has quit IRC | 21:20 | |
openstackgerrit | Merged openstack-infra/tripleo-ci master: Replace references to deprecated controllerExtraConfig https://review.openstack.org/480395 | 21:20 |
clarkb | mriedem: ok, maybe you want to update the bug title as that job won't run live migrations? something like "Nova compute heartbeats are slow and nova marks computes as offline"? | 21:21 |
openstackgerrit | Ryan proposed openstack-infra/bindep master: Add ability to list all deps https://review.openstack.org/492693 | 21:21 |
clarkb | mriedem: also that implies it isn't a networking problem because single node tests won't heartbeat over network | 21:21 |
clarkb | mriedem: which puts weight behind the poor disk io on a hypervisor theory | 21:21 |
mriedem | done | 21:22 |
*** rybridges has joined #openstack-infra | 21:23 | |
*** hrubi has quit IRC | 21:23 | |
rybridges | Hello! I have a review up here for bindep. Please take a look at your convenience -> https://review.openstack.org/#/c/492693/ | 21:23 |
rybridges | Thanks! | 21:23 |
*** gouthamr has joined #openstack-infra | 21:24 | |
clarkb | mriedem: thinking about that, we could add an fio run to pull general performance data on io and maybe put that in devstack's worlddump? though that only happens if devstack fails | 21:24 |
*** hrubi has joined #openstack-infra | 21:24 | |
*** thorst has quit IRC | 21:24 | |
sdague | clarkb: is there enough data in dstat? | 21:25 |
sdague | if io is bad, there should be a lot of wait time, right? | 21:25 |
clarkb | sdague: ya, though I half expect we'd see lots of wait time in general. But we can look at the data we have and see | 21:25 |
sdague | http://logs.openstack.org/54/487954/13/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/49d7ac4/logs/screen-dstat.txt.gz | 21:26 |
sdague | I'm seeing a 19.2 load at one point | 21:26 |
*** thorst has joined #openstack-infra | 21:27 | |
clarkb | rybridges: any reason you couldn't just bindep | sed or awk to do that? | 21:27 |
sdague | yeh, it's regularly going 50 - 60% wait | 21:27 |
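Until an fio run is wired into worlddump, the dstat log already captured in every run can be scanned for pathological iowait. A rough awk sketch — the column index (field 4 of dstat's default total-cpu-usage group) is an assumption, so check it against the header of the log in hand; the canned input stands in for `awk ... screen-dstat.txt`:

```shell
# Flag samples where the CPU "wai" column is >= 50%:
printf '2 1 96 1\n10 5 20 65\n5 3 2 90\n' |
awk '$4 ~ /^[0-9.]+$/ && $4 >= 50 { print "high wait:", $4 }'
```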
*** Sukhdev has joined #openstack-infra | 21:27 | |
fungi | clarkb: i think the need is to have bindep output even not-missing dependencies | 21:27 |
fungi | e.g. list all dependencies whether they're installed or not | 21:28 |
sdague | 94% wait at one point | 21:28 |
clarkb | fungi: oh I thought it did that if you left off the -b | 21:28 |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Zuulv3: Remove check/gate jobs from zuul and friends https://review.openstack.org/492697 | 21:28 |
sdague | right before the load 19.2 | 21:28 |
rybridges | Perhaps my knowledge of bindep is what is holding me back. From what I could tell, there is no way to get bindep to print all of the required packages regardless of whether or not they are already installed on the system | 21:28 |
sdague | so io is definitely not good | 21:28 |
rybridges | If there was some way to pull out the list of all required packages, then sure i could sed/awk it into the format that I want | 21:28 |
sdague | those are far outside our norms | 21:29 |
rybridges | but whenever i run bindep, it only shows missing packages, which is not useful for me when I am trying to build rpms with concrete dependency lists | 21:29 |
sdague | load average of 10 was kind of our danger mark, and why we stopped doing scenario tests in parallel | 21:29 |
*** annegentle has joined #openstack-infra | 21:29 | |
sdague | also why it's cpucount / 2 | 21:29 |
rybridges | If you leave off the -b it just prints the missing packages in a slightly different format than it does with the -b flag on | 21:30 |
fungi | clarkb: rybridges: right, even without -b you still only get the list of missing packages currently (just in a more verbose form), so it seems like a reasonable feature addition | 21:30 |
clarkb | sdague: ya it tended to stay under 8 when I was doing testing to see if 3/4 was sane | 21:30 |
clarkb | sdague: so 19 is quite high | 21:30 |
clarkb | fungi: gotcha | 21:30 |
clarkb | fungi: rybridges in that case I would list them like the other outputs list them (line by line iirc) and not assume formating | 21:30 |
clarkb | then it is easy to add in formatting if necessary using sed or whatever | 21:31 |
*** bmjen has joined #openstack-infra | 21:31 | |
*** thorst has quit IRC | 21:31 | |
rybridges | clarkb: I am printing it as a csv which is a very standard format that people can parse themselves however they like just as easily. and it plays nice with my particular use case which is putting the list of dependencies directly into an RPM .spec file. Furthermore, we are all python people here. We have a native csv library for python which makes dealing with the output very easy and flexible | 21:33 |
*** Hunner has joined #openstack-infra | 21:33 | |
*** Hunner has quit IRC | 21:33 | |
*** Hunner has joined #openstack-infra | 21:33 | |
mordred | clarkb, fungi, rybridges: maybe two features - a "list all" flag, and a "output in csv format" flag | 21:33 |
fungi | yeah, one-line-per-package would make sense if building dependency lists for an rpm specfile or debian/control file is the primary use case | 21:33 |
mordred | that way we have the new feature "I want all packages" in a form consumable by anyone who is using the current output - and we can also give the comma-separated option to people who are using the '-b' option too, should they desire it | 21:34 |
*** Hypercube32 has joined #openstack-infra | 21:34 | |
mnaser | rybridges is it.. normal to download an image at 700Kb/s from a region mirror? | 21:36 |
clarkb | ya I think providing both would be fine. I just worry that we'd have multiple output formats that differ between what output you want | 21:36 |
clarkb | where the format should be separate from what output you want | 21:36 |
mnaser | sorry about that extra highlight rybridges -- getting tired, no idea why you were in my chatbox :x | 21:36 |
clarkb | mnaser: if caches are stale yes | 21:36 |
mnaser | ahh that might explain it then | 21:36 |
rybridges | heh no worries mnaser | 21:36 |
mnaser | this was the first job to use it | 21:37 |
clarkb | mnaser: afs in particular invalidates its entire cache whenever afs volumes are published | 21:37 |
jeblair | fungi: would you mind okaying 429685? | 21:37 |
*** iyamahat has quit IRC | 21:37 | |
clarkb | and if there is large geographic distance between the mirror and the afs volume itself that is painful | 21:37 |
*** spzala has quit IRC | 21:37 | |
clarkb | mnaser: as for reverse apache proxy I'd imagine that's what the download speed is for whatever backend item is being pulled and isn't cached | 21:37 |
*** spzala has joined #openstack-infra | 21:38 | |
mnaser | 17 minutes to download the image, job took 1h10m .. so the k8s job should be around 50 minutes which is a huge improvement over the 1h40 or so it took before | 21:38 |
*** baoli has quit IRC | 21:38 | |
mnaser | exciting | 21:38 |
fungi | jeblair: looking | 21:38 |
*** spzala has quit IRC | 21:38 | |
*** baoli has joined #openstack-infra | 21:39 | |
jeblair | clarkb, mnaser: as long as the image isn't changed, it should be faster after the cache warms up, even with subsequent volume releases (in that case, it only needs to do a roundtrip to stat the file) | 21:39 |
*** spzala has joined #openstack-infra | 21:39 | |
clarkb | jeblair: oh right its the metadata, with small pypi packages that ends up being a large chunk of time but I imagine for big atomic fedora images it isn't | 21:39 |
jeblair | clarkb: yeah, that's what i'd expect | 21:40 |
mnaser | jeblair oh we'll be using the same image and bumping it only when we need, so a stat is nothing compared to the pain we'd deal with | 21:40 |
fungi | jeblair: that's a openstack/openstack-ansible-openstack_openrc change, is that the number you meant? | 21:40 |
mnaser | this reminds me of back when i had to run a glusterfs cluster | 21:40 |
jeblair | fungi: nope | 21:40 |
mnaser | it did a stat on every access across the entire cluster | 21:40 |
jeblair | fungi: how about 492685 ? :) | 21:40 |
fungi | let's see | 21:40 |
fungi | jeblair: done, seems prudent to keep work on that flowing | 21:41 |
jeblair | mnaser: afs's forward cache invalidation is nice there -- we only have to do the stat once after a volume release (which we're doing no more often than every 2 hours) | 21:41 |
openstackgerrit | Ryan proposed openstack-infra/bindep master: Add ability to list all deps https://review.openstack.org/492693 | 21:42 |
mnaser | jeblair seems much more reasonable for low write patterns (which is really the case of mirrors) | 21:42 |
*** jtomasek has joined #openstack-infra | 21:43 | |
*** spzala has quit IRC | 21:44 | |
jeblair | mnaser: yeah, i think the biggest disappointment is the pypi case where we release very frequently and also have tons of small files. other uses still seem to be holding up well. | 21:44 |
mnaser | jeblair depending on how nice pypi is with cache-control it could be a more suitable proxy caching case | 21:44 |
fungi | distro package mirrors, or original use case, seem to continue to be a good fit | 21:44 |
fungi | s/or original/our original/ | 21:45 |
clarkb | mnaser: it doens't end up working so well for that at least not out of the box with a naive implementation | 21:45 |
clarkb | mnaser: we tried it for a few days and pretty quickly ran into "lib released we want it now its not there because cache" | 21:45 |
mnaser | clarkb i guess it would need a bit more investment of time .. such as no caching on indexes but cache the .tar.gz files or something only | 21:46 |
mnaser | im sure its a lot more complicated than that | 21:46 |
jeblair | fungi, clarkb: more fun can be had with https://review.openstack.org/492687 | 21:46 |
*** bobh has quit IRC | 21:47 | |
clarkb | mnaser: ya indexes is the biggest thing, but then you are more susceptible to failures because you are grabbing every index | 21:47 |
clarkb | mnaser: definitely worth fiddling more with after the release | 21:47 |
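The split mnaser suggests — never cache the indexes, do cache the immutable artifacts — maps onto mod_cache directives roughly like this (paths are illustrative, not the deployed vhost config):

```apache
CacheRoot /var/cache/apache2/proxy
CacheEnable disk "/pypi/packages"   # sdists/wheels never change once published
CacheDisable "/pypi/simple"         # indexes must always be fetched fresh
```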
*** jtomasek has quit IRC | 21:47 | |
mnaser | clarkb eek true | 21:48 |
*** bobh has joined #openstack-infra | 21:48 | |
clarkb | ianw: were you wanting to restart the rax-ord mirror today? | 21:49 |
*** krtaylor has joined #openstack-infra | 21:49 | |
*** thorst has joined #openstack-infra | 21:49 | |
ianw | clarkb: i can, if we agree it's worth a try | 21:49 |
ianw | the pypi mismatch errors do seem to be largely isolated to it | 21:50 |
clarkb | jeblair: left a comment but +2'd (did not approve in case that is something you want to change) | 21:50 |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Zuulv3: Add project-config to check pipeline https://review.openstack.org/492700 | 21:50 |
clarkb | ianw: ya I think it likely is owrth a short | 21:50 |
clarkb | wow I can't type | 21:50 |
openstackgerrit | Merged openstack-infra/project-config master: Borrow some internap quota for Zuulv3 https://review.openstack.org/492685 | 21:51 |
jeblair | clarkb: yes, it would compete, but there's a stack of changes to pull zuul and 2 other repos out of zuulv2 so only zuulv3 will gate them. so we'll still only have one of the zuuls gating each repo. | 21:51 |
jeblair | clarkb: (child of that change and its dependencies) | 21:52 |
*** bobh has quit IRC | 21:53 | |
*** xyang1 has quit IRC | 21:54 | |
*** thorst has quit IRC | 21:54 | |
openstackgerrit | Merged openstack-infra/project-config master: Add Gnocchi charm and associated interfaces https://review.openstack.org/489946 | 21:55 |
fungi | i guess as long as the repos form a closed set from a shared gate queue perspective, there should be no issues | 21:56 |
*** Sukhdev has quit IRC | 21:57 | |
pabelanger | clarkb: I think we are getting IO issues in citycloud-lon1 | 21:58 |
pabelanger | http://paste.openstack.org/show/618108/ | 21:58 |
pabelanger | more than 1.5 hours to chown /opt/git | 21:58 |
clarkb | pabelanger: ya the dstat sdague linked to seemed to agree | 21:58 |
clarkb | pabelanger: I'm guessing it's a bad hypervisor and related to your original email though | 21:59 |
pabelanger | ya, think so | 21:59 |
*** baoli has quit IRC | 22:01 | |
pabelanger | clarkb: fungi: sdague: did we want to consider restoring 492493 | 22:01 |
pabelanger | mirror.lon1 is back up to 10 load also | 22:02 |
clarkb | pabelanger: well the commit message would need rewriting (the mirror is fine aiui) | 22:02 |
clarkb | oh huh | 22:02 |
fungi | right, if there's an overloaded hypervisor there, then the theory that we got the mirror migrated to a better compute node but are still getting random instances scheduled onto the bad one seems reasonable | 22:02 |
*** tesseract has quit IRC | 22:02 | |
pabelanger | and seems to be having IO issues | 22:02 |
fungi | oh, really we have slow performance on the mirror instance again too? | 22:02 |
fungi | i wonder if we're dos'ing their cloud :/ | 22:03 |
pabelanger | wait | 22:03 |
clarkb | it's all in wai too | 22:03 |
pabelanger | htcacheclean has multiple processes again | 22:03 |
pabelanger | maybe puppet hasn't run there | 22:03 |
pabelanger | but, IO is slow on the mirror | 22:03 |
clarkb | I don't see any flocks | 22:04 |
pabelanger | hostkey on mirror.lon1.citycloud.openstack.org changed, so puppet hasn't connected | 22:04 |
pabelanger | clarkb: fungi: is that expected if we migrated the VM? | 22:05 |
fungi | shouldn't be, no | 22:05 |
clarkb | no, possibly it wasn't done before (though I thought I got all of them) | 22:05 |
*** Apoorva_ has joined #openstack-infra | 22:05 | |
*** baoli has joined #openstack-infra | 22:05 | |
clarkb | also did it get the proxy updates? | 22:06 |
clarkb | if it got proxy updates it should have had working puppet at one time | 22:06 |
*** priteau has quit IRC | 22:07 | |
pabelanger | 2017-08-10 07:30:30,027 p=21556 u=root | mirror.lon1.citycloud.openstack.org : ok=5 changed=1 unreachable=1 failed=0 | 22:07 |
pabelanger | first time it started failing | 22:07 |
clarkb | it has the proxy config | 22:07 |
clarkb | so it did update recently if the key did indeed change | 22:07 |
clarkb | pabelanger: did you test ssh? could just be that server timed out making the ssh connection? | 22:08 |
pabelanger | clarkb: ya, I can SSH into it | 22:08 |
pabelanger | just need to accept new host key | 22:08 |
pabelanger | but, not sure why it would have changed | 22:08 |
fungi | ianw: is that around when the server migration happened? | 22:08 |
clarkb | I mean from the puppetmaster | 22:08 |
clarkb | just want to make sure that puppet master does see a new key and this wasn't related to the io problems | 22:09 |
pabelanger | clarkb: yes, I can hit it from puppet master | 22:09 |
*** Apoorva has quit IRC | 22:09 | |
*** Sukhdev has joined #openstack-infra | 22:09 | |
pabelanger | http://paste.openstack.org/show/618110/ | 22:09 |
clarkb | pabelanger: and did it ask to confirm a new key there? | 22:09 |
pabelanger | clarkb: yup | 22:09 |
pabelanger | it wants to remove old and accept new | 22:10 |
ianw | is 07:30 utc? | 22:10 |
clarkb | fun so ya definitely changed | 22:10 |
pabelanger | ianw: ya | 22:10 |
ianw | migration and reboots happened ~ 10 hours ago | 22:10 |
pabelanger | /etc/ssh have new timestamps on our host keys | 22:11 |
pabelanger | let me see what changed them | 22:11 |
mordred | I think depending on how the migration is implemented, it can be seen by cloud-init as a boot of a new server from an image snapshot of the server - and thus cloud-init would generate a new host key | 22:12 |
pabelanger | ya, it is cloud-init | 22:12 |
pabelanger | Aug 10 12:16:56 mirror [CLOUDINIT] util.py[DEBUG]: Running command ['/usr/lib/cloud-init/write-ssh-key-fingerprints', '', 'ssh-dss'] with allowed return codes [0] (shell=False, capture=True) | 22:13 |
fungi | oh, so not a live migration at all | 22:13 |
pabelanger | mordred comment appears to be correct | 22:13 |
fungi | just happened to boot from a snapshot of the old instance and keep the same ip addresses or something? | 22:13 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Bindmount /etc/lsb-release into bubblewrap https://review.openstack.org/490200 | 22:13 |
mordred | yah - certainly seems more like a snap/boot | 22:13 |
mordred | fungi: yah | 22:13 |
fungi | k | 22:13 |
fungi | then the changing host key isn't a complete surprise at least. that's good | 22:14 |
clarkb | step 0 is probably getting htcacheclean working as expected, then we can compare again? | 22:14 |
pabelanger | we are comfortable accepting new host keys then? | 22:14 |
clarkb | pabelanger: yes I think so | 22:14 |
pabelanger | k | 22:14 |
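Accepting the new key from the puppetmaster amounts to dropping the stale known_hosts entry and re-scanning the host. A minimal sketch of that mechanism, using a throwaway key pair and a scratch known_hosts file in place of the real puppetmaster files:

```shell
# Sketch: refresh a changed SSH host key. A generated throwaway key and a
# scratch known_hosts file stand in for the real mirror entry.
set -e
HOST=mirror.lon1.citycloud.openstack.org   # example host
DIR=$(mktemp -d)
KNOWN_HOSTS=$DIR/known_hosts

# Seed known_hosts with a valid entry that we treat as the stale key.
ssh-keygen -q -t ed25519 -N '' -f "$DIR/hostkey"
echo "$HOST $(cut -d' ' -f1-2 "$DIR/hostkey.pub")" > "$KNOWN_HOSTS"

# Drop the stale entry. On the real puppetmaster the follow-up would be:
#   ssh-keyscan -H "$HOST" >> "$KNOWN_HOSTS"
# to record the regenerated key.
ssh-keygen -R "$HOST" -f "$KNOWN_HOSTS"

if grep -q "^$HOST " "$KNOWN_HOSTS"; then
  echo "stale entry remains"
else
  echo "stale entry removed"
fi
```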
clarkb | if the large iowait comes back after that is fixed then we should consider restoring sdague's change | 22:14 |
*** priteau has joined #openstack-infra | 22:15 | |
pabelanger | going to kick.sh mirror.lon1.citycloud.openstack.org to confirm puppet runs | 22:15 |
pabelanger | mordred: we should consider using the fact-cache for ansible on puppetmaster.o.o too. shave a few seconds on each time we loop our ansible-playbook commands | 22:17 |
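Enabling that would be a few lines in the puppetmaster's ansible.cfg; a sketch only, with assumed cache path and timeout, not the actual config:

```ini
# Hypothetical ansible.cfg fragment: cache gathered facts to disk so
# repeated ansible-playbook loops skip re-gathering them each pass.
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /var/cache/ansible/facts
fact_caching_timeout = 86400
```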
*** openstackgerrit has quit IRC | 22:18 | |
pabelanger | clarkb: fungi: ianw: I also confirmed we start htcacheclean by default with the apache2 service, to clean our mod_cache_disk default directory. Will propose a patch to disable that too, since we are not using it | 22:19 |
ianw | dmsimard: on the python3.5 ... yeah fedora26 comes with python3.6 | 22:19 |
*** priteau has quit IRC | 22:20 | |
clarkb | dmsimard: ianw where are you having trouble with that? is it devstack? It hardcodes the version of python3; maybe we should just let it run whatever python3 is present? | 22:20 |
ianw | i haven't quite got to devstack+python3+fedora26 yet, but it's on my list :) dmsimard just mentioned it before | 22:21 |
*** openstackgerrit has joined #openstack-infra | 22:21 | |
openstackgerrit | Merged openstack-infra/project-config master: Zuulv3: add the gate pipeline https://review.openstack.org/492687 | 22:21 |
pabelanger | Ya, I think this mirror might need to be migrated or rebuilt, we are at 13+ load with puppetmaster connected via ansible | 22:22 |
pabelanger | some IO issues for sure | 22:22 |
clarkb | pabelanger: well it is running a whole bunch of processes trying to stat disk right? | 22:23 |
*** slaweq has quit IRC | 22:23 | |
clarkb | pabelanger: the apache run htcacheclean should be fine since its for a dir we don't use | 22:23 |
clarkb | (basically lets get it to a known good state then evaluate the io problems) | 22:24 |
ianw | afs_background is writing a couple hundred kb/s on it, but not much else | 22:26 |
*** skelso has quit IRC | 22:27 | |
ianw | watching iotop it's actually more like 1mb/s all up and pretty constant from afs. it doesn't seem like much, but i'm not sure what to expect | 22:29 |
jeblair | ianw: that may be afs saving data to its cache as fast as it is able to stream it from the server? | 22:30 |
openstackgerrit | Ramamani Yeleswarapu proposed openstack-infra/project-config master: Enable TLS in ironic gate jobs except grenade https://review.openstack.org/492231 | 22:33 |
*** EricGonc_ has quit IRC | 22:33 | |
ianw | yeah, i'd say ... is there a proc node for it or something | 22:35 |
ianw | even still, if it can't keep up at 1mb/s, it's not going to be having much fun | 22:35 |
*** felipemonteiro has quit IRC | 22:36 | |
openstackgerrit | Monty Taylor proposed openstack-infra/shade master: Make get_server_console tests more resilient https://review.openstack.org/492683 | 22:37 |
clarkb | ianw: ya | 22:37 |
jeblair | clarkb: can you ack https://review.openstack.org/492697 please? | 22:42 |
clarkb | jeblair: probably worth warning project-config reviewers that layout checks against infra repos will basically be skipped | 22:43 |
clarkb | (thinking AJaeger_ in particular) | 22:44 |
jeblair | clarkb: yes, i'll send email | 22:44 |
fungi | AJaeger_: is travelling this week so yes e-mail will be good | 22:44 |
pabelanger | Aug 10 22:43:42 mirror puppet-user[5313]: (/Stage[main]/Openstack_project::Mirror/Cron[apache-cache-cleanup]/command) command changed 'htcacheclean -n -p /var/cache/apache2/proxy -t -l 81920M > /dev/null' to 'flock -n /var/run/htcacheclean.lock htcacheclean -n -p /var/cache/apache2/proxy -t -l 81920M > /dev/null' | 22:44 |
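The change quoted above wraps the cron job in `flock -n`, so an overlapping run exits immediately instead of letting htcacheclean processes pile up. A small sketch of the mechanism, with a scratch lock file and `sleep` standing in for htcacheclean:

```shell
# Demonstrate what flock -n buys us: while one "cron run" holds the lock,
# an overlapping run bails out at once instead of queueing.
LOCK=$(mktemp)

flock -n "$LOCK" sleep 2 &   # first cron run holds the lock while it works
sleep 0.5                    # give it time to acquire the lock

if flock -n "$LOCK" true; then   # second run fires while the first is busy
  RESULT=overlapped
else
  RESULT="skipped, lock held"
fi
echo "$RESULT"
wait
```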
fungi | also, not sure whether anybody else noticed (possible i'm the only one here who cares anyway?) but nist's sp 800-63-3 update officially drops the recommendation to periodically change passwords | 22:45 |
clarkb | also this shouldn't affect depends on | 22:45 |
pabelanger | ianw: fungi: do either of you mind looking at https://review.openstack.org/492666/ to bring .npmrc back online | 22:45 |
mordred | fungi: neat | 22:46 |
jeblair | fungi: oh i didn't notice that. i had heard about all the other good stuff (like don't require weird chars, allow long passphrases, allow copy/paste, etc) | 22:46 |
clarkb | jeblair: only other thing I can think of is how worried are you about wedging zuul (are things in flux enough to make that a big concern?) | 22:46 |
clarkb | I guess worst case you apply the fix directly then have that gate the fix | 22:47 |
clarkb | (if that makes sense) | 22:47 |
fungi | yeah, all it took was decades of security researchers complaining that forcing periodic passwords changes did more harm than good | 22:47 |
*** esberglu_ has quit IRC | 22:47 | |
jeblair | clarkb: it is possible, maybe even likely. but i think it's worth going ahead and exercising it a bit more, and maybe occasionally we have to force push or fix as you suggest. | 22:48 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Add openstack-zuul-jobs to gate pipeline https://review.openstack.org/492692 | 22:48 |
*** annegentle has quit IRC | 22:48 | |
mordred | fungi: now how long will it take for our out of date policies to catch up with the new recommendations? | 22:48 |
fungi | heh | 22:49 |
*** annegentle has joined #openstack-infra | 22:49 | |
clarkb | by the time we get there nist will have decided rotations are a good thing again | 22:49 |
fungi | mordred: i'm at least glad i pushed keystone hard to do pbkdf2 when they changed out their password hash backend | 22:49 |
fungi | it's one of the couple of key derivations mentioned as recommended now | 22:49 |
fungi | (the other being balloon) | 22:50 |
*** yamamoto has joined #openstack-infra | 22:51 | |
fungi | as for hamstringing/confusing zuul, we've just caught up on the job backlog in the past couple hours, so at least as long as we keep an eye on it and notice sudden issues i don't think we're likely to cause significant adverse impact unless we allow it to persist until tomorrow | 22:51 |
clarkb | fungi: oh I meant for zuul gating with v3 zuul | 22:52 |
clarkb | I don't expect there will be problems for everyone else | 22:52 |
fungi | oh, got it. thought maybe you were concerned about corner cases in v2 reconfiguration | 22:52 |
fungi | unknown unknowns | 22:52 |
pabelanger | odyssey4me: just saw some traffic to the images.linuxcontainers reverse proxy cache on a mirror, looks to be working | 22:53 |
*** annegentle has quit IRC | 22:53 | |
jeblair | clarkb: email sent, thanks | 22:54 |
*** spzala has joined #openstack-infra | 22:55 | |
*** spzala has quit IRC | 22:55 | |
*** spzala has joined #openstack-infra | 22:55 | |
*** spzala has quit IRC | 22:55 | |
*** spzala has joined #openstack-infra | 22:56 | |
*** spzala has quit IRC | 22:56 | |
*** spzala has joined #openstack-infra | 22:56 | |
*** spzala has quit IRC | 22:56 | |
openstackgerrit | Merged openstack-infra/project-config master: Revert "Remove NPM mirror settings" https://review.openstack.org/492666 | 22:57 |
*** spzala has joined #openstack-infra | 22:57 | |
*** spzala has quit IRC | 22:57 | |
*** spzala has joined #openstack-infra | 22:57 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add zuul-jobs to gate pipeline https://review.openstack.org/492691 | 22:57 |
*** spzala has quit IRC | 22:58 | |
*** vhosakot has quit IRC | 22:58 | |
*** vhosakot has joined #openstack-infra | 22:59 | |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Add SSH private key for static.o.o https://review.openstack.org/492671 | 22:59 |
mordred | jeblair: 492697 has 3 +2s already - so you can pull the trigger whenever | 23:00 |
*** abelur_ has joined #openstack-infra | 23:01 | |
jeblair | mordred: ya was going to wait until the final move lands | 23:01 |
mordred | ++ | 23:01 |
jeblair | mordred, pabelanger: while we're thinking about it, https://review.openstack.org/492700 would be really good | 23:02 |
pabelanger | ya, +2 | 23:02 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add Zuul to gate pipeline https://review.openstack.org/492689 | 23:03 |
clarkb | odyssey4me: pabelanger linux bridge + vxlan is being tested with a neutron change (they have a few jobs that rely on this overlay setup) I've also got one more improvement (to make the setup more symmetrical on the test nodes) that I will push once we have first round of results from neutron jobs | 23:03 |
pabelanger | clarkb: cool, sounds promising | 23:04 |
*** annegentle has joined #openstack-infra | 23:05 | |
*** rbrndt has quit IRC | 23:05 | |
*** Swami has quit IRC | 23:05 | |
openstackgerrit | Merged openstack-infra/shade master: Make QoS rules required parameters to be not optional https://review.openstack.org/491033 | 23:06 |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Fix typo in nodepoolv3 config https://review.openstack.org/492706 | 23:06 |
jeblair | mordred, pabelanger: ^ whoops | 23:06 |
clarkb | pabelanger: ya if this ends up not making existing jobs unhappy I think we can likely merge it post release then update our images to remove the rdo repo | 23:06 |
fungi | that'll be a huuuuge improvement | 23:08 |
clarkb | pabelanger: do you know if any of the puppet or tripleo jobs use this overlay stuff too? I could depends on in changes to them to double check them (but probably won't worry about that until I know neutron is happy) | 23:08 |
clarkb | also 3.8 seems to be the magic kernel that you need | 23:08 |
clarkb | and centos is 3.10 so hopeful it will work there too | 23:09 |
jeblair | i manually made that change on nl01 | 23:09 |
*** lihi has quit IRC | 23:10 | |
*** dimak has quit IRC | 23:10 | |
*** dimak has joined #openstack-infra | 23:11 | |
*** pbourke has quit IRC | 23:11 | |
*** xarses_ has quit IRC | 23:11 | |
*** lihi has joined #openstack-infra | 23:11 | |
*** gouthamr has quit IRC | 23:12 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 23:12 |
*** pbourke has joined #openstack-infra | 23:13 | |
*** annegentle has quit IRC | 23:13 | |
openstackgerrit | Merged openstack-infra/project-config master: Zuulv3: Remove check/gate jobs from zuul and friends https://review.openstack.org/492697 | 23:14 |
jeblair | that's a milestone ^ :) zuul v3 has moved from "self-hosting" to "self-gating" | 23:17 |
*** funzo has joined #openstack-infra | 23:17 | |
jeblair | (we hope; i guess we'll know when we approve the next change ;) | 23:17 |
mordred | jeblair: \o/ | 23:17 |
mordred | jeblair: that's what I'm looking forward to ... | 23:17 |
openstackgerrit | Merged openstack-infra/project-config master: Zuulv3: Add project-config to check pipeline https://review.openstack.org/492700 | 23:17 |
pabelanger | ianw: clarkb: sigh, glean is running after rc.local on centos-7. So /etc/resolv.conf is not pointing to nameserver 127.0.0.1 | 23:18 |
pabelanger | ianw: clarkb: all centos-7 nodes in infracloud just point to 8.8.8.8 | 23:19 |
clarkb | pabelanger: huh, I don't think anything in the diff should've affected that right? | 23:19 |
clarkb | pabelanger: we aren't on fire with that though right? its working just not as we intend? | 23:19 |
pabelanger | glean==1.9.1 | 23:20 |
pabelanger | ya, this is an old image | 23:20 |
clarkb | ah ok at least it isn't a new regression from 1.9.2 | 23:20 |
pabelanger | clarkb: right, just saw a job fail on DNS in infracloud-vanilla. First time I've seen that | 23:20 |
*** funzo has quit IRC | 23:21 | |
pabelanger | clarkb: we are getting 8.8.8.8 from config-drive, so we should be able to just change it to 127.0.0.1 for now right? | 23:21 |
pabelanger | then work on patch to glean for configfile support and disable DNS updates | 23:21 |
*** hongbin has quit IRC | 23:21 | |
clarkb | pabelanger: ya thats a network setting in the cloud. Thats an interesting hack to make it do what we want. | 23:22 |
ianw | i think i'm missing some context; did this just start happening? | 23:22 |
pabelanger | ianw: not sure why it started, but I just saw a job in infracloud-vanilla fail with Could not resolve host: mirror.regionone.infracloud-vanilla.openstack.org | 23:23 |
clarkb | ianw: no I don't think it just started, guessing it's just been noticed because otherwise google dns usually works | 23:23 |
pabelanger | http://logs.openstack.org/63/491463/5/gate/gate-tripleo-ci-centos-7-scenario001-multinode-oooq-puppet/5055004/logs/undercloud/home/jenkins/undercloud_install.log.txt.gz | 23:23 |
pabelanger | I confirmed ubuntu-xenial is okay | 23:23 |
*** slaweq has joined #openstack-infra | 23:23 | |
clarkb | (but I could be wrong maybe something in centos packaging bumped systemd ordering recently?) | 23:23 |
pabelanger | but I manually booted a node and confirmed glean is running after rc.local service | 23:23 |
pabelanger | clarkb: that is possible | 23:24 |
openstackgerrit | Merged openstack-infra/project-config master: Fix typo in nodepoolv3 config https://review.openstack.org/492706 | 23:26 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 23:27 |
pabelanger | clarkb: okay, maybe tomorrow I'll try creating a new subnet in infracloud-vanilla with 127.0.0.1 as the dns server | 23:27 |
clarkb | that may be something you can update in existing subnets? not sure | 23:28 |
ianw | rc.local is just "After=network.target" | 23:28 |
pabelanger | clarkb: ya, we can. Wasn't sure if we wanted to do it live | 23:28 |
*** slaweq has quit IRC | 23:28 | |
clarkb | pabelanger: if we double check the nodes boot and get an unbound running with the right config and with the right resolvers and listening on localhost I think that is relatively safe | 23:29 |
pabelanger | clarkb: okay, want me to update it now? | 23:30 |
fungi | afaik image uploads to infracloud have been paused for a couple days | 23:30 |
clarkb | pabelanger: one downside to that is you won't be able to boot anything in that subnet unless it is configured to have a local resolver | 23:30 |
clarkb | (and since it is provider networking I think everything is on that one subnet) | 23:30 |
pabelanger | Ya, we have single subnet provider-subnet-infracloud | 23:30 |
*** caphrim007 has quit IRC | 23:30 | |
*** caphrim007 has joined #openstack-infra | 23:31 | |
pabelanger | actually, I wonder if we could just remove dns server | 23:32 |
pabelanger | and glean will just skip setting it | 23:32 |
pabelanger | then we'll default to 127.0.0.1 | 23:32 |
pabelanger | clarkb: ^thoughts on that? | 23:33 |
pabelanger | openstack subnet set --name provider-subnet-infracloud --no-dns-nameservers should be the command | 23:33 |
*** yamamoto has quit IRC | 23:33 | |
*** claudiub has quit IRC | 23:35 | |
*** Swami has joined #openstack-infra | 23:35 | |
*** caphrim007 has quit IRC | 23:36 | |
clarkb | pabelanger: that may be more friendly to other hosts that may not have an unbound running (though I don't know that we'd do that so maybe its not worth worrying about) | 23:36 |
pabelanger | clarkb: k, let me try it real quick. It shouldn't affect nodepool, nodes just won't be able to run configure-mirrors.sh if it failed | 23:38 |
*** thorst has joined #openstack-infra | 23:38 | |
pabelanger | okay, centos-7 booted with 127.0.0.1 | 23:40 |
clarkb | and dns works? | 23:40 |
pabelanger | and glean ignored writing /etc/resolv.conf | 23:40 |
pabelanger | yup | 23:40 |
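The check being confirmed here can be scripted. A sketch using a scratch file in place of the node's real /etc/resolv.conf (a full check on the booted node would also confirm unbound answers, e.g. with `dig @127.0.0.1`):

```shell
# Verify a resolv.conf lists only the local unbound resolver, i.e. no
# external server (such as 8.8.8.8 from config-drive) slipped in.
RESOLV=$(mktemp)                          # stand-in for /etc/resolv.conf
printf 'nameserver 127.0.0.1\n' > "$RESOLV"

if grep '^nameserver' "$RESOLV" | grep -vq '127\.0\.0\.1$'; then
  STATUS="unexpected external resolver"
else
  STATUS="resolv.conf ok"
fi
echo "$STATUS"
```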
clarkb | hax | 23:40 |
*** yamamoto has joined #openstack-infra | 23:40 | |
pabelanger | let me look at nodepool debug to make sure nodes are still booting | 23:41 |
pabelanger | clarkb: ya, nodepool is happy. Going to do the same in chocolate now | 23:45 |
clarkb | cool | 23:45 |
*** vhosakot has quit IRC | 23:45 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Bindmount /etc/lsb-release into bubblewrap https://review.openstack.org/490200 | 23:45 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Override tox requirments with zuul git repos https://review.openstack.org/489719 | 23:46 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Rename tox_command_line in docs to tox_extra_args https://review.openstack.org/489758 | 23:46 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: WIP job for non-OpenStack sphinx build https://review.openstack.org/492709 | 23:46 |
pabelanger | clarkb: done | 23:46 |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Zuul v3: add description to check pipeline https://review.openstack.org/492710 | 23:47 |
pabelanger | #status log removed 8.8.8.8 dns servers from both infracloud-chocolate and infracloud-vanilla provider-subnet-infracloud subnet | 23:47 |
*** sdague has quit IRC | 23:47 | |
*** pabelanger has quit IRC | 23:47 | |
*** pabelanger has joined #openstack-infra | 23:47 | |
pabelanger | #status log removed 8.8.8.8 dns servers from both infracloud-chocolate and infracloud-vanilla provider-subnet-infracloud subnet | 23:47 |
openstackstatus | pabelanger: finished logging | 23:47 |
openstackgerrit | Jeremy Stanley proposed openstack-infra/infra-specs master: Gerrit ContactStore Removal is implemented https://review.openstack.org/492287 | 23:47 |
fungi | pabelanger: i guess that's something we need to encode in the puppet module? | 23:48 |
clarkb | lon1 mirror fixes have been in place for almost an hour, load average is down to ~4 which is an improvement but still high wait | 23:49 |
pabelanger | fungi: ya, I'm looking to see where we set that up. Not sure if cloud-launcher or puppet is the place | 23:49 |
pabelanger | Yup, puppet-infracloud | 23:50 |
jeblair | pabelanger: http://logs.openstack.org/00/490200/5/check/tox-py35/cad03b1/job-output.txt.gz#_2017-08-10_23_49_14_718336 | 23:50 |
jeblair | does that mean anything to you? | 23:50 |
jeblair | that may be one of our first zuulv3 jobs on internap | 23:51 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: [WIP] LVM support for dib-block-device https://review.openstack.org/472065 | 23:51 |
pabelanger | jeblair: apt task needs python bindings for apt. It must not be able to install that package for some reason | 23:51 |
pabelanger | jeblair: I think we could avoid that step by adding it to our DIB? | 23:52 |
jeblair | pabelanger: hrm... are we in a chicken/egg situation? | 23:53 |
jeblair | pabelanger: the apt-module needs python-apt to be installed in order to run apt commands, but we're using it to run apt-get update so that apt has a package cache and can function? | 23:53 |
pabelanger | jeblair: ya, I think so. I am guessing our apt cache is too old to find python-apt for us to update the cache | 23:53 |
pabelanger | jeblair: I think so | 23:53 |
jeblair | pabelanger: well, we did just change the source list under it, so i think that makes it immediately out of date | 23:54 |
jeblair | pabelanger: should we just make that a shell command? | 23:54 |
pabelanger | jeblair: I think in this case, adding it infra-packages-needs like we do for python-selinux make sense | 23:54 |
pabelanger | jeblair: Ya, maybe shell in this case | 23:54 |
jeblair | pabelanger: since this is in zuul-jobs, that might make it most widely compatible without us having to tell folks they need stuff on their images | 23:55 |
openstackgerrit | Paul Belanger proposed openstack-infra/puppet-infracloud master: Remove dns_servers from provider-subnet-infracloud https://review.openstack.org/492712 | 23:55 |
pabelanger | jeblair: yes, I agree | 23:56 |
jeblair | pabelanger: i'm working on a change | 23:56 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Add publish-openstack-artifacts base job https://review.openstack.org/492713 | 23:56 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Remove openstack-publish-tarball base job https://review.openstack.org/492714 | 23:56 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Use openstack-publish-artifacts base job https://review.openstack.org/492715 | 23:56 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Use shell for apt-get update https://review.openstack.org/492716 | 23:57 |
jeblair | pabelanger, mordred: did i ansible right? ^ | 23:57 |
mordred | jeblair: looking. | 23:57 |
mordred | jeblair, pabelanger: also - there's the rename stack for the base job | 23:57 |
pabelanger | Hmm, I think ansible-lint will fail. let's see, but looks right | 23:58 |
mordred | jeblair: yes. although command is preferred to shell unless you actually need shell | 23:58 |
jeblair | mordred: i can never remember :) | 23:58 |
jeblair | pabelanger: what's linty? | 23:58 |
pabelanger | what mordred just said | 23:58 |
jeblair | ok i'll just change it then | 23:59 |
pabelanger | tag: skip_ansible_lint I think is how we ignore linting per task | 23:59 |
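Putting those pieces together, such a task might look roughly like the following. This is a sketch only, not the actual 492716 change; the task name, condition, and privilege handling are assumptions:

```yaml
# Hypothetical pre-run task: refresh the apt cache via a plain command,
# since Ansible's apt module needs python-apt, which a fresh image may
# lack until after the first apt-get update.
- name: Update apt cache
  command: apt-get update
  become: yes
  when: ansible_os_family == 'Debian'
  tags:
    - skip_ansible_lint  # using command instead of the apt module is deliberate
```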
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Use shell for apt-get update https://review.openstack.org/492716 | 23:59 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!