*** thorst has joined #openstack-infra | 00:00 | |
fungi | ianw: i suppose i could work an option for longer-term entries into http://git.openstack.org/cgit/openstack-infra/puppet-exim/tree/templates/aliases.erb but likely we actually need to do something more manageable if this persists much longer (like an actual spam identification system) | 00:00 |
fungi | i've resisted the pressure to do that so far, but things have never been anywhere near this bad until the past few weeks | 00:01 |
*** jamesmcarthur has quit IRC | 00:02 | |
*** thorst has quit IRC | 00:02 | |
*** slaweq has quit IRC | 00:02 | |
pabelanger | dmsimard: Ah, yes. It was limited to the CR repo for 7.4 | 00:03 |
fungi | okay, mysqldump just finished, gerrit restarting now | 00:05 |
*** thorst has joined #openstack-infra | 00:05 | |
fungi | gerrit webui seems to be working again | 00:06 |
fungi | #status log Gerrit on review.openstack.org restarted just now, and is no longer using contact store functionality or configuration options | 00:07 |
openstackstatus | fungi: finished logging | 00:07 |
fungi | i'll get a notice out to the infra ml tomorrow about https://review.openstack.org/491090 | 00:09 |
fungi | other than that, i think the gerrit-contactstore-removal spec is done | 00:09 |
*** jamesmcarthur has joined #openstack-infra | 00:12 | |
*** jkilpatr has quit IRC | 00:13 | |
*** dingyichen has joined #openstack-infra | 00:17 | |
*** jamesmcarthur has quit IRC | 00:17 | |
*** gmann has quit IRC | 00:18 | |
*** gmann has joined #openstack-infra | 00:18 | |
*** slaweq has joined #openstack-infra | 00:19 | |
*** thorst has quit IRC | 00:23 | |
*** slaweq has quit IRC | 00:23 | |
*** thorst has joined #openstack-infra | 00:23 | |
*** harlowja has quit IRC | 00:25 | |
openstackgerrit | Merged openstack/diskimage-builder master: Bump fedora/fedora-minimal DIB_RELEASE 26 https://review.openstack.org/482570 | 00:26 |
*** thorst has quit IRC | 00:27 | |
*** slaweq has joined #openstack-infra | 00:29 | |
*** claudiub has quit IRC | 00:34 | |
*** slaweq has quit IRC | 00:36 | |
*** armax has quit IRC | 00:37 | |
*** thorst has joined #openstack-infra | 00:38 | |
pabelanger | ianw: clarkb: thanks, elastic-recheck seems to be detecting tripleo failures now | 00:39 |
*** slaweq has joined #openstack-infra | 00:41 | |
*** Apoorva_ has joined #openstack-infra | 00:42 | |
*** bobh has joined #openstack-infra | 00:45 | |
*** Apoorva has quit IRC | 00:45 | |
*** slaweq has quit IRC | 00:46 | |
*** LindaWang has joined #openstack-infra | 00:46 | |
*** Apoorva_ has quit IRC | 00:47 | |
*** armax has joined #openstack-infra | 00:48 | |
*** liujiong has joined #openstack-infra | 00:48 | |
*** slaweq has joined #openstack-infra | 00:51 | |
*** markvoelker has joined #openstack-infra | 00:55 | |
*** slaweq has quit IRC | 00:56 | |
ianw | clarkb: any thoughts on http://logs.openstack.org/78/480778/2/check/gate-tempest-dsvm-neutron-full-centos-7-nv/8d9e9cc/logs/screen-n-cpu.txt.gz#_Aug_01_14_20_47_808336 | 00:56 |
ianw | unfortunately (?) your name comes up when looking for proxy errors in devstack logs :) | 00:57 |
*** markvoelker_ has joined #openstack-infra | 00:57 | |
ianw | it might be a red herring though, maybe it's a real neutron issue that bubbles up to nova like this ... | 00:57 |
*** markvoelker has quit IRC | 01:01 | |
*** slaweq has joined #openstack-infra | 01:01 | |
*** gouthamr has quit IRC | 01:02 | |
*** armax has quit IRC | 01:07 | |
*** slaweq has quit IRC | 01:07 | |
*** tuanluong has joined #openstack-infra | 01:10 | |
*** shu-mutou-AWAY is now known as shu-mutou | 01:10 | |
*** zhurong has joined #openstack-infra | 01:11 | |
*** pahuang has quit IRC | 01:18 | |
*** slaweq has joined #openstack-infra | 01:23 | |
*** thorst has quit IRC | 01:24 | |
*** slaweq has quit IRC | 01:28 | |
*** rwsu has quit IRC | 01:32 | |
ianw | 23.253.166.156 - - [01/Aug/2017:14:20:39 +0000] "GET /identity/v3/auth/tokens HTTP/1.1" 200 3714 | 01:32 |
ianw | 23.253.166.156 - - [01/Aug/2017:14:19:47 +0000] "GET /v2.0/auto-allocated-topology/f6806985392e4ece8ac13fb6784131b6 HTTP/1.1" 200 174 | 01:32 |
ianw | 23.253.166.156 - - [01/Aug/2017:14:20:39 +0000] "GET /identity/v3/auth/tokens HTTP/1.1" 200 3586 | 01:32 |
ianw | nothing good ever happens when time goes backwards | 01:32 |
*** slaweq has joined #openstack-infra | 01:34 | |
*** pahuang has joined #openstack-infra | 01:35 | |
*** dougwig has quit IRC | 01:36 | |
*** cuongnv has joined #openstack-infra | 01:37 | |
*** slaweq has quit IRC | 01:40 | |
*** rwsu has joined #openstack-infra | 01:44 | |
*** camunoz has quit IRC | 01:48 | |
*** pahuang has quit IRC | 01:54 | |
*** slaweq has joined #openstack-infra | 01:56 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 02:00 |
*** bobh has quit IRC | 02:00 | |
*** slaweq has quit IRC | 02:00 | |
*** ramishra has quit IRC | 02:03 | |
*** iyamahat has quit IRC | 02:06 | |
*** slaweq has joined #openstack-infra | 02:06 | |
*** yamahata has quit IRC | 02:07 | |
*** pahuang has joined #openstack-infra | 02:07 | |
*** jamesmcarthur has joined #openstack-infra | 02:13 | |
*** slaweq has quit IRC | 02:13 | |
*** gildub has joined #openstack-infra | 02:14 | |
*** dhill_ has quit IRC | 02:15 | |
*** dhill_ has joined #openstack-infra | 02:15 | |
*** Marx314 has quit IRC | 02:16 | |
*** mtreinish has quit IRC | 02:17 | |
*** fbouliane has quit IRC | 02:17 | |
*** gtmanfred has quit IRC | 02:17 | |
*** rbergeron has quit IRC | 02:18 | |
*** lifeless has quit IRC | 02:18 | |
*** tnarg has quit IRC | 02:18 | |
*** rodrigods has quit IRC | 02:19 | |
*** rbergeron has joined #openstack-infra | 02:19 | |
*** lifeless has joined #openstack-infra | 02:19 | |
*** mtreinish has joined #openstack-infra | 02:22 | |
*** gtmanfred has joined #openstack-infra | 02:23 | |
*** rodrigods has joined #openstack-infra | 02:23 | |
*** fbouliane has joined #openstack-infra | 02:23 | |
*** gcb has joined #openstack-infra | 02:29 | |
*** ramishra has joined #openstack-infra | 02:34 | |
*** sree has joined #openstack-infra | 02:34 | |
*** bobh has joined #openstack-infra | 02:35 | |
*** sree has quit IRC | 02:39 | |
*** jamesmcarthur has quit IRC | 02:40 | |
*** slaweq has joined #openstack-infra | 02:40 | |
*** armax has joined #openstack-infra | 02:41 | |
*** armax has quit IRC | 02:44 | |
*** slaweq has quit IRC | 02:45 | |
*** yamamoto_ has joined #openstack-infra | 02:46 | |
*** yamamoto has quit IRC | 02:46 | |
*** jamesmcarthur has joined #openstack-infra | 02:48 | |
*** hongbin_ has joined #openstack-infra | 02:49 | |
*** hongbin has quit IRC | 02:49 | |
*** hongbin_ has quit IRC | 02:49 | |
*** hongbin has joined #openstack-infra | 02:49 | |
*** tnovacik has joined #openstack-infra | 02:50 | |
*** slaweq has joined #openstack-infra | 02:51 | |
openstackgerrit | jimmygc proposed openstack/diskimage-builder master: Fix ubuntu minimal build failure https://review.openstack.org/491653 | 02:55 |
*** slaweq has quit IRC | 02:56 | |
*** jamesmcarthur has quit IRC | 02:56 | |
*** slaweq has joined #openstack-infra | 03:01 | |
*** slaweq has quit IRC | 03:05 | |
*** ramineni has joined #openstack-infra | 03:06 | |
*** ramineni has left #openstack-infra | 03:07 | |
*** slaweq has joined #openstack-infra | 03:11 | |
*** spzala has quit IRC | 03:16 | |
*** david-lyle has quit IRC | 03:16 | |
*** slaweq has quit IRC | 03:18 | |
*** tnovacik has quit IRC | 03:23 | |
*** david-lyle has joined #openstack-infra | 03:23 | |
*** jascott1_ has quit IRC | 03:24 | |
*** nicolasbock has joined #openstack-infra | 03:25 | |
*** jascott1 has joined #openstack-infra | 03:25 | |
*** jascott1 has quit IRC | 03:26 | |
*** jascott1 has joined #openstack-infra | 03:27 | |
*** slaweq has joined #openstack-infra | 03:33 | |
*** slaweq has quit IRC | 03:38 | |
*** bobh has quit IRC | 03:38 | |
*** nicolasbock has quit IRC | 03:39 | |
*** slaweq has joined #openstack-infra | 03:43 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Update the zuul-sphinx extension config https://review.openstack.org/491134 | 03:44 |
*** baoli has quit IRC | 03:44 | |
*** Dinesh_Bhor has joined #openstack-infra | 03:45 | |
*** dave-mccowan has quit IRC | 03:47 | |
*** slaweq has quit IRC | 03:49 | |
*** nicolasbock has joined #openstack-infra | 03:50 | |
*** links has joined #openstack-infra | 03:53 | |
*** hongbin has quit IRC | 03:56 | |
*** esberglu has quit IRC | 03:59 | |
*** EricGonczer_ has joined #openstack-infra | 04:02 | |
*** ykarel has joined #openstack-infra | 04:02 | |
*** EricGonczer_ has quit IRC | 04:20 | |
*** adisky__ has joined #openstack-infra | 04:21 | |
*** thorst has joined #openstack-infra | 04:25 | |
*** thorst has quit IRC | 04:30 | |
*** harlowja has joined #openstack-infra | 04:35 | |
*** spzala has joined #openstack-infra | 04:47 | |
*** esberglu has joined #openstack-infra | 04:49 | |
*** spzala has quit IRC | 04:51 | |
*** pahuang has quit IRC | 04:52 | |
*** esberglu has quit IRC | 04:53 | |
*** jamesmcarthur has joined #openstack-infra | 04:57 | |
*** sflanigan has quit IRC | 04:59 | |
*** slaweq has joined #openstack-infra | 05:00 | |
*** claudiub has joined #openstack-infra | 05:01 | |
*** hareesh has joined #openstack-infra | 05:01 | |
*** jamesmcarthur has quit IRC | 05:02 | |
*** pahuang has joined #openstack-infra | 05:05 | |
*** slaweq has quit IRC | 05:05 | |
*** eranrom has quit IRC | 05:08 | |
*** slaweq has joined #openstack-infra | 05:10 | |
*** nicolasbock has quit IRC | 05:11 | |
*** harlowja has quit IRC | 05:14 | |
*** slaweq has quit IRC | 05:15 | |
*** waynr has joined #openstack-infra | 05:19 | |
*** waynr has left #openstack-infra | 05:20 | |
*** slaweq has joined #openstack-infra | 05:20 | |
*** slaweq has quit IRC | 05:27 | |
*** psachin has joined #openstack-infra | 05:33 | |
*** yamahata has joined #openstack-infra | 05:41 | |
openstackgerrit | Akihiro Motoki proposed openstack-infra/project-config master: Add release permission for neutron-vpnaas and dashboard https://review.openstack.org/491670 | 05:43 |
*** sree has joined #openstack-infra | 05:49 | |
*** nicolasbock has joined #openstack-infra | 05:53 | |
*** cshastri has joined #openstack-infra | 05:53 | |
*** thorst has joined #openstack-infra | 05:58 | |
*** markus_z has joined #openstack-infra | 05:58 | |
*** sflanigan has joined #openstack-infra | 06:03 | |
*** thorst has quit IRC | 06:03 | |
*** bhavik1 has joined #openstack-infra | 06:05 | |
*** pgadiya has joined #openstack-infra | 06:18 | |
*** rcernin has joined #openstack-infra | 06:21 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Fix detail headers order for nodepool list https://review.openstack.org/491678 | 06:25 |
*** coolsvap has joined #openstack-infra | 06:26 | |
*** kjackal_ has joined #openstack-infra | 06:28 | |
*** bhavik1 has quit IRC | 06:53 | |
*** stevebaker has quit IRC | 06:54 | |
*** slaweq has joined #openstack-infra | 06:58 | |
*** zhurong has quit IRC | 06:59 | |
*** pcaruana has joined #openstack-infra | 07:00 | |
*** markvoelker_ has quit IRC | 07:01 | |
*** stevebaker has joined #openstack-infra | 07:02 | |
*** spzala has joined #openstack-infra | 07:04 | |
*** slaweq has quit IRC | 07:04 | |
*** jascott1 has quit IRC | 07:05 | |
*** jascott1 has joined #openstack-infra | 07:05 | |
*** markvoelker has joined #openstack-infra | 07:07 | |
*** markvoelker has quit IRC | 07:08 | |
*** markvoelker has joined #openstack-infra | 07:08 | |
*** spzala has quit IRC | 07:09 | |
*** aarefiev has joined #openstack-infra | 07:10 | |
*** jascott1 has quit IRC | 07:10 | |
*** gtrxcb has quit IRC | 07:11 | |
*** florianf has joined #openstack-infra | 07:15 | |
*** aviau has quit IRC | 07:19 | |
*** aviau has joined #openstack-infra | 07:19 | |
*** tesseract has joined #openstack-infra | 07:21 | |
*** ralonsoh has joined #openstack-infra | 07:22 | |
*** Swami has quit IRC | 07:27 | |
*** slaweq has joined #openstack-infra | 07:30 | |
*** Douhet has quit IRC | 07:31 | |
*** Douhet has joined #openstack-infra | 07:32 | |
*** slaweq has quit IRC | 07:36 | |
*** ccamacho has joined #openstack-infra | 07:38 | |
*** yamamoto_ has quit IRC | 07:44 | |
*** sflanigan has quit IRC | 07:48 | |
*** yamamoto has joined #openstack-infra | 07:50 | |
*** alexchadin has joined #openstack-infra | 07:55 | |
*** e0ne has joined #openstack-infra | 07:56 | |
*** ralonsoh_ has joined #openstack-infra | 07:57 | |
*** ralonsoh has quit IRC | 07:57 | |
*** rtjure has quit IRC | 07:58 | |
*** thorst has joined #openstack-infra | 07:59 | |
*** ralonsoh_ is now known as ralonsoh | 08:02 | |
*** arturb has quit IRC | 08:02 | |
*** thorst has quit IRC | 08:04 | |
*** shardy has joined #openstack-infra | 08:06 | |
*** seanhandley has left #openstack-infra | 08:07 | |
*** gildub has quit IRC | 08:09 | |
*** priteau has joined #openstack-infra | 08:09 | |
*** mwarad has joined #openstack-infra | 08:15 | |
*** _mwarad_ has joined #openstack-infra | 08:15 | |
*** _mwarad_ has quit IRC | 08:15 | |
*** derekh has joined #openstack-infra | 08:20 | |
*** dizquierdo has joined #openstack-infra | 08:20 | |
*** slaweq has joined #openstack-infra | 08:25 | |
*** dingyichen has quit IRC | 08:25 | |
*** lucas-afk is now known as lucasagomes | 08:26 | |
openstackgerrit | Merged openstack-infra/project-config master: Make neutron functional job non-voting https://review.openstack.org/491548 | 08:27 |
*** esberglu has joined #openstack-infra | 08:28 | |
bauzas | mmm, can't we now provide HTTP links in a gerrit comment ? | 08:29 |
*** slaweq has quit IRC | 08:29 | |
*** esberglu has quit IRC | 08:32 | |
*** slaweq has joined #openstack-infra | 08:35 | |
*** slaweq has quit IRC | 08:41 | |
dimak | hey | 08:41 |
strigazi | ianw yt? | 08:41 |
dimak | I have an error with Babel from openstack mirror | 08:42 |
dimak | http://logs.openstack.org/00/489000/2/gate/gate-dragonflow-python35/44a33cb/console.html#_2017-08-08_07_25_10_200073 | 08:42 |
dimak | Anyone noticed this? | 08:42 |
*** electrofelix has joined #openstack-infra | 08:46 | |
*** rtjure has joined #openstack-infra | 08:46 | |
*** yamamoto has quit IRC | 08:49 | |
openstackgerrit | Spyros Trigazis (strigazi) proposed openstack-infra/system-config master: [magnum] Cache fedorapeople.org https://review.openstack.org/491466 | 08:49 |
openstackgerrit | Spyros Trigazis (strigazi) proposed openstack-infra/system-config master: [magnum] Cache fedorapeople.org https://review.openstack.org/491466 | 08:50 |
*** mwarad has quit IRC | 08:59 | |
openstackgerrit | Spyros Trigazis (strigazi) proposed openstack-infra/project-config master: [magnum] Cache fedorapeople.org https://review.openstack.org/491724 | 08:59 |
*** alexchadin has quit IRC | 09:03 | |
*** spzala has joined #openstack-infra | 09:05 | |
*** ykarel is now known as ykarel|lunch | 09:08 | |
*** stakeda has quit IRC | 09:09 | |
*** spzala has quit IRC | 09:10 | |
*** alexchadin has joined #openstack-infra | 09:15 | |
*** nicolasbock has quit IRC | 09:15 | |
*** yamamoto has joined #openstack-infra | 09:15 | |
*** slaweq has joined #openstack-infra | 09:19 | |
*** sambetts|afk is now known as sambetts | 09:20 | |
*** yamamoto has quit IRC | 09:22 | |
*** slaweq has quit IRC | 09:25 | |
*** pgadiya has quit IRC | 09:26 | |
*** tosky has joined #openstack-infra | 09:29 | |
*** pgadiya has joined #openstack-infra | 09:29 | |
*** slaweq has joined #openstack-infra | 09:29 | |
ianw | strigazi: for a bit | 09:33 |
strigazi | ianw https://review.openstack.org/#/q/topic:cache-fedorapeople-magnum | 09:33 |
*** slaweq has quit IRC | 09:34 | |
ianw | strigazi: ok cool, get pabelanger to take a look too but LGTM | 09:35 |
strigazi | ianw he is in canada? | 09:38 |
ianw | usually :) | 09:39 |
*** slaweq has joined #openstack-infra | 09:39 | |
strigazi | ianw yes he is, you in AU afaik and me in Switzerland, very convenient setup :) | 09:40 |
*** shardy has quit IRC | 09:43 | |
*** nicolasbock has joined #openstack-infra | 09:43 | |
*** slaweq has quit IRC | 09:46 | |
*** kornicameister has quit IRC | 09:47 | |
*** cuongnv has quit IRC | 09:52 | |
*** yamamoto has joined #openstack-infra | 09:54 | |
*** shu-mutou is now known as shu-mutou-AWAY | 09:54 | |
*** shardy has joined #openstack-infra | 09:56 | |
*** jamesmcarthur has joined #openstack-infra | 09:57 | |
*** alexchadin has quit IRC | 09:59 | |
*** thorst has joined #openstack-infra | 10:00 | |
*** kornicameister has joined #openstack-infra | 10:00 | |
*** slaweq has joined #openstack-infra | 10:02 | |
*** jamesmcarthur has quit IRC | 10:02 | |
*** thorst has quit IRC | 10:05 | |
*** yamamoto has quit IRC | 10:07 | |
*** pgadiya has quit IRC | 10:07 | |
*** slaweq has quit IRC | 10:08 | |
*** liujiong has quit IRC | 10:09 | |
*** dtantsur|afk is now known as dtantsur | 10:09 | |
*** yamamoto has joined #openstack-infra | 10:12 | |
*** slaweq has joined #openstack-infra | 10:12 | |
*** igormarnat has quit IRC | 10:16 | |
*** ruhe has quit IRC | 10:16 | |
*** tnarg has joined #openstack-infra | 10:17 | |
*** markvoelker has quit IRC | 10:17 | |
*** yamamoto has quit IRC | 10:17 | |
*** igormarnat has joined #openstack-infra | 10:17 | |
*** Odd_Bloke has quit IRC | 10:17 | |
*** abelur has quit IRC | 10:17 | |
*** esberglu has joined #openstack-infra | 10:18 | |
*** yamamoto has joined #openstack-infra | 10:18 | |
*** ruhe has joined #openstack-infra | 10:18 | |
*** abelur has joined #openstack-infra | 10:18 | |
*** hareesh has quit IRC | 10:18 | |
*** odyssey4me has quit IRC | 10:18 | |
*** abelur_ has quit IRC | 10:18 | |
*** Odd_Bloke has joined #openstack-infra | 10:19 | |
*** slaweq has quit IRC | 10:19 | |
*** hareesh has joined #openstack-infra | 10:19 | |
*** pgadiya has joined #openstack-infra | 10:19 | |
*** odyssey4me has joined #openstack-infra | 10:19 | |
*** yamamoto has quit IRC | 10:20 | |
*** yamamoto has joined #openstack-infra | 10:20 | |
*** esberglu has quit IRC | 10:21 | |
*** zhurong has joined #openstack-infra | 10:24 | |
*** AJaeger is now known as AJaeger_ | 10:26 | |
*** pgadiya has quit IRC | 10:28 | |
*** tojuvone has joined #openstack-infra | 10:34 | |
*** tojuvone has left #openstack-infra | 10:35 | |
*** katkapilatova has joined #openstack-infra | 10:36 | |
openstackgerrit | Merged openstack-infra/project-config master: Make grenade-linuxbridge-multinode job experimental https://review.openstack.org/490993 | 10:38 |
openstackgerrit | Mark Korondi proposed openstack-infra/project-config master: Bringing upstream training virtual environment over here https://review.openstack.org/490202 | 10:40 |
openstackgerrit | Merged openstack-infra/project-config master: [Kuryr] Turn python3 job to voting https://review.openstack.org/491627 | 10:40 |
*** ykarel|lunch is now known as ykarel | 10:41 | |
*** pgadiya has joined #openstack-infra | 10:41 | |
*** thorst has joined #openstack-infra | 10:42 | |
openstackgerrit | Merged openstack-infra/project-config master: [Fuxi] Turn python3 job to voting https://review.openstack.org/491628 | 10:44 |
openstackgerrit | Merged openstack-infra/project-config master: [Zun] Make python3 dsvm job as voting https://review.openstack.org/491623 | 10:44 |
openstackgerrit | Mark Korondi proposed openstack-infra/project-config master: Bringing upstream training virtual environment over here https://review.openstack.org/490202 | 10:45 |
*** igormarnat has quit IRC | 10:48 | |
*** igormarnat has joined #openstack-infra | 10:48 | |
openstackgerrit | Merged openstack-infra/project-config master: [Zun] Move multinode job to experimental https://review.openstack.org/491624 | 10:50 |
openstackgerrit | Merged openstack-infra/project-config master: Reduce yum-config-manager output https://review.openstack.org/491076 | 10:50 |
openstackgerrit | Merged openstack-infra/project-config master: Upgrade the ARA fedora jobs to fedora 26 https://review.openstack.org/491633 | 10:51 |
*** thorst has quit IRC | 10:54 | |
*** thorst has joined #openstack-infra | 10:54 | |
*** jkilpatr has joined #openstack-infra | 10:58 | |
*** yamamoto has quit IRC | 10:58 | |
*** lrossetti_ has joined #openstack-infra | 10:58 | |
*** thorst has quit IRC | 10:59 | |
*** lrossetti has quit IRC | 10:59 | |
*** slaweq has joined #openstack-infra | 10:59 | |
*** yamamoto has joined #openstack-infra | 10:59 | |
*** sdague has joined #openstack-infra | 10:59 | |
*** yamamoto has quit IRC | 11:05 | |
*** slaweq_ has joined #openstack-infra | 11:07 | |
*** spzala has joined #openstack-infra | 11:07 | |
*** jascott1 has joined #openstack-infra | 11:07 | |
*** yamamoto has joined #openstack-infra | 11:07 | |
*** yamamoto has quit IRC | 11:08 | |
*** yamamoto has joined #openstack-infra | 11:10 | |
*** sree has quit IRC | 11:10 | |
*** jascott1 has quit IRC | 11:12 | |
*** spzala has quit IRC | 11:12 | |
*** slaweq_ has quit IRC | 11:12 | |
*** yamamoto has quit IRC | 11:13 | |
*** yamamoto has joined #openstack-infra | 11:15 | |
*** huanxie has quit IRC | 11:15 | |
*** yamamoto has quit IRC | 11:16 | |
*** yamamoto has joined #openstack-infra | 11:16 | |
*** slaweq_ has joined #openstack-infra | 11:17 | |
*** alexchadin has joined #openstack-infra | 11:19 | |
*** slaweq_ has quit IRC | 11:23 | |
*** gildub has joined #openstack-infra | 11:24 | |
*** EricGonczer_ has joined #openstack-infra | 11:33 | |
*** gordc has joined #openstack-infra | 11:37 | |
*** EricGonczer_ has quit IRC | 11:38 | |
*** EricGonczer_ has joined #openstack-infra | 11:39 | |
*** dave-mccowan has joined #openstack-infra | 11:40 | |
*** ldnunes has joined #openstack-infra | 11:41 | |
*** lucasagomes is now known as lucas-hungry | 11:46 | |
*** abelur_ has joined #openstack-infra | 11:50 | |
*** thorst has joined #openstack-infra | 11:51 | |
*** slaweq_ has joined #openstack-infra | 11:51 | |
*** slaweq_ has quit IRC | 11:56 | |
*** psachin has quit IRC | 11:58 | |
*** jrist has joined #openstack-infra | 11:58 | |
openstackgerrit | Tobias Rydberg proposed openstack-infra/irc-meetings master: Changed to correct chairs for the publiccloud_wg https://review.openstack.org/491769 | 12:00 |
*** psachin has joined #openstack-infra | 12:00 | |
pabelanger | looks like we are hitting quota issues in citycloud-lon1 | 12:01 |
pabelanger | OpenStackCloudHTTPError: (403) Client Error for url: https://lon1.citycloud.com:8774/v2/bed89257500340af8d0fbe7141b1bfd6/servers Quota exceeded for cores, instances: Requested 8, 1, but already used 400, 50 of 400, 50 cores, instances | 12:01 |
pabelanger | also, that error message is super confusing | 12:01 |
*** slaweq_ has joined #openstack-infra | 12:01 | |
*** jpena|off is now known as jpena | 12:03 | |
*** esberglu has joined #openstack-infra | 12:04 | |
*** trown|outtypewww is now known as trown | 12:05 | |
*** rlandy has joined #openstack-infra | 12:06 | |
*** slaweq_ has quit IRC | 12:06 | |
*** tuanluong has quit IRC | 12:07 | |
*** hareesh has quit IRC | 12:08 | |
*** esberglu has quit IRC | 12:09 | |
*** slaweq_ has joined #openstack-infra | 12:12 | |
*** slaweq_ has quit IRC | 12:16 | |
*** yamamoto has quit IRC | 12:18 | |
pabelanger | clarkb: any idea why we'd see this warning http://logs.openstack.org/49/491749/1/check/gate-tripleo-ci-centos-7-undercloud-oooq/5edaa28/console.html#_2017-08-08_11_03_02_676400 | 12:19 |
pabelanger | clarkb: I mean, I know why it is there but how should I go about fixing it | 12:20 |
*** yamamoto has joined #openstack-infra | 12:20 | |
*** yamamoto has quit IRC | 12:20 | |
*** slaweq_ has joined #openstack-infra | 12:22 | |
mnaser | https://review.openstack.org/#/c/491466/ can someone give this a bit of love by any chance | 12:23 |
mnaser | most magnum jobs are timing out due to this | 12:23 |
mnaser | so hopefully if we can get some caching in, it'll become significantly less | 12:23 |
pabelanger | mnaser: strigazi: which images are specifically needed? | 12:24 |
mnaser | pabelanger right now the one that keeps timing out in master https://fedorapeople.org/groups/magnum/fedora-atomic-latest.qcow2 | 12:24 |
mnaser | it downloads at ~30Kb/s so it just times out | 12:25 |
mnaser | http://logs.openstack.org/11/488511/4/check/gate-functional-dsvm-magnum-api-ubuntu-xenial/25369a5/logs/devstacklog.txt.gz < warning, big log file, but you can see it there | 12:25 |
pabelanger | mnaser: right, what is the difference between that and atomic images shipped by fedora? | 12:25 |
*** jcoufal has joined #openstack-infra | 12:25 | |
pabelanger | mnaser: for example, http://mirror.regionone.infracloud-vanilla.openstack.org/fedora/releases/26/CloudImages/x86_64/images/ | 12:26 |
*** Goneri has joined #openstack-infra | 12:26 | |
mnaser | pabelanger good question, i'll defer to strigazi for that. however, as a deployer, I use the atomic images shipped by fedora and they work | 12:27 |
mnaser | however we're testing/running against fedora 25 right now | 12:27 |
mnaser | and i dont see that in the mirrors for some reason | 12:27 |
mnaser | http://mirror.regionone.infracloud-vanilla.openstack.org/fedora/releases/25/CloudImages/x86_64/images/ | 12:27 |
*** slaweq_ has quit IRC | 12:28 | |
*** rwsu has quit IRC | 12:28 | |
*** rwsu has joined #openstack-infra | 12:29 | |
mnaser | http://mirror.math.princeton.edu/pub/alt/atomic/stable/ | 12:29 |
mnaser | okay, thats a specific mirror but that seems to be where they are stored, /pub/alt/atomic/ .. dont think we already cache that? | 12:29 |
pabelanger | ya, looking. Fedora-26 seems to ship them now | 12:30 |
pabelanger | trying to see where fedora-25 is | 12:30 |
*** zhurong has quit IRC | 12:30 | |
*** ralonsoh has quit IRC | 12:31 | |
*** ralonsoh has joined #openstack-infra | 12:32 | |
strigazi | pabelanger we need fedora-atomic-latest which is a symlink to Fedora-Atomic-25-20170719.qcow2, fedora-kubernetes-ironic-latest.tar.gz -> fedora-25-kubernetes-ironic-20170620.tar.gz, and ubuntu-mesos-latest.qcow2 -> ubuntu-14.04.3-mesos-0.25.0.qcow2 | 12:32 |
strigazi | pabelanger mnaser the images are stock images | 12:33 |
pabelanger | right, so lets see if we can just mirror them directly from source | 12:34 |
pabelanger | ATM, fedora-26 atomic we get for free | 12:34 |
strigazi | pabelanger we use fedorapeople so we can use a symlink when we update the image and not add commits to our repo | 12:34 |
*** jaypipes has joined #openstack-infra | 12:35 | |
*** sbezverk has joined #openstack-infra | 12:35 | |
mnaser | imho its probably cleaner to show which upstream ones we're using exactly, making it easy for potential users to know the exact image that is being used | 12:35 |
strigazi | pabelanger but we can always commit there if it makes our life easier and we gain performance | 12:35 |
pabelanger | seems like a lot of pressure on fedorapeople.org | 12:35 |
strigazi | pabelanger if we get f26 for free we can change to the official repo. | 12:36 |
mnaser | strigazi https://github.com/openstack-infra/system-config/blob/master/modules/openstack_project/files/mirror/fedora-mirror-update.sh we can edit this to get the fedora 25 images | 12:36 |
mnaser | http://mirrors.kernel.org/fedora-alt/atomic/stable/ | 12:36 |
pabelanger | strigazi: right, I mean, if you want to test fedora-26, we already mirror that to AFS | 12:37 |
mnaser | strigazi fyi f26 comes with docker 1.13.1 and i had to push up a few things to make it work, so just keep it in mind (mainly k8s 1.6.7 and a patch to set default policy for iptables forward to accept) | 12:37 |
pabelanger | otherwise, we should be able to add mirror for https://dl.fedoraproject.org/pub/alt/atomic/stable/ | 12:37 |
strigazi | pabelanger sounds good, but for stable branches we are slower with updates; we still need f25 until we update. | 12:37 |
openstackgerrit | Alexander Chadin proposed openstack-infra/project-config master: Remove gate job from watcherclient https://review.openstack.org/491784 | 12:38 |
*** kgiusti has joined #openstack-infra | 12:38 | |
pabelanger | strigazi: why isn't Fedora-Atomic-25-20170719.qcow2 listed at https://dl.fedoraproject.org/pub/alt/atomic/stable/ ? | 12:39 |
mnaser | pabelanger based on my simple math it seems to be around ~4GB per fedora atomic release so mirroring should use ~36gb | 12:39 |
strigazi | pabelanger deleted? | 12:39 |
*** yamamoto has joined #openstack-infra | 12:40 | |
pabelanger | strigazi: would one of the listed images work for you? How do you decide when you need to replace Fedora-Atomic-25-20170719.qcow2 | 12:41 |
strigazi | pabelanger I'll give it a go with f26 and if it works we can see what to do with our ubuntu image and stable branches. | 12:41 |
robcresswell | o/ Just setting up 3rd party CI, is it expected that the noop-check-communication job doesnt receive params like LOG_PATH? Seems to be able to find the log server, but isn't populating that build param. | 12:42 |
pabelanger | strigazi: sure, lets see if that works, if so, then you get the mirror for free. Looking at other images now | 12:42 |
*** rhallisey has joined #openstack-infra | 12:42 | |
pabelanger | robcresswell: see http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul/openstack_functions.py how we set it up today for zuulv2.5 | 12:43 |
pabelanger | robcresswell: you'll need to create a job like: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul/layout.yaml#n1119 to call the function | 12:43 |
*** slaweq_ has joined #openstack-infra | 12:44 | |
*** dprince has joined #openstack-infra | 12:45 | |
robcresswell | thanks pabelanger. Little out of my depth atm. That's really helpful. | 12:45 |
*** lucas-hungry is now known as lucasagomes | 12:45 | |
pabelanger | np | 12:46 |
*** jpena is now known as jpena|mtg | 12:46 | |
*** slaweq_ has quit IRC | 12:49 | |
*** Goneri has quit IRC | 12:50 | |
*** mandre_away is now known as mandre_mtg | 12:51 | |
*** pradk has joined #openstack-infra | 12:54 | |
*** abelur_ has quit IRC | 12:54 | |
*** slaweq_ has joined #openstack-infra | 12:54 | |
*** felipemonteiro_ has joined #openstack-infra | 12:55 | |
*** coolsvap has quit IRC | 12:56 | |
*** jpena|mtg is now known as jpena|off | 12:56 | |
*** felipemonteiro__ has joined #openstack-infra | 12:57 | |
*** esberglu has joined #openstack-infra | 12:58 | |
*** jamesmcarthur has joined #openstack-infra | 12:58 | |
*** slaweq_ has quit IRC | 13:00 | |
*** felipemonteiro_ has quit IRC | 13:01 | |
mnaser | pabelanger whats the decision making process when deciding if something will be mirrored or cached? | 13:01 |
mnaser | http://mirror.regionone.infracloud-vanilla.openstack.org/fedora/releases/26/CloudImages/x86_64/images/ -- the image there is from 2017-07-05.. there's been a few newer images since (such as one released on the 23rd of july) | 13:02 |
mnaser | so i suspect we're not going to get access to fresh images :( | 13:02 |
pabelanger | mnaser: usually if we can rsync, we mirror. However, if the contents change too fast (like rdo), then we reverse proxy cache | 13:02 |
*** jrist has quit IRC | 13:02 | |
*** clayton has quit IRC | 13:03 | |
mnaser | pabelanger i would guess images would then be something that we can consider more on the stable content side | 13:04 |
mnaser | and it would involve a small change here only https://github.com/openstack-infra/system-config/blob/master/modules/openstack_project/files/mirror/fedora-mirror-update.sh | 13:04 |
*** clayton has joined #openstack-infra | 13:05 | |
mnaser | i can propose a small change and what ill do is ill exclude all the older releases so we only have f25 atomic latest + f26 atomic latest and then new releases moving forward | 13:05 |
mnaser | it'll save a bunch of disk space on images we likely wont use | 13:05 |
*** gildub has quit IRC | 13:06 | |
*** Julien-zte has joined #openstack-infra | 13:06 | |
*** pradk has quit IRC | 13:06 | |
fungi | i'm curious why the content is so stale. we run rsync from the official copy ~daily? | 13:07 |
*** sbezverk has quit IRC | 13:07 | |
mnaser | fungi i dont think its the content thats stale, i think the atomic team doesnt publish images there officially | 13:08 |
mnaser | they probably release in /pub/alt/atomic and there might have been some old reason why that ended up there (for fedora 25, it doesnt even exist) | 13:08 |
*** spzala has joined #openstack-infra | 13:08 | |
*** rlandy has quit IRC | 13:08 | |
openstackgerrit | Gael Chamoulaud proposed openstack-infra/tripleo-ci master: Enable tripleo-validations tests https://review.openstack.org/481080 | 13:09 |
pabelanger | Ya, I don't think ISO content (or any content) changes in release directory | 13:09 |
pabelanger | we'd likely need to mirror: https://dl.fedoraproject.org/pub/alt/atomic/stable/ | 13:09 |
fungi | got it | 13:10 |
fungi | now i'm less confused ;) | 13:10 |
*** links has quit IRC | 13:11 | |
*** markvoelker has joined #openstack-infra | 13:12 | |
*** pgadiya has quit IRC | 13:13 | |
numans | pabelanger, hi, can you please add this to your review queue - https://review.openstack.org/#/c/490622/ | 13:13 |
*** LindaWang has quit IRC | 13:13 | |
*** dizquierdo is now known as dizquierdo_afk | 13:14 | |
*** slaweq_ has joined #openstack-infra | 13:16 | |
strigazi | pabelanger mnaser Will someone push a change to mirror https://dl.fedoraproject.org/pub/alt/atomic/stable/ ? | 13:17 |
*** mpranjic has joined #openstack-infra | 13:17 | |
mnaser | strigazi working on it! | 13:17 |
mnaser | i'm making sure we dont mirror useless stuff like isos etc | 13:17 |
*** Liuqing has joined #openstack-infra | 13:18 | |
strigazi | mnaser cool | 13:18 |
*** bobh has joined #openstack-infra | 13:18 | |
mpranjic | hello! I have issues with login to wiki.openstack.org with openID. | 13:19 |
strigazi | mnaser they don't have ISOs i think | 13:19 |
mpranjic | I get the error: | 13:19 |
mpranjic | OpenID error | 13:19 |
mpranjic | An error occurred: an invalid token was found. | 13:19 |
mnaser | strigazi http://mirrors.kernel.org/fedora-alt/atomic/stable/Fedora-Atomic-26-20170723.0/Atomic/x86_64/iso/ | 13:19 |
mpranjic | can someone help me out with that? | 13:19 |
mnaser | and there is stuff like libvirt boxes and blabla, i'll get it addressed shortly | 13:19 |
mpranjic | my Ubuntu One username is: mpranjic | 13:19 |
strigazi | mnaser we only need /CloudImages, not /Atomic | 13:20 |
mnaser | yep thats why im doing all the excludes in the rsync mirroring | 13:20 |
mnaser | so we get all the .qcow2s pretty much | 13:20 |
*** ldnunes has quit IRC | 13:20 | |
strigazi | and raw I guess | 13:21 |
*** slaweq_ has quit IRC | 13:21 | |
*** sshnaidm|afk is now known as sshnaidm | 13:21 | |
openstackgerrit | Mohammed Naser proposed openstack-infra/system-config master: Add Fedora Atomic mirrors https://review.openstack.org/491800 | 13:22 |
mnaser | pabelanger fungi ^ i also added output in the comments of a dry run so it should work :) | 13:23 |
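As a rough illustration of the kind of change being discussed here (not the actual content of 491800), a mirror-update step in the spirit of fedora-mirror-update.sh could restrict the rsync run to the CloudImages artifacts; the rsync module, AFS destination, and release glob below are assumptions for the sketch only.

```bash
# Hedged sketch: mirror only the Fedora Atomic CloudImages qcow2/raw files
# from the atomic/stable tree, skipping ISOs, Vagrant boxes and ostree data.
# Source module, destination path and release pattern are illustrative.
rsync -rlptDvz -m --delete --delete-excluded \
    --include="*/" \
    --include="Fedora-Atomic-2[56]-*/CloudImages/x86_64/images/*.qcow2" \
    --include="Fedora-Atomic-2[56]-*/CloudImages/x86_64/images/*.raw.xz" \
    --exclude="*" \
    rsync://dl.fedoraproject.org/fedora-alt/atomic/stable/ \
    /afs/.openstack.org/mirror/fedora/atomic/
```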
*** baoli has joined #openstack-infra | 13:23 | |
*** xyang1 has joined #openstack-infra | 13:24 | |
*** slaweq_ has joined #openstack-infra | 13:26 | |
*** sree has joined #openstack-infra | 13:27 | |
openstackgerrit | Mohammed Naser proposed openstack-infra/project-config master: Add NODEPOOL_ATOMIC_MIRROR to configure_mirror.sh https://review.openstack.org/491801 | 13:28 |
*** LindaWang has joined #openstack-infra | 13:28 | |
*** jamesmcarthur has quit IRC | 13:28 | |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Add mapping file containing v2 to v3 mappings https://review.openstack.org/491804 | 13:29 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 13:30 |
*** slaweq_ has quit IRC | 13:33 | |
*** ldnunes has joined #openstack-infra | 13:33 | |
*** cshastri has quit IRC | 13:33 | |
slaweq | mordred: hello | 13:34 |
slaweq | mordred: can You take a look at https://review.openstack.org/#/c/491266/ | 13:34 |
mordred | slaweq: yes! | 13:34 |
slaweq | mordred: I think that it's enough to do it like I did but please check if maybe yamamoto is right | 13:35 |
slaweq | mordred: thx in advance :) | 13:35 |
mordred | oh - sorry - I had this reviewed in my browser but didn't actually click submit ... | 13:35 |
*** cshastri has joined #openstack-infra | 13:36 | |
mordred | slaweq: review left - but basically we need to copy the ENABLE_IDENTITY_V2 pattern for now (this will be better in a couple of weeks) | 13:37 |
*** alexchadin has quit IRC | 13:39 | |
ssbarnea | What could I do to make the release of JJB 2.0 happen before the apocalypse? https://storyboard.openstack.org/#!/story/2000745 | 13:39 |
*** alexchadin has joined #openstack-infra | 13:40 | |
*** alexchadin has quit IRC | 13:40 | |
slaweq | mordred: thx | 13:40 |
*** alexchadin has joined #openstack-infra | 13:40 | |
mordred | ssbarnea: hi! so - I think we'd like to hold off until we've migrated openstack to zuul v3 which is planned for september 11 | 13:40 |
fungi | mordred: well, _we_ pin the version we're using | 13:41 |
*** alexchadin has quit IRC | 13:41 | |
mordred | oh. | 13:41 |
mordred | well | 13:41 |
mordred | ignore me | 13:41 |
fungi | so i don't expect they need to hold off releasing | 13:41 |
fungi | we haven't asked them not to | 13:41 |
*** alexchadin has joined #openstack-infra | 13:41 | |
*** bh526r has joined #openstack-infra | 13:41 | |
*** alexchadin has quit IRC | 13:41 | |
ssbarnea | the fact that 2.0 is in pre-release for so long does hurt it a lot as I cannot 'persuade' others to use the pre-release in production. | 13:42 |
*** wznoinsk_ is now known as wznoinsk | 13:42 | |
*** alexchadin has joined #openstack-infra | 13:42 | |
fungi | ssbarnea: have you asked in #openstack-jjb? the devs/reviewers on that repo have been mostly autonomous for a while, the infra team only provides a bit of oversight | 13:42 |
sshnaidm | clarkb, ping | 13:42 |
fungi | we stopped exerting much control over it when we ceased using jenkins (roughly a year ago) | 13:43 |
*** ldnunes_ has joined #openstack-infra | 13:43 | |
ssbarnea | fungi: thanks for the hint. I didn't know about that channel, joined and going to cross post now. | 13:44 |
*** ldnunes has quit IRC | 13:44 | |
odyssey4me | hi all, I'd like to understand more about how we can cache an image onto the nodepool nodes | 13:44 |
*** camunoz has joined #openstack-infra | 13:45 | |
*** jtomasek has joined #openstack-infra | 13:46 | |
*** alexchadin has quit IRC | 13:46 | |
*** hongbin has joined #openstack-infra | 13:47 | |
*** felipemonteiro__ has quit IRC | 13:48 | |
*** slaweq_ has joined #openstack-infra | 13:48 | |
*** ociuhandu has joined #openstack-infra | 13:50 | |
*** slaweq_ has quit IRC | 13:53 | |
*** ociuhandu has quit IRC | 13:56 | |
robcresswell | o/ Sorry, back with more questions; nodepool list seems to be "stuck" with a list of instances in the delete state, but the provider has already deleted them. Is there a way to nudge nodepool to figure that out? | 13:57 |
*** EricGonczer_ has joined #openstack-infra | 13:57 | |
*** Liuqing has quit IRC | 13:58 | |
*** slaweq_ has joined #openstack-infra | 13:59 | |
*** dizquierdo_afk is now known as dizquierdo | 13:59 | |
*** gouthamr has joined #openstack-infra | 14:00 | |
*** EricGonc_ has joined #openstack-infra | 14:01 | |
*** xinliang has quit IRC | 14:01 | |
dimak | AJaeger_, yolanda there are a lot of queued jenkins jobs, any chance there are more node-pool issues? | 14:02 |
fungi | odyssey4me: we cache the _small_ images and similar files devstack declares it wants by running its image_list.sh utility script from this element: https://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/elements/cache-devstack/extra-data.d/55-cache-devstack-repos#n107 | 14:02 |
*** EricGonczer_ has quit IRC | 14:02 | |
fungi | odyssey4me: obviously baking too many or too large images onto the filesystems of our worker images makes them unwieldy, so we do try to keep it to a minimum and infrequently-used/larger images can instead be grabbed through our afs-backed mirrors or our caching reverse proxies | 14:04 |
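For illustration of the small-image caching fungi describes, a diskimage-builder extra-data.d script roughly like the following could fetch a file onto the worker image at build time; the cirros URL and the /opt/cache/files destination are assumptions here, not the element's exact contents.

```bash
#!/bin/bash
# Hedged sketch of an extra-data.d style cache step (not the real element):
# download a small image into the node's devstack file cache at image build
# time so jobs do not have to fetch it over the network.
set -eu
CACHE_DIR="$TMP_MOUNT_PATH/opt/cache/files"   # dib mounts the image under TMP_MOUNT_PATH
mkdir -p "$CACHE_DIR"
url="http://download.cirros-cloud.net/0.3.5/cirros-0.3.5-x86_64-disk.img"
wget -nv -c -O "$CACHE_DIR/$(basename "$url")" "$url"
```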
odyssey4me | fungi it's probably a bit big to cache, and putting into the afs mirror or reverse proxying might work fine | 14:04 |
*** marst has joined #openstack-infra | 14:05 | |
fungi | for example, kolla publishes their images onto tarballs.o.o and then we have a reverse proxy they pull them through in each provider/region | 14:05 |
fungi | and their largest images are over 4gib | 14:05 |
*** slaweq_ has quit IRC | 14:06 | |
fungi | the ones we cache onto our image filesystems are more things like cirros which if memory serves is in the tens of mib | 14:06 |
mtreinish | fungi: it's 13MB | 14:09 |
*** mriedem has joined #openstack-infra | 14:09 | |
*** slaweq has quit IRC | 14:10 | |
fungi | cool, i was within margin of error/order of magnitude anyway ;) | 14:10 |
*** sree has quit IRC | 14:10 | |
*** sree has joined #openstack-infra | 14:11 | |
mtreinish | well at least on x86_64, maybe other arches are bigger :) | 14:11 |
odyssey4me | fungi oh no, let me check on the size - but it's less than 300MB IIRC | 14:13 |
*** xinliang has joined #openstack-infra | 14:13 | |
*** rbrndt has joined #openstack-infra | 14:14 | |
odyssey4me | fungi ah it seems it's around ~90MB per platform | 14:14 |
fungi | we carve out 100gib for the afs cache and another separate 100gib for the apache reverse proxy cache now, so should have plenty of room to cache things local to workers either way, but files in the neighborhood of 100mib are probably pushing the bounds of what we'd want to cache unless a substantial percentage of all jobs we're running will use it | 14:15 |
*** jtomasek has quit IRC | 14:15 | |
odyssey4me | fungi so it'd be preferred as a file on AFS, rather than a reverse proxy? | 14:16 |
fungi | that mostly depends on how often the file is expected to change | 14:17 |
odyssey4me | fungi we'd be happy to refresh it daily, or even weekly | 14:17 |
openstackgerrit | Claudiu Belu proposed openstack-infra/project-config master: cloudbase-init: Adds releasenotes jobs https://review.openstack.org/491821 | 14:17 |
fungi | is this something you're producing and reconsuming, or something you're consuming which is published outside our ci system by some other community (and how often, roughly)? | 14:18 |
odyssey4me | fungi it's the base lxc cache which is published once every 6 hours IIRC onto images.lxccontainers.org | 14:18 |
odyssey4me | sorry - images.linuxcontainers.org | 14:19 |
fungi | that seems like a better fit for the reverse proxy cache, yeah | 14:19 |
odyssey4me | yeah, that would be a lot easier for us to consume I think, because we'd still be able to use the API instead of creating a code path to use a file path or custom URL | 14:19 |
fungi | well, it'll still be a custom url because it's not a transparent proxy | 14:20 |
odyssey4me | yep, but we have a code path for that already | 14:20 |
fungi | oh, awesome | 14:20 |
odyssey4me | so, how do I add a new reverse proxy? | 14:20 |
fungi | two places need updating: | 14:20 |
fungi | http://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/templates/mirror.vhost.erb | 14:21 |
fungi | https://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/scripts/configure_mirror.sh | 14:21 |
*** Guest13936 is now known as med_ | 14:21 | |
*** med_ has quit IRC | 14:21 | |
*** med_ has joined #openstack-infra | 14:21 | |
*** med_ is now known as medberry | 14:21 | |
fungi | it should be pretty clear from surrounding context what needs to be added, but if you have questions then ask away | 14:22 |
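As a sketch of the configure_mirror.sh half of such an addition (the vhost half would live in mirror.vhost.erb), something along these lines could expose the proxied URL to jobs; the variable name, port, and path are illustrative assumptions rather than the real configuration.

```bash
# Hedged sketch only: publish the per-region reverse proxy URL for
# images.linuxcontainers.org to jobs via the mirror info script.
# NODEPOOL_MIRROR_HOST is the per-region mirror hostname already used by
# configure_mirror.sh; the 8080 port and path segment are assumptions.
export NODEPOOL_LXC_IMAGE_PROXY="http://${NODEPOOL_MIRROR_HOST}:8080/images.linuxcontainers"

cat >> /etc/ci/mirror_info.sh <<EOF
export NODEPOOL_LXC_IMAGE_PROXY=$NODEPOOL_LXC_IMAGE_PROXY
EOF
```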
odyssey4me | thanks fungi - I'll take a closer look shortly and ping any further questions, thanks so much for your expertise and assistance | 14:23 |
*** jpena|off is now known as jpena | 14:25 | |
fungi | odyssey4me: just glad i could help | 14:25 |
hongbin | hi, i want to know if there is a way to dump non-devstack systemd logs (i.e. docker logs) to the gate, i tried to do this: https://review.openstack.org/#/c/480306/1/contrib/post_test_hook.sh , but it looks like the logs are not there if the job is killed by a timeout | 14:26 |
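A minimal sketch of the kind of post_test_hook.sh step hongbin describes, assuming the usual devstack-gate $BASE/logs collection directory; note this still only helps when the hook actually gets a chance to run before the job is killed.

```bash
# Hedged sketch: dump the docker systemd journal next to the other job logs
# so it gets uploaded with them.  $BASE/logs is the conventional
# devstack-gate location and is an assumption here.
function dump_docker_logs {
    local logdir="$BASE/logs"
    sudo journalctl -o short-precise --no-pager -u docker \
        | sudo tee "$logdir/docker.txt" > /dev/null
    sudo gzip -9 "$logdir/docker.txt"
}
dump_docker_logs
```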
*** admcleod_ is now known as admcleod | 14:26 | |
openstackgerrit | Matthew Treinish proposed openstack-infra/subunit2sql master: Switch to using stestr https://review.openstack.org/491074 | 14:27 |
fungi | dimak: i think we're just backlogged. the osic environment was finally turned off last week, we've got a couple of citycloud regions offline for different issues, and our voucher for ovh expired so we're waiting for that to get re-upped | 14:28 |
openstackgerrit | Matthew Treinish proposed openstack-infra/subunit2sql master: Update python3 versions in tox.ini envlist https://review.openstack.org/491827 | 14:29 |
fungi | the post pipeline's only about 4 hours behind, so the situation's not terrible (yet anyway) | 14:29 |
openstackgerrit | Matthew Treinish proposed openstack-infra/subunit2sql master: Switch to using stestr https://review.openstack.org/491074 | 14:34 |
openstackgerrit | Matthew Treinish proposed openstack-infra/subunit2sql master: Update python3 versions in tox.ini envlist https://review.openstack.org/491827 | 14:34 |
*** felipemonteiro_ has joined #openstack-infra | 14:37 | |
*** armax has joined #openstack-infra | 14:37 | |
*** jpena is now known as jpena|off | 14:38 | |
*** florianf has quit IRC | 14:40 | |
*** alexchadin has joined #openstack-infra | 14:42 | |
*** medberry is now known as med_ | 14:43 | |
*** LindaWang has quit IRC | 14:43 | |
*** slaweq has joined #openstack-infra | 14:43 | |
*** dtantsur is now known as dtantsur|brb | 14:44 | |
*** katkapilatova has left #openstack-infra | 14:45 | |
*** alexchadin has quit IRC | 14:47 | |
*** gyee has joined #openstack-infra | 14:48 | |
openstackgerrit | Merged openstack/os-client-config master: Update globals safely https://review.openstack.org/491618 | 14:48 |
*** slaweq has quit IRC | 14:48 | |
*** links has joined #openstack-infra | 14:48 | |
*** links has quit IRC | 14:49 | |
*** cshastri has quit IRC | 14:50 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 14:53 |
*** florianf has joined #openstack-infra | 14:53 | |
*** slaweq has joined #openstack-infra | 14:53 | |
*** annegentle has joined #openstack-infra | 14:56 | |
*** xarses_ has joined #openstack-infra | 14:57 | |
clarkb | sshnaidm: hi | 14:58 |
*** EricGonc_ has quit IRC | 14:58 | |
*** slaweq has quit IRC | 14:59 | |
sshnaidm | clarkb, do we have any problem with logs now? EmilienM told me you have some issues | 14:59 |
clarkb | odyssey4me: fungi note that if the lxc images arent served with ttls they will be cached for roughly 24 hours. which is 4x their update cycle | 15:00 |
clarkb | sshnaidm: yes there are still ~27 copies of /etc in every job | 15:00 |
sshnaidm | clarkb, which job? do you have an url? | 15:00 |
*** dmsimard is now known as dmsimard|afk | 15:00 | |
odyssey4me | clarkb our issue for testing is not really getting the latest image, but getting one at all | 15:00 |
*** EricGonczer_ has joined #openstack-infra | 15:01 | |
odyssey4me | between the dns failures, and slow download speeds, we're not getting them reliably done and getting job timeouts/failures | 15:01 |
odyssey4me | so we're hoping just to get something more reliable in place | 15:01 |
*** psachin has quit IRC | 15:01 | |
clarkb | odyssey4me: sure just noting that that may be a drawback | 15:02 |
clarkb | sshnaidm: gate-tripleo-ci-centos-7-undercloud-containers is the one I looked at but assuming the others are that way too | 15:02 |
odyssey4me | clarkb appreciate the heads up - for us it won't be an issue | 15:02 |
fungi | clarkb: odyssey4me: if it becomes an issue, convincing the hosts of that image to start employing a cache ttl header is probably not an entirely wasted effort either | 15:03 |
*** slaweq has joined #openstack-infra | 15:04 | |
*** gyee has quit IRC | 15:04 | |
*** pradk has joined #openstack-infra | 15:05 | |
*** jrist has joined #openstack-infra | 15:05 | |
*** pradk has quit IRC | 15:07 | |
*** mattmceuen has joined #openstack-infra | 15:07 | |
sshnaidm | clarkb, fyi https://bugs.launchpad.net/tripleo/+bug/1709339 | 15:07 |
openstack | Launchpad bug 1709339 in tripleo "CI: duplicate /etc directories in logs for containers" [Critical,Triaged] | 15:07 |
clarkb | sshnaidm: http://logs.openstack.org/95/480395/5/check/gate-tripleo-ci-centos-7-undercloud-containers/1a14f5d/logs/var/log/extra/docker/containers/neutron_ovs_agent/etc/ http://logs.openstack.org/95/480395/5/check/gate-tripleo-ci-centos-7-undercloud-containers/1a14f5d/logs/var/log/extra/docker/containers/mysql/etc/ | 15:08 |
clarkb | http://logs.openstack.org/95/480395/5/check/gate-tripleo-ci-centos-7-undercloud-containers/1a14f5d/logs/var/log/extra/docker/containers/mistral_executor/etc/ and so on | 15:08 |
sshnaidm | clarkb, I see, it's described in the bug I submitted right now | 15:09 |
*** jascott1 has joined #openstack-infra | 15:09 | |
clarkb | sshnaidm: its not just for the containers btw http://logs.openstack.org/95/480395/5/check/gate-tripleo-ci-centos-7-undercloud-containers/1a14f5d/logs/etc/ | 15:09 |
*** slaweq has quit IRC | 15:10 | |
sshnaidm | clarkb, this directory is from main subnode, it's not duplicated | 15:11 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 15:13 |
clarkb | sshnaidm: but it is, because its the same stuff from the containers | 15:13 |
clarkb | sshnaidm: and its got redundant info we should never be collecting like DIR_COLORS | 15:14 |
clarkb | sshnaidm: we need to stop collecting all of that | 15:14 |
clarkb | make 1 copy of the necessary data and thats it | 15:14 |
sshnaidm | clarkb, yeah, but the problem is collecting /etc in containers, not this one | 15:14 |
*** jascott1 has quit IRC | 15:14 | |
clarkb | its both... | 15:14 |
sshnaidm | clarkb, we need 1 /etc directory anyway | 15:14 |
clarkb | yes but we don't need all the extra crap in it | 15:14 |
sshnaidm | clarkb, if it takes 1KB, I'm not sure it's worth the effort to overcomplicate the code | 15:15 |
clarkb | sshnaidm: but it is | 15:16 |
clarkb | because we've already had problems where overcollecting results in grabbing potentially massive content you don't want | 15:16 |
clarkb | this is why we keep asking you to only copy what you want | 15:16 |
clarkb | rather than copying everything then reducing from everything | 15:16 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: Fixed Typo on Summit Service https://review.openstack.org/491836 | 15:17 |
sshnaidm | clarkb, we excluded everything you told us to exclude last time: https://github.com/openstack-infra/tripleo-ci/blob/master/toci-quickstart/config/collect-logs.yml#L35-L64 | 15:17 |
clarkb | sshnaidm: right but we've also asked you to invert the way you collect logs and only collect what you want | 15:17 |
clarkb | sshnaidm: so first step was stop collecting absolutely everything but moving forward we should be collecting what we want/need explicitly | 15:18 |
openstackgerrit | Merged openstack-infra/openstackid-resources master: Fixed Typo on Summit Service https://review.openstack.org/491836 | 15:18 |
clarkb | but also you are still collecting all of /etc multiple times including things like dir colors and bashcompletion which we've asked you to stop for weeks now | 15:18 |
sshnaidm | clarkb, there are too many projects in tripleo together, so it's not always possible to maintain a relevant, up-to-date list; please consider the fact that it's not one project like you usually have, but a lot of them | 15:19 |
clarkb | yes that is the same situation we are in for devstack-gate and it hasn't been a problem | 15:19 |
sshnaidm | clarkb, I'm not sure it's the same situation | 15:20 |
clarkb | sshnaidm: and I'd like you to consider that you have effectively ddos'd our filesystem multiple times | 15:20 |
fungi | sshnaidm: why not put together a list of files you've needed to look at when troubleshooting job failures for those in the past? certainly you haven't looked at every copy of every file in /etc? | 15:20 |
clarkb | one single job uses 10% of all our disk | 15:20 |
clarkb | more than the next two jobs combined | 15:20 |
sshnaidm | clarkb, we are talking about files a few KBs in size, that's not what kills the fs | 15:20 |
clarkb | sshnaidm: no the collect everything attitude is what kills the fs | 15:20 |
clarkb | sshnaidm: because when centos 7.4 happens and some new thing sneaks in we break again | 15:21 |
clarkb | and then again for 7.5 and then 8 and so on | 15:21 |
clarkb | if instead you collect what you need this risk is greatly reduced | 15:21 |
fungi | it's better to realize you're not collecting a file you need and then make a change to start including it than to collect files you won't ever need | 15:21 |
sshnaidm | clarkb, not sure I understand how centos is related to collecting /etc files | 15:21 |
clarkb | sshnaidm: because the contents of /etc will change as centos changes over time | 15:22 |
fungi | sshnaidm: because each new update or release of the distro can move files around in /etc or add new ones | 15:22 |
sshnaidm | clarkb, yeah, but how would it break anything? | 15:22 |
*** Julien-zte has quit IRC | 15:22 | |
clarkb | sshnaidm: if a large file shows up all of a sudden we fill the disk again just like we already did with the java stuff | 15:22 |
fungi | when rh decides it should include some new large set of files in /etc you start collecting that automatically and fill up our logserver again | 15:22 |
*** sbezverk has joined #openstack-infra | 15:23 | |
fungi | let me put this another way... if we stopped hosting logs for tripleo jobs, we could provide the community with several months of log retention instead of just one | 15:23 |
fungi | do 2/3 of our community benefit from being able to look at logs for tripleo job failures? | 15:23 |
sshnaidm | fungi, yes, because most projects use one dir for logs and one /etc folder, because they have only one process | 15:24 |
*** iyamahat has joined #openstack-infra | 15:24 | |
fungi | sshnaidm: that's not an answer to my question | 15:24 |
sshnaidm | fungi, they would | 15:24 |
*** sbezverk_ has joined #openstack-infra | 15:24 | |
sshnaidm | fungi, because we test their projects too | 15:24 |
fungi | what percentage of our community do you think look at logs for tripleo jobs? i doubt it's even close to the proportion of space you're using on the logs site | 15:25 |
sshnaidm | fungi, every project which is part of tripleo will benefit from our jobs | 15:25 |
*** Julien-zte has joined #openstack-infra | 15:25 | |
sshnaidm | fungi, we have jobs running in about 10 other projects, part of them are voting, part of them are not, and part is experimental | 15:25 |
*** Julien-zte has quit IRC | 15:26 | |
*** slaweq has joined #openstack-infra | 15:26 | |
*** marst_ has joined #openstack-infra | 15:26 | |
sshnaidm | fungi, and we are working to have many more voting and relevant jobs there, to prevent failures and to help with integration | 15:26 |
*** marst has quit IRC | 15:27 | |
*** iyamahat has quit IRC | 15:27 | |
sshnaidm | fungi, it's neutron, nova, ironic, etc, etc, and tripleo jobs for some of them are the only way they get tested in "real life" | 15:27 |
sshnaidm | so yes, I think we do something useful for all community | 15:27 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 15:27 |
*** sbezverk has quit IRC | 15:28 | |
clarkb | sshnaidm: the point is that we do that in other jobs as well, without redundantly copying unnecessary data in every job | 15:28 |
clarkb | sshnaidm: you can do both things, they do not conflict with each other | 15:28 |
*** vhosakot has joined #openstack-infra | 15:28 | |
sshnaidm | clarkb, I handled the problem with all these /etc copies, it's a bug and will be solved, but I'm against a whitelist of logs | 15:30 |
*** dtantsur|brb is now known as dtantsur | 15:30 | |
*** slaweq has quit IRC | 15:30 | |
sshnaidm | clarkb, from all my investigations in recent years it's really hard to determine which log will give you the info; it could be any of them | 15:30 |
clarkb | sshnaidm: there is no reason to collect bash completion or dir colors and so on | 15:30 |
clarkb | what is the argument that you need those? | 15:31 |
*** lrossetti_ has quit IRC | 15:31 | |
sshnaidm | clarkb, right, we don't need this, I can add them to exclude list right now | 15:31 |
*** lrossetti has joined #openstack-infra | 15:31 | |
sshnaidm | clarkb, maintaining a config list for tens of services is much more complicated and a source of breakages and failures | 15:32 |
clarkb | but it isn't... | 15:32 |
clarkb | we have done it successfully for years | 15:32 |
*** lrossetti has quit IRC | 15:33 | |
fungi | devstack-gate specifically has done it for years | 15:33 |
*** camunoz has quit IRC | 15:34 | |
sshnaidm | clarkb, fungi ok, I will raise this question at the next tripleo meeting, please come and let's discuss there, I hope we'll find something suitable for everybody | 15:34 |
sshnaidm | clarkb, fungi does it work for you? | 15:34 |
clarkb | tuesday at 1400UTC is a bit early for me but I can try | 15:34 |
clarkb | and I think fungi is traveling that day | 15:34 |
clarkb | sshnaidm: I can do my best to get up early | 15:36 |
*** slaweq has joined #openstack-infra | 15:36 | |
*** rhallisey has quit IRC | 15:37 | |
jeblair | what's the current disk space used per build? | 15:37 |
clarkb | seems to be between ~75MB and 100MB based on the job | 15:37 |
clarkb | (so it is a massive improvement over where we were, but it would be nice to make it robust so that we don't have to worry as much about it exploding in the future) | 15:38 |
*** ccamacho has quit IRC | 15:38 | |
*** ccamacho has joined #openstack-infra | 15:38 | |
jeblair | k | 15:38 |
fungi | part of it is also the number of tripleo jobs and number of times they're run multiplied by their average size | 15:38 |
*** tosky has quit IRC | 15:40 | |
*** tosky has joined #openstack-infra | 15:40 | |
clarkb | basically we know that collecting everything is problematic because you end up with what you don't expect (there are multiple cases of this), so to avoid it in the future I personally would like to see a more whitelist approach to collecting logs than grab everything + blacklist | 15:41 |
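For context on what a whitelist approach looks like in practice, a rough sketch follows; the file names and destination below are illustrative, not the real devstack-gate or tripleo-ci lists:

```bash
# Whitelist-style collection: copy only files we explicitly name, so a new
# large file appearing under /etc is simply never picked up.
LOG_DIR=/opt/stack/logs      # illustrative source location
DEST=collected-logs
mkdir -p "$DEST"

for f in "$LOG_DIR/screen-n-cpu.txt" "$LOG_DIR/screen-q-svc.txt" /etc/nova/nova.conf; do
    [ -f "$f" ] && cp --parents "$f" "$DEST/"
done

# The grab-everything-plus-blacklist alternative keeps collecting whatever the
# distro adds next, which is how the disk filled up before:
#   rsync -a --exclude 'bash_completion*' --exclude 'DIR_COLORS*' /etc/ "$DEST/etc/"
```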
fungi | doing a quick analysis of the sample data clarkb collected, jobs with "tripleo" in the name account for 33% of the data we're storing right now, so trying to figure out how to get that reduced | 15:41 |
clarkb | with a single job (the one linked to above) being ~10% of the total | 15:42 |
clarkb | which is more than the next two jobs combined | 15:42 |
*** armax has quit IRC | 15:42 | |
*** iyamahat has joined #openstack-infra | 15:42 | |
*** slaweq has quit IRC | 15:42 | |
*** armax has joined #openstack-infra | 15:43 | |
fungi | at least it's down from previously, where some 70% of the data we were storing were tripleo job logs | 15:43 |
*** pstack has joined #openstack-infra | 15:43 | |
*** dougwig has joined #openstack-infra | 15:44 | |
*** e0ne has quit IRC | 15:45 | |
sshnaidm | clarkb, fungi I added the item, if you can please join, if not - I'll present your point: https://etherpad.openstack.org/p/tripleo-meeting-items | 15:46 |
*** markus_z has quit IRC | 15:46 | |
fungi | thanks sshnaidm! | 15:46 |
fungi | and clarkb is correct, i'll be driving a car during that next meeting | 15:47 |
*** camunoz has joined #openstack-infra | 15:47 | |
fungi | otherwise i would gladly attend | 15:47 |
*** jamesdenton has quit IRC | 15:49 | |
*** jamesdenton has joined #openstack-infra | 15:50 | |
*** slaweq_ has joined #openstack-infra | 15:52 | |
*** krtaylor has quit IRC | 15:54 | |
*** yamamoto has quit IRC | 15:54 | |
*** hamzy has quit IRC | 15:55 | |
*** yamamoto has joined #openstack-infra | 15:58 | |
*** felipemonteiro__ has joined #openstack-infra | 15:58 | |
mwhahaha | question, so i seem to have puppet-tripleo-puppet-unit logs in my puppet-mistral-puppet-lint results: http://logs.openstack.org/52/491352/1/gate/gate-puppet-mistral-puppet-lint/9b68f25/console.html#_2016-12-16_09_24_02_240950 | 16:00 |
mwhahaha | also they are results from 2016 | 16:01 |
mwhahaha | any thoughts on how that happened? | 16:02 |
*** felipemonteiro_ has quit IRC | 16:02 | |
clarkb | the timestamp on the file itself is from today, the 8th of august; possible that a node booted with a bad clock, resulting in the 2016 problem | 16:02 |
pabelanger | ya, that is odd | 16:03 |
*** yamamoto has quit IRC | 16:03 | |
mwhahaha | http://logs.openstack.org/52/491352/1/gate/gate-puppet-mistral-puppet-lint/9b68f25/console.html#_2016-12-16_09_29_54_873896 | 16:03 |
mwhahaha | i like the success/failure/failure | 16:04 |
clarkb | and the other builds there have the correct content so it isn't a consistent problem | 16:04 |
mwhahaha | wonder if there's a node that never fully cleared or something | 16:04 |
*** jamesmcarthur has joined #openstack-infra | 16:04 | |
mwhahaha | because it looks like it's got a fail from back in march as well | 16:04 |
clarkb | thinking about how console logs work, could it be a uuid collision? (that seems very unlikely, but we are only using the short version of the uuid in the path at least) | 16:05 |
pabelanger | ya | 16:05 |
clarkb | jeblair: ^ | 16:05 |
pabelanger | it also doesn't explain why http://logs.openstack.org/52/491352/1/gate/gate-puppet-mistral-puppet-lint/9b68f25/_zuul_ansible/scripts/07-4094c726a11441b9b73ac0c6dde28be6.sh was actually called | 16:05 |
pabelanger | because console log is different | 16:06 |
clarkb | pabelanger: ya that's why I'm wondering if it's a collision on the uuids and we ended up copying some old file left around on the launcher maybe | 16:06 |
pabelanger | maybe | 16:06 |
clarkb | except we clear those out too don't we? they don't have a life on the launcher beyond the job? | 16:06 |
pabelanger | let me look at zl02 | 16:07 |
pabelanger | see if anything is odd | 16:07 |
*** Apoorva has joined #openstack-infra | 16:07 | |
clarkb | mwhahaha: I highly doubt that the test node itself managed to survive for 8 months. Nodepool is pretty good about keeping things cleaned up after its timeouts so 8 months would be a long time to survive | 16:08 |
pabelanger | clarkb: look at the node ID | 16:08 |
mwhahaha | ¯\_(ツ)_/¯ stranger things have happened :D | 16:08 |
pabelanger | centos-7-infracloud-chocolate-6231138 | 16:08 |
pabelanger | that is way wrong | 16:08 |
pabelanger | centos-7-infracloud-vanilla-10324975 | 16:09 |
pabelanger | what it should have been | 16:09 |
pabelanger | so, I wonder if we somehow booted an old VM again in infracloud | 16:09 |
clarkb | pabelanger: ya that and the appended statuses makes me think there may be a collision somewhere and we end up picking up the old data | 16:09 |
clarkb | oh it could be cached on the remote somehow hrm? | 16:10 |
*** armax has quit IRC | 16:10 | |
*** armax has joined #openstack-infra | 16:11 | |
openstackgerrit | Gabriele Cerami proposed openstack-infra/tripleo-ci master: WIP: containers periodic test https://review.openstack.org/475747 | 16:11 |
openstackgerrit | Slawek Kaplonski proposed openstack-infra/project-config master: Add QoS service plugin to be enabled in shade tests https://review.openstack.org/491266 | 16:12 |
clarkb | in any case I think it likely is running the correct job but when the console log is collected we are grabbing the old file somehow | 16:12 |
clarkb | mwhahaha: ^ | 16:12 |
mwhahaha | well it failed after passing the check so who knows | 16:12 |
mwhahaha | i rechecked and we'll see | 16:12 |
pabelanger | clarkb: mwhahaha: so, job timed out for some reason. And zuul killed ansible, collected logs | 16:12 |
pabelanger | so, possible something was wrong with node | 16:13 |
pabelanger | SSH hostkeys didn't change, so it was the right node | 16:13 |
clarkb | we don't check hostkeys though | 16:13 |
pabelanger | which makes me think it was just a collision with logs | 16:13 |
pabelanger | clarkb: ya, we set them up | 16:13 |
clarkb | in 2.5? | 16:13 |
clarkb | pretty sure we don't | 16:14 |
pabelanger | ya, 1 sec | 16:14 |
clarkb | based on the sequence number of the node, that node definitely does look like it would've been booted in december though. So I think we are just getting logs from december somehow | 16:14 |
pabelanger | clarkb: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/launcher/ansiblelaunchserver.py#n1399 | 16:15 |
*** EricGonczer_ has quit IRC | 16:15 | |
clarkb | pabelanger: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/launcher/ansiblelaunchserver.py#n1283 | 16:16 |
fungi | i wonder if it could have been on a down hypervisor until very recently, and we ran into an ip address collision rather than a uuid collision (which seems far less likely) | 16:16 |
clarkb | so we don't have any idea what the hostkey should be we just grab whatever we get and trust it | 16:16 |
clarkb | so we aren't really checking it in a way to know we got the right node | 16:16 |
pabelanger | clarkb: Right, we keyscan to make sure node doesn't disappear between playbook runs | 16:16 |
pabelanger | but you are right, we blindly assume | 16:16 |
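What the launcher's keyscan amounts to is roughly the following trust-on-first-use pattern (a sketch, not the actual ansiblelaunchserver.py code; the address is made up):

```bash
NODE_IP=198.51.100.23   # hypothetical test node address
KNOWN_HOSTS=/tmp/known_hosts_$NODE_IP

# Record whatever key the node presents right now...
ssh-keyscan -t rsa,ecdsa "$NODE_IP" > "$KNOWN_HOSTS"

# ...so later playbook runs notice if the key changes mid-job,
ssh -o UserKnownHostsFile="$KNOWN_HOSTS" -o StrictHostKeyChecking=yes \
    "jenkins@$NODE_IP" hostname

# but nothing ever verifies that the key scanned in the first place belonged
# to the instance nova actually booted for us.
```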
clarkb | fungi: that is an interesting theory | 16:17 |
clarkb | fungi: basically arp wins on old hosts and we pick up preexisting console log there? | 16:17 |
fungi | though still surprising that nodepool's cleanup wouldn't have dealt with it given the way we tag instances | 16:17 |
pabelanger | clarkb: I wonder if we should try to match the node via hostname too? Nodepool sets it to centos-7-infracloud-vanilla-10324975, we could then have an ansible task validate the correct hostname | 16:18 |
fungi | it should be deleting old nodes it finds in the server list even if it has lost track of them in its db | 16:18 |
clarkb | fungi: ya but that runs on a 15 minute cron iirc so there is a window where your theory could happen | 16:18 |
clarkb | small window but possible | 16:18 |
pabelanger | or some other form of meta data in config-drive | 16:18 |
clarkb | pabelanger: the idea would be for nova/neutron to provide the hostkey to us then we check that | 16:18 |
clarkb | pabelanger: that work is slowly in progress aiui but nothing we can control today unfortunately | 16:19 |
pabelanger | clarkb: ya | 16:19 |
fungi | there is also the possibility to have glean echo the hostkey to the console on boot and then get nodepool to scrape it from the nova console log, but not all our providers support the necessary api method | 16:20 |
*** ccamacho has left #openstack-infra | 16:20 | |
*** ykarel has quit IRC | 16:22 | |
clarkb | we should be able to do a nova list and see if any nodes with ancient sequence numbers show up /me does this | 16:22 |
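Listing by sequence number from the nova side can be approximated like this (the cloud name follows the admin credentials used elsewhere in this log; the cutoff is up to the operator):

```bash
# Print server names keyed on their trailing nodepool sequence number and show
# the lowest ones; anything far below the current sequence is suspiciously old.
openstack --os-cloud admin-infracloud-vanilla server list --all-projects \
    -f value -c Name \
  | awk -F- '{print $NF, $0}' | sort -n | head -20
```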
*** krtaylor has joined #openstack-infra | 16:23 | |
*** lucasagomes is now known as lucas-afk | 16:24 | |
clarkb | ubuntu-xenial-infracloud-vanilla-8895313 | 16:25 |
clarkb | ubuntu-xenial-infracloud-chocolate-8911632 | 16:25 |
clarkb | those may be held nodes? | 16:25 |
clarkb | everything else looks fairly new | 16:26 |
fungi | what are the possibilities that there could be a lost instance which isn't tracked in nova's db, and so is squatting an ip address but never getting cleaned up since it doesn't appear in the server list? | 16:27 |
clarkb | nope they've been in a delete state for 81 and 78 days | 16:27 |
*** ggillies_ has quit IRC | 16:27 | |
clarkb | fungi: I'm guessing it is theoretically possible | 16:27 |
*** pcaruana has quit IRC | 16:27 | |
clarkb | but don't know enough about nova to know under what circumstances that could happen if any | 16:27 |
fungi | here in openstack, anything is theoretically possible! | 16:27 |
clarkb | we could run a virsh list --all and compare | 16:28 |
*** ggillies has joined #openstack-infra | 16:30 | |
*** dizquierdo has quit IRC | 16:31 | |
*** rcernin has quit IRC | 16:31 | |
*** dizquierdo has joined #openstack-infra | 16:31 | |
*** slagle has quit IRC | 16:31 | |
*** tesseract has quit IRC | 16:31 | |
pabelanger | fungi: any feedback from OVH and our collections? | 16:32 |
fungi | pabelanger: i hadn't seen anything back from jean-daniel yet as of a few minutes ago | 16:33 |
*** kjackal_ has quit IRC | 16:33 | |
fungi | looking back through the discussion history from the last time this happened, our most recent voucher expired in january and we began to get notifications at that time | 16:34 |
fungi | between them being in french and going to the infra-root@ address, which nobody was monitoring regularly until i started keeping an eye on it a couple months ago, i didn't realize these were how we're supposed to know it's time to re-up the voucher | 16:34 |
pabelanger | ack | 16:35 |
fungi | so once we get this squared away, the _next_ time we start getting messages in french from ovh to infra-root@ we should reach out to jean-daniel at that point to ask to have the voucher re-upped | 16:36 |
*** markvoelker has quit IRC | 16:36 | |
fungi | but it would be great if more people than just me set up their imap clients to keep an eye on the various mailboxes under that account too | 16:36 |
clarkb | scanning libvirt for rogue instances is not as easy as it sounds (or I'm missing the virsh command that tells you what the nova uuid is) | 16:38 |
*** LindaWang has joined #openstack-infra | 16:39 | |
clarkb | nova doesn't set instance descriptions | 16:40 |
openstackgerrit | Merged openstack-infra/irc-meetings master: Changed to correct chairs for the publiccloud_wg https://review.openstack.org/491769 | 16:40 |
fungi | i smell a ranty summit talk in the works | 16:40 |
clarkb | aha! virsh list --uuid | 16:41 |
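The per-hypervisor listing that this enables boils down to a loop like the one below; the hostnames are placeholders standing in for the real compute node list:

```bash
# Collect every libvirt domain uuid from each reachable compute node.
for host in compute009.vanilla.ic.openstack.org compute011.vanilla.ic.openstack.org; do
    echo "== $host"
    ssh "$host" 'sudo virsh list --all --uuid'
done | tee /tmp/hypervisor-uuids.txt
```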
*** bh526r has quit IRC | 16:41 | |
*** slaweq_ has quit IRC | 16:43 | |
*** dmsimard|afk is now known as dmsimard | 16:43 | |
*** LindaWang has quit IRC | 16:43 | |
*** alexchadin has joined #openstack-infra | 16:44 | |
*** pstack has quit IRC | 16:44 | |
*** slaweq has joined #openstack-infra | 16:45 | |
*** shardy has quit IRC | 16:46 | |
*** voipmonk has left #openstack-infra | 16:47 | |
*** annegentle has quit IRC | 16:49 | |
*** alexchadin has quit IRC | 16:49 | |
openstackgerrit | Matthew Treinish proposed openstack/os-testr master: Switch to stestr under the covers https://review.openstack.org/488441 | 16:50 |
*** rhallisey has joined #openstack-infra | 16:50 | |
*** Apoorva_ has joined #openstack-infra | 16:50 | |
*** derekh has quit IRC | 16:52 | |
*** rhallisey has quit IRC | 16:52 | |
*** yamahata has quit IRC | 16:52 | |
*** rhallisey has joined #openstack-infra | 16:52 | |
*** iyamahat has quit IRC | 16:53 | |
*** Apoorva has quit IRC | 16:53 | |
*** pstack has joined #openstack-infra | 16:53 | |
*** ralonsoh has quit IRC | 16:53 | |
*** trown is now known as trown|lunch | 16:54 | |
clarkb | fungi: pabelanger http://paste.openstack.org/show/617807/ that is what we have on reachable hypervisors | 16:58 |
clarkb | now to cross check against the nodepool logs to see if any don't belong | 16:58 |
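The cross-check itself is set subtraction; a sketch, assuming the hypervisor uuids were saved one per line and that the launcher logs live under /var/log/nodepool on the nodepool server (that path is an assumption):

```bash
# Everything on the hypervisors...
grep -Eo '[0-9a-f-]{36}' /tmp/hypervisor-uuids.txt | sort -u > /tmp/all-uuids

# ...minus anything nodepool launched recently or still tracks...
{ sudo grep -hoE '[0-9a-f-]{36}' /var/log/nodepool/*.log ;
  nodepool list | grep -oE '[0-9a-f-]{36}' ; } | sort -u > /tmp/known-uuids

# ...leaves the leak candidates (remember to drop the mirror node by hand).
comm -23 /tmp/all-uuids /tmp/known-uuids
```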
*** camunoz has quit IRC | 16:58 | |
fungi | i have a feeling i'm going to be disappointed but still unsurprised by the result | 16:58 |
*** camunoz has joined #openstack-infra | 16:59 | |
*** baoli has quit IRC | 17:00 | |
openstackgerrit | Chris Dent proposed openstack-infra/project-config master: Publish placement-api-ref https://review.openstack.org/491860 | 17:00 |
*** slaweq has quit IRC | 17:02 | |
clarkb | fungi: pabelanger http://paste.openstack.org/show/617809/ a few of them don't show in today's logs. Will cross check those against nova listings next | 17:03 |
clarkb | and I guess nodepool listings as they may be older than today | 17:03 |
*** tosky has quit IRC | 17:04 | |
*** rwsu has quit IRC | 17:04 | |
clarkb | http://paste.openstack.org/show/617811/ is that cleaned up a bit | 17:06 |
clarkb | only one of those shows up in nodepool listings | 17:08 |
clarkb | now we check nova | 17:08 |
*** baoli has joined #openstack-infra | 17:09 | |
*** iyamahat has joined #openstack-infra | 17:10 | |
fungi | funny, ovh says our instance with ip address 158.69.77.16 was reported conducting a brute force attack against someone's ssh server at 00:38:10 CEST today | 17:13 |
*** annegentle has joined #openstack-infra | 17:13 | |
fungi | i can't find evidence that nodepool's booted any instance with that ip address in the past ~10 days of launcher debug logs | 17:15 |
fungi | and it's not an ip address for anything in our ansible inventory | 17:16 |
clarkb | and I can't ssh to it implying it never was one of ours | 17:16 |
clarkb | or rather if it still was around it wasn't ours | 17:16 |
clarkb | fungi: pabelanger http://paste.openstack.org/show/617815/ all of those VMs appear to be leaked. Actually now that I say that, I didn't find which one is our mirror node so need to clean it out of the list | 17:17 |
*** dtantsur is now known as dtantsur|afk | 17:17 | |
clarkb | http://paste.openstack.org/show/617816/ doesn't include the mirror | 17:18 |
*** sree has quit IRC | 17:18 | |
clarkb | can you maybe double check that list and make sure I'm not missing some other VMs? but I think the next step is dumpxml on them to see if we can get any more info about why they exist, then possibly virsh destroy them | 17:19 |
clarkb | and virsh undefine them | 17:19 |
*** sree has joined #openstack-infra | 17:19 | |
clarkb | fungi: I also think ^ lends weight to your IP addr theory | 17:19 |
*** dizquierdo has quit IRC | 17:19 | |
clarkb | that first node says <nova:creationTime>2016-12-15 16:34:33</nova:creationTime> | 17:20 |
* fungi sighs | 17:20 | |
*** bobh has quit IRC | 17:21 | |
clarkb | which is suspiciously close to the log timestamp that mwhahaha pointed out | 17:21 |
clarkb | also it is a nodepool host according to dumpxml | 17:21 |
*** rbrndt has quit IRC | 17:21 | |
fungi | yeah, i have a sinking feeling something happened to the environment around that time and whatever was done to recover from it caused us to lose track of those | 17:21 |
fungi | given the close clustering of timestamps | 17:21 |
*** baoli has quit IRC | 17:22 | |
fungi | could also explain why we've been getting a little less performance out of it than we thought we should for the number of instances we were booting i suppose (though with them being idle, probably not) | 17:22 |
*** baoli has joined #openstack-infra | 17:22 | |
*** sree has quit IRC | 17:23 | |
clarkb | I'll gather what info I can for each one but ya I think we just delete them if nova doesn't know about them | 17:23 |
clarkb | (so please help me double check that aspect of it) | 17:23 |
*** hamzy has joined #openstack-infra | 17:24 | |
fungi | i agree, it's more a warning that we should have some way of spotting leaks in those clouds | 17:24 |
*** electrofelix has quit IRC | 17:24 | |
fungi | i don't see much reason to keep them, though i also don't know as much about what else might be a vm in that environment | 17:24 |
*** kjackal_ has joined #openstack-infra | 17:24 | |
fungi | does the bifrost deployment create virtual machines on the hypervisor nodes outside nova's control? | 17:25 |
fungi | seems unlikely, but i'm not too familiar with its architecture | 17:25 |
fungi | also, i guess if we log into each of them and they all look like test nodes, then deletesky | 17:26 |
fungi | any way to easily tease hostnames out of them? | 17:26 |
clarkb | I'm pretty sure bifrost doesn't | 17:28 |
clarkb | fungi: not sure, but we can in theory attach to their consoles | 17:28 |
*** jamesmcarthur has quit IRC | 17:28 | |
*** sambetts is now known as sambetts|afk | 17:29 | |
*** spzala has quit IRC | 17:29 | |
clarkb | of the 4 VMs I have dumpxml'd 3 are from 12/15 and one is from 12/14 | 17:31 |
clarkb | and they all use flavor nodepool | 17:31 |
clarkb | fungi: I'm not able to connect to the console, get error: internal error: character device console0 is not using a PTY so that may not be possible | 17:33 |
clarkb | oh that is because nova redirects it to a file which we can read | 17:33 |
clarkb | the one on 009 is ubuntu-xenial-infracloud-chocolate-6205157 | 17:33 |
clarkb | now to cross check all these against mwhahaha's log | 17:33 |
mwhahaha | did i find some long lost vms? :D | 17:34 |
*** yamahata has joined #openstack-infra | 17:34 | |
fungi | mwhahaha: you taught us that apparently nova leaks like a sieve ;) | 17:34 |
mwhahaha | :o | 17:35 |
fungi | not really like a sieve. looks like we had some issue back in mid-december that caused us to lose track of some dozen or so instances in that cloud | 17:35 |
clarkb | ugh the one on 12 appears to be a failed boot and its just appending to its log file constantly | 17:35 |
mwhahaha | sounds like we need to invest in some flex tape | 17:35 |
*** pstack has quit IRC | 17:35 | |
fungi | mwhahaha: if it works like in the infomercials, i'll pick up a few cases | 17:36 |
*** florianf has quit IRC | 17:36 | |
clarkb | it is 4GB large | 17:36 |
*** 94KAA7YW9 has joined #openstack-infra | 17:37 | |
*** sbezverk_ has quit IRC | 17:37 | |
*** markvoelker has joined #openstack-infra | 17:37 | |
clarkb | centos-7-infracloud-chocolate-6200254 on 011 | 17:37 |
clarkb | ubuntu-xenial-infracloud-chocolate-6200221 on 013 | 17:38 |
pabelanger | clarkb: wow, nice work | 17:38 |
clarkb | ubuntu-xenial-infracloud-chocolate-6193047 on 028 | 17:39 |
clarkb | ubuntu-xenial-infracloud-chocolate-6192475 on 026 | 17:40 |
*** sbezverk has joined #openstack-infra | 17:42 | |
clarkb | the node on 024 isn't running | 17:42 |
clarkb | ubuntu-xenial-infracloud-chocolate-6198347 on 036 | 17:43 |
*** markvoelker has quit IRC | 17:44 | |
*** slagle has joined #openstack-infra | 17:45 | |
*** alexchadin has joined #openstack-infra | 17:45 | |
* clarkb stops posting all of them here (I think this is enough info to show the leaks are from nodepool and such) | 17:45 | |
*** annegentle has quit IRC | 17:45 | |
fungi | agreed, no need to keep any of those in my opinion | 17:46 |
*** pradk has joined #openstack-infra | 17:48 | |
clarkb | I don't see the node that ran mwhahaha's job though. So possibly that is the one that is appending to its console log such that I can't really see what it is. Going to try and grep through that log now | 17:49 |
*** alexchadin has quit IRC | 17:49 | |
clarkb | sdague: dansmith in the case of nova "leaking" libvirt VMs. Is it safe to virsh destroy and undefine the nodes under nova? Or are there bits of the database we should check as well? | 17:51 |
*** Swami has joined #openstack-infra | 17:52 | |
*** pradk has quit IRC | 17:52 | |
*** bobh has joined #openstack-infra | 17:52 | |
clarkb | I am going to start by virsh shutdowning the instances so they stop trying to do things | 17:52 |
clarkb | or maybe even that isn't safe it if gets nova out of sync somewhere? | 17:53 |
*** florianf has joined #openstack-infra | 17:53 | |
dansmith | clarkb: nova is leaking libvirt vms? how? | 17:54 |
fungi | seems to me like nova's already out of sync? | 17:54 |
clarkb | dansmith: we don't know how, but there are VMs from December in infracloud that don't show up in nova listings | 17:54 |
mnaser | clarkb that behaviour can happen if things get messy in the cloud | 17:54 |
clarkb | dansmith: they are all from december 14 and 15 so guessing something went sideways | 17:54 |
mnaser | do you get warnings in the nova compute log with VM and database count not matching? | 17:54 |
clarkb | mnaser: let me see | 17:55 |
dansmith | clarkb: you can tell nova to reap things it doesn't know about, but I also don't know that I've ever seen that happen | 17:55 |
fungi | dansmith: best guess is that cloud suffered some sort of trauma months ago and didn't record these in the db or was unable to fully destroy them when it deleted them. also keep in mind this is still old code (mitaka-based i believe?) | 17:55 |
*** tnovacik has joined #openstack-infra | 17:55 | |
clarkb | mnaser: 2017-08-08 17:52:56.780 12787 WARNING nova.compute.manager [req-b15f50fc-47af-4f7e-971a-40d3e232d89e - - - - -] While synchronizing instance power states, found 0 instances in the database and 1 instances on the hypervisor. yup | 17:55 |
mnaser | there ya go | 17:55 |
mnaser | those warnings will be your big hint | 17:55 |
mnaser | and as dansmith said there is a setting to delete them but i'm too scared to run that so manual cleanup would be easier | 17:56 |
clarkb | mnaser: given that indicates the db doesn't know about the instance I should be safe to just destroy it manually ya? | 17:56 |
fungi | good that we have something to look for in the future | 17:56 |
mnaser | i mean if you want to verify | 17:56 |
mnaser | virsh dumpxml <foo> the ID of the VM | 17:56 |
mnaser | will be the instance uuid | 17:56 |
mnaser | you can cross check that with the nova instances table | 17:56 |
mnaser | it should be marked as deleted, if it is, you can virsh destroy <foo> and delete remains in /var/lib/nova/instances (that's what i do most of the time and nothing blew up) | 17:57 |
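Put together, the verification and cleanup mnaser describes looks roughly like this; the uuid is a placeholder and the database step assumes you are on a controller with credentials for the nova DB:

```bash
UUID=00000000-0000-0000-0000-000000000000   # substitute the stray domain's uuid

# 1. Confirm it really is an old nova instance (name, flavor, creation time).
sudo virsh dumpxml "$UUID" | grep -E 'nova:(name|creationTime|flavor)'

# 2. Cross-check the nova instances table; deleted (or absent) means nova has
#    already forgotten about it.
mysql nova -e "SELECT uuid, vm_state, deleted FROM instances WHERE uuid='$UUID';"

# 3. If so, stop and remove the domain, then clean up what it left on disk.
sudo virsh destroy "$UUID"
sudo virsh undefine "$UUID"
sudo rm -rf "/var/lib/nova/instances/$UUID"
```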
*** jrist has quit IRC | 17:57 | |
clarkb | mnaser: does virsh undefine not clean up /var/lib/nova/instances? | 17:57 |
dansmith | clarkb: not images | 17:58 |
clarkb | ah ok | 17:58 |
clarkb | I will start with a shutdown of the instances across the board so they stop running at least then we can go through and clean up | 17:58 |
*** pushkaraj__ has joined #openstack-infra | 17:58 | |
*** 94KAA7YW9 has quit IRC | 17:58 | |
*** pvaneck has joined #openstack-infra | 17:59 | |
*** spzala has joined #openstack-infra | 17:59 | |
*** spzala has quit IRC | 18:01 | |
*** rbrndt has joined #openstack-infra | 18:01 | |
*** spzala has joined #openstack-infra | 18:01 | |
*** makowals has quit IRC | 18:06 | |
clarkb | mnaser: I shouldn't need to modify the nova db at all right? just clean up hypervisor disk contents? | 18:07 |
mnaser | clarkb correct, if the API is returning "instance does not exist" it means for all nova knows, that VM is supposed to be terminated | 18:08 |
clarkb | perfect, thanks | 18:09 |
clarkb | I'm almost done shutting down/destroying the instances so they stop running. Does anyone else want to look at infracloud and see if we have the same problem there? | 18:11 |
clarkb | fungi: pabelanger ^ | 18:11 |
*** makowals has joined #openstack-infra | 18:13 | |
mnaser | clarkb https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L6639-L6714 | 18:13 |
clarkb | fungi: pabelanger first step was running `sudo virsh list --all --uuid` against all the reachable hypervisors. Then take that list and remove any nodes that show up in today's nodepool log or in nodepool list data. Then remove our mirror node, then cross check against the nova listing | 18:14 |
*** slaweq has joined #openstack-infra | 18:14 | |
pabelanger | clarkb: not at the moment, chasing down zuul change queue questions | 18:14 |
mnaser | looks like that code was added 4-6 years ago | 18:14 |
mnaser | and the option to look for is running_deleted_instance_action | 18:15 |
mnaser | so you can set that to log or shutdown (or reap if you want) | 18:15 |
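For reference, that option lives in the [DEFAULT] section of nova.conf on each compute node; a minimal sketch of setting it (the service name varies by distro, so treat that last line as an assumption):

```bash
# What nova-compute should do with instances still running on the hypervisor
# but marked deleted in the database: noop, log, shutdown, or reap.
sudo crudini --set /etc/nova/nova.conf DEFAULT running_deleted_instance_action shutdown
sudo systemctl restart nova-compute   # openstack-nova-compute on RHEL-family hosts
```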
*** kjackal_ has quit IRC | 18:15 | |
fungi | clarkb: i can probably take a look after the infra meeting | 18:15 |
clarkb | mnaser: oh if shutdown is an option we probably want to set it to that. Thank you | 18:15 |
*** EricGonczer_ has joined #openstack-infra | 18:16 | |
mnaser | the odd thing is it seems to default to reap | 18:16 |
clarkb | (though I'm finding our images don't shutdown they have to be destroyed... guessing -minimal image builds don't have the acpi bits to handle a graceful shutdown request) | 18:16 |
mnaser | so it should be deleting them.. somehow | 18:16 |
mnaser | but i guess its not | 18:16 |
* mnaser shrugs | 18:16 | |
clarkb | ya double checking our config we don't seem to set any value | 18:17 |
mnaser | i dont know enough to know why its not getting reap'd but i know we dont have it set to anything (i think) and we get orphan instances sometimes | 18:17 |
*** jamesmcarthur has joined #openstack-infra | 18:18 | |
clarkb | ok I've got all of them in a non running state | 18:19 |
clarkb | in chocolate | 18:19 |
clarkb | mnaser: so you are saying clear out the content in /var/lib/nova/instances and virsh undefine the domains? | 18:20 |
pabelanger | jeblair: question on changequeue merging: http://logs.openstack.org/33/491633/1/gate/gate-project-config-layout/3bc9763/console.html#_2017-08-08_10_50_23_258797 The reason networking-bagpipe is merged into the other tripleo jobs is because it shares a job gate-tempest-dsvm-networking-bgpvpn-bagpipe-ubuntu-xenial with networking-bgpvpn, which in turn shares a job with tripleo-ci? | 18:20 |
mnaser | clarkb yep, and just to be clear the contents of that specific instance id, not all of /var/lib/nova/instances :p | 18:20 |
clarkb | ya | 18:21 |
clarkb | /var/lib/nova/instances/$uuid | 18:21 |
mnaser | yeah that should be okay | 18:22 |
clarkb | ok compute064 is done | 18:22 |
clarkb | I'm going to tail the nova compute log there and just double check we don't get that warning again or any new errors before going to other hypervisors | 18:23 |
*** jamesmcarthur has quit IRC | 18:23 | |
clarkb | mnaser: dansmith thanks for the help | 18:24 |
*** jamesmcarthur has joined #openstack-infra | 18:24 | |
mnaser | np | 18:24 |
*** trown|lunch is now known as trown | 18:30 | |
*** jamesmcarthur has quit IRC | 18:31 | |
*** jamesmcarthur has joined #openstack-infra | 18:32 | |
*** nicolasbock has quit IRC | 18:32 | |
clarkb | logs look good on 064 and we have successfully booted new instances in that cloud. Going to move forward finishing up cleanup for the rest of these VMs | 18:34 |
*** dprince has quit IRC | 18:36 | |
openstackgerrit | Mathieu Gagné proposed openstack-infra/project-config master: Bump internap-mtl01 capacity to 190 https://review.openstack.org/491882 | 18:37 |
*** florianf has quit IRC | 18:39 | |
*** jascott1 has joined #openstack-infra | 18:39 | |
pabelanger | mgagne: +2 | 18:40 |
pabelanger | and danke! | 18:40 |
mgagne | =) | 18:40 |
*** markvoelker has joined #openstack-infra | 18:40 | |
fungi | thanks clarkb and mnaser! | 18:41 |
fungi | big thanks mgagne! | 18:42 |
*** alexchadin has joined #openstack-infra | 18:46 | |
*** markvoelker has quit IRC | 18:48 | |
*** EricGonczer_ has quit IRC | 18:48 | |
*** Apoorva_ has quit IRC | 18:49 | |
*** Apoorva has joined #openstack-infra | 18:50 | |
*** alexchadin has quit IRC | 18:50 | |
*** EricGonczer_ has joined #openstack-infra | 18:51 | |
clarkb | ok I think chocolate is all cleaned up assuming my list of leaked VMs was complete | 18:53 |
openstackgerrit | wes hayutin proposed openstack-infra/tripleo-ci master: fix random broken pipe on du command https://review.openstack.org/491884 | 18:54 |
clarkb | libvirt domains are all undefined and the nova instances dirs for each have been deleted | 18:54 |
*** florianf has joined #openstack-infra | 18:54 | |
clarkb | fungi: I can likely tackle vanilla after meeting and lunch but would be good if more than one person is familiar with this :) | 18:55 |
fungi | clarkb: i don't disagree, though i' | 18:56 |
fungi | ve done so little with infra-cloud so far that my learning curve will be steeeeep | 18:56 |
clarkb | I'll walk you through it :) | 18:57 |
fungi | much appreciated | 18:57 |
clarkb | this particular issue isn't too bad especially once mnaser pointed out that log warning we can grep for | 18:57 |
fungi | there's no tc meeting today, so in an hour i should have time to take it for a spin | 18:57 |
clarkb | just lots of listing and cross referencing stuff | 18:57 |
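That warning also gives a cheap way to spot future leaks without the full listing exercise; a sketch, assuming a host list file and the usual nova-compute log location:

```bash
# Count db/hypervisor mismatch warnings on each compute node; non-zero counts
# are worth a closer look.
while read -r host; do
    printf '%s: ' "$host"
    ssh "$host" \
      "sudo grep -c 'While synchronizing instance power states, found' /var/log/nova/nova-compute.log"
done < /tmp/compute-hosts   # one hostname per line; the file name is an assumption
```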
jeblair | pabelanger: that sounds plausible, though i haven't looked at the details. that is how the queue merging works. | 18:58 |
dmsimard | The publishers I see in project-config are all scp, ftp, afs -- is there no way to run shell inside a publisher? I see one instance of "postbuildscript" used here: https://github.com/openstack-infra/project-config/blob/master/jenkins/jobs/infra.yaml#L299 but I doubt it works | 18:58 |
jeblair | dmsimard: yes that's complicated and best avoided. | 18:59 |
jeblair | dmsimard: super easy in v3. | 18:59 |
*** vhosakot has quit IRC | 18:59 | |
dmsimard | jeblair: context is to run log collection outside of the job and inside a publisher instead so that if the job times out, logs are available | 19:00 |
fungi | it's that (weekly infra team meeting) time again! find us in #openstack-meeting for the next hour | 19:00 |
clarkb | dmsimard: the way we do that with devstack-gate is to timeout the main test process 5 minutes before the job timeout | 19:00 |
clarkb | dmsimard: then you have 5 minutes to collect logs | 19:00 |
jeblair | dmsimard: right. devstack-gate has support for that. | 19:00 |
jeblair | dmsimard: otherwise, there isn't a good way for that in v2. | 19:00 |
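A minimal sketch of that pattern for a job outside devstack-gate, assuming the overall job timeout is exposed to the script and the two helper scripts are stand-ins:

```bash
JOB_TIMEOUT_MINUTES=${JOB_TIMEOUT_MINUTES:-120}    # assumed to be provided by the job
TEST_BUDGET=$(( (JOB_TIMEOUT_MINUTES - 5) * 60 ))  # leave 5 minutes for log collection

# Bound the main test run so it can never eat the whole job budget.
timeout -s TERM "${TEST_BUDGET}s" ./run-tests.sh
RET=$?

# Even if the tests hit the soft timeout above, there is still time to publish logs.
./collect-logs.sh
exit $RET
```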
*** slaweq has quit IRC | 19:00 | |
dmsimard | clarkb: yikes | 19:00 |
*** vhosakot has joined #openstack-infra | 19:00 | |
dmsimard | okay, then, thanks :) | 19:00 |
*** slaweq has joined #openstack-infra | 19:01 | |
openstackgerrit | wes hayutin proposed openstack-infra/tripleo-ci master: fix random broken pipe on du command https://review.openstack.org/491884 | 19:03 |
*** vhosakot has quit IRC | 19:05 | |
*** slaweq has quit IRC | 19:06 | |
*** sslypushenko_ has joined #openstack-infra | 19:06 | |
*** slaweq has joined #openstack-infra | 19:06 | |
*** vhosakot has joined #openstack-infra | 19:10 | |
*** kjackal_ has joined #openstack-infra | 19:20 | |
*** baoli has quit IRC | 19:22 | |
sdague | fungi / clarkb interesting git review edge case I just ran into | 19:23 |
sdague | 3 patch series, in merge conflict | 19:23 |
sdague | rebase the first patch in gerrit ui, second is a merge conflict so you can't | 19:23 |
sdague | pull them down, rebase on master | 19:23 |
sdague | git review... failed | 19:23 |
sdague | because the bottom ref did not change | 19:23 |
sdague | it will not push the other two | 19:24 |
clarkb | sdague: what you can do is rebase the other two onto the updated base and that should work | 19:24 |
fungi | strange, if the bottom patch isn't different from what's already in gerrit, it should ignore it and push the others | 19:24 |
clarkb | because then it will have the same sha1 and not attempt a zero delta update (it will just recognize it as existing) | 19:24 |
fungi | oh, yeah if you somehow changed the bottom sha locally after that | 19:25 |
clarkb | fungi: its different in its sha1 because of timestamps and such but the patch diff is nil | 19:25 |
fungi | right | 19:25 |
sdague | http://paste.openstack.org/show/617823/ | 19:25 |
sdague | yeh | 19:25 |
sdague | I ended up just making a random change in the base patch in the gerrit ui | 19:25 |
fungi | i agree that's a tough one to automate away | 19:25 |
sdague | then pushed over | 19:25 |
sdague | yeh, it's definitely very edge case | 19:25 |
clarkb | you don't need to make a random change in the base patch | 19:25 |
sdague | but it seemed interesting enough to at least tell someone | 19:25 |
clarkb | you just have to rebase the second and third patches onto what is in gerrit for the first patch | 19:26 |
sdague | clarkb: sure, but that's actually more work than gerrit random change && git review | 19:26 |
clarkb | this is also why git review -x can be problematic because you can easily end up with updates to changes that are considered nil changes | 19:26 |
clarkb | its also not something git review can really do anything about, its gerrit behavior | 19:26 |
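Concretely, the rebase clarkb describes means replaying the two child patches onto the exact commit gerrit created when the bottom change was rebased in the UI, so the bottom sha matches what gerrit already has; a sketch with placeholder change/patchset numbers:

```bash
# Fetch the gerrit-side rebase of the bottom change (hypothetical ref) and
# replay only the top two local patches onto it.
git fetch origin refs/changes/56/123456/2
git rebase --onto FETCH_HEAD HEAD~2

# The bottom commit now matches what's in gerrit, so only the two updated
# children get pushed instead of the whole series being refused.
git review
```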
sdague | yep, that's fine | 19:27 |
sdague | like I said, it's just an interesting edge condition | 19:27 |
fungi | more work, but does avoid yet one more patchset on that change at least | 19:29 |
fungi | so maybe a tradeoff | 19:30 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 19:30 |
sdague | yeh, at this point I was optimizing for time | 19:32 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Add mapping file containing v2 to v3 mappings https://review.openstack.org/491804 | 19:39 |
*** markvoelker has joined #openstack-infra | 19:44 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add comments about base jobs https://review.openstack.org/491897 | 19:45 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 19:49 |
*** markvoelker has quit IRC | 19:51 | |
*** pushkaraj__ has quit IRC | 19:56 | |
clarkb | fungi: with meeting winding down. `OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml venv/bin/openstack --os-cloud admin-infracloud-vanilla compute service list` is what you run to get a list of all the nova services, we want to filter out the compute hosts from that. Then for each up compute host I ran an ssh for loop to get `echo $hostname && sudo virsh list --all --uuid` | 19:56 |
clarkb | fungi: for the computes that are down I manually attempted sshing to them and in chocolate none of them responded | 19:57 |
clarkb | then with that list I removed any uuid that showed up in the nodepool launcher log from today and any that show up in nodepool list. Also remove the mirror node | 19:57 |
clarkb | fungi: then for the remaining uuids I ssh'd into each compute host hosting one of them and ran virsh dumpxml $uuid to get info about the node (shows you the flavor and creation time) | 19:58 |
fungi | clarkb: from the puppetmaster presumably | 19:58 |
clarkb | once you've confirmed via dumpxml that the instances are old and not needed, you do `virsh undefine $uuid` and then delete /var/lib/nova/instances/$uuid | 19:58 |
pabelanger | neat, just got Your OpenStack Summit Sydney Registration Code email | 19:59 |
clarkb | fungi: the nodepool checking I did on nodepool.o.o but the nova list on puppetmaster yes | 19:59 |
fungi | pabelanger: from me or the full discount one? | 19:59 |
pabelanger | fungi: from kendall@openstack.org | 20:00 |
fungi | pabelanger: okay, so the full one. good ;) | 20:00 |
fungi | since you were a ptg attendee in atlanta you shouldn't have gotten one from me | 20:00 |
clarkb | fungi: I'm going to grab food now but will watch irc if you have questions about ^ | 20:00 |
*** pushkaraj__ has joined #openstack-infra | 20:00 | |
fungi | i only sent the us$300 codes, and those were de-duped to not include ptg attendees | 20:01 |
*** baoli has joined #openstack-infra | 20:01 | |
fungi | clarkb: will do, this is enough to get me started, thanks! | 20:01 |
pabelanger | fungi: cool | 20:01 |
fungi | clarkb: quick question though, any reason you're calling openstackclient from a virtualenv rather than using the globally-installed one we have on the puppetmaster? | 20:01 |
*** baoli_ has joined #openstack-infra | 20:02 | |
fungi | i've usually used the globally installed one, and it's been working fine, but wondering if i shouldn't be for some reason | 20:02 |
fungi | looks like we have 7 compute hosts down in vanilla | 20:03 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 20:03 |
pabelanger | fungi: that sounds about right | 20:04 |
fungi | checking the down ones, so far they don't even respond to ping (while the working ones do) | 20:04 |
*** baoli has quit IRC | 20:05 | |
*** pushkaraj__ has quit IRC | 20:05 | |
clarkb | fungi: no reason for the venv, I like controlling the client versions | 20:05 |
fungi | okay, so not for any particular bug | 20:06 |
pabelanger | fungi: ya, I know the last few compute hosts I couldn't access via ilo either. | 20:06 |
fungi | compute035.vanilla.ic is responding to ping but refused my first ssh connection (tcp rst) | 20:06 |
fungi | now it's timing out subsequent ssh attempts | 20:07 |
*** jamesdenton has quit IRC | 20:07 | |
fungi | wonder if i asploded it trying to ssh in | 20:07 |
pabelanger | I know a few had HDDs that look to be dying | 20:07 |
fungi | nah, some ssh attempts are refused by it, others time out | 20:07 |
fungi | any good way to make ironic reboot these? openstack baremetal reboot or something? | 20:08 |
fungi | public endpoint for baremetal service in RegionOne region not found | 20:09 |
fungi | poop | 20:09 |
*** jamesdenton has joined #openstack-infra | 20:10 | |
fungi | looks like maybe we don't have it in the catalog. i'm trying our instructions for hitting it from the controller | 20:11 |
fungi | that seems to get it | 20:11 |
*** e0ne has joined #openstack-infra | 20:12 | |
pabelanger | fungi: clarkb: so, I think we might have a bad hypervisor in citycloud-lon1, incoming logstash query | 20:13 |
pabelanger | http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22%5C%5C%5C%22msg%5C%5C%5C%22%3A%20%5C%5C%5C%22Timer%20expired%5C%5C%5C%22%5C%22%20AND%20message%3A%5C%22%5C%5C%5C%22rc%5C%5C%5C%22%3A%20257%5C%22%20AND%20filename%3A%5C%22console.html%5C%22%20AND%20voting%3A1&from=864000s | 20:13 |
clarkb | ya you have to hit it on the main baremetal node | 20:13 |
clarkb | brcause bifrost is not a full openstack deployment | 20:14 |
clarkb | so no real auth and isnt exposed eith thr other apis | 20:14 |
pabelanger | fungi: clarkb: we should see about passing the info to citycloud and having them confirm | 20:14 |
clarkb | pabelanger: we probably want to give them our VM uuids so they can track it to the hypervisor | 20:14 |
pabelanger | Ya, I can get a list here in a few minutes | 20:14 |
fungi | gonna try to reboot all the controllers that i can't ssh into | 20:15 |
fungi | er, compute nodes i mean | 20:15 |
fungi | all the ones listed by nova as being down anyway | 20:15 |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Use new syntax for base jobs https://review.openstack.org/491906 | 20:16 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Remove base job https://review.openstack.org/491907 | 20:16 |
*** jkilpatr has quit IRC | 20:16 | |
*** jamesmcarthur has quit IRC | 20:19 | |
*** kgiusti has left #openstack-infra | 20:19 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Require a base job https://review.openstack.org/491610 | 20:20 |
pabelanger | clarkb: last 4 UUIDs http://paste.openstack.org/show/617830/ | 20:21 |
pabelanger | I'll compose an email shortly | 20:22 |
*** e0ne has quit IRC | 20:23 | |
*** adisky__ has quit IRC | 20:23 | |
fungi | i've confirmed i couldn't ssh into any of the down compute nodes in vanilla, and have asked ironic to reboot them. giving it a few minutes (none are up in nova's service list just yet) | 20:25 |
*** jamesmcarthur has joined #openstack-infra | 20:25 | |
*** kjackal_ has quit IRC | 20:25 | |
pabelanger | fungi: thanks! | 20:25 |
fungi | after that i'll start collecting instance lists | 20:26 |
*** jcoufal has quit IRC | 20:27 | |
*** jamesmcarthur has quit IRC | 20:28 | |
pabelanger | clarkb: fungi: emails sent | 20:31 |
*** rossella_s has quit IRC | 20:31 | |
fungi | thanks pabelanger! | 20:33 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable https://review.openstack.org/491915 | 20:33 |
fungi | oh, hey, compute40 came back online after rebooting | 20:33 |
*** rossella_s has joined #openstack-infra | 20:35 | |
clarkb | pabelanger: thanks, I see it | 20:36 |
clarkb | fungi: nice | 20:36 |
*** jkilpatr has joined #openstack-infra | 20:37 | |
*** e0ne has joined #openstack-infra | 20:37 | |
*** jamesmcarthur has joined #openstack-infra | 20:38 | |
*** felipemonteiro__ has quit IRC | 20:40 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable https://review.openstack.org/491915 | 20:43 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable https://review.openstack.org/491915 | 20:44 |
fungi | compute35 came back up into a state where it again responds to ping and refuses ssh access. no clue what's up there | 20:44 |
fungi | maybe it's missing a host key or something | 20:44 |
*** marst_ has quit IRC | 20:44 | |
sshnaidm | clarkb, fungi fyi, solution for bug with multiple /etc is merging here: https://review.openstack.org/#/c/481233/ | 20:45 |
openstackgerrit | Merged openstack-infra/project-config master: Bump internap-mtl01 capacity to 190 https://review.openstack.org/491882 | 20:47 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable https://review.openstack.org/491915 | 20:49 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable https://review.openstack.org/491915 | 20:50 |
*** baoli_ has quit IRC | 20:50 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 20:50 |
*** krtaylor has quit IRC | 20:51 | |
fungi | sshnaidm: that looks like it could be an effective reduction. thanks | 20:52 |
*** sbezverk has quit IRC | 20:53 | |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Add mapping file containing v2 to v3 mappings https://review.openstack.org/491804 | 20:55 |
*** dprince has joined #openstack-infra | 20:55 | |
openstackgerrit | Matthew Treinish proposed openstack/os-testr master: Switch to stestr under the covers https://review.openstack.org/488441 | 20:55 |
*** e0ne has quit IRC | 20:55 | |
*** marst has joined #openstack-infra | 20:55 | |
openstackgerrit | Merged openstack-infra/project-config master: Use new syntax for base jobs https://review.openstack.org/491906 | 20:56 |
fungi | clarkb: okay, i've confirmed that the remaining down computes after attempting to reboot them are still inaccessible via ssh, so proceeding to the instance lists collection phase | 20:57 |
clarkb | ok | 20:57 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 20:58 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Create fetch-tox-output role https://review.openstack.org/490643 | 20:59 |
*** camunoz has quit IRC | 21:01 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 21:01 |
*** spzala has quit IRC | 21:02 | |
*** iyamahat has quit IRC | 21:02 | |
*** baoli has joined #openstack-infra | 21:03 | |
*** trown is now known as trown|outtypewww | 21:03 | |
*** iyamahat has joined #openstack-infra | 21:03 | |
fungi | clarkb: pabelanger: should i find it odd that puppetmaster isn't recognizing the ssh host keys for a lot of infracloud vanilla compute nodes? | 21:04 |
fungi | starting to wonder how ansible has been dealing with them | 21:04 |
*** dprince has quit IRC | 21:05 | |
clarkb | ya I would expect the root user to be able to ssh to them as part of the ansibling | 21:05 |
clarkb | I ssh'ed from my local desktop when doing the virsh listings in chocolate though | 21:05 |
fungi | d'oh, operator error | 21:06 |
fungi | i was missing the sudo on ssh | 21:06 |
fungi | so it was trying to add them to my ~/.ssh/known_hosts on puppetmaster | 21:07 |
openstackgerrit | Gabriele Cerami proposed openstack-infra/tripleo-ci master: WIP: containers periodic test https://review.openstack.org/475747 | 21:08 |
*** yamamoto has joined #openstack-infra | 21:09 | |
fungi | clarkb: optimization, for my own sense of laziness... gonna generate two uuid lists a little while apart and only check entries which appear in both lists | 21:10 |
fungi | need to grab a bite to eat anyway, so i'll put that delay to good use | 21:10 |
clarkb | ok | 21:10 |
openstackgerrit | Paul Belanger proposed openstack-infra/openstack-zuul-jobs master: WIP: Add upload-pypi job https://review.openstack.org/491926 | 21:11 |
clarkb | fungi: another way is to do xml parsing and only look at domains for which the creation time is older than say a week | 21:12 |
clarkb | but that is likely far more work because xml | 21:12 |
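fungi's double-sample shortcut is just an intersection of two snapshots; a sketch, where collect_uuids is a stand-in for the ssh/virsh loop sketched earlier and writes one uuid per line:

```bash
# Snapshot, wait, snapshot again; anything present in both samples has lived
# long enough to be worth investigating, short-lived test nodes drop out.
collect_uuids > /tmp/uuids-pass1    # collect_uuids is a hypothetical helper
sleep 1800
collect_uuids > /tmp/uuids-pass2

comm -12 <(sort -u /tmp/uuids-pass1) <(sort -u /tmp/uuids-pass2) > /tmp/uuid-candidates
```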
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 21:13 |
clarkb | fungi: pabelanger we might also want to follow up with citycloud on the state of sto2 (I think it was that region) as that is 50 instances of quota we are not able to use currently | 21:13 |
*** ldnunes_ has quit IRC | 21:15 | |
*** sslypushenko_ has quit IRC | 21:16 | |
pabelanger | clarkb: Ya, I haven't heard anything back myself | 21:16 |
*** camunoz has joined #openstack-infra | 21:17 | |
*** slaweq has quit IRC | 21:18 | |
openstackgerrit | Paul Belanger proposed openstack-infra/openstack-zuul-jobs master: WIP: Add upload-pypi job https://review.openstack.org/491926 | 21:18 |
*** jamesmcarthur has quit IRC | 21:20 | |
*** EricGonczer_ has quit IRC | 21:21 | |
*** yamamoto has quit IRC | 21:21 | |
*** sbezverk has joined #openstack-infra | 21:22 | |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Zuulv3: update sql reporter syntax https://review.openstack.org/491932 | 21:23 |
jeblair | pabelanger, mordred: ^ we need that in to restart zuulv3 | 21:23 |
pabelanger | +2 | 21:25 |
*** rockyg has joined #openstack-infra | 21:27 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add comments about base jobs https://review.openstack.org/491897 | 21:29 |
*** annegentle has joined #openstack-infra | 21:32 | |
*** felipemonteiro_ has joined #openstack-infra | 21:33 | |
*** felipemonteiro__ has joined #openstack-infra | 21:35 | |
*** pvaneck_ has joined #openstack-infra | 21:35 | |
*** markvoelker has joined #openstack-infra | 21:37 | |
*** thorst has quit IRC | 21:37 | |
*** pvaneck has quit IRC | 21:38 | |
*** felipemonteiro_ has quit IRC | 21:38 | |
openstackgerrit | Merged openstack-infra/project-config master: Zuulv3: update sql reporter syntax https://review.openstack.org/491932 | 21:43 |
*** markvoelker has quit IRC | 21:44 | |
*** jascott1 has quit IRC | 21:45 | |
*** jascott1 has joined #openstack-infra | 21:45 | |
*** jascott1 has quit IRC | 21:46 | |
*** jascott1 has joined #openstack-infra | 21:46 | |
*** jascott1 has quit IRC | 21:49 | |
*** jascott1 has joined #openstack-infra | 21:50 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 21:53 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Run zuul-migrate job on changes to mapping file https://review.openstack.org/491937 | 21:54 |
mordred | jeblair, pabelanger: ^^ also there's the project-config change to run that job on changes to the mapping file | 21:54 |
*** jascott1 has quit IRC | 21:54 | |
*** yamamoto has joined #openstack-infra | 21:59 | |
*** florianf has quit IRC | 21:59 | |
*** dprince has joined #openstack-infra | 21:59 | |
*** priteau has quit IRC | 22:02 | |
*** esberglu has quit IRC | 22:03 | |
*** markvoelker has joined #openstack-infra | 22:04 | |
*** esberglu has joined #openstack-infra | 22:04 | |
*** markvoelker_ has joined #openstack-infra | 22:05 | |
*** esberglu has quit IRC | 22:08 | |
*** markvoelker has quit IRC | 22:09 | |
clarkb | I eyeballed the PTG walk poorly. its .8 miles according to google (still walkable but quite a bit more than 1/4 mile) | 22:14 |
*** camunoz has quit IRC | 22:16 | |
*** esberglu has joined #openstack-infra | 22:16 | |
fungi | yeah, no concerns with that on my part | 22:17 |
fungi | my luggage is a backpack, so i could do miles on foot with it uphill if needed | 22:17 |
*** slaweq has joined #openstack-infra | 22:19 | |
*** slaweq has quit IRC | 22:24 | |
*** rockyg has quit IRC | 22:24 | |
openstackgerrit | Tim Burke proposed openstack-infra/project-config master: Add release notes jobs for python-swiftclient https://review.openstack.org/491940 | 22:26 |
*** Julien-zte has joined #openstack-infra | 22:27 | |
jeblair | clarkb, mordred, fungi: are there other reports of infracloud being slow? | 22:29 |
clarkb | jeblair: I think we've heard it about other clouds but haven't seen infracloud necessarily | 22:29 |
clarkb | we are also tracking job timeouts with e-r and they are up since turning off osic | 22:29 |
clarkb | you might want to pull up that query and see what the cloud distribution is | 22:30 |
*** spzala has joined #openstack-infra | 22:30 | |
fungi | also we still keep fiddling with the max-servers in infra-cloud to figure out how hard we can push it (and as we run at capacity for a while we've still needed to reduce it a couple times) | 22:30 |
*** dprince has quit IRC | 22:31 | |
*** felipemonteiro__ has quit IRC | 22:31 | |
clarkb | http://blog.ffwll.ch/2017/08/github-why-cant-host-the-kernel.html completely unrelated but potentially interesting | 22:32 |
jeblair | i'm trying to figure out if we should scale infracloud back some more | 22:33 |
jeblair | i don't really have the time to tune it myself. so if we think this is a signal that we're still oversubscribed, maybe we should lower our usage some. | 22:33 |
jeblair | but if we think it's an errant signal, i'll just ignore it for now and see if zuulv3 jobs run faster when we're less busy. | 22:34 |
clarkb | vanilla is 28% of job timeouts, chocolate is 17% | 22:34 |
clarkb | citycloud lon1 is 15% and rax ord is 11% | 22:34 |
*** aeng_ has joined #openstack-infra | 22:34 | |
*** xyang1 has quit IRC | 22:34 | |
clarkb | so vanilla is significantly more likely to time out than other regions but chocolate seems to be in line (if high) with other regions | 22:35 |
clarkb | also that isn't scaled against total server quota, just percentage of total fails | 22:35 |
jeblair | vanilla is 12% of capacity and chocolate is 9% | 22:35 |
jeblair | so they're vaguely hand-wavey 2x as represented in timeouts as they should be based on their proportion of quota | 22:36 |
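The back-of-the-envelope math behind that 2x, using the percentages quoted above:

```bash
# Share of timeouts divided by share of capacity; ~1.0 means pulling its
# weight, ~2.0 means twice the timeouts its size would predict.
awk 'BEGIN { printf "vanilla: %.1f  chocolate: %.1f\n", 28/12, 17/9 }'
# vanilla: 2.3  chocolate: 1.9
```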
*** markvoelker has joined #openstack-infra | 22:36 | |
*** gordc has quit IRC | 22:37 | |
*** bobh has quit IRC | 22:37 | |
*** markvoel_ has joined #openstack-infra | 22:37 | |
clarkb | we've also done about 10 timeouts per hour based on logstash data | 22:38 |
*** thorst has joined #openstack-infra | 22:38 | |
clarkb | out of 600-900 jobs launched per hour | 22:38 |
clarkb | based on that I think I would tune vanilla back | 22:40 |
*** markvoelker_ has quit IRC | 22:40 | |
clarkb | chocolate maybe less so? but likely needs it as well | 22:40 |
*** markvoelker has quit IRC | 22:41 | |
*** Julien-zte has quit IRC | 22:41 | |
*** jascott1 has joined #openstack-infra | 22:41 | |
*** jaypipes has quit IRC | 22:42 | |
*** Julien-z_ has joined #openstack-infra | 22:42 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Don't pass self to a bound method https://review.openstack.org/491946 | 22:43 |
*** thorst has quit IRC | 22:43 | |
clarkb | I think that cleaning up any leaked instances will help too | 22:44 |
clarkb | chocolate should start being better in that regard, but to be determined if vanilla has a problem | 22:44 |
openstackgerrit | Swaminathan Vasudevan proposed openstack/diskimage-builder master: Failed to open parameter YAML error while trying to unmount imagedir https://review.openstack.org/490637 | 22:45 |
fungi | yeah, we've got one node stuck in a delete state in vanilla for several months i'm trying to work out how to clean up | 22:45 |
fungi | looks like it's active according to nova so i'm going to attempt to delete it through the api | 22:46 |
clarkb | fungi: we have one of those in chocolate too if you find out how to clear the one in vanilla we can do that one next | 22:46 |
*** rbrndt has quit IRC | 22:46 | |
fungi | and nova continues to list it in an active state | 22:47 |
fungi | not reachable via ssh | 22:48 |
fungi | the uuid also isn't showing up in the virsh list | 22:48 |
fungi | so i think this one's the inverse of the others from earlier. nova still thinks it exists but it doesn't appear to (or maybe it's on a dead compute node?) | 22:49 |
*** vhosakot has quit IRC | 22:49 | |
clarkb | ah | 22:49 |
clarkb | if you nova show it as admin I think you can get the hypervisor info from nova | 22:50 |
clarkb | you might also be able to tell nova to forget about it as admin? | 22:50 |
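Roughly what that looks like with admin credentials; the uuid is a placeholder, and the reset/force-delete step is an assumption about what this cloud's client versions support:

```bash
UUID=00000000-0000-0000-0000-000000000000   # placeholder for the stuck instance

# The extended attributes show which compute node nova believes hosts the
# phantom instance, and what state it thinks it is in.
openstack --os-cloud admin-infracloud-vanilla server show "$UUID" \
    -c OS-EXT-SRV-ATTR:host -c OS-EXT-STS:vm_state

# If a plain delete hangs, resetting the state first usually unsticks it.
nova reset-state --active "$UUID"
openstack --os-cloud admin-infracloud-vanilla server delete "$UUID"
```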
fungi | cool, will try that in a jiffy | 22:50 |
*** krtaylor has joined #openstack-infra | 22:50 | |
jeblair | okay, so ze01 can rsync a repo to mtl01 at 6mbps, but it can rsync the same repo to vanilla at something like 160kbps | 22:51 |
* fungi wonders if we need to switch our isdn to dual-channel | 22:51 | |
fungi | btw, we're down to 8 uuids in vanilla which appeared in my initial list. i'm about to check whether we have any left in nodepool older than the initial list i made (other than the months-old ghost that is) | 22:53 |
jeblair | a wget of a large file saving to /dev/null on a vanilla node is reporting 70kbps | 22:53 |
jeblair | same file on mtl01 is 30mbps | 22:54 |
jeblair | same file on compute032.vanilla is also 70kbps | 22:55 |
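The throughput spot-check being described is no fancier than fetching a large file and discarding it; the URL is a placeholder:

```bash
# wget prints the average transfer rate; writing to /dev/null keeps local disk
# out of the measurement.
wget -O /dev/null http://mirror.example.org/large-file.iso
```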
fungi | looks like we still have a handful of nodes in vanilla running jobs since before i pulled the uuids from virsh, so odds are once these age out we're left with very few (if any) leaked from nova | 22:55 |
jeblair | so i think we're saturating our network link | 22:55 |
clarkb | jeblair: check chocolate too as it's the same networking I think | 22:55 |
clarkb | (would help rule out hardware problems as it is different base hardware on roughly the same networking) | 22:56 |
jeblair | clarkb: yeah, i checked a chocolate node and it's reporting about 180kbps | 22:56 |
jeblair | so, erm, twice as fast? :) | 22:56 |
jeblair | but it varies a lot, so could be approx the same | 22:57 |
fungi | so nova says the phantom instance is on compute012, but virsh list on that host doesn't include it | 22:59 |
clarkb | fungi: and you are using virsh list --all? | 22:59 |
clarkb | without --all you only see running instances | 22:59 |
clarkb | also check if /var/lib/nova/instances has a dir for it | 23:00 |
fungi | yup, --all --uuid | 23:00 |
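The checks under discussion amount to roughly this on the compute host (a sketch; the uuid is a placeholder):

    # list every libvirt domain, running or not, by uuid
    virsh list --all --uuid | grep <instance-uuid>

    # a leftover disk/config directory would also indicate a leaked instance
    ls /var/lib/nova/instances/<instance-uuid>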
*** markvoelker has joined #openstack-infra | 23:01 | |
fungi | there are in fact only two instances listed on compute012 and neither matches this uuid | 23:01 |
fungi | ubuntu-xenial-infracloud-vanilla-8895313 created 2017-05-19T10:07:18Z | 23:02 |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Reduce infra-cloud usage https://review.openstack.org/491949 | 23:03 |
*** markvoel_ has quit IRC | 23:03 | |
jeblair | clarkb, fungi: there's a shot in the dark reduction ^ | 23:04 |
jeblair | clarkb, fungi: do we have any information about the network there and what we should expect? | 23:04 |
clarkb | jeblair: after the flood I know that local networking went to 1gig instead of 10GbE in vanilla. But unsure of the internet connectivity | 23:04 |
clarkb | its also possible they are just throttling the hell out of us | 23:04 |
*** xarses_ has quit IRC | 23:05 | |
*** spzala has quit IRC | 23:05 | |
*** spzala has joined #openstack-infra | 23:05 | |
fungi | same, all i know is what i can read on https://docs.openstack.org/infra/system-config/infra-cloud.html | 23:05 |
*** spzala has quit IRC | 23:05 | |
*** spzala has joined #openstack-infra | 23:06 | |
*** marst has quit IRC | 23:06 | |
*** spzala has quit IRC | 23:06 | |
clarkb | rcarrillocruz: and cmurphy (possibly jesusaur) may know more | 23:06 |
*** jascott1 has quit IRC | 23:06 | |
*** rhallisey has quit IRC | 23:06 | |
*** spzala has joined #openstack-infra | 23:07 | |
*** spzala has quit IRC | 23:07 | |
*** spzala has joined #openstack-infra | 23:07 | |
*** pbourke has quit IRC | 23:07 | |
*** spzala has quit IRC | 23:07 | |
*** spzala has joined #openstack-infra | 23:08 | |
*** spzala has quit IRC | 23:08 | |
*** pbourke has joined #openstack-infra | 23:09 | |
jeblair | i feel like i'm missing something | 23:09 |
jeblair | we set max-servers to 96 on vanilla -- we have 45 compute hosts -- that's almost down to two vms per host | 23:10 |
fungi | we're now down to 4 uuids which were present in vanilla when i first checked, and two of those are known to nodepool, meaning we have two needing cleanup | 23:10 |
openstackgerrit | Isaku Yamahata proposed openstack-infra/project-config master: networking-odl: retire boron task https://review.openstack.org/491951 | 23:10 |
jeblair | cacti says the compute hosts average about 25mbit continuous inbound traffic | 23:10 |
jeblair | 2x150kbps != 25mbps | 23:11 |
jeblair | how is it anything other than way *undercommitted*? | 23:11 |
clarkb | fwiw its 35 compute hosts in vanilla that are operational | 23:11 |
jeblair | so nearly 3 nodes / host | 23:12 |
clarkb | ya 3 is a 1:1 cpu ratio | 23:12 |
jeblair | and in fact, compute032 is running 3 instances right now | 23:12 |
*** sflanigan has joined #openstack-infra | 23:13 | |
jeblair | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=5981&rra_id=all | 23:13 |
clarkb | the base data is likely rabbitmq traffic and glance image transfers which is all on the same layer 2 so we should be running at 1gig for that | 23:14 |
jeblair | that graph makes it look like we've been running flat out at 25mbps for nearly 24h | 23:14 |
jeblair | image transfers should be a spike, and i hope we're not doing 25mbps of rabbit | 23:15 |
clarkb | ya I think the spike to 100Mbps must be image transfers | 23:15 |
jeblair | sounds reasonable | 23:15 |
fungi | so interestingly, these two "leaked" instances in vanilla are known to nova and neither is actually leaked. one is the mirror and the other is pabelanger-test1... so other than the phantom instance that we can't delete because it doesn't actually exist, we have no discrepancies there | 23:15 |
clarkb | that also makes me worry that we have been given 100mbps connectivity there and not gigabit | 23:16 |
jeblair | clarkb: indeed | 23:16 |
clarkb | jeblair: I know that SpamapS and greghaynes found rabbit to be very chatty. It wouldn't surprise me if it was doing 25mbps but I also think it is insane to be doing that | 23:16 |
openstackgerrit | Isaku Yamahata proposed openstack-infra/project-config master: networking-odl: retire boron task https://review.openstack.org/491951 | 23:16 |
*** markvoelker has quit IRC | 23:18 | |
*** markvoelker has joined #openstack-infra | 23:18 | |
fungi | trying to see if i can tease interface speeds out of one of the controllers | 23:19 |
fungi | er, computes | 23:19 |
jeblair | clarkb: when i run iftop on compute032, i see a *lot* of connections between zuul/git.o.o and many different infracloud ips | 23:19 |
jeblair | clarkb: i would only expect to see connections to the 3 ips of the nodes running on compute032 | 23:19 |
*** pvaneck_ has quit IRC | 23:19 | |
*** rhallisey has joined #openstack-infra | 23:20 | |
jeblair | clarkb: eg: http://paste.openstack.org/show/617852/ | 23:20 |
clarkb | jeblair: as if we are on a hub not a switch | 23:20 |
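The iftop invocation behind that observation is roughly this (a sketch, assuming iftop is installed on the compute node):

    # numeric hosts, per-port detail, watching the uplink interface directly
    sudo iftop -n -P -i eth2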
fungi | on compute12 (i had to install the ethtool package) i see eth2 is the only physical interface with link detected and it claims to be operating at 10000baseT/Full | 23:21 |
jeblair | clarkb: ya | 23:21 |
*** aeng has quit IRC | 23:21 | |
fungi | which i find dubious | 23:21 |
clarkb | fungi: ya eth2 is our only link | 23:21 |
clarkb | it's why we have the weird bridge thing for neutron | 23:21 |
fungi | the 10gb link speed i find dubious i mean | 23:22 |
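For anyone repeating the link check, it is roughly this (a sketch; eth2 is the uplink on these hosts, and the ethtool package may need installing first as noted above):

    # read negotiated speed and link state from the nic
    sudo ethtool eth2 | grep -E 'Speed|Link detected'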
clarkb | but the weird bridge thing for neutron should be a proper switch | 23:22 |
clarkb | fungi: oh ya | 23:22 |
*** sdague has quit IRC | 23:22 | |
fungi | i guess we could check the bridge table in the kernel and make sure only local macs are showing up on the local interfaces? | 23:23 |
fungi | (to rule out bridge loops) | 23:23 |
*** markvoelker has quit IRC | 23:23 | |
* fungi plays around with brctl | 23:23 | |
mnaser | does all openstack infra testing happen on 8 core machines? | 23:24 |
fungi | br-vlan2551 and brq85ba3bb6-1f (on compute12) both seem to have a lot of macs showing on them | 23:25 |
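The bridge-table check is roughly this (a sketch; the bridge names match the ones seen on compute12 but vary per host):

    # learned macs per bridge; entries with "is local?" set to no are remote stations
    brctl showmacs br-vlan2551
    brctl showmacs brq85ba3bb6-1f | awk '$3 == "no"' | wc -l   # count the non-local macs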
clarkb | mnaser: it does now but hasn't always been the case (nor will it necessarily be the case in the future) | 23:25 |
persia | mnaser: That is the default request from infrastructure donors. | 23:25 |
*** Swami has quit IRC | 23:25 | |
fungi | mnaser: per https://docs.openstack.org/infra/manual/testing.html#known-differences-to-watch-out-for it can vary | 23:26 |
clarkb | fungi: the rough setup is eth2 - eth2.$vlan - br-$vlan - veth1 - veth2 iirc | 23:26 |
mnaser | so.. we're testing some new flavors with fully dedicated cores and i kinda wanted to throw in a small 10-15 nodes to see how it copes (and also help the gate a tiny bit if the setup is still there) | 23:26 |
mnaser | for example you'd get 2 cores + 8gb of memory, but 2 fully dedicated cores | 23:26 |
clarkb | fungi: the reason for that is we need an interface on $vlan for the hypervisor but need to put neutron on the same vlan without letting it manage the vlan (so neutron is all untagged, because if we let neutron manage it then it borks the hypervisor interface on the same vlan) | 23:26 |
clarkb | fungi: brq$stuff is the bridge neutron manages | 23:27 |
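Laying that chain out on a compute host looks roughly like this (a sketch; interface and bridge names follow the ones mentioned here but differ per deployment):

    # the tagged sub-interface hanging off the physical nic
    ip -d link show eth2.2551
    # the hand-built bridge and the neutron-managed one, with their member ports
    brctl show br-vlan2551
    brctl show brq85ba3bb6-1f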
fungi | i was about to ask, that was my first guess though | 23:27 |
clarkb | I think what we want to do is tcpdump eth2 and see if we are getting hub-like behavior (but that's roughly what iftop was doing for me) | 23:28 |
clarkb | because eth2 should be the raw ethernet connection and we should only see stuff destined to hosts behind it on the hypervisor | 23:28 |
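A quick way to look for that flooding is something like this (a sketch; the excluded address is a placeholder for the hypervisor's own IP):

    # after excluding the host's own traffic, anything left should belong to local VMs;
    # frames for other compute nodes' guests showing up here mean the switch is flooding
    sudo tcpdump -ni eth2 -c 200 not host <this-hypervisor-ip>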
fungi | clarkb: okay, so given that br-vlan2551 only shows a couple local macs and the rest (147 at the moment) are all showing nonlocal | 23:28 |
fungi | i'm guessing we don't see any sort of reflection to account for a storm | 23:29 |
clarkb | but could we be contending for access to the bus if we are plugged into a hub? | 23:30 |
clarkb | we shouldn't see 147 non local IPs I don't think | 23:30 |
clarkb | controller, upstream router, and VMs are all we should have right? | 23:30 |
clarkb | oh it could be hypervisor to hypervisor because of multinode | 23:30 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Don't pass self to a bound method https://review.openstack.org/491946 | 23:30 |
clarkb | so maybe this is ok | 23:30 |
*** hongbin has quit IRC | 23:31 | |
fungi | by "plugged into a hub" i assume you mean a switch which has given up trying to track macs in its bridge table. i can't imagine where they'd find an ethernet hub in this day and age | 23:31 |
jeblair | and the mirror | 23:31 |
clarkb | but possibly 100Mbps, not 1gig or 10gig | 23:31 |
clarkb | fungi: right that, cam table filled and you lose | 23:31 |
fungi | i can certainly imagine some scenarios where certain switches may run into issues if we cycle through random macs faster than they get aged out of the table and end up filling them up | 23:32 |
fungi | yes, that | 23:32 |
jeblair | 48/96 nodes are being used for multinode jobs | 23:32 |
clarkb | jeblair: what is a transfer between hypervisors like | 23:33 |
clarkb | that should be our best case transfer | 23:33 |
jeblair | will check | 23:33 |
fungi | compute017 is the one hosting the mirror, if that helps to compare against | 23:33 |
fungi | in vanilla | 23:34 |
fungi | and wow does it show signs of packet loss! | 23:34 |
fungi | http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=3&leaf_id=422 | 23:34 |
fungi | very high error count on eth2 as well | 23:35 |
fungi | most of its eth2 traffic is from eth2.2551 | 23:36 |
jeblair | clarkb: 160mbps | 23:36 |
fungi | i can see on the graph where the mirror vm got rebuilt since it seems like that's probably when it landed on this compute node | 23:37 |
jeblair | that's reading from disk | 23:38 |
clarkb | fungi: ya eth2.2551 is where we tag the vlan for all traffic outbound | 23:39 |
clarkb | fungi: so that includes all the hypervisor communication and all the VM communication | 23:39 |
jeblair | clarkb: sorry, that was 160MB/s, so 1280mbps | 23:39 |
fungi | picking another compute node at random, also seeing gaps in snmp responses and pretty high error rate on eth2 | 23:40 |
clarkb | jeblair: cool so we likely aren't having global issues. The best case comes out pretty well | 23:40 |
fungi | so i do agree it's more likely we're saturating the uplink rather than intra-cloud links | 23:40 |
clarkb | it's possible that linux is hating us for the chained bridges (yay software switches), or upstream devices have trouble with VM macs changing frequently, or the problems are largely on the path to the internet? | 23:40 |
clarkb | fungi: ya | 23:41 |
fungi | baremetal00 shows similar packet loss and errors on eth2 | 23:41 |
fungi | hrm | 23:42 |
fungi | though it's seeing a solid 20mbps inbound on eth2.2551 | 23:42 |
fungi | it shouldn't really be consuming anything, right? | 23:42 |
clarkb | fungi: what does iftop show it talking to or just netstat? | 23:43 |
fungi | that one probably makes a good control group if we're looking for signs of a storm | 23:43 |
clarkb | bifrost/ironic will do heartbeats to the nodes | 23:43 |
clarkb | but ya 20mbps for heartbeats seems really high | 23:43 |
clarkb | and yes it should be good control group | 23:43 |
*** mattmceuen has quit IRC | 23:45 | |
*** soliosg has quit IRC | 23:46 | |
fungi | this doesn't look good. didn't even have to go that far | 23:46 |
clarkb | that's interesting, I see nb03 and nb04 comms | 23:46 |
clarkb | to controller00 from compute021 | 23:46 |
fungi | yeah, a tcpdump on eth2.2551 showed me a centos7 test node talking to git.o.o | 23:47 |
fungi | that should _never_ make it to baremetal00 | 23:47 |
clarkb | sudo tcpdump -i any host 199.19.215.9 shows nb03 to controller00 | 23:48 |
clarkb | so ya | 23:48 |
fungi | so definitely looks like the switch layer in that rack at least is falling back to flooding behavior | 23:48 |
clarkb | ya | 23:48 |
fungi | we can e-mail hpe and ask them to power-cycle the switches i guess, though as i understand it we're not the only machines plugged into them | 23:49 |
clarkb | maybe have them check the theory at least | 23:49 |
clarkb | I wonder what our router is though | 23:50 |
fungi | 15.184.64.1 | 23:50 |
fungi | that's probably not what you meant | 23:50 |
clarkb | well, sort of; I know when tripleo was using this setup they used linux as a router | 23:51 |
clarkb | but that doesn't appear to be one of ours | 23:51 |
clarkb | so I think this is entirely upstream of us | 23:51 |
fungi | big surprise, the oui of the router's mac (bceafa) is assigned to... [drumroll] | 23:52 |
fungi | Hewlett Packard | 23:52 |
fungi | so could be linux on a "newer" proliant (post-compaq) or something i guess | 23:53 |
jeblair | so likely two things: 1) switch acting as hub -- annoying, taxes each compute node with an extra 20mbps it has to ignore, but probably not killing performance. 2) upstream bandwidth limit. | 23:54 |
jeblair | that sound right? | 23:54 |
*** gildub has joined #openstack-infra | 23:55 | |
fungi | yeah, that's the best i can piece together | 23:55 |
jeblair | (interestingly, i wonder if the 20-25mbps we're seeing on all the nodes because of the switch behavior clues us into our upstream bandwidth? 25mbps/96=260kbps which is not too far off from the 160kbps we measured earlier) | 23:56 |
clarkb | also I bet image uploads tank that bw | 23:57 |
fungi | probably so | 23:57 |
fungi | we _could_ test the theory by dialing down max-servers to 0 in both clouds and then doing some bulk transfers to/from baremetal00 or the mirror or something | 23:57 |
fungi | if we really wanted a more accurate picture | 23:58 |
fungi | also possible we're not being throttled, but are sharing an uplink from that pod with some much more network-consuming neighbors hogging the available bandwidth | 23:58 |
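If we do quiesce the clouds, a crude end-to-end probe could be as simple as this (a sketch assuming iperf3 is installed on both ends; the far-end hostname is a placeholder):

    # on the outside host
    iperf3 -s
    # from the mirror or baremetal00, measure sustained throughput for 30 seconds
    iperf3 -c far-end.example.org -t 30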
*** slagle has quit IRC | 23:59 |