*** thorst has joined #openstack-infra | 00:00 | |
fungi | ianw: i suppose i could work an option for longer-term entries into http://git.openstack.org/cgit/openstack-infra/puppet-exim/tree/templates/aliases.erb but likely we actually need to do something more manageable if this persists much longer (like an actual spam identification system) | 00:00 |
fungi | i've resisted the pressure to do that so far, but things have never been anywhere near this bad until the past few weeks | 00:01 |
*** jamesmcarthur has quit IRC | 00:02 | |
*** thorst has quit IRC | 00:02 | |
*** slaweq has quit IRC | 00:02 | |
pabelanger | dmsimard: Ah, yes. It was limited to the CR repo for 7.4 | 00:03 |
fungi | okay, mysqldump just finished, gerrit restarting now | 00:05 |
*** thorst has joined #openstack-infra | 00:05 | |
fungi | gerrit webui seems to be working again | 00:06 |
fungi | #status log Gerrit on review.openstack.org restarted just now, and is no longer using contact store functionality or configuration options | 00:07 |
openstackstatus | fungi: finished logging | 00:07 |
fungi | i'll get a notice out to the infra ml tomorrow about https://review.openstack.org/491090 | 00:09 |
fungi | other than that, i think the gerrit-contactstore-removal spec is done | 00:09 |
*** jamesmcarthur has joined #openstack-infra | 00:12 | |
*** jkilpatr has quit IRC | 00:13 | |
*** dingyichen has joined #openstack-infra | 00:17 | |
*** jamesmcarthur has quit IRC | 00:17 | |
*** gmann has quit IRC | 00:18 | |
*** gmann has joined #openstack-infra | 00:18 | |
*** slaweq has joined #openstack-infra | 00:19 | |
*** thorst has quit IRC | 00:23 | |
*** slaweq has quit IRC | 00:23 | |
*** thorst has joined #openstack-infra | 00:23 | |
*** harlowja has quit IRC | 00:25 | |
openstackgerrit | Merged openstack/diskimage-builder master: Bump fedora/fedora-minimal DIB_RELEASE 26 https://review.openstack.org/482570 | 00:26 |
*** thorst has quit IRC | 00:27 | |
*** slaweq has joined #openstack-infra | 00:29 | |
*** claudiub has quit IRC | 00:34 | |
*** slaweq has quit IRC | 00:36 | |
*** armax has quit IRC | 00:37 | |
*** thorst has joined #openstack-infra | 00:38 | |
pabelanger | ianw: clarkb: thanks, elastic-recheck seems to be detecting tripleo failures now | 00:39 |
*** slaweq has joined #openstack-infra | 00:41 | |
*** Apoorva_ has joined #openstack-infra | 00:42 | |
*** bobh has joined #openstack-infra | 00:45 | |
*** Apoorva has quit IRC | 00:45 | |
*** slaweq has quit IRC | 00:46 | |
*** LindaWang has joined #openstack-infra | 00:46 | |
*** Apoorva_ has quit IRC | 00:47 | |
*** armax has joined #openstack-infra | 00:48 | |
*** liujiong has joined #openstack-infra | 00:48 | |
*** slaweq has joined #openstack-infra | 00:51 | |
*** markvoelker has joined #openstack-infra | 00:55 | |
*** slaweq has quit IRC | 00:56 | |
ianw | clarkb: any thoughts on http://logs.openstack.org/78/480778/2/check/gate-tempest-dsvm-neutron-full-centos-7-nv/8d9e9cc/logs/screen-n-cpu.txt.gz#_Aug_01_14_20_47_808336 | 00:56 |
ianw | unfortunately (?) your name comes up when looking for proxy errors in devstack logs :) | 00:57 |
*** markvoelker_ has joined #openstack-infra | 00:57 | |
ianw | it might be a red herring though, maybe it's a real neutron issue that bubbles up to nova like this ... | 00:57 |
*** markvoelker has quit IRC | 01:01 | |
*** slaweq has joined #openstack-infra | 01:01 | |
*** gouthamr has quit IRC | 01:02 | |
*** armax has quit IRC | 01:07 | |
*** slaweq has quit IRC | 01:07 | |
*** tuanluong has joined #openstack-infra | 01:10 | |
*** shu-mutou-AWAY is now known as shu-mutou | 01:10 | |
*** zhurong has joined #openstack-infra | 01:11 | |
*** pahuang has quit IRC | 01:18 | |
*** slaweq has joined #openstack-infra | 01:23 | |
*** thorst has quit IRC | 01:24 | |
*** slaweq has quit IRC | 01:28 | |
*** rwsu has quit IRC | 01:32 | |
ianw | 23.253.166.156 - - [01/Aug/2017:14:20:39 +0000] "GET /identity/v3/auth/tokens HTTP/1.1" 200 3714 | 01:32 |
ianw | 23.253.166.156 - - [01/Aug/2017:14:19:47 +0000] "GET /v2.0/auto-allocated-topology/f6806985392e4ece8ac13fb6784131b6 HTTP/1.1" 200 174 | 01:32 |
ianw | 23.253.166.156 - - [01/Aug/2017:14:20:39 +0000] "GET /identity/v3/auth/tokens HTTP/1.1" 200 3586 | 01:32 |
ianw | nothing good ever happens when time goes backwards | 01:32 |
*** slaweq has joined #openstack-infra | 01:34 | |
*** pahuang has joined #openstack-infra | 01:35 | |
*** dougwig has quit IRC | 01:36 | |
*** cuongnv has joined #openstack-infra | 01:37 | |
*** slaweq has quit IRC | 01:40 | |
*** rwsu has joined #openstack-infra | 01:44 | |
*** camunoz has quit IRC | 01:48 | |
*** pahuang has quit IRC | 01:54 | |
*** slaweq has joined #openstack-infra | 01:56 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 02:00 |
*** bobh has quit IRC | 02:00 | |
*** slaweq has quit IRC | 02:00 | |
*** ramishra has quit IRC | 02:03 | |
*** iyamahat has quit IRC | 02:06 | |
*** slaweq has joined #openstack-infra | 02:06 | |
*** yamahata has quit IRC | 02:07 | |
*** pahuang has joined #openstack-infra | 02:07 | |
*** jamesmcarthur has joined #openstack-infra | 02:13 | |
*** slaweq has quit IRC | 02:13 | |
*** gildub has joined #openstack-infra | 02:14 | |
*** dhill_ has quit IRC | 02:15 | |
*** dhill_ has joined #openstack-infra | 02:15 | |
*** Marx314 has quit IRC | 02:16 | |
*** mtreinish has quit IRC | 02:17 | |
*** fbouliane has quit IRC | 02:17 | |
*** gtmanfred has quit IRC | 02:17 | |
*** rbergeron has quit IRC | 02:18 | |
*** lifeless has quit IRC | 02:18 | |
*** tnarg has quit IRC | 02:18 | |
*** rodrigods has quit IRC | 02:19 | |
*** rbergeron has joined #openstack-infra | 02:19 | |
*** lifeless has joined #openstack-infra | 02:19 | |
*** mtreinish has joined #openstack-infra | 02:22 | |
*** gtmanfred has joined #openstack-infra | 02:23 | |
*** rodrigods has joined #openstack-infra | 02:23 | |
*** fbouliane has joined #openstack-infra | 02:23 | |
*** gcb has joined #openstack-infra | 02:29 | |
*** ramishra has joined #openstack-infra | 02:34 | |
*** sree has joined #openstack-infra | 02:34 | |
*** bobh has joined #openstack-infra | 02:35 | |
*** sree has quit IRC | 02:39 | |
*** jamesmcarthur has quit IRC | 02:40 | |
*** slaweq has joined #openstack-infra | 02:40 | |
*** armax has joined #openstack-infra | 02:41 | |
*** armax has quit IRC | 02:44 | |
*** slaweq has quit IRC | 02:45 | |
*** yamamoto_ has joined #openstack-infra | 02:46 | |
*** yamamoto has quit IRC | 02:46 | |
*** jamesmcarthur has joined #openstack-infra | 02:48 | |
*** hongbin_ has joined #openstack-infra | 02:49 | |
*** hongbin has quit IRC | 02:49 | |
*** hongbin_ has quit IRC | 02:49 | |
*** hongbin has joined #openstack-infra | 02:49 | |
*** tnovacik has joined #openstack-infra | 02:50 | |
*** slaweq has joined #openstack-infra | 02:51 | |
openstackgerrit | jimmygc proposed openstack/diskimage-builder master: Fix ubuntu minimal build failure https://review.openstack.org/491653 | 02:55 |
*** slaweq has quit IRC | 02:56 | |
*** jamesmcarthur has quit IRC | 02:56 | |
*** slaweq has joined #openstack-infra | 03:01 | |
*** slaweq has quit IRC | 03:05 | |
*** ramineni has joined #openstack-infra | 03:06 | |
*** ramineni has left #openstack-infra | 03:07 | |
*** slaweq has joined #openstack-infra | 03:11 | |
*** spzala has quit IRC | 03:16 | |
*** david-lyle has quit IRC | 03:16 | |
*** slaweq has quit IRC | 03:18 | |
*** tnovacik has quit IRC | 03:23 | |
*** david-lyle has joined #openstack-infra | 03:23 | |
*** jascott1_ has quit IRC | 03:24 | |
*** nicolasbock has joined #openstack-infra | 03:25 | |
*** jascott1 has joined #openstack-infra | 03:25 | |
*** jascott1 has quit IRC | 03:26 | |
*** jascott1 has joined #openstack-infra | 03:27 | |
*** slaweq has joined #openstack-infra | 03:33 | |
*** slaweq has quit IRC | 03:38 | |
*** bobh has quit IRC | 03:38 | |
*** nicolasbock has quit IRC | 03:39 | |
*** slaweq has joined #openstack-infra | 03:43 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Update the zuul-sphinx extension config https://review.openstack.org/491134 | 03:44 |
*** baoli has quit IRC | 03:44 | |
*** Dinesh_Bhor has joined #openstack-infra | 03:45 | |
*** dave-mccowan has quit IRC | 03:47 | |
*** slaweq has quit IRC | 03:49 | |
*** nicolasbock has joined #openstack-infra | 03:50 | |
*** links has joined #openstack-infra | 03:53 | |
*** hongbin has quit IRC | 03:56 | |
*** esberglu has quit IRC | 03:59 | |
*** EricGonczer_ has joined #openstack-infra | 04:02 | |
*** ykarel has joined #openstack-infra | 04:02 | |
*** EricGonczer_ has quit IRC | 04:20 | |
*** adisky__ has joined #openstack-infra | 04:21 | |
*** thorst has joined #openstack-infra | 04:25 | |
*** thorst has quit IRC | 04:30 | |
*** harlowja has joined #openstack-infra | 04:35 | |
*** spzala has joined #openstack-infra | 04:47 | |
*** esberglu has joined #openstack-infra | 04:49 | |
*** spzala has quit IRC | 04:51 | |
*** pahuang has quit IRC | 04:52 | |
*** esberglu has quit IRC | 04:53 | |
*** jamesmcarthur has joined #openstack-infra | 04:57 | |
*** sflanigan has quit IRC | 04:59 | |
*** slaweq has joined #openstack-infra | 05:00 | |
*** claudiub has joined #openstack-infra | 05:01 | |
*** hareesh has joined #openstack-infra | 05:01 | |
*** jamesmcarthur has quit IRC | 05:02 | |
*** pahuang has joined #openstack-infra | 05:05 | |
*** slaweq has quit IRC | 05:05 | |
*** eranrom has quit IRC | 05:08 | |
*** slaweq has joined #openstack-infra | 05:10 | |
*** nicolasbock has quit IRC | 05:11 | |
*** harlowja has quit IRC | 05:14 | |
*** slaweq has quit IRC | 05:15 | |
*** waynr has joined #openstack-infra | 05:19 | |
*** waynr has left #openstack-infra | 05:20 | |
*** slaweq has joined #openstack-infra | 05:20 | |
*** slaweq has quit IRC | 05:27 | |
*** psachin has joined #openstack-infra | 05:33 | |
*** yamahata has joined #openstack-infra | 05:41 | |
openstackgerrit | Akihiro Motoki proposed openstack-infra/project-config master: Add release permission for neutron-vpnaas and dashboard https://review.openstack.org/491670 | 05:43 |
*** sree has joined #openstack-infra | 05:49 | |
*** nicolasbock has joined #openstack-infra | 05:53 | |
*** cshastri has joined #openstack-infra | 05:53 | |
*** thorst has joined #openstack-infra | 05:58 | |
*** markus_z has joined #openstack-infra | 05:58 | |
*** sflanigan has joined #openstack-infra | 06:03 | |
*** thorst has quit IRC | 06:03 | |
*** bhavik1 has joined #openstack-infra | 06:05 | |
*** pgadiya has joined #openstack-infra | 06:18 | |
*** rcernin has joined #openstack-infra | 06:21 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Fix detail headers order for nodepool list https://review.openstack.org/491678 | 06:25 |
*** coolsvap has joined #openstack-infra | 06:26 | |
*** kjackal_ has joined #openstack-infra | 06:28 | |
*** bhavik1 has quit IRC | 06:53 | |
*** stevebaker has quit IRC | 06:54 | |
*** slaweq has joined #openstack-infra | 06:58 | |
*** zhurong has quit IRC | 06:59 | |
*** pcaruana has joined #openstack-infra | 07:00 | |
*** markvoelker_ has quit IRC | 07:01 | |
*** stevebaker has joined #openstack-infra | 07:02 | |
*** spzala has joined #openstack-infra | 07:04 | |
*** slaweq has quit IRC | 07:04 | |
*** jascott1 has quit IRC | 07:05 | |
*** jascott1 has joined #openstack-infra | 07:05 | |
*** markvoelker has joined #openstack-infra | 07:07 | |
*** markvoelker has quit IRC | 07:08 | |
*** markvoelker has joined #openstack-infra | 07:08 | |
*** spzala has quit IRC | 07:09 | |
*** aarefiev has joined #openstack-infra | 07:10 | |
*** jascott1 has quit IRC | 07:10 | |
*** gtrxcb has quit IRC | 07:11 | |
*** florianf has joined #openstack-infra | 07:15 | |
*** aviau has quit IRC | 07:19 | |
*** aviau has joined #openstack-infra | 07:19 | |
*** tesseract has joined #openstack-infra | 07:21 | |
*** ralonsoh has joined #openstack-infra | 07:22 | |
*** Swami has quit IRC | 07:27 | |
*** slaweq has joined #openstack-infra | 07:30 | |
*** Douhet has quit IRC | 07:31 | |
*** Douhet has joined #openstack-infra | 07:32 | |
*** slaweq has quit IRC | 07:36 | |
*** ccamacho has joined #openstack-infra | 07:38 | |
*** yamamoto_ has quit IRC | 07:44 | |
*** sflanigan has quit IRC | 07:48 | |
*** yamamoto has joined #openstack-infra | 07:50 | |
*** alexchadin has joined #openstack-infra | 07:55 | |
*** e0ne has joined #openstack-infra | 07:56 | |
*** ralonsoh_ has joined #openstack-infra | 07:57 | |
*** ralonsoh has quit IRC | 07:57 | |
*** rtjure has quit IRC | 07:58 | |
*** thorst has joined #openstack-infra | 07:59 | |
*** ralonsoh_ is now known as ralonsoh | 08:02 | |
*** arturb has quit IRC | 08:02 | |
*** thorst has quit IRC | 08:04 | |
*** shardy has joined #openstack-infra | 08:06 | |
*** seanhandley has left #openstack-infra | 08:07 | |
*** gildub has quit IRC | 08:09 | |
*** priteau has joined #openstack-infra | 08:09 | |
*** mwarad has joined #openstack-infra | 08:15 | |
*** _mwarad_ has joined #openstack-infra | 08:15 | |
*** _mwarad_ has quit IRC | 08:15 | |
*** derekh has joined #openstack-infra | 08:20 | |
*** dizquierdo has joined #openstack-infra | 08:20 | |
*** slaweq has joined #openstack-infra | 08:25 | |
*** dingyichen has quit IRC | 08:25 | |
*** lucas-afk is now known as lucasagomes | 08:26 | |
openstackgerrit | Merged openstack-infra/project-config master: Make neutron functional job non-voting https://review.openstack.org/491548 | 08:27 |
*** esberglu has joined #openstack-infra | 08:28 | |
bauzas | mmm, can't we now provide HTTP links in a gerrit comment ? | 08:29 |
*** slaweq has quit IRC | 08:29 | |
*** esberglu has quit IRC | 08:32 | |
*** slaweq has joined #openstack-infra | 08:35 | |
*** slaweq has quit IRC | 08:41 | |
dimak | hey | 08:41 |
strigazi | ianw yt? | 08:41 |
dimak | I have an error with Babel from openstack mirror | 08:42 |
dimak | http://logs.openstack.org/00/489000/2/gate/gate-dragonflow-python35/44a33cb/console.html#_2017-08-08_07_25_10_200073 | 08:42 |
dimak | Anyone noticed this? | 08:42 |
*** electrofelix has joined #openstack-infra | 08:46 | |
*** rtjure has joined #openstack-infra | 08:46 | |
*** yamamoto has quit IRC | 08:49 | |
openstackgerrit | Spyros Trigazis (strigazi) proposed openstack-infra/system-config master: [magnum] Cache fedorapeople.org https://review.openstack.org/491466 | 08:49 |
openstackgerrit | Spyros Trigazis (strigazi) proposed openstack-infra/system-config master: [magnum] Cache fedorapeople.org https://review.openstack.org/491466 | 08:50 |
*** mwarad has quit IRC | 08:59 | |
openstackgerrit | Spyros Trigazis (strigazi) proposed openstack-infra/project-config master: [magnum] Cache fedorapeople.org https://review.openstack.org/491724 | 08:59 |
*** alexchadin has quit IRC | 09:03 | |
*** spzala has joined #openstack-infra | 09:05 | |
*** ykarel is now known as ykarel|lunch | 09:08 | |
*** stakeda has quit IRC | 09:09 | |
*** spzala has quit IRC | 09:10 | |
*** alexchadin has joined #openstack-infra | 09:15 | |
*** nicolasbock has quit IRC | 09:15 | |
*** yamamoto has joined #openstack-infra | 09:15 | |
*** slaweq has joined #openstack-infra | 09:19 | |
*** sambetts|afk is now known as sambetts | 09:20 | |
*** yamamoto has quit IRC | 09:22 | |
*** slaweq has quit IRC | 09:25 | |
*** pgadiya has quit IRC | 09:26 | |
*** tosky has joined #openstack-infra | 09:29 | |
*** pgadiya has joined #openstack-infra | 09:29 | |
*** slaweq has joined #openstack-infra | 09:29 | |
ianw | strigazi: for a bit | 09:33 |
strigazi | ianw https://review.openstack.org/#/q/topic:cache-fedorapeople-magnum | 09:33 |
*** slaweq has quit IRC | 09:34 | |
ianw | strigazi: ok cool, get pabelanger to take a look too but LGTM | 09:35 |
strigazi | ianw he is in canada? | 09:38 |
ianw | usually :) | 09:39 |
*** slaweq has joined #openstack-infra | 09:39 | |
strigazi | ianw yes he is, you in AU afaik and me in Switzerland, very convenient setup :) | 09:40 |
*** shardy has quit IRC | 09:43 | |
*** nicolasbock has joined #openstack-infra | 09:43 | |
*** slaweq has quit IRC | 09:46 | |
*** kornicameister has quit IRC | 09:47 | |
*** cuongnv has quit IRC | 09:52 | |
*** yamamoto has joined #openstack-infra | 09:54 | |
*** shu-mutou is now known as shu-mutou-AWAY | 09:54 | |
*** shardy has joined #openstack-infra | 09:56 | |
*** jamesmcarthur has joined #openstack-infra | 09:57 | |
*** alexchadin has quit IRC | 09:59 | |
*** thorst has joined #openstack-infra | 10:00 | |
*** kornicameister has joined #openstack-infra | 10:00 | |
*** slaweq has joined #openstack-infra | 10:02 | |
*** jamesmcarthur has quit IRC | 10:02 | |
*** thorst has quit IRC | 10:05 | |
*** yamamoto has quit IRC | 10:07 | |
*** pgadiya has quit IRC | 10:07 | |
*** slaweq has quit IRC | 10:08 | |
*** liujiong has quit IRC | 10:09 | |
*** dtantsur|afk is now known as dtantsur | 10:09 | |
*** yamamoto has joined #openstack-infra | 10:12 | |
*** slaweq has joined #openstack-infra | 10:12 | |
*** igormarnat has quit IRC | 10:16 | |
*** ruhe has quit IRC | 10:16 | |
*** tnarg has joined #openstack-infra | 10:17 | |
*** markvoelker has quit IRC | 10:17 | |
*** yamamoto has quit IRC | 10:17 | |
*** igormarnat has joined #openstack-infra | 10:17 | |
*** Odd_Bloke has quit IRC | 10:17 | |
*** abelur has quit IRC | 10:17 | |
*** esberglu has joined #openstack-infra | 10:18 | |
*** yamamoto has joined #openstack-infra | 10:18 | |
*** ruhe has joined #openstack-infra | 10:18 | |
*** abelur has joined #openstack-infra | 10:18 | |
*** hareesh has quit IRC | 10:18 | |
*** odyssey4me has quit IRC | 10:18 | |
*** abelur_ has quit IRC | 10:18 | |
*** Odd_Bloke has joined #openstack-infra | 10:19 | |
*** slaweq has quit IRC | 10:19 | |
*** hareesh has joined #openstack-infra | 10:19 | |
*** pgadiya has joined #openstack-infra | 10:19 | |
*** odyssey4me has joined #openstack-infra | 10:19 | |
*** yamamoto has quit IRC | 10:20 | |
*** yamamoto has joined #openstack-infra | 10:20 | |
*** esberglu has quit IRC | 10:21 | |
*** zhurong has joined #openstack-infra | 10:24 | |
*** AJaeger is now known as AJaeger_ | 10:26 | |
*** pgadiya has quit IRC | 10:28 | |
*** tojuvone has joined #openstack-infra | 10:34 | |
*** tojuvone has left #openstack-infra | 10:35 | |
*** katkapilatova has joined #openstack-infra | 10:36 | |
openstackgerrit | Merged openstack-infra/project-config master: Make grenade-linuxbridge-multinode job experimental https://review.openstack.org/490993 | 10:38 |
openstackgerrit | Mark Korondi proposed openstack-infra/project-config master: Bringing upstream training virtual environment over here https://review.openstack.org/490202 | 10:40 |
openstackgerrit | Merged openstack-infra/project-config master: [Kuryr] Turn python3 job to voting https://review.openstack.org/491627 | 10:40 |
*** ykarel|lunch is now known as ykarel | 10:41 | |
*** pgadiya has joined #openstack-infra | 10:41 | |
*** thorst has joined #openstack-infra | 10:42 | |
openstackgerrit | Merged openstack-infra/project-config master: [Fuxi] Turn python3 job to voting https://review.openstack.org/491628 | 10:44 |
openstackgerrit | Merged openstack-infra/project-config master: [Zun] Make python3 dsvm job as voting https://review.openstack.org/491623 | 10:44 |
openstackgerrit | Mark Korondi proposed openstack-infra/project-config master: Bringing upstream training virtual environment over here https://review.openstack.org/490202 | 10:45 |
*** igormarnat has quit IRC | 10:48 | |
*** igormarnat has joined #openstack-infra | 10:48 | |
openstackgerrit | Merged openstack-infra/project-config master: [Zun] Move multinode job to experimental https://review.openstack.org/491624 | 10:50 |
openstackgerrit | Merged openstack-infra/project-config master: Reduce yum-config-manager output https://review.openstack.org/491076 | 10:50 |
openstackgerrit | Merged openstack-infra/project-config master: Upgrade the ARA fedora jobs to fedora 26 https://review.openstack.org/491633 | 10:51 |
*** thorst has quit IRC | 10:54 | |
*** thorst has joined #openstack-infra | 10:54 | |
*** jkilpatr has joined #openstack-infra | 10:58 | |
*** yamamoto has quit IRC | 10:58 | |
*** lrossetti_ has joined #openstack-infra | 10:58 | |
*** thorst has quit IRC | 10:59 | |
*** lrossetti has quit IRC | 10:59 | |
*** slaweq has joined #openstack-infra | 10:59 | |
*** yamamoto has joined #openstack-infra | 10:59 | |
*** sdague has joined #openstack-infra | 10:59 | |
*** yamamoto has quit IRC | 11:05 | |
*** slaweq_ has joined #openstack-infra | 11:07 | |
*** spzala has joined #openstack-infra | 11:07 | |
*** jascott1 has joined #openstack-infra | 11:07 | |
*** yamamoto has joined #openstack-infra | 11:07 | |
*** yamamoto has quit IRC | 11:08 | |
*** yamamoto has joined #openstack-infra | 11:10 | |
*** sree has quit IRC | 11:10 | |
*** jascott1 has quit IRC | 11:12 | |
*** spzala has quit IRC | 11:12 | |
*** slaweq_ has quit IRC | 11:12 | |
*** yamamoto has quit IRC | 11:13 | |
*** yamamoto has joined #openstack-infra | 11:15 | |
*** huanxie has quit IRC | 11:15 | |
*** yamamoto has quit IRC | 11:16 | |
*** yamamoto has joined #openstack-infra | 11:16 | |
*** slaweq_ has joined #openstack-infra | 11:17 | |
*** alexchadin has joined #openstack-infra | 11:19 | |
*** slaweq_ has quit IRC | 11:23 | |
*** gildub has joined #openstack-infra | 11:24 | |
*** EricGonczer_ has joined #openstack-infra | 11:33 | |
*** gordc has joined #openstack-infra | 11:37 | |
*** EricGonczer_ has quit IRC | 11:38 | |
*** EricGonczer_ has joined #openstack-infra | 11:39 | |
*** dave-mccowan has joined #openstack-infra | 11:40 | |
*** ldnunes has joined #openstack-infra | 11:41 | |
*** lucasagomes is now known as lucas-hungry | 11:46 | |
*** abelur_ has joined #openstack-infra | 11:50 | |
*** thorst has joined #openstack-infra | 11:51 | |
*** slaweq_ has joined #openstack-infra | 11:51 | |
*** slaweq_ has quit IRC | 11:56 | |
*** psachin has quit IRC | 11:58 | |
*** jrist has joined #openstack-infra | 11:58 | |
openstackgerrit | Tobias Rydberg proposed openstack-infra/irc-meetings master: Changed to correct chairs for the publiccloud_wg https://review.openstack.org/491769 | 12:00 |
*** psachin has joined #openstack-infra | 12:00 | |
pabelanger | looks like we are hitting quota issues in citycloud-lon1 | 12:01 |
pabelanger | OpenStackCloudHTTPError: (403) Client Error for url: https://lon1.citycloud.com:8774/v2/bed89257500340af8d0fbe7141b1bfd6/servers Quota exceeded for cores, instances: Requested 8, 1, but already used 400, 50 of 400, 50 cores, instances | 12:01 |
pabelanger | also, that error message is super confusing | 12:01 |
*** slaweq_ has joined #openstack-infra | 12:01 | |
*** jpena|off is now known as jpena | 12:03 | |
*** esberglu has joined #openstack-infra | 12:04 | |
*** trown|outtypewww is now known as trown | 12:05 | |
*** rlandy has joined #openstack-infra | 12:06 | |
*** slaweq_ has quit IRC | 12:06 | |
*** tuanluong has quit IRC | 12:07 | |
*** hareesh has quit IRC | 12:08 | |
*** esberglu has quit IRC | 12:09 | |
*** slaweq_ has joined #openstack-infra | 12:12 | |
*** slaweq_ has quit IRC | 12:16 | |
*** yamamoto has quit IRC | 12:18 | |
pabelanger | clarkb: any idea why we'd see this warning http://logs.openstack.org/49/491749/1/check/gate-tripleo-ci-centos-7-undercloud-oooq/5edaa28/console.html#_2017-08-08_11_03_02_676400 | 12:19 |
pabelanger | clarkb: I mean, I know why it is there but how should I go about fixing it | 12:20 |
*** yamamoto has joined #openstack-infra | 12:20 | |
*** yamamoto has quit IRC | 12:20 | |
*** slaweq_ has joined #openstack-infra | 12:22 | |
mnaser | https://review.openstack.org/#/c/491466/ can someone give this a bit of love by any chance | 12:23 |
mnaser | most magnum jobs are timing out due to this | 12:23 |
mnaser | so hopefully if we can get some caching in, it'll become significantly less | 12:23 |
pabelanger | mnaser: strigazi: which images are specifically needed? | 12:24 |
mnaser | pabelanger right now the one that keeps timing out in master https://fedorapeople.org/groups/magnum/fedora-atomic-latest.qcow2 | 12:24 |
mnaser | it downloads at ~30Kb/s so it just times out | 12:25 |
mnaser | http://logs.openstack.org/11/488511/4/check/gate-functional-dsvm-magnum-api-ubuntu-xenial/25369a5/logs/devstacklog.txt.gz < warning, big log file, but you can see it there | 12:25 |
pabelanger | mnaser: right, what is the difference between that and atomic images shipped by fedora? | 12:25 |
*** jcoufal has joined #openstack-infra | 12:25 | |
pabelanger | mnaser: for example, http://mirror.regionone.infracloud-vanilla.openstack.org/fedora/releases/26/CloudImages/x86_64/images/ | 12:26 |
*** Goneri has joined #openstack-infra | 12:26 | |
mnaser | pabelanger good question, i'll defer to strigazi for that. however, as a deployer, I use the atomic images shipped by fedora and they work | 12:27 |
mnaser | however we're testing/running against fedora 25 right now | 12:27 |
mnaser | and i dont see that in the mirrors for some reason | 12:27 |
mnaser | http://mirror.regionone.infracloud-vanilla.openstack.org/fedora/releases/25/CloudImages/x86_64/images/ | 12:27 |
*** slaweq_ has quit IRC | 12:28 | |
*** rwsu has quit IRC | 12:28 | |
*** rwsu has joined #openstack-infra | 12:29 | |
mnaser | http://mirror.math.princeton.edu/pub/alt/atomic/stable/ | 12:29 |
mnaser | okay, thats a specific mirror but that seems to be where they are stored, /pub/alt/atomic/ .. dont think we already cache that? | 12:29 |
pabelanger | ya, looking. Fedora-26 seems to ship them now | 12:30 |
pabelanger | trying to see where fedora-25 is | 12:30 |
*** zhurong has quit IRC | 12:30 | |
*** ralonsoh has quit IRC | 12:31 | |
*** ralonsoh has joined #openstack-infra | 12:32 | |
strigazi | pabelanger we need fedora-atomic-latest which is a symlink to Fedora-Atomic-25-20170719.qcow2, fedora-kubernetes-ironic-latest.tar.gz -> fedora-25-kubernetes-ironic-20170620.tar.gz, and ubuntu-mesos-latest.qcow2 -> ubuntu-14.04.3-mesos-0.25.0.qcow2 | 12:32 |
strigazi | pabelanger mnaser the images are stock images | 12:33 |
pabelanger | right, so lets see if we can just mirror them directly from source | 12:34 |
pabelanger | ATM, fedora-26 atomic we get for free | 12:34 |
strigazi | pabelanger we use fedorapeople so we can use a symlink when we update the image and not add commits to our repo | 12:34 |
*** jaypipes has joined #openstack-infra | 12:35 | |
*** sbezverk has joined #openstack-infra | 12:35 | |
mnaser | imho its probably cleaner to show which upstream ones we're using exactly, making it easy for potential users to know the exact image that is being used | 12:35 |
strigazi | pabelanger but we can always commit there if it makes our life easier and we gain performance | 12:35 |
pabelanger | seems like a lot of pressure on fedorapeople.org | 12:35 |
strigazi | pabelanger if we get f26 for free we can change to the official repo. | 12:36 |
mnaser | strigazi https://github.com/openstack-infra/system-config/blob/master/modules/openstack_project/files/mirror/fedora-mirror-update.sh we can edit this to get the fedora 25 images | 12:36 |
mnaser | http://mirrors.kernel.org/fedora-alt/atomic/stable/ | 12:36 |
pabelanger | strigazi: right, I mean, if you want to test fedora-26, we already mirror that to AFS | 12:37 |
mnaser | strigazi fyi f26 comes with docker 1.13.1 and i had to push up a few things to make it work, so just keep it in mind (mainly k8s 1.6.7 and a patch to set default policy for iptables forward to accept) | 12:37 |
pabelanger | otherwise, we should be able to add mirror for https://dl.fedoraproject.org/pub/alt/atomic/stable/ | 12:37 |
strigazi | pabelanger sounds good, but for stable branches we are slower with updates; we still need f25 until we update. | 12:37 |
openstackgerrit | Alexander Chadin proposed openstack-infra/project-config master: Remove gate job from watcherclient https://review.openstack.org/491784 | 12:38 |
*** kgiusti has joined #openstack-infra | 12:38 | |
pabelanger | strigazi: why isn't Fedora-Atomic-25-20170719.qcow2 listed at https://dl.fedoraproject.org/pub/alt/atomic/stable/ ? | 12:39 |
mnaser | pabelanger based on my simple math it seems to be around ~4GB per fedora atomic release so mirroring should use ~36gb | 12:39 |
strigazi | pabelanger deleted? | 12:39 |
*** yamamoto has joined #openstack-infra | 12:40 | |
pabelanger | strigazi: would one of the listed images work for you? How do you decide when you need to replace Fedora-Atomic-25-20170719.qcow2 | 12:41 |
strigazi | pabelanger I'll give it a go with f26 and if it works we can see what to do with our ubuntu image and stable branches. | 12:41 |
robcresswell | o/ Just setting up 3rd party CI, is it expected that the noop-check-communication job doesnt receive params like LOG_PATH? Seems to be able to find the log server, but isn't populating that build param. | 12:42 |
pabelanger | strigazi: sure, lets see if that works, if so, then you get the mirror for free. Looking at other images now | 12:42 |
*** rhallisey has joined #openstack-infra | 12:42 | |
pabelanger | robcresswell: see http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul/openstack_functions.py how we set it up today for zuulv2.5 | 12:43 |
pabelanger | robcresswell: you'll need to create a job like: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul/layout.yaml#n1119 to call the function | 12:43 |
*** slaweq_ has joined #openstack-infra | 12:44 | |
*** dprince has joined #openstack-infra | 12:45 | |
robcresswell | thanks pabelanger. Little out of my depth atm. That's really helpful. | 12:45 |
*** lucas-hungry is now known as lucasagomes | 12:45 | |
pabelanger | np | 12:46 |
*** jpena is now known as jpena|mtg | 12:46 | |
*** slaweq_ has quit IRC | 12:49 | |
*** Goneri has quit IRC | 12:50 | |
*** mandre_away is now known as mandre_mtg | 12:51 | |
*** pradk has joined #openstack-infra | 12:54 | |
*** abelur_ has quit IRC | 12:54 | |
*** slaweq_ has joined #openstack-infra | 12:54 | |
*** felipemonteiro_ has joined #openstack-infra | 12:55 | |
*** coolsvap has quit IRC | 12:56 | |
*** jpena|mtg is now known as jpena|off | 12:56 | |
*** felipemonteiro__ has joined #openstack-infra | 12:57 | |
*** esberglu has joined #openstack-infra | 12:58 | |
*** jamesmcarthur has joined #openstack-infra | 12:58 | |
*** slaweq_ has quit IRC | 13:00 | |
*** felipemonteiro_ has quit IRC | 13:01 | |
mnaser | pabelanger whats the decision making process when deciding if something will be mirrored or cached? | 13:01 |
mnaser | http://mirror.regionone.infracloud-vanilla.openstack.org/fedora/releases/26/CloudImages/x86_64/images/ -- the image there is from 2017-07-05.. there's been a few newer images since (such as one released on the 23rd of july) | 13:02 |
mnaser | so i suspect we're not going to get access to fresh images :( | 13:02 |
pabelanger | mnaser: usually if we can rsync, we mirror. However, if the contents change too fast (like rdo), then we reverse proxy cache | 13:02 |
*** jrist has quit IRC | 13:02 | |
*** clayton has quit IRC | 13:03 | |
mnaser | pabelanger i would guess images would then be something that we can consider more on the stable content side | 13:04 |
mnaser | and it would involve a small change here only https://github.com/openstack-infra/system-config/blob/master/modules/openstack_project/files/mirror/fedora-mirror-update.sh | 13:04 |
*** clayton has joined #openstack-infra | 13:05 | |
mnaser | i can propose a small change and what ill do is ill exclude all the older releases so we only have f25 atomic latest + f26 atomic latest and then new releases moving forward | 13:05 |
mnaser | it'll save a bunch of disk space on images we likely wont use | 13:05 |
*** gildub has quit IRC | 13:06 | |
*** Julien-zte has joined #openstack-infra | 13:06 | |
*** pradk has quit IRC | 13:06 | |
fungi | i'm curious why the content is so stale. we run rsync from the official copy ~daily? | 13:07 |
*** sbezverk has quit IRC | 13:07 | |
mnaser | fungi i dont think its the content thats stale, i think the atomic team doesnt publish images there officially | 13:08 |
mnaser | they probably release in /pub/alt/atomic and there might have been some old reason why that ended up there (for fedora 25, it doesnt even exist) | 13:08 |
*** spzala has joined #openstack-infra | 13:08 | |
*** rlandy has quit IRC | 13:08 | |
openstackgerrit | Gael Chamoulaud proposed openstack-infra/tripleo-ci master: Enable tripleo-validations tests https://review.openstack.org/481080 | 13:09 |
pabelanger | Ya, I don't think ISO content (or any content) changes in release directory | 13:09 |
pabelanger | we'd likely need to mirror: https://dl.fedoraproject.org/pub/alt/atomic/stable/ | 13:09 |
fungi | got it | 13:10 |
fungi | now i'm less confused ;) | 13:10 |
*** links has quit IRC | 13:11 | |
*** markvoelker has joined #openstack-infra | 13:12 | |
*** pgadiya has quit IRC | 13:13 | |
numans | pabelanger, hi, can you please add this to your review queue - https://review.openstack.org/#/c/490622/ | 13:13 |
*** LindaWang has quit IRC | 13:13 | |
*** dizquierdo is now known as dizquierdo_afk | 13:14 | |
*** slaweq_ has joined #openstack-infra | 13:16 | |
strigazi | pabelanger mnaser Will someone push a change to mirror https://dl.fedoraproject.org/pub/alt/atomic/stable/ ? | 13:17 |
*** mpranjic has joined #openstack-infra | 13:17 | |
mnaser | strigazi working on it! | 13:17 |
mnaser | i'm making sure we dont mirror useless stuff like isos etc | 13:17 |
*** Liuqing has joined #openstack-infra | 13:18 | |
strigazi | mnaser cool | 13:18 |
*** bobh has joined #openstack-infra | 13:18 | |
mpranjic | hello! I have issues with login to wiki.openstack.org with openID. | 13:19 |
strigazi | mnaser they don't have ISOs i think | 13:19 |
mpranjic | I get the error: | 13:19 |
mpranjic | OpenID error | 13:19 |
mpranjic | An error occurred: an invalid token was found. | 13:19 |
mnaser | strigazi http://mirrors.kernel.org/fedora-alt/atomic/stable/Fedora-Atomic-26-20170723.0/Atomic/x86_64/iso/ | 13:19 |
mpranjic | can someone help me out with that? | 13:19 |
mnaser | and there is stuff like libvirt boxes and blabla, i'll get it addressed shortly | 13:19 |
mpranjic | my Ubuntu One username is: mpranjic | 13:19 |
strigazi | mnaser we only need /CloudImages, not /Atomic | 13:20 |
mnaser | yep thats why im doing all the excludes in the rsync mirroring | 13:20 |
mnaser | so we get all the .qcow2s pretty much | 13:20 |
*** ldnunes has quit IRC | 13:20 | |
strigazi | and raw I guess | 13:21 |
*** slaweq_ has quit IRC | 13:21 | |
*** sshnaidm|afk is now known as sshnaidm | 13:21 | |
openstackgerrit | Mohammed Naser proposed openstack-infra/system-config master: Add Fedora Atomic mirrors https://review.openstack.org/491800 | 13:22 |
mnaser | pabelanger fungi ^ i also added output in the comments of a dry run so it should work :) | 13:23 |
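As a rough illustration of the kind of change being discussed here (not the actual content of 491800), a mirror-update step in the spirit of fedora-mirror-update.sh could restrict the rsync run to the CloudImages artifacts; the rsync module, AFS destination, and release glob below are assumptions for the sketch only.

```bash
# Hedged sketch: mirror only the Fedora Atomic CloudImages qcow2/raw files
# from the atomic/stable tree, skipping ISOs, Vagrant boxes and ostree data.
# Source module, destination path and release pattern are illustrative.
rsync -rlptDvz -m --delete --delete-excluded \
    --include="*/" \
    --include="Fedora-Atomic-2[56]-*/CloudImages/x86_64/images/*.qcow2" \
    --include="Fedora-Atomic-2[56]-*/CloudImages/x86_64/images/*.raw.xz" \
    --exclude="*" \
    rsync://dl.fedoraproject.org/fedora-alt/atomic/stable/ \
    /afs/.openstack.org/mirror/fedora/atomic/
```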
*** baoli has joined #openstack-infra | 13:23 | |
*** xyang1 has joined #openstack-infra | 13:24 | |
*** slaweq_ has joined #openstack-infra | 13:26 | |
*** sree has joined #openstack-infra | 13:27 | |
openstackgerrit | Mohammed Naser proposed openstack-infra/project-config master: Add NODEPOOL_ATOMIC_MIRROR to configure_mirror.sh https://review.openstack.org/491801 | 13:28 |
*** LindaWang has joined #openstack-infra | 13:28 | |
*** jamesmcarthur has quit IRC | 13:28 | |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Add mapping file containing v2 to v3 mappings https://review.openstack.org/491804 | 13:29 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 13:30 |
*** slaweq_ has quit IRC | 13:33 | |
*** ldnunes has joined #openstack-infra | 13:33 | |
*** cshastri has quit IRC | 13:33 | |
slaweq | mordred: hello | 13:34 |
slaweq | mordred: can You take a look at https://review.openstack.org/#/c/491266/ | 13:34 |
mordred | slaweq: yes! | 13:34 |
slaweq | mordred: I think that it's enough to do it like I did but please check if maybe yamamoto is right | 13:35 |
slaweq | mordred: thx in advance :) | 13:35 |
mordred | oh - sorry - I had this reviewed in my browser but didn't actually click submit ... | 13:35 |
*** cshastri has joined #openstack-infra | 13:36 | |
mordred | slaweq: review left - but basically we need to copy the ENABLE_IDENTITY_V2 pattern for now (this will be better in a couple of weeks) | 13:37 |
*** alexchadin has quit IRC | 13:39 | |
ssbarnea | What could I do to make the release of JJB 2.0 happen before the apocalypse? https://storyboard.openstack.org/#!/story/2000745 | 13:39 |
*** alexchadin has joined #openstack-infra | 13:40 | |
*** alexchadin has quit IRC | 13:40 | |
slaweq | mordred: thx | 13:40 |
*** alexchadin has joined #openstack-infra | 13:40 | |
mordred | ssbarnea: hi! so - I think we'd like to hold off until we've migrated openstack to zuul v3 which is planned for september 11 | 13:40 |
fungi | mordred: well, _we_ pin the version we're using | 13:41 |
*** alexchadin has quit IRC | 13:41 | |
mordred | oh. | 13:41 |
mordred | well | 13:41 |
mordred | ignore me | 13:41 |
fungi | so i don't expect they need to hold off releasing | 13:41 |
fungi | we haven't asked them not to | 13:41 |
*** alexchadin has joined #openstack-infra | 13:41 | |
*** bh526r has joined #openstack-infra | 13:41 | |
*** alexchadin has quit IRC | 13:41 | |
ssbarnea | the fact that 2.0 is in pre-release for so long does hurt it a lot as I cannot 'persuade' others to use the pre-release in production. | 13:42 |
*** wznoinsk_ is now known as wznoinsk | 13:42 | |
*** alexchadin has joined #openstack-infra | 13:42 | |
fungi | ssbarnea: have you asked in #openstack-jjb? the devs/reviewers on that repo have been mostly autonomous for a while, the infra team only provides a bit of oversight | 13:42 |
sshnaidm | clarkb, ping | 13:42 |
fungi | we stopped exerting much control over it when we ceased using jenkins (roughly a year ago) | 13:43 |
*** ldnunes_ has joined #openstack-infra | 13:43 | |
ssbarnea | fungi: thanks for the hint. I didn't know about that channel, joined and going to cross post now. | 13:44 |
*** ldnunes has quit IRC | 13:44 | |
odyssey4me | hi all, I'd like to understand more about how we can cache an image onto the nodepool nodes | 13:44 |
*** camunoz has joined #openstack-infra | 13:45 | |
*** jtomasek has joined #openstack-infra | 13:46 | |
*** alexchadin has quit IRC | 13:46 | |
*** hongbin has joined #openstack-infra | 13:47 | |
*** felipemonteiro__ has quit IRC | 13:48 | |
*** slaweq_ has joined #openstack-infra | 13:48 | |
*** ociuhandu has joined #openstack-infra | 13:50 | |
*** slaweq_ has quit IRC | 13:53 | |
*** ociuhandu has quit IRC | 13:56 | |
robcresswell | o/ Sorry, back with more questions; nodepool list seems to be "stuck" with a list of instances in the delete state, but the provider has already deleted them. Is there a way to nudge nodepool to figure that out? | 13:57 |
*** EricGonczer_ has joined #openstack-infra | 13:57 | |
*** Liuqing has quit IRC | 13:58 | |
*** slaweq_ has joined #openstack-infra | 13:59 | |
*** dizquierdo_afk is now known as dizquierdo | 13:59 | |
*** gouthamr has joined #openstack-infra | 14:00 | |
*** EricGonc_ has joined #openstack-infra | 14:01 | |
*** xinliang has quit IRC | 14:01 | |
dimak | AJaeger_, yolanda there are a lot of queued jenkins jobs, any chance there are more node-pool issues? | 14:02 |
fungi | odyssey4me: we cache the _small_ images and similar files devstack declares it wants by running its image_list.sh utility script from this element: https://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/elements/cache-devstack/extra-data.d/55-cache-devstack-repos#n107 | 14:02 |
*** EricGonczer_ has quit IRC | 14:02 | |
fungi | odyssey4me: obviously baking too many or too large images onto the filesystems of our worker images makes them unwieldy, so we do try to keep it to a minimum and infrequently-used/larger images can instead be grabbed through our afs-backed mirrors or our caching reverse proxies | 14:04 |
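For illustration of the small-image caching fungi describes, a diskimage-builder extra-data.d script roughly like the following could fetch a file onto the worker image at build time; the cirros URL and the /opt/cache/files destination are assumptions here, not the element's exact contents.

```bash
#!/bin/bash
# Hedged sketch of an extra-data.d style cache step (not the real element):
# download a small image into the node's devstack file cache at image build
# time so jobs do not have to fetch it over the network.
set -eu
CACHE_DIR="$TMP_MOUNT_PATH/opt/cache/files"   # dib mounts the image under TMP_MOUNT_PATH
mkdir -p "$CACHE_DIR"
url="http://download.cirros-cloud.net/0.3.5/cirros-0.3.5-x86_64-disk.img"
wget -nv -c -O "$CACHE_DIR/$(basename "$url")" "$url"
```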
odyssey4me | fungi it's probably a bit big to cache, and putting into the afs mirror or reverse proxying might work fine | 14:04 |
*** marst has joined #openstack-infra | 14:05 | |
fungi | for example, kolla publishes their images onto tarballs.o.o and then we have a reverse proxy they pull them through in each provider/region | 14:05 |
fungi | and their largest images are over 4gib | 14:05 |
*** slaweq_ has quit IRC | 14:06 | |
fungi | the ones we cache onto our image filesystems are more things like cirros which if memory serves is in the tens of mib | 14:06 |
mtreinish | fungi: it's 13MB | 14:09 |
*** mriedem has joined #openstack-infra | 14:09 | |
*** slaweq has quit IRC | 14:10 | |
fungi | cool, i was within margin of error/order of magnitude anyway ;) | 14:10 |
*** sree has quit IRC | 14:10 | |
*** sree has joined #openstack-infra | 14:11 | |
mtreinish | well at least on x86_64, maybe other arches are bigger :) | 14:11 |
odyssey4me | fungi oh no, let me check on the size - but it's less than 300MB IIRC | 14:13 |
*** xinliang has joined #openstack-infra | 14:13 | |
*** rbrndt has joined #openstack-infra | 14:14 | |
odyssey4me | fungi ah it seems it's around ~90MB per platform | 14:14 |
fungi | we carve out 100gib for the afs cache and another separate 100gib for the apache reverse proxy cache now, so should have plenty of room to cache things local to workers either way, but files in the neighborhood of 100mib are probably pushing the bounds of what we'd want to cache unless a substantial percentage of all jobs we're running will use it | 14:15 |
*** jtomasek has quit IRC | 14:15 | |
odyssey4me | fungi so it'd be preferred as a file on AFS, rather than a reverse proxy? | 14:16 |
fungi | that mostly depends on how often the file is expected to change | 14:17 |
odyssey4me | fungi we'd be happy to refresh it daily, or even weekly | 14:17 |
openstackgerrit | Claudiu Belu proposed openstack-infra/project-config master: cloudbase-init: Adds releasenotes jobs https://review.openstack.org/491821 | 14:17 |
fungi | is this something you're producing and reconsuming, or something you're consuming which is published outside our ci system by some other community (and how often, roughly)? | 14:18 |
odyssey4me | fungi it's the base lxc cache which is published once every 6 hours IIRC onto images.lxccontainers.org | 14:18 |
odyssey4me | sorry - images.linuxcontainers.org | 14:19 |
fungi | that seems like a better fit for the reverse proxy cache, yeah | 14:19 |
odyssey4me | yeah, that would be a lot easier for us to consume I think, because we'd still be able to use the API instead of creating a code path to use a file path or custom URL | 14:19 |
fungi | well, it'll still be a custom url because it's not a transparent proxy | 14:20 |
odyssey4me | yep, but we have a code path for that already | 14:20 |
fungi | oh, awesome | 14:20 |
odyssey4me | so, how do I add a new reverse proxy? | 14:20 |
fungi | two places need updating: | 14:20 |
fungi | http://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/templates/mirror.vhost.erb | 14:21 |
fungi | https://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/scripts/configure_mirror.sh | 14:21 |
*** Guest13936 is now known as med_ | 14:21 | |
*** med_ has quit IRC | 14:21 | |
*** med_ has joined #openstack-infra | 14:21 | |
*** med_ is now known as medberry | 14:21 | |
fungi | it should be pretty clear from surrounding context what needs to be added, but if you have questions then ask away | 14:22 |
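As a sketch of the configure_mirror.sh half of such an addition (the vhost half would live in mirror.vhost.erb), something along these lines could expose the proxied URL to jobs; the variable name, port, and path are illustrative assumptions rather than the real configuration.

```bash
# Hedged sketch only: publish the per-region reverse proxy URL for
# images.linuxcontainers.org to jobs via the mirror info script.
# NODEPOOL_MIRROR_HOST is the per-region mirror hostname already used by
# configure_mirror.sh; the 8080 port and path segment are assumptions.
export NODEPOOL_LXC_IMAGE_PROXY="http://${NODEPOOL_MIRROR_HOST}:8080/images.linuxcontainers"

cat >> /etc/ci/mirror_info.sh <<EOF
export NODEPOOL_LXC_IMAGE_PROXY=$NODEPOOL_LXC_IMAGE_PROXY
EOF
```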
odyssey4me | thanks fungi - I'll take a closer look shortly and ping any further questions, thanks so much for your expertise and assistance | 14:23 |
*** jpena|off is now known as jpena | 14:25 | |
fungi | odyssey4me: just glad i could help | 14:25 |
hongbin | hi, i want to know if there is a way to dump non-devstack systemd logs (i.e. docker logs) to the gate, i tried to do this: https://review.openstack.org/#/c/480306/1/contrib/post_test_hook.sh , but it looks like the logs are not there if the job is killed by a timeout | 14:26 |
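A minimal sketch of the kind of post_test_hook.sh step hongbin describes, assuming the usual devstack-gate $BASE/logs collection directory; note this still only helps when the hook actually gets a chance to run before the job is killed.

```bash
# Hedged sketch: dump the docker systemd journal next to the other job logs
# so it gets uploaded with them.  $BASE/logs is the conventional
# devstack-gate location and is an assumption here.
function dump_docker_logs {
    local logdir="$BASE/logs"
    sudo journalctl -o short-precise --no-pager -u docker \
        | sudo tee "$logdir/docker.txt" > /dev/null
    sudo gzip -9 "$logdir/docker.txt"
}
dump_docker_logs
```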
*** admcleod_ is now known as admcleod | 14:26 | |
openstackgerrit | Matthew Treinish proposed openstack-infra/subunit2sql master: Switch to using stestr https://review.openstack.org/491074 | 14:27 |
fungi | dimak: i think we're just backlogged. the osic environment was finally turned off last week, we've got a couple of citycloud regions offline for different issues, and our voucher for ovh expired so we're waiting for that to get re-upped | 14:28 |
openstackgerrit | Matthew Treinish proposed openstack-infra/subunit2sql master: Update python3 versions in tox.ini envlist https://review.openstack.org/491827 | 14:29 |
fungi | the post pipeline's only about 4 hours behind, so the situation's not terrible (yet anyway) | 14:29 |
openstackgerrit | Matthew Treinish proposed openstack-infra/subunit2sql master: Switch to using stestr https://review.openstack.org/491074 | 14:34 |
openstackgerrit | Matthew Treinish proposed openstack-infra/subunit2sql master: Update python3 versions in tox.ini envlist https://review.openstack.org/491827 | 14:34 |
*** felipemonteiro_ has joined #openstack-infra | 14:37 | |
*** armax has joined #openstack-infra | 14:37 | |
*** jpena is now known as jpena|off | 14:38 | |
*** florianf has quit IRC | 14:40 | |
*** alexchadin has joined #openstack-infra | 14:42 | |
*** medberry is now known as med_ | 14:43 | |
*** LindaWang has quit IRC | 14:43 | |
*** slaweq has joined #openstack-infra | 14:43 | |
*** dtantsur is now known as dtantsur|brb | 14:44 | |
*** katkapilatova has left #openstack-infra | 14:45 | |
*** alexchadin has quit IRC | 14:47 | |
*** gyee has joined #openstack-infra | 14:48 | |
openstackgerrit | Merged openstack/os-client-config master: Update globals safely https://review.openstack.org/491618 | 14:48 |
*** slaweq has quit IRC | 14:48 | |
*** links has joined #openstack-infra | 14:48 | |
*** links has quit IRC | 14:49 | |
*** cshastri has quit IRC | 14:50 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 14:53 |
*** florianf has joined #openstack-infra | 14:53 | |
*** slaweq has joined #openstack-infra | 14:53 | |
*** annegentle has joined #openstack-infra | 14:56 | |
*** xarses_ has joined #openstack-infra | 14:57 | |
clarkb | sshnaidm: hi | 14:58 |
*** EricGonc_ has quit IRC | 14:58 | |
*** slaweq has quit IRC | 14:59 | |
sshnaidm | clarkb, do we have any problem with logs now? EmilienM told me you have some issues | 14:59 |
clarkb | odyssey4me: fungi note that if the lxc images arent served with ttls they will be cached for roughly 24 hours. which is 4x their update cycle | 15:00 |
clarkb | sshnaidm: yes there are still ~27 copies of /etc in every job | 15:00 |
sshnaidm | clarkb, which job? do you have an url? | 15:00 |
*** dmsimard is now known as dmsimard|afk | 15:00 | |
odyssey4me | clarkb our issue for testing is not really getting the latest image, but getting one at all | 15:00 |
*** EricGonczer_ has joined #openstack-infra | 15:01 | |
odyssey4me | between the dns failures, and slow download speeds, we're not getting them reliably done and getting job timeouts/failures | 15:01 |
odyssey4me | so we're hoping just to get something more reliable in place | 15:01 |
*** psachin has quit IRC | 15:01 | |
clarkb | odyssey4me: sure just noting that that may be a drawback | 15:02 |
clarkb | sshnaidm: gate-tripleo-ci-centos-7-undercloud-containers is the one I looked at but assuming the others are that way too | 15:02 |
odyssey4me | clarkb appreciate the heads up - for us it won't be an issue | 15:02 |
fungi | clarkb: odyssey4me: if it becomes an issue, convincing the hosts of that image to start employing a cache ttl header is probably not an entirely wasted effort either | 15:03 |
*** slaweq has joined #openstack-infra | 15:04 | |
*** gyee has quit IRC | 15:04 | |
*** pradk has joined #openstack-infra | 15:05 | |
*** jrist has joined #openstack-infra | 15:05 | |
*** pradk has quit IRC | 15:07 | |
*** mattmceuen has joined #openstack-infra | 15:07 | |
sshnaidm | clarkb, fyi https://bugs.launchpad.net/tripleo/+bug/1709339 | 15:07 |
openstack | Launchpad bug 1709339 in tripleo "CI: duplicate /etc directories in logs for containers" [Critical,Triaged] | 15:07 |
clarkb | sshnaidm: http://logs.openstack.org/95/480395/5/check/gate-tripleo-ci-centos-7-undercloud-containers/1a14f5d/logs/var/log/extra/docker/containers/neutron_ovs_agent/etc/ http://logs.openstack.org/95/480395/5/check/gate-tripleo-ci-centos-7-undercloud-containers/1a14f5d/logs/var/log/extra/docker/containers/mysql/etc/ | 15:08 |
clarkb | http://logs.openstack.org/95/480395/5/check/gate-tripleo-ci-centos-7-undercloud-containers/1a14f5d/logs/var/log/extra/docker/containers/mistral_executor/etc/ and so on | 15:08 |
sshnaidm | clarkb, I see, it's described in the bug I submitted right now | 15:09 |
*** jascott1 has joined #openstack-infra | 15:09 | |
clarkb | sshnaidm: its not just for the containers btw http://logs.openstack.org/95/480395/5/check/gate-tripleo-ci-centos-7-undercloud-containers/1a14f5d/logs/etc/ | 15:09 |
*** slaweq has quit IRC | 15:10 | |
sshnaidm | clarkb, this directory is from main subnode, it's not duplicated | 15:11 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 15:13 |
clarkb | sshnaidm: but it is, because its the same stuff from the containers | 15:13 |
clarkb | sshnaidm: and its got redundant info we should never be collecting like DIR_COLORS | 15:14 |
clarkb | sshnaidm: we need to stop collecting all of that | 15:14 |
clarkb | make 1 copy of the necessary data and thats it | 15:14 |
sshnaidm | clarkb, yeah, but the problem is collecting /etc in containers, not this one | 15:14 |
*** jascott1 has quit IRC | 15:14 | |
clarkb | its both... | 15:14 |
sshnaidm | clarkb, we need 1 /etc directory anyway | 15:14 |
clarkb | yes but we don't need all the extra crap in it | 15:14 |
sshnaidm | clarkb, if it takes 1KB, I'm not sure it's worth the effort to overcomplicate the code | 15:15 |
clarkb | sshnaidm: but it is | 15:16 |
clarkb | because we've already had problems where overcollecting results in grabbing potentially massive content you don't want | 15:16 |
clarkb | this is why we keep asking you to only copy what you want | 15:16 |
clarkb | rather than copying everything then reducing from everything | 15:16 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: Fixed Typo on Summit Service https://review.openstack.org/491836 | 15:17 |
sshnaidm | clarkb, we excluded everything you told us to exclude last time: https://github.com/openstack-infra/tripleo-ci/blob/master/toci-quickstart/config/collect-logs.yml#L35-L64 | 15:17 |
clarkb | sshnaidm: right but we've also asked you to invert the way you collect logs and only collect what you want | 15:17 |
clarkb | sshnaidm: so first step was stop collecting absolutely everything but moving forward we should be collecting what we want/need explicitly | 15:18 |
openstackgerrit | Merged openstack-infra/openstackid-resources master: Fixed Typo on Summit Service https://review.openstack.org/491836 | 15:18 |
clarkb | but also you are still collecting all of /etc multiple times including things like dir colors and bashcompletion which we've asked you to stop for weeks now | 15:18 |
sshnaidm | clarkb, there are too many projects in tripleo together, so it's not always possible to maintain a relevant, up-to-date list; please consider the fact that it's not one project like you usually have, but a lot of them | 15:19 |
clarkb | yes that is the same situation we are in for devstack-gate and it hasn't been a problem | 15:19 |
sshnaidm | clarkb, I'm not sure it's the same situation | 15:20 |
clarkb | sshnaidm: and I'd like you to consider that you have effectively ddos'd our filesystem multiple times | 15:20 |
fungi | sshnaidm: why not put together a list of files you've needed to look at when troubleshooting job failures for those in the past? certainly you haven't looked at every copy of every file in /etc? | 15:20 |
clarkb | one single job uses 10% of all our disk | 15:20 |
clarkb | more than the next two jobs combined | 15:20 |
sshnaidm | clarkb, we are talking about files a few KBs in size, that's not what kills the fs | 15:20 |
clarkb | sshnaidm: no the collect everything attitude is what kills the fs | 15:20 |
clarkb | sshnaidm: because when centos 7.4 happens and some new thing sneaks in we break again | 15:21 |
clarkb | and then again for 7.5 and then 8 and so on | 15:21 |
clarkb | if instead you collect what you need this risk is greatly reduced | 15:21 |
fungi | it's better to realize you're not collecting a file you need and then make a change to start including it than to collect files you won't ever need | 15:21 |
sshnaidm | clarkb, not sure I understand how centos is related to collecting /etc files | 15:21 |
clarkb | sshnaidm: because the contents of /etc will change as centos changes over time | 15:22 |
fungi | sshnaidm: because each new update or release of the distro can move files around in /etc or add new ones | 15:22 |
sshnaidm | clarkb, yeah, but how would it break anything? | 15:22 |
*** Julien-zte has quit IRC | 15:22 | |
clarkb | sshnaidm: if a large file shows up all of a sudden we fill the disk again just like we already did with the java stuff | 15:22 |
fungi | when rh decides it should include some new large set of files in /etc you start collecting that automatically and fill up our logserver again | 15:22 |
*** sbezverk has joined #openstack-infra | 15:23 | |
fungi | let me put this another way... if we stopped hosting logs for tripleo jobs, we could provide the community with several months of log retention instead of just one | 15:23 |
fungi | do 2/3 of our community benefit from being able to look at logs for tripleo job failures? | 15:23 |
sshnaidm | fungi, yes, because most projects use one dir for logs and one /etc folder, because they have only one process | 15:24 |
*** iyamahat has joined #openstack-infra | 15:24 | |
fungi | sshnaidm: that's not an answer to my question | 15:24 |
sshnaidm | fungi, they would | 15:24 |
*** sbezverk_ has joined #openstack-infra | 15:24 | |
sshnaidm | fungi, because we test their projects too | 15:24 |
fungi | what percentage of our community do you think look at logs for tripleo jobs? i doubt it's even close to the proportion of space you're using on the logs site | 15:25 |
sshnaidm | fungi, every project which is part of tripleo will benefit from our jobs | 15:25 |
*** Julien-zte has joined #openstack-infra | 15:25 | |
sshnaidm | fungi, we have jobs running in about 10 other projects, part of them are voting, part of them are not, and part is experimental | 15:25 |
*** Julien-zte has quit IRC | 15:26 | |
*** slaweq has joined #openstack-infra | 15:26 | |
*** marst_ has joined #openstack-infra | 15:26 | |
sshnaidm | fungi, and we are working to have many more voting and relevant jobs there, to prevent failures and to help with integration | 15:26 |
*** marst has quit IRC | 15:27 | |
*** iyamahat has quit IRC | 15:27 | |
sshnaidm | fungi, it's neutron, nova, ironic, etc, etc, and tripleo jobs for some of them are the only way they get tested in "real life" | 15:27 |
sshnaidm | so yes, I think we do something useful for all community | 15:27 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 15:27 |
*** sbezverk has quit IRC | 15:28 | |
clarkb | sshnaidm: the point is that we do that in other jobs as well, without redundantly copying unnecessary data in every job | 15:28 |
clarkb | sshnaidm: you can do both things, they do not conflict with each other | 15:28 |
*** vhosakot has joined #openstack-infra | 15:28 | |
sshnaidm | clarkb, I handled the problem with all these /etc copies, it's a bug and will be solved, but I'm against a whitelist of logs | 15:30 |
*** dtantsur|brb is now known as dtantsur | 15:30 | |
*** slaweq has quit IRC | 15:30 | |
sshnaidm | clarkb, from all my investigations in recent years it's really hard to determine which log will give you the info; it could be any of them | 15:30 |
clarkb | sshnaidm: there is no reason to collect bash completion or dir colors and so on | 15:30 |
clarkb | what is the argument that you need those? | 15:31 |
*** lrossetti_ has quit IRC | 15:31 | |
sshnaidm | clarkb, right, we don't need this, I can add them to exclude list right now | 15:31 |
*** lrossetti has joined #openstack-infra | 15:31 | |
sshnaidm | clarkb, maintaining a config list for tens of services is much more complicated and a source of breakages and failures | 15:32 |
clarkb | but it isn't... | 15:32 |
clarkb | we have done it successfully for years | 15:32 |
*** lrossetti has quit IRC | 15:33 | |
fungi | devstack-gate specifically has done it for years | 15:33 |
*** camunoz has quit IRC | 15:34 | |
sshnaidm | clarkb, fungi ok, I will raise this question at the next tripleo meeting, please come and let's discuss there, I hope we'll find something suitable for everybody | 15:34 |
sshnaidm | clarkb, fungi does it work for you? | 15:34 |
clarkb | tuesday at 1400UTC is a bit early for me but I can try | 15:34 |
clarkb | and I think fungi is traveling that day | 15:34 |
clarkb | sshnaidm: I can do my best to get up early | 15:36 |
*** slaweq has joined #openstack-infra | 15:36 | |
*** rhallisey has quit IRC | 15:37 | |
jeblair | what's the current disk space used per build? | 15:37 |
clarkb | seems to be between ~75MB and 100MB based on the job | 15:37 |
clarkb | (so it is a massive improvement over where we were, but it would be nice to make it robust so that we don't have to worry as much about it exploding in the future) | 15:38 |
*** ccamacho has quit IRC | 15:38 | |
*** ccamacho has joined #openstack-infra | 15:38 | |
jeblair | k | 15:38 |
fungi | part of it is also the number of tripleo jobs and number of times they're run multiplied by their average size | 15:38 |
*** tosky has quit IRC | 15:40 | |
*** tosky has joined #openstack-infra | 15:40 | |
clarkb | basically we know that collecting everything is problematic because you end up with what you don't expect (there are multiple cases of this), so to avoid it in the future I personally would like to see a more whitelist approach to collecting logs than grab everything + blacklist | 15:41 |
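For context on what a whitelist approach looks like in practice, a rough sketch follows; the file names and destination below are illustrative, not the real devstack-gate or tripleo-ci lists:

```bash
# Whitelist-style collection: copy only files we explicitly name, so a new
# large file appearing under /etc is simply never picked up.
LOG_DIR=/opt/stack/logs      # illustrative source location
DEST=collected-logs
mkdir -p "$DEST"

for f in "$LOG_DIR/screen-n-cpu.txt" "$LOG_DIR/screen-q-svc.txt" /etc/nova/nova.conf; do
    [ -f "$f" ] && cp --parents "$f" "$DEST/"
done

# The grab-everything-plus-blacklist alternative keeps collecting whatever the
# distro adds next, which is how the disk filled up before:
#   rsync -a --exclude 'bash_completion*' --exclude 'DIR_COLORS*' /etc/ "$DEST/etc/"
```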
fungi | doing a quick analysis of the sample data clarkb collected, jobs with "tripleo" in the name account for 33% of the data we're storing right now, so trying to figure out how to get that reduced | 15:41 |
clarkb | with a single job (the one linked to above) being ~10% of the total | 15:42 |
clarkb | which is more than the next two jobs combined | 15:42 |
*** armax has quit IRC | 15:42 | |
*** iyamahat has joined #openstack-infra | 15:42 | |
*** slaweq has quit IRC | 15:42 | |
*** armax has joined #openstack-infra | 15:43 | |
fungi | at least it's down from previously, where some 70% of the data we were storing were tripleo job logs | 15:43 |
*** pstack has joined #openstack-infra | 15:43 | |
*** dougwig has joined #openstack-infra | 15:44 | |
*** e0ne has quit IRC | 15:45 | |
sshnaidm | clarkb, fungi I added the item, if you can please join, if not - I'll present your point: https://etherpad.openstack.org/p/tripleo-meeting-items | 15:46 |
*** markus_z has quit IRC | 15:46 | |
fungi | thanks sshnaidm! | 15:46 |
fungi | and clarkb is correct, i'll be driving a car during that next meeting | 15:47 |
*** camunoz has joined #openstack-infra | 15:47 | |
fungi | otherwise i would gladly attend | 15:47 |
*** jamesdenton has quit IRC | 15:49 | |
*** jamesdenton has joined #openstack-infra | 15:50 | |
*** slaweq_ has joined #openstack-infra | 15:52 | |
*** krtaylor has quit IRC | 15:54 | |
*** yamamoto has quit IRC | 15:54 | |
*** hamzy has quit IRC | 15:55 | |
*** yamamoto has joined #openstack-infra | 15:58 | |
*** felipemonteiro__ has joined #openstack-infra | 15:58 | |
mwhahaha | question, so i seem to have puppet-tripleo-puppet-unit logs in my puppet-mistral-puppet-lint results: http://logs.openstack.org/52/491352/1/gate/gate-puppet-mistral-puppet-lint/9b68f25/console.html#_2016-12-16_09_24_02_240950 | 16:00 |
mwhahaha | also they are results from 2016 | 16:01 |
mwhahaha | any thoughts on how that happened? | 16:02 |
*** felipemonteiro_ has quit IRC | 16:02 | |
clarkb | the timestamp on the file itself is from today, the 8th of august; possible that a node booted with a bad clock, resulting in the 2016 problem | 16:02 |
pabelanger | ya, that is odd | 16:03 |
*** yamamoto has quit IRC | 16:03 | |
mwhahaha | http://logs.openstack.org/52/491352/1/gate/gate-puppet-mistral-puppet-lint/9b68f25/console.html#_2016-12-16_09_29_54_873896 | 16:03 |
mwhahaha | i like the success/failure/failure | 16:04 |
clarkb | and the other builds there have the correct content so it isn't a consistent problem | 16:04 |
mwhahaha | wonder if there's a node that never fully cleared or something | 16:04 |
*** jamesmcarthur has joined #openstack-infra | 16:04 | |
mwhahaha | because it looks like it's got a fail from back in march as well | 16:04 |
clarkb | thinking about how console logs work, could it be a uuid collision? (that seems very unlikely, but we are only using the short version of the uuid in the path at least) | 16:05 |
pabelanger | ya | 16:05 |
clarkb | jeblair: ^ | 16:05 |
pabelanger | it also doesn't explain why http://logs.openstack.org/52/491352/1/gate/gate-puppet-mistral-puppet-lint/9b68f25/_zuul_ansible/scripts/07-4094c726a11441b9b73ac0c6dde28be6.sh was actually called | 16:05 |
pabelanger | because console log is different | 16:06 |
clarkb | pabelanger: ya that's why I'm wondering if it's a collision on the uuids and we ended up copying some old file left around on the launcher maybe | 16:06 |
pabelanger | maybe | 16:06 |
clarkb | except we clear those out too don't we? they don't have a life on the launcher beyond the job? | 16:06 |
pabelanger | let me look at zl02 | 16:07 |
pabelanger | see if anything is odd | 16:07 |
*** Apoorva has joined #openstack-infra | 16:07 | |
clarkb | mwhahaha: I highly doubt that the test node itself managed to survive for 8 months. Nodepool is pretty good about keeping things cleaned up after its timeouts so 8 months would be a long time to survive | 16:08 |
pabelanger | clarkb: look at the node ID | 16:08 |
mwhahaha | ¯\_(ツ)_/¯ stranger things have happened :D | 16:08 |
pabelanger | centos-7-infracloud-chocolate-6231138 | 16:08 |
pabelanger | that is way wrong | 16:08 |
pabelanger | centos-7-infracloud-vanilla-10324975 | 16:09 |
pabelanger | what it should have been | 16:09 |
pabelanger | so, I wonder if we somehow booted an old VM again in infracloud | 16:09 |
clarkb | pabelanger: ya that and the appended statuses makes me think there may be a collision somewhere and we end up picking up the old data | 16:09 |
clarkb | oh it could be cached on the remote somehow hrm? | 16:10 |
*** armax has quit IRC | 16:10 | |
*** armax has joined #openstack-infra | 16:11 | |
openstackgerrit | Gabriele Cerami proposed openstack-infra/tripleo-ci master: WIP: containers periodic test https://review.openstack.org/475747 | 16:11 |
openstackgerrit | Slawek Kaplonski proposed openstack-infra/project-config master: Add QoS service plugin to be enabled in shade tests https://review.openstack.org/491266 | 16:12 |
clarkb | in any case I think it likely is running the correct job but when the console log is collected we are grabbing the old file somehow | 16:12 |
clarkb | mwhahaha: ^ | 16:12 |
mwhahaha | well it failed after passing the check so who knows | 16:12 |
mwhahaha | i rechecked and we'll see | 16:12 |
pabelanger | clarkb: mwhahaha: so, job timed out for some reason. And zuul killed ansible, collected logs | 16:12 |
pabelanger | so, possible something was wrong with node | 16:13 |
pabelanger | SSH hostkeys didn't change, so it was the right node | 16:13 |
clarkb | we don't check hostkeys though | 16:13 |
pabelanger | which makes me think it was just a collision with logs | 16:13 |
pabelanger | clarkb: ya, we set them up | 16:13 |
clarkb | in 2.5? | 16:13 |
clarkb | pretty sure we don't | 16:14 |
pabelanger | ya, 1 sec | 16:14 |
clarkb | based on the sequence number of the node, that node definitely does look like it would've been booted in december though. So I think we are just getting logs from december somehow | 16:14 |
pabelanger | clarkb: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/launcher/ansiblelaunchserver.py#n1399 | 16:15 |
*** EricGonczer_ has quit IRC | 16:15 | |
clarkb | pabelanger: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/launcher/ansiblelaunchserver.py#n1283 | 16:16 |
fungi | i wonder if it could have been on a down hypervisor until very recently, and we ran into an ip address collision rather than a uuid collision (which seems far less likely) | 16:16 |
clarkb | so we don't have any idea what the hostkey should be we just grab whatever we get and trust it | 16:16 |
clarkb | so we aren't really checking it in a way to know we got the right node | 16:16 |
pabelanger | clarkb: Right, we keyscan to make sure node doesn't disappear between playbook runs | 16:16 |
pabelanger | but you are right, we blindly assume | 16:16 |
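What the launcher's keyscan amounts to is roughly the following trust-on-first-use pattern (a sketch, not the actual ansiblelaunchserver.py code; the address is made up):

```bash
NODE_IP=198.51.100.23   # hypothetical test node address
KNOWN_HOSTS=/tmp/known_hosts_$NODE_IP

# Record whatever key the node presents right now...
ssh-keyscan -t rsa,ecdsa "$NODE_IP" > "$KNOWN_HOSTS"

# ...so later playbook runs notice if the key changes mid-job,
ssh -o UserKnownHostsFile="$KNOWN_HOSTS" -o StrictHostKeyChecking=yes \
    "jenkins@$NODE_IP" hostname

# but nothing ever verifies that the key scanned in the first place belonged
# to the instance nova actually booted for us.
```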
clarkb | fungi: that is an interesting theory | 16:17 |
clarkb | fungi: basically arp wins on old hosts and we pick up preexisting console log there? | 16:17 |
fungi | though still surprising that nodepool's cleanup wouldn't have dealt with it given the way we tag instances | 16:17 |
pabelanger | clarkb: I wonder if we should try to match the node via hostname too? Nodepool sets it to centos-7-infracloud-vanilla-10324975, we could then have an ansible task validate the correct hostname | 16:18 |
fungi | it should be deleting old nodes it finds in the server list even if it has lost track of them in its db | 16:18 |
clarkb | fungi: ya but that runs on a 15 minute cron iirc so there is a window where your theory could happen | 16:18 |
clarkb | small window but possible | 16:18 |
pabelanger | or some other form of meta data in config-drive | 16:18 |
clarkb | pabelanger: the idea would be for nova/neutron to provide the hostkey to us then we check that | 16:18 |
clarkb | pabelanger: that work is slowly in progress aiui but nothing we can control today unfortunately | 16:19 |
pabelanger | clarkb: ya | 16:19 |
fungi | there is also the possibility to have glean echo the hostkey to the console on boot and then get nodepool to scrape it from the nova console log, but not all our providers support the necessary api method | 16:20 |
*** ccamacho has left #openstack-infra | 16:20 | |
*** ykarel has quit IRC | 16:22 | |
clarkb | we should be able to do a nova list and see if any nodes with ancient sequence numbers show up /me does this | 16:22 |
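Listing by sequence number from the nova side can be approximated like this (the cloud name follows the admin credentials used elsewhere in this log; the cutoff is up to the operator):

```bash
# Print server names keyed on their trailing nodepool sequence number and show
# the lowest ones; anything far below the current sequence is suspiciously old.
openstack --os-cloud admin-infracloud-vanilla server list --all-projects \
    -f value -c Name \
  | awk -F- '{print $NF, $0}' | sort -n | head -20
```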
*** krtaylor has joined #openstack-infra | 16:23 | |
*** lucasagomes is now known as lucas-afk | 16:24 | |
clarkb | ubuntu-xenial-infracloud-vanilla-8895313 | 16:25 |
clarkb | ubuntu-xenial-infracloud-chocolate-8911632 | 16:25 |
clarkb | those may be held nodes? | 16:25 |
clarkb | everything else looks fairly new | 16:26 |
fungi | what are the possibilities that there could be a lost instance which isn't tracked in nova's db, and so is squatting an ip address but never getting cleaned up since it doesn't appear in the server list? | 16:27 |
clarkb | nope they've been in a delete state for 81 and 78 days | 16:27 |
*** ggillies_ has quit IRC | 16:27 | |
clarkb | fungi: I'm guessing it is theoretically possible | 16:27 |
*** pcaruana has quit IRC | 16:27 | |
clarkb | but don't know enough about nova to know under what circumstances that could happen if any | 16:27 |
fungi | here in openstack, anything is theoretically possible! | 16:27 |
clarkb | we could run a virsh list --all and compare | 16:28 |
*** ggillies has joined #openstack-infra | 16:30 | |
*** dizquierdo has quit IRC | 16:31 | |
*** rcernin has quit IRC | 16:31 | |
*** dizquierdo has joined #openstack-infra | 16:31 | |
*** slagle has quit IRC | 16:31 | |
*** tesseract has quit IRC | 16:31 | |
pabelanger | fungi: any feedback from OVH and our collections? | 16:32 |
fungi | pabelanger: i hadn't seen anything back from jean-daniel yet as of a few minutes ago | 16:33 |
*** kjackal_ has quit IRC | 16:33 | |
fungi | looking back through the discussion history from the last time this happened, our most recent voucher expired in january and we began to get notifications at that time | 16:34 |
fungi | between them being in french and going to the infra-root@ address, which nobody was monitoring regularly until i started keeping an eye on it a couple months ago, i didn't realize these were how we're supposed to know it's time to re-up the voucher | 16:34 |
pabelanger | ack | 16:35 |
fungi | so once we get this squared away, the _next_ time we start getting messages in french from ovh to infra-root@ we should reach out to jean-daniel at that point to ask to have the voucher re-upped | 16:36 |
*** markvoelker has quit IRC | 16:36 | |
fungi | but it would be great if more people than just me set up their imap clients to keep an eye on the various mailboxes under that account too | 16:36 |
clarkb | scanning libvirt for rogue instances is not as easy as it sounds (or I'm missing the virsh command that tells you what the nova uuid is) | 16:38 |
*** LindaWang has joined #openstack-infra | 16:39 | |
clarkb | nova doesn't set instance descriptions | 16:40 |
openstackgerrit | Merged openstack-infra/irc-meetings master: Changed to correct chairs for the publiccloud_wg https://review.openstack.org/491769 | 16:40 |
fungi | i smell a ranty summit talk in the works | 16:40 |
clarkb | aha! virsh list --uuid | 16:41 |
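The per-hypervisor listing that this enables boils down to a loop like the one below; the hostnames are placeholders standing in for the real compute node list:

```bash
# Collect every libvirt domain uuid from each reachable compute node.
for host in compute009.vanilla.ic.openstack.org compute011.vanilla.ic.openstack.org; do
    echo "== $host"
    ssh "$host" 'sudo virsh list --all --uuid'
done | tee /tmp/hypervisor-uuids.txt
```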
*** bh526r has quit IRC | 16:41 | |
*** slaweq_ has quit IRC | 16:43 | |
*** dmsimard|afk is now known as dmsimard | 16:43 | |
*** LindaWang has quit IRC | 16:43 | |
*** alexchadin has joined #openstack-infra | 16:44 | |
*** pstack has quit IRC | 16:44 | |
*** slaweq has joined #openstack-infra | 16:45 | |
*** shardy has quit IRC | 16:46 | |
*** voipmonk has left #openstack-infra | 16:47 | |
*** annegentle has quit IRC | 16:49 | |
*** alexchadin has quit IRC | 16:49 | |
openstackgerrit | Matthew Treinish proposed openstack/os-testr master: Switch to stestr under the covers https://review.openstack.org/488441 | 16:50 |
*** rhallisey has joined #openstack-infra | 16:50 | |
*** Apoorva_ has joined #openstack-infra | 16:50 | |
*** derekh has quit IRC | 16:52 | |
*** rhallisey has quit IRC | 16:52 | |
*** yamahata has quit IRC | 16:52 | |
*** rhallisey has joined #openstack-infra | 16:52 | |
*** iyamahat has quit IRC | 16:53 | |
*** Apoorva has quit IRC | 16:53 | |
*** pstack has joined #openstack-infra | 16:53 | |
*** ralonsoh has quit IRC | 16:53 | |
*** trown is now known as trown|lunch | 16:54 | |
clarkb | fungi: pabelanger http://paste.openstack.org/show/617807/ that is what we have on reachable hypervisors | 16:58 |
clarkb | now to cross check against the nodepool logs to see if any don't belong | 16:58 |
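The cross-check itself is set subtraction; a sketch, assuming the hypervisor uuids were saved one per line and that the launcher logs live under /var/log/nodepool on the nodepool server (that path is an assumption):

```bash
# Everything on the hypervisors...
grep -Eo '[0-9a-f-]{36}' /tmp/hypervisor-uuids.txt | sort -u > /tmp/all-uuids

# ...minus anything nodepool launched recently or still tracks...
{ sudo grep -hoE '[0-9a-f-]{36}' /var/log/nodepool/*.log ;
  nodepool list | grep -oE '[0-9a-f-]{36}' ; } | sort -u > /tmp/known-uuids

# ...leaves the leak candidates (remember to drop the mirror node by hand).
comm -23 /tmp/all-uuids /tmp/known-uuids
```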
*** camunoz has quit IRC | 16:58 | |
fungi | i have a feeling i'm going to be disappointed but still unsurprised by the result | 16:58 |
*** camunoz has joined #openstack-infra | 16:59 | |
*** baoli has quit IRC | 17:00 | |
openstackgerrit | Chris Dent proposed openstack-infra/project-config master: Publish placement-api-ref https://review.openstack.org/491860 | 17:00 |
*** slaweq has quit IRC | 17:02 | |
clarkb | fungi: pabelanger http://paste.openstack.org/show/617809/ a few of them don't show in today's logs. Will cross check those against nova listings next | 17:03 |
clarkb | and I guess nodepool listings as they may be older than today | 17:03 |
*** tosky has quit IRC | 17:04 | |
*** rwsu has quit IRC | 17:04 | |
clarkb | http://paste.openstack.org/show/617811/ is that cleaned up a bit | 17:06 |
clarkb | only one of those shows up in nodepool listings | 17:08 |
clarkb | now we check nova | 17:08 |
*** baoli has joined #openstack-infra | 17:09 | |
*** iyamahat has joined #openstack-infra | 17:10 | |
fungi | funny, ovh says our instance with ip address 158.69.77.16 was reported conducting a brute force attack against someone's ssh server at 00:38:10 CEST today | 17:13 |
*** annegentle has joined #openstack-infra | 17:13 | |
fungi | i can't find evidence that nodepool's booted any instance with that ip address in the past ~10 days of launcher debug logs | 17:15 |
fungi | and it's not an ip address for anything in our ansible inventory | 17:16 |
clarkb | and I can't ssh to it implying it never was one of ours | 17:16 |
clarkb | or rather if it still was around it wasn't ours | 17:16 |
clarkb | fungi: pabelanger http://paste.openstack.org/show/617815/ all of those VMs appear to be leaked. Actually now that I say that, I didn't find which one is our mirror node so need to clean it out of the list | 17:17 |
*** dtantsur is now known as dtantsur|afk | 17:17 | |
clarkb | http://paste.openstack.org/show/617816/ doesn't include the mirror | 17:18 |
*** sree has quit IRC | 17:18 | |
clarkb | can you maybe double check that list and make sure I'm not missing some other VMs? but I think the next step is dumpxml on them to see if we can get any more info about why they exist, then possibly virsh destroy them | 17:19 |
clarkb | and virsh undefine them | 17:19 |
*** sree has joined #openstack-infra | 17:19 | |
clarkb | fungi: I also think ^ lends weight to your IP addr theory | 17:19 |
*** dizquierdo has quit IRC | 17:19 | |
clarkb | that first node says <nova:creationTime>2016-12-15 16:34:33</nova:creationTime> | 17:20 |
* fungi sighs | 17:20 | |
*** bobh has quit IRC | 17:21 | |
clarkb | which is suspiciously close to the log timestamp that mwhahaha pointed out | 17:21 |
clarkb | also it is a nodepool host according to dumpxml | 17:21 |
*** rbrndt has quit IRC | 17:21 | |
fungi | yeah, i have a sinking feeling something happened to the environment around that time and whatever was done to recover from it caused us to lose track of those | 17:21 |
fungi | given the close clustering of timestamps | 17:21 |
*** baoli has quit IRC | 17:22 | |
fungi | could also explain why we've been getting a little less performance out of it than we thought we should for the number of instances we were booting i suppose (though with them being idle, probably not) | 17:22 |
*** baoli has joined #openstack-infra | 17:22 | |
*** sree has quit IRC | 17:23 | |
clarkb | I'll gather what info I can for each one but ya I think we just delete them if nova doesn't know about them | 17:23 |
clarkb | (so please help me double check that aspect of it) | 17:23 |
*** hamzy has joined #openstack-infra | 17:24 | |
fungi | i agree, it's more a warning that we should have some way of spotting leaks in those clouds | 17:24 |
*** electrofelix has quit IRC | 17:24 | |
fungi | i don't see much reason to keep them, though i also don't know as much about what else might be a vm in that environment | 17:24 |
*** kjackal_ has joined #openstack-infra | 17:24 | |
fungi | does the bifrost deployment create virtual machines on the hypervisor nodes outside nova's control? | 17:25 |
fungi | seems unlikely, but i'm not too familiar with its architecture | 17:25 |
fungi | also, i guess if we log into each of them and they all look like test nodes, then deletesky | 17:26 |
fungi | any way to easily tease hostnames out of them? | 17:26 |
clarkb | I'm pretty sure bifrost doesn't | 17:28 |
clarkb | fungi: not sure, but we can in theory attach to their consoles | 17:28 |
*** jamesmcarthur has quit IRC | 17:28 | |
*** sambetts is now known as sambetts|afk | 17:29 | |
*** spzala has quit IRC | 17:29 | |
clarkb | of the 4 VMs I have dumpxml'd 3 are from 12/15 and one is from 12/14 | 17:31 |
clarkb | and they all use flavor nodepool | 17:31 |
clarkb | fungi: I'm not able to connect to the console, get error: internal error: character device console0 is not using a PTY so that may not be possible | 17:33 |
clarkb | oh that is because nova redirects it to a file which we can read | 17:33 |
clarkb | the one on 009 is ubuntu-xenial-infracloud-chocolate-6205157 | 17:33 |
clarkb | now to cross check all these against mwhahaha's log | 17:33 |
mwhahaha | did i find some long lost vms? :D | 17:34 |
*** yamahata has joined #openstack-infra | 17:34 | |
fungi | mwhahaha: you taught us that apparently nova leaks like a sieve ;) | 17:34 |
mwhahaha | :o | 17:35 |
fungi | not really like a sieve. looks like we had some issue back in mid-december that caused us to lose track of some dozen or so instances in that cloud | 17:35 |
clarkb | ugh the one on 12 appears to be a failed boot and its just appending to its log file constantly | 17:35 |
mwhahaha | sounds like we need to invest in some flex tape | 17:35 |
*** pstack has quit IRC | 17:35 | |
fungi | mwhahaha: if it works like in the infomercials, i'll pick up a few cases | 17:36 |
*** florianf has quit IRC | 17:36 | |
clarkb | it is 4GB large | 17:36 |
*** 94KAA7YW9 has joined #openstack-infra | 17:37 | |
*** sbezverk_ has quit IRC | 17:37 | |
*** markvoelker has joined #openstack-infra | 17:37 | |
clarkb | centos-7-infracloud-chocolate-6200254 on 011 | 17:37 |
clarkb | ubuntu-xenial-infracloud-chocolate-6200221 on 013 | 17:38 |
pabelanger | clarkb: wow, nice work | 17:38 |
clarkb | ubuntu-xenial-infracloud-chocolate-6193047 on 028 | 17:39 |
clarkb | ubuntu-xenial-infracloud-chocolate-6192475 on 026 | 17:40 |
*** sbezverk has joined #openstack-infra | 17:42 | |
clarkb | the node on 024 isn't running | 17:42 |
clarkb | ubuntu-xenial-infracloud-chocolate-6198347 on 036 | 17:43 |
*** markvoelker has quit IRC | 17:44 | |
*** slagle has joined #openstack-infra | 17:45 | |
*** alexchadin has joined #openstack-infra | 17:45 | |
* clarkb stops posting all of them here (I think this is enough info to show the leaks are from nodepool and such) | 17:45 | |
*** annegentle has quit IRC | 17:45 | |
fungi | agreed, no need to keep any of those in my opinion | 17:46 |
*** pradk has joined #openstack-infra | 17:48 | |
clarkb | I don't see the node that ran mwhahaha's job though. So possibly that is the one that is appending to its console log such that I can't really see what it is. Going to try and grep through that log now | 17:49 |
*** alexchadin has quit IRC | 17:49 | |
clarkb | sdague: dansmith in the case of nova "leaking" libvirt VMs. Is it safe to virsh destroy and undefine the nodes under nova? Or are there bits of the database we should check as well? | 17:51 |
*** Swami has joined #openstack-infra | 17:52 | |
*** pradk has quit IRC | 17:52 | |
*** bobh has joined #openstack-infra | 17:52 | |
clarkb | I am going to start by virsh shutdowning the instances so they stop trying to do things | 17:52 |
clarkb | or maybe even that isn't safe it if gets nova out of sync somewhere? | 17:53 |
*** florianf has joined #openstack-infra | 17:53 | |
dansmith | clarkb: nova is leaking libvirt vms? how? | 17:54 |
fungi | seems to me like nova's already out of sync? | 17:54 |
clarkb | dansmith: we don't know how, but there are VMs from December in infracloud that don't show up in nova listings | 17:54 |
mnaser | clarkb that behaviour can happen if things get messy in the cloud | 17:54 |
clarkb | dansmith: they are all from december 14 and 15 so guessing something went sideways | 17:54 |
mnaser | do you get warnings in the nova compute log with VM and database count not matching? | 17:54 |
clarkb | mnaser: let me see | 17:55 |
dansmith | clarkb: you can tell nova to reap things it doesn't know about, but I also don't know that I've ever seen that happen | 17:55 |
fungi | dansmith: best guess is that cloud suffered some sort of trauma months ago and didn't record these in the db or was unable to fully destroy them when it deleted them. also keep in mind this is still old code (mitaka-based i believe?) | 17:55 |
*** tnovacik has joined #openstack-infra | 17:55 | |
clarkb | mnaser: 2017-08-08 17:52:56.780 12787 WARNING nova.compute.manager [req-b15f50fc-47af-4f7e-971a-40d3e232d89e - - - - -] While synchronizing instance power states, found 0 instances in the database and 1 instances on the hypervisor. yup | 17:55 |
mnaser | there ya go | 17:55 |
mnaser | those warnings will be your big hint | 17:55 |
mnaser | and as dansmith said there is a setting to delete them but i'm too scared to run that so manual cleanup would be easier | 17:56 |
clarkb | mnaser: given that indicates the db doesn't know about the instance I should be safe to just destroy it manually ya? | 17:56 |
fungi | good that we have something to look for in the future | 17:56 |
mnaser | i mean if you want to verify | 17:56 |
mnaser | virsh dumpxml <foo> the ID of the VM | 17:56 |
mnaser | will be the instance uuid | 17:56 |
mnaser | you can cross check that with the nova instances table | 17:56 |
mnaser | it should be marked as deleted, if it is, you can virsh destroy <foo> and delete remains in /var/lib/nova/instances (that's what i do most of the time and nothing blew up) | 17:57 |
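Put together, the verification and cleanup mnaser describes looks roughly like this; the uuid is a placeholder and the database step assumes you are on a controller with credentials for the nova DB:

```bash
UUID=00000000-0000-0000-0000-000000000000   # substitute the stray domain's uuid

# 1. Confirm it really is an old nova instance (name, flavor, creation time).
sudo virsh dumpxml "$UUID" | grep -E 'nova:(name|creationTime|flavor)'

# 2. Cross-check the nova instances table; deleted (or absent) means nova has
#    already forgotten about it.
mysql nova -e "SELECT uuid, vm_state, deleted FROM instances WHERE uuid='$UUID';"

# 3. If so, stop and remove the domain, then clean up what it left on disk.
sudo virsh destroy "$UUID"
sudo virsh undefine "$UUID"
sudo rm -rf "/var/lib/nova/instances/$UUID"
```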
*** jrist has quit IRC | 17:57 | |
clarkb | mnaser: does virsh undefine not clean up /var/lib/nova/instances? | 17:57 |
dansmith | clarkb: not images | 17:58 |
clarkb | ah ok | 17:58 |
clarkb | I will start with a shutdown of the instances across the board so they stop running at least then we can go through and clean up | 17:58 |
*** pushkaraj__ has joined #openstack-infra | 17:58 | |
*** 94KAA7YW9 has quit IRC | 17:58 | |
*** pvaneck has joined #openstack-infra | 17:59 | |
*** spzala has joined #openstack-infra | 17:59 | |
*** spzala has quit IRC | 18:01 | |
*** rbrndt has joined #openstack-infra | 18:01 | |
*** spzala has joined #openstack-infra | 18:01 | |
*** makowals has quit IRC | 18:06 | |
clarkb | mnaser: I shouldn't need to modify the nova db at all right? just clean up hypervisor disk contents? | 18:07 |
mnaser | clarkb correct, if the API is returning "instance does not exist" it means for all nova knows, that VM is supposed to be terminated | 18:08 |
clarkb | perfect, thanks | 18:09 |
clarkb | I'm almost done shutting down/destroying the instances so they stop running. Does anyone else want to look at infracloud and see if we have the same problem there? | 18:11 |
clarkb | fungi: pabelanger ^ | 18:11 |
*** makowals has joined #openstack-infra | 18:13 | |
mnaser | clarkb https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L6639-L6714 | 18:13 |
clarkb | fungi: pabelanger first step was running `sudo virsh list --all --uuid` against all the reachable hypervisors. Then take that list and remove any nodes that show up in today's nodepool log or in nodepool list data. Then remove our mirror node, then cross check against the nova listing | 18:14 |
*** slaweq has joined #openstack-infra | 18:14 | |
pabelanger | clarkb: not at the moment, chasing down zuul change queue questions | 18:14 |
mnaser | looks like that code was added 4-6 years ago | 18:14 |
mnaser | and the option to look for is running_deleted_instance_action | 18:15 |
mnaser | so you can set that to log or shutdown (or reap if you want) | 18:15 |
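For reference, that option lives in the [DEFAULT] section of nova.conf on each compute node; a minimal sketch of setting it (the service name varies by distro, so treat that last line as an assumption):

```bash
# What nova-compute should do with instances still running on the hypervisor
# but marked deleted in the database: noop, log, shutdown, or reap.
sudo crudini --set /etc/nova/nova.conf DEFAULT running_deleted_instance_action shutdown
sudo systemctl restart nova-compute   # openstack-nova-compute on RHEL-family hosts
```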
*** kjackal_ has quit IRC | 18:15 | |
fungi | clarkb: i can probably take a look after the infra meeting | 18:15 |
clarkb | mnaser: oh if shutdown is an option we probably want to set it to that. Thank you | 18:15 |
*** EricGonczer_ has joined #openstack-infra | 18:16 | |
mnaser | the odd thing is it seems to default to reap | 18:16 |
clarkb | (though I'm finding our images don't shutdown they have to be destroyed... guessing -minimal image builds don't have the acpi bits to handle a graceful shutdown request) | 18:16 |
mnaser | so it should be deleting them.. somehow | 18:16 |
mnaser | but i guess its not | 18:16 |
* mnaser shrugs | 18:16 | |
clarkb | ya double checking our config we don't seem to set any value | 18:17 |
mnaser | i dont know enough to know why its not getting reap'd but i know we dont have it set to anything (i think) and we get orphan instances sometimes | 18:17 |
*** jamesmcarthur has joined #openstack-infra | 18:18 | |
clarkb | ok I've got all of them in a non running state | 18:19 |
clarkb | in chocolate | 18:19 |
clarkb | mnaser: so you are saying clear out the content in /var/lib/nova/instances and virsh undefine the domains? | 18:20 |
pabelanger | jeblair: question on changequeue merging: http://logs.openstack.org/33/491633/1/gate/gate-project-config-layout/3bc9763/console.html#_2017-08-08_10_50_23_258797 The reason networking-bagpipe is merged into the other tripleo jobs is because it shares a job gate-tempest-dsvm-networking-bgpvpn-bagpipe-ubuntu-xenial with networking-bgpvpn, which in turn shares a job with tripleo-ci? | 18:20 |
mnaser | clarkb yep, and just to be clear the contents of that specific instance id, not all of /var/lib/nova/instances :p | 18:20 |
clarkb | ya | 18:21 |
clarkb | /var/lib/nova/instances/$uuid | 18:21 |
mnaser | yeah that should be okay | 18:22 |
clarkb | ok compute064 is done | 18:22 |
clarkb | I'm going to tail the nova compute log there and just double check we don't get that warning again or any new errors before going to other hypervisors | 18:23 |
*** jamesmcarthur has quit IRC | 18:23 | |
clarkb | mnaser: dansmith thanks for the help | 18:24 |
*** jamesmcarthur has joined #openstack-infra | 18:24 | |
mnaser | np | 18:24 |
*** trown|lunch is now known as trown | 18:30 | |
*** jamesmcarthur has quit IRC | 18:31 | |
*** jamesmcarthur has joined #openstack-infra | 18:32 | |
*** nicolasbock has quit IRC | 18:32 | |
clarkb | logs look good on 064 and we have successfully booted new instances in that cloud. Going to move forward finishing up cleanup for the rest of these VMs | 18:34 |
*** dprince has quit IRC | 18:36 | |
openstackgerrit | Mathieu Gagné proposed openstack-infra/project-config master: Bump internap-mtl01 capacity to 190 https://review.openstack.org/491882 | 18:37 |
*** florianf has quit IRC | 18:39 | |
*** jascott1 has joined #openstack-infra | 18:39 | |
pabelanger | mgagne: +2 | 18:40 |
pabelanger | and danke! | 18:40 |
mgagne | =) | 18:40 |
*** markvoelker has joined #openstack-infra | 18:40 | |
fungi | thanks clarkb and mnaser! | 18:41 |
fungi | big thanks mgagne! | 18:42 |
*** alexchadin has joined #openstack-infra | 18:46 | |
*** markvoelker has quit IRC | 18:48 | |
*** EricGonczer_ has quit IRC | 18:48 | |
*** Apoorva_ has quit IRC | 18:49 | |
*** Apoorva has joined #openstack-infra | 18:50 | |
*** alexchadin has quit IRC | 18:50 | |
*** EricGonczer_ has joined #openstack-infra | 18:51 | |
clarkb | ok I think chocolate is all cleaned up assuming my list of leaked VMs was complete | 18:53 |
openstackgerrit | wes hayutin proposed openstack-infra/tripleo-ci master: fix random broken pipe on du command https://review.openstack.org/491884 | 18:54 |
clarkb | libvirt domains are all undefined and the nova instances dirs for each have been deleted | 18:54 |
*** florianf has joined #openstack-infra | 18:54 | |
clarkb | fungi: I can likely tackle vanilla after meeting and lunch but would be good if more than one person is familiar with this :) | 18:55 |
fungi | clarkb: i don't disagree, though i' | 18:56 |
fungi | ve done so little with infra-cloud so far that my learning curve will be steeeeep | 18:56 |
clarkb | I'll walk you through it :) | 18:57 |
fungi | much appreciated | 18:57 |
clarkb | this particular issue isn't too bad especially once mnaser pointed out that log warning we can grep for | 18:57 |
fungi | there's no tc meeting today, so in an hour i should have time to take it for a spin | 18:57 |
clarkb | just lots of listing and cross referencing stuff | 18:57 |
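That warning also gives a cheap way to spot future leaks without the full listing exercise; a sketch, assuming a host list file and the usual nova-compute log location:

```bash
# Count db/hypervisor mismatch warnings on each compute node; non-zero counts
# are worth a closer look.
while read -r host; do
    printf '%s: ' "$host"
    ssh "$host" \
      "sudo grep -c 'While synchronizing instance power states, found' /var/log/nova/nova-compute.log"
done < /tmp/compute-hosts   # one hostname per line; the file name is an assumption
```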
jeblair | pabelanger: that sounds plausible, though i haven't looked at the details. that is how the queue merging works. | 18:58 |
dmsimard | The publishers I see in project-config are all scp, ftp, afs -- is there no way to run shell inside a publisher? I see one instance of "postbuildscript" used here: https://github.com/openstack-infra/project-config/blob/master/jenkins/jobs/infra.yaml#L299 but I doubt it works | 18:58 |
jeblair | dmsimard: yes that's complicated and best avoided. | 18:59 |
jeblair | dmsimard: super easy in v3. | 18:59 |
*** vhosakot has quit IRC | 18:59 | |
dmsimard | jeblair: context is to run log collection outside of the job and inside a publisher instead so that if the job times out, logs are available | 19:00 |
fungi | it's that (weekly infra team meeting) time again! find us in #openstack-meeting for the next hour | 19:00 |
clarkb | dmsimard: the way we do that with devstack-gate is to timeout the main test process 5 minutes before the job timeout | 19:00 |
clarkb | dmsimard: then you have 5 minutes to collect logs | 19:00 |
jeblair | dmsimard: right. devstack-gate has support for that. | 19:00 |
jeblair | dmsimard: otherwise, there isn't a good way for that in v2. | 19:00 |
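A minimal sketch of that pattern for a job outside devstack-gate, assuming the overall job timeout is exposed to the script and the two helper scripts are stand-ins:

```bash
JOB_TIMEOUT_MINUTES=${JOB_TIMEOUT_MINUTES:-120}    # assumed to be provided by the job
TEST_BUDGET=$(( (JOB_TIMEOUT_MINUTES - 5) * 60 ))  # leave 5 minutes for log collection

# Bound the main test run so it can never eat the whole job budget.
timeout -s TERM "${TEST_BUDGET}s" ./run-tests.sh
RET=$?

# Even if the tests hit the soft timeout above, there is still time to publish logs.
./collect-logs.sh
exit $RET
```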
*** slaweq has quit IRC | 19:00 | |
dmsimard | clarkb: yikes | 19:00 |
*** vhosakot has joined #openstack-infra | 19:00 | |
dmsimard | okay, then, thanks :) | 19:00 |
*** slaweq has joined #openstack-infra | 19:01 | |
openstackgerrit | wes hayutin proposed openstack-infra/tripleo-ci master: fix random broken pipe on du command https://review.openstack.org/491884 | 19:03 |
*** vhosakot has quit IRC | 19:05 | |
*** slaweq has quit IRC | 19:06 | |
*** sslypushenko_ has joined #openstack-infra | 19:06 | |
*** slaweq has joined #openstack-infra | 19:06 | |
*** vhosakot has joined #openstack-infra | 19:10 | |
*** kjackal_ has joined #openstack-infra | 19:20 | |
*** baoli has quit IRC | 19:22 | |
sdague | fungi / clarkb interesting git review edge case I just ran into | 19:23 |
sdague | 3 patch series, in merge conflict | 19:23 |
sdague | rebase the first patch in gerrit ui, second is a merge conflict so you can't | 19:23 |
sdague | pull them down, rebase on master | 19:23 |
sdague | git review... failed | 19:23 |
sdague | because the bottom ref did not change | 19:23 |
sdague | it will not push the other two | 19:24 |
clarkb | sdague: what you can do is rebase the other two onto the updated base and that should work | 19:24 |
fungi | strange, if the bottom patch isn't different from what's already in gerrit, it should ignore it and push the others | 19:24 |
clarkb | because then it will have the same sha1 and not attempt a zero delta update (it will just recognize it as existing) | 19:24 |
fungi | oh, yeah if you somehow changed the bottom sha locally after that | 19:25 |
clarkb | fungi: its different in its sha1 because of timestamps and such but the patch diff is nil | 19:25 |
fungi | right | 19:25 |
sdague | http://paste.openstack.org/show/617823/ | 19:25 |
sdague | yeh | 19:25 |
sdague | I ended up just making a random change in the base patch in the gerrit ui | 19:25 |
fungi | i agree that's a tough one to automate away | 19:25 |
sdague | then pushed over | 19:25 |
sdague | yeh, it's definitely very edge case | 19:25 |
clarkb | you don't need to make a random change in the base patch | 19:25 |
sdague | but it seemed interesting enough to at least tell someone | 19:25 |
clarkb | you just have to rebase the second and third patches onto what is in gerrit for the first patch | 19:26 |
sdague | clarkb: sure, but that's actually more work than gerrit random change && git review | 19:26 |
clarkb | this is also why git review -x can be problematic because you can easily end up with updates to changes that are considered nil changes | 19:26 |
clarkb | its also not something git review can really do anything about, its gerrit behavior | 19:26 |
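Concretely, the rebase clarkb describes means replaying the two child patches onto the exact commit gerrit created when the bottom change was rebased in the UI, so the bottom sha matches what gerrit already has; a sketch with placeholder change/patchset numbers:

```bash
# Fetch the gerrit-side rebase of the bottom change (hypothetical ref) and
# replay only the top two local patches onto it.
git fetch origin refs/changes/56/123456/2
git rebase --onto FETCH_HEAD HEAD~2

# The bottom commit now matches what's in gerrit, so only the two updated
# children get pushed instead of the whole series being refused.
git review
```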
sdague | yep, that's fine | 19:27 |
sdague | like I said, it's just an interesting edge condition | 19:27 |
fungi | more work, but does avoid yet one more patchset on that change at least | 19:29 |
fungi | so maybe a tradeoff | 19:30 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 19:30 |
sdague | yeh, at this point I was optimizing for time | 19:32 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Add mapping file containing v2 to v3 mappings https://review.openstack.org/491804 | 19:39 |
*** markvoelker has joined #openstack-infra | 19:44 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add comments about base jobs https://review.openstack.org/491897 | 19:45 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync https://review.openstack.org/487683 | 19:49 |
*** markvoelker has quit IRC | 19:51 | |
*** pushkaraj__ has quit IRC | 19:56 | |
clarkb | fungi: with meeting winding down. `OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml venv/bin/openstack --os-cloud admin-infracloud-vanilla compute service list` is what you run to get a list of all the nova services, we want to filter out the compute hosts from that. Then for each up compute host I ran an ssh for loop to get `echo $hostname && sudo virsh list --all --uuid` | 19:56 |
clarkb | fungi: for the computes that are down I manually attempted sshing to them and in chocolate none of them responded | 19:57 |
clarkb | then with that list I removed any uuid that showed up in the nodepool launcher log from today and any that show up in nodepool list. Also remove the mirror node | 19:57 |
clarkb | fungi: then for the remaining uuids I ssh'd into each compute host hosting one of them and ran virsh dumpxml $uuid to get info about the node (shows you the flavor and creation time) | 19:58 |
fungi | clarkb: from the puppetmaster presumably | 19:58 |
clarkb | once you've confirmed via dumpxml that the instances are old and not needed, you do `virsh undefine $uuid` and then delete /var/lib/nova/instances/$uuid | 19:58 |
pabelanger | neat, just got Your OpenStack Summit Sydney Registration Code email | 19:59 |
clarkb | fungi: the nodepool checking I did on nodepool.o.o but the nova list on puppetmaster yes | 19:59 |
fungi | pabelanger: from me or the full discount one? | 19:59 |
pabelanger | fungi: from kendall@openstack.org | 20:00 |
fungi | pabelanger: okay, so the full one. good ;) | 20:00 |
fungi | since you were a ptg attendee in atlanta you shouldn't have gotten one from me | 20:00 |
clarkb | fungi: I'm going to grab food now but will watch irc if you have questions about ^ | 20:00 |
*** pushkaraj__ has joined #openstack-infra | 20:00 | |
fungi | i only sent the us$300 codes, and those were de-duped to not include ptg attendees | 20:01 |
*** baoli has joined #openstack-infra | 20:01 | |
fungi | clarkb: will do, this is enough to get me started, thanks! | 20:01 |
pabelanger | fungi: cool | 20:01 |
fungi | clarkb: quick question though, any reason you're calling openstackclient from a virtualenv rather than using the globally-installed one we have on the puppetmaster? | 20:01 |
*** baoli_ has joined #openstack-infra | 20:02 | |
fungi | i've usually used the globally installed one, and it's been working fine, but wondering if i shouldn't be for some reason | 20:02 |
fungi | looks like we have 7 compute hosts down in vanilla | 20:03 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 20:03 |
pabelanger | fungi: that sounds about right | 20:04 |
fungi | checking the down ones, so far they don't even respond to ping (while the working ones do) | 20:04 |
*** baoli has quit IRC | 20:05 | |
*** pushkaraj__ has quit IRC | 20:05 | |
clarkb | fungi: no reason for the venv, I like controlling the client versions | 20:05 |
fungi | okay, so not for any particular bug | 20:06 |
pabelanger | fungi: ya, I know the last few compute hosts I couldn't access via ilo either. | 20:06 |
fungi | compute035.vanilla.ic is responding to ping but refused my first ssh connection (tcp rst) | 20:06 |
fungi | now it's timing out subsequent ssh attempts | 20:07 |
*** jamesdenton has quit IRC | 20:07 | |
fungi | wonder if i asploded it trying to ssh in | 20:07 |
pabelanger | I know a few had HDDs that look to be dying | 20:07 |
fungi | nah, some ssh attempts are refused by it, others time out | 20:07 |
fungi | any good way to make ironic reboot these? openstack baremetal reboot or something? | 20:08 |
fungi | public endpoint for baremetal service in RegionOne region not found | 20:09 |
fungi | poop | 20:09 |
*** jamesdenton has joined #openstack-infra | 20:10 | |
fungi | looks like maybe we don't have it in the catalog. i'm trying our instructions for hitting it from the controller | 20:11 |
fungi | that seems to get it | 20:11 |
*** e0ne has joined #openstack-infra | 20:12 | |
pabelanger | fungi: clarkb: so, I think we might have a bad hypervisor in citycloud-lon1, incoming logstash query | 20:13 |
pabelanger | http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22%5C%5C%5C%22msg%5C%5C%5C%22%3A%20%5C%5C%5C%22Timer%20expired%5C%5C%5C%22%5C%22%20AND%20message%3A%5C%22%5C%5C%5C%22rc%5C%5C%5C%22%3A%20257%5C%22%20AND%20filename%3A%5C%22console.html%5C%22%20AND%20voting%3A1&from=864000s | 20:13 |
clarkb | ya you have to hit it on the main baremetal node | 20:13 |
clarkb | brcause bifrost is not a full openstack deployment | 20:14 |
clarkb | so no real auth and isnt exposed eith thr other apis | 20:14 |
pabelanger | fungi: clarkb: we should see about passing the info to citycloud and having them confirm | 20:14 |
clarkb | pabelanger: we probably want to give them our VM uuids so they can track it to the hypervisor | 20:14 |
pabelanger | Ya, I can get a list here in a few minutes | 20:14 |
fungi | gonna try to reboot all the controllers that i can't ssh into | 20:15 |
fungi | er, compute nodes i mean | 20:15 |
fungi | all the ones listed by nova as being down anyway | 20:15 |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Use new syntax for base jobs https://review.openstack.org/491906 | 20:16 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Remove base job https://review.openstack.org/491907 | 20:16 |
*** jkilpatr has quit IRC | 20:16 | |
*** jamesmcarthur has quit IRC | 20:19 | |
*** kgiusti has left #openstack-infra | 20:19 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Require a base job https://review.openstack.org/491610 | 20:20 |
pabelanger | clarkb: last 4 UUIDs http://paste.openstack.org/show/617830/ | 20:21 |
pabelanger | I'll compose an email shortly | 20:22 |
*** e0ne has quit IRC | 20:23 | |
*** adisky__ has quit IRC | 20:23 | |
fungi | i've confirmed i couldn't ssh into any of the down compute nodes in vanilla, and have asked ironic to reboot them. giving it a few minutes (none are up in nova's service list just yet) | 20:25 |
*** jamesmcarthur has joined #openstack-infra | 20:25 | |
*** kjackal_ has quit IRC | 20:25 | |
pabelanger | fungi: thanks! | 20:25 |
fungi | after that i'll start collecting instance lists | 20:26 |
*** jcoufal has quit IRC | 20:27 | |
*** jamesmcarthur has quit IRC | 20:28 | |
pabelanger | clarkb: fungi: emails sent | 20:31 |
*** rossella_s has quit IRC | 20:31 | |
fungi | thanks pabelanger! | 20:33 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable https://review.openstack.org/491915 | 20:33 |
fungi | oh, hey, compute40 came back online after rebooting | 20:33 |
*** rossella_s has joined #openstack-infra | 20:35 | |
clarkb | pabelanger: thanks, I see it | 20:36 |
clarkb | fungi: nice | 20:36 |
*** jkilpatr has joined #openstack-infra | 20:37 | |
*** e0ne has joined #openstack-infra | 20:37 | |
*** jamesmcarthur has joined #openstack-infra | 20:38 | |
*** felipemonteiro__ has quit IRC | 20:40 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable https://review.openstack.org/491915 | 20:43 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable https://review.openstack.org/491915 | 20:44 |
fungi | compute35 came back up into a state where it again responds to ping and refuses ssh access. no clue what's up there | 20:44 |
fungi | maybe it's missing a host key or something | 20:44 |
*** marst_ has quit IRC | 20:44 | |
sshnaidm | clarkb, fungi fyi, solution for bug with multiple /etc is merging here: https://review.openstack.org/#/c/481233/ | 20:45 |
openstackgerrit | Merged openstack-infra/project-config master: Bump internap-mtl01 capacity to 190 https://review.openstack.org/491882 | 20:47 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable https://review.openstack.org/491915 | 20:49 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable https://review.openstack.org/491915 | 20:50 |
*** baoli_ has quit IRC | 20:50 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 20:50 |
*** krtaylor has quit IRC | 20:51 | |
fungi | sshnaidm: that looks like it could be an effective reduction. thanks | 20:52 |
*** sbezverk has quit IRC | 20:53 | |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Add mapping file containing v2 to v3 mappings https://review.openstack.org/491804 | 20:55 |
*** dprince has joined #openstack-infra | 20:55 | |
openstackgerrit | Matthew Treinish proposed openstack/os-testr master: Switch to stestr under the covers https://review.openstack.org/488441 | 20:55 |
*** e0ne has quit IRC | 20:55 | |
*** marst has joined #openstack-infra | 20:55 | |
openstackgerrit | Merged openstack-infra/project-config master: Use new syntax for base jobs https://review.openstack.org/491906 | 20:56 |
fungi | clarkb: okay, i've confirmed that the remaining down computes after attempting to reboot them are still inaccessible via ssh, so proceeding to the instance lists collection phase | 20:57 |
clarkb | ok | 20:57 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 20:58 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Create fetch-tox-output role https://review.openstack.org/490643 | 20:59 |
*** camunoz has quit IRC | 21:01 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 21:01 |
*** spzala has quit IRC | 21:02 | |
*** iyamahat has quit IRC | 21:02 | |
*** baoli has joined #openstack-infra | 21:03 | |
*** trown is now known as trown|outtypewww | 21:03 | |
*** iyamahat has joined #openstack-infra | 21:03 | |
fungi | clarkb: pabelanger: should i find it odd that puppetmaster isn't recognizing the ssh host keys for a lot of infracloud vanilla compute nodes? | 21:04 |
fungi | starting to wonder how ansible has been dealing with them | 21:04 |
*** dprince has quit IRC | 21:05 | |
clarkb | ya I would expect the root user to be able to ssh to them as part of the ansibling | 21:05 |
clarkb | I ssh'ed from my local desktop when doing the virsh listings in chocolate though | 21:05 |
fungi | d'oh, operator error | 21:06 |
fungi | i was missing the sudo on ssh | 21:06 |
fungi | so it was trying to add them to my ~/.ssh/known_hosts on puppetmaster | 21:07 |
openstackgerrit | Gabriele Cerami proposed openstack-infra/tripleo-ci master: WIP: containers periodic test https://review.openstack.org/475747 | 21:08 |
*** yamamoto has joined #openstack-infra | 21:09 | |
fungi | clarkb: optimization, for my own sense of laziness... gonna generate two uuid lists a little while apart and only check entries which appear in both lists | 21:10 |
fungi | need to grab a bite to eat anyway, so i'll put that delay to good use | 21:10 |
clarkb | ok | 21:10 |
openstackgerrit | Paul Belanger proposed openstack-infra/openstack-zuul-jobs master: WIP: Add upload-pypi job https://review.openstack.org/491926 | 21:11 |
clarkb | fungi: another way is to do xml parsing and only look at domains for which the creation time is older than say a week | 21:12 |
clarkb | but that is likely far more work because xml | 21:12 |
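fungi's double-sample shortcut is just an intersection of two snapshots; a sketch, where collect_uuids is a stand-in for the ssh/virsh loop sketched earlier and writes one uuid per line:

```bash
# Snapshot, wait, snapshot again; anything present in both samples has lived
# long enough to be worth investigating, short-lived test nodes drop out.
collect_uuids > /tmp/uuids-pass1    # collect_uuids is a hypothetical helper
sleep 1800
collect_uuids > /tmp/uuids-pass2

comm -12 <(sort -u /tmp/uuids-pass1) <(sort -u /tmp/uuids-pass2) > /tmp/uuid-candidates
```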
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 21:13 |
clarkb | fungi: pabelanger we might also want to follow up with citycloud on the state of sto2 (I think it was that region) as that is 50 instances of quota we are not able to use currently | 21:13 |
*** ldnunes_ has quit IRC | 21:15 | |
*** sslypushenko_ has quit IRC | 21:16 | |
pabelanger | clarkb: Ya, I haven't heard anything back myself | 21:16 |
*** camunoz has joined #openstack-infra | 21:17 | |
*** slaweq has quit IRC | 21:18 | |
openstackgerrit | Paul Belanger proposed openstack-infra/openstack-zuul-jobs master: WIP: Add upload-pypi job https://review.openstack.org/491926 | 21:18 |
*** jamesmcarthur has quit IRC | 21:20 | |
*** EricGonczer_ has quit IRC | 21:21 | |
*** yamamoto has quit IRC | 21:21 | |
*** sbezverk has joined #openstack-infra | 21:22 | |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Zuulv3: update sql reporter syntax https://review.openstack.org/491932 | 21:23 |
jeblair | pabelanger, mordred: ^ we need that in to restart zuulv3 | 21:23 |
pabelanger | +2 | 21:25 |
*** rockyg has joined #openstack-infra | 21:27 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add comments about base jobs https://review.openstack.org/491897 | 21:29 |
*** annegentle has joined #openstack-infra | 21:32 | |
*** felipemonteiro_ has joined #openstack-infra | 21:33 | |
*** felipemonteiro__ has joined #openstack-infra | 21:35 | |
*** pvaneck_ has joined #openstack-infra | 21:35 | |
*** markvoelker has joined #openstack-infra | 21:37 | |
*** thorst has quit IRC | 21:37 | |
*** pvaneck has quit IRC | 21:38 | |
*** felipemonteiro_ has quit IRC | 21:38 | |
openstackgerrit | Merged openstack-infra/project-config master: Zuulv3: update sql reporter syntax https://review.openstack.org/491932 | 21:43 |
*** markvoelker has quit IRC | 21:44 | |
*** jascott1 has quit IRC | 21:45 | |
*** jascott1 has joined #openstack-infra | 21:45 | |
*** jascott1 has quit IRC | 21:46 | |
*** jascott1 has joined #openstack-infra | 21:46 | |
*** jascott1 has quit IRC | 21:49 | |
*** jascott1 has joined #openstack-infra | 21:50 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion https://review.openstack.org/491805 | 21:53 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Run zuul-migrate job on changes to mapping file https://review.openstack.org/491937 | 21:54 |
mordred | jeblair, pabelanger: ^^ also there's the project-config change to run that job on changes to the mapping file | 21:54 |
*** jascott1 has quit IRC | 21:54 | |
*** yamamoto has joined #openstack-infra | 21:59 | |
*** florianf has quit IRC | 21:59 | |
*** dprince has joined #openstack-infra | 21:59 | |
*** priteau has quit IRC | 22:02 | |
*** esberglu has quit IRC | 22:03 | |
*** markvoelker has joined #openstack-infra | 22:04 | |
*** esberglu has joined #openstack-infra | 22:04 | |
*** markvoelker_ has joined #openstack-infra | 22:05 | |
*** esberglu has quit IRC | 22:08 | |
*** markvoelker has quit IRC | 22:09 | |
clarkb | I eyeballed the PTG walk poorly. its .8 miles according to google (still walkable but quite a bit more than 1/4 mile) | 22:14 |
*** camunoz has quit IRC | 22:16 | |
*** esberglu has joined #openstack-infra | 22:16 | |
fungi | yeah, no concerns with that on my part | 22:17 |
fungi | my luggage is a backpack, so i could do miles on foot with it uphill if needed | 22:17 |
*** slaweq has joined #openstack-infra | 22:19 | |
*** slaweq has quit IRC | 22:24 | |
*** rockyg has quit IRC | 22:24 | |
openstackgerrit | Tim Burke proposed openstack-infra/project-config master: Add release notes jobs for python-swiftclient https://review.openstack.org/491940 | 22:26 |
*** Julien-zte has joined #openstack-infra | 22:27 | |
jeblair | clarkb, mordred, fungi: are there other reports of infracloud being slow? | 22:29 |
clarkb | jeblair: I think we've heard it about other clouds but haven't seen infracloud necessarily | 22:29 |
clarkb | we are also tracking job timeouts with e-r and they are up since turning off osic | 22:29 |
clarkb | you might want to pull up that query and see what the cloud distribution is | 22:30 |
*** spzala has joined #openstack-infra | 22:30 | |
fungi | also we still keep fiddling with the max-servers in infra-cloud to figure out how hard we can push it (and as we run at capacity for a while we've still needed to reduce it a couple times) | 22:30 |
*** dprince has quit IRC | 22:31 | |
*** felipemonteiro__ has quit IRC | 22:31 | |
clarkb | http://blog.ffwll.ch/2017/08/github-why-cant-host-the-kernel.html completely unrelated but potentially interesting | 22:32 |
jeblair | i'm trying to figure out if we should scale infracloud back some more | 22:33 |
jeblair | i don't really have the time to tune it myself. so if we think this is a signal that we're still oversubscribed, maybe we should lower our usage some. | 22:33 |
jeblair | but if we think it's an errant signal, i'll just ignore it for now and see if zuulv3 jobs run faster when we're less busy. | 22:34 |
clarkb | vanilla is 28% of job timeouts, chocolate is 17% | 22:34 |
clarkb | citycloud lon1 is 15% and rax ord is 11% | 22:34 |
*** aeng_ has joined #openstack-infra | 22:34 | |
*** xyang1 has quit IRC | 22:34 | |
clarkb | so vanilla is significantly more likely to time out than other regions but chocolate seems to be in line (if high) with other regions | 22:35 |
clarkb | also that isn't scaled against total server quota, just percentage of total fails | 22:35 |
jeblair | vanilla is 12% of capacity and chocolate is 9% | 22:35 |
jeblair | so they're vaguely hand-wavey 2x as represented in timeouts as they should be based on their proportion of quota | 22:36 |
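The back-of-the-envelope math behind that 2x, using the percentages quoted above:

```bash
# Share of timeouts divided by share of capacity; ~1.0 means pulling its
# weight, ~2.0 means twice the timeouts its size would predict.
awk 'BEGIN { printf "vanilla: %.1f  chocolate: %.1f\n", 28/12, 17/9 }'
# vanilla: 2.3  chocolate: 1.9
```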
*** markvoelker has joined #openstack-infra | 22:36 | |
*** gordc has quit IRC | 22:37 | |
*** bobh has quit IRC | 22:37 | |
*** markvoel_ has joined #openstack-infra | 22:37 | |
clarkb | we've also done about 10 timeouts per hour based on logstash data | 22:38 |
*** thorst has joined #openstack-infra | 22:38 | |
clarkb | out of 600-900 jobs launched per hour | 22:38 |
clarkb | based on that I think I would tune vanilla back | 22:40 |
*** markvoelker_ has quit IRC | 22:40 | |
clarkb | chocolate maybe less so? but likely needs it as well | 22:40 |
*** markvoelker has quit IRC | 22:41 | |
*** Julien-zte has quit IRC | 22:41 | |
*** jascott1 has joined #openstack-infra | 22:41 | |
*** jaypipes has quit IRC | 22:42 | |
*** Julien-z_ has joined #openstack-infra | 22:42 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Don't pass self to a bound method https://review.openstack.org/491946 | 22:43 |
*** thorst has quit IRC | 22:43 | |
clarkb | I think that cleaning up any leaked instances will help too | 22:44 |
clarkb | chocolate should start being better in that regard, but to be determined if vanilla has a problem | 22:44 |
openstackgerrit | Swaminathan Vasudevan proposed openstack/diskimage-builder master: Failed to open parameter YAML error while trying to unmount imagedir https://review.openstack.org/490637 | 22:45 |
fungi | yeah, we've got one node stuck in a delete state in vanilla for several months i'm trying to work out how to clean up | 22:45 |
fungi | looks like it's active according to nova so i'm going to attempt to delete it through the api | 22:46 |
clarkb | fungi: we have one of those in chocolate too if you find out how to clear the one in vanilla we can do that one next | 22:46 |
*** rbrndt has quit IRC | 22:46 | |
fungi | and nova continues to list it in an active state | 22:47 |
fungi | not reachable via ssh | 22:48 |
fungi | the uuid also isn't showing up in the virsh list | 22:48 |
fungi | so i think this one's the inverse of the others from earlier. nova still thinks it exists but it doesn't appear to (or maybe it's on a dead compute node?) | 22:49 |
*** vhosakot has quit IRC | 22:49 | |
clarkb | ah | 22:49 |
clarkb | if you nova show it as admin I think you can get the hypervisor info from nova | 22:50 |
clarkb | you might also be able to tell nova to forget about it as admin? | 22:50 |
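Roughly what that looks like with admin credentials; the uuid is a placeholder, and the reset/force-delete step is an assumption about what this cloud's client versions support:

```bash
UUID=00000000-0000-0000-0000-000000000000   # placeholder for the stuck instance

# The extended attributes show which compute node nova believes hosts the
# phantom instance, and what state it thinks it is in.
openstack --os-cloud admin-infracloud-vanilla server show "$UUID" \
    -c OS-EXT-SRV-ATTR:host -c OS-EXT-STS:vm_state

# If a plain delete hangs, resetting the state first usually unsticks it.
nova reset-state --active "$UUID"
openstack --os-cloud admin-infracloud-vanilla server delete "$UUID"
```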
fungi | cool, will try that in a jiffy | 22:50 |
*** krtaylor has joined #openstack-infra | 22:50 | |
jeblair | okay, so ze01 can rsync a repo to mtl01 at 6mbps, but it can rsync the same repo to vanilla at something like 160kbps | 22:51 |
* fungi wonders if we need to switch our isdn to dual-channel | 22:51 | |
fungi | btw, we're down to 8 uuids in vanilla which appeared in my initial list. i'm about to check whether we have any left in nodepool older than the initial list i made (other than the months-old ghost that is) | 22:53 |
jeblair | a wget of a large file saving to /dev/null on a vanilla node is reporting 70kbps | 22:53 |
jeblair | same file on mtl01 is 30mbps | 22:54 |
jeblair | same file on compute032.vanilla is also 70kbps | 22:55 |
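The throughput spot-check being described is no fancier than fetching a large file and discarding it; the URL is a placeholder:

```bash
# wget prints the average transfer rate; writing to /dev/null keeps local disk
# out of the measurement.
wget -O /dev/null http://mirror.example.org/large-file.iso
```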
fungi | looks like we still have a handful of nodes in vanilla running jobs since before i pulled the uuids from virsh, so odds are once these age out we're left with very few (if any) leaked from nova | 22:55 |
jeblair | so i think we're saturating our network link | 22:55 |
clarkb | jeblair: check chocolate too as it's the same networking I think | 22:55 |
clarkb | (would help rule out hardware problems as it is different base hardware on roughly the same networking) | 22:56 |
jeblair | clarkb: yeah, i checked a chocolate node and it's reporting about 180kbps | 22:56 |
jeblair | so, erm, twice as fast? :) | 22:56 |
jeblair | but it varies a lot, so could be approx the same | 22:57 |
fungi | so nova says the phantom instance is on compute012, but virsh list on that host doesn't include it | 22:59 |
clarkb | fungi: and you are using virsh list --all? | 22:59 |
clarkb | without --all you only see running instances | 22:59 |
clarkb | also check if /var/lib/nova/instances has a dir for it | 23:00 |
fungi | yup, --all --uuid | 23:00 |
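The checks under discussion amount to roughly this on the compute host (a sketch; the uuid is a placeholder):

    # list every libvirt domain, running or not, by uuid
    virsh list --all --uuid | grep <instance-uuid>

    # a leftover disk/config directory would also indicate a leaked instance
    ls /var/lib/nova/instances/<instance-uuid>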
*** markvoelker has joined #openstack-infra | 23:01 | |
fungi | there are in fact only two instances listed on compute012 and neither matches this uuid | 23:01 |
fungi | ubuntu-xenial-infracloud-vanilla-8895313 created 2017-05-19T10:07:18Z | 23:02 |
openstackgerrit | James E. Blair proposed openstack-infra/project-config master: Reduce infra-cloud usage https://review.openstack.org/491949 | 23:03 |
*** markvoel_ has quit IRC | 23:03 | |
jeblair | clarkb, fungi: there's a shot in the dark reduction ^ | 23:04 |
jeblair | clarkb, fungi: do we have any information about the network there and what we should expect? | 23:04 |
clarkb | jeblair: after the flood I know that local networking went to 1gig instead of 10GbE in vanilla. But unsure of the internet connectivity | 23:04 |
clarkb | its also possible they are just throttling the hell out of us | 23:04 |
*** xarses_ has quit IRC | 23:05 | |
*** spzala has quit IRC | 23:05 | |
*** spzala has joined #openstack-infra | 23:05 | |
fungi | same, all i know is what i can read on https://docs.openstack.org/infra/system-config/infra-cloud.html | 23:05 |
*** spzala has quit IRC | 23:05 | |
*** spzala has joined #openstack-infra | 23:06 | |
*** marst has quit IRC | 23:06 | |
*** spzala has quit IRC | 23:06 | |
clarkb | rcarrillocruz: and cmurphy (possibly jesusaur) may know more | 23:06 |
*** jascott1 has quit IRC | 23:06 | |
*** rhallisey has quit IRC | 23:06 | |
*** spzala has joined #openstack-infra | 23:07 | |
*** spzala has quit IRC | 23:07 | |
*** spzala has joined #openstack-infra | 23:07 | |
*** pbourke has quit IRC | 23:07 | |
*** spzala has quit IRC | 23:07 | |
*** spzala has joined #openstack-infra | 23:08 | |
*** spzala has quit IRC | 23:08 | |
*** pbourke has joined #openstack-infra | 23:09 | |
jeblair | i feel like i'm missing something | 23:09 |
jeblair | we set max-servers to 96 on vanilla -- we have 45 compute hosts -- that's almost down to two vms per host | 23:10 |
fungi | we're now down to 4 uuids which were present in vanilla when i first checked, and two of those are known to nodepool, meaning we have two needing cleanup | 23:10 |
openstackgerrit | Isaku Yamahata proposed openstack-infra/project-config master: networking-odl: retire boron task https://review.openstack.org/491951 | 23:10 |
jeblair | cacti says the compute hosts average about 25mbit continuous inbound traffic | 23:10 |
jeblair | 2x150kbps != 25mbps | 23:11 |
jeblair | how is it anything other than way *undercommitted*? | 23:11 |
clarkb | fwiw its 35 compute hosts in vanilla that are operational | 23:11 |
jeblair | so nearly 3 nodes / host | 23:12 |
clarkb | ya 3 is a 1:1 cpu ratio | 23:12 |
jeblair | and in fact, compute032 is running 3 instances right now | 23:12 |
*** sflanigan has joined #openstack-infra | 23:13 | |
jeblair | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=5981&rra_id=all | 23:13 |
clarkb | the base data is likely rabbitmq traffic and glance image transfers which is all on the same layer 2 so we should be running at 1gig for that | 23:14 |
jeblair | that graph makes it look like we've been running flat out at 25mbps for nearly 24h | 23:14 |
jeblair | image transfers should be a spike, and i hope we're not doing 25mbps of rabbit | 23:15 |
clarkb | ya I think the spike to 100Mbps must be image transfers | 23:15 |
jeblair | sounds reasonable | 23:15 |
fungi | so interestingly, these two "leaked" instances in vanilla are known to nova and neither is actually leaked. one is the mirror and the other is pabelanger-test1... so other than the phantom instance that we can't delete because it doesn't actually exist, we have no discrepancies there | 23:15 |
clarkb | that also makes me worry that we have been given 100mbps connectivity there and not gigabit | 23:16 |
jeblair | clarkb: indeed | 23:16 |
clarkb | jeblair: I know that SpamapS and greghaynes found rabbit to be very chatty. It wouldn't surprise me if it was doing 25mbps but I also think it is insane to be doing that | 23:16 |
openstackgerrit | Isaku Yamahata proposed openstack-infra/project-config master: networking-odl: retire boron task https://review.openstack.org/491951 | 23:16 |
*** markvoelker has quit IRC | 23:18 | |
*** markvoelker has joined #openstack-infra | 23:18 | |
fungi | trying to see if i can tease interface speeds out of one of the controllers | 23:19 |
fungi | er, computes | 23:19 |
jeblair | clarkb: when i run iftop on compute032, i see a *lot* of connections between zuul/git.o.o and many different infracloud ips | 23:19 |
jeblair | clarkb: i would only expect to see connections to the 3 ips of the nodes running on compute032 | 23:19 |
*** pvaneck_ has quit IRC | 23:19 | |
*** rhallisey has joined #openstack-infra | 23:20 | |
jeblair | clarkb: eg: http://paste.openstack.org/show/617852/ | 23:20 |
clarkb | jeblair: as if we are on a hub not a switch | 23:20 |
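The iftop invocation behind that observation is roughly this (a sketch, assuming iftop is installed on the compute node):

    # numeric hosts, per-port detail, watching the uplink interface directly
    sudo iftop -n -P -i eth2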
fungi | on compute12 (i had to install the ethtool package) i see eth2 is the only physical interface with link detected and it claims to be operating at 10000baseT/Full | 23:21 |
jeblair | clarkb: ya | 23:21 |
*** aeng has quit IRC | 23:21 | |
fungi | which i find dubious | 23:21 |
clarkb | fungi: ya eth2 is our only link | 23:21 |
clarkb | it's why we have the weird bridge thing for neutron | 23:21 |
fungi | the 10gb link speed i find dubious i mean | 23:22 |
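For anyone repeating the link check, it is roughly this (a sketch; eth2 is the uplink on these hosts, and the ethtool package may need installing first as noted above):

    # read negotiated speed and link state from the nic
    sudo ethtool eth2 | grep -E 'Speed|Link detected'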
clarkb | but the weird bridge thing for neutron should be a proper switch | 23:22 |
clarkb | fungi: oh ya | 23:22 |
*** sdague has quit IRC | 23:22 | |
fungi | i guess we could check the bridge table in the kernel and make sure only local macs are showing up on the local interfaces? | 23:23 |
fungi | (to rule out bridge loops) | 23:23 |
*** markvoelker has quit IRC | 23:23 | |
* fungi plays around with brctl | 23:23 | |
mnaser | does all openstack infra testing happen on 8 core machines? | 23:24 |
fungi | br-vlan2551 and brq85ba3bb6-1f (on compute12) both seem to have a lot of macs showing on them | 23:25 |
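The bridge-table check is roughly this (a sketch; the bridge names match the ones seen on compute12 but vary per host):

    # learned macs per bridge; entries with "is local?" set to no are remote stations
    brctl showmacs br-vlan2551
    brctl showmacs brq85ba3bb6-1f | awk '$3 == "no"' | wc -l   # count the non-local macs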
clarkb | mnaser: it does now but hasn't always been the case (nor will it necessarily be the case in the future) | 23:25 |
persia | mnaser: That is the default request from infrastructure donors. | 23:25 |
*** Swami has quit IRC | 23:25 | |
fungi | mnaser: per https://docs.openstack.org/infra/manual/testing.html#known-differences-to-watch-out-for it can vary | 23:26 |
clarkb | fungi: the rough setup is eth2 - eth2.$vlan - br-$vlan - veth1 - veth2 iirc | 23:26 |
mnaser | so.. we're testing some new flavors with fully dedicated cores and i kinda wanted to throw in a small 10-15 nodes to see how it copes (and also help the gate a tiny bit if the setup is still there) | 23:26 |
mnaser | for example you'd get 2 cores + 8gb of memory, but 2 fully dedicated cores | 23:26 |
clarkb | fungi: the reason for that is we need an interface on $vlan for the hypervisor but need to put neutron on the same vlan without letting it manage the vlan (so neutron is all untagged, because if we let neutron manage it then it borks the hypervisor interface on the same vlan) | 23:26 |
clarkb | fungi: brq$stuff is the bridge neutron manages | 23:27 |
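Laying that chain out on a compute host looks roughly like this (a sketch; interface and bridge names follow the ones mentioned here but differ per deployment):

    # the tagged sub-interface hanging off the physical nic
    ip -d link show eth2.2551
    # the hand-built bridge and the neutron-managed one, with their member ports
    brctl show br-vlan2551
    brctl show brq85ba3bb6-1f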
fungi | i was about to ask, that was my first guess though | 23:27 |
clarkb | I think what we want to do is tcpdump eth2 and see if we are getting hub-like behavior (but that's roughly what iftop was doing for me) | 23:28 |
clarkb | because eth2 should be the raw ethernet connection and we should only see stuff destined to hosts behind it on the hypervisor | 23:28 |
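A quick way to look for that flooding is something like this (a sketch; the excluded address is a placeholder for the hypervisor's own IP):

    # after excluding the host's own traffic, anything left should belong to local VMs;
    # frames for other compute nodes' guests showing up here mean the switch is flooding
    sudo tcpdump -ni eth2 -c 200 not host <this-hypervisor-ip>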
fungi | clarkb: okay, so given that br-vlan2551 only shows a couple local macs and the rest (147 at the moment) are all showing nonlocal | 23:28 |
fungi | i'm guessing we don't see any sort of reflection to account for a storm | 23:29 |
clarkb | but could we be contending for access to the bus if we are plugged into a hub? | 23:30 |
clarkb | we shouldn't see 147 non local IPs I don't think | 23:30 |
clarkb | controller, upstream router, and VMs are all we should have right? | 23:30 |
clarkb | oh it could be hypervisor to hypervisor because of multinode | 23:30 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Don't pass self to a bound method https://review.openstack.org/491946 | 23:30 |
clarkb | so maybe this is ok | 23:30 |
*** hongbin has quit IRC | 23:31 | |
fungi | by "plugged into a hub" i assume you mean a switch which has given up trying to track macs in its bridge table. i can't imagine where they'd find an ethernet hub in this day and age | 23:31 |
jeblair | and the mirror | 23:31 |
clarkb | but possibly 100Mbps, not 1gig or 10gig | 23:31 |
clarkb | fungi: right that, cam table filled and you lose | 23:31 |
fungi | i can certainly imagine some scenarios where certain switches may run into issues if we cycle through random macs faster than they get aged out of the table and end up filling them up | 23:32 |
fungi | yes, that | 23:32 |
jeblair | 48/96 nodes are being used for multinode jobs | 23:32 |
clarkb | jeblair: what is a transfer between hypervisors like | 23:33 |
clarkb | that should be our best case transfer | 23:33 |
jeblair | will check | 23:33 |
fungi | compute017 is the one hosting the mirror, if that helps to compare against | 23:33 |
fungi | in vanilla | 23:34 |
fungi | and wow does it show signs of packet loss! | 23:34 |
fungi | http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=3&leaf_id=422 | 23:34 |
fungi | very high error count on eth2 as well | 23:35 |
fungi | most of its eth2 traffic is from eth2.2551 | 23:36 |
jeblair | clarkb: 160mbps | 23:36 |
fungi | i can see on the graph where the mirror vm got rebuilt since it seems like that's probably when it landed on this compute node | 23:37 |
jeblair | that's reading from disk | 23:38 |
clarkb | fungi: ya eth2.2551 is where we tag the vlan for all traffic outbound | 23:39 |
clarkb | fungi: so that includes all the hypervisor communication and all the VM communication | 23:39 |
jeblair | clarkb: sorry, that was 160MB/s, so 1280mbps | 23:39 |
fungi | picking another compute node at random, also seeing gaps in snmp responses and pretty high error rate on eth2 | 23:40 |
clarkb | jeblair: cool so we likely aren't having global issues. The best case comes out pretty well | 23:40 |
fungi | so i do agree it's more likely we're saturating the uplink rather than intra-cloud links | 23:40 |
clarkb | it's possible that linux is hating us for the chained bridges (yay software switches), or upstream devices have trouble with VM macs changing frequently, or the problems are largely on the path to the internet? | 23:40 |
clarkb | fungi: ya | 23:41 |
fungi | baremetal00 shows similar packet loss and errors on eth2 | 23:41 |
fungi | hrm | 23:42 |
fungi | though it's seeing a solid 20mbps inbound on eth2.2551 | 23:42 |
fungi | it shouldn't really be consuming anything, right? | 23:42 |
clarkb | fungi: what does iftop show it talking to or just netstat? | 23:43 |
fungi | that one probably makes a good control group if we're looking for signs of a storm | 23:43 |
clarkb | bifrost/ironic will do heartbeats to the nodes | 23:43 |
clarkb | but ya 20mbps for heartbeats seems really high | 23:43 |
clarkb | and yes it should be good control group | 23:43 |
*** mattmceuen has quit IRC | 23:45 | |
*** soliosg has quit IRC | 23:46 | |
fungi | this doesn't look good. didn't even have to go that far | 23:46 |
clarkb | that's interesting, I see nb03 and nb04 comms | 23:46 |
clarkb | to controller00 from compute021 | 23:46 |
fungi | yeah, a tcpdump on eth2.2551 showed me a centos7 test node talking to git.o.o | 23:47 |
fungi | that should _never_ make it to baremetal00 | 23:47 |
clarkb | sudo tcpdump -i any host 199.19.215.9 shows nb03 to controller00 | 23:48 |
clarkb | so ya | 23:48 |
fungi | so definitely looks like the switch layer in that rack at least is falling back to flooding behavior | 23:48 |
clarkb | ya | 23:48 |
fungi | we can e-mail hpe and ask them to power-cycle the switches i guess, though as i understand it we're not the only machines plugged into them | 23:49 |
clarkb | maybe have them check the theory at least | 23:49 |
clarkb | I wonder what our router is though | 23:50 |
fungi | 15.184.64.1 | 23:50 |
fungi | that's probably not what you meant | 23:50 |
clarkb | well, sort of; I know when tripleo was using this setup they used linux as a router | 23:51 |
clarkb | but that doesn't appear to be one of ours | 23:51 |
clarkb | so I think this is entirely upstream of us | 23:51 |
fungi | big surprise, the oui of the router's mac (bceafa) is assigned to... [drumroll] | 23:52 |
fungi | Hewlett Packard | 23:52 |
fungi | so could be linux on a "newer" proliant (post-compaq) or something i guess | 23:53 |
jeblair | so likely two things: 1) switch acting as hub -- annoying, taxes each compute node with an extra 20mbps it has to ignore, but probably not killing performance. 2) upstream bandwidth limit. | 23:54 |
jeblair | that sound right? | 23:54 |
*** gildub has joined #openstack-infra | 23:55 | |
fungi | yeah, that's the best i can piece together | 23:55 |
jeblair | (interestingly, i wonder if the 20-25mbps we're seeing on all the nodes because of the switch behavior clues us into our upstream bandwidth? 25mbps/96=260kbps which is not too far off from the 160kbps we measured earlier) | 23:56 |
clarkb | also I bet image uploads tank that bw | 23:57 |
fungi | probably so | 23:57 |
fungi | we _could_ test the theory by dialing down max-servers to 0 in both clouds and then doing some bulk transfers to/from baremetal00 or the mirror or something | 23:57 |
fungi | if we really wanted a more accurate picture | 23:58 |
fungi | also possible we're not being throttled, but are sharing an uplink from that pod with some much more network-consuming neighbors hogging the available bandwidth | 23:58 |
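If we do quiesce the clouds, a crude end-to-end probe could be as simple as this (a sketch assuming iperf3 is installed on both ends; the far-end hostname is a placeholder):

    # on the outside host
    iperf3 -s
    # from the mirror or baremetal00, measure sustained throughput for 30 seconds
    iperf3 -c far-end.example.org -t 30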
*** slagle has quit IRC | 23:59 |