corvus | i believe it will ignore it | 00:01 |
---|---|---|
pabelanger | yup, just testing on 3.1.1. No errors on PR | 00:03 |
*** efried has quit IRC | 00:05 | |
*** hwoarang has quit IRC | 00:06 | |
*** hwoarang has joined #openstack-infra | 00:08 | |
*** efried has joined #openstack-infra | 00:10 | |
*** slaweq has joined #openstack-infra | 00:11 | |
*** efried has quit IRC | 00:14 | |
*** efried has joined #openstack-infra | 00:14 | |
*** efried has quit IRC | 00:15 | |
*** slaweq has quit IRC | 00:16 | |
*** rh-jelabarre has quit IRC | 00:18 | |
*** rh-jelabarre has joined #openstack-infra | 00:23 | |
clarkb | I've not seen a successful upload to inap yet, but I think those that have failed may have started before mgagne_ fixed things? if they are still failing in another hour or so then likely not fixed sdk side | 00:28 |
clarkb | an image just went ready in inap \o/ | 00:32 |
clarkb | mnaser: ^ fyi we should be closing in on fixing that centos image problem | 00:32 |
ianw | :/ it looks like the readthedocs isn't triggering correctly any more. unfortunately to not echo out the password we have no_log on the important bits | 00:32 |
mordred | clarkb: I was only mildly following - mgagne found a thing? | 00:33 |
clarkb | mordred: oui | 00:33 |
mordred | clarkb: awesome | 00:33 |
clarkb | so this may have been entirely cloud side | 00:33 |
clarkb | I think it helped that osc was able to reproduce a failure if not the same one | 00:34 |
*** wolverineav has quit IRC | 00:36 | |
*** wolverineav has joined #openstack-infra | 00:37 | |
clarkb | | 0000000040 | 0000000017 | inap-mtl01 | centos-7 | centos-7-1544483708 | 9416a0d2-48f9-43c3-9aed-271635b897dd | ready | 00:00:02:26 | | 00:39 |
clarkb | osa centos jobs should be happy now | 00:39 |
clarkb | if they start on new nodes | 00:39 |
*** wolverineav has quit IRC | 00:42 | |
*** kjackal has joined #openstack-infra | 00:43 | |
*** jcoufal has quit IRC | 00:44 | |
ianw | ... {"detail":"CSRF Failed: CSRF cookie not set."} ... i do not like the look of this, rtd might have broken access to the authenticated endpoint | 00:45 |
*** yamamoto has quit IRC | 00:49 | |
*** _alastor_ has joined #openstack-infra | 00:59 | |
*** rockyg has quit IRC | 00:59 | |
*** sthussey has quit IRC | 01:03 | |
*** wolverineav has joined #openstack-infra | 01:05 | |
*** rockyg has joined #openstack-infra | 01:07 | |
*** wolverineav has quit IRC | 01:07 | |
*** wolverineav has joined #openstack-infra | 01:07 | |
ianw | well i don't think there's much we can do ... filed https://github.com/rtfd/readthedocs.org/issues/4986 | 01:08 |
*** rkukura has quit IRC | 01:12 | |
*** rockyg has quit IRC | 01:14 | |
mnaser | clarkb: thank you so much! | 01:14 |
*** ianychoi has quit IRC | 01:20 | |
*** _alastor_ has quit IRC | 01:25 | |
*** bobh has joined #openstack-infra | 01:27 | |
*** bobh has quit IRC | 01:31 | |
*** hwoarang has quit IRC | 01:36 | |
*** hwoarang has joined #openstack-infra | 01:37 | |
*** kjackal has quit IRC | 01:40 | |
*** rkukura has joined #openstack-infra | 01:55 | |
*** neilsun has joined #openstack-infra | 01:58 | |
*** _alastor_ has joined #openstack-infra | 02:01 | |
*** _alastor_ has quit IRC | 02:06 | |
*** mrsoul has joined #openstack-infra | 02:07 | |
*** bobh has joined #openstack-infra | 02:08 | |
*** bobh has quit IRC | 02:12 | |
*** jistr has quit IRC | 02:42 | |
*** jistr has joined #openstack-infra | 02:50 | |
*** psachin has joined #openstack-infra | 02:52 | |
*** anteaya has quit IRC | 03:01 | |
*** bhavikdbavishi has joined #openstack-infra | 03:07 | |
*** apetrich has quit IRC | 03:15 | |
*** hongbin has joined #openstack-infra | 03:15 | |
*** rh-jelabarre has quit IRC | 03:38 | |
*** bobh has joined #openstack-infra | 03:39 | |
*** ykarel|away has joined #openstack-infra | 03:42 | |
*** bobh has quit IRC | 03:43 | |
*** udesale has joined #openstack-infra | 03:48 | |
*** gyee has quit IRC | 03:53 | |
*** jamesden_ has joined #openstack-infra | 03:54 | |
*** agopi_ has joined #openstack-infra | 03:54 | |
*** agopi has quit IRC | 03:54 | |
*** jamesdenton has quit IRC | 03:55 | |
*** markvoelker has joined #openstack-infra | 03:57 | |
*** ramishra has quit IRC | 03:59 | |
*** bobh has joined #openstack-infra | 04:01 | |
*** markvoelker has quit IRC | 04:02 | |
*** bobh has quit IRC | 04:06 | |
*** udesale has quit IRC | 04:08 | |
*** mriedem_away has quit IRC | 04:15 | |
*** _alastor_ has joined #openstack-infra | 04:23 | |
*** wolverineav has quit IRC | 04:29 | |
*** wolverineav has joined #openstack-infra | 04:30 | |
*** slaweq has joined #openstack-infra | 04:30 | |
*** jamesmcarthur has joined #openstack-infra | 04:31 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: executor: add support for generic build resource https://review.openstack.org/570668 | 04:38 |
*** udesale has joined #openstack-infra | 04:43 | |
*** yamamoto has joined #openstack-infra | 04:46 | |
*** lpetrut has joined #openstack-infra | 04:55 | |
*** wolverineav has quit IRC | 05:05 | |
*** hongbin has quit IRC | 05:07 | |
*** hwoarang has quit IRC | 05:10 | |
*** hwoarang has joined #openstack-infra | 05:12 | |
*** wolverineav has joined #openstack-infra | 05:13 | |
*** jamesmcarthur has quit IRC | 05:18 | |
*** jamesmcarthur has joined #openstack-infra | 05:18 | |
*** ykarel|away has quit IRC | 05:19 | |
*** chandan_kumar has joined #openstack-infra | 05:21 | |
*** jamesmcarthur has quit IRC | 05:23 | |
*** ramishra has joined #openstack-infra | 05:26 | |
*** agopi_ is now known as agop | 05:30 | |
*** agop is now known as agopi | 05:30 | |
*** ykarel|away has joined #openstack-infra | 05:35 | |
*** lpetrut has quit IRC | 05:35 | |
*** ykarel|away is now known as ykarel | 05:42 | |
*** jamesmcarthur has joined #openstack-infra | 05:50 | |
*** _alastor_ has quit IRC | 05:50 | |
*** wolverineav has quit IRC | 05:54 | |
*** yboaron_ has joined #openstack-infra | 06:00 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack-infra/project-config master: Normalize projects.yaml https://review.openstack.org/624277 | 06:06 |
openstackgerrit | gaobin proposed openstack-infra/zuul master: Modify some file content errors https://review.openstack.org/624278 | 06:08 |
openstackgerrit | gaobin proposed openstack-infra/zuul master: Modify some file content errors https://review.openstack.org/624278 | 06:11 |
*** wolverineav has joined #openstack-infra | 06:18 | |
*** _alastor_ has joined #openstack-infra | 06:19 | |
*** _alastor_ has quit IRC | 06:23 | |
*** betherly has quit IRC | 06:24 | |
*** ahosam has joined #openstack-infra | 06:29 | |
*** jmccrory has quit IRC | 06:34 | |
*** sdake has quit IRC | 06:35 | |
*** jmccrory has joined #openstack-infra | 06:40 | |
*** sdake has joined #openstack-infra | 06:40 | |
*** apetrich has joined #openstack-infra | 06:40 | |
*** bobh has joined #openstack-infra | 06:41 | |
*** rcernin has quit IRC | 06:43 | |
*** wolverineav has quit IRC | 06:44 | |
*** bobh has quit IRC | 06:46 | |
*** jamesmcarthur has quit IRC | 06:50 | |
*** rlandy has quit IRC | 06:51 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-base-jobs master: Add base openshift job https://review.openstack.org/570669 | 06:53 |
*** wolverineav has joined #openstack-infra | 07:01 | |
*** ahosam has quit IRC | 07:08 | |
*** e0ne has joined #openstack-infra | 07:10 | |
*** e0ne has quit IRC | 07:12 | |
*** e0ne has joined #openstack-infra | 07:13 | |
*** e0ne has quit IRC | 07:14 | |
*** quiquell|off is now known as quiquell | 07:14 | |
*** ramishra has quit IRC | 07:15 | |
*** e0ne has joined #openstack-infra | 07:15 | |
*** e0ne has quit IRC | 07:17 | |
openstackgerrit | Merged openstack-infra/project-config master: Normalize projects.yaml https://review.openstack.org/624277 | 07:22 |
*** bobh has joined #openstack-infra | 07:25 | |
*** wolverineav has quit IRC | 07:29 | |
*** jtomasek has joined #openstack-infra | 07:29 | |
*** bobh has quit IRC | 07:29 | |
*** ykarel is now known as ykarel|lunch | 07:31 | |
*** kjackal has joined #openstack-infra | 07:34 | |
*** yboaron_ has quit IRC | 07:35 | |
openstackgerrit | Merged openstack-infra/zuul master: Add spacing to Queue lengths line https://review.openstack.org/623960 | 07:37 |
*** jtomasek has quit IRC | 07:42 | |
*** jtomasek has joined #openstack-infra | 07:43 | |
*** ahosam has joined #openstack-infra | 07:43 | |
*** rossella_s has quit IRC | 07:46 | |
*** ahosam has quit IRC | 07:49 | |
*** quiquell is now known as quiquell|brb | 07:53 | |
*** oanson has quit IRC | 07:54 | |
*** agopi_ has joined #openstack-infra | 07:56 | |
*** tosky has joined #openstack-infra | 07:58 | |
*** agopi has quit IRC | 07:59 | |
*** ramishra has joined #openstack-infra | 08:01 | |
*** bobh has joined #openstack-infra | 08:01 | |
*** ginopc has joined #openstack-infra | 08:02 | |
*** longkb has joined #openstack-infra | 08:02 | |
*** rossella_s has joined #openstack-infra | 08:03 | |
*** ccamacho has joined #openstack-infra | 08:04 | |
*** agopi_ is now known as agopi | 08:04 | |
*** bobh has quit IRC | 08:05 | |
*** kjackal has quit IRC | 08:06 | |
*** mgoddard has quit IRC | 08:10 | |
*** mgoddard has joined #openstack-infra | 08:10 | |
*** agopi_ has joined #openstack-infra | 08:11 | |
*** agopi has quit IRC | 08:14 | |
*** kjackal has joined #openstack-infra | 08:15 | |
*** shardy has joined #openstack-infra | 08:15 | |
amorin | hey frickler and others, I am moving your instances onto separate hosts | 08:20 |
amorin | in the meantime, we found an issue on the hypervisors | 08:20 |
amorin | about RAM usage | 08:20 |
amorin | if the instances do not have enough memory, they may be using swap instead, which can cause them to slow down a lot | 08:21 |
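For context, a quick way to confirm from inside a guest whether it is dipping into swap is to watch memory and swap-in/swap-out counters; a minimal sketch, assuming a Linux guest with the usual procps tools installed:

```shell
# show total/used memory and swap in megabytes
free -m
# sample vm statistics once a second for five seconds; non-zero "si"/"so"
# columns mean pages are actively being swapped in/out
vmstat 1 5
```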
*** imacdonn has quit IRC | 08:22 | |
*** imacdonn has joined #openstack-infra | 08:23 | |
*** _alastor_ has joined #openstack-infra | 08:25 | |
*** agopi_ is now known as agopi | 08:27 | |
*** bhavikdbavishi has quit IRC | 08:28 | |
*** shardy has quit IRC | 08:28 | |
*** hwoarang has quit IRC | 08:28 | |
*** shardy has joined #openstack-infra | 08:29 | |
*** ykarel|lunch is now known as ykarel | 08:30 | |
*** _alastor_ has quit IRC | 08:30 | |
*** hwoarang has joined #openstack-infra | 08:30 | |
*** ahosam has joined #openstack-infra | 08:32 | |
*** ahosam has quit IRC | 08:32 | |
*** priteau has joined #openstack-infra | 08:39 | |
*** quiquell|brb is now known as quiquell | 08:40 | |
*** ramishra has quit IRC | 08:44 | |
*** ramishra has joined #openstack-infra | 08:51 | |
*** bobh has joined #openstack-infra | 08:51 | |
*** bobh has quit IRC | 08:56 | |
*** yamamoto has quit IRC | 09:01 | |
*** ahosam has joined #openstack-infra | 09:02 | |
*** jpena|off is now known as jpena | 09:03 | |
*** dpawlik has quit IRC | 09:03 | |
*** dpawlik has joined #openstack-infra | 09:04 | |
*** eumel8 has joined #openstack-infra | 09:05 | |
*** ahosam has quit IRC | 09:05 | |
*** wolverineav has joined #openstack-infra | 09:07 | |
*** jtomasek_ has joined #openstack-infra | 09:08 | |
*** dpawlik has quit IRC | 09:08 | |
*** jpich has joined #openstack-infra | 09:09 | |
*** jtomasek has quit IRC | 09:10 | |
*** wolverineav has quit IRC | 09:12 | |
*** kjackal has quit IRC | 09:13 | |
*** kjackal has joined #openstack-infra | 09:14 | |
*** yamamoto has joined #openstack-infra | 09:18 | |
*** lpetrut has joined #openstack-infra | 09:21 | |
AJaeger | ianw: looking at https://review.openstack.org/621840 - do you have a change that tests it and shows that it does the right thing? | 09:21 |
*** yamamoto has quit IRC | 09:27 | |
*** bobh has joined #openstack-infra | 09:30 | |
*** derekh has joined #openstack-infra | 09:36 | |
*** dpawlik has joined #openstack-infra | 09:39 | |
*** dpawlik has quit IRC | 09:39 | |
*** dpawlik has joined #openstack-infra | 09:39 | |
*** aojea has joined #openstack-infra | 09:40 | |
*** pbourke_ has quit IRC | 09:54 | |
*** yamamoto has joined #openstack-infra | 10:06 | |
*** yamamoto has quit IRC | 10:10 | |
*** e0ne has joined #openstack-infra | 10:12 | |
*** rossella_s has quit IRC | 10:21 | |
*** priteau has quit IRC | 10:24 | |
*** pbourke has joined #openstack-infra | 10:26 | |
*** electrofelix has joined #openstack-infra | 10:28 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Only reset working copy when needed https://review.openstack.org/624343 | 10:31 |
*** gfidente has joined #openstack-infra | 10:34 | |
*** rossella_s has joined #openstack-infra | 10:35 | |
*** yamamoto has joined #openstack-infra | 10:45 | |
*** yamamoto has quit IRC | 10:47 | |
*** udesale has quit IRC | 10:56 | |
*** tobias-urdin is now known as tobias-urdin|lun | 11:00 | |
*** tobias-urdin|lun is now known as tobias-urdin_afk | 11:01 | |
*** yamamoto has joined #openstack-infra | 11:11 | |
*** yamamoto has quit IRC | 11:16 | |
*** yamamoto has joined #openstack-infra | 11:16 | |
*** rfolco has quit IRC | 11:18 | |
*** dtantsur|afk is now known as dtantsur | 11:18 | |
*** rfolco has joined #openstack-infra | 11:23 | |
*** tobias-urdin_afk is now known as tobias-urdin | 11:27 | |
*** rossella_s has quit IRC | 11:28 | |
*** longkb has quit IRC | 11:39 | |
*** rossella_s has joined #openstack-infra | 11:43 | |
*** quiquell is now known as quiquell|brb | 11:50 | |
*** dpawlik has quit IRC | 11:56 | |
*** ahosam has joined #openstack-infra | 11:57 | |
*** dpawlik has joined #openstack-infra | 11:57 | |
ssbarnea|rover | i've seen an interesting spike in timeouts which seems to re-occur after exactly one week: http://status.openstack.org/elastic-recheck/ | 12:00 |
*** ahosam has quit IRC | 12:01 | |
ssbarnea|rover | i am considering adding a new query for POST-specific timeouts, as the existing one seems too generic and we have a significant number of POST ones. anyone against? | 12:02 |
*** quiquell|brb is now known as quiquell | 12:13 | |
sean-k-mooney | are there any docs on how to create an elastic recheck query | 12:16 |
sean-k-mooney | i want to create one for "os_vif error: [Errno 24] Too many open files" in the nova compute agent log | 12:17 |
*** wolverineav has joined #openstack-infra | 12:18 | |
*** wolverineav has quit IRC | 12:22 | |
*** yamamoto has quit IRC | 12:23 | |
*** fresta_ is now known as fresta | 12:27 | |
*** yamamoto has joined #openstack-infra | 12:28 | |
*** jamesden_ is now known as jamesdenton | 12:29 | |
*** jpena is now known as jpena|lunch | 12:32 | |
*** yamamoto has quit IRC | 12:32 | |
*** psachin has quit IRC | 12:35 | |
*** rh-jelabarre has joined #openstack-infra | 12:39 | |
*** jamesmcarthur has joined #openstack-infra | 12:42 | |
*** jamesmcarthur has quit IRC | 12:46 | |
*** e0ne has quit IRC | 12:47 | |
*** dave-mccowan has joined #openstack-infra | 12:53 | |
frickler | amorin: oh, that could indeed explain our issues. can you work around it by adjusting quota? do you still want us to proceed with the other tests? | 12:55 |
openstackgerrit | Chris Dent proposed openstack-infra/project-config master: Change os-resource-classes acl config to placement https://review.openstack.org/624387 | 13:00 |
ssbarnea|rover | sean-k-mooney: just create another file in the queries/ folder, that's all. look at the existing files to reverse engineer the docs ;) | 13:00 |
ssbarnea|rover | in the end it's a 4-5 line yaml file | 13:00 |
*** dave-mccowan has quit IRC | 13:01 | |
*** yamamoto has joined #openstack-infra | 13:05 | |
*** gfidente has quit IRC | 13:09 | |
sean-k-mooney | ya i figured that out but i can't figure out the kibana/elastic search query | 13:09 |
sean-k-mooney | tags:"screen-n-cpu.txt" and message:"os_vif error: [Errno 24] Too many open files" and project:"openstack/neutron" | 13:09 |
sean-k-mooney | that does not seem to work | 13:09 |
*** boden has joined #openstack-infra | 13:09 | |
*** panda|off is now known as panda | 13:10 | |
*** ykarel is now known as ykarel|afk | 13:12 | |
*** trown|outtypewww is now known as trown | 13:13 | |
*** jamesmcarthur has joined #openstack-infra | 13:15 | |
fungi | amorin: oh, were the hosts oversubscribed on ram? i agree that could have been an explanation | 13:18 |
*** gfidente has joined #openstack-infra | 13:21 | |
fungi | sean-k-mooney: do you have a recent example of a job log in which that string appeared? | 13:23 |
fungi | message:"os_vif error: [Errno 24] Too many open files" isn't found in any indexed job logs for at least the past week | 13:23 |
*** zul has quit IRC | 13:26 | |
*** zul has joined #openstack-infra | 13:26 | |
*** rlandy has joined #openstack-infra | 13:28 | |
sean-k-mooney | fungi: yes, so currently it's coming up as an uncategorised issue | 13:31 |
sean-k-mooney | one sec | 13:31 |
sean-k-mooney | http://logs.openstack.org/49/622449/4/check/neutron-tempest-iptables_hybrid/aa25876/logs/screen-n-cpu.txt.gz?level=TRACE#_Dec_11_10_29_35_876336 | 13:32 |
*** jpena|lunch is now known as jpena | 13:32 | |
sean-k-mooney | fungi: the neutron-tempest-iptables_hybrid entry on http://status.openstack.org/elastic-recheck/data/integrated_gate.html | 13:33 |
sean-k-mooney | is caused by https://bugs.launchpad.net/os-vif/+bug/1807949 | 13:33 |
openstack | Launchpad bug 1807949 in os-vif "os_vif error: [Errno 24] Too many open files" [High,Triaged] - Assigned to sean mooney (sean-k-mooney) | 13:33 |
sean-k-mooney | or rather by pyroute2 | 13:33 |
fungi | we do seem to be indexing that file in that job | 13:44 |
fungi | since build_name:"neutron-tempest-iptables_hybrid" AND filename:"logs/screen-n-api.txt" returns plenty of hits in the past 6 hours | 13:45 |
fungi | but appending AND message:"Too many open files" has 0 matches in 24 hours | 13:46 |
fungi | or even 48 hours, so should have caught that run | 13:47 |
*** jamesmcarthur has quit IRC | 13:47 | |
*** jamesmcarthur has joined #openstack-infra | 13:48 | |
fungi | build_short_uuid:"aa25876" has hits for that file too though | 13:48 |
*** kgiusti has joined #openstack-infra | 13:50 | |
sean-k-mooney | ok so at least i did not completely misunderstand how to use kibana | 13:50 |
fungi | well, either that or i completely misunderstand how to use kibana too ;) | 13:51 |
fungi | certainly not ruling that out | 13:51 |
sean-k-mooney | filename:"logs/screen-n-api.txt" is the wrong file by the way | 13:51 |
sean-k-mooney | it should be screen-n-cpu.txt | 13:52 |
fungi | d'oh, thanks! | 13:53 |
fungi | that seemed to make a difference, though still no lines indexed with message:"os_vif error" | 13:54 |
sean-k-mooney | can you share your query by the way | 13:54 |
sean-k-mooney | this could become a neutron gate blocker or it could just be intermittent, so i wanted to get a query to try and monitor it | 13:55 |
fungi | i'm currently combing through build_name:"neutron-tempest-iptables_hybrid" AND filename:"logs/screen-n-cpu.txt" AND build_short_uuid:"aa25876" AND message:"error" | 13:56 |
fungi | trying to work out why that line is missing | 13:56 |
fungi | noting the entries are in reverse-chronological order | 13:56 |
*** ykarel|afk has quit IRC | 13:57 | |
fungi | found! | 13:57 |
fungi | the message it parsed out for that line is "error: [Errno 24] Too many open files" | 13:57 |
fungi | okay, now working to generalize | 13:58 |
*** sthussey has joined #openstack-infra | 14:00 | |
*** jamesmcarthur has quit IRC | 14:02 | |
*** jamesmcarthur has joined #openstack-infra | 14:03 | |
fungi | sean-k-mooney: is the project:"openstack/neutron" part critical to this query? | 14:03 |
fungi | is this showing up in multiple jobs, but only jobs run on changes to neutron and not to any other projects? | 14:04 |
fungi | tags:"screen-n-cpu.txt" AND message:"error: [Errno 24] Too many open files" AND project:"openstack/neutron" shows up starting around 09:00 utc today | 14:05 |
sean-k-mooney | no | 14:06 |
fungi | if i drop the project filter, it's still the same number of hits | 14:06 |
sean-k-mooney | i think i have a query | 14:06 |
sean-k-mooney | http://logstash.openstack.org/#/dashboard/file/logstash.json?query=tags:%5C%22screen-n-cpu.txt%5C%22%20AND%20message:%5C%22OSError:%20%5BErrno%2024%5D%20Too%20many%20open%20files%5C%22%20AND%20module:%5C%22os_vif%5C%22%20AND%20loglevel:%20%5C%22ERROR%5C%22 | 14:06 |
*** mriedem has joined #openstack-infra | 14:06 | |
*** jamesmcarthur has quit IRC | 14:07 | |
sean-k-mooney | fungi: i actually want to check the nova, neutron and kuryr-kubernetes jobs | 14:07 |
sean-k-mooney | so dropping it is fine | 14:07 |
fungi | lgtm | 14:07 |
*** dave-mccowan has joined #openstack-infra | 14:08 | |
*** rossella_s has quit IRC | 14:08 | |
fungi | first hit does still seem to be around 09:00 | 14:08 |
sean-k-mooney | ya so we did a release yesterday of os-vif | 14:08 |
sean-k-mooney | the thing is i don't know if this is intermittent or if it always happens | 14:08 |
sean-k-mooney | i think the issue is caused by pyroute2 however | 14:09 |
*** rossella_s has joined #openstack-infra | 14:09 | |
fungi | the other litmus test is that appending AND build_status:"SUCCESS" returns 0 hits, which seems to be the case | 14:09 |
fungi | so we know this pattern is present only in failed job runs | 14:09 |
sean-k-mooney | cool so this is the tracking bug https://bugs.launchpad.net/os-vif/+bug/1807949 | 14:10 |
openstack | Launchpad bug 1807949 in os-vif "os_vif error: [Errno 24] Too many open files" [High,Triaged] - Assigned to sean mooney (sean-k-mooney) | 14:10 |
*** psachin has joined #openstack-infra | 14:11 | |
sean-k-mooney | if i add tags:"screen-n-cpu.txt" AND message:"OSError: [Errno 24] Too many open files" AND module:"os_vif" AND loglevel: "ERROR" as the query in the elastic-recheck repo, the file just needs the same name as the bug number, right? | 14:11 |
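For reference, a minimal sketch of what such a query file could look like, assuming the usual elastic-recheck convention of one YAML file per bug in the queries/ directory, named after the Launchpad bug number:

```yaml
# queries/1807949.yaml (illustrative sketch based on the query above)
query: >-
  tags:"screen-n-cpu.txt" AND
  message:"OSError: [Errno 24] Too many open files" AND
  module:"os_vif" AND
  loglevel:"ERROR"
```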
*** dave-mccowan has quit IRC | 14:14 | |
*** quiquell is now known as quiquell|lunch | 14:18 | |
*** ykarel|afk has joined #openstack-infra | 14:18 | |
*** smarcet has joined #openstack-infra | 14:20 | |
smarcet | fungi: clarkb: morning, as i mentioned before, we need to migrate openstackid to the latest Laravel version (5.6) and migrate the puppet to start using php 7.x. you mentioned that the newest ubuntu version you support is xenial, but xenial by default only supports php 7.0 and i have a hard requirement for PHP >= 7.1.3 because of https://laravel.com/docs/5.6 | 14:22 |
smarcet | is it possible for me to update the puppet to use the ppa:ondrej/php PPA and be able to use php 7.2? | 14:22 |
*** rossella_s has quit IRC | 14:24 | |
fungi | i guess laravel has decided they don't support any ubuntu lts other than the latest one at this point? bionic (18.04 lts) seems to have php 7.2 but we currently have problems using puppet on it and are looking at solutions for deploying containerized services on bionic as a result | 14:26 |
fungi | smarcet: given that ppa:ondrej/php is maintained by one of the official ubuntu php package maintainers, it seems like a safe enough compromise | 14:27 |
fungi | i guess this is his alternative to getting the php7.2 packages into xenial-backports | 14:28 |
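Enabling that PPA on a xenial host would look roughly like the following; a sketch only, and the php7.2 package names are assumptions (the exact set of extensions depends on what the openstackid puppet module ends up requiring):

```shell
# illustrative only: enable the ondrej/php PPA on Ubuntu 16.04 (xenial)
sudo add-apt-repository -y ppa:ondrej/php
sudo apt-get update
# package names assumed; install whichever php7.2 pieces the app needs
sudo apt-get install -y php7.2 php7.2-fpm php7.2-cli
```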
*** quiquell|lunch is now known as quiquell | 14:30 | |
openstackgerrit | sean mooney proposed openstack-infra/elastic-recheck master: add query for os-vif pyroute2 open files https://review.openstack.org/624412 | 14:30 |
smarcet | fungi: ok cool, if that's ok then i will update the puppet to work that way. may i ask you to remove the openstackid production server from the puppet agent runs, so i can test on the dev server? | 14:31 |
*** rossella_s has joined #openstack-infra | 14:31 | |
*** udesale has joined #openstack-infra | 14:33 | |
fungi | #status log added openstackid.org to the emergency disable list while smarcet tests out php7.2 on openstackid-dev.openstack.org | 14:34 |
openstackstatus | fungi: finished logging | 14:34 |
fungi | smarcet: i see that we're still running ubuntu trusty (14.04 lts) on both of those servers too | 14:35 |
fungi | maybe this is an opportunity to rebuild them on xenial (16.04 lts) too? | 14:35 |
*** smarcet has quit IRC | 14:36 | |
*** ykarel|afk is now known as ykarel | 14:37 | |
*** smarcet has joined #openstack-infra | 14:38 | |
smarcet | fungi: yes of course | 14:38 |
smarcet | i will test that and we could try first on dev server :) | 14:38 |
smarcet | thx u | 14:38 |
*** rossella_s has quit IRC | 14:39 | |
*** e0ne has joined #openstack-infra | 14:45 | |
*** rossella_s has joined #openstack-infra | 14:46 | |
*** gfidente has quit IRC | 14:59 | |
*** eharney has joined #openstack-infra | 15:00 | |
*** markvoelker has joined #openstack-infra | 15:00 | |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: WIP - Pagure driver https://review.openstack.org/604404 | 15:05 |
*** psachin has quit IRC | 15:05 | |
*** smarcet has quit IRC | 15:09 | |
*** oanson has joined #openstack-infra | 15:17 | |
*** smarcet has joined #openstack-infra | 15:20 | |
*** eharney_ has joined #openstack-infra | 15:23 | |
*** agopi has quit IRC | 15:24 | |
*** eharney has quit IRC | 15:26 | |
*** eharney_ is now known as eharney | 15:27 | |
*** agopi has joined #openstack-infra | 15:29 | |
*** geguileo has joined #openstack-infra | 15:31 | |
geguileo | dmsimard: hi, I'm trying to run this playbook https://review.openstack.org/#/c/620671/7/playbooks/cinderlib/run.yaml | 15:32 |
geguileo | dmsimard: and it's being called from here https://review.openstack.org/#/c/620671/7/playbooks/legacy/cinder-tempest-dsvm-lvm-lio-barbican/run.yaml | 15:32 |
*** bobh has quit IRC | 15:32 | |
geguileo | dmsimard: and I'm running into this error http://logs.openstack.org/71/620671/7/check/cinder-tempest-dsvm-lvm-lio-barbican/6de7951/job-output.txt.gz#_2018-12-04_19_52_26_753969 | 15:32 |
geguileo | dmsimard: which is a little opaque for me | 15:33 |
dmsimard | geguileo: there's a bit more info in the ara report: http://logs.openstack.org/71/620671/7/check/cinder-tempest-dsvm-lvm-lio-barbican/6de7951/ara-report/result/abc1dc34-2d56-43e9-9c11-730cf6ec8d1d/ | 15:33 |
dmsimard | (from http://logs.openstack.org/71/620671/7/check/cinder-tempest-dsvm-lvm-lio-barbican/6de7951/ara-report/ ) | 15:33 |
geguileo | dmsimard: thanks! | 15:34 |
dmsimard | does that directory exist or not ? there's the notion of sudoers in your playbook -- do the tests need to run with superuser privileges ? | 15:34 |
geguileo | dmsimard: how can I know where devstack is installed? | 15:34 |
*** agopi has quit IRC | 15:35 | |
dmsimard | geguileo: the devstack installation occurs in a previous task: http://logs.openstack.org/71/620671/7/check/cinder-tempest-dsvm-lvm-lio-barbican/6de7951/ara-report/result/b1365e39-3d97-48e5-a474-e65e50aba1ff/ | 15:36 |
dmsimard | I'm not super familiar with devstack but it looks like there's stuff in /opt/stack for sure | 15:37 |
*** bobh has joined #openstack-infra | 15:38 | |
*** ykarel is now known as ykarel|away | 15:38 | |
geguileo | dmsimard: thanks | 15:39 |
geguileo | dmsimard: I'll try to figure out if there's a variable with the directory | 15:39 |
*** bobh has quit IRC | 15:40 | |
*** neilsun has quit IRC | 15:41 | |
openstackgerrit | Chris Dent proposed openstack-infra/project-config master: Change os-resource-classes and os-traits acl config to placement https://review.openstack.org/624387 | 15:47 |
*** gfidente has joined #openstack-infra | 15:52 | |
*** wolverineav has joined #openstack-infra | 15:54 | |
*** markvoelker has quit IRC | 15:55 | |
clarkb | our inap images are all up to date now | 15:56 |
*** markvoelker has joined #openstack-infra | 15:56 | |
*** bobh has joined #openstack-infra | 15:57 | |
*** tpsilva has joined #openstack-infra | 15:58 | |
*** smarcet has quit IRC | 15:58 | |
*** wolverineav has quit IRC | 15:58 | |
*** smarcet has joined #openstack-infra | 15:59 | |
*** markvoelker has quit IRC | 16:01 | |
*** bobh has quit IRC | 16:01 | |
*** ccamacho has quit IRC | 16:09 | |
*** jamesmcarthur has joined #openstack-infra | 16:10 | |
*** udesale has quit IRC | 16:14 | |
*** bobh has joined #openstack-infra | 16:20 | |
*** bhavikdbavishi has joined #openstack-infra | 16:24 | |
*** bhavikdbavishi has quit IRC | 16:25 | |
*** bhavikdbavishi has joined #openstack-infra | 16:31 | |
*** e0ne has quit IRC | 16:36 | |
*** sean-k-mooney has quit IRC | 16:43 | |
*** quiquell is now known as quiquell|off | 16:48 | |
*** sean-k-mooney has joined #openstack-infra | 16:49 | |
*** eharney has quit IRC | 16:51 | |
*** d0ugal has quit IRC | 16:56 | |
*** bhavikdbavishi1 has joined #openstack-infra | 16:58 | |
*** kjackal has quit IRC | 16:59 | |
*** kjackal has joined #openstack-infra | 17:00 | |
*** bhavikdbavishi1 has quit IRC | 17:00 | |
*** bhavikdbavishi has quit IRC | 17:02 | |
*** bhavikdbavishi has joined #openstack-infra | 17:05 | |
*** rossella_s has quit IRC | 17:07 | |
*** eharney has joined #openstack-infra | 17:07 | |
*** jamesmcarthur has quit IRC | 17:13 | |
*** yamamoto has quit IRC | 17:19 | |
*** zul has quit IRC | 17:20 | |
*** gyee has joined #openstack-infra | 17:22 | |
*** jpich has quit IRC | 17:24 | |
clarkb | A lot of email to get through this morning. Probably a fairly slow start for me today between that and our meeting | 17:35 |
*** ykarel|away has quit IRC | 17:35 | |
*** pgaxatte has quit IRC | 17:37 | |
fungi | mordred: corvus: clarkb: jpmaxman is hacking on a gerrit backend driver for netlify cms and interested in having a repo in our gerrit for some test content. any concerns? | 17:43 |
clarkb | fungi: could possibly reuse the sandbox repo? (though that might get abused). I don't see any issues with having a test repo | 17:43 |
fungi | yeah, i figure it might be cleaner to use a dedicated repo and then just retire it once no longer needed (or keep it around for similar future sorts of netlify backend testing) | 17:44 |
fungi | i think he wants to be able to test-drive it with zuul doing gating of content changes and stuff | 17:45 |
fungi | which is why i didn't suggest just using the official gerrit container to test with | 17:45 |
*** JpMaxMan has joined #openstack-infra | 17:45 | |
corvus | fungi: no objection here | 17:46 |
corvus | and also, now that i've read all the requirements -- no better ideas :) | 17:46 |
*** xarses has joined #openstack-infra | 17:46 | |
*** sshnaidm is now known as sshnaidm|afk | 17:47 | |
fungi | and exciting as this may mean easier collaboration on site content for zuul-ci.org and opendev.org | 17:47 |
*** xarses has quit IRC | 17:47 | |
JpMaxMan | yes that's the dream :) | 17:48 |
*** xarses has joined #openstack-infra | 17:48 | |
fungi | clarkb: should it just go in the openstack-infra namespace? seems more related to infra/opendev efforts than to openstack anyway, even if it's not something that would necessarily be an official deliverable repo of the infra team | 17:49 |
clarkb | that is fine with me. | 17:50 |
JpMaxMan | right now we're just working up a POC using the starlingx site as it is already in netlify | 17:50 |
JpMaxMan | https://github.com/StarlingXWeb/starlingx-website | 17:52 |
fungi | JpMaxMan: want me to get the project-config change going to create the repository? do you want starlingx-website imported as the initial repository content? | 17:53 |
*** lpetrut has quit IRC | 17:56 | |
JpMaxMan | I'm happy to take a stab at it - and yes we'd start with the starlingx-website as an initial repo. | 17:56 |
*** bobh has quit IRC | 17:56 | |
*** gyee has quit IRC | 17:56 | |
fungi | JpMaxMan: in that case we have instructions at https://docs.openstack.org/infra/manual/creators.html and are happy to help answer any questions you have | 17:57 |
fungi | JpMaxMan: i recommend something like openstack-infra/netlify-sandbox to fit with existing naming conventions for other repos in our gerrit | 17:57 |
JpMaxMan | excellent! Thank you - will let you know as I proceed. And yes, any naming conventions suggestions welcome - will use that to start :) | 17:58 |
fungi | note that a lot of what's in there isn't relevant for this particular case so you'll end up skipping some of it (e.g., anything having to do with pypi) | 17:59 |
fungi | and if you miss something or include something unnecessary, that's why we have automated checks and reviewers | 18:00 |
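To give a rough idea of scope, the Gerrit-side part of that process is mostly a short stanza in project-config's gerrit/projects.yaml; a sketch only, with the description and upstream import taken from the discussion above rather than from an actual change:

```yaml
# gerrit/projects.yaml (illustrative sketch)
- project: openstack-infra/netlify-sandbox
  description: Sandbox repository for testing a Netlify CMS Gerrit backend
  upstream: https://github.com/StarlingXWeb/starlingx-website
```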
*** derekh has quit IRC | 18:00 | |
*** aojea has quit IRC | 18:01 | |
clarkb | unrelated but it is really cool that university researchers are starting to figure out we've got all this real world data freely available for research on software development process activity | 18:01 |
*** trown is now known as trown|lunch | 18:02 | |
JpMaxMan | ok good to know - I'll give the automation a run for its money :P | 18:02 |
fungi | we all do | 18:02 |
fungi | clarkb: yes, i love that academic research sees our work as a gold mine of behavioral (both human and systems) data | 18:06 |
*** dtantsur is now known as dtantsur|afk | 18:07 | |
clarkb | mwhahaha: ssbarnea|rover EmilienM I'm still in a "trying to better understand the failures we are experiencing" state, and looking at http://logs.openstack.org/22/605722/2/gate/tripleo-ci-centos-7-undercloud-containers/d1a7140/logs/undercloud/ I see the undercloud failed while configuring keepalived? Having a hard time seeing why/where keepalived failed. Can you help me find the appropriate | 18:07 |
clarkb | logs? | 18:07 |
mwhahaha | clarkb: error mounting image volumes: unable to find user root: no matching entries in passwd file | 18:07 |
mwhahaha | is a bug in podman (probably runc) | 18:08 |
clarkb | mwhahaha: which logfile do I look in for that? | 18:08 |
mwhahaha | http://logs.openstack.org/22/605722/2/gate/tripleo-ci-centos-7-undercloud-containers/d1a7140/logs/undercloud/home/zuul/undercloud_install.log.txt.gz#_2018-12-11_17_23_18 | 18:09 |
mwhahaha | https://bugs.launchpad.net/tripleo/+bug/1803544 | 18:09 |
openstack | Launchpad bug 1803544 in tripleo "unable to find user root: no matching entries in passwd file" [High,Triaged] | 18:09 |
*** Swami has joined #openstack-infra | 18:09 | |
clarkb | aha I needed to scroll up for more ERROR messages. Thank you | 18:09 |
mwhahaha | http://status.openstack.org/elastic-recheck/index.html#1803544 | 18:10 |
*** e0ne has joined #openstack-infra | 18:10 | |
mwhahaha | we're trying to figure it out, it's one of those really obscure bugs | 18:10 |
* mwhahaha wanders off | 18:10 | |
clarkb | cool so its being tracked already. Thanks | 18:10 |
EmilienM | clarkb: hi, yes I'm working with the podman team today and we have a fix already : https://github.com/containers/libpod/pull/1978 | 18:11 |
EmilienM | clarkb: I'm working on getting the fix merged and built asap... | 18:12 |
*** gfidente has quit IRC | 18:12 | |
clarkb | EmilienM: good to know. FWIW not singling out this specific bug, I was just going through and trying to find the breadcrumbs and got lost. Thank you for pointing me at the other error messages and the bug and the fix | 18:12 |
clarkb | (this is me trying to better understand the variety of testing we run so that the infra team can help debug and/or fix things when it is on our end) | 18:12 |
EmilienM | yeah it makes sense | 18:13 |
clarkb | mriedem: http://logs.openstack.org/76/582376/8/gate/tempest-full-py3/a8f62b6/job-output.txt.gz#_2018-12-11_10_50_01_185172 is that one you recognize? looks like either the test node ran out of disk or the devstack test flavor is too small for cirros | 18:16 |
clarkb | unfortunately dstat doesn't capture disk usage | 18:17 |
clarkb | http://logs.openstack.org/76/582376/8/gate/tempest-full-py3/a8f62b6/controller/logs/df.txt.gz (from whenever devstack runs that df) indicates we have a lot of disk there though | 18:18 |
fungi | clarkb: could also be bubbling up from lack of disk space at the hypervisor layer, though that build was in inap-mtl01 which isn't somewhere we've seen disk issues like that in the past as far as i'm aware | 18:18 |
clarkb | fungi: ya the df shows we have 150GB of disk which is a lot more than we promise to have | 18:18 |
clarkb | maybe someone can boot the cirros image and check how much disk it ends up using (or is that something we can ask qemu-img) | 18:19 |
clarkb | oh you mean the hypervisor in inap, thats a good point | 18:19 |
*** wolverineav has joined #openstack-infra | 18:19 | |
clarkb | sorry misread it as the test node being cirros' hypervisor | 18:19 |
fungi | yeah, i can see now how that might have been vague on my part | 18:19 |
fungi | the provider's hypervisor layer/compute host | 18:19 |
fungi | not devstack's hypervisor layer | 18:20 |
clarkb | yup | 18:20 |
fungi | i think enospc gets plumbed up into the guest anyway | 18:20 |
clarkb | let us see what logstash says. If it's inap specific then ya, probably a full hypervisor. If we have more occurrences across clouds then maybe cirros is too big | 18:20 |
fungi | good thinkin | 18:20 |
*** _alastor_ has joined #openstack-infra | 18:22 | |
*** d0ugal has joined #openstack-infra | 18:22 | |
clarkb | there is a blip of it in inap on the 11th. Then a smaller blip in rax-iad | 18:23 |
clarkb | though I'm only searching recent days /me expands search | 18:23 |
*** wolverineav has quit IRC | 18:24 | |
clarkb | it happens in rax-iad, ord, inap and ovh gra1 | 18:24 |
clarkb | inap is about 2/3 of the occurrences and rax ord half that | 18:24 |
clarkb | mgagne_: ^ if it is easy for you to check, any idea what disk pressure looks like on those hypervisors? Also thank you for the image upload fix. Our images are up to date now | 18:25 |
*** wolverineav has joined #openstack-infra | 18:26 | |
*** wolverineav has quit IRC | 18:27 | |
*** wolverineav has joined #openstack-infra | 18:27 | |
*** rkukura_ has joined #openstack-infra | 18:32 | |
*** rkukura has quit IRC | 18:32 | |
*** rkukura_ is now known as rkukura | 18:32 | |
mgagne_ | clarkb: didn't check all hypervisors but disk is far from being full. and now going into a meeting. | 18:33 |
openstackgerrit | Clark Boylan proposed openstack-infra/elastic-recheck master: Add query for bug 1808010 https://review.openstack.org/624458 | 18:35 |
openstack | bug 1808010 in OpenStack-Gate "Tempest cirros boots fail due to lack of disk space" [Undecided,New] https://launchpad.net/bugs/1808010 | 18:35 |
clarkb | mgagne_: thanks | 18:35 |
clarkb | started tracking it ^ there | 18:35 |
clarkb | mriedem: ^ fyi | 18:35 |
*** trown|lunch is now known as trown | 18:36 | |
*** rfolco is now known as rfolco_brb | 18:38 | |
fungi | we're now down to 15 zuul mergers, and the merger queue seems to be getting backed up more often (though still clears fairly quickly) | 18:40 |
*** jpena is now known as jpena|off | 18:40 | |
clarkb | we expect 20 right? 12 executors + 8 dedicated mergers | 18:42 |
fungi | we don't seem to register mergers distinctly in gearman, they just show up in the merger:merge, merger:refstate, merger:fileschanges and merger:cat buckets so hard to tell which ones are missing | 18:42 |
fungi | yeah, should be 20 | 18:42 |
mriedem | clarkb: ack, | 18:43 |
mriedem | note that until https://review.openstack.org/#/c/619319/ | 18:43 |
mriedem | the flavors used by tempest via devstack specify 0 root_gb, | 18:43 |
mriedem | meaning compute uses whatever is the size of the image | 18:43 |
*** Adri2000 has quit IRC | 18:43 | |
clarkb | mriedem: possible that the image is too small for some of the writes then? you'd expect that to be more consistent though, so maybe it does point to the test node or host hypervisor | 18:44 |
fungi | it was 20 mergers registered for just a split second back on the 6th/7th (when we brought ze12 into production right after restarting everything): http://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=30&fullscreen&orgId=1&from=now-7d&to=now | 18:44 |
fungi | looks like we were already down 2 before the restarts, so probably been going on for a while | 18:44 |
mriedem | clarkb: hmm, maybe, not sure what size the config drive is | 18:45 |
mriedem | looks like vfat is a fixed 64MB | 18:46 |
mriedem | but we don't use vfat by default | 18:46 |
fungi | seems two died around utc midnight on november 13th, prior to that we were running with a full complement since the beginning of october at least, so maybe we added something in early-to-mid november which made merger threads crashy? | 18:47 |
mriedem | oh wait is this config drive in the test node or a nested virt guest created by tempest? | 18:47 |
*** armax has joined #openstack-infra | 18:47 | |
clarkb | mriedem: this is the cirros nested "virt" guest created by tempest failing to configure networking because its disk is full (now that could be because the hypervisor running devstack has a full disk, or the hypervisor running the test node is itself running with a full disk) | 18:48 |
mriedem | looks like this is by far happening in networking-odl-tempest-fluorine | 18:48 |
clarkb | in particular it appears that it can't set the default route (I'm guessing because that needs disk to write to) | 18:48 |
*** Adri2000 has joined #openstack-infra | 18:48 | |
clarkb | and without a default route it seems that ssh is failing from tempest to the cirros node | 18:49 |
clarkb | ~10 minutes to the infra meeting | 18:49 |
fungi | `pgrep -c zuul-executor` returns "2" on all the executors | 18:50 |
clarkb | fungi: possibly the dedicated mergers have died? | 18:50 |
clarkb | or maybe haven't reconnected to gearman after restarting the scheduler? | 18:50 |
fungi | `pgrep -c zuul-merger` returns "1" on all the standalone mergers | 18:50 |
fungi | also we seem to lose mergers one or two at a time, over time, according to the graph | 18:51 |
fungi | not corresponding to scheduler/geard restarts | 18:51 |
fungi | we'll likely need to dig into merger logs on the servers to find out what's going on | 18:52 |
*** smarcet has quit IRC | 18:52 | |
*** bhavikdbavishi has quit IRC | 18:52 | |
clarkb | fungi: check netstat connections to 4730 on all of the executors and mergers? | 18:52 |
clarkb | and vice versa from the gearman scheduler | 18:52 |
clarkb | that should narrow down where the connections don't exist | 18:52 |
*** bhavikdbavishi has joined #openstack-infra | 18:53 | |
fungi | good idea | 18:53 |
*** _alastor_ has quit IRC | 18:53 | |
fungi | odd that some are ipv4 and some v6 | 18:53 |
fungi | wonder if this is network instability in rax-dfw at play | 18:53 |
clarkb | I did confirm with the logstash switch to just geard that gear will fall back appropriately | 18:54 |
clarkb | https://review.openstack.org/#/c/611920/ was another output of that to make geard a bit more ipv6 friendly | 18:55 |
clarkb | mriedem: actually /run on cirros isn't necessarily a real fs either. It is possible that it is tmpfs or similar, in which case it could be memory pressure? | 18:56 |
fungi | confirmed that all executors see 2 established gearman connections and all standalone mergers 1 | 18:56 |
clarkb | I probably need to boot a cirros image locally | 18:56 |
fungi | will check from the scheduler end now | 18:56 |
*** rlandy is now known as rlandy|brb | 18:59 | |
mriedem | clarkb: on one of the failures i looked at, the config drive was .5 MB | 18:59 |
*** _alastor_ has joined #openstack-infra | 18:59 | |
openstackgerrit | Merged openstack-infra/nodepool master: Fix race in test_handler_poll_session_expired https://review.openstack.org/623269 | 19:00 |
fungi | clarkb: i think that got it: http://paste.openstack.org/show/737045/ (we have 5 executors showing only 1 established gearman connection on the geard end) | 19:01 |
fungi | ze02, ze03, ze07, ze08 and ze11 seem to have probably lost their gearman connections for their merger threads | 19:01 |
fungi | Shrews: this may also be up your alley to dig into once you get working internets again | 19:03 |
fungi | i guess we don't get separate merger logs on the executors, the messages are just all mixed into the executor logs? | 19:04 |
*** bobh has joined #openstack-infra | 19:07 | |
*** jamesmcarthur has joined #openstack-infra | 19:09 | |
*** wolverineav has quit IRC | 19:09 | |
Shrews | fungi: the gearman stuff? maybe alley-adjacent :) i can help you poke around in logs after the meeting | 19:10 |
*** wolverineav has joined #openstack-infra | 19:10 | |
fungi | yeah, no rush | 19:10 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix node leak when skipping child jobs https://review.openstack.org/613261 | 19:10 |
*** bhavikdbavishi has quit IRC | 19:11 | |
fungi | checking ze02 for a start, seems it logged zuul.Merger entries other than "Updating local repository" up until 2018-11-28 23:03:52,646 and then abruptly ceased | 19:11 |
openstackgerrit | Merged openstack-infra/elastic-recheck master: Add query for bug 1808010 https://review.openstack.org/624458 | 19:12 |
openstack | bug 1808010 in OpenStack-Gate "Tempest cirros boots fail due to lack of disk space" [Undecided,New] https://launchpad.net/bugs/1808010 | 19:12 |
*** jamesmcarthur has quit IRC | 19:14 | |
*** rlandy|brb is now known as rlandy | 19:14 | |
*** yamamoto has joined #openstack-infra | 19:17 | |
*** xarses has quit IRC | 19:19 | |
*** _alastor_ has quit IRC | 19:25 | |
*** electrofelix has quit IRC | 19:26 | |
*** xarses has joined #openstack-infra | 19:27 | |
*** lbragstad has quit IRC | 19:30 | |
*** lbragstad has joined #openstack-infra | 19:31 | |
fungi | digging into the log around that time, i don't see any exceptions/tracebacks | 19:32 |
fungi | current theory: network issues resulted in geard dropping the connection from the merger, but the merger on the executor still thinks the socket is established. lack of keepalive/dpd(?) means the merger thread is humming along blissfully unaware that it will never see any new requests | 19:35 |
*** shardy has quit IRC | 19:37 | |
*** wolverineav has quit IRC | 19:37 | |
fungi | related question: why is this only affecting the tag-along merger threads on the executors and not the stand-alone merger daemons? | 19:40 |
fungi | we've lost 25% of our mergers, and none are stand-alone even though those account for 40% of the total | 19:40 |
fungi | statistically unlikely it's random distribution there | 19:41 |
*** jamesmcarthur has joined #openstack-infra | 19:43 | |
*** wolverineav has joined #openstack-infra | 19:46 | |
*** smarcet has joined #openstack-infra | 19:47 | |
*** jamesmcarthur has quit IRC | 19:47 | |
*** wolverineav has quit IRC | 19:48 | |
*** wolverineav has joined #openstack-infra | 19:49 | |
tobiash | fungi: related to your theory: https://review.openstack.org/599567 | 19:52 |
tobiash | fungi: we observed the same after a vm crash hosting the scheduler/geard | 19:52 |
fungi | tobiash: thanks!!! that's indeed interesting | 19:53 |
fungi | tobiash: any idea why it might affect the merger threads on our executors but not affect our stand-alone mergers? | 19:53 |
tobiash | fungi: that's just co-incidence, on our scheduler crash it affected *all* mergers | 19:54 |
fungi | got it, thanks again | 19:54 |
tobiash | fungi: the point is that if a merge was in progress while having network issues, the merger will try to send the result and notice that the connection is broken while an idle merger won't notice it | 19:54 |
fungi | makes sense. perhaps our stand-alone mergers are more active than our tag-along mergers | 19:55 |
fungi | and so statistically more likely to be in the middle of something when the disconnect occurs, so notice and reconnect | 19:56 |
fungi | no idea if our data backs that up, but one possible explanation anyway | 19:56 |
tobiash | maybe | 19:56 |
openstackgerrit | Chris Dent proposed openstack-infra/project-config master: Change os-resource-classes and os-traits acl config to placement https://review.openstack.org/624387 | 19:57 |
*** wolverineav has quit IRC | 19:58 | |
corvus | tobiash, fungi: don't we have keepalives on the server? shouldn't that be enough? | 19:59 |
tobiash | corvus: no, because an idle merger won't notice until it tries to send something | 19:59 |
fungi | corvus: if we do, then i'm indeed curious why it's not helping | 19:59 |
tobiash | so you need keepalive in both directions | 19:59 |
corvus | tobiash: oh, i get it. thanks :) | 20:00 |
tobiash | corvus: the server correctly notices that the client is gone, so that's fine | 20:00 |
fungi | we definitely seem to have connections which are marked as established on the client but absent on the server | 20:00 |
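For illustration, enabling TCP keepalive on a client socket in Python looks roughly like this; a generic sketch of the mechanism the gear change adds rather than gear's actual API, with an assumed hostname and arbitrary interval values:

```python
import socket

# hypothetical scheduler address, for illustration only
sock = socket.create_connection(("zuul-scheduler.example.org", 4730))

# ask the kernel to probe an otherwise idle connection so a dead peer is
# noticed even if this side never tries to send anything itself
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Linux-specific tuning: first probe after 60s idle, then every 30s,
# give up (and error out the socket) after 5 unanswered probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
```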
clarkb | corvus: ianw re https://review.openstack.org/#/c/605585/14 I left a comment on what I think is the issue and how to fix it. Do you think that fix is reasonable? if so I can get it up pretty quickly | 20:00 |
fungi | tobiash: yep, that's i think what we're seeing here then | 20:00 |
clarkb | oh wait there is another issue too | 20:01 |
corvus | tobiash, fungi: +3 | 20:01 |
fungi | thanks! | 20:01 |
tobiash | corvus, fungi: the according zuul change is 599567 (which needs an update to the requirements after a geard release) | 20:01 |
tobiash | corvus: thanks :) | 20:01 |
corvus | clarkb: yep; i think you or i may have suggested that originally too | 20:02 |
tobiash | er 599573 | 20:02 |
fungi | i'm just glad this is probably explained (and even known) and i can hopefully stop worrying about the cause now ;) | 20:02 |
clarkb | corvus: just left a second comment on another failure | 20:02 |
clarkb | corvus: this one will need a little more thought but I think we can safely converge that rule across our control plane | 20:02 |
corvus | clarkb: that's very amusing, btw -- this was my yesterday: https://review.openstack.org/619643 | 20:03 |
clarkb | ha | 20:04 |
ianw | clarkb: hrm, FORWARD DROP seems safer anyway? | 20:04 |
clarkb | ianw: ya I think FORWARD DROP is currently the more correct rule for how we use our nodes | 20:04 |
corvus | clarkb: but i agree that -- at least until we're running our own kubernetes clusters on top of our normal infrastructure, that should be fine | 20:05 |
*** zul has joined #openstack-infra | 20:05 | |
clarkb | it's possible that kubernetes, if we switch to it, will change that, as corvus has found (docker wants it set to DROP as well, then very carefully punches holes for what it passes through NAT) | 20:05 |
clarkb | since we'll run docker with the host network namespace it's a noop for our docker | 20:05 |
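The policy being discussed is just the default for the FORWARD chain; a minimal sketch of setting it by hand (the puppet/ansible rules in the reviews above would express the same thing declaratively):

```shell
# default-deny forwarded traffic; hosts that do not route or NAT for
# other machines should not notice any difference
sudo iptables -P FORWARD DROP
sudo ip6tables -P FORWARD DROP
```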
corvus | clarkb: re https://review.openstack.org/624246 maybe we should just do it in project-config? | 20:09 |
clarkb | corvus: ya we could add stub projects for the tripleo repos | 20:09 |
clarkb | and project config is listed first so will win right? | 20:09 |
corvus | yep | 20:09 |
* corvus lunches | 20:10 | |
openstackgerrit | Merged openstack-infra/gear master: Add support for keepalive to client https://review.openstack.org/599567 | 20:11 |
clarkb | mriedem: that cirros instance seems to boot with 64MB of ram according to http://logs.openstack.org/76/582376/8/gate/tempest-full-py3/a8f62b6/controller/logs/libvirt/qemu/instance-00000022_log.txt.gz | 20:16 |
clarkb | (I think I mapped the instance id properly from the console log) | 20:17 |
*** smarcet has quit IRC | 20:17 | |
mriedem | is that what -m 64 is? | 20:17 |
clarkb | which seems to be the m1.nano flavor. I'm going to boot cirros here with 64MB memory and see if it's unhappy | 20:18 |
clarkb | mriedem: ya | 20:18 |
fungi | i would rather plan n64 | 20:18 |
fungi | er, play n64 | 20:18 |
mriedem | only if not bond | 20:18 |
mriedem | clarkb: yeah http://logs.openstack.org/76/582376/8/gate/tempest-full-py3/a8f62b6/controller/logs/devstacklog.txt.gz#_2018-12-11_10_39_47_721 | 20:18 |
*** yamamoto has quit IRC | 20:19 | |
mriedem | i *think* i might have gotten to the bottom of this multiattach swap volume multinode race bug... | 20:19 |
mriedem | oh it would be so sweet | 20:19 |
*** tobiash has quit IRC | 20:19 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Add a script to generate the static inventory https://review.openstack.org/622964 | 20:20 |
clarkb | nope I'm not going to boot cirros locally because apparmor says libvirtd is not allowed to start | 20:20 |
fungi | it knows best | 20:21 |
ianw | clarkb: ^^ the inventory script was a little too bare-bones i think, suggested updates | 20:21 |
clarkb | anyone have a quick easy way to boot https://download.cirros-cloud.net/0.3.5/cirros-0.3.5-x86_64-disk.img locally under qemu/kvm with 64MB memory to see if /run is a tmpfs or similar? | 20:21 |
clarkb | I want to rule out that the low memory environment is itself the source of the cp errors | 20:21 |
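One way to do that locally, assuming KVM is available and the 0.3.5 image has been downloaded to the current directory (a sketch; cirros prints its console on the serial port, so -nographic is enough to watch it boot and log in):

```shell
# boot the cirros image with 64MB of RAM, roughly matching m1.nano
qemu-system-x86_64 -enable-kvm -m 64 -nographic \
    -drive file=cirros-0.3.5-x86_64-disk.img,format=qcow2,if=virtio
# once logged in as the cirros user, check how /run is mounted:
#   mount | grep /run
#   df -h /run
```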
*** mriedem has quit IRC | 20:22 | |
*** tobiash has joined #openstack-infra | 20:23 | |
*** bobh has quit IRC | 20:23 | |
openstackgerrit | Jean-Philippe Evrard proposed openstack-infra/zuul-jobs master: Add docker insecure registries feature https://review.openstack.org/624484 | 20:23 |
clarkb | I'm going to find lunch, then maybe when I get back I'll figure out apparmor | 20:23 |
*** mriedem has joined #openstack-infra | 20:24 | |
fungi | clarkb: also board meeting at 2100z if you are interested in dialling in | 20:24 |
clarkb | ya Ill have that in the background likely | 20:25 |
*** bobh has joined #openstack-infra | 20:26 | |
openstackgerrit | Jean-Philippe Evrard proposed openstack-infra/zuul-jobs master: Add docker insecure registries feature https://review.openstack.org/624484 | 20:26 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Use gearman client keepalive https://review.openstack.org/599573 | 20:32 |
*** eharney has quit IRC | 20:34 | |
*** wolverineav has joined #openstack-infra | 20:36 | |
*** wolverineav has quit IRC | 20:36 | |
*** wolverineav has joined #openstack-infra | 20:36 | |
frickler | clarkb: tmpfs on /run type tmpfs (rw,nosuid,relatime,size=200k,mode=755) | 20:46 |
frickler | clarkb: so that is bound to fail if the config drive contains > 200k data | 20:47 |
clarkb | frickler: thanks I think that means maybe 64MB isnt big enough | 20:47 |
clarkb | ya | 20:47 |
frickler | 64MB is pretty huge compared to that | 20:47 |
clarkb | oh ya 200k | 20:47 |
clarkb | mriedem: ^ fyi | 20:48 |
*** hamerins has joined #openstack-infra | 20:48 | |
*** d0ugal has quit IRC | 20:48 | |
*** eharney has joined #openstack-infra | 20:49 | |
fungi | why would we create the tmpfs in /run anyway? that's supposed to just be for things like pidfiles during early boot | 20:49 |
mriedem | clarkb: but the 64MB here http://logs.openstack.org/76/582376/8/gate/tempest-full-py3/a8f62b6/controller/logs/libvirt/qemu/instance-00000022_log.txt.gz is the root disk, not the config drive | 20:49 |
fungi | er, i mean create the configdrive in /run | 20:49 |
clarkb | mriedem: it's ram, but that may be orthogonal if the tmpfs is that small | 20:50 |
mriedem | oh right, was thinking root disk, nvm | 20:50 |
clarkb | a 200kb tmpfs is pretty tiny | 20:50 |
fungi | we really should never use /run for *anything* | 20:50 |
clarkb | fungi: thats likely cirros/smoser | 20:51 |
frickler | it's the cirros init script that uses it | 20:51 |
clarkb | since it doesnt run glean or cloud init it does its own thing | 20:51 |
fungi | it's for pidfiles and fifos for services starting before /var/run is available | 20:51 |
clarkb | does the 4.0 image change that, I wonder | 20:52 |
fungi | and if you want a reasonable-sized tmpfs for data you generally mount one yourself (like on /tmp) | 20:52 |
clarkb | could be a reason to switch if so | 20:52 |
clarkb | frickler: ^ maybe you can check the newer 4.0 image too? | 20:52 |
*** fuentess has joined #openstack-infra | 20:52 | |
frickler | cirros 0.4 doesn't work in devstack last I checked, so not a short term option | 20:54 |
clarkb | ah | 20:54 |
frickler | I'm more wondering why the config-drive gets so large | 20:54 |
clarkb | mriedem may know | 20:56 |
clarkb | we did add a debugging script in tempest as user data | 20:56 |
clarkb | its not huge but could contribute maybe | 20:56 |
*** bobh has quit IRC | 20:56 | |
clarkb | also, the reason not setting the route matters is that we ssh via the fip | 20:57 |
clarkb | so it isn't shared l2 from cirros' perspective | 20:57 |
mriedem | clarkb: not sure, wondering if something changed in tempest recently | 20:57 |
*** rfolco_brb has quit IRC | 20:57 | |
frickler | clarkb: oh, where was that script added? "df -h /run" gives me 92% used, 16k free, so not much headroom there | 21:00 |
clarkb | frickler: it's in tempest itself, for the heavyweight ssh tests. it was added to dump debug info to the console | 21:00 |
clarkb | i forget where exactly I added it though but its my name on the commit if that helps to find it (eating lunch and listening to board meeting now) | 21:01 |
frickler | I found a patch from 2017, so that by itself wouldn't explain any recent breakage | 21:02 |
clarkb | it may no longer be helpful and we could remove it if that helps | 21:02 |
clarkb | ya it wasn't super recent | 21:03 |
frickler | hmm, that only looks to be three lines of script. removing it may help a bit, but if things are really so tight I think we need some more general measures | 21:04 |
frickler | anyway, eod for me, will followup tomorrow | 21:05 |
clarkb | ++ | 21:06 |
clarkb | thank you for getting that booted | 21:06 |
*** d0ugal has joined #openstack-infra | 21:06 | |
mriedem | clarkb: looks like we need https://review.openstack.org/#/c/623597/ on stable/rocky | 21:06 |
mriedem | because grenade on master is failing | 21:06 |
mriedem | if you want to cherry pick | 21:07 |
clarkb | I'll look after lunch | 21:09 |
clarkb | have a link to failure? | 21:09 |
mriedem | logstash still shows it hitting | 21:10 |
mriedem | in grenade jobs | 21:10 |
mriedem | so it's probably devstack in stable/rocky | 21:10 |
clarkb | ah | 21:10 |
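A rough sketch of the backport flow being discussed, assuming the fix has already merged on master; the local branch name and the sha placeholder are illustrative:

    git fetch origin
    git checkout -b backport-623597 origin/stable/rocky   # hypothetical local branch name
    git cherry-pick -x <master-commit-sha>                # keeps a "cherry picked from" note
    git review stable/rocky                               # propose it against stable/rocky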
*** bobh has joined #openstack-infra | 21:18 | |
*** bobh has quit IRC | 21:19 | |
*** bobh has joined #openstack-infra | 21:19 | |
*** bobh has quit IRC | 21:21 | |
*** auristor has quit IRC | 21:22 | |
openstackgerrit | MarcH proposed openstack-infra/git-review master: tox.ini: add passenv = http_proxy https_proxy # _JAVA_OPTIONS https://review.openstack.org/624496 | 21:28 |
*** kgiusti has left #openstack-infra | 21:28 | |
JpMaxMan | Hey random question - I'm helping someone get their git review for gerrit going - should this be 404'ing ? https://git.openstack.org/tools/hooks/commit-msg it's causing an error in the git review. | 21:30 |
openstackgerrit | MarcH proposed openstack-infra/git-review master: tox.ini: add passenv = http_proxy https_proxy # _JAVA_OPTIONS https://review.openstack.org/624496 | 21:30 |
clarkb | yes that should be served by review.openstack.org | 21:30 |
clarkb | what is your .gitreview file gerrit server value set to? | 21:31 |
clarkb | JpMaxMan: ^ | 21:31 |
JpMaxMan | lemme see | 21:31 |
fungi | JpMaxMan: when you run, e.g., `git review -s` it should just work. if this is in an empty repository you may need to create a .gitreview file to commit to it | 21:31 |
JpMaxMan | I was having him follow the instructions for the sandbox | 21:32 |
JpMaxMan | https://docs.openstack.org/infra/manual/sandbox.html | 21:32 |
*** auristor has joined #openstack-infra | 21:33 | |
fungi | https://git.openstack.org/cgit/openstack-dev/sandbox/tree/.gitreview#n2 looks correct | 21:33 |
corvus | JpMaxMan: we can get more debug info by running "git review -s -v" and copy/pasting the output to http://paste.openstack.org/ | 21:33 |
JpMaxMan | yeah checked the .gitreview it looks right | 21:34 |
JpMaxMan | host=review.openstack.org | 21:34 |
fungi | yes, i wonder if something is going sideways/getting guessed wrong due to a problem with a gerrit account | 21:34 |
fungi | so the verbose output will help | 21:34 |
*** eernst has joined #openstack-infra | 21:34 | |
JpMaxMan | http://paste.openstack.org/show/737053/ | 21:36 |
JpMaxMan | hmmm it seems to work if I clone... git clone https://review.openstack.org/openstack-dev/sandbox.git | 21:37 |
JpMaxMan | review instead of git ... | 21:38 |
corvus | fungi, JpMaxMan: the first two lines of the debug output are interesting -- apparently gitreview.remote is set | 21:38 |
fungi | could be set in ~/.gitconfig already? | 21:39 |
JpMaxMan | oh yes sorry I think I did that in my first bit of troubleshooting - it was complaining that there wasn't a remote named gerrit | 21:39 |
JpMaxMan | I looked and the remote was set to origin so I set that | 21:39 |
corvus | JpMaxMan: where did you set that? | 21:39 |
fungi | aha, yes if there is already a git remote named "gerrit" then git-review will assume that's what it should use to reach the gerrit server | 21:40 |
corvus | fungi: JpMaxMan said the opposite of that | 21:40 |
JpMaxMan | git config --global gitreview.remote origin | 21:41 |
fungi | oh, yep | 21:41 |
JpMaxMan | I first tried renaming the remote to gerrit which produced the same output | 21:41 |
corvus | JpMaxMan: can you run "git config --global --unset gitreview.remote" please? and then run 'git review -s -v' and paste the new output? | 21:41 |
JpMaxMan | sure | 21:42 |
*** eharney has quit IRC | 21:42 | |
fungi | git review should normally set a git remote named "gerrit" for you based on the content of the .gitreview file and the account name it attempts to determine via a test connection. if something goes wrong with the connection test that's when i've seen users start trying random things | 21:43 |
fungi | in the future we might want to revisit how it performs username determination | 21:43 |
*** jamesmcarthur has joined #openstack-infra | 21:44 | |
JpMaxMan | ok I think I see what happened one second | 21:44 |
*** markvoelker has joined #openstack-infra | 21:45 | |
*** e0ne has quit IRC | 21:47 | |
*** eernst has quit IRC | 21:47 | |
JpMaxMan | Ok - so the initial error was caused by a bad username: "We don't know where your gerrit is. Please manually create a remote named 'gerrit' and try again." | 21:47 |
*** jamesmcarthur_ has joined #openstack-infra | 21:48 | |
JpMaxMan | and yes @corvus - thank you - unsetting that did fix the issue | 21:48 |
JpMaxMan | but using the correct username ;) | 21:48 |
JpMaxMan | he had originally put in email instead of username and I didn't notice | 21:48 |
corvus | JpMaxMan: aha! glad it worked :) | 21:48 |
clarkb | mriedem: remote: https://review.openstack.org/624499 Set apache proxy-initial-not-pooled env var | 21:48 |
*** markvoelker has quit IRC | 21:49 | |
JpMaxMan | makes sense now - appreciate it | 21:49 |
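A short recap of the fix as a sketch; the username value is a placeholder:

    git config --global --unset gitreview.remote               # drop the override added while troubleshooting
    git config --global gitreview.username <gerrit-username>   # the gerrit username, not the email address
    git review -s -v                                           # verbose setup; creates the "gerrit" remote
    git remote -v                                              # confirm gerrit points at review.openstack.org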
*** jamesmcarthur has quit IRC | 21:50 | |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Import install-docker role https://review.openstack.org/605585 | 21:54 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Set iptables forward drop by default https://review.openstack.org/624501 | 21:54 |
*** wolverineav has quit IRC | 21:54 | |
clarkb | corvus: ianw mordred ^ that's the outcome of the iptables discussion from a bit earlier | 21:54 |
*** wolverineav has joined #openstack-infra | 21:55 | |
*** wolverineav has quit IRC | 21:55 | |
*** wolverineav has joined #openstack-infra | 21:55 | |
clarkb | jungleboyj: any idea why cinder + lower constraints tests seem to be unhappy fairly often? | 21:57 |
jungleboyj | clarkb: No idea. I was wondering that too. | 21:58 |
clarkb | jungleboyj: http://logs.openstack.org/42/600442/1/gate/openstack-tox-lower-constraints/6592c5d/job-output.txt.gz#_2018-12-11_21_48_46_655602 seems related to database migrations? | 21:58 |
clarkb | but it isn't the old "disk is slow" timeout error. Instead this seems to complain about data types | 21:59 |
*** rcernin has joined #openstack-infra | 21:59 | |
jungleboyj | Jeez. I haven't seen that test case fail in a long time. | 22:00 |
*** smarcet has joined #openstack-infra | 22:00 | |
jungleboyj | It is strange that that would be seen more in the LowerConstraints test. | 22:01 |
*** hamerins has quit IRC | 22:01 | |
*** bobh has joined #openstack-infra | 22:03 | |
fungi | if you haven't seen it in a while and it's failing with older versions of deps... | 22:03 |
jungleboyj | :-) Yeah. | 22:03 |
*** trown is now known as trown|outtypewww | 22:04 | |
clarkb | I've updated https://bugs.launchpad.net/openstack-gate/+bug/1808010 to indicate I think it's an interaction with the cirros tmpfs and not a cloud issue | 22:04 |
openstack | Launchpad bug 1808010 in OpenStack-Gate "Tempest cirros boots fail due to lack of disk space" [Undecided,New] | 22:04 |
ianw | clarkb: ++ thanks. i like it when a change gets like 3 authors ... shows the system is working :) | 22:04 |
clarkb | ianw: I think we are all invested in getting this going :) | 22:05 |
ianw | clarkb: hrm, this isn't related to a recent change we made calculating tempest disk size? not sure if that merged ... | 22:05 |
clarkb | ianw: it hasn't; mriedem linked to it and it's unmerged. But also cirros mounts /run as tmpfs so it's actually in memory | 22:05 |
clarkb | ianw: and it's only 200kb according to frickler's testing | 22:05 |
ianw | ah, ok, should read the bug | 22:06 |
fungi | well, /run is pretty ubiquitously mounted tmpfs by all distros | 22:06 |
clarkb | fungi: ya that's why it occurred to me it may not be disk when I saw it was /run that had a problem | 22:06 |
fungi | they don't generally even create a /run directory unless it's going to be used for pre-rootfs situations | 22:07 |
*** EmilienM has quit IRC | 22:07 | |
fungi | and it's pretty much always teensy too | 22:07 |
clarkb | fungi: my hunch here is that cirros is abusing /run this way because config drive can tell you things about what goes into fstab | 22:08 |
*** EmilienM has joined #openstack-infra | 22:08 | |
clarkb | so it's processing the config drive before it has a real disk to write to, because it may have to set up those real disks itself | 22:08 |
clarkb | but unfortunately it is leading to broken networking due to constraints being run up against | 22:08 |
clarkb | it's also not a super common error, so we may not want to spend too many cycles on it while debugging more common ones first. It is being tracked by e-r now, so we should see if it persists, gets worse, or bubbles to the top of the list as we fix other stuff | 22:10 |
clarkb | according to e-r the top four bugs seem related to timeouts and network issues | 22:11 |
clarkb | there was a spike in those that I haven't debugged because it went away. Guessing a temporary provider issue | 22:12 |
clarkb | after that is http://status.openstack.org/elastic-recheck/#1807518 which I just pushed a backport to rocky in devstack for (so hopefully those go away) | 22:12 |
clarkb | then it's a long, long tail of all the random things that are unreliable | 22:12 |
clarkb | mwhahaha: EmilienM ssbarnea|rover it seems that the centos-ceph-luminous mirror your jobs are talking to may be getting increasingly flaky | 22:18 |
EmilienM | damn | 22:18 |
EmilienM | weshay: ^ | 22:18 |
clarkb | http://mirror.dfw.rax.openstack.org/centos/7/storage/x86_64/ceph-luminous/ is something that I think we do mirror for you | 22:18 |
mwhahaha | we don't have any of those jobs in the gate anymore | 22:19 |
clarkb | so may just be a matter of switching to the in region mirrors for ceph-luminous | 22:19 |
mwhahaha | but yes we should check that out, not sure which we're using | 22:19 |
clarkb | I'm looking at gate e-r graphs (and the logstash links for them) | 22:19 |
clarkb | http://status.openstack.org/elastic-recheck/gate.html#1708704 specifically that one | 22:20 |
mwhahaha | we're not running any jobs that should require ceph | 22:20 |
mwhahaha | but will need to look | 22:20 |
clarkb | more than 50% of the failures in the gate are in the last 24 hours and they fail fetching centos-ceph-luminous from mirror.centos.org | 22:20 |
mwhahaha | hmm it's mirror.centos.org | 22:21 |
*** anteaya has joined #openstack-infra | 22:21 | |
mwhahaha | Failed to connect to 2607:f130:0:87::10: Network is unreachable | 22:21 |
clarkb | yup | 22:21 |
mwhahaha | ipv6'd | 22:21 |
clarkb | but we mirror it for you locally at http://mirror.dfw.rax.openstack.org/centos/7/storage/x86_64/ceph-luminous/ (replace region-specific data as necessary) | 22:21 |
mwhahaha | yea let me go find the config | 22:22 |
mwhahaha | is that built into the image maybe? | 22:22 |
mwhahaha | cause i'm seeing NODEPOOL_CENTOS_MIRROR referenced in quickstart | 22:23 |
clarkb | zuul drops some hints as to where to find the various mirrors (nodepool did it in the past so the vars say nodepool for compat) | 22:23 |
clarkb | it writes /etc/ci/mirror_info.sh iirc. Let me see if I can find that | 22:23 |
mwhahaha | yea we use that | 22:23 |
mwhahaha | so i need to find out why that one isn't set | 22:23 |
*** markvoelker has joined #openstack-infra | 22:24 | |
mwhahaha | oh this is before we even get to our config | 22:24 |
mwhahaha | so yea it's the repos from the image | 22:24 |
mwhahaha | http://logs.openstack.org/23/624323/1/gate/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/98dc676/job-output.txt#_2018-12-11_21_45_14_659825 | 22:24 |
clarkb | http://git.openstack.org/cgit/openstack-infra/openstack-zuul-jobs/tree/roles/mirror-info/templates/mirror_info.sh.j2 | 22:24 |
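For reference, a minimal sketch of how a job consumes those hints; NODEPOOL_CENTOS_MIRROR is the variable mentioned above, and the example URL and path below are only illustrative:

    source /etc/ci/mirror_info.sh
    echo "$NODEPOOL_CENTOS_MIRROR"   # e.g. http://mirror.dfw.rax.openstack.org/centos
    curl -sI "$NODEPOOL_CENTOS_MIRROR/7/storage/x86_64/ceph-luminous/" | head -n1   # sanity-check the in-region mirror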
*** wolverineav has quit IRC | 22:24 | |
mwhahaha | this is in pre-run | 22:24 |
clarkb | ya that should run very early in our base job | 22:25 |
mwhahaha | right so the pre roles don't properly configure the mirrors | 22:25 |
mwhahaha | not the tripleo stuff | 22:25 |
mwhahaha | we're configuring to use the mirrors | 22:25 |
mwhahaha | so this is likely the repo config of the image | 22:25 |
*** wolverineav has joined #openstack-infra | 22:25 | |
clarkb | the image doesn't have that data, we apply it in the job itself | 22:25 |
clarkb | and our base pre run should run before your pre run does | 22:26 |
clarkb | yes it is part of the base jobs defined in project-config | 22:26 |
mwhahaha | the images come with /etc/yum.repos.d configured | 22:26 |
mwhahaha | with the defaults from centos | 22:26 |
*** slaweq has quit IRC | 22:26 | |
mwhahaha | we're actually clearing out those configs when our code starts | 22:27 |
clarkb | why would centos have random repos enabled by default | 22:27 |
clarkb | (I've quickly grepped and project-config dib elements don't add it at least) | 22:27 |
* mwhahaha shrugs | 22:27 | |
clarkb | ianw: ^ this may interest you | 22:27 |
clarkb | I wonder if this is new with 7.6 | 22:28 |
mwhahaha | so by default all the CentOS-* files are in the cloud image | 22:29 |
mwhahaha | http://logs.openstack.org/23/624323/1/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/bcbde3b/logs/undercloud/etc/yum.repos.d/ | 22:29 |
mwhahaha | we turn them off when we run our ci code | 22:29 |
clarkb | but those failures are happening before the disabling occurs? | 22:30 |
mwhahaha | yes | 22:31 |
mwhahaha | this is before any of the tripleo code runs | 22:31 |
mwhahaha | this is just basic infra prep | 22:31 |
mwhahaha | to install OVS | 22:31 |
mwhahaha | for the multinode setup | 22:32 |
clarkb | but why would it care about the ceph repo in that case? I guess yum has to scan all the repos to see where the most appropriate ovs package lives? | 22:32 |
mwhahaha | yum update tries to get all the metadata | 22:32 |
mwhahaha | or yum install | 22:32 |
mwhahaha | if it doesn't exist | 22:32 |
mwhahaha | so it errors | 22:32 |
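In other words, any enabled repo whose metadata is unreachable can sink an unrelated install. A hedged illustration; the repo id and package name are assumptions based on the repos and the OVS install discussed here:

    yum repolist enabled                     # every repo listed here gets its metadata refreshed
    sudo yum install -y openvswitch          # fails if any enabled repo's metadata can't be fetched
    sudo yum install -y --disablerepo=centos-ceph-luminous openvswitch   # one-off workaround while debugging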
fungi | infra-root: is someone grooming the openstackadmin account on github right now? seeing some address removals/confirmations and just want to be sure it's one of us (i expect it's related to the discussion in our meeting but would like to be sure) | 22:33 |
clarkb | fungi: ianw volunteered in the meeting today | 22:33 |
*** jamesmcarthur_ has quit IRC | 22:33 | |
fungi | cool. ianw: i guess those are you? | 22:34 |
fungi | (removed root@o.o, confirmed infra-root@o.o...) | 22:34 |
mwhahaha | clarkb: so it's that role | 22:34 |
ianw | fungi: yep, poking at it now | 22:34 |
mwhahaha | clarkb: http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/multi-node-bridge/tasks/common.yaml#n10 probably drops that storage repo in place | 22:34 |
*** jamesmcarthur has joined #openstack-infra | 22:35 | |
fungi | ianw: perfect. thanks again! | 22:35 |
mwhahaha | clarkb: but there is no code to swap out the mirrors | 22:35 |
mwhahaha | clarkb: so it uses what is shipped from centos-release-openstack-queens | 22:35 |
*** jamesmcarthur has quit IRC | 22:35 | |
clarkb | mwhahaha: gotcha, fwiw http://logs.openstack.org/18/607318/1/gate/tripleo-ci-centos-7-standalone/fbbd3a3/zuul-info/ also exhibits this behavior and is a single node test | 22:35 |
mwhahaha | yea so it's any centos job that installs OVS | 22:35 |
*** jamesmcarthur has joined #openstack-infra | 22:35 | |
clarkb | mwhahaha: not sure why it would be running multinode setup if it is single node (that might be a separate cleanup) | 22:35 |
mwhahaha | clarkb: we use ovs for fake interfaces | 22:36 |
*** jamesmcarthur has quit IRC | 22:36 | |
mwhahaha | but the issue is that the multi-node-bridge role does not properly configure mirrors to install ovs from | 22:36 |
clarkb | does centos-release-openstack-queens imply centos-ceph-luminous transitively? | 22:36 |
mwhahaha | likely | 22:36 |
fungi | clarkb: do we miss setting a mirror url for the ovs packages? | 22:36 |
mwhahaha | clarkb: yes, https://rpmfind.net/linux/RPM/centos/extras/7.6.1810/x86_64/Packages/centos-release-openstack-queens-1-2.el7.centos.noarch.html | 22:37 |
fungi | is that the summary? | 22:37 |
*** jamesmcarthur has joined #openstack-infra | 22:37 | |
clarkb | fungi: possibly? I'm not sure if we set the mirror properly for the rdo/openstack repo | 22:37 |
clarkb | and then ceph is an unexpected addition | 22:37 |
clarkb | or if we fail to set both of them | 22:37 |
mwhahaha | so the repos get added and removed in multi-node-bridge | 22:37 |
mwhahaha | http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/multi-node-bridge/tasks/common.yaml#n45 | 22:37 |
clarkb | looks like we don't really do much between add repo and install package | 22:37 |
mwhahaha | so it adds the stock repos, installs ovs, removes the repos | 22:37 |
clarkb | so likely unset for both repos | 22:37 |
*** bobh has quit IRC | 22:38 | |
*** bobh has joined #openstack-infra | 22:38 | |
ianw | clarkb / fungi: so that turned out to be rather easy ... when you get a minute do you want to look at the password file and try logging into the shared account with 2fa token as described there? | 22:39 |
*** jamesmcarthur has quit IRC | 22:40 | |
clarkb | ianw: ya I can try when I've paged this ovs/ceph stuff out | 22:40 |
*** jtomasek_ has quit IRC | 22:40 | |
*** bobh has quit IRC | 22:41 | |
*** _alastor_ has joined #openstack-infra | 22:41 | |
*** jamesmcarthur has joined #openstack-infra | 22:42 | |
*** slaweq has joined #openstack-infra | 22:44 | |
*** bobh has joined #openstack-infra | 22:44 | |
ianw | clarkb: also can you take a look at the stein mirroring request, seems straightforward -> https://review.openstack.org/#/c/621231/ | 22:45 |
*** boden has quit IRC | 22:46 | |
*** jamesmcarthur has quit IRC | 22:46 | |
*** slaweq has quit IRC | 22:48 | |
clarkb | mwhahaha: ianw: configure-mirror role tries to do this for centos but only applies it for epel and the base os/ portion of the mirror | 22:51 |
clarkb | I think it will work if we write out the file that specifies centos-ceph-luminous and disable it like we do with epel. Any idea where I can find a copy of that file? | 22:52 |
clarkb | https://github.com/CentOS-Storage-SIG/centos-release-ceph-luminous/blob/master/CentOS-Ceph-Luminous.repo that maybe? | 22:54 |
ianw | clarkb: won't the package install overwrite that? in the epel case, we have epel-release package installed | 22:55 |
*** yamamoto has joined #openstack-infra | 22:55 | |
clarkb | ianw: maybe? I know very little about how centos is expected to work. It's all a foreign language to me, particularly the way everything is in a different repo and you have to do something special to install what seems like every other package | 22:57 |
clarkb | ianw: we configure epel with this j2 file https://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/configure-mirrors/templates/etc/yum.repos.d/epel.repo.j2 | 22:58 |
clarkb | seems like we set it to enabled=0 then expect something else to enable it. Can we write out a CentOS-Ceph-Luminous.repo file in a similar way and have the package that installs the repo flip the bit or will it overwrite entirely? | 22:59 |
ianw | clarkb: i think the package will overwrite it. for epel, we have pre-installed the package with https://git.openstack.org/cgit/openstack/diskimage-builder/tree/diskimage_builder/elements/epel | 23:00 |
clarkb | as an alternative we can have the multi-node-bridge role do a text substitution on that file after the packages install the repo | 23:00 |
clarkb | but before we install ovs | 23:00 |
clarkb | oh got it | 23:00 |
ianw | the idea for epel is that you do "yum install --enablerepo=epel ..." so we know what we're dragging in explicitly | 23:00 |
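Roughly what that pattern looks like on disk; the stanza below is an assumption sketched from the template linked above rather than its literal contents, and <package> is a placeholder:

    cat /etc/yum.repos.d/epel.repo
    # [epel]
    # name=Extra Packages for Enterprise Linux 7
    # baseurl=http://mirror.dfw.rax.openstack.org/epel/7/x86_64/   # in-region mirror, assumed path
    # enabled=0                                                    # the key bit: off by default
    # gpgcheck=0
    sudo yum install --enablerepo=epel -y <package>   # jobs opt in explicitly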
clarkb | in that case maybe the text substitution in the multi-node-bridge role is better | 23:00 |
mwhahaha | it's really specific to that role, so if the mirrors exist in the ansible vars then do a text substitution after the repo install and before the package install | 23:01 |
mwhahaha | this is the annoying problem with the CI repo configs: we end up duplicating this same thing all over the place | 23:01 |
ianw | hrm, i forget, we uninstall the repos after right? | 23:02 |
mwhahaha | in that role, yes | 23:02 |
mwhahaha | http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/multi-node-bridge/tasks/common.yaml#n45 | 23:02 |
mwhahaha | it's literally to just get the queens version of OVS | 23:02 |
ianw | yeah, that's right, but if they were there we don't | 23:02 |
ianw | and i think at one point we used to install RDO in the base package, but that caused problems, which is why we moved it "up" to this point | 23:03 |
clarkb | mwhahaha: yes, and every other distro avoids this problem by having A repo | 23:03 |
clarkb | (even fedora has everything in a single repo iirc) | 23:03 |
mwhahaha | pretty sure ubuntu has more than one | 23:03 |
mwhahaha | UCA is the extra one | 23:03 |
mwhahaha | anyway | 23:03 |
* mwhahaha then points to pypi, yum, docker, etc mirrors | 23:04 | |
*** eernst has joined #openstack-infra | 23:04 | |
ianw | yeah, not really centos's fault because its raison d'être is to be rhel-like, so if rhel doesn't have ovs in base then we end up like this | 23:04 |
clarkb | ya it just gets really complicated quickly | 23:05 |
*** bobh has quit IRC | 23:05 | |
ianw | we could install rdo like epel and disable it | 23:06 |
clarkb | ianw: I'm working on a lineinfile patch for multi-node-bridge | 23:06 |
*** bobh has joined #openstack-infra | 23:06 | |
clarkb | which will replace the remote with the mirror node | 23:06 |
clarkb | (I hope) | 23:07 |
mwhahaha | we used to always have the N-1 version installed by default but i think that caused more problems | 23:07 |
mwhahaha | it would be nice if we got OVS from something that only contained OVS | 23:07 |
clarkb | mwhahaha: ya we removed it from the image because that caused confusion too | 23:07 |
ianw | ++ | 23:07 |
mwhahaha | at this point i think just lineinfile mirror.centos.org with the local mirrors is probably the best bet | 23:08 |
ianw | yes the KISS approach | 23:08 |
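The change at 624525 does this with an Ansible lineinfile task; below is a rough shell equivalent of the substitution, run after the release RPMs drop their repo files and before installing OVS. The repo file names are taken from the pastes above; treat the exact paths as assumptions.

    source /etc/ci/mirror_info.sh
    for f in /etc/yum.repos.d/CentOS-OpenStack-queens.repo \
             /etc/yum.repos.d/CentOS-Ceph-Luminous.repo; do
        # swap the upstream mirror for the in-region one
        sudo sed -i "s|http://mirror.centos.org/centos|$NODEPOOL_CENTOS_MIRROR|g" "$f"
    done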
mwhahaha | though i wonder how that plays in with the uninstall if the file is changed | 23:08 |
* mwhahaha shrugs | 23:08 | |
ianw | i think we almost had linuxbridge working for multinode too? i remember that being a possibility for removing ovs | 23:08 |
ianw | by "we", i mean clarkb, i didn't do anything useful :) | 23:09 |
clarkb | ianw: neutron assumed ovs unfortunately | 23:10 |
clarkb | so it got tricky to untangle the unfortunate dep in devstack + neutron on that ovs bridge existing | 23:11 |
clarkb | and I gave up | 23:11 |
clarkb | it's entirely doable at this point if we get devstack + neutron to learn how to plug the linux bridge bridge into its own ovs bridges | 23:11 |
clarkb | anyone know where to get a copy of /etc/yum.repos.d/CentOS-OpenStack-queens.repo ? | 23:12 |
openstackgerrit | Jp Maxwell proposed openstack-infra/project-config master: Adding the netlify-sandbox project https://review.openstack.org/624523 | 23:13 |
openstackgerrit | Matt Riedemann proposed openstack-infra/elastic-recheck master: Add query for glance-api proxy error bug 1808063 https://review.openstack.org/624524 | 23:13 |
openstack | bug 1808063 in OpenStack-Gate "glanceclient.exc.HTTPBadGateway: 502 Proxy Error during server snapshot" [Undecided,Confirmed] https://launchpad.net/bugs/1808063 | 23:13 |
mriedem | clarkb: ^ | 23:13 |
*** slaweq has joined #openstack-infra | 23:14 | |
ianw | clarkb: http://paste.openstack.org/show/737099/ i think, from https://www.rdoproject.org/repos/rdo-release.rpm | 23:15 |
*** kjackal has quit IRC | 23:15 | |
mwhahaha | http://mirror.centos.org/centos/7/extras/x86_64/Packages/centos-release-openstack-queens-1-2.el7.centos.noarch.rpm | 23:15 |
* mwhahaha is downloading to fetch | 23:16 | |
mwhahaha | http://paste.openstack.org/show/737100/ | 23:17 |
mwhahaha | it's more than just the rdo-release | 23:17 |
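For anyone repeating that check, a sketch of peeking at the repo files a release RPM ships without installing it; the URL is the one pasted above:

    curl -O http://mirror.centos.org/centos/7/extras/x86_64/Packages/centos-release-openstack-queens-1-2.el7.centos.noarch.rpm
    rpm2cpio centos-release-openstack-queens-1-2.el7.centos.noarch.rpm | cpio -idmv
    cat etc/yum.repos.d/*.repo   # extracted relative to the current directory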
*** _alastor_ has quit IRC | 23:17 | |
mwhahaha | if you swap out mirror.centos.org and buildlogs.centos.org i think we have mirrors for those | 23:17 |
mwhahaha | though only mirror.centos.org is the one that is enabled | 23:18 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Use mirrors if available when installing OVS on centos https://review.openstack.org/624525 | 23:18 |
clarkb | ya I was just doing mirror.centos.org since it is the only one enabled | 23:18 |
clarkb | I think something like ^ should work | 23:18 |
mwhahaha | http://paste.openstack.org/show/737101/ is the ceph one | 23:19 |
clarkb | I don't think multi-node-bridge is a trusted role, so we should be able to Depends-On that change from a tripleo change to make sure it works | 23:19 |
mwhahaha | yea that should work | 23:19 |
*** slaweq has quit IRC | 23:19 | |
clarkb | mwhahaha: care to push that Depends-On test change? (I don't know what would be a good representative set) | 23:20 |
mwhahaha | sure | 23:20 |
clarkb | thanks | 23:20 |
mwhahaha | https://review.openstack.org/#/c/624526/ | 23:21 |
mwhahaha | will get an assortment of jobs | 23:21 |
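The cross-repo test works through a commit message footer on 624526; Zuul then runs that change's jobs with the unmerged zuul-jobs patch applied. A sketch of adding the footer (the amend step is illustrative):

    git commit --amend   # append a footer line: Depends-On: https://review.openstack.org/624525
    git review           # re-push the test change; its jobs now pull in the zuul-jobs fix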
*** jamesmcarthur has joined #openstack-infra | 23:22 | |
*** eernst has quit IRC | 23:25 | |
*** jamesmcarthur has quit IRC | 23:26 | |
melwitt | clarkb: mriedem just told me about https://bugs.launchpad.net/openstack-gate/+bug/1808010 while I was looking at a failed job run, but in the log I see "WARN: failed: route add -net "0.0.0.0/0" gw "10.1.0.1"" but not any messages about no space left. is that a separate known launchpad bug or do you think it's the same thing? | 23:27 |
openstack | Launchpad bug 1808010 in OpenStack-Gate "Tempest cirros boots fail due to lack of disk space" [Undecided,New] | 23:27 |
melwitt | http://logs.openstack.org/82/623282/3/check/nova-next/a900344/logs/testr_results.html.gz | 23:27 |
clarkb | melwitt: I thought it was the same thing | 23:28 |
*** smarcet has quit IRC | 23:28 | |
melwitt | ok, thanks | 23:28 |
clarkb | melwitt: in the bug it has messages about the disk errors | 23:28 |
clarkb | they happen before failing to set the route | 23:29 |
melwitt | yeah, I don't see them in the cirros log excerpt on the job I was looking at (linked above) so I wasn't sure | 23:29 |
clarkb | huh maybe disk space isn't the root cause then | 23:29 |
clarkb | I'm pretty sure the broken default route is what breaks ssh | 23:29 |
clarkb | and thought it was caused by the disk issue | 23:29 |
melwitt | but indeed when I search on logstash for the failed route add, I see most of the hits coming from the networking-odl-tempest-fluorine job, all failures | 23:30 |
*** slaweq has joined #openstack-infra | 23:36 | |
openstackgerrit | Jp Maxwell proposed openstack-infra/project-config master: Adding the netlify-sandbox project https://review.openstack.org/624523 | 23:38 |
clarkb | melwitt: we probably want to better understand what could cause that route add failure | 23:40 |
clarkb | and go from there | 23:40 |
clarkb | cirros runs busybox so it may be different than whatever distro you have locally too | 23:40 |
*** slaweq has quit IRC | 23:40 | |
*** armax has quit IRC | 23:42 | |
*** xek_ has joined #openstack-infra | 23:43 | |
*** xek has quit IRC | 23:46 | |
*** smarcet has joined #openstack-infra | 23:46 | |
melwitt | clarkb: ack, thanks | 23:49 |
melwitt | I added a note to the launchpad | 23:49 |
*** dklyle has joined #openstack-infra | 23:51 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config master: Enable github shared admin account https://review.openstack.org/624531 | 23:52 |
*** xarses has quit IRC | 23:59 |