openstackgerrit | Clark Boylan proposed opendev/system-config master: Double stack size on gitea https://review.opendev.org/654634 | 00:00 |
---|---|---|
clarkb | there we go | 00:00 |
clarkb | I'm gonna go track down dinner now | 00:00 |
clarkb | but will try to keep an eye on ^ as fixing that will be nice | 00:00 |
*** ijw has quit IRC | 00:17 | |
*** mattw4 has quit IRC | 00:23 | |
*** michael-beaver has quit IRC | 00:23 | |
*** gyee has quit IRC | 00:29 | |
openstackgerrit | Merged opendev/system-config master: Use swift to back intermediate docker registry https://review.opendev.org/653613 | 00:30 |
*** mriedem has quit IRC | 00:30 | |
*** dave-mccowan has joined #openstack-infra | 00:35 | |
*** Weifan has quit IRC | 00:42 | |
*** markvoelker has joined #openstack-infra | 00:51 | |
*** jamesmcarthur has quit IRC | 00:54 | |
*** smarcet has joined #openstack-infra | 01:00 | |
*** whoami-rajat has joined #openstack-infra | 01:01 | |
*** ricolin has joined #openstack-infra | 01:15 | |
*** diablo_rojo has quit IRC | 01:16 | |
*** smarcet has quit IRC | 01:24 | |
*** smarcet has joined #openstack-infra | 01:28 | |
mordred | clarkb, corvus: sorry - was AFK way more today than I originally expected - had to deal with a bunch of family stuff - I think I'm caught up on openstack/openstack, stack sizes - and intermediate registries from scrollback - nice work on all of that | 01:36 |
*** rlandy|ruck has quit IRC | 01:46 | |
*** hwoarang has quit IRC | 01:55 | |
*** hwoarang has joined #openstack-infra | 02:00 | |
clarkb | mordred: isn't that a fun git bug? | 02:02 |
*** nicolasbock has quit IRC | 02:04 | |
*** _erlon_ has quit IRC | 02:05 | |
*** ykarel|away has joined #openstack-infra | 02:26 | |
*** anteaya has joined #openstack-infra | 02:34 | |
*** Weifan has joined #openstack-infra | 02:44 | |
*** dklyle has quit IRC | 02:45 | |
*** dklyle has joined #openstack-infra | 02:46 | |
*** Weifan has quit IRC | 02:48 | |
*** dave-mccowan has quit IRC | 02:48 | |
*** bhavikdbavishi has joined #openstack-infra | 03:04 | |
*** ykarel|away is now known as ykarel | 03:06 | |
*** hongbin has joined #openstack-infra | 03:10 | |
*** bhavikdbavishi has quit IRC | 03:10 | |
*** Qiming has quit IRC | 03:17 | |
*** hwoarang has quit IRC | 03:18 | |
*** rh-jelabarre has quit IRC | 03:19 | |
*** hwoarang has joined #openstack-infra | 03:24 | |
*** bhavikdbavishi has joined #openstack-infra | 03:28 | |
*** zhangfei has joined #openstack-infra | 03:39 | |
*** zhangfei has quit IRC | 03:40 | |
*** zhangfei has joined #openstack-infra | 03:41 | |
*** lpetrut has joined #openstack-infra | 03:50 | |
*** yamamoto has quit IRC | 04:11 | |
*** yamamoto has joined #openstack-infra | 04:12 | |
*** lpetrut has quit IRC | 04:13 | |
*** ramishra has joined #openstack-infra | 04:15 | |
*** yamamoto has quit IRC | 04:19 | |
*** udesale has joined #openstack-infra | 04:22 | |
*** yamamoto has joined #openstack-infra | 04:33 | |
*** hongbin has quit IRC | 04:33 | |
*** zhangfei has quit IRC | 04:43 | |
*** markvoelker has quit IRC | 04:57 | |
*** Weifan has joined #openstack-infra | 05:00 | |
*** Weifan has quit IRC | 05:00 | |
*** ykarel is now known as ykarel|afk | 05:01 | |
*** ykarel|afk has quit IRC | 05:06 | |
*** raukadah is now known as chandankumar | 05:11 | |
*** jaosorior has joined #openstack-infra | 05:17 | |
*** yamamoto has quit IRC | 05:21 | |
*** zhurong has joined #openstack-infra | 05:23 | |
*** ykarel|afk has joined #openstack-infra | 05:23 | |
*** yamamoto has joined #openstack-infra | 05:23 | |
*** ykarel|afk is now known as ykarel | 05:24 | |
*** yamamoto has quit IRC | 05:26 | |
*** yamamoto has joined #openstack-infra | 05:27 | |
*** yamamoto has quit IRC | 05:27 | |
*** ykarel_ has joined #openstack-infra | 05:28 | |
*** ykarel has quit IRC | 05:31 | |
*** armax has quit IRC | 05:32 | |
*** kjackal has joined #openstack-infra | 05:33 | |
*** ccamacho has quit IRC | 05:46 | |
dangtrinhnt | Hi infra time. Right now the default topic of the #openstack-fenix channel is a little weird. I would like to change that but it looks like I don't have enough privileges to do that. If someone can help, it would be great. Many thanks. | 05:47 |
dangtrinhnt | Infra Team. | 05:47 |
*** quiquell|off is now known as quiquell|rover | 05:49 | |
*** yamamoto has joined #openstack-infra | 05:51 | |
*** kjackal has quit IRC | 05:56 | |
*** lpetrut has joined #openstack-infra | 06:00 | |
*** pcaruana has joined #openstack-infra | 06:06 | |
AJaeger | config-core, here's a change to use py36 for some periodic jobs - please put on your review queue: https://review.opendev.org/654571 | 06:08 |
*** electrofelix has joined #openstack-infra | 06:14 | |
icey | I think I'm missing a project after the openstack->opendev migration? when I try to `git review ...` I get "fatal: Project not found: openstack/charm-vault ... fatal: Could not read from remote repository." I'm guessing it's because it somehow moved into a namespace "x" on opendev.org (https://opendev.org/x/charm-vault) | 06:15 |
quiquell|rover | hello, what's the replacement for https://git.openstack.org/cgit/... with opendev ? | 06:17 |
*** kjackal has joined #openstack-infra | 06:18 | |
*** dpawlik has joined #openstack-infra | 06:18 | |
icey | quiquell|rover: opendev.org seems to be | 06:21 |
*** slaweq has joined #openstack-infra | 06:21 | |
*** yamamoto has quit IRC | 06:22 | |
quiquell|rover | sshnaidm|afk: ^ | 06:23 |
quiquell|rover | sshnaidm|afk: fixed reproducer with latests comments https://review.rdoproject.org/r/20371 | 06:23 |
*** yamamoto has joined #openstack-infra | 06:25 | |
*** yamamoto has quit IRC | 06:25 | |
AJaeger | quiquell|rover: the old https URLs should redirect | 06:26 |
*** yamamoto has joined #openstack-infra | 06:26 | |
AJaeger | icey: yes, see all the emails on openstack-infra, openstack-discuss about OpenDev | 06:26 |
icey | AJaeger: I've seen the emails, I'm wondering why most of the openstack-charms stayed under openstack, and charm-vault moved :-/ | 06:27 |
AJaeger | icey: you need to update your ssh remotes, we cannot redirect those. | 06:27 |
AJaeger | icey: charm-vault is not an official OpenStack project | 06:27 |
icey | AJaeger: interesting :-/ | 06:27 |
AJaeger | icey: not listed here: https://governance.openstack.org/tc/reference/projects/openstack-charms.html | 06:28 |
icey | AJaeger: indeed - I suspect that's an oversight; annoying but thanks :) | 06:28 |
*** yamamoto has quit IRC | 06:29 | |
*** yamamoto has joined #openstack-infra | 06:29 | |
*** yamamoto has quit IRC | 06:29 | |
AJaeger | icey: it's no oversight, see https://review.opendev.org/#/c/541287/ - the PTL rejected making it part of the official charms | 06:30 |
AJaeger | bbl | 06:30 |
*** ykarel_ is now known as ykarel | 06:31 | |
icey | I see, thanks again AJaeger | 06:31 |
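A minimal sketch of the remote update described above, assuming an existing clone of the moved repository; the remote names follow git-review's usual convention and <username> is a placeholder for the Gerrit account:

```bash
# Repoint an existing clone after the openstack/ -> x/ namespace move.
cd charm-vault
git remote set-url origin https://opendev.org/x/charm-vault
# If git-review created a "gerrit" remote, update it as well; the https
# redirects cannot help ssh remotes, as noted above.
git remote set-url gerrit ssh://<username>@review.opendev.org:29418/x/charm-vault
git remote -v   # confirm both URLs now point at the new namespace
```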
ykarel | Looks like OpenStack Release Bot sending wrong updates to .gitreview | 06:32 |
ykarel | without rebasing | 06:32 |
ykarel | see some last updates:- https://review.opendev.org/#/q/owner:OpenStack+Release+Bot+gitreview | 06:33 |
ykarel | infra-root ^^ AJaeger ^^ | 06:33 |
*** udesale has quit IRC | 06:35 | |
*** dciabrin has joined #openstack-infra | 06:35 | |
*** udesale has joined #openstack-infra | 06:37 | |
ykarel | Okk seems those are old reviews posted before migration | 06:38 |
ykarel | but merging those as it is without rebase will override .gitreview | 06:39 |
*** bhavikdbavishi has quit IRC | 06:39 | |
*** bhavikdbavishi has joined #openstack-infra | 06:41 | |
*** quiquell|rover is now known as quique|rover|brb | 06:42 | |
*** udesale has quit IRC | 06:44 | |
*** udesale has joined #openstack-infra | 06:44 | |
*** zhangfei has joined #openstack-infra | 06:51 | |
*** markvoelker has joined #openstack-infra | 06:52 | |
*** pgaxatte has joined #openstack-infra | 06:58 | |
AJaeger | ykarel: best talk with release team... | 06:58 |
ykarel | AJaeger, ack, seems they are in some other timezone, already posted a issue this morning | 06:59 |
AJaeger | ykarel: patching those is fine as well ;) | 06:59 |
ykarel | s/a/other | 06:59 |
*** yamamoto has joined #openstack-infra | 07:03 | |
*** ginopc has joined #openstack-infra | 07:08 | |
*** yamamoto has quit IRC | 07:11 | |
*** quique|rover|brb is now known as quiquell|rover | 07:12 | |
*** arxcruz|off|23 is now known as arxcruz | 07:13 | |
*** ccamacho has joined #openstack-infra | 07:13 | |
*** iurygregory has joined #openstack-infra | 07:14 | |
*** zhangfei has quit IRC | 07:15 | |
*** zhangfei has joined #openstack-infra | 07:15 | |
*** tosky has joined #openstack-infra | 07:17 | |
*** udesale has quit IRC | 07:18 | |
*** udesale has joined #openstack-infra | 07:19 | |
*** ccamacho has quit IRC | 07:27 | |
*** ccamacho has joined #openstack-infra | 07:27 | |
*** yamamoto has joined #openstack-infra | 07:31 | |
*** yamamoto has quit IRC | 07:31 | |
*** fmount has quit IRC | 07:32 | |
*** yamamoto has joined #openstack-infra | 07:33 | |
openstackgerrit | Jason Lee proposed opendev/storyboard master: WIP: Updated Loader functionality in preparation for Writer https://review.opendev.org/654812 | 07:33 |
*** fmount has joined #openstack-infra | 07:35 | |
*** gfidente has joined #openstack-infra | 07:40 | |
tosky | AJaeger: uhm, huge bunch of wrong fixes for the opendev transition | 07:40 |
*** jpena|off has joined #openstack-infra | 07:43 | |
*** jpena|off is now known as jpena | 07:43 | |
*** ykarel is now known as ykarel|lunch | 07:45 | |
openstackgerrit | Bernard Cafarelli proposed openstack/project-config master: Update Grafana dashboards for stable Neutron releases https://review.opendev.org/653354 | 07:52 |
*** dtantsur|afk is now known as dtantsur | 07:55 | |
*** jpich has joined #openstack-infra | 07:55 | |
*** kjackal has quit IRC | 07:57 | |
*** kjackal has joined #openstack-infra | 07:58 | |
*** roman_g has joined #openstack-infra | 08:01 | |
*** rpittau|afk is now known as rpittau | 08:08 | |
*** helenafm has joined #openstack-infra | 08:08 | |
*** gfidente has quit IRC | 08:12 | |
*** lseki has joined #openstack-infra | 08:12 | |
*** lucasagomes has joined #openstack-infra | 08:15 | |
*** dikonoor has joined #openstack-infra | 08:28 | |
*** derekh has joined #openstack-infra | 08:28 | |
*** apetrich has joined #openstack-infra | 08:30 | |
frickler | infra-root: do we already have a plan to make opendev.org listen on IPv6? seems the lack of that is actively breaking things, see e.g. this paste posted in #-qa http://paste.openstack.org/show/749620/ . we might want to drop the AAAA record until we get that fixed | 08:31 |
*** ginopc has quit IRC | 08:33 | |
*** rossella_s has joined #openstack-infra | 08:36 | |
*** ginopc has joined #openstack-infra | 08:39 | |
*** e0ne has joined #openstack-infra | 08:39 | |
*** tkajinam has quit IRC | 08:42 | |
*** mleroy has joined #openstack-infra | 08:52 | |
*** ykarel|lunch is now known as ykarel | 08:52 | |
*** dikonoor has quit IRC | 08:56 | |
*** jbadiapa has joined #openstack-infra | 09:02 | |
*** ginopc has quit IRC | 09:06 | |
*** dikonoor has joined #openstack-infra | 09:09 | |
*** gfidente has joined #openstack-infra | 09:20 | |
*** kjackal has quit IRC | 09:24 | |
*** kjackal has joined #openstack-infra | 09:24 | |
*** lpetrut has quit IRC | 09:30 | |
AJaeger | tosky: yeah, I handed out a few -1s ;( | 09:30 |
*** jaosorior has quit IRC | 09:31 | |
*** amansi26 has joined #openstack-infra | 09:35 | |
*** yamamoto has quit IRC | 09:40 | |
*** Lucas_Gray has joined #openstack-infra | 09:42 | |
*** gfidente has quit IRC | 09:46 | |
*** yamamoto has joined #openstack-infra | 09:48 | |
*** yamamoto has quit IRC | 09:53 | |
*** jcoufal has joined #openstack-infra | 09:54 | |
*** bhavikdbavishi has quit IRC | 09:59 | |
*** gfidente has joined #openstack-infra | 10:00 | |
*** kjackal has quit IRC | 10:01 | |
*** jaosorior has joined #openstack-infra | 10:14 | |
*** threestrands has quit IRC | 10:15 | |
*** lseki has quit IRC | 10:16 | |
*** kjackal has joined #openstack-infra | 10:16 | |
*** gfidente has quit IRC | 10:32 | |
*** sshnaidm|afk is now known as sshnaidm | 10:40 | |
*** bhavikdbavishi has joined #openstack-infra | 10:44 | |
*** amansi26 has quit IRC | 10:46 | |
*** ginopc has joined #openstack-infra | 10:47 | |
*** bhavikdbavishi has quit IRC | 10:56 | |
*** nicolasbock has joined #openstack-infra | 11:00 | |
*** jaosorior has quit IRC | 11:00 | |
*** dikonoor has quit IRC | 11:03 | |
aspiers | git.openstack.org[0: 23.253.125.17]: errno=No route to host | 11:08 |
aspiers | git.openstack.org[1: 2001:4800:7817:103:be76:4eff:fe04:e3e3]: errno=Network is unreachable | 11:08 |
aspiers | infra-root: is this expected post-transition? | 11:08 |
openstackgerrit | Ghanshyam Mann proposed openstack/openstack-zuul-jobs master: Add python36-charm-jobs project template https://review.opendev.org/654954 | 11:09 |
*** happyhemant has joined #openstack-infra | 11:09 | |
* aspiers reads through mail threads to see if he missed something | 11:09 | |
*** yamamoto has joined #openstack-infra | 11:12 | |
*** ykarel is now known as ykarel|afk | 11:13 | |
frickler | aspiers: no, those are expected to work and do work fine for me, maybe a local networking issue for you? we do have a known issue with opendev.org not responding via IPv6, though | 11:14 |
aspiers | frickler: I just saw the same issue reported above in the scrollback | 11:14 |
aspiers | nope, this is not an IPv6 issue - see the above paste which is both IPv4 and v6 | 11:15 |
aspiers | frickler: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2019-04-22.log.html | 11:16 |
aspiers | clarkb: in case you didn't know, there's also git remote set-url these days; no need to remove and re-add | 11:17 |
frickler | aspiers: yeah, I haven't read much scrollback yet. so you did have some git:// remote? | 11:17 |
aspiers | frickler: yes | 11:18 |
frickler | aspiers: ah, o.k., that's really some weird way of reporting errors | 11:19 |
aspiers | clarkb: although I'm not sure if that actually has much benefit, since I guess a remote remove won't immediately GC all the objects from that remote forcing a redownload | 11:19 |
aspiers | frickler: sorry, which way is weird? | 11:19 |
*** yamamoto has quit IRC | 11:19 | |
frickler | aspiers: oh, sorry, that could be misunderstood. I was talking about git, not about you | 11:21 |
aspiers | :) | 11:21 |
aspiers | frickler: you mean No route to host? | 11:21 |
frickler | aspiers: yes, that and network unreachable. they really should be "connection refused" instead | 11:21 |
aspiers | frickler: actually that error message comes straight from the OS via strerror(3), so I think it has to be correct | 11:24 |
*** bhavikdbavishi has joined #openstack-infra | 11:25 | |
aspiers | although I can ping 23.253.125.17 so it does seem weird | 11:25 |
aspiers | there must be something else strange going on | 11:25 |
*** bhavikdbavishi has quit IRC | 11:25 | |
frickler | aspiers: hmm, via tcpdump I see "ICMP host 23.253.125.17 unreachable - admin prohibited", that doesn't match to "No route to host" to me | 11:25 |
frickler | aspiers: but yeah, maybe a kernel thing instead of git | 11:26 |
aspiers | frickler: in any case, a) you are saying it's supposed to work? and b) what's the correct future-proof git:// host to use? | 11:26 |
*** smarcet has quit IRC | 11:26 | |
*** bhavikdbavishi has joined #openstack-infra | 11:26 | |
frickler | aspiers: no, the git:// variant is no longer working, since our new frontend doesn't support it. changing to http(s) is the correct way to fix this issue | 11:27 |
frickler | aspiers: I was only confused by the error message | 11:28 |
aspiers | ah | 11:28 |
aspiers | was it announced anywhere that git:// no longer works? if so I missed it | 11:28 |
aspiers | if not, I fear you can expect a lot more questions about this | 11:28 |
frickler | it should have been in one of the mails early on | 11:28 |
*** rh-jelabarre has joined #openstack-infra | 11:29 | |
frickler | there was also a set of patches removing the git:// references from devstack and elsewhere | 11:29 |
frickler | let me check the archives | 11:29 |
*** bhavikdbavishi1 has joined #openstack-infra | 11:30 | |
aspiers | it wasn't in an announcement, but it was buried in 3 followups within that large thread | 11:30 |
aspiers | http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004921.html | 11:30 |
*** bhavikdbavishi has quit IRC | 11:31 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 11:31 | |
*** apetrich has quit IRC | 11:32 | |
*** zhangfei has quit IRC | 11:32 | |
*** jpena is now known as jpena|lunch | 11:33 | |
frickler | aspiers: yeah, seems like this was a bit hidden, sorry for that | 11:33 |
aspiers | np :) just want to help you avoid a flood of duplicate questions | 11:34 |
aspiers | frickler: here's a nice workaround to advertise: | 11:35 |
aspiers | git config --global url.https://git.openstack.org/.insteadof git://git.openstack.org/ | 11:35 |
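A quick, hedged way to confirm the rewrite above is registered and that git:// URLs are now fetched over https (the repository in the example is arbitrary):

```bash
# List the configured URL rewrites; the insteadOf entry should show up here.
git config --global --get-regexp '^url\.'
# Exercise it: this git:// URL gets transparently rewritten to https.
git ls-remote git://git.openstack.org/openstack/nova HEAD
```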
AJaeger | frickler: could you review https://review.opendev.org/#/c/654954/ and https://review.opendev.org/654571 , please? | 11:35 |
*** quiquell|rover is now known as quique|rover|lun | 11:35 | |
*** quique|rover|lun is now known as quique|rover|eat | 11:36 | |
*** apetrich has joined #openstack-infra | 11:36 | |
AJaeger | argh, just gave -1 on 954... | 11:37 |
*** bhavikdbavishi has quit IRC | 11:42 | |
*** bhavikdbavishi has joined #openstack-infra | 11:43 | |
*** lyarwood has joined #openstack-infra | 11:44 | |
aspiers | OK this is like a million times nicer https://opendev.org/openstack/nova-specs | 11:48 |
aspiers | kudos corvus clarkb fungi and all of infra-root! | 11:49 |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Use py36 instead of py35 for periodic master jobs https://review.opendev.org/654571 | 11:50 |
openstackgerrit | Ghanshyam Mann proposed openstack/openstack-zuul-jobs master: Add python36-charm-jobs project template https://review.opendev.org/654954 | 11:55 |
*** yamamoto has joined #openstack-infra | 11:57 | |
*** panda is now known as panda|lunch | 11:57 | |
*** quique|rover|eat is now known as quiquell|rover | 11:59 | |
*** markvoelker has quit IRC | 12:01 | |
*** rlandy has joined #openstack-infra | 12:06 | |
*** boden has joined #openstack-infra | 12:07 | |
*** rlandy is now known as rlandy|ruck | 12:07 | |
frickler | infra-root: the hashdiff-0.3.9 gem breaks beaker-trusty, previous passing job had hashdiff-0.3.8 http://logs.openstack.org/77/654577/1/gate/openstackci-beaker-ubuntu-trusty/dfa87dd/job-output.txt.gz#_2019-04-22_20_56_13_419699 | 12:09 |
AJaeger | frickler: ah. Can we use that version for now? | 12:10 |
*** mriedem has joined #openstack-infra | 12:11 | |
frickler | AJaeger: not sure, maybe someone with more puppet voodoo than myself can find a fix. otherwise I'd propose to make that job non-voting for now | 12:11 |
boden | hi. I'm trying to understand if/when we might expect "Hound" (code search) to work again? I sent a note to the ML http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005481.html. but never saw a response about when it might be available | 12:11 |
*** jbadiapa has quit IRC | 12:22 | |
*** gfidente has joined #openstack-infra | 12:28 | |
*** jpena|lunch is now known as jpena | 12:31 | |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Add openstacksdk-functional-devstack-networking job to Neutron dashboard https://review.opendev.org/652993 | 12:31 |
*** gfidente has quit IRC | 12:34 | |
AJaeger | config-core, could you review https://review.opendev.org/654574 as next step for py36 jobs, please? thanks! | 12:39 |
AJaeger | boden: idea is to use https://opendev.org/explore/code instead of codesearch | 12:40 |
*** gfidente has joined #openstack-infra | 12:40 | |
*** kaiokmo has joined #openstack-infra | 12:40 | |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Add python36-charm-jobs project template https://review.opendev.org/654954 | 12:40 |
boden | AJaeger as of right now there doesn't seem to be parity... search results that I need return nothing with https://opendev.org/explore/code | 12:41 |
openstackgerrit | Tobias Henkel proposed openstack/diskimage-builder master: Support defining the free space in the image https://review.opendev.org/655127 | 12:41 |
boden | AJaeger I don't see code explorer working for our needs as-is, unless I'm missing something | 12:41 |
*** kgiusti has joined #openstack-infra | 12:41 | |
*** bhavikdbavishi has quit IRC | 12:42 | |
boden | AJaeger for example I want to find all the requirements.txt files that have the string "neutron-lib-current", but it seems to not find anything https://opendev.org/explore/code?q=neutron-lib-current&tab= | 12:42 |
frickler | boden: you need to quote the "-" as "\-" | 12:42 |
boden | frickler unless I'm missing something, it doesn't help | 12:44 |
boden | there are at least 20 projects that have the string "neutron-lib-current" in their requirements.txt file... | 12:44 |
*** kjackal has quit IRC | 12:44 | |
*** kjackal has joined #openstack-infra | 12:44 | |
*** jamesmcarthur has joined #openstack-infra | 12:46 | |
*** lseki has joined #openstack-infra | 12:46 | |
frickler | boden: do you have an example? I can't seem to find one easily | 12:48 |
AJaeger | frickler: dragonflow/requirements.txt | 12:48 |
boden | frickler https://github.com/openstack/networking-ovn/blob/master/requirements.txt#L24 | 12:48 |
boden | frickler, or as another example I want to find all uses of the (neutron) constant "ROUTER_CONTROLLER"... I can't find any, and there are some for sure | 12:49 |
*** xarses_ has joined #openstack-infra | 12:49 | |
*** rh-jelabarre has quit IRC | 12:50 | |
*** ykarel|afk is now known as ykarel | 12:51 | |
*** aaronsheffield has joined #openstack-infra | 12:51 | |
*** xarses has quit IRC | 12:52 | |
frickler | boden: hmm, that's indeed strange. searching for some other term like "policy\-in\-code" seems to work fine. maybe https://opendev.org/explore/code?q=infra+initiatives&tab= can help you as a workaround for the first search. but there certainly seems to be an issue with terms containing "_" | 12:55 |
AJaeger | boden: for your specific query: in openstack namespace it's neutron, neutron-lib, networking-odl. I don't have the namespace x checked out to grep there | 12:57 |
*** andreww has joined #openstack-infra | 12:58 | |
*** xarses_ has quit IRC | 13:01 | |
*** jpich has quit IRC | 13:01 | |
*** gfidente has quit IRC | 13:01 | |
*** smarcet has joined #openstack-infra | 13:01 | |
*** jpich has joined #openstack-infra | 13:02 | |
openstackgerrit | Monty Taylor proposed zuul/nodepool master: Update devstack settings and docs for opendev https://review.opendev.org/654230 | 13:03 |
mordred | frickler: I pushed up https://review.opendev.org/655133 which I *think* should fix that ^^ | 13:03 |
mordred | boden: I was out yesterday so I'm not 100% caught up on hound - I believe clarkb was looking at it yesterday though | 13:04 |
*** eharney has quit IRC | 13:04 | |
AJaeger | mordred: could you review a small py35->36 change, please? https://review.opendev.org/#/c/654574/ | 13:05 |
mordred | we'd definitely LIKE to replace it with the gitea codesearch - but I think more work might need to go in to that to make it suitable (there is now upstream support for pluggable search backends and we'd like to get an elasticsearch backend put in there, for instance) | 13:05 |
AJaeger | thanks, mordred | 13:07 |
boden | frickler I'm not sure we can count on "infra initiatives" being there... AJaeger we also have projects in the x/ namespace that we need to search | 13:09 |
boden | just as a heads up this will certainly impact some of our work on neutron blueprints since we need the ability to search cross projects for impacts | 13:10 |
openstackgerrit | Merged openstack/project-config master: Use py36 instead of py35 for periodic master jobs https://review.opendev.org/654574 | 13:13 |
clarkb | mordred: boden I've not had a chance to look at hound yet. I think it updates based on projects.yaml but needs a restart? unsure. I'm likely to follow up on git stack size segfaults for openstack/openstack first today, then can start looking at hound | 13:14 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-neutron-lib-master https://review.opendev.org/654580 | 13:14 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-ovsdbapp-master https://review.opendev.org/655136 | 13:14 |
AJaeger | clarkb: for the git stack, we need to fix the system-config tests first, see above for frickler comment on hashdiff-0.3.9 breaking beaker-trusty. That needs fixing first. | 13:14 |
AJaeger | clarkb: and good morning! | 13:15 |
*** jbadiapa has joined #openstack-infra | 13:15 | |
clarkb | AJaeger: got it and thanks. I'm not quite "here" yet | 13:15 |
clarkb | the openstack infra ci helper has our package pins iirc | 13:16 |
*** jamesmcarthur has quit IRC | 13:18 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Set logo height rather than width https://review.opendev.org/655139 | 13:19 |
openstackgerrit | Darragh Bailey (electrofelix) proposed zuul/zuul master: Improve proxy settings support for compose env https://review.opendev.org/655140 | 13:20 |
openstackgerrit | Darragh Bailey (electrofelix) proposed zuul/zuul master: Add some packages for basic python jobs https://review.opendev.org/655141 | 13:20 |
openstackgerrit | Darragh Bailey (electrofelix) proposed zuul/zuul master: Scale nodes up to 4 instances https://review.opendev.org/655142 | 13:20 |
mriedem | mnaser: i see that http://status.openstack.org/elastic-recheck/#1806912 hits predominantly in vexxhost-sjc1 nodes, any idea if there could be something slowing down g-api startup there? | 13:21 |
boden | clarkb okay... should I create a bug or something to track this work, or no? | 13:21 |
mordred | clarkb: the config.json definitely looks updated on codesearch - want me to just restart the service? | 13:21 |
*** lpetrut has joined #openstack-infra | 13:21 | |
clarkb | mordred: probably worth a try | 13:22 |
clarkb | boden: I'm not sure it is needed. It is a known thing, just lower on the priority list, because worst case local grep works | 13:22 |
mordred | oh - nope. there's another issue | 13:22 |
openstackgerrit | Andreas Jaeger proposed opendev/puppet-openstack_infra_spec_helper master: Block hashdiff 0.3.9 https://review.opendev.org/655143 | 13:23 |
AJaeger | clarkb: is that the proper fix for hashdiff? ^ | 13:23 |
*** panda|lunch is now known as panda | 13:23 | |
clarkb | AJaeger: I think so | 13:24 |
*** redrobot has joined #openstack-infra | 13:24 | |
openstackgerrit | Monty Taylor proposed opendev/jeepyb master: Use opendev and https by default https://review.opendev.org/655145 | 13:24 |
mnaser | mriedem: odd. has it increased recently? We added IPv6 one or two weeks ago | 13:24 |
mordred | clarkb, frickler: ^^ that is needed to fix codesearch | 13:24 |
mriedem | mnaser: what's really weird is the g-api logs show it only taking about 7 seconds for g-api to startup | 13:25 |
mriedem | http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/screen-g-api.txt.gz | 13:25 |
*** jrist- is now known as jrist | 13:25 | |
mriedem | http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/devstacklog.txt.gz#_2019-04-23_05_17_22_811 | 13:25 |
*** sshnaidm is now known as sshnaidm|afk | 13:26 | |
*** quiquell|rover is now known as quique|rover|lun | 13:26 | |
*** quique|rover|lun is now known as quique|rover|eat | 13:26 | |
mriedem | looks like devstack is uploading an image and then waiting to get the image back and maybe it's taking longer in swift? | 13:27 |
openstackgerrit | Monty Taylor proposed opendev/jeepyb master: Use opendev and https by default https://review.opendev.org/655145 | 13:27 |
boden | clarkb local grep isn't really an option for this work; we are talking 10s of projects that would need to be up to date to grep across them and what's more we can't share that search with people as it's used in the code reviews | 13:27 |
mordred | boden: yeah. we're working on getting codesearch fixed | 13:28 |
mriedem | rosmaita: ^ on g-api taking over a minute to 'start' in case you can identify something in those logs | 13:28 |
mnaser | mriedem: is it possible that it is trying to check if its listening on ipv4/ipv6 and the check is verifying the other port? | 13:28 |
mriedem | rosmaita: http://status.openstack.org/elastic-recheck/#1806912 | 13:28 |
openstackgerrit | Monty Taylor proposed opendev/jeepyb master: Use opendev and https by default https://review.opendev.org/655145 | 13:29 |
rosmaita | mriedem: ack | 13:29 |
mriedem | mnaser: not sure | 13:29 |
mnaser | then again we're not the only dual stack cloud | 13:30 |
mnaser | I think | 13:30 |
*** rlandy|ruck is now known as rlandy|ruck|mtg | 13:30 | |
mriedem | the bug hits other providers | 13:30 |
mriedem | just not as much | 13:30 |
mriedem | i see a GET from g-api to swift here http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/screen-g-api.txt.gz#_Apr_23_04_58_17_713760 | 13:30 |
mriedem | that results in a 404 | 13:31 |
mriedem | which is maybe normal devstack checking to see if the image exists (or glance checking) before uploading it to swift? | 13:31 |
mordred | boden, clarkb: I have restarted hound - it's going to take a couple of minutes because it's got a bunch of new stuff to clone and index | 13:31 |
mriedem | guessing it's glance checking because then | 13:31 |
mriedem | Apr 23 04:58:17.714742 ubuntu-bionic-vexxhost-sjc1-0005468291 devstack@g-api.service[23037]: INFO glance_store._drivers.swift.store [None req-8c0ac321-01c3-40ac-a9f5-0b733baac629 admin admin] Creating swift container glance | 13:31 |
mnaser | yeah I saw that too, that seems business as usual | 13:32 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-neutron-lib-master https://review.opendev.org/654580 | 13:32 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-ovsdbapp-master https://review.opendev.org/655136 | 13:32 |
frickler | mriedem: apache proxies to 127.0.0.1:60998, but g-api is binding to :60999, not sure where these port numbers come from in the first place http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/apache_config/glance-wsgi-api_conf.txt.gz | 13:33 |
frickler | http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/screen-g-api.txt.gz#_Apr_23_04_58_11_635308 | 13:33 |
*** sthussey has joined #openstack-infra | 13:33 | |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-neutron-lib-master https://review.opendev.org/654580 | 13:35 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-ovsdbapp-master https://review.opendev.org/655136 | 13:35 |
frickler | for a working run I see 60999 in both locations | 13:35 |
*** bhavikdbavishi has joined #openstack-infra | 13:35 | |
mriedem | hmm yeah and the curl is specifying noproxy | 13:36 |
mriedem | oh i guess that's just part of wait_for_service in devstack | 13:36 |
mordred | clarkb: if you get a sec, https://review.opendev.org/#/c/655133/ should fix the nodepool devstack job I think | 13:37 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-ovsdbapp-master https://review.opendev.org/655136 | 13:39 |
*** bhavikdbavishi has quit IRC | 13:39 | |
mriedem | frickler: it looks like the proxy port is random | 13:40 |
mriedem | https://github.com/openstack/devstack/blob/master/lib/apache#L349 | 13:41 |
mriedem | port=$(get_random_port) | 13:41 |
frickler | mriedem: yeah, so the glance config here also has 60998, not sure why uwsgi chooses 60999 instead http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/etc/glance/glance-uwsgi.ini.gz | 13:42 |
*** liuyulong has joined #openstack-infra | 13:43 | |
mnaser | question: if we have a change that depends-on a review.openstack.org change, will the dependency *not* be taken into consideration or will it 'block' ? | 13:44 |
mnaser | okay, never mind, it will just ignore it | 13:44 |
mnaser | Zuul took two minutes to pick up a change and bring it to gate so I was starting to wonder what was going on | 13:45 |
*** michael-beaver has joined #openstack-infra | 13:45 | |
*** jamesmcarthur has joined #openstack-infra | 13:46 | |
*** kranthikirang has joined #openstack-infra | 13:47 | |
*** quique|rover|eat is now known as quiquell|rover | 13:47 | |
*** jamesmcarthur has quit IRC | 13:47 | |
*** smarcet has quit IRC | 13:48 | |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-neutron-lib-master https://review.opendev.org/654580 | 13:49 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-ovsdbapp-master https://review.opendev.org/655136 | 13:49 |
pabelanger | mnaser: yah, depends-on: review.o.o will just be ignored | 13:50 |
mriedem | frickler: here is where the random port is retrieved (note ipv4 only) https://github.com/openstack/devstack/blob/master/functions#L801 | 13:51 |
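A rough sketch, not devstack's actual helper, of how a "pick a free port" routine ends up IPv4-only and inherently racy, which may relate to the 60998/60999 mismatch above:

```bash
# Sketch only: let the kernel hand out a free IPv4 port, then release it.
# Another process can bind something else in the gap between this probe and
# the real bind, so the number written into configs can end up stale.
get_free_port() {
    python3 -c 'import socket; s = socket.socket(); s.bind(("127.0.0.1", 0)); print(s.getsockname()[1]); s.close()'
}
port=$(get_free_port)
echo "proxy target would be 127.0.0.1:${port}"
```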
*** jamesmcarthur has joined #openstack-infra | 13:52 | |
*** smarcet has joined #openstack-infra | 13:52 | |
openstackgerrit | Nicolas Hicher proposed openstack/diskimage-builder master: openssh-server: enforce sshd config https://review.opendev.org/653890 | 13:53 |
AJaeger | amorin: is ovh-bhs1 ok again? We disabled it on the 19th due to network problems, can we use it again? | 13:54 |
*** sshnaidm|afk is now known as sshnaidm | 13:55 | |
*** yamamoto has quit IRC | 13:56 | |
*** Goneri has joined #openstack-infra | 13:57 | |
*** psachin has joined #openstack-infra | 13:58 | |
*** rh-jelabarre has joined #openstack-infra | 13:59 | |
*** amansi26 has joined #openstack-infra | 13:59 | |
mnaser | infra-root: either limestone is maybe having issues or rax executors are having network issues, we're seeing a lot of RETRY_LIMIT, jobs failing midway, someone caught one http://paste.openstack.org/show/749635/ | 14:00 |
*** jaosorior has joined #openstack-infra | 14:00 | |
mnaser | http://zuul.openstack.org/builds?result=RETRY_LIMIT | 14:01 |
mnaser | we have a lot of RETRY_LIMIT fails | 14:01 |
mnaser | no logs however.. | 14:01 |
*** Lucas_Gray has quit IRC | 14:04 | |
*** mleroy has left #openstack-infra | 14:06 | |
Shrews | pabelanger: any idea what happened here? http://logs.openstack.org/62/654462/3/gate/openstackci-beaker-ubuntu-trusty/e77f31d/job-output.txt.gz#_2019-04-23_13_21_28_866532 | 14:06 |
*** Lucas_Gray has joined #openstack-infra | 14:06 | |
mordred | boden: http://codesearch.openstack.org/?q=neutron-lib-current&i=nope&files=&repos= works now | 14:07 |
pabelanger | Shrews: I think AJaeger just posted a patch for that | 14:07 |
pabelanger | https://review.opendev.org/655143/ maybe? | 14:07 |
Shrews | ah | 14:07 |
pabelanger | AJaeger: looks like we might need to cap bundler too | 14:08 |
*** smarcet has quit IRC | 14:08 | |
frickler | mriedem: humm, devstack is being run twice it seems. see the successful end of the first run here http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/job-output.txt.gz#_2019-04-23_05_00_55_349794 and then the log of the second run in http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/devstacklog.txt.gz | 14:10 |
*** smarcet has joined #openstack-infra | 14:12 | |
mriedem | frickler: i think clarkb pointed that the last time i looked at this :) | 14:16 |
mordred | mriedem, frickler: that doesn't seem like a thing we want it doing | 14:17 |
quiquell|rover | pabelanger: ping | 14:17 |
AJaeger | pabelanger: yeah - want to take over my change? | 14:17 |
quiquell|rover | pabelanger: do we have any way to get zuul jobs enqueue time from all the stein cycle ? | 14:17 |
quiquell|rover | pabelanger: like the enqueued_time json value at zuul status api | 14:18 |
corvus | infra-root: am i to understand that there are reports gitea is not answering on ipv6? is anyone working on that? | 14:19 |
mordred | corvus: I believe that wound up being attempted use of git:// | 14:20 |
AJaeger | pabelanger: patching myself quickly | 14:20 |
*** eharney has joined #openstack-infra | 14:20 | |
corvus | mordred: ah, ok thanks! | 14:20 |
mordred | frickler: want to re-+2 this: https://review.opendev.org/#/c/655145/ ? | 14:21 |
openstackgerrit | Andreas Jaeger proposed opendev/puppet-openstack_infra_spec_helper master: Block hashdiff 0.3.9 and bundler 2.0.1 https://review.opendev.org/655143 | 14:21 |
corvus | mordred: this isn't working that great for me: telnet 2604:e100:3:0:f816:3eff:fe6b:ad62 80 | 14:22 |
*** electrofelix has quit IRC | 14:22 | |
pabelanger | AJaeger: thanks, can help land it | 14:24 |
mordred | corvus: I agree | 14:24 |
pabelanger | corvus: clarkb: mordred: with gitea, I don't see tags listed on the web interface, am I missing something obvious or is that not a setting enabled? | 14:24 |
mordred | pabelanger: click the branch dropdown, then click tags | 14:25 |
pabelanger | quiquell|rover: I think it is logged in statsd otherwise I _think_ it is in db | 14:25 |
*** armax has joined #openstack-infra | 14:26 | |
pabelanger | mordred: ah, thanks! | 14:26 |
pabelanger | I was looking for a releases tab like in github | 14:26 |
mordred | pabelanger: yes - we removed the releases tab because it inappropriately provides download links for git export tarballs, which is a terrible idea | 14:27 |
mordred | and causes people to think that downloading those things and using them might work | 14:27 |
corvus | which, incidentally, is something that github does and we can not prevent | 14:27 |
mordred | yeah | 14:27 |
corvus | so github is making its own "releases" of openstack | 14:28 |
mordred | I wouldn't mind re-enabling the page if we could fix it to _only_ show manually uploaded artifacts | 14:28 |
mordred | since there is a "create release" and "upload artifact" api | 14:28 |
corvus | i bet we could make a patch | 14:28 |
mordred | yeah | 14:28 |
mnaser | hmm | 14:29 |
*** rlandy|ruck|mtg is now known as rlandy|ruck | 14:30 | |
mnaser | corvus, mordred: curl 2604:e100:3:0:f816:3eff:fe6b:ad62:80 returns no route to host, but hitting 8080 gives connection refused | 14:30 |
mnaser | so.. I don't think this is something we're doing? | 14:30 |
corvus | yeah, it looks like the LB is only listening on v4: tcp 0 0 0.0.0.0:http 0.0.0.0:* LISTEN | 14:31 |
corvus | i was just trying to verify that we restarted it after the config change that should have it listening on v6 | 14:31 |
pabelanger | mordred: ack, thanks | 14:32 |
corvus | i'm not 100% sure of that, so i think it's worth a restart | 14:32 |
mordred | corvus: ++ | 14:33 |
quiquell|rover | pabelanger: do you know where http://grafana.openstack.org/d/T6vSHcSik/zuul-status?orgId=1 is taking the data from ? | 14:35 |
corvus | i'm going to restart the container... mostly because we've never tested the "-sf" haproxy option in a container. | 14:35 |
corvus | there will be a short interruption in service | 14:35 |
corvus | done | 14:36 |
corvus | telnet to the v6 address works now | 14:37 |
pabelanger | quiquell|rover: graphite.opendev.org | 14:37 |
corvus | so that seems to have been the issue | 14:37 |
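A couple of hedged checks along the lines of what corvus just did, to confirm the load balancer is now bound on both address families:

```bash
# On the load balancer host: the listener should no longer be v4-only.
ss -tlnp | grep -E ':(80|443) '
# From any host with working v6: the v6 path should answer over https.
curl -6 -sI https://opendev.org/ | head -n 1
```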
quiquell|rover | thanks | 14:37 |
*** amansi26 has quit IRC | 14:39 | |
*** lpetrut has quit IRC | 14:40 | |
clarkb | infra-root https://review.opendev.org/#/c/655143/2 is the expected fix for system-config jobs which should allow us to merge the git stack size fix for openstack/openstack | 14:42 |
*** smarcet has quit IRC | 14:42 | |
*** nhicher has joined #openstack-infra | 14:42 | |
quiquell|rover | pabelanger, clarkb: do we store queued_time here https://graphite01.opendev.org/ I cannot find it | 14:44 |
quiquell|rover | enqueued_time I mean | 14:44 |
*** smarcet has joined #openstack-infra | 14:44 | |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Bump lru_cache size to 10 https://review.opendev.org/655173 | 14:44 |
*** udesale has quit IRC | 14:45 | |
*** udesale has joined #openstack-infra | 14:46 | |
*** ccamacho has quit IRC | 14:46 | |
*** smarcet has quit IRC | 14:47 | |
clarkb | quiquell|rover: stats.timers.zuul.tenant.zuul.pipeline.check.resident_time.count is an example of that data I think | 14:48 |
*** smarcet has joined #openstack-infra | 14:48 | |
quiquell|rover | clarkb: can we filter that per queue name ? | 14:49 |
quiquell|rover | clarkb: or queue name is not stored ? | 14:49 |
clarkb | quiquell|rover: 'check' is the pipeline name | 14:49 |
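As an aside, a hedged example of pulling that series straight from graphite's render API; the metric path is copied from above and the host is the one mentioned earlier in the channel, so adjust scheme/host as needed:

```bash
curl -s "https://graphite.opendev.org/render?target=stats.timers.zuul.tenant.zuul.pipeline.check.resident_time.count&from=-7days&format=json"
```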
*** ramishra has quit IRC | 14:49 | |
AJaeger | clarkb: the infra spec helper fix fails, see http://logs.openstack.org/43/655143/2/check/legacy-puppet-openstack-infra-spec-helper-unit-ubuntu-trusty/8f2dede/job-output.txt.gz#_2019-04-23_14_45_25_116490 - any ideas? | 14:50 |
clarkb | AJaeger: oh I think this was the thing that cmurphy and mordred were looking at. I think that repo may not be self testing at the moment | 14:51 |
clarkb | mordred: cmurphy ^ are you able to confirm that? we may have to force merge that change :/ | 14:51 |
*** amoralej has joined #openstack-infra | 14:52 | |
mordred | clarkb: there's definitely something bonged with those jobs that we should dig in to - I believe last time we just force-merged but I don't have specifics this instant | 14:54 |
quiquell|rover | clarkb: was looking for something like stats.timers.zuul.tenant.zuul.pipeline.periodic.queue.tripleo.resident_time.count | 14:54 |
openstackgerrit | Merged opendev/jeepyb master: Use opendev and https by default https://review.opendev.org/655145 | 14:54 |
clarkb | quiquell|rover: I don't think we aggregate by the pipeline queue | 14:55 |
quiquell|rover | weshay|rover: ^ | 14:55 |
clarkb | mordred: do you have an opinion on whether or not a force merge would be appropriate here? | 14:56 |
*** lpetrut has joined #openstack-infra | 14:56 | |
amoralej | is there any known issue with nodes running in rax-ord? | 14:56 |
mordred | clarkb: I don't - I want to dig in to the construction of that more and see if I can understand all the pieces better - but haven't had time - and now I need to jump on a call for a half hour | 14:57 |
clarkb | amoralej: mnaser mentioned some problems with job retries. but I don't think anyone has had a chance to debug yet | 14:57 |
amoralej | ack | 14:57 |
mordred | clarkb: I don't think force-merging is likely to break anything any _more_ though | 14:57 |
clarkb | mordred: ya I'm on a call myself | 14:57 |
amoralej | i see jobs failing and being retried | 14:57 |
clarkb | amoralej: ya that was what mnaser described. If a job fails in pre run stage it will be retried up to 3 times | 14:57 |
mnaser | http://zuul.openstack.org/builds?result=RETRY_LIMIT | 14:58 |
clarkb | I've got a meeting in just a minute but after can start helping to look at it | 14:58 |
mnaser | it looks pretty wide spread right now but yeah, leaving it for who can look at it.. | 14:58 |
cmurphy | clarkb: this seems like a different issue than what i was worried about | 14:58 |
clarkb | capturing the console log of a job that gets retried would be useful, if not already done | 14:58 |
amoralej | in my case, are failing not in pre, but in run playbook | 14:58 |
cmurphy | these unit tests are just looking for the spec helper in ../.. so it should still work | 14:59 |
openstackgerrit | Matt Riedemann proposed opendev/elastic-recheck master: Add query for VolumeAttachment lazy load bug 1826000 https://review.opendev.org/655177 | 14:59 |
openstack | bug 1826000 in Cinder "Intermittent 500 error when listing volumes with details and all_tenants=1 during tempest cleanup" [Undecided,Confirmed] https://launchpad.net/bugs/1826000 | 14:59 |
clarkb | cmurphy: hrm that change pins bundler to < 2.3.0 | 14:59 |
clarkb | cmurphy: so maybe there is another chicken and egg in the testing? | 14:59 |
cmurphy | clarkb: actually it doesn't in the unit tests http://logs.openstack.org/43/655143/2/check/legacy-puppet-openstack-infra-spec-helper-unit-ubuntu-trusty/8f2dede/job-output.txt.gz#_2019-04-23_14_45_21_610904 | 15:00 |
AJaeger | cmurphy: or is my change wrong? | 15:00 |
cmurphy | AJaeger: you need to edit run_unit_tests.sh too | 15:00 |
cmurphy | https://opendev.org/opendev/puppet-openstack_infra_spec_helper/src/branch/master/run_unit_tests.sh#L41 | 15:00 |
amoralej | clarkb, in some cases nothing at all https://imgur.com/a/kj6C7c5 | 15:01 |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Switch py35 periodic jobs to py36 in Neutron's dashboard https://review.opendev.org/655178 | 15:01 |
openstackgerrit | Nicolas Hicher proposed openstack/diskimage-builder master: openssh-server: enforce sshd config https://review.opendev.org/653890 | 15:02 |
AJaeger | cmurphy: can I just say "gem install bundler < 2.3.0" ? Or what's the syntax? | 15:02 |
cmurphy | AJaeger: i think with a -v | 15:03 |
cmurphy | or --version | 15:03 |
AJaeger | thanks | 15:04 |
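The syntax being pointed at, sketched with a placeholder bound; the exact constraint is whatever the review finally settles on:

```bash
# Either spelling pins the installed bundler version:
gem install bundler -v '< 2.0'
gem install bundler --version '< 2.0'
```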
amoralej | clarkb, another one http://paste.openstack.org/show/749648/ this seems failed with unreachable and then remote host identification has changed | 15:04 |
amoralej | node redeployed? | 15:05 |
clarkb | remote host id changing is often due to neutron reusing IPs | 15:05 |
clarkb | (yay dogfooding) | 15:05 |
fungi | okay, i think i'm caught up on scrollback in here, so hopefully after my conference call i can help fix some of the new broken | 15:05 |
openstackgerrit | Andreas Jaeger proposed opendev/puppet-openstack_infra_spec_helper master: Block hashdiff 0.3.9 and bundler 2.0.1 https://review.opendev.org/655143 | 15:06 |
AJaeger | cmurphy, clarkb, next try ^ | 15:06 |
amoralej | clarkb, i have some consoles where i just see regular messages and suddenly --- END OF STREAM --- | 15:07 |
amoralej | not sure if i'm losing messages if i'm not in the console window or something | 15:07 |
*** yamamoto has joined #openstack-infra | 15:07 | |
*** zul has joined #openstack-infra | 15:08 | |
clarkb | amoralej: if the networking is completely broken it won't be able to stream the data off the host anymore | 15:08 |
*** ykarel is now known as ykarel|away | 15:08 | |
*** gyee has joined #openstack-infra | 15:09 | |
*** yamamoto has quit IRC | 15:14 | |
*** ccamacho has joined #openstack-infra | 15:17 | |
clarkb | amoralej: looking at message:"REMOTE HOST IDENTIFICATION HAS CHANGED" AND filename:"job-output.txt" in logstash it seems that infra jobs are the biggest problem with that particular error and that is a centos7 specific issue | 15:17 |
clarkb | otherwise it affects multiple clouds and multiple images | 15:17 |
clarkb | the vast majority are on a single zuul executor but also affects multiple zuul executors | 15:18 |
clarkb | I wonder if it is the executors that are at least part of the problem | 15:18 |
clarkb | based on that data we are retrying properly and jobs are eventually rerunning and passing | 15:20 |
clarkb | (at least in some cases) | 15:20 |
fungi | "The fingerprint for the ED25519 key sent by the remote host is\nSHA256:..." | 15:20 |
fungi | umm | 15:20 |
clarkb | it also peaked a couple hours ago and seems to be tapering off now | 15:20 |
clarkb | fungi: its a known issue that neutron will reuse IPs in some clouds causing these failures then ARP fights happen | 15:21 |
amoralej | clarkb, yeah, jobs are running again, let's see how this run goes | 15:21 |
clarkb | fungi: we could potentially avoid some of that struggle if we were able to ipv6 more aggressively | 15:21 |
amoralej | so far, some jobs are passing or properly failing | 15:21 |
amoralej | with no infra issues | 15:21 |
clarkb | ya lets monitor it. The graph data implies it could've been a provider blip that has been corrected | 15:21 |
AJaeger | do we want to add ovh-bhs1 back again? I tried pinging amorin here but never got a reply... | 15:21 |
*** lpetrut has quit IRC | 15:22 | |
clarkb | AJaeger: I think we can try it and turn it off again if it is still sad | 15:22 |
fungi | the failures matching the query you provided span all rax regions as well as ovh-gra1 | 15:22 |
AJaeger | clarkb: change is https://review.opendev.org/#/c/653879/ - I'll +2 now | 15:23 |
clarkb | fungi: ya our ipv4 clouds :) | 15:23 |
fungi | though yeah the biggest volume in the past day was around 13:00 to 13:30 and almost exclusively in ovh-gra1 | 15:24 |
clarkb | fungi: and a single infra job | 15:24 |
fungi | so the rax hits may be rogue vms | 15:24 |
fungi | oh, yep, puppet-beaker-rspec-centos-7-infra for the big spike | 15:24 |
fungi | strange correlation | 15:24 |
fungi | also do we still need the puppet-beaker-rspec-centos-7-infra job? | 15:25 |
clarkb | no I thought I had removed it | 15:26 |
clarkb | (we should do further cleanup on those as necessary) | 15:26 |
fungi | so anyway, we assume the key mismatch errors are unrelated to the retry_limit results | 15:27 |
fungi | doesn't seem to be any overlap of significance | 15:27 |
clarkb | the key mismatches will cause retries | 15:27 |
clarkb | and if you get 3 in a row a retry_limit error | 15:27 |
clarkb | depends on whether or not the error happens in pre run | 15:27 |
fungi | this one reported roughly 50 minutes ago and seems to be a sudden disconnect: http://logs.openstack.org/78/655078/1/check/tacker-functional-devstack-multinode-python3/5ed80d7/job-output.txt.gz#_2019-04-23_14_39_43_446923 | 15:28 |
AJaeger | clarkb, cmurphy , https://review.opendev.org/#/c/655143/ did not work ;( | 15:29 |
fungi | Setting up conntrackd was the last thing it was doing... will look for a pattern | 15:29 |
clarkb | fungi: that is suspicious | 15:29 |
fungi | yeah | 15:29 |
clarkb | AJaeger: I think 2.0.0 also requires ruby >= 2.3.0 | 15:30 |
clarkb | AJaeger: see https://rubygems.org/gems/bundler/versions/2.0.0 | 15:30 |
AJaeger | I request 2.0.0 now | 15:30 |
clarkb | AJaeger: so we may want < 2.0.0 | 15:30 |
fungi | probably not conntrackd... this other one died around the same time in the same way but while installing different packages: http://logs.openstack.org/78/655078/1/check/tacker-functional-devstack-multinode/2900b2b/job-output.txt.gz#_2019-04-23_14_39_49_829832 | 15:30 |
clarkb | fungi: that could be a race in buggering | 15:30 |
clarkb | fungi: the manpages happen just before conntrack in your first log | 15:30 |
fungi | possible... | 15:31 |
clarkb | *buffering | 15:31 |
fungi | yeah, both apparently ;) | 15:31 |
openstackgerrit | Andreas Jaeger proposed opendev/puppet-openstack_infra_spec_helper master: Block hashdiff 0.3.9 and bundler 2.0.1 https://review.opendev.org/655143 | 15:31 |
clarkb | we only need trusty for a few more days hopefully :/ and then we can remove testing for it as well as centos7 | 15:32 |
fungi | this one died while configuring swap: http://logs.openstack.org/95/654995/1/check/designate-pdns4-postgres/21337d4/job-output.txt.gz#_2019-04-23_13_45_34_585017 | 15:32 |
clarkb | fungi: I wonder if those timestamps coincide with the other network issues | 15:32 |
clarkb | like maybe our executors had broken ipv4 networking during that time | 15:33 |
clarkb | (or something along those lines) | 15:33 |
fungi | i was starting to have a similar suspicion, maybe network issues in or near rax-dfw? | 15:33 |
openstackgerrit | Andreas Jaeger proposed opendev/puppet-openstack_infra_spec_helper master: Block hashdiff 0.3.9 and bundler 2.0 https://review.opendev.org/655143 | 15:33 |
clarkb | ya | 15:34 |
fungi | is https://rackspace.service-now.com/system_status/ blank for anyone else? | 15:35 |
clarkb | not for me | 15:35 |
clarkb | there is nothing listed there | 15:35 |
clarkb | I mean its not blank but no issues posted either | 15:36 |
clarkb | perhaps was upstream of them and they didn't even notice | 15:36 |
fungi | firefox just give me a blank page for it. oh well | 15:36 |
fungi | er, gives | 15:36 |
AJaeger | fungi: works on firefox for me - but takes a bit to load, seems to use some javascript... | 15:37 |
*** dustinc_away is now known as dustinc | 15:37 | |
*** helenafm has quit IRC | 15:37 | |
fungi | yeah, looks from the page source like it's opening a new window or something. hard to tell what exactly... end result is i get no content but also my privacy extensions aren't reporting blocking anything | 15:38 |
openstackgerrit | Merged opendev/elastic-recheck master: Add query for VolumeAttachment lazy load bug 1826000 https://review.opendev.org/655177 | 15:38 |
openstack | bug 1826000 in Cinder "Intermittent 500 error when listing volumes with details and all_tenants=1 during tempest cleanup" [Undecided,Confirmed] https://launchpad.net/bugs/1826000 | 15:38 |
*** jamesdenton has joined #openstack-infra | 15:38 | |
*** kjackal has quit IRC | 15:41 | |
*** ccamacho has quit IRC | 15:46 | |
*** ccamacho has joined #openstack-infra | 15:47 | |
smcginnis | So we've gotten a lot more failures with ensure-twine being run in a virtualenv. | 15:49 |
smcginnis | Still not sure where that is coming from. | 15:49 |
smcginnis | Would it make sense to add a check in https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-twine/tasks/main.yaml to look for {{ lookup('env', 'VIRTUAL_ENV') }} and drop the --user if set? | 15:50 |
*** e0ne has quit IRC | 15:50 | |
smcginnis | Probably also add a debug print of VIRTUAL_ENV so we can figure out where it's coming from too. | 15:50 |
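A sketch of the idea only, written as shell rather than the role's actual Ansible task: print VIRTUAL_ENV for debugging and drop --user when a virtualenv is active, since pip rejects --user inside one:

```bash
echo "VIRTUAL_ENV=${VIRTUAL_ENV:-<unset>}"   # debug: where is the venv coming from?
if [ -n "${VIRTUAL_ENV:-}" ]; then
    # --user is not valid inside a virtualenv, so omit it
    python3 -m pip install twine
else
    python3 -m pip install --user twine
fi
```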
szaher | sts | 15:51 |
szaher | sts | 15:52 |
amoralej | clarkb, i was tracking 6 different jobs, all failed, i've pasted info about nodes and error messages in http://paste.openstack.org/show/749650/ | 15:53 |
amoralej | there are different error messages | 15:53 |
amoralej | although it seems all may be network related | 15:54 |
*** sshnaidm is now known as sshnaidm|afk | 15:54 | |
clarkb | smcginnis: you might have to do a python script instead if it isn't part of the env but is instead coming from the executable path. And ya I think adding that debugging info would be great | 15:54 |
frickler | boden: fyi, codesearch should be back to work now | 15:54 |
clarkb | amoralej: were any of them retried? | 15:55 |
amoralej | retries are in queue | 15:55 |
*** ykarel|away has quit IRC | 15:55 | |
smcginnis | clarkb: Ah, I assumed the environment would have been activated since we are just calling "python3" and getting that error. | 15:55 |
amoralej | well at least one has reached retry_limit | 15:55 |
boden | frickler great thanks much! so is the long term plan to use the "explore" from opendev, or to keep the "hound" code search? | 15:55 |
amoralej | and all of them were retries of previous failed run | 15:55 |
clarkb | smcginnis: there are two major ways to venv. One is via env vars. The other is to run python out of a venv directly (maybe our path is being munged?) | 15:56 |
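A hedged check for the second case, a python3 that lives inside a venv even though VIRTUAL_ENV is unset or the PATH was munged:

```bash
command -v python3   # which interpreter is actually first on PATH?
# Inside a venv, sys.prefix differs from sys.base_prefix.
python3 -c 'import sys; print("in venv:", sys.prefix != getattr(sys, "base_prefix", sys.prefix))'
```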
mordred | boden: I think we need to discuss the long-term plan ... I think we'd like to be able to collapse things and not need to run both codesearch and gitea ... but the searching in "explore" has some deficiencies at the moment and I think we need to discuss what needs to be or can be done and what the plan will be | 15:57 |
weshay|rover | sorry to bug you guys, are the stats from https://review.opendev.org/#/c/616306/15/releasenotes/notes/resource-usage-stats-bfcd6765ef4a9c86.yaml public or something only avail to infra.. if so I was hoping to present the stats at the tripleo mtg at ptg | 15:57 |
mordred | boden: which is to say - I don't think there is yet a full plan - more like a latent desire | 15:57 |
* weshay|rover trying to see how tripleo performed in being a good upstream citizen in stein | 15:57 | |
*** dave-mccowan has joined #openstack-infra | 15:57 | |
clarkb | weshay|rover: they should be in the same graphite server | 15:58 |
mordred | weshay|rover: yes - those are in graphite | 15:58 |
clarkb | weshay|rover: but that change is not merged yet | 15:58 |
boden | mordred ack and thanks for the info and everyones help on this | 15:58 |
mordred | boden: sure thing! thanks for the patience, I'm glad we were able to get hound back up and running properly :) | 15:58 |
weshay|rover | k.. thanks | 15:59 |
clarkb | weshay|rover: I can do a log parsing run in a bit to give you numbers for the last 30 days | 15:59 |
weshay|rover | clarkb k.. I know ur busy, it's not critical.. but a nice to have :) | 15:59 |
clarkb | well I'm waiting for test results on AJaeger's puppet testing fix so I have a few minutes now :) | 16:00 |
clarkb | amoralej: my hunch is that we've got ongoing instability in ipv4 networking between our executors and the test clouds we talk to over ipv4 | 16:01 |
clarkb | fungi: ^ any ideas on testing that more directly | 16:01 |
clarkb | fungi: mtr between ze0* and ovh and rax-iad/ord? | 16:01 |
clarkb | weshay|rover: http://paste.openstack.org/show/749651/ | 16:02 |
clarkb | amoralej: limestone and vexxhost are talked to via ipv6. Inap is our other ipv4 cloud. If we can find evidence of trouble to vexxhost or limestone we may be able to rule out this theory | 16:03 |
amoralej | clarkb, that'd make sense, i'm trying to find some pattern in logstash | 16:04 |
*** quiquell|rover is now known as quiquell|off | 16:04 | |
weshay|rover | clarkb thanks.. comparing to http://paste.openstack.org/show/736797/ 42.6 -> 24.8 not bad :) | 16:05 |
corvus | clarkb: i started mtrs last week between ze01 and sjc1 v4/v6, rax-ord, rax-iad, rax-dfw, and google dns | 16:05 |
*** jpich has quit IRC | 16:05 | |
clarkb | weshay|rover: yup seems to have been steady progress since we started tracking it | 16:05 |
corvus | sjc1v6 had some noticeable packet loss, google dns had a very small amount, nothing on the others. | 16:06 |
clarkb | corvus: are you doing ipv6 or ipv4 to rax-* ? | 16:06 |
corvus | to clarify, those mtrs are still running | 16:06 |
*** dave-mccowan has quit IRC | 16:07 | |
*** Lucas_Gray has quit IRC | 16:07 | |
corvus | clarkb: v4 for some reason | 16:07 |
weshay|rover | clarkb thanks for the help! | 16:07 |
clarkb | corvus: I think that is what we want to know for this theory at least. Good to know there isn't any loss there | 16:08 |
*** pgaxatte has quit IRC | 16:09 | |
corvus | clarkb: AJaeger was saying we still have bhs1 disabled; mnaser disabled it because of network errors, but then i think we've started to suspect that might have been the same errors we're seeing everywhere? | 16:09 |
clarkb | corvus: ya and AJaeger has asked us to reenable bhs1 as a result | 16:10 |
clarkb | let me find the change | 16:10 |
mnaser | but I think at the time clarkb had a vm he was doing tests on | 16:10 |
mnaser | and it was losing packets or whatnot | 16:10 |
clarkb | corvus: https://review.opendev.org/#/c/653879/ | 16:10 |
clarkb | mnaser: yes but could have been related to general network sadness? it was a personal vm in ovh1-bhs1 that had trouble talking to cloudflare dns | 16:10 |
clarkb | mnaser: those failures could still have been related to the same thing that is making our other traffic unhappy is what I was trying to say | 16:11 |
corvus | i've added bhs1 and inap to my mtr screen on ze01 | 16:11 |
*** jbadiapa has quit IRC | 16:11 | |
mnaser | right, but at the time we found proof in unbound that some of these failed jobs failed to contact 1.1.1.1 too (but of course, that can be all gone now) | 16:11 |
clarkb | infra-root https://review.opendev.org/#/c/655143/ passes now and should fix system-config tests allowing us to fix git stack sizes | 16:12 |
mnaser | but anyways, yes, there seems to be something weird going on | 16:12 |
clarkb | if anyone can be second review on that real quick it would be much appreciated | 16:12 |
mnaser | also, if someone wants to run mtr from outside rax to sjc1v6 .. in case there's actual issues | 16:12 |
corvus | did i see a theory about executor localization? | 16:12 |
corvus | +3 | 16:12 |
clarkb | corvus: localization like i18n? or physical location of executors playing a part? we did wonder if perhaps the problem was more on the executor side which would explain widespread impact | 16:13 |
fungi | smcginnis: clarkb: i thought we were preinstalling twine on our executors for use in our release jobs. i wonder if the pip install error is due to us failing to actually preinstall it on some executors? or maybe failing to preinstall it for some interpreters? | 16:13 |
corvus | clarkb: the second thing | 16:13 |
clarkb | fungi: oh! the move to ansible venvs for zuul would explain that | 16:13 |
clarkb | fungi: we have a list of things to install into those venvs and I wouldn't be surprised if twine is not on that | 16:13 |
smcginnis | fungi: There is a check for `which twine` that fails. | 16:14 |
clarkb | and that may also explain the "it's a virtualenv, jim" problem | 16:14 |
corvus | clarkb: yeah, if it's widespread, it means one or more of the following: (a) the internet is broken (b) rax-dfw networking is broken (c) networking on one or more executors is broken | 16:14 |
*** ijw has joined #openstack-infra | 16:14 | |
corvus | i was wondering if someone has correlated failures to suggest (c) | 16:14 |
clarkb | no I don't think we have managed to get beyond "it is a theoretical possibility" | 16:15 |
smcginnis | clarkb, fungi: So should I worry about mucking with ensure-twine, update ensure-twine to not ever do --user, or wait for preinstalling to be fixed? | 16:15 |
mnaser | I guess we can check if the # of failures per executor is higher | 16:15 |
clarkb | smcginnis: we can probably wait for the preinstall to be fixed if that is an intended feature | 16:15 |
*** lucasagomes has quit IRC | 16:16 | |
amoralej | clarkb, looking for "RESULT_UNREACHABLE" message in the last 6hours it shows some high peaks | 16:16 |
*** ykarel|away has joined #openstack-infra | 16:16 | |
clarkb | amoralej: about 3 hours ago? | 16:16 |
smcginnis | clarkb: OK, that's probably easiest for me. Do you think we should actually drop the ensure-twine role completely if there is an expectation that it will always be preinstalled? | 16:17 |
amoralej | higher peak is at 15:45 - 15:50 | 16:17 |
amoralej | up to 124 in 5 minutes | 16:17 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Use user.html_url for github reporter messages https://review.opendev.org/655188 | 16:17 |
smcginnis | And this is blocking some releases, so second question would be who is taking that action and do we have an ETA? | 16:17 |
clarkb | amoralej: oh good to know | 16:17 |
amoralej | also at around 16:40 | 16:17 |
clarkb | smcginnis: would you like to take the action? | 16:17 |
fungi | smcginnis: the ensure-twine role, i think, is included in the job in the zuul-jobs standard library, for the benefit of folks who don't preinstall twine on executors | 16:17 |
amoralej | and 14:20 - 14:25 | 16:18 |
smcginnis | clarkb: Not sure if I can take that one. | 16:18 |
smcginnis | fungi: Sounds like it should probably be fixed then if there's a chance others may use this role in cases where it is not preinstalled. | 16:18 |
smcginnis | Do I understand right that with a zuul change this will always be run within a venv? | 16:18 |
smcginnis | In which case we just need to drop "--user" from the pip install. | 16:19 |
fungi | i believe that behavior will depend on whether the given zuul deployment is configured to manage its own ansible installs, though i could be wrong | 16:19 |
smcginnis | OK, so we probably do need to make that more robust to be able to handle both cases. | 16:20 |
clarkb | corvus: any idea where the list of things to install into the zuul ansible venvs is? | 16:20 |
fungi | in cases where ansible is not being run from a virtualenv, --user installs presumably work | 16:20 |
corvus | clarkb: on it | 16:20 |
clarkb | I'm not having good luck finding it but know that we had to add gear to it semi recently | 16:20 |
smcginnis | Odd that we had a mix of those working. Maybe due to it being preinstalled some places but not others? | 16:20 |
openstackgerrit | James E. Blair proposed opendev/puppet-zuul master: Install twine in executor Ansible environments https://review.opendev.org/655189 | 16:21 |
corvus | clarkb, fungi, smcginnis ^ | 16:21 |
clarkb | tyty | 16:21 |
smcginnis | Thanks corvus | 16:21 |
amoralej | clarkb, https://imgur.com/a/Xk7Qnk6 in case it helps | 16:21 |
corvus | docs here: https://zuul-ci.org/docs/zuul/admin/installation.html#ansible | 16:21 |
amoralej | there are failures in ovh and limestone-regionone too | 16:22 |
clarkb | ok so failures in limestone imply that this isn't ipv4 specific | 16:22 |
corvus | amoralej: can you correlate with zuul_executor and see if there's a pattern? | 16:22 |
smcginnis | corvus: Does that need to also include the others here: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-twine/tasks/main.yaml#L14 | 16:23 |
corvus | smcginnis: yep | 16:23 |
*** kopecmartin is now known as kopecmartin|off | 16:23 | |
*** mrhillsman is now known as openlab | 16:23 | |
*** openlab is now known as mrhillsman | 16:23 | |
corvus | smcginnis, clarkb, fungi: how did we used to have twine pre-installed? | 16:24 |
clarkb | corvus: it is/was in system-config/manifests/site.pp on the zuul scheduler | 16:24 |
amoralej | corvus, seems dispersed | 16:24 |
clarkb | er executor | 16:24 |
*** mrhillsman is now known as openlab | 16:24 | |
corvus | clarkb: 'git grep twine' in system config is nil | 16:24 |
clarkb | hrm no that is just gear ok I'm wrong | 16:24 |
corvus | amoralej: thanks | 16:24 |
*** openlab is now known as mrhillsman | 16:25 | |
fungi | ugh, firefox is back to showing a "corrupted content error" about network protocol violations when trying to browse opendev.org | 16:25 |
corvus | perhaps we did not pre-install? | 16:25 |
clarkb | corvus: ya that could be but python wasn't a venv and so --user worked | 16:25 |
corvus | fungi: i restarted the haproxy ~2hours ago | 16:25 |
fungi | ahh, maybe that's it | 16:25 |
clarkb | so maybe the proper fix here is to simply fix it in the job / role | 16:25 |
fungi | corvus: how could we have been using twine on the executors if it wasn't preinstalled? what options did we have for installing software on executors? is that allowed in bubblewrap? | 16:26 |
clarkb | fungi: it was running the pip install | 16:26 |
clarkb | (or could have run the pip install that it is running now) | 16:26 |
amoralej | corvus, executors go from ze01..ze13 ? | 16:27 |
amoralej | 12 i meant | 16:27 |
corvus | on ze01, python3 -c "import twine" -> ImportError: No module named 'twine' | 16:27 |
corvus | amoralej: yes | 16:27 |
fungi | so pip install --user was working under bwrap-managed homedirs i guess? | 16:27 |
clarkb | fungi: yes that is my hunch | 16:27 |
corvus | i WIP'd https://review.opendev.org/655189 | 16:28 |
fungi | i recall having discussions about needing tools we're going to run on executors preinstalled for security reasons, but i concur i don't see it in the puppet-zuul git history | 16:28 |
*** smarcet has quit IRC | 16:28 | |
*** dtantsur is now known as dtantsur|afk | 16:28 | |
clarkb | it is also possible that pip being mad about --user from in a venv is new | 16:29 |
corvus | we certainly need system packages pre-installed, but installing a python package from our mirror should be 99.99999% reliable | 16:29 |
corvus | and permitted by bwrap (at least when run in a trusted playbook) | 16:29 |
*** smarcet has joined #openstack-infra | 16:30 | |
fungi | which also brings me back to wondering why this was only failing sporadically on some builds and then worked when we reenqueued them, but has now started to fail consistently | 16:30 |
corvus | that's an improvement | 16:30 |
*** whoami-rajat has quit IRC | 16:30 | |
fungi | over the course of a week or so | 16:31 |
clarkb | fungi: image updates? | 16:31 |
clarkb | no we run on the executor | 16:31 |
amoralej | corvus, http://paste.openstack.org/show/749655/ note a single failed job can have more that one RESULT_UNREACHABLE message, i'm trying to clean it more | 16:32 |
fungi | so analyzing a recent release ensure-twine failure of this nature: http://logs.openstack.org/02/02dc0019af5f47d1850781b83e6041201054e1c5/release/release-openstack-python/9e49a6f/job-output.txt.gz#_2019-04-22_21_30_35_950934 | 16:34 |
clarkb | corvus: looking at ze01 there are not very many ssh connections (`sudo lsof -n -i TCP`) and I can't hit one of the hosts that is SYN_SENT and not ESTABLISHED from home | 16:34 |
clarkb | corvus: implying that the host is actually not reachable | 16:34 |
clarkb | I'm going to boot a testnode or three in rax iad | 16:35 |
clarkb | out of band of nodepool and see if they are reachable from the executors and home | 16:36 |
corvus | the jobs are a mix of devstack and non-devstack (eg, osa), right? | 16:36 |
corvus | (the unreachable node jobs) | 16:36 |
mnaser | yep ^ | 16:36 |
fungi | corvus: yes, and tox stuff too from what i saw | 16:36 |
*** rcernin has quit IRC | 16:36 | |
mnaser | and I have some OSA failures that just failed in a super random spot (so nothing having to do with network related operations) | 16:36 |
fungi | a lot of jobs just terminate partway through (at different points) and declare ssh unreachable and the console stream prematurely ending | 16:37 |
*** bobh has joined #openstack-infra | 16:37 | |
AJaeger | fungi: care to change your WIP to +2A on https://review.opendev.org/#/c/653018/ to give us imap back? See last comment there... | 16:37 |
fungi | done | 16:38 |
AJaeger | thanks | 16:38 |
clarkb | fungi: yup and looking at executor tcp connections there aren't a ton of ssh connections | 16:39 |
*** whoami-rajat has joined #openstack-infra | 16:39 | |
fungi | okay, so on the ensure-twine problem... i have to assume that the virtualenv it's talking about in the error is the one zuul is managing for ansible... if so, that's going to be outside bwrap and mapped in read-only so a regular pip install without --user won't work, right? | 16:40 |
clarkb | fungi: yes | 16:40 |
amoralej | clarkb, it seems there has been another peak right 10 minutes ago | 16:43 |
clarkb | amoralej: I wonder if that has to do with job runtimes and our use of ssh control persist | 16:43 |
*** ginopc has quit IRC | 16:44 | |
fungi | most recent pip release was over a month ago, most recent virtualenv release was nearly a month ago, so those don't really seem to line up with when we started seeing the ensure-twine failures | 16:45 |
clarkb | fungi: virtualenv updates its pip on install now. possible it is just pip and not virtualenv? | 16:45 |
*** mattw4 has joined #openstack-infra | 16:45 | |
*** rossella_s has quit IRC | 16:45 | |
fungi | well, neither released new versions around the time we started to see the problem, which was after the stein release | 16:46 |
*** tosky has quit IRC | 16:46 | |
*** udesale has quit IRC | 16:47 | |
clarkb | I have 3 hosts up in rax-iad now; all show zero packet loss to various executors. They all ended up in the same /24 though (so if it's a network range problem this may not expose it) | 16:48 |
*** eharney has quit IRC | 16:48 | |
amoralej | clarkb, for the reviews i've been closely monitoring, failing jobs are long running ones, but it's hard to say if it's related to long jobs or just that you're more likely to hit network issues in long jobs | 16:49 |
clarkb | amoralej: ya we use ssh control persistence to reduce the number of connections that have to made too. If there is network trouble it is possible that longer jobs are more likely to run into it given the mitigations we already have | 16:50 |
openstackgerrit | Merged openstack/project-config master: Revert "Temporarily disable inap-mtl01 for maintenance" https://review.opendev.org/653018 | 16:50 |
*** altlogbot_2 has quit IRC | 16:50 | |
mordred | clarkb, amoralej: I have anecdotally observed the same thing - the issue seems to be most seen while running a long-running single shell task | 16:50 |
mordred | but the plural of anecdote isn't data - so I don't know if that's a real thing or just what I happen to have observed | 16:51 |
amoralej | mordred, yes, that's also my case | 16:51 |
openstackgerrit | Jason Lee proposed opendev/storyboard master: WIP: Second correction to Loader in preparation for Writer Update https://review.opendev.org/654812 | 16:52 |
*** jpena is now known as jpena|off | 16:52 | |
fungi | well, also longer-running jobs are simply statistically more likely to fall victim to a random network problem | 16:52 |
mordred | fungi: yes, this is a very accurate statement | 16:52 |
clarkb | also nodepool checks ssh connectivity before giving the node to a job | 16:52 |
mordred | and within those jobs, long-running single tasks are statistically more likely to be the thing that hits it | 16:52 |
fungi | yep | 16:53 |
clarkb | so we know that networking works well enough for that to be successful before zuul gets the node. I am going to let my test nodes sit around for a bit as a result and see if they look worse in an hour | 16:53 |
clarkb | we may also want to sanity check nodepool isn't getting duplicate IPs | 16:53 |
clarkb | maybe we are our own noisy neighbor type situation | 16:53 |
*** altlogbot_3 has joined #openstack-infra | 16:55 | |
clarkb | as a spot check: we do recycle ip addrs, but not during overlapping time periods (if arp was not updating properly we might see this behavior) | 16:56 |
fungi | i guess our job logs don't actually say where the ansible they're running is installed on the executor? at least i can't seem to find that information. also the docs for zuul-manage-ansible don't say how it installs ansible... the versioned trees under /var/lib/zuul/ansible/ on our executors don't look like virtualenvs either | 16:56 |
*** derekh has quit IRC | 16:57 | |
clarkb | I need to step out for breakfast I'll be back in a bit to look into this networking stuff more | 16:58 |
fungi | digging into the AnsibleManager class definition now | 16:58 |
fungi | aha, https://zuul-ci.org/docs/zuul/admin/components.html#attr-executor.ansible_root | 17:00 |
corvus | fungi: the debug log says it's running /usr/lib/zuul/ansible/2.7/bin/ansible-playbook | 17:00 |
fungi | oh, i bet that bindir is somehow mapped into the bwrap context | 17:02 |
fungi | no, nevermind | 17:02 |
fungi | /usr/lib not /var/lib | 17:02 |
corvus | here's a full example command: http://paste.openstack.org/show/749657/ | 17:02 |
fungi | i guess <zuul_install_dir> is /usr in our case | 17:03 |
corvus | fungi: i think the ansible venv is in /usr/lib and the zuul modules (which are also versioned) are in /var/lib | 17:03 |
fungi | fhs tunnel vision, i don't typically expect anything besides the system package manager to add things in /usr, thanks! | 17:04 |
fungi | i kept looking at that and my brain was automatically substituting /var | 17:04 |
corvus | yeah, maybe we should change that | 17:05 |
fungi | not super critical, just me with distro-oriented blinders on | 17:05 |
*** jbadiapa has joined #openstack-infra | 17:05 | |
fungi | was trying to run this down from first principles and validate our assumptions about how/where it's trying to install twine | 17:06 |
fungi | so on ze01 the ansible venvs are all using python 3.5.2 and pip 19.0.3 | 17:08 |
fungi | latest version from february 20 | 17:09 |
fungi | ahh, right, i can't calendar. the latest pip and virtualenv versions are from two months ago, not a month ago, so even less correlated to the start of these failures | 17:09 |
fungi | we're now in april | 17:10 |
*** nicolasbock has quit IRC | 17:10 | |
fungi | so looks like the ansible venvs on ze01 were created on march 18, still much longer ago than ensure-twine started popping this error | 17:10 |
fungi | same creation timestamp on all 12 executors | 17:12 |
fungi | and definitely no twine installed in any of them right now | 17:14 |
*** gagehugo has joined #openstack-infra | 17:15 | |
fungi | no twine executable in the default system path for any of the executors either | 17:15 |
fungi | also as previously established, the last change in git for the ensure-twine role merged january 29 | 17:17 |
*** ijw has quit IRC | 17:18 | |
*** ijw has joined #openstack-infra | 17:19 | |
*** ijw has quit IRC | 17:20 | |
*** e0ne has joined #openstack-infra | 17:20 | |
*** ijw has joined #openstack-infra | 17:20 | |
fungi | we're calling ensure-twine from the release-python playbook in opendev/base-jobs which was fixed to add that role on april 3 | 17:20 |
*** rpittau is now known as rpittau|afk | 17:20 | |
*** Weifan has joined #openstack-infra | 17:21 | |
fungi | my notes say the first recorded case of this particular failure was 2019-04-17 in http://logs.openstack.org/19/19a7574237f44807b16c37e0983223ff57340ba3/release/release-openstack-python/769f856/ | 17:22 |
*** Weifan has quit IRC | 17:22 | |
fungi | so roughly 6 days ago | 17:23 |
*** Weifan has joined #openstack-infra | 17:23 | |
*** ijw has quit IRC | 17:23 | |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Add retries to getPullReviews() with github https://review.opendev.org/655204 | 17:23 |
clarkb | I can still ssh into my three test nodes in iad after leaving them be for a while | 17:23 |
*** ijw has joined #openstack-infra | 17:23 | |
*** e0ne has quit IRC | 17:24 | |
fungi | clarkb: i wonder, can you start up a nc on both ends and connect them to each other with no traffic for a while, then see if they get disconnected (or simply stop passing traffic)? | 17:24 |
clarkb | could be worth testing. I'm rotating out one of the three to see if a new one immediately after a delete has any interesting behavior (since that is what nodepool does) | 17:25 |
fungi | yeah, wondering if the failures we see couldn't be some stateful network device losing its sh^Htates | 17:26 |
fungi | or aggressively dropping inactive ones | 17:27 |
clarkb | ya | 17:27 |
clarkb | (at some point I really need to start putting together a project update and figuring out a summit schedule) | 17:27 |
clarkb | (so please don't assume I should be the only one debugging this stuff :) ) | 17:27 |
fungi | also, one thing which can cause this... packet shapers. i'm going to look and see if there is rate limiting evidence in the cacti graphs for our executors | 17:28 |
clarkb | ++ thanks | 17:28 |
*** psachin has quit IRC | 17:28 | |
clarkb | oh and we have a meeting in an hour and a half and the ptg to plan for | 17:28 |
fungi | meh, "priorities" ;) | 17:28 |
*** diablo_rojo has joined #openstack-infra | 17:29 | |
clarkb | fungi: this is where you say "board meetings are for writing project updates" ? | 17:29 |
clarkb | btw I also checked dmesg on ze01 for any evidence of say OOMKiller and found nothing | 17:30 |
clarkb | and syslog lacks complaints from ssh | 17:30 |
*** jamesmcarthur has quit IRC | 17:30 | |
corvus | i'm working on fixing our local patch to gitea | 17:30 |
fungi | cacti says ze01 is running pretty tight on available memory, but i suppose that's our ram governor at work. the others are almost certainly similar | 17:30 |
*** jamesmcarthur has joined #openstack-infra | 17:31 | |
corvus | it's not easy because every time i try to run the unit tests, my printer starts spewing garbage | 17:31 |
fungi | hah | 17:31 |
clarkb | corvus: at least it isn't on fire? | 17:31 |
fungi | i hope you don't run out of greenbar | 17:31 |
corvus | yeah.. maybe don't print the randomly generated binary test data to stdout? | 17:32 |
fungi | yeesh | 17:32 |
clarkb | what are the chances this is an ssh/ansible issue? | 17:32 |
clarkb | (just wondering if we need to explore that too) | 17:33 |
fungi | cacti seems to only occasionally be able to reach ze02, but this doesn't look like new behavior: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64158&rra_id=all | 17:33 |
corvus | there have been ansible releases and we should be auto-upgrading them | 17:33 |
clarkb | 2.7.latest is what we should be using right? | 17:33 |
clarkb | by default at least | 17:33 |
clarkb | 2.7 updated on april 20 according to timestamps on files | 17:34 |
clarkb | but last 2.7 release was april 4 | 17:35 |
*** jamesmcarthur has quit IRC | 17:35 | |
clarkb | we did manage to merge an openstack manuals change so maybe not all is lost :) | 17:36 |
fungi | oh, that was just the documentation update which says "all is lost" | 17:37 |
fungi | i'm up to ze06 so far... no obvious signs of network rate-limiting or anything else especially anomalous on the graphs which might coincide with these ssh problems | 17:38 |
fungi | cacti seems to have collected no data whatsoever on ze10 | 17:41 |
clarkb | at this point I've sent several thousand mtr tracer pings and not a single one was lost between iad and 3 executors over ipv4 | 17:42 |
fungi | so aside from the fact that cacti can't reliably reach ze02 and ze10 over ipv6 (other servers where we saw this, i think deleting and recreating the network port got it working?) i see nothing on the cacti graphs for our executors which would explain the ssh connection issues | 17:44 |
*** ccamacho has quit IRC | 17:54 | |
clarkb | I've just ssh'd into every ipv4 host connected to from ze01 | 17:54 |
clarkb | and they all worked | 17:54 |
*** bobh has quit IRC | 17:55 | |
openstackgerrit | Merged opendev/puppet-openstack_infra_spec_helper master: Block hashdiff 0.3.9 and bundler 2.0 https://review.opendev.org/655143 | 17:55 |
jrosser | fwiw the log I grabbed here was an ipv6 fail http://paste.openstack.org/show/749635/ | 17:55 |
clarkb | woo I can recheck the gitea git stack fix now | 17:55 |
fungi | i wonder, should we set some autoholds and then when one catches a node which fell victim to this behavior check connectivity and/or nova console? | 17:55 |
clarkb | jrosser: thanks. In this case I didn't check ipv6 because no ipv6 at home | 17:56 |
*** amoralej is now known as amoralej|off | 17:56 | |
clarkb | but ya I think amoralej dug up some limestone failures that were similar and also ipv6 | 17:56 |
clarkb | fungi: ya that might be more efficient than me trying to manually boot one that fails | 17:56 |
fungi | i wonder how broad of an autohold i can add | 17:56 |
clarkb | I'm going to delete my three test instances in iad now since they haven't shown me anything useful | 17:58 |
clarkb | corvus: the docker registry backed by swift change merged yesterday iirc | 18:00 |
clarkb | corvus: is that something we should follow up and check on? | 18:00 |
corvus | clarkb: i suspect we will need to restart it to make it go into effect, and it would be good to do that in conjunction with watching some jobs | 18:02 |
mordred | corvus, clarkb: I can take that - y'all seem to be having fun diagnosing network issues :) | 18:05 |
corvus | mordred: well, i'm mostly working on the gitea change | 18:05 |
corvus | but yes, also "fun" | 18:05 |
mordred | corvus, clarkb: I can take that - y'all seem to be having fun diagnosing network issues and that gitea change :) | 18:05 |
clarkb | mordred: thanks | 18:05 |
clarkb | I need to switch gears into prepping for the meeting in just a moment so not sure how much network debugging I'll be doing for a bit | 18:06 |
mnaser | maybe a little silly but perhaps someone shooting off an email to rax about if there is any network changes might be productive | 18:06 |
mnaser | maybe there's some firewall or network appliance that was recently setup which affects our type of workloads | 18:06 |
clarkb | cloudnull: ^ you about? | 18:07 |
mnaser | just an extra useful datapoint | 18:07 |
corvus | i *finally* have gotten the gitea sqlite integration tests (the ones that are failing on my pr) to pass and run on master. it turns out the procedure for running them is not the one in the docs, or in integrations/README.md, but rather, is only documented in the drone.yml config file. | 18:11 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Blackhole spam for airship-discuss-owner address https://review.opendev.org/655227 | 18:12 |
clarkb | http://paste.openstack.org/show/749660/ is a specific example I dug up ansible logs for | 18:16 |
clarkb | any idea if the complaint about the inventory not being in the desired format is potentially related? | 18:17 |
clarkb | like maybe we cannot connect because we broke the inventory somehow? | 18:17 |
*** ricolin has quit IRC | 18:17 | |
clarkb | that was an ipv6 host in sjc1 too fwiw | 18:19 |
openstackgerrit | Monty Taylor proposed opendev/base-jobs master: Update opendev intermediate registry secret https://review.opendev.org/655228 | 18:19 |
clarkb | I think the next steps are fungi's hold idea and filing a ticket/sending email to rax. I've got to pop out and do some stuff before the meeting but I guess we'll pick back up there | 18:20 |
mordred | corvus: ^^ we need to do that per-tenant, right? | 18:20 |
fungi | yeah, i'm just to the point of fiddling with autohold now | 18:20 |
mordred | corvus: or just the once in opendev/base-jobs is fine | 18:20 |
fungi | i don't suppose there are any particular projects/jobs/changes which will be better choices for an autohold than others | 18:21 |
clarkb | fungi: jobs that take longer to run | 18:21 |
mordred | clarkb: https://review.opendev.org/655228 is needed for the registry stuff - gotta rekey on the client side too | 18:21 |
clarkb | maybe tempest-full, a tripleo job, and an OSA job? | 18:21 |
clarkb | mordred: k | 18:21 |
fungi | i suppose i can push up some trivial dnm changes and set autoholds for some long-running jobs which will run against them | 18:22 |
clarkb | also we may not hold on network failures? | 18:22 |
clarkb | the example above that I pasted is being rerun aiui | 18:22 |
clarkb | because ansible reported it as a network failure to zuul | 16:23 |
fungi | oh, yeah at best it'll hold the last build which ends in retry_limit i guess? | 18:23 |
clarkb | ya | 18:24 |
clarkb | assuming it actually gets there and doesn't magically work on the third attempt | 18:24 |
openstackgerrit | Nicolas Hicher proposed openstack/diskimage-builder master: openssh-server: enforce sshd config https://review.opendev.org/653890 | 18:24 |
clarkb | ok really need to pop out for a bit now. Back soon | 18:24 |
mordred | corvus, fungi: if you get a sec, https://review.opendev.org/#/c/655228/ | 18:26 |
*** e0ne has joined #openstack-infra | 18:29 | |
*** nicolasbock has joined #openstack-infra | 18:31 | |
*** eharney has joined #openstack-infra | 18:39 | |
clarkb | mnaser: bce3129d-8458-4947-b567-2c41311aab6a is the nova uuid of the node above that failed in sjc1. Might be worth sanity checking it to make sure that it didn't crash qemu/kvm (perhaps related to image updates or something) | 18:42 |
*** jamesmcarthur has joined #openstack-infra | 18:44 | |
*** jamesmcarthur_ has joined #openstack-infra | 18:45 | |
*** happyhemant has quit IRC | 18:48 | |
*** jamesmcarthur has quit IRC | 18:49 | |
*** smarcet has quit IRC | 18:53 | |
corvus | fungi, clarkb: yeah, you can autohold all failures if you want. retry_limit will trigger autohold, but only on the last. | 18:56 |
corvus | that doesn't seem to be a problem though. | 18:56 |
corvus | there's a change in review to allow you to specify the result states for an autohold. :/ | 18:56 |
clarkb | corvus: I think the issue is we'll only be able to hold it if it gets to retry_limit | 18:56 |
clarkb | which is maybe good enough | 18:56 |
corvus | clarkb: right. but plenty of jobs are doing that :) | 18:56 |
*** ykarel|away has quit IRC | 18:57 | |
fungi | cool, these are the autoholds i added: http://paste.openstack.org/show/749665/ | 18:57 |
corvus | we will also get the network failures that happen in run and post-run playbooks, though those will be harder to triage out from regular failures | 18:57 |
fungi | will check periodically to see what we catch in the trap and throw back any which aren't keepers | 18:57 |
fungi | it's eerily like crabbing | 18:58 |
clarkb | especially when you throw back 99% of what you get | 18:58 |
*** Weifan has quit IRC | 18:58 | |
fungi | yeah | 18:58 |
clarkb | infra meeting in a few minutes | 18:58 |
mordred | fungi: i don't think the keepers in this case will be very tasty | 18:58 |
clarkb | join us in #openstack-meeting | 18:58 |
corvus | you only want jobs larger than a certain size | 18:59 |
fungi | though it looks like my third zuul autohold command isn't returning control to the terminal | 18:59 |
fungi | and it's been a few minutes | 18:59 |
fungi | maybe reconfigure in progress | 18:59 |
corvus | fungi: yeah, be patient | 18:59 |
corvus | well.. hcm | 19:00 |
fungi | "crabbing suspended: ocean reloading, please wait" | 19:00 |
corvus | it... could be that the scheduler is out of ram | 19:00 |
fungi | oof | 19:00 |
corvus | apparently there is a memory leak as of our april 16 restart | 19:01 |
*** e0ne has quit IRC | 19:01 | |
mwhahaha | any particular reason why centos7 jobs are RETRY_LIMITing? | 19:02 |
mwhahaha | see https://review.opendev.org/#/c/654648/ (only centos7 jobs did) | 19:02 |
corvus | i will restart scheduler now | 19:02 |
clarkb | mwhahaha: it is affecting all jobs | 19:02 |
clarkb | mwhahaha: we've been trying to sort it out for much of the morning | 19:02 |
mwhahaha | k | 19:02 |
corvus | help appreciated | 19:02 |
mwhahaha | it didn't seem to affect the bionic jobs on that one | 19:03 |
mwhahaha | weird | 19:03 |
corvus | clarkb, fungi: what do you think about taking the opportunity to restart all executors? | 19:03 |
fungi | wow, yeah we've been swapping on the scheduler since ~12:30z today | 19:03 |
fungi | corvus: seems like a good idea | 19:03 |
clarkb | corvus: maybe even reboot them if it is possible ssh issues are related to the system | 19:03 |
fungi | yes, exactly my thoughs | 19:03 |
corvus | they have been running for a long time -- just in case some cruft has accumulated | 19:03 |
fungi | thoughts | 19:04 |
fungi | should we use the full restart playbook for this? | 19:04 |
corvus | fungi: almost -- if we want to reboot that's an extra step | 19:05 |
corvus | so i'll just do it manually | 19:05 |
fungi | ahh, yeah i missed clarkb's use of the word "reboot" | 19:05 |
fungi | status notice the zuul scheduler is being restarted now in order to address a memory utilization problem; changes under test will be reenqueued automatically | 19:07 |
fungi | that look sufficient? | 19:07 |
corvus | fungi: ++ | 19:07 |
clarkb | fungi: ++ | 19:07 |
fungi | #status notice the zuul scheduler is being restarted now in order to address a memory utilization problem; changes under test will be reenqueued automatically | 19:07 |
openstackstatus | fungi: sending notice | 19:07 |
openstackgerrit | Nate Johnston proposed openstack/project-config master: Track neutron uwsgi jobs move to check queue https://review.opendev.org/655234 | 19:08 |
-openstackstatus- NOTICE: the zuul scheduler is being restarted now in order to address a memory utilization problem; changes under test will be reenqueued automatically | 19:08 | |
openstackstatus | fungi: finished sending notice | 19:10 |
corvus | still waiting on execs to stop | 19:12 |
corvus | all stopped | 19:14 |
corvus | i will reboot all mergers and executors | 19:14 |
fungi | thanks | 19:15 |
*** kjackal has joined #openstack-infra | 19:16 | |
smcginnis | clarkb, fungi, corvus: Sorry, I had to step out for awhile. Do we still need an update to ensure-twine to check whether to install with --user or not? I saw a comment that I think was saying it might not help, but I wasn't really sure. | 19:18 |
corvus | executors and mergers are up and running | 19:19 |
corvus | restarting sched now | 19:19 |
fungi | smcginnis: indeterminate. i went back to the drawing board trying to confirm how things could have been working previously and working up a timeline of what we know changed when | 19:21 |
fungi | because it's still baffling | 19:21 |
smcginnis | Yeah, very baffling. | 19:22 |
smcginnis | I would feel much better if we understood what changed that caused this. That first one that worked after a reenqueue was odd. | 19:22 |
corvus | er, neat. something killed the scheduler | 19:24 |
mordred | corvus: "awesome" | 19:24 |
Shrews | wow | 19:26 |
corvus | we've seen this once before, also never found out what it was | 19:26 |
corvus | trying again | 19:26 |
*** nicolasbock has quit IRC | 19:27 | |
*** nicolasbock has joined #openstack-infra | 19:27 | |
*** wehde has joined #openstack-infra | 19:29 | |
wehde | Can anyone help me figure out a neutron issue? | 19:29 |
*** igordc has joined #openstack-infra | 19:29 | |
*** jamesmcarthur_ has quit IRC | 19:32 | |
corvus | loaded | 19:32 |
corvus | re-enqueueing | 19:32 |
corvus | #status log restarted all of Zuul at commit 6afa22c9949bbe769de8e54fd27bc0aad14298bc due to memory leak | 19:32 |
openstackstatus | corvus: finished logging | 19:32 |
*** Weifan has joined #openstack-infra | 19:34 | |
*** smarcet has joined #openstack-infra | 19:37 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Use internal gitweb instead of gitea for now https://review.opendev.org/655238 | 19:37 |
*** Weifan has quit IRC | 19:39 | |
paladox | Ah you use logging too :) (though your bot appears to have "status" too). | 19:41 |
*** kjackal has quit IRC | 19:42 | |
fungi | and we have it send notices to numerous irc channels, and in extreme cases also update channel topics about ongoing situations | 19:42 |
fungi | and all the entries get recorded at https://wiki.openstack.org/wiki/Infrastructure_Status (for the moment anyway) | 19:42 |
paladox | That's nice! (that would be useful) | 19:43 |
paladox | ours gets logged to multiple places | 19:43 |
fungi | i think ours also tries to tweet things, but i'm not sure where since i'm not really into social media | 19:43 |
paladox | heh (i know ours does :)) | 19:43 |
fungi | there was talk of an rss/atom feed as well | 19:43 |
paladox | someone logged spam. | 19:43 |
mordred | fungi, paladox: https://twitter.com/openstackinfra | 19:44 |
fungi | ahh, there it is | 19:45 |
paladox | The bot can log here https://wikitech.wikimedia.org/wiki/Nova_Resource:<project>/SAL (if it's a WMCS project) otherwise things get logged here https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 19:45 |
paladox | ah | 19:45 |
fungi | corvus: clarkb: should i readd my earlier autoholds, or do we want to just watch the system for a bit first and see if the problem resurfaces? | 19:47 |
mordred | smarcet: o hai. I told summit.openstack.org to sync with my google calendar, but I don't have any summit sessions on my calendar. you have all the magical fixing powers right? | 19:47 |
fungi | worst bug report evar | 19:48 |
corvus | fungi: good q, and i'm too hungry to come up with an answer | 19:48 |
clarkb | if we can remove autoholds after the fact without them triggering I say add them | 19:49 |
clarkb | otherwise maybe watch and see | 19:49 |
fungi | yeah, about to find food as soon as the infra meeting is over | 19:49 |
fungi | clarkb: oh, the scheduler restart took care of removing the autoholds for us, which is why i asked ;) | 19:49 |
clarkb | ah | 19:49 |
paladox | fungi this is ours https://twitter.com/wikimediatech | 19:50 |
openstackgerrit | Sean McGinnis proposed zuul/zuul-jobs master: ensure-twine: Don't install --user if running in venv https://review.opendev.org/655241 | 19:56 |
smcginnis | fungi, clarkb, corvus: Newbie yet, so would appreciate feedback on that approach. ^ | 19:56 |
clarkb | corvus: want to direct enqueue 654634? | 19:57 |
openstackgerrit | Sean McGinnis proposed zuul/zuul-jobs master: ensure-twine: Don't install --user if running in venv https://review.opendev.org/655241 | 19:57 |
* clarkb smells the curry that was made for lunch and wanders downstairs | 19:58 | |
fungi | i have shrimp risotto to get to | 19:58 |
fungi | smcginnis: i *suspect* the problem for us is going to be that the virtualenv from which ansible is run is read-only for the jobs, so they're not going to be able to pip install anything into it | 19:59 |
mordred | smarcet: nevermind. I don't know how to calendar apparently | 20:00 |
smarcet | mordred: actually there are 2 ways of doing it | 20:01 |
corvus | fungi, smcginnis: that's an avenue to explore, however, i'm not sure the ansible virtualenv will be "activated" for anything other than the ansible process... | 20:01 |
smarcet | mordred: allow oauth2 permission to your calendar using the synching button and choose google as provider | 20:02 |
fungi | smcginnis: probably we either need to have ansible invoke pip install --user under the system python (not exactly sure what the complexities of that are) or have it create a local virtualenv ni the workspace and pip install twine into that | 20:02 |
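A rough sketch of fungi's second option (install twine into a throwaway venv in the job's workspace rather than touching the read-only ansible venv); the twine_venv_path variable is hypothetical and would need to point somewhere writable inside the build's work dir:

```yaml
# Hypothetical workspace-venv variant: Ansible's pip module can create the
# venv itself and install into it, sidestepping both --user and the
# read-only ansible virtualenv.
- name: Install twine into a throwaway venv in the workspace
  pip:
    name:
      - "twine!=1.12.0"
      - "readme_renderer[md]!=23.0"
      - "requests-toolbelt!=0.9.0"
    virtualenv: "{{ twine_venv_path }}"
    virtualenv_command: "{{ twine_python | default('python3') }} -m venv"
```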
smcginnis | corvus: I believe that will still pick up if we are running via a virtualenv python, even if the whole environment is not activated. | 20:02 |
smarcet | mordred: or you could use the brand new option “ | 20:03 |
smarcet | GET SHAREABLE CALENDAR LINK | 20:03 |
smarcet | ” | 20:03 |
corvus | smcginnis: ansible is being run in a virtualenv -- what ansible then runs is an open question | 20:03 |
smarcet | mordred: from page https://www.openstack.org/summit/denver-2019/summit-schedule | 20:03 |
smcginnis | Based on the failure, it would appear what gets run at least picks up that python executable. | 20:03 |
corvus | smcginnis: do you have a link that shows that? | 20:03 |
smcginnis | fungi: If that's true, we should drop the ensure-twine role completely as it likely will never work right. | 20:03 |
corvus | all i've seen is the opaque error about user in a virtualenv | 20:04 |
smcginnis | corvus: That was the conjecture^whypothesis earlier as to why it is failing the pip install. It's not running in a virtualenv itself. | 20:04 |
corvus | right, i'm saying i have doubts about that hypothesis and we should attempt to prove or disprove it rather than assume it is correct | 20:05 |
diablo_rojo | clarkb, sorry I got distracted during the meeting, nothing new with storyboard-- still have a lot of patches to review. Planning a huge story triage/overhaul at the PTG Thursday morning. That's about it. | 20:05 |
fungi | agreed, it's also possible pip is confused and thinks it's being run from a virtualenv when it isn't | 20:05 |
pabelanger | corvus: smcginnis: I believe, if ansible is using localhost, it will look to be inside a virtualenv for playbook tasks; however, if it uses ssh via localhost, it won't. | 20:06 |
smcginnis | All I know is, releases are completely blocked until this issue is resolved. | 20:06 |
corvus | anyway, i think the next step is for someone to write a job which exercises this stuff and gets some debug output | 20:07 |
mordred | ++ | 20:07 |
*** Weifan has joined #openstack-infra | 20:07 | |
*** jamesmcarthur has joined #openstack-infra | 20:08 | |
*** igordc has quit IRC | 20:10 | |
fungi | it's entirely possible, since "twine_python" is a variable we're passing to the role, that its value started being set to 'python3' recently and that's what started triggering this behavior? looking at the json, it's running this command: `python3 -m pip install twine!=1.12.0 readme_renderer[md]!=23.0 requests-toolbelt!=0.9.0 --user` | 20:11 |
fungi | the default declared for twine_python in the role is just "python" not "python3" | 20:12 |
*** Weifan has quit IRC | 20:12 | |
*** jamesmcarthur has quit IRC | 20:12 | |
fungi | the release-python playbook in opendev/base-jobs doesn't set it | 20:13 |
fungi | nor does the opendev-release-python job in the same repo | 20:14 |
*** Weifan has joined #openstack-infra | 20:14 | |
*** igordc has joined #openstack-infra | 20:14 | |
*** igordc has quit IRC | 20:15 | |
*** Lucas_Gray has joined #openstack-infra | 20:16 | |
fungi | aha, release-openstack-python as defined in openstack/project-config sets it according to http://zuul.opendev.org/t/openstack/job/release-openstack-python | 20:16 |
fungi | https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L112-L129 | 20:18 |
fungi | its path to the ensure-twine role, for the record, is via https://opendev.org/openstack/project-config/src/branch/master/playbooks/publish/pypi.yaml | 20:19 |
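For context, the relevant bit of the job definition fungi is pointing at boils down to a single job variable; roughly the following shape, with the real definition living at the project-config link above:

```yaml
# Rough shape only -- see the linked openstack/project-config zuul.d/jobs.yaml
# for the actual release-openstack-python definition.
- job:
    name: release-openstack-python
    vars:
      twine_python: python3
```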
clarkb | fwiw the last ansible exitcode 4 was at 19:08UTC on ze01 | 20:22 |
fungi | git blame suggests the twine_python variable was set in that job as of november when https://review.opendev.org/616676 merged | 20:22 |
fungi | so that's not what has caused this | 20:23 |
smcginnis | Since that first solum patch failed, but then worked after a reenqueue, is there something that changed in the nodes that could explain why one (presumably newer) node would fail, but another would work as it has been until now? And as more nodes were updated the failure became more prevalent? | 20:24 |
fungi | there are no nodes in this case, the ensure-twine role is running on the executor ("localhost" in the inventory) | 20:26 |
smcginnis | Executors updated? | 20:26 |
fungi | so been trying to figure out what could have changed on the executors on or around the 17th | 20:26 |
*** pcaruana has quit IRC | 20:32 | |
openstackgerrit | Merged opendev/system-config master: Double stack size on gitea https://review.opendev.org/654634 | 20:33 |
clarkb | corvus: ^ finally | 20:34 |
*** kgiusti has left #openstack-infra | 20:34 | |
clarkb | I think we are about half an hour from that applying | 20:34 |
*** jamesmcarthur has joined #openstack-infra | 20:39 | |
*** jamesmcarthur has quit IRC | 20:46 | |
mordred | woot | 20:50 |
mordred | clarkb, fungi: https://review.opendev.org/#/c/655238/ | 20:55 |
clarkb | hrm seems like we may still have some ssh failures just not as many of them? | 20:55 |
clarkb | mordred: left a comment | 20:56 |
clarkb | ze01 has three occurences of ansible exit code 4 in the last few minutes | 20:57 |
*** jamesmcarthur has joined #openstack-infra | 20:57 | |
*** Goneri has quit IRC | 20:59 | |
clarkb | fungi: ^ if you haven't set up the autohold yet you may want to | 20:59 |
corvus | gitea has an "archived" setting for repos | 20:59 |
*** andreykurilin has joined #openstack-infra | 21:00 | |
diablo_rojo | clarkb, fungi Monday is marketplace mixer, Tuesday is Trillio Community Party, Thursday is game night, Friday is PTG happy hour. So Monday after the mixer would work for the Lowry | 21:00 |
clarkb | diablo_rojo: ya that is what I'm thinking and the current weather forecast should be reasonable for that | 21:01 |
*** jamesmcarthur has quit IRC | 21:01 | |
diablo_rojo | I'd be game for that. | 21:01 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Use internal gitweb instead of gitea for now https://review.opendev.org/655238 | 21:03 |
mordred | clarkb: ^^ does that look better? | 21:03 |
mordred | corvus: neat. that seems like a thing we should make use of when appropriate | 21:03 |
fungi | clarkb: i've readded the previous autoholds | 21:03 |
mordred | clarkb: do you know of any planned GoT viewing parties Sunday evening? | 21:04 |
clarkb | mordred: it does: <% if scope.lookupvar("gerrit::web_repo_url") -%> | 21:04 |
clarkb | mordred: which may still fire on the '' | 21:04 |
mordred | clarkb: what's the right way to set that so that it doesn't? false? | 21:04 |
clarkb | mordred: yes I think that is the right way to manipulate the ruby | 21:05 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Use internal gitweb instead of gitea for now https://review.opendev.org/655238 | 21:05 |
openstackgerrit | Merged opendev/storyboard-webclient master: Show tags with stories in project view. https://review.opendev.org/642230 | 21:05 |
*** jamesmcarthur has joined #openstack-infra | 21:06 | |
mordred | clarkb: also - I verified that puppet-gerrit installs the gitweb package if gitweb is true | 21:07 |
clarkb | gitea just updated I think | 21:08 |
clarkb | openstack/openstack works now | 21:08 |
fungi | victory! | 21:08 |
*** boden has quit IRC | 21:12 | |
mordred | clarkb: I'm not 100% prepared to agree with you | 21:13 |
mordred | oh wait - there it is | 21:13 |
clarkb | it isn't quick. I think adding tags every so many commits would help mitigate that | 21:16 |
clarkb | since that name-rev lookup is going back hundreds of thousands of commits | 21:16 |
clarkb | and then doing it for each commit that a file is most recent on | 21:16 |
*** igordc has joined #openstack-infra | 21:19 | |
*** Lucas_Gray has quit IRC | 21:20 | |
*** rh-jelabarre has quit IRC | 21:21 | |
clarkb | mordred: for this networking stuff. One thing I notice is that it seems to happen a lot after what looks like a timeout | 21:22 |
clarkb | which would lend credence to mnaser's suggestion it could be a new firewall or traffic shaper | 21:22 |
clarkb | mordred: can we tell ansible to tell ssh to do ping pongs back and forth every minute or so? | 21:22 |
openstackgerrit | Jason Lee proposed opendev/storyboard master: WIP: BlueprintWriter prototype, attempting bugfixes https://review.opendev.org/654812 | 21:22 |
paladox | btw you may want to beware of gerrit 2.15.12, it apparently has some type of problem that is currently causing an outage for us. | 21:22 |
*** rfolco has quit IRC | 21:24 | |
clarkb | hrm we already set ServerAliveInterval to 60 | 21:24 |
clarkb | which should mean every 60 seconds ping pong | 21:24 |
clarkb | (if you didn't get data otherwise) | 21:24 |
clarkb | and default ServerAliveCountMax is 15 | 21:26 |
clarkb | which means after about 15 minutes we should disconnect | 21:26 |
openstackgerrit | Merged opendev/base-jobs master: Update opendev intermediate registry secret https://review.opendev.org/655228 | 21:26 |
clarkb | er sorry it is 3 | 21:27 |
clarkb | I misread the numbers in the manpage | 21:27 |
clarkb | so 3 minutes | 21:27 |
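So the window being described works out to ServerAliveInterval (60s) times ServerAliveCountMax (3), roughly three minutes of silence before ssh gives up. If that needed tuning from the Ansible side, one place the options can go is the standard ansible_ssh_common_args variable; the values below are illustrative, not a proposed change:

```yaml
# Hypothetical inventory/group_vars entry: probe every 60s and allow 3
# unanswered probes (~180s) before the connection is declared dead.
ansible_ssh_common_args: "-o ServerAliveInterval=60 -o ServerAliveCountMax=3"
```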
jamesmcarthur | Hi clarkb: Trying to log into the wiki from https://governance.openstack.org/tc/reference/opens.html throws me back to openstack.org | 21:27 |
jamesmcarthur | Related to this recent migration or something else? | 21:27 |
jrosser | git clone error http://logs.openstack.org/74/652574/3/gate/openstack-ansible-deploy-aio_metal-debian-stable/ecb0b7c/job-output.txt.gz#_2019-04-23_21_09_27_768335 | 21:27 |
clarkb | jamesmcarthur: shouldn't be related to the migration. We didn't touch the wiki | 21:28 |
*** tosky has joined #openstack-infra | 21:28 | |
clarkb | jrosser: ya we've been trying to figure out persistent connectivity issues between zuul and test nodes all day | 21:28 |
clarkb | jamesmcarthur: where is the wiki login from there? | 21:29 |
jamesmcarthur | seems to be something else going on | 21:29 |
jamesmcarthur | I'm already logged in | 21:30 |
jamesmcarthur | I'll open a ticket on our end and see if I can figure it out :) | 21:30 |
*** smarcet has quit IRC | 21:35 | |
openstackgerrit | Merged opendev/storyboard-webclient master: Show all stories created and allows them to filter according to status https://review.opendev.org/642370 | 21:37 |
*** whoami-rajat has quit IRC | 21:40 | |
*** tjgresha_nope has quit IRC | 21:41 | |
*** tjgresha has joined #openstack-infra | 21:43 | |
smcginnis | jamesmcarthur: That's not the wiki. | 21:43 |
smcginnis | jamesmcarthur: https://governance.openstack.org/tc/reference/opens.html is sphinx generated content. | 21:44 |
smcginnis | But I can confirm clicking on Log In from there throws back to / | 21:44 |
smcginnis | Not sure what logging in there is supposed to do. | 21:44 |
openstackgerrit | Merged opendev/system-config master: Install socat on zuul executors https://review.opendev.org/654577 | 21:44 |
smcginnis | jamesmcarthur: You would need to submit a patch for https://opendev.org/openstack/governance/src/branch/master/reference/opens.rst if you are trying to update that page. | 21:46 |
mordred | smcginnis, jamesmcarthur: yeah a) not sure what logging in to the four opens page is intended to accomplish - but also, https://www.openstack.org/Security/login/?BackURL=/home/ ... the BackURL is /home/ - which is unlikely to ever be correct :) | 21:46 |
mordred | that also was supposed to be b) | 21:47 |
smcginnis | :) | 21:47 |
clarkb | I'm going to take a break now since I feel like I'm just spinning wheels with the networking stuff. It seems slightly better and if fungi can catch one maybe we can debug (and then possibly file a bug with $cloud) | 21:48 |
mordred | clarkb: ++ | 21:48 |
mordred | jamesmcarthur: https://opendev.org/openstack/openstackdocstheme/src/branch/master/openstackdocstheme/theme/openstackdocs/header.html | 21:49 |
mordred | that's where the header is coming from | 21:49 |
mordred | and https://opendev.org/openstack/openstackdocstheme/src/branch/master/openstackdocstheme/theme/openstackdocs/header.html#L110 | 21:49 |
mordred | is where the login link is coming from - with the nicely hard-coded /home/ as the BackURL | 21:50 |
mordred | jamesmcarthur, smcginnis: since logging in to openstack docs isn't really a thing, maybe we should just remove the login link from openstackdocstheme? | 21:50 |
jamesmcarthur | ah ha | 21:50 |
smcginnis | I wonder if there is somewhere that it is actually used. | 21:50 |
mordred | otherwise I think we'd want to replace /home/ there with some javascript or something that sets an appropriate BackURL | 21:50 |
jamesmcarthur | mordred: that's.. kind of an excellent point | 21:51 |
smcginnis | It might need some sort of conditional display. | 21:51 |
mordred | smcginnis: I'm guessing the html got lifted from somewhere else | 21:51 |
smcginnis | Could likely be | 21:51 |
mordred | but there are zero times when we'll need a login page on published static docs | 21:51 |
smcginnis | I do really wish gitea had a Blame button. | 21:51 |
jamesmcarthur | we provide a little javascript include with the openstack menu so that everyone that's using it can stay up to date | 21:51 |
jamesmcarthur | but it's definitely not applicable to docs | 21:51 |
fungi | okay, sustenance has been consumed and i have 8 minutes to catch up before my next conference call | 21:51 |
jamesmcarthur | lol | 21:52 |
mordred | jamesmcarthur: yeah. that said - if we DID want to fix the login link, just for consistency, that seems fine | 21:52 |
openstackgerrit | Merged opendev/system-config master: Add script to automate GitHub organization transfers https://review.opendev.org/644937 | 21:52 |
mordred | jamesmcarthur: and that way the javascript include would work and it would look integrated and whatnot :) | 21:52 |
mordred | but - you know - I leave all of that to your very capable hands :) | 21:52 |
smcginnis | mordred, jamesmcarthur: Looks like it was intentionally added since it is the first thing mentioned in the commit message: https://github.com/openstack/openstackdocstheme/commit/d31e4ded8941a69b36de413f1bcf56c91bece779 | 21:53 |
mordred | smcginnis: weird. but also - good to know | 21:54 |
*** jcoufal has quit IRC | 21:55 | |
*** jamesmcarthur has quit IRC | 21:55 | |
* mordred needs to AFK | 21:56 | |
fungi | if asettle is awake already, maybe she remembers the reasons there? she was the one who approved that addition | 21:56 |
fungi | er, s/approved/committed/ | 21:56 |
*** jamesmcarthur has joined #openstack-infra | 21:58 | |
*** jcoufal has joined #openstack-infra | 21:58 | |
fungi | AJaeger was the one to approve it | 21:58 |
fungi | three years ago yesterday in fact | 21:59 |
*** imacdonn has quit IRC | 22:01 | |
*** ijw has quit IRC | 22:01 | |
*** imacdonn has joined #openstack-infra | 22:01 | |
jamesmcarthur | Yeah... it was done to try to solidify the various implementations of the openstack header. | 22:02 |
jamesmcarthur | Clearly worth a revisit :) | 22:02 |
corvus | smcginnis: wish granted. merged 4 days ago, probably will be in 1.9.0: https://github.com/go-gitea/gitea/pull/5721 | 22:08 |
mriedem | i just noticed this in a non-voting job in stable/pike but it's also in queens, looks like legacy jobs are now failing because of incorrect or missing required-projects on devstack-gate? | 22:12 |
mriedem | http://logs.openstack.org/98/640198/2/check/nova-grenade-live-migration/370efe9/job-output.txt.gz#_2019-04-22_13_16_53_233772 | 22:12 |
mriedem | is that a known issue? | 22:12 |
mriedem | seems to be after the opendev rename | 22:12 |
*** slaweq has quit IRC | 22:12 | |
mriedem | https://review.opendev.org/#/c/640198/2/.zuul.yaml@38 | 22:13 |
mriedem | not sure if we need to change that in stable branch job defs now? | 22:13 |
mriedem | i guess that's what this was for... https://github.com/openstack/nova/commit/fc3890667e4971e3f0f35ac921c2a6c25f72adec | 22:14 |
*** slaweq has joined #openstack-infra | 22:14 | |
*** jamesmcarthur has quit IRC | 22:15 | |
corvus | mriedem: that change was approved despite the fact that the job it added failed when it ran (since the job is non-voting, it's not gating) | 22:20 |
mriedem | yeah i realize that | 22:21 |
mriedem | i'm wondering if i need to fix this devstack-gate thing on stable branches, or are there redirects in place? | 22:21 |
corvus | mriedem: it needs to be fixed | 22:21 |
mriedem | ok | 22:21 |
mriedem | ah i see there were migration patches per branch, so this isn't as bad as i thought it'd be | 22:27 |
*** amoralej|off has quit IRC | 22:27 | |
mriedem | just the one job that missed it | 22:27 |
*** hwoarang has quit IRC | 22:32 | |
*** hwoarang has joined #openstack-infra | 22:34 | |
*** tonyb has joined #openstack-infra | 22:41 | |
*** wehde has quit IRC | 22:47 | |
*** tkajinam has joined #openstack-infra | 22:55 | |
*** kranthikirang has quit IRC | 22:56 | |
Weifan | Our repo has been moved from the openstack namespace to x, and our project is no longer published to pypi automatically after pushing a new tag. | 23:00 |
Weifan | Does that mean we should replicate the repo from opendev to github? Or does anyone know how we can set it up so it is published based on tags on opendev? | 23:00 |
Weifan | Or is it suggested that we set up our own jobs to publish it? | 23:01 |
*** jcoufal has quit IRC | 23:10 | |
*** aaronsheffield has quit IRC | 23:11 | |
*** hwoarang has quit IRC | 23:13 | |
*** tosky has quit IRC | 23:13 | |
*** yamamoto has joined #openstack-infra | 23:14 | |
*** hwoarang has joined #openstack-infra | 23:14 | |
*** diablo_rojo has quit IRC | 23:14 | |
*** jcoufal has joined #openstack-infra | 23:16 | |
*** gmann is now known as gmann_afk | 23:17 | |
*** yamamoto has quit IRC | 23:18 | |
clarkb | pypi and github are independent | 23:19 |
clarkb | the tag jobs should push to pypi regardless of github | 23:19 |
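What clarkb describes is the tag-driven release pipeline. A minimal sketch, assuming the standard publish-to-pypi project template from openstack-zuul-jobs, of how a project attaches it so that pushing a tag to Gerrit triggers the PyPI upload jobs, independent of any GitHub mirroring:

```yaml
# Minimal sketch: attaching the publish-to-pypi template (template
# name assumed from the shared openstack-zuul-jobs definitions; the
# authoritative attachment for most projects lives in project-config).
- project:
    templates:
      - publish-to-pypi
```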
*** diablo_rojo has joined #openstack-infra | 23:19 | |
Weifan | it has been 1 day, and it is still not updated on pypi | 23:19 |
Weifan | but the tag can be found on opendev | 23:19 |
clarkb | there is a builds tab on https://zuul.openstack.org; can you search for your pypi jobs there? | 23:20 |
clarkb | it probably failed for some reason | 23:20 |
*** yamamoto has joined #openstack-infra | 23:20 | |
*** rcernin has joined #openstack-infra | 23:20 | |
*** hwoarang has quit IRC | 23:22 | |
*** hwoarang has joined #openstack-infra | 23:23 | |
Weifan | any suggestions on how to find the job? I can't seem to find it... it was for https://opendev.org/x/networking-bigswitch | 23:23 |
clarkb | let me see | 23:24 |
clarkb | https://zuul.openstack.org/build/e8686ce24e04408aaef4f34c99bd7f27 | 23:26 |
*** lseki has quit IRC | 23:26 | |
openstackgerrit | Jason Lee proposed opendev/storyboard master: WIP: BlueprintWriter prototype, additional bugfixes https://review.opendev.org/654812 | 23:27 |
clarkb | that may be the twine issue? | 23:27 |
clarkb | I'm not in a good spot to debug that as I am on a phone | 23:27 |
Weifan | looks like the ansible task failed | 23:28 |
Weifan | seems like all of them are failing right now, not just our project | 23:32 |
*** igordc has quit IRC | 23:32 | |
fungi | yes, we're still trying to work out the cause. it started happening roughly a week ago, so well before the opendev migration | 23:32 |
*** jcoufal has quit IRC | 23:33 | |
fungi | though it was intermittent until today-ish | 23:33 |
Weifan | ok, thanks for the information :) | 23:34 |
fungi | Weifan: yep, same error in your log too... "Can not perform a '--user' install. User site-packages are not visible in this virtualenv." | 23:35 |
fungi | we'll reenqueue that tag object once we work out the fix, so no need to push a new tag for that | 23:36 |
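The message fungi quotes is pip's generic refusal to perform a --user install when the interpreter it runs under is a virtualenv. A hypothetical Ansible task of the shape that would trip it (not the actual opendev role, just an illustration of the failure mode) looks like this:

```yaml
# Hypothetical illustration: pip rejects "--user" whenever the target
# python lives inside a virtualenv, so a task like this fails with the
# quoted error and would need to drop --user (or point at a
# non-virtualenv interpreter) instead.
- name: Install twine for the release upload
  pip:
    name: twine
    extra_args: --user   # rejected inside a virtualenv
    executable: pip3     # assumption; whichever pip the job actually uses
```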
*** gyee has quit IRC | 23:39 | |
Weifan | would it be related to python3? release-openstack-python seems to have python3 as "release_python", but the job was on queens | 23:39 |
Weifan | which probably uses py2 | 23:39 |
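For reference, the variable Weifan mentions would sit on the job definition roughly as below; only the release_python value comes from the discussion above, the surrounding structure is an assumption:

```yaml
# Sketch of where a job-level interpreter choice like this lives;
# the job name and variable are taken from the log, the rest is assumed.
- job:
    name: release-openstack-python
    vars:
      release_python: python3
```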
*** rlandy|ruck has quit IRC | 23:40 | |
*** yamamoto has quit IRC | 23:41 | |
*** yamamoto has joined #openstack-infra | 23:42 | |
*** yamamoto has quit IRC | 23:43 | |
fungi | i haven't been able to find a correlation by interpreter. the release-openstack-python job has been set to python3 since november | 23:44 |
*** hwoarang has quit IRC | 23:49 | |
*** hwoarang has joined #openstack-infra | 23:51 | |
*** mattw4 has quit IRC | 23:59 |