*** xek__ has joined #openstack-infra | 00:00 | |
*** xek_ has quit IRC | 00:02 | |
*** jamesmcarthur has joined #openstack-infra | 00:03 | |
fungi | does seem to be going a bit more than twice slower | 00:03 |
*** mattw4 has quit IRC | 00:03 | |
*** jamesmcarthur has quit IRC | 00:06 | |
*** jamesmcarthur has joined #openstack-infra | 00:06 | |
clarkb | fungi: seems like we are about 14-15 minutes in and it has 2100 ish tasks | 00:08 |
clarkb | so about the same speed? | 00:08 |
fungi | yeah, maybe | 00:08 |
*** slaweq has joined #openstack-infra | 00:11 | |
*** weifan has joined #openstack-infra | 00:13 | |
*** slaweq has quit IRC | 00:15 | |
*** weifan has quit IRC | 00:17 | |
*** rosmaita has quit IRC | 00:19 | |
*** gyee has quit IRC | 00:19 | |
fungi | nearly done | 00:20 |
*** yolanda has quit IRC | 00:20 | |
fungi | so ~2x14min | 00:20 |
clarkb | ya, you'd expect it to be quicker than that | 00:20 |
clarkb | but at least it hasn't gone slower than the single case | 00:20 |
fungi | so far | 00:20 |
clarkb | should I do 05, 07, 08 next? | 00:20 |
fungi | but yeah, replicating two in parallel took roughly as long as replicating one | 00:20 |
fungi | has 01 been done yet? | 00:21 |
clarkb | yes 01 06 03 04 are done | 00:21 |
fungi | oh, right, 01 went first then 06 | 00:21 |
clarkb | if we do 05 07 08 together we can see if it is 3x14 minutes | 00:22 |
fungi | and we're not doing 02 because it's already removed to replace | 00:22 |
fungi | so, yep, sounds good | 00:22 |
clarkb | fungi: ya, though maybe you should double check that haproxy has 02 pulled as we expect? | 00:22 |
clarkb | doing 05 07 08 now | 00:22 |
fungi | checking now, sure | 00:22 |
clarkb | 6.3k tasks now | 00:23 |
clarkb | I'm going to start on that curry now | 00:23 |
*** weifan has joined #openstack-infra | 00:25 | |
fungi | yeah, 02 is no longer in the pools according to "show stat" | 00:25 |
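As a point of reference, a minimal sketch of checking "show stat" programmatically rather than eyeballing it; the admin socket path and the "gitea02" server name below are assumptions, not the actual deployment details:

```python
#!/usr/bin/env python3
# Hedged sketch: confirm a backend (e.g. gitea02) is absent from the haproxy
# pools by asking the admin socket for "show stat". The socket path and the
# backend name are assumptions; adjust for the actual deployment.
import csv
import io
import socket

SOCKET_PATH = "/var/run/haproxy.sock"  # assumed admin socket location
BACKEND_SUBSTRING = "gitea02"          # server name we expect to be gone

def show_stat(path=SOCKET_PATH):
    """Return the raw CSV output of haproxy's "show stat" command."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    sock.sendall(b"show stat\n")
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    sock.close()
    return b"".join(chunks).decode()

def main():
    raw = show_stat()
    # The first line starts with "# " followed by the CSV header row.
    reader = csv.DictReader(io.StringIO(raw.lstrip("# ")))
    matches = [row for row in reader if BACKEND_SUBSTRING in row.get("svname", "")]
    if matches:
        for row in matches:
            print(row["pxname"], row["svname"], row["status"])
    else:
        print("no pool members matching", BACKEND_SUBSTRING)

if __name__ == "__main__":
    main()
```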
fungi | enjoy curry! | 00:25 |
fungi | i'm monitoring the times for the gerrit replication queue with timestamps, so should know within a minute when it's caught up | 00:29 |
fungi | you know, for science | 00:30 |
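A rough sketch of that kind of timestamped queue watching, assuming SSH access to the Gerrit SSH API on port 29418 and that `gerrit show-queue` is available to the calling account; the task-line parsing is a guess at the output format rather than a documented interface:

```python
#!/usr/bin/env python3
# Hedged sketch of watching the Gerrit task queue with timestamps, roughly
# the "for science" monitoring described above. Host and polling interval
# are placeholders; the line filter below is a heuristic, not a contract.
import datetime
import subprocess
import time

GERRIT_HOST = "review.opendev.org"  # placeholder host
INTERVAL = 60                        # seconds between polls

def queue_length():
    """Count task lines reported by `gerrit show-queue`."""
    out = subprocess.run(
        ["ssh", "-p", "29418", GERRIT_HOST, "gerrit", "show-queue"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Skip header, separator and footer lines; count rows that look like tasks.
    return sum(1 for line in out.splitlines()
               if line and not line.startswith(("Task", "---", " ")))

while True:
    now = datetime.datetime.utcnow().isoformat()
    print(now, queue_length(), "tasks", flush=True)
    time.sleep(INTERVAL)
```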
*** weifan has quit IRC | 00:30 | |
*** rosmaita has joined #openstack-infra | 00:32 | |
*** igordc has quit IRC | 00:45 | |
ianw | anyone know how to encode the date range in the url of the kibana query so if you send it to someone they see the same range? | 00:47 |
fungi | no clue, sorry | 00:51 |
fungi | i've not been able to figure out how to direct-link kibana queries for that matter | 00:51 |
*** betherly has joined #openstack-infra | 00:51 | |
*** yamamoto has joined #openstack-infra | 00:52 | |
logan- | ?from=5d | 00:52 |
logan- | there is a "share" button at the top that will link to the query (2nd from the right) | 00:52 |
ianw | logan-: yeah, that share link doesn't seem to add in the date range if you've selected a custom one | 00:53 |
logan- | nope, it doesn't | 00:53 |
logan- | you have to add the from parameter manually :/ | 00:53 |
ianw | what if it's a range? | 00:54 |
logan- | not sure | 00:54 |
logan- | sorry, i only know how to specify the start time | 00:54 |
logan- | half way there at least :P | 00:54 |
ianw | maybe ?from=..?to=... ... ? | 00:55 |
fungi | worth a try | 00:55 |
*** ricolin has joined #openstack-infra | 00:56 | |
*** betherly has quit IRC | 00:56 | |
clarkb | sorry I don't know | 00:58 |
ianw | http://logstash.openstack.org/#/dashboard/file/logstash.json?from=2019-07-25T16:39:00&to=2019-07-25T17:00:00&query=message:%5C%22HTTP%20Error%20404%5C%22%20AND%20node_provider:%20rax-iad | 00:58 |
ianw | does not appear to work | 00:58 |
ianw | it seems like this is done by default in kibana 4, but for 3 might be out of luck :/ | 00:59 |
ianw | (date range as part of shared url i mean) | 00:59 |
clarkb | ya but 4 requires write access to elasticsearch | 01:01 |
clarkb | which is why we never upgraded | 01:02 |
fungi | last batch nearly done | 01:02 |
clarkb | fungi: about on time | 01:02 |
fungi | yeah | 01:02 |
fungi | and done | 01:02 |
clarkb | I wonder where we degrade to an hour per and if it is related to load like you suggested | 01:03 |
fungi | so... 40 minutes? | 01:03 |
fungi | right | 01:03 |
openstackgerrit | Joshua Hesketh proposed opendev/system-config master: Toggle CI should also hide old zuul comments https://review.opendev.org/671436 | 01:09 |
*** betherly has joined #openstack-infra | 01:13 | |
rm_work | err, is there a good channel to ask someone about some Shanghai visa specifics? nothing to do with infra really, but you folks are the most connected I know :D | 01:14 |
*** betherly has quit IRC | 01:17 | |
clarkb | rm_work: there is an invite letter form to fill out on the summit site. Other than that it has been suggested to me to use a handling company | 01:17 |
rm_work | yeah i think we have one internally we use | 01:18 |
rm_work | following the email directions seemed to indicate i had to sign up for the summit first but i think i misread pre-coffee and it's fine | 01:18 |
rm_work | (to get the invite letter) | 01:19 |
ianw | clarkb / auristor: correlating everything (as clarkb had done anyway) it seems very likely that apache can think files aren't on disk during releases even with 5.3-rc1 afs-next branch; see -> http://lists.infradead.org/pipermail/linux-afs/2019-July/003122.html | 01:19 |
rm_work | ahh no, it does, the form says your order ID is required | 01:22 |
rm_work | so ... if we are waiting for speaker codes... we just have to keep waiting before we can get our visa thing? | 01:23 |
clarkb | I guess? I was also told the visa is relatively quick | 01:24 |
auristor | ianw: is there FileAuditLog data for [Thu Jul 25 16:39:18 2019] kAFS: Volume 536870968 'mirror.epel' is offline ? | 01:24 |
ianw | auristor: no unfortunately, it's turned off atm. i can go back and re-enable it like last time now we have these new changes to test | 01:25 |
rm_work | i've been told to allow 2 months for the visa process, lol | 01:26 |
rm_work | but I guess we do have a bit | 01:26 |
*** diablo_rojo has quit IRC | 01:27 | |
ianw | auristor: i can do that in a bit. the tar.gz i provided before was sufficient right? just replicate that? | 01:27 |
*** jamesmcarthur has quit IRC | 01:30 | |
auristor | ianw: yes, the same contents as the last time would be great | 01:32 |
*** betherly has joined #openstack-infra | 01:33 | |
auristor | ianw: openafs, unlike kafs, will upon receiving a VBUSY or VOFFLINE error sleep 15 seconds and then retry the request up to 100 times. at the moment, kafs will immediately failover to the other locations but will then fail the syscall. | 01:35 |
auristor | ianw: I would like to confirm from the FileAuditLog entries whether or not the failover is taking place | 01:36 |
ianw | ok. i'm not sure if there's a way to make apache a bit more verbose too about what it is seeing on disk | 01:38 |
*** betherly has quit IRC | 01:38 | |
auristor | I think the translation of VOFFLINE to ENOENT is wrong | 01:41 |
*** betherly has joined #openstack-infra | 01:54 | |
*** betherly has quit IRC | 01:59 | |
auristor | fs/afs/misc.c afs_abort_to_error() should not convert VOFFLINE to ENOENT but to ENODEV because ENOENT will cause a negative lookup to be cached resulting in a missing file error. | 02:03 |
*** tinwood has quit IRC | 02:10 | |
*** slaweq has joined #openstack-infra | 02:11 | |
*** tinwood has joined #openstack-infra | 02:12 | |
*** betherly has joined #openstack-infra | 02:15 | |
*** slaweq has quit IRC | 02:16 | |
*** betherly has quit IRC | 02:20 | |
*** bobh has joined #openstack-infra | 02:22 | |
*** whoami-rajat has joined #openstack-infra | 02:26 | |
*** bobh has quit IRC | 02:36 | |
*** yamamoto has quit IRC | 02:36 | |
*** factor has joined #openstack-infra | 02:38 | |
*** yamamoto has joined #openstack-infra | 02:50 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: AFS audit logging: helper script https://review.opendev.org/672847 | 02:54 |
ianw | auristor: logging enabled; for infra-root ^ should be helpful so there's less magic involved | 02:54 |
*** betherly has joined #openstack-infra | 02:57 | |
prometheanfire | ianw: do_extra_package_install does not include the hooks mount, is this intentional? | 02:58 |
auristor | ianw; thanks | 02:59 |
*** jamesmcarthur has joined #openstack-infra | 02:59 | |
ianw | prometheanfire: ahhh, i have no idea :) i guess it's just always been like that | 02:59 |
prometheanfire | ok, may need to unify that behavior then :D | 02:59 |
prometheanfire | I guess I could just have it run portageq itself, since it will always be in the chroot | 02:59 |
prometheanfire | ya, will just do that | 02:59 |
*** betherly has quit IRC | 03:01 | |
*** EmilienM|pto is now known as EmilienM | 03:01 | |
*** HenryG has quit IRC | 03:04 | |
*** slaweq has joined #openstack-infra | 03:11 | |
*** bhavikdbavishi has joined #openstack-infra | 03:13 | |
*** rh-jelabarre has quit IRC | 03:14 | |
*** slaweq has quit IRC | 03:15 | |
*** psachin has joined #openstack-infra | 03:22 | |
*** psachin has quit IRC | 03:23 | |
*** psachin has joined #openstack-infra | 03:26 | |
*** betherly has joined #openstack-infra | 03:28 | |
*** michael-beaver has quit IRC | 03:32 | |
*** ykarel|away has joined #openstack-infra | 03:32 | |
*** betherly has quit IRC | 03:33 | |
*** gregoryo has joined #openstack-infra | 03:35 | |
*** diablo_rojo has joined #openstack-infra | 03:43 | |
*** bhavikdbavishi1 has joined #openstack-infra | 03:46 | |
prometheanfire | doesn't do the caching either | 03:47 |
*** bhavikdbavishi has quit IRC | 03:48 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:48 | |
*** betherly has joined #openstack-infra | 03:48 | |
*** betherly has quit IRC | 03:53 | |
*** HenryG has joined #openstack-infra | 03:54 | |
*** udesale has joined #openstack-infra | 03:57 | |
*** betherly has joined #openstack-infra | 04:09 | |
*** slaweq has joined #openstack-infra | 04:11 | |
*** ramishra has joined #openstack-infra | 04:12 | |
*** betherly has quit IRC | 04:14 | |
*** slaweq has quit IRC | 04:15 | |
*** apetrich has quit IRC | 04:20 | |
*** betherly has joined #openstack-infra | 04:30 | |
*** jamesmcarthur has quit IRC | 04:32 | |
*** betherly has quit IRC | 04:35 | |
*** ykarel|away has quit IRC | 04:40 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 04:41 |
*** yamamoto_ has joined #openstack-infra | 04:46 | |
*** yamamoto has quit IRC | 04:50 | |
*** diablo_rojo has quit IRC | 04:56 | |
*** betherly has joined #openstack-infra | 05:01 | |
*** jbadiapa has quit IRC | 05:02 | |
*** ykarel|away has joined #openstack-infra | 05:02 | |
*** betherly has quit IRC | 05:06 | |
*** armax has quit IRC | 05:16 | |
*** jaosorior has joined #openstack-infra | 05:21 | |
*** ramishra has quit IRC | 05:27 | |
*** ramishra has joined #openstack-infra | 05:38 | |
*** rtjure has joined #openstack-infra | 05:42 | |
*** yamamoto_ has quit IRC | 05:43 | |
*** zbr has joined #openstack-infra | 05:44 | |
*** kjackal has quit IRC | 05:44 | |
*** jamesmcarthur has joined #openstack-infra | 05:45 | |
*** yamamoto has joined #openstack-infra | 05:56 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 05:57 |
*** rcernin has quit IRC | 06:08 | |
*** slaweq has joined #openstack-infra | 06:11 | |
*** odicha has joined #openstack-infra | 06:14 | |
*** apetrich has joined #openstack-infra | 06:15 | |
*** slaweq has quit IRC | 06:16 | |
*** gregoryo has quit IRC | 06:17 | |
*** yamamoto_ has joined #openstack-infra | 06:19 | |
*** yamamoto has quit IRC | 06:21 | |
*** jbadiapa has joined #openstack-infra | 06:24 | |
*** dpawlik has joined #openstack-infra | 06:25 | |
*** Lucas_Gray has quit IRC | 06:26 | |
*** jamesmcarthur has quit IRC | 06:33 | |
*** ramishra has quit IRC | 06:34 | |
*** iurygregory has joined #openstack-infra | 06:34 | |
*** ykarel|away is now known as ykarel | 06:36 | |
*** joeguo has quit IRC | 06:44 | |
*** kjackal has joined #openstack-infra | 06:45 | |
*** rlandy has joined #openstack-infra | 06:47 | |
*** kjackal has quit IRC | 06:48 | |
*** jpena|off is now known as jpena | 06:51 | |
*** jpena is now known as jpena|mtg | 06:51 | |
*** raukadah is now known as chandankumar | 06:51 | |
*** Vadmacs has joined #openstack-infra | 06:59 | |
*** kjackal has joined #openstack-infra | 07:00 | |
*** tesseract has joined #openstack-infra | 07:03 | |
*** slaweq has joined #openstack-infra | 07:07 | |
*** ginopc has joined #openstack-infra | 07:10 | |
*** ykarel is now known as ykarel|lunch | 07:26 | |
*** bhavikdbavishi has quit IRC | 07:26 | |
*** kopecmartin|off is now known as kopecmartin | 07:26 | |
*** cshen has joined #openstack-infra | 07:32 | |
*** tosky has joined #openstack-infra | 07:34 | |
*** dchen has quit IRC | 07:39 | |
*** cshen has quit IRC | 07:43 | |
*** rpittau|afk is now known as rpittau | 07:44 | |
*** pcaruana has joined #openstack-infra | 07:44 | |
*** Goneri has joined #openstack-infra | 07:45 | |
*** ramishra has joined #openstack-infra | 07:53 | |
*** bhavikdbavishi has joined #openstack-infra | 07:54 | |
*** dtantsur|afk is now known as dtantsur | 07:55 | |
*** ralonsoh has joined #openstack-infra | 07:56 | |
*** cshen has joined #openstack-infra | 07:56 | |
*** ramishra has quit IRC | 07:57 | |
*** ramishra has joined #openstack-infra | 07:57 | |
*** lucasagomes has joined #openstack-infra | 08:03 | |
*** yamamoto_ has quit IRC | 08:05 | |
*** yamamoto has joined #openstack-infra | 08:13 | |
*** ricolin has quit IRC | 08:19 | |
*** pkopec has joined #openstack-infra | 08:23 | |
*** siqbal has joined #openstack-infra | 08:29 | |
*** rosmaita has quit IRC | 08:30 | |
*** rosmaita has joined #openstack-infra | 08:34 | |
*** joeguo has joined #openstack-infra | 08:41 | |
*** bhavikdbavishi has quit IRC | 08:47 | |
*** ykarel|lunch is now known as ykarel | 08:49 | |
*** bhavikdbavishi has joined #openstack-infra | 08:56 | |
*** e0ne has joined #openstack-infra | 09:16 | |
*** bhavikdbavishi has quit IRC | 09:17 | |
*** derekh has joined #openstack-infra | 09:23 | |
*** siqbal has quit IRC | 09:31 | |
*** siqbal has joined #openstack-infra | 09:31 | |
*** arxcruz is now known as arxcruz|off | 09:32 | |
*** cshen has left #openstack-infra | 09:33 | |
*** jbadiapa has quit IRC | 09:39 | |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Spec for allowing circular dependencies https://review.opendev.org/643309 | 09:49 |
*** priteau has joined #openstack-infra | 09:53 | |
*** rlandy has quit IRC | 10:03 | |
*** dpawlik has quit IRC | 10:08 | |
*** yamamoto has quit IRC | 10:10 | |
*** yamamoto has joined #openstack-infra | 10:14 | |
*** yamamoto has quit IRC | 10:15 | |
*** dpawlik has joined #openstack-infra | 10:26 | |
*** tdasilva has quit IRC | 10:29 | |
*** psachin has quit IRC | 10:38 | |
*** udesale has quit IRC | 10:44 | |
*** udesale has joined #openstack-infra | 10:45 | |
*** yamamoto has joined #openstack-infra | 10:45 | |
*** siqbal has quit IRC | 10:47 | |
*** kjackal has quit IRC | 10:48 | |
*** dpawlik has quit IRC | 11:00 | |
*** tdasilva has joined #openstack-infra | 11:02 | |
*** psachin has joined #openstack-infra | 11:02 | |
*** dpawlik has joined #openstack-infra | 11:06 | |
*** jaosorior has quit IRC | 11:15 | |
*** ramishra has quit IRC | 11:19 | |
*** ramishra has joined #openstack-infra | 11:20 | |
*** roman_g has quit IRC | 11:25 | |
*** roman_g has joined #openstack-infra | 11:26 | |
*** kjackal has joined #openstack-infra | 11:26 | |
*** EmilienM has quit IRC | 11:27 | |
*** EmilienM has joined #openstack-infra | 11:28 | |
*** jbadiapa has joined #openstack-infra | 11:37 | |
*** larainema has joined #openstack-infra | 11:51 | |
*** yamamoto has quit IRC | 11:51 | |
*** irclogbot_2 has quit IRC | 11:53 | |
*** yamamoto has joined #openstack-infra | 11:54 | |
*** irclogbot_1 has joined #openstack-infra | 11:54 | |
*** rh-jelabarre has joined #openstack-infra | 11:59 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well https://review.opendev.org/672273 | 12:05 |
*** joeguo has quit IRC | 12:06 | |
*** bhavikdbavishi has joined #openstack-infra | 12:07 | |
*** hwoarang has quit IRC | 12:08 | |
*** jbadiapa has quit IRC | 12:08 | |
*** jbadiapa has joined #openstack-infra | 12:08 | |
*** bhavikdbavishi1 has joined #openstack-infra | 12:09 | |
*** bhavikdbavishi has quit IRC | 12:11 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 12:11 | |
*** hwoarang has joined #openstack-infra | 12:13 | |
*** derekh has quit IRC | 12:24 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: manager: specify report failure in logs https://review.opendev.org/671760 | 12:31 |
*** rascasoft has quit IRC | 12:32 | |
*** Goneri has quit IRC | 12:34 | |
*** rascasoft has joined #openstack-infra | 12:34 | |
*** yamamoto has quit IRC | 12:34 | |
*** derekh has joined #openstack-infra | 12:36 | |
*** dpawlik has quit IRC | 12:37 | |
*** yamamoto has joined #openstack-infra | 12:38 | |
*** dpawlik has joined #openstack-infra | 12:43 | |
*** jpena|mtg is now known as jpena|off | 12:47 | |
*** mriedem has joined #openstack-infra | 12:53 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well https://review.opendev.org/672273 | 12:55 |
*** yamamoto has quit IRC | 12:57 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Builds page - Fix bad labels display https://review.opendev.org/672973 | 12:59 |
*** bhavikdbavishi has quit IRC | 12:59 | |
*** xek__ has quit IRC | 13:01 | |
*** xek__ has joined #openstack-infra | 13:02 | |
donnyd | Last night I finally got together something to gather metrics for FN. It looks like the hypervisors are underutilized, so I am going to turn things back up a bit. | 13:06 |
donnyd | CPU utilization for the most part sits around 20%, and I am thinking 40-50% would make more use of my equipment. So with that, I turned it back up to 60 | 13:07 |
donnyd | I will watch over the weekend to see what it does... I am still.. waiting... on... parts.. for the new storage (fans this time) and when I go to put it in place there will be hard downtime. So I will need to roll it back to zero, but there should be plenty of advance notice. | 13:08 |
donnyd | Still getting timeouts, but about 1/3 as many as rax-ord (probably because they have 4x the instances) | 13:10 |
donnyd | Seems to be the same jobs timing out everywhere, so I am pretty sure it's not from the infra | 13:10 |
*** b3nt_pin is now known as beagles | 13:15 | |
*** bhavikdbavishi has joined #openstack-infra | 13:18 | |
*** jpena|off is now known as jpena | 13:19 | |
*** ykarel has quit IRC | 13:19 | |
*** ekultails has joined #openstack-infra | 13:20 | |
*** aaronsheffield has joined #openstack-infra | 13:28 | |
*** smcginnis has joined #openstack-infra | 13:29 | |
*** Goneri has joined #openstack-infra | 13:29 | |
*** yamamoto has joined #openstack-infra | 13:31 | |
*** goldyfruit has joined #openstack-infra | 13:31 | |
*** bhavikdbavishi has quit IRC | 13:35 | |
*** pkopec has quit IRC | 13:36 | |
*** pkopec has joined #openstack-infra | 13:37 | |
*** michael-beaver has joined #openstack-infra | 13:38 | |
*** jpena is now known as jpena|off | 13:38 | |
fungi | donnyd: there are also guaranteed to be at least some jobs whose runtimes have crept up close to their timeout values and so minor variances cause them to overrun the allowed duration | 13:40 |
fungi | usually easy to tell by looking at the success runs, though the durations for them are a click away from http://zuul.opendev.org/t/openstack/builds | 13:42 |
fungi | i wonder if including a duration column there would be useful | 13:42 |
fungi | and then there are jobs which have nondeterministic/race condition issues causing some process to hang, so the success runs are well under their timeouts but then sometimes they timeout inexplicably because the job stops indefinitely halfway through | 13:43 |
donnyd | To me it would be extremely useful. A large part of this project is to figure out what it takes to make a CI system go as fast as possible | 13:44 |
*** yamamoto has quit IRC | 13:45 | |
*** xek__ has quit IRC | 13:46 | |
*** xek__ has joined #openstack-infra | 13:47 | |
*** yamamoto has joined #openstack-infra | 13:48 | |
mordred | fungi: if we look in to adding that duration column, we should keep in mind there are some jobs that will show a long duration because they were paused (docker registry, for instance) - so we should account for that somehow, or maybe have a total duration and a total active duration or something | 13:51 |
donnyd | I would think any metric on runtime would be a great place to start. Is there anything I could do to speed up the container-based builds? they seem to time out more than others | 13:52 |
*** liuyulong has joined #openstack-infra | 13:55 | |
*** FlorianFa has quit IRC | 13:58 | |
*** eharney has joined #openstack-infra | 14:02 | |
AJaeger | config-core, puppet-crane is updated now - want to help retiring it, please? https://review.opendev.org/#/c/671268/ | 14:05 |
AJaeger | thanks, mordred | 14:08 |
AJaeger | config-core, and one review to rename a job in grafana, please - https://review.opendev.org/672290 | 14:09 |
*** ykarel has joined #openstack-infra | 14:12 | |
*** dpawlik has quit IRC | 14:16 | |
*** psachin has quit IRC | 14:22 | |
*** odicha has quit IRC | 14:22 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add react-lazylog package https://review.opendev.org/672988 | 14:23 |
fungi | mordred: great point | 14:24 |
fungi | donnyd: have an example? | 14:24 |
*** bhavikdbavishi has joined #openstack-infra | 14:25 | |
*** bnemec is now known as beekneemech | 14:26 | |
*** Goneri has quit IRC | 14:27 | |
*** xek__ has quit IRC | 14:28 | |
*** xek__ has joined #openstack-infra | 14:29 | |
donnyd | Well from a provider perspective I'm concerned with job start/finish times. I am pretty sure we already gather the data. But I can really only track instance on-to-off on my end | 14:30 |
*** smrcascao has joined #openstack-infra | 14:31 | |
clarkb | we track that for every job in graphite | 14:31 |
fungi | donnyd: oh, i meant an example build for something containery which was slow | 14:31 |
donnyd | Oh, yea | 14:31 |
fungi | but sure, makes sense | 14:31 |
clarkb | every job also has its own timeout value though which isn't in graphite and I think what we really want is proximity of duration to timeout | 14:31 |
fungi | i'm mostly just curious if some of these jobs are failing to use mirrors and whatnot | 14:31 |
clarkb | fungi: ++ | 14:31 |
openstackgerrit | Merged openstack/project-config master: Remove puppet-crane https://review.opendev.org/671268 | 14:32 |
donnyd | Well for the containery thing, maybe a proxy that can cache container images in memory or very fast disk | 14:32 |
clarkb | fungi: for gitea backend replacements do we have a change to pull more of them out of inventory? or do we want to go ahead with 02 for now? | 14:32 |
donnyd | But I have no real ideas what it would take without digging in | 14:32 |
clarkb | donnyd: that is what our mirror node does | 14:32 |
donnyd | Does it do that for containers | 14:32 |
clarkb | donnyd: yup | 14:33 |
clarkb | donnyd: as long as you request the images through mirror.regionone.fortnebula.opendev.org instead of hub.docker.com directly | 14:33 |
fungi | clarkb: i was mostly thinking of doing more in parallel because syncing took so long, but last night's syncs were fast, so one at a time is likely fine | 14:34 |
donnyd | Oh... well then the job timeouts have to be more related to the actual job config than the service provider... Without a massive CPU / Memory upgrade, I cannot make things turn any faster.. and from what I could tell the workload seems to be IO bound anyways | 14:34 |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: try lazylog https://review.opendev.org/672991 | 14:34 |
donnyd | clarkb: I guess we would have to look at the failing jobs to see where they are getting their bits from | 14:35 |
*** ricolin has joined #openstack-infra | 14:36 | |
fungi | clarkb: anyway, should be safe to delete 02 so i'm doing that now and will then boot the replacement | 14:37 |
openstackgerrit | Graham Hayes proposed zuul/nodepool master: Implement an Azure driver https://review.opendev.org/554432 | 14:39 |
clarkb | donnyd: ya that is why fungi was looking for examples | 14:39 |
clarkb | fungi: gotcha. fwiw you don't have to delete the old one first if you don't want to (though with 8 total backends deleting first should also be totally fine) | 14:40 |
clarkb | fungi: don't forget to look for leaked volume after server delete | 14:40 |
fungi | yup | 14:40 |
fungi | no available volumes in that region/tenant | 14:41 |
fungi | so nova took care of it this time | 14:41 |
fungi | sudo /opt/system-config/launch/launch-node.py gitea02.opendev.org --flavor=v2-highcpu-8 --cloud=openstackci-vexxhost --region=sjc1 --image=infra-ubuntu-bionic-minimal-20190612 --boot-from-volume --volume-size=80 --ignore_ipv6 --network=public --config-drive | 14:43 |
fungi | those are the options we settled on previously | 14:43 |
*** jamesmcarthur has joined #openstack-infra | 14:43 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well https://review.opendev.org/672273 | 14:43 |
clarkb | looks correct to me | 14:43 |
fungi | [WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details | 14:44 |
fungi | [WARNING]: Unable to parse /opt/system-config/inventory/emergency.yaml as an inventory source | 14:45 |
fungi | yeah, that file doesn't exist | 14:45 |
clarkb | fungi: that may be a regression in launch-node looking for specific inventory files which have since moved | 14:46 |
fungi | yeah, i'm hunting it down | 14:46 |
clarkb | yup in launch-node.py it lists that file specifically | 14:46 |
clarkb | should be updated to point to /etc/ansible/hosts/emergency.yaml I think | 14:46 |
fungi | it already includes that | 14:47 |
fungi | oh, wait, playbooks/roles/install-ansible/templates/ansible.cfg.j2 includes it | 14:48 |
fungi | launch/launch-node.py only includes the one from the system-config repo | 14:48 |
*** chandankumar is now known as raukadah | 14:48 | |
clarkb | /opt/system-config/inventory/emergency.yaml is in launch-node.py | 14:48 |
fungi | i guess we can also clean out the nonexistent system-config one and make sure they both use the one from /etc | 14:48 |
clarkb | yup | 14:49 |
fungi | i'll also correct some references in doc/source/sysadmin.rst | 14:50 |
*** openstackgerrit has quit IRC | 14:51 | |
*** openstackgerrit has joined #openstack-infra | 14:52 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: try lazylog https://review.opendev.org/672991 | 14:52 |
*** kopecmartin is now known as kopecmartin|off | 14:56 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Correct emergency file reference in launch script https://review.opendev.org/672996 | 14:57 |
fungi | turns out the other entries in playbooks/roles/install-ansible/templates/ansible.cfg.j2 were fine, they were for the actual inventories in the system-config repo | 14:57 |
openstackgerrit | Jeremy Stanley proposed opendev/zone-opendev.org master: Update IP address for gitea02 https://review.opendev.org/672997 | 15:02 |
clarkb | fungi: ^ we need a change to add that host to the inventory as gitea02 too | 15:04 |
*** piotrowskim has quit IRC | 15:04 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Add gitea02 replacement to inventory https://review.opendev.org/672999 | 15:05 |
fungi | yeah, i was typing it ;) | 15:05 |
*** ykarel is now known as ykarel|away | 15:05 | |
*** yamamoto has quit IRC | 15:06 | |
clarkb | left a couple notes on the emergency file docs changes. I think followups for that are fine so I +2'd | 15:07 |
clarkb | fungi: I +2'd 672999 but just realized I think it needs the exclusion on the remote_puppet_git.yaml playbook to avoid having ansible update the db | 15:08 |
*** yamamoto has joined #openstack-infra | 15:09 | |
*** dtantsur is now known as moltendmitry | 15:09 | |
fungi | oh, right, thanks | 15:09 |
*** yamamoto has quit IRC | 15:10 | |
*** bhavikdbavishi has quit IRC | 15:10 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Add gitea02 replacement to inventory https://review.opendev.org/672999 | 15:11 |
clarkb | +2 mordred corvus ^ have a moment for that change? | 15:13 |
fungi | gonna grab an early lunch while all that percolates and get the repos initialized when i get back | 15:14 |
clarkb | fungi: we'll want to double check that gitea01 noops as expected when ansible runs against it but it did with 06 so should be fine | 15:14 |
donnyd | Ok, I'm picking up what you are putting down now | 15:14 |
mordred | clarkb: lgtm | 15:14 |
clarkb | donnyd: I'm happy to look at some examples too and should have a logstash window with the timeout query up somewhere, let me see | 15:14 |
donnyd | node_provider:"fortnebula-regionone" AND filename:job-output.txt AND message:"RUN END RESULT_TIMED_OUT" | 15:15 |
clarkb | you can keep all your tabs forever but what they don't tell you is that if you do you'll never find the one you want again | 15:15 |
* clarkb has a tab problem | 15:15 | |
*** jamesmcarthur has quit IRC | 15:17 | |
donnyd | clarkb: needs nested tabbing | 15:17 |
openstackgerrit | Merged opendev/zone-opendev.org master: Update IP address for gitea02 https://review.opendev.org/672997 | 15:21 |
*** jamesmcarthur has joined #openstack-infra | 15:21 | |
*** cmurphy is now known as cmorpheus | 15:25 | |
clarkb | http://logs.openstack.org/34/659434/25/gate/tempest-full/9362471/job-output.txt is an example of a timed out default tempest job | 15:29 |
clarkb | devstack took 32 minutes to run there which isn't the fastest but is also within range of other cloud regions | 15:29 |
clarkb | and it doesn't timeout until it gets into the slow tempest tests run which is right at the end of the job so we are very near the timeout | 15:29 |
donnyd | I would like to get the devstack install times down a little further, but I am not sure where the bottleneck in it is | 15:30 |
clarkb | donnyd: I think a lot of it could be improvements to devstack and the projects themselves. For example we do database migrations for a lot of projects from many releases ago as they haven't been rolled up | 15:32 |
donnyd | On the bright side I bumped the cpu ratios back up and it would seem that this is more where I would like density to be | 15:32 |
clarkb | but ya I agree devstack could be quicker, we just don't have anyone investing in that (and when people suggest alternatives to devstack they tend to be even slower :( ) | 15:33 |
donnyd | https://usercontent.irccloud-cdn.com/file/K6D0a0CW/Screenshot%20from%202019-07-26%2011-32-21.png | 15:33 |
donnyd | towards the left side is this morning with cpu ratio at 1.5:1 | 15:34 |
donnyd | and right side is 2.0:1 | 15:34 |
donnyd | in case anyone is curious | 15:34 |
clarkb | the good thing about that tempest job is that it has dstat logs so we can sanity check those | 15:34 |
clarkb | if you download http://logs.openstack.org/34/659434/25/gate/tempest-full/9362471/controller/logs/dstat-csv_log.txt.gz, uncompress it then you can feed it to https://lamada.eu/dstat-graph/ | 15:35 |
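For anyone who would rather skim numbers than render the graph, a small sketch that pulls a few columns out of such a dstat CSV; the column names used ("wai", "used", "free") are assumptions about dstat's header row and may need adjusting for a particular log:

```python
#!/usr/bin/env python3
# Hedged sketch: summarize a dstat CSV log instead of graphing it. The
# column names ("wai", "used", "free") are assumptions about dstat's header
# row and can repeat across groups (memory vs. swap), so this only looks at
# the first occurrence of each name.
import csv
import sys

def summarize(path):
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh))
    # Skip the preamble: the real header is the first row containing "wai".
    header_idx = next(i for i, row in enumerate(rows) if "wai" in row)
    header = rows[header_idx]
    wanted = {name: header.index(name) for name in ("wai", "used", "free")
              if name in header}
    maxima = {name: 0.0 for name in wanted}
    for row in rows[header_idx + 1:]:
        if len(row) != len(header):
            continue  # partial/truncated line
        for name, col in wanted.items():
            try:
                maxima[name] = max(maxima[name], float(row[col]))
            except ValueError:
                pass
    for name, value in maxima.items():
        print(f"max {name}: {value}")

if __name__ == "__main__":
    summarize(sys.argv[1])
```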
donnyd | I am interesting in what can be done from an infra perspective to speed up devstack | 15:35 |
*** yamamoto has joined #openstack-infra | 15:35 | |
donnyd | interested* | 15:35 |
clarkb | looking at that dstat graph I think (lack of) memory and resulting spike in load average and disk usage and paging are a big hit | 15:37 |
clarkb | c-bak is still a running service which was identified as a memory hog that isn't actually tested by the job? mriedem you responded to my emails about this in the past do you have up to date info? | 15:38 |
mriedem | patch in tempest is still sitting | 15:38 |
mriedem | https://review.opendev.org/#/c/651865/ | 15:38 |
mriedem | i think it's waiting on gmann's work to split apart the integrated gate template into separate jobs so that cinder is running something which runs c-bak so those tests still get run on cinder changes | 15:39 |
*** jbadiapa has quit IRC | 15:39 | |
mriedem | having said that, now that gmann is adding all of these new templates, those new templates, except for anything that runs cinder, should probably also disable c-bak | 15:39 |
clarkb | looking at the graph a bit more closely we can see lack of memory results in a spike in wai state | 15:39 |
clarkb | which will definitely slow things down | 15:40 |
clarkb | mriedem: Ideally we'd look at where the memory use in the projects is coming from too. I know heat did this once and dropped memory use by like 800MB | 15:40 |
donnyd | I can't make disk speeds much more than they already are | 15:40 |
*** yamamoto has quit IRC | 15:41 | |
clarkb | donnyd: I think that is fine. Once we hit swap its sort of a "good luck" point | 15:41 |
donnyd | writes are hitting the limits of an individual nic, although reads could be faster | 15:41 |
*** noorul has joined #openstack-infra | 15:41 | |
clarkb | donnyd: we have swap there because sometimes it is fast enough that we don't have to throw away the job, but if it isn't well hard to blame the cloud for that | 15:41 |
mriedem | aha https://review.opendev.org/#/c/669312/6/.zuul.yaml@213 | 15:41 |
donnyd | I am curious to see if the new storage will speed the jobs up any at all... doesn't look like there is much of a write workload looking at an individual job | 15:42 |
*** armax has joined #openstack-infra | 15:43 | |
donnyd | https://usercontent.irccloud-cdn.com/file/YeqEgCTu/image.png | 15:43 |
donnyd | But looking at network traffic overall I surely bang up against the limits of an individual nic | 15:43 |
clarkb | scanning devstack logs we install uwsgi and compile it because our wheel mirror doesn't have it | 15:44 |
clarkb | fixing that will save ~15 seconds | 15:44 |
clarkb | prometheanfire: smcginnis tonyb any idea why uwsgi isn't listed in global-requirements? if it were we would have wheels ready to go for it | 15:45 |
donnyd | On at least my cloud, my cpu to memory ratios could go way up on the memory side. I am using not even 25% of the addressable memory | 15:46 |
clarkb | donnyd: we've intentionally limited memory in the test environments in part to make it possible for someone to say "my test failed" then run it locally on say a laptop or desktop and not require them to have a rack of servers | 15:46 |
clarkb | it also helps reach a balance with clouds and resource utilization where we don't have a ton of underutilized instances | 15:47 |
clarkb | (tests can scale up by requesting multiple nodes if they know they need that) | 15:47 |
donnyd | That makes sense, but I am curious to know if giving devstack more memory would in-fact speed it up | 15:47 |
donnyd | I am not sure what laptops out there have 8 cores, but only 8 GB of memory | 15:48 |
donnyd | The laptops I have that do have better processors (with 8 cores), also usually have a bit more in the memory dept... mine specifically 64G... But i don't think 16 is a typical at this point... | 15:49 |
clarkb | donnyd: with cores we use more of them if we have them (to speed up testing) but you don't need 8 to run tempest | 15:49 |
clarkb | 8GB remains the pretty typical laptop memory setup | 15:49 |
clarkb | going to single dimm machines for thinness seems to have really impacted memory availability | 15:50 |
donnyd | I am not disagreeing, because what you are saying makes sense | 15:50 |
*** moltendmitry is now known as dtantsur|afk | 15:50 | |
donnyd | I am just trying to find out where the optimal amount of memory lies to make devstack go as fast as possible with reasonable DC equipment | 15:51 |
*** xek__ has quit IRC | 15:51 | |
clarkb | But also we have bloated memory use and I don't think anyone has looked at why other than my quick "oh c-bak made dstat sad" | 15:51 |
clarkb | and rather than simply throw more memory at the problem it would be good to understand | 15:51 |
*** xek__ has joined #openstack-infra | 15:51 | |
prometheanfire | clarkb: at the time we didn't want to choose one impl (gunicorn vs uwsgi vs whatever) | 15:52 |
donnyd | I will run some tests on my end to see where the balance between "not achievable on a laptop" and "fast as possible" lies | 15:52 |
prometheanfire | not sure about license off the top of head either | 15:52 |
*** Vadmacs has quit IRC | 15:52 | |
*** jangutter has joined #openstack-infra | 15:54 | |
clarkb | prometheanfire: uwsgi is gpl v2+ with linking exception | 15:54 |
prometheanfire | clarkb: ya, just looked it up | 15:54 |
prometheanfire | not sure that's allowed or not, I know the base gpl is not | 15:55 |
paladox | mordred bazel caused load for me to go up to 233 apparently | 15:55 |
clarkb | we spend almost 7 minutes just creating keystone services and roles and such | 15:57 |
*** eharney has quit IRC | 15:57 | |
prometheanfire | guess it'd be considered Projects run as part of the OpenStack Infrastructure | 15:57 |
prometheanfire | so as it's OSI, it'd be fine | 15:57 |
clarkb | cmorpheus: ^ Any idea if we can trim that down? like maybe we don't need all of those roles by default? | 15:57 |
clarkb | prometheanfire: and the linking exception makes it extra safe | 15:57 |
prometheanfire | yep | 15:57 |
*** rpittau is now known as rpittau|afk | 15:57 | |
clarkb | cmorpheus: starts at about http://logs.openstack.org/34/659434/25/gate/tempest-full/9362471/controller/logs/devstacklog.txt.gz#_2019-07-26_10_36_48_436 there and goes to about http://logs.openstack.org/34/659434/25/gate/tempest-full/9362471/controller/logs/devstacklog.txt.gz#_2019-07-26_10_43_43_153 | 15:58 |
cmorpheus | clarkb: looking | 15:58 |
*** gyee has joined #openstack-infra | 15:59 | |
openstackgerrit | Graham Hayes proposed zuul/nodepool master: Implement an Azure driver https://review.opendev.org/554432 | 16:01 |
cmorpheus | the reader, member and admin roles are created by keystone and can't be trimmed down, the ResellerAdmin role i think is for swift, i don't think anotherrole or invisible_to_admin are really useful right now | 16:01 |
cmorpheus | oh invisible_to_admin is a project i guess | 16:02 |
cmorpheus | i haven't read scrollback, what is the actual problem? | 16:02 |
prometheanfire | clarkb: I'd put it to the list before we settle on it, given that reqs doesn't like duplicate functionality | 16:03 |
clarkb | cmorpheus: looking into general slowness of devstack + tempest jobs | 16:03 |
clarkb | cmorpheus: devstack took 32 minutes in this case and ~7 of that is just that keystone setup | 16:03 |
clarkb | separately it appears that digging into swap during tempest runs may be a cause of slowdown when running tempest | 16:03 |
clarkb | cmorpheus: mostly just trying to see if we can improve runtime by fixing inefficiencies | 16:04 |
*** lucasagomes has quit IRC | 16:04 | |
cmorpheus | okay cool | 16:05 |
clarkb | is osc the only way to create those keystone setup entries? Is keystoneclient still a thing or is there an admin tool? | 16:06 |
*** icarusfactor has joined #openstack-infra | 16:06 | |
clarkb | (might be helpful to do comparison of tool costs if we can) | 16:06 |
cmorpheus | keystoneclient has no cli and is going away some day | 16:07 |
cmorpheus | the keystone-manage admin tool can't be used for most of this, we only use it to bootstrap an admin user | 16:07 |
clarkb | we unfortunately pushed everything into osc and then realized after it was too late that it had a large performance impact | 16:08 |
*** factor has quit IRC | 16:08 | |
clarkb | I guess we could write a script to do that chunk of config | 16:08 |
clarkb | to avoid the cost of python and pkg_resources spin up time | 16:08 |
cmorpheus | one thing we could do is remove all the service users and just have one service user do all the things | 16:09 |
clarkb | mordred: ^ how crazy would such a thing be? and maybe you already have such a thing because sdk testing? | 16:09 |
cmorpheus | crap i have a meeting brb | 16:09 |
*** e0ne has quit IRC | 16:12 | |
clarkb | https://opendev.org/openstack/devstack/src/branch/master/stack.sh#L1146-L1161 that is the 7 minute block | 16:14 |
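The change clarkb links a bit later (673018) experiments with replacing part of this block with openstacksdk calls. Purely as an illustration (not the actual patch), doing that setup through one long-lived sdk connection instead of one osc process per resource might look roughly like the following; the cloud name, role names and service list are placeholders:

```python
#!/usr/bin/env python3
# Hedged illustration (not the linked devstack patch): do keystone
# bootstrap-style setup through a single openstacksdk connection instead of
# invoking the openstack CLI once per resource, so the interpreter startup
# and token negotiation are paid only once. Cloud name, role names and the
# service entries below are placeholders.
import openstack

conn = openstack.connect(cloud="devstack-admin")  # assumed clouds.yaml entry

for role_name in ("anotherrole", "ResellerAdmin"):
    # Create the role only if it does not already exist.
    if not any(r.name == role_name for r in conn.identity.roles(name=role_name)):
        conn.identity.create_role(name=role_name)

services = [
    ("keystone", "identity"),
    ("glance", "image"),
    ("nova", "compute"),
]
existing = {s.name for s in conn.identity.services()}
for name, service_type in services:
    if name not in existing:
        conn.identity.create_service(name=name, type=service_type)
```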
openstackgerrit | Paul Belanger proposed opendev/base-jobs master: Switch base-test ansible version to 2.8 https://review.opendev.org/673012 | 16:14 |
*** jamesmcarthur has quit IRC | 16:15 | |
pabelanger | infra-root: ^if you don't mind reviewing, this allows us to start testing base-test jobs using ansible 2.8. Which should be fine, and humans need to opt into base-test | 16:15 |
*** jamesmcarthur has joined #openstack-infra | 16:16 | |
*** mattw4 has joined #openstack-infra | 16:17 | |
openstackgerrit | Merged opendev/system-config master: Add gitea02 replacement to inventory https://review.opendev.org/672999 | 16:17 |
*** pkopec has quit IRC | 16:18 | |
jangutter | is #openstack-infra the go-to place to talk about devstack? | 16:19 |
jonher | jangutter #openstack-qa is probably best for devstack | 16:20 |
jangutter | thanks jonher! | 16:20 |
jonher | np | 16:20 |
*** diablo_rojo has joined #openstack-infra | 16:21 | |
*** jamesmcarthur has quit IRC | 16:21 | |
fungi | okay, back now... let's see where we're at | 16:23 |
clarkb | fungi: change to run ansible on gitea02 just merged | 16:23 |
clarkb | fungi: waiting on that to apply now | 16:23 |
*** ricolin has quit IRC | 16:23 | |
fungi | accepted the gitea02 ssh host key on bridge.o.o now | 16:23 |
*** noorul has quit IRC | 16:23 | |
fungi | on the next ansible pass it ought to set up docker/gitea and then i can initialize the repos | 16:24 |
*** larainema has quit IRC | 16:25 | |
clarkb | mordred: do methods like conn.identity.role_project_user_assignments in sdk's examples actually exist? and if so why do they not show up in the api docs nor outside of the examples when grepped for? | 16:27 |
clarkb | (in particular it would be great to know if that takes any parameters for narrowing the list of roles) | 16:28 |
*** jamesmcarthur has joined #openstack-infra | 16:31 | |
openstackgerrit | Merged opendev/base-jobs master: Switch base-test ansible version to 2.8 https://review.opendev.org/673012 | 16:32 |
*** pkopec has joined #openstack-infra | 16:33 | |
*** icarusfactor has quit IRC | 16:33 | |
*** icarusfactor has joined #openstack-infra | 16:33 | |
openstackgerrit | Paul Belanger proposed zuul/zuul-jobs master: DNM: Switch unitests to use base-test https://review.opendev.org/673014 | 16:35 |
*** mriedem is now known as mriedem_lunch | 16:35 | |
*** jamesmcarthur has quit IRC | 16:36 | |
*** icarusfactor has quit IRC | 16:36 | |
*** jamesmcarthur has joined #openstack-infra | 16:36 | |
openstackgerrit | Paul Belanger proposed zuul/nodepool master: DNM: testing ansible 2.8 jobs https://review.opendev.org/673015 | 16:36 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: DNM: testing ansible 2.8 jobs https://review.opendev.org/673016 | 16:37 |
*** roman_g has quit IRC | 16:41 | |
*** iurygregory has quit IRC | 16:46 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add severity filtering to logs https://review.opendev.org/672839 | 16:46 |
*** iurygregory has joined #openstack-infra | 16:47 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add severity filtering to logs https://review.opendev.org/672839 | 16:49 |
clarkb | cmorpheus: mordred https://review.opendev.org/673018 is the I have no idea what I am doing change | 16:51 |
openstackgerrit | Jean-Philippe Evrard proposed openstack/project-config master: [WIP] Add tooling to update python jobs on branch creation https://review.opendev.org/673019 | 16:52 |
*** mattw4 has quit IRC | 16:54 | |
clarkb | fungi: base.yaml is running on gitea01 now | 16:57 |
clarkb | er 02 | 16:57 |
fungi | yep, docker not running yet though | 16:58 |
*** mattw4 has joined #openstack-infra | 16:58 | |
fungi | keeping tabs on it and will initialize repos as soon as gitea is up | 16:58 |
*** derekh has quit IRC | 16:58 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 17:01 |
*** eharney has joined #openstack-infra | 17:01 | |
fungi | it did install gnupg this time | 17:01 |
fungi | not docker yet though | 17:01 |
*** jangutter has quit IRC | 17:02 | |
mnaser | clarkb: i wonder if replacing all that osc shell code by some sort of python code would speed things up | 17:03 |
mnaser | i'd imagine it would | 17:03 |
clarkb | mnaser: yup see https://review.opendev.org/673018 | 17:04 |
clarkb | mnaser: thats the small scale check it on a thing we run 15 times | 17:04 |
clarkb | if that shows improvements we can rewrite to be more complete | 17:04 |
* fungi misread that as a suggestion to replace openstackclient with something written in python | 17:04 | |
fungi | 1000 4476 1.4 1.1 1702848 90592 ? Ssl 17:04 0:01 /app/gitea/gitea web | 17:06 |
fungi | woo! | 17:06 |
fungi | proceeding | 17:06 |
*** iurygregory has quit IRC | 17:07 | |
fungi | https://docs.openstack.org/infra/system-config/gitea.html#deploy-a-new-backend indicates the next step is to stop gitea and restore a database dump | 17:08 |
clarkb | ++ | 17:08 |
*** jamesmcarthur has quit IRC | 17:08 | |
dtroyer | heh, I proposed once upon a time replacing those keystone bits with a string of commands piped into osc so it only loads once, it does help time-wise but the lack of decent error handling was a concern so we dropped it… | 17:09 |
* dtroyer is still accepting sponsorship proposals to write a proper cli in a single non-interpreted binary | 17:09 |
fungi | `docker ps -a` indicates i should stop 5c59a8a31b9d (opendevorg/gitea:latest) and 8e68bb69a209 (opendevorg/gitea-openssh) but leave 5bdefc623895 (mariadb:10.4) running | 17:09 |
clarkb | dtroyer: fwiw non interpreted binary isn't really the problem as much as "python is silly and scans the entire disk when loading packages then sorts them all by name and version because that is fast" | 17:10 |
fungi | okay, now only the mariadb:10.4 container is "up" | 17:10 |
clarkb | dtroyer: I fully expect my python script there to be much quicker since it doesn't pkg_resources | 17:10 |
clarkb | (at least I hope it doesn't end up doing that via openstacksdk) | 17:11 |
clarkb | this is why I'm testing it small scale first | 17:11 |
dtroyer | clarkb: yes, but it is still a PITA to install | 17:11 |
clarkb | dtroyer: wget'ing a prebuilt binary has a lot of problems with it too :/ | 17:12 |
clarkb | mostly in verifying the contents are as expected | 17:12 |
* mnaser rather wget a prebuilt binary that's always a fast client (or even build once) than wait 3 seconds every single time i run a command :( | 17:12 | |
mnaser | on a brand new openstack cluster (3 controllers in ha, zero load, zero vms): real 0m2.605s for openstack server list | 17:13 |
fungi | see, it's only 3 seconds because you rounded to the nearest second! ;) | 17:14 |
clarkb | mnaser: ya when I first looked at this a couple years ago I think the numbers I had were about 50% scanning packages and sorting them and 50% http rtt | 17:14 |
clarkb | but again you can avoid scanning packages and sorting them with python | 17:15 |
clarkb | unfortunately the thing that does the scanning and sorting is pretty well entrenched so shows up all over the place (meaning if you remove it one place you find it in another and the list goes on and on) | 17:16 |
clarkb | and of that 50% rtt for http I want to say a good chunk of it is getting a token? | 17:18 |
clarkb | I don't think I was caching the tokens | 17:18 |
clarkb | but maybe that is automagic and I didn't notice | 17:18 |
*** bobh has joined #openstack-infra | 17:23 | |
clarkb | fungi: how goes gitea02 db recovery? | 17:25 |
fungi | just about done shuffling db copies around | 17:26 |
*** bobh has quit IRC | 17:26 | |
fungi | wanted to grab a fresh one just in case | 17:26 |
fungi | i can reuse it for subsequent replacements today/tomorrow at least | 17:26 |
clarkb | cmorpheus: can you see my responses to your comments on that devstack change? the sdk api docs say that those parameters are not valid | 17:28 |
*** weifan has joined #openstack-infra | 17:28 | |
*** udesale has quit IRC | 17:31 | |
*** igordc has joined #openstack-infra | 17:34 | |
*** harlowja has quit IRC | 17:35 | |
*** igordc has quit IRC | 17:36 | |
*** igordc has joined #openstack-infra | 17:36 | |
fungi | db import to gitea02 completed and "All missing Git repositories for which records existed have been reinitialized." | 17:37 |
fungi | starting the gerrit replication to it now | 17:37 |
clarkb | cmorpheus: does that mean I should do a filter just of the user then scan the resulting list for RoleAssignments that match the user_domain (and project_domain if assigned)? | 17:38 |
*** electrofelix has quit IRC | 17:38 | |
fungi | 2115 tasks | 17:39 |
clarkb | fungi: I've realized this may take longer than ~14 minutes because there is no data at all on the remote | 17:39 |
fungi | oh, perhaps | 17:39 |
clarkb | but we should see how long it actually takes compared to the mostly noop case of 14 minutes | 17:39 |
fungi | that could explain some of it | 17:39 |
fungi | yep | 17:39 |
*** Vadmacs has joined #openstack-infra | 17:41 | |
cmorpheus | clarkb: did you see my response? | 17:41 |
*** hwoarang has quit IRC | 17:42 | |
*** hwoarang_ has joined #openstack-infra | 17:43 | |
cmorpheus | filtering by role assignment on a domain is not the same as the user domain that the get_or_add_user_project_role function is getting | 17:43 |
clarkb | cmorpheus: ya I'm not quite sure I understand what we want to get back? I guess we specify the user domain for when the user isn't in the default domain set by env vars? in which case asking for all the roles won't work? | 17:43 |
clarkb | cmorpheus: right I get they aren't the same but I don't know what it is we actually want | 17:43 |
clarkb | we want the list of role assignments for the user that is in domain_not_default_in_env_var ? | 17:43 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Swap gitea02 into service and bring down gitea03 https://review.opendev.org/673026 | 17:44 |
fungi | that ^ should be the next round once replication completes | 17:44 |
clarkb | and sdk doesn't (document, at least) have a method to get that data short of creating a different connection with different user domain details maybe | 17:44 |
cmorpheus | clarkb: we want the list of role assignments that the user has on the project, domains only come into play here because both the user and the project are namespaced by a domain | 17:44 |
cmorpheus | if you only have the username then you always need to specify the domain | 17:45 |
cmorpheus | except if it's the default domain then i think osc and maybe sdk do some magic for you there | 17:45 |
clarkb | cmorpheus: right ok. Domain is specified to be default via env vars. So this will work as long as the user isn't in a non default domain | 17:45 |
clarkb | (and I update it to not filter on domain) | 17:45 |
cmorpheus | okay sounds good | 17:45 |
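For what it's worth, a minimal sketch of the lookup settled on above: resolve names to ids first (devstack passes both), then filter role assignments by user and project only, dropping the domain filter. The filter parameter names (user_id, scope_project_id) are assumptions drawn from the sdk docs and should be verified against the installed sdk version; looking users and projects up by bare name assumes the default domain:

```python
#!/usr/bin/env python3
# Hedged sketch of get_or_add_user_project_role-style logic with
# openstacksdk: look up the role, user and project by name-or-id first,
# then filter role assignments by user and project only, without the
# domain filter discussed above. Parameter names for role_assignments()
# are assumptions and should be checked against the installed sdk.
import openstack

def ensure_role(conn, role, user, project):
    # find_* accepts either a name or an id; bare names assume the
    # default domain here.
    role_obj = conn.identity.find_role(role, ignore_missing=False)
    user_obj = conn.identity.find_user(user, ignore_missing=False)
    project_obj = conn.identity.find_project(project, ignore_missing=False)

    assignments = conn.identity.role_assignments(
        user_id=user_obj.id, scope_project_id=project_obj.id)
    if not any(a.role["id"] == role_obj.id for a in assignments):
        conn.identity.assign_project_role_to_user(
            project_obj, user_obj, role_obj)
    return role_obj.id

if __name__ == "__main__":
    conn = openstack.connect(cloud="devstack-admin")  # placeholder cloud
    ensure_role(conn, "member", "demo", "demo")
```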
cmorpheus | also i apologize on behalf of my predecessors who came up with this | 17:46 |
fungi | i'll drink to that | 17:48 |
*** rtjure has quit IRC | 17:50 | |
clarkb | once I get enough logging to hopefully figure out if the domains are supplied as ID's or names I'll update the connect call | 17:50 |
*** mriedem_lunch is now known as mriedem | 17:50 | |
*** ykarel|away has quit IRC | 17:51 | |
*** tesseract has quit IRC | 17:54 | |
clarkb | we appear to primarily operate using IDs. change updated to reflect that now | 17:54 |
*** hwoarang_ has quit IRC | 17:54 | |
fungi | 651 tasks | 17:54 |
*** Vadmacs has quit IRC | 17:55 | |
clarkb | fungi: that is slower but not significantly so | 17:55 |
fungi | i'll self-approve 673026 once it hits ~0 and start on replacing 03 | 17:55 |
*** rtjure has joined #openstack-infra | 17:56 | |
*** hwoarang has joined #openstack-infra | 17:56 | |
*** chason has quit IRC | 17:56 | |
clarkb | fungi: I just noticed a bug in that change, the ip addr for gitea02 in the haproxy config is not up to date | 17:57 |
fungi | thanks!!! | 17:59 |
fungi | fixing now | 17:59 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Swap gitea02 into service and bring down gitea03 https://review.opendev.org/673026 | 18:01 |
fungi | clarkb: ^ | 18:01 |
clarkb | that looks better thanks | 18:02 |
fungi | also looks like replication finished | 18:02 |
*** mattw4 has quit IRC | 18:02 | |
*** mattw4 has joined #openstack-infra | 18:02 | |
fungi | currently there's a slew of git-upload-pack '/openstack/nova' for the zuul user | 18:03 |
*** hwoarang has quit IRC | 18:03 | |
fungi | and a sudden burst of index changes | 18:03 |
*** hwoarang has joined #openstack-infra | 18:04 | |
*** ramishra has quit IRC | 18:12 | |
*** weifan has quit IRC | 18:16 | |
*** weifan has joined #openstack-infra | 18:17 | |
*** weifan has quit IRC | 18:17 | |
donnyd | clarkb: fungi So I am hoping that this will be the weekend I can actually get my controller swapped out. Do you think we should update the quota in zuul, as the control plane will be unreachable for an hour or so? Or just let it ride because weekend loads are low anyways | 18:20 |
fungi | donnyd: dropping the quota to 0 won't help i don't think if the api itself is unreachable, but we can certainly set max-servers to 0 in nodepool. or alternatively just expect that there will be a handful of boot failures logged by nodepool until the api is back up | 18:22 |
donnyd | thats what I mean.. yes. max-servers | 18:22 |
donnyd | thanks fungi | 18:22 |
fungi | i don't see the latter as a problem really | 18:22 |
fungi | i mean, we've designed nodepool to withstand severe provider outages | 18:23 |
donnyd | I am hoping to do a mostly online swap from my edge LB between the two control planes | 18:23 |
fungi | if setting max-servers=0 for it will help you feel less rushed, i'm happy to approve such a change | 18:23 |
donnyd | So the new controllers will be built and populated in parallel and then just swap out the place the edge LB sends requests to | 18:24 |
donnyd | All sounds good... till it doesn't | 18:24 |
fungi | if the account credentials, endpoint and so on will remain the same, then i don't personally see any need to zero out the max-servers for it | 18:25 |
donnyd | Yea, I have all of what was used to provision this automated (mostly) | 18:27 |
donnyd | so nothing should change on your end at all | 18:27 |
pleia2 | happy sysadmin day :) | 18:27 |
clarkb | nodepool should happily deal with those api errors, it may print a lot of log messages about it though (that's fine) | 18:27 |
clarkb | pleia2: and to you too! | 18:28 |
donnyd | I'm hopeful for a 5 minute outage... but I also live in the real sysadmin world | 18:28 |
clarkb | pleia2: are you sysadmining for ibm? | 18:28 |
*** Vadmacs has joined #openstack-infra | 18:29 | |
fungi | thanks pleia2!!! | 18:31 |
*** hwoarang has quit IRC | 18:32 | |
fungi | may your systems be bountiful | 18:32 |
pleia2 | clarkb: only a tiny bit here and there (we run an openstack-driven cloud that launches VMs on mainframes, so I poke my head in when needed) | 18:32 |
pleia2 | mostly I do dev advocacy though | 18:32 |
*** hwoarang has joined #openstack-infra | 18:32 | |
pleia2 | (we use the z/VM connector for nova, but switching to KVM soon, which runs on s390x and will make our lives 100x easier) | 18:33 |
*** tdasilva has quit IRC | 18:33 | |
*** weifan has joined #openstack-infra | 18:34 | |
*** weifan has quit IRC | 18:34 | |
*** weifan has joined #openstack-infra | 18:38 | |
fungi | pleia2: openstack in use at ibm? i thought that was only a myth! | 18:39 |
pleia2 | haha | 18:41 |
clarkb | pleia2: is kvm loaded off of a virtual card deck? | 18:42 |
*** weifan has quit IRC | 18:42 | |
clarkb | it would make me so happy if it is | 18:42 |
pleia2 | clarkb: I don't actually know how it works :) | 18:43 |
pleia2 | it's a supported thing though, right alongside z/VM | 18:44 |
*** mattw4 has quit IRC | 18:46 | |
*** bobh has joined #openstack-infra | 18:48 | |
corvus | pleia2: happy sysadmin day to you too! | 18:48 |
corvus | who brought the cake? | 18:50 |
clarkb | I don't have cake but now I want some | 18:50 |
clarkb | I did just eat some leftover curry | 18:50 |
*** mattw4 has joined #openstack-infra | 18:51 | |
corvus | clarkb: was that breakfast or lunch? | 18:51 |
clarkb | something in the middle but closer to lunch | 18:51 |
clarkb | I think my little python script in devstack didn't break this time | 18:52 |
clarkb | not sure if it is faster yet. Will have to wait for logs and compare to other jobs on that cloud | 18:52 |
fungi | i could go for some currycake | 18:53 |
clarkb | hrm I got the handling of user_domain and project_domain wrong looks like | 18:53 |
clarkb | because they are names not ids | 18:53 |
clarkb | so inconsistent | 18:53 |
fungi | pleia2: i have fond memories of being a racf administrator for s/390 clusters running linux in lpars. i hope it's as enjoyable for you! | 18:57 |
fungi | [edit: i guess we called it a "sysplex" not a "cluster"] | 18:57 |
*** bobh has quit IRC | 19:02 | |
pleia2 | fungi: cool, it sure is :) | 19:03 |
clarkb | cmorpheus: it is amazing how complicated this little script ends up getting, makes me wonder how much faster it would be overall to continue to support names and ids over osc | 19:04 |
*** rh-jelabarre has quit IRC | 19:04 | |
cmorpheus | heh | 19:04 |
clarkb | I've basically run into needing to look up all inputs to get their ids because they might be names | 19:05 |
clarkb | (devstack uses both names and ids) | 19:05 |
fungi | and i guess the api doesn't treat them interchangeably | 19:07 |
clarkb | the wins now may not be seen unless we rewrite that section of shell into python entirely. Then we can get the role, user, and project data once and reuse it over and over and over | 19:07 |
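As a rough illustration of the pattern being replaced (the project, user, and role names here are only examples), each osc invocation below starts a fresh python interpreter, authenticates, and resolves a name to an id all over again -- overhead that a single long-lived python process can do once and cache:

    # every command is a separate process + keystone auth + name-to-id lookup
    admin_id=$(openstack user show admin -f value -c id)
    service_id=$(openstack project show service -f value -c id)
    member_id=$(openstack role show member -f value -c id)
    openstack role add --user "$admin_id" --project "$service_id" "$member_id"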
*** weifan has joined #openstack-infra | 19:07 | |
*** weifan has quit IRC | 19:08 | |
fungi | and so the slow progression of translating devstack into python continues | 19:08 |
*** weifan has joined #openstack-infra | 19:08 | |
clarkb | fungi: apparently not, re: treating them the same | 19:08 |
fungi | that's one thing i've come to appreciate about gerrit's rest api... you can provide a variety of typed inputs for certain parameters and it will decide how to equate them | 19:09 |
fungi | so if you're doing an account lookup you can provide an id number or a username or an e-mail address or... and it will dereference them all to the same values behind the scenes | 19:10 |
fungi | granted, it also returns lists for just about everything, because there's no guarantee that the inputs reduce to a single value | 19:12 |
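For instance (the hostname and account details are made up), these should all dereference to the same account record; the query form at the end is the style that comes back as a list:

    # responses are JSON prefixed with gerrit's usual )]}' anti-XSSI line
    curl -s 'https://review.example.org/accounts/1000096'
    curl -s 'https://review.example.org/accounts/jdoe'
    curl -s 'https://review.example.org/accounts/jdoe@example.com'
    curl -s 'https://review.example.org/accounts/?q=email:jdoe@example.com'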
cmorpheus | clarkb: yeah you're starting to reproduce part of why osc is so slow | 19:12 |
*** weifan has quit IRC | 19:13 | |
fungi | sometimes the only way to truly understand a problem is to try and reproduce the solution? | 19:13 |
openstackgerrit | Paul Belanger proposed zuul/zuul-jobs master: DNM: Switch unitests to use base-test https://review.opendev.org/673014 | 19:15 |
*** rh-jelabarre has joined #openstack-infra | 19:16 | |
openstackgerrit | Paul Belanger proposed zuul/zuul master: DNM: testing ansible 2.8 jobs https://review.opendev.org/673016 | 19:18 |
openstackgerrit | Paul Belanger proposed zuul/nodepool master: DNM: testing ansible 2.8 jobs https://review.opendev.org/673015 | 19:18 |
*** diablo_rojo has quit IRC | 19:26 | |
clarkb | cmorpheus: ya I think the proper way this gets faster is to rewrite the whole configure accounts stuff in python rather than just the function bits | 19:30 |
openstackgerrit | Merged opendev/system-config master: Swap gitea02 into service and bring down gitea03 https://review.opendev.org/673026 | 19:31 |
*** weifan has joined #openstack-infra | 19:32 | |
fungi | as soon as that gets installed onto the lb and the active connections to 03 trail off, i'll rip and replace | 19:35 |
*** weifan has quit IRC | 19:36 | |
*** weifan has joined #openstack-infra | 19:36 | |
*** weifan has quit IRC | 19:37 | |
*** weifan has joined #openstack-infra | 19:37 | |
*** igordc has quit IRC | 19:37 | |
*** weifan has quit IRC | 19:38 | |
*** weifan has joined #openstack-infra | 19:38 | |
*** weifan has quit IRC | 19:39 | |
*** weifan has joined #openstack-infra | 19:39 | |
*** weifan has quit IRC | 19:39 | |
*** weifan has joined #openstack-infra | 19:40 | |
*** weifan has quit IRC | 19:40 | |
*** slaweq has quit IRC | 19:41 | |
*** wpp has quit IRC | 19:48 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 19:50 |
*** ralonsoh has quit IRC | 19:55 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: Verify Operator Pod Running https://review.opendev.org/670395 | 19:55 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 19:55 |
mordred | wow. that's a fun change! | 19:58 |
*** goldyfruit has quit IRC | 20:02 | |
paladox | mordred bazel caused load on my mac to go up to 233. | 20:03 |
mordred | paladox: well - I can build 2.15 now - but 2.16 and 3.0 are _really_ unhappy | 20:04 |
*** goldyfruit has joined #openstack-infra | 20:04 | |
paladox | oh | 20:04 |
paladox | mordred do you run out of cpu/ram for 2.16? | 20:04 |
mordred | yeah - if I use the same settings that work for 2.15 | 20:05 |
paladox | I'm surprised 3.0 is a problem as that removed GWTUI (so less to build) | 20:05 |
*** wpp has joined #openstack-infra | 20:05 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well https://review.opendev.org/672273 | 20:05 |
mordred | yeah | 20:05 |
mordred | paladox: oh - actually - the last time I ran it, 2.16 failed in a new and different way | 20:06 |
paladox | oh | 20:06 |
mordred | http://logs.openstack.org/73/672273/6/check/system-config-build-image-gerrit-2.16/1c13564/job-output.txt.gz#_2019-07-26_13_32_10_643536 | 20:06 |
*** harlowja has joined #openstack-infra | 20:07 | |
mordred | and I seem to be foot-gunning with 3.0 ... doh. | 20:07 |
paladox | ohhh, was it trying to use the master branch? | 20:07 |
*** Vadmacs has quit IRC | 20:07 | |
paladox | https://github.com/GerritCodeReview/plugins_download-commands/commit/891455076417dd097fdfd63f4afc0d28a3e85aff <-- was the change that caused that | 20:08 |
*** Vadmacs has joined #openstack-infra | 20:08 | |
paladox | https://github.com/GerritCodeReview/plugins_download-commands/branches doesn't appear to have a stable-2.16 branch | 20:08 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Colorize log severity https://review.opendev.org/673103 | 20:09 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 20:09 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well https://review.opendev.org/672273 | 20:09 |
*** igordc has joined #openstack-infra | 20:10 | |
openstackgerrit | Paul Belanger proposed zuul/nodepool master: DNM: testing ansible 2.8 jobs https://review.opendev.org/673015 | 20:11 |
*** slaweq has joined #openstack-infra | 20:11 | |
*** weifan has joined #openstack-infra | 20:11 | |
mordred | paladox: hrm. good catch. | 20:11 |
*** sgw has joined #openstack-infra | 20:12 | |
*** weifan has quit IRC | 20:16 | |
*** slaweq has quit IRC | 20:16 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add raw links to log manifest https://review.opendev.org/673104 | 20:25 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Rename view to logfile https://review.opendev.org/673105 | 20:25 |
*** wolke has quit IRC | 20:26 | |
*** wolke has joined #openstack-infra | 20:27 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well https://review.opendev.org/672273 | 20:27 |
mordred | paladox: thanks! I think that version might just work (instead of overlaying the zuul-cloned repo directly, it uses submodule update --init to do it - and since we don't have an origin remote but we DO have things cloned at the right relative path, it should clone from the already-cloned repo and not across the network) | 20:28 |
mordred | corvus: ^^ weird but nice side-effect for gerrit submodule plugin repos and zuul | 20:28 |
paladox | :) | 20:29 |
mordred | corvus: I *think* we might want to update the playbook to do that for all of the plugin repos, not just download-commands | 20:29 |
mordred | so that we get the ref that the gerrit repo is expecting. now - that obviously breaks depends-on - but we can solve that when we have a need to do a depends-on with a plugin ref | 20:29 |
*** wolke has quit IRC | 20:29 | |
*** wolke has joined #openstack-infra | 20:30 | |
corvus | mordred: wait we use required-projects for the plugins | 20:32 |
mordred | yes. that's why the submodule update --init will work | 20:32 |
mordred | since there's no origin remote, it'll actually use the relative path | 20:32 |
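A minimal sketch of that behaviour (paths are illustrative): when the superproject has no origin remote configured, git resolves a relative submodule url from .gitmodules against the superproject's own directory, so a sibling checkout that zuul has already placed there becomes the clone source instead of the network:

    cd ~/src/gerrit.googlesource.com/gerrit          # zuul-prepared superproject
    git config --get remote.origin.url || echo "no origin remote configured"
    # .gitmodules points at a relative url like ../plugins/download-commands,
    # which now resolves to the sibling directory zuul cloned via required-projects
    git submodule update --init plugins/download-commands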
*** wolke has quit IRC | 20:33 | |
*** wolke has joined #openstack-infra | 20:33 | |
corvus | right, yes. i think i misunderstood what you were saying before :) | 20:33 |
mordred | corvus: I probably misunderstood what I'm saying - we're talking about submodules after all :) | 20:33 |
corvus | haha | 20:34 |
corvus | and yeah the shape of that change looks good to me | 20:34 |
mordred | corvus: here's hoping this build works! | 20:34 |
mordred | if it does, I might try making all of the plugin repos use this mechanism instead of the copy | 20:34 |
mordred | in fact - why don't I do that as a followup... | 20:35 |
fungi | gitea02 is in rotation now and gitea03 is removed, working on replacement | 20:35 |
corvus | mordred: oh wait | 20:35 |
*** wolke has quit IRC | 20:35 | |
paladox | that reminded me to pull in 2.15.15 :P, which i also found now fails to build :( | 20:35 |
corvus | mordred: okay, so download-commands is the issue -- why don't we just specify the right branch for that? | 20:35 |
paladox | corvus it doesn't have any 2.16+ branches | 20:35 |
paladox | apparently | 20:35 |
paladox | See https://github.com/GerritCodeReview/plugins_download-commands/branches | 20:36 |
*** wolke has joined #openstack-infra | 20:36 | |
corvus | mordred: ok; the downside of that is that depends-on won't work | 20:37 |
corvus | building at all > supporting depends-on > not supporting depends-on | 20:38 |
mordred | yes | 20:38 |
corvus | so i agree this is the best we can do with download-commands :) | 20:38 |
corvus | but maybe it's not what we want for the others | 20:38 |
mordred | yeah - I agree | 20:38 |
mordred | also - doing it for the others makes the playbook more, not less, complex | 20:39 |
mordred | because doing submodule update --init is only useful for "builtin" plugins - and our "standard" set is a mix of both | 20:39 |
mordred | for non-standard, moving the repo in is always the right choice | 20:40 |
mordred | corvus: can override checkout take a sha? | 20:40 |
corvus | mordred: i think it has to be a ref | 20:42 |
mordred | darn | 20:42 |
corvus | but can be a branch or tag | 20:42 |
*** weifan has joined #openstack-infra | 20:42 | |
mordred | I was thinking that if it could, we could just override-checkout to the sha that 2.16 wants and then depends-on is only borked for 2.16 | 20:43 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Override-checkout download-commands to v2.16.10 https://review.opendev.org/673107 | 20:44 |
mordred | corvus: woot. there's a tag | 20:44 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 20:47 |
*** wolke has quit IRC | 20:48 | |
*** goldyfruit has quit IRC | 20:48 | |
*** wolke has joined #openstack-infra | 20:49 | |
*** wolke has joined #openstack-infra | 20:51 | |
*** wolke has quit IRC | 20:52 | |
*** priteau has quit IRC | 20:53 | |
jrosser | this has merged https://review.opendev.org/#/c/672952/, but i don't see it here https://opendev.org/openstack/ansible-config_template/commits/branch/master?lang=en-US# | 20:54 |
jrosser | is something broken? | 20:54 |
fungi | the commit itself seems to have replicated: https://opendev.org/openstack/ansible-config_template/commit/b7f38639a21857aead860195d12eccf6eb9f437e | 20:56 |
jrosser | i just rechecked a ton of jobs that need that and they've all failed | 20:57 |
jrosser | suggests that master doesnt point to quite the right place | 20:58 |
corvus | jrosser: have a link to a job that failed? | 20:59 |
fungi | gerrit's on-disk copy of the repository indicates b7f38639a21857aead860195d12eccf6eb9f437e is the tip of master, so in theory ci jobs should be using that | 20:59 |
jrosser | http://logs.openstack.org/73/670473/8/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/2c66b55/job-output.txt.gz | 21:00 |
fungi | but i do wonder what's happened to replication to gitea | 21:00 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 21:02 |
*** goldyfruit has joined #openstack-infra | 21:05 | |
clarkb | cmorpheus: https://review.opendev.org/673108 I probably went a little crazy | 21:05 |
fungi | that master branch state doesn't seem to have replicated to any of the active gitea backends | 21:06 |
corvus | evrardjp, jrosser: is anyone working on updating that job to use zuul git repo checkouts? because we really shouldn't be cloning from opendev.org in jobs | 21:06 |
cmorpheus | clarkb: omg | 21:06 |
fungi | i'm going to try to manually trigger full replication for openstack/ansible-config_template and see what happens | 21:06 |
cmorpheus | clarkb: you replaced 80 lines of shell with 300 lines of python | 21:07 |
clarkb | cmorpheus: I know. I just want real data to know if there are gains to be had here. If not I'll give up | 21:07 |
clarkb | cmorpheus: this change should be large enough and cache enough id data to see a delta though | 21:08 |
jrosser | corvus: well in theory it should be using them https://github.com/openstack/openstack-ansible/blob/master/scripts/get-ansible-role-requirements.yml#L35-L61 | 21:08 |
jrosser | but of course that could be broken | 21:08 |
mordred | http://logs.openstack.org/73/670473/8/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/2c66b55/job-output.txt.gz#_2019-07-26_19_56_33_116422 seems to be running, but then http://logs.openstack.org/73/670473/8/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/2c66b55/job-output.txt.gz#_2019-07-26_19_56_46_107736 is still doing the clone | 21:10 |
corvus | jrosser: oh that looks promising... /me digs into that | 21:10 |
mordred | so I'm thinking maybe the filtering in https://github.com/openstack/openstack-ansible/blob/master/scripts/get-ansible-role-requirements.yml#L74-L78 is not doing the right thing/ | 21:10 |
mnaser | oh look osa things | 21:10 |
mordred | ? | 21:10 |
mnaser | i wonder if we're missing required-projects | 21:11 |
fungi | forcing full replication doesn't seem to have updated openstack/ansible-config_template master branch state | 21:11 |
jrosser | this looks suspect http://logs.openstack.org/73/670473/8/check/openstack-ansible-deploy-aio_metal-ubuntu-bionic/2c66b55/job-output.txt.gz#_2019-07-26_19_56_43_646148 | 21:11 |
mnaser | https://review.opendev.org/#/c/670473/ is an os_ceilometer change | 21:11 |
mnaser | and well it checked out os_ceilometer | 21:11 |
corvus | mnaser, jrosser: yes, i think that's it | 21:11 |
mnaser | so because we don't have all the required-projects listed | 21:12 |
corvus | so it is working, it's just there are no required projects, so it's only activated for dependencies | 21:12 |
fungi | for some reason the master branch state is correct in the github mirror but not on the gitea backends | 21:12 |
mnaser | we're not hard failing like things failed when we moved to zuulv3 | 21:12 |
fungi | checking gerrit logs next | 21:12 |
mnaser | cause we don't hard fail if the stuff is missing | 21:12 |
mnaser | (we should probably hard fail in ci if a required-project is missing) | 21:12 |
corvus | and if we added more there, then the job would benefit from the cache and be faster (and also be immune to mirror hiccups) | 21:12 |
mnaser | and skip when not in ci | 21:12 |
mnaser | let me hack up something | 21:12 |
*** rfarr has joined #openstack-infra | 21:13 | |
jrosser | mnaser: thanks :) | 21:13 |
corvus | mnaser, jrosser, fungi: cool, so to summarize -- the job is set up to use zuul repos but simply doesn't have enough required projects listed so we fell back on the mirror, and the mirror is slightly out of date for as-yet-unknown reason. | 21:13 |
corvus | evrardjp: ^ you can ignore ping from earlier :) | 21:14 |
fungi | corvus: agreed. still digging into the outdated state of the master branch ref for that repo in gitea | 21:15 |
fungi | no mentions of that repo in the gerrit error log aside from some timeouts while stackalytics-bot-2 was trying to query it | 21:15 |
clarkb | fungi: if you rereplicate that repo does it update? | 21:15 |
clarkb | can limit it to a single gitea to prevent polluting the debuggable state | 21:16 |
fungi | that did nothing as far as i could tell (saw it enqueue the replication events, waited until they were done, still no change) | 21:16 |
fungi | all active gitea backends seem to be in a similar state with that repo too | 21:16 |
clarkb | did we have fs/disk problems again? | 21:19 |
clarkb | or maybe this is a hold over? | 21:19 |
mnaser | jrosser, corvus, fungi: https://review.opendev.org/673109 should be a failing CI job (because not everything is in required projects). if that fails as expected, i'll readd them to required-projects | 21:21 |
*** beekneemech is now known as bnemec-pto | 21:22 | |
*** eharney has quit IRC | 21:23 | |
fungi | clarkb: what's strange is that gitea02 was created at 14:43z today, its database was copied from gitea01's most recent nightly mysqldump, and then content was replicated in. the openstack/ansible-config_template master branch state updated at 19:38z today, long after we had finished replicating all repository states into gitea02 | 21:25 |
clarkb | oh interesting | 21:25 |
fungi | so it's hard to blame this on old state, unless we blame the copied database dump? | 21:25 |
*** raissa has quit IRC | 21:26 | |
*** Lucas_Gray has joined #openstack-infra | 21:29 | |
*** rfolco|rover has quit IRC | 21:30 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: use opendev image building system for zuul-operator test https://review.opendev.org/673020 | 21:30 |
fungi | on gitea02, /var/gitea/data/git/repositories/openstack/ansible-config_template.git/refs/heads/master is still 09c76e238026d7ba4134ee2b66a4e9fd2617b843 | 21:33 |
fungi | which coincides with what the webui shows | 21:33 |
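Worth noting that reading the loose file under refs/heads/ can miss a ref that has been packed; asking git directly against the same path avoids that ambiguity:

    git --git-dir=/var/gitea/data/git/repositories/openstack/ansible-config_template.git \
        rev-parse refs/heads/master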
clarkb | does a fsck report anything? | 21:34 |
*** wolke has joined #openstack-infra | 21:34 | |
*** ekultails has quit IRC | 21:34 | |
fungi | i do realize i forgot to do a git gc on gitea02, though that shouldn't affect this | 21:34 |
clarkb | no, should only affect performance (load avg may be higher on that host than the others) | 21:35 |
fungi | git fsck reports no problems for that repo | 21:36 |
*** wolke has left #openstack-infra | 21:37 | |
*** Vadmacs has quit IRC | 21:38 | |
fungi | also, it replicated the head change to github, but not to gitea | 21:38 |
fungi | or rather, the replication to github is successfully reflected while on gitea it is not | 21:39 |
clarkb | maybe fsck of the content in gerrit/github will reveal something that might make gitea unhappy? | 21:40 |
clarkb | we have had that with github before where replication didn't work due to problems in the repo that gerrit was ok with | 21:40 |
fungi | just a few dangling commits: http://paste.openstack.org/show/754915/ | 21:41 |
clarkb | fungi: if you docker ps -a you can check the logs for the gitea ssh container using docker logs --since/--until $containerid type stuff | 21:42 |
clarkb | maybe do that for the time around when you triggered replication of that repo and see if anything shows up as a problem? | 21:42 |
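Something along these lines (the container lookup and time window are only illustrative); docker logs takes --since and --until to bound the output to the suspect period:

    container=$(docker ps -a --format '{{.ID}} {{.Image}}' | awk '/gitea-openssh/ {print $1}')
    docker logs --since 2019-07-26T19:30:00 --until 2019-07-26T19:45:00 "$container"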
fungi | the opendevorg/gitea-openssh container has no docker logs at all | 21:49 |
fungi | the opendevorg/gitea:latest docker logs don't seem to have any sort of failure/error messages related to openstack/ansible-config_template (just entries which look like clients fetching/cloning from it) | 21:50 |
clarkb | in that case you may have to docker exec -it $opensscontainer bash | 21:52 |
clarkb | then poke around and look for logs | 21:52 |
clarkb | that gives you a bash process running in the context of that container | 21:53 |
*** mattw4 has quit IRC | 21:53 | |
*** jamesmcarthur has joined #openstack-infra | 21:55 | |
fungi | it looks like everything in that filesystem root is actually just mapped to files in /var/gitea | 21:56 |
fungi | so easier to browse/view them without docker getting in the way | 21:57 |
clarkb | including sshd logs? | 21:57 |
fungi | (at least so far i've found no files via a docker shell which weren't present there outside the container) | 21:57 |
fungi | oh, the ssh container. i'll try that one | 21:58 |
fungi | looks like they share the same filesystem tree | 21:59 |
paladox | mordred heh gerrit 2.15.15 broke some of our plugins due to a bazel change, so i've had to spend time pulling in the plugin update && also adapting one of our other plugins to the changes. | 22:00 |
clarkb | fungi: it is possible that sshd is only logging to syslog and we have to mount in /dev/log for that container to properly log (we did this with haproxy) | 22:01 |
clarkb | corvus: mordred ^ do you know? | 22:01 |
paladox | but that also means that the gerrit docker image we have will have to stay broken with 2.15.15 until we merge && deploy a new image. | 22:01 |
fungi | clarkb: yeah, i don't see any syslog in /var/log of the ssh container at least | 22:03 |
*** whoami-rajat has quit IRC | 22:06 | |
fungi | kinda tempted to press forward with the gitea03 replacement (ready to write the change to update dns records and add it back to the inventory) to see if it gets that head replicated | 22:06 |
clarkb | fungi: that seems like a reasonable test, worst case 03 will be in the same situation as the others | 22:07 |
clarkb | it would be cool if pip didn't tell you every package it was ignoring because your python version didn't match env markers | 22:10 |
*** kjackal has quit IRC | 22:10 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Add gitea03 replacement to inventory https://review.opendev.org/673113 | 22:10 |
clarkb | that IP tells me it is a gitea03 + 2 | 22:11 |
*** slaweq has joined #openstack-infra | 22:11 | |
openstackgerrit | Jeremy Stanley proposed opendev/zone-opendev.org master: Update IP address for gitea03 https://review.opendev.org/673114 | 22:11 |
*** xek__ has quit IRC | 22:12 | |
*** raissa has joined #openstack-infra | 22:14 | |
*** raissa has quit IRC | 22:14 | |
*** raissa has joined #openstack-infra | 22:15 | |
*** jamesmcarthur has quit IRC | 22:15 | |
*** raissa has quit IRC | 22:15 | |
*** slaweq has quit IRC | 22:16 | |
*** raissa has joined #openstack-infra | 22:16 | |
*** raissa has joined #openstack-infra | 22:16 | |
*** raissa has quit IRC | 22:17 | |
fungi | infra-root: i'm at a loss for why /var/gitea/data/git/repositories/openstack/ansible-config_template.git/refs/heads/master is out of date on all our gitea servers (still referring to 09c76e238026d7ba4134ee2b66a4e9fd2617b843 when it should be b7f38639a21857aead860195d12eccf6eb9f437e like /home/review_site/git/openstack/ansible-config_template.git/refs/heads/master has) | 22:17 |
fungi | (...has on our gerrit server) | 22:17 |
clarkb | fungi: you did say the ref itself is present but the master ref isn't updated? | 22:17 |
fungi | yep | 22:17 |
corvus | by ref you mean commit? | 22:18 |
clarkb | er yes | 22:18 |
fungi | ref is already there, would have been replicated when the review patchset was pushed | 22:18 |
fungi | commit, yes | 22:18 |
corvus | and yeah, it's not a merge commit, so it's the same as a refs/changes ref | 22:18 |
clarkb | as a sanity check, there's plenty of disk | 22:18 |
clarkb | at least on gitea01 | 22:18 |
corvus | i don't know where the ssh logs end up | 22:18 |
corvus | wherever ssh puts them by default at LogLevel INFO | 22:19 |
fungi | yeah, i had no luck finding them either | 22:19 |
clarkb | I think sshd probably logs to syslog and we don't have a syslog | 22:19 |
corvus | there's no other special logging | 22:19 |
fungi | i'm in the process of building a new gitea server (since i was doing that anyway) to see what state ends up replicated to it | 22:19 |
clarkb | we could mount /dev/log as with haproxy; that might get a little confusing since there's the host's sshd and the container's sshd, but probably not to the point where we don't want to do it | 22:20 |
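A sketch of what that would look like (the real deployment is driven by docker-compose, so this docker run invocation is only illustrative): bind-mounting the host's /dev/log gives a syslog-only sshd inside the container somewhere to write, and its messages then land in the host's syslog alongside the host sshd's:

    docker run -d --name gitea-ssh -v /dev/log:/dev/log -p 222:222 opendevorg/gitea-openssh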
fungi | from a timeline perspective, that change merged after gitea02 was built, replicated to and brought into service yet it has the same stale state as its sibling gitea servers | 22:21 |
corvus | what was the state of gitea03 at the time | 22:21 |
corvus | [2019-07-26 22:19:55,500] [f900b6cd] Cannot replicate to ssh://git@gitea03.opendev.org:222/openstack/ansible-config_template.git | 22:21 |
fungi | offline | 22:21 |
fungi | i had already deleted the nova server instance for it | 22:21 |
openstackgerrit | Merged opendev/zone-opendev.org master: Update IP address for gitea03 https://review.opendev.org/673114 | 22:22 |
fungi | there are now changes up to add dns and inventory entries for the new 03 | 22:22 |
fungi | dns just now merged, from the looks of it | 22:22 |
corvus | when was the original replication time? | 22:22 |
clarkb | https://review.opendev.org/#/c/672952/ is the change looks like 19:38UTC ish | 22:23 |
fungi | there were two because i manually initiated a replication of that repo in troubleshooting this | 22:23 |
clarkb | for when it merged which should be near the replication time | 22:23 |
fungi | but yeah, that would be the earlier one | 22:23 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 22:24 |
fungi | second was sometime between 21:06 and 21:11 since the various tasks were queued in gerrit for a few minutes | 22:24 |
corvus | in a minute, i'm going to manually trigger replication while running strace on sshd on gitea02 | 22:26 |
corvus | just as soon as this openstack/openstack replication finishes | 22:26 |
*** michael-beaver has quit IRC | 22:27 | |
fungi | sounds like a reasonable test | 22:31 |
corvus | the replication commands are not appearing in gerrit's log | 22:33 |
corvus | if i run this: ssh review replication start zuul/zuul-operator --url gitea02.opendev.org | 22:33 |
corvus | i see this: [2019-07-26 22:32:49,250] [] scheduling replication zuul/zuul-operator:..all.. => ssh://git@gitea02.opendev.org:222/zuul/zuul-operator.git | 22:33 |
corvus | if i run this: ssh review replication start openstack/ansible-config_templates --url gitea02.opendev.org | 22:34 |
corvus | i see nothing in replication.log | 22:34 |
corvus | did i spell the project right? | 22:34 |
corvus | no i did not, it's singular. | 22:34 |
corvus | i'll try again | 22:34 |
clarkb | openstack/ansible-config_template <- straight copy paste from gerrit webui | 22:35 |
clarkb | no s | 22:35 |
clarkb | oh you figured it out before me | 22:35 |
corvus | it's "running" now | 22:35 |
fungi | yep, sorry, had my nose in the gitea database schema seeing if it could somehow be double-tracking head states in there incorrectly | 22:35 |
fungi | (found no obvious place it might track repository heads in the db, fwiw) | 22:36 |
*** mattw4 has joined #openstack-infra | 22:38 | |
*** gyee has quit IRC | 22:39 | |
corvus | i think the logs are going to stderr | 22:41 |
corvus | i don't know why they are not showing up in "docker logs" | 22:41 |
corvus | -rw-r--r-- 1 1000 1000 73 Jul 26 19:38 .gitconfig | 22:43 |
corvus | -rw-r--r-- 1 1000 1000 0 Jul 26 19:43 .gitconfig.lock | 22:43 |
corvus | well that's a coincidence, huh? | 22:43 |
corvus | have we confirmed any updates since then? | 22:44 |
fungi | those times do look suspicious | 22:44 |
fungi | is that inside that particular repo? | 22:44 |
corvus | no, that's the homedir | 22:45 |
corvus | on gitea08 | 22:45 |
corvus | same times on all servers | 22:45 |
*** jamesmcarthur has joined #openstack-infra | 22:45 | |
corvus | /var/gitea/data/git | 22:45 |
clarkb | fwiw that .gitconfig.lock is the same file we had to delete after restarting the giteas yesterday before replication would work | 22:46 |
clarkb | (was assumed to be fallout of the ceph disaster earlier in the day) | 22:46 |
clarkb | maybe an OOM left it behind and the 8GB swapfile isn't big enough? | 22:46 |
fungi | that's well after gitea02 was brought back online, so there was nothing manual going on with gitea servers | 22:46 |
corvus | i don't see any currently running processes started around that time | 22:47 |
clarkb | [Fri Jul 26 19:39:45 2019] INFO: task khugepaged:65 blocked for more than 120 seconds. | 22:47 |
clarkb | from dmesg -T | 22:48 |
clarkb | not an OOM but unhappy disk? | 22:48 |
clarkb | [Fri Jul 26 19:39:19 2019] systemd[1]: systemd-journald.service: State 'stop-sigabrt' timed out. Terminating. | 22:48 |
* clarkb looks on other giteas | 22:48 | |
fungi | it does indeed look like other repos aren't updating either | 22:48 |
fungi | https://opendev.org/opendev/zone-opendev.org/commits/branch/master is missing 673114 which merged at 22:22 | 22:49 |
clarkb | [Fri Jul 26 19:38:44 2019] INFO: task jbd2/vda1-8:248 blocked for more than 120 seconds. <- from gitea01 | 22:49 |
corvus | [pid 26929] write(2, "error: could not lock config fil"..., 68) = 68 | 22:49 |
clarkb | seems like something happened in the cloud around that time and made the hosts unhappy; a git process or gitea may not have cleaned up after itself as a result? | 22:49 |
corvus | [pid 26929] exit_group(-1 <unfinished ...> | 22:49 |
corvus | (that's from that strace) | 22:50 |
clarkb | so maybe we remove those lock files again, rereplicate (again) and bring this up with mnaser? | 22:50 |
fungi | gitea02 is showing signs of write errors as well from 19:39 today | 22:50 |
clarkb | fungi: I expect all of them will exhibit that but 03 | 22:50 |
fungi | gitea02 didn't even get nova booted until 14:43 today, so this is definitely not lingering issues from instances which went through the problems on wednesday | 22:51 |
*** goldyfruit has quit IRC | 22:52 | |
fungi | looks like a wholly fresh incident similar to the early wednesday one | 22:52 |
clarkb | yup | 22:52 |
fungi | or was that yesterday? i've lost track already | 22:52 |
corvus | clarkb: i agree with the mitigation plan | 22:53 |
corvus | i can easily remove the locks if ya'll are ready | 22:53 |
fungi | previous incident was early 2019-07-25 so i guess that was yesterday (thursday) | 22:53 |
*** weifan has quit IRC | 22:53 | |
fungi | corvus: sounds good | 22:53 |
corvus | clarkb: ? | 22:54 |
clarkb | corvus: am ready | 22:54 |
clarkb | and plan sounds good | 22:54 |
corvus | done | 22:54 |
clarkb | fungi: should we trigger replication ~3 at a time like yesterday? | 22:55 |
fungi | i'm on hand to babysit the replication... no exciting friday night plans | 22:55 |
*** gyee has joined #openstack-infra | 22:55 | |
clarkb | I'm around for the next hour at least | 22:55 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 22:55 |
clarkb | fungi: just let me know how I can help | 22:55 |
fungi | we could try x4 and see what the slowdown is | 22:55 |
clarkb | fungi: ++ | 22:55 |
corvus | i'll trigger the single rep on gitea02 now | 22:56 |
fungi | thanks corvus, a good canary | 22:57 |
fungi | i have a feeling we could fire all 8 together and they'd still complete in a reasonable amount of time, if the slowdown was really related to other use of the server | 22:57 |
*** aaronsheffield has quit IRC | 22:57 | |
corvus | 02 is good now on a-c_t | 22:58 |
fungi | though it was rather uncanny that last night the replication time scaled roughly linearly with the number of backends we replicated to at once | 22:58 |
fungi | implying "parallel" replication still has some inherent serialization imposed somewhere | 22:58 |
corvus | i have to run, so i'll leave the repl kick-off to you | 22:59 |
fungi | thanks again corvus!!! | 22:59 |
*** weifan has joined #openstack-infra | 22:59 | |
fungi | i'll fire up full replication for 05-08 first | 22:59 |
clarkb | sounds good | 22:59 |
corvus | (it wouldn't be sysadmin day if we didn't get to do some sysadminning, huh?) | 22:59 |
fungi | all queued | 23:00 |
fungi | indeed, indeed | 23:00 |
fungi | happy sysadmin day to all | 23:00 |
*** jamesmcarthur has quit IRC | 23:00 | |
fungi | i'm polling the queue length with some timestamping again so i'll know what time it reaches ~0 | 23:03 |
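Roughly this sort of loop (the host alias and interval are guesses); gerrit's show-queue output ends with a task-count summary line, so timestamping that shows the trend:

    while sleep 60; do
        printf '%s %s\n' "$(date -u +%H:%M:%S)" \
            "$(ssh review gerrit show-queue -w | tail -n1)"
    done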
*** weifan has quit IRC | 23:03 | |
fungi | guessing it'll be around 23:55 if the trending from last night remains consistent | 23:03 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 23:05 |
*** mriedem has quit IRC | 23:08 | |
*** pkopec has quit IRC | 23:09 | |
*** jamesmcarthur has joined #openstack-infra | 23:10 | |
*** slaweq has joined #openstack-infra | 23:11 | |
*** slaweq has quit IRC | 23:15 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 23:22 |
clarkb | I think my devstack-into-python change is working now and takes keystone account setup from ~60 seconds to 7 seconds? | 23:27 |
clarkb | there are a bunch of other keystone user and endpoint creation type things that were taking that ~7 minutes before, which we can probably get down to ~30 seconds if rewritten in python | 23:27 |
*** diablo_rojo has joined #openstack-infra | 23:28 | |
*** tosky has quit IRC | 23:31 | |
*** jamesmcarthur has quit IRC | 23:42 | |
fungi | replication seems to be taking far longer than my earlier estimate | 23:42 |
fungi | so either things are slower now than this time last night, or 4x is past the tipping point for contention | 23:43 |
*** mattw4 has quit IRC | 23:45 | |
fungi | taking nearly twice as long as projected | 23:50 |
clarkb | huh | 23:51 |
fungi | yeah, i expected it to wrap up around nowish, but 2788 tasks | 23:54 |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: support alternate portage directories https://review.opendev.org/671530 | 23:55 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!