*** ysandeep|away is now known as ysandeep | 01:09 | |
*** fzzf is now known as Guest1071 | 02:57 | |
*** fzzf1 is now known as fzzf | 02:57 | |
*** ykarel|away is now known as ykarel | 04:39 | |
*** bhagyashris is now known as bhagyashris|ruck | 05:22 | |
*** chandankumar is now known as chkumar|rover | 05:31 | |
frickler | bhagyashris|ruck: chkumar|rover: apart from all the nick changes being pretty noisy, can you explain to a non-native speaker what this ruck/rover thing is supposed to mean? | 06:04 |
bhagyashris|ruck | frickler, hey you will get info here https://docs.openstack.org/tripleo-docs/latest/ci/ruck_rover_primer.html | 06:07 |
bhagyashris|ruck | the major responsibilities are 1. ensuring gate queues are green to keep TripleO patches merging. 2. ensuring promotion jobs are green to keep TripleO up to date with the rest of OpenStack and everything else that isn’t TripleO! Target is bugs filed + escalated + fixed for promotion at least once a week. | 06:08 |
*** akekane_ is now known as abhishekk | 06:32 | |
*** jpena|off is now known as jpena | 07:34 | |
zbr | clarkb: i think it does | 07:41 |
*** ysandeep is now known as ysandeep|lunch | 07:48 | |
*** slaweq_ is now known as slaweq | 08:29 | |
*** ykarel is now known as ykarel|lunch | 08:44 | |
*** ysandeep|lunch is now known as ysandeep | 08:45 | |
*** sshnaidm is now known as sshnaidm|afk | 09:24 | |
*** Guest651 is now known as aluria | 09:26 | |
*** ykarel|lunch is now known as ykarel | 09:59 | |
*** sshnaidm|afk is now known as sshnaidm | 10:47 | |
*** jpena is now known as jpena|lunch | 11:47 | |
*** jpena|lunch is now known as jpena | 12:45 | |
clarkb | zbr: I'll start to wind down those processes after some breakfast. It is easy enough to turn them on again if we start to fall behind on the queue | 14:22 |
zbr | okay | 14:22 |
*** ykarel is now known as ykarel|away | 14:25 | |
elodilles | hi, could someone approve this? https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/797235 | 15:19 |
elodilles | the PTLs have approved already | 15:19 |
clarkb | elodilles: done | 15:21 |
elodilles | and as usual, I'm planning to run the script to delete some more eol'd branches (as there is another batch that could be deleted) | 15:21 |
elodilles | clarkb: thanks! | 15:21 |
fungi | sounds good, thanks elodilles! | 15:28 |
opendevreview | Merged openstack/openstack-zuul-jobs master: Remove ocata from periodic job template of neutron and ceilometer https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/797235 | 15:32 |
clarkb | #status log Stopped log workers and logstash daemons on logstash-worker11-20 to collect up to date data on how many indexer workers are necessary | 15:39 |
opendevstatus | clarkb: finished logging | 15:39 |
clarkb | https://grafana.opendev.org/d/5Imot6EMk/zuul-status?viewPanel=17&orgId=1 we will want to monitor this set of metrics as well as what e-r reports its indexing delta is | 15:39 |
clarkb | this is a 50% reduction so we can do a binary search towards what seems necessary | 15:40 |
*** sshnaidm is now known as sshnaidm|afk | 15:46 | |
wolsen[m] | clarkb: I see you commented on https://review.opendev.org/c/openstack/charm-deployment-guide/+/798273 that a recheck won't help with the promotion failure. Do you think the best course of action is to submit another patch and let the next promotion effort push it up? | 15:49 |
clarkb | wolsen[m]: that is usually what we recommend if it isn't too much hassle. Otherwise a zuul admin has to queue the jobs up again by hand | 15:50 |
wolsen[m] | clarkb: ack, that's fairly straightforward and we can go with that route. Thanks for the confirmation :-) just wanted a sanity check before going down a futile path | 15:51 |
fungi | wolsen[m]: yeah, if you don't have anything else worth approving soon, just let us know what needs to be run and i or others can take care of it | 15:51 |
wolsen[m] | thx fungi, I'll circle back if we need it | 15:52 |
*** ysandeep is now known as ysandeep|dinner | 16:01 | |
*** jpena is now known as jpena|off | 16:36 | |
*** ysandeep|dinner is now known as ysandeep | 16:44 | |
clarkb | the indexer queue has grown to about 1k entries. I'll continue to check it periodically. I don't think that is catastrophically high, but if it indicates the start of a "we can't keep up" trend then we should start the logstash and log worker processes again | 16:49 |
*** ysandeep is now known as ysandeep|out | 17:24 | |
elodilles | just to have it here as well: these ocata branches were deleted: http://paste.openstack.org/show/807112/ | 17:29 |
*** gfidente is now known as gfidente|afk | 18:09 | |
clarkb | the indexing queue seems fairly stable at ~1.5k entries for the last little bit. | 18:49 |
clarkb | indicating that at least so far this hasn't been a runaway backlog which is good | 18:49 |
clarkb | and now we are up to 3.5k :/ | 19:57 |
melwitt | I didn't realize the indexer was falling behind so much :( I have been using this page to see whether indexing was behind http://status.openstack.org/elastic-recheck/ | 20:57 |
fungi | melwitt: it's part of an experiment to see how many workers we need, sorry about that | 20:57 |
fungi | we're attempting to determine the number of them we can safely turn down without significant prolonged impact to indexing throughput, so as to reduce the maintenance burden and resource consumption however much we can | 20:58 |
melwitt | oh ok, so it's not that something is catastrophically wrong. that's good heh | 20:58 |
melwitt | makes sense, thanks | 20:58 |
clarkb | ya not an emergency or anything | 20:59 |
clarkb | we did end up super behind for a bit because the cluster had crashed | 20:59 |
fungi | well, i mean, something is catastrophically wrong: we don't have sufficient people to upgrade and keep this system running long term, and desperately need someone to build and run a replacement if it's still useful | 20:59 |
clarkb | we've also got that error where we have log entries from centuries in the future still happening, so you have to look closely at e-r's graph page to see how up to date it actually is | 20:59 |
fungi | yeah, a sane indexer implementation would discard loglines in the future probably | 21:00 |
clarkb | we are down to 2.5k files to index now and trending in the right direction | 21:02 |
clarkb | it's possible that 50% is just enough for a normal day, as indicated by the short backlog when busy and then catching up later in the day. We can keep an eye on it for a few days before committing to that reduction in size | 21:02 |
melwitt | so something/some service is logging future dates? | 21:02 |
melwitt | I can look into the indexer to discard loglines in the future | 21:02 |
clarkb | melwitt: either that or we are improperly parsing something that looks like future dates | 21:03 |
melwitt | ack, ok | 21:03 |
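As an aside, a minimal sketch of the "discard loglines dated in the future" guard fungi and melwitt are discussing might look like the following. This is hypothetical illustration only, not the actual logstash or log worker code; the function name and skew threshold are made up.

```python
# Hypothetical sketch: fall back to the event's receipt time when the parsed
# timestamp is implausibly far in the future. Not the real indexer code.
from datetime import datetime, timedelta, timezone

MAX_FUTURE_SKEW = timedelta(days=1)  # tolerate minor clock skew only


def sanitize_timestamp(parsed_ts, received_at=None):
    """Return a usable (timezone-aware) timestamp for indexing a log event."""
    received_at = received_at or datetime.now(timezone.utc)
    if parsed_ts is None or parsed_ts - received_at > MAX_FUTURE_SKEW:
        # A year like 2831 can only come from mis-parsed line content,
        # so use the receipt time rather than indexing into a far-future index.
        return received_at
    return parsed_ts
```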
clarkb | let me see if I can convince elasticsearch to tell me what some of those are | 21:03 |
clarkb | "message":"ubun7496ionic | 2021-01-15 12:52:25,572 zuul.Pipeline.tenant-one.post DEBUG Finished queue processor: post (changed: False)" in a job-output.txt file got parsed to "@timestamp":"2021-11-15T12:52:42.301Z" | 21:07 |
clarkb | that's not a great example because we cannot look at the original file, it's too old /me looks for a better one | 21:08 |
melwitt | ah yeah, so it looks like you're right it's a parsing problem. that makes a lot more sense than something logging dates in the future 😝 | 21:09 |
clarkb | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_7a0/795357/1/check/openstack-tox-py36/7a0cdaa/job-output.txt is one that caused it to parse a timestamp of "@timestamp":"2831-06-08T15:13:41.769Z" from "message":"ubuntu-biu-bionic | warnings.77d41ber and system_scope:all) of 3at2h06-08 s6msg)" | 21:14 |
clarkb | that file is 132MB large | 21:14 |
clarkb | I wonder if we're causing logstash some sort of buffer alignment issue when we jam it full of data like that | 21:15 |
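One possible remedy for this kind of mis-parse is stricter timestamp extraction. The sketch below is illustrative only and is not the grok/date configuration OpenDev actually runs: it anchors the timestamp to the expected position in a job-output.txt line and refuses to scrape digits out of corrupted content.

```python
# Minimal sketch of anchored timestamp extraction for job-output.txt lines,
# which look like "<nodename> | 2021-01-15 12:52:25,572 <logger> <message>".
import re
from datetime import datetime

LINE_TS = re.compile(r'^\S+ \| (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ ')


def extract_timestamp(line):
    m = LINE_TS.match(line)
    if not m:
        return None  # corrupted line: let the indexer use the received time
    return datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S')


good = "ubuntu-bionic | 2021-01-15 12:52:25,572 zuul.Pipeline DEBUG Finished queue processor"
bad = "ubuntu-biu-bionic | warnings.77d41ber and system_scope:all) of 3at2h06-08 s6msg)"
print(extract_timestamp(good))  # 2021-01-15 12:52:25
print(extract_timestamp(bad))   # None -- no timestamp scraped from garbage
```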
clarkb | also why are ironic unittests exploding with data like that | 21:15 |
melwitt | there are a ton of deprecation messages up in there | 21:16 |
melwitt | hm, it's the policy deprecation thing, that got fixed a long time ago though. hm | 21:17 |
clarkb | looks like the change merged with that behavior too (not sure if that change introduced the behavior or not) | 21:17 |
clarkb | that is definitely something that ironic should be looking at cleaning up if it persists | 21:17 |
clarkb | no one wants that much output to their console when running unittests | 21:17 |
clarkb | One thing we've done with Zuul is to only attach the extra debug strings (logs and other output) to the subunit stream when the test is failing | 21:18 |
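For illustration, the pattern clarkb describes can be approximated with a small decorator that buffers log output and only emits it when the test fails. This is a simplified sketch, not Zuul's actual mechanism (which works through its test base classes and the subunit stream).

```python
# Hypothetical sketch: capture logs during a test, emit them only on failure.
import functools
import io
import logging


def logs_on_failure(test_func):
    """Buffer log output for a test and dump it only if the test fails."""
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        buf = io.StringIO()
        handler = logging.StreamHandler(buf)
        root = logging.getLogger()
        root.addHandler(handler)
        try:
            return test_func(*args, **kwargs)
        except Exception:
            # Only failing tests pay the cost of dumping their captured logs.
            print("captured logs for failed test:\n" + buf.getvalue())
            raise
        finally:
            root.removeHandler(handler)
    return wrapper


# usage (hypothetical): decorate individual test methods on a TestCase, e.g.
#   @logs_on_failure
#   def test_something(self): ...
```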
melwitt | yeah.. I'm looking at it, we had the same thing happen in nova and gmann fixed it a long time ago. I'm looking to see if it was something in nova only rather than a global change elsewhere | 21:18 |
clarkb | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8f9/797055/21/check/openstack-tox-py38/8f938e9/ is an example from today and shows the issue seems to persist | 21:19 |
clarkb | (that was a successful run too so not a weird error causing it) | 21:20 |
melwitt | guh, yeah it was only in nova https://review.opendev.org/c/openstack/nova/+/676670 I'll upload something similar for ironic | 21:20 |
melwitt | (if someone else hasn't already) | 21:20 |
clarkb | `curl -X GET http://localhost:9200/_cat/indices?pretty=true` is how you get the full list of indices. They have timestamps in their names so you can pretty easily identify those that aren't from the last week | 21:26 |
clarkb | Then `curl -X GET http://localhost:9200/logstash-2831.06.08/_search` dumps all records for a specific index | 21:27 |
clarkb | I think the _cat/ url may need to be run on localhost only, but the search url works anonymously from the internet if you replace localhost with elasticsearch02.openstack.org | 21:27 |
clarkb | quick notes in case further debugging needs to happen and I'm not around | 21:27 |
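Building on those commands, a small companion script could flag indices whose names carry implausible dates. The sketch below assumes the requests library and the same index naming and endpoints clarkb mentions; it is a debugging aid sketch, not part of the existing tooling.

```python
# List logstash indices and flag any dated in the future or far outside the
# retention window (e.g. logstash-2831.06.08 from mis-parsed timestamps).
from datetime import datetime, timedelta

import requests

BASE = "http://localhost:9200"  # _cat/ may only be reachable locally


def suspicious_indices(max_age_days=14):
    resp = requests.get(f"{BASE}/_cat/indices", params={"h": "index"})
    resp.raise_for_status()
    now = datetime.utcnow()
    for name in resp.text.split():
        if not name.startswith("logstash-"):
            continue
        try:
            when = datetime.strptime(name, "logstash-%Y.%m.%d")
        except ValueError:
            continue
        if when > now or when < now - timedelta(days=max_age_days):
            yield name


if __name__ == "__main__":
    for name in suspicious_indices():
        print(name)
```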
*** rlandy is now known as rlandy|bbl | 22:25 | |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Drop use of track-upstream https://review.opendev.org/c/openstack/project-config/+/799123 | 22:47 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Drop use of track-upstream https://review.opendev.org/c/openstack/project-config/+/799123 | 23:14 |
gmann | melwitt: we could disable the warning on the oslo policy side by default, as I can see that happening in more projects while they implement the new RBAC | 23:45 |
fungi | throttling warnings is a common approach in other applications | 23:46 |
gmann | or at least for the time during migration to new RBAC | 23:46 |
fungi | log it once, then shut up about it | 23:46 |
gmann | we can do that. Currently it is added per rule check, but we can just add a general warning during init time, and only once. | 23:47 |
fungi | unfortunately that requires some sort of global registry when the warning is coming from a separate module/library, to track which warnings have already been emitted somehow | 23:47 |
fungi | ahh, yeah sometimes you can find a compromise like that | 23:48 |
gmann | but that would help, as each test class inits the oslo module | 23:48 |
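A sketch of the "warn once" compromise being discussed is shown below: a module-level registry so a library emits each deprecation warning a single time per process instead of once per rule check. This is illustrative only and is not the actual oslo.policy implementation.

```python
# Hypothetical warn-once helper with a process-wide registry.
import threading
import warnings

_emitted = set()
_lock = threading.Lock()


def warn_once(key, message):
    """Emit a deprecation warning only the first time 'key' is seen."""
    with _lock:
        if key in _emitted:
            return
        _emitted.add(key)
    warnings.warn(message, DeprecationWarning)


# Called from every policy check, but only the first call per key (or one
# generic key for the whole RBAC migration) actually produces output:
warn_once("policy-new-rbac", "Policy defaults are changing for the new RBAC.")
warn_once("policy-new-rbac", "Policy defaults are changing for the new RBAC.")
```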
melwitt | gmann: yeah, that would probably make sense being that we're repeating the same thing per project | 23:48 |
melwitt | gmann: I proposed this for ironic, it's just the exact same thing you did in nova https://review.opendev.org/c/openstack/ironic/+/799120 | 23:48 |
gmann | let me check oslo policy and see what we can do. Definitely a ton of warnings for new RBAC is not going to help operators, so they are not meaningful | 23:49 |
gmann | +1 | 23:49 |
gmann | melwitt: thanks | 23:49 |