*** ysandeep|away is now known as ysandeep | 01:09 | |
*** fzzf is now known as Guest1071 | 02:57 | |
*** fzzf1 is now known as fzzf | 02:57 | |
*** ykarel|away is now known as ykarel | 04:39 | |
*** bhagyashris is now known as bhagyashris|ruck | 05:22 | |
*** chandankumar is now known as chkumar|rover | 05:31 | |
frickler | bhagyashris|ruck: chkumar|rover: apart from all the nick changes being pretty noisy, can you explain to a non-native speaker what this ruck/rover thing is supposed to mean? | 06:04 |
bhagyashris|ruck | frickler, hey you will get info here https://docs.openstack.org/tripleo-docs/latest/ci/ruck_rover_primer.html | 06:07 |
bhagyashris|ruck | the major responsibilities are 1. ensuring gate queues are green to keep TripleO patches merging. 2. ensuring promotion jobs are green to keep TripleO up to date with the rest of OpenStack and everything else that isn’t TripleO! Target is bugs filed + escalated + fixed for promotion at least once a week. | 06:08 |
*** akekane_ is now known as abhishekk | 06:32 | |
*** jpena|off is now known as jpena | 07:34 | |
zbr | clarkb: i think it does | 07:41 |
*** ysandeep is now known as ysandeep|lunch | 07:48 | |
*** slaweq_ is now known as slaweq | 08:29 | |
*** ykarel is now known as ykarel|lunch | 08:44 | |
*** ysandeep|lunch is now known as ysandeep | 08:45 | |
*** sshnaidm is now known as sshnaidm|afk | 09:24 | |
*** Guest651 is now known as aluria | 09:26 | |
*** ykarel|lunch is now known as ykarel | 09:59 | |
*** sshnaidm|afk is now known as sshnaidm | 10:47 | |
*** jpena is now known as jpena|lunch | 11:47 | |
*** jpena|lunch is now known as jpena | 12:45 | |
clarkb | zbr: I'll start to wind down those processes after some breakfast. It is easy enough to turn them on again if we start to fall behind on the queue | 14:22 |
zbr | okay | 14:22 |
*** ykarel is now known as ykarel|away | 14:25 | |
elodilles | hi, could someone approve this? https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/797235 | 15:19 |
elodilles | the PTLs have approved already | 15:19 |
clarkb | elodilles: done | 15:21 |
elodilles | and as usual, I'm planning to run the script to delete some more eol'd branches (as there is another batch that could be deleted) | 15:21 |
elodilles | clarkb: thanks! | 15:21 |
fungi | sounds good, thanks elodilles! | 15:28 |
opendevreview | Merged openstack/openstack-zuul-jobs master: Remove ocata from periodic job template of neutron and ceilometer https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/797235 | 15:32 |
clarkb | #status log Stopped log workers and logstash daemons on logstash-worker11-20 to collect up to date data on how many indexer workers are necessary | 15:39 |
opendevstatus | clarkb: finished logging | 15:39 |
clarkb | https://grafana.opendev.org/d/5Imot6EMk/zuul-status?viewPanel=17&orgId=1 we will want to monitor this set of metrics as well as what e-r reports its indexing delta is | 15:39 |
clarkb | this is a 50% reduction so we can do a binary search towards what seems necessary | 15:40 |
*** sshnaidm is now known as sshnaidm|afk | 15:46 | |
wolsen[m] | clarkb: I see you commented on https://review.opendev.org/c/openstack/charm-deployment-guide/+/798273 that a recheck won't help with the promotion failure. Do you think the best course of action is to submit another patch and let the next promotion effort push it up? | 15:49 |
clarkb | wolsen[m]: that is usually what we recommend if it isn't too much hassle. Otherwise a zuul admin has to queue the jobs up again by hand | 15:50 |
wolsen[m] | clarkb: ack, that's fairly straightforward and we can go with that route. Thanks for the confirmation :-) just wanted a sanity check before going down a futile path | 15:51 |
fungi | wolsen[m]: yeah, if you don't have anything else worth approving soon, just let us know what needs to be run and i or others can take care of it | 15:51 |
wolsen[m] | thx fungi, I'll circle back if we need it | 15:52 |
*** ysandeep is now known as ysandeep|dinner | 16:01 | |
*** jpena is now known as jpena|off | 16:36 | |
*** ysandeep|dinner is now known as ysandeep | 16:44 | |
clarkb | the indexer queue has grown to about 1k entries. I'll continue to check it periodically. I don't think that is catastrophically high, but if it indicates the start of a "we can't keep up" trend then we should start the logstash and log worker processes again | 16:49 |
*** ysandeep is now known as ysandeep|out | 17:24 | |
elodilles | just to have it here as well: these ocata branches were deleted: http://paste.openstack.org/show/807112/ | 17:29 |
*** gfidente is now known as gfidente|afk | 18:09 | |
clarkb | the indexing queue seems fairly stable at ~1.5k entries for the last little bit. | 18:49 |
clarkb | indicating that at least so far this hasn't been a runaway backlog which is good | 18:49 |
clarkb | and now we are up to 3.5k :/ | 19:57 |
melwitt | I didn't realize the indexer was falling behind so much :( I have been using this page to see whether indexing was behind http://status.openstack.org/elastic-recheck/ | 20:57 |
fungi | melwitt: it's part of an experiment to see how many workers we need, sorry about that | 20:57 |
fungi | we're attempting to determine the number of them we can safely turn down without significant prolonged impact to indexing throughput, so as to reduce the maintenance burden and resource consumption however much we can | 20:58 |
melwitt | oh ok, so it's not that something is catastrophically wrong. that's good heh | 20:58 |
melwitt | makes sense, thanks | 20:58 |
clarkb | ya not an emergency or anything | 20:59 |
clarkb | we did end up super behind for a bit because the cluster had crashed | 20:59 |
fungi | well, i mean, something is catastrophically wrong: we don't have sufficient people to upgrade and keep this system running long term, and desperately need someone to build and run a replacement if it's still useful | 20:59 |
clarkb | we've also got that error where we have log entries from centuries in the future still happening, so you have to look closely at e-r's graph page to see how up to date it actually is | 20:59 |
fungi | yeah, a sane indexer implementation would discard loglines in the future probably | 21:00 |
clarkb | we are down to 2.5k files to index now and trending in the right direction | 21:02 |
clarkb | it's possible that 50% is just enough for a normal day, as indicated by the short backlog when busy and then catching up later in the day. We can keep an eye on it for a few days before committing to that reduction in size | 21:02 |
melwitt | so something/some service is logging future dates? | 21:02 |
melwitt | I can look into the indexer to discard loglines in the future | 21:02 |
clarkb | melwitt: either that or we are improperly parsing something that looks like future dates | 21:03 |
melwitt | ack, ok | 21:03 |
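As an aside, a minimal sketch of the "discard loglines dated in the future" guard fungi and melwitt are discussing might look like the following. This is hypothetical illustration only, not the actual logstash or log worker code; the function name and skew threshold are made up.

```python
# Hypothetical sketch: fall back to the event's receipt time when the parsed
# timestamp is implausibly far in the future. Not the real indexer code.
from datetime import datetime, timedelta, timezone

MAX_FUTURE_SKEW = timedelta(days=1)  # tolerate minor clock skew only


def sanitize_timestamp(parsed_ts, received_at=None):
    """Return a usable (timezone-aware) timestamp for indexing a log event."""
    received_at = received_at or datetime.now(timezone.utc)
    if parsed_ts is None or parsed_ts - received_at > MAX_FUTURE_SKEW:
        # A year like 2831 can only come from mis-parsed line content,
        # so use the receipt time rather than indexing into a far-future index.
        return received_at
    return parsed_ts
```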
clarkb | let me see if I can convince elasticsearch to tell me what some of those are | 21:03 |
clarkb | "message":"ubun7496ionic | 2021-01-15 12:52:25,572 zuul.Pipeline.tenant-one.post DEBUG Finished queue processor: post (changed: False)" in a job-output.txt file got parsed to "@timestamp":"2021-11-15T12:52:42.301Z" | 21:07 |
clarkb | that's not a great example because we cannot look at the original file, it's too old /me looks for a better one | 21:08 |
melwitt | ah yeah, so it looks like you're right it's a parsing problem. that makes a lot more sense than something logging dates in the future 😝 | 21:09 |
clarkb | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_7a0/795357/1/check/openstack-tox-py36/7a0cdaa/job-output.txt is one that caused it to parse a timestamp of "@timestamp":"2831-06-08T15:13:41.769Z" from "message":"ubuntu-biu-bionic | warnings.77d41ber and system_scope:all) of 3at2h06-08 s6msg)" | 21:14 |
clarkb | that file is 132MB large | 21:14 |
clarkb | I wonder if we're causing logstash some sort of buffer alignment issue when we jam it full of data like that | 21:15 |
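One possible remedy for this kind of mis-parse is stricter timestamp extraction. The sketch below is illustrative only and is not the grok/date configuration OpenDev actually runs: it anchors the timestamp to the expected position in a job-output.txt line and refuses to scrape digits out of corrupted content.

```python
# Minimal sketch of anchored timestamp extraction for job-output.txt lines,
# which look like "<nodename> | 2021-01-15 12:52:25,572 <logger> <message>".
import re
from datetime import datetime

LINE_TS = re.compile(r'^\S+ \| (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ ')


def extract_timestamp(line):
    m = LINE_TS.match(line)
    if not m:
        return None  # corrupted line: let the indexer use the received time
    return datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S')


good = "ubuntu-bionic | 2021-01-15 12:52:25,572 zuul.Pipeline DEBUG Finished queue processor"
bad = "ubuntu-biu-bionic | warnings.77d41ber and system_scope:all) of 3at2h06-08 s6msg)"
print(extract_timestamp(good))  # 2021-01-15 12:52:25
print(extract_timestamp(bad))   # None -- no timestamp scraped from garbage
```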
clarkb | also why are ironic unittests exploding with data like that | 21:15 |
melwitt | there are a ton of deprecation messages up in there | 21:16 |
melwitt | hm, it's the policy deprecation thing, that got fixed a long time ago though. hm | 21:17 |
clarkb | looks like the change merged with that behavior too (not sure if that change introduced the behavior or not) | 21:17 |
clarkb | that is definitely something that ironic should be looking at cleaning up if it persists | 21:17 |
clarkb | no one wants that much output to their console when running unittests | 21:17 |
clarkb | One thing we've done with Zuul is to only attach the extra debug strings (logs and other output) to the subunit stream when the test is failing | 21:18 |
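For illustration, the pattern clarkb describes can be approximated with a small decorator that buffers log output and only emits it when the test fails. This is a simplified sketch, not Zuul's actual mechanism (which works through its test base classes and the subunit stream).

```python
# Hypothetical sketch: capture logs during a test, emit them only on failure.
import functools
import io
import logging


def logs_on_failure(test_func):
    """Buffer log output for a test and dump it only if the test fails."""
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        buf = io.StringIO()
        handler = logging.StreamHandler(buf)
        root = logging.getLogger()
        root.addHandler(handler)
        try:
            return test_func(*args, **kwargs)
        except Exception:
            # Only failing tests pay the cost of dumping their captured logs.
            print("captured logs for failed test:\n" + buf.getvalue())
            raise
        finally:
            root.removeHandler(handler)
    return wrapper


# usage (hypothetical): decorate individual test methods on a TestCase, e.g.
#   @logs_on_failure
#   def test_something(self): ...
```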
melwitt | yeah.. I'm looking at it, we had the same thing happen in nova and gmann fixed it a long time ago. I'm looking to see if it was something in nova only rather than a global change elsewhere | 21:18 |
clarkb | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8f9/797055/21/check/openstack-tox-py38/8f938e9/ is an example from today and shows the issue seems to persist | 21:19 |
clarkb | (that was a successful run too so not a weird error causing it) | 21:20 |
melwitt | guh, yeah it was only in nova https://review.opendev.org/c/openstack/nova/+/676670 I'll upload something similar for ironic | 21:20 |
melwitt | (if someone else hasn't already) | 21:20 |
clarkb | `curl -X GET http://localhost:9200/_cat/indices?pretty=true` is how you get the full list of indices. They have timestamps in their names so you can pretty easily identify those that aren't from the last week | 21:26 |
clarkb | Then `curl -X GET http://localhost:9200/logstash-2831.06.08/_search` dumps all records for a specific index | 21:27 |
clarkb | I think the _cat/ url may need to be run on localhost only, but the search url works anonymously from the internet if you replace localhost with elasticsearch02.openstack.org | 21:27 |
clarkb | quick notes in case further debugging needs to happen and I'm not around | 21:27 |
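Building on those commands, a small companion script could flag indices whose names carry implausible dates. The sketch below assumes the requests library and the same index naming and endpoints clarkb mentions; it is a debugging aid sketch, not part of the existing tooling.

```python
# List logstash indices and flag any dated in the future or far outside the
# retention window (e.g. logstash-2831.06.08 from mis-parsed timestamps).
from datetime import datetime, timedelta

import requests

BASE = "http://localhost:9200"  # _cat/ may only be reachable locally


def suspicious_indices(max_age_days=14):
    resp = requests.get(f"{BASE}/_cat/indices", params={"h": "index"})
    resp.raise_for_status()
    now = datetime.utcnow()
    for name in resp.text.split():
        if not name.startswith("logstash-"):
            continue
        try:
            when = datetime.strptime(name, "logstash-%Y.%m.%d")
        except ValueError:
            continue
        if when > now or when < now - timedelta(days=max_age_days):
            yield name


if __name__ == "__main__":
    for name in suspicious_indices():
        print(name)
```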
*** rlandy is now known as rlandy|bbl | 22:25 | |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Drop use of track-upstream https://review.opendev.org/c/openstack/project-config/+/799123 | 22:47 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Drop use of track-upstream https://review.opendev.org/c/openstack/project-config/+/799123 | 23:14 |
gmann | melwitt: we could disable the warning on the oslo policy side by default, as I can see that happening in more projects while they implement the new RBAC | 23:45 |
fungi | throttling warnings is a common approach in other applications | 23:46 |
gmann | or at least for the time during migration to new RBAC | 23:46 |
fungi | log it once, then shut up about it | 23:46 |
gmann | we can do that. Currently it is added per rule check, but we can just add a general warning during init time, and only once. | 23:47 |
fungi | unfortunately that requires some sort of global registry when the warning is coming from a separate module/library, to track which warnings have already been emitted somehow | 23:47 |
fungi | ahh, yeah sometimes you can find a compromise like that | 23:48 |
gmann | but that would help, as each test class inits the oslo module | 23:48 |
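A sketch of the "warn once" compromise being discussed is shown below: a module-level registry so a library emits each deprecation warning a single time per process instead of once per rule check. This is illustrative only and is not the actual oslo.policy implementation.

```python
# Hypothetical warn-once helper with a process-wide registry.
import threading
import warnings

_emitted = set()
_lock = threading.Lock()


def warn_once(key, message):
    """Emit a deprecation warning only the first time 'key' is seen."""
    with _lock:
        if key in _emitted:
            return
        _emitted.add(key)
    warnings.warn(message, DeprecationWarning)


# Called from every policy check, but only the first call per key (or one
# generic key for the whole RBAC migration) actually produces output:
warn_once("policy-new-rbac", "Policy defaults are changing for the new RBAC.")
warn_once("policy-new-rbac", "Policy defaults are changing for the new RBAC.")
```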
melwitt | gmann: yeah, that would probably make sense being that we're repeating the same thing per project | 23:48 |
melwitt | gmann: I proposed this for ironic, it's just the exact same thing you did in nova https://review.opendev.org/c/openstack/ironic/+/799120 | 23:48 |
gmann | let me check oslo policy and see what we can do. Definitely a ton of warnings for new RBAC is not going to help operators, so they are not meaningful | 23:49 |
gmann | +1 | 23:49 |
gmann | melwitt: thanks | 23:49 |