corvus | clarkb: sorry was deep into other stuff; glad you worked it out. :) | 00:52 |
---|---|---|
corvus | fungi: clarkb: i accidentally dropped a vote when rechecking, so the zuul tracing changes haven't landed yet. hopefully tomorrow. :) | 00:53 |
*** ysandeep|out is now known as ysandeep | 01:45 | |
*** ysandeep is now known as ysandeep|afk | 03:42 | |
*** ysandeep|afk is now known as ysandeep | 05:14 | |
*** jpena|off is now known as jpena | 07:36 | |
*** ysandeep is now known as ysandeep|sick | 08:27 | |
*** dviroel|afk is now known as dviroel | 11:22 | |
*** dasm|off is now known as dasm | 13:18 | |
opendevreview | Rafal Lewandowski proposed openstack/diskimage-builder master: Add cloud-init growpart element https://review.opendev.org/c/openstack/diskimage-builder/+/855856 | 13:20 |
dmendiza[m] | Hi friends! | 13:29 |
dmendiza[m] | I'm not seeing zuul runs on openstack/barbican for the new stable/zed branch: https://review.opendev.org/q/project:openstack%252Fbarbican+branch:stable%252Fzed | 13:29 |
fungi | dmendiza[m]: have you checked to see if you have a queue parameter in one of your project pipelines on that branch? | 13:51 |
fungi | the last several times someone has expressed the same symptom recently, that's been the cause | 13:52 |
fungi | dmendiza[m]: https://zuul.opendev.org/t/openstack/config-errors | 13:52 |
fungi | openstack/barbican - .zuul.yaml (stable/zed) | 13:53 |
fungi | extra keys not allowed @ data['gate']['queue'] | 13:53 |
fungi | that would be why zuul is ignoring the configuration for that branch | 13:53 |
fungi | probably something you've already fixed in master and just haven't backported yet? | 13:54 |
dmendiza[m] | fungi: ah yes, that sounds familiar, thanks! | 13:55 |
fungi | any time | 13:55 |
fungi | dmendiza[m]: yeah, you'll want to backport the queue change from https://review.opendev.org/858205 | 13:58 |
opendevreview | Merged opendev/statusbot master: Handle exception for unprivileged commands https://review.opendev.org/c/opendev/statusbot/+/807948 | 14:15 |
*** marios is now known as marios|out | 14:32 | |
clarkb | fungi: are you aware of any reason not to land that zuul jobs update for the git mirroring loop condensing? | 15:14 |
clarkb | I'll go ahead and do that first thing today if there aren't any issues that need attention first | 15:14 |
fungi | nope, i'll be around in case anything goes haywire | 15:15 |
clarkb | great | 15:15 |
fungi | fire away! | 15:15 |
clarkb | https://review.opendev.org/c/zuul/zuul-jobs/+/858961 has been approved. I'll keep an eye on it too. Once I'm happy that change hasn't broken anything I'll approve the arm64 rocky 9 image update. I realized that we should manually restart launchers once that is in to be extra sure we got it correct | 15:15 |
fungi | also we're still hoping to get a rolling zuul scheduler restart in once 858372 finally merges | 15:19 |
*** dviroel is now known as dviroel|lunch | 15:29 | |
opendevreview | Merged zuul/zuul-jobs master: Reduce the number of loops in prepare-workspace-git https://review.opendev.org/c/zuul/zuul-jobs/+/858961 | 15:34 |
clarkb | Now to look for jobs that have started since that merged | 15:36 |
clarkb | https://zuul.opendev.org/t/openstack/stream/94a597154bc842e79975e27e9c26f77a?logfile=console.log maybe | 15:37 |
clarkb | nope that job used the old version of the role | 15:38 |
clarkb | https://zuul.opendev.org/t/openstack/stream/d8b8e6cf6c0d41b7ae21643ee22dcbfb?logfile=console.log that one used the new version | 15:38 |
clarkb | I see two successful jobs so far that I believe ran with the new version of the role | 15:40 |
clarkb | I'll give it a bit longer before I move on to the arm64 image change | 15:41 |
clarkb | the job failures I'm seeing don't appear related to the git repos. | 15:58 |
clarkb | https://review.opendev.org/c/openstack/project-config/+/858554 has been approved | 15:59 |
opendevreview | Merged openstack/project-config master: Add rockylinux-9-arm64 https://review.opendev.org/c/openstack/project-config/+/858554 | 16:06 |
corvus | fungi: clarkb apparently the first change that emits spans did get deployed this weekend, so we should theoretically already be getting them. i'll look into why they aren't showing up. | 16:23 |
corvus | "http2Server.HandleStreams received bogus greeting from client:" in the jaeger log looks suspicious | 16:23 |
fungi | indeed, maybe creds aren't quite right? | 16:24 |
corvus | yeah, or wrong protocol or something? | 16:25 |
clarkb | used EHLO instead of Hello | 16:25 |
corvus | hah | 16:26 |
corvus | tcpdump sees traffic from the schedulers -> jaeger | 16:31 |
*** jpena is now known as jpena|off | 16:34 | |
*** dviroel|lunch is now known as dviroel | 16:35 | |
corvus | oh i think i see the problem. will prepare a change | 16:37 |
opendevreview | James E. Blair proposed opendev/system-config master: Correct OTLP TLS configuration in jaeger https://review.opendev.org/c/opendev/system-config/+/859650 | 16:41 |
corvus | clarkb: fungi ^ what do you think about me manually applying that fix real quick and restarting? | 16:41 |
corvus | (also gee willikers this thing has a lot of servers and ports) | 16:42 |
clarkb | corvus: I think that's fine. I guess jaeger has multiple ways of collecting traces? | 16:43 |
corvus | yes so many ways | 16:43 |
corvus | it's a merger of several projects | 16:43 |
corvus | zuul exports using opentelemetry (otlp) which is the new hot universal standard | 16:43 |
corvus | okay, i manually applied/restarted with that. now just waiting for a new buildset to start. | 16:44 |
corvus | "cool". that stopped the errors in the log. but i still don't see any data. (and i do still see tcp traffic) | 16:48 |
clarkb | the arm64 rocky 9 image update is deploying now | 16:48 |
corvus | there seem to be some broken pipe tcp close errors in the log. not sure what that signifies. | 16:50 |
corvus | (ie, not sure if it's important) | 16:50 |
corvus | i'm going to write a simple python script to emit a dummy trace and run it from a zuul server | 16:52 |
corvus | ah, the next problem is that the internal zk-ca cert that i made was for tracing01.opendev.org but i told zuul to connect to tracing.opendev.org | 17:02 |
opendevreview | James E. Blair proposed opendev/system-config master: Correct internal tracing server cert name https://review.opendev.org/c/opendev/system-config/+/859654 | 17:05 |
corvus | clarkb: fungi ^ can you review that and parent? i think we'll want to let the deployment job take care of that before we try again. | 17:06 |
clarkb | done | 17:10 |
clarkb | I've been tailing the launcher log on nl04 since the config update and there are no tracebacks. I think that side of things is happy. | 17:11 |
clarkb | the new image has been building for about 20 minutes, still too early to say if it is happy | 17:12 |
fungi | sorry, had to pop out for an errand, back and reviewing | 17:38 |
opendevreview | James E. Blair proposed opendev/system-config master: Correct internal tracing server cert name https://review.opendev.org/c/opendev/system-config/+/859654 | 17:39 |
corvus | testing caught an issue in the role ^ | 17:39 |
fungi | lgtm | 17:39 |
fungi | both lgtm | 17:40 |
opendevreview | Merged opendev/system-config master: Correct OTLP TLS configuration in jaeger https://review.opendev.org/c/opendev/system-config/+/859650 | 18:13 |
opendevreview | Merged opendev/system-config master: Correct internal tracing server cert name https://review.opendev.org/c/opendev/system-config/+/859654 | 18:29 |
corvus | i restarted jaeger | 19:37 |
*** rlandy is now known as rlandy|biab | 19:47 | |
*** dviroel is now known as dviroel|walk | 19:54 | |
fungi | 858372 merged for zuul and the promote succeeded, so if we want to pull new images on the schedulers and restart them, we could do that now | 19:58 |
corvus | nah, not worth it yet; we should be getting at least one trace now, but it's still not working | 19:59 |
fungi | ah, the comment about span emitting merging before the last restart was you saying that we didn't need to bother | 19:59 |
fungi | got it. thanks! | 19:59 |
corvus | i turned on debug logging in the collector, and it says it's writing spans to storage... | 20:25 |
corvus | so apparently the error at this point is internal to jaeger? | 20:25 |
corvus | {"level":"debug","ts":1664396701.7566938,"caller":"app/span_processor.go:164","msg":"Span written to the storage by the collector","trace-id":"214600fcab74aecbda1d019c4f0a4c69","span-id":"0a82c64ff47f95c8"} | 20:25 |
*** rlandy|biab is now known as rlandy | 20:40 | |
corvus | there's a metrics endpoint on the admin server: jaeger_spans_received_total{debug="false",format="proto",svc="zuul",transport="grpc"} 108 | 20:40 |
*** dviroel|walk is now known as dviroel | 20:44 | |
clarkb | fungi: re mm3 https://etherpad.opendev.org/p/O6e6Quoe_jKivcj-vEXN does it make sense to send something like that to the mailman users list, or perhaps directly to maxking? | 20:45 |
clarkb | corvus: could it be a permissions thing to view spans? basically they are all there but we can't see them because we don't have sufficient access? | 20:46 |
corvus | i don't think there's any internal access controls; it's sort of like it's not reading them from the storage backend | 20:49 |
corvus | i do see traces in strings on the on-disk storage | 20:51 |
corvus | okay i just restarted it using the in-memory db and it's working, so there's definitely something about the badger storage that's wrong | 21:18 |
clarkb | interesting | 21:19 |
corvus | aha, it's the ttl i set | 21:19 |
corvus | apparently 30d breaks it | 21:19 |
corvus | i'm trying it with 720h now | 21:20 |
corvus | that looks like it might be working | 21:20 |
opendevreview | James E. Blair proposed opendev/system-config master: Fix jaeger badger config and uid https://review.opendev.org/c/opendev/system-config/+/859729 | 21:23 |
corvus | clarkb: fungi ^ that syncs the repo with what i have done on disk | 21:24 |
corvus | https://tracing.opendev.org/trace/73f394c92d9fbe1943178eabfa6da6b1 exists | 21:26 |
corvus | lol a project name would be a good tag to add :) | 21:26 |
clarkb | corvus: and maybe the event id so that we can cross correlate with logs if necessary | 21:29 |
clarkb | oh wait it is there nevermind | 21:29 |
clarkb | :) | 21:29 |
clarkb | I have to expand the listing of tags to see it | 21:30 |
corvus | yeah; and you can search for it like: https://tracing.opendev.org/search?end=1664400672875000&limit=20&lookback=1h&maxDuration&minDuration&service=zuul&start=1664397072875000&tags=%7B%22zuul_event_id%22%3A%2226e28d1c1d234ffd887d95c86d47457d%22%7D | 21:31 |
*** dasm is now known as dasm|off | 21:31 | |
corvus | (by using the "tags" field in the search box) | 21:32 |
*** dviroel is now known as dviroel|out | 21:37 | |
fungi | clarkb: very minor typo corrected on your draft e-mail to the mailman list, but lgtm. thanks! | 21:41 |
clarkb | fungi: do you think it is better to use the public mailing list or try to find an email for the maintainer (I'm assuming it's in the git repos)? | 21:42 |
fungi | i don't see why the public ml would be a problem | 21:45 |
fungi | after all, it's a mailing list for a project making mailing list software. you'd think if anyone would be comfortable discussing things on a public ml, it's them | 21:45 |
clarkb | I signed up and got moderated anyway | 21:58 |
clarkb | But email is sent, hopefully it will get forwarded to subscribers soon enough | 21:58 |
fungi | there's probably a feature to moderate first-time posters or a waiting period after account creation | 22:00 |
clarkb | https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/FVZW5DQJ7C3TW4LPIIU7ARI7XMVJYYWX/ | 22:16 |
opendevreview | Merged opendev/system-config master: Fix jaeger badger config and uid https://review.opendev.org/c/opendev/system-config/+/859729 | 22:38 |
clarkb | the rocky 9 arm64 image did end up going ready eventually | 22:46 |
clarkb | so I think that is done until mnasiadka gives it a go | 22:46 |
fungi | excellent | 22:57 |
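
A note on the ttl fix discussed above (30d replaced with 720h): the log does not say why the day-suffixed value failed. One plausible explanation, offered here only as an assumption, is that the setting is parsed as a Go-style duration; Go's `time.ParseDuration` accepts units up to hours ("h") and has no day unit. The minimal sketch below rewrites a whole-day value in hours before it goes into a config. The helper name `days_to_hours` is hypothetical and is not part of jaeger, badger, or system-config.

```python
import re


def days_to_hours(value: str) -> str:
    """Rewrite a whole-day duration such as "30d" in hours ("720h").

    Hypothetical helper for illustration only. Values that do not
    consist of digits followed by "d" are returned unchanged.
    """
    match = re.fullmatch(r"(\d+)d", value.strip())
    if match is None:
        return value
    return f"{int(match.group(1)) * 24}h"


if __name__ == "__main__":
    # The value that broke span lookups in the log, rewritten in hours.
    print(days_to_hours("30d"))
```

Values that already use hours, such as 720h, pass through untouched, so the helper is safe to apply to a setting regardless of which form it currently has.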
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!