Friday, 2021-10-22

opendevreviewMerged opendev/glean master: Remove debian-stable, add focal/bullseye testing
*** diablo_rojo_phone_ is now known as diablo_rojo_phone02:30
*** diablo_rojo_phone is now known as Guest373902:31
opendevreviewIan Wienand proposed zuul/zuul-jobs master: ensure-dstat-graph: pull updated branch
opendevreviewIan Wienand proposed openstack/diskimage-builder master: Add openEuler jobs back
opendevreviewIan Wienand proposed zuul/zuul-jobs master: ensure-dstat-graph: pull updated branch
opendevreviewIan Wienand proposed zuul/zuul-jobs master: ensure-dstat-graph: pull updated branch
opendevreviewIan Wienand proposed zuul/zuul-jobs master: ensure-dstat-graph: pull updated branch
*** odyssey4me is now known as Guest375106:18
opendevreviewIan Wienand proposed zuul/zuul-jobs master: ensure-dstat-graph: pull updated branch
opendevreviewIan Wienand proposed zuul/zuul-jobs master: ensure-dstat-graph: pull updated branch
opendevreviewDr. Jens Harbott proposed opendev/system-config master: Fixup some details in the zuul doc
fricklerinfra-root: seems I'm getting 502 on zuul.o.o, will take a closer look in a moment08:13
fricklerseems to be the scheduler/kazoo/can't start new thread thing again08:23
fricklerthat makes me notice that the restart docs still don't mention the sigusr2 debug collection?08:24
fricklerI guess with the API also being unreachable, I cannot save queues, the zuul-changes script doesn't seem to make any progress08:28
fricklero.k., got some yappi output, restarting now08:32
frickler#status notice zuul needed to be restarted, queues were lost, you may need to recheck your changes08:45
opendevstatusfrickler: sending notice08:45
-opendevstatus- NOTICE: zuul needed to be restarted, queues were lost, you may need to recheck your changes08:45
*** ysandeep|out is now known as ysandeep09:00
opendevreviewMerged openstack/project-config master: Move the daily periodic trigger earlier
fungifrickler: almost exactly one week after the last time too, yeah?11:33
fricklerfungi: indeed, though there were a couple of restarts in between, so I don't think it is a Friday thing ... yet11:39
fungiright, possible it's pure coincidence11:51
fungilast time we suspected zk connectivity problems, i think? i need to go back through the discussion11:52
*** tosky_ is now known as tosky12:07
artomo/ Zuul question - is there a way to recheck specific jobs on a patch? So if tempest-foo-py42 fails but everything else passes, just rerun that 1 failing job14:56
artomI guess you can hardcode every job name as a trigger?14:56
fungiartom: nope, zuul enqueues and reports complete buildsets, not individual builds. the risk there is you have a change which passes its jobs 50% of the time, so you selectively rerun each job until you get a passing result to get your broken change to merge14:57
artomfungi, ack14:58
fungirequiring all jobs to pass together makes it much harder to merge a broken change14:58
fungithe goal of zuul is to block broken changes from merging while it's still the patch author's job to fix, rather than catching bugs after merging when they become everyone's problem14:59
fungibut doing that requires reliable jobs14:59
fungiso ideally you'd focus on making the jobs more reliable, rather than trying to find ways to ignore their unreliability15:00
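[editor's note] The risk fungi describes above can be put into rough numbers. The sketch below is only an illustration under stated assumptions: jobs are treated as independent, each passes with the same probability, and the "selective rerun" mode is hypothetical (Zuul reports complete buildsets, as stated above). The function names are invented for this example.

```python
def p_buildset_passes(p_job: float, n_jobs: int) -> float:
    """Probability that every job passes together in one complete buildset run."""
    return p_job ** n_jobs


def p_selective_passes(p_job: float, n_jobs: int, attempts: int) -> float:
    """Probability that each job passes at least once within `attempts` tries.

    This models the hypothetical "rerun only the failing job" workflow, where
    a single pass per job is enough, no matter how many earlier tries failed.
    """
    p_job_eventually_passes = 1 - (1 - p_job) ** attempts
    return p_job_eventually_passes ** n_jobs


# A broken change whose 4 jobs each pass only half of the time.
print(p_buildset_passes(0.5, 4))                # 0.0625
print(round(p_selective_passes(0.5, 4, 3), 3))  # 0.586
```

With these assumed numbers, allowing three tries per job raises the chance of merging the broken change from about 6% per full run to about 59%, which is why requiring all jobs to pass together is the stricter gate.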
artomfungi, there's a convo around trying to save CI resources in the nova PTG room by using job hierarchies, an idea that came up was only re-running the failing jobs when the failure is unrelated15:00
artomBut you make a good point15:00
fungiartom: the most effective ways to save ci resources are: 1. make the jobs more reliable so they don't need to be rerun as often, and 2. make the tests and the software being tested run more quickly on fewer resources15:02
artomfungi, but that's hard ;) It's easier to find Zuul haxx15:02
fungiyeah, that's pretty much i. fixing the real problems correctly is a lot of work, so people would rather pile up workarounds until everything topples under its own weight15:03
fungier, pretty much it15:03
fungiclarkb might be interested in that nova conversation if he's around (i'm still in the tc rbac discussion)15:04
fungilooks like nova already moved on anyway15:06
clarkbsorry I took this as an opportunity to sleep in a bit compared to the last few days.15:15
clarkbartom: fungi: good news is if you take the time to address those difficult problems you tend to make your software better in the process. As I mentioned with the setuptools pinning discussion we need to get away from this idea that the CI exists to show us a green light and instead to show us where our software has issues so that the issues can be addressed. The green light is a side15:15
clarkbeffect not the goal15:15
clarkbfrickler: ah sorry I didn't realize the sigusr2 stuff was specifically what we were talking about re debugging zuul. I'll see if I can write something up about that today15:17
clarkbhopefully the updated restart directions were helpful though15:17
opendevreviewMerged zuul/zuul-jobs master: ensure-docker: remove Debian Stretch testing
fricklerclarkb: yes, went pretty smooth with running the playbook15:20
clarkbinfra-root I plan to merge after some breakfast. I'll make copies of the exim config and iptables configs on review02 and we can use that to double check no unexpected changes between before and after due to the group reorg15:21
clarkbjust a heads up since this change scares me a bit :) it should be fine I've been over it a couple of times now and haven't found any other uses of the gerrit group15:21
fricklerthe sigusr2 thing I just used the first process id and it seemed to be fine15:21
clarkbfrickler: yup the only other thing to be aware of with sigusr2 is that the first one starts yappi which has a performance impact due to its profiling so if you don't plan to restart the processes you should send a second sigusr2 later to stop yappi and reduce the performance overhead (each sigusr2 is a yappi toggle)15:22
clarkbI'll try to get that all documented today15:22
clarkbin this case you restarted so it is fine15:23
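[editor's note] The toggle behaviour clarkb describes (first signal dumps threads and starts profiling, second one stops it) can be sketched as follows. This is not Zuul's actual handler: it uses the standard library's cProfile instead of yappi so that it runs without extra packages, and the `DebugToggle` class and `install_handler` function are hypothetical names for this example.

```python
import cProfile
import io
import pstats
import signal
import sys
import threading
import traceback


class DebugToggle:
    """Dump thread stacks on every signal and toggle a profiler on/off."""

    def __init__(self):
        self.profiler = None  # None means profiling is currently off

    def dump_threads(self) -> str:
        """Return a text dump of the current stack of every running thread."""
        names = {t.ident: t.name for t in threading.enumerate()}
        lines = []
        for ident, frame in sys._current_frames().items():
            lines.append(f"Thread {names.get(ident, '?')} ({ident}):\n")
            lines.extend(traceback.format_stack(frame))
        return "".join(lines)

    def handle(self, signum, frame):
        """Signal handler: always dump stacks, then flip the profiler state."""
        print(self.dump_threads(), file=sys.stderr)
        if self.profiler is None:
            # First signal: start profiling (this adds runtime overhead).
            self.profiler = cProfile.Profile()
            self.profiler.enable()
        else:
            # Second signal: stop profiling and report what was collected.
            self.profiler.disable()
            buf = io.StringIO()
            try:
                stats = pstats.Stats(self.profiler, stream=buf)
                stats.sort_stats("cumulative").print_stats(10)
            except TypeError:
                # pstats refuses to build a report when nothing was recorded.
                buf.write("no profile data recorded\n")
            print(buf.getvalue(), file=sys.stderr)
            self.profiler = None


def install_handler(toggle: DebugToggle) -> None:
    """Register the toggle for SIGUSR2 (Unix only, main thread only)."""
    signal.signal(signal.SIGUSR2, toggle.handle)


if __name__ == "__main__":
    toggle = DebugToggle()
    if hasattr(signal, "SIGUSR2"):
        install_handler(toggle)
```

With the handler installed, sending `kill -USR2 <pid>` once starts profiling and sending it a second time stops it, which matches the advice above to send a second signal if the process is not going to be restarted.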
frickleryes, I did it twice and checked that there is some output from yappi after the second signal. hoping corvus can use the logs to gather more insight15:23
corvusfrickler: yep, thanks!  just started looking at it15:23
corvusfrickler, clarkb: wow, the stack dump handler logged zero threads.15:26
corvusright before that is a MemoryError exception15:29
clarkbjust thinking out loud here: it is possible that glibc on bullseye is not as happy with python3.8 + zuul as it was on buster?15:35
clarkbwe had the thread issue too iirc15:35
clarkbmemory stuff would be in glibc. threads are in pthreads though?15:36
clarkbI have approved 813675 after making copies of /etc/aliases and /etc/iptables/rules.v4 and /etc/iptables/rules.v615:42
clarkbI'll leave my connection to the server open too15:42
*** timburke__ is now known as timburke15:50
opendevreviewMerged zuul/zuul-jobs master: ensure-rust: verify cryptography build on Ubuntu
opendevreviewMerged opendev/system-config master: Remove the gerrit group in favor of the review group
clarkbthe hourly deploy is running now but then deploy jobs for ^ will run. I'm keeping an eye on it16:16
clarkbI think that will run manage-projects.yaml and that will run manage-projects with the gitea always update flag set to true16:17
clarkbwe don't expect any issues from that based on real world experience last friday16:17
opendevreviewClark Boylan proposed opendev/system-config master: Document Zuul's SIGUSR2 handler
clarkbfrickler: ^ do you think that is enough info about the signal handler in zuul?16:26
fricklerclarkb: could you rebase on top of ? I did some formatting and nit cleanup16:28
fungiclarkb: i also left a couple of minor comments16:29
clarkbfrickler: looks like that change needs a rebase too since my other changes landed. Sorry I didn't notice that one as a conflict. But ya I can rebase both and do the related cleanups16:30
fricklerha, fungi wrote the same comments as I did, just phrased differently, nice ;)16:31
fungiit's good to know i wasn't too off track ;)16:32
fungi755155 also reminded me that we apparently haven't moved the db out of trove yet?16:33
frickleroh, actually I meant , not sure where that link came from16:34
fungithough that one's also related, yep16:34
clarkbah well I'll push an update to 755155 since it is almost done but rebase the usr2 change on 81509416:38
clarkbthough maybe they overlap so nevermind16:39
fungiyeah, 755155 looks like it'll need some more work to reintegrate now16:40
fungibut also i apparently missed that one getting pushed16:41
opendevreviewClark Boylan proposed opendev/system-config master: Document Zuul's SIGUSR2 handler
clarkbupdated thanks16:43
opendevreviewMerged opendev/system-config master: Fixup some details in the zuul doc
clarkbmanage projects ran in not much time and was successful. We are seeing what we expected there. Always a good sign17:06
clarkbservice-review is done and the three files that would've been modified if wires got crossed did not change. I can still ssh into review02 and the gerrit web ui is accessible I think we are good17:17
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Update dstat to support bionic and others
opendevreviewClark Boylan proposed opendev/system-config master: Document Zuul's SIGUSR2 handler
clarkbthat should actually build now17:37
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Update dstat to support bionic and others
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Update dstat to support bionic and others
opendevreviewClark Boylan proposed opendev/system-config master: Refactor infra-prod jobs for parallel running
opendevreviewClark Boylan proposed opendev/system-config master: infra-prod: clone source once
clarkbianw: ^ I resolved the merge conflict I created with those changes and rebased the stack18:26
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Update dstat to support bionic and others
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Update dstat to support bionic and others
clarkbI went ahead and approved the sigusr2 docs update since frickler and fungi had both given similar feedback on a prior patchset and I addressed those comments and fungi +2'd19:58
clarkbactually wait fungi had another comment. The new UI doesn't show that very well19:58
opendevreviewClark Boylan proposed opendev/system-config master: Document Zuul's SIGUSR2 handler
clarkbfungi: ^ how is that?20:03
clarkbseparately I wonder if gertty makes the comment render weird in gerrit web ui20:03
clarkbits not a patchset or file level comment20:04
corvusclarkb: which comment is non-obvious for you?20:13
corvusand separately -- a potential zuul fix for the problem frickler observed has merged; i think it'd be good to restart zuul this afternoon.  i'll do that soon if no objections.20:15
clarkbcorvus: fungi's comment on ps1 at 10:54am pacific. It says "couldn't hurt to say how to identify the daemon's PID accurately."20:15
clarkbcorvus: I think there are two things happening there. First is that it was against ps1 and not ps3 (not sure if that was intentional?) but the other is that the comment isn't against a file or the patchset itself so doesn't get the comment icon on it20:16
clarkbthe gerrit summary view shows you the speech bubble when you've got comments to address but that didn't happen here so I missed it20:16
clarkbcorvus: no objections from me for doing a restart. I'll be around for the most part too. Though I think its shaping up to be a quiet afternoon as everyone is tired post ptg :)20:17
corvusclarkb: so basically, you're inclined to look for the comment bubble and disregard other things in the "change log"?20:19
clarkbcorvus: I don't think that's totally accurate. I opened the change and skimmed the log but gerrit just showed me [..] and I skipped over it20:21
clarkbthen on a different line it said +2 so I went with it20:21
corvusthe "[...]" is literally part of the comment text, so that's probably an unfortunate coincidence that that happened to show up at the beginning...20:22
clarkbthe next bit seems to be quoted too? maybe that is why it renders weird?20:23
corvusit starts collapsed, but so do all the other comments20:23
corvusyeah, that looks like a markdown quote (may be accidental?)20:23
clarkbcorvus: if I compare it to other top level comments I see the entire top level comment by default. I think what happened here is gerrit got smart with what it thought was a quote so it collapsed most of the comment20:25
corvusoh interesting20:25
clarkb for example has a top level comment as well as inline comments from me. But you can see all of my top level comment there20:26
clarkbit isn't clear to me if my top level comment had been longer if gerrit would've trimmed it20:27
clarkbbut ya I suspect it has to do with the quoting and gerrit treats that as a rendering break and trimmed what is shown by default20:27
corvusyeah that seems likely, since it chose to break there and not just after the 1st line20:29
fungiclarkb: i had commented on patchset 1 originally, but didn't see any indication it was acknowledged or addressed in subsequent revisions and thought it might have been missed, so i quoted my prior comment in a new comment as a quick reminder of the suggestion. but yes in order to avoid over-quoting i trimmed the quoted block to just what i meant to point out and put in a [...] line to indicate20:32
fungii had trimmed out text there20:32
clarkbgot it. I guess it would help to add a bit more text because gerrit renders oddly in that case20:34
clarkbfungi: did you quote it and post it as a top level comment?20:36
clarkbThat is the other confusing bit since quoting the inline comment should remain an inline comment?  I guess I can test that20:36
fungii used the reply button in gertty. i don't know what a top level comment is. it's just a review comment20:36
corvus"change message" i think is the technical term for what gertty does there20:36
corvusand it's not a gerrit-threaded reply even if you hit the reply button (the gertty reply button predates gerrit-internal threads)20:37
clarkbI just posted a quoted reply to the same comment that fungi responded to and you can see the difference between what gertty and gerrit does20:37
corvusso it just copies the text20:37
fungiyeah, i copied the inline comment quote into a change message because i don't think you can quote with inline comment replies20:37
fungiin the future i'll just say "did you see my comment on patchset 1" instead or something20:38
clarkbinterestingly gerrit applies it to ps120:38
clarkber ps4 and gertty to ps120:38
clarkboh except the inline comment is to ps1 but the top level indicator for it is ps4. That isn't confusing at all :)20:38
clarkbI expected it to be ps1 across the board given the quote context20:39
corvusi suspect gertty posted the change message to ps1 since fungi hit reply to change message on ps1 to create it20:40
fungiyeah, i assumed it would be to ps1 (gertty shows it that way to me)20:40
clarkbya the reply to the inline comment is noted as ps1 but the top level comment is noted as ps420:42
clarkbSo it seems the speech bubble means "open to see inline comments" and the top level comment is rendered directly possibly in a limited version depending on rendering rules for newlines/quoting etc20:43
fungiamusingly this is how it ends up showing in gertty for me:
clarkbI think gerrit eating the context after the [...] is what created the original confusion20:43
fungiyeah, gertty shows me exactly what i entered, so i was entirely unaware there was anything odd20:44
clarkbthey have definitely complicated the expectations around comments with things like attention sets and the like20:48
clarkband that is true for users of the web UI too (there are new expectations to get the state transitions right)20:48
clarkbone way to deal with this is to hit expand all when reading comments. Unfortunately it doesn't look like I can tell it to show expanded by default20:51
clarkbbut I can probably try to get into the habit of hitting that button first20:51
fungiwell, also more generally it was fine if you had just let it merge, i left a +2 on that revision indicating i was already fine with it as-is21:04
clarkbmatrix has decided to stop working for me21:07
clarkbseems to affect all the servers I'm talking to21:07
clarkbanyone else experiencing this?21:08
clarkbfungi: when I noticed what I had missed I wanted to fix it :)21:17
clarkbcorvus: do you see ^?21:24
clarkbwondering if your matrix client is working21:24
clarkb I don't know why I didn't look earlier, maybe because I thought it was more decentralized than that, but my user is a user and the oftc bridge is on too21:26
clarkband now it seems to be back21:37
clarkblooks like almost an hour long loss of the oftc matrix bridge. Otherwise stuff is working21:38
corvusi look forward to reading later whatever happened between "20:48:37            clarkb | and that is true for users" and "21:37:47            clarkb | and now it seems to be back" :)21:47
clarkbcorvus: I found that hitting expand comments in the gerrit ui works around some of the clunkiness here but there is no way to expand it by default so you have to click the button. And then it was me talking about the matrix outage trying to figure out if it was my end or their end21:48
corvusah yep.  i caught up on
corvusi'm guessing we matrix users won't actually get the missing messages in this case since the bridge is associated with the homeserver that failed21:49
clarkbya that is my assumption too21:50
clarkband now fastmail is down. They don't seem to have noticed yet22:19
clarkbfungi: I was going to respond to zigo's latest email pointing out build, but my email provider ^ is not up right now22:30
fungii've not seen it yet, i don't think, checking22:31
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: ensure-zookeeper: eliminate the tmpfs size limit
fungiclarkb: also it's already in debian bullseye:
clarkbit's an interesting problem because I can't open their support site. I could send them email through another provider but I'm worried they won't get it22:35
clarkbif I had a twitter I could tweet at them I guess but I don't have that. Things get weird when it is the comms tools that break22:35
corvusimma gonna do that zuul restart thingy now23:30
fungisounds good!23:31
* clarkb is still around23:32
fungithis gets us the presumptive fix for the zk disconnect issue, as well as version info in the components endpoint, looks like23:33
fungioh, and the key deletion fix23:33
clarkbthe key deletion thing was test only23:33
clarkbit wasn't broken but the testing should've checked a few extra bits23:33
corvushrm, we have no release notes... i guess i should make one real quick23:34
fungioh, right, key deletion change landed earlier, so that's just added testing23:34
corvusmostly because i want to have a good rollback release before we land the pipeline series23:34
fungistill new since the last tag though23:34
fungimakes sense, the pipeline stack looked huge23:34
clarkbheh fastmail decided to stop being broken and sent the email draft that I had tried to send earlier but couldn't23:35
corvusif you want to check out the prototype components display23:38
fungiooh, now *that's* slick!23:40
corvusyeah, this thing's starting to look like a system :)23:41
fungii guess those git commit ids correspond to the ephemeral merge commits from the gate pipeline still23:42
clarkbfungi: yes since we deploy from the images built in the gate23:42
fungiwish i could think of a good way to remap those23:42
clarkbfungi: if you look at the image in dockerhub it shows you the change23:42
fungibut i suppose "have zuul push" is the real answer23:43
clarkbwhich means you can manually map them at least23:43
corvuszuul push would solve that :/23:43
corvusi wonder if there's any chance we could change the image metadata during promote23:43
corvusi guess that would technically make a new image23:43
fungiwe could substitute the pbr.json contents though?23:44
fungioh, right, still it's a new image23:44
corvushrm, pbr.json....23:44
corvusso pbr writes a json file when it builds?23:45
fungipbr.json is where the git commit id is stored, yes23:45
fungialongside the python package metadata23:45
corvusso we could, in promote, open up the image, update pbr.json with the actual commit id, then push the updated image?  [not saying it's a good idea, just physically possible]23:46
fungiyou can even use standard python package metadata parsing tools to read it back without needing pbr at runtime23:46
fungiyeah, technically possible, though it will change the image checksum obviously23:46
corvusi think process-wise that really boils down to "build [part of] the image in promote."  it would just be a really small part of the image that gets [re]built, so probably doesn't need extra testing23:47
clarkbI think zuul may be up? openstack tenant is loaded at least23:47
corvusall things considered, i don't like like that, so i don't think i want to pursue it.  but helps with the brainstorming.23:47
corvuss/like like/like/23:48
corvusmaybe during the gate build we should include the change in the pbr.json version23:50
corvusremoves one step from the mapping procedure23:50
fungioh, yep we could do that, or add a zuul.json metadata file23:50
corvusre-enqueue done23:50
corvus#status restarted all of zuul on commit 7c377a93e020f20d4535207d9b22bdc303af4050 for zk disconnect/threadpool fix23:51
opendevstatuscorvus: unknown command23:51
corvus#status log restarted all of zuul on commit 7c377a93e020f20d4535207d9b22bdc303af4050 for zk disconnect/threadpool fix23:51
opendevstatuscorvus: finished logging23:51
corvusshould be a command :)23:51
fungiparsing arbitrary package metadata is fairly trivial, modulo the switch from pkg_resources to importlib circa python 3.8, this is how i've done it elsewhere for reading pbr's metadata without using pbr at runtime:;a=blob;f=mudpy/;h=d99617e590aa218e640583d312429ec846205cf6;hb=HEAD#l2023:53
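[editor's note] A minimal sketch of the approach fungi describes, reading pbr's recorded git commit id through the standard library rather than through pbr itself. It assumes Python 3.8 or later (for `importlib.metadata`) and assumes that `pbr.json` sits alongside the installed package metadata with a `git_version` key; the function name `pbr_git_version` is made up for this example and this is not the code behind the link above.

```python
import json
from importlib import metadata
from typing import Optional


def pbr_git_version(dist_name: str) -> Optional[str]:
    """Return the git commit id pbr recorded for a distribution, if any.

    Returns None when the distribution is not installed, when it ships no
    pbr.json file, or when that file has no "git_version" key (assumed name).
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # read_text() returns None if the file is absent from the metadata dir.
    raw = dist.read_text("pbr.json")
    if raw is None:
        return None
    return json.loads(raw).get("git_version")


if __name__ == "__main__":
    # "zuul" is only an example name; any pbr-built package could be queried.
    print(pbr_git_version("zuul"))
```

For images built in the gate, the value read this way would be the ephemeral merge commit discussed above, so mapping it back to a change would still need a separate step.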
