Wednesday, 2022-10-12

ianwclarkb: does it all line up in the results tab with your hires screen?00:00
clarkbianw: right, just wondering if the plan is to force merge it. It did sound like fungi was ok with doing so and as we mentioned no artifacts should need promoting?00:00
ianwi think we might need to limit the width of the node name a bit more 00:01
clarkbianw: yes they all seem to be aligned00:01
ianwyeah i see
corvusclarkb: do you know why my messages are getting to irc (i assume they are) while yours aren't?00:05
clarkbcorvus: I think because you are actually joined to the irc channel but I'm not00:06
clarkbI don't know why you are joined and I'm not though00:06
clarkbwe both dropped out at roughly the same time00:07
dasmclarkb: corvus i can see your messages on irc (if that helps)00:07
clarkbthen you rejoined at 15:58 but I haven't00:08
clarkbdasm: ya this is my IRC connection. My matrix one isn't in the channel nor do I see the messages I send to it via my irc connection00:08
ianwanyone mind if i restart the zuul-web container to pick up ?  i'd like to play with it and the offsets from above00:09
corvusclarkb: apparently i reconnected just before 9am today00:10
clarkbianw: I think that should be safe. I guess that landed after our restarts completed?00:10
corvusianw: i think a rolling restart of both containers is fine; remember they take a few minutes to start.00:10
corvus(so be sure to check the component status page)00:10
clarkbcorvus: I've tried to provide more details to to see if they have any other insight00:12
clarkbI tried using !join #openstack-nova to join an entirely new channel with no luck (element gets the invite to the room from oftc-irc and I can join as far as element is concerned but I don't show up in the user list of the channel on the irc side of things)00:15
clarkbI smell dinner. I need to pop out now. But I'll check to see if anything has changed in the morning00:16
corvusclarkb: maybe something with your nick?  maybe you could change it and change back?  just throwing out ideas00:20
corvusbon appetit00:20
NeilHanlonclarkb: it's a breakage on my side as I maintain the mirrors.. our mirrormanager software is supposed to cull mirrors which are serving bad content and/or are inaccessible, after failing some amount of syncs.. but for some reason it doesn't appear to be catching/fixing this situation00:42
opendevreviewMerged opendev/system-config master: bootstrap-bridge: drop pip3 role, add venv
ianwso the bridge bootstrap failed -- "refusing to convert from file to symlink for /usr/local/bin/ansible"02:30
ianwhowever, it did redirect /usr/local/bin/ansible-playbook to the venv installed ansible02:30
ianwi think probably the clearest thing to do here is for me to manually run "pip uninstall ansible" on bridge; that will remove the global ansible pip install and the next bootstrap run should then be able to link it02:31
ianwthe gate is ok, because it just calls "ansible-playbook" anyway02:32
ianwinfra-prod-bootstrap-bridge should re-run out of the periodic jobs soon02:41
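The failure ianw describes above is the usual guard against silently replacing a real file with a symlink. A minimal sketch of that guard in Python follows; it is an illustration only, not the code the bootstrap job runs, and the helper name link_into_place and its force flag are hypothetical.

```python
import os


def link_into_place(link_path, target, force=False):
    """Point link_path at target.

    Refuses to replace an existing regular file unless force=True, which
    mirrors the "refusing to convert from file to symlink" error quoted
    in the log. (Hypothetical helper for illustration only.)
    """
    if os.path.islink(link_path):
        # Already a symlink: leave it alone if it points where we want.
        if os.readlink(link_path) == target:
            return "unchanged"
        os.unlink(link_path)
    elif os.path.exists(link_path):
        # A real file is in the way, e.g. a globally pip-installed script.
        if not force:
            raise RuntimeError(
                "refusing to convert from file to symlink for %s" % link_path
            )
        os.unlink(link_path)
    os.symlink(target, link_path)
    return "linked"
```

Removing the conflicting file first, as ianw proposes with "pip uninstall ansible", takes the same path as the non-forced case here: once nothing is in the way, the next run can create the link.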
opendevreviewIan Wienand proposed opendev/base-jobs master: setup-keys: add bridge node to "bastion" group
opendevreviewIan Wienand proposed opendev/system-config master: Run jobs with a jammy
opendevreviewIan Wienand proposed opendev/system-config master: testinfra: Update selenium calls
opendevreviewIan Wienand proposed opendev/system-config master: Abstract name of bastion host for testing path
opendevreviewIan Wienand proposed opendev/system-config master: Convert production playbooks to bastion host group
opendevreviewIan Wienand proposed opendev/system-config master: Run a base test against "old" bridge
opendevreviewIan Wienand proposed opendev/system-config master: bootstrap-bridge: use abstracted hostname
ianwok, bootstrap bridge ran ok ->
ianwi think that stack should pass, and i'm now running out of places i think it might be broken too :)04:08
opendevreviewDr. Jens Harbott proposed opendev/base-jobs master: Drop ara related vars from the base jobs
opendevreviewDr. Jens Harbott proposed openstack/project-config master: Switch the requirements-constraints job to py310
opendevreviewTony Breeds proposed openstack/project-config master: Switch the requirements-constraints job to py310
ramishrabshephar: commented in the patch, I think there is one more place where it could be changed, though output_dir thing is kind of messy atm06:40
ramishraoops wrong channel:/06:43
opendevreviewgnuoy proposed openstack/project-config master: Add project for managing zuul jobs for charms
opendevreviewjayaditya gupta proposed openstack/diskimage-builder master: Fix issue in extract image
opendevreviewMerged openstack/project-config master: Switch the requirements-constraints job to py310
fungiper-domain import logs for the latest mm3 migration test are available in
fungii don't see any obvious new errors, and the ones about the fields which were too large for their db columns are now gone12:48
opendevreviewJeremy Stanley proposed opendev/system-config master: Add a mailman3 list server
opendevreviewJeremy Stanley proposed opendev/system-config master: Fork the maxking/docker-mailman images
opendevreviewJeremy Stanley proposed opendev/system-config master: DNM force mm3 failure to hold the node
fungiclarkb: ^ i added redirects for the old list info page and the list index page urls13:37
fungitested by adding manually on first13:37
fungiseems to work, though maybe i should add some testinfra checking on that now that i think about it13:38
*** dasm|off is now known as dasm14:15
fungioh, but we're not actually testing the other redirects in the new deployment either14:24
fungijust checking for listening sockets and taking some screenshots14:24
Clark[m]Because we don't have the migration data to check with. I guess we could write a basic html file and use that though14:35
fungiwell, we could test the redirects to the new interface since we pre-create the mailing lists in it, it's just the rewrites exposing the old archives we can't test without adding some content14:41
clarkb any idea what caused that to happen? Seems like our test static.o.o which loads up the afs ro content returned a 403 instead of 200 for starlingx content15:39
clarkbthe prod content is available, so it's not a systemic issue with their afs content, and the testinfra tests for static look up other data out of afs, so it's not an afs-specific issue15:40
clarkbI've rechecked to see if they are persistent issues15:40
fungii looked at it briefly15:40
fungipretty sure something happened and the test node had trouble reaching afs when apache wanted to read the .htaccess file15:41
fungiif you look at the error details from apache you get a little more insight15:41
fungii didn't check syslog for actual afs errors, but wouldn't be surprised if there are some15:42
fungiassuming the job collected it15:42
opendevreviewMerged openstack/project-config master: Add project for managing zuul jobs for charms
jrosser_i've not received emails from since the 9th - is it possible to tell if there have been delivery attempts or bounces?16:56
clarkbjrosser_: "SMTP error from remote mail server after end of data: 553-Message filtered."16:58
jrosser_oh dear :(16:59
clarkbseems to be your mail servers are filtering it as spam16:59
clarkbwe should double check on our end if the host ended up on any lists16:59
clarkbdoes sbl not take a full ipv4 address in their query form?17:02
clarkbfungi: ^17:02
clarkbI found a different sbl query form and neither the ipv4 or ipv6 address is listed17:06
clarkbjrosser_: ^ it's possible a different list has it listed, but sbl at least says we are good17:06
jrosser_ok thanks - do you have a transcript with anything useful (like which server rejected it) as the mail routing i have to endure is terrible17:07
clarkbjrosser_: the IPs seem to vary but appears to be the shared fqdn17:09
jrosser_ok that's helpful17:10
jrosser_i've added as an allowed domain into my messagelabs portal17:13
fungimaybe add too17:23
fungisince we will likely change that from address in the future if/when we set up an mta for the new domain17:24
clarkbOur rocky 9 image should try to rebuild again shortly. If it ends up on nb02 it will run without the mirrorlist change which would be a good check of that20:49
ianwclarkb: if you have time to loop back over it should be ok.  i did end up moving back to just using "bastion" as the group for testing and production as i think it's a bit easier.  there's a new change to deal with base-jobs/bootstrap as you pointed out too21:10
clarkbianw: oh ya21:12
opendevreviewIan Wienand proposed opendev/system-config master: [wip] switch testing bridge name to
clarkbianw: and I guess bootstrap bridge is weird due to its self referential nature?21:21
clarkboh I see the next change splits out the handling for that21:21
ianw then updates that now ... hopefully the comment helps21:22
clarkbianw: left a comment on that one. Apologies if my previous comments may have confused things.21:36
clarkbGoogle CLA issues sorted. that should fix ssh rsa problems with gerrit 3.521:40
JayFI feel like there might be something weird with the opendevreview bot -- was posted as "verification failed" 3 minutes ago, but the V-2 was put on the patch at more like 20 minutes ago22:05
JayFI don't know if that's "normal" or what, but the latency surprised me so I thought I'd mention it in case it's evidence of some kind of service issue22:05
clarkbJayF: what do you mean by "was posted as verification failed"22:07
JayF> 22:01:40  opendevreview | Verification of a change to openstack/ironic stable/xena failed: Stable only: Factor out addition of packaging lib
JayFbut on the patch itself, the V-2 was voted on by zuul at 21:3822:08
JayFer, it's actually worse than that; at 20:38 22:08
JayFI don't care about this latency and am not bothered by it; but noticed because I actioned the email notification, then saw an IRC notification and was like "oh no another one", but it was the same one22:09
JayFI just wanted to mention it because it's the kind of strange that you might wanna know about :)22:09
clarkbthe bot got an event from gerrit at 20:38:59 which rules out gerrit emitting the event slowly22:12
clarkband it says it sent the message at that time22:12
JayFI can guarantee it didn't hit my client at that time; and 22:01:40 is much too late for it to be like, client lag22:13
clarkbit then got a second event (the one generated by your comment) that it decided it needed to post for as well22:14
clarkband that one seems to be what generated the message you saw22:14
JayFYep, and looking back22:14
JayFI see a 20:38:59 too22:14
JayFI did issue a recheck at 21:4522:15
JayFbetween those events22:15
clarkboh actually no I think it was the comment from the arm pipeline22:15
clarkbso ya I think this is fine other than it triggering off of any zuul comment and not necessarily the one that changes the state22:15
JayFokay; makes sense. extra notifications are not so bad just very, very confusingly timed there22:15
clarkbya that's what it is. It posted at 20:38 when the -2 happened. Then at 22:01 it posts again in response to the arm64 pipeline comment22:15
JayFdoesn't help that I missed that it notified at the right time as well22:15
JayFI may have about 40000 patches up with "Stable only: " or "CI: " prefix across multiple stable branches; it's all mixing together 22:16
ianwclarkb: hrm, i guess you're right in that is using the zuul-run playbooks22:17
fungiJayF: clarkb: what triggered exactly? looking through the comments on that change i don't see anything amiss22:17
JayFfungi: tl;dr: a message popped at 20:38:59 that a change failed verification. This was accurate. Another identical message that it failed verification posted at 22:01:40, which appears to have been sprung by the ARM64 pipeline notification 22:18
fungizuul left the verified -2 result at 20:38, then the next comment i see from it is at 22:01 when it says the arm jobs passed22:18
fungiJayF: what does "popped up" mean in this context?22:18
JayFIRC robot messages in #openstack-ironic22:18
funginot the comment i guess?22:18
JayFfrom opendevreview 22:18
fungioh! i totally missed you were talking about irc there22:19
fungigot it. i think i've never noticed that behavior because none of the projects i work actively on have it set to do notifications on failure results22:19
fungijust new uploads and merges22:20
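The duplicate notice discussed above came from the bot reacting to any zuul comment on a change that still carried a failing vote. One way to avoid the repeat would be to announce a failure only when the Verified vote changes. The sketch below illustrates that idea under stated assumptions; it is not how the opendevreview bot is implemented, and the class, method and event fields are hypothetical.

```python
class FailureNotifier:
    """Announce a verification failure once per state change.

    Hypothetical sketch: assumes each incoming comment event carries the
    change's current Verified vote as an integer.
    """

    def __init__(self):
        # change id -> last Verified vote seen for that change
        self._last_vote = {}

    def handle_comment(self, change_id, verified_vote):
        """Return True if this event should produce a failure notice."""
        previous = self._last_vote.get(change_id)
        self._last_vote[change_id] = verified_vote
        # Notify only for a negative vote that differs from the last one
        # seen, so a later comment repeating the same vote stays quiet.
        return verified_vote < 0 and verified_vote != previous
```

With this rule the 20:38 event (first -2) would be announced, while the 22:01 arm64 pipeline comment, arriving with the vote still at -2, would not.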
opendevreviewIan Wienand proposed opendev/system-config master: [wip] switch testing bridge name to
opendevreviewClark Boylan proposed opendev/system-config master: DNM testing an upstream gerrit change
clarkboh I needed to force a failure too to hold the test nodes22:34
opendevreviewClark Boylan proposed opendev/system-config master: DNM testing an upstream gerrit change
clarkbinfra-root ^ I'm going to hold the gerrit 3.5 job for that and then use it to test that ssh looks happy with rsa keys22:35
clarkbif it does I'll submit the upstream fix and we can deploy that and everyone can use rsa again22:35
fungiawesome, thanks!22:36
clarkbin theory if I can ssh from my local machines to that held node using an rsa key it is working as my openssh is new enough22:37
fungiyeah, i have overrides in my .ssh/config for review.opendev.org22:38
clarkboh heh they already submitted it upstream. Well we'll test it anyway :)22:38
fungii could test with a non-overridden config22:39
fungiin fact, if i ssh by ip address, then my overrides won't be applied anyway22:39
clarkbya I tested this pretty extensively when I fixed 3.6. So I'm 99% sure it will work22:40
clarkbbut I figure being 100% sure is worthwhile22:40
opendevreviewIan Wienand proposed opendev/system-config master: [wip] switch testing bridge name to
clarkbI think the test gerrit instance is working23:33
clarkbfungi: is the host if you want to test. I logged in via the web ui as the zuul user (you click on the button on the login page) and then added an rsa key23:34
clarkbwhen I run ssh -v -i throwaway_rsa_key I get debug1: kex_input_ext_info: server-sig-algs=<...,rsa-sha2-512,rsa-sha2-256,ssh-rsa>23:34
clarkbI'm going to update my change to be a rebuild gerrit change so that we can get new images and hopefully deploy that soon23:35
opendevreviewClark Boylan proposed opendev/system-config master: Update our Gerrit images
clarkbalso I'm fairly certain I would've needed an override with my openssh client so the fact that it works at all is a good indication it is fixed23:40
fungi****    Welcome to Gerrit Code Review    ****23:42
fungiHi Zuul, you have successfully connected over SSH.23:42
fungiConnection to closed.23:42
clarkbfungi: and that was port 29418 right?23:42
clarkbif so I'll go ahead and delete the autohold and I think we can proceed with landing 861117 when we want to plan a gerrit restart23:43
fungidebug1: kex_input_ext_info: server-sig-algs=<...,rsa-sha2-512,rsa-sha2-256,ssh-rsa>23:44
fungidebug3: sign_and_send_pubkey: signing using rsa-sha2-512 SHA256:...23:44
fungialso it would have to have been port 29418 for me to get the gerrit banner23:45
clarkbautohold deleted. Thank you for helping to test23:45
fungiany time! thanks for fixing it23:45
clarkbnow I'm trying to remember all the people who might've done an override. I guess we can send email to the mailing lists23:46
fungiyeah, just cast a wide net once we're upgraded23:50
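The check clarkb and fungi did by eye on the "ssh -v" output can also be scripted. Recent OpenSSH clients no longer offer SHA-1 "ssh-rsa" signatures by default, so an RSA key works without client overrides only if the server advertises rsa-sha2-256 or rsa-sha2-512 in server-sig-algs. The sketch below parses that debug line; the function names are made up for illustration, and it assumes the debug text has already been captured as a string.

```python
import re


def server_sig_algs(debug_output):
    """Extract the algorithm list from an ssh -v server-sig-algs line."""
    match = re.search(r"server-sig-algs=<([^>]*)>", debug_output)
    if match is None:
        return []
    return [alg for alg in match.group(1).split(",") if alg]


def supports_rsa_sha2(algs):
    """True if RSA keys can be used with SHA-2 signatures."""
    return any(alg in ("rsa-sha2-256", "rsa-sha2-512") for alg in algs)


# The line quoted in the log, with its elided prefix left as-is.
line = (
    "debug1: kex_input_ext_info: "
    "server-sig-algs=<...,rsa-sha2-512,rsa-sha2-256,ssh-rsa>"
)
print(supports_rsa_sha2(server_sig_algs(line)))
```

A server that lists only ssh-rsa, or sends no server-sig-algs extension at all, would make the second function return False.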
