Friday, 2024-05-10

fricklerclarkb: fungi: yes, iiuc the release part of the mirror update script is failing due to the name mismatch, sorry I didn't get to look at that this week08:00
fungiaha, i see... we had mirror.deb-nautilus, mirror.deb-octopus, mirror.deb-quincy but now mirror.ceph-deb-reef13:13
clarkbtonyb: (and other infra-root) I can be around today for meetpad cluster deployment and/or gerrit image and db updates if we're still wanting to do that15:07
clarkbI'm a bit wary of doing gerrit upgrade stuff if I'm the only one around otherwise I'd just go ahead and send those changes in15:12
clarkbno new wiki cert notices since I updated the cert \o/15:13
tonybI'm around.  I need to go for a short run now(ish)  but after that I'm still keen to do the meetpad and gerrit updates if you are15:22
clarkbtonyb: sounds good. I'll go ahead and approve the two gerrit changes now then (they don't modify the inventory so should apply much more quickly than the meetpad change)15:28
clarkbfor the gerrit restart my plan is to pull, down, move waiting queue aside, up only mariadb and wait for the upgrade to complete, then up the gerrit service proper. Slightly more involved than usual but not too bad15:38
clarkbdoes this look good for when we're ready to actually restart things: #status notice There will be a short Gerrit downtime while we update a database and our container image16:12
opendevreviewMerged opendev/system-config master: Actually rebuild Gerrit images to get a new 3.9 image  https://review.opendev.org/c/opendev/system-config/+/91833316:46
opendevreviewMerged opendev/system-config master: Upgrade Gerrit's backend database to MariaDB 10.11  https://review.opendev.org/c/opendev/system-config/+/91684816:46
clarkbdeployment for that is running now. When that is done I'll double check the docker compose file then I guess plan to proceed when tonyb is back16:48
clarkbyup the docker compose file lgtm. I'm ready when you are tonyb 16:52
tonybBack.17:06
tonybSorry the short run turned into a run with a dog, then the short run, and then a trip to the coffee shop17:06
tonybclarkb: ready when you are.17:07
clarkbtonyb: ok, do you want me to start a root screen or should I just go for it and report back here?17:08
tonybI started a screen.17:09
tonybI expect it to be reasonably straightforward17:09
clarkbya my plan is to do the image pull, send the notice, then down containers, move waiting queue aside, up only mariadb and check the upgrade then up gerrit17:10
tonybSounds good to me17:10
clarkbtonyb: does this notice look ok: #status notice There will be a short Gerrit downtime while we update a database and our container image17:10
tonybclarkb: LGTM17:12
clarkb#status notice There will be a short Gerrit downtime while we update a database and our container image17:13
opendevstatusclarkb: sending notice17:13
-opendevstatus- NOTICE: There will be a short Gerrit downtime while we update a database and our container image17:13
opendevstatusclarkb: finished sending notice17:15
clarkbok proceeding with the updates now17:16
tonyb++17:16
clarkbGerrit Code Review 3.8.5-18-g518840c5b8-dirty ready17:18
clarkbThe web UI loads for me too17:19
tonybYup17:19
clarkbstill waiting on diffs to be functional. Once those are happy I'll check toggling of the reviewed flag (which is all that db is used for)17:19
tonybThe one I checked was fine.17:20
clarkbya diffs are loading. If I review a file I haven't reviewed before it goes into a reviewed state. I've not been able to go from reviewed -> unreviewed successfully yet17:21
clarkbI'm tailing the error log in another terminal and i don't see anything amiss there. Seems to mostly be working17:22
tonybI tried a recheck but it hasn't reached zuul yet17:22
clarkbok if I uncheck the check mark within the diff view page then it gets marked unreviewed. But if I click the mark unreviewed button at the top level change page that doesn't seem to work. Considering it works at all I don't think that is a db problem and may be a cache thing or a bug17:23
clarkbgmann just pushed a change to ozj and it was reported by the bot in #openstack-infra so change push is working17:23
clarkblooks like the top level toggle may be working for me as well now. I think that may be a caching thing17:24
clarkbtonyb: what change did you recheck?17:24
clarkbgmann's change made it in17:24
tonybhttps://review.opendev.org/c/openstack/requirements/+/91656517:24
clarkbhrm17:24
clarkboh did you use two lines? I think recheck may only work on single line comments17:25
gmannyeah, recheck on this worked fine https://review.opendev.org/c/openstack/osc-placement/+/90461217:25
clarkbdue to how the regex checks work17:25
clarkbgmann: thanks! I think the reason is that tonyb's recheck comment has multiple lines in it.17:25
gmannohk17:25
clarkbso far this is looking good to me17:26
tonybclarkb: Oh interesting.   I don't think I've ever seen that before.17:26
tonybclarkb: Yup.  Looks good to me.17:26
gmannyeah, multiline also used to work right? I think I have done it many times in the past, if I am remembering correctly 17:26
clarkband the 3.9 image did promote according to the zuul comment on that change. I should be able to do some gerrit testing again next week with a more up to date image17:26
tonybclarkb: awesome17:27
clarkbgmann: it worked on 2.13 but then the 3.2 upgrade changed the format slightly or something (there is markup in the event that you have to handle with the regex and you have to do a multiline regex now and we just don't)17:27
gmannclarkb: i see. 17:28
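Note on the multi-line recheck behaviour discussed above: the exact comment regex used by the Zuul pipelines is not shown in this log, so the sketch below uses a hypothetical, simplified pattern (`^recheck$`) only to illustrate the general effect clarkb describes. Without multi-line matching, an anchored pattern only matches when the whole comment is the single word; with multi-line matching, a "recheck" on its own line inside a longer comment also matches.

```python
import re

# Hypothetical, simplified trigger patterns. The real pipeline regex is not
# shown in the log and also has to cope with markup in the Gerrit event.
SINGLE_LINE = re.compile(r"^recheck$")
MULTI_LINE = re.compile(r"^recheck$", re.MULTILINE)


def triggers(pattern: re.Pattern, comment: str) -> bool:
    """Return True if the review comment would match the trigger pattern."""
    return pattern.search(comment) is not None


one_line = "recheck"
two_lines = "Looks like an unrelated failure.\nrecheck"

print(triggers(SINGLE_LINE, one_line))   # single-line comment matches
print(triggers(SINGLE_LINE, two_lines))  # multi-line comment does not match
print(triggers(MULTI_LINE, two_lines))   # matches once MULTILINE is enabled
```

This is only an illustration of anchored matching with and without `re.MULTILINE`; it does not reproduce the OpenDev configuration.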
clarkbtonyb: I suppose we can proceed by approving meetpad changes now? I was also going to suggest you double check bridge can ssh to the new hosts as root using their ip addresses to double check we don't have to update known hosts (should be automated now but may be worth checking upfront)17:30
tonybOkay.  Yup I can do that.  I was just looking at those changes.17:31
clarkbtonyb: any objection to me exiting and closing the screen on review? I just read through the log and I'm happy with it and don't think we need to keep that running any longer17:33
tonybGo for it17:33
clarkbdone, thank you for the extra set of eyeballs on that. Feels good to get that out of the way as it sets up upgrade testing for next week17:34
tonybclarkb: root@bridge can ssh into the new hosts as expected17:34
tonybclarkb: You're very welcome17:35
clarkbtonyb: we should probably land the dns change first and make sure it is applied. I think some of the meetpad configs rely on dns17:36
clarkbthen approve the system-config change17:36
tonybYup  That was my plan.  I'm just double checking the changes deployed as expected to the {meetpad,jvb}01 servers17:37
opendevreviewMerged opendev/zone-opendev.org master: Add DNS records for new Meetpad and JVB servers  https://review.opendev.org/c/opendev/zone-opendev.org/+/91836117:42
tonybOkay DNS change is approved, I've double checked and I think the existing meetpad servers have the correct settings.17:42
tonybI'll add {meetpad,jvb}01 to the emergency file17:42
clarkb++17:42
clarkbI was suddenly concerned about LE host/group vars but realized my system-config was out of date. I thought i had reviewed the fix for that and indeed i just needed to pull17:46
tonyb#phew!17:46
tonybI can ssh from bridge to the new servers by ip (v4 and v6) and name but the fingerprints aren't being saved.  Is that expected?17:49
clarkbthose new dns records resolve for me17:49
clarkbtonyb: no, but maybe that has to do with how we manage the fingerprints in the inventory? Eg running jobs are overwriting the file?17:49
tonybAh that could be17:50
tonybclarkb: Okay for me to +A 918362: Add inventory records for new Meetpad and JVB servers | https://review.opendev.org/c/opendev/system-config/+/918362 ?17:50
clarkbtonyb: ya it looks like the known hosts are stored in /etc/ssh/ssh_known_hosts17:51
clarkband that is managed by ansible so ya my idea to check that is probably not useful. I don't know why it doesn't record things in the user file though. Maybe we have that disabled in config?17:51
clarkbtonyb: the only other thing I notice is when I ssh into the new servers it says they could use reboots17:52
clarkbtonyb: maybe quickly reboot them to apply patches then approve?17:52
clarkbdns is resolving and LE looks right and ssh should work so I think we're good otherwise17:52
tonybOkay rebooting them.17:53
tonybOh dpawlik has migrated onto logscraper02 and logscraper01 is no longer needed.   Any objections with me taking a snapshot and turning logscraper01 off after the meetpad updates?17:54
clarkbtonyb: that seems fine to me17:55
tonybOkay servers are back17:55
tonyb918362 approved17:55
clarkbit will probably take a while for 918362's deploy jobs to run since we're updating the inventory it hits everything with the corresponding infra-prod-* job18:00
tonybOkay18:01
clarkbI've rechecked 893571 in order to get new gerrit upgrade test nodes held with the latest images18:35
clarkbbut it will probably be Monday before I do any testing on those18:35
opendevreviewMerged opendev/system-config master: Add inventory records for new Meetpad and JVB servers  https://review.opendev.org/c/opendev/system-config/+/91836218:35
clarkbmeetpad is about halfway through the list of jobs for ^ I'm going to guess 38 minutes from now the job starts18:37
tonybOkay, I might grab lunch while the first services go through the deploy pipeline18:38
clarkbit's still working its way through. I'm going to eat something too18:53
clarkbmy estimate was low19:15
tonybEnjoy.19:15
opendevreviewGhanshyam proposed openstack/project-config master: End gate and update acl for retiring solum projects  https://review.opendev.org/c/openstack/project-config/+/91921219:23
clarkbthat reminds me I said I would do that for devstack-gate /me makes another reminder note19:27
clarkbtonyb: the meetpad job should be starting soon19:28
clarkbyup starting now19:28
tonybYup19:28
clarkbjob is done and it reports success19:33
clarkbtonyb: do we edit /etc/hosts locally to point meetpad.o.o at meetpad02 and do a test call and if that works update meetpad.o.o's DNS record?19:33
tonybYup looking at the config on the new nodes it looks good as well19:33
clarkbthen after that we can shutdown services on the old servers just to make sure we're not somehow relying on them unexpectedly and then delete the servers at $futuretime19:33
tonybclarkb: Sounds good19:34
tonyblet me relocate19:34
clarkbI'm in https://meetpad.opendev.org/isitbroken19:35
clarkbtonyb: after we update DNS we should also see if we can determine that jvb02 is working. But that is less critical as meetpad proper also runs a jvb19:39
opendevreviewTony Breeds proposed opendev/zone-opendev.org master: Switch meetpad to new server  https://review.opendev.org/c/opendev/zone-opendev.org/+/91922019:44
clarkboh shoot it just occurred to me that a good prep step would be to set the old dns record to 5 minutes19:44
clarkboh well19:44
clarkb*old dns record ttl19:44
tonybclarkb: Yup.  I guess we shutdown the jvb container on meetpad02 and verify everything still works19:44
tonybAhhh yes that would have been smart19:45
clarkbya or I think maybe if we create two calls it will round robin them and at least one will end up on jvb02 and we should see that in jvb02's logs19:45
tonybOkay.19:45
tonybeeek19:46
tonyb1org.jivesoftware.smack.SmackException$EndpointConnectionException: The following addresses failed: 'RFC 6120 A/AAAA Endpoint + [/104.239.240.194:5222] (/104.239.240.194:5222)' failed because: java.net.NoRouteToHostException: No route to host19:46
tonybfrom jvb0219:47
clarkbtonyb: ya I was worried about that. I think jvb's are configured to speak xmpp to meetpad.o.o which is still pointing at 0119:47
tonybYup.19:48
tonybI can tweak the config manually19:48
clarkbafter DNS updates we can restart services on jvb02 and in theory that message goes away. The reason for the failure is we block that port in the firewall except for nodes in the cluster but since we put the other hosts in the emergency file they aren't in the rules19:48
tonybbut should I?19:48
clarkbno I think this is fine and we can just accept that we have to wait for dns to update and do a service restart19:48
clarkbotherwise we would have to coordinate more stuff in config with changes?19:48
clarkbthough interestingly this means jvb01 may connect to new meetpad after dns updates19:49
tonybActually it won't go away as XMPP_SERVER was explicitly set to the IP address of meetpad19:49
tonybjvb01 has the same thing so it will stay talking to meetpad0119:49
clarkbtonyb: where is that configured?19:50
clarkbmeetpad_jvb_xmpp_server that var?19:50
clarkbinventory/service/group_vars/jvb.yaml:meetpad_jvb_xmpp_server: "{{ hostvars['meetpad01.opendev.org'].ansible_host }}"19:50
tonybYup there19:51
clarkbtonyb: I guess we should have a change that sets it to meetpad02 and then let it update. And in that case I agree you can manually modify it in the interim19:51
tonybOkay19:51
opendevreviewGhanshyam proposed openstack/project-config master: Retire Solum: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/91922319:51
clarkbbut I would say it isn't urgent. The meetpad service will work with the one jvb. It would be good to address before the next event like a ptg though.19:53
clarkbmostly just want to make it clear the service is fine from an end user perspective until we add load to it19:53
opendevreviewTony Breeds proposed opendev/system-config master: Switch +meetpad_jvb_xmpp_server to the correct server  https://review.opendev.org/c/opendev/system-config/+/91922419:53
clarkbtonyb: +meetpad? I don't think it matters so I've approved it19:53
tonybYeah I was a little too generous with copy-paste selection19:54
tonybOkay after manually fixing jvb02 it looks like it connected just fine19:56
opendevreviewMerged opendev/zone-opendev.org master: Switch meetpad to new server  https://review.opendev.org/c/opendev/zone-opendev.org/+/91922020:02
fungilooks like i missed out on upgrade excitement, but pretty sure i had more fun hiking20:14
tonybfungi: Yes I'm sure you did.20:14
clarkbfungi: there is more to come, but it is mostly just followup cleanups20:14
clarkbmeetpad dns has updated for me20:19
clarkbin an hour or so we can shutdown services on the old servers20:20
tonybSounds good20:20
tonybDNS hasn't updated yet20:20
clarkbtonyb: the job ran, but you may have older cached records20:21
tonybYeah it's cache here20:21
tonybdig meetpad.opendev.org hasn't updated whereas dig meetpad.opendev.org @ns03.opendev.org. has.20:23
opendevreviewMerged opendev/system-config master: Switch +meetpad_jvb_xmpp_server to the correct server  https://review.opendev.org/c/opendev/system-config/+/91922420:47
clarkbtonyb: I think there is a bug in the infra-prod-service-meetpad job where ^ doesn't trigger it to run21:09
clarkbbut you fixed it by hand and we can just check that the daily runs noop next week21:09
clarkbI think in about 10 minutes we'll be safe to shutdown the old services21:10
opendevreviewGhanshyam proposed openstack/project-config master: End gate and update acl for retiring solum projects  https://review.opendev.org/c/openstack/project-config/+/91921221:13
opendevreviewGhanshyam proposed openstack/project-config master: End gate and update acl for retiring senlin projects  https://review.opendev.org/c/openstack/project-config/+/91934821:20
opendevreviewGhanshyam proposed openstack/project-config master: Retire Solum: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/91922321:21
clarkbmy local dns records have rolled over their TTLs so we should be good to shutdown old services now21:23
tonybOkay21:23
clarkbtonyb: is your local cache seeing the up to date records?21:23
tonybYup21:24
clarkblooks like both servers are shutdown now. I actually logged into meetpad01 just before systemd said "go away" :)21:27
tonybLOL21:27
tonybOkay they're shutoff in rax now21:30
clarkbtonyb: fwiw I had a docker-compose down in my head, but this works equally well and maybe even better as they will consume fewer resources this way21:31
tonybclarkb: Ahh okay.   That would have worked just as well ;P21:32
tonybclarkb: I think it's all okay based on my quick test21:34
tonybit's bad timing but I have to pop to the store21:34
clarkbyup I think everything seems happy21:35
clarkbas noted the only leftover todo is to check that jvb02 gets the expected config from the daily jobs21:36
clarkband eventually we clean up the old servers21:36
clarkbthanks!21:36
clarkbI myself am hearing the call of the weekend21:36
opendevreviewGhanshyam proposed openstack/project-config master: Retire Senlin: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/91935521:46
opendevreviewGhanshyam proposed openstack/project-config master: End gate and update acl for retiring murano projects  https://review.opendev.org/c/openstack/project-config/+/91935922:37
opendevreviewGhanshyam proposed openstack/project-config master: Retire Murano: remove project from infra  https://review.opendev.org/c/openstack/project-config/+/91937123:12
