frickler | clarkb: fungi: yes, iiuc the release part of the mirror update script is failing due to the name mismatch, sorry I didn't get to look at that this week | 08:00 |
---|---|---|
fungi | aha, i see... we had mirror.deb-nautilus, mirror.deb-octopus, mirror.deb-quincy but now mirror.ceph-deb-reef | 13:13 |
clarkb | tonyb: (and other infra-root) I can be around today for meetpad cluster deployment and/or gerrit image and db updates if we're still wanting to do that | 15:07 |
clarkb | I'm a bit wary of doing gerrit upgrade stuff if I'm the only one around otherwise I'd just go ahead and send those changes in | 15:12 |
clarkb | no new wiki cert notices since I updated the cert \o/ | 15:13 |
tonyb | I'm around. I need to go for a short run now(ish) but after that I'm still keen to do the meetpad and gerrit updates if you are | 15:22 |
clarkb | tonyb: sounds good. I'll go ahead and approve the two gerrit changes now then (they don't modify the inventory so should apply much more quickly than the meetpad change) | 15:28 |
clarkb | for the gerrit restart my plan is to pull, down, move waiting queue aside, up only mariadb and wait for the upgrade to complete, then up the gerrit service proper. Slightly more involved than usual but not too bad | 15:38 |
clarkb | does this look good for when we're ready to actually restart things: #status notice There will be a short Gerrit downtime while we update a database and our container image | 16:12 |
opendevreview | Merged opendev/system-config master: Actually rebuild Gerrit images to get a new 3.9 image https://review.opendev.org/c/opendev/system-config/+/918333 | 16:46 |
opendevreview | Merged opendev/system-config master: Upgrade Gerrit's backend database to MariaDB 10.11 https://review.opendev.org/c/opendev/system-config/+/916848 | 16:46 |
clarkb | deployment for that is running now. When that is done I'll double check the docker compose file then I guess plan to proceed when tonyb is back | 16:48 |
clarkb | yup the docker compose file lgtm. I'm ready when you are tonyb | 16:52 |
tonyb | Back. | 17:06 |
tonyb | Sorry the short run turned into a run with a dog, then the short run, and then a trip to the coffee shop | 17:06 |
tonyb | clarkb: ready when you are. | 17:07 |
clarkb | tonyb: ok, do you want me to start a root screen or should I just go for it and report back here? | 17:08 |
tonyb | I started a screen. | 17:09 |
tonyb | I expect it to be reasonably straightforward | 17:09 |
clarkb | ya my plan is to do the image pull, send the notice, then down containers, move waiting queue aside, up only mariadb and check the upgrade then up gerrit | 17:10 |
tonyb | Sounds good to me | 17:10 |
clarkb | tonyb: does this notice look ok: #status notice There will be a short Gerrit downtime while we update a database and our container image | 17:10 |
tonyb | clarkb: LGTM | 17:12 |
clarkb | #status notice There will be a short Gerrit downtime while we update a database and our container image | 17:13 |
opendevstatus | clarkb: sending notice | 17:13 |
-opendevstatus- | NOTICE: There will be a short Gerrit downtime while we update a database and our container image | 17:13 |
opendevstatus | clarkb: finished sending notice | 17:15 |
clarkb | ok proceeding with the updates now | 17:16 |
tonyb | ++ | 17:16 |
clarkb | Gerrit Code Review 3.8.5-18-g518840c5b8-dirty ready | 17:18 |
clarkb | The web UI loads for me too | 17:19 |
tonyb | Yup | 17:19 |
clarkb | still waiting on diffs to be functional. Once those are happy I'll check toggling of the reviewed flag (which is all that db is used for) | 17:19 |
tonyb | The one I checked was fine. | 17:20 |
clarkb | ya diffs are loading. If I review a file I haven't reviewed before it goes into a reviewed state. I've not been able to go from reviewed -> unreviewed successfully yet | 17:21 |
clarkb | I'm tailing the error log in another terminal and i don't see anything amiss there. Seems to mostly be working | 17:22 |
tonyb | I tried a recheck but it hasn't reached zuul yet | 17:22 |
clarkb | ok if I uncheck the check mark within the diff view page then it gets marked unreviewed. But if I click the mark unreviewed button at the top level change page that doesn't seem to work. Considering it works at all I don't think that is a db problem and may be a cache thing or a bug | 17:23 |
clarkb | gmann just pushed a change to ozj and it was reported by the bot in #openstack-infra so change push is working | 17:23 |
clarkb | looks like the top level toggle may be working for me as well now. I think that may be a caching thing | 17:24 |
clarkb | tonyb: what change did you recheck? | 17:24 |
clarkb | gmann's change made it in | 17:24 |
tonyb | https://review.opendev.org/c/openstack/requirements/+/916565 | 17:24 |
clarkb | hrm | 17:24 |
clarkb | oh did you use two lines? I think recheck may only work on single line comments | 17:25 |
gmann | yeah, recheck on this worked fine https://review.opendev.org/c/openstack/osc-placement/+/904612 | 17:25 |
clarkb | due to how the regex checks work | 17:25 |
clarkb | gmann: thanks! I think the reason is that tonyb's recheck comment has multiple lines in it. | 17:25 |
gmann | ohk | 17:25 |
clarkb | so far this is looking good to me | 17:26 |
tonyb | clarkb: Oh interesting. I don't think I've ever seen that before. | 17:26 |
tonyb | clarkb: Yup. Looks good to me. | 17:26 |
gmann | yeah, multiline also used to work right? I think I have done it many times in the past if I am remembering correctly | 17:26 |
clarkb | and the 3.9 image did promote according to the zuul comment on that change. I should be able to do some gerrit testing again next week with a more up to date image | 17:26 |
tonyb | clarkb: awesome | 17:27 |
clarkb | gmann: it worked on 2.13 but then the 3.2 upgrade changed the format slightly or something (there is markup in the event that you have to handle with the regex and you have to do a multiline regex now and we just don't) | 17:27 |
gmann | clarkb: i see. | 17:28 |
clarkb | tonyb: I suppose we can proceed by approving meetpad changes now? I was also going to suggest you double check bridge can ssh to the new hosts as root using their ip addresses to double check we don't have to update known hosts (should be automated now but may be worth checking upfront) | 17:30 |
tonyb | Okay. Yup I can do that. I was just looking at those changes. | 17:31 |
clarkb | tonyb: any objection to me exiting and closing the screen on review? I just read through the log and I'm happy with it and don't think we need to keep that running any longer | 17:33 |
tonyb | Go for it | 17:33 |
clarkb | done, thank you for the extra set of eyeballs on that. Feels good to get that out of the way as it sets up upgrade testing for next week | 17:34 |
tonyb | clarkb: root@bridge can ssh into the new hosts as expected | 17:34 |
tonyb | clarkb: You're very welcome | 17:35 |
clarkb | tonyb: we should probably land the dns change first and make sure it is applied. I think some of the meetpad configs rely on dns | 17:36 |
clarkb | then approve the system-config change | 17:36 |
tonyb | Yup That was my plan. I'm just double checking the changes deployed as expected to the {meetpad,jvb}01 servers | 17:37 |
opendevreview | Merged opendev/zone-opendev.org master: Add DNS records for new Meetpad and JVB servers https://review.opendev.org/c/opendev/zone-opendev.org/+/918361 | 17:42 |
tonyb | Okay DNS change is approved, I've double checked and I think the existing meetpad servers have the correct settings. | 17:42 |
tonyb | I'll add {meetpad,jvb}01 to the emergency file | 17:42 |
clarkb | ++ | 17:42 |
clarkb | I was suddenly concerned about LE host/group vars but realized my system-config was out of date. I thought i had reviewed the fix for that and indeed i just needed to pull | 17:46 |
tonyb | #phew! | 17:46 |
tonyb | I can ssh from bridge to the new servers by ip (v4 and v6) and name but the fingerprints aren't being saved. Is that expected? | 17:49 |
clarkb | those new dns records resolve for me | 17:49 |
clarkb | tonyb: no, but maybe that has to do with how we manage the fingerprints in the inventory? Eg running jobs are overwriting the file? | 17:49 |
tonyb | Ah that could be | 17:50 |
tonyb | clarkb: Okay for me to +A 918362: Add inventory records for new Meetpad and JVB servers | https://review.opendev.org/c/opendev/system-config/+/918362 ? | 17:50 |
clarkb | tonyb: ya it looks like the known hosts are stored in /etc/ssh/ssh_known_hosts | 17:51 |
clarkb | and that is managed by ansible so ya my idea to check that is probably not useful. I don't know why it doesn't record things in the user file though. Maybe we have that disabled in config? | 17:51 |
clarkb | tonyb: the only other thing I notice is when ssh into the new server it says they could use reboots | 17:52 |
clarkb | tonyb: maybe quickly reboot them to apply patches then approve? | 17:52 |
clarkb | dns is resolving and LE looks right and ssh should work so I think we're good otherwise | 17:52 |
tonyb | Okay rebooting them. | 17:53 |
tonyb | Oh dpawlik has migrated onto logscraper02 and logscraper01 is no longer needed. Any objections with me taking a snapshot and turning logscraper01 off after the meetpad updates? | 17:54 |
clarkb | tonyb: that seems fine to me | 17:55 |
tonyb | Okay servers are back | 17:55 |
tonyb | 918362 approved | 17:55 |
clarkb | it will probably take a while for 918362's deploy jobs to run since we're updating the inventory it hits everything with the corresponding infra-prod-* job | 18:00 |
tonyb | Okay | 18:01 |
clarkb | I've rechecked 893571 in order to get new gerrit upgrade test nodes held with the latest images | 18:35 |
clarkb | but it will probably be Monday before I do any testing on those | 18:35 |
opendevreview | Merged opendev/system-config master: Add inventory records for new Meetpad and JVB servers https://review.opendev.org/c/opendev/system-config/+/918362 | 18:35 |
clarkb | meetpad is about halfway through the list of jobs for ^ I'm going to guess 38 minutes from now the job starts | 18:37 |
tonyb | Okay, I might grab lunch while the first services go through the deploy pipeline | 18:38 |
clarkb | it's still working its way through. I'm going to eat something too | 18:53 |
clarkb | my estimate was low | 19:15 |
tonyb | Enjoy. | 19:15 |
opendevreview | Ghanshyam proposed openstack/project-config master: End gate and update acl for retiring solum projects https://review.opendev.org/c/openstack/project-config/+/919212 | 19:23 |
clarkb | that reminds me I said I would do that for devstack-gate /me makes another reminder note | 19:27 |
clarkb | tonyb: the meetpad job should be starting soon | 19:28 |
clarkb | yup starting now | 19:28 |
tonyb | Yup | 19:28 |
clarkb | job is done and it reports success | 19:33 |
clarkb | tonyb: do we edit /etc/hosts locally to point meetpad.o.o at meetpad02 and do a test call and if that works update meetpad.o.o's DNS record? | 19:33 |
tonyb | Yup looking at the config on the new nodes it looks good as well | 19:33 |
clarkb | then after that we can shutdown services on the old servers just to make sure we're not somehow relying on them unexpectedly and then delete the servers at $futuretime | 19:33 |
tonyb | clarkb: Sounds good | 19:34 |
tonyb | let me relocate | 19:34 |
clarkb | I'm in https://meetpad.opendev.org/isitbroken | 19:35 |
clarkb | tonyb: after we update DNS we should also see if we can determine that jvb02 is working. But that is less critical as meetpad proper also runs a jvb | 19:39 |
opendevreview | Tony Breeds proposed opendev/zone-opendev.org master: Switch meetpad to new server https://review.opendev.org/c/opendev/zone-opendev.org/+/919220 | 19:44 |
clarkb | oh shoot it just occurred to me that a good prep step would be to set the old dns record to 5 minutes | 19:44 |
clarkb | oh well | 19:44 |
clarkb | *old dns record ttl | 19:44 |
tonyb | clarkb: Yup. I guess we shutdown the jvb container on meetpad02 and verify everything still works | 19:44 |
tonyb | Ahhh yes that would have been smart | 19:45 |
clarkb | ya or I think maybe if we create two calls it will round robin them and at least one will end up on jvb02 and we should see that in jvb02's logs | 19:45 |
tonyb | Okay. | 19:45 |
tonyb | eeek | 19:46 |
tonyb | 1org.jivesoftware.smack.SmackException$EndpointConnectionException: The following addresses failed: 'RFC 6120 A/AAAA Endpoint + [/104.239.240.194:5222] (/104.239.240.194:5222)' failed because: java.net.NoRouteToHostException: No route to host | 19:46 |
tonyb | from jvb02 | 19:47 |
clarkb | tonyb: ya I was worried about that. I think jvb's are configured to speak xmpp to meetpad.o.o which is still pointing at 01 | 19:47 |
tonyb | Yup. | 19:48 |
tonyb | I can tweak the config manually | 19:48 |
clarkb | after DNS updates we can restart services on jvb02 and in theory that message goes away. The reason for the failure is we block that port in the firewall except for nodes in the cluster but since we put the other hosts in the emergency file they aren't in the rules | 19:48 |
tonyb | but should I? | 19:48 |
clarkb | no I think this is fine and we can just accept that we have to wait for dns to update and do a service restart | 19:48 |
clarkb | otherwise we would have to coordinate more stuff in config with changes? | 19:48 |
clarkb | though interestingly this means jvb01 may connect to new meetpad after dns updates | 19:49 |
tonyb | Actually it won't go away as XMPP_SERVER was explicitly set to the IP address of meetpad | 19:49 |
tonyb | jvb01 has the same thing so it will stay talking to meetpad01 | 19:49 |
clarkb | tonyb: where is that configured? | 19:50 |
clarkb | meetpad_jvb_xmpp_server that var? | 19:50 |
clarkb | inventory/service/group_vars/jvb.yaml:meetpad_jvb_xmpp_server: "{{ hostvars['meetpad01.opendev.org'].ansible_host }}" | 19:50 |
tonyb | Yup there | 19:51 |
clarkb | tonyb: I guess we should have a change that sets it to meetpad02 and then let it update. And in that case I agree you can manually modify it in the interim | 19:51 |
tonyb | Okay | 19:51 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire Solum: remove project from infra https://review.opendev.org/c/openstack/project-config/+/919223 | 19:51 |
clarkb | but I would say it isn't urgent. The meetpad service will work with the one jvb. It would be good to address before the next event like a ptg though. | 19:53 |
clarkb | mostly just want to make it clear the service is fine from an end user perspective until we add load to it | 19:53 |
opendevreview | Tony Breeds proposed opendev/system-config master: Switch +meetpad_jvb_xmpp_server to the correct server https://review.opendev.org/c/opendev/system-config/+/919224 | 19:53 |
clarkb | tonyb: +meetpad? I don't think it matters so I've approved it | 19:53 |
tonyb | Yeah I was a little too generous with copy-paste selection | 19:54 |
tonyb | Okay after manually fixing jvb02 it looks like it connected just fine | 19:56 |
opendevreview | Merged opendev/zone-opendev.org master: Switch meetpad to new server https://review.opendev.org/c/opendev/zone-opendev.org/+/919220 | 20:02 |
fungi | looks like i missed out on upgrade excitement, but pretty sure i had more fun hiking | 20:14 |
tonyb | fungi: Yes I'm sure you did. | 20:14 |
clarkb | fungi: there is more to come, but it is mostly just followup cleanups | 20:14 |
clarkb | meetpad dns has updated for me | 20:19 |
clarkb | in an hour or so we can shutdown services on the old servers | 20:20 |
tonyb | Sounds good | 20:20 |
tonyb | DNS hasn't updated yet | 20:20 |
clarkb | tonyb: the job ran, but you may have older cached records | 20:21 |
tonyb | Yeah it's cache here | 20:21 |
tonyb | dig meetpad.opendev.org hasn't updated whereas dig meetpad.opendev.org @ns03.opendev.org. has. | 20:23 |
opendevreview | Merged opendev/system-config master: Switch +meetpad_jvb_xmpp_server to the correct server https://review.opendev.org/c/opendev/system-config/+/919224 | 20:47 |
clarkb | tonyb: I think there is a bug in the infra-prod-service-meetpad job where ^ doesn't trigger it to run | 21:09 |
clarkb | but you fixed it by hand and we can just check that the daily runs noop next week | 21:09 |
clarkb | I think in about 10 minutes we'll be safe to shutdown the old services | 21:10 |
opendevreview | Ghanshyam proposed openstack/project-config master: End gate and update acl for retiring solum projects https://review.opendev.org/c/openstack/project-config/+/919212 | 21:13 |
opendevreview | Ghanshyam proposed openstack/project-config master: End gate and update acl for retiring senlin projects https://review.opendev.org/c/openstack/project-config/+/919348 | 21:20 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire Solum: remove project from infra https://review.opendev.org/c/openstack/project-config/+/919223 | 21:21 |
clarkb | my local dns records have rolled over their TTLs so we should be good to shutdown old services now | 21:23 |
tonyb | Okay | 21:23 |
clarkb | tonyb: is your local cache seeing the up to date records? | 21:23 |
tonyb | Yup | 21:24 |
clarkb | looks like both servers are shutdown now. I actually logged into meetpad01 just before systemd said "go away" :) | 21:27 |
tonyb | LOL | 21:27 |
tonyb | Okay they're shutoff in rax now | 21:30 |
clarkb | tonyb: fwiw I had a docker-compose down in my head, but this works equally well and maybe even better as they will consume fewer resources this way | 21:31 |
tonyb | clarkb: Ahh okay. That would have worked just as well ;P | 21:32 |
tonyb | clarkb: I think it's all okay based on my quick test | 21:34 |
tonyb | it's bad timing but I have to pop to the store | 21:34 |
clarkb | yup I think everything seems happy | 21:35 |
clarkb | as noted the only leftover todo is to check that jvb02 gets the expected config from the daily jobs | 21:36 |
clarkb | and eventually we clean up the old servers | 21:36 |
clarkb | thanks! | 21:36 |
clarkb | I myself am hearing the call of the weekend | 21:36 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire Senlin: remove project from infra https://review.opendev.org/c/openstack/project-config/+/919355 | 21:46 |
opendevreview | Ghanshyam proposed openstack/project-config master: End gate and update acl for retiring murano projects https://review.opendev.org/c/openstack/project-config/+/919359 | 22:37 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire Murano: remove project from infra https://review.opendev.org/c/openstack/project-config/+/919371 | 23:12 |
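
The multi-line recheck behaviour discussed between 17:25 and 17:27 comes down to how regex anchors work. The following is a minimal Python sketch, not the actual Zuul configuration: the pattern `^\s*recheck\s*$` is a hypothetical stand-in (the real pipeline regex is not quoted in the log), and the Gerrit event markup clarkb mentions is ignored. It only shows why an anchored pattern without a multiline flag misses a `recheck` that is not the entire comment.

```python
import re

# Hypothetical trigger pattern; the real Zuul pipeline regex is not shown in the log.
PATTERN = r"^\s*recheck\s*$"

# Without re.MULTILINE, ^ matches only at the start of the whole comment and
# $ only at its end (or just before a trailing newline).
SINGLE_LINE = re.compile(PATTERN, re.IGNORECASE)

# With re.MULTILINE, ^ and $ also match at the start and end of every line.
MULTI_LINE = re.compile(PATTERN, re.IGNORECASE | re.MULTILINE)


def triggers(pattern: re.Pattern, comment: str) -> bool:
    """Return True if the comment would be treated as a recheck request."""
    return pattern.search(comment) is not None


one_line = "recheck"
two_lines = "The failure looks unrelated.\nrecheck"

for label, comment in (("one line", one_line), ("two lines", two_lines)):
    print(label, triggers(SINGLE_LINE, comment), triggers(MULTI_LINE, comment))
```

Under these assumptions the single-line comment triggers with either pattern, while the two-line comment triggers only when the multiline flag is set, which matches the behaviour tonyb ran into.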
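
The wait before shutting down the old meetpad services (19:44 and 20:19 to 21:23) follows from DNS caching: a resolver that fetched the old record just before the update may keep serving it for a full TTL. The sketch below is only an illustration of that arithmetic. The date is arbitrary, the one-hour TTL is an assumption inferred from clarkb's "in an hour or so", and `earliest_safe_shutdown` is a hypothetical helper, not part of any OpenDev tooling.

```python
from datetime import datetime, timedelta


def earliest_safe_shutdown(record_updated_at: datetime, old_ttl_seconds: int) -> datetime:
    """Return the time after which no well-behaved resolver should still
    hold the old record, assuming caches honour the TTL."""
    return record_updated_at + timedelta(seconds=old_ttl_seconds)


# Assumed inputs: arbitrary date, update observed at 20:19 as in the log.
updated = datetime(2024, 1, 1, 20, 19)

# Assumed one-hour TTL on the old record.
print(earliest_safe_shutdown(updated, 3600))

# The prep step clarkb wished for: lowering the TTL to 5 minutes beforehand.
print(earliest_safe_shutdown(updated, 300))
```

With the assumed one-hour TTL the result lands around 21:19, close to when both participants reported their local caches had rolled over. Lowering the TTL ahead of the switch would have shortened the wait to a few minutes.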