clarkb | I'm working on an ansible update fwiw | 00:00 |
corvus | mm logs for opendev site lgtm | 00:01 |
corvus | ++exim | 00:01 |
fungi | exim started, sending my update now | 00:02 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add support for Ubuntu Focal to our mailman ansible https://review.opendev.org/c/opendev/system-config/+/808570 | 00:03 |
fungi | exim logged receipt of my message | 00:03 |
clarkb | I don't see it in the archive or my inbox yet | 00:04 |
fungi | yeah, mailman doesn't seem to have picked it up, checking its logs | 00:05 |
corvus | what's the queue id of the msg? | 00:07 |
corvus | exim q id | 00:08 |
fungi | 1mPZR5-00018k-Q3 | 00:08 |
fungi | claims it was handed off to mailman | 00:09 |
corvus | same cgi wrapper | 00:14 |
corvus | exit 0 is going to make exim think it succeeded | 00:15 |
corvus | probably need to change all uses of MAILMAN_SITE_DIR to HOST | 00:15 |
corvus | /var/lib/mailman/mail/mailman is the binary exim calls | 00:15 |
fungi | should i stop exim? | 00:15 |
corvus | yeah | 00:15 |
corvus | we're bitbucketing incoming messages | 00:16 |
fungi | stopped | 00:16 |
clarkb | the other place we set it outside of ansible to create lists is the init scripts for each of the sites in /etc/init.d/mailman-* | 00:16 |
corvus | and the exim config | 00:16 |
clarkb | ah yup | 00:17 |
clarkb | I'll work on the ansible side fixup | 00:17 |
fungi | okay, cleared my schedule for the next little while, we need the per-site configs updated, and the initscripts | 00:20 |
fungi | i guess i should stop all the mailman services as well | 00:20 |
fungi | doing that now | 00:21 |
clarkb | fungi: ++ | 00:21 |
fungi | all the mailman services are stopped | 00:21 |
fungi | i'll fix up the local copies of initscripts | 00:21 |
clarkb | the exim one is trickier and I'm not entirely sure how to update it | 00:23 |
clarkb | the current exim config does the lookup for mm_cfg.py but we want mm_cfg.py to do the lookup | 00:23 |
clarkb | I think I do something like environment = HOST=${lc:$domain} | 00:24 |
clarkb | other places we do lc::$domain. Is there a difference? | 00:24 |
clarkb | corvus: ^ | 00:25 |
clarkb | fungi: ya that's the exim file that needs updating. I'm not entirely sure of the syntax. I expect you and corvus understand it better | 00:27 |
fungi | yep, i'm looking | 00:27 |
fungi | right now the mailman_transport sets | 00:27 |
fungi | environment = MAILMAN_SITE_DIR=${lookup{${lc:$domain}}lsearch{/etc/mailman/sites}} | 00:27 |
clarkb | I think we want something like environment = HOST=${lc:$domain} | 00:28 |
fungi | i agree HOST=${lc:$domain} should suffice | 00:28 |
fungi | it's really just a subset of MAILMAN_SITE_DIR without the external file mapping | 00:28 |
clarkb | mm_cfg.py is doing the external file mapping for us now when HOST is set | 00:29 |
fungi | right, we essentially moved it there | 00:29 |
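A rough sketch of the mapping mm_cfg.py now performs, assuming /etc/mailman/sites keeps the same `host: directory` format the exim lsearch lookup consumed. The helper name and the example directory paths are illustrative, not the actual mm_cfg.py code:

```python
def site_dir_for_host(host, sites_file='/etc/mailman/sites'):
    # Resolve a virtual host (lowercased, as exim's ${lc:$domain} passes it)
    # to its mailman site directory. The mapping file uses exim
    # lsearch-style lines, e.g.:
    #   lists.opendev.org: /srv/mailman/opendev
    host = host.lower()
    with open(sites_file) as f:
        for line in f:
            line = line.strip()
            if line.startswith(host + ':'):
                return line.split(':', 1)[1].strip()
    return None

# mm_cfg.py would then key everything off the HOST variable exim exports,
# e.g. site_dir = site_dir_for_host(os.environ['HOST'])
```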
fungi | i've made that edit on the server as well as adding HOST to the initscripts if we want to start the opendev site up again, start exim, and send another test | 00:30 |
clarkb | fungi: the only question I had was above in the file we do ${lc::$domain}. Note the extra :, not sure what that difference is | 00:31 |
clarkb | but ya I think starting exim4 and opendev services and trying again is probably worthwhile | 00:32 |
fungi | oh, https://www.exim.org/exim-html-current/doc/html/spec_html/ch-string_expansions.html | 00:34 |
fungi | ${lc:<string>} | 00:34 |
fungi | This forces the letters in the string into lower-case | 00:34 |
clarkb | ah we want that | 00:34 |
fungi | i'm not finding any lc:: examples | 00:34 |
clarkb | fungi: it's in the file you edited | 00:34 |
clarkb | unless that is for some other file? it is in a different config section in the host vars area of ansible | 00:34 |
clarkb | I think we want lower case for the lookup to work so that lgtm | 00:34 |
fungi | ${<op>:<string>} | 00:35 |
fungi | The string is first itself expanded, and then the operation specified by <op> is applied to it. | 00:35 |
clarkb | aha | 00:35 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add support for Ubuntu Focal to our mailman ansible https://review.opendev.org/c/opendev/system-config/+/808570 | 00:36 |
clarkb | that change is trying to keep a running list of the updates necessary | 00:36 |
corvus | back; 1 sec | 00:37 |
corvus | exim.conf lgtm | 00:39 |
fungi | if there are no objections, i'll start mailman-opendev and exim4 and try to send another message | 00:40 |
clarkb | fungi: none from me | 00:40 |
clarkb | re 808570 I think what we can do is hold a node that it executed on in zuul then compare all the files to be happy it will do what we want when deployed | 00:40 |
clarkb | but that is feeling like a tomorrow activity | 00:40 |
fungi | cool, in that case i'll test again now | 00:43 |
fungi | sent | 00:45 |
fungi | and received | 00:45 |
clarkb | I confirm I have received it as well | 00:45 |
clarkb | http://lists.opendev.org/pipermail/service-discuss/2021-September/000282.html and it is in the archive | 00:45 |
fungi | i'll start the others and send similar replies to their discussion lists | 00:46 |
corvus | \o/ | 00:46 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add support for Ubuntu Focal to our mailman ansible https://review.opendev.org/c/opendev/system-config/+/808570 | 00:46 |
clarkb | that fixes an issue with my transcribing from the list server, and also adds a small fix for a thing that isn't on the list server but shouldn't be a problem there currently | 00:47 |
clarkb | tomorrow morning I can set it up to hold the test nodes for ^ and then we can cross check similar to when we switched from puppet to ansible | 00:48 |
clarkb | also yay | 00:48 |
clarkb | I'll write these notes on the etherpad | 00:48 |
clarkb | do we want to do one more reboot? | 00:51 |
clarkb | we had talked about that earlier though it may be ok as is? | 00:51 |
fungi | as soon as i see the last reply get through the list i'll status notice the upgrade to provide closure for those watching the status log, but yeah i think we can save the reboot for after the ansible changes are in | 00:51 |
fungi | folks can continue fiddling, but i need to call it a night momentarily | 00:52 |
fungi | and seems like we've gotten it to a safe enough state that we can tackle it with fresh eyes after a bit of sleep | 00:53 |
clarkb | agreed | 00:53 |
fungi | yay, lists at all 5 sites distributed my post, so i think we're good | 00:53 |
clarkb | I'm focusing on writing down my notes in the etherpad so that we can refer to that tomorrow | 00:53 |
fungi | thanks, i started that but then ended up heads-down adjusting files for new fixes | 00:54 |
clarkb | ianw: the one thing that might be good for you to look at is borg on that server. We commented out the cron jobs because borg is in a venv iirc and we changed python under borg | 00:54 |
clarkb | ianw: it's possible that running ansible will automatically fix that for us but I'm not sure. Thought you might have input on that topic in particular | 00:54 |
fungi | yeah, odds are it's a venv for something like python 3.5 and now we've got 3.8 there | 00:55 |
clarkb | fungi: I'm marking off the bits on the etherpad for sending the announcement and all that | 00:55 |
fungi | thanks! | 00:55 |
ianw | clarkb: hrm, i imagine it would be broken; i think probably just rm-ing the venv would get it recreated | 00:55 |
ianw | i can take a look in a bit | 00:55 |
fungi | #status log The mailing list services for lists.airshipit.org, lists.opendev.org, lists.openstack.org, lists.starlingx.io, and lists.zuul-ci.org are back in operation once again and successfully delivering messages | 00:56 |
clarkb | ianw: thanks. Note that the server is in the emergency file and we'll keep it there as we need to land 808570 and figure out if we need to disable autoremove of packages on this server too | 00:56 |
opendevstatus | fungi: finished logging | 00:56 |
fungi | clarkb: worst case we set a package hold on the kernel packages we don't want it to uninstall | 00:57 |
fungi | that keeps it from upgrading a package anyway, pretty sure it would also block autoremoval | 00:57 |
clarkb | fungi: yup I took some notes around that on the etherpad. I think we want to reboot before we remove it from the emergency file | 00:57 |
fungi | oh, or we could just toggle the metadata to manually installed instead of autoinstalled | 00:57 |
clarkb | as that will tell us if it reliably boots on the new kernel | 00:57 |
clarkb | then we can set the new kernel to manual | 00:58 |
clarkb | and have it ignore us with package updates (which will overwrite the decompressed file) | 00:58 |
clarkb | we can also do a test boot with a compressed file and see if the chainloader can handle that | 00:58 |
clarkb | if it can then we're fine | 00:58 |
clarkb | anyway I agree we are in a happy spot and I got my notes written down | 00:58 |
clarkb | thank you everyone for the help and on a weekend too :/ | 00:58 |
fungi | notes on the pad lgtm, thanks for summarizing | 01:01 |
clarkb | ianw: corvus https://etherpad.opendev.org/p/listserv-inplace-upgrade-testing-2021 is the etherpad if you don't have it. The interesting bits are towards the bottom | 01:01 |
clarkb | I just added plan to replace the server to the list as well | 01:02 |
clarkb | and with that I should go find dinner. Thanks again everyone! | 01:03 |
ianw | clarkb / fungi : i've done the simplest thing which is to just re-create the borg virtualenv. i've run a backup to RAX manually and it worked | 06:52 |
ianw | This archive: 6.37 GB 2.64 GB 117.18 MB | 06:52 |
ianw | last is dedup size, so that seems reasonable | 06:53 |
ianw | i've uncommented the cron jobs | 06:53 |
*** jpena|off is now known as jpena | 07:05 | |
*** frenzy_friday is now known as anbanerj|ruck | 07:06 | |
*** ykarel__ is now known as ykarel | 07:28 | |
*** jpena is now known as jpena|away | 07:40 | |
*** ykarel is now known as ykarel|lunch | 08:03 | |
opendevreview | Arx Cruz proposed opendev/elastic-recheck rdo: Add common.js to openstack-health https://review.opendev.org/c/opendev/elastic-recheck/+/808644 | 08:46 |
opendevreview | Takashi Kajinami proposed openstack/project-config master: Retire puppet-freezer - Step 3: Remove Project https://review.opendev.org/c/openstack/project-config/+/808675 | 08:57 |
*** ykarel|lunch is now known as ykarel | 09:15 | |
*** jpena|away is now known as jpena | 09:38 | |
*** ysandeep is now known as ysandeep|brb | 10:55 | |
*** ysandeep|brb is now known as ysandeep | 10:59 | |
*** dviroel|out is now known as dviroel | 11:32 | |
*** jpena is now known as jpena|lunch | 11:35 | |
*** odyssey4me is now known as Guest7126 | 12:12 | |
*** jpena|lunch is now known as jpena | 12:30 | |
*** hjensas is now known as hjensas|afk | 12:45 | |
*** tosky is now known as Guest7131 | 13:40 | |
*** tosky_ is now known as tosky | 13:40 | |
*** ykarel is now known as ykarel|away | 14:36 | |
*** odyssey4me is now known as Guest7135 | 15:00 | |
clarkb | ianw: thanks! | 15:15 |
fungi | clarkb: see martin's comment on https://review.opendev.org/808479 don't we have a full update mode we can use to correct all of those? or does it happen with new renames but was missed in some older ones? | 15:24 |
clarkb | fungi: you have to manually run the playbook to fix that up. It's the same playbook but you have to select the "do everything" flag | 15:25 |
clarkb | fungi: when we've tried to be more aggressive about setting that stuff it causes the playbook to take forever and we have had those api errors from gitea too | 15:26 |
fungi | got it, is that something we'd be able to update but restrict to specific projects during renames? | 15:26 |
clarkb | fungi: possibly yes, but it would require updating the code for renames. Currently renames are very specifically just doing the rename bit, but we could add steps to enforce the metadata too | 15:27 |
clarkb | that is all reasonably well tested now if you want to give it a go | 15:27 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add support for Ubuntu Focal to our mailman ansible https://review.opendev.org/c/opendev/system-config/+/808570 | 15:30 |
clarkb | that should fix a testing issue with the change | 15:30 |
clarkb | fungi: ^ would probably be good if you could look over that change and evaluate some of the small deltas I've made compared to what is on the server. In particular the line.startswith(host + ':') check in the mm_cfg.py file and the variable setting in the init script template | 15:40 |
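To illustrate why the `line.startswith(host + ':')` check in mm_cfg.py matters: without the appended `:`, a key in the mapping file that merely begins with the host string would match the wrong line. A minimal sketch; the second hostname is made up for illustration:

```python
def matches_loose(line, host):
    # without the appended ':', any key that merely begins with host matches
    return line.startswith(host)

def matches_strict(line, host):
    # with the ':' appended, only an exact key match succeeds
    return line.startswith(host + ':')

# hypothetical mapping line whose key is a superstring of a real host
line = 'lists.opendev.org.other.example: /srv/mailman/other'
```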
fungi | yep, now that i'm well-rested and caffeinated i should be able to do that | 16:07 |
*** odyssey4me is now known as Guest7141 | 16:07 | |
clarkb | fungi: I am neither of those things; it's a good thing one of us is :) | 16:08 |
clarkb | it helps if I can spell starlingx too :) new patchset up shortly | 16:09 |
opendevreview | Clark Boylan proposed opendev/system-config master: Add support for Ubuntu Focal to our mailman ansible https://review.opendev.org/c/opendev/system-config/+/808570 | 16:10 |
Clark[m] | corvus: ianw left a question https://review.opendev.org/c/zuul/zuul-registry/+/808624 as well | 16:24 |
clarkb | wow I wrong-windowed that message :) | 16:24 |
*** jpena is now known as jpena|off | 16:25 | |
corvus | looking over zuul stats, everything seems nominal, except that the zuul event processing times seem longer than normal. that measures the amount of time it takes for an event once received from gerrit to make it to the scheduler. i don't think our nodepool changes should have affected this, unless they are just slowing everything down, a lot. the times still seem spikey, which is typical, as usually the spikes are caused by tenant | 16:52 |
corvus | reconfiguration events. looking at the logs, it seems like we have an unusually high number of tenant reconfig events happening right now -- about 3x so far compared to some random days last week. one potential cause could be larger than usual numbers of zuul.yaml changes due to the release cycle. | 16:52 |
corvus | in short, the only metric change i've seen appears to be related to a change in usage patterns, so i think i'm happy with the nodepool series. | 16:53 |
clarkb | that is a neat observation of behaviorial changes and ya release time seems like it would be good for making those changes as bugs in ci are fixed or papered over | 16:54 |
fungi | and also new branches being created (with zuul configuration on them) and changes being prepped to set up jobs for the next openstack development cycle | 16:55 |
*** odyssey4me is now known as Guest7146 | 17:00 | |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM forcing lists testinfra failure to hold nodes https://review.opendev.org/c/opendev/system-config/+/808805 | 17:03 |
fungi | i guess there's an autohold already for that? | 17:04 |
clarkb | fungi: I just created one yes | 17:05 |
clarkb | then we can compare results between the testnode and prod | 17:05 |
fungi | clarkb: inline q on 808570 | 17:27 |
fungi | though maybe that codepath is never reached without a multi-site install | 17:28 |
fungi | ahh, yep, it's never hit in that case | 17:33 |
clarkb | yup it's a bit hierarchical based on the top level role config | 17:36 |
clarkb | fungi: corvus 149.202.176.57 is the held node we can cross check with lists.openstack.org to ensure that 808570 does what we expect | 18:02 |
clarkb | I'm going to keep reviewing this zuul stack so I don't lose that context but wanted to point out that job is done and the node is held if others wants to take a look | 18:03 |
clarkb | Then I can probably look at ^ after lunch myself | 18:03 |
fungi | so far, the weekly memory usage graph suggests focal may require slightly less ram for our mailman sites than xenial did: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=219&rra_id=all | 18:16 |
fungi | though we probably won't know until it's been running a week or so | 18:17 |
*** ysandeep is now known as ysandeep|out | 18:20 | |
clarkb | my isp is failing to route to review.opendev.org right now :( | 18:38 |
clarkb | I can hit it from my phone and mtr shows it failing with a last hop in their AS | 18:39 |
clarkb | is anyone else having trouble as a sanity check? | 18:39 |
clarkb | ok packets get through again and it appears the issue is happening between my ISP and HE at NWAX | 18:49 |
clarkb | hopefully it doesn't become a persistent issue | 18:49 |
fungi | ipv6 or v4 (or both)? | 18:52 |
fungi | also, diff with shell subprocess file substitution is a really quick way to compare files between hosts | 18:56 |
fungi | diff -u <(ssh lists cat /etc/mailman/mm_cfg.py) <(ssh 149.202.176.57 cat /etc/mailman/mm_cfg.py) | 18:56 |
clarkb | fungi: v4, I don't have ipv6 from this isp (yet, they keep saying it is coming soon) | 18:59 |
fungi | the slight differences between the files on production and the held test node are ~trivial, looks correct | 19:00 |
clarkb | thanks for checking. I'm eating lunch now then have to run an errand then will do my own cross checking. The diff with ssh subshells is a neat hack. Last time I did this I checked checksums and then when they didn't match looked directly iirc | 19:01 |
fungi | the only file which is a 1:1 match is the exim config, the other local fixes were not entirely 1:1 (for example i didn't get rid of the old envvar, didn't alter the pidfile definitions, wasn't checking for a trailing : after hostnames in the mapping file...) | 19:03 |
clarkb | fungi: line 641 of exim.conf is the place where we do lc::$domain fwiw | 20:11 |
clarkb | but that has been like that and not a difference from our updates here | 20:11 |
clarkb | I agree the exim config is the same on the two servers | 20:12 |
clarkb | for mm_cfg.py the only difference seems to be the ':' append in the line startswith check | 20:13 |
fungi | agreed | 20:15 |
fungi | i have a feeling exim is simply collapsing the double :: there into a single : | 20:16 |
clarkb | ok I checked exim config, the mm_cfg.py, the init scripts and the apache vhost configs and they all lgtm between the two servers. https://review.opendev.org/c/opendev/system-config/+/808570 is probably ready to land if we can get another set of eyeballs on it | 20:18 |
clarkb | ianw: corvus ^ fyi | 20:18 |
corvus | lgtm +2 | 20:23 |
clarkb | thanks for looking. I'm thinking we land that today, double check it didn't do anything to lists.katacontainers.io unexpectedly (it really shouldn't as all the code there is for vhostd mailman). Then tomorrow followup with running ansible on lists.o.o and rebooting it and holding any pacakges we might want to hold etc | 20:27 |
clarkb | fungi: ^ does that make sense to you? If so maybe you can +A it? | 20:28 |
clarkb | I need to get a meeting agenda sent out and am trying to do some more zuul review (but its going slowly because I'm slow today) | 20:28 |
fungi | clarkb: sounds great | 20:31 |
*** dviroel is now known as dviroel|out | 20:37 | |
clarkb | fungi: while putting together the meeting agenda I'm noting the osf -> openinfra renames are not on the wiki yet | 20:42 |
clarkb | fungi: and for the inspur/ and osf/ prefixes are all repos getting moved out of there? I wonder if we separately need to remove the orgs from gitea (I don't think any of the rename stuff does that sort of cleanup today) | 20:54 |
clarkb | fungi: should I +A https://review.opendev.org/c/opendev/system-config/+/808570 or will you? | 21:08 |
clarkb | fwiw I checked the held lists.kc.io on the test job and I don't see any unexpected leaks of things | 21:10 |
clarkb | which is the only other remaining concern to landing that I think since we have the other host in the emergency file currently | 21:11 |
fungi | clarkb: i just approved 808570 | 21:43 |
fungi | sorry, was doing dinner | 21:43 |
fungi | i'll add the project rename changes to the wiki momentarily | 21:43 |
clarkb | thanks! | 21:44 |
fungi | renames list on the meeting agenda page is now updated | 21:55 |
*** odyssey4me is now known as Guest7162 | 22:18 | |
opendevreview | Merged opendev/system-config master: Add support for Ubuntu Focal to our mailman ansible https://review.opendev.org/c/opendev/system-config/+/808570 | 22:23 |
fungi | clarkb: ^ | 22:29 |
fungi | so we want to do a test reboot before or after we take it out of the emergency disable list? | 22:29 |
clarkb | thanks | 22:29 |
clarkb | fungi: I think we should do test reboots before we take it out of the emergency list because taking it out of the emergency list will potentially delete the old kernels we have | 22:30 |
clarkb | but if we can reliably boot the modern kernel that becomes less of a concern | 22:30 |
fungi | sure, should we reboot now or hold off? | 22:31 |
clarkb | up to you I guess. I'm starting to fade fast and not sure I want to resurrect the server this evening if it needs to be rescued again | 22:31 |
fungi | at least we know how to recover it fairly quickly if the reboot chokes, but sure we can save that for tomorrow | 22:31 |
fungi | i'm in no hurry there | 22:31 |
clarkb | ya we know the process now :) and my homedir has a bunch of copies of files we can easily move into place | 22:32 |
clarkb | any idea why 808570 merging is running infra-prod-manage-projects? It should noop but that is unexpected | 22:49 |
clarkb | ok the job to update lists ran for 808570 and lists.o.o was left alone as expected and lists.kc.io continues to seem happy. No HOST configs in any of the files there | 23:11 |
clarkb | I'm going to take a break now and try to be more well rested for tomorrow and reboot testing and all that | 23:12 |
fungi | sounds great, thanks! | 23:53 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!