*** ryohayakawa has joined #opendev | 00:02 | |
openstackgerrit | Andrii Ostapenko proposed zuul/zuul-jobs master: Add ability to use (upload|promote)-docker-image roles in periodic jobs https://review.opendev.org/740560 | 00:13 |
---|---|---|
openstackgerrit | Andrii Ostapenko proposed zuul/zuul-jobs master: Add ability to use (upload|promote)-docker-image roles in periodic jobs https://review.opendev.org/740560 | 00:14 |
openstackgerrit | Merged openstack/project-config master: Update ceph grafana for nova https://review.opendev.org/742516 | 00:59 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Add opendev-3pci/project-config https://review.opendev.org/743143 | 02:25 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Add OpenDev 3rd-party CI tenant https://review.opendev.org/743144 | 02:25 |
*** sgw1 has quit IRC | 02:40 | |
clarkb | ianw: we discovered that our infra-prod ansible jobs weren't doing anything since the zuul local execution fix | 02:47 |
clarkb | ianw: that got fixed by putting a piece of the infra-prod base job into opendev/base-jobs as that repo is trusted | 02:47 |
ianw | oh | 02:47 |
clarkb | just mentioning it as it explains why things like LE certs weren't updating | 02:47 |
clarkb | and so if you notice other laggy applies that could be related | 02:48 |
clarkb | should be good now though | 02:48 |
ianw | ok, cool, thanks :) glad i didn't have to debug it :) | 02:49 |
*** sgw1 has joined #opendev | 02:53 | |
*** sgw1 has quit IRC | 02:55 | |
*** sgw1 has joined #opendev | 03:11 | |
*** chandankumar has joined #opendev | 03:44 | |
*** fdegir2 has joined #opendev | 03:48 | |
*** fdegir has quit IRC | 03:49 | |
*** Dmitrii-Sh has quit IRC | 03:57 | |
*** Dmitrii-Sh has joined #opendev | 03:57 | |
*** sgw1 has quit IRC | 04:24 | |
*** DSpider has joined #opendev | 04:32 | |
*** ysandeep|away is now known as ysandeep | 05:19 | |
*** marios has joined #opendev | 06:03 | |
*** tosky has joined #opendev | 06:42 | |
*** ysandeep is now known as ysandeep|rover | 06:46 | |
*** qchris has quit IRC | 06:51 | |
*** qchris has joined #opendev | 07:04 | |
*** lpetrut has joined #opendev | 07:10 | |
*** dpawlik2 has joined #opendev | 07:22 | |
*** hashar has joined #opendev | 07:24 | |
*** fressi has joined #opendev | 07:27 | |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Add opendev-3pci/project-config https://review.opendev.org/743143 | 07:56 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Add OpenDev 3rd-party CI tenant https://review.opendev.org/743144 | 07:56 |
*** moppy has quit IRC | 08:01 | |
*** moppy has joined #opendev | 08:01 | |
*** tosky has quit IRC | 08:03 | |
*** dtantsur|afk is now known as dtantsur | 08:05 | |
*** ysandeep|rover is now known as ysandeep|lunch | 08:15 | |
*** hrw has joined #opendev | 08:16 | |
hrw | morning | 08:16 |
*** rpittau has joined #opendev | 08:31 | |
*** fdegir2 is now known as fdegir | 08:40 | |
*** ysandeep|lunch is now known as ysandeep|rover | 08:42 | |
*** tosky has joined #opendev | 09:33 | |
*** tkajinam has quit IRC | 10:10 | |
*** lpetrut has quit IRC | 10:17 | |
*** hrw has quit IRC | 10:33 | |
*** iurygregory has quit IRC | 10:59 | |
*** iurygregory has joined #opendev | 11:01 | |
*** ryohayakawa has quit IRC | 11:03 | |
*** ysandeep|rover is now known as ysandeep|afk | 11:04 | |
*** ysandeep|afk is now known as ysandeep|rover | 11:29 | |
*** mordred has joined #opendev | 12:11 | |
openstackgerrit | Artom Lifshitz proposed openstack/project-config master: Add nested-virt-ubuntu-focal label https://review.opendev.org/743217 | 12:55 |
openstackgerrit | Artom Lifshitz proposed openstack/project-config master: Add nested-virt-ubuntu-focal label https://review.opendev.org/743217 | 13:02 |
*** dtantsur is now known as dtantsur|brb | 13:22 | |
openstackgerrit | Pierre Crégut proposed openstack/diskimage-builder master: Adds gnupg2 for apt-keys in ubuntu-minimal https://review.opendev.org/743229 | 13:41 |
*** mnasiadka has joined #opendev | 13:48 | |
*** sgw1 has joined #opendev | 13:50 | |
*** lpetrut has joined #opendev | 14:14 | |
*** hashar has quit IRC | 14:16 | |
*** ysandeep|rover is now known as ysandeep|away | 14:17 | |
*** bhagyashris is now known as bhagyashris|away | 14:24 | |
openstackgerrit | Pierre Crégut proposed openstack/diskimage-builder master: Makes EFI images, bootable by bios https://review.opendev.org/743243 | 14:27 |
*** mlavalle has joined #opendev | 14:43 | |
openstackgerrit | Pierre Crégut proposed openstack/diskimage-builder master: Adds gnupg2 for apt-keys in ubuntu-minimal https://review.opendev.org/743229 | 14:46 |
*** lpetrut has quit IRC | 14:49 | |
*** mlavalle has quit IRC | 14:58 | |
fungi | infra-root: just a heads up, openstack had four more afs-related job failures again today, all ran from ze11 | 14:59 |
fungi | https://zuul.opendev.org/t/openstack/build/7158ad74975b4578916841026f9781a2 https://zuul.opendev.org/t/openstack/build/6a5b49242ff244e0b36c382a12ae5c57 https://zuul.opendev.org/t/openstack/build/37b355f7e3794954ad8a2a70db9e79d1 https://zuul.opendev.org/t/openstack/build/d39dc250efb04bd798b5b924d4098a5e | 14:59 |
clarkb | fungi: I'm guessing a reboot is our next thing since checkvolumes has been run there right? | 15:00 |
fungi | yeah, i did fs checkvolumes on both ze10 and ze11 late last week | 15:00 |
fungi | i'm wary of rebooting lest we lose evidence of the underlying problem, but at this point i'm at a loss for what to check we haven't already | 15:01 |
clarkb | another option may be to restart the container to ensure all the bind mounts are properly in place? But ya I'm not sure what else could be looked at | 15:02 |
clarkb | fungi: oh do ze10 and 11 run the same kernel and other hosts have newer or older kernels? | 15:03 |
clarkb | that would be when was the last reboot dependent. I wonder if that could explain why some executors are affected but not others | 15:04 |
fungi | yeah, that could explain it since they rebooted recently | 15:04 |
*** dtantsur|brb is now known as dtantsur | 15:06 | |
fungi | though so did ze02 and we hadn't seen any errors from it yet | 15:06 |
fungi | ze08 is running 4.15.0-70-generic, ze01/05/07 are on 4.15.0-91-generic, ze12 has 4.15.0-101-generic, ze03/04/09 have 4.15.0-106-generic, and ze02/06/10/11 are 4.15.0-107-generic | 15:08 |
fungi | so far we've only seen issues with ze10 and ze11, mostly the latter | 15:08 |
clarkb | there is a .112 iirc | 15:11 |
clarkb | maybe we update 11 to that and reboot and see if it is happier? | 15:11 |
clarkb | (that is based on memory with my local xenial server running hwe kernels) | 15:12 |
fungi | yeah, there's a vmlinuz-4.15.0-112-generic in /boot | 15:12 |
fungi | what's the graceful way to stop an executor now that they're containerized? we don't have initscripts or systemd units any longer, right? | 15:12 |
fungi | just stop the container? | 15:13 |
clarkb | correct and it's not graceful | 15:13 |
clarkb | zuul did just land graceful support though so we can possibly update docker-compose to use that when we update the image | 15:14 |
clarkb | but ya stopping is fast and not graceful for now | 15:14 |
*** mlavalle has joined #opendev | 15:17 | |
fungi | so might as well just `sudo reboot` i guess? | 15:17 |
clarkb | yes I think that may be roughly equivalent. Unless you want to docker-compose down the container so they don't start on boot | 15:19 |
clarkb | then you can start it manually after checking afs? probably not necessary | 15:19 |
fungi | yeah, i don't expect there's anything worth checking at reboot before starting the container | 15:20 |
clarkb | I'll be around to help more directly in a few. Finishing up breakfast now | 15:22 |
fungi | i'll go ahead and reboot ze11 | 15:22 |
fungi | #status log rebooted ze11 in hopes of eliminating random afs directory creation permission errors | 15:23 |
openstackstatus | fungi: finished logging | 15:23 |
fungi | Linux ze11 4.15.0-112-generic #113~16.04.1-Ubuntu SMP Fri Jul 10 04:37:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | 15:24 |
fungi | looks like everything came back up normally | 15:25 |
clarkb | also it will be 37C ish today so I may melt | 15:25 |
fungi | time difference between dmesg and syslog timestamps is still 10 minutes 37 seconds | 15:27 |
*** mlavalle has quit IRC | 15:27 | |
clarkb | and ntpd shows us forcing a jump I expect? | 15:28 |
*** mlavalle has joined #opendev | 15:29 | |
fungi | looks like syslog started out 10 minutes into the future | 15:29 |
*** mlavalle has quit IRC | 15:32 | |
fungi | also the container runtime (according to logs) started in the future too (i.e. before ntpd corrected the system clock) | 15:32 |
clarkb | fungi: I wonder if the clock is namespaced? and the container is still out of sync? that could explain why afs within the container is sad? | 15:33 |
fungi | yeah, and i'm not entirely sure how to check that... i tried docker-compose exec but i think that creates a new container, right? | 15:34 |
clarkb | no exec runs in the existing container | 15:34 |
clarkb | docker run makes a new container. exec attaches to an existing one | 15:34 |
fungi | ahh, well, invoking date via docker-compose exec matched the system time for the server | 15:35 |
fungi | docker-compose logs has some "Temporary failure in name resolution" exceptions raised in the statsd client (and with timestamps which would have been 10 minutes in the future at the time they were logged) but nothing else out of the ordinary | 15:39 |
clarkb | if the issue persists maybe restart the containers so that services there start with updated clock? | 15:40 |
clarkb | and if that fixes it we can probably do a unit after dependency for docker on ntpd | 15:41 |
fungi | yeah, that makes sense. will fix these latest release failures and keep an eye out for any more | 15:42 |
*** lpetrut has joined #opendev | 15:42 | |
fungi | since we've also seen some (though fewer) similar failures on ze10 i wonder if i should go ahead and restart the container on it | 15:42 |
fungi | `docker-compose down` followed by `docker-compose up -d` right? | 15:43 |
fungi | though i suppose i have to wait for the ansible processes on it to actually die in between? | 15:44 |
*** mlavalle has joined #opendev | 15:44 | |
clarkb | yes | 15:44 |
clarkb | from the dir with docker-compose.yaml in it | 15:45 |
fungi | i'll get started with that on ze10 | 15:45 |
fungi | hrm, actually maybe because they're in a common cgroup, the ansible processes all seem to die instantly | 15:45 |
fungi | no lingering ssh persist or agents | 15:46 |
fungi | #status log took the zuul-executor container down and back up on ze10 in order to try to narrow down possible causes for afs permission errors | 15:47 |
openstackstatus | fungi: finished logging | 15:47 |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Run the grafana ansnible when grafana dashboard change https://review.opendev.org/743270 | 15:56 |
*** marios has quit IRC | 16:02 | |
openstackgerrit | melanie witt proposed openstack/project-config master: Fix copy-paste error for the ceph grafana nova ussuri panel https://review.opendev.org/743273 | 16:03 |
clarkb | fungi: thinking out loud here, if it continues to have troubles maybe we should start doing bionic rebuilds (or even focal?) as that will land VMs on different hypervisors with different clocks | 16:08 |
clarkb | for afs bionic is more of a known quantity | 16:08 |
mordred | ++ | 16:13 |
* mordred waves | 16:13 | |
clarkb | mordred: welcome back | 16:14 |
mordred | clarkb: I mean - I hate chasing solutions without fully knowing the problem - but chasing it on focal would put us likely in better position for debugging | 16:14 |
mordred | we'd at least be chasing recent vs old | 16:14 |
clarkb | ya | 16:14 |
mordred | and our userspace in those images is newer too - which shouldn't matter - but maybe does | 16:15 |
clarkb | ya what is odd is ze10 and 11 seem to have the problems but none of the others | 16:15 |
clarkb | but the system clock issues stand out | 16:17 |
fungi | yeah, i have a feeling they were reboot-migrated onto hosts with massively skewed clocks | 16:18 |
clarkb | mordred: separately we discovered that buildx does have a pretty big performance impact for cpu intensive activities like compiling software for python wheels. We've had to disable nodepool's arm64 image builds in order to get the kazoo fix released. | 16:19 |
mordred | why does the hypervisor clock matter? | 16:19 |
mordred | clarkb: oh wow | 16:19 |
clarkb | mordred: because VMs boot with hypervisor clock as the starting point for their clocks | 16:19 |
mordred | clarkb: yah - but wouldn't ntp update the kernel clock? | 16:20 |
clarkb | mordred: and then the VM ntpd makes a huge jump and some software (perhaps even afs) dislike that | 16:20 |
mordred | AH | 16:20 |
mordred | nod | 16:20 |
mordred | that makes sense | 16:20 |
fungi | yeah, no clue really, so far it's all conjecture | 16:24 |
fungi | basically we had three executors spontaneously reboot last week | 16:25 |
fungi | all came up with varying degrees of clock skew ntpd had to correct shortly after boot | 16:25 |
fungi | ze02 had the least, ze10 was pretty bad and ze11 was worst of the three (>10 minutes) | 16:25 |
fungi | we pretty much immediately started getting random afs errors when ansible was asked to recursively ensure the project directories existed before writing tarballs, and also similarly before copying release notes into afs | 16:26 |
fungi | most happened on ze11, some on ze10, so far none reported for ze02 | 16:27 |
fungi | https://zuul.opendev.org/t/openstack/build/7158ad74975b4578916841026f9781a2 is a good example from today | 16:27 |
fungi | a file task to idempotently create /afs/.openstack.org/project/tarballs.opendev.org/openstack/tripleo-image-elements/ in case it doesn't exist failed with the error "There was an issue creating /afs/.openstack.org as requested: [Errno 13] Permission denied: b'/afs/.openstack.org'" | 16:29 |
*** hashar has joined #opendev | 16:29 | |
fungi | the "/afs/.openstack.org" in that error seems a bit misleading, i don't think ansible would actually have tried to create /afs/.openstack.org because the task is recursive:false | 16:30 |
*** dtantsur is now known as dtantsur|afk | 16:59 | |
clarkb | fungi: were those jobs reenqueued? I'm wondering if we need to do anything other than wait and monitor at this point? | 17:10 |
fungi | no, for the tarball write errors we can't reenqueue, i have to manually copy them from pypi into afs along with scraping the signatures from the build logs and adding those | 17:11 |
clarkb | ah | 17:11 |
clarkb | right that is because we've already pushed to pypi at that point and reuploads to pypi are errors, got it | 17:12 |
fungi | but a couple of the failures were release announcements, which might be reenqueueable if they ran in the tag pipeline instead of the release pipeline, i need to check that | 17:12 |
fungi | the tarballs fix looks like: 1. pull wheel and sdist from pypi for failed release, 2. copy signature text from build log and put in similarly-named .asc files, 3. use gpg --verify to check that the signatures are valid for the wheel and sdist, 4. copy the wheel/sdist/sigs into correct directory in afs, 5. vos release project.tarballs | 17:14 |
fungi | step 3 is how we confirm the files pypi is serving are really the ones we built | 17:15 |
clarkb | fungi: some of the jobs do run successfully on ze11 with afs writes too right? I'm wondering if maybe we should add a retries: 3 or something to the job tasks | 17:18 |
clarkb | to at least reduce the impact while we sort it out | 17:18 |
fungi | i think so? i'm really not sure what jobs do this exact same file task with ansible though | 17:19 |
fungi | but maybe we could find other builds of the same jobs succeeding on ze11? | 17:19 |
clarkb | ya that would at least be able to rule in that it succeeds sometimes | 17:19 |
clarkb | (if we can't find evidence with that one job we'd have to keep looking) | 17:20 |
clarkb | I think the docs jobs are the other afs writers from executors | 17:20 |
fungi | not entirely sure how to query that, maybe mysqlclient | 17:20 |
clarkb | we don't record the executor in the db. It is in the logs though | 17:20 |
clarkb | so maybe grep for job name in ze11 executor logs? | 17:20 |
fungi | ahh, it's in the manifest, right | 17:20 |
clarkb | and that gives a list of uuids? | 17:20 |
fungi | oh! yeah, executor logs. great idea | 17:21 |
clarkb | the neutron fix for the logstash issue has landed so I'm going to give that whole system a restart | 17:22 |
fungi | looks like the releasenotes failure indicates we're running commands like this in a subprocess `mkdir -p /afs/.openstack.org/docs/releasenotes/tripleo-image-elements/` | 17:23 |
fungi | https://zuul.opendev.org/t/openstack/build/6a5b49242ff244e0b36c382a12ae5c57/console#4/0/7/localhost | 17:23 |
fungi | the -p would in theory try to create /afs/.openstack.org if it wasn't found for some reason | 17:25 |
fungi | so this could also just be good ol' network issues with the server instance the executor's running on | 17:26 |
clarkb | oh interesting | 17:27 |
fungi | all four failures were clustered between 05:58:30 and 06:09:54 | 17:27 |
mordred | that does sound like a good possibility for good-ol-network | 17:28 |
fungi | previous failures were clustered together too, so could be brief periods where that executor was unable to reach the afs servers | 17:28 |
fungi | though i find it odd that the file task on the tarball jobs gets a similar error even though it sets recurse:false https://zuul.opendev.org/t/openstack/build/7158ad74975b4578916841026f9781a2/console#5/0/7/localhost | 17:32 |
fungi | or does that option not work the way i think it works? | 17:33 |
fungi | mmm, so far all the builds of release-openstack-python i can find on ze11 failed the same way | 17:39 |
clarkb | #status log Restarted logstash geard, logstash workers, and logstash now that neutron logs have been trimmed in size. | 17:40 |
openstackstatus | clarkb: finished logging | 17:40 |
fungi | oh, here's one which succeeded 2020-07-22 13:52:20 | 17:42 |
fungi | https://zuul.opendev.org/t/openstack/build/2f94dd93c6864adaa170e522bbe4795c | 17:42 |
clarkb | cool so it isn't a 100% failure and retries may hep | 17:43 |
clarkb | *help | 17:43 |
fungi | and the spontaneous reboot was Wed Jul 22 16:23 | 17:43 |
fungi | so yeah, it's a 100% failure since wednesday's reboot | 17:43 |
clarkb | unrelated it looks like cert check is happy with those expiring certs so fixing the ansible after the zuul updates seems to have been sufficient | 17:43 |
clarkb | oh fun | 17:44 |
fungi | if i authenticate with my afs creds on ze11 and `mkdir -p /afs/.openstack.org/docs/releasenotes/tripleo-image-elements/` it doesn't error | 17:46 |
fungi | but it seems to error every time the executor tries it | 17:46 |
clarkb | fungi: have you tried it within the container? | 17:47 |
clarkb | maybe use zuul's creds in there though | 17:47 |
fungi | i'm not sure how to go about trying that | 17:47 |
clarkb | fungi: I think you can docker exec into the executor container and exec bash, then do what you'd normally do? | 17:47 |
clarkb | mordred: ^ you had tested bwrap things with afs in containers and may have steps sorted out? | 17:48 |
fungi | ahh, maybe, didn't realize we had interactive shells in the images but yeah i guess that makes sense | 17:48 |
fungi | neat, we don't set a default kerberos realm in these | 17:50 |
fungi | mkdir: cannot create directory ‘/afs/.openstack.org’: Permission denied | 17:51 |
fungi | bingo! | 17:51 |
fungi | ls: cannot access '/afs/.openstack.org': No such file or directory | 17:51 |
fungi | maybe we aren't bindmounting /afs into the containers? | 17:51 |
fungi | `mount` says: | 17:52 |
fungi | /dev/xvda1 on /afs type ext4 (rw,noatime,nobarrier,errors=remount-ro,data=ordered) | 17:52 |
fungi | as opposed to the main system context where it instead says: | 17:52 |
fungi | AFS on /afs type afs (rw,relatime) | 17:52 |
fungi | i think that must be our problem | 17:53 |
clarkb | I wonder if it isn't available on boot when services start due to the ntp issue | 17:53 |
clarkb | but then it is later? you restarted ze10 right? does it show that same issue? | 17:54 |
fungi | i'm about to check now | 17:54 |
fungi | yep! on ze10 it's accessible now | 17:57 |
fungi | so i concur, it seems like maybe it's starting the container before afsd | 17:58 |
clarkb | cool restarting the containers on ze11 should fix the immediate issue then we can add unit hints to fix the next reboot | 17:59 |
fungi | #status log took the zuul-executor container down and back up on ze11 in order to make sure /afs is available | 18:03 |
openstackstatus | fungi: finished logging | 18:03 |
fungi | so looks like the reason ze02 hasn't hit this is that it won the race at boot time | 18:05 |
clarkb | I wonder if afs waits for ntp and docker doesn't | 18:05 |
clarkb | the bigger delta likely makes ntp take longer | 18:06 |
fungi | docker definitely doesn't wait for ntp because the docker logs have skewed timestamps from prior to ntpd correcting them | 18:06 |
fungi | okay, i've manually checked all 12 executors and their containers now have a usable /afs, so at least this shouldn't bite us again until one of them gets rebooted | 18:08 |
clarkb | and before that happens we can drop one of those hint files | 18:12 |
clarkb | we want docker to start after afs | 18:12 |
clarkb | iirc you can make a file that specifies just that relationship without modifying the other units | 18:13 |
clarkb | I thought we did that somewhere else and can look for it in a bit | 18:13 |
fungi | currently /etc/init/docker.conf includes "start on (filesystem and net-device-up IFACE!=lo)" | 18:14 |
fungi | though afsd is handled via sysvinit compat for /etc/init.d/openafs-client | 18:15 |
fungi | https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Examples | 18:19 |
fungi | "...create a directory named unit.d/ within /etc/systemd/system and place a drop-in file name.conf there that only changes the specific settings one is interested in..." | 18:19 |
JayF | it should literally be as simple as echo -e "[Service]\nAfter=after_unit.service\n" > /etc/systemd/system/unit.service | 18:22 |
fungi | their specific example is for a /etc/systemd/system/httpd.service.d/local.conf | 18:22 |
fungi | looks like we have afs.mount and openafs-client.service units, i'm not sure which we should make docker go after | 18:23 |
fungi | though i suppose we could just specify both | 18:23 |
JayF | I'd 100% do afs.mount (maybe both) if openafs-client.service is sysvcompat | 18:24 |
fungi | that's a good point, it might return control to the calling process before it's actually started | 18:24 |
JayF | because sysvcompat services are "up" as soon as /etc/init.d/servicename start (or similar) returns, which isn't always the same as the service being up | 18:24 |
fungi | exactly | 18:24 |
fungi | i like the cut o' yer jib, matey | 18:25 |
JayF | I really love the systemd service/unit model, and I don't sysadmin much in my current job | 18:26 |
JayF | and happened to look at this channel at the right time to help :D | 18:26 |
fungi | thanks! ;) | 18:26 |
JayF | np; gl | 18:26 |
fungi | currently /lib/systemd/system/docker.service (from the docker-ce package) claims Unit.After=network-online.target firewalld.service containerd.service | 18:28 |
JayF | don't ever trust files on disk for systemd units | 18:28 |
JayF | because drop ins can change them | 18:29 |
JayF | systemctl cat docker.service # gives you the effective unit | 18:29 |
fungi | good call. at least it matches in this case | 18:29 |
clarkb | also daemon-reload needs to run to pick up changes | 18:29 |
clarkb | which if we cared about this outside of boot time you'd want to add to your steps above | 18:29 |
fungi | thankfully in this case we don't | 18:30 |
clarkb | I'm still not having any luck finding where we do this already (I know we did it for something) but the stuff above lgtm | 18:30 |
fungi | so anyway, we'll have our executor role create a /etc/systemd/system/docker.service.d/after-afs.conf containing a [Unit] section with only "After=network-online.target firewalld.service containerd.service afs.mount openafs-client.service" | 18:32 |
JayF | Those drop-ins are additive | 18:32 |
JayF | I believe you just need to add the ones you want to add | 18:32 |
JayF | but I'm not 1000% sure | 18:32 |
clarkb | JayF: fungi correct it is additive | 18:32 |
fungi | oh, so it can just be "After=afs.mount openafs-client.service" and that'll be appended | 18:32 |
clarkb | yes | 18:32 |
fungi | even nicer | 18:33 |
fungi | okay, furious typing commences | 18:33 |
*** lpetrut has quit IRC | 18:33 | |
JayF | If you ever need to remove something from a list like that, e.g. you need to remove an After= | 18:33 |
*** tosky has quit IRC | 18:33 | |
JayF | you do it like this: [Service]\nAfter=\nAfter=[list of things or multiple After= directives] | 18:34 |
JayF | Basically a blank entry clears a list | 18:34 |
fungi | neat, though somewhat black magic | 18:36 |
fungi | then again, systemd is nothing if not dark arts | 18:36 |
JayF | It's well documented in long, detailed man pages. Of course, that also means it's hard to find when you wanna do one specific thing. | 18:36 |
fungi | i have my gate seals in place, so this summoning should go okay as long as i don't forget the binding ritual | 18:37 |
JayF | Eh. No harder to understand than reading some of the old school crazy combinations of bash scripts with libraries and including groups of other bash scripts and using numbers at the beginning to force ordering | 18:37 |
JayF | dark magic is any magic a person isn't familiar with, IME | 18:37 |
JayF | I have many users who'd call OpenStack dark magic ;) | 18:37 |
fungi | count me in those ;) | 18:37 |
clarkb | I think the difference with systemd is a lot of the magic is super implicit. Using an @ in a service name for example | 18:38 |
clarkb | whereas bash is bash | 18:38 |
clarkb | basically it's a matter of learning the magic. with bash many already know it | 18:38 |
JayF | That's fair, I'd complain about it a lot harder if it wasn't so well documented. Most of it I have internalized at this point, too, which helps | 18:39 |
fungi | yep, i'm certainly a shellusionist, need totally different tomes to be a systemdmancer | 18:39 |
JayF | I'm just a bard with a bunch of scrolls in my bag of holding, and I'm not really sure how all of them work | 18:40 |
clarkb | JayF: it is well documented but figuring out @ took me forever because its not easily searchable | 18:41 |
clarkb | but now that i know it its fine | 18:41 |
JayF | The @, and the socket activation are the darkest of the systemd magiks | 18:41 |
clarkb | it's the sort of thing where if you want to search "systemd unit parameters" you get immediately to the info you need. But if it's "What is this @ in the unit filename and why is it special" you get to go on a long quest :) | 18:43 |
fungi | socket activation doesn't seem much different from inetd, so doesn't really strike me as strange | 18:43 |
JayF | As part of our Ironic deployment here, we run the nova-computes with an "@" and the string after-the-@ is set to Host=, to make it easy to run multiple instances of the nova-compute on a single box | 18:43 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Start zuul-executor after afsd and /afs is mounted https://review.opendev.org/743316 | 18:51 |
fungi | clarkb: JayF: mordred: smcginnis: ^ | 18:51 |
clarkb | fungi: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/nameserver/tasks/main.yaml#L20-L38 is where we do it elsewhere fwiw | 18:53 |
clarkb | not sure if you want to add the owner/group/mode settings or not | 18:53 |
fungi | systemd is running as root so i doubt it cares | 18:55 |
clarkb | unless it's worried about unprivileged users modifying privileged services (but I don't think that is an issue here either as ansible runs as root) | 18:57 |
smcginnis | fungi: Great find. | 19:01 |
mordred | fungi: nice! +A but there's a comment from JayF | 19:26 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Deny Gerrit /p/ requests https://review.opendev.org/743324 | 19:32 |
clarkb | fungi: ^ thats WIP but we talked about what that would look like earlier if you want to double check it now | 19:33 |
clarkb | infra-root https://review.opendev.org/#/c/741708/ is a simple easy review for etherpad management | 19:50 |
*** DSpider has quit IRC | 19:54 | |
*** hashar has quit IRC | 20:02 | |
openstackgerrit | Merged opendev/system-config master: Start zuul-executor after afsd and /afs is mounted https://review.opendev.org/743316 | 20:24 |
openstackgerrit | Merged opendev/system-config master: Run our etherpad prod deploy job when docker updates https://review.opendev.org/741708 | 20:33 |
*** fressi has quit IRC | 20:46 | |
*** tosky has joined #opendev | 21:25 | |
corvus | clarkb: it's a little late, sorry, but i just added a meeting topic for tomorrow | 21:45 |
corvus | (i talked with some folks at google about CI system integration with Gerrit and wanted to share that with folks) | 21:45 |
clarkb | corvus: no worries I got distracted by other things and haven't sent it out yet | 21:46 |
clarkb | corvus: did you update the wiki? I can wait a bit longer if not | 21:46 |
corvus | clarkb: i did | 21:47 |
clarkb | ah yup refresh shows it. I'll get the email out soon | 21:47 |
fungi | fwiw, i would love to hear what they had to say | 21:50 |
fungi | so thanks for adding it to the agenda! | 21:50 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Add opendev-3pci/project-config https://review.opendev.org/743143 | 22:30 |
openstackgerrit | Ian Wienand proposed openstack/project-config master: Add OpenDev 3rd-party CI tenant https://review.opendev.org/743144 | 22:30 |
*** tkajinam has joined #opendev | 22:52 | |
*** mlavalle has quit IRC | 22:56 | |
*** dtroyer has quit IRC | 23:03 | |
*** qchris has quit IRC | 23:06 | |
openstackgerrit | Merged openstack/project-config master: Run the grafana ansnible when grafana dashboard change https://review.opendev.org/743270 | 23:07 |
openstackgerrit | Merged openstack/project-config master: Fix copy-paste error for the ceph grafana nova ussuri panel https://review.opendev.org/743273 | 23:07 |
*** qchris has joined #opendev | 23:07 | |
*** tosky has quit IRC | 23:25 |
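
Editor's note: the diagnosis above came down to two things: `mount` inside the executor container reporting `/afs` as ext4 instead of afs, and a systemd drop-in so docker starts after AFS. The following is a minimal sketch of both, not the code that was merged in https://review.opendev.org/743316. The helper names `mount_fstype`, `afs_is_mounted`, and `render_after_dropin` are hypothetical; the sample `mount` lines, the drop-in path, and the unit names are taken from the log. The sketch places `After=` under `[Unit]`, which is where systemd expects ordering settings.

```python
"""Sketch: spot a stale /afs bind mount and render a systemd ordering drop-in."""


def mount_fstype(mount_output, mountpoint="/afs"):
    """Return the filesystem type that `mount` reports for mountpoint, or None.

    Expects lines shaped like "<source> on <mountpoint> type <fstype> (<opts>)".
    If the mountpoint is listed more than once, the last entry wins, because
    later mounts shadow earlier ones.
    """
    fstype = None
    for line in mount_output.splitlines():
        head, sep, tail = line.partition(" type ")
        if not sep:
            continue
        _source, on, target = head.rpartition(" on ")
        if not on or target != mountpoint:
            continue
        fstype = tail.split(" ", 1)[0]
    return fstype


def afs_is_mounted(mount_output, mountpoint="/afs"):
    """True only when the mountpoint is backed by the afs filesystem type."""
    return mount_fstype(mount_output, mountpoint) == "afs"


def render_after_dropin(units):
    """Render a drop-in that appends the given units to a unit's After= list."""
    return "[Unit]\nAfter=" + " ".join(units) + "\n"


if __name__ == "__main__":
    # Output of `mount` as pasted in the log, inside the container and on the host.
    in_container = "/dev/xvda1 on /afs type ext4 (rw,noatime,nobarrier,errors=remount-ro,data=ordered)"
    on_host = "AFS on /afs type afs (rw,relatime)"
    print(afs_is_mounted(in_container))  # False: container started before afsd
    print(afs_is_mounted(on_host))       # True

    # Intended content for /etc/systemd/system/docker.service.d/after-afs.conf
    print(render_after_dropin(["afs.mount", "openafs-client.service"]), end="")
```

Because drop-ins are additive for `After=`, only the two AFS units need listing; the ordering already declared in the packaged docker.service stays in effect. Outside of boot, a `systemctl daemon-reload` is needed before systemd picks up the new file.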