Thursday, 2021-10-28

opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] trying fedora 35
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] trying fedora 35
*** pojadhav|out is now known as pojadhav|ruck02:46
*** ysandeep|out is now known as ysandeep05:06
*** ykarel|away is now known as ykarel05:46
opendevreviewIan Wienand proposed openstack/diskimage-builder master: fedora-container: update to Fedora 35
opendevreviewMerged openstack/diskimage-builder master: dracut-regenerate: drop Python 2 packages
opendevreviewIan Wienand proposed openstack/diskimage-builder master: fedora-container: regenerate initramfs for F34
*** jpena|off is now known as jpena06:54
*** ysandeep is now known as ysandeep|lunch07:48
*** pojadhav is now known as pojadhav|ruck07:51
*** ykarel is now known as ykarel|lunch08:34
esimoneinfra-root: Can I ask what are the next steps to evaluate the insertion of a new project in the X namespace?
*** ysandeep|lunch is now known as ysandeep08:45
frickleresimone: that project was created in airship namespace, do you want to move it to x/? otherwise it may just be missing a follow-up patch to set up job definitions09:24
esimonefrickler: yes, if it is possible I prefer the x/ namespace. It is not part of airship, nor does it use the airship philosophy.09:35
frickleresimone: hmm, we have but that might be openstack-specific, I'll need to check with other admins later09:39
esimonefrickler: thank you. You can consider siss related to OpenStack as Kayobe is, but simpler and linked to Ceph.09:44
*** ysandeep is now known as ysandeep|afk09:46
frickleresimone: oh, you were talking about ? the link you posted was for airship/image-builder09:57
*** ykarel|lunch is now known as ykarel10:04
esimonePlease, frickler: Yes... please, forgive my big mistake. A wrong cut&paste. I'm so sorry.10:05
frickleresimone: no problem, that patch looks good to me, should hopefully get another review and approval soon10:09
esimonefrickler: thanks10:13
*** ysandeep|afk is now known as ysandeep10:37
*** dviroel|rover|afk is now known as dviroel|rover10:50
*** ykarel is now known as ykarel|afk11:15
*** jpena is now known as jpena|lunch11:23
*** ysandeep is now known as ysandeep|mtg11:35
*** ysandeep|mtg is now known as ysandeep11:51
*** jpena|lunch is now known as jpena12:22
*** ykarel|afk is now known as ykarel13:19
opendevreviewMerged openstack/project-config master: Add siss under x namespace
clarkbNot really here yet, but latest update on the gerrit + mina updates is that MINA 2.7.1 isn't actually ever going to happen and is just a placeholder for 2.8.0. Not sure why they don't just refer to 2.8.0 in that case. Also "using 2.8.0 in Gerrit is not a piece of cake because this Apache MINA sshd project unfortunately breaks API between minor releases regularly." and they have to do14:15
clarkbextensive testing with jgit apparently14:15
clarkbI'm staying out of it but can't help but wonder if this would be easier if some backward compat was maintained14:16
fungior maybe there's a use for your backported implementation after all14:17
clarkbya that was also mentioned in the email. I could resurrect that and base it off of the actual java updates rather than my poor attempt at a kotlin translation14:18
*** ricolin_ is now known as ricolin14:40
*** ykarel is now known as ykarel|away14:51
*** ysandeep is now known as ysandeep|mtg15:03
*** pojadhav|ruck is now known as pojadhav|dinner15:08
opendevreviewClark Boylan proposed opendev/system-config master: Upgrade to gitea 1.15.6
clarkbinfra-root fyi ^ new gitea release today15:40
fungithey sure are churning those out at a steady clip lately15:41
clarkbdebian is killing which. Shell scripts everywhere panic15:48
*** ysandeep|mtg is now known as ysandeep15:49
funginot killing it, but you'll need to explicitly choose which implementation of which you want installed if you intend to keep using it, the one which was installed by default is going away15:50
clarkbright but the million and one packages that expect which to exist won't know anything about that and fail when which isn't explicitly installed15:50
clarkbs/packages/packages and scripts/15:50
fungithe discussion highlights that there is no "standard" implementation of the `which` utility, every unix derivative rolled their own slightly different versions, and the maintainers of the one debian had been installing are tired of having to maintain their own15:51
clarkbdebian's own package maintainer documentation apparently said use which until 201815:51
clarkbI'm not saying it's wrong to try and use a better standard (command -v) but if you said "use this tool" for years and years then ripping it out is going to be a problem15:52
fungioh, for sure15:52
fungii didn't mean to imply it's not going to be a painful transition15:52
fungiit's just also not a decision which was taken lightly15:53
fungidiscussion around it has been ongoing for months on the debian-devel ml15:53
*** ysandeep is now known as ysandeep|out15:53
clarkbya it just surprises me when changes like that are made. I much prefer the kernel's approach15:53
clarkbthat was the standard interface according to the docs. I'm sorry but that means you own it now15:54
clarkbI get it isn't ideal but...15:54
clarkbwe've got a bunch of which use that will need to be updated15:55
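A rough way to find the `which` calls mentioned above is to scan shell scripts for `which` in command position. The following is a minimal sketch, not a real shell parser: the function name `find_which_calls` and the regular expression are illustrative assumptions, and the pattern will miss or misreport some cases (here-documents, quoted strings spanning lines, and so on).

```python
import re

# Heuristic: `which` in command position, i.e. at the start of a line, right
# after a shell separator, `$(` or a backtick, or after if/then/do/else.
# Illustrative only; this does not parse shell syntax.
WHICH_CALL = re.compile(r"(?:^|[;&|(`]|\b(?:if|then|do|else)\s)\s*which\s+([\w.+-]+)")


def find_which_calls(script_text):
    """Return (line_number, tool_name) pairs for likely `which` invocations."""
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            # Skip comment lines so the English word "which" is not flagged.
            continue
        for match in WHICH_CALL.finditer(stripped):
            hits.append((lineno, match.group(1)))
    return hits
```

Each hit is a candidate for replacement with `command -v`, which is the alternative named in the discussion; the results still need a manual look before editing.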
clarkbit also makes what the kernel manages to do that much more impressive15:57
fungii have a feeling bookworm will release with some implementation of which as part of the standard set, if not essential16:03
*** jpena is now known as jpena|off16:05
*** marios is now known as marios|out16:14
esimoneinfra-root: sorry, who to ask to be added to the siss-core and siss-release Gerrit groups (x/siss has just been created)?16:20
fungiesimone: sure, i'll add you now, give me a moment16:22
fungiesimone: done16:25
esimonefungi: many thanks16:32
*** pojadhav|dinner is now known as pojadhav|out16:43
opendevreviewGonéri Le Bouder proposed zuul/zuul-jobs master: use-buildset-registry: remove useless import
yuriysclarkb; fungi; I've installed the self signed on the s3/swift service, so you should be all set to use that (may need to copy the ca crt to your s3 client, etc.)17:01
clarkbyuriys: cool, thanks17:02
GoneriThe problem above prevents us from using the role on Fedora 34 (py39). The line does not make a lot of sense and it looks like a mistake was introduced during the import of the module.17:03
clarkbGoneri: note that zuul-jobs is primarily discussed on matrix in the room17:03
Gonerioh, sorry.17:03
opendevreviewMerged zuul/zuul-jobs master: use-buildset-registry: remove useless import
*** sshnaidm is now known as sshnaidm|afk17:50
clarkbfungi: do we want to wait for ianw to start the day before approving or just go for it? I can help keep an eye on it for much of today at least18:11
clarkbI think the ovh bhs1 mirror is sad18:11
clarkbit is returning errors for pypi proxying and npm proxying18:12
fungiyou looking into it or shall i?18:12
clarkbbut testing it from here eg works18:12
clarkbfungi: I'm starting to look at it but extra eyeballs would be good since the evidence I'm seeing shows it is fine18:12
clarkbbut and disagree with me18:13
clarkbok I was able to reproduce with a download of the npmjs package in that second link. retrying succeeded18:14
clarkbapache's error log complains of segmentation faults in children18:15
fungi/dev/mapper/main-apache    98G   98G   64K 100% /var/cache/apache218:15
clarkbwe are using almost no memory there, no OOMs, and the most recent afs sad was almost a week ago18:15
clarkbaha that would explain it18:16
fungicheck df18:16
clarkban htcacheclean has been running since 18:0018:16
fungithis has happened there before, i think because of poor i/o performance causing htcacheclean to take forever to complete, and then we end up with multiple runs fighting one another18:16
clarkbI only see one run currently fwiw18:17
fungiwww-data 19795  0.0  0.0  19912   424 ?        Ss   Oct08   0:30 /usr/bin/htcacheclean -d 120 -p /var/cache/apache2/mod_cache_disk -l 300M -n18:17
clarkbthats a different unused cache18:17
fungiahh, okay18:17
clarkb(the default that apache ships with)18:17
fungi/var/cache/apache2/proxy is the one we're using i guess18:17
clarkbdo we disable the provider and give htcacheclean an opportunity to catch up?18:18
fungilooks like htcacheclean runs out of /etc/cron.daily/apache218:18
clarkbI suppose the more forceful option is to stop services, delete the cache and start over18:19
clarkbfungi: it runs out of root's normal crontab18:19
clarkbat the start of every hour18:19
fungiaha, so it does, hourly there18:19
fungiyeah, so /etc/cron.daily/apache2 is probably what's responsible for the run with /var/cache/apache2/mod_cache_disk18:20
clarkbyes I think so18:20
corvusbuffer 1918:20
fungilooks like we don't log our htcacheclean so hard to know how long it's taking18:20
clarkbfungi: I don't think it really logs anything either18:21
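Since the cleanup run is not logged and its duration is unknown, one option is to wrap the cron command in something that records how long it took. This is a minimal sketch under assumptions: `run_timed` is a hypothetical helper, and the command shown is a placeholder, because the log does not give the real hourly htcacheclean invocation.

```python
import subprocess
import sys
import time


def run_timed(argv):
    """Run a command and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    result = subprocess.run(argv, check=False)
    elapsed = time.monotonic() - start
    return result.returncode, elapsed


if __name__ == "__main__":
    # Placeholder command; substitute the real cleanup invocation from cron.
    code, seconds = run_timed([sys.executable, "-c", "pass"])
    print(f"exit={code} elapsed={seconds:.2f}s")
```

Writing that one line to syslog or a file on each run would show whether a run approaches the hourly interval.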
fungi suggests it filled up around 17:2018:22
clarkbI wonder if this coincides with kolla or loci or tripleo publishing a new set of docker images18:22
clarkbI'm working on a change to disable bhs1 in nodepool but I think we need to manually apply it to reliably land anything right now18:23
corvuswhat about rm -fr /var/cache/apache2/mod_cache_disk ?18:23
fungilooks like it normally hovers at the htcacheclean limit, we may want to consider lowering that given it can easily eat that much cache in less than the period between htcacheclean runs18:24
opendevreviewClark Boylan proposed openstack/project-config master: Disable OVH BHS1 due to a sad mirror node
corvusit's just a cache right? restarting from a cold cache seems like a fine thing to do18:24
fungicorvus: i think we'll want to temporarily disable builds there either way. a recursive delete is likely to take a long time to complete and we wouldn't want to do that with apache actively serving from there18:24
clarkbcorvus: yes I think that is safe from the mirror's perspective but unknown how long it will take18:25
corvusstop apache; mv directory; rm directory in background; wait a minute; start apache18:26
fungiahh, yeah i guess the mv of /var/cache/apache2/proxy would work since /var/cache/apache2 is what we mount18:26
clarkbI've manually edited max-servers for the region on nl0418:26
fungii'll do corvus's suggestion18:27
clarkbfungi: thanks. Let me know when you think we're good to reset max-servers and I can do that18:27
corvusthat should make the removal atomic from apache's perspective, and as long as we give it a few minutes head start, apache should have headroom while the background rm runs18:27
clarkb(and we can avoid landing that change I guess)18:27
corvusfungi:  oh, don't forget to throw in a mkdir/chmod/chown of the proxy dir to match the moved one18:27
fungiyeah, i'm killing the htcacheclean processes too since i'm moving the directory out from under them18:28
fungiokay, it's back up and running with the old cachedir in the process of deleting in a root screen session18:30
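The recovery just described (move the full cache directory aside, recreate it with matching mode and ownership, delete the old copy in the background) can be sketched as below. This is an illustration under assumptions, not the commands actually run: `swap_out_cache` and the `.old` suffix are hypothetical, and stopping and starting Apache around the call is deliberately left out.

```python
import os
import shutil
import threading


def swap_out_cache(cache_dir):
    """Rename cache_dir aside, recreate it empty, delete the old copy in a thread.

    Returns the background thread so the caller can join() it. The service
    using the cache should be stopped before this call and started after it.
    """
    cache_dir = cache_dir.rstrip("/")
    old_dir = cache_dir + ".old"  # hypothetical naming scheme
    st = os.stat(cache_dir)

    # A rename within one filesystem is atomic, so the service never sees a
    # half-deleted cache directory.
    os.rename(cache_dir, old_dir)

    # Recreate the directory to match the one that was moved away.
    os.mkdir(cache_dir)
    os.chmod(cache_dir, st.st_mode & 0o7777)
    new_st = os.stat(cache_dir)
    if (new_st.st_uid, new_st.st_gid) != (st.st_uid, st.st_gid):
        os.chown(cache_dir, st.st_uid, st.st_gid)  # normally requires root

    # The slow recursive delete happens off the critical path.
    worker = threading.Thread(target=shutil.rmtree, args=(old_dir,))
    worker.start()
    return worker
```

The rename only works this way because the directory being moved sits inside the mounted filesystem rather than being the mount point itself, which matches the observation that /var/cache/apache2 is the mount and proxy is a directory under it.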
opendevreviewClark Boylan proposed opendev/system-config master: Reduce htcachclean limit on our mirrors
fungiplease test18:30
clarkbSomething like ^ for lowering the limit18:30
clarkb loads for me18:31
fungii seem to be able to pull cached contents through it, yeah18:31
clarkbit made my firefox sad though18:31
fungier, proxied contents i mean18:31
clarkb also seems to be working from here18:32
fungideletion seems to be progressing quickly18:32
clarkbI'll abandon my project-config change since it appears unnecessary18:32
clarkbfungi: is the rm done?18:36
clarkband if so should I reset max-servers on nl04?18:36
fungiyeah, seems like it just finished moments ago18:36
clarkbI'll go ahead and restore max-servers now then18:37
fungiwhich tends to suggest we're caching fewer, larger files rather than tons of teensy ones18:37
fungii wonder if we could get by running htcacheclean more frequently18:37
clarkbfungi: I think we implemented the lock file around it because it sometimes takes longer than the hour already18:38
fungiwould need to watch how fast it's able to traverse the cache though18:38
clarkband yes fewer larger files is what I would expect from kolla/tripleo/loci types doing a massive update to their docker images18:38
clarkbI think all told their images come out in the 10-20GB range18:38
clarkband if we're carrying the old version and the new version we can quickly bump up usage there18:39
clarkb(I wish that apache's hashing method was decipherable to make it easier to figure this stuff out)18:39
clarkbI think if we want to debug in more depth in the future we need to use du on directories then pick the largest 5% or so and look for strings in their contents18:39
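The debugging approach suggested here (size each directory, keep the largest 5% or so, then inspect their contents) could look roughly like the following. It is a sketch under assumptions: `dir_size` and `largest_subdirs` are hypothetical helpers, and sizes are apparent file sizes from `os.path.getsize`, which can differ from the disk blocks that `du` reports.

```python
import math
import os


def dir_size(path):
    """Sum the apparent sizes of regular files below path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_subdirs(cache_root, fraction=0.05):
    """Return (size_bytes, name) for the largest `fraction` of direct subdirs."""
    sizes = []
    with os.scandir(cache_root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_size(entry.path), entry.name))
    sizes.sort(reverse=True)
    # Always keep at least one directory when there is anything to report.
    keep = max(1, math.ceil(len(sizes) * fraction)) if sizes else 0
    return sizes[:keep]
```

The returned directory names are then the places to search for recognizable strings, since the hashed cache paths themselves do not reveal which upstream content they hold.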
clarkbshould we do something like #status notice filled its disk. We have corrected this issue and jobs that failed due to this mirror can now be rechecked.18:41
fungiyeah, that looks good18:41
fungimaybe include the times?18:41
fungilooks like errors would have started around or shortly after 17:25 and stopped around 18:2518:42
clarkbupdated to  #status notice filled its disk around 17:25 UTC. We have corrected this issue around 18:25 UTC and jobs that failed due to this mirror can now be rechecked.18:42
clarkb#status notice filled its disk around 17:25 UTC. We have corrected this issue around 18:25 UTC and jobs that failed due to this mirror can be rechecked.18:43
opendevstatusclarkb: sending notice18:43
-opendevstatus- NOTICE: filled its disk around 17:25 UTC. We have corrected this issue around 18:25 UTC and jobs that failed due to this mirror can be rechecked.18:43
clarkbnow back to thinking about gitea. Do we want to go ahead and proceed with that or wait for ianw's day to start and give a proper second review?18:46
clarkbNone of the bugs in the changelog look super important for us. I mostly want to keep up to avoid falling behind. Keep the delta minimal and all that18:47
fungiyes, i'm good going ahead with the gitea upgrade18:50
fungihappy to approve it now18:50
clarkbcool I'll be grabbing lunch soon but that is a good way to spend time while we wait on zuul to test it :)18:51
opendevreviewGonéri Le Bouder proposed zuul/zuul-jobs master: enable-fips: Fedora also support FIPS
opendevreviewGonéri Le Bouder proposed zuul/zuul-jobs master: enable-fips: Fedora also support FIPS
opendevreviewMerged opendev/system-config master: Upgrade to gitea 1.15.6
funginow the wait for deploy20:13
clarkbnb03 is unable to run docker-compose up right now and that is causing service-nodepool to fail. Disks seem fine but the server has fairly high system load and the service isn't actually running20:31
clarkbI think this may be a problem with docker itself. I'm trying to stop the running container which isn't actually running any processes from what i can see20:31
clarkbok same error trying to stop as we get from up20:34
clarkbI'm trying to stop and start docker itself now20:34
corvusload average is stuck at 4 while idle; that's not a great sign20:35
clarkbdocker did eventually stop. I'm trying to start it now so I can cleanly stop any docker things that might have hung around20:37
clarkbbut I agree I think there is a bigger issue and not sure what it is yet20:37
clarkbthe gitea job just started and should get through the giteas in the next 20 minutes or so20:38
fungii'll be eating right about then, but still plenty available to test/troubleshoot that20:42
clarkbgitea01 has upgraded. it looks happy to me20:42
clarkbI'll keep an eye on them but usually we expect trouble on gitea01 as the first one to go and if it is happy the rest end up happy too20:43
clarkbdocker still has not started on nb0320:43
clarkbI have no idea where that load is coming from20:43
corvusi'm guessing stuck kernel io threads or something?20:44
clarkbthere are a number of kernel tasks20:44
fungithere's one cpu stuck at 100% iowait20:45
clarkbwe've got 8 processors and 25% iowait. Would that actually be two cpus? But ya lots of kworker threads and those are responsible for IO and the like20:46
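The arithmetic behind that question is simple: an iowait percentage averaged over all CPUs, multiplied by the CPU count, gives the equivalent number of fully waiting CPUs. A minimal sketch, assuming the 25% figure is an average across all 8 processors (the log does not say which tool reported it):

```python
def cpus_worth_of_iowait(cpu_count, iowait_percent):
    """Convert an all-CPU average iowait percentage into CPU equivalents."""
    return cpu_count * iowait_percent / 100.0


# Figures from the discussion: 8 processors at 25% average iowait.
print(cpus_worth_of_iowait(8, 25))  # 2.0
```

Under that assumption the figure corresponds to two CPUs' worth of waiting, although the wait need not be pinned to two specific CPUs.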
clarkbI'm suspecting this might be a reboot20:47
clarkband hand wave around "its arm and there are bugs"20:47
fungicacti couldn't reach it for several hours starting just after utc midnight20:47
fungiand when it started responding again, 100% of its cpus were in use by "system"20:48
fungithough the stuck iowait has been happening for ~4 days according to
fungilike maybe something got "stuck" on monday20:50
clarkbany objections to attempting `sudo reboot`? Not sure if there is more people want to look at20:52
clarkbI suppose linaro might be interested in it?20:52
clarkbassuming this is some issue with arm64 + linux + docker + something that might be of interest to them20:52
fungioh, right, this is the arm builder?20:52
fungiyeah, couldn't hurt to see if they want to do some digging there20:52
fungimaybe it's something pathological with the hypervisor20:53
clarkbkevinz: ^ hey if you are around today our instance in the linaro cloud is exhibiting some sort of problem with high idle system load. We think our next step is to reboot the server but thought you might be interested in details potentially if this represents an issue with arm64 on linux with docker and whatever else we might trip over20:59
clarkbkevinz: is there any information you would like us to collect from the VM and share?20:59
clarkbgitea01-08 appear to have all upgraded. I don't notice any immediate problems21:00
fungiyeah, all seems fine to me on the gitea servers/site21:28
*** dviroel|rover is now known as dviroel|rover|afk21:33
ianwclarkb: if you have time for which adds 9-stream to centos-minimal, i'm sure it would be appreciated21:59
ianwafaict it builds, it boots, so we're more or less ready to move on with it21:59
clarkbya I'm not sure how much review I can do beyond general structure stuff. Not a centos expert, but if it builds and boots that is a very good sign :)22:08
ianwi think that's about it.  for production purposes, a mirror is about 200gb when i dry-run rsync'd it recently22:09
ianwthere's still a few debian stable bits hanging on i need to clear, i hope to free that soon22:10
ianwalso i guess we'll need a f35 mirror as well.  we only need f34+f35 for a short period as we cut over jobs22:10
ianwafs01 is at 3.8/4TB22:11
ianwafs02 at 3.4/4TB22:12
clarkbsoon we'll be able to clean out centos-8 as well22:14
ianwif we can't find too many other things to optimise we might be at the point we need a bit more disk on them22:14
ianwmore disk == more chance of a disk screwing up == more chance of afs recovery hell22:14
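The headroom being discussed can be checked with quick arithmetic. This sketch uses only the rounded figures quoted above and assumes decimal units (1 TB = 1000 GB); `free_after` is a hypothetical helper, and the log does not say how the new mirror would be placed on each server.

```python
def free_after(capacity_gb, used_gb, additions_gb):
    """Return the space left, in GB, after adding the listed volumes."""
    return capacity_gb - used_gb - sum(additions_gb)


# Rounded figures from the chat: 4 TB servers, a roughly 200 GB new mirror.
print(free_after(4000, 3800, [200]))  # afs01: 0
print(free_after(4000, 3400, [200]))  # afs02: 400
```

On those numbers a mirror of that size would consume essentially all remaining space on afs01 before any Fedora 35 mirror is added, which is why cleanup or more disk comes up next.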
clarkb I think we can remove xenial from that too22:14
clarkbdebian stretch + ubuntu ports xenial is probably a good chunk of disk. Then when we get to centos-8 eol another chunk22:15
ianwyes, indeed!  i'll get a change up for that, we stopped xenial arm64 22:15
opendevreviewIan Wienand proposed opendev/system-config master: reprepro: stop mirroring ubuntu-ports Xenial
ianwwe had a user pop up in #openstack-dib asking about using our mirrors and the gpg signing, on query they said "however the idea behind using the same mirror as the community is to avoid problems like oslo.vmware, which ended up affecting other CIs but not the community's CI."22:18
clarkbwe explicitly do not want people using our mirrors. For reasons like the bhs1 problem too22:18
clarkb*bhs1 problem today22:18
ianwi'm not aware of what that issue was.  but i did make it clear we operate the mirrors for the benefit of the CI we run, so at any time, like above, we might kill/move/drop cloud etc.22:19
clarkbwe were able to address that within our systems but we wouldn't necessarily be able to update them if we turn it off.22:19
clarkbbasically the disk filled. We set max-servers to 0 then started surgery22:19
clarkbif the issue is prolonged they would be broken but we would be fine22:19
clarkband ya clouds come and go too22:19
fungiand we also may reorganize our mirrors without warning22:20
clarkbianw: I think the solution to the wheel problem is to have requirements actually test that it can install stuff from pypi without our mirrors (and frickler was working on that)22:20
ianwwe don't explicitly list reprepro cleanup22:22
ianwiirc, after applying the removal, if we take the lock and run "reprepro --delete clearvanished" that should do it?22:23
ianw(applying the removal to the config files)22:23
clarkboh I never remember but ya I think there is a reprepro command for it22:24
ianwclarkb: yes that sounds reasonable.  i feel like there's already some sort of giant co-install test?22:24
clarkbianw: ya there is, but it uses our wheels. I think frickler's update was to run a second pass without our mirrors set22:24
clarkbianw: any idea why is needed?22:24
ianwclarkb: oh, i'm assuming similar to what fungi was talking about, everyone is dropping iptables lately22:25
clarkbI see. I wonder if anyone has replaced iptables-persistent without needing to do firewalld or something else silly22:26
ianwthat should be in pkg-map anyway22:27
clarkbI think we should also wait for haveged to have a proper package or not install it on centos-9 (and jobs there can be slow I guess)22:28
clarkbhistorically the rdo package hosting has not been very reliable and I don't want that to be a weak link in our builds22:28
ianwhas it been replaced with some systemd component, etc?  22:29
clarkbI'm not sure they just indicated on that change that rdo is packaging it until centos 9 has a package for it22:31
clarkbI'm not going to -1 over this but has no newline at the end of the file now so it shows up in the diff :P22:31
ianwit's probably worth evaluating if this should be using rng-tools or something else22:32
opendevreviewIan Wienand proposed openstack/project-config master: dstat graph: update to version with fixes
ianwclarkb / corvus : ^ that should fix actual opendev jobs using the roles getting the layout fixes22:40
clarkbianw: any idea why line 146 changes?22:45
ianwi was thinking that was actually a bug fix.  i'm not sure why we'd strip the -stream off it22:47
clarkbya I guess it would've been nice to split out the various stream-8 related work into a couple of parent changes but if we have to fix things we can sort it out then22:51
clarkbthe biggest impact is if we have to revert22:51
clarkbI've approved it and also left a note about how it is weird to need to install glibc explicitly22:51
mordredclarkb: installing glibc explicitly is, indeed, weird22:52
clarkbya I'm not sure what a distro is without a libc22:52
clarkbwithout a kernel ok you're a container image probably22:52
clarkbbut without a libc you're some sort of all in one thing and the distro doesn't make sense anymore :022:53
fungiyou're not a container image, you're a statically compiled executable22:55
corvusianw: thanks approved!22:58
ianwfungi: do you feel like there's any reason why "reprepro --delete clearvanished" couldn't be part of the regular update script steps?23:04
fungiwill that remove packages which we only just updated?23:04
ianwi think it should only remove things we've dropped from the config files23:05
fungijust making sure it won't undo our intentional one-pulse delay on deleting packages which are no longer listed in indices23:06
ianwi'm just writing instructions on how to apply changes like
opendevreviewMerged openstack/project-config master: dstat graph: update to version with fixes
ianwi guess that comes down to "does clearvanished imply deleteifunreferenced"23:08
ianwit seems even wikimedia do it by hand ...
opendevreviewIan Wienand proposed opendev/system-config master: reprepro: add note on removing components
clarkbianw: should ^ do a vos release too before dropping the lock23:48
opendevreviewIan Wienand proposed opendev/system-config master: reprepro: add note on removing components
ianwclarkb: good point, done :)23:50
ianwclarkb: i assume you're ok with removing xenial ubuntu-ports?
ianwif so i'll have a go at it this afternoon and validate the instructions with removing that23:53
clarkbya I'm good with it. Just looking over the role now to make sure we don't have to clean anything else up23:55
clarkbseems like when it was puppet there were a lot more bits to clean up23:55
clarkbaha the updates file doesn't know about specific releases23:56
clarkbso ya this lgtm23:56
