| sean-k-mooney | frickler: where is cirros actually developed? https://opendev.org/cirros/cirros seems to be behind https://github.com/cirros-dev/cirros | 16:04 |
|---|---|---|
| sean-k-mooney | it looks like it moved to github? is that correct | 16:05 |
| fungi | sean-k-mooney: https://review.opendev.org/c/openstack/project-config/+/827719 has some discussion | 16:06 |
| fungi | and yeah, it was imported from a cleaned-up fork of https://github.com/osfrickler/cirros.git | 16:07 |
| fungi | er, that was the fork | 16:07 |
| fungi | cleaned-up fork of https://github.com/cirros-dev/cirros | 16:07 |
| sean-k-mooney | right, but that git repo has had quite a number of patches merged since then | 16:07 |
| sean-k-mooney | not that either is really active | 16:08 |
| sean-k-mooney | but it's been 5 months since a commit was merged to https://github.com/cirros-dev/cirros | 16:08 |
| sean-k-mooney | vs 4 years for https://opendev.org/cirros/cirros | 16:08 |
| fungi | yeah, seems like they reactivated and then deactivated again | 16:09 |
| fungi | we could pull in newer history/tags to the one in opendev if people want to maintain it here | 16:09 |
| sean-k-mooney | so the reason i'm asking is there has not been a release since 2024, and even then we were waiting for a 0.7.0 to change the kernel | 16:09 |
| sean-k-mooney | so for the last 3 years or so we have had kernel panics in the tempest jobs due to what we believe is an unpatched kernel bug in the kernel cirros has | 16:10 |
| sean-k-mooney | that is fixed in later kernels | 16:11 |
| sean-k-mooney | every so often i bring up the topic of replacing cirros with something else, but i'm wondering if we even have the capability to update cirros and do a release currently | 16:12 |
| fungi | looks like smoser is still around, just working on other stuff and not cirros lately | 16:12 |
| sean-k-mooney | i'm kind of conflicted | 16:18 |
| sean-k-mooney | part of me is thinking if this was hosted on opendev we could maintain it more simply by setting up zuul jobs to build and publish, and have a nice end-to-end release process | 16:19 |
| sean-k-mooney | basically just uploading the artifact to t.o.o | 16:20 |
| sean-k-mooney | like the ironic ipa images | 16:20 |
| fungi | agreed, and the other part of me is feeling like we don't need yet something else to maintain | 16:20 |
| sean-k-mooney | ya | 16:20 |
| sean-k-mooney | if there is a project that fits the need and it's being developed then we should just reuse it | 16:20 |
| JayF | I'll note it's a particularly awful time to be talking about maintaining a distro | 16:21 |
| fungi | that especially, yep | 16:21 |
| sean-k-mooney | even if that is the upstream cirros and we just contribute there | 16:21 |
| JayF | given right now security vulns in the kernel and other bits that'd even be in a minimalist distro are being dealt with daily | 16:21 |
| clarkb | ya but if it is for testing only the risks are much lower | 16:21 |
| sean-k-mooney | JayF: ya but the lack of maintenance of our test image has been leading to a lot of rechecks for literally several years now | 16:21 |
| clarkb | (I'm not advocating for maintaining a distro even if it is for testing only. Just want to point out that changes the risk calculations dramatically) | 16:22 |
| sean-k-mooney | we really do need a replacement image or we should just stop testing cinder volume resize in nova | 16:22 |
| JayF | sean-k-mooney: I'm not going to help maintain it whether we do it or not, just tossing that out there. Honestly I wish we'd just work with a distro vendor (HINT HINT) to get an official minimalist image for testing like this | 16:22 |
| fungi | it's too bad emdebian didn't stick around | 16:23 |
| sean-k-mooney | it would be nice if said vendors actually cared. we looked at fedora before, but fedora and ubuntu build their smallest images to require 256mb of ram | 16:23 |
| clarkb | fedora isn't stable enough for a test platform either (personal opinion) | 16:24 |
| clarkb | the amount of churn it goes through with each release is significant | 16:24 |
| sean-k-mooney | oh same, i just mentioned it because that fedora default is also in rhel/centos etc. | 16:24 |
| sean-k-mooney | alpine was what i wanted to use in the past | 16:25 |
| clarkb | it is really apparent in dib: to use the new ubuntu release we typically only have to change the release name (maybe use a newer debootstrap version). With fedora you have to update all the things every release | 16:25 |
| sean-k-mooney | i created a fork of cirros on my github quickly and i'm going to see if it still builds | 16:25 |
| clarkb | network, filesystem, etc | 16:25 |
| fungi | raspbian could have been a good fit if they produced an amd64 version | 16:26 |
| sean-k-mooney | well i looked at things like tiny core in the past too | 16:27 |
| clarkb | at one point puppetlinux was publishing cloud like images but it looks like that didn't get very far | 16:27 |
| sean-k-mooney | there are a bunch of options; even gentoo would be fine if it was 1) small and 2) lightweight | 16:27 |
| sean-k-mooney | whatever else you say about cirros, it is highly tuned for the usecase | 16:28 |
| fungi | it's just too narrow of a usecase to have much of a community around it to keep it maintained | 16:29 |
| sean-k-mooney | yep, again that's why i started looking at alpine or a similar distro that also targets embedded | 16:29 |
| sean-k-mooney | i had alpine working to some degree via dib | 16:29 |
| frickler | I'm still not convinced there is an unpatched kernel bug in cirros. my bets are on "root fs corruption due to interrupted startup". all the failures I've seen lately are on the second boot attempt after a resize/migrate operation | 16:57 |
| frickler | building a cirros variant that has a proper rootfs instead of copying it on first boot from the initrd might help, but who has time for that? | 16:59 |
| sean-k-mooney | frickler: so i am pretty sure i found the specific ubuntu kernel bug in the past and confirmed it was present in the specific pinned kernel we had but was fixed in a later lts point release kernel | 16:59 |
| sean-k-mooney | the current cirros kernel is a very early 22.04 kernel | 17:00 |
| sean-k-mooney | frickler: you might be right that it could be corruption related but i'm not sure that aligns with what i have seen | 17:01 |
| sean-k-mooney | https://738e0fbcd00a1fc4d556-8f8891a779b543135fbad241baccf135.ssl.cf1.rackcdn.com/openstack/e3f26732409f4715bd7df1adc272cdd0/testr_results.html | 17:02 |
| sean-k-mooney | ok so https://github.com/openstack/tempest/blob/2a85e0ea089b2d522d92a41aec9713cdf4930c24/tempest/api/compute/servers/test_server_actions.py#L514 is doing a resize confirm | 17:04 |
| sean-k-mooney | which means there is a guest reboot | 17:04 |
| sean-k-mooney | but the question is: is the panic on the first boot or the second | 17:04 |
| frickler | the state before the console log is VERIFY_RESIZE, so second | 17:05 |
| sean-k-mooney | hum ok | 17:05 |
| sean-k-mooney | so your theory is that on first boot the copy does not complete properly and the FS is corrupted | 17:07 |
| fungi | how reliably can you reproduce the failure condition? | 17:07 |
| sean-k-mooney | it only happens in ci | 17:08 |
| sean-k-mooney | we have never been able to do it locally | 17:08 |
| fungi | just wondering if we could hold a node so someone could check with a fsck on it | 17:08 |
| sean-k-mooney | but we may have been trying the wrong thing | 17:08 |
| fungi | or add some additional steps in the job to fsck it before the next boot | 17:08 |
| sean-k-mooney | while we were chatting i had ai create a docker file with the cirros build deps and do a cirros build | 17:09 |
| fungi | not as a fix of course, but to confirm the fs is actually corrupt | 17:09 |
| frickler | it only happens on some small percentage of CI jobs, so reproduction is tricky. I tried locally without success a long time ago | 17:09 |
| sean-k-mooney | so that mostly works, although i had to update where buildroot is pulled from | 17:09 |
| fungi | yeah, if it's rare then adding debugging into e.g. devstack would make more sense | 17:09 |
| sean-k-mooney | it also only happens on specific tests involving cinder volumes | 17:09 |
| fungi | so that it can give us more info on the hunch the next time it occurs | 17:10 |
| sean-k-mooney | i.e. boot from volume | 17:10 |
| opendevreview | Jon Bernard proposed openstack/project-config master: Add jbernard as an op for the cinder IRC channel https://review.opendev.org/c/openstack/project-config/+/990337 | 17:18 |
| sean-k-mooney | frickler: i'm doing the rootfs thing, or rather pi/gpt 5.5 is now | 17:27 |
| sean-k-mooney | it's also using ext3; i'm not sure if the extra journaling of ext4 will help but i'm going to try adding that as well | 17:28 |
| opendevreview | Merged openstack/project-config master: Add jbernard as an op for the cinder IRC channel https://review.opendev.org/c/openstack/project-config/+/990337 | 17:39 |
| sean-k-mooney | frickler: that was surprisingly easy | 17:40 |
| sean-k-mooney | i'll push this up to my fork and maybe submit a pr | 17:41 |
| sean-k-mooney | the image size grows to 31M from 21, but i think we can deal with that | 17:41 |
| sean-k-mooney | also i built master rather than 0.6.3 | 17:42 |
| sean-k-mooney | so there are other changes as part of this | 17:42 |
| sean-k-mooney | https://github.com/SeanMooney/cirros/releases/tag/cirros-d260527-x86_64-5a75ef2-test | 17:57 |
| opendevreview | Jeremy Stanley proposed openstack/project-config master: Give infra-root permission to push sandbox notes https://review.opendev.org/c/openstack/project-config/+/990349 | 18:16 |
| sean-k-mooney | frickler: fungi: so https://review.opendev.org/c/openstack/devstack/+/990348 https://zuul.openstack.org/status?change=990348 seems to be working, but if we actually wanted to use these i assume we would want to not use my github fork. | 18:46 |
| sean-k-mooney | i can open an issue and ask about the current maintenance | 18:47 |
| frickler | sean-k-mooney: you can also find us in #cirros on libera. I also wonder how many devstack iterations one would need in order to be confident that this helps | 19:09 |
| sean-k-mooney | oh i can likely join there once i add libera to my irc list | 19:10 |
| sean-k-mooney | i just opened https://github.com/cirros-dev/cirros/issues/131 | 19:11 |
| sean-k-mooney | frickler: we could perhaps swap the problematic nova jobs over to my image for a bit to see if embedding the rootfs does in fact fix it | 19:12 |
| sean-k-mooney | but my concern basically is with jobs pulling from github and getting rate limited | 19:12 |
| sean-k-mooney | it would be nice to have a way to let it bake for a while | 19:13 |
| sean-k-mooney | https://tinyurl.com/2dj49y9v so those are not all unique jobs, but that's 274 hits in the last 2 weeks | 19:16 |
| sean-k-mooney | doing stats with a mk1 eyeball it happens 20-30 times a day | 19:18 |
| sean-k-mooney | also interestingly enough in ovh (BHS1 and GRA1) | 19:19 |
| sean-k-mooney | ok rax is there as well, just not on the job jobs | 19:20 |
| sean-k-mooney | so it's across all providers | 19:20 |
| opendevreview | Clark Boylan proposed openstack/project-config master: Update jeepyb gerrit build to 3.13 https://review.opendev.org/c/openstack/project-config/+/990373 | 20:24 |
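
fungi's suggestion in the log of running an fsck before the second boot, to confirm whether the filesystem is actually corrupt, could be scripted roughly as below. This is a minimal sketch, not something from the discussion: it assumes e2fsprogs is installed and that the guest's ext filesystem is reachable as a block device or filesystem image at a path supplied by the caller. The names `decode_fsck_status` and `check_filesystem` are hypothetical. The status bits are the ones documented in the e2fsck(8) man page, and `-f -n` forces a check while opening the filesystem read-only.

```python
import subprocess

# Exit status bits as documented in e2fsck(8); the status is a bitwise OR.
FSCK_STATUS_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "cancelled by user request",
    128: "shared library error",
}


def decode_fsck_status(status):
    """Turn an e2fsck exit status into a list of human-readable findings."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_STATUS_BITS.items() if status & bit]


def check_filesystem(path):
    """Run a read-only check on a filesystem (hypothetical helper).

    `path` must point at the filesystem itself (a partition device or an
    unpartitioned image), not at a whole partitioned disk.
    """
    # -f: check even if the filesystem is marked clean
    # -n: open read-only and answer "no" to every repair prompt
    result = subprocess.run(
        ["e2fsck", "-f", "-n", path],
        capture_output=True,
        text=True,
    )
    return decode_fsck_status(result.returncode), result.stdout
```

Because `-n` never repairs anything, a corrupt root filesystem should show up as "filesystem errors left uncorrected", which only gathers evidence for the corruption hunch and is not a fix.
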
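
frickler's question about how many devstack iterations would be needed to be confident the new image helps can be framed as a simple probability estimate. The sketch below assumes each run fails independently with a fixed rate `p` on the old image, so the chance of seeing `n` clean runs purely by luck is `(1 - p) ** n`; requiring that to be at most `alpha` gives `n >= ln(alpha) / ln(1 - p)`. The per-run failure rate is not stated in the log, so the 2% figure used below is purely hypothetical, and `clean_runs_needed` is an illustrative name.

```python
import math


def clean_runs_needed(failure_rate, alpha=0.05):
    """Consecutive clean runs needed before luck alone becomes implausible.

    failure_rate: assumed per-run failure probability with the old image.
    alpha: accepted probability that n clean runs happened by chance.
    """
    if not 0 < failure_rate < 1:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    # Solve (1 - p) ** n <= alpha for the smallest integer n.
    return math.ceil(math.log(alpha) / math.log(1 - failure_rate))


if __name__ == "__main__":
    # Hypothetical rate: the panic hits 2% of runs of an affected job.
    print(clean_runs_needed(0.02))
```

With the hypothetical 2% rate this comes to 149 clean runs at `alpha=0.05`, which suggests that switching a few frequently run jobs to the test image for a while would give a clearer signal than a handful of rechecks on one change.
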