opendevreview | Merged opendev/irc-meetings master: We have decided to adjust meeting time to 0700 during summer time. https://review.opendev.org/c/opendev/irc-meetings/+/914940 | 09:02 |
---|---|---|
SvenKieske | I'm currently trying to decide which parent job to utilize for my new linting job, and looking at https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L72 I have questions. Are the python27 jobs really still running? the build history for those is not loading for me in zuul. I guess I could grep all the zuul jobs.yamls if it is enabled at all | 09:30 |
SvenKieske | in general it seems - for me anyway - that there are some jobs in there which could probably be updated? referencing EOL branches etc? But I'm not sure if these are still in use for some fips testing or other stuff? | 09:31 |
SvenKieske | I don't see any jobs there explicitly running on newer releases (2023.X et al). newest branches there in most parent jobs e.g. "openstack-tox" is zed release. even openstack-tox-py311's parent is openstack-tox with no newer branches declared? I must be missing something? | 09:50 |
SvenKieske | ah, those are defined here: https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/project-templates.yaml#L645 weird structure.. | 09:54 |
SvenKieske | so it seems, previously gate testing jobs and their branches were defined in jobs.yaml https://opendev.org/openstack/openstack-zuul-jobs/commit/6d85fd8399ed6b9f2358412945cd6683989662cd | 09:59 |
SvenKieske | but nowadays this is done in project-templates.yaml https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/913710 | 10:00 |
SvenKieske | and nobody cleaned that up? afaik it's not necessary to split the branch definition for jobs being run over two files here? | 10:00 |
SvenKieske | maybe I'm still not seeing the whole picture here, but it does seem to make some kind of sense at least. although it seems a little brittle and error prone to have no single source of truth which branch is being used for which job at which point in time. | 10:03 |
*** sfinucan is now known as stephenfin | 12:25 | |
fungi | SvenKieske: you might find https://zuul.opendev.org/t/openstack/jobs an easier way to browse the zuul jobs defined in the openstack tenant, and then cross-reference them to git from there | 12:28 |
fungi | to answer your question about whether some projects still maintain compatibility with and test on python 2.7, yes of course. there are still distributions that support it even if it's not supported upstream by the python community, and openstack project branches that (at least very recently) supported being installed with python 2.7 | 12:30 |
fungi | and even master branches of non-branching tools and libraries that need to continue to support those older versions of the software (e.g. pbr). i think we only dropped python 2.7 support from bindep a few weeks ago | 12:33 |
fungi | https://zuul.opendev.org/t/openstack/builds?job_name=openstack-tox-py27&project=openstack/pbr | 12:36 |
fungi | https://zuul.opendev.org/t/opendev/builds?job_name=tox-py27&project=opendev/bindep though that's in a different zuul tenant these days | 12:38 |
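The same build queries can also be made against Zuul's REST API instead of the web UI; a minimal sketch mirroring the builds pages linked above (the `limit` parameter and the `json.tool` pretty-printing are just illustrative choices):

```bash
# Fetch recent runs of openstack-tox-py27 on openstack/pbr from the Zuul API;
# this is the same data backing the /t/openstack/builds page linked above.
curl -s 'https://zuul.opendev.org/api/tenant/openstack/builds?job_name=openstack-tox-py27&project=openstack/pbr&limit=10' \
  | python3 -m json.tool
```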
SvenKieske | ah, that's why I didn't find any jobs, thanks! | 12:38 |
SvenKieske | does pbr still use py27? | 12:39 |
fungi | yes, it still supports python 2.7 because other projects supporting python 2.7 need to be installable with the latest versions of pbr | 12:39 |
fungi | a prime example though is https://zuul.opendev.org/t/openstack/builds?project=openstack%252Fswift | 12:40 |
fungi | you'll see it runs several different "py27" jobs even on master branch changes | 12:41 |
SvenKieske | interesting, I was under the impression that the transition to python3 was complete years ago. at least that was some marketing speak around it. so swift still supports python2? | 12:43 |
SvenKieske | is that to support some old redhat cruft? I assume swift does support python3. I wouldn't even know how to currently install python2 on most distros, maybe software heritage has old packages. | 12:45 |
tkajinam | we globally removed python 2 support at ussuri afair and that was because mainly SwiftStack required it for a bit longer (to run new swift in older ubuntu). Idk if that requirement still stands | 12:48 |
tkajinam | s/that was/keeping py2 support in swift/ I mean | 12:49 |
SvenKieske | i quickly grepped for python2|3 in the swift repo, at least there seem to be some remnants of python2 support left. as the only tests being run on python2 seem to be linting|tox stuff I doubt that it works. | 12:52 |
SvenKieske | just recently found out swift support with keystone backend for auth was broken since zed release in k-a because we did not test that. | 12:53 |
SvenKieske | so my basic assumption since roughly 6 years is: untested code is broken code. | 12:54 |
tkajinam | SvenKieske, that's interesting. do you have a bug for it ? | 12:54 |
tkajinam | we run some tempest tests to validate deployment with swift + keystone in puppet jobs but I've never seen any problems so far (though our test coverage is quite limited) | 12:55 |
tkajinam | I think swift + keystone is covered by usual dsvm jobs run in multiple projects | 12:55 |
SvenKieske | tkajinam: https://bugs.launchpad.net/kolla-ansible/+bug/2060121 this is kolla-ansible specific bug, see the attached patch | 12:55 |
SvenKieske | but it reinforces my view. another contributor is working on implementing tempest tests in kolla now. we need more integration tests in kolla and I think tempest is the best we can add. | 12:56 |
tkajinam | https://github.com/openstack/devstack/blob/master/lib/swift#L435 | 12:57 |
SvenKieske | we have custom bash integration tests and they mostly work, but they really only test a fraction, even of our default install. | 12:57 |
tkajinam | I'd say that's not a bug in swift but one in deployment tools. though I feel like the requirement of /v3 path is redundant and something we may want to improve | 12:57 |
SvenKieske | yeah sure, I was talking about kolla-ansible, that's my main interaction point with openstack :) it's a bug in this deployment tool | 12:58 |
tkajinam | a bit tricky point with this discussion is that you may need to test s3 api instead of native swift api and tempest does not cover it for now | 12:58 |
SvenKieske | it was just an example where no tests lead to silently broken code, for many releases even. | 12:59 |
tkajinam | yeah | 12:59 |
SvenKieske | so if swift does not run integration tests in python2 I doubt it works in python2, until proven otherwise :) | 13:00 |
tkajinam | hm https://github.com/openstack/swift/tree/master/test/functional/s3api | 13:01 |
fungi | SvenKieske: to be clear, our "marketing speak" was that we fully supported python 3. you can fully support python 3 and 2.7 if you're careful about how you write your software | 13:16 |
SvenKieske | sure, but I also recall reading somewhere that python2 support was supposed to be removed, and afaik there are projects without python2 support | 13:17 |
fungi | and yes, it was in service of people on older (but still supported by their vendor) gnu/linux distributions being able to upgrade to the latest releases of swift | 13:17 |
SvenKieske | I'm pretty sure I myself added the usage of a standardlib that's not available in python2 | 13:17 |
fungi | sometimes people don't want to upgrade the distribution they're running, and as long as that distro still provides necessary things like security fixes i don't see that as a concern | 13:18 |
SvenKieske | no | 13:18 |
SvenKieske | but https://docs.openstack.org/tempest/latest/supported_version.html does not list any python 2 version as supported | 13:18 |
SvenKieske | so imho it's good to wonder why we burn CI cycles on python2 tests? | 13:19 |
SvenKieske | ah damn, that's only tempest | 13:19 |
fungi | SvenKieske: maybe the missing piece here is that swift doesn't require keystone. it can be installed as a stand-alone service | 13:19 |
fungi | and from what i understand there are quite a few large deployments of stand-alone swift without other openstack services alongside it | 13:20 |
fungi | and that's what the swift team has been trying to make sure kept working on python 2.7 | 13:21 |
SvenKieske | but python2 is also not listed on the supported runtimes for zed: https://governance.openstack.org/tc/reference/runtimes/zed.html | 13:21 |
fungi | SvenKieske: right, in the zed release projects were not *required* to support python 2.7, but that doesn't mean they couldn't still choose to do so | 13:22 |
SvenKieske | okay, makes sense :) | 13:22 |
fungi | openstack-wide support guarantees are the minimum required by projects included in openstack, not the extent of what some of them might support in isolation | 13:23 |
SvenKieske | imho it would still make sense to at least think about a roadmap when to officially demand removal of this though. | 13:23 |
fungi | i think i heard that the swift team was sunsetting python 2.7 support, i don't recall what that timeline was, perhaps that's something they'll be talking about in ptg sessions next week | 13:24 |
SvenKieske | okay, thanks for the insights. I wasn't really aware there are actually still openstack projects with python2 support. guess I'm one of the lucky 10000 ( https://xkcd.com/1053/ ) | 13:25 |
fungi | openstack is pretty vast, it's hard to know everything that goes on in it | 13:31 |
corvus1 | my guesstimate of the db import time was way off, it took 9 hours. | 14:14 |
frickler | wow, that's a lot | 14:18 |
corvus1 | yeah, i think it's all the indexes... i made the estimate based on the rate of the artifact table, but it has few indexes. i think the builds/buildsets/refs tables, which have lots of indexes, slowed down a lot | 14:19 |
corvus1 | during the recent migration, i turned off indexes then recreated them at the end. i think we should see if it's possible to do that with mysqldump | 14:20 |
corvus1 | s/turned off/deleted/ | 14:20 |
frickler | unrelated, didn't we have like 500+ config errors? now I see only 335 and I wonder what happened | 14:20 |
corvus1 | i think if we do that, we may approach the runtime of the migration (which i think was 40m?) | 14:20 |
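For reference, the drop-and-recreate approach being discussed would look roughly like this; the database, table, and index names here are placeholders, not the actual Zuul schema or the commands that were run:

```bash
# Sketch only: drop a secondary index before the bulk import, recreate it afterwards.
# "zuul", "zuul_build" and "example_idx" are placeholder names.
mysql zuul -e 'ALTER TABLE zuul_build DROP INDEX example_idx;'
mysql zuul < dump.sql                                            # the long-running import
mysql zuul -e 'ALTER TABLE zuul_build ADD INDEX example_idx (uuid);'
```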
corvus1 | heh, i wonder if archive.org crawled our config error page? and if we're about to block it? :) | 14:21 |
frickler | interesting idea. seems there's only one single copy from 2022 though :( | 14:26 |
corvus1 | frickler: i think the scheduler logs the error count every time it reconfigures a tenant; so you could at least double check the numbers, but not the actual error contents from the logs | 14:28 |
frickler | I'm just checking against a copy I made 4 weeks ago. seems my memory of the count is mostly correct and we have 250 fewer errors for openstack/trove combined with some general increase due to 2024.1 branching | 14:32 |
fungi | hopefully that means someone actually fixed trove's job configs | 14:49 |
clarkb | fungi: I do wonder if we should make a pbr2 package and then have that drop python<3.8 support. Probably a lot of effort for minimal gain | 14:50 |
clarkb | particularly since we may be able to drop python2 in the near future which is probably the biggest tripping hazard | 14:50 |
fungi | we'll probably be able to drop python2 support from pbr soon enough that it's not worth the extra dance | 14:50 |
fungi | yeah, that | 14:50 |
clarkb | the one place where it could get tricky is if people see that as an invitation to start adding newer python3 stuff but then we won't necessarily work with newer python3 when installing in old locations either. Though I suspect that pip's fake pyproject.toml stuff may help a bit there | 14:51 |
clarkb | I'll be joining the gerrit community meeting in about 9 minutes. Going to ask them about this reindexing bug on gerrit 3.9 (my current biggest concern with an upgrade) | 14:51 |
fungi | testing with sufficiently old python3 is probably sufficient to control that | 14:52 |
clarkb | good point | 14:52 |
corvus1 | hrm, the mysqldump script already disables keys during the import, so i'm not sure that manually deleting them and adding them would be faster. | 15:04 |
fungi | any idea what the bottleneck is? network bandwidth? cpu on the trove instance? | 15:05 |
corvus1 | (or string parsing overhead of a text dump file)? | 15:05 |
corvus1 | i don't know, and it's a bit hard to tell without access to the db host... | 15:06 |
fungi | maybe this is where that "break the warranty sticker" root login comes in handy | 15:07 |
fungi | or we can accelerate our plans to launch a dedicated mariadb server instance | 15:07 |
corvus1 | well, even that's just root db user | 15:07 |
fungi | oh, root in the db not a root command shell in the os | 15:07 |
fungi | yeah, that's maybe some help still (i think mysql has performance details for some stuff?) but not the whole picture | 15:08 |
corvus1 | i'm sure we can make this faster, but is it worth it? do we sink a lot of time into improving it, or is 9 hours of missing build records on a saturday acceptable? | 15:08 |
fungi | i think it's acceptable, but also wonder if it's much less work than an ansible playbook that installs the mariadb container we use elsewhere and launching a server | 15:09 |
frickler | I think it is fine, too. bonus if you make sure to start after the periodic-weekly pipeline is done | 15:10 |
corvus1 | yeah, if we can fold a migration to self-hosted in, that would be ideal; just not sure how fast we can cobble together that change | 15:10 |
fungi | i'm starting to look at it because i can't help myself, but really shouldn't be since i'm up against several other deadlines | 15:13 |
fungi | i guess the main things we need are a service-zuul-db playbook that includes the mariadb role and our other standard roles, custom firewall rules allowing query access from the zuul schedulers, some rudimentary testinfra test(s)... what else? | 15:15 |
corvus1 | yes -- except there is no mariadb role because we don't have any standalone mariadbs | 15:15 |
corvus1 | so that has to start as a copy of, say, the gerrit role with a bunch of stuff removed | 15:15 |
fungi | oh, yes i totally missed that and assumed we had already made a shared mariadb role, but i see we haven't | 15:16 |
fungi | instead we just embed mariadb container configuration in every service that needs one | 15:16 |
corvus1 | yep | 15:16 |
* fungi had started from a copy of the etherpad role but gerrit would have also worked yes | 15:17 | |
corvus1 | if you're doing that, i can launch the server | 15:17 |
fungi | but yeah, maybe we just do the migration to another trove on saturday. i don't think i can commit to writing and debugging this before the weekend | 15:18 |
corvus1 | ok i'll stand down then :) | 15:18 |
fungi | the scope isn't substantial, but it's more than i have time for before next week at the earliest | 15:18 |
clarkb | https://gerrit-review.googlesource.com/c/gerrit/+/417857 progress | 15:18 |
fungi | yay! | 15:19 |
fungi | and with that, i'm disappearing for lunch but shouldn't be more than an hour tops | 15:19 |
clarkb | I also understand the issue much better now. Basically there was a new feature added that allowed a full offline reindex to start from a checkpoint state (possibly precreated before you do an upgrade); this keeps deltas small and speeds up your "full" offline reindex. However, there was a bug and in some cases (I think when you did not create the checkpoint state, which is non-default) | 15:28 |
clarkb | it would completely delete the changes index | 15:28 |
clarkb | then you start gerrit and panic. It turns out that if you rerun a full reindex from that state it works because it's starting from 0. That means the workaround is to simply rerun the reindex | 15:28 |
clarkb | but this was under-documented and obtuse, and when you're in a "gerrit is basically completely unusable" state you're not likely to find that path forward | 15:28 |
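For anyone hitting the same state, the workaround amounts to re-running the stock offline reindex; a sketch, with the site path shown only as an example:

```bash
# Re-run a full offline reindex after the buggy 3.9 run emptied the changes index.
# /var/gerrit is an example site path; adjust to the actual Gerrit site directory.
java -jar /var/gerrit/bin/gerrit.war reindex -d /var/gerrit
```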
clarkb | anyway that revert pulls out the functionality and then 3.10 (current master) has reimplemented it in a different more robust way | 15:29 |
clarkb | The other thing that was called out is that SAP apparently has hit what they think may be a race between C git and Jgit during repacking of large repos. It sounds like packed-refs ends up getting truncated and then the repo is unusable. They were able to restore from backups though | 16:00 |
clarkb | Apparently one installation has for many years (like a decade) done a hard link to packed-refs before running gc then only removes the hard link if things check out cleanly. We may want to investigate doing this in our system. (it was nasser saying they do that so we can followup if it seems like a good idea) | 16:01 |
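A minimal sketch of that safeguard, assuming a bare repository layout; the path and the exact check are illustrative, not what that installation actually runs:

```bash
# Keep a hard link to packed-refs across gc; only drop it once things check out.
cd /var/gerrit/git/example/project.git      # illustrative path to a bare repo
ln packed-refs packed-refs.pre-gc           # hard link keeps a handle on the pre-gc file
git gc
git fsck && rm packed-refs.pre-gc           # leave the safety copy in place if fsck fails
```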
clarkb | SAP seemed to think it is extremely unlikely to happen though (and they don't even know that it is a race between c git and jgit that is just a theory). | 16:02 |
clarkb | sounds like their gerrit install has many more changes than ours and much larger repos involved | 16:02 |
opendevreview | James E. Blair proposed opendev/system-config master: Add a standalone zuul db server https://review.opendev.org/c/opendev/system-config/+/915079 | 16:08 |
corvus1 | fungi: clarkb ^ i dropped and added a single index on the new server on the artifacts table and it took over an hour, which is kind of spooking me about using this trove db. so i went ahead and tried to throw together a self-hosted change. | 16:10 |
clarkb | corvus1: you did that on the old server or the new test one? (or maybe both?) | 16:11 |
corvus1 | new one; i don't have a comparable time for the old server, so i can't say whether it's slower or not | 16:14 |
clarkb | corvus1: posted some quick thoughts on that change. But lgtm | 16:17 |
clarkb | well other than addressing those minor things I mean | 16:18 |
opendevreview | James E. Blair proposed opendev/system-config master: Add a standalone zuul db server https://review.opendev.org/c/opendev/system-config/+/915079 | 16:24 |
corvus1 | clarkb: thanks! | 16:24 |
clarkb | corvus1: looks like you've got the secrets file open so gpg is telling me to go away (I presume for the trove stuff) | 16:26 |
clarkb | if you don't need it anymore there is an edit I'd like to make this morning | 16:26 |
corvus1 | yep, i'll exit now | 16:26 |
clarkb | the email for the password change infra-root just got was me | 16:34 |
corvus1 | i'm launching an 8gb performance flavor in dfw for zuul-db01 | 16:36 |
corvus1 | on jammy | 16:36 |
fungi | okay, back now | 16:37 |
fungi | and reviewing the db server change, awesome! | 16:40 |
clarkb | oh maybe the passwd change email only went to me. Maybe we should update that contact email. One thing at a time :) | 16:41 |
fungi | yeah, we talked about that when i was doing the mfa stuff for it too | 16:42 |
opendevreview | James E. Blair proposed opendev/zone-opendev.org master: Add zuul-db01 https://review.opendev.org/c/opendev/zone-opendev.org/+/915082 | 16:45 |
clarkb | corvus1: on the db role change I think maybe we need to add the groups file for testing to the big list that gets copied by ansible to set up the test bridge? | 16:46 |
opendevreview | James E. Blair proposed opendev/system-config master: Add zuul-db01 https://review.opendev.org/c/opendev/system-config/+/915083 | 16:46 |
clarkb | I'm trying to dig that up and will get a link | 16:47 |
clarkb | corvus1: https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base.yaml#L115 needs an entry there. I'll leave a gerrit comment for historical reasons too | 16:48 |
corvus1 | clarkb: found it | 16:48 |
corvus1 | yep | 16:48 |
opendevreview | James E. Blair proposed opendev/system-config master: Add a standalone zuul db server https://review.opendev.org/c/opendev/system-config/+/915079 | 16:49 |
opendevreview | James E. Blair proposed opendev/system-config master: Add zuul-db01 https://review.opendev.org/c/opendev/system-config/+/915083 | 16:49 |
corvus1 | clarkb: fungi i just ran the same drop/add on my local mysql8 db with a slightly older copy of the opendev db and it finished in 19min. so i think it's safe to say that the trove mysql8 is not optimal, but it won't be clear if self-hosting will be an improvement until we test there. | 16:52 |
clarkb | ack | 16:52 |
fungi | it definitely sounds like a promising data point though | 16:52 |
corvus1 | i need to run some errands -- if ya'll can continue pushing on the mariadb thing, that would be much appreciated | 16:53 |
fungi | absolutely, thanks!!! | 16:55 |
opendevreview | Clark Boylan proposed opendev/system-config master: Rebuild the etherpad container image https://review.opendev.org/c/opendev/system-config/+/915084 | 16:55 |
opendevreview | James E. Blair proposed opendev/system-config master: Set standalone mariadb innodb buffer pool to 4G https://review.opendev.org/c/opendev/system-config/+/915085 | 17:01 |
corvus1 | okay really leaving now :) | 17:01 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Add a standalone zuul db server https://review.opendev.org/c/opendev/system-config/+/915079 | 17:51 |
clarkb | fungi: I think you need to rebase the other two changes on that too? | 17:54 |
clarkb | or maybe you want to see check pass first? that's fine I guess | 17:54 |
fungi | yeah, i wasn't eager to rebase those until i see it passing | 17:54 |
fungi | just in case there are other surprises lingering | 17:55 |
fungi | but will do, for sure, once it's all good | 17:55 |
Clark[m] | Thanks. I'm working on lunch now. It got cold again so I'm making a quick dashi to do some ramen | 18:19 |
fungi | yum! what base are you using? shiitaki? kombu? bonito? niboshi? some combination of those? | 18:21 |
Clark[m] | Kombu and katsuobushi (bonito) | 18:22 |
fungi | soooo gooooooooood! | 18:22 |
fungi | oiishi | 18:23 |
fungi | er, oishii i meant | 18:23 |
Clark[m] | Nothing fancy just putting something together with what I've got laying around. Noodles were bought fresh but had one serving left hiding in thebfreezert | 18:25 |
Clark[m] | *the freezer | 18:25 |
fungi | we end up with a lot of shiitake dashi from rehydrating dried mushrooms for other dishes, excellent for reuse | 18:26 |
fungi | less so recently since we've been growing our own shiitake though | 18:28 |
fungi | given my irc nick you'd think i would have at least attempted mushroom farming earlier in life, but i've discovered it's surprisingly easy | 18:30 |
Clark[m] | I've debated buying one of the kits. I feel like I would end up killing them like I do the plants in my yard | 18:31 |
fungi | kits are mostly for educational purposes and not a sustainable way to farm | 18:36 |
fungi | longer term you can just grow shiitake on billets of oak in a dark place like your basement, root cellar or crawlspace | 18:36 |
fungi | the wood needs to stay damp but not soaked, and you just harvest the mushroom growth from them periodically | 18:37 |
fungi | https://zuul.opendev.org/t/openstack/build/60d3f05987e34125b88c1cdbe8a85ad9 | 19:16 |
fungi | what am i missing? | 19:16 |
fungi | the change has: | 19:16 |
fungi | assert mariadb_log_file.contains('mariadb: ready for connections') | 19:16 |
fungi | oh, wait, that's a buildset for the old patchset | 19:17 |
fungi | check hasn't reported on the new patchset | 19:17 |
fungi | guess i found an outdated notification in my inbox | 19:17 |
fungi | though oddly, it was sent 5 minutes ago | 19:18 |
fungi | oh, it's for a child change, not the one i updated | 19:19 |
fungi | okay, less confused now | 19:19 |
fungi | though the new patchset is failing in a new way | 19:20 |
fungi | "Apr 4 18:24:24 zuul-db99 docker-mariadb[15314]: 2024-04-04 18:24:24 0 [Note] mariadbd: ready for connections." https://zuul.opendev.org/t/openstack/build/e9eb0810ce1c45c995bcea46b6655405/log/zuul-db99.opendev.org/containers/docker-mariadb.log#36 | 19:23 |
fungi | https://zuul.opendev.org/t/openstack/build/e9eb0810ce1c45c995bcea46b6655405/log/job-output.txt#54619-54622 | 19:24 |
corvus1 | ha i see it | 19:24 |
corvus1 | i'll fix | 19:24 |
* fungi sighs | 19:24 | |
fungi | what did i miss? | 19:24 |
opendevreview | James E. Blair proposed opendev/system-config master: Add a standalone zuul db server https://review.opendev.org/c/opendev/system-config/+/915079 | 19:25 |
corvus1 | fungi: you're gonna love it | 19:25 |
fungi | zomg | 19:26 |
fungi | how did i not notice that extra d? did i not cut and paste? i guess not! | 19:26 |
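The one-character mismatch, for reference: the testinfra assertion looked for "mariadb: ready for connections" while the daemon names itself mariadbd in its log output. A quick check against the collected log (filename as gathered by the job) shows it:

```bash
# The container log contains the daemon's own name, with the extra "d":
grep 'ready for connections' docker-mariadb.log
# 2024-04-04 18:24:24 0 [Note] mariadbd: ready for connections.
```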
corvus1 | dbdbdbdbd | 19:26 |
fungi | feels like it should be a friday | 19:27 |
corvus1 | i want to start a db project called pqdb | 19:27 |
fungi | and name the process pqdbdbd | 19:27 |
corvus1 | and the query client pqdbdbdpq | 19:28 |
fungi | i'd use it as often as i was able to type it | 19:29 |
opendevreview | James E. Blair proposed opendev/system-config master: Add zuul-db01 https://review.opendev.org/c/opendev/system-config/+/915083 | 19:29 |
opendevreview | James E. Blair proposed opendev/system-config master: Set standalone mariadb innodb buffer pool to 4G https://review.opendev.org/c/opendev/system-config/+/915085 | 19:29 |
fungi | :: | 19:32 |
fungi | my window manager is unusually squirrelly today | 19:32 |
fungi | guess i'll take the opportunity for a package upgrade. running out of ways to procrastinate on paperwork | 19:33 |
clarkb | its really maria db d ? | 19:55 |
clarkb | thats like the equivalent of a typing tongue twister | 19:55 |
corvus1 | yup for realz | 20:04 |
fungi | really for reals | 20:24 |
corvus1 | first change is looking good, but third change failed with this failure which seems spurious: https://zuul.opendev.org/t/openstack/build/7504595c5a474e7a81bcbf57f62c9f26 | 20:32 |
corvus1 | (and related to the zk host) | 20:33 |
corvus1 | i'm going to recheck that but we should keep that in mind if it shows up again | 20:33 |
clarkb | ya looks like the test node lost networking? | 20:34 |
clarkb | ++ to a recheck | 20:34 |
corvus1 | clarkb: fungi https://review.opendev.org/915082 can you +3 that? | 20:34 |
fungi | that's my interpretation as well | 20:34 |
fungi | corvus1: done | 20:35 |
corvus1 | i have created secret creds for it on bridge, so i think all pieces are in place, just awaiting merge | 20:37 |
opendevreview | Merged opendev/zone-opendev.org master: Add zuul-db01 https://review.opendev.org/c/opendev/zone-opendev.org/+/915082 | 20:38 |
opendevreview | Merged opendev/system-config master: Add a standalone zuul db server https://review.opendev.org/c/opendev/system-config/+/915079 | 21:21 |
opendevreview | Merged opendev/system-config master: Add zuul-db01 https://review.opendev.org/c/opendev/system-config/+/915083 | 21:21 |
corvus1 | that's enough to get started; once that deploys i'll manually make the config change, start the db, and start an import | 21:25 |
fungi | infra-prod-base deploy failed for 915079, checking into the logs to see why | 21:34 |
fungi | zuul-db01.opendev.org : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0 | 21:36 |
fungi | not sure what that's all about, i can sudo ssh into it from bridge | 21:37 |
fungi | "Data could not be sent to remote host "104.239.240.24". Make sure this host can be reached over ssh: Host key verification failed." | 21:38 |
fungi | `sudo ssh -4 zuul-db01.opendev.org` is also working with no interaction, so it's not a host key mismatch on the v4 addy or anything like that | 21:39 |
fungi | should we just re-enqueue the change into deploy? | 21:40 |
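Re-enqueueing a merged change into the deploy pipeline is typically a zuul-client call; a hedged sketch, with the patchset number left as a placeholder and the exact options depending on the installation:

```bash
# Sketch: re-enqueue change 915079 into the deploy pipeline.
# "<patchset>" is a placeholder; tenant/pipeline/project names as discussed above.
zuul-client enqueue --tenant openstack --pipeline deploy \
  --project opendev/system-config --change 915079,<patchset>
```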
corvus1 | fungi: yeah, i agree with all that. any chance you have a buildset link? | 21:45 |
corvus1 | fungi: actually there is a giant deploy happening now, i guess we can just let it run? | 21:46 |
corvus1 | fungi: oh weird, actually i would not have expected 079 to try to contact the host because it wasn't added until 915083, which is the change after it, and the one that's running now | 21:49 |
corvus1 | i'm not sure why it thought it had a zuul-db01 in inventory since it hadn't been added to the inventory yet.... :? | 21:49 |
fungi | corvus1: oh, we've seen this before | 21:49 |
corvus1 | but at any rate, i think we can expect 083 to work since at least everything should definitely be in place then | 21:49 |
fungi | approve changes together and the deploy works off the state of everything that merged before it started | 21:50 |
fungi | which fails for the penultimate change because it's operating off state that isn't relevant yet | 21:50 |
fungi | happened to me when i added lists01.opendev.org now that i think about it | 21:51 |
fungi | if we'd waited to approve the inventory addition until after the prior change deploy had finished, it would have been fine | 21:51 |
fungi | but the inventory addition showed up early during the parent change's deploy | 21:51 |
corvus1 | oh because deploy is a change pipeline, not ref-updated | 21:52 |
fungi | zactly | 21:52 |
fungi | so not a show-stopper, but worth thinking about whether there's a way to solve that short-term race i suppose | 21:53 |
fungi | other than just delaying later approvals, that is | 21:54 |
fungi | because we'll never remember to do that | 21:55 |
clarkb | but shouldn't ssh still work? | 21:55 |
fungi | it's possible we didn't add the host key as known at that point | 21:56 |
clarkb | I can understand there may be an order thing going on but if the inventory has the node and we have ssh set up (launch node does this) it should still be able to connect right? | 21:56 |
clarkb | oh right the host key comes out of the inventory and we may have needed an earlier step to apply that | 21:56 |
fungi | my guess is it's a known_hosts challenge, right | 21:56 |
clarkb | I wonder if bootstrap bridge does that | 21:57 |
fungi | hard to say for sure because the logs are a bit opaque | 21:57 |
clarkb | and it can run at the same time as other playbooks? that might be the source of the bug | 21:57 |
fungi | possible we're skipping a necessary job with changed file filters, yep | 21:57 |
fungi | or at least not running it first | 21:57 |
clarkb | ya I think that is it (either of those two scenarios or both) | 21:58 |
fungi | which we might also be able to solve another way if we could make openssh ignore missing known_hosts entries as long as there's a matching sshfp result | 21:59 |
fungi | that was the original intent with sshfp records, but openssh upstream backpedaled after dnssec failed to gain traction | 22:00 |
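What the SSHFP approach amounts to, sketched with real OpenSSH tooling: publish fingerprint records in DNS and let the client consult them, keeping in mind that OpenSSH only skips the interactive prompt when the records come back DNSSEC-validated:

```bash
# On the host: emit SSHFP resource records for its keys, to be added to the zone.
ssh-keygen -r zuul-db01.opendev.org
# On the client: check SSHFP records in DNS when verifying the host key.
ssh -o VerifyHostKeyDNS=yes zuul-db01.opendev.org
```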
corvus1 | the letsencrypt job failed (don't know why yet) but that's a stroke of luck in that it skipped 21 jobs and the zuul-db deploy is next since it doesn't depend on it. | 22:25 |
clarkb | 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'letsencrypt_certcheck_domains' | 22:35 |
clarkb | however in the task just two-ish tasks prior it records the value of that variable so I don't know why that failed | 22:35 |
clarkb | some sort of bug in fact recording? | 22:35 |
opendevreview | Merged opendev/system-config master: Set standalone mariadb innodb buffer pool to 4G https://review.opendev.org/c/opendev/system-config/+/915085 | 22:45 |
corvus1 | clarkb: but nb02 is not in that list... | 22:45 |
corvus1 | okay that's some serious ansible wizardry to build the list of domains and it's failing at the nb02 step | 22:47 |
corvus1 | there's a comment in that task file that makes me think maybe this happens occasionally | 22:50 |
corvus1 | iptables on zuul-db01 looks reasonable | 22:51 |
opendevreview | James E. Blair proposed opendev/system-config master: Mariadb: listen on all IP addresses https://review.opendev.org/c/opendev/system-config/+/915096 | 22:57 |
corvus1 | clarkb: fungi ^ one more thing; i have that in place manually and i think that's it. | 22:57 |
fungi | oh, good catch. for the mailman3 role i configured it to only work on the loopback | 22:59 |
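A quick way to confirm which address the published database port ends up bound to on the host, loopback-only versus all interfaces (with iptables then limiting who can actually reach it):

```bash
# 127.0.0.1:3306 = loopback-only (the mailman3-style setup); 0.0.0.0:3306 = the change above.
ss -tlnp | grep 3306
```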
corvus1 | fungi: clarkb we need to put /var/mariadb/db on /opt -- how should we do that? | 23:16 |
fungi | oh, for more disk space? hmm.. | 23:17 |
corvus1 | should i just change the docker-compose to use /opt instead of /var/mariadb? or do a bind mount... or...? | 23:17 |
corvus1 | ya | 23:17 |
corvus1 | seems like maybe just moving the docker-compose volume mounts might be easiest/best? | 23:17 |
fungi | yeah, i think for other things we've done something like /opt/mariadb and then fiddled with fstab and cinder volumes if we deploy in another provider | 23:18 |
fungi | clarkb: ^ does that sound right? | 23:18 |
opendevreview | James E. Blair proposed opendev/system-config master: Move standalone mariadb to /opt https://review.opendev.org/c/opendev/system-config/+/915098 | 23:20 |
corvus1 | since it's easy, i've made that change locally on the server. i'm going to put it into emergency for now so it won't be reverted | 23:33 |
fungi | sounds good. i would have approved it, but would appreciate clarkb's input once he's back | 23:34 |
corvus1 | yeah, it's super easy to undo if we want something else | 23:34 |
Clark[m] | Usually we bind mount the drive to something in /var | 23:35 |
Clark[m] | But the end result is the same other than the path. Etherpad is an example of this iirc | 23:35 |
corvus1 | is that in ansible, or was that just done manually? | 23:36 |
Clark[m] | I think it's done with launch node flags telling it what to do with the ephemeral drive? | 23:37 |
Clark[m] | Or with a volume via launch node | 23:37 |
corvus1 | /dev/main/main-etherpad02 /var/etherpad/db ext4 errors=remount-ro,barrier=0 0 2 | 23:37 |
corvus1 | that's looking like lvm | 23:37 |
fungi | oh, so we mount the ephemeral disk to /var/something in those cases? | 23:38 |
corvus1 | well, at least in etherpad's case, we got an extra volume and lvm'd it and mounted that; it's not actually using /opt | 23:38 |
fungi | yeah, on etherpad02 we have /dev/xvdb1 as a pv and then make a logical volume on it | 23:38 |
Clark[m] | That's a volume; xvde is ephemeral | 23:39 |
fungi | right, we're using a cinder volume in that case, mainly so that we have some insurance beyond backups | 23:39 |
corvus1 | want i should make a volume for zuul-db02 and mount it at /var/mariadb ? | 23:40 |
corvus1 | mimic etherpad? | 23:40 |
corvus1 | might be nice to have opt for scratch space for giant sql files anyway :) | 23:40 |
fungi | i suppose it's a question of whether we 1. need more space than the ephemeral disk provides, 2. might consider detaching and attaching to a different server in the future | 23:41 |
fungi | or yeah, 3. want to be able to use the ephemeral disk for something else entirely | 23:41 |
corvus1 | 1: not today, but we'll use ~half of it i think; 30 out of 60g maybe more | 23:42 |
corvus1 | 2. probability significantly higher than 0 | 23:42 |
corvus1 | and 3 -- yeah, if we share it, we will be nearly out of space if we make a single mysqldump. so yeah, cinder has some things going for it. :) | 23:42 |
fungi | sounds good to me. also we could probably use some of our ssd quota for that rather than the default sata, for added performance | 23:45 |
corvus1 | yep | 23:45 |
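A rough sketch of what mimicking the etherpad layout would look like: attach a cinder volume (SSD type per the suggestion above), put LVM on it, and mount it at /var/mariadb. The volume name, size, and device path here are assumptions, not what was actually run:

```bash
# Create and attach a cinder volume (name, size, and type are placeholders).
openstack volume create --size 100 --type SSD zuul-db01-main01
openstack server add volume zuul-db01.opendev.org zuul-db01-main01

# On the server: LVM on the new device (device path depends on the provider),
# mirroring the etherpad fstab entry quoted above.
pvcreate /dev/xvdb
vgcreate main /dev/xvdb
lvcreate -l 100%FREE -n zuul-db main
mkfs.ext4 /dev/main/zuul-db
echo '/dev/main/zuul-db /var/mariadb ext4 errors=remount-ro,barrier=0 0 2' >> /etc/fstab
mkdir -p /var/mariadb && mount /var/mariadb
```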
corvus1 | VolumeManager.create() got an unexpected keyword argument 'backup_id' | 23:45 |
corvus1 | i got that when i ran volume create... :( | 23:45 |
corvus1 | do i need to use a certain venv? | 23:45 |
fungi | we don't need it to be huge. probably the 100gb minimum rackspace requires would suffice | 23:45 |
corvus1 | the one in launcher-venv does not work | 23:47 |
tonyb | corvus1: yes we do. I think it's /home/fungi/xyzy | 23:47 |
tonyb | something like that. history should help | 23:48 |
fungi | heh, um... | 23:48 |
corvus1 | tonyb: thanks! different error: The plugin rackspace_apikey could not be found | 23:48 |
corvus1 | so i think fungi's secret venv has bitrotted since the mfa stuff and we need a new one! | 23:48 |
tonyb | ahhh that be new due to the MFA stuff | 23:48 |
fungi | we can probably fix that by installing the rackspace client plugin into that venv | 23:48 |
corvus1 | fungi: if you want -- i won't touch it since it's in your homedir :) | 23:49 |
fungi | i just installed rackspaceauth into that xyzzy venv | 23:50 |
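For the record, the fix was just adding the plugin fungi mentions to the existing client venv; the venv path below is illustrative:

```bash
# Install the rackspaceauth plugin into the venv that holds the working
# openstack client (path is illustrative, adjust to the actual venv).
~/xyzzy/bin/pip install rackspaceauth
```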
fungi | try again? | 23:50 |
corvus1 | fungi: success, thanks! | 23:51 |
fungi | but really, we should figure out why /usr/launcher-venv doesn't work for that | 23:51 |
corvus1 | other openstack volume commands work in the global env, only the create fails with the backup_id thing | 23:52 |
fungi | yeah, i had previously only tested things like volume list | 23:54 |
fungi | so didn't realize the main venv wasn't able to create new volumes | 23:55 |
corvus1 | okay all done, and change abandoned | 23:57 |
corvus1 | removed host from emergency | 23:58 |