Thursday, 2021-11-04

ianwWed Nov  3 23:45:55 UTC 2021 Pruning /opt/backups/borg-wiki-update-test/backup archive wiki-upgrade-test-filesystem00:00
ianwWed Nov  3 23:52:35 UTC 2021 Pruning /opt/backups/borg-wiki-update-test/backup archive wiki-upgrade-test-mysql00:00
ianwi.e., it ignored the .checkpoint archive, and found "wiki-upgrade-test-filesystem" and "wiki-upgrade-test-mysql" to prune00:00
ianwas expected00:00
fungiawesome, thanks00:04
*** frenzy_friday is now known as frenzyfriday|sick04:17
opendevreviewIan Wienand proposed opendev/system-config master: gerrit: update theme to javascript plugin
ianwclarkb/fungi: has ^ applied, although on gerrit 3.406:11
ianw#status log fedora 35 mirror finished syncing.  fedora 33 removed06:12
opendevstatusianw: finished logging06:12
opendevreviewIan Wienand proposed opendev/system-config master: gerrit: update theme to javascript plugin
*** gibi_pto_back_thu is now known as gibi07:56
*** lbragstad4 is now known as lbragstad10:33
*** lbragstad1 is now known as lbragstad10:43
*** lbragstad7 is now known as lbragstad11:07
*** dviroel|rover|out is now known as dviroel|rover11:12
*** dpawlik2 is now known as dpawlik11:47
*** dpawlik1 is now known as dpawlik12:16
opendevreviewAndre Aranha proposed zuul/zuul-jobs master: Add fips version of jobs needed for OpenStack
*** artom_ is now known as artom15:02
*** dviroel|rover is now known as dviroel|lunch|appt|afk15:03
clarkbianw as noted in the change on 3.4 looks good to me in firefox on my desktop and mobile chrome on the phone. If a dark theme user (fungi?) can maybe check that theme looks ccorect to them we should be able to land this and restart 3.3 on it?15:44
fungiyeah, i should be able to take a look after my current meetings wrap up15:45
clarkbfungi: you might have to login to that held gerrit using the test credentials in order to toggle the default. I'm not sure how to force a dark theme load otherwise though I suppose it might be a flag on the url?15:48
fungiprobably need to be logged in, yeah15:49
*** marios is now known as marios|out16:42
*** dviroel|lunch|appt|afk is now known as dviroel|rover17:20
*** jpena|off is now known as jpena17:49
*** jpena is now known as jpena|off18:10
fungiclarkb: ianw: i notice the last patchset for 816618 was uploaded ~1.5 hours after the mention of deployment to, i guess the dockerfile line changed in patchset 2 is immaterial to what's on that server?18:26
clarkbfungi: my hunch is that ianw may have manually updated the js file on the held node18:28
clarkbbut confirming that would be a good thing (since we know the html doesn't work on gerrit 3.4 ps1 wouldn't have worked out of the box and would have needed manual intervention)18:28
fungialso testing the dark theme was easy, since login there is unauthenticated (i picked the "zuul" acccount to become)18:28
fungithe zuul summary plugin still seems to work fine on it, btw18:30
fungiyeah, totally lgtm18:30
fungishould that change be wip until we upgrade, or will it also work on 3.3?18:33
clarkbit should also work on 3.3 and the screenshots on that patchset seem to show that18:34
clarkbI think we should go ahead and land it on 3.3 then we're set for the upgrade to 3.4. But we can confirm with ianw how the file got in place on the held node18:34
fungiahh, yeah for the system-config-run-review-3.3 build18:34
clarkbyup that one18:34
opendevreviewClark Boylan proposed opendev/bindep master: Try out PBR pep 517 support
clarkbI actually think bindep is probably a bad choice to start pep 517 stuff on simply because we want it to run on as many platforms as possible. But I think bindep being a self contained utility that is tested on a lot of platforms makes it a good candidate for testing it18:48
fungiwe likely want to make sure bindep can work with both and pyproject.toml18:50
clarkbgood point we can add the back in once the first round of testing without it is done. I'm mostly abusing the fact that bindep runs tests on all the things which is good forcoverage :)18:58
clarkband everything failed rip18:58
clarkbseems that bindep produced no output so maybe we aren't installing things properly. I'm really glad that I removed PBRs dog fooding of this stuff now :)18:59
clarkbya there is no bindep binary in the installation so something isn't working. Yay for babysteps. Not sure I'll be able to dig into this further today19:00
opendevreviewMerged opendev/system-config master: gerrit: update theme to javascript plugin
ianwyeah, sorry, the file on the held node was the result of me hand-editing it20:04
fungicool, thanks for confirming20:05
clarkbI guess we should plan a restart today to make sure 3.3 is happy with it in production20:10
ianwi can also do it on monday if we like, usually very quiet then20:11
clarkbI think its probably fine today. It has been a quiet week20:11
ianwi've just noticed navigating as a logged in user, it appears gerrit is nto talking to mariadb 20:11
ianw [Warning] Access denied for user 'gerrit'@'' (using password: YES)20:11
ianwwhich is weird20:12
clarkbin production?20:12
ianwno, sorry, on that test node20:12
clarkbI just checked a prod change and was able to mark a file reviewed then mark it unreviewed. Ah ok20:12
clarkbianw: I wonder if we need to update our java driver for mariadb for 3.4?20:12
ianwi guess it is talking, because it's getting login rejections20:13
ianwi'll have to page back in how it sets up20:13
ianwi feel like it's just a docker env with a user/pass20:13
clarkbyes I think the docker compose side configures the database and user/pass for mariadb then we set it in our gerrit etc/secure.conf20:13
fungii noticed the same error when i "authenticated" as the zuul account on that test instance20:13
clarkbthis is good, testing is finding problems. The best kidn of successful failures20:14
fungii should have called it out, but thought it was a nuance of nobody having logged into the account to create any entries for it20:15
ianwdocker-compose has those values filled out20:15
clarkbianw: and then on the gerrit side:
ianwthat all matches20:17
ianwi guess the container must not have setup the user ...20:17
ianwdb container20:17
clarkbianw: I think it does log that stuff when it starts up. You can also try to connect with those credentials using the mysql client in the container20:17
fungiis mariadb even running?20:17
ianwit is, i'm just looking at the logs now20:18
ianwInnoDB: Cannot open '/var/lib/mysql/ib_buffer_pool.incomplete' for writing: Permission denied20:19
ianwInnoDB: The error means mysqld does not have the access rights to the directory.20:19
clarkbI wonder if that has to do with mounting it under /home/gerrit2 so it isn't root owned?20:20
ianw /home/gerrit2/reviewdb/ also appears to be populated with db files, so ... weird20:21
opendevreviewMerged openstack/project-config master: kolla-cli: enter retirement
fungiin production, mysqld is running as the systemd-coredump user (999)20:24
fungiso the db files are owned by that account20:24
fungias is the /home/gerrit2/reviewdb directory20:24
clarkbfungi: same on the test node20:25
fungiinteresting that we don't tell it to run as the gerrit user20:25
clarkbI think we're just using the default in the image which apparently is 99920:26
clarkbit isn't clear to me why there would be a permissions issue in this case20:26
clarkbits a bit odd and probably worth making a plan to change, but that dir is owned by the user running mysqld and should be able to write to the dir20:27
ianwi can't connect as root either20:28
clarkbslightly different issue, but I wonder if this is a buggy container image?20:34
clarkbhrm the tag last updated 20 days ago and production was fine with a recent pull and down up20:35
clarkb is a build from a while back that seems to exhibit this issue too20:39
clarkbso not a new problem20:39
clarkbprod logs the same thing about perms denied so I don't think that is the issue20:40
clarkbianw: I think the segfault is the problem20:42
fungia web search for "ib_buffer_pool.incomplete" turns up people complaining about permission errors all over the place, btw20:44
ianwinteresting; there is not a segfault on the held node20:44
clarkbianw: oh huh there was in my example above20:44
clarkbin prod it complains about permissions on shutdown but not startup it seems20:44
clarkbmaybe it is the same issue but matters when you hit it20:44
ianw is what we're using20:45
ianwpossibly this things they've done to support MARIADB_USER and MYSQL_USER has got something wrong20:48
ianwi see
ianwbut then it's like it never ran the setup @
clarkbya I just ran a hacked up version of that docker compose entry on my local machine and it clearly logs creating the database and adding a user and giving that user perms on the db20:53
clarkbthat doesn't seem to happen in the logs on our test machines20:53
ianwDATABASE_ALREADY_EXISTS is set if the db dir has a "mysql" directory20:55
fungiahh, so that it won't clobber an existing db20:56
clarkbwe do create /home/gerrit2/reviewdb but not /home/gerrit2/reviewdb/mysql from what I see. I 20:58
clarkber I precreated this dir for me locally in testing and chowned it to root:root and it simply chowned it to uid 99920:58
clarkband started up successfully here on my local machine so that wasn't it20:58
ianwyep, that's what it does :
ianwOct 29 07:27:11 review99 docker-mariadb[10226]: 2021-10-29 07:27:11+00:00 [ERROR] [Entrypoint]: Unable to start server.21:03
ianwi think what happens here is that it tries to do this "temporary" server that fails to start, but does enough to create the db dirs21:03
ianwthen, it restarts itself, the entrypoint script now ignores any setup steps, and we have an unconfigured db21:04
clarkbya and then once the temporary db is up it uses that to create the users but we never get that far21:04
clarkbianw: oh ya that makes sense from what I'm seeing21:04
clarkbis it possibly some clash with ulimits or similar on that uid?21:05
ianwmaybe ... no other errors appear to be logged21:09
ianwi'm going to move the directory and see if i can replicate it with a restart21:10
opendevreviewClark Boylan proposed opendev/system-config master: DNM Set a uid/gid for gerrit mariadb to run as.
ianw2021-11-04 21:11:55+00:00 [Note] [Entrypoint]: Creating database accountPatchReviewDb21:12
ianw2021-11-04 21:11:55+00:00 [Note] [Entrypoint]: Creating user gerrit21:12
clarkbThat is another idea ^21:12
ianwof course, it all "just works" ...21:12
clarkbianw: I wonder if we are not changing permissions early enough21:12
clarkbSo when it goes to write the first time it is still owned by root maybe?21:13
ianwgerrit : Setup reviewdb directory for mariadb21:14
ianw-    "owner": 999,21:14
ianw+    "owner": 0,21:14
ianwthis implies the directory was already there, and owned by 999?21:15
clarkband then our ansible that says make it owned by root flipped it back?21:15
ianwthe steps before that are using docker-compose to run gerrit init21:15
fungithat'd explain the permissions error21:15
ianwi bet we're starting mariadb before we think we are21:16
clarkbianw: we do the docker compose up then set the perms to root21:16
clarkbits a race between mariadb starting and the ansible setting it back again21:17
clarkbI think we can simply drop the task to set perms for that21:17
clarkb(then longer term plan to maybe make it owned by gerrit as in my change? and do similar for the other services?)21:17
clarkbianw: in the etherpad case we ensure the dir exists for mysql state but down chown it with ansible21:18
clarkbso maybe we keep the task but remove the ownership details21:18
clarkbianw: you want to write that change or should I?21:19
opendevreviewIan Wienand proposed opendev/system-config master: gerrit: setup mariadb directories before starting gerrit
ianwi think we just move it before the start?21:19
clarkbianw: well we'll continue to flip it then21:19
clarkbbecause mysql will switch it to 999 when it starts then the next ansible run will switch it back to root21:20
ianwoh, right, of course21:20
clarkbI think it is ok to ensure the dir exists but maybe drop the other stuff (etherpad works this way)21:20
fungiand maybe insert a comment so someone doesn't come along later and "fix" it by adding explicit (incorrect) permissions21:21
fungier, s/permissions/ownership/21:21
clarkblooks like gitea and refstack don't do any ansible management. etherpad only ensures it is a dir and exists. refstack is like gerrit and needs fixing21:22
clarkbLonger term we should try and fix these to run mariadb with a non conflicting user21:23
fungiand/or in a more distinct directory21:23
fungihaving mysqld mapping a subdirectory of the homedir for a different user is a bit confusing21:24
clarkbthe process for that is probably annoying: put server(s) in emergency file, do a db backup just incase, stop mariadb, chown files to the correct owner, start services, land change to manage that user and set the user in the docker-compose, then remove from emergency21:24
clarkbfungi: it is that way because of the volume21:24
fungibut i'm holding out hope gerrit will eventually drop that feature or the db21:25
clarkbI'd personally prefer to have one volume over multiple even though that is confusing. The path doesn't have anything to do with the issue here though it would affect any other path21:25
fungiyeah, long term mounting /srv on a cinder volume and then having /srv/gerrit and /srv/mariadb or similar would make sense if we want them using separate accounts21:25
clarkbfungi: I'm not suggesting separate accounts21:26
fungiwell, it's currently separate accounts21:26
clarkbI'm suggesting the opposite21:26
clarkbyes I know. I'm saying we should make a plan to stop doing that21:26
clarkband have it run under 3000:300021:26
clarkb(and then do similar with etherpad, gitea, refstack, and lodgeit21:27
opendevreviewIan Wienand proposed opendev/system-config master: gerrit: don't chown mariadb container directory
ianwthis probably (hopefully) explains the segfault too -- something not handling bits of the db being re-owned21:27
fungiyes, i'm saying if we're going to tell mariadb to write in the gerrit user's homedir then we should make mariadb run as the same user as gerrit, otherwise if we think it's safer to use separate accounts then they should use distinct directory subtrees of a root-owned path21:27
clarkbI don't think there is any real reason to have separate accounts. I think this was just copied over from say gitea/etherpad/etc and we didn't realize what was happening under the hood21:28
clarkbthe data in the db for gerrit particularly isn't privileged and the users for each of these services already have access to the database as well and the database is per service and not shared so no concern about leaking21:28
ianwthe reason for putting it in gerrits homedir is just that we mount that to a volume directly in production, so i wanted to keep all the data on the same volume21:28
ianwoh, i see fungi you said that -- we should mount the homedirs separately on one volume21:29
fungii agree, we'd ideally like to be able to stop having that db entirely, it contains nothing sensitive and we don't even care that much if it's persistent, and it's only exposed over the loopback interface/unix sockets, so there's no real security benefit to separating it21:29
ianwthat could definitely be done, it would be much easier to deal with next time we move servers though :)21:30
clarkblooking at etherpad they set the uid to 5001 apparently21:30
clarkbso we've got a nodejs as uid 5001 and a mariadb as 999 there21:30
clarkbianw: I think refstack has the same issue as gerrit too btw21:30
fungipart of the concern is that ubuntu already assigns user 99921:31
fungiso technically it's a security risk the way it is21:31
clarkbyes I think that is the major concern21:31
clarkband why I suggested we make a plan to fix it for all services21:31
clarkbbut again it will be involved and each service will be slightly different21:32
ianwok, sorry i really have to do school run, bib21:32
clarkbianw: yup21:32
clarkbI think what we should do is fix the flapping that the ansible does today as that is a risk for service uptime21:32
clarkbfor gerrit and refstack. Then make a general plan that applies to each of the services then go through them one by one and fix them21:33
fungiyes, we could race ansible during a service restart21:33
fungior interrupt it with a poorly-timed reboot21:33
fungie.g., if the vm hung at just the wrong time21:34
opendevreviewClark Boylan proposed opendev/system-config master: Don't set lodgeit db dir perms
clarkbit is lodgeit not refstack with the same problem ^ I went ahead and pushed a fix for that too21:38
fungithanks, taking a look21:38
ianwmariadb just adds a user
ianwit seems like it's a bit arbitrary what that uid will be.  from an external pov it could change depending on what versions of things are in the container21:56
*** dviroel|rover is now known as dviroel|out22:13
ianwclarkb: it seems to me we have the same thing in refstack?22:32
ianwi guess we should also extend gerrit testing to leave a review on a file.  we currently set votes, but i guess nothing actually tries to use the reviewdb22:35
clarkbianw: I thought we did but then I looked again and didn't see it. If I just missed the second time around I say go for it (to fix it)22:36
opendevreviewIan Wienand proposed opendev/system-config master: refstack: don't chown db directory
opendevreviewClark Boylan proposed opendev/system-config master: Run zookeeper-statsd as the zookeeper user
opendevreviewClark Boylan proposed opendev/system-config master: Update zookeeper-statsd to python3.9 on bullseye
clarkbfungi: ^ if you have time for those that would be great22:43
clarkb(including the lodgeit and refstack changes)22:43
opendevreviewClark Boylan proposed opendev/system-config master: Run haproxy-statsd as uid 10001
opendevreviewClark Boylan proposed opendev/system-config master: Update haproxy-statsd to bullseye and python3.9
clarkbThats a pair of chagnes to match what I did in zookeeper-statsd22:52
clarkbbut let me check something on those22:52
clarkbah yup I need to do a thing22:53
opendevreviewClark Boylan proposed opendev/system-config master: Update haproxy-statsd to bullseye and python3.9
opendevreviewClark Boylan proposed opendev/system-config master: Run haproxy-statsd as uid 1000
clarkbit should match the haproxy uid like the zookeeper one matches zookeeper22:55
clarkbfungi: also
fungiclarkb: ianw and i both commented on that one23:03
clarkbah yup I can add the comment23:04
opendevreviewIan Wienand proposed opendev/system-config master: gerrit: mark file reviewed during testing
opendevreviewClark Boylan proposed opendev/system-config master: Don't set lodgeit db dir perms
clarkbdone and thanks23:07
ianwthe zookeeper-statsd should be ok to merge, right?  it's not going to down the whole cluster at once or something?23:28
clarkbianw: it shouldn't because it runs as a separate process23:30
clarkbI think worse case we stop getting the stats from it23:30
clarkbbut zk itself will keep moving along23:30
ianwclarkb: don't you need to set the user *before* the CMD in ?23:33
clarkbI wasn't sure about that. gitea does it tht way too23:33
clarkbI think its ok because CMD is setting metadata and not running things23:34
clarkbyou do want to set it before any RUNs that need to run as that user23:34
ianwthe manual does say "that follow it"23:34
ianwcan we tell if it worked from ci?23:35
clarkbthats a good question. I'm not sure how much testing that gets23:36
clarkbianw: there is a check that checks we restartcount 0 on that container in testinfra23:37
clarkbhowever because I'm belts an suspendering with the user directive in docker-compose.yaml I think it may work even if the compose file is off23:37
ianwis blank, not sure if that is it's usual state however23:38
clarkbI agree it should come before23:38
clarkblet me finish the change I'm working on now then I'll fix the two user stacks for statsd23:39
opendevreviewClark Boylan proposed opendev/system-config master: Run gerritbot with a user that will be shared with matrix-gerritbot
opendevreviewClark Boylan proposed opendev/system-config master: Run matrix-gerritbot with gerritbot user
clarkbtristanC: corvus ^ you may be interestedin that stack23:47
opendevreviewClark Boylan proposed opendev/system-config master: Run haproxy-statsd as uid 1000
opendevreviewClark Boylan proposed opendev/system-config master: Run zookeeper-statsd as the zookeeper user
opendevreviewClark Boylan proposed opendev/system-config master: Update zookeeper-statsd to python3.9 on bullseye
clarkbianw: ^ should be fixed in both of those changes now. Thanks23:49
corvusClark: qq on 81676923:49
corvusmaybe the answer is matrix-gerritbot writes files, and you just wanted consistency between the two?23:50
corvusbut, actually, does it write anything?23:50
clarkbcorvus: I'm more worried about being able to read things while not letting other bots cross contaminate23:51
corvusistr tristanC felt strongly about not writing any state in matrix-gerritbot, in which case, maybe my question is relevant23:51
clarkbthe idea is that we'd do similar to the other bots then they would respect each others boundaries23:51
corvusoh, there's private keys involved23:52
corvusso root:root,0444 isn't sufficient, we need some 0400 files23:53
clarkbya exactly23:53
corvusok all caught up, sgtm, thanks!23:53
clarkbthank you for the review!23:53
opendevreviewMerged openstack/project-config master: Add support for CentOS Stream 9 in nodepool elements
opendevreviewMerged openstack/project-config master: Add centos-9-stream nodepool image

Generated by 2.17.2 by Marius Gedminas - find it at!