ianw | fungi: i'm probably just sensitive because i've been using a self-compiled git lately :) | 00:08 |
---|---|---|
fungi | hah | 00:14 |
fungi | that would certainly make me sensitive | 00:14 |
fungi | i agree about just matching on the first two or three version components though | 00:14 |
Clark[m] | Ya I can update the change tomorrow to take the first 3 tuple and emit a warning if we don't determine a version | 01:02 |
*** rlandy|ruck is now known as rlandy|out | 01:52 | |
opendevreview | Merged opendev/system-config master: infra-prod: remove master override steps https://review.opendev.org/c/opendev/system-config/+/818191 | 02:43 |
*** pojadhav|afk is now known as pojadhav | 04:20 | |
*** ysandeep|out is now known as ysandeep | 05:14 | |
*** akahat|rover is now known as akahat|lunch | 08:36 | |
*** jpena|off is now known as jpena | 08:37 | |
*** akahat|lunch is now known as akahat|rover | 09:59 | |
*** ysandeep is now known as ysandeep|afk | 11:04 | |
*** rlandy|out is now known as rlandy|ruck | 11:11 | |
*** ysandeep|afk is now known as ysandeep | 12:22 | |
dtantsur | fungi: to follow up on yesterday's cross-project tests discussion: I've come up with https://review.opendev.org/c/openstack/ironic/+/818553/4/zuul.d/ironic-jobs.yaml and it seems to work as I wanted, even when run on the library. | 14:08 |
fungi | dtantsur: awesome! | 14:12 |
fungi | seems fairlt straightforward too | 14:12 |
fungi | er, fairly | 14:12 |
dtantsur | yeah, zuul for the win :) | 14:15 |
fungi | dtantsur: fwiw, i see other projects omit ansible_user_dir for that: https://opendev.org/openstack/horizon/src/branch/master/.zuul.d/cross-jobs.yaml#L16 | 14:17 |
dtantsur | I see. Yeah, I copy-pasted this bit from somewhere in zuul-jobs | 14:17 |
mgariepy | Clark[m], you can release the vm : openstack-ansible-deploy-infra_lxc-ubuntu-focal for https://review.opendev.org/817384 root@198.72.124.136 | 14:27 |
fungi | mgariepy: Clark[m]: i've deleted that autohold. thanks! | 14:59 |
mgariepy | thanks a lot fungi | 15:00 |
fungi | my pleasure | 15:00 |
*** rlandy|ruck is now known as rlandy|ruck|biab | 15:08 | |
*** ysandeep is now known as ysandeep|out | 15:13 | |
clarkb | fungi: I think we should be good to try https://review.opendev.org/c/opendev/system-config/+/816770 next for gerritbot user updates | 16:19 |
fungi | oh, yep, approved. looks virtually identical, i forgot that was separate | 16:20 |
*** rlandy|ruck|biab is now known as rlandy|ruck | 16:22 | |
opendevreview | Clark Boylan proposed opendev/git-review master: Fix use of removed --preserve-merges option https://review.opendev.org/c/opendev/git-review/+/818219 | 16:25 |
clarkb | fungi: ianw ^ I think that addresses the latest comments on that change | 16:26 |
opendevreview | Clark Boylan proposed opendev/lodgeit master: Update docker image to bullseye and python 3.8 https://review.opendev.org/c/opendev/lodgeit/+/818597 | 16:31 |
clarkb | fungi: ^ re bullseye we basically need a bunch of changes like that. For the most part the updates haven't been too bad. Only zuul executors ran into socat behavior changes and nodepool builders had problems with container stuff? Python apps that don't interact with the system much should be easy (like lodgeit) | 16:39 |
fungi | yeah, on the whole i expect there would be no real functional difference | 16:39 |
fungi | occasionally we'll hit things around changes in command-line options | 16:40 |
opendevreview | Merged openstack/project-config master: Retire puppet-senlin - Step 3: Remove Project https://review.opendev.org/c/openstack/project-config/+/817327 | 16:40 |
outbrito | G'day folks! Do you happen to know why zuul is not merging and I'm seeing the "submit" button on this change? https://review.opendev.org/c/starlingx/kernel/+/817140/ | 16:40 |
outbrito | I see it disabled though | 16:40 |
fungi | clarkb: why did we not need to explicitly declare the older python-builder image though? | 16:41 |
clarkb | fungi: I think that was a bug | 16:41 |
clarkb | outbrito: you should never get a working submit button in gerrit. I think it may be showing you the button because it is submittable but you don't have permissions to do so (only zuul should have permissions for that) | 16:42 |
clarkb | which means we need to figure out why zuul isn't doing that or isn't able to | 16:42 |
fungi | outbrito: it looks like that change is based on an outdated parent | 16:43 |
fungi | its git parent is 816259,2 but someone revised 816259 without rebasing 817140 so now it can't merge | 16:44 |
fungi | if you rebase 817140 onto the master branch at this point it should work | 16:44 |
fungi | 816259,3 is what ended up merging | 16:44 |
clarkb | I guess gerrit 3.3 stopped showing you an orange warning for that | 16:44 |
fungi | i think that's what it's trying to signal by putting the (Merged) next to the parent in red | 16:45 |
fungi | normally it would be grey/black | 16:45 |
clarkb | ah yup https://review.opendev.org/c/opendev/system-config/+/816770 shows the dark grey color for its merged parent | 16:46 |
fungi | not a good ui choice for accessibility | 16:46 |
fungi | even something common like red/green color-blindness would make that virtually impossible to notice | 16:46 |
mgariepy | clarkb, fungi would it be possible to have an auto-hold on vms that timeout for one role we have ? | 16:46 |
clarkb | mgariepy: we can only filter by project change or job. Not role if that is what you are asking | 16:47 |
*** marios is now known as marios|out | 16:47 | |
mgariepy | so you cloud filter on timeout on a specific patch ? | 16:48 |
fungi | yeah | 16:48 |
mgariepy | could** | 16:48 |
fungi | if "role" here means a particular git repository, that's doable | 16:48 |
clarkb | but it has to be the project that triggered the job not the timeout if that makes sense | 16:49 |
johnsom | Hmm, zuul status is giving me "Something went wrong", is there a restart going on? | 16:49 |
opendevreview | Merged opendev/system-config master: Run matrix-gerritbot with gerritbot user https://review.opendev.org/c/opendev/system-config/+/816770 | 16:49 |
clarkb | I too get the something went wrong | 16:49 |
clarkb | and now it is back | 16:49 |
mgariepy | ok i'll think about it and see what we should do. | 16:50 |
clarkb | johnsom: its probably a bug in zuul-web dealing with updating configs/layouts | 16:50 |
clarkb | johnsom: the service itself seems to be fine though. | 16:50 |
johnsom | Or a mis-configured health monitor on the LB pool? | 16:50 |
clarkb | johnsom: it isn't an LB pool | 16:50 |
johnsom | Well, there is your problem. GRIN | 16:50 |
clarkb | everything is active active active active and needs to deal with locks and such properly | 16:51 |
clarkb | and currently your webbrowser talks to a single web frontend | 16:51 |
clarkb | Basically I think it is a bug but only in rendering the info to the end user. The actual zuul processing in the background seems to be happy. And if you wait 30 seconds it resolves itself | 16:52 |
johnsom | Yeah, it looks like the job I was looking for started even though I couldn't see it | 16:52 |
fungi | the zuul-web logs have a bunch of deserialization exceptions | 16:53 |
fungi | json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) | 16:54 |
clarkb | ya so it probably read data before it was properly written. I'm guess we need to add a lock to something to avoid that | 16:54 |
fungi | also kazoo.exceptions.NoNodeError | 16:54 |
clarkb | *I guess | 16:54 |
fungi | or just avoid invalidating the cache until it gets a good read | 16:55 |
fungi | i also see some KeyError: 'change_queues' | 16:56 |
fungi | all of these are potentially the same underlying cause though | 16:56 |
fungi | the KeyError: 'change_queues' usually follows a kazoo.exceptions.NoNodeError though sometimes i see kazoo.exceptions.NoNodeError without the subsequent KeyError | 16:57 |
clarkb | I think change_queues is a db record that is kept separate for performance reasons. It wouldn't surprise me if we aren't handling its specialness properly in zuul web | 16:58 |
outbrito | fungi, will try, tks | 17:08 |
opendevreview | Clark Boylan proposed opendev/system-config master: Switch lodgeit to run under a dedicated user https://review.opendev.org/c/opendev/system-config/+/818606 | 17:12 |
clarkb | fungi: ^ that just reported in matrix testing channel. Lets confirm the matrix gerritbot restarted | 17:12 |
clarkb | it hasn't updated its docker compose yet or restarted | 17:14 |
clarkb | I guess it did gerritbot first and I need to be patient :) | 17:14 |
clarkb | oh I see, the deploy job that was running was for a project-config update and since we don't update to master in deploy we weren't running with latest there | 17:14 |
clarkb | we restarted gerritbot for a new project add? But the currently running job is for the matrix-gerritbot update so we should see that update shortly | 17:15 |
fungi | oh, yeah that explains it | 17:15 |
clarkb | ok matrix update failed because we actually rewrite the config with an ansible task and that couldn't overwrite the existing file because it is 644 by root | 17:21 |
clarkb | I'm going to manually chown the file then the next hourly run should sort us out | 17:21 |
clarkb | thats done. I need to find breakfast but will check in on this again later (and the hourly runs should update it I think) | 17:23 |
fungi | oh, and the ansible task runs after? | 17:26 |
clarkb | ya this is a pre step to make a config for the bot | 17:29 |
clarkb | since the bot takes a dhall config but we want to maintain yaml configs for humans | 17:29 |
*** jpena is now known as jpena|off | 17:41 | |
opendevreview | Ghanshyam proposed opendev/irc-meetings master: Remove Technical committee office hours https://review.opendev.org/c/opendev/irc-meetings/+/818613 | 17:50 |
clarkb | Warning: Could not get or create the default cache directory: <- matrix-gerritbot is unhappy | 17:51 |
clarkb | unfortunately that string doesn't seem to show up in the matrix-gerritbot source so I'm not sure what the default cache directory is | 17:55 |
clarkb | tristanC: ^ do you know what the cache directory is? | 17:57 |
clarkb | my hunch is that HOME=/root here https://github.com/softwarefactory-project/gerritbot-matrix/blob/master/flake.nix#L62 is the problem | 18:03 |
clarkb | and its trying to write to $HOME/.config or some such | 18:03 |
clarkb | I'll push up a partial revert for now | 18:03 |
fungi | could we override $HOME when starting the container? | 18:04 |
clarkb | fungi: yes, we can, but I don't know to what value. I think we should consider only consuming docker images that are built with standard tools. The nix stuff is hard to process | 18:05 |
clarkb | (and build our own matrix-gerritbot image if that is necessary) | 18:05 |
fungi | ahh, yeah | 18:05 |
clarkb | fungi: we could try /tmp maybe | 18:05 |
fungi | using docker to build an image ourselves is probably the most straightforward | 18:05 |
clarkb | fungi: do you want me to try overriding to /tmp by hand before we try the partial revert? | 18:06 |
fungi | i suppose it can't hurt | 18:06 |
clarkb | same error | 18:07 |
fungi | huh, in further exercising the new pep-517 pbr version, i see newer interpreters are complaining that pbr doesn't explicitly close manifests it reads | 18:07 |
clarkb | I've manually done the partial revert (just commented out the user directive in docker-compose.yaml) and it seems happier. I'll push the revert up which should confirm it is happier too | 18:09 |
fungi | thanks | 18:09 |
opendevreview | Clark Boylan proposed opendev/system-config master: Partial revert of matrix-gerritbot user change https://review.opendev.org/c/opendev/system-config/+/818618 | 18:11 |
clarkb | that showed up in the testing channel in matrix I think the revert is sufficient | 18:12 |
clarkb | er I mean the partial revert | 18:12 |
clarkb | fungi: I'll happily review/write PBR updates to use with open(foo) as bar: context managers or similar to fix that | 18:13 |
clarkb | just point me at the location | 18:13 |
fungi | yeah, once i have a handle on where it's complaining about, i'll push some up | 18:13 |
clarkb | tristanC: to summarize matrix-gerritbot can't get or create the default cache location when we override the user. It doesn't tell us what the location is. If you can help us understand that better we'd appreciate it | 18:13 |
fungi | i think it's just find_sources and get_version from pbr.packaging where i'm running into it, but i'll try to make sure whatever i put together is comprehensive since i'm sure i'm not exercising every last code path in pbr here | 18:15 |
clarkb | ya I guess we can grep for open() and then update all occurrences easily enough | 18:15 |
fungi | confirmed adding {toxinidir} to deps on all testenvs where i was previously relying on usedevelop is still working too | 18:22 |
fungi | i always worry that just . won't work as expected if tox is called from somewhere which isn't the root of the repo | 18:23 |
clarkb | oh good point | 18:24 |
clarkb | I'll update my example bindep change to do that | 18:24 |
fungi | i added a comment on the bindep change just so we don't forget | 18:24 |
opendevreview | Clark Boylan proposed opendev/bindep master: Try out PBR pep 517 support https://review.opendev.org/c/opendev/bindep/+/816741 | 18:26 |
clarkb | that should do it | 18:26 |
fungi | well, good news, the vast majority of open() calls in pbr are already using context managers, and most of the remainder are in tests. i only see one other obvious case besides the two i was hitting in packaging.py | 18:26 |
fungi | oh, in fact i misread, the other one i thought i saw was a subprocess.Popen() it just matched my naive grep for open( | 18:29 |
fungi | one of the two i hit is easy enough to fix, the other will be tricky since it's an open() inside a try/except | 18:32 |
clarkb | fungi: you should be able to use with there or a finally? | 18:32 |
clarkb | (then guard against already closed fd) | 18:32 |
fungi | yeah, i can also explicitly close it in another try i guess | 18:32 |
fungi | maybe i can just combine these two try/except blocks? https://opendev.org/openstack/pbr/src/branch/master/pbr/packaging.py#L825-L832 | 18:35 |
fungi | then i can do a with inside the try | 18:35 |
fungi | and catch (IOError, OSError, email.errors.MessageError) | 18:35 |
clarkb | fungi: I would try: with open(filename, 'r') as pkg_metadata: and then catch whatever needs catching from that | 18:37 |
fungi | something like https://review.opendev.org/818622 | 18:37 |
clarkb | its always a continue | 18:37 |
fungi | yeah, i suppose i could nest the try blocks as an alternative | 18:38 |
clarkb | no you did what I was thinking about | 18:38 |
fungi | yeah, this seems more concise | 18:38 |
fungi | should i fix up the tests to not leave open descriptors too, or will anyone likely care? | 18:39 |
clarkb | I've figured out the rough area of matrix-gerritbot that is having problems. It is after we connect to matrix and validate our session | 18:39 |
clarkb | I think it is the gerrit connection that is breaking because we also log after gerrit connects that matrix is ready | 18:40 |
clarkb | however haskell does delayed execution so this might be flawed analysis | 18:42 |
clarkb | aha I was reading it wrong. I think it is in the joinroom area of the code because we validate session then join room and we don't get room join logs | 18:49 |
fungi | clarkb: aargh, moving target! https://setuptools.pypa.io/en/latest/history.html#v59-1-0 | 19:00 |
fungi | "Back out deprecation of setup_requires and replace instead by a deprecation of setuptools.installer and fetch_build_egg. Now setup_requires is still supported when installed as part of a PEP 517 build, but is deprecated when an unsatisfied requirement is encountered." | 19:00 |
clarkb | I guess that means you still have to specify it in pyproject.toml so that things line up | 19:01 |
fungi | yeah | 19:01 |
clarkb | good to know, I think we support that just fine | 19:01 |
clarkb | I'm having a really hard time finding anything that would need a cache directory in matrix gerrit bot so far. | 19:01 |
clarkb | It uses in memory "databases" to store things like room info. It forks `ssh` directly | 19:02 |
clarkb | It might be the matrix library? | 19:02 |
fungi | that certainly seems possible | 19:04 |
clarkb | I appreciate that python tends to do a better job of identifying the origins of log messages | 19:05 |
opendevreview | Merged opendev/system-config master: Partial revert of matrix-gerritbot user change https://review.opendev.org/c/opendev/system-config/+/818618 | 19:11 |
clarkb | tristanC: maybe add -prof to the cabal compile options then we can run the executable with -xc for problems like this? I'm not sure what the -prof impact at runtime is but I imagine its small if you have to explicitly set -xc on the executable to get that info back? | 19:19 |
clarkb | I feel like I'm reaching the end of my ability to debug this as I don't intend on pulling in nix to build this image. | 19:19 |
clarkb | it just occurred to me that I could lsof the running process to see what cache it might be opening | 19:25 |
clarkb | I'll do that | 19:25 |
fungi | oh, yep now that it's running | 19:26 |
fungi | assuming it holds an open descriptor to its cache anyway | 19:26 |
fungi | (it may not) | 19:26 |
fungi | but worth a shot | 19:27 |
clarkb | ya I'm not seeing anything that could be the cache | 19:27 |
fungi | so it probably only opens things there on demand | 19:27 |
clarkb | ya its got fd 0 on /dev/null 1 and 2 on pipes some event loop kernel fds and then tcp sockets | 19:29 |
clarkb | nothing looks like an on disk cache | 19:29 |
clarkb | I could probably strace it and hope to filter out the noise somehow to find reads/writes to a cache | 19:30 |
clarkb | but that seems iffy | 19:30 |
clarkb | This has led me to suspecting it might be the prometheus health endpoint that is doing caching as that runs a webserver | 19:35 |
tristanC | clarkb: catching up, let me see if i can reproduce locally | 19:46 |
clarkb | tristanC: I put the exact output in the comments of https://review.opendev.org/c/opendev/system-config/+/818618/1/playbooks/roles/matrix-gerritbot/templates/docker-compose.yaml.j2 | 19:47 |
clarkb | my best guess at this point is that it is related to the web server for prometheus metrics. Otherwise I'm not really finding anything that might be trying to cache stuff. But I'm also not finding anything indicating the web server there is caching (based on lsof and my hitting it manually) | 19:48 |
tristanC | clarkb: i see thanks. So the error couldn't be printed because of a missing utf-8 locale, and it would have showed `$HOME/.cache/dhall` | 19:48 |
clarkb | tristanC: ok we overrode $HOME to be /tmp and that didn't help | 19:49 |
clarkb | I would expect that /tmp would be writable by all users on the image but I'm probably making a bad assumption because nix | 19:49 |
clarkb | tristanC: but also that string isn't utf8 it is ascii? shouldn't putchar be fine with ascii? | 19:50 |
tristanC | clarkb: the error message is using utf-8 character | 19:50 |
clarkb | I guess that comes after what I got since it failed | 19:52 |
clarkb | and is in addition to `$HOME/.cache/dhall` | 19:52 |
tristanC | clarkb: ftr it is https://github.com/dhall-lang/dhall-haskell/blob/30f96178fce9d0bcafc74812df73e46fb66febd3/dhall/src/Dhall/Import.hs#L938 | 19:54 |
clarkb | it is interesting that google hasn't indexed that string | 19:55 |
clarkb | (I tried googling it several different ways before giving up, probably too much source code out there to index entirely) | 19:55 |
clarkb | tristanC: any idea why setting $HOME to /tmp in the docker-compose.yaml didn't correct this? | 19:57 |
clarkb | (we assumed something might be trying to write to $HOME/.cache which is why we tried that) | 19:57 |
tristanC | clarkb: there is no /tmp in the image | 19:58 |
clarkb | of course not | 20:00 |
clarkb | tristanC: would it be crazy to suggest that using slightly bulkier images that are possible to debug and build locally using normal tools is a good idea? | 20:00 |
clarkb | I appreciate the nix image is super minimal but that makes it very difficult to debug and it uses very specialized tools to do something that doesn't necessarily benefit from that | 20:01 |
clarkb | the image also sets a bash prompt but bash isn't even installed | 20:01 |
clarkb | we should be able to `cabal build` on something like debian right? | 20:02 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire training-labs: remove project infra https://review.opendev.org/c/openstack/project-config/+/817507 | 20:03 |
clarkb | I think OpenDev should probably consider doing that at least. | 20:03 |
clarkb | Then we won't have to worry about /tmp or bash or utf8 | 20:04 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire training-labs: remove project infra https://review.opendev.org/c/openstack/project-config/+/817507 | 20:07 |
clarkb | I guess the solution here would be to mount something to /root/.cache with appropriate permissions? | 20:07 |
clarkb | assuming no changes to the image | 20:07 |
clarkb | Hrm but /root is likely to not be o+x | 20:07 |
clarkb | that probably won't work either | 20:08 |
opendevreview | Clark Boylan proposed opendev/system-config master: Give matrix-gerritbot a writeable cache https://review.opendev.org/c/opendev/system-config/+/818627 | 20:16 |
clarkb | that seems really hacky but might work? | 20:16 |
opendevreview | Ghanshyam proposed openstack/project-config master: Remove 'publish-training-labs-scripts' definition https://review.opendev.org/c/openstack/project-config/+/818628 | 20:17 |
tristanC | clarkb: i can reproduce the error and i'll provide a fix in the image. Are the host files sharing the same uid as the container ones? | 20:18 |
clarkb | tristanC: they are. I'm not sure what we need you to hardcode the uid in the image, we just need to be able to have enough of a normal filesystem that we can bind mount appropriately | 20:19 |
opendevreview | Ghanshyam proposed openstack/project-config master: Remove 'publish-training-labs-scripts' definition https://review.opendev.org/c/openstack/project-config/+/818628 | 20:19 |
clarkb | The underlying problem here seems to be we've overly minimized (lack of utf8 locale and lack of expected filesystem locations) | 20:19 |
clarkb | and that is made worse by having a bunch of non standard tools. I am really surprised that dhall needs to write to disk | 20:20 |
clarkb | er * Not sure that we need you to hardcode the uid in the image. s/what/that/ | 20:20 |
tristanC | clarkb: i meant with podman, when using the `--user $(id) --volume $HOME/.ssh:/root/.ssh` then the .ssh directory in the container is still owned by root | 20:22 |
clarkb | tristanC: yes because it is under /root | 20:22 |
clarkb | that was my point above we can't mount to /root/.cache. But we could mount to /tmp | 20:23 |
clarkb | or just use /tmp and not cache since it is all ephemeral anyway | 20:23 |
tristanC | clarkb: i meant without `--userns keep-id`, which i assume is what docker set by default | 20:23 |
tristanC | clarkb: you can mount in /root, the folder actually doesn't exist | 20:23 |
clarkb | tristanC: is dhall creating it in that case? | 20:23 |
clarkb | well no it can't be because then it wouldn't error | 20:24 |
clarkb | we error because that directory is not readable | 20:24 |
tristanC | i think it happens when the runtime is creating the parent directory | 20:24 |
clarkb | but the runtime would be the non root user and create it as itself if that were the case | 20:24 |
clarkb | but we have strong evidence that directory is not readable | 20:24 |
tristanC | iirc, when bind mounting to /a/b, if /a doesn't exist it gets created as root by default | 20:25 |
clarkb | ah | 20:25 |
clarkb | the docker runtime not the haskell runtime | 20:25 |
clarkb | tristanC: thinking out loud here: is there any way to tell dhall to not cache to disk? | 20:28 |
clarkb | since this is a container caching to disk is as ephemeral as the process so caching to memory seems fine | 20:28 |
fungi | so with latest pbr i still see setuptools complain about calls to setup.py install, originating from pbr.util.setup_cfg_to_setup_kwargs here: https://opendev.org/openstack/pbr/src/branch/master/pbr/util.py#L407 | 21:04 |
fungi | this is the full traceback: https://paste.opendev.org/show/811204 | 21:08 |
fungi | clarkb: does that make any sense to you? | 21:08 |
Clark[m] | We might need to avoid instantiating the class to get around the warning? That seems odd though. Does bindep do that? | 21:15 |
Clark[m] | Might be config specific if not? I'm popping out for a bike ride now but can look closer after | 21:15 |
fungi | i'll try to reproduce with bindep in a bit | 21:17 |
fungi | but yeah it seems to happen when pip calls on setuptools to parse setup.cfg | 21:18 |
tristanC | clarkb: in that case, dhall is just issuing a warning that it can't access the cache folder, and the unicode character makes the print fail. But I think the main issue is that the HOME directory is not writable when setting an arbitrary user. | 21:29 |
tristanC | moreover, when using openssh, the .ssh location is resolved through /etc/passwd with a default to `/.ssh` | 21:32 |
tristanC | so i think i know how to slightly adjust the image so that it can work with arbitrary uid | 21:33 |
opendevreview | Tristan Cacqueray proposed opendev/system-config master: Update the gerritbot-matrix image to support arbitrary uid https://review.opendev.org/c/opendev/system-config/+/818645 | 21:46 |
tristanC | clarkb: i'm sorry this caused so much trouble and i hope 818645 should enable what you are trying to do. | 21:48 |
tristanC | clarkb: and of course you can use `cabal build` to build the gerritbot-matrix binary, but i think the dockerfile will need a similar trick to support arbitrary uid | 21:58 |
clarkb | tristanC: thanks. It's mostly that I question the utility of some of these decisions as they seem at odds with one another. The minimal image build doesn't seem to get along with dhall (and I guess openssh?) | 22:56 |
clarkb | and if not doing a minimal image build makes sense for the software then I question why use nix to build the image | 22:56 |
clarkb | and I don't think anything prevents us from running cabal in a Dockerfile? | 22:58 |
clarkb | fungi: ok so the issue is that initialize_options is where setuptools raises the deprecation warning because I guess that implies you're calling it on the command line? That surprises me a little, but I think we work backward from that to figure out how to bypass it with pbr | 22:59 |
clarkb | fungi: I think this is only an issue if using https://docs.python.org/3/distutils/extending.html#integrating-new-commands | 23:02 |
clarkb | fungi: I don't think bindep has this problem because it doesn't extend setup.py this way | 23:02 |
clarkb | fungi: do you have a link to the repo you're hitting this with? | 23:02 |
clarkb | but basically cmdclass is deprecated aiui because you have to run setup.py to hit it rather than say build | 23:02 |
clarkb | I think that means this is expected | 23:03 |
tristanC | clarkb: i would say the benefit of nix container is twofold: it declares all the dependencies in a reproducible setting (think base image + bindep + requirements.txt), and shareable layers (each dependency is a single layer) | 23:33 |
clarkb | tristanC: I think the second thing only really matters if you're doing a lot of nix containers right? For example in opendev's case this is our only nix container image so we get all the layers and no deduping for additional images | 23:34 |
clarkb | But you get the deduping using a consistent base image like opendev does anyway | 23:34 |
tristanC | clarkb: it does matter even for a single image, where an update will only pull missing layers | 23:35 |
clarkb | the strict control over all the deps is a neat feature of nix. I'm just not sure if it gets us much here for a simple service like this. Cabal is capable of pinning things too right? then you're only dealing with the distro ghc and openssh | 23:35 |
fungi | clarkb: it's not extending setup.py, and this was just the pip install tox was doing | 23:35 |
clarkb | fungi: I think your setup.cfg sets a cmdclass value | 23:36 |
tristanC | clarkb: and you can build gerritbot-matrix differently if you prefer, but you would need a similar trick for the home user dir so that it can work with arbitrary uid | 23:36 |
clarkb | fungi: and cmdclass extends setup.py and pbr is trying to make that happen | 23:36 |
fungi | clarkb: it's here: https://mudpy.org/gitweb?p=mudpy.git;a=blob;f=setup.cfg;h=1cbd5501ce8ceecf677085c4272c76468dacc015;hb=HEAD | 23:36 |
clarkb | tristanC: yup I'm trying to work that through in my head. I'm beginning to think it might be a reasonable thing for us to do for consistency with our images | 23:36 |
tristanC | clarkb: having all the deps frozen is helpful to ensure the image can build in the far future | 23:36 |
clarkb | tristanC: it also ensures that you're not getting security updates | 23:37 |
tristanC | clarkb: right, so instead of updating a comment in a dockerfile to get a new build, you would update the repository commit instead | 23:38 |
clarkb | tristanC: you'd also need to unpin things | 23:38 |
tristanC | clarkb: here is an example dockerfile we use for another cabal base application: https://github.com/change-metrics/monocle/blob/master/Dockerfile-api | 23:38 |
clarkb | but I guess you can do that in the same commit | 23:38 |
clarkb | tristanC: how did the image update for your change above? I don't see the updated flake.nix in the github repo. Maybe that is just a sync problem though | 23:40 |
tristanC | clarkb: nix flake update is the command to update dependencies, and you can do a tree diff to see what exactly changes | 23:41 |
clarkb | re layer splits for updates. I'm not sure there is a ton of value in that. Yes, you'll avoid some network traffic but again that really only matters if you are doing significant numbers of updates that represent large amounts of data | 23:43 |
clarkb | It is "neat" but I don't think it is necessary when you update an image once a week or less | 23:43 |
clarkb | and only have a handful of images that share those layers | 23:43 |
clarkb | our base debian images with python in them are like 200MB total | 23:44 |
clarkb | If we pull that once a week on a number of servers it isn't a big deal | 23:44 |
clarkb | Basically I'm trying to optimize for simplicity and ease of use. Not for deploying massive amounts of software frequently to large datacenters. There are different needs. | 23:45 |
clarkb | Nix would probably do well if you had hundreds of releases a day hitting tens of thousands of nodes | 23:45 |
clarkb | and you'd accept the complexity and divergence from expected norms as those optimizations become important for you | 23:45 |
clarkb | fungi: thats interesting because the pbr code is executing that path when you've set [global] commands if I'm reading it correctly | 23:49 |
clarkb | and translating that to cmdclass | 23:49 |
clarkb | fungi: what I did in the past was preinstall pbr and then told build to not use isolated build environments. Then I could instrument the pbr installation to sort out what was going on. Might need to do that here | 23:52 |
clarkb | to see what sorts of values are being handled there to work backwards and figure it out | 23:52 |
clarkb | I wonder if you're hitting it in a dependency? | 23:52 |
fungi | i doubt it's a dependency (the dependencies are listed there in the setup.cfg, passlib and pyyaml) | 23:54 |
fungi | but yeah, first i'll try turning on warnings in bindep and see what i can reproduce with it | 23:55 |
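
For reference, the descriptor cleanup discussed between 18:32 and 18:38 (one `try` wrapping a `with open(...)`, catching `IOError`, `OSError` and `email.errors.MessageError`, then continuing) could be sketched roughly as follows. This is not the code from https://review.opendev.org/818622; the function name `read_metadata_names` and the choice to collect the `Name` header are hypothetical, picked only to make the pattern runnable on its own.

```python
import email
import email.errors


def read_metadata_names(filenames):
    """Collect the Name header from each readable metadata file.

    Hypothetical helper: it illustrates the try/with pattern only and is
    not pbr's actual implementation.
    """
    names = []
    for filename in filenames:
        try:
            # The with block closes the descriptor even if parsing raises,
            # so no separate close() or finally clause is needed.
            with open(filename, "r") as pkg_metadata:
                message = email.message_from_file(pkg_metadata)
        except (IOError, OSError, email.errors.MessageError):
            # Unreadable or malformed file: skip it and move on.
            continue
        names.append(message.get("Name"))
    return names
```

Combining the two original try/except blocks this way keeps a single error path, which matches the observation in the log that every failure case ends in a `continue`.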
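
The approach agreed at the top of the log for git-review (match only the first two or three version components, and emit a warning if no version can be determined) might look something like the sketch below. It is an assumption-laden illustration rather than the change in https://review.opendev.org/c/opendev/git-review/+/818219: the function name `parse_git_version`, the regular expression, and defaulting a missing third component to 0 are all hypothetical.

```python
import re
import warnings


def parse_git_version(output):
    """Return the leading (major, minor, patch) tuple from `git --version`.

    Hypothetical helper: trailing components such as ".rc0.5.gabcdef" from a
    self-compiled git are ignored. Returns None and warns when nothing that
    looks like a version is found.
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        warnings.warn("Could not determine git version from: %r" % output)
        return None
    major, minor, patch = match.groups()
    # Assumption: treat a missing third component as 0.
    return (int(major), int(minor), int(patch or 0))
```

Comparing the resulting tuple against a threshold tuple then works for release builds and for development builds alike, since only the numeric prefix is considered.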