*** rlandy is now known as rlandy|out | 00:29 | |
*** clarkb is now known as Guest298 | 01:19 | |
*** Guest298 is now known as clarkb | 01:20 | |
*** atmark is now known as Guest305 | 02:10 | |
*** yadnesh|away is now known as yadnesh | 04:14 | |
Tengu | clarkb: need to read some doc about what's done for pypi in the proxy thing, but I think I get it, more or less. basically I'll have to get the S3 URI, and call the "substitute" in order to rewrite it to some "ansible-galaxy-files" location, to match a new "endpoint" in the proxy config. | 07:57 |
Tengu | I'll work on that. | 07:57 |
Tengu | hey. wait. | 07:59 |
Tengu | actually.... there's ALREADY an endpoint! | 07:59 |
* Tengu dumb for not checking beforehand | 07:59 | |
Tengu | https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2#L127-L133 | 08:00 |
Tengu | fungi: :) you actually created -^ via change-id Ib5664e5588f7237a19a2cdb6eec3109452e8a107 | 08:01 |
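For anyone following along, the /galaxy/ location above can be smoke-tested directly. A minimal sketch, assuming a regional mirror (the FQDN below is illustrative, not a real host):

```bash
# Hypothetical smoke test of the /galaxy/ proxy location defined in
# mirror.vhost.j2; substitute the actual per-region mirror FQDN.
MIRROR_FQDN="mirror.example.opendev.org"
curl -sSI "https://${MIRROR_FQDN}/galaxy/" | head -n1
```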
*** yadnesh is now known as yadnesh|afk | 08:11 | |
*** jpena|off is now known as jpena | 08:23 | |
*** yadnesh|afk is now known as yadnesh | 08:27 | |
*** rlandy|out is now known as rlandy | 11:05 | |
*** dviroel|afk is now known as dviroel | 11:12 | |
*** frenzy_friday is now known as frenzy_friday|rover | 12:15 | |
fungi | Tengu: somehow this doesn't surprise me | 12:42 |
Tengu | fungi: same :) | 12:43 |
Tengu | fungi: go get your coffee first :] | 12:43 |
fungi | aha, yes i guess the tripleo team asked to have it added roughly a year ago | 12:43 |
Tengu | sounds like something matching the votes :) | 12:43 |
Tengu | and they never put it to use. | 12:43 |
fungi | so this means they didn't end up using it? i wonder why | 12:43 |
opendevreview | Merged openstack/project-config master: Use kolla.config for kolla-ansible in gerrit https://review.opendev.org/c/openstack/project-config/+/865686 | 12:50 |
fungi | Tengu: i guess test it and make sure it's working, so we can adjust it | 12:51 |
Tengu | fungi: yeah, I'll talk with them today during the community call :) | 12:51 |
Tengu | fungi: I've pushed this https://review.opendev.org/c/opendev/base-jobs/+/865970 to make the ansible proxy more "visible" | 12:52 |
fungi | Tengu: would https work better? i have no idea if the ansible-galaxy tool cares either way | 12:57 |
Tengu | fungi: I didn't hit such an issue during testing, but maybe switching to TLS would be better. | 12:57 |
Tengu | especially since the certificate is valid | 12:58 |
fungi | we added let's encrypt to our mirrors more recently than we set those existing envvars in the base job, but if the tool is happy either way it probably doesn't matter | 12:58 |
Tengu | bah, let's switch to TLS | 12:58 |
Tengu | it's always better imho. | 12:58 |
Tengu | and future-proof | 12:59 |
Tengu | TLS is in the 4443, isn't it? | 12:59 |
fungi | no, just the regular 443 | 12:59 |
Tengu | really? | 12:59 |
Tengu | fungi: the comment in the mirror config seems to state otherwise... ? | 13:01 |
Tengu | # Dedicated port for proxy caching, as not to affect afs mirrors. and 8080, 4443 | 13:01 |
Tengu | (among other things) | 13:01 |
Tengu | fun... | 13:02 |
Tengu | oh. ok. /galaxy/ is defined in the BaseMirror | 13:02 |
fungi | the test_galaxy_mirror test added in the change you referred to just connects to "https://%s/galaxy/" % addr where addr is just a raw ip address | 13:02 |
Tengu | yep | 13:02 |
Tengu | I wanted to double-check with the apache config itself. | 13:03 |
Tengu | now I get it: BaseMirror macro defines the galaxy, and is called for 80 and 443 | 13:03 |
fungi | the reason we use that BaseMirror macro is so that we can serve the same things through http on 80 and https on 443 without duplicating the configuration | 13:03 |
Tengu | same goes for the ProxyMirror macro, but for other ports. | 13:03 |
Tengu | it's a nice feature from httpd | 13:04 |
fungi | all the higher numbered ports are for "special" things which can't have subpaths relative to the root path | 13:04 |
fungi | we try not to add those when we can help it | 13:04 |
fungi | but some tools are a bit braindead in their assumptions | 13:04 |
Tengu | heh - no wonder. | 13:05 |
Tengu | I updated my patch to reference https:// and removed the port. | 13:05 |
Tengu | good catch, because the :8080 would have failed anyway. | 13:05 |
*** dasm|off is now known as dasm | 13:05 | |
fungi | ahh, yeah, i didn't even spot the :8080! | 13:06 |
* fungi takes another gulp of coffee | 13:06 | |
Tengu | ;) | 13:08 |
Tengu | and I guess we can merge my NetworkManager thingy? 3x +2 is good | 13:08 |
Tengu | ah, thanks fungi :). Also thanks for the "-print" vote. | 13:18 |
Tengu | I forgot about that one actually :] | 13:18 |
fungi | yeah, i still think the df that one adds won't tell you much new since we already log a df (and df -i) at the start of every job | 13:19 |
Tengu | I can remove it | 13:19 |
opendevreview | Merged openstack/project-config master: Ensure NetworkManager doesn't override /etc/resolv.conf https://review.opendev.org/c/openstack/project-config/+/865433 | 13:19 |
fungi | running a df after the mv might give you more insight | 13:19 |
Tengu | let's do that! | 13:20 |
fungi | since then you can compare against the one from job start | 13:20 |
Tengu | lemme correct/amend. | 13:20 |
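The suggestion above amounts to snapshotting disk usage right after the move so it can be diffed against the job-start numbers. A minimal sketch (not the actual patch content):

```bash
# Disk usage and inode usage after moving data off /, directly
# comparable with the df / df -i the base job logs at job start.
df -h /
df -i /
```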
opendevreview | Cedric Jeanneret proposed openstack/openstack-zuul-jobs master: Add some output to the `find' command https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/865383 | 13:22 |
Tengu | better. | 13:22 |
Tengu | fungi: also updated the commit message to mention the zuul-info | 13:22 |
*** frenzy_friday|rover is now known as frenzy_friday|rover|food | 13:43 | |
Tengu | fungi: what's the ETA to get the first nodepool images built with the NetworkManager config running in the CI? | 13:45 |
fungi | Tengu: images are rebuilt ~daily, and you can see the list of built images at http://nl01.opendev.org/dib-image-list while the list of uploaded images in each provider is at http://nl01.opendev.org/image-list | 13:49 |
Tengu | ah, cool! thanks | 13:49 |
fungi | Tengu: if you want to see the build logs for a particular image, identify the builder it was built on from the dib-image-list and then go to it in a browser, like https://nb01.opendev.org/ | 13:50 |
Tengu | wow. that's neat! | 13:51 |
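Those status endpoints can also be fetched from a shell rather than a browser. A sketch using the URLs fungi mentioned above:

```bash
# Peek at the built-image and per-provider upload lists served by
# the nodepool launcher's status endpoints.
curl -s http://nl01.opendev.org/dib-image-list | head -n 20
curl -s http://nl01.opendev.org/image-list | head -n 20
```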
fungi | i think the zuul info we log from each build may also embed image ids for the nodes, looking now... | 13:51 |
Tengu | I think I've seen it in the zuul-info/ | 13:51 |
Tengu | fungi: the "age" is Day:Hours:Minutes:Seconds I guess? | 13:52 |
Tengu | yep, looks like so | 13:52 |
fungi | correct | 13:53 |
Tengu | seems there are some stalled in "deleting" state :/ | 13:53 |
fungi | and no, i can't seem to find the image id in the logged zuul-info, but if i'm not overlooking it then maybe that's something worth adding | 13:54 |
fungi | Tengu: a fun fact about image deletion. if you use boot from volume for a server instance, you can't delete the image while the server is still running. if a node is held in such a provider, or stuck deleting, then the image it was booted from can't be deleted | 13:55 |
fungi | we go through and try to clean them up manually from time to time | 13:55 |
Tengu | fungi: erf.. | 13:55 |
Tengu | fungi: so we have the "image-hostname" alongside dib-builddate: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_be0/863872/14/check/tripleo-ci-centos-9-standalone/be07f2c/zuul-info/zuul-info.primary.txt | 13:55 |
Tengu | that's the closest I seem to be able to find. | 13:56 |
fungi | right, the dib-builddate could be used to get us close enough to identifying the image used | 13:57 |
fungi | though actually logging the image id would be even better | 13:57 |
Tengu | i.e. generate the image-id before the actual build, inject it, and use that id while uploading? | 13:57 |
fungi | more likely plumb it back through the node request to the zuul scheduler and add it to the inventory | 13:59 |
Tengu | 'k. well - I don't know how things are piped in there ;) | 14:09 |
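Until an image id is plumbed through, the hints already present in the logged zuul-info can be extracted mechanically. A sketch against a downloaded copy of the file linked above:

```bash
# Pull the image identification hints (image-hostname, dib-builddate)
# out of a job's logged zuul-info file.
grep -Ei 'image-hostname|dib-builddate' zuul-info.primary.txt
```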
*** dviroel is now known as dviroel|lunch | 16:11 | |
*** frenzy_friday|rover|food is now known as frenzy_friday|rover | 16:21 | |
clarkb | vishalmanchanda: ok updated zuul-jobs patch pushed. We can recheck your change once that comes back green | 16:29 |
vishalmanchanda | clarkb: sure, thanks. | 16:29 |
Tengu | clarkb: heya! just saw your comment about the env var for ansible-galaxy proxy - there are some ansible variables already available somewhere? | 16:40 |
clarkb | Tengu: not for galaxy as far as I know. But other things like distro packages mirrors and pypi mirror and so on have roles that configure them | 16:41 |
clarkb | Tengu: there is the base mirror fqdn and then the roles tack on the service-specific bits and configure them | 16:41 |
clarkb | let me find an example of that | 16:42 |
Tengu | hmm. care to show me? if it's just a matter of adding a role somewhere and calling it, I'd be more than happy | 16:42 |
Tengu | note that tripleo is also using RDO, so maybe that's why our jobs are relying on that "old" file exposing env vars? | 16:42 |
clarkb | https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/defaults/main.yaml#L2-L3 | 16:43 |
Tengu | oh, and then it's used in https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/tasks/mirror.yaml | 16:44 |
Tengu | oook. | 16:44 |
Tengu | and, provided the configure-mirrors role is called from within the job, we'll get the proper config directly...? | 16:44 |
clarkb | for the things that role configures | 16:45 |
clarkb | I don't think galaxy should be configured by that role | 16:45 |
Tengu | i.e. I can ini_file /etc/ansible/ansible.cfg, and add the galaxy.server key and be off with that? | 16:45 |
clarkb | but I wanted to show you an example how you can use the base mirror fqdn to construct a mirror location in an ansible role | 16:45 |
Tengu | 'k | 16:45 |
clarkb | (really I wish pypi wasn't configured by that role and it only did distro mirrors, but that is a historical artifact that is difficult to change now) | 16:46 |
Tengu | zuul_site_mirror_fqdn is something that exists and is available then? | 16:46 |
clarkb | yes, we set it in opendev. That role is expected to be generic enough to run when it isn't set though hence the omit check | 16:46 |
Tengu | ok. I'll consider it then | 16:46 |
Tengu | just need to make something that's compatible with RDO infra as well | 16:46 |
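Putting the pieces of this thread together, the configuration could look roughly like this. A minimal sketch: the [galaxy] server key is standard ansible.cfg, but the environment variable and FQDN are assumptions modeled on configure-mirrors' zuul_site_mirror_fqdn, not an existing interface:

```bash
# Hypothetical: derive the galaxy mirror URL from the base mirror
# FQDN (the same pattern configure-mirrors uses for pypi and distro
# mirrors) and point ansible-galaxy at the in-region proxy.
MIRROR_FQDN="${ZUUL_SITE_MIRROR_FQDN:-mirror.example.opendev.org}"
sudo tee -a /etc/ansible/ansible.cfg >/dev/null <<EOF

[galaxy]
server = https://${MIRROR_FQDN}/galaxy/
EOF
```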
clarkb | vishalmanchanda: the zuul-jobs update is green | 16:52 |
vishalmanchanda | clarkb: ack. | 16:53 |
*** dviroel|lunch is now known as dviroel | 17:12 | |
*** jpena is now known as jpena|off | 17:17 | |
clarkb | Tengu: fungi: I've been looking at the /opt move and supposedly rsync might be quicker? It doesn't delete on the source, though, which we also want | 17:55 |
clarkb | I wonder if the speed ends up equivalent once you add in the delete step after copying | 17:55 |
clarkb | we can test this | 17:57 |
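The trade-off being tested, roughly, as a sketch (illustrative paths, not the actual role):

```bash
# Current style: move top-level entries one at a time; across
# filesystems each mv is itself a copy followed by a delete.
find /opt -mindepth 1 -maxdepth 1 -exec mv {} /mnt/opt/ \;

# rsync alternative: bulk copy, then a separate cleanup pass, since
# --remove-source-files deletes files but leaves directories behind.
rsync -a --remove-source-files /opt/ /mnt/opt/
find /opt -mindepth 1 -depth -type d -empty -delete
```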
fungi | yeah, that change was more just to get some indication of the current performance before experimenting with alternatives | 17:58 |
clarkb | oh is there an existing change? | 17:58 |
opendevreview | Clark Boylan proposed openstack/openstack-zuul-jobs master: Test /opt move using rsync https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/866054 | 18:08 |
clarkb | fungi: Tengu ^ more debugging | 18:08 |
clarkb | is there a change depending on the parent that I can update? | 18:09 |
clarkb | https://review.opendev.org/c/openstack/devstack/+/858996 is a devstack change I already had for similar purposes I've updated it | 18:14 |
frickler | just note that in general performance of our nodes seems to vary by +/- 50%, so comparing performance needs a large sample size | 18:22 |
clarkb | yup | 18:24 |
clarkb | mtreinish had good data on this once upon a time too. And the variance is crazy | 18:24 |
clarkb | even when you only look at nodes in a single provider | 18:24 |
fungi | clarkb: Tengu's change is https://review.opendev.org/865383 Add some output to the `find' command | 18:24 |
frickler | the other question is do we really need to free the space on /? otherwise we could consider moving /opt/git to /srv/git or whatever and just symlink to that? | 18:28 |
clarkb | frickler: jobs hit the 20gb limit on rax all the time | 18:29 |
clarkb | even with cleaning out the 10gb of /opt | 18:29 |
clarkb | the problem is that /var is used by journald and docker and so on | 18:30 |
clarkb | makes it really easy to fill a few gigabytes on / | 18:30 |
frickler | hmm, from the flavor I see we should have 40G as root disk, where do you see 20G? | 18:40 |
clarkb | hrm, I thought it was 20GB; maybe that is what we end up with free and not total size | 18:41 |
clarkb | another thing we can/should look at is trimming the contents of /opt | 18:41 |
clarkb | the bulk of the data there is git repos and maybe we've got some git repos we can prune out | 18:41 |
clarkb | also maybe the cirros images and friends can be reduced (they are very small already) | 18:41 |
frickler | on a random node I see 29G of 37G free after the move to /opt has happened. /opt has 13G used. if 16G on / aren't enough (without rming), then IMO jobs need to be fixed | 18:47 |
frickler | or we need to declare rax unusable for that kind of jobs | 18:47 |
clarkb | "/bin/sh: 5: time: not found" fyi | 18:48 |
clarkb | frickler: its 16GB after rming though right? | 18:48 |
opendevreview | Clark Boylan proposed openstack/openstack-zuul-jobs master: Test /opt move using rsync https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/866054 | 18:49 |
frickler | no, after rming we have 29G free on /. not sure about the original usage, but it can have been at most the 13G now used on /opt | 18:50 |
clarkb | fwiw the /opt move is limited to openstack jobs. It's not something we do globally in base jobs | 18:59 |
clarkb | I guess with a bit of testing for explosions we might be able to remove it for openstack as well. But the potential blast radius is quite large | 19:00 |
mtreinish | clarkb: I think I have a subunit2sql db archived somewhere if people want hard numbers from like 4-5 yrs ago :) | 19:33 |
mtreinish | looking through old presentations on the topic I had this image in a slide: https://blog.kortar.org/wp-content/uploads/2022/11/runtime_variance.png | 19:47 |
mtreinish | but I don't remember the context of exactly what it was graphing (and the details aren't in the slide besides just saying "Runtime variance") | 19:47 |
mtreinish | I assume it's just of a random tempest test across all gate runs based on the y axis | 19:48 |
Tengu | clarkb: ah, i was thinking about rsync as well. though I think find might have been used for potential hidden directories? | 20:09 |
Tengu | we can of course discuss tomorrow if you want, I'm on a private device with no access but irc | 20:10 |
Tengu | clarkb: "time" not found?! errrr.. is it embeded in bash? will check that out tomorrow. | 20:21 |
fungi | Tengu: or we're not installing the package needed to make it available | 20:22 |
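For the record: `time` is a reserved word in bash, but /bin/sh on Debian-family images is dash, which has no such builtin and falls back to an external binary. A sketch of the failure and two fixes (assuming GNU time is the missing package):

```bash
# Fails when /bin/sh is dash and /usr/bin/time is not installed:
sh -c 'time ls'        # -> sh: 1: time: not found

# Works via bash's reserved word:
bash -c 'time ls'

# Or via the external GNU time binary (the 'time' package):
/usr/bin/time -v ls
```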
*** swalladge is now known as Guest401 | 21:24 | |
*** dasm is now known as dasm|off | 21:39 | |
*** dviroel is now known as dviroel|out | 22:00 | |
clarkb | fwiw I think my test change has failed to land on rax but I need to double check that before rechecking | 22:18 |
clarkb | caught one https://zuul.opendev.org/t/openstack/stream/37dffb86cfad4ae3b3717f86ed294efc?logfile=console.log | 22:36 |
clarkb | it's not looking any quicker | 22:39 |
clarkb | (granted sample size of one) | 22:39 |
clarkb | I'm not super surprised by that. The bottleneck is almost certainly disk io | 22:39 |
*** rlandy is now known as rlandy|out | 23:51 |