tonyb | Should nodepool be 'pre-booting' nodes in the inmotion cloud? I expect to see some number in the 'Available' state. | 00:00 |
---|---|---|
fungi | nodepool only "pre-boots" nodes that have a min-ready count and only enough across all providers to meet that count | 00:01 |
tonyb | Ah okay | 00:01 |
fungi | what's the max-servers set to in inmotion? | 00:02 |
fungi | in our nodepool configs i mean | 00:02 |
tonyb | I'll check but I think the max is 51 | 00:02 |
fungi | a nonzero value? | 00:02 |
fungi | we sometimes set that to 0 to temporarily disable booting nodes in a provider, is why i ask | 00:02 |
corvus | https://grafana.opendev.org/d/53e8120f2a/nodepool3a-inmotion?orgId=1 says 51 | 00:03 |
fungi | so if you think we should be booting nodes there already and aren't, the next place to look would be the debug log for the launcher that provider is tied to. see if it's turning down node requests there for a specified reason | 00:05 |
corvus | the ready node launch attempts graph shows values for the past 2 hours which indicates successful launches | 00:07 |
tonyb | max-servers: 51 ... no min-servers so that matches | 00:07 |
corvus | there were errors between 22 and 2300 but not since 2300 | 00:08 |
corvus | i think 2300 is your approximate "done" time, right? so those 2 graphs look good. | 00:08 |
tonyb | I was mostly done yesterday; today was just cleaning things out. | 00:10 |
tonyb | I'll look at the errors in the last 24 hours | 00:11 |
tonyb | Yeah I think the third stage fix: https://etherpad.opendev.org/p/opendev-inmotion_debugging#L121 | 01:01 |
tonyb | has cleaned up the errors | 01:01 |
tonyb | the periodic pipeline will go off in about 1 hour that'll be a good test | 01:01 |
tonyb | and since I completed that the number of nodes launched in this cloud seems to be a little higher | 01:08 |
*** dtantsur_ is now known as dtantsur | 01:50 | |
opendevreview | Takashi Kajinami proposed openstack/diskimage-builder master: Get rid of 3rd party mock https://review.opendev.org/c/openstack/diskimage-builder/+/907513 | 02:29 |
tonyb | I think inmotion is doing better. for the last 3+ hours it's sitting on 40+ nodes in use, there were a few errors in the last 30mins but far fewer than before | 05:08 |
ykarel | Hi, is it a known issue that the infra mirrors for centos-stream have not been updated since 16th January? | 07:32 |
ykarel | http://mirror.iad.rax.opendev.org/centos-stream/timestamp.txt | 07:32 |
ykarel | vs http://mirror.rackspace.com/centos-stream/timestamp.txt | 07:33 |
ykarel | http://mirror.iad.rax.opendev.org/logs/rsync-mirrors/centos-stream.log | 07:39 |
ykarel | shows rsync: close failed on "/afs/.openstack.org/mirror/centos-stream/9-stream/CRB/x86_64/os/Packages/.dotnet-sdk-7.0-source-built-artifacts-7.0.115-2.el9.x86_64.rpm.5ic9Kr": Disk quota exceeded (122) | 07:40 |
ykarel | fungi, frickler can you check ^ | 07:42 |
frickler | ykarel: it has been known that some volumes are running close to their limits for some time, but I didn't know that this has already happened | 09:55 |
frickler | tonyb: did you set up your AFS credentials yet? ^^ might be a good opportunity to get a bit of practice. otherwise I'll just do a small quota bump as a quick workaround | 09:58 |
tonyb | frickler: I haven't done anything with AFS credentials. so I guess point me in the direction of doing that and I'll do it ASAP | 10:00 |
frickler | tonyb: https://docs.opendev.org/opendev/system-config/latest/afs.html is the general doc, I must admit I'm not too deep into this myself, so if you need help better wait for fungi or clarkb | 10:14 |
frickler | the command I would run would be "fs setquota /afs/.openstack.org/mirror/centos-stream -max 350000000", current value is 300M | 10:16 |
tonyb | frickler: thanks | 10:55 |
ykarel | thx frickler tonyb | 12:05 |
ykarel | so is it updated? | 12:05 |
frickler | ykarel: not yet, is this causing actual job failures or was it just something you noticed? | 12:17 |
ykarel | frickler, noticed with weekly job https://zuul.openstack.org/builds?job_name=neutron-fullstack-with-uwsgi-fips&branch=master&skip=0 | 12:18 |
frickler | oh, important gotcha when debugging a held system-config-run-* job: the nodes are configured with our usual sysadmin accs, can't login as root as with "normal" held nodes. but I guess you all already knew this and I'm just late to the party | 12:42 |
frickler | aah, that's a nice failure. if we merge/test https://review.opendev.org/c/opendev/system-config/+/907500, the page https://opendev.org/opendev/system-config/ does actually contain 'Internal Server Error' because that text is in the commit message which is then shown on that page. so we'll need to temporarily disable this check | 12:52 |
frickler | https://opendev.org/opendev/system-config/src/branch/master/testinfra/test_gitea.py#L133 unless someone has a better idea | 12:52 |
frickler | ykarel: ok, I've done some small quota increase now, will check after the next rsync run. I think we still have some other quota bumps for tonyb to look at | 12:56 |
ykarel | thx frickler | 12:57 |
*** blarnath is now known as d34dh0r53 | 14:53 | |
fungi | infra-root: ems has confirmed the terms of our matrix homeserver business plan and will be officially upgrading us on wednesday next week (2024-02-07). we shouldn't expect any downtime or other impact to the service, the only outwardly visible change will, i think, be the increase in our user quota which we weren't using all of before anyway | 15:31 |
fungi | i'll try to remember to remind folks the day before, in the weekly meeting, too | 15:32 |
fungi | popping out for a lunch break, may take a slightly longer one than usual since it's friday and i also have some quick errands to run, but should still be back by 18:00 at the latest (probably sooner) | 16:08 |
clarkb | fungi: thank you for dealing with that | 16:23 |
clarkb | frickler: I'll work on adjusting the test to be less likely to collide | 16:23 |
opendevreview | Clark Boylan proposed opendev/system-config master: Increase gitea db connection limit https://review.opendev.org/c/opendev/system-config/+/907500 | 16:40 |
clarkb | this should pass testing now I hope | 16:40 |
clarkb | that gitea db change passes now | 18:15 |
opendevreview | Merged opendev/system-config master: Retire the OpenInfra Labs mailing list https://review.opendev.org/c/opendev/system-config/+/907103 | 18:30 |
opendevreview | Elod Illes proposed openstack/project-config master: [relmgt] Update reno when cutting unmaintained branch https://review.opendev.org/c/openstack/project-config/+/907626 | 19:15 |
*** priteau_ is now known as priteau | 21:44 |
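
The gitea test failure frickler describes at 12:52 comes from checking a page body for the text 'Internal Server Error': a healthy page can still contain that text when a commit message mentions it. The sketch below illustrates the collision and one possible way to avoid it, by looking at the HTTP status instead of the body. All names and the sample page body here are hypothetical; this is not the code in test_gitea.py, and it is not necessarily how clarkb adjusted the test.

```python
# Hypothetical sketch: names and sample data are illustrative only,
# not taken from system-config's testinfra/test_gitea.py.

ERROR_MARKER = "Internal Server Error"


def naive_check_passes(body: str) -> bool:
    """Pass only if the error text appears nowhere in the page body."""
    return ERROR_MARKER not in body


def status_check_passes(status_code: int) -> bool:
    """Pass if the server did not report a 5xx error."""
    return status_code < 500


# A healthy page (HTTP 200) whose displayed commit message happens to
# mention the error text (assumed example content).
healthy_status = 200
healthy_body = "<p>Latest commit: avoid 'Internal Server Error' under load</p>"

print(naive_check_passes(healthy_body))     # substring check on the body
print(status_check_passes(healthy_status))  # status-based check
```

The substring check rejects the healthy page because it cannot tell rendered commit text apart from a real error page, while the status-based check does not depend on page content at all.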