bauzas | good morning Nova | 07:44 |
---|---|---|
* bauzas just waves but has to leave for 10 mins ;) | 07:45 | |
kashyap | gibi[m]: bauzas: Friday shameless plug: I recently summarized a KVM maintainer's talk for LWN on QEMU and software complexity. I think the lessons are interesting for OpenStack too: | 08:46 |
kashyap | gibi[m]: bauzas: "A QEMU case study in grappling with software complexity" — https://lwn.net/SubscriberLink/872321/221e8d48eb609a38/ | 08:46 |
bauzas | ++ | 08:46 |
kashyap | If you're short on time, read the intro "Sources of complexity", "Ways to fight back", and the short, one-para conclusion. | 08:47 |
kashyap | (Especially check out the idea of "incomplete transitions") | 08:47 |
mdbooth | My devstack is failing when it tries to run 'openstack --os-cloud devstack-system-admin registered limit create --service glance --default-limit 10000 --region RegionOne image_size_total' with 'Cloud devstack-system-admin was not found'. This is a fresh install. I don't know what devstack-system-admin is or what creates it, so I'm at a loss for | 09:47 |
mdbooth | where to look. There's a comment from dansmith that this is a hack: https://github.com/openstack/devstack/blob/82facd6edf7cefac1ab68de4fe9054d7c4cb50db/lib/glance#L291-L294 . Has something invalidated the hack? Does anybody know where 'devstack-system-admin' is supposed to come from? | 09:47 |
mdbooth | To the best of my knowledge there is no clouds.yaml anywhere on this system. If there is, it was created by devstack and put somewhere I don't know to look for it. | 09:49 |
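The lookup mdbooth is hitting works the same way outside devstack: `--os-cloud` is resolved against a clouds.yaml entry (searched in the current directory, `~/.config/openstack/` and `/etc/openstack/`), and "Cloud ... was not found" means no entry with that name exists yet. A minimal sketch of that resolution with openstacksdk, assuming the SDK is installed and no `devstack-system-admin` entry has been written:

```python
# Minimal sketch (not from the log): reproduce the clouds.yaml lookup the
# CLI performs for --os-cloud.  openstacksdk searches ./clouds.yaml,
# ~/.config/openstack/clouds.yaml and /etc/openstack/clouds.yaml; if no
# entry named devstack-system-admin exists, connect() raises -- the same
# condition behind the "Cloud devstack-system-admin was not found" error.
import openstack

try:
    openstack.connect(cloud='devstack-system-admin')
except Exception as exc:
    print(f"cloud lookup failed: {exc}")
```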
kashyap | mdbooth: See this commit in DevStack: 56905820 (Add devstack-system-admin for system scoped actions, 2019-01-08) | 09:49 |
mdbooth | 👀 | 09:51 |
kashyap | Heh | 09:51 |
kashyap | Also see the code under the comment "#admin with a system-scoped token -> devstack-system" in devstack/functions-common | 09:51 |
kashyap | Although the commit message isn't particularly descriptive; and assumes "inside knowledge" | 09:52 |
mdbooth | Hmm, that appears to be updating a clouds.yaml file | 09:54 |
mdbooth | As I don't have a clouds.yaml file, I wonder if this is an ordering thing | 09:55 |
mdbooth | Did devstack create the glance limit before creating clouds.yaml? | 09:55 |
mdbooth | I'm going to remove GLANCE_LIMIT_IMAGE_SIZE_TOTAL from my local.conf and re-run glance, then look to see what it put in clouds.yaml | 09:57 |
mdbooth | s/glance/stack.sh/ | 09:57 |
kashyap | I don't know about the Glance limit ... but there are a bunch of commits that might give a hint (git log --oneline | egrep -i 'glance.*limit') | 09:58 |
frickler | /etc/openstack/clouds.yaml is what devstack generates | 10:06 |
mdbooth | frickler: Yeah, that was my RTFS. It seems to be running this without having created it, though 🤔 | 10:07 |
mdbooth | Although I just found GLANCE_ENABLE_QUOTAS. I wonder if I can sidestep this whole thing. | 10:07 |
frickler | mdbooth: are you using a reduced set of services? it might be a bug in the async dependencies | 10:19 |
mdbooth | Very possibly. I rewrote local.conf this morning to use OVN and I'm convinced I wasn't hitting this yesterday. | 10:19 |
frickler | mdbooth: if you can share your local.conf I can give it a spin | 10:20 |
mdbooth | Just re-provisioning. I'll have a fully hydrated one in a few minutes. | 10:21 |
mdbooth | frickler: Actually you can just get it here: https://github.com/shiftstack/cluster-api-provider-openstack/blob/devstack-on-openstack/hack/ci/cloud-init/default.yaml.tpl | 10:22 |
mdbooth | That's the bottom half of a cloud-init which runs devstack | 10:23 |
mdbooth | OPENSTACK_RELEASE is xena | 10:25 |
frickler | ok, having a meeting now, will try to run it afterwards | 10:27 |
mdbooth | frickler: Thanks. That version includes GLANCE_ENABLE_QUOTAS=False because I'm just testing that. | 10:28 |
mdbooth | But it previously used GLANCE_LIMIT_IMAGE_SIZE_TOTAL=10000 instead | 10:28 |
mdbooth | frickler: FWIW I've been in hacker mode on that config for a while (just look at the history!). I just disabled tempest and horizon which had accidentally become enabled again, and it seems to have completed. It's still using GLANCE_ENABLE_QUOTAS=False. | 11:02 |
mdbooth | Which is to say, if there's a dependency issue I'll bet it relates to tempest or horizon, but I haven't proven that. | 11:03 |
frickler | mdbooth: o.k., at least I could reproduce your failure with GLANCE_LIMIT_IMAGE_SIZE_TOTAL being set | 11:29 |
frickler | mdbooth: nice one, this actually only fails consistently with DEVSTACK_PARALLEL=False | 11:50 |
frickler | with async, https://github.com/openstack/devstack/blob/82facd6edf7cefac1ab68de4fe9054d7c4cb50db/stack.sh#L1107 runs in the background and write_clouds_yaml in L1122 has a fair chance of being fast enough | 11:52 |
frickler | dansmith: ^^ | 11:52 |
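A toy illustration of the race frickler describes (illustrative only, not devstack code): a consumer that needs clouds.yaml is started before the step that writes it. Run in the background, the writer usually wins; run serially (the DEVSTACK_PARALLEL=False analogue), the consumer always loses.

```python
# Toy model of the ordering hazard: register_glance_limit() stands in for
# "openstack --os-cloud devstack-system-admin ... limit create", and
# write_clouds_yaml() for devstack's write_clouds_yaml step.
import pathlib
import tempfile
import threading
import time

clouds_yaml = pathlib.Path(tempfile.mkdtemp()) / "clouds.yaml"

def register_glance_limit():
    if not clouds_yaml.exists():
        raise RuntimeError("Cloud devstack-system-admin was not found")
    print("limit registered")

def write_clouds_yaml():
    clouds_yaml.write_text("clouds:\n  devstack-system-admin: {}\n")

# "async" ordering: consumer in the background, writer right after it --
# the writer usually finishes first, so the race is rarely lost.
worker = threading.Thread(target=lambda: (time.sleep(0.1), register_glance_limit()))
worker.start()
write_clouds_yaml()
worker.join()

# "serial" ordering: consumer strictly before writer -> fails every time.
clouds_yaml.unlink()
try:
    register_glance_limit()
except RuntimeError as exc:
    print("serial ordering:", exc)
```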
mdbooth | frickler: Oh, wow! I only turned that on temporarily to rule it out as the potential cause of another issue! | 11:56 |
kashyap | mdbooth: TIL, "tpl" extension | 12:27 |
mdbooth | kashyap: Not mine in this case, but I'm pretty sure I've used it before. | 12:27 |
kashyap | (From your link. Probably it's just a convenient way to mark that YAML file as a "template") | 12:27 |
kashyap | mdbooth: I see | 12:27 |
gibi | sean-k-mooney: about stopping the services: you are right, we are doing it already. That does not stop all the eventlets the service spawned. I also tried to iterate all the eventlets and call throw() on them to stop them, but that did not help either. | 12:56 |
gibi | kashyap: thanks for the links, I added it as weekend reading :) | 12:57 |
kashyap | gibi: No prob. (It took 8 gruelling revisions. :D. But I always become a bit of a better person after writing for LWN) | 12:57 |
sean-k-mooney | gibi: ya, i have your review open on my other monitor. the more i read over it and look at it the more compelling it becomes. | 12:57 |
sean-k-mooney | gibi: it's a little non-obvious at first glance why we have to do this, but it's a nice solution when you dig into it | 12:58 |
gibi | kashyap: I follow LWN but I'm not a subscriber. I think it is prestigious to write there :) | 12:58 |
kashyap | gibi: I realize not everyone has a subscription; Red Hat has a group sub. Hence I created a "subscriber link", as I posted it in a community channel. | 12:59 |
gibi | sean-k-mooney: it would be better to kill the eventlets at the end of each test case, but I did not find a way to do it | 12:59 |
gibi | kashyap: yeah I see and I thank you for it | 12:59 |
kashyap | No prob at all. (And sorry for the plug.) | 12:59 |
kashyap | But the main idea of essential vs. accidental complexity comes from the famous 1986 paper called "No Silver Bullet" by Fred Brooks - https://en.wikipedia.org/wiki/No_Silver_Bullet | 13:00 |
sean-k-mooney | gibi: well there might be a way to do it: if we modified the test setup so that each test used a separate greenpool, then we could stop all eventlets in the pool and discard it at the end of the test | 13:00 |
kashyap | (So it was nice to see concrete examples of it in QEMU.) | 13:01 |
sean-k-mooney | to do that i think we would have to modify the nova service definition and possibly nova utils to use a non-default eventlet pool | 13:01 |
sean-k-mooney | but if we did that we could extend the kill function to terminate the pool | 13:02 |
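A sketch of what that per-test pool could look like, with the caveat that the wiring is hypothetical: nova's services would first need to accept an injectable pool (the "modify the nova service definition and nova utils" part above), and the fixture name here is invented.

```python
# Hypothetical per-test pool fixture (names invented for illustration).
# Each test gets its own GreenPool; cleanup kills whatever greenthreads
# the test or its services left running, then lets them unwind.
import eventlet
import fixtures


class PerTestGreenPool(fixtures.Fixture):
    def _setUp(self):
        self.pool = eventlet.GreenPool()
        self.addCleanup(self._kill_leftovers)

    def _kill_leftovers(self):
        for gt in list(self.pool.coroutines_running):
            eventlet.kill(gt)    # raises GreenletExit in the greenthread
        self.pool.waitall()      # give killed greenthreads a chance to exit
```

The services under test would then spawn via `self.pool.spawn(...)` instead of the module-level default, which is the invasive part sean-k-mooney mentions.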
frickler | mdbooth: I wanted to move the write_clouds_yaml earlier anyway in https://review.opendev.org/c/openstack/devstack/+/780417, I guess I can just do that step in its own patch to fix your issue | 13:03 |
gibi | kashyap: ohh yeah essential and accidental complexity I like those topics | 13:05 |
mdbooth | frickler: I'd appreciate it | 13:05 |
kashyap | gibi: Yeah; the idea goes back 2000 years! (Aristotle++) | 13:05 |
gibi | ohh | 13:05 |
gibi | I did not know that | 13:05 |
kashyap | I linked to it in the intro too :) | 13:06 |
mdbooth | eventlet-- | 13:06 |
kashyap | mdbooth: Heh, what a contrasting negative karma | 13:06 |
kashyap | (Sorry for your pain) | 13:06 |
mdbooth | I only have the scars now, and occasionally the nightmares. | 13:07 |
gibi | sean-k-mooney: if terminating the pool also just calls greenlet.throw() then that would have the same problem as I had when I manually called that at the end of the test on each greenlet | 13:07 |
frickler | mdbooth: https://review.opendev.org/c/openstack/devstack/+/814142 | 13:08 |
mdbooth | It blows my mind that at some point there was a meeting and somebody said: "You know what, lets just monkey patch everything and replace it all with our own stuff, what could go wrong?". And somebody else in that meeting agreed with them, and they started doing it. | 13:08 |
gibi | mdbooth: I assume it was a single person project. :) | 13:09 |
sean-k-mooney | mdbooth: well the alternative was to continue to use Twisted so... | 13:09 |
gibi | or use threading until you scale so big that the overhead of threads is too much | 13:10 |
gibi | or use something other than Python without the GIL ;) | 13:10 |
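For readers without the scars: eventlet's monkey patching swaps blocking stdlib primitives (socket, time.sleep, threading, select, ...) for cooperative green versions, so otherwise unmodified code yields to the hub whenever it would block. A minimal, standalone sketch (not nova code):

```python
import eventlet
eventlet.monkey_patch()   # best done before other imports that use the stdlib

import threading
import time

def worker(name):
    time.sleep(0.1)       # now a cooperative green sleep, not a real block
    print(name, "done")

# These "threads" are green threads after patching; the calling code did not change.
threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```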
sean-k-mooney | gibi: i was considering could we stop all the services we spawned as greenthreads and then either call waitall to wait for them to finish, or loop over and call kill on all the running greenthreads | 13:11 |
sean-k-mooney | so stop services and call https://eventlet.net/doc/modules/greenpool.html#eventlet.greenpool.GreenPool.waitall or stop services and call https://eventlet.net/doc/modules/greenthread.html#eventlet.greenthread.kill on all greenthreads in the pool | 13:12 |
gibi | it is multiple eventlets per service, but yes, you are right. That should work. I did not call wait after throw; maybe that was the problem | 13:12 |
gibi | note that we don't just have greenthreads, we have naked greenlets as well somehow | 13:12 |
gibi | I did not trace where they are coming from | 13:13 |
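The two options linked above, plus the gc-based sweep that chasing those naked greenlets would take (gibi notes the throw() route did not actually help, so treat this purely as a sketch of the attempt, not a fix):

```python
# Illustrative only: waitall/kill are the documented eventlet calls linked
# above; the gc sweep is the only generic way to reach greenlets that were
# never registered with a GreenPool.
import gc

import eventlet
import greenlet


def drain_pool(pool):
    # option 1: cooperative -- wait for every greenthread in the pool to finish
    pool.waitall()


def kill_pool(pool):
    # option 2: forceful -- raise GreenletExit in each running greenthread
    for gt in list(pool.coroutines_running):
        eventlet.kill(gt)
    pool.waitall()  # let the killed greenthreads unwind


def kill_stray_greenlets():
    # the "naked" greenlets: find live ones via the garbage collector
    current = greenlet.getcurrent()
    for obj in gc.get_objects():
        if isinstance(obj, greenlet.greenlet) and obj is not current and not obj.dead:
            obj.throw(greenlet.GreenletExit)
```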
sean-k-mooney | i need to test something else today, but i still think your current patch is likely a viable solution in the short term and we could explore the green pool approach in parallel/after | 13:15 |
bauzas | folks, looking at the nova PTG agenda we have atm | 13:25 |
bauzas | it looks to me like we don't have a lot of topics to discuss, so maybe we shouldn't have a schedule, ok? | 13:25 |
bauzas | I'll just prioritize some topics | 13:26 |
dansmith | frickler: ah, need to wait for all those accounts to finish before write_clouds_yaml I guess huh? | 13:30 |
sean-k-mooney | we might want to keep one of the sessions free for unconference/follow-up discussions | 13:30 |
sean-k-mooney | bauzas: ^ | 13:30 |
sean-k-mooney | bauzas: but ya we could also just prioritise the list and see how far we get each day | 13:31 |
dansmith | frickler: er, no I guess that's just writing static things out, so .. I'm not sure what the problem is (if any) | 13:37 |
gibi | bauzas: I suggest frontloading the important stuff and then just following the etherpad. If we run out of topics then we are done :) | 13:38 |
gibi | sean-k-mooney: yeah I have to do other things too today so I have no chance to try the greenpool approach | 13:38 |
bauzas | gibi: yeah, for example, I'll move melwitt's topic on unified limits up | 13:48 |
gibi | ack | 13:54 |
mdbooth | FYI: $ curl --compressed -H "X-Auth-Token: ${token}" -X GET https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13292/v2.1/images/$imageid/file | nbdcopy -- - [ qemu-nbd -f qcow2 capo-e2e-worker.qcow2 ] | 14:05 |
mdbooth | Ah, wrong channel. Maybe still interesting, though :) | 14:06 |
mdbooth | dansmith: The problem I was hitting was that we were trying to create the glance quotas before clouds.yaml had been created. | 14:08 |
mdbooth | And to be clear, I'm basically just cargo culting this local.conf. I have very little idea what's actually going on. | 14:10 |
mdbooth | Speaking of which, anybody ever seen: "The unit files have no installation config (WantedBy=, RequiredBy=, Also=, Alias= settings in the [Install] section, and DefaultInstance= for template units). This means they are not meant to be enabled using systemctl.". This failure seems to be non-deterministic, and unfortunately only happens in CI so I | 14:14 |
mdbooth | can't debug :( | 14:14 |
dansmith | mdbooth: ah, maybe I should move that in glance because I do it super early when the *glance* accounts are available, but I definitely need clouds.yaml | 14:52 |
opendevreview | Ade Lee proposed openstack/nova master: Add check job for FIPS https://review.opendev.org/c/openstack/nova/+/790519 | 17:03 |
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline starting in 5 minutes, at 18:00 UTC, for scheduled project rename maintenance, which should last no more than an hour (but will likely be much shorter): http://lists.opendev.org/pipermail/service-announce/2021-October/000024.html | 17:59 | |
opendevreview | melanie witt proposed openstack/nova master: DNM Run against unmerged oslo.limit changes https://review.opendev.org/c/openstack/nova/+/812236 | 21:08 |