opendevreview | Merged openstack/nova master: Fix pre_live_migration rollback https://review.opendev.org/c/openstack/nova/+/815324 | 00:33 |
---|---|---|
*** dasm|afk is now known as dasm | 01:17 | |
*** dasm is now known as dasm|gone | 03:41 | |
*** efried1 is now known as efried | 05:08 | |
opendevreview | Merged openstack/nova-specs master: Remove setup.py and setup.cfg https://review.opendev.org/c/openstack/nova-specs/+/835759 | 09:14 |
opendevreview | Merged openstack/nova-specs master: Move implemented specs for the Yoga release https://review.opendev.org/c/openstack/nova-specs/+/835272 | 09:31 |
bauzas | melwitt: thanks for the help ^ | 09:33 |
ihti[m] | Hi, we are facing a bug with volume attachments (https://bugs.launchpad.net/nova/+bug/1964576). We have a proposed fix for it. If someone has time to review the bug/fix, it would be great. Thanks! | 10:13 |
EugenMayer | sean-k-mooney i now have the issue present with glance showing a 'queued' status for an image, while nova / the instance shows 'image backup'. I cannot see any tasks using 'glance task-list' nor 'glance image-tasks <id>' | 11:45 |
EugenMayer | which nova logs could be interesting? glance logs do not show anything interesting / error-like, and neither does nova-api-error.log | 11:47 |
sean-k-mooney | the nova compute agent log is the only place that might have an error | 11:48 |
sean-k-mooney | but if the image is queued i think that means that nova has already finished uploading it | 11:48 |
opendevreview | kiran pawar proposed openstack/nova master: VMware: Early fail spawn if memory is not multiple of 4. https://review.opendev.org/c/openstack/nova/+/835739 | 11:48 |
sean-k-mooney | and glance should be processing it | 11:49 |
sean-k-mooney | have you checked the glance api host to see if it actually has the image on disk? | 11:49 |
EugenMayer | sean-k-mooney so the nova compute could have logs, or the glance api if the image is there. the question is why the status is queued and how to find out why it is that way | 11:50 |
EugenMayer | there are no error logs in nova-compute.log for 30 days (nothing has been logged at all, the last entry is from 23rd March) | 11:52 |
sean-k-mooney | so i think this is using the glance interoperable import pipeline, where the image is uploaded and then queued to become active after the import pipeline has finished processing the image | 11:53 |
sean-k-mooney | EugenMayer: that kind of sounds like the agent is hung | 11:53 |
sean-k-mooney | you should at least see the periodics | 11:53 |
EugenMayer | So my image id (that is queued) is 57850bd9-dfdc-45bd-bd9f-cec297f3fdae - checking the storage folder i see images, but this one is not present | 11:53 |
sean-k-mooney | ack | 11:53 |
sean-k-mooney | dansmith: do you know where the image would be in the queued state? | 11:54 |
sean-k-mooney | it only enters that state after the upload has happened, right? | 11:54 |
sean-k-mooney | or am i misremembering that | 11:54 |
EugenMayer | using 'glance image-tasks <id>' does not show any tasks, neither 'glance task-list' | 11:54 |
EugenMayer | interesting, all those backup tasks on compute3 are broken. Means other computes finished, just compute3 backups did not (all of them). This kind of tells that the compute is somehow flaky - but why and what | 11:56 |
EugenMayer | if you have any idea how to trace, happy to look at it. Currently not sure where to look at all | 12:20 |
sean-k-mooney | the only thing that comes to mind is that the agent is exhausting the thread pool or has made a blocking call on the main thread that was not monkey patched by eventlet | 12:23 |
sean-k-mooney | generating a guru meditation report might shed some light on that | 12:23 |
sean-k-mooney | but that will basically crashdump the process so you will have to restart the agent after you do the sig_hup | 12:25 |
sean-k-mooney | actually not sig_hup | 12:25 |
sean-k-mooney | sig_usr2 | 12:25 |
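
To illustrate the general idea of a signal-triggered stack dump, here is a minimal sketch using only the Python standard library. It is not the Guru Meditation Report implementation that nova uses; the function names `format_thread_stacks` and `install_dump_handler` are hypothetical, and the handler is POSIX-only.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text report with the current stack of every Python thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append(f"Thread {names.get(ident, 'unknown')} ({ident})")
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
    return "\n".join(lines)


def install_dump_handler(signum):
    """Print the report to stderr whenever the process receives signum."""
    import signal  # imported here so the module still loads where signals are limited

    def handler(_signum, _frame):
        print(format_thread_stacks(), file=sys.stderr)

    signal.signal(signum, handler)


report = format_thread_stacks()
```

On a POSIX host, calling `install_dump_handler(signal.SIGUSR2)` and then running `kill -USR2 <pid>` would print the stacks, which can show where a thread is blocked.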
EugenMayer | who holds the state in general right now? | 12:27 |
sean-k-mooney | when nova calls glance, eventlet yields from the greenthread and the state is stored in memory in the greenthread's local variables | 12:31 |
sean-k-mooney | same when we do any io like copying the image for snapshot | 12:32 |
sean-k-mooney | we yield | 12:32 |
sean-k-mooney | and when the io op completes, eventlet resumes the greenthread | 12:32 |
EugenMayer | not sure what a greenthread is. If the state is a memory state, restarting the service (what-ever that is) would reset the state for nova, right? | 12:48 |
sean-k-mooney | the threading model in nova is to use implicit coroutines by using userspace threads | 12:51 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/doc/source/reference/threading.rst | 12:51 |
sean-k-mooney | so every time we do io, eventlet yields execution of the current function and starts running the next greenthread | 12:52 |
EugenMayer | i see, this is the nova-compute process then, right? | 12:52 |
sean-k-mooney | then when the io completes, the previous greenthread is added to the queue to be resumed | 12:52 |
sean-k-mooney | yes | 12:52 |
sean-k-mooney | nova-compute but also conductor and scheduler | 12:52 |
sean-k-mooney | technically nova-api is monkeypatched, but the way it's run with apache means it only processes one request per worker process | 12:53 |
sean-k-mooney | because apache queues the request before it gets to the api application | 12:53 |
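
The yield-and-resume behaviour described above can be modelled with plain Python generators. This is an analogy only, not eventlet itself: each `yield` stands in for an I/O call where eventlet would switch greenthreads, and the names `worker` and `run` are hypothetical.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task whose state lives in its own local variables."""
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield  # stands in for an I/O call where eventlet would switch


def run(tasks):
    """Round-robin scheduler: resume each task until it yields or finishes."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, do not requeue it
        queue.append(task)  # task yielded, queue it to be resumed later


log = []
run([worker("snapshot", 2, log), worker("periodic", 2, log)])
```

Because the loop counter `i` is a local variable of each generator, restarting the process would lose it, which matches the point that the in-flight state is held only in memory. A task that never yields would also starve every other task, which is what a blocking, non-monkey-patched call does to the agent.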
EugenMayer | oh holy moly. | 12:57 |
EugenMayer | I mean, my day job is being a software engineer. Yes with bigger EE software, yes with microservices, distributed and all that. But this really is very weird to me - or it is simply too complex for me to play around with in my mind, since i do not know any of the components properly and have no safe haven to return to / start thinking from | 12:58 |
EugenMayer | thank you for elaborating on that | 12:59 |
EugenMayer | I am left with 2 things: glance has a task status 'queued' for a task that does not exist, and it is unclear where this state comes from. Second is why my compute3 (out of 4) fails to create any backups when all others can. | 13:00 |
EugenMayer | Ah now i understand - not a task is 'queued' .. the image is queued - without any task. So it is the image state | 13:00 |
sean-k-mooney | a very long time ago, around the cactus release, openstack moved from twisted to eventlet to remove the need for people to explicitly think about multithreading and concurrency most of the time. however there are still cases where you have to use locks etc. to ensure no data races. so for the most part eventlet simplifies the common code path when you are io bound, which tends to be | 13:01 |
sean-k-mooney | the case for nova | 13:01 |
sean-k-mooney | yes the image is queued | 13:02 |
sean-k-mooney | not a task | 13:02 |
sean-k-mooney | https://docs.openstack.org/glance/latest/user/statuses.html | 13:05 |
sean-k-mooney | queued | 13:05 |
sean-k-mooney | The image identifier has been reserved for an image in the Glance registry. No image data has been uploaded to Glance and the image size was not explicitly set to zero on creation. | 13:05 |
sean-k-mooney | ok so queued means we have created the image but not uploaded it | 13:05 |
sean-k-mooney | which i guess makes sense since nova is not currently in the image_uploading task_state | 13:06 |
sean-k-mooney | so that likely means that nova is failing to create the snapshot via libvirt/qemu | 13:06 |
EugenMayer | i see, thank you so much sean! | 13:07 |
EugenMayer | i removed the broken (queued) images now | 13:08 |
EugenMayer | reset the instances and restarted them | 13:08 |
sean-k-mooney | ack | 13:08 |
EugenMayer | i will check the logs of that compute once again and then restart it. Usually those restarts fix 9/10 issues i have with openstack | 13:08 |
sean-k-mooney | what you likely should do is try and find the request-id for the backup call and see if you can find the last log for that operation | 13:08 |
EugenMayer | Which obviously is not a good sign, sure | 13:08 |
sean-k-mooney | to see where it got stuck | 13:08 |
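
The request-id suggestion amounts to filtering the compute log for one request and reading the last matching line. A minimal sketch, assuming the request ID appears verbatim in each log line; the sample lines and the helper name `last_line_for_request` are hypothetical, and real nova-compute output will look different.

```python
# Hypothetical log lines for illustration only.
SAMPLE_LOG = """\
INFO nova.compute.manager [req-aaaa1111] Starting backup
INFO nova.compute.manager [req-bbbb2222] Unrelated request
INFO nova.virt.libvirt.driver [req-aaaa1111] Beginning snapshot
"""


def last_line_for_request(log_text, request_id):
    """Return the last log line that mentions request_id, or None."""
    matches = [line for line in log_text.splitlines() if request_id in line]
    return matches[-1] if matches else None


last = last_line_for_request(SAMPLE_LOG, "req-aaaa1111")
```

The last line found for the request is the most recent step the operation logged, so it narrows down where the operation stopped making progress.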
EugenMayer | this is SO hard to trace for me considering the amount of subsystems, proxies and systems involved | 13:09 |
EugenMayer | it usually is my bread and butter debugging those kind of things in our software stacks. But well, as i must learn, the reason i can do it there is - i know the software a lot better | 13:10 |
sean-k-mooney | ya there is a lot of context to grok | 13:22 |
*** dasm|gone is now known as dasm | 13:26 | |
viks__ | hi, with `soft-anti-affinity`, whenever i create 2 instances together via horizon, it goes in to 2 different hosts. But when i create one after the other with `soft-anti-affinity`, it goes in to the same host? is it expected? | 14:42 |
sean-k-mooney | viks__: the behavior of the second case will depend on if you wait for the first one to go active before you create the second | 15:01 |
sean-k-mooney | but yes it is expected. in the second case you have a race to update the instance.host before the second vm is scheduled | 15:02 |
sean-k-mooney | we provide no affinity guarantees in this case unless you enable the affinity upcall | 15:02 |
viks__ | sean-k-mooney: what all things i need to set for affinity upcall in the nova.conf? | 15:05 |
sean-k-mooney | the workaround config option in the compute nova.conf and api database in conductor nova.conf | 15:24 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/xena: Test aborting queued live migration https://review.opendev.org/c/openstack/nova/+/835853 | 15:39 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/xena: Add functional tests to reproduce bug #1960412 https://review.opendev.org/c/openstack/nova/+/835854 | 15:40 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/xena: Clean up when queued live migration aborted https://review.opendev.org/c/openstack/nova/+/835855 | 15:41 |
viks__ | sean-k-mooney: ok.. but even with `[workaround]/disable_group_policy_check_upcall = false`, i get the same behaviour | 15:53 |
sean-k-mooney | the second instance will still race and get sent to the same host | 15:53 |
sean-k-mooney | but it should then be rejected | 15:53 |
sean-k-mooney | and rescheduled to a different host | 15:53 |
sean-k-mooney | from the alternate host list | 15:54 |
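
The race described above can be shown with a toy model. This is not the nova scheduler: `pick_host` is a hypothetical stand-in that prefers the host with the fewest group members it already knows about, and the host names are made up.

```python
def pick_host(hosts, recorded_group_hosts):
    """Prefer the host with the fewest known group members (ties: first host)."""
    return min(hosts, key=lambda h: recorded_group_hosts.count(h))


hosts = ["compute1", "compute2"]

# Case 1: the first instance's host is recorded before the second is scheduled.
recorded = []
first = pick_host(hosts, recorded)
recorded.append(first)
second_after_wait = pick_host(hosts, recorded)

# Case 2: the second instance is scheduled before the first host is recorded.
recorded = []
first_racy = pick_host(hosts, recorded)
second_racy = pick_host(hosts, recorded)
recorded.extend([first_racy, second_racy])
```

In case 2 both picks see an empty record and land on the same host. In this toy model nothing rejects the second placement; the late check and reschedule mentioned in the discussion are what would catch it.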
sean-k-mooney | noonedeadpunk: by the way https://specs.openstack.org/openstack/nova-specs/specs/ussuri/implemented/flavor-extra-spec-validators.html should have caught your typo if you use the correct microversion | 17:09 |
opendevreview | melanie witt proposed openstack/nova-specs master: Repropose spec for ephemeral storage encryption https://review.opendev.org/c/openstack/nova-specs/+/835877 | 17:14 |
opendevreview | melanie witt proposed openstack/nova-specs master: Repropose spec for ephemeral storage encryption https://review.opendev.org/c/openstack/nova-specs/+/835877 | 17:36 |
*** dasm is now known as dasm|off | 21:43 | |
*** ianw_pto is now known as ianw | 22:24 |