openstackgerrit | Brin Zhang proposed openstack/nova-specs master: Remove the todo in the migrations spec https://review.opendev.org/689056 | 00:02 |
---|---|---|
*** markvoelker has joined #openstack-nova | 00:15 | |
*** markvoelker has quit IRC | 00:20 | |
*** markvoelker has joined #openstack-nova | 00:22 | |
*** mriedem_afk has quit IRC | 00:30 | |
openstackgerrit | Merged openstack/nova-specs master: Remove the todo in the migrations spec https://review.opendev.org/689056 | 00:43 |
*** bnemec has quit IRC | 00:45 | |
*** rcernin has joined #openstack-nova | 00:51 | |
*** brinzhang has joined #openstack-nova | 00:53 | |
*** bnemec has joined #openstack-nova | 00:56 | |
*** bnemec has quit IRC | 01:07 | |
*** Liang__ has joined #openstack-nova | 01:10 | |
*** brinzhang_ has joined #openstack-nova | 01:11 | |
*** brinzhang has quit IRC | 01:14 | |
*** mtreinish has quit IRC | 01:27 | |
*** nanzha has joined #openstack-nova | 01:30 | |
*** brinzhang has joined #openstack-nova | 01:34 | |
*** brinzhang_ has quit IRC | 01:36 | |
*** brinzhang_ has joined #openstack-nova | 02:05 | |
*** brinzhang_ has quit IRC | 02:05 | |
*** yaawang_ has quit IRC | 02:06 | |
*** yaawang_ has joined #openstack-nova | 02:06 | |
*** brinzhang has quit IRC | 02:08 | |
*** SonPham has joined #openstack-nova | 02:12 | |
SonPham | hi. How do I install the python-novaclient package from GitHub? I installed it from git but Horizon gives an error | 02:14 |
*** awalende has joined #openstack-nova | 02:44 | |
*** HagunKim has joined #openstack-nova | 02:46 | |
*** victor286 has joined #openstack-nova | 02:47 | |
*** mdbooth has quit IRC | 02:47 | |
*** awalende has quit IRC | 02:48 | |
*** mdbooth has joined #openstack-nova | 02:49 | |
*** ricolin has joined #openstack-nova | 02:55 | |
*** dave-mccowan has quit IRC | 03:00 | |
openstackgerrit | Liang Fang proposed openstack/nova-specs master: Support volume local cache https://review.opendev.org/689070 | 03:07 |
*** yaawang_ has quit IRC | 03:08 | |
*** yaawang_ has joined #openstack-nova | 03:08 | |
*** mkrai__ has joined #openstack-nova | 03:09 | |
*** Kevin_Zheng has joined #openstack-nova | 03:21 | |
*** nweinber has joined #openstack-nova | 03:32 | |
*** mkrai__ has quit IRC | 03:33 | |
*** gbarros has quit IRC | 03:33 | |
*** SonPham has quit IRC | 03:37 | |
*** takashin has left #openstack-nova | 03:39 | |
*** psachin has joined #openstack-nova | 03:41 | |
*** nweinber has quit IRC | 03:51 | |
*** mkrai__ has joined #openstack-nova | 03:54 | |
*** larainema has joined #openstack-nova | 03:55 | |
*** Liang__ is now known as LiangFang | 04:05 | |
LiangFang | hi cores, I want to propose a spec about volume local cache. Can anybody be my Feature liaison? Thank you so much. spec: Support volume local cache https://review.opendev.org/689070 | 04:14 |
LiangFang | I'm new to Nova | 04:14 |
*** igordc has quit IRC | 04:15 | |
*** pcaruana has joined #openstack-nova | 04:18 | |
*** mkrai__ has quit IRC | 04:22 | |
*** mkrai__ has joined #openstack-nova | 04:23 | |
*** jangutter has joined #openstack-nova | 04:38 | |
*** jangutter has quit IRC | 04:42 | |
*** markvoelker has quit IRC | 04:45 | |
*** lbragstad has quit IRC | 04:53 | |
*** lbragstad has joined #openstack-nova | 04:53 | |
*** dansmith has quit IRC | 04:54 | |
*** ianw has quit IRC | 04:54 | |
*** ianw_ has joined #openstack-nova | 04:55 | |
*** dansmith has joined #openstack-nova | 04:55 | |
*** ianw_ is now known as ianw | 04:56 | |
*** mkrai__ has quit IRC | 05:01 | |
*** mkrai_ has joined #openstack-nova | 05:01 | |
*** Luzi has joined #openstack-nova | 05:04 | |
*** ratailor has joined #openstack-nova | 05:07 | |
*** ociuhandu has joined #openstack-nova | 05:26 | |
*** ociuhandu has quit IRC | 05:31 | |
*** sridharg has joined #openstack-nova | 05:33 | |
*** brinzhang has joined #openstack-nova | 05:34 | |
*** brinzhang has quit IRC | 05:39 | |
*** takamatsu has quit IRC | 05:47 | |
*** jawad_axd has joined #openstack-nova | 05:59 | |
*** jawad_ax_ has joined #openstack-nova | 06:03 | |
*** jawad_axd has quit IRC | 06:03 | |
*** lpetrut has joined #openstack-nova | 06:04 | |
*** lpetrut has quit IRC | 06:05 | |
*** lpetrut has joined #openstack-nova | 06:05 | |
*** jawad_ax_ has quit IRC | 06:07 | |
*** jawad_axd has joined #openstack-nova | 06:14 | |
*** janki has joined #openstack-nova | 06:18 | |
*** rcernin has quit IRC | 06:18 | |
*** udesale has joined #openstack-nova | 06:22 | |
*** nanzha has quit IRC | 06:26 | |
*** nanzha has joined #openstack-nova | 06:28 | |
*** LiangFang has quit IRC | 06:29 | |
*** Liang__ has joined #openstack-nova | 06:29 | |
*** dpawlik has joined #openstack-nova | 06:42 | |
*** markvoelker has joined #openstack-nova | 06:46 | |
*** markvoelker has quit IRC | 06:51 | |
*** FlorianFa has quit IRC | 06:53 | |
*** trident has quit IRC | 06:55 | |
*** slaweq has joined #openstack-nova | 06:55 | |
*** damien_r has joined #openstack-nova | 06:56 | |
eandersson | VMs stuck in BUILD is such a pain :'( | 06:58 |
*** trident has joined #openstack-nova | 06:58 | |
gibi_off | eandersson: can I do something to help with that? | 07:02 |
*** gibi_off is now known as gibi | 07:02 | |
eandersson | gibi, been fighting these for a while in Rocky. We got a few issues fixed upstream. | 07:03 |
eandersson | We first thought these were due to restarted computes | 07:03 |
eandersson | https://review.opendev.org/#/c/687535/ | 07:03 |
eandersson | but I think the issues are due to race conditions in this case | 07:04 |
eandersson | I see two instances scheduled at the same second on the same compute | 07:04 |
eandersson | I see an allocation, but both VMs are stuck in building / scheduling | 07:04 |
eandersson | with no compute visible in the instance info | 07:04 |
eandersson | (had to track the compute down using the db) | 07:04 |
gibi | eandersson: interesting. So you see both servers having allocations in placement, or does only one of the servers have an allocation? | 07:05 |
eandersson | both have allocation in placement | 07:05 |
eandersson | both are stuck in BUILD | 07:06 |
*** FlorianFa has joined #openstack-nova | 07:06 | |
gibi | and none of the servers has instance.host updated to point to the compute? | 07:06 |
eandersson | correct | 07:07 |
eandersson | I see on the compute 2x Final resource view, image xxx at /var/lib | 07:08 |
eandersson | and then nothing else | 07:08 |
*** tesseract has joined #openstack-nova | 07:09 | |
eandersson | hmm maybe the compute is bad in this case | 07:17 |
eandersson | I just got 3 more stuck on that host | 07:17 |
eandersson | but don't understand why they never fail | 07:18 |
gibi | eandersson: I guess instance.task_state is None. Unfortunately there are no logs coming out of the compute between the build request reaching the compute (and setting instance.vm_state to BUILDING) and the instance_claim that will set instance.host | 07:18 |
*** mjozefcz|afk has joined #openstack-nova | 07:18 | |
*** takamatsu has joined #openstack-nova | 07:18 | |
*** ttsiouts has joined #openstack-nova | 07:18 | |
eandersson | btw we are running the very latest rocky | 07:20 |
*** awalende has joined #openstack-nova | 07:20 | |
gibi | eandersson: did you happen to see instance.create.start versioned notification or compute.instance.create.start legacy notification for these servers? | 07:21 |
eandersson | I can check designate | 07:21 |
eandersson | or nvm they only catch end | 07:21 |
gibi | eandersson: are PCI devices or SR-IOV ports requested for these servers? | 07:23 |
eandersson | nothing special | 07:24 |
gibi | eandersson: do the two "Final resource view" logs account for the allocations of the servers? | 07:25 |
eandersson | Sorry, don't fully understand that | 07:26 |
eandersson | last part | 07:27 |
gibi | you mentioned that you saw two "Final resource view" logs from the compute | 07:27 |
gibi | those contain how much vcpu and ram is used on the compute | 07:27 |
gibi | does that usage contain the usage of your two servers stuck in build? | 07:27 |
eandersson | actually does not look like it adds up | 07:29 |
eandersson | in fact when they are stuck I don't see disk in allocations | 07:29 |
eandersson | It honestly looks like allocations is just wrong | 07:31 |
eandersson | (in the db) | 07:31 |
eandersson | I see 3 items in the db, but only two vms on the box (and nothing stuck in building atm) | 07:32 |
gibi | eandersson: do you see logs like 'Lock "compute_resources" acquired by "nova.compute.resource_tracker.instance_claim"'? | 07:35 |
gibi | or in general any 'Lock "compute_resources"' | 07:36 |
eandersson | nothing ;'( | 07:36 |
*** jangutter has joined #openstack-nova | 07:38 | |
*** ttsiouts has quit IRC | 07:38 | |
*** ttsiouts has joined #openstack-nova | 07:39 | |
eandersson | It's possible that the allocations are a legacy of some failed cold migrations tbh | 07:41 |
eandersson | but not sure why they would cause a deadlock | 07:41 |
*** ttsiouts has quit IRC | 07:43 | |
*** Liang__ has quit IRC | 07:44 | |
eandersson | nvm the unaccounted for allocation was another vm stuck in bad state | 07:45 |
eandersson | just forgot --all-projects | 07:45 |
eandersson | stuck in building for over 24 hours :p | 07:46 |
eandersson | also nvm the disk allocation issue... my query had LIMIT 10 on it :D | 07:47 |
eandersson | Let me try to schedule a VM manually to that host and see if it works | 07:48 |
*** rpittau|afk is now known as rpittau | 07:52 | |
*** xek has joined #openstack-nova | 07:52 | |
*** ttsiouts has joined #openstack-nova | 07:53 | |
gibi | eandersson: I suggest adding oslo_concurrency=DEBUG to the [DEFAULT]/default_log_levels config of the nova-compute service, because if you see Final resource view logs then you should see logs about the 'Lock "compute_resources"' as well | 07:54 |
eandersson | Sure I can do that now | 07:54 |
*** xek_ has joined #openstack-nova | 07:54 | |
eandersson | Can reproduce this issue 100% on this host | 07:54 |
gibi | so the periodic jobs can run and update the resource view but no new instances get the chance to claim resources | 07:55 |
gibi | both use the same compute_resources lock so I don't see how one of them can progress and not the other | 07:57 |
eandersson | > Lock "compute_resources" released by "nova.compute.resource_tracker._update_available_resource" : | 07:57 |
*** xek has quit IRC | 07:57 | |
*** ralonsoh has joined #openstack-nova | 07:57 | |
eandersson | > Running periodic task ComputeManager._poll_unconfirmed_resizes run_periodic_tasks | 07:57 |
*** dtantsur|afk is now known as dtantsur | 07:59 | |
*** tssurya has joined #openstack-nova | 07:59 | |
eandersson | > Compute_service record updated for computexxx:computexxx _update_available_resource | 07:59 |
eandersson | > Lock "compute_resources" released by "nova.compute.resource_tracker._update_available_resource" | 07:59 |
eandersson | These are the last two lines | 07:59 |
gibi | eandersson: now when you boot a VM do you see Lock "compute_resources" acquired by "nova.compute.resource_tracker.instance_claim" ? | 08:00 |
*** sapd1 has joined #openstack-nova | 08:00 | |
eandersson | I don't see instance_claim | 08:01 |
gibi | then somehow the build request cannot grab the compute_resources lock | 08:02 |
eandersson | > > Instance x has been scheduled to this compute host, the scheduler has made an allocation against this compute node but the instance has yet to start. Skipping heal of allocation: | 08:02 |
eandersson | > _remove_deleted_instances_allocations | 08:03 |
gibi | eandersson: that is logged because instance.host is not set | 08:04 |
*** tkajinam has quit IRC | 08:05 | |
gibi | eandersson: do you see logs like Lock "805e10fe-2601-4849-9593-3a83f2875bfb" acquired by "nova.compute.manager._locked_do_build_and_run_instance" where the lock name is the uuid of the server being built? | 08:06 |
*** ociuhandu has joined #openstack-nova | 08:06 | |
eandersson | nothing with locked_do_build | 08:07 |
gibi | eandersson: do you have max_concurrent_builds configured ? | 08:09 |
eandersson | nope | 08:09 |
*** xek__ has joined #openstack-nova | 08:11 | |
eandersson | I don't see any lock not getting released | 08:12 |
eandersson | unless it was held before the vm was created | 08:13 |
gibi | this is strange. The instance.uuid lock is grabbed here https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/manager.py#L2039 and inside that lock we set the instance.vm_state from SCHEDULING to BUILDING here https://github.com/openstack/nova/blob/1a226aaa9e8c969ddfdfe198c36f7966b1f692f3/nova/compute/manager.py#L2132 | 08:14 |
*** xek_ has quit IRC | 08:15 | |
gibi | just to be sure, is the instance.vm_state BUILDING ? | 08:15 |
gibi | or it is still in SCHEDULING ? | 08:15 |
gibi | eandersson: nvm | 08:15 |
gibi | the SCHEDULING was the task_state | 08:15 |
eandersson | > OS-EXT-STS:task_state | scheduling | 08:15 |
*** ociuhandu has quit IRC | 08:15 | |
eandersson | > OS-EXT-STS:vm_state | building | 08:16 |
*** cdent has joined #openstack-nova | 08:16 | |
eandersson | btw restarting the nova-compute does nothing | 08:17 |
eandersson | but deleting the vm works fine | 08:17 |
*** cdent has left #openstack-nova | 08:17 | |
gibi | eandersson: if restarting the compute does not fix the issue then it cannot be a lock, as that would be cleaned up by the restart. also it cannot be that your compute ran out of RPC workers to process incoming messages either | 08:18 |
*** Shatadru has joined #openstack-nova | 08:18 | |
gibi | but it feels like the build request does not reach the compute | 08:20 |
eandersson | I can see the build request in the db at least | 08:20 |
gibi | eandersson: do other computes using the same message bus (rabbit) work properly? | 08:21 |
eandersson | yep I see messaging flowing between nova and nova-compute | 08:22 |
eandersson | queues are all in rabbit | 08:22 |
eandersson | I could capture the rmq messages to the compute | 08:24 |
*** ratailor_ has joined #openstack-nova | 08:26 | |
bauzas | good morning Nova | 08:26 |
bauzas | eandersson: gibi: any logs from the conductor vs. compute showing this ? | 08:27 |
eandersson | conductor is not in debug so got very little logs there :'( | 08:28 |
* bauzas is just trying to find good SIM card opportunities for data usage while in Shanghai :) | 08:28 | |
*** sapd1 has quit IRC | 08:28 | |
*** ratailor has quit IRC | 08:28 | |
bauzas | eandersson: are you able to follow the req-id down to compute ? | 08:28 |
bauzas | or as gibi said, nothing there ? | 08:28 |
jkulik | is it possible to reconfigure the log-level via the eventlet_backdoor? | 08:28 |
jkulik | just in case it's activated ... | 08:29 |
* gibi have to step away from the machine for a while | 08:29 | |
jkulik | probably doesn't make sense if it's an old request and can't be reproduced. nevermind. | 08:30 |
eandersson | I can reproduce this 100% on this compute | 08:30 |
eandersson | but it's not a dev environment, so can't mess too much | 08:31 |
*** dpawlik has quit IRC | 08:31 | |
eandersson | I have about 10 VMs stuck in this state, so it isn't just one host. | 08:31 |
*** takamatsu has quit IRC | 08:32 | |
eandersson | but this host I am looking into fails 100%, even when specifying the compute using --availability-zone | 08:33 |
sean-k-mooney | eandersson: are you seeing the compute agent pause for a long period when running the update resources periodic task? | 08:37 |
eandersson | It does not look like it | 08:38 |
sean-k-mooney | ok i was wondering if it was related to the libvirt thing we recently fixed | 08:39 |
eandersson | That is what I was thinking as well. | 08:39 |
eandersson | When we started seeing this. | 08:39 |
eandersson | I have that build in my lab. | 08:39 |
eandersson | I assume you are referring to https://review.opendev.org/#/c/687535/ | 08:40 |
sean-k-mooney | the fix for the eventlet issue with libvirt | 08:40 |
*** derekh has joined #openstack-nova | 08:40 | |
eandersson | ah not sure about that one | 08:40 |
eandersson | can you link it? | 08:40 |
sean-k-mooney | no i was thinking of something else, ya let me find it | 08:40 |
* gibi is back | 08:40 | |
*** ociuhandu has joined #openstack-nova | 08:40 | |
eandersson | (btw this is with rocky) | 08:40 |
*** mkrai_ has quit IRC | 08:41 | |
*** ociuhandu has quit IRC | 08:42 | |
bauzas | eandersson: sean-k-mooney: if that's a lock issue, the logs will tell it | 08:42 |
sean-k-mooney | yes it's likely unrelated and not backported https://review.opendev.org/#/c/677736/ | 08:43 |
sean-k-mooney | its not a lock issue | 08:43 |
bauzas | again, tracking the request-id is super important | 08:43 |
sean-k-mooney | but it was causing rpc issues | 08:43 |
bauzas | sean-k-mooney: we were suspecting some lock holding the RPC calls | 08:44 |
bauzas | but, anyway, logs, logs, logs | 08:44 |
eandersson | If it was an RPC issue it would timeout at some point at some end | 08:44 |
eandersson | right? | 08:44 |
bauzas | correct | 08:44 |
sean-k-mooney | eventually you would get a messaging timeout yes | 08:44 |
eandersson | One of these VMs has been stuck for 24 hours with no error logs | 08:44 |
bauzas | eandersson: again, are you able to track the request down on compute ? | 08:45 |
bauzas | what's the last step the logs are telling you for a specific instance ? | 08:45 |
gibi | also the bug behind https://review.opendev.org/#/c/677736/ says that this bug makes the compute get marked down, which is not the case for eandersson | 08:45 |
openstackgerrit | Tushar Patil proposed openstack/nova-specs master: Allow compute nodes to use DISK_GB from shared storage RP https://review.opendev.org/650188 | 08:45 |
bauzas | oh wait, the last task state is "scheduling" ? | 08:46 |
bauzas | that's waaaaay different from a compute issue then :) | 08:46 |
sean-k-mooney | gibi: right it causes the compute agent to block on the call to libvirt and nothing else gets processed until libvirt returns | 08:47 |
eandersson | Yea - not sure it is nova-compute specific, since restarting nova-compute has no effect | 08:47 |
eandersson | but that specific compute has this issue, but the next one works fine | 08:48 |
eandersson | keep in mind I am just bypassing the scheduler using --availability-zone nova:<compute> | 08:48 |
sean-k-mooney | there was a case about a month ago where a patch was submitted for a codepath where we did not catch an exception, which left the vm in building forever. but i can't recall which one it was | 08:48 |
bauzas | eandersson: IIRC, 'scheduling' task state is different from 'spawning' | 08:48 |
eandersson | Yea | 08:49 |
sean-k-mooney | you would see a trace in the compute log if that was the case, which also seems not to be the case | 08:49 |
bauzas | eandersson: so, I suspect nothing goes back from scheduler | 08:49 |
bauzas | so the conductor can't trigger the RPC call to compute | 08:49 |
bauzas | which will change the task to spawn | 08:49 |
bauzas | actually, https://docs.openstack.org/nova/latest/reference/vm-states.html#create-instance-states | 08:52 |
bauzas | we unset the task state before plugging the network and the devices | 08:52 |
bauzas | but since you're still on 'scheduling', I bet nothing goes down to compute and stays on conductor | 08:52 |
bauzas | again, logs... | 08:52 |
*** mkrai_ has joined #openstack-nova | 08:54 | |
eandersson | If I search for the req from VM create I only find the api call | 08:55 |
eandersson | I only find the compute because it's in the database. | 08:55 |
eandersson | If I could reproduce it in the lab I could give you all the logs you can dream of :p | 08:56 |
*** dpawlik has joined #openstack-nova | 08:57 | |
bauzas | eandersson: even on production, can't you just query through some specific request-id ? | 08:59 |
eandersson | Yes | 08:59 |
eandersson | I can give you anything that is INFO or higher =] | 08:59 |
bauzas | that's enough | 09:00 |
bauzas | also, you could just query the os-instance-action API to get you a bit of what happened https://docs.openstack.org/api-ref/compute/?expanded=#servers-actions-servers-os-instance-actions | 09:01 |
bauzas | eandersson: accordingly, for a specific request-id corresponding to a server create, are you able to track the last service involved ? | 09:01 |
eandersson | nova.api is the last service with that request-id | 09:02 |
bauzas | crazy | 09:02 |
eandersson | > [<req> <x> <y>]<IP> "POST /v2.1/<tenant>/servers" status: 202 len: 472 microversion: 2.1 time: 1.453706 | 09:02 |
eandersson | I can see the scheduling request in the database under request_specs | 09:03 |
bauzas | eandersson: I'd suggest you to use the os-instance-action API then | 09:03 |
bauzas | and look at the events, if you enabled them | 09:03 |
bauzas | eandersson: the request_spec record is created by the nova-api service so that doesn't prove it reached the scheduler service | 09:04 |
eandersson | Sure | 09:04 |
eandersson | Wouldn't specifying the host actually by-pass the scheduler? | 09:04 |
bauzas | it depends, which release ? | 09:04 |
eandersson | Rocky | 09:05 |
sean-k-mooney | if you use the availability zone way, partly | 09:05 |
bauzas | nope, it won't then | 09:05 |
sean-k-mooney | it will check that the az exists | 09:05 |
bauzas | sean-k-mooney: even with this, it will call out the scheduler | 09:05 |
sean-k-mooney | and then skip the rest | 09:05 |
sean-k-mooney | sure | 09:05 |
sean-k-mooney | but it does not run all the filters | 09:05 |
bauzas | oh yeah, sure | 09:05 |
bauzas | but eandersson tells us that nothing but nova-api shows evidence of the request ID | 09:05 |
bauzas | even not the conductor | 09:06 |
sean-k-mooney | it will still call the scheduler however so you should see something in the log for it | 09:06 |
eandersson | Well conductor logs nothing | 09:06 |
eandersson | We have many thousands of vms per day and the number of logs in the conductor is zero | 09:06 |
bauzas | you should call Alice to follow the rabbit... | 09:06 |
eandersson | ... | 09:06 |
*** zbr has quit IRC | 09:07 | |
sean-k-mooney | eandersson: is the only indication of this request the api log and the db entry? | 09:07 |
sean-k-mooney | e.g. are you not seeing it in any other service at all? | 09:07 |
eandersson | Let me search for the instance uuid | 09:08 |
*** victor286 has quit IRC | 09:08 | |
*** janki has quit IRC | 09:09 | |
eandersson | nova, placement and neutron-server are showing up | 09:10 |
bauzas | eandersson: we do non-blocking RPC calls for the nova-api service | 09:10 |
*** zbr has joined #openstack-nova | 09:11 | |
bauzas | eandersson: so I wouldn't be surprised if you wouldn't capture RPC timeouts | 09:11 |
bauzas | hence the rabbit queue checks | 09:11 |
eandersson | The rabbitmq queues look fine. Nothing queued, nothing stuck in unack'd. I could capture messages going to the queues. | 09:12 |
sean-k-mooney | in rocky we are not patching the api with eventlet so we don't have the heartbeat issue but i was just wondering how far it got in the boot process before going silent | 09:13 |
sean-k-mooney | if we only see reference to the instance boot request in the api but not in the conductor or scheduler it implies it's failing very early | 09:14 |
eandersson | I only see the compute in placement | 09:14 |
sean-k-mooney | ok so it's getting to the scheduler then | 09:15 |
sean-k-mooney | in fact if it's created the allocation it's passed the filters and selected the host | 09:15 |
*** HagunKim has quit IRC | 09:16 | |
sean-k-mooney | so it's failing sometime between the scheduler returning to the conductor and the conductor calling the compute node | 09:16 |
bauzas | sean-k-mooney: that's not my understanding from what eandersson said by "I only see the compute in placement" | 09:17 |
sean-k-mooney | oh i may have misread that | 09:17 |
bauzas | eandersson: no GET logs on https://docs.openstack.org/api-ref/placement/?expanded=list-allocation-candidates-detail#list-allocation-candidates ? | 09:17 |
eandersson | I see build_and_run_instance hit the compute | 09:18 |
bauzas | WTF | 09:18 |
eandersson | when capturing a rabbitmq message | 09:18 |
eandersson | received by the compute | 09:18 |
bauzas | but no logs ? | 09:19 |
bauzas | I suspect your log factory not working then :D | 09:19 |
bauzas | that's... crazy | 09:19 |
*** ociuhandu has joined #openstack-nova | 09:21 | |
eandersson | bauzas, talking about compute or conductor? | 09:21 |
bauzas | I frankly don't know what to say, I'm just lost | 09:22 |
eandersson | like look at the conductor https://zuul.opendev.org/t/openstack/build/9a820944e63e409cb8dbf5b83931263e/log/logs/screen-n-cond.txt.gz | 09:22 |
eandersson | and search for INFO | 09:22 |
bauzas | you're seeing only logs on api service, but you just said you're able to see a build call to compute | 09:22 |
eandersson | You'll find like 5 logs | 09:22 |
eandersson | https://zuul.opendev.org/t/openstack/build/d34ddb7bd62148968c33f4fe8f348e8b/log/controller/logs/screen-n-super-cond.txt.gz | 09:22 |
eandersson | I am not sure what you are talking about to be honest. | 09:23 |
eandersson | https://zuul.opendev.org/t/openstack/build/d34ddb7bd62148968c33f4fe8f348e8b/log/controller/logs/screen-n-sch.txt.gz | 09:23 |
eandersson | This is the scheduler, again search for INFO | 09:24 |
eandersson | you'll find like zero entries | 09:24 |
eandersson | We have 1k compute nodes, and enabling debug would be very... spammy | 09:24 |
eandersson | but unfortunately INFO does not provide a ton of logs outside of API | 09:25 |
*** ociuhandu has quit IRC | 09:25 | |
bauzas | eandersson: as I suggested, can you please do some 'nova instance-action-list' stuff to get more knowledge ? | 09:27 |
bauzas | but I understand your point, INFO logs aren't talkative | 09:27 |
bauzas | I thought we fixed that in Newton (or sometimes around it) | 09:28 |
eandersson | Is that the same as openstack server even list? | 09:28 |
eandersson | *event | 09:28 |
eandersson | because os-instance-actions just shows me the request id, server id, action and start time | 09:29 |
eandersson | let me check the body | 09:29 |
*** ociuhandu has joined #openstack-nova | 09:31 | |
*** ociuhandu has quit IRC | 09:32 | |
*** mkrai_ has quit IRC | 09:32 | |
*** ociuhandu has joined #openstack-nova | 09:33 | |
sean-k-mooney | bauzas: ya looking at the conductor logs for the gate jobs the non debug version is not very useful | 09:33 |
sean-k-mooney | we likely should make https://zuul.opendev.org/t/openstack/build/df644e9fdde346f2813e6220312a9ca5/log/controller/logs/screen-n-super-cond.txt.gz#867 info level | 09:34 |
eandersson | I enabled debug | 09:35 |
eandersson | > [instance: x] Selected host: compute1031; Selected node: compute1031; ; Alternates: [] schedule_and_build_instances | 09:35 |
bauzas | eandersson: I checked and events are shown by default if you are on Pike and above when you call out instance-action API | 09:35 |
eandersson | > Re-scheduling is disabled. populate_retry | 09:35 |
bauzas | eandersson: but you need to login with admin creds | 09:36 |
bauzas | https://docs.openstack.org/api-ref/compute/?expanded=show-server-action-details-detail#id170 | 09:36 |
bauzas | anyway, gotta run, | 09:36 |
bauzas | \o | 09:36 |
eandersson | http://paste.openstack.org/show/EUjnd5TWftogxGIVaSNP/ | 09:37 |
eandersson | This is the last log line | 09:37 |
eandersson | from the conductor / scheduler | 09:37 |
eandersson | yea bauzas nothing interesting in the events | 09:38 |
*** Shatadru has quit IRC | 09:38 | |
eandersson | Everything looks to be scheduled fine | 09:39 |
sean-k-mooney | right that is the last log for the instance we expect to see in the conductor log | 09:39 |
sean-k-mooney | well unless there is an error | 09:39 |
eandersson | no errors in the logs | 09:40 |
eandersson | :'( | 09:40 |
sean-k-mooney | and nothing in the compute agent log in debug either | 09:40 |
sean-k-mooney | it just disappears? | 09:40 |
eandersson | I pasted some logs early on | 09:40 |
sean-k-mooney | you captured the rpc being received, right | 09:41 |
eandersson | let me check again | 09:41 |
sean-k-mooney | oh ill scroll back | 09:41 |
*** mkrai_ has joined #openstack-nova | 09:44 | |
tssurya | eandersson: what's the exact boot request you are making ? | 09:45 |
sean-k-mooney | i don't see the logs for the compute scrolling back unfortunately. | 09:45 |
eandersson | sean-k-mooney, I think the logs we discussed were things like | 09:46 |
eandersson | > instance x has been scheduled to this compute host | 09:46 |
eandersson | Let me enable even more debug logs | 09:47 |
sean-k-mooney | ah, you can re-disable debug on the conductor/scheduler since we are pretty sure it's getting to the compute node at this point, right | 09:47 |
eandersson | Yea already disabled | 09:48 |
eandersson | generates a lot of logs :p | 09:48 |
sean-k-mooney | for 200 servers with all services in debug i believe it's something like 30GB a day in uncompressed logs | 09:48 |
sean-k-mooney | that's what kolla-ansible said in their configuration docs in any case | 09:49 |
eandersson | hehe well that is without kubernetes, octavia etc hitting your apis constantly | 09:49 |
eandersson | plus all other automation | 09:49 |
sean-k-mooney | right you would really want log rotate to be working well if you use debug always | 09:50 |
*** ricolin has quit IRC | 09:50 | |
sean-k-mooney | we should try to have more of a balance however. e.g. make info more useful without being just spam | 09:51 |
eandersson | On compute I see a message being received, but nothing more | 09:52 |
eandersson | > DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: ad3b7ccda9944e9e87f497f4401aa0b4 __call__ | 09:52 |
sean-k-mooney | ok so it's not building the libvirt xml or plugging any ports | 09:52 |
sean-k-mooney | it's just receiving it | 09:53 |
eandersson | yea | 09:53 |
sean-k-mooney | did you also check with the instance uuid | 09:53 |
eandersson | I bet this compute is bust, but everything seems fine (including already existing vms) | 09:53 |
eandersson | yea | 09:53 |
*** mkrai_ has quit IRC | 09:53 | |
eandersson | I do get some logs with the instance id | 09:53 |
eandersson | but it's just the ones I posted earlier | 09:54 |
sean-k-mooney | ok | 09:54 |
eandersson | > Instance ... has been scheduled to this compute host, the scheduler has made an allocation against this compute node but the instance has yet to start. Skipping heal of allocation: ... | 09:54 |
sean-k-mooney | right that is the periodic task | 09:54 |
sean-k-mooney | that makes sure the resource usage on the host matches what we expect | 09:55 |
sean-k-mooney | that is normal to see between when the host has been selected in the db as the target for the instance and the vm starting | 09:55 |
eandersson | Anyway bed time. It's 3AM. | 09:57 |
*** brinzhang has joined #openstack-nova | 09:57 | |
eandersson | Thanks for helping sean-k-mooney and gibi | 09:57 |
sean-k-mooney | eandersson: o/ get some rest | 09:58 |
*** Luzi has quit IRC | 10:00 | |
*** mkrai_ has joined #openstack-nova | 10:10 | |
*** mjozefcz|afk is now known as mjozefcz | 10:18 | |
*** ociuhandu has quit IRC | 10:27 | |
*** ociuhandu has joined #openstack-nova | 10:27 | |
*** ttsiouts has quit IRC | 10:30 | |
*** ttsiouts has joined #openstack-nova | 10:31 | |
openstackgerrit | Brin Zhang proposed openstack/python-novaclient master: Add functional test for migration-list in v2.80 https://review.opendev.org/688635 | 10:32 |
openstackgerrit | Merged openstack/nova stable/queens: Fix 'has_calls' method calls in unit tests https://review.opendev.org/677378 | 10:33 |
*** brinzhang_ has joined #openstack-nova | 10:34 | |
*** ttsiouts has quit IRC | 10:36 | |
*** brinzhang has quit IRC | 10:36 | |
*** brinzhang has joined #openstack-nova | 10:37 | |
*** brinzhang_ has quit IRC | 10:38 | |
gibi | is it only me or have the powervm unit tests started failing on master? | 10:42 |
*** tbachman has quit IRC | 10:42 | |
*** dpawlik has quit IRC | 10:43 | |
gibi | AttributeError: 'DiGraph' object has no attribute 'node' | 10:44 |
*** Luzi has joined #openstack-nova | 10:44 | |
*** CeeMac has joined #openstack-nova | 10:47 | |
gibi | https://702b7e8f253d29e679a6-2fe3f6c342189909aad5220492fb4721.ssl.cf1.rackcdn.com/688387/5/check/openstack-tox-cover/2c6410c/testr_results.html.gz | 10:47 |
gibi | http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AttributeError%3A%20'DiGraph'%20object%20has%20no%20attribute%20'node'%5C%22&from=7d | 10:48 |
gibi | there are plenty of hits not just in nova | 10:48 |
gibi | only affecting python 3.x jobs not the 2.7 jobs | 10:50 |
jkulik | https://github.com/networkx/networkx/commit/6b1ce03f485076d39994e8d624bbf6ca82466eb9#diff-027182481aebf9ad0dda6ca00714653aR95 seems to be the cause | 10:55 |
jkulik | seems like there's no upper version requirement defined in the taskflow library for python3 https://opendev.org/openstack/taskflow/src/branch/master/requirements.txt#L24 | 10:56 |
*** brinzhang_ has joined #openstack-nova | 10:57 | |
*** udesale has quit IRC | 10:57 | |
*** brinzhang_ has quit IRC | 10:58 | |
gibi | jkulik: good find. I agree that this can be the problem | 10:59 |
*** brinzhang has quit IRC | 11:00 | |
*** mkrai_ has quit IRC | 11:02 | |
frickler | gibi: jkulik: that matches what I found comparing pip freeze from working and broken jobs. want to propose a cap to reqs as a quick fix? | 11:06 |
*** mtreinish has joined #openstack-nova | 11:12 | |
*** dpawlik has joined #openstack-nova | 11:16 | |
*** nweinber has joined #openstack-nova | 11:32 | |
*** ttsiouts has joined #openstack-nova | 11:33 | |
*** ttsiouts has quit IRC | 11:44 | |
*** ttsiouts has joined #openstack-nova | 11:45 | |
*** dviroel has joined #openstack-nova | 11:45 | |
*** epoojad1 has joined #openstack-nova | 11:45 | |
*** nweinber has quit IRC | 11:46 | |
*** ociuhandu has quit IRC | 11:47 | |
*** ociuhandu has joined #openstack-nova | 11:48 | |
*** Luzi has quit IRC | 11:48 | |
*** markvoelker has joined #openstack-nova | 11:48 | |
*** ttsiouts has quit IRC | 11:49 | |
*** ratailor_ has quit IRC | 11:51 | |
*** ociuhandu has quit IRC | 11:54 | |
*** bbowen has joined #openstack-nova | 12:03 | |
*** tbachman has joined #openstack-nova | 12:03 | |
*** ociuhandu has joined #openstack-nova | 12:03 | |
*** Luzi has joined #openstack-nova | 12:04 | |
*** jamesden_ is now known as jamesdenton | 12:09 | |
*** ociuhandu has quit IRC | 12:11 | |
*** ociuhandu has joined #openstack-nova | 12:11 | |
*** ociuhandu has quit IRC | 12:16 | |
*** sapd1 has joined #openstack-nova | 12:21 | |
*** larainema has quit IRC | 12:26 | |
*** takamatsu has joined #openstack-nova | 12:35 | |
*** brinzhang has joined #openstack-nova | 12:37 | |
*** dtantsur is now known as dtantsur|brb | 12:38 | |
*** mriedem has joined #openstack-nova | 12:42 | |
*** sapd1 has quit IRC | 12:42 | |
openstackgerrit | Brin Zhang proposed openstack/python-novaclient master: Add functional test for migration-list in v2.80 https://review.opendev.org/688635 | 12:42 |
brinzhang | mriedem: There is an issue, do you have time to review https://review.opendev.org/#/c/688635/5/novaclient/tests/functional/v2/test_migrations.py@116 | 12:44 |
*** dave-mccowan has joined #openstack-nova | 12:44 | |
*** eharney has quit IRC | 12:45 | |
brinzhang | mriedem: I cannot find a way to get a server with an in-progress live migration to run server-migration-list/show against | 12:45 |
gibi | frickler jkulik: sorry I was pulled into a meeting in the meanwhile | 12:45 |
gibi | frickler: at the end I think taskflow needs to be fixed or the networkx req in taskflow needs to be pinned | 12:46 |
mriedem | brinzhang: likely not going to be able to do that one since it's not deterministic | 12:47 |
frickler | gibi: yes, updating taskflow not to use the deprecated attribute anymore would be the correct permanent solution I think | 12:47 |
frickler | gibi: also the u-c update here is failing, so this currently should only affect "special" jobs https://review.opendev.org/689079 | 12:48 |
*** Luzi has quit IRC | 12:48 | |
*** epoojad1 has quit IRC | 12:48 | |
brinzhang | mriedem: should I remove the server-migration tests from this patch and do them in a follow-up in novaclient? | 12:48 |
*** Luzi has joined #openstack-nova | 12:49 | |
*** derekh has quit IRC | 12:49 | |
gibi | frickler: hm, but if upper-constraints pins networkx to 2.3 then how can it be that py3.7 jobs are pulling in networkx 2.4? | 12:50 |
brinzhang | mriedem: If I make fake data, I don't think it's necessary to do this test. | 12:50 |
frickler | gibi: might be jobs ignoring u-c? didn't check in detail yet. | 12:51 |
mriedem | brinzhang: yeah i suppose, i forgot that those are only for in-progress live migrations | 12:51 |
mriedem | we can't do those anyway since the functional job is single-node | 12:51 |
brinzhang | yes | 12:51 |
openstackgerrit | Brin Zhang proposed openstack/python-novaclient master: Add functional test for migration-list in v2.80 https://review.opendev.org/688635 | 12:53 |
*** brinzhang_ has joined #openstack-nova | 12:54 | |
brinzhang_ | mriedem: I have updated this patch. | 12:54 |
*** takashin has joined #openstack-nova | 12:54 | |
*** ttsiouts has joined #openstack-nova | 12:54 | |
mriedem | ok | 12:54 |
*** tetsuro has quit IRC | 12:55 | |
*** tetsuro has joined #openstack-nova | 12:55 | |
brinzhang_ | mriedem: thanks. | 12:56 |
*** brinzhang has quit IRC | 12:56 | |
gibi | frickler: locally when I reproduce the problem I see the following in the tox log | 12:59 |
gibi | $ cat .tox/py37/log/py37-1.log | grep networkx | 12:59 |
gibi | Ignoring networkx: markers 'python_version == "2.7"' don't match your environment | 12:59 |
gibi | Ignoring networkx: markers 'python_version == "3.4"' don't match your environment | 12:59 |
gibi | Ignoring networkx: markers 'python_version == "3.5"' don't match your environment | 12:59 |
gibi | Ignoring networkx: markers 'python_version == "3.6"' don't match your environment | 12:59 |
gibi | frickler: nvm, I use py3.7 so those logs are valid | 13:00 |
frickler | gibi: IIUC that's expected when running with python3.7, but is there a cap with ==3.7 in place? | 13:00 |
frickler | gibi: which repo are you running this in, nova or taskflow? | 13:00 |
*** nweinber has joined #openstack-nova | 13:04 | |
*** ttsiouts has quit IRC | 13:06 | |
*** ttsiouts has joined #openstack-nova | 13:07 | |
*** ociuhandu has joined #openstack-nova | 13:09 | |
*** ttsiouts has quit IRC | 13:11 | |
mriedem | are the powervm driver unit tests failing since yesterday a known issue? | 13:11 |
mriedem | https://c6fecb2db5c55fa0effa-6229cc6450d9b491384804026d2fbd81.ssl.cf5.rackcdn.com/688980/1/gate/openstack-tox-py36/71a8bdd/testr_results.html.gz | 13:11 |
frickler | mriedem: IIUC that's the networkx issue | 13:11 |
frickler | gibi: seems transitive upper-constraints are ignored | 13:12 |
frickler | see e.g. https://zuul.opendev.org/t/openstack/build/2c6410c8f6344d19b9c88844b93f0683/log/job-output.txt#525-528 | 13:12 |
*** derekh has joined #openstack-nova | 13:12 | |
mriedem | yeah 2.4 was released 11 hours ago | 13:12 |
*** gbarros has joined #openstack-nova | 13:12 | |
*** takamatsu has quit IRC | 13:12 | |
*** ttsiouts has joined #openstack-nova | 13:14 | |
mriedem | https://bugs.launchpad.net/nova/+bug/1848499 | 13:15 |
openstack | Launchpad bug 1848499 in OpenStack Compute (nova) "powervm driver tests fail with networkx 2.4: "AttributeError: 'DiGraph' object has no attribute 'node'"" [Critical,Confirmed] | 13:15 |
efried | mriedem: do we need to cap taskflow? | 13:15 |
efried | oh, or networkx | 13:15 |
mriedem | it is capped in upper-constraints, | 13:15 |
mriedem | but as frickler said it seems it's not being honored | 13:15 |
frickler | if I add "networkx>=1.11" to nova/test-reqs.txt, the cap works. without it, it doesn't | 13:15 |
* mriedem goes to requirements | 13:15 | |
*** udesale has joined #openstack-nova | 13:16 | |
openstackgerrit | Huachang Wang proposed openstack/nova master: To create single NUMA node instance in function '_get_numa_topology_auto' https://review.opendev.org/688932 | 13:17 |
openstackgerrit | Huachang Wang proposed openstack/nova master: Assign and track instance pinning cpu through 'cpu_pinning' field https://review.opendev.org/688933 | 13:17 |
openstackgerrit | Huachang Wang proposed openstack/nova master: Add a new instance CPU allocation policy: mixed https://review.opendev.org/688934 | 13:17 |
openstackgerrit | Huachang Wang proposed openstack/nova master: virt/libvirt: Get host pin cpuset according instance cpu_pinning https://review.opendev.org/688935 | 13:17 |
openstackgerrit | Huachang Wang proposed openstack/nova master: metadata: export the vCPU IDs that are pinning on the host CPUs https://review.opendev.org/688936 | 13:17 |
*** mjozefcz has quit IRC | 13:19 | |
gibi | frickler: it is the nova repo I'm using to reproduce | 13:24 |
*** mjozefcz has joined #openstack-nova | 13:26 | |
gibi | frickler, mriedem: If I add '+ -r{toxinidir}/requirements.txt' | 13:27 |
gibi | frickler, mriedem: to tox.ini then the transitive deps are correct and the test passes | 13:27 |
mriedem | stephenfin: ^? | 13:28 |
mriedem | looks like networkx 2.4 changed Graph.node to Graph.nodes, weeee https://github.com/networkx/networkx/blob/networkx-2.4/doc/release/release_2.4.rst#deprecations | 13:28 |
*** mkrai_ has joined #openstack-nova | 13:30 | |
*** eharney has joined #openstack-nova | 13:30 | |
stephenfin | looking | 13:30 |
mriedem | gibi: i want to say we used to explicitly include -r{toxinidir}/requirements.txt in deps but it was removed b/c it's installed for us in the tox env | 13:30 |
mriedem | but maybe that breaks u-c processing? | 13:30 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Make sure tox install requirements.txt with upper-constraints https://review.opendev.org/689152 | 13:31 |
gibi | mriedem, frickler, stephenfin: ^^ temporary fix | 13:31 |
stephenfin | mriedem, gibi: We need mordred/smcginnis for this | 13:31 |
frickler | gibi: mriedem: stephenfin: that seems a bug introduced in https://review.opendev.org/#/c/684775/ | 13:31 |
stephenfin | gibi: You're essentially reverting b13c33caa07fc82b19c233f9ad46a1813eb3e76d | 13:31 |
* smcginnis reads scrollback | 13:32 | |
stephenfin | I think we should probably do that or 19a0bdfec454bd921b718e7dc49fe2673fa79b10 | 13:32 |
stephenfin | smcginnis: we don't include 'requirements.txt' in deps because tox builds an sdist for us which will include said deps automagically | 13:32 |
openstackgerrit | Takashi NATSUME proposed openstack/nova stable/stein: Fix unit of hw_rng:rate_period https://review.opendev.org/689153 | 13:32 |
stephenfin | however, I switched from overriding 'install_command' to providing constraints via '-c FILE' in deps | 13:33 |
mriedem | stephenfin: you mean revert https://review.opendev.org/#/c/684775/ right? | 13:33 |
frickler | stephenfin: but it seems to include uncapped reqs | 13:33 |
gibi | stephenfin: install_command is used for every install step in tox while deps is only used to install test-requirements? | 13:33 |
stephenfin | gibi: Yeah, I think so | 13:33 |
smcginnis | stephenfin: This seems OK - https://review.opendev.org/#/c/689152/1/tox.ini | 13:33 |
smcginnis | That is what is done elsewhere. | 13:33 |
openstackgerrit | Takashi NATSUME proposed openstack/nova stable/rocky: Fix unit of hw_rng:rate_period https://review.opendev.org/689154 | 13:33 |
smcginnis | Then the other jobs that need different deps, like the lower-constraints job, override "deps" to set the requirements it needs. | 13:34 |
stephenfin | smcginnis: Cool. I didn't know this would happen so I need to go make sure we haven't broken other projects | 13:34 |
smcginnis | As long as the -c isn't hard coded in the install_command, setting "deps" should be flexible enough to use different requirements for different jobs. | 13:35 |
openstackgerrit | Takashi NATSUME proposed openstack/nova stable/queens: Fix unit of hw_rng:rate_period https://review.opendev.org/689155 | 13:35 |
*** factor has quit IRC | 13:36 | |
smcginnis | One tricky bit I've seen is that it's not always obvious that putting a deps line in a tox environment overrides rather than appends to the deps that are used. So there have been some cases where teams have meant to use an additional file but have ended up excluding some common ones. | 13:36 |
smcginnis | So just make sure that wherever you use deps you always include things like test-requirements (where appropriate of course). | 13:36 |
stephenfin | gibi: comments left | 13:36 |
stephenfin | smcginnis: Yeah, we're good there. We use the '{testenv[blah]}deps' syntax everywhere that matters | 13:37 |
mriedem | stephenfin: replied to you in there | 13:37 |
*** ileixe has joined #openstack-nova | 13:38 | |
smcginnis | stephenfin: ++ | 13:38 |
stephenfin | mriedem: yup, both valid | 13:38 |
stephenfin | gibi, efried, mriedem, alex_xu: Also, I'm on PTO from tomorrow until the summit. Just FYI | 13:39 |
efried | holy crap, that's in like two and a half weeks, I didn't realize how close it was. | 13:40 |
sean-k-mooney | yep | 13:40 |
stephenfin | you're telling me | 13:40 |
efried | gdi, I have to take my kid to the dentist this morning during meeting time | 13:41 |
gibi | stephenfin: I've just confirmed that having the constraint in the install_command also works | 13:41 |
stephenfin | gibi: yeah, I think we don't want to do that because people forget to override install_command for the lower-constraints target | 13:41 |
stephenfin | at least that's what I took away from smcginnis' comments above and elsewhere | 13:42 |
*** lbragsta_ has joined #openstack-nova | 13:42 | |
smcginnis | Correct. | 13:42 |
gibi | stephenfin, smcginnis: I got your comments on the fix, I will respin that patch quickly | 13:43 |
openstackgerrit | Takashi NATSUME proposed openstack/nova stable/pike: Fix unit of hw_rng:rate_period https://review.opendev.org/689158 | 13:43 |
*** gbarros has quit IRC | 13:44 | |
efried | okay, I'm going to be late to the meeting, but hopefully not more than a few minutes | 13:46 |
*** gbarros has joined #openstack-nova | 13:46 | |
efried | poor planning (from like months ago) | 13:46 |
*** efried is now known as efried_afk | 13:46 | |
*** brinzhang has joined #openstack-nova | 13:47 | |
*** brinzhang_ has quit IRC | 13:50 | |
mriedem | i'll start the meeting | 13:50 |
*** jawad_axd has quit IRC | 13:52 | |
*** jawad_axd has joined #openstack-nova | 13:54 | |
*** ociuhandu has quit IRC | 13:57 | |
*** Luzi has quit IRC | 13:57 | |
*** jawad_axd has quit IRC | 13:58 | |
mriedem | nova meeting starting now in -meeting | 14:00 |
*** gbarros has quit IRC | 14:02 | |
*** ociuhandu has joined #openstack-nova | 14:02 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Make sure tox install requirements.txt with upper-constraints https://review.opendev.org/689152 | 14:03 |
gibi | mriedem, stephenfin, smcginnis ^^ | 14:03 |
*** gbarros has joined #openstack-nova | 14:04 | |
openstackgerrit | Dan Smith proposed openstack/nova stable/train: Update compute rpc version alias for train https://review.opendev.org/689164 | 14:05 |
*** jawad_axd has joined #openstack-nova | 14:05 | |
sean-k-mooney | oh for the docs job. there was a reason we did not install requirements and test-requirements in the docs job before | 14:06 |
sean-k-mooney | the way we build docs using auto generation it's kind of required but there was pushback to doing this in the past. | 14:07 |
*** ociuhandu has quit IRC | 14:07 | |
*** brinzhang_ has joined #openstack-nova | 14:08 | |
openstackgerrit | Huachang Wang proposed openstack/nova master: [WIP] metadata: export the vCPU IDs that are pinning on the host CPUs https://review.opendev.org/688936 | 14:08 |
*** bnemec has joined #openstack-nova | 14:11 | |
*** brinzhang has quit IRC | 14:12 | |
*** efried_afk is now known as efried | 14:12 | |
*** gbarros has quit IRC | 14:13 | |
*** sapd1 has joined #openstack-nova | 14:13 | |
*** dtantsur|brb is now known as dtantsur | 14:14 | |
*** ociuhandu has joined #openstack-nova | 14:15 | |
stephenfin | sean-k-mooney: We're installing requirements.txt by default already. Unless you have skipsdist=True configured, tox will build your package for you | 14:15 |
stephenfin | which obviously requires all dependencies in requirements.txt at a minimum | 14:16 |
sean-k-mooney | ya i know | 14:16 |
sean-k-mooney | because of the way our docs work you need /requirements.txt and /docs/requirements.txt | 14:16 |
sean-k-mooney | you should not need test-requirements.txt but it won't break anything if it's installed | 14:17 |
*** gbarros has joined #openstack-nova | 14:18 | |
*** lpetrut has quit IRC | 14:18 | |
*** yan0s has joined #openstack-nova | 14:20 | |
*** gbarros has quit IRC | 14:20 | |
*** gbarros has joined #openstack-nova | 14:23 | |
*** dpawlik has quit IRC | 14:27 | |
*** gbarros has quit IRC | 14:27 | |
*** gbarros has joined #openstack-nova | 14:28 | |
*** mkrai_ has quit IRC | 14:30 | |
*** mkrai_ has joined #openstack-nova | 14:30 | |
*** takamatsu has joined #openstack-nova | 14:32 | |
*** yan0s has quit IRC | 14:32 | |
*** yan0s has joined #openstack-nova | 14:32 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: functional: Change order of two classes https://review.opendev.org/689178 | 14:36 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: functional: Rework '_delete_server' https://review.opendev.org/689179 | 14:36 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: functional: Make '_wait_for_state_change' behave consistently https://review.opendev.org/689180 | 14:36 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: functional: Unify '_wait_until_deleted' implementations https://review.opendev.org/689181 | 14:36 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: functional: Make 'ServerTestBase' subclass 'InstanceHelperMixin' https://review.opendev.org/689182 | 14:36 |
*** jawad_axd has quit IRC | 14:37 | |
stephenfin | mdbooth, sean-k-mooney: ^ | 14:40 |
*** gyee has joined #openstack-nova | 14:40 | |
*** ttsiouts has quit IRC | 14:40 | |
stephenfin | the juicy one in the middle is failing for reasons I haven't yet grokked, but that's where I'm going with it | 14:40 |
*** ttsiouts has joined #openstack-nova | 14:43 | |
*** sridharg has quit IRC | 14:43 | |
openstackgerrit | Matthew Booth proposed openstack/nova master: Add new functional test base for libvirt tests https://review.opendev.org/689186 | 14:44 |
mdbooth | stephenfin sean-k-mooney: ^^^ | 14:45 |
mdbooth | sean-k-mooney: I ripped out a few things you added for your test because I'm not using them in mine, so you'll almost certainly want to add some of them back in | 14:46 |
mdbooth | sean-k-mooney: But I think that's ok. i.e. Extend functionality as the use case arises. | 14:46 |
sean-k-mooney | sure | 14:46 |
*** ricolin has joined #openstack-nova | 14:47 | |
mdbooth | However, with ^^^, a functional libvirt test is: inherit IntegratedTestBase; self._start_compute('compute1'); server = self._create_active_server() | 14:47 |
mdbooth | And I like that simplicity | 14:47 |
*** jmlowe has quit IRC | 14:48 | |
sean-k-mooney | ya that seems like a good way forward | 14:48 |
* efried chauffeurs again | 14:53 | |
* efried totally fubared today's calendar | 14:53 | |
*** efried is now known as efried_afk | 14:53 | |
* melwitt lobbys | 14:53 | |
melwitt | dansmith: from the meeting, any opinion on whether this is better off as a bp or a wishlist bug that is backportable? https://blueprints.launchpad.net/nova/+spec/nova-manage-db-purge-task-log | 14:54 |
melwitt | task_log records pile up and there's no way to clean them up | 14:54 |
*** TxGirlGeek has joined #openstack-nova | 14:54 | |
melwitt | gibi, stephenfin, bauzas: ^ any opinion | 14:55 |
melwitt | ? | 14:55 |
* mriedem notes she asked everyone except the ptl :) | 14:56 | |
gibi | melwitt: I will have to dig a bit | 14:57 |
gibi | melwitt: give me 15 minutes as there is a parallel meeting | 14:57 |
mriedem | mdbooth: new functional tests shouldn't be using IntegratedTestBase | 14:57 |
*** mkrai_ has quit IRC | 14:57 | |
melwitt | mriedem: he's afk! | 14:57 |
mriedem | it's got all sorts of warts from api samples tests, like CastAsCall fixture and stuff | 14:57 |
melwitt | efried_afk: can you give your opinion on bp vs wishlist bug while you're driving pls ^ | 14:58 |
*** gbarros has quit IRC | 14:58 | |
dansmith | melwitt: I'd probably say it's a feature like db purge was, but I understand the desire to make it backportable (for real value), so I don't feel that strongly | 14:59 |
melwitt | ack | 14:59 |
bauzas | melwitt: /me looks | 15:00 |
bauzas | honestly, I have the same thoughts about the audit command | 15:01 |
*** mkrai_ has joined #openstack-nova | 15:01 | |
bauzas | once we merge it (and honestly, it still needs some time from me), I think we *could* honestly backport it to help operators | 15:01 |
bauzas | I said we *could* | 15:01 |
mriedem | like, honestly? | 15:01 |
bauzas | but we need some consensus | 15:01 |
bauzas | if someone doesn't want to backport any feature or wishlist bug, I understand it | 15:02 |
bauzas | because I could tell this | 15:02 |
melwitt | yeah, I mean, the usual is we backport downstream only in the feature cases. example: I'm in the middle of backporting db purge, archive_deleted_rows --before and --all-cells | 15:02 |
*** dpawlik has joined #openstack-nova | 15:02 | |
*** takashin has left #openstack-nova | 15:02 | |
bauzas | yeah, honestly, backporting the audit command only downstream wouldn't be a problem for me | 15:03 |
melwitt | if people are ok with backporting purge_task_log upstream, then bug it up I guess | 15:03 |
*** jmlowe has joined #openstack-nova | 15:03 | |
bauzas | but I think operators not using OSP would also love it, even for Train | 15:03 |
melwitt | yeah | 15:03 |
bauzas | and I think for purge, it's the same | 15:03 |
melwitt | I dunno, I would have thought the same for purge, --before and --all-cells | 15:03 |
melwitt | though | 15:04 |
bauzas | so, yeah, I agree with you, maybe just provide a backport change in stable/train and then we could discuss about it there | 15:04 |
*** gbarros has joined #openstack-nova | 15:04 | |
*** mjozefcz has quit IRC | 15:04 | |
mriedem | imo backporting standalone new commands (like heal_allocations in my case) is less of an issue because if they are busted then whatever, no one is using them on stable already anyway, | 15:05 |
mriedem | but backporting big changes to existing CLIs that people are using, like the all cells stuff for archive, is much riskier | 15:05 |
bauzas | actually, good point | 15:05 |
melwitt | yeah, I could see that. risk aspect | 15:05 |
bauzas | if we're adding some argument, I don't see the problem | 15:05 |
bauzas | but if we're changing some arg, then yes it's at risk | 15:06 |
mriedem | depends on how invasive it is | 15:06 |
melwitt | heh yeah. | 15:06 |
bauzas | right, hence we should be discussing it on the stable change | 15:06 |
mriedem | diconico07: i've commented in https://bugs.launchpad.net/nova/+bug/1847367 from the results of the meeting | 15:06 |
openstack | Launchpad bug 1847367 in OpenStack Compute (nova) "Images with hw:vif_multiqueue_enabled can be limited to 8 queues even if more are supported" [Undecided,Confirmed] - Assigned to sean mooney (sean-k-mooney) | 15:06 |
bauzas | I mean, anyone can provide any change to the stable branches | 15:06 |
bauzas | it's just the stable cores that either accept or disagree with it | 15:07 |
*** dpawlik has quit IRC | 15:07 | |
sean-k-mooney | mriedem: cool i have something typed up as well | 15:07 |
melwitt | bauzas: the master change isn't written yet :P but really the discussion here is whether to do it as a bp or a wishlist bug with backports in mind | 15:07 |
melwitt | I think the process has been, if it's a bp then it's totally nacked on stable | 15:08 |
bauzas | melwitt: mriedem: but honestly, the stable rules don't say 'please don't backport any feature' | 15:08 |
bauzas | https://docs.openstack.org/project-team-guide/stable-branches.html#appropriate-fixes | 15:08 |
bauzas | apart from https://docs.openstack.org/project-team-guide/stable-branches.html#active-maintenance rule #1 | 15:09 |
sean-k-mooney | mriedem: this was the downstream bug that i was going to fix https://bugzilla.redhat.com/show_bug.cgi?id=1714075 but to be honest i have known about this behavior for years and it's bugged me so i'll be happy to fix it | 15:09 |
openstack | bugzilla.redhat.com bug 1714075 in openstack-nova "[OSP13][NFV] 8 queues limit is applicable for tap device not for vhostuser port in kernel version 3." [Medium,Assigned] - Assigned to smooney | 15:09 |
bauzas | but in https://docs.openstack.org/project-team-guide/stable-branches.html#review-guidelines we say " Proposed backports breaking any of the above guidelines can be discussed as exception requests on the openstack-discuss list (prefix with [stable]) where the stable maintenance core team will have the final say. " | 15:09 |
bauzas | melwitt: so, see, even with stable, you can still have exceptions | 15:09 |
sean-k-mooney | mriedem: i was pretty sure i had already filed a bug for vhost-user but i can't find it in launchpad so i'll file a new one as you said | 15:09 |
bauzas | melwitt: so I don't see a problem with you asking for an exception once you're done with master | 15:10 |
mriedem | "the backport guidelines don't say anything about new features...oh except this part where it says backports for new features are completely forbidden" | 15:11 |
*** nanzha has quit IRC | 15:11 | |
mriedem | :/ | 15:11 |
melwitt | bauzas: I don't think that's likely to fly with the stable team :P just mho | 15:11 |
mriedem | melwitt: just do a wishlist bug, drop the bp, write the patch and we can slit each others throats on backport policy in 3 months? | 15:11 |
melwitt | mriedem: lol, sounds great | 15:11 |
bauzas | mriedem: it tells about some possible exceptions :p | 15:12 |
mriedem | how about someone familiar with the new blueprint process tell me the decoder ring for what i can set for the Direction and Definition fields when approving a specless blueprint? | 15:12 |
dansmith | If we couldn't backport a tiny feature to mitigate spectre, I can't imagine we're going to get permission to backport something like this | 15:12 |
mriedem | can i mark both as "approved"? | 15:12 |
*** nanzha has joined #openstack-nova | 15:13 | |
melwitt | I think there's an ML mail about that. /me looks | 15:13 |
sean-k-mooney | mriedem: i think the intent was to mark the Direction as approved after review around m2 | 15:13 |
*** mlavalle has joined #openstack-nova | 15:14 | |
sean-k-mooney | but this is a small thing that i expect mel will have ready pretty quickly so i hope its merged well before that point | 15:14 |
sean-k-mooney | so ya you probably could mark both as approved | 15:14 |
melwitt | nvm, I guess it doesn't really explain it http://lists.openstack.org/pipermail/openstack-discuss/2019-October/009945.html | 15:14 |
bauzas | mriedem: https://specs.openstack.org/openstack/nova-specs/readme.html#the-lifecycle-of-a-specification | 15:14 |
bauzas | mriedem: basically, now set Definition as "approved" | 15:15 |
* dansmith hopes nobody notices him in the corner trying to light the building on fire | 15:15 | |
bauzas | FWIW, for "Direction", we didn't have a consensus when merging the proposal | 15:15 |
bauzas | so, leave it blank | 15:16 |
dansmith | bauzas: you might say there was no.....Direction? | 15:16 |
sean-k-mooney | bauzas: if the thing is merged before m2 or m3 it really does not matter | 15:16 |
* mriedem rimshots | 15:16 | |
bauzas | in theory, "Direction" would be set by the PTL for 'important' BPs | 15:16 |
mriedem | if only we had a priority field... | 15:16 |
* sean-k-mooney recoils from that joke | 15:16 | |
melwitt | lol ahhhh | 15:16 |
bauzas | but I disagreed on that since it wasn't explaining the process to define *which* BPs would be blessed | 15:16 |
bauzas | hence the use of conditional | 15:17 |
mriedem | Direction is binary btw, approved or not approved, | 15:17 |
mriedem | like you can be pregnant or not | 15:17 |
dansmith | kinda like this conversation can make you suicidal or not? | 15:17 |
bauzas | heh honestly, we shouldn't care now about those fields until someone (say efried_afk) clarifies the use | 15:18 |
sean-k-mooney | i was going to ask about the inplace rebuild but i'm just going to write a unit test and fix my typos instead | 15:18 |
bauzas | I see those fields as "optional" for further usage :) | 15:18 |
bauzas | I was more interested honestly in the other side of the change, which is the feature liaison concept | 15:18 |
*** yan0s has quit IRC | 15:23 | |
*** igordc has joined #openstack-nova | 15:24 | |
*** gbarros has quit IRC | 15:26 | |
*** ileixe has quit IRC | 15:26 | |
*** maciejjozefczyk has joined #openstack-nova | 15:27 | |
*** zbr has quit IRC | 15:29 | |
*** zbr__ has joined #openstack-nova | 15:29 | |
gibi | melwitt: my suggestion for the https://blueprints.launchpad.net/nova/+spec/nova-manage-db-purge-task-log . Do the implementation with backportability in mind. Use a bug if you want to avoid the procedural -2 on stable backport. If the bug backport will be nack-ed by the stable team then you still have a backportable fix that a distro can backport | 15:31 |
melwitt | gibi: makes sense, thanks | 15:31 |
*** jawad_axd has joined #openstack-nova | 15:32 | |
gibi | melwitt: and I have not technical problems with the proposed change in that bp | 15:32 |
gibi | I mean I don't have any technical issues | 15:32 |
melwitt | ack, thanks | 15:33 |
*** mkrai_ has quit IRC | 15:34 | |
*** mkrai__ has joined #openstack-nova | 15:34 | |
*** maciejjozefczyk has quit IRC | 15:37 | |
*** nanzha has quit IRC | 15:39 | |
*** jbernard has quit IRC | 15:41 | |
*** TxGirlGeek has quit IRC | 15:42 | |
*** maciejjozefczyk has joined #openstack-nova | 15:47 | |
*** dpawlik has joined #openstack-nova | 15:47 | |
openstackgerrit | Merged openstack/os-traits master: Add COMPUTE_NODE trait https://review.opendev.org/688969 | 15:47 |
*** maciejjozefczyk has quit IRC | 15:48 | |
*** jmlowe has quit IRC | 15:49 | |
*** brinzhang_ has quit IRC | 15:49 | |
*** jbernard has joined #openstack-nova | 15:49 | |
*** brinzhang_ has joined #openstack-nova | 15:50 | |
*** brinzhang_ has quit IRC | 15:51 | |
*** nanzha has joined #openstack-nova | 15:51 | |
*** dpawlik has quit IRC | 15:52 | |
*** TxGirlGeek has joined #openstack-nova | 15:52 | |
*** ttsiouts has quit IRC | 15:54 | |
*** gbarros has joined #openstack-nova | 15:54 | |
mriedem | gibi: on https://review.opendev.org/#/c/689049/1/nova/scheduler/client/report.py@1844 - i'm adding a new kwarg to handle the logic if the target consumer does not exist, | 15:56 |
mriedem | thoughts on variable names? i was thinking "target_is_new" or "reverting_allocations" | 15:56 |
mriedem | what makes more sense to you? | 15:56 |
mriedem | the former might be more obvious, the latter is maybe too tightly coupled to what is calling the method | 15:56 |
*** gbarros has quit IRC | 15:58 | |
*** gbarros has joined #openstack-nova | 15:58 | |
*** tssurya has quit IRC | 16:00 | |
*** efried_afk is now known as efried | 16:02 | |
efried | melwitt: If it matters for backportability, make it a bug. | 16:03 |
melwitt | efried: thanks, I will bug it | 16:03 |
efried | All evidence to the contrary, I'm anti-process. I just want to get shit done. | 16:03 |
efried | so whatever moves the ball | 16:04 |
melwitt | wfm | 16:04 |
bauzas | efried: the only problem with any negotiation on what's reasonable and what's not is that it depends on the mandate people give you | 16:07 |
bauzas | efried: and I'm super afraid of us trying to decide priorities based on biased arguments | 16:07 |
efried | what do you mean, mandate? | 16:07 |
efried | and what people? | 16:07 |
*** mkrai__ has quit IRC | 16:08 | |
efried | and yes, I agree, it's tough to coordinate priorities | 16:08 |
bauzas | efried: I think I mixed two things | 16:08 |
efried | bauzas: but it's a fact that we approve more than we can hope to accomplish | 16:08 |
bauzas | you were mentioning the stable rules with melwitt's BP, I was thinking of the new spec process with the Direction field | 16:09 |
*** damien_r has quit IRC | 16:09 | |
efried | and IMO it's better to cut things off, even if it's completely arbitrary (like a hard number, picked at random) than to just meander along and have no idea at the start of the release what's got a chance of merging by the end of the release. | 16:09 |
efried | that's what's been motivating me from day one. | 16:09 |
bauzas | efried: you'd probably be surprised if I told you I don't see a problem with having more things approved than we can accomplish | 16:10 |
efried | I wouldn't say I'm surprised. The fact that nobody seems to mind is why we are where we are. | 16:10 |
melwitt | as a person trying to get my things done, I'd rather have some chance than no chance, but that's just MHO | 16:11 |
bauzas | right | 16:11 |
*** gbarros has quit IRC | 16:11 | |
bauzas | and having some way to help contributors to understand the dynamics improve the situation | 16:11 |
bauzas | improves* | 16:11 |
efried | Mine can't be the only downstream that yells at me because "what do you mean it didn't get reviewed? The blueprint was approved!" | 16:11 |
sean-k-mooney | no but that has always been a thing | 16:12 |
efried | which is a perfect non-argument for continuing to allow it. | 16:12 |
bauzas | efried: that's one of the reasons why I don't want our upstream process with specs to be an OKR for my management | 16:12 |
efried | okr? | 16:12 |
dansmith | efried: there are more reasons for things not getting reviewed than bandwidth or over-committing | 16:12 |
bauzas | objective key result | 16:12 |
bauzas | I mean, I don't wanna brag because my spec is approved | 16:13 |
efried | dansmith: understood and acknowledged in the ML Thread of Doom. | 16:13 |
sean-k-mooney | well my point was the bottleneck has never been writing the code. it has always been reviewing it. we can reduce the scope for new features but i know that that leads to less investment in openstack | 16:13 |
bauzas | if I were bragging, that would be because I feel we got a consensus on the design we gonna achieve for the thing I wanna implement | 16:13 |
bauzas | but certainly not implying that my stuff is done | 16:14 |
bauzas | anyway, I need to call it a day | 16:15 |
bauzas | things change, people become parents of kids who grown up and have social activities | 16:15 |
bauzas | grow* | 16:15 |
bauzas | and since stupidly kids under 8 can't drive, I need to go AWOL | 16:15 |
sean-k-mooney | bauzas: don't sell your kids short. have you given them the opportunity to try? what could possibly go wrong | 16:16 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Delete source allocations in move_allocations if target no longer exists https://review.opendev.org/689049 | 16:16 |
*** gbarros has joined #openstack-nova | 16:17 | |
mriedem | a wild idea: more cores should do more reviews | 16:20 |
mriedem | *gasp* | 16:20 |
mriedem | https://www.stackalytics.com/report/contribution/nova/120 | 16:20 |
*** jbernard has quit IRC | 16:20 | |
mriedem | who is <1 review per day on that list? | 16:20 |
*** udesale has quit IRC | 16:22 | |
efried | 1/3 of the core team. | 16:22 |
mriedem | right, | 16:22 |
mriedem | so if you're a core below that threshold, stop complaining | 16:23 |
*** ociuhandu_ has joined #openstack-nova | 16:23 | |
efried | to be fair, I don't see those cores complaining. | 16:23 |
efried | Actually, I think I'm the only one complaining. | 16:23 |
mriedem | i see bauzas and melwitt complaining above | 16:23 |
*** jbernard has joined #openstack-nova | 16:23 | |
melwitt | my comment was not intended as a complaint | 16:24 |
efried | fwiw I saw both as stating reasons for preferring the status quo wrt approving more than we can hope to review. | 16:25 |
dansmith | yeah, that was my understanding as well | 16:25 |
dansmith | both melwitt and bauzas have been around since we've tried many similar schemes in the past too | 16:26 |
dansmith | as have I and mriedem | 16:26 |
*** ociuhandu has quit IRC | 16:26 | |
melwitt | I've not argued or voted on any of the new process things because I am in a difficult spot these days with upstream review time. I said one sentence from the perspective of being a contributor. I didn't want anyone to see it as complaining from me | 16:26 |
*** markvoelker has quit IRC | 16:26 | |
*** rpittau is now known as rpittau|afk | 16:27 | |
*** nanzha has quit IRC | 16:27 | |
*** ociuhandu_ has quit IRC | 16:30 | |
bauzas | folks, I was on and off last cycles, and I promised too much so now I'm done with this | 16:32 |
*** maciejjozefczyk has joined #openstack-nova | 16:32 | |
bauzas | what I just want is helping others as much as I can | 16:32 |
efried | There was a request for a feature liaison earlier :) | 16:33 |
bauzas | so, I'm glad mriedem pings me asking to review stable changes, for example, or with spec review requests | 16:33 |
efried | http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-10-17.log.html#t2019-10-17T04:14:06 | 16:33 |
bauzas | and then, if I can commit myself, I do | 16:33 |
*** dtantsur is now known as dtantsur|afk | 16:37 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: functional: Make '_wait_for_state_change' behave consistently https://review.opendev.org/689180 | 16:39 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: functional: Unify '_wait_until_deleted' implementations https://review.opendev.org/689181 | 16:39 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: functional: Make 'ServerTestBase' subclass 'InstanceHelperMixin' https://review.opendev.org/689182 | 16:39 |
*** lpetrut has joined #openstack-nova | 16:39 | |
*** lpetrut has quit IRC | 16:40 | |
*** lpetrut has joined #openstack-nova | 16:40 | |
eandersson | sean-k-mooney, the issue is oslo messaging related btw, or maybe rabbitmq related. | 16:45 |
*** sapd1 has quit IRC | 16:46 | |
eandersson | I tried to publish and consume to the specific compute queues and everything worked fine | 16:46 |
eandersson | I even captured the message last night from the scheduler | 16:46 |
eandersson | but after deleting the queues and restarting the compute, magically it's working | 16:46 |
*** derekh has quit IRC | 16:58 | |
sean-k-mooney | that makes me think of an oslo bug | 17:00 |
sean-k-mooney | eandersson: https://bugs.launchpad.net/oslo.messaging/+bug/1661510 | 17:01 |
openstack | Launchpad bug 1661510 in oslo.messaging "topic_send may loss messages if the queue not exists" [Medium,In progress] - Assigned to Gabriele Santomaggio (gsantomaggio) | 17:01 |
sean-k-mooney | i think that's the one i'm thinking of | 17:01 |
*** lpetrut has quit IRC | 17:01 | |
*** markvoelker has joined #openstack-nova | 17:05 | |
sean-k-mooney | melwitt: do you remember this oslo change that introduced the mandatory flag for rabbitmq? | 17:05 |
sean-k-mooney | https://review.opendev.org/#/c/660373/ | 17:05 |
melwitt | a little bit, yeah | 17:06 |
sean-k-mooney | melwitt: do you recall if we ever started using it in nova? | 17:06 |
melwitt | not that I know of | 17:06 |
sean-k-mooney | i would guess the nova bug for losing messages is still open | 17:06 |
sean-k-mooney | it's possible that is what eandersson hit | 17:07 |
*** amodi has joined #openstack-nova | 17:07 | |
*** tbachman_ has joined #openstack-nova | 17:10 | |
melwitt | I don't recall a nova bug for this one | 17:10 |
melwitt | in lp | 17:10 |
sean-k-mooney | i think there was one but my search-fu is failing | 17:10 |
sean-k-mooney | is there a way to search for bugs you commented on? | 17:11 |
*** jangutter_ has joined #openstack-nova | 17:11 | |
*** tbachman has quit IRC | 17:11 | |
*** tbachman_ is now known as tbachman | 17:11 | |
sean-k-mooney | oh there is | 17:12 |
melwitt | here's the IRC convo about it from the same day we commented on the oslo.messaging review http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-06-06.log.html#t2019-06-06T22:47:09 | 17:12 |
sean-k-mooney | oh that's clever, i would not have thought of that. i remembered it was mnaser that hit it | 17:13 |
*** jangutter has quit IRC | 17:14 | |
melwitt | rest of the convo is here and I don't see any nova lp bug mentioned other than an old one that got no additional info on it | 17:15 |
melwitt | http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-06-07.log.html | 17:15 |
sean-k-mooney | ok so there wasn't a specific nova bug but there were 3 related bugs | 17:15 |
sean-k-mooney | https://bugs.launchpad.net/nova/+bug/1794706 | 17:15 |
openstack | Launchpad bug 1794706 in OpenStack Compute (nova) "The instance left stuck when oslo.messaging raised MessageDeliveryFailure exception" [Undecided,Expired] | 17:15 |
sean-k-mooney | https://bugs.launchpad.net/oslo.messaging/+bug/1437955 | 17:15 |
openstack | Launchpad bug 1437955 in oslo.messaging "RPC calls and responses do not use the mandatory flag (AMQP)" [Wishlist,Confirmed] - Assigned to Gabriele Santomaggio (gsantomaggio) | 17:15 |
melwitt | yeah | 17:15 |
sean-k-mooney | https://bugs.launchpad.net/oslo.messaging/+bug/1661510 | 17:15 |
openstack | Launchpad bug 1661510 in oslo.messaging "topic_send may loss messages if the queue not exists" [Medium,In progress] - Assigned to Gabriele Santomaggio (gsantomaggio) | 17:15 |
*** maciejjozefczyk has quit IRC | 17:15 | |
sean-k-mooney | eandersson: so i think you were hitting the same issue as mnaser | 17:16 |
*** awalende has quit IRC | 17:16 | |
*** jangutter_ has quit IRC | 17:16 | |
*** awalende has joined #openstack-nova | 17:16 | |
sean-k-mooney | melwitt: thanks :) | 17:17 |
melwitt | :) | 17:17 |
sean-k-mooney | should i unexpire https://bugs.launchpad.net/nova/+bug/1794706 by the way | 17:18 |
openstack | Launchpad bug 1794706 in OpenStack Compute (nova) "The instance left stuck when oslo.messaging raised MessageDeliveryFailure exception" [Undecided,Expired] | 17:18 |
sean-k-mooney | i guess there is no point, we don't have the info we need | 17:18 |
*** awalende has quit IRC | 17:21 | |
eandersson | Yea that sounds like the exact issue | 17:22 |
eandersson | btw another major issue with this is that unless you know what you are doing it's very difficult to identify the bad compute | 17:23 |
melwitt | sean-k-mooney: yeah, I'd say don't bother, it's from 2017 and not enough info to move forward, unless I've missed something | 17:23 |
melwitt | oh, nvm 2018. I can't read | 17:23 |
melwitt | sean-k-mooney: if you wanted to unexpire and add more info to it based on eandersson experience (if it's the same thing) then I think that makes sense | 17:27 |
sean-k-mooney | eandersson: yes. would you feel comfortable writing this up as a bug | 17:27 |
sean-k-mooney | melwitt: it's not exactly the same thing as the old nova bug | 17:27 |
sean-k-mooney | but i think it's the same oslo bug | 17:27 |
melwitt | ack | 17:28 |
sean-k-mooney | which is what mnaser was hitting | 17:28 |
eandersson | I am pretty sure what happened here was that we had a rabbitmq network partition weeks ago | 17:28 |
eandersson | and that partition somehow damaged the queue | 17:28 |
sean-k-mooney | yep | 17:28 |
eandersson | but only from openstack perspective, because I could consume the queue using the ui etc. | 17:28 |
sean-k-mooney | basically if the queue gets deleted | 17:28 |
sean-k-mooney | and you don't restart the nova-compute agent | 17:28 |
sean-k-mooney | then it won't recreate it | 17:28 |
eandersson | and conductor | 17:28 |
eandersson | yea | 17:28 |
eandersson | the weird thing is that it existed in a "healthy" state with the bindings etc | 17:29 |
melwitt | and if we leverage the earlier mentioned oslo.messaging change, we can make it recover in nova? | 17:29 |
eandersson | If I can find another bad queue I can test | 17:29 |
sean-k-mooney | i think so | 17:29 |
melwitt | kewl | 17:30 |
sean-k-mooney | i think either the conductor or compute node would get an exception if they tried to do a topic send to the queue and that would allow us to fix it by recreating the queues | 17:30 |
openstackgerrit | Merged openstack/nova stable/queens: lxc: make use of filter python3 compatible https://review.opendev.org/676500 | 17:30 |
openstackgerrit | Merged openstack/nova master: Make sure tox install requirements.txt with upper-constraints https://review.opendev.org/689152 | 17:31 |
sean-k-mooney | melwitt: i would need to fully re-read the oslo feature but they had a recovery mechanism in mind | 17:31 |
melwitt | sean-k-mooney: yeah. if I'm remembering right, they were saying just setting an option called 'mandatory' would make it do the things by itself | 17:32 |
*** mdbooth has quit IRC | 17:32 | |
melwitt | and didn't want to change the default to mandatory=1/true, so they added this config option passing mechanism. and we just needed to pass 'mandatory' to it in nova | 17:33 |
melwitt | something like that | 17:33 |
sean-k-mooney | ya something like that | 17:33 |
sean-k-mooney | https://bugs.launchpad.net/oslo.messaging/+bug/1437955 | 17:33 |
openstack | Launchpad bug 1437955 in oslo.messaging "RPC calls and responses do not use the mandatory flag (AMQP)" [Wishlist,Confirmed] - Assigned to Gabriele Santomaggio (gsantomaggio) | 17:33 |
sean-k-mooney | this kind of explains it but i think there was more in the review | 17:33 |
* melwitt nods | 17:33 | |
*** mdbooth has joined #openstack-nova | 17:34 | |
sean-k-mooney | while this is only the 2nd time someone has reported this in the last 4 months, it is really hard to triage/identify this issue, so we probably should try to fix this. i guess we should follow up with the oslo folks and see if it's ready to use | 17:36 |
*** lpetrut has joined #openstack-nova | 17:37 | |
sean-k-mooney | i'm surprised more people have not hit this to be honest | 17:37 |
*** ociuhandu has joined #openstack-nova | 17:38 | |
sean-k-mooney | maybe they have and just rebooted the node | 17:38 |
melwitt | yeah, should just be a matter of bumping our minimum oslo.messaging version and finding where to set the mandatory flag | 17:41 |
melwitt | that doesn't help for stable branches obvs. not sure if we have any options there | 17:42 |
melwitt | I can give it a go unless someone else wants to | 17:42 |
melwitt | looks like the change was released in version 9.8.0 | 17:43 |
melwitt | thinking more, there's no way we could have this for stable branches | 17:46 |
*** psachin has quit IRC | 17:46 | |
melwitt | and also, how to reproduce and prove the mandatory flag fixes anything | 17:46 |
*** dpawlik has joined #openstack-nova | 17:48 | |
*** dpawlik has quit IRC | 17:53 | |
*** mriedem has quit IRC | 17:55 | |
*** mriedem has joined #openstack-nova | 17:56 | |
*** jawad_axd has quit IRC | 17:59 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'os-consoles' API https://review.opendev.org/687907 | 18:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'nova-console' service, 'os-consoles' API https://review.opendev.org/687908 | 18:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove 'nova-xvpvncproxy' https://review.opendev.org/687909 | 18:05 |
*** jawad_axd has joined #openstack-nova | 18:11 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: functional: Change order of two classes https://review.opendev.org/689178 | 18:12 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: functional: Rework '_delete_server' https://review.opendev.org/689179 | 18:12 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: functional: Make '_wait_for_state_change' behave consistently https://review.opendev.org/689180 | 18:12 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: functional: Unify '_wait_until_deleted' implementations https://review.opendev.org/689181 | 18:12 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: functional: Make 'ServerTestBase' subclass 'InstanceHelperMixin' https://review.opendev.org/689182 | 18:12 |
*** ociuhandu has quit IRC | 18:13 | |
mnaser | eandersson: i have hit that exact issue many times :( | 18:17 |
mnaser | it really has to do with rabbitmq being unhappy when the queues come back | 18:18 |
mnaser | curious -- what version of it are you running? | 18:18 |
*** ociuhandu has joined #openstack-nova | 18:26 | |
*** lbragsta_ has quit IRC | 18:29 | |
eandersson | 3.7.5 | 18:31 |
*** zbr__ has quit IRC | 18:32 | |
*** zbr has joined #openstack-nova | 18:34 | |
mordred | stephenfin: did y'all sort out the requirements thing from earlier/ | 18:37 |
mordred | ? | 18:37 |
dansmith | mordred: with the networx thing or whatever? | 18:38 |
dansmith | mordred: https://review.opendev.org/#/c/689152/ | 18:38 |
mordred | dansmith: cool! | 18:40 |
*** ceryx has joined #openstack-nova | 18:41 | |
*** spatel has joined #openstack-nova | 18:44 | |
*** lpetrut has quit IRC | 18:46 | |
*** jawad_axd has quit IRC | 18:50 | |
*** dpawlik has joined #openstack-nova | 18:51 | |
*** ociuhandu has quit IRC | 18:52 | |
*** dpawlik has quit IRC | 18:55 | |
*** tbachman has quit IRC | 19:01 | |
*** jangutter has joined #openstack-nova | 19:06 | |
*** tesseract has quit IRC | 19:07 | |
eandersson | sean-k-mooney, mnaser > Message not delivered: NO_ROUTE (312) to queue 'compute.<compute_name>' from exchange 'nova' | 19:07 |
eandersson | This is the actual issue | 19:07 |
eandersson | It's a super odd bug because the queue is fine, and the compute can consume from it | 19:07 |
eandersson | but the binding (that is 100% there) simply does not work | 19:07 |
*** maciejjozefczyk has joined #openstack-nova | 19:08 | |
eandersson | if confirm-deliveries isn't enabled, which might be the problem, amqp isn't going to raise an error | 19:09 |
eandersson | mandatory flag isn't enough afaik | 19:09 |
eandersson | mandatory + confirm_deliveries is what I used | 19:10 |
eandersson | http://paste.openstack.org/show/784547/ | 19:12 |
openstackgerrit | Merged openstack/nova master: Add functional recreate test for bug 1848343 https://review.opendev.org/688980 | 19:13 |
openstack | bug 1848343 in OpenStack Compute (nova) "Reverting migration-based allocations leaks allocations if the server is deleted" [Medium,In progress] https://launchpad.net/bugs/1848343 - Assigned to Matt Riedemann (mriedem) | 19:13 |
eandersson | * http://paste.openstack.org/show/784548/ | 19:14 |
openstackgerrit | Merged openstack/nova master: Add live migration recreate test for bug 1848343 https://review.opendev.org/688994 | 19:14 |
openstackgerrit | Merged openstack/nova master: Add compute side revert allocation test for bug 1848343 https://review.opendev.org/689013 | 19:17 |
openstack | bug 1848343 in OpenStack Compute (nova) "Reverting migration-based allocations leaks allocations if the server is deleted" [Medium,In progress] https://launchpad.net/bugs/1848343 - Assigned to Matt Riedemann (mriedem) | 19:17 |
*** maciejjozefczyk has quit IRC | 19:24 | |
*** tbachman has joined #openstack-nova | 19:29 | |
*** tbachman has quit IRC | 19:37 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Delete source allocations in move_allocations if target no longer exists https://review.opendev.org/689049 | 19:44 |
*** awalende has joined #openstack-nova | 19:46 | |
*** ralonsoh has quit IRC | 19:47 | |
*** pcaruana has quit IRC | 19:49 | |
*** maciejjozefczyk has joined #openstack-nova | 19:49 | |
*** awalende has quit IRC | 19:51 | |
*** mlavalle has quit IRC | 19:55 | |
*** mlavalle has joined #openstack-nova | 19:55 | |
*** mgariepy has quit IRC | 19:56 | |
*** maciejjozefczyk has quit IRC | 19:58 | |
*** bnemec has quit IRC | 20:03 | |
*** maciejjozefczyk has joined #openstack-nova | 20:04 | |
*** bnemec has joined #openstack-nova | 20:05 | |
mriedem | easy couple of patches to fix an upgrade issue since stein https://review.opendev.org/#/q/topic:bug/1824435+(status:open+OR+status:merged) | 20:06 |
*** ricolin_ has joined #openstack-nova | 20:06 | |
*** ricolin has quit IRC | 20:09 | |
* efried chauffeurs *again* | 20:19 | |
*** efried is now known as efried_afk | 20:19 | |
*** maciejjozefczyk has quit IRC | 20:19 | |
*** jangutter has quit IRC | 20:23 | |
*** igordc has quit IRC | 20:26 | |
*** nweinber has quit IRC | 20:30 | |
*** gbarros has quit IRC | 20:37 | |
*** maciejjozefczyk has joined #openstack-nova | 20:41 | |
*** mriedem has quit IRC | 20:49 | |
*** eharney has quit IRC | 20:49 | |
eandersson | mnaser, https://github.com/rabbitmq/rabbitmq-server/pull/1884#issuecomment-464277810 | 20:51 |
eandersson | I think this is the issue | 20:51 |
*** dpawlik has joined #openstack-nova | 20:51 | |
*** spatel has quit IRC | 20:52 | |
eandersson | https://github.com/rabbitmq/rabbitmq-server/pull/1879 | 20:52 |
*** dpawlik has quit IRC | 20:56 | |
*** mlavalle has quit IRC | 20:56 | |
*** mlavalle has joined #openstack-nova | 20:56 | |
*** igordc has joined #openstack-nova | 20:58 | |
*** trident has quit IRC | 20:58 | |
*** _mlavalle_1 has joined #openstack-nova | 20:59 | |
*** _mlavalle_1 has quit IRC | 20:59 | |
melwitt | if that's the case, then all that should be needed is upgrade rabbit to 3.7 and no need to consume the mandatory flag option in nova, iiuc | 20:59 |
melwitt | oh wait, according to the backscroll, need both the flag and the fix in 3.7 then sounds like | 21:02 |
*** mlavalle has quit IRC | 21:02 | |
melwitt | eandersson: can you correct me pls? ^ | 21:03 |
*** igordc has quit IRC | 21:03 | |
melwitt | just trying to understand whether we need to do anything in nova | 21:03 |
*** trident has joined #openstack-nova | 21:04 | |
*** panda has quit IRC | 21:08 | |
*** igordc has joined #openstack-nova | 21:09 | |
*** lpetrut has joined #openstack-nova | 21:11 | |
*** panda has joined #openstack-nova | 21:11 | |
*** gbarros has joined #openstack-nova | 21:11 | |
*** gbarros has quit IRC | 21:13 | |
*** lbragstad has quit IRC | 21:14 | |
*** lpetrut has quit IRC | 21:17 | |
*** gyee has quit IRC | 21:20 | |
*** lbragstad has joined #openstack-nova | 21:24 | |
*** mlavalle has joined #openstack-nova | 21:30 | |
*** tbachman has joined #openstack-nova | 21:39 | |
*** rcernin has joined #openstack-nova | 21:39 | |
*** markvoelker has quit IRC | 21:39 | |
*** maciejjozefczyk has quit IRC | 21:42 | |
openstackgerrit | Matthew Booth proposed openstack/nova master: Add new test base for libvirt functional tests https://review.opendev.org/689186 | 21:46 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Unplug VIFs as part of cleanup of networks https://review.opendev.org/663382 | 21:46 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Fix incorrect instance state after build failure https://review.opendev.org/689278 | 21:46 |
*** jmlowe has joined #openstack-nova | 21:46 | |
eandersson | melwitt, so the queue can still get stuck in a bad state. It happens. | 21:52 |
eandersson | So having that extra layer of protection is still good. | 21:52 |
eandersson | (e.g. network partitions etc) | 21:53 |
melwitt | eandersson: sorry, what I mean is, what do we need to do in nova, if anything? pass the mandatory flag? pass the mandatory flag and the confirm_deliveries flag? or is nothing needed? | 21:53 |
*** rcernin has quit IRC | 21:53 | |
eandersson | I think this probably has to be done on the oslo.messaging side | 21:53 |
melwitt | I wasn't sure whether your findings meant a change in the plan for | 21:54 |
melwitt | nova | 21:54 |
eandersson | but I still feel like a vm shouldn't be able to stay in a state like that indefinitely | 21:54 |
eandersson | maybe that could be changed to an rpc call instead? | 21:55 |
melwitt | no, it shouldn't. sorry, somehow what I'm saying is not coming across. you posted links about rabbit bugs earlier, so I wasn't sure if you were saying it's a rabbit bug and there's nothing we can do to help in nova | 21:55 |
*** slaweq has quit IRC | 21:56 | |
melwitt | this is the change they made in oslo.messaging to allow people (like nova) to pass flags https://review.opendev.org/660373 | 21:57 |
eandersson | Yea - if that is exposed I would add mandatory to these calls in nova. | 21:58 |
melwitt | ok, cool | 21:59 |
eandersson | It will help protect against a bunch of potential RabbitMQ issues | 22:02 |
melwitt | thanks for clarifying that. I'll find how to add the flag on the nova side | 22:03 |
*** rcernin has joined #openstack-nova | 22:09 | |
*** ricolin_ has quit IRC | 22:10 | |
*** mmethot_ has quit IRC | 22:13 | |
*** tbachman has quit IRC | 22:13 | |
*** tbachman has joined #openstack-nova | 22:15 | |
*** spatel has joined #openstack-nova | 22:24 | |
eandersson | melwitt, some information on mandatory here https://www.rabbitmq.com/reliability.html#routing | 22:27 |
*** rcernin has quit IRC | 22:27 | |
*** rcernin has joined #openstack-nova | 22:27 | |
*** spatel has quit IRC | 22:29 | |
*** adriant has quit IRC | 22:46 | |
melwitt | eandersson: thx | 22:50 |
*** dpawlik has joined #openstack-nova | 22:52 | |
*** dpawlik has quit IRC | 22:56 | |
*** TxGirlGeek has quit IRC | 23:00 | |
*** xek__ has quit IRC | 23:01 | |
*** TxGirlGeek has joined #openstack-nova | 23:02 | |
*** tkajinam has joined #openstack-nova | 23:11 | |
openstackgerrit | Merged openstack/nova stable/stein: Stop sending bad values from libosinfo to libvirt https://review.opendev.org/688067 | 23:17 |
openstackgerrit | Merged openstack/nova stable/queens: Add functional recreate test for regression bug 1825537 https://review.opendev.org/675355 | 23:17 |
openstack | bug 1825537 in OpenStack Compute (nova) queens "finish_resize failures incorrectly revert allocations" [Medium,In progress] https://launchpad.net/bugs/1825537 - Assigned to Matt Riedemann (mriedem) | 23:17 |
*** mlavalle has quit IRC | 23:27 | |
*** TxGirlGeek has quit IRC | 23:36 | |
*** markvoelker has joined #openstack-nova | 23:40 | |
*** markvoelker has quit IRC | 23:45 | |
*** nweinber has joined #openstack-nova | 23:58 |
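
The NO_ROUTE discussion above (eandersson, 19:07 to 19:10) turns on what the AMQP `mandatory` flag changes when a publish cannot be routed. What follows is a minimal in-memory sketch of that behaviour, not RabbitMQ or oslo.messaging code. `ToyDirectExchange`, `MessageNotDelivered`, and `unbind_all` are hypothetical names made up for illustration. A real broker hands the unroutable message back asynchronously via `basic.return`, and per eandersson's comment the client also needs delivery confirmations enabled before it surfaces an error. The toy collapses both steps into a single exception.

```python
from dataclasses import dataclass, field
from typing import Dict, List

NO_ROUTE = 312  # AMQP 0-9-1 reply code for an unroutable mandatory message


class MessageNotDelivered(Exception):
    """Toy stand-in for the error a confirming publisher would surface."""


@dataclass
class ToyDirectExchange:
    """In-memory model of a direct exchange; not a RabbitMQ client."""

    name: str
    # routing key -> queues bound under that key (each queue is a plain list)
    bindings: Dict[str, List[list]] = field(default_factory=dict)

    def bind(self, routing_key: str, queue: list) -> None:
        self.bindings.setdefault(routing_key, []).append(queue)

    def unbind_all(self, routing_key: str) -> None:
        # Models a binding that stopped routing, e.g. after a partition.
        self.bindings.pop(routing_key, None)

    def publish(self, routing_key: str, body: str, mandatory: bool = False) -> bool:
        queues = self.bindings.get(routing_key, [])
        if queues:
            for queue in queues:
                queue.append(body)
            return True
        if mandatory:
            # With the flag set, the publisher learns the message had no route.
            raise MessageNotDelivered(
                f"Message not delivered: NO_ROUTE ({NO_ROUTE}) to queue "
                f"'{routing_key}' from exchange '{self.name}'"
            )
        return False  # without the flag the message is dropped silently


if __name__ == "__main__":
    nova = ToyDirectExchange("nova")
    compute_queue: list = []
    nova.bind("compute.host1", compute_queue)
    nova.publish("compute.host1", "first cast")   # routed normally
    nova.unbind_all("compute.host1")              # binding goes bad
    nova.publish("compute.host1", "second cast")  # lost, nobody notices
    try:
        nova.publish("compute.host1", "third cast", mandatory=True)
    except MessageNotDelivered as exc:
        print(exc)
```

In this model the second cast is the stuck-instance case from the log: the sender sees no failure and the compute never receives the message. The third cast shows why passing the flag gives the sender something to react to, for example by recreating the queue as sean-k-mooney suggested.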