| chandankumar | jgilaber_: Hello, I have tried to reproduce this bug https://bugs.launchpad.net/openstack-cyborg/+bug/2017513 but from the bug description, I am not sure whether the user is removing the PCI device from the config or from the host directly, like unplugging the device. | 08:50 |
|---|---|---|
| chandankumar | In my reproducer, I removed the PCI device from the config and then a similar traceback shows up in the cyborg-agent log. | 08:51 |
| chandankumar | Feel free to take a look at the bug and triage accordingly. Thank you! | 08:51 |
| *** jgilaber_ is now known as jgilaber | 08:54 | |
| jgilaber | I understood it originally as unplugging the device | 08:54 |
| jgilaber | thanks for the reproducer! | 08:56 |
| jgilaber | I'm not sure what would be the best way to handle such cases | 08:57 |
| chandankumar | one more thing, if we delete the device profile, you can see the error message, and it is also not clear for the end user | 08:57 |
| jgilaber | ideally we would probably unbind the arq and send a new bind request to nova if an equivalent free device exists | 08:57 |
| jgilaber | the agent automatically deletes the device profile? | 08:58 |
| chandankumar | let me try with unbind and see what happens | 09:00 |
| chandankumar | jgilaber: I unbound the device https://paste.openstack.org/raw/bTWU8NVSsBOdJjRNbym3/ but in cyborg, hotplug of a new PCI device into a running instance is not supported | 10:19 |
| chandankumar | we can create the bind but the VM will not have the device inside it | 10:19 |
| chandankumar | or I misunderstood the "send a new bind request to nova if an equivalent free device exists" part | 10:20 |
| jgilaber | hmm I guess that with the current integration any change requires a resize | 10:25 |
| sean-k-mooney | correct it does | 10:26 |
| jgilaber | to be clear my previous comment was me speculating how we could address the bug, I did not expect that was currently supported | 10:26 |
| sean-k-mooney | device profiles should not be modified if they are referenced by a flavor | 10:26 |
| sean-k-mooney | and the only way to change the allocation to a vm is via resize | 10:26 |
| sean-k-mooney | i am still catching up on email from being away for a few days so we can chat more about this tomorrow | 10:27 |
| chandankumar | sure | 10:27 |
| jgilaber | sure no problem sean-k-mooney. For context, we're talking about an old bug we discussed briefly in yesterday's meeting | 10:28 |
| jgilaber | https://bugs.launchpad.net/openstack-cyborg/+bug/2017513 | 10:28 |
| sean-k-mooney | python 3.6 :) | 10:28 |
| sean-k-mooney | ya that's been a while | 10:29 |
| sean-k-mooney | this looks similar to a very old nova bug | 10:29 |
| sean-k-mooney | i think the pci driver is likely missing the protection we have on the nova side but ya we can revisit this once i'm back up to speed | 10:30 |
| sean-k-mooney | https://github.com/openstack/nova/commit/26c41eccade6412f61f9a8721d853b545061adcc https://github.com/openstack/nova/commit/284ea72e96604bdf16d1c5c4db47247334841b2f https://github.com/openstack/nova/commit/0208be629c3853863bcd49b8bdbe2b9889b85012 https://github.com/openstack/nova/commit/f37cdf0c4182103ad81dbf39188ff39955da3850 | 14:04 |
| sean-k-mooney | those are the nova patches related to the issue reported in https://bugs.launchpad.net/openstack-cyborg/+bug/2017513 | 14:04 |
| sean-k-mooney | https://bugs.launchpad.net/nova/+bug/1633120 https://bugs.launchpad.net/nova/+bug/1969496 and https://bugs.launchpad.net/nova/+bug/2115905 are the related nova bugs we had | 14:05 |
| sean-k-mooney | the tl;dr is if a device is referenced by an ARQ and that device is not in the whitelist or available on the host anymore, we cannot remove the device from the db or placement until that ARQ is deleted, and we should not do that automatically | 14:07 |
| sean-k-mooney | the admin needs to move or delete the vm or re-add the device | 14:07 |
| sean-k-mooney | we should complain very very loudly in the logs when the compute agent starts up in a misconfigured state but we don't generally want to make that an agent startup failure as that's a potential DoS vector if we do | 14:08 |
| jgilaber | I'm not sure from the bug report if it actually crashes the agent or it just logs the traceback in an ugly way | 14:23 |
| jgilaber | Chandan reproduced the error, we can ask him later/tomorrow | 14:23 |
| sean-k-mooney | in either case this should be handled gracefully and we should not attempt to delete the resource provider | 14:30 |
| sean-k-mooney | trying to delete it if it has allocations against it is a bug in our internal logic | 14:30 |
| sean-k-mooney | the protections in placement are really the last line of defence | 14:30 |
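
The rule sean-k-mooney describes above (never remove a device that is still referenced by an ARQ, log loudly instead, and do not fail agent startup) can be sketched roughly as follows. This is only an illustration under stated assumptions: `TrackedDevice`, `arq_uuids`, and `plan_stale_device_cleanup` are hypothetical names made up for this sketch and are not Cyborg's or Nova's actual objects or API.

```python
import logging
from dataclasses import dataclass

LOG = logging.getLogger(__name__)


@dataclass(frozen=True)
class TrackedDevice:
    """Hypothetical stand-in for a device record kept in the database."""
    address: str            # e.g. a PCI address such as "0000:3b:00.0"
    arq_uuids: tuple = ()   # ARQs currently bound to this device (assumed field)


def plan_stale_device_cleanup(tracked, discovered_addresses):
    """Return (removable, retained) for devices the host no longer reports.

    Devices still referenced by an ARQ are retained and logged, never removed,
    and no exception is raised, so the agent can keep starting up.
    """
    removable, retained = [], []
    for dev in tracked:
        if dev.address in discovered_addresses:
            continue  # still present and allowed by config, nothing to do
        if dev.arq_uuids:
            LOG.error(
                "Device %s is missing from the host or config but is still "
                "bound to ARQ(s) %s; keeping its records. Move or delete "
                "the instance, or re-add the device.",
                dev.address, ", ".join(dev.arq_uuids))
            retained.append(dev)
        else:
            removable.append(dev)
    return removable, retained
```

In this sketch only the `removable` list would be passed on to whatever code deletes database rows or resource providers; the `retained` devices stay untouched until an admin resolves the misconfiguration.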