*** mriedem has joined #openstack-nova | 00:10 | |
mriedem | dansmith: a few comments in your reno | 00:11 |
---|---|---|
*** mriedem has quit IRC | 00:20 | |
*** tbachman has joined #openstack-nova | 00:20 | |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Add Cyborg device profile groups to request spec. https://review.opendev.org/631243 | 00:36 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Define Cyborg ARQ binding notification event. https://review.opendev.org/692707 | 00:36 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Create and bind Cyborg ARQs. https://review.opendev.org/631244 | 00:36 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Pass accelerator requests to each virt driver from compute manager. https://review.opendev.org/698581 | 00:36 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Compose accelerator PCI devices into domain XML in libvirt driver. https://review.opendev.org/631245 | 00:36 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted. https://review.opendev.org/673735 | 00:36 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable hard/soft reboot with accelerators. https://review.opendev.org/697940 | 00:36 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable start/stop of instances with accelerators. https://review.opendev.org/699553 | 00:36 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Enable and use COMPUTE_ACCELERATORS trait. https://review.opendev.org/699554 | 00:36 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Bump compute rpcapi version and reduce Cyborg calls. https://review.opendev.org/704227 | 00:36 |
openstackgerrit | Sundar Nadathur proposed openstack/nova master: Add cyborg tempest job. https://review.opendev.org/670999 | 00:36 |
*** damien_r has joined #openstack-nova | 00:48 | |
*** TxGirlGeek has quit IRC | 00:52 | |
*** damien_r has quit IRC | 01:04 | |
*** nweinber has joined #openstack-nova | 01:14 | |
*** damien_r has joined #openstack-nova | 01:15 | |
*** tbachman has quit IRC | 01:16 | |
*** damien_r has quit IRC | 01:18 | |
*** mlavalle has quit IRC | 01:57 | |
*** gyee has quit IRC | 02:03 | |
*** Liang__ has joined #openstack-nova | 02:12 | |
*** vishalmanchanda has joined #openstack-nova | 02:18 | |
*** Dinesh_Bhor has quit IRC | 02:21 | |
*** tbachman has joined #openstack-nova | 02:26 | |
*** brinzhang__ has joined #openstack-nova | 02:31 | |
*** brinzhang has joined #openstack-nova | 02:34 | |
*** brinzhang_ has quit IRC | 02:35 | |
*** brinzhang__ has quit IRC | 02:36 | |
*** nweinber has quit IRC | 02:54 | |
*** nweinber has joined #openstack-nova | 03:06 | |
*** mkrai has joined #openstack-nova | 03:12 | |
*** yikun has joined #openstack-nova | 03:30 | |
*** nweinber has quit IRC | 03:34 | |
*** psachin has joined #openstack-nova | 03:35 | |
*** mdbooth has quit IRC | 03:58 | |
*** mdbooth has joined #openstack-nova | 04:00 | |
*** tetsuro has quit IRC | 04:13 | |
*** tetsuro has joined #openstack-nova | 04:13 | |
*** damien_r has joined #openstack-nova | 04:47 | |
*** damien_r has quit IRC | 04:47 | |
*** damien_r has joined #openstack-nova | 04:47 | |
*** udesale has joined #openstack-nova | 04:49 | |
*** abhishekk|out is now known as abhishekk | 05:03 | |
*** gmann has quit IRC | 05:27 | |
*** evrardjp has quit IRC | 05:34 | |
*** evrardjp has joined #openstack-nova | 05:34 | |
*** ociuhandu has joined #openstack-nova | 05:53 | |
*** ociuhandu has quit IRC | 05:57 | |
*** owalsh has quit IRC | 06:01 | |
*** owalsh has joined #openstack-nova | 06:04 | |
*** Dinesh_Bhor has joined #openstack-nova | 06:08 | |
*** ratailor has joined #openstack-nova | 06:15 | |
*** Liang__ has quit IRC | 06:17 | |
*** dtantsur|afk is now known as dtantsur | 07:27 | |
*** dpawlik has joined #openstack-nova | 07:28 | |
*** lpetrut has joined #openstack-nova | 07:29 | |
*** lpetrut has quit IRC | 07:30 | |
*** lpetrut has joined #openstack-nova | 07:31 | |
*** ccamacho has joined #openstack-nova | 07:35 | |
*** dpawlik has quit IRC | 07:44 | |
*** dpawlik has joined #openstack-nova | 07:44 | |
*** slaweq__ has joined #openstack-nova | 08:09 | |
*** amoralej|off is now known as amoralej | 08:09 | |
*** rpittau|afk is now known as rpittau | 08:10 | |
*** tkajinam has quit IRC | 08:12 | |
*** slaweq has joined #openstack-nova | 08:18 | |
*** iurygregory has joined #openstack-nova | 08:19 | |
*** slaweq__ has quit IRC | 08:19 | |
gibi | efried: ack, I will try to spend some time on the cyborg series today | 08:29 |
*** tesseract has joined #openstack-nova | 08:36 | |
*** ivve has joined #openstack-nova | 08:37 | |
*** ralonsoh has joined #openstack-nova | 08:41 | |
*** xek has joined #openstack-nova | 08:44 | |
*** tetsuro has quit IRC | 08:46 | |
*** tosky has joined #openstack-nova | 08:48 | |
*** Liang__ has joined #openstack-nova | 08:54 | |
*** mlycka has joined #openstack-nova | 09:05 | |
*** obondarev has joined #openstack-nova | 09:08 | |
*** obondarev has left #openstack-nova | 09:08 | |
*** david-lyle has quit IRC | 09:16 | |
*** david-lyle has joined #openstack-nova | 09:16 | |
*** martin_midolesov has joined #openstack-nova | 09:25 | |
*** mvkr has joined #openstack-nova | 09:38 | |
*** mlycka has quit IRC | 09:42 | |
*** ociuhandu has joined #openstack-nova | 09:45 | |
*** ratailor has quit IRC | 09:45 | |
*** derekh has joined #openstack-nova | 09:46 | |
*** ociuhandu has quit IRC | 09:51 | |
*** ociuhandu has joined #openstack-nova | 10:02 | |
*** abhishekk is now known as abhishekk|away | 10:11 | |
*** slaweq has quit IRC | 10:13 | |
*** slaweq has joined #openstack-nova | 10:15 | |
stephenfin | gibi, lyarwood: real quick, do either of you see anything obviously wrong in https://zuul.opendev.org/t/openstack/build/5b196c5a7cf944df8857a08dc15aa79f/console ? | 10:22 |
stephenfin | It's saying "TypeError: create_port_binding() missing 2 required positional arguments: 'port_id' and 'data'" | 10:22 |
stephenfin | But the line it's failing on is obviously passing those? | 10:22 |
stephenfin | "binding = client.create_port_binding(port_id, data)['binding']" | 10:22 |
stephenfin | gibi: Also, low priority but I updated a commit message on a patch you'd previously approved, if you have time to revisit https://review.opendev.org/705655 | 10:23 |
gibi | stephenfin: could you link the patch that produced such test result? | 10:24 |
stephenfin | gibi: https://review.opendev.org/#/c/706295/ | 10:24 |
gibi | interesting :) | 10:25 |
stephenfin | right? | 10:25 |
gibi | pulling it down | 10:25 |
gibi | trying to reproduce locally.. | 10:29 |
gibi | stephenfin: nova.tests.fixtures.NeutronFixture.get_port_binding has the signature get_port_binding(self, context, client, port_id, host), so when the code passes two positional args they fill only context and client, not the port and host | 10:34 |
stephenfin | gdi | 10:34 |
* stephenfin hangs head in shame | 10:34 | |
stephenfin | gibi++ Thanks /o\ :) | 10:34 |
gibi | no problem | 10:34 |
stephenfin | I'd been staring at that for an hour | 10:34 |
gibi | the error message was deeply misleading | 10:34 |
stephenfin | Yuuup. Bad coincidence :) | 10:35 |
gibi | yepp | 10:35 |
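A minimal sketch of the mismatch gibi describes above. The class and helper names are hypothetical, and the stub signature `(self, context, client, port_id, data)` is an assumption inferred from the error text rather than copied from nova's fixture. It shows how a stub with two extra leading parameters produces a "missing 2 required positional arguments" error even though the caller passes `port_id` and `data`.

```python
class FakeNeutronFixture:
    """Hypothetical stand-in for a test fixture; not nova's NeutronFixture."""

    def create_port_binding(self, context, client, port_id, data):
        # The stub expects two extra leading arguments compared with the
        # client method it replaces.
        return {'binding': dict(data, port_id=port_id)}


def bind(client, port_id, data):
    # The code under test calls the method with the real client's signature.
    return client.create_port_binding(port_id, data)['binding']


def reproduce():
    """Return the TypeError message raised by the mismatched call."""
    try:
        bind(FakeNeutronFixture(), 'port-1', {'host': 'compute1'})
    except TypeError as exc:
        return str(exc)
    return None


message = reproduce()
print(message)
```

The two positional values land in `context` and `client`, so Python reports the last two parameters as missing, which happen to carry the same names as the values the caller passed.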
*** priteau has joined #openstack-nova | 10:46 | |
*** vishalmanchanda has quit IRC | 10:50 | |
*** purplerbot has quit IRC | 10:50 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove universal wheel configuration https://review.opendev.org/706466 | 10:50 |
*** purplerbot has joined #openstack-nova | 10:51 | |
*** slaweq_ has joined #openstack-nova | 10:51 | |
*** slaweq has quit IRC | 10:53 | |
*** bauzas has quit IRC | 10:57 | |
*** bauzas has joined #openstack-nova | 10:57 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Add new default roles in os-instance-actions policies https://review.opendev.org/706470 | 11:02 |
lyarwood | stephenfin: sorry missed that, still need me to take a look? | 11:05 |
stephenfin | lyarwood: Nope, we sussed it | 11:05 |
lyarwood | wonderful | 11:05 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: WIP: Use neutronclient's port binding APIs https://review.opendev.org/706295 | 11:06 |
*** yikun has quit IRC | 11:09 | |
*** rpittau is now known as rpittau|bbl | 11:18 | |
*** mkrai has quit IRC | 11:31 | |
*** tbachman has quit IRC | 11:35 | |
*** udesale_ has joined #openstack-nova | 12:04 | |
*** udesale has quit IRC | 12:07 | |
*** gmann has joined #openstack-nova | 12:08 | |
*** slaweq__ has joined #openstack-nova | 12:10 | |
*** mkrai has joined #openstack-nova | 12:10 | |
*** slaweq_ has quit IRC | 12:11 | |
*** nicolasbock has joined #openstack-nova | 12:13 | |
*** ociuhandu has quit IRC | 12:27 | |
*** ociuhandu has joined #openstack-nova | 12:28 | |
*** ratailor has joined #openstack-nova | 12:36 | |
*** mkrai has quit IRC | 12:36 | |
*** ociuhandu has quit IRC | 12:42 | |
*** spatel has joined #openstack-nova | 12:43 | |
*** spatel has quit IRC | 12:50 | |
*** ratailor has quit IRC | 12:53 | |
*** damien_r has quit IRC | 13:06 | |
*** tbachman has joined #openstack-nova | 13:06 | |
*** ociuhandu has joined #openstack-nova | 13:08 | |
*** slaweq has joined #openstack-nova | 13:13 | |
*** rpittau|bbl is now known as rpittau | 13:14 | |
*** bbowen_ has joined #openstack-nova | 13:14 | |
*** bbowen has quit IRC | 13:15 | |
*** slaweq__ has quit IRC | 13:15 | |
*** hongbin has joined #openstack-nova | 13:15 | |
*** Luzi has joined #openstack-nova | 13:16 | |
*** dtantsur is now known as dtantsur|brb | 13:26 | |
*** davidsha has joined #openstack-nova | 13:27 | |
*** damien_r has joined #openstack-nova | 13:37 | |
*** gmann_ has joined #openstack-nova | 13:40 | |
*** gmann has quit IRC | 13:40 | |
*** gmann_ is now known as gmann | 13:40 | |
*** gmann has quit IRC | 13:40 | |
*** gmann has joined #openstack-nova | 13:41 | |
*** nweinber has joined #openstack-nova | 13:44 | |
*** martin_midolesov has quit IRC | 13:47 | |
*** READ10 has joined #openstack-nova | 13:50 | |
*** hongbin has quit IRC | 13:54 | |
*** eharney has joined #openstack-nova | 13:59 | |
*** yan0s has joined #openstack-nova | 14:00 | |
*** amoralej is now known as amoralej|lunch | 14:07 | |
*** lbragstad has quit IRC | 14:17 | |
openstackgerrit | Huachang Wang proposed openstack/nova-specs master: Use PCPU and VCPU in one instance https://review.opendev.org/668656 | 14:19 |
*** tesseract-RH has joined #openstack-nova | 14:19 | |
*** lbragstad has joined #openstack-nova | 14:19 | |
*** tesseract has quit IRC | 14:21 | |
*** spatel has joined #openstack-nova | 14:30 | |
huaqiang | stephenfin:sean-k-mooney:alex_xu spec https://review.opendev.org/668656 is updated, hope you guys have time to review it before freezing day | 14:30 |
*** jmlowe has joined #openstack-nova | 14:30 | |
*** tesseract has joined #openstack-nova | 14:34 | |
*** spatel has quit IRC | 14:34 | |
*** ociuhandu has quit IRC | 14:35 | |
*** tesseract has quit IRC | 14:43 | |
*** tesseract-RH has quit IRC | 14:43 | |
*** tesseract has joined #openstack-nova | 14:44 | |
*** Luzi has quit IRC | 14:45 | |
stephenfin | gibi: Comments left on https://review.opendev.org/#/c/704759/ | 14:49 |
stephenfin | huaqiang: Ack. I'll hit that next Tuesday (spec review day) | 14:49 |
*** Sundar has joined #openstack-nova | 14:52 | |
huaqiang | stephenfin: appreciate! | 14:55 |
*** tbachman has quit IRC | 14:55 | |
gibi | stephenfin: thanks. I will check | 14:56 |
*** amoralej|lunch is now known as amoralej | 14:56 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Add new default roles in os-instance-actions policies https://review.opendev.org/706470 | 15:01 |
*** rpittau is now known as rpittau|brb | 15:04 | |
openstackgerrit | Martin Midolesov proposed openstack/nova master: Implementing graceful shutdown. https://review.opendev.org/666245 | 15:05 |
*** lpetrut has quit IRC | 15:08 | |
*** jmlowe has quit IRC | 15:15 | |
*** ociuhandu has joined #openstack-nova | 15:15 | |
*** ivve has quit IRC | 15:15 | |
*** ociuhandu has quit IRC | 15:16 | |
*** ociuhandu has joined #openstack-nova | 15:17 | |
*** rpittau|brb is now known as rpittau | 15:20 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Add SYSTEM_READER role to servers actions API https://review.opendev.org/706179 | 15:22 |
*** udesale_ has quit IRC | 15:23 | |
*** udesale_ has joined #openstack-nova | 15:24 | |
*** spatel has joined #openstack-nova | 15:25 | |
kashyap | efried: Want to have a stab at this spec? Already approved previously; and has a +2: https://review.opendev.org/#/c/693844/ | 15:26 |
Sundar | dansmith, gibi, sean-k-mooney, efried: About https://review.opendev.org/#/c/706083/, this is a refactor that affects pci requests, numa topology etc. that are not related to the Cyborg patch. I see that some of you are noncommittal and, in any case, probably needs further discussion. The Cyborg series is already long in the tooth; if we add this | 15:29 |
Sundar | general refactor, keeping that entire context in our minds will only prolong our efforts and strain. Plus, some of us are hitting time constraints. What do you think of considering the series as it is now, and feel free to add refactors on top of that later? Both of them could even go concurrently, but the discussions on the refactor need not hold | 15:29 |
Sundar | up the main series. | 15:29 |
*** mriedem has joined #openstack-nova | 15:29 | |
Sundar | Just to be clear, I am not objecting to the refactoring. I am fine either way and can contribute to the reviews. | 15:31 |
sean-k-mooney | Sundar: im currently planning to redeploy with the code you pushed yesterday. | 15:33 |
Sundar | sean-k-mooney: Thanks. | 15:33 |
*** tbachman has joined #openstack-nova | 15:33 | |
sean-k-mooney | i found that with the previous version i was unable to delete the cyborg vm | 15:33 |
*** ociuhandu_ has joined #openstack-nova | 15:33 | |
sean-k-mooney | the conductor was not able to delete the ARQ binding | 15:34 |
sean-k-mooney | i was able to manually do it but could still not delete the nova vm | 15:34 |
sean-k-mooney | ill see if i hit the same issue with the new version and let you know | 15:34 |
Sundar | I see. In the worst case, for development purposes only, one could delete the ARQs using curl. | 15:35 |
sean-k-mooney | yes but you still cannot delete the nova vm | 15:36 |
Sundar | Sure. You are also updating the Cyborg side, right? | 15:36 |
sean-k-mooney | yep | 15:36 |
sean-k-mooney | i delete all my repos and confirm it pulled the correct version if i also set CYBORG_REPO and CYBORG_BRANCH | 15:36 |
Sundar | Could you expand on what you mean by Nova VM? Different from a devstack all-in-1 setup? | 15:37 |
*** jmlowe has joined #openstack-nova | 15:37 | |
*** ociuhandu has quit IRC | 15:37 | |
sean-k-mooney | i mean if i do openstack server delete <uuid of vm with fake cyborg device> then you cant delete it | 15:37 |
*** igordc has joined #openstack-nova | 15:37 | |
gibi | Sundar: I agree that such refactor could move the focus away from the goal to merge the cyborg integration code to nova. And I understand the time pressure here. I'm fine doing the refactor on top of the series | 15:37 |
sean-k-mooney | the conductor got an unexpected response code from cyborg and the delete failed | 15:38 |
gibi | Sundar: I'm reviewing the series as we speak | 15:38 |
sean-k-mooney | Sundar: this could be related to the error handling you and dansmith were talking about. i did not dig into it as it was late when i tried to delete the vm | 15:39 |
*** ociuhandu_ has quit IRC | 15:40 | |
sean-k-mooney | Sundar: im going to recreate the env with the latest version and see if the issue is still there | 15:40 |
*** ociuhandu has joined #openstack-nova | 15:42 | |
*** ociuhandu has quit IRC | 15:47 | |
*** spatel has quit IRC | 15:48 | |
*** david-lyle is now known as dklyle | 15:53 | |
Sundar | gibi: Thanks | 15:54 |
Sundar | sean-k-mooney: Please let me know how it goes. I may get on phone calls once in a while but I'll be around all morning PST. | 15:55 |
*** bnemec has quit IRC | 15:55 | |
*** bnemec has joined #openstack-nova | 15:56 | |
efried | Sundar: The refactor is already "on top" of the series, not folded in the middle. It shouldn't block anything else. | 15:58 |
*** jmlowe has quit IRC | 15:58 | |
efried | (It's on top of a patch that's in the middle, but it's its own tip) | 15:58 |
efried | I just put it out there to get agreement with dansmith about what the refactor should look like, and to satisfy myself that it could be done without too much trouble, so that I could be okay with the preceding patch as it is. | 15:59 |
efried | kashyap: ack, in queue | 15:59 |
kashyap | Thank you. | 16:00 |
*** TxGirlGeek has joined #openstack-nova | 16:01 | |
sean-k-mooney | efried: which patch is the refactor https://review.opendev.org/#/c/706083/1 you dont mean https://review.opendev.org/#/c/704227/11 right | 16:03 |
sean-k-mooney | im planning to deploy https://review.opendev.org/#/c/704227/11 | 16:03 |
efried | the former | 16:04 |
sean-k-mooney | cool | 16:04 |
mriedem | dansmith: +2 on your hidden=null fix in case efried or melwitt want to approve | 16:05 |
*** spatel has joined #openstack-nova | 16:06 | |
spatel | sean-k-mooney: hey! i have a few more questions related to NUMA, I did lots of testing with erlang and have pretty good results with numa tuning | 16:07 |
sean-k-mooney | thats good to hear | 16:08 |
sean-k-mooney | so memory seems to be your bottleneck then | 16:08 |
spatel | when i create vm with 16vcpu core just running on single numa node then result is freaking good | 16:08 |
dansmith | mriedem: thanks | 16:08 |
spatel | but if i create vm with 32vCPU and set numa_node=2 in flavor in that case result is 70% worse | 16:09 |
spatel | now question is how do i tell openstack to pin down vCPU with pCPU 1-to-1 ? | 16:09 |
spatel | I do have hw:cpu_policy=dedicated policy but its randomly mapping vCPU <--> pCPU | 16:10 |
spatel | sean-k-mooney: what do you suggest here? | 16:11 |
spatel | In short i want to make my VM fully NUMA map like bare metal has. | 16:11 |
*** sapd1_x has joined #openstack-nova | 16:13 | |
sean-k-mooney | spatel: so it looks like the erlang application is not numa aware | 16:14 |
*** N3l1x has joined #openstack-nova | 16:14 | |
spatel | sean-k-mooney: but if same erlang application i am running on bare metal it does work and result is good | 16:14 |
sean-k-mooney | meaning it is internally not numa affinitising its memory allocations | 16:14 |
sean-k-mooney | spatel: when you set hw:cpu_policy=dedicated it does a 1:1 mapping between the cores on a virtual numa node and a host numa node | 16:16 |
spatel | i thought if somehow openstack pin vcpu0 <--> pcpu0 , vcpu1 <--> pcpu1 so on... that would be great | 16:16 |
sean-k-mooney | so its not random it is numa aware | 16:16 |
sean-k-mooney | spatel: no it does not by design | 16:16 |
spatel | sean-k-mooney: look at this current mapping - http://paste.openstack.org/show/789225/ | 16:17 |
sean-k-mooney | but it will map all of the cores in one guest numa node to a single numa node on the host | 16:17 |
sean-k-mooney | thats taking a long time to load for some reason | 16:17 |
sean-k-mooney | paste.openstack.org is down for me it seems | 16:18 |
lyarwood | https://review.opendev.org/#/c/701430/ - trying to get my head around the required api samples for a microversion that changes behaviour but nothing in the request or response, does anyone know what I would need to add here? | 16:19 |
sean-k-mooney | ok it finally loaded | 16:19 |
sean-k-mooney | spatel: how many cores do you have on the host | 16:20 |
sean-k-mooney | im assuming they are 16 core cpus and you have hyper-threading enabled? | 16:20 |
sean-k-mooney | spatel: can you provide the host capablities xml and the full guest xml | 16:22 |
spatel | Give me few min.. i am on phone.. | 16:22 |
sean-k-mooney | spatel: ok that xml fragment does not really look correct. | 16:24 |
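A rough sketch of the pinning behaviour sean-k-mooney describes above: each guest NUMA node's vCPUs are pinned to pCPUs of a single host NUMA node, with no promise that vCPU N lands on pCPU N. This is an illustration only, not nova's actual placement algorithm. The function name, the data layout, and the choice to give each guest node its own host node are assumptions made for the example.

```python
def pin_guest_numa(host_nodes, guest_nodes):
    """Pin each guest NUMA node's vCPUs onto one host NUMA node.

    host_nodes: {host_node_id: [free pCPU ids]}
    guest_nodes: {guest_node_id: [vCPU ids]}
    Returns {vCPU id: pCPU id}.
    """
    # Host nodes still available, in ascending id order (illustrative choice).
    available = dict(sorted(host_nodes.items()))
    pinning = {}
    for guest_id, vcpus in sorted(guest_nodes.items()):
        # First remaining host node with enough free pCPUs for the whole
        # guest node.
        host_id = next(
            (n for n, cpus in available.items() if len(cpus) >= len(vcpus)),
            None)
        if host_id is None:
            raise ValueError('no host NUMA node fits guest node %s' % guest_id)
        # Each guest node consumes a distinct host node in this sketch.
        pcpus = available.pop(host_id)
        pinning.update(zip(vcpus, pcpus))
    return pinning


# Hypothetical host with interleaved core numbering across two NUMA nodes.
host = {0: [2, 4, 6, 8], 1: [3, 5, 7, 9]}
guest = {0: [0, 1], 1: [2, 3]}
pinning = pin_guest_numa(host, guest)
print(pinning)
```

In this example every vCPU of guest node 0 stays on host node 0 and every vCPU of guest node 1 stays on host node 1, yet vCPU 0 is not pinned to pCPU 0, which matches the "NUMA aware, not index matched" point made in the discussion.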
efried | dansmith: I can push https://review.opendev.org/#/c/706331/ if you don't plan to reword that reno. | 16:36 |
dansmith | efried: it makes sense to me and nobody else has suggestions right? | 16:37 |
*** slaweq has quit IRC | 16:37 | |
*** tesseract has quit IRC | 16:38 | |
efried | dansmith: let me take a real swing at a reword. If I can't come up with something I like, I'll push it. | 16:38 |
*** xek has quit IRC | 16:39 | |
efried | dansmith: do db migrations happen automatically as part of the upgrade process, or do users have to trigger them manually? (Or does it depend on the distro?) | 16:41 |
dansmith | efried: depends a lot | 16:41 |
*** yan0s has quit IRC | 16:43 | |
*** gyee has joined #openstack-nova | 16:44 | |
*** udesale_ has quit IRC | 16:45 | |
efried | dansmith: done, see what you think. | 16:46 |
*** udesale_ has joined #openstack-nova | 16:46 | |
efried | dansmith: for me, the distinction of "apply the fix" versus "install a version that includes the fix" is where the confusion lies. | 16:46 |
dansmith | I totes don't get the substantive difference between your sentences and mine, but as such I'll be glad to change it.. just a sec | 16:47 |
*** udesale_ has quit IRC | 16:48 | |
efried | mriedem: if you're around -- does my suggestion improve anything IYO? | 16:48 |
*** udesale_ has joined #openstack-nova | 16:48 | |
openstackgerrit | Dan Smith proposed openstack/nova master: Fix instance.hidden migration and querying https://review.opendev.org/706331 | 16:48 |
efried | It becomes hard to see anymore having discussed it to death | 16:48 |
efried | dansmith: coulda added the link while you were at it :P | 16:49 |
dansmith | guh, will fix | 16:49 |
dansmith | working on the patch above this so I wanted to get back to that | 16:49 |
openstackgerrit | Dan Smith proposed openstack/nova master: Fix instance.hidden migration and querying https://review.opendev.org/706331 | 16:50 |
dansmith | efried: please check before I roll my tree forward again :) | 16:50 |
*** udesale_ has quit IRC | 16:50 | |
*** udesale_ has joined #openstack-nova | 16:51 | |
efried | dansmith: I'll build locally to make sure. You may need an extra newline between the paragraph and the link def. Also would be nice to link same from the `fixes` note (you don't need / can't include a second target def, they'll both use the same one) | 16:51 |
mriedem | you do | 16:52 |
mriedem | it blows up without that | 16:52 |
openstackgerrit | Dan Smith proposed openstack/nova master: Fix instance.hidden migration and querying https://review.opendev.org/706331 | 16:52 |
mriedem | http://rst.ninjs.org/ | 16:52 |
dansmith | like that? | 16:52 |
efried | dansmith: add trailing newline on the fixes one | 16:53 |
efried | s/newline/underscore/ | 16:53 |
dansmith | "for all you people who have now corrupted their database, I hope you appreciate that we waited quite a while to make sure the bug link was perfect" | 16:53 |
efried | Yeah, these five minutes are really going to make the difference. | 16:53 |
dansmith | a trailing newline where | 16:53 |
dansmith | ? | 16:53 |
efried | If you want to go back to real work, I'll doodle with the formatting and fast approve it. | 16:53 |
efried | not a newline, the second bug link is backticked but not underscored. | 16:54 |
dansmith | no, it's very important to me that I get this right | 16:54 |
mriedem | .. _bug 1862205: https://launchpad.net/bugs/1862205 | 16:54 |
openstack | Launchpad bug 1862205 in OpenStack Compute (nova) "Instances not visible when hidden=NULL" [Critical,In progress] - Assigned to Dan Smith (danms) | 16:54 |
dansmith | oh dammit | 16:54 |
mriedem | you need the .. | 16:54 |
openstackgerrit | Dan Smith proposed openstack/nova master: Fix instance.hidden migration and querying https://review.opendev.org/706331 | 16:55 |
*** hongbin has joined #openstack-nova | 16:55 | |
efried | this would be a great time for that fast-reno tool I've been meaning to write. The releasenotes target takes forEVER to build. | 16:55 |
efried | there, yeah, ps9 looks right to me. | 16:55 |
dansmith | ttfl | 16:56 |
stephenfin | efried: We need to stop 'rm -rf' ing the built docs | 16:56 |
efried | is that what does it? | 16:56 |
efried | I'll try that | 16:57 |
*** tesseract has joined #openstack-nova | 16:57 | |
mriedem | efried: dansmith: so PS9 might work, but i've seen reno complain about links in weird ways between sections, so we'll see if the link defined in the first section is honored in the second | 16:57 |
dansmith | mriedem: efried said it was only needed once | 16:57 |
mriedem | i guess we'll know in a couple of hours :) | 16:57 |
efried | should be the case, but it depends how reno does its build. | 16:58 |
stephenfin | Sphinx builds doctrees and reuses those when possible. Unfortunately it occasionally breaks if you e.g. remove a document that previously existed (like by changing to an older branch) | 16:58 |
stephenfin | probably less of an issue for renos | 16:58 |
*** Sundar has quit IRC | 16:58 | |
*** dpawlik has quit IRC | 16:58 | |
*** hongbin has quit IRC | 16:59 | |
*** spatel has quit IRC | 17:00 | |
efried | stephenfin: In the specs repo I wrote a tool that would only build one directory at a time. It was a fiddly thing to figure out exactly how to word the sphinx-build command. In this case... if I could restrict it to building just $release.rst it would probably be equivalent. | 17:01 |
*** mriedem has quit IRC | 17:02 | |
*** rpittau is now known as rpittau|afk | 17:02 | |
*** mriedem has joined #openstack-nova | 17:03 | |
efried | dansmith, mriedem: local build is green, the link works in both spots as expected. Fast approving. | 17:09 |
*** jmlowe has joined #openstack-nova | 17:09 | |
efried | ...mriedem do you agree with the new wording? Or at least it's not worse? | 17:10 |
efried | dansmith: we shouldn't have to backport the fixture poison beyond Train, right? because we wouldn't be backporting any future fix that hasn't gone through that fixture in Train+ | 17:12 |
dansmith | efried: we wouldn't backport that anyway | 17:12 |
openstackgerrit | Dan Smith proposed openstack/nova stable/train: Fix instance.hidden migration and querying https://review.opendev.org/706582 | 17:14 |
dansmith | backport ^ | 17:14 |
efried | +1 | 17:15 |
efried | stephenfin: fwiw commenting out the 'rm' doesn't seem to be speeding anything up. It's still scanning the world, which seems to be the part that takes forever. | 17:17 |
stephenfin | darn | 17:17 |
stephenfin | I was thinking all that would be cached | 17:17 |
efried | stephenfin: considering a target that patches index.rst and (re)moves $not-this-release.rst before building. | 17:18 |
*** davidsha has quit IRC | 17:18 | |
gibi | efried, dansmith: do we have re-schedule handling implemented in the cyborg series? | 17:18 |
efried | ...and then of course it would have to undo that after | 17:18 |
dansmith | gibi: unsure, I'm kinda depending on sean-k-mooney to poke at some of those things | 17:18 |
efried | gibi: How is that handled for bw? I would have thought it would be the same way. | 17:19 |
stephenfin | efried: Have we closed any more branches yet? https://github.com/openstack/nova/commit/857b5003ccc0b37f4642ed77d9f0d08f9ee28dfb | 17:19 |
dansmith | yeah, I think it should be the same for ports even, but actually showing that it works is different of course | 17:19 |
stephenfin | I guess with EM we no longer do that | 17:19 |
efried | stephenfin: yeah, was thinking about that | 17:19 |
efried | do we truly never EOL anything at this point? | 17:20 |
gibi | the arqs need to be unbound from the failed dest host and bound to the alternate host. the placement handling should work out of the box, but we at least need some test coverage to prove it | 17:20 |
stephenfin | Maybe we could just live with the lack of updated release notes for those branches? | 17:21 |
dansmith | gibi: yup | 17:21 |
*** psachin has quit IRC | 17:21 | |
sean-k-mooney | dansmith: how would you like me to test reschedule? i can set up a multi node deployment i guess and maybe kill libvirt? | 17:22 |
efried | gibi: agree. At a glance, it's weird that delete_arqs_for_instance is first introduced here https://review.opendev.org/#/c/673735/ | 17:22 |
dansmith | sean-k-mooney: I dunno.. if that'd work then maybe that's easy, or just throw a raise into the code on one node? | 17:24 |
gibi | sean-k-mooney: I think if you kill libvirt then the compute service will go down and the scheduler will not select it as a target | 17:24 |
efried | stephenfin: I removed ocata-train rsts and from index, it's still taking bloody forever. It's the scanning that's slow. | 17:24 |
stephenfin | efried: dhellmann told me there was a way to configure earliest-version to prevent each one going back forever | 17:25 |
sean-k-mooney | if i do a multinode deployment then maybe i could kill ovs or the cyborg agent instead | 17:25 |
dansmith | gibi: well, if you disable the compute filter you can make that work still I think | 17:25 |
gibi | Sundar, dansmith, efried: anyhow I left a -1 to get some answer / re-schedule test coverage from Sundar https://review.opendev.org/#/c/631244/61/nova/tests/functional/test_servers.py@7625 | 17:25 |
sean-k-mooney | so that the vif plugin would fail | 17:25 |
efried | stephenfin: I imagine it'd be a matter of fixing the reno.sphinxext code itself | 17:25 |
dansmith | gibi: ack | 17:25 |
gibi | dansmith: true, that can be done | 17:25 |
gibi | another question, does the series gracefully handle that move operations are not supported yet with accelerators? | 17:26 |
stephenfin | efried: I think earliest-version would do the trick. Let me try with that | 17:26 |
dansmith | gibi: probably not, but I'm trying to think of other examples of things we know don't work where we handle that gracefully | 17:29 |
gibi | dansmith: qos ports was implemented a check at the API level | 17:29 |
dansmith | gibi: there are some cases where we don't really know until the virt driver(s) get involved, which is *kinda* this case | 17:29 |
gibi | dansmith: here we know that nova does not support migrating an instance with arq yet, so we should reject that | 17:30 |
dansmith | gibi: okay but that's kindof a high-level thing because the coordination needs to be done above the compute (i.e. with neutron) anyway right? | 17:30 |
dansmith | gibi: you mean libvirt I assume | 17:30 |
stephenfin | efried: nope, still a couple of minutes and fans running at full blast :) | 17:31 |
dansmith | well, hmm, I was going to say the nova bits would still try, but maybe not because the conductor kicks the bind.. /me looks | 17:31 |
gibi | I think not just libvirt but also nova needs to grow support for migration with arqs, like re-querying the resource request of the arq at the start of the migration to include them into the scheduling request | 17:31 |
*** dtantsur|brb is now known as dtantsur | 17:31 | |
*** tbachman has quit IRC | 17:32 | |
*** udesale_ has quit IRC | 17:32 | |
dansmith | gibi: yeah I guess you're right since we don't persist those in the reqspec | 17:32 |
gibi | OK, left this as well as a comment in https://review.opendev.org/#/c/631244/61 | 17:33 |
*** evrardjp has quit IRC | 17:34 | |
*** evrardjp has joined #openstack-nova | 17:34 | |
gibi | I have to leave soon I will continue reading the series on Monday | 17:35 |
dansmith | I guess we probably have cold migration handling for pci devices specifically, | 17:35 |
dansmith | but I was kinda thinking this would fail for live migration like I think it does for pci, which is.. late in the virt driver IIRC | 17:35 |
*** tesseract has quit IRC | 17:36 | |
gibi | my point is that we know we need to write some nova code to support these ops. So while we dont have that code we can reject such ops from the API. | 17:37 |
*** ivve has joined #openstack-nova | 17:38 | |
dansmith | gibi: yep, agree, I'm just talking out loud | 17:38 |
*** jcosmao has left #openstack-nova | 17:38 | |
dansmith | talking out loud? thinking out loud :) | 17:38 |
* gibi tends to talk out loud while thinking alone | 17:38 | |
dansmith | I don't want to put api-level barriers to something that is just a virt limitation but you're right, none of the paths where we do the arq stuff gets tickled in the, for example, resize paths | 17:39 |
*** tosky has quit IRC | 17:40 | |
gibi | yeah, I think resize, migrate, live migrate, evacuate, unshelve (after offload) need some code to recreate the proper resource request for the scheduling | 17:40 |
dansmith | yup | 17:40 |
dansmith | gibi: I was telling efried the other day that I had been so tunnel-vision on the bones of this that I wanted him to run through it again to break that up for me | 17:42 |
dansmith | for exactly this reason, so thanks for being that force :) | 17:42 |
gibi | I saw that the two of you were already handling this series so I decided to focus my energy elsewhere, but I agree that fresh eyes help to see things differently | 17:44 |
dansmith | yup | 17:48 |
*** ociuhandu has joined #openstack-nova | 17:50 | |
*** tbachman has joined #openstack-nova | 17:50 | |
*** Liang__ has quit IRC | 17:52 | |
*** Liang__ has joined #openstack-nova | 17:54 | |
*** ociuhandu has quit IRC | 17:56 | |
*** priteau has quit IRC | 17:56 | |
*** jmlowe has quit IRC | 17:56 | |
*** amoralej is now known as amoralej|off | 17:58 | |
*** mlavalle has joined #openstack-nova | 18:04 | |
*** derekh has quit IRC | 18:05 | |
*** spatel has joined #openstack-nova | 18:10 | |
spatel | sean-k-mooney: I am back now, sorry was in back to back meeting | 18:10 |
*** igordc has quit IRC | 18:18 | |
*** artom has quit IRC | 18:25 | |
*** artom has joined #openstack-nova | 18:25 | |
*** jmlowe has joined #openstack-nova | 18:30 | |
sean-k-mooney | dansmith: we have a check that blocks live migration if there are pci_request spec objects that are not related to neutron sriov ports | 18:38 |
sean-k-mooney | dansmith: but that won't block cyborg devices | 18:38 |
dansmith | sean-k-mooney: okay so something similar for cyborg I guess | 18:38 |
sean-k-mooney | since we also don't have pci_request spec objects | 18:38 |
sean-k-mooney | well for now we could just check if the flavor has accl:device-profile or whatever the extra spec is | 18:39 |
dansmith | right | 18:39 |
sean-k-mooney | i'm not sure we will be able to do that check in the api because we would need to check the embedded flavor from the cell db. actually no, we can. we can grab it from the request_spec in the api db | 18:40 |
sean-k-mooney | so ya we can reject the live migration in the api layer until we support that | 18:40 |
dansmith | not sure what the problem is.. the api can look at the instance's embedded flavor | 18:41 |
sean-k-mooney | yep it can so no problem | 18:41 |
sean-k-mooney | i was thinking we might need to do a down call to the cell db to get it but we don't | 18:41 |
sean-k-mooney | so all good | 18:41 |
dansmith | we do to get the actual embedded flavor, but that's fine of course | 18:42 |
dansmith | anything that does anything on an instance gets the instance record from the cell in the api | 18:42 |
sean-k-mooney | well the embedded flavor is also stored in the api db in the request spec | 18:42 |
dansmith | it's stored as it was at the time of creation, but not necessarily the same as what the instance has now | 18:42 |
dansmith | and if we've done a data migration or something they could have diverged | 18:43 |
sean-k-mooney | oh that won't be updated after resize? | 18:43 |
sean-k-mooney | ah ok | 18:43 |
dansmith | the instance's actual flavor is what we should use, and it's no more expensive to get | 18:43 |
sean-k-mooney | sure makes sense | 18:43 |
*** dtantsur is now known as dtantsur|afk | 18:45 | |
mriedem | the requestspec.flavor is updated as part of a resize, but .... there be bugs | 18:45 |
*** igordc has joined #openstack-nova | 18:49 | |
spatel | sean-k-mooney: here is the virsh capability - http://paste.openstack.org/show/789301/ | 18:50 |
*** jmlowe has quit IRC | 18:50 | |
spatel | here is the vCPU pinning map - http://paste.openstack.org/show/789302/ | 18:51 |
sean-k-mooney | ok so the first 15 cpus are pinned to host numa node 0 | 18:58 |
spatel | Yes | 18:58 |
sean-k-mooney | and the second 15 are all pinned to host numa node 1 | 18:58 |
sean-k-mooney | so this is correct | 18:59 |
spatel | Yes | 18:59 |
spatel | if you see all looks correct from VM point of view.. | 18:59 |
spatel | VM correctly mapped its vCPU pins across physical numa nodes | 19:00 |
sean-k-mooney | yes so you left out one of the imporatn numa elelmnt form the vm | 19:00 |
*** imacdonn has quit IRC | 19:00 | |
spatel | ? | 19:00 |
*** imacdonn has joined #openstack-nova | 19:01 | |
sean-k-mooney | you left out the numa element | 19:01 |
mriedem | important numa elements from the vm | 19:01 |
mriedem | translation-as-a-service | 19:01 |
sean-k-mooney | oh ya i miss typed talking downstream as well | 19:01 |
spatel | I am not following you guys.. | 19:01 |
sean-k-mooney | ill slow down | 19:01 |
mriedem | remember how this isn't a support channel? | 19:01 |
*** Sundar has joined #openstack-nova | 19:01 | |
* mriedem leaves | 19:02 | |
sean-k-mooney | nova will map the first half of the cpus to numa node 0 in the guest so cores 0-15 and from what you have shown it is working correctly | 19:02 |
*** mriedem has left #openstack-nova | 19:02 | |
spatel | Yes.. that is what i am also saying.. CPU pins correctly mapped out to NUMA nodes | 19:03 |
*** martinkennelly has joined #openstack-nova | 19:03 | |
spatel | This is what NUMA looks inside VM0 -> http://paste.openstack.org/show/789304/ | 19:03 |
spatel | now question is when i run application why i am seeing poor performance? | 19:04 |
Sundar | gibi: Any pointers to how you handled reschedule for bandwidth provider? | 19:04 |
spatel | Do you think i need to do something with -> hw:cpu_thread_policy ? | 19:04 |
*** tbachman has quit IRC | 19:05 | |
sean-k-mooney | spatel: it would appear the application is not correctly optimising for the numa topology. | 19:05 |
sean-k-mooney | hw:cpu_thread_policy won't help | 19:05 |
sean-k-mooney | you can alter the cpu thread and socket topology in the guest, which might help, but this does not seem to be a nova/libvirt issue | 19:06 |
sean-k-mooney | spatel: my guess is that you have not told the guest it is using hyperthreads so it is misallocating threads internally to cores it thinks are independent but are not | 19:07 |
spatel | I am just trying to understand if i run application on bare metal then performance is great but when i create VM with all CPU then performance is bad.. | 19:07 |
sean-k-mooney | hw:cpu_thread_policy=isolate would help but it would reduce the available core count | 19:07 |
sean-k-mooney | you should set hw:cpu_treads=2 and hw:cpu_sockets=2 | 19:08 |
spatel | how many cores will it reduce? | 19:08 |
sean-k-mooney | but you cant do that with 30 cpus | 19:08 |
sean-k-mooney | you need either 28 or 32 | 19:08 |
spatel | i will go with 28.. | 19:08 |
Sundar | gibi: "does the series gracefully handle that move operations are not supported yet with accelerators?" There was an earlier attempt to explicitly block unsupported ops: https://review.opendev.org/#/c/674726/ But it was decided that it is not needed, and we should document the supported ops in Cyborg. | 19:08 |
sean-k-mooney | so that you have an even number of cpus per numa node/socket | 19:08 |
spatel | you are saying i should set flavor -> hw:cpu_treads=2 and hw:cpu_sockets=2 and run benchmark right? | 19:09 |
sean-k-mooney | spatel: set hw:cpu_treads=2 and hw:cpu_sockets=2 and do not set hw:cpu_thread_policy=isolate and see if that fixes the issue | 19:09 |
spatel | sean-k-mooney: sounds good | 19:09 |
sean-k-mooney | spatel: actually i misspelled the first one, check the flavor docs to confirm they are right | 19:10 |
sean-k-mooney | spatel: https://docs.openstack.org/nova/latest/user/flavors.html | 19:10 |
spatel | I will check doc don't worry all i need clue to try something :) | 19:10 |
Sundar | gibi: "nova needs to grow support for migration with arqs" Since Cyborg does PCI passthrough, live migration is not supported. | 19:11 |
spatel | sean-k-mooney: you are saying that option will tell my VM you have two socket.. | 19:11 |
sean-k-mooney | by default openstack emulates each guest cpu as a separate socket which is incorrect if you have HT enabled on the host | 19:11 |
spatel | I do have HT enabled | 19:11 |
sean-k-mooney | yep so you want to tell the vm it has 2 threads per core and in this case 2 sockets, 1 per numa node | 19:12 |
sean-k-mooney | that will help the guest kernel make correct scheduling decisions | 19:12 |
spatel | sean-k-mooney: i think you are correct.. that could be the issue why erlang getting confused | 19:13 |
spatel | because erlang also run inside VM and it should have own scheduler | 19:13 |
spatel | sean-k-mooney: I will keep you posted about my testing.. (today isn't possible but Monday i will have some result.) | 19:14 |
spatel | sean-k-mooney: let me clarify, i should set hw:cpu_treads=2 and hw:cpu_sockets=2 and hw:numa_node=2 right? | 19:15 |
spatel | otherwise it won't let me run my VM on two numa nodes | 19:16 |
sean-k-mooney | the socket and numa nodes don't have to match but it generally works better | 19:16 |
Sundar | sean-k-mooney: Catching up on earlier discussion about instance ops with Cyborg. Explicit blocking of requests with a device profile name in the extra specs ( https://review.opendev.org/#/c/674726/) was not considered the way to go. Is that still your thinking? | 19:17 |
*** tbachman has joined #openstack-nova | 19:17 | |
spatel | sean-k-mooney: i will set all three option and give it a try.. thank you.. | 19:18 |
sean-k-mooney | Sundar: honestly i don't remember the full context of the discussion. i know we discussed this at length but can't recall what the decision was | 19:18 |
sean-k-mooney | i think we said skip the explicit check but document what works? but we can also block it if we want | 19:19 |
*** igordc has quit IRC | 19:19 | |
sean-k-mooney | So the consensus is Option 2: | 19:20 |
sean-k-mooney | > Gradually phase in the support for the server operations and document the limitations in the meantime but don't actively block them in the API like this change does. | 19:20 |
sean-k-mooney | We will state the limitations, if any, in Cyborg documentation. | 19:20 |
sean-k-mooney | that was the last comment on that | 19:20 |
sean-k-mooney | option 2 was "2. Gradually phase in the support for the server operations and document the limitations in the meantime but don't actively block them in the API like this change does. They either work (by chance) or they don't, but they aren't officially supported. Once they are supported, we patch them in without a new microversion as bug fixes (or just claim test support so they are no longer | 19:22 |
sean-k-mooney | considered experimental)." | 19:22 |
sean-k-mooney | gibi: dansmith ^ are ye still ok with that regarding the live migration check | 19:23 |
sean-k-mooney | so no check for now and document. and we can add one if we want in the future | 19:23 |
dansmith | sean-k-mooney: where was that? in the spec? | 19:25 |
dansmith | it really depends on what the result is | 19:25 |
sean-k-mooney | https://review.opendev.org/#/c/674726/ | 19:25 |
sean-k-mooney | it was a nova patch | 19:26 |
dansmith | if it's data corruption, state intervention required, etc then it needs a check to be graceful | 19:26 |
dansmith | if it fails in some reasonable way then I'm not so concerned | 19:26 |
sean-k-mooney | dansmith: i think libvirt will raise an error | 19:26 |
sean-k-mooney | qemu will reject a migration if the domain has a hostdev that is not of type usb | 19:27 |
dansmith | for live migration I assume, but based on the changes we've made to the flow, I'm not sure what will happen on resize | 19:27 |
sean-k-mooney | so we will get to the migrate call and it will fail | 19:27 |
sean-k-mooney | yes | 19:27 |
dansmith | we might migrate the instance and ignore the fact that it's missing an accelerator | 19:27 |
sean-k-mooney | ya i dont know what will happen for resize evacuate or shelve | 19:27 |
sean-k-mooney | ill find out | 19:28 |
dansmith | if live fails in a predictable and recoverable way, then I'm fine without a check on that one, | 19:28 |
dansmith | which is what I said earlier that virt-specific limitations shouldn't be enforced in the api when we can help it | 19:28 |
Sundar | dansmith, sean-k-mooney: FWIW, IMHO, an explicit check for device profiles in extra specs, as in https://review.opendev.org/#/c/674726/, is probably the safest and clearest to the user. | 19:29 |
Sundar | We did say that it should be documentation only. But I don;t know if folks will read Cborg dics, or any docs, before kicking off an op. | 19:30 |
sean-k-mooney | Sundar: the main issue with that patch was it was too aggressive in what it blocked | 19:31 |
Sundar | *Cyborg docs | 19:31 |
sean-k-mooney | and users never read docs until it breaks | 19:31 |
Sundar | sean-k-mooney: The details of the patch can be adjusted. | 19:31 |
Sundar | dansmith, efried, gibi, sean-k-mooney: Would you all recommend to bring back https://review.opendev.org/#/c/674726/? | 19:32 |
*** dpawlik has joined #openstack-nova | 19:33 | |
sean-k-mooney | i would not restore it as is. if we add code to block it we should only block the operations we know do not work. | 19:36 |
sean-k-mooney | that change blocks all snapshotting and backups, and interface and volume attach/removal and other operations like rescue and lock. | 19:38 |
sean-k-mooney | blocking resize, live-migrate and evacuate might make sense. the rest i think are questionable | 19:39 |
sean-k-mooney | also blocking shelve might make sense but again we said document and fix as bugfixes so we could address them one by one without api changes for each | 19:40 |
Sundar | "Without API changes" -- are you thinking of microversion changes for blocking now and end every unblock in the future, if we support more ops? | 19:42 |
Sundar | *and every | 19:43 |
sean-k-mooney | yes i'm saying we didn't want to do a microversion bump for every one | 19:43 |
*** xek has joined #openstack-nova | 19:47 | |
sean-k-mooney | speaking of op i just booted a vm with your latest revision so ill go test some of them | 19:47 |
Sundar | Sure. Thanks. | 19:48 |
sean-k-mooney | my isp broke my home network so it's taking longer than i hoped to test this. i might set up the multi node setup on my laptop instead of my home openstack on monday but i'll do what i can on a single node first | 19:50 |
*** dpawlik has quit IRC | 19:55 | |
*** mriedem has joined #openstack-nova | 19:55 | |
*** mmethot has joined #openstack-nova | 20:07 | |
efried | Sundar: I've always been in favor of that idea (blocking unsupported operations with a useful message rather than letting them fail "organically" and mysteriously), but I know others disagree. | 20:09 |
*** mmethot_ has quit IRC | 20:09 | |
sean-k-mooney | for what it's worth i have just done boot, stop, start, reboot, add/remove volume, rescue/unrescue and so far no errors | 20:11 |
sean-k-mooney | i am not seeing any real interaction with cyborg during those operations | 20:11 |
sean-k-mooney | which is more or less expected but i'm not sure if we would lose the accelerator when we regenerate the xml | 20:12 |
*** READ10 has quit IRC | 20:13 | |
sean-k-mooney | i will look at this more closely next week but right now i'm just checking to make sure the operations complete correctly | 20:13 |
*** mdbooth has quit IRC | 20:14 | |
*** mdbooth has joined #openstack-nova | 20:15 | |
*** ociuhandu has joined #openstack-nova | 20:16 | |
*** xek has quit IRC | 20:19 | |
*** spatel has quit IRC | 20:19 | |
*** ociuhandu has quit IRC | 20:21 | |
Sundar | sean-k-mooney: Great. FWIW, I do most of these with FPGAs. Except rescue/unrescue with different images. The list of ops i have checked are in: https://review.opendev.org/673735 | 20:29 |
sean-k-mooney | i'm still making my way through the list. | 20:29 |
sean-k-mooney | im hoping to get access to a real server with a rush creek or vista creek next week or the week after | 20:30 |
sean-k-mooney | which one are you using again? | 20:30 |
sean-k-mooney | i have just done add/remove network interface, pause, unpause, suspend, resume, lock, unlock and rebuild | 20:31 |
Sundar | Rush Creek, DCP 1.2 | 20:31 |
sean-k-mooney | cool we have 1 server i think with one of each so ill ask for the rush creek system | 20:31 |
Sundar | OPAE version 1.1.2-1 | 20:31 |
Sundar | The OPAE packages come with some sample bitstreams, sp. NLB modes 0 and 3. I use both. | 20:32 |
Sundar | I can also be reached at ns1.sundar AT gmail DOT com if there is a need for more detailed responses or file transfers. | 20:33 |
Sundar | I have some utility functions to create/delete device profiles, ARQs using curl. Would be happy to share them if you prefer. | 20:35 |
sean-k-mooney | im just using the openstack client and i have a test script | 20:37 |
Sundar | The openstack client is WIP. There are patches to recast it to use openstacksdk etc. https://review.opendev.org/#/c/681391/ | 20:38 |
sean-k-mooney | yep im using that | 20:38 |
sean-k-mooney | with the openstacksdk patch too | 20:39 |
sean-k-mooney | there are a bunch of design issues with it that should be addressed but it's kind of usable | 20:39 |
sean-k-mooney | like when you create a device profile you can only specify its name not the uuid but it only allows you to show a device profile by uuid not name | 20:40 |
Sundar | Yea, agreed. The APIs allow both name and uuid though. | 20:40 |
sean-k-mooney | yep so the patches are just incomplete | 20:41 |
sean-k-mooney | i have worked around that | 20:41 |
*** dpawlik has joined #openstack-nova | 20:41 | |
Sundar | Do you have any pointers for me to look at the rescheduling question? | 20:42 |
sean-k-mooney | i have not read the question so no. i was going to try and force it by making the vm spawn fail | 20:43 |
sean-k-mooney | ok so first bug. when i shelve the instance the arq is still bound to the host when the vm is shelve_offloaded | 20:44 |
Sundar | Yup, I have not added support for shelve. The Delete ARQ patch states what I support. | 20:45 |
sean-k-mooney | yep but we want to check them all anyway | 20:45 |
Sundar | Basically, I prioritized the basic ops. I asked some folks outside Intel what they do with FPGAs in their lab with Cyborg, and went with that. | 20:46 |
sean-k-mooney | ya which is fine. unshelve "works" but i don't see any interaction with cyborg so i suspect it would not have the accelerator attached after unshelve. | 20:47 |
sean-k-mooney | again thats fine we just need to document it | 20:48 |
*** lbragstad has quit IRC | 20:48 | |
Sundar | ok | 20:49 |
sean-k-mooney | ok so after unshelve the allocations do not contain the fake device so ya that means the device would be lost | 20:50 |
*** Liang__ has quit IRC | 20:51 | |
sean-k-mooney | it looks like resize to a different flavor and then back does not fix the placement allocation | 20:54 |
sean-k-mooney | it did however complete | 20:54 |
*** TxGirlGeek has quit IRC | 20:55 | |
*** spatel has joined #openstack-nova | 20:55 | |
Sundar | Yes, I have only tested with resize to the same flavor, which is of course a no op. I was just making sure that there is no basic gotcha. | 20:58 |
sean-k-mooney | this is what i tested with the fake driver today | 20:59 |
sean-k-mooney | http://paste.openstack.org/show/789306/ | 20:59 |
sean-k-mooney | Sundar: resize to same flavor should be blocked in the api | 20:59 |
sean-k-mooney | it is in the client | 21:00 |
sean-k-mooney | you can migrate but resize to same flavor is invalid | 21:00 |
Sundar | remove-vol/net: unrelated to Cyborg, right? | 21:01 |
sean-k-mooney | yes but you blocked them in your patch that check for the flavor extra spec | 21:01 |
sean-k-mooney | they seem to work fine | 21:01 |
Sundar | Suspend will not work with real FPGAs, because libvirt will error out with: Domain has assigned non-USB devices. | 21:02 |
sean-k-mooney | at least with the fake driver | 21:02 |
sean-k-mooney | suspend should do a managed save which will detach all hostdev devices | 21:02 |
Sundar | delete failed? | 21:02 |
sean-k-mooney | yes | 21:02 |
sean-k-mooney | it failed in the last versions too | 21:03 |
sean-k-mooney | the conductor explodes with an unexpected response from cyborg | 21:03 |
sean-k-mooney | when it tries to delete /unbind the arqs | 21:03 |
Sundar | Hmm, please send me the logs. I do deletes all the time, but your sequence of ops probably triggered something. | 21:04 |
sean-k-mooney | i found that was broken just with boot then delete two days ago | 21:04 |
sean-k-mooney | ill need to unstack and stack to be able to test it again | 21:04 |
Sundar | I'll try boot + delete. Was it hard reboot? | 21:05 |
sean-k-mooney | no just boot then delete | 21:06 |
sean-k-mooney | hard reboot seems to be fine | 21:06 |
sean-k-mooney | Sundar: https://etherpad.openstack.org/p/sean-cyborg-testing-delete-logs | 21:08 |
sean-k-mooney | paste.openstack.org isn't loading for me | 21:08 |
sean-k-mooney | but that is all the nova and cyborg logs | 21:08 |
sean-k-mooney | it looks like the cyborg api returned a 401 | 21:10 |
Sundar | devstack@cyborg-api.service[26903]: 2020-02-07 20:56:05.719 .... Authorization failed for token: keystonemiddleware.auth_token._exceptions.InvalidToken: Token authorization failed. | 21:12 |
sean-k-mooney | yep | 21:12 |
sean-k-mooney | so the call to cyborg either needs an admin token, which i don't think is correct, or the token expired and you need to handle that | 21:13 |
sean-k-mooney | deleting the vm again does not fix it by the way | 21:14 |
sean-k-mooney | but i can create and delete non cyborg vms | 21:14 |
sean-k-mooney | i get the same 401 response on the second attempt | 21:15 |
sean-k-mooney | but i can delete the arq myself | 21:15 |
Sundar | What does it take to handle an expired token? BTW: https://opendev.org/openstack/cyborg/src/branch/master/cyborg/common/policy.py#L86 applies defuault rule for ARQ deletes | 21:15 |
*** owalsh has quit IRC | 21:16 | |
Sundar | *default rule: rule:admin_or_owner | 21:16 |
sean-k-mooney | so there are basically two ways to try and resolve it or you can raise an error | 21:17 |
sean-k-mooney | if service tokens are configured you can fall back to using those, alternatively you can have an admin token for the service in the nova conf and escalate | 21:17 |
sean-k-mooney | or you can fail | 21:17 |
sean-k-mooney | but in this case my token was not expired | 21:17 |
sean-k-mooney | when i did it again i was issued a new token and it failed again | 21:18 |
*** dpawlik has quit IRC | 21:18 | |
sean-k-mooney | my devstack also issues tokens that last for an hour so it should be fine | 21:18 |
sean-k-mooney | anyway im done for the day o/ | 21:20 |
Sundar | Ok. Not sure what is going wrong here. I'll try it out as non-admin. | 21:21 |
Sundar | Thanks a lot for all your efforts. ;) Have a good weekend! | 21:21 |
*** derekh has joined #openstack-nova | 21:29 | |
*** derekh has quit IRC | 21:30 | |
*** owalsh has joined #openstack-nova | 21:34 | |
Sundar | efried: Re. "blocking unsupported operations", the objection seems to be that each change requires microversion changes. | 21:43 |
*** ociuhandu has joined #openstack-nova | 21:44 | |
Sundar | FWIW, I think any further changes after this series will be few and far between. We could put in a microversion change now. Any further changes should be clumped together as much as possible, based on operator requirements. | 21:45 |
*** ociuhandu has quit IRC | 21:47 | |
*** ociuhandu has joined #openstack-nova | 21:47 | |
*** nweinber has quit IRC | 21:50 | |
*** N3l1x has quit IRC | 22:02 | |
*** ociuhandu has quit IRC | 22:04 | |
*** ociuhandu has joined #openstack-nova | 22:05 | |
*** jmlowe has joined #openstack-nova | 22:10 | |
*** mriedem has left #openstack-nova | 22:18 | |
*** jmlowe has quit IRC | 22:19 | |
*** jmlowe has joined #openstack-nova | 22:20 | |
*** jmlowe has quit IRC | 22:21 | |
*** ociuhandu has quit IRC | 22:25 | |
*** iurygregory has quit IRC | 22:29 | |
*** tosky has joined #openstack-nova | 22:31 | |
*** spatel has quit IRC | 22:32 | |
*** tbachman has quit IRC | 22:38 | |
*** spatel has joined #openstack-nova | 22:43 | |
*** spatel has quit IRC | 22:48 | |
*** Sundar has quit IRC | 22:51 | |
*** dpawlik has joined #openstack-nova | 22:51 | |
*** dpawlik has quit IRC | 22:59 | |
*** damien_r has quit IRC | 23:09 | |
*** bbowen has joined #openstack-nova | 23:22 | |
*** bbowen_ has quit IRC | 23:23 | |
*** mlavalle has quit IRC | 23:33 | |
*** martinkennelly has quit IRC | 23:41 | |
*** ralonsoh has quit IRC | 23:56 | |
*** spatel has joined #openstack-nova | 23:59 |
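
The flavor-based check discussed between 18:39 and 19:40 (look at the instance's embedded flavor, and block only the operations known not to work) could look roughly like the sketch below. This is not nova code: the function names, the exception, and the set of blocked operations are hypothetical, and the extra spec key `accel:device_profile` is an assumption, since the log itself only says "accl:device-profile or whatever the extra spec is".

```python
# Hypothetical sketch; none of these names are taken from the nova tree.

# Assumed extra spec key for a Cyborg device profile.
DEVICE_PROFILE_KEY = "accel:device_profile"

# Operations that were suggested in the discussion as candidates to block.
BLOCKED_OPS = frozenset({"resize", "live-migrate", "evacuate", "shelve"})


class OperationNotSupported(Exception):
    """Raised when an operation is rejected for an accelerator instance."""


def has_device_profile(extra_specs):
    """Return True if the flavor extra specs request a device profile."""
    return bool(extra_specs.get(DEVICE_PROFILE_KEY))


def check_operation(operation, extra_specs):
    """Reject only the listed operations, and only for accelerator flavors."""
    if operation in BLOCKED_OPS and has_device_profile(extra_specs):
        raise OperationNotSupported(
            f"{operation} is not supported for instances "
            f"with a device profile"
        )


if __name__ == "__main__":
    specs = {DEVICE_PROFILE_KEY: "example-profile"}
    check_operation("reboot", specs)  # allowed, no exception
    try:
        check_operation("resize", specs)
    except OperationNotSupported as exc:
        print(exc)
```

Keeping the blocked set small reflects the objection raised at 19:31 that the earlier patch was too aggressive in what it blocked.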
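
The 19:08 remark that 2 sockets with 2 threads per core cannot be done with 30 vCPUs, only with 28 or 32, comes down to simple divisibility. The helper below is a simplified illustration of that arithmetic only; the function name is hypothetical and it does not reproduce how nova actually selects a guest CPU topology.

```python
def cores_per_socket(vcpus, sockets=2, threads=2):
    """Return cores per socket for an even split, or None if impossible.

    Simplified illustration: the vCPU count must divide evenly into
    sockets * threads for every socket to get the same number of cores.
    """
    per_core_group = sockets * threads
    if vcpus % per_core_group != 0:
        return None
    return vcpus // per_core_group


if __name__ == "__main__":
    for count in (28, 30, 32):
        print(count, cores_per_socket(count))
```

With 2 sockets and 2 threads, 28 vCPUs gives 7 cores per socket and 32 gives 8, while 30 leaves a remainder and cannot be split evenly.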