*** jmlowe has joined #openstack-nova | 00:01 | |
*** brinzhang0 has joined #openstack-nova | 00:06 | |
*** brinzhang_ has quit IRC | 00:09 | |
*** aj_mailing has joined #openstack-nova | 00:10 | |
*** xek_ has quit IRC | 00:14 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Add new default roles in tenant networks policies https://review.opendev.org/742771 | 00:15 |
*** brinzhang has joined #openstack-nova | 00:17 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Add test coverage of tenant networks policies https://review.opendev.org/742765 | 00:18 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Introduce scope_types in tenant networks policy https://review.opendev.org/742766 | 00:19 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Add new default roles in tenant networks policies https://review.opendev.org/742771 | 00:19 |
*** brinzhang0 has quit IRC | 00:20 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Pass the actual target in tenant networks policy https://review.opendev.org/742772 | 00:23 |
*** aj_mailing has quit IRC | 00:33 | |
*** aj_mailing has joined #openstack-nova | 00:34 | |
*** songwenping__ has joined #openstack-nova | 00:45 | |
*** xiaolin has joined #openstack-nova | 00:53 | |
*** jmlowe has quit IRC | 00:58 | |
*** yaawang has quit IRC | 01:00 | |
*** jmlowe has joined #openstack-nova | 01:00 | |
*** yaawang has joined #openstack-nova | 01:00 | |
*** songwenping_ has joined #openstack-nova | 01:07 | |
*** songwenping__ has quit IRC | 01:10 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Add test coverage of volumes policies https://review.opendev.org/742773 | 01:15 |
*** masterpe has quit IRC | 01:16 | |
*** masterpe has joined #openstack-nova | 01:20 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Introduce scope_types in volumes policy https://review.opendev.org/742774 | 01:22 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Add new default roles in security_groups policies https://review.opendev.org/742763 | 01:23 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Pass the actual target in security_groups policy https://review.opendev.org/742764 | 01:23 |
openstackgerrit | Yingji Sun proposed openstack/nova master: Set different VirtualDevice.key https://review.opendev.org/713565 | 01:38 |
*** aj_mailing has quit IRC | 01:47 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Add new default roles in volumes policies https://review.opendev.org/742777 | 01:57 |
*** yaawang has quit IRC | 02:04 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Pass the actual target in volumes policy https://review.opendev.org/742779 | 02:07 |
*** songwenping__ has joined #openstack-nova | 02:09 | |
*** songwenping_ has quit IRC | 02:12 | |
*** mkrai has joined #openstack-nova | 02:20 | |
*** yaawang has joined #openstack-nova | 02:24 | |
*** aj_mailing has joined #openstack-nova | 02:25 | |
*** dave-mccowan has quit IRC | 02:26 | |
alex_xu | stephenfin: gibi, I saw you mentioned the upgrade issue for the provider config yaml. I didn't follow the spec discussion at the beginning; could you remind me what it is about? Then I think I can help tony_su work through the problem. | 02:28 |
*** lbragstad_ has joined #openstack-nova | 02:30 | |
openstackgerrit | Merged openstack/nova stable/stein: compute: Allow snapshots to be created from PAUSED volume backed instances https://review.opendev.org/729176 | 02:30 |
*** aj_mailing has quit IRC | 02:31 | |
*** lbragstad has quit IRC | 02:32 | |
*** gyee has quit IRC | 02:33 | |
*** lbragstad_ has quit IRC | 02:35 | |
*** gyee has joined #openstack-nova | 02:40 | |
*** Yumeng has joined #openstack-nova | 02:43 | |
openstackgerrit | Merged openstack/nova stable/ussuri: objects: Update keypairs when saving an instance https://review.opendev.org/742631 | 02:50 |
*** yaawang has quit IRC | 03:00 | |
*** yaawang has joined #openstack-nova | 03:01 | |
*** huaqiang has joined #openstack-nova | 03:09 | |
openstackgerrit | Xinran WANG proposed openstack/nova-specs master: SRIOV SmartNic Support Specification https://review.opendev.org/742785 | 03:15 |
*** songwenping__ has quit IRC | 03:19 | |
*** songwenping__ has joined #openstack-nova | 03:19 | |
*** mriedem has left #openstack-nova | 03:23 | |
tony_su | gibi: stephenfin: A status update on the provider-config-file patches. I am handling your comments, which are all valuable. Most of them are easy and only need simple updates to the patches. But a few, like refactoring the schema into code or adding new test coverage, require more consideration and a few more days ... | 03:26 |
*** aj_mailing has joined #openstack-nova | 03:27 | |
*** tony_su has left #openstack-nova | 03:27 | |
*** tony_su has joined #openstack-nova | 03:28 | |
*** yaawang has quit IRC | 03:31 | |
openstackgerrit | Yingji Sun proposed openstack/nova master: Set different VirtualDevice.key https://review.opendev.org/713565 | 03:32 |
*** yaawang has joined #openstack-nova | 03:32 | |
*** brinzhang_ has joined #openstack-nova | 03:33 | |
*** brinzhang has quit IRC | 03:36 | |
*** psachin has joined #openstack-nova | 03:36 | |
*** huaqiang has quit IRC | 03:40 | |
*** yaawang has quit IRC | 04:09 | |
*** yaawang has joined #openstack-nova | 04:09 | |
*** gyee has quit IRC | 04:14 | |
openstackgerrit | Xinran WANG proposed openstack/nova-specs master: SRIOV SmartNic Support Specification https://review.opendev.org/742785 | 04:16 |
*** aj_mailing has quit IRC | 04:28 | |
*** udesale has joined #openstack-nova | 04:33 | |
*** mkrai has quit IRC | 04:34 | |
*** mkrai has joined #openstack-nova | 04:44 | |
*** songwenping_ has joined #openstack-nova | 04:54 | |
*** eharney has quit IRC | 04:55 | |
*** amodi has quit IRC | 04:55 | |
*** songwenping__ has quit IRC | 04:57 | |
*** aj_mailing has joined #openstack-nova | 05:02 | |
*** eharney has joined #openstack-nova | 05:08 | |
*** yaawang has quit IRC | 05:11 | |
*** yaawang has joined #openstack-nova | 05:12 | |
*** ratailor has joined #openstack-nova | 05:14 | |
*** aj_mailing has quit IRC | 05:17 | |
*** aj_mailing has joined #openstack-nova | 05:25 | |
*** links has joined #openstack-nova | 05:37 | |
*** songwenping__ has joined #openstack-nova | 05:47 | |
*** songwenping_ has quit IRC | 05:51 | |
*** jsuchome has joined #openstack-nova | 06:31 | |
*** tinwood is now known as tinwood-afk | 06:33 | |
*** yaawang has quit IRC | 06:59 | |
*** yaawang has joined #openstack-nova | 06:59 | |
*** aj_mailing has quit IRC | 07:05 | |
*** aj_mailing has joined #openstack-nova | 07:06 | |
*** aj_mailing has quit IRC | 07:09 | |
*** tesseract has joined #openstack-nova | 07:13 | |
*** ralonsoh has joined #openstack-nova | 07:28 | |
gibi | tony_su: don't worry. I appreciate your work on that series and I will look at it when you are ready | 07:29 |
gibi | alex_xu: I'm not sure I can recall an upgrade issue in the provider config series (but it is Friday so my brain is already slow) do you have a reference? | 07:31 |
bauzas | gibi: do you know the answer of https://review.opendev.org/#/c/739211/5/nova/tests/unit/test_crypto.py@21 | 07:32 |
bauzas | ? | 07:32 |
bauzas | that's a horrible import | 07:32 |
gibi | bauzas: looking... | 07:32 |
bauzas | hmmm, can't find a castellanclient kind of thing | 07:34 |
gibi | bauzas: is castellan just an interface, so that by using castellan we don't have to pull in a whole key manager backend like barbican | 07:34 |
gibi | ? | 07:34 |
bauzas | I'm not a specialist of any OpenStack key manager | 07:35 |
bauzas | but if you're right, that explains my readings | 07:35 |
* bauzas goes looking at the castellan docs | 07:35 | |
bauzas | mmmm https://docs.openstack.org/castellan/latest/user/index.html#basic-usage | 07:36 |
bauzas | looks like you're right indeed | 07:36 |
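gibi's reading of castellan matches a common pattern: consumer code is written against an abstract key manager interface, and the concrete backend (barbican or another store) is selected by configuration. The Python sketch below illustrates only that pattern. `KeyManager`, `InMemoryKeyManager` and `create_vtpm_secret` are hypothetical names and are not castellan's actual API; the castellan basic-usage docs linked above describe the real one.

```python
import abc
import os
import uuid
from typing import Dict


class KeyManager(abc.ABC):
    """Hypothetical minimal key-manager interface (not castellan's API)."""

    @abc.abstractmethod
    def store(self, secret: bytes) -> str:
        """Store a secret and return an opaque ID for it."""

    @abc.abstractmethod
    def get(self, key_id: str) -> bytes:
        """Return the secret for key_id; raise KeyError if unknown."""

    @abc.abstractmethod
    def delete(self, key_id: str) -> None:
        """Forget the secret for key_id; raise KeyError if unknown."""


class InMemoryKeyManager(KeyManager):
    """Toy backend, e.g. for unit tests; a deployment would plug in a real service."""

    def __init__(self) -> None:
        self._secrets: Dict[str, bytes] = {}

    def store(self, secret: bytes) -> str:
        key_id = str(uuid.uuid4())
        self._secrets[key_id] = secret
        return key_id

    def get(self, key_id: str) -> bytes:
        return self._secrets[key_id]

    def delete(self, key_id: str) -> None:
        del self._secrets[key_id]


def create_vtpm_secret(manager: KeyManager) -> str:
    """Caller code depends only on the interface, never on a specific backend."""
    return manager.store(os.urandom(32))
```

Swapping backends then means passing a different `KeyManager` subclass, while `create_vtpm_secret` stays unchanged.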
*** tosky has joined #openstack-nova | 07:37 | |
*** yaawang has quit IRC | 07:40 | |
*** yaawang has joined #openstack-nova | 07:40 | |
*** mkrai has quit IRC | 07:44 | |
*** maciejjozefczyk has joined #openstack-nova | 07:45 | |
*** xinranwang__ has joined #openstack-nova | 08:05 | |
*** markvoelker has joined #openstack-nova | 08:11 | |
*** markvoelker has quit IRC | 08:15 | |
*** nightmare_unreal has joined #openstack-nova | 08:27 | |
*** mkrai has joined #openstack-nova | 08:32 | |
*** xek_ has joined #openstack-nova | 08:35 | |
stephenfin | bauzas, gibi: The fix for that o.vo version issue is here, btw https://review.opendev.org/#/c/742650/1 | 08:41 |
gibi | stephenfin: thanks | 08:41 |
stephenfin | alex_xu: As gibi said, I don't think anyone noted any upgrade issues with provider.yaml. Perhaps you're confusing it with the investigation of upgrade issues bauzas was doing for the vTPM series? | 08:42 |
gibi | stephenfin: ahh, that was the upgrade discussion yesterday ^^ | 08:43 |
gibi | I knew there was something | 08:43 |
gibi | I just did not remember what | 08:43 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Use compression by default for 'SshDriver' https://review.opendev.org/684393 | 08:45 |
*** tinwood-afk is now known as tinwood | 08:45 | |
alex_xu | stephenfin: ah, thanks :) | 08:46 |
*** derekh has joined #openstack-nova | 08:51 | |
*** dtantsur|afk is now known as dtantsur | 08:53 | |
*** janno has quit IRC | 08:53 | |
*** janno has joined #openstack-nova | 08:54 | |
*** janno has quit IRC | 08:55 | |
*** janno has joined #openstack-nova | 08:55 | |
*** ociuhandu has joined #openstack-nova | 09:06 | |
*** ratailor_ has joined #openstack-nova | 09:06 | |
*** ratailor has quit IRC | 09:08 | |
*** ociuhandu has quit IRC | 09:09 | |
*** xek_ has quit IRC | 09:12 | |
*** jraju__ has joined #openstack-nova | 09:23 | |
*** links has quit IRC | 09:23 | |
openstackgerrit | Merged openstack/nova master: scheduler: Request vTPM trait based on flavor or image https://review.opendev.org/739210 | 09:23 |
openstackgerrit | Merged openstack/nova master: crypto: Add support for creating, destroying vTPM secrets https://review.opendev.org/739211 | 09:24 |
openstackgerrit | Merged openstack/nova master: manager: Prevent compute startup on invalid vTPM config https://review.opendev.org/739212 | 09:24 |
openstackgerrit | Merged openstack/nova master: tests: Rename tests for '_create_guest_with_network' https://review.opendev.org/740464 | 09:24 |
openstackgerrit | Merged openstack/nova master: tests: Move single use constants to their callers https://review.opendev.org/741280 | 09:24 |
openstackgerrit | Merged openstack/nova master: tests: Define constants in '_IntegratedTestBase' https://review.opendev.org/741281 | 09:24 |
openstackgerrit | Merged openstack/nova master: tests: Remove 'test_servers.ServersTestBase' https://review.opendev.org/741282 | 09:24 |
openstackgerrit | Merged openstack/nova master: tests: Add 'PlacementHelperMixin', 'PlacementInstanceHelperMixin' https://review.opendev.org/741283 | 09:25 |
openstackgerrit | Merged openstack/nova master: tests: Make '_IntegratedTestBase' subclass 'PlacementInstanceHelperMixin' https://review.opendev.org/741284 | 09:25 |
*** mkrai has quit IRC | 09:27 | |
*** mkrai_ has joined #openstack-nova | 09:27 | |
*** yaawang has quit IRC | 09:30 | |
*** yaawang has joined #openstack-nova | 09:30 | |
stephenfin | Holy s***, they all merged in one go. No CI failures :O | 09:31 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Use compression by default for 'SshDriver' https://review.opendev.org/684393 | 09:31 |
gibi | stephenfin: that was a nice set | 09:31 |
stephenfin | bauzas, gibi: Can you look at ^ again real quick? Turns out 'scp' cares about the order of arguments. CI caught it for us and will catch it again if it's wrong | 09:31 |
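The fix stephenfin describes amounts to emitting every option before the source and destination operands. A minimal sketch of building such an argv in Python follows; `build_scp_command` is a hypothetical helper written for illustration and is not the code in the 'SshDriver' patch.

```python
from typing import List


def build_scp_command(src: str, dest: str, compress: bool = True,
                      recursive: bool = False) -> List[str]:
    """Hypothetical helper: build an scp argv with all options first."""
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")
    if compress:
        cmd.append("-C")
    # Operands go last: a parser that stops scanning for options at the
    # first operand would otherwise treat a trailing "-C" as a file name.
    cmd += [src, dest]
    return cmd


print(build_scp_command("/var/lib/nova/disk", "host:/tmp/disk", recursive=True))
```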
gibi | stephenfin: looking | 09:32 |
stephenfin | (from https://zuul.opendev.org/t/openstack/build/7e8c6c6ddaba44e09a90a847dfe6ee46/log/logs/screen-n-cpu.txt) | 09:32 |
stephenfin | Thanks | 09:32 |
bauzas | ack | 09:32 |
* bauzas wonders then why CI didn't catch it | 09:32 | |
stephenfin | It did | 09:32 |
bauzas | hah | 09:33 |
bauzas | fwiw https://linux.die.net/man/1/scp | 09:33 |
stephenfin | I just thought it was intermittent failures and wasn't looking at it often enough to spot the trend :) | 09:33 |
stephenfin | zuul++ | 09:33 |
bauzas | stephenfin: i don't see any required ordering in the scp manpage | 09:34 |
stephenfin | bauzas: neither did I, but the CI failure is fairly unambiguous | 09:34 |
stephenfin | probably the implementation of getopt they're using is borked | 09:34 |
bauzas | in theory, you could also scp -rC | 09:35 |
* bauzas prefers tar over nc | 09:35 | |
gibi | I can reproduce the ordering requirement of scp locally | 09:35 |
gibi | so the manpage is incomplete :) | 09:35 |
bauzas | gibi: I honestly have never used the -C flag | 09:35 |
bauzas | like I said, I tend to use tar over nc when I wanted to transfer large files | 09:36 |
stephenfin | tbf, parsing command line arguments is hard work | 09:36 |
bauzas | waaaaay more efficient | 09:36 |
* stephenfin suggests looking at the bug list for argparse /o\ | 09:36 | |
gibi | scp is secure tar + nc is fast, it is a tradeoff :) | 09:36 |
stephenfin | so broken :-( | 09:36 |
stephenfin | to the point that click (which is actually awesome) uses the deprecated optparse. Less magical and more reliable, apparently | 09:37 |
* stephenfin goes back to breaking stuff | 09:38 | |
* gibi hugs zuul both for being stable and for catching bugs | 09:39 | |
gibi | stephenfin: btw https://that.guru/blog/the-numa-scheduling-story-in-nova/ is a great article that made me think about where and when nova selects the resources to consume | 09:42 |
bauzas | stephenfin: gibi: that's an argparse bug http://paste.openstack.org/show/796277/ | 09:42 |
bauzas | definitely not scp-related | 09:42 |
stephenfin | gibi: You can thank sean-k-mooney for most of that. I just spell checked and reorganized :) | 09:43 |
stephenfin | bauzas: Put '-C' at the end | 09:43 |
bauzas | oh that | 09:43 |
stephenfin | the issue isn't with the order of the positionals | 09:43 |
bauzas | of course, it won't work then | 09:43 |
stephenfin | options | 09:43 |
stephenfin | it's with options coming after positionals | 09:43 |
bauzas | you shock me if you thought it would work :p | 09:44 |
stephenfin | but it does in many applications! | 09:44 |
bauzas | but I honestly haven't paid attention to the argparse result :) | 09:44 |
openstackgerrit | Merged openstack/nova master: trivial: Test object backporting against correct version https://review.opendev.org/742650 | 09:44 |
bauzas | it NEVER worked with scp then :) | 09:44 |
bauzas | and many BSD commands | 09:44 |
bauzas | (many many) | 09:44 |
stephenfin | bauzas: http://paste.openstack.org/show/796278/ | 09:46 |
stephenfin | run that with e.g. 'python test.py 123 MB -b test' | 09:46 |
stephenfin | it'll work just fine | 09:46 |
stephenfin | so optparse (or whatever scp is using) is just plain broken | 09:47 |
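stephenfin's point can be reproduced with Python's argparse, which accepts options that come after positional arguments. His paste is not reproduced here; the parser below is a hypothetical stand-in with two positionals and a `-b` option, fed the same argv as his 'python test.py 123 MB -b test' example.

```python
import argparse

parser = argparse.ArgumentParser(prog="test.py")
parser.add_argument("size", type=int)      # first positional, e.g. "123"
parser.add_argument("unit")                # second positional, e.g. "MB"
parser.add_argument("-b", dest="bucket")   # option supplied *after* the positionals

# argparse collects "-b test" even though it follows both operands.
args = parser.parse_args(["123", "MB", "-b", "test"])
print(args.size, args.unit, args.bucket)
```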
stephenfin | but hey, I'm not going to fix it :) | 09:47 |
gibi | yeah 'grep foo ./ -R' works too | 09:48 |
gibi | sean-k-mooney: good article https://that.guru/blog/the-numa-scheduling-story-in-nova/ :) | 09:49 |
bauzas | stephenfin: fyk https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html | 09:50 |
bauzas | tl;dr: options != operands | 09:50 |
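bauzas is pointing at POSIX Utility Syntax Guideline 9: all options should precede operands. Python's standard library ships both behaviours, which makes the difference easy to see: `getopt.getopt` stops scanning at the first operand like traditional UNIX getopt, while `getopt.gnu_getopt` permutes arguments like GNU tools. That scp's own parser behaves like the first is an inference from the CI failure and gibi's local reproduction, not something checked here.

```python
import getopt

# scp-style argv with an option placed after the two operands.
argv = ["-r", "src/", "host:dst/", "-C"]

# POSIX behaviour: option scanning stops at the first operand,
# so the trailing "-C" stays in the operand list.
posix_opts, posix_operands = getopt.getopt(argv, "rC")
print(posix_opts, posix_operands)
# [('-r', '')] ['src/', 'host:dst/', '-C']

# GNU behaviour: options are collected from anywhere in argv
# (unless the POSIXLY_CORRECT environment variable is set).
gnu_opts, gnu_operands = getopt.gnu_getopt(argv, "rC")
print(gnu_opts, gnu_operands)
# [('-r', ''), ('-C', '')] ['src/', 'host:dst/']
```

With the POSIX-style parser, `-C` is treated as an operand rather than as the compression flag.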
bauzas | argparse was probably written by Linux geeks who didn't know about UNIX :p | 09:51 |
bauzas | time for a quote | 09:51 |
bauzas | BSD is what you get when a bunch of UNIX hackers sit down to try to port a UNIX system to the PC. Linux is what you get when a bunch of PC hackers sit down and try to write a UNIX system for the PC | 09:52 |
tosky | nice as a quote, even though iirc historically incorrect: when BSD started, there were no PCs | 09:53 |
bauzas | that's not coming from me :) | 09:54 |
bauzas | but I used to play with some BSD OSes in the past, and this pun was very well known | 09:54 |
bauzas | do people know that 'ps' has a very specific POSIX syntax that people can use regardless of the OS? | 09:55 |
*** mkrai_ has quit IRC | 10:01 | |
*** markvoelker has joined #openstack-nova | 10:03 | |
*** markvoelker has quit IRC | 10:08 | |
*** k_mouza has joined #openstack-nova | 10:08 | |
stephenfin | bauzas: I was taught to use e.g. 'ps aux' which I think is BSD compatible too | 10:08 |
bauzas | that's correct, and that's the old syntax | 10:09 |
bauzas | we made it forward compatible | 10:09 |
bauzas | whoops | 10:09 |
bauzas | they, not we | 10:10 |
bauzas | I'm not THAT modest | 10:10 |
*** zhanglong has quit IRC | 10:10 | |
bauzas | tl;dr: options without the dash come from BSD | 10:10 |
bauzas | and Linux ported them | 10:11 |
bauzas | but in theory, you *should* always follow the POSIX syntax to be 100% compliant across all platforms | 10:11 |
bauzas | https://askubuntu.com/questions/484982/what-is-the-difference-between-standard-syntax-and-bsd-syntax | 10:12 |
bauzas | or slightly better https://man7.org/linux/man-pages/man1/ps.1.html | 10:15 |
*** spatel has joined #openstack-nova | 10:18 | |
*** k_mouza has quit IRC | 10:19 | |
*** brinzhang_ has quit IRC | 10:20 | |
*** mkrai_ has joined #openstack-nova | 10:22 | |
*** spatel has quit IRC | 10:22 | |
*** martinkennelly has joined #openstack-nova | 10:26 | |
*** k_mouza has joined #openstack-nova | 10:29 | |
*** psachin has quit IRC | 10:34 | |
*** links has joined #openstack-nova | 10:40 | |
*** jraju__ has quit IRC | 10:41 | |
*** mkrai_ has quit IRC | 10:43 | |
*** mkrai__ has joined #openstack-nova | 10:43 | |
*** k_mouza has quit IRC | 10:45 | |
*** yaawang has quit IRC | 10:50 | |
*** yaawang has joined #openstack-nova | 10:51 | |
*** k_mouza has joined #openstack-nova | 10:51 | |
*** k_mouza has quit IRC | 10:53 | |
*** k_mouza has joined #openstack-nova | 10:53 | |
*** ociuhandu has joined #openstack-nova | 10:54 | |
*** ociuhandu has quit IRC | 10:59 | |
*** k_mouza has quit IRC | 11:01 | |
*** Yumeng has quit IRC | 11:08 | |
*** udesale_ has joined #openstack-nova | 11:11 | |
*** k_mouza has joined #openstack-nova | 11:12 | |
*** udesale has quit IRC | 11:13 | |
*** k_mouza has quit IRC | 11:17 | |
*** mkrai__ has quit IRC | 11:42 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: scheduler: Default request group to None https://review.opendev.org/742651 | 11:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: tests: Add helpers for suspend, resume and reboot of server https://review.opendev.org/741285 | 11:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Pass context, instance to '_create_domain' https://review.opendev.org/741286 | 11:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: api: Reject non-spawn operations for vTPM https://review.opendev.org/741500 | 11:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: libvirt: Add emulated TPM support to Nova https://review.opendev.org/631363 | 11:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: docs: Add docs for vTPM support https://review.opendev.org/739213 | 11:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Don't unset Instance.old_flavor, new_flavor until necessary https://review.opendev.org/741995 | 11:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add support for resize and cold migration of emulated TPM files https://review.opendev.org/639934 | 11:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add type hints to 'nova.compute.manager' https://review.opendev.org/742863 | 11:53 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: privsep: Add support for recursive chown, move_tree operations https://review.opendev.org/742864 | 11:53 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add type hints to 'nova.virt.libvirt.utils' https://review.opendev.org/742865 | 11:53 |
*** markvoelker has joined #openstack-nova | 12:04 | |
*** markvoelker has quit IRC | 12:05 | |
*** markvoelker has joined #openstack-nova | 12:06 | |
*** k_mouza has joined #openstack-nova | 12:07 | |
*** k_mouza has quit IRC | 12:12 | |
*** k_mouza has joined #openstack-nova | 12:18 | |
*** psachin has joined #openstack-nova | 12:18 | |
*** k_mouza has quit IRC | 12:23 | |
*** derekh has quit IRC | 12:24 | |
*** k_mouza has joined #openstack-nova | 12:28 | |
*** k_mouza has quit IRC | 12:31 | |
*** k_mouza has joined #openstack-nova | 12:31 | |
*** ratailor_ has quit IRC | 12:34 | |
*** ociuhandu has joined #openstack-nova | 12:43 | |
*** ociuhandu has quit IRC | 12:47 | |
*** derekh has joined #openstack-nova | 12:51 | |
*** lbragstad has joined #openstack-nova | 13:05 | |
*** mriedem has joined #openstack-nova | 13:07 | |
*** artom has joined #openstack-nova | 13:09 | |
*** zigo has quit IRC | 13:19 | |
*** gokhani has joined #openstack-nova | 13:25 | |
*** ociuhandu has joined #openstack-nova | 13:30 | |
*** zigo has joined #openstack-nova | 13:31 | |
openstackgerrit | Elod Illes proposed openstack/nova stable/rocky: compute: Allow snapshots to be created from PAUSED volume backed instances https://review.opendev.org/729177 | 13:41 |
*** sean-k-mooney has joined #openstack-nova | 13:46 | |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Add regression test for bug 1879787 https://review.opendev.org/741230 | 13:48 |
openstack | bug 1879787 in OpenStack Compute (nova) "post_live_migration does not handle Neutron errors" [Medium,In progress] https://launchpad.net/bugs/1879787 - Assigned to Artom Lifshitz (notartom) | 13:48 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Handle Neutron errors in _post_live_migration() https://review.opendev.org/729763 | 13:48 |
*** gokhani has quit IRC | 13:48 | |
openstackgerrit | Merged openstack/nova stable/ussuri: libvirt: Handle VIR_ERR_DEVICE_MISSING when detaching devices https://review.opendev.org/742414 | 13:49 |
sean-k-mooney | gibi: hi o/ i was away for a funeral this week so i'm just seeing your comment on the attach/detach patch now. i can take a closer look next week but i'm still pretty burnt out today. hopefully i'll be less mentally exhausted after the weekend | 13:54 |
sean-k-mooney | so ya i think you're right, there is a pre-existing bug related to macvtap detach in the libvirt driver | 13:56 |
sean-k-mooney | well, 2 bugs: one, it's not updating the domain correctly because it's not finding the device properly, and two, it's not releasing the vf claim because we just don't do that today for sriov detach | 13:57 |
sean-k-mooney | problem 1 is what prevents the vf mac from being reset and the macvtap from being removed on detach | 13:58 |
gibi | sean-k-mooney: no worries. take your time to recover | 13:58 |
*** mlavalle has joined #openstack-nova | 13:58 | |
gibi | sean-k-mooney: I think I've just found the reason of 2 | 13:58 |
gibi | and I think I can fix it | 13:58 |
gibi | I will be away next week | 13:59 |
gibi | so feel free to touch my code or add patches to the series while I'm away and I will continue the week after | 13:59 |
sean-k-mooney | what probably makes sense is to have 3 patches: 1 that blocks detach in the api, then your current one, and a final patch for macvtap | 13:59 |
gibi | yes, it makes sense to have separate patches for the separate issues | 14:00 |
*** k_mouza has quit IRC | 14:00 | |
sean-k-mooney | we also need a patch for direct-physical, but it is basically the same issue: the device lookup fails, although it fails for a different reason | 14:00 |
*** psachin has quit IRC | 14:01 | |
sean-k-mooney | it fails because the mac is not present rather than the target_dev | 14:01 |
sean-k-mooney | but it's still failing in the same if, i believe https://github.com/openstack/nova/blob/4a925cf01ac6ca313ff10c3075a86d65095de299/nova/virt/libvirt/guest.py#L252-L257 | 14:01 |
gibi | sean-k-mooney: cool. I haven't had time to try direct-physical yet, | 14:02 |
sean-k-mooney | what i think makes sense is to just have 2 code paths. if it's an sriov interface, find it by the pci address and remove it | 14:03 |
sean-k-mooney | if not, find it by its mac and remove it | 14:03 |
*** xek_ has joined #openstack-nova | 14:04 | |
sean-k-mooney | i'll have to look at the code and see if that makes sense in practice, however, as i'm not sure if we have the vnic_type or vif type available | 14:05 |
gibi | the code that searches for the interface has access to the vif 124: enp129s16f6: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000 | 14:06 |
gibi | bah | 14:06 |
gibi | my copy paste buffer is broken :/ | 14:07 |
*** ociuhandu has quit IRC | 14:07 | |
sean-k-mooney | ok cool | 14:07 |
gibi | https://github.com/openstack/nova/blob/4a925cf01ac6ca313ff10c3075a86d65095de299/nova/virt/libvirt/guest.py#L252-L257 | 14:07 |
*** ociuhandu has joined #openstack-nova | 14:07 | |
gibi | nah, this is the place where the matching between the current domain and the vif being detached happens https://github.com/openstack/nova/blob/4a925cf01ac6ca313ff10c3075a86d65095de299/nova/virt/libvirt/guest.py#L252-L257 | 14:07 |
sean-k-mooney | same link :) i think you meant https://github.com/openstack/nova/blob/4a925cf01ac6ca313ff10c3075a86d65095de299/nova/virt/libvirt/driver.py#L2199 | 14:09 |
sean-k-mooney | and yes it has the vif | 14:09 |
stephenfin | lyarwood: Are you the person I need to shout at for nova-ceph-multistore failing? :P | 14:10 |
sean-k-mooney | so we could add a get_interface_by_pci_address and call that instead for sriov devices. | 14:10 |
gibi | sean-k-mooney: yes both yours and mine points to the code that causes the failure | 14:10 |
stephenfin | jk, but heads up I'm seeing a lot of failures on that today. Haven't investigated yet though | 14:10 |
*** dpawlik2 has quit IRC | 14:11 | |
*** k_mouza has joined #openstack-nova | 14:11 | |
lyarwood | stephenfin: dansmith introduced it while I was out so no ;) | 14:12 |
lyarwood | stephenfin: what's up? | 14:12 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: [WIP] Support SRIOV interface attach and detach https://review.opendev.org/740995 | 14:12 |
dansmith | stephenfin: link? | 14:12 |
sean-k-mooney | stephenfin: it's a modified version of the previous ceph job | 14:12 |
stephenfin | dansmith: https://review.opendev.org/#/c/741286/ | 14:12 |
sean-k-mooney | so the tests are the same but the config is slightly different to enable multistore and the image import from copy feature | 14:13 |
dansmith | stephenfin: thanks will look through it in a sec | 14:13 |
gibi | sean-k-mooney: https://review.opendev.org/#/c/740995/5/nova/virt/libvirt/guest.py@240 this change fixes the macvtap detach issue in my env, but I agree that the condition might need a refactoring to have two conditions, one for pci and another for mac | 14:14 |
* lyarwood wonders if this is a space issue again | 14:14 | |
sean-k-mooney | gibi: ya so that will work for macvtap but we will still fail for direct-physical | 14:15 |
gibi | sean-k-mooney: yes, probably, haven't tried | 14:15 |
sean-k-mooney | interfaces = self.get_all_devices( | 14:15 |
sean-k-mooney | vconfig.LibvirtConfigGuestInterface) | 14:16 |
sean-k-mooney | that won't return the direct-physical interfaces | 14:16 |
sean-k-mooney | since they are not <interface ...> elements and use <hostdev ...> instead | 14:16 |
gibi | ohh | 14:16 |
gibi | interesting | 14:16 |
sean-k-mooney | also they don't have a mac in the hostdev element | 14:16 |
sean-k-mooney | libvirt can't pass through a pf with <interface type=hostdev>, only VFs | 14:17 |
sean-k-mooney | and the hostdev element does not have a mac either, so interface.mac_addr == cfg.mac_addr would fail | 14:18 |
sean-k-mooney | probably with an attribute error if we got that far | 14:18 |
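Both failure modes described here (hostdev devices never being scanned, and a missing `<mac>` surfacing as an attribute error) can be reproduced with plain ElementTree. This only mimics the shape of the problem; `naive_find_by_mac` and `mac_of` are hypothetical and are not nova's `get_all_devices` or `mac_addr` code.

```python
import xml.etree.ElementTree as ET

# A guest whose only network device is a passed-through PF (direct-physical).
DOMAIN_XML = """<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x81' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
  </devices>
</domain>"""


def naive_find_by_mac(domain_xml, mac):
    """Hypothetical MAC-only lookup that scans <interface> elements only."""
    devices = ET.fromstring(domain_xml).find("devices")
    for dev in devices.findall("interface"):  # <hostdev> is never visited
        if dev.find("mac").get("address") == mac:
            return dev
    return None


def mac_of(dev):
    """Read a device's MAC; on <hostdev> find() returns None, so .get() fails."""
    return dev.find("mac").get("address")


print(naive_find_by_mac(DOMAIN_XML, "fa:16:3e:00:00:03"))  # None: device not found
hostdev = ET.fromstring(DOMAIN_XML).find("devices/hostdev")
try:
    mac_of(hostdev)
except AttributeError as exc:
    print("AttributeError:", exc)
```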
dansmith | stephenfin: did you look into those fails at all? looks to me like just novalidhost on at least one of the three failed tests, and it's a conflict from placement during scheduling: | 14:20 |
dansmith | https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/screen-n-sch.txt#3493 | 14:20 |
dansmith | meaning, are you sure it's just that job failing more? because that fails way before the point where we get to any of the new (i.e. ceph or multistore) stuff | 14:21 |
sean-k-mooney | there are traces in the n-cpu log https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/screen-n-cpu.txt#14267-14323 | 14:21 |
sean-k-mooney | nova.exception.ImageNotFound: Image a549f544-e4e3-4f66-962e-03c1514ee21f could not be found | 14:21 |
stephenfin | dansmith: Barely. I'm seeing image retrieval failures in n-cpu | 14:21 |
stephenfin | yeah, those ^ | 14:21 |
*** links has quit IRC | 14:21 | |
dansmith | hmm, maybe the first test I picked was a rando failure then | 14:22 |
stephenfin | but there are a couple of patches in that series failing and I don't think they're related to the code | 14:22 |
sean-k-mooney | if those tests are uploading new images maybe they are not ready when the boot is started because the import/conversion takes longer or something | 14:23 |
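If the cause were an image that is not yet usable when the boot starts, the usual remedy is to poll until the image reaches `active` before proceeding (the discussion that follows suggests this was not the cause here). A minimal, hypothetical polling helper is sketched below; it is not tempest's waiter API, and `get_status` stands in for whatever call returns the current glance image status. `sleep` and `clock` are injectable so the helper can be tested without real delays.

```python
import time
from typing import Callable


def wait_for_image_status(
    get_status: Callable[[], str],
    wanted: str = "active",
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns `wanted`, hits a terminal state, or times out."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == wanted:
            return status
        if status in ("killed", "deleted"):
            # Terminal glance states: waiting longer will not help.
            raise RuntimeError(f"image entered terminal status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"image still {status!r} after {timeout}s")
        sleep(interval)
```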
dansmith | ah yeah, I see now | 14:23 |
dansmith | sean-k-mooney: yeah that could be | 14:24 |
dansmith | I think we should still be able to GET the image though | 14:24 |
sean-k-mooney | looks like that is not the case for rescue at least https://github.com/openstack/tempest/blob/257f3b009f7978723a8748f9f5b413aa8eb38e3a/tempest/api/compute/servers/test_server_rescue.py#L55-L67 | 14:25 |
dansmith | sean-k-mooney: what is not the case for rescue? | 14:26 |
sean-k-mooney | ya it just does rescue without specifying an image, so it will use the image the vm was booted with or the image specified in the config. i wonder if it failed before that | 14:26 |
sean-k-mooney | dansmith: the rescue test is not uploading any images | 14:26 |
dansmith | are you looking at a different fail? | 14:26 |
sean-k-mooney | tempest.api.compute.servers.test_server_rescue.ServerRescueTestJSON.test_rescue_unrescue_instance | 14:27 |
sean-k-mooney | its the second failure in the test report | 14:27 |
dansmith | ack, the first thing you linked is to an ImagesTest not rescue right? | 14:27 |
dansmith | it's definitely doing a snapshot | 14:27 |
sean-k-mooney | actually, looking at the server uuid, it's not in the n-cpu log, so the novalidhost looks like it really could not fit | 14:29 |
dansmith | sean-k-mooney: right, that's what I was saying, I just picked poorly on the first test to look at :) | 14:29 |
dansmith | sean-k-mooney: hmm, I see a DELETE of the image just before the failed GET in the glance logs, for that snapshot one, which is odd | 14:29 |
sean-k-mooney | yep Got no allocation candidates from the Placement API. | 14:30 |
sean-k-mooney | oh downstream call | 14:31 |
dansmith | ah | 14:31 |
dansmith | so, | 14:31 |
dansmith | I think that stack trace from sean-k-mooney is a red herring | 14:31 |
dansmith | I think that's an images test that tries to delete the image whilst snapshotting or something | 14:31 |
dansmith | it's not even one of the tests that failed in the testr report :) | 14:31 |
dansmith | all three of those tests are novalidhost | 14:32 |
dansmith | so maybe we're actually reporting something different to placement and running out of disk or something? | 14:32 |
sean-k-mooney | ya maybe | 14:32 |
sean-k-mooney | we have 80G of disk in the ci vms but it may not all be available in /opt | 14:33 |
sean-k-mooney | so i don't know, we might have run out of space | 14:33 |
dansmith | well, | 14:34 |
dansmith | it might be a reporting thing or something and not actually out of space, | 14:34 |
dansmith | because we're not seeing problems, just placement is refusing to find space | 14:34 |
dansmith | Jul 24 12:44:22.575632 ubuntu-bionic-ovh-bhs1-0018770257 devstack@placement-api.service[50512]: DEBUG placement.wsgi_wrapper [req-eeb6d563-2483-4e4f-91e8-2dc3a694ade4 req-c57d5bd6-fc4e-469d-9784-cdfe1652d653 service placement] Placement API returning an error response: Unable to allocate inventory: Unable to create allocation for 'DISK_GB' on resource provider 'e786426a-5ae2-4732-8cf6-16325fd2bf2a'. The requested amount would exceed | 14:36 |
dansmith | the capacity. {{(pid=50513) call_func /opt/stack/placement/placement/wsgi_wrapper.py:31}} | 14:37 |
dansmith | Over capacity for DISK_GB on resource provider e786426a-5ae2-4732-8cf6-16325fd2bf2a. Needed: 1, Used: 10, Capacity: 10.0 | 14:37 |
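The "Needed: 1, Used: 10, Capacity: 10.0" line is placement refusing the claim on arithmetic alone. A minimal sketch of that check, assuming placement's usual rule that usable capacity is `(total - reserved) * allocation_ratio`; `fits` is a hypothetical helper, not placement's code:

```python
def fits(total, reserved, allocation_ratio, used, requested):
    """Return True if `requested` more units fit on a resource provider.

    Hypothetical helper mirroring the assumed placement rule:
    capacity = (total - reserved) * allocation_ratio
    """
    capacity = (total - reserved) * allocation_ratio
    return used + requested <= capacity


# Numbers from the log line: Needed 1, Used 10, Capacity 10.0
# (disk_allocation_ratio=1.0, nothing reserved on this host).
print(fits(total=10, reserved=0, allocation_ratio=1.0, used=10, requested=1))
# With the 24G the backing store was supposed to provide, it would fit.
print(fits(total=24, reserved=0, allocation_ratio=1.0, used=10, requested=1))
```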
dansmith | 10G doesn't sound right | 14:37 |
*** mlavalle has quit IRC | 14:40 | |
mriedem | random drive by comment but https://review.opendev.org/#/c/586363/ | 14:40 |
*** eharney has quit IRC | 14:40 | |
mriedem | anyway related to ceph ci jobs? | 14:41 |
dansmith | I'm trying to figure out, but we are running a ceph df right before we report inventory | 14:41 |
* mriedem ducks back into hole | 14:41 | |
openstackgerrit | Alex Deiter proposed openstack/nova master: Detach is broken for multi-attached fs-based volumes https://review.opendev.org/741712 | 14:41 |
*** mlavalle has joined #openstack-nova | 14:43 | |
*** k_mouza has quit IRC | 14:43 | |
*** k_mouza has joined #openstack-nova | 14:53 | |
*** eharney has joined #openstack-nova | 14:53 | |
sean-k-mooney | dansmith: by the way, if the traceback is unrelated then we likely have another silent bug, as we are not catching the exception in the missing image case | 14:54 |
dansmith | sean-k-mooney: yep | 14:54 |
dansmith | so we're calling ceph df to get the total size of the pool and reporting that | 14:55 |
sean-k-mooney | dansmith: I think you are right that it's unrelated | 14:55 |
dansmith | as best I can tell, the ceph is backed by a 24G partition | 14:55 |
*** udesale_ has quit IRC | 14:55 | |
dansmith | so I dunno where the 10G is coming from | 14:55 |
sean-k-mooney | this is using the ceph image backend in nova so the local_GB should be the ceph pool size right | 14:56 |
dansmith | well, it should be yes | 14:56 |
dansmith | ceph has 24G, so I'm trying to find where our images pool would be limited to 10G but not seeing it | 14:56 |
dansmith | one thing that might explain this, | 14:56 |
dansmith | is that our normal ceph job was using qcow on rbd, which is not what you're supposed to do, | 14:57 |
sean-k-mooney | oh ya because we have to flatten it | 14:57 |
sean-k-mooney | it should be raw | 14:57 |
sean-k-mooney | to get the cow optimization | 14:57 |
dansmith | and so we convert the image to raw, which is 44M per image instead of 12 or something.. although we shouldn't really be using that much space, so... hmm | 14:57 |
dansmith | and this is just placement saying we're out of space, not ceph | 14:57 |
dansmith | I wonder if glance is incorrectly determining the size of the new image after it flattens or something | 14:58 |
sean-k-mooney | well, after the first image import it's all COW clones in ceph, right? | 14:58 |
dansmith | and telling us we need a lot more than we do or something | 14:58 |
dansmith | right | 14:58 |
dansmith | check this out: Jul 24 12:44:22.270293 ubuntu-bionic-ovh-bhs1-0018770257 nova-scheduler[55176]: WARNING nova.scheduler.host_manager [None req-eeb6d563-2483-4e4f-91e8-2dc3a694ade4 tempest-MultipleCreateTestJSON-968818181 tempest-MultipleCreateTestJSON-968818181] Host ubuntu-bionic-ovh-bhs1-0018770257 has more disk space than database expected (8 GB > 1 GB) | 15:01 |
sean-k-mooney | reserved_host_disk_mb is 0 too | 15:01 |
sean-k-mooney | that is strange, do we have the HostState update enabled? | 15:02 |
*** k_mouza has quit IRC | 15:03 | |
sean-k-mooney | I'm pretty sure we do | 15:03 |
*** eharney has quit IRC | 15:03 | |
sean-k-mooney | ya we do | 15:03 |
*** derekh has quit IRC | 15:03 | |
sean-k-mooney | disk_allocation_ratio=1.0,disk_available_least=8,free_disk_gb=10,f | 15:04 |
dansmith | we're only asking placement for DISK_GB=1 allocation so I don't think we're getting a bad number from glance or anything | 15:04 |
sean-k-mooney | what do our flavors look like? | 15:06 |
sean-k-mooney | actually, no, never mind | 15:06 |
sean-k-mooney | this is not bfv | 15:06 |
sean-k-mooney | the flavor should be either 1 or 2GB per instance i think | 15:06 |
*** jsuchome has quit IRC | 15:07 | |
dansmith | and it seems like 1 since we're asking for that size allocation | 15:07 |
sean-k-mooney | ya it's based on the image size https://github.com/openstack/devstack/blob/2ecd1823850ae0e00ad0ecebbbceb312be60ccf4/lib/tempest#L204-L206 | 15:09 |
sean-k-mooney | so for cirros image it will be 1g | 15:09 |
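If the flavor disk really is derived from the image size, a sketch of the rounding shows why a small cirros image gives a 1G root disk. The "round up to whole GiB" rule is an assumption here (check the linked devstack lines for the real logic), and `flavor_disk_gib` is hypothetical:

```python
import math

GIB = 1024 ** 3


def flavor_disk_gib(image_size_bytes):
    """Round an image size up to whole GiB (assumed devstack behaviour)."""
    return int(math.ceil(image_size_bytes / GIB))


# A cirros image is on the order of tens of MiB, so this yields 1.
print(flavor_disk_gib(16 * 1024 * 1024))  # 1
```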
dansmith | sudo ceph -c /etc/ceph/ceph.conf osd pool create vms 8 8 | 15:10 |
dansmith | that's 8G for the vms pool | 15:10 |
dansmith | I dunno where we're getting 10G | 15:10 |
sean-k-mooney | i dont think that is the size | 15:10 |
sean-k-mooney | I think that is the number of buckets to shard it into | 15:10 |
sean-k-mooney | let me check | 15:10 |
dansmith | hmm, okay it seems like size | 15:11 |
sean-k-mooney | I think it's the placement groups but it's been a while | 15:11 |
dansmith | okay yeah, maybe you're right | 15:11 |
sean-k-mooney | ceph osd pool create <pool-name> <pg-num> <pgp-num> [replicated] \ | 15:12 |
sean-k-mooney | [crush-ruleset-name] [expected-num-objects] | 15:12 |
sean-k-mooney | so ya, it's not the size | 15:12 |
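Per that usage line, the two 8s in `osd pool create vms 8 8` are placement-group counts, not gigabytes. A tiny sketch that labels the positional arguments (hypothetical helper; it only covers the first three positionals from the usage text and ignores the optional ones):

```python
def parse_pool_create(args):
    """Label the leading positionals of `ceph osd pool create`.

    Usage: <pool-name> <pg-num> <pgp-num> [...optional args ignored here]
    """
    names = ["pool_name", "pg_num", "pgp_num"]
    parsed = dict(zip(names, args))
    for key in ("pg_num", "pgp_num"):
        if key in parsed:
            parsed[key] = int(parsed[key])  # counts of placement groups
    return parsed


print(parse_pool_create(["vms", "8", "8"]))
# {'pool_name': 'vms', 'pg_num': 8, 'pgp_num': 8}
```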
dansmith | yeah | 15:13 |
dansmith | I still dunno where we're getting 10G, | 15:13 |
sean-k-mooney | same | 15:13 |
*** ociuhandu has quit IRC | 15:14 | |
dansmith | because CEPH_LOOPBACK_DISK_SIZE=24G | 15:14 |
sean-k-mooney | so it should be 8 https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/settings#L17 | 15:14 |
sean-k-mooney | by default | 15:15 |
dansmith | it's overridden in our job somewhere | 15:15 |
dansmith | you can see in the devstacklog | 15:15 |
sean-k-mooney | CEPH_LOOPBACK_DISK_SIZE is | 15:15 |
sean-k-mooney | is VOLUME_BACKING_FILE_SIZE | 15:15 |
dansmith | yes | 15:15 |
dansmith | both are | 15:15 |
sean-k-mooney | ok cool | 15:15 |
dansmith | VOLUME_BACKING_FILE_SIZE=24G | 15:16 |
dansmith | and the df shows 24G on /var/lib/ceph | 15:16 |
sean-k-mooney | ah yes it does | 15:17 |
*** eharney has joined #openstack-nova | 15:17 | |
dansmith | we run "ceph df" to get the DISK_GB we report, | 15:17 |
dansmith | and don't really do much to it, | 15:17 |
dansmith | so it really seems like we're being told 10G | 15:18 |
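A minimal sketch of turning `ceph df` output into DISK_GB, assuming `ceph df --format=json` has a top-level `stats` object with `total_bytes` and a `pools` list whose entries carry per-pool `stats` with `max_avail` and `bytes_used` (the fields melwitt maps to total/free/used in rbd_utils.py further down). The helper name and sample numbers are hypothetical, chosen to match the 10G this CI node reported; this is not nova's actual code:

```python
import json

GIB = 1024 ** 3


def pool_disk_info(ceph_df_json, pool_name):
    """Map `ceph df` JSON to total/free/used bytes (hypothetical helper).

    total comes from the cluster-wide stats; free and used from the pool.
    """
    stats = json.loads(ceph_df_json)
    for pool in stats["pools"]:
        if pool["name"] == pool_name:
            pool_stats = pool["stats"]
            break
    else:
        raise LookupError("pool %s not found" % pool_name)
    return {
        "total": stats["stats"]["total_bytes"],
        "free": pool_stats["max_avail"],
        "used": pool_stats["bytes_used"],
    }


# Made-up sample shaped like the CI node: a 10 GiB OSD, empty "vms" pool.
sample = json.dumps({
    "stats": {"total_bytes": 10 * GIB},
    "pools": [{"name": "vms", "stats": {"max_avail": 9 * GIB, "bytes_used": 0}}],
})
info = pool_disk_info(sample, "vms")
print(info["total"] // GIB)  # 10, which is what ends up as the DISK_GB total
```

Since the total is passed through almost untouched, a 10 GiB answer from ceph becomes a 10 GB inventory in placement.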
*** k_mouza has joined #openstack-nova | 15:19 | |
dansmith | lyarwood: do you know anything about what ceph df may be telling us about total pool size that differs from the backing store's size? | 15:19 |
lyarwood | dansmith: nope, AFAIK it just reports the size of the images_rbd_pool | 15:20 |
* lyarwood looks | 15:20 | |
dansmith | seems straightforward :) | 15:21 |
lyarwood | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/storage/rbd_utils.py#L374-L382 - ah well melwitt has a handy comment here that might help | 15:21 |
lyarwood | gah, highlights are a few lines off but you get the point | 15:21 |
dansmith | oh I read that, | 15:21 |
dansmith | but didn't grok until now | 15:22 |
dansmith | so replication makes the thing look smaller I guess? | 15:22 |
dansmith | seems weird to go from 24G to 10G, as that's not an even factor | 15:22 |
-openstackstatus- NOTICE: We are renaming projects in Gerrit and review.opendev.org will experience a short outage. Thank you for your patience. | 15:22 | |
dansmith | er, no wait | 15:23 |
dansmith | that's for max_avail, which is "free" not total right? | 15:23 |
*** k_mouza has quit IRC | 15:23 | |
lyarwood | right sorry and you're seeing 10 reported as the total capacity right? | 15:24 |
dansmith | correct | 15:24 |
lyarwood | kk sorry then that isn't it | 15:24 |
*** dklyle has joined #openstack-nova | 15:25 | |
*** maciejjozefczyk has quit IRC | 15:26 | |
dansmith | I guess one thing we could do is increase the ceph backing size to 36G and see if DISK_GB goes up | 15:26 |
bauzas | can someone tell me what the fuck is ? http://paste.openstack.org/show/796292/ | 15:26 |
bauzas | tl;dr: ssh: connect to host review.openstack.org port 29418: Network is unreachable | 15:26 |
bauzas | have I missed a memo ? | 15:27 |
melwitt | there's an openstackstatus notice above ^ that said there will be a short outage | 15:27 |
openstackgerrit | Sylvain Bauza proposed openstack/nova-specs master: WIP: Offline Reshape tool spec https://review.opendev.org/742908 | 15:29 |
bauzas | yay, it worked | 15:29 |
bauzas | melwitt: thanks | 15:29 |
bauzas | calling it a day | 15:29 |
*** k_mouza has joined #openstack-nova | 15:32 | |
melwitt | dansmith: MAX_AVAIL should be total actually, just taking number of replicas into account. if you only have 1 replica (default NUM_REPLICAS=1) then MAX_AVAIL should match whatever total says in 'ceph df' | 15:32 |
dansmith | melwitt: you're reporting free as max_avail though in that thing aren't you? | 15:32 |
dansmith | or does MAX_AVAIL != max_avail ? | 15:32 |
melwitt | but if you've set NUM_REPLICAS=2 when you deployed a devstack, then since the devstack ceph plugin creates 2 OSDs on the same HDD in that case, it would be 2x the real disk | 15:32 |
melwitt | no MAX_AVAIL is a ceph thing | 15:33 |
melwitt | (if you're referring to what is written about ceph df in rbd_utils.py) | 15:33 |
dansmith | you mean half the disk I assume | 15:33 |
dansmith | yeah | 15:33 |
melwitt | no, like the old behavior used to report 20G if you had a 10G disk, if you had created 2 OSDs that point at the same HDD | 15:33 |
dansmith | so maybe (24 - overhead) / 2 == 10 or something | 15:34 |
melwitt | you're using NUM_REPLICAS=1 right? you didn't set it in the job | 15:34 |
melwitt | if so, there shouldn't be a difference | 15:34 |
dansmith | I'm not setting it, but let me look if it's getting set | 15:34 |
melwitt | I doubt it, I've never seen it set in CI before. I had to set it locally to do the testing for that MAX_AVAIL change | 15:35 |
dansmith | yeah I don't even see that variable anywhere | 15:35 |
dansmith | is that a devstack-plugin-ceph thing? | 15:35 |
melwitt | yeah sec | 15:35 |
lyarwood | dansmith: https://docs.ceph.com/docs/jewel/rados/operations/pools/#create-a-pool ; sudo ceph -c /etc/ceph/ceph.conf osd pool create vms 8 8 ; that doesn't mean create a 8GB pool | 15:36 |
melwitt | bah sorry it's CEPH_REPLICAS https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L109 | 15:36 |
*** k_mouza has quit IRC | 15:36 | |
dansmith | lyarwood: yeah we established that :) | 15:36 |
lyarwood | ah sorry wasn't watching irc | 15:36 |
dansmith | lyarwood: somewhere in the plugin I saw a comment that made it sound like that was size | 15:36 |
melwitt | 10G honestly I would have thought is just the cloud image's disk size, no? | 15:37 |
dansmith | melwitt: yeah 1 | 15:37 |
melwitt | or do we probably use something larger in CI | 15:37 |
dansmith | melwitt: no, said above, it's 24G | 15:37 |
dansmith | melwitt: https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/df.txt | 15:37 |
dansmith | and it's overridden to 24G in the devstack log | 15:37 |
lyarwood | that's total for the three different pools | 15:37 |
*** k_mouza has joined #openstack-nova | 15:37 | |
lyarwood | vms images and volumes? | 15:37 |
melwitt | oh I see | 15:38 |
dansmith | lyarwood: and are the pools set to something specific for size? that's what we're trying to find and can't :) | 15:38 |
dansmith | lyarwood: the way it looks now I'd assume it just reports that they're all 24G in size, with various amounts free like zfs does for filesystems on a pool, | 15:38 |
dansmith | but I'm just guessing | 15:38 |
dansmith | I'm stacking a ceph devstack so I can poke but right now all I have is logs | 15:38 |
dansmith | if total decreases as we use space, then we're not really reporting the right thing to placement | 15:39 |
dansmith | which could be part of the problem of course | 15:39 |
lyarwood | dansmith: yup true, and that's also going to bounce around a lot during a tempest run | 15:40 |
dansmith | yep | 15:40 |
dansmith | I'm pretty sure this is not a consequence of my job, by the way, I think mine is just a little slower because we have some glance features turned on, so we probably have a little more of a logjam than normal | 15:40 |
*** xek_ has quit IRC | 15:41 | |
dansmith | oh jeez, you know what I just realized? | 15:41 |
dansmith | we might be snapshotting to the file store and not the ceph store in some cases, actually | 15:41 |
dansmith | hmm | 15:41 |
dansmith | nova does the snapshots itself so maybe not, but if we ever do a raw image upload.. the default store is the file store | 15:41 |
*** gyee has joined #openstack-nova | 15:42 | |
dansmith | not that that would cause this, but it might be changing the timing characteristics | 15:42 |
*** k_mouza has quit IRC | 15:42 | |
dansmith | I'll have to think on that a bit | 15:42 |
melwitt | well, this doesn't look promising for MAX_AVAIL, it sounds like it would decrease with use and is not a total https://access.redhat.com/solutions/3537961 | 15:42 |
dansmith | ah yeah | 15:43 |
dansmith | melwitt: did you read this? https://access.redhat.com/solutions/2273951 | 15:43 |
dansmith | we're not replicated I guess so maybe that doesn't affect us in CI, but probably has some impact for real users of this | 15:44 |
melwitt | no | 15:44 |
melwitt | so there are multiple reasons MAX_AVAIL shouldn't be used :( | 15:45 |
*** bnemec is now known as beekneemech | 15:46 | |
dansmith | not it! | 15:46 |
*** k_mouza has joined #openstack-nova | 15:47 | |
dansmith | the other problem I'm guessing, | 15:47 |
melwitt | yeah... I'm thinking whether to revert that or tweak it to take total and divide by pool size, the latter would do what was actually desired and report total with replication considered | 15:47 |
dansmith | is that if we report the real actual total (even minus replication overhead), but other pools can consume space from the same store, | 15:48 |
dansmith | we will tell placement we have more room than it can allocate | 15:48 |
dansmith | so really we need to sum up all the pools on the same store, and then set reserved= for any space they use I guess, but then we race with those other uses in our reporting | 15:48 |
dansmith | and could go negative | 15:48 |
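The idea of summing the pools that share one store and reserving what the others use can be sketched like this. The helper name, GB granularity, and pool sizes in the example are hypothetical; the reserved value is only a snapshot, which is exactly the race described above:

```python
def disk_gb_inventory(store_total_gb, used_gb_by_pool, our_pool):
    """Hypothetical DISK_GB inventory for one pool on a shared ceph store.

    total is the whole store; reserved is what the *other* pools use
    right now, so placement won't hand that space out to instances.
    The value is a snapshot: other pools keep writing after we report.
    """
    others = sum(gb for name, gb in used_gb_by_pool.items() if name != our_pool)
    # Clamp so reserved never exceeds total.
    reserved = min(others, store_total_gb)
    return {"total": store_total_gb, "reserved": reserved}


print(disk_gb_inventory(24, {"vms": 3, "images": 5, "volumes": 2}, "vms"))
# {'total': 24, 'reserved': 7}
```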
melwitt | yeah, I'm trying to remember, I could have sworn this get_pool_info was only used to report free space, not total space, but I could be totally making that up | 15:49 |
melwitt | or that that's what it's used for ultimately in higher layers | 15:49 |
melwitt | let me look up what "total" used to be, maybe it meant "total available" | 15:50 |
melwitt | no, looks like it was total. had total, total used, and total available | 15:51 |
*** k_mouza has quit IRC | 15:51 | |
sean-k-mooney | dansmith: one thing that I just thought of | 15:52 |
*** dtantsur is now known as dtantsur|afk | 15:53 | |
sean-k-mooney | by default replicated pools have a replication factor of 3 | 15:53 |
sean-k-mooney | so if we have 24G of space we would only have 8 useable | 15:53 |
melwitt | but looking at the clip again https://github.com/openstack/nova/blob/master/nova/virt/libvirt/storage/rbd_utils.py#L374-L382 I did parse out total_bytes to go with 'total', max_avail to go with 'free', and bytes_used to go with 'used'. so this should be fine.... | 15:53 |
dansmith | I'm confused about whether we're replicating or not | 15:53 |
dansmith | sean-k-mooney: ^ | 15:53 |
sean-k-mooney | that is the default unless we create an erasure-coded pool | 15:53 |
dansmith | and even still, 24/3==10 only for very small values of 3 :P | 15:53 |
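The replica arithmetic being argued here, as a quick sketch that ignores any ceph overhead (`usable_gb` is a hypothetical helper). Neither ceph's default of 3 replicas nor the CEPH_REPLICAS=1 that melwitt expects in CI turns 24G into 10G:

```python
def usable_gb(raw_gb, replicas):
    """Usable capacity of a replicated pool: raw space / replica count."""
    return raw_gb / replicas


print(usable_gb(24, 3))  # 8.0 with ceph's default replication factor of 3
print(usable_gb(24, 1))  # 24.0 with a single replica
```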
dansmith | hmm, okay what is CEPH_REPLICAS then? | 15:54 |
melwitt | that's the number of replicas for when it creates the pools | 15:54 |
sean-k-mooney | well we have 24G for ceph but we have multiple pools right? | 15:54 |
*** k_mouza has joined #openstack-nova | 15:54 | |
sean-k-mooney | the images pool will also be using that | 15:54 |
dansmith | right, vms and images | 15:55 |
sean-k-mooney | https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L109 | 15:56 |
sean-k-mooney | its 1 | 15:56 |
sean-k-mooney | CEPH_REPLICAS | 15:57 |
sean-k-mooney | which for CI makes sense | 15:57 |
dansmith | right, I think we established that earlier :) | 15:57 |
melwitt | yeah, I was saying earlier I've never seen CI use anything other than the default of 1 | 15:57 |
sean-k-mooney | well we don't need to test anything else in our CI since we are not really testing ceph | 15:58 |
dansmith | https://pastebin.com/6gcGhTHQ | 15:58 |
sean-k-mooney | just ceph integration with other things | 15:58 |
dansmith | this is what my ceph df shows on a clean devstack | 15:58 |
*** gibi is now known as gibi_pto | 15:58 | |
dansmith | interestingly I didn't update my backing size from 8 to 24, but still got 24 | 15:58 |
gibi_pto | so I'm going away for a week. I will be back on 3rd of Aug | 15:58 |
dansmith | gibi_pto: o/ | 15:59 |
*** k_mouza has quit IRC | 15:59 | |
gibi_pto | o/ | 15:59 |
lyarwood | \o | 15:59 |
sean-k-mooney | dansmith: i think VOLUME_BACKING_FILE_SIZE is a devstack setting | 15:59 |
dansmith | oh, I see, and ceph plugin uses that, gotcha | 16:00 |
sean-k-mooney | yes https://github.com/openstack/devstack/blob/e0d06adffcf4c8da1aefebc66f2de9a440badbf6/stackrc#L766 | 16:00 |
sean-k-mooney | and devstack defaults it to 24 | 16:00 |
sean-k-mooney | so that is where that is coming from | 16:00 |
sean-k-mooney | that was originally for cinder | 16:01 |
sean-k-mooney | oh | 16:03 |
sean-k-mooney | https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/ceph/ceph_log.txt | 16:03 |
sean-k-mooney | pgmap v5: 0 pgs: ; 0 B data, 704 KiB used, 9.0 GiB / 10 GiB avail | 16:03 |
sean-k-mooney | so ceph does think it has only 10G | 16:03 |
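To pull the numbers out of a pgmap line like the one above (or dansmith's local `23.8GiB / 24.0GiB avail`, which has no space before the unit), a small regex is enough. This is a sketch for eyeballing logs; the unit table only covers the binary suffixes seen in these lines:

```python
import re

# Binary size suffixes that show up in these ceph log lines.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4}
# Matches "<avail> <unit> / <total> <unit> avail"; space before the unit is optional.
AVAIL_RE = re.compile(r"([\d.]+)\s*([KMGT]?i?B) / ([\d.]+)\s*([KMGT]?i?B) avail")


def pgmap_avail(line):
    """Return (avail_bytes, total_bytes) from a ceph pgmap log line."""
    m = AVAIL_RE.search(line)
    if m is None:
        raise ValueError("no '<x> / <y> avail' field in line")
    avail = float(m.group(1)) * UNITS[m.group(2)]
    total = float(m.group(3)) * UNITS[m.group(4)]
    return avail, total


ci = "pgmap v5: 0 pgs: ; 0 B data, 704 KiB used, 9.0 GiB / 10 GiB avail"
local = "pgmap v5: 0 pgs: ; 0B data, 188MiB used, 23.8GiB / 24.0GiB avail"
print(pgmap_avail(ci)[1] / 1024 ** 3)     # 10.0
print(pgmap_avail(local)[1] / 1024 ** 3)  # 24.0
```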
dansmith | oh nice, but from where? | 16:03 |
*** k_mouza has joined #openstack-nova | 16:04 | |
sean-k-mooney | there is a ceph folder at the root of the controller logs | 16:04 |
dansmith | from my devstack: 2020-07-24 08:25:52.400945 mgr.x client.14099 192.168.201.41:0/3299763660 2 : cluster [DBG] pgmap v5: 0 pgs: ; 0B data, 188MiB used, 23.8GiB / 24.0GiB avail | 16:04 |
dansmith | no I mean where is it getting the 10G | 16:04 |
sean-k-mooney | I'm wondering if we are using the filestore backend and didn't resize the filesystem or something? | 16:04 |
sean-k-mooney | although df on the host shows 24G, right? | 16:05 |
dansmith | it does, | 16:05 |
sean-k-mooney | is that the block device size or filesystem | 16:05 |
dansmith | and my local devstack shows the 24G | 16:05 |
dansmith | filesystem | 16:05 |
dansmith | doesn't look like we grab the ceph configs | 16:06 |
sean-k-mooney | https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/ceph/ceph-osd.0_log.txt#10 | 16:06 |
sean-k-mooney | so it's using filestore | 16:06 |
sean-k-mooney | but why is that 10G | 16:06 |
*** songwenping_ has joined #openstack-nova | 16:06 | |
sean-k-mooney | oh, it's using bluestore, not filestore | 16:06 |
sean-k-mooney | but same question | 16:06 |
dansmith | you mean files in /var/lib/ceph right? | 16:07 |
sean-k-mooney | bluestore(/var/lib/ceph/osd/ceph-0) _setup_block_symlink_or_file resized block file to 10 GiB | 16:07 |
sean-k-mooney | it's not using the mount | 16:07 |
dansmith | not using the mount for what? | 16:07 |
sean-k-mooney | it's creating a ceph-0 file inside it, I think | 16:07 |
dansmith | sure, that's inside the mount | 16:07 |
dansmith | it's creating a 10G flat file right? | 16:08 |
sean-k-mooney | i think so | 16:08 |
*** k_mouza has quit IRC | 16:08 | |
sean-k-mooney | that is then being used for the osd | 16:08 |
dansmith | right | 16:08 |
sean-k-mooney | so we are creating a 24G flat file and attaching it as a loopback device, then mounting it on /mnt/ceph | 16:09 |
sean-k-mooney | sorry | 16:09 |
sean-k-mooney | /var/lib/ceph | 16:09 |
dansmith | right | 16:09 |
sean-k-mooney | then inside that they are creating another flat file | 16:09 |
dansmith | and then it's creating a file called block inside there as the actual thing the osd uses | 16:09 |
sean-k-mooney | and using that for the osd | 16:09 |
sean-k-mooney | yep | 16:10 |
sean-k-mooney | so this is wrong | 16:10 |
dansmith | and that thing is 10G | 16:10 |
*** songwenping__ has quit IRC | 16:10 | |
sean-k-mooney | I think we are expecting them to use the /var/lib/ceph mount directly for the OSD | 16:10 |
sean-k-mooney | I suspect this behavior changed when we changed from the filestore to the bluestore backend | 16:10 |
sean-k-mooney | we should mount the loopback device at /var/lib/ceph/osd/ceph-0/block | 16:11 |
sean-k-mooney | instead, that way it would have the full 24G | 16:11 |
dansmith | I dunno what "directly" means.. they still have to store their data in their special format right? | 16:11 |
dansmith | and it's normally a raw disk they want, so if we give them a filesystem they need to create a flat file to emulate the block device on no? | 16:11 |
dansmith | fwiw, I don't have a block file (yet) and mine is reporting 24G | 16:12 |
dansmith | so i dunno why it's different | 16:12 |
dansmith | ah, my osd0.log: | 16:13 |
dansmith | xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf | 16:13 |
dansmith | so that's different than bluestore I guess you're saying? | 16:13 |
sean-k-mooney | I think if we just left /var/lib/ceph mounted under / as part of the root filesystem and moved where we mount the 24G loopback device file to /var/lib/ceph/osd/ceph-0/block, ceph would have all 24G | 16:13 |
sean-k-mooney | dansmith: yes, that is the filestore backend | 16:13 |
sean-k-mooney | that one needs a folder to use | 16:13 |
dansmith | ah, CI is using the nautilus version of ceph, I'm on luminous | 16:13 |
sean-k-mooney | luminous is the default in the devstack plugin ya | 16:14 |
dansmith | but CI is using nautilus | 16:14 |
sean-k-mooney | but CI, I guess, is overriding it | 16:14 |
dansmith | ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable), process ceph-osd, pid 8952 | 16:14 |
sean-k-mooney | ya | 16:14 |
dansmith | can we set the backing store driver? | 16:14 |
dansmith | back to xfs? | 16:14 |
sean-k-mooney | we could but i think the better solution is to change how we do the mounting | 16:15 |
sean-k-mooney | bluestore is not the default | 16:15 |
sean-k-mooney | upstream | 16:15 |
sean-k-mooney | in ceph and downstream as of OSP 16 | 16:15 |
sean-k-mooney | so its nice to test with bluestore | 16:15 |
*** k_mouza has joined #openstack-nova | 16:15 | |
dansmith | unless you see where we're setting it to bluestore, it would seem maybe the default changed in nautilus? | 16:15 |
sean-k-mooney | well, after luminous in any case, but yes, I don't think we currently set it directly | 16:16 |
dansmith | you said "bluestore is not the default" above | 16:16 |
dansmith | so I'm confused about what you're proposing | 16:17 |
sean-k-mooney | oh, I meant "is" | 16:17 |
sean-k-mooney | it is now the default in ceph | 16:17 |
*** k_mouza has quit IRC | 16:17 | |
sean-k-mooney | filestore used to be the default before | 16:17 |
dansmith | okay that's what I was saying | 16:18 |
*** k_mouza has joined #openstack-nova | 16:18 | |
dansmith | I still don't get where the 10G comes from, other than that something is clearly different with blue vs xfs stores | 16:18 |
sean-k-mooney | bluestore has been the default for a few releases now. filestore is deprecated upstream, and downstream in OSP | 16:18 |
sean-k-mooney | dansmith: i think that is the default size that the ceph tool uses | 16:19 |
sean-k-mooney | when it's creating a backing file | 16:19 |
dansmith | okay I don't see that anywhere | 16:19 |
sean-k-mooney | it's being created by the ceph OSD itself here https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/ceph/ceph-osd.0_log.txt#4 | 16:20 |
dansmith | I imagine that keeping the loopback mount for var lib ceph is ideal for the plugin as long as we have stable branches that use that | 16:20 |
dansmith | sean-k-mooney: yeah I get that :) | 16:20 |
dansmith | sean-k-mooney: I'm saying I don't know where 10G is set or assumed or whatever ;) | 16:20 |
sean-k-mooney | yes, we can probably change that in the job? | 16:21 |
melwitt | dansmith: is it not here? https://zuul.opendev.org/t/openstack/build/13d8a055ff1b4be0b627205f4d51d50f/log/controller/logs/ceph/ceph-osd.0_log.txt#18 | 16:21 |
dansmith | lol | 16:21 |
dansmith | yes, I understand 10G is being used | 16:21 |
sean-k-mooney | or if the devstack plugin is branched we can change it only on the branches that use nautilus | 16:21 |
dansmith | I'm saying I don't see a config for that | 16:21 |
dansmith | https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/ | 16:21 |
sean-k-mooney | if there is a config it would be the ceph config file | 16:21 |
melwitt | oh, sorry, just saying that's a command that is setting 10G deliberately | 16:22 |
dansmith | melwitt: yep I think that's understood now | 16:22 |
melwitt | from what sean-k-mooney was saying, I thought no one saw a deliberate setting of it yet | 16:22 |
melwitt | that it was happening "automatically" | 16:22 |
melwitt | I am caught up now | 16:22 |
dansmith | all I'm saying is, I imagine that bluestore can have more than 10G of backing store, and if it's not basing that on actual disk free space, it's probably a config somewhere or something :) | 16:23 |
melwitt | yeah, I understand now | 16:23 |
melwitt | I misunderstood what sean was saying earlier | 16:24 |
dansmith | the bluestore config actually seems to be mostly focused on using physical devices | 16:25 |
*** songwenping__ has joined #openstack-nova | 16:25 | |
sean-k-mooney | melwitt: it happens frequently that people misunderstand me. well, not that often if I type what I meant to type, but I am bad at not doing that | 16:27 |
*** songwenping_ has quit IRC | 16:27 | |
*** nightmare_unreal has quit IRC | 16:27 | |
melwitt | sean-k-mooney: eh, I often have trouble understanding people so by our powers combined... ! | 16:28 |
dansmith | I HAVE NO IDEA WHAT YOU PEOPLE ARE SAYING | 16:30 |
melwitt | I AM GOOD AT DEALING WITH PEOPLE | 16:31 |
* stephenfin decides this is too much weirdness and bails | 16:31 | |
sean-k-mooney | stephenfin: talking about storage does this to people | 16:31 |
sean-k-mooney | just taking a step back | 16:32 |
sean-k-mooney | we are happy we know where the 10G size is coming from now | 16:32 |
melwitt | StOrAGE | 16:32 |
*** eharney has quit IRC | 16:32 | |
sean-k-mooney | and that we are probably just hitting a real NoValidHost error because we are actually providing 10G to ceph instead of 24 | 16:33 |
sean-k-mooney | so the ceph job failures are not related to dansmith's recent changes to the job | 16:33 |
sean-k-mooney | yes? | 16:33 |
*** ociuhandu has joined #openstack-nova | 16:34 | |
dansmith | sean-k-mooney: yeah I thought we were assuming that | 16:34 |
dansmith | because it's clearly just placement | 16:34 |
dansmith | my job might be slower (or faster) causing us to hit it more than we were or something | 16:34 |
sean-k-mooney | so we either a.) swap back to filestore to get the old behavior or b.) mount our loopback file in such a way that the bluestore block device uses our 24G loopback device instead of creating its own | 16:35 |
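Both routes could be expressed as ceph.conf settings rather than mount changes. A sketch that renders an `[osd]` section with Python's configparser, assuming the ceph options `osd objectstore` (to pin filestore, option a) and `bluestore block size` (to grow the file bluestore creates for itself, an alternative to re-plumbing the mount). Both option names should be checked against the ceph docs for the release in use, whether devstack-plugin-ceph would write them before `ceph-osd --mkfs` runs is an assumption, and `osd_conf_snippet` is hypothetical:

```python
import configparser
import io


def osd_conf_snippet(objectstore=None, bluestore_block_size=None):
    """Render an [osd] ceph.conf section (hypothetical helper)."""
    conf = configparser.ConfigParser()
    conf["osd"] = {}
    if objectstore is not None:
        # Option a: pin the OSD backend, e.g. "filestore".
        conf["osd"]["osd objectstore"] = objectstore
    if bluestore_block_size is not None:
        # Alternative: size in bytes of the block file bluestore creates.
        conf["osd"]["bluestore block size"] = str(bluestore_block_size)
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


print(osd_conf_snippet(objectstore="filestore"))
print(osd_conf_snippet(bluestore_block_size=20 * 1024 ** 3))
```

The second form leaves some headroom inside the 24G loopback filesystem rather than sizing the block file to fill it exactly.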
dansmith | yeah so I figured going back to xfs would be ideal for compatibility with everything | 16:35 |
dansmith | my system clearly gets xfs | 16:35 |
dansmith | I assume the workers are getting blue because they're newer ubuntu or something | 16:35 |
dansmith | I'm still on bionic | 16:36 |
sean-k-mooney | dansmith: sure, but eventually we will have to move, since I think filestore is deprecated in ceph | 16:36 |
dansmith | sure | 16:36 |
dansmith | pain now or pain later | 16:36 |
dansmith | pain later might be someone else's pain :P | 16:36 |
sean-k-mooney | so i guess what we are looking for is a ceph config option to select filestore for the osd backend | 16:36 |
sean-k-mooney | that or we set it on the osd create command | 16:37 |
dansmith | yeah | 16:38 |
sean-k-mooney | so this code https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L475-L484 | 16:38 |
*** ociuhandu has quit IRC | 16:39 | |
sean-k-mooney | that initial sudo ceph -c ${CEPH_CONF_FILE} osd create | 16:39 |
dansmith | well, the other option is to figure out how to make blue use 20ish G instead of 10, | 16:39 |
dansmith | which would be less impactful than retooling the mount stuff in the ceph plugin | 16:39 |
sean-k-mooney | well we are mounting it on /var/lib/ceph | 16:40 |
sean-k-mooney | so i guess this is already plugin specific | 16:40 |
sean-k-mooney | we are likely reusing the same function that is used for cinder and just passing the mount path | 16:40 |
sean-k-mooney | ya we are just calling create_disk https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L387 | 16:40 |
dansmith | cinder just wants a loop, not a mounted fs though, right? | 16:41 |
sean-k-mooney | maybe this is the function in devstack https://github.com/openstack/devstack/blob/eee60c76719c02c08dba7b7fb703798a056b22b9/functions#L758-L789 | 16:41 |
sean-k-mooney | that kind of looks like a hack | 16:42 |
sean-k-mooney | e.g. that does not look like it was created originally for ceph | 16:42 |
melwitt | hm, I found this https://forum.proxmox.com/threads/proxmox-ceph-osd-partition-created-with-only-10gb.55291/ | 16:43 |
sean-k-mooney | oh, it's for swift originally | 16:43 |
*** maciejjozefczyk has joined #openstack-nova | 16:44 | |
sean-k-mooney | i think the "sudo ceph-osd -c ${CEPH_CONF_FILE} -i ${OSD_ID} --mkfs" is the one we would need to modify | 16:46 |
sean-k-mooney | melwitt: that does seem like the same issue more or less | 16:49 |
melwitt | yeah, I'm having trouble understanding it | 16:49 |
melwitt | the last comment links to another post https://forum.proxmox.com/threads/where-can-i-tune-journal-size-of-ceph-bluestore.44000/ where they're talking about tuning journal size and bluestore_block_db_size and bluestore_block_wal_size | 16:50 |
melwitt | and I don't know what any of that is or means | 16:51 |
melwitt | (in ceph.conf) | 16:51 |
*** markvoelker has quit IRC | 16:52 | |
*** maciejjozefczyk has quit IRC | 16:59 | |
*** k_mouza has quit IRC | 16:59 | |
sean-k-mooney | those are not related to the data storage size of the OSD | 17:01 |
sean-k-mooney | bluestore has an embedded database that tracks where the logical blocks are located on disk | 17:01 |
sean-k-mooney | WAL, I think, is the write-ahead log or something like that | 17:02 |
sean-k-mooney | it's part of how it does write journaling | 17:02 |
sean-k-mooney | in both cases they are tuning parameters for how bluestore saves its metadata | 17:03 |
sean-k-mooney | unlike filestore, it can save it inline in the block device it is managing, or it can save it on external devices, and they support tuning the sizing of them independently | 17:03 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Handle Neutron errors in _post_live_migration() https://review.opendev.org/729763 | 17:04 |
melwitt | sean-k-mooney: found a new thing https://bugzilla.redhat.com/show_bug.cgi?id=1597048 | 17:09 |
openstack | bugzilla.redhat.com bug 1597048 in RADOS "ceph osd df not showing correct disk size and causing cluster to go to full state" [High,Closed: notabug] - Assigned to bhubbard | 17:09 |
dansmith | imagine that :) | 17:10 |
melwitt | what | 17:11 |
sean-k-mooney | it should be 3.7TB but is 10G | 17:11 |
dansmith | melwitt: "not showing correct disk size" | 17:11 |
melwitt | yeah? | 17:11 |
melwitt | I'm still googling for why bluestore is maxed out at 10G | 17:12 |
dansmith | melwitt: just saying, I think we've stumbled into a realization that our df reporting on ceph in libvirt es no bueno right? | 17:12 |
melwitt | no | 17:12 |
dansmith | oh did I miss something? I thought those RHN articles were indicating that we're reporting the wrong thing still | 17:13 |
sean-k-mooney | https://bugzilla.redhat.com/show_bug.cgi?id=1597048#c8 | 17:13 |
openstack | bugzilla.redhat.com bug 1597048 in RADOS "ceph osd df not showing correct disk size and causing cluster to go to full state" [High,Closed: notabug] - Assigned to bhubbard | 17:13 |
dansmith | like, if we're reporting the total size of the osd, but that's shared by vms and images, we'll be telling placement it can allocate all that space for instances but it can't | 17:14 |
sean-k-mooney | so it looks like they never actually got to the root cause of why the bluestore file was a 10G file | 17:14 |
sean-k-mooney | they just redeployed with filestore and ignored it | 17:14 |
*** songwenping_ has joined #openstack-nova | 17:14 | |
melwitt | yeah.. but then what does this mean? "The BlueStore block device was a file named with a block, not a symlink to block device partition of this disk and that file size was 10G hence it was showing the size of the OSD as 10G." | 17:15 |
sean-k-mooney | i think they meant | 17:15 |
sean-k-mooney | that instead of it being a symlink to /dev/sdX | 17:15 |
sean-k-mooney | it was a file named block | 17:15 |
sean-k-mooney | that was 10G | 17:15 |
sean-k-mooney | hence -rw-r--r--. 1 ceph ceph 10737418240 Jul 2 16:51 block | 17:16 |
melwitt | right.. so you think it's correct that it's pointing at a file named block? and that the problem is that the file is not larger than 10G? | 17:16 |
sean-k-mooney | they were expecting it to be a symlink to the actual hdd | 17:16 |
sean-k-mooney | well in that case yes | 17:16 |
sean-k-mooney | and also likely in our case | 17:16 |
melwitt | ok. from reading that I thought maybe it was pointing wrongly at a file | 17:17 |
*** songwenping__ has quit IRC | 17:17 | |
*** k_mouza has joined #openstack-nova | 17:17 | |
sean-k-mooney | /var/lib/ceph/osd/ceph-0/block is likely a 10G file | 17:17 |
dansmith | I think that bug is that they deployed on file instead of having the bluestore osd use the disk they wanted | 17:17 |
sean-k-mooney | dansmith: yes | 17:17 |
dansmith | we want file, they wanted disk, right? | 17:17 |
sean-k-mooney | yes | 17:18 |
melwitt | oh | 17:18 |
melwitt | ok so why is /var/lib/ceph/osd/ceph-0/block only 10G ... who creates it ... | 17:18 |
dansmith | right, that I think we still don't know.. where the 10G comes from and how we change it | 17:19 |
dansmith | because the osd itself (the driver) seems to create that as a flat 10G file if it's not there | 17:19 |
melwitt | yeah, at least before now I did not know that the 10G comes from the size of the file named "block" so now I'm gonna see if I can find where that file is created | 17:20 |
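To check which case an OSD is in, it is enough to see whether its `block` entry is a symlink to a device or a regular file, and how large that file is. The bash sketch below is a hypothetical helper (`describe_osd_block` is not part of Ceph or devstack-plugin-ceph); it defaults to the `/var/lib/ceph/osd/ceph-0/block` path discussed above and assumes GNU coreutils `stat` and `readlink`.

```shell
# Hypothetical diagnostic helper (not part of Ceph or devstack-plugin-ceph).
# Reports whether a BlueStore "block" entry is a symlink to a device or a
# plain file, and prints the plain file's size in bytes.
# Assumes GNU coreutils (stat -c, readlink -f).
describe_osd_block() {
    local block="${1:-/var/lib/ceph/osd/ceph-0/block}"
    if [ -L "$block" ]; then
        # Symlink case: BlueStore uses whatever device the link resolves to.
        echo "symlink -> $(readlink -f "$block")"
    elif [ -f "$block" ]; then
        # Plain-file case: the OSD's capacity is just this file's size.
        echo "file $(stat -c %s "$block") bytes"
    else
        echo "missing"
        return 1
    fi
}
```

On the system from the bugzilla report, given the `ls -l` output quoted above, this would be expected to print `file 10737418240 bytes`.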
*** k_mouza has quit IRC | 17:21 | |
*** k_mouza has joined #openstack-nova | 17:22 | |
*** k_mouza has quit IRC | 17:27 | |
melwitt | hm https://github.com/ceph/ceph/blob/master/src/common/legacy_config_opts.h#L940 | 17:28 |
*** hamalq has joined #openstack-nova | 17:29 | |
dansmith | lol | 17:29 |
melwitt | https://github.com/ceph/ceph/blob/8c1a077e560248760ac441f315b84304aa693e72/src/common/options.cc#L4122 | 17:29 |
dansmith | so maybe we're supposed to create that block file to be what we want it to be | 17:30 |
melwitt | looks like they changed the default to 100G at some point | 17:30 |
dansmith | definitely obscure though | 17:30 |
sean-k-mooney | dansmith: yes we are meant to create the file/partition first normally when deploying ceph | 17:30 |
melwitt | well I think you can set bluestore_block_size in ceph.conf no? | 17:30 |
melwitt | oh | 17:30 |
sean-k-mooney | um, likely | 17:31 |
sean-k-mooney | but ceph does not expect to have to create this normally | 17:31 |
melwitt | https://github.com/ceph/ceph/commit/57890fce7064811780823e298b31e7fced2fa0e3 | 17:31 |
sean-k-mooney | if you use the tooling they provide, they create the partitions ahead of time | 17:31 |
melwitt | that's more recent, change from 1 TB -> 100G default. but in older versions the default was 10G, trying to see when that was so we can compare with what version we're running | 17:32 |
sean-k-mooney | this is the function that actually creates the file https://github.com/ceph/ceph/blob/nautilus/src/os/bluestore/BlueStore.cc#L5934 | 17:33 |
melwitt | v15.1.0 is Octopus | 17:33 |
sean-k-mooney | if the block file is not present it creates it https://github.com/ceph/ceph/blob/nautilus/src/os/bluestore/BlueStore.cc#L5931-L5934 | 17:34 |
melwitt | and somehow 'size' is passed in from the config option I assume | 17:34 |
sean-k-mooney | that is what im currently trying to find yes | 17:34 |
sean-k-mooney | this maybe https://github.com/ceph/ceph/blob/nautilus/src/os/bluestore/BlueStore.cc#L5943-L5944 | 17:36 |
sean-k-mooney | ah no it's here | 17:38 |
sean-k-mooney | https://github.com/ceph/ceph/blob/nautilus/src/os/bluestore/BlueStore.cc#L6050-L6052 | 17:38 |
sean-k-mooney | so in the mkfs call | 17:38 |
sean-k-mooney | so when we do this | 17:39 |
sean-k-mooney | https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L482-L483 | 17:39 |
melwitt | ah yup, and it's pulling the conf option | 17:39 |
sean-k-mooney | it causes the mkfs function to be invoked on the backend store | 17:39 |
sean-k-mooney | which for bluestore uses that config option to create the 10G file | 17:39 |
melwitt | the interesting thing is, I wonder why it uses the legacy option and not the new one. I don't understand how that works in their code. cause in nautilus they have both the 10G and 100G default in the legacy conf vs the non-legacy one | 17:40 |
sean-k-mooney | if /var/lib/ceph/osd/ceph-0/block is not a symlink to a device | 17:40 |
sean-k-mooney | melwitt: it's legacy on master | 17:41 |
sean-k-mooney | it might not be on nautilus | 17:41 |
melwitt | oh, I thought you mentioned earlier that CI is using nautilus | 17:41 |
sean-k-mooney | actually it's also here on master https://github.com/ceph/ceph/blob/master/src/common/options.cc#L4127-L4131 | 17:41 |
sean-k-mooney | melwitt: yes it is | 17:42 |
sean-k-mooney | actually just above that | 17:42 |
sean-k-mooney | https://github.com/ceph/ceph/blob/master/src/common/options.cc#L4122-L4125 | 17:42 |
melwitt | yeah I'm saying it's weird that it's not defaulting to 100G like that is showing | 17:42 |
melwitt | the old default was 10G | 17:42 |
sean-k-mooney | yep | 17:43 |
sean-k-mooney | we are pulling 14.2.2 https://github.com/ceph/ceph/blob/v14.2.2/src/common/options.cc#L4339 | 17:43 |
sean-k-mooney | which is 10G | 17:44 |
sean-k-mooney | they backported the 100G change to nautilus | 17:44 |
sean-k-mooney | but it's not in the tag we are pulling | 17:44 |
sean-k-mooney | i think legacy_config_opts.h is just an old way to define config options | 17:45 |
melwitt | ohhh | 17:45 |
melwitt | good find. ok at least everything makes sense now | 17:45 |
sean-k-mooney | rather than deprecated, by the way | 17:45 |
sean-k-mooney | ya so i guess we just set that config option to say 20G? | 17:45 |
sean-k-mooney | in ceph.conf | 17:46 |
melwitt | yeah, seems like it | 17:46 |
sean-k-mooney | which we can do here https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L415-L428 | 17:46 |
melwitt | yarp. just have to double check whether it's a "global" or what. are those config groups or? | 17:47 |
sean-k-mooney | i like how this is basically undocumented other than in the source code | 17:47 |
sean-k-mooney | i think in global yes | 17:47 |
melwitt | yeah, I know. they have a bluestore config doc but zero mention of this https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref | 17:48 |
sean-k-mooney | iniset -sudo ${CEPH_CONF_FILE} global "bluestore_block_size" "20" | 17:48 |
sean-k-mooney | is that right? | 17:48 |
sean-k-mooney | i was searching for 10_G but i think _G is a user-defined suffix | 17:49 |
sean-k-mooney | so now i need to find that | 17:49 |
*** aj_mailing has joined #openstack-nova | 17:50 | |
sean-k-mooney | yep https://github.com/ceph/ceph/blob/8c1a077e560248760ac441f315b84304aa693e72/src/common/options.cc#L343-L345 | 17:50 |
melwitt | oh, is the unit GB or something else? | 17:51 |
sean-k-mooney | it's in bytes i think | 17:52 |
sean-k-mooney | 10_G is doing 10 << 32 | 17:52 |
sean-k-mooney | it's a c++11 user-defined literal https://en.cppreference.com/w/cpp/language/user_literal | 17:52 |
sean-k-mooney | actually it's << 30 not 32 | 17:53 |
sean-k-mooney | but ya still bytes | 17:53 |
sean-k-mooney | unsigned long long... i'm glad they also defined a better way to name integers in c++11 so you don't have to use that c way of naming types | 17:54 |
melwitt | ok so you can't just put "20" in the conf | 17:54 |
sean-k-mooney | i think we have to do 20<<30 | 17:54 |
sean-k-mooney | so 21474836480 | 17:55 |
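The shift arithmetic can be checked directly in bash. Per the `options.cc` lines linked above, Ceph's `_G` suffix is `<< 30`, so option values are byte counts and `10_G` is 10 GiB. `gib_to_bytes` below is a hypothetical helper that only reproduces that shift; it is not a Ceph or devstack function.

```shell
# Hypothetical helper mirroring Ceph's _G literal: N_G == N << 30 bytes.
gib_to_bytes() {
    echo $(( $1 << 30 ))
}

gib_to_bytes 10   # prints 10737418240, the size of the 10G "block" file
gib_to_bytes 20   # prints 21474836480, the value proposed for bluestore_block_size
```

The first result matches the `10737418240` byte size in the `ls -l` output quoted from the bugzilla report.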
melwitt | right | 17:55 |
*** tesseract has quit IRC | 17:59 | |
sean-k-mooney | i'll pretend they are not putting an unsigned long long into a size_t variant without asserting it fits | 17:59 |
melwitt | :) | 18:00 |
*** k_mouza has joined #openstack-nova | 18:00 | |
*** aj_mailing has quit IRC | 18:01 | |
*** aj_mailing has joined #openstack-nova | 18:02 | |
*** k_mouza has quit IRC | 18:05 | |
*** gmann is now known as gmann_lunch | 18:13 | |
dansmith | have ya'll fixed it yet? | 18:16 |
sean-k-mooney | i'm looking at a linux bridge issue from the neutron channel currently but it looks like we just need one more line here to set the config option https://github.com/openstack/devstack-plugin-ceph/blob/master/devstack/lib/ceph#L429 | 18:18 |
sean-k-mooney | dansmith: can you test it with your local setup? | 18:18 |
sean-k-mooney | just add iniset -sudo ${CEPH_CONF_FILE} global "bluestore_block_size" "21474836480" | 18:19 |
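ceph.conf is INI-style, so the `global` argument in that `iniset` line names the `[global]` section. The bash sketch below is not the devstack-plugin-ceph change itself (which uses devstack's `iniset`); it is a hypothetical helper, `write_bluestore_conf`, that writes the equivalent stanza to a scratch file and derives the byte value with `<< 30` instead of hard-coding it.

```shell
# Hypothetical sketch: write a [global] stanza setting bluestore_block_size
# to a scratch file, computing the byte value (GiB << 30) rather than
# hard-coding 21474836480. Overwrites the target file.
write_bluestore_conf() {
    local conf="$1" gib="$2"
    printf '[global]\nbluestore_block_size = %s\n' "$(( gib << 30 ))" > "$conf"
}
```

For example, `write_bluestore_conf /tmp/ceph-test.conf 20` produces a two-line file whose second line is `bluestore_block_size = 21474836480`.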
dansmith | yup | 18:19 |
dansmith | oh wait, I can't | 18:19 |
dansmith | because mine doesn't use blue | 18:19 |
dansmith | but I can float a patch and get jobs going | 18:19 |
sean-k-mooney | ya that works | 18:19 |
sean-k-mooney | i don't have a ceph env currently; i could set one up but it's almost half past 7 on a friday so i don't want to wait for it to stack :) | 18:20 |
*** ociuhandu has joined #openstack-nova | 18:22 | |
dansmith | sean-k-mooney: dude, you need to cut yourself off :) | 18:24 |
dansmith | https://review.opendev.org/#/c/742961/ | 18:25 |
*** ociuhandu has quit IRC | 18:27 | |
dansmith | I think the nova team needs to have the keys to sean-k-mooney's irc bouncer so we can turn it off when it's time for him to sleep | 18:27 |
sean-k-mooney | hehe i don't use one, i just don't turn my laptop off :P | 18:28 |
dansmith | like giving car keys to the bartender | 18:28 |
melwitt | sean-k-mooney laptop and dev box permanently ON | 18:28 |
dansmith | sean-k-mooney: well, then an ssh account to your laptop I guess | 18:29 |
sean-k-mooney | melwitt: yes they more or less are. | 18:29 |
dansmith | melwitt: more like sean-k-mooney permanently ON | 18:30 |
melwitt | man, what was I doing earlier | 18:30 |
melwitt | true | 18:30 |
dansmith | laptop sleep timer be like "jesus when is he going to go to bed, I'm exhausted" | 18:30 |
melwitt | haha yeah | 18:30 |
artom | dansmith, I think we'll need remote access to his fuse box... | 18:31 |
artom | First we'll need to invent an SSHable fuse box... | 18:31 |
* artom checks | 18:31 | |
dansmith | artom: he's a property owner now, so we can't go to the landlord | 18:31 |
dansmith | artom: no need to cut the power just reset his luks key | 18:32 |
* artom was half expecting remotable fuse boxes to exist, because IoT | 18:32 | |
artom | But then they'd have a crappy web UI with 'password' hardcoded as the admin password | 18:33 |
artom | So maybe not | 18:33 |
melwitt | yeah, I would not be surprised | 18:33 |
sean-k-mooney | dansmith: oh you mean resetting the luks key on my laptop? that would be a pain to fix | 18:34 |
dansmith | sean-k-mooney: no, we can reset it back to the one you know when you should be online | 18:34 |
sean-k-mooney | ah ok | 18:34 |
dansmith | hah | 18:35 |
sean-k-mooney | speaking of which o/ | 18:35 |
dansmith | good :) | 18:35 |
melwitt | wait, weren't you on pto today too? what the heck | 18:35 |
*** slaweq has joined #openstack-nova | 18:35 | |
sean-k-mooney | yesterday | 18:35 |
sean-k-mooney | well untill today | 18:36 |
dansmith | maybe /melwitt/ needs the sleep | 18:36 |
sean-k-mooney | i was at a funeral | 18:36 |
sean-k-mooney | so ya back for a "shortish" day of not very stressful things | 18:36 |
sean-k-mooney | i planned to leave after the bug call but got distracted | 18:36 |
sean-k-mooney | anyway food | 18:36 |
sean-k-mooney | o/ | 18:37 |
melwitt | well we swapped running the bug call today so I was like why is sean here | 18:37 |
melwitt | have a nice weekend o/ | 18:37 |
artom | melwitt, he was on PTO wednesday, so unable to send out the email | 18:37 |
artom | And usually sending the email + running the call go hand in hand | 18:37 |
melwitt | no, I swapped with him and I was supposed to send the email | 18:37 |
melwitt | I just forgot to | 18:37 |
artom | Right, so swap means you get to run the call as well | 18:37 |
melwitt | I know, and I did | 18:38 |
artom | But... he's allowed to be there for the call | 18:38 |
melwitt | I know he's allowed to be there lol | 18:38 |
artom | WELL WHAT THE HELL ARE WE ARGUING ABOUT | 18:38 |
melwitt | I just thought if we swapped cause he was out, I was surprised when he was there | 18:38 |
melwitt | I DONT KNOW | 18:38 |
artom | WHY ARE WE YELLING | 18:38 |
melwitt | WE ARE HAVING TROUBLE CONTROLLING THE VOLUME OF OUR VOICE | 18:39 |
artom | OH RIGHT I STARTED IT IM SO SORRY | 18:39 |
melwitt | s/TROUBLE/DIFFICULTY/ | 18:40 |
artom | Umm, how about we do real work for a bit? Is there a way to see stats for a particular job? nova-ceph-multistore just failed twice on me | 18:40 |
dansmith | lol | 18:40 |
dansmith | dude | 18:40 |
artom | mriedem would have hacked up a logstash query in seconds | 18:40 |
dansmith | have you like paid attention to the last three hours in here at all? | 18:40 |
melwitt | omg | 18:40 |
dansmith | and also, logstash is still fubar I think | 18:40 |
melwitt | no u di'nt | 18:40 |
artom | dansmith, me? Pay attention? lol u cray cray | 18:41 |
dansmith | apparently ;) | 18:41 |
*** xinranwang__ has quit IRC | 19:04 | |
*** huaqiang has joined #openstack-nova | 19:18 | |
*** gmann_lunch is now known as gmann | 19:19 | |
mriedem | i only do logdna queries these days now anyway | 19:22 |
*** dklyle has quit IRC | 19:26 | |
*** dklyle has joined #openstack-nova | 19:30 | |
*** dklyle has quit IRC | 19:40 | |
melwitt | dansmith: I opened https://bugs.launchpad.net/nova/+bug/1888895 for the gate failure. going to mail the ML now | 19:43 |
openstack | Launchpad bug 1888895 in devstack-plugin-ceph "nova-ceph-multistore job fails often with 'No valid host was found. There are not enough hosts available.'" [Undecided,In progress] | 19:43 |
dansmith | cool | 19:43 |
melwitt | great, the WIP patch just failed | 19:44 |
melwitt | [errno 110] error connecting to the cluster wtf | 19:44 |
dansmith | maybe that config made it fail to start? | 19:46 |
dansmith | no ceph logs | 19:47 |
dansmith | hrm, don't even see it set the ini, | 19:47 |
dansmith | so maybe it didn't even get that far | 19:47 |
melwitt | yeah must be unless it's a fluke coincidence that ceph totally bombed this time | 19:50 |
dansmith | I think it must be a fluke bombing because it didn't run that line | 19:51 |
*** dklyle has joined #openstack-nova | 19:51 | |
dansmith | ah here we go: 2020-07-24 18:32:36.045 | /opt/stack/devstack-plugin-ceph/devstack/lib/ceph: line 429: (24G: value too great for base (error token is "24G") | 19:53 |
melwitt | ah | 19:55 |
dansmith | hacky fix | 19:55 |
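The `value too great for base` error comes from bash arithmetic: inside `$(( ))` a token such as `24G` is parsed as a number, and `G` is not a valid base-10 digit, so bash rejects it. The log does not show what line 429 of `lib/ceph` actually contains, so the bash sketch below is an assumption for illustration: a hypothetical helper, `size_to_bytes`, that converts a suffixed size to bytes before any arithmetic sees it. It only handles `G`/`M` (binary units) and plain byte counts.

```shell
# Hypothetical helper: convert "24G" (GiB), "512M" (MiB) or a plain byte
# count to bytes, so the unit suffix never reaches $(( )), which would fail
# with "value too great for base".
size_to_bytes() {
    local size="$1"
    case "$size" in
        *[Gg]) echo $(( ${size%[Gg]} << 30 )) ;;  # strip suffix, GiB -> bytes
        *[Mm]) echo $(( ${size%[Mm]} << 20 )) ;;  # strip suffix, MiB -> bytes
        *)     echo $(( size )) ;;                # already a byte count
    esac
}
```

With this, `size_to_bytes 24G` prints `25769803776` instead of tripping the arithmetic error.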
*** ralonsoh has quit IRC | 19:57 | |
*** dklyle has quit IRC | 20:04 | |
dansmith | melwitt: oops, should have put the bug on that, sorry | 20:05 |
dansmith | but it'll need cleanup | 20:05 |
melwitt | ah yeah | 20:05 |
melwitt | I was just thinking, I think all ceph jobs in openstack are broken over this, not just ours | 20:05 |
dansmith | potentially, but like I said, my job may be running slower or have different behaviors | 20:06 |
dansmith | actually | 20:07 |
melwitt | yeah. I was thinking of replying to mention other ceph jobs may be affected too. I see the older version 14.2.2 being pulled in openstack/tempest, for example | 20:07 |
dansmith | heh, I just saw a glance failure on the plain ceph job which is the same novalidhost | 20:07 |
dansmith | so I think yeah. | 20:07 |
dansmith | yeah | 20:07 |
* melwitt nods | 20:07 | |
dansmith | this is from one of my glance patches: https://e81e6b81331830d4903c-5acdef5dc10478cee5291df1596ec66a.ssl.cf1.rackcdn.com/742065/9/check/devstack-plugin-ceph-tempest-py3/292fd21/testr_results.html | 20:08 |
melwitt | ah yeah. and this is the tempest job failure I was looking at https://1ca2ee7583d21788b1d8-42b9b3ca9891e58d539431fcfb5b799d.ssl.cf2.rackcdn.com/742836/2/check/devstack-plugin-ceph-tempest-py3/547fab7/testr_results.html | 20:09 |
melwitt | (I picked a recent patch proposed to the tempest repo) | 20:09 |
dansmith | cool | 20:09 |
*** dklyle has joined #openstack-nova | 20:09 | |
dansmith | that's awesome because of two things: | 20:09 |
dansmith | 1. I didn't break stuff | 20:10 |
dansmith | 2. I get to steal the credit from sean-k-mooney for fixing more people! | 20:10 |
melwitt | lol awww | 20:10 |
*** ociuhandu has joined #openstack-nova | 20:10 | |
dansmith | dang, now I better give him honorable mention in the commit message ;P | 20:10 |
melwitt | hey, I found the default config value too | 20:11 |
melwitt | after that he blew past me finding how/where it was used | 20:11 |
dansmith | heh | 20:12 |
dansmith | he gave me a line to copy/paste, so... | 20:12 |
melwitt | yeah, that's what it's all about | 20:13 |
*** ociuhandu has quit IRC | 20:15 | |
*** dave-mccowan has joined #openstack-nova | 20:25 | |
*** mriedem has left #openstack-nova | 20:58 | |
*** ociuhandu has joined #openstack-nova | 22:02 | |
*** ociuhandu has quit IRC | 22:07 | |
*** _erlon_ has quit IRC | 22:23 | |
*** raildo has quit IRC | 22:28 | |
*** dave-mccowan has quit IRC | 22:44 | |
*** mlavalle has quit IRC | 23:02 | |
*** martinkennelly has quit IRC | 23:09 | |
*** hamalq has quit IRC | 23:10 | |
*** tonyb[m] has left #openstack-nova | 23:21 | |
*** bbowen has quit IRC | 23:34 | |
*** bbowen has joined #openstack-nova | 23:35 | |
*** tosky has quit IRC | 23:55 |