| dviroel | it looks like our grenade job is broken with the issue: "ModuleNotFoundError: No module named 'pkg_resources'" | 11:56 |
|---|---|---|
| dviroel | I think that we need to wait for the pin setuptools fixes | 12:03 |
| opendevreview | Takashi Kajinami proposed openstack/watcher master: devstack: Remove ineffective ssl options https://review.opendev.org/c/openstack/watcher/+/975809 | 12:08 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: Use wrapper classes for novaclient objects https://review.opendev.org/c/openstack/watcher/+/972913 | 13:02 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: Change id field to uuid when appropiate https://review.opendev.org/c/openstack/watcher/+/975200 | 13:02 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: Prepare to use openstacksdk instead of novaclient https://review.opendev.org/c/openstack/watcher/+/974710 | 13:02 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: Complete migration from novaclient to openstacksdk https://review.opendev.org/c/openstack/watcher/+/974924 | 13:02 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: Remove usage of novaclient from Watcher https://review.opendev.org/c/openstack/watcher/+/974925 | 13:02 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: Remove nova_helper retries for openstacksdk params https://review.opendev.org/c/openstack/watcher/+/975498 | 13:02 |
| sean-k-mooney | dviroel: yes i think gmann is lookign into grenade fixes but the setuptools pin is not trivial | 14:22 |
| sean-k-mooney | we actully need to disable build isolation which is not good | 14:22 |
| sean-k-mooney | so this is obviosly evolving | 14:23 |
| sean-k-mooney | fixing grenade alse require use to emrge patchies in teh reverse order in some cases so its a mess | 14:23 |
| dviroel | yeah, I'm also following the discussion in the ML | 14:24 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: Use wrapper classes for novaclient objects https://review.opendev.org/c/openstack/watcher/+/972913 | 15:19 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: Change id field to uuid when appropiate https://review.opendev.org/c/openstack/watcher/+/975200 | 15:19 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: Prepare to use openstacksdk instead of novaclient https://review.opendev.org/c/openstack/watcher/+/974710 | 15:19 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: Complete migration from novaclient to openstacksdk https://review.opendev.org/c/openstack/watcher/+/974924 | 15:19 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: Remove usage of novaclient from Watcher https://review.opendev.org/c/openstack/watcher/+/974925 | 15:19 |
| opendevreview | Joan Gilabert proposed openstack/watcher master: Remove nova_helper retries for openstacksdk params https://review.opendev.org/c/openstack/watcher/+/975498 | 15:19 |
| dviroel | sean-k-mooney: amoralej_ jgilaber: WRT SQLite errors when running with parallel engine and native threads, i have some other tests results: | 15:19 |
| dviroel | I can reproduce the same issue replacing Taskflow by a ThreadPool, that runs actions in parallel | 15:19 |
| sean-k-mooney | ack we expect each thread to use its own sperate connection | 15:20 |
| dviroel | It is also possiblle to create a new unit test in Nova to simulate the same behavior (concurrent threads updating Server states), and it also fail in the same way... | 15:20 |
| sean-k-mooney | well you are ment to use a lock on the instnace uuid to prevent that | 15:20 |
| sean-k-mooney | that actully might be the fix | 15:20 |
| sean-k-mooney | we can use the oslo concurance lockutils.synconise decorator to add a lock on the action state update using the action uuid | 15:21 |
| amoralej_ | if the problem is in sqlite managing concurrency that should work | 15:21 |
| sean-k-mooney | so any one action shoudl only be mutated form one trhead at a time | 15:21 |
| sean-k-mooney | if we are modifyign it form diffent thread we need to ahve a lock to serialsie the critical sections | 15:22 |
| amoralej_ | but the processes which are interlocking are on the same action ? | 15:22 |
| sean-k-mooney | and we need that lock even with eventlet | 15:22 |
| sean-k-mooney | its a pre exisitng bug | 15:22 |
| amoralej_ | what are actually the locking queries ? | 15:22 |
| dviroel | no, each thread updates a different action | 15:22 |
| amoralej_ | that was my understanding | 15:22 |
| amoralej_ | for what i understood from dviroel, it's not a bug | 15:23 |
| amoralej_ | it's a limitation in sqlite | 15:23 |
| dviroel | yes, that's my point, it doesn't deal with concurrent writes | 15:24 |
| amoralej_ | (btw, i've been testing the sqlalchemy traces with osprofile and it will work, i'll send the patch soon if you want to use it) | 15:24 |
| amoralej_ | and that's why we never hit that out of unit tests, i guess | 15:25 |
| amoralej_ | is there any way to improve concurrency behavior in sqlite? what we mentioned of using file-backed db? | 15:26 |
| sean-k-mooney | i guess sqlight m8ight be lockign the file/table | 15:26 |
| sean-k-mooney | rather then at the rwo level but i tought hat was also doable | 15:26 |
| amoralej_ | so, i wouldn't do exotic things in sqlite or db connection side to fix an issue which is ci only | 15:31 |
| sean-k-mooney | amoralej_ its not clear it is ci only | 15:32 |
| amoralej_ | everything i hear looks like so :) | 15:32 |
| sean-k-mooney | if its diffent action it may be | 15:32 |
| amoralej_ | is it possible to use the mysql fixtures for this test ? | 15:32 |
| dviroel | the unit test that claude assisted to create: https://paste.centos.org/view/c5679727 (for Nova) | 15:32 |
| amoralej_ | at least to check if we can reproduce it with mysql | 15:32 |
| sean-k-mooney | yes but i dont really want to do that | 15:32 |
| amoralej_ | https://news.ycombinator.com/item?id=45508462 "The single-writer limitation in SQLite is per-database, not per-connection." | 15:33 |
| amoralej_ | it's not a manual ^ but ... | 15:33 |
| sean-k-mooney | amoralej_: so sqlachme is ment to handelign that for us | 15:34 |
| amoralej_ | how | 15:34 |
| sean-k-mooney | its ment to make sure that only one transaction is executing at a time | 15:34 |
| amoralej_ | even for different threads? | 15:35 |
| sean-k-mooney | i belive so yes | 15:35 |
| sean-k-mooney | dviroel: canyou set https://github.com/openstack/watcher/blob/master/watcher/tests/fixtures/conf_fixture.py#L30 to ture | 15:35 |
| amoralej_ | if sqlalchemy can do the trick for us, that'd be good | 15:37 |
| sean-k-mooney | we are setting it to false for perforamce reasons i think in test but we might need to rethink that when using real treads | 15:38 |
| dviroel | yes, I already did that, but i can just rerun with true again here | 15:39 |
| dviroel | yeah, still fails with synchronous to True | 15:41 |
| sean-k-mooney | ack | 15:41 |
| dviroel | i've also trie enabled WAL too: https://www.sqlite.org/wal.html | 15:41 |
| sean-k-mooney | so filebacked and synconus and with wal and it still happens with all 3 | 15:41 |
| amoralej_ | https://oldmoe.blog/2024/07/08/the-write-stuff-concurrent-write-transactions-in-sqlite/ according to that, wal only helps in multiple reads | 15:42 |
| amoralej_ | not writes | 15:42 |
| dviroel | yeah | 15:42 |
| dviroel | it allow concurrent reads and one write, but does not solve multiple writes | 15:43 |
| sean-k-mooney | so all db writes in watcher should be happening under a writer tranaction | 15:43 |
| sean-k-mooney | in the test we coudl monkey patch that to add a lock | 15:43 |
| sean-k-mooney | that will serialise all writes | 15:43 |
| amoralej_ | but we don't want that for production, right? | 15:44 |
| dviroel | i think that would be just for unit test? | 15:45 |
| sean-k-mooney | that why i said monky patch or baiclly use a fixture | 15:45 |
| sean-k-mooney | so in the base testcase we can wrap the normal write traskaction in a lock | 15:45 |
| amoralej_ | but, why we need to force to use sqlite even if that means hacking code in tests? | 15:46 |
| sean-k-mooney | becasue we have a requiremetn that all unit test must run with sqlite | 15:46 |
| sean-k-mooney | so we make it work or we remvoe teh test | 15:46 |
| amoralej_ | then, let's change the unit tests | 15:46 |
| dviroel | humm | 15:46 |
| amoralej_ | but for me, that means we will never be able to test concurrency in unit tests | 15:47 |
| sean-k-mooney | well we shoudl not by defintion | 15:47 |
| amoralej_ | which ok, maybe that's something only for funtional or tempest tests | 15:47 |
| sean-k-mooney | its ok to do that in a fucntional test but not really in unit tests | 15:48 |
| amoralej_ | ok, then let's rework the test and mock the writes or whatever we need | 15:48 |
| sean-k-mooney | that a way forward but we will ahve to solve this for the funcitoal tests where we will want to test with mutliple threads | 15:49 |
| amoralej_ | and another database :) | 15:50 |
| sean-k-mooney | no | 15:50 |
| sean-k-mooney | the funcitonal test sshoudl work with sqlite too | 15:50 |
| amoralej_ | then, it's not functional | 15:50 |
| amoralej_ | why | 15:50 |
| sean-k-mooney | it is | 15:50 |
| sean-k-mooney | funcitonal test are not allowed to dependn on external processes | 15:50 |
| sean-k-mooney | they do not mock the internal code but mock the extenral service they interact with | 15:51 |
| sean-k-mooney | or use a replacement like sqlight | 15:51 |
| sean-k-mooney | they are inteded to be run localy before pushing to ci | 15:51 |
| sean-k-mooney | and we shoudl not need to install a db on your laptop to run them | 15:51 |
| sean-k-mooney | requiring mysql to run the funcional tests basiclly means your going to have to do yoru dev in a vm most of the time because you should be runnign unit and funcitonal tests locally beofre pushing a patch for review | 15:52 |
| sean-k-mooney | im not saying we canat decisde to use something other then sqlite im sayign if we do that it a very big change in the develope workflow and not one we shoudl do lightly | 15:53 |
| amoralej_ | but in functional tests we'll need to run watcher-api, watcher-applier and watcher-decision-engine in parallel | 15:54 |
| amoralej_ | right? | 15:54 |
| sean-k-mooney | yep | 15:54 |
| amoralej_ | and sqlite will be able to handle concurrency between different processes? | 15:54 |
| amoralej_ | i'd say not | 15:54 |
| sean-k-mooney | which we do for nova-compute, nova-api nova-schduler and nova-conductor to name but a few today | 15:54 |
| sean-k-mooney | with sqlite at least under eventlet | 15:54 |
| amoralej_ | how you do it in nova? | 15:55 |
| sean-k-mooney | we havent done it in threaded mode but ineventlety each runs in its own eventlet | 15:55 |
| sean-k-mooney | and only one is ever writign to the db at once as a result as there is only 1 thread | 15:55 |
| sean-k-mooney | i want to emulate that in the threaded world too if we can | 15:55 |
| sean-k-mooney | we use oslo messaging in memroy transport to simulate rabbit | 15:56 |
| sean-k-mooney | and sqlite fo rthe db | 15:56 |
| amoralej_ | so you run a single process running nova-compute, nova-schedule, etc.. in different threads? | 15:56 |
| sean-k-mooney | in diffent greentread via eventlet yes | 15:56 |
| amoralej_ | same process but different eventlets, i mean | 15:56 |
| sean-k-mooney | yep | 15:56 |
| amoralej_ | that's much more invasive that i thought, tbh | 15:56 |
| sean-k-mooney | not really | 15:57 |
| amoralej_ | i was thinking in running them as independent processes similar to a real deployment | 15:57 |
| sean-k-mooney | that woudl not be a fucntionla test that woudl be an integration test at that point | 15:57 |
| amoralej_ | well, functional tests are kind of integration tests, imo | 15:59 |
| sean-k-mooney | they are not | 15:59 |
| sean-k-mooney | they are very much a sepreate clase between unit and integration tests | 15:59 |
| sean-k-mooney | as they are not ment to tst the integration with external services | 15:59 |
| sean-k-mooney | just the internal funcitoning fo the service under test | 15:59 |
| amoralej_ | but that does not imply that we are force to run everything in the same process | 16:00 |
| sean-k-mooney | so they mock or fake the external deps btu do not mock the internal service | 16:00 |
| sean-k-mooney | we are not force too but i strongly belive we should | 16:00 |
| sean-k-mooney | neutron and glance both tried to so testing clsoe to what you descibed and that proved very hard to maintain and debug | 16:02 |
| amoralej_ | well, if we force to use sqlite, there is no other way (assuming we can make sqlite_synchronous work) | 16:02 |
| dviroel | i got the idea, and the requirement that anyone could run both unit and funcional tests without installing/configuring additional services | 16:02 |
| sean-k-mooney | is https://review.opendev.org/c/openstack/watcher/+/975522 the patch your working on? | 16:03 |
| dviroel | I can continue in this one, but was just to show the current issue yeah | 16:04 |
| sean-k-mooney | i am asking because i want to play with it locally | 16:04 |
| sean-k-mooney | i.e. to test an idea | 16:04 |
| dviroel | ah ok, yes you can get this patch | 16:05 |
| sean-k-mooney | so i want too repoduce it first which ithink just needs engine_type = "parallel" | 16:05 |
| dviroel | and use the file based configuration too | 16:06 |
| dviroel | in memory is more limited than the file | 16:06 |
| *** dviroel is now known as dviroel_lunch | 16:11 | |
| *** dviroel_lunch is now known as dviroel | 16:58 | |
| opendevreview | Alfredo Moralejo proposed openstack/watcher master: Add OSProfiler support for cross-service tracing https://review.opendev.org/c/openstack/watcher/+/976280 | 17:28 |
| dviroel | \o/ | 17:57 |
| opendevreview | sean mooney proposed openstack/watcher master: Enable parallel TaskFlow engine in threading mode https://review.opendev.org/c/openstack/watcher/+/976293 | 19:37 |
| sean-k-mooney | dviroel: ^ seams to work | 19:38 |
| sean-k-mooney | we were actully missing decorators https://review.opendev.org/c/openstack/watcher/+/976293/1/watcher/db/sqlalchemy/api.py | 19:38 |
| sean-k-mooney | but that by itslef was not enough to fix it | 19:39 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!