*** gyee has quit IRC | 00:07 | |
*** hoonetorg has quit IRC | 03:02 | |
*** rcernin has quit IRC | 03:14 | |
*** hoonetorg has joined #openstack-glance | 03:16 | |
*** rcernin has joined #openstack-glance | 03:16 | |
*** rcernin has quit IRC | 03:24 | |
*** rcernin has joined #openstack-glance | 03:30 | |
*** rcernin has quit IRC | 03:45 | |
*** rcernin has joined #openstack-glance | 03:48 | |
*** udesale has joined #openstack-glance | 03:52 | |
*** rcernin has quit IRC | 03:56 | |
*** ratailor has joined #openstack-glance | 04:00 | |
*** rcernin has joined #openstack-glance | 04:05 | |
*** evrardjp has quit IRC | 04:34 | |
*** evrardjp has joined #openstack-glance | 04:34 | |
*** jmlowe has quit IRC | 04:46 | |
*** jmlowe has joined #openstack-glance | 04:49 | |
*** zzzeek has quit IRC | 05:41 | |
*** zzzeek has joined #openstack-glance | 05:41 | |
*** m75abrams has joined #openstack-glance | 05:59 | |
*** m75abrams has quit IRC | 05:59 | |
*** m75abrams has joined #openstack-glance | 06:00 | |
*** zzzeek has quit IRC | 06:36 | |
*** zzzeek has joined #openstack-glance | 06:39 | |
openstackgerrit | Abhishek Kekane proposed openstack/glance master: Fix: Interrupted copy-image leaking data on subsequent operation https://review.opendev.org/737867 | 07:16 |
*** zzzeek has quit IRC | 07:36 | |
*** zzzeek has joined #openstack-glance | 07:37 | |
*** rcernin has quit IRC | 08:17 | |
*** jawad_ax_ has joined #openstack-glance | 08:35 | |
*** jawad_axd has quit IRC | 08:39 | |
*** jmlowe has quit IRC | 08:48 | |
*** jmlowe has joined #openstack-glance | 08:50 | |
*** bhagyashris is now known as bhagyashris|afk | 09:55 | |
*** tkajinam has quit IRC | 10:22 | |
*** bhagyashris|afk is now known as bhagyashris | 11:00 | |
*** udesale_ has joined #openstack-glance | 11:52 | |
*** udesale has quit IRC | 11:54 | |
mordred | dansmith: fixed https://review.opendev.org/737608 | 12:32 |
mordred | dansmith: I've rechecked your nova test change to be sure | 12:32 |
*** ratailor has quit IRC | 13:25 | |
dansmith | mordred: hmm, same deal? https://zuul.opendev.org/t/openstack/build/1acf786e19fa460c97ee8c930e7fdfd4/log/controller/logs/devstacklog.txt#19181 | 13:26 |
dansmith | oh wait, it didn't run on your last recheck | 13:29 |
dansmith | abhishekk: hmm, does glance's fast8 not work? | 13:30 |
openstackgerrit | Dan Smith proposed openstack/glance master: Add image_set_property_atomic() helper https://review.opendev.org/737868 | 13:35 |
abhishekk | dansmith, you mean pep8? | 13:42 |
dansmith | no, fast8 | 13:42 |
dansmith | I had run fast8 and gotten a pass so assumed my patch was clean, but apparently that doesn't work in glance, even though the tox target runs | 13:43 |
dansmith | fast8 smartly just checks the things that have changed since master, so it runs way way quicker on nova | 13:43 |
dansmith | maybe pep8 runs quick enough on glance that you've had no need I guess | 13:43 |
abhishekk | I never tried fast8 | 13:43 |
abhishekk | will have a look though | 13:44 |
dansmith | abhishekk: assume you saw my atomic update patch? | 13:44 |
abhishekk | dansmith, yes, I saw it | 13:45 |
abhishekk | We need API wrapper around it | 13:46 |
dansmith | yeah, assumed you would make your patch use it | 13:46 |
abhishekk | I will update my patch with dependency of yours | 13:50 |
abhishekk | need some time for that though | 13:50 |
dansmith | ack, it would be nice to get an ack from zzzeek too | 13:52 |
abhishekk | agree | 13:54 |
abhishekk | rosmaita, jokke, smcginnis, dansmith glance weekly meeting at #openstack-glance in 5 minutes | 13:55 |
dansmith | oh is it in here? wiki says meeting-4 | 13:55 |
abhishekk | damn | 13:57 |
abhishekk | dansmith, #openstack-meeting | 14:00 |
dansmith | heh okay | 14:00 |
* abhishekk I really need some time off | 14:00 | |
abhishekk | rosmaita, around? | 14:02 |
rosmaita | omw | 14:02 |
abhishekk | ack | 14:02 |
*** alistarle has joined #openstack-glance | 14:05 | |
openstackgerrit | Dan Smith proposed openstack/glance master: Check authorization before import for image https://review.opendev.org/737548 | 14:13 |
*** alistarle has quit IRC | 14:42 | |
*** alistarle has joined #openstack-glance | 14:48 | |
*** m75abrams has quit IRC | 15:00 | |
abhishekk | dansmith, updated the meeting channel in wiki | 15:03 |
dansmith | cool :) | 15:04 |
dansmith | abhishekk: if I stage some new image content for an image, and then do a copy-image to a new store, I would end up with two locations for the image, with different image content right? | 15:12 |
dansmith | since you just check the image size (as of your patch) I could cause havoc if I innocently did that right? | 15:12 |
abhishekk | dansmith, yes, if you managed to stage an image of the exact same size in staging | 15:16 |
*** jawad_ax_ has quit IRC | 15:16 | |
dansmith | that seems, like, bad and stuff :) | 15:16 |
dansmith | I would hope that glance is using the hash to ensure that every location (at least the ones it populates) are identical | 15:16 |
dansmith | I mean, as a user I'd expect that | 15:16 |
dansmith | is there any image.fsck() command to recalculate and validate the hash on each location? | 15:17 |
abhishekk | dansmith, again, then it will pass this check but will fail when it compares the checksum and hash with the original image later | 15:18 |
*** jawad_axd has joined #openstack-glance | 15:18 | |
dansmith | I'm not sure what you mean...what will fail later? | 15:19 |
*** bhagyashris is now known as bhagyashris|afk | 15:19 | |
abhishekk | So if you stage some new image with the exact size of the image you want to copy in the staging area | 15:20 |
abhishekk | and then you initiate a copy-image operation | 15:20 |
abhishekk | This will pass the current check which I have added in the patch | 15:21 |
*** tosky has joined #openstack-glance | 15:22 | |
abhishekk | But during the import task, when that image is imported to the desired store, it will check the size, hash and checksum of the new image against the original image | 15:22 |
*** jawad_axd has quit IRC | 15:22 | |
abhishekk | https://github.com/openstack/glance/blob/master/glance/location.py#L522 | 15:23 |
abhishekk | and if either of them mismatches it will fail | 15:23 |
tosky | hi glance people, can you please add this (not critical) change to your review list? https://review.opendev.org/#/c/733023/ | 15:23 |
dansmith | ah, okay, so we will upload what we find in store (if size matches), but after upload, if the hash wasn't the same, we'd fail, and delete the copy we just made instead of adding the new location? | 15:24 |
tosky | It removes the last legacy job | 15:24 |
abhishekk | dansmith, yes | 15:25 |
dansmith | abhishekk: gotcha okay | 15:25 |
abhishekk | tosky, ack | 15:25 |
dansmith | abhishekk: that's the answer to my question in the review about hash vs size, which I just said more about, so just ignore :) | 15:25 |
abhishekk | dansmith, ack :D | 15:26 |
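The flow the two just agreed on — accept staged data if the size matches, then verify checksum and multihash after the copy and discard it on mismatch (the check abhishekk linked in glance/location.py) — can be sketched roughly like this. The function and dict keys here are illustrative stand-ins, not Glance's actual API; md5 and sha512 are used because they are Glance's default checksum and multihash algorithms:

```python
import hashlib

def verify_copied_image(original_meta, copied_bytes):
    # Size alone passed the earlier staging check; the import-time
    # check also compares checksum and multihash and fails the task
    # (deleting the copy) on any mismatch.
    if len(copied_bytes) != original_meta["size"]:
        raise ValueError("size mismatch")
    if hashlib.md5(copied_bytes).hexdigest() != original_meta["checksum"]:
        raise ValueError("checksum mismatch: discarding copied data")
    if hashlib.sha512(copied_bytes).hexdigest() != original_meta["os_hash_value"]:
        raise ValueError("multihash mismatch: discarding copied data")
    return True

data = b"cirros-image-bits"
meta = {
    "size": len(data),
    "checksum": hashlib.md5(data).hexdigest(),
    "os_hash_value": hashlib.sha512(data).hexdigest(),
}
# Different content of the same size would pass the staging-size
# check but gets caught by the digest comparison.
evil = b"evil-image-bits!!"
```

This mirrors why abhishekk's size-only check is safe: a same-size impostor staged by an attacker slips past the first gate but cannot survive the digest comparison.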
* abhishekk dinner break, back in 30 | 15:26 | |
*** udesale_ has quit IRC | 15:28 | |
jokke | dansmith: yeah, we upload through hashing, so we don't have verification that what landed in the store is what we expected, but we have a checkpoint that what we sent matches the metadata | 15:31 |
jokke | dansmith: unfortunately I think none of the store backends gives us a nice interface to get the hash of what they have stored for us. Also when consumed, the client will do hash verification of what it receives | 15:32 |
dansmith | ack, I was wondering why we didn't hash on the download to staging, store the hash in a sidecar and then avoid needing to copy the bad data to the new location just to check the hash | 15:32 |
dansmith | but as long as it's doing it sometime, that is the important part | 15:32 |
*** udesale has joined #openstack-glance | 15:34 | |
jokke | yeah | 15:35 |
jokke | The copy is utilizing lots of the tasks from the already existing import flows, we could have done one more hashing check in between like you said, but a) it's an async task so we're not holding the client waiting and b) it's done already at a later stage so quite diminishing returns | 15:38 |
jokke | It's not like majority of the transactions are corrupt | 15:38 |
dansmith | heh, hashing is for catching the 0.001% that are, but yeah I understand | 15:39 |
jokke | yup and we still catch it, just after bit of extra work :P | 15:39 |
dansmith | yup | 15:40 |
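jokke's "we upload through hashing" amounts to wrapping the data stream so the digest accumulates as chunks pass through to the store, with no second read of the backend needed. A minimal, hypothetical sketch of that pattern (not Glance's actual reader class):

```python
import hashlib
import io

class HashingReader:
    """Wrap a readable stream and hash every chunk as it passes
    through, so the digest is ready when the upload finishes."""

    def __init__(self, stream, algo="sha512"):
        self._stream = stream
        self._hasher = hashlib.new(algo)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = self._stream.read(size)
        if chunk:
            self._hasher.update(chunk)
            self.bytes_read += len(chunk)
        return chunk

    def hexdigest(self):
        return self._hasher.hexdigest()

data = b"image contents" * 1000
reader = HashingReader(io.BytesIO(data))
while reader.read(4096):
    pass  # a store backend would write each chunk out here
```

After the loop, `reader.hexdigest()` is the digest of everything that was sent, which is the "checkpoint that what we sent matches the metadata".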
*** jawad_axd has joined #openstack-glance | 15:53 | |
zzzeek | hey dansmith I just noticed I think you need to include "WHERE value=None" in the WHERE clause there, otherwise the UPDATE will match the row regardless | 15:55 |
dansmith | zzzeek: I don't think that will work because we don't know the value is None, it's whatever the last attempt set it to | 15:56 |
zzzeek | OK if "name" is not being updated to a unique token then the rows matched here will not be zero | 15:56 |
zzzeek | the idea is, there's a row, and you expect to have "<values>" in it. if some other thread gets the row, then "<values>" is gone | 15:56 |
zzzeek | that's how you check atomically | 15:57 |
dansmith | zzzeek: I just want the presence of a row with name=$mykey to be atomic, not anything to do with the value column | 15:57 |
*** jawad_axd has quit IRC | 15:57 | |
zzzeek | OK so then shouldnt you include "name" in the SET clause ? | 15:57 |
dansmith | but the name won't change, why do I need to set it? | 15:58 |
zzzeek | dansmith: OK then this won't work. you're trying to grab a "lock" on that row and we are trying to do compare and swap | 15:59 |
dansmith | zzzeek: can you look at these tests and see if that clears up what I'm going for? https://review.opendev.org/#/c/737868/3/glance/tests/functional/db/test_sqlalchemy.py | 15:59 |
zzzeek | dansmith: oh ok "deleted" is the significant column | 16:00 |
dansmith | right | 16:01 |
zzzeek | thought there was some kind of token, OK | 16:01 |
dansmith | only one should be able to un-delete it right? | 16:01 |
zzzeek | but that's not the "lock" | 16:01 |
zzzeek | yes within the scope of this method, if you set deleted to zero, another UPDATE looking for deleted == 0 will not find the row | 16:01 |
zzzeek | so as far as DBDuplicate, is this UPDATE setting a set of values that has a unique constraint on it? | 16:02 |
zzzeek | like unique on (name, value, deleted) or soemthign? | 16:02 |
dansmith | the only constraint is over (image_id, name) deleted is not included, | 16:03 |
dansmith | which means you can't even create a new row with that name if there's already an existing deleted one (which is weird) | 16:03 |
zzzeek | OK so you are only affeting value and deleted so this UPDATE can't raise a dbduplicate | 16:03 |
dansmith | that's why I'm having to update-or-insert.. I would have expected all I needed was the insert until I looked at their schema | 16:03 |
dansmith | okay gotcha | 16:03 |
zzzeek | the INSERT can | 16:03 |
dansmith | yeah | 16:04 |
zzzeek | if you were updating another row to have the identical name/image_id as this one, that would raise a constraint error | 16:04 |
jokke | name is not a unique constraint | 16:04 |
zzzeek | which is the dbduplicate | 16:04 |
jokke | only ID is | 16:04 |
dansmith | jokke: UniqueConstraint('image_id', | 16:04 |
dansmith | 'name', | 16:04 |
dansmith | name='ix_image_properties_' | 16:04 |
dansmith | 'image_id_name'),) | 16:04 |
dansmith | zzzeek: makes sense | 16:05 |
dansmith | jokke: https://github.com/openstack/glance/blob/master/glance/db/sqlalchemy/models.py#L154 | 16:05 |
jokke | dansmith: Ok, let me correct. Should not be and is not treated as such in Glance | 16:05 |
dansmith | jokke: I don't understand the difference :) | 16:05 |
jokke | interesting | 16:05 |
jokke | I wonder what that is actually affecting | 16:06 |
dansmith | well, if I create a property, delete it, and try to insert another such row, I get a duplicate exception, since the UC doesn't cover deleted | 16:06 |
dansmith | presumably the property updates are looking for existing rows (ignoring deleted) and updating them if they find them, undeleting them | 16:07 |
dansmith | but I didn't fully chase that as it can't really work any other way | 16:07 |
jokke | Oh yes, that the property name, not the image name | 16:08 |
dansmith | right, we're talking about properties here | 16:08 |
jokke | sorry, my bad | 16:08 |
dansmith | so all good? | 16:08 |
jokke | gotta love the overloading of keys :D | 16:08 |
jokke | yup | 16:08 |
dansmith | cool | 16:08 |
jokke | I was like "WTF, I surely can create multiple images with same name so where is that overwritten" but yes, property names are unique per image | 16:09 |
dansmith | heh yeah | 16:09 |
jokke | I would not have been surprised if it was overwritten somewhere else before actually being used :P | 16:10 |
openstackgerrit | Dan Smith proposed openstack/glance master: Add image_set_property_atomic() helper https://review.opendev.org/737868 | 16:14 |
dansmith | jokke: so for the policy thing.. it seems like the "is owner or admin" stuff is baked super deep into glance, such that the context we start the task with determines whether or not the image can be updated at all | 16:15 |
dansmith | jokke: in nova we've moved as much policy stuff to the API as possible, so my tendency is to do the check in the API (like I'm doing for the current bugfix), but obviously I need to do something to make the task able to update the image | 16:16 |
dansmith | so ... what's the best strategy there? have the task check policy and use an admin context for the update, or check in the API as well, and pass an admin context to the task? | 16:17 |
jokke | dansmith: we have had this dream to refactor the policies in glance for I think at least a couple of years now. Just no-one has enough interest and/or time to do it. The policies in Glance are a real mess, as some are closer to the API while some are right at the db layer, or in any of that damn onion in between | 16:18 |
dansmith | yep, been there :) | 16:18 |
jokke | dansmith: and the onion model is very descriptive. Every time you try to cut through, it makes you cry. | 16:18 |
* dansmith nods | 16:18 | |
jokke | So majority of the import code just directly bypasses the domain model. If we can work around that, great. One less thing that needs refactoring when we get to that | 16:20 |
dansmith | I don't know what the domain model is. I've chased calls through it and thought "gateway" but I don't really know what they are | 16:20 |
jokke | If we need to inject "No, adding the location from this user is really ok, we checked it already" to the context, so be it :P | 16:21 |
jokke | dansmith: just that the gateway + ~dozen layers of different "Repo" proxies layered on top of each other | 16:22 |
jokke | dansmith: some of which defined in glance/domain/ | 16:23 |
dansmith | it's more than just locations that needs updating.. the reason this fails now is because the task can't even start by adding the store to os_glance_importing_stores | 16:23 |
dansmith | is there some reason that I can't pass the actual image to the task in task_input? Allowing the API to grab a non-immutable image and pass it to the task and say "here use this instead of fetching it yourself" seems like it might be reasonable | 16:24 |
jokke | dansmith: ouch, yeah, there is that | 16:24 |
*** jawad_axd has joined #openstack-glance | 16:25 | |
dansmith | alternately, | 16:25 |
dansmith | if we're doing the auth check in the api (which we are as of my patch), then is there any reason to not just let the copy task run with an admin context entirely? | 16:26 |
jokke | dansmith: I think all of it is done in the context of the request. There really isn't any more real image object | 16:26 |
dansmith | not sure what you mean.. this is failing now because image_repo.get() returns an ImmutableImage which refuses to allow the task to update image.extra_properties['os_glance_importing_stores'] | 16:27 |
dansmith | that happens inside the task, as the task has only image_id and the user's context, which is what causes image_repo to return the immutable form | 16:28 |
jokke | dansmith: yes, that immutable image is returned based on the context. If you call that as owner or admin that all changes. That's what I mean: we do everything in the scope of the request context | 16:30 |
dansmith | okay, that's why I was suggesting running the task with an admin context instead of the user, | 16:31 |
openstackgerrit | Vlad Gusev proposed openstack/glance stable/ussuri: Fix hacking min version to 3.0.1 https://review.opendev.org/738062 | 16:31 |
dansmith | because image_repo.get() won't know that we're doing a copy and that it should give a non-owner a mutable image object | 16:31 |
jokke | dansmith: so we would do all the permission/policy checks in the api call, deem it ok and safe, and then swap the context to admin when passed to taskflow? | 16:33 |
dansmith | yeah | 16:34 |
dansmith | or, | 16:34 |
jokke | I'm kind of ok with that, not sure if it's a can of worms the rest want to open | 16:34 |
dansmith | we just use an admin_context in get_flow() to get the image if there's other stuff the user's context should use | 16:35 |
jokke | I think we need to go through the copy-image taskflow and just make sure it's static (no plugins will be enabled for it) and see that there is nothing that can be leaked into it from the import parameters etc. | 16:35 |
dansmith | but based on the way this works, I'm not sure what else we could do, other than a flag on the context that says "let this context do stuff" or of course something more complicated like a PartiallyMutableImage class that allows updating locations, and the couple of properties we need, but I'm not sure that's really the right thing to do | 16:36 |
jokke | I really really hate getting into these situations just because of the crappy core design :( | 16:36 |
jokke | Indeed | 16:37 |
dansmith | yeah the middle of this is so naive it's hard to do something better, which is why I figured just checking auth up front and letting the task run as admin is the most in line with everything else | 16:37 |
jokke | And I'm not sure how much refactoring the kind of partially mutable object would need | 16:37 |
*** alistarle has quit IRC | 16:37 | |
jokke | it probably needs tons of work at every layer between the API and DB | 16:38 |
dansmith | right, any time the task changes behavior slightly, that class would need to be tweaked to allow whatever it does | 16:38 |
dansmith | note we could also do this admin context thing only for import where method='copy-image', so that normal imports continue to run as the user | 16:39 |
dansmith | since normal imports do things like set the image active, etc | 16:39 |
jokke | dansmith: yup, and you can imagine how many other things there are that are exactly the same. You think you fix one simple thing and then you spend the next week tracking down the proxy classes that need to be changed to allow that passing :( | 16:39 |
dansmith | yup | 16:40 |
jokke | dansmith: yes, if we are going to that route it will be _only_ the copy-image | 16:40 |
jokke | and we need to harden that taskflow | 16:40 |
jokke | to make sure it doesn't leak permissions on anything other than it's supposed to | 16:40 |
jokke | For me it kind of makes sense and is yet another reason why we should try to find time to fix the core. But I can only speak on my own behalf | 16:41 |
jokke | need to check what abhishekk rosmaita smcginnis thinks of it | 16:42 |
* abhishekk reading scroll backs | 16:42 | |
dansmith | ack, I can extend my existing check in the API and make it run the task as admin in the copy-image case, but probably need some help auditing the task to make sure it's safe to do that | 16:42 |
jokke | dansmith: yeah, I'll take some time to read the execution path through | 16:43 |
abhishekk | dansmith, do we also need to make sure to not include any other import plugins if import operation is copy-image? | 16:43 |
dansmith | I don't know.. can the import task do other things? | 16:44 |
abhishekk | like we have conversion, decompression, inject_metadata | 16:44 |
dansmith | yeah, do those get called from the same task? | 16:44 |
jokke | dansmith: so how it works is that the methods are "plugins" for the API that triggers them. But they will also add the user (admin/deployer in this context) definable plugins into the flow. And I personally think if run in admin context we need to do that as the plugin chain is common for every method | 16:46 |
*** gyee has joined #openstack-glance | 16:46 | |
dansmith | the real problem here is that copy-image is a plugin or method of import, where it really should probably be a separate action right? | 16:47 |
dansmith | because the others are all dealing with changing the content of the image itself, which definitely is owner-only, | 16:47 |
dansmith | but copy-image is a little different | 16:47 |
jokke | dansmith: all three import methods are that. They are all different combinations of same Taskflow Tasks | 16:48 |
dansmith | so api_image_import is a composite task for anything that is done via import | 16:48 |
jokke | dansmith: correct | 16:49 |
dansmith | I'm feeling like maybe just giving up is the best course here | 16:51 |
*** udesale has quit IRC | 16:53 | |
jokke | dansmith: I'm very familiar with the Taskflows and how we built it. I definitely can help sorting those bits out once we have figured out the way around the major blocker which is the context | 16:53 |
jokke | dansmith: so if we can agree to run the copy-image in admin context and lock it down, it's no problem for me to get it in shape where it needs to be so we can actually do that | 16:54 |
abhishekk | +1 | 16:55 |
jokke | dansmith: and the more I look this discussion the more convinced I am that we should do some of those changes regardless. | 16:56 |
dansmith | okay, well, I'll try to get it to the point where I can run it as admin, depends-on from my devstack test and then I guess we can go from there | 16:57 |
*** jawad_axd has quit IRC | 16:57 | |
jokke | So please don't give up ... this is more feedback than we've got on a fresh feature in ages and I think we got somewhat tunnel-visioned in some of the decisions and implementation due to the lack of feedback for so long | 16:57 |
dansmith | okay | 17:00 |
dansmith | jokke: abhishekk: how about a hybrid approach where I factor out all the things we need to do for a copy-image into a helper, which grabs the image as admin just to do things like os_glance_importing_to_store=$store and the locations thing | 17:09 |
dansmith | I don't know if that will be feasible, but that stuff is a bit messy in the task anyway, | 17:10 |
dansmith | so refactoring it to one place and then saying "this stuff can run as admin" would be a much smaller scope | 17:10 |
dansmith | so, this for example would be a helper call: https://github.com/openstack/glance/blob/master/glance/async_/flows/api_image_import.py#L479-L482 | 17:11 |
dansmith | sort of a "sudo set_image_importing($store)" | 17:11 |
jokke | dansmith: you mean like wrapper that uses image object picked up with admin context? | 17:13 |
dansmith | yeah, let me just write up what I mean with that one thing delegated to the wrapper and I think it will be more clear | 17:14 |
jokke | it would definitely limit the scope of things that makes me afraid | 17:14 |
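The "sudo"-style helper being discussed — policy checked against the user's context at the API, with only a narrow set of import-related mutations delegated to an admin-context repo — might look roughly like this. All class and repo names here are hypothetical stand-ins for illustration, not the eventual Glance patch:

```python
class FakeImage:
    def __init__(self):
        self.extra_properties = {}

class FakeAdminRepo:
    """Stand-in for an ImageRepo built from an admin context,
    which hands back mutable image objects."""
    def __init__(self, images):
        self._images = images

    def get(self, image_id):
        return self._images[image_id]

    def save(self, image):
        pass  # a real repo would persist the change here

class ImportActionWrapper:
    """Expose only the specific mutations the copy-image task needs,
    so the task never holds a fully-privileged image object."""
    def __init__(self, admin_repo, image_id):
        self._repo = admin_repo
        self._image_id = image_id

    def set_importing_stores(self, stores):
        image = self._repo.get(self._image_id)  # mutable via admin repo
        image.extra_properties["os_glance_importing_to_store"] = ",".join(stores)
        self._repo.save(image)

images = {"abc-123": FakeImage()}
wrapper = ImportActionWrapper(FakeAdminRepo(images), "abc-123")
wrapper.set_importing_stores(["store1", "store2"])
```

The point of the design is scope: the task can call `set_importing_stores()` but cannot, say, change ownership or visibility, because the wrapper simply has no method for it.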
dansmith | lemme ask another unrelated question | 17:18 |
dansmith | jokke: abhishekk: it seems to me that if two import actions happen on an image simultaneously, the task will munge the stores and os_glance_importing_to_store type things stepping on top of each other | 17:19 |
dansmith | i.e. two nova computes are booting from the same instance in two different edge sites with different rbd clusters, | 17:19 |
dansmith | both ask to have the image copied to their rbd cluster at the same time | 17:19 |
dansmith | one or both will be confused as the two competing tasks will overwrite the other, such that my store disappears from os_glance_importing_to_store, without going into stores or os_glance_import_to_store_failed (or whatever that error property is) | 17:20 |
jokke | dansmith: I was thinking of that, but thought let's try to get the basics in place before bringing up the question of whether we should make the new task wait or just append to the list and continue. | 17:21 |
dansmith | I'm just asking what will happen today | 17:21 |
abhishekk | today you are right | 17:22 |
dansmith | I assume competing tasks will step all over each other and potentially result in the image copied to a store that doesn't end up in stores | 17:22 |
dansmith | okay | 17:22 |
jokke | dansmith: today the second task will overwrite the importing to stores property, everything will go just fine as it should until the first task finishes, tries to remove its store from the list and blows up :P | 17:22 |
abhishekk | isn't it the same that get solved with the race condition logic? | 17:22 |
dansmith | jokke: right okay | 17:22 |
dansmith | abhishekk: yeah I was thinking more about corrupting the single copy when I reported that, but I'm just extending it to cover internal glance metadata corruption too | 17:23 |
jokke | dansmith: the taskflow will not pull the next store from that property, it has its own reference list. It's just that updating that property will fail, I think | 17:24 |
dansmith | abhishekk: your race lock will make it so only one can be started at a time I guess, but you will want to communicate to the user clearly why that is, since the image doesn't change state while copying | 17:24 |
jokke | It actually might not blow up as I'm not sure if it fetches the image object again from db in between. It just might be that it overwrites the property again | 17:25 |
abhishekk | hmm | 17:25 |
jokke | it's either or, not sure | 17:25 |
dansmith | I think in at least several cases it will not fetch the image and not notice that it is blowing away the existing | 17:25 |
dansmith | if that property is being changed, it will just update the property to the new value | 17:26 |
abhishekk | IMO 2nd thread will overwrite os_glance_importing_to_store | 17:26 |
dansmith | if that is not what is changed, the update-match method in the db api might cause you to just explode because something else changed one of those values | 17:26 |
dansmith | abhishekk: yep | 17:26 |
jokke | dansmith: exactly ... not sure how it worked. It might or might not blow up at that point when they are racing to update the property | 17:27 |
abhishekk | then the 1st thread, after completing the copy, will come back to remove the store from that property, find that it is not in the list, log a message and continue | 17:27 |
jokke | So what we might want to do as a hardening exercise is to append to that list if it exists instead of just going and overwriting it. And do get_image(imageid) in between every store | 17:28 |
abhishekk | https://github.com/openstack/glance/blob/master/glance/async_/flows/api_image_import.py#L274 | 17:29 |
jokke | yeah, so we do pull fresh metadata for each task instead of passing the image object around | 17:30 |
dansmith | not at the start of the import though, we assume we have the master list of importing stores right? | 17:32 |
dansmith | https://github.com/openstack/glance/blob/master/glance/async_/flows/api_image_import.py#L479-L482 | 17:32 |
dansmith | this ^ uses stores from the task creation | 17:32 |
jokke | dansmith: that takes it from the request and writes it to the property. But the actual tasks that execute and remove them one by one pull the metadata from the db to remove the store id from that list when they finish | 17:34 |
dansmith | right, so that one case would clobber any list; if it runs at just the right time, it could clobber those two properties for a period of time at least | 17:35 |
jokke | https://github.com/openstack/glance/blob/master/glance/async_/flows/api_image_import.py#L441-L456 creates the individual tasks and adds them to the flow. So each of those tasks handles one store in the list | 17:35 |
dansmith | the other cases seem to do the merge, but unless it's done atomically they can still interleave and lose information | 17:36 |
dansmith | right but not atomically | 17:36 |
jokke | yeah, there is no cluster wide locking to ensure no race condition would ever happen if you hammer 500 requests simultaneously | 17:37 |
dansmith | you don't need to do cluster-wide locking, just do it atomically at the DB layer | 17:39 |
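"Do it atomically at the DB layer" means merging the store list with a compare-and-swap UPDATE and retrying when another writer interleaves, instead of a blind overwrite. A stdlib-sqlite3 sketch of the idea (Glance's real code goes through SQLAlchemy; only the property name is taken from the discussion):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE props (image_id TEXT, name TEXT, value TEXT)")
con.execute(
    "INSERT INTO props VALUES ('img', 'os_glance_importing_to_store', 'store1')")

def append_store(con, image_id, store):
    """CAS merge: re-read the current list and only write the merged
    value if nobody changed it underneath us; retry otherwise."""
    while True:
        (current,) = con.execute(
            "SELECT value FROM props WHERE image_id = ? AND name = ?",
            (image_id, "os_glance_importing_to_store")).fetchone()
        stores = [s for s in current.split(",") if s]
        if store in stores:
            return  # already present; nothing to merge
        merged = ",".join(stores + [store])
        cur = con.execute(
            "UPDATE props SET value = ? "
            "WHERE image_id = ? AND name = ? AND value = ?",
            (merged, image_id, "os_glance_importing_to_store", current))
        if cur.rowcount == 1:
            return  # our snapshot was still current; no interleaving

append_store(con, "img", "store2")
```

If a competing task rewrote the property between the SELECT and the UPDATE, the `value = ?` guard matches nothing, rowcount is 0, and the loop re-reads and re-merges, so neither writer's store is lost.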
openstackgerrit | Abhishek Kekane proposed openstack/glance master: Fix: Interrupted copy-image leaking data on subsequent operation https://review.opendev.org/737867 | 17:49 |
* abhishekk signing out for the day | 17:50 | |
*** jmlowe has quit IRC | 18:11 | |
*** jmlowe has joined #openstack-glance | 18:20 | |
dansmith | holy crap, this goes so much deeper than I even thought it might | 18:39 |
*** jawad_axd has joined #openstack-glance | 18:41 | |
*** jawad_axd has quit IRC | 19:15 | |
*** jawad_axd has joined #openstack-glance | 20:32 | |
openstackgerrit | Dan Smith proposed openstack/glance master: Add a test to replicate the owner-required behavior of copy-image https://review.opendev.org/738101 | 20:33 |
openstackgerrit | Dan Smith proposed openstack/glance master: WIP: Refactor TaskFactory and Executor to take an admin ImageRepo https://review.opendev.org/738102 | 20:33 |
openstackgerrit | Dan Smith proposed openstack/glance master: WIP: Admin import action wrapper https://review.opendev.org/738103 | 20:33 |
*** jawad_axd has quit IRC | 21:06 | |
dansmith | mordred: looks like --import worked | 21:57 |
dansmith | only took 6.5 hours after rechecking this morning to get that result | 21:57 |
dansmith | however, I wonder if you expected this: | 21:57 |
dansmith | 2020-06-25 21:51:02.964679 | controller | | properties | OpenStack-image-import-methods='['glance-direct', 'web-download', 'copy-image']', locations='[]', os_hidden='False', owner_specified.openstack.md5='', owner_specified.openstack.object='images/cirros-0.5.1-x86_64-disk', owner_specified.openstack.sha256='' | | 21:58 |
dansmith | kinda looks like you coped the header into the image properties? | 21:58 |
dansmith | I haven't seen that before, maybe glance is exposing that automatically when you use the import mechanism? | 21:58 |
mordred | dansmith: \o/ | 22:00 |
mordred | dansmith: hrm. no - I was not expecting that | 22:00 |
dansmith | er, I meant "copied" and "the supported import methods header (or whatever)" | 22:01 |
mordred | dansmith: I will make a fixing of that | 22:01 |
mordred | yeah | 22:01 |
dansmith | cool thanks | 22:01 |
dansmith | but thanks all around for making that easier | 22:02 |
mordred | \o/ | 22:06 |
mordred | dansmith: we've made some progress towards making success | 22:06 |
dansmith | I'm sorry, but I'm not willing to concede that 6.5h to run part of one job is "progress" towards anything | 22:07 |
dansmith | but discounting that...yes :) | 22:07 |
*** jmlowe has quit IRC | 22:22 | |
*** jawad_axd has joined #openstack-glance | 22:24 | |
*** jmlowe has joined #openstack-glance | 22:25 | |
*** tosky has quit IRC | 22:40 | |
mordred | fair | 22:54 |
mordred | it's better than if it had taken 65h | 22:55 |
mordred | an order of magnitude better even! | 22:55 |
* mordred remembers those days unfondly | 22:55 | |
*** jawad_axd has quit IRC | 22:56 | |
*** rchurch has quit IRC | 23:43 | |
*** rchurch has joined #openstack-glance | 23:43 | |
*** gyee has quit IRC | 23:59 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!