Wednesday, 2022-01-26

*** lajoskatona_ is now known as lajoskatona00:31
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Add upload-logs-ibm role
*** luigit is now known as luigi07:59
*** amoralej|off is now known as amoralej08:05
opendevreviewMerged opendev/infra-manual master: Update recommended ACL for createSignedTag keyword
*** anbanerj is now known as frenzyfriday08:35
*** jpena|off is now known as jpena08:38
*** rlandy|out is now known as rlandy|ruck11:12
*** dviroel|afk is now known as dviroel11:21
mnasiadkaDoes anybody have an idea why DIB CI nodepool-functional jobs are timing out (all of them) in ?11:39
*** odyssey4me is now known as Guest65212:40
*** sboyron_ is now known as sboyron12:41
louroto/ not sure if you have noticed - thanks!12:50
*** amoralej is now known as amoralej|lunch13:09
fungimnasiadka: if it's the one i'm thinking of, it needs updating to do zk over tls. i started trying to update it here:
*** amoralej|lunch is now known as amoralej13:59
*** dviroel is now known as dviroel|lunch15:01
clarkbfungi: mnasiadka it couldn't launch a node because of a python error. I strongly suspect this is an openstacksdk bug16:16
clarkbbecause they apparently rewrote everything16:17
*** dviroel|lunch is now known as dviroel16:23
fungiartom: ^ maybe worth a look16:27
artomfungi, I think you mean gtema?16:28
fungioh, yes sorry! gtema ^16:28
*** artom is now known as artom_not_the_sdk_one16:28
*** artom_not_the_sdk_one is now known as artom16:29
fungid'oh ;)16:29
gtemaoki, looking.16:29
gtemaI expect I would need to make some patches to nodepool16:30
gtemaI guess it would anyway make sense to cap version of sdk for zuul, cause ansible 2.9 will stop working properly with new version (as announced some months ago)16:32
clarkbmaybe I'm crazy, but this sort of thing makes me think we need to fork shade back out of sdk16:33
clarkbthe whole point of shade was to avoid these problems16:33
clarkband now we've reintroduced them16:33
clarkbnote that nodepool and ansible don't interact. It is zuul and ansible and the sdk installed for that that would need to be pinned for those issues16:34
clarkbBut again, shade existed because these problems in the tools were pervasive and users (us) found it extremely frustrating to use the tools16:34
gtemabut nobody is going to further maintain shade - it is nightmare looking to the evolution of all services16:35
clarkbwell I think the real nightmare is breaking users constantly16:36
clarkbbecause it doesn't matter if no one wants to use your software16:36
gtemawhat you say is more or less then "never touch a running system" (no offence)16:36
clarkbno thats not true16:37
clarkbwe made this work for many years with shade16:37
clarkbpapering over all the warts (which absolutely do exist)16:37
clarkbit was work and it wasn't always fun (glance image upload tasks ugh)16:37
clarkbBut it was also important that end users could use cloud apis in a portable manner without thinking too hard or needing various pins. I mean what are we supposed to do if one version of sdk supports cloud foo and another supports cloud bar16:38
fungiin actuality it was code inside nodepool. we extracted it from nodepool and named it shade16:38
gtemaand that is exactly what I am trying to improve. When there are  20000 different interfaces (and mostly for the doing the same) that are not possible to be maintained you will come to point of not having fun with it16:39
fungiwe could in theory just re-absorb it into nodepool16:39
clarkbgtema: I don't think shade had 2000 different interfaces though. It had "upload image" "boot instance" and so on. It was very high level and intentionally so with an attempt to try and paper over this stuff16:39
gtemafungi - it will not help for ansible. 16:39
clarkband it was my understanding that the sdk intended to keep that functionality otherwise I would never have been on board with the two merging16:39
gtemaclarkb: shade - no, but once it was spinned off it absorbed tons of other cases for ansible16:40
fungigtema: that wasn't ansible where we errored16:41
gtemaand since we are not able to update modules in ansible 2.9 any more we also can not make any changes in sdk16:41
clarkbit is also possible nodepool grew more non shade uses of the api16:41
gtemafungi - this particular case sure. I am talking generally16:41
clarkbBut I do think it is worth thinking about the impact to end users here if they need different sdk versions to talk to two different clouds16:41
clarkbSpecifically if we are breaking interfaces then that becomes much more difficult for end users to deal with16:41
gtemaI am not capable maintaining multiple different interfaces in sdk for the same while extending coverage16:42
fungithe reason that code came into existence at all is that we are operating a multi-cloud service which talks to probably a dozen different versions of openstack simultaneously. we thought it might be nice if there was an external api others could use to do the same thing, but if it's not really externally maintainable then we should reconsider that choice and possibly put it back inside nodepool16:42
fungiwhere it came from16:42
clarkbya I guess that's reasonable. Basically move the two different versions of sdk concerns back into a porcelain library16:42
gtemacorrect, but once there are few ways to achieve the same you can not maintain it reasonably anymore16:43
gtemai really try to keep it as much backward compatible as possible, but there are many dead bodies that are not funny16:43
clarkbgtema: I agree it is difficult. But at the same time if no one wants to use the tools because they break often or don't support the clouds they need to talk to we aren't really accomplishing much either. It is definitely frustrating that despite having common APIs clouds are able to differ in common tasks so much (image upload is an excellent example)16:44
gtema"often" - what exactly you want to say here?16:44
clarkbgtema: interfaces change so my code no longer works (what happened to nodepool), and new cloud has new APIs but to support that I need to update sdk but if I do that my existing old cloud will stop being able to be talked to16:45
gtemaand going back to the image upload case: It is not sustainable to maintain image upload for nodepool, image upload for osc, image upload for ansible16:45
clarkbThe first happens due to changes in the library interfaces. The second happens when APIs/functionality for older installations is removed16:45
clarkbgtema: why do you need an image upload for each of them?16:45
clarkbgtema: there should be one high level interface that works for the 80% case16:46
clarkb(this is what shade did)16:46
gtemabecause it is currently similar to that: there is "cloud" layer interface, there is "proxy" interface, and then there is still possibility to do direct operations bypassing all resources16:46
clarkbit made reasonable decisions based on the cloud you were talking to. It didn't need to do everything16:46
gtemaand nodepool/ansible/osc are currently all using different things16:46
clarkbgtema: so are we removing everything but the low level primitives then?16:47
gtemaI decided to normalise this by switching all those usages to proxy16:47
gtemato have real support for microversions16:47
clarkbok so shade really is being removed (that was the cloud layer)16:48
gtemano, we are not removing, but just inside of the call I rewrote it to use proxy layer (what was done not consistently before)16:48
clarkbah ok16:48
gtemaexample with limit is from another side - every service implements limits differently16:48
fungiso for that specific error, is max_total_instances no longer an available attribute? was it deprecated previously and we missed it? or not supposed to be part of the public api?16:49
gtemaalso here I worked on introducing common class for that so that every user can "trust" the interface of limit/quota/etc for any service16:49
gtemait is there, but most likely need to be fetched from different place16:49
gtemalemme check pls16:49
clarkbfwiw I'm more worried about how changing interfaces causes me to have to choose between different clouds if I want to use sdk than fixing specific updates for an interface16:50
fungiso just a simple case of missing a backward-compatible alias for that attribute?16:50
clarkbthey are related problems though16:50
gtemafungi: that's possible, but not necessarily16:51
clarkbalso this isn't a theoretical concern. To talk to rackspace volume api we need cinder v1 which isn't in openstacksdk anymore aiui16:53
clarkbthis means that any tools we build to do volume management (like our cloud launcher script) will need to run with different versions of sdk potentially16:54
clarkband either have different implementations or some sort of shade like layer to accommodate differences16:54
gtemavolumev1 was actually never in sdk16:54
clarkbhrm are you sure? our cloud launcher uses sdk and definitely makes volumes against rax16:55
gtemaI never seen it16:55
clarkboh it might just rely on the nova api for boot from volume though16:55
clarkbsince nova can request a volume and delete it when the instance is deleted16:56
gtemaand the issue on that side is that we are not even able to test things cause services dropped them completely16:56
gtemanova support for networking is still in16:56
fungiyes, the bigger pain is getting the openstack cli to work with volume subcommands for rackspace's cinder v1 api16:56
fungii've resorted to having multiple versions of the cli installed for talking to different clouds16:57
gtemaso, max_total_instances: in reality resource has been changed (to accommodate services evolutions) - it now has absoluteLimits and rateLimits16:57
fungiwith no backward-compatibility alias for the old attribute, i guess16:58
gtemaI can make a "special" conversion for nodepool case and mark them "red" in the code to keep them working16:58
fungithat probably should have followed deprecation guidelines if it was intentional16:58
funginodepool may not be the only consumer of that attribute16:59
gtemafungi - deprecation once we have never reached major release is also something complex16:59
gtemathe work is exactly to finally normalize interfaces and make r116:59
gtemanodepool is easy to fix, I have more worries on zuul (ansible) side17:00
fungiansible... testing side? zuul doesn't use openstacksdk17:01
gtemamodules were written not really correctly (or basically ansible makes some assumptions on objects) - it tries to modify them17:01
gtemaand we may have attributes that are not allowed to be modified (read only)17:01
gtemafungi - yes, ansible on the jobs side17:02
fungigot it17:02
clarkbfwiw I don't think many jobs use ansible + openstacksdk without nested ansible17:02
clarkbbut I could be wrong about that17:02
clarkbIts definitely a concern for ansible + sdk wherever that may be though17:03
gtemanot in opendev afaik, but users might do this17:03
fungigtema: anyway, i'm happy to propose a patch to nodepool for the new openstacksdk interfaces if you have a list of which old attributes were removed/replaced17:03
fungiin the meantime we likely do need to pin openstacksdk to the previous release17:03
gtemafungi - I have EOB here, will doublecheck over night/early tomorrow17:04
clarkbwell I think those jobs intentionally deploy sdk from source17:04
clarkbso pinning might be the wrong choice. Not sure17:04
fungioh, got it if that's not yet in 0.61.0 then we have some time17:04
gtemathat's in master currently17:05
fungiwe can probably make nodepool flexibly use the old and new versions with a try/except or something so that it supports multiple sdk versions17:05
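The try/except idea above could look roughly like the following. This is a minimal sketch, not nodepool's actual code: `read_max_instances` is a hypothetical helper, plain stand-in objects replace real SDK results, and the nested `absolute.instances` path is an assumed placeholder for wherever the newer release keeps the value.

```python
from types import SimpleNamespace


def read_max_instances(limits):
    """Return the instance quota from either limits layout (hypothetical helper)."""
    try:
        # Older layout: the value is a flat attribute on the limits object.
        return limits.max_total_instances
    except AttributeError:
        # Newer layout (assumed placeholder path): nested under absolute limits.
        return limits.absolute.instances


# Stand-in objects that mimic the two shapes; these are not real SDK objects.
old_style = SimpleNamespace(max_total_instances=10)
new_style = SimpleNamespace(absolute=SimpleNamespace(instances=20))

print(read_max_instances(old_style))
print(read_max_instances(new_style))
```

Trying the old attribute first keeps existing deployments on the previous release working unchanged, and the fallback only runs when that attribute is missing.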
gtemaand I wanted to have some time exactly to catch issues like that before making rc17:05
clarkbright the jobs are testing sdk from source so that we find these issues early and call them out17:05
clarkbsince the idea is/was nodepool is the canary for shade17:06
gtemasadly some time ago we disabled nodepool jobs cause they were permanently failing and therefore this was not noticed on our side17:06
clarkbgtema: they run reliably for nodepool and dib17:06
clarkband I think glean17:06
fungibut yes, if you want users to have fewer surprises, an audit of the interfaces which have gone missing since the last release and then adding compat aliases for them for at least the next cycle while calling them out as deprecated would be great17:06
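A compat alias of the kind described here is often written as a property that warns and then forwards to the new location. The sketch below is illustrative only: the `Limits` class, its `absolute` dict, and the `instances` key are hypothetical stand-ins, not the SDK's real resource classes.

```python
import warnings


class Limits:
    """Hypothetical resource whose data moved under `absolute`."""

    def __init__(self, absolute):
        self.absolute = absolute  # new home of the values (assumed shape)

    @property
    def max_total_instances(self):
        # Old attribute name kept as an alias for at least one more cycle.
        warnings.warn(
            "max_total_instances is deprecated; read it from the absolute limits",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.absolute["instances"]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = Limits({"instances": 5}).max_total_instances

print(value, [w.category.__name__ for w in caught])
```

Existing callers keep working, and anyone running with warnings enabled learns which attribute to migrate before the alias is dropped.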
gtemanow yes, but 5-6 month ago they were not17:06
gtemaas I said, I will think and maybe return this limits stuff (in the nodepool usage way, which was itself really specific). Or I propose fix into nodepool to address new structure17:08
clarkbI think we should consider an sdk pin to the previous version now since we know some things are going to change. Then going forward try and keep up with the updates17:09
gtemayes, this sounds great. Thanks17:09
gtemaone more time - please do not think I like to break things. I only to try to make this (in the meanwhile) beast maintainable17:10
clarkbgtema: yup I understand. Its mostly that we've spent years with openstack repeatedly breaking users on these api interfaces and wondering why we get frustrated17:11
gtemaI know, and this is also one of the things that I want to make sure through the single interface will be more reliable17:12
clarkbI think it would be good to think about who the end user is for the sdk in particular when making changes and designing things. Another complaint I've long had with many of the library tools (python-*client, sdk etc) is they use a form of method lookup that makes it almost impossible to know what a function I call in my code ends up doing. Maybe this refactoring will help that as it17:12
clarkbwill be more consistent17:12
clarkbI find it really frustrating when an SDK for end users isn't greppable for the function names they call17:12
clarkbI think that should be a requirement for any end user facing tool17:12
fungiat least open source ones ;)17:12
clarkbif I call getAttribute() and 'def getAttribute' doesn't existing in the code base I have a sad17:13
gtemacorrect - that is exactly what I try to address: make sure nodepool/ansible/osc/etc are all landing in the same piece of the code17:13
clarkbI think it is fine for code bases that are largely internal facing to have their fancy magic, but anything that random people on the Internet (me!) are expected to look at, use and occasionally debug should be greppable17:14
gtemathat's why cloud layer (a.k.a. old shade) is not maintainable anymore - it gives you 200-300 functions back filling whole screen17:15
gtemathis isn't working anymore17:15
clarkbgtema: well those functions are all greppable though right? it is the proxy layer that does magic based on the name of the api you are interfaces with and it composes function names dynamically before invoking them? I admit I get lost in all that and it may be a different layer that does that17:17
gtemathat is what happens mostly in proxy layer17:17
opendevreviewMarcin Juszkiewicz proposed opendev/system-config master: reprepro: mirror Ubuntu UCA Yoga
clarkbI've always felt that was a bug as it meant people using the SDK had to become experts in how the SDK is built in order to understand the few methods that they are using17:18
gtemathis gives you a lot back17:19
gtemaand `dir(conn.compute)` now gives you only compute methods back17:19
gtemathe issue was that those (i.e. create_server) were/are not doing same17:19
clarkbgtema: no but if I grep create_server I get the function definition17:20
clarkband I think that is a basic fundamental requirement of any external user facing interface17:20
gtemaso depending on whether you call `conn.create_server` or `conn.compute.create_server` you are doing different things17:20
clarkbIt sounds like these improvements won't help with grep but basic lookup of functionality will at least be consistent so once you learn how to do that it becomes easier17:22
* gtema sent a code block:
gtemaand ```... (full message at
gtemaso yes - inspect now works17:22
clarkbright but I'm not going to know to use these tools if I've never written python before (and in many cases even if I have) and am just trying to write a tool to talk to my openstack cloud17:24
clarkbThat is the target audience here17:24
fungii write python like it's a shell script ;)17:24
gtemayeah, for those in sdk we have quite lot of api ref and some examples17:26
clarkbmight be good to add use of the repl + inspect to find method defs to the docs. I would find that useful for the next time i'm trying to hunt down some weird behavior17:28
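The repl + inspect approach can be sketched with the standard library alone. The example below uses `json.JSONDecoder` purely as a stand-in for an SDK object; `locate` is a hypothetical helper name, and the same calls should apply to any method whose Python source is available on disk.

```python
import inspect
import json


def locate(obj, name):
    """Return (source file, first line number) for the attribute `name` on `obj`."""
    func = getattr(obj, name)            # same lookup a caller's code would perform
    func = inspect.unwrap(func)          # follow decorator wrappers, if any
    path = inspect.getsourcefile(func)   # file that defines the function
    _, lineno = inspect.getsourcelines(func)
    return path, lineno


# Stand-in for something like a connection's proxy object.
path, lineno = locate(json.JSONDecoder(), "decode")
print(f"{path}:{lineno}")
```

This finds the definition even when grepping for the `def` fails, because it asks the running interpreter where the resolved callable actually lives.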
fungiso when you say you're getting rid of the cloud layer in the sdk, will the simple examples like still be as simple, or will doing simple things like authenticating and booting a server instance with external network access remain as simple as it has been?17:29
gtemawell, I have not got rid of it, but ensured that it will definitely land in proxy layer17:29
gtemathat is exactly touching the limits case17:30
clarkbya sounds like the cloud layer is being updated to call the proxy layer for all operations so that its behaviors match the proxy layer17:30
gtemawhich ensure that whichever changes nova will implement and new microversions added it will always work like "we expect"17:30
fungithe main things we wanted from shade were to hide differences between different openstack environments, and abstract away a lot of the complex orchestration around things like attaching networks or floating ips17:30
clarkbfungi: and image uploads17:31
clarkbfungi: I suspect the cloud layer can continue to do that but at the cost of type changes since the proxy layer is returning the results now17:31
gtemathis is still there, but to make it possible you sometimes need to finally sacrifice things that block you17:31
fungibe able to say "i have this local file i want to upload to $cloud, please do whatever is necessary to make it happen"17:31
fungiand let the sdk figure out the rest17:32
gtemacorrect, and that if you tell me something is wrong I can pin to the place in the code where this is definitely happening17:32
gtemacause this "and which interface have you used" question is making me sick17:32
fungiyeah, i can understand that it's hard to field bug reports if people don't show you exactly how they're calling the lib17:33
clarkbya so to summarize the goals of the cloud layer remain the same, but the underlying implementation for the cloud layer is being updated to use the proxy layer exclusively. This will make it more consistent with the rest of the SDK but will also result in type changes in places17:33
gtemaand with this we also guarantee that the response from different methods (proxy or cloud) is always of the same type17:34
gtemathis is exactly what previously was not the case17:34
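The layering summarized above can be pictured with toy classes. This is a sketch of the general delegation pattern only: `Connection`, `ComputeProxy`, and `Server` here are hypothetical stand-ins written for illustration and do not reproduce the SDK's real implementation.

```python
class Server:
    """Toy resource type returned by both entry points."""

    def __init__(self, name):
        self.name = name


class ComputeProxy:
    """Toy proxy layer: the single place where the API call is made."""

    def create_server(self, name):
        return Server(name)


class Connection:
    """Toy connection exposing a cloud-layer shortcut plus the proxy."""

    def __init__(self):
        self.compute = ComputeProxy()

    def create_server(self, name):
        # Cloud layer: extra orchestration (networks, floating IPs) would go
        # here, but the server itself is always created through the proxy.
        return self.compute.create_server(name)


conn = Connection()
a = conn.create_server("via-cloud-layer")
b = conn.compute.create_server("via-proxy")
print(type(a) is type(b))
```

Because the shortcut delegates, both call paths return the same type, which is the consistency being described.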
fungiso may require some very careful explaining in release nodes/docs, and maybe finally a major version increase ;)17:34
fungier, release notes17:34
gtemathat is exactly the target17:34
clarkbWe know this will break some people (since it already broke nodepool and is expected to break ansible). Ya I think the key then becomes communication not just in "this has changed" but how to fix/update the usage that has broken17:34
gtemaright - is all on my desk17:35
clarkbAnd if users run into problems where old sdk is needed for one cloud and new sdk is needed for another cloud. We're punting on that for now?17:35
fungifor nodepool specifically, we ought to be able to support old and new sdk at least for some brief period to give people a chance to upgrade17:35
clarkbThis is still my biggest concern just with the problems we've had with things like cinder v1 and rackspace. But maybe there isn't much to be done with that17:35
fungibut yes, hopefully we can still find a way to be able to use the new sdk with old public cloud deployments17:36
gtemawrt "transition" period it doesn't really feel possible with such huge change. because of that I plan exactly to cut major release to be able to have those "breaking" changes17:36
gtemaif required we can always make some interim release (from before the r1 merge state)17:37
fungiopenstack doesn't have much control over when major cloud providers do their upgrades, if ever. we can declare bankruptcy on some of them, and require consumers of the software to implement their own support for old apis, but that's also going to alienate users of some popular hosting providers and maybe the providers themselves17:37
clarkbfungi: ya thats the tension here and there aren't many good answers17:38
clarkbbasically someone has to take the pain (user, sdk devs, or cloud)17:38
gtemaso far I haven't seen any good answer17:38
gtemabut again - sdkr1 will still work on older clouds, but it breaks users in few cornercases17:39
*** jpena is now known as jpena|off17:44
*** sshnaidm is now known as sshnaidm|afk17:49
*** amoralej is now known as amoralej|off17:53
corvusi'm going to perform a rolling restart of zuul except the executors20:54
corvusstopping zuul0120:55
corvusrestarting it20:57
*** dviroel is now known as dviroel|afk21:06
corvus01 is up, restarting 02 now.  there will be a web outage.21:11
corvusthat's up.  i'm going to restart the mergers now21:26
jrosseri think that may have caused this
jrosserit likely doesn't matter but just an fyi21:27
corvusyep.  that can be made more robust by load-balancing to multiple zuul-web processes21:27
jrosseragain fyi but this is an odd failure probably related to the restart
jrosserit suggests a rebase but as far as i can see the parent is already the most recent patch21:34
corvuscan happen with a merger restart; 'recheck' to double check21:36
ianwon the gerrit avatars ( i somewhat agree with frickler on the concerns with gravatar23:07
ianwi found interesting; and i think hashar_ and co. went through just about everything i was thinking23:07
ianw(have a job where people upload their own avatars to a repo, use a proxy, etc.)23:08
ianwso far it actually seems to me that the grey circle should be disabled if you don't have a avatar plugin, so i'm not sure what's going on yet23:09
hashar_ianw: o/23:09
hashar_oh yeah that was a bit of an epic discussion23:10
hashar_the gravatar based backend is all fine and work out of the box23:10
hashar_the problem Wikimedia has is that we have a very tight privacy policy and we don't want to leak any browsing activities to third parties23:10
hashar_well it is not a problem, more of a constraint23:10
hashar_so we can't use the gravatar backend cause that means leaking info to the company running it. Though we could have set up a local proxy or implement our own backend serving avatars people fill in their profile on our Phabricator23:11
hashar_then we dont have the same account on Gerrit and Phabricator23:12
hashar_so essentially after a few rounds of discussion we went with "sure it is a great idea, but that requires significant amount of work here and there for something that is not really critical to our mission" and it got declined23:13
clarkbI personally prefer not to have avatars at all...23:13
hashar_that is the summary on the top of my head23:13
clarkbianw: ya if we can just disable it that might be best23:13
hashar_Google / Gerrit consider folks have avatars so there might be ui glitch when your instance do not have avatars enabled23:13
hashar_I had a bug filed cause the name was touching the left side of the gray ship with 0px padding which was a bit annoying23:14
ianwyeah, i guess that comes from the google account23:14
hashar_probably yes23:14
hashar_the good news is that Gerrit is properly maintained at Google, they have a dedicated UX afaik and more than a few UI developers23:14
hashar_and I think they genuinely try to be as reactive as possible regarding non google request. I had really positive experience since I took over maintenance of our gerrit23:15
hashar_disabling avatar might be made a per user option23:16
clarkbyes, as I've gotten more involved I've been happy with working with upstream. Still trying to learn their expectations and processes though23:16
hashar_and if you have a source of avatars on your infra, I imagine it is not the end of the world to write the java adaptor in the plugin23:16
ianwthey do merge in a weird way :)  but yeah, everyone has been helpful23:16
hashar_the "repo-discuss" list is the way to go it is quite great23:16
ianwyeah, the source of avatars is the issue.  we could have people upload to a specific repository (i noted that was an option wikimedia explored :)23:17
hashar_and a side channel on Slack  (might require some invite, I can't remember how I joined it )23:17
clarkbI had to ask Luca but he got me on there23:17
ianwbut to not leak a list of usernames i guess we would have to have people hash their image, and then write something for the external-avatars plugin to hash the username in the url23:17
ianwand also, we have to have someone to maintain that repo23:18
ianwi think i'll try and get 3.5 into our infra, and we can look at the screenshots and see if the problem still occurs23:19
ianwit does seem like it should be checking, e.g.
hashar_feel free to reach out to tgr who filed the task ( ).  He is on and might have some insights23:20
hashar_I am not sure in which timezone he is though23:20
*** hashar_ is now known as hashar23:21
hasharI am off it is past midnight here.  Happy hacking!23:21
*** hashar is now known as Guest70523:21
Guest705clarkb: congratulations on the Gerrit 3.4 upgrade!23:22
clarkbthanks! ianw did a lot of the work23:23
*** Guest705 is now known as hashar23:24
hasharwell congratulations ianw for the Gerrit 3.4 upgrade23:25
hasharI am still working on it and fixing some javascript here and there23:25
hasharanyway, it is past midnight I said. So bed time!23:26
fungii agree that leaking browsing activity and content to a third party like gravatar is not great for a privacy-conscious deployment like we strive to maintain23:31
fungii'm not sure if we'd need to hash the lookups since gerrit already puts e-mail addresses for everyone in tooltips anyway, but it might be nice if we ended up using it for other services23:32
clarkbfwiw I think gerrit has had a lookup for avatars for a long time it just 404s because our server has never been set up to return them. I think the difference now is you get the grey orb23:32
clarkbrather than just not rendering at all23:33
ianwi feel like that only happens though if you have the avatars plugin installed, which is either gravatar or the "lookup a url" external one23:37
clarkbya I don't have any 404s in my request log. Old gerrit would 40423:37
clarkbI guess the grey orb is expecting a request and a response and then gets filled over like a template?23:38
clarkbso ya a UI bug then23:38
*** rlandy|ruck is now known as rlandy|out23:41
clarkbthere is a hideAvatar flag in the hard to read js for the names23:43
ianwyeah, i think that comes from 23:44
ianw  const hasAvatars = !!_config?.plugin?.has_avatars;23:44
ianw    this.cancelLeftPadding = !this.hideAvatar && !hasAttention && hasAvatars;23:44
ianw?. is a new one for my javascript box23:45
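For readers who have not met `?.` either: it is JavaScript's optional chaining operator, which yields undefined instead of raising when something earlier in the chain is null or undefined, and `!!` then coerces the result to a boolean. A rough Python rendering of the two quoted lines, assuming the config arrives as nested dicts and using hypothetical function names:

```python
def has_avatars(config):
    """Rough equivalent of `!!_config?.plugin?.has_avatars`."""
    plugin = (config or {}).get("plugin") or {}  # tolerate a missing config or plugin
    return bool(plugin.get("has_avatars"))       # a missing key counts as False


def cancel_left_padding(config, hide_avatar=False, has_attention=False):
    """Rough equivalent of the `cancelLeftPadding` expression."""
    return (not hide_avatar) and (not has_attention) and has_avatars(config)


print(has_avatars(None))
print(has_avatars({"plugin": {}}))
print(cancel_left_padding({"plugin": {"has_avatars": True}}))
```

With no `has_avatars` key in the config this evaluates to false, so the padding should only be cancelled when the server reports an avatar plugin.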
clarkbit does default to False23:45
ianwyeah, which suggests to me config.plugin.has_avatars is somehow true ... but when i look at23:46
ianw(can't find the link, you can get a config json)23:48
ianw curl
clarkbya that doesn't have it and if not set it is false according to the docs23:50
clarkbI suspect that this is a bug in handling that somewhere then23:50
