Wednesday, 2022-11-23

*** ministry is now known as __ministry03:31
*** yadnesh|away is now known as yadnesh04:03
*** akekane is now known as abhishekk05:16
*** jpena|off is now known as jpena08:29
*** yadnesh is now known as yadnesh|afk08:47
opendevreviewMerged openstack/project-config master: Add Allow-Post-Review flag to OpenStackSDK project  https://review.opendev.org/c/openstack/project-config/+/859976 09:04
opendevreviewMerged openstack/project-config master: Add post-review pipeline  https://review.opendev.org/c/openstack/project-config/+/859977 09:07
*** yadnesh|afk is now known as yadnesh09:22
Tenguhello there! We're seeing this task taking a "nice" amount of time in tripleo jobs: https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/roles/configure-swap/tasks/ephemeral.yaml#L108 09:43
TenguI'm wondering about the reasons of its presence: what's already in /opt at this point, and isn't there any way to get the volume mounted there before writing anything?09:44
opendevreviewCedric Jeanneret proposed openstack/openstack-zuul-jobs master: Add some output to the `find' command  https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/865383 10:03
Tengu-^ this should help understand the actual thing.10:04
*** rlandy|out is now known as rlandy|rover11:10
*** dviroel|afk is now known as dviroel11:15
*** soniya29 is now known as soniya29|afk11:23
fungiTengu: it's because some of our donor clouds give us too small of a rootfs (20gb), but provide an unformatted "ephemeral disk" we can format and mount. in those providers we use part of the ephemeral disk for swap (so as not to take up more space on the rootfs with a swapfile) and mount the rest at /opt but need to shuffle the git cache out of the way and then put it back since it's in12:34
fungi/opt on our images12:34
Tengufungi: ah, so the /opt is indeed pre-provisioned...12:34
fungii agree we ought to look at possible ways to speed those tasks up12:35
Tengufungi: are the git repositories there really that useful? I mean, does it represent a real gain of time compared to having to move them?12:35
fungiit can take several minutes to clone nova over the network, for example, and puts a significant strain on our git servers if every build does it (we've seen it result in an instant self-imposed denial of service against opendev.org in the past when we've accidentally uploaded images without a cache there)12:38
Tenguhmm ok. but it also takes a long time to then move the content :/12:38
Tengufungi: /opt is replaced in order to free space for /home/zuul, I guess?12:39
fungiyes, and also jobs like devstack/tempest do most of their work in /opt in order to be assured of enough available space12:40
fungithe git refs in /home/zuul are populated from the cache in /opt too12:40
fungiso that the executor only needs to push refs which aren't already in the node's cache12:41
Tenguhmm ok. wouldn't it be possible to connect as root first, check for ephemeral, switch the /home directory (and, why not, create symlinks in /opt for tempest/others), ensure zuul's home is present, and then only run as zuul ?12:41
fungiwell, there's still a need to mount the extra space on /opt unless the job is really going to be able to operate within a 20gb rootfs on some providers and with no swap12:42
Tenguhmm. can't we "just" remove the /opt/git (and, if anything is left in /opt, then only move the content to the new partition)?12:44
fungiit's possible we can shuffle some of this around, though we've set expectations for projects that 1. if you need extra space do your work in /opt and 2. there's a cache of all git repos in /opt12:44
TenguI think "rm" would be faster than "mv" in such case.12:44
Tengui.e. once the cache is used, of course.12:44
fungii'm not sure there's necessarily a "once the cache is used" unless that's "once the job is over"12:45
fungiat least devstack and grenade used to rely on the cache in /opt, but it's possible they no longer do12:46
Tenguoh, so there are other things than https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/prepare-workspace-git/tasks/main.yaml hitting on that cache?12:46
fungithere was at one time, i don't know if there still is so that's what we'd need to find out12:46
Tenguhmm ok. *maybe* we can do a middle-step, by passing a parameter that switches the "mv" to "rm/cleaning"? That way, we may be able to test on some jobs?12:47
fungione example that springs to mind is the tag-releases job, which is triggered by changes merging to the openstack/releases repo but could create and push tags for any project in openstack so can't really add every single project as a required-projects entry for the job12:48
Tengufair enough. So maybe a parameter would help.12:48
TenguI can propose something12:48
Tengufor instance, within tripleo, we have a tripleo-ci repository, and that one is calling the configure-swap role. We could pass some params at this point, allowing to just empty /opt/git instead of moving everything?12:49
fungiobviously we'll want input from some other folks as well. i won't pretend to have the best grasp of what might break (which may not become apparent within the job if it's side effects like overloading our git servers at peak activity), and also i just woke up so still a little groggy12:50
Tengu:))12:50
* Tengu pours some fresh coffee12:50
fungii have a bad habit of rolling over in bed and instantly opening my computer12:51
TenguI have a dog in order to avoid that: first take the dog out, then coffee, and then only laptop :)12:51
fungithe cats got fed right away, since i didn't want them realizing i'm also made of meat12:54
Tengu:D12:54
Tenguthough they don't have any thumb - meaning they may get some issues if the Might Thumbs Owner isn't there anymore12:55
Tengu*Mighty12:55
Tengu(yeah - also have a cat, I know how it is with them ;))12:55
Tenguanyway - have to jump on a meeting, but that's an interesting discussion/topic: optimizing things12:56
Tengufungi: and, once you get some coffee and all, I'll also have a question about unbound, networkmanager and, well, dnsmasq :).12:56
fungiask your dns/network question at your convenience13:00
Tengufungi: (still in a call, but...) so, I see unbound gets configured, but actually not really used - that is, NetworkManager doesn't know about it, and usually this means we'll get the network nameservers (provided by dhclient) instead of the 127.0.0.1 in the /etc/resolv.conf. that's a first thing.13:14
Tengunow, after some searches, it seems NetworkManager support for unbound is ending. Would it make sense to install dnsmasq instead, and properly configure it so that NetworkManager actually uses it for dns caching?13:14
Tenguand, if not, why isn't NetworkManager properly configured to not override the /etc/resolv.conf?13:15
fungiTengu: i'm not sure what direct support nm needs for unbound, it just needs to know to query 127.0.0.1 as its dns server. i guess the question is how do we correctly configure nm to not overwrite the resolver hint when you're restarting it. ianw did a lot of work on nm integration/configuration for recent fedora and centos so probably has a better handle on what's going on there13:16
Tenguwell, I actually know what to do in nm in order to make it use dnsmasq/unbound.13:17
fungiunbound is really used at the beginning of your jobs, but at some point over the course of the tripleo jobs something triggers nm to overwrite the configuration and after that point you're no longer resolving through unbound's cache13:17
Tengubasically, you can configure main.dns value in its config file and it will do the magic - for instance, if we set the value to "dnsmasq", it will start dnsmasq with the nameservers provided by the DNS, plus additional config we may push in /etc/NetworkManager/dnsmasq.d/13:18
Tenguunbound also used to have the same support, but it was removed lately.13:18
Tengu(sorry, have to focus on the call - back in a few)13:18
fungiand yeah, we really don't want the nameservers provided by dhcp to ever be used from test nodes, even as forwarded resolvers from the local caching resolver13:19
Tengueven as forwarder? why so?13:19
fungiso nm "integration" to provide that would be a problem13:19
fungiit's a long story, but basically the resolvers our cloud donors provide are often broken13:19
Tenguerf...13:20
Tenguthat's why it's using google/cloudflare by default then. OK.13:20
Tengurlandy|rover: -^^  guess we'll just go with Sandeep work on ensuring it's using unbound as resolver, without the networkmanager integration, and using the public forwarders we currently have.13:20
Tengufungi: may I then propose a patch against the configure-unbound in order to configure NetworkManager to NOT override the /etc/resolv.conf?13:21
Tenguso that unbound will be used all the way..13:22
fungia good example is rackspace. they've had problems with abusers/compromised machines flooding their resolvers in attempts to use them in ddos attacks, so they have some sort of security device which adds rules blocking client ip addresses which appear to be a problem. but that doesn't take into account that the blocked addresses might get recycled to another tenant's vm and so we13:22
fungiconstantly ended up with test nodes which couldn't resolve anything because their addresses had been previously blocked from reaching the resolvers13:22
Tenguah, yeah, security at its finest :)13:22
rlandy|roverTengu: ack- that is the current plan13:23
fungianyway, yes a patch to configure unbound not to accidentally undo our configuration would be useful13:23
Tengufungi: on it - I'll push it today13:23
TenguI can make it optional.13:23
Tengu(default true, but if we don't want to touch networkmanager config, we can set it to false)13:24
fungimy best guess is that https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/nodepool-base/finalise.d/89-boot-settings is where we'd want to do it13:26
Tengufungi: not here? https://opendev.org/opendev/base-jobs/src/branch/master/roles/configure-unbound/tasks 13:28
Tengu(using ansible would be easier, since it's an ini file, and there's a module for that)13:30
fungii guess there's the question of how early would be best to apply the additional configuration13:31
Tenguapparently the configure-unbound is called anyway during the job13:32
Tengubut yeah, seeing the /etc/resolv.conf is overridden at boot, it's probably better to edit that 89-boot-settings.13:33
fungidiskimage-builder sets up some of this when creating the node images, then glean does boot-time configuration based on the detected network info from configdrive/dhcp, then ansible configures things at job runtime13:33
Tenguyeah, DIB modification is probably safer.13:33
Tengufine. I'll propose a patch shortly.13:33
Tengu(after my current call)13:34
fungithere's no rush, things are slow this week and a lot of us aren't around the computer much (and the rest of us are juggling a lot of stuff as always)13:34
Tengu:)13:35
Tengufungi: I'll also propose something against the ansible role in order to be consistent. And this would allow tripleo to depends-on the change for some more testing.13:36
fungiTengu: depends-on to opendev/base-jobs won't work since it's a trusted repo where speculative execution isn't performed13:40
fungito properly exercise that we merge changes to a copy of the role and modify the base-test job to use it instead of the real one, then you can have a proposed change in an untrusted repo which is parented to base-test explicitly (instead of implicitly to base)13:41
Tenguah, right, we could test with another repo though. There are so many of them involved :/13:42
Tenguopenstack-zuul-jobs isn't trusted - I'm mixing things.13:42
Tengu(that was for the /opt thingy)13:42
fungicorrect, ozj is an untrusted repo, so you can depends-on to changes in review for it no problem13:43
Tenguyeah - I wanted to get some insight https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/865383 13:43
Tengumaybe it's still worth to get in - up to you folks.13:43
fungizuul's security model prohibits speculatively executing proposed changes for config repos though, because that could be leveraged to expose or exfiltrate secrets and perform other sensitive actions13:44
Tenguof course13:44
Tengufungi: do we have crudini in DIB env?13:50
Tenguhumpf. apparently not.13:52
fungicrudini the magnificent, prestidigitation extraordinaire13:54
Tenguit's a good tool when it comes to ini_file things :)13:54
Tenguwould it be OK to get it in the nodepool-base?13:54
Tengudependency-wise, it's apparently relying on python3-iniparse13:54
Tenguwhich isn't heavy at this point..13:55
fungiwe could install it outside the chroot. the problem with installing things inside the image which have python library dependencies is that they're very hard to uninstall when they may end up conflicting with libs from pip later. putting crudini into a venv might be another solution since that would isolate it from the system python library path and avoid causing problems for jobs13:56
Tenguhmmmm so here I don't know how to do either of those two better solutions :/13:57
fungiit's one of the reasons we use glean instead of cloud-init, because the latter drags in python modules13:57
TenguI could of course override the whole file - but while I'm sure nothing will be lost on a standard fc-35 and cs-9, I don't know about other distros such as opensuse13:58
Tenguand doing multi-line regexp is a bad idea imho.13:59
fungibasically we just want to avoid polluting the resulting server image with unnecessary libs that end up conflicting with what jobs may want to install on those nodes later13:59
*** dasm|off is now known as dasm13:59
TenguI understand this - and am all for "keep it clean".13:59
fungitake a look at nodepool/elements/infra-package-needs/install.d/40-install-bindep for an example of the venv approach13:59
Tenguah14:00
fungithen anything needing to run bindep can just call /usr/bindep-env/bin/bindep directly or symlink to it from somewhere in the path like /usr/local/bin14:01
Tengusooooo.... am I to add such a file in the nodepool-base/install.d so that I can install crudini, and then call crudini from within the venv in the 89-boot-settings ?14:01
fungithat would be one option, yes. the other option would be to install crudini into the nodepool-builder container image and then use whatever the dib mechanism is to run that from outside the chroot instead of from inside14:02
Tenguah, there's already a 91-venv-os-testr in nodepool-base14:02
TenguI'm not fluent enough with dib for that. Since there's already a venv available in the nodepool-base, I can "just" install crudini in it..14:03
fungibut again, all this is on the very edge of my experience so when others are around they may have superior suggestions14:03
Tengusure14:03
TenguI'll propose something, so that it's not "just" theory14:04
Tenguguess we'll get more ppl next week, after thanksgiving.14:04
fungilikely, or perhaps later today when other parts of the world wake up14:04
fungiianw is probably our foremost expert on this particular topic, but he's on apac time14:05
Tenguah, so too late14:05
fungior too early, depending on how you look at a globe14:06
Tengu:) I'll try to catch him tomorrow during my morning (EMEA)14:08
Tengushould be fine.14:08
opendevreviewCedric Jeanneret proposed openstack/project-config master: Ensure NetworkManager doesn't override /etc/resolv.conf  https://review.opendev.org/c/openstack/project-config/+/865433 14:12
vishalmanchandaclarkb: fungi : hello, a query regarding bug https://bugs.launchpad.net/horizon/+bug/1996638 14:15
vishalmanchandaclarkb: fungi : here is link of logs what we discussed in the past https://etherpad.opendev.org/p/migrate-to-jammy#L33 14:15
vishalmanchandaright now after migration to ubuntu-jammy, more horizon jobs like horizon-selenium-headless, horizon-integration-tests started failing.14:17
fungivishalmanchanda: this is the issue with using snap-installed browsers in a headless xvfb right?14:18
vishalmanchandaIn the latest P.S. you can find error logs for both of these job https://review.opendev.org/c/openstack/horizon/+/861140 14:18
vishalmanchandafungi: yes.14:18
vishalmanchandaI am getting the same error in my local env. deployed on ubuntu-focal.14:18
fungiwere you able to see if coreycb or tinwood had ideas? this may get deep into ubuntu dbus/login design choices14:19
vishalmanchandafungi: not yet.14:19
vishalmanchandafungi: but I found one fix which works in my local env.14:19
vishalmanchandafungi: now I want to know how we can apply the same in our openstack CI job.14:20
fungian alternative might be to run the jobs on debian, where chromium and firefox are available as normal deb packages14:20
fungiwhat's the workaround?14:20
vishalmanchandafungi: For e.g. If install firefox following all these steps mentioned here https://www.omgubuntu.co.uk/2022/04/how-to-install-firefox-deb-apt-ubuntu-22-04#:~:text=Installing%20Firefox%20via%20Apt%20(Not%20Snap)&text=You%20fist%20add%20the%20Mozilla,reinstalled%20at%20a%20later%20date. , then  2 jobs works fine 14:21
fungiahh, okay, so similar to running the test on debian instead of ubuntu14:24
vishalmanchandaInstall Firefox as a .Deb on Ubuntu 22.04 (Not a Snap)14:25
fungibut in this case getting a deb of the browser from outside ubuntu's archive rather than switching to a distro which provides something along those lines14:25
fungiyeah, that's fairly straightforward. i should be able to find you an example, i think your jobs already do something similar to install newer nodejs packages14:25
vishalmanchandaok14:27
fungivishalmanchanda: i think this is the basic template for it... https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-nodejs/tasks/main.yaml 14:28
fungioh, though maybe easier since this is in a ppa on lp and ubuntu already provides some slightly nicer integration for those14:30
vishalmanchandafungi: ok, thanks for the reference, I will try to do the same thing to install Firefox as a .deb package14:30
fungivishalmanchanda: here's an example of consuming packages from a ppa in a role: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-skopeo/tasks/Ubuntu.yaml 14:31
vishalmanchandafungi: So I should update this task https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/nodejs-test-dependencies/tasks/main.yaml#L6 ?14:33
vishalmanchandabecause this task is responsible for installing firefox and use snap by default to install firefox14:34
fungivishalmanchanda: changing it in the zuul-jobs repo is probably a bigger question, since you're wanting to apply an ubuntu-specific workaround and that role is generic for a variety of different distros, so we'd need an ubuntu-specific variant of it similar to how the ensure-skopeo role does it (you'll see there's an Ubuntu-22.04.yaml in it which does just a normal package install of14:39
fungiskopeo vs the Ubuntu.yaml which installs skopeo from a ppa)14:39
fungiyou'll see https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-skopeo/tasks/main.yaml does include_tasks: "{{ zj_distro_os }}" in order to allow us to switch behavior based on the platform where it's run14:40
vishalmanchandafungi: Could you help me in writing tasks and roles to fix this issue if you have time, because I have very basic knowledge on ansible side.14:49
fungivishalmanchanda: this is a very bad week for me to commit to more work, i'm already drowning and at the moment i seem to be the only one around to answer questions14:50
vishalmanchandafungi: ok np.14:50
vishalmanchandafungi: let me just hit and try then.14:50
vishalmanchandafungi: anyway thanks for providing all these references, these are very helpful:)14:51
fungivishalmanchanda: happy to help, and i might be able to help more i just don't want to commit to anything at least until i get caught up on the pile of pending things in my inbox14:56
*** dviroel is now known as dviroel|lunch15:07
*** soniya29|afk is now known as soniya2915:48
*** akekane is now known as abhishekk16:09
*** dviroel_ is now known as dviroel16:15
*** yadnesh is now known as yadnesh|away16:41
clarkbTengu: fungi: while the /opt setup in that cloud does take some time I really feel like optimizing that is poor application of optimization. We've got jobs that spend more than half an hour curating logs for example. Some of that is ansible per task slowness, some of it is poor log layout for swift object storage, and some of it is just slow runtime. Tempest for example is16:59
clarkbinstalled three times in tempest jobs one of which deletes a previous install and reinstalls. Basically all that to say we have a lot of low hanging performance fruit and almost all of it is safer than changing base node expectations.16:59
Tenguclarkb: I'd be happy to help chasing those fruits and making some nice salad with them :)17:00
clarkbTengu: fungi: and yes to be extra extra clear unbound is used by our nodes when we hand them over to you. removing unbound would break things17:00
fungihowever, configuring nm to know not to blow away the resolver settings we've supplied would probably be good17:01
clarkbfungi: yup I agree however I'm confused as to why this is an issue if things boot just fine17:02
clarkbTengu: fungi: ^ I looked into this the other day quite a bit and everything I could see is that NM and unbound were working as expected and the tripleo jobs do something to break it. Can we just stop doing the something instead?17:02
clarkbor maybe the something should be smarter about what it does. But as far as I can tell there wasn't anything wrong with NM and unbound as configured at boot17:03
Tenguclarkb: I'd rather be smart with NetworkManager and make it understand "don't touch that file"17:03
Tengusince it's something we can do....17:03
clarkbTengu: right but why is it touching the file. I think that is my confusion17:03
fungithere may be a startup race, since we're setting the resolver in rc.local which probably runs after nm has settled17:03
Tengubasically, nm will touch it whenever there's a reload (nmcli reload, systemctl restart networkmanager), lease refresh and the like17:04
clarkbhuh my local NM doesn't do that (in fact it's a problem on my laptop because I can't get it to reset things after tethering with my phone)17:04
fungiit might also be specific to centos stream 917:05
clarkbspecifically if I usb tether and then stop NM refuses to update networking settings when switching back to wifi or similar. Its really annoying17:05
Tenguclarkb: check the NetworkManager config to see what's going on in there. Also, the default behavior of that service changes depending on the release, availability of some tools and so on17:05
Tenguin any cases - being specific and precise in the config may (and will) avoid headaches.17:06
clarkback so this may be specific to the installation and config. In that case having NM ignore /etc/resolv.conf does make sense. Can we be sure to document why we're making that change pretty clearly in the commit/comments17:06
clarkbI'm mostly just surprised because NM refuses to act that way on my laptop where it would actually be helpful :)17:06
clarkbTengu: re crudini you don't need it in the resulting image so I'd install it and then remove it17:07
clarkbbut another option (and maybe preferable option) is to just use a small python script17:07
clarkbuse config parser, set the flag, and done17:07
Tenguclarkb: I can check for that tomorrow (getting late EMEA here). Care to comment the thing about crudini? Maybe installing it on the builder itself and calling it from outside the chroot is a thing? not really sure what's best...17:09
Tenguor I can, indeed, drop the venv.17:09
clarkbTengu: for low hanging performance fruit in tripleo specifically I would look at optimizing the log uploads. Tripleo log uploads are extremely slow and I think a couple of things can be done to improve them. The first is to ensure that we're avoiding ansible task loops with large numbers of inputs (each ansible task can take a second or more to bootstrap, process 600 log17:09
clarkbfiles and that is 10 minutes). Second is to remove files that never change (lots of /etc type content etc) and flatten dir structures as each dir entry is a swift object that needs to be created with an index file17:09
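
The "600 log files and that is 10 minutes" estimate above is simple per-task arithmetic. A minimal sketch, assuming a flat one-second bootstrap cost per looped task (the function name and the default are illustrative, not measured values):

```python
def loop_overhead_minutes(item_count, seconds_per_task=1.0):
    """Estimate minutes spent only on per-task bootstrap in an Ansible loop.

    Assumes every loop item pays the same fixed bootstrap cost; real
    overhead varies by Ansible version and connection settings.
    """
    return item_count * seconds_per_task / 60.0


# 600 looped items at roughly one second each, as in the estimate above.
print(loop_overhead_minutes(600))  # 10.0
```
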
Tengu(also, sorry, I'm working on another issue in //...)17:09
clarkbTengu: I would not install it in the builder image. Personally I would just make a tiny python script that uses config parser to set whatever ini values you need.17:09
Tenguclarkb: like, calling "python 'import config_parser ....'" ?17:10
clarkbbut to be clear deleting things from /opt is the very last optimization I would look at once everything else that is easy has been done17:10
Tenguor a plain file17:10
clarkbTengu: either way. You can have a script file or inline it. Probably depends on how complicated and large the python needs to be17:11
Tenguclarkb: "re: flatten dir structure" maybe replacing / by _ ? but that would make the whole thing terrible to browse.17:11
clarkbTengu: in tripleos case the logs often have like 10 levels or more with no file contents then only the leaf has any contents17:11
Tenguclarkb: basically, it's only one option to add in the [main] section.17:11
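
A minimal sketch of the "tiny python script that uses config parser" idea. The chat does not name the option value; this assumes `dns=none` in `[main]`, which NetworkManager.conf(5) documents as leaving /etc/resolv.conf alone. The function name and the file path passed to it are hypothetical.

```python
import configparser


def set_nm_dns_none(conf_path):
    """Set dns=none in the [main] section of a NetworkManager-style ini file.

    Assumption: dns=none is the desired value. Note that configparser does
    not preserve comments when the file is rewritten.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read(conf_path)  # a missing file simply yields an empty config
    if not parser.has_section("main"):
        parser.add_section("main")
    parser.set("main", "dns", "none")
    with open(conf_path, "w") as handle:
        parser.write(handle, space_around_delimiters=False)
```

It could be called with the main config file or a drop-in under conf.d; which of those the image build should touch is left open here.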
clarkbIn cases where you are deeply nested like that it seems rare that the full path provides any value17:12
* Tengu takes note for the log thing17:12
clarkbjust store the log file somewhere without the deep nesting17:12
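
One way to picture the flattening idea: count how many directory entries a set of log paths implies, then collapse everything below a kept prefix into the file name (the "/" to "_" variant mentioned above). This is only a sketch; the helper names, the `keep` parameter, and the sample paths are hypothetical, not taken from the tripleo log collection code.

```python
from pathlib import PurePosixPath


def directory_count(paths):
    """Count the distinct directories implied by relative file paths."""
    dirs = set()
    for path in paths:
        for parent in PurePosixPath(path).parents:
            if str(parent) != ".":  # skip the implicit top-level "."
                dirs.add(parent)
    return len(dirs)


def flatten(path, keep=1):
    """Keep the first `keep` directories; join the rest into one file name."""
    parts = PurePosixPath(path).parts
    if len(parts) <= keep + 1:
        return path  # already shallow enough
    head, tail = parts[:keep], parts[keep:]
    return str(PurePosixPath(*head, "_".join(tail)))


logs = [
    "undercloud/var/log/containers/nova/nova-api.log",
    "undercloud/var/log/containers/nova/nova-scheduler.log",
]
flat = [flatten(p) for p in logs]
print(directory_count(logs), directory_count(flat))  # 5 1
```
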
Tengulogs are collected from within a dedicated ansible project, I can have a look at it.17:12
Tenguindeed, if we can make the log collection faster, that may already be a thing17:13
clarkbBut also (imo) tripleo collects a lot of useless log files17:13
clarkbfiles that never ever  change and can be collected only when specifically needed17:13
TenguI can more than probably make a review with the CI folks (ping rlandy|rover )17:14
clarkbhttps://zuul.opendev.org/t/openstack/build/91f108985637462c9ea5d868c77f9378/log/logs/undercloud/etc/rsyncd.conf is a good example17:14
clarkbbut there are many like that17:14
Tenguthing is, at some point, that one was actually used :).17:15
Tengubut yeah, that's for an old, deprecated/dead release iirc.17:15
clarkbsure, but it isn't useful today and log upload for tripleo jobs is slow17:15
Tengusooo. yeah. cleaning might help as well.17:15
clarkbI'm just saying deleting useless stuff first is what I would do before touching /opt which many jobs rely on and itself is an optimization tool17:15
clarkbyou can very easily make jobs run longer or start failing by editing /opt incorrectly17:15
clarkbnot collecting a log file that can be retrieved from the package associated with its installation hurts nothing17:16
Tengusure17:16
rlandy|roverTengu: pls add whatever needs to review to our review list and we'll take care of it17:17
Tengurlandy|rover: I'll create a jira card for that and come back with some more meat.17:17
clarkbon the ansible task cost side of things it would be great if we could convince ansible that per task costs are actually important, but unfortunately we've not been able to make headway there17:17
rlandy|roversure17:17
Tenguclarkb: same here... and it will get even worse with the AEE...17:18
clarkbanytime you have a loop with more than a handful of items you now need to consider writing a python module...17:18
Tengua former colleague counted over 10s to just bootstrap the env, before anything was actually started with the playbook and all.17:18
Tenguand with tripleo being composed of many playbook, you can see how bad it can be.17:19
clarkbI've written at least one module to address some problematic loops in base jobs. There are more all over our jobs though because as someone writing a job it isn't apparent that a loop is dangerous17:19
Tenguyeah, we also did some of that work here. Maybe the log collection may benefit of some python love as well.17:20
clarkb(also I think the task bootstrap time has steadily gotten worse over time so when some of these were written it was fine but now under ansible 5/6 its a huge problem)17:20
clarkbalso ansible 5 broke pipelining, but I haven't seen ansible 6 be particularly quick with pipelining fixed17:21
fungialso ansible-lint contributes to this problem by insisting that lots of things which could be done efficiently by shelling out to an external command should instead use existing ansible modules which may need to be executed in loops where the shell commands would have operated efficiently on multiple files in one execution17:22
clarkbfungi: exactly. Its a core part of the ansible language and one that is strongly encouraged by the ecosystem. There is no warning that says "you have more than 5 elements here consider a different tool"17:23
clarkbvishalmanchanda: fungi: I wonder if the easiest thing is to switch the job definition to debian and see if it just works?17:24
clarkbvishalmanchanda: fungi: considering the jobs install firefox and chromium from packages and then run tox I half expect that to just work17:25
clarkbbut then you don't need to sort out PPAs and special packages.17:25
vishalmanchandaclarkb: yeah we can try that, how can  I do that?17:26
clarkbvishalmanchanda: in the jobs you want to run with selenium change nodeset: ubuntu-jammy to nodeset: debian-bullseye17:27
vishalmanchandaclarkb: ok let me quickly try that.17:28
*** jpena is now known as jpena|off17:30
clarkbTengu: just talking out loud here for other optimization ideas: One along the lines of /opt optimizing is looking for redundant actions (like installing tempest multiple times or editing /etc/ssh/known_hosts and ~zuul/.ssh/known_hosts with the same content). I know a number of projects continue to install python2 related stuff even though they no longer support python217:34
clarkbanymore (this happens via bindep entries but could happen in other ways). In Zuul's testsuite the performance difference between python 3.8 and 3.10 was dramatic, using newer python interpreters can have a pretty big impact. In the past its been common to trace job slowness and timeouts back to heavy use of swap. Double checking jobs are avoiding swap might be helpful. As17:35
clarkbwould improving known memory hogs (cinder-backup, privsep, etc). Apparently running python in containers can slow things down due to seccomp :( it may not be appropriate to avoid that though. There are so many things that we could do better if we had a concerted effort around performance, but it seems the desire just often isn't there and I end up poking at things that are17:35
clarkbnon controversial and relatively easy to try and help.17:35
clarkboh! I can't forget replacing osc with dedicated scripts that can reuse tokens (though apparently we can configure osc to do this too anyone know if progress has been made on that?) and avoid python startup time finding entrypoints (which is costly)17:41
Tenguclarkb: I'll do some listing of the potential things we can do within tripleo. but first, I want that unbound vs networkmanager situation solved for good17:46
clarkb++ I don't think any of this is urgent, but it is something that I periodically look into and so have a bunch of random ideas for things that can help.17:48
Tengu:=17:48
Tengu:)17:48
Tengufor now - EOD is hitting hard, wife will be angry if I extend more..17:49
Tenguclarkb, fungi thanks for the pointers and help!17:49
vishalmanchandaclarkb: here is patch https://review.opendev.org/c/openstack/horizon/+/865453/2 changes nodeset from ubuntu-focal->debian-bullseye17:53
clarkbvishalmanchanda: cool lets see if that is any happier. Its possible there are debian vs ubuntu differences we need to handle, but in this case because the testing should be very self contained I expect it may just work17:54
vishalmanchandaclarkb: it's failing:(17:54
vishalmanchandahttps://zuul.openstack.org/status#865453 17:55
clarkblooks like the debian package for chromium is just chromium and not chromium-browser17:56
vishalmanchandaclarkb: hmm ok some are not available... 17:56
clarkband firefox is firefox-esr? I didn't expect those differences. But I guess if they have diverged on how to package them different names isn't surprising17:57
clarkbvishalmanchanda: I've pushed https://review.opendev.org/c/zuul/zuul-jobs/+/865459 and once testing for that change looks good we can update your change to depends on it18:15
vishalmanchandaclarkb: ok thanks18:15
fungiclarkb: there is a "firefox" package in debian too but not in stable because... stability18:30
fungiso firefox-esr tracks the firefox extended stable release versions instead of latest18:30
fungiand i guess the maintainers thought it best to provide those as different package names18:31
fungi(on unstable you can choose between esr or latest that way)18:31
clarkbgotcha18:32
fungimight have made more sense to make "firefox" be a virtual package which depends: firefox-latest|firefox-esr and then on stable you'd get whichever was available, but yeah18:32
clarkb++18:32
clarkbit looks like my zuul-jobs edit and vishalmanchanda's change to switch the jobs to bullseye is working for at least some of the jobs18:32
clarkbthe selenium-headless job is rerunning for some reason but nodejs16-run-test ran18:33
clarkbits a bindep issue in horizon, one sec18:33
fungineat18:34
clarkbinterestingly I thought bindep ran in the job that succeeded so not sure why it didn't break too.18:34
fungidifferent profile maybe?18:39
clarkbya there is a selenium profile in use18:40
clarkbchange updated to distinguish between the two firefox packages across ubuntu and debian18:40
clarkbI think there may be another issue because horizon-integration-tests is restarting. But the good news is the nodejs16-run-test job seems to show it talking to firefox successfully18:52
clarkblooks like the integration-tests job runs devstack? Which implies that devstack isn't working on bullseye? I thought I fixed that recently..18:56
clarkboh I see its a nodeset issue18:57
clarkbvishalmanchanda: it looks like there are errors finding the firefox binary now. hopefully those are addressable though. It does look like the nodejs suite has figured it out at least but not python19:15
clarkbvishalmanchanda: I've just pushed a new update that bumps the geckodriver version. One thing I notice is they have done work to improve snap support with newer geckodrivers. It is possible this may be part of the puzzle for ubuntu too?19:33
clarkbvishalmanchanda: see https://github.com/mozilla/geckodriver/releases/ 19:33
vishalmanchandaclarkb: yeah I did that too in my local env. to fix selenium-headless job19:36
vishalmanchandaclarkb: but I just tried with older version v0.27.0 and it also works if install firefox as deb. package on ubuntu-jammy.19:43
vishalmanchandaclarkb: I also updated geckodriver version here https://review.opendev.org/c/openstack/horizon/+/861140 but selenium-headless job still fails.19:44
fungiwell, yeah because that entirely sidesteps the snap situation19:44
fungii mean, the reason it works when installing an actual deb of the browser on ubuntu19:45
vishalmanchandaone question, I have in my mind.19:47
vishalmanchandaIs there any cons of running these jobs on debian-bullseye instead of ubuntu-jammy?19:48
fungivishalmanchanda: other than potential inconsistency with other jobs, not really. debian is one of the target distros for this cycle: https://governance.openstack.org/tc/reference/runtimes/2023.1.html 19:49
vishalmanchandafungi: ok19:50
clarkbright I strongly suspect that part of the solution here is a newer geckodriver. That may not be sufficient though20:01
clarkbvishalmanchanda: the selenium headless job passes now after updating geckodriver20:02
clarkbso ya that was a piece of the puzzle20:02
vishalmanchandaclarkb: cool, thanks for the help.20:03
*** dviroel is now known as dviroel|afk21:25
*** rlandy|rover is now known as rlandy|out22:25
*** dasm is now known as dasm|off23:49
