*** hamalq has quit IRC | 00:45 | |
*** hashar has joined #opendev-meeting | 02:48 | |
*** hashar has quit IRC | 05:13 | |
*** diablo_rojo has quit IRC | 06:29 | |
*** hashar has joined #opendev-meeting | 07:24 | |
*** hashar has quit IRC | 13:16 | |
*** jentoio_ has joined #opendev-meeting | 13:57 | |
*** gouthamr__ has joined #opendev-meeting | 13:57 | |
*** gouthamr has quit IRC | 14:00 | |
*** fungi has quit IRC | 14:00 | |
*** jentoio has quit IRC | 14:00 | |
*** gouthamr__ is now known as gouthamr | 14:00 | |
*** jentoio_ is now known as jentoio | 14:00 | |
*** fungi has joined #opendev-meeting | 14:09 | |
-openstackstatus- NOTICE: Our PyPI caching proxies are serving stale package indexes for some packages. We think because PyPI's CDN is serving stale package indexes. We are sorting out how we can either fix or workaround that. In the meantime updating requirements is likely the wrong option. | 16:09 | |
*** diablo_rojo has joined #opendev-meeting | 17:44 | |
clarkb | anyone else here for the meeting? | 19:00 |
fungi | sure, why not | 19:00 |
frickler | o/ | 19:00 |
clarkb | #startmeeting infra | 19:01 |
openstack | Meeting started Tue Sep 15 19:01:20 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
*** openstack changes topic to " (Meeting topic: infra)" | 19:01 | |
openstack | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2020-September/000097.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
*** openstack changes topic to "Announcements (Meeting topic: infra)" | 19:01 | |
clarkb | If our air clears out before the end of the week I intend on taking a day off to get out of the house. This is day 7 or something of not going outside | 19:02 |
clarkb | but the forecasts have a really hard time predicting that and it may not happen :( just a heads up that I may pop out to go outside if circumstances allow | 19:02 |
frickler | good luck with that | 19:03 |
ianw | o/ | 19:03 |
clarkb | #topic Actions from last meeting | 19:03 |
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)" | 19:03 | |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-08-19.01.txt minutes from last meeting | 19:03 |
clarkb | There were no recorded actions | 19:03
clarkb | #topic Priority Efforts | 19:04 |
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)" | 19:04 | |
clarkb | #topic Update Config Management | 19:04 |
*** openstack changes topic to "Update Config Management (Meeting topic: infra)" | 19:04 | |
clarkb | nb03.opendev.org has replaced nb03.openstack.org. We are also trying to delete nb04.opendev.org but it has built our only tumbleweed images and there are issues building new ones | 19:04 |
clarkb | Overall though I think that went pretty well. One less thing running puppet | 19:05
clarkb | to fix the nb04.opendev.org thing I think we want https://review.opendev.org/#/c/751919/ and its parent in a dib release | 19:06 |
clarkb | then nb01 and nb02 can build the tumbleweed images and nb04 can be deleted | 19:06
clarkb | ianw: ^ fyi that seemed to fix the tumbleweed job on dib itself if you have a chance to review it | 19:06 |
clarkb | there are pypi problems which we'll talk about in a bit | 19:06 |
ianw | ok, there were a lot of gate issues yesterday, but they might have all been fixed | 19:06
ianw | or, not :) | 19:06 |
fungi | the priority for containerized storyboard deployment has increased a bit, i think... just noticed that we're no longer deploying new commits to production because it now requires python>=3.6 which is not satisfiable on xenial, and if we're going to redeploy anyway i'd rather not spend time trying to hack up a solution in puppet for that | 19:06
clarkb | fungi: any idea why we bumped the python version if the deployment doesn't support it? just a miss? | 19:07 |
corvus | o/ | 19:07 |
clarkb | fungi: I agree that switching to a container build makes sense | 19:07 |
fungi | it was partly cleanup i think, but maybe also dependencies which were dropping python2.7/3.5 | 19:08 |
fungi | storyboard has some openstackish deps, like oslo.db | 19:09 |
fungi | we'd have started to need to pin a bunch of those | 19:09 |
clarkb | note that pip should handle those for us | 19:09 |
clarkb | because openstack libs properly set python version metadata | 19:10 |
clarkb | (no pins required) | 19:10 |
fungi | but not the version of pip shipped in xenial ;) | 19:10 |
clarkb | but we would get old libs | 19:10 |
clarkb | fungi: we install a newer pip so I think it would work | 19:10 |
fungi | yeah, we do, though looks like it's pip 19.0.3 on that server so maybe still too old | 19:11 |
fungi | could likely be upgraded | 19:11 |
fungi | i'm betting the pip version installed is contemporary with when the server was built | 19:11 |
clarkb | ya | 19:11 |
clarkb | in any case a container sounds good | 19:11 |
clarkb | any other config management issues to bring up? | 19:12 |
clarkb | #topic OpenDev | 19:13 |
*** openstack changes topic to "OpenDev (Meeting topic: infra)" | 19:13 | |
clarkb | I've not made progress on Gerrit things recently. Way too many other fires and distractions :( | 19:13
clarkb | ttx's opendev.org front page update has landed and deployed though | 19:13 |
clarkb | fungi: are there followups for minor fixes that you need us to review? | 19:13 |
fungi | yeah, i still have some follow-up edits for that on my to do list | 19:13 |
fungi | haven't pushed yet, no | 19:13 |
clarkb | ok, feel free to ping me when you do and I'll review them | 19:13 |
fungi | gladly, thanks | 19:14 |
clarkb | I didn't really have anything else to bring up here? Anyone else have something? | 19:15 |
fungi | seems like we can probably dive into the ci issues | 19:15 |
clarkb | #topic General Topics | 19:16 |
*** openstack changes topic to "General Topics (Meeting topic: infra)" | 19:16 | |
clarkb | really quickly want to go over two issues that I think we've mitigated then we can dig into new ones | 19:16
clarkb | #topic Recurring bogus IPv6 addresses on mirror01.ca-ymq-1.vexxhost.opendev.org | 19:16 |
*** openstack changes topic to "Recurring bogus IPv6 addresses on mirror01.ca-ymq-1.vexxhost.opendev.org (Meeting topic: infra)" | 19:16 | |
clarkb | We did end up setting a netplan config file to statically configure what vexxhost was setting via RAs on this server | 19:17 |
clarkb | since we've done that I've not seen anyone complain about broken networking on this server | 19:17
fungi | we should probably link the open neutron bug here for traceability | 19:17 |
fungi | though i don't have the bug number for that handy | 19:17 |
frickler | I've started my tcpdump again earlier to double check what happens when we see another stray RA | 19:17 |
clarkb | #link https://bugs.launchpad.net/bugs/1844712 This is the bug we think causes the problems on mirror01.ca-ymq-1.vexxhost.opendev.org | 19:18 |
openstack | Launchpad bug 1844712 in OpenStack Security Advisory "RA Leak on tenant network" [Undecided,Incomplete] | 19:18 |
fungi | thanks, perfect | 19:18 |
clarkb | fungi: ^ there it is | 19:18 |
frickler | would also be good to hear back from mnaser whether still to collect data to trace things | 19:18 |
clarkb | mnaser: ^ resyncing on that would be good. Maybe not in the meeting as I don't think you're here, but at some point | 19:19 |
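A minimal sketch of the mitigation described above, assuming a hypothetical interface name (ens3) and placeholder addresses rather than the real mirror01 values: configure the IPv6 address and default route statically and stop honouring router advertisements, so a stray RA from another tenant can no longer install a bogus address or route.

```bash
# Hypothetical netplan override; interface name and addresses are placeholders.
sudo tee /etc/netplan/80-static-ipv6.yaml >/dev/null <<'EOF'
network:
  version: 2
  ethernets:
    ens3:
      dhcp4: true
      accept-ra: false            # ignore router advertisements entirely
      addresses:
        - "2001:db8:1234::10/64"  # the address the provider's RAs used to supply
      gateway6: "fe80::1"         # static default route instead of an RA-learned one
EOF
sudo netplan apply
```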
clarkb | #topic Zuul web performance issues | 19:19 |
*** openstack changes topic to "Zuul web performance issues (Meeting topic: infra)" | 19:19 | |
clarkb | Last week users noticed that the zuul web ui was really slow. corvus identified a bug that caused zuul's status dashboard to always fetch the status json blob even when tabs were not foregrounded | 19:20 |
clarkb | and ianw sorted out why we weren't caching things properly in apache | 19:20 |
clarkb | it seems now like we are stable (at least as a zuul web user I don't have current complaints) | 19:20 |
fungi | brief anecdata, the zuul-web process is currently using around a third of a vcpu according to top, rather than pegged at 100%+ like it was last week | 19:20 |
clarkb | I also have a third change up to run more zuul web processes. It seems we don't need that anymore so I may rewrite it to be a switch from mod rewrite proxying to mod proxy proxying as that should perform better in apache too | 19:21
clarkb | thank you for all the help on that one | 19:22 |
corvus | clarkb: why would it perform better? | 19:22 |
corvus | clarkb: (i think mod_rewrite uses mod_proxy with [P]) | 19:22 |
clarkb | corvus: that was something that ianw found when digging into the cache behavior | 19:22
clarkb | ianw: ^ did you have more details? I think it may be because the rules matching is simpler? | 19:22 |
ianw | well the manual goes into it, i'll have to find the page i read | 19:23 |
corvus | so something like "mod_proxy runs less regex matching code?" | 19:23 |
clarkb | "Using this flag triggers the use of mod_proxy, without handling of persistent connections. This means the performance of your proxy will be better if you set it up with ProxyPass or ProxyPassMatch" | 19:23 |
clarkb | I guess its connection handling that is the difference | 19:23 |
ianw | no something about the way it's dispatched with threads or something | 19:23 |
ianw | or yeah, what clarkb wrote :) ^ | 19:24 |
clarkb | from http://httpd.apache.org/docs/current/rewrite/flags.html#flag_p | 19:24 |
ianw | that's it :) | 19:24 |
corvus | does that affect us? | 19:24 |
clarkb | I could see that being part of the problem with many status requests | 19:25 |
clarkb | if a new tcp connection has to be spun up for each one | 19:25 |
fungi | that probably won't change the load on zuul-web, only maybe the apache workers' cycles? | 19:25 |
clarkb | fungi: but ya the impact would be on the apache side not the zuul-web side | 19:25 |
corvus | i'm not sure we ever evaluated whether zuul-web is well behaved with persistent connections | 19:25 |
clarkb | corvus: something to keep in mind if we change it I guess. Would you prefer we leave it with mod rewrite? | 19:27 |
corvus | anyway, if we don't need mod_rewrite, i'm fine changing it. maybe we'll learn something. i normally prefer the flexibility of rewrite. | 19:27 |
fungi | are the connections tied to a thread, or are the descriptors for them passed around between threads? if the former, then i suppose that could impact thread reaping/recycling too | 19:27 |
fungi | (making stale data a concern) | 19:29 |
clarkb | I think connection == thread and they have a ttl | 19:30
clarkb | or can have a ttl | 19:30
fungi | that's all handled by cherrypy? | 19:30 |
clarkb | no thats in apache | 19:30 |
clarkb | I don't know what cherrypy does | 19:30 |
clarkb | its less critical anyway since rewrite seems to work now | 19:31 |
fungi | oh, sorry, i meant the zuul-web threads. anyway yeah no need to dig into that in the meeting | 19:31 |
clarkb | it was just part of my update to do multiple backends because rewrite can't do multiple backends I don't think | 19:31 |
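For reference on the mod_rewrite versus mod_proxy point above, a hedged sketch of the two proxying styles (not the production vhost; the filename and backend address assume zuul-web on its default port 9000). Per the mod_rewrite flag documentation quoted earlier in this discussion, the [P] flag hands requests to mod_proxy without persistent backend connections, while ProxyPass reuses pooled backend connections by default.

```bash
# Illustrative only; the conf filename and backend address are assumptions.
sudo tee /etc/apache2/conf-available/zuul-proxy-example.conf >/dev/null <<'EOF'
# Style 1: mod_rewrite proxying (the current approach)
#   RewriteEngine On
#   RewriteRule "^/api/(.*)$" "http://127.0.0.1:9000/api/$1" [P,L]

# Style 2: mod_proxy proxying (reuses backend connections by default)
ProxyPass        "/api/" "http://127.0.0.1:9000/api/"
ProxyPassReverse "/api/" "http://127.0.0.1:9000/api/"
EOF
```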
clarkb | #topic PyPI serving stale package indexes | 19:31 |
*** openstack changes topic to "PyPI serving stale package indexes (Meeting topic: infra)" | 19:31 | |
clarkb | This has become the major CI issue for all of our python based things in the last day or so | 19:32 |
clarkb | the general behavior of it is project that uses constraints that pins to a recent (and latest) package version fails because only version prior to that latest version are present in the index served to pip | 19:32 |
clarkb | there has been a lot of confusion about this from people, ranging from thinking that 404s are the problem to blaming AFS | 19:33
clarkb | AFS is not involved and there are no 404s. We appear to be getting back a valid index because pip says "here is the giant list of things I can install that doesn't include the version you want" | 19:33 |
clarkb | projects that don't use constraints (like zuul) may be installing prior versions of things occasionally since that won't error | 19:33 |
clarkb | but I expect they are mostly happy as a result (just something to keep in mind) | 19:34 |
frickler | I was wondering if we could keep a local cache of u-c pkgs in images, just like we do for git repos | 19:34 |
clarkb | we have seen this happen for different packages across our mirror nodes | 19:34 |
frickler | #link https://pip.pypa.io/en/latest/user_guide/#installing-from-local-packages | 19:34 |
clarkb | frickler: I think pip will still check pypi | 19:34 |
clarkb | though it may not error if pypi only has older versions | 19:34 |
ianw | i wonder if other people are seeing it, but without such strict constraints as you say, don't notice | 19:34 |
clarkb | ianw: ya that is my hunch | 19:35 |
frickler | running the pip download with master u-r gives me about 300M of cache, so that would sound feasible | 19:35 |
frickler | IIUC the "--no-index --find-links" should assure a pure local install | 19:35 |
clarkb | the other thing I notice is that it seems to only affect openstack built packages | 19:35
clarkb | frickler: that becomes tricky because it means we'd have to update our images before openstack requirements can update constraints | 19:36
clarkb | frickler: I'd like to avoid that tight coupling if possible as it will become painful if image builds have separate problems but openstack needs a bug fix python lib | 19:36
clarkb | it's possible we could use an image cache while still checking indexes though | 19:36
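A minimal sketch of the local-cache idea frickler outlines above, following the pip documentation linked at 19:34; the cache directory and the constraints URL are illustrative assumptions, and as noted this couples image builds to constraints updates if the index is skipped entirely.

```bash
# At image build time: pre-download everything pinned by upper-constraints.
pip download -r https://releases.openstack.org/constraints/upper/master \
             -d /opt/cache/wheels

# At job run time: install purely from the local cache, never touching an index.
pip install --no-index --find-links /opt/cache/wheels python-designateclient
```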
fungi | current release workflow is that as soon as a new release is uploaded to pypi a change is pushed to the requirements repo to bump that constraint entry | 19:37 |
ianw | fungi: i noticed in scrollback you're running some grabbing scripts; i had one going in gra1 yesterday too, with no luck catching an error | 19:37
fungi | ianw: yep, we're evolving that to try to make it more pip-like in hopes of triggering maybe pip-specific behaviors from fastly | 19:37 |
fungi | i'm currently struggling to mimic the compression behavior though | 19:38 |
clarkb | https://mirror.mtl01.inap.opendev.org/pypi/simple/python-designateclient/ version 4.1.0 is one that just failed a couple minutes ago at https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b5f/751040/4/check/openstack-tox-pep8/b5ff50c/job-output.txt | 19:38 |
ianw | i've definitely managed to see it happen from fastly directly ... but i guess we can't rule out apache? | 19:38 |
clarkb | but if you load it 4.1.0 is there :/ | 19:38 |
fungi | wget manpage on the mirror server says it supports a --compression option but then wget itself complains about it being an unknown option | 19:38 |
clarkb | ianw: ya its possible that apache is serving old content for some reason, | 19:39 |
clarkb | pip sets cache-control: max-age=0 | 19:39 |
clarkb | that should force apache to check with fastly that the content it has cached is current for every pip index request though | 19:39 |
clarkb | we could try disabling the pypi proxies in our base job | 19:40 |
clarkb | and see if things are still broken | 19:40 |
clarkb | I expect they will be because ya a few months back it seemed like we could reproduce from fastly occasionally | 19:40
clarkb | I also wonder if it could be a pip bug | 19:41 |
fungi | aha, compression option confusion resolved... it will only be supported if wget is built with zlib, and ubuntu 19.04 is the earliest they added zlib to its build-deps | 19:41 |
clarkb | perhaps in pips python version checking of packages it is excluding results for some reason | 19:41 |
clarkb | but I haven't been able to reproduce that either (was fiddling with pip installs locally against our mirrors and same version of pip here is fine with it) | 19:42 |
clarkb | the other thing I looked into was whether or not pip debug logging would help and I don't think it will | 19:43 |
corvus | fungi: looks like curl has an option | 19:43 |
clarkb | at least for successful runs it doesn't seem to log index content | 19:43
clarkb | `pip --log /path/to/file install foo` is one way to set that up which we could add to say devstack and have good coverage | 19:44 |
clarkb | but I doubt it will help | 19:44 |
corvus | fungi: "curl -V" says "Features: ... libz" for me | 19:44 |
fungi | corvus: yep, i'm working on rewriting with curl instead of wget | 19:45 |
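A rough approximation of the sort of probe being discussed here, sketched with curl since its --compressed option covers the gzip handling wget was missing on that server. It fetches the simple index roughly the way pip would (compressed, with the Cache-Control: max-age=0 revalidation header mentioned earlier) and flags any response where the expected release is absent. The mirror, package and version come from the failure linked above; the polling interval is an arbitrary choice.

```bash
# Poll one mirror's simple index and note whenever 4.1.0 goes missing.
while :; do
    if ! curl -sS --compressed -H 'Cache-Control: max-age=0' \
            https://mirror.mtl01.inap.opendev.org/pypi/simple/python-designateclient/ \
         | grep -q '4\.1\.0'; then
        echo "$(date -u +%FT%TZ) index looks stale: 4.1.0 missing"
    fi
    sleep 30
done
```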
clarkb | not having issues with common third party packages does make me wonder if it is something with how we build and release packages | 19:46
clarkb | at least I've not seen anything like that fail. six, cryptography, coverage, eventlet, paramiko, etc why don't they do this too | 19:46 |
clarkb | other than six they all have releases within the last month too | 19:47 |
clarkb | I've largely been stumped. Maybe we should start to reach out to pypi even though what we've got is minimal | 19:48 |
clarkb | Anything else to add to this? or should we move on? | 19:49 |
fungi | do all the things we've seen errors for have a common-ish release window? | 19:49 |
clarkb | fungi: the oldest I've seen is oslo.log August 26 | 19:49 |
clarkb | fungi: but it has happened for newer packages too | 19:49 |
fungi | okay, so the missing versions weren't all pushed on the same day | 19:49 |
clarkb | correct | 19:49 |
ianw | yeah, reproducing will be 99% of the battle i'm sure | 19:50 |
fungi | i suppose we could blow away apache's cache and restart it on all the mirror servers, just to rule that layer out | 19:50 |
fungi | though that seems like an unfortunate measure | 19:50 |
clarkb | fungi: ya its hard to only delete the pypi cache stuff | 19:51 |
ianw | i was just looking if we could grep through, to see if we have something that looks like an old index there | 19:51 |
clarkb | ianw: we can but I think things are hashed in weird ways, its doable just painful | 19:51 |
ianw | that might clue us in if it *is* apache serving something old | 19:51 |
clarkb | ya that may be a good next step then if we sort out apache cache structure we might be able to do specific pruning | 19:52 |
clarkb | if we decide that is necessary | 19:52 |
frickler | but do we see 100% failure now? I was assuming some jobs would still pass | 19:52 |
fungi | also if the problem isn't related to our apache layer (likely) but happens to clear up around the same time as we reset everything, we can't really know if what we did was the resolution | 19:53 |
ianw | looks like we can zcat the .data files in /var/cache/apache2/proxy | 19:53 |
ianw | i'll poke a bit and see if i can find a smoking gun old index, for say python-designateclient in mtl-01 (the failure linked by clarkb) | 19:53 |
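A sketch of the cache inspection ianw describes, assuming the mod_cache_disk bodies under the path mentioned above can simply be decompressed and searched; the package and version reuse the earlier mtl01 failure example.

```bash
# Find cached copies of the python-designateclient index that lack 4.1.0.
sudo find /var/cache/apache2/proxy -name '*.data' -print0 |
while IFS= read -r -d '' f; do
    if sudo zcat -f "$f" 2>/dev/null | grep -q 'python-designateclient' &&
       ! sudo zcat -f "$f" 2>/dev/null | grep -q '4\.1\.0'; then
        echo "possibly stale cached index: $f"
    fi
done
```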
clarkb | frickler: correct many jobs seem fine | 19:53
frickler | also, not sure if you saw my comment earlier, I did see the same issue with local devstack w/o opendev mirror involved | 19:53 |
clarkb | frickler: it's just enough of them to be noticed and cause problems for developers. But if you look at it on an individual job basis most are passing I think | 19:53
frickler | so I don't see how apache2 could be the cause | 19:54 |
fungi | and we're sure that the python version metadata says the constrained version is appropriate for the interpreter version pip is running under? | 19:54 |
clarkb | frickler: oh I hadn't seen that. Good to confirm that pypi itself seems to exhibit it in some cases | 19:54 |
clarkb | frickler: agreed | 19:54 |
clarkb | fungi: as best as I can tell yes | 19:54 |
fungi | just making sure it's not a case of openstack requirements suddenly leaking packages installed with newer interpreters into constraints entries for older interpreters which can't use them | 19:54 |
clarkb | fungi: in part because those restrictions are now old enough that the previous version that pip says is valid also has the same restriction on the interpreter | 19:55
clarkb | fungi: so it will say I can't install foo==1.2.3 but I can install foo==1.2.2 and they both have the same interpreter restriction in the index.html | 19:55 |
fungi | k | 19:55 |
clarkb | however; double double checking that would be a good idea | 19:55 |
clarkb | since our packages tend to be more restricted than say cryptography | 19:55
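One way to double-check the Requires-Python concern raised here, reusing the earlier example package and mirror: list each release file in the simple index together with the data-requires-python attribute that pip uses when filtering candidates, then compare the allegedly missing 4.1.0 against the older release pip falls back to.

```bash
# Show the 4.x release files with their full anchor tags, which include the
# data-requires-python metadata pip evaluates against the interpreter version.
curl -sS --compressed \
    https://mirror.mtl01.inap.opendev.org/pypi/simple/python-designateclient/ \
  | grep -oE '<a [^>]*>[^<]*' \
  | grep 'designateclient-4\.'
```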
clarkb | anyway we're just about at time. I expect this to consume most of the rest of my time today. | 19:56 |
clarkb | we can coordinate further in #opendev and take it from there | 19:56 |
clarkb | #topic Open Discussion | 19:56 |
*** openstack changes topic to "Open Discussion (Meeting topic: infra)" | 19:56 | |
clarkb | Any other items that we want to call out before we run out of time? | 19:56 |
fungi | the upcoming cinder volume maintenance in rax-dfw next month should no longer impact any of our servers | 19:57 |
clarkb | thank you for handling that | 19:58 |
fungi | i went ahead and replaced or deleted all of them, with the exception of nb04 which is still pending deletion | 19:58 |
ianw | ++ thanks fungi! | 19:58 |
fungi | they've also cleaned up all our old error_deleting volumes now too | 19:58 |
fungi | note however they've also warned us about an upcoming trove maintenance. databases for some services are going to be briefly unreachable | 19:58 |
frickler | maybe double check our backups of those are in good shape, just in case? | 19:59 |
fungi | #info provider maintenance 2020-09-30 01:00-05:00 utc involving ~5-minute outages for databases used by cacti, refstack, translate, translate-dev, wiki, wiki-dev | 19:59 |
clarkb | I think ianw did just check those but ya double checking them is a good idea | 19:59 |
clarkb | and we are at time. | 20:00 |
clarkb | Thank you everyone! | 20:00
clarkb | #endmeeting | 20:00 |
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev" | 20:00 | |
fungi | thanks clarkb! | 20:00 |
openstack | Meeting ended Tue Sep 15 20:00:09 2020 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-15-19.01.html | 20:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-15-19.01.txt | 20:00 |
openstack | Log: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-15-19.01.log.html | 20:00 |