Tuesday, 2020-09-15

*** hamalq has quit IRC00:45
*** hashar has joined #opendev-meeting02:48
*** hashar has quit IRC05:13
*** diablo_rojo has quit IRC06:29
*** hashar has joined #opendev-meeting07:24
*** hashar has quit IRC13:16
*** jentoio_ has joined #opendev-meeting13:57
*** gouthamr__ has joined #opendev-meeting13:57
*** gouthamr has quit IRC14:00
*** fungi has quit IRC14:00
*** jentoio has quit IRC14:00
*** gouthamr__ is now known as gouthamr14:00
*** jentoio_ is now known as jentoio14:00
*** fungi has joined #opendev-meeting14:09
-openstackstatus- NOTICE: Our PyPI caching proxies are serving stale package indexes for some packages. We think this is because PyPI's CDN is serving stale package indexes. We are sorting out how we can either fix or work around that. In the meantime updating requirements is likely the wrong option.16:09
*** diablo_rojo has joined #opendev-meeting17:44
clarkbanyone else here for the meeting?19:00
fungisure, why not19:00
fricklero/19:00
clarkb#startmeeting infra19:01
openstackMeeting started Tue Sep 15 19:01:20 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
*** openstack changes topic to " (Meeting topic: infra)"19:01
openstackThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2020-September/000097.html Our Agenda19:01
clarkb#topic Announcements19:01
*** openstack changes topic to "Announcements (Meeting topic: infra)"19:01
clarkbIf our air clears out before the end of the week I intend to take a day off to get out of the house. This is day 7 or something of not going outside19:02
clarkbbut the forecasts have a really hard time predicting that and it may not happen :( just a heads up that I may pop out to go outside if circumstances allow19:02
fricklergood luck with that19:03
ianwo/19:03
clarkb#topic Actions from last meeting19:03
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:03
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-08-19.01.txt minutes from last meeting19:03
clarkbThere were no recorded actions19:03
clarkb#topic Priority Efforts19:04
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)"19:04
clarkb#topic Update Config Management19:04
*** openstack changes topic to "Update Config Management (Meeting topic: infra)"19:04
clarkbnb03.opendev.org has replaced nb03.openstack.org. We are also trying to delete nb04.opendev.org but it has built our only tumbleweed images and there are issues building new ones19:04
clarkbOverall though I think that went pretty well. One less thing running puppet19:05
clarkbto fix the nb04.opendev.org thing I think we want https://review.opendev.org/#/c/751919/ and its parent in a dib release19:06
clarkbthen nb01 and nb02 can build the tumbleweed images and nb04 can be deleted19:06
clarkbianw: ^ fyi that seemed to fix the tumbleweed job on dib itself if you have a chance to review it19:06
clarkbthere are pypi problems which we'll talk about in a bit19:06
ianwok, there were a lot of gate issues yesterday, but they might have all been fixed19:06
ianwor, not :)19:06
fungithe priority for containerized storyboard deployment has increased a bit, i think... just noticed that we're no longer deploying new commits to production because it now requires python>=3.6 which is not satisfiable on xenial, and if we're going to redeploy anyway i'd rather not spend time trying to hack up a solution in puppet for that19:06
clarkbfungi: any idea why we bumped the python version if the deployment doesn't support it? just a miss?19:07
corvuso/19:07
clarkbfungi: I agree that switching to a container build makes sense19:07
fungiit was partly cleanup i think, but maybe also dependencies which were dropping python2.7/3.519:08
fungistoryboard has some openstackish deps, like oslo.db19:09
fungiwe'd have started to need to pin a bunch of those19:09
clarkbnote that pip should handle those for us19:09
clarkbbecause openstack libs properly set python version metadata19:10
clarkb(no pins required)19:10
fungibut not the version of pip shipped in xenial ;)19:10
clarkbbut we would get old libs19:10
clarkbfungi: we install a newer pip so I think it would work19:10
fungiyeah, we do, though looks like it's pip 19.0.3 on that server so maybe still too old19:11
fungicould likely be upgraded19:11
fungii'm betting the pip version installed is contemporary with when the server was built19:11
clarkbya19:11
clarkbin any case a container sounds good19:11
clarkbany other config management issues to bring up?19:12
clarkb#topic OpenDev19:13
*** openstack changes topic to "OpenDev (Meeting topic: infra)"19:13
clarkbI've not made progress on Gerrit things recently. Way too many other fires and distractions :(19:13
clarkbttx's opendev.org front page update has landed and deployed though19:13
clarkbfungi: are there followups for minor fixes that you need us to review?19:13
fungiyeah, i still have some follow-up edits for that on my to do list19:13
fungihaven't pushed yet, no19:13
clarkbok, feel free to ping me when you do and I'll review them19:13
fungigladly, thanks19:14
clarkbI didn't really have anything else to bring up here. Anyone else have something?19:15
fungiseems like we can probably dive into the ci issues19:15
clarkb#topic General Topics19:16
*** openstack changes topic to "General Topics (Meeting topic: infra)"19:16
clarkbreally quickly want to go over two issues that I think we've mitigated, then we can dig into new ones19:16
clarkb#topic Recurring bogus IPv6 addresses on mirror01.ca-ymq-1.vexxhost.opendev.org19:16
*** openstack changes topic to "Recurring bogus IPv6 addresses on mirror01.ca-ymq-1.vexxhost.opendev.org (Meeting topic: infra)"19:16
clarkbWe did end up setting a netplan config file to statically configure what vexxhost was setting via RAs on this server19:17
clarkbsince we've done that I've not seen anyone complain about broken networking on this server19:17
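(A minimal sketch of that kind of netplan override, assuming a typical single-interface server. The interface name and the documentation-prefix addresses below are placeholders, not the mirror's real values:)

    # static override that ignores router advertisements
    network:
      version: 2
      ethernets:
        ens3:
          accept-ra: false
          addresses:
            - 2001:db8:100::10/64
          gateway6: 2001:db8:100::1

Dropped into /etc/netplan/ it would be picked up with `sudo netplan apply`.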
fungiwe should probably link the open neutron bug here for traceability19:17
fungithough i don't have the bug number for that handy19:17
fricklerI started my tcpdump again earlier to double check what happens when we see another stray RA19:17
clarkb#link https://bugs.launchpad.net/bugs/1844712 This is the bug we think causes the problems on mirror01.ca-ymq-1.vexxhost.opendev.org19:18
openstackLaunchpad bug 1844712 in OpenStack Security Advisory "RA Leak on tenant network" [Undecided,Incomplete]19:18
fungithanks, perfect19:18
clarkbfungi: ^ there it is19:18
fricklerwould also be good to hear back from mnaser whether we should still collect data to trace things19:18
clarkbmnaser: ^ resyncing on that would be good. Maybe not in the meeting as I don't think you're here, but at some point19:19
clarkb#topic Zuul web performance issues19:19
*** openstack changes topic to "Zuul web performance issues (Meeting topic: infra)"19:19
clarkbLast week users noticed that the zuul web ui was really slow. corvus identified a bug that caused zuul's status dashboard to always fetch the status json blob even when tabs were not foregrounded19:20
clarkband ianw sorted out why we weren't caching things properly in apache19:20
clarkbit seems now like we are stable (at least as a zuul web user I don't have current complaints)19:20
fungibrief anecdata, the zuul-web process is currently using around a third of a vcpu according to top, rather than pegged at 100%+ like it was last week19:20
clarkbI also have a third change up to run more zuul web processes. It seems we don't need that anymore so I may rewrite it to be a switch from mod rewrite proxying to mod proxy proxying as that should perform better in apache too19:21
clarkbthank you for all the help on that one19:22
corvusclarkb: why would it perform better?19:22
corvusclarkb: (i think mod_rewrite uses mod_proxy with [P])19:22
clarkbcorvus: that was something that ianw found when digging into the cache behavior19:22
clarkbianw: ^ did you have more details? I think it may be because the rules matching is simpler?19:22
ianwwell the manual goes into it, i'll have to find the page i read19:23
corvusso something like "mod_proxy runs less regex matching code?"19:23
clarkb"Using this flag triggers the use of mod_proxy, without handling of persistent connections. This means the performance of your proxy will be better if you set it up with ProxyPass or ProxyPassMatch"19:23
clarkbI guess its connection handling that is the difference19:23
ianwno something about the way it's dispatched with threads or something19:23
ianwor yeah, what clarkb wrote :) ^19:24
clarkbfrom http://httpd.apache.org/docs/current/rewrite/flags.html#flag_p19:24
ianwthat's it :)19:24
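(Roughly, the two Apache forms being compared. The /api/ path and the localhost:9000 backend are illustrative, not necessarily what the zuul-web vhost actually uses:)

    # mod_rewrite proxying: matching requests go through mod_proxy
    # without persistent backend connections
    RewriteRule "^/api/(.*)$" "http://localhost:9000/api/$1" [P,L]

    # ProxyPass form of the same mapping; mod_proxy workers can reuse
    # backend connections
    ProxyPass        "/api/" "http://localhost:9000/api/"
    ProxyPassReverse "/api/" "http://localhost:9000/api/"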
corvusdoes that affect us?19:24
clarkbI could see that being part of the problem with many status requests19:25
clarkbif a new tcp connection has to be spun up for each one19:25
fungithat probably won't change the load on zuul-web, only maybe the apache workers' cycles?19:25
clarkbfungi: but ya the impact would be on the apache side not the zuul-web side19:25
corvusi'm not sure we ever evaluated whether zuul-web is well behaved with persistent connections19:25
clarkbcorvus: something to keep in mind if we change it I guess. Would you prefer we leave it with mod rewrite?19:27
corvusanyway, if we don't need mod_rewrite, i'm fine changing it.  maybe we'll learn something.  i normally prefer the flexibility of rewrite.19:27
fungiare the connections tied to a thread, or are the descriptors for them passed around between threads? if the former, then i suppose that could impact thread reaping/recycling too19:27
fungi(making stale data a concern)19:29
clarkbI think connection == thread and they have a ttl19:30
clarkbor can have a ttl19:30
fungithat's all handled by cherrypy?19:30
clarkbno thats in apache19:30
clarkbI don't know what cherrypy does19:30
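(For what it's worth, the backend connection pool on the apache side can be bounded and aged out via ProxyPass worker parameters; a sketch with arbitrary values:)

    # cap the pool and expire idle backend connections after 60s
    ProxyPass "/api/" "http://localhost:9000/api/" max=20 ttl=60 retry=10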
clarkbits less critical anyway since rewrite seems to work now19:31
fungioh, sorry, i meant the zuul-web threads. anyway yeah no need to dig into that in the meeting19:31
clarkbit was just part of my update to do multiple backends, because I don't think rewrite can do multiple backends19:31
clarkb#topic PyPI serving stale package indexes19:31
*** openstack changes topic to "PyPI serving stale package indexes (Meeting topic: infra)"19:31
clarkbThis has become the major CI issue for all of our python based things in the last day or so19:32
clarkbthe general behavior of it is: a project that uses constraints pinned to a recent (and latest) package version fails because only versions prior to that latest version are present in the index served to pip19:32
clarkbthere has been a lot of confusion about this from people, ranging from thinking that 404s are the problem to blaming AFS19:33
clarkbAFS is not involved and there are no 404s. We appear to be getting back a valid index because pip says "here is the giant list of things I can install that doesn't include the version you want"19:33
clarkbprojects that don't use constraints (like zuul) may be installing prior versions of things occasionally since that won't error19:33
clarkbbut I expect they are mostly happy as a result (just something to keep in mind)19:34
fricklerI was wondering if we could keep a local cache of u-c pkgs in images, just like we do for git repos19:34
clarkbwe have seen this happen for different packages across our mirror nodes19:34
frickler#link https://pip.pypa.io/en/latest/user_guide/#installing-from-local-packages19:34
clarkbfrickler: I think pip will still check pypi19:34
clarkbthough it may not error if pypi only has older versions19:34
ianwi wonder if other people are seeing it, but without such strict constraints as you say, don't notice19:34
clarkbianw: ya that is my hunch19:35
fricklerrunning the pip download with master u-r gives me about 300M of cache, so that would sound feasible19:35
fricklerIIUC the "--no-index --find-links" should ensure a pure local install19:35
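(A rough sketch of that approach, assuming a constraints file is available at image-build time; the paths here are illustrative:)

    # at image build time, prime a local package cache from the current
    # requirements/constraints
    pip download -r requirements.txt -c upper-constraints.txt -d /opt/pip-cache

    # in a job, install purely from that cache with no index access
    pip install --no-index --find-links=/opt/pip-cache \
        -r requirements.txt -c upper-constraints.txt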
clarkbthe other thing I notice is that it seems to be openstack built packages19:35
clarkbfrickler: that becomes tricky because it means we'd have to update our images before openstack requirements can update constraints19:36
clarkbfrickler: I'd like to avoid that tight coupling if possible as it will become painful if image builds have separate problems but openstack needs a bug fix python lib19:36
clarkbits possible we could use an image cache while still checking indexes though19:36
fungicurrent release workflow is that as soon as a new release is uploaded to pypi a change is pushed to the requirements repo to bump that constraint entry19:37
ianwfungi: i noticed in scrollback you're running some grabbing scripts; i had one going in gra1 yesterday too, with no luck catching an error19:37
fungiianw: yep, we're evolving that to try to make it more pip-like in hopes of triggering maybe pip-specific behaviors from fastly19:37
fungii'm currently struggling to mimic the compression behavior though19:38
clarkbhttps://mirror.mtl01.inap.opendev.org/pypi/simple/python-designateclient/ version 4.1.0 is one that just failed a couple minutes ago at https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b5f/751040/4/check/openstack-tox-pep8/b5ff50c/job-output.txt19:38
ianwi've definitely managed to see it happen from fastly directly ... but i guess we can't rule out apache?19:38
clarkbbut if you load it 4.1.0 is there :/19:38
fungiwget manpage on the mirror server says it supports a --compression option but then wget itself complains about it being an unknown option19:38
clarkbianw: ya its possible that apache is serving old content for some reason19:39
clarkbpip sets cache-control: max-age=019:39
clarkbthat should force apache to check with fastly that the content it has cached is current for every pip index request though19:39
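(One way to check that by hand, using the mirror and the python-designateclient 4.1.0 example mentioned just above; the grep is only a quick heuristic:)

    # fetch the simple index the way pip does, forcing revalidation
    # against fastly, and look for the constrained release
    curl -s -H "Cache-Control: max-age=0" \
        https://mirror.mtl01.inap.opendev.org/pypi/simple/python-designateclient/ \
        | grep -c "4\.1\.0"
    # a count of 0 would mean the served index really is missing 4.1.0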
clarkbwe could try disabling the pypi proxies in our base job19:40
clarkband see if things are still broken19:40
clarkbI expect they will be because ya a few months back it seemed like we could reproduce from fastly occasionally19:40
clarkbI also wonder if it could be a pip bug19:41
fungiaha, compression option confusion resolved... it will only be supported if wget is built with zlib, and ubuntu 19.04 is the earliest they added zlib to its build-deps19:41
clarkbperhaps in pip's python version checking of packages it is excluding results for some reason19:41
clarkbbut I haven't been able to reproduce that either (was fiddling with pip installs locally against our mirrors and same version of pip here is fine with it)19:42
clarkbthe other thing I looked into was whether or not pip debug logging would help and I don't think it will19:43
corvusfungi: looks like curl has an option19:43
clarkbat least for successful runs it doesn't seem to log index content19:43
clarkb`pip --log /path/to/file install foo` is one way to set that up which we could add to say devstack and have good coverage19:44
clarkbbut I doubt it will help19:44
corvusfungi: "curl -V" says "Features: ... libz" for me19:44
fungicorvus: yep, i'm working on rewriting with curl instead of wget19:45
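(A sketch of the sort of probe being described here; the exact headers pip sends vary by version, so the user agent and headers below are approximations, and the package/version are the ones from this discussion:)

    # repeatedly fetch the index with pip-ish headers and compressed
    # responses, noting any response missing the expected release
    while sleep 10; do
        curl -s --compressed \
            -A "pip/20.2.3" \
            -H "Cache-Control: max-age=0" \
            https://pypi.org/simple/python-designateclient/ \
            | grep -q "4\.1\.0" || date >> stale-index-hits.log
    done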
clarkbnot having issues with common third party packages does make me wonder if it is something with how we build and release packages19:46
clarkbat least I've not seen anything like that fail. six, cryptography, coverage, eventlet, paramiko, etc why don't they do this too19:46
clarkbother than six they all have releases within the last month too19:47
clarkbI've largely been stumped. Maybe we should start to reach out to pypi even though what we've got is minimal19:48
clarkbAnything else to add to this? or should we move on?19:49
fungido all the things we've seen errors for have a common-ish release window?19:49
clarkbfungi: the oldest I've seen is oslo.log August 2619:49
clarkbfungi: but it has happened for newer packages too19:49
fungiokay, so the missing versions weren't all pushed on the same day19:49
clarkbcorrect19:49
ianwyeah, reproducing will be 99% of the battle i'm sure19:50
fungii suppose we could blow away apache's cache and restart it on all the mirror servers, just to rule that layer out19:50
fungithough that seems like an unfortunate measure19:50
clarkbfungi: ya its hard to only delete the pypi cache stuff19:51
ianwi was just looking at whether we could grep through, to see if we have something that looks like an old index there19:51
clarkbianw: we can but I think things are hashed in weird ways, its doable just painful19:51
ianwthat might clue us in if it *is* apache serving something old19:51
clarkbya that may be a good next step then if we sort out apache cache structure we might be able to do specific pruning19:52
clarkbif we decide that is necessary19:52
fricklerbut do we see 100% failure now? I was assuming some jobs would still pass19:52
fungialso if the problem isn't related to our apache layer (likely) but happens to clear up around the same time as we reset everything, we can't really know if what we did was the resolution19:53
ianwlooks like we can zcat the .data files in /var/cache/apache2/proxy19:53
ianwi'll poke a bit and see if i can find a smoking gun old index, for say python-designateclient in mtl-01 (the failure linked by clarkb)19:53
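(Something along these lines should work for that poking, given mod_cache_disk's layout under the cache root mentioned above; gzip-encoded bodies are assumed, and zcat -f falls back to plain copies:)

    # find cached bodies that look like the package's index page, then
    # flag any copy that is missing the expected release
    sudo sh -c '
      find /var/cache/apache2/proxy -name "*.data" | while read -r f; do
        if zcat -f "$f" 2>/dev/null | grep -q "python-designateclient"; then
          zcat -f "$f" 2>/dev/null | grep -q "4\.1\.0" || echo "possibly stale: $f"
        fi
      done
    '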
clarkbfrickler: correct, many jobs seem fine19:53
frickleralso, not sure if you saw my comment earlier, I did see the same issue with local devstack w/o opendev mirror involved19:53
clarkbfrickler: its just enough of them to be noticed and cause problems for developers. But if you look at it on an individual job basis most are passing I think19:53
fricklerso I don't see how apache2 could be the cause19:54
fungiand we're sure that the python version metadata says the constrained version is appropriate for the interpreter version pip is running under?19:54
clarkbfrickler: oh I hadn't seen that. Good to confirm that pypi itself seems to exhibit it in some cases19:54
clarkbfrickler: agreed19:54
clarkbfungi: as best as I can tell yes19:54
fungijust making sure it's not a case of openstack requirements suddenly leaking packages installed with newer interpreters into constraints entries for older interpreters which can't use them19:54
clarkbfungi: in part because those restrictions are now old enough that the previous version that pip says is valid also has the same restriction on the interpreter19:55
clarkbfungi: so it will say I can't install foo==1.2.3 but I can install foo==1.2.2 and they both have the same interpreter restriction in the index.html19:55
fungik19:55
clarkbhowever; double double checking that would be a good idea19:55
clarkbsince our packages tend to be more restricted than say cryptography19:55
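(For that double check: the interpreter restriction is exposed in the simple index as a data-requires-python attribute on each file link, so a quick way to eyeball it, using the mirror and package from earlier, might be:)

    # summarize the declared python version restrictions in the index
    curl -s https://mirror.mtl01.inap.opendev.org/pypi/simple/python-designateclient/ \
        | grep -o 'data-requires-python="[^"]*"' | sort | uniq -c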
clarkbanyway we're just about at time. I expect this to consume most of the rest of my time today.19:56
clarkbwe can coordinate further in #opendev and take it from there19:56
clarkb#topic Open Discussion19:56
*** openstack changes topic to "Open Discussion (Meeting topic: infra)"19:56
clarkbAny other items that we want to call out before we run out of time?19:56
fungithe upcoming cinder volume maintenance in rax-dfw next month should no longer impact any of our servers19:57
clarkbthank you for handling that19:58
fungii went ahead and replaced or deleted all of them, with the exception of nb04 which is still pending deletion19:58
ianw++ thanks fungi!19:58
fungithey've also cleaned up all our old error_deleting volumes now too19:58
funginote however they've also warned us about an upcoming trove maintenance. databases for some services are going to be briefly unreachable19:58
fricklermaybe double check our backups of those are in good shape, just in case?19:59
fungi#info provider maintenance 2020-09-30 01:00-05:00 utc involving ~5-minute outages for databases used by cacti, refstack, translate, translate-dev, wiki, wiki-dev19:59
clarkbI think ianw did just check those but ya double checking them is a good idea19:59
clarkband we are at time.20:00
clarkbThank you everyone!20:00
clarkb#endmeeting20:00
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"20:00
fungithanks clarkb!20:00
openstackMeeting ended Tue Sep 15 20:00:09 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
openstackMinutes:        http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-15-19.01.html20:00
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-15-19.01.txt20:00
openstackLog:            http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-15-19.01.log.html20:00

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!