Tuesday, 2020-09-15

*** Dmitrii-Sh9 has joined #opendev00:40
*** Dmitrii-Sh has quit IRC00:41
*** Dmitrii-Sh9 is now known as Dmitrii-Sh00:41
openstackgerritwu.shiming proposed openstack/diskimage-builder master: Remove install unnecessary packages  https://review.opendev.org/75192601:00
ianw[2020-09-15 01:16:46.308] "GET /api/status HTTP/1.1" 200 68963 "cache hit" "https://zuul.openstack.org/status" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0"01:17
ianwthat's better01:17
ianwhttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64789&rra_id=all seems to show the host isn't under too much pressure01:22
openstackgerritwu.shiming proposed openstack/diskimage-builder master: Remove install unnecessary packages  https://review.opendev.org/75192602:06
*** iurygregory has quit IRC02:08
ianwERROR: No matching distribution found for oslo.log===4.4.0 (from -c /opt/stack/requirements/upper-constraints.txt (line 265))02:19
ianwseem to hit a bunch of dib jobs too :/02:20
fungiin ovh-gra1 again?02:23
ianwseems so, putting it together02:23
ianwfungi: iirc, you had to run a loop to occasionally get bad returns right?02:23
fungiyeah, just stuck date and wget in a for loop with a sleep 102:26
fungiwhile :;do sleep 10;echo `wget -SO- https://pypi.org/simple/libvirt-python/ 2>&1|grep -e 'X-Served-By: .*' -e '>libvirt-python-6\.6\.0\.tar\.gz<'|sed -e 's/.*X-Served-By: //' -e 's/.*pythonhosted.*>\(libvirt-python-6\.6\.0\.tar\.gz\)<.*/\1/'`;done|tee results02:27
fungithat's an example from when i was seeing libvirt-python-6.6.0.tar.gz sometimes missing02:28
fungifrom the simple api index02:28
fungii guess it was every 10 seconds02:29
fungiand i didn't timestamp them after all02:29
fungii was just logging which fastly backends returned what02:31
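For reference, a more readable sketch of the one-liner fungi pastes above — the same commands, just split out and commented; the package and filename are the ones from his libvirt-python example:

    # poll the simple index every 10 seconds; log which fastly caches served
    # the response and whether the expected sdist link is present
    while :; do
        sleep 10
        # $(...) stands in for the backticks in the original paste and
        # collapses the header line and the filename match onto one line
        echo $(wget -SO- https://pypi.org/simple/libvirt-python/ 2>&1 \
            | grep -e 'X-Served-By: .*' -e '>libvirt-python-6\.6\.0\.tar\.gz<' \
            | sed -e 's/.*X-Served-By: //' \
                  -e 's/.*pythonhosted.*>\(libvirt-python-6\.6\.0\.tar\.gz\)<.*/\1/')
    done | tee results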
ianwgot it running, so far it's returning the right thing but we'll see02:36
ianwsigh, and something up with fedora mirror ... it's not a good day for the dib gate :/02:42
prometheanfireheh, 6 seconds off02:44
*** hashar has joined #opendev02:48
ianwthe results are something like "cache-bwi5141-BWI, cache-cdg20768-CDG"02:59
ianwwhere the first part is constant, and the second part seems to choose one of 58 different instances02:59
ianwit's just looping around the same 58 results now, and all are returning the correct thing it seems03:17
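To get the counts ianw quotes, assuming the `results` file produced by the loop above (the two fastly cache names are the first two whitespace-separated fields):

    # how many distinct second-hop (local POP) caches have answered
    awk '{print $2}' results | sort -u | wc -l
    # full breakdown of which cache pairs answered, and how often
    awk '{print $1, $2}' results | sort | uniq -c | sort -rn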
ianwfor reference the failing results were in https://review.opendev.org/#/c/747878/03:19
ianw2020-09-15 01:21:01.448075 | ubuntu-bionic | ERROR: No matching distribution found for oslo.log===4.4.0 (from -c /opt/stack/requirements/upper-constraints.txt (line 265))03:19
ianwat or around that time03:19
hasharianw: looks like airport codes. bwi baltimore, cdg for paris ;)03:46
hasharParis, in France03:46
ianwyeah, so still 58 different backends, all responding correctly04:13
ianw2020-09-14 22:22:08.327035 | controller | ERROR: Could not find a version that satisfies the requirement oslo.log===4.4.004:15
*** ykarel|away has joined #opendev04:17
*** ykarel_ has joined #opendev04:21
*** ykarel|away has quit IRC04:24
*** Topner has joined #opendev04:32
*** fressi has joined #opendev04:35
*** Topner has quit IRC04:36
*** fressi has quit IRC04:44
*** hashar has quit IRC05:13
*** ysandeep|away is now known as ysandeep05:13
*** ykarel_ has quit IRC05:22
*** ykarel_ has joined #opendev05:23
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-docker: only run docker-setup.yaml when installed  https://review.opendev.org/74706205:48
openstackgerritIan Wienand proposed zuul/zuul-jobs master: update-json-file: add role to combine values into a .json  https://review.opendev.org/74683405:48
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-docker: Linaro MTU workaround  https://review.opendev.org/74706305:48
*** ykarel_ has quit IRC06:10
openstackgerritRico Lin proposed zuul/zuul-jobs master: Allow skip files when download logs  https://review.opendev.org/75197306:11
*** ykarel_ has joined #opendev06:11
*** lpetrut has joined #opendev06:14
*** qchris has quit IRC06:20
*** ykarel_ is now known as ykarel06:21
ianwinfra-root: ^ found https://58aca06c0a1750aa55ba-25dc1a1096eaf82a51392114c0a4a973.ssl.cf2.rackcdn.com/732435/7/check/heat-functional/852a5bf/controller/logs/screen-barbican-keystone-listener.txt06:22
ianwin a browser that gives me a "content encoding error", when I curl it I get back a 500 page06:23
ianwAn error occurred while processing your request.<p>06:23
ianwReference&#32;&#35;112&#46;9b052017&#46;1600151007&#46;18aba19906:23
ianwprobably something to keep an eye on06:24
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-docker: only run docker-setup.yaml when installed  https://review.opendev.org/74706206:27
openstackgerritIan Wienand proposed zuul/zuul-jobs master: update-json-file: add role to combine values into a .json  https://review.opendev.org/74683406:27
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-docker: Linaro MTU workaround  https://review.opendev.org/74706306:27
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Retire Fedora 31 for 32  https://review.opendev.org/75197506:27
fricklerianw: I've seen these content errors before and attributed them to upload errors. as long as they are only sporadic I guess that would be fine06:28
*** diablo_rojo has quit IRC06:29
*** qchris has joined #opendev06:34
ianwfrickler: ok, thanks; i guess with the amount we upload something is going to go wrong06:36
*** iurygregory has joined #opendev06:58
*** slaweq has joined #opendev07:05
*** andrewbonney has joined #opendev07:14
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Add support to use stow for ensure-python  https://review.opendev.org/75161107:15
*** hashar has joined #opendev07:24
*** fressi has joined #opendev07:27
*** tosky has joined #opendev07:34
*** priteau has joined #opendev07:39
openstackgerritWitold Bedyk proposed openstack/project-config master: Use only noop jobs for openstack/monasca-transform  https://review.opendev.org/75198307:58
*** moppy has quit IRC08:01
*** moppy has joined #opendev08:02
*** sshnaidm|afk is now known as sshnaidm08:04
*** DSpider has joined #opendev08:06
openstackgerritWitold Bedyk proposed openstack/project-config master: End project gating for openstack/monasca-analytics  https://review.opendev.org/75198708:20
openstackgerritRico Lin proposed zuul/zuul-jobs master: Allow skip files when download logs  https://review.opendev.org/75197308:21
*** ricolin has joined #opendev08:26
*** ykarel is now known as ykarel|lunch08:28
ricolinianw, please review again if you got time:) https://review.opendev.org/#/c/75197308:33
openstackgerritWitold Bedyk proposed openstack/project-config master: End project gating for openstack/monasca-analytics  https://review.opendev.org/75198708:39
openstackgerritWitold Bedyk proposed openstack/project-config master: Remove openstack/monasca-analytics  https://review.opendev.org/75199308:46
*** ysandeep is now known as ysandeep|lunch08:55
*** dtantsur|afk is now known as dtantsur09:00
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed zuul/zuul-jobs master: Add support to use stow for ensure-python  https://review.opendev.org/75161109:09
*** ykarel|lunch is now known as ykarel09:12
openstackgerritSorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Made parse_jenkins_failure a non static  https://review.opendev.org/75199909:19
openstackgerritSorin Sbarnea (zbr) proposed opendev/system-config master: Sorted statusbot nick  https://review.opendev.org/75200809:48
*** tkajinam has quit IRC09:49
openstackgerritSorin Sbarnea (zbr) proposed opendev/system-config master: Allow zbr to use statusbot  https://review.opendev.org/75200909:53
openstackgerritwu.shiming proposed openstack/diskimage-builder master: Update imp module to importlib  https://review.opendev.org/75201110:02
*** ysandeep|lunch is now known as ysandeep10:10
*** slaweq has quit IRC10:13
*** slaweq has joined #opendev10:21
*** stephenfin has joined #opendev10:39
*** fressi has quit IRC11:16
fricklerinfra-root: mirror.kna1.airship-citycloud.opendev.org has no /afs mount, leading to lots of 403s. there was a process stuck trying to do "rmmod openafs" since the node was rebooted on friday. trying another reboot now11:22
*** fressi has joined #opendev11:32
*** Goneri has joined #opendev11:44
fungii don't recall us rebooting it11:52
fungilast manual reboot we logged for it at https://wiki.openstack.org/wiki/Infrastructure_Status was 2020-05-2011:53
*** lpetrut has quit IRC11:54
fricklermy reboot sadly hasn't helped, either11:54
fricklerso I'm deferring to someone with more experience debugging afs11:54
frickleroh, now there's a backtrace in dmesg11:55
fricklerhttp://paste.openstack.org/show/797875/11:56
*** rosmaita has joined #opendev11:57
fungiyeah, not super useful without the ksyms12:08
fungilooks from /var/log/dpkg.log like unattended-upgrades replaced the kernel package 5 days ago and may have suffered a failure12:13
fungii'm trying to manually reinstall the kernel package and headers and rebuild the openafs module12:15
fungiand rebooting it12:17
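Roughly what that repair looks like on an Ubuntu host — a sketch only; the exact kernel flavour and whether the module comes from openafs-modules-dkms are assumptions, not taken from the log:

    # reinstall the kernel image/headers that unattended-upgrades touched
    sudo apt-get install --reinstall linux-image-generic linux-headers-generic
    # rebuild the out-of-tree openafs module against those headers
    sudo apt-get install --reinstall openafs-modules-dkms openafs-client
    sudo dkms autoinstall
    sudo reboot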
*** Goneri has quit IRC12:23
*** sshnaidm has quit IRC12:25
*** slaweq has quit IRC12:30
fungi#status log reinstalled the kernel, kernel headers, and openafs-client on mirror.kna1.airship-citycloud.opendev.org and rebooted it, as it seems to have possibly been previously rebooted after an incomplete package update12:34
openstackstatusfungi: finished logging12:34
fungifrickler: seems to be working for me now, but i don't have time to test thoroughly just yet12:34
*** slaweq has joined #opendev12:35
*** priteau has quit IRC12:42
*** sshnaidm has joined #opendev12:50
*** priteau has joined #opendev12:55
fungii'm heading out to run errands, but should be back by 15:00 utc12:57
*** lpetrut has joined #opendev13:03
*** priteau has quit IRC13:15
*** hashar has quit IRC13:16
auristorfungi frickler ianw:  RIP: 0010:afs_CellNumValid+0x4f/0xd0 [openafs] is not the result of an incomplete package update.   it's due to a race during the startup of the openafs kernel module.  if another process attempts to read from openafs /proc data or issue an afs pioctl like the "fs" command or attempts to set tokens using "aklog" while afsd is configuring the kernel module, this crash can occur.13:26
zbrso basically there was nothing wrong with pypi?13:47
*** fressi_ has joined #opendev13:51
*** fressi has quit IRC13:51
*** fressi_ is now known as fressi13:51
*** qchris has quit IRC13:53
*** bbezak_ has joined #opendev13:57
*** ShadowJonathan_ has joined #opendev13:57
*** jentoio_ has joined #opendev13:57
*** davidlenwell_ has joined #opendev13:57
*** gouthamr__ has joined #opendev13:57
*** priteau has joined #opendev13:57
*** gmann_ has joined #opendev13:57
*** cgoncalves has quit IRC13:58
*** jbryce_ has joined #opendev13:58
*** mrunge_ has joined #opendev13:59
*** gouthamr has quit IRC14:00
*** ShadowJonathan has quit IRC14:00
*** jbryce has quit IRC14:00
*** davidlenwell has quit IRC14:00
*** bbezak has quit IRC14:00
*** fungi has quit IRC14:00
*** Eighth_Doctor has quit IRC14:00
*** logan- has quit IRC14:00
*** gmann has quit IRC14:00
*** mrunge has quit IRC14:00
*** jentoio has quit IRC14:00
*** jroll has quit IRC14:00
*** knikolla has quit IRC14:00
*** dtantsur has quit IRC14:00
*** jbryce_ is now known as jbryce14:00
*** bbezak_ is now known as bbezak14:00
*** ShadowJonathan_ is now known as ShadowJonathan14:00
*** gouthamr__ is now known as gouthamr14:00
*** davidlenwell_ is now known as davidlenwell14:00
*** jentoio_ is now known as jentoio14:00
*** logan_ has joined #opendev14:00
*** gmann_ is now known as gmann14:00
*** logan_ is now known as logan-14:00
*** qchris has joined #opendev14:04
*** knikolla has joined #opendev14:07
*** fungi has joined #opendev14:07
*** Eighth_Doctor has joined #opendev14:08
*** ysandeep is now known as ysandeep|away14:08
*** dtantsur has joined #opendev14:08
*** jroll has joined #opendev14:09
*** Tengu has joined #opendev14:17
*** Tengu has quit IRC14:25
*** cgoncalves has joined #opendev14:26
*** Tengu has joined #opendev14:27
clarkbzbr: pypi is proxied and does not involve afs. Also pypi issues were in another cloud  I believe these are separate problems14:28
*** Topner has joined #opendev14:30
zbrclarkb: thanks for clarifying. it didn't make any sense to me to use a distributed fs for a proxy.14:32
zbrclarkb: was anyone able to reproduce the pypi issue? see https://discuss.python.org/t/any-chance-to-an-issue-tracker-for-pypi-org-operational-problems/5219/314:34
zbrwithout a curl -v ... output from a specific address, nobody will start working on it.14:35
*** Topner has quit IRC14:35
clarkbianw discovered there are 58 backends14:36
clarkband each likely has multiple servers14:36
clarkbpreviously the issue was attributed to fastly errors14:37
clarkband oslo.log 4.4.0 was updated around when those fastly errors happened last month14:37
clarkbI assume they just never fully resynced after that and there is 1/1000 cdn nodes that has a problem14:37
zbrseems like a time to write a script that tests an URL with all backends14:37
zbri bet it's more than this, the number of failures I've seen with my changes was more like 1/5 failed.14:38
clarkbzbr: but only in the last day14:39
zbrwas the whl or the tar.gz that failed to download?14:39
clarkbzbr: neither, it's the index missing the latest package version which constraints requires14:39
clarkbthe index is old (like many weeks old)14:40
clarkbif we didn't use constraints we'd install the older version and be fine (likely why most people don't notice)14:41
clarkbbecause I'm sure other packages have the same problem14:42
Tenguconsistency issue in mirror/cdn? that's.... great.14:43
louroto/ we're getting apparently random POST_FAILUREs, e.g. here twice in a row: https://review.opendev.org/#/c/748650/ - is it known / related to what is being discussed here? thanks!14:46
clarkblourot: likely unrelated14:47
clarkbmirrors tend to be used in pre and run not post14:47
clarkbzbr: ianw tracked this down previously and this is the logs of what it was attributed to http://kafka.dcpython.org/day/pypa-dev/2020-08-25#00.11.33.PSFSlack14:48
clarkbzbr: note that oslo.log was released around the time of that incident14:48
clarkbI don't know if the issue is in fastly or pypi but its like they simply never recovered from that14:48
Tenguthough it's marked as "resolved" 22 days ago for fastly at least.14:49
Tenguclarkb: any link to those 58 backends?14:51
clarkbTengu: sure they may think it was resolved :) but they are still serving stale oslo.log indexes several weeks later14:51
clarkbTengu: you have to do the requests and see the headers14:51
clarkbdo enough of them and you build up a large collection14:52
Tenguclarkb: what request exactly? I came late to the party ^^'14:52
clarkbI don't know that they publish them anywhere14:52
clarkbTengu: https://pypi.org/simple/oslo.log is the source material. Then many of the failures were happening in our ovh gra1 region which has a caching proxy to that url at https://mirror.gra1.ovh.opendev.org/pypi/simple/oslo.log14:53
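A quick way to check what clarkb describes — whether the index being served right now, from PyPI directly or through the region mirror, lists the release constraints wants:

    # a non-zero count means oslo.log 4.4.0 is present in the served index
    curl -s https://pypi.org/simple/oslo.log/ \
      | grep -c 'oslo.log-4.4.0.tar.gz'
    curl -s https://mirror.gra1.ovh.opendev.org/pypi/simple/oslo.log/ \
      | grep -c 'oslo.log-4.4.0.tar.gz'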
clarkblourot: your errors are related to a new swift backend we added yesterday to store logs14:53
*** user_19173783170 has quit IRC14:53
Tenguyup. I'm getting 404 on that ovh proxy.14:53
clarkbTengu: it shouldn't 40414:54
Tenguclarkb: maybe purging some cache... ?14:54
clarkb(it was never 404ing before)14:54
clarkbthe issue is that the index.html served by pypi (and cached by our proxy) was stale and lacked the 4.4.0 release14:54
Tenguoh. funky. lemme recheck the URL I used14:54
clarkbnote that url does not 404 and the 4.4.0 release is currently present if I load it14:55
Tenguindeed.14:55
Tenguoslo.log-4.4.0.tar.gz#sha256=14:55
fungiclarkb: zbr: to be exact, that's 58 endpoints which respond for requests from ovh-gra1, they won't be the same endpoints which respond from other parts of the world necessarily14:55
Tenguah. the 404 was on https://mirror.gra1.ovh.opendev.org/simple/oslo-log/   apparently missing the "pypi" prefix.14:56
Tenguthat URL comes from the CI job I've checked.14:56
*** Topner has joined #opendev14:56
clarkbfungi: maybe, I'm not sure how ianw generated the list14:58
clarkbTengu: can you share links to that? that appears to be a different issue14:58
fungiclarkb: a while loop over wget -S grepping out the X-whatever header which lists the fastly chain14:58
Tengu1s14:58
clarkbfungi: wgeting pypi.org directly?14:58
clarkbfrom the mirror?14:59
*** TheJulia has joined #opendev14:59
fungiyep, run on a vm in that cloud region (i assume on our mirror server for convenience)14:59
funginow that i'm back from errands i can try to continue the data collection experiment while multitasking with meetnigs14:59
Tenguclarkb: for instance, it hits https://mirror.gra1.ovh.opendev.org/pypi/simple which loads as expected. but every link in there is apparently /simple/<module-name>, missing the "pypi" prefix.14:59
clarkbalso can someone push a opendev/base-jobs change to remove the oe cloud from playbooks/base/post-logs.yaml and playbooks/base-minimal/post-logs.yaml? Its erroring (see lourot's link above) and my local disk is full so I can't do git things right now and I have a meeting to dial into :/14:59
Tengunot sure if it has anything to do though. No pypi/pip expert here, just a bit wondering.15:00
Tenguclarkb: for instance, in this job: https://e9bcfc635397bc0a3785-9ea232b51691ad27ed2d3f8993736155.ssl.cf2.rackcdn.com/751828/8/check/tripleo-build-containers-ubi-8/b164baf/job-output.txt15:00
clarkbTengu: but none of the links there should point at simple15:00
Tenguclarkb: sooooo.... that's the issue?15:00
clarkbTengu: if you load up the index page and inspect source you'll see they are all hosted under /pypifiles15:00
fungiprobably *an* issue but not the same issue we're digging into15:01
clarkbTengu: its possible they are serving stale and also different data15:01
fungior could be a related issue i guess15:01
clarkbbut different symptioms15:01
clarkbTengu: I don't think you are reading that error correctly15:02
clarkbTengu: what it is saying is that the index lists a bunch of versions but not 4.4.015:02
clarkbI don't see where it says https://mirror.gra1.ovh.opendev.org/simple/oslo-log/ is a 40415:02
Tenguclarkb: so. go here: https://mirror.gra1.ovh.opendev.org/pypi/simple   you get everything listed right? click on ANY link there, and it ends 404.15:02
clarkbTengu: that isn't what pip is doing15:03
Tengustill weird the links on that index page are just plain wrong.15:03
clarkbTengu: pip doesn't rely on the top level index (so if thats broken its a problem maybe but not the problem)15:04
Tengu'k.15:04
clarkbpip instead will directly lookup package names like /pypi/simple/oslo.log15:04
clarkbthat gives it an html index with a list of versions and some metadata about the versions of python those packages support15:04
clarkbThe error you're seeing is that it sees a bunch of versions like 4.3.0 but not 4.4.0 which is what openstack's constraints requires15:04
clarkbbecause constraints requires a very specific version that does not exist pip then errors15:05
Tenguok. so that top-level index is just misleading basic humans hoping to just click'n'follow links then.15:05
fungior more specifically, which isn't included in the index it's returning for that package name15:05
clarkbthe last time we looked into this pypi said there were fastly errors, see http://kafka.dcpython.org/day/pypa-dev/2020-08-25#00.11.33.PSFSlack and those happened around the time when oslo.log was released. That makes me suspect that they never fully resynced the stale data after that issue15:05
Tenguso yeah, I do understand it doesnt find the right version - I was just surprised ending in a stupid 404 when just browsing the ovh host.15:06
clarkbI suspect if we released a oslo.log 4.4.1 this issue would correct15:06
clarkbI also suspect if pypi took their users at their word and resynced any packages published around august 24/25 we'd be good too15:06
clarkbbut without doing a billion requests to find the needle in the haystack I bet we won't get that15:07
Tenguso getting a dummy 4.4.1 might just do the trick15:07
Tenguand would probably be faster15:07
Tenguand we'd get 4.4.0 (the one we want in the requirements) and that new 4.4.115:07
openstackgerritClark Boylan proposed opendev/base-jobs master: Disable OE swift for log storage  https://review.opendev.org/75206615:08
yoctozeptomorning infra15:08
clarkbI found a VM image I could rm :) infra-root donnyd fyi ^ we should land that then debug why we get errors on some OE swift uploads15:08
clarkbTengu: that is my hunch15:09
fungii've got a loop going every 10 seconds testing https://pypi.org/simple/oslo.log/ looking for whether it includes oslo.log-4.4.0.tar.gz and logging the X-Served-By: header values in the responses15:09
fungirunning on mirror.gra1.ovh15:09
yoctozeptoI see you already know how bad the situation is15:09
clarkbTengu: the reason you get a 404 at the root index is because we're proxying pypi under a subpath, in this case /pypi, and we aren't rewriting the response content (just the headers)15:10
fungiyoctozepto: yes, i'm hoping we can collect sufficient data to convince the pypi admins there's an issue in fastly so they can report it or at least try to get the problem indices recirculated15:10
clarkbTengu: we can fix this by moving pypi proxy caching onto a separate port (then we can drop the subpath)15:11
Tenguclarkb: ahhh15:11
clarkbTengu: but pip doesn't care so we haven't bothered15:11
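The behaviour clarkb describes is easy to see from the command line (the mirror host and package are just the examples from this thread):

    # the per-package index works when requested under the /pypi prefix
    curl -sI https://mirror.gra1.ovh.opendev.org/pypi/simple/oslo.log/ | head -n1
    # a link copied out of the proxied top-level index drops that prefix,
    # because the HTML body isn't rewritten, and so it 404s
    curl -sI https://mirror.gra1.ovh.opendev.org/simple/oslo.log/ | head -n1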
*** ykarel is now known as ykarel|away15:12
fungiodds are there was a fastly endpoint in france which was down/disconnected during the incident around the time oslo.log 4.4.0 got uploaded, so missed the memo, and has recently been brought back online in the last day or so15:12
yoctozeptofungi: /me keeping fingers crossed for the infra team :-)15:12
smcginnisOf course, all the oslo deliverables were released around the same time, so we would probably have to refresh all of them.15:14
yoctozeptosmcginnis: ++, I saw different oslos failing indeed15:14
*** slaweq has quit IRC15:19
*** slaweq has joined #opendev15:23
clarkbyoctozepto: can you link to the other similar failures? we should double check that they aren't python version mismatches and that they exhibit the same basic behavior (may help in tracking it down)15:27
clarkbfungi: fyi https://review.opendev.org/#/c/752066/15:28
clarkbdonnyd: re ^ we are getting HTTP 400 errors for container urls looks like15:28
*** ysandeep|away is now known as ysandeep15:28
clarkbdonnyd: there isn't a whole lot of detail. Maybe you can see more on your side? zuul_opendev_logs_d0d is one that had this problem15:29
donnydThat is very strange15:29
clarkbdonnyd: happened around 2020-09-15 14:38:16 from ze09.openstack.org (it should have ipv4 and ipv6 addrs if you need to check both)15:30
donnydWell that would be because container d0d was never created15:35
donnyddonny@office:~> openstack container list |grep d0d15:35
donnyddonny@office:~>15:35
yoctozeptoclarkb: sure, I'll create a paste of them15:37
donnydnow that is strange15:39
donnydWhen I checked at 6am there were 1004 containers... when I check just now... there are 1004 containers15:39
donnydthere should be more15:39
donnydmaybe there is a quota limit on the # of containers by default15:40
yoctozeptoclarkb: done: http://paste.openstack.org/show/797891/15:41
johnsomOh joy, the packages are still not loading... Let me know if I can collect any info for you.15:42
donnydannnnd I found the issue15:45
donnydhttps://www.irccloud.com/pastebin/P6b5ZSGC/15:46
donnydclarkb: ^^^15:46
clarkbdonnyd: we need to increase max buckets?15:46
clarkbdonnyd: let me do some math to give you an idea of how many we use15:46
clarkb(its capped)15:46
*** lpetrut has quit IRC15:47
donnydyep15:48
donnydI would rather just disable it15:48
donnydhttps://www.irccloud.com/pastebin/nvyGm5h7/15:50
clarkbdonnyd: ah ok I was having a hard time finding where we generated it (I know I edited it in the past and it's capped)15:51
clarkbremoving the limit works too15:51
donnydshould be good to go now15:51
clarkbdonnyd: infra-root I WIP'd https://review.opendev.org/#/c/752066/1 as we think ^ should fix it15:51
donnydcan I get someone with the openstackzuul user to try to create a bucket?15:52
clarkbdonnyd: ya I can create a test bucket15:52
clarkbdonnyd: Unauthorized (HTTP 401)15:54
clarkbI can list them though15:54
donnydso I was getting the same error and I thought it was maybe just me15:55
donnydok, I will get it fixed and get back to you15:55
clarkbdonnyd: should we land the disabling change for now?15:55
clarkbI assume that error will hit jobs if they try to create containers too15:55
donnydyes15:55
donnydI would assume until I get this sorted it will fail jobs15:56
clarkbok I've approved the change15:56
clarkbthank you for looking at it15:56
donnydso setting that value to -1 does not work15:57
donnydit makes things come back as unauth15:57
donnyd    "max_buckets": 999999999,15:58
donnydtry it now15:58
clarkbdonnyd: that worked it created clarkb_test15:59
clarkbcan you double check you see ^15:59
clarkbremoved my approval from the base-jobs update now :)15:59
donnyddonny@office:~> openstack container list |grep clark15:59
donnyd| clarkb_test           |15:59
donnydyep, I can see15:59
donnydI will have to figure out how to just disable that entirely16:00
donnyddon't want to get to 1000000000 and have it fail again16:01
donnydshould be good for now though16:01
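For the record, assuming the backend is Ceph radosgw (the max_buckets field above is what radosgw-admin reports), the quota bump donnyd describes would look something like this — the uid is a guess based on the account name mentioned earlier, not taken from his paste:

    # raise the per-user bucket cap for the log-upload account
    radosgw-admin user modify --uid=openstackzuul --max-buckets=999999999
    # confirm the new value
    radosgw-admin user info --uid=openstackzuul | grep max_buckets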
clarkbdonnyd: we shard with the first 3 chars of the build uuid. build uuids can have a-f and 0-9 in them which is a total of 16 * 15 * 14 containers I think16:03
clarkbdonnyd: so ~336016:03
donnydso we should be good with 999999999 for the foreseeable future16:03
clarkbya16:03
fungiyeah, i guess the default was just only enough for roughly a third of what we needed16:04
clarkber no its 16 * 3 because you can have duplicates16:04
clarkbsorry I was thinking of deck of cards16:04
clarkber 16 ** 316:04
clarkbmath I can do it :)16:04
clarkb409616:04
clarkbif you want to set it to 8192 it should be plenty16:04
fungis/third/quarter/ then ;)16:04
donnydok, 24576 is what I set it to16:05
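A quick sanity check of the arithmetic settled above:

    # three hex characters of the build uuid => 16^3 possible container names
    echo $((16 ** 3))        # 4096
    echo $((24576 / 4096))   # donnyd's limit leaves 6x headroom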
donnydI can still create container16:06
donnydI think this is resolved16:06
donnydsorry for that16:06
donnydshould have thought about it before16:06
donnydexit16:06
donnydwrong terminal16:07
fungiexeunt!16:07
*** ykarel|away has quit IRC16:09
-openstackstatus- NOTICE: Our PyPI caching proxies are serving stale package indexes for some packages. We think because PyPI's CDN is serving stale package indexes. We are sorting out how we can either fix or workaround that. In the meantime updating requirements is likely the wrong option.16:09
openstackgerritMerged opendev/elastic-recheck master: Create elastic-recheck container image  https://review.opendev.org/75095816:14
*** ysandeep is now known as ysandeep|away16:16
auristorno16:30
auristoroops16:30
*** Topner has quit IRC16:33
*** fressi has left #opendev16:35
*** dtantsur is now known as dtantsur|afk16:46
*** slaweq has quit IRC16:47
clarkbugh I've run out of local disk space again. That 3.2GB went quickly. I must have a misbehaving daemon chewing up log space or something17:15
clarkband now I'm deep in the bowels of btrfs balancing17:30
fungimy sympathies17:35
fungii'm back to trying to bang my head against the storyboard continuous deployment being noncontinuous17:36
fungiand also getting openstack wallaby cycle elections underway17:36
*** paladox has quit IRC17:37
*** paladox has joined #opendev17:38
*** tosky has quit IRC17:40
clarkbthe good news is I think I've sorted my problem out17:42
clarkbit's basically defragmenting btrfs17:42
clarkbI thought we left those problems behind17:43
smcginnisDoes btrfs have a pleasing UI to watch it defrag? :)17:44
clarkbsmcginnis: btrfs balance status -v /path17:44
clarkbup to you if you find that pleasing17:44
smcginnisI want to watch all the little blocks move around. Was my favorite part about defrag.exe.17:44
*** diablo_rojo has joined #opendev17:44
clarkbfedora users beware :)17:45
clarkbI'm going to ensure both of my btrfs filesystems are happy before doing anything else today as I've hit problems with it a couple times in a short period and it's super frustrating17:48
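For anyone hitting the same state — btrfs refusing writes while most of the allocated space is actually slack — the usual recovery is a filtered balance; a generic sketch, not clarkb's exact commands:

    # show allocated vs used space per chunk type
    sudo btrfs filesystem usage /
    # compact data chunks under 50% full to release unused allocation
    sudo btrfs balance start -dusage=50 /
    # watch progress (the command mentioned above)
    sudo btrfs balance status -v /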
clarkbok I think I'm set now with plenty of free space17:57
clarkbback to trying to grok the pypi things17:57
johnsomI had to dump btrfs. The VMs were killing it with painful CoW chains.18:00
fungiError: /Stage[main]/Storyboard::Application/Exec[install-storyboard]: 'pip install /opt/storyboard' returned 1 instead of one of [0]18:02
fungiwell, doesn't give me the output from pip, but at least it's somewhere to start from18:02
clarkbfungi: I'm pulling up recent fails from the zuul status page a few within the last ~6 minutes. Anyway out of my ~3 examples all of them have the versions in the index files if I request them through the mirror in my browser18:02
clarkbthat does make me wonder if pip is doing something with headers or maybe fetching some other path that is breaking18:03
fungiyeah, and i still haven't caught any wgets to pypi from the mirrors returning stale content18:03
fungiwhat packages/releases/regions were those? anything besides oslo.log 4.4.0 in ovh-gra1 and oslo.service 2.4.0 in rax-dfw?18:04
clarkbya https://mirror.ord.rax.opendev.org/pypi/simple/python-neutronclient/ from https://ace7c3bbea1b346d61ae-b7ce26267fa7738cf72e86f3ba4b5e8c.ssl.cf5.rackcdn.com/751040/3/check/openstack-tox-py38/20a4543/job-output.txt18:04
clarkbhttps://mirror.mtl01.inap.opendev.org/pypi/simple/reno/ from https://api.us-east.open-edge.io:8080/swift/v1/AUTH_e02c11e4e2c24efc98022353c88ab506/zuul_opendev_logs_66f/751040/3/check/openstack-tox-py36/66f5320/job-output.txt and https://ee2d68f892c305f185f6-70dee07df320095fd5d268c658f557da.ssl.cf2.rackcdn.com/749748/5/check/openstack-tox-docs/3a08312/job-output.txt18:05
clarkbalso thats two different hosts in the same region seeing the same content which makes it extra weird that if I fetch it I can't reproduce18:05
fungiso i could also start similarly polling in rax-ord for python-neutronclient <someversion> and...18:05
clarkbgranted I may be fetching after the ttl has expired on that18:05
fungiwere any of the failures you observed for anything you already tried to flush in the cdn?18:06
clarkbno I've only tried to flush oslo.log18:06
fungijust wondering whether we have positive evidence of that workaround not helping18:07
clarkbanother curiousity is that its always packages we control it seems like18:07
clarkband all the things have releases since end of august18:08
fungiokay, i've confirmed the fact that we stopped deploying storyboard after the change to drop python2.7 was no coincidence: storyboard-api requires Python '>=3.6' but the running Python is 2.7.1218:08
clarkbkind of makes me wonder if it is something about pushing newer packages18:08
clarkbfungi: pip sets cache-control: max-age=0 when fetching indexes eg /pypi/simple/reno18:22
fungineat, so we're always going to cache-miss those regardless18:22
*** andrewbonney has quit IRC18:23
clarkbthe server then sets cache-control: max-age=60018:23
clarkbI think max-age=0 from the client means the proxy has to recheck the content is hasn't changed18:24
clarkbbut that is a simple check and maybe we're breaking in that somehow18:24
clarkbexcept if that was the case I'd expect it to serve me the broken version when I load it 6 minutes later18:24
fungiit does also debunk the theory that we're somehow magnifying the problem by caching and continuing to serve stale responses for multiple consecutive requests18:25
clarkbfungi: fwiw I sorted out pip request headers via tcpdump and installing reno via http://mirror.mtl01.inap.opendev.org18:27
clarkbthe index contents themselves are gzip'd so I'll need to do better packet capture than tcpdump -A to verify those contents; however I figure the install is working for reno 3.2.0 so the index contents likely won't show anything useful18:27
clarkbfungi: for reproducing though we may need to set max-age=0 like pip does18:30
clarkbbecause that may also effect the fastly caches18:31
clarkbyour wgets are probably not setting that so can use good cached data, but maybe this breaks when fastly has to refresh or something18:31
clarkbin the case of our jobs talking through our caches pip will make a request with max-age=0. That will force our cache to check with fastly if things have been updated. That check may not actually include the max-age=0 though its probably implied when you are checking updated timestamps. If fastly says we are stale then we send the request to reup the value to fastly which will include the max-age=0 which18:34
clarkbshould force fastly to repopulate data?18:34
clarkbif fastly says we are good then we just serve the cached content18:34
clarkbfungi: I wonder if we shouldn't try and trace this out and draw it up18:37
clarkbjust thinking through it in my head it does seem like we'd get stale indexes more often this way if there are stale indexes in fastly18:38
clarkbbecause pip is going to always check and if they are stale compared to what we've got we'll be told to "update"18:38
clarkbthen we fetch the bad data and have a sad. Then the next request that comes by checks and sees we are actually stale and updates again and gets good results18:38
clarkbwhich can explain why this is so difficult to diagnose18:39
fungiso should i adjust wget to send a max-age=0 with the request to get a pull through the cache?18:40
clarkbyes I think so in order to mimic what pip is doing18:43
clarkb(and I wonder if it is that behavior that increases the odds of getting the stale content)18:43
clarkbhttps://868d9d181f45f3dc08e4-0ff94bb25fd1b2b22d207854a15b8765.ssl.cf5.rackcdn.com/744883/16/check/openstack-tox-py36/96380ed/job-output.txt just complained about https://mirror.ord.rax.opendev.org/pypi/simple/taskflow/ but loading it in my browser it's fine18:45
fungi--header='Cache-Control: max-age=0' i guess?18:49
clarkbyes that looks correct to me18:49
*** rosmaita has quit IRC18:49
fungiokay, i've incorporated that into my test loops18:50
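The revised poll would look roughly like this — package, mirror, and interval taken from earlier in the discussion; the user agent string is a placeholder to be copied from a job log:

    UA='pip/<version and platform copied from a job log>'   # placeholder
    while :; do
        sleep 10
        date -u +%FT%TZ
        # force revalidation the way pip does and count hits for the
        # release that constraints requires
        curl -s --compressed -A "$UA" \
             -H 'Cache-Control: max-age=0' \
             https://mirror.gra1.ovh.opendev.org/pypi/simple/oslo.log/ \
          | grep -c '>oslo.log-4\.4\.0\.tar\.gz<'
    done | tee -a results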
*** Topner has joined #opendev19:08
*** Topner has quit IRC19:13
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Retire Fedora 31 for 32  https://review.opendev.org/75197519:15
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-docker: only run docker-setup.yaml when installed  https://review.opendev.org/74706219:15
openstackgerritIan Wienand proposed zuul/zuul-jobs master: update-json-file: add role to combine values into a .json  https://review.opendev.org/74683419:15
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-docker: Linaro MTU workaround  https://review.opendev.org/74706319:15
*** priteau has quit IRC19:16
clarkbfungi: for example https://702b7e8f253d29e679a6-2fe3f6c342189909aad5220492fb4721.ssl.cf1.rackcdn.com/743189/10/check/openstack-tox-py38/2c678fd/job-output.txt complains about octavia-lib==2.2.0 but says 2.1.1 is valid. If you view source on https://mirror.mtl01.inap.opendev.org/pypi/simple/octavia-lib/ they appear to have the same interpreter requirements19:59
clarkbit really does seem like we're getting the previous index19:59
clarkbrather than breaking with python versions19:59
clarkbhttps://etherpad.opendev.org/debugging-pypi-index-problems I'm going to start taking notes there20:00
johnsomWhich package are you using for the cache? squid?20:03
fungiapache mod_cache20:03
clarkbbut as frickler reported this has been observed locally running devstack without our mirrors20:05
clarkband we have seen this before talking directly to fastly too20:05
clarkb(strongly implying it isn't apache that is at fault)20:05
johnsomI assume you have tried a flush on an apache instance that is behaving strange too. That would also rule out mod_cache issue20:09
clarkbjohnsom: that is what we just discussed in our meeting20:09
clarkbI think we want to observe what we've cached first20:09
fungiwe talked about it in the meeting. if it continues after we flush the cache then we can say it's likely not the cache, but if it stops after we flush the cache we don't know whether it was the cache flushing which resolved it or whether it just coincidentally cleared up on its own at the same time20:10
clarkb(made difficult by the structure of the cache data, but should be doable)20:10
johnsomYes, fair enough20:10
fungialso doesn't explain why its cropped up in different provider regions each with its own cache in the span of a day or so20:11
fungithe problem would have had to start impacting multiple caches on different virtual machines in different providers all at approximately the same time20:11
johnsomWell, we don't know it's a problem until upper-constraints update, then we all start looking for new versions20:11
fungii suppose it couldn't hurt to check when the problem entries were added to the constraints lists, but i understand we're seeing it complain about missing releases which were tagged several weeks ago20:12
clarkbjohnsom: ya though some of these packages like oslo.log are old20:12
clarkbdid we not update constraints for oslo.log until recently maybe?20:12
clarkbfungi: correct20:12
johnsom20 days ago, but the new rush for this is probably the switch to focal and needing py3.x support20:14
fungifolks started reporting the problem yesterday20:15
fungii think20:15
clarkbfungi: ya20:15
johnsomYeah, but it's happened at least twice before in the past20:15
fungioh, it's happened many times in recent years20:15
clarkbthe joys of cdn20:15
fungibut more recently we've looked into incidents on august 19 and august 2520:16
clarkbwe've definitely tracked it back to cdn returning stale data before20:16
clarkbthe difficulty now is that its persisting and not reproducing as easily20:16
clarkband last time it came up pypi pointed at a fastly outage20:16
clarkbmaybe its the same thing this time around and fastly hasn't noticed yet20:16
* clarkb finds lunch20:16
smcginnisNearly all projects should have been running in py3.x for quite some time now.20:17
smcginnisAnd yeah, some of these were updated in upper-constraints for weeks now.20:17
johnsomYeah, most have been, but the focal switch caused new problems that forced most everyone to bump lower-constraints20:17
fungiare these mostly lower-constraints jobs we're seeing?20:19
johnsomNo20:19
johnsomI have seen it across tempest and unit tests. It seems more regional than test related.20:20
ianwi don't exactly know what this means yet, but i've found 3 instances of what looks like the pypi python-designateclient page on mtl-120:22
ianwhttp://paste.openstack.org/show/797902/20:22
ianwyes!  this has headers20:24
ianwhttp://paste.openstack.org/show/797903/20:26
ianwfungi / clarkb: ^ double check my work here, but this looks like a cached version of the pypi page *without* designateclient 4.1.020:26
ianwDate: Tue, 15 Sep 2020 07:24:38 GMT20:27
clarkbwith no serial and no interpreter requirements20:28
clarkbwhen did we release 4.1.0?20:28
clarkbnot today ya?20:28
smcginnisNo, not today.20:28
smcginnisChecking...20:28
clarkbif so that points to fastly I think given the date20:28
*** Topner has joined #opendev20:28
smcginnisAugust 3 is the merge commit.20:28
ianwclarkb: yeah, you're right, no serial on that result, i didn't accidentally cut it off20:28
ianwdouble check mtl-01 mirror /var/cache/apache2/proxy/91/pq/qD/Xf/1O/Cmdy2okLRNwQ.header.vary/BJ/1z/1Z/_7/jD/HNLd0wzgcbGA.data20:29
smcginnisAnd python-designateclient also updated in upper-constraints on Aug 3.20:29
fungiokay, i've revamped the test loop to use curl, pass similar compression headers, the cache-control max-age=0 and also a lifelike pip ua i scraped from one of our logs20:30
ianwthere's now 3 other hits from walking the cache; they all have 4.1.0 and a serial20:30
fungiwhich is the mtl-01 mirror? ca-ymq-1.vexxhost?20:31
clarkbfungi: mtl01.inap20:31
fungiahh20:31
fungidid we get failures there too?20:31
clarkbianw make a copy so that htcacheclean doesn't accidentally get it20:31
clarkbfungi: yes all over20:32
ianwthat was the one mentioned in the meeting just now20:32
fungigot it20:32
*** Topner has quit IRC20:33
fungiand yeah, that cache example was modified today, probably created today but stat doesn't report creation time20:34
clarkbianw: if I zcat that I see no 4.1.0 but don't see the headers20:34
ianwheaders are in ./91/pq/qD/Xf/1O/Cmdy2okLRNwQ.header.vary/BJ/1z/1Z/_7/jD/HNLd0wzgcbGA.header20:34
clarkbI guess that is a separate file20:34
clarkbthanks20:34
clarkband ya I agree with what you've found20:35
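A rough recipe for the cache spelunking ianw did above, for anyone repeating it on another mirror (the package name is just this example; adjust the -mmin window as needed):

    # find recently written cache bodies under apache's cache root that
    # look like the package's simple index (bodies are gzip'd, use zgrep)
    sudo find /var/cache/apache2/proxy -name '*.data' -mmin -720 \
        -exec zgrep -l 'python-designateclient' {} + \
        | while read -r f; do
            echo "== ${f%.data}.header"
            # the matching response headers sit alongside, with .header in
            # place of .data: check Date, ETag, X-Served-By and whether
            # X-PyPI-Last-Serial is missing
            sudo cat "${f%.data}.header"
          done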
fungiX-Served-By: cache-lcy19265-LCY, cache-yul8920-YUL20:35
clarkbnote the index.html is completetly different too20:35
clarkbthe sdists and wheels are in two separate lists rather than a shared list sorted by version20:36
clarkbthee is no python interpreter requirements either20:36
clarkbI think thats what we needed to go to pypi. Though we should note we see similar all over the place20:37
clarkbits just this is what we managed to catch20:37
clarkbhttps://702b7e8f253d29e679a6-2fe3f6c342189909aad5220492fb4721.ssl.cf1.rackcdn.com/743189/10/check/openstack-tox-py38/2c678fd/job-output.txt is octavia-lib 2.2.0 on the same mirror20:38
clarkbianw: ^ maybe you want to run the same find on that and we can see if the same x-served-by shows up?20:38
ianwhttp://paste.openstack.org/show/797905/20:39
ianwcomparison of the two headers for the designate index page; one good, the pip one bad20:40
ianwok, i'll kill and see if we can find similar for octavia-lib20:41
ianwhttp://paste.openstack.org/show/797906/ would seem to be it20:42
clarkbthere is no x-last-serial in the bad one either20:43
ianwDate: Tue, 15 Sep 2020 07:22:02 GMT .. same time20:43
ianw~ same time, anyway20:43
clarkbLCY again20:43
clarkbtoo20:43
clarkbLCY is it you!?20:44
ianwbut, apache must be deciding that it can serve this data based on the headers coming back from pypi/fastly, right?20:44
fungii guess lcy is london?20:44
clarkbianw: yes, I think the way it works with the client setting max-age=0 is apache will ask the backend if the data it has is fresh20:44
ianwhashar i think pointed out they're airport codes20:44
johnsomThere is an "I love Lucy" joke in there20:45
fungiright, lcy is the iata code for london city airport20:45
clarkbianw: so what must've happened there is we asked LCY again if it was fresh and it said yes and we reused it?20:45
ianwthat's going to be the e-tag matching?20:45
* ianw goes to read exactly what etag is20:45
clarkbianw: I'm not entirely sure how apache checks but I would expect something like etags yes20:45
clarkbfungi: fwiw I found fastly docs that confirmed what you said about that header. The first region is the one that talks to pypi directly and the second is the one that we talk to20:48
ianwso theory is apache does a GET request, gets back headers, the returned ETag: must match what we have in the cache there, and we serve up the old data20:48
clarkbianw: ya20:48
clarkbianw: and the other times when it works we get the other etag and we match that to the other content we've cached20:48
clarkbif fastly stops serving the bad etag'd data we should stop using what we've cached20:49
clarkb(because it won't be considered valid anymore)20:49
clarkbI expect this is sufficient info to take to them20:50
clarkbwhile also noting that we're seeing it in a fairly globally distributed manner so it may affect more than just LCY20:50
clarkbrax dfw and ord have hit it, ovh gra1, and inap mtl01 at least20:51
clarkbwith various packages20:51
johnsomThe Last-Modified header seems suspicious as well. In the bad examples, it's 9 of April.20:51
clarkbjohnsom: ya and the content itself is all different too20:51
clarkbjohnsom: current content is a list sorted by version. Old content is two lists, one for sdists and one for wheels20:51
clarkbcurrent content has a serial in the html old content doesn't20:51
ianwwhen did <!--SERIAL 8050067--> get added?20:51
clarkbianw: I don't know20:52
ianwwas that a flag day that pypa regenerated every index file?20:52
clarkb(I just know that its there now :) )20:52
ianwthat might be a data point20:52
ianwi think we have quite a bit of data ... the question is ... who can do anything about it :)20:53
clarkbpypi I think. The warehouse issue tracker is where they ask for feedback and issues20:53
clarkbhttps://github.com/pypa/warehouse/issues20:53
fungior ping dstufft and di in #pypa-dev on freenode20:53
*** slaweq has joined #opendev20:54
clarkband zbr posted https://discuss.python.org/t/any-chance-to-an-issue-tracker-for-pypi-org-operational-problems/5219/3 which is related20:54
clarkbalso note that it isn't 404s...20:54
*** slaweq has quit IRC20:55
clarkbI'm happy to write the issue but don't want to step on toes (ianw did the magic that sorted it out I think)20:55
clarkblet me know how I can help and I'll gladly do what I can :)20:55
ianwi think we should get maybe a few more results from some of the other regions to amass a bigger picture20:55
clarkbianw: sounds good. Let me look through my job links to find some from not mtl0120:56
clarkbhttps://60ce53b5f45ed0ef7bd3-e873feb845d99f2e0685947947034235.ssl.cf1.rackcdn.com/751040/3/check/openstack-tox-pep8/1921a1e/job-output.txt oslo.serialization from rax dfw20:56
clarkbhttps://868d9d181f45f3dc08e4-0ff94bb25fd1b2b22d207854a15b8765.ssl.cf5.rackcdn.com/744883/16/check/openstack-tox-py36/96380ed/job-output.txt taskflow from rax ord20:56
ianwi need to get some breakfast :)  maybe we should etherpad up a few results with headers/timestamps and then we can write a good bug20:56
ianwbib20:56
clarkbhttps://api.us-east.open-edge.io:8080/swift/v1/AUTH_e02c11e4e2c24efc98022353c88ab506/zuul_opendev_logs_158/743189/10/check/openstack-tox-pep8/158d6e1/job-output.txt octavia-lib on ovh bhs120:57
fungii can sympathize with their desire not to maintain an "issue tracker for operational problems" any more than we want to, they want help tracking these problems down and assistance solving them, not some black box where people can drop their complaints20:57
clarkbhttps://etherpad.opendev.org/p/debugging-pypi-index-problems lets keep working there20:58
clarkbI'll incorporate what we've arleady found on mtl01 on that etherpad20:58
clarkbthen start on the list above20:58
*** slaweq has joined #opendev20:58
fungiwow, bug 1449136 is so very full of misinformation21:01
openstackbug 1449136 in tripleo "Pip fails to find distribution for package" [Critical,Triaged] https://launchpad.net/bugs/144913621:01
fungioutright blaming it on our caches, conflating the swift log upload issue with the stale pypi indices...21:02
fungii'm just going to pretend i didn't see that rather than rage comment in it21:02
clarkbyes, I've basically decided I'll continue to set the record straight here :)21:02
fungii have openstack elections to finish setting up, and related tooling bugs to hotfix21:05
*** slaweq has quit IRC21:05
clarkbI'm working on the rax dfw oslo.serialization grepping now21:07
clarkbdfw is lcy too21:13
clarkbputting details on the etherpad21:13
ianwclarkb: i can pull up bhs1 if you haven't already?21:15
clarkbianw: go for it. I was going to add more info on the others and copy the files on dfw to my home dir so that htcacheclean doesn't rm them21:16
fungiyeah, i think the only first-level caches i've seen appear in responses are either bwi (baltimore) or lcy (london)21:18
clarkbianw: I'm just copying them into the root of my homedir on each of the mirrors I do that way we've got the exact file for later if we need it21:19
clarkband thats done now for mtl01 and dfw. I'll do rax ord now21:19
*** slaweq has joined #opendev21:20
ianwi've asked in #pypa-dev about the index file format, and if anyone knows around what time it changed, as i think this might be a clue to where stale entries might be coming from21:21
ianwit's not generally a very active channel though, so we'll see21:21
clarkbif we note it in the warehouse issue I'm sure thats something they'll track down21:21
ianwyeah for sure, that's the path forward21:21
clarkbanother interesting thing is the format of the etag is different21:23
clarkbits almost like they've got a stale warehouse backend for fastly21:23
clarkband its looking at an old db or something21:23
ianwclarkb: mirror.bhs1.ovh.opendev.org:/var/cache/apache2/proxy/4r/Kl/1x/Zd/Wx/DOmpJToA1chw.header.vary/Ht/iP/SP/ng/Z2/v1_4jNm08zWA.<header|data> is the one to scp21:25
clarkbianw: oh I'm not scpin'g I'm just cp'ing to my homedir on the mirror21:26
clarkbbut I'll do that now21:26
clarkbwas mostly worried about htcacheclean21:26
clarkbalso LCY again21:27
clarkbon ord I've only found what looks like a good index for taskflow21:27
clarkbstill waiting for find to find more21:27
clarkbmy draft is bad :?21:31
clarkband ya feel free to hack it up :)21:34
ianwok, relaying pypa-dev, but there is somehow an old bandersnatch instance involved behind pypi/cdn21:40
clarkbianw: do they have a public log somewhere?21:41
clarkbhttp://kafka.dcpython.org/day/pypa-dev/2020-09-15 found it21:42
fungioh fun21:42
clarkbfungi: ianw you think I should file an issue with what we've got? and there shouldn't be any concern attaching the files verbatim ya?21:58
clarkbI'll gunzip the data too21:58
*** slaweq has quit IRC21:59
ianwclarkb: yeah, i think pypa is narrowing in on the issue but an issue will be good22:00
ianwif nothing else to point people to22:00
clarkbya22:00
clarkbI'll work on that next22:00
fungiyeah, that would be a good next step22:01
ianwi get the feeling there's going to be a super-out-of-date-failing-bandersnatch behind this ... i will certainly not throw stones because it's a glass house we have been in :)22:05
clarkbya22:09
fungii used to live in that glass house. it had a lovely view, but eventually ran out of space22:12
clarkbhttps://github.com/pypa/warehouse/issues/8568 for those following along22:25
clarkbdstufft has also found that their fallback bandersnatch mirror (what fastly talks to if it fails to talk to pypi proper) has filled its 12TB of disk22:25
clarkband so it is stale and this is likely the issue22:25
clarkbwe only see this when fastly falls back to the bandersnatch mirror22:25
clarkbthey are discussing how to fix it now22:26
*** DSpider has quit IRC22:26
fungii'm guessing there's a separate issue which has caused fastly's first level cache in london to start having problems reaching warehouse proper, causing it to fall back to the 2-month-stale bandersnatch mirror more often22:29
fungiwhich would explain it just starting to get bad again in the last day or two22:30
ianwhttps://mirror.dub1.pypi.io ... i wonder if that is dub1 too, which seems close to london ...22:30
ianwsorry i mean dublin?22:30
ianwmirror.dub1.pypi.io is an alias for ec2-34-244-193-164.eu-west-1.compute.amazonaws.com.22:30
clarkbsounds like they are removing the bandersnatch fallback from fastly22:31
clarkbI think that means we may see more failures but pip retries those so should make our jobs happier22:31
fungisince it looks like their disk filled up in early august, that would also cover the somewhat recent flare-ups we saw on august 19 and august 2522:31
clarkbanyone want to update the lp bug that isn't me? >_> I'll do it if not22:31
fungithe lp bug is for tripleo anyway, i guess it can be full of wrong if that's what they prefer22:32
clarkbalso I'm really glad we didn't try to keep up with bandersnatch ourselves22:33
fungi>12tb now! yeesh22:33
clarkbsmcginnis: johnsom TheJulia ^ fyi I would expect things to start getting better around now22:34
clarkbassuming that the issue has been properly identified and the mitigations are effective22:34
johnsomYeah, I have been following along and sent a canary22:34
clarkbI'll probably send a #status notice in an hour or so if zuul agrees22:34
TheJuliaAwesome22:35
clarkbianw: maybe its network issues from london to wherever pypi proper is located22:36
clarkbianw: that fails due to transatlantic problems then it falls back to bandersnatch? I dunno22:36
ianwi blame brexit22:37
johnsomlol, they cut the lines early!22:37
johnsomSad trombones, ovh-bhs1 couldn't get octavia-lib: https://api.us-east.open-edge.io:8080/swift/v1/AUTH_e02c11e4e2c24efc98022353c88ab506/zuul_opendev_logs_a7f/751918/1/check/openstack-tox-pep8/a7f5e55/job-output.txt22:47
fungi22:45:1022:49
clarkbjohnsom: I wonder if they have to flush fastly caches too22:49
johnsomIt was one of the two canary patches I rechecked22:49
fungiwe may have also cached a response for up to... 600 seconds is it?22:49
ianwi'll ask dstufft22:49
fungiwe can probably find the cachefile on our mirror for that22:50
fungisince we can narrow to find -mtime-10 or something22:50
ianwseems we should -XPURGE ... looking into that22:51
*** tkajinam has joined #opendev22:51
clarkbwe should maybe do that for everything in requirements22:52
ianwi did curl -XPURGE https://pypi.org/simple/octavia-lib and got an OK22:55
ianwjohnsom: maybe try again22:55
johnsomI can send another recheck, but it's a roll of the dice here it will land.22:56
fungiroll them bones22:57
clarkbmaybe we purge everything in https://opendev.org/openstack/requirements/src/branch/master/upper-constraints.txt after removing the === and clearing duplicates22:59
* clarkb makes that list22:59
fungiremember to dereference the index urls too23:00
fungi. and _ get rewritten to - looks like?23:00
fungialso lower-cased23:00
fungiand i think runs of multiple punctuation are coalesced23:00
fungithough we shouldn't have any of those23:01
clarkbhttp://paste.openstack.org/show/797908/ I think constraints already normalized23:02
fungibut not necessarily normalized to pypi's rules23:02
clarkbalso are we sure its - and not .23:02
clarkbah yup seems to redirect23:03
ianwi have "cat upper-constraints.txt | sed 's/===.*//' | tr '.' '-' | tr '[:upper:]' '[:lower:]'"23:03
clarkbso we just have to s/./-/23:03
clarkbianw: ya that looks better than what I did23:03
ianwrunning "cat upper-constraints.txt | sed 's/===.*//' | tr '.' '-' | tr '[:upper:]' '[:lower:]'  | xargs -I "PROJ"  curl -XPURGE https://pypi.org/simple/PROJ"23:07
ianwit's returning OK but i think it might just do that for everything?23:07
clarkbI wonder too if we have to do it from the mirrors so that it hits the right fastly endpoints23:09
fungixargs doesn't need you to do https://pypi.org/simple/$PROJ i guess?23:09
clarkbfungi: not with -I23:09
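A slightly more defensive variant of ianw's pipeline above — it collapses underscores as well as dots the way the simple index normalizes names, de-duplicates, drops blank lines, hits the trailing-slash URL the redirect lands on, and prints the status code next to each name so the responses are visible:

    sed -e 's/===.*//' -e '/^ *$/d' upper-constraints.txt \
      | sed -E 's/[._-]+/-/g' \
      | tr '[:upper:]' '[:lower:]' \
      | sort -u \
      | xargs -I{} curl -s -o /dev/null \
              -w '%{http_code} {}\n' \
              -X PURGE 'https://pypi.org/simple/{}/'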
johnsomNo luck, ovh-bhs1 missing octavia-lib: https://cac8a4551058cbabfd7f-de2a4e9610e68e853bfff1f1436a242e.ssl.cf5.rackcdn.com/751925/4/check/openstack-tox-py36/957de24/job-output.txt23:10
fungiahh, awesome, one more way to avoid for loops23:10
clarkbianw: where did you run the purge from?23:11
ianwjust my local machine23:11
clarkbI guess next thing would be to do it from the mirrors?23:11
clarkband if that still doesn't work then we may need to defer to pypa again23:11
johnsomSorry to say I need to step away soon. I can post a DNM patch that has only  the faster running jobs if you would like it for testing before I go.23:12
ianwjohnsom: i think this might be a rare case of just coming back tomorrow will make it work :)23:12
johnsomYeah, that is my perspective as well23:13
johnsomThough that didn't work yesterday... grin23:13
clarkbI think they said the fastly caches are 24 hours?23:14
johnsomYeah, that is what I saw23:14
clarkbso maybe if we do nothing else it will work again tomorrow this time23:14
johnsomToo bad it's not a good time to take a few days and go to the coast.23:16
ianwtrying to read about purging, i feel like it's a global thing23:16
clarkbjohnsom: my brothers went fishing at barview jetty a couple days ago. I told them they were stupid23:16
clarkbjohnsom: they did get a keeper ling cod though so I don't think they cared23:16
clarkbianw: rgr23:16
fungiwe could declare tomorrow international write stuff in rust day23:17
fungior whatever the hip language is this week23:17
johnsomfungi I am down with that. I have been playing with it in my spare time23:17
*** smcginnis has quit IRC23:20
*** smcginnis has joined #opendev23:22
ianwperhaps pip pulls some variant of that url23:32
ianwhttps://pypi.org:443 "GET /simple/octavia-lib/ HTTP/1.1" 304 023:37
ianwdoesn't seem so23:37
