*** ryohayakawa has joined #openstack-infra | 00:02 | |
*** d34dh0r53 has quit IRC | 00:09 | |
*** yamamoto has joined #openstack-infra | 00:10 | |
*** tetsuro has joined #openstack-infra | 00:10 | |
*** ysandeep|away is now known as ysandeep|rover | 00:11 | |
*** d34dh0r53 has joined #openstack-infra | 00:12 | |
*** rlandy|bbl is now known as rlandy | 00:34 | |
*** gyee has quit IRC | 00:52 | |
*** ysandeep|rover is now known as ysandeep|afk | 01:42 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: add-build-sshkey: Generate PEM format key https://review.opendev.org/740841 | 01:52 |
*** apetrich has quit IRC | 02:14 | |
*** ysandeep|afk is now known as ysandeep|rover | 02:22 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: add-build-sshkey: Generate PEM format key https://review.opendev.org/740841 | 02:25 |
*** Goneri has quit IRC | 02:29 | |
*** rlandy has quit IRC | 02:32 | |
*** Lucas_Gray has quit IRC | 02:44 | |
*** rakhmerov has joined #openstack-infra | 02:47 | |
*** psachin has joined #openstack-infra | 02:59 | |
*** rfolco has quit IRC | 02:59 | |
*** ricolin_ has joined #openstack-infra | 03:00 | |
*** tetsuro has quit IRC | 03:04 | |
*** ysandeep|rover is now known as ysandeep|afk | 03:10 | |
*** ykarel|away has joined #openstack-infra | 03:47 | |
*** iurygregory has quit IRC | 03:54 | |
*** TheJulia has quit IRC | 03:54 | |
prometheanfire | clarkb: ok, will do, I will say to just be prepared for breakage tomorrow | 03:54 |
*** TheJulia has joined #openstack-infra | 03:55 | |
*** psachin has quit IRC | 04:07 | |
*** tetsuro has joined #openstack-infra | 04:22 | |
*** tetsuro_ has joined #openstack-infra | 04:26 | |
*** tetsuro has quit IRC | 04:27 | |
*** ykarel|away is now known as ykarel | 04:31 | |
*** evrardjp has quit IRC | 04:33 | |
*** evrardjp has joined #openstack-infra | 04:33 | |
*** tetsuro_ has quit IRC | 04:41 | |
*** psachin has joined #openstack-infra | 04:45 | |
*** tetsuro has joined #openstack-infra | 04:46 | |
*** marios has joined #openstack-infra | 04:54 | |
*** ysandeep|afk is now known as ysandeep | 04:57 | |
*** tetsuro has quit IRC | 04:58 | |
*** ociuhandu has joined #openstack-infra | 05:01 | |
*** ociuhandu has quit IRC | 05:05 | |
*** soniya29 has joined #openstack-infra | 05:20 | |
*** lmiccini has joined #openstack-infra | 05:25 | |
*** udesale has joined #openstack-infra | 05:29 | |
*** psachin has quit IRC | 05:29 | |
*** psachin has joined #openstack-infra | 05:39 | |
*** vishalmanchanda has joined #openstack-infra | 05:41 | |
openstackgerrit | zhangboye proposed openstack/openstack-zuul-jobs master: migrate testing to ubuntu focal https://review.opendev.org/740875 | 05:45 |
openstackgerrit | zhangboye proposed openstack/os-testr master: migrate testing to ubuntu focal https://review.opendev.org/740889 | 06:08 |
*** elod is now known as elod_off | 06:15 | |
*** ccamacho has joined #openstack-infra | 06:18 | |
*** ykarel_ has joined #openstack-infra | 06:34 | |
*** ykarel has quit IRC | 06:37 | |
mnasiadka | Morning | 06:38 |
mnasiadka | Anybody from Linaro managing aarch64 CI nodes around? They seem to have an issue connecting to Docker Hub on Kolla jobs... | 06:39 |
*** dklyle has quit IRC | 06:39 | |
*** SotK has quit IRC | 06:54 | |
*** SotK has joined #openstack-infra | 06:55 | |
*** eolivare has joined #openstack-infra | 07:02 | |
*** rcernin has quit IRC | 07:05 | |
*** iurygregory_ has joined #openstack-infra | 07:10 | |
*** kevinz has joined #openstack-infra | 07:26 | |
*** iurygregory_ is now known as iurygregory | 07:31 | |
*** ralonsoh has joined #openstack-infra | 07:31 | |
*** nightmare_unreal has joined #openstack-infra | 07:31 | |
*** ccamacho has quit IRC | 07:32 | |
*** ysandeep is now known as ysandeep|brb | 07:34 | |
*** tosky has joined #openstack-infra | 07:37 | |
*** yamamoto has quit IRC | 07:38 | |
*** xek has joined #openstack-infra | 07:40 | |
*** yamamoto has joined #openstack-infra | 07:40 | |
*** bhagyashris|afk is now known as bhagyashris | 07:44 | |
*** dtantsur|afk is now known as dtantsur | 07:56 | |
*** fumesover3 has joined #openstack-infra | 08:07 | |
*** rcernin has joined #openstack-infra | 08:09 | |
*** ysandeep|brb is now known as ysandeep|rover | 08:10 | |
*** Lucas_Gray has joined #openstack-infra | 08:10 | |
tkajinam | Is zuul down now ? | 08:14 |
*** rcernin has quit IRC | 08:14 | |
*** rcernin has joined #openstack-infra | 08:15 | |
frickler | tkajinam: not for me, what issue do you see with it? | 08:18 |
tkajinam | frickler, seems like it came back. I couldn't access it a couple of minutes ago | 08:18 |
frickler | mnasiadka: ianw mentioned a possible network issue with linaro in #opendev, likely related | 08:18 |
tkajinam | because of connection timeout | 08:18 |
mnasiadka | frickler: yeah, it's long running | 08:19 |
frickler | tkajinam: o.k., let us know if you see any further issue | 08:20 |
tkajinam | frickler, thanks. it seems that it is working well so far | 08:20 |
tkajinam | I can see zuul status and its job results | 08:21 |
tkajinam | now | 08:21 |
*** ykarel_ is now known as ykarel | 08:36 | |
*** rcernin has quit IRC | 08:42 | |
*** derekh has joined #openstack-infra | 08:48 | |
*** ociuhandu has joined #openstack-infra | 08:50 | |
*** donnyd has quit IRC | 08:53 | |
*** donnyd has joined #openstack-infra | 08:53 | |
*** ccamacho has joined #openstack-infra | 09:00 | |
*** apetrich has joined #openstack-infra | 09:00 | |
*** piotrowskim has joined #openstack-infra | 09:03 | |
*** auristor has quit IRC | 09:03 | |
*** auristor has joined #openstack-infra | 09:09 | |
*** lucasagomes has joined #openstack-infra | 09:10 | |
*** xek has quit IRC | 09:12 | |
*** pkopec has joined #openstack-infra | 09:23 | |
*** pkopec has quit IRC | 09:41 | |
*** frickler is now known as frickler_pto | 09:44 | |
*** frickler_pto is now known as frickler | 09:47 | |
*** dtantsur is now known as dtantsur|bbl | 09:49 | |
*** grantza has joined #openstack-infra | 10:08 | |
*** tkajinam has quit IRC | 10:12 | |
*** yamamoto has quit IRC | 10:16 | |
*** yamamoto has joined #openstack-infra | 10:22 | |
*** gouthamr has quit IRC | 10:22 | |
*** gouthamr has joined #openstack-infra | 10:23 | |
*** Jeffrey4l has quit IRC | 10:23 | |
*** Jeffrey4l has joined #openstack-infra | 10:24 | |
*** yamamoto has quit IRC | 10:26 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kuberenetes with kind https://review.opendev.org/740935 | 10:50 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kuberenetes with kind https://review.opendev.org/740935 | 10:51 |
*** ysandeep|rover is now known as ysandeep|afk | 10:53 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kuberenetes with kind https://review.opendev.org/740935 | 10:58 |
*** yamamoto has joined #openstack-infra | 11:02 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kuberenetes with kind https://review.opendev.org/740935 | 11:05 |
*** yamamoto has quit IRC | 11:08 | |
*** rcernin has joined #openstack-infra | 11:13 | |
*** Lucas_Gray has quit IRC | 11:28 | |
*** Lucas_Gray has joined #openstack-infra | 11:33 | |
*** yamamoto has joined #openstack-infra | 11:40 | |
*** rlandy has joined #openstack-infra | 11:43 | |
*** rlandy is now known as rlandy|ruck | 11:45 | |
*** iurygregory has quit IRC | 11:46 | |
*** ysandeep|afk is now known as ysandeep | 11:47 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kuberenetes with kind https://review.opendev.org/740935 | 11:54 |
*** iurygregory has joined #openstack-infra | 12:00 | |
*** rfolco has joined #openstack-infra | 12:03 | |
zbr|ruck | more mirror errors? https://c3c4d9b326375e78bcd8-2bdda90a1128cbc54c09909a8150f07c.ssl.cf2.rackcdn.com/739939/2/gate/tripleo-tox-molecule/3975439/tox/molecule-1.log | 12:04 |
zbr|ruck | mirror.mtl01.inap.opendev.org - Caused by ResponseError('too many 503 error responses', -- looks weird. | 12:05 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind https://review.opendev.org/740935 | 12:06 |
*** yamamoto has quit IRC | 12:08 | |
*** yamamoto has joined #openstack-infra | 12:13 | |
zbr|ruck | I also tried to search for "too many 503 error responses" on logstash but got no results. | 12:13 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind https://review.opendev.org/740935 | 12:13 |
*** udesale_ has joined #openstack-infra | 12:22 | |
*** xek has joined #openstack-infra | 12:23 | |
*** derekh has quit IRC | 12:23 | |
*** udesale has quit IRC | 12:25 | |
*** ryohayakawa has quit IRC | 12:25 | |
*** yamamoto has quit IRC | 12:27 | |
*** grantza has quit IRC | 12:33 | |
*** grantza has joined #openstack-infra | 12:34 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind https://review.opendev.org/740935 | 12:36 |
*** yamamoto has joined #openstack-infra | 12:41 | |
*** artom has quit IRC | 12:43 | |
*** artom has joined #openstack-infra | 12:44 | |
*** artom has quit IRC | 12:44 | |
*** artom has joined #openstack-infra | 12:45 | |
*** rcernin has quit IRC | 12:48 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind https://review.opendev.org/740935 | 12:59 |
*** rlandy|ruck is now known as rlandy|ruck|mtg | 13:00 | |
*** ysandeep is now known as ysandeep|rover | 13:03 | |
*** derekh has joined #openstack-infra | 13:03 | |
*** yamamoto has quit IRC | 13:10 | |
*** eharney has joined #openstack-infra | 13:11 | |
*** xek has quit IRC | 13:13 | |
*** Goneri has joined #openstack-infra | 13:14 | |
fungi | zbr|ruck: that's an odd message indeed. i mean, the url it complained about wouldn't have been served from there, but should be generating 404 not 503 | 13:16 |
*** ykarel is now known as ykarel|away | 13:16 | |
fungi | i'll see if i can find some 503 errors in the apache logs for port 443 | 13:16 |
fungi | oh, right, we have proxying for /pypi and /pypifiles on 443 | 13:18 |
fungi | so it could have been passing through the 503 responses from pypi/fastly | 13:18 |
zbr | fungi: 2nd question would be why not visible in logstash? | 13:20 |
fungi | do tox/*.log files get indexed in logstash? | 13:21 |
fungi | or is logstash processing backlogged? | 13:21 |
zbr | pyyaml was released in march and is very popular, so it should have been in the cache already anyway | 13:21 |
fungi | yeah, but the url it's complaining about is /pypifiles/packages/b7/f5/0d658908d70cb902609fbb39b9ce891b99e060fa06e98071d369056e346f/ansi2html-1.5.2.tar.gz | 13:22 |
zbr | no idea yet, but i plan to find out, it seems like very important information | 13:22 |
zbr | lol, that is 2018, even worse. | 13:23 |
fungi | it's too bad tox doesn't record timestamps, but i can probably find the requests by ip address | 13:24 |
fungi | zbr: yeah, so it tried 6 times to get https://mirror.mtl01.inap.opendev.org/pypifiles/packages/b7/f5/0d658908d70cb902609fbb39b9ce891b99e060fa06e98071d369056e346f/ansi2html-1.5.2.tar.gz and each time https://files.pythonhosted.org/packages/b7/f5/0d658908d70cb902609fbb39b9ce891b99e060fa06e98071d369056e346f/ansi2html-1.5.2.tar.gz (to which it's proxying) responded with a 503 | 13:29 |
fungi | currently that mirror can retrieve it with wget, so whatever the problem was seemed to have been temporary | 13:30 |
fungi | http://paste.openstack.org/show/795915/ | 13:30 |
zbr | it happened at the gate.... :p | 13:30 |
zbr | and w/o logstash search we are unable to discover how common the 503 from pypi is. | 13:30 |
fungi | i can grep the apache logs real quick on that mirror to get an idea | 13:31 |
zbr | tbh, we should expect 503 from time to time. | 13:31 |
zbr | question is if we can find a way to prevent a failure | 13:32 |
zbr | what is our current mirroring logic? can we have another fallback on specific errors, like 503 and try another mirror instead? | 13:33 |
fungi | access log on that mirror starts 06:25z today, 12 503 responses, and knowing that pip tries 6 times before giving up, that suggests it happened twice to builds in inap in the past ~7 hours | 13:33 |
zbr | fungi: you greped for 503 in general or this file in particular? | 13:34 |
zbr | i guess it does not make sense to ask at pypi, as they will say "it is possible". | 13:34 |
fungi | yeah, in addition to your ansi2html-1.5.2.tar.gz failure at 10:40:55-10:41:02 there was a similar failure for lxml-4.5.0-cp27-cp27mu-manylinux1_x86_64.whl at 10:46:11-10:46:19 | 13:35 |
fungi | so the 12 error responses spanned a little over 5 minutes | 13:36 |
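The kind of check fungi is doing here can be sketched with a short script. The log excerpt below and its combined-log layout are fabricated for illustration (the real file on the mirror is an Apache access log), so the paths and field positions are assumptions:

```python
from collections import Counter

# Fabricated access-log excerpt in Apache combined-style format, standing in
# for the mirror's real access log (an assumption for illustration only).
log = '''\
1.2.3.4 - - [09/Jul/2020:10:40:55 +0000] "GET /pypifiles/packages/ansi2html-1.5.2.tar.gz HTTP/1.1" 503 299
1.2.3.4 - - [09/Jul/2020:10:41:02 +0000] "GET /pypifiles/packages/ansi2html-1.5.2.tar.gz HTTP/1.1" 503 299
1.2.3.4 - - [09/Jul/2020:10:42:28 +0000] "GET /pypifiles/packages/varlink-30.3.0-py2.py3-none-any.whl HTTP/1.1" 200 25000
'''

# Count 503 responses per requested path: after a whitespace split,
# index 6 is the request path and index 8 the response status.
counts = Counter()
for line in log.splitlines():
    fields = line.split()
    if fields[8] == "503":
        counts[fields[6]] += 1

for path, n in counts.items():
    print(n, path)  # -> 2 /pypifiles/packages/ansi2html-1.5.2.tar.gz
```

Grouping by path like this is what lets one pair up retries into distinct incidents, as fungi does in the surrounding messages.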
fungi | also remember it's not *necessarily* pypi because they're going through a cdn (fastly) so it could have been isolated to the nearest cdn endpoint to inap's mtl01 region | 13:36 |
fungi | i'll check some other mirrors' access logs for comparison | 13:37 |
fungi | ovh bhs1 had a build which saw two such failures for varlink-30.3.0-py2.py3-none-any.whl at 10:42:27 | 13:39 |
*** benj_ has quit IRC | 13:39 | |
fungi | since it wasn't six, i'm guessing the next try was successful... checking | 13:39 |
fungi | yep, 200 ok for it at 10:42:28 | 13:40 |
*** benj_ has joined #openstack-infra | 13:41 | |
fungi | however, both those provider regions are geographically near one another (quebec, canada) so it could still be a localized issue with that fastly endpoint | 13:43 |
fungi | i didn't see any similar occurrences in our other ovh region (in france) | 13:43 |
zbr | fungi: would it be possible to fall back to one central mirror of ours for 503? | 13:44 |
*** psachin has quit IRC | 13:44 | |
fungi | nor in vexxhost's california region | 13:45 |
zbr | that needs to go in our mirror implementation, something like "try pypi else if 503 try fallback ...." | 13:45 |
*** dtantsur|bbl is now known as dtantsur | 13:45 | |
fungi | i have no idea, nor do i know whether that would be generally more robust, nor whether this is such an isolated incident that the human cost in maintaining a workaround exceeds the benefit | 13:46 |
fungi | so far we have a situation which caused some downloads to fail in quebec over a 5 minute span of time | 13:46 |
zbr | i know for sure that I seen 503 from pypi in the past (like months ago). | 13:47 |
zbr | i asked about fallback, w/o knowing what implementation we have. | 13:47 |
zbr | i know how to do a fallback with nginx, but no idea if it is easily doable with what we use. | 13:47 |
fungi | no similar occurrences today yet in any of our rackspace regions either (texas, illinois, virginia) | 13:48 |
zbr | yes, if it is too hard it may not make sense, but if it is easy it may save us from a few failures. | 13:48 |
*** ralonsoh has quit IRC | 13:49 | |
*** frickler is now known as frickler_pto | 13:50 | |
fungi | it's apache mod_proxy, the configuration can be found here: sudo grep 'Response status 503' /var/log/apache2/mirror_443_access.log|cut -d' ' -f5,7 | 13:50 |
fungi | er, sorry, pasted from wrong buffer | 13:50 |
fungi | https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2#L49-L101 | 13:50 |
fungi | there | 13:50 |
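For reference, the proxying that template sets up boils down to something like the fragment below. This is a hand-written sketch rather than a copy of mirror.vhost.j2, and the cache path is an assumption; the 24-hour expiry matches what clarkb describes later in the log:

```apache
# Sketch of a caching reverse proxy for PyPI (assumed paths; the real
# configuration lives in opendev/system-config's mirror.vhost.j2).
CacheRoot /var/cache/apache2/proxy
CacheEnable disk /pypi
CacheEnable disk /pypifiles
CacheMaxExpire 86400              # revalidate cached objects after 24 hours

ProxyPass        /pypi/      https://pypi.org/pypi/
ProxyPassReverse /pypi/      https://pypi.org/pypi/
ProxyPass        /pypifiles/ https://files.pythonhosted.org/
ProxyPassReverse /pypifiles/ https://files.pythonhosted.org/
```

With this shape, a 503 from files.pythonhosted.org (or its CDN) is simply relayed to the client, which is why the build saw the upstream errors directly.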
*** ralonsoh has joined #openstack-infra | 14:00 | |
*** rlandy|ruck|mtg is now known as rlandy|ruck | 14:00 | |
*** dave-mccowan has joined #openstack-infra | 14:02 | |
zbr | i am reading the docs now, apparently it has support for lots of failover models. i am still not sure where to point it to. | 14:02 |
fungi | part of the problem is that pypi itself provides no fallback, instead it trusts its cdn network to be robust | 14:03 |
fungi | so there's no "if the cdn is broken go here instead" url i'm aware of | 14:04 |
*** ykarel|away has quit IRC | 14:05 | |
openstackgerrit | Oleksandr Kozachenko proposed openstack/project-config master: Add openstack/horizon to the vexxhost tenant https://review.opendev.org/740969 | 14:06 |
*** lucasagomes has quit IRC | 14:13 | |
*** lucasagomes has joined #openstack-infra | 14:14 | |
*** armax has joined #openstack-infra | 14:16 | |
*** knikolla has joined #openstack-infra | 14:21 | |
zbr | fungi: i asked for help from pypi-dev and I was asked to raise https://github.com/pypa/warehouse/issues/8260 | 14:22 |
zbr | they will look into it, hopefully they will give us a fallback | 14:22 |
zbr | fungi: lets watch it, dstufft offered to look into it :) | 14:25 |
*** ykarel|away has joined #openstack-infra | 14:26 | |
*** yamamoto has joined #openstack-infra | 14:28 | |
fungi | thanks for bringing it up with them | 14:28 |
AJaeger | config-core, please review https://review.opendev.org/739876 https://review.opendev.org/739892 https://review.opendev.org/740711 https://review.opendev.org/740614 https://review.opendev.org/740310 | 14:29 |
*** dave-mccowan has quit IRC | 14:33 | |
*** dklyle has joined #openstack-infra | 14:34 | |
*** dave-mccowan has joined #openstack-infra | 14:37 | |
openstackgerrit | Merged openstack/project-config master: Add openstack/horizon to the vexxhost tenant https://review.opendev.org/740969 | 14:43 |
openstackgerrit | Merged openstack/project-config master: update-constraints: Install pip for all versions https://review.opendev.org/738926 | 14:43 |
clarkb | zbr: fungi: an important clarification is that our apache mirrors have a 24 hour limit on any cached object. That means we'll still query the remote backend to check validity after that period. Also the cache objects can be expired early if necessary to make room for other objects. | 14:47 |
*** ysandeep|rover is now known as ysandeep|food | 14:47 | |
clarkb | and yes logstash indexing has been behind recently. I think some file is causing things to OOM or similar though I've not had time to look closer than e-r's status page | 14:47 |
zbr | clarkb: that 24h is an implementation aspect, it does not influence the outcome. | 14:48 |
clarkb | zbr: it does, you assumed that an older popular package would always be in the cache and the mirror wouldn't make external requests | 14:48 |
clarkb | this isn't necessarily true for multiple reasons. I wanted to make sure that was clarified | 14:48 |
zbr | ok. for pypi packages, which by design cannot be updated, we could maybe decide to improve the logic. | 14:49 |
clarkb | 13:21:49 zbr | pyyaml was released in march and is very popular, so it should have been in the cache already anyway | 14:49 |
*** manfly000 has joined #openstack-infra | 14:49 | |
clarkb | no I think the current logic is correct even for objects that shouldn't change | 14:50 |
clarkb | it acts as belts and suspenders | 14:50 |
*** johnthetubaguy has quit IRC | 14:50 | |
zbr | still, i would like to see a way to avoid sending the 503 to pip, when it happens, but we will need upstream help for a fallback. | 14:50 |
clarkb | our mirrors shouldn't change responses | 14:50 |
clarkb | so ya thats entirely up to pypi and their cdn | 14:51 |
*** johnthetubaguy has joined #openstack-infra | 14:52 | |
zbr | i disagree here, we are in business to make CI/CD fast and less dependent on external networking issues. It would be up to us to decide what to do when upstream is flaky. | 14:53 |
fungi | note that files on pypi *can* change by being deleted, and we want that state correctly reflected too | 14:53 |
clarkb | that isn't quite accurate imo either | 14:53 |
zbr | i mentioned only the 503 code, which is a non-cacheable one | 14:53 |
clarkb | we are in the business of testing software that accurately reflects its "life" outside of the CI system | 14:54 |
zbr | i did not say we should cache 404, or other codes. | 14:54 |
*** manfly000 is now known as xiaoguang | 14:54 | |
fungi | also, i don't know when we got into business, i don't personally remember that, i'm here to serve community collaborations | 14:54 |
zbr | there is a small set of http response codes which indicate that the client should try again, as "we have a problem". | 14:54 |
clarkb | fungi: figure of speech | 14:54 |
clarkb | the goal of the CI system isn't to make the software pass on every run | 14:55 |
*** xiaoguang is now known as manfly000 | 14:55 | |
clarkb | the goal of the CI system is to accurately reflect the life of software outside of the CI system so that we can catch and address problems early | 14:55 |
*** artom has quit IRC | 14:56 | |
clarkb | if we change external behaviors we make the software dependent on the CI system and that is a problem | 14:56 |
*** artom has joined #openstack-infra | 14:56 | |
clarkb | in this particular case it seems like maybe we found a pypi bug and now it can be addressed | 14:56 |
clarkb | that is why we run the CI tooling | 14:57 |
*** artom has quit IRC | 14:57 | |
*** artom has joined #openstack-infra | 14:57 | |
zbr | clarkb: fungi: please comment on the bug, maybe even posting a comment on #pypa-dev -- maybe we can persuade them to do something while it is still "hot". | 14:58 |
zbr | i clearly agree that it is a pypi issue, but i am realistic that I cannot always rely on external service providers to fix their stuff. | 14:58 |
zbr | we should first engage upstream, and see what we can do after. | 14:59 |
clarkb | even then we shouldn't update the CI system to fix it, because the software still has to deal with it outside of the CI system | 14:59 |
clarkb | working around the problem in the software itself is fine | 14:59 |
*** manfly000 has left #openstack-infra | 14:59 | |
*** artom has quit IRC | 14:59 | |
clarkb | but the CI "platform" should do its best to be special in that regard | 14:59 |
fungi | if we can't rely on external providers to fix things, we shouldn't expect the users of our software to have to either though, right? | 14:59 |
clarkb | *do its best not to be special in that regard | 15:00 |
*** xek has joined #openstack-infra | 15:00 | |
zbr | fungi: pypi was reliable enough so far, but i am sure you would have a different take if it would become more flaky suddenly | 15:00 |
*** artom has joined #openstack-infra | 15:01 | |
*** artom has quit IRC | 15:01 | |
clarkb | for similar reasons this is why I've said that devstack should handle setuptools distutils vendoring and not try and fix that in the CI system | 15:01 |
zbr | anyway, i consider the subject closed, not proposing any change on our side for mirrors. | 15:01 |
clarkb | because other people using debian and ubuntu will have the same problem and we don't want the tooling to be dependent on the CI platform | 15:01 |
*** artom has joined #openstack-infra | 15:01 | |
*** xiaoguang has joined #openstack-infra | 15:01 | |
zbr | the irony is that I got a similar issue today with docker build with alpine apk, which is so stupid that it ignores failure to add a repository. | 15:02 |
zbr | clearly not the CI/CD issue, only a combination of lack of retry support in both apk and docker build | 15:02 |
zbr | and it happened exactly on my release tag :p | 15:03 |
*** yamamoto has quit IRC | 15:03 | |
*** artom has quit IRC | 15:04 | |
*** rlandy|ruck is now known as rlandy|ruck|mtg | 15:06 | |
*** xiaoguang has quit IRC | 15:06 | |
*** lmiccini has quit IRC | 15:16 | |
*** armax has quit IRC | 15:18 | |
*** xek has quit IRC | 15:19 | |
*** Limech has joined #openstack-infra | 15:20 | |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Enable grenade again for Stein, Rocky and Queens https://review.opendev.org/740310 | 15:21 |
*** hamalq has joined #openstack-infra | 15:23 | |
openstackgerrit | Merged openstack/project-config master: maintain-github-mirror: add requests dependency https://review.opendev.org/740711 | 15:26 |
*** artom has joined #openstack-infra | 15:27 | |
*** hamalq has quit IRC | 15:29 | |
*** ykarel|away has quit IRC | 15:30 | |
*** hamalq has joined #openstack-infra | 15:35 | |
*** udesale_ has quit IRC | 15:35 | |
*** ysandeep|food is now known as ysandeep | 15:38 | |
clarkb | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_a60/733250/19/check/neutron-tempest-dvr-ha-multinode-full/a6025d9/controller/logs/screen-q-svc.txt (don't actually open that link directly, drop the file at the end to browse the dir) is ~300MB large and I think is contriburing to the logstash behindness | 15:39 |
clarkb | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_464/735125/10/check/neutron-ovn-rally-task/4648c27/controller/logs/screen-q-svc.txt as well | 15:40 |
clarkb | it seems that maybe neutron exploded its logging size | 15:40 |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Add template for charm check and gate https://review.opendev.org/740614 | 15:40 |
* clarkb joins #neutron | 15:41 | |
*** ysandeep is now known as ysandeep|away | 15:43 | |
clarkb | * #openstack-neutron. I've brought it up there and will see if they can help make that happier | 15:44 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind https://review.opendev.org/740935 | 15:47 |
*** gyee has joined #openstack-infra | 15:50 | |
clarkb | neutron is instrumenting function call times in debug log output | 15:50 |
clarkb | that accounts for the majority of the logging output there. I've left that info in #openstack-neutron and suggested it be off by default and when enabled it can log to a separate file possibly | 15:50 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind https://review.opendev.org/740935 | 15:51 |
clarkb | zbr: ^ you had wondered about that and I think this is the root cause | 15:51 |
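clarkb's suggestion (instrumentation output off by default, routed to its own file when enabled) could look roughly like this with stdlib logging. The logger name and file name here are hypothetical, not neutron's actual ones:

```python
import logging

# Hypothetical names: route noisy call-timing debug output to its own file
# so it stays out of the main service log (screen-q-svc.txt in the jobs above).
profile_log = logging.getLogger("neutron.profiling")  # hypothetical logger name
profile_log.propagate = False          # don't bubble up into the service log
handler = logging.FileHandler("profiler.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
profile_log.addHandler(handler)
profile_log.setLevel(logging.DEBUG)

profile_log.debug("plugin.get_ports took 0.002s")
```

Because `propagate` is off, the main log keeps its normal size while the timing data stays available in a file that indexing can simply skip.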
zbr | wow 300MB, we should definitely have a limit per job. | 15:53 |
fungi | we do | 15:53 |
clarkb | if they are unwilling or unable to fix that we can stop indexing that file, but I'd like to see if they can address the underlying issue first | 15:57 |
*** artom has quit IRC | 15:59 | |
*** lucasagomes has quit IRC | 15:59 | |
tosky | fungi: about os-loganalyze, is the OpenDev Infra team meeting later today to discuss it (whether to port the jobs or retire it altogether), or have you discussed it already? | 16:00 |
clarkb | tosky: we aren't using it anymore. I don't know that there is much to discuss. | 16:01 |
*** marios is now known as marios|out | 16:01 | |
*** Lucas_Gray has quit IRC | 16:01 | |
clarkb | maybe we want to advertise we aren't able to maintain it anymore and if anyone is using it they can maintain it if they like | 16:01 |
clarkb | but we've long since switched to zuul's rendering of log output | 16:02 |
*** rlandy|ruck|mtg is now known as rlandy|ruck | 16:02 | |
fungi | i think the question is specifically about current jobs for osla blocking the effort to get rid of legacy ci jobs | 16:04 |
fungi | we could delete those jobs or retire the repo entirely | 16:04 |
fungi | but i don't personally see the benefit in us spending time updating jobs for a project we no longer use | 16:05 |
clarkb | ya if the desire is to fix jobs for osla I'd say delete the jobs instead | 16:05 |
*** sshnaidm is now known as sshnaidm|afk | 16:08 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind https://review.opendev.org/740935 | 16:09 |
tosky | clarkb: yes, sorry, I should have provided more details (it's a follow-up from a few days ago); it's about removing or porting a legacy job | 16:12 |
*** aedc_ has joined #openstack-infra | 16:24 | |
*** aedc_ has quit IRC | 16:24 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind https://review.opendev.org/740935 | 16:25 |
*** aedc has quit IRC | 16:27 | |
zbr | clarkb: fungi : https://github.com/pypa/warehouse/issues/8260 -- i knew it was too good to be true. every time i have to interact with him, i regret it. | 16:46 |
*** sshnaidm|afk is now known as sshnaidm | 16:48 | |
fungi | i think they're being reasonable on that. pypi is funded by psf, which has basically no budget and something like 1.5 full time engineers managing all of their infrastructure. the cdn is generously donated by fastly, so it's not like they have any leverage if it's down occasionally. this is just one of a vast number of ways pip install can encounter a failure, and it already has a number of general | 16:51 |
fungi | mitigations built in (in the cases we logged, it retried the request 6 times with an escalating backoff timer over a couple of minutes) | 16:51 |
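The retry behaviour fungi describes (several attempts with an escalating backoff before giving up) can be sketched like this; the attempt count and delay numbers are illustrative, not pip's exact defaults:

```python
import time

def get_with_retries(fetch, attempts=6, backoff=0.25):
    """Retry a fetch on 5xx responses with an escalating (exponential) backoff."""
    for attempt in range(attempts):
        status = fetch()
        if status < 500:
            return status
        if attempt + 1 < attempts:
            time.sleep(backoff * (2 ** attempt))  # escalating backoff timer
    raise RuntimeError("too many %d error responses" % status)

# Fake fetcher: two 503s from the CDN, then success, as in the ovh bhs1 case.
responses = iter([503, 503, 200])
print(get_with_retries(lambda: next(responses), backoff=0.0))  # -> 200
```

This is why only outages lasting longer than the whole retry window (a couple of minutes in the logged cases) actually fail a build.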
*** nightmare_unreal has quit IRC | 16:52 | |
fungi | as i noted in my comment, trying to engineer solutions for an ever increasing number of corner cases increases the complexity of your software until that complexity becomes a liability and contributes to worse outages | 16:52 |
fungi | we had fewer request failures when we maintained a full mirror, but we decided to switch to a caching proxy even though we knew it would at least slightly decrease reliability, because it also significantly reduced the maintenance cost | 16:53 |
*** jackedin has joined #openstack-infra | 16:54 | |
fungi | these community services (our ci system and pypi as well) have to balance maintenance burden against reliability | 16:54 |
fungi | i also think it was not at all nice of you to make comments like 'a Netflix engineer would not have closed the bug with a "not my problem"' | 16:55 |
fungi | these are people striving to make the best choices for their community with the funds and time available to them | 16:56 |
*** ociuhandu_ has joined #openstack-infra | 16:58 | |
*** derekh has quit IRC | 16:58 | |
*** yamamoto has joined #openstack-infra | 17:01 | |
*** ociuhandu has quit IRC | 17:01 | |
*** ociuhandu_ has quit IRC | 17:02 | |
*** dtantsur is now known as dtantsur|afk | 17:31 | |
*** marios|out has quit IRC | 17:35 | |
zbr | i see things a bit differently here; nobody asked fastly how to resolve this, nor did we ask them if they could provide another endpoint. | 17:47 |
*** yamamoto has quit IRC | 17:53 | |
*** piotrowskim has quit IRC | 17:56 | |
zbr | at this point we have proof that it happened 6 times on one mirror in two weeks. Can we start logging 503s for a longer period of time? what is the number of failures for 6 months for example? | 17:57 |
*** ralonsoh has quit IRC | 17:57 | |
zbr | two weeks on a single host is not enough to get an estimate of the number of failures/year | 17:58 |
fungi | those requests spanned a few minutes, so were part of a single incident, not six | 17:58 |
fungi | i'm pretty sure that the pypi maintainers are happy that their content is accessible most of the time in most of the world. they're not accountable to you or me or anyone else. they're providing a public service run on donated funding and donated infrastructure, and it's up to their donors to determine where that time and money are best applied. sometimes it's down. sometimes the internet is down | 18:03 |
fungi | too. there are likely much larger problems they'd rather focus on fixing, and that's their prerogative | 18:03 |
clarkb | right, we've sort of accepted there is a certain degree of unreliability inherrent to the system | 18:03 |
clarkb | this is why we cache and mirror as we are able | 18:04 |
clarkb | its not perfect but it helps and when things sneak through we can report them to see if upstream can fix them. If not oh well | 18:04 |
*** xek has joined #openstack-infra | 18:04 | |
zbr | ok. so no action, even if i would have preferred to have a way to log such occurrences for longer periods of time. | 18:05 |
fungi | log analysis might be warranted to determine the degree of unreliability so that choices we make can be reevaluated, but i wouldn't take one incident as a signal that it's worth the time to perform that deeper level of investigation | 18:06 |
zbr | fungi: the reason why i spent this amount of time on this today is that it is not the first time I have seen this error. | 18:06 |
zbr | usually on first occurrence I just... recheck | 18:07 |
zbr | but i remember seeing exactly 503 on pythonhosted before. obviously I have no logs to prove it. | 18:07 |
clarkb | fwiw we DO have a method of tracking these problems. the unfortunate reality is those tools have languished because few use them and they process terabytes of data constantly and as a result need care | 18:07 |
*** d34dh0r53 has quit IRC | 18:08 | |
fungi | it apparently impacted two builds in the span of two weeks from what i could see. out of, i don't know how many builds total but we did a *lot* likely tens of thousands. the percentage failure from this occurrence is disproportionate to the time invested in trying to "solve" it | 18:08 |
zbr | current logstash instance is not good for this as we feed it too much data; we need something where we cherry-pick and log specific events, and keep them for like 1-2 years. | 18:09 |
clarkb | zbr: it would be trivial to record occurence data over the long term via logstash queries | 18:09 |
zbr | basically is the same tool, but with a very different usage approach. | 18:09 |
clarkb | basically once a week record the number of instances | 18:09 |
clarkb | then log that over time | 18:09 |
fungi | as long as we know what to look for (say based on elastic-recheck queries getting maintained regularly) | 18:10 |
zbr | hmm, that would be interesting. | 18:10 |
clarkb | fungi: yes exactly | 18:10 |
zbr | like a log-compress filter | 18:11 |
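The once-a-week tally clarkb sketches could be as little as appending a dated count to a long-lived CSV. The query function below is a stub standing in for a real logstash/elasticsearch query (the query string matches the signature discussed above, but the query mechanism itself is an assumption):

```python
import csv
import datetime

def query_logstash_count(signature):
    # Stub for a real logstash query such as
    # message:"too many 503 error responses" over the past week;
    # returns a fake count here so the sketch is self-contained.
    return 12

def record_weekly_count(signature, path="failure-counts.csv"):
    """Append one dated data point so occurrences can be tracked for years."""
    count = query_logstash_count(signature)
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.date.today().isoformat(), signature, count])
    return count

record_weekly_count('too many 503 error responses')
```

Run from cron, this keeps only one small row per signature per week, sidestepping the retention problem of indexing the raw terabytes.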
*** d34dh0r53 has joined #openstack-infra | 18:15 | |
*** jackedin has quit IRC | 18:29 | |
*** rmcall has joined #openstack-infra | 18:30 | |
*** xarses has quit IRC | 18:35 | |
*** xarses has joined #openstack-infra | 18:35 | |
*** eolivare has quit IRC | 18:37 | |
*** ricolin_ has quit IRC | 18:41 | |
*** slaweq has joined #openstack-infra | 18:54 | |
*** slaweq has quit IRC | 18:56 | |
*** ramishra has quit IRC | 18:58 | |
*** ociuhandu has joined #openstack-infra | 19:14 | |
*** ociuhandu has quit IRC | 19:23 | |
*** ramishra has joined #openstack-infra | 19:25 | |
*** soniya29 has quit IRC | 19:32 | |
*** yamamoto has joined #openstack-infra | 19:51 | |
*** auristor has quit IRC | 20:38 | |
*** auristor has joined #openstack-infra | 20:40 | |
*** yamamoto has quit IRC | 20:51 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: upload-afs-synchronize: expand documentation https://review.opendev.org/741051 | 20:52 |
*** markvoelker has joined #openstack-infra | 21:23 | |
openstackgerrit | Merged zuul/zuul-jobs master: write-inventory: add per-host variables https://review.opendev.org/739892 | 21:26 |
*** markvoelker has quit IRC | 21:27 | |
*** yamamoto has joined #openstack-infra | 21:35 | |
*** krotscheck has joined #openstack-infra | 21:46 | |
*** vapjes has joined #openstack-infra | 21:53 | |
*** jamesmcarthur has joined #openstack-infra | 22:32 | |
*** vishalmanchanda has quit IRC | 22:39 | |
*** rcernin has joined #openstack-infra | 22:44 | |
*** armax has joined #openstack-infra | 22:47 | |
*** tosky has quit IRC | 22:50 | |
*** rcernin has quit IRC | 22:51 | |
*** rcernin has joined #openstack-infra | 22:51 | |
*** tkajinam has joined #openstack-infra | 22:58 | |
*** hamalq has quit IRC | 22:58 | |
*** yamamoto has quit IRC | 23:02 | |
*** armax has quit IRC | 23:12 | |
*** yamamoto has joined #openstack-infra | 23:33 | |
*** jamesmcarthur has quit IRC | 23:36 | |
*** xek has quit IRC | 23:36 | |
*** jamesmcarthur has joined #openstack-infra | 23:48 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!