*** ryohayakawa has joined #opendev | 00:05 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Mirror OpenSUSE Leap 15.2 https://review.opendev.org/745251 | 00:09 |
---|---|---|
openstackgerrit | Clark Boylan proposed openstack/project-config master: Update opensuse-15 to 15.2 https://review.opendev.org/745252 | 00:10 |
clarkb | that ^ was something that I noticed when debugging the zypper issues | 00:10 |
clarkb | but I'm not sure we have room for that on our afs vicepas or the volume quota | 00:11 |
clarkb | I'll WIP for now | 00:11 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add ansible role to manage gerritbot https://review.opendev.org/744795 | 00:18 |
fungi | clarkb: we likely just need to replace 15.1 with 15.2? the plan going forward was not to have separate 15.x images, so probably don't need multiple versions mirrored either right? | 00:21 |
clarkb | fungi: correct, but if we do that we'll likely break the existing opensuse-15 label for some period of time | 00:21 |
clarkb | (I'm not sure we can coordinate the image build update and the mirror updates) | 00:22 |
clarkb | but maybe that is ok? | 00:22 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Pre-install python3 for CentOS https://review.opendev.org/741868 | 00:38 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Deprecate dib-python; remove from in-tree elements https://review.opendev.org/741877 | 00:38 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Remove glance-registry https://review.opendev.org/739796 | 01:43 |
openstackgerrit | Merged openstack/diskimage-builder master: Update the tox minversion parameter. https://review.opendev.org/738754 | 01:45 |
openstackgerrit | Merged openstack/diskimage-builder master: Fixes DIB_IPA_CERT certificate copy issue https://review.opendev.org/741583 | 03:03 |
ianw | fungi: a mail forwarding thing i've been interested in just implemented ARC which apparently gmail does : https://www.ietf.org/id/draft-ietf-dmarc-arc-usage-09.txt | 03:21 |
ianw | apropos nothing; just thought that was interesting | 03:21 |
ianw | if gmail does it, i guess that basically means that's the way it's done now | 03:21 |
*** ysandeep is now known as ysandeep|off | 04:24 | |
*** raukadah is now known as chkumar|rover | 04:31 | |
*** DSpider has joined #opendev | 04:58 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Fedora 32 support https://review.opendev.org/737217 | 05:13 |
*** tkajinam has quit IRC | 05:34 | |
*** tkajinam has joined #opendev | 05:35 | |
*** tkajinam has quit IRC | 05:50 | |
*** tkajinam has joined #opendev | 05:51 | |
*** redrobot has quit IRC | 06:37 | |
*** fressi has joined #opendev | 06:39 | |
*** bhagyashris is now known as bhagyashris|off | 07:13 | |
*** hashar has joined #opendev | 07:33 | |
*** moppy has quit IRC | 08:01 | |
*** moppy has joined #opendev | 08:01 | |
*** tosky has joined #opendev | 08:04 | |
*** sshnaidm|afk is now known as sshnaidm|off | 08:19 | |
*** fressi has quit IRC | 08:26 | |
*** fressi has joined #opendev | 08:57 | |
*** tkajinam has quit IRC | 09:17 | |
*** fressi has quit IRC | 09:21 | |
*** dtantsur|afk is now known as dtantsur | 09:30 | |
dtantsur | clarkb, fungi, thank you for handling the suse issue! | 09:30 |
*** DSpider has quit IRC | 09:49 | |
openstackgerrit | Merged openstack/diskimage-builder master: Pre-install python3 for CentOS https://review.opendev.org/741868 | 09:53 |
openstackgerrit | Carlos Goncalves proposed zuul/zuul-jobs master: configure-mirrors: add CentOS 8 Stream https://review.opendev.org/734787 | 09:54 |
openstackgerrit | Merged openstack/diskimage-builder master: Deprecate dib-python; remove from in-tree elements https://review.opendev.org/741877 | 09:57 |
openstackgerrit | Carlos Goncalves proposed openstack/project-config master: CentOS 8 Stream initial deployment https://review.opendev.org/734791 | 09:59 |
openstackgerrit | Carlos Goncalves proposed openstack/diskimage-builder master: Add support for CentOS 8 Stream cloud image https://review.opendev.org/737245 | 10:02 |
*** ryohayakawa has quit IRC | 10:27 | |
frickler | infra-root: I've seen this a couple of times now, jobs failing in pre with failure to set up swap, does anyone have an idea for that? might be provider specific https://7e827a77180c1e6e432f-3c4e8d8f712aba3e652b0cfd0c30a298.ssl.cf5.rackcdn.com/745303/1/check/barbican-dogtag-tox-functional/35bf535/job-output.txt | 10:35 |
frickler | logstash sees this on inap, ovh and vexxhost, but only for fedora-31 | 10:38 |
frickler | http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%20%5C%22swapon%3A%20%2Froot%2Fswapfile%3A%20swapon%20failed%3A%20Invalid%20argument%5C%22 | 10:39 |
frickler | it also seems to be hidden by retrying mostly, see e.g. https://zuul.opendev.org/t/openstack/builds?job_name=devstack-platform-fedora-latest | 10:42 |
openstackgerrit | Carlos Goncalves proposed openstack/project-config master: CentOS 8 Stream initial deployment https://review.opendev.org/734791 | 10:42 |
*** DSpider has joined #opendev | 11:06 | |
*** fressi has joined #opendev | 11:22 | |
*** stephenfin has quit IRC | 11:29 | |
*** stephenfin has joined #opendev | 11:38 | |
*** cloudnull has quit IRC | 11:42 | |
*** cloudnull has joined #opendev | 11:43 | |
*** hashar is now known as hasharLunch | 11:46 | |
*** stephenfin has quit IRC | 12:24 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: ensure-pip: add instructions for RedHat system https://review.opendev.org/743750 | 12:27 |
fungi | frickler: apparently swapon complaining "invalid argument" in that context can mean it doesn't see expected formatting in the swapfile | 12:30 |
fungi | maybe the writes from mkswap are being buffered and haven't been flushed before swapon reads from the inode? | 12:31 |
*** stephenfin has joined #opendev | 12:33 | |
*** DSpider has quit IRC | 12:49 | |
frickler | fungi: sounds plausible, so maybe add a retry to the swapon and/or add a sync before? | 13:31 |
*** dirk has quit IRC | 13:31 | |
*** chkumar|rover is now known as raukadah | 13:32 | |
fungi | maybe, though since this is only cropping up on newer fedora i wonder if something has changed in the kernel/fs drivers | 14:07 |
fungi | but yeah, swapfiles are a little fiddly since they're a filesystem on top of another filesystem | 14:08 |
openstackgerrit | Ken Giusti proposed openstack/project-config master: Retire devstack-plugin-pika project https://review.opendev.org/745342 | 14:14 |
*** hasharLunch is now known as hashar | 14:17 | |
openstackgerrit | Ken Giusti proposed openstack/project-config master: Retire the devstack-plugin-zmq project https://review.opendev.org/745344 | 14:24 |
*** ysandeep|off is now known as ysandeep | 14:51 | |
clarkb | frickler: fungi could it be an issue with our use of fallocate? I think some man pages say don't do that and others do. But it hasn't been an issue for us (yet?) | 14:55 |
fungi | oh, maybe | 14:55 |
clarkb | we could switch f31 over to dd | 15:01 |
clarkb | and see if it gets better? | 15:01 |
*** Guest7899 has joined #opendev | 15:03 | |
frickler | clarkb: fungi: the manpages for mkswap+swapon on fedora31 both suggest to use dd as the preferred solution | 15:05 |
clarkb | we avoided dd because it is quite a bit slower | 15:06 |
fungi | use dd instead of fallocate so we prewrite zeroes? | 15:06 |
clarkb | but that may be better if more reliable | 15:06 |
clarkb | yes | 15:06 |
fungi | slower and also uses up more of the rootfs if swap winds up not being needed | 15:06 |
*** Guest7899 is now known as redrobot | 15:06 | |
fungi | then again, it guards against kernel panic for cases where the rootfs fills up and then something tries to page out | 15:07 |
frickler | we might also want to reduce swap file size, 8GB seems excessive, maybe 1 or 2G would suffice? | 15:08 |
frickler | if a job needs to swap more, it's very likely to timeout anyway I guess | 15:09 |
clarkb | ya that may be a reasonable compromise | 15:09 |
openstackgerrit | Merged opendev/gerritbot master: Switch to stestr, declare Python 3.7 compatibility https://review.opendev.org/730594 | 15:09 |
fungi | the idea was to have plenty so that we get legitimate errors rather than oom, but that's a fine line to walk because it's just as likely swap thrash skyrockets iowait and your job times out | 15:10 |
*** fressi has left #opendev | 15:23 | |
*** sgw1 has joined #opendev | 15:36 | |
*** bolg has quit IRC | 15:37 | |
*** hashar has quit IRC | 15:46 | |
*** DSpider has joined #opendev | 16:03 | |
*** dtantsur is now known as dtantsur|afk | 16:04 | |
tosky | talking about legacy jobs, so far I've been looking mostly into the descendants of legacy-dsvm-base, which are more impacted by the removal of devstack-gate, but now I checked also legacy-base and its children | 16:06 |
tosky | so in addition to legacy-dsvm-os-loganalyze (which, if I understand it correctly it's going to be retired) | 16:06 |
clarkb | I don't know that anyone has volunteered to retire it yet, but yes that repo is basically eol | 16:07 |
tosky | I found also that opendev/puppet-openstack_infra_spec_helper and opendev/sandbox depend on a legacy job each (legacy-puppet-openstack-infra-spec-helper-unit and legacy-sandbox-tag) | 16:07 |
clarkb | keeping it up may help some third party ci operators but the d-g jobs aren't critical to that | 16:07 |
johnsom | Hi everyone, we may have a mirror issue at limestone: https://review.opendev.org/#/c/685337/ This patch is failing a bunch of jobs as it isn't finding keystonemiddleware 9.1.0 which is on pypi, but maybe not in the limestone mirror? | 16:07 |
tosky | I've also found a few simple legacy jobs in the osf/ namespace, namely in osf/groups, and I'm not sure who is in charge of that | 16:08 |
clarkb | johnsom: as I just mentioned in #openstack-qa we don't actually mirror pypi anymore (haven't for years). It is just a caching proxy to pypi | 16:08 |
clarkb | tosky: sandbox was likely just someone adding a job and we can remove it (sandbox isn't real code its literally a push to gerrit sandbox) | 16:08 |
johnsom | Ah, I guess others are seeing this too. I will look at the scroll back in -qa, thanks! | 16:09 |
clarkb | tosky: I wonder what infra_spec_helper does with d-g. That one is a bit unexpected as our rspec jobs are not doing anything with devstack | 16:09 |
clarkb | johnsom: well it may still be an issue; you're the first one to link to any jobs that have been affected | 16:09 |
tosky | clarkb: oh, that job does not use d-g; I just mentioned it for completeness | 16:09 |
clarkb | tosky: oh I see it is a legacy converted job but not d-g | 16:09 |
clarkb | tosky: I think we can leave that there. We are slowly replacing puppet and that repo will go away when puppet is no longer used | 16:10 |
clarkb | (and if it isn't using d-g there is less concern of d-g being a maintenance issue) | 16:10 |
tosky | totally fine by me, I thought it made sense to report it :) | 16:11 |
*** dtantsur|afk has quit IRC | 16:11 | |
johnsom | Well, we have a few examples. lol | 16:11 |
clarkb | johnsom: http://mirror.regionone.limestone.opendev.org/pypi/simple/keystonemiddleware/ the package is listed there now and I am able to download the sdist and wheel | 16:12 |
clarkb | my hunch is that this was a pypi issue | 16:12 |
johnsom | Ok, I will fire off some rechecks and see what happens. I will let you know | 16:13 |
clarkb | (it is also possible that the issue is persisting but only for a subset of requests if it is a specific pypi cdn node that is a problem) | 16:13 |
clarkb | we have seen this before | 16:13 |
johnsom | Yeah, I can run this locally without issue | 16:13 |
clarkb | in the past there was a pypi api request we could make to flush cdn entries for specific records | 16:17 |
clarkb | unfortunately I don't think that is exposed anymore? | 16:17 |
*** ysandeep is now known as ysandeep|off | 16:19 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add ansible role to manage gerritbot https://review.opendev.org/744795 | 16:22 |
clarkb | johnsom: did any of your failures run outside of limestone? | 16:24 |
clarkb | we might be able to triangulate a bad node if so | 16:24 |
johnsom | All that I checked were limestone, but let me take a look at a few more | 16:24 |
clarkb | johnsom: also note there are a few valid failures in there | 16:25 |
clarkb | (so you may not just want to recheck) | 16:25 |
johnsom | Yeah, I see those too | 16:25 |
tosky | I've hit the issue just once, and it was limestone | 16:26 |
tosky | all the other jobs and the recheck passed | 16:26 |
tosky | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_599/745321/2/check/barbican-tempest-plugin-simply-crypto/599905c/job-output.txt | 16:26 |
clarkb | we may still be able to identify bad nodes by directing requests to specific fastly IPs | 16:28 |
fungi | yeah, often it's a stale cached error on one cdn node out of a pool, so the behavior winds up hitting some percentage of requests for one or more providers in the same network locality (for example i've seen it happen at the same time to ovh and inap montreal-area builds) | 16:31 |
johnsom | Yeah, so far it looks like they are all mirror.regionone.limestone.opendev.org where ovh was successful | 16:32 |
fungi | limestone is somewhere in the wilds of texas, i think | 16:33 |
fungi | so might also see it crop up in rackspace-dfw... or not | 16:33 |
clarkb | also it may have corrected itself as these things magically do | 16:33 |
fungi | odds are rackspace is large enough to warrant dedicated fastly cdn endpoints | 16:33 |
johnsom | Yeah, the recheck ovh, rax-ord, vexxhost, all passing | 16:34 |
johnsom | inap passing too | 16:35 |
clarkb | its still showing up for me loading the index from limestone. And we've been through ~2 cache expirations for that file | 16:35 |
clarkb | it meaning the package is there and the bug isn't present | 16:35 |
fungi | clarkb: revisiting the _.- pep 503 situation, where besides pkg_resources.safe_name() were you seeing . treated separately from _-? | 16:41 |
clarkb | fungi: pypi.org itself | 16:41 |
fungi | from what i can tell testing manually, pypi.org's warehouse webui and pip both seem to treat . as equivalent to _- | 16:41 |
fungi | how was it manifesting for you? | 16:41 |
fungi | trying to work out how to reproduce the problem before i go posting to distutils-sig about it | 16:42 |
clarkb | fungi: https://pypi.org/project/oslo-db/ redirects to https://pypi.org/project/oslo.db/ | 16:42 |
clarkb | fungi: which means if you want to know what the canonical name is without redirects translating . to - is wrong | 16:42 |
fungi | oh, got it | 16:42 |
fungi | so basically warehouse is converting backwards. it redirects all of ._- to whatever was used in the project name is my guess | 16:43 |
clarkb | the problem is that we are asking for the canonical name so that all the tools can use a consistent value without translating and checking for equivalence | 16:43 |
clarkb | but we've ended up in two different situations there depending on which tools you use/talk to | 16:43 |
fungi | yeah, i suspect pypi's idea of a canonical name is whatever was uploaded | 16:44 |
fungi | and it converts any of those characters to the uploaded dist name | 16:44 |
clarkb | and what was uploaded uses the preexisting pkg_resources rule | 16:44 |
clarkb | (because this package name far predates packaging) | 16:44 |
clarkb | I think thats my concern: these values aren't canonical because they differ | 16:44 |
fungi | does it? so dists uploaded with a _ in their names wind up with a - on pypi? | 16:44 |
clarkb | fungi: https://pypi.org/project/glance-store/ I think that is an example of that | 16:45 |
clarkb | https://opendev.org/openstack/glance_store/src/branch/master/setup.cfg#L2 | 16:45 |
clarkb | yes https://pypi.org/project/glance_store/ redirects to https://pypi.org/project/glance-store/ | 16:46 |
clarkb | the name set in setup.cfg is glance_store | 16:46 |
clarkb | so pypi (or something) is applying a canonicalization | 16:46 |
fungi | hard to know for sure if it was manually registered as glance-store though | 16:46 |
clarkb | it may be that setuptools is doing that before it talks to pypi | 16:46 |
fungi | version 0 has no files | 16:46 |
clarkb | but setuptools would use pkg_resources | 16:46 |
clarkb | anyway thats my complaint. We can't have canonical name converters that disagree | 16:47 |
clarkb | otherwise there is no canonical name | 16:47 |
clarkb | and it seems like for better or worse you have to stick with the existing rules (pkg_resources) and can't change them | 16:47 |
clarkb | otherwise you're stuck in a weird place where things don't agree on what the actual name of a thing is | 16:47 |
fungi | yeah, or use warehouse to remotely dereference them | 16:47 |
fungi | which would not be a "good idea[tm]" | 16:48 |
clarkb | fungi: maybe what they (and you?) are trying to express is that ._- are fully equivalent and you have to check all variations? | 16:49 |
clarkb | (eg there is no true canonical name) | 16:49 |
clarkb | https://pypi.org/project/glance.store/ does also redirect to https://pypi.org/project/glance-store/ | 16:49 |
fungi | it looks like warehouse will prevent you from registering, say, a new glance.store when glance-store already exists | 16:50 |
fungi | and if you pip install glance_store or glance.store you get the package pypi lists as glance-store | 16:50 |
fungi | also pip install leaves you with a glance-store distribution according to pip list/freeze | 16:51 |
clarkb | if you pip install oslo.db it is oslo.db | 16:52 |
clarkb | (it doesn't get rewritten to oslo-db) | 16:52 |
fungi | yep. it also redirects you to oslo.db on pypi | 16:52 |
fungi | not oslo-db | 16:52 |
clarkb | right but the canonical name function will tell you the canonical name is oslo-db | 16:52 |
clarkb | I think what we are finding is that python really wants to express they are fully equivalent and there is no canonical translation | 16:53 |
fungi | so it seems like maybe distributions registered/uploaded with _ in them get rewritten to - but . does not get rewritten to - | 16:53 |
fungi | yet if you install with any of ._- in the name you get what's there | 16:53 |
fungi | but yes, this may also be setuptools itself calling pkg_resources.safe_name() on the dist name | 16:54 |
clarkb | if we ignore that that will lead to unnecessary redirects then there probably isn't very much issue with that. It does mean that for any name comparison you have to "canonicalize" both sides of the comparison with whatever canonicalizer you are using | 16:54 |
fungi | site-packages/glance_store-2.1.0.dist-info/METADATA contains "Name: glance-store" | 16:55 |
fungi | maybe pbr itself is doing that? | 16:55 |
fungi | nope, only calls safe_name( in tests | 16:56 |
clarkb | ya so setuptools is likely using pkg_resources and that is conflicting with packaging's rules | 16:57 |
fungi | the wheel metadata.json is also listing "name": "glance-store", | 16:58 |
clarkb | fungi: the metadata to check would be for oslo.db | 16:58 |
clarkb | since we want to know if it is oslo-db or oslo.db there | 16:58 |
fungi | well, my current conjecture is that setuptools is converting _ to - when creating the metadata, because of internal use of safe_name() | 16:58 |
fungi | which long predates pep 503 (from 2015) which came about while designing warehouse | 16:59 |
fungi | Name: oslo.db | 16:59 |
fungi | "name": "oslo.db", | 16:59 |
clarkb | fungi: yup but I think packaging and pkg_resources agree that _ -> - is correct | 16:59 |
fungi | right | 17:00 |
clarkb | they disagree on whether or not . should be a - | 17:00 |
clarkb | and the . is important in python too | 17:00 |
fungi | yeah, i think . was added in pep 503 and packaging was written to implement that specification | 17:00 |
fungi | but pkg_resources.safe_name() is keeping backward compatibility to pre-503 behavior | 17:01 |
fungi | and setuptools is relying on that rather than packaging | 17:01 |
clarkb | as it arguably should | 17:01 |
fungi | so my takeaway is that this would be less problematic if pypi always redirected ._ to - and pip install always rewrote ._ to - at installation (right now i don't think pip's rewriting anything, setuptools is doing it at package generation) | 17:02 |
clarkb | yes because then we'd actually have a canonical form | 17:03 |
clarkb | but we can also give up on a canonical form and always translate both sides of a comparison | 17:03 |
clarkb | (using whatever translator you have) | 17:03 |
fungi | also we haven't even started talking about lower-casing and collapsing rules yet | 17:06 |
fungi | >>> pkg_resources.safe_name('foo.--Bar_baz') | 17:06 |
fungi | 'foo.-Bar-baz' | 17:06 |
fungi | >>> packaging.utils.canonicalize_name('foo.--Bar_baz') | 17:06 |
fungi | 'foo-bar-baz' | 17:06 |
clarkb | wow | 17:07 |
fungi | so they seem to be consistent in wanting to collapse runs of _- but packaging.utils.canonicalize_name also collapses . with them while pkg_resources.safe_name keeps runs of . untouched | 17:07 |
clarkb | the problem there becomes pkg_resources won't canonicalize a packaging value to something that is equivalent to a pkg_resources value | 17:08 |
clarkb | that seems much more problematic | 17:08 |
fungi | however they're not consistent about lower-casing | 17:08 |
fungi | because i guess . and lower-case were pep 503 additions | 17:08 |
fungi | and yeah, i think the trick then if you want to check a package list for an entry is to canonicalize both sides of the match before comparing | 17:10 |
openstackgerrit | Clark Boylan proposed opendev/gerritbot master: Add option to disable daemonization https://review.opendev.org/745240 | 17:10 |
clarkb | speaking of really fun python bugs ^ you can't set dest on a positional arg | 17:10 |
clarkb | which means you can't use a - in the help output | 17:10 |
clarkb | fungi: the problem is if one side of the input was already canonicalized by packaging then pkg_resources can't canonicalize that input to match its own output on arbitrary data | 17:11 |
clarkb | fungi: its basically forcing you to always use packaging because its rules are more aggressive | 17:11 |
fungi | right, well, it's forcing you to use the pep 503 canonicalization rules anyway | 17:13 |
clarkb | that seems wrong given that setuptools won't do that for its package outputs | 17:13 |
clarkb | the upside to using pkg_resources is that you'll get what pip says | 17:14 |
clarkb | and its "simpler" | 17:14 |
clarkb | also does . not being canonical imply you can't do the whole nested package paths thing that oslo tried to do? | 17:14 |
clarkb | I think we discovered it was a bad idea but I still thought that was an intentionally designed feature that should be available? | 17:14 |
clarkb | as an aside it is interesting that - is the value chosen by pep503/packaging since that makes it difficult for use as python identifiers | 17:16 |
clarkb | _ and . are both valid | 17:16 |
clarkb | but not -. I wonder if that is intentional | 17:16 |
fungi | oh, yeah could be that . in a dist name as an implicit package namespace declaration for module imports is a feature which has to be preserved | 17:21 |
clarkb | right but if we are preserving that why would pep503/packaging undo it? | 17:21 |
clarkb | especially with a value like - which itself is invalid in that context | 17:21 |
clarkb | that does make me think it may be intentional, but I don't understand the value of it if so | 17:22 |
fungi | i'm not sure that pep503/packaging undoes that | 17:22 |
clarkb | fungi: it converts . to - | 17:22 |
fungi | it really looks like what warehouse wants is to preserve the dist name you put in your metadata | 17:23 |
fungi | and setuptools is what's rewriting _ to - | 17:23 |
fungi | i think warehouse and pip are just treating ._- as equivalent and redirecting you to the package you seem to have requested based on those canonicalization rules | 17:24 |
fungi | and the inconsistency here is that setuptools is rewriting _ to - when creating the metadata | 17:25 |
fungi | i have a feeling that if we switched to a non-setuptools sdist/wheel generation backend we might gain the ability to have packages with _ in the name | 17:26 |
clarkb | I'm not so much concerned what the package names actually are as much as being able to take a randomly received package name (like from constraints) and determining if I already have that package somewhere else | 17:36 |
clarkb | and it seems the only reliable way to do that may be with packaging because the rules it uses are most strict | 17:36 |
fungi | yep | 17:37 |
fungi | resolve all the packages you have and then resolve the one in question and see if it's included | 17:37 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add ansible role to manage gerritbot https://review.opendev.org/744795 | 17:44 |
clarkb | I'm hoping ^ is the last ps before that becomes mergeable. I just want to double check the log output and clean up my test asserts | 17:44 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Loop over incomplete subunit files properly https://review.opendev.org/745382 | 18:06 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add ansible role to manage gerritbot https://review.opendev.org/744795 | 18:19 |
clarkb | assuming I got those test changes correct I think ^ is ready for review | 18:19 |
clarkb | and now I'm going to get a bike ride in | 18:20 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Loop over incomplete subunit files properly https://review.opendev.org/745382 | 18:33 |
fungi | rackspace says likely host outage impacting ze01 | 19:17 |
fungi | i'll check it after i'm done eating | 19:18 |
fungi | it hasn't been rebooted yet | 19:34 |
mnaser | i'm seeing a fair bit of RETRY_LIMIT jobs failing | 19:51 |
mnaser | e.g | 19:52 |
mnaser | https://zuul.opendev.org/t/vexxhost/build/276f5667372a4e24a7cfa458c900f26b | 19:52 |
mnaser | few others that are in flight | 19:52 |
mnaser | i dont see any recent zuul-jobs changes | 19:53 |
mnaser | nothing in project-config either | 19:53 |
mnaser | possible that a mirror is bad somewhere so failing in pre inside configure-mirrors? i havent grasped a log yet | 19:53 |
mnaser | yeah, they're surfacing in the openstack tenant too | 19:54 |
mnaser | https://zuul.opendev.org/t/openstack/builds?result=RETRY_LIMIT | 19:54 |
mnaser | http://grafana.openstack.org/d/ykvSNcImk/nodepool-inap?orgId=1 i wonder if that has to do with it | 19:56 |
mnaser | http://grafana.openstack.org/d/8wFIHcSiz/nodepool-rackspace?orgId=1 -- the dip seems to be everywhere though | 19:56 |
mnaser | cc infra-root ^ | 19:56 |
fungi | huh, no logs? | 19:57 |
mnaser | nope | 19:58 |
mnaser | ive been trying to catch one in console logs | 19:58 |
fungi | i'll run 276f5667372a4e24a7cfa458c900f26b down in executor logs | 19:58 |
mnaser | im gonna see if i catch one in retry limit in zuul console | 19:58 |
fungi | ran on ze06, so not the one that rackspace said was impacted by a host issue | 19:59 |
mnaser | it seems to be failing right away from observing | 20:01 |
*** hashar has joined #opendev | 20:01 | |
mnaser | i dont even have time to get a console, just goes straight into a new attempt | 20:01 |
fungi | OSError: [Errno 30] Read-only file system: '/var/lib/zuul/builds/276f5667372a4e24a7cfa458c900f26b' | 20:01 |
mnaser | welp | 20:01 |
mnaser | that'll do it | 20:01 |
mnaser | reboot and crossed fingers i guess | 20:01 |
fungi | [Fri Aug 7 19:15:20 2020] print_req_error: I/O error, dev xvde, sector 85024777 | 20:01 |
fungi | yeah, rebooting it now | 20:02 |
mnaser | probably worth checking the other executors | 20:02 |
mnaser | given the high rate of retry_limit i wouldn't be surprised if it's impacted a bit more than that | 20:02 |
mnaser | there's 10 executors i think? | 20:02 |
fungi | #status log rebooted ze06 after it started complaining about i/o errors for /dev/xvde and eventually set the filesystem read-only, impacting job execution resulting in retry_limit results in some cases | 20:03 |
openstackstatus | fungi: finished logging | 20:03 |
fungi | i'll check the others now | 20:03 |
mnaser | fungi: might be the only one though, http://grafana.openstack.org/d/T6vSHcSik/zuul-status?orgId=1 shows the huge dip in ram usage for ze06 -- don't see the pattern for the rest | 20:04 |
mnaser | doesn't mean it's not worth checking but yeah | 20:04 |
fungi | yeah, no i'm still checking them all to be absolutely sure ;) | 20:05 |
fungi | no similar issues in the other 11 executors | 20:07 |
fungi | though i'll likely have to server reboot --hard via the api, i don't think 06 is able to shut down cleanly. seems to probably be hung | 20:07 |
fungi | yeesh | 20:10 |
fungi | "This message is a follow-up to our previous message regarding your server migration. At this time we are still in the process of migrating your cloud server, ze06.openstack.org, '15f68fd9-c1e0-4346-84e9-0f3275bb0668' to a new host. We will notify you once the migration is complete and we have verified that your cloud server is online. Please do not attempt to access or modify | 20:11 |
fungi | '15f68fd9-c1e0-4346-84e9-0f3275bb0668' during this process." | 20:11 |
fungi | yeah, your "previous message" said "ze01.openstack.org, '0cbe6ecb-be68-43aa-ba0d-58296a81ebcf'" | 20:11 |
fungi | so, er, not the same server | 20:11 |
* fungi sighs | 20:11 | |
corvus | o/ | 20:15 |
mnaser | fungi: well that message might explain things | 20:17 |
corvus | fungi: looks like we're waiting for rax to be done with ze06? | 20:17 |
mnaser | or maybe their automation caught our in progress reboot and threw it all off :P | 20:17 |
clarkb | we can disable the service there if necessary | 20:18 |
mnaser | it would be interesting to get the full logs for the trace on that executor | 20:18 |
clarkb | while we wait | 20:18 |
mnaser | perhaps making zuul-executor hard-exit if it hits cases where user intervention is needed | 20:18 |
fungi | i expect what happened is that they fat-fingered the server id in the initial message, and the rootfs was actually disconnected as part of the host issue impacting ze06 | 20:19 |
fungi | anyway, i updated the ticket to let them know i rebooted it | 20:19 |
corvus | we could have it pause or gracefully exit if it can't perform job prep steps | 20:20 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add ansible role to manage gerritbot https://review.opendev.org/744795 | 20:20 |
fungi | i doubt we need to disengage anything, it's likely going to get a hard reboot anyway as part of the host migration, judging from their usual process | 20:20 |
clarkb | that fixes a minor test issue and now I really do think it will pass | 20:20 |
fungi | it shouldn't be running any new jobs i don't think until the reboot completes | 20:21 |
corvus | fungi: agreed | 20:21 |
fungi | the server seems to be hard down at the moment anyway | 20:21 |
fungi | so it's not like we could disable anything on it right now if we wanted to | 20:21 |
fungi | okay, maybe both ze01 and ze06 were on the same host and they neglected to give us an initial message about 06, because right now both of them are not responding | 20:30 |
fungi | though 01 was responsive for a while after they opened the ticket about it (and not in any apparent distress, unlike 06) | 20:31 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Loop over incomplete subunit files properly https://review.opendev.org/745382 | 21:01 |
clarkb | https://review.opendev.org/744795 passes testing now so I think that whole stack is ready for review | 21:06 |
*** DSpider has quit IRC | 21:13 | |
*** hashar has quit IRC | 21:27 | |
*** qchris has quit IRC | 22:22 | |
*** qchris has joined #opendev | 22:35 | |
*** tosky has quit IRC | 22:58 | |
clarkb | mnaser: as a heads up I ended up deleting clarkb-test1 as it seems like the issue has resolved itself | 23:23 |
clarkb | ianw: small problem on https://review.opendev.org/#/c/744821/2 with filtering of dsa fingerprints. Otherwise lgtm | 23:31 |
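
Note on the 16:41-17:37 package-name discussion: the sketch below restates the two normalization rules compared in the log using only the standard library, so the difference is easy to see. `pep503_normalize` follows the rule published in PEP 503; `legacy_safe_name` mirrors the behaviour fungi demonstrated for `pkg_resources.safe_name()`. The function names and `find_in_package_list` are hypothetical helpers for illustration, not part of either library.

```python
import re


def pep503_normalize(name):
    """PEP 503 rule: collapse runs of '-', '_' and '.' to one '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def legacy_safe_name(name):
    """Stand-in for the older rule: any run of characters other than ASCII
    letters, digits and '.' becomes a single '-'; case and dots are kept."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def find_in_package_list(requested, package_names):
    """Return the entries of package_names that refer to the same project as
    `requested`, normalizing BOTH sides with the stricter PEP 503 rule."""
    wanted = pep503_normalize(requested)
    return [n for n in package_names if pep503_normalize(n) == wanted]


if __name__ == "__main__":
    # The example from the log: the two rules disagree on '.' and on case.
    print(legacy_safe_name("foo.--Bar_baz"))   # foo.-Bar-baz
    print(pep503_normalize("foo.--Bar_baz"))   # foo-bar-baz
    # A name taken from a constraints file matched against installed names.
    print(find_in_package_list("oslo-db", ["glance-store", "oslo.db"]))  # ['oslo.db']
```

Because the legacy rule never rewrites `.` or case, its output cannot be turned into the PEP 503 form by applying the legacy rule again, which is why the comparison helper normalizes both sides with the stricter rule.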
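
Note on the 12:30-15:10 swap discussion: a minimal sketch of the "dd instead of fallocate, plus a sync" idea floated in the log. It assumes the unconfirmed conjecture from the conversation (that `swapon` fails when it does not see fully written, flushed data) and is not the actual zuul-jobs implementation. `prewrite_zeros` and `enable_swap` are hypothetical names; `enable_swap` needs root and the standard `mkswap`, `sync` and `swapon` commands.

```python
import os
import subprocess


def prewrite_zeros(path, size_bytes, chunk_bytes=1024 * 1024):
    """Fill `path` with size_bytes of real zeros (like dd if=/dev/zero),
    rather than only reserving blocks as fallocate does, then fsync."""
    chunk = bytes(chunk_bytes)
    remaining = size_bytes
    # 0o600 so other users cannot read paged-out memory.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while remaining > 0:
            # os.write may write fewer bytes than requested; count what landed.
            written = os.write(fd, chunk[:min(remaining, chunk_bytes)])
            remaining -= written
        os.fsync(fd)  # flush file data before anything else reads it
    finally:
        os.close(fd)


def enable_swap(path, size_bytes):
    """Create and activate a swap file. Requires root; not exercised here."""
    prewrite_zeros(path, size_bytes)
    subprocess.run(["mkswap", path], check=True)
    subprocess.run(["sync"], check=True)  # flush the mkswap header before swapon
    subprocess.run(["swapon", path], check=True)
```

A call such as `enable_swap("/root/swapfile", 1024 ** 3)` would match the smaller 1G size suggested in the log. The trade-off noted there still applies: prewriting is slower than fallocate and consumes the space up front.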