openstackgerrit | Clark Boylan proposed opendev/system-config master: Double stack size on gitea https://review.opendev.org/654634 | 00:00 |
---|---|---|
clarkb | there we go | 00:00 |
clarkb | I'm gonna go track down dinner now | 00:00 |
clarkb | but will try to keep an eye on ^ as fixing that will be nice | 00:00 |
*** ijw has quit IRC | 00:17 | |
*** mattw4 has quit IRC | 00:23 | |
*** michael-beaver has quit IRC | 00:23 | |
*** gyee has quit IRC | 00:29 | |
openstackgerrit | Merged opendev/system-config master: Use swift to back intermediate docker registry https://review.opendev.org/653613 | 00:30 |
*** mriedem has quit IRC | 00:30 | |
*** dave-mccowan has joined #openstack-infra | 00:35 | |
*** Weifan has quit IRC | 00:42 | |
*** markvoelker has joined #openstack-infra | 00:51 | |
*** jamesmcarthur has quit IRC | 00:54 | |
*** smarcet has joined #openstack-infra | 01:00 | |
*** whoami-rajat has joined #openstack-infra | 01:01 | |
*** ricolin has joined #openstack-infra | 01:15 | |
*** diablo_rojo has quit IRC | 01:16 | |
*** smarcet has quit IRC | 01:24 | |
*** smarcet has joined #openstack-infra | 01:28 | |
mordred | clarkb, corvus: sorry - was AFK way more today than I originally expected - had to deal with a bunch of family stuff - I think I'm caught up on openstack/openstack, stack sizes - and intermediate registries from scrollback - nice work on all of that | 01:36 |
*** rlandy|ruck has quit IRC | 01:46 | |
*** hwoarang has quit IRC | 01:55 | |
*** hwoarang has joined #openstack-infra | 02:00 | |
clarkb | mordred: isn't that a fun git bug? | 02:02 |
*** nicolasbock has quit IRC | 02:04 | |
*** _erlon_ has quit IRC | 02:05 | |
*** ykarel|away has joined #openstack-infra | 02:26 | |
*** anteaya has joined #openstack-infra | 02:34 | |
*** Weifan has joined #openstack-infra | 02:44 | |
*** dklyle has quit IRC | 02:45 | |
*** dklyle has joined #openstack-infra | 02:46 | |
*** Weifan has quit IRC | 02:48 | |
*** dave-mccowan has quit IRC | 02:48 | |
*** bhavikdbavishi has joined #openstack-infra | 03:04 | |
*** ykarel|away is now known as ykarel | 03:06 | |
*** hongbin has joined #openstack-infra | 03:10 | |
*** bhavikdbavishi has quit IRC | 03:10 | |
*** Qiming has quit IRC | 03:17 | |
*** hwoarang has quit IRC | 03:18 | |
*** rh-jelabarre has quit IRC | 03:19 | |
*** hwoarang has joined #openstack-infra | 03:24 | |
*** bhavikdbavishi has joined #openstack-infra | 03:28 | |
*** zhangfei has joined #openstack-infra | 03:39 | |
*** zhangfei has quit IRC | 03:40 | |
*** zhangfei has joined #openstack-infra | 03:41 | |
*** lpetrut has joined #openstack-infra | 03:50 | |
*** yamamoto has quit IRC | 04:11 | |
*** yamamoto has joined #openstack-infra | 04:12 | |
*** lpetrut has quit IRC | 04:13 | |
*** ramishra has joined #openstack-infra | 04:15 | |
*** yamamoto has quit IRC | 04:19 | |
*** udesale has joined #openstack-infra | 04:22 | |
*** yamamoto has joined #openstack-infra | 04:33 | |
*** hongbin has quit IRC | 04:33 | |
*** zhangfei has quit IRC | 04:43 | |
*** markvoelker has quit IRC | 04:57 | |
*** Weifan has joined #openstack-infra | 05:00 | |
*** Weifan has quit IRC | 05:00 | |
*** ykarel is now known as ykarel|afk | 05:01 | |
*** ykarel|afk has quit IRC | 05:06 | |
*** raukadah is now known as chandankumar | 05:11 | |
*** jaosorior has joined #openstack-infra | 05:17 | |
*** yamamoto has quit IRC | 05:21 | |
*** zhurong has joined #openstack-infra | 05:23 | |
*** ykarel|afk has joined #openstack-infra | 05:23 | |
*** yamamoto has joined #openstack-infra | 05:23 | |
*** ykarel|afk is now known as ykarel | 05:24 | |
*** yamamoto has quit IRC | 05:26 | |
*** yamamoto has joined #openstack-infra | 05:27 | |
*** yamamoto has quit IRC | 05:27 | |
*** ykarel_ has joined #openstack-infra | 05:28 | |
*** ykarel has quit IRC | 05:31 | |
*** armax has quit IRC | 05:32 | |
*** kjackal has joined #openstack-infra | 05:33 | |
*** ccamacho has quit IRC | 05:46 | |
dangtrinhnt | Hi infra time. Right now the default topic of the #openstack-fenix channel is a little weird. I would like to change that but it looks like I don't have enough privileges to do that. If someone can help, it would be great. Many thanks. | 05:47 |
dangtrinhnt | Infra Team. | 05:47 |
*** quiquell|off is now known as quiquell|rover | 05:49 | |
*** yamamoto has joined #openstack-infra | 05:51 | |
*** kjackal has quit IRC | 05:56 | |
*** lpetrut has joined #openstack-infra | 06:00 | |
*** pcaruana has joined #openstack-infra | 06:06 | |
AJaeger | config-core, here's a change to use py36 for some periodic jobs - please put on your review queue: https://review.opendev.org/654571 | 06:08 |
*** electrofelix has joined #openstack-infra | 06:14 | |
icey | I think I'm missing a project after the openstack->opendev migration? when I try to `git review ...` I get "fatal: Project not found: openstack/charm-vault ... fatal: Could not read from remote repository." I'm guessing it's because it somehow moved into a namespace "x" on opendev.org (https://opendev.org/x/charm-vault) | 06:15 |
quiquell|rover | hello, what's the replacement for https://git.openstack.org/cgit/... with opendev ? | 06:17 |
*** kjackal has joined #openstack-infra | 06:18 | |
*** dpawlik has joined #openstack-infra | 06:18 | |
icey | quiquell|rover: opendev.org seems to be | 06:21 |
*** slaweq has joined #openstack-infra | 06:21 | |
*** yamamoto has quit IRC | 06:22 | |
quiquell|rover | sshnaidm|afk: ^ | 06:23 |
quiquell|rover | sshnaidm|afk: fixed reproducer with latests comments https://review.rdoproject.org/r/20371 | 06:23 |
*** yamamoto has joined #openstack-infra | 06:25 | |
*** yamamoto has quit IRC | 06:25 | |
AJaeger | quiquell|rover: the old https URLs should redirect | 06:26 |
*** yamamoto has joined #openstack-infra | 06:26 | |
AJaeger | icey: yes, see all the emails on openstack-infra, openstack-discuss about OpenDev | 06:26 |
icey | AJaeger: I've seen the emails, I'm wondering why most of the openstack-charms stayed under openstack, and charm-vault moved :-/ | 06:27 |
AJaeger | icey: you need to update your ssh remotes, we cannot redirect those. | 06:27 |
AJaeger | icey: charm-vault is not an official OpenStack project | 06:27 |
icey | AJaeger: interesting :-/ | 06:27 |
AJaeger | icey: not listed here: https://governance.openstack.org/tc/reference/projects/openstack-charms.html | 06:28 |
icey | AJaeger: indeed - I suspect that's an oversight; annoying but thanks :) | 06:28 |
*** yamamoto has quit IRC | 06:29 | |
*** yamamoto has joined #openstack-infra | 06:29 | |
*** yamamoto has quit IRC | 06:29 | |
AJaeger | icey: it's no oversight, see https://review.opendev.org/#/c/541287/ - the PTL rejected making it part of the official charms | 06:30 |
AJaeger | bbl | 06:30 |
*** ykarel_ is now known as ykarel | 06:31 | |
icey | I see, thanks again AJaeger | 06:31 |
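A minimal sketch of the remote update described above, assuming an existing clone of the moved repository; the remote names follow git-review's usual convention and <username> is a placeholder for the Gerrit account:

```bash
# Repoint an existing clone after the openstack/ -> x/ namespace move.
cd charm-vault
git remote set-url origin https://opendev.org/x/charm-vault
# If git-review created a "gerrit" remote, update it as well; the https
# redirects cannot help ssh remotes, as noted above.
git remote set-url gerrit ssh://<username>@review.opendev.org:29418/x/charm-vault
git remote -v   # confirm both URLs now point at the new namespace
```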
ykarel | Looks like OpenStack Release Bot sending wrong updates to .gitreview | 06:32 |
ykarel | without rebasing | 06:32 |
ykarel | see some last updates:- https://review.opendev.org/#/q/owner:OpenStack+Release+Bot+gitreview | 06:33 |
ykarel | infra-root ^^ AJaeger ^^ | 06:33 |
*** udesale has quit IRC | 06:35 | |
*** dciabrin has joined #openstack-infra | 06:35 | |
*** udesale has joined #openstack-infra | 06:37 | |
ykarel | Okk seems those are old reviews posted before migration | 06:38 |
ykarel | but merging those as it is without rebase will override .gitreview | 06:39 |
*** bhavikdbavishi has quit IRC | 06:39 | |
*** bhavikdbavishi has joined #openstack-infra | 06:41 | |
*** quiquell|rover is now known as quique|rover|brb | 06:42 | |
*** udesale has quit IRC | 06:44 | |
*** udesale has joined #openstack-infra | 06:44 | |
*** zhangfei has joined #openstack-infra | 06:51 | |
*** markvoelker has joined #openstack-infra | 06:52 | |
*** pgaxatte has joined #openstack-infra | 06:58 | |
AJaeger | ykarel: best talk with release team... | 06:58 |
ykarel | AJaeger, ack, seems they are in some other timezone, already posted a issue this morning | 06:59 |
AJaeger | ykarel: patching those is fine as well ;) | 06:59 |
ykarel | s/a/other | 06:59 |
*** yamamoto has joined #openstack-infra | 07:03 | |
*** ginopc has joined #openstack-infra | 07:08 | |
*** yamamoto has quit IRC | 07:11 | |
*** quique|rover|brb is now known as quiquell|rover | 07:12 | |
*** arxcruz|off|23 is now known as arxcruz | 07:13 | |
*** ccamacho has joined #openstack-infra | 07:13 | |
*** iurygregory has joined #openstack-infra | 07:14 | |
*** zhangfei has quit IRC | 07:15 | |
*** zhangfei has joined #openstack-infra | 07:15 | |
*** tosky has joined #openstack-infra | 07:17 | |
*** udesale has quit IRC | 07:18 | |
*** udesale has joined #openstack-infra | 07:19 | |
*** ccamacho has quit IRC | 07:27 | |
*** ccamacho has joined #openstack-infra | 07:27 | |
*** yamamoto has joined #openstack-infra | 07:31 | |
*** yamamoto has quit IRC | 07:31 | |
*** fmount has quit IRC | 07:32 | |
*** yamamoto has joined #openstack-infra | 07:33 | |
openstackgerrit | Jason Lee proposed opendev/storyboard master: WIP: Updated Loader functionality in preparation for Writer https://review.opendev.org/654812 | 07:33 |
*** fmount has joined #openstack-infra | 07:35 | |
*** gfidente has joined #openstack-infra | 07:40 | |
tosky | AJaeger: uhm, huge bunch of wrong fixes for the opendev transition | 07:40 |
*** jpena|off has joined #openstack-infra | 07:43 | |
*** jpena|off is now known as jpena | 07:43 | |
*** ykarel is now known as ykarel|lunch | 07:45 | |
openstackgerrit | Bernard Cafarelli proposed openstack/project-config master: Update Grafana dashboards for stable Neutron releases https://review.opendev.org/653354 | 07:52 |
*** dtantsur|afk is now known as dtantsur | 07:55 | |
*** jpich has joined #openstack-infra | 07:55 | |
*** kjackal has quit IRC | 07:57 | |
*** kjackal has joined #openstack-infra | 07:58 | |
*** roman_g has joined #openstack-infra | 08:01 | |
*** rpittau|afk is now known as rpittau | 08:08 | |
*** helenafm has joined #openstack-infra | 08:08 | |
*** gfidente has quit IRC | 08:12 | |
*** lseki has joined #openstack-infra | 08:12 | |
*** lucasagomes has joined #openstack-infra | 08:15 | |
*** dikonoor has joined #openstack-infra | 08:28 | |
*** derekh has joined #openstack-infra | 08:28 | |
*** apetrich has joined #openstack-infra | 08:30 | |
frickler | infra-root: do we already have a plan to make opendev.org listen on IPv6? seems the lack of that is actively breaking things, see e.g. this paste posted in #-qa http://paste.openstack.org/show/749620/ . we might want to drop the AAAA record until we get that fixed | 08:31 |
*** ginopc has quit IRC | 08:33 | |
*** rossella_s has joined #openstack-infra | 08:36 | |
*** ginopc has joined #openstack-infra | 08:39 | |
*** e0ne has joined #openstack-infra | 08:39 | |
*** tkajinam has quit IRC | 08:42 | |
*** mleroy has joined #openstack-infra | 08:52 | |
*** ykarel|lunch is now known as ykarel | 08:52 | |
*** dikonoor has quit IRC | 08:56 | |
*** jbadiapa has joined #openstack-infra | 09:02 | |
*** ginopc has quit IRC | 09:06 | |
*** dikonoor has joined #openstack-infra | 09:09 | |
*** gfidente has joined #openstack-infra | 09:20 | |
*** kjackal has quit IRC | 09:24 | |
*** kjackal has joined #openstack-infra | 09:24 | |
*** lpetrut has quit IRC | 09:30 | |
AJaeger | tosky: yeah, I handed out a few -1s ;( | 09:30 |
*** jaosorior has quit IRC | 09:31 | |
*** amansi26 has joined #openstack-infra | 09:35 | |
*** yamamoto has quit IRC | 09:40 | |
*** Lucas_Gray has joined #openstack-infra | 09:42 | |
*** gfidente has quit IRC | 09:46 | |
*** yamamoto has joined #openstack-infra | 09:48 | |
*** yamamoto has quit IRC | 09:53 | |
*** jcoufal has joined #openstack-infra | 09:54 | |
*** bhavikdbavishi has quit IRC | 09:59 | |
*** gfidente has joined #openstack-infra | 10:00 | |
*** kjackal has quit IRC | 10:01 | |
*** jaosorior has joined #openstack-infra | 10:14 | |
*** threestrands has quit IRC | 10:15 | |
*** lseki has quit IRC | 10:16 | |
*** kjackal has joined #openstack-infra | 10:16 | |
*** gfidente has quit IRC | 10:32 | |
*** sshnaidm|afk is now known as sshnaidm | 10:40 | |
*** bhavikdbavishi has joined #openstack-infra | 10:44 | |
*** amansi26 has quit IRC | 10:46 | |
*** ginopc has joined #openstack-infra | 10:47 | |
*** bhavikdbavishi has quit IRC | 10:56 | |
*** nicolasbock has joined #openstack-infra | 11:00 | |
*** jaosorior has quit IRC | 11:00 | |
*** dikonoor has quit IRC | 11:03 | |
aspiers | git.openstack.org[0: 23.253.125.17]: errno=No route to host | 11:08 |
aspiers | git.openstack.org[1: 2001:4800:7817:103:be76:4eff:fe04:e3e3]: errno=Network is unreachable | 11:08 |
aspiers | infra-root: is this expected post-transition? | 11:08 |
openstackgerrit | Ghanshyam Mann proposed openstack/openstack-zuul-jobs master: Add python36-charm-jobs project template https://review.opendev.org/654954 | 11:09 |
*** happyhemant has joined #openstack-infra | 11:09 | |
* aspiers reads through mail threads to see if he missed something | 11:09 | |
*** yamamoto has joined #openstack-infra | 11:12 | |
*** ykarel is now known as ykarel|afk | 11:13 | |
frickler | aspiers: no, those are expected to work and do work fine for me, maybe a local networking issue for you? we do have a known issue with opendev.org not responding via IPv6, though | 11:14 |
aspiers | frickler: I just saw the same issue reported above in the scrollback | 11:14 |
aspiers | nope, this is not an IPv6 issue - see the above paste which is both IPv4 and v6 | 11:15 |
aspiers | frickler: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2019-04-22.log.html | 11:16 |
aspiers | clarkb: in case you didn't know, there's also git remote set-url these days; no need to remove and re-add | 11:17 |
frickler | aspiers: yeah, I haven't read much scrollback yet. so you did have some git:// remote? | 11:17 |
aspiers | frickler: yes | 11:18 |
frickler | aspiers: ah, o.k., that's really some weird way of reporting errors | 11:19 |
aspiers | clarkb: although I'm not sure if that actually has much benefit, since I guess a remote remove won't immediately GC all the objects from that remote forcing a redownload | 11:19 |
aspiers | frickler: sorry, which way is weird? | 11:19 |
*** yamamoto has quit IRC | 11:19 | |
frickler | aspiers: oh, sorry, that could be misunderstood. I was talking about git, not about you | 11:21 |
aspiers | :) | 11:21 |
aspiers | frickler: you mean No route to host? | 11:21 |
frickler | aspiers: yes, that and network unreachable. they really should be "connection refused" instead | 11:21 |
aspiers | frickler: actually that error message comes straight from the OS via strerror(3), so I think it has to be correct | 11:24 |
*** bhavikdbavishi has joined #openstack-infra | 11:25 | |
aspiers | although I can ping 23.253.125.17 so it does seem weird | 11:25 |
aspiers | there must be something else strange going on | 11:25 |
*** bhavikdbavishi has quit IRC | 11:25 | |
frickler | aspiers: hmm, via tcpdump I see "ICMP host 23.253.125.17 unreachable - admin prohibited", that doesn't match to "No route to host" to me | 11:25 |
frickler | aspiers: but yeah, maybe a kernel thing instead of git | 11:26 |
aspiers | frickler: in any case, a) you are saying it's supposed to work? and b) what's the correct future-proof git:// host to use? | 11:26 |
*** smarcet has quit IRC | 11:26 | |
*** bhavikdbavishi has joined #openstack-infra | 11:26 | |
frickler | aspiers: no, the git:// variant is no longer working, since our new frontend doesn't support it. changing to http(s) is the correct way to fix this issue | 11:27 |
frickler | aspiers: I was only confused by the error message | 11:28 |
aspiers | ah | 11:28 |
aspiers | was it announced anywhere that git:// no longer works? if so I missed it | 11:28 |
aspiers | if not, I fear you can expect a lot more questions about this | 11:28 |
frickler | it should have been in one of the mails early on | 11:28 |
*** rh-jelabarre has joined #openstack-infra | 11:29 | |
frickler | there was also a set of patches removing the git:// references from devstack and elsewhere | 11:29 |
frickler | let me check the archives | 11:29 |
*** bhavikdbavishi1 has joined #openstack-infra | 11:30 | |
aspiers | it wasn't in an announcement, but it was buried in 3 followups within that large thread | 11:30 |
aspiers | http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004921.html | 11:30 |
*** bhavikdbavishi has quit IRC | 11:31 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 11:31 | |
*** apetrich has quit IRC | 11:32 | |
*** zhangfei has quit IRC | 11:32 | |
*** jpena is now known as jpena|lunch | 11:33 | |
frickler | aspiers: yeah, seems like this was a bit hidden, sorry for that | 11:33 |
aspiers | np :) just want to help you avoid a flood of duplicate questions | 11:34 |
aspiers | frickler: here's a nice workaround to advertise: | 11:35 |
aspiers | git config --global url.https://git.openstack.org/.insteadof git://git.openstack.org/ | 11:35 |
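A quick, hedged way to confirm the rewrite above is registered and that git:// URLs are now fetched over https (the repository in the example is arbitrary):

```bash
# List the configured URL rewrites; the insteadOf entry should show up here.
git config --global --get-regexp '^url\.'
# Exercise it: this git:// URL gets transparently rewritten to https.
git ls-remote git://git.openstack.org/openstack/nova HEAD
```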
AJaeger | frickler: could you review https://review.opendev.org/#/c/654954/ and https://review.opendev.org/654571 , please? | 11:35 |
*** quiquell|rover is now known as quique|rover|lun | 11:35 | |
*** quique|rover|lun is now known as quique|rover|eat | 11:36 | |
*** apetrich has joined #openstack-infra | 11:36 | |
AJaeger | argh, just gave -1 on 954... | 11:37 |
*** bhavikdbavishi has quit IRC | 11:42 | |
*** bhavikdbavishi has joined #openstack-infra | 11:43 | |
*** lyarwood has joined #openstack-infra | 11:44 | |
aspiers | OK this is like a million times nicer https://opendev.org/openstack/nova-specs | 11:48 |
aspiers | kudos corvus clarkb fungi and all of infra-root! | 11:49 |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Use py36 instead of py35 for periodic master jobs https://review.opendev.org/654571 | 11:50 |
openstackgerrit | Ghanshyam Mann proposed openstack/openstack-zuul-jobs master: Add python36-charm-jobs project template https://review.opendev.org/654954 | 11:55 |
*** yamamoto has joined #openstack-infra | 11:57 | |
*** panda is now known as panda|lunch | 11:57 | |
*** quique|rover|eat is now known as quiquell|rover | 11:59 | |
*** markvoelker has quit IRC | 12:01 | |
*** rlandy has joined #openstack-infra | 12:06 | |
*** boden has joined #openstack-infra | 12:07 | |
*** rlandy is now known as rlandy|ruck | 12:07 | |
frickler | infra-root: the hashdiff-0.3.9 gem breaks beaker-trusty, previous passing job had hashdiff-0.3.8 http://logs.openstack.org/77/654577/1/gate/openstackci-beaker-ubuntu-trusty/dfa87dd/job-output.txt.gz#_2019-04-22_20_56_13_419699 | 12:09 |
AJaeger | frickler: ah. Can we use that version for now? | 12:10 |
*** mriedem has joined #openstack-infra | 12:11 | |
frickler | AJaeger: not sure, maybe someone with more puppet voodoo than myself can find a fix. otherwise I'd propose to make that job non-voting for now | 12:11 |
boden | hi. I'm trying to understand if/when we might expect "Hound" (code search) to work again? I sent a note to the ML http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005481.html. but never saw a response about when it might be available | 12:11 |
*** jbadiapa has quit IRC | 12:22 | |
*** gfidente has joined #openstack-infra | 12:28 | |
*** jpena|lunch is now known as jpena | 12:31 | |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Add openstacksdk-functional-devstack-networking job to Neutron dashboard https://review.opendev.org/652993 | 12:31 |
*** gfidente has quit IRC | 12:34 | |
AJaeger | config-core, could you review https://review.opendev.org/654574 as next step for py36 jobs, please? thanks! | 12:39 |
AJaeger | boden: idea is to use https://opendev.org/explore/code instead of codesearch | 12:40 |
*** gfidente has joined #openstack-infra | 12:40 | |
*** kaiokmo has joined #openstack-infra | 12:40 | |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Add python36-charm-jobs project template https://review.opendev.org/654954 | 12:40 |
boden | AJaeger as of right now there doesn't seem to be parity... search results that I need return nothing with https://opendev.org/explore/code | 12:41 |
openstackgerrit | Tobias Henkel proposed openstack/diskimage-builder master: Support defining the free space in the image https://review.opendev.org/655127 | 12:41 |
boden | AJaeger I don't see code explorer working for our needs as-is, unless I'm missing something | 12:41 |
*** kgiusti has joined #openstack-infra | 12:41 | |
*** bhavikdbavishi has quit IRC | 12:42 | |
boden | AJaeger for example I want to find all the requirements.txt files that have the string "neutron-lib-current", but it seems to not find anything https://opendev.org/explore/code?q=neutron-lib-current&tab= | 12:42 |
frickler | boden: you need to quote the "-" as "\-" | 12:42 |
boden | frickler unless I'm missing something, it doesn't help | 12:44 |
boden | there are at least 20 projects that have the string "neutron-lib-current" in their requirements.txt file... | 12:44 |
*** kjackal has quit IRC | 12:44 | |
*** kjackal has joined #openstack-infra | 12:44 | |
*** jamesmcarthur has joined #openstack-infra | 12:46 | |
*** lseki has joined #openstack-infra | 12:46 | |
frickler | boden: do you have an example? I can't seem to find one easily | 12:48 |
AJaeger | frickler: dragonflow/requirements.txt | 12:48 |
boden | frickler https://github.com/openstack/networking-ovn/blob/master/requirements.txt#L24 | 12:48 |
boden | frickler, or as another example I want to find all uses of the (neutron) constant "ROUTER_CONTROLLER"... I can't find any, and there are some for sure | 12:49 |
*** xarses_ has joined #openstack-infra | 12:49 | |
*** rh-jelabarre has quit IRC | 12:50 | |
*** ykarel|afk is now known as ykarel | 12:51 | |
*** aaronsheffield has joined #openstack-infra | 12:51 | |
*** xarses has quit IRC | 12:52 | |
frickler | boden: hmm, that's indeed strange. searching for some other term like "policy\-in\-code" seems to work fine. maybe https://opendev.org/explore/code?q=infra+initiatives&tab= can help you as a workaround for the first search. but there certainly seems to be an issue with terms containing "_" | 12:55 |
AJaeger | boden: for your specific query: in openstack namespace it's neutron, neutron-lib, networking-odl. I don't have the namespace x checked out to grep there | 12:57 |
*** andreww has joined #openstack-infra | 12:58 | |
*** xarses_ has quit IRC | 13:01 | |
*** jpich has quit IRC | 13:01 | |
*** gfidente has quit IRC | 13:01 | |
*** smarcet has joined #openstack-infra | 13:01 | |
*** jpich has joined #openstack-infra | 13:02 | |
openstackgerrit | Monty Taylor proposed zuul/nodepool master: Update devstack settings and docs for opendev https://review.opendev.org/654230 | 13:03 |
mordred | frickler: I pushed up https://review.opendev.org/655133 which I *think* should fix that ^^ | 13:03 |
mordred | boden: I was out yesterday so I'm not 100% caught up on hound - I believe clarkb was looking at it yesterday though | 13:04 |
*** eharney has quit IRC | 13:04 | |
AJaeger | mordred: could you review a small py35->36 change, please? https://review.opendev.org/#/c/654574/ | 13:05 |
mordred | we'd definitely LIKE to replace it with the gitea codesearch - but I think more work might need to go in to that to make it suitable (there is now upstream support for pluggable search backends and we'd like to get an elasticsearch backend put in there, for instance) | 13:05 |
AJaeger | thanks, mordred | 13:07 |
boden | frickler I'm not sure we can count on "infra initiatives" being there... AJaeger we also have projects in the x/ namespace that we need to search | 13:09 |
boden | just as a heads up this will certainly impact some of our work on neutron blueprints since we need the ability to search cross projects for impacts | 13:10 |
openstackgerrit | Merged openstack/project-config master: Use py36 instead of py35 for periodic master jobs https://review.opendev.org/654574 | 13:13 |
clarkb | mordred: boden I've not had a chance to look at hound yet. I think it updates based on projects.yaml but needs a restart? unsure. I'm likely to follow up on git stack size segfaults for openstack/openstack first today, then can start looking at hound | 13:14 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-neutron-lib-master https://review.opendev.org/654580 | 13:14 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-ovsdbapp-master https://review.opendev.org/655136 | 13:14 |
AJaeger | clarkb: for the git stack, we need to fix the system-config tests first, see above for frickler comment on hashdiff-0.3.9 breaking beaker-trusty. That needs fixing first. | 13:14 |
AJaeger | clarkb: and good morning! | 13:15 |
*** jbadiapa has joined #openstack-infra | 13:15 | |
clarkb | AJaeger: got it and thanks. I'm not quite "here" yet | 13:15 |
clarkb | the openstack infra ci helper has our package pins iirc | 13:16 |
*** jamesmcarthur has quit IRC | 13:18 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Set logo height rather than width https://review.opendev.org/655139 | 13:19 |
openstackgerrit | Darragh Bailey (electrofelix) proposed zuul/zuul master: Improve proxy settings support for compose env https://review.opendev.org/655140 | 13:20 |
openstackgerrit | Darragh Bailey (electrofelix) proposed zuul/zuul master: Add some packages for basic python jobs https://review.opendev.org/655141 | 13:20 |
openstackgerrit | Darragh Bailey (electrofelix) proposed zuul/zuul master: Scale nodes up to 4 instances https://review.opendev.org/655142 | 13:20 |
mriedem | mnaser: i see that http://status.openstack.org/elastic-recheck/#1806912 hits predominantly in vexxhost-sjc1 nodes, any idea if there could be something slowing down g-api startup there? | 13:21 |
boden | clarkb okay... should I create a bug or something to track this work, or no? | 13:21 |
mordred | clarkb: the config.json definitely looks updated on codesearch - want me to just restart the service? | 13:21 |
*** lpetrut has joined #openstack-infra | 13:21 | |
clarkb | mordred: probably worth a try | 13:22 |
clarkb | boden: I'm not sure it is needed. It is a known thing, just lower on the priority list, because worst case local grep works | 13:22 |
mordred | oh - nope. there's another issue | 13:22 |
openstackgerrit | Andreas Jaeger proposed opendev/puppet-openstack_infra_spec_helper master: Block hashdiff 0.3.9 https://review.opendev.org/655143 | 13:23 |
AJaeger | clarkb: is that the proper fix for hashdiff? ^ | 13:23 |
*** panda|lunch is now known as panda | 13:23 | |
clarkb | AJaeger: I think so | 13:24 |
*** redrobot has joined #openstack-infra | 13:24 | |
openstackgerrit | Monty Taylor proposed opendev/jeepyb master: Use opendev and https by default https://review.opendev.org/655145 | 13:24 |
mnaser | mriedem: odd. has it increased recently? We added IPv6 one or two weeks ago | 13:24 |
mordred | clarkb, frickler: ^^ that is needed to fix codesearch | 13:24 |
mriedem | mnaser: what's really weird is the g-api logs show it only taking about 7 seconds for g-api to startup | 13:25 |
mriedem | http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/screen-g-api.txt.gz | 13:25 |
*** jrist- is now known as jrist | 13:25 | |
mriedem | http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/devstacklog.txt.gz#_2019-04-23_05_17_22_811 | 13:25 |
*** sshnaidm is now known as sshnaidm|afk | 13:26 | |
*** quiquell|rover is now known as quique|rover|lun | 13:26 | |
*** quique|rover|lun is now known as quique|rover|eat | 13:26 | |
mriedem | looks like devstack is uploading an image and then waiting to get the image back and maybe it's taking longer in swift? | 13:27 |
openstackgerrit | Monty Taylor proposed opendev/jeepyb master: Use opendev and https by default https://review.opendev.org/655145 | 13:27 |
boden | clarkb local grep isn't really an option for this work; we are talking 10s of projects that would need to be up to date to grep across them and what's more we can't share that search with people as it's used in the code reviews | 13:27 |
mordred | boden: yeah. we're working on getting codesearch fixed | 13:28 |
mriedem | rosmaita: ^ on g-api taking over a minute to 'start' in case you can identify something in those logs | 13:28 |
mnaser | mriedem: is it possible that it is trying to check if its listening on ipv4/ipv6 and the check is verifying the other port? | 13:28 |
mriedem | rosmaita: http://status.openstack.org/elastic-recheck/#1806912 | 13:28 |
openstackgerrit | Monty Taylor proposed opendev/jeepyb master: Use opendev and https by default https://review.opendev.org/655145 | 13:29 |
rosmaita | mriedem: ack | 13:29 |
mriedem | mnaser: not sure | 13:29 |
mnaser | then again we're not the only dual stack cloud | 13:30 |
mnaser | I think | 13:30 |
*** rlandy|ruck is now known as rlandy|ruck|mtg | 13:30 | |
mriedem | the bug hits other providers | 13:30 |
mriedem | just not as much | 13:30 |
mriedem | i see a GET from g-api to swift here http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/screen-g-api.txt.gz#_Apr_23_04_58_17_713760 | 13:30 |
mriedem | that results in a 404 | 13:31 |
mriedem | which is maybe normal devstack checking to see if the image exists (or glance checking) before uploading it to swift? | 13:31 |
mordred | boden, clarkb: I have restarted hound - it's going to take a couple of minutes because it's got a bunch of new stuff to clone and index | 13:31 |
mriedem | guessing it's glance checking because then | 13:31 |
mriedem | Apr 23 04:58:17.714742 ubuntu-bionic-vexxhost-sjc1-0005468291 devstack@g-api.service[23037]: INFO glance_store._drivers.swift.store [None req-8c0ac321-01c3-40ac-a9f5-0b733baac629 admin admin] Creating swift container glance | 13:31 |
mnaser | yeah I saw that too, that seems business as usual | 13:32 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-neutron-lib-master https://review.opendev.org/654580 | 13:32 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-ovsdbapp-master https://review.opendev.org/655136 | 13:32 |
frickler | mriedem: apache proxies to 127.0.0.1:60998, but g-api is binding to :60999, not sure where these port numbers come from in the first place http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/apache_config/glance-wsgi-api_conf.txt.gz | 13:33 |
frickler | http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/screen-g-api.txt.gz#_Apr_23_04_58_11_635308 | 13:33 |
*** sthussey has joined #openstack-infra | 13:33 | |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-neutron-lib-master https://review.opendev.org/654580 | 13:35 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-ovsdbapp-master https://review.opendev.org/655136 | 13:35 |
frickler | for a working run I see 60999 in both locations | 13:35 |
*** bhavikdbavishi has joined #openstack-infra | 13:35 | |
mriedem | hmm yeah and the curl is specifying noproxy | 13:36 |
mriedem | oh i guess that's just part of wait_for_service in devstack | 13:36 |
mordred | clarkb: if you get a sec, https://review.opendev.org/#/c/655133/ should fix the nodepool devstack job I think | 13:37 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-ovsdbapp-master https://review.opendev.org/655136 | 13:39 |
*** bhavikdbavishi has quit IRC | 13:39 | |
mriedem | frickler: it looks like the proxy port is random | 13:40 |
mriedem | https://github.com/openstack/devstack/blob/master/lib/apache#L349 | 13:41 |
mriedem | port=$(get_random_port) | 13:41 |
frickler | mriedem: yeah, so the glance config here also has 60998, not sure why uwsgi chooses 60999 instead http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/etc/glance/glance-uwsgi.ini.gz | 13:42 |
*** liuyulong has joined #openstack-infra | 13:43 | |
mnaser | question: if we have a change that depends-on a review.openstack.org change, will the dependency *not* be taken into consideration or will it 'block' ? | 13:44 |
mnaser | okay, never mind, it will just ignore it | 13:44 |
mnaser | Zuul took two minutes to pick up a change and bring it to gate so I was starting to wonder what was going on | 13:45 |
*** michael-beaver has joined #openstack-infra | 13:45 | |
*** jamesmcarthur has joined #openstack-infra | 13:46 | |
*** kranthikirang has joined #openstack-infra | 13:47 | |
*** quique|rover|eat is now known as quiquell|rover | 13:47 | |
*** jamesmcarthur has quit IRC | 13:47 | |
*** smarcet has quit IRC | 13:48 | |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-neutron-lib-master https://review.opendev.org/654580 | 13:49 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openstack-tox-py35-with-ovsdbapp-master https://review.opendev.org/655136 | 13:49 |
pabelanger | mnaser: yah, depends-on: review.o.o will just be ignored | 13:50 |
mriedem | frickler: here is where the random port is retrieved (note ipv4 only) https://github.com/openstack/devstack/blob/master/functions#L801 | 13:51 |
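A rough sketch, not devstack's actual helper, of how a "pick a free port" routine ends up IPv4-only and inherently racy, which may relate to the 60998/60999 mismatch above:

```bash
# Sketch only: let the kernel hand out a free IPv4 port, then release it.
# Another process can bind something else in the gap between this probe and
# the real bind, so the number written into configs can end up stale.
get_free_port() {
    python3 -c 'import socket; s = socket.socket(); s.bind(("127.0.0.1", 0)); print(s.getsockname()[1]); s.close()'
}
port=$(get_free_port)
echo "proxy target would be 127.0.0.1:${port}"
```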
*** jamesmcarthur has joined #openstack-infra | 13:52 | |
*** smarcet has joined #openstack-infra | 13:52 | |
openstackgerrit | Nicolas Hicher proposed openstack/diskimage-builder master: openssh-server: enforce sshd config https://review.opendev.org/653890 | 13:53 |
AJaeger | amorin: is ovh-bhs1 ok again? We disabled it on the 19th due to network problems, can we use it again? | 13:54 |
*** sshnaidm|afk is now known as sshnaidm | 13:55 | |
*** yamamoto has quit IRC | 13:56 | |
*** Goneri has joined #openstack-infra | 13:57 | |
*** psachin has joined #openstack-infra | 13:58 | |
*** rh-jelabarre has joined #openstack-infra | 13:59 | |
*** amansi26 has joined #openstack-infra | 13:59 | |
mnaser | infra-root: either limestone is maybe having issues or rax executors are having network issues, we're seeing a lot of RETRY_LIMIT, jobs failing midway, someone caught one http://paste.openstack.org/show/749635/ | 14:00 |
*** jaosorior has joined #openstack-infra | 14:00 | |
mnaser | http://zuul.openstack.org/builds?result=RETRY_LIMIT | 14:01 |
mnaser | we have a lot of RETRY_LIMIT fails | 14:01 |
mnaser | no logs however.. | 14:01 |
*** Lucas_Gray has quit IRC | 14:04 | |
*** mleroy has left #openstack-infra | 14:06 | |
Shrews | pabelanger: any idea what happened here? http://logs.openstack.org/62/654462/3/gate/openstackci-beaker-ubuntu-trusty/e77f31d/job-output.txt.gz#_2019-04-23_13_21_28_866532 | 14:06 |
*** Lucas_Gray has joined #openstack-infra | 14:06 | |
mordred | boden: http://codesearch.openstack.org/?q=neutron-lib-current&i=nope&files=&repos= works now | 14:07 |
pabelanger | Shrews: I think AJaeger just posted a patch for that | 14:07 |
pabelanger | https://review.opendev.org/655143/ maybe? | 14:07 |
Shrews | ah | 14:07 |
pabelanger | AJaeger: looks like we might need to cap bundler too | 14:08 |
*** smarcet has quit IRC | 14:08 | |
frickler | mriedem: humm, devstack is being run twice it seems. see the successful end of the first run here http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/job-output.txt.gz#_2019-04-23_05_00_55_349794 and then the log of the second run in http://logs.openstack.org/67/648867/8/check/openstacksdk-functional-devstack/e155792/controller/logs/devstacklog.txt.gz | 14:10 |
*** smarcet has joined #openstack-infra | 14:12 | |
mriedem | frickler: i think clarkb pointed that the last time i looked at this :) | 14:16 |
mordred | mriedem, frickler: that doesn't seem like a thing we want it doing | 14:17 |
quiquell|rover | pabelanger: ping | 14:17 |
AJaeger | pabelanger: yeah - want to take over my change? | 14:17 |
quiquell|rover | pabelanger: do we have any way to get zuul jobs enqueue time from all the stein cycle ? | 14:17 |
quiquell|rover | pabelanger: like the enqueued_time json value at zuul status api | 14:18 |
corvus | infra-root: am i to understand that there are reports gitea is not answering on ipv6? is anyone working on that? | 14:19 |
mordred | corvus: I believe that wound up being attempted use of git:// | 14:20 |
AJaeger | pabelanger: patching myself quickly | 14:20 |
*** eharney has joined #openstack-infra | 14:20 | |
corvus | mordred: ah, ok thanks! | 14:20 |
mordred | frickler: want to re-+2 this: https://review.opendev.org/#/c/655145/ ? | 14:21 |
openstackgerrit | Andreas Jaeger proposed opendev/puppet-openstack_infra_spec_helper master: Block hashdiff 0.3.9 and bundler 2.0.1 https://review.opendev.org/655143 | 14:21 |
corvus | mordred: this isn't working that great for me: telnet 2604:e100:3:0:f816:3eff:fe6b:ad62 80 | 14:22 |
*** electrofelix has quit IRC | 14:22 | |
pabelanger | AJaeger: thanks, can help land it | 14:24 |
mordred | corvus: I agree | 14:24 |
pabelanger | corvus: clarkb: mordred: with gitea, I don't see tags listed on the web interface, am I missing something obvious or is that not a setting enabled? | 14:24 |
mordred | pabelanger: click the branch dropdown, then click tags | 14:25 |
pabelanger | quiquell|rover: I think it is logged in statsd otherwise I _think_ it is in db | 14:25 |
*** armax has joined #openstack-infra | 14:26 | |
pabelanger | mordred: ah, thanks! | 14:26 |
pabelanger | I was looking for a releases tab like in github | 14:26 |
mordred | pabelanger: yes - we removed the releases tab because it inappropriately provides download links for git export tarballs, which is a terrible idea | 14:27 |
mordred | and causes people to think that downloading those things and using them might work | 14:27 |
corvus | which, incidentally, is something that github does and we can not prevent | 14:27 |
mordred | yeah | 14:27 |
corvus | so github is making its own "releases" of openstack | 14:28 |
mordred | I wouldn't mind re-enabling the page if we could fix it to _only_ show manually uploaded artifacts | 14:28 |
mordred | since there is a "create release" and "upload artifact" api | 14:28 |
corvus | i bet we could make a patch | 14:28 |
mordred | yeah | 14:28 |
mnaser | hmm | 14:29 |
*** rlandy|ruck|mtg is now known as rlandy|ruck | 14:30 | |
mnaser | corvus, mordred: curl 2604:e100:3:0:f816:3eff:fe6b:ad62:80 returns no route to host, but hitting 8080 gives connection refused | 14:30 |
mnaser | so.. I don't think this is something we're doing? | 14:30 |
corvus | yeah, it looks like the LB is only listening on v4: tcp 0 0 0.0.0.0:http 0.0.0.0:* LISTEN | 14:31 |
corvus | i was just trying to verify that we restarted it after the config change that should have it listening on v6 | 14:31 |
pabelanger | mordred: ack, thanks | 14:32 |
corvus | i'm not 100% sure of that, so i think it's worth a restart | 14:32 |
mordred | corvus: ++ | 14:33 |
quiquell|rover | pabelanger: do you know where http://grafana.openstack.org/d/T6vSHcSik/zuul-status?orgId=1 is taking the data from ? | 14:35 |
corvus | i'm going to restart the container... mostly because we've never tested the "-sf" haproxy option in a container. | 14:35 |
corvus | there will be a short interruption in service | 14:35 |
corvus | done | 14:36 |
corvus | telnet to the v6 address works now | 14:37 |
pabelanger | quiquell|rover: graphite.opendev.org | 14:37 |
corvus | so that seems to have been the issue | 14:37 |
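A couple of hedged checks along the lines of what corvus just did, to confirm the load balancer is now bound on both address families:

```bash
# On the load balancer host: the listener should no longer be v4-only.
ss -tlnp | grep -E ':(80|443) '
# From any host with working v6: the v6 path should answer over https.
curl -6 -sI https://opendev.org/ | head -n 1
```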
quiquell|rover | thanks | 14:37 |
*** amansi26 has quit IRC | 14:39 | |
*** lpetrut has quit IRC | 14:40 | |
clarkb | infra-root https://review.opendev.org/#/c/655143/2 is the expected fix for system-config jobs which should allow us to merge the git stack size fix for openstack/openstack | 14:42 |
*** smarcet has quit IRC | 14:42 | |
*** nhicher has joined #openstack-infra | 14:42 | |
quiquell|rover | pabelanger, clarkb: do we store queued_time here https://graphite01.opendev.org/ I cannot find it | 14:44 |
quiquell|rover | enqueued_time I mean | 14:44 |
*** smarcet has joined #openstack-infra | 14:44 | |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Bump lru_cache size to 10 https://review.opendev.org/655173 | 14:44 |
*** udesale has quit IRC | 14:45 | |
*** udesale has joined #openstack-infra | 14:46 | |
*** ccamacho has quit IRC | 14:46 | |
*** smarcet has quit IRC | 14:47 | |
clarkb | quiquell|rover: stats.timers.zuul.tenant.zuul.pipeline.check.resident_time.count is an example of that data I think | 14:48 |
*** smarcet has joined #openstack-infra | 14:48 | |
quiquell|rover | clarkb: can we filter that per queue name ? | 14:49 |
quiquell|rover | clarkb: or queue name is not stored ? | 14:49 |
clarkb | quiquell|rover: 'check' is the pipeline name | 14:49 |
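As an aside, a hedged example of pulling that series straight from graphite's render API; the metric path is copied from above and the host is the one mentioned earlier in the channel, so adjust scheme/host as needed:

```bash
curl -s "https://graphite.opendev.org/render?target=stats.timers.zuul.tenant.zuul.pipeline.check.resident_time.count&from=-7days&format=json"
```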
*** ramishra has quit IRC | 14:49 | |
AJaeger | clarkb: the infra spec helper fix fails, see http://logs.openstack.org/43/655143/2/check/legacy-puppet-openstack-infra-spec-helper-unit-ubuntu-trusty/8f2dede/job-output.txt.gz#_2019-04-23_14_45_25_116490 - any ideas? | 14:50 |
clarkb | AJaeger: oh I think this was the thing that cmurphy and mordred were looking at. I think that repo may not be self testing at the moment | 14:51 |
clarkb | mordred: cmurphy ^ are you able to confirm that? we may have to force merge that change :/ | 14:51 |
*** amoralej has joined #openstack-infra | 14:52 | |
mordred | clarkb: there's definitely something bonged with those jobs that we should dig in to - I believe last time we just force-merged but I don't have specifics this instant | 14:54 |
quiquell|rover | clarkb: was looking for something like stats.timers.zuul.tenant.zuul.pipeline.periodic.queue.tripleo.resident_time.count | 14:54 |
openstackgerrit | Merged opendev/jeepyb master: Use opendev and https by default https://review.opendev.org/655145 | 14:54 |
clarkb | quiquell|rover: I don't think we aggregate by the pipeline queue | 14:55 |
quiquell|rover | weshay|rover: ^ | 14:55 |
clarkb | mordred: do you have an opinion on whether or not a force merge would be appropriate here? | 14:56 |
*** lpetrut has joined #openstack-infra | 14:56 | |
amoralej | is there any known issue with nodes running in rax-ord? | 14:56 |
mordred | clarkb: I don't - I want to dig in to the construction of that more and see if I can understand all the pieces better - but haven't had time - and now I need to jump on a call for a half hour | 14:57 |
clarkb | amoralej: mnaser mentioned some problems with job retries. but I don't think anyone has had a chance to debug yet | 14:57 |
amoralej | ack | 14:57 |
mordred | clarkb: I don't think force-merging is likely to break anything any _more_ though | 14:57 |
clarkb | mordred: ya I'm on a call myself | 14:57 |
amoralej | i see jobs failing and being retried | 14:57 |
clarkb | amoralej: ya that was what mnaser described. If a job fails in pre run stage it will be retried up to 3 times | 14:57 |
mnaser | http://zuul.openstack.org/builds?result=RETRY_LIMIT | 14:58 |
clarkb | I've got a meeting in just a minute but after can start helping to look at it | 14:58 |
mnaser | it looks pretty wide spread right now but yeah, leaving it for who can look at it.. | 14:58 |
cmurphy | clarkb: this seems like a different issue than what i was worried about | 14:58 |
clarkb | capturing the console log of a job that gets retried would be useful, if not already done | 14:58 |
amoralej | in my case, are failing not in pre, but in run playbook | 14:58 |
cmurphy | these unit tests are just looking for the spec helper in ../.. so it should still work | 14:59 |
openstackgerrit | Matt Riedemann proposed opendev/elastic-recheck master: Add query for VolumeAttachment lazy load bug 1826000 https://review.opendev.org/655177 | 14:59 |
openstack | bug 1826000 in Cinder "Intermittent 500 error when listing volumes with details and all_tenants=1 during tempest cleanup" [Undecided,Confirmed] https://launchpad.net/bugs/1826000 | 14:59 |
clarkb | cmurphy: hrm that change pins bundler to < 2.3.0 | 14:59 |
clarkb | cmurphy: so maybe there is another chicken and egg in the testing? | 14:59 |
cmurphy | clarkb: actually it doesn't in the unit tests http://logs.openstack.org/43/655143/2/check/legacy-puppet-openstack-infra-spec-helper-unit-ubuntu-trusty/8f2dede/job-output.txt.gz#_2019-04-23_14_45_21_610904 | 15:00 |
AJaeger | cmurphy: or is my change wrong? | 15:00 |
cmurphy | AJaeger: you need to edit run_unit_tests.sh too | 15:00 |
cmurphy | https://opendev.org/opendev/puppet-openstack_infra_spec_helper/src/branch/master/run_unit_tests.sh#L41 | 15:00 |
amoralej | clarkb, in some cases nothing at all https://imgur.com/a/kj6C7c5 | 15:01 |
openstackgerrit | Slawek Kaplonski proposed openstack/project-config master: Switch py35 periodic jobs to py36 in Neutron's dashboard https://review.opendev.org/655178 | 15:01 |
openstackgerrit | Nicolas Hicher proposed openstack/diskimage-builder master: openssh-server: enforce sshd config https://review.opendev.org/653890 | 15:02 |
AJaeger | cmurphy: can I just say "gem install bundler < 2.3.0" ? Or what's the syntax? | 15:02 |
cmurphy | AJaeger: i think with a -v | 15:03 |
cmurphy | or --version | 15:03 |
AJaeger | thanks | 15:04 |
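The syntax being pointed at, sketched with a placeholder bound; the exact constraint is whatever the review finally settles on:

```bash
# Either spelling pins the installed bundler version:
gem install bundler -v '< 2.0'
gem install bundler --version '< 2.0'
```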
amoralej | clarkb, another one http://paste.openstack.org/show/749648/ this seems failed with unreachable and then remote host identification has changed | 15:04 |
amoralej | node redeployed? | 15:05 |
clarkb | remote host id changing is often due to neutron reusing IPs | 15:05 |
clarkb | (yay dogfooding) | 15:05 |
fungi | okay, i think i'm caught up on scrollback in here, so hopefully after my conference call i can help fix some of the new broken | 15:05 |
openstackgerrit | Andreas Jaeger proposed opendev/puppet-openstack_infra_spec_helper master: Block hashdiff 0.3.9 and bundler 2.0.1 https://review.opendev.org/655143 | 15:06 |
AJaeger | cmurphy, clarkb, next try ^ | 15:06 |
amoralej | clarkb, i have some consoles where i just see regular messages and suddenly --- END OF STREAM --- | 15:07 |
amoralej | not sure if i'm losing messages if i'm not in the console window or something | 15:07 |
*** yamamoto has joined #openstack-infra | 15:07 | |
*** zul has joined #openstack-infra | 15:08 | |
clarkb | amoralej: if the networking is completely broken it won't be able to stream the data off the host anymore | 15:08 |
*** ykarel is now known as ykarel|away | 15:08 | |
*** gyee has joined #openstack-infra | 15:09 | |
*** yamamoto has quit IRC | 15:14 | |
*** ccamacho has joined #openstack-infra | 15:17 | |
clarkb | amoralej: looking at message:"REMOTE HOST IDENTIFICATION HAS CHANGED" AND filename:"job-output.txt" in logstash it seems that infra jobs are the biggest problem with that particular error and that is a centos7 specific issue | 15:17 |
clarkb | otherwise it affects multiple clouds and multiple images | 15:17 |
clarkb | the vast majority are on a single zuul executor but also affects multiple zuul executors | 15:18 |
clarkb | I wonder if it is the executors that are at least part of the problem | 15:18 |
clarkb | based on that data we are retrying properly and jobs are eventually rerunning and passing | 15:20 |
clarkb | (at least in some cases) | 15:20 |
fungi | "The fingerprint for the ED25519 key sent by the remote host is\nSHA256:..." | 15:20 |
fungi | umm | 15:20 |
clarkb | it also peaked a couple hours ago and seems to be tapering off now | 15:20 |
clarkb | fungi: its a known issue that neutron will reuse IPs in some clouds causing these failures then ARP fights happen | 15:21 |
amoralej | clarkb, yeah, jobs are running again, let's see how this run goes | 15:21 |
clarkb | fungi: we could potentially avoid some of that struggle if we were able to ipv6 more aggressively | 15:21 |
amoralej | so far, some jobs are passing or properly failing | 15:21 |
amoralej | with no infra issues | 15:21 |
clarkb | ya lets monitor it. The graph data implies it could've been a provider blip that has been corrected | 15:21 |
AJaeger | do we want to add ovh-bhs1 back again? I tried pinging amorin here but never got a reply... | 15:21 |
*** lpetrut has quit IRC | 15:22 | |
clarkb | AJaeger: I think we can try it and turn it off again if it is still sad | 15:22 |
fungi | the failures matching the query you provided span all rax regions as well as ovh-gra1 | 15:22 |
AJaeger | clarkb: change is https://review.opendev.org/#/c/653879/ - I'll +2 now | 15:23 |
clarkb | fungi: ya our ipv4 clouds :) | 15:23 |
fungi | though yeah the biggest volume in the past day was around 13:00 to 13:30 and almost exclusively in ovh-gra1 | 15:24 |
clarkb | fungi: and a single infra job | 15:24 |
fungi | so the rax hits may be rogue vms | 15:24 |
fungi | oh, yep, puppet-beaker-rspec-centos-7-infra for the big spike | 15:24 |
fungi | strange correlation | 15:24 |
fungi | also do we still need the puppet-beaker-rspec-centos-7-infra job? | 15:25 |
clarkb | no I thought I had removed it | 15:26 |
clarkb | (we should do further cleanup on those as necessary) | 15:26 |
fungi | so anyway, we assume the key mismatch errors are unrelated to the retry_limit results | 15:27 |
fungi | doesn't seem to be any overlap of significance | 15:27 |
clarkb | the key mismatches will cause retries | 15:27 |
clarkb | and if you get 3 in a row a retry_limit error | 15:27 |
clarkb | depends on whether or not the error happens in pre run | 15:27 |
fungi | this one reported roughly 50 minutes ago and seems to be a sudden disconnect: http://logs.openstack.org/78/655078/1/check/tacker-functional-devstack-multinode-python3/5ed80d7/job-output.txt.gz#_2019-04-23_14_39_43_446923 | 15:28 |
AJaeger | clarkb, cmurphy , https://review.opendev.org/#/c/655143/ did not work ;( | 15:29 |
fungi | Setting up conntrackd was the last thing it was doing... will look for a pattern | 15:29 |
clarkb | fungi: that is suspicious | 15:29 |
fungi | yeah | 15:29 |
clarkb | AJaeger: I think 2.0.0 also requires ruby >= 2.3.0 | 15:30 |
clarkb | AJaeger: see https://rubygems.org/gems/bundler/versions/2.0.0 | 15:30 |
AJaeger | I request 2.0.0 now | 15:30 |
clarkb | AJaeger: so we may want < 2.0.0 | 15:30 |
fungi | probably not conntrackd... this other one died around the same time in the same way but while installing different packages: http://logs.openstack.org/78/655078/1/check/tacker-functional-devstack-multinode/2900b2b/job-output.txt.gz#_2019-04-23_14_39_49_829832 | 15:30 |
clarkb | fungi: that could be a race in buggering | 15:30 |
clarkb | fungi: the manpages happen just before conntrack in your first log | 15:30 |
fungi | possible... | 15:31 |
clarkb | *buffering | 15:31 |
fungi | yeah, both apparently ;) | 15:31 |
openstackgerrit | Andreas Jaeger proposed opendev/puppet-openstack_infra_spec_helper master: Block hashdiff 0.3.9 and bundler 2.0.1 https://review.opendev.org/655143 | 15:31 |
clarkb | we only need trusty for a few more days hopefully :/ and then we can remove testing for it as well as centos7 | 15:32 |
fungi | this one died while configuring swap: http://logs.openstack.org/95/654995/1/check/designate-pdns4-postgres/21337d4/job-output.txt.gz#_2019-04-23_13_45_34_585017 | 15:32 |
clarkb | fungi: I wonder if those timestamps coincide with the other network issues | 15:32 |
clarkb | like maybe our executors had broken ipv4 networking during that time | 15:33 |
clarkb | (or something along those lines) | 15:33 |
fungi | i was starting to have a similar suspicion, maybe network issues in or near rax-dfw? | 15:33 |
openstackgerrit | Andreas Jaeger proposed opendev/puppet-openstack_infra_spec_helper master: Block hashdiff 0.3.9 and bundler 2.0 https://review.opendev.org/655143 | 15:33 |
clarkb | ya | 15:34 |
fungi | is https://rackspace.service-now.com/system_status/ blank for anyone else? | 15:35 |
clarkb | not for me | 15:35 |
clarkb | there is nothing listed there | 15:35 |
clarkb | I mean its not blank but no issues posted either | 15:36 |
clarkb | perhaps was upstream of them and they didn't even notice | 15:36 |
fungi | firefox just give me a blank page for it. oh well | 15:36 |
fungi | er, gives | 15:36 |
AJaeger | fungi: works on firefox for me - but takes a bit to load, seems to use some javascript... | 15:37 |
*** dustinc_away is now known as dustinc | 15:37 | |
*** helenafm has quit IRC | 15:37 | |
fungi | yeah, looks from the page source like it's opening a new window or something. hard to tell what exactly... end result is i get no content but also my privacy extensions aren't reporting blocking anything | 15:38 |
openstackgerrit | Merged opendev/elastic-recheck master: Add query for VolumeAttachment lazy load bug 1826000 https://review.opendev.org/655177 | 15:38 |
openstack | bug 1826000 in Cinder "Intermittent 500 error when listing volumes with details and all_tenants=1 during tempest cleanup" [Undecided,Confirmed] https://launchpad.net/bugs/1826000 | 15:38 |
*** jamesdenton has joined #openstack-infra | 15:38 | |
*** kjackal has quit IRC | 15:41 | |
*** ccamacho has quit IRC | 15:46 | |
*** ccamacho has joined #openstack-infra | 15:47 | |
smcginnis | So we've gotten a lot more failures with ensure-twine being run in a virtualenv. | 15:49 |
smcginnis | Still not sure where that is coming from. | 15:49 |
smcginnis | Would it make sense to add a check in https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-twine/tasks/main.yaml to look for {{ lookup('env', 'VIRTUAL_ENV') }} and drop the --user if set? | 15:50 |
*** e0ne has quit IRC | 15:50 | |
smcginnis | Probably also add a debug print of VIRTUAL_ENV so we can figure out where it's coming from too. | 15:50 |
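A sketch of the idea only, written as shell rather than the role's actual Ansible task: print VIRTUAL_ENV for debugging and drop --user when a virtualenv is active, since pip rejects --user inside one:

```bash
echo "VIRTUAL_ENV=${VIRTUAL_ENV:-<unset>}"   # debug: where is the venv coming from?
if [ -n "${VIRTUAL_ENV:-}" ]; then
    # --user is not valid inside a virtualenv, so omit it
    python3 -m pip install twine
else
    python3 -m pip install --user twine
fi
```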
szaher | sts | 15:51 |
szaher | sts | 15:52 |
amoralej | clarkb, i was tracking 6 different jobs, all failed, i've pasted info about nodes and error messages in http://paste.openstack.org/show/749650/ | 15:53 |
amoralej | there are different error messages | 15:53 |
amoralej | although it seems all may be network related | 15:54 |
*** sshnaidm is now known as sshnaidm|afk | 15:54 | |
clarkb | smcginnis: you might have to do a python script instead if it isn't part of the env but is instead coming from the executable path. And ya I think adding that debugging info would be great | 15:54 |
frickler | boden: fyi, codesearch should be back to work now | 15:54 |
clarkb | amoralej: were any of them retried? | 15:55 |
amoralej | retries are in queue | 15:55 |
*** ykarel|away has quit IRC | 15:55 | |
smcginnis | clarkb: Ah, I assumed the environment would have been activated since we are just calling "python3" and getting that error. | 15:55 |
amoralej | well at least one has reached retry_limit | 15:55 |
boden | frickler great thanks much! so is the long term plan to use the "explore" from opendev, or to keep the "hound" code search? | 15:55 |
amoralej | and all of them were retries of previous failed run | 15:55 |
clarkb | smcginnis: there are two major ways to venv. One is via env vars. The other is to run python out of a venv directly (maybe our path is being munged?) | 15:56 |
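A hedged check for the second case, a python3 that lives inside a venv even though VIRTUAL_ENV is unset or the PATH was munged:

```bash
command -v python3   # which interpreter is actually first on PATH?
# Inside a venv, sys.prefix differs from sys.base_prefix.
python3 -c 'import sys; print("in venv:", sys.prefix != getattr(sys, "base_prefix", sys.prefix))'
```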
mordred | boden: I think we need to discuss the long-term plan ... I think we'd like to be able to collapse things and not need to run both codesearch and gitea ... but the searching in "explore" has some deficiencies at the moment and I think we need to discuss what needs to be or can be done and what the plan will be | 15:57 |
weshay|rover | sorry to bug you guys, are the stats from https://review.opendev.org/#/c/616306/15/releasenotes/notes/resource-usage-stats-bfcd6765ef4a9c86.yaml public or something only avail to infra.. if so I was hoping to present the stats at the tripleo mtg at ptg | 15:57 |
mordred | boden: which is to say - I don't think there is yet a full plan - more like a latent desire | 15:57 |
* weshay|rover trying to see how tripleo performed in being a good upstream citizen in stein | 15:57 | |
*** dave-mccowan has joined #openstack-infra | 15:57 | |
clarkb | weshay|rover: they should be in the same graphite server | 15:58 |
mordred | weshay|rover: yes - those are in graphite | 15:58 |
clarkb | weshay|rover: but that change is not merged yet | 15:58 |
boden | mordred ack and thanks for the info and everyones help on this | 15:58 |
mordred | boden: sure thing! thanks for the patience, I'm glad we were able to get hound back up and running properly :) | 15:58 |
weshay|rover | k.. thanks | 15:59 |
clarkb | weshay|rover: I can do a log parsing run in a bit to give you numbers for the last 30 days | 15:59 |
weshay|rover | clarkb k.. I know ur busy, it's not critical.. but a nice to have :) | 15:59 |
clarkb | well I'm waiting for test results on AJaeger's puppet testing fix so I have a few minutes now :) | 16:00 |
clarkb | amoralej: my hunch is that we've got ongoing instability in ipv4 networking between our executors and the test clouds we talk to over ipv4 | 16:01 |
clarkb | fungi: ^ any ideas on testing that more directly | 16:01 |
clarkb | fungi: mtr between ze0* and ovh and rax-iad/ord? | 16:01 |
clarkb | weshay|rover: http://paste.openstack.org/show/749651/ | 16:02 |
clarkb | amoralej: limestone and vexxhost are talked to via ipv6. Inap is our other ipv4 cloud. If we can find evidence of trouble to vexxhost or limestone we may be able to rule out this theory | 16:03 |
amoralej | clarkb, that'd make sense, i'm trying to find some pattern in logstash | 16:04 |
*** quiquell|rover is now known as quiquell|off | 16:04 | |
weshay|rover | clarkb thanks.. comparing to http://paste.openstack.org/show/736797/ 42.6 -> 24.8 not bad :) | 16:05 |
corvus | clarkb: i started mtrs last week between ze01 and sjc1 v4/v6, rax-ord, rax-iad, rax-dfw, and google dns | 16:05 |
*** jpich has quit IRC | 16:05 | |
clarkb | weshay|rover: yup seems to have been steady progress since we started tracking it | 16:05 |
corvus | sjc1v6 had some noticeable packet loss, google dns had a very small amount, nothing on the others. | 16:06 |
clarkb | corvus: are you doing ipv6 or ipv4 to rax-* ? | 16:06 |
corvus | to clarify, those mtrs are still running | 16:06 |
*** dave-mccowan has quit IRC | 16:07 | |
*** Lucas_Gray has quit IRC | 16:07 | |
corvus | clarkb: v4 for some reason | 16:07 |
weshay|rover | clarkb thanks for the help! | 16:07 |
clarkb | corvus: I think that is what we want to know for this theory at least. Good to know there isn't any loss there | 16:08 |
*** pgaxatte has quit IRC | 16:09 | |
corvus | clarkb: AJaeger was saying we still have bhs1 disabled; mnaser disabled it because of network errors, but then i think we've started to suspect that might have been the same errors we're seeing everywhere? | 16:09 |
clarkb | corvus: ya and AJaeger has asked us to reenable bhs1 as a result | 16:10 |
clarkb | let me find the change | 16:10 |
mnaser | but I think at the time clarkb had a vm he was doing tests on | 16:10 |
mnaser | and it was losing packets or whatnot | 16:10 |
clarkb | corvus: https://review.opendev.org/#/c/653879/ | 16:10 |
clarkb | mnaser: yes but could have been related to general network sadness? it was a personal vm in ovh1-bhs1 that had trouble talking to cloudflare dns | 16:10 |
clarkb | mnaser: those failures could still have been related to the same thing that is making our other traffic unhappy is what I was trying to say | 16:11 |
corvus | i've added bhs1 and inap to my mtr screen on ze01 | 16:11 |
*** jbadiapa has quit IRC | 16:11 | |
mnaser | right, but at the time we found proof in unbound that some of these failed jobs failed to contact 1.1.1.1 too (but of course, that can be all gone now) | 16:11 |
clarkb | infra-root https://review.opendev.org/#/c/655143/ passes now and should fix system-config tests allowing us to fix git stack sizes | 16:12 |
mnaser | but anyways, yes, there seems to be something weird going on | 16:12 |
clarkb | if anyone can be second review on that real quick it would be much appreciated | 16:12 |
mnaser | also, if someone wants to run mtr from outside rax to sjc1v6 .. in case there's actual issues | 16:12 |
corvus | did i see a theory about executor localization? | 16:12 |
corvus | +3 | 16:12 |
clarkb | corvus: localization like i18n? or physical location of executors playing a part? we did wonder if perhaps the problem was more on the executor side which would explain widespread impact | 16:13 |
fungi | smcginnis: clarkb: i thought we were preinstalling twine on our executors for use in our release jobs. i wonder if the pip install error is due to us failing to actually preinstall it on some executors? or maybe failing to preinstall it for some interpreters? | 16:13 |
corvus | clarkb: the second thing | 16:13 |
clarkb | fungi: oh! the move to ansible venvs for zuul would explain that | 16:13 |
clarkb | fungi: we have a list of things to install into those venvs and I wouldn't be surprised if twine is not on that | 16:13 |
smcginnis | fungi: There is a check for `which twine` that fails. | 16:14 |
clarkb | and that may also explain the "it's a virtualenv, jim" problem | 16:14 |
corvus | clarkb: yeah, if it's widespread, it means one or more of the following: (a) the internet is broken (b) rax-dfw networking is broken (c) networking on one or more executors is broken | 16:14 |
*** ijw has joined #openstack-infra | 16:14 | |
corvus | i was wondering if someone has correlated failures to suggest (c) | 16:14 |
clarkb | no I don't think we have managed to get beyond "it is a theoretical possibility" | 16:15 |
smcginnis | clarkb, fungi: So should I worry about mucking with ensure-twine, update ensure-twine to not ever do --user, or wait for preinstalling to be fixed? | 16:15 |
mnaser | I guess we can check if the # of failures per executor is higher | 16:15 |
clarkb | smcginnis: we can probably wait for the preinstall to be fixed if that is an intended feature | 16:15 |
*** lucasagomes has quit IRC | 16:16 | |
amoralej | clarkb, looking for "RESULT_UNREACHABLE" message in the last 6hours it shows some high peaks | 16:16 |
*** ykarel|away has joined #openstack-infra | 16:16 | |
clarkb | amoralej: about 3 hours ago? | 16:16 |
smcginnis | clarkb: OK, that's probably easiest for me. Do you think we should actually drop the ensure-twine role completely if there is an expectation that it will always be preinstalled? | 16:17 |
amoralej | higher peak is at 15:45 - 15:50 | 16:17 |
amoralej | up to 124 in 5 minutes | 16:17 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Use user.html_url for github reporter messages https://review.opendev.org/655188 | 16:17 |
smcginnis | And this is blocking some releases, so second question would be who is taking that action and do we have an ETA? | 16:17 |
clarkb | amoralej: oh good to know | 16:17 |
amoralej | also at around 16:40 | 16:17 |
clarkb | smcginnis: would you like to take the action? | 16:17 |
fungi | smcginnis: the ensure-twine role, i think, is included in the job in the zuul-jobs standard library, for the benefit of folks who don't preinstall twine on executors | 16:17 |
amoralej | and 14:20 - 14:25 | 16:18 |
smcginnis | clarkb: Not sure if I can take that one. | 16:18 |
smcginnis | fungi: Sounds like it should probably be fixed then if there's a chance others may use this role in cases where it is not preinstalled. | 16:18 |
smcginnis | Do I understand right that with a zuul change this will always be run within a venv? | 16:18 |
smcginnis | In which case we just need to drop "--user" from the pip install. | 16:19 |
fungi | i believe that behavior will depend on whether the given zuul deployment is configured to manage its own ansible installs, though i could be wrong | 16:19 |
smcginnis | OK, so we probably do need to make that more robust to be able to handle both cases. | 16:20 |
clarkb | corvus: any idea where the list of things to install into the zuul ansible venvs is? | 16:20 |
fungi | in cases where ansible is not being run from a virtualenv, --user installs presumably work | 16:20 |
corvus | clarkb: on it | 16:20 |
clarkb | I'm not having good luck finding it but know that we had to add gear to it semi recently | 16:20 |
smcginnis | Odd that we had a mix of those working. Maybe due to it being preinstalled some places but not others? | 16:20 |
openstackgerrit | James E. Blair proposed opendev/puppet-zuul master: Install twine in executor Ansible environments https://review.opendev.org/655189 | 16:21 |
corvus | clarkb, fungi, smcginnis ^ | 16:21 |
clarkb | tyty | 16:21 |
smcginnis | Thanks corvus | 16:21 |
amoralej | clarkb, https://imgur.com/a/Xk7Qnk6 in case it helps | 16:21 |
corvus | docs here: https://zuul-ci.org/docs/zuul/admin/installation.html#ansible | 16:21 |
amoralej | there are failures in ovh and limestone-regionone too | 16:22 |
clarkb | ok so failures in limestone imply that this isn't ipv4 specific | 16:22 |
corvus | amoralej: can you correlate with zuul_executor and see if there's a pattern? | 16:22 |
smcginnis | corvus: Does that need to also include the others here: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-twine/tasks/main.yaml#L14 | 16:23 |
corvus | smcginnis: yep | 16:23 |
*** kopecmartin is now known as kopecmartin|off | 16:23 | |
*** mrhillsman is now known as openlab | 16:23 | |
*** openlab is now known as mrhillsman | 16:23 | |
corvus | smcginnis, clarkb, fungi: how did we used to have twine pre-installed? | 16:24 |
clarkb | corvus: it is/was in system-config/manifests/site.pp on the zuul scheduler | 16:24 |
amoralej | corvus, seems dispersed | 16:24 |
clarkb | er executor | 16:24 |
*** mrhillsman is now known as openlab | 16:24 | |
corvus | clarkb: 'git grep twine' in system config is nil | 16:24 |
clarkb | hrm no that is just gear ok I'm wrong | 16:24 |
corvus | amoralej: thanks | 16:24 |
*** openlab is now known as mrhillsman | 16:25 | |
fungi | ugh, firefox is back to showing a "corrupted content error" about network protocol violations when trying to browse opendev.org | 16:25 |
corvus | perhaps we did not pre-install? | 16:25 |
clarkb | corvus: ya that could be but python wasn't a venv and so --user worked | 16:25 |
corvus | fungi: i restarted the haproxy ~2hours ago | 16:25 |
fungi | ahh, maybe that's it | 16:25 |
clarkb | so maybe the proper fix here is to simply fix it in the job / role | 16:25 |
fungi | corvus: how could we have been using twine on the executors if it wasn't preinstalled? what options did we have for installing software on executors? is that allowed in bubblewrap? | 16:26 |
clarkb | fungi: it was running the pip install | 16:26 |
clarkb | (or could have run the pip install that it is running now) | 16:26 |
amoralej | corvus, executors go from ze01..ze13 ? | 16:27 |
amoralej | 12 i meant | 16:27 |
corvus | on ze01, python3 -c "import twine" -> ImportError: No module named 'twine' | 16:27 |
corvus | amoralej: yes | 16:27 |
fungi | so pip install --user was working under bwrap-managed homedirs i guess? | 16:27 |
clarkb | fungi: yes that is my hunch | 16:27 |
corvus | i WIP'd https://review.opendev.org/655189 | 16:28 |
fungi | i recall having discussions about needing tools we're going to run on executors preinstalled for security reasons, but i concur i don't see it in the puppet-zuul git history | 16:28 |
*** smarcet has quit IRC | 16:28 | |
*** dtantsur is now known as dtantsur|afk | 16:28 | |
clarkb | it is also possible that pip being mad about --user from in a venv is new | 16:29 |
corvus | we certainly need system packages pre-installed, but installing a python package from our mirror should be 99.99999% reliable | 16:29 |
corvus | and permitted by bwrap (at least when run in a trusted playbook) | 16:29 |
*** smarcet has joined #openstack-infra | 16:30 | |
fungi | which also brings me back to wondering why this was only failing sporadically on some builds and then worked when we reenqueued them, but has now started to fail consistently | 16:30 |
corvus | that's an improvement | 16:30 |
*** whoami-rajat has quit IRC | 16:30 | |
fungi | over the course of a week or so | 16:31 |
clarkb | fungi: image updates? | 16:31 |
clarkb | no we run on the executor | 16:31 |
amoralej | corvus, http://paste.openstack.org/show/749655/ note a single failed job can have more that one RESULT_UNREACHABLE message, i'm trying to clean it more | 16:32 |
fungi | so analyzing a recent release ensure-twine failure of this nature: http://logs.openstack.org/02/02dc0019af5f47d1850781b83e6041201054e1c5/release/release-openstack-python/9e49a6f/job-output.txt.gz#_2019-04-22_21_30_35_950934 | 16:34 |
clarkb | corvus: looking at ze01 there are not very many ssh connections (`sudo lsof -n -i TCP`) and I can't hit one of the hosts that is SYN_SENT and not ESTABLISHED from home | 16:34 |
clarkb | corvus: implying that the host is actually not reachable | 16:34 |
clarkb | I'm going to boot a testnode or three in rax iad | 16:35 |
clarkb | out of band of nodepool and see if they are reachable from the executors and home | 16:36 |
corvus | the jobs are a mix of devstack and non-devstack (eg, osa), right? | 16:36 |
corvus | (the unreachable node jobs) | 16:36 |
mnaser | yep ^ | 16:36 |
fungi | corvus: yes, and tox stuff too from what i saw | 16:36 |
*** rcernin has quit IRC | 16:36 | |
mnaser | and I have some OSA failures that just failed in a super random spot (so nothing having to do with network related operations) | 16:36 |
fungi | a lot of jobs just terminate partway through (at different points) and declare ssh unreachable and the console stream prematurely ending | 16:37 |
*** bobh has joined #openstack-infra | 16:37 | |
AJaeger | fungi: care to change your WIP to +2A on https://review.opendev.org/#/c/653018/ to give us imap back? See last comment there... | 16:37 |
fungi | done | 16:38 |
AJaeger | thanks | 16:38 |
clarkb | fungi: yup and looking at executor tcp connections there aren't a ton of ssh connections | 16:39 |
*** whoami-rajat has joined #openstack-infra | 16:39 | |
fungi | okay, so on the ensure-twine problem... i have to assume that the virtualenv it's talking about in the error is the one zuul is managing for ansible... if so, that's going to be outside bwrap and mapped in read-only so a regular pip install without --user won't work, right? | 16:40 |
clarkb | fungi: yes | 16:40 |
amoralej | clarkb, it seems there has been another peak right 10 minutes ago | 16:43 |
clarkb | amoralej: I wonder if that has to do with job runtimes and our use of ssh control persist | 16:43 |
*** ginopc has quit IRC | 16:44 | |
fungi | most recent pip release was over a month ago, most recent virtualenv release was nearly a month ago, so those don't really seem to line up with when we started seeing the ensure-twine failures | 16:45 |
clarkb | fungi: virtualenv updates its pip on install now. possible it is just pip and not virtualenv? | 16:45 |
*** mattw4 has joined #openstack-infra | 16:45 | |
*** rossella_s has quit IRC | 16:45 | |
fungi | well, neither released new versions around the time we started to see the problem, which was after the stein release | 16:46 |
*** tosky has quit IRC | 16:46 | |
*** udesale has quit IRC | 16:47 | |
clarkb | I have 3 hosts up in rax-iad now; all show zero packet loss to various executors. They all ended up in the same /24 though (so if it's a network range problem this may not expose it) | 16:48 |
*** eharney has quit IRC | 16:48 | |
amoralej | clarkb, for the reviews i've been closely monitoring, failing jobs are long running ones, but it's hard to say if it's related to long jobs or just that you're more likely to hit network issues in long jobs | 16:49 |
clarkb | amoralej: ya we use ssh control persistence to reduce the number of connections that have to made too. If there is network trouble it is possible that longer jobs are more likely to run into it given the mitigations we already have | 16:50 |
openstackgerrit | Merged openstack/project-config master: Revert "Temporarily disable inap-mtl01 for maintenance" https://review.opendev.org/653018 | 16:50 |
*** altlogbot_2 has quit IRC | 16:50 | |
mordred | clarkb, amoralej: I have anecdotally observed the same thing - the issue seems to be most seen while running a long-running single shell task | 16:50 |
mordred | but the plural of anecdote isn't data - so I don't know if that's a real thing or just what I happen to have observed | 16:51 |
amoralej | mordred, yes, that's also my case | 16:51 |
openstackgerrit | Jason Lee proposed opendev/storyboard master: WIP: Second correction to Loader in preparation for Writer Update https://review.opendev.org/654812 | 16:52 |
*** jpena is now known as jpena|off | 16:52 | |
fungi | well, also longer-running jobs are simply statistically more likely to fall victim to a random network problem | 16:52 |
mordred | fungi: yes, this is a very accurate statement | 16:52 |
clarkb | also nodepool checks ssh connectivity before giving the node to a job | 16:52 |
mordred | and within those jobs, long-running single tasks are statistically more likely to be the thing that hits it | 16:52 |
fungi | yep | 16:53 |
clarkb | so we know that networking works well enough for that to be successful before zuul gets the node. I am going to let my test nodes sit around for a bit as a result and see if they look worse in an hour | 16:53 |
clarkb | we may also want to sanity check nodepool isn't getting duplicate IPs | 16:53 |
clarkb | maybe we are our own noisy neighbor type situation | 16:53 |
*** altlogbot_3 has joined #openstack-infra | 16:55 | |
clarkb | as a spot check: we do recycle ip addrs, but not during overlapping time periods (if arp was not updating properly we might see this behavior) | 16:56 |
fungi | i guess our job logs don't actually say where the ansible they're running is installed on the executor? at least i can't seem to find that information. also the docs for zuul-manage-ansible don't say how it installs ansible... the versioned trees under /var/lib/zuul/ansible/ on our executors don't look like virtualenvs either | 16:56 |
*** derekh has quit IRC | 16:57 | |
clarkb | I need to step out for breakfast I'll be back in a bit to look into this networking stuff more | 16:58 |
fungi | digging into the AnsibleManager class definition now | 16:58 |
fungi | aha, https://zuul-ci.org/docs/zuul/admin/components.html#attr-executor.ansible_root | 17:00 |
corvus | fungi: the debug log says it's running /usr/lib/zuul/ansible/2.7/bin/ansible-playbook | 17:00 |
fungi | oh, i bet that bindir is somehow mapped into the bwrap context | 17:02 |
fungi | no, nevermind | 17:02 |
fungi | /usr/lib not /var/lib | 17:02 |
corvus | here's a full example command: http://paste.openstack.org/show/749657/ | 17:02 |
fungi | i guess <zuul_install_dir> is /usr in our case | 17:03 |
corvus | fungi: i think the ansible venv is in /usr/lib and the zuul modules (which are also versioned) are in /var/lib | 17:03 |
fungi | fhs tunnel vision, i don't typically expect anything besides the system package manager to add things in /usr, thanks! | 17:04 |
fungi | i kept looking at that and my brain was automatically substituting /var | 17:04 |
corvus | yeah, maybe we should change that | 17:05 |
fungi | not super critical, just me with distro-oriented blinders on | 17:05 |
*** jbadiapa has joined #openstack-infra | 17:05 | |
fungi | was trying to run this down from first principles and validate our assumptions about how/where it's trying to install twine | 17:06 |
fungi | so on ze01 the ansible venvs are all using python 3.5.2 and pip 19.0.3 | 17:08 |
fungi | latest version from february 20 | 17:09 |
fungi | ahh, right, i can't calendar. the latest pip and virtualenv versions are from two months ago, not a month ago, so even less correlated to the start of these failures | 17:09 |
fungi | we're now in april | 17:10 |
*** nicolasbock has quit IRC | 17:10 | |
fungi | so looks like the ansible venvs on ze01 were created on march 18, still much longer ago than ensure-twine started popping this error | 17:10 |
fungi | same creation timestamp on all 12 executors | 17:12 |
fungi | and definitely no twine installed in any of them right now | 17:14 |
*** gagehugo has joined #openstack-infra | 17:15 | |
fungi | no twine executable in the default system path for any of the executors either | 17:15 |
fungi | also as previously established, the last change in git for the ensure-twine role merged january 29 | 17:17 |
*** ijw has quit IRC | 17:18 | |
*** ijw has joined #openstack-infra | 17:19 | |
*** ijw has quit IRC | 17:20 | |
*** e0ne has joined #openstack-infra | 17:20 | |
*** ijw has joined #openstack-infra | 17:20 | |
fungi | we're calling ensure-twine from the release-python playbook in opendev/base-jobs which was fixed to add that role on april 3 | 17:20 |
*** rpittau is now known as rpittau|afk | 17:20 | |
*** Weifan has joined #openstack-infra | 17:21 | |
fungi | my notes say the first recorded case of this particular failure was 2019-04-17 in http://logs.openstack.org/19/19a7574237f44807b16c37e0983223ff57340ba3/release/release-openstack-python/769f856/ | 17:22 |
*** Weifan has quit IRC | 17:22 | |
fungi | so roughly 6 days ago | 17:23 |
*** Weifan has joined #openstack-infra | 17:23 | |
*** ijw has quit IRC | 17:23 | |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Add retries to getPullReviews() with github https://review.opendev.org/655204 | 17:23 |
clarkb | I can still ssh into my three test nodes in iad after leaving them be for a while | 17:23 |
*** ijw has joined #openstack-infra | 17:23 | |
*** e0ne has quit IRC | 17:24 | |
fungi | clarkb: i wonder, can you start up a nc on both ends and connect them to each other with no traffic for a while, then see if they get disconnected (or simply stop passing traffic)? | 17:24 |
clarkb | could be worth testing. I'm rotating out one of the three to see if a new one immediately after a delete has any interesting behavior (since that is what nodepool does) | 17:25 |
fungi | yeah, wondering if the failures we see couldn't be some stateful network device losing its sh^Htates | 17:26 |
fungi | or aggressively dropping inactive ones | 17:27 |
clarkb | ya | 17:27 |
clarkb | (at some point I really need to start putting together a project update and figuring out a summit schedule) | 17:27 |
clarkb | (so please don't assume I should be the only one debugging this stuff :) ) | 17:27 |
fungi | also, one thing which can cause this... packet shapers. i'm going to look and see if there is rate limiting evidence in the cacti graphs for our executors | 17:28 |
clarkb | ++ thanks | 17:28 |
*** psachin has quit IRC | 17:28 | |
clarkb | oh and we have a meeting in an hour and a half and the ptg to plan for | 17:28 |
fungi | meh, "priorities" ;) | 17:28 |
*** diablo_rojo has joined #openstack-infra | 17:29 | |
clarkb | fungi: this is where you say "board meetings are for writing project updates" ? | 17:29 |
clarkb | btw I also checked dmesg on ze01 for any evidence of say OOMKiller and found nothing | 17:30 |
clarkb | and syslog lacks complaints from ssh | 17:30 |
*** jamesmcarthur has quit IRC | 17:30 | |
corvus | i'm working on fixing our local patch to gitea | 17:30 |
fungi | cacti says ze01 is running pretty tight on available memory, but i suppose that's our ram governor at work. the others are almost certainly similar | 17:30 |
*** jamesmcarthur has joined #openstack-infra | 17:31 | |
corvus | it's not easy because every time i try to run the unit tests, my printer starts spewing garbage | 17:31 |
fungi | hah | 17:31 |
clarkb | corvus: at least it isn't on fire? | 17:31 |
fungi | i hope you don't run out of greenbar | 17:31 |
corvus | yeah.. maybe don't print the randomly generated binary test data to stdout? | 17:32 |
fungi | yeesh | 17:32 |
clarkb | what are the chances this is an ssh/ansible issue? | 17:32 |
clarkb | (just wondering if we need to explore that too) | 17:33 |
fungi | cacti seems to only occasionally be able to reach ze02, but this doesn't look like new behavior: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64158&rra_id=all | 17:33 |
corvus | there have been ansible releases and we should be auto-upgrading them | 17:33 |
clarkb | 2.7.latest is what we should be using right? | 17:33 |
clarkb | by default at least | 17:33 |
clarkb | 2.7 updated on april 20 according to timestamps on files | 17:34 |
clarkb | but last 2.7 release was april 4 | 17:35 |
*** jamesmcarthur has quit IRC | 17:35 | |
clarkb | we did manage to merge an openstack manuals change so maybe not all is lost :) | 17:36 |
fungi | oh, that was just the documentation update which says "all is lost" | 17:37 |
fungi | i'm up to ze06 so far... no obvious signs of network rate-limiting or anything else especially anomalous on the graphs which might coincide with these ssh problems | 17:38 |
fungi | cacti seems to have collected no data whatsoever on ze10 | 17:41 |
clarkb | at this point I've sent several thousand mtr tracer pings and not a single one was lost between iad and 3 executors over ipv4 | 17:42 |
fungi | so aside from the fact that cacti can't reliably reach ze02 and ze10 over ipv6 (other servers where we saw this, i think deleting and recreating the network port got it working?) i see nothing on the cacti graphs for our executors which would explain the ssh connection issues | 17:44 |
*** ccamacho has quit IRC | 17:54 | |
clarkb | I've just ssh'd into every ipv4 host connected to from ze01 | 17:54 |
clarkb | and they all worked | 17:54 |
*** bobh has quit IRC | 17:55 | |
openstackgerrit | Merged opendev/puppet-openstack_infra_spec_helper master: Block hashdiff 0.3.9 and bundler 2.0 https://review.opendev.org/655143 | 17:55 |
jrosser | fwiw the log I grabbed here was an ipv6 fail http://paste.openstack.org/show/749635/ | 17:55 |
clarkb | woo I can recheck the gitea git stack fix now | 17:55 |
fungi | i wonder, should we set some autoholds and then when one catches a node which fell victim to this behavior check connectivity and/or nova console? | 17:55 |
clarkb | jrosser: thanks. In this case I didn't check ipv6 because no ipv6 at home | 17:56 |
*** amoralej is now known as amoralej|off | 17:56 | |
clarkb | but ya I think amoralej dug up some limestone failures that were similar and also ipv6 | 17:56 |
clarkb | fungi: ya that might be more efficient than me trying to manually boot one that fails | 17:56 |
fungi | i wonder how broad of an autohold i can add | 17:56 |
clarkb | I'm going to delete my three test instances in iad now since they haven't shown me anything useful | 17:58 |
clarkb | corvus: the docker registry backed by swift change merged yesterday iirc | 18:00 |
clarkb | corvus: is that something we should follow up and check on? | 18:00 |
corvus | clarkb: i suspect we will need to restart it to make it go into effect, and it would be good to do that in conjunction with watching some jobs | 18:02 |
mordred | corvus, clarkb: I can take that - y'all seem to be having fun diagnosing network issues :) | 18:05 |
corvus | mordred: well, i'm mostly working on the gitea change | 18:05 |
corvus | but yes, also "fun" | 18:05 |
mordred | corvus, clarkb: I can take that - y'all seem to be having fun diagnosing network issues and that gitea change :) | 18:05 |
clarkb | mordred: thanks | 18:05 |
clarkb | I need to switch gears into prepping for the meeting in just a moment so not sure how much network debugging I'll be doing for a bit | 18:06 |
mnaser | maybe a little silly but perhaps someone shooting off an email to rax about if there is any network changes might be productive | 18:06 |
mnaser | maybe there's some firewall or network appliance that was recently setup which affects our type of workloads | 18:06 |
clarkb | cloudnull: ^ you about? | 18:07 |
mnaser | just an extra useful datapoint | 18:07 |
corvus | i *finally* have gotten the gitea sqlite integration tests (the ones that are failing on my pr) to pass and run on master. it turns out the procedure for running them is not the one in the docs, or in integrations/README.md, but rather, is only documented in the drone.yml config file. | 18:11 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Blackhole spam for airship-discuss-owner address https://review.opendev.org/655227 | 18:12 |
clarkb | http://paste.openstack.org/show/749660/ is a specific example I dug up ansible logs for | 18:16 |
clarkb | any idea if the complaint about the inventory not being in the desired format is potentially related? | 18:17 |
clarkb | like maybe we cannot connect because we broke the inventory somehow? | 18:17 |
*** ricolin has quit IRC | 18:17 | |
clarkb | that was an ipv6 host in sjc1 too fwiw | 18:19 |
openstackgerrit | Monty Taylor proposed opendev/base-jobs master: Update opendev intermediate registry secret https://review.opendev.org/655228 | 18:19 |
clarkb | I think the next steps are fungi's hold idea and filing a ticket/sending email to rax. I've got to pop out and do some stuff before the meeting but I guess we'll pick back up there | 18:20 |
mordred | corvus: ^^ we need to do that per-tenant, right? | 18:20 |
fungi | yeah, i'm just to the point of fiddling with autohold now | 18:20 |
mordred | corvus: or just the once in opendev/base-jobs is fine | 18:20 |
fungi | i don't suppose there are any particular projects/jobs/changes which will be better choices for an autohold than others | 18:21 |
clarkb | fungi: jobs that take longer to run | 18:21 |
mordred | clarkb: https://review.opendev.org/655228 is needed for the registry stuff - gotta rekey on the client side too | 18:21 |
clarkb | maybe tempest-full, a tripleo job, and an OSA job? | 18:21 |
clarkb | mordred: k | 18:21 |
fungi | i suppose i can push up some trivial dnm changes and set autoholds for some long-running jobs which will run against them | 18:22 |
clarkb | also we may not hold on network failures? | 18:22 |
clarkb | the example above that I pasted is being rerun aiui | 18:22 |
clarkb | because ansible reported it as a network failure to zuul | 16:23 |
fungi | oh, yeah at best it'll hold the last build which ends in retry_limit i guess? | 18:23 |
clarkb | ya | 18:24 |
clarkb | assuming it actually gets there and doesn't magically work on the third attempt | 18:24 |
openstackgerrit | Nicolas Hicher proposed openstack/diskimage-builder master: openssh-server: enforce sshd config https://review.opendev.org/653890 | 18:24 |
clarkb | ok really need to pop out for a bit now. Back soon | 18:24 |
mordred | corvus, fungi: if you get a sec, https://review.opendev.org/#/c/655228/ | 18:26 |
*** e0ne has joined #openstack-infra | 18:29 | |
*** nicolasbock has joined #openstack-infra | 18:31 | |
*** eharney has joined #openstack-infra | 18:39 | |
clarkb | mnaser: bce3129d-8458-4947-b567-2c41311aab6a is the nova uuid of the node above that failed in sjc1. Might be worth sanity checking it to make sure that it didn't crash qemu/kvm (perhaps related to image updates or something) | 18:42 |
*** jamesmcarthur has joined #openstack-infra | 18:44 | |
*** jamesmcarthur_ has joined #openstack-infra | 18:45 | |
*** happyhemant has quit IRC | 18:48 | |
*** jamesmcarthur has quit IRC | 18:49 | |
*** smarcet has quit IRC | 18:53 | |
corvus | fungi, clarkb: yeah, you can autohold all failures if you want. retry_limit will trigger autohold, but only on the last. | 18:56 |
corvus | that doesn't seem to be a problem though. | 18:56 |
corvus | there's a change in review to allow you to specify the result states for an autohold. :/ | 18:56 |
clarkb | corvus: I think the issue is we'll only be able to hold it if it gets to retry_limit | 18:56 |
clarkb | which is maybe good enough | 18:56 |
corvus | clarkb: right. but plenty of jobs are doing that :) | 18:56 |
*** ykarel|away has quit IRC | 18:57 | |
fungi | cool, these are the autoholds i added: http://paste.openstack.org/show/749665/ | 18:57 |
corvus | we will also get the network failures that happen in run and post-run playbooks, though those will be harder to triage out from regular failures | 18:57 |
fungi | will check periodically to see what we catch in the trap and throw back any which aren't keepers | 18:57 |
fungi | it's eerily like crabbing | 18:58 |
clarkb | especially when you throw back 99% of what you get | 18:58 |
*** Weifan has quit IRC | 18:58 | |
fungi | yeah | 18:58 |
clarkb | infra meeting in a few minutes | 18:58 |
mordred | fungi: i don't think the keepers in this case will be very tasty | 18:58 |
clarkb | join us in #openstack-meeting | 18:58 |
corvus | you only want jobs larger than a certain size | 18:59 |
fungi | though it looks like my third zuul autohold command isn't returning control to the terminal | 18:59 |
fungi | and it's been a few minutes | 18:59 |
fungi | maybe reconfigure in progress | 18:59 |
corvus | fungi: yeah, be patient | 18:59 |
corvus | well.. hcm | 19:00 |
fungi | "crabbing suspended: ocean reloading, please wait" | 19:00 |
corvus | it... could be that the scheduler is out of ram | 19:00 |
fungi | oof | 19:00 |
corvus | apparently there is a memory leak as of our april 16 restart | 19:01 |
*** e0ne has quit IRC | 19:01 | |
mwhahaha | any particular reason why centos7 jobs are RETRY_LIMITing? | 19:02 |
mwhahaha | see https://review.opendev.org/#/c/654648/ (only centos7 jobs did) | 19:02 |
corvus | i will restart scheduler now | 19:02 |
clarkb | mwhahaha: it is affecting all jobs | 19:02 |
clarkb | mwhahaha: we've been trying to sort it out for much of the morning | 19:02 |
mwhahaha | k | 19:02 |
corvus | help appreciated | 19:02 |
mwhahaha | it didn't seem to affect the bionic jobs on that one | 19:03 |
mwhahaha | weird | 19:03 |
corvus | clarkb, fungi: what do you think about taking the opportunity to restart all executors? | 19:03 |
fungi | wow, yeah we've been swapping on the scheduler since ~12:30z today | 19:03 |
fungi | corvus: seems like a good idea | 19:03 |
clarkb | corvus: maybe even reboot them if it is possible ssh issues are related to the system | 19:03 |
fungi | yes, exactly my thoughs | 19:03 |
corvus | they have been running for a long time -- just in case some cruft has accumulated | 19:03 |
fungi | thoughts | 19:04 |
fungi | should we use the full restart playbook for this? | 19:04 |
corvus | fungi: almost -- if we want to reboot that's an extra step | 19:05 |
corvus | so i'll just do it manually | 19:05 |
fungi | ahh, yeah i missed clarkb's use of the word "reboot" | 19:05 |
fungi | status notice the zuul scheduler is being restarted now in order to address a memory utilization problem; changes under test will be reenqueued automatically | 19:07 |
fungi | that look sufficient? | 19:07 |
corvus | fungi: ++ | 19:07 |
clarkb | fungi: ++ | 19:07 |
fungi | #status notice the zuul scheduler is being restarted now in order to address a memory utilization problem; changes under test will be reenqueued automatically | 19:07 |
openstackstatus | fungi: sending notice | 19:07 |
openstackgerrit | Nate Johnston proposed openstack/project-config master: Track neutron uwsgi jobs move to check queue https://review.opendev.org/655234 | 19:08 |
-openstackstatus- NOTICE: the zuul scheduler is being restarted now in order to address a memory utilization problem; changes under test will be reenqueued automatically | 19:08 | |
openstackstatus | fungi: finished sending notice | 19:10 |
corvus | still waiting on execs to stop | 19:12 |
corvus | all stopped | 19:14 |
corvus | i will reboot all mergers and executors | 19:14 |
fungi | thanks | 19:15 |
*** kjackal has joined #openstack-infra | 19:16 | |
smcginnis | clarkb, fungi, corvus: Sorry, I had to step out for awhile. Do we still need an update to ensure-twine to check whether to install with --user or not? I saw a comment that I think was saying it might not help, but I wasn't really sure. | 19:18 |
corvus | executors and mergers are up and running | 19:19 |
corvus | restarting sched now | 19:19 |
fungi | smcginnis: indeterminate. i went back to the drawing board trying to confirm how things could have been working previously and working up a timeline of what we know changed when | 19:21 |
fungi | because it's still baffling | 19:21 |
smcginnis | Yeah, very baffling. | 19:22 |
smcginnis | I would feel much better if we understood what changed that caused this. That first one that worked after a reenqueue was odd. | 19:22 |
corvus | er, neat. something killed the scheduler | 19:24 |
mordred | corvus: "awesome" | 19:24 |
Shrews | wow | 19:26 |
corvus | we've seen this once before, also never found out what it was | 19:26 |
corvus | trying again | 19:26 |
*** nicolasbock has quit IRC | 19:27 | |
*** nicolasbock has joined #openstack-infra | 19:27 | |
*** wehde has joined #openstack-infra | 19:29 | |
wehde | Can anyone help me figure out a neutron issue? | 19:29 |
*** igordc has joined #openstack-infra | 19:29 | |
*** jamesmcarthur_ has quit IRC | 19:32 | |
corvus | loaded | 19:32 |
corvus | re-enqueueing | 19:32 |
corvus | #status log restarted all of Zuul at commit 6afa22c9949bbe769de8e54fd27bc0aad14298bc due to memory leak | 19:32 |
openstackstatus | corvus: finished logging | 19:32 |
*** Weifan has joined #openstack-infra | 19:34 | |
*** smarcet has joined #openstack-infra | 19:37 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Use internal gitweb instead of gitea for now https://review.opendev.org/655238 | 19:37 |
*** Weifan has quit IRC | 19:39 | |
paladox | Ah you use logging too :) (though your bot appears to have "status" too). | 19:41 |
*** kjackal has quit IRC | 19:42 | |
fungi | and we have it send notices to numerous irc channels, and in extreme cases also update channel topics about ongoing situations | 19:42 |
fungi | and all the entries get recorded at https://wiki.openstack.org/wiki/Infrastructure_Status (for the moment anyway) | 19:42 |
paladox | That's nice! (that would be useful) | 19:43 |
paladox | ours gets logged to multiple places | 19:43 |
fungi | i think ours also tries to tweet things, but i'm not sure where since i'm not really into social media | 19:43 |
paladox | heh (i know ours does :)) | 19:43 |
fungi | there was talk of an rss/atom feed as well | 19:43 |
paladox | someone logged spam. | 19:43 |
mordred | fungi, paladox: https://twitter.com/openstackinfra | 19:44 |
fungi | ahh, there it is | 19:45 |
paladox | The bot can log here https://wikitech.wikimedia.org/wiki/Nova_Resource:<project>/SAL (if it's a WMCS project) otherwise things get logged here https://wikitech.wikimedia.org/wiki/Server_Admin_Log | 19:45 |
paladox | ah | 19:45 |
fungi | corvus: clarkb: should i readd my earlier autoholds, or do we want to just watch the system for a bit first and see if the problem resurfaces? | 19:47 |
mordred | smarcet: o hai. I told summit.openstack.org to sync with my google calendar, but I don't have any summit sessions on my calendar. you have all the magical fixing powers right? | 19:47 |
fungi | worst bug report evar | 19:48 |
corvus | fungi: good q, and i'm too hungry to come up with an answer | 19:48 |
clarkb | if we can remove autoholds after the fact without them triggering I say add them | 19:49 |
clarkb | otherwise maybe watch and see | 19:49 |
fungi | yeah, about to find food as soon as the infra meeting is over | 19:49 |
fungi | clarkb: oh, the scheduler restart took care of removing the autoholds for us, which is why i asked ;) | 19:49 |
clarkb | ah | 19:49 |
paladox | fungi this is ours https://twitter.com/wikimediatech | 19:50 |
openstackgerrit | Sean McGinnis proposed zuul/zuul-jobs master: ensure-twine: Don't install --user if running in venv https://review.opendev.org/655241 | 19:56 |
smcginnis | fungi, clarkb, corvus: Newbie yet, so would appreciate feedback on that approach. ^ | 19:56 |
clarkb | corvus: want to direct enqueue 654634? | 19:57 |
openstackgerrit | Sean McGinnis proposed zuul/zuul-jobs master: ensure-twine: Don't install --user if running in venv https://review.opendev.org/655241 | 19:57 |
* clarkb smells the curry that was made for lunch and wanders downstairs | 19:58 | |
fungi | i have shrimp risotto to get to | 19:58 |
fungi | smcginnis: i *suspect* the problem for us is going to be that the virtualenv from which ansible is run is read-only for the jobs, so they're not going to be able to pip install anything into it | 19:59 |
mordred | smarcet: nevermind. I don't know how to calendar apparently | 20:00 |
smarcet | mordred: actually there are 2 ways of doing it | 20:01 |
corvus | fungi, smcginnis: that's an avenue to explore, however, i'm not sure the ansible virtualenv will be "activated" for anything other than the ansible process... | 20:01 |
smarcet | mordred: allow oauth2 permission to your calendar using the synching button and choose google as provider | 20:02 |
fungi | smcginnis: probably we either need to have ansible invoke pip install --user under the system python (not exactly sure what the complexities of that are) or have it create a local virtualenv ni the workspace and pip install twine into that | 20:02 |
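A rough sketch of fungi's second option (install twine into a throwaway venv in the job's workspace rather than touching the read-only ansible venv); the twine_venv_path variable is hypothetical and would need to point somewhere writable inside the build's work dir:

```yaml
# Hypothetical workspace-venv variant: Ansible's pip module can create the
# venv itself and install into it, sidestepping both --user and the
# read-only ansible virtualenv.
- name: Install twine into a throwaway venv in the workspace
  pip:
    name:
      - "twine!=1.12.0"
      - "readme_renderer[md]!=23.0"
      - "requests-toolbelt!=0.9.0"
    virtualenv: "{{ twine_venv_path }}"
    virtualenv_command: "{{ twine_python | default('python3') }} -m venv"
```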
smcginnis | corvus: I believe that will still pick up if we are running via a virtualenv python, even if the whole environment is not activated. | 20:02 |
smarcet | mordred: or you could use the brand new option “ | 20:03 |
smarcet | GET SHAREABLE CALENDAR LINK | 20:03 |
smarcet | ” | 20:03 |
corvus | smcginnis: ansible is being run in a virtualenv -- what ansible then runs is an open question | 20:03 |
smarcet | mordred: from page https://www.openstack.org/summit/denver-2019/summit-schedule | 20:03 |
smcginnis | Based on the failure, it would appear what gets run at least picks up that python executable. | 20:03 |
corvus | smcginnis: do you have a link that shows that? | 20:03 |
smcginnis | fungi: If that's true, we should drop the ensure-twine role completely as it likely will never work right. | 20:03 |
corvus | all i've seen is the opaque error about user in a virtualenv | 20:04 |
smcginnis | corvus: That was the conjecture^whypothesis earlier as to why it is failing the pip install. It's not running in a virtualenv itself. | 20:04 |
corvus | right, i'm saying i have doubts about that hypothesis and we should attempt to prove or disprove it rather than assume it is correct | 20:05 |
diablo_rojo | clarkb, sorry I got distracted during the meeting, nothing new with storyboard-- still have a lot of patches to review. Planning a huge story triage/overhaul at the PTG Thursday morning. That's about it. | 20:05 |
fungi | agreed, it's also possible pip is confused and thinks it's being run from a virtualenv when it isn't | 20:05 |
pabelanger | corvus: smcginnis: I believe, if ansible is using localhost, it will look to be inside a virtualenv for playbook tasks; however, if it uses ssh via localhost, it won't. | 20:06 |
smcginnis | All I know is, releases are completely blocked until this issue is resolved. | 20:06 |
corvus | anyway, i think the next step is for someone to write a job which exercises this stuff and gets some debug output | 20:07 |
mordred | ++ | 20:07 |
*** Weifan has joined #openstack-infra | 20:07 | |
*** jamesmcarthur has joined #openstack-infra | 20:08 | |
*** igordc has quit IRC | 20:10 | |
fungi | it's entirely possible, since "twine_python" is a variable we're passing to the role, that its value started being set to 'python3' recently and that's what started triggering this behavior? looking at the json, it's running this command: `python3 -m pip install twine!=1.12.0 readme_renderer[md]!=23.0 requests-toolbelt!=0.9.0 --user` | 20:11 |
fungi | the default declared for twine_python in the role is just "python" not "python3" | 20:12 |
*** Weifan has quit IRC | 20:12 | |
*** jamesmcarthur has quit IRC | 20:12 | |
fungi | the release-python playbook in opendev/base-jobs doesn't set it | 20:13 |
fungi | nor does the opendev-release-python job in the same repo | 20:14 |
*** Weifan has joined #openstack-infra | 20:14 | |
*** igordc has joined #openstack-infra | 20:14 | |
*** igordc has quit IRC | 20:15 | |
*** Lucas_Gray has joined #openstack-infra | 20:16 | |
fungi | aha, release-openstack-python as defined in openstack/project-config sets it according to http://zuul.opendev.org/t/openstack/job/release-openstack-python | 20:16 |
fungi | https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L112-L129 | 20:18 |
fungi | its path to the ensure-twine role, for the record, is via https://opendev.org/openstack/project-config/src/branch/master/playbooks/publish/pypi.yaml | 20:19 |
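For context, the relevant bit of the job definition fungi is pointing at boils down to a single job variable; roughly the following shape, with the real definition living at the project-config link above:

```yaml
# Rough shape only -- see the linked openstack/project-config zuul.d/jobs.yaml
# for the actual release-openstack-python definition.
- job:
    name: release-openstack-python
    vars:
      twine_python: python3
```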
clarkb | fwiw the last ansible exitcode 4 was at 19:08UTC on ze01 | 20:22 |
fungi | git blame suggests the twine_python variable was set in that job as of november when https://review.opendev.org/616676 merged | 20:22 |
fungi | so that's not what has caused this | 20:23 |
smcginnis | Since that first solum patch failed, but then worked after a reenqueue, is there something that changed in the nodes that could explain why one (presumably newer) node would fail, but another would work as it has been until now? And as more nodes were updated the failure became more prevalent? | 20:24 |
fungi | there are no nodes in this case, the ensure-twine role is running on the executor ("localhost" in the inventory) | 20:26 |
smcginnis | Executors updated? | 20:26 |
fungi | so been trying to figure out what could have changed on the executors on or around the 17th | 20:26 |
*** pcaruana has quit IRC | 20:32 | |
openstackgerrit | Merged opendev/system-config master: Double stack size on gitea https://review.opendev.org/654634 | 20:33 |
clarkb | corvus: ^ finally | 20:34 |
*** kgiusti has left #openstack-infra | 20:34 | |
clarkb | I think we are about half an hour from that applying | 20:34 |
*** jamesmcarthur has joined #openstack-infra | 20:39 | |
*** jamesmcarthur has quit IRC | 20:46 | |
mordred | woot | 20:50 |
mordred | clarkb, fungi: https://review.opendev.org/#/c/655238/ | 20:55 |
clarkb | hrm seems like we may still have some ssh failures just not as many of them? | 20:55 |
clarkb | mordred: left a comment | 20:56 |
clarkb | ze01 has three occurences of ansible exit code 4 in the last few minutes | 20:57 |
*** jamesmcarthur has joined #openstack-infra | 20:57 | |
*** Goneri has quit IRC | 20:59 | |
clarkb | fungi: ^ if you haven't set up the autohold yet you may want to | 20:59 |
corvus | gitea has an "archived" setting for repos | 20:59 |
*** andreykurilin has joined #openstack-infra | 21:00 | |
diablo_rojo | clarkb, fungi Monday is marketplace mixer, Tuesday is Trillio Community Party, Thursday is game night, Friday is PTG happy hour. So Monday after the mixer would work for the Lowry | 21:00 |
clarkb | diablo_rojo: ya that is what I'm thinking and the current weather forecast should be reasonable for that | 21:01 |
*** jamesmcarthur has quit IRC | 21:01 | |
diablo_rojo | I'd be game for that. | 21:01 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Use internal gitweb instead of gitea for now https://review.opendev.org/655238 | 21:03 |
mordred | clarkb: ^^ does that look better? | 21:03 |
mordred | corvus: neat. that seems like a thing we should make use of when appropriate | 21:03 |
fungi | clarkb: i've readded the previous autoholds | 21:03 |
mordred | clarkb: do you know of any planned GoT viewing parties Sunday evening? | 21:04 |
clarkb | mordred: it does: <% if scope.lookupvar("gerrit::web_repo_url") -%> | 21:04 |
clarkb | mordred: which may still fire on the '' | 21:04 |
mordred | clarkb: what's the right way to set that so that it doesn't? false? | 21:04 |
clarkb | mordred: yes I think that is the right way to manipulate the ruby | 21:05 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Use internal gitweb instead of gitea for now https://review.opendev.org/655238 | 21:05 |
openstackgerrit | Merged opendev/storyboard-webclient master: Show tags with stories in project view. https://review.opendev.org/642230 | 21:05 |
*** jamesmcarthur has joined #openstack-infra | 21:06 | |
mordred | clarkb: also - I verified that puppet-gerrit installs the gitweb package if gitweb is true | 21:07 |
clarkb | gitea just updated I think | 21:08 |
clarkb | openstack/openstack works now | 21:08 |
fungi | victory! | 21:08 |
*** boden has quit IRC | 21:12 | |
mordred | clarkb: I'm not 100% prepared to agree with you | 21:13 |
mordred | oh wait - there it is | 21:13 |
clarkb | it isn't quick. I think adding tags every so many commits would help mitigate that | 21:16 |
clarkb | since that name-rev lookup is going back hundreds of thousands of commits | 21:16 |
clarkb | and then doing it for each commit that a file is most recent on | 21:16 |
*** igordc has joined #openstack-infra | 21:19 | |
*** Lucas_Gray has quit IRC | 21:20 | |
*** rh-jelabarre has quit IRC | 21:21 | |
clarkb | mordred: for this networking stuff. One thing I notice is that it seems to happen a lot after what looks like a timeout | 21:22 |
clarkb | which would lend credence to mnaser's suggestion it could be a new firewall or traffic shaper | 21:22 |
clarkb | mordred: can we tell ansible to tell ssh to do ping pongs back and forth every minute or so? | 21:22 |
openstackgerrit | Jason Lee proposed opendev/storyboard master: WIP: BlueprintWriter prototype, attempting bugfixes https://review.opendev.org/654812 | 21:22 |
paladox | btw you may want to beware of gerrit 2.15.12, it apparently has some type of problem that is currently causing an outage for us. | 21:22 |
*** rfolco has quit IRC | 21:24 | |
clarkb | hrm we already set ServerAliveInterval to 60 | 21:24 |
clarkb | which should mean every 60 seconds ping pong | 21:24 |
clarkb | (if you didn't get data otherwise) | 21:24 |
clarkb | and default ServerAliveCountMax is 15 | 21:26 |
clarkb | which means after about 15 minutes we should disconnect | 21:26 |
openstackgerrit | Merged opendev/base-jobs master: Update opendev intermediate registry secret https://review.opendev.org/655228 | 21:26 |
clarkb | er sorry it is 3 | 21:27 |
clarkb | I misread the numbers in the manpage | 21:27 |
clarkb | so 3 minutes | 21:27 |
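So the window being described works out to ServerAliveInterval (60s) times ServerAliveCountMax (3), roughly three minutes of silence before ssh gives up. If that needed tuning from the Ansible side, one place the options can go is the standard ansible_ssh_common_args variable; the values below are illustrative, not a proposed change:

```yaml
# Hypothetical inventory/group_vars entry: probe every 60s and allow 3
# unanswered probes (~180s) before the connection is declared dead.
ansible_ssh_common_args: "-o ServerAliveInterval=60 -o ServerAliveCountMax=3"
```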
jamesmcarthur | Hi clarkb: Trying to log into the wiki from https://governance.openstack.org/tc/reference/opens.html throws me back to openstack.org | 21:27 |
jamesmcarthur | Related to this recent migration or something else? | 21:27 |
jrosser | git clone error http://logs.openstack.org/74/652574/3/gate/openstack-ansible-deploy-aio_metal-debian-stable/ecb0b7c/job-output.txt.gz#_2019-04-23_21_09_27_768335 | 21:27 |
clarkb | jamesmcarthur: shouldn't be related to the migration. We didn't touch the wiki | 21:28 |
*** tosky has joined #openstack-infra | 21:28 | |
clarkb | jrosser: ya we've been trying to figure out persistent connectivity issues between zuul and test nodes all day | 21:28 |
clarkb | jamesmcarthur: where is the wiki login from there? | 21:29 |
jamesmcarthur | seems to be something else going on | 21:29 |
jamesmcarthur | I'm already logged in | 21:30 |
jamesmcarthur | I'll open a ticket on our end and see if I can figure it out :) | 21:30 |
*** smarcet has quit IRC | 21:35 | |
openstackgerrit | Merged opendev/storyboard-webclient master: Show all stories created and allows them to filter according to status https://review.opendev.org/642370 | 21:37 |
*** whoami-rajat has quit IRC | 21:40 | |
*** tjgresha_nope has quit IRC | 21:41 | |
*** tjgresha has joined #openstack-infra | 21:43 | |
smcginnis | jamesmcarthur: That's not the wiki. | 21:43 |
smcginnis | jamesmcarthur: https://governance.openstack.org/tc/reference/opens.html is sphinx generated content. | 21:44 |
smcginnis | But I can confirm clicking on Log In from there throws back to / | 21:44 |
smcginnis | Not sure what logging in there is supposed to do. | 21:44 |
openstackgerrit | Merged opendev/system-config master: Install socat on zuul executors https://review.opendev.org/654577 | 21:44 |
smcginnis | jamesmcarthur: You would need to submit a patch for https://opendev.org/openstack/governance/src/branch/master/reference/opens.rst if you are trying to update that page. | 21:46 |
mordred | smcginnis, jamesmcarthur: yeah a) not sure what logging in to the four opens page is intended to accomplish - but also, https://www.openstack.org/Security/login/?BackURL=/home/ ... the BackURL is /home/ - which is unlikely to ever be correct :) | 21:46 |
mordred | that also was supposed to be b) | 21:47 |
smcginnis | :) | 21:47 |
clarkb | I'm going to take a break now since I feel like I'm just spinning wheels with the networking stuff. It seems slightly better and if fungi can catch one maybe we can debug (and then possibly file a bug with $cloud) | 21:48 |
mordred | clarkb: ++ | 21:48 |
mordred | jamesmcarthur: https://opendev.org/openstack/openstackdocstheme/src/branch/master/openstackdocstheme/theme/openstackdocs/header.html | 21:49 |
mordred | that's where the header is coming from | 21:49 |
mordred | and https://opendev.org/openstack/openstackdocstheme/src/branch/master/openstackdocstheme/theme/openstackdocs/header.html#L110 | 21:49 |
mordred | is where the login link is coming from - with the nicely hard-coded /home/ as the BackURL | 21:50 |
mordred | jamesmcarthur, smcginnis: since logging in to openstack docs isn't really a thing, maybe we should just remove the login link from openstackdocstheme? | 21:50 |
jamesmcarthur | ah ha | 21:50 |
smcginnis | I wonder if there is somewhere that it is actually used. | 21:50 |
mordred | otherwise I think we'd want to replace /home/ there with some javascript or something that sets an appropriate BackURL | 21:50 |
jamesmcarthur | mordred: that's.. kind of an excellent point | 21:51 |
smcginnis | It might need some sort of conditional display. | 21:51 |
mordred | smcginnis: I'm guessing the html got lifted from somewhere else | 21:51 |
smcginnis | Could likely be | 21:51 |
mordred | but there are zero times when we'll need a login page on published static docs | 21:51 |
smcginnis | I do really wish gitea had a Blame button. | 21:51 |
jamesmcarthur | we provide a little javascript include with the openstack menu so that everyone that's using it can stay up to date | 21:51 |
jamesmcarthur | but it's definitely not applicable to docs | 21:51 |
fungi | okay, sustenance has been consumed and i have 8 minutes to catch up before my next conference call | 21:51 |
jamesmcarthur | lol | 21:52 |
mordred | jamesmcarthur: yeah. that said - if we DID want to fix the login link, just for consistency, that seems fine | 21:52 |
openstackgerrit | Merged opendev/system-config master: Add script to automate GitHub organization transfers https://review.opendev.org/644937 | 21:52 |
mordred | jamesmcarthur: and that way the javascript include would work and it would look integrated and whatnot :) | 21:52 |
mordred | but - you know - I leave all of that to your very capable hands :) | 21:52 |
smcginnis | mordred, jamesmcarthur: Looks like it was intentionally added since it is the first thing mentioned in the commit message: https://github.com/openstack/openstackdocstheme/commit/d31e4ded8941a69b36de413f1bcf56c91bece779 | 21:53 |
mordred | smcginnis: weird. but also - good to know | 21:54 |
*** jcoufal has quit IRC | 21:55 | |
*** jamesmcarthur has quit IRC | 21:55 | |
* mordred needs to AFK | 21:56 | |
fungi | if asettle is awake already, maybe she remembers the reasons there? she was the one who approved that addition | 21:56 |
fungi | er, s/approved/committed/ | 21:56 |
*** jamesmcarthur has joined #openstack-infra | 21:58 | |
*** jcoufal has joined #openstack-infra | 21:58 | |
fungi | AJaeger was the one to approve it | 21:58 |
fungi | three years ago yesterday in fact | 21:59 |
*** imacdonn has quit IRC | 22:01 | |
*** ijw has quit IRC | 22:01 | |
*** imacdonn has joined #openstack-infra | 22:01 | |
jamesmcarthur | Yeah... it was done to try to solidify the various implementations of the openstack header. | 22:02 |
jamesmcarthur | Clearly worth a revisit :) | 22:02 |
corvus | smcginnis: wish granted. merged 4 days ago, probably will be in 1.9.0: https://github.com/go-gitea/gitea/pull/5721 | 22:08 |
mriedem | i just noticed this in a non-voting job in stable/pike but it's also in queens, looks like legacy jobs are now failing because of incorrect or missing required-projects on devstack-gate? | 22:12 |
mriedem | http://logs.openstack.org/98/640198/2/check/nova-grenade-live-migration/370efe9/job-output.txt.gz#_2019-04-22_13_16_53_233772 | 22:12 |
mriedem | is that a known issue? | 22:12 |
mriedem | seems to be after the opendev rename | 22:12 |
*** slaweq has quit IRC | 22:12 | |
mriedem | https://review.opendev.org/#/c/640198/2/.zuul.yaml@38 | 22:13 |
mriedem | not sure if we need to change that in stable branch job defs now? | 22:13 |
mriedem | i guess that's what this was for... https://github.com/openstack/nova/commit/fc3890667e4971e3f0f35ac921c2a6c25f72adec | 22:14 |
*** slaweq has joined #openstack-infra | 22:14 | |
*** jamesmcarthur has quit IRC | 22:15 | |
corvus | mriedem: that change was approved despite the fact that the job it added failed when it ran (since the job is non-voting, it's not gating) | 22:20 |
mriedem | yeah i realize that | 22:21 |
mriedem | i'm wondering if i need to fix this devstack-gate thing on stable branches, or are there redirects in place? | 22:21 |
corvus | mriedem: it needs to be fixed | 22:21 |
mriedem | ok | 22:21 |
mriedem | ah i see there were migration patches per branch, so this isn't as bad as i thought it'd be | 22:27 |
*** amoralej|off has quit IRC | 22:27 | |
mriedem | just the one job that missed it | 22:27 |
*** hwoarang has quit IRC | 22:32 | |
*** hwoarang has joined #openstack-infra | 22:34 | |
*** tonyb has joined #openstack-infra | 22:41 | |
*** wehde has quit IRC | 22:47 | |
*** tkajinam has joined #openstack-infra | 22:55 | |
*** kranthikirang has quit IRC | 22:56 | |
Weifan | Our repo has been moved from the openstack namespace to x, and our project is no longer published to pypi automatically after pushing a new tag. | 23:00 |
Weifan | Does that mean we should replicate the repo from opendev to github? Or does anyone know how we can set it up so it is published based on tags on opendev? | 23:00 |
Weifan | Or is it suggested that we set up our own jobs to publish it? | 23:01 |
*** jcoufal has quit IRC | 23:10 | |
*** aaronsheffield has quit IRC | 23:11 | |
*** hwoarang has quit IRC | 23:13 | |
*** tosky has quit IRC | 23:13 | |
*** yamamoto has joined #openstack-infra | 23:14 | |
*** hwoarang has joined #openstack-infra | 23:14 | |
*** diablo_rojo has quit IRC | 23:14 | |
*** jcoufal has joined #openstack-infra | 23:16 | |
*** gmann is now known as gmann_afk | 23:17 | |
*** yamamoto has quit IRC | 23:18 | |
clarkb | pypi and github are independent | 23:19 |
clarkb | the tag jobs should push to pypi regardless of github | 23:19 |
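What clarkb describes is the tag-driven release pipeline. A minimal sketch, assuming the standard publish-to-pypi project template from openstack-zuul-jobs, of how a project attaches it so that pushing a tag to Gerrit triggers the PyPI upload jobs, independent of any GitHub mirroring:

```yaml
# Minimal sketch: attaching the publish-to-pypi template (template
# name assumed from the shared openstack-zuul-jobs definitions; the
# authoritative attachment for most projects lives in project-config).
- project:
    templates:
      - publish-to-pypi
```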
*** diablo_rojo has joined #openstack-infra | 23:19 | |
Weifan | it has been 1 day, and it is still not updated on pypi | 23:19 |
Weifan | but the tag can be found on opendev | 23:19 |
clarkb | there is a builds tab on https://zuul.openstack.org; can you search for your pypi jobs there? | 23:20 |
clarkb | it probably failed for some reason | 23:20 |
*** yamamoto has joined #openstack-infra | 23:20 | |
*** rcernin has joined #openstack-infra | 23:20 | |
*** hwoarang has quit IRC | 23:22 | |
*** hwoarang has joined #openstack-infra | 23:23 | |
Weifan | any suggestions on how to find the job? I can't seem to find it... it was for https://opendev.org/x/networking-bigswitch | 23:23 |
clarkb | let me see | 23:24 |
clarkb | https://zuul.openstack.org/build/e8686ce24e04408aaef4f34c99bd7f27 | 23:26 |
*** lseki has quit IRC | 23:26 | |
openstackgerrit | Jason Lee proposed opendev/storyboard master: WIP: BlueprintWriter prototype, additional bugfixes https://review.opendev.org/654812 | 23:27 |
clarkb | that may be the twine issue? | 23:27 |
clarkb | I'm not in a good spot to debug that as I am on a phone | 23:27 |
Weifan | looks like the ansible task failed | 23:28 |
Weifan | seems like all of them are failing right now, not just our project | 23:32 |
*** igordc has quit IRC | 23:32 | |
fungi | yes, we're still trying to work out the cause. it started happening roughly a week ago, so well before the opendev migration | 23:32 |
*** jcoufal has quit IRC | 23:33 | |
fungi | though it was intermittent until today-ish | 23:33 |
Weifan | ok, thanks for the information :) | 23:34 |
fungi | Weifan: yep, same error in your log too... "Can not perform a '--user' install. User site-packages are not visible in this virtualenv." | 23:35 |
fungi | we'll reenqueue that tag object once we work out the fix, so no need to push a new tag for that | 23:36 |
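The message fungi quotes is pip's generic refusal to perform a --user install when the interpreter it runs under is a virtualenv. A hypothetical Ansible task of the shape that would trip it (not the actual opendev role, just an illustration of the failure mode) looks like this:

```yaml
# Hypothetical illustration: pip rejects "--user" whenever the target
# python lives inside a virtualenv, so a task like this fails with the
# quoted error and would need to drop --user (or point at a
# non-virtualenv interpreter) instead.
- name: Install twine for the release upload
  pip:
    name: twine
    extra_args: --user   # rejected inside a virtualenv
    executable: pip3     # assumption; whichever pip the job actually uses
```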
*** gyee has quit IRC | 23:39 | |
Weifan | would it be related to python3? release-openstack-python seems to have python3 as "release_python", but the job was on queens | 23:39 |
Weifan | which probably uses py2 | 23:39 |
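For reference, the variable Weifan mentions would sit on the job definition roughly as below; only the release_python value comes from the discussion above, the surrounding structure is an assumption:

```yaml
# Sketch of where a job-level interpreter choice like this lives;
# the job name and variable are taken from the log, the rest is assumed.
- job:
    name: release-openstack-python
    vars:
      release_python: python3
```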
*** rlandy|ruck has quit IRC | 23:40 | |
*** yamamoto has quit IRC | 23:41 | |
*** yamamoto has joined #openstack-infra | 23:42 | |
*** yamamoto has quit IRC | 23:43 | |
fungi | i haven't been able to find a correlation by interpreter. the release-openstack-python job has been set to python3 since november | 23:44 |
*** hwoarang has quit IRC | 23:49 | |
*** hwoarang has joined #openstack-infra | 23:51 | |
*** mattw4 has quit IRC | 23:59 |