*** rcernin has joined #openstack-infra | 00:02 | |
*** exsdev has quit IRC | 00:09 | |
*** exsdev has joined #openstack-infra | 00:10 | |
mnaser | infra-root: is there anyone around that can get me a hold on 677024 ? it's timing out in gate but works fine for me locally | 00:12 |
---|---|---|
ianw | mnaser: looking ... | 00:12 |
ianw | mnaser: kue-integration-1-node ? | 00:14 |
mnaser | ianw: sure, any of the two works | 00:15 |
mnaser | they both timeout :) | 00:15 |
ianw | \| opendev \| opendev.org/vexxhost/kue \| kue-integration-1-node \| refs/changes/24/677024/.* \| 1 \| mnaser: timeouts \| | 00:17 |
mnaser | ianw: awesome, lemme try again | 00:17 |
clarkb | mnaser: I'm not sure that zuul can ssh as zuul on any of the nodes | 00:18 |
mnaser | clarkb: why not? it's just a normal nodepool vm? | 00:19 |
clarkb | because the executor can ssh as zuul but not the test nodes | 00:19 |
mnaser | oh i see what you mean | 00:19 |
mnaser | hmm | 00:19 |
mnaser | i tried to iterate with installing this on the executor | 00:20 |
mnaser | but i guess i dont have sudo so i have to do a user install | 00:20 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Convert nested bridge.o.o ARA report to static HTML https://review.opendev.org/677096 | 00:29 |
clarkb | mnaser: ya you can't install software on the executor | 00:29 |
mnaser | maybe a user install might be the best idea | 00:29 |
clarkb | it is pretty limited to reading/writing files to the scratch space and making http requests externally | 00:29 |
clarkb | (to prevent breakouts) | 00:29 |
mnaser | could a python user install properly work? or not worth trying? | 00:30 |
clarkb | no | 00:30 |
clarkb | not without being a trusted job | 00:30 |
mnaser | ok so i have to generate and distribute ssh keys to make this work? | 00:30 |
clarkb | the issue is in executing arbitrary code not user vs system installs | 00:30 |
clarkb | yes, that is what multinode devstack jobs do for example | 00:31 |
mnaser | just curious how openstack multinode jobs currently do that, is there a role or nah? | 00:31 |
clarkb | that I don't know. | 00:31 |
clarkb | with the zuulv2 jobs devstack-gate did it I think | 00:31 |
clarkb | so there may be a role in devstack now that does it for zuulv3 jobs | 00:32 |
clarkb | https://opendev.org/openstack/devstack/src/branch/master/roles/orchestrate-devstack/tasks/main.yaml#L9-L13 I think that is the role you want | 00:32 |
mnaser | oh awesome | 00:33 |
clarkb | lives at https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/copy-build-sshkey | 00:33 |
*** zhurong has quit IRC | 00:37 | |
*** yamamoto has joined #openstack-infra | 00:39 | |
clarkb | ianw: logan- quick check of a centos host in limestone shows it has a private ipv4 addr configured. That issue must not be a 100% failure then. IPv4 is configured via dhcp according to config drive network_data.json and the sysconfig file glean wrote for eth0 on that host | 00:41 |
clarkb | that doesn't help narrow down what may be happening there very much, but I think we need to check that dhcp is working and that glean is properly writing out that config every time (maybe we have another NM race ?) | 00:42 |
fungi | i guess we'll need to hold a node for whatever tripleo job is consistently hitting it | 00:42 |
ianw | clarkb: umm, i'm guessing i should scroll backwards ... doing that now :) | 00:42 |
fungi | or hold a bunch of nodes and recheck-spam until we get a held one in ls | 00:43 |
fungi | depending on how likely the job is to also fail for unrelated reasons | 00:43 |
clarkb | ianw: tripleo and kolla have found that some jobs running on centos 7 on limestone don't have ipv4 addrs configured (they are the private only addr in that cloud) | 00:44 |
ianw | right, so only limestone as far as we know? | 00:44 |
ianw | there is also a comment in backscroll that logstash results stop about the time we moved to swift logs | 00:44 |
clarkb | ipv6 is working so job runs there then fails because they need working ipv4 to talk between test nodes in multinode setup ? and ya I think only limestone so far. Possibly because if it happens elsewhere nodepool sees that as a failed boot instead | 00:44 |
fungi | those have been the only example so far, yeah | 00:45 |
ianw | ok, first just checking on logstash before i query ... logs from at least 2019-08-19T10:45:20.175+10:00 | 00:49 |
clarkb | re logstash I think swift broke it | 00:49 |
clarkb | that is the other thing on my list but I don't want to get sucked into that until tomorrow :P | 00:49 |
fungi | rather, our switch to storing job logs in swift | 00:49 |
clarkb | ya | 00:49 |
* fungi doesn't blame swift itself at all | 00:50 | |
clarkb | my guess is that either swift doesn't like our old severity filter parameter on the requests or the volume of logs without that filter being active is just too high and causing it to fall over or maybe even both | 00:50 |
clarkb | but its a good chance it doesn't like the severity parameter and fixing that will then cause it to fall over due to volume (and we'll need to add aggressive filters to replace those we had) | 00:51 |
ianw | oh right, that was from the os-loganalyze middleware right? | 00:52 |
clarkb | yup | 00:52 |
*** prometheanfire has quit IRC | 00:52 | |
ianw | which is now zuul javascript? | 00:52 |
clarkb | ya | 00:52 |
clarkb | I think what we can do is have logstash's first rule be drop anything that is a debug log | 00:53 |
*** prometheanfire has joined #openstack-infra | 00:56 | |
ianw | fungi / clarkb: if you have a quick second for https://review.opendev.org/#/c/677096 to fix ara-reports in the nested system-config jobs ... i'm trying to show them to upstream testinfra as a sales pitch for opendev.org but it's a bit sucky to say "yeah, it's great but we just have to fix this one bug" :) | 00:57 |
clarkb | ianw: are we able to run ara from an untrusted job like that on the executor? | 01:00 |
ianw | clarkb: well, the results are there; i think because it's all already installed | 01:01 |
ianw | ara is probably a special case | 01:01 |
fungi | looks surprisingly simple | 01:02 |
clarkb | https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_96/677096/2/check/system-config-run-base/f8a7f40/bridge.openstack.org/ara-report/ ya that seems to have worked | 01:02 |
fungi | and yeah, likely relying on our preinstalled ara on the executors | 01:02 |
clarkb | right but it isn't just being preinstalled, we don't allow you to run commands either? | 01:02 |
ianw | trusted roles can? an ara-report role is in the base jobs | 01:04 |
clarkb | oh that must be it | 01:04 |
clarkb | the role itself is trusted | 01:04 |
ianw | thanks; the first 3rd-party test failed with a timeout, another thing i know we're looking into ... https://github.com/philpep/testinfra/pull/482#issuecomment-522279434 | 01:07 |
ianw | anyway, i can do some re-running now | 01:07 |
*** spsurya has joined #openstack-infra | 01:09 | |
*** yamamoto has quit IRC | 01:11 | |
*** yamamoto has joined #openstack-infra | 01:11 | |
*** markvoelker has joined #openstack-infra | 01:20 | |
*** zhurong has joined #openstack-infra | 01:22 | |
openstackgerrit | Merged opendev/system-config master: Convert nested bridge.o.o ARA report to static HTML https://review.opendev.org/677096 | 01:25 |
*** markvoelker has quit IRC | 01:25 | |
*** pkopec has quit IRC | 01:25 | |
mnaser | ianw: i think the hold worked, but it might have failed in a different way.. | 01:31 |
mnaser | i have a job currently stuck here again after doing the ssh key distribution | 01:31 |
*** redrobot has quit IRC | 02:23 | |
*** Guest90568 has joined #openstack-infra | 02:29 | |
*** Guest90568 is now known as redrobot | 02:32 | |
*** jamesmcarthur has joined #openstack-infra | 02:37 | |
*** bhavikdbavishi has joined #openstack-infra | 02:41 | |
*** bhavikdbavishi1 has joined #openstack-infra | 02:44 | |
*** bhavikdbavishi has quit IRC | 02:45 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 02:45 | |
*** ramishra has joined #openstack-infra | 02:45 | |
openstackgerrit | Ian Wienand proposed opendev/puppet-log_processor master: log-gearman-worker: handle deflate encoded values https://review.opendev.org/677104 | 02:49 |
openstackgerrit | Ian Wienand proposed opendev/puppet-log_processor master: log-gearman-worker: Remove jenkins streaming workaround https://review.opendev.org/677105 | 02:49 |
ianw | clarkb: ^ https://review.opendev.org/#/c/677104/1 note dropped a comment on why i think rax is returning deflated data despite us not asking for it ... let me see if i can replicate with curl maybe | 02:52 |
ianw | ooohh, you know this is more likely to be that we've switched to https, isn't it ... | 02:54 |
ianw | no, so it's not just rax ... i guess that we put the data in as deflate encoded, and then that's how swift serves it back | 03:00 |
*** diablo_rojo has joined #openstack-infra | 03:00 | |
ianw | roles/upload-logs-swift/library/zuul_swift_upload.py: headers['content-encoding'] = 'deflate' | 03:01 |
ianw | yeah, ok, mystery solved | 03:01 |
ianw | let me update the comment ... | 03:01 |
openstackgerrit | Ian Wienand proposed opendev/puppet-log_processor master: log-gearman-worker: handle deflate encoded values https://review.opendev.org/677104 | 03:07 |
openstackgerrit | Ian Wienand proposed opendev/puppet-log_processor master: log-gearman-worker: Remove jenkins streaming workaround https://review.opendev.org/677105 | 03:07 |
*** exsdev has quit IRC | 03:10 | |
*** ricolin has joined #openstack-infra | 03:10 | |
clarkb | ianw: do you need to decode the zlib decompress output as utf8 too? | 03:19 |
clarkb | in 104 | 03:19 |
*** odicha has joined #openstack-infra | 03:23 | |
*** exsdev has joined #openstack-infra | 03:24 | |
*** jamesmcarthur has quit IRC | 03:33 | |
ianw | clarkb: no, i don't think so, i remove that in the follow-on | 03:35 |
ianw | anyway, as you say i think we need to filter the logs too, but this is i think step one in at least getting the logs :) | 03:35 |
*** ricolin has quit IRC | 03:44 | |
*** dave-mccowan has quit IRC | 03:46 | |
*** dklyle has joined #openstack-infra | 03:46 | |
*** jamesmcarthur has joined #openstack-infra | 03:47 | |
clarkb | ++ | 03:49 |
*** ykarel has joined #openstack-infra | 03:51 | |
*** dklyle has quit IRC | 03:52 | |
*** jamesmcarthur has quit IRC | 04:00 | |
*** jamesmcarthur has joined #openstack-infra | 04:00 | |
AJaeger | config-core, please review a change to test whether we can remove the logurl that is wrong now with swift: https://review.opendev.org/676755 - and also please review these two for preparing to use promote pipeline: https://review.opendev.org/#/c/676624/ and https://review.opendev.org/#/c/676630 | 04:02 |
*** yamamoto has quit IRC | 04:15 | |
*** yamamoto has joined #openstack-infra | 04:15 | |
*** jhesketh has quit IRC | 04:18 | |
*** jhesketh has joined #openstack-infra | 04:19 | |
*** jamesmcarthur has quit IRC | 04:24 | |
*** jamesmcarthur has joined #openstack-infra | 04:26 | |
*** diablo_rojo has quit IRC | 04:28 | |
*** udesale has joined #openstack-infra | 04:30 | |
*** jamesmcarthur has quit IRC | 04:37 | |
*** jamesmcarthur has joined #openstack-infra | 04:44 | |
*** jamesmcarthur has quit IRC | 05:00 | |
*** dpawlik has joined #openstack-infra | 05:00 | |
*** dchen has quit IRC | 05:00 | |
*** jamesmcarthur has joined #openstack-infra | 05:05 | |
*** dchen has joined #openstack-infra | 05:05 | |
*** trident has quit IRC | 05:06 | |
*** ykarel has quit IRC | 05:09 | |
*** ykarel has joined #openstack-infra | 05:10 | |
*** ykarel is now known as ykarel|afk | 05:14 | |
*** ociuhandu has joined #openstack-infra | 05:15 | |
*** trident has joined #openstack-infra | 05:15 | |
*** ricolin has joined #openstack-infra | 05:16 | |
*** kopecmartin|off is now known as kopecmartin | 05:18 | |
*** trident has quit IRC | 05:21 | |
*** markvoelker has joined #openstack-infra | 05:23 | |
*** raukadah is now known as chkumar|ruck | 05:23 | |
*** janki has joined #openstack-infra | 05:24 | |
*** markvoelker has quit IRC | 05:27 | |
*** trident has joined #openstack-infra | 05:27 | |
*** dychen has joined #openstack-infra | 05:29 | |
*** dychen has quit IRC | 05:31 | |
*** dychen has joined #openstack-infra | 05:32 | |
*** dchen has quit IRC | 05:32 | |
*** ociuhandu has quit IRC | 05:36 | |
*** jamesmcarthur has quit IRC | 05:38 | |
*** jamesmcarthur has joined #openstack-infra | 05:42 | |
*** AJaeger has quit IRC | 05:42 | |
*** yikun has quit IRC | 05:43 | |
*** jamesmcarthur has quit IRC | 05:44 | |
*** AJaeger has joined #openstack-infra | 05:45 | |
AJaeger | ianw, frickler, could you review, please? ^ | 05:46 |
*** ykarel|afk is now known as ykarel | 05:49 | |
*** jaosorior has joined #openstack-infra | 05:51 | |
*** dingyichen has joined #openstack-infra | 05:54 | |
*** dychen has quit IRC | 05:57 | |
*** ociuhandu has joined #openstack-infra | 06:02 | |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Make direct-push configurable on project-level https://review.opendev.org/677109 | 06:02 |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Implement push job in merger https://review.opendev.org/677110 | 06:02 |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Push changes in GerritReporter if direct-push is enabled https://review.opendev.org/677111 | 06:02 |
*** jbadiapa has joined #openstack-infra | 06:05 | |
*** n-saito has joined #openstack-infra | 06:12 | |
*** dychen has joined #openstack-infra | 06:12 | |
AJaeger | thanks, ianw ! | 06:14 |
*** dingyichen has quit IRC | 06:15 | |
*** yamamoto has quit IRC | 06:15 | |
*** ykarel is now known as ykarel|afk | 06:21 | |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Use opendev-tox-docs for api jobs https://review.opendev.org/676624 | 06:22 |
openstackgerrit | Merged openstack/project-config master: Add promote api-ref/guide jobs https://review.opendev.org/676630 | 06:23 |
AJaeger | config-core, please review https://review.opendev.org/677091 to keep base-test in sync for opendev/base-jobs | 06:32 |
*** yamamoto has joined #openstack-infra | 06:34 | |
*** threestrands has quit IRC | 06:34 | |
*** threestrands has joined #openstack-infra | 06:35 | |
*** e0ne has joined #openstack-infra | 06:37 | |
*** e0ne has quit IRC | 06:38 | |
*** ykarel|afk is now known as ykarel | 06:38 | |
*** kjackal has joined #openstack-infra | 06:39 | |
*** dychen has quit IRC | 06:40 | |
*** dchen has joined #openstack-infra | 06:41 | |
*** dchen has quit IRC | 06:43 | |
openstackgerrit | Merged opendev/base-jobs master: Remove log_url from emit-job-header (base-test) https://review.opendev.org/676755 | 06:43 |
*** dchen has joined #openstack-infra | 06:48 | |
*** pkopec has joined #openstack-infra | 06:51 | |
*** ociuhandu has quit IRC | 06:53 | |
openstackgerrit | Merged opendev/base-jobs master: Sync base-test jobs https://review.opendev.org/677091 | 06:58 |
*** jtomasek has joined #openstack-infra | 06:58 | |
*** ociuhandu has joined #openstack-infra | 07:00 | |
*** udesale has quit IRC | 07:05 | |
*** udesale has joined #openstack-infra | 07:06 | |
*** dchen has quit IRC | 07:14 | |
*** jaosorior has quit IRC | 07:14 | |
openstackgerrit | Ian Wienand proposed opendev/puppet-log_processor master: Debug stripping: remove obsolete GET filter, add local filter https://review.opendev.org/677122 | 07:15 |
openstackgerrit | Ian Wienand proposed opendev/puppet-log_processor master: log-gearman-worker: remove obsolete GET debug filter, add local filter https://review.opendev.org/677122 | 07:15 |
*** ociuhandu has quit IRC | 07:16 | |
*** ykarel is now known as ykarel|afk | 07:16 | |
*** rcernin has quit IRC | 07:17 | |
*** apetrich has joined #openstack-infra | 07:17 | |
*** dchen has joined #openstack-infra | 07:18 | |
*** bhavikdbavishi has quit IRC | 07:19 | |
*** bhavikdbavishi has joined #openstack-infra | 07:19 | |
*** threestrands has quit IRC | 07:19 | |
ianw | clarkb: ^ so i think that should be roughly what's required to get logs ingested again ... lightly tested, on logstash-worker01 i have /home/ianw/ls-test/test.py which has extracted most of the core bits for testing, but it's not our best example of CI based development :) | 07:21 |
*** jtomasek has quit IRC | 07:21 | |
*** ykarel|afk has quit IRC | 07:23 | |
*** dpawlik has quit IRC | 07:34 | |
*** rpittau|afk is now known as rpittau | 07:35 | |
*** takamatsu has joined #openstack-infra | 07:37 | |
*** yolanda has quit IRC | 07:42 | |
*** yolanda__ has joined #openstack-infra | 07:43 | |
*** udesale has quit IRC | 07:43 | |
*** jpena|off is now known as jpena | 07:44 | |
*** udesale has joined #openstack-infra | 07:44 | |
*** ykarel|afk has joined #openstack-infra | 07:46 | |
*** ykarel|afk is now known as ykarel | 07:46 | |
*** lucasagomes has joined #openstack-infra | 07:48 | |
openstackgerrit | Andreas Jaeger proposed opendev/bindep master: Update openSUSE testing https://review.opendev.org/677133 | 07:48 |
*** xarses_ has quit IRC | 07:53 | |
*** andreww has joined #openstack-infra | 07:54 | |
*** andreww has quit IRC | 07:54 | |
*** andreww has joined #openstack-infra | 07:54 | |
*** dougsz has joined #openstack-infra | 08:02 | |
*** jaosorior has joined #openstack-infra | 08:05 | |
*** trident has quit IRC | 08:11 | |
*** xenos76 has joined #openstack-infra | 08:15 | |
*** tkajinam has quit IRC | 08:18 | |
*** trident has joined #openstack-infra | 08:19 | |
AJaeger | Here're two reviews for bindep, please: https://review.opendev.org/#/c/667533/5 and https://review.opendev.org/#/c/667614/ | 08:23 |
*** ykarel is now known as ykarel|afk | 08:23 | |
AJaeger | frickler, ianw, could either of you review these, please? ^ | 08:24 |
*** dchen has quit IRC | 08:24 | |
*** yolanda__ is now known as yolanda | 08:28 | |
*** adriant has quit IRC | 08:29 | |
*** iokiwi has quit IRC | 08:29 | |
*** adriant has joined #openstack-infra | 08:31 | |
*** iokiwi has joined #openstack-infra | 08:31 | |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Implement push job in merger https://review.opendev.org/677110 | 08:32 |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Push changes in GerritReporter if direct-push is enabled https://review.opendev.org/677111 | 08:32 |
*** noorul has joined #openstack-infra | 08:33 | |
*** noorul has quit IRC | 08:34 | |
*** derekh has joined #openstack-infra | 08:34 | |
*** ociuhandu has joined #openstack-infra | 08:37 | |
sshnaidm | folks, seems like limestone completely down with centos, network issues fail all jobs | 08:38 |
sshnaidm | can we exclude it somehow from jobs? | 08:39 |
*** kjackal has quit IRC | 08:41 | |
*** ociuhandu has quit IRC | 08:41 | |
*** e0ne has joined #openstack-infra | 08:42 | |
*** kjackal has joined #openstack-infra | 08:43 | |
*** kjackal has quit IRC | 08:47 | |
*** kjackal has joined #openstack-infra | 08:49 | |
*** ykarel|afk is now known as ykarel | 08:51 | |
AJaeger | sshnaidm: could you help debugging so that it gets fixed? What has changed that it now fails? | 08:55 |
sshnaidm | AJaeger, according to fungi and donnyd the problem is "centos-7 images there are not correctly obtaining their ipv4 address configuration", not sure how I can help debug this | 08:56 |
*** xenos76 has quit IRC | 09:00 | |
*** dpawlik has joined #openstack-infra | 09:03 | |
*** roman_g has joined #openstack-infra | 09:03 | |
*** janki has quit IRC | 09:17 | |
*** yamamoto has quit IRC | 09:21 | |
*** n-saito has left #openstack-infra | 09:21 | |
*** yamamoto has joined #openstack-infra | 09:23 | |
*** yamamoto has quit IRC | 09:24 | |
*** yamamoto has joined #openstack-infra | 09:31 | |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openSUSE 42.3 https://review.opendev.org/677158 | 09:33 |
*** markvoelker has joined #openstack-infra | 09:35 | |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove openSUSE 42.3 https://review.opendev.org/677158 | 09:39 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Add legacy-opensuse-15 nodeset https://review.opendev.org/677162 | 09:40 |
*** markvoelker has quit IRC | 09:40 | |
openstackgerrit | Kartik Sharma proposed openstack/os-performance-tools master: Remove the incorrect bug tracker link https://review.opendev.org/677163 | 09:50 |
*** apetrich has quit IRC | 09:56 | |
chkumar|ruck | AJaeger: Is it possible to hold the node from tripleo project patch running on limestone and see what is happening there? | 10:07 |
AJaeger | you need an infra-root for this, I don't have those permissions. I expect somebody can help you later | 10:09 |
chkumar|ruck | AJaeger: thanks! | 10:09 |
*** ociuhandu has joined #openstack-infra | 10:15 | |
*** dougsz has quit IRC | 10:17 | |
*** ociuhandu has quit IRC | 10:20 | |
*** kjackal has quit IRC | 10:28 | |
yoctozepto | hey infra, Zuul did something odd to one of our checks | 10:29 |
yoctozepto | please see http://zuul.openstack.org/status | 10:29 |
yoctozepto | for 677144 | 10:29 |
yoctozepto | it queued up jobs that should not run | 10:29 |
yoctozepto | they were not in queue at the beginning | 10:30 |
*** Wasaac has quit IRC | 10:30 | |
yoctozepto | it must've happened in the meantime | 10:30 |
*** Wasaac has joined #openstack-infra | 10:31 | |
*** dougsz has joined #openstack-infra | 10:35 | |
*** ralonsoh has joined #openstack-infra | 10:37 | |
*** elod is now known as elod_off | 10:47 | |
yoctozepto | after a little digging | 10:51 |
yoctozepto | it looks like this merged in the meantime: https://review.opendev.org/666634 | 10:52 |
yoctozepto | could it reset the job in kolla to check all optional jobs? | 10:52 |
yoctozepto | (even if should not be matched) | 10:52 |
yoctozepto | looks like a bug in Zuul | 10:52 |
donnyd | There is not an issue at FN with centos. I posted links showing that it was functional yesterday | 10:55 |
*** xenos76 has joined #openstack-infra | 10:57 | |
AJaeger | yoctozepto: what exactly is the problem? Which job should not run and is run? | 11:00 |
yoctozepto | AJaeger: a bunch: e.g. kolla-ansible-ubuntu-source-ironic, kolla-ansible-centos-source-cinder-lvm | 11:02 |
yoctozepto | they appeared later than the other, proper ones did | 11:03 |
yoctozepto | the only change merged during that time was the one I linked just above | 11:03 |
AJaeger | yoctozepto: no direct idea. Please see my -1 on the change and adjust, you should never have non-voting jobs in gate queue... | 11:04 |
AJaeger | better comment out the lines... | 11:04 |
*** dave-mccowan has joined #openstack-infra | 11:06 | |
*** dchen has joined #openstack-infra | 11:06 | |
yoctozepto | AJaeger: yeah, will switch to commenting out now because ooo has their CI broken | 11:07 |
yoctozepto | so no point in running that at all | 11:07 |
yoctozepto | just waiting for Zuul to finish | 11:07 |
*** xenos76 has quit IRC | 11:07 | |
yoctozepto | so that I have a proof it ran too many checks... but why reconfig of kolla-ansible would cause this is beyond me :-) | 11:08 |
AJaeger | yoctozepto: hope somebody else has an idea. | 11:09 |
AJaeger | You could add "debug: true" to the check pipeline configuration, that shows after the run why jobs were run or not run. | 11:10 |
AJaeger | (maybe only why not run) | 11:10 |
*** gfidente has joined #openstack-infra | 11:10 | |
yoctozepto | AJaeger: well, they got added later, not during init, that's suspicious :D | 11:12 |
yoctozepto | first time I saw such behavior | 11:12 |
AJaeger | weird | 11:13 |
*** dchen has quit IRC | 11:13 | |
*** xenos76 has joined #openstack-infra | 11:14 | |
*** udesale has quit IRC | 11:15 | |
*** tesseract has joined #openstack-infra | 11:15 | |
*** dougsz has quit IRC | 11:17 | |
*** kjackal has joined #openstack-infra | 11:27 | |
dirk | AJaeger: evrardjp: what is the urgency of the 42.3 job removal? we should still have a 42.3 nodepool image? | 11:32 |
frickler | yoctozepto: 2019-08-19 10:04:33,230 DEBUG zuul.layout: [e: d383ebbbfc1248a3b59af867de2eaba2] The configuration of job <Job kolla-ansible-ubuntu-source-ironic branches: {MatchAny:{BranchMatcher:master}} source: opendev/base-jobs/zuul.d/jobs.yaml@master#25> is changed by <Change 0x7f185d716908 openstack/kolla 677144,1>; ignoring file matcher | 11:33 |
AJaeger | dirk: it hasn't been building for two months according to Shrews (see backscroll) | 11:34 |
frickler | so it seems that indeed the reconfiguration triggered by 666634 causes new jobs to be added because the file matcher is ignored in this special situation | 11:34 |
yoctozepto | it should not touch check queue as it is independent | 11:35 |
yoctozepto | feels weird to me | 11:35 |
yoctozepto | thanks for confirming though | 11:35 |
yoctozepto | makes me sound less insane ;D | 11:35 |
*** apetrich has joined #openstack-infra | 11:36 | |
*** markvoelker has joined #openstack-infra | 11:36 | |
dirk | AJaeger: ok, yeah, the dreaded systemd-logger/rsyslog thing. which is still a bug in leap 15.* as well | 11:37 |
dirk | AJaeger: how about we simply fix that one? | 11:37 |
*** jpena is now known as jpena|lunch | 11:39 | |
AJaeger | dirk, evrardjp, yes, that's an option as well. Question still remains what to do with openSUSE 42.3: Should it be removed from master and replaced with 15? And what about old branches where we have experimental jobs sometimes? | 11:40 |
aspiers | is it just me or is gitea agonisingly slow? | 11:41 |
aspiers | it's taking 10-20s per page load, e.g. https://opendev.org/openstack/nova/ | 11:41 |
*** markvoelker has quit IRC | 11:41 | |
AJaeger | aspiers: nova is too large - gitea does some git operations and that slows it down... | 11:41 |
aspiers | I wonder which git operations | 11:41 |
aspiers | sounds like a performance bug | 11:42 |
AJaeger | aspiers: I don't remember - it was discussed in backscroll sometimes here, so please check logs - or source code ;) | 11:42 |
*** yamamoto has quit IRC | 11:43 | |
*** rlandy has joined #openstack-infra | 11:47 | |
*** rlandy is now known as rlandy|rover | 11:48 | |
aspiers | shame that gitea doesn't support URLs with shortened SHA1s | 11:49 |
*** chkumar|ruck is now known as chkumar|rover | 11:49 | |
*** rlandy|rover is now known as rlandy|ruck | 11:49 | |
*** janki has joined #openstack-infra | 11:50 | |
*** dougsz has joined #openstack-infra | 11:55 | |
openstackgerrit | Dirk Mueller proposed openstack/diskimage-builder master: zypper-minimal: install without recommends https://review.opendev.org/677188 | 11:56 |
dirk | AJaeger: I would replace 42.3 with 15 jobs, what to do on stable branches is a good question.. | 11:57 |
dirk | AJaeger: I can see reasons for removing the jobs as well there (or just keeping them around) | 11:57 |
*** markvoelker has joined #openstack-infra | 11:57 | |
dirk | AJaeger: ^^ above should fix the 42.3 problem | 11:57 |
*** tdasilva has joined #openstack-infra | 11:59 | |
*** weshay_pto is now known as weshay | 12:00 | |
*** rh-jelabarre has joined #openstack-infra | 12:04 | |
*** yamamoto has joined #openstack-infra | 12:04 | |
*** ykarel is now known as ykarel|afk | 12:05 | |
*** yamamoto has quit IRC | 12:05 | |
*** yamamoto has joined #openstack-infra | 12:06 | |
*** ociuhandu has joined #openstack-infra | 12:07 | |
*** ociuhandu has quit IRC | 12:11 | |
*** jaosorior has quit IRC | 12:14 | |
*** lucasagomes has quit IRC | 12:17 | |
*** lucasagomes has joined #openstack-infra | 12:18 | |
*** apetrich has quit IRC | 12:20 | |
*** yamamoto has quit IRC | 12:24 | |
*** xenos76 has quit IRC | 12:27 | |
*** apetrich has joined #openstack-infra | 12:28 | |
*** ykarel|afk is now known as ykarel | 12:28 | |
*** rfolco has joined #openstack-infra | 12:28 | |
*** odicha has quit IRC | 12:29 | |
*** yamamoto has joined #openstack-infra | 12:30 | |
*** jpena|lunch is now known as jpena | 12:31 | |
*** yamamoto has quit IRC | 12:35 | |
AJaeger | thanks, dirk | 12:36 |
ykarel | fatal: unable to access 'https://github.com/voxpupuli/puppet-git_resource/': Failed to connect to 140.82.114.3: Network is unreachable | 12:38 |
*** jaosorior has joined #openstack-infra | 12:38 | |
ykarel | seeing ^^ in centos7 jobs running in limestone-regioneone | 12:39 |
ykarel | https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_92/677192/2/check/puppet-openstack-integration-5-scenario001-tempest-centos-7-luminous/7140e5e/job-output.txt | 12:39 |
ykarel | is the issue already known? | 12:39 |
AJaeger | ykarel: looks like known IPv4 issue with CentOS in limestone | 12:40 |
ykarel | AJaeger, hmm i see ipv6 address in zuul inventory, http://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_92/677192/2/check/puppet-openstack-integration-5-scenario001-tempest-centos-7-luminous/7140e5e/zuul-info/inventory.yaml | 12:42 |
*** spsurya has quit IRC | 12:43 | |
*** xenos76 has joined #openstack-infra | 12:44 | |
*** ykarel is now known as ykarel|away | 12:47 | |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Switch to promote jobs for api-ref/-guide https://review.opendev.org/676631 | 12:47 |
efried | AJaeger: Thanks for the fixups yesterday | 12:52 |
*** ykarel|away has quit IRC | 12:52 | |
AJaeger | efried: now to get it merged ;) | 12:53 |
*** yamamoto has joined #openstack-infra | 12:53 | |
efried | Yeah, I think I saw it was still failing, but didn't look into it at all. | 12:53 |
AJaeger | efried: you can recheck it now since the blacklist change is in - can't you? | 12:54 |
AJaeger | it showed exactly that blacklisting is needed ;) | 12:54 |
efried | AJaeger: Oh, that's weird, I had it based on top of that change before specifically to prove that it would alert us to the problem, but pass once the problem was fixed. | 12:54 |
efried | When you patched it up, did you rebase it to master? | 12:54 |
AJaeger | efried: no, didn't rebase on purpose ;( | 12:55 |
efried | hm, no, looks like I ripped it out at PS3, not even sure how I did that. | 12:55 |
AJaeger | ;) | 12:55 |
efried | okay, so yeah, I'll recheck it now that kombu is quarantined. | 12:55 |
AJaeger | cool | 12:55 |
efried | it's possible I edited it from my phone; who knows what happens when I do that :P | 12:56 |
*** mriedem has joined #openstack-infra | 12:57 | |
efried | hm, so I don't need to cite the ensure-python role at all? I admit, I really don't understand how this stuff works. | 12:57 |
openstackgerrit | Sorin Sbarnea proposed opendev/bindep master: Expose base python version as an atom https://review.opendev.org/639951 | 12:58 |
AJaeger | efried: variables are global and not specific to a role | 12:58 |
*** dougsz has quit IRC | 12:59 | |
efried | so I guess that role must be specified somewhere in the job's ancestry. | 12:59 |
AJaeger | yes | 13:00 |
efried | cool | 13:00 |
*** dougsz has joined #openstack-infra | 13:03 | |
*** eharney has joined #openstack-infra | 13:07 | |
*** janki has quit IRC | 13:20 | |
*** beekneemech is now known as bnemec | 13:26 | |
*** dklyle has joined #openstack-infra | 13:29 | |
*** ociuhandu has joined #openstack-infra | 13:30 | |
*** ociuhandu has quit IRC | 13:34 | |
*** xenos76 has quit IRC | 13:34 | |
*** ykarel|away has joined #openstack-infra | 13:43 | |
zbr | AJaeger: can you please help with https://review.opendev.org/#/c/667614/ ? is blocking other changes. | 13:44 |
AJaeger | zbr: I'm not a core on bindep | 13:44 |
AJaeger | infra-root, could you review https://review.opendev.org/667614 , https://review.opendev.org/667533 , https://review.opendev.org/667694 on bindep, please? | 13:45 |
zbr | clarkb: mordred fungi ^ | 13:45 |
*** ykarel|away is now known as ykarel | 13:45 | |
zbr | thanks. good idea to use the magic word | 13:45 |
openstackgerrit | Sorin Sbarnea proposed opendev/bindep master: Expose base python version as an atom https://review.opendev.org/639951 | 13:46 |
openstackgerrit | Merged opendev/base-jobs master: Remove log_url from emit-job-header https://review.opendev.org/676756 | 13:46 |
openstackgerrit | Sorin Sbarnea proposed opendev/bindep master: Fix tox python3 overrides https://review.opendev.org/605613 | 13:49 |
*** jeliu_ has joined #openstack-infra | 13:57 | |
*** ociuhandu has joined #openstack-infra | 13:58 | |
fungi | sshnaidm: chkumar|rover: i was merely parroting what clarkb had guessed earlier. except then he later checked a centos-7 node in limestone and it had working ipv4 configured so we need to try and debug an actual failure there (it may be the job is doing something to the v4 config, or it may be that only some nodes booted there have this problem) | 13:59 |
sshnaidm | fungi, maybe we can hold one and debug there | 14:00 |
fungi | aspiers: AJaeger: the issue seems to be that when you're in file browse mode gitea wants to look up most recent commit info for each file it's displaying, and the larger the repository the longer that ends up taking. corvus suggested some ways that interested folks can help gitea upstream to make that more efficient but so far there have been no takers | 14:01 |
*** ociuhandu has quit IRC | 14:03 | |
openstackgerrit | James E. Blair proposed opendev/system-config master: Run a gerrit container (test only) https://review.opendev.org/630406 | 14:04 |
fungi | aspiers: i also suggested gitea add support for abbreviated commit ids in urls: https://github.com/go-gitea/gitea/issues/6450 | 14:05 |
*** dougsz has quit IRC | 14:05 | |
AJaeger | frickler: "ERROR: InterpreterNotFound: pypy" ;( | 14:07 |
AJaeger | frickler: we should probably install pypy for that job | 14:08 |
*** pkopec has quit IRC | 14:08 | |
frickler | AJaeger: hmm, it looked like it was passing on other patches, but maybe they were even older | 14:08 |
AJaeger | frickler: we removed the default bindep-fallback file that was installing pypy everywhere | 14:09 |
*** pkopec has joined #openstack-infra | 14:09 | |
AJaeger | so, now it's not installed anymore | 14:09 |
AJaeger | frickler: want to +2A 667614 now? | 14:11 |
AJaeger | We can add pypy to the bindep.txt file of bindep | 14:11 |
*** ociuhandu has joined #openstack-infra | 14:11 | |
*** xenos76 has joined #openstack-infra | 14:15 | |
*** pkopec has quit IRC | 14:15 | |
*** sshnaidm is now known as sshnaidm|bbl | 14:16 | |
frickler | AJaeger: I agree that the pypy fix is likely independent of the current stack of patches. I still don't feel confident enough on bindep to +A those, rather wait for someone else to give another look at it | 14:17 |
AJaeger | fungi, it's up to you ;) could you review https://review.opendev.org/667614 , https://review.opendev.org/667533 , https://review.opendev.org/667694 on bindep, please? | 14:18 |
*** dpawlik has quit IRC | 14:20 | |
*** pkopec has joined #openstack-infra | 14:20 | |
*** eharney_ has joined #openstack-infra | 14:25 | |
*** dougsz has joined #openstack-infra | 14:26 | |
*** eharney has quit IRC | 14:26 | |
openstackgerrit | Andreas Jaeger proposed opendev/bindep master: Add bindep.txt for pypy https://review.opendev.org/677216 | 14:29 |
AJaeger | frickler: here's the proposed fix ^ | 14:29 |
AJaeger | config-core, could you review https://review.opendev.org/677162 - to add legacy openSUSE 15 nodeset, please? | 14:31 |
*** yolanda has quit IRC | 14:31 | |
*** SpamapS has quit IRC | 14:31 | |
*** yolanda has joined #openstack-infra | 14:36 | |
aspiers | fungi: nice, thanks! | 14:37 |
AJaeger | fungi, I'll add pinning for hacking... | 14:37 |
fungi | AJaeger: see latest comment i just added | 14:38 |
fungi | tox minversion needs to be set for the basepython conflict option | 14:38 |
AJaeger | will add as well... | 14:38 |
openstackgerrit | Andreas Jaeger proposed opendev/bindep master: Some cleanups https://review.opendev.org/677220 | 14:40 |
AJaeger | fungi: ^ | 14:40 |
AJaeger | and thanks for reviewing! | 14:40 |
fungi | thanks for the patches! | 14:40 |
AJaeger | fungi: I'll rebase - one more change... | 14:41 |
fungi | cool, i'll still go through the others too | 14:41 |
AJaeger | Argh, misread - all fine... | 14:41 |
AJaeger | we can merge 677220 as is | 14:41 |
*** markvoelker has quit IRC | 14:42 | |
openstackgerrit | Andreas Jaeger proposed opendev/bindep master: Replace Trusty with Bionic in the testing https://review.opendev.org/667694 | 14:44 |
AJaeger | config-core, two reviews for switching openstack-tox-docs to a promote job: https://review.opendev.org/677008 - promote jobs for tox-docs | 14:47 |
AJaeger | and https://review.opendev.org/677009 , please | 14:48 |
*** noorul has joined #openstack-infra | 14:48 | |
*** sgw has joined #openstack-infra | 14:51 | |
clarkb | AJaeger: I've +2'd the bindep followon changes if fungi can review those I think that will be done | 14:55 |
clarkb | now looking at tox-docs | 14:55 |
AJaeger | thanks, clarkb - and good morning | 14:55 |
*** armax has joined #openstack-infra | 14:55 | |
AJaeger | clarkb: and https://review.opendev.org/677162 as well, please | 14:55 |
openstackgerrit | Merged opendev/bindep master: Use Python 3.x by default for testing https://review.opendev.org/667614 | 14:56 |
*** ykarel has quit IRC | 14:57 | |
openstackgerrit | Javier Peña proposed opendev/puppet-openstackci master: Add AFS mirror support for RHEL/CentOS https://review.opendev.org/528739 | 14:57 |
AJaeger | two more bindep changes for review, please https://review.opendev.org/667694 and https://review.opendev.org/622325 | 14:58 |
*** SpamapS has joined #openstack-infra | 14:59 | |
*** josephrsandoval has joined #openstack-infra | 15:01 | |
*** pkopec has quit IRC | 15:02 | |
*** markvoelker has joined #openstack-infra | 15:03 | |
*** jaosorior has quit IRC | 15:04 | |
openstackgerrit | Merged openstack/openstack-zuul-jobs master: Add legacy-opensuse-15 nodeset https://review.opendev.org/677162 | 15:05 |
*** xenos76 has quit IRC | 15:09 | |
openstackgerrit | Merged opendev/bindep master: Switch to opensuse-15 nodeset for bindep testing https://review.opendev.org/667533 | 15:09 |
*** xenos76 has joined #openstack-infra | 15:11 | |
*** pkopec has joined #openstack-infra | 15:12 | |
zbr | fungi: another bindep https://review.opendev.org/#/c/667694/ | 15:14 |
fungi | yep, that's in the list of changes i was already looking at | 15:15 |
clarkb | infra-root https://review.opendev.org/#/c/677122/2 the stack that ends there should hopefully get us some indexed logs again (and we can continue to make incremental improvements as described in my comment in that change) | 15:16 |
clarkb | I can work on that change in a bit too | 15:16 |
*** jamesmcarthur has joined #openstack-infra | 15:19 | |
*** eharney_ is now known as eharney | 15:20 | |
fungi | AJaeger: was there a change to add pypy to bindep's bindep.txt test profile? | 15:21 |
fungi | or was that merely discussed as an option | 15:21 |
*** chkumar|rover is now known as raukadah | 15:24 | |
openstackgerrit | Clark Boylan proposed opendev/puppet-log_processor master: Don't try to get .gz suffixed files in addition to base url https://review.opendev.org/677236 | 15:26 |
clarkb | corvus: ^ You may want to double check my assertion in that one | 15:26 |
AJaeger | fungi: https://review.opendev.org/677216 adds pypy | 15:27 |
openstackgerrit | Merged opendev/bindep master: Replace Trusty with Bionic in the testing https://review.opendev.org/667694 | 15:27 |
corvus | clarkb: correct | 15:28 |
fungi | thanks AJaeger! | 15:28 |
*** noorul has quit IRC | 15:31 | |
*** noorul has joined #openstack-infra | 15:31 | |
*** factor has quit IRC | 15:32 | |
*** factor has joined #openstack-infra | 15:32 | |
*** kjackal has quit IRC | 15:32 | |
AJaeger | corvus, fungi, https://review.opendev.org/677008 adds a promote job for openstack-tox-docs - could I get a +2A on it, please? | 15:33 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: trigger: add job filter event https://review.opendev.org/639905 | 15:37 |
*** ociuhandu has quit IRC | 15:37 | |
*** gyee has joined #openstack-infra | 15:38 | |
AJaeger | thanks, corvus ! | 15:39 |
openstackgerrit | Merged opendev/puppet-log_processor master: log-gearman-worker: handle deflate encoded values https://review.opendev.org/677104 | 15:39 |
openstackgerrit | Merged opendev/puppet-log_processor master: log-gearman-worker: Remove jenkins streaming workaround https://review.opendev.org/677105 | 15:39 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: webtrigger: add initial driver and event https://review.opendev.org/555153 | 15:40 |
*** factor has quit IRC | 15:40 | |
*** factor has joined #openstack-infra | 15:41 | |
openstackgerrit | Merged openstack/project-config master: Add promote-openstack-tox-docs https://review.opendev.org/677008 | 15:44 |
AJaeger | fungi, commented on the promote job - see https://opendev.org/opendev/base-jobs/src/branch/master/zuul.d/secrets.yaml#L52, this is how the job is defined. | 15:44 |
fungi | yep | 15:45 |
fungi | thanks | 15:45 |
fungi | it makes sense that we'd tightly-couple path data with the credentials in this case, just odd more generally to see file paths as part of the credential set | 15:46 |
AJaeger | it's this way far easier to configure - compared to a separate playbook that we use now for publish jobs | 15:46 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: webtrigger: add web route and rpclistener https://review.opendev.org/554839 | 15:46 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: web: add build button to trigger job https://review.opendev.org/635716 | 15:46 |
*** mattw4 has joined #openstack-infra | 15:48 | |
*** ociuhandu has joined #openstack-infra | 15:51 | |
*** josephrsandoval has quit IRC | 15:52 | |
openstackgerrit | Merged opendev/puppet-log_processor master: log-gearman-worker: remove obsolete GET debug filter, add local filter https://review.opendev.org/677122 | 15:53 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Remove now obsolete publish-jobs https://review.opendev.org/677013 | 15:55 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Avoid duplication of secret https://review.opendev.org/677016 | 15:55 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Avoid duplication of secret https://review.opendev.org/677016 | 15:58 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Render the logfile under the manifest https://review.opendev.org/676843 | 16:00 |
*** sgw has quit IRC | 16:02 | |
openstackgerrit | Merged opendev/bindep master: Change openstack-dev to openstack-discuss https://review.opendev.org/622325 | 16:02 |
openstackgerrit | Merged opendev/bindep master: Add bindep.txt for pypy https://review.opendev.org/677216 | 16:02 |
openstackgerrit | Merged opendev/bindep master: Some cleanups https://review.opendev.org/677220 | 16:02 |
AJaeger | config-core, want to switch openstack-tox-docs to promote ? https://review.opendev.org/677009 is ready now... | 16:03 |
*** gfidente has quit IRC | 16:03 | |
mriedem | clarkb: efried was asking why we aren't getting some hits from the console log here, | 16:03 |
mriedem | https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_27/656027/25/check/openstack-tox-py37/369bc38/ | 16:03 |
mriedem | and i think it's probably b/c the console log is huge, 222MB | 16:04 |
mriedem | i don't see post-run log compression/publish failures, but guessing we're hitting issues indexing that console log? | 16:04 |
mriedem | oh ha | 16:04 |
mriedem | Delay in Elastic Search: Indexing behind by 94 hours | 16:04 |
mriedem | http://status.openstack.org/elastic-recheck/ | 16:04 |
clarkb | mriedem: we stopped indexing things with the switch to swift hosted logs (due to assumptions/bugs in the indexing pipeline), we are pushing up fixes for that now. That said yes huge console logs like that will be a problem too | 16:04 |
clarkb | mriedem: https://review.opendev.org/#/c/677236/1 is the end of the stack | 16:05 |
mriedem | clarkb: is there a reason the job-output.txt isn't compressed? | 16:05 |
clarkb | mriedem: it is compressed in transit (and I think in storage) but shown to consumers uncompressed | 16:06 |
mriedem | ah ok | 16:06 |
mriedem | hence the 'deflate' patch at the bottom of that series? | 16:06 |
*** rpittau is now known as rpittau|afk | 16:07 | |
clarkb | mriedem: for example if you wget the file you get a zlib formatted file (which is annoying because decompressing it is a pain due to missing magic number for gzip) | 16:07 |
corvus | yeah, we're using content-encoding to do it (and yes, transit and storage) | 16:07 |
clarkb | mriedem: so if you wget it you'll need to use python zlib.decompress or add a gzip magic number or some other method of decompressing locally | 16:07 |
efried | okay, so I'm hearing this was a) bad timing, but also b) big logs are a problem -- because why? | 16:07 |
*** icarusfactor has joined #openstack-infra | 16:07 | |
*** tdasilva has quit IRC | 16:08 | |
efried | I mean, I get that hundreds of megs of log will add up very quickly. | 16:08 |
clarkb | mriedem: maybe what I need is an alias for gzip called zlib that prepends the magic number then calls gzip | 16:08 |
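Editor's sketch of the local decompression clarkb describes, assuming the file fetched with wget is a raw zlib (deflate content-encoding) payload. The function name `inflate` is hypothetical; `wbits=zlib.MAX_WBITS | 32` asks zlib to accept either a zlib or a gzip header, so the same call also covers a real .gz file.

```python
import zlib


def inflate(data: bytes) -> bytes:
    """Decompress a zlib- or gzip-wrapped payload held in memory.

    MAX_WBITS | 32 enables automatic header detection, so no gzip
    magic number has to be prepended by hand.
    """
    return zlib.decompress(data, zlib.MAX_WBITS | 32)


def inflate_file(src_path: str, dst_path: str, chunk_size: int = 64 * 1024) -> None:
    """Stream-decompress a large file so it never sits in memory whole."""
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 32)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            dst.write(decompressor.decompress(chunk))
        dst.write(decompressor.flush())
```

The streaming variant is probably the safer choice for something like the 222MB console log mentioned above.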
efried | Does e-r literally copy the logs somewhere? Just to index them or does it keep them? | 16:09 |
clarkb | efried: they take longer to index, use more disk (the index size is much larger than the input size), they are not easily consumed by humans (try opening that file in your browser, it will probably blow up), but most importantly you get enough of those in the pipeline at once and things really start to get unhappy particularly around memory use | 16:09 |
*** factor has quit IRC | 16:09 | |
mriedem | efried: e-r is just a front-end to the infra elasticsearch cluster that runs queries for known bugs | 16:10 |
clarkb | efried: the basic process is: a job finishes and uploads to "long term" log storage (aka swift now). Then zuul sends notices to a gearman client about which log files were just uploaded. That client checks those files against a list of things it knows it cares about and for matches it submits gearman jobs to index the log files. This then creates one gearman job per file which runs on a gearman worker | 16:10 |
clarkb | which fetches the log file, processes it with logstash, then indexes it in elasticsearch | 16:10 |
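Editor's sketch of the selection step clarkb outlines: match the uploaded file names against a list of interesting patterns and emit one job per matching file. The pattern list, the job dictionary layout, and the name `select_index_jobs` are all hypothetical; the real client and its configuration are not shown in this log.

```python
from fnmatch import fnmatch

# Hypothetical patterns, for illustration only.
INTERESTING_PATTERNS = ["job-output.txt", "logs/screen-*.txt", "logs/syslog.txt"]


def select_index_jobs(uploaded_files, patterns=INTERESTING_PATTERNS):
    """Return one index-job description per uploaded file matching a pattern."""
    jobs = []
    for name in uploaded_files:
        if any(fnmatch(name, pattern) for pattern in patterns):
            # One job per file, so each worker handles exactly one log.
            jobs.append({"task": "index-log", "file": name})
    return jobs
```

One job per file keeps the work units independent, which is also why a handful of very large files in flight at once can strain the workers.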
*** jamesmcarthur has quit IRC | 16:11 | |
clarkb | we have to filter out all debug logs to make that fit into elasticsearch and even then still only fit 10 days | 16:11 |
*** jpena is now known as jpena|off | 16:11 | |
mriedem | *and* for console logs it's all lines | 16:11 |
clarkb | the main reason to not want a 200MB console log though is a human will have a hard time interacting with it. It's just noise | 16:11 |
mriedem | for screen logs it's only INFO+ | 16:11 |
efried | yup, all that makes sense. | 16:12 |
efried | So is it worth trying to "solve" (or at least band-aid) the problem when we have a situation where logs blow way up like this? | 16:13 |
efried | Like maybe taking the tail xxMB of the log so we would at least have *something*? | 16:13 |
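Editor's sketch of efried's "tail xxMB" idea, assuming the log is available as a seekable binary stream. The helper name `tail_bytes` is hypothetical and nothing in this log says such a step exists in the indexing pipeline.

```python
import io
import os


def tail_bytes(stream, max_bytes: int) -> bytes:
    """Return at most the last max_bytes of a seekable binary stream,
    trimmed so the result starts at the beginning of a line."""
    size = stream.seek(0, os.SEEK_END)
    if size <= max_bytes:
        stream.seek(0)
        return stream.read()
    # Step back one extra byte: if it is a newline, readline() discards
    # only that byte; otherwise it discards the partial first line.
    stream.seek(size - max_bytes - 1)
    stream.readline()
    return stream.read()


sample = io.BytesIO(b"one\ntwo\nthree\n")
print(tail_bytes(sample, 8))  # b'three\n'
```

The same function works on a file opened with `open(path, "rb")`.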
clarkb | ime the problem is that projects like to warn about things and warn all the time about them | 16:13 |
clarkb | rather than use the warn once functionality in the warnings lib | 16:14 |
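Editor's sketch of the "warn once" behaviour in Python's standard `warnings` module that clarkb refers to. The helper `count_emitted` and the warning messages are hypothetical, used only to count how many warnings actually get through under each filter action.

```python
import warnings


def count_emitted(action: str, message: str, repeats: int) -> int:
    """Emit the same warning `repeats` times under the given filter
    action and return how many were actually shown."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        for _ in range(repeats):
            warnings.warn(message, DeprecationWarning)
    return len(caught)


# "always" shows every occurrence; "once" shows a given message a single time.
print(count_emitted("always", "option foo is deprecated", 50))
print(count_emitted("once", "option bar is deprecated", 50))
```

With the "once" action a test suite of thousands of tests would log each distinct warning a single time instead of once per test.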
*** mattw4 has quit IRC | 16:14 | |
mriedem | which we've used to reduce the noise that causes the subunit parser fail bu | 16:14 |
mriedem | *bug | 16:14 |
*** mattw4 has joined #openstack-infra | 16:14 | |
mriedem | this kombu thing is new so we don't have a filter in place for it | 16:14 |
clarkb | fwiw in the past testr kept all that output out for stdout/stderr | 16:14 |
clarkb | s/out for/out of/ | 16:15 |
mriedem | still does, | 16:15 |
clarkb | mriedem: no stestr prints all test names and stderr | 16:15 |
mriedem | the problem is most of the tests fail and dump the output | 16:15 |
clarkb | (for some projects there is a lot of stderr) | 16:15 |
efried | right, it was something like 12k unit tests | 16:15 |
clarkb | efried: mriedem in that case maybe we have stestr limit failure output to like the first 100 failures? | 16:16 |
clarkb | I could see that being useful locally too | 16:16 |
zbr | fungi: another bindep https://review.opendev.org/#/c/668740/1 | 16:16 |
efried | clarkb: that would also work | 16:17 |
*** lucasagomes has quit IRC | 16:18 | |
*** igordc has joined #openstack-infra | 16:18 | |
clarkb | 100 failures is a lot more manageable than 12k and in the case of 12k failures you likely need only fix one or two issues that have widespread impact caught by those 100 | 16:22 |
openstackgerrit | Sorin Sbarnea proposed opendev/bindep master: Expose base python version as an atom https://review.opendev.org/639951 | 16:25 |
openstackgerrit | Sorin Sbarnea proposed opendev/bindep master: Fix emerge testcases https://review.opendev.org/460217 | 16:26 |
*** mattw4 has quit IRC | 16:28 | |
*** tesseract has quit IRC | 16:35 | |
*** aaronsheffield has joined #openstack-infra | 16:38 | |
*** ramishra has quit IRC | 16:42 | |
*** smarcet has joined #openstack-infra | 16:42 | |
*** tdasilva has joined #openstack-infra | 16:45 | |
zbr | and finally after dealing with ten other changes, i get my py[23] atom ready again https://review.opendev.org/#/c/639951/ | 16:45 |
zbr | i also added support for debian and ubuntu, and documented all platforms supporting the new atoms. | 16:46 |
*** smarcet has left #openstack-infra | 16:47 | |
*** smarcet has joined #openstack-infra | 16:55 | |
*** jamesmcarthur has joined #openstack-infra | 16:56 | |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Include ref info on stmp reporter subjects https://review.opendev.org/677254 | 16:57 |
clarkb | fungi: corvus mgoddard ^ something like that | 16:58 |
corvus | stmp would have been a great name for the email protocol. you could pronounce it 'stamp' | 16:58 |
clarkb | ugh I fixed at least one of those typos while writing that commit message. Must've missed another :) | 16:59 |
*** smarcet has quit IRC | 16:59 | |
clarkb | corvus: then you could have stamp collectors | 17:00 |
corvus | way better than MTAs | 17:00 |
fungi | or philatelist | 17:00 |
mgoddard | clarkb: assume this is for branches in stable-maint & other emails? Looks reasonable to me | 17:00 |
clarkb | mgoddard: ya | 17:01 |
fungi | mgoddard: so having it in the subject instead of the url will work for you? | 17:01 |
fungi | seems like a better place for metadata anyway | 17:02 |
clarkb | if that works well I'll probably submit a change to zuul to have the default subject include that info too | 17:03 |
*** derekh has quit IRC | 17:03 | |
clarkb | (we override anyway so figured I'd start with our overrides) | 17:03 |
*** tdasilva has quit IRC | 17:05 | |
*** ijw has joined #openstack-infra | 17:05 | |
*** pkopec has quit IRC | 17:06 | |
*** ijw has quit IRC | 17:06 | |
*** ijw has joined #openstack-infra | 17:07 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Change branch variable in PR https://review.opendev.org/677093 | 17:09 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Change PR url https://review.opendev.org/677257 | 17:09 |
*** e0ne has quit IRC | 17:10 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Render the logfile under the manifest https://review.opendev.org/676843 | 17:13 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Change PR url to point to the PR not the Repo https://review.opendev.org/677257 | 17:13 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Change branch variable in PR https://review.opendev.org/677093 | 17:13 |
*** jamesmcarthur has quit IRC | 17:15 | |
*** ociuhandu_ has joined #openstack-infra | 17:16 | |
*** ralonsoh has quit IRC | 17:16 | |
*** jamesmcarthur has joined #openstack-infra | 17:17 | |
*** ociuhandu has quit IRC | 17:19 | |
openstackgerrit | Merged openstack/project-config master: Include ref info on stmp reporter subjects https://review.opendev.org/677254 | 17:19 |
*** ociuhandu_ has quit IRC | 17:21 | |
*** jamesmcarthur has quit IRC | 17:22 | |
fungi | the STaMP protocol will forever live on thanks to the project-config repo's git log | 17:24 |
*** ociuhandu has joined #openstack-infra | 17:24 | |
openstackgerrit | Clark Boylan proposed opendev/puppet-log_processor master: Fix systemd severity filter input data https://review.opendev.org/677260 | 17:27 |
clarkb | infra-root I think ^ will get us to a mostly working spot with log indexing. I have log worker A on worker02 running that | 17:28 |
*** ociuhandu has quit IRC | 17:29 | |
fungi | so 677236 was a dead-end? | 17:30 |
clarkb | fungi: no I think we need to update the job submitter role to not remove the .gz suffix then we can merge 677236 | 17:30 |
clarkb | I was going to look at that next | 17:31 |
*** ricolin has quit IRC | 17:31 | |
fungi | ahh | 17:31 |
fungi | i couldn't tell whether your comment meant that it wasn't needed, or was simply incomplete | 17:31 |
clarkb | the job submitter on the zuul side must be preserving old behavior | 17:33 |
*** kopecmartin is now known as kopecmartin|off | 17:33 | |
clarkb | That isn't necessary anymore with the canonical urls all being correct relative to swift now | 17:33 |
clarkb | (before we would pretend the ungzipped file was a thing) | 17:33 |
*** mattw4 has joined #openstack-infra | 17:33 | |
*** sthussey has joined #openstack-infra | 17:43 | |
*** bhavikdbavishi1 has joined #openstack-infra | 17:52 | |
*** icarusfactor has quit IRC | 17:53 | |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Stop treating .gz files as special in log handling https://review.opendev.org/677265 | 17:53 |
openstackgerrit | Clark Boylan proposed opendev/puppet-log_processor master: Don't try to get .gz suffixed files in addition to base url https://review.opendev.org/677236 | 17:53 |
*** bhavikdbavishi has quit IRC | 17:53 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 17:53 | |
clarkb | corvus: fungi: ^ that should do it | 17:54 |
*** jamesmcarthur has joined #openstack-infra | 17:54 | |
clarkb | I'm going to pop out for a bike ride now but will be back to help shepherd in the logstash worker stuff (note those changes will be applied to the server files but services are not automatically restarted so we can safely approve them all and wait for updates then have ansible restart things when ready) | 17:59 |
*** jamesmcarthur has quit IRC | 17:59 | |
clarkb | also I suppose I should start to try and reproduce the centos7 network trouble too | 17:59 |
clarkb | but all that in a bit | 17:59 |
*** jamesmcarthur has joined #openstack-infra | 18:00 | |
*** jamesmcarthur has quit IRC | 18:05 | |
corvus | clarkb: is 677265 going to mess up e-r queries which include filenames? | 18:10 |
corvus | clarkb: (ie, are we going to start having filenames with .gz in ES ?) | 18:10 |
corvus | clarkb: (at the same time there are e-r queries for "filename:n-api.log" or somesuch) | 18:11 |
clarkb | hrm ya probably | 18:14 |
clarkb | maybe we wait on that change for now and stabilize first then coordinate with e-r to update queries? | 18:15 |
*** michael-beaver has joined #openstack-infra | 18:17 | |
*** e0ne has joined #openstack-infra | 18:17 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Change PR url to point to the PR not the Repo https://review.opendev.org/677257 | 18:18 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Change branch variable in PR https://review.opendev.org/677093 | 18:18 |
*** noorul has quit IRC | 18:21 | |
*** jamesmcarthur has joined #openstack-infra | 18:23 | |
corvus | efried: here's a more polished version of the 'logfiles under manifest' change if you want to try it out: https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_43/676843/4/check/zuul-build-dashboard/6a10611/npm/html/ | 18:33 |
efried | ... | 18:34 |
efried | corvus: Not sure if this is related to just your test setup, but browser Back button is giving me 404 every time I use it (which is because I f'ed up a nav and actually want to go Back and try again) | 18:35 |
*** SpamapS has quit IRC | 18:35 | |
efried | corvus: refinement: only when I f up by clicking through to one of the blue files that takes me out of the dashboard. | 18:37 |
corvus | efried: ah, yep, that's the test setup; will work in prod | 18:37 |
efried | corvus: I'm having trouble finding a build that has the black file names (is that still how I'm supposed to find an artifact that does this thing?) | 18:37 |
corvus | efried: also, i have some ui tweaks in mind to make it less likely to f up in that situation | 18:37 |
*** strigazi has quit IRC | 18:37 | |
*** strigazi has joined #openstack-infra | 18:38 | |
efried | well, presumably once you have all the files doing the in-app rendering it would be n/a. | 18:38 |
corvus | efried: oh, sorry, you no longer need to look for black filenames, it's back to the current behavior (they look like links) | 18:38 |
efried | oh, then I'm definitely having trouble finding one that "works". Every blue link I've tried so far has taken me to a new page for that file. | 18:38 |
efried | want to walk me through one? | 18:38 |
corvus | efried: an easy one in every build should be 'job-output.txt' | 18:39 |
corvus | that can always render in-app | 18:39 |
efried | got it | 18:40 |
efried | yes, seems to work good | 18:40 |
fungi | oof, i picked one at random which i thought would be small because it was just a unit test job, and the job-output.txt took a while to load because... turns out it's >5k lines long | 18:42 |
fungi | i wonder if we can include some indication of file size/length? | 18:42 |
fungi | that may influence the choice for someone to click through to the rendered version vs the raw link | 18:43 |
corvus | fungi: yes, we have the file size in bytes, we can add that to the tree view | 18:46 |
fungi | changing the level filter on a long log takes a while too... does it re-download the logfile when you do that, or is it really just taking that long for my browser to analyze the loglevel for every one of these 5k+ lines? | 18:46 |
corvus | the second thing | 18:46 |
fungi | makes sense | 18:46 |
*** tdasilva has joined #openstack-infra | 18:46 | |
fungi | oh, hah, and now i see my folly. i thought i was picking a tox job, but in actuality i mis-selected a tripleo-ci job instead :/ | 18:47 |
corvus | there may be more that can be done to optimize it, however, you'd be surprised how fast our js that changes everything about that actually runs. it's really quite fast. but then after we return our final result to react, something happens which takes a long time. i don't have the react/js knowledge to know what that is, and if there's something we could do that would help it along. | 18:47 |
*** ociuhandu has joined #openstack-infra | 18:48 | |
efried | Let's get a progress meter | 18:50 |
fungi | oh, for when content download/analysis is occurring/. | 18:50 |
fungi | ? | 18:50 |
efried | Loading |----- | 120 of 259Kb | 18:50 |
efried | yeah | 18:50 |
* efried says glibly, as if it's an easy thing to do. | 18:50 | |
*** sgw has joined #openstack-infra | 18:51 | |
tdasilva | I'm wondering if it's possible to pass a list of tags to `opendev-promote-docker-image` gate job. I'm starting to investigate if it's possible to build py2 and py3 swift docker images for a given patch, but I think the last part is where I need the promote job to tag an image something else besides "latest" | 18:51 |
tdasilva | have you guys run into this at all for zuul? | 18:51 |
fungi | efried: i think even just something which indicates the thing you clicked is still loading, even without indication of progress, would help | 18:52 |
efried | agreed ^ | 18:52 |
efried | though for my cpu cycles, I would just as soon have a static "Loading..." message as a fancy spinner. But I know I'm an old codger where UI is concerned. | 18:53 |
*** ociuhandu has quit IRC | 18:53 | |
corvus | tdasilva: yes, you can specify tags (we only use latest for zuul, but i think i have another example handy; one sec) | 18:55 |
fungi | https://hub.docker.com/r/zuul/zuul/tags also shows a change tag in addition to latest | 18:55 |
corvus | fungi, efried: there should be a spinner in the top right when it's downloading a file; however, you may have scrolled it out of view after looking at the first file | 18:55 |
corvus | (so i think we need something bigger that maybe overlays over the file contents area or something) | 18:55 |
efried | corvus: or have the spinner follow | 18:56 |
efried | there's some kind of css thingy to say "keep it in coordinates x,y of the visible window", nah? | 18:56 |
fungi | corvus: oh the irony, the top-right of my browser monitor is obscured because it's leaned up against a wall waiting for me to send the one i would be using out for repair | 18:56 |
efried | hahahahahahahaha | 18:56 |
corvus | fungi: i, uh, did not plan for that in my ui design. i am a failure. | 18:57 |
efried | actual lol | 18:57 |
efried | that's going in your permanent record | 18:57 |
fungi | indeed | 18:57 |
efried | while you're fixing that one, you may as well include "hippie hair obscures field of vision" | 18:58 |
* efried growing hair out, will also need ^ | 18:58 | |
corvus | efried: oh, i got that one covered; i don't think i've had a haircut since we merged the zuulv3 spec | 18:58 |
fungi | well, that technical difficulty aside, i think my browser needs a restart because it's being really slow to register clicks on these links and i'm starting to suspect it's memory problems not the demo site | 18:59 |
corvus | fungi: yeah, this will eat memory. it will cache the data from each file you click on in memory, to make it easier to switch back and forth between them. it should clear that if you close the tab or switch to a different build. i don't know how effective that is in practice. | 19:00 |
corvus | tdasilva: here we go: https://opendev.org/opendev/system-config/src/branch/master/.zuul.yaml#L172-L173 | 19:00 |
fungi | also possible switching builds isn't actually clearing it because i used the same tab to switch from the giant tripleo-ci console log to a different build | 19:00 |
corvus | tdasilva: that will tag it as 2.13 *instead of* latest (iow, the default for "tags" is ['latest']) | 19:01 |
*** bhavikdbavishi has quit IRC | 19:01 | |
*** SpamapS has joined #openstack-infra | 19:01 | |
corvus | fungi: it should happen even if you switch builds in the same tab | 19:02 |
fungi | yep, also possible it did that correctly and my browser was already really bloated on memory utilization | 19:02 |
*** dougsz has quit IRC | 19:03 | |
tdasilva | corvus: interesting that I'd expect that to be in the promote job and not the build job. line 186 seems to infer that the promote job would change the tag from 2.13 to latest? sorry if i'm missing something | 19:04 |
fungi | yeah, after a browser restart i pulled up a <1k-line job-output.txt for a failed keystone pep8 build and it can switch filter levels in less than a second | 19:05 |
AJaeger | config-core, do we want to switch openstack-tox-docs to promote publishing? Then, please review https://review.opendev.org/677009 and https://review.opendev.org/677013 | 19:07 |
tdasilva | oh, interesting, I guess I'd have an image with tags py2 *and* latest and then another image for the same patch with just a py3 tag | 19:07 |
corvus | tdasilva: all 3 jobs take the same data structure (note we use yaml anchors there to copy it to the upload and promote jobs). technically i think the build job ignores it, but the upload job does use it -- it forms part of the metadata that get put into zuul so the promote job gets the right images. | 19:07 |
*** jamesmcarthur has quit IRC | 19:08 | |
clarkb | AJaeger: all +2 from me | 19:08 |
corvus | i highly recommend using the same data for all 3 jobs | 19:08 |
clarkb | corvus: have a moment for https://review.opendev.org/#/c/677260/1 ? I'll restart all the workers once that is landed and on the nodes | 19:09 |
tdasilva | corvus: TIL yaml anchors :) thanks! | 19:09 |
AJaeger | thanks, clarkb | 19:11 |
corvus | tdasilva: they're scoped to the individual file in zuul (so nothing outside of that single file will see them). so far, that's been ideal :) i think that's the only thing to be aware of. otherwise, go crazy :) | 19:11 |
tdasilva | corvus: thanks! while we are on this subject, we last also talked about the idea of tagging releases. is that possible? | 19:13 |
tdasilva | corvus: in this case it would be a one time only tag for when we tag a swift release | 19:15 |
AJaeger | clarkb: do you want https://review.opendev.org/677265 to merge or not? I wasn't sure from backlog and removed my +A but left a +2. Please self-approve once ready | 19:18 |
clarkb | AJaeger: ya I think the point corvus made was a good one I will WIP it for now | 19:19 |
*** sshnaidm|bbl is now known as sshnaidm | 19:19 | |
clarkb | mriedem: when you get a chance can you read my comment on https://review.opendev.org/#/c/677265/1 and provide input on how bad that would be for e-r? I don't think it would be too bad but there might be a rough patch while we update things | 19:19 |
*** dougsz has joined #openstack-infra | 19:21 | |
fungi | could do a mass update via sed probably? but all outstanding changes for e-r queries would need fixing too | 19:22 |
clarkb | fungi: ya and some files would need the update and others wouldn't I think | 19:22 |
clarkb | mriedem should have a better sense for that | 19:22 |
*** factor has joined #openstack-infra | 19:22 | |
clarkb | (than me) | 19:22 |
*** tdasilva has quit IRC | 19:23 | |
*** tdasilva has joined #openstack-infra | 19:23 | |
*** eharney has quit IRC | 19:26 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Change PR url to point to the PR not the Repo https://review.opendev.org/677257 | 19:28 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Change branch variable in PR https://review.opendev.org/677093 | 19:28 |
*** mriedem has quit IRC | 19:29 | |
*** mriedem has joined #openstack-infra | 19:30 | |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Store autohold requests in zookeeper https://review.opendev.org/661114 | 19:31 |
mriedem | there aren't really any outstanding e-r changes that aren't mostly abandoned | 19:31 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Include ref info in smtp reporter subjects https://review.opendev.org/677285 | 19:33 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Add caching of autohold requests https://review.opendev.org/663412 | 19:35 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Add autohold-info CLI command https://review.opendev.org/662487 | 19:35 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Record held node IDs with autohold request https://review.opendev.org/662498 | 19:35 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Auto-delete expired autohold requests https://review.opendev.org/663762 | 19:35 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Mark nodes as USED when deleting autohold https://review.opendev.org/664060 | 19:35 |
mriedem | clarkb: so, "this will change the filenames that e-r queries against", do you mean for the tags? | 19:41 |
mriedem | e.g. message:"foo" and tags:"screen-n-cpu.txt"? | 19:41 |
openstackgerrit | Merged opendev/puppet-log_processor master: Fix systemd severity filter input data https://review.opendev.org/677260 | 19:41 |
mriedem | we don't include .gz in the tags, | 19:41 |
mriedem | but we'll need to now? | 19:41 |
*** smarcet has joined #openstack-infra | 19:41 | |
mriedem | will the prefix on the filename still be ignored? e.g. looking at this query: | 19:44 |
mriedem | message:"Unable to update the attachment.: MessagingTimeout" AND tags:"screen-c-api.txt" AND voting:1 | 19:44 |
mriedem | that has hits on filename controller/logs/screen-c-api.txt and logs/screen-c-api.txt | 19:44 |
*** icarusfactor has joined #openstack-infra | 19:44 | |
mriedem | so would tags need to be: tags:"screen-c-api.txt.gz" or also include the controller/logs OR logs/ prefix? | 19:44 |
mriedem | the latter would be pretty annoying, especially for things like grenade jobs that have logs in old/ and new/ | 19:44 |
*** factor has quit IRC | 19:46 | |
clarkb | mriedem: I'd have to double check on tags I think we might continue to remove the .gz there | 19:46 |
clarkb | mriedem: but filename would change | 19:47 |
clarkb | mriedem: the prefixes won't be affected | 19:47 |
clarkb | mriedem: it's all about whether or not we logically treat the file in swift called foo.txt.gz as foo.txt or start referring to it as foo.txt.gz always | 19:47 |
clarkb | mriedem: with the pre swift stuff we had a webserver smart enough to treat those as the same file | 19:47 |
clarkb | but swift would require us to upload the file twice I think (or do two objects) so for swift it is easier to just always refer to it as the foo.txt.gz file | 19:48 |
mriedem | ok we've tried to use tags over filename in queries for a long time so i don't think the impact would be big | 19:48 |
clarkb | ok let me confirm that tags won't be affected | 19:49 |
mriedem | filename:"job-output.txt" -> filename:"job-output.txt.gz" yeah? | 19:52 |
mriedem | what i'd probably do is go through and remove old queries with no hits, make the change and then see what's new that doesn't hit to see if i missed something | 19:53 |
mriedem | there are only 3 non-test queries that use filename and they all use job-output.txt | 19:53 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Add support for building PDFs https://review.opendev.org/664555 | 19:54 |
corvus | tdasilva: sorry, i haven't done any work on tagging releases yet; i agree we should have that :) | 19:56 |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Avoid duplication of secret https://review.opendev.org/677016 | 20:02 |
AJaeger | corvus: so, I don't get the YAML in a state that Zuul is happy about ^ - I'll abandon | 20:04 |
*** smarcet has quit IRC | 20:05 | |
openstackgerrit | Clark Boylan proposed openstack/project-config master: Stop treating .gz files as special in log handling https://review.opendev.org/677265 | 20:07 |
clarkb | corvus: mriedem ^ I think that should largely address things for e-r as that will ensure the tag value remains | 20:07 |
mriedem | i'm working on the e-r side, but finally taking the time to write a script to automatically cleanup old scripts which i've just always done by hand | 20:09 |
mriedem | *old queries | 20:09 |
openstackgerrit | Merged openstack/project-config master: Add ceph/ceph-ansible to untrusted github projects https://review.opendev.org/676402 | 20:09 |
clarkb | and now to debug centos on limestone. I think my plan there is to boot instances until I catch one with no ipv4? might be a bit slow | 20:10 |
*** e0ne has quit IRC | 20:11 | |
openstackgerrit | Matthew Thode proposed openstack/diskimage-builder master: update gentoo systemd profile to 17.1 from 17.0 https://review.opendev.org/677290 | 20:14 |
*** eharney has joined #openstack-infra | 20:14 | |
tdasilva | corvus: no worries, just wanted to double-check | 20:18 |
*** e0ne has joined #openstack-infra | 20:21 | |
clarkb | maybe that was easier than I expected, clarkb-test on limestone seems to be exhibiting the no ipv4 addr behavior | 20:22 |
clarkb | 2607:ff68:100:54:f816:3eff:fea9:c890 if others want to look too | 20:22 |
clarkb | I do not immediately see what may be wrong. syslog claims that it is using dhclient and sysconfig says dhcp should be used | 20:22 |
clarkb | however /var/lib/dhclient shows no leases | 20:22 |
clarkb | and there is no dhclient running | 20:22 |
*** tdasilva has quit IRC | 20:24 | |
clarkb | ianw: ^ maybe you can take a look when your day starts | 20:25 |
*** tdasilva has joined #openstack-infra | 20:25 | |
fungi | /etc/sysconfig/network-scripts/ifcfg-eth0 has BOOTPROTO=dhcp | 20:26 |
fungi | ethernet address on eth0 also matches the HWADDR in there | 20:26 |
fungi | according to /var/log/messages it used dhclient for its dhcp-init via networkmanager | 20:27 |
*** jamesmcarthur has joined #openstack-infra | 20:28 | |
*** xenos76 has quit IRC | 20:29 | |
*** dougsz has quit IRC | 20:29 | |
fungi | do working nodes there have dual-stack eth0 or separate v4 and v6 interfaces? | 20:30 |
clarkb | dual stack eth0 | 20:30 |
fungi | this is certainly bizarre | 20:31 |
fungi | and you'd booted one there over the weekend which got working v4 right? so it's intermittent? | 20:31 |
clarkb | I just ssh'd into a node nodepool had booted to see if it had ipv4 or not and it did have an ipv4 address | 20:31 |
clarkb | pretty sure it is intermittent | 20:31 |
logan- | yeah very intermittent from what ive seen | 20:32 |
*** andreww has quit IRC | 20:33 | |
logan- | i'm afk but when i'm back in 1-2 hours i can work on tracing down dhcp packets and check logs etc | 20:33 |
clarkb | I wonder if this is the inverse of the ipv6 problem we had in fn | 20:33 |
*** xarses has joined #openstack-infra | 20:33 | |
clarkb | basically a race that prevents NM from configuring the ip stack on an interface because it thinks something else is doing it | 20:33 |
fungi | with the systemd->networkmanager->dhclient maze i'm not entirely sure where dhclient is going to log its activities | 20:33 |
clarkb | if ^ is the case I bet it will be happy after manually running dhclient | 20:33 |
clarkb | fungi: I think in the NM log which I don't see messages for but /me double checks journalctl | 20:34 |
fungi | yeah, i did `journalctl -u NetworkManager.service` and it said it would use dhclient but i don't see actual dhclient logging | 20:34 |
clarkb | journalctl -u NetworkManager is where I would expect such things | 20:34 |
fungi | also glean's logs say it ran a couple seconds before networkmanager, so no sequencing issue there i don't think | 20:35 |
clarkb | ya the sequencing issue we had with ipv6 was the kernel configuring the interface because it got an RA then NM deciding it shouldn't configure it | 20:36 |
clarkb | however I don't think ipv4 has any kernel built in stuff that might interfere with NM that way | 20:36 |
fungi | looks like networkmanager doesn't leave dhclient running as a persistent daemon either (or else didn't realize it failed to start?) | 20:36 |
clarkb | fungi: I think it only runs it when it needs a new lease | 20:36 |
fungi | no, kernel doesn't do v4 autoconfiguration, that's entirely userspace | 20:37 |
fungi | /var/lib/dhclient/ is empty and created as part of the image but stat says it was last accessed at 20:21:47 which was ~2.5 minutes after boot | 20:39 |
clarkb | I'm guessing the clues are in the NM logs sequence of events for eth0 | 20:39 |
clarkb | fungi: I ls'ed that dir | 20:39 |
fungi | immediately after boot? | 20:39 |
clarkb | fungi: ya it wasn't long after I did server create that I ssh'd in | 20:39 |
fungi | i did an ls on it too but that doesn't seem to have updated the last access timestamp | 20:39 |
clarkb | I cd'd into it too | 20:40 |
clarkb | not sure if that would change it | 20:40 |
fungi | aha, yeah that'd do it | 20:40 |
fungi | actually, no, that also didn't update it when i tried | 20:40 |
*** jamesmcarthur has quit IRC | 20:42 | |
*** jamesmcarthur has joined #openstack-infra | 20:50 | |
clarkb | finally getting to sorting out a meeting agenda. Do we still need to talk about the afs mirroring status of things? hrm seems fedora last updated ~9 days ago so sounds like it | 20:57 |
*** mattw4 has quit IRC | 21:05 | |
*** mattw4 has joined #openstack-infra | 21:06 | |
*** jamesmcarthur has quit IRC | 21:08 | |
*** DinaBelova has quit IRC | 21:09 | |
*** DinaBelova has joined #openstack-infra | 21:10 | |
*** jamesmcarthur has joined #openstack-infra | 21:10 | |
*** jamesmcarthur has quit IRC | 21:10 | |
*** pkopec has joined #openstack-infra | 21:14 | |
clarkb | did we update our gitea ssh image more recently than the logging change? https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_10/676510/1/check/system-config-run-gitea/4fd8551/gitea99.opendev.org/docker/giteadocker_gitea-ssh_1.txt is an odd failure to have | 21:15 |
clarkb | or wait we actually have to listen on 222 because we use host networking, right? | 21:16 |
clarkb | so that is correct to fail there I think | 21:16 |
*** jamesmcarthur has joined #openstack-infra | 21:17 | |
*** eharney has quit IRC | 21:18 | |
*** jamesmcarthur has quit IRC | 21:22 | |
*** mattw4 has quit IRC | 21:23 | |
*** mattw4 has joined #openstack-infra | 21:24 | |
*** sshnaidm is now known as sshnaidm|afk | 21:24 | |
*** tdasilva has quit IRC | 21:26 | |
*** tdasilva has joined #openstack-infra | 21:26 | |
*** jamesmcarthur has joined #openstack-infra | 21:28 | |
*** jamesmcarthur has quit IRC | 21:29 | |
*** pkopec has quit IRC | 21:30 | |
openstackgerrit | Matt Riedemann proposed opendev/elastic-recheck master: Add script to remove queries for fixed bugs https://review.opendev.org/677302 | 21:31 |
*** kjackal has joined #openstack-infra | 21:33 | |
clarkb | fungi: fwiw my read of the logs that suse did provide are that their secure mail gateway is killing the email when it gets to that point | 21:33 |
clarkb | and our lists server isn't ever getting a connection for that | 21:33 |
clarkb | and then the fact that they mentioned the lack of an MX record makes me wonder if that is a known issue with that mail gateway | 21:34 |
*** markvoelker has quit IRC | 21:36 | |
*** tdasilva has quit IRC | 21:38 | |
*** tdasilva has joined #openstack-infra | 21:38 | |
fungi | possible? if so they ought to fix that | 21:42 |
ianw | clarkb: ok, just gotta sort a few things then will see if i can help on centos; is it still only limestone? | 21:43 |
*** jamesmcarthur has joined #openstack-infra | 21:44 | |
*** jamesmcarthur has quit IRC | 21:45 | |
fungi | seems that way | 21:45 |
fungi | unless nodes with similar problems in other providers are manifesting early enough that they get discarded and rebuilt or jobs requeued | 21:46 |
fungi | afaik we haven't managed to rule that possibility out yet | 21:46 |
fungi | i'm about at a dead-end trying to find where dhcp logging is happening (though this may also mean that the underlying problem is resulting in dhclient never getting invoked) | 21:47 |
fungi | is it normal for /etc/dhcp/dhclient-eth0.conf to have 'send host-name "<hostname>";'? is that even legal (seems like < and > would be non-rfc-compliant in a hostname string) | 21:49 |
fungi | maybe networkmanager does some magic to that and treats it as a replacement string? | 21:49 |
fungi | the dhclient.conf(5) manpage doesn't mention it as a thing, at any rate | 21:52 |
ianw | hrm, for values of "normal" as in i don't think we changed anything ... but i agree it's not right | 21:52 |
ianw | is there a /var/run/nm-dhclient-eth0.conf? | 21:53 |
fungi | there is no /var/run/nm-* at all | 21:54 |
openstackgerrit | Matt Riedemann proposed opendev/elastic-recheck master: Add script to remove queries for fixed bugs https://review.opendev.org/677302 | 21:54 |
openstackgerrit | Matt Riedemann proposed opendev/elastic-recheck master: Remove old queries: 2019-08-19 https://review.opendev.org/677306 | 21:54 |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: remove displayedFile from state https://review.opendev.org/677307 | 21:56 |
*** jamesmcarthur has joined #openstack-infra | 21:58 | |
ianw | ipv4.method: disabled | 21:59 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add Tristan to Zuul Maintainers https://review.opendev.org/677308 | 22:00 |
fungi | ianw: where's that? a recursive grep of /etc doesn't find the string "ipv4.method" | 22:01 |
ianw | fungi: ohhh, config in /etc? yeah right grandpa :) i got that from ... nmcli c sh 704bbd46-f652-4a65-94f9-9441e1e7bdb7 ... yeah i don't exactly know *why* it's decided that :/ | 22:02 |
fungi | there are days i wish we could go back to expecting to find service and system configuration in plain files within /etc... lately those days are every day | 22:02 |
clarkb | ianw: hrm that was similar to ipv6 and it had decided to leave it alone because kernel had configured things under it | 22:02 |
*** trident has quit IRC | 22:03 | |
*** jamesmcarthur has quit IRC | 22:03 | |
*** diablo_rojo has joined #openstack-infra | 22:03 | |
fungi | right, i think we don't care that nm is ignoring ipv6 (or even prefer that) if we're doing slaac/ra | 22:04 |
fungi | but we're relying on it to invoke dhclient in this case | 22:04 |
fungi | for ipv4 | 22:04 |
clarkb | fungi: well it had the side effect of no working ipv6 in that case so we had to increase the kernel timeout before listening to RAs | 22:04 |
fungi | yeah | 22:05 |
clarkb | I'm going to restart logstash workers on all the hosts nowish | 22:05 |
*** rh-jelabarre has quit IRC | 22:05 | |
clarkb | and done. I think we should have logs again | 22:09 |
clarkb | *indexed logs again | 22:09 |
*** jeliu_ has quit IRC | 22:09 | |
*** jeliu_ has joined #openstack-infra | 22:10 | |
*** trident has joined #openstack-infra | 22:11 | |
clarkb | we should keep an eye on queue lengths to ensure we are keeping up with the changes to filtering stuff | 22:11 |
*** mriedem has quit IRC | 22:12 | |
clarkb | ianw: I seem to recall being able to increase the verbosity of NM logging when I was looking at the ipv6 things. Not sure if that will help here but maybe that is a change we can make to our images? | 22:16 |
clarkb | looks like our epel mirror has been running vos release since july 26 | 22:17 |
clarkb | and the fedora mirror is locked but not running anything | 22:18 |
logan- | i'm back for a bit. lmk if i can help with anything. | 22:19 |
clarkb | logan-: I think what ianw found from nmcli points to an issue with our images | 22:19 |
logan- | ah ok, that lines up well with this only occurring on centos images | 22:21 |
clarkb | I'm going to grab the fedora lockfile on mirror-update.opendev.org, delete the lock in the vldb, then rerun the rsync without vos release | 22:21 |
*** kjackal has quit IRC | 22:21 | |
*** dave-mccowan has quit IRC | 22:22 | |
clarkb | hrm someone just ran vos unlock against pypi and centos? | 22:22 |
clarkb | infra-root ^ is someone else doing afs cleanup? | 22:22 |
fungi | i am not, no | 22:23 |
fungi | lsof the lock? | 22:23 |
ianw | not i | 22:23 |
clarkb | fungi: well its the afs command | 22:23 |
fungi | huh | 22:23 |
ianw | centos would be coming from the new mirror host (rsync) | 22:23 |
ianw | pypi ... should be disabled? | 22:23 |
clarkb | 586 2019-08-19T22:18:15+0000 vos unlock mirror.pypi -localauth | 22:24 |
clarkb | what is odd is those come before my commands to unlock fedora a while back | 22:24 |
fungi | maybe some cron operation we've scripted does a vos unlock? | 22:24 |
clarkb | so the timestamp there doesn't seem trustworthy | 22:24 |
clarkb | since neither of those are fedora I'll proceed with fedora | 22:24 |
fungi | oh! those are in root's shell history? | 22:25 |
fungi | not on mirror-update.opendev.org i guess... where are you seeing that? | 22:27 |
fungi | one of the fileservers? | 22:27 |
*** tdasilva has quit IRC | 22:27 | |
clarkb | ya afs01.dfw.o.o sorry I wasn't clear about that | 22:27 |
clarkb | I'm talking about the afs locks in this case not the coordination locks for cron on mirror-update | 22:28 |
clarkb | I checked history to refresh my memory on the unlock command and found those | 22:28 |
fungi | yeah history is up to 1123 lines now, so line 586 is relatively ancient | 22:28 |
clarkb | fedora rsync script without vos release is running on mirror-update.opendev.org now | 22:28 |
fungi | i suspect that's a side effect of turning on command timestamping and how it shows commands which predate it (so likely have no timestamp) | 22:29 |
clarkb | I expect that mirror has grown too big to reliably update or that release of fedora-30 has caused churn there? | 22:29 |
corvus | (i did not do any afs things) | 22:29 |
clarkb | it ran for about a week after I fixed it last time | 22:29 |
clarkb | and then for epel I don't know why those processes have stuck around so long. I'm guessing we have to kill the sync on mirror-update, grab the lockfile, remove the vldb lock, manually run rsync then start it over again like fedora? | 22:30 |
fungi | looks like command timestamping was probably turned on around 2018-04-18 so any commands run before that are being reported by `history` with today's date and time. i wouldn't be worried about it | 22:30 |
clarkb | ah | 22:31 |
corvus | ha. so "now" means either "now" or "so long ago it doesn't matter". take your pick. practically the same thing. | 22:31 |
fungi | but definitely surprising, thanks for bringing that to light | 22:31 |
clarkb | ok fedora rsync went quickly actually | 22:32 |
clarkb | now going to start a vos release with localauth on afs01.dfw | 22:32 |
*** xarses has quit IRC | 22:36 | |
*** xarses_ has joined #openstack-infra | 22:36 | |
clarkb | corvus: any idea how bad it would be to kill a vos release process on mirror-update that has been running since july 26th? | 22:41 |
*** jamesmcarthur has joined #openstack-infra | 22:43 | |
*** tdasilva has joined #openstack-infra | 22:45 | |
ianw | eth0 704bbd46-f652-4a65-94f9-9441e1e7bdb7 ethernet eth0 | 22:45 |
ianw | System eth0 5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03 ethernet -- | 22:45 |
ianw | this looks exactly like https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=755202 | 22:45 |
openstack | Debian bug 755202 in network-manager "network-manager: keeps creating and using new connection "eth0" that does not work" [Important,Open] | 22:45 |
ianw | i think this is because ipv6 kernel autoconf has given it an address before nm starts (which yes, ties into the last issue...) | 22:46 |
corvus | clarkb: if there's no transaction running, probably completely safe | 22:47 |
fungi | ianw: oh joy, because none of us grew tired of debugging that already ;) | 22:47 |
clarkb | ianw: maybe our timeout isn't long enough in all cases then | 22:48 |
clarkb | ianw: the value I chose was a bit trial and error | 22:48 |
corvus | clarkb: and i don't see any old transactions | 22:48 |
clarkb | corvus: k I'll look at doing that next then (currently writing a patch to further restrict what we pull for fedora atomic) | 22:48 |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: Add PerconaXDB Cluster to Zuul-Operator https://review.opendev.org/677315 | 22:49 |
ianw | clarkb: i think this summary sounds saneish -> http://paste.openstack.org/show/760139/ | 22:50 |
clarkb | ianw: ya in that case maybe we increase the timeout value we chose | 22:50 |
clarkb | in fact it should be mostly safe to have that timeout be quite large since NM is configuring interfaces anyway and doesn't care about that sysctl value | 22:51 |
*** tdasilva has quit IRC | 22:51 | |
ianw | or disable it? | 22:51 |
*** tdasilva has joined #openstack-infra | 22:51 | |
*** rlandy|ruck is now known as rlandy|ruck|bbl | 22:52 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Add more fedora-atomic mirror exclusions https://review.opendev.org/677318 | 22:54 |
clarkb | ianw: I want to say when I tested disabling it that NM didn't get the RAs | 22:54 |
*** jeliu_ has quit IRC | 22:54 | |
*** EmilienM|pto has quit IRC | 22:54 | |
clarkb | infra-root 677318 should further reduce the size of the fedora mirror. I'm happy to continue holding the lock for that mirror if we want to get that in and I'll rerun my syncs | 22:55 |
*** ijw has quit IRC | 22:55 | |
fungi | looking | 22:55 |
*** EmilienM has joined #openstack-infra | 22:56 | |
fungi | we don't think anyone's relying on us to mirror isos, disk images or bootloaders/payloads? | 22:56 |
fungi | seems unlikely, yeah | 22:56 |
*** lathiat has quit IRC | 22:57 | |
clarkb | the mirror was set up to mirror qcow2 images for use with libvirt | 22:57 |
clarkb | while they could be using isos or efi configs that would be silly | 22:57 |
*** rcernin has joined #openstack-infra | 22:57 | |
fungi | yeah | 22:57 |
clarkb | (and openstack doesn't pxeboot so I think pxeboot should be safe) | 22:57 |
fungi | oh, the .img files were for installer images | 22:57 |
fungi | righty-o | 22:57 |
*** lathiat has joined #openstack-infra | 22:57 | |
clarkb | ya that is why I didn't *.img | 22:57 |
fungi | fire in the hole | 22:58 |
clarkb | ideally magnum would stop using fedora entirely as they are still on 27 iirc | 22:59 |
clarkb | seems like a platform that doesn't update every 6 months might be more appropriate given that | 23:00 |
*** diablo_rojo has quit IRC | 23:03 | |
fungi | the explanation i heard was that since rhel-8 was based on fedora-27 they're using that as a stand-in for centos-8 | 23:06 |
fungi | and if that's the case, they ought to be able to switch to centos-8 for those jobs as soon as it becomes available | 23:06 |
fungi | which, i agree, will make a lot more sense | 23:07 |
clarkb | fungi: that is 28 | 23:07 |
clarkb | which is also eol | 23:07 |
fungi | ahh, then it was !magnum i guess... tripleo? | 23:07 |
clarkb | ya tripleo not magnum | 23:08 |
fungi | got it | 23:08 |
clarkb | if you deploy a magnum k8s in say vexxhost you get a fedora 27 host | 23:08 |
clarkb | because that is the image their deployer supports | 23:08 |
fungi | well, at any rate, yeah i'd push them to switch to centos-8 once we have it | 23:08 |
fungi | at least that will be receiving security fixes | 23:08 |
clarkb | new git adds `git restore` which is a neat helper command | 23:11 |
clarkb | alright I'm going to kill the epel stuff on mirror-update.opendev.org now | 23:15 |
ianw | clarkb: ohhh, right ... haha yes so i'm pointing you to the bug you found and notated in the dib change to put the pause in | 23:22 |
ianw | sorry, picture only just now coalescing in my mind :) | 23:23 |
clarkb | epel mirror is getting vos released now | 23:25 |
*** tkajinam has joined #openstack-infra | 23:27 | |
clarkb | ianw: ya maybe we just bump that timeout to be very big? | 23:29 |
fungi | without reading the release notes, i'm going to guess `git restore` has something to do with figuring out and rolling back to previous worktree states from the reflog | 23:30 |
clarkb | fungi: ya | 23:30 |
clarkb | since the checkout commands to do that are so painful | 23:30 |
*** dchen has joined #openstack-infra | 23:30 | |
fungi | i don't personally find the checkout commands as painful as figuring out what state i wanted from the reflog | 23:31 |
fungi | then again, i'm steeped in git esoterica every day | 23:31 |
*** dychen has joined #openstack-infra | 23:33 | |
*** jamesmcarthur has quit IRC | 23:33 | |
*** dchen has quit IRC | 23:35 | |
clarkb | I have to read the manpage every time I want to check out a file from the past | 23:37 |
*** e0ne has quit IRC | 23:40 | |
ianw | oh, sorry i just rebooted clarkb-test ... but testing longer timeout in sysctl | 23:41 |
clarkb | no problem I haven't touched it for a bit (working on afs things) | 23:41 |
clarkb | feel free to take it over | 23:41 |
clarkb | epel vos release failed | 23:42 |
clarkb | I might need to leave that one there for now | 23:42 |
ianw | ok, it got an ipv4 address now ... | 23:42 |
clarkb | fedora release is still in progress | 23:43 |
ianw | i can take a look at epel ... epel 8 was released maybe? don't know if that is related | 23:44 |
clarkb | ianw: the rsync went quick | 23:47 |
clarkb | but the vos release failed after a few minutes | 23:47 |
clarkb | http://paste.openstack.org/show/760140/ | 23:48 |
ianw | now i put the timeout back to 15, rebooted, and it also got an ipv4 address :/ | 23:48 |
*** markvoelker has joined #openstack-infra | 23:48 | |
clarkb | if we are really close to the timeout it may come down to load and luck | 23:48 |
openstackgerrit | Merged opendev/system-config master: Add more fedora-atomic mirror exclusions https://review.opendev.org/677318 | 23:49 |
ianw | Mon Aug 19 23:40:47 2019 warning: volume 536870968 recursively checked out by programType id 4 | 23:50 |
clarkb | I've got the lockfile held on mirror-update. Could it be the other mirror update causing problems again? | 23:53 |
clarkb | that crontab is still commented out | 23:53 |
fungi | there may be a broad race between start time and the ra timeout such that adjusting the timeout merely reduces or increases the random chance we'll get no ipv4 address | 23:56 |
*** jamesmcarthur has joined #openstack-infra | 23:56 | |
clarkb | that's a good point | 23:57 |
clarkb | in which case a very long timeout is probably what we want? | 23:57 |
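
Editor's note on the e-r query update discussed earlier in this log (fungi's "mass update via sed" idea and mriedem's filename:"job-output.txt" -> filename:"job-output.txt.gz" example): a minimal sketch of that rewrite in Python rather than sed, assuming queries are plain strings and that only filename terms change while tags terms keep their value, as clarkb indicated. The function name and the regular expression are hypothetical and are not taken from elastic-recheck.

```python
import re

# Hypothetical helper, not part of elastic-recheck.
# Matches filename:"<name>.txt" terms. Names that already end in .gz do not
# match, because the closing quote must directly follow ".txt".
_FILENAME_TERM = re.compile(r'(\bfilename:")([^"]+?\.txt)(")')


def add_gz_to_filename_terms(query: str) -> str:
    """Append .gz to filename:"....txt" terms, leaving tags:"..." terms alone."""
    return _FILENAME_TERM.sub(r'\1\2.gz\3', query)


if __name__ == "__main__":
    print(add_gz_to_filename_terms('message:"foo" AND filename:"job-output.txt"'))
```

Running the function twice gives the same result as running it once, so it should be safe to apply to files where some queries were already updated by hand.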