*** agopi|brb has quit IRC | 00:00 | |
*** rpioso is now known as rpioso|afk | 00:10 | |
*** betherly has joined #openstack-infra | 00:11 | |
*** betherly has quit IRC | 00:15 | |
*** darvon has joined #openstack-infra | 00:21 | |
*** betherly has joined #openstack-infra | 00:31 | |
*** betherly has quit IRC | 00:36 | |
*** agopi|brb has joined #openstack-infra | 00:37 | |
*** agopi|brb has quit IRC | 00:39 | |
*** agopi|brb has joined #openstack-infra | 00:39 | |
*** longkb has joined #openstack-infra | 00:40 | |
*** betherly has joined #openstack-infra | 00:52 | |
*** diablo_rojo has quit IRC | 00:55 | |
*** betherly has quit IRC | 00:56 | |
*** ansmith_ has joined #openstack-infra | 01:02 | |
*** jamesmcarthur has joined #openstack-infra | 01:03 | |
*** jamesmcarthur has quit IRC | 01:05 | |
*** jamesmcarthur has joined #openstack-infra | 01:05 | |
*** betherly has joined #openstack-infra | 01:12 | |
*** betherly has quit IRC | 01:17 | |
*** mrsoul has joined #openstack-infra | 01:17 | |
*** betherly has joined #openstack-infra | 01:33 | |
*** jamesmcarthur has quit IRC | 01:35 | |
*** betherly has quit IRC | 01:37 | |
*** carl_cai has quit IRC | 01:48 | |
*** lbragstad has quit IRC | 01:49 | |
*** lbragstad has joined #openstack-infra | 01:49 | |
*** betherly has joined #openstack-infra | 01:53 | |
*** rcernin has joined #openstack-infra | 01:55 | |
*** betherly has quit IRC | 01:58 | |
*** jamesmcarthur has joined #openstack-infra | 02:05 | |
*** liusheng__ has joined #openstack-infra | 02:07 | |
*** jamesmcarthur has quit IRC | 02:09 | |
*** betherly has joined #openstack-infra | 02:14 | |
*** bobh has joined #openstack-infra | 02:14 | |
*** betherly has quit IRC | 02:19 | |
openstackgerrit | Merged openstack-infra/project-config master: Disable inap-mtl01 provider https://review.openstack.org/613418 | 02:22 |
dmsimard | Would it be a good idea to force http -> https redirection on our things that are available over ssl ? | 02:27 |
dmsimard | logs, git, zuul, etc | 02:27 |
dmsimard | I could write a patch like that | 02:27 |
*** adrianreza has joined #openstack-infra | 02:31 | |
*** betherly has joined #openstack-infra | 02:34 | |
*** betherly has quit IRC | 02:39 | |
*** bhavikdbavishi has joined #openstack-infra | 02:47 | |
dmsimard | What's the thing that closes PRs on github with a template ? | 02:47 |
*** rh-jelabarre has quit IRC | 02:48 | |
*** roman_g has quit IRC | 02:49 | |
*** betherly has joined #openstack-infra | 02:54 | |
*** betherly has quit IRC | 03:00 | |
*** bobh has quit IRC | 03:10 | |
*** ykarel|away has joined #openstack-infra | 03:25 | |
*** dpawlik has quit IRC | 03:27 | |
*** carl_cai has joined #openstack-infra | 03:27 | |
*** cfriesen has quit IRC | 03:29 | |
*** dpawlik has joined #openstack-infra | 03:29 | |
*** betherly has joined #openstack-infra | 03:35 | |
*** betherly has quit IRC | 03:40 | |
*** udesale has joined #openstack-infra | 03:51 | |
*** betherly has joined #openstack-infra | 03:56 | |
*** lpetrut has joined #openstack-infra | 03:58 | |
*** betherly has quit IRC | 04:00 | |
*** dave-mccowan has quit IRC | 04:14 | |
*** janki has joined #openstack-infra | 04:22 | |
ianw | clarkb: https://review.openstack.org/613503 Call pre/post run task calls from TaskManager.submit_task() I think explains our missing nodepool logs | 04:28 |
*** lpetrut has quit IRC | 04:34 | |
*** dpawlik has quit IRC | 04:36 | |
*** dpawlik has joined #openstack-infra | 04:39 | |
*** kjackal has joined #openstack-infra | 04:45 | |
*** ramishra has joined #openstack-infra | 05:12 | |
*** yamamoto has quit IRC | 05:26 | |
*** yamamoto has joined #openstack-infra | 05:26 | |
*** kjackal has quit IRC | 05:29 | |
*** carl_cai has quit IRC | 05:33 | |
*** betherly has joined #openstack-infra | 05:36 | |
*** betherly has quit IRC | 05:40 | |
*** bhavikdbavishi1 has joined #openstack-infra | 05:47 | |
*** trown has joined #openstack-infra | 05:49 | |
*** kopecmartin has joined #openstack-infra | 05:50 | |
*** elod_ has joined #openstack-infra | 05:50 | |
*** evrardjp_ has joined #openstack-infra | 05:51 | |
*** quiquell|off is now known as quiquell | 05:53 | |
*** jpenag has joined #openstack-infra | 05:53 | |
*** hemna_ has joined #openstack-infra | 05:54 | |
*** ianw_ has joined #openstack-infra | 05:54 | |
*** dims_ has joined #openstack-infra | 05:54 | |
*** bhavikdbavishi has quit IRC | 05:55 | |
*** apetrich has quit IRC | 05:55 | |
*** dhill_ has quit IRC | 05:55 | |
*** Diabelko has quit IRC | 05:55 | |
*** SotK has quit IRC | 05:55 | |
*** gothicmindfood has quit IRC | 05:55 | |
*** kopecmartin|off has quit IRC | 05:55 | |
*** dims has quit IRC | 05:55 | |
*** dulek has quit IRC | 05:55 | |
*** jpena|off has quit IRC | 05:55 | |
*** strigazi has quit IRC | 05:55 | |
*** elod has quit IRC | 05:55 | |
*** nhicher has quit IRC | 05:55 | |
*** lucasagomes has quit IRC | 05:55 | |
*** gnuoy has quit IRC | 05:55 | |
*** hemna has quit IRC | 05:55 | |
*** evrardjp has quit IRC | 05:55 | |
*** mudpuppy has quit IRC | 05:55 | |
*** mattoliverau has quit IRC | 05:55 | |
*** cgoncalves has quit IRC | 05:55 | |
*** brwyatt has quit IRC | 05:55 | |
*** emerson has quit IRC | 05:55 | |
*** bradm has quit IRC | 05:55 | |
*** chkumar|off has quit IRC | 05:55 | |
*** ianw has quit IRC | 05:55 | |
*** Qiming has quit IRC | 05:55 | |
*** jlvillal has quit IRC | 05:55 | |
*** aluria has quit IRC | 05:55 | |
*** mdrabe has quit IRC | 05:55 | |
*** mpjetta has quit IRC | 05:55 | |
*** Keitaro has quit IRC | 05:55 | |
*** trown|outtypewww has quit IRC | 05:55 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 05:55 | |
*** ianw_ is now known as ianw | 05:55 | |
*** brwyatt has joined #openstack-infra | 05:56 | |
*** irclogbot_1 has quit IRC | 05:58 | |
*** apetrich has joined #openstack-infra | 06:02 | |
*** dhill_ has joined #openstack-infra | 06:02 | |
*** Diabelko has joined #openstack-infra | 06:03 | |
*** Keitaro has joined #openstack-infra | 06:05 | |
*** chandankumar has joined #openstack-infra | 06:06 | |
*** ykarel|away is now known as ykarel | 06:10 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: New Repo: OpenStack-Helm Docs https://review.openstack.org/611893 | 06:20 |
*** ccamacho has quit IRC | 06:20 | |
*** xinliang has joined #openstack-infra | 06:21 | |
AJaeger | config-core, two new repos for review, please https://review.openstack.org/#/c/611892 and https://review.openstack.org/611893 | 06:22 |
AJaeger | dmsimard: openstack-infra/jeepyb/jeepyb/cmd/close_pull_requests.py - let me fix quickly... | 06:23 |
*** gfidente has joined #openstack-infra | 06:26 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/jeepyb master: Use https for links https://review.openstack.org/613509 | 06:28 |
AJaeger | dmsimard: ^ | 06:28 |
*** aojeagarcia has joined #openstack-infra | 06:29 | |
*** aojea has quit IRC | 06:33 | |
*** ccamacho has joined #openstack-infra | 06:41 | |
*** bhavikdbavishi has quit IRC | 06:48 | |
*** ccamacho has quit IRC | 06:49 | |
*** ccamacho has joined #openstack-infra | 06:51 | |
*** yamamoto has quit IRC | 06:53 | |
*** yamamoto has joined #openstack-infra | 06:53 | |
*** yamamoto has quit IRC | 06:53 | |
*** yamamoto has joined #openstack-infra | 06:54 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack-infra/project-config master: Normalize projects.yaml https://review.openstack.org/613511 | 06:55 |
*** quiquell is now known as quiquell|brb | 06:57 | |
*** ginopc has joined #openstack-infra | 07:07 | |
*** quiquell|brb is now known as quiquell | 07:14 | |
*** rcernin has quit IRC | 07:22 | |
*** ykarel is now known as ykarel|lunch | 07:24 | |
*** shardy has joined #openstack-infra | 07:26 | |
*** strigazi has joined #openstack-infra | 07:26 | |
*** bauzas is now known as bauwser | 07:35 | |
*** witek has quit IRC | 07:35 | |
*** xek has joined #openstack-infra | 07:35 | |
*** evrardjp_ is now known as evrardjp | 07:37 | |
*** tosky has joined #openstack-infra | 07:46 | |
*** hashar has joined #openstack-infra | 07:53 | |
*** kjackal has joined #openstack-infra | 07:57 | |
*** rossella_s has joined #openstack-infra | 08:00 | |
*** jpich has joined #openstack-infra | 08:03 | |
*** SotK has joined #openstack-infra | 08:06 | |
*** elod_ is now known as elod | 08:07 | |
*** carl_cai has joined #openstack-infra | 08:22 | |
*** derekh has joined #openstack-infra | 08:23 | |
openstackgerrit | Merged openstack-infra/project-config master: Normalize projects.yaml https://review.openstack.org/613511 | 08:29 |
*** ccamacho has quit IRC | 08:32 | |
*** panda|off is now known as panda | 08:32 | |
*** lucasagomes has joined #openstack-infra | 08:33 | |
*** ccamacho has joined #openstack-infra | 08:33 | |
openstackgerrit | Frank Kloeker proposed openstack-infra/openstack-zuul-jobs master: Rename index file of doc translations https://review.openstack.org/613531 | 08:35 |
ianw | hrm, there seems to be something up with http://mirror.regionone.limestone.openstack.org/ | 08:40 |
ianw | #status log restarted apache2 service on mirror.regionone.limestone.openstack.org | 08:41 |
openstackstatus | ianw: finished logging | 08:41 |
ianw | nothing really odd in the logs | 08:41 |
*** ykarel|lunch is now known as ykarel | 08:42 | |
*** dulek has joined #openstack-infra | 08:46 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: Remove ironic-bfv and ironic-ui meetings https://review.openstack.org/612695 | 09:05 |
*** xinliang has quit IRC | 09:09 | |
*** e0ne has joined #openstack-infra | 09:16 | |
*** kjackal_v2 has joined #openstack-infra | 09:16 | |
*** kjackal has quit IRC | 09:20 | |
*** xinliang has joined #openstack-infra | 09:21 | |
*** kjackal_v2 has quit IRC | 09:28 | |
*** kjackal has joined #openstack-infra | 09:28 | |
*** Qiming has joined #openstack-infra | 09:35 | |
*** yamamoto has quit IRC | 09:36 | |
*** alexchadin has joined #openstack-infra | 09:37 | |
*** electrofelix has joined #openstack-infra | 09:58 | |
*** dpawlik has quit IRC | 10:03 | |
*** dpawlik_ has joined #openstack-infra | 10:03 | |
*** lpetrut has joined #openstack-infra | 10:11 | |
*** jamesmcarthur has joined #openstack-infra | 10:12 | |
*** ssbarnea has joined #openstack-infra | 10:12 | |
*** jamesmcarthur has quit IRC | 10:16 | |
*** bhavikdbavishi has joined #openstack-infra | 10:28 | |
*** bhavikdbavishi has quit IRC | 10:32 | |
mtreinish | fungi: we should be already running the fix for 7651, we switched to the ppa to get 1.15 (which includes the fix for 7651) | 10:32 |
mtreinish | fungi: also persia and I backported that fix for ubuntu at the dublin ptg: https://bugs.launchpad.net/ubuntu/+source/mosquitto/+bug/1752591 | 10:32 |
openstack | Launchpad bug 1752591 in mosquitto (Ubuntu Bionic) "CVE-2017-7651 and CVE-2017-7652" [Undecided,Fix released] | 10:32 |
mtreinish | so unfortunately I don't think it will fix our crashing issue, that's a bug with the log handling | 10:33 |
*** bhavikdbavishi has joined #openstack-infra | 10:34 | |
*** jpenag is now known as jpena | 10:36 | |
*** bhavikdbavishi has quit IRC | 10:38 | |
mtreinish | fungi: it probably doesn't hurt to bump up the version, but I'm not optimistic that it would fix the crashing | 10:40 |
*** betherly has joined #openstack-infra | 10:41 | |
*** kjackal has quit IRC | 10:46 | |
*** pbourke has quit IRC | 10:48 | |
*** pbourke has joined #openstack-infra | 10:48 | |
*** dtantsur|afk is now known as dtantsur | 10:48 | |
*** ssbarnea has quit IRC | 10:49 | |
*** e0ne has quit IRC | 10:51 | |
*** e0ne_ has joined #openstack-infra | 10:52 | |
*** alexchadin has quit IRC | 10:52 | |
*** AJaeger_ has joined #openstack-infra | 10:57 | |
*** AJaeger has quit IRC | 10:59 | |
*** jpena is now known as jpena|lunch | 11:01 | |
mtreinish | fungi: we set it to 'present' in the puppet. So bumping the package will have to be done manually: https://git.openstack.org/cgit/openstack-infra/puppet-mosquitto/tree/manifests/init.pp#n16 | 11:06 |
slaweq | hi infra team | 11:06 |
slaweq | I just spotted an error like: http://logs.openstack.org/14/613314/1/check/neutron-grenade-multinode/6874aba/job-output.txt.gz#_2018-10-26_09_08_36_328644 (/tmp/ansible/bin/ara: No such file or directory) in two different jobs running on the Neutron rocky branch, do you know what could cause that? | 11:07 |
*** kjackal has joined #openstack-infra | 11:10 | |
*** dave-mccowan has joined #openstack-infra | 11:15 | |
*** EmilienM is now known as EvilienM | 11:24 | |
*** udesale has quit IRC | 11:28 | |
*** panda is now known as panda|lunch | 11:29 | |
*** hashar is now known as hasharAway | 11:31 | |
*** ramishra has quit IRC | 11:31 | |
*** janki has quit IRC | 11:36 | |
*** ansmith_ has quit IRC | 11:39 | |
*** rh-jelabarre has joined #openstack-infra | 11:43 | |
*** longkb has quit IRC | 11:49 | |
*** jpena|lunch is now known as jpena | 11:57 | |
*** ykarel is now known as ykarel|away | 11:58 | |
*** yamamoto has joined #openstack-infra | 12:02 | |
*** carl_cai has quit IRC | 12:02 | |
*** ykarel|away has quit IRC | 12:02 | |
*** jcoufal has joined #openstack-infra | 12:04 | |
*** kjackal has quit IRC | 12:08 | |
*** kjackal has joined #openstack-infra | 12:09 | |
*** emerson has joined #openstack-infra | 12:15 | |
dmsimard | slaweq: that comes from devstack-gate: http://codesearch.openstack.org/?q=%2Ftmp%2Fansible%2Fbin%2Fara&i=nope&files=&repos= | 12:16 |
dmsimard | The ara not found is intriguing. I need to drop kids at school, I'll be able to check in ~20 minutes | 12:18 |
slaweq | dmsimard: thx a lot | 12:23 |
*** panda|lunch is now known as panda | 12:24 | |
*** eharney has joined #openstack-infra | 12:25 | |
*** bobh has joined #openstack-infra | 12:26 | |
*** e0ne_ has quit IRC | 12:26 | |
*** carl_cai has joined #openstack-infra | 12:29 | |
*** yamamoto has quit IRC | 12:32 | |
*** rlandy has joined #openstack-infra | 12:36 | |
*** quiquell is now known as quiquell|lunch | 12:37 | |
fungi | dmsimard: that should be pretty easy to do. we already have some sites/services we do that for (e.g. review, docs, governance, security) so i'd argue there's not a lot of reason to serve any of the rest of them via both http+https these days anyway | 12:40 |
*** agopi|brb is now known as agopi | 12:41 | |
fungi | looks like the releases site redirects http->https as well | 12:41 |
fungi | should be able to just copy configuration from one or more of those, and apply it to anything in our ssl cert check config which is missing that | 12:42 |
*** roman_g has joined #openstack-infra | 12:43 | |
openstackgerrit | Simon Westphahl proposed openstack-infra/zuul master: Use branch for grouping in supercedent manager https://review.openstack.org/613335 | 12:44 |
dmsimard | slaweq: if you look a bit above that ara command not found error, you'll see that we failed to install ansible in the first place.. looks like a timeout to the limestone mirror http://logs.openstack.org/14/613314/1/check/neutron-grenade-multinode/6874aba/job-output.txt.gz#_2018-10-26_08_42_51_581644 | 12:44 |
slaweq | dmsimard: thx for investigating that, so it looks like it was probably a temporary issue on one cloud provider only | 12:45 |
dmsimard | slaweq: the server looks healthy and reachable right now, there may have been a temporary network issue | 12:46 |
dmsimard | please recheck and let us know if it reoccurs | 12:46 |
dmsimard | fungi: ok, I'll take a stab at it | 12:47 |
*** jamesmcarthur has joined #openstack-infra | 12:47 | |
slaweq | dmsimard: sure, thx a lot | 12:47 |
fungi | slaweq: dmsimard: earlier (08:40z in scrollback) ianw noted that apache had died on that mirror and he restarted it. also logged at https://wiki.openstack.org/wiki/Infrastructure_Status | 12:48 |
dmsimard | ah, well there we go | 12:49 |
dmsimard | I'm not fully awake yet haha | 12:49 |
fungi | np, i'm already well on my way to caffeination | 12:50 |
*** mdrabe has joined #openstack-infra | 12:53 | |
*** yamamoto has joined #openstack-infra | 12:55 | |
quiquell|lunch | fungi: Do you know why I have "This change depends on a change that failed to merge" here https://review.openstack.org/#/c/613297/ | 12:56 |
quiquell|lunch | fungi: all of them have been rebased | 12:56 |
fungi | quiquell|lunch: the timing of the message is usually an indicator | 12:57 |
quiquell|lunch | fungi: ahh wait... I didn't rebase one of the... git pull --rebase does not do the job | 12:58 |
slaweq | fungi: thx also for help | 12:58 |
fungi | quiquell|lunch: you uploaded patchset #5 at 10:15z, so it was queued for testing or possibly in the midst of running some jobs, then at 11:11z one of its dependencies got uploaded | 12:58 |
logan- | regarding the limestone mirror apache issue, the disk is 90% full because of the base image churn from yesterday. there are 2 sets of base images cached on all of the nodes currently until nova deletes the old nodepool images today. | 12:59 |
quiquell|lunch | fungi: ack thanks ! | 12:59 |
fungi | quiquell|lunch: and so it was queued to test with dependent change 613316,2 but you uploaded 613316,3 so zuul was informing you that the original dependency can never merge now and it has aborted the queued/running jobs | 12:59 |
*** ansmith_ has joined #openstack-infra | 12:59 | |
logan- | i will remove that hv from the aggregate for now so no nodepool images will get scheduled there, that will keep the usage steady until the cleanup occurs | 12:59 |
fungi | quiquell|lunch: a recheck of 613297 will queue it to test with the new dependency you uploaded | 13:00 |
*** kgiusti has joined #openstack-infra | 13:00 | |
*** dave-mccowan has quit IRC | 13:00 | |
fungi | logan-: thanks! one thing worth noting, to work around the full disk issues crashing the mirror vm completely we "preallocated" the remaining rootfs by writing zeroes to a file and then deleting it once we hit enospc | 13:01 |
*** derekh has quit IRC | 13:01 | |
logan- | yeah, I suspect the disk hit 100% at some point this morning (90% right now with 12 nodepool vms running), and the preallocation probably prevented it from crashing ;) | 13:03 |
*** quiquell|lunch is now known as quiquell | 13:03 | |
logan- | 218G of cached images weighing heavy on it heh | 13:04 |
*** derekh has joined #openstack-infra | 13:04 | |
*** derekh has quit IRC | 13:04 | |
fungi | oof! | 13:04 |
fungi | how old are some of those? are we leaking images? that sounds like rather more than i would expect | 13:05 |
*** derekh has joined #openstack-infra | 13:05 | |
logan- | i think everything was rebuilt simultaneously yesterday during the zuul/nodepool maintenance so we ended up with 2x the number of images cached than normal | 13:05 |
fungi | we should only ever at most have 3x the number of image labels we've defined (current, previous as a safety fallback, and one uploading before the oldest gets deleted) | 13:06 |
logan- | because iirc nova keeps the base images cached on the hv for 24h after their last use | 13:06 |
fungi | ohhh | 13:06 |
fungi | so on the compute nodes, not in glance | 13:06 |
logan- | since that maintenance is coming up on 24h i think this should just work itself out over the next few hours and then I can put the host back in the aggregate :) | 13:06 |
logan- | yup | 13:07 |
openstackgerrit | Sorin Sbarnea proposed openstack-dev/pbr master: Correct documentation hyperlink for environment-markers https://review.openstack.org/613576 | 13:07 |
fungi | also i think in glance we'll generally run much closer to 2x than 3x because we only upload one image at a time and then delete the oldest for that label | 13:07 |
*** tpsilva has joined #openstack-infra | 13:09 | |
logan- | yup, glance is on a 30TB ceph pool so no concerns there | 13:10 |
logan- | images leak often but I think clarkb cleaned up all of the old leaked images yesterday | 13:10 |
*** AJaeger_ is now known as AJaeger | 13:18 | |
*** dpawlik_ has quit IRC | 13:18 | |
*** mriedem has joined #openstack-infra | 13:19 | |
*** dpawlik has joined #openstack-infra | 13:20 | |
*** efried is now known as fried_rice | 13:23 | |
*** e0ne has joined #openstack-infra | 13:25 | |
*** chandankumar is now known as chkumar|off | 13:33 | |
fungi | yes, i believe he did shortly after the upgrade | 13:35 |
fungi | er, the zk cluster replacement for nodepool i mean | 13:35 |
*** agopi is now known as agopi|brb | 13:35 | |
*** agopi|brb has quit IRC | 13:40 | |
*** jamesmcarthur has quit IRC | 13:46 | |
ssbarnea|bkp2 | fungi: regarding moving browbeat config to repo at https://review.openstack.org/#/c/613092 -- already merged in repo, do we need to keep the publish-to-pypi inside project-config or can we remove the entire section? | 13:48 |
ssbarnea|bkp2 | it is already listed inside repo. | 13:48 |
*** boden has joined #openstack-infra | 13:50 | |
fungi | ssbarnea|bkp2: it looks like other official projects have kept the publish-to-pypi or publish-to-pypi-python3 template application in project-config but i'll admit i haven't been following the goal work there closely enough to know for sure whether that's intended (i have to assume it must be?). AJaeger: do you know the reason for that? | 13:51 |
*** bnemec has joined #openstack-infra | 13:54 | |
*** munimeha1 has joined #openstack-infra | 13:58 | |
*** agopi|brb has joined #openstack-infra | 14:05 | |
*** agopi|brb is now known as agopi | 14:05 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: DNM Link to change page from status panel https://review.openstack.org/613593 | 14:10 |
*** onovy has quit IRC | 14:16 | |
AJaeger | fungi: we left them in project-config since tagging does not know about pipelines, so a branched project needs to have the job declared in project-config. This is mentioned in infra-manual as well | 14:17 |
AJaeger | ssbarnea|bkp2: https://review.openstack.org/#/c/613004/6/.zuul.yaml did *not* import publish-to-pypi, it's not in-repo | 14:17 |
AJaeger | ssbarnea|bkp2, fungi, so https://review.openstack.org/#/c/613092 is fine to +2A IMHO. | 14:18 |
ssbarnea|bkp2 | AJaeger: no worry. i can add it. i just wanted to know if there is something preventing a full move. | 14:18 |
*** gfidente has quit IRC | 14:18 | |
AJaeger | ssbarnea|bkp2: https://docs.openstack.org/infra/manual/creators.html#central-config-exceptions | 14:19 |
*** dpawlik has quit IRC | 14:19 | |
fungi | AJaeger: ahh, right, we still haven't decided on the possible https://review.openstack.org/578557 behavior change for that | 14:21 |
*** jamesmcarthur has joined #openstack-infra | 14:21 | |
ssbarnea|bkp2 | AJaeger: so i was remembering something from that doc. Still "should" in specs is such a gray area... :) | 14:21 |
*** stephenfin is now known as finucannot | 14:22 | |
*** dpawlik has joined #openstack-infra | 14:24 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: quick-start: add a note about github https://review.openstack.org/613398 | 14:25 |
*** tosky has quit IRC | 14:25 | |
*** roman_g has quit IRC | 14:27 | |
boden | hi.. I've been trying to update tricircle's in repo zuul config and tox for zuul v3 (they appear to be out of date) in https://review.openstack.org/#/c/612729/ previously they were installing required projects in tox, but when I remove that and add them to the .zuul.conf there's import errors http://logs.openstack.org/29/612729/3/check/openstack-tox-py27/a00e36e/job-output.txt.gz#_2018-10-25_20_17_28_206916 I see | 14:28 |
boden | neutron installed as a sibling so I'm confused as to root cause of import err | 14:28 |
boden | any ideas? | 14:28 |
*** roman_g has joined #openstack-infra | 14:28 | |
*** tosky has joined #openstack-infra | 14:29 | |
*** dpawlik has quit IRC | 14:29 | |
*** kjackal has quit IRC | 14:32 | |
*** kjackal has joined #openstack-infra | 14:32 | |
ssbarnea|bkp2 | AJaeger fungi : small css improvement on os-loganalyze (no more horizontal browsing on pip reqs listings): https://review.openstack.org/#/c/613383/ | 14:35 |
*** smarcet has joined #openstack-infra | 14:37 | |
boden | actually maybe it's because those dependencies are not in the requirements... I'll try that | 14:38 |
*** quiquell is now known as quiquell|off | 14:42 | |
*** carl_cai has quit IRC | 14:42 | |
fungi | boden: yeah, http://logs.openstack.org/29/612729/3/check/openstack-tox-py27/a00e36e/tox/py27-siblings.txt indicates to me that it didn't get installed (probably owing to it not being in the requirements as you noted) | 14:45 |
boden | fungi thanks... what makes you say it wasn't installed from that log.. I see "Sibling neutron at src/git.openstack.org/openstack/neutron" doesn't that mean it's already there?? just trying to understand for my own benefit | 14:46 |
fungi | mordred: is that ^ correct? would there be both a "sibling at path" line and a "found neutron python package installed" line in that log if it had been? | 14:46 |
mordred | fungi: reading | 14:47 |
fungi | boden: i interpreted that to mean that it sees src/git.openstack.org/openstack/neutron and is aware it's listed as a required-project but not necessarily that tox installed it into the resulting virtualenv | 14:47 |
boden | hmm ok.. thanks | 14:47 |
mordred | yes - a found sibling must already be in the requirements for it to be installed | 14:47 |
mordred | so if there are repos in required-projects but not listed in requirements.txt they will not be installed | 14:48 |
fungi | the pip freeze is here too which i think confirms it: http://logs.openstack.org/29/612729/3/check/openstack-tox-py27/a00e36e/tox/py27-5.log | 14:48 |
fungi | no neutron in the freeze output | 14:48 |
mordred | (this is to avoid things like pip install -e src/git.openstack.org/openstack/requirements - which is not what we'd want to have happen) | 14:48 |
fungi | looks like the only package installed from local source in that freeze is tricircle==5.1.1.dev35 | 14:49 |
fungi | boden: ^ | 14:50 |
fungi | ssbarnea|bkp2: thanks! that looks pretty straightforward | 14:50 |
boden | fungi: ack, got it... | 14:50 |
*** armstrong has joined #openstack-infra | 14:51 | |
*** ssbarnea|bkp2 has quit IRC | 14:51 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add the process environment to zuul.conf parser https://review.openstack.org/612824 | 14:51 |
*** rossella_s has quit IRC | 14:53 | |
corvus | infra-root: i'm going to be afk until wednesday | 14:56 |
*** cfriesen has joined #openstack-infra | 14:56 | |
fungi | thanks for the heads-up! i hope it's for fun reasons | 14:57 |
*** rpioso|afk is now known as rpioso | 14:58 | |
*** ssbarnea has joined #openstack-infra | 15:00 | |
*** diablo_rojo has joined #openstack-infra | 15:00 | |
*** dansmith is now known as SteelyDan | 15:01 | |
*** dave-mccowan has joined #openstack-infra | 15:04 | |
*** dave-mccowan has quit IRC | 15:10 | |
*** hasharAway is now known as hashar | 15:12 | |
openstackgerrit | Sorin Sbarnea proposed openstack-dev/pbr master: Correct documentation hyperlink for environment-markers https://review.openstack.org/613576 | 15:16 |
*** gyee has joined #openstack-infra | 15:23 | |
*** onovy has joined #openstack-infra | 15:31 | |
*** smarcet has quit IRC | 15:40 | |
*** apetrich has quit IRC | 15:44 | |
*** zul has quit IRC | 15:52 | |
clarkb | morning, having a slow start to the day and I need to run some errands so may be a bit before I'm actually around. I'd like to look at using the new compute resource usage logs to produce a report of some sort that shows usage by projects (and maybe by distro-release and other stuff if we can do it) | 15:52 |
*** agopi is now known as agopi|food | 15:52 | |
clarkb | it looks like zk is still happy and the node count has leveled off | 15:53 |
*** gothicmindfood has joined #openstack-infra | 15:54 | |
*** apetrich has joined #openstack-infra | 15:58 | |
*** lpetrut has quit IRC | 16:00 | |
*** ginopc has quit IRC | 16:02 | |
*** dtantsur is now known as dtantsur|afk | 16:07 | |
*** kjackal has quit IRC | 16:10 | |
*** e0ne has quit IRC | 16:10 | |
*** kopecmartin is now known as kopecmartin|off | 16:13 | |
*** bnemec is now known as beekneemech | 16:13 | |
dmsimard | does "flake8: noqa" no longer work ? I'm seeing pep8 failures that should be ignored | 16:18 |
openstackgerrit | Alex Schultz proposed openstack-infra/project-config master: Add noop to instack-undercloud https://review.openstack.org/613630 | 16:19 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Support node caching in the nodeIterator https://review.openstack.org/604648 | 16:20 |
*** shardy has quit IRC | 16:20 | |
*** fried_rice is now known as fried_rolls | 16:20 | |
dmsimard | looks like it's "noqa" instead of "flake8: noqa" now | 16:21 |
dmsimard | ¯\_(ツ)_/¯ | 16:21 |
clarkb | dmsimard: it has always been just # noqa iirc | 16:23 |
fungi | i don't recall ever using "flake8: noqa" and only ever used "noqa" myself | 16:23 |
dmsimard | http://codesearch.openstack.org/?q=flake8%3A%20noqa&i=nope&files=&repos= | 16:24 |
fungi | interesting. i guess that must have worked at some point or else it's a really huge case of cargo-culting | 16:26 |
ssbarnea | same with me, only used # noqa ... when it was not really possible to avoid it. | 16:26 |
mordred | it definitely _used_ to work | 16:26 |
*** hashar is now known as hasharAway | 16:28 | |
fungi | skimming through http://flake8.pycqa.org/en/latest/release-notes/index.html it doesn't look like they ever deprecated it and i even see a reference to it in the latest 3.6.0 notes | 16:28 |
fungi | http://flake8.pycqa.org/en/latest/release-notes/3.6.0.html#features | 16:28 |
fungi | dmsimard: so it should still work? | 16:29 |
fungi | dmsimard: care to link to the failure in question? | 16:29 |
*** aojeagarcia has quit IRC | 16:30 | |
dmsimard | sure, hang on | 16:30 |
dmsimard | example: http://logs.openstack.org/99/613399/1/check/openstack-tox-pep8/6b14dee/job-output.txt.gz#_2018-10-25_23_54_59_038898 fixed by https://review.openstack.org/#/c/613634/ | 16:31 |
*** jpich has quit IRC | 16:31 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Support node caching in the nodeIterator https://review.openstack.org/604648 | 16:31 |
*** mriedem is now known as mriedem_away | 16:34 | |
fungi | dmsimard: is "E261 at least two spaces before inline comment" perhaps the actual problem you ended up solving (inadvertently) there? | 16:35 |
dmsimard | yeah that's the first thing I tried | 16:36 |
fungi | you replaced "foo # flake8: noqa" lines with "foo  # noqa" (note the leading double-space) | 16:36 |
dmsimard | these failures sort of confused me to be honest because this code hasn't been touched in a very long time | 16:36 |
dmsimard | and it started failing just now | 16:36 |
fungi | did you suddenly switch to a newer flake8? | 16:37 |
dmsimard | not sure, to be fair there isn't exactly a lot of traffic on ara since everything is focused on the new 1.0 repos so it may be something that failed now but the cause dates back days/weeks | 16:38 |
fungi | flake8==3.6.0 | 16:38 |
fungi | that's the newest release from 3 days ago | 16:38 |
*** agopi|food is now known as agopi | 16:38 | |
fungi | and note the comment about noqa in the release notes i linked for 3.6.0 | 16:38 |
fungi | "Only skip a file if # flake8: noqa is on a line by itself (See also GitLab#453, GitLab!219)" | 16:39 |
fungi | so i take that to mean that prior to 3.6.0 it was skipping that whole file because at least one line had "# flake8: noqa" | 16:39 |
dmsimard | I think it was only meant to skip a particular line but don't quote me on that | 16:39 |
dmsimard | at least, that's my understanding of it | 16:40 |
fungi | yeah, see those gitlab links | 16:40 |
dmsimard | and from codesearch, it seems to be how projects are using it too | 16:40 |
fungi | which would explain why all those unrelated linting errors for that file suddenly popped up when switching to 3.6.0 | 16:40 |
dmsimard | oh! so it ignored the whole file instead of just the one line | 16:40 |
fungi | https://gitlab.com/pycqa/flake8/issues/453 | 16:40 |
dmsimard | which is in all likelihood not the original intent | 16:40 |
fungi | yeah, i think people were misusing it | 16:40 |
fungi | so les cultes du cargo at work | 16:41 |
dmsimard | well, if pep8 jobs start failing all over the place, we'll know why :D | 16:41 |
* fungi butchers french for your pleasure | 16:41 | |
dmsimard | maybe openstack-dev worthy | 16:41 |
fungi | yes, i think this will be of interest to openstack-dev ml | 16:41 |
dmsimard | I'll send something | 16:41 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Support node caching in the nodeIterator https://review.openstack.org/604648 | 16:41 |
fungi | odds are few people have run into it yet because we generally pin linters for official openstack projects at the start of a cycle | 16:42 |
*** fuentess has joined #openstack-infra | 16:42 | |
fungi | so this will require a fair amount of cleanup from a lot of projects who were up to now doing the wrong thing and not realizing it | 16:43 |
*** trown is now known as trown|lunch | 16:43 | |
dmsimard | ++ | 16:43 |
*** sthussey has joined #openstack-infra | 16:44 | |
dmsimard | fungi: I don't see a pin on flake8 in openstack/requirements.. would that be elsewhere ? | 16:49 |
fungi | dmsimard: it's in each project. we omit linters from requirements tracking explicitly | 16:50 |
dmsimard | ah | 16:50 |
fungi | because different projects will want to raise their linter caps at their own pace | 16:51 |
dmsimard | makes sense | 16:53 |
*** derekh has quit IRC | 16:58 | |
*** bauwser is now known as bauzas | 17:01 | |
*** betherly has quit IRC | 17:03 | |
*** zul has joined #openstack-infra | 17:04 | |
*** jpena is now known as jpena|off | 17:06 | |
*** lpetrut has joined #openstack-infra | 17:08 | |
*** jamesmcarthur has quit IRC | 17:16 | |
*** electrofelix has quit IRC | 17:32 | |
*** ykarel|away has joined #openstack-infra | 17:34 | |
ssbarnea | fungi: no meeting in progress, good time to merge https://review.openstack.org/#/c/613022/ ? | 17:35 |
ssbarnea | if i remember well, the openstack approach regarding linting was to pin to hacking, which was pinning flake8, right? | 17:36 |
*** bobh has quit IRC | 17:36 | |
fungi | yes on both questions | 17:37 |
*** lbragstad is now known as elbragstad | 17:37 | |
fungi | 613022 could use a second infra-root reviewer though since i'm the only +2 on it | 17:37 |
*** Swami has joined #openstack-infra | 17:38 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Cleanup down ports https://review.openstack.org/609829 | 17:41 |
Shrews | ianw: addressed your comments in ^^^ | 17:42 |
*** smarcet has joined #openstack-infra | 17:49 | |
*** trown|lunch is now known as trown | 17:50 | |
*** xek has quit IRC | 17:58 | |
*** jamesmcarthur has joined #openstack-infra | 18:01 | |
AJaeger | config-core, please review https://review.openstack.org/613092 https://review.openstack.org/#/c/611893/ https://review.openstack.org/#/c/611892 | 18:10 |
*** armstrong has quit IRC | 18:10 | |
*** munimeha1 has quit IRC | 18:11 | |
*** mriedem_away is now known as mriedem | 18:28 | |
*** apetrich has quit IRC | 18:34 | |
*** apetrich has joined #openstack-infra | 18:35 | |
*** e0ne has joined #openstack-infra | 18:35 | |
openstackgerrit | Felipe Monteiro proposed openstack-infra/project-config master: Remove airship-armada jobs, as they are all in project https://review.openstack.org/611013 | 18:35 |
openstackgerrit | Merged openstack-infra/project-config master: Move openstack-browbeat zuul jobs to project repository https://review.openstack.org/613092 | 18:35 |
openstackgerrit | Merged openstack-infra/project-config master: New Repo - OpenStack-Helm Images https://review.openstack.org/611892 | 18:40 |
openstackgerrit | Sean McGinnis proposed openstack-dev/pbr master: Fix incorrect use of flake8:noqa https://review.openstack.org/613665 | 18:43 |
openstackgerrit | Merged openstack-infra/project-config master: New Repo: OpenStack-Helm Docs https://review.openstack.org/611893 | 18:47 |
*** jamesmcarthur has quit IRC | 18:48 | |
clarkb | fungi: any idea if there are any meetings we need to worry about for 613022? | 18:53 |
fungi | didn't sound like it, but i haven't checked | 18:53 |
fungi | ssbarnea seemed to think it was safe earlier when he brought it up | 18:54 |
clarkb | eavesdrop seems to think it is ok | 18:54 |
ssbarnea | what could go wrong? | 18:55 |
clarkb | ssbarnea: the meetbot is restarted when we change its config so if a meeting is running when that happens it will break the logging of that meeting | 18:55 |
ssbarnea | ahh, yeah. that is why i suspect weekends are the best times for that. i doubt we have official meetings during them. | 18:56 |
clarkb | the latest meeting on eavesdrop is 1500UTC | 18:57 |
clarkb | so I approved it (1900UTC now) | 18:57 |
*** armax has quit IRC | 19:05 | |
openstackgerrit | Alex Schultz proposed openstack-infra/project-config master: Add noop to instack-undercloud https://review.openstack.org/613630 | 19:06 |
*** fried_rolls is now known as efried | 19:07 | |
*** efried is now known as fried_rice | 19:11 | |
*** e0ne has quit IRC | 19:14 | |
*** jcoufal has quit IRC | 19:15 | |
*** dave-mccowan has joined #openstack-infra | 19:18 | |
*** toabctl has quit IRC | 19:19 | |
*** e0ne has joined #openstack-infra | 19:19 | |
*** smarcet has quit IRC | 19:19 | |
*** hasharAway is now known as hashar | 19:20 | |
*** toabctl has joined #openstack-infra | 19:21 | |
fungi | as long as it doesn't take several days to merge, we should be fine ;) | 19:22 |
*** e0ne has quit IRC | 19:23 | |
*** anticw has joined #openstack-infra | 19:26 | |
anticw | zuul/pipeline q ... is it possible to have a 3rd party gate that can test some but not all PS? and then have it +Verified at which point zuul will no longer spend effort testing a PS? | 19:27 |
*** harlowja has quit IRC | 19:27 | |
clarkb | anticw: third party testing can filter patchsets however they like. Not sure what you mean by the second bit. You want zuul to not test a patchset if it gets +1 from third party ci? if so that isn't possible because you need a +1 and +2 from zuul to merge code | 19:31 |
Shrews | clarkb: oh, forgot to answer your question from yesterday. no, nodepool should not reuse image names. it does, however, retry upload attempts that fail. perhaps a failure actually succeeded? | 19:33 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul master: Small script to scrape Zuul job cpu usage https://review.openstack.org/613674 | 19:35 |
clarkb | Shrews: interesting, that could be | 19:35 |
openstackgerrit | Merged openstack-infra/system-config master: Adding openstack-browbeat https://review.openstack.org/613022 | 19:36 |
*** lpetrut has quit IRC | 19:37 | |
openstackgerrit | Merged openstack-infra/zuul master: quick-start: add a note about github https://review.openstack.org/613398 | 19:42 |
openstackgerrit | Jeremy Stanley proposed openstack-infra/zuul master: Add reenqueue utility https://review.openstack.org/613676 | 19:44 |
*** jamesmcarthur has joined #openstack-infra | 19:46 | |
clarkb | mentioned this over in the tc channel but got some simple scripting going to determine nodepool node usage rates by project | 19:48 |
clarkb | http://paste.openstack.org/show/733154/ produced by https://review.openstack.org/613674 | 19:48 |
clarkb | the breakdown is tripleo: ~50% of all cpu time, neutron: ~14% and nova ~5% | 19:48 |
clarkb | for the 13 hour period I scraped the logs for | 19:48 |
Shrews | 50%? wow. i wonder what percentage of our nodes that ends up being | 19:51 |
clarkb | Shrews: 50% | 19:51 |
Shrews | oh, i misunderstood what you meant by cpu time | 19:51 |
clarkb | ya sorry. the calculation is job runtime * number of nodes used | 19:52 |
clarkb | and that is 50% of what we used not 50% of theoretical max (though I think we were behind the entire 13 hour period so should be the same) | 19:52 |
*** kjackal has joined #openstack-infra | 19:55 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Support node caching in the nodeIterator https://review.openstack.org/604648 | 20:03 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Rate limit updateNodeStats https://review.openstack.org/613680 | 20:03 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Rate limit updateNodeStats https://review.openstack.org/613680 | 20:09 |
fungi | clarkb: thanks, sounds like it roughly matches up with what we expected | 20:10 |
clarkb | fungi: ya no surprises for me other than nova and neutron are lower than they were a year ish a go when I ran the numbers | 20:11 |
clarkb | but they still seem to be right near the top | 20:11 |
clarkb | fungi: the other piece of info I find interesting is kolla and osa vs tripleo | 20:11 |
clarkb | (are they not testing enough or is tripleo just incredibly inefficient, maybe both) | 20:11 |
clarkb | also 374 days of testing in 13 hours | 20:12 |
*** ansmith_ has quit IRC | 20:12 | |
clarkb | notmyname also pointed out that activity is likely to factor in. Particularly so since I only had a small window of data | 20:12 |
clarkb | I think if we can look at a months worth over say the month of november we'll have a much better overall picture | 20:13 |
fungi | yeah, this is really early to be drawing detailed conclusions | 20:15 |
fungi | also what will be more interesting is not the snapshot but the trends over time | 20:15 |
fungi | is that 50% decreasing? and how quickly? that will be interesting to | 20:15 |
fungi | find out | 20:15 |
clarkb | yup | 20:16 |
clarkb | and whether or not it syncs up with the release cycle in interesting ways | 20:16 |
clarkb | or is random etc | 20:16 |
fungi | i mean, they're already aware it's a concern and are working to improve the situation. now we can tell them how effective their attempts are to that end | 20:16 |
fungi | which is far more interesting to me than blamethrowing and witch hunts | 20:16 |
anticw | clarkb: there is no way to have an external bot +2 in which case zuul doesn't need to test? | 20:19 |
clarkb | anticw: not in the current system. Zuul is our gatekeeper and it doesn't know how to share those duties with another system | 20:20 |
clarkb | (I think that is intentional fwiw not a bug) | 20:20 |
clarkb | the reason for that is zuul has to ensure that the changes going through it don't break zuul itself | 20:20 |
anticw | clarkb: would it be perverse then to have the zuul job check an external gate for status and short-circuit to OK? | 20:23 |
clarkb | anticw: it might be better to try and understand what it is you are trying to do more concretely? What does the third party job do? Does it have to be third party? etc | 20:24 |
anticw | openstack helm jobs are involved and take a long time, i'm asking people if we can do some of this work and avoid hitting the gates so hard | 20:24 |
anticw | there are also sometimes quite long delays before a job will run (3-4 hours isn't uncommon) | 20:24 |
*** zul has quit IRC | 20:25 | |
anticw | concretely, i'm looking at zuul right now for 613611,1 for example, we're 5 hours 14 minutes into it | 20:26 |
clarkb | anticw: that is relevant to the discussion fungi and I were just having above. In the last 13 hours openstack-helm was .8% of our resource consumption | 20:26 |
clarkb | anticw: put another way helm isn't hitting the gates so hard | 20:26 |
clarkb | (so we shouldn't expect helm moving third party to change the backlog situation dramatically) | 20:27 |
*** kjackal has quit IRC | 20:27 | |
anticw | ok good to know ... so are we doing things poorly that are causing delays? these delays aren't new | 20:27 |
anticw | also, we get a lot of post-failures | 20:27 |
anticw | (it's better in the last week after some refactoring but still not very fast) | 20:27 |
clarkb | anticw: no the delays are due to total demand, we have a fixed number of test resources and people trying to test far more than we can keep up with (tripleo is ~50% over the last 13 hours for example) | 20:28 |
*** armax has joined #openstack-infra | 20:28 | |
clarkb | anticw: the ways to improve the backlog are either to reduce demand (fix bugs in software to reduce gate resets and number of failures that are "invalid") or to increase the number of resources we have | 20:28 |
*** jamesmcarthur has quit IRC | 20:29 | |
clarkb | anticw: are you having post failures due to timeouts? | 20:29 |
anticw | sometimes, unclear why things are slow | 20:29 |
clarkb | anticw: examples would be good if you have them because post failures can happen for a number of reasons | 20:29 |
anticw | https://review.openstack.org/#/c/613356/ | 20:30 |
anticw | just taking a recent job | 20:30 |
anticw | also, things run slower than i would expect ... if i run the jobs on a VM locally ... on very old hardware (very old) and slow rotating disks, the gate jobs for me run in about half the time | 20:30 |
fungi | anticw: the thread starting at http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-October/000575.html is also probably relevant to your concerns | 20:30 |
anticw | sometimes less | 20:30 |
anticw | testing the gate jobs in aws and azure the timing is even better than my local test | 20:31 |
clarkb | anticw: do you know what sort of resources you are constrained by? are you running dstat or similar so that we can see what the hold up is? | 20:31 |
fungi | anticw: are your slow-running jobs reliant on nested virtualization performance, perhaps? | 20:31 |
clarkb | anticw: fwiw kata switched from azure to vexxhost (one of our providers) and the runtime was cut in half? something like that | 20:31 |
anticw | fungi: no | 20:31 |
clarkb | anticw: likely important to identify what the resource contention is if we want to improve it | 20:31 |
anticw | clarkb: i'm guessing we're badly IO limited in some but not all cases | 20:32 |
anticw | certainly some builder infra (as identified by hostname which might be bogus) seem worse than others | 20:32 |
clarkb | anticw: the last time this came up with OSH I had asked for more logging and data like dstat. Any idea if we have that? It's easy to point and say "this is bad, aws is better" but I can't make that actionable | 20:32 |
anticw | clarkb: we have 'more logging' but i don't know that it's enough to pin point it just yet | 20:33 |
anticw | srwilkers: ^ ? | 20:33 |
*** ykarel|away has quit IRC | 20:34 | |
anticw | not entirely useful but https://pastebin.com/HGKCEGgJ is a grep from the job running locally | 20:35 |
anticw | that shows the job ran in about 15 minutes ... again ... on a VM on pretty old (2010) hardware | 20:35 |
clarkb | anticw: that particular post failure appears to be due to one of the instances not being reachable at the end. It appears that the job failed properly earlier in the job due to mariadb not starting (possibly because it was supposed to run on the non-responsive instance?) | 20:35 |
anticw | using the aforementioned url i think that took 1h 2 mins on a gate | 20:36 |
clarkb | anticw: ish, but it timed out waiting for a thing to happen that never happened. I don't think that timeout was due to slowness but instead due to network communication problems | 20:36 |
clarkb | (still not good, but important to identify the issue) | 20:36 |
anticw | clarkb: ok, network issues is something that's been pointed out before | 20:36 |
anticw | i'm not really sure what those would be ... and why some builders would have them | 20:37 |
anticw | again, i tried aws and azure as reference points and for the most part they were rock solid and considerably faster (2x to 4x) | 20:37 |
clarkb | right and I'm asking you to help us identify why that is the case so that we can hopefully improve the situation with our zuul | 20:38 |
anticw | yeah, so if it's networking ... what do you suggest to help there? | 20:38 |
fungi | network connectivity between instances in some cloud providers can vary in quality, for sure. that's been one of our biggest challenges for overall reliability of jobs | 20:38 |
anticw | fungi: ok, but ... we're not doing a lot of networking | 20:38 |
clarkb | anticw: I don't know that networking was slow. It appears networking didn't work at all for one of your 5 instances. Those two issues may be orthogonal to each other | 20:38 |
anticw | and networking between VMs in providers isn't a new thing | 20:38 |
fungi | um, yes i'm quite aware | 20:39 |
anticw | clarkb: others have claimed networking issues as well | 20:39 |
fungi | is your problem a new thing? | 20:39 |
anticw | no, not new | 20:39 |
clarkb | fungi: no we tried debugging this a while back | 20:39 |
clarkb | I asked for logging and never got any | 20:39 |
fungi | okay, just figuring out what you mean by "networking between VMs in providers isn't a new thing" | 20:39 |
anticw | just the number of checks required has increased so transient failures bite more now | 20:39 |
clarkb | because unfortunately we weren't logging why the containers were failing to start | 20:39 |
clarkb | just that they had failed | 20:39 |
fungi | anticw: and this is a 5-node job? | 20:39 |
anticw | fungi: i'm saying whilst i accept networking might be an issue ... in this day and age that seems surprising | 20:39 |
anticw | the above example was yes | 20:40 |
fungi | oh, then prepare to be surprised | 20:40 |
fungi | cloud providers love to under^Wright-size their network gear | 20:40 |
fungi | and it gets saturated massively for some periods | 20:40 |
anticw | yeah ... but even so ... networking got super fast and super cheap ... you'd really have to put effort in to make it that poor | 20:40 |
anticw | 10GbE+ is basically free at this stage ... i have a box of 10G nics someone gave me even | 20:41 |
fungi | has nothing to do with effort making network gear slow and everything to do with noisy neighbors sharing network resources with you | 20:41 |
clarkb | ok I've confirmed the node that had ssh connectivity issues in ara is the one that was trying to host the failed mariadb container | 20:41 |
fungi | and if cloud providers are using servers from 2010, imagine that their network gear can easily be of the same vintage | 20:41 |
anticw | clarkb: how did you verify that? | 20:41 |
anticw | fungi: i'm using 2010 hardware, i imagine they are using something less ancient | 20:42 |
fungi | ahh, in some cases they aren't (or not much newer anyway) | 20:42 |
*** smarcet has joined #openstack-infra | 20:42 | |
*** openstackstatus has quit IRC | 20:42 | |
*** openstack has joined #openstack-infra | 20:44 | |
*** ChanServ sets mode: +o openstack | 20:44 | |
clarkb | anticw: I can't use that data to say that would cause slowness (I don't know enough about your tests), but I am fairly confident that is why it failed | 20:44 |
anticw | i guess our tests take a long time | 20:44 |
anticw | which makes things worse | 20:44 |
fuentess | clarkb: hi Clark, can you help me disable the kata Fedora job for the proxy repo? We still have some issues with Fedora on vexxhost, so would be good to disable it until we resolve them | 20:45 |
anticw | more likely to have some sort of glitch somewhere the longer we run | 20:45 |
clarkb | fuentess: yup | 20:45 |
anticw | clarkb: i don't really know how to instrument cpu/io on zuul VMs but i could locally ... is that useful? | 20:46 |
clarkb | fuentess: https://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n45 is the section of code to edit. Remove the line for the fedora job | 20:46 |
clarkb | anticw: we've used dstat for a long time with things like devstack + tempest jobs | 20:46 |
fuentess | clarkb: ohh cool, thanks | 20:46 |
anticw | yeah, we could have dstat log in the background | 20:47 |
clarkb | anticw: captures io (network and disk), memory use, cpu usage etc every second iirc | 20:47 |
clarkb | anticw: and there are tools to render the data into more human friendly formats like stackviz | 20:47 |
fungi | i wonder if we even already have a dstat role for starting it early in the job and then collecting its logs | 20:47 |
clarkb | (though I think stackviz dstat rendering is broken right now) | 20:47 |
clarkb | fungi: that type of work has been ongoing in zuul land | 20:47 |
clarkb | fungi: no definitive answer yet but progress I think | 20:48 |
fungi | yeah, i could see that being generally useful across a broad variety of jobs | 20:48 |
anticw | re: multinode i could spit out a DS that does NxN network pings and have that log i guess | 20:49 |
anticw | (ping in a generic sense) | 20:49 |
anticw | people like mellanox who have their own CI ... does that also require zuul for merges? | 20:50 |
fungi | that might at least allow you to also short-circuit your job early if one of your expected nodes in the multinode set becomes unreachable | 20:50 |
anticw | (i forget where i saw this, some typo on a PS# and it popped up once) | 20:50 |
clarkb | anticw: the other thing to keep in mind is that the jobs themselves can crash the networking on the host too (I have no idea if that is happening here) | 20:50 |
clarkb | either by updating the firewall improperly or applying new config to interfaces that won't work within a provider. We've seen both things happen with jobs in the past | 20:51 |
anticw | clarkb: how does networking on the host crash? that seems like it should be pretty rare | 20:51 |
anticw | that's the sort of thing i would expect on a c-64 | 20:51 |
fungi | "crash" is a relative term here ;) | 20:51 |
clarkb | anticw: crash in the sense it stops working not kernel panic crash | 20:51 |
clarkb | we've had jobs use invalid network ranges and apply them to the actual host interfaces | 20:52 |
clarkb | that will break things fast | 20:52 |
fungi | or you could do something to inadvertently flush the iptables rules suddenly blocking all traffic on that instance | 20:52 |
clarkb | we've also had jobs apply firewall rules that prevent ssh ya ^ | 20:52 |
fungi | or something could simply cause the service you're trying to talk to on that node to die and not restart | 20:52 |
anticw | we used to use a lot of memory | 20:52 |
anticw | that's better now but not ideal | 20:52 |
fungi | yes, oom killer knocking out a crucial service is not unusual | 20:53 |
anticw | our stuff is kind of bloaty :( | 20:53 |
anticw | i don't know that we get oom, mostly just poor IO performance (lack of page cache hits) | 20:53 |
fungi | are your jobs setting up swap memory? if not, that will lead to oom faster than you expect | 20:53 |
clarkb | fungi: I think we do that in the base job now? | 20:54 |
clarkb | but it might be devstack specific? | 20:54 |
anticw | fungi: swap will cause k8s to cry, though can be worked around | 20:55 |
clarkb | usually you want swap not because you expect the job will succeed, but because swap will allow you to get the necessary data to diagnose problems that happen when memory runs out | 20:55 |
fungi | wow, really? kubernetes is allergic to swap memory? that seems strange | 20:56 |
clarkb | fungi: likely as much as anything else is like mysql or kvm | 20:56 |
clarkb | things will get really slow and stop working within timeouts | 20:56 |
fungi | i mean, obviously you don't want active tasks paging out memory they're still accessing, but it can give you breathing room for other background processes to get paged out | 20:56 |
anticw | kinda, there is a long thread/debate about it and i'm just gonna get angry if i get into it :) | 20:57 |
fungi | fair! ;) | 20:57 |
*** hashar has quit IRC | 20:57 | |
anticw | i also had someone back into me an hour ago so am a bit grouchy | 20:57 |
anticw | into my car i mean | 20:57 |
clarkb | in any case I think the short term answer here is it would be great to get more log data if possible. Understanding what resource contentions you do have so that we can at least attempt to address them would be good | 20:58 |
anticw | i like the idea of a long running dstat ... or netstat | 20:59 |
anticw | i think that would be useful | 20:59 |
anticw | and some sort of networking sanity checker | 20:59 |
anticw | it might also be we're just asking too much from the VMs and should move entirely to a 3rd party gate (if possible) | 20:59 |
clarkb | yup that is possible too, but hard to say without data like ^ | 21:00 |
anticw | clarkb: well, one old data point is that when i run a test locally it needs over 8GB ... how it even runs on the gates i'm not sure | 21:00 |
clarkb | as for third party gating I think your hack is the closest you will get. Zuul has to gate its config changes | 21:00 |
anticw | we would still have to wait for things to work through the queue though | 21:01 |
anticw | even if we fall out in 20s | 21:01 |
clarkb | anticw: jobs that just want to do an http request don't need to use a nodeset with nodes. They can run directly on the executor | 21:01 |
clarkb | it is a very constrained environment and we use it for stuff like retrigger read the docs builds | 21:02 |
anticw | clarkb: yeah, but we might not know if we were able to test it in some cases | 21:02 |
clarkb | also tripleo has a plan to reduce test resource needs as well as make tests more reliable. Here is hoping that improves the demand side of things for them | 21:02 |
anticw | it would require us to have thorough and robust external gates, i was thinking more "if we can..." | 21:03 |
*** trown is now known as trown|outtypewww | 21:03 | |
openstackgerrit | Salvador Fuentes Garcia proposed openstack-infra/project-config master: Remove Fedora Job for Kata project https://review.openstack.org/613690 | 21:03 |
clarkb | fuentess: thanks, I went ahead and approved it | 21:03 |
fuentess | clarkb: thank you | 21:04 |
*** jamesdenton has quit IRC | 21:11 | |
*** fuentess has quit IRC | 21:12 | |
*** eharney has quit IRC | 21:12 | |
*** jamesmcarthur has joined #openstack-infra | 21:13 | |
fungi | wow, even the new zuul status ui is taking a while to load for me | 21:22 |
fungi | tripleo changes currently account for 75% of the changes in the gate pipeline | 21:23 |
*** jamesmcarthur has quit IRC | 21:24 | |
fungi | and roughly a third of them are indicating job failures | 21:24 |
fungi | or merge conflicts or dependency on a failed change | 21:25 |
fungi | i should say roughly a third of the changes near the top of their gate queue anyway | 21:25 |
fungi | looks like the wait for node requests in the check pipeline is a little over 5 hours at this point | 21:26 |
*** boden has quit IRC | 21:26 | |
fungi | we're hovering around 700 nodes in use at the moment | 21:33 |
fungi | with another ~100 building/deleting | 21:34 |
*** jbadiapa has quit IRC | 21:34 | |
clarkb | ya thats about right with inap disabled | 21:34 |
fungi | and still down half of ovh right? | 21:34 |
clarkb | no ovh is up | 21:36 |
clarkb | and rarely leaking ports in gra1 | 21:37 |
*** rlandy has quit IRC | 21:41 | |
*** mriedem has quit IRC | 21:52 | |
*** armax has quit IRC | 21:53 | |
*** jlvillal has joined #openstack-infra | 22:09 | |
*** ansmith_ has joined #openstack-infra | 22:16 | |
openstackgerrit | Alex Schultz proposed openstack-infra/project-config master: Add noop to instack-undercloud https://review.openstack.org/613630 | 22:19 |
*** tpsilva has quit IRC | 22:27 | |
*** bobh has joined #openstack-infra | 22:28 | |
*** armax has joined #openstack-infra | 22:59 | |
*** agopi has quit IRC | 23:02 | |
*** pcaruana has quit IRC | 23:10 | |
*** rh-jelabarre has quit IRC | 23:18 | |
*** diablo_rojo has quit IRC | 23:40 | |
*** Swami has quit IRC | 23:43 | |
*** rcernin has joined #openstack-infra | 23:43 | |
*** jesusaur has quit IRC | 23:45 | |
*** smarcet has quit IRC | 23:45 | |
*** rcernin has quit IRC | 23:51 | |
*** kgiusti has left #openstack-infra | 23:51 | |
*** gyee has quit IRC | 23:55 |