*** smarcet has quit IRC | 00:06 | |
sgw | clarkb: thanks, I knew that part I will play around some more. | 00:07 |
fungi | sgw: yeah, pbr versions have a specification which is basically a mashup of the semver spec and pep 440. pbr has convenience functions to convert its versions to deb and rpm version equivalents (particularly crucial for those ecosystems' "sort before" operators for pre-release versions) | 00:10 |
fungi | but it doesn't have tools to do the reverse | 00:11 |
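pbr's own deb/rpm converters are not reproduced here. As a hedged illustration of why the "sort before" behaviour matters, the Python sketch below reimplements the character ordering dpkg applies when comparing upstream version strings (a port of that comparison loop as we understand it, ignoring epochs and Debian revisions for brevity). In that ordering `~` sorts before everything, even the end of the string, so `1.2.0~rc1` sorts before `1.2.0`, while an unmarked `1.2.0a1` would sort after it.

```python
import functools


def _is_digit(ch: str) -> bool:
    return len(ch) == 1 and "0" <= ch <= "9"


def _is_alpha(ch: str) -> bool:
    return len(ch) == 1 and ch.isascii() and ch.isalpha()


def _order(ch: str) -> int:
    """Weight of one character when comparing non-digit runs, dpkg-style.

    Digits and end-of-string weigh 0, '~' weighs -1 (sorts before everything),
    letters sort by code point, and other symbols sort after all letters.
    """
    if ch == "" or _is_digit(ch):
        return 0
    if _is_alpha(ch):
        return ord(ch)
    if ch == "~":
        return -1
    return ord(ch) + 256


def _at(s: str, k: int) -> str:
    """Character at index k, or "" past the end (like C's NUL terminator)."""
    return s[k] if k < len(s) else ""


def compare_upstream(a: str, b: str) -> int:
    """Compare two upstream version strings; return -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # 1. Compare the non-digit runs character by character.
        while (i < len(a) and not _is_digit(a[i])) or (j < len(b) and not _is_digit(b[j])):
            ac, bc = _order(_at(a, i)), _order(_at(b, j))
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # 2. Compare the digit runs numerically (leading zeros ignored).
        while _at(a, i) == "0":
            i += 1
        while _at(b, j) == "0":
            j += 1
        first_diff = 0
        while _is_digit(_at(a, i)) and _is_digit(_at(b, j)):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        # A longer digit run is the larger number.
        if _is_digit(_at(a, i)):
            return 1
        if _is_digit(_at(b, j)):
            return -1
        if first_diff:
            return -1 if first_diff < 0 else 1
    return 0
```

`sorted(versions, key=functools.cmp_to_key(compare_upstream))` orders a list accordingly: `["1.2.0", "1.2.0~rc1", "1.2.0a1", "1.1.9"]` becomes `["1.1.9", "1.2.0~rc1", "1.2.0", "1.2.0a1"]`. For real packaging work, `dpkg --compare-versions` remains the authoritative check.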
*** mattw4 has quit IRC | 00:12 | |
*** sgw has quit IRC | 00:13 | |
*** igordc has quit IRC | 00:13 | |
clarkb | ianw: there were job failures in the kafs stack changes that got enqueued to the gate | 00:15 |
clarkb | I'm not likely to be able to look at those until tomorrow as I'll need to make dinner soonish | 00:15 |
ianw | clarkb: ok, will look ... just looking at some tcpdumps on the iad mirror | 00:16 |
*** smarcet has joined #openstack-infra | 00:16 | |
ianw | 2001:4802:7802:104:be76:4eff:fe20:4b35 > 2a04:4e42::223: [icmp6 sum ok] ICMP6, destination unreachable, unreachable prohibited 2001:4802:7802:104:be76:4eff:fe20:4b35 | 00:17 |
*** smarcet has quit IRC | 00:18 | |
ianw | that seems to suggest that our firewall (4b35) is rejecting stuff from fastly ... i wonder if we have some sort of connection tracking issue in the firewall rules | 00:18 |
openstackgerrit | Merged opendev/system-config master: Use systemd-timesyncd on Bionic https://review.opendev.org/665269 | 00:19 |
*** rfolco has joined #openstack-infra | 00:19 | |
fungi | ianw: but is the stuff from fastly related to connections we're initiating? | 00:21 |
ianw | fungi: i think yes ... that's what's so weird | 00:21 |
fungi | are you able to find where we initiated the connection? | 00:22 |
fungi | or seeing *any* tcp6 outbound to that address for that matter? | 00:22 |
ianw | 00:18:53.491978 IP6 (flowlabel 0x0403d, hlim 64, next-header TCP (6) payload length: 752) 2001:4802:7802:104:be76:4eff:fe20:4b35.43778 > 2a04:4e42::223.https: Flags [P.], cksum 0xc42c (correct), seq 3202292652:3202293372, ack 1673101527, win 42, options [nop,nop,TS val 4150605139 ecr 1505017146], length 720 | 00:23 |
fungi | okay, yep, that does look like it would be part of a connection we initiated in that case | 00:23 |
ianw | you can "telnet -6 2a04:4e42::223 80" too | 00:23 |
fungi | neat, so there are at least *some* v6 addresses with which it can have bi-directional communication | 00:24 |
ianw | http://paste.openstack.org/show/753333/ ... i don't know what that means, with the same flow label each time | 00:25 |
fungi | yeah, that does indeed suggest something about the packets we're receiving don't match expected state for established connections we've made | 00:27 |
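As a toy model only (this is not ip6tables or netfilter conntrack), the sketch below captures the state check being discussed: inbound packets are accepted only when they mirror a flow we opened, and everything else is rejected, which on a real host can surface as an ICMPv6 "administratively prohibited" reply. The addresses and source port come from the tcpdump output above; the `Flow` and `ToyConntrack` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Flow:
    src: str
    sport: int
    dst: str
    dport: int

    def reply(self) -> "Flow":
        """The flow a reply packet to this connection would carry."""
        return Flow(self.dst, self.dport, self.src, self.sport)


class ToyConntrack:
    """Minimal model of a stateful inbound filter (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.established: Set[Flow] = set()

    def outbound(self, flow: Flow) -> None:
        # Record connections we initiate so their replies can be matched later.
        self.established.add(flow)

    def inbound(self, flow: Flow) -> str:
        # A packet is accepted only if it is the reply to a tracked flow.
        if any(tracked.reply() == flow for tracked in self.established):
            return "ACCEPT"
        # Everything else is rejected; a real firewall may answer with
        # ICMPv6 destination unreachable / administratively prohibited.
        return "REJECT"
```

In this model, replies get rejected whenever the tracked entry is missing or has expired even though the outbound half is visible, which is one explanation consistent with the symptoms above, not a confirmed diagnosis.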
*** ekultails has quit IRC | 00:30 | |
*** gregoryo has joined #openstack-infra | 00:34 | |
openstackgerrit | Merged opendev/storyboard master: Correct team iterator lists in worklist creation https://review.opendev.org/667248 | 00:45 |
fungi | neat, the coreboot project is running gerrit 3! https://review.coreboot.org/ | 00:46 |
fungi | about to build their flashrom project from source on openbsd (don't ask) | 00:46 |
*** dchen has quit IRC | 00:55 | |
*** dchen has joined #openstack-infra | 00:55 | |
*** dayou_ has quit IRC | 00:56 | |
*** rfolco has quit IRC | 01:33 | |
*** dayou_ has joined #openstack-infra | 01:36 | |
*** bhavikdbavishi has joined #openstack-infra | 01:47 | |
*** rcernin has quit IRC | 01:48 | |
*** rcernin has joined #openstack-infra | 01:48 | |
*** bhavikdbavishi1 has joined #openstack-infra | 01:50 | |
*** bhavikdbavishi has quit IRC | 01:51 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 01:51 | |
*** lathiat has quit IRC | 01:54 | |
*** smarcet has joined #openstack-infra | 01:56 | |
*** apetrich has quit IRC | 01:58 | |
*** hongbin has joined #openstack-infra | 02:04 | |
*** sshnaidm has quit IRC | 02:05 | |
*** whoami-rajat has joined #openstack-infra | 02:07 | |
*** bhavikdbavishi has quit IRC | 02:08 | |
*** diablo_rojo has quit IRC | 02:17 | |
*** auristor has quit IRC | 02:37 | |
*** auristor has joined #openstack-infra | 02:39 | |
*** rcernin has quit IRC | 02:42 | |
*** rcernin has joined #openstack-infra | 02:42 | |
*** armstrong has quit IRC | 02:45 | |
*** yamamoto has joined #openstack-infra | 02:50 | |
*** lathiat has joined #openstack-infra | 02:52 | |
*** aedc has quit IRC | 02:54 | |
*** yamamoto_ has joined #openstack-infra | 02:55 | |
*** yamamoto_ has quit IRC | 02:56 | |
*** yamamoto has quit IRC | 02:56 | |
*** yamamoto has joined #openstack-infra | 02:59 | |
*** yamamoto has quit IRC | 02:59 | |
*** rcernin has quit IRC | 03:04 | |
*** rcernin has joined #openstack-infra | 03:04 | |
*** bhavikdbavishi has joined #openstack-infra | 03:07 | |
*** npochet has quit IRC | 03:19 | |
*** npochet has joined #openstack-infra | 03:19 | |
*** mnaser has quit IRC | 03:19 | |
*** mnaser has joined #openstack-infra | 03:20 | |
*** gagehugo has quit IRC | 03:22 | |
*** gagehugo has joined #openstack-infra | 03:22 | |
*** gagehugo has quit IRC | 03:25 | |
*** gagehugo has joined #openstack-infra | 03:28 | |
*** ykarel|afk has joined #openstack-infra | 03:30 | |
*** rcernin has quit IRC | 03:30 | |
*** rcernin has joined #openstack-infra | 03:31 | |
*** psachin has joined #openstack-infra | 03:39 | |
*** virendra-sharma has joined #openstack-infra | 03:46 | |
*** sweston has quit IRC | 03:51 | |
*** sweston has joined #openstack-infra | 03:51 | |
*** udesale has joined #openstack-infra | 03:53 | |
*** ykarel|afk is now known as ykarel | 04:23 | |
*** hongbin has quit IRC | 04:25 | |
*** virendra-sharma has quit IRC | 04:26 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Expand documentation of test-setup role https://review.opendev.org/667244 | 04:27 |
*** virendra-sharma has joined #openstack-infra | 04:28 | |
*** ykarel has quit IRC | 04:32 | |
openstackgerrit | Merged openstack/project-config master: Revert "Revert "Revert "Disable provider limestone""" https://review.opendev.org/667250 | 04:32 |
*** sgw has joined #openstack-infra | 04:34 | |
*** yamamoto has joined #openstack-infra | 04:34 | |
*** yamamoto has quit IRC | 04:39 | |
openstackgerrit | Merged openstack/project-config master: New project request: airship/docs https://review.opendev.org/666190 | 04:39 |
*** sgw has quit IRC | 04:41 | |
*** hongbin has joined #openstack-infra | 04:42 | |
*** ykarel has joined #openstack-infra | 04:48 | |
*** hongbin has quit IRC | 04:49 | |
*** smarcet has quit IRC | 04:52 | |
*** kjackal has joined #openstack-infra | 05:03 | |
adriant | I've got zuul suddenly failing with no code changes with an error for: EnvironmentError: mysql_config not found | 05:39 |
adriant | which is usually a lack of related mysql libs installed by the OS package manager | 05:39 |
adriant | has our zuul base image changed recently? | 05:39 |
AJaeger | adriant: yes, let me find a link for you... | 05:43 |
AJaeger | adriant: http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007272.html | 05:43 |
AJaeger | adriant: create a bindep.txt file and add mysql to it, you can look up the old fallback file from https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/bindep-fallback.txt and copy what you need from it | 05:44 |
adriant | AJaeger: ty! | 05:50 |
*** jtomasek has joined #openstack-infra | 05:51 | |
adriant | AJaeger: should I mostly be worrying about [platform:dpkg] as that's what zuul is built on, or should i try and cover most of the bases? | 05:54 |
*** pcaruana has joined #openstack-infra | 05:56 | |
*** pcaruana has quit IRC | 05:57 | |
*** pcaruana has joined #openstack-infra | 05:57 | |
openstackgerrit | Merged zuul/zuul-jobs master: Add install-devstack role https://review.opendev.org/667157 | 05:58 |
openstackgerrit | Merged zuul/zuul-jobs master: Expand documentation of test-setup role https://review.opendev.org/667244 | 05:58 |
AJaeger | adriant: I would copy gentoo, rpms as well - better now to have it in than hunt down a problem later | 06:00 |
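Each line of a `bindep.txt` names a package, optionally followed by bracketed selectors such as `platform:dpkg` or `platform:rpm`, and only lines whose selectors match the current platform get installed. The filter below is a deliberately simplified sketch written for illustration; it is not bindep's parser and ignores version constraints and profile semantics. The package names in the test are illustrative, not copied from `bindep-fallback.txt`.

```python
from typing import Iterable, List, Set


def select_packages(lines: Iterable[str], labels: Set[str]) -> List[str]:
    """Return package names whose [selectors] match the given labels.

    Simplified rules (an approximation of bindep, not its real grammar):
    - a line with no selectors always applies;
    - a "!label" selector excludes the line if that label is present;
    - otherwise at least one positive selector must be present in labels.
    """
    packages = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, _, rest = line.partition(" ")
        selectors: List[str] = []
        if "[" in rest:
            selectors = rest[rest.index("[") + 1 : rest.index("]")].split()
        negative = {s[1:] for s in selectors if s.startswith("!")}
        positive = {s for s in selectors if not s.startswith("!")}
        if negative & labels:
            continue
        if positive and not (positive & labels):
            continue
        packages.append(name)
    return packages
```

The model shows why covering several platforms is cheap insurance: an entry tagged only `[platform:dpkg]` silently selects nothing on an rpm-based node, so the missing library shows up later as a confusing build error.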
*** lpetrut has joined #openstack-infra | 06:01 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/667268 | 06:08 |
openstackgerrit | Merged opendev/system-config master: Role integration-tests : use a group match for openafs https://review.opendev.org/665585 | 06:19 |
*** pgaxatte has joined #openstack-infra | 06:26 | |
openstackgerrit | Merged opendev/system-config master: Use openstack-ci-core PPA for openafs 1.8.3 https://review.opendev.org/665320 | 06:29 |
*** e0ne has joined #openstack-infra | 06:29 | |
openstackgerrit | Merged openstack/project-config master: Normalize projects.yaml https://review.opendev.org/667268 | 06:35 |
*** slaweq has joined #openstack-infra | 06:38 | |
*** rcernin has quit IRC | 06:41 | |
*** altlogbot_0 has quit IRC | 06:46 | |
openstackgerrit | Merged opendev/system-config master: Separate openafs CI mirror https://review.opendev.org/665568 | 06:47 |
*** altlogbot_1 has joined #openstack-infra | 06:49 | |
*** altlogbot_1 has quit IRC | 06:50 | |
*** dpawlik has joined #openstack-infra | 06:55 | |
*** altlogbot_2 has joined #openstack-infra | 06:55 | |
*** e0ne has quit IRC | 06:57 | |
openstackgerrit | Adam Coldrick proposed opendev/storyboard master: Correct team iterator lists in board creation https://review.opendev.org/667275 | 06:58 |
*** jpich has joined #openstack-infra | 07:07 | |
*** tesseract has joined #openstack-infra | 07:11 | |
*** ginopc has joined #openstack-infra | 07:14 | |
yoctozepto | we have issues with Zuul atm - http://zuul.openstack.org/builds?project=openstack%2Fkolla&project=openstack%2Fkolla-ansible&result=retry_limit | 07:14 |
yoctozepto | almost everything is in retry_limit either telling us to finger or that connections to all slave nodes were lost :/ | 07:15 |
yoctozepto | this is for different branches and pipelines | 07:16 |
AJaeger | yoctozepto: when did this start? | 07:18 |
AJaeger | yoctozepto: might be that a binary is missing, please read http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007272.html | 07:19 |
yoctozepto | AJaeger: seems like during the night | 07:19 |
AJaeger | yoctozepto: the changes mentioned merged after 19:00 UTC last night... | 07:20 |
yoctozepto | AJaeger: https://review.opendev.org/663151 ? | 07:21 |
AJaeger | do you use fetch zuul-cloner? | 07:22 |
AJaeger | I was thinking about https://review.opendev.org/#/c/656195/ | 07:22 |
*** xek has joined #openstack-infra | 07:22 | |
AJaeger | yoctozepto: did you follow the log files and were able to catch content? | 07:23 |
yoctozepto | AJaeger: yeah, thought about it, now I am connected to some that I believe would fail | 07:24 |
yoctozepto | running fine so far | 07:24 |
*** hrw has joined #openstack-infra | 07:25 | |
*** yboaron_ has joined #openstack-infra | 07:26 | |
yoctozepto | see, some jobs still passed during that time | 07:26 |
*** iurygregory has joined #openstack-infra | 07:27 | |
AJaeger | could also be another problem... | 07:28 |
*** tosky has joined #openstack-infra | 07:28 | |
*** jtomasek has quit IRC | 07:31 | |
*** jtomasek has joined #openstack-infra | 07:32 | |
*** Emine has joined #openstack-infra | 07:33 | |
*** hrw has left #openstack-infra | 07:35 | |
*** yboaron_ has quit IRC | 07:36 | |
*** yboaron_ has joined #openstack-infra | 07:36 | |
*** iurygregory has quit IRC | 07:37 | |
*** virendra-sharma has quit IRC | 07:38 | |
*** ykarel is now known as ykarel|lunch | 07:41 | |
*** ccamacho has joined #openstack-infra | 07:41 | |
*** ccamacho has quit IRC | 07:41 | |
*** sshnaidm has joined #openstack-infra | 07:42 | |
*** ccamacho has joined #openstack-infra | 07:43 | |
*** jpena|off is now known as jpena | 07:48 | |
*** jpena is now known as jpena|mtg | 07:48 | |
*** apetrich has joined #openstack-infra | 07:54 | |
*** udesale has quit IRC | 08:01 | |
*** e0ne has joined #openstack-infra | 08:01 | |
*** udesale has joined #openstack-infra | 08:03 | |
*** udesale has quit IRC | 08:03 | |
*** udesale has joined #openstack-infra | 08:03 | |
*** ricolin has joined #openstack-infra | 08:05 | |
*** yboaron_ has quit IRC | 08:07 | |
yoctozepto | AJaeger: all the jobs I watched completed fine, seems to no longer be a problem, but nevertheless it failed many of those night jobs | 08:07 |
*** dchen has quit IRC | 08:09 | |
*** lucasagomes has joined #openstack-infra | 08:11 | |
*** pkopec has joined #openstack-infra | 08:12 | |
*** ricolin has quit IRC | 08:13 | |
*** yboaron_ has joined #openstack-infra | 08:21 | |
*** ricolin has joined #openstack-infra | 08:22 | |
*** tkajinam has quit IRC | 08:27 | |
*** tkajinam has joined #openstack-infra | 08:28 | |
*** ralonsoh has joined #openstack-infra | 08:28 | |
*** yboaron_ has quit IRC | 08:28 | |
*** yboaron_ has joined #openstack-infra | 08:29 | |
*** tkajinam has quit IRC | 08:29 | |
*** noama has joined #openstack-infra | 08:30 | |
AJaeger | yoctozepto: strange ;/ | 08:32 |
*** gregoryo has quit IRC | 08:33 | |
*** ykarel|lunch is now known as ykarel | 08:40 | |
*** imacdonn has quit IRC | 08:42 | |
*** imacdonn has joined #openstack-infra | 08:42 | |
*** priteau has joined #openstack-infra | 08:56 | |
*** jaosorior has joined #openstack-infra | 09:03 | |
*** yolanda has joined #openstack-infra | 09:07 | |
*** gfidente has joined #openstack-infra | 09:09 | |
*** jaosorior has quit IRC | 09:11 | |
yoctozepto | AJaeger: and here it comes again, Zuul again tells us to finger :P | 09:12 |
*** dciabrin__ has joined #openstack-infra | 09:23 | |
*** dciabrin_ has quit IRC | 09:27 | |
openstackgerrit | jacky06 proposed openstack/os-testr master: Replace git.openstack.org URLs with opendev.org URLs https://review.opendev.org/655062 | 09:33 |
*** kobis1 has joined #openstack-infra | 09:36 | |
*** jangutter_ has joined #openstack-infra | 09:43 | |
*** noonedeadpunk has quit IRC | 09:44 | |
*** bhavikdbavishi has quit IRC | 09:47 | |
*** panda has quit IRC | 09:55 | |
*** gmann has quit IRC | 09:57 | |
*** salv-orlando has joined #openstack-infra | 09:58 | |
*** panda has joined #openstack-infra | 10:00 | |
*** Lucas_Gray has joined #openstack-infra | 10:02 | |
*** ociuhandu has joined #openstack-infra | 10:13 | |
*** yamamoto has joined #openstack-infra | 10:15 | |
*** ociuhandu_ has joined #openstack-infra | 10:16 | |
*** ociuhandu has quit IRC | 10:16 | |
*** yamamoto has quit IRC | 10:17 | |
*** virendra-sharma has joined #openstack-infra | 10:22 | |
*** tdasilva_ has quit IRC | 10:24 | |
*** Lucas_Gray has quit IRC | 10:27 | |
*** kobis1 has quit IRC | 10:28 | |
*** Lucas_Gray has joined #openstack-infra | 10:29 | |
*** ykarel is now known as ykarel|meeting | 10:35 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Extend event reporting https://review.opendev.org/662134 | 10:36 |
*** gfidente has quit IRC | 10:37 | |
*** kobis1 has joined #openstack-infra | 10:42 | |
*** lucasagomes has quit IRC | 10:56 | |
*** lucasagomes has joined #openstack-infra | 10:57 | |
*** gmann has joined #openstack-infra | 10:58 | |
*** ykarel_ has joined #openstack-infra | 10:59 | |
*** jaosorior has joined #openstack-infra | 10:59 | |
*** ykarel_ has quit IRC | 11:00 | |
*** ykarel has joined #openstack-infra | 11:01 | |
*** ykarel|meeting has quit IRC | 11:01 | |
*** jaosorior has quit IRC | 11:02 | |
*** jaosorior has joined #openstack-infra | 11:03 | |
*** e0ne has quit IRC | 11:05 | |
*** e0ne has joined #openstack-infra | 11:05 | |
*** udesale has quit IRC | 11:06 | |
*** udesale has joined #openstack-infra | 11:08 | |
*** ykarel_ has joined #openstack-infra | 11:09 | |
*** Lucas_Gray has quit IRC | 11:10 | |
*** Wryhder has joined #openstack-infra | 11:10 | |
*** Wryhder is now known as Lucas_Gray | 11:11 | |
AJaeger | yoctozepto: this needs an infra-root to dig into - and more information probably | 11:11 |
*** ykarel has quit IRC | 11:12 | |
*** dciabrin_ has joined #openstack-infra | 11:13 | |
*** iurygregory has joined #openstack-infra | 11:15 | |
*** dciabrin__ has quit IRC | 11:17 | |
*** gfidente has joined #openstack-infra | 11:23 | |
*** tobiash has joined #openstack-infra | 11:25 | |
*** hwoarang has quit IRC | 11:29 | |
*** hwoarang has joined #openstack-infra | 11:30 | |
openstackgerrit | Merged opendev/storyboard master: Correct team iterator lists in board creation https://review.opendev.org/667275 | 11:36 |
*** priteau has quit IRC | 11:43 | |
*** tdasilva has joined #openstack-infra | 11:46 | |
*** _erlon_ has joined #openstack-infra | 11:48 | |
*** psachin has quit IRC | 11:54 | |
*** jamesdenton has quit IRC | 11:54 | |
zbr|ruck | so does anyone have an idea about what happens with the RETRY_LIMIT? | 11:55 |
*** bhavikdbavishi has joined #openstack-infra | 11:56 | |
pabelanger | zbr|ruck: https://zuul-ci.org/docs/zuul/user/jobs.html#build-status | 11:56 |
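As we read the linked documentation, Zuul retries a build when, for example, a pre-run playbook fails or the node becomes unreachable, and reports RETRY_LIMIT once the job's `attempts` (3 by default) are exhausted. The sketch below is a toy model of that loop, not Zuul's executor code; `run_attempt` is a hypothetical callback.

```python
from typing import Callable

# Outcomes a single attempt can have in this toy model.
SUCCESS, FAILURE, RETRYABLE = "SUCCESS", "FAILURE", "RETRYABLE"


def run_with_retries(run_attempt: Callable[[int], str], attempts: int = 3) -> str:
    """Return the final build result for a job in this simplified model.

    run_attempt(n) returns SUCCESS, FAILURE, or RETRYABLE (e.g. a pre-run
    playbook failed or the node became unreachable).
    """
    for n in range(1, attempts + 1):
        outcome = run_attempt(n)
        if outcome != RETRYABLE:
            # Success or an ordinary (run-phase) failure ends the build.
            return outcome
    # Every attempt hit a retryable error.
    return "RETRY_LIMIT"
```

In this model, a problem that hits every attempt the same way (for instance, nodes consistently lost mid-build) surfaces as RETRY_LIMIT rather than FAILURE.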
*** bhavikdbavishi1 has joined #openstack-infra | 11:59 | |
*** goldyfruit has joined #openstack-infra | 12:00 | |
yoctozepto | pabelanger: good one, made me laugh | 12:00 |
*** bhavikdbavishi has quit IRC | 12:00 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 12:00 | |
openstackgerrit | Andy Ladjadj proposed zuul/zuul master: [doc][monitoring] Fix the wait_time parent attribute https://review.opendev.org/667342 | 12:00 |
yoctozepto | zbr|ruck probably means about the current state of Zuul | 12:00 |
pabelanger | yoctozepto: do you have a log file? | 12:00 |
yoctozepto | which discards most jobs with it | 12:00 |
pabelanger | oh, let me check | 12:01 |
openstackgerrit | Andy Ladjadj proposed zuul/zuul master: [doc][monitoring] Fix the wait_time parent attribute https://review.opendev.org/667342 | 12:01 |
yoctozepto | and shows us a finger url | 12:01 |
pabelanger | just coming online, with no coffee :) | 12:01 |
yoctozepto | probably the middle one | 12:01 |
yoctozepto | nasty piece of software | 12:01 |
yoctozepto | :D | 12:01 |
pabelanger | looking now | 12:01 |
yoctozepto | pabelanger: thanks | 12:02 |
fungi | i'm not really around yet either, but i think we need to take limestone back offline: http://cacti.openstack.org/cacti/graph_view.php | 12:03 |
fungi | logan-: ^ | 12:03 |
fungi | i'll get a revert of the revert of the revert of... pushed up now | 12:03 |
pabelanger | so https://zuul.openstack.org/config-errors is the first issue | 12:04 |
pabelanger | there are a few config errors | 12:04 |
pabelanger | but, that doesn't seem to be the source of the issues | 12:04 |
*** Lucas_Gray has quit IRC | 12:04 | |
pabelanger | if somebody wants to look into that | 12:04 |
openstackgerrit | Andy Ladjadj proposed zuul/zuul master: [doc][monitoring] Fix the wait_time parent attribute https://review.opendev.org/667342 | 12:04 |
openstackgerrit | Jeremy Stanley proposed openstack/project-config master: Revert "Revert "Revert "Revert "Disable provider limestone"""" https://review.opendev.org/667343 | 12:05 |
*** goldyfruit has quit IRC | 12:05 | |
pabelanger | http://logs.openstack.org/09/665909/1/gate/openstack-tox-pylint/fc556d6/job-output.txt.gz#_2019-06-25_11_35_32_614312 | 12:05 |
fungi | and i'll bypass zuul for that since there's a good chance it won't merge | 12:05 |
pabelanger | looks like fall out of bindep being removed^ | 12:05 |
AJaeger | cloudnull: please finish the retirement of openstack/ansible-role-tripleo-cookiecutter, I commented on https://review.opendev.org/#/c/664895 | 12:05 |
pabelanger | that is trove | 12:05 |
AJaeger | fungi, +2 | 12:06 |
openstackgerrit | Merged openstack/project-config master: Revert "Revert "Revert "Revert "Disable provider limestone"""" https://review.opendev.org/667343 | 12:07 |
AJaeger | pabelanger: indeed, trove has no bindep.txt file.... | 12:07 |
fungi | well, having no bindep.txt should be fine as long as your jobs install the right packages already. if you need any additional system packages not installed by your jobs, then a bindep.txt is likely warranted (and i'm guessing that's trove's situation) | 12:08 |
AJaeger | fungi: yes, exactly - pabelanger linked to a line that uses mysql which is not installed by default | 12:08 |
AJaeger | zuul.openstack.org looks broken to me - stays in "Fetching info..." with no output | 12:09 |
pabelanger | Hmm, is zuul.o.o not working for anybody else? | 12:09 |
pabelanger | AJaeger: yah, I see that too | 12:10 |
AJaeger | pabelanger: yes, me | 12:10 |
pabelanger | zuul.o.o is swapping | 12:10 |
*** Lucas_Gray has joined #openstack-infra | 12:10 | |
pabelanger | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64794&rra_id=all | 12:10 |
pabelanger | looks like a memory leak | 12:11 |
pabelanger | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64792&rra_id=all | 12:11 |
AJaeger | that could explain some retries as well, wouldn't it? | 12:11 |
pabelanger | infra-root: I won't be able to deal with zuul.o.o outage, sadly. But looks like we need to restart | 12:11 |
AJaeger | fungi, the RETRY_LIMIT results could come from a misbehaving Zuul as well, couldn't they? | 12:12 |
elod | hi, I've also noticed that a lot of periodic jobs failed today with missing 'mysqladmin' and 'dot' commands. Is a bindep.txt with mysql-client and/or graphviz a good solution for these issues? what do you suggest? | 12:12 |
*** rfolco has joined #openstack-infra | 12:12 | |
AJaeger | elod: let me paste backscroll... | 12:13 |
AJaeger | elod: http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007272.html | 12:13 |
AJaeger | elod: create a bindep.txt file and add mysql and graphviz to it, you can look up the old fallback file from https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/bindep-fallback.txt and copy what you need from it | 12:13 |
AJaeger | So, yes, this is exactly the required solution, elod | 12:14 |
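To see which of the commands reported in this thread are absent on a node, `shutil.which` from the Python standard library is enough. The helper below is a hypothetical pre-check, not part of any Zuul job; the command list comes from the errors reported here, with `msgfmt` standing in for the gettext failure as our own assumption.

```python
import shutil
from typing import List


def missing_commands(commands: List[str]) -> List[str]:
    """Return the commands that are not found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    # Commands named in the failures above; msgfmt (gettext) is an assumption.
    wanted = ["mysql_config", "mysqladmin", "dot", "msgfmt"]
    for cmd in missing_commands(wanted):
        print(f"missing: {cmd} (add the package providing it to bindep.txt)")
```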
openstackgerrit | Mark Meyer proposed zuul/zuul master: Extend event reporting https://review.opendev.org/662134 | 12:15 |
elod | AJaeger: thanks, i've read that, just wanted to make sure | 12:15 |
elod | AJaeger: then I'll start fixing and testing | 12:16 |
*** virendra-sharma has quit IRC | 12:16 | |
AJaeger | thanks, elod ! | 12:18 |
*** salv-orlando has quit IRC | 12:19 | |
*** tdasilva has quit IRC | 12:20 | |
*** tdasilva has joined #openstack-infra | 12:25 | |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Fix retirement of ansible repos https://review.opendev.org/667346 | 12:26 |
AJaeger | pabelanger: this should fix the Zuul errors you noticed ^ | 12:26 |
*** rlandy has joined #openstack-infra | 12:29 | |
*** goldyfruit has joined #openstack-infra | 12:33 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Remove non working tests/base.py ZuulTestCase.getPipeline method https://review.opendev.org/667351 | 12:40 |
*** ykarel_ is now known as ykarel|away | 12:48 | |
*** jcoufal has joined #openstack-infra | 12:50 | |
*** mriedem has joined #openstack-infra | 12:50 | |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove job legacy-puppet-beaker-rspec https://review.opendev.org/667357 | 12:50 |
*** jangutter_ is now known as jangutter | 12:51 | |
portdirect | hey - we have been having some issues with our docs jobs over the last day | 12:54 |
portdirect | eg: http://logs.openstack.org/24/667224/2/check/openstack-tox-docs/fa79d3d/job-output.txt.gz#_2019-06-24_20_50_55_524899 | 12:55 |
portdirect | looks like gettext is missing? | 12:55 |
AJaeger | portdirect, this could be a fallout from http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007272.html | 12:55 |
AJaeger | So, create a bindep.txt file and add gettext to it, you can look up the old fallback file from https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/bindep-fallback.txt and copy what you need from it | 12:56 |
portdirect | awesome - thanks AJaeger | 12:57 |
*** ykarel|away has quit IRC | 13:01 | |
*** smarcet has joined #openstack-infra | 13:01 | |
*** ekultails has joined #openstack-infra | 13:03 | |
*** dave-mccowan has joined #openstack-infra | 13:06 | |
*** aaronsheffield has joined #openstack-infra | 13:09 | |
*** lseki has joined #openstack-infra | 13:10 | |
*** sthussey has joined #openstack-infra | 13:10 | |
*** kjackal has quit IRC | 13:14 | |
*** kjackal has joined #openstack-infra | 13:14 | |
*** salv-orlando has joined #openstack-infra | 13:16 | |
*** whoami-rajat has quit IRC | 13:16 | |
*** dave-mccowan has quit IRC | 13:18 | |
*** bdodd has quit IRC | 13:22 | |
zbr|ruck | pabelanger: fungi : https://review.opendev.org/#/c/667346/ zuul config error fix, ok to merge? ... 30min to lint it, less cool. | 13:29 |
fungi | yeah, i approved it | 13:30 |
fungi | i'm mostly trying to understand the memory profiling mechanism corvus restarted the zuul scheduler onto so i can determine whether (and how) stats should be extracted from it before performing an emergency restart | 13:31 |
fungi | since i expect this is the exact memory leak we've been trying to catch | 13:31 |
fungi | unfortunately searching past discussions for "repl" is challenging since it's also a substring of words like "replication" | 13:35 |
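One way around that substring problem is a regular expression with word boundaries, so `repl` matches only as a whole word. A small sketch using Python's `re` module; the sample lines in the test are made up for illustration.

```python
import re
from typing import Iterable, List


def grep_word(lines: Iterable[str], word: str) -> List[str]:
    """Return only the lines that mention `word` as a standalone word."""
    # \b anchors on word boundaries, so "replication" does not match "repl".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]
```

On the command line, GNU grep's `-w` option gives the same whole-word behaviour.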
*** mriedem is now known as mriedem_afk | 13:38 | |
*** brett-soric has joined #openstack-infra | 13:39 | |
*** bhavikdbavishi has quit IRC | 13:39 | |
fungi | looks like it's been incorporated by cherry-picking https://review.opendev.org/579962 | 13:39 |
fungi | and then turned on with the rpc client by running `zuul repl` | 13:40 |
*** Goneri has joined #openstack-infra | 13:40 | |
fungi | and listens on localhost:3000/tcp | 13:40 |
fungi | which is currently listening | 13:41 |
openstackgerrit | Merged openstack/project-config master: Fix retirement of ansible repos https://review.opendev.org/667346 | 13:41 |
fungi | so i take that to mean it's active | 13:41 |
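A listening socket is a reasonable sign, though it only proves that *something* accepts connections on that port. The minimal sketch below performs the same check from Python; the defaults mirror the localhost:3000/tcp mentioned above.

```python
import socket


def is_listening(host: str = "localhost", port: int = 3000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    This shows that some process is accepting connections there,
    not which process it is.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

To confirm which process owns the socket, `ss -ltnp` run as root on the host also lists the owning process.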
*** munimeha1 has joined #openstack-infra | 13:42 | |
fungi | this unfortunately doesn't tell me how he was hoping to use it to diagnose the memory leak, so still digging | 13:42 |
*** eharney has joined #openstack-infra | 13:44 | |
fungi | mention in #zuul a while back of using it to call objgraph.show_backrefs() on various objects (though i don't see him say which ones) | 13:45 |
fungi | another mention there of "enter python commands to inspect the memory state and use objgraph to dump it to a file" | 13:46 |
*** bdodd has joined #openstack-infra | 13:46 | |
fungi | at this point corvus will likely be around soon, so probably best if we can just hold off restarting the scheduler until he's on hand since i've pulled about all i can from past discussions and docs | 13:47 |
fungi | odds are he already has at least some suspicions as to which objects are candidates for bloat | 13:48 |
*** openstackgerrit has quit IRC | 13:48 | |
cloudnull | AJaeger on it | 13:51 |
AJaeger | thanks, cloudnull | 13:51 |
*** openstackgerrit has joined #openstack-infra | 13:53 | |
openstackgerrit | Simon Westphahl proposed zuul/nodepool master: wip: Allow proceeding with requests on quota exceeded https://review.opendev.org/667371 | 13:53 |
openstackgerrit | Simon Westphahl proposed zuul/nodepool master: wip: Allow proceeding with requests on quota exceeded https://review.opendev.org/667371 | 13:55 |
*** yamamoto has joined #openstack-infra | 13:55 | |
*** kjackal has quit IRC | 13:57 | |
*** kjackal has joined #openstack-infra | 13:57 | |
*** aedc has joined #openstack-infra | 14:01 | |
*** Goneri has quit IRC | 14:03 | |
*** ykarel has joined #openstack-infra | 14:06 | |
*** michael-beaver has joined #openstack-infra | 14:10 | |
clarkb | fungi: likely the configuration objects, but I couldn't tell you how to get at those with objgraph | 14:12 |
roman_g | Hello team. Is there a cached golang distro repository somewhere in OpenInfra I can re-use? Or may be fresh golang installed onto some of the images I could utilize via Zuul? | 14:13 |
AJaeger | mordred: want to abandon https://review.opendev.org/641474? We should merge https://review.opendev.org/667228 instead... | 14:14 |
roman_g | Need golang v1.12. Ubuntu provides older versions. | 14:14 |
clarkb | roman_g: there isnt | 14:15 |
roman_g | clarkb: 123MB. Is this fine? | 14:16 |
fungi | have you checked to see what versions fedora-30 or opensuse-tumbleweed provide? | 14:16 |
clarkb | fine in what context? | 14:16 |
roman_g | If I'd download it on each gate :) | 14:16 |
*** Goneri has joined #openstack-infra | 14:16 | |
roman_g | fungi: will check, good idea | 14:16 |
fungi | we only have one gate. guessing you mean in each build | 14:17 |
roman_g | yes, sorry | 14:17 |
fungi | (zuul is our gate) | 14:17 |
roman_g | fungi: fedora 30 has 1.12. Thank you! | 14:19 |
corvus | clarkb, pabelanger, fungi: it's probably too late to get any useful info from zuul | 14:19 |
corvus | you really need to do that before it starts swapping | 14:19 |
corvus | otherwise you'll just spend 10 hours waiting for it to swap all the memory back in to scan for objects | 14:19 |
fungi | unfortunately it looks like it started swapping well after i was asleep | 14:19 |
fungi | okay, so just dump copies of the queues and restart? i'll give #openstack-release a heads up to pause approvals | 14:20 |
*** mriedem_afk is now known as mriedem | 14:20 | |
corvus | looking at the graph, probably anytime after june 20th would have been interesting | 14:20 |
fungi | i let them know | 14:22 |
corvus | who is taking point on debugging this? | 14:22 |
fungi | so what commit should we install before restarting? | 14:22 |
fungi | i can set myself a daily reminder to check the memory graph for early signs of the leak | 14:22 |
fungi | though i'll be offline most of next week | 14:22 |
corvus | i'll go ahead and see if i can get a rough object count at least | 14:23 |
corvus | but i kind of thought someone else was taking this one | 14:24 |
fungi | i can, just not sure what i'm looking for (or at) | 14:24 |
roman_g | fedora-30 image isn't yet available here, I think https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/nodesets.yaml | 14:25 |
roman_g | fedora-29 is available | 14:25 |
corvus | i don't either ... i start from first principles every time :) | 14:25 |
fungi | fair | 14:25 |
*** sgw has joined #openstack-infra | 14:25 | |
fungi | once we've got it restarted i'll familiarize myself with how to list objects and what the object graph output looks like | 14:25 |
fungi | i could probably stand to learn a bit about python's memory management anyway | 14:26 |
corvus | this is where i learned most of what i know: https://mg.pov.lt/objgraph/ | 14:26 |
fungi | thanks, that looks like an excellent resource | 14:27 |
*** brett-soric has left #openstack-infra | 14:27 | |
*** dpawlik has quit IRC | 14:27 | |
corvus | fungi: i started a screen on zuul01 | 14:28 |
corvus | as root | 14:28 |
fungi | and i've joined it | 14:28 |
fungi | ahh, neat, so it can list object types by frequency of presence in memory? | 14:29 |
corvus | yep | 14:29 |
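objgraph exposes this as `show_most_common_types()`. The stdlib-only sketch below approximates the same idea with `gc.get_objects()` and `collections.Counter`; it is not objgraph's implementation, and like any full heap walk it touches every tracked object, which is exactly what becomes slow once a process is swapping.

```python
import gc
from collections import Counter
from typing import List, Tuple


def most_common_types(limit: int = 10) -> List[Tuple[str, int]]:
    """Count live GC-tracked objects by type name, most frequent first.

    Only container objects tracked by the garbage collector are seen,
    so atomic objects such as most ints and strings are not counted.
    """
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)
```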
corvus | that's going to cause some swapping, but maybe it'll finish relatively fast... like <30m? | 14:29 |
fungi | and chances are even this may not return in any reasonable amount of time due to memory pressure? | 14:30 |
fungi | yeah, figured | 14:30 |
roman_g | fedora-29 has golang 1.11 only, not 1.12. | 14:30 |
corvus | yeah, it's possible. we should think about how much time we should give it before we give up | 14:30 |
corvus | the more complex stuff, like actually tracing object links would certainly take many hours at this point | 14:31 |
roman_g | What is the process to add fedora-30 image to the Zuul? | 14:31 |
fungi | roman_g: i saw someone mention working on adding fedora-30, not sure who it was or if that's close to done yet though... have you checked opensuse-tumbleweed? that should be present (and current) | 14:31 |
fungi | corvus: how about until now? ;) | 14:31 |
*** salv-orlando has quit IRC | 14:31 | |
corvus | fungi: heh, i would have given it a few more mins even :) | 14:32 |
clarkb | roman_g, fungi: another option is to use a golang docker container and we'll cache the layer objects | 14:32 |
fungi | indeed, i was originally about to say 14:45 | 14:32 |
fungi | so we have 35239933 mappingproxy objects? | 14:32 |
fungi | not familiar with that object type | 14:32 |
roman_g | clarkb: yes, that's also a good option. | 14:33 |
corvus | fungi: it's a read-only dict | 14:33 |
roman_g | thank you, clarkb | 14:33 |
fungi | ahh, so doesn't really tell us much | 14:33 |
fungi | used for internal representation of things like class attributes | 14:33 |
corvus | fungi: but we mostly use it in zuul's config, so it's an indication that, somehow, zuul configuration objects are involved. | 14:33 |
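For context on the `mappingproxy` count: in Python this is the type returned by `types.MappingProxyType`, a read-only view over an ordinary dict. Class `__dict__` attributes are also exposed this way. Here is a small stdlib illustration. The `freeze` and `try_mutate` helpers are hypothetical names and do not come from Zuul's code.

```python
import types


def freeze(config):
    """Wrap a dict in a read-only view (hypothetical helper, for illustration)."""
    return types.MappingProxyType(config)


def try_mutate(view, key, value):
    """Attempt to write through the view; return the error message if refused."""
    try:
        view[key] = value
    except TypeError as exc:
        return str(exc)
    return None


backing = {"name": "example-job", "voting": True}
frozen = freeze(backing)
print(type(frozen).__name__)                # mappingproxy
print(try_mutate(frozen, "voting", False))  # a TypeError message: writes are refused
backing["voting"] = False                   # the proxy is a live view of its dict
print(frozen["voting"])                     # False
```

Each proxy is a separate object wrapping a dict, so tens of millions of them point to a very large number of frozen mappings still being held somewhere.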
fungi | oh, good to know | 14:34 |
fungi | anyway, if nothing else, we have a baseline object count to compare against after we reach a steady state following the restart (for example tomorrow)? | 14:34 |
corvus | let's see if that returns in any reasonable time | 14:35 |
fungi | i'll get all this into a paste for comparison against the coming days | 14:36 |
fungi | or do we already have an etherpad going for the memory leak investigation prior to today? | 14:36 |
*** Goneri has quit IRC | 14:36 | |
*** aedc has quit IRC | 14:37 | |
corvus | fungi: not that i'm aware of, but here are all my notes from previous investigations: | 14:37 |
corvus | https://etherpad.openstack.org/p/zuul-memory-leak | 14:37 |
corvus | i just dumped them in there | 14:37 |
fungi | oh, that wfm | 14:38 |
corvus | unfortunately, it's context free, but there are a bunch of potentially useful functions | 14:38 |
fungi | i see that | 14:38 |
*** lpetrut has quit IRC | 14:38 | |
corvus | i'm stepping through that first function now | 14:38 |
corvus | i'm going to rework that for multi-tenancy real quick | 14:40 |
fungi | so we don't really have a lot of layouts in flight, doesn't look like? | 14:40 |
corvus | right | 14:40 |
corvus | highly suggestive of leaked layouts | 14:40 |
corvus | i want to get the numbers per tenant to see if it's consistent across tenants | 14:41 |
fungi | ahh | 14:41 |
corvus | maybe our use of multi-tenancy is the behavior change which triggered this | 14:41 |
*** panda has quit IRC | 14:41 | |
clarkb | that may also explain why we are seeing it when others don't seem to be | 14:43 |
*** panda has joined #openstack-infra | 14:43 | |
fungi | an interesting theory | 14:43 |
*** yamamoto has quit IRC | 14:45 | |
*** ykarel is now known as ykarel|afk | 14:45 | |
*** yamamoto has joined #openstack-infra | 14:45 | |
clarkb | unrelated, but one of the things on my catch up todo list is to clean up the nodepool control plane image build stuff further. Any idea if mordred will be around at some point (as his eyeballs on that cleanup would be helpful) | 14:45 |
corvus | zero seems an unusually small number of layouts to have for a tenant | 14:45 |
fungi | yeah | 14:46 |
corvus | clarkb: his email on the subject said he didn't know what this week would be like, but he thought he would be working. so, i think it's "maybe this week, should be back by next" | 14:46 |
fungi | so it's saying *all* the layouts are in the openstack tenant? | 14:46 |
corvus | yeah, that seems unright | 14:46 |
clarkb | corvus: thanks | 14:46 |
corvus | fungi: oh, i see | 14:46 |
corvus | fungi: we're only looking for layouts for enqueued items, so it just means there's nothing running in the other tenants | 14:47 |
*** yamamoto has quit IRC | 14:47 | |
corvus | presumably each tenant also still has its own currently running layout, we're just not counting it there | 14:47 |
fungi | got it. this is not the non-speculative layouts | 14:47 |
fungi | because we're iterating over items in queues there to get that count | 14:48 |
corvus | so we've leaked 350 layouts (or, really, maybe 346)... | 14:48 |
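The counting described above is essentially a set difference. You find every live layout object, then subtract the ones you can account for: the layouts attached to enqueued items plus each tenant's current layout. Whatever is left is a leak candidate. The following is a minimal sketch of that idea. The `Layout` class is a hypothetical stand-in, and Zuul's real model objects and attribute paths differ.

```python
import gc


class Layout:
    """Hypothetical stand-in for a scheduler layout object."""


def find_unaccounted(cls, expected):
    """Return live instances of ``cls`` that are not in ``expected``.

    ``expected`` holds the objects we can explain (for example the layouts
    attached to enqueued items plus each tenant's current layout); anything
    else still alive is a leak candidate.
    """
    expected_ids = {id(obj) for obj in expected}
    return [obj for obj in gc.get_objects()
            if isinstance(obj, cls) and id(obj) not in expected_ids]


current = Layout()         # e.g. a tenant's active layout
item_layouts = [Layout()]  # e.g. layouts of enqueued items
stray = Layout()           # nothing accounts for this one
leaked = find_unaccounted(Layout, [current, *item_layouts])
print(len(leaked))  # 1
```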
fungi | how much memory would we expect those to consume? | 14:48 |
*** igordc has joined #openstack-infra | 14:49 | |
corvus | handful of mbytes each | 14:49 |
corvus | maybe more than a handful | 14:49 |
corvus | i forgot to say "print(tenant)" | 14:50 |
*** Goneri has joined #openstack-infra | 14:50 | |
corvus | i'm not sure what it's doing now | 14:50 |
fungi | eep | 14:50 |
fungi | loading the tenant object before it can try to execute it (and find out it's not callable)? | 14:50 |
corvus | yeah, i would have thought it would still be in memory | 14:50 |
*** pgaxatte has quit IRC | 14:51 | |
corvus | we're heavily swapping now | 14:53 |
fungi | oof | 14:53 |
corvus | fungi: i replaced the method in the etherpad with the more tenant-aware version | 14:54 |
fungi | maybe that's why repl has paused on us, and it's unrelated to the "tenant" function-not-function | 14:54 |
corvus | yeah could be | 14:54 |
corvus | i also stuck in a thing to add in the non-speculative layouts for each tenant so they are correctly accounted for | 14:55 |
corvus | the idea is that at the end of this function, we have a handle to all of the leaked layout objects | 14:55 |
fungi | ooh, thanks! | 14:55 |
*** udesale has quit IRC | 14:55 | |
corvus | in the past, at that point, i've picked one at random, and started having objgraph render graphs around it to try to figure out who's holding on to it | 14:55 |
corvus | apparently by "at random" i mean "the first one in the list" | 14:56 |
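objgraph's back-reference graphs answer the question "who is holding on to this object?". One hop of that can be done with the stdlib `gc.get_referrers()`, as in the sketch below. Repeating it on each referrer walks outward toward the owner. The `Layout` class and `cache` dict are hypothetical. Expect noise in the result, such as frames and module globals dicts, alongside the real owner.

```python
import gc


def describe_referrers(target):
    """Return the type names of objects that directly hold a reference to ``target``.

    Only containers tracked by the garbage collector are found. The list can
    include frames and module globals dicts as well as the "real" owner.
    """
    return [type(ref).__name__ for ref in gc.get_referrers(target)]


class Layout:
    """Hypothetical stand-in for a leaked object."""


suspect = Layout()
cache = {"latest": suspect}  # something that forgot to let go
print(describe_referrers(suspect))  # includes 'dict' (both `cache` and module globals)
```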
fungi | what are the odds we're still going to be able to dump copies of the gate/check pipeline contents at this point? probably have to wait for the paging to calm down again? | 14:56 |
*** rakhmerov has joined #openstack-infra | 14:56 | |
clarkb | fungi: we should have the "historical" recorded versions of the status | 14:56 |
fungi | oh, right | 14:56 |
corvus | fungi: yes... now that you mention it, it looks like lines 1-29 are basically a procedure for this situation | 14:56 |
fungi | the ones we dump to disk | 14:57 |
clarkb | fungi: ya those | 14:57 |
rakhmerov | hi, just joined the channel and want to make sure that you're aware of issues with CI | 14:57 |
rakhmerov | a lot of RETRY_LIMIT statuses | 14:57 |
corvus | fungi: and some of those lines say "save the queues" | 14:58 |
fungi | rakhmerov: yep, we're about to restart the zuul scheduler for that reason | 14:58 |
rakhmerov | ok, thanks | 14:58 |
fungi | (or at least we assume the RETRY_LIMIT results are related to heavy swapping from this memory leak) | 14:58 |
*** munimeha1 has quit IRC | 14:59 | |
AJaeger | fungi: so, want to renable limestone? | 14:59 |
AJaeger | (after some time...) | 14:59 |
fungi | AJaeger: no, i expect the network connectivity issues in limestone are unrelated to this | 15:00 |
fungi | i saw lots of packet loss between cacti and the mirror there | 15:00 |
corvus | fungi: what's our budget for waiting around on this? think we should restart now? | 15:01 |
fungi | probably time to cut our losses on it, yes | 15:02 |
fungi | oh! and it returned | 15:02 |
corvus | ha it returned | 15:02 |
fungi | ;) | 15:02 |
fungi | okay, so same result | 15:03 |
corvus | yeah, i put the non-dynamic layouts in the sched_layouts variable only though | 15:04 |
fungi | so we have 3 of those i guess? | 15:04 |
corvus | so we're down to 348 leaked layouts | 15:04 |
*** yboaron_ has quit IRC | 15:04 | |
corvus | i saved the queues | 15:05 |
openstackgerrit | Hervé Beraud proposed openstack/pbr master: Fix parsing on egg names with dashes from git URLs https://review.opendev.org/648727 | 15:05 |
corvus | fungi: if we're lucky, we'll get a graph out of that one | 15:05 |
corvus | fungi: or it could take forever due to swapping | 15:05 |
corvus | fungi: i'll let you decide how long to wait :) | 15:06 |
*** bexelbie has joined #openstack-infra | 15:07 | |
*** jpena|mtg is now known as jpena|off | 15:11 | |
fungi | i'm also on a conference call now, so trying to juggle that | 15:11 |
fungi | okay, my updates are out of the way so can focus on this more again | 15:12 |
fungi | sorry about that | 15:12 |
fungi | still trying to record the useful values in a paste as well for comparison to post-restart values in the coming weeks | 15:14 |
openstackgerrit | Hervé Beraud proposed openstack/pbr master: Fix parsing on egg names with dashes from git URLs https://review.opendev.org/648727 | 15:16 |
cloudnull | Hey all - is there a way to limit concurrency for a set of jobs? | 15:16 |
cloudnull | We're adding jobs to tripleo-ansible for the roles we add to the repo [https://github.com/openstack/tripleo-ansible/blob/master/zuul.d/molecule.yaml#L4-L18] | 15:16 |
clarkb | cloudnull: you can have jobs wait on the results of other job(s) | 15:16 |
cloudnull | To make sure we're not running everything all the time we have the files filter set [https://github.com/openstack/tripleo-ansible/blob/master/zuul.d/molecule.yaml#L37-L38] which results in us limiting what runs when things change. | 15:17 |
*** lpetrut has joined #openstack-infra | 15:17 | |
cloudnull | However, we also want to make sure we're not creating a situation where we have N roles and someone makes some change that hits all of them resulting in N+ jobs, all trying to schedule/run at once. | 15:17 |
cloudnull | So, is there a way that we could add some kind of a guard to ensure we're only ever running something like 5 jobs at a time? | 15:17 |
corvus | cloudnull: if a change touches N roles, why not run all the jobs? | 15:19 |
cloudnull | corvus I'd want it to run all of them, just not all at once | 15:19 |
corvus | cloudnull: oh, you mean just to be nice to the rest of the system? | 15:20 |
* cloudnull trying to be a good zuul-izen | 15:20 | |
corvus | cloudnull: i wouldn't fret too much -- the scheduling algorithm is relatively fair | 15:20 |
cloudnull | fair enough. | 15:21 |
cloudnull | clarkb do you have a link to docs on setting up waits, I think we should do that anyway. | 15:21 |
*** tdasilva has quit IRC | 15:21 | |
corvus | (and jobs which have been waiting on other jobs actually get a bonus in the scheduler, so that they get a node faster, so trying to stage them out like that could be counter-productive) | 15:22 |
*** ccamacho has quit IRC | 15:22 | |
*** yamamoto has joined #openstack-infra | 15:23 | |
corvus | (the idea is that if a job waited on the completion of another job, it shouldn't have to wait again -- it's already served its time waiting, so it jumps to the head of the queue) | 15:23 |
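The ordering rule corvus describes can be pictured as a priority queue: requests from jobs that already waited on a dependency are served before fresh requests, and ties are served first-in, first-out. The following is a toy model only. It is not Zuul's or Nodepool's actual scheduling code, and the class and job names are hypothetical.

```python
import heapq
import itertools


class ToyNodeRequestQueue:
    """Toy model: requests flagged as having already waited are served first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # FIFO tie-breaker within a priority

    def submit(self, job, already_waited=False):
        priority = 0 if already_waited else 1  # lower value is served first
        heapq.heappush(self._heap, (priority, next(self._counter), job))

    def next_job(self):
        return heapq.heappop(self._heap)[2]


queue = ToyNodeRequestQueue()
queue.submit("unit-tests")
queue.submit("linters")
queue.submit("integration", already_waited=True)  # its parent job just finished
print([queue.next_job() for _ in range(3)])  # ['integration', 'unit-tests', 'linters']
```

This is why staging jobs manually can backfire. The staged jobs get the bonus and jump ahead of other projects' fresh requests.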
*** lpetrut has quit IRC | 15:23 | |
cloudnull | TIL - thanks corvus! | 15:24 |
clarkb | cloudnull: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.requires to block on resources https://zuul-ci.org/docs/zuul/user/config.html#attr-job.dependencies to dep on jobs directly | 15:24 |
*** kobis1 has quit IRC | 15:25 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: URLTrigger driver time based https://review.opendev.org/635567 | 15:26 |
*** xek has quit IRC | 15:26 | |
*** e0ne has quit IRC | 15:26 | |
cloudnull | Tyvm | 15:28 |
*** yamamoto has quit IRC | 15:28 | |
corvus | fungi: should we give up? | 15:29 |
fungi | a perennial question | 15:29 |
fungi | i guess we're at ~1 hr of poking, so probably a good time to put a pin in it here and see what we can get next round when things aren't slowed by swapping | 15:30 |
fungi | so should we restart on tip of master with the repl change cherry-picked again? | 15:30 |
corvus | fungi: yes, then +3 https://review.opendev.org/579962 :) | 15:31 |
corvus | fungi: we could probably just re-install from my last checkout | 15:31 |
fungi | yep, i was wanting to take a closer look at it and then approve | 15:31 |
corvus | i don't think we need anything newer really | 15:31 |
fungi | what was your last checkout? i couldn't find a reference in the channel log | 15:32 |
corvus | ~corvus/zuul | 15:32 |
fungi | not in /opt/zuul presumably | 15:32 |
*** igordc has quit IRC | 15:32 | |
fungi | aha, thanks | 15:32 |
fungi | shall i pip install that now? | 15:32 |
corvus | just a 'sudo pip3 install ~corvus/zuul' should do it. yeah | 15:32 |
clarkb | and python3 /usr/local/bin/pbr freeze | grep zuul to confirm | 15:33 |
fungi | done | 15:33 |
zbr|ruck | corvus: fungi: regarding job dependencies, i was trying to use them but apparently i cannot practically implement it due to lack of wildcards. | 15:33 |
fungi | yep, zuul==3.8.2.dev142 # git sha a90f1c3 | 15:33 |
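As an alternative to grepping `pbr freeze`, the installed version can be read with the stdlib `importlib.metadata` (Python 3.8+). Unlike `pbr freeze`, this reports only the version string and not the git sha, so the sha check above still needs pbr. This is a minimal sketch.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or None if zuul is not installed in this environment.
    print(installed_version("zuul"))
```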
*** iurygregory has quit IRC | 15:33 | |
clarkb | zbr|ruck: the requires attribute I linked above may be more flexible | 15:34 |
zbr|ruck | i ended up creating two feature requests in zuul: https://storyboard.openstack.org/#!/story/2005952 and https://storyboard.openstack.org/#!/story/2005951 -- not sure how hard they are to implement. | 15:34 |
fungi | should we try to dump the pipeline contents again, or just use the one already on disk? | 15:34 |
*** xek has joined #openstack-infra | 15:34 | |
clarkb | zbr|ruck: then jobs can provide some "resource" that may just be abstract and jobs can depend on that. But I think it requires a bit more work to get that in place | 15:34 |
zbr|ruck | mainly I want to split jobs into 3 waves: fast, medium, slow. | 15:35 |
corvus | fungi: i have one from 30m ago in ~root, or we could see if there's something in the archive | 15:35 |
clarkb | corvus: fungi I'm guessing the one from 30 minutes ago is probably up to date enough based on my inability to get a status | 15:35 |
fungi | corvus: how long did it take to return when you ran it? | 15:36 |
corvus | zbr|ruck, clarkb: is this the thing where people say "i want to run the fast jobs first, then the slow ones only if they succeed"? that's been asked a thousand times and every time we've said "please don't do that" | 15:36 |
fungi | but yeah, we can just use that one i guess | 15:36 |
corvus | fungi: was fast | 15:36 |
clarkb | corvus: yes I think so | 15:36 |
fungi | oh, if it was fast... trying to do another dump now as check_new.sh and gate_new.sh | 15:36 |
mwhahaha | is zuul.openstack.org unresponsive for anyone else? | 15:36 |
corvus | clarkb: provides/requires is for inter-change communication, it won't create dependencies between jobs within a change. | 15:37 |
fungi | i'll give it a minute and if it doesn't return we can use the existing check.sh and gate.sh | 15:37 |
corvus | fungi: ++ | 15:37 |
clarkb | corvus: oh I thought it was used for waiting for our images to build within a change too | 15:37 |
corvus | clarkb: nope, still need job.dependencies for that | 15:37 |
clarkb | corvus: the whole run image build and registry, then pause, and start the other jobs. Gotcha | 15:37 |
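Within a single change, `job.dependencies` forms a directed graph. Jobs whose dependencies are all finished can start together. The sketch below models that with the stdlib `graphlib.TopologicalSorter` (Python 3.9+), grouping jobs into "waves". The job names are hypothetical. The sketch also ignores Zuul's ability to let a paused job, such as a registry job, release its dependents while it keeps running.

```python
from graphlib import TopologicalSorter


def run_waves(dependencies):
    """Group jobs into waves: each wave only needs jobs from earlier waves.

    ``dependencies`` maps a job name to the jobs it waits on, loosely mirroring
    what job.dependencies expresses within a single change.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


jobs = {
    "build-image": [],
    "tox-pep8": [],
    "functional-a": ["build-image"],
    "functional-b": ["build-image"],
}
print(run_waves(jobs))
# [['build-image', 'tox-pep8'], ['functional-a', 'functional-b']]
```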
zbr|ruck | corvus: but why? i hear people here complaining about tripleo using lots of resources, and I see zuul waiting 4h to report a failure on linting... while also running deployments. It makes no sense to me to waste resources this way. | 15:37 |
*** jaosorior has quit IRC | 15:38 | |
clarkb | zbr|ruck: because we tried it and it resulted in a lot more churn | 15:38 |
*** jaosorior has joined #openstack-infra | 15:38 | |
clarkb | zbr|ruck: instead of one patchset to fix errors, it's now a patchset for each class of error | 15:39 |
fungi | okay, status dumps are not returning yet. i'll cancel these and do the service restart | 15:39 |
clarkb | and in aggregate that consumes more resources. | 15:39 |
corvus | zbr|ruck: it wastes less developer time -- you can see all the things that are wrong with a change at once | 15:39 |
*** Emine has quit IRC | 15:39 | |
yoctozepto | guys, how are RETRY_LIMITs feeling? :D | 15:40 |
fungi | corvus: i have a feeling i'm going to need to kill the scheduler while the call in the repl is still going | 15:40 |
fungi | any objection? | 15:40 |
corvus | fungi: nope | 15:40 |
clarkb | fungi: you may have to remove the pidfile if you do that (but no objection) | 15:40 |
zbr|ruck | corvus: i really doubt it saves developer time if the developer has to wait an extra 2h to get the response from zuul, when the linting jobs failed 1h 50m before the response was sent. | 15:40 |
clarkb | zbr|ruck: they can also run tox locally or check zuul status directly | 15:40 |
clarkb | zbr|ruck: it doesn't actually take 4 hours to discover that data | 15:41 |
corvus | zbr|ruck: yeah, i mean, presumably they have something else to do during that time :) | 15:41 |
fungi | okay, cleanly stopped | 15:41 |
fungi | starting again now | 15:41 |
fungi | well, not *cleanly* stopped but i was able to introduce a segfault in the child and parent | 15:42 |
zbr|ruck | ...yeah, so everyone should install greasemonkey scripts or other tools/tricks in order to read CI status; not sure that counts as the best UX. | 15:42 |
fungi | since they were ignoring sigterm | 15:42 |
corvus | zbr|ruck: if the author has nothing better to do, then, yes, watch the status page. if the author has something better to do, then in 4 hours they will find out everything that's wrong with the patch. | 15:43 |
clarkb | zbr|ruck: I check it just fine in the existing ui... what is greasemonkey providing (or in another way what do you think is missing) | 15:43 |
yoctozepto | oh, come on, don't ignore me, I have something even better: if you could take a look at http://logs.openstack.org/08/665908/1/check/kolla-build-ubuntu-binary/886f200/ we have discrepancy between what ara says and what is in the txt log as if Zuul ran two instances in parallel? | 15:43 |
*** xek has quit IRC | 15:43 | |
corvus | yoctozepto: i don't think anyone meant to ignore you | 15:44 |
corvus | it's just that we're in the middle of a service restart | 15:44 |
fungi | okay, scheduler has started again | 15:44 |
yoctozepto | corvus: it's ok, I seem unable to express the joyful tone in my messages ;D | 15:44 |
fungi | i had to manually remove the pidfile since the way in which it died did not remove it | 15:44 |
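A process killed by a signal such as SIGSEGV never runs its cleanup, so its pidfile stays behind. You can check whether a pidfile is stale before deleting it by sending signal 0, which only tests whether the PID exists. The following is a minimal stdlib sketch. The path is whatever your service uses. On a busy host the PID may already have been reused by an unrelated process, so treat a "not stale" answer with some care.

```python
import os


def pidfile_is_stale(path):
    """Return True if ``path`` names a PID that is no longer running."""
    try:
        with open(path) as handle:
            pid = int(handle.read().strip())
    except FileNotFoundError:
        return False  # no pidfile, nothing to clean up
    try:
        os.kill(pid, 0)  # signal 0 only checks that the PID exists
    except ProcessLookupError:
        return True
    except PermissionError:
        return False  # the PID exists but belongs to another user
    return False
```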
fungi | corvus: do i need to wait for the cat jobs to complete before reenqueuing? | 15:45 |
corvus | fungi: best do, yes, otherwise it may timeout | 15:45 |
clarkb | fungi: yes | 15:45 |
fungi | cool, just making sure | 15:45 |
clarkb | what happens if you don't wait is that all the enqueue commands that run before the config update fail | 15:46 |
corvus | yoctozepto: what's the discrepancy? | 15:46 |
*** kjackal has quit IRC | 15:47 | |
*** kjackal has joined #openstack-infra | 15:47 | |
zbr|ruck | clarkb: the script is showing me the build status before zuul reports them (xx%, which one failed), so the gerrit user is not forced to visit another webpage to figureout what is happening with his change. I am referring to that script https://github.com/openstack/coats/blob/master/coats/openstack_gerrit_zuul_status.user.js | 15:47 |
fungi | looks like the cat jobs just completed | 15:47 |
corvus | yoctozepto: oh, "sudo mkdir /opt/kolla_registry" ? | 15:48 |
yoctozepto | corvus: yeah | 15:48 |
clarkb | zbr|ruck: so there is actually (disabled) code in our gerrit js to do that for you. It required some updates to the zuul api to request change specific statuses which I think exists now. So if someone wanted to work on reenabling that code path we could do that | 15:48 |
fungi | okay, reenqueuing now | 15:49 |
yoctozepto | it worked just fine in txt | 15:49 |
yoctozepto | and ara shows failure | 15:49 |
yoctozepto | as if it ran once more | 15:49 |
clarkb | zbr|ruck: the last time we did it you had to pull the entire status json blob for all changes for each change and that overwhelmed the zuul server | 15:49 |
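To see why the old inline approach was expensive, consider what each browser tab had to do. It downloaded the entire status document and then searched it for a single change, like the function below. This sketch assumes the status JSON is laid out as pipelines -> change_queues -> heads -> items, with item ids such as `"665908,1"`. That layout is an assumption and may differ between Zuul versions.

```python
def items_for_change(status, change_id):
    """Pick the queue items for one change out of a full status document.

    Assumes the layout pipelines -> change_queues -> heads -> items, where each
    item has an "id" such as "665908,1"; adjust if your Zuul version differs.
    """
    matches = []
    for pipeline in status.get("pipelines", []):
        for queue in pipeline.get("change_queues", []):
            for head in queue.get("heads", []):
                for item in head:
                    if item.get("id") == change_id:
                        matches.append((pipeline["name"], item))
    return matches
```

Every Gerrit page view paid for the whole blob to keep only a few entries. A server-side, change-specific request avoids that cost.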
yoctozepto | (concurrently probably) | 15:49 |
corvus | yoctozepto: this is "fun" | 15:49 |
* clarkb finds a link | 15:49 | |
*** kobis1 has joined #openstack-infra | 15:49 | |
yoctozepto | corvus: the world of software is | 15:50 |
yoctozepto | I like good mysteries | 15:50 |
yoctozepto | Agatha Christie and stuff but this is much more :P | 15:50 |
*** kobis1 has quit IRC | 15:51 | |
zbr|ruck | maybe I am dreaming but my impression is that a developer wants a fail-fast strategy on checks, so he is informed about the first failure, this being more important than knowing about getting all failures. maybe I need to make a survey to get a better picture about expectations. | 15:51 |
corvus | yoctozepto: the log url (which includes part of the build uuid) is the same in the txt and ara... | 15:51 |
clarkb | zbr|ruck: https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/files/gerrit/hideci.js#L386-L388 | 15:52 |
clarkb | zbr|ruck: the zuul_inline var is hardcoded to false currently | 15:52 |
*** mattw4 has joined #openstack-infra | 15:52 | |
yoctozepto | corvus: yup | 15:52 |
yoctozepto | from my perspective there is nothing I can do about it :D | 15:53 |
corvus | zbr|ruck: fyi, zuul has a fail-fast option for pipelines. we have chosen not to enable it for openstack. | 15:53 |
clarkb | for me at least iterative work is what I do locally | 15:53 |
clarkb | then CI produces a set of results for the author and their reviewers | 15:53 |
*** hamzy has quit IRC | 15:54 | |
clarkb | giving as complete a picture as possible so that the code can be improved | 15:54 |
zbr|ruck | corvus: out of curiosity, could it be enable for tripleo (if people would want)? | 15:55 |
corvus | zbr|ruck: if you would like to experiment with that in one project, i think that's worth a discussion (please don't enable it without getting some agreement with the opendev team first). | 15:55 |
fungi | it's per-pipeline | 15:55 |
*** rajinir has joined #openstack-infra | 15:55 | |
fungi | right? | 15:55 |
corvus | fungi: actually, i think it's per project-pipeline | 15:55 |
fungi | ahh | 15:55 |
corvus | https://zuul-ci.org/docs/zuul/user/config.html#attr-project.%3Cpipeline%3E.fail-fast | 15:55 |
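The resource argument for fail-fast can be made concrete with a toy calculation. When all jobs for a change start together, fail-fast cuts every job off at the moment the first failure is reported. The durations below are hypothetical and are not measurements. The model deliberately ignores the extra patchsets and rechecks that clarkb and corvus describe, and that is exactly what a controlled experiment would have to measure.

```python
def node_minutes(jobs, fail_fast):
    """Return node-minutes spent when all jobs start together.

    ``jobs`` is a list of (duration_minutes, passed) tuples. With fail_fast,
    every job is cut off at the moment the earliest failure is reported.
    """
    failures = [duration for duration, passed in jobs if not passed]
    if fail_fast and failures:
        cutoff = min(failures)
        return sum(min(duration, cutoff) for duration, _ in jobs)
    return sum(duration for duration, _ in jobs)


# Hypothetical numbers: a 5-minute linter fails next to two 120-minute jobs.
example = [(5, False), (120, True), (120, True)]
print(node_minutes(example, fail_fast=False))  # 245
print(node_minutes(example, fail_fast=True))   # 15
```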
fungi | yep, so it is | 15:56 |
zbr|ruck | to be clear: i am asking all these because I am trying to lower the amount of resources we use. | 15:56 |
fungi | i don't object to a structured experiment in that direction, as long as it's not just turned on with little discussion and then forgotten | 15:56 |
fungi | and with a plan in place to figure out how to measure the impact | 15:56 |
corvus | yeah, i think that's key | 15:57 |
fungi | making changes arbitrarily without knowing how we expect to measure the result is what i would rather not see | 15:57 |
corvus | we've studied this long and hard over the years, and the last time we looked at it, we came to the conclusion that fail-fast was counter-productive for our community | 15:57 |
clarkb | we can measure ci resource usage fairly easily per job/project/logical project. The issue is that month over month there are many other variables that we'd have to account for to make that useful | 15:58 |
corvus | so a controlled experiment to determine if that's changed could be a good idea | 15:58 |
fungi | and increased resource consumption the last time we tried, if memory serves | 15:58 |
corvus | zbr|ruck: so maybe write up an experiment proposal, with how you'll measure the outcomes? | 15:59 |
fungi | it may also be that the way some projects in openstack have a culture which makes this option more or less useful, so we should consider the results with that in mind | 15:59 |
zbr|ruck | fungi: ok, I believe you. in that case probably the dependency approach would be a better fit, right? | 15:59 |
*** ginopc has quit IRC | 15:59 | |
zbr|ruck | if I make some heavy jobs to start only after the basic ones fail, it should be easier to measure the outcomes. | 16:00 |
fungi | zbr|ruck: well, with the dependency approach you're going to want to measure what the overall increase in result delays ends up being, in addition to the number of times jobs end up getting rerun | 16:00 |
clarkb | I think if we honestly feel linters are a huge concern we should consider a culture change to run linters locally before pushing | 16:00 |
clarkb | it is easy to do and if the impact is large that seems like a no brainer | 16:00 |
clarkb | (to take that specific example) | 16:01 |
fungi | okay, all recorded changes have been reenqueued in gate and check. i'll give #openstack-release the all-clear signal | 16:01 |
zbr|ruck | it's not only about linters; when I say linters I usually mean cheap jobs (all those that usually take <10-25m to run). | 16:02 |
*** udesale has joined #openstack-infra | 16:03 | |
*** igordc has joined #openstack-infra | 16:04 | |
fungi | #status log restarted all of zuul on commit 3b52a71ff2225f03143862c36224e18f90a7cfd0 (with repl cherry-picked on scheduler) | 16:05 |
zbr|ruck | my guesstimate is that with a *little* bit of deps we could reduce the time to get a reply from gerrit, even if the effective run of the chain could be a bit longer. The benefit would be that jobs should wait less time in the queue before they actually start (less time because overall resource usage will be lower). | 16:05 |
openstackstatus | fungi: finished logging | 16:05 |
*** roman_g has quit IRC | 16:06 | |
clarkb | zbr|ruck: related to throughput time, fixing flakiness in jobs will always have a major impact due to how the gate pipelining works | 16:07 |
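One way to see why flakiness dominates in a dependent gate queue is a simple model. Assume each queued change independently hits a spurious failure with probability p. The chance that a queue of N changes gets through with no reset at all is then (1 - p)^N, which shrinks quickly as N grows. The 5% rate and queue length in the example are hypothetical.

```python
def chance_of_clean_run(flaky_rate, queue_length):
    """Probability that none of ``queue_length`` changes hits a spurious failure.

    Treats each change as failing independently with probability ``flaky_rate``.
    """
    return (1.0 - flaky_rate) ** queue_length


print(round(chance_of_clean_run(0.05, 18), 3))  # 0.397
```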
*** Lucas_Gray has quit IRC | 16:07 | |
*** diablo_rojo has joined #openstack-infra | 16:07 | |
clarkb | fungi: you restarted the executors and mergers too? | 16:07 |
fungi | oh, no i did not | 16:08 |
fungi | #status log CORRECTION: restarted zuul scheduler on commit 3b52a71ff2225f03143862c36224e18f90a7cfd0 (with repl cherry-picked on scheduler) | 16:08 |
corvus | clarkb: sphinx problem affected old glean change too; i'll do that patch | 16:08 |
openstackstatus | fungi: finished logging | 16:08 |
clarkb | corvus: ok | 16:08 |
fungi | clarkb: thanks for pointing that out | 16:09 |
fungi | that's what i get for copying from channel history | 16:09 |
openstackgerrit | James E. Blair proposed opendev/glean master: Add .zuul.yaml https://review.opendev.org/667211 | 16:11 |
openstackgerrit | James E. Blair proposed opendev/glean master: Replace nodepool func jobs https://review.opendev.org/667225 | 16:11 |
openstackgerrit | James E. Blair proposed opendev/glean master: Pin sphinx https://review.opendev.org/667398 | 16:11 |
*** chandankumar is now known as raukadah | 16:13 | |
fungi | infra-root: for the record, these are the numbers we got out of the repl today prior to the scheduler restart: http://paste.openstack.org/show/753375/ | 16:15 |
corvus | clarkb: can you take a look at the 2 failing jobs in https://review.opendev.org/667221 ? trusty and bionic failed to ssh into the node, but xenial succeeded (along with centos and fedora)... should i recheck and see if that's a fluke, or do you think there could be something to that? | 16:15 |
clarkb | I'll look | 16:16 |
corvus | fungi: cool, so i think next time, i would just run through the etherpad, lines 3-31; ideally when there's been an increase in memory usage but before swapping | 16:17 |
corvus | and compare numbers with what you have there | 16:17 |
fungi | yep, will do. thanks again! | 16:18 |
fungi | also i approved the repl addition | 16:18 |
fungi | i had already basically dissected that change to figure out how it works | 16:18 |
corvus | (and, honestly, if the number of "leaked" layouts is <5, i'd say the situation is suspect -- like, that could just be a natural delta if we caught it during a gate reset or something. i'm certain with 349 leaked layouts there would be a problem visible to us, eventually) | 16:18 |
*** tdasilva has joined #openstack-infra | 16:19 | |
clarkb | corvus: they each managed to build the ubuntu images successfully according to the logs | 16:19 |
corvus | clarkb: yeah, i can't guess why ssh might not be answering :/ | 16:20 |
fungi | corvus: one thing i'm noticing... starting the scheduler with 579962 cherry-picked leaves the repl listening on 3000 by default. is that intended even without invoking the `zuul repl` rpc client command? | 16:21 |
clarkb | corvus: could be the boot time is too slow for our timeout | 16:21 |
clarkb | corvus: qemu being slow | 16:21 |
corvus | fungi: i think i cherry-picked an old version | 16:21 |
fungi | oh, that makes sense | 16:21 |
clarkb | boot timeout is set to 600 seconds which I would hope is plenty | 16:21 |
clarkb | but ya maybe bump that to 15 minutes and see if it becomes more reliable? Also we could grab the qemu logs | 16:22 |
*** tdasilva has quit IRC | 16:22 | |
corvus | clarkb: ooh, i like the idea of getting more logs... any idea how to do that? | 16:22 |
clarkb | one sec | 16:23 |
fungi | corvus: confirmed, looks like you fetched refs/changes/62/579962/3 last, so that's before the toggle was added | 16:23 |
clarkb | corvus: so grabbing the stuff out of the journal like devstack jobs do may be useful (though I don't think it will help with this particular issue) | 16:24 |
clarkb | and then the recursive contents of /var/log/libvirt will give us the qemu instance logs | 16:25 |
corvus | clarkb: yeah, if there's a boot log for any vms, that would be useful (but we're not debugging openstack, so i don't think i want all the openstack service logs) | 16:25 |
corvus | cool, i'll grab /var/log/libvirt | 16:25 |
clarkb | corvus: ya from journald we probably want the equivalent of the kernel log and syslog | 16:25 |
corvus | clarkb: got a command handy for that? | 16:27 |
corvus | clarkb: well, this is on bionic which still writes syslog, right? | 16:27 |
clarkb | if rsyslog is installed (which I think it is) then yes | 16:28 |
clarkb | but I can link to the journalctl command too one sec | 16:28 |
clarkb | https://opendev.org/openstack/devstack/src/branch/master/roles/export-devstack-journal/tasks/main.yaml#L20-L32 | 16:28 |
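For the "kernel log and syslog" equivalent, `journalctl` can export plain text. The sketch below only shows the relevant flags: `--no-pager`, `-o short-precise`, `--dmesg` for kernel messages from the current boot, and `--unit`. It does not reproduce the linked devstack role. In the actual job, an Ansible task like that role is the more idiomatic route. Full journal access typically needs root.

```python
import subprocess


def journal_command(kernel_only=False, unit=None):
    """Build a journalctl command line for exporting logs as plain text."""
    cmd = ["journalctl", "--no-pager", "-o", "short-precise"]
    if kernel_only:
        cmd.append("--dmesg")  # kernel messages for the current boot
    if unit:
        cmd += ["--unit", unit]
    return cmd


def export_journal(path, **kwargs):
    """Run journalctl and write its output to ``path``."""
    with open(path, "w") as out:
        subprocess.run(journal_command(**kwargs), stdout=out, check=True)
```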
*** lucasagomes has quit IRC | 16:32 | |
*** ricolin has quit IRC | 16:33 | |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Switch functional testing to a devstack consumer job https://review.opendev.org/665023 | 16:35 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs https://review.opendev.org/667156 | 16:35 |
corvus | clarkb: can you take a look at the post playbook in 665023 real quick? | 16:35 |
clarkb | corvus: another thing to check, what user is nodepool attempting to use? I see we use the dib devuser element to set up a user which defaults to a username of 'devuser' | 16:35 |
corvus | make sure that looks right before we wait for the whole run :) | 16:35 |
*** tesseract has quit IRC | 16:35 | |
clarkb | corvus: you might need a become: true for the libvirt log sync | 16:36 |
*** tesseract has joined #openstack-infra | 16:36 | |
*** tdasilva has joined #openstack-infra | 16:37 | |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Switch functional testing to a devstack consumer job https://review.opendev.org/665023 | 16:37 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs https://review.opendev.org/667156 | 16:37 |
corvus | clarkb: good catch, thx | 16:37 |
clarkb | that looks good with latest ps | 16:38 |
clarkb | now to track down the ssh user used | 16:38 |
corvus | clarkb: i think nodepool doesn't use a user, it just connects to the port and gets the host key | 16:39 |
corvus | the check.sh script (which the job runs after the node is up) logs in as root | 16:39 |
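When debugging "boots but never answers ssh", it can help to poll the port by hand with the same 600-second budget mentioned above. Per RFC 4253, an SSH server sends an identification line beginning with `SSH-` on connect. The sketch below waits for that line. It is an assumption-labeled stand-in, not nodepool's implementation, and it does not fetch host keys.

```python
import socket
import time


def wait_for_ssh_banner(host, port=22, timeout=600.0, interval=2.0):
    """Poll until the port answers with an SSH identification line, or give up.

    Returns the first banner line (e.g. "SSH-2.0-OpenSSH_7.6p1") or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval) as sock:
                sock.settimeout(interval)
                data = sock.recv(256).decode("ascii", errors="replace")
                if data.startswith("SSH-"):
                    return data.splitlines()[0]
        except OSError:
            pass  # refused, unreachable, or timed out: the VM may still be booting
        time.sleep(interval)
    return None
```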
clarkb | gotcha | 16:39 |
clarkb | and glean is expected to set that root user ssh key so that makes sense (to cover that works) | 16:40 |
clarkb | and config drive is set to true so that all looks good | 16:40 |
*** pkopec has quit IRC | 16:42 | |
clarkb | zbr|ruck: gate resets like those caused by http://logs.openstack.org/47/666747/1/gate/tripleo-ci-centos-7-containers-multinode/4c17ceb/logs/undercloud/home/zuul/undercloud_install.log.txt.gz likely consume more resources than any other situation | 16:45 |
openstackgerrit | James E. Blair proposed openstack/diskimage-builder master: Replace nodepool func jobs https://review.opendev.org/667221 | 16:45 |
clarkb | that caused jobs for ~18 changes to be discarded and restarted | 16:45 |
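The cost of a gate reset follows from how the dependent queue works. Each change is tested on top of every change ahead of it. When one change fails and is ejected, everything behind it loses its results and starts over, while changes ahead are unaffected. The sketch below is a toy model of that rule, with a hypothetical 20-change queue.

```python
def reset_after_failure(queue, failed):
    """Return (new_queue, restarted) for a toy dependent gate queue.

    Each change is tested on top of every change ahead of it, so when
    ``failed`` is ejected, every change behind it must start its jobs again.
    """
    position = queue.index(failed)
    restarted = queue[position + 1:]
    new_queue = queue[:position] + restarted
    return new_queue, restarted


queue = [f"change-{n}" for n in range(1, 21)]
remaining, restarted = reset_after_failure(queue, "change-2")
print(len(restarted))  # 18
```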
openstackgerrit | James E. Blair proposed opendev/glean master: Replace nodepool func jobs https://review.opendev.org/667225 | 16:45 |
*** ramishra has quit IRC | 16:46 | |
*** panda has quit IRC | 16:46 | |
*** panda has joined #openstack-infra | 16:48 | |
openstackgerrit | James E. Blair proposed opendev/glean master: Replace nodepool func jobs https://review.opendev.org/667225 | 16:48 |
clarkb | http://logs.openstack.org/72/666672/2/gate/puppet-openstack-lint/3cc866e/job-output.txt.gz#_2019-06-25_16_45_08_545046 too, that will need to be reparented to legacy-base | 16:50 |
clarkb | zbr|ruck: fungi ^ fyi | 16:50 |
clarkb | (or the alternative to legacy-base is to stop using zuul-cloner) | 16:51 |
*** sthussey has quit IRC | 16:52 | |
*** smarcet has quit IRC | 16:52 | |
*** aedc has joined #openstack-infra | 16:53 | |
*** jpich has quit IRC | 16:56 | |
*** igordc has quit IRC | 16:56 | |
*** tesseract has quit IRC | 16:59 | |
*** igordc has joined #openstack-infra | 16:59 | |
fungi | yep, i agree | 17:01 |
*** yamamoto has joined #openstack-infra | 17:06 | |
openstackgerrit | Alex Schultz proposed zuul/zuul master: Additional note about branches for implied-branches https://review.opendev.org/667415 | 17:07 |
*** udesale has quit IRC | 17:07 | |
openstackgerrit | Merged zuul/zuul master: Add command processor to zuul-web https://review.opendev.org/666307 | 17:09 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Extend event reporting https://review.opendev.org/662134 | 17:10 |
*** raukadah is now known as chandankumar | 17:11 | |
*** hamzy has joined #openstack-infra | 17:12 | |
*** chandankumar is now known as raukadah | 17:13 | |
*** smarcet has joined #openstack-infra | 17:15 | |
*** ralonsoh has quit IRC | 17:20 | |
*** jtomasek has quit IRC | 17:21 | |
openstackgerrit | Kevin Carter (cloudnull) proposed openstack/project-config master: Remove retired repos https://review.opendev.org/667418 | 17:21 |
cloudnull | AJaeger - https://review.opendev.org/#/q/topic:retire-role+(status:open) - I think that does it. Mind giving it a once over to make sure i'm not missing anything? -cc mwhahaha | 17:22 |
openstackgerrit | Merged zuul/zuul master: Add repl server for debug purposes https://review.opendev.org/579962 | 17:22 |
AJaeger | cloudnull: commented - you need some more changes... | 17:25 |
openstackgerrit | Merged opendev/system-config master: Remove dead link from 'paste' documentation https://review.opendev.org/666474 | 17:28 |
openstackgerrit | Kevin Carter (cloudnull) proposed openstack/project-config master: Remove retired repos https://review.opendev.org/667418 | 17:35 |
*** mattw4 has quit IRC | 17:38 | |
openstackgerrit | Kevin Carter (cloudnull) proposed openstack/project-config master: Remove retired repos https://review.opendev.org/667418 | 17:39 |
*** ociuhandu has joined #openstack-infra | 17:41 | |
openstackgerrit | Kevin Carter (cloudnull) proposed openstack/project-config master: Remove retired repos https://review.opendev.org/667418 | 17:41 |
cloudnull | sorry about the spam. | 17:41 |
cloudnull | AJaeger I think that's all of it? | 17:43 |
*** ociuhandu_ has quit IRC | 17:44 | |
*** mattw4 has joined #openstack-infra | 17:45 | |
*** ociuhandu has quit IRC | 17:45 | |
*** eernst has joined #openstack-infra | 17:45 | |
*** ykarel|afk has quit IRC | 17:45 | |
*** smarcet has quit IRC | 17:49 | |
*** gfidente has quit IRC | 17:59 | |
*** guimaluf has joined #openstack-infra | 18:10 | |
*** sgw has left #openstack-infra | 18:13 | |
*** smarcet has joined #openstack-infra | 18:13 | |
*** smarcet has left #openstack-infra | 18:15 | |
AJaeger | cloudnull: one more cleanup request... | 18:21 |
fungi | logan-: not sure if you saw in scrollback earlier, but the network issue in limestone seems to have returned as of 09:00 utc today: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64934 | 18:35 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Filter out unprotected branches from builds if excluded https://review.opendev.org/666664 | 18:37 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Filter out unprotected branches from builds if excluded https://review.opendev.org/666664 | 18:39 |
mwhahaha | what's the remediation for missing zuul-cloner? i'm reading the email but not sure what the replacement action was supposed to be. looks like the puppet-openstack stable jobs might be broken because we were still using it | 18:39 |
clarkb | mwhahaha: either reparent to legacy-base job (instead of default "base") or use the repos in /home/zuul/src that are already cloned there by zuul | 18:40 |
mwhahaha | k | 18:40 |
clarkb | the legacy-base job installs the zuul-cloner shim | 18:40 |
clarkb | as compat for devstack-gate and other tools that may need it and we don't want to update | 18:40 |
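The second option clarkb mentions is to use the repositories Zuul has already prepared under /home/zuul/src. The Python sketch below assumes Zuul's usual `src/<canonical hostname>/<project name>` layout, for example `src/opendev.org/openstack/puppet-nova`. Inside a job, the authoritative value is the `zuul.projects[...].src_dir` variable, and reading that is preferable to hard-coding paths. `project_src_dir` is a hypothetical helper.

```python
from pathlib import Path


def project_src_dir(project: str, canonical_hostname: str = "opendev.org",
                    home: Path = Path("/home/zuul")) -> Path:
    """Return where Zuul is assumed to have checked out `project` on the node.

    This replaces a zuul-cloner call. The repo is already present, with the change
    under test (and any Depends-On changes) applied, so no clone is needed.
    """
    return home / "src" / canonical_hostname / project
```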
openstackgerrit | Kevin Carter (cloudnull) proposed openstack/project-config master: Remove retired repos https://review.opendev.org/667418 | 18:41 |
* mwhahaha hopes this isn't terrible to patch across all the table branches | 18:41 | |
openstackgerrit | Kevin Carter (cloudnull) proposed openstack/project-config master: Removed unused ACL https://review.opendev.org/667437 | 18:42 |
fungi | and just wait until you get to the chair branches | 18:44 |
fungi | ;) | 18:44 |
* mwhahaha flips tables | 18:44 | |
* mwhahaha throws chairs | 18:44 | |
fungi | that's how i usually deal with things | 18:45 |
*** ianychoi_ has quit IRC | 18:49 | |
cloudnull | thanks AJaeger! | 18:49 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Filter out unprotected branches from builds if excluded https://review.opendev.org/666664 | 18:50 |
*** ianychoi_ has joined #openstack-infra | 18:50 | |
*** aedc has quit IRC | 18:54 | |
clarkb | 5 minutes to our meeting over in #openstack-meeting | 18:55 |
*** kjackal has quit IRC | 19:04 | |
*** e0ne has joined #openstack-infra | 19:07 | |
*** mriedem has quit IRC | 19:15 | |
*** mriedem has joined #openstack-infra | 19:16 | |
*** mattw4 has quit IRC | 19:18 | |
*** lpetrut has joined #openstack-infra | 19:18 | |
*** jtomasek has joined #openstack-infra | 19:19 | |
*** lpetrut has quit IRC | 19:23 | |
*** tdasilva has quit IRC | 19:31 | |
*** e0ne has quit IRC | 19:31 | |
*** pcaruana has quit IRC | 19:38 | |
*** factor has joined #openstack-infra | 19:39 | |
*** icarusfactor has quit IRC | 19:40 | |
*** tosky has quit IRC | 19:40 | |
clarkb | fungi: SotK the storyboardclient issue is a name mismatch | 19:47 |
clarkb | the module installs as python-storyboardclient but we lookup the version for storyboardclient | 19:47 |
fungi | ahh | 19:47 |
fungi | so we need an explicit string passed in for the module name | 19:47 |
fungi | good eye | 19:47 |
clarkb | ya that already exists, it just doesn't reflect the reality of what is installed so pbr can't find the metadata outside of a git repo | 19:48 |
fungi | we do that in other projects that have similar situations | 19:48 |
pabelanger | ianw: just reading backscroll on reboot issue, where is the playbook that does the reboot on the production server? | 20:00 |
clarkb | fungi: https://opendev.org/opendev/python-storyboardclient/src/branch/master/storyboardclient/__init__.py#L19 is where it breaks. If you change that string to python-storyboardclient it works | 20:00 |
ianw | fungi: the trick is for the systemd/daemon idea getting https://opendev.org/zuul/zuul/src/branch/master/zuul/ansible/base/library/zuul_console.py into a daemon on the host. i guess we'd have to install it so ansible can find it, and then the systemd job would be an ansible-playbook to localhost? | 20:01 |
clarkb | fungi: I think https://opendev.org/opendev/python-storyboardclient/src/branch/master/setup.cfg#L2 determines the name | 20:01 |
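The failure clarkb describes comes from looking up installed package metadata under the wrong name. The distribution is registered as `python-storyboardclient` (the `name` in setup.cfg), but the version lookup asks for `storyboardclient`. pbr's lookup goes through `pbr.version.VersionInfo(<name>)`. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+) instead of pbr to show the same failure mode. `installed_version` is a hypothetical helper.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version for a *distribution* name, or None if it is unknown.

    The key must be the distribution name from setup.cfg [metadata] name
    (e.g. "python-storyboardclient"), not the import package name
    ("storyboardclient"). This stdlib lookup never falls back to git. pbr does,
    which is why the wrong name still worked inside a git checkout.
    """
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None
```

With python-storyboardclient installed from a wheel, `installed_version("python-storyboardclient")` would return its version, while `installed_version("storyboardclient")` would return None.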
fungi | clarkb: are you pushing a change? if not, i'll do that now | 20:01 |
clarkb | fungi: I haven't yet and lunch calls so you should feel free to | 20:02 |
ianw | pabelanger: using it in https://review.opendev.org/#/c/665057/ ... in this case, we don't do anything after the reboot so nothing is "lost" as such, but you do get a flood of messages about the streamer | 20:02 |
fungi | ianw: oh, does ansible on the executor copy it over to the node? | 20:02 |
ianw | fungi: yes, essentially ansible ships libraries like that over to run remotely | 20:03 |
fungi | clarkb: thanks, will do | 20:04 |
fungi | ianw: and i guess if it copies that into /tmp then it may no longer be there after a reboot | 20:04 |
ianw | fungi: how it does it is pretty magic involving temp directories and zip files i think; I wouldn't want to rely on any sort of behaviour of it (even assuming I understood it :) | 20:05 |
fungi | yeah, definitely qualifies as an ansible implementation detail | 20:06 |
ianw | clarkb: i wasn't quite following how the base job would restart it? i guess the problem is that the "outer" ansible is basically sitting waiting for a "shell: ansible-playbook ... <ci-playbook>" task to finish, so there's no way to insert a "zuul_console:" call | 20:07 |
ianw | we'd have to install zuul to get the zuul_console: library task on our CI bridge.openstack.org, and then i guess configure ansible there to look at the right paths to bring in that library | 20:08 |
openstackgerrit | Jeremy Stanley proposed opendev/python-storyboardclient master: Correct the distname in PBR version discovery https://review.opendev.org/667455 | 20:09 |
fungi | clarkb: ^ | 20:09 |
fungi | ianw: yeah, the systemd unit is also compelling because we get back the console log stream before the next ansible task is executed | 20:10 |
pabelanger | ianw: ack, I did leave an unrelated comment about the reboot | 20:11 |
ianw | pabelanger: ahh, i thought the reboot: command incorporated the waiting? | 20:11 |
pabelanger | Oh, right, there is a new reboot task | 20:12 |
clarkb | ianw I was thinking base could install a unit then start it via that unit. Then in reboot it would be started automatically | 20:12 |
pabelanger | ianw: ignore me | 20:12 |
clarkb | rather than starting it however it starts it now | 20:12 |
*** jcoufal has quit IRC | 20:14 | |
*** mattw4 has joined #openstack-infra | 20:14 | |
fungi | clarkb: challenge being, as ianw points out, is that it also needs to install the zuul_console.py and any dependencies | 20:14 |
fungi | since right now we rely on ansible magic from the executor to put that in place (somewhere) on the node and start it | 20:15 |
ianw | fungi / clarkb : yeah ... or the guts of https://opendev.org/zuul/zuul/src/branch/master/zuul/ansible/base/library/zuul_console.py is pulled out into something more stand-aloney | 20:15 |
ianw | which could be just "python3 hacky_zuul_console.py"-ed in a at-boot oneshot | 20:16 |
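clarkb suggests installing a unit in the base job so the console streamer comes back on its own after a reboot. ianw suggests a standalone script. The two ideas could be combined roughly as follows. This is a hypothetical sketch. The script path borrows ianw's placeholder name, and no such standalone script exists in Zuul. The code only renders the unit file text; installing and enabling it would be left to the base job. It uses `Type=simple`, which assumes the script stays in the foreground. If the script daemonized itself, `Type=forking` (or the oneshot ianw mentions) would fit instead.

```python
def render_console_unit(script: str = "/usr/local/bin/hacky_zuul_console.py",
                        python: str = "/usr/bin/python3") -> str:
    """Render a systemd unit that starts a hypothetical standalone console streamer at boot."""
    lines = [
        "[Unit]",
        "Description=Zuul console log streamer (hypothetical standalone version)",
        "After=network-online.target",
        "Wants=network-online.target",
        "",
        "[Service]",
        # simple: systemd tracks the process directly, so the script must not fork
        "Type=simple",
        f"ExecStart={python} {script}",
        "Restart=on-failure",
        "",
        "[Install]",
        # enabling the unit against this target is what restarts it after a reboot
        "WantedBy=multi-user.target",
        "",
    ]
    return "\n".join(lines)
```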
fungi | i'm a gonna go cook dinner, then come back | 20:16 |
ianw | it is a lot of fiddling; is this likely to be generically useful? i'm struggling to think where it might be used in a similar fashion anywhere else right now | 20:17 |
*** smarcet has joined #openstack-infra | 20:17 | |
*** kjackal has joined #openstack-infra | 20:18 | |
*** e0ne has joined #openstack-infra | 20:18 | |
corvus | ianw, clarkb, fungi: it's also worth keeping in mind that we still have the plan to replace that with ssh-forwarded unix socket python-logging... i think that will handle this use case without any special tooling. so we probably don't want to invest too much into making zuul-console standalone | 20:19 |
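corvus's longer-term plan replaces zuul_console with Python logging sent over an SSH-forwarded Unix socket. The log does not give the actual design. The sketch below shows only the standard-library building block, assuming the forwarded socket appears as a local path. `logging.handlers.SocketHandler` with `port=None` connects to a Unix domain socket and sends each record as a length-prefixed pickle. The receiver runs in a thread only so that the example is self-contained, and the socket path and logger name are hypothetical. The SSH forwarding itself is not shown.

```python
import logging
import logging.handlers
import os
import pickle
import socket
import struct
import tempfile
import threading


def _recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before the full record arrived")
        buf += chunk
    return buf


def _receive_one(server: socket.socket, out: list) -> None:
    """Accept one connection and decode one record sent by SocketHandler."""
    conn, _ = server.accept()
    with conn:
        (length,) = struct.unpack(">L", _recv_exact(conn, 4))
        # Only unpickle data from a trusted peer: unpickling can execute code.
        attrs = pickle.loads(_recv_exact(conn, length))
        out.append(logging.makeLogRecord(attrs))


def send_over_unix_socket(message: str) -> str:
    """Log one message through a Unix-socket SocketHandler and return what arrived."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "console.sock")
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.settimeout(5)
        server.bind(path)
        server.listen(1)
        received: list = []
        receiver = threading.Thread(target=_receive_one, args=(server, received))
        receiver.start()

        logger = logging.getLogger("job-console-sketch")
        logger.setLevel(logging.INFO)
        logger.propagate = False
        handler = logging.handlers.SocketHandler(path, None)  # port=None selects AF_UNIX
        logger.addHandler(handler)
        try:
            logger.info(message)
        finally:
            logger.removeHandler(handler)
            handler.close()
        receiver.join(timeout=5)
        server.close()
        return received[0].getMessage()
```

Because the records travel over the SSH connection, streaming would not depend on a separate daemon surviving a reboot on the node.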
*** factor has quit IRC | 20:20 | |
ianw | oh cool, yeah i'm onboard with "put up with it until something better" ... it's a corner case inside a corner case :) | 20:21 |
*** factor has joined #openstack-infra | 20:21 | |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Switch functional testing to a devstack consumer job https://review.opendev.org/665023 | 20:23 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs https://review.opendev.org/667156 | 20:23 |
fungi | corvus: ianw: agreed, i didn't mean to suggest loss of the console log in that job as a blocker to approving it | 20:24 |
fungi | (i think i even gave it a +2 anyway?) | 20:24 |
fungi | seems like we could live with it | 20:25 |
corvus | ianw: the suse-src jobs on glean seems to reliably fail -- is that known/expected? (also gentoo, but it's non-voting) | 20:25 |
corvus | ianw: (see https://review.opendev.org/667398 and children for recent examples) | 20:25 |
ianw | hrm, looks like 42.3 failing to build in http://logs.openstack.org/98/667398/1/check/nodepool-functional-py35-suse-src/b4aebff/controller/logs/builds/ ... something happened with that recently | 20:27 |
corvus | ianw: http://logs.openstack.org/11/667211/2/check/nodepool-functional-py35-suse-src/794c3c9/controller/logs/builds/opensuse-423-0000000001_log.txt.gz#_2019-06-25_18_14_41_971 | 20:27 |
corvus | ianw: that looks like the culprit line | 20:27 |
ianw | that's right, 42.2 was removed https://review.opendev.org/#/c/660137/ so 42.3 should be the latest. that may be a red herring, although i'm not sure | 20:30 |
ianw | and the gate build seems to be failing in a similar, but different way https://nb01.openstack.org/opensuse-423-0000051674.log | 20:31 |
corvus | AJaeger, fungi: can you look at merging this today? https://review.opendev.org/667228 | 20:31 |
*** factor has quit IRC | 20:31 | |
ianw | maybe that's a red herring too... | 20:32 |
ianw | 2019-06-25 18:54:33.717 | > Problem: systemd-logger-228-71.1.x86_64 conflicts with namespace:otherproviders(syslog) provided by rsyslog-8.24.0-2.13.1.x86_64 | 20:32 |
ianw | 2019-06-25 18:54:33.717 | > Solution 1: deinstallation of systemd-logger-228-71.1.x86_64 | 20:32 |
ianw | 2019-06-25 18:54:33.717 | > Solution 2: do not install rsyslog-8.24.0-2.13.1.x86_64 | 20:32 |
*** factor has joined #openstack-infra | 20:32 | |
*** e0ne has quit IRC | 20:32 | |
ianw | who could have guessed systemd would be involved! | 20:32 |
corvus | ianw: heh... i'd like to make this non-voting for a bit to get all these fixes and moves in, ok? | 20:32 |
ianw | yeah, i think this will require suse experts to look into | 20:33 |
*** hamzy has quit IRC | 20:33 | |
openstackgerrit | James E. Blair proposed openstack/project-config master: Make glean opensuse job non-voting https://review.opendev.org/667459 | 20:34 |
openstackgerrit | James E. Blair proposed opendev/glean master: Pin sphinx https://review.opendev.org/667398 | 20:35 |
openstackgerrit | James E. Blair proposed opendev/glean master: Add .zuul.yaml https://review.opendev.org/667211 | 20:35 |
openstackgerrit | James E. Blair proposed opendev/glean master: Replace nodepool func jobs https://review.opendev.org/667225 | 20:35 |
corvus | ianw, clarkb: the other 8 nodepool-func changes are still in flux, but it would be good to start trying to land the ones that are ready (due to the amount of time it may take) -- can you go ahead and review/approve https://review.opendev.org/667212 and https://review.opendev.org/667220 ? | 20:37 |
corvus | (they probably should have been squashed, and if they flake out in the gate, i will squash them, but i think they're okay to land separately) | 20:37 |
clarkb | corvus: ya I'll take a look as soon as I get this new gitea06 building | 20:38 |
clarkb | I'm using the same flavor but with an 80GB instead of 30GB boot from volume root disk | 20:39 |
clarkb | and I didn't forget --config-drive thankfully | 20:40 |
clarkb | but I didn't specify the network name ... /me tries again | 20:43 |
clarkb | mordred: corvus I can't boot a new instance in sjc1 as I've hit the instance quota. There is a mttest and jeblairtest any idea if those can be removed? I think my other option is to delete the old gitea06 prior to booting a new one | 20:45 |
corvus | clarkb: you can kill mine; i used it for building gitea images when we were bootstrapping this; i don't need it anymore | 20:46 |
clarkb | thanks I'll delete jeblairtest then | 20:46 |
corvus | clarkb: https://review.opendev.org/667459 is ready for a +3 | 20:47 |
clarkb | done | 20:48 |
corvus | that should let us get the glean side of things moving | 20:48 |
*** Goneri has quit IRC | 20:49 | |
clarkb | hrm ping6 not installed. This may be fun | 20:50 |
clarkb | I'm going to use a local checkout of system-config to make launch node script edits | 20:50 |
*** factor has quit IRC | 20:52 | |
*** factor has joined #openstack-infra | 20:52 | |
*** kjackal has quit IRC | 20:57 | |
clarkb | ok new issue is ansible fails to connect I think possibly because we specify the actual inventory file on launch node and not just the one off inventory that includes this node and there is a gitea06.opendev.org collision maybe? | 20:59 |
*** ianychoi_ has quit IRC | 20:59 | |
*** factor has quit IRC | 20:59 | |
*** slaweq has quit IRC | 20:59 | |
clarkb | yup the ip it failed to connect to is the old gitea06 server | 21:00 |
openstackgerrit | Merged openstack/project-config master: Add zuul-operator project https://review.opendev.org/667228 | 21:00 |
openstackgerrit | Merged openstack/project-config master: Make glean opensuse job non-voting https://review.opendev.org/667459 | 21:00 |
*** ianychoi_ has joined #openstack-infra | 21:00 | |
clarkb | so we can't replace servers if they exist in our inventory | 21:01 |
*** smarcet has quit IRC | 21:01 | |
clarkb | (we have to load the global inventory stuff for host vars data like sysadmins list according to git log) | 21:02 |
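launch-node loads the global inventory, so a stale entry with the same name points Ansible at the old server, here the old gitea06. A pre-flight check could catch this before any Ansible runs. The Python sketch below assumes the inventory has already been parsed into a dict shaped like `{"all": {"hosts": {name: {"ansible_host": addr}}}}`. That shape and `find_inventory_collision` are assumptions, not system-config's actual format or code.

```python
from typing import Any, Dict, Optional


def find_inventory_collision(inventory: Dict[str, Any], hostname: str) -> Optional[str]:
    """Return the address already recorded for hostname, or None if the name is free.

    Assumes a parsed YAML inventory of the form
    {"all": {"hosts": {name: {"ansible_host": addr, ...}}}}.
    """
    hosts = (inventory.get("all") or {}).get("hosts") or {}
    if hostname not in hosts:
        return None
    entry = hosts[hostname] or {}
    # With no explicit address, Ansible would connect to the name itself.
    return entry.get("ansible_host", hostname)
```

A launcher could refuse to continue when this returns an address, so that the operator removes the old entry first. Removing gitea06 from the inventory achieved the same thing manually.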
corvus | why is the real inventory file involved? | 21:02 |
corvus | ah | 21:02 |
clarkb | https://review.opendev.org/#/c/642096/ is the change that introduced that | 21:02 |
corvus | clarkb: i reckon we can delete current gitea06 | 21:02 |
clarkb | I'll get a change up to ya that | 21:03 |
fungi | on hand to quick-approve that now that dinner is out of the wok | 21:03 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Remove gitea06 from our inventory file https://review.opendev.org/667465 | 21:04 |
clarkb | I think that is sufficient since all the other things we could remove it from will be happy once we make the new server | 21:04 |
*** ianychoi_ has quit IRC | 21:05 | |
*** raissa has joined #openstack-infra | 21:06 | |
*** raissa has quit IRC | 21:06 | |
clarkb | fungi: ^ | 21:07 |
*** ianychoi_ has joined #openstack-infra | 21:09 | |
fungi | i concur | 21:10 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs https://review.opendev.org/667156 | 21:10 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Add Zypper to openstack func job https://review.opendev.org/667466 | 21:10 |
*** slaweq has joined #openstack-infra | 21:11 | |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Switch functional testing to a devstack consumer job https://review.opendev.org/665023 | 21:12 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs https://review.opendev.org/667156 | 21:12 |
*** slaweq has quit IRC | 21:15 | |
*** openstackgerrit has quit IRC | 21:18 | |
*** ianychoi_ is now known as ianychoi | 21:18 | |
*** aedc has joined #openstack-infra | 21:20 | |
*** jtomasek has quit IRC | 21:24 | |
clarkb | as a general fyi we do leak the boot from volume root volume when launch node has errors (I've manually cleaned up the two I've leaked so far) | 21:24 |
*** slaweq has joined #openstack-infra | 21:26 | |
clarkb | the way you can double check the volumes that are not attached are leaked is by checking their updated at timestamps and the image they were based off of | 21:26 |
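clarkb's manual check finds unattached volumes, then compares their `updated_at` timestamp and source image. The Python sketch below expresses that check as a filter over plain dicts shaped like Cinder volume details (`status`, `attachments`, `updated_at`, `volume_image_metadata`). Fetching the volumes, for example with openstacksdk, is left out. The one-hour age threshold and the `find_leaked_root_volumes` name are assumptions. The function only lists candidates; deleting them remains a manual decision.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def find_leaked_root_volumes(volumes: List[Dict[str, Any]], image_name: str,
                             now: datetime, min_age: timedelta = timedelta(hours=1)) -> List[str]:
    """Return IDs of volumes that look like leaked boot-from-volume root disks."""
    leaked = []
    for vol in volumes:
        if vol.get("status") != "available" or vol.get("attachments"):
            continue  # attached or in transition: not a leak candidate
        image_meta = vol.get("volume_image_metadata") or {}
        if image_meta.get("image_name") != image_name:
            continue  # not built from the image launch-node used
        updated = datetime.fromisoformat(vol["updated_at"])
        if updated.tzinfo is None:
            # Treat naive timestamps as UTC
            updated = updated.replace(tzinfo=timezone.utc)
        if now - updated >= min_age:
            leaked.append(vol["id"])
    return leaked
```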
*** slaweq has quit IRC | 21:31 | |
*** aedc has quit IRC | 21:37 | |
*** factor has joined #openstack-infra | 21:37 | |
corvus | er, was zypper removed from ubuntu? | 21:44 |
*** openstackgerrit has joined #openstack-infra | 21:44 | |
openstackgerrit | Merged openstack/diskimage-builder master: Move Zuul config in-repo https://review.opendev.org/667212 | 21:44 |
clarkb | ya I think bionic may have pulled it? | 21:44 |
corvus | nice | 21:45 |
corvus | it's missing in bionic, but present in xenial, cosmic, disco, eoan | 21:45 |
*** Emine has joined #openstack-infra | 21:45 | |
corvus | okay... that's going to require some reshuffling | 21:46 |
clarkb | I wonder if that is the sort of thing we can convince them to put in universe if it is in every other release | 21:47 |
*** georgk has quit IRC | 21:48 | |
*** georgk has joined #openstack-infra | 21:49 | |
*** tobberydberg has quit IRC | 21:49 | |
*** tobberydberg has joined #openstack-infra | 21:51 | |
fungi | how did we end up dealing with that in dib? | 21:52 |
fungi | we build opensuse images... do we invoke zypper from inside the chroot? | 21:52 |
clarkb | fungi: we have xenial builders | 21:52 |
fungi | oic | 21:52 |
fungi | we "support" it by not upgrading ;) | 21:52 |
*** mriedem has quit IRC | 21:52 | |
corvus | yeah, i'm trying to (mostly) upgrade us to bionic, so i'll reshuffle this to try to make it work | 21:53 |
corvus | i'll make the extra packages per-job configurable, and then in the suse job, add zypper and set the node to xenial | 21:53 |
corvus | so only the one job will still be running on xenial | 21:53 |
fungi | the post-bionic packages may be installable on bionic too | 21:54 |
fungi | but that could require some extra apt pinning work | 21:54 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Switch functional testing to a devstack consumer job https://review.opendev.org/665023 | 21:54 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs https://review.opendev.org/667156 | 21:55 |
openstackgerrit | Merged opendev/system-config master: Remove gitea06 from our inventory file https://review.opendev.org/667465 | 21:55 |
fungi | odds are it was temporarily evicted from debian/testing and so just happened to be absent when bionic imported and froze it | 21:55 |
fungi | (the problem with trying to create a distro from debian/testing any time other than when debian creates one) | 21:55 |
fungi | ((which will hopefully finally happen any day now)) | 21:57 |
pabelanger | https://tracker.debian.org/news/719494/zypper-removed-from-testing/ | 21:58 |
pabelanger | that was from fungi last time :) | 21:58 |
fungi | hah | 21:59 |
fungi | i drank away those brain cells | 21:59 |
openstackgerrit | James E. Blair proposed openstack/diskimage-builder master: Replace nodepool func jobs https://review.opendev.org/667221 | 22:00 |
fungi | but yeah, more generally https://tracker.debian.org/pkg/zypper shows the overall timeline | 22:00 |
pabelanger | https://bugs.launchpad.net/ubuntu/+source/zypper/+bug/1808230 is also from ianw | 22:01 |
openstack | Launchpad bug 1808230 in zypper (Ubuntu) "Zypper unavailable on bionic" [Undecided,New] | 22:01 |
corvus | okay, that offloads a bit more of the job into variables, so more of the specifics of package installation are in the dib repo instead of nodepool | 22:01 |
fungi | so removed from testing 2017-01-29, bionic imports testing, migrated to testing again 2018-06-16 | 22:01 |
*** tdasilva has joined #openstack-infra | 22:02 | |
*** tdasilva_ has joined #openstack-infra | 22:02 | |
*** tdasilva_ has quit IRC | 22:02 | |
fungi | as a result it also skipped debian/stretch (was present in jessie and will almost certainly be in buster when it releases in a little over a week) | 22:03 |
clarkb | I've noticed that /var/log/syslog is also missing on this bionic image | 22:03 |
clarkb | these things are super minimal | 22:03 |
fungi | do we not install rsyslog maybe? | 22:04 |
clarkb | not sure, I'll check when it is done (assuming it doesn't fail this time) | 22:04 |
corvus | clarkb, fungi: https://review.opendev.org/667398 is ready for review/approval | 22:05 |
corvus | clarkb, fungi: as is its child: https://review.opendev.org/667211 | 22:05 |
*** ekultails has quit IRC | 22:06 | |
corvus | those n-v jobs are expected unrelated failures | 22:06 |
*** Emine has quit IRC | 22:07 | |
*** mnencia has quit IRC | 22:09 | |
*** mnencia has joined #openstack-infra | 22:11 | |
clarkb | ok /var/log/syslog exists after the ansible install of package and the reboot (also it does run the unattended upgrade script prior to the reboot so we do update packages properly) | 22:12 |
*** _erlon_ has quit IRC | 22:18 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Enroll new gitea06 into ansible inventory https://review.opendev.org/667474 | 22:24 |
clarkb | something like ^ for the next step in spinning up a new host | 22:24 |
*** smarcet has joined #openstack-infra | 22:26 | |
*** smarcet has quit IRC | 22:32 | |
openstackgerrit | Merged openstack/diskimage-builder master: Add DIB_UBUNTU_KERNEL to ubuntu-minimal https://review.opendev.org/666063 | 22:48 |
clarkb | ianw: fungi: https://review.opendev.org/#/c/667474/ is +1 from zuul now. I've got a family dinner thing so probably don't want to approve that today unless someone else wants to watch it. But if you can review it I'm happy to approve in the morning when I can watch it | 22:54 |
*** diablo_rojo has quit IRC | 22:54 | |
*** weifan has joined #openstack-infra | 22:55 | |
*** tkajinam has joined #openstack-infra | 22:56 | |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Switch functional testing to a devstack consumer job https://review.opendev.org/665023 | 23:03 |
openstackgerrit | James E. Blair proposed zuul/nodepool master: Remove devstack plugin functional test jobs https://review.opendev.org/667156 | 23:03 |
*** rcernin has joined #openstack-infra | 23:05 | |
openstackgerrit | James E. Blair proposed openstack/diskimage-builder master: Replace nodepool func jobs https://review.opendev.org/667221 | 23:05 |
*** eernst has quit IRC | 23:14 | |
logan- | fungi: ack, thanks. will look into it some more. | 23:20 |
*** auristor has quit IRC | 23:21 | |
fungi | cool, just wanted to be sure you're aware | 23:22 |
*** auristor has joined #openstack-infra | 23:23 | |
*** lseki has quit IRC | 23:26 | |
*** dchen has joined #openstack-infra | 23:27 | |
*** weifan has quit IRC | 23:29 | |
*** weifan has joined #openstack-infra | 23:29 | |
*** weifan has quit IRC | 23:29 | |
*** eernst has joined #openstack-infra | 23:29 | |
*** eernst_ has joined #openstack-infra | 23:30 | |
*** eernst has quit IRC | 23:30 | |
*** diablo_rojo has joined #openstack-infra | 23:31 | |
openstackgerrit | Merged opendev/glean master: Pin sphinx https://review.opendev.org/667398 | 23:35 |
*** eernst_ has quit IRC | 23:35 | |
*** yamamoto has quit IRC | 23:36 | |
*** mattw4 has quit IRC | 23:37 | |
openstackgerrit | Merged opendev/glean master: Add .zuul.yaml https://review.opendev.org/667211 | 23:41 |
*** aaronsheffield has quit IRC | 23:49 | |
*** rh-jelabarre has quit IRC | 23:51 | |
*** rlandy has quit IRC | 23:59 |